thohemp / 6drepnet
Official Pytorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.
License: MIT License
Hey,
I cannot reproduce your results on the BIWI dataset.
I'm comparing the X, Y and Z angles obtained from the ground-truth rotation matrix (transformed by the extrinsic calibration) with -Pitch, Yaw and Roll respectively.
I'm using your pip package. I crop the face with the RetinaFace detector as you do in demo.py and pass it to the model.predict() function. I instantiate the model without any parameters, so the path to the weights is the default.
I have spotted one difference. In the README you write:
The BIWI datasets needs be preprocessed by a face detector to cut out the faces from the images. You can use the script provided [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi.py). For 7:3 splitting of the BIWI dataset you can use the equivalent script [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi_70_30.py). We set the cropped image size to 256.
However, in model.predict() the crop is resized to 244 (which I believe is the longer edge of the picture; the shorter one is then scaled with the appropriate ratio). Is this intended?
I cannot find any other differences, but the mean error is about 25 on X and Y and about 8 on Z.
Can you help me figure it out?
Best,
Jan
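When comparing predictions against ground-truth angles, sign conventions and wrap-around at ±180° are easy to get wrong. The following is a minimal sketch of a per-axis error, assuming both sets of angles are in degrees. The function names and sample values are hypothetical and are not part of the sixdrepnet package.

```python
def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def mean_abs_error(gt: list, pred: list) -> float:
    """Mean absolute angular error over paired samples."""
    return sum(angle_diff_deg(g, p) for g, p in zip(gt, pred)) / len(gt)


# Illustrative values only: ground-truth X angles versus negated pitch.
gt_x = [10.0, -5.0, 179.0]
pitch = [-12.0, 4.0, 179.0]
err_x = mean_abs_error(gt_x, [-p for p in pitch])
```

Note that 179° and -179° count as 2° apart here, not 358°, which matters when a mean error looks suspiciously large.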
First, thanks for this great work and for sharing your code.
In the running example,
img = cv2.imread('/path/to/image.jpg')
pitch, yaw, roll = model.predict(img)
the image is loaded as a BGR numpy array, since that is OpenCV's default. However, I think the model was trained on RGB arrays, as the images were opened using PIL.Image.open. So I am wondering whether we should convert the BGR arrays to RGB before passing them to the model.
Would you mind clarifying this?
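For reference, a BGR-to-RGB conversion only reverses the channel axis. The sketch below does this with NumPy; `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` gives the same result. Whether the conversion is needed depends on how the package preprocesses its input internally, which is not confirmed here, and `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image."""
    return np.ascontiguousarray(img[..., ::-1])


# A tiny test image that is pure blue in BGR channel order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255
rgb = bgr_to_rgb(bgr)  # blue now sits in the last channel
```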
Hi,
Thanks for your impressive paper and code. I tried to reproduce the reported performance with this repo: I followed all the instructions, trained on 300W-LP using train.py without changing any parameters, then evaluated on AFLW2000 using test.py. The results are below:
me: Yaw: 3.9897, Pitch: 5.0923, Roll: 3.6405, MAE: 4.2408
yours: Yaw: 3.63 , Pitch: 4.91 Roll: 3.37 , MAE: 3.97
Are there any other tricks or changes that should be applied to reproduce your results?
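The MAE values quoted above are consistent with a plain average of the three per-axis errors, which can be checked with a few lines of arithmetic. The helper name `mae` is only for illustration.

```python
def mae(yaw: float, pitch: float, roll: float) -> float:
    """Average of the three per-axis mean absolute errors."""
    return (yaw + pitch + roll) / 3.0


mine = mae(3.9897, 5.0923, 3.6405)   # reproduced run
reported = mae(3.63, 4.91, 3.37)     # values quoted as the authors' results
gap = mine - reported                # difference in degrees
```

The gap is roughly 0.27 degrees across the three axes.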
Hi, thank you for this great work, but I've run into some trouble. I trained the model on my own data and got an MAE of 4.05 on the validation set. However, when I test the trained model combined with RetinaFace on the same images (uncropped), the MAE rises to more than 20. What could be the reason?
I get the following error when I try to run the SixDRepNet() model on an image.
  File "/Users/gurpreetmukker/Desktop/face_detection/face_detection/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Thanks
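This assertion is raised when code requests a CUDA device on a PyTorch build without CUDA support. A minimal sketch of choosing a device defensively is shown below. `pick_device` is a hypothetical helper, and whether the installed sixdrepnet version exposes a CPU option is an assumption to verify against its constructor.

```python
def pick_device(requested_gpu: int = 0) -> str:
    """Return a torch device string, falling back to CPU when CUDA is unusable."""
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        # No PyTorch at all: nothing can run on a GPU.
        has_cuda = False
    if has_cuda and requested_gpu >= 0:
        return f"cuda:{requested_gpu}"
    return "cpu"


device = pick_device(0)
```

The returned string can be passed to `torch.device(...)` or to `.to(...)` instead of calling `.cuda()` unconditionally.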
Hi, can I also obtain the 3D head position?
I used the demo.py code and tested the video from the demo. The results below look weird. I also tested other videos, and a common problem is that the results have a lot of jitter.
https://github.com/thohemp/6DRepNet/assets/39046939/740edd75-7565-4bfa-a82b-0303b7ee9bde
Are there any tricks here?
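One common post-processing option for per-frame jitter is an exponential moving average over the predicted angles. The sketch below is a generic technique and not something taken from this repository. `AngleSmoother` and its `alpha` value are assumptions, and the simple averaging is only valid while the angles stay away from the ±180° wrap-around.

```python
class AngleSmoother:
    """Exponential moving average over (yaw, pitch, roll) in degrees."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha   # weight of the newest frame, in (0, 1]
        self.state = None    # last smoothed triple, None before the first frame

    def update(self, yaw: float, pitch: float, roll: float) -> tuple:
        new = (yaw, pitch, roll)
        if self.state is None:
            self.state = new
        else:
            a = self.alpha
            self.state = tuple(a * n + (1.0 - a) * s
                               for n, s in zip(new, self.state))
        return self.state


smoother = AngleSmoother(alpha=0.5)
first = smoother.update(10.0, 0.0, 0.0)
second = smoother.update(20.0, 0.0, 0.0)
```

A smaller `alpha` gives smoother output at the cost of more lag behind fast head movements.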
My result:
MAE: 5.0
Pitch: 5.8
Yaw: 4.9
Roll: 4.4
I am not an expert on schedulers, but I tried to understand it through this one.
Why did you set up the scheduler as MultiStepLR=False?
I guess the scheduler is meaningless if we set MultiStepLR=False.
Please correct me if I have misunderstood.
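For context, `torch.optim.lr_scheduler.MultiStepLR` multiplies the learning rate by `gamma` each time the epoch count reaches a milestone. The plain-Python sketch below reproduces that rule. The base rate, milestones and gamma are illustrative assumptions and not the repository's settings.

```python
from bisect import bisect_right


def multistep_lr(base_lr: float, milestones: list, gamma: float, epoch: int) -> float:
    """Learning rate at `epoch` under a multi-step decay rule."""
    passed = bisect_right(sorted(milestones), epoch)  # milestones already reached
    return base_lr * gamma ** passed


# Halve the rate at epochs 10 and 20 (assumed values).
schedule = [multistep_lr(1e-4, [10, 20], 0.5, e) for e in (0, 9, 10, 19, 20)]
```

If no scheduler is stepped during training, the rate simply stays at `base_lr` for every epoch.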
Hello, I assume this model is trained in the camera reference system, is that correct (and I suppose it is left-handed, y-down)?
If this is the case, let's say I have a rotation matrix of a calibrated camera, let's call it Rc. Can I use Rc*Rd, where Rd will be the model's rotation matrix to register the head pose to the 3D space? Should I use Rw(Rc.T) instead of Rc for example?
Have you tried something similar? My world reference system is right handed y up btw.
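Whether Rc or its transpose is the right factor depends on the direction the calibration matrix maps: camera to world, or world to camera. The sketch below composes the two rotations under the assumption that both frames share the same handedness and that the model output expresses the head orientation in the camera frame. The matrices are made-up stand-ins, and a change of handedness would need an additional axis flip that is not shown.

```python
import numpy as np


def rot_y(deg: float) -> np.ndarray:
    """Rotation about the y-axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


R_cam_to_world = rot_y(90.0)   # hypothetical extrinsic rotation
R_head_in_cam = rot_z(30.0)    # stand-in for the model's rotation matrix

# Head orientation expressed in the world frame.
R_head_in_world = R_cam_to_world @ R_head_in_cam

# If the calibration provides world-to-camera instead, transpose it first.
R_world_to_cam = R_cam_to_world.T
R_same = R_world_to_cam.T @ R_head_in_cam
```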
Thanks for your good work.
I tried to test and train the 6DRepNet model and found some issues.
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225])
transformations = transforms.Compose([transforms.Resize(240),
                                      transforms.RandomCrop(224),
                                      transforms.ToTensor(),
                                      normalize])
Preprocess code in demo.py
img = frame[y_min:y_max,x_min:x_max]
# cv2.imshow("crop", img)
# cv2.waitKey(5)
img = cv2.resize(img, (244, 244))/255.0
img = img.transpose(2, 0, 1)
img = torch.from_numpy(img).type(torch.FloatTensor)
img = torch.Tensor(img).cuda(gpu)
The normalization and the input size are different.
When I use the demo.py code to test one image containing 9 faces, the outputs are wrong:
all 9 faces get the same yaw, roll and pitch values.
I also tested the fine-tuned models from here, and the pose values look good.
So what are the differences between the pre-trained RepVGG model and the fine-tuned models
in terms of 3D point accuracy and speed?
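To make the mismatch concrete, the sketch below applies the same scaling and mean/std normalization as the training transform quoted above to an inference crop. It assumes the crop is already resized to 224 x 224 and is in RGB channel order. `normalize_crop` is a hypothetical helper and not code from the repository.

```python
import numpy as np

# Per-channel statistics from the training transform quoted above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_crop(rgb_uint8: np.ndarray) -> np.ndarray:
    """H x W x 3 uint8 RGB image -> 3 x H x W float32 normalized array."""
    x = rgb_uint8.astype(np.float32) / 255.0   # same scaling as ToTensor
    x = (x - MEAN) / STD                       # same as transforms.Normalize
    return x.transpose(2, 0, 1)                # channels first


crop = np.full((224, 224, 3), 255, dtype=np.uint8)  # dummy white crop
chw = normalize_crop(crop)
```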
I have the following problem while running the training code. Can you help me? Thanks very much!
Traceback (most recent call last):
  File "/home/zelong/D/testdemo/6DRepNet/train.py", line 112, in <module>
    model = SixDRepNet(backbone_name='RepVGG-B1g2',
  File "/home/zelong/D/testdemo/6DRepNet/model.py", line 19, in __init__
    checkpoint = torch.load(backbone_file)
  File "/home/zelong/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 594, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/zelong/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/zelong/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'RepVGG-B1g2-train.pth'
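The traceback shows `torch.load` being given the relative path 'RepVGG-B1g2-train.pth', which is resolved against the current working directory. A small check before constructing the model can make this failure clearer. `require_file` is a hypothetical helper, sketched under the assumption that the backbone weights have to be placed on disk by the user.

```python
from pathlib import Path


def require_file(path: str) -> Path:
    """Return the path if it exists, otherwise fail with the absolute location."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"Backbone weights not found at {p.resolve()}; "
            "download them or pass the correct path."
        )
    return p


# Usage (commented out so the sketch runs without the weights present):
# backbone_file = str(require_file("RepVGG-B1g2-train.pth"))
```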
Currently, running on CPU is very slow.
I tried to check your results.
When testing your fine-tuned model (6DRepNet_300W_LP_AFLW2000.pth), I got the same results as yours on the AFLW2000 data, but when testing the same fine-tuned model on the BIWI data, I got this result:
Yaw: 3.5631, Pitch: 5.5943, Roll: 2.9061, MAE: 4.0212
This is not the same as your result:
3.24 | 4.48 | 2.68 | 3.47
What am I doing wrong?
Hi, @thohemp
and it will overwrite the local torch installation, even if that is a CUDA build and the latest version.
# Import SixDRepNet
from sixdrepnet import SixDRepNet
import cv2
# Create model
# Weights are automatically downloaded
model = SixDRepNet()
img = cv2.imread('/path/to/image.jpg')
pitch, yaw, roll = model.predict(img)
model.draw_axis(img, yaw, pitch, roll)
cv2.imshow("test_window", img)
cv2.waitKey(0)
Does this code measure the roll, pitch, and yaw of the entire image without a face detection model?
Will it be more accurate if I pass a cropped image of the face as img?
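If a face detector is used first, the detected box is usually enlarged a little before cropping so the whole head is included. The sketch below shows only the box arithmetic. The detector, the 0.2 margin and the helper name `expand_box` are assumptions for illustration.

```python
def expand_box(x_min: int, y_min: int, x_max: int, y_max: int,
               img_w: int, img_h: int, margin: float = 0.2) -> tuple:
    """Grow a face box by `margin` of its size on each side, clamped to the image."""
    dx = int(margin * (x_max - x_min))
    dy = int(margin * (y_max - y_min))
    return (max(0, x_min - dx), max(0, y_min - dy),
            min(img_w, x_max + dx), min(img_h, y_max + dy))


# Hypothetical detection on a 640 x 480 frame.
box = expand_box(100, 80, 200, 220, img_w=640, img_h=480)
# The crop would then be taken as: img[box[1]:box[3], box[0]:box[2]]
```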
Please tell me where to find 'output/snapshots/1.pth', which is referenced in test.py.
Hello, first of all, thank you very much for your great work. I have run into a problem, though. Testing with the pre-trained model you provide reproduces the results in the paper, but when I train the weights myself, the test error is about 10 times that of the original model (Yaw: 35.6810, Pitch: 42.4646, Roll: 24.0692, MAE: 34.0716). I use the same dataset as you and have not changed any other parameters. What could be the reason? Thank you.
@thohemp
Hi,
I tried to train the model from scratch, but it seems hard to reach performance comparable to training from the pre-trained model.
My question is: how was the pre-trained model trained, or how can I reach similar performance from scratch?
Thanks!
SixDRepNet2 looks like a simple ResNet. Does it work well?
I am training your model on my own data. It reached epoch 19. I would like to re-run the code to continue training, but I got this error:
Traceback (most recent call last):
  File "train.py", line 124, in <module>
    model = SixDRepNet(backbone_name,
  File "/home/redhwan/2/HPE/RosNet/sixdrepnet/model.py", line 22, in __init__
    backbone.load_state_dict(ckpt)
  File "/home/redhwan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RepVGG:
    Missing key(s) in state_dict: "stage0.se.down.weight", "stage0.se.down.bias", "stage0.se.up.weight", "stage0.se.up.bias", "stage0.rbr_reparam.weight",....
I replaced RepVGG-B1g2-train.pth with 300W_LP_epoch_19.pth after converting it, so I changed to deploy=True:
backbone_name = 'RepVGG-B1g2'
# backbone_file = 'RepVGG-D2se-200epochs-train.pth'
backbone_file = '300W_LP_epoch_19.pth'
model = SixDRepNet(backbone_name,
                   backbone_file,
                   deploy=True,
                   pretrained=True)
Could you help, please?
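A "Missing key(s)" error means the parameter names the model expects are not all present in the loaded file. Listing the difference between the two key sets is a quick way to see whether the names differ by a prefix or by layer type. The sketch below uses plain lists, and the key names in the example are illustrative rather than read from a real checkpoint. With real objects, the inputs would be `model.state_dict().keys()` and the keys of the loaded dictionary.

```python
def diff_keys(model_keys, ckpt_keys) -> tuple:
    """Return (missing, unexpected) key lists, mirroring load_state_dict's report."""
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    missing = sorted(model_set - ckpt_set)      # expected by the model, absent in file
    unexpected = sorted(ckpt_set - model_set)   # present in file, unknown to the model
    return missing, unexpected


# Illustrative key names only.
model_keys = ["stage0.rbr_reparam.weight", "stage0.rbr_reparam.bias"]
ckpt_keys = ["backbone.stage0.rbr_dense.conv.weight", "stage0.rbr_reparam.bias"]
missing, unexpected = diff_keys(model_keys, ckpt_keys)
```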
Hi,
Thanks for this amazing work. I am really interested in your work. I just want to test your network on the 2 test dataset (AFLW2000 and BIWI).
I am wondering why you provide two .pth files (6DRepNet_300W_LP_AFLW2000.pth and 6DRepNet_300W_LP_BIWI.pth), one for each test dataset. Shouldn't we test the network with a single pretrained model on both test datasets?
I am looking forward to your response,
Thanks
I see that to construct rotation matrix(R) from yaw, pitch and roll angle values, you use zyx order i.e Rz * Ry * Rx,
where Rz is rotation about z-axis, Ry is rotation about the y-axis, and Rx is rotation about the x-axis.
But for visualisation, it looks like the order you use is xyz i.e Rx * Ry * Rz and then use column vectors of this resulting matrix as axis coordinates (https://github.com/thohemp/6DRepNet/blob/master/utils.py#L54). May I know why this is done? Am I missing something?
Thanks.
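As a general point about the two orderings, Rz * Ry * Rx and Rx * Ry * Rz differ for generic angles, but the transpose (inverse) of the first equals the second evaluated at the negated angles. The NumPy sketch below checks both statements numerically with arbitrary angles. It makes no claim about which convention the repository's utils.py intends.

```python
import numpy as np


def rx(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def ry(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rz(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


ax, ay, az = np.radians([10.0, 20.0, 30.0])   # arbitrary test angles

R_zyx = rz(az) @ ry(ay) @ rx(ax)
R_xyz = rx(ax) @ ry(ay) @ rz(az)

# (Rz Ry Rx)^T = Rx^T Ry^T Rz^T = Rx(-ax) Ry(-ay) Rz(-az)
R_inverse = rx(-ax) @ ry(-ay) @ rz(-az)
```

So an x-y-z product with flipped signs is the inverse of the z-y-x rotation, which is one way the two orderings can appear side by side in the same codebase.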
Hello there!!
Thank you very much for your great work. I am really interested in your work and would like to implement it on images with full rotation appearance.
I have two questions regarding this.
1.) Is your pre-trained model trained on full-rotation-appearance datasets (-180, 180) and capable of predicting head poses on images in which faces cannot be seen?
2.) If the answer to my first question is NO, could you please guide me on which datasets I should use for finetuning the pre-trained model to learn full orientation appearance?
Thank you very much in advance for your consideration
pip3 install sixdrepnet #Works!
import sixdrepnet
I get the following error (I am currently running it on colab)
/usr/local/lib/python3.8/dist-packages/sixdrepnet/regressor.py in <module>
8 import numpy as np
9
---> 10 from model import SixDRepNet
11 import utils
12
ModuleNotFoundError: No module named 'model'
I want to use libtorch for inference. It seems that the '.pt' format is required for C++; how can I convert '.pth' to '.pt'?
Thanks for the work! Is the pretrained model only for face pictures? If so, is there any other pretrained model for other objects, like boxes, bottles, shoes, etc.?
When I install the package: pip install SixDRepNet, it returns
Collecting SixDRepNet
Downloading SixDRepNet-0.1.1.tar.gz (23 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 36, in
File "", line 34, in
File "/private/var/folders/5l/9fsdwp_n1td91scw10n67zwc0000gp/T/pip-install-8xwcmbp1/sixdrepnet_2b06c43f46c5428d9c99677633d23b6e/setup.py", line 23, in
long_description="".join(open("README.MD", "r").readlines()),
FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Excellent work! Does the pip installation package support Linux aarch64?
There is a CUDA error when changing the GPU to another id (anything except 0):
Traceback (most recent call last):
  File "demo.py", line 136, in <module>
    R_pred = model(img)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/model.py", line 48, in forward
    return utils.compute_rotation_matrix_from_ortho6d(x)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 146, in compute_rotation_matrix_from_ortho6d
    x = normalize_vector(x_raw, use_gpu) #batch*3
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 119, in normalize_vector
    v_mag = torch.max(v_mag, torch.autograd.Variable(torch.FloatTensor([1e-8]).cuda()))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!
@thohemp hello,
python convert.py 6DRepNet_300W_LP_BIWI.pth 6DRepNet_300W_LP_BIWI_depoly.pth
errors:
File "convert.py", line 35, in convert
load_filtered_state_dict(model, saved_state_dict['model_state_dict'])
KeyError: 'model_state_dict'
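The KeyError indicates that the loaded file has no 'model_state_dict' entry. One plausible cause, stated here as an assumption, is that the file stores the weights directly rather than wrapped in a training checkpoint. A tolerant lookup handles both layouts. The helper name and the sample keys below are hypothetical.

```python
def extract_state_dict(checkpoint: dict) -> dict:
    """Return the weights whether or not they are wrapped under 'model_state_dict'."""
    return checkpoint.get("model_state_dict", checkpoint)


# Two illustrative layouts with made-up parameter names.
wrapped = {"model_state_dict": {"linear_reg.weight": [0.0]}, "epoch": 19}
bare = {"linear_reg.weight": [0.0]}

from_wrapped = extract_state_dict(wrapped)
from_bare = extract_state_dict(bare)
```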
Hi.
Thanks for this interesting and wonderful piece of work!
I have a question about licensing, as the title says...
What would be the licence for Fine-tuned models?
Is it MIT like the codes, or is it different?
I want to use it as part of a work study, but I am not skilled in machine learning and would like to use the model as is!
I don't intend to publish, redistribute or incorporate them into products, but even if it is for research purposes, under my work rules it still counts as commercial use.
So, I would like to ask you for more information about the licence for the models.
I look forward to your response.
Thank you.