thohemp / 6drepnet

Official Pytorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.

License: MIT License

Python 99.75% Shell 0.25%

head pose estimation facial analysis biwi aflw2000 orientation 6d head-pose-estimation

6drepnet's Issues

module 'utils' has no attribute 'compute_rotation_matrix_from_ortho6d'

BIWI Dataset

Hey,
I cannot reproduce your results on the BIWI dataset.
I am comparing the X, Y and Z angles obtained from the ground-truth rotation matrix (transformed by the extrinsic calibration) with -pitch, yaw and roll respectively.
I am using your pip package. I crop the face with the RetinaFace detector, as you do in demo.py, and pass it to model.predict(). I instantiate the model without any parameters, so the path to the weights is the default.
I have spotted one difference. In the README you write:
The BIWI datasets needs be preprocessed by a face detector to cut out the faces from the images. You can use the script provided [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi.py). For 7:3 splitting of the BIWI dataset you can use the equivalent script [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi_70_30.py). We set the cropped image size to 256.
However, in model.predict() the crop is resized to 244 (which I believe is the longer edge of the picture; the shorter edge is then scaled by the same ratio). Is this intended?

I cannot find any other differences, but the mean error is about 25 on X and Y and about 8 on Z.
Can you help me figure it out?

Best,
Jan

RGB inputs or BGR inputs for model.predict(img)?

First, thanks for this great work and for sharing your code.

In the running example,

img = cv2.imread('/path/to/image.jpg')
pitch, yaw, roll = model.predict(img)

the image is loaded as a BGR NumPy array, which is OpenCV's default mode. However, I think the model was trained on RGB arrays, since the images were opened with PIL.Image.open. I am therefore wondering whether we should convert the BGR arrays to RGB before passing them to the model.

Would you mind clarifying this?
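Whether model.predict expects BGR or RGB input is not settled in this thread. If RGB turns out to be required, the channel order of an OpenCV-style array can be reversed with plain NumPy slicing. The sketch below uses a synthetic array in place of the cv2.imread output, and `bgr_to_rgb` is a hypothetical helper name, not part of the package.

```python
import numpy as np


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reverse the channel (last) axis of an H x W x 3 image."""
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # Copy into a contiguous array so downstream code never sees a
    # negative-stride view.
    return np.ascontiguousarray(img[:, :, ::-1])


# Synthetic 2x2 "BGR" image that is pure blue (B=255, G=0, R=0).
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 0] = 255

rgb = bgr_to_rgb(bgr)  # blue now sits in the last channel
```

With OpenCV itself, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs the same conversion.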

gap with results in papers

Hi,
Thanks for your impressive paper and code. I tried to reproduce the reported performance with this repo. I followed all the instructions, trained on 300W-LP using train.py without changing any parameters, and then evaluated on AFLW2000 using test.py. The results are below:

me: Yaw: 3.9897, Pitch: 5.0923, Roll: 3.6405, MAE: 4.2408
yours: Yaw: 3.63 , Pitch: 4.91 Roll: 3.37 , MAE: 3.97

Are there any other tricks or changes that should be applied to reproduce your results?
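The MAE figures quoted in this issue are consistent with a plain average of the three per-axis errors, which is easy to check. The helper name below is hypothetical; the numbers are the ones reported above.

```python
def mean_of_axes(yaw: float, pitch: float, roll: float) -> float:
    """Average the three per-axis mean absolute errors (in degrees)."""
    return (yaw + pitch + roll) / 3.0


reproduced = mean_of_axes(3.9897, 5.0923, 3.6405)  # values from the retrained model
reported = mean_of_axes(3.63, 4.91, 3.37)          # values quoted from the authors
gap = reproduced - reported

print(f"reproduced={reproduced:.4f} reported={reported:.2f} gap={gap:.2f}")
```

The gap is roughly a quarter of a degree overall, with roll contributing slightly more than yaw and pitch.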

High MAE when test with face detector

Hi, thank you for this great work, but I have run into some trouble. I trained the model on my own data and got an MAE of 4.05 on the validation set. However, when I test the trained model combined with RetinaFace on the same images (uncropped), the MAE rises to more than 20. What is the possible reason?

Is there a way to run this model on Apple M1 that doesn't have CUDA support?

I get the following error when I try to run the SixDRepNet() model on an image.

  File "/Users/gurpreetmukker/Desktop/face_detection/face_detection/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Thanks
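The assertion above is raised when code asks for a CUDA device on a PyTorch build without CUDA. A common pattern is to pick the device at runtime instead of hard-coding it. The sketch below keeps the decision logic in plain Python so it runs anywhere; `pick_device` is a hypothetical helper, and whether the pip package exposes a device option is not stated in this thread.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a PyTorch device string: CUDA first, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# On an Apple M1 machine without CUDA the two flags would typically be:
device = pick_device(cuda_available=False, mps_available=True)
```

In PyTorch the two flags come from torch.cuda.is_available() and torch.backends.mps.is_available(), and the resulting string can be passed to torch.device(...) or tensor.to(...).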

3d head position

Hi, could I obtain the 3D head position?

Could not achieve same results in demo

I used the demo.py code to test the video from the demo. The results below look weird. I also tested other videos, and a common problem is that the results have a lot of jitter.
https://github.com/thohemp/6DRepNet/assets/39046939/740edd75-7565-4bfa-a82b-0303b7ee9bde

I trained myself, but I can't achieve the effect of the paper

Are there any tricks here?

My result:

MAE: 5.0
Pitch: 5.8
Yaw: 4.9
Roll: 4.4

Why did you set up the scheduler as MultiStepLR=False?

I am not an expert on schedulers, but I tried to understand them through this one.

Why did you set up the scheduler as MultiStepLR=False?

I guess the scheduler has no effect if we set MultiStepLR=False.

Please explain if I have misunderstood.
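For reference, PyTorch's torch.optim.lr_scheduler.MultiStepLR multiplies the learning rate by gamma each time the epoch count reaches one of the milestones. The pure-Python sketch below reproduces that rule so the effect is visible without training anything. The base rate, milestones and gamma are illustrative values, not the ones used in train.py.

```python
from bisect import bisect_right


def multistep_lr(base_lr: float, milestones, gamma: float, epoch: int) -> float:
    """Learning rate at `epoch` under a MultiStepLR-style schedule."""
    # Count how many milestones have been reached at this epoch.
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed


# Illustrative settings: halve the rate at epochs 10 and 20.
schedule = [multistep_lr(1e-4, [10, 20], 0.5, e) for e in (0, 9, 10, 19, 20)]
```

If no scheduler is attached to the optimizer, the learning rate simply stays at the base value for the whole run.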

Reference System

Hello, I assume this model is trained in the camera reference system. Is that correct (and I suppose it is left-handed, y-down)?
If so, say I have the rotation matrix of a calibrated camera, Rc. Can I use Rc*Rd, where Rd is the model's rotation matrix, to register the head pose in 3D space? Or should I use Rw(Rc.T) instead of Rc, for example?

Have you tried something similar? My world reference system is right-handed, y-up, by the way.
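One way to reason about this question is to write the two operations down explicitly. The sketch below assumes that Rc maps camera coordinates to world coordinates and that Rd is the head orientation expressed in the camera frame; neither convention is confirmed by the thread. It also shows how a rotation matrix is re-expressed when one axis is mirrored (for example y-down to y-up). All function names and angles are hypothetical.

```python
import numpy as np


def rot_x(deg: float) -> np.ndarray:
    """Rotation about the x-axis by `deg` degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(deg: float) -> np.ndarray:
    """Rotation about the y-axis by `deg` degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def head_in_world(Rc: np.ndarray, Rd: np.ndarray) -> np.ndarray:
    """Compose camera-to-world with head-in-camera (assumed convention)."""
    return Rc @ Rd


def flip_y_convention(R: np.ndarray) -> np.ndarray:
    """Re-express R after mirroring the y-axis (y-down <-> y-up)."""
    F = np.diag([1.0, -1.0, 1.0])  # reflection; F is its own inverse
    return F @ R @ F


Rc = rot_y(30.0)  # hypothetical camera orientation
Rd = rot_x(20.0)  # hypothetical model output
Rw = head_in_world(Rc, Rd)
Rw_y_up = flip_y_convention(Rw)
```

The mirrored result is still a proper rotation (orthonormal, determinant +1), but rotations about the x- and z-axes change sign, which is the usual source of sign confusion when mixing conventions.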

Preprocess part in train.py code and demo.py are different

Thanks for your good work.

I tried to test and train the 6DRepNet model and found some issues.

Preprocess code in train.py

    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])

    transformations = transforms.Compose([transforms.Resize(240),
                                          transforms.RandomCrop(224),
                                          transforms.ToTensor(),
                                          normalize])

Preprocess code in demo.py

    img = frame[y_min:y_max,x_min:x_max]
    # cv2.imshow("crop", img)
    # cv2.waitKey(5)
    img = cv2.resize(img, (244, 244))/255.0
    img = img.transpose(2, 0, 1)
    img = torch.from_numpy(img).type(torch.FloatTensor)
    img = torch.Tensor(img).cuda(gpu)

The normalization and the input size are different.
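To make demo-time preprocessing agree with the training transform quoted above, the crop would need the same mean/std normalization. The NumPy sketch below applies the mean and std values from the train.py snippet to an H x W x 3 uint8 RGB crop. Resizing is left out, and `normalize_like_training` is a hypothetical helper rather than code from the repository.

```python
import numpy as np

# Per-channel statistics copied from the train.py snippet above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_like_training(rgb_uint8: np.ndarray) -> np.ndarray:
    """Scale to [0, 1], apply per-channel mean/std, return C x H x W float32."""
    x = rgb_uint8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD         # broadcasts over the channel (last) axis
    return x.transpose(2, 0, 1)  # H x W x C -> C x H x W


crop = np.full((224, 224, 3), 255, dtype=np.uint8)  # synthetic all-white crop
chw = normalize_like_training(crop)
```

The statistics are listed in RGB order, so a crop taken from an OpenCV frame would have to be converted from BGR first.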

I downloaded the pre-trained RepVGG model 'RepVGG-A0-train.pth' from here.

I used the demo.py code to test one image containing 9 faces, and the output is wrong:
all 9 faces have the same yaw, roll and pitch values.

I also tested the fine-tuned models from here, and the pose values look good.

So what is the difference between the pre-trained RepVGG model and the fine-tuned models?

What is the difference between the different pretrained models?

If I want to train my own model on other objects, which one should I choose?

The output below includes pitch, roll and yaw. Does the result contain the remaining three degrees of freedom?

May I ask how the performance compares with 3DDFA_V2?

Specifically, in terms of 3D point accuracy and speed?

problem while running the training code

I have the following problem while running the training code. Can you help me? Thanks very much!
Traceback (most recent call last):
  File "/home/zelong/D/testdemo/6DRepNet/train.py", line 112, in <module>
    model = SixDRepNet(backbone_name='RepVGG-B1g2',
  File "/home/zelong/D/testdemo/6DRepNet/model.py", line 19, in __init__
    checkpoint = torch.load(backbone_file)
  File "/home/zelong/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 594, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/zelong/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/zelong/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'RepVGG-B1g2-train.pth'
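The traceback shows torch.load being called with the relative path 'RepVGG-B1g2-train.pth', which is resolved against the current working directory. A small check before loading makes the searched locations explicit. This is a minimal sketch; `resolve_weights` and the `search_dirs` parameter are hypothetical and not part of the repository.

```python
import tempfile
from pathlib import Path


def resolve_weights(name: str, search_dirs=(".",)) -> Path:
    """Return the first existing path for `name`, or raise listing the places tried."""
    tried = [Path(d).expanduser() / name for d in search_dirs]
    for candidate in tried:
        if candidate.is_file():
            return candidate.resolve()
    looked_in = ", ".join(str(p.parent.resolve()) for p in tried)
    raise FileNotFoundError(f"{name!r} not found; looked in: {looked_in}")


# Demonstration with an empty stand-in file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "RepVGG-B1g2-train.pth").write_bytes(b"")
    found = resolve_weights("RepVGG-B1g2-train.pth", search_dirs=[tmp])
    found_name = found.name
```

The returned absolute path can then be handed to the loading call, so the result no longer depends on the directory the script was started from.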

Is it possible to provide a smaller version?

Currently, running on CPU is very slow.

Getting different results using your code and your fine_tuned model for BIWI.

I tried to check your results.

When testing your fine-tuned model (6DRepNet_300W_LP_AFLW2000.pth), I got the same results as you on the AFLW2000 data. But when testing the same fine-tuned model on the BIWI data, I got this result:

Yaw: 3.5631, Pitch: 5.5943, Roll: 2.9061, MAE: 4.0212

This is not the same as your result:
3.24 | 4.48 | 2.68 | 3.47
What am I doing wrong?

Why does Equation 1 in the paper hold?

Hi, @thohemp

The quick start version installs torch-cpu automatically

It overwrites the local torch installation, even if that is the latest CUDA build.

Where is the face detection model in test.py?

# Import SixDRepNet
from sixdrepnet import SixDRepNet
import cv2

# Create model
# Weights are automatically downloaded
model = SixDRepNet()

img = cv2.imread('/path/to/image.jpg')

pitch, yaw, roll = model.predict(img)

model.draw_axis(img, yaw, pitch, roll)

cv2.imshow("test_window", img)
cv2.waitKey(0)

Does this code measure the roll, pitch, and yaw of the entire image without a face detection model?

Will it be more accurate if I pass a cropped image of the face as img?
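The demo.py excerpt quoted elsewhere on this page crops the detected face box out of the frame before running the network. A minimal sketch of such a crop is shown below, with the box enlarged by a margin and clamped to the image. The box would normally come from a face detector; here it is hard-coded, and both `crop_face` and the 20% margin are assumptions for illustration.

```python
import numpy as np


def crop_face(frame: np.ndarray, box, margin: float = 0.2) -> np.ndarray:
    """Crop (x_min, y_min, x_max, y_max) from frame, enlarged by `margin` and clamped."""
    x_min, y_min, x_max, y_max = box
    dx = int(margin * (x_max - x_min))
    dy = int(margin * (y_max - y_min))
    h, w = frame.shape[:2]
    x0, y0 = max(0, x_min - dx), max(0, y_min - dy)
    x1, y1 = min(w, x_max + dx), min(h, y_max + dy)
    return frame[y0:y1, x0:x1]


frame = np.zeros((480, 640, 3), dtype=np.uint8)  # synthetic 640x480 frame
face = crop_face(frame, box=(600, 10, 640, 60))  # box touching the image border
```

The resulting crop, rather than the full frame, is what would be passed on for pose prediction.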

Euler angle visualization

The demo file runs the camera. How can I run inference on videos and images?

I want results like the ones shown in your paper.

Thanks

Pretrained weight cannot be downloaded

Hi there,

I am experimenting with the SixDRepNet_Detector and I am experiencing the issue that the model pretrained weight cannot be downloaded

here is the error message on Google Colab:

Thank you!

Please tell me where to find 'output/snapshots/1.pth' referenced in test.py

Using your own trained weight files, the test results lose a lot

Hello, first of all, thank you very much for your great work. I have run into a problem. When I test with the pretrained model you provide, I can reproduce the results in the paper. However, when I train the weights myself, the test error is about 10 times that of the original model (Yaw: 35.6810, Pitch: 42.4646, Roll: 24.0692, MAE: 34.0716). I use the same dataset as you and have not changed any other parameters. What could be the reason? Thank you.
@thohemp

How to train pre-train model

Hi,
I tried to train the model from scratch, but it seems hard to reach performance comparable to training from the pre-trained model.
My question is: how was the pre-trained model trained, or how can I reach similar performance from scratch?

Thanks!

Does SixDRepNet2 work?

SixDRepNet2 looks like a simple ResNet. Does it work well?

Finetuning the model

I am training your model on my own data. It reached epoch 19. I would like to re-run the code to continue training, but I got this error:

Traceback (most recent call last):
  File "train.py", line 124, in <module>
    model = SixDRepNet(backbone_name,
  File "/home/redhwan/2/HPE/RosNet/sixdrepnet/model.py", line 22, in __init__
    backbone.load_state_dict(ckpt)
  File "/home/redhwan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RepVGG:
	Missing key(s) in state_dict: "stage0.se.down.weight", "stage0.se.down.bias", "stage0.se.up.weight", "stage0.se.up.bias", "stage0.rbr_reparam.weight",....

I replaced RepVGG-B1g2-train.pth with 300W_LP_epoch_19.pth after converting it, so I changed to deploy=True:

backbone_name = 'RepVGG-B1g2'
# backbone_file = 'RepVGG-D2se-200epochs-train.pth'
backbone_file = '300W_LP_epoch_19.pth'
model = SixDRepNet(backbone_name,
                   backbone_file,
                   deploy=True,
                   pretrained=True)

Your help, please.

pre-trained models

Hi,
Thanks for this amazing work. I am really interested in it. I just want to test your network on the two test datasets (AFLW2000 and BIWI).
I am wondering why you provide two .pth files (6DRepNet_300W_LP_AFLW2000.pth and 6DRepNet_300W_LP_BIWI.pth), one for each test dataset. Shouldn't we test the network with a single pretrained model on both test datasets?

I am looking forward to your response,
Thanks

Query regarding face pose axis visualisation

I see that to construct rotation matrix(R) from yaw, pitch and roll angle values, you use zyx order i.e Rz * Ry * Rx,
where Rz is rotation about z-axis, Ry is rotation about the y-axis, and Rx is rotation about the x-axis.

But for visualisation, it looks like the order you use is xyz i.e Rx * Ry * Rz and then use column vectors of this resulting matrix as axis coordinates (https://github.com/thohemp/6DRepNet/blob/master/utils.py#L54). May I know why this is done? Am I missing something?

Thanks.
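The two multiplication orders in this question are related by a transpose: (Rz Ry Rx)^T = Rx^T Ry^T Rz^T, and the transpose of a rotation by an angle is the rotation by the negated angle. The NumPy check below illustrates that relation with arbitrary angles and standard right-handed elementary rotations. It does not reproduce utils.py and does not by itself establish why the repository uses that order.

```python
import numpy as np


def Rx(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def Ry(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def Rz(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


x, y, z = np.deg2rad([10.0, 20.0, 30.0])  # arbitrary test angles

zyx = Rz(z) @ Ry(y) @ Rx(x)
xyz = Rx(x) @ Ry(y) @ Rz(z)
xyz_negated = Rx(-x) @ Ry(-y) @ Rz(-z)

orders_differ = not np.allclose(zyx, xyz)               # the two orders are not equal
transpose_matches = np.allclose(zyx.T, xyz_negated)     # but they are linked by a transpose
```

So an xyz product with negated angles is the inverse (transpose) of the zyx product, which is one common reason the two orders appear side by side in pose code.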

Questions regarding Learning full rotation appearance

Hello there!!

Thank you very much for your great work. I am really interested in your work and would like to implement it on images with full rotation appearance.

I have two questions regarding this.
1.) Is your pre-trained model trained on full-rotation-appearance datasets (-180, 180) and capable of predicting head poses on images in which faces cannot be seen?
2.) If the answer to my first question is NO, could you please guide me on which datasets I should use for finetuning the pre-trained model to learn full orientation appearance?

Thank you very much in advance for your consideration

Import error with pip package

pip3 install sixdrepnet   #Works!

import sixdrepnet

I get the following error (I am currently running it on colab)

/usr/local/lib/python3.8/dist-packages/sixdrepnet/regressor.py in <module>
      8 import numpy as np
      9 
---> 10 from model import SixDRepNet
     11 import utils
     12 

ModuleNotFoundError: No module named 'model'

Model convert

I want to use libtorch for inference. It seems that the '.pt' format is required for C++. How do I convert '.pth' to '.pt'?

is the pretrained model for only faces?

Thanks for the work! Is the pretrained model only for face pictures? If so, is there any other pretrained model for other objects, like a box, bottle, shoe, etc.?

Pip install failed

When I install the package: pip install SixDRepNet, it returns
Collecting SixDRepNet
Downloading SixDRepNet-0.1.1.tar.gz (23 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 36, in
File "", line 34, in
File "/private/var/folders/5l/9fsdwp_n1td91scw10n67zwc0000gp/T/pip-install-8xwcmbp1/sixdrepnet_2b06c43f46c5428d9c99677633d23b6e/setup.py", line 23, in
long_description="".join(open("README.MD", "r").readlines()),
FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Does the pip installation package support Linux aarch64

Excellent work! Does the pip installation package support Linux aarch64?

Can't change GPU id in demo.py

There is a CUDA error when changing the GPU to any id other than 0.

Traceback (most recent call last):
  File "demo.py", line 136, in <module>
    R_pred = model(img)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/model.py", line 48, in forward
    return utils.compute_rotation_matrix_from_ortho6d(x)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 146, in compute_rotation_matrix_from_ortho6d
    x = normalize_vector(x_raw, use_gpu) #batch*3
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 119, in normalize_vector
    v_mag = torch.max(v_mag, torch.autograd.Variable(torch.FloatTensor([1e-8]).cuda()))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!

File "convert.py", line 35, in convert load_filtered_state_dict(model, saved_state_dict['model_state_dict']) KeyError: 'model_state_dict'

@thohemp hello,
python convert.py 6DRepNet_300W_LP_BIWI.pth 6DRepNet_300W_LP_BIWI_depoly.pth
errors:
File "convert.py", line 35, in convert
load_filtered_state_dict(model, saved_state_dict['model_state_dict'])
KeyError: 'model_state_dict'
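The KeyError suggests that the loaded file is not a dictionary wrapped under a 'model_state_dict' key; it may already be a bare state dict. A tolerant loader can accept both layouts. In the sketch below plain dictionaries stand in for the objects a checkpoint loader would return, the key names inside them are made up, and `extract_state_dict` is a hypothetical helper rather than part of convert.py.

```python
def extract_state_dict(checkpoint: dict, key: str = "model_state_dict") -> dict:
    """Return checkpoint[key] if the wrapper key is present, else the checkpoint itself."""
    if key in checkpoint:
        return checkpoint[key]
    return checkpoint


# Two stand-in layouts: a training snapshot with metadata, and a bare state dict.
wrapped = {"model_state_dict": {"layer.weight": [0.0]}, "epoch": 19}
bare = {"layer.weight": [0.0]}

same = extract_state_dict(wrapped) == extract_state_dict(bare)
```

Printing the top-level keys of the loaded object is a quick way to see which of the two layouts a given file uses.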

Licence for Fine-tuned models

Hi.
Thanks for this interesting and wonderful piece of work!

I have a question about licensing, as the title says...
What would be the licence for Fine-tuned models?
Is it MIT like the codes, or is it different?

I want to use it as part of a work study, but I am not skilled in machine learning and would like to use the model as is!

I don't intend to publish, redistribute or incorporate them into products, but even if it is for research purposes, under my work rules, it is still a commercial use.
So, I would like to ask you for more information about the licence for the models.

I look forward to your response.
Thank you.