qitaozhao / poseformerv2

The project is an official implementation of our paper "PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation".

License: MIT License

Python 94.50% MATLAB 5.50%
human-pose-estimation pytorch

poseformerv2's People

Contributors: qitaozhao

poseformerv2's Issues

About camera intrinsics and inference

Thank you for sharing the code.

May I ask whether the camera intrinsics are used when preprocessing the data (assuming we have access to 2D ground truth in pixel space)? As far as I can tell, even though they are returned via "out_camera_params", they are not used anywhere in the scripts, so I wanted to ask whether there is any other place where they might be needed.

I also wanted to ask which 2D pose estimator you recommend for running inference with your code, and whether there is a plan to provide an inference script in the future.

Thanks in advance for your time.

Wrong SOTA performance

Hi! Thanks for your work! I found that the MPJPE reported for MixSTE in your paper is probably wrong. Shouldn't it be 40.9, which is better than yours?

Performance issue?

Thank you very much for sharing!

As mentioned in the paper, PoseFormerV2 is reported to be roughly four times more efficient than MHFormer. I tried processing the same video with both MHFormer and PoseFormerV2 using the demo scripts provided here:

MHFormer: https://github.com/Vegetebird/MHFormer/blob/main/demo/vis.py
PoseFormerV2: https://github.com/QitaoZhao/PoseFormerV2/blob/main/demo/vis.py

  • For MHFormer, I used the default source code without any additional modifications.

  • For PoseFormerV2, I used the 9_81_46.0.bin model.

  • Both were run on an RTX 3090.

  • Results:

      • MHFormer: ~6.7 it/s

      • PoseFormerV2: ~2.7 it/s

  • Results when I commented out the drawing and file-saving parts in the get_pose3D() function:

      • MHFormer: ~54.00 it/s

      • PoseFormerV2: ~30.63 it/s

Could you please explain why there is such a difference? Have I done something wrong?
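
For reference, one way to make the comparison fair is to time only the 3D-lifting forward pass, excluding the 2D detector, drawing, and file I/O. The sketch below is a hypothetical benchmark helper; the input shape (1, frames, 17, 2) is an assumption about what the loaded checkpoint expects, not code from either repository.

    import time
    import torch

    def benchmark_forward(model, frames=243, joints=17, iters=100, warmup=10):
        """Time only the lifting model's forward pass on random 2D input.

        The input shape (1, frames, joints, 2) is an assumption; adjust it to
        whatever the loaded checkpoint actually expects.
        """
        model = model.cuda().eval()
        x = torch.randn(1, frames, joints, 2, device="cuda")
        with torch.no_grad():
            for _ in range(warmup):          # warm-up iterations (CUDA init, caching)
                model(x)
            torch.cuda.synchronize()
            start = time.time()
            for _ in range(iters):
                model(x)
            torch.cuda.synchronize()         # wait for all kernels before stopping the clock
        return (time.time() - start) / iters * 1000.0  # milliseconds per forward pass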

AttributeError: 'NoneType' object has no attribute 'astype'

Thank you very much for your work. When I run this command:

    python run_poseformer.py -d h36m -k gt -c checkpoint -g 0 --evaluate 27_243_45.2.bin --render --viz-subject S11 --viz-action Walking --viz-camera 0 --viz-export output3d

I encounter the following issue:

Traceback (most recent call last):
  File "/home/dvlab/PoseFormerV2-main/run_poseformer.py", line 574, in <module>
    prediction = evaluate(gen, return_predictions=True)
  File "/home/dvlab/PoseFormerV2-main/run_poseformer.py", line 481, in evaluate
    cam = torch.from_numpy(cam.astype('float32'))
AttributeError: 'NoneType' object has no attribute 'astype'

I have already modified the code to check whether cam actually has a value:

test_generator = UnchunkedGenerator(cameras_valid, poses_valid, poses_valid_2d,
                                    pad=pad, causal_shift=causal_shift, augment=False,
                                    kps_left=kps_left, kps_right=kps_right,
                                    joints_left=joints_left, joints_right=joints_right)


class UnchunkedGenerator:
    def __init__(self, cameras, poses_3d, poses_2d, pad=0, causal_shift=0,
                 augment=False, kps_left=None, kps_right=None,
                 joints_left=None, joints_right=None):
        assert poses_3d is None or len(poses_3d) == len(poses_2d)
        assert cameras is None or len(cameras) == len(poses_2d)
        print("cameras 3 :", cameras)

Its value is a list:


cameras 3 : [array([ 2.29009891e+00, 2.28756237e+00, 2.50830650e-02, 2.89029814e-02,
-2.07098916e-01, 2.47775182e-01, -3.07515031e-03, -9.75698873e-04,
-1.42447161e-03, 4.36662363e-01, 0.00000000e+00, -1.09528303e-02,
0.00000000e+00, 4.37146551e-01, -1.26348389e-02, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29935122e+00, 2.29518342e+00, 1.76972151e-02, 1.61298513e-02,
-1.94213629e-01, 2.40408540e-01, 6.81997556e-03, -1.61902665e-03,
-2.74089444e-03, 4.34905287e-01, 0.00000000e+00, -7.69661227e-03,
0.00000000e+00, 4.35695026e-01, -7.02769589e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29828143e+00, 2.29759789e+00, 3.96317244e-02, 2.80535221e-03,
-2.08338186e-01, 2.55488008e-01, -2.46049743e-03, 1.48438697e-03,
-7.59999326e-04, 4.35107723e-01, 0.00000000e+00, -1.72440689e-02,
0.00000000e+00, 4.35237169e-01, -1.22099358e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29102278e+00, 2.28954792e+00, 2.99364328e-02, 1.76403334e-03,
-1.98384091e-01, 2.18323678e-01, -8.94780736e-03, -5.87205577e-04,
-1.81336200e-03, 4.36486276e-01, 0.00000000e+00, -1.30668422e-02,
0.00000000e+00, 4.36767447e-01, -7.70472339e-04, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29009891e+00, 2.28756237e+00, 2.50830650e-02, 2.89029814e-02,
-2.07098916e-01, 2.47775182e-01, -3.07515031e-03, -9.75698873e-04,
-1.42447161e-03, 4.36662363e-01, 0.00000000e+00, -1.09528303e-02,
0.00000000e+00, 4.37146551e-01, -1.26348389e-02, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29935122e+00, 2.29518342e+00, 1.76972151e-02, 1.61298513e-02,
-1.94213629e-01, 2.40408540e-01, 6.81997556e-03, -1.61902665e-03,
-2.74089444e-03, 4.34905287e-01, 0.00000000e+00, -7.69661227e-03,
0.00000000e+00, 4.35695026e-01, -7.02769589e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29828143e+00, 2.29759789e+00, 3.96317244e-02, 2.80535221e-03,
-2.08338186e-01, 2.55488008e-01, -2.46049743e-03, 1.48438697e-03,
-7.59999326e-04, 4.35107723e-01, 0.00000000e+00, -1.72440689e-02,
0.00000000e+00, 4.35237169e-01, -1.22099358e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29102278e+00, 2.28954792e+00, 2.99364328e-02, 1.76403334e-03,
-1.98384091e-01, 2.18323678e-01, -8.94780736e-03, -5.87205577e-04,
-1.81336200e-03, 4.36486276e-01, 0.00000000e+00, -1.30668422e-02,
0.00000000e+00, 4.36767447e-01, -7.70472339e-04, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29009891e+00, 2.28756237e+00, 2.50830650e-02, 2.89029814e-02,
-2.07098916e-01, 2.47775182e-01, -3.07515031e-03, -9.75698873e-04,
-1.42447161e-03, 4.36662363e-01, 0.00000000e+00, -1.09528303e-02,
0.00000000e+00, 4.37146551e-01, -1.26348389e-02, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29935122e+00, 2.29518342e+00, 1.76972151e-02, 1.61298513e-02,
-1.94213629e-01, 2.40408540e-01, 6.81997556e-03, -1.61902665e-03,
-2.74089444e-03, 4.34905287e-01, 0.00000000e+00, -7.69661227e-03,
0.00000000e+00, 4.35695026e-01, -7.02769589e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00])

I have no idea where the bug is.
Thanks for your help

About MPI-INF-3DHP

Thanks for your awesome work. I would also like to use the MPI-INF-3DHP dataset in my experiments.
How did you handle MPI-INF-3DHP?
This problem has been bothering me for several days.

I would be very glad if you could help me.

Potential memory leak during plotting in demo

Hello,

I've noticed that in vis.py, after generating plots with matplotlib, there is no call to plt.clf() or plt.close(fig).

These calls may be needed to release the memory associated with each figure, especially when creating plots for long videos; see the sketch below.
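
For reference, a minimal sketch of the kind of per-frame plotting loop where this matters (the loop and variable names are hypothetical, not taken from vis.py):

    import os
    import matplotlib.pyplot as plt

    os.makedirs("output", exist_ok=True)
    for i, frame_3d in enumerate(predictions_3d):      # predictions_3d: (frames, joints, 3), assumed
        fig = plt.figure()
        ax = fig.add_subplot(111, projection="3d")
        ax.scatter(frame_3d[:, 0], frame_3d[:, 1], frame_3d[:, 2])
        fig.savefig(f"output/frame_{i:04d}.png")
        plt.close(fig)                                 # release the figure so memory does not grow per frame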

I would like to know the hyperparameter setting of 45.2mm model

Recently, I tried to train the model with frames = 243, central frames = 27, and coefficients = 27. I expected a result close to 45.2 mm, but the accuracy of my model is only about 46.6 mm.

I would really like to know how to train a model comparable to yours. Could you please tell me which settings I should use to reach that result?

Thank you very much for your help.

Joint and number

Thank you for your excellent work.
I encountered a problem while visualizing and obtaining the coordinates of each keypoint. Could you tell me which joint corresponds to each keypoint index, and whether the ordering is the same as in the figure below? Thank you.

[attached figure: keypoint/skeleton numbering reference]
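
For reference, the 17-joint ordering commonly used in VideoPose3D-style Human3.6M pipelines is listed below; this is an assumption about PoseFormerV2's convention, so please verify it against the skeleton definition and visualization code in the repo.

    # Common Human3.6M 17-joint ordering in VideoPose3D-derived codebases
    # (an assumption, not confirmed against this repo's skeleton definition).
    H36M_JOINT_NAMES = [
        "Hip",        # 0 (root / pelvis)
        "RHip",       # 1
        "RKnee",      # 2
        "RAnkle",     # 3
        "LHip",       # 4
        "LKnee",      # 5
        "LAnkle",     # 6
        "Spine",      # 7
        "Thorax",     # 8
        "Neck/Nose",  # 9
        "Head",       # 10
        "LShoulder",  # 11
        "LElbow",     # 12
        "LWrist",     # 13
        "RShoulder",  # 14
        "RElbow",     # 15
        "RWrist",     # 16
    ]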

Runtime issues

Great work! GPU utilization is very low when I run the code. How should I modify the config?

visualization code

Hello, I am interested in your excellent work, but as a beginner I am having trouble with visualization. Could you provide instructions for it? Thank you very much!

hoping to receive your advice

Thank you for sharing your work.
I ran the command shown in the screenshot below, but after 3 hours there was still no change. Could you please tell me what causes this and how to fix it?
Have a good day.

[attached screenshot of the command and its output]

some confusion about the figure of network structure

[attached figure: network architecture diagram from the paper]

If I remember correctly, you took the spatial-temporal transformer structure of PoseFormerV1 and replaced the temporal transformer with a "DCT + LPF + Linear projection" module, right?

I'm curious why only the spatial transformer part has a temporal embedding instead of a spatial embedding. Is it because that part is "reformulated as a Time-Frequency Feature Fusion module", so it must have a temporal embedding?
Please help me figure this out.

Thanks for your reply

About MFLOPs

Hello. May I ask what methodology you use to calculate the MFLOPs values?
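
For reference, a common way to measure this for PyTorch models is a FLOP counter such as thop or fvcore. The sketch below uses thop and assumes a (1, frames, joints, 2) input; note that thop reports multiply-accumulate operations (MACs), which papers convert to FLOPs in different ways, so the numbers may not match the paper exactly.

    import torch
    from thop import profile  # pip install thop

    def count_mflops(model, frames=243, joints=17):
        """Rough MAC/parameter count for a 2D-to-3D lifting model (input shape assumed)."""
        x = torch.randn(1, frames, joints, 2)
        macs, params = profile(model, inputs=(x,), verbose=False)
        return macs / 1e6, params / 1e6  # mega-MACs, mega-parameters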

ModuleNotFoundError: No module named 'utils.transforms'

When I run this command:

    python demo/vis.py --video sample_video.mp4

I get the following error:

Traceback (most recent call last):
  File "demo/vis.py", line 5, in <module>
    from lib.hrnet.gen_kpts import gen_video_kpts as hrnet_pose
  File "/home/dvlab/PoseFormerV2-main/demo/lib/hrnet/gen_kpts.py", line 21, in <module>
    from lib.hrnet.lib.utils.inference import get_final_preds
  File "/home/dvlab/PoseFormerV2-main/demo/lib/hrnet/lib/utils/inference.py", line 17, in <module>
    from utils.transforms import transform_preds
ModuleNotFoundError: No module named 'utils.transforms'

However, I downloaded the entire repository from GitHub and also checked whether any of my files were missing, but everything is in place. Please tell me how to solve this problem. Thanks.
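
One common cause (an assumption, not an official diagnosis) is that the HRNet sub-package imports "utils.transforms" as a top-level module, so the directory containing that "utils" folder has to be on sys.path. A sketch of the usual workaround, placed near the top of demo/vis.py before the HRNet imports:

    import os
    import sys

    # Hypothetical path fix: make demo/lib/hrnet/lib importable as a top-level
    # location so "utils.transforms" resolves. Adjust the path if your layout differs.
    repo_demo_dir = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(repo_demo_dir, "lib", "hrnet", "lib"))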

ONNX export

Could you provide a script to export the model to ONNX, please?
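
For reference, a hedged export sketch is shown below; the input shape (1, frames, 17, 2) is an assumption, and the DCT/frequency-domain ops in PoseFormerV2 may or may not export cleanly depending on how they are implemented and on the ONNX opset used.

    import torch

    def export_onnx(model, path="poseformerv2.onnx", frames=243, joints=17):
        """Sketch of an ONNX export; not an official script, input shape assumed."""
        model.eval()
        dummy = torch.randn(1, frames, joints, 2)
        torch.onnx.export(
            model,
            dummy,
            path,
            input_names=["keypoints_2d"],
            output_names=["pose_3d"],
            opset_version=17,
            dynamic_axes={"keypoints_2d": {0: "batch"}, "pose_3d": {0: "batch"}},
        )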

camera instrinsic parameters

Thank you for your great work!
I have a question about the Human3.6M dataset code: why was the intrinsic parameter changed as follows?

cam['intrinsic'] = np.concatenate((cam['focal_length'],
                                                   cam['center'],
                                                   cam['radial_distortion'],
                                                   cam['tangential_distortion'],
                                                   [1/cam['focal_length'][0], 0, -cam['center'][0]/cam['focal_length'][0], 
                                                    0, 1/cam['focal_length'][1], -cam['center'][1]/cam['focal_length'][1], 
                                                    0, 0, 1]))

I checked the P-STMO project; its code is the same as in the VideoPose3D project, like this:

                cam['intrinsic'] = np.concatenate((cam['focal_length'],
                                                   cam['center'],
                                                   cam['radial_distortion'],
                                                   cam['tangential_distortion']))

Thanks

Python API to get the 3D results

Hi, I am a student trying to use this project. Firstly, thanks for the amazing project.

I am trying to find a Python API to run the model on in-the-wild videos of mine. I saw the CLI way of doing it and tried it out; it worked pretty well:
python demo/vis.py --video sample_video.mp4

Is there a way of calling this from Python code? I want to customize it to just give me the 3D coordinates (I suspect that saving the images is what makes inference slow).

Edit: I figured out how the 2D points are obtained and passed to get_pose3D, but I can't work out which part of that code holds the 3D keypoint values.
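
For reference, a hedged sketch of the lifting step in isolation (the function and tensor shapes below are assumptions; check get_pose3D() in demo/vis.py for the exact preprocessing the demo applies before calling the network):

    import torch

    def lift_to_3d(model, keypoints_2d):
        """Run only the 2D-to-3D lifting network and return the 3D joints.

        `model` is the loaded PoseFormerV2 network and `keypoints_2d` a float
        tensor of shape (frames, 17, 2) in the coordinate system the model was
        trained on; both are assumptions, not taken from the repository.
        """
        model.eval()
        with torch.no_grad():
            x = keypoints_2d.unsqueeze(0)       # add a batch dimension
            out = model(x)                      # output shape is model-dependent, e.g. (1, 1, 17, 3)
        return out.squeeze(0).cpu().numpy()     # 3D joint coordinates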

I'm confused about Code

x = torch.cat((x, Spatial_feature), dim=1)

self.mlp1 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

self.mlp2 = FreqMlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

x1 = x[:, :f//2] + self.drop_path(self.mlp1(self.norm2(x[:, :f//2])))

x2 = x[:, f//2:] + self.drop_path(self.mlp2(self.norm3(x[:, f//2:])))

Are the frame ranges x[:, :f//2] and x[:, f//2:] set incorrectly, or did I misunderstand?
Since x is the concatenation of x and Spatial_feature, does x correspond to x[:, :f//2] and Spatial_feature to x[:, f//2:]?

thanks for your help

Able to run realtime?

Great work! I am wondering whether the model can run in real time with a video-stream input.
Thanks!

Replication of Poseformer V1 and MixSTE experiments

Great innovation. I noticed that your replicated PoseFormerV1 experiment performs much better than the original paper reports, while your replicated MixSTE experiment performs significantly worse. Is this expected?

weights file

Hi team, thanks for the work on this repo, it's quite fantastic. But where are the weights for PoseFormerV2?

Demo Script

Thank you for the amazing work! Could you please share a demo script for running on in-the-wild videos? Thanks.

Training code of 243 frames

Hello, thank you very much for your great work. I want to make sure: to reproduce the best results reported in the paper, is the correct training command the following?

    python run_poseformer.py -g 5 -k cpn_ft_h36m_dbb -frame 243 -frame-kept 27 -coeff-kept 27 -c checkpoint/243 -b 1024

python3.8 error

When running pip install -r requirements.txt, some packages require Python 3.9. Is the project intended to use Python 3.8?

parameters and FLOPs

Thank you very much for your work. I would like to know how the number of parameters and the computational cost of the model reported in your work are calculated.
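
For what it's worth, the parameter count can be read directly from the loaded PyTorch model (assuming `model` is the network); FLOPs/MACs need a counter library such as thop, fvcore, or ptflops, as in the MFLOPs question above.

    # Parameter count of a loaded PyTorch model (assumes `model` is the network).
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.2f}M parameters")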

Inference on custom videos

Thank you for your work! Are the released pretrained models suitable for inference on custom videos, or do the demos use models trained differently? Thanks!

Questions about the results of StridedTrans. in Experiment Table 2

I noticed that in Table 2, StridedTrans. [15] (TMM'22) is listed with an MPJPE of 47.5 and 342.5 MFLOPs. This is far from the results reported in the StridedTrans. paper itself: with an 81-frame input it reaches 45.4 mm MPJPE, and with a 27-frame input it reaches 46.9 mm MPJPE with only 128 MFLOPs.
Could you explain why you chose the 47.5 result?
Thanks!

[Figure 1: Table 2 in your paper.]

[Figure 2: Table from the StridedTrans. authors' paper.]

I don't know how to make Figure 5

[attached figure: Figure 5 from the paper]

I'm having difficulty understanding what each point on the figure indicates. For example, why does PoseFormerV2 27xRF have three points? Was this data mentioned in the paper?

Also, under what conditions does PoseFormerV2 9xRF reach an MPJPE of 46?

If I may jump to a conclusion: overall, PoseFormerV2 27xRF is the best model, except for PoseFormerV2 9xRF with an MPJPE of 46. Is that 9xRF result a special case?

Thank you for your reply.

the PCK and AUC for MPI-INF-3DHP in PoseFormerV2

First, thanks for your great work on PoseFormerV2. I checked your code and found that when evaluating the MPI-INF-3DHP dataset, the calculation of the PCK and AUC metrics is not defined. May I know how to calculate PCK and AUC for MPI-INF-3DHP? Thanks a lot!
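
For reference, the metrics as conventionally defined for MPI-INF-3DHP are: PCK is the fraction of joints with error below 150 mm, and AUC is the mean PCK over thresholds from 0 to 150 mm. Whether the paper follows exactly this protocol is an assumption; the sketch below is not code from the repository.

    import numpy as np

    def pck_auc(pred, gt, pck_thresh=150.0, auc_steps=31):
        """PCK@150mm and AUC for 3D poses in millimetres.

        pred, gt: arrays of shape (N, J, 3). Follows the common MPI-INF-3DHP
        convention (thresholds 0, 5, ..., 150 mm); an assumption, not repo code.
        """
        err = np.linalg.norm(pred - gt, axis=-1)           # (N, J) per-joint error in mm
        pck = np.mean(err < pck_thresh)
        thresholds = np.linspace(0.0, pck_thresh, auc_steps)
        auc = np.mean([np.mean(err < t) for t in thresholds])
        return pck, auc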

Some confusion about run_poseformer.py

According to run_poseformer.py, lines 287 and 337 contain "inputs_3d[:, :, 0] = 0".

What does "inputs_3d[:, :, 0] = 0" mean?
Performance is worse after removing it, but I can't understand why; it doesn't seem to have anything to do with training the model. I only know that the MPJPE computed this way is not the same.

Thank you for your reply.
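
For context (an interpretation, not the author's reply): zeroing joint 0 removes the global trajectory of the root, so the model is supervised and evaluated on root-relative poses, which is how MPJPE is normally reported on Human3.6M; that is why removing the line changes the error. A sketch of the equivalent computation, not taken verbatim from run_poseformer.py:

    import torch

    def root_relative_mpjpe(pred, gt):
        """MPJPE on root-relative poses; pred, gt have shape (B, F, J, 3) with joint 0 as the root."""
        pred = pred - pred[:, :, :1]        # express every joint relative to the root
        gt = gt - gt[:, :, :1]
        return torch.mean(torch.norm(pred - gt, dim=-1))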
