The poseformerv2 from qitaozhao

About camera intrinsics and inference

Thank you for sharing the code.

May I ask whether the camera intrinsics have been used for preprocessing the data (assuming that we have access to 2D gt in pixel space)? According to my understanding even though you are returning them with "out_camera_params" they have not been used in the scripts so I just wanted to ask whether there is any other place where they might be needed.

I also wanted to ask about your recommended 2D pose estimator for running your code for inference and whether there is a plan to provide an inference script in the future.

Thanks in advance for your time.
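
For context, here is a minimal sketch of the screen-coordinate normalization used in VideoPose3D-style pipelines, which this repository appears to build on (that link is an assumption, not something confirmed in this thread). It uses only the image width and height, not the focal length or principal point, which would be consistent with the intrinsics not being needed for 2D preprocessing. The pure-Python form and the sample keypoints are illustrative.

```python
def normalize_screen_coordinates(points, width, height):
    """Map pixel x in [0, width] to [-1, 1]; y uses the same scale to keep the aspect ratio."""
    return [
        (x / width * 2.0 - 1.0, y / width * 2.0 - height / width)
        for x, y in points
    ]


# Hypothetical 2D keypoints in pixel space for a 1000 x 1000 image.
keypoints_px = [(0.0, 0.0), (500.0, 500.0), (1000.0, 1000.0)]
normalized = normalize_screen_coordinates(keypoints_px, width=1000, height=1000)
print(normalized)
```

If the 2D ground truth is already in pixel space, a step like this would typically be applied before feeding the keypoints to the model.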

Wrong SOTA performance

Hi! Thanks for your work! I found that MPJPE performance of MixSTE is probably wrong in your paper. It should be 40.9 and better than yours?

Performance issue?

Thank you very much for sharing!

As you mentioned in the article, PoseFormerV2 is reported to be more efficient than MHFormer by approximately 4 times. I have tried processing the same video using MHFormer and PoseFormerV2 using the demo sources provided here:

MHFormer: https://github.com/Vegetebird/MHFormer/blob/main/demo/vis.py
PoseFormerV2: https://github.com/QitaoZhao/PoseFormerV2/blob/main/demo/vis.py

For MHFormer, I used the default source code without any additional modifications.
For PoseFormerV2, I used the 9_81_46.0.bin model.
Both were run on an RTX 3090.
Results:
MHFormer: ~6.7 it/s
PoseFormerV2: ~2.7 it/s
Results when I commented out the drawing and file-saving part in the get_pose3D() function:
MHFormer: ~54.00 it/s
PoseFormerV2: ~30.63 it/s

Could you please explain why there is such a difference? Have I done something wrong?
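
One thing that may be worth checking is what the it/s figure actually measures. The demo loop can include work other than the 3D lifting model (for example drawing and file I/O, as noted above), so a comparison of model efficiency is usually made by timing the forward pass alone, after a warm-up. The sketch below is a generic timing helper; `dummy_step` is a stand-in for a model call, and `sync` is a hypothetical hook for waiting on pending GPU work (with PyTorch, `torch.cuda.synchronize` could be passed there).

```python
import time


def iterations_per_second(step, n_iters=50, n_warmup=5, sync=None):
    """Time `step()` alone; `sync` optionally waits for pending device work."""
    for _ in range(n_warmup):
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


def dummy_step():
    # Stand-in workload; replace with a single model forward pass.
    sum(i * i for i in range(1000))


rate = iterations_per_second(dummy_step)
print(f"{rate:.1f} it/s")
```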

AttributeError: 'NoneType' object has no attribute 'astype'

Thank you very much for your work. When I run this command:

python run_poseformer.py -d h36m -k gt \
    -c checkpoint -g 0 \
    --evaluate 27_243_45.2.bin \
    --render --viz-subject S11 \
    --viz-action Walking --viz-camera 0 \
    --viz-export output3d

I encounter the following issue.

Traceback (most recent call last):
  File "/home/dvlab/PoseFormerV2-main/run_poseformer.py", line 574, in <module>
    prediction = evaluate(gen, return_predictions=True)
  File "/home/dvlab/PoseFormerV2-main/run_poseformer.py", line 481, in evaluate
    cam = torch.from_numpy(cam.astype('float32'))
AttributeError: 'NoneType' object has no attribute 'astype'

I modified the code to print the cameras and check whether they actually hold values:

test_generator = UnchunkedGenerator(cameras_valid, poses_valid, poses_valid_2d,
                                    pad=pad, causal_shift=causal_shift, augment=False,
                                    kps_left=kps_left, kps_right=kps_right,
                                    joints_left=joints_left, joints_right=joints_right)

class UnchunkedGenerator:
    def __init__(self, cameras, poses_3d, poses_2d, pad=0, causal_shift=0,
                 augment=False, kps_left=None, kps_right=None, joints_left=None, joints_right=None):
        assert poses_3d is None or len(poses_3d) == len(poses_2d)
        assert cameras is None or len(cameras) == len(poses_2d)
        print("cameras 3 :", cameras)

its value is a list:

cameras 3 : [array([ 2.29009891e+00, 2.28756237e+00, 2.50830650e-02, 2.89029814e-02,
-2.07098916e-01, 2.47775182e-01, -3.07515031e-03, -9.75698873e-04,
-1.42447161e-03, 4.36662363e-01, 0.00000000e+00, -1.09528303e-02,
0.00000000e+00, 4.37146551e-01, -1.26348389e-02, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29935122e+00, 2.29518342e+00, 1.76972151e-02, 1.61298513e-02,
-1.94213629e-01, 2.40408540e-01, 6.81997556e-03, -1.61902665e-03,
-2.74089444e-03, 4.34905287e-01, 0.00000000e+00, -7.69661227e-03,
0.00000000e+00, 4.35695026e-01, -7.02769589e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29828143e+00, 2.29759789e+00, 3.96317244e-02, 2.80535221e-03,
-2.08338186e-01, 2.55488008e-01, -2.46049743e-03, 1.48438697e-03,
-7.59999326e-04, 4.35107723e-01, 0.00000000e+00, -1.72440689e-02,
0.00000000e+00, 4.35237169e-01, -1.22099358e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29102278e+00, 2.28954792e+00, 2.99364328e-02, 1.76403334e-03,
-1.98384091e-01, 2.18323678e-01, -8.94780736e-03, -5.87205577e-04,
-1.81336200e-03, 4.36486276e-01, 0.00000000e+00, -1.30668422e-02,
0.00000000e+00, 4.36767447e-01, -7.70472339e-04, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29009891e+00, 2.28756237e+00, 2.50830650e-02, 2.89029814e-02,
-2.07098916e-01, 2.47775182e-01, -3.07515031e-03, -9.75698873e-04,
-1.42447161e-03, 4.36662363e-01, 0.00000000e+00, -1.09528303e-02,
0.00000000e+00, 4.37146551e-01, -1.26348389e-02, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29935122e+00, 2.29518342e+00, 1.76972151e-02, 1.61298513e-02,
-1.94213629e-01, 2.40408540e-01, 6.81997556e-03, -1.61902665e-03,
-2.74089444e-03, 4.34905287e-01, 0.00000000e+00, -7.69661227e-03,
0.00000000e+00, 4.35695026e-01, -7.02769589e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29828143e+00, 2.29759789e+00, 3.96317244e-02, 2.80535221e-03,
-2.08338186e-01, 2.55488008e-01, -2.46049743e-03, 1.48438697e-03,
-7.59999326e-04, 4.35107723e-01, 0.00000000e+00, -1.72440689e-02,
0.00000000e+00, 4.35237169e-01, -1.22099358e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29102278e+00, 2.28954792e+00, 2.99364328e-02, 1.76403334e-03,
-1.98384091e-01, 2.18323678e-01, -8.94780736e-03, -5.87205577e-04,
-1.81336200e-03, 4.36486276e-01, 0.00000000e+00, -1.30668422e-02,
0.00000000e+00, 4.36767447e-01, -7.70472339e-04, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29009891e+00, 2.28756237e+00, 2.50830650e-02, 2.89029814e-02,
-2.07098916e-01, 2.47775182e-01, -3.07515031e-03, -9.75698873e-04,
-1.42447161e-03, 4.36662363e-01, 0.00000000e+00, -1.09528303e-02,
0.00000000e+00, 4.37146551e-01, -1.26348389e-02, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00]), array([ 2.29935122e+00, 2.29518342e+00, 1.76972151e-02, 1.61298513e-02,
-1.94213629e-01, 2.40408540e-01, 6.81997556e-03, -1.61902665e-03,
-2.74089444e-03, 4.34905287e-01, 0.00000000e+00, -7.69661227e-03,
0.00000000e+00, 4.35695026e-01, -7.02769589e-03, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00])

I have no idea where the bug is.
Thanks for your help
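
One possible explanation, not confirmed in this thread: the traceback comes from `evaluate(gen, ...)` in the rendering path, while the printed cameras belong to `test_generator`. If `gen` is built with `cameras=None`, every `cam` it yields would be `None`, and calling `.astype` on it would fail exactly as shown. The sketch below mimics that situation with a hypothetical generator and shows a guard that converts the camera only when it exists; the names are illustrative and not the repository's code.

```python
def next_epoch(cameras, poses_2d):
    """Yield (cam, batch_2d) pairs; cam is None when no camera list was supplied."""
    for index, batch_2d in enumerate(poses_2d):
        cam = None if cameras is None else cameras[index]
        yield cam, batch_2d


def to_float_list(cam):
    """Convert camera parameters only when they exist."""
    if cam is None:
        return None
    return [float(value) for value in cam]


# Hypothetical data: one sequence, generator built without cameras.
converted = [to_float_list(cam) for cam, _ in next_epoch(None, [[0.1, 0.2]])]
print(converted)  # [None]
```

In the actual script, the equivalent change would be to skip the conversion (and anything that depends on it) when `cam` is `None`.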

About MPI-INF-3DHP

Thanks for your awesome work. I also hope to use the MPI-INF-3DHP data set in my experiment.
How did you deal with MPI-INF-3DHP?
This problem has been bothering me for several days.

I would be very glad if you could help me.

Potential memory leak during plotting in demo

Hello,

I've noticed that in vis.py after generating plots with matplotlib, there isn't a call to plt.clf() or plt.close(fig).

These calls might be needed to release the memory associated with each figure, especially when creating plots for long videos.

I would like to know the hyperparameter setting of 45.2mm model

Recently, I tried to train the model with frame = 243, central frame = 27, and coefficient = 27. I expected a result close to 45.2 mm, but my model only reaches about 46.6 mm.

I would really like to know how to train a model comparable to yours. Could you tell me the correct settings to reproduce this result?

Thank you very much for your help.

Joint and number

Thank you for your excellent work.
I encountered a problem while visualizing and obtaining the coordinates of each keypoint. Could you tell me which joint each keypoint corresponds to, and whether they are the same as those marked in the following figure? Thank you.

Runtime issues

Great work! GPU utilization is very low when I run the code. How should I modify the config?

visualization code

Hello, I am interested in your excellent work, but as a beginner I am having trouble with visualization. Could you provide instructions for it? Thank you very much!

hoping to receive your advice

Thank you for sharing.
I ran the command as follows, but after 3 hours nothing has changed. Could you please tell me what causes this and how to fix it?
Have a good day.

How to check the robustness of a model

Great work! I would like to know how to add noise to test the robustness of the model, as in Figure 6 of your paper. Thank you!
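
A minimal sketch of one way to perturb the 2D input, assuming zero-mean Gaussian noise added independently to each coordinate. The noise model, the sigma values, and the sample keypoints are all assumptions for illustration; the exact protocol behind the paper's figure is not stated in this thread.

```python
import random


def add_gaussian_noise(keypoints_2d, sigma, seed=0):
    """Return a copy of the (x, y) keypoints with zero-mean Gaussian noise of std `sigma`."""
    rng = random.Random(seed)
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        for x, y in keypoints_2d
    ]


# Hypothetical normalized 2D keypoints for one frame.
clean = [(0.10, -0.20), (0.05, 0.30), (-0.15, 0.25)]
for sigma in (0.0, 0.01, 0.02):
    noisy = add_gaussian_noise(clean, sigma)
    shift = max(abs(nx - x) + abs(ny - y) for (nx, ny), (x, y) in zip(noisy, clean))
    print(sigma, round(shift, 4))
```

The model would then be evaluated on the noisy keypoints at each sigma and the resulting MPJPE plotted against the noise level.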

some confusion about the figure of network structure

If I remember correctly, you took the structure of the PoseformerV1 spatial-temporal transformer and replaced the temporal transformer with a "DCT + LPF + Linear projection" module, right?

I'm curious why only the spatial transformer part has temporal embedding instead of spatial embedding.
Is it related to "reformulated as a Time-Frequency Feature Fusion module"? So it must have temporal embedding.
Please help me figure this out.

Thanks for your reply

Unable to reproduce the corresponding result in the article when the sequence length is 81, f=1, dct=3

I successfully reproduced the accuracy reported in the paper for an input sequence of 27, f=1, dct=3, but when I raise the RF to 81, the MPJPE only reaches about 48.7. I use a learning rate of 0.0008 and a batch size of 1024. What went wrong? Any answer would be appreciated!

Thanks. We did reproduce PoseFormerV1 but _not_ reproduce MixSTE. The reproduced PoseFormerV1 result (AUC & MPJPE) is worse than MixSTE, which is consistent with Human3.6M.

Thanks. We did reproduce PoseFormerV1 but _not_ reproduce MixSTE. The reproduced PoseFormerV1 result (AUC & MPJPE) is worse than MixSTE, which is consistent with Human3.6M.

Originally posted by @QitaoZhao in #3 (comment)

About MFLOPs

Hello, sir. May I inquire about the methodology you utilize to calculate the value of MFLOPs?

ModuleNotFoundError: No module named 'utils.transforms'

When I enter this command:
python demo/vis.py --video sample_video.mp4

I get the following error:

Traceback (most recent call last):
  File "demo/vis.py", line 5, in <module>
    from lib.hrnet.gen_kpts import gen_video_kpts as hrnet_pose
  File "/home/dvlab/PoseFormerV2-main/demo/lib/hrnet/gen_kpts.py", line 21, in <module>
    from lib.hrnet.lib.utils.inference import get_final_preds
  File "/home/dvlab/PoseFormerV2-main/demo/lib/hrnet/lib/utils/inference.py", line 17, in <module>
    from utils.transforms import transform_preds
ModuleNotFoundError: No module named 'utils.transforms'

However, I downloaded the whole repository from GitHub and checked that no files are missing; everything looks fine.
Please tell me how to solve this problem.
Thanks

ONNX export

could you provide a script to export model to ONNX, please?

camera intrinsic parameters

Thank you for your great work!!
I have a question about the Human3.6M dataset code: why were the intrinsic parameters changed as follows?

cam['intrinsic'] = np.concatenate((cam['focal_length'],
                                   cam['center'],
                                   cam['radial_distortion'],
                                   cam['tangential_distortion'],
                                   [1/cam['focal_length'][0], 0, -cam['center'][0]/cam['focal_length'][0],
                                    0, 1/cam['focal_length'][1], -cam['center'][1]/cam['focal_length'][1],
                                    0, 0, 1]))

I checked the P-STMO project; its code is the same as in the VideoPose3D project:

cam['intrinsic'] = np.concatenate((cam['focal_length'],
                                   cam['center'],
                                   cam['radial_distortion'],
                                   cam['tangential_distortion']))

Thanks
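
As a small aid to reading the first snippet: the nine extra values have the form of the row-major inverse of the pinhole intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]. The sketch below checks that numerically. It only shows what the extra entries are, not why they were added or where they are used; the focal length and center values are illustrative.

```python
def intrinsic_matrix(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K without skew."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def inverse_intrinsic_matrix(fx, fy, cx, cy):
    """Same nine values, in the same row-major order, as the extra entries in the first snippet."""
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Multiply two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


fx, fy, cx, cy = 2.29, 2.2875, 0.025, 0.0289  # illustrative values only
product = matmul3(intrinsic_matrix(fx, fy, cx, cy), inverse_intrinsic_matrix(fx, fy, cx, cy))
print(product)  # numerically the identity matrix
```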

Python API to get the 3D results

Hi I am a student trying to use this project. Firstly thanks for the amazing project.

I am trying to find a Python API to run the model on my own in-the-wild videos. I saw the CLI way of doing it and tried it out; it worked pretty well:
python demo/vis.py --video sample_video.mp4

Is there a way of calling this from Python code? I want to customize it to return just the 3D coordinates (I suspect that saving the images is the reason inference is slow).

Edit: I figured out how the 2D keypoints are obtained and passed to get_pose3D, but I can't work out which part of this block of code holds the values of the 3D keypoints.

I'm confused about Code

PoseFormerV2/common/model_poseformer.py

Line 224 in 0c0b125

x = torch.cat((x, Spatial_feature), dim=1)

PoseFormerV2/common/model_poseformer.py

Line 120 in 0c0b125

    
self.mlp1 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

PoseFormerV2/common/model_poseformer.py

Line 122 in 0c0b125

    
self.mlp2 = FreqMlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

PoseFormerV2/common/model_poseformer.py

Line 127 in 0c0b125

x1 = x[:, :f//2] + self.drop_path(self.mlp1(self.norm2(x[:, :f//2])))

PoseFormerV2/common/model_poseformer.py

Line 128 in 0c0b125

x2 = x[:, f//2:] + self.drop_path(self.mlp2(self.norm3(x[:, f//2:])))

Regarding x[:, :f//2] and x[:, f//2:]: is the frame count of x set incorrectly, or did I misunderstand?
After x is concatenated with Spatial_feature, does x correspond to x[:, :f//2] and Spatial_feature to x[:, f//2:]?

thanks for your help
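
A toy illustration of the indexing in question, using plain lists in place of tensors. It assumes that f is the token count measured after the concatenation and that both groups contribute the same number of tokens; neither assumption is confirmed in this thread. Under those assumptions, the front half is the original x and the back half is Spatial_feature.

```python
def concat_tokens(first, second):
    """Concatenate two token lists along the token axis, first group in front."""
    return first + second


def split_halves(tokens):
    """Split at f // 2, where f is the total token count after concatenation."""
    f = len(tokens)
    return tokens[: f // 2], tokens[f // 2 :]


x_tokens = ["x0", "x1", "x2"]        # stands in for x
spatial_tokens = ["s0", "s1", "s2"]  # stands in for Spatial_feature
front, back = split_halves(concat_tokens(x_tokens, spatial_tokens))
print(front, back)
```

If the two groups had different lengths, the split at f // 2 would no longer line up with the boundary between them.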

Able to run realtime?

Great work! I am wondering whether the model can run in real time with video stream input.
Thanks!
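
Whether real-time use is practical depends on the model and hardware, but the streaming side can be sketched independently. The example below keeps a fixed-length window of the most recent frames and pads the start by repeating the first frame. The class name, window size, and padding choice are assumptions for illustration; if the model predicts the pose of the window's center frame, the output would lag the newest frame by about half a window.

```python
from collections import deque


class SlidingWindow:
    """Keep the most recent `size` frames; repeat the first frame until the buffer is full."""

    def __init__(self, size):
        self.size = size
        self.frames = deque(maxlen=size)

    def push(self, keypoints_2d):
        if not self.frames:
            # Pad the start of the stream with copies of the first frame.
            self.frames.extend([keypoints_2d] * self.size)
        else:
            self.frames.append(keypoints_2d)
        return list(self.frames)


window = SlidingWindow(size=5)
for t in range(7):
    clip = window.push(f"frame{t}")  # strings stand in for per-frame 2D keypoints
print(clip)
```

Each returned clip would be the input for one model call.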

Replication of Poseformer V1 and MixSTE experiments

Great innovation. I noticed that the PoseFormer V1 experiment you replicated had a much better performance compared to the original article, while the MixSTE experiment had a significant decrease in performance. Is this a normal phenomenon?

weights file

Hi team, thanks for the work on this repo, it's quite fantastic. But where are the weights for PoseFormerV2?

Demo Script

Thank you for the amazing work! Can you please share the demo script to run on the wild videos? Thanks

Training code of 243 frames

Hello, thank you very much for your great work, I wanted to make sure that to train to get the best results mentioned in the paper, the training code is

python run_poseformer.py -g 5 -k cpn_ft_h36m_dbb -frame 243 -frame-kept 27 -coeff-kept 27 -c checkpoint/243 -b 1024?

python3.8 error

When I run pip install -r requirements.txt, some packages require Python 3.9. Does your project use Python 3.8?

parameters and FLOPs

Thank you very much for your work. I would like to know how the number of parameters and the amount of calculation of the model in your work is calculated?

the file "torch_dct" NotFound

Hello, great work! But I cannot find the torch_dct file. Could you upload it? Thank you very much!

How to obtain the same experimental results as 47.9

I cannot obtain the same experimental result 47.9 using the parameters suggested by the author. Do you need to pay attention to any further experimental details? I hope the author can inform me.

Inference on custom videos

Thank you for your work! Are the released pretrained models suitable for inference on custom videos? Or do the demos use models trained differently? Thanks!

Questions about the results of StridedTrans. in Experiment Table 2

I noticed in Table 2 that StridedTrans.[15] TMM’22 has an MPJPE of 47.5 and MFLOPs of 342.5. This is far from the results reported in the StridedTrans. article. In the StridedTrans. author's article, an input of 81 frames can reach 45.4 MPJPE. The input of 27 frames can reach 46.9MPJPE and only requires 128MFLOPs.
Can you explain why you chose the result 47.5?
Thanks!

Figure 1: Table 2 in your paper.

Figure 2: Table in StridedTrans. author's article

I don't know how to make Figure 5

I'm having difficulty understanding what each point on the figure indicates. For example, why does PoseFormerV2 27xRF have three points? Was this data mentioned in the paper?

Also, under what conditions does PoseFormerV2 9xRF reach an MPJPE of 46?

If I may jump to a conclusion: PoseFormerV2 27xRF is the best model overall, except for PoseFormerV2 9xRF with an MPJPE of 46.
Is PoseFormerV2 9xRF with an MPJPE of 46 a special case?

Thank you for your reply

the PCK and AUC for MPI-INF-3DHP in PoseFormerV2

First, thanks for your greatest work of PoseFormerV2. I checked your code and found that when evaluating the mpii dataset, the calculation of the two metrics pck and auc is not defined. May I know how to calculate the pck and auc for mpiinf-3dhp? Thanks a lot!
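
For reference, a minimal sketch of how PCK and AUC are commonly computed for 3D pose: PCK as the fraction of joints whose error is within a threshold, and AUC as the mean PCK over a range of thresholds. The 150 mm threshold and the 0 to 150 mm range in 5 mm steps are assumptions based on the convention commonly used for MPI-INF-3DHP; whether the authors used this or a separate evaluation script is not stated in this thread. The sample poses are made up.

```python
import math


def joint_errors(pred, target):
    """Per-joint Euclidean distance in millimetres."""
    return [math.dist(p, t) for p, t in zip(pred, target)]


def pck(errors, threshold_mm=150.0):
    """Fraction of joints whose error is within the threshold."""
    return sum(e <= threshold_mm for e in errors) / len(errors)


def auc(errors, max_threshold_mm=150.0, steps=31):
    """Mean PCK over evenly spaced thresholds from 0 to max_threshold_mm."""
    thresholds = [max_threshold_mm * i / (steps - 1) for i in range(steps)]
    return sum(pck(errors, t) for t in thresholds) / steps


# Hypothetical three-joint example with errors of 0, 100 and 300 mm.
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 300.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
errors = joint_errors(pred, target)
print(pck(errors), auc(errors))
```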

What does P1 refer to in the results on the MPI-INF-3DHP dataset?

In the results I reproduced on the MPI-INF-3DHP dataset, what does P1 refer to?

Some confusion about run_poseformer.py

According to run_poseformer.py, line 287 and line 337 contain "inputs_3d[:, :, 0] = 0".

What does "inputs_3d[:, :, 0] = 0" mean?
Performance gets worse when I remove it, but I don't understand why; it doesn't seem to have anything to do with training the model. I only know that the MPJPE calculated this way is different.

Thank you for your reply.
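
A possible reading, sketched under assumptions: in VideoPose3D-style data preparation, the non-root joints are stored relative to the root while joint 0 keeps a global position. If that holds here, zeroing joint 0 of the target makes the whole pose root-relative, so MPJPE compares poses rather than global position. The toy frame below shows how much the metric changes when the root is not zeroed; the values and helper names are illustrative.

```python
import math


def zero_root(pose):
    """Place the root joint (index 0) at the origin, mirroring `inputs_3d[:, :, 0] = 0` for one frame."""
    return [(0.0, 0.0, 0.0)] + list(pose[1:])


def mpjpe(pred, target):
    """Mean per-joint position error for a single frame."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(target)


# Hypothetical frame: joint 0 holds a global position, other joints are already root-relative.
target = [(1.0, 2.0, 5.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0)]
pred = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0)]
print(mpjpe(pred, target), mpjpe(pred, zero_root(target)))
```

With the root left in place, the error is dominated by the global offset of joint 0 even though the predicted pose itself is exact.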

Why does the code for training a model on the Human3.6M dataset differ from the code for training on the MPI-INF-3DHP dataset?
