qitaozhao / contextaware-poseformer

The project is an official implementation of our paper "A Single 2D Pose With Context is Worth Hundreds for 3D Human Pose Estimation".

Language: Python 100.0%
Topics: 3d, human-pose-estimation

contextaware-poseformer's People

Contributors

noahcoolboy, qitaozhao, zczcwh


contextaware-poseformer's Issues

Question about reproducing model training and evaluation results

First, thanks for the excellent work. I trained the model from scratch with HRNet-W32 as the backbone and also evaluated with your pretrained model, but I failed to reproduce the results reported in the paper. I made sure the configuration files and code were NOT modified. Could you help me?

Experiment environment:

  • OS: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-165-generic x86_64)
  • PyTorch: 2.1.0 py3.8_cuda12.1_cudnn8.9.2_0 build
  • CUDA: 12.1
  • cuDNN: 8.9.2
  • Python: 3.8.18
  • Numpy: 1.24.4
  • GPU: 4 × RTX 4090 (24 GB)

Training:

Loading backbone from data/pretrained/coco/pose_hrnet_w32_256x192.pth
Loading data...
Trainable parameter count: 14094147
evaluating....
[1] time 4.81 lr 0.006400 3d_train 49.176782 3d_test_p1 49.100000 3d_test_p2 41.900000
save best checkpoint
evaluating....
[2] time 4.29 lr 0.006336 3d_train 30.382120 3d_test_p1 46.300000 3d_test_p2 39.500000
save best checkpoint
evaluating....
[3] time 4.13 lr 0.006273 3d_train 27.326651 3d_test_p1 47.400000 3d_test_p2 39.700000
evaluating....
[4] time 4.68 lr 0.006210 3d_train 25.889964 3d_test_p1 49.800000 3d_test_p2 39.500000
evaluating....
[5] time 4.46 lr 0.006148 3d_train 24.977543 3d_test_p1 46.000000 3d_test_p2 38.600000
save best checkpoint
evaluating....
[6] time 4.07 lr 0.006086 3d_train 24.334025 3d_test_p1 46.000000 3d_test_p2 38.300000
evaluating....
[7] time 4.07 lr 0.006025 3d_train 23.788538 3d_test_p1 46.600000 3d_test_p2 38.800000
evaluating....
[8] time 4.08 lr 0.005965 3d_train 23.397479 3d_test_p1 46.700000 3d_test_p2 39.600000
evaluating....
[9] time 4.02 lr 0.005906 3d_train 23.026331 3d_test_p1 46.200000 3d_test_p2 38.400000
evaluating....
[10] time 4.03 lr 0.005847 3d_train 22.708713 3d_test_p1 46.600000 3d_test_p2 38.300000
evaluating....
[11] time 4.00 lr 0.005788 3d_train 22.455362 3d_test_p1 45.400000 3d_test_p2 38.700000
save best checkpoint
evaluating....
[12] time 4.09 lr 0.005730 3d_train 22.204906 3d_test_p1 48.500000 3d_test_p2 39.900000
evaluating....
[13] time 4.01 lr 0.005673 3d_train 22.004024 3d_test_p1 48.400000 3d_test_p2 40.300000
evaluating....
[14] time 4.00 lr 0.005616 3d_train 21.822001 3d_test_p1 47.500000 3d_test_p2 39.200000
evaluating....
[15] time 4.00 lr 0.005560 3d_train 21.618120 3d_test_p1 45.800000 3d_test_p2 38.700000
evaluating....
[16] time 4.02 lr 0.005504 3d_train 21.461701 3d_test_p1 45.500000 3d_test_p2 38.800000
evaluating....
[17] time 4.00 lr 0.005449 3d_train 21.276216 3d_test_p1 47.400000 3d_test_p2 39.000000
evaluating....
[18] time 4.01 lr 0.005395 3d_train 21.145432 3d_test_p1 45.800000 3d_test_p2 38.600000
evaluating....
[19] time 4.00 lr 0.005341 3d_train 20.996187 3d_test_p1 44.400000 3d_test_p2 38.100000
save best checkpoint
evaluating....
[20] time 4.01 lr 0.005287 3d_train 20.866346 3d_test_p1 47.900000 3d_test_p2 39.600000
evaluating....
[21] time 4.00 lr 0.005235 3d_train 20.739880 3d_test_p1 47.300000 3d_test_p2 39.100000
evaluating....
[22] time 4.99 lr 0.005182 3d_train 20.644170 3d_test_p1 46.400000 3d_test_p2 38.800000
evaluating....
[23] time 4.06 lr 0.005130 3d_train 20.531064 3d_test_p1 44.900000 3d_test_p2 38.300000
evaluating....
[24] time 4.56 lr 0.005079 3d_train 20.443442 3d_test_p1 46.500000 3d_test_p2 38.800000
evaluating....
[25] time 4.69 lr 0.005028 3d_train 20.305373 3d_test_p1 46.400000 3d_test_p2 39.200000
evaluating....
[26] time 4.61 lr 0.004978 3d_train 20.219376 3d_test_p1 45.800000 3d_test_p2 38.000000
evaluating....
[27] time 4.54 lr 0.004928 3d_train 20.140064 3d_test_p1 45.400000 3d_test_p2 38.200000
evaluating....
[28] time 4.52 lr 0.004879 3d_train 20.054451 3d_test_p1 45.400000 3d_test_p2 39.100000
evaluating....
[29] time 4.57 lr 0.004830 3d_train 19.944584 3d_test_p1 46.000000 3d_test_p2 38.200000
evaluating....
[30] time 4.53 lr 0.004782 3d_train 19.868793 3d_test_p1 46.700000 3d_test_p2 38.300000
evaluating....
[31] time 4.49 lr 0.004734 3d_train 19.795471 3d_test_p1 46.000000 3d_test_p2 38.200000
evaluating....
[32] time 4.55 lr 0.004687 3d_train 19.692436 3d_test_p1 44.800000 3d_test_p2 38.200000
evaluating....
[33] time 4.58 lr 0.004640 3d_train 19.614389 3d_test_p1 45.900000 3d_test_p2 38.000000
evaluating....
[34] time 4.44 lr 0.004593 3d_train 19.536873 3d_test_p1 45.300000 3d_test_p2 38.100000
evaluating....
[35] time 4.19 lr 0.004548 3d_train 19.439839 3d_test_p1 45.300000 3d_test_p2 38.400000
evaluating....
[36] time 4.12 lr 0.004502 3d_train 19.388139 3d_test_p1 44.600000 3d_test_p2 37.500000
evaluating....
[37] time 4.13 lr 0.004457 3d_train 19.285385 3d_test_p1 45.600000 3d_test_p2 38.400000
evaluating....
[38] time 4.11 lr 0.004412 3d_train 19.235931 3d_test_p1 45.100000 3d_test_p2 37.800000
evaluating....
[39] time 4.12 lr 0.004368 3d_train 19.170924 3d_test_p1 46.600000 3d_test_p2 38.800000
evaluating....
[40] time 4.11 lr 0.004325 3d_train 19.090504 3d_test_p1 45.200000 3d_test_p2 38.000000
evaluating....
[41] time 4.11 lr 0.004281 3d_train 19.003893 3d_test_p1 44.100000 3d_test_p2 37.500000
save best checkpoint
evaluating....
[42] time 4.12 lr 0.004239 3d_train 18.934756 3d_test_p1 45.600000 3d_test_p2 38.200000
evaluating....
[43] time 4.13 lr 0.004196 3d_train 18.869245 3d_test_p1 46.500000 3d_test_p2 38.500000
evaluating....
[44] time 4.13 lr 0.004154 3d_train 18.793367 3d_test_p1 44.300000 3d_test_p2 37.500000
evaluating....
[45] time 4.11 lr 0.004113 3d_train 18.724124 3d_test_p1 46.000000 3d_test_p2 38.400000
evaluating....
[46] time 4.08 lr 0.004072 3d_train 18.644237 3d_test_p1 43.600000 3d_test_p2 37.000000
save best checkpoint
evaluating....
[47] time 4.09 lr 0.004031 3d_train 18.576078 3d_test_p1 44.300000 3d_test_p2 37.900000
evaluating....
[48] time 4.10 lr 0.003991 3d_train 18.521755 3d_test_p1 44.600000 3d_test_p2 37.600000
evaluating....
[49] time 4.64 lr 0.003951 3d_train 18.450661 3d_test_p1 46.600000 3d_test_p2 38.500000
...
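As a side note, the lr column in the log above decays by a factor of 0.99 per epoch from an initial value of 0.0064, which can be checked directly (the decay factor is inferred from the log itself, not read from the repo's config):

```python
# Reproduce the lr column of the training log:
# lr(epoch) = init_lr * decay ** (epoch - 1), with epochs starting at 1.
init_lr, decay = 0.0064, 0.99

for epoch in range(1, 6):
    lr = init_lr * decay ** (epoch - 1)
    print(f"[{epoch}] lr {lr:.6f}")
# matches the log: 0.006400, 0.006336, 0.006273, 0.006210, 0.006148
```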

Evaluation:

Loading backbone from data/pretrained/coco/pose_hrnet_w32_256x192.pth

Loading checkpoint from checkpoint/best_epoch.bin
Loading data...
Trainable parameter count: 14094147
evaluating....
Directions p1: 36.63153281294129 p2: 33.322910581303375 e_vel: 9.164078815738552
Discussion p1: 39.82447826629832 p2: 34.028286791734374 e_vel: 8.781655082878878
Eating p1: 41.99778062325656 p2: 37.52106284586025 e_vel: 7.93242165380017
Greeting p1: 40.52818624621007 p2: 36.16558751165517 e_vel: 9.61580945074562
Phoning p1: 44.55775526013358 p2: 39.48139939858347 e_vel: 8.588618913769292
Posing p1: 35.120068948246164 p2: 30.16375467219442 e_vel: 7.4973565293475986
Purchases p1: 41.532423378916306 p2: 34.51261582311643 e_vel: 8.295511473545925
Sitting p1: 53.01805047344423 p2: 46.22330983671712 e_vel: 9.160967384168965
SittingDown p1: 66.42743196575857 p2: 57.5437042211559 e_vel: 11.916997125804917
Smoking p1: 42.44397760450428 p2: 37.35604246408777 e_vel: 8.832516694819118
TakingPhoto p1: 47.88065968744589 p2: 40.48026534455535 e_vel: 8.756812389985763
Waiting p1: 40.25246101332572 p2: 33.56289316008893 e_vel: 7.907669573507538
Walking p1: 32.0853940534577 p2: 26.509119530232343 e_vel: 9.03892060352246
WalkingDog p1: 43.24917765434277 p2: 38.02175195903479 e_vel: 9.992337857260921
WalkingTogether p1: 36.09475831218842 p2: 31.240653150390727 e_vel: 9.073780178528253
avg p1: 42.8 p2: 37.1 MPJVE: 8.97
Done.

A question about keypoints

Do keypoints_2d_cpn and keypoints_2d_cpn_crop represent the keypoints detected on the original image and the keypoints after cropping, respectively?
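For context, mapping detector keypoints from original-image pixels into a crop's frame is typically a shift-and-scale. A minimal sketch, where the bbox convention and crop size are illustrative assumptions, not the repo's exact code:

```python
def to_crop_coords(keypoints_2d, bbox, crop_size=(192, 256)):
    """Map full-image 2D keypoints into the frame of a resized crop.

    keypoints_2d: list of (x, y) in original-image pixels.
    bbox: (x0, y0, x1, y1) crop box in original-image pixels.
    crop_size: (width, height) of the resized crop.
    """
    x0, y0, x1, y1 = bbox
    sx = crop_size[0] / (x1 - x0)  # horizontal resize factor
    sy = crop_size[1] / (y1 - y0)  # vertical resize factor
    return [((x - x0) * sx, (y - y0) * sy) for x, y in keypoints_2d]

# A point at (50, 100) inside a 96x128 box at (40, 80), resized to 192x256:
print(to_crop_coords([(50, 100)], (40, 80, 136, 208)))  # [(20.0, 40.0)]
```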

Data loading speed

Thanks for your excellent work. Currently, data loading stalls after every k iterations, where k is related to the num_workers value I set for the data loader. I have reproduced this problem on different servers. May I ask what causes it?
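One common cause of a stall exactly every num_workers iterations: PyTorch's DataLoader hands out batches from its workers in round-robin order, so if one worker needs longer to prepare a batch than the training loop takes to consume num_workers batches, the loop blocks once per cycle. A toy simulation of that pattern (pure Python, not DataLoader internals):

```python
def simulate_waits(num_workers, load_time, step_time, n_iters):
    """Toy model: workers serve batches round-robin; each worker needs
    `load_time` seconds per batch, and a training step takes `step_time`.
    Returns how long the main loop blocks at each iteration."""
    ready_at = [load_time] * num_workers  # when each worker's first batch is ready
    now, waits = 0.0, []
    for i in range(n_iters):
        w = i % num_workers                 # round-robin source worker
        wait = max(0.0, ready_at[w] - now)  # block until that batch is ready
        waits.append(wait)
        now += wait + step_time             # consume the batch
        ready_at[w] = now + load_time       # worker starts loading its next batch
    return waits

# With 4 workers, 2.0 s loads and 0.1 s steps, the loop stalls at
# iterations 0, 4, 8, ... -- i.e. once every num_workers iterations.
waits = simulate_waits(4, 2.0, 0.1, 12)
```

If that pattern matches, increasing num_workers, raising prefetch_factor, or caching decoded images usually smooths it out; persistent_workers=True additionally avoids the worker re-spawn stall at epoch boundaries.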

About the number of samples in Human3.6M

Hi Qitao, thanks for your work comparing image-based methods and lifting methods.

I would like to know how many samples you used in the single-frame Human3.6M experiment.
I find that the 'h36m_validation.pkl' you provided contains 543,344 samples, and 'h36m_train.pkl' contains 1,559,752.
Do you use all samples for training and validation? If not, how do you downsample in your experiments?

Thanks!
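For reference, lifting-style pipelines often subsample frames with a fixed stride; whether and how this repo does so is exactly the question, but the mechanics amount to:

```python
def downsample(samples, stride):
    """Keep every `stride`-th sample. The stride value below is purely
    illustrative, not necessarily the one used in this repo."""
    return samples[::stride]

# 1,559,752 training samples at stride 4 would leave 389,938:
n_kept = len(downsample(list(range(1_559_752)), 4))
```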

Train/val on MPI-INF-3DHP

Thank you for your excellent work! I wonder if you would be willing to share the train/val dataloader for MPI-INF-3DHP.
Thank you in advance for considering my request.

cpn50_256x192.pth.tar

Thanks for your awesome work. Could you share the file 'data/pretrained/coco/CPN50_256x192.pth.tar' or provide a download link?
I would be very glad if you could help me.

Training/Testing on MPI-INF-3DHP

Hi @QitaoZhao,

thanks for your very interesting work!

Could you please clarify the following points:

  • did you train on the MPI-INF-3DHP training set and then evaluate on the test set?
    OR
  • did you directly evaluate your model pre-trained on Human3.6M on the MPI-INF-3DHP test set (cross-dataset scenario)?

Thanks in advance for your response.

Question about train.py

I followed the README to process the data, completed the preparation steps, and ran the training and testing commands. However, I got an error that human36m was not defined, so I could not train or test:

File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 560, in
  main(args)
File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 460, in main
  train_dataloader, val_dataloader, train_sampler, whole_val_dataloader, dist_size = setup_dataloaders(config, distributed_train=is_distributed, rank=rank, world_size=world_size)
File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 126, in setup_dataloaders
  train_dataloader, val_dataloader, train_sampler, dist_size = setup_human36m_dataloaders(config, is_train, distributed_train, rank, world_size)
File "/mnt/newdisk3/bzw/code/ContextAware-PoseFormer/ContextPose/train.py", line 57, in setup_human36m_dataloaders
  train_dataset = eval(config.dataset.train_dataset)(
NameError: name 'human36m' is not defined
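The root cause is visible in the trace: train.py instantiates the dataset via eval(config.dataset.train_dataset), so the config string must exactly match a class name available in train.py's namespace. A minimal sketch of the failure mode, with a registry as a safer alternative (the dataset class here is a hypothetical stand-in, not the repo's actual class):

```python
class Human36MDataset:  # hypothetical stand-in for the repo's dataset class
    def __init__(self, root):
        self.root = root

# eval() resolves the config string in the current namespace, so a value
# that is not a defined class name fails exactly as in the traceback:
try:
    eval("human36m")("data/h36m")
except NameError as err:
    msg = str(err)  # name 'human36m' is not defined

# A registry maps config strings to classes without eval():
DATASETS = {"human36m": Human36MDataset}
train_dataset = DATASETS["human36m"]("data/h36m")
```

Making the config value match the actual class name (or vice versa) resolves the NameError.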

A bug in data_prefetcher

Thank you for your excellent work! However, I found what may be a bug in your implementation.
First, you convert the absolute 3D keypoint ground truth to root-relative coordinates via
keypoints_3d_gt[:, :, 1:] -= keypoints_3d_gt[:, :, :1]
keypoints_3d_gt[:, :, 0] = 0
in ContextAware-PoseFormer/ContextPose/mvn/datasets/utils.py, line 44.
The 0th keypoint's coordinates are then set to 0, which causes an error when evaluating the results after an epoch: in the P_MPJPE loss in ContextAware-PoseFormer/ContextPose/mvn/models/loss.py, X0 is divided by 0, generating NaNs in the keypoint coordinates, which in turn raise an error in np.linalg.svd(H).
Could you please tell me how to resolve this error?
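For anyone hitting the same SVD crash: the usual fix is to guard the norms used in the Procrustes alignment before dividing. A hedged sketch of a P-MPJPE computation with such guards; it follows the common VideoPose3D-style formulation, not necessarily the repo's exact loss code:

```python
import numpy as np

def p_mpjpe_safe(predicted, target, eps=1e-8):
    """Procrustes-aligned MPJPE for a single pose, guarded against
    zero norms that would otherwise yield NaNs and break np.linalg.svd.
    predicted, target: (J, 3) float arrays."""
    muX, muY = target.mean(axis=0), predicted.mean(axis=0)
    X0, Y0 = target - muX, predicted - muY
    normX = np.sqrt((X0 ** 2).sum())
    normY = np.sqrt((Y0 ** 2).sum())
    X0 = X0 / max(normX, eps)  # guard: avoid division by zero
    Y0 = Y0 / max(normY, eps)
    U, s, Vt = np.linalg.svd(X0.T @ Y0)
    V = Vt.T
    R = V @ U.T
    # fix an improper rotation (reflection) if one was produced
    sign = np.sign(np.linalg.det(R))
    V[:, -1] *= sign
    s[-1] *= sign
    R = V @ U.T
    a = s.sum() * normX / max(normY, eps)  # optimal scale
    t = muX - a * (muY @ R)                # optimal translation
    aligned = a * (predicted @ R) + t
    return float(np.linalg.norm(aligned - target, axis=1).mean())
```

Clamping the norms with eps (or skipping degenerate all-zero poses) keeps NaNs from ever reaching np.linalg.svd.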
