gengzigang / pct

This is an official implementation of our CVPR 2023 paper "Human Pose as Compositional Tokens" (https://arxiv.org/pdf/2303.11638.pdf)

License: MIT License

Python 99.71% Shell 0.29%

pct's Issues

Wanted to try it out, but I still can't get the environment working after a long time

Traceback (most recent call last):
  File "./tools/train.py", line 15, in <module>
    from mmpose.apis import train_model
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/apis/__init__.py", line 2, in <module>
    from .inference import (collect_multi_frames, inference_bottom_up_pose_model,
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/apis/inference.py", line 17, in <module>
    from mmpose.datasets.dataset_info import DatasetInfo
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/datasets/__init__.py", line 7, in <module>
    from .datasets import ( # isort:skip
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/datasets/datasets/__init__.py", line 2, in <module>
    from ...deprecated import (TopDownFreiHandDataset, TopDownOneHand10KDataset,
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/deprecated.py", line 5, in <module>
    from .datasets.datasets.base import Kpt2dSviewRgbImgTopDownDataset
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/datasets/datasets/base/__init__.py", line 2, in <module>
    from .kpt_2d_sview_rgb_img_bottom_up_dataset import
  File "/root/miniconda3/lib/python3.8/site-packages/mmpose/datasets/datasets/base/kpt_2d_sview_rgb_img_bottom_up_dataset.py", line 8, in <module>
    from xtcocotools.coco import COCO
  File "/root/miniconda3/lib/python3.8/site-packages/xtcocotools/coco.py", line 58, in <module>
    from . import mask as maskUtils
  File "/root/miniconda3/lib/python3.8/site-packages/xtcocotools/mask.py", line 3, in <module>
    import xtcocotools._mask as _mask
  File "xtcocotools/_mask.pyx", line 1, in init xtcocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
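
A "numpy.ndarray size changed" error is commonly reported when a compiled extension (here xtcocotools._mask) was built against a different NumPy version than the one currently installed. The sketch below is only a diagnostic aid, not a fix from the repository authors: it prints the installed versions and tries the failing import in isolation. The helper names installed_version and try_import are made up for this example.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def try_import(module_name):
    """Import a module and return (ok, message) instead of raising."""
    try:
        importlib.import_module(module_name)
        return True, "imported cleanly"
    except Exception as exc:  # a binary mismatch surfaces as ValueError
        return False, f"{type(exc).__name__}: {exc}"
```

Calling installed_version("numpy"), installed_version("xtcocotools") and try_import("xtcocotools._mask") in the failing environment shows which versions are paired. If the import fails on its own, reinstalling xtcocotools so that it is built against the installed NumPy, or changing the NumPy version, is the usual direction to try.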

mmcv version problem

When I run the command ./tools/dist_train.sh configs/pct_[base/large/huge]_classifier.py 8, I get the error: AssertionError: MMCV==1.7.0 is used but incompatible. Please install mmcv>=2.0.0rc4, <=2.1.0.
I then uninstalled the current mmcv and installed version 2.0.0rc4, but hit another error: ImportError: cannot import name 'Config' from 'mmcv'. How can I solve this?
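
The two errors together suggest that packages from different OpenMMLab generations are mixed in one environment: in mmcv 2.x, Config lives in the separate mmengine package, so code written for mmcv 1.x can no longer import it from mmcv. Another issue on this page reports that mmcv 1.7.0 with mmpose 0.29.0 works. The sketch below compares an environment against that pair; the distribution name mmcv-full and the helper names are assumptions made for illustration.

```python
from importlib import metadata

# Versions reported as working in another issue on this page (assumption:
# they still apply, and mmcv 1.x is installed under the name "mmcv-full").
REPORTED_WORKING = {"mmcv-full": "1.7.0", "mmpose": "0.29.0"}


def collect_installed(packages):
    """Map each distribution name to its installed version (None if missing)."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, expected):
    """Return {name: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }
```

Running find_mismatches(collect_installed(REPORTED_WORKING), REPORTED_WORKING) lists the packages that deviate from the reported pair. An empty result means the environment matches it.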

information about the backbone

Hi, could I get more information about the Swin backbone you used? Is it a modified version of Swin Transformer? I tried counting the parameters of the backbone alone and got only about 8000, which is very different from the model originally provided by Microsoft, which has around a billion parameters. Could you please give me more details about the Swin backbone you used?
Number of parameters?
FLOPs? (only the backbone)
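
When a count looks implausibly small, grouping it by submodule can show whether only part of the network was counted. The sketch below works on a plain mapping from tensor name to shape, so it needs no deep learning library. The tensor names and shapes are hypothetical and do not come from the PCT checkpoints.

```python
from collections import defaultdict
from math import prod


def count_by_prefix(shapes, depth=1):
    """Sum element counts of named tensors, grouped by the first `depth` name parts."""
    totals = defaultdict(int)
    for name, shape in shapes.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += prod(shape)
    return dict(totals)


# Hypothetical tensor names and shapes, for illustration only.
example = {
    "backbone.patch_embed.proj.weight": (128, 3, 4, 4),
    "backbone.patch_embed.proj.bias": (128,),
    "keypoint_head.fc.weight": (10, 128),
}
print(count_by_prefix(example))
```

With PyTorch, such a mapping can be built from a model as {n: tuple(p.shape) for n, p in model.named_parameters()}. Raising depth breaks the totals down further, for example per backbone stage.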

Distributed training

Can this model only be trained in distributed mode? How can I use it without distributed training?

Question about the dimension of token feature

Hello, thanks for the appealing idea in this paper.
In the paper, the dimension of each token feature is H and the dimension of each embedding in the codebook is N.
Is H equal to N?
Eq. 2 computes the distance between these two vectors, so their dimensions should be the same.
Looking forward to your reply!
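
The point about matching dimensions can be illustrated with a generic nearest-neighbour lookup. This is a minimal sketch of vector quantization in general, not the paper's implementation, and the toy codebook values are invented.

```python
def squared_distance(u, v):
    """Squared Euclidean distance; only defined for equal-length vectors."""
    if len(u) != len(v):
        raise ValueError(f"dimension mismatch: {len(u)} vs {len(v)}")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(token_feature, codebook):
    """Return the index of the codebook entry closest to token_feature."""
    distances = [squared_distance(token_feature, entry) for entry in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)


# Toy values (hypothetical): a 3-entry codebook of 2-dimensional embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
index = quantize([0.9, 1.2], codebook)
```

If the token feature had a different length from the codebook entries, squared_distance would raise, which is the constraint the question refers to.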

About Ablation

Hi, I've been reading your paper. Good work!
However, there are two terms I don't quite understand. In Section 4.5 (ablation study), you mention Image Guidance and the auxiliary Pose Reconstruction Loss. I don't know what they refer to, since I'm new to this field. Could you explain?

onnx

Is conversion to ONNX supported?

Question about well-trained checkpoint

Hi, I've been trying to use your checkpoint for a private project. I wanted to see whether I could train my own PCT model, so I trained it on COCO only. When I plugged the resulting model into my project, it yielded subpar results.
Which datasets was the publicly available checkpoint trained on, @Gengzigang?
Looking forward to your reply.

model

Why are all the functions in builder.py commented out? I spent ages setting up the environment, thinking the problem was on my side.

h36m dataset and result reproduction

Hi, thanks for your insightful work. I was able to reproduce the paper's results on the COCO dataset, and I am now trying to reproduce the results on H36M. However, I cannot find any sample code or instructions for that. Could you please share the code or explain how you ran the H36M experiments?

How to train and test on an MPII dataset?

Thank you for sharing the code. I am a beginner in human pose estimation. I noticed that your method achieves very good results on the MPII dataset, so I would like to use it there. But COCO and MPII are not quite the same. How can I train the model on MPII data? I hope to receive some advice, thank you!

chumpy installation

I am having a problem installing the mmpose module. I need to use chumpy, which does not recognize pip. What can I do to solve this problem? Has anybody else run into this?

Errors in version package

I'm trying to train the model with Python 3.10.10 on Kaggle. Can anyone help me choose the right versions of all the packages in requirement.txt? Thanks.

score of each keypoint is same

Hi, thanks for opening such a great project.

I ran multiple inferences using your pretrained swin_large model, and every keypoint gets the same score, even when that does not seem right.

How can I fix this so that I can seek out keypoints that are not in the image?

FLOPS params

How to make the model work on Windows

Does this project support inference on video?

Questions about codebook implementation

Hi, could you explain how you initialize a new codebook? Furthermore, in your paper you state that the codebook is updated using an exponential moving average of previous tokens. Does this step happen in Stage I (i.e., training the encoder), or is it an additional stage before training the tokenizer?
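
For readers unfamiliar with the term, an exponential moving average update of a codebook entry, in the style popularized by VQ-VAE, can be sketched as follows. This is a generic illustration and not confirmed to match the repository's code. The decay and eps values are assumptions.

```python
def ema_codebook_update(cluster_size, embed_sum, assigned, decay=0.99, eps=1e-5):
    """One EMA step for a single codebook entry.

    cluster_size: running (smoothed) count of features assigned to this entry
    embed_sum:    running (smoothed) sum of those features
    assigned:     list of feature vectors assigned to the entry in this batch
    Returns the updated (cluster_size, embed_sum, entry).
    """
    dim = len(embed_sum)
    # Sum of the features that picked this entry in the current batch.
    batch_sum = [sum(vec[d] for vec in assigned) for d in range(dim)]
    # Blend the old statistics with the batch statistics.
    new_size = decay * cluster_size + (1.0 - decay) * len(assigned)
    new_sum = [decay * s + (1.0 - decay) * b for s, b in zip(embed_sum, batch_sum)]
    # The entry is the smoothed mean; eps guards against division by zero.
    entry = [s / (new_size + eps) for s in new_sum]
    return new_size, new_sum, entry
```

In this scheme the entries are not trained by gradient descent. They track the encoder outputs assigned to them, so the update runs alongside encoder training rather than as a separate stage. Whether PCT follows exactly this schedule is the open question of the issue.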

add the other dataset to improve the performance

Have you tried adding the CrowdPose, OCHuman, and SyncOCC datasets when training the PCT model, to improve the performance of 2D pose estimation?

I have a question about training on the MPII dataset. How do I train on it?

My idea is to simply change the config and the data directory to point to the MPII dataset. Is this right? Is the MPII dataset almost the same as the COCO dataset?

How to inference 3D pose?

Hi, nice work. Thank you for sharing the code.
I have run the demo, but it produces only 2D poses.
I am trying to get whole-body (e.g., COCO-WholeBody) 3D poses.
How can I get whole-body or 3D poses?

Thank you

When training the classifier, are the weights of the decoder and codebook not frozen?

Hi, when training the classifier, are the weights of the decoder and codebook not frozen? I could not find any step in the code that freezes these weights.

Does this model only work well with the Swin backbone?

I tried changing the backbone to see how the model performs. I retrained my backbone (ResNet) with heatmap supervision on COCO, as described in the repository's documentation, then trained the tokenizer and the classifier, but the final model has poor precision (AP = 0.150 and [email protected] = 0.4). Is there something I missed, or does the model only work with Swin?

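Looking for advice

I have now trained the model and successfully run dist_test.sh. What should I do to reproduce the results shown in the demo, i.e., how do I output images with the extracted keypoints drawn on them?
I would appreciate any advice.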

KeyError: 'PCT is not in the models registry'

When I run your demo, I got that error. How can I solve it?

The output score of a keypoint

The output score of a keypoint is greater than 1. What does the output score mean?

how to train the classifier model

Hello @Gengzigang
I tried using a lightweight backbone to train the tokenizer and classifier, but I got a bad result. Is there something I missed?

The result of tokenizer model:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.965
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.990

The result of classifier model:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.365
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.733

I changed the backbone and the optimizer in the classifier config:
optimizer = dict(type='AdamW', lr=8e-4, betas=(0.9, 0.999), weight_decay=0.05)

When I start training the classifier model, the AP only goes from 0.005 to 0.007. So how should the classifier model be trained?

Thank you for taking the time to answer.

How to train separately on an MPII dataset? Can you share the configuration file?

what about model's inference

@Gengzigang how did you run the model's inference on a single image? Is there a file missing from your repository? I checked the one from mmpose, but it didn't work for me. Can you share your working environment for inference?

How to train the classifier model?

According to the paper and code, the backbone should be frozen for faster training. However, the provided classifier model does not have the same parameters as the provided backbone, which makes it confusing how to train the final classifier model.
The final classifier model reaches only 0.343 mAP on the COCO dataset with "pct_base_woimgguide_classifier.py".
{"mode": "val", "epoch": 210, "iter": 67, "lr": 0.0, "AP": 0.3434, "AP .5": 0.68824, "AP .75": 0.30028, "AP (M)": 0.33237, "AP (L)": 0.36357, "AR": 0.38416, "AR .5": 0.71678, "AR .75": 0.36288, "AR (M)": 0.3646, "AR (L)": 0.41226}.

It will be very helpful if this question could be solved, looking forward to your response.

What is the quantization error in Stage 1?

What is the quantization error in Stage 1? I could not seem to find the relevant data in the paper.

Module not found error!

user/PCT/tools/train.py", line 19, in
from models import build_posenet
ModuleNotFoundError: No module named 'models'

I am facing this issue even though there is a models folder under PCT.
How can I fix it?
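
When a script is started as python tools/train.py, Python puts the script's own directory (tools) on sys.path, not the repository root, so a models folder sitting next to tools is not importable unless the root is added. The sketch below shows one way to add it. The helper name is made up, and the layout (models and tools side by side under PCT) is assumed from the description above.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file, levels_up=1):
    """Put the directory `levels_up` above the script's folder on sys.path."""
    repo_root = Path(script_file).resolve().parents[levels_up]
    root_str = str(repo_root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return repo_root
```

Calling add_repo_root_to_path(__file__) near the top of tools/train.py, before the line from models import build_posenet, would make the import resolvable. Setting the PYTHONPATH environment variable to the repository root before launching the script has the same effect without editing any file.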

All keypoints sharing the same confidence in pose_results

When I ran demo_img_with_mmdet.py, I checked the output pose_results from inference_top_down_pose_model(). All the 17 keypoints share the same confidence value, which seems to be the aggregated confidence of all 17 keypoints. Could you fix the issue to show individual confidence of each keypoint in the demo script?

Curiosity about Model Choice: Swin-based vs. ViTPose with PCT

Hello @Gengzigang and team,

The idea of representing human pose as compositional tokens (PCT) is both unique and compelling. Modeling the relationship between keypoints in such a structured manner is pretty inspiring.

However, I have a question regarding your model choice. I noticed that you opted for a Swin-based model for implementation. Given the current success and traction of ViTPose, I'm curious as to why you didn't choose to integrate PCT directly with ViTPose. Was there a specific reason or advantage for preferring the Swin-based model over ViTPose when incorporating PCT?

Thank you for taking the time to answer. I'm eager to delve deeper into your work and truly appreciate the effort you've put into this research. Looking forward to your insights!

Warm regards,
Jia-Yau

Question about classifier training

@Gengzigang In your paper, you say that during classifier training you fix the backbone to save computation cost, so only the classification head is updated.
My problem is that I changed the backbone, I have enough computing power, and I want to update the backbone during classifier training. I explored your code but could not find where this is specified so that I can change it. Could you please give me some hints?

Upgrade mmcv/mmpose version to newest

First, thank you for this awesome project.

According to issue #1 , mmcv and mmpose are respectively set to 1.7.0 and 0.29.0. With these versions, the code works perfectly well, but since I need to use other models provided by mmpose (using latest version, i.e. > 1.0.0), I have compatibility issues with these two projects.

So, I was wondering whether you intend to upgrade the current project to the newest versions of mmcv/mmpose in the near future. I know that this is a really painful task (that's why I'm not willing to do it myself), so I'm not trying to force anything on you. I just want to know in advance whether you are planning to make these changes or not.