
unipose's Introduction

🤩 News

  • 2024.02.14: We updated a file to list all 1,237 classes in the UniKPT dataset.
  • 2023.11.28: We highlight UniPose's ability to detect the 68 face keypoints across arbitrary categories in this figure. The face keypoint definition follows this dataset.
  • 2023.11.9: Thanks to OpenXLab, you can try a quick online demo. Looking forward to your feedback!
  • 2023.11.1: We release the inference code, demo, checkpoints, and the annotations of the UniKPT dataset.
  • 2023.10.13: We release the arXiv version.

In-the-wild Test via UniPose

UniPose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.


Detecting any Face Keypoints:


🗒 TODO

  • Release inference code and demo.
  • Release checkpoints.
  • Release UniKPT annotations.
  • Release training code.

💡 Overview

• UniPose is the first end-to-end prompt-based keypoint detection framework.


• It supports multi-modal prompts, including textual and visual prompts, to detect arbitrary keypoints on any object, e.g., articulated, rigid, and soft objects.

Visual Prompts as Inputs:


Textual Prompts as Inputs:


🔨 Environment Setup

  1. Clone this repo
git clone https://github.com/IDEA-Research/UniPose.git
cd UniPose
  2. Install the required packages
pip install -r requirements.txt
  3. Compile the CUDA operators
cd models/UniPose/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..
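
Before running the demo, it can help to confirm that PyTorch actually sees your GPU; a mismatch here is a common cause of the "no kernel image is available for execution on the device" error reported in the issues further below. This check is only a suggestion, not part of the original setup steps:

# Optional sanity check: verify the PyTorch/CUDA/GPU combination used to build the ops.
import torch

print(torch.__version__, torch.version.cuda)   # PyTorch version and the CUDA toolkit it was built with
print(torch.cuda.is_available())               # should print True on a usable GPU machine
print(torch.cuda.get_device_capability(0))     # compute capability of GPU 0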

▶ Demo

1. Guidelines

• We have released the textual prompt-based branch for inference. As the visual prompt involves a substantial amount of user input, we are currently exploring more user-friendly platforms to support this functionality.

• Since UniPose has learned strong structural priors, it is best to use one of the predefined skeletons as the keypoint textual prompt; these are defined in predefined_keypoints.py (see the quick check after this list).

• If users don't provide a keypoint prompt, we try to match an appropriate skeleton based on the instance category. If that fails, we default to the animal skeleton, which covers a wider range of categories and testing requirements.
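
A quick way to see which predefined skeletons are available is to list the names defined in predefined_keypoints.py. This is a hedged sketch that only assumes the file is importable from the repository root; the exact structure of its entries is not shown here:

# List the top-level names defined in predefined_keypoints.py.
# Assumption: run from the UniPose repository root so the module is importable.
import predefined_keypoints

print([name for name in dir(predefined_keypoints) if not name.startswith("_")])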

2. Run

Replace {GPU ID}, image_you_want_to_test.jpg, and "dir you want to save the output" with appropriate values in the following command:

CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
  -c config/UniPose_SwinT.py \
  -p weights/unipose_swint.pth \
  -i image_you_want_to_test.jpg \
  -o "dir you want to save the output" \
  -t "instance categories" \
  -k "keypoint_skeleton_text"

• -t takes the instance category, e.g., "person", "face", "left hand", "horse", "car", "skirt", "table".
• -k is optional; if needed, select an entry from the predefined_keypoints.py file.

We also support inference through a Gradio app:

python app.py

Checkpoints

#  Name     Backbone  Keypoint AP on COCO  Checkpoint               Config
1  UniPose  Swin-T    74.4                 Google Drive / OpenXLab  GitHub Link
2  UniPose  Swin-L    76.8                 Coming Soon              Coming Soon

The UniKPT Dataset


Dataset          Keypoints  Classes  Images   Instances  Unified Images  Unified Instances
COCO             17         1        58,945   156,165    58,945          156,165
300W-Face        68         1        3,837    4,437      3,837           4,437
OneHand10K       21         1        11,703   11,289     2,000           2,000
Human-Art        17         1        50,000   123,131    50,000          123,131
AP-10K           17         54       10,015   13,028     10,015          13,028
APT-36K          17         30       36,000   53,006     36,000          53,006
MacaquePose      17         1        13,083   16,393     2,000           2,320
Animal Kingdom   23         850      33,099   33,099     33,099          33,099
AnimalWeb        9          332      22,451   21,921     22,451          21,921
Vinegar Fly      31         1        1,500    1,500      1,500           1,500
Desert Locust    34         1        700      700        700             700
Keypoint-5       55/31      5        8,649    8,649      2,000           2,000
MP-100           561/293    100      16,943   18,000     16,943          18,000
UniKPT           338        1,237    -        -          226,547         418,487

• UniKPT is a unified dataset built from 13 existing datasets and is intended only for non-commercial research purposes.

• All images in the UniKPT dataset originate from the datasets listed in the table above. To access these images, please download them from their original repositories.

• We provide annotations with precise textual descriptions of the keypoints for effective training. For convenience, the text annotations are available at the link.
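
The released annotations follow a COCO-style JSON layout with an 'images' list (as the issue example further below shows). A minimal sketch for inspecting a file, where unikpt_annotations.json is a placeholder name rather than the actual filename in the release:

import json

# Placeholder path: substitute the annotation file downloaded from the link above.
with open("unikpt_annotations.json") as f:
    data = json.load(f)

print(len(data["images"]), "images")
print(data["images"][0])  # fields such as file_name, width, height, id

# COCO-style files usually also carry an "annotations" list with keypoints;
# this key is an assumption based on the COCO format, so check before relying on it.
print(data.get("annotations", [])[:1])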

Citing UniPose

If you find this repository useful for your work, please consider citing it as follows:

@article{yang2023unipose,
  title={UniPose: Detecting Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  journal={arXiv preprint arXiv:2310.08530},
  year={2023}
}
@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}
@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}

unipose's People

Contributors

ailingzengzzz · yangjie-cv · guoqincode


unipose's Issues

Method name is copied from CVPR published Pose Estimation paper

Dear authors,

The name of your method "UniPose" refers to a Human Pose Estimation method that already exists and was published at CVPR 2020 (https://openaccess.thecvf.com/content_CVPR_2020/html/Artacho_UniPose_Unified_Human_Pose_Estimation_in_Single_Images_and_Videos_CVPR_2020_paper.html). In addition to using the exact same name for the same application (which is not allowed), there is also no proper citation of previous work in this area.

Please rename your method, respecting copyright and plagiarism rules for scientific publications.

Does the online demo support image prompt?

I noticed the online demo only supports the instance prompt. Does it provide an image prompt?
Also, what is the difference between "instance prompt" and "keypoint example"? When I upload a face image, enter "face" as the instance prompt, and "the left eye of the face" as the keypoint example, the result still shows all face keypoints.

Regarding Licence

Hello! Thanks for open-sourcing the great work! Could you please confirm the Licence associated with this repository and the models?

only input images

In the inference stage, can I get the keypoints of an object by providing only the input image?

How to distinguish which dataset an image belongs to in the UniKPT annotations

UniKPT is a collection of multiple datasets, but apart from COCO, there seems to be no way to tell from the annotations which dataset an image belongs to, as in the following example.
COCO images can be identified by "coco_url":

print(json['images'][0])
{'license': 3, 'file_name': '000000391895.jpg', 'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg', 'height': 360, 'width': 640, 'date_captured': '2013-11-14 11:18:45', 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', 'id': 391895}

others:

print(p_json['images'][120050])
{'flickr_url': 'unknown', 'coco_url': 'unknown', 'file_name': 'car_fifth2/images_jpg/2_07961.jpg', 'id': 200207961, 'license': 1, 'date_captured': 'unknown', 'width': 1920, 'height': 1080}
print(json['images'][150050])
{'coco_url': '', 'date_captured': '', 'file_name': '017071.jpg', 'flickr_url': '', 'id': 17071, 'license': 0, 'width': 752, 'height': 752}

error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device

Hello, I followed the steps you published on GitHub to run a test. The input image is a chair, and both the instance prompt and the keypoint prompt follow the predefined ones, but at test time the "pred_logits" variable contains many inf values, so no result can be produced. The run is also accompanied by the warning error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device. I would like to know whether these two are related and how to solve the problem.
Thank you very much in advance for your reply!

Code Release

Thanks for this very nice work!
I am wondering if there is a code release planned?

error loading the checkpoint

Hi @yangjie-cv, thanks for updating your work. I am running the inference_on_a_image.py file and hit this error:
[error screenshot]
I also checked the number of layers in the model and in the checkpoint; they are 1165 and 1015, respectively.
Can you double-check the code? Thanks!

About sketch style images

I notice that your paper demonstrates images across different image styles. How can I test the sketch elephant shown in your paper's demo? What are the specific instance prompt and other parameters?

About training

May I ask when the open-source training code will be available?
