Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li*, Xingao Li*, Changxing Ding, Xiangmin Xu

The paper has been accepted to CVPR 2024.

Preparation

Environment

  1. Install the dependencies.
pip install -r requirements.txt
  2. Clone and build CLIP.
git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..
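
After installing, it can help to confirm that the key packages are importable before starting a long run. The following is a minimal sketch, assuming the modules of interest are `torch` and `clip` (the module name installed by the CLIP repository). The helper `check_modules` is hypothetical and not part of DP-HOI.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; adjust the list to match requirements.txt.
    for name, ok in check_modules(["torch", "clip"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only looks the modules up and does not import them, so it stays fast even when the packages are large.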

Dataset

  1. haa500 dataset

  Download the haa500 dataset from the following URL and unzip it to the DP-HOI/pre_datasets folder.

http://xxxx

  Run pre_haa500.py:

python ./pre_datasets/pre_haa500.py

  Move the processed haa500 dataset to the DP-HOI/data folder.

  2. kinetics700 dataset

  Download the kinetics700 dataset from the following URL and unzip it to the DP-HOI/pre_datasets folder.

http://xxxx

  Run pre_kinetics700.py:

python ./pre_datasets/pre_kinetics700.py

  Move the processed kinetics700 dataset to the DP-HOI/data folder.

  3. flickr30k dataset

  Download the flickr30k dataset from the following URL and directly unzip it to the DP-HOI/data folder.

http://xxxx

  Move the processed JSON file DP-HOI/pre_datasets/train_flickr30k.json to the DP-HOI/data/flickr30k/annotations folder.

  4. vg dataset

  Download the vg dataset from the following URL and directly unzip it to the DP-HOI/data folder.

http://xxxx

  Move the processed JSON file DP-HOI/pre_datasets/train_vg.json to the DP-HOI/data/vg/annotations folder.

  5. objects365 dataset

  Download the objects365 dataset from the following URL and directly unzip it to the DP-HOI/data folder.

http://xxxx

  Move the processed JSON file DP-HOI/pre_datasets/train_objects365_10k.json to the DP-HOI/data/objects365/annotations folder.
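
The three annotation moves for flickr30k, vg, and objects365 can also be scripted. The following is a minimal sketch, assuming it is run against the DP-HOI root and that the JSON files sit directly in pre_datasets. The helper `place_annotations` and the `ANNOTATIONS` mapping are hypothetical names, not part of the repository.

```python
import shutil
from pathlib import Path

# Dataset folder name -> annotation file, as listed in the steps above.
ANNOTATIONS = {
    "flickr30k": "train_flickr30k.json",
    "vg": "train_vg.json",
    "objects365": "train_objects365_10k.json",
}


def place_annotations(root, annotations=ANNOTATIONS):
    """Move each JSON file from pre_datasets/ into data/<name>/annotations/.

    Returns the destination paths that were written. Files that are not
    present in pre_datasets/ are skipped.
    """
    root = Path(root)
    moved = []
    for name, filename in annotations.items():
        src = root / "pre_datasets" / filename
        if not src.is_file():
            continue
        dst_dir = root / "data" / name / "annotations"
        dst_dir.mkdir(parents=True, exist_ok=True)
        dst = dst_dir / filename
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


if __name__ == "__main__":
    for path in place_annotations("."):
        print(f"moved -> {path}")
```

Because missing files are skipped rather than treated as errors, the script can be re-run safely after a partial download.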

When you have completed the above steps, the pre-training dataset structure is:

DP-HOI
 |─ data
 |   |─ coco
 |   |   |─ annotations
 |   |   |   |─ trainval_hico.json
 |   |   |   └─ test_hico.json
 |   |   └─ images
 |   |       |─ test2015
 |   |       └─ train2015
 |   |─ objects365
 |   |   |─ annotations
 |   |   |   └─ train_objects365_10k.json
 |   |   └─ images
 |   |       └─ train2014
 |   |─ haa500
 |   |   |─ annotations
 |   |   |   └─ train_haa500_50k.json
 |   |   └─ images
 |   |       └─ train
 |   |─ kinetics700
 |   |   |─ annotations
 |   |   |   └─ train_kinetics700_10k.json
 |   |   └─ images
 |   |       └─ train
 |   |─ flickr30k
 |   |   |─ annotations
 |   |   |   └─ train_flickr30k.json
 |   |   └─ images
 |   |       └─ train
 |   └─ vg
 |       |─ annotations
 |       |   └─ train_vg.json
 |       └─ images
 |           └─ train
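
Before launching pre-training, the layout can be checked programmatically. The following is a minimal sketch that checks only the annotation files. The helper `missing_paths` and the `EXPECTED` list are hypothetical names, and the objects365 folder name follows the download step.

```python
from pathlib import Path

# Annotation files relative to the DP-HOI root; extend with image folders as needed.
EXPECTED = [
    "data/coco/annotations/trainval_hico.json",
    "data/coco/annotations/test_hico.json",
    # Folder name as written in the objects365 download step.
    "data/objects365/annotations/train_objects365_10k.json",
    "data/haa500/annotations/train_haa500_50k.json",
    "data/kinetics700/annotations/train_kinetics700_10k.json",
    "data/flickr30k/annotations/train_flickr30k.json",
    "data/vg/annotations/train_vg.json",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected annotation files are present.")
```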

Initial parameters

To speed up pre-training, consider initializing from DETR's pre-trained weights. Download the pre-trained DETR detector model for ResNet50 and put it in the params directory.

Pre-training

Once preparation is complete, start pre-training with the following command.

sh ./config/train.sh

Fine-tuning

After pre-training, you can start fine-tuning with the following commands. The example below fine-tunes HOICLIP on HICO-DET: it first converts the pre-trained parameters, then launches training.

python ./tools/convert_parameters.py \
        --finetune_model hoiclip \
        --load_path params/dphoi_res50_3layers.pth \
        --save_path params/dphoi_res50_hico_hoiclip.pth \
        --dataset hico \
        --num_queries 64 
sh ./scripts/finetune/hoiclip/train_hico.sh

Pre-trained model

You can also directly download the pre-trained model of DP-HOI for ResNet50.

Results

HICO-DET

| Method | Full (D) | Rare (D) | Non-rare (D) | Model | Config |
| --- | --- | --- | --- | --- | --- |
| ours (UPT) | 33.36 | 28.74 | 34.75 | model | config |
| ours (PViC) | 35.77 | 32.26 | 36.81 | model | config |
| ours (CDN-S) | 35.00 | 32.38 | 35.78 | model | config |
| ours (CDN-S+CCS*) | 35.38 | 34.61 | 35.61 | model | config |
| ours (HOICLIP) | 36.56 | 34.36 | 37.22 | model | config |

D: Default, †: DN strategy from DN-DETR, *: data augmentation strategy from DOQ.

The weights fine-tuned on HICO-DET for two-stage methods (e.g., UPT and PViC) can be downloaded here.

V-COCO

| Method | Scenario 1 | Model | Config |
| --- | --- | --- | --- |
| ours (GENs) | 66.6 | model | config |

Zero-shot HOI Detection Results

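| Method | Type | Unseen | Seen | Full | Model | Config |
| --- | --- | --- | --- | --- | --- | --- |
| ours (HOICLIP) | UV | 26.30 | 34.49 | 33.34 | model | config |
| ours (HOICLIP) | RF-UC | 30.49 | 36.17 | 35.03 | model | config |
| ours (HOICLIP) | NF-UC | 28.87 | 29.98 | 29.76 | model | config |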

Citation

Please consider citing our paper if it helps your research.

@inproceedings{disentangled_cvpr2024,
author = {Zhuolong Li and Xingao Li and Changxing Ding and Xiangmin Xu},
title = {Disentangled Pre-training for Human-Object Interaction Detection},
booktitle={CVPR},
year = {2024},
}

Acknowledgement

The code builds on DETR, DN-DETR, and CLIP. We thank the authors for their contributions.
