Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li^*, Xingao Li^*, Changxing Ding, Xiangmin Xu

The paper has been accepted to CVPR 2024.

Preparation

Environment

Install the dependencies.

pip install -r requirements.txt

Clone and build CLIP.

git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..
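Before preparing the data, it can help to confirm that the key packages can be found in the active environment. The following is a minimal sketch. The module names it checks (`torch`, `torchvision`, `clip`) are an assumption about what `requirements.txt` and the CLIP build provide, so adjust the list to match your setup.

```python
import importlib.util
from typing import Iterable, List

# Assumed module list; edit it to match requirements.txt and the CLIP build.
REQUIRED_MODULES = ["torch", "torchvision", "clip"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Only top-level names are checked, because `find_spec` on a dotted name imports the parent package as a side effect.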

Dataset

haa500 dataset

Download the haa500 dataset from the following URL and unzip it to the DP-HOI/pre_datasets folder.

http://xxxx

Run pre_haa500.py.

python ./pre_datasets/pre_haa500.py

Move the processed haa500 dataset to the DP-HOI/data folder.
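The move step can also be scripted. This is a minimal sketch that assumes pre_haa500.py leaves its output in a folder named `haa500` inside `DP-HOI/pre_datasets`; the actual output folder name is not stated here, so check it before running. The helper `move_processed_dataset` is hypothetical.

```python
import shutil
from pathlib import Path


def move_processed_dataset(repo_root: Path, name: str) -> Path:
    """Move <repo_root>/pre_datasets/<name> to <repo_root>/data/<name>.

    Hypothetical helper: assumes the preprocessing script left its output
    in a folder named after the dataset inside pre_datasets.
    """
    src = repo_root / "pre_datasets" / name
    dst = repo_root / "data" / name
    if not src.is_dir():
        raise FileNotFoundError(f"processed dataset not found: {src}")
    if dst.exists():
        # Refuse to nest into or overwrite a previous copy.
        raise FileExistsError(f"destination already exists: {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)  # create DP-HOI/data if needed
    shutil.move(str(src), str(dst))
    return dst
```

From the DP-HOI root, `move_processed_dataset(Path("."), "haa500")` performs the move. Under the same assumption about the output folder name, passing `"kinetics700"` covers the analogous kinetics700 step.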

kinetics700 dataset

Download the kinetics700 dataset from the following URL and unzip it to the DP-HOI/pre_datasets folder.

http://xxxx

Run pre_kinetics700.py.

python ./pre_datasets/pre_kinetics700.py

Move the processed kinetics700 dataset to the DP-HOI/data folder.
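A quick sanity check after moving the data is to confirm that the annotation file loads and contains a plausible number of records. The on-disk format of train_kinetics700_10k.json is not documented here, so this sketch assumes either a top-level JSON list of records or a COCO-style dict with an `images` list. The helper `count_annotation_records` is hypothetical.

```python
import json
from pathlib import Path


def count_annotation_records(path: Path) -> int:
    """Count records in an annotation file (hypothetical helper).

    Assumes either a top-level list of per-image records or a
    COCO-style dict that holds an "images" list.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        return len(data)
    if isinstance(data, dict) and isinstance(data.get("images"), list):
        return len(data["images"])
    raise ValueError(f"unrecognised annotation layout in {path}")
```

For example, `count_annotation_records(Path("data/kinetics700/annotations/train_kinetics700_10k.json"))` run from the DP-HOI root; a `ValueError` means the file uses a layout other than the two assumed ones.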

flickr30k dataset

Download the flickr30k dataset from the following URL and directly unzip it to the DP-HOI/data folder.

http://xxxx

Move the processed JSON file DP-HOI/pre_datasets/train_flickr30k.json to the DP-HOI/data/flickr30k/annotations folder.

vg dataset

Download the vg dataset from the following URL and directly unzip it to the DP-HOI/data folder.

http://xxxx

Move the processed JSON file DP-HOI/pre_datasets/train_vg.json to the DP-HOI/data/vg/annotations folder.

objects365 dataset

Download the objects365 dataset from the following URL and directly unzip it to the DP-HOI/data folder.

http://xxxx

Move the processed JSON file DP-HOI/pre_datasets/train_objects365_10k.json to the DP-HOI/data/objects365/annotations folder.
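The three JSON moves for flickr30k, vg, and objects365 follow the same pattern, so they can be done in one pass. A minimal sketch, assuming it is run from the DP-HOI root; the source and destination paths are copied from the steps above, and the helper name `move_annotation_files` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# (source JSON, destination annotations folder), relative to the DP-HOI root.
JSON_MOVES = [
    ("pre_datasets/train_flickr30k.json", "data/flickr30k/annotations"),
    ("pre_datasets/train_vg.json", "data/vg/annotations"),
    ("pre_datasets/train_objects365_10k.json", "data/objects365/annotations"),
]


def move_annotation_files(repo_root: Path) -> List[Path]:
    """Move each processed JSON file into its dataset's annotations folder."""
    moved = []
    for src_rel, dst_rel in JSON_MOVES:
        src = repo_root / src_rel
        dst_dir = repo_root / dst_rel
        dst_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        dst = dst_dir / src.name
        shutil.move(str(src), str(dst))  # raises if the source file is absent
        moved.append(dst)
    return moved
```

Calling `move_annotation_files(Path("."))` from the DP-HOI root performs all three moves.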

After completing the above steps, the dataset directory should be structured as follows:

DP-HOI
 └─ data
     |─ coco
     |   |─ annotations
     |   |   |─ trainval_hico.json
     |   |   └─ test_hico.json
     |   └─ images
     |       |─ test2015
     |       └─ train2015
     |─ objects365
     |   |─ annotations
     |   |   └─ train_objects365_10k.json
     |   └─ images
     |       └─ train2014
     |─ haa500
     |   |─ annotations
     |   |   └─ train_haa500_50k.json
     |   └─ images
     |       └─ train
     |─ kinetics700
     |   |─ annotations
     |   |   └─ train_kinetics700_10k.json
     |   └─ images
     |       └─ train
     |─ flickr30k
     |   |─ annotations
     |   |   └─ train_flickr30k.json
     |   └─ images
     |       └─ train
     └─ vg
         |─ annotations
         |   └─ train_vg.json
         └─ images
             └─ train
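To confirm the layout before launching pre-training, a short check can list any missing annotation files. This sketch only checks the annotation JSON files named in the directory tree, using the folder names from the move instructions above; the list is an assumption to update if your files differ.

```python
from pathlib import Path
from typing import List

# Annotation files expected under DP-HOI/data (taken from the directory tree).
EXPECTED_FILES = [
    "coco/annotations/trainval_hico.json",
    "coco/annotations/test_hico.json",
    "objects365/annotations/train_objects365_10k.json",
    "haa500/annotations/train_haa500_50k.json",
    "kinetics700/annotations/train_kinetics700_10k.json",
    "flickr30k/annotations/train_flickr30k.json",
    "vg/annotations/train_vg.json",
]


def missing_files(data_root: Path) -> List[str]:
    """Return the expected annotation paths that do not exist under data_root."""
    return [rel for rel in EXPECTED_FILES if not (data_root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("data"))
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  data/" + rel)
    else:
        print("All expected annotation files are present.")
```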

Initial parameters

To speed up pre-training, consider initializing from DETR's pre-trained weights. Download the pre-trained DETR detector for ResNet-50 and put it in the params directory.
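The DETR repository publishes its ResNet-50 checkpoint at https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth. The sketch below fetches it into `params/` with the standard library and skips the download when the file is already present. Whether DP-HOI's training scripts expect exactly this file name is an assumption; check the path used in ./config/train.sh. The helper `fetch_detr_weights` is hypothetical.

```python
import urllib.request
from pathlib import Path

# DETR R50 checkpoint from the DETR model zoo; the expected local name is an assumption.
DETR_R50_URL = "https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth"


def fetch_detr_weights(params_dir: Path, url: str = DETR_R50_URL) -> Path:
    """Download url into params_dir unless a file with that name already exists."""
    params_dir.mkdir(parents=True, exist_ok=True)
    dest = params_dir / url.rsplit("/", 1)[-1]
    if dest.is_file():
        return dest  # already downloaded; skip the network call
    # Download to a temporary name first so an interrupted transfer is not
    # mistaken for a complete checkpoint on the next run.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)
    return dest
```

Running `fetch_detr_weights(Path("params"))` from the DP-HOI root downloads the checkpoint once; later calls return the existing path.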

Pre-training

After preparation, start pre-training with the following command.

sh ./config/train.sh

Fine-tuning

After pre-training, you can fine-tune with the following commands. The example below first converts the pre-trained parameters for HOICLIP and then fine-tunes HOICLIP on HICO-DET.

python ./tools/convert_parameters.py \
        --finetune_model hoiclip \
        --load_path params/dphoi_res50_3layers.pth \
        --save_path params/dphoi_res50_hico_hoiclip.pth \
        --dataset hico \
        --num_queries 64

sh ./scripts/finetune/hoiclip/train_hico.sh
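How tools/convert_parameters.py maps the weights is not described here. As a rough, hypothetical illustration of what a checkpoint conversion step of this kind often involves, the sketch below renames parameter-name prefixes and keeps only the first `num_queries` rows of a query-embedding entry. The prefix mapping and the key name `query_embed.weight` (the name DETR uses for its query embeddings) are assumptions, not the script's actual behaviour.

```python
from typing import Any, Dict, Mapping


def convert_state_dict(
    state: Mapping[str, Any],
    prefix_map: Mapping[str, str],
    num_queries: int,
    query_key: str = "query_embed.weight",
) -> Dict[str, Any]:
    """Illustrative checkpoint conversion (hypothetical, not DP-HOI's code).

    Renames any key that starts with an old prefix in prefix_map and keeps only
    the first num_queries rows of the entry named query_key.
    """
    converted: Dict[str, Any] = {}
    for key, value in state.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break  # apply at most one prefix rule per key
        if new_key == query_key:
            # Slicing keeps the leading rows of a tensor or a list alike.
            value = value[:num_queries]
        converted[new_key] = value
    return converted


# Hypothetical example: a 100-query checkpoint trimmed to 64 queries.
example = {"transformer.encoder.weight": 1.0, "query_embed.weight": list(range(100))}
out = convert_state_dict(example, {"transformer.": "hoi_transformer."}, num_queries=64)
print(sorted(out), len(out["query_embed.weight"]))
```

Because `value[:num_queries]` behaves the same on a PyTorch tensor, the same function could be applied to a loaded state dict.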

Pre-trained model

You can also directly download the pre-trained DP-HOI model for ResNet-50.

Results

HICO-DET

Method	Full (D)	Rare (D)	Non-rare (D)	Model	Config
ours (UPT)	33.36	28.74	34.75	model	config
ours (PViC)	35.77	32.26	36.81	model	config
ours (CDN-S^†)	35.00	32.38	35.78	model	config
ours (CDN-S^†+CCS^*)	35.38	34.61	35.61	model	config
ours (HOICLIP)	36.56	34.36	37.22	model	config

D: Default setting (scores are mAP, %), †: DN strategy from DN-DETR, *: data augmentation strategy from DOQ. The weights fine-tuned on HICO-DET for two-stage methods (e.g., UPT and PViC) can be downloaded here.
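On HICO-DET, Full is the mean AP over all 600 HOI categories, while Rare and Non-rare average over the 138 categories with fewer than 10 training instances and the remaining 462. Full is therefore the category-weighted mean of the other two columns, which gives a quick consistency check on the table above. The sketch below recomputes it from the reported Rare and Non-rare values.

```python
# HICO-DET category split: 138 rare + 462 non-rare = 600 HOI categories.
N_RARE, N_NON_RARE = 138, 462


def full_map(rare: float, non_rare: float) -> float:
    """Category-weighted mean of Rare and Non-rare mAP."""
    return (N_RARE * rare + N_NON_RARE * non_rare) / (N_RARE + N_NON_RARE)


# (Full, Rare, Non-rare) rows copied from the HICO-DET table.
ROWS = {
    "UPT": (33.36, 28.74, 34.75),
    "PViC": (35.77, 32.26, 36.81),
    "CDN-S†": (35.00, 32.38, 35.78),
    "CDN-S†+CCS*": (35.38, 34.61, 35.61),
    "HOICLIP": (36.56, 34.36, 37.22),
}

for name, (full, rare, non_rare) in ROWS.items():
    print(f"{name}: reported {full:.2f}, recomputed {full_map(rare, non_rare):.2f}")
```

Each recomputed value lands within 0.01 of the reported Full score, with the small gaps coming from rounding of the per-split numbers.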

V-COCO

Method	Scenario 1 (role AP)	Model	Config
ours (GEN_s)	66.6	model	config

Zero-shot HOI Detection Results

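Method	Type	Unseen	Seen	Full	Model	Config
ours (HOICLIP)	UV	26.30	34.49	33.34	model	config
ours (HOICLIP)	RF-UC	30.49	36.17	35.03	model	config
ours (HOICLIP)	NF-UC	28.87	29.98	29.76	model	config

UV: unseen verb, RF-UC: rare-first unseen combination, NF-UC: non-rare-first unseen combination.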
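In the zero-shot settings (UV: unseen verb, RF-UC and NF-UC: rare-first and non-rare-first unseen combination), Full is again a category-weighted mean, this time of the Unseen and Seen columns. This sketch assumes the standard HICO-DET zero-shot splits: 84 unseen of 600 categories for UV, and 120 unseen for RF-UC and NF-UC. These counts come from the common protocol rather than from this README.

```python
# Assumed (unseen, seen) HOI category counts out of 600 for each setting.
SPLITS = {
    "UV": (84, 516),      # 20 unseen verbs cover 84 HOI categories
    "RF-UC": (120, 480),  # 120 unseen combinations, rare categories first
    "NF-UC": (120, 480),  # 120 unseen combinations, non-rare categories first
}

# (Unseen, Seen, Full) rows copied from the zero-shot table.
RESULTS = {
    "UV": (26.30, 34.49, 33.34),
    "RF-UC": (30.49, 36.17, 35.03),
    "NF-UC": (28.87, 29.98, 29.76),
}


def weighted_full(unseen: float, seen: float, n_unseen: int, n_seen: int) -> float:
    """Full mAP as the category-weighted mean of Unseen and Seen mAP."""
    return (n_unseen * unseen + n_seen * seen) / (n_unseen + n_seen)


for setting, (unseen, seen, full) in RESULTS.items():
    recomputed = weighted_full(unseen, seen, *SPLITS[setting])
    print(f"{setting}: reported {full:.2f}, recomputed {recomputed:.2f}")
```

All three settings agree with the reported Full score to within 0.01, which supports the assumed split sizes.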

Citation

Please consider citing our paper if it helps your research.

@inproceedings{disentangled_cvpr2024,
author = {Zhuolong Li and Xingao Li and Changxing Ding and Xiangmin Xu},
title = {Disentangled Pre-training for Human-Object Interaction Detection},
booktitle={CVPR},
year = {2024},
}

Acknowledgement

The code is built upon DETR, DN-DETR, and CLIP. We thank the authors for their contributions.
