amazon-science / omni-detr

PyTorch implementation of Omni-DETR for omni-supervised object detection: https://arxiv.org/abs/2203.16089

License: Other

Shell 1.84% Python 84.29% C++ 1.25% Cuda 12.61%
object-detection omni-supervised-learning semi-supervised-learning weakly-supervised-learning

omni-detr's Introduction

Omni-DETR: Omni-Supervised Object Detection with Transformers

This is the PyTorch implementation of the Omni-DETR paper. It is a unified framework to use different types of weak annotations for object detection.

If you use the code, models, or results of this repository, please cite:

@inproceedings{wang2022omni,
  author    = {Pei Wang and Zhaowei Cai and Hao Yang and Gurumurthy Swaminathan and Nuno Vasconcelos and Bernt Schiele and Stefano Soatto},
  title     = {Omni-DETR: Omni-Supervised Object Detection with Transformers},
  booktitle = {CVPR},
  year      = {2022}
}

Installation

First, install PyTorch and torchvision. We have tested with version 1.8.1, but other versions (no earlier than 1.5.1) should also work.

Our implementation is partially based on Deformable DETR. Please follow its instructions for the remaining requirements.

Usage

Dataset organization

Please organize each dataset as follows:

code_root/
├── coco/
│   ├── train2017/
│   ├── val2017/
│   ├── train2014/
│   ├── val2014/
│   └── annotations/
│       ├── instances_train2017.json
│       ├── instances_val2017.json
│       ├── instances_valminusminival2014.json
│       └── instances_train2014.json
├── voc/
│   └── VOCdevkit/
│       ├── VOC2007trainval/
│       │   ├── Annotations/
│       │   └── JPEGImages/
│       ├── VOC2012trainval/
│       │   ├── Annotations/
│       │   └── JPEGImages/
│       ├── VOC2007test/
│       │   ├── Annotations/
│       │   └── JPEGImages/
│       └── VOC20072012trainval/
│           ├── Annotations/
│           └── JPEGImages/
├── objects365/
│   ├── train_objects365/
│   │   ├── objects365_v1_00000000.jpg
│   │   └── ...
│   ├── val_objects365/
│   │   ├── objects365_v1_00000016.jpg
│   │   └── ...
│   └── annotations/
│       ├── objects365_train.json
│       └── objects365_val.json
├── bees/
│   └── ML-Data/
└── crowdhuman/
    ├── Images/
    │   ├── 273271,1a0d6000b9e1f5b7.jpg
    │   └── ...
    ├── annotation_train.odgt
    └── annotation_val.odgt

Dataset preparation

First, go to the scripts folder:

cd scripts

COCO

To get the split labeled and omni-labeled datasets

python split_dataset_coco_omni.py
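For reference, the split step partitions the images of a COCO-style annotation file into a fully labeled subset and an omni-labeled subset. The sketch below only illustrates this idea; the actual arguments, ratios, and output names of split_dataset_coco_omni.py differ (the ratio and function name here are assumptions, the seed 1709 is taken from the generated file names).

# Illustrative only: partition a COCO-style annotation file by image into a
# fully labeled subset and an omni-labeled subset. The real logic lives in
# scripts/split_dataset_coco_omni.py; ratio and function name are assumed.
import json
import random

def split_coco(ann_file, label_ratio=0.5, seed=1709):
    with open(ann_file) as f:
        coco = json.load(f)

    image_ids = [img['id'] for img in coco['images']]
    random.Random(seed).shuffle(image_ids)
    label_ids = set(image_ids[:int(len(image_ids) * label_ratio)])

    def subset(keep_labeled):
        return {
            'categories': coco['categories'],
            'images': [im for im in coco['images']
                       if (im['id'] in label_ids) == keep_labeled],
            'annotations': [a for a in coco['annotations']
                            if (a['image_id'] in label_ids) == keep_labeled],
        }

    return subset(True), subset(False)  # (labeled, omni-labeled)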

Add indicator to coco val set

python add_indicator_to_coco2017_val.py
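The indicator scripts tag each image with its annotation type so the training code knows how each image is supervised. A minimal sketch of the idea follows; the field name, value, and output path are assumptions, not the actual implementation of add_indicator_to_coco2017_val.py.

# Illustrative sketch: mark every image in a COCO-style json with an
# annotation-type indicator. Field name, value, and output path are assumed.
import json

def add_indicator(src_json, dst_json, indicator='fully'):
    with open(src_json) as f:
        coco = json.load(f)
    for img in coco['images']:
        img['indicator'] = indicator  # hypothetical field: supervision type
    with open(dst_json, 'w') as f:
        json.dump(coco, f)

# hypothetical output name; the real script chooses its own
add_indicator('../coco/annotations/instances_val2017.json',
              '../coco/annotations/instances_val2017_w_indicator.json')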

For the experiments compared with UFO, prepare the COCO 2014 set:

python add_indicator_to_coco2014.py

VOC

First, convert the annotation format to COCO style:

python VOC2COCO.py --xml_dir ../voc/VOCdevkit/VOC2007trainval/Annotations --json_file ../voc/VOCdevkit/VOC2007trainval/instances_VOC_trainval2007.json
python VOC2COCO.py --xml_dir ../voc/VOCdevkit/VOC2007test/Annotations --json_file ../voc/VOCdevkit/VOC2007test/instances_VOC_test2007.json
python VOC2COCO.py --xml_dir ../voc/VOCdevkit/VOC2012trainval/Annotations --json_file ../voc/VOCdevkit/VOC2012trainval/instances_VOC_trainval2012.json
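VOC2COCO.py converts the VOC XML annotations into a single COCO-style JSON. Roughly, each object entry becomes one COCO annotation, with the corner-coordinate box (xmin, ymin, xmax, ymax) rewritten as [x, y, width, height]. The snippet below is a simplified illustration of that conversion for a single XML file, not the script itself.

# Simplified illustration of the VOC -> COCO box conversion.
# VOC stores corner coordinates (xmin, ymin, xmax, ymax); COCO expects
# [x, y, width, height]. category_ids maps class name -> category id.
import xml.etree.ElementTree as ET

def voc_objects_to_coco(xml_path, image_id, category_ids, start_ann_id=0):
    anns = []
    root = ET.parse(xml_path).getroot()
    for i, obj in enumerate(root.findall('object')):
        name = obj.find('name').text
        box = obj.find('bndbox')
        xmin = float(box.find('xmin').text)
        ymin = float(box.find('ymin').text)
        xmax = float(box.find('xmax').text)
        ymax = float(box.find('ymax').text)
        w, h = xmax - xmin, ymax - ymin
        anns.append({
            'id': start_ann_id + i,
            'image_id': image_id,
            'category_id': category_ids[name],
            'bbox': [xmin, ymin, w, h],
            'area': w * h,
            'iscrowd': 0,
        })
    return anns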

Then combine the VOC07 and VOC12 annotations:

python combine_voc_trainval20072012.py

Add indicator to voc07 and 12

python prepare_voc_dataset.py

To get the split labeled and omni-labeled datasets

python split_dataset_voc_omni.py

Objects365

First, sample a subset from the original training set:

python prepare_objects365_for_omni.py

Add indicator to val

python add_indicator_to_objects365val.py

To get the split labeled and omni-labeled datasets

python split_dataset_objects365_omni.py

Bees

Because the official training set contains some broken images (with names from Erlen_Erlen_Hive_04_1264.jpg to Erlen_Erlen_Hive_04_1842.jpg), we first need to delete them manually or run

xargs rm -r < file_list_to_remove.txt
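Alternatively, a minimal Python sketch can delete the broken range by filename (assuming the images live under ../bees/ML-Data/ as in the layout above):

# Remove the broken bee images Erlen_Erlen_Hive_04_1264.jpg ... _1842.jpg.
# The ML-Data path is assumed from the dataset layout above.
import os

data_dir = '../bees/ML-Data'
for idx in range(1264, 1843):
    path = os.path.join(data_dir, f'Erlen_Erlen_Hive_04_{idx}.jpg')
    if os.path.exists(path):
        os.remove(path)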

This leaves 3596 samples. Next, convert the annotation format to COCO style:

python Bees2COCO.py

To split the data into training and validation sets at an 8:2 ratio:

python split_bees_train_val.py

To get the split labeled and omni-labeled datasets

python split_dataset_bees_omni.py

CrowdHuman

Please first convert the odgt annotations to COCO format by following the linked repo, or run

python convert_crowdhuman_to_coco.py

Because we only focus on full-body detection on CrowdHuman, we first extract those annotations:

python build_crowdhuman_dataset.py
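For context, each line of an .odgt file is a JSON object whose gtboxes carry head, visible-body, and full-body boxes; build_crowdhuman_dataset.py keeps only the full-body ones. The sketch below illustrates that filtering using the standard CrowdHuman odgt keys; the actual script logic may differ.

# Sketch: read CrowdHuman odgt lines and keep only full-body ("fbox")
# person boxes. Illustrative only, not the script itself.
import json

def load_fullbody_boxes(odgt_path):
    # Each odgt line is one image record: {"ID": ..., "gtboxes": [...]}
    records = []
    with open(odgt_path) as f:
        for line in f:
            rec = json.loads(line)
            boxes = []
            for gt in rec.get('gtboxes', []):
                if gt.get('tag') != 'person':
                    continue  # skip 'mask' (ignore-region) entries
                if gt.get('extra', {}).get('ignore', 0) == 1:
                    continue  # skip boxes explicitly marked as ignore
                boxes.append(gt['fbox'])  # full-body box, [x, y, w, h]
            records.append({'id': rec['ID'], 'fboxes': boxes})
    return records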

To get the split labeled and omni-labeled datasets

python split_dataset_crowdhuman_omni.py

Training Omni-DETR

After preparing the datasets, please change the arguments in the config files, such as annotation_json_label and annotation_json_unlabel, to match the names of the JSON files generated above. The BURN_IN_STEP argument sometimes also needs to be changed (please refer to our supplementary materials); in our experiments, this hyperparameter does not have a large impact on the results.

Because semi-supervised learning is just a special case of omni-supervised learning, to generate semi-supervised results, set the ratios of fully_labeled and Unsup as desired and set all other ratios to 0 when splitting the dataset.
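For illustration, the split can be viewed as a set of percentages over the annotation types that appear in the generated JSON file names. A 50% semi-supervised split would look like the following; how these values are actually passed to the split scripts may differ, so treat the names and layout as illustrative only.

# Illustrative only: percentages per annotation type, using the type names
# that appear in the generated json file names. For a 50% semi-supervised
# split, only 'fully' and 'Unsup' are non-zero.
semi_50_split = {
    'fully': 50,   # fully labeled with boxes and classes
    'Unsup': 50,   # unlabeled images
    'tagsU': 0,
    'tagsK': 0,
    'pointsU': 0,
    'pointsK': 0,
    'boxesEC': 0,
    'boxesU': 0,
}
assert sum(semi_50_split.values()) == 100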

Training Omni-DETR on each dataset (from the repo main folder)

Training from scratch

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_coco.sh
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_voc.sh
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_objects.sh
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_bees.sh
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_omni_crowdhuman.sh

Training from Deformable DETR

Because our burn-in stage is exactly the same as Deformable DETR training, it is fine to start from a Deformable DETR checkpoint and skip the burn-in stage. Just modify the resume argument in the config files above.

Before running the above scripts, you may have to make them executable:

chmod u+x ./tools/run_dist_launch.sh
chmod u+x ./configs/r50_ut_detr_omni_coco.sh

Training under the setting of COCO35to80

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_tagsU_ufo.sh
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_point_ufo.sh

Training under the setting of VOC07to12

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_ut_detr_voc07to12_semi.sh

Note

  1. Some of our experiments use 800-pixel images on 8 GPUs with 32 GB of memory each. If such memory is not available, change the pixels argument to 600; training then works on 8 GPUs with 16 GB of memory.
  2. This code may show some minor accuracy differences from our paper due to implementation changes made after the paper submission.

License

This project is under the Apache-2.0 license. See LICENSE for details.

omni-detr's People

Contributors

amazon-auto, zhaoweicai


omni-detr's Issues

Custom Dataset

Thanks for the amazing work.

I would like to know your recommendations on how to train on a custom dataset with a smaller number of classes (say 3); the number of queries would then also need to be changed. I have tried this with Deformable DETR and it does not let me change the number of queries even after removing the classification head. I wonder if you fixed this issue.

Any recommendation on what to change in the code for a custom dataset is much appreciated.

Is segmentation required?

Dear Authors,

I have a specific question about split_dataset_coco_omni.py

I do not have any segmentations in my annotations, only bboxes, and the code does not work without segmentations. Are the other scripts and the rest of the code written under the assumption that segmentations exist? I could convert bboxes to segmentations, but that should not be necessary, since the code should support bbox-only annotations.

My ann['segmentation'] is empty, that is [],
but then I get an out-of-range error during annToMask.

Please help.

BURN_IN_STEP

Dear Authors, can you explain what exactly the burn-in step is? How does it affect training? What are the extreme values you have tested? How is the burn-in step related to other variables?

And why does my training get an out-of-memory error exactly at the burn-in step, even after changing the pixel size from 600 to 400 and adjusting the code accordingly?

This is the error that I get right at the burn-in epoch; the code runs fine until then. Even when I move the BURN_IN_STEP to the 2nd epoch, for instance, it still goes out of memory. Do you have any idea why?

return forward_call(*input, **kwargs)
  File "/home/say/NEmo/omni-detr/models/deformable_transformer.py", line 221, in forward
    src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
  File "/home/say/.conda/envs/deformable/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/say/NEmo/omni-detr/models/ops/modules/ms_deform_attn.py", line 105, in forward
    + sampling_offsets / offset_normalizer[None, None, None, :, None, :]
RuntimeError: CUDA out of memory. Tried to allocate .....

Thank you

code of forward propagation is incorrect

In the function train_one_epoch_semi, the input to the model is "sample", which is a list of tensors; the shape of each tensor is (b, c, h, w), so tensor_list[0].ndim should be 4, but this repo only supports the "if tensor_list[0].ndim == 3" case. I guess that sample may instead be composed of tensors with shape (c, h, w), with len(sample) == batch size?

some questions about batchsize.

The batch size is defined here.
parser.add_argument('--batch_size', default=1, type=int)

But why is the batch size of batch_sampler_train_burnin fixed at 2?
batch_sampler_train_label = torch.utils.data.BatchSampler(sampler_train_label, args.batch_size, drop_last=True)
batch_sampler_train_unlabel = torch.utils.data.BatchSampler(sampler_train_unlabel, args.batch_size, drop_last=True)
batch_sampler_train_burnin = torch.utils.data.BatchSampler(sampler_train_burnin, 2, drop_last=True)

In the paper, the batch size is set to 16, so which one do I need to modify?

All models of Deformable DETR are trained with total batch size of 16.

Why do we need to add tensor_label_k as input during SSOD?

The code block is as follow:
https://github.com/amazon-science/omni-detr/blob/main/engine.py#L184-L198

Why do we need to add tensor_label_k as input during SSOD?
When samples has tensor_label_q, tensor_label_k, and tensor_unlabel_q appended as input, the CUDA memory usage keeps increasing until it runs out of memory.

The CUDA out of memory information is as follows:
RuntimeError: CUDA out of memory. Tried to allocate 506.00 MiB (GPU 1; 31.75 GiB total capacity; 27.74 GiB already allocated; 424.00 MiB free; 29.83 GiB reserved in total by PyTorch)

RuntimeError: CUDA out of memory. Tried to allocate 1.97 GiB (GPU 0; 79.35 GiB total capacity; 56.13 GiB already allocated; 1.38 GiB free; 57.79 GiB reserved in total by PyTorch)

pretrained weights

Hi. Great work!
Is it possible to provide us with some pre-trained weights so that we can play around?

Confusion about unlabeled data

Dear Authors,

Thanks for the paper. I used the scripts to split the data into labeled and unlabeled (omni-labeled) sets, but looking at the unlabeled set, I don't understand why bounding boxes still exist in the unlabeled dataset. What if I want to omni-label brand new data? The whole point of omni-labeling is not to create bounding boxes, right? So why does the ...unlabel...json still have bbox and segmentation fully populated? I know that on the images side the label type is reflected (Unsup, tagsU, etc.), but on the annotation side it confuses me a lot to see full annotations for what is supposed to be tag-only or no supervision.

I have these 2 settings:
..._omni_unlabel_seed1709_10fully0Unsup90tagsU0tagsK0pointsU0pointsK0boxesEC0boxesU.json
..._omni_unlabel_seed1709_50fully50Unsup0tagsU0tagsK0pointsU0pointsK0boxesEC0boxesU.json

"annotations": [ { "area": 625848.0, "bbox": [ 684, 174, 1068, 586 ], "category_id": 1, "id": 1, "image_id": 0, "iscrowd": 0, "point": [ 1200.0, 346.0 ], "segmentation": [ [ 684, 174, 1752, 174, 1752, 760, 684, 760 ] ] }, {

Which model should I use after training, the teacher or the student?

Thanks for the amazing work.
After training for several epochs in SSOD mode, I found that the performance of the teacher model was much better than the student's.
I wonder whether this phenomenon is normal or not? If it is normal, which model should I use, the teacher or the student?
Thanks.

CUDA out of memory after BURN_IN_STEP

The code works at the BURN_IN_STEP stage with pixels set to 800 and 32 GB of GPU memory. However, it runs out of memory during semi-supervised learning with 32 GB or even 80 GB of GPU memory, even after reducing pixels to 600.

The CUDA out of memory information is as follows:
RuntimeError: CUDA out of memory. Tried to allocate 506.00 MiB (GPU 1; 31.75 GiB total capacity; 27.74 GiB already allocated; 424.00 MiB free; 29.83 GiB reserved in total by PyTorch)
RuntimeError: CUDA out of memory. Tried to allocate 1.97 GiB (GPU 0; 79.35 GiB total capacity; 56.13 GiB already allocated; 1.38 GiB free; 57.79 GiB reserved in total by PyTorch)

What's the problem? Any help will be appreciated.
@zhaoweicai

The role of the 'indicators' ?

What is the role of the indicator in network propagation? I've found that apart from serving as an indicator during dataset partitioning, it has no effect when calculating the loss and the bipartite matching, as its value always remains 1.

if indicators == None:
    num_batch = outputs['pred_logits'].shape[0]
    indicators = [1 for i in range(num_batch)]

How many epochs do I need to train in SSOD mode

Hi thanks for sharing your great project!

If I want to reproduce the SSOD results (in Table 3), could you please tell me how to set the number of epochs? Will it still be 150, just like you wrote in r50_ut_detr_omni_coco.sh?

I’m looking forward to hearing from you.
