idea-research / dino

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

License: Apache License 2.0

Python 62.53% Shell 0.59% C++ 0.81% Cuda 8.18% Jupyter Notebook 27.88%
computer-vision deep-learning object-detection

dino's People

Contributors

developer0hye, fengli-ust, haozhang534, ideacvr, rentainhe, slongliu


dino's Issues

about the improvements over the results in the paper

Hi, thanks for releasing the code of the excellent work (DINO).

It seems that the released code achieves better results than the ones in the DINO paper (49.0 mAP vs. 47.9 mAP for 12 epochs with DINO-4scale). Are there some further improvements provided in this repo?

It would be much appreciated if you could share more details about these improvements. Thanks in advance.

Further actions when fine tuning results are very poor

First, thank you for your great contribution.

I am trying to train on my custom dataset, starting from a model pretrained on the COCO dataset.
Following your suggestion in the issue, I added the parameters below; everything else was left at the defaults.

--pretrain_model_path /path/to/checkpoint/of/coco-pretrained-model
--finetune_ignore label_enc.weight class_embed
--options num_classes=7

Training proceeds, but performance is poor: the AP plateaus around 0.3.
(Screenshots, 2022-07-22: evaluation-metric curves for the two runs)

Blue is the curve from the pretraining run (DINO with a Swin-Large backbone trained on the COCO dataset); red is the fine-tuning run that uses this COCO-DINO-Swin model as its pretrained checkpoint.

(Screenshot, 2022-07-22: loss curves)

Looking a little more closely at the loss curves, the box loss drops as expected, but the loss tied to the class embedding, whose weights were ignored during fine-tuning, does not drop enough.

What additional parameters do I need to adjust to improve the performance of the model fine-tuned on my own dataset? Or what else should I check or try? Your suggestions would be much appreciated.

How to use lambda_1 and lambda_2 in CDN?

Hi,

CDN is better than DN in both theory and practice. However, I cannot find where lambda_1 and lambda_2, which are introduced for CDN, are used to create the noisy contrastive pairs. Do they act as 'dn_box_noise_scale', or as a multiplier on 'dn_box_noise_scale'?

Thanks.
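For reference, here is a conceptual sketch of the contrastive noising described in the paper, where positive queries receive noise with magnitude below lambda_1 and negative queries receive noise with magnitude between lambda_1 and lambda_2. This is only an illustration of the idea, not the repository's dn_components code; the function name and the scaling by box width/height are assumptions.

import torch

def make_cdn_pairs(boxes, lambda_1=0.5, lambda_2=1.0):
    # Conceptual sketch only. boxes: (N, 4) in normalized cxcywh format.
    # Noise is expressed relative to each box's width/height, as in DN-style
    # box noising (an assumption for this sketch).
    scale = boxes[:, 2:].repeat(1, 2)
    # Positive queries: perturbation magnitude below lambda_1.
    pos_noise = (torch.rand_like(boxes) * 2 - 1) * lambda_1
    # Negative queries: perturbation magnitude in [lambda_1, lambda_2).
    sign = torch.randint(0, 2, boxes.shape, device=boxes.device) * 2 - 1
    neg_noise = sign * (lambda_1 + torch.rand_like(boxes) * (lambda_2 - lambda_1))
    return boxes + pos_noise * scale, boxes + neg_noise * scale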

hflip and vflip

Hi there,

I notice that horizontal flip is used in the data augmentation. I am trying to add vertical flip and rotation as well, but I am wondering what these lines of code mean:

boxes = boxes[:, [2, 1, 0, 3]] * torch.as_tensor([-1, 1, -1, 1]) + torch.as_tensor([w, 0, w, 0])
target['masks'] = target['masks'].flip(-1)

(datasets/transforms.py)

If I want to flip vertically, what should I change for the boxes and masks? Also, what should I change if I want to rotate? Thanks for your help!
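For reference, the quoted box line implements a horizontal flip for boxes stored as [x0, y0, x1, y1]: it swaps the two x-coordinates and maps x to w - x. Below is a minimal sketch of the analogous vertical flip, assuming the same box format and masks stored as (N, H, W) tensors; this is an illustration, not code from the repository.

import torch

def vflip_boxes_and_masks(boxes, masks, h):
    # boxes: (N, 4) in [x0, y0, x1, y1]; a vertical flip maps y -> h - y,
    # so the new box is [x0, h - y1, x1, h - y0].
    boxes = boxes[:, [0, 3, 2, 1]] * torch.as_tensor([1, -1, 1, -1]) + torch.as_tensor([0, h, 0, h])
    # masks: (N, H, W); flip along the height dimension.
    masks = masks.flip(-2)
    return boxes, masks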

dn_pos_idx calculation for loss

        for i in range(len(targets)):
            if len(targets[i]['labels']) > 0:
                t = torch.arange(0, len(targets[i]['labels']) - 1).long().cuda()
                t = t.unsqueeze(0).repeat(scalar, 1)
                tgt_idx = t.flatten()
                output_idx = (torch.tensor(range(scalar)) * single_pad).long().cuda().unsqueeze(1) + t
                output_idx = output_idx.flatten()
            else:
                output_idx = tgt_idx = torch.tensor([]).long().cuda()

This is the code. I cannot understand the subtraction on the third line: why is the number of targets reduced by 1?
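One possible explanation (I believe the corresponding line in the repository uses the deprecated torch.range rather than torch.arange): torch.range includes its end value, so the "- 1" still yields one index per target, whereas torch.arange excludes the end value. A quick illustration, not taken from the repository:

import torch

n = 4
# torch.range is deprecated and includes its end value,
# so range(0, n - 1) still produces n indices: 0 .. n-1.
print(torch.range(0, n - 1).long())   # tensor([0, 1, 2, 3])
# torch.arange excludes its end value; the equivalent call drops the "- 1".
print(torch.arange(0, n))             # tensor([0, 1, 2, 3])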

weight

I am trying to use this model to train on my own dataset. If I do not load a pre-trained model, which weights are used for initialization? Thanks!

Compiling CUDA operators return error

Hello,

I run the following:
cd models/dino/ops
python setup.py build install

I get the following error:
ValueError: path '/comp_robot/liushilong/code/Deformable-DETR/models/ops/src/vision.cpp' cannot be absolute

Thank you in advance.
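As a quick sanity check once the build succeeds, the compiled extension can be imported directly. A small sketch; the module name is assumed from the Deformable-DETR lineage of these ops and may differ.

import torch  # import torch first so the extension's shared libraries resolve

try:
    import MultiScaleDeformableAttention as MSDA  # assumed extension name
    print("Compiled CUDA op found at:", MSDA.__file__)
except ImportError as exc:
    print("Extension not built/installed; rebuild under models/dino/ops:", exc)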

Attempt at implementing the LFT, CDN, and MQS modules

Thanks to the authors for their awesome work!
Although the source code is not available yet, I can't wait to try reproducing the modules.

I am submitting this issue to ask whether the LFT module can be implemented by modifying the following single line of code.

The original code in dn-dab-deformable-detr is:

intermediate_reference_points.append(reference_points)

The modified code is:

intermediate_reference_points.append(new_reference_points)

compile cuda error

thanks for your repo!

I get an error while compiling the CUDA operators. How can I fix it?

os:Win11
python:3.8.8
cuda:11.4


cd models/dino/ops
python setup.py build install

running build
running build_py
running build_ext
C:\Users\nomun\mysystem.venv\py388-DINO\lib\site-packages\torch\utils\cpp_extension.py:316: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] The system cannot find the file specified

thanks.

category_map_str

When reading the code in detail, I found a class called "label2compat" in datasets/coco.py. Do I need to change it when training the model on my own dataset (with a different number of classes)?

(Screenshot, 2022-07-27: the referenced code in datasets/coco.py)
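For context, a remapping like this typically exists because COCO category ids are non-contiguous, so they are compacted into 0-based training labels. Below is a minimal sketch of the general idea with a hypothetical custom id list; this is not the repository's code, and whether it needs changing depends on how your dataset's ids are laid out.

# Hypothetical custom dataset with 7 already-contiguous category ids.
category_ids = [1, 2, 3, 4, 5, 6, 7]
# Map raw annotation ids to compact 0-based training labels and back.
id2label = {cat_id: idx for idx, cat_id in enumerate(category_ids)}
label2id = {idx: cat_id for cat_id, idx in id2label.items()}
print(id2label)  # {1: 0, 2: 1, ..., 7: 6}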

RuntimeError: CUDA out of memory

Hi, I use the same settings as yours (total batch size 16 across 8 GPUs, DINO-4scale), but I still get a CUDA out-of-memory error.
Any suggestions on how to solve it?

Wallclock time comparison

Hello,

You post the number of training epochs when comparing with Deformable DETR and other architectures, but what about wall-clock time? Is this data available? Is the time per epoch roughly the same across DETR-like architectures?

thanks in advance

question about the model components in DINO

Hi, after reading the paper of the state-of-the-art detector DINO, I have one question about the details.

In Appendix D.3 (Detailed model components), the paper says: "we find the conditional queries used in DAB-DETR does not suit our model". What do the conditional queries refer to? Is it the idea of decoupling the object query into a content part and a position part (in this paper, you replace [q_c, q_p] with q_c + q_p)? Or the scale vector (you remove the scale vector from the positional encoding)?

Looking forward to a reply. Thanks in advance!

#BREAK

Hi,

When I run the code to evaluate the model using 'checkpoint0011_4scale.pth', the console prints the following:

...
Test: [ 0/234] eta: 0:07:10 class_error: 21.43 loss: 23.3856 (23.3856) ...
Test: [ 10/234] eta: 0:01:09 class_error: 100.00 loss: 27.6563 (28.1273) ...
BREAK!BREAK!BREAK!BREAK!BREAK!
Averaged stats: class_error: 75.00 loss: 27.6915 (28.0543) ...
Accumulating evaluation results...
DONE (t=0.12s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
.....

Why does this "BREAK" message appear?

inference per image

I am trying to use this amazing project for an object detection task on my own dataset. I am now confused about how to run inference on a single image or a whole dataset. Could you release sample code for inference? Thanks!
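For reference, a minimal single-image inference sketch in the style of this DETR-family codebase; the helper names (build_model_main, SLConfig, datasets.transforms) and the file paths are assumptions and may need adjusting to the repository's actual layout.

import torch
from PIL import Image

from main import build_model_main          # assumed entry point in this repo
from util.slconfig import SLConfig         # assumed config loader in this repo
import datasets.transforms as T            # DETR-style transforms

# Load config and checkpoint (paths are placeholders).
args = SLConfig.fromfile("config/DINO/DINO_4scale.py")
args.device = "cuda"
model, criterion, postprocessors = build_model_main(args)
ckpt = torch.load("checkpoint0011_4scale.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model = model.cuda().eval()

# Standard DETR-style preprocessing.
transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("demo.jpg").convert("RGB")
img, _ = transform(image, None)

with torch.no_grad():
    outputs = model(img[None].cuda())
    # Passing (1, 1) as the target size keeps the xyxy boxes normalized.
    results = postprocessors["bbox"](outputs, torch.tensor([[1.0, 1.0]]).cuda())[0]
print(results["scores"][:5], results["labels"][:5], results["boxes"][:5])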

ImageNet Pretrained Swin Weight

Hello, thanks for the release of the code of this amazing study.

I want to ask whether there are ImageNet-pretrained backbone weights for the Swin Transformer in this repo. They might be useful for replicating the paper's Swin Transformer results.

Thanks in advance.

metric for each class

Hi there,

Thanks for your amazing work! I recently trained this model on my own dataset, and I want to print metrics for each class (e.g. AP50 for class 1, AP50 for class 2, ...). How could I achieve this? Do you have any suggestions?
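For reference, per-class AP can be read out of pycocotools' COCOeval after evaluation. A sketch assuming you already have coco_gt (the COCO ground truth) and coco_dt (detections loaded with coco_gt.loadRes); those names are placeholders.

import numpy as np
from pycocotools.cocoeval import COCOeval

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

# precision has shape [T, R, K, A, M]: IoU thresholds, recall thresholds,
# categories, area ranges, and max-detection settings.
precision = coco_eval.eval["precision"]
iou50 = 0      # IoU=0.50 is the first threshold
area_all = 0   # area range "all"
max_dets = 2   # maxDets=100 is the third setting
for k, cat_id in enumerate(coco_gt.getCatIds()):
    p = precision[iou50, :, k, area_all, max_dets]
    ap50 = np.mean(p[p > -1]) if (p > -1).any() else float("nan")
    name = coco_gt.loadCats(cat_id)[0]["name"]
    print(f"AP50 for {name}: {ap50:.3f}")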

The effect of Objects365 Pretraining

Hi, thanks for your impressive work! DINO-SwinL achieves state-of-the-art performance on the COCO leaderboard. Congratulations!
I am wondering how much performance gain the Objects365 pretraining brings. Thank you.

Why are learnable content queries helpful in mixed query selection?

After reading the paper, I am confused about mixed query selection. I think the learnable content queries will mismatch the positional queries taken from the encoder output, so I am not sure where the improvement of mixed query selection comes from; simply setting the content queries to zeros might achieve better performance. Have you ever visualized the content queries after training? Looking forward to your reply.

training logs

Hi, thanks for the brilliant work!

I am wondering whether you could provide the training logs for your released checkpoints so that we can track the training progress. thanks!

best,

Trained checkpoint

When will the trained checkpoint with the SwinL backbone be released? Thanks!

What is the content query of CDN queries?

Hi, I'm confused about what the content query of CDN queries is. Is it the same as the normal learnable content queries?

Looking forward to your reply. Thanks in advance!

a question about use_ema in main.py

Hi there,

Thanks for your help and listening!

When I look at main.py, I notice that "use_ema" appears frequently; however, it is set to False in the config. What did you design it for, and for which tasks is it needed? If we use this project for object detection, do I need to set anything for this part?
(Screenshot, 2022-07-26: the use_ema code referenced above)
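For context, a use_ema flag usually controls an exponential moving average (EMA) of the model weights that is tracked during training and often evaluated instead of the raw weights. Below is a generic sketch of the idea, not the repository's implementation; the 0.9997 decay is only an illustrative value.

import copy
import torch

class SimpleEma:
    # Keep an exponential moving average of a model's parameters.
    def __init__(self, model, decay=0.9997):
        self.decay = decay
        self.ema = copy.deepcopy(model).eval()
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema = decay * ema + (1 - decay) * current, applied after each optimizer step.
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)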

memory leak

I ran into a memory leak: during the second epoch, the memory runs out even with 64 GB. Has anyone else been in the same situation?

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\20825\\AppData\\Local\\Temp\\tmpt11h2ul9\\tmp2fdhjty4.py'

Thanks for your great work.
Recently I wanted to train my own dataset:
bash scripts/DINO_train.sh COCODIR
This resulted in the following error:

Not using distributed mode
Loading config file from config/DINO/DINO_4scale.py
Traceback (most recent call last):
File "main.py", line 398, in
main(args)
File "main.py", line 100, in main
cfg = SLConfig.fromfile(args.config_file)
File "C:\Users\20825\Desktop\DINO-main\util\slconfig.py", line 188, in fromfile
cfg_dict, cfg_text = SLConfig._file2dict(filename)
File "C:\Users\20825\Desktop\DINO-main\util\slconfig.py", line 86, in _file2dict
shutil.copyfile(filename,
File "C:\anaconda\envs\dino_detr\lib\shutil.py", line 264, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: 'C:\Users\20825\AppData\Local\Temp\tmpt11h2ul9\tmp2fdhjty4.py'
Thank you for your answer

obj365 checkpoint

Dear authors,

Thanks for the excellent work.
May I ask whether you plan to release the Objects365 pretraining checkpoints?

Thanks.

P-R graph

Hi there,

Thanks for your amazing work again! I am trying to draw a PR curve to evaluate our model's results. However, I cannot find where the results are stored, and I do not know how to start. Do you have any suggestions?
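For reference, precision-recall points for a single class can be read off COCOeval's accumulated precision array and plotted. A sketch assuming an already-evaluated and accumulated coco_eval object (the names here are placeholders).

import matplotlib.pyplot as plt

# precision has shape [T, R, K, A, M]; recThrs holds the 101 recall points.
precision = coco_eval.eval["precision"]
recall_pts = coco_eval.params.recThrs

k = 0                           # index of the class to plot
p = precision[0, :, k, 0, 2]    # IoU=0.50, area=all, maxDets=100
valid = p > -1                  # -1 marks recall levels that were never reached
plt.plot(recall_pts[valid], p[valid])
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("P-R curve at IoU=0.50")
plt.show()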

The description of Deformable DETR is wrong

(Screenshot of the quoted passage from the paper)

Deformable DETR doesn't formulate the queries as anchor points.

Actually, the query formulation of Deformable DETR is the same as the original DETR, i.e. a set of learned embeddings.

Instead, Deformable DETR predicts the reference points/anchor points from the queries.

Question about batch_size

Hi there,

I am trying to change the batch_size in the config file to fit my dataset. However, when I set batch_size above 10, I get a runtime error. Is there anything special about how batch_size should be set?

CUDA indexSelectLargeIndex problem

Hi,

Thanks for your amazing work!

When I try to train on the Objects365 dataset, a CUDA error is triggered: '/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [281,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.'
at https://github.com/IDEACVR/DINO/blob/24f3567e162c75f0323cf4a1ed2d5bf6e36bee52/models/dino/dn_components.py#L70
Here are some error messages that may be useful: (screenshot).
Here is my config: (screenshot).

Looking forward to your reply.

Fine-tuning guide for a custom dataset using a pretrained COCO model

I am trying to train a DINO model and then fine-tune it on my own dataset.
I trained DINO on the COCO dataset for 12 epochs with the Swin-Large 4-scale config,
then loaded those weights as a pretrained model to fine-tune on my custom dataset.

Because the dataset I am fine-tuning on has a different num_classes, I will have to change the number of output classes in the prediction layer before starting the fine-tuning run.

How should I tweak the config for reuse, or which part of the code should be modified?
I'd appreciate a guide.

Regards,
Keanu
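For reference, the general idea behind ignoring the classification-head weights when loading a COCO checkpoint into a model with a different num_classes looks roughly like the sketch below. This is illustrative, not the repository's --finetune_ignore code path; the checkpoint path is a placeholder and `model` is assumed to be the already-built DINO model.

import torch

ckpt = torch.load("coco_pretrained.pth", map_location="cpu")["model"]

# Drop keys tied to the COCO class count so the new heads keep their fresh init.
ignore_keywords = ("label_enc.weight", "class_embed")
filtered = {k: v for k, v in ckpt.items()
            if not any(word in k for word in ignore_keywords)}

missing, unexpected = model.load_state_dict(filtered, strict=False)
print(f"skipped {len(ckpt) - len(filtered)} keys; "
      f"{len(missing)} keys keep their fresh initialization")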
