idea-research / dino

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

License: Apache License 2.0

Python 62.53% Shell 0.59% C++ 0.81% Cuda 8.18% Jupyter Notebook 27.88%
computer-vision deep-learning object-detection

dino's People

Contributors

developer0hye, fengli-ust, haozhang534, ideacvr, rentainhe, slongliu


dino's Issues

about the improvements over the results in the paper

Hi, thanks for releasing the code of the excellent work (DINO).

It seems that the released code achieves better results than the ones in the DINO paper (49.0 mAP vs. 47.9 mAP for 12 epochs with DINO-4scale). Are there some further improvements provided in this repo?

It would be much appreciated if you could share more details about these improvements. Thanks in advance.

Further actions when fine tuning results are very poor

First, thank you for your great contribution.

I am trying to train on my custom dataset, starting from a model pretrained on the COCO dataset.
Following your suggestion in the issue, I added the parameters below; everything else was left at the defaults.

--pretrain_model_path /path/to/checkpoint/of/coco-pretrained-model
--finetune_ignore label_enc.weight class_embed
--options num_classes=7

Training proceeds, but performance is poor: the AP plateaus around 0.3.
(Screenshots, 2022-07-22: evaluation-metric curves for the two runs)

Blue is the curve from the pretraining run (DINO with a Swin-Large backbone trained on the COCO dataset); red is the fine-tuning run that uses this COCO-DINO-Swin model as its pretrained checkpoint.

(Screenshot, 2022-07-22: loss curves)

Looking a little more closely at the loss curves, the box loss drops as expected, but the loss tied to the class embedding, whose weights were ignored during fine-tuning, does not drop enough.

What additional parameters do I need to adjust to improve the performance of the model fine-tuned on my own dataset? Or what else should I check or try? Your suggestions would be much appreciated.

How to use lambda_1 and lambda_2 in CDN?

Hi,

CDN is better than DN in both theory and practice. However, I cannot find where lambda_1 and lambda_2, which are introduced for CDN, are used to create the noisy contrastive pairs. Do they act as 'dn_box_noise_scale', or as a multiplier on 'dn_box_noise_scale'?

Thanks.
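For reference, here is a conceptual sketch of the contrastive noising described in the paper, where positive queries receive noise with magnitude below lambda_1 and negative queries receive noise with magnitude between lambda_1 and lambda_2. This is only an illustration of the idea, not the repository's dn_components code; the function name and the scaling by box width/height are assumptions.

import torch

def make_cdn_pairs(boxes, lambda_1=0.5, lambda_2=1.0):
    # Conceptual sketch only. boxes: (N, 4) in normalized cxcywh format.
    # Noise is expressed relative to each box's width/height, as in DN-style
    # box noising (an assumption for this sketch).
    scale = boxes[:, 2:].repeat(1, 2)
    # Positive queries: perturbation magnitude below lambda_1.
    pos_noise = (torch.rand_like(boxes) * 2 - 1) * lambda_1
    # Negative queries: perturbation magnitude in [lambda_1, lambda_2).
    sign = torch.randint(0, 2, boxes.shape, device=boxes.device) * 2 - 1
    neg_noise = sign * (lambda_1 + torch.rand_like(boxes) * (lambda_2 - lambda_1))
    return boxes + pos_noise * scale, boxes + neg_noise * scale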

hflip and vflip

Hi there,

I notice that horizontal flip is used in the data augmentation. I am trying to add vertical flip and rotation as well, but I am wondering what these lines of code mean:

boxes = boxes[:, [2, 1, 0, 3]] * torch.as_tensor([-1, 1, -1, 1]) + torch.as_tensor([w, 0, w, 0])
target['masks'] = target['masks'].flip(-1)

(datasets/transforms.py)

If I want to flip vertically, what should I change for the boxes and masks? Also, what should I change if I want to rotate? Thanks for your help!
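For reference, the quoted box line implements a horizontal flip for boxes stored as [x0, y0, x1, y1]: it swaps the two x-coordinates and maps x to w - x. Below is a minimal sketch of the analogous vertical flip, assuming the same box format and masks stored as (N, H, W) tensors; this is an illustration, not code from the repository.

import torch

def vflip_boxes_and_masks(boxes, masks, h):
    # boxes: (N, 4) in [x0, y0, x1, y1]; a vertical flip maps y -> h - y,
    # so the new box is [x0, h - y1, x1, h - y0].
    boxes = boxes[:, [0, 3, 2, 1]] * torch.as_tensor([1, -1, 1, -1]) + torch.as_tensor([0, h, 0, h])
    # masks: (N, H, W); flip along the height dimension.
    masks = masks.flip(-2)
    return boxes, masks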

dn_pos_idx calculation for loss

        for i in range(len(targets)):
            if len(targets[i]['labels']) > 0:
                t = torch.arange(0, len(targets[i]['labels']) - 1).long().cuda()
                t = t.unsqueeze(0).repeat(scalar, 1)
                tgt_idx = t.flatten()
                output_idx = (torch.tensor(range(scalar)) * single_pad).long().cuda().unsqueeze(1) + t
                output_idx = output_idx.flatten()
            else:
                output_idx = tgt_idx = torch.tensor([]).long().cuda()

This is the code. I cannot understand the subtraction on the third line: why is the number of targets reduced by 1?
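One possible explanation (I believe the corresponding line in the repository uses the deprecated torch.range rather than torch.arange): torch.range includes its end value, so the "- 1" still yields one index per target, whereas torch.arange excludes the end value. A quick illustration, not taken from the repository:

import torch

n = 4
# torch.range is deprecated and includes its end value,
# so range(0, n - 1) still produces n indices: 0 .. n-1.
print(torch.range(0, n - 1).long())   # tensor([0, 1, 2, 3])
# torch.arange excludes its end value; the equivalent call drops the "- 1".
print(torch.arange(0, n))             # tensor([0, 1, 2, 3])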

weight

I am trying to use this model to train on my own dataset. If I do not load a pre-trained model, which weights are used for initialization? Thanks!

Compiling CUDA operators return error

Hello,

I run the following:
cd models/dino/ops
python setup.py build install

I get the following error:
ValueError: path '/comp_robot/liushilong/code/Deformable-DETR/models/ops/src/vision.cpp' cannot be absolute

Thank you in advance.
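As a quick sanity check once the build succeeds, the compiled extension can be imported directly. A small sketch; the module name is assumed from the Deformable-DETR lineage of these ops and may differ.

import torch  # import torch first so the extension's shared libraries resolve

try:
    import MultiScaleDeformableAttention as MSDA  # assumed extension name
    print("Compiled CUDA op found at:", MSDA.__file__)
except ImportError as exc:
    print("Extension not built/installed; rebuild under models/dino/ops:", exc)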

Attempt at implementing the LFT, CDN, and MQS modules

Thanks to the authors for their awesome work!
Although the source code is not available yet, I can't wait to try reproducing the modules.

I am submitting this issue to ask whether the LFT module can be implemented by modifying the following single line of code.

The original code in dn-dab-deformable-detr is:

intermediate_reference_points.append(reference_points)

The modified code is:

intermediate_reference_points.append(new_reference_points)

compile cuda error

thanks for your repo!

I get an error while compiling the CUDA operators. How can I fix it?

os:Win11
python:3.8.8
cuda:11.4


cd models/dino/ops
python setup.py build install

running build
running build_py
running build_ext
C:\Users\nomun\mysystem.venv\py388-DINO\lib\site-packages\torch\utils\cpp_extension.py:316: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] The system cannot find the file specified

thanks.

category_map_str

When reading the code in detail, I found a class called "label2compat" in datasets/coco.py. Do I need to change it when training the model on my own dataset (with a different number of classes)?

(Screenshot, 2022-07-27: the referenced code in datasets/coco.py)
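For context, a remapping like this typically exists because COCO category ids are non-contiguous, so they are compacted into 0-based training labels. Below is a minimal sketch of the general idea with a hypothetical custom id list; this is not the repository's code, and whether it needs changing depends on how your dataset's ids are laid out.

# Hypothetical custom dataset with 7 already-contiguous category ids.
category_ids = [1, 2, 3, 4, 5, 6, 7]
# Map raw annotation ids to compact 0-based training labels and back.
id2label = {cat_id: idx for idx, cat_id in enumerate(category_ids)}
label2id = {idx: cat_id for cat_id, idx in id2label.items()}
print(id2label)  # {1: 0, 2: 1, ..., 7: 6}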

RuntimeError: CUDA out of memory

Hi, I use the same settings as yours (total batch size 16 across 8 GPUs, DINO-4scale), but I still get a CUDA out-of-memory error.
Any suggestions on how to solve it?

Wallclock time comparison

Hello,

You post the number of training epochs when comparing with Deformable DETR and other architectures, but what about wall-clock time? Is this data available? Is the time per epoch roughly the same across DETR-like architectures?

thanks in advance

question about the model components in DINO

Hi, after reading the paper of the state-of-the-art detector DINO, I have one question about the details.

In Appendix D.3 (Detailed model components), the paper says: "we find the conditional queries used in DAB-DETR does not suit our model". What do the conditional queries refer to? Is it the idea of decoupling the object query into a content part and a position part (in this paper, you replace [q_c, q_p] with q_c + q_p)? Or the scale vector (you remove the scale vector from the positional encoding)?

Looking forward to a reply. Thanks in advance!

#BREAK

Hi,

When I run the code to evaluate the model using 'checkpoint0011_4scale.pth', the console prints the following:

...
Test: [ 0/234] eta: 0:07:10 class_error: 21.43 loss: 23.3856 (23.3856) ...
Test: [ 10/234] eta: 0:01:09 class_error: 100.00 loss: 27.6563 (28.1273) ...
BREAK!BREAK!BREAK!BREAK!BREAK!
Averaged stats: class_error: 75.00 loss: 27.6915 (28.0543) ...
Accumulating evaluation results...
DONE (t=0.12s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
.....

Why does this "BREAK" message appear?

inference per image

I am trying to use this amazing project for an object detection task on my own dataset. I am now confused about how to run inference on a single image or a whole dataset. Could you release sample code for inference? Thanks!
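For reference, a minimal single-image inference sketch in the style of this DETR-family codebase; the helper names (build_model_main, SLConfig, datasets.transforms) and the file paths are assumptions and may need adjusting to the repository's actual layout.

import torch
from PIL import Image

from main import build_model_main          # assumed entry point in this repo
from util.slconfig import SLConfig         # assumed config loader in this repo
import datasets.transforms as T            # DETR-style transforms

# Load config and checkpoint (paths are placeholders).
args = SLConfig.fromfile("config/DINO/DINO_4scale.py")
args.device = "cuda"
model, criterion, postprocessors = build_model_main(args)
ckpt = torch.load("checkpoint0011_4scale.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model = model.cuda().eval()

# Standard DETR-style preprocessing.
transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("demo.jpg").convert("RGB")
img, _ = transform(image, None)

with torch.no_grad():
    outputs = model(img[None].cuda())
    # Passing (1, 1) as the target size keeps the xyxy boxes normalized.
    results = postprocessors["bbox"](outputs, torch.tensor([[1.0, 1.0]]).cuda())[0]
print(results["scores"][:5], results["labels"][:5], results["boxes"][:5])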

ImageNet Pretrained Swin Weight

Hello, thanks for the release of the code of this amazing study.

I want to ask whether there are ImageNet-pretrained backbone weights for the Swin Transformer in this repo. They might be useful for replicating the paper's Swin Transformer results.

Thanks in advance.

metric for each class

Hi there,

Thanks for your amazing work! I recently trained this model on my own dataset, and I want to print metrics for each class (e.g. AP50 for class 1, AP50 for class 2, ...). How could I achieve this? Do you have any suggestions?
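For reference, per-class AP can be read out of pycocotools' COCOeval after evaluation. A sketch assuming you already have coco_gt (the COCO ground truth) and coco_dt (detections loaded with coco_gt.loadRes); those names are placeholders.

import numpy as np
from pycocotools.cocoeval import COCOeval

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()

# precision has shape [T, R, K, A, M]: IoU thresholds, recall thresholds,
# categories, area ranges, and max-detection settings.
precision = coco_eval.eval["precision"]
iou50 = 0      # IoU=0.50 is the first threshold
area_all = 0   # area range "all"
max_dets = 2   # maxDets=100 is the third setting
for k, cat_id in enumerate(coco_gt.getCatIds()):
    p = precision[iou50, :, k, area_all, max_dets]
    ap50 = np.mean(p[p > -1]) if (p > -1).any() else float("nan")
    name = coco_gt.loadCats(cat_id)[0]["name"]
    print(f"AP50 for {name}: {ap50:.3f}")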

The effect of Objects365 Pretraining

Hi, thanks for your impressive work! DINO-SwinL achieves state-of-the-art performance on the COCO leaderboard. Congratulations!
I am wondering how much performance gain the Objects365 pretraining brings. Thank you.

Why are learnable content queries helpful in mixed query selection?

After reading the paper, I am confused about mixed query selection. I think the learnable content queries will mismatch the positional queries taken from the encoder output, so I am not sure where the improvement of mixed query selection comes from; simply setting the content queries to zeros might achieve better performance. Have you ever visualized the content queries after training? Looking forward to your reply.

training logs

Hi, thanks for the brilliant work!

I am wondering whether you could provide the training logs for your released checkpoints so that we can track the training progress. thanks!

best,

Trained checkpoint

When will the trained checkpoint with the SwinL backbone be released? Thanks!

What is the content query of CDN queries?

Hi, I'm confused about what the content query of CDN queries is. Is it the same as the normal learnable content queries?

Looking forward to your reply. Thanks in advance!

a question about use_ema in main.py

Hi there,

Thanks for your help and listening!

When I look at main.py, I notice that "use_ema" appears frequently; however, it is set to False in the config. What did you design it for, and for which tasks is it needed? If we use this project for object detection, do I need to set anything for this part?
(Screenshot, 2022-07-26: the use_ema code referenced above)
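For context, a use_ema flag usually controls an exponential moving average (EMA) of the model weights that is tracked during training and often evaluated instead of the raw weights. Below is a generic sketch of the idea, not the repository's implementation; the 0.9997 decay is only an illustrative value.

import copy
import torch

class SimpleEma:
    # Keep an exponential moving average of a model's parameters.
    def __init__(self, model, decay=0.9997):
        self.decay = decay
        self.ema = copy.deepcopy(model).eval()
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema = decay * ema + (1 - decay) * current, applied after each optimizer step.
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)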

memory leak

I ran into a memory leak: during the second epoch, the memory runs out even with 64 GB. Has anyone else been in the same situation?

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\20825\\AppData\\Local\\Temp\\tmpt11h2ul9\\tmp2fdhjty4.py'

Thanks for your great work.
Recently I wanted to train my own dataset:
bash scripts/DINO_train.sh COCODIR
This resulted in the following error:

Not using distributed mode
Loading config file from config/DINO/DINO_4scale.py
Traceback (most recent call last):
File "main.py", line 398, in
main(args)
File "main.py", line 100, in main
cfg = SLConfig.fromfile(args.config_file)
File "C:\Users\20825\Desktop\DINO-main\util\slconfig.py", line 188, in fromfile
cfg_dict, cfg_text = SLConfig._file2dict(filename)
File "C:\Users\20825\Desktop\DINO-main\util\slconfig.py", line 86, in _file2dict
shutil.copyfile(filename,
File "C:\anaconda\envs\dino_detr\lib\shutil.py", line 264, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: 'C:\Users\20825\AppData\Local\Temp\tmpt11h2ul9\tmp2fdhjty4.py'
Thank you for your answer

obj365 checkpoint

Dear authors,

Thanks for the excellent work.
May I ask whether you plan to release the Objects365 pretraining checkpoints?

Thanks.

P-R graph

Hi there,

Thanks for your amazing work again! I am trying to draw a PR curve to evaluate our model's results. However, I cannot find where the results are stored, and I do not know how to start. Do you have any suggestions?
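For reference, precision-recall points for a single class can be read off COCOeval's accumulated precision array and plotted. A sketch assuming an already-evaluated and accumulated coco_eval object (the names here are placeholders).

import matplotlib.pyplot as plt

# precision has shape [T, R, K, A, M]; recThrs holds the 101 recall points.
precision = coco_eval.eval["precision"]
recall_pts = coco_eval.params.recThrs

k = 0                           # index of the class to plot
p = precision[0, :, k, 0, 2]    # IoU=0.50, area=all, maxDets=100
valid = p > -1                  # -1 marks recall levels that were never reached
plt.plot(recall_pts[valid], p[valid])
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("P-R curve at IoU=0.50")
plt.show()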

The description of Deformable DETR is wrong

(Screenshot of the quoted passage from the paper)

Deformable DETR doesn't formulate the queries as anchor points.

Actually, the query formulation of Deformable DETR is the same as the original DETR, i.e. a set of learned embeddings.

Instead, Deformable DETR predicts the reference points/anchor points from the queries.

Question about batch_size

Hi there,

I am trying to change the batch_size in the config file to fit my dataset. However, when I set batch_size above 10, I get a runtime error. Is there anything special about how batch_size should be set?

CUDA indexSelectLargeIndex problem

Hi,

Thanks for your amazing work!

When I try to train on the Objects365 dataset, a CUDA error is triggered: '/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [281,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.'
at https://github.com/IDEACVR/DINO/blob/24f3567e162c75f0323cf4a1ed2d5bf6e36bee52/models/dino/dn_components.py#L70
Here are some error messages that may be useful: (screenshot).
Here is my config: (screenshot).

Looking forward to your reply.

Fine-tuning guide for a custom dataset using a pretrained COCO model

I am trying to train a DINO model and then fine-tune it on my own dataset.
I trained DINO on the COCO dataset for 12 epochs with the Swin-Large 4-scale config,
then loaded those weights as a pretrained model to fine-tune on my custom dataset.

Because the dataset I am fine-tuning on has a different num_classes, I will have to change the number of output classes in the prediction layer before starting the fine-tuning run.

How should I tweak the config for reuse, or which part of the code should be modified?
I'd appreciate a guide.

Regards,
Keanu
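For reference, the general idea behind ignoring the classification-head weights when loading a COCO checkpoint into a model with a different num_classes looks roughly like the sketch below. This is illustrative, not the repository's --finetune_ignore code path; the checkpoint path is a placeholder and `model` is assumed to be the already-built DINO model.

import torch

ckpt = torch.load("coco_pretrained.pth", map_location="cpu")["model"]

# Drop keys tied to the COCO class count so the new heads keep their fresh init.
ignore_keywords = ("label_enc.weight", "class_embed")
filtered = {k: v for k, v in ckpt.items()
            if not any(word in k for word in ignore_keywords)}

missing, unexpected = model.load_state_dict(filtered, strict=False)
print(f"skipped {len(ckpt) - len(filtered)} keys; "
      f"{len(missing)} keys keep their fresh initialization")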
