
limbo0000 / instanceloc

[CVPR 2021] Instance Localization for Self-supervised Detection Pretraining

Home Page: https://arxiv.org/pdf/2102.08318.pdf

License: Apache License 2.0

Python 87.90% C++ 6.54% Cuda 5.49% Shell 0.08%

instanceloc's People

Contributors: limbo0000


instanceloc's Issues

training is slow

When running the pre-training task on 4 V100 GPUs, I found that this line of code in shuffle BN takes a lot of time:
idx_shuffle = torch.randperm(batch_size_all).cuda()

In addition, the RPN head is also slow.

Do you know what's going on?
Looking forward to your reply.
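
For context, shuffle BN permutes the key batch before BatchNorm so that batch statistics cannot leak the positive pairing. A minimal single-process sketch of the idea (function names are mine, not the repo's):

```python
import torch

def batch_shuffle(x):
    # Generate a random permutation of the batch and its inverse.
    # This is the single-process analogue of the quoted line
    # `idx_shuffle = torch.randperm(batch_size_all).cuda()`: in the
    # distributed MoCo-style version the permutation covers the batch
    # gathered from all GPUs, and the .cuda() transfer forces a
    # host-to-device sync, which can account for the reported slowness.
    batch_size = x.shape[0]
    idx_shuffle = torch.randperm(batch_size)
    idx_unshuffle = torch.argsort(idx_shuffle)  # inverse permutation
    return x[idx_shuffle], idx_unshuffle

def batch_unshuffle(x, idx_unshuffle):
    # Restore the original batch order after the key encoder's BN.
    return x[idx_unshuffle]
```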

finetune

After training completes via train_net.py, the terminal prints the validation results on COCO_val. My understanding of the self-supervised pipeline is that the backbone obtained from self-supervised training should be frozen, the detection head fine-tuned on labeled data, and performance then evaluated on COCO_val. But I don't see where InsLoc performs this fine-tuning step. Where is it implemented? Hoping for your answer.

Unclear what is used as a negative in loss

Hi!

Nice paper.
Though, over the course of the paper it is unclear what is used as the negative examples for the InfoNCE objective. We know what v_q and v_k_+ are, but what is used as the v_k_i's is not specified. A natural guess is that it is something like C(J_i, B_i), where J_i and B_i are a random foreground and background image, but I cannot figure it out for sure from the paper.
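
For reference, a minimal MoCo-style InfoNCE sketch, under the same guess as the question: negatives are the key features of other composited images, held in a queue. All names here are mine:

```python
import torch
import torch.nn.functional as F

def info_nce(q, k_pos, queue, temperature=0.2):
    # q: (N, D) query features, k_pos: (N, D) positive key features,
    # queue: (K, D) negative key features; all assumed L2-normalized.
    # Assumption (the open question above): each queue entry is the key
    # feature of another composited image C(J_i, B_i).
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(1)  # (N, 1)
    l_neg = torch.einsum("nd,kd->nk", q, queue)              # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.shape[0], dtype=torch.long)       # positive is index 0
    return F.cross_entropy(logits, labels)
```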

Nvidia error after train begins

Thanks for your excellent work! But when I try to pretrain the model on my own dataset, training soon hangs, and I cannot run nvidia-smi even after killing the training process.

Unable to determine the device handle for GPU 0000:81:00.0: Unknown Error

Detail of the localization evaluation

Hi,

Thanks for the great work. I have one question about the localization evaluation results in Table 3 of the paper.

Assume the image is x, the patch operation (which splits the whole image into M patches) is P(.), and the encoder is f(.). Which of the feature patches, f(P(x)) or P(f(x)), is used to evaluate localization?

Thanks!
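
To make the two orderings concrete, here is a toy sketch (the real encoder f is a ResNet backbone; a size-preserving conv stands in for it so the shapes line up). Which ordering the paper uses is exactly the open question:

```python
import torch
import torch.nn as nn

# Toy "encoder": a conv that preserves spatial size, standing in for f.
f = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)

def patchify(t, m=2):
    # P(.): split a (C, H, W) map into an m x m grid of patches,
    # returned as (m*m, C, H/m, W/m).
    c, h, w = t.shape
    return (t.reshape(c, m, h // m, m, w // m)
             .permute(1, 3, 0, 2, 4)
             .reshape(m * m, c, h // m, w // m))

x = torch.randn(3, 32, 32)
# f(P(x)): split first, encode each patch independently.
f_of_P = torch.stack([f(p.unsqueeze(0))[0] for p in patchify(x)])
# P(f(x)): encode the whole image, then split the feature map.
P_of_f = patchify(f(x.unsqueeze(0))[0])
# Shapes match, but values differ near patch borders, because in
# f(P(x)) no patch ever sees context across a patch boundary.
```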

data augmentations

Hi,
I have one more question about data augmentation; I'd like to know its details.
When you composite the foreground and background images, did you crop the foreground image, or just resize it with the random aspect ratio you mentioned in the paper, without cropping? If you cropped, what were the min/max sizes?
And after randomly scaling it to between 128 and 256, did you paste it onto the background image at its original size?
Are any further augmentations applied to the cut-mixed image, such as a resize to 224?

Finally, I'm really looking forward to your upload. :)

Thanks.
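
For concreteness, a sketch of the compositing being asked about. Only the 128-to-256 rescale is stated in the paper; the aspect-ratio jitter range and the paste behavior are my assumptions, which is exactly what the question asks to confirm:

```python
import random
import torch
import torch.nn.functional as F

def composite(fg, bg):
    # Rescale the foreground so its size falls in [128, 256] (stated in
    # the paper), with an assumed aspect-ratio jitter of [3/4, 4/3].
    s = random.randint(128, 256)
    ar = random.uniform(0.75, 1.333)
    h = min(int(s * ar), bg.shape[1])
    w = min(int(s / ar), bg.shape[2])
    fg = F.interpolate(fg.unsqueeze(0), size=(h, w),
                       mode="bilinear", align_corners=False)[0]
    # Paste at a random location on the (unresized) background; whether
    # the background keeps its original size is the open question above.
    y = random.randint(0, bg.shape[1] - h)
    x = random.randint(0, bg.shape[2] - w)
    out = bg.clone()
    out[:, y:y + h, x:x + w] = fg
    return out, (x, y, w, h)  # composited image and its gt box
```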

May I ask which file contains the implementation of the bounding-box augmentations?

[screenshot]
Is it in the AnchorAugHead.py file?
[screenshot]
Another question: what do the gt_bboxes parameters of the fwd function represent throughout the model files?
The last question is whether the bounding-box augmentation in your paper is applied to the bounding box of the cropped foreground image, and whether the RPN-based two-stage method is used to obtain more proposals. In fact, I just want to know what is augmented to obtain so many proposals; if it is based on the RPN, it is only suitable for two-stage detectors.

about custom dataset

Hi, sir. How can I use a custom dataset? For example, how should I modify the config for VOCDataset?
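
For anyone with the same question: mmdetection-style repos usually switch datasets by overriding the `data` dict in the config. The field names and paths below follow that convention but are assumptions, not taken from this repo's configs:

```python
# Hypothetical mmdetection-style dataset override for VOC; the keys and
# paths are assumptions and must be checked against the repo's actual
# config files.
data = dict(
    train=dict(
        type="VOCDataset",
        ann_file="data/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt",
        img_prefix="data/VOCdevkit/VOC2007/",
    ),
)
```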

Question about the effect of RoiAlign

Hi,
I have read the InstanceLoc paper and found it a very interesting and promising work. The results in Table 3 are quite valuable. I wonder if there are any results from applying only copy-and-paste (CP), which would help me better understand the effect of the CP-based data augmentation and the improvement brought by RoIAlign.
Thanks for your valuable work; hoping for your response.

Training not stable at the beginning

Hi! Did you also experience very unstable training for the first ~100k training steps?
Accuracy jumps from 0 to 80%, and the loss swings up and down around 40. This seems to stabilize later. Is this expected?

This is the accuracy graph for ~1M training steps:
[screenshot]

inloc_C4 config

Is there an instance localization C4 configuration file?
Thank you

how to calculate the overall loss?

Hi, thanks for your interesting work.
I have a question about the loss for R50-FPN.
As I understand it, you computed the contrastive loss on the RoI-aligned features of each level, which means 4 losses are computed.
How did you then combine them into an overall loss: a plain sum of all losses, or a weighted sum?
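
The two options being contrasted can be written out directly; the per-level values here are placeholders, not numbers from the repo:

```python
import torch

# Hypothetical per-level contrastive losses (P2..P5 of an R50-FPN).
level_losses = [torch.tensor(v) for v in (2.1, 1.8, 2.4, 2.0)]

total_sum = torch.stack(level_losses).sum()        # option 1: plain sum
weights = torch.tensor([1.0, 1.0, 1.0, 1.0])       # option 2: weighted sum
total_weighted = (weights * torch.stack(level_losses)).sum()
```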

Config for fine-tuning on Mini COCO

Hi! Could you provide the config you used to fine-tune the pretrained model on Mini COCO?
The paper mentions some minor changes to the model structure, and a few things (e.g. the LR schedule) are not clear.

Thank you

The .pkl checkpoints can't be converted.

Thanks for your work!
When I use convert_pretrained.py to convert your insloc_fpn_400ep.pkl, it fails with 'RuntimeError: Invalid magic number; corrupt file?'. I don't know how to fix it.
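
One common cause of this particular error, sketched below: torch.load() being applied to a plain-pickle checkpoint. This is a generic diagnostic sketch, not the repo's convert_pretrained.py logic:

```python
import pickle

def load_plain_pickle(path):
    # torch.load() raises "RuntimeError: Invalid magic number; corrupt
    # file?" when handed a file that is not a torch serialization
    # archive. Detectron2-style .pkl checkpoints are plain pickles, so
    # loading them with pickle directly often works (assumption: the
    # file really is such a pickle and is not truncated).
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")
```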

How to create a downstream object detection task

Hi @zhirongw @limbo0000

Thanks for the wonderful work. I have a few questions, just to improve my understanding of the SSL approach.

We use InsLoc for the pretext task on an unlabeled dataset such as COCO with the ResNet-50 architecture. Once we have the pre-trained model, how do we set it up for the downstream object detection task?

Supervision: will this be supervised, i.e. with the images and their related bbox information?

Architecture: for the DetCo pre-trained model I used ResNet-50, and for the downstream object detection with labels I want to use MobileNetV2. Is that possible, or must the downstream task use ResNet-50 as well?

Are some files missing?

I opened your code on Windows 10 and found that in the file mmdet/ops/nms/nms_wrapper.py, line 4 reads "from . import nms_ext", and I was told "Unresolved reference 'nms_ext'". Are some files missing?
