zhanggongjie / meta-detr
[T-PAMI 2022] Meta-DETR for Few-Shot Object Detection: Official PyTorch Implementation
License: MIT License
I can't find the visualization code in Meta-DETR. Could you release it? Thanks.
Could you please provide the code and checkpoints for few-shot instance segmentation? Thanks.
Excellent code! Such strong coding ability!
For seeds 1-10, I wonder how you sample the novel dataset.
To ensure exactly K shots, do you discard the extra annotations, or discard the images containing novel objects? (A sketch of one common approach follows below.)
Looking forward to your reply~
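For context, here is a minimal sketch of one common way K-shot subsets are built from COCO-style annotations. It is an illustrative assumption, not necessarily this repo's actual seed-wise procedure; sample_k_shot and its arguments are hypothetical names.

import json
import random

def sample_k_shot(ann_file, novel_ids, k, seed=0):
    # Hypothetical K-shot sampler over a COCO-format annotation file;
    # the repo's real sampling code may differ from this sketch.
    random.seed(seed)
    with open(ann_file) as f:
        coco = json.load(f)
    kept = []
    for cid in novel_ids:
        anns = [a for a in coco["annotations"] if a["category_id"] == cid]
        kept += random.sample(anns, min(k, len(anns)))
    img_ids = {a["image_id"] for a in kept}
    coco["annotations"] = kept                 # discard the extra annotations
    coco["images"] = [im for im in coco["images"] if im["id"] in img_ids]
    return coco

Under this scheme the extra annotations are dropped while the images they came from are kept only if they still carry a sampled annotation.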
Thanks for releasing this great work!
Would you please provide the base-training / few-shot fine-tuning logs, e.g., for the 10-shot setting on split 1 (seed 0) of the VOC dataset?
Also, I noticed that you used a sinusoidal function instead of learnable embeddings, such as nn.Embedding(), as the queries' representations. Have you done the corresponding ablation studies? If so, how does this design choice influence performance? (The two options are sketched below.)
Looking forward to your reply, thank you!
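For reference, a minimal sketch contrasting the two query representations mentioned above. This is illustrative only; the dimensions and the exact sinusoidal formulation are assumptions, not taken from this repo.

import math
import torch
import torch.nn as nn

num_queries, hidden_dim = 300, 256

# Option A: learnable query embeddings, as in the original DETR.
learned_queries = nn.Embedding(num_queries, hidden_dim).weight

# Option B: fixed sinusoidal codes over query indices (one plausible form).
pos = torch.arange(num_queries, dtype=torch.float32).unsqueeze(1)
i = torch.arange(hidden_dim // 2, dtype=torch.float32)
div = torch.exp(-math.log(10000.0) * 2 * i / hidden_dim)
sinusoid = torch.zeros(num_queries, hidden_dim)
sinusoid[:, 0::2] = torch.sin(pos * div)
sinusoid[:, 1::2] = torch.cos(pos * div)

The sinusoidal variant carries no trainable parameters, which is one plausible motivation for such a design in a few-shot setting.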
How is the training data for meta fine-tuning extracted? Could you release the script for processing the meta fine-tuning training data?
ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
In the file ms_deform_attn_func.py, line 18, I can't import MultiScaleDeformableAttention. Maybe I am missing this file. Could anyone tell me why? Thank you.
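For anyone hitting this: MultiScaleDeformableAttention is a compiled CUDA extension, not a .py source file, so it only exists after the ops package has been built (the "sh ./make.sh" step mentioned in the next issue). A quick check, as a hedged sketch:

try:
    # This module is a built artifact produced by compiling models/ops.
    import MultiScaleDeformableAttention as MSDA
    print("deformable-attention ops extension found")
except ImportError:
    print("Not built yet: compile the ops package in models/ops (see make.sh).")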
Many code paths for num_feature_levels == 4 are not implemented yet and raise NotImplementedError. Is a complete version of the code available?
Some errors occur when compiling deformable attention with "sh ./make.sh":
g++ -pthread -B /home/ub/anaconda3/envs/metadetr/compiler_compat -Wl,--sysroot=/ -pthread -shared -B /home/ub/anaconda3/envs/metadetr/compiler_compat -L/home/ub/anaconda3/envs/metadetr/lib -Wl,-rpath=/home/ub/anaconda3/envs/metadetr/lib -Wl,--no-as-needed -Wl,--sysroot=/ /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cpu/ms_deform_attn_cpu.o /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cuda/ms_deform_attn_cuda.o /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/vision.o -L/home/ub/anaconda3/envs/metadetr/lib/python3.7/site-packages/torch/lib -Lusr/local/cuda-11.1/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-37/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so
g++: error: /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cpu/ms_deform_attn_cpu.o: No such file or directory
g++: error: /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cuda/ms_deform_attn_cuda.o: No such file or directory
g++: error: /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/vision.o: No such file or directory
error: command '/usr/bin/g++' failed with exit code 1
Does anyone know how to deal with this?
Are you fixing the pool of images for the 1/2/3/5-shot settings up front by creating the JSON for base training, or are you dynamically creating the support and query sets during every episode of base training so that many samples per class get used?
FileNotFoundError: [Errno 2] No such file or directory: 'data\voc\annotations\pascal_trainval0712.json'
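An aside on this traceback: the backslashes suggest a Windows-style path was handed to code expecting POSIX separators. A hedged sketch of the usual fix, building paths with pathlib so the separator always matches the OS:

from pathlib import Path

# Joins path components with the correct separator for the current OS.
ann_path = Path("data") / "voc" / "annotations" / "pascal_trainval0712.json"
print(ann_path)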
Hello,
Did you implement your model in a multi-scale version? If yes, is it possible to release the associated code?
Thank you
I only have two GPUs at the moment, so I changed the command line to
“GPUS_PER_NODE=2 tools/run_dist_launch.sh 2”
But still, only the first GPU is used.
What should I change so that I can use both GPUs?
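A small sanity check, assuming nothing about the launcher's internals, to confirm both GPUs are visible to PyTorch before invoking the distributed launch script:

import os
import torch

# Expose both devices to this process; adjust the IDs for your machine.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")
print("visible GPUs:", torch.cuda.device_count())  # expect 2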
Hello, thank you for sharing the code.
I would like to know how to extract precision, recall, and F1-score metrics; I already have the AP and AR metrics.
I am trying to use the following code, but it gives me a numpy matrix:
precision = coco_eval.eval['precision']
recall = coco_eval.eval['recall']
Can you help me?
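For reference, the arrays stored by pycocotools' COCOeval are indexed as [IoU threshold, recall threshold, category, area range, maxDets] for precision and [IoU threshold, category, area range, maxDets] for recall, with -1 marking absent entries. Below is a hedged sketch of one common reduction to scalar precision/recall/F1; the F1 pairing here is a heuristic, not a COCO-defined metric.

import numpy as np

prec = coco_eval.eval['precision']   # shape (10, 101, num_classes, 4, 3)
rec = coco_eval.eval['recall']       # shape (10, num_classes, 4, 3)

iou_idx, area_idx, det_idx = 0, 0, 2  # IoU=0.50, area='all', maxDets=100

p = prec[iou_idx, :, :, area_idx, det_idx]
p = p[p > -1].mean()  # averaging the PR curve over recall thresholds = AP@0.50
r = rec[iou_idx, :, area_idx, det_idx]
r = r[r > -1].mean()  # mean over classes of the maximum achieved recall
f1 = 2 * p * r / (p + r + 1e-12)
print(f"precision={p:.3f}  recall={r:.3f}  f1={f1:.3f}")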
When I train on 2 GPUs, it seems PyTorch divides sample_support_img by 2, and then I get an error at
assert num_support == (self.args.episode_size * self.args.episode_num)
But I don't think it should be divided like this, right?
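For context, a hedged reading of this report: under distributed data parallelism each process sees only its shard of the batch, so a support count sized for a single process gets split across GPUs. Illustrative arithmetic only; the names mirror the flags in the assertion above:

# With episode_size=5 and episode_num=5 the assert expects 25 supports,
# but with 2 GPUs each process may receive only half of them.
episode_size, episode_num = 5, 5
expected = episode_size * episode_num   # 25
world_size = 2
per_gpu = expected // world_size        # 12, so the assert fails
print(expected, per_gpu)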
Hi! I am curious about the time cost of the training process. I look forward to your reply.
Hello, thanks for your great work. I would like to point out that, in the paper, it looks like the query features are passed to the CAM module. However, in the actual implementation the query features do not play a role until the final encoder-decoder architecture.
For example, category_code() computes the categorical features using only the support samples, whereas in the meta_detr.py module the query features are extracted from the backbone and only interact with the support features inside self.transformer(), which seems to differ from the paper's architecture. Am I missing something? Thanks.
Hi and thank you for the interesting work,
Can you please elaborate on the training of the Deformable-DETR-ft-full
variant? My understanding is that a Deformable-DETR was trained on the base classes and fine-tuned on the novel classes, but I might still be missing a few other details. Specifically -
Thank you again!
Hi,
I would like to run the code on Colab because I have only one GPU on my computer, but I suppose Colab also provides only one GPU. Is there any way to run the code on Colab with multiple GPUs? Or any recommendation for running it with multiple GPUs on an alternative platform?
I would be glad if you could help me with this.
Thanks for the work,
Is it possible to specify the prerequisites: the Python, CUDA, and PyTorch versions?
Hi,
I'm glad to have found this interesting work. I would like to perform few-shot fine-tuning based on the pretrained weights provided on GitHub, but I have a couple of questions:
I want to run the code on Google Colab; is it possible to set up the environment with only one GPU (with 12 GB of RAM)?
If I only want to perform few-shot fine-tuning, I assume I don't have to download the "full" dataset, correct?
I'm looking forward to seeing your reply :)
Where can I find the pretrained model?
Hello, I redid base training on MSCOCO following the log you provided, but using only a single Nvidia V100 GPU. Compared with the MSCOCO base-training model you provide, the retrained model's performance drops a lot, especially on the base classes.
I used these base-training parameters:
EXP_DIR=exps/coco
BASE_TRAIN_DIR=${EXP_DIR}/base_train
mkdir exps
mkdir ${EXP_DIR}
mkdir ${BASE_TRAIN_DIR}

python -u main.py \
    --dataset_file coco_base \
    --backbone resnet101 \
    --num_feature_levels 1 \
    --enc_layers 6 \
    --dec_layers 6 \
    --hidden_dim 256 \
    --num_queries 300 \
    --batch_size 4 \
    --category_codes_cls_loss \
    --epoch 25 \
    --lr_drop_milestones 20 \
    --save_every_epoch 5 \
    --eval_every_epoch 5 \
    --output_dir ${BASE_TRAIN_DIR} \
    2>&1 | tee ${BASE_TRAIN_DIR}/log.txt
The base-training model after retraining:
Novel Categories:
Accumulating evaluation results...
DONE (t=3.51s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.044
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.066
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.052
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.120
Base Categories:
Accumulating evaluation results...
DONE (t=9.94s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.072
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.157
The base-training model provided on this GitHub page:
Base Categories:
Accumulating evaluation results...
DONE (t=10.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.538
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.183
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.314
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.496
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.530
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.303
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.603
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.764
Novel Categories:
Accumulating evaluation results...
DONE (t=6.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.012
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.005
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.010
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.105
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.110
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.067
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.102
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.197
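A note on this comparison: the reference log presumably comes from multi-GPU training, so a single V100 shrinks the effective batch size several-fold; if the learning rate is left unchanged, that alone can degrade results, though near-zero AP may also point to outright divergence. Below is a sketch of the standard linear LR scaling rule, using an assumed base LR of 2e-4 (a Deformable-DETR-style default, not verified against this repo) and an assumed 8-GPU reference setup:

# Linear scaling rule (Goyal et al., 2017): scale LR with effective batch size.
base_lr = 2e-4                   # assumed LR tuned for the reference setup
batch_per_gpu = 4
ref_gpus, my_gpus = 8, 1         # reference run vs. this single-GPU re-run
scaled_lr = base_lr * (batch_per_gpu * my_gpus) / (batch_per_gpu * ref_gpus)
print(scaled_lr)                 # 2.5e-05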
Hey, it's me here.
I notice that the few-shot fine-tuning is based on your pre-split nshot.json files in data/coco_fewshot/seedx.
So here are my questions:
Hey guys!
I really enjoy your good work, though I have some trouble making it converge because of the long training time.
Do you have any news or a date for when the pretrained weights and the cleaned-up repo will be released?
Thanks!
I have run some diagnostics on your code and found that even with more workers, the dataloader is still slow (the backward pass does not even start). Is this slow sample loading in the dataloader expected at the start of the training process? (A small timing probe is sketched below.)
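A minimal probe to measure loading time in isolation, before any forward or backward pass. Here dataset and collate_fn stand in for the repo's actual objects; both are assumptions:

import time
from torch.utils.data import DataLoader

# dataset / collate_fn: substitute the repo's own objects (placeholders here).
loader = DataLoader(dataset, batch_size=2, num_workers=8,
                    pin_memory=True, collate_fn=collate_fn)

t0 = time.time()
for i, batch in enumerate(loader):   # no model involved: pure loading cost
    if i == 20:
        break
print(f"{(time.time() - t0) / 20:.3f} s per batch (loading only)")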
Dear author,
I notice that the sampling-offset-related parameters aren't trained during fine-tuning. I am somewhat confused about the reason for this.
Looking forward to your reply.
Hi, I've read your paper and am wondering whether specific stages are frozen during the second, fine-tuning stage.
How much time does it take to train one epoch on a single V100?
Why did training one epoch with a single 2080 Ti take about a day (around 24 hours)? Is there anything wrong, or is that just the reality?
Hello,
Thanks for the work and for releasing the code.
I was wondering what meta-task settings you used to get the results listed in your paper for the VOC and COCO datasets. Did you use an episode num of 5, an episode size of 5, 15 total support samples, and a max of 10 positive supports?
Also, is the released code up to date with the latest version of your paper?
While reading the paper, I felt the authors' idea of combining few-shot object detection with DETR is quite good, but some of the claimed novelties seem to come from DETR itself, for example:
This paper presents a novel meta-detector framework, namely Meta-DETR, which eliminates region-wise prediction and instead meta-learns object localization and classification at image level in a unified and complementary manner.
I'd like to see the reproduced experimental results; please clean up and release the code soon.
Hi, I want to know whether the transformer is trained from scratch. If not, where are the pretrained parameters from?
If I want to switch to a different dataset, how should I do the data preprocessing? Have you ever switched datasets and done the corresponding data-preparation work? If so, could you open-source the scripts and instructions in a future code release? Thank you; looking forward to your answer.
Hey there! Will you release the code soon? Thanks!
It's November now; when will the code be released?
As the title says:
I'm training and testing on MSCOCO.
I use 8 V100s for base training and fine-tuning, following the epochs and other settings in the code.
The performance recorded at the end of, for example, the 10-shot fine-tuning is:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.19241
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.30558
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.20127
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.03518
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.16560
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29263
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.22415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.33887
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.35160
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.11586
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.30860
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.55801
This is reasonable to me. However, when I try to run inference (on my local machine with a 1080 Ti) using the trained checkpoint0499.pth, the result looks like this for the base classes (for the novel classes it is all zeros):
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.040
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.091
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.026
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.005
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.043
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.088
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.063
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.073
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.073
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.064
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.173
It is very likely that I have missed something, so I am trying to debug, but I have no clue at the moment. The command I used for fine-tuning is:
python main.py \
--dataset_file coco_base \
--backbone resnet101 \
--num_feature_levels 1 \
--enc_layers 6 \
--dec_layers 6 \
--hidden_dim 256 \
--num_queries 300 \
--batch_size 2 \
--category_codes_cls_loss \
--resume BLABLABLA \
--fewshot_finetune \
--fewshot_seed ${fewshot_seed} \
--num_shots 10 \
--epoch ${epoch} \
--lr_drop_milestones ${lr_drop1} ${lr_drop2} \
--warmup_epochs 50 \
--save_every_epoch 25 \
--eval_every_epoch 25 \
--start_epoch ${NEXT_EPOCH} \
--output_dir BLABLABLA
And the command for inference is:
python main.py \
--eval \
--dataset_file coco_base \
--backbone resnet101 \
--num_feature_levels 1 \
--enc_layers 6 \
--dec_layers 6 \
--hidden_dim 256 \
--num_queries 10 \
--batch_size 1 \
--category_codes_cls_loss \
--resume checkpoint0499.pth \
--num_shots 10 \
--output_dir tmp
Any information is appreciated.
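One hedged observation on the commands above: the inference invocation differs from the fine-tuning one in at least two ways, setting --num_queries 10 instead of 300 and omitting --fewshot_finetune, and either mismatch against the checkpoint's training configuration could plausibly explain the degraded numbers. Making the evaluation flags match the fine-tuned model's settings would be a natural first debugging step.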
What bAP can be reached in the base-training stage on COCO? The paper does not seem to report this result.
Hello, I would like to know whether your article is currently under submission, and whether you intend to submit it to a conference or a journal.
Q1: ImportError: cannot import name '_NewEmptyTensorOp' from 'torchvision.ops.misc'
Q2: PermissionError: [Errno 13] Permission denied: './basetrain.sh'
Could the owner answer the above two questions?
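For Q1, a hedged sketch of the version-guard idiom used by DETR-family codebases (whether this repo needs exactly this is an assumption): _NewEmptyTensorOp was removed from newer torchvision releases, so the import has to be conditional on the installed version. For Q2, the script likely just lacks the execute bit; invoking it via "sh ./basetrain.sh" avoids the permission error.

import torchvision

# _NewEmptyTensorOp only exists in older torchvision; guard by minor version.
_tv_minor = int(torchvision.__version__.split('.')[1])
if _tv_minor < 7:
    from torchvision.ops.misc import _NewEmptyTensorOp
else:
    _NewEmptyTensorOp = None  # newer stacks handle empty tensors natively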