zhanggongjie / meta-detr
[T-PAMI 2022] Meta-DETR for Few-Shot Object Detection: Official PyTorch Implementation
License: MIT License
I can't find the visualization code in Meta-DETR. Could you release it? Thanks.
Could you please provide the code and checkpoints for few-shot instance segmentation? Thanks.
Excellent code! Such strong coding ability!
For seeds 1-10, I wonder how you sample the novel dataset.
To ensure exactly K shots, do you discard the extra annotations, or discard the images containing novel objects? (A sketch of one common approach follows below.)
Looking forward to your reply~
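For context, here is a minimal sketch of one common way K-shot subsets are built from COCO-style annotations. It is an illustrative assumption, not necessarily this repo's actual seed-wise procedure; sample_k_shot and its arguments are hypothetical names.

import json
import random

def sample_k_shot(ann_file, novel_ids, k, seed=0):
    # Hypothetical K-shot sampler over a COCO-format annotation file;
    # the repo's real sampling code may differ from this sketch.
    random.seed(seed)
    with open(ann_file) as f:
        coco = json.load(f)
    kept = []
    for cid in novel_ids:
        anns = [a for a in coco["annotations"] if a["category_id"] == cid]
        kept += random.sample(anns, min(k, len(anns)))
    img_ids = {a["image_id"] for a in kept}
    coco["annotations"] = kept                 # discard the extra annotations
    coco["images"] = [im for im in coco["images"] if im["id"] in img_ids]
    return coco

Under this scheme the extra annotations are dropped while the images they came from are kept only if they still carry a sampled annotation.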
Thanks for releasing this great work!
Would you please provide the base-training / few-shot fine-tuning logs, e.g., for the 10-shot setting on split 1 (seed 0) of the VOC dataset?
Also, I noticed that you used a sinusoidal function instead of learnable embeddings, such as nn.Embedding(), as the queries' representations. Have you done the corresponding ablation studies? If so, how does this design choice influence performance? (The two options are sketched below.)
Looking forward to your reply, thank you!
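For reference, a minimal sketch contrasting the two query representations mentioned above. This is illustrative only; the dimensions and the exact sinusoidal formulation are assumptions, not taken from this repo.

import math
import torch
import torch.nn as nn

num_queries, hidden_dim = 300, 256

# Option A: learnable query embeddings, as in the original DETR.
learned_queries = nn.Embedding(num_queries, hidden_dim).weight

# Option B: fixed sinusoidal codes over query indices (one plausible form).
pos = torch.arange(num_queries, dtype=torch.float32).unsqueeze(1)
i = torch.arange(hidden_dim // 2, dtype=torch.float32)
div = torch.exp(-math.log(10000.0) * 2 * i / hidden_dim)
sinusoid = torch.zeros(num_queries, hidden_dim)
sinusoid[:, 0::2] = torch.sin(pos * div)
sinusoid[:, 1::2] = torch.cos(pos * div)

The sinusoidal variant carries no trainable parameters, which is one plausible motivation for such a design in a few-shot setting.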
How is the training data for meta fine-tuning extracted? Could you release the script for processing the meta fine-tuning training data?
ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
In the file ms_deform_attn_func.py, line 18, I can't import MultiScaleDeformableAttention. Maybe I am missing this file. Could anyone tell me why? Thank you.
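For anyone hitting this: MultiScaleDeformableAttention is a compiled CUDA extension, not a .py source file, so it only exists after the ops package has been built (the "sh ./make.sh" step mentioned in the next issue). A quick check, as a hedged sketch:

try:
    # This module is a built artifact produced by compiling models/ops.
    import MultiScaleDeformableAttention as MSDA
    print("deformable-attention ops extension found")
except ImportError:
    print("Not built yet: compile the ops package in models/ops (see make.sh).")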
Many code paths for num_feature_levels == 4 are not implemented yet and raise NotImplementedError. Is a complete version of the code available?
Some errors occur when compiling deformable attention with "sh ./make.sh":
g++ -pthread -B /home/ub/anaconda3/envs/metadetr/compiler_compat -Wl,--sysroot=/ -pthread -shared -B /home/ub/anaconda3/envs/metadetr/compiler_compat -L/home/ub/anaconda3/envs/metadetr/lib -Wl,-rpath=/home/ub/anaconda3/envs/metadetr/lib -Wl,--no-as-needed -Wl,--sysroot=/ /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cpu/ms_deform_attn_cpu.o /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cuda/ms_deform_attn_cuda.o /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/vision.o -L/home/ub/anaconda3/envs/metadetr/lib/python3.7/site-packages/torch/lib -Lusr/local/cuda-11.1/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-37/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so
g++: error: /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cpu/ms_deform_attn_cpu.o: No such file or directory
g++: error: /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/cuda/ms_deform_attn_cuda.o: No such file or directory
g++: error: /media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/build/temp.linux-x86_64-cpython-37/media/ub/eb78aac6-ede5-4037-8e5c-179e82f06841/pycharm_project/Meta-DETR-main/models/ops/src/vision.o: No such file or directory
error: command '/usr/bin/g++' failed with exit code 1
Does anyone know how to deal with this?
Are you fixing the pool of images for the 1/2/3/5-shot settings up front by creating the JSON for base training, or are you dynamically creating the support and query sets during every episode of base training so that many samples per class get used?
FileNotFoundError: [Errno 2] No such file or directory: 'data\voc\annotations\pascal_trainval0712.json'
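An aside on this traceback: the backslashes suggest a Windows-style path was handed to code expecting POSIX separators. A hedged sketch of the usual fix, building paths with pathlib so the separator always matches the OS:

from pathlib import Path

# Joins path components with the correct separator for the current OS.
ann_path = Path("data") / "voc" / "annotations" / "pascal_trainval0712.json"
print(ann_path)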
Hello,
Did you implement your model in a multi-scale version? If yes, is it possible to release the associated code?
Thank you
I only have two GPUs at the moment, so I changed the command line to
“GPUS_PER_NODE=2 tools/run_dist_launch.sh 2”
But still, only the first GPU is used.
What should I change so that I can use both GPUs?
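A small sanity check, assuming nothing about the launcher's internals, to confirm both GPUs are visible to PyTorch before invoking the distributed launch script:

import os
import torch

# Expose both devices to this process; adjust the IDs for your machine.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")
print("visible GPUs:", torch.cuda.device_count())  # expect 2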
Hello, thank you for sharing the code.
I would like to know how to extract precision, recall, and F1-score metrics; I already have the AP and AR metrics.
I am trying to use the following code, but it gives me a numpy matrix:
precision = coco_eval.eval['precision']
recall = coco_eval.eval['recall']
Can you help me?
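For reference, the arrays stored by pycocotools' COCOeval are indexed as [IoU threshold, recall threshold, category, area range, maxDets] for precision and [IoU threshold, category, area range, maxDets] for recall, with -1 marking absent entries. Below is a hedged sketch of one common reduction to scalar precision/recall/F1; the F1 pairing here is a heuristic, not a COCO-defined metric.

import numpy as np

prec = coco_eval.eval['precision']   # shape (10, 101, num_classes, 4, 3)
rec = coco_eval.eval['recall']       # shape (10, num_classes, 4, 3)

iou_idx, area_idx, det_idx = 0, 0, 2  # IoU=0.50, area='all', maxDets=100

p = prec[iou_idx, :, :, area_idx, det_idx]
p = p[p > -1].mean()  # averaging the PR curve over recall thresholds = AP@0.50
r = rec[iou_idx, :, area_idx, det_idx]
r = r[r > -1].mean()  # mean over classes of the maximum achieved recall
f1 = 2 * p * r / (p + r + 1e-12)
print(f"precision={p:.3f}  recall={r:.3f}  f1={f1:.3f}")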
When I train on 2 GPUs, it seems PyTorch divides sample_support_img by 2, and then I get an error at
assert num_support == (self.args.episode_size * self.args.episode_num)
But I don't think it should be divided like this, right?
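For context, a hedged reading of this report: under distributed data parallelism each process sees only its shard of the batch, so a support count sized for a single process gets split across GPUs. Illustrative arithmetic only; the names mirror the flags in the assertion above:

# With episode_size=5 and episode_num=5 the assert expects 25 supports,
# but with 2 GPUs each process may receive only half of them.
episode_size, episode_num = 5, 5
expected = episode_size * episode_num   # 25
world_size = 2
per_gpu = expected // world_size        # 12, so the assert fails
print(expected, per_gpu)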
Hi! I am curious about the time cost of the training process. I look forward to your reply.
Hello, thanks for your great work. I would like to point out that, in the paper, it looks like the query features are passed to the CAM module. However, in the actual implementation the query features do not play a role until the final encoder-decoder architecture.
For example, category_code() computes the categorical features using only the support samples, whereas in the meta_detr.py module the query features are extracted from the backbone and only interact with the support features inside self.transformer(), which seems to differ from the paper's architecture. Am I missing something? Thanks.
Hi and thank you for the interesting work,
Can you please elaborate on the training of the Deformable-DETR-ft-full
variant? My understanding is that a Deformable-DETR was trained on the base classes and fine-tuned on the novel classes, but I might still be missing a few other details. Specifically -
Thank you again!
Hi,
I would like to run the code on Colab because I have only one GPU on my computer, but I suppose Colab also provides only one GPU. Is there any way to run the code on Colab with multiple GPUs? Or any recommendation for running it with multiple GPUs on an alternative platform?
I would be glad if you could help me with this.
Thanks for the work,
Is it possible to specify the prerequisites: the Python, CUDA, and PyTorch versions?
Hi,
I'm glad to have found this interesting work. I would like to perform few-shot fine-tuning based on the pretrained weights provided on GitHub, but I have a couple of questions:
I want to run the code on Google Colab; is it possible to set up the environment with only one GPU (with 12 GB of RAM)?
If I only want to perform few-shot fine-tuning, I assume I don't have to download the "full" dataset, correct?
I'm looking forward to seeing your reply :)
Where can I find the pretrained model?
Hello, I redid base training on MSCOCO following the log you provided, but using only a single Nvidia V100 GPU. Compared with the MSCOCO base-training model you provide, the retrained model's performance drops a lot, especially on the base classes.
I used these base-training parameters:
EXP_DIR=exps/coco
BASE_TRAIN_DIR=${EXP_DIR}/base_train
mkdir exps
mkdir ${EXP_DIR}
mkdir ${BASE_TRAIN_DIR}

python -u main.py \
    --dataset_file coco_base \
    --backbone resnet101 \
    --num_feature_levels 1 \
    --enc_layers 6 \
    --dec_layers 6 \
    --hidden_dim 256 \
    --num_queries 300 \
    --batch_size 4 \
    --category_codes_cls_loss \
    --epoch 25 \
    --lr_drop_milestones 20 \
    --save_every_epoch 5 \
    --eval_every_epoch 5 \
    --output_dir ${BASE_TRAIN_DIR} \
    2>&1 | tee ${BASE_TRAIN_DIR}/log.txt
The base-training model after retraining:
Novel Categories:
Accumulating evaluation results...
DONE (t=3.51s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.044
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.066
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.052
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.120
Base Categories:
Accumulating evaluation results...
DONE (t=9.94s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.072
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.157
The base-training model provided on this GitHub page:
Base Categories:
Accumulating evaluation results...
DONE (t=10.00s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.538
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.183
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.314
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.496
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.530
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.303
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.603
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.764
Novel Categories:
Accumulating evaluation results...
DONE (t=6.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.012
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.005
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.010
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.105
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.110
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.067
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.102
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.197
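A note on this comparison: the reference log presumably comes from multi-GPU training, so a single V100 shrinks the effective batch size several-fold; if the learning rate is left unchanged, that alone can degrade results, though near-zero AP may also point to outright divergence. Below is a sketch of the standard linear LR scaling rule, using an assumed base LR of 2e-4 (a Deformable-DETR-style default, not verified against this repo) and an assumed 8-GPU reference setup:

# Linear scaling rule (Goyal et al., 2017): scale LR with effective batch size.
base_lr = 2e-4                   # assumed LR tuned for the reference setup
batch_per_gpu = 4
ref_gpus, my_gpus = 8, 1         # reference run vs. this single-GPU re-run
scaled_lr = base_lr * (batch_per_gpu * my_gpus) / (batch_per_gpu * ref_gpus)
print(scaled_lr)                 # 2.5e-05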
Hey, it's me here.
I notice that the few-shot fine-tuning is based on your pre-split nshot.json files in data/coco_fewshot/seedx.
So here are my questions:
Hey guys!
I really enjoy your good work, though I have some trouble making it converge because of the long training time.
Do you have any news or a date for when the pretrained weights and the cleaned-up repo will be released?
Thanks!
I have run some diagnostics on your code and found that even with more workers, the dataloader is still slow (the backward pass does not even start). Is this slow sample loading in the dataloader expected at the start of the training process? (A small timing probe is sketched below.)
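A minimal probe to measure loading time in isolation, before any forward or backward pass. Here dataset and collate_fn stand in for the repo's actual objects; both are assumptions:

import time
from torch.utils.data import DataLoader

# dataset / collate_fn: substitute the repo's own objects (placeholders here).
loader = DataLoader(dataset, batch_size=2, num_workers=8,
                    pin_memory=True, collate_fn=collate_fn)

t0 = time.time()
for i, batch in enumerate(loader):   # no model involved: pure loading cost
    if i == 20:
        break
print(f"{(time.time() - t0) / 20:.3f} s per batch (loading only)")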
Dear author,
I notice that the sampling-offset-related parameters aren't trained during fine-tuning. I am somewhat confused about the reason for this.
Looking forward to your reply.
Hi, I've read your paper and am wondering whether specific stages are frozen during the second, fine-tuning stage.
How much time does it take to train one epoch on a single V100?
Why did training one epoch with a single 2080 Ti take about a day (around 24 hours)? Is there anything wrong, or is that just the reality?
Hello,
Thanks for the work and for releasing the code.
I was wondering what meta-task settings you used to get the results listed in your paper for the VOC and COCO datasets. Did you use an episode num of 5, an episode size of 5, 15 total support samples, and a max of 10 positive supports?
Also, is the released code up to date with the latest version of your paper?
While reading the paper, I felt the authors' idea of combining few-shot object detection with DETR is quite good, but some of the claimed novelties seem to come from DETR itself, for example:
This paper presents a novel meta-detector framework, namely Meta-DETR, which eliminates region-wise prediction and instead meta-learns object localization and classification at image level in a unified and complementary manner.
I'd like to see the reproduced experimental results; please clean up and release the code soon.
Hi, I want to know whether the transformer is trained from scratch. If not, where are the pretrained parameters from?
If I want to switch to a different dataset, how should I do the data preprocessing? Have you ever switched datasets and done the corresponding data-preparation work? If so, could you open-source the scripts and instructions in a future code release? Thank you; looking forward to your answer.
Hey there! Will you release the code soon? Thanks!
It's November now; when will the code be released?
As the title says:
I'm training and testing on MSCOCO.
I use 8 V100s for base training and fine-tuning, following the epochs and other settings in the code.
The performance recorded at the end of, for example, the 10-shot fine-tuning is:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.19241
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.30558
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.20127
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.03518
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.16560
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29263
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.22415
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.33887
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.35160
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.11586
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.30860
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.55801
This is reasonable to me. However, when I try to run inference (on my local machine with a 1080 Ti) using the trained checkpoint0499.pth, the result looks like this for the base classes (for the novel classes it is all zeros):
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.040
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.091
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.026
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.005
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.043
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.088
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.063
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.073
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.073
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.064
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.173
It is very likely that I have missed something, so I am trying to debug, but I have no clue at the moment. The command I used for fine-tuning is:
python main.py \
--dataset_file coco_base \
--backbone resnet101 \
--num_feature_levels 1 \
--enc_layers 6 \
--dec_layers 6 \
--hidden_dim 256 \
--num_queries 300 \
--batch_size 2 \
--category_codes_cls_loss \
--resume BLABLABLA \
--fewshot_finetune \
--fewshot_seed ${fewshot_seed} \
--num_shots 10 \
--epoch ${epoch} \
--lr_drop_milestones ${lr_drop1} ${lr_drop2} \
--warmup_epochs 50 \
--save_every_epoch 25 \
--eval_every_epoch 25 \
--start_epoch ${NEXT_EPOCH} \
--output_dir BLABLABLA
And the command for inference is:
python main.py \
--eval \
--dataset_file coco_base \
--backbone resnet101 \
--num_feature_levels 1 \
--enc_layers 6 \
--dec_layers 6 \
--hidden_dim 256 \
--num_queries 10 \
--batch_size 1 \
--category_codes_cls_loss \
--resume checkpoint0499.pth \
--num_shots 10 \
--output_dir tmp
Any information is appreciated.
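One hedged observation on the commands above: the inference invocation differs from the fine-tuning one in at least two ways, setting --num_queries 10 instead of 300 and omitting --fewshot_finetune, and either mismatch against the checkpoint's training configuration could plausibly explain the degraded numbers. Making the evaluation flags match the fine-tuned model's settings would be a natural first debugging step.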
What bAP can be reached in the base-training stage on COCO? The paper does not seem to report this result.
Hello, I would like to know whether your article is currently under submission, and whether you intend to submit it to a conference or a journal.
Q1: ImportError: cannot import name '_NewEmptyTensorOp' from 'torchvision.ops.misc'
Q2: PermissionError: [Errno 13] Permission denied: './basetrain.sh'
Could the owner answer the above two questions?
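For Q1, a hedged sketch of the version-guard idiom used by DETR-family codebases (whether this repo needs exactly this is an assumption): _NewEmptyTensorOp was removed from newer torchvision releases, so the import has to be conditional on the installed version. For Q2, the script likely just lacks the execute bit; invoking it via "sh ./basetrain.sh" avoids the permission error.

import torchvision

# _NewEmptyTensorOp only exists in older torchvision; guard by minor version.
_tv_minor = int(torchvision.__version__.split('.')[1])
if _tv_minor < 7:
    from torchvision.ops.misc import _NewEmptyTensorOp
else:
    _NewEmptyTensorOp = None  # newer stacks handle empty tensors natively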