
detpro's Introduction

This is the code base for the CVPR 2022 paper "Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model".

Prepare data

Download the datasets following LVIS, VOC, COCO and Objects365. Precomputed proposals generated by an RPN trained on only the base classes can be downloaded from google drive / baiduyun (code:yadt). It is recommended to download and extract the datasets somewhere outside the project directory and symlink the dataset root to data as below.

├── mmdet
├── tools
├── configs
├── data
│   ├── lvis_v1
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── proposals
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   ├── VOC2012
│   ├── objects365
│   │   ├── annotations
│   │   ├── train
│   │   ├── val
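
For example, assuming the datasets were extracted under a hypothetical /path/to/datasets directory, the layout above could be created with symlinks like these (a minimal sketch; adjust the source paths to your own setup):

mkdir -p data
ln -s /path/to/datasets/lvis_v1 data/lvis_v1
ln -s /path/to/datasets/coco data/coco
ln -s /path/to/datasets/VOCdevkit data/VOCdevkit
ln -s /path/to/datasets/objects365 data/objects365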

All models use a backbone pretrained with SoCo, which can be downloaded from google drive / baiduyun (code:kwps). Put the pretrained backbone under data/.

Main Results

| Model | Lr Schd | APbb_r | APbb_c | APbb_f | APbb | APmk_r | APmk_c | APmk_f | APmk | Config | Prompt | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViLD* | 20 epochs | 17.4 | 27.5 | 31.9 | 27.5 | 16.8 | 25.6 | 28.5 | 25.2 | config | google drive / baiduyun (code:a5ni) | google drive / baiduyun (code:cyhv) |
| DetPro (Mask R-CNN) | 20 epochs | 20.8 | 27.8 | 32.4 | 28.4 | 19.8 | 25.6 | 28.9 | 25.9 | config | google drive / baiduyun (code:uvab) | google drive / baiduyun (code:apmq) |
| DetPro (Cascade R-CNN) | 20 epochs | 21.7 | 29.6 | 35.0 | 30.5 | 20.0 | 26.7 | 30.4 | 27.0 | config | google drive / baiduyun (code:uvab) | google drive / baiduyun (code:5ee9) |

In the original implementation of ViLD, the whole training process takes up to 180,000 iterations with a batch size of 256 (about 180,000 × 256 ≈ 46M samples, i.e. roughly 460 passes over the ~100k LVIS training images), which is unaffordable. We re-implement ViLD (denoted ViLD*) with a backbone pretrained using SoCo. Our re-implementation achieves AP comparable to the original implementation while reducing training from roughly 460 epochs to 20.

Installation

Dependencies

  • python3.8
  • pytorch 1.7.0
  • cuda 11.0

This repo is built on mmdetection, CLIP and CoOp.

pip install -r requirements/build.txt
pip install -e .
pip install git+https://github.com/openai/CLIP.git
pip uninstall pycocotools -y
pip uninstall mmpycocotools -y
pip install mmpycocotools
pip install git+https://github.com/lvis-dataset/lvis-api.git
pip install mmcv-full==1.2.5 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
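
As an optional sanity check (a minimal sketch; it only verifies that the packages installed above import cleanly):

python -c "import torch, mmcv, mmdet, clip, lvis; print(torch.__version__, mmcv.__version__, mmdet.__version__)"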

Get Started

Quick reproduction of the main results

./tools/dist_test.sh <config> <model> <gpu_num> --eval bbox segm --cfg-options model.roi_head.prompt_path=<prompt> model.roi_head.load_feature=False 
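
For example, a concrete invocation reported in the issues below for the Cascade R-CNN model looks like the following; the checkpoint and prompt paths (data/models/epoch_20.pth, data/prompt/iou_neg5_ens.pth) are assumptions about where the downloads were placed and should be adjusted to your local layout:

./tools/dist_test.sh configs/lvis/cascade_mask_rcnn_r50_fpn_sample1e-3_mstrain_20e_lvis_v1_pretrain_ens.py data/models/epoch_20.pth 8 --eval bbox segm --cfg-options model.roi_head.prompt_path=data/prompt/iou_neg5_ens.pth model.roi_head.load_feature=False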

Prepare data for DetPro training

see prepare.sh.

This process takes a long time, so we also provide the extracted CLIP image embeddings of the precomputed proposals on baiduyun (code:o4n5). You can download all the zip parts and merge them into one file (lvis_clip_image_embedding.zip).
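
Assuming the downloaded parts are named lvis_clip_image_embedding.zip.xxx and sort in the correct order, they can be merged and extracted roughly as follows (a sketch; per user reports in the issues below, the merged archive is about 180 GB and extraction needs several hundred GB of free disk space):

cat lvis_clip_image_embedding.zip.* > lvis_clip_image_embedding.zip
unzip lvis_clip_image_embedding.zip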

Train RPN on Only Base Classes

The training code and checkpoint are available here: baiduyun (code:tqsd).

Train DetPro

see detpro.sh
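
detpro.sh is the reference; for orientation, one prompt-training invocation reported by users in the issues below looks like this (the argument values come from those reports and may differ from the current script):

python prompt/run.py train data/lvis_clip_image_proposal_embedding/train data/lvis_clip_image_proposal_embedding/val checkpoints/exp fg_bg_5_5_6_end soft 0.5 0.5 0.6 8 end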

Train ViLD with DetPro (Mask R-CNN)

see vild_detpro.sh
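
vild_detpro.sh is the reference; a training invocation reported in the issues below (8 GPUs, with the learned prompt embedding and the cached CLIP image features already prepared) looks like this:

./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/vild --cfg-options model.roi_head.prompt_path=lvis_clip_text_embedding.pt model.roi_head.load_feature=True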

Transfer experiments

see transfer.sh

The empty prompt is provided here; you can use it to generate the prompts for COCO, VOC and Objects365.

Citation

@article{du2022learning,
  title={Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model},
  author={Du, Yu and Wei, Fangyun and Zhang, Zihe and Shi, Miaojing and Gao, Yue and Li, Guoqi},
  journal={arXiv preprint arXiv:2203.14940},
  year={2022}
}


detpro's Issues

Installation of mmdet

If this project is based on mmdet, why does the installation list only install mmcv, without requiring a specific version of mmdetection?

Understand and reproduce the work

Hi,

Thanks for your great work!

I'm trying to reproduce the ViLD baseline in your repo but still having some trouble understanding it. Here are my questions:

  1. In the readme, the configs of ViLD and DetPro both point to the same file, detpro_ens_20e.py, and you have confirmed in other issues that this is correct. In this case, how do you determine whether the model uses the learned prompt as in DetPro or the manual prompt as in vanilla ViLD? Could you please share the running command for reproducing ViLD? Really appreciate it.

  2. In those configs, both use Shared4Conv1FCBBoxHead as the RoI head. However, this module does not seem to leverage text embeddings for training and inference. Should something like StandardRoIHeadTEXT be used instead? I'm not sure if my understanding is correct; please correct me if I'm wrong.

Thanks for your time again!

baiduyun link misses the val part of lvis_clip_image_embedding

lvis_clip_image_embedding.zip.xxx
I concatenated them into lvis_clip_image_embedding.zip (about 180 GB).
After spending a day solving my disk-space problem and unzipping lvis_clip_image_embedding.zip, I only got train/train2017, without the val part. I guess the val part was forgotten when the files were compressed.

about reproduction

Hi, thanks for providing the source codes and models.
When I tried to reproduce the main results of DetPro, I followed the instructions in the README. I got warnings when loading the pretrained models, and the results are relatively low.

Warnings: The model and loaded state dict do not match exactly. unexpected key in source state_dict: roi_head.clip_model.input_resolution, roi_head.clip_model.context_length, roi_head.clip_model.vocab_size.

Results: APr: 0.172, APc: 0.219, APf: 0.258 for detection and APr: 0.168, APc: 0.205, APf: 0.234 for segmentation.

Is there anything wrong?

RPN Training Code

Wanted to confirm a quick doubt:

In lvis.py, I believe the code has the line if cat_info['frequency'] != 'r':. But I believe this is not applicable for training, since we want to train on the frequent and common classes.

Prepare.sh structure of the zip directory

Hi,

Thanks again for your work. I followed the instructions in prepare.sh and tried to reproduce ViLD*. However, the estimated training time is > 5 days instead of ~2 days as you mentioned in other posts. I suspect the issue is with lvis_clip_image_embedding.zip and want to confirm whether what I have is correct.

  1. I followed the first command in prepare.sh but adapted it for 8-GPU execution:
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False total_epochs=1, which gives me 99342 *.pth files under data/lvis_clip_image_embedding/train2017.
  2. After step 1, I ran zip -r data/lvis_clip_image_embedding.zip data/lvis_clip_image_embedding/* from the root directory and got data/lvis_clip_image_embedding.zip.
  3. I ran ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/vild --cfg-options model.roi_head.prompt_path=lvis_clip_text_embedding.pt model.roi_head.load_feature=True as instructed in another post, with 8 GPUs. I checked the GPU usage and all GPUs are fully loaded, so I don't think the problem is GPU utilization. However, the estimated finishing time is > 5 days.

I checked the hierarchy inside lvis_clip_image_embedding.zip and the internal paths look like:
data/lvis_clip_image_embedding/train2017/000000203466.pth, while the zip file is already under ./data. In other words, it seems that a redundant data/ level is created inside the zip file. I'm not sure if this is expected or not. If not, what is the expected hierarchy inside the zip file?

Thanks for your help!

The prompts for COCO, VOC and Objects365

Hi, Yu Du!

Thanks for your great work! Can you provide the prompts for COCO, VOC and Objects365 datasets? I tried to reproduce the result, but it takes too much time to download or generate the clip embeddings.

Does transfer need COCO proposals?

Hello, when I reproduced the transfer results, I found that they need 'proposals/rpn_r101_fpn_1x_train2017.pkl'. Can you upload it or document how to generate it? Thank you!

Proposal files of transfer experiments

Hi,

Thanks for your great work. I just have a quick question regarding the transfer experiments. My understanding of this set of experiments is that you take an LVIS-pretrained detector (e.g., Mask R-CNN) and directly evaluate it on Pascal VOC, COCO, and Objects365. In other words, there should be no training on these datasets, only on LVIS. Is this a correct understanding?

If so, why do we have proposal_file=data_root + 'proposals/rpn_r101_fpn_1x_val2017.pkl' in the dataset definition of these three datasets? I assume the proposal file is only used to train ViLD (the knowledge distillation process).

Sorry for bugging you and thanks for your help!

link of config files in the table

Hi, it is confusing that the config-file links for ViLD* and DetPro (Mask R-CNN) are the same. Is there a mistake in the upload process?
Thanks

Question of prompt

I see the author provides some prompts: "fg_bg_5_5_6_r1_prompt", "fg_bg_5_6_7_r1_prompt", ..., "fg_bg_5_9_10_r1_prompt.pth".

But in transfer.sh, the code needs "endepoch6", "obj365" and so on.

How can I get these prompts?

Question about the prompt

Thanks for your excellent work!
Could you please provide the pretrained prompt weights "checkpoints/exp/fg_bg_5_5_6_endepoch6_prompt.pth",...,"checkpoints/exp/fg_bg_5_9_10_endepoch6_prompt.pth" for generating the class_embedding?

cannot reproduce the ViLD results

Hi,
I have already generated the CLIP embeddings for precomputed proposals and I'm trying to reproduce the baseline ViLD results through the following command:
./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/vild_ens_20e

However, after training finished, there is a huge gap between my results and those reported in the paper:

bbox_AP: 0.1670, bbox_AP50: 0.2740, bbox_AP75: 0.1720, bbox_APs: 0.1100, bbox_APm: 0.2250, bbox_APl: 0.2970, bbox_APr: 0.0900, bbox_APc: 0.1560, bbox_APf: 0.2120, bbox_mAP_copypaste: AP:0.167 AP50:0.274 AP75:0.172 APs:0.110 APm:0.225 APl:0.297 APr:0.090 APc:0.156 APf:0.212, segm_AP: 0.1520, segm_AP50: 0.2480, segm_AP75: 0.1560, segm_APs: 0.0990, segm_APm: 0.2130, segm_APl: 0.2780, segm_APr: 0.0860, segm_APc: 0.1440, segm_APf: 0.1910, segm_mAP_copypaste: AP:0.152 AP50:0.248 AP75:0.156 APs:0.099 APm:0.213 APl:0.278 APr:0.086 APc:0.144 APf:0.191

Could you please help?

some issues about RPN

During the training process, you use a pre-trained RPN to generate proposals in step 1, and then you train the RPN in step 2 to get proposals.
What is the difference between them?
Can I load the RPN from step 1 in step 2 and drop the distillation?
Thank you for your reply.

proposals/rpn_r101_fpn_lvis_train.pkl

Hello, thank you very much for your work!

I have a question: how can I generate 'proposals/rpn_r101_fpn_lvis_train.pkl' with the code, since it is not provided? Or are there detailed code instructions?

Thank you so much!

Runtime error

Dear Author,

When I try to run the code with the following command:

python prompt/run.py train data/lvis_clip_image_proposal_embedding/train data/lvis_clip_image_proposal_embedding/val checkpoints/exp fg_bg_5_5_6_end soft 0.5 0.5 0.6 8 end

I get this error:

['prompt/run.py', 'train', 'data/lvis_clip_image_proposal_embedding/train', 'data/lvis_clip_image_proposal_embedding/val', 'checkpoints/exp', 'fg_bg_5_5_6_end', 'soft', '0.5', '0.5', '0.6', '8', 'end']
jit version
Initializing a generic context
Initial context: "X X X X X X X X"
Number of context words (tokens): 8
Multiple GPUs detected (n_gpus=2), use all of them!
MODEL BUILD COMPLETE
Traceback (most recent call last):
  File "prompt/run.py", line 165, in <module>
    train_neg = sample(train_neg, len(train_neg[0])//int(neg_split), [866])
  File "prompt/run.py", line 111, in sample
    featk = torch.cat(featk, dim = 0)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

By the way, I strictly followed the installation guide in the readme and ran prepare.sh.
Looking forward to your reply!

train time for vild

I use the following script and config
./tools/slurm_train.sh a100 vild configs/lvis/detpro_ens_20e.py workdirs/vild_ens_20e_fg_bg_5_10_end --cfg-options model.roi_head.load_feature=True
to reproduce ViLD* with 8 × 32 GB A100s, batch size 24 (8×3).
In another issue you claim only 0.75 s per iteration, but for me it is 6 s; 20 epochs would cost about 30 days.

Background embedding

Hi!
You mentioned SoftBG in your paper, but I cannot find its implementation in the code. What I find is that you use a learned background embedding and mask the logits of the novel classes.

running error

Dear author, I met an error when running dist_test.sh. What should I do?
Traceback (most recent call last):
  File "/home/worker/anaconda3/envs/detpro/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/worker/anaconda3/envs/detpro/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/worker/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/worker/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/worker/anaconda3/envs/detpro/bin/python', '-u', './tools/test.py', '--local_rank=2', './configs/lvis/detpro_ens_20e.py', './epoch_20.pth', '--launcher', 'pytorch', '--eval', 'bbox', 'segm', '--cfg-options', 'model.roi_head.prompt_path=./lvis_text_embedding.pt', 'model.roi_head.load_feature=False', '--show', '--show-dir', './exp']' died with <Signals.SIGSEGV: 11>.

question about RPN training

I get an error when using rpn_f50_fpn_1x_lvis.py: TypeError: forward() missing 1 required positional argument: 'img_no_normalize'

prepare.sh

In prepare.sh, line 4 is
zip -r data/lvis_clip_image_embedding.zip data/lvis_clip_image_embedding/*
so I want to confirm: are the 173 files in the baiduyun link the same as the "lvis_clip_image_embedding.zip" above?
And then I need to run:
./tools/dist_train.sh configs/lvis/prompt_save_train_reuse.py 2 --work-dir workdirs/prompt_save_train
./tools/dist_train.sh configs/lvis/prompt_save_val.py 2 --work-dir workdirs/prompt_save_val
to get lvis_clip_image_proposal_embedding/train and val under ./data/?
Sorry to disturb you; I can't reproduce the result because of my own mistake.

type of rpn_head

Hello, thank you very much for your work!

I have a question. According to the configs in vild_detpro.sh, the rpn_head is set to type='RPNHead' in mask_rcnn_r50fpn.py, but there is no 'forward_train' function in rpn_head.py. I wonder if you use 'cascade_rpn_head' instead of 'rpn_head', since I find a 'forward_train' function with matching parameters in cascade_rpn_head.py, which is consistent with 'self.rpn_head.forward_train' in mask_rcnn.py.

Thank you so much!

vild*

Can ViLD* be trained on its own?
Also, what does "train ViLD with DetPro" mean... does it refer to your DetPro (Mask R-CNN)?

Could you please upload the extracted clip image embeddings of precomputed proposals to google drive?

Dear author,

Could you please upload the extracted clip image embeddings of precomputed proposals to google drive?
Or somewhere they can be downloaded from via terminal?
As I am outside mainland China, it seems I cannot download from Baidu cloud.
Also, I have met the box identity error... so I cannot generate these embeddings myself...
#8 (comment)

I would appreciate it very much if you could upload them to google drive or somewhere they can be downloaded from via terminal!

Training Time

Hello, thanks for your great work, and thank you for the code for ViLD and DetPro. I found that they both use segmentation. I would like to know: if the segmentation branch is removed, will it save training time? Thank you.

Pre-trained model

How did you get the pre-trained model current_mmdetection_Head.pth?

confused by rpn

Regarding "Precomputed proposals generated by RPN trained on only base classes":
which RPN model do you refer to?
Could you provide the checkpoint of that RPN?
Thanks.

question about coco_unseen_ids

Hi, I want to know the meaning of coco_unseen_ids_train and coco_unseen_ids_test in the file class_name.py. It seems different from the popular (17+48) zero-shot setting, e.g. the coco_unseen_ids_test list contains 15 items.
Thanks!


VILD* config

Which file is the ViLD* config? When I click the ViLD* config link in the README, it jumps to detpro_ens_20e.py. Is that correct?

feature dimensions do not match

Hi, I'm trying to run the following command from prepare.sh,
CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/prompt_save_train_reuse.py 2 --work-dir workdirs/prompt_save_train
and I get the following error:

Traceback (most recent call last):
  File "./tools/train.py", line 199, in <module>
    main()
  File "./tools/train.py", line 188, in main
    train_detector(
  File "/home/ubuntu/work/code/detpro/mmdet/apis/train.py", line 151, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/ubuntu/work/code/detpro/mmdet/models/detectors/base.py", line 246, in train_step
    losses = self(**data)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/ubuntu/work/code/detpro/mmdet/models/detectors/base.py", line 180, in forward
    return self.forward_train(img,img_no_normalize, img_metas, **kwargs)
  File "/home/ubuntu/work/code/detpro/mmdet/models/detectors/mask_rcnn.py", line 83, in forward_train
    roi_losses = self.roi_head.forward_train(x, img, img_no_normalize, img_metas, proposal_list,proposals,
  File "/home/ubuntu/work/code/detpro/mmdet/models/roi_heads/standard_roi_head_collect_reuse.py", line 269, in forward_train
    bbox_results = self._bbox_forward_train(x,img,sampling_results,proposals_pre_computed,
  File "/home/ubuntu/work/code/detpro/mmdet/models/roi_heads/standard_roi_head_collect_reuse.py", line 364, in _bbox_forward_train
    bbox_results, region_embeddings = self._bbox_forward(x, rois)
  File "/home/ubuntu/work/code/detpro/mmdet/models/roi_heads/standard_roi_head_collect_reuse.py", line 305, in _bbox_forward
    bbox_pred = self.bbox_head(bbox_feats)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/work/code/detpro/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py", line 214, in forward
    bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda3/envs/detpro/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

It looks like the feature dimension does not match the weight dimension in the BBoxHead module. I guess this is because the shared layers are commented out in the forward pass of the ConvFCBBoxHead module. Could you help check the code?

Which config file to use

Hi.

Thanks for your great work!!

I understand that the ViLD network consists of a text head and an image head.

So, I tried to figure out how this is implemented and found it in
mmdet/models/roi_heads/standard_roi_head_text.py.

In the readme, the configs of ViLD and DetPro both point to the same file, detpro_ens_20e.py. I think this does not bring both StandardRoIHeadTEXT and StandardRoIHeadTEXTPrompt into the model for training the text head and image head.

To do this, I think we need to load detpro_text_promt.py from the config folder.

To sum up the question,

  1. Does your training script (vild_detpro.sh) properly construct the text head and image head in the model? I think the config file detpro_text_promt.py should be loaded for your method.
  2. How should I train the baseline ViLD*? Could you specify the command?

I am not sure if my understanding is right, and I would really appreciate it if you could correct me if I am wrong.

Thank you for reading and thanks in advance.

How long it will take for running prepare.sh?

Thanks very much for your amazing work and codes!
I am trying to run the code myself, but it seems that with the provided prepare.sh it will take several days (7 days for the train split).
Is this normal or not?

Training RPN based on the provided code. Rather low performance.

Hi, thanks for sharing the code and model. I've tried the RPN code you provided and am working on building a more generalized RPN. However, I have the following issues and would appreciate your help:
I tested the epoch_20.pth you provide on the LVIS v1 dataset and get AR@100=36.5, far from the AR given in the ViLD paper, Table 1: AR@100=39.3. Why does this happen? I understand that ViLD tests on the LVIS novel data and you test on the whole data; shouldn't epoch_20.pth perform much better than the ViLD RPN?
I don't get the same performance when training the RPN using your code. I use the config rpn_r101_fpn_1x_lvis.py (training 12 epochs; no 20-epoch config is provided) and only get AR@100=9.5. I follow the mmdet instructions and simply use bash tools/dist_train.sh configs/rpn/rpn_r101_fpn_1x_lvis.py 2 to train the RPN. Why do I get such low performance?
I can only test the RPN model using proposal_fast mode. When I use bash tools/dist_test.sh configs/rpn/rpn_r101_fpn_1x_lvis.py models/epoch_20.pth 1 --out work_dirs/result.pkl --eval proposal, I get all-zero results.

running time for prepare.sh

Hi, thanks for your great work!

I'm trying to run the first command in 'prepare.sh',
CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 2 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False totol_epochs=1
which is used to generate the CLIP embeddings for precomputed proposals.
However, this process will take about 30 days with 8 × 16 GB V100s. And in this issue #4, you already claim that it only takes one day. So I was wondering if I missed any details.

the disk storage of lvis_clip_image_proposal_embedding

I downloaded the 173 lvis_clip_image_embedding.zip.xxx parts and concatenated them into lvis_clip_image_embedding.zip, which takes about 180 GB of storage.
I found that when unzipping lvis_clip_image_embedding.zip to get lvis_clip_image_proposal_embedding/train and val, my 500 GB of disk space ran out. I just wonder how much storage space is needed to store lvis_clip_image_proposal_embedding/train and val...

Issues in running the script

Hi, when I run your demo script,
./tools/dist_test.sh configs/lvis/cascade_mask_rcnn_r50_fpn_sample1e-3_mstrain_20e_lvis_v1_pretrain_ens.py data/models/epoch_20.pth 8 --eval bbox segm --cfg-options model.roi_head.prompt_path=data/prompt/iou_neg5_ens.pth model.roi_head.load_feature=False

the following error is reported:
FileNotFoundError: [Errno 2] No such file or directory: 'data/lvis_v1/proposals/rpn_r101_fpn_lvis_val.pkl'

I checked the files you provided, and it seems the precomputed proposals for the LVIS validation set are not included. Could you help check the files, or check whether the script is correct?

How much memory is needed to train detpro?

An OOM error occurs:
RuntimeError: CUDA out of memory. Tried to allocate 262.00 MiB (GPU 0; 23.70 GiB total capacity; 21.79 GiB already allocated; 20.81 MiB free; 22.32 GiB reserved in total by PyTorch)
when I use the command
python prompt/run.py train data/lvis_clip_image_proposal_embedding/train data/lvis_clip_image_proposal_embedding/val checkpoints/exp fg_bg_5_5_6_end soft 0.5 0.5 0.6 8 end
on a 3090 (24 GB).

So I wonder how much memory is needed to train detpro?

Having issues reproducing DetPro

Hi! Thanks for your great work!
Following your code and instructions, I'm trying to reproduce the Mask R-CNN results on LVIS. Using the same experiment settings provided in the code on 8 V100 GPUs, the evaluation results of detpro_20e_ens.py are 11.7/26.7/32.8/26.5 (AP_r/AP_c/AP_f/AP).
I also tried replacing the learned prompts fg_bg_5_10_end_ens.pth with your provided iou_neg5_ens.pth, but the performance is similar: 13.4/26.4/32.7/26.6.
I wonder if any modification to the current code is needed in order to reproduce the reported results. Also, is there any difference between the two learned prompts mentioned above?
Thanks in advance for your kind reply!
