chanchanchan97 / icafusion

ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection, Pattern Recognition

License: GNU Affero General Public License v3.0

Python 97.53% MATLAB 0.34% Shell 1.95% Dockerfile 0.17%

icafusion's Introduction

ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection

Introduction

In this paper, we propose a novel feature fusion framework of dual cross-attention transformers that models global feature interaction and captures complementary information across modalities simultaneously. In addition, we introduce an iterative interaction mechanism into the dual cross-attention transformers, which shares parameters among block-wise multimodal transformers to reduce model complexity and computation cost. The proposed method is general and can be integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
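As a rough illustration of the dual cross-attention idea, here is a minimal PyTorch sketch in which each modality queries the other. This is a simplification for readability, not the repository's actual fusion module; the class name and single fusion stage are invented.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    # Illustrative dual cross-attention: each modality queries the other.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb, ir):
        # rgb, ir: (B, N, C) token sequences from flattened feature maps.
        # Each branch attends to the other modality and keeps a residual
        # connection to its own features.
        rgb_out, _ = self.attn_rgb(query=rgb, key=ir, value=ir)
        ir_out, _ = self.attn_ir(query=ir, key=rgb, value=rgb)
        return rgb + rgb_out, ir + ir_out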

Paper download: https://arxiv.org/pdf/2308.07504.pdf

Overview

Fig 1. Overview of our multispectral object detection framework
Fig 2. Illustration of the proposed DMFF module

Installation

Clone the repo and install requirements.txt in a Python>=3.8.0 conda environment, including PyTorch>=1.12.

git clone https://github.com/chanchanchan97/ICAFusion.git
cd ICAFusion
pip install -r requirements.txt

Datasets

Weights

Files

Note: these are the txt files used for evaluation. We continuously optimize our code, which leads to differences in detection performance; however, the multimodal feature fusion modules remain consistent with the methods proposed in the paper.

Citation

If you find our work useful in your research, please consider citing:

@article{SHEN2023109913,
  title={ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection},
  author={Shen, Jifeng and Chen, Yifei and Liu, Yue and Zuo, Xin and Fan, Heng and Yang, Wankou},
  journal={Pattern Recognition},
  pages={109913},
  year={2023},
  issn={0031-3203},
  doi={10.1016/j.patcog.2023.109913},
}

icafusion's People

Contributors

chanchanchan97

icafusion's Issues

MR

Hello author, I am testing with your code and wondering why the MR value is 0. I have already modified the MR-calculation code that you commented out in test.py.

RuntimeError: stride should not be zero

Traceback (most recent call last):
File "train.py", line 590, in
train_rgb_ir(hyp, opt, device, tb_writer)
File "train.py", line 381, in train_rgb_ir
results, maps, MRresult, times = test.test(data_dict,
File "/data/zcy/ICAFusion-main/test.py", line 128, in test
out, _, train_out = model(img_rgb, img_ir, augment=augment) # inference and training outputs
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/ICAFusion-main/models/yolo_test.py", line 133, in forward
return self.forward_once(x, x2, profile) # single-scale inference, train
File "/data/zcy/ICAFusion-main/models/yolo_test.py", line 157, in forward_once
x = m(x) # run
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/ICAFusion-main/models/common.py", line 817, in forward
new_rgb_fea = self.vis_coefficient(self.avgpool(rgb_fea), self.maxpool(rgb_fea))
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/ICAFusion-main/models/common.py", line 885, in forward
y = nn.AvgPool2d(kernel_size=self.kernel_size, stride=(self.stride_h, self.stride_w), padding=0)(x)
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 615, in forward
return F.avg_pool2d(input, self.kernel_size, self.stride,
RuntimeError: stride should not be zero
I did not modify the code; this error occurred during training.
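One likely cause (an assumption; it depends on how self.stride_h and self.stride_w are computed in models/common.py) is integer division of the feature-map size, which underflows to zero when the input is smaller than the pooling layer expects. A minimal reproduction and guard:

import torch
import torch.nn as nn

# Feature map smaller than the pooling expects, e.g. from an
# unusually small input resolution at this stage of the network.
x = torch.randn(1, 64, 8, 8)

# Hypothetical stride computation: integer division underflows to zero
# when the spatial size is below the target grid size.
target_h, target_w = 10, 10
stride_h = x.shape[2] // target_h  # 8 // 10 == 0
stride_w = x.shape[3] // target_w  # 8 // 10 == 0

# nn.AvgPool2d(..., stride=(0, 0)) raises "stride should not be zero";
# clamping the stride to at least 1 is a simple guard.
stride_h, stride_w = max(stride_h, 1), max(stride_w, 1)
y = nn.AvgPool2d(kernel_size=3, stride=(stride_h, stride_w), padding=0)(x)
print(y.shape)  # torch.Size([1, 64, 6, 6])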

Validating directly with the author's provided weights gives mAP50 = 0.828 on FLIR, which does not match the number reported in the paper.

Running directly: python test.py --weights weights/ICAFusion_FLIR.pt --device 1

 Class      Images      Labels           P           R      mAP@.5     mAP@.75  mAP@.5:.95: 100%|█████████████████████████████| 1013/1013 [00:58<00:00, 17.42it/s]
                 all        1013        8588       0.813       0.769       0.828       0.338       0.407
              MR-all     MR-day   MR-night    MR-near  MR-medium     MR-far    MR-none MR-partial   MR-heavy Recall-all
                0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
              person        1013        4106       0.834       0.766       0.849       0.287       0.385
                 car        1013        4123       0.836       0.847       0.898       0.603       0.552
             bicycle        1013         359       0.768       0.693       0.738       0.124       0.283

VEDAI Dataset

Hello, the link to the VEDAI dataset is invalid. Can you update it?

Dataset

Could you please provide a download link for the KAIST dataset?

Reproduced metrics are very low

Hello, I have reproduced the experiments several times with your default commands and code, but on the KAIST and FLIR datasets all the metrics are very low. Are there any parameters that need to be modified?

How to generate attention maps?

Hello, I would like to know how you produced the attention map visualizations in your paper. Could you provide the code? Thanks.
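The repository does not ship visualization code, but one common approach (a sketch assuming the fusion blocks use nn.MultiheadAttention or expose their attention weights; the module here is a stand-in, not the repo's) is to capture the weights with a forward hook and upsample them to image size:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy attention module standing in for one of the model's fusion blocks.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
captured = {}

def hook(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights); the
    # weights are averaged over heads, shape (B, N_query, N_key).
    captured["w"] = output[1].detach()

handle = attn.register_forward_hook(hook)

# Tokens from a flattened 20x16 feature map (640x512 input, stride 32).
rgb = torch.randn(1, 20 * 16, 64)
ir = torch.randn(1, 20 * 16, 64)
attn(rgb, ir, ir)
handle.remove()

# Average over queries, reshape to the feature grid, upsample to the
# image size, then overlay on the input (e.g. matplotlib imshow + alpha).
heat = captured["w"].mean(dim=1).reshape(1, 1, 20, 16)
heat = F.interpolate(heat, size=(640, 512), mode="bilinear", align_corners=False)
print(heat.shape)  # torch.Size([1, 1, 640, 512])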


Yaml files for DMFF

Hi,

Thanks for releasing the code for your research. I am trying to reproduce the experimental results mentioned in your paper, but I cannot find the DMFF yaml files under the transformer directory. Could you share them with me, or should I create them myself by replacing the NiN_fusion modules with the DMFF modules?

Thanks for taking the time to read this message.

MR is 99

The result.txt obtained after running test.py shows an MR of 99 when used for evaluation.

LLVIP results

Hello!
What are your experimental results on the LLVIP dataset? I ran the code from this paper on LLVIP, and the mAP@50 result is very low, more than ten points lower than CFT. Why might this be?

Train new models

Could you please tell me what command I should use to train a new model?

mAP metric problem

Hello, I tried to reproduce the mAP metrics on the FLIR dataset from your paper, but I got mAP50 of 0.81 and mAP0.5:0.95 of 0.39, which is a little different from your reported numbers.

weights/best.pt

Hello author!
Thank you for releasing the code. I am trying to reproduce the experimental results mentioned in your paper. In order to obtain your results, could you please provide the file '/home/shen/Chenyf/exp_save/multispectral-object-detection/5l_FLIR_3class_transformerx2_avgpool+maxpool/weights/best.pt'?

VEDAI dataset

Hello, could you please show the file directory structure of the VEDAI dataset?

MR?

Hello, thank you very much for your open-source work. After training with the code, I couldn't reproduce the author's results; the MR I get on the KAIST dataset is over 8. Why is that?

About MATLAB

Hello! I ran your code and found that part of train.py requires MATLAB. I am not very familiar with MATLAB. Do I need to install MATLAB itself, or is it sufficient to install the MATLAB engine library for Python? Could you please provide a terminal command? Thank you very much!

YOLOV5TorchObjectDetector

Thanks for your work! The code on GitHub is missing the following two imports; please share them:
from models.yolo_v5_object_detector import YOLOV5TorchObjectDetector
from deep_utils import Box, split_extension

NameError: name 'RegistrationBlock' is not defined

Traceback (most recent call last):
File "train.py", line 588, in
train_rgb_ir(hyp, opt, device, tb_writer)
File "train.py", line 92, in train_rgb_ir
model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
File "/root/ICAFusion-main/models/yolo_test.py", line 96, in init
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
File "/root/ICAFusion-main/models/yolo_test.py", line 302, in parse_model
elif m in [TransformerFusionBlock, RegistrationBlock, STNFusionBlock]:
NameError: name 'RegistrationBlock' is not defined

Whether the DMFF module can perform feature fusion on more than two branches

Hi,

Thank you for sharing your paper's code. After reading your article, I would like to ask whether the DMFF module can perform feature fusion on more than two branches (if the input of the model is a multispectral image with 4 different bands, is the approach proposed in your article still feasible?)

Thank you very much for your time.
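Nothing in cross-attention itself restricts fusion to two branches. One plausible generalization (purely illustrative and not evaluated in the paper; all names are invented) lets each band attend to the concatenated tokens of all the other bands:

import torch
import torch.nn as nn

class MultiBranchCrossAttention(nn.Module):
    # Illustrative N-modality extension: each branch queries all others.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, branches):
        # branches: list of (B, N, C) token sequences, one per band.
        fused = []
        for i, q in enumerate(branches):
            # Keys/values are the tokens of every *other* modality.
            kv = torch.cat([b for j, b in enumerate(branches) if j != i], dim=1)
            out, _ = self.attn(query=q, key=kv, value=kv)
            fused.append(q + out)
        return fused

# Example: four spectral bands of a 4-band multispectral image.
bands = [torch.randn(2, 100, 64) for _ in range(4)]
fused = MultiBranchCrossAttention(dim=64)(bands)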

Inference speed

Hi, Jifeng. Thanks for your work!

According to your paper, the inference speed is 38 FPS with an input size of 640×512. Also, after reading your code, I assume you use YOLOv5l as the base model.

On the other hand, this repo benefits greatly from the code of CFT. Since you didn't upload the checkpoints, I tried to test CFT with the YOLOv5l architecture at 640×512, but the speed is only 20 FPS on a 3090 GPU.

I find these results hard to reconcile, given that the computational complexity of your DMFF is not significantly lower than CFT's, as also reported on page 13 of your paper.

So it would be very helpful if you could provide your checkpoint and demo code.
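When comparing FPS across papers, the measurement protocol matters as much as the model. A minimal benchmark sketch (assuming a CUDA device and the two-input model(img_rgb, img_ir) call used in this repo's test.py):

import time
import torch

@torch.no_grad()
def measure_fps(model, img_rgb, img_ir, warmup=50, iters=200):
    model.eval()
    for _ in range(warmup):      # warm up kernels / cuDNN autotuning
        model(img_rgb, img_ir)
    torch.cuda.synchronize()     # CUDA is asynchronous: sync before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(img_rgb, img_ir)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)

# e.g. for the paper's 640x512 input:
# rgb = torch.randn(1, 3, 512, 640, device="cuda")
# ir = torch.randn(1, 3, 512, 640, device="cuda")
# print(measure_fps(model.cuda(), rgb, ir))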

KAIST

Hi,

Thanks for releasing the code for your research. I am trying to reproduce the experimental results mentioned in your paper, but I cannot find the cleaned KAIST dataset it mentions (8,963 and 2,252 weakly-aligned image pairs at a resolution of 640 × 512 for training and testing, respectively). Could you please post a link?

Shared parameters

Hello, thank you for your open-source contribution! I am very interested in your ICFE module. When the paper mentions sharing parameters during the iteration process, do you mean sharing hyperparameters between CrossTransformerBlock modules? I did not find any parameters shared between CrossTransformerBlocks.
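For readers with the same question: parameter sharing in this context normally means weight sharing rather than hyperparameter sharing; the same module instance, and therefore the same learned weights, is applied at every iteration. A schematic sketch of the distinction (not the repository's actual code):

import copy
import torch.nn as nn

class IterativeFusion(nn.Module):
    # Schematic: apply a fusion block n_iters times, optionally sharing weights.
    def __init__(self, block, n_iters=2, share=True):
        super().__init__()
        # share=True reuses the same instance, so n_iters forward passes
        # use a single set of learned weights; share=False makes deep
        # copies, and the parameter count grows linearly with n_iters.
        self.blocks = nn.ModuleList(
            [block] * n_iters if share
            else [copy.deepcopy(block) for _ in range(n_iters)])

    def forward(self, rgb, ir):
        for blk in self.blocks:
            rgb, ir = blk(rgb, ir)
        return rgb, ir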

MR=0

Hello author!
Thank you for releasing the code. I am trying to reproduce the experimental results mentioned in your paper, but when training on the KAIST dataset I found that the MR value is always 0. Could you tell me what causes this?

Parameter settings

Hello author, I would like to use your code as a baseline, so I would like to know the parameter settings you used when running the KAIST dataset. Thank you very much!

BBox_IOU

Hello, may I ask which bounding box regression loss function you chose?
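For context, an assumption rather than a confirmed answer: YOLOv5-derived codebases like this one typically default to CIoU for box regression, which adds a center-distance penalty and an aspect-ratio consistency term to plain IoU. A sketch of CIoU for (x1, y1, x2, y2) boxes:

import math
import torch

def ciou(box1, box2, eps=1e-7):
    # Intersection area
    x1 = torch.max(box1[:, 0], box2[:, 0])
    y1 = torch.max(box1[:, 1], box2[:, 1])
    x2 = torch.min(box1[:, 2], box2[:, 2])
    y2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union and plain IoU
    w1, h1 = box1[:, 2] - box1[:, 0], box1[:, 3] - box1[:, 1]
    w2, h2 = box2[:, 2] - box2[:, 0], box2[:, 3] - box2[:, 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest enclosing box
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between box centers
    rho2 = ((box2[:, 0] + box2[:, 2] - box1[:, 0] - box1[:, 2]) ** 2 +
            (box2[:, 1] + box2[:, 3] - box1[:, 1] - box1[:, 3]) ** 2) / 4

    # Aspect-ratio consistency term and its weight
    v = (4 / math.pi ** 2) * (
        torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v  # the loss is typically 1 - ciou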

yolo_v5_object_detector.py file

Can you please share the code of YOLOV5TorchObjectDetector class in yolo_v5_object_detector.py? Especially the preprocessing function.

VEDAI dataset

Hello, could you give a link to your copy of the VEDAI dataset?
