chanchanchan97 / icafusion

ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection, Pattern Recognition

License: GNU Affero General Public License v3.0

Python 97.53% MATLAB 0.34% Shell 1.95% Dockerfile 0.17%

icafusion's Introduction

ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection

Introduction

In this paper, we propose a novel feature fusion framework of dual cross-attention transformers that models global feature interaction and captures complementary information across modalities simultaneously. In addition, we introduce an iterative interaction mechanism into the dual cross-attention transformers, which shares parameters among block-wise multimodal transformers to reduce model complexity and computation cost. The proposed method is general and can be integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios.
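As a rough illustration of the dual cross-attention idea, here is a minimal PyTorch sketch in which each modality queries the other. This is a simplification for readability, not the repository's actual fusion module; the class name and single fusion stage are invented.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    # Illustrative dual cross-attention: each modality queries the other.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb, ir):
        # rgb, ir: (B, N, C) token sequences from flattened feature maps.
        # Each branch attends to the other modality and keeps a residual
        # connection to its own features.
        rgb_out, _ = self.attn_rgb(query=rgb, key=ir, value=ir)
        ir_out, _ = self.attn_ir(query=ir, key=rgb, value=rgb)
        return rgb + rgb_out, ir + ir_out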

Paper download: https://arxiv.org/pdf/2308.07504.pdf

Overview

Fig 1. Overview of our multispectral object detection framework
Fig 2. Illustration of the proposed DMFF module

Installation

Clone the repo and install requirements.txt in a Python>=3.8.0 conda environment, including PyTorch>=1.12.

git clone https://github.com/chanchanchan97/ICAFusion.git
cd ICAFusion
pip install -r requirements.txt

Datasets

Weights

Files

Note: these are the txt files used for evaluation. We continuously optimize our code, which leads to differences in detection performance; however, the multimodal feature fusion modules remain consistent with the methods proposed in the paper.

Citation

If you find our work useful in your research, please consider citing:

@article{SHEN2023109913,
  title={ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection},
  author={Shen, Jifeng and Chen, Yifei and Liu, Yue and Zuo, Xin and Fan, Heng and Yang, Wankou},
  journal={Pattern Recognition},
  pages={109913},
  year={2023},
  issn={0031-3203},
  doi={10.1016/j.patcog.2023.109913},
}

icafusion's People

Contributors

chanchanchan97

icafusion's Issues

MR

Hello author, I am testing with your code and wondering why the MR value is 0. I have already modified the MR-calculation code that you commented out in test.py.

RuntimeError: stride should not be zero

Traceback (most recent call last):
File "train.py", line 590, in
train_rgb_ir(hyp, opt, device, tb_writer)
File "train.py", line 381, in train_rgb_ir
results, maps, MRresult, times = test.test(data_dict,
File "/data/zcy/ICAFusion-main/test.py", line 128, in test
out, _, train_out = model(img_rgb, img_ir, augment=augment) # inference and training outputs
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/ICAFusion-main/models/yolo_test.py", line 133, in forward
return self.forward_once(x, x2, profile) # single-scale inference, train
File "/data/zcy/ICAFusion-main/models/yolo_test.py", line 157, in forward_once
x = m(x) # run
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/ICAFusion-main/models/common.py", line 817, in forward
new_rgb_fea = self.vis_coefficient(self.avgpool(rgb_fea), self.maxpool(rgb_fea))
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/ICAFusion-main/models/common.py", line 885, in forward
y = nn.AvgPool2d(kernel_size=self.kernel_size, stride=(self.stride_h, self.stride_w), padding=0)(x)
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/zcy/anaconda3/envs/ica/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 615, in forward
return F.avg_pool2d(input, self.kernel_size, self.stride,
RuntimeError: stride should not be zero
I did not modify the code; this error occurred during training.
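One likely cause (an assumption; it depends on how self.stride_h and self.stride_w are computed in models/common.py) is integer division of the feature-map size, which underflows to zero when the input is smaller than the pooling layer expects. A minimal reproduction and guard:

import torch
import torch.nn as nn

# Feature map smaller than the pooling expects, e.g. from an
# unusually small input resolution at this stage of the network.
x = torch.randn(1, 64, 8, 8)

# Hypothetical stride computation: integer division underflows to zero
# when the spatial size is below the target grid size.
target_h, target_w = 10, 10
stride_h = x.shape[2] // target_h  # 8 // 10 == 0
stride_w = x.shape[3] // target_w  # 8 // 10 == 0

# nn.AvgPool2d(..., stride=(0, 0)) raises "stride should not be zero";
# clamping the stride to at least 1 is a simple guard.
stride_h, stride_w = max(stride_h, 1), max(stride_w, 1)
y = nn.AvgPool2d(kernel_size=3, stride=(stride_h, stride_w), padding=0)(x)
print(y.shape)  # torch.Size([1, 64, 6, 6])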

Validating directly with the author's provided weights gives mAP50 = 0.828 on FLIR, which does not match the number reported in the paper.

Running directly: python test.py --weights weights/ICAFusion_FLIR.pt --device 1

 Class      Images      Labels           P           R      mAP@.5     mAP@.75  mAP@.5:.95: 100%|█████████████████████████████| 1013/1013 [00:58<00:00, 17.42it/s]
                 all        1013        8588       0.813       0.769       0.828       0.338       0.407
              MR-all     MR-day   MR-night    MR-near  MR-medium     MR-far    MR-none MR-partial   MR-heavy Recall-all
                0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00       0.00
              person        1013        4106       0.834       0.766       0.849       0.287       0.385
                 car        1013        4123       0.836       0.847       0.898       0.603       0.552
             bicycle        1013         359       0.768       0.693       0.738       0.124       0.283

VEDAI Dataset

Hello, the link to the VEDAI dataset is invalid. Can you update it?

Dataset

Could you please provide a download link for the KAIST dataset?

Reproduced metrics are very low

Hello, I have reproduced the experiments several times with your default commands and code, but on the KAIST and FLIR datasets all the metrics are very low. Are there any parameters that need to be modified?

How to generate attention maps?

Hello, I would like to know how you produced the attention map visualizations in your paper. Could you provide the code? Thanks.
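The repository does not ship visualization code, but one common approach (a sketch assuming the fusion blocks use nn.MultiheadAttention or expose their attention weights; the module here is a stand-in, not the repo's) is to capture the weights with a forward hook and upsample them to image size:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy attention module standing in for one of the model's fusion blocks.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
captured = {}

def hook(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights); the
    # weights are averaged over heads, shape (B, N_query, N_key).
    captured["w"] = output[1].detach()

handle = attn.register_forward_hook(hook)

# Tokens from a flattened 20x16 feature map (640x512 input, stride 32).
rgb = torch.randn(1, 20 * 16, 64)
ir = torch.randn(1, 20 * 16, 64)
attn(rgb, ir, ir)
handle.remove()

# Average over queries, reshape to the feature grid, upsample to the
# image size, then overlay on the input (e.g. matplotlib imshow + alpha).
heat = captured["w"].mean(dim=1).reshape(1, 1, 20, 16)
heat = F.interpolate(heat, size=(640, 512), mode="bilinear", align_corners=False)
print(heat.shape)  # torch.Size([1, 1, 640, 512])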


Yaml files for DMFF

Hi,

Thanks for releasing the code for your research. I am trying to reproduce the experimental results mentioned in your paper, but I cannot find the DMFF yaml files under the transformer directory. Could you share them with me, or should I create them myself by replacing the NiN_fusion modules with the DMFF modules?

Thanks for taking the time to read this message.

MR is 99

The result.txt obtained after running test.py shows an MR of 99 when used for evaluation.

LLVIP results

Hello!
What are your experimental results on the LLVIP dataset? I ran the code from this paper on LLVIP, and the mAP@50 result is very low, more than ten points lower than CFT. Why might this be?

Train new models

Could you please tell me what command I should use to train a new model?

mAP metric problem

Hello, I tried to reproduce the mAP metrics on the FLIR dataset from your paper, but I got mAP50 of 0.81 and mAP0.5:0.95 of 0.39, which is a little different from your reported numbers.

weights/best.pt

Hello author!
Thank you for releasing the code. I am trying to reproduce the experimental results mentioned in your paper. In order to obtain your results, could you please provide the file '/home/shen/Chenyf/exp_save/multispectral-object-detection/5l_FLIR_3class_transformerx2_avgpool+maxpool/weights/best.pt'?

VEDAI dataset

Hello, could you please show the file directory structure of the VEDAI dataset?

MR?

Hello, thank you very much for your open-source work. After training with the code, I couldn't reproduce the author's results; the MR I get on the KAIST dataset is over 8. Why is that?

About MATLAB

Hello! I ran your code and found that part of train.py requires MATLAB. I am not very familiar with MATLAB. Do I need to install MATLAB itself, or is it sufficient to install the MATLAB engine library for Python? Could you please provide a terminal command? Thank you very much!

YOLOV5TorchObjectDetector

Thanks for your work! The code on GitHub is missing the following two imports; please share them:
from models.yolo_v5_object_detector import YOLOV5TorchObjectDetector
from deep_utils import Box, split_extension

NameError: name 'RegistrationBlock' is not defined

Traceback (most recent call last):
File "train.py", line 588, in
train_rgb_ir(hyp, opt, device, tb_writer)
File "train.py", line 92, in train_rgb_ir
model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
File "/root/ICAFusion-main/models/yolo_test.py", line 96, in init
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
File "/root/ICAFusion-main/models/yolo_test.py", line 302, in parse_model
elif m in [TransformerFusionBlock, RegistrationBlock, STNFusionBlock]:
NameError: name 'RegistrationBlock' is not defined

Whether the DMFF module can perform feature fusion on more than two branches

Hi,

Thank you for sharing your paper's code. After reading your article, I would like to ask whether the DMFF module can perform feature fusion on more than two branches (if the input of the model is a multispectral image with 4 different bands, is the approach proposed in your article still feasible?)

Thank you very much for your time.
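Nothing in cross-attention itself restricts fusion to two branches. One plausible generalization (purely illustrative and not evaluated in the paper; all names are invented) lets each band attend to the concatenated tokens of all the other bands:

import torch
import torch.nn as nn

class MultiBranchCrossAttention(nn.Module):
    # Illustrative N-modality extension: each branch queries all others.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, branches):
        # branches: list of (B, N, C) token sequences, one per band.
        fused = []
        for i, q in enumerate(branches):
            # Keys/values are the tokens of every *other* modality.
            kv = torch.cat([b for j, b in enumerate(branches) if j != i], dim=1)
            out, _ = self.attn(query=q, key=kv, value=kv)
            fused.append(q + out)
        return fused

# Example: four spectral bands of a 4-band multispectral image.
bands = [torch.randn(2, 100, 64) for _ in range(4)]
fused = MultiBranchCrossAttention(dim=64)(bands)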

Inference speed

Hi, Jifeng. Thanks for your work!

According to your paper, the inference speed is 38 FPS with an input size of 640×512. Also, after reading your code, I assume you use YOLOv5l as the base model.

On the other hand, this repo benefits greatly from the code of CFT. Since you didn't upload the checkpoints, I tried to test CFT with the YOLOv5l architecture at 640×512, but the speed is only 20 FPS on a 3090 GPU.

I find these results hard to reconcile, given that the computational complexity of your DMFF is not significantly lower than CFT's, as also reported on page 13 of your paper.

So it would be very helpful if you could provide your checkpoint and demo code.
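When comparing FPS across papers, the measurement protocol matters as much as the model. A minimal benchmark sketch (assuming a CUDA device and the two-input model(img_rgb, img_ir) call used in this repo's test.py):

import time
import torch

@torch.no_grad()
def measure_fps(model, img_rgb, img_ir, warmup=50, iters=200):
    model.eval()
    for _ in range(warmup):      # warm up kernels / cuDNN autotuning
        model(img_rgb, img_ir)
    torch.cuda.synchronize()     # CUDA is asynchronous: sync before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(img_rgb, img_ir)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)

# e.g. for the paper's 640x512 input:
# rgb = torch.randn(1, 3, 512, 640, device="cuda")
# ir = torch.randn(1, 3, 512, 640, device="cuda")
# print(measure_fps(model.cuda(), rgb, ir))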

KAIST

Hi,

Thanks for releasing the code for your research. I am trying to reproduce the experimental results mentioned in your paper, but I cannot find the cleaned KAIST dataset it mentions (8,963 and 2,252 weakly-aligned image pairs at a resolution of 640 × 512 for training and testing, respectively). Could you please post a link?

Shared parameters

Hello, thank you for your open-source contribution! I am very interested in your ICFE module. When the paper mentions sharing parameters during the iteration process, do you mean sharing hyperparameters between CrossTransformerBlock modules? I did not find any parameters shared between CrossTransformerBlocks.
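For readers with the same question: parameter sharing in this context normally means weight sharing rather than hyperparameter sharing; the same module instance, and therefore the same learned weights, is applied at every iteration. A schematic sketch of the distinction (not the repository's actual code):

import copy
import torch.nn as nn

class IterativeFusion(nn.Module):
    # Schematic: apply a fusion block n_iters times, optionally sharing weights.
    def __init__(self, block, n_iters=2, share=True):
        super().__init__()
        # share=True reuses the same instance, so n_iters forward passes
        # use a single set of learned weights; share=False makes deep
        # copies, and the parameter count grows linearly with n_iters.
        self.blocks = nn.ModuleList(
            [block] * n_iters if share
            else [copy.deepcopy(block) for _ in range(n_iters)])

    def forward(self, rgb, ir):
        for blk in self.blocks:
            rgb, ir = blk(rgb, ir)
        return rgb, ir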

MR=0

Hello author!
Thank you for releasing the code. I am trying to reproduce the experimental results mentioned in your paper, but when training on the KAIST dataset I found that the MR value is always 0. Could you tell me what causes this?

Parameter settings

Hello author, I would like to use your code as a baseline, so I would like to know the parameter settings you used when running the KAIST dataset. Thank you very much!

BBox_IOU

Hello, may I ask which bounding box regression loss function you chose?
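For context, an assumption rather than a confirmed answer: YOLOv5-derived codebases like this one typically default to CIoU for box regression, which adds a center-distance penalty and an aspect-ratio consistency term to plain IoU. A sketch of CIoU for (x1, y1, x2, y2) boxes:

import math
import torch

def ciou(box1, box2, eps=1e-7):
    # Intersection area
    x1 = torch.max(box1[:, 0], box2[:, 0])
    y1 = torch.max(box1[:, 1], box2[:, 1])
    x2 = torch.min(box1[:, 2], box2[:, 2])
    y2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union and plain IoU
    w1, h1 = box1[:, 2] - box1[:, 0], box1[:, 3] - box1[:, 1]
    w2, h2 = box2[:, 2] - box2[:, 0], box2[:, 3] - box2[:, 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest enclosing box
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between box centers
    rho2 = ((box2[:, 0] + box2[:, 2] - box1[:, 0] - box1[:, 2]) ** 2 +
            (box2[:, 1] + box2[:, 3] - box1[:, 1] - box1[:, 3]) ** 2) / 4

    # Aspect-ratio consistency term and its weight
    v = (4 / math.pi ** 2) * (
        torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v  # the loss is typically 1 - ciou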

yolo_v5_object_detector.py file

Can you please share the code of YOLOV5TorchObjectDetector class in yolo_v5_object_detector.py? Especially the preprocessing function.

VEDAI dataset

Hello, could you give a link to your copy of the VEDAI dataset?
