The following graph and link are taken from the original CSRA paper:
Residual Attention: A Simple But Effective Method for Multi-Label Recognition
This package was modified and developed based on the original code for learning purposes.
- Python 3.9.7
- pytorch 1.11.0
- torchvision 0.12.0
- tqdm 4.63.0, pillow 9.0.1
- scikit-learn 1.0.2
Only VOC2007 was used, and the following structure is expected:
Dataset/
|-- VOCdevkit/
|---- VOC2007/
|------ JPEGImages/
|------ Annotations/
|------ ImageSets/
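Before generating annotations, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the original code: the helper name `missing_voc_dirs` is hypothetical, and the root path `Dataset/VOCdevkit` is taken from the structure shown above.

```python
from pathlib import Path

# Sub-directories named in the expected layout above.
EXPECTED_SUBDIRS = ("JPEGImages", "Annotations", "ImageSets")


def missing_voc_dirs(devkit_root):
    """Return the expected VOC2007 sub-directories that do not exist.

    Hypothetical helper for illustration; it only checks that the
    directories are present, not that their contents are valid.
    """
    voc_root = Path(devkit_root) / "VOC2007"
    return [name for name in EXPECTED_SUBDIRS if not (voc_root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_voc_dirs("Dataset/VOCdevkit")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("VOC2007 layout looks complete.")
```

An empty list means all three directories were found under `VOC2007/`.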
Then run the following command to generate the annotation JSON file used by the code for this dataset:
python utils/prepare/prepare_voc.py --data_path Dataset/VOCdevkit
This writes the annotation JSON files to ./data/voc07 (the original code also uses ./data/coco and ./data/wider for the other datasets).
We provide prediction demos of our models. The demo images (picked from VOC2007) have already been placed in ./utils/demo_images/. You can run demo.py with our CSRA models pretrained on VOC2007:
CUDA_VISIBLE_DEVICES=0 python demo.py --model resnet101 --num_heads 1 --lam 0.1 --dataset voc07 --load_from OUR_VOC_PRETRAINED.pth --img_dir utils/demo_images
which produces output like this:
utils/demo_images/000001.jpg prediction: dog,person,
utils/demo_images/000004.jpg prediction: car,
utils/demo_images/000002.jpg prediction: train,
...
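If the demo output needs further processing, each line can be split into an image path and a list of labels. The sketch below assumes only the line format shown above (path, then `prediction:`, then comma-separated labels with a trailing comma); the function name `parse_prediction_line` is hypothetical and is not part of demo.py.

```python
def parse_prediction_line(line):
    """Split '<image path> prediction: a,b,' into (path, [labels])."""
    path, sep, labels = line.partition(" prediction:")
    if not sep:
        raise ValueError(f"not a prediction line: {line!r}")
    # The trailing comma produces an empty final field, which is dropped.
    return path.strip(), [name for name in labels.strip().split(",") if name]


sample_lines = [
    "utils/demo_images/000001.jpg prediction: dog,person,",
    "utils/demo_images/000004.jpg prediction: car,",
]
for line in sample_lines:
    print(parse_prediction_line(line))
```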
Please download the pre-trained models from the links provided by the authors of the original CSRA paper.
ResNet101 trained on ImageNet with CutMix augmentation can be downloaded here.
Dataset | Backbone | Head nums | mAP(%) | Resolution | Download |
---|---|---|---|---|---|
VOC2007 | ResNet-101 | 1 | 94.7 | 448x448 | download |
VOC2007 | ResNet-cut | 1 | 95.2 | 448x448 | download |
COCO | ResNet-101 | 4 | 83.3 | 448x448 | download |
COCO | ResNet-cut | 6 | 85.6 | 448x448 | download |
COCO | VIT_L16_224 | 8 | 86.5 | 448x448 | download |
COCO | VIT_L16_224* | 8 | 86.9 | 448x448 | download |
Wider | VIT_B16_224 | 1 | 89.0 | 224x224 | download |
Wider | VIT_L16_224 | 1 | 90.2 | 224x224 | download |
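The mAP(%) column in the table is mean average precision over classes. As a rough illustration of the metric, here is a minimal pure-Python sketch that uses non-interpolated average precision. This is an assumption for illustration only: the evaluation code in this repository may use a different protocol (for example an interpolated VOC-style AP), and the function names are hypothetical.

```python
def average_precision(scores, targets):
    """Non-interpolated AP for one class; targets are 0/1 labels."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, target) in enumerate(ranked, start=1):
        if target:
            hits += 1
            # Precision measured at the rank of each positive example.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_rows, target_rows):
    """mAP in percent; each row holds per-class values for one image."""
    num_classes = len(score_rows[0])
    aps = [
        average_precision(
            [row[c] for row in score_rows],
            [row[c] for row in target_rows],
        )
        for c in range(num_classes)
    ]
    return 100.0 * sum(aps) / num_classes


# Toy example: 3 images, 2 classes (made-up numbers, not real results).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_targets = [[1, 0], [0, 1], [1, 0]]
print(f"{mean_average_precision(toy_scores, toy_targets):.2f}")
```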
For VOC2007, run the following validation examples:
set CUDA_VISIBLE_DEVICES=0 & python val.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --load_from MODEL.pth
set CUDA_VISIBLE_DEVICES=0 & python RF.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --load_from MODEL.pth
For RF.py, --model can be RF or BRRF.
Other variable options:
- --svm activates BRSVM if True.
- --Extra_feature activates extra feature filtering if True.
You can run either of the two commands below:
CUDA_VISIBLE_DEVICES=0 python main.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20
CUDA_VISIBLE_DEVICES=0 python main.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --cutmix CutMix_ResNet101.pth
Note that the first command uses the official ResNet-101 backbone, while the second uses the ResNet-101 pretrained on ImageNet with CutMix augmentation (CutMix_ResNet101.pth), which is expected to perform better.
This extension study was modified and developed by: Yunan Zhou (Master's student, Department of Systems Design Engineering, University of Waterloo, Canada)
CSRA Authors: Ke Zhu (http://www.lamda.nju.edu.cn/zhuk/), Jianxin Wu ([email protected]), Lin Sui (http://www.lamda.nju.edu.cn/suil/)