EMO


Official PyTorch implementation of "Rethinking Mobile Block for Efficient Attention-based Models" (ICCV 2023).

Abstract: This paper focuses on developing modern, efficient, lightweight models for dense prediction while trading off parameters, FLOPs, and performance. The Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterpart has been recognized in attention-based studies. This work rethinks the lightweight infrastructure of the efficient IRB and the effective components of the Transformer from a unified perspective, extending the CNN-based IRB to attention-based models and abstracting a one-residual Meta Mobile Block (MMB) for lightweight model design. Following a simple but effective design criterion, we deduce a modern Inverted Residual Mobile Block (iRMB) and build a ResNet-like Efficient MOdel (EMO) with only iRMBs for downstream tasks. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of EMO over state-of-the-art methods, e.g., EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing equal-order CNN-/attention-based models while trading off parameters, efficiency, and accuracy well: running 2.8-4.0x faster than EdgeNeXt on an iPhone 14. Code is available.
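
To make the iRMB idea more concrete, below is a minimal PyTorch sketch of an inverted-residual block that applies self-attention and a depth-wise convolution in the expanded feature space, with a single residual connection as in the MMB abstraction. It is illustrative only: it omits details of the official EMO implementation (e.g., windowed attention, normalization choices, DropPath), and the class and parameter names are ours, not the repo's.

```python
import torch
import torch.nn as nn

class InvertedResidualAttnBlock(nn.Module):
    """Illustrative inverted-residual block with attention; NOT the official iRMB."""
    def __init__(self, dim, expand_ratio=4, num_heads=4):
        super().__init__()
        hidden = int(dim * expand_ratio)
        self.norm = nn.BatchNorm2d(dim)
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)                    # 1x1 expansion
        self.attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)   # depth-wise 3x3
        self.act = nn.SiLU()
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)                   # 1x1 projection

    def forward(self, x):
        b, c, h, w = x.shape
        y = self.expand(self.norm(x))
        tokens = y.flatten(2).transpose(1, 2)                                  # (B, H*W, hidden)
        attn_out, _ = self.attn(tokens, tokens, tokens)                        # attention in expanded space
        y = y + attn_out.transpose(1, 2).reshape(b, -1, h, w)
        y = self.act(self.dwconv(y))                                           # local modeling
        y = self.project(y)
        return x + y                                                           # one residual, as in MMB

# quick shape check
blk = InvertedResidualAttnBlock(dim=64)
print(blk(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```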

Top: The unified Meta Mobile Block, abstracted from the Multi-Head Self-Attention and Feed-Forward Network in the Transformer as well as the efficient Inverted Residual Block in MobileNet-v2. Drawing on the experience of lightweight CNNs and Transformers, an efficient yet effective EMO is designed based on the deduced iRMB.
Bottom: Performance vs. FLOPs comparisons with SoTA Transformer-based methods.


Main results

Image Classification for ImageNet-1K:

| Model  | #Params | FLOPs | Resolution | Top-1 | Log |
|--------|---------|-------|------------|-------|-----|
| EMO-1M | 1.3M    | 261M  | 224 x 224  | 71.5  | log |
| EMO-2M | 2.3M    | 439M  | 224 x 224  | 75.1  | log |
| EMO-5M | 5.1M    | 903M  | 224 x 224  | 78.4  | log |
| EMO-6M | 6.1M    | 961M  | 224 x 224  | 79.0  | log |

Object Detection Performance Based on SSDLite for COCO2017:

| Backbone | AP   | AP50 | AP75 | APS | APM  | APL  | #Params | FLOPs | Log |
|----------|------|------|------|-----|------|------|---------|-------|-----|
| EMO-1M   | 22.0 | 37.3 | 22.0 | 2.1 | 20.6 | 43.2 | 2.3M    | 0.6G  | log |
| EMO-2M   | 25.2 | 42.0 | 25.3 | 3.3 | 25.9 | 47.6 | 3.3M    | 0.9G  | log |
| EMO-5M   | 27.9 | 45.2 | 28.1 | 5.2 | 30.2 | 50.6 | 6.0M    | 1.8G  | log |

Object Detection Performance Based on RetinaNet for COCO2017:

| Backbone | AP   | AP50 | AP75 | APS  | APM  | APL  | #Params | FLOPs | Log |
|----------|------|------|------|------|------|------|---------|-------|-----|
| EMO-1M   | 34.4 | 54.2 | 36.2 | 20.2 | 37.1 | 46.0 | 10.4M   | 163G  | log |
| EMO-2M   | 36.2 | 56.6 | 38.1 | 21.7 | 38.8 | 48.1 | 11.5M   | 167G  | log |
| EMO-5M   | 38.9 | 59.8 | 41.0 | 23.8 | 42.2 | 51.7 | 14.4M   | 178G  | log |

Semantic Segmentation Based on DeepLabv3 for ADE20K:

| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs | Log |
|----------|------|------|------|---------|-------|-----|
| EMO-1M   | 75.0 | 33.5 | 44.2 | 5.6M    | 2.4G  | log |
| EMO-2M   | 75.6 | 35.3 | 46.0 | 6.9M    | 3.5G  | log |
| EMO-5M   | 77.6 | 37.8 | 48.2 | 10.3M   | 5.8G  | log |

Semantic Segmentation Based on PSPNet for ADE20K:

| Backbone | aAcc | mIoU | mAcc | #Params | FLOPs | Log |
|----------|------|------|------|---------|-------|-----|
| EMO-1M   | 74.8 | 33.2 | 43.4 | 4.3M    | 2.1G  | log |
| EMO-2M   | 75.5 | 34.5 | 44.9 | 5.5M    | 3.1G  | log |
| EMO-5M   | 77.6 | 38.2 | 49.0 | 8.5M    | 5.3G  | log |
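
Figures like the #Params and FLOPs columns above are typically measured along the lines of the sketch below (torchprofile is installed in the Environments step). This is a hedged illustration, not the authors' measurement code: torchvision's mobilenet_v3_small is only a stand-in for an EMO model built by this repo, and reported "FLOPs" in such tables usually correspond to multiply-accumulate (MAC) counts.

```python
import torch
from torchvision.models import mobilenet_v3_small  # placeholder backbone, not EMO
from torchprofile import profile_macs

model = mobilenet_v3_small().eval()
dummy = torch.randn(1, 3, 224, 224)                 # matches the 224 x 224 resolution in the table
n_params = sum(p.numel() for p in model.parameters())
macs = profile_macs(model, dummy)                   # multiply-accumulate operations
print(f"#Params: {n_params / 1e6:.1f}M, MACs: {macs / 1e6:.0f}M")
```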

Classification

Environments

conda install -y pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip3 install timm==0.6.5 tensorboardX einops torchprofile fvcore
(Optional) git clone https://github.com/NVIDIA/apex && cd apex && pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
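
An optional sanity check after installation, just to confirm the pinned packages import and CUDA is visible:

```python
import torch, torchvision, timm

print("torch", torch.__version__, "| torchvision", torchvision.__version__, "| timm", timm.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPUs:", torch.cuda.device_count())
```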

Prepare ImageNet-1K Dataset

Download and extract the ImageNet-1K dataset into the following directory structure:

├── imagenet
    ├── train
        ├── n01440764
            ├── n01440764_10026.JPEG
            ├── ...
        ├── ...
    ├── train.txt (optional)
    ├── val
        ├── n01440764
            ├── ILSVRC2012_val_00000293.JPEG
            ├── ...
        ├── ...
    └── val.txt (optional)
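
The layout above follows the standard class-per-folder convention, so a quick optional check of the extraction (using torchvision's ImageFolder rather than this repo's data pipeline) is:

```python
from torchvision import datasets

# Expect 1000 classes, ~1.28M train images and 50,000 val images for ImageNet-1K.
for split in ("train", "val"):
    ds = datasets.ImageFolder(f"imagenet/{split}")
    print(split, len(ds), "images,", len(ds.classes), "classes")
```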

Test

Test with 8 GPUs in one node:

EMO-1M
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m test model.name=EMO_1M trainer.data.batch_size=2048 model.model_kwargs.checkpoint_path=resources/EMO-1M/net.pth

This should give Top-1: 71.498 (Top-5: 90.368).

EMO-2M
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m test model.name=EMO_2M trainer.data.batch_size=2048 model.model_kwargs.checkpoint_path=resources/EMO-2M/net.pth

This should give Top-1: 75.134 (Top-5: 92.184).

EMO-5M
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m test model.name=EMO_5M trainer.data.batch_size=2048 model.model_kwargs.checkpoint_path=resources/EMO-5M/net.pth

This should give Top-1: 78.422 (Top-5: 93.970).

EMO-6M
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m test model.name=EMO_6M trainer.data.batch_size=2048 model.model_kwargs.checkpoint_path=resources/EMO-6M/net.pth

This should give Top-1: 78.988 (Top-5: 94.162).
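
Before launching a multi-GPU test, it can be worth confirming that a downloaded checkpoint deserializes at all. The key layout inside net.pth is repo-specific, so the sketch below only loads the file and inspects it:

```python
import torch

state = torch.load("resources/EMO-1M/net.pth", map_location="cpu")  # path from the command above
print(type(state))
if isinstance(state, dict):
    print("top-level keys:", list(state.keys())[:5])
```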

Train

Train with 8 GPUs in one node:

python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --use_env run.py -c configs/mobile/cls_emo -m train model.name=EMO_1M trainer.data.batch_size=2048


Down-Stream Tasks

Object Detection

  • Refer to MMDetection for the environment setup.
  • Configs can be found in down-stream-tasks/mmdetection/configs/ssd_emo and down-stream-tasks/mmdetection/configs/retinanet_emo.
  • E.g.:
    ./tools/dist_train.sh configs/ssd_emo/ssdlite_emo_5M_pretrain_coco.py $GPU_NUM # for SSDLite with EMO-5M
    ./tools/dist_train.sh configs/retinanet_emo/retinanet_emo_5M_fpn_1x_coco.py $GPU_NUM # for RetinaNet with EMO-5M
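
For a quick single-image check of a trained detector, the standard MMDetection (2.x-style) Python API can be used. This is a hedged sketch, not part of this repo's tooling: the config path comes from the configs listed above, while the checkpoint and image paths are placeholders, and the exact API may differ if the bundled MMDetection version is newer.

```python
from mmdet.apis import init_detector, inference_detector

config = "configs/ssd_emo/ssdlite_emo_5M_pretrain_coco.py"   # from this repo
checkpoint = "work_dirs/ssdlite_emo_5M/latest.pth"           # placeholder path
model = init_detector(config, checkpoint, device="cuda:0")
result = inference_detector(model, "demo.jpg")               # placeholder image
print(type(result))
```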

Semantic Segmentation

  • Refer to MMSegmentation for the environment setup.
  • Configs can be found in down-stream-tasks/mmsegmentation/configs/deeplabv3_emo and down-stream-tasks/mmsegmentation/configs/pspnet_emo.
  • E.g.:
    ./tools/dist_train.sh configs/deeplabv3_emo/deeplabv3_emo_5M_pretrain_512x512_80k_ade20k.py $GPU_NUM # for DeepLabv3 with EMO-5M
    ./tools/dist_train.sh configs/pspnet_emo/pspnet_emo_5M_512x512_80k_ade20k.py $GPU_NUM # for PSPNet with EMO-5M

Mobile Evaluation
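
This section is empty in the README; the paper reports iPhone 14 latency for EMO. One common route to on-device measurement (not necessarily the authors' pipeline) is exporting a traced model with coremltools and profiling it with Xcode's Core ML performance report. The sketch below uses a torchvision stand-in because the EMO constructor lives in this repo:

```python
import torch
import coremltools as ct
from torchvision.models import mobilenet_v3_small  # stand-in for an EMO backbone

model = mobilenet_v3_small().eval()
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example.shape)], convert_to="mlprogram")
mlmodel.save("emo_standin.mlpackage")  # profile this package on-device via Xcode
```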

Citation

If our work is helpful for your research, please consider citing:

@inproceedings{emo,
  title={Rethinking Mobile Block for Efficient Attention-based Models},
  author={Zhang, Jiangning and Li, Xiangtai and Li, Jian and Liu, Liang and Xue, Zhucun and Zhang, Boshen and Jiang, Zhengkai and Huang, Tianxin and Wang, Yabiao and Wang, Chengjie},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1389--1400},
  year={2023}
}

Acknowledgements

We thank, among others, the following repositories for providing assistance with our research:
