
FastMIM, the official PyTorch implementation of our paper "FastMIM: Expediting Masked Image Modeling Pre-training for Vision" (https://arxiv.org/pdf/2212.06593.pdf).


Introduction

Comparison among the MAE, SimMIM, and our FastMIM frameworks. MAE randomly masks the input patches and discards the masked ones. Although its encoder then sees only a small number of patches, MAE can only pre-train isotropic ViTs, which produce single-scale intermediate features. SimMIM preserves the input resolution and can serve as a generic framework for all kinds of vision backbones, but its encoder must process the full set of patches. FastMIM simply reduces the input resolution and replaces the pixel target with a HOG target. These modifications are simple yet effective: FastMIM (i) pre-trains faster; (ii) consumes less memory; (iii) serves as a generic framework for all kinds of architectures; and (iv) achieves comparable or even better performance than previous methods.
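As a rough illustration of the HOG target, here is a generic skimage-based sketch; the cell size, orientation count, and normalization settings are assumptions, not the repo's implementation (see the --block_size flag in the pre-training command below for the setting actually used).

```python
# Minimal sketch of a HOG regression target, assuming skimage is available.
import numpy as np
from skimage.feature import hog

def hog_target(image: np.ndarray, cell: int = 8, orientations: int = 9) -> np.ndarray:
    """image: HxWx3 uint8 array; returns a flat HOG feature vector
    that the decoder would regress instead of raw pixels."""
    return hog(
        image,
        orientations=orientations,
        pixels_per_cell=(cell, cell),
        cells_per_block=(1, 1),  # per-cell histograms, no overlapping block normalization
        channel_axis=-1,
        feature_vector=True,
    )
```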

Setup

- python 3.x
- cuda 10.x
- torch >= 1.7.0
- mmcv-full >= 1.4.4

# other PyTorch/CUDA/timm versions can also work

# install the pip dependencies
sh requirement_pip_install.sh

# build your apex (optional)
cd /your_path_to/apex-master/;
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The expected directory structure is shown below (a minimal loading sketch follows the tree):

│path/to/imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
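This tree follows torchvision's ImageFolder convention, so a standard loader can consume it directly. A minimal sketch (the transform mirrors the pre-training flags used below, but is otherwise illustrative and not the repo's loader):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# Mirrors --input_size 128 --rrc_scale 0.2 1.0 from the pre-training command
transform = T.Compose([
    T.RandomResizedCrop(128, scale=(0.2, 1.0)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = ImageFolder("/your_path_to/data/imagenet/train", transform=transform)
```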

Pre-training on ImageNet-1K

To pre-train Swin-B on ImageNet-1K on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 main_pretrain.py --model mim_swin_base --data_path /your_path_to/data/imagenet/ --epochs 400 --warmup_epochs 10 --blr 1.5e-4 --weight_decay 0.05 --output_dir /your_path_to/fastmim_pretrain_output/ --batch_size 256 --save_ckpt_freq 50 --num_workers 10 --mask_ratio 0.75 --norm_pix_loss --input_size 128 --rrc_scale 0.2 1.0 --window_size 4 --decoder_embed_dim 256 --decoder_depth 4 --mim_loss HOG --block_size 32
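For intuition, --mask_ratio 0.75 hides three quarters of the patch tokens during pre-training. Below is a minimal, illustrative sketch of per-sample random masking; the function name and shapes are assumptions, not the repo's code.

```python
import torch

def random_mask(num_patches: int, mask_ratio: float = 0.75) -> torch.Tensor:
    """Return a boolean mask of shape (num_patches,), True = masked."""
    num_masked = int(num_patches * mask_ratio)
    noise = torch.rand(num_patches)   # one random score per patch
    ids = noise.argsort()             # random permutation of patch indices
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[ids[:num_masked]] = True
    return mask

# A 128x128 input with 4x4-pixel patches gives 32*32 = 1024 patches,
# of which exactly 768 are masked at a 0.75 ratio.
print(random_mask(1024).sum())  # tensor(768)
```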

Fine-tuning on ImageNet-1K

To fine-tune Swin-B on ImageNet-1K on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py --model swin_base_patch4_window7_224 --data_path /your_path_to/data/imagenet/ --batch_size 128 --epochs 100 --blr 1.0e-3 --layer_decay 0.80 --weight_decay 0.05 --drop_path 0.1 --dist_eval --finetune /your_path_to_ckpt/checkpoint-399.pth --output_dir /your_path_to/fastmim_finetune_output/
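--layer_decay 0.80 applies MAE-style layer-wise learning-rate decay: layers closer to the input receive geometrically smaller learning rates than the head. A minimal sketch of the scaling rule (the layer indexing is an assumption, not the repo's exact scheme):

```python
def layer_lr(base_lr: float, layer_id: int, num_layers: int, decay: float = 0.80) -> float:
    """Layer 0 = patch embedding, layer num_layers = classifier head."""
    return base_lr * decay ** (num_layers - layer_id)

# With 12 blocks and blr = 1.0e-3: the head trains at the full rate,
# while the patch embedding trains at 0.8**12 of it.
for i in (0, 6, 12):
    print(i, layer_lr(1.0e-3, i, 12))
```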

Notice

We build our object detection and semantic segmentation codebases on mmdet-v2.23 and mmseg-v0.28. However, we also back-port some features from newer mmdet versions (e.g., simple copy-paste) into our mmdet-v2.23, so if you download the vanilla mmdet-v2.23 from MMDet, the code may report some errors.

Results and Models

Results and checkpoints will be updated when I recover from COVID-19. :(

Classification on ImageNet-1K (ViT-B/Swin-B/PVTv2-b2/CMT-S)

| Model  | #Params | PT Res. | PT Epochs | PT log/ckpt | FT Res. | FT log/ckpt | Top-1 (%) |
|--------|---------|---------|-----------|-------------|---------|-------------|-----------|
| ViT-B  | 86M     | 128x128 | 800       | log/ckpt    | 224x224 | log/ckpt    | 83.8      |
| Swin-B | 88M     | 128x128 | 400       | log/ckpt    | 224x224 | log/ckpt    | 84.1      |

Object Detection on COCO (Swin-B based Mask R-CNN)

Semantic Segmentation on ADE20K (ViT-B based UperNet)

Citation

If you find this project useful in your research, please consider citing:

@article{guo2022fastmim,
  title={FastMIM: Expediting Masked Image Modeling Pre-training for Vision},
  author={Guo, Jianyuan and Han, Kai and Wu, Han and Tang, Yehui and Wang, Yunhe and Xu, Chang},
  journal={arXiv preprint arXiv:2212.06593},
  year={2022}
}

Acknowledgement

The classification task in this repo is based on MAE, SimMIM, SlowFast and timm.

The object detection task in this repo is based on MMDet, ViTDet and Swin-Transformer-Object-Detection.

The semantic segmentation task in this repo is based on MMSeg and BEiT.

License

License: MIT
