Learning Imbalanced Data with Vision Transformers

Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai and Chun Yuan

This repository is the official PyTorch implementation of the paper LiVT in CVPR 2023.  

 

Environments

python == 3.7
pytorch >= 1.7.0
torchvision >= 0.8.1
timm == 0.3.2
tensorboardX >= 2.1
  1. We recommend installing PyTorch 1.7.0+, torchvision 0.8.1+ and pytorch-image-models (timm) 0.3.2.
  2. If your PyTorch is 1.8.1+, a fix is needed to work with timm.
  3. See requirements.txt for detailed requirements. It is provided for reference only; you do not have to match it exactly.
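
The version constraints above can be checked before training. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `satisfies`, `needs_timm_fix`, `check_environment`) are hypothetical, and the version parsing is deliberately simple (it keeps the first three numeric fields and ignores build suffixes such as `+cu111`).

```python
import importlib
import re

# Constraints copied from the list above.
REQUIREMENTS = {
    "torch": (">=", "1.7.0"),
    "torchvision": (">=", "0.8.1"),
    "timm": ("==", "0.3.2"),
    "tensorboardX": (">=", "2.1"),
}


def parse_version(text):
    """Turn '1.8.1+cu111' into (1, 8, 1); local build suffixes are ignored."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def satisfies(installed, op, required):
    """Compare two version strings with '>=' or '=='."""
    a, b = parse_version(installed), parse_version(required)
    # Pad to equal length so that (2, 1) compares correctly with (2, 1, 0).
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b if op == ">=" else a == b


def needs_timm_fix(torch_version):
    """Item 2 above: PyTorch 1.8.1+ needs a fix to work with timm."""
    return satisfies(torch_version, ">=", "1.8.1")


def check_environment():
    """Import each package and report whether its version fits the constraint."""
    report = {}
    for name, (op, required) in REQUIREMENTS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        version = getattr(module, "__version__", "0")
        if satisfies(version, op, required):
            report[name] = "ok"
        else:
            report[name] = f"found {version}, want {op} {required}"
    return report
```

Calling `check_environment()` returns a dictionary mapping each package name to a status string, and `needs_timm_fix(torch.__version__)` indicates whether item 2 applies to your installation.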

Data preparation

We adopt torchvision.datasets.ImageFolder to build our dataloaders. Hence, we reorganize all datasets (ImageNet-LT, iNat18, Places-LT, CIFAR) as follows:

/path/to/ImageNet-LT/
    train/
        class1/
            img1.jpeg
        class2/
            img2.jpeg
    val/
        class1/
            img3.jpeg
        class2/
            img4.jpeg

You can follow prepare.py to construct your dataset.
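
For orientation, the directory layout above can be produced with a few lines of standard-library Python. This is a sketch only and is not the repository's prepare.py: the function names are hypothetical, and the split-file format (`<relative/path> <label>` per line) is an assumption about how the long-tailed split lists are distributed.

```python
import os
import shutil


def build_imagefolder(samples, split_root, copy=True):
    """Place each (image_path, label) pair under split_root/<label>/.

    Labels become the class directory names that ImageFolder reads.
    Files that share a basename within one class would overwrite each
    other; this sketch does not handle that case.
    """
    placed = []
    for image_path, label in samples:
        class_dir = os.path.join(split_root, str(label))
        os.makedirs(class_dir, exist_ok=True)
        target = os.path.join(class_dir, os.path.basename(image_path))
        if copy:
            shutil.copy2(image_path, target)
        else:
            # Symlinks save disk space when the source images stay in place.
            os.symlink(os.path.abspath(image_path), target)
        placed.append(target)
    return placed


def read_split_file(list_path):
    """Parse lines of the form '<relative/path> <label>' (assumed format)."""
    samples = []
    with open(list_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Split on the last whitespace so paths containing spaces survive.
            path, label = line.rsplit(maxsplit=1)
            samples.append((path, label))
    return samples
```

For example, `build_imagefolder(read_split_file("train.txt"), "/path/to/ImageNet-LT/train")` would populate the `train/` tree, provided the paths in the (hypothetical) `train.txt` resolve from the current working directory.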

Detailed information about these datasets is shown as follows:

 

Usage

  1. Please set DATA_PATH and WORK_PATH in util/trainer.py, lines 6-7.

  2. Typically, make sure that 4 or 8 GPUs with more than 12 GB of memory each are available.

  3. Keep the settings consistent with the following.

 

 

 

You can see all arguments in the Trainer class in util/trainer.py.

Specifically, the commands for the different stages are:

# MGP stage
python script/pretrain.py
# BFT stage
python script/finetune.py
# evaluate stage
python script/evaluate.py
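
The three stages are meant to run in the order listed. As a rough sketch (not part of the repository), a small driver could chain them and stop at the first failure. The function `run_pipeline` and its `runner` parameter are hypothetical; the scripts are invoked here without arguments, exactly as listed above, so any arguments or multi-GPU launcher your setup requires would need to be added.

```python
import subprocess
import sys

# Stage names and scripts exactly as listed above.
STAGES = [
    ("MGP", "script/pretrain.py"),
    ("BFT", "script/finetune.py"),
    ("evaluate", "script/evaluate.py"),
]


def run_pipeline(stages=STAGES, runner=subprocess.call, python=sys.executable):
    """Run each stage in order and stop at the first non-zero exit code.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, script in stages:
        exit_code = runner([python, script])
        if exit_code != 0:
            print(f"{name} stage failed with exit code {exit_code}")
            break
        completed.append(name)
    return completed
```

The `runner` argument exists so the control flow can be exercised without launching training, for instance by passing a function that only records the command it receives.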

Results and Models

Balanced Finetuned Models and Masked Generative Pretrained Models.

Dataset      Resolution  Many  Med.  Few   Acc   args      log       ckpt      MGP ckpt
ImageNet-LT  224*224     73.6  56.4  41.0  60.9  download  download  download  Res_224
ImageNet-LT  384*384     76.4  59.7  42.7  63.8  download  download  download
iNat18       224*224     78.9  76.5  74.8  76.1  download  download  download  Res_128
iNat18       384*384     83.2  81.5  79.7  81.0  download  download  download
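
The Many / Med. / Few columns report accuracy on class groups defined by training-set size. The sketch below shows one common way to compute such numbers; it is not taken from the repository. The thresholds (more than 100 training images for Many, fewer than 20 for Few) follow the convention widely used for long-tailed benchmarks and are an assumption here, as is the choice to average per-class accuracy within each group. The function names are hypothetical.

```python
from collections import defaultdict


def split_of(train_count, many_threshold=100, few_threshold=20):
    """Assign a class to Many / Med. / Few by its training-set size.

    The default thresholds are an assumption, not stated in this README.
    """
    if train_count > many_threshold:
        return "Many"
    if train_count < few_threshold:
        return "Few"
    return "Med."


def grouped_accuracy(labels, predictions, train_counts):
    """Return overall accuracy and mean per-class accuracy per group, in %."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, prediction in zip(labels, predictions):
        total[label] += 1
        correct[label] += int(label == prediction)

    # Collect the per-class accuracies of every class seen at test time.
    per_group = defaultdict(list)
    for label, count in total.items():
        per_group[split_of(train_counts[label])].append(correct[label] / count)

    result = {group: 100.0 * sum(accs) / len(accs) for group, accs in per_group.items()}
    result["Acc"] = 100.0 * sum(correct.values()) / len(labels)
    return result
```

Here `labels` and `predictions` are equal-length lists of class indices, and `train_counts` maps each class index to its number of training images.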

Citation

If you find our idea or code inspiring, please cite our paper:

@inproceedings{LiVT,
  title={Learning Imbalanced Data with Vision Transformers},
  author={Xu, Zhengzhuo and Liu, Ruikang and Yang, Shuo and Chai, Zenghao and Yuan, Chun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

This code is partially based on Prior-LT. If you use our code, please also cite:

@inproceedings{PriorLT,
  title={Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective},
  author={Xu, Zhengzhuo and Chai, Zenghao and Yuan, Chun},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

Acknowledgements

This project is heavily based on DeiT and MAE.

The CIFAR code is based on LDAM and Prior-LT.

The loss implementations are based on CB, LDAM, LADE, Prior-LT and MiSLAS.
