yhlleo / vts-drloc

NeurIPS 2021, Official codes for "Efficient Training of Visual Transformers with Small Datasets".

License: MIT License

Topics: efficient-deep-learning, self-supervised-learning, vision-transformer

vts-drloc's Introduction

Efficient Training of Visual Transformers with Small Datasets


To appear in NeurIPS 2021.

[paper][Poster & Video][arXiv][code] [reviews]
Yahui Liu¹,³, Enver Sangineto¹, Wei Bi², Nicu Sebe¹, Bruno Lepri³, Marco De Nadai³
¹University of Trento, Italy; ²Tencent AI Lab, China; ³Bruno Kessler Foundation, Italy.

Data preparation

| Dataset | Download Link |
| --- | --- |
| ImageNet | train, val |
| CIFAR-10 | all |
| CIFAR-100 | all |
| SVHN | train, test, extra |
| Oxford-Flower102 | images, labels, splits |
| Clipart | images, train_list, test_list |
| Infograph | images, train_list, test_list |
| Painting | images, train_list, test_list |
| Quickdraw | images, train_list, test_list |
| Real | images, train_list, test_list |
| Sketch | images, train_list, test_list |
  • Download the datasets and pre-process some of them (i.e., ImageNet and DomainNet) using the code in the scripts folder.
  • The datasets are prepared with the following structure (except CIFAR-10/100 and SVHN); a loading sketch follows the tree below:
dataset_name
  |__train
  |    |__category1
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__category2
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__...
  |__val
       |__category1
       |    |__xxx.jpg
       |    |__...
       |__category2
       |    |__xxx.jpg
       |    |__...
       |__...
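
Since this layout matches torchvision's standard ImageFolder convention, the folders can be loaded in a few lines. The following is a minimal sketch, assuming torchvision is installed; the repo's actual data pipeline may differ in augmentations and sizes:

```python
# Minimal loading sketch for the layout above (assumes torchvision;
# the repo's own data pipeline may differ).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # the models here take 224x224 inputs
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("dataset_name/train", transform=transform)
val_set   = datasets.ImageFolder("dataset_name/val", transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
val_loader   = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=4)
```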

Training

After preparing the datasets, we can start training with 8 NVIDIA V100 GPUs:

sh train.sh
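
The script train.sh wraps the repo's distributed training entry point, which optimizes the usual cross-entropy loss plus the paper's dense relative localization (drloc) objective. The sketch below is an illustrative reimplementation of that auxiliary loss, not the repo's exact code; the names drloc_mlp, lambda_drloc, and the pair count m are assumptions:

```python
# Illustrative sketch of the dense relative localization (drloc) loss:
# sample random pairs of token embeddings from the final feature grid and
# regress their normalized 2D offset with a small MLP. Names here are
# assumptions, not the repo's actual identifiers.
import torch
import torch.nn.functional as F

def drloc_loss(feat, drloc_mlp, m=64):
    """feat: (B, H, W, C) final token grid; drloc_mlp maps 2C -> 2 offsets."""
    B, H, W, C = feat.shape
    dev = feat.device
    # Sample m random position pairs per image.
    y1 = torch.randint(0, H, (B, m), device=dev)
    x1 = torch.randint(0, W, (B, m), device=dev)
    y2 = torch.randint(0, H, (B, m), device=dev)
    x2 = torch.randint(0, W, (B, m), device=dev)
    b = torch.arange(B, device=dev).unsqueeze(1).expand(B, m)
    pairs = torch.cat([feat[b, y1, x1], feat[b, y2, x2]], dim=-1)  # (B, m, 2C)
    pred = drloc_mlp(pairs)                                        # (B, m, 2)
    # Ground-truth offsets, normalized by the grid size.
    target = torch.stack([(y1 - y2).float() / H,
                          (x1 - x2).float() / W], dim=-1)
    return F.l1_loss(pred, target)

# Example wiring (lambda_drloc is a tuning knob, assumed small):
#   drloc_mlp = torch.nn.Sequential(torch.nn.Linear(2 * C, 256),
#                                   torch.nn.ReLU(), torch.nn.Linear(256, 2))
#   loss = F.cross_entropy(logits, labels) + lambda_drloc * drloc_loss(feat, drloc_mlp)
```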

Evaluation

We can also load a pre-trained model and test its performance:

sh eval.sh
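
eval.sh wraps the repo's full evaluation pipeline. For a quick manual check, a checkpoint can also be loaded and scored directly; this is a minimal sketch that assumes a model built as in the repo, the val_loader from the data-loading sketch above, a checkpoint stored under the common {"model": state_dict} convention of the Swin codebase, and a placeholder path:

```python
# Minimal manual evaluation sketch (path and checkpoint format are assumptions).
import torch

checkpoint = torch.load("output/ckpt_best.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)  # tolerate raw state_dicts
model.load_state_dict(state_dict)
model.eval()

correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        logits = model(images)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.numel()
print(f"top-1 accuracy: {100.0 * correct / total:.2f}%")
```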

Pretrained models

For fast evaluation, we present the results of Swin-T trained for 100 epochs on various datasets as an example. (Note that we save the model every 5 epochs during training, so the attached best models may be slightly different from the reported performances.)

| Dataset | Baseline | Ours |
| --- | --- | --- |
| CIFAR-10 | 59.47 | 83.89 |
| CIFAR-100 | 53.28 | 66.23 |
| SVHN | 71.60 | 94.23 |
| Flowers102 | 34.51 | 39.37 |
| Clipart | 38.05 | 47.47 |
| Infograph | 8.20 | 10.16 |
| Painting | 35.92 | 41.86 |
| Quickdraw | 24.08 | 69.41 |
| Real | 73.47 | 75.59 |
| Sketch | 11.97 | 38.55 |

We provide a demo to download the pretrained models from Google Drive directly:

python3 ./scripts/collect_models.py
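
If the demo script is unavailable, individual checkpoints can also be fetched with gdown, a common tool for downloading from Google Drive. The file ID and output name below are placeholders to be replaced with values from the actual share links:

```python
# Placeholder sketch using gdown (pip install gdown). FILE_ID is hypothetical
# and must be replaced with the ID from the real Google Drive share link.
import gdown

gdown.download(id="FILE_ID", output="swin_tiny_drloc.pth", quiet=False)
```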

Related Work

Acknowledgments

This code is largely based on Swin-Transformer. Thanks to the contributors of that project.

Citation

@InProceedings{liu2021efficient,
    author    = {Liu, Yahui and Sangineto, Enver and Bi, Wei and Sebe, Nicu and Lepri, Bruno and De Nadai, Marco},
    title     = {Efficient Training of Visual Transformers with Small Datasets},
    booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
    year      = {2021}
}

If you have any questions, please do not hesitate to contact me (yahui.cvrs AT gmail.com).

vts-drloc's People

Contributors

yhlleo


vts-drloc's Issues

Augmentation settings on CIFAR10/100

Hi, thank you so much for sharing this excellent work.

I have some confusion about the experimental setup for CIFAR-10/100. The commonly used augmentation settings are random cropping with padding=4 at an input resolution of 32x32, but that setting does not seem to produce the 7x7 output resolution described in the paper when using Swin-T. Could you please tell me the detailed augmentation settings you used on CIFAR-10/100, and whether there are any changes to the network structure of the original VTs?

Thanks again.
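
For context on the resolution arithmetic in this question: Swin-T downsamples by a factor of 32 overall (a 4x4 patch embedding followed by three patch-merging stages), so a 7x7 final grid implies 224x224 inputs. One plausible CIFAR pipeline consistent with that grid, offered here as an assumption rather than the authors' confirmed setting, is to upsample the 32x32 images:

```python
# Hypothetical CIFAR pipeline: resizing 32x32 images to 224x224 yields
# Swin-T's 7x7 final grid (stride 32). This mirrors common ImageNet-style
# recipes and is an assumption, not the authors' confirmed setting.
from torchvision import transforms

cifar_train_transform = transforms.Compose([
    transforms.Resize(224),                 # 32x32 -> 224x224
    transforms.RandomCrop(224, padding=4),  # CIFAR-style crop, scaled up
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```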

ImageNet-100 split

Thanks for your amazing work!
I also want to train on ImageNet-100 using the subset listed in the file /scripts/imagenet-100.lst, but I couldn't find its train/val split. Could you share your splits or a split reference?

[Pretrained Models]

Hi,

Thanks for the wonderful work. Could you please share links to the default models used for the fine-tuning experiments?

Specifically, are the pretrained models for the fine-tuning experiments trained from scratch on ImageNet-1K? I ask because the official ViT models were pretrained on ImageNet-21K and fine-tuned on ImageNet-1K.

Thanks,

Strange reproduced results of Swin transformer

Hi authors,
I have reproduced all the results based on your code. Most of them are consistent with the reported results, except for the Swin transformer. Below are some results (reported results in brackets):
Trained with 8 GPUs (A100):
CIFAR-10: 75.00 (59.47), CIFAR-100: 52.26 (53.28), SVHN: 38.10 (71.60)
Trained with 4 GPUs:
CIFAR-10: 81.91 (59.47), CIFAR-100: 62.30 (53.28), SVHN: 91.29 (71.60)
From the results above, it seems that the batch size affects Swin a lot. All reproduced results are comparable with ViT (e.g., ViT on CIFAR-10 with 8 GPUs: 77.00 (71.70)). Do you have any idea of the reason?

Integrating drloc into Swin-T semantic segmentation

Hello, how can the dense relative localization loss (drloc) from the paper be integrated into a Swin-T semantic segmentation model?

Compare to CvT

Hi

Thanks for sharing this good work. I'm curious why the proposed loss function can outperform CvT, which contains a depthwise convolution capable of learning local features.
