Giter Site home page Giter Site logo

dvit_repo's Introduction

DeepViT

This repo is the official implementation of "DeepViT: Towards Deeper Vision Transformer". The repo is based on the timm library (https://github.com/rwightman/pytorch-image-models) by Ross Wightman

Introduction

Deep Vision Transformer is initially described in arxiv, which observes the attention collapese phenomenon when training deep vision transformers: In this paper, we show that, unlike convolution neural networks (CNNs)that can be improved by stacking more convolutional layers, the performance of ViTs saturate fast when scaled to be deeper. More specifically, we empirically observe that such scaling difficulty is caused by the attention collapse issue: as the transformer goes deeper, the attention maps gradually become similar and even much the same after certain layers. In other words, the feature maps tend to be identical in the top layers of deep ViT models. This fact demonstrates that in deeper layers of ViTs, the self-attention mechanism fails to learn effective concepts for representation learning and hinders the model from getting expected performance gain. Based on above observation, we propose a simple yet effective method, named Re-attention, to re-generate the attention maps to increase their diversity at different layers with negligible computation and memory cost. The pro-posed method makes it feasible to train deeper ViT models with consistent performance improvements via minor modification to existing ViT models. Notably, when training a deep ViT model with 32 transformer blocks, the Top-1 classification accuracy can be improved by 1.6% on ImageNet.

2. DeepViT Models

  1. Comparison with ViTs
Model Re-attention Top1 Acc (%) #params #Similar Blocks Checkpoint Attention Map
ViT-16 NA 78.88 24.5M 5 [download here](comming soon)
DeepViT-16 FC 79.10 24.5M 0 [download here](comming soon)
ViT-24 NA 79.35 36.3M 11 [download here](comming soon)
DeepViT-24 FC 79.99 36.3M 0 download here
ViT-32 NA 79.27 48.1M 15 [download here](comming soon)
DeepViT-32 FC 80.90 48.1M 0 download here
  1. DeepViTs with CNNs for patch processing and optimized training hyper-parameters
Model Re-attention Top1 Acc (%) #Params Image Size Checkpoint
DeepViT-16 FC + BN 82.3 24.5M 224 download here
DeepViT-32 FC + BN 83.1 48.1M 224 download here
DeepViT-32 FC + BN 84.25 48.1M 384 [download here](comming soon)

Evaluation

To evaluate a pre-trained DeepViT models on ImageNet val run:

bash eval.sh

Citing DeepVit

@article{zhou2021deepvit,
  title={DeepViT: Towards Deeper Vision Transformer},
  author={Zhou, Daquan and Kang, Bingyi and Jin, Xiaojie and Yang, Linjie and Lian, Xiaochen and Hou, Qibin and Feng, Jiashi},
  journal={arXiv preprint arXiv:2103.11886},
  year={2021}
}

dvit_repo's People

Contributors

zhoudaquan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.