Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Our classification code is developed on top of pytorch-image-models and deit.

For details, see the paper "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".

If you use this code for a paper, please cite:

PVTv1

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

PVTv2

@misc{wang2021pvtv2,
      title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2106.13797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Todo List

PVT + ImageNet-22K pre-training.

Usage

First, clone the repository locally:

git clone https://github.com/whai362/PVT.git

Then, install PyTorch 1.6.0+, torchvision 0.7.0+, and pytorch-image-models 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
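To confirm that the installed packages satisfy the versions listed above, a small check along these lines may help. It is only a sketch: `version_tuple`, `meets_minimum`, and `report_environment` are hypothetical helper names that are not part of this repository, and the version parsing is deliberately simple (it ignores local suffixes such as `+cu101`).

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '1.6.0+cu101' into (1, 6, 0)."""
    public = version.split("+", 1)[0]  # drop a local/build suffix if present
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '1.6' compares equal to '1.6.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def report_environment():
    """Hypothetical helper: look up installed versions, None if missing."""
    found = {}
    for package in ("torch", "torchvision", "timm"):
        try:
            found[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            found[package] = None
    return found


if __name__ == "__main__":
    # Minimums for torch and torchvision come from the text above;
    # timm is pinned to 0.3.2 there, so it is compared for equality.
    versions = report_environment()
    for package, minimum in (("torch", "1.6.0"), ("torchvision", "0.7.0")):
        installed = versions[package]
        ok = installed is not None and meets_minimum(installed, minimum)
        print(package, installed, "OK" if ok else "needs " + minimum + "+")
    print("timm", versions["timm"], "OK" if versions["timm"] == "0.3.2" else "expected 0.3.2")
```

Comparing parsed tuples rather than raw strings avoids mistakes such as treating "1.10.0" as older than "1.6.0".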

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder, with the training and validation data expected in the train/ and val/ folders respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
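Before starting a long run, it can be worth checking that the dataset directory matches the layout above. The following is a minimal sketch using only the standard library; `check_imagefolder_layout` is a hypothetical helper, not something this repository provides, and it only inspects folder structure (it does not open or validate image files).

```python
from pathlib import Path


def check_imagefolder_layout(root):
    """Return a list of problems found under root; an empty list means
    the train/ and val/ folders look like an ImageFolder layout."""
    root = Path(root)
    problems = []
    split_classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append("missing directory: " + split + "/")
            continue
        # Each sub-directory of a split is treated as one class.
        classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not classes:
            problems.append("no class folders in " + split + "/")
        for name in classes:
            has_file = any(f.is_file() for f in (split_dir / name).iterdir())
            if not has_file:
                problems.append("empty class folder: " + split + "/" + name + "/")
        split_classes[split] = set(classes)
    # Both splits should expose the same set of class names.
    if len(split_classes) == 2 and split_classes["train"] != split_classes["val"]:
        problems.append("train/ and val/ contain different class folders")
    return problems


if __name__ == "__main__":
    for problem in check_imagefolder_layout("/path/to/imagenet"):
        print(problem)
```

An empty result only means the folder structure is plausible; it says nothing about whether the images themselves are complete.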

Model Zoo

PVTv2 on ImageNet-1K

Method	Size	Acc@1	#Params (M)	Config	Download
PVT-V2-B0	224	70.5	3.7	config	14M [Google] [GitHub]
PVT-V2-B1	224	78.7	14.0	config	54M [Google] [GitHub]
PVT-V2-B2-Linear	224	82.1	22.6	config	86M [GitHub]
PVT-V2-B2	224	82.0	25.4	config	97M [Google] [GitHub]
PVT-V2-B3	224	83.1	45.2	config	173M [Google] [GitHub]
PVT-V2-B4	224	83.6	62.6	config	239M [Google] [GitHub]
PVT-V2-B5	224	83.8	82.0	config	313M [Google] [GitHub]

PVTv1 on ImageNet-1K

Method	Size	Acc@1	#Params (M)	Config	Download
PVT-Tiny	224	75.1	13.2	config	51M [Google] [GitHub]
PVT-Small	224	79.8	24.5	config	93M [Google] [GitHub]
PVT-Medium	224	81.2	44.2	config	168M [Google] [GitHub]
PVT-Large	224	81.7	61.4	config	234M [Google] [GitHub]
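The download sizes in the tables above track the parameter counts closely. As a rough sanity check, the sketch below estimates a checkpoint size from the number of parameters. It assumes that weights are stored as 32-bit floats and that the "M" in the Download column means mebibytes; both are inferences from the numbers in the tables, not statements made by this repository.

```python
BYTES_PER_FLOAT32 = 4  # assumption: checkpoints hold float32 weights


def estimated_checkpoint_mib(params_millions):
    """Estimate checkpoint size in MiB from a parameter count in millions."""
    return params_millions * 1e6 * BYTES_PER_FLOAT32 / 2**20


# (parameters in millions, listed download size) copied from the tables above.
LISTED = {
    "PVT-V2-B0": (3.7, 14),
    "PVT-V2-B1": (14.0, 54),
    "PVT-V2-B2-Linear": (22.6, 86),
    "PVT-V2-B2": (25.4, 97),
    "PVT-V2-B3": (45.2, 173),
    "PVT-V2-B4": (62.6, 239),
    "PVT-V2-B5": (82.0, 313),
    "PVT-Tiny": (13.2, 51),
    "PVT-Small": (24.5, 93),
    "PVT-Medium": (44.2, 168),
    "PVT-Large": (61.4, 234),
}

if __name__ == "__main__":
    for name, (params_m, listed_size) in LISTED.items():
        estimate = estimated_checkpoint_mib(params_m)
        print(f"{name}: estimated {estimate:.1f}M, listed {listed_size}M")
```

A downloaded file that differs substantially from this estimate may be truncated, or may contain extra state beyond the model weights.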

Evaluation

To evaluate a pre-trained PVT-Small on ImageNet val with a single GPU, run:

sh dist_train.sh configs/pvt/pvt_small.py 1 --data-path /path/to/imagenet --resume /path/to/checkpoint_file --eval

This should give

* Acc@1 79.764 Acc@5 94.950 loss 0.885
Accuracy of the network on the 50000 test images: 79.8%
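Acc@1 and Acc@5 in the output above are top-1 and top-5 accuracy: the percentage of validation images whose true label is the highest-scoring class, or among the five highest-scoring classes. The pure-Python sketch below illustrates the metric on plain lists. `topk_accuracy` is a hypothetical name, and the repository's own evaluation code may compute the metric differently (for example, on batched tensors).

```python
def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true label is among the k highest scores.

    scores:  list of per-class score lists, one list per sample
    targets: list of true class indices, one per sample
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    # Toy example with three samples and three classes.
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
    ]
    targets = [1, 1, 0]
    print("Acc@1:", topk_accuracy(scores, targets, k=1))
    print("Acc@2:", topk_accuracy(scores, targets, k=2))
```

Top-k accuracy never decreases as k grows, which is why Acc@5 is higher than Acc@1 in the reported results.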

Training

To train PVT-Small on ImageNet on a single node with 8 GPUs for 300 epochs, run:

sh dist_train.sh configs/pvt/pvt_small.py 8 --data-path /path/to/imagenet

Calculating FLOPs & Params

python get_flops.py pvt_v2_b2

This should give

Input shape: (3, 224, 224)
Flops: 4.04 GFLOPs
Params: 25.36 M

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
