This project forked from chenhongyiyang/gpvit

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

License: Apache License 2.0

This repository contains the official PyTorch implementation of GPViT, a high-resolution non-hierarchical vision transformer architecture designed for high-performing visual recognition, which is introduced in our paper:

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation, Chenhongyi Yang*, Jiarui Xu*, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang.

Usage

Environment Setup

Our code base is built on the MM-series toolkits: classification uses MMClassification, object detection uses MMDetection, and semantic segmentation uses MMSegmentation. Follow the official documentation of each toolkit to set up your environment. We also provide a sample setup script:

conda create -n gpvit python=3.7 -y
source activate gpvit
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install -U openmim
mim install mmcv-full==1.4.8
pip install timm
pip install lmdb # for ImageNet experiments
pip install -v -e .  # install this repository (classification) in editable mode
cd downstream/mmdetection  # set up object detection and instance segmentation
pip install -v -e .
cd ../mmsegmentation # set up semantic segmentation
pip install -v -e .
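
After running the setup script, it can help to confirm that the pinned versions were actually installed. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pinned versions are copied from the script above. It assumes Python 3.8+ for `importlib.metadata`; on the Python 3.7 environment created above, the `importlib_metadata` backport would be needed instead.

```python
"""Check that installed package versions match the sample setup script."""
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions pinned by the sample setup script above.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2", "mmcv-full": "1.4.8"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED):
    """Map each package whose version differs from its pin to what was found."""
    mismatches = {}
    for package, pinned in expected.items():
        found = installed_version(package)
        # Local build tags such as "+cu101" are ignored by comparing only
        # the part of the version string before "+".
        if found is None or found.split("+")[0] != pinned:
            mismatches[package] = found
    return mismatches


if __name__ == "__main__":
    for package, found in find_mismatches().items():
        print(f"{package}: expected {EXPECTED[package]}, found {found}")
```

An empty result from `find_mismatches()` means all three pinned packages are present at the expected versions; it does not verify that CUDA itself works.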

Data Preparation

Follow the MMClassification, MMDetection and MMSegmentation documentation to set up the ImageNet, COCO and ADE20K datasets, respectively. For the ImageNet experiments, we convert the dataset to LMDB format to accelerate training and testing. For example, you can convert your own dataset by running:

python tools/dataset_tools/create_lmdb_dataset.py \
       --train-img-dir data/imagenet/train \
       --train-out data/imagenet/imagenet_lmdb/train \
       --val-img-dir data/imagenet/val \
       --val-out data/imagenet/imagenet_lmdb/val

After setup, the dataset file structure should be as follows:

GPViT
|-- data
|   |-- imagenet
|   |   |-- imagenet_lmdb
|   |   |   |-- train
|   |   |   |   |-- data.mdb
|   |   |   |   |__ lock.mdb
|   |   |   |-- val
|   |   |   |   |-- data.mdb
|   |   |   |   |__ lock.mdb 
|   |   |-- meta
|   |   |   |__ ...
|-- downstream 
|   |-- mmsegmentation
|   |   |-- data
|   |   |   |-- ade
|   |   |   |   |-- ADEChallengeData2016
|   |   |   |   |   |-- annotations
|   |   |   |   |   |   |__ ...
|   |   |   |   |   |-- images
|   |   |   |   |   |   |__ ...
|   |   |   |   |   |-- objectInfo150.txt
|   |   |   |   |   |__ sceneCategories.txt
|   |   |__ ...
|   |-- mmdetection
|   |   |-- data
|   |   |   |-- coco
|   |   |   |   |-- train2017
|   |   |   |   |   |-- ...
|   |   |   |   |-- val2017
|   |   |   |   |   |-- ...
|   |   |   |   |-- annotations
|   |   |   |   |   |-- instances_train2017.json
|   |   |   |   |   |-- instances_val2017.json
|   |   |   |   |   |__ ...
|   |   |__ ...
|__ ...
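
A quick way to confirm the layout above before launching a job is to check for the listed files and directories. This is a small sketch using only the standard library, not a script shipped with the repository: the function name `missing_paths` is hypothetical, and the path list is copied from the tree above (entries shown as `...` are not checked).

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "data/imagenet/imagenet_lmdb/train/data.mdb",
    "data/imagenet/imagenet_lmdb/train/lock.mdb",
    "data/imagenet/imagenet_lmdb/val/data.mdb",
    "data/imagenet/imagenet_lmdb/val/lock.mdb",
    "data/imagenet/meta",
    "downstream/mmsegmentation/data/ade/ADEChallengeData2016/annotations",
    "downstream/mmsegmentation/data/ade/ADEChallengeData2016/images",
    "downstream/mmdetection/data/coco/train2017",
    "downstream/mmdetection/data/coco/val2017",
    "downstream/mmdetection/data/coco/annotations/instances_train2017.json",
    "downstream/mmdetection/data/coco/annotations/instances_val2017.json",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root (the GPViT directory in the tree above).
    for rel in missing_paths("."):
        print("missing:", rel)
```

The check only tests for existence; it does not validate the contents of the LMDB files or the annotation JSON.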

ImageNet Classification

Training GPViT

# Example: Training GPViT-L1 model
zsh tools/dist_train.sh configs/gpvit/gpvit_l1.py 16

Testing GPViT

# Example: Testing GPViT-L1 model
zsh tools/dist_test.sh configs/gpvit/gpvit_l1.py work_dirs/gpvit_l1/epoch_300.pth 16 --metrics accuracy
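
The test commands in this README load checkpoints from a directory named after the config file, for example work_dirs/gpvit_l1/ for configs/gpvit/gpvit_l1.py. The sketch below reproduces that naming pattern so a checkpoint path can be derived from a config path. It assumes the MM-series default of writing to work_dirs/ plus the config file name without its extension when no work directory is given explicitly; the helper name `default_checkpoint` is hypothetical and not part of the repository.

```python
from pathlib import PurePosixPath


def default_checkpoint(config, checkpoint_name, work_root="work_dirs"):
    """Build the checkpoint path that matches the examples in this README.

    Assumption: training wrote to <work_root>/<config file name without .py>.
    """
    stem = PurePosixPath(config).stem  # "configs/gpvit/gpvit_l1.py" -> "gpvit_l1"
    return str(PurePosixPath(work_root) / stem / checkpoint_name)


if __name__ == "__main__":
    print(default_checkpoint("configs/gpvit/gpvit_l1.py", "epoch_300.pth"))
```

If training was started with a custom work directory, the checkpoint is in that directory instead and this helper does not apply.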

COCO Object Detection and Instance Segmentation

First change to the detection directory (from the repository root): cd downstream/mmdetection

Training GPViT-based Mask R-CNN

# Example: Training GPViT-L1 models with 1x and 3x+MS schedules
zsh tools/dist_train.sh configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_1x.py 16
zsh tools/dist_train.sh configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_3x.py 16

Training GPViT-based RetinaNet

# Example: Training GPViT-L1 models with 1x and 3x+MS schedules
zsh tools/dist_train.sh configs/gpvit/retinanet/gpvit_l1_retinanet_1x.py 16
zsh tools/dist_train.sh configs/gpvit/retinanet/gpvit_l1_retinanet_3x.py 16

Testing GPViT-based Mask R-CNN

# Example: Testing GPViT-L1 Mask R-CNN 1x model
zsh tools/dist_test.sh configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_1x.py work_dirs/gpvit_l1_maskrcnn_1x/epoch_12.pth 16 --eval bbox segm

Testing GPViT-based RetinaNet

# Example: Testing GPViT-L1 RetinaNet 1x model
zsh tools/dist_test.sh configs/gpvit/retinanet/gpvit_l1_retinanet_1x.py work_dirs/gpvit_l1_retinanet_1x/epoch_12.pth 16 --eval bbox

ADE20K Semantic Segmentation

First change to the segmentation directory (from the repository root): cd downstream/mmsegmentation

Training GPViT-based semantic segmentation models

# Example: Training GPViT-L1-based SegFormer and UperNet models
zsh tools/dist_train.sh configs/gpvit/gpvit_l1_segformer.py 16
zsh tools/dist_train.sh configs/gpvit/gpvit_l1_upernet.py 16

Testing GPViT-based semantic segmentation models

# Example: Testing GPViT-L1-based SegFormer and UperNet models
zsh tools/dist_test.sh configs/gpvit/gpvit_l1_segformer.py work_dirs/gpvit_l1_segformer/iter_160000.pth 16 --eval mIoU
zsh tools/dist_test.sh configs/gpvit/gpvit_l1_upernet.py work_dirs/gpvit_l1_upernet/iter_160000.pth 16 --eval mIoU

Benchmark results

ImageNet-1k Classification

Model    | #Params (M) | Top-1 Acc | Top-5 Acc | Config | Model
GPViT-L1 | 9.3         | 80.5      | 95.4      | config | model
GPViT-L2 | 23.8        | 83.4      | 96.6      | config | model
GPViT-L3 | 36.2        | 84.1      | 96.9      | config | model
GPViT-L4 | 75.4        | 84.3      | 96.9      | config | model

COCO Mask R-CNN 1x Schedule

Model    | #Params (M) | AP Box | AP Mask | Config | Model
GPViT-L1 | 33          | 48.1   | 42.7    | config | model
GPViT-L2 | 50          | 49.9   | 43.9    | config | model
GPViT-L3 | 64          | 50.4   | 44.4    | config | model
GPViT-L4 | 109         | 51.0   | 45.0    | config | model

COCO Mask R-CNN 3x+MS Schedule

Model    | #Params (M) | AP Box | AP Mask | Config | Model
GPViT-L1 | 33          | 50.2   | 44.3    | config | model
GPViT-L2 | 50          | 51.4   | 45.1    | config | model
GPViT-L3 | 64          | 51.6   | 45.2    | config | model
GPViT-L4 | 109         | 52.1   | 45.7    | config | model

COCO RetinaNet 1x Schedule

Model    | #Params (M) | AP Box | Config | Model
GPViT-L1 | 21          | 45.8   | config | model
GPViT-L2 | 37          | 48.0   | config | model
GPViT-L3 | 52          | 48.3   | config | model
GPViT-L4 | 96          | 48.7   | config | model

COCO RetinaNet 3x+MS Schedule

Model    | #Params (M) | AP Box | Config | Model
GPViT-L1 | 21          | 48.1   | config | model
GPViT-L2 | 37          | 49.0   | config | model
GPViT-L3 | 52          | 49.4   | config | model
GPViT-L4 | 96          | 49.8   | config | model

ADE20K UperNet

Model    | #Params (M) | mIoU | Config | Model
GPViT-L1 | 37          | 49.1 | config | model
GPViT-L2 | 53          | 50.2 | config | model
GPViT-L3 | 66          | 51.7 | config | model
GPViT-L4 | 107         | 52.5 | config | model

ADE20K SegFormer

Model    | #Params (M) | mIoU | Config | Model
GPViT-L1 | 9           | 46.9 | config | model
GPViT-L2 | 24          | 49.2 | config | model
GPViT-L3 | 36          | 50.8 | config | model
GPViT-L4 | 76          | 51.3 | config | model

Citation

@article{yang2022gpvit,
      title={GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation}, 
      author={Chenhongyi Yang and Jiarui Xu and Shalini De Mello and Elliot J. Crowley and Xiaolong Wang},
      journal={arXiv preprint 2212.06795},
      year={2022},
}
