alec-lin / visual-transformers

This project forked from andreybocharnikov/visual-transformers

Visual-Transformers

Unofficial implementation of the paper Visual Transformers: Token-based Image Representation and Processing for Computer Vision.

Usage:

python main.py task_mode learning_mode data --model --weights, where:

  • task_mode: classification or semantic_segmentation, selecting the corresponding task.
  • learning_mode: train to train --model from scratch, or test to validate --model with --weights on the validation data.
  • data: path to the dataset. For classification this should be the path to ImageNet; for semantic segmentation, the path to COCO.
  • --model:
    ○ classification: ResNet18 or VT_ResNet18 (the default).
    ○ semantic segmentation: PanopticFPN or VT_FPN (the default).
  • --weights: must be provided if learning_mode is test; it is not used in train mode.
  • --from_pretrained: used to continue training from a checkpoint. It should be a state_dict that contains model_state_dict, optimizer_state_dict and epoch.
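
The argument layout described above can be sketched with argparse. This is a minimal sketch, not the repository's actual main.py: the helper names build_parser and parse_args, the DEFAULT_MODEL and ALLOWED_MODELS tables, and the validation rules are assumptions inferred from the option descriptions.

```python
import argparse

# Assumed from the option list above: allowed models and defaults per task.
ALLOWED_MODELS = {
    "classification": ("ResNet18", "VT_ResNet18"),
    "semantic_segmentation": ("PanopticFPN", "VT_FPN"),
}
DEFAULT_MODEL = {
    "classification": "VT_ResNet18",
    "semantic_segmentation": "VT_FPN",
}


def build_parser():
    """Build a parser mirroring the documented positional and optional arguments."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("task_mode", choices=sorted(ALLOWED_MODELS))
    parser.add_argument("learning_mode", choices=["train", "test"])
    parser.add_argument("data", help="path to the dataset (ImageNet or COCO)")
    parser.add_argument("--model", default=None)
    parser.add_argument("--weights", default=None)
    parser.add_argument("--from_pretrained", default=None)
    return parser


def parse_args(argv=None):
    """Parse arguments and apply the rules stated in the usage notes."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.model is None:
        # Fall back to the documented default for the chosen task.
        args.model = DEFAULT_MODEL[args.task_mode]
    elif args.model not in ALLOWED_MODELS[args.task_mode]:
        parser.error(f"--model {args.model} is not valid for {args.task_mode}")
    if args.learning_mode == "test" and args.weights is None:
        parser.error("--weights must be provided when learning_mode is test")
    return args
```

With this sketch, parse_args(["classification", "train", "/path/to/imagenet"]) selects VT_ResNet18, while a test run without --weights exits with a usage error. The dataset path here is a placeholder.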

Results:

  • final metrics and losses after 15 and 5 epochs of classification and semantic segmentation respectively:
                           ResNet18    VT-ResNet18
    Training accuracy      0.664675    0.672889
    Validation accuracy    0.691541    0.696929
    Training loss          1.312150    1.249382
    Validation loss        1.173559    1.114401

                           Panoptic FPN    VT-FPN
    Training mIOU          8.0968          7.0343
    Validation mIOU        4.3148          3.2351
    Training loss          2.044084        2.068598
    Validation loss        2.101253        2.120928
  • loss and metric curves of classification and semantic segmentation:
    [Figures: classification cross entropy loss and accuracy curves; semantic segmentation pixel-wise cross entropy loss and mIOU curves]
  • Efficiency and parameters
                    Params (M)    FLOPs (M)    Forward-backward pass (s)
    ResNet18        11.2          822          0.016
    VT-ResNet18     12.7          543          0.02
    Panoptic FPN    16.4          67412        0.08
    VT-FPN          40.3          110019       0.062
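
To make the efficiency numbers easier to compare, the relative change of each VT model against its baseline can be computed directly from the table above. This is a small illustrative sketch: the EFFICIENCY dictionary copies the reported values, and the key names and the relative_change helper are hypothetical, not part of the repository.

```python
# Values copied from the efficiency table; key names are illustrative.
EFFICIENCY = {
    "ResNet18": {"params_m": 11.2, "flops_m": 822, "fwd_bwd_s": 0.016},
    "VT-ResNet18": {"params_m": 12.7, "flops_m": 543, "fwd_bwd_s": 0.02},
    "Panoptic FPN": {"params_m": 16.4, "flops_m": 67412, "fwd_bwd_s": 0.08},
    "VT-FPN": {"params_m": 40.3, "flops_m": 110019, "fwd_bwd_s": 0.062},
}


def relative_change(baseline, variant, key):
    """Return (variant - baseline) / baseline for one reported quantity."""
    base = EFFICIENCY[baseline][key]
    new = EFFICIENCY[variant][key]
    return (new - base) / base


# Print the change of each VT model relative to its baseline.
for baseline, variant in [("ResNet18", "VT-ResNet18"), ("Panoptic FPN", "VT-FPN")]:
    for key in ("params_m", "flops_m", "fwd_bwd_s"):
        change = relative_change(baseline, variant, key)
        print(f"{variant} vs {baseline}, {key}: {change:+.1%}")
```

On these reported numbers, VT-ResNet18 uses roughly a third fewer FLOPs than ResNet18 with about 13% more parameters, while VT-FPN has more parameters and FLOPs than Panoptic FPN but a shorter measured forward-backward pass.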

Weights:
