
alec-lin / visual-transformers

This project forked from andreybocharnikov/visual-transformers


Unofficial implementation of Visual Transformers: Token-based Image Representation and Processing for Computer Vision

Python 17.57% Jupyter Notebook 82.43%


Visual-Transformers

Unofficial implementation of the paper Visual Transformers: Token-based Image Representation and Processing for Computer Vision.

Usage:

python main.py task_mode learning_mode data --model --weights --from_pretrained, where:

task_mode: classification or semantic_segmentation, selecting the task.
learning_mode: train to train --model from scratch, or test to evaluate --model with --weights on the validation data.
data: path to the dataset (ImageNet for classification, COCO for semantic segmentation).
--model:
○ classification: ResNet18 or VT_ResNet18 (default).
○ semantic segmentation: PanopticFPN or VT_FPN (default).
--weights: required when learning_mode is test; ignored in train mode.
--from_pretrained: resumes training from a checkpoint, which should be a state_dict containing model_state_dict, optimizer_state_dict and epoch.
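
The argument rules above can be sketched with argparse. This is only an illustration of the documented interface, not the repository's actual main.py: the helper names build_parser and parse_args, the help strings, and the choice to reject unknown model names are assumptions.

```python
import argparse

# Model names as listed in the usage notes; the VT variant is the documented default.
CLASSIFICATION_MODELS = ("ResNet18", "VT_ResNet18")
SEGMENTATION_MODELS = ("PanopticFPN", "VT_FPN")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("task_mode", choices=("classification", "semantic_segmentation"))
    parser.add_argument("learning_mode", choices=("train", "test"))
    parser.add_argument("data", help="path to ImageNet (classification) or COCO (segmentation)")
    parser.add_argument("--model", default=None, help="defaults to the VT variant for the task")
    parser.add_argument("--weights", default=None, help="required in test mode")
    parser.add_argument("--from_pretrained", default=None, help="checkpoint to resume training from")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and apply the cross-argument rules described in the usage notes."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.task_mode == "classification":
        allowed = CLASSIFICATION_MODELS
    else:
        allowed = SEGMENTATION_MODELS
    if args.model is None:
        args.model = allowed[1]  # VT_ResNet18 or VT_FPN
    elif args.model not in allowed:
        # Assumption: an unknown or mismatched model name is treated as an error.
        parser.error(f"--model must be one of {allowed} for {args.task_mode}")
    if args.learning_mode == "test" and args.weights is None:
        parser.error("--weights is required when learning_mode is test")
    return args
```

For example, parse_args(["classification", "train", "/path/to/imagenet"]) selects VT_ResNet18, while omitting --weights in test mode exits with a usage error.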

Results:

Final metrics and losses after 15 epochs of classification and 5 epochs of semantic segmentation:

	ResNet18	VT-ResNet18
Training accuracy	0.664675	0.672889
Validation accuracy	0.691541	0.696929

Training loss	1.312150	1.249382
Validation loss	1.173559	1.114401

	Panoptic FPN	VT-FPN
Training mIOU	8.0968	7.0343
Validation mIOU	4.3148	3.2351

Training loss	2.044084	2.068598
Validation loss	2.101253	2.120928
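
For reference, mean intersection-over-union (mIOU) can be computed along the following lines. This is a minimal pure-Python sketch of the standard definition, not necessarily how this repository computes the numbers above: the function name mean_iou, the flat label-sequence input, and the choice to skip classes absent from both prediction and target are assumptions, and the README does not state whether the reported values are fractions or percentages.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length sequences of integer labels.

    Classes that appear in neither pred nor target are skipped (an assumption;
    other conventions count them as 0 or 1).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0
```

In practice the per-pixel label maps of an image or batch would be flattened before being passed in.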

Loss and metric curves for classification and semantic segmentation:

[Figure: cross-entropy loss and accuracy curves (classification)]
[Figure: pixel-wise cross-entropy loss and mIOU curves (semantic segmentation)]

Efficiency and parameters

	Params (M)	FLOPs (M)	Forward-backward pass (s)
ResNet18	11.2	822	0.016
VT-ResNet18	12.7	543	0.02

Panoptic FPN	16.4	67412	0.08
VT-FPN	40.3	110019	0.062
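
The README does not say how the forward-backward pass time was measured. One common approach, sketched below under that caveat, is to average wall-clock time over several repetitions after a few warm-up runs. The name time_step and the step callable (standing in for one forward and backward pass) are hypothetical.

```python
import time


def time_step(step, warmup=3, repeats=10):
    """Average wall-clock seconds per call of `step` after `warmup` untimed calls.

    `step` is a hypothetical zero-argument callable that runs one forward and
    backward pass. On a GPU it would also need to wait for queued device work
    to finish (e.g. torch.cuda.synchronize()) before returning, otherwise the
    clock only measures kernel launch time.
    """
    for _ in range(warmup):
        step()  # warm-up runs are excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    return (time.perf_counter() - start) / repeats
```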

Weights:

classification: ResNet18, VT-ResNet18
semantic segmentation: Panoptic FPN, VT-FPN
