
kyegomez / vit-rgts


Open source implementation of "Vision Transformers Need Registers"

Home Page: https://discord.gg/qUtxnK2NMf

License: MIT License

Python 100.00%
attention-mechanism gpt4 vision-api vision-transformer vit

vit-rgts's Introduction


VISION TRANSFORMERS NEED REGISTERS

The ViT model from the paper "VISION TRANSFORMERS NEED REGISTERS". Adding register tokens sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and leads to smoother feature maps and attention maps for downstream visual processing.

Register tokens enable interpretable attention maps in both supervised and self-supervised vision transformers!

Paper: https://arxiv.org/abs/2309.16588

Appreciation

  • Lucidrains
  • Agorians

Install

pip install vit-rgts

Usage

import torch
from vit_rgts.main import VitRGTS

v = VitRGTS(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1
)

img = torch.randn(1, 3, 256, 256)

preds = v(img) # (1, 1000)
print(preds)
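With `image_size = 256` and `patch_size = 32`, the image is split into an 8 x 8 grid, so the transformer sees 64 patch tokens plus the CLS and register tokens. The snippet below is a small sketch of that arithmetic. The single CLS token and the register count (`NUM_REGISTERS = 4`) are assumptions for illustration, not values read from the `VitRGTS` constructor.

```python
# Sketch: how many tokens the encoder processes for the usage example above.
# NUM_REGISTERS is a hypothetical value; check VitRGTS for the count it actually uses.
NUM_REGISTERS = 4


def sequence_length(image_size: int, patch_size: int, num_registers: int, use_cls: bool = True) -> int:
    """Return patches + optional CLS token + register tokens."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 256 // 32 = 8
    num_patches = patches_per_side ** 2          # 8 * 8 = 64
    return num_patches + int(use_cls) + num_registers


print(sequence_length(256, 32, NUM_REGISTERS))  # 64 patches + 1 CLS + 4 registers = 69
```

Registers only lengthen the sequence by a handful of tokens, so their extra attention cost is small next to the patch tokens.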

Architecture

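  • Additional learnable tokens (registers) added to the input sequence after the patch embedding. Without them, large ViTs repurpose low-informative background patches for internal computation, which shows up as high-norm artifact tokens. Registers give the model dedicated slots for that computation and are discarded at the output, leaving cleaner feature and attention maps.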
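A minimal PyTorch sketch of the register idea: prepend a CLS token and a few learnable register tokens to the patch embeddings, run a standard transformer encoder, then drop the register outputs. `RegisterEncoder`, the `[CLS, registers, patches]` token order, and the choice to give registers no positional embedding are assumptions for illustration; this is not the `VitRGTS` implementation.

```python
import torch
from torch import nn


class RegisterEncoder(nn.Module):
    """Hypothetical encoder: [CLS, registers, patches] -> transformer -> drop registers."""

    def __init__(self, dim: int = 64, depth: int = 2, heads: int = 4, num_registers: int = 4):
        super().__init__()
        self.num_registers = num_registers
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Registers are plain learnable tokens; in this sketch they get no positional embedding.
        self.registers = nn.Parameter(torch.randn(1, num_registers, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patches):
        # patches: (batch, num_patches, dim), positional embeddings assumed already added
        batch = patches.shape[0]
        cls = self.cls_token.expand(batch, -1, -1)
        regs = self.registers.expand(batch, -1, -1)
        x = torch.cat([cls, regs, patches], dim=1)  # token order: [CLS, registers, patches]
        x = self.encoder(x)
        # Discard the register outputs; keep the CLS token and the patch tokens.
        cls_out = x[:, 0]
        patch_out = x[:, 1 + self.num_registers:]
        return cls_out, patch_out


torch.manual_seed(0)
encoder = RegisterEncoder()
patch_embeddings = torch.randn(2, 16, 64)  # (batch, num_patches, dim)
cls_out, patch_out = encoder(patch_embeddings)
print(cls_out.shape, patch_out.shape)  # torch.Size([2, 64]) torch.Size([2, 16, 64])
```

The registers take part in every attention layer but never reach the output, so downstream heads see the same token layout as a plain ViT.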

Dataset Strategy

Here is a table summarizing the key datasets mentioned in the paper along with their metadata and source links:

| Dataset | Type | Size | Usage | Source |
|---|---|---|---|---|
| ImageNet-1k | Image Classification | 1.2M images, 1000 classes | Pretraining | http://www.image-net.org/ |
| ImageNet-22k | Image Classification | 14M images, 21841 classes | Pretraining | https://github.com/google-research-datasets/ImageNet-21k-P |
| iNaturalist 2018 | Image Classification | 437K images, 8142 classes | Evaluation | https://github.com/visipedia/inat_comp/tree/master/2018 |
| Places205 (P205) | Image Classification | 2.4M images, 205 classes | Evaluation | http://places2.csail.mit.edu/index.html |
| Aircraft (Airc.) | Image Classification | 10K images, 100 classes | Evaluation | https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/ |
| CIFAR-10 (CF10) | Image Classification | 60K images, 10 classes | Evaluation | https://www.cs.toronto.edu/~kriz/cifar.html |
| CIFAR-100 (CF100) | Image Classification | 60K images, 100 classes | Evaluation | https://www.cs.toronto.edu/~kriz/cifar.html |
| CUB-200-2011 (CUB) | Image Classification | 11.8K images, 200 classes | Evaluation | http://www.vision.caltech.edu/visipedia/CUB-200-2011.html |
| Caltech 101 (Cal101) | Image Classification | 9K images, 101 classes | Evaluation | http://www.vision.caltech.edu/Image_Datasets/Caltech101/ |
| Stanford Cars (Cars) | Image Classification | 16K images, 196 classes | Evaluation | https://ai.stanford.edu/~jkrause/cars/car_dataset.html |
| Describable Textures (DTD) | Image Classification | 5640 images, 47 classes | Evaluation | https://www.robots.ox.ac.uk/~vgg/data/dtd/index.html |
| Oxford Flowers-102 (Flow.) | Image Classification | 8189 images, 102 classes | Evaluation | https://www.robots.ox.ac.uk/~vgg/data/flowers/102/ |
| Food-101 (Food) | Image Classification | 101K images, 101 classes | Evaluation | https://www.vision.ee.ethz.ch/datasets_extra/food-101/ |
| Oxford-IIIT Pets (Pets) | Image Classification | 7349 images, 37 classes | Evaluation | https://www.robots.ox.ac.uk/~vgg/data/pets/ |
| SUN397 (SUN) | Scene Classification | 108K images, 397 classes | Evaluation | https://groups.csail.mit.edu/vision/SUN/ |
| PASCAL VOC 2007 (VOC) | Object Detection | 5011 images, 20 classes | Evaluation | http://host.robots.ox.ac.uk/pascal/VOC/voc2007/ |
| PASCAL VOC 2012 | Object Detection | 11540 images, 20 classes | Evaluation | http://host.robots.ox.ac.uk/pascal/VOC/voc2012/ |
| COCO 2017 (COCO) | Object Detection | 118K images, 80 classes | Evaluation | https://cocodataset.org/#home |
| ADE20K (ADE20k) | Semantic Segmentation | 20K images, 150 classes | Evaluation | https://groups.csail.mit.edu/vision/datasets/ADE20K/ |
| NYU Depth V2 (NYUd) | Monocular Depth Estimation | 1449 images | Evaluation | https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html |

License

MIT

Citations

@misc{2309.16588,
Author = {Timothée Darcet and Maxime Oquab and Julien Mairal and Piotr Bojanowski},
Title = {Vision Transformers Need Registers},
Year = {2023},
Eprint = {arXiv:2309.16588},
}

Todo

  • Make a new training script
  • Make a table of datasets used in the paper
  • Make a blog article on architecture and applications
  • Clean up operations, remove redundancy in attention, transformer, and vitgi

vit-rgts's People

Contributors

kyegomez, dependabot[bot]


vit-rgts's Issues

Are pretrained weights available?

Awesome work. I am trying to fine-tune one of these models on a particular dataset. Are there pretrained weights available?

Thanks


Model Checkpoints

Hi,

Thank you for your code. Will you put pre-trained ViT models' checkpoints in this repo?

Thanks

CLS Token not used

Thanks for sharing the code and detailed documentation.
I noticed that in the code the CLS token is created but never used, because you hard-coded mean pooling in the forward method.
However, in the paper the CLS token is specifically mentioned in multiple places.
Therefore I was wondering which pooling method you used in your experiments and what was the reasoning behind it.
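For reference, the two pooling options raised in this issue can be sketched as a small function over the encoder outputs. The `[CLS, registers, patches]` layout and the `pool_tokens` helper are hypothetical; because the paper discards register outputs, this sketch also excludes them from mean pooling. It does not describe what `VitRGTS` currently does.

```python
import torch


def pool_tokens(tokens: torch.Tensor, num_registers: int, mode: str = "cls") -> torch.Tensor:
    """Pool encoder outputs laid out as [CLS, registers..., patches...] (assumed layout).

    mode="cls":  return the CLS token output.
    mode="mean": average the patch tokens only; CLS and registers are excluded.
    """
    if mode == "cls":
        return tokens[:, 0]
    if mode == "mean":
        return tokens[:, 1 + num_registers:].mean(dim=1)
    raise ValueError(f"unknown pooling mode: {mode!r}")


# 1 CLS + 2 registers + 4 patches, dim 3
x = torch.arange(2 * 7 * 3, dtype=torch.float32).reshape(2, 7, 3)
print(pool_tokens(x, num_registers=2, mode="cls").shape)   # torch.Size([2, 3])
print(pool_tokens(x, num_registers=2, mode="mean").shape)  # torch.Size([2, 3])
```

Either output can feed the same classification head; the difference is whether the model is trained to summarize the image in one token or across all patches.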


Attention Map example

Hi,

Since this repo is based on the paper "ViT needs registers", it would be useful to publish an example that visualizes attention maps.

Thanks for this great work!
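A rough sketch of what such an example could look like: take the attention weights of one layer, keep the row for the CLS query, drop the register columns, and reshape the patch columns into the image grid. It uses a standalone `nn.MultiheadAttention` on random tokens because how to pull attention out of `VitRGTS` is not documented here; `cls_attention_map` and the `[CLS, registers, patches]` layout are assumptions.

```python
import math

import torch
from torch import nn


def cls_attention_map(tokens: torch.Tensor, attn: nn.MultiheadAttention, num_registers: int) -> torch.Tensor:
    """Return CLS-to-patch attention as a (batch, h, w) grid.

    tokens: (batch, 1 + num_registers + num_patches, dim), assumed layout [CLS, registers, patches].
    """
    _, weights = attn(tokens, tokens, tokens, need_weights=True, average_attn_weights=True)
    # weights: (batch, seq, seq), averaged over heads; row 0 is the CLS query.
    cls_to_patches = weights[:, 0, 1 + num_registers:]  # drop CLS and register columns
    side = math.isqrt(cls_to_patches.shape[-1])
    return cls_to_patches.reshape(-1, side, side)


torch.manual_seed(0)
dim, heads, num_registers = 32, 4, 4
attn = nn.MultiheadAttention(dim, heads, batch_first=True)
tokens = torch.randn(1, 1 + num_registers + 64, dim)  # 64 patches = 8 x 8 grid
with torch.no_grad():
    attn_map = cls_attention_map(tokens, attn, num_registers)
print(attn_map.shape)  # torch.Size([1, 8, 8])
```

Because the CLS and register columns are removed, each map sums to less than 1; renormalize it before plotting if you want a distribution over patches.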

