ViT-ZSL

PyTorch implementation of our ViT-ZSL model for zero-shot learning:
Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning
Faisal Alamri, Anjan Dutta
IMVIP, 2021

Abstract

Zero-Shot Learning (ZSL) aims to recognise unseen object classes, which are not observed during the training phase. The existing body of work on ZSL mostly relies on pretrained visual features and lacks an explicit attribute localisation mechanism on images. In this work, we propose an attention-based model in the ZSL problem setting to learn attributes useful for unseen class recognition. Our method uses an attention mechanism adapted from the Vision Transformer to capture and learn discriminative attributes by splitting images into small patches. We conduct experiments on three popular ZSL benchmarks (i.e., AWA2, CUB and SUN) and set new state-of-the-art harmonic mean results on all three datasets, which illustrates the effectiveness of our proposed method.
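The harmonic mean mentioned in the abstract is not defined in this README. The sketch below assumes the definition commonly used in generalised ZSL evaluation, H = 2 * S * U / (S + U), where S and U are the accuracies on seen and unseen classes. The function name `harmonic_mean` and the example values are illustrative only; they are not taken from the repository or the paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H = 2 * S * U / (S + U) of seen and unseen accuracy.

    Assumption: this is the usual generalised-ZSL definition; the README
    itself does not spell out the formula.
    """
    if seen_acc < 0 or unseen_acc < 0:
        raise ValueError("accuracies must be non-negative")
    total = seen_acc + unseen_acc
    if total == 0:
        # Both accuracies are zero, so the harmonic mean is taken as zero.
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(harmonic_mean(0.9, 0.5))
```

Because the harmonic mean is pulled towards the smaller of the two values, a model scores well only if it is accurate on both seen and unseen classes.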

Usage:

1) Download the datasets

Follow the instructions provided in data/Dataset_Instruction.txt

2) Create a conda environment:

Refer to the Conda environment documentation for more information.

# conda create -n {ENVNAME} python=3.6
conda create -n ViT_ZSL python=3.6

# Activate the environment: conda activate {ENVNAME}
conda activate ViT_ZSL

3) Install the required libraries:

This is a PyTorch implementation. Install the Python dependencies first, then PyTorch:

pip install -r requirements.txt

# PyTorch
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
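After installing, it may help to confirm that the environment can see the packages before opening the notebook. The following is a minimal sketch and is not part of the repository: the helper names `check_packages` and `describe_torch` are hypothetical, and the package list is an assumption based on the install command above.

```python
import importlib.util
import sys


def check_packages(names):
    """Return {name: True/False} depending on whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def describe_torch():
    """Return a short description of the PyTorch install, or None if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script also runs without PyTorch
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Assumed package list, taken from the conda install command above.
    for name, found in check_packages(["torch", "torchvision"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    print(describe_torch() or "PyTorch is not installed in this environment")
```

Run it inside the activated ViT_ZSL environment. If CUDA is reported as unavailable, the notebook may fall back to the CPU or fail, depending on how it selects its device.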

4) Train (and test) the model

Open ViT_ZSL.ipynb in Jupyter:

jupyter notebook ViT_ZSL.ipynb

External sources:

Further questions:

Please read our paper first. If you need any further information, feel free to contact us by email.

Citation:

If you use ViT-ZSL in your research, please cite it using the following BibTeX entry.

@InProceedings{Alamri2021ViTZSL,
  author    = {Faisal Alamri and Anjan Dutta},
  title     = {Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning},
  booktitle = {IMVIP},
  year      = {2021}
}

Authors

Faisal Alamri and Anjan Dutta
