Low-Rank Few-Shot Adaptation of Vision-Language Models [CVPRW 2024]

The official implementation of Low-Rank Few-Shot Adaptation of Vision-Language Models.

Authors: Maxime Zanella, Ismail Ben Ayed.

We present CLIP-LoRA, an easy-to-use few-shot method for Vision-Language Models with fixed hyperparameters for every task and every number of shots. This repository also aims to facilitate the use of Low-Rank Adapters (LoRA) in Vision-Language Models such as CLIP.

Figure 1: Low-Rank Adaptation (LoRA) is easy to use and does not create any additional inference latency.

Here is how to run the experiments:

  1. Installation
  2. Usage

A quick guide on how LoRA is implemented in this repository:

  1. LoRA in MultiheadAttention

Please consider supporting our work:

  1. Citation

If you have any inquiries:

  1. Contact

Installation

Environment configuration

Our code requires an environment with PyTorch installed. If you don't have one, consider creating a Python environment with:

conda create -y --name CLIP-LoRA python=3.10.0
conda activate CLIP-LoRA

Then install PyTorch, for instance with:

pip3 install torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2

Datasets installation

Please follow DATASETS.md to install the datasets.

How to execute CLIP-LoRA

Execute CLIP-LoRA on the ImageNet dataset with a random seed of 1 by entering the following command:

python main.py --root_path /path/to/your/data --dataset imagenet --seed 1

You can also execute CLIP-LoRA on the 10 other datasets:

python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1

You can optionally provide a save_path to save the LoRA modules, which can then be reloaded easily with the --eval_only argument. The code automatically checks that the saved LoRA modules were trained with the corresponding rank, alpha, encoder, params, and position to ensure compatibility. The folder will be structured as follows:

/your/save/path
└── backbone
    └── dataset
        └── Xshots
            ├── seedY

For example, to reload and evaluate the saved LoRA modules:

python main.py --root_path /path/to/your/data --dataset dataset_name --seed 1 --save_path /your/save/path --eval_only 
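For reference, here is a minimal Python sketch of how the save directory shown above can be resolved from the run settings; the function and variable names are illustrative assumptions, not the repository's exact code:

import os

# Illustrative only: rebuild the /your/save/path/backbone/dataset/Xshots/seedY layout.
def lora_save_dir(save_path, backbone, dataset, shots, seed):
    return os.path.join(save_path, backbone, dataset, f"{shots}shots", f"seed{seed}")

print(lora_save_dir("/your/save/path", "ViT-B-16", "imagenet", 16, 1))
# /your/save/path/ViT-B-16/imagenet/16shots/seed1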

LoRA in MultiheadAttention

The PlainMultiheadAttentionLoRA class in loralib/layers.py extends the standard PyTorch multi-head attention mechanism by incorporating Low-Rank Adaptation (LoRA). This class constructs explicit linear modules for each component of the attention mechanism—query (q), key (k), value (v), and output (o)—providing a structured and adaptable foundation for your experiments.

Class Overview

PlainMultiheadAttentionLoRA takes an existing nn.MultiheadAttention module, replicates its configuration, and integrates LoRA linear modules.

Key Features

  • Parameter Initialization: The initialization process involves copying weights and biases from a pre-existing multi-head attention model. Each LoRA module (q, k, v, o) is adapted based on the specified requirements in the enable_lora list.
  • LoRA Integration: The replacement of standard linear layers with LinearLoRA layers introduces low-rank matrices, which are parameterized by the rank of adaptation (r) and the scaling factor (lora_alpha).
  • Forward Pass: The forward_module method manages the attention computation, incorporating optional dropout settings on the LoRA modules.
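To make these points concrete, here is a minimal, self-contained sketch of a LoRA-augmented linear layer, illustrating the general mechanism behind the LinearLoRA layers; this is standard LoRA written for illustration, not a copy of the code in loralib/layers.py, and the class name and defaults are assumptions.

import math
import torch
import torch.nn as nn

class LinearLoRASketch(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base_linear: nn.Linear, r: int = 4, lora_alpha: int = 2, lora_dropout: float = 0.0):
        super().__init__()
        self.base = base_linear  # pretrained projection, kept frozen
        for p in self.base.parameters():
            p.requires_grad_(False)

        # Low-rank factors: the effective weight is W + (lora_alpha / r) * B @ A,
        # with A of shape (r, in_features) and B of shape (out_features, r).
        self.lora_A = nn.Parameter(torch.empty(r, base_linear.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))  # B starts at zero, so the update starts at zero
        self.scaling = lora_alpha / r
        self.dropout = nn.Dropout(lora_dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank update.
        return self.base(x) + (self.dropout(x) @ self.lora_A.T @ self.lora_B.T) * self.scaling

In PlainMultiheadAttentionLoRA, layers of this kind replace the q, k, v, and o projections selected in enable_lora, so that only the low-rank factors need to be trained.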

Example Usage

The following snippet demonstrates how to initialize the PlainMultiheadAttentionLoRA with an existing multi-head attention module.

import torch.nn as nn

from loralib.layers import PlainMultiheadAttentionLoRA

# Initialize with an existing MultiheadAttention module
existing_mha = nn.MultiheadAttention(embed_dim=512, num_heads=8)
lora_mha = PlainMultiheadAttentionLoRA(existing_mha, enable_lora=['q', 'k', 'v', 'o'], r=4, lora_alpha=2)
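As a follow-up, the snippet below sketches how to keep only the LoRA factors trainable before fine-tuning; it assumes, as is common in LoRA implementations, that the low-rank parameters contain "lora" in their names.

# Freeze everything except the LoRA parameters (assumes LoRA parameter names contain "lora").
for name, param in lora_mha.named_parameters():
    param.requires_grad = 'lora' in name

trainable = sum(p.numel() for p in lora_mha.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_mha.parameters())
print(f"Trainable parameters: {trainable} / {total}")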

Citation

If you find this project useful, please cite it as follows:

@article{zanella2024low,
  title={Low-Rank Few-Shot Adaptation of Vision-Language Models},
  author={Zanella, Maxime and Ayed, Ismail Ben},
  journal={arXiv preprint arXiv:2405.18541},
  year={2024}
}

Contact

For any inquiries, feel free to create an issue or contact us at [email protected].

Acknowledgement

We express our gratitude to the CoOp and Tip-Adapter authors for their open-source contributions.
