dora's Introduction

DoRA: Weight-Decomposed Low-Rank Adaptation

This repo is now deprecated; please visit NVlabs/DoRA instead!

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

Paper: https://arxiv.org/abs/2402.09353

Project page: https://nbasyl.github.io/DoRA-project-page/

DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
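
In other words, the pre-trained weight is re-expressed as a magnitude vector times a unit-norm direction, and only the direction receives a low-rank update. A minimal sketch of this reparameterization (the column-wise norm follows the paper; the shapes and numbers are arbitrary):

import torch

d, k, r = 8, 16, 4                        # out_features, in_features, rank
W0 = torch.randn(d, k)                    # frozen pre-trained weight
m = W0.norm(p=2, dim=0, keepdim=True)     # trainable magnitude, shape [1, k]
B = torch.zeros(d, r)                     # LoRA up-projection (zero-initialized)
A = torch.randn(r, k)                     # LoRA down-projection

V = W0 + B @ A                            # direction updated via LoRA
W_adapted = m * V / V.norm(p=2, dim=0, keepdim=True)

# With B zero-initialized, the adapted weight equals W0 at the start of training.
assert torch.allclose(W_adapted, W0, atol=1e-6)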

Quick start and some tricks for fine-tuning with DoRA

HuggingFace PEFT

DoRA is now supported by the Hugging Face PEFT package. You can install PEFT with:

pip install git+https://github.com/huggingface/peft.git -q

After PEFT is installed, you can simply set the use_dora argument of LoraConfig() to True to apply DoRA.

A minimal example looks like this:

from peft import LoraConfig

# Initialize DoRA configuration
config = LoraConfig(
    use_dora=True, ...
)

Please refer to the official documentation for more details.
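
For context, a slightly fuller sketch of applying DoRA through PEFT (the base model name and hyperparameter values below are illustrative assumptions, not recommendations from this repo):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative values; tune r, lora_alpha, lora_dropout, and target_modules per task.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    use_dora=True,  # enable DoRA on top of the LoRA adapter
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed base model
model = get_peft_model(model, config)
model.print_trainable_parameters()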

DoRA hyperparameter settings

Note

💡 While fine-tuning with DoRA using LoRA's configuration can already achieve better results most of the time, reaching optimal performance still requires some hyperparameter adjustment.

We suggest starting with a slightly lower learning rate than that of LoRA, and users may also experiment with varying the LoRA dropout ratio.

Users may also start with half the rank of the LoRA configuration, which oftentimes already results in comparable or even superior accuracy to LoRA; see the sketch below.
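
As a concrete illustration of these suggestions (the numbers are hypothetical starting points, not values taken from the paper):

from peft import LoraConfig

# Suppose a LoRA baseline used rank 32 with learning rate 2e-4.
lora_config = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05)

# A DoRA run might then start from half the rank and a slightly lower
# learning rate (e.g. 1e-4, set in the trainer), plus a dropout sweep.
dora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, use_dora=True)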

Contact

Shih-Yang Liu: [email protected] or [email protected]

Licenses

Copyright © 2024, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

Citation

If you find DoRA useful, please cite it by using the following BibTeX entry.

@article{liu2024dora,
  title={{DoRA}: Weight-Decomposed Low-Rank Adaptation},
  author={Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung},
  journal={arXiv preprint arXiv:2402.09353},
  url={https://arxiv.org/abs/2402.09353},
  year={2024}
}

dora's Issues

[Question]: Which dimension should we calculate the norm across?

Hello, I have noticed that in the paper the normalization (norm) is applied along the out_feature dimension, but in the PEFT implementation it is applied along the in_feature dimension. Which one is correct?

d is out_feature, k is in_feature, r is rank

weight.shape = [d, k]
lora_A.shape = [r, k]
lora_B.shape = [d, r]

(figure: screenshot of the normalization equation from the paper)

https://github.com/huggingface/peft/blob/6dca6d22922fcb1ead828b5b3c146911d7b693fb/src/peft/tuners/lora/layer.py#L172-L175

weight.norm(dim=0) -> [1, k]  (norm over the out_feature dimension)
weight.norm(dim=1) -> [d, 1]  (norm over the in_feature dimension) [PEFT implementation]
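
For reference, a small shape check of the two choices (a sketch only; it does not by itself settle which convention matches the paper):

import torch

d, k = 8, 16                    # out_features, in_features
weight = torch.randn(d, k)

norm_out = weight.norm(p=2, dim=0, keepdim=True)  # over out_feature -> [1, k]
norm_in = weight.norm(p=2, dim=1, keepdim=True)   # over in_feature  -> [d, 1]
print(norm_out.shape, norm_in.shape)  # torch.Size([1, 16]) torch.Size([8, 1])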

About reproducing the results in the paper

Hi, thank you very much for your great paper.

I would like to reproduce the results in your paper, and I would like to ask whether you have any recommendations regarding the PEFT codebase. The main reason is that I find the results for the same method can be quite different across papers.

Thank you very much in advance.

I found some confusing code in PEFT

code: result_dora = (mag_norm_scale - 1) * F.linear(x, transpose(weight, self.fan_in_fan_out)) + mag_norm_scale * lora_B(lora_A(x)) * scaling
Question: what is the effect of the (mag_norm_scale - 1) and mag_norm_scale factors? Also, at the initialization stage result_dora cannot equal F.linear(x, transpose(weight, self.fan_in_fan_out)) because of the (mag_norm_scale - 1) factor.
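
One way to read that line (a sketch of the algebra as I understand the PEFT code, not an official answer): result_dora is added on top of the frozen layer's output, so the two factors combine into mag_norm_scale * (base + lora); and at initialization lora_B is zero while the magnitude is initialized to the weight norm, making mag_norm_scale equal to 1 and result_dora equal to 0:

import torch

base = torch.randn(2, 8)                # stand-in for F.linear(x, W) from the frozen layer
lora = torch.randn(2, 8)                # stand-in for lora_B(lora_A(x)) * scaling
mag_norm_scale = torch.rand(1, 8)       # stand-in for m / ||W + BA||

result_dora = (mag_norm_scale - 1) * base + mag_norm_scale * lora
total = base + result_dora              # frozen output + DoRA correction

# Algebraically: base + (s - 1)*base + s*lora == s * (base + lora)
assert torch.allclose(total, mag_norm_scale * (base + lora), atol=1e-6)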
