As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
from paper.
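The reparameterization described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code; the dimension `d`, rank `r`, and scaling `alpha` are assumed hyperparameters. The pretrained weight `W0` stays frozen, and the update ∆W = BA is factored into two small matrices, with `B` initialized to zero so that training starts exactly from the pretrained model:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4                      # model dim and LoRA rank, r << d
W0 = rng.standard_normal((d, d))  # pretrained weight, kept frozen

# Trainable low-rank factors: delta_W = B @ A has rank at most r.
A = rng.standard_normal((r, d)) * 0.01  # small Gaussian init
B = np.zeros((d, r))                    # zero init => delta_W = 0 at start

alpha = 8  # scaling hyperparameter (assumed value)

def lora_forward(x):
    """h = W0 x + (alpha / r) * B A x; only A and B would be trained."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
# Before any training (B = 0) the adapted model matches the base model.
assert np.allclose(lora_forward(x), W0 @ x)
```

At inference time the merged weight W0 + (alpha/r)·BA can be computed once and used as a plain dense layer, which is why LoRA, unlike bottleneck adapters, adds no extra latency.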
Problem
The major downside of fine-tuning is that the new model contains as many parameters as the original model.
Existing techniques often introduce inference latency (Houlsby et al., 2019; Rebuffi et al., 2017) by extending model depth or reduce the model's usable sequence length (Li & Liang, 2021; Lester et al., 2021; Hambardzumyan et al., 2020; Liu et al., 2021) (Section 3). More importantly, these methods often fail to match the fine-tuning baselines, posing a trade-off between efficiency and model quality.
from paper.
We see a noticeable increase in latency when using adapters, even with a very small bottleneck dimension.
from paper.
Approach
We limit our study to only adapting the attention weights for downstream tasks and freeze the MLP modules (so they are not trained in downstream tasks) both for simplicity and parameter-efficiency.
from paper.
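Restricting LoRA to the attention weights makes the parameter savings concrete. The arithmetic below uses assumed GPT-3-175B-like dimensions purely for illustration; the specific figures are not quoted from the paper:

```python
# Rough trainable-parameter count when LoRA is applied only to the
# attention query/value projections. Dimensions are assumed
# GPT-3-175B-like values for illustration.
d_model = 12288   # hidden size (assumed)
n_layers = 96     # Transformer layers (assumed)
r = 4             # LoRA rank

# Each adapted d x d projection gets A (r x d) and B (d x r).
per_matrix = 2 * d_model * r
trainable = n_layers * 2 * per_matrix     # Wq and Wv in every layer
full = n_layers * 4 * d_model * d_model   # q, k, v, o projections alone

print(f"LoRA trainable params: {trainable:,}")       # 18,874,368
print(f"full attention params: {full:,}")            # 57,982,058,496
print(f"reduction: {full / trainable:.0f}x")         # 3072x
```

Even counting only the attention projections, the trainable set shrinks by three orders of magnitude, which is what makes per-task checkpoints cheap to store and swap.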
Table 6 shows that, surprisingly, LoRA already performs competitively with a very small r (more so for {∆Wq, ∆Wv} than just ∆Wq).
from paper.
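The finding that a tiny r suffices is notable because r is a hard cap on the expressiveness of the update: a product of d×r and r×d factors can never exceed rank r. A quick NumPy check of that property, with an assumed dimension `d` for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # assumed model dimension for the demo

for r in (1, 2, 4, 8):
    A = rng.standard_normal((r, d))
    B = rng.standard_normal((d, r))
    delta_W = B @ A  # the LoRA update
    # rank(B @ A) <= min(rank(B), rank(A)) <= r
    assert np.linalg.matrix_rank(delta_W) <= r
    print(f"r={r}: trainable {2 * d * r:,} params "
          f"vs {d * d:,} for a full d x d update")
```

That such low-rank updates match full fine-tuning suggests the weight change needed for a downstream task has low "intrinsic rank," which is the paper's central hypothesis.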