As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
from paper.
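The reparameterization described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code; the dimension `d`, rank `r`, and scaling `alpha` are assumed hyperparameters. The pretrained weight `W0` stays frozen, and the update ∆W = BA is factored into two small matrices, with `B` initialized to zero so that training starts exactly from the pretrained model:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4                      # model dim and LoRA rank, r << d
W0 = rng.standard_normal((d, d))  # pretrained weight, kept frozen

# Trainable low-rank factors: delta_W = B @ A has rank at most r.
A = rng.standard_normal((r, d)) * 0.01  # small Gaussian init
B = np.zeros((d, r))                    # zero init => delta_W = 0 at start

alpha = 8  # scaling hyperparameter (assumed value)

def lora_forward(x):
    """h = W0 x + (alpha / r) * B A x; only A and B would be trained."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d)
# Before any training (B = 0) the adapted model matches the base model.
assert np.allclose(lora_forward(x), W0 @ x)
```

At inference time the merged weight W0 + (alpha/r)·BA can be computed once and used as a plain dense layer, which is why LoRA, unlike bottleneck adapters, adds no extra latency.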
Problem
The major downside of fine-tuning is that the new model contains as many parameters as the original model.
Existing techniques often introduce inference latency (Houlsby et al., 2019; Rebuffi et al., 2017) by extending model depth or reduce the model's usable sequence length (Li & Liang, 2021; Lester et al., 2021; Hambardzumyan et al., 2020; Liu et al., 2021) (Section 3). More importantly, these methods often fail to match the fine-tuning baselines, posing a trade-off between efficiency and model quality.
from paper.
We see a noticeable increase in latency when using adapters, even with a very small bottleneck dimension.
from paper.
Approach
We limit our study to only adapting the attention weights for downstream tasks and freeze the MLP modules (so they are not trained in downstream tasks) both for simplicity and parameter-efficiency.
from paper.
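Restricting LoRA to the attention weights makes the parameter savings concrete. The arithmetic below uses assumed GPT-3-175B-like dimensions purely for illustration; the specific figures are not quoted from the paper:

```python
# Rough trainable-parameter count when LoRA is applied only to the
# attention query/value projections. Dimensions are assumed
# GPT-3-175B-like values for illustration.
d_model = 12288   # hidden size (assumed)
n_layers = 96     # Transformer layers (assumed)
r = 4             # LoRA rank

# Each adapted d x d projection gets A (r x d) and B (d x r).
per_matrix = 2 * d_model * r
trainable = n_layers * 2 * per_matrix     # Wq and Wv in every layer
full = n_layers * 4 * d_model * d_model   # q, k, v, o projections alone

print(f"LoRA trainable params: {trainable:,}")       # 18,874,368
print(f"full attention params: {full:,}")            # 57,982,058,496
print(f"reduction: {full / trainable:.0f}x")         # 3072x
```

Even counting only the attention projections, the trainable set shrinks by three orders of magnitude, which is what makes per-task checkpoints cheap to store and swap.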
Table 6 shows that, surprisingly, LoRA already performs competitively with a very small r (more so for {∆Wq, ∆Wv} than just ∆Wq).
from paper.
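The finding that a tiny r suffices is notable because r is a hard cap on the expressiveness of the update: a product of d×r and r×d factors can never exceed rank r. A quick NumPy check of that property, with an assumed dimension `d` for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # assumed model dimension for the demo

for r in (1, 2, 4, 8):
    A = rng.standard_normal((r, d))
    B = rng.standard_normal((d, r))
    delta_W = B @ A  # the LoRA update
    # rank(B @ A) <= min(rank(B), rank(A)) <= r
    assert np.linalg.matrix_rank(delta_W) <= r
    print(f"r={r}: trainable {2 * d * r:,} params "
          f"vs {d * d:,} for a full d x d update")
```

That such low-rank updates match full fine-tuning suggests the weight change needed for a downstream task has low "intrinsic rank," which is the paper's central hypothesis.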