Comments (4)
Good catch! I had Conv2d in mind when I first wrote it.
Looks like we just need to instantiate lora_A and lora_B differently depending on the kind of convolution.
Happy to review and merge it if someone wants to implement and test it. Otherwise, I'll do it in the near future.
I have an idea, but it will change how the current Conv2d LoRA works. We can treat convolution as a matmul with the input as a flattened "window". For example, for Conv2d, the input is a window with (kernel_size, kernel_size) spatial dimensions, and the flattened input dim is in_channels * kernel_size * kernel_size. This naturally extends to Conv1d and Conv3d:
- Conv1d: B @ A = (out_channels, rank) @ (rank, in_channels * kernel_size) = (out_channels, in_channels * kernel_size)
- Conv2d: B @ A = (out_channels, rank) @ (rank, in_channels * kernel_size * kernel_size) = (out_channels, in_channels * kernel_size * kernel_size)
- Conv3d: B @ A = (out_channels, rank) @ (rank, in_channels * kernel_size * kernel_size * kernel_size) = (out_channels, in_channels * kernel_size * kernel_size * kernel_size)
There are two benefits to this implementation: (1) the kernel size doesn't need to be the same in all spatial dimensions, and (2) we can use convolution in the LoRA branch in the forward pass instead of merging weights, similar to the Linear implementation (relevant issue - #54). The first convolution (with lora_A) is a normal convolution with the same kernel size, but the second convolution (with lora_B) is a point-wise (aka 1x1) convolution. I haven't tested it, but from what I understand, it should work.
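A minimal sketch of the idea above (assuming PyTorch; shapes and names are illustrative, not the repo's actual implementation): the LoRA branch is two stacked convolutions, and merging is just lora_B @ lora_A viewed as the conv weight shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

in_ch, out_ch, r = 8, 16, 4
kernel_size = (3, 5)            # benefit (1): per-dimension kernel sizes
padding = (1, 2)
base = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

lora_A = torch.randn(r, in_ch, *kernel_size) * 0.01   # (r, in_ch, kh, kw)
lora_B = torch.randn(out_ch, r, 1, 1) * 0.01          # (out_ch, r, 1, 1)

x = torch.randn(2, in_ch, 32, 32)

# Benefit (2): forward pass without merging, as two stacked convolutions.
# lora_A uses the base layer's kernel size/padding; lora_B is point-wise.
out = base(x) + F.conv2d(F.conv2d(x, lora_A, padding=padding), lora_B)

# Equivalent merged form: delta_W = (B @ A) viewed as the conv weight shape.
delta_w = (lora_B.flatten(1) @ lora_A.flatten(1)).view(base.weight.shape)
merged = F.conv2d(x, base.weight + delta_w, base.bias, padding=padding)
assert torch.allclose(out, merged, atol=1e-5)
```

The equivalence holds because a 1x1 convolution is linear in its input channels, so stacking it after the lora_A convolution is the same as convolving with the combined weight.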
The situation becomes slightly more complicated when grouped convolution is involved (groups > 1). I'm thinking of accounting for groups in the input channels of lora_A (so lora_A becomes (rank, in_channels / groups * kernel_size * kernel_size)). We can still implement the forward pass of the LoRA branch as two convolutions with lora_A and lora_B, using grouped convolution for lora_A, similar to the original convolution branch. A problem might arise when we try to merge weights, though. Due to how grouped convolution works, I think the merged weights might not be lora_B @ lora_A (I will need to test this). If that's the case, we need a different calculation to merge weights.
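One hypothetical way this could work out (a sketch, assuming PyTorch and assuming both lora_A and lora_B use grouped convolution with the same groups; not tested in this thread): the merge would then be block-wise per group rather than one full lora_B @ lora_A.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

in_ch, out_ch, r, g, k = 8, 16, 4, 2, 3
x = torch.randn(2, in_ch, 16, 16)

lora_A = torch.randn(r, in_ch // g, k, k) * 0.01   # grouped, like the base conv
lora_B = torch.randn(out_ch, r // g, 1, 1) * 0.01  # point-wise, also grouped

# LoRA branch as two grouped convolutions.
branch = F.conv2d(F.conv2d(x, lora_A, padding=1, groups=g), lora_B, groups=g)

# Merging group by group: delta_W is assembled from per-group B_j @ A_j,
# not from a single lora_B @ lora_A over the full matrices.
A = lora_A.flatten(1).reshape(g, r // g, -1)            # (g, r/g, in/g*k*k)
B = lora_B.flatten(1).reshape(g, out_ch // g, r // g)   # (g, out/g, r/g)
delta_w = (B @ A).reshape(out_ch, in_ch // g, k, k)     # grouped conv weight
merged = F.conv2d(x, delta_w, padding=1, groups=g)
assert torch.allclose(branch, merged, atol=1e-6)
```

If lora_B were instead an ungrouped 1x1 convolution, it would mix channels across groups, and the branch could not be merged into a grouped conv weight at all.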
Another way of handling groups > 1 is to follow your current implementation, which puts groups in the output of lora_B (out_channels / groups, rank). This would sacrifice the ability to use convolution for the forward pass in the LoRA branch, but it maintains the ability to merge weights with a simple lora_B @ lora_A.
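The shape bookkeeping behind that simple merge can be sketched as follows (assuming PyTorch; the flattened factor shapes follow the style of the existing lora.Conv2d implementation, with groups folded into lora_B's output dimension):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_ch, out_ch, r, g, k = 8, 16, 4, 2, 3
base = nn.Conv2d(in_ch, out_ch, k, padding=1, groups=g)

# Flattened LoRA factors, with groups in lora_B's output dimension.
lora_A = torch.randn(r * k, in_ch * k) * 0.01
lora_B = torch.randn(out_ch // g * k, r * k) * 0.01

# (out/g*k, r*k) @ (r*k, in*k) has out * in/g * k * k elements in total,
# so the product can be viewed directly as the grouped conv weight.
delta_w = (lora_B @ lora_A).view(base.weight.shape)
merged_weight = base.weight + delta_w
```

The element counts line up because the grouped conv weight has shape (out_channels, in_channels / groups, k, k), which contains exactly out_channels * in_channels / groups * k * k entries.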
Let me know what you think @edwardjhu. Thank you!
Hello, I also encountered the same problem. Is this improvement feasible in your subsequent experiments? @gau-nernst @edwardjhu
I have changed the initialization of the lora_B parameters so that the new implementation works beyond the 2d case (Pull Request #157). I have tested it, and it works for 1d to 3d. The LoRA parameter shapes are unchanged in the 2d case. I didn't test the grouped case; please let me know if it needs to be fixed.
@gau-nernst @edwardjhu