Comments (7)
Are you talking about fine-tuning or generating the dataset?
You use the GPT APIs to generate the dataset; it doesn't require running anything on a GPU cloud VPS.
from codealpaca.
For comparison, Alpaca-7B took 3 hours on 3xA100s, and LoRA/PEFT reduces compute requirements by about two orders of magnitude for similar results.
So fine-tuning likely takes only a couple of hours, and that can likely be cut to a few minutes on a single consumer GPU by swapping the fine-tuning process out for LoRA via the peft library.
Even Alpaca-30B can be trained in a few hours on a single 3090 using 4-bit PEFT (not officially supported in the peft library yet, but it has been done).
Hey @MarkSchmidty, do you have a link for the 4-bit PEFT? I'd like to see those results.
I think this is one of the few repos that actually fine-tuned LLaMA rather than just using LoRA. I personally find LoRA suspicious: how is it possible that we can freeze the model, add low-rank tensors to the query/key projections in the attention layers, and get results comparable to full fine-tuning (which is much more expensive)? I did see the results in the Microsoft paper, but I still find it hard to believe.
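For anyone sharing that skepticism, it may help to spell out what LoRA actually computes. Below is a minimal numpy sketch of the idea; the hidden size, rank, and init scales are illustrative choices, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8  # hidden size and LoRA rank, with r << d

# Frozen pretrained projection weight (e.g. an attention query matrix).
W = rng.normal(size=(d, d))

# Trainable low-rank factors. B is zero-initialized, so at the start of
# training the adapted layer behaves exactly like the pretrained layer.
A = rng.normal(scale=0.01, size=(r, d))
B = np.zeros((d, r))

def adapted_forward(x):
    # The low-rank update B @ A is added to the frozen weight;
    # only A and B receive gradients during fine-tuning.
    return x @ (W + B @ A).T

# Parameter counts: full fine-tuning updates d*d weights, LoRA only 2*d*r.
full_params = d * d        # 262,144
lora_params = 2 * d * r    # 8,192, about 3% of the full count
```

Because B starts at zero, training begins from the pretrained behavior and only has to learn the `2*d*r` adapter entries, which is where the sharp drop in trainable parameters (and optimizer-state memory) comes from. Whether that small subspace is expressive enough is exactly the empirical claim of the Microsoft paper.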
@vgoklani Generally you must merge the 16-bit PEFT adapter into a 16-bit model and then quantize the resulting merged model down to 4-bit if you want 4-bit inference. The quality of the PEFT part falls apart at that point. So native fine-tuning does have a benefit over LoRA/PEFT if you're planning to quantize down to 4-bit.
That said, this is the project that fine-tunes LoRAs in 4-bit directly. This avoids the quality loss of quantizing after fine-tuning, producing 4-bit PEFTs about as good as native fine-tunes: https://github.com/johnsmith0031/alpaca_lora_4bit. There are fine-tunes of all sizes mentioned in the Issues section.
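A toy numpy sketch can illustrate why a merged adapter degrades under post-merge quantization. The round-to-nearest scheme with a single shared scale below is a deliberately crude stand-in for real 4-bit quantizers (GPTQ, bitsandbytes, etc. use per-group scales and smarter rounding), and the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w, scale):
    # Toy symmetric round-to-nearest quantization to 16 levels.
    return np.clip(np.round(w / scale), -8, 7) * scale

d, r = 256, 8
W = rng.normal(size=(d, d))  # base weight

# A small merged LoRA update, built from two low-rank factors.
delta = (rng.normal(scale=0.01, size=(d, r))
         @ rng.normal(scale=0.01, size=(r, d)))

# Use one shared scale for both so the comparison isolates rounding.
scale = np.abs(W).max() / 7
merged_q = quantize_4bit(W + delta, scale)  # merge, then quantize
base_q = quantize_4bit(W, scale)            # quantize the base alone

# The quantization step is far coarser than a typical update entry,
# so the vast majority of the LoRA update is simply rounded away.
frac_unchanged = np.mean(merged_q == base_q)
```

In 16-bit the merged model keeps `delta` exactly, which is consistent with 16-bit-in/16-bit-out LoRAs holding up well; training the adapter directly against 4-bit weights (as alpaca_lora_4bit does) lets the optimizer account for the quantization grid instead of having its work erased by it.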
> Hey @MarkSchmidty, do you have a link for the 4-bit PEFT? I'd like to see those results.
>
> I think this is one of the few repos that actually fine-tuned LLaMA rather than just using LoRA. I personally find LoRA suspicious: how is it possible that we can freeze the model, add low-rank tensors to the query/key projections in the attention layers, and get results comparable to full fine-tuning (which is much more expensive)? I did see the results in the Microsoft paper, but I still find it hard to believe.

I've tried a full fine-tune of Alpaca-7B and a 7B LoRA, and I find the LoRA to be greatly lacking.
> I've tried a full fine-tune of Alpaca-7B and a 7B LoRA, and I find the LoRA to be greatly lacking.

But was the LoRA created in 16-bit or in 4-bit, and were you running inference in 16-bit or in 4-bit? A LoRA made in 16-bit with inference in 16-bit is quite good, and the same goes for a LoRA made in 4-bit with inference in 4-bit.
It's LoRAs made in 16-bit with inference run in 4-bit that I find "greatly lacking".
> > I've tried a full fine-tune of Alpaca-7B and a 7B LoRA, and I find the LoRA to be greatly lacking.
>
> But was the LoRA created in 16-bit or in 4-bit, and were you running inference in 16-bit or in 4-bit? A LoRA made in 16-bit with inference in 16-bit is quite good, and the same goes for a LoRA made in 4-bit with inference in 4-bit.
> It's LoRAs made in 16-bit with inference run in 4-bit that I find "greatly lacking".

I haven't run inference on any LLaMA-based model in 4-bit, so I can't comment on that.
Related Issues (19)
- Please publish weights?
- Instructions for training 13b model
- Cannot solve complex problems
- How can i run this model locally using cpu on Windows 11?
- Open LLaMA project
- AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 8 * 1 * 8
- Just want to say thank you.
- Difference between new_codealpaca.json, rosetta_alpaca.json and codealpaca-20k.json?
- Llama2 model with code instruction-tuning on a single RTX 3090 is available now
- Please elaborate the process of coding data generation.
- A new code editing dataset
- More training data
- Is it commercially usable?
- Hosting your dataset on the Hugging Face Hub
- Private data
- 65b model possible?
- bug: get empty state dict
- How big is the finished model?