Comments (6)
It could also be worth trying to disable int8 quantization or to increase the LoRA matrix rank. Will check tomorrow.
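For context on what disabling int8 quantization would change, here is a minimal sketch of per-tensor absmax int8 quantization in plain Python. It is a simplified illustration only, not what bitsandbytes does when a model is loaded with load_in_8bit (its LLM.int8() scheme uses finer-grained scales and keeps outlier features in 16-bit). The function names and sample weights are hypothetical.

```python
def absmax_quantize(weights):
    """Map floats to int8 codes in [-127, 127] using one shared scale.

    Simplified per-tensor absmax scheme (assumption for illustration),
    not the bitsandbytes LLM.int8() implementation.
    """
    absmax = max(abs(w) for w in weights)
    # Avoid division by zero for an all-zero tensor.
    scale = absmax / 127 if absmax > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights: one large value sets the scale for all of them.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = absmax_quantize(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(codes)
print(max(errors))
```

Small weights such as 0.003 round to code 0 and are lost entirely; that kind of rounding error is what loading the model without int8 quantization avoids.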
from alpaca-lora.
Same here. I only changed:
BATCH_SIZE = 256
MICRO_BATCH_SIZE = 5
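Note that MICRO_BATCH_SIZE = 5 does not divide 256 evenly. Assuming the training script derives its accumulation steps with integer division (GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE, which is how alpaca-lora's finetune.py computed it at the time; treat this as an assumption), the effective batch ends up slightly below the configured value. The helper below is hypothetical and only reproduces that arithmetic for a single GPU.

```python
def accumulation_plan(batch_size, micro_batch_size):
    """Return (gradient accumulation steps, effective batch size).

    Assumes steps = batch_size // micro_batch_size, as in the
    finetune script at the time (assumption), on a single GPU.
    """
    steps = batch_size // micro_batch_size
    effective = steps * micro_batch_size
    return steps, effective


BATCH_SIZE = 256
MICRO_BATCH_SIZE = 5
steps, effective = accumulation_plan(BATCH_SIZE, MICRO_BATCH_SIZE)
print(steps, effective)
```

Here that gives 51 accumulation steps and an effective batch of 255 rather than 256, which is close enough not to matter much but is worth knowing when comparing runs.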
I've observed something similar. I'm updating the learning rate to 2e-5, as in the original paper, and will check in the morning whether it works better.
This doesn't seem to actually help. The purple line is with 2e-5, the green line with 3e-4.
Maybe this is just the best result obtainable when LoRA is applied only to the q_proj/v_proj parameters?
Have you evaluated the model quality? I've always suspected that instruct-tuning is much less data-intensive than most people think.
FWIW, I've been able to eke out some small gains by setting LORA_R to 8 instead of 4. Otherwise, until proven otherwise, both learning rates seem perfectly fine.
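To put the rank change in perspective, here is a rough count of trainable LoRA parameters. Each adapted linear layer with input size d_in and output size d_out gains two low-rank matrices, A (r × d_in) and B (d_out × r), so r·(d_in + d_out) parameters. The sketch assumes LLaMA-7B dimensions (32 decoder layers, hidden size 4096, so q_proj and v_proj are both 4096 × 4096) and LoRA applied only to q_proj and v_proj; the helper name is hypothetical.

```python
def lora_trainable_params(rank, num_layers, module_shapes):
    """Count LoRA parameters added to a model.

    Each adapted (d_in x d_out) linear gets A (rank x d_in) and
    B (d_out x rank), i.e. rank * (d_in + d_out) parameters.
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes)
    return num_layers * per_layer


# Assumed LLaMA-7B dimensions: 32 decoder layers, hidden size 4096.
LLAMA_7B_LAYERS = 32
QV_SHAPES = [(4096, 4096), (4096, 4096)]  # q_proj, v_proj

for r in (4, 8):
    print(r, lora_trainable_params(r, LLAMA_7B_LAYERS, QV_SHAPES))
```

That is about 2.1M trainable parameters at r = 4 and 4.2M at r = 8, both well under 0.1% of the roughly 6.7B parameters in the base model, so doubling the rank costs very little memory.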