sleekmike / finetune_gpt-j_6b_8-bit
Fine-tuning GPT-J-6B on Colab or an equivalent PC GPU with your custom datasets: 8-bit weights with low-rank adaptors (LoRA)
License: MIT License
Hi @sleekmike,
Great work on the notebook. I just wanted to ask about the possibility of fine-tuning Pythia 12B or any smaller variant.
I have some specific use cases I'd like to explore with Pythia 12B.
Do you have any idea whether the code you published in this repo would also work with the Pythia variants? If not, could you help by creating a blog post or notebook for that as well?
Heya,
I almost got this working with DirectML on Windows, but bitsandbytes requires CUDA on Linux. I'm going to run out of VRAM if I don't do the monkey patching seen in the repo, so I was wondering if you had any ideas for removing that dependency.
Thanks!
When I run this example in JupyterLab and start fine-tuning the CodeParrot example, I get the following error message:
RuntimeError: Output 0 of DequantizeAndLinearBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
Can you please help me?
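(Not the repo author, but) this error usually means a custom `torch.autograd.Function` returned a view (or an input as-is) and that tensor was later modified in place. A minimal self-contained sketch of the fix the error message suggests, using a toy Function standing in for `DequantizeAndLinearBackward` (names and shapes here are illustrative, not from the repo):

```python
import torch
import torch.nn.functional as F

class ExampleLinearFn(torch.autograd.Function):
    """Toy custom Function standing in for DequantizeAndLinear."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        # If forward returns a view (or an input as-is), any later
        # in-place op on the output raises the reported RuntimeError.
        # .clone() detaches the result from any view relationship.
        return F.linear(x, weight).clone()

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        grad_x = grad_out @ weight          # dL/dx
        grad_w = grad_out.t() @ x           # dL/dweight
        return grad_x, grad_w

x = torch.randn(2, 4, requires_grad=True)
w = torch.randn(3, 4, requires_grad=True)
out = ExampleLinearFn.apply(x, w)
out += 1.0            # in-place op that the clone makes safe
out.sum().backward()  # gradients flow without the view+inplace error
```

In the repo's monkey-patched forward, the equivalent workaround is to replace an in-place `output += adapter(input)` with an out-of-place `output = output + adapter(input)`, or to clone the Function's output before mutating it.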
Really awesome work! I was wondering how you converted the slim weights (float16, as provided by EleutherAI) to 8-bit. Could you please give some direction if one wants to do this for other models?
Thanks
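(Not the author, but) the 8-bit weights in this repo come from block-wise quantization as implemented in the bitsandbytes library. A simplified NumPy sketch of the underlying idea, using plain linear absmax int8 per block; note the real library uses a non-linear dynamic-tree codebook rather than this linear scheme:

```python
import numpy as np

def quantize_absmax(w_fp16, block_size=256):
    """Block-wise absmax quantization: fp16 -> int8 plus a per-block scale."""
    flat = w_fp16.astype(np.float32).ravel()
    pad = (-len(flat)) % block_size           # pad so length divides evenly
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0                 # avoid division by zero
    q = np.round(blocks / absmax * 127).astype(np.int8)
    return q, absmax

def dequantize_absmax(q, absmax, shape):
    """Invert the quantization back to float32 with the stored scales."""
    deq = (q.astype(np.float32) / 127.0) * absmax
    return deq.ravel()[: int(np.prod(shape))].reshape(shape)

w = np.random.randn(8, 300).astype(np.float16)
q, scale = quantize_absmax(w)
w_hat = dequantize_absmax(q, scale, w.shape)
max_err = np.abs(w.astype(np.float32) - w_hat).max()
```

The per-block scale keeps the quantization error bounded by roughly `absmax / 254` per block, which is why block-wise schemes lose much less accuracy than quantizing a whole tensor with a single scale.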
Hello!
After some epochs I get
RuntimeError: probability tensor contains either inf, nan or element < 0
and the saved model stops working.
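(Not the author, but) this error during generation usually means the logits contain NaN/Inf, which most often traces back to training divergence; lowering the learning rate or adding gradient clipping during fine-tuning may prevent it. As a generation-time workaround only, the logits can be sanitized before sampling. A sketch, with the helper name being mine, not from the repo:

```python
import torch

def safe_sample(logits):
    """Sanitize logits before multinomial sampling to avoid the
    'probability tensor contains either inf, nan or element < 0' crash.
    This is a workaround sketch, not a fix for the diverged weights."""
    logits = torch.nan_to_num(logits, nan=0.0, posinf=1e4, neginf=-1e4)
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Logits with exactly the values that trigger the crash:
bad = torch.tensor([[0.5, float("nan"), float("inf"), -2.0]])
tok = safe_sample(bad)  # samples without raising
```

This does not recover a model whose weights went to NaN; if the checkpoint itself is broken, retraining from an earlier checkpoint with a smaller learning rate is the real fix.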
Hi,
I get this error when fine tuning the model:
RuntimeError: Output 0 of DequantizeAndLinearBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
Do I need to save a checkpoint to avoid this? I'm not sure how that would work.
Thanks,
Able to fine-tune the 6B on an RTX 3080 with this! Had to change the batch size to 32, but that's about it.