Comments (5)
Also, it seems like your code does not support torch.bfloat16?
from low-bit-optimizers.
Hi, thank you for your question. All of the experiments reported in our paper (image classification, machine translation, GPT-2 fine-tuning, and LLaMA fine-tuning) were conducted in multi-GPU settings. The error may therefore depend on several factors, such as the transformers version, the GPU type, and the pretrained model used.
Regarding torch.bfloat16: our 4-bit optimizers are compatible with torch.cuda.amp, where forward and backward computations are carried out in 16-bit while optimizer states are stored in 4-bit. In this case, the 32-bit weights still need to be stored, and the optimizer state update is performed in 32-bit. This also applies to LLaMA fine-tuning. In general, our 4-bit optimizers do not change the parameter dtype, so they do not affect the forward and backward computations. The optimizer state update may be performed in 32-bit, but this step is cheap. Finally, the optimizer states are stored in 4-bit.
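To make the "stored in 4-bit, updated in higher precision" cycle concrete, here is a minimal pure-Python sketch. It is not the lpmm implementation: the function names, the per-tensor max scaling, and the linear 4-bit code are all assumptions chosen for illustration, and Python floats stand in for the 32-bit values.

```python
# Illustrative only: per-tensor max scaling with a linear 4-bit code.
# NOT the lpmm implementation; every name here is hypothetical.

LEVELS = 15  # largest unsigned 4-bit code (codes run 0..15)


def quantize_4bit(values):
    """Map non-negative floats to 4-bit integer codes plus one float scale."""
    scale = max(values) or 1.0  # fall back to 1.0 for an all-zero state
    codes = [round(v / scale * LEVELS) for v in values]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate full-precision values from 4-bit codes."""
    return [c / LEVELS * scale for c in codes]


def update_second_moment(codes, scale, grads, beta2=0.999):
    """One Adam-style second-moment step: dequantize, update, requantize."""
    v = dequantize_4bit(codes, scale)  # temporary full-precision copy
    v = [beta2 * vi + (1.0 - beta2) * g * g for vi, g in zip(v, grads)]
    return quantize_4bit(v)  # only the 4-bit codes and the scale are kept


codes, scale = quantize_4bit([0.0, 0.25, 0.5, 1.0])
codes, scale = update_second_moment(codes, scale, grads=[0.1, -0.2, 0.3, 0.0])
```

The full-precision copy exists only inside the update step, which is why the parameter dtype and the forward and backward passes are untouched in this picture.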
I see. Thank you for your kind and detailed explanation!
I have one more quick question! Currently, the default setting for second-moment quantization (_C.QUANT.SQM) is 'group' for normalization and 'power-1' for mapping. In your paper, however, you used 'rank1' for normalization and 'linear' for mapping. Do I need to change this default setting?
You can pass the qconfig argument, which defines the quantization settings, to the 4-bit optimizers. To use 'rank1' normalization and 'linear' (equivalent to 'power1') mapping for the second moment, do the following:

    import lpmm

    optimizer = lpmm.optim.AdamW(
        parameters,  # the model parameters to optimize
        qconfig="path/to/lpmm/configs/default.yml",
    )
If the qconfig argument is set to None, the optimizer uses the settings defined in config.py, just as you mentioned. You can also use the other provided qconfig files to change the quantization settings.
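The fallback described above can be pictured with a small sketch. This is a hypothetical illustration, not lpmm's code: the helper name and the dictionary layout are assumptions, and the default values are simply the ones quoted in this thread.

```python
# Hypothetical illustration of the fallback behaviour; not lpmm's actual code.
# Defaults as quoted in this thread for second-moment quantization.
DEFAULT_SQM = {"normalization": "group", "mapping": "power-1"}


def resolve_sqm_settings(qconfig_overrides=None):
    """Return second-moment settings: built-in defaults unless overrides are given."""
    settings = dict(DEFAULT_SQM)  # copy so the defaults are never mutated
    if qconfig_overrides is not None:
        settings.update(qconfig_overrides)
    return settings


# No qconfig: the built-in defaults apply.
print(resolve_sqm_settings())
# A qconfig that selects the setting discussed above.
print(resolve_sqm_settings({"normalization": "rank1", "mapping": "linear"}))
```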
Gotcha! Thank you for the explanation. It was very helpful!