low-bit-optimizers Issues

QLoRA bfloat16: bug when using this optimizer

  File "sft_low_bit.py", line 869, in train
    train_result = trainer.train()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1971, in _inner_training_loop
    self.optimizer.step()
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/optimizer.py", line 145, in step
    self.optimizer.step(closure)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/optim/adamw.py", line 230, in step
    _single_tensor_adamw4bit(**kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/optim/adamw.py", line 426, in _single_tensor_adamw4bit
    qx, gen = vectorwise_quant(exp_avg, qmap=exp_avgs_qmap[i], shape=param.shape, **exp_avg_qmetadata)
  File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/functional.py", line 53, in vectorwise_quant
    qx = nonlinear_quant(qx, qmap, b, round_type=kwargs['round_type'])
  File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/functional.py", line 369, in nonlinear_quant
    idx = real_nonlinear_quant(qx, qmap, b, False)
  File "/root/miniconda3/lib/python3.8/site-packages/lpmm-0.0.0-py3.8-linux-x86_64.egg/lpmm/functional.py", line 363, in real_nonlinear_quant
    return ext_quantization.pack_nonlinear(grouped_qx, qmap, b, stochastic)
RuntimeError: The type of data is not kFloat32 or kFloat16!

qx: tensor([[nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            ...,
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
           dtype=torch.bfloat16)

Does this data not meet the requirement?
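
Two things appear to be going wrong here: the exp_avg state is bfloat16, which the quantization kernel rejects (it only accepts fp32/fp16), and it is already all-NaN, which points at an upstream overflow. One possible workaround, an assumption on my part rather than a documented lpmm path, is to hold the trainable (LoRA) parameters in fp32 so that the optimizer state is allocated in a supported dtype:

    import torch
    import lpmm

    # Workaround sketch (assumption, not a documented lpmm feature): upcast the
    # trainable parameters to fp32 so that exp_avg / exp_avg_sq are created as
    # fp32 tensors, which the 4-bit quantization kernel accepts. `model` stands
    # in for an already-built QLoRA model with bf16 LoRA adapters.
    for p in model.parameters():
        if p.requires_grad:
            p.data = p.data.float()

    optimizer = lpmm.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad],
        lr=1e-4,
    )

If the NaNs persist even with fp32 state, the overflow is happening in the forward/backward pass rather than inside the optimizer.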

Applying lpmm.optim.AdamW to the transformers Trainer for multi-GPU training -> error

Hi, thank you for the interesting idea and the very helpful implementation! I tried to apply lpmm.optim.AdamW to the transformers Trainer for multi-GPU training but got the error below.

lib/python3.10/site-packages/accelerate/utils/operations.py", line 167, in send_to_device
return tensor.to(device, non_blocking=non_blocking)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Does your current code not support multi-GPU training? Thanks!
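
For reference, a minimal reproduction sketch; `model` and `train_dataset` are placeholders for an already-prepared model and dataset, and the Trainer accepts a custom optimizer through its `optimizers` argument:

    import lpmm
    from transformers import Trainer, TrainingArguments

    # Reproduction sketch: hand the 4-bit AdamW to the HF Trainer.
    optimizer = lpmm.optim.AdamW(model.parameters(), lr=2e-5)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
        train_dataset=train_dataset,
        optimizers=(optimizer, None),  # None lets the Trainer build a scheduler
    )
    trainer.train()  # illegal memory access once more than one GPU is visible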

Doesn't work directly with the HF transformers Trainer

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1779, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2176, in _inner_training_loop
    self.optimizer.step()
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/optimizer.py", line 145, in step
    self.optimizer.step(closure)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/optim/adamw.py", line 230, in step
    _single_tensor_adamw4bit(**kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/optim/adamw.py", line 426, in _single_tensor_adamw4bit
    qx, gen = vectorwise_quant(exp_avg, qmap=exp_avgs_qmap[i], shape=param.shape, **exp_avg_qmetadata)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/functional.py", line 53, in vectorwise_quant
    qx = nonlinear_quant(qx, qmap, b, round_type=kwargs['round_type'])
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/functional.py", line 369, in nonlinear_quant
    idx = real_nonlinear_quant(qx, qmap, b, False)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/lpmm/functional.py", line 363, in real_nonlinear_quant
    return ext_quantization.pack_nonlinear(grouped_qx, qmap, b, stochastic)
RuntimeError: The type of data is not kFloat32 or kFloat16!
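
This is the same dtype check as in the first issue above: the state tensor handed to pack_nonlinear is neither fp32 nor fp16. A quick diagnostic sketch (assuming `optimizer` is the lpmm AdamW instance passed to the Trainer):

    import torch

    # Diagnostic sketch: the optimizer allocates its state in the dtype of the
    # parameters, so any parameter that is not float32/float16 (e.g. bfloat16
    # under the Trainer's bf16 mode) will later trip the kFloat32/kFloat16
    # check inside pack_nonlinear at step() time.
    for group in optimizer.param_groups:
        for p in group["params"]:
            if p.dtype not in (torch.float32, torch.float16):
                print("unsupported dtype:", tuple(p.shape), p.dtype)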

How to apply the optimizer to a BF16 model?

I removed the

TORCH_CHECK((name.dtype() == c10::BFloat16 || name.dtype() == torch::kFloat16), \ "The type of " #name " is not kFloat32 or kFloat16!");\

and got

RuntimeError: "pack_nonlinear_4bit" not implemented for 'BFloat16'

How can I apply the optimizer to a bf16 model?
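
Removing the check only moves the failure into the CUDA kernel dispatch, which by the look of the error message is only instantiated for Float and Half. A workaround sketch, my assumption rather than a supported lpmm path (the wrapper name is hypothetical): upcast bf16 tensors to fp32 before they reach the quantizer.

    import torch
    import lpmm.functional

    # Hypothetical wrapper (not an lpmm API): upcast bf16 input to fp32 before
    # quantization, since the kernel is only compiled for Float/Half. The
    # packed 4-bit codes themselves carry no dtype.
    _orig_quant = lpmm.functional.vectorwise_quant

    def vectorwise_quant_bf16_safe(x, **kwargs):
        if x.dtype == torch.bfloat16:
            x = x.float()
        return _orig_quant(x, **kwargs)

    lpmm.functional.vectorwise_quant = vectorwise_quant_bf16_safe

Note that if lpmm/optim/adamw.py imports vectorwise_quant by name, the patch has to be applied to that module (lpmm.optim.adamw.vectorwise_quant) instead, and the dequantization path would need the mirror-image cast back to bf16.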
