I'm getting several warnings and, finally, an error.
The setup: 2× Tesla T4, NVIDIA-SMI 530.30.02, Driver Version 530.30.02, CUDA Version 12.1.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
Setting ds_accelerator to cuda (auto detect)
Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
PyTorch: setting up devices
loading file tokenizer.model
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
Vocab size: 32000
PAD: 0 <unk>
BOS: 1 <s>
EOS: 2 </s>
UNK: 0 <unk>
SEP: 1 <s>
{'instruction': 'Напишите функцию на Python, которая принимает на вход два числа и выводит их сумму.', 'input': 'Первое число: 5, Второе число: 10', 'output': '```python\ndef sum_numbers(a, b):\n    return a + b\n\nprint(sum_numbers(5, 10))\n```\n\nРезультат выполнения:\n\n```\n15\n```'}
0%| | 0/28330 [00:00<?, ?it/s]
Source and target examples
Задание: Напишите функцию на Python, которая принимает на вход два числа и выводит их сумму.
Вход: Первое число: 5, Второе число: 10
Ответ:
def sum_numbers(a, b):
return a + b
print(sum_numbers(5, 10))
Результат выполнения:
15
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28330/28330 [00:29<00:00, 953.09it/s]
0%| | 0/1492 [00:00<?, ?it/s]
Source and target examples
Задание: Напишите одно предложение, начинающееся на "Когда я был маленьким", описывающее событие из вашего детства.
Ответ:
Когда я был маленьким, я всегда радовался, когда мама готовила пироги с клубникой, которые запахом наполняли всю квартиру.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1492/1492 [00:01<00:00, 976.01it/s]
INPUT_IDS
tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
835, 3982, 840, 1755, 29901, 2372, 1668, 1911, 730, 20153,
13603, 665, 5132, 29892, 21105, 1695, 7949, 1257, 665, 490,
11011, 11883, 5787, 12329, 606, 2771, 17272, 29932, 11244, 3404,
29959, 1805, 29889, 13, 2277, 29937, 939, 11011, 29901, 8880,
18072, 5787, 15391, 29901, 29871, 29945, 29892, 939, 702, 18805,
5787, 15391, 29901, 29871, 29896, 29900, 13, 2277, 29937, 14809,
7616, 29901, 29871, 7521, 4691, 13, 1753, 2533, 29918, 20326,
29898, 29874, 29892, 289, 1125, 13, 1678, 736, 263, 718,
289, 13, 13, 2158, 29898, 2083, 29918, 20326, 29898, 29945,
29892, 29871, 29896, 29900, 876, 13, 28956, 13, 13, 30027,
29919, 3923, 11332, 29932, 27056, 15530, 29901, 13, 13, 28956,
13, 29896, 29945, 13, 28956, 2])
MASK
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
LABELS
tensor([ -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, 1,
835, 3982, 840, 1755, 29901, 2372, 1668, 1911, 730, 20153,
13603, 665, 5132, 29892, 21105, 1695, 7949, 1257, 665, 490,
11011, 11883, 5787, 12329, 606, 2771, 17272, 29932, 11244, 3404,
29959, 1805, 29889, 13, 2277, 29937, 939, 11011, 29901, 8880,
18072, 5787, 15391, 29901, 29871, 29945, 29892, 939, 702, 18805,
5787, 15391, 29901, 29871, 29896, 29900, 13, 2277, 29937, 14809,
7616, 29901, 29871, 7521, 4691, 13, 1753, 2533, 29918, 20326,
29898, 29874, 29892, 289, 1125, 13, 1678, 736, 263, 718,
289, 13, 13, 2158, 29898, 2083, 29918, 20326, 29898, 29945,
29892, 29871, 29896, 29900, 876, 13, 28956, 13, 13, 30027,
29919, 3923, 11332, 29932, 27056, 15530, 29901, 13, 13, 28956,
13, 29896, 29945, 13, 28956, 2])
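The three dumps above show what the collator does: input_ids are left-padded with the pad id 0 (which doubles as `<unk>`), the attention MASK is 0 over the padding and 1 over real tokens, and LABELS mirror input_ids with the padding replaced by -100 so it is ignored by the loss. A minimal sketch of that behavior (function name and shapes are hypothetical, not the actual collator in train.py):

```python
PAD_ID = 0           # <unk> doubles as PAD in this tokenizer
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def left_pad_example(input_ids, max_len):
    """Left-pad a token-id list and build the matching mask and labels,
    mirroring the INPUT_IDS / MASK / LABELS dump above."""
    pad = max_len - len(input_ids)
    padded = [PAD_ID] * pad + input_ids
    mask = [0] * pad + [1] * len(input_ids)
    # labels copy input_ids, but padding must not contribute to the loss
    labels = [IGNORE_INDEX] * pad + input_ids
    return padded, mask, labels

ids, mask, labels = left_pad_example([1, 835, 3982, 2], 8)
```

With this layout the loss is computed over the whole prompt and response, exactly as in the dump, where real labels start at the BOS token (id 1).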
load_in_8bit True
loading configuration file llama_model_13b/config.json
Model config LlamaConfig {
"_name_or_path": "llama_model_13b",
"architectures": [
"LLaMAForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 2048,
"max_sequence_length": 2048,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"pad_token_id": 0,
"rms_norm_eps": 1e-06,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.30.0.dev0",
"use_cache": true,
"vocab_size": 32000
}
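As a sanity check, the config above is consistent with a 13B checkpoint. Assuming the standard LLaMA layout (q/k/v/o and gate/up/down projections without biases, two RMSNorms per layer, untied lm_head), the parameter count works out as:

```python
# Rough parameter count implied by the config above (standard LLaMA layout assumed).
hidden, inter, layers, vocab = 5120, 13824, 40, 32000

attn = 4 * hidden * hidden       # q_proj, k_proj, v_proj, o_proj
mlp = 3 * hidden * inter         # gate_proj, up_proj, down_proj
norms = 2 * hidden               # input_layernorm + post_attention_layernorm
per_layer = attn + mlp + norms

# final norm + embed_tokens + lm_head (tie_word_embeddings is false)
total = layers * per_layer + hidden + 2 * vocab * hidden
print(f"{total:,}")  # ~13.0B parameters
```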
loading weights file llama_model_13b/pytorch_model.bin.index.json
Instantiating LlamaForCausalLM model under default dtype torch.float16.
Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.30.0.dev0"
}
Detected 8-bit loading: activating 8-bit loading for this model
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [00:22<00:00, 1.82it/s]
All model checkpoint weights were used when initializing LlamaForCausalLM.
All the weights of LlamaForCausalLM were initialized from the model checkpoint at llama_model_13b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
loading configuration file llama_model_13b/generation_config.json
Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.30.0.dev0"
}
/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/peft/utils/other.py:76: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
warnings.warn(
Trainer class: <class '__main__.TrainerNoBaseSave'>
You have loaded a model on multiple GPUs. is_model_parallel attribute will be force-set to True to avoid any unexpected behavior such as device placement mismatching.
The model is loaded in 8-bit precision. To train this model you need to add additional modules inside the model such as adapters using peft library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: WARNING Path datafile/python_projects/alpaca-lora-13b/wandb_dir/wandb/ wasn't writable, using system temp directory.
wandb: Tracking run with wandb version 0.15.4
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
*** Running training ***
Num examples = 28,330
Num Epochs = 3
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 128
Gradient Accumulation steps = 32
Total optimization steps = 663
Number of trainable parameters = 13,107,200
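The 13,107,200 trainable parameters are consistent with LoRA adapters of rank 16 on two attention projections (q_proj and v_proj) in all 40 layers; the rank and target modules are assumptions inferred from the count, not read from the config:

```python
hidden, layers = 5120, 40
r = 16       # assumed LoRA rank
targets = 2  # assumed target modules per layer: q_proj and v_proj

# each LoRA adapter is a pair of matrices: A (hidden x r) and B (r x hidden)
per_module = hidden * r + r * hidden
trainable = layers * targets * per_module
print(f"{trainable:,}")
```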
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
0%| | 0/663 [00:00<?, ?it/s]
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/bitsandbytes/autograd/_functions.py:318: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
cuBLAS API failed with status 15
A: torch.Size([1376, 5120]), B: torch.Size([5120, 5120]), C: (1376, 5120); (lda, ldb, ldc): (c_int(44032), c_int(163840), c_int(44032)); (m, n, k): (c_int(1376), c_int(5120), c_int(5120))
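For reference, cuBLAS status 15 is CUBLAS_STATUS_NOT_SUPPORTED. The leading dimensions in the dump match bitsandbytes' COL32 tiled layout (ld = rows × 32), so the shapes themselves look internally consistent; this quick check only verifies the printed numbers, not the root cause (which on a T4 points at the int8 igemmlt path rather than at the shapes):

```python
def col32_ld(rows):
    # In the COL32 layout each tile packs 32 columns, so the leading
    # dimension advances by 32 elements per row of the operand.
    return rows * 32

m, n, k = 1376, 5120, 5120
# (lda, ldb, ldc) printed by bitsandbytes: (44032, 163840, 44032)
print(col32_ld(m), col32_ld(k), col32_ld(m))
```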
wandb: Waiting for W&B process to finish... (failed 1).
error detected
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /tmp/wandb/offline-run-20230622_125830-qgbnsfzk
wandb: Find logs at: /tmp/wandb/offline-run-20230622_125830-qgbnsfzk/logs
Traceback (most recent call last):
File "/usr/lib64/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib64/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/datafile/python_projects/alpaca-lora-13b/train.py", line 274, in <module>
train(**vars(args))
File "/datafile/python_projects/alpaca-lora-13b/train.py", line 256, in train
trainer.train(checkpoint)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/trainer.py", line 1661, in train
return inner_training_loop(
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/trainer.py", line 1946, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/trainer.py", line 2753, in training_step
loss = self.compute_loss(model, inputs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/trainer.py", line 2778, in compute_loss
outputs = model(**inputs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/accelerate/utils/operations.py", line 521, in forward
return model_forward(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/accelerate/utils/operations.py", line 509, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/peft/peft_model.py", line 705, in forward
return self.base_model(
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
outputs = self.model(
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 570, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 566, in custom_forward
return module(*inputs, output_attentions, None)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/peft/tuners/lora.py", line 751, in forward
result = super().forward(x)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/bitsandbytes/nn/modules.py", line 388, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 559, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/datafile/python_projects/alpaca-lora-main/.venv/lib64/python3.8/site-packages/bitsandbytes/functional.py", line 1781, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!