Running python generate.py --model_type chatglm --size 7 starts up fine; the lora_weights path for chatglm is hard-coded in the script.
After entering an instruction, evaluation fails with:
Traceback (most recent call last):
File "/root/llm/Alpaca-CoT-main/generate.py", line 258, in
response = evaluate(instruction)
File "/root/llm/Alpaca-CoT-main/generate.py", line 212, in evaluate
output = tokenizer.decode(s)
File "/root/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/fdb7a601d8f8279806124542e11549bdd76f62f6/tokenization_chatglm.py", line 276, in decode
if self.pad_token_id in token_ids: # remove pad
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
Printing s right before the failing output = tokenizer.decode(s) call shows it does hold a value:
Response:
The dtype of attention mask (torch.int64) is not bool
tensor([ 32313, 20107, 20125, 26054, 20109, 23384, 20104, 21833, 20007,
31121, 20104, 20532, 20109, 32475, 49321, 20100, 21029, 20007,
20004, 145875, 57010, 20012, 20004, 20150, 88230, 29668, 90663,
83831, 85119, 99903, 20004, 145875, 31034, 20012, 150001, 150004,
20483, 22739, 20142, 20372, 88230, 29668, 90663, 20103, 20142,
21224, 20006, 20120, 20134, 20236, 20103, 21008, 20208, 22095,
20012, 20004, 20004, 20009, 20007, 150009, 22999, 20142, 20372,
88230, 29668, 20102, 90085, 84121, 90663, 83823, 20004, 20010,
20007, 150009, 86246, 20058, 85119, 84052, 20062, 90959, 84140,
20006, 83984, 20058, 99903, 85119, 145907, 20004, 20013, 20007,
150009, 86977, 84121, 85119, 84086, 20006, 84111, 85964, 83824,
83995, 84015, 83824, 86299, 84015, 83835, 83823, 20004, 20016,
20007, 150009, 86246, 20058, 99903, 20062, 90997, 20006, 85749,
137200, 119854, 83966, 88230, 83823, 20004, 20004, 24400, 20120,
20127, 99903, 84192, 20006, 20142, 20372, 88230, 29668, 90663,
20134, 20113, 21554, 20103, 20142, 21224, 20102, 20120, 20134,
20113, 20477, 20103, 21506, 20142, 21224, 20207, 20142, 20372,
88230, 29668, 20007, 150005], device='cuda:0')
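A likely cause, given the traceback: s is a torch.Tensor (note device='cuda:0' above), while ChatGLM's custom tokenization_chatglm.py runs self.pad_token_id in token_ids, and that membership test can raise "Boolean value of Tensor with more than one value is ambiguous" when token_ids is a Tensor. A minimal sketch of a workaround (the variable names below mirror generate.py, but the patch itself is an assumption, not the repo's official fix) is to convert the tensor to a plain Python list before decoding, i.e. call tokenizer.decode(s.tolist()):

```python
import torch

# Hypothetical stand-in for the ids returned by model.generate();
# in generate.py, `s` is a CUDA tensor like the one printed above.
s = torch.tensor([32313, 20107, 20125, 150005])

pad_token_id = 3  # placeholder value; the real id comes from the tokenizer

# Workaround: hand the tokenizer a plain Python list, so the
# `pad_token_id in token_ids` check inside tokenization_chatglm.py
# runs on a list instead of a Tensor.
token_ids = s.tolist()

# On a list this is an ordinary bool, never an ambiguous Tensor:
print(pad_token_id in token_ids)
```

With this change the decode call becomes tokenizer.decode(s.tolist()); an equivalent alternative is to move the tensor off the GPU and convert it yourself before passing it in.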