
Comments (13)

ZenXir commented on May 18, 2024

Here's the situation: I'm using the merged model as the base model for fine-tuning, and it fails with the error below.
It seems to come down to MAX_STEPS being set to None.

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:22<00:00, 11.04s/it]
Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 473.18it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 42.30it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 45.51it/s]
trainable params: 4194304 || all params: 6889689088 || trainable%: 0.060877986603275876
Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 228, in <module>
    trainer = transformers.Trainer(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 543, in __init__
    if args.max_steps > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
(Chinese-alpaca-lora) root@DESKTOP-6KDJTBC:/mnt/e/Chinese-Vicuna#

Facico commented on May 18, 2024

@ZenXir max_steps gets changed further down in the code. I fixed this in my local branch yesterday but forgot to push it; please pull the latest version.
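
For reference, a minimal sketch of how the None value can be sidestepped when building the trainer (an assumption about the shape of the fix, not necessarily the exact change that was pushed). transformers.TrainingArguments defaults to max_steps=-1, which means "no fixed step limit, train for num_train_epochs", so falling back to -1 avoids the None > 0 comparison:

import transformers

MAX_STEPS = None  # value from the script's config block

training_args = transformers.TrainingArguments(
    output_dir="./lora-out",          # hypothetical output path
    per_device_train_batch_size=4,
    num_train_epochs=3,
    # -1 is the transformers default: derive the step count from the epochs
    max_steps=MAX_STEPS if MAX_STEPS is not None else -1,
)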

ZenXir commented on May 18, 2024

Got it, thank you.

ZenXir commented on May 18, 2024

I'm fine-tuning the merged model with finetune.py, and after several attempts it keeps failing with the same error.

The model merge process has two steps:
1. Following https://github.com/ymcui/Chinese-LLaMA-Alpaca, merge their embedding-extended model into a .pth checkpoint.
2. Convert the .pth checkpoint from step 1 into Hugging Face format with transformers:
python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /mnt/e/Chinese-LLaMA-Alpaca/model --model_size 7B --output_dir /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

The finetune command is:
python finetune.py --data_path sample/merge.json --output_path lora-Vicuna_Embedded/7B/ --model_path /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

The error output is:

CUDA SETUP: Loading binary /root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:21<00:00, 10.80s/it]
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.06it/s]
trainable params: 4194304 || all params: 6889689088 || trainable%: 0.060877986603275876

 If there's a warning about missing keys above, please disregard :)
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                                                                                             | 0/16260 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 271, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1903, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2681, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/peft-0.3.0.dev0-py3.9.egg/peft/peft_model.py", line 529, in forward
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/accelerate-0.17.1-py3.9.egg/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/models/llama/modeling_llama.py", line 786, in forward
    loss = loss_fct(shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1))
RuntimeError: shape '[-1, 32000]' is invalid for input of size 50953080

Facico commented on May 18, 2024

@ZenXir I haven't run their models yet, so you'll have to dig into it yourself, but your situation means the conversion didn't fully succeed.

About RuntimeError: shape '[-1, 32000]' is invalid for input of size 50953080: LLaMA's vocabulary is about 32000 tokens, whereas that repository's vocabulary is around 49954 (I don't know whether it has been updated since). Note that 50953080 = 49954 × 1020, so the logits are already sized for the expanded vocabulary while config.vocab_size is still 32000. If my guess is right, you need to add model.resize_token_embeddings(len(tokenizer)) to update the model's internal embedding dimensions; give that a try.

ZenXir commented on May 18, 2024

Calling resize_token_embeddings like this before "prepare for training" lets the training run.
I'll let the machine run for two days and see how the trained model turns out.

vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocabulary size:", vocab_size)
model.resize_token_embeddings(vocab_size)

ZenXir commented on May 18, 2024

@Facico One more thing: to fine-tune with the merged-embedding model, the command I'm using is:
python finetune.py --data_path sample/merge.json --output_path lora-Vicuna_Embedded/7B/ --model_path /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

All other parameters are left at their defaults, and my machine has a single RTX 4090 (24 GB).
Which parameters would you suggest adjusting for training quality or speed,
e.g. batch_size, test_size, epochs?
Especially for quality, so the results can be compared more directly later.

Facico commented on May 18, 2024

Sorry, there are so many messages that some get overlooked. For a direct comparison, just keep the batch size and epochs the same; if you want it to run faster, you can increase the micro batch size.
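
A quick sketch of why that is safe (constants taken from finetune.py's config block, quoted in the next comment): the optimizer still sees an effective batch of BATCH_SIZE examples, so raising MICRO_BATCH_SIZE only reduces the number of gradient-accumulation steps and trades VRAM for speed.

BATCH_SIZE = 128       # keep this (and EPOCHS) fixed for a fair comparison
MICRO_BATCH_SIZE = 8   # hypothetically raised from the default 4
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
print(GRADIENT_ACCUMULATION_STEPS)  # 16 instead of 32: fewer, larger forward passes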

molyswu commented on May 18, 2024

Dual GPUs, RTX 3090:

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]

molyswu commented on May 18, 2024

/root/anaconda3/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/32481 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
./Chinese-Vicuna/finetune.py:271 in │
│ │
│ 268 │
│ 269 print("\n If there's a warning about missing keys above, please disregard :)") │
│ 270 │
│ ❱ 271 trainer.train(resume_from_checkpoint=args.resume_from_checkpoint) │
│ 272 │
│ 273 model.save_pretrained(OUTPUT_DIR) │
│ 274 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:1929 in _inner_training_loop │
│ │
│ 1926 │ │ │ │ │ with model.no_sync(): │
│ 1927 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1928 │ │ │ │ else: │
│ ❱ 1929 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1930 │ │ │ │ │
│ 1931 │ │ │ │ if ( │
│ 1932 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2699 in training_step │
│ │
│ 2696 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2697 │ │ │
│ 2698 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2699 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2700 │ │ │
│ 2701 │ │ if self.args.n_gpu > 1: │
│ 2702 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py:2731 in compute_loss │
│ │
│ 2728 │ │ │ labels = inputs.pop("labels") │
│ 2729 │ │ else: │
│ 2730 │ │ │ labels = None │
│ ❱ 2731 │ │ outputs = model(**inputs) │
│ 2732 │ │ # Save past state if it exists │
│ 2733 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2734 │ │ if self.args.past_index >= 0: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ in forward:663 │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1102 in _call_impl │
│ │
│ 1099 │ │ # this function, and just call forward. │
│ 1100 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1101 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1102 │ │ │ return forward_call(*input, **kwargs) │
│ 1103 │ │ # Do not call functions when jit is used │
│ 1104 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1105 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /root/anaconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:709 in │
│ forward │
│ │
│ 706 │ │ │ shift_labels = labels[..., 1:].contiguous() │
│ 707 │ │ │ # Flatten the tokens │
│ 708 │ │ │ loss_fct = CrossEntropyLoss() │
│ ❱ 709 │ │ │ shift_logits = shift_logits.view(-1, self.config.vocab_size) │
│ 710 │ │ │ shift_labels = shift_labels.view(-1) │
│ 711 │ │ │ # Enable model parallelism │
│ 712 │ │ │ shift_labels = shift_labels.to(shift_logits.device) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000
0%| | 0/32481 [00:04<?, ?it/s]

godzeo commented on May 18, 2024

Calling resize_token_embeddings like this before "prepare for training" lets the training run. I'll let the machine run for two days and see how the trained model turns out.

vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocabulary size:", vocab_size)
model.resize_token_embeddings(vocab_size)

Which file, and at which step, should these three lines go into? I'd like to run the same training, but I'm too much of a beginner to figure it out.

Facico commented on May 18, 2024

@godzeo Just put it right after the model and tokenizer are loaded.
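
A minimal placement sketch, assuming the usual structure of Chinese-Vicuna's finetune.py (the model path and 8-bit loading flags here are illustrative; adjust them to your setup):

from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import prepare_model_for_int8_training

MODEL_PATH = "/mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf"  # merged HF checkpoint

model = LlamaForCausalLM.from_pretrained(
    MODEL_PATH, load_in_8bit=True, device_map="auto"
)
tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)

# Resize here, before any int8/LoRA preparation, so the embedding and lm_head
# dimensions match the merged tokenizer's larger vocabulary.
vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocabulary size:", vocab_size)
model.resize_token_embeddings(vocab_size)

model = prepare_model_for_int8_training(model)
# ...continue with LoraConfig / get_peft_model / Trainer as in finetune.py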

abbhay commented on May 18, 2024

Got it, thank you.

Bro, how should this max_steps be set?
