
llama3-finetuning's Introduction

Llama3-Finetuning

Full-parameter fine-tuning, LoRA fine-tuning, and QLoRA fine-tuning of Llama3. Fine-tuning of Qwen1.5 models is also supported. To switch to another model, the part that mainly needs to change is the data preprocessing.

Install dependencies

pip install -r requirements.txt

Data preparation

The data is placed under the data directory, in the following format:

[
  {
    "conversations": [
      {
        "from": "user",
        "value": "你是那个名字叫ChatGPT的模型吗?"
      },
      {
        "from": "assistant",
        "value": "我的名字是西西嘛呦,并且是通过家里蹲公司的大数据平台进行训练的。"
      }
    ]
  }
  ...
]

Multi-turn conversations are prepared in the same format, just with more turns.
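For illustration, a multi-turn sample simply repeats user/assistant turns inside the same conversations list (the values below are placeholders, not data shipped with the repo):

[
  {
    "conversations": [
      { "from": "user", "value": "first user turn" },
      { "from": "assistant", "value": "first assistant reply" },
      { "from": "user", "value": "second user turn" },
      { "from": "assistant", "value": "second assistant reply" }
    ]
  }
]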

Model preparation

Go into the model_hub directory and run python download_modelscope.py to download the Llama3-8B-Instruct model.
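download_modelscope.py is not reproduced here; a minimal sketch of what such a script typically looks like with the modelscope package (the cache_dir value is an assumption chosen to match the paths used by the training scripts):

# Sketch of a ModelScope download script (assumed, not necessarily the repo's exact code).
from modelscope import snapshot_download

# Run from model_hub/: downloads the model into the current directory,
# yielding model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/.
model_dir = snapshot_download(
    "LLM-Research/Meta-Llama-3-8B-Instruct",
    cache_dir="./",
)
print("model downloaded to:", model_dir)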

Fine-tuning

Go into the script directory.

Full-parameter fine-tuning

Limited by hardware, full-parameter fine-tuning was not run here; try it if you have the resources.

LoRA fine-tuning

The number of GPUs given by nproc_per_node must match the number of devices listed in CUDA_VISIBLE_DEVICES.

NCCL_P2P_DISABLE=1 \
NCCL_IB_DISABLE=1 \
CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7 \
torchrun \
--nproc_per_node 7 \
--nnodes 1 \
--node_rank 0 \
--master_addr localhost \
--master_port 6601 \
../finetune_llama3.py \
--model_name_or_path "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/" \
--data_path "../data/Belle_sampled_qwen.json" \
--bf16 True \
--output_dir "../output/llama3_8B_lora" \
--num_train_epochs 100 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 5 \
--save_total_limit 1 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 4096 \
--gradient_checkpointing True \
--lazy_preprocess True \
--deepspeed "../config/ds_config_zero3_72B.json" \
--use_lora
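Inside finetune_llama3.py, --use_lora corresponds to wrapping the base model with a PEFT LoRA adapter before training. A minimal sketch of that pattern with the peft library follows; the rank, alpha, and target_modules values are illustrative assumptions, not necessarily the repo's settings:

# Sketch of the LoRA wrapping that --use_lora typically enables (values are illustrative).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/",
    torch_dtype="auto",
)

lora_config = LoraConfig(
    r=64,                   # adapter rank (assumed)
    lora_alpha=16,          # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projection layers
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter matrices are trainable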

QLoRA fine-tuning

The number of GPUs given by nproc_per_node must match the devices listed in CUDA_VISIBLE_DEVICES. With QLoRA, training can be completed on a single RTX 4090.

NCCL_P2P_DISABLE=1 \
NCCL_IB_DISABLE=1 \
CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7 \
torchrun \
--nproc_per_node 7 \
--nnodes 1 \
--node_rank 0 \
--master_addr localhost \
--master_port 6601 \
../finetune_llama3.py \
--model_name_or_path "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/" \
--data_path "../data/Belle_sampled_qwen.json" \
--bf16 True \
--output_dir "../output/llama3_8B_qlora" \
--num_train_epochs 100 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 5 \
--save_total_limit 1 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 4096 \
--gradient_checkpointing True \
--lazy_preprocess True \
--deepspeed "../config/ds_config_zero2.json" \
--use_lora \
--load_in_4bit \
--q_lora
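The --load_in_4bit and --q_lora switches correspond to loading the base model with 4-bit quantization and attaching the LoRA adapter on top, which is what keeps the memory footprint within a single 4090. A minimal sketch of that pattern; the quantization settings are assumptions:

# Sketch of 4-bit QLoRA loading (illustrative settings, not necessarily the repo's).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumed quantization type
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/",
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)  # re-enable gradients for the adapter path

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,     # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)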

Inference

  • llama3_single_predict.py: inference with the base model before any fine-tuning.
  • llama3_lora_predict.py: merges the LoRA fine-tuned weights into the base model and runs inference (a sketch of this merge-then-infer pattern follows this list).
  • llama3_qlora_4bit_predict.py: merges the QLoRA fine-tuned weights into the base model and runs inference.
  • llama3_peft_qlora_predict.py: loads the QLoRA adapter with peft and runs inference.
  • llama3_peft_lora_predict.py: loads the LoRA adapter with peft and runs inference.
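For reference, a minimal sketch of the merge-then-infer pattern used by the LoRA prediction scripts, assuming an adapter checkpoint saved under ../output/llama3_8B_lora (paths, prompt, and generation settings are illustrative assumptions):

# Sketch of merging a LoRA adapter into the base model and generating (illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/"
adapter_path = "../output/llama3_8B_lora"   # assumed adapter checkpoint location

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()            # fold the LoRA weights back into the base model

messages = [{"role": "user", "content": "你是谁?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))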

Inference after training:

Question: In an era when the internet is everywhere, social media has become an indispensable part of modern life. From WeChat and Weibo to Facebook, Twitter, and Instagram, social media not only meets people's need to communicate and interact, but also makes it easier to share moments of their lives, get news, find like-minded people, and promote their own business. At the same time, however, social media has brought a series of problems, such as misuse of personal information and privacy leaks. Therefore, how to use social media correctly has become a topic of wide concern. Summarize the key words of the text above.

Result before fine-tuning: Arrrr, shiver me timbers! Ye be wantin' me to summarize the key words from that there text, eh? Alright then, matey! Here be the main points:

1. Social media (the scourge o' the seven seas... er, I mean, the modern world)
2. Communication (the lifeblood o' the pirate's existence... er, I mean, the need for people to interact)
3. Sharing (the treasure o' the digital age... er, I mean, the ability to share one's life moments)
4. Information (the booty o' the internet... er, I mean, the news and knowledge shared online)
5. Privacy (the anchor o' security... er, I mean, the need to protect one's personal info)
6. Correct usage (the map to avoidin' the rocky shores o' social media misuse... er, I mean, the importance of using social media responsibly)

So hoist the colors, me hearties, and remember to use social media like a proper pirate: with caution, respect, and a keen eye for treasure!

Result after fine-tuning: 社交媒体、交流、分享、隐私、滥用、信息 (social media, communication, sharing, privacy, misuse, information)

References

The code is mainly based on:

The model can be downloaded from ModelScope:


llama3-finetuning's Issues

RuntimeError: Error building extension 'fused_adam'

I keep getting the fused_adam error. My machine has two 2080 Ti GPUs, and my package versions match requirements.txt.

FAILED: multi_tensor_adam.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include/TH -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include/THC -isystem /home/patrickstar/miniconda3/envs/pytorch/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -std=c++17 -c /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
gcc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
compilation terminated.
nvcc fatal : Failed to preprocess host compiler properties.
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include/TH -isystem /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/include/THC -isystem /home/patrickstar/miniconda3/envs/pytorch/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
subprocess.run(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/patrickstar/llm_learn/Llama3-Finetuning/script/../finetune_llama3.py", line 448, in
train()
File "/home/patrickstar/llm_learn/Llama3-Finetuning/script/../finetune_llama3.py", line 441, in train
trainer.train()
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
return inner_training_loop(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py", line 1936, in _inner_training_loop
model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/accelerate/accelerator.py", line 1255, in prepare
result = self._prepare_deepspeed(*args)
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/accelerate/accelerator.py", line 1640, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/init.py", line 181, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 307, in init
self._configure_optimizer(optimizer, model_parameters)
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1232, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1309, in _configure_basic_optimizer
optimizer = FusedAdam(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in init
fused_adam_cuda = FusedAdamBuilder().load()
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in load
return self.jit_load(verbose)
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 524, in jit_load
op_module = load(name=self.name,
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1308, in load
return _jit_compile(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/patrickstar/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'

assert len(input_id) == len(target) AssertionError

Hi, I ran into this problem while reproducing the QLoRA fine-tuning. The error message is:
Traceback (most recent call last):
File "../finetune_llama3.py", line 452, in
train()
File "../finetune_llama3.py", line 445, in train
trainer.train()
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
return inner_training_loop(
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/transformers/trainer.py", line 1928, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/accelerate/data_loader.py", line 452, in iter
current_batch = next(dataloader_iter)
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633, in next
data = self._next_data()
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "../finetune_llama3.py", line 255, in getitem
ret = preprocess([self.raw_data[i]["conversations"]], self.tokenizer, self.max_len)
File "../finetune_llama3.py", line 192, in preprocess
assert len(input_id) == len(target)
AssertionError
In practice, input_id is always one element longer than target.
I'm running on six 1080 Ti GPUs; the shell script is as follows:
NCCL_P2P_DISABLE=1 \
NCCL_IB_DISABLE=1 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
torchrun \
--nproc_per_node 6 \
--nnodes 1 \
--node_rank 0 \
--master_addr localhost \
--master_port 6601 \
../finetune_llama3.py \
--model_name_or_path "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/" \
--data_path "../data/Belle_sampled_qwen.json" \
--fp16 True \
--output_dir "../output/llama3_8B_qlora" \
--num_train_epochs 100 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 5 \
--save_total_limit 1 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 4096 \
--gradient_checkpointing True \
--lazy_preprocess True \
--deepspeed "../config/ds_config_zero2.json" \
--use_lora \
--load_in_4bit \
--q_lora
I only changed CUDA_VISIBLE_DEVICES and nproc_per_node, and switched bf16 to fp16.
