hit-scir / chinese-mixtral-8x7b
中文Mixtral-8x7B (Chinese-Mixtral-8x7B)
License: Apache License 2.0
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "<xxx_path>/PretrainedModels/Mixtral-8x7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)
model.enable_input_require_grads()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)

lora_rank = 64
lora_alpha = 128
lora_dropout = 0.05
# LoraConfig expects lists here, not space-separated strings
lora_modules_to_save = "embed_tokens lm_head".split()
lora_target_modules = "q_proj v_proj k_proj o_proj w1 w2 w3".split()

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=lora_target_modules,
    modules_to_save=lora_modules_to_save,
    inference_mode=False,
    r=lora_rank,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)
model = get_peft_model(model, peft_config)
The last line, model = get_peft_model(model, peft_config), raises:
RuntimeError: only Tensors of floating point dtype can require gradients
I hit this error when running your script, so I isolated the code above on its own and it still reproduces. Have you run into this?
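For what it's worth, the exception itself can be reproduced without any model: PyTorch refuses to set requires_grad on non-floating-point tensors, which is exactly the state 4-bit quantized weights are in. A minimal sketch in plain PyTorch (no transformers involved):

```python
import torch

# 4-bit quantized weights live in integer storage dtypes; asking such a tensor
# to require gradients reproduces the RuntimeError from the issue above.
message = ""
try:
    torch.zeros(2, dtype=torch.uint8).requires_grad_(True)
except RuntimeError as err:
    message = str(err)

# Exact wording varies across torch versions, but it names floating point dtypes.
print(message)
```

This suggests the failure occurs when something (plausibly modules_to_save touching the quantized embed_tokens/lm_head, or enable_input_require_grads) tries to make a quantized parameter trainable. peft's prepare_model_for_kbit_training is the commonly recommended preparation step for 4-bit models; whether it resolves this exact setup is an assumption on my part.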
For individual users right now, buying an Apple machine to run large models locally seems like a fairly sensible option.
Hello, how much GPU memory does instruction fine-tuning require, and can you recommend any instruction-tuning datasets?
Can an RTX 4090 (24 GB) load the model for local inference? How much memory does fine-tuning need, and how much does QLoRA training need? Full-parameter training has very high memory requirements, correct?
When I evaluated the original (non-instruct) Mixtral 8x7B myself, I only got 62 on MMLU, which is quite far from the paper's 70 and your reported 69. Could there be some difference in the inference setup?
Hello,
I see that you "trained the Chinese BPE vocabulary on 12 GB of Zhihu data and 2 GB of WuDao data".
I have two questions:
Thanks for this repo; I've learned a lot from it.
Hello, I'd like to ask two questions:
1. For the vocabulary extension, does the new table keep its first 32k tokens identical to the original Llama 2 vocabulary and only append the newly added tokens? How is such a table obtained? When training the vocabulary, can part of the original token merges be pinned in place, or do you train a completely new table and then reorder its first 32k tokens to match the original Llama 2 table?
2. The original 32k vocabulary already contains perhaps 1-2k common Chinese characters, and the tens of thousands of new Chinese tokens appended later may overlap with them. Should those be removed from the first 32k, or can they simply be left alone?
Any guidance would be greatly appreciated!
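On question 2, a common practice (sketched here as an assumption, not necessarily what this repo does) is to leave the original 32k table untouched and simply skip appending any candidate token the base vocabulary already contains, since vocabulary extension is append-only:

```python
def dedup_new_tokens(base_vocab: set[str], candidate_tokens: list[str]) -> list[str]:
    """Keep only candidates absent from the base vocabulary, preserving order."""
    return [tok for tok in candidate_tokens if tok not in base_vocab]

# Toy example: "你" is already in the base table, so only the genuinely new
# tokens would be appended after the original ids.
base = {"你", "好", "the"}
appended = dedup_new_tokens(base, ["你", "你好", "世界"])
print(appended)  # → ['你好', '世界']
```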
What is the minimum GPU setup needed for fine-tuning? Would 4 x A100 (40 GB) work?
"After obtaining the new vocabulary, we need to extend and initialize the embedding and lm_head layers. We initialize the extended rows with the mean of each new token's word embeddings in the old embedding layer. In our preliminary experiments, this worked slightly better than HuggingFace's default implementation, which initializes from a fixed normal distribution."
How is this implemented concretely? Could you point to the exact code location?
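I can't point to the code location either, but the quoted description can be sketched in a few lines: for each new token, encode it with the old tokenizer and average the corresponding rows of the old embedding matrix. The mapping new_token_to_old_ids below is a stand-in for that encoding step, so this is an illustration, not the repo's actual implementation:

```python
import torch

def init_extended_rows(old_emb: torch.Tensor,
                       new_token_to_old_ids: list[list[int]]) -> torch.Tensor:
    """Append one new embedding row per new token, initialized as the mean of
    the rows the old tokenizer would split that token into."""
    rows = [old_emb[ids].mean(dim=0) for ids in new_token_to_old_ids]
    return torch.cat([old_emb, torch.stack(rows)], dim=0)

# Toy example: a 4-token "old" vocab with 3-dim embeddings; one new token that
# the old tokenizer splits into ids [1, 2].
old_emb = torch.arange(12, dtype=torch.float32).reshape(4, 3)
extended = init_extended_rows(old_emb, [[1, 2]])
print(extended[4])  # mean of rows 1 and 2 → tensor([4.5000, 5.5000, 6.5000])
```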
我在使用以下命令构建了虚拟环境:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117
pip install deepspeed
pip install transformers==4.36.2 datasets evaluate peft accelerate gradio optimum sentencepiece trl
pip install jupyterlab scikit-learn pandas matplotlib tensorboard nltk rouge bitsandbytes fire
pip install flash-attn --no-build-isolation
bash scripts/train-pt.sh
The error output is as follows:
Traceback (most recent call last):
File "/Data1/home/fanziqi/Project/Chinese-Mixtral-8x7B/train.py", line 227, in
train()
File "/Data1/home/fanziqi/Project/Chinese-Mixtral-8x7B/train.py", line 204, in train
model.print_trainable_parameters()
File "/Data1/home/fanziqi/.conda/envs/huozi_ft/lib/python3.10/site-packages/peft/peft_model.py", line 531, in print_trainable_parameters
trainable_params, all_param = self.get_nb_trainable_parameters()
File "/Data1/home/fanziqi/.conda/envs/huozi_ft/lib/python3.10/site-packages/peft/peft_model.py", line 511, in get_nb_trainable_parameters
num_bytes = param.quant_storage.itemsize if hasattr(param, "quant_storage") else 1
AttributeError: 'torch.dtype' object has no attribute 'itemsize'
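torch.dtype.itemsize only exists from PyTorch 2.1 onward; with torch 2.0.1 (as in the pip commands above), a peft release that assumes the newer attribute fails exactly like this. Upgrading to torch >= 2.1, or pinning a peft version that matches torch 2.0.x, should help. As an illustration of the gap, the older-API equivalent of itemsize is element_size():

```python
import torch

def dtype_itemsize(dtype: torch.dtype) -> int:
    """Byte width of a dtype; works on torch < 2.1 where .itemsize is missing."""
    size = getattr(dtype, "itemsize", None)
    if size is None:
        # fall back to measuring a zero-dim tensor of that dtype
        size = torch.empty((), dtype=dtype).element_size()
    return size

print(dtype_itemsize(torch.float32), dtype_itemsize(torch.uint8))  # → 4 1
```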
The contents of my scripts/train-pt.sh are as follows:
TRAIN_DATASETS=(
# 1:SkyPile-150B-2022
# 0.1:SkyPile-150B-2023
1:DKYoon-SlimPajama-6B
)
VALID_DATASETS=(
# SkyPile-150B-2022
# SkyPile-150B-2023
DKYoon-SlimPajama-6B
)
TRAIN_PARAMS=""
TRAIN_PARAMS+=" --enable_lora"
TRAIN_PARAMS+=" --lora_alpha 128"
TRAIN_PARAMS+=" --lora_dropout 0.05"
TRAIN_PARAMS+=" --lora_rank 64"
TRAIN_PARAMS+=" --lora_target_modules q_proj v_proj k_proj o_proj w1 w2 w3"
TRAIN_PARAMS+=" --lora_modules_to_save embed_tokens lm_head"
TRAIN_PARAMS+=" --model_name_or_path /Data1/home/fanziqi/.cache/huggingface/modelscope/HIT-SCIR/huozi3"
TRAIN_PARAMS+=" --tokenizer_name_or_path /Data1/home/fanziqi/.cache/huggingface/modelscope/HIT-SCIR/huozi3"
TRAIN_PARAMS+=" --train_datasets ${TRAIN_DATASETS[@]}"
TRAIN_PARAMS+=" --valid_datasets ${VALID_DATASETS[@]}"
TRAIN_PARAMS+=" --dataloader_drop_last"
TRAIN_PARAMS+=" --cache_dir hf-cache"
TRAIN_PARAMS+=" --output_dir outputs/$SLURM_JOB_ID"
TRAIN_PARAMS+=" --num_train_epochs 1"
TRAIN_PARAMS+=" --model_max_length 2048"
TRAIN_PARAMS+=" --per_device_train_batch_size 4"
TRAIN_PARAMS+=" --gradient_accumulation_steps 1"
TRAIN_PARAMS+=" --optim adamw_torch_fused"
TRAIN_PARAMS+=" --per_device_eval_batch_size 4"
TRAIN_PARAMS+=" --evaluation_strategy steps"
TRAIN_PARAMS+=" --eval_steps 500"
TRAIN_PARAMS+=" --save_strategy steps"
TRAIN_PARAMS+=" --save_steps 1000"
TRAIN_PARAMS+=" --learning_rate 1e-5"
TRAIN_PARAMS+=" --warmup_ratio 0.05"
TRAIN_PARAMS+=" --logging_dir logs/tb/$SLURM_JOB_ID"
TRAIN_PARAMS+=" --logging_strategy steps"
TRAIN_PARAMS+=" --logging_steps 1"
TRAIN_PARAMS+=" --lr_scheduler_type cosine"
TRAIN_PARAMS+=" --report_to tensorboard"
TRAIN_PARAMS+=" --gradient_checkpointing"
TRAIN_PARAMS+=" --bf16"
TRAIN_PARAMS+=" --deepspeed ds-config/config.json"
TORCHRUN_PARAMS='--nproc_per_node 2 --nnodes 1 --rdzv_id=0 '
CUDA_VISIBLE_DEVICES=6,7 torchrun --master_port 29501 $TORCHRUN_PARAMS train.py $TRAIN_PARAMS
PyTorch environments tried:
Setup 1 (original, cu117):
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2
Setup 2:
torch 2.0.1+cu118
torchaudio 2.0.2+cu118
torchvision 0.15.2+cu118
Setup 3:
torch 2.1.0+cu121
torchaudio 2.1.0+cu121
torchvision 0.16.0+cu121
Training container OS:
Ubuntu 22.04 LTS
CUDA 12.0
Host server OS:
Ubuntu 22.04 LTS
CUDA 12.0 / 11.8 (tried switching between the two)
Running scripts/train-pt.sh in every one of these environments fails with "torchrun: command not found". What could be causing this, and how can it be fixed?
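"torchrun: command not found" usually means the shell running the script doesn't have the bin/ directory of the environment torch was installed into on its PATH (for example, the conda env isn't activated inside the container, or the SLURM/container shell starts with a clean PATH). A hedged workaround is to fall back to the equivalent module entry point, which works whenever python itself resolves to the right environment, since torchrun is a thin console-script wrapper around torch.distributed.run:

```python
import shutil
import sys

def launcher() -> list[str]:
    """Prefer the torchrun console script if it is on PATH; otherwise use the
    equivalent module form, `python -m torch.distributed.run`."""
    if shutil.which("torchrun"):
        return ["torchrun"]
    return [sys.executable, "-m", "torch.distributed.run"]

print(launcher())
```

In the shell script, that corresponds to replacing `torchrun` with `python -m torch.distributed.run` (keeping all the other arguments unchanged).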
I'd like to ask: does the x-axis of the ALP line chart represent the vocabulary size or the number of corpus tokens?
Hello, how many GPUs were used during training, and what is the minimum GPU memory needed to train?
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from setproctitle import setproctitle

model_path = "/home/q/nfs_share/huggingface/hub/Chinese-Mixtral-8x7B"
quant_path = "Chinese-Mixtral-8x7B-awq"
setproctitle(quant_path)

modules_to_not_convert = ["gate"]
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    "modules_to_not_convert": modules_to_not_convert,
}

# Load model
# NOTE: pass safetensors=True to load safetensors
model = AutoAWQForCausalLM.from_pretrained(
    model_path, safetensors=True, **{"low_cpu_mem_usage": True}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(
    tokenizer, quant_config=quant_config, modules_to_not_convert=modules_to_not_convert
)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')

# nohup python mixtral_quant.py > mixtral_quant.log 2>&1 &
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

# Load model
quant_path = "casperhansen/mixtral-instruct-awq"
# quant_path = "./Chinese-Mixtral-8x7B-awq"
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

prompt_template = """\
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>"""

def trans(en):
    tokens = tokenizer(
        prompt_template.format(prompt=en),
        return_tensors="pt",
    ).input_ids.cuda()
    # Generate output
    generation_output = model.generate(
        tokens,
        streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
        max_new_tokens=8192,
    )

trans("你的名字")
The model just repeats the prompt back endlessly:
你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,… (the same phrase repeated for the rest of the output)
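One plausible cause (an assumption, not confirmed): the zephyr-style <|system|>/<|user|> template above does not match casperhansen/mixtral-instruct-awq, which is a quantization of Mixtral-8x7B-Instruct and expects the Mistral [INST] format. A prompt template that doesn't match the model's training format is a classic source of degenerate repetition. A sketch of the Mistral-instruct-style prompt for comparison:

```python
# Mistral/Mixtral-Instruct models are trained on [INST] ... [/INST] turns;
# feeding them a zephyr-style template can yield degenerate, repetitive output.
def mistral_instruct_prompt(user_message: str) -> str:
    return f"<s>[INST] {user_message} [/INST]"

print(mistral_instruct_prompt("你的名字"))
```

In practice, tokenizer.apply_chat_template (available in recent transformers releases) is the safer way to build the prompt, since it uses the chat template shipped with the model.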
Were both the Chinese and English datasets pretrained for 2 epochs, or only the English data? Thanks for your reply.
As per the title.
Could you advise how to launch fine-tuning with torchrun from the command line?
Hello. On an A800, inference with the model loaded in 4-bit is actually slower than without quantization, by almost 2x. What could be the reason?
Where can I find tokenizer/Mixtral-8x7B-v0.1-vocab? Preprocessing cannot continue without it.
Traceback (most recent call last):
File "/home/ai/Documents/daniel208/Mixtral/Chinese-Mixtral-8x7B-main/train.py", line 12, in
from data.utils import parse_dataset_name_and_ratio, count_token
ModuleNotFoundError: No module named 'data.utils'
This is the error message I got.
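data.utils is a package inside the repo itself, so the import only resolves when the repo root (the directory containing data/) is on sys.path — most simply, by cd-ing into Chinese-Mixtral-8x7B-main before running train.py. As a sketch of a manual fallback (the path handling here is illustrative, not from the repo):

```python
import os
import sys

# `data.utils` lives inside the repo, so the directory containing the `data/`
# package must be importable; prepend the script's own directory when
# launching from elsewhere.
repo_root = os.path.abspath(os.path.dirname(sys.argv[0]) or ".")
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

print(repo_root in sys.path)  # → True
```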