chinese-mixtral-8x7b's People

Contributors

carfly, jubgjf


chinese-mixtral-8x7b's Issues

Hello, I get an error from get_peft_model after 4-bit quantization

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

from peft import LoraConfig, TaskType, get_peft_model, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, DataCollatorForLanguageModeling

model_path = "<xxx_path>/PretrainedModels/Mixtral-8x7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)
model.enable_input_require_grads()

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)

lora_rank = 64
lora_alpha = 128
lora_dropout = 0.05
lora_modules_to_save = "embed_tokens lm_head"
lora_target_modules = "q_proj v_proj k_proj o_proj w1 w2 w3"

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=lora_target_modules,
    modules_to_save=lora_modules_to_save,
    inference_mode=False,
    r=lora_rank,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)
model = get_peft_model(model, peft_config)

The last line, model = get_peft_model(model, peft_config), raises:
RuntimeError: only Tensors of floating point dtype can require gradients

I hit this error when running your training script, so I pulled the code above out on its own and it still reproduces. Have you run into this?
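
Not an official fix, but below is a sketch of how this configuration is usually passed for QLoRA-style training, assuming target_modules/modules_to_save should be lists (as the training script's argument parser produces from the space-separated strings) and that the quantized model is prepared with peft's k-bit helper before get_peft_model. Paths and hyperparameters just mirror the snippet above and are illustrative only.

import torch
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_path = "<xxx_path>/PretrainedModels/Mixtral-8x7B-v0.1"

# Explicit 4-bit config; compute stays in a floating dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)

# Casts norms/embeddings to fp32 and enables input gradients for k-bit training,
# replacing the manual enable_input_require_grads() call above.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "w1", "w2", "w3"],
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()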

Could you describe the hardware requirements?

Can a 4090 with 24 GB load the model for local inference? How much memory does fine-tuning need? How much does QLoRA training need? Full-parameter training needs a lot of memory, right?
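
Not an official answer, but a rough lower bound follows from the parameter count: Mixtral-8x7B has about 46.7B parameters in total, and all experts must sit in memory even though only two are active per token. A back-of-the-envelope estimate for the weights alone:

# Weight memory only; KV cache, activations and framework overhead come on top.
total_params = 46.7e9  # approximate total parameter count of Mixtral-8x7B

for name, bytes_per_param in [("bf16/fp16", 2), ("int8", 1), ("nf4 4-bit", 0.5)]:
    gib = total_params * bytes_per_param / 1024**3
    print(f"{name:>10}: ~{gib:.0f} GiB")

So even at 4-bit the weights alone are roughly 22 GiB, which leaves very little of a 24 GB 4090 for the KV cache and activations; bf16 inference or full fine-tuning needs multiple 80 GB-class GPUs, and QLoRA sits in between (4-bit weights plus optimizer state for the LoRA parameters only).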

Vocabulary expansion data: Chinese BPE vocabulary trained on 12 GB of Zhihu data and 2 GB of WuDao data

Hi,

I see that you "trained the Chinese BPE vocabulary on 12 GB of Zhihu data and 2 GB of WuDao data".

I have two questions:

  1. Did you choose these two datasets based on experience, i.e. because their Chinese text quality is high?
  2. Where can these two datasets be downloaded?

Thanks for this repo, I have learned a lot from it.

About vocabulary expansion

Hello, I would like to ask two questions:
1. For the vocabulary expansion, the first 32k tokens of the new vocabulary are identical to the original llama2 vocabulary, and only the newly added tokens are appended. How is such a vocabulary obtained? When training the new vocabulary, can you configure it so that part of the original token mappings stay unchanged, or do you train a completely new vocabulary and then rearrange its first 32k tokens to match the original llama2 vocabulary? (See the sketch below.)
2. The original 32k vocabulary already contains maybe 1-2k common Chinese characters, and the tens of thousands of newly added Chinese tokens probably cover them. Do those entries need to be removed from the original 32k, or can they simply be left alone?
Any advice would be much appreciated!
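
Not necessarily the authors' exact pipeline, but a common way to get a vocabulary whose first 32k entries match the base model exactly is to train a separate Chinese BPE model and then append only its unseen pieces to the original sentencepiece model, so existing IDs never move. A sketch with placeholder file names (requires sentencepiece and protobuf):

from sentencepiece import sentencepiece_model_pb2 as sp_pb2

# Original 32k tokenizer of the base model.
base = sp_pb2.ModelProto()
with open("Mixtral-8x7B-v0.1/tokenizer.model", "rb") as f:
    base.ParseFromString(f.read())

# BPE model trained only on the Chinese corpus (placeholder name).
chinese = sp_pb2.ModelProto()
with open("chinese_bpe.model", "rb") as f:
    chinese.ParseFromString(f.read())

existing = {p.piece for p in base.pieces}
for p in chinese.pieces:
    # Pieces already present (e.g. common Chinese characters in the original
    # 32k vocabulary) are skipped, so nothing is removed or reordered.
    if p.piece not in existing:
        new_piece = sp_pb2.ModelProto().SentencePiece()
        new_piece.piece = p.piece
        new_piece.score = 0
        base.pieces.append(new_piece)

with open("merged_tokenizer.model", "wb") as f:
    f.write(base.SerializeToString())

Under this scheme duplicates are simply skipped, so the common Chinese characters already in the original 32k can be left where they are.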

Inconsistent communication volume across GPUs

When training Mixtral on multiple GPUs directly with the HuggingFace Transformers Trainer, I hit this error. Have you seen it before?
[error screenshot]
It looks like the MoE expert routing makes the communication volume differ between GPUs?

init_embeddings model conversion issue

[screenshot]

The open-source Mixtral 8x7B model is in safetensors format, so this needs to be changed to:

model = AutoModelForCausalLM.from_pretrained(old_model, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model_dict = model.state_dict()

Details of the embedding expansion

"After obtaining the new vocabulary, we need to expand and initialize the embedding and lm_head layers. We initialize the expanded part with the mean of each new token's word embeddings in the old embedding layer. In our preliminary experiments this was slightly better than HuggingFace's default implementation, which initializes the new entries from a fixed normal distribution."

How is this implemented exactly? Could you point me to the specific place in the code?
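
I can't point to the repo's code location, but the described scheme can be sketched roughly as follows: for each newly added token, tokenize its surface form with the old tokenizer and average the corresponding rows of the old embedding and lm_head matrices. Here model, old_tokenizer and new_tokenizer are placeholders:

import torch

old_vocab_size = len(old_tokenizer)   # original 32k tokenizer
new_vocab_size = len(new_tokenizer)   # expanded tokenizer

# Adds new rows (HF fills them from a normal distribution by default).
model.resize_token_embeddings(new_vocab_size)
embed = model.get_input_embeddings().weight.data      # (new_vocab, hidden)
lm_head = model.get_output_embeddings().weight.data   # Mixtral's lm_head is untied

with torch.no_grad():
    for new_id in range(old_vocab_size, new_vocab_size):
        token = new_tokenizer.convert_ids_to_tokens(new_id)
        text = new_tokenizer.convert_tokens_to_string([token])
        old_ids = old_tokenizer(text, add_special_tokens=False).input_ids
        if len(old_ids) > 0:
            # Overwrite the default init with the mean of the old embeddings.
            embed[new_id] = embed[old_ids].mean(dim=0)
            lm_head[new_id] = lm_head[old_ids].mean(dim=0)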

Error during continued pre-training

I built a virtual environment with the following commands:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117
pip install deepspeed
pip install transformers==4.36.2 datasets evaluate peft accelerate gradio optimum sentencepiece trl
pip install jupyterlab scikit-learn pandas matplotlib tensorboard nltk rouge bitsandbytes fire
pip install flash-attn --no-build-isolation
bash scripts/train-pt.sh

The error message is as follows:
Traceback (most recent call last):
File "/Data1/home/fanziqi/Project/Chinese-Mixtral-8x7B/train.py", line 227, in <module>
train()
File "/Data1/home/fanziqi/Project/Chinese-Mixtral-8x7B/train.py", line 204, in train
model.print_trainable_parameters()
File "/Data1/home/fanziqi/.conda/envs/huozi_ft/lib/python3.10/site-packages/peft/peft_model.py", line 531, in print_trainable_parameters
trainable_params, all_param = self.get_nb_trainable_parameters()
File "/Data1/home/fanziqi/.conda/envs/huozi_ft/lib/python3.10/site-packages/peft/peft_model.py", line 511, in get_nb_trainable_parameters
num_bytes = param.quant_storage.itemsize if hasattr(param, "quant_storage") else 1
AttributeError: 'torch.dtype' object has no attribute 'itemsize'
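
Not an official answer, but this AttributeError looks like a torch/peft version mismatch: as far as I know, torch.dtype only gained an itemsize attribute in newer PyTorch releases (2.1+), while the environment above pins torch 2.0.1. A quick check in the failing environment:

import torch

print(torch.__version__)
# Prints 2 on torch >= 2.1; on torch 2.0.x it raises the same
# AttributeError that peft's get_nb_trainable_parameters hits.
print(torch.float16.itemsize)

If that is indeed the cause, either upgrading torch or pinning an older peft release that does not read quant_storage.itemsize should avoid it.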

My scripts/train-pt.sh is as follows:

TRAIN_DATASETS=(
# 1:SkyPile-150B-2022
# 0.1:SkyPile-150B-2023
1:DKYoon-SlimPajama-6B
)

VALID_DATASETS=(
# SkyPile-150B-2022
# SkyPile-150B-2023
DKYoon-SlimPajama-6B
)

TRAIN_PARAMS=""
TRAIN_PARAMS+=" --enable_lora"
TRAIN_PARAMS+=" --lora_alpha 128"
TRAIN_PARAMS+=" --lora_dropout 0.05"
TRAIN_PARAMS+=" --lora_rank 64"
TRAIN_PARAMS+=" --lora_target_modules q_proj v_proj k_proj o_proj w1 w2 w3"
TRAIN_PARAMS+=" --lora_modules_to_save embed_tokens lm_head"
TRAIN_PARAMS+=" --model_name_or_path /Data1/home/fanziqi/.cache/huggingface/modelscope/HIT-SCIR/huozi3"
TRAIN_PARAMS+=" --tokenizer_name_or_path /Data1/home/fanziqi/.cache/huggingface/modelscope/HIT-SCIR/huozi3"
TRAIN_PARAMS+=" --train_datasets ${TRAIN_DATASETS[*]}"
TRAIN_PARAMS+=" --valid_datasets ${VALID_DATASETS[*]}"
TRAIN_PARAMS+=" --dataloader_drop_last"
TRAIN_PARAMS+=" --cache_dir hf-cache"
TRAIN_PARAMS+=" --output_dir outputs/$SLURM_JOB_ID"
TRAIN_PARAMS+=" --num_train_epochs 1"
TRAIN_PARAMS+=" --model_max_length 2048"
TRAIN_PARAMS+=" --per_device_train_batch_size 4"
TRAIN_PARAMS+=" --gradient_accumulation_steps 1"
TRAIN_PARAMS+=" --optim adamw_torch_fused"
TRAIN_PARAMS+=" --per_device_eval_batch_size 4"
TRAIN_PARAMS+=" --evaluation_strategy steps"
TRAIN_PARAMS+=" --eval_steps 500"
TRAIN_PARAMS+=" --save_strategy steps"
TRAIN_PARAMS+=" --save_steps 1000"
TRAIN_PARAMS+=" --learning_rate 1e-5"
TRAIN_PARAMS+=" --warmup_ratio 0.05"
TRAIN_PARAMS+=" --logging_dir logs/tb/$SLURM_JOB_ID"
TRAIN_PARAMS+=" --logging_strategy steps"
TRAIN_PARAMS+=" --logging_steps 1"
TRAIN_PARAMS+=" --lr_scheduler_type cosine"
TRAIN_PARAMS+=" --report_to tensorboard"
TRAIN_PARAMS+=" --gradient_checkpointing"
TRAIN_PARAMS+=" --bf16"
TRAIN_PARAMS+=" --deepspeed ds-config/config.json"

TORCHRUN_PARAMS='--nproc_per_node 2 --nnodes 1 --rdzv_id=0 '

CUDA_VISIBLE_DEVICES=6,7 torchrun --master_port 29501 $TORCHRUN_PARAMS train.py $TRAIN_PARAMS

torchrun: command not found

PyTorch environments tried:

Original setup 1 (cu117):
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2

Setup 2:
torch 2.0.1+cu118
torchaudio 2.0.2+cu118
torchvision 0.15.2+cu118

Setup 3:
torch 2.1.0+cu121
torchaudio 2.1.0+cu121
torchvision 0.16.0+cu121

Training container OS: Ubuntu 22.04 LTS, CUDA 12.0
Host OS: Ubuntu 22.04 LTS, tried switching between CUDA 12.0 and 11.8

With every environment above, running scripts/train-pt.sh fails with "torchrun: command not found". What could be causing this, and how can it be solved?
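
Not an official answer, but "torchrun: command not found" usually just means the console script that ships with PyTorch is not on the PATH of the shell running the script (for example, a different conda environment). torchrun is only a thin wrapper around the torch.distributed.run module, so you can both check for it and launch without it from the same interpreter:

import shutil

import torch
import torch.distributed.run  # the module behind the torchrun console script

print("torch:", torch.__version__)
print("torchrun on PATH:", shutil.which("torchrun"))
# If the import above works but torchrun is not on PATH, replacing `torchrun`
# in scripts/train-pt.sh with `python -m torch.distributed.run` (same arguments)
# is equivalent.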

Meaning of the x-axis in the ALP plot

I'd like to ask: does the x-axis of the ALP line chart represent the vocabulary size or the number of tokens in the corpus?

Strange inference output after AutoAWQ quantization

Quantization script, mixtral_quant.py:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from setproctitle import setproctitle

model_path = "/home/q/nfs_share/huggingface/hub/Chinese-Mixtral-8x7B"
quant_path = "Chinese-Mixtral-8x7B-awq"

setproctitle(quant_path)

modules_to_not_convert = ["gate"]
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    "modules_to_not_convert": modules_to_not_convert,
}

# Load model
# NOTE: pass safetensors=True to load safetensors
model = AutoAWQForCausalLM.from_pretrained(
    model_path, safetensors=True, **{"low_cpu_mem_usage": True}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(
    tokenizer, quant_config=quant_config, modules_to_not_convert=modules_to_not_convert
)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')


# nohup python mixtral_quant.py > mixtral_quant.log 2>&1 &

Inference (uses about 30 GB of GPU memory):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer, TextIteratorStreamer
from threading import Thread
# Load model
quant_path = "casperhansen/mixtral-instruct-awq"
# quant_path = "./Chinese-Mixtral-8x7B-awq"
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
prompt_template = """\
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>"""

def trans(en):
    tokens = tokenizer(
        prompt_template.format(prompt=en),
        return_tensors="pt",
    ).input_ids.cuda()

    # Generate output
    generation_output = model.generate(
        tokens,
        streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
        max_new_tokens=8192,
    )

trans("你的名字")

Output:

It just repeats 你的名字 over and over:

你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字,你的名字
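
One possible factor (my assumption, not something the authors confirmed): Chinese-Mixtral-8x7B appears to be a base model without instruction tuning, while the <|system|>/<|user|> template above targets chat-tuned checkpoints, and greedy decoding on a base model easily degenerates into repetition. A commonly used mitigation is to sample and apply a repetition penalty in generate(), reusing model, tokenizer and tokens from the snippet above:

from transformers import TextStreamer

generation_output = model.generate(
    tokens,
    streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
    max_new_tokens=512,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,  # discourages verbatim loops
)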

Fine-tuning

Could you explain how to launch fine-tuning from the command line with torchrun?

4-bit quantized inference is slower

Hello. On an A800, loading the model in 4-bit makes inference slower than running without quantization, almost twice as slow. What could be the reason?

Missing tokenizer file

Where can I find tokenizer/Mixtral-8x7B-v0.1-vocab? Preprocessing cannot continue without it.

Where do I download the data.utils package?

Traceback (most recent call last):

File "/home/ai/Documents/daniel208/Mixtral/Chinese-Mixtral-8x7B-main/train.py", line 12, in <module>
from data.utils import parse_dataset_name_and_ratio, count_token
ModuleNotFoundError: No module named 'data.utils'

This is the error message I get.
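
Not an official answer, but data.utils does not look like a pip package; it appears to be the data/ directory inside this repository, so the import only resolves when train.py is launched from the repo root (or with the repo on PYTHONPATH). A quick check, using the path from the traceback above:

import os
import sys

repo_root = "/home/ai/Documents/daniel208/Mixtral/Chinese-Mixtral-8x7B-main"
print(os.path.exists(os.path.join(repo_root, "data", "utils.py")))  # should be True

# Make the repo importable even if the script is started from elsewhere.
sys.path.insert(0, repo_root)
from data.utils import parse_dataset_name_and_ratio, count_token  # noqa: F401

If data/utils.py really is absent from your checkout, it may be missing from the download itself, which would be worth checking against the GitHub repository.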
