baichuan-inc / Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Home Page: https://huggingface.co/baichuan-inc
License: Apache License 2.0
I'm fine-tuning with the transformers Trainer class, and every time training reaches the eval step it errors out as follows:
AttributeError: Caught AttributeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/peft/peft_model.py", line 931, in forward
return self.base_model(
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 692, in forward
outputs = self.model(
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 404, in forward
alibi_mask = self.get_alibi_mask(inputs_embeds, seq_length_with_past)
File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 354, in get_alibi_mask
mask = self.future_mask[
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaichuanModel' object has no attribute 'future_mask'
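One plausible reading of this traceback (an assumption, not an official diagnosis): the model creates future_mask lazily inside forward, while nn.DataParallel expects each replica to already carry every tensor it needs. A minimal sketch of the eager-registration mitigation, with a toy module standing in for BaichuanModel:
import torch
from torch import nn

class ToyModel(nn.Module):
    def __init__(self, max_positions: int = 16):
        super().__init__()
        # Register the causal mask eagerly in __init__ instead of lazily in
        # forward(), so every DataParallel replica starts out with it.
        self.register_buffer(
            "future_mask",
            torch.triu(
                torch.full((max_positions, max_positions), float("-inf")), 1
            ),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        n = scores.size(-1)
        return scores + self.future_mask[:n, :n]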
I then switched to LLaMA-Efficient-Tuning and fine-tuned baichuan2 the same way I had tuned baichuan1, using DeepSpeed; it likewise fails at the eval step, with:
AttributeError: 'Parameter' object has no attribute 'ds_status'
What could be causing this?
Exception in thread Thread-2:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 1648, in generate
return self.sample(
File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 2766, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
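A failure like this during sampling is usually easiest to localize by inspecting the raw logits before generate() is called. A minimal diagnostic sketch, assuming model and tokenizer are already loaded (fp16 overflow is a common culprit, though not the only one):
import torch

inputs = tokenizer("hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits  # [batch, seq, vocab]

# If either check prints True, the problem is upstream of sampling
# (e.g. fp16 overflow, corrupted weights, or a bad LoRA merge).
print("has nan:", torch.isnan(logits).any().item())
print("has inf:", torch.isinf(logits).any().item())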
Hi, is there text-based pretraining code for baichuan2? Will it be released? I'd like to do some pretraining before IFT.
What hardware configuration is roughly needed to fine-tune this model yourself, and roughly how long does it take?
Same inference code: I swapped the llama2-7b-chat checkpoint for baichuan2-7b-chat and immediately hit out-of-memory. Is there some trick I'm missing?
https://github.com/baichuan-inc/Baichuan2#%E9%87%8F%E5%8C%96%E6%95%88%E6%9E%9C says fp16 only needs 26 GB, so why do I get OOM on a V100-32G even with batch size 1? Is there some hidden catch?
1. GPU memory usage during training;
2. GPU memory usage during inference, and whether multi-GPU inference is supported;
3. GPU memory usage for quantized inference.
We evaluated the Baichuan2-7B-Base / Baichuan2-7B-Chat / Baichuan2-13B-Base / Baichuan2-13B-Chat models on OpenCompass; the results are as follows:
For more evaluation details, see https://opencompass.org.cn/leaderboard-llm
1. Python 3.10
2. Installed requirements.txt as instructed
3. git clone https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits
4. Modified the model-loading code in cli_demo.py:
model = AutoModelForCausalLM.from_pretrained(
    r"D:\2-huggingface\Baichuan2-13B-Chat-4bits",  # raw string so "\2" is not treated as an escape
    device_map="auto",
    trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(
    r"D:\2-huggingface\Baichuan2-13B-Chat-4bits"
)
tokenizer = AutoTokenizer.from_pretrained(
    r"D:\2-huggingface\Baichuan2-13B-Chat-4bits",
    use_fast=False,
    trust_remote_code=True
)
5. Running python cli_demo.py errors:
Exception has occurred: ImportError
Needs import model weight init func to run quantize.
File "C:\Users\WX.cache\huggingface\modules\transformers_modules\Baichuan2-13B-Chat-4bits\modeling_baichuan.py", line 606, in from_pretrained
from .quantizer import init_model_weight_int4
File "C:\Users\WX.cache\huggingface\modules\transformers_modules\Baichuan2-13B-Chat-4bits\quantizer.py", line 1, in
import bitsandbytes as bnb
ModuleNotFoundError: No module named 'scipy'
During handling of the above exception, another exception occurred:
File "C:\Users\WX.cache\huggingface\modules\transformers_modules\Baichuan2-13B-Chat-4bits\modeling_baichuan.py", line 611, in from_pretrained
raise ImportError(f"Needs import model weight init func to run quantize.")
File "D:\1-github\Baichuan2\cli_demo.py", line 13, in init_model
model = AutoModelForCausalLM.from_pretrained(
File "D:\1-github\Baichuan2\cli_demo.py", line 47, in main
model, tokenizer = init_model()
File "D:\1-github\Baichuan2\cli_demo.py", line 86, in
main()
ImportError: Needs import model weight init func to run quantize.
Help, everyone! Using baichuan-inc/Baichuan2-13B-Chat-4bits throws an error; the code was copied directly from HF.
Error message:
self.quant_state[0] = self.quant_state[0].cuda(device)
TypeError: 'NoneType' object is not subscriptable
As at https://huggingface.co/baichuan-inc/Baichuan2-13B-Base/blob/main/modeling_baichuan.py#L180,
should this be replaced with LowerTriangularMaskWithTensorBias?
A V100 32G now seems barely able to run baichuan2; it frequently reports OOM errors.
The vocabulary appears to have roughly doubled; is that what's driving up the memory usage?
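Back-of-envelope arithmetic suggests the larger vocabulary alone cannot explain the OOM. A quick sketch, under assumed figures (Baichuan 1 vocab 64,000, Baichuan 2 vocab 125,696, hidden size 5,120 for the 13B model, 2 bytes per fp16 value):
hidden = 5120                    # assumed 13B hidden size
for vocab in (64_000, 125_696):  # Baichuan 1 vs. Baichuan 2 vocabulary
    gib = vocab * hidden * 2 / 2**30
    print(f"vocab {vocab}: {gib:.2f} GiB per table (embedding or lm_head)")
# ~0.61 GiB vs. ~1.20 GiB per table: roughly +1.2 GiB across both tables,
# noticeable but far short of explaining a 32 GB OOM on its own.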
I didn't change the code at all; I only downloaded the model, renamed it, and saved it under baichuan-inc. Launching web_demo.py throws this error.
XTuner now supports single-GPU QLoRA fine-tuning of Baichuan2; feel free to join the WeChat group to discuss.
git clone https://github.com/internLM/xtuner
cd xtuner
pip install -e .
xtuner train configs/baichuan/baichuan2_7b_base/baichuan2_7b_base_qlora_alpaca_e3.py
# equivalent to:
# python xtuner/tools/train.py configs/baichuan/baichuan2_7b_base/baichuan2_7b_base_qlora_alpaca_e3.py
Multiple datasets such as alpaca, arxiv_gentitle, and codealpaca are supported out of the box, and the datasets are downloaded automatically.
May I ask how the C-Eval results were computed? They don't match the leaderboard.
We are a financial company. The commercial-license application requires a copy of the legal representative's ID card plus the business license. Is there any alternative? Our legal representative is the chairman, which makes this difficult right now. We emailed earlier but received no reply, so we're asking here.
Per the paper, the baichuan2 chat models went through an RLHF pipeline and collected data similar to hh_rlhf. Are there plans to open-source the RLHF data and training framework? Or could part of the reward-model training data be released first?
Since lm_head in modeling_baichuan.py is not an instance of <class 'torch.nn.modules.linear.Linear'>, running model.resize_token_embeddings() raises an error, so new tokens cannot be added to the tokenizer.
TypeError: Old language model head is of type <class 'transformers_modules.13B.modeling_baichuan.NormHead'>, which is not an instance of <class 'torch.nn.modules.linear.Linear'>.
You should either use a different resize function or make sure that `old_lm_head` are an instance of <class 'torch.nn.modules.linear.Linear'>.
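Until resize_token_embeddings understands NormHead, one hypothetical workaround is to grow the head's weight matrix by hand. A hedged sketch (it assumes NormHead stores weight as (vocab_size, hidden_size), as in the snippet quoted further down this page; the init for the new rows is a guess):
import torch
from torch import nn

def resize_norm_head(head: nn.Module, new_vocab_size: int) -> nn.Module:
    old_vocab, hidden = head.weight.shape
    new_weight = torch.empty(
        new_vocab_size, hidden,
        dtype=head.weight.dtype, device=head.weight.device,
    )
    new_weight[:old_vocab] = head.weight.data          # keep the trained rows
    nn.init.normal_(new_weight[old_vocab:], std=0.02)  # assumed init for new tokens
    head.weight = nn.Parameter(new_weight)
    return head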
CUDA SETUP: Loading binary /root/anaconda3/envs/llama2/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Traceback (most recent call last):
File "/usr/local/Baichuan2/cli_demo.py", line 88, in
main()
File "/usr/local/Baichuan2/cli_demo.py", line 49, in main
model, tokenizer = init_model()
File "/usr/local/Baichuan2/cli_demo.py", line 14, in init_model
model = AutoModelForCausalLM.from_pretrained(
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
return model_class.from_pretrained(
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat-4bits/modeling_baichuan.py", line 664, in from_pretrained
dispatch_model(model, device_map=device_map)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/big_modeling.py", line 371, in dispatch_model
attach_align_device_hook_on_blocks(
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 507, in attach_align_device_hook_on_blocks
attach_execution_device_hook(module, execution_device[module_name])
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in attach_execution_device_hook
attach_execution_device_hook(child, execution_device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in attach_execution_device_hook
attach_execution_device_hook(child, execution_device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in attach_execution_device_hook
attach_execution_device_hook(child, execution_device)
[Previous line repeated 2 more times]
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 340, in attach_execution_device_hook
add_hook_to_module(module, AlignDevicesHook(execution_device, skip_keys=skip_keys))
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 155, in add_hook_to_module
module = hook.init_hook(module)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 253, in init_hook
set_module_tensor_to_device(module, name, self.execution_device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 292, in set_module_tensor_to_device
new_value = old_value.to(device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 191, in to
s[-2][0] = s[-2][0].to(device) # offset
AttributeError: 'str' object has no attribute 'to'
Thanks.
Looking at https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/670d17ee403f45334f53121d72feff623cc37de1/config.json#L24,
the weight files are bfloat16, but here:
Line 15 in 93543b9
The quantized 4-bit model does not seem to infer any faster than the unquantized one; could you explain why?
The README only compares GPU memory and accuracy; could you also provide an inference-speed comparison?
Can baichuan2 still be fine-tuned with the llama-efficient-tuning project?
import math

import torch
from torch import nn


class NormHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.first_flag = True

    def forward(self, hidden_states):
        if self.training:
            norm_weight = nn.functional.normalize(self.weight)
        elif self.first_flag:
            # On the first eval forward, bake the normalized weight into the
            # parameter so later eval calls can skip the normalization.
            self.first_flag = False
            self.weight = nn.Parameter(nn.functional.normalize(self.weight))
            norm_weight = self.weight
        else:
            norm_weight = self.weight
        return nn.functional.linear(hidden_states, norm_weight)
1. During eval the weight gets reassigned; if I continue training after an eval, won't that cause problems?
2. Is creating the new weight during eval just a speed optimization? Could both train and eval simply use norm_weight = nn.functional.normalize(self.weight) with no ill effects?
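For reference, a sketch of the always-normalize variant the second question proposes (an assumption about its equivalence, not the official implementation): it keeps train and eval numerically consistent at the cost of one extra normalize per forward, and sidesteps the resume-after-eval question entirely.
import math
import torch
from torch import nn

class AlwaysNormHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(vocab_size, hidden_size))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Normalize on every call, in train and eval alike, instead of
        # reassigning self.weight during the first eval forward.
        return nn.functional.linear(
            hidden_states, nn.functional.normalize(self.weight)
        )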
Just ran web_demo.py and got this error: OSError: /media/calvin/new_disk1/models/Baichuan2-7B-Base does not appear to have a file named generation_config.json. Checkout 'https://huggingface.co//media/calvin/new_disk1/models/Baichuan2-7B-Base/None' for available files.
Checking Hugging Face, generation_config.json is indeed missing. Was it not uploaded?
In the technical report, the baichuan2-13B evaluations at different steps show a second abrupt jump in benchmark performance at around 1000B pretraining tokens (unlike the initial phase, where scores oscillate around 25% before rising). Is this related to the model scale?
Fine-tuning on top of the quantized model errors out:
The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you installed bitsandbytes>=0.37.0
But I have confirmed that bitsandbytes>=0.37.0 is installed.
register_conv_template(
    Conversation(
        name="baichuan-chat",
        roles=("<reserved_102>", "<reserved_103>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
        stop_token_ids=[],
    )
)
But looking at baichuan2, shouldn't roles be changed to the following?
roles=("<reserved_106>", "<reserved_107>")
>>> model.generation_config.user_token_id
195
>>> model.generation_config.assistant_token_id
196
>>> tokenizer.decode([195])
'<reserved_106>'
>>> tokenizer.decode([196])
'<reserved_107>'
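Going purely by that REPL output (an inferred convention, not an official API), a chat prompt could be assembled like this:
def build_prompt_ids(tokenizer, messages, user_id=195, assistant_id=196):
    # user_id/assistant_id are the reserved-token ids shown in the REPL above
    ids = []
    for msg in messages:
        role_id = user_id if msg["role"] == "user" else assistant_id
        ids += [role_id] + tokenizer.encode(msg["content"])
    return ids + [assistant_id]  # trailing assistant token cues the model to reply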
After changing float16 to float32, it reports that there is no GPU device:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Is there any llama.cpp-style acceleration? The existing libraries don't seem to support accelerating baichuan2.
The technical report "Baichuan 2: Open Large-scale Language Models" mentions the Baichuan Harmless Evaluation Dataset; will this dataset be open-sourced?
The code:
model = AutoModelForCausalLM.from_pretrained(r".\Baichuan2-13B-Chat",
    load_in_8bit=True, device_map="auto", trust_remote_code=True)
model.save_pretrained(r'.\8bit')
model = AutoModelForCausalLM.from_pretrained(r'.\8bit', device_map="auto", trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(r".\Baichuan2-13B-Chat")
tokenizer = AutoTokenizer.from_pretrained(r".\Baichuan2-13B-Chat",
    use_fast=False, trust_remote_code=True)
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})
response = model.chat(tokenizer, messages)
print(response)
The error:
in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2676 # sample
2677 probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2678 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
2680 # finished sentences should have their next token be a padding token
2681 if eos_token_id is not None:
RuntimeError: probability tensor contains either inf, nan or element < 0
Thank you very much for your contributions!
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("pretrained/baichuan2-7b/base", load_in_8bit=True, device_map="auto", trust_remote_code=True)
[2023-09-06 16:03:27,691] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-09-06 16:03:28.999241: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 16:03:30.314803: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/liutianwei/.conda/envs/starwhale/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
return model_class.from_pretrained(
File "/home/liutianwei/.cache/huggingface/modules/transformers_modules/base/modeling_baichuan.py", line 779, in from_pretrained
return super(BaichuanForCausalLM, cls).from_pretrained(
File "/home/liutianwei/.conda/envs/starwhale/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/liutianwei/.cache/huggingface/modules/transformers_modules/base/modeling_baichuan.py", line 638, in __init__
and config.quantization_config["load_in_4bit"]
TypeError: 'BitsAndBytesConfig' object is not subscriptable
>>>
Baichuan2 7b-base, 7b-chat, 13b-base, and 13b-chat all throw this error.
https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/modeling_baichuan.py#L537 needs to check the type of config.quantization_config.
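A minimal sketch of the kind of guard being suggested (hypothetical, not an official patch): accept either a plain dict or a BitsAndBytesConfig before indexing, with config standing in for the loaded model config.
from transformers import BitsAndBytesConfig

quant_config = getattr(config, "quantization_config", None)
if isinstance(quant_config, dict):
    load_in_4bit = quant_config.get("load_in_4bit", False)
elif isinstance(quant_config, BitsAndBytesConfig):
    load_in_4bit = getattr(quant_config, "load_in_4bit", False)
else:
    load_in_4bit = False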
I'd like to know why eos_token_id is added to labels here instead of ignore_index:
if from_ == "human":
    input_ids += self.user_tokens + value_ids
    labels += [self.tokenizer.eos_token_id] + [self.ignore_index] * len(
        value_ids
    )
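One plausible reading (an interpretation, not confirmed by the authors): with the usual HF label alignment, putting eos_token_id at the position of the user role token trains the model to emit EOS exactly where the next human turn would begin, i.e. to stop after finishing its reply, while the human content itself is masked out. A toy illustration with made-up ids:
eos_id, ignore_index = 2, -100               # made-up ids; -100 is the HF mask value
user_tokens, value_ids = [195], [11, 12, 13]
labels = [eos_id] + [ignore_index] * len(value_ids)
print(labels)  # [2, -100, -100, -100]: EOS is supervised, human tokens are not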
I understand the logits output as [b, s, v] (b: batch size, s: sequence length, v: vocab size).
Is the max taken over the s dimension or the v dimension? And is the reduction a sum or a mean?
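A minimal sketch of the standard HF causal-LM objective (assuming Baichuan2 follows this convention): the softmax/argmax runs over the vocab dimension v, and CrossEntropyLoss defaults to a mean over non-ignored positions rather than a sum.
import torch
from torch import nn

b, s, v = 2, 8, 125_696                  # batch, sequence length, vocab size
logits = torch.randn(b, s, v)
labels = torch.randint(0, v, (b, s))

shift_logits = logits[:, :-1, :].reshape(-1, v)  # predict token t+1 from prefix
shift_labels = labels[:, 1:].reshape(-1)
loss = nn.CrossEntropyLoss(ignore_index=-100)(shift_logits, shift_labels)
print(loss)                         # scalar: mean over the non-ignored tokens
print(logits.argmax(dim=-1).shape)  # greedy decoding argmaxes over v -> [b, s]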
Deploying the baichuan-13B-chat model errors out.
As per the title: will models larger than 13B be open-sourced later?
In the first place, thanks for creating and open sourcing the model!
While @bddppq and I were looking into running the model with a standard huggingface pipeline, we found a minor bug that prevents it from loading the model. Specifically, if you directly create a pipeline, for example using code like
pipeline = pipeline(task=task, model=model, revision=revision, **kwargs)
it will produce a pipeline, but when you run it, it tells you that it encountered a None object; if you look deeper, the tokenizer is missing.
Note that the tutorial code (using AutoModel and AutoTokenizer) is correct, but ideally one would want to use a single-line pipeline call to load the model.
Basically, when loading a pipeline and the underlying model, huggingface uses the following strategy to determine the tokenizer:
1. It checks whether the model type is in the TOKENIZER_MAPPING class, which is basically a manually maintained list here: https://github.com/huggingface/transformers/blob/fb7d246951d5f60aa36a7958841dfea72f51fc6b/src/transformers/models/auto/tokenization_auto.py#L403
2. If the above does not exist, it checks whether the model's config.json has a tokenizer_class field. See https://github.com/huggingface/transformers/blob/fb7d246951d5f60aa36a7958841dfea72f51fc6b/src/transformers/pipelines/__init__.py#L836 .
Right now, Baichuan does not have a TOKENIZER_MAPPING entry manually committed to the transformers code repo, and the model config.json does not have tokenizer_class defined. As a result, when loading the model via the pipeline interface, the tokenizer is simply not loaded.
Of course, one way is to manually send a pull request to huggingface/transformers and update the TOKENIZER_MAPPING. This comes with two shortcomings, though:
Thus, it is actually easier to simply add one line to the model config.json file (https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/config.json) as follows:
"tokenizer_class": "BaichuanTokenizer",
and everything will be all good.
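For reference, once tokenizer_class is in config.json, the single-line load would look roughly like this (a sketch; the task name and generation call are assumptions):
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="baichuan-inc/Baichuan2-7B-Chat",
    trust_remote_code=True,  # needed for the custom model and tokenizer classes
)
print(pipe("Hello, Baichuan!")[0]["generated_text"])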
Why do we care about this: as part of our work, we are automating the process of launching LLM models, and we would love to see Baichuan usable much more smoothly via the hf pipeline instead of having to write the Python code explicitly. Thanks so much for looking into it!