baichuan-inc / Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Home Page: https://huggingface.co/baichuan-inc
License: Apache License 2.0
I'm fine-tuning with the transformers Trainer class, and every time training reaches the eval step it errors out as follows:
AttributeError: Caught AttributeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/peft/peft_model.py", line 931, in forward
return self.base_model(
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 692, in forward
outputs = self.model(
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 404, in forward
alibi_mask = self.get_alibi_mask(inputs_embeds, seq_length_with_past)
File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 354, in get_alibi_mask
mask = self.future_mask[
File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaichuanModel' object has no attribute 'future_mask'
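One plausible reading of this traceback (an assumption, not an official diagnosis): the model creates future_mask lazily inside forward, while nn.DataParallel expects each replica to already carry every tensor it needs. A minimal sketch of the eager-registration mitigation, with a toy module standing in for BaichuanModel:
import torch
from torch import nn

class ToyModel(nn.Module):
    def __init__(self, max_positions: int = 16):
        super().__init__()
        # Register the causal mask eagerly in __init__ instead of lazily in
        # forward(), so every DataParallel replica starts out with it.
        self.register_buffer(
            "future_mask",
            torch.triu(
                torch.full((max_positions, max_positions), float("-inf")), 1
            ),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        n = scores.size(-1)
        return scores + self.future_mask[:n, :n]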
I then switched to LLaMA-Efficient-Tuning and fine-tuned baichuan2 the same way I had tuned baichuan1, using DeepSpeed; it likewise fails at the eval step, with:
AttributeError: 'Parameter' object has no attribute 'ds_status'
What could be causing this?
Exception in thread Thread-2:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 1648, in generate
return self.sample(
File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 2766, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
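A failure like this during sampling is usually easiest to localize by inspecting the raw logits before generate() is called. A minimal diagnostic sketch, assuming model and tokenizer are already loaded (fp16 overflow is a common culprit, though not the only one):
import torch

inputs = tokenizer("hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits  # [batch, seq, vocab]

# If either check prints True, the problem is upstream of sampling
# (e.g. fp16 overflow, corrupted weights, or a bad LoRA merge).
print("has nan:", torch.isnan(logits).any().item())
print("has inf:", torch.isinf(logits).any().item())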
Hi, is there text-based pretraining code for baichuan2? Will it be released? I'd like to do some pretraining before IFT.
What hardware configuration is roughly needed to fine-tune this model yourself, and roughly how long does it take?
Same inference code: I swapped the llama2-7b-chat checkpoint for baichuan2-7b-chat and immediately hit out-of-memory. Is there some trick I'm missing?
https://github.com/baichuan-inc/Baichuan2#%E9%87%8F%E5%8C%96%E6%95%88%E6%9E%9C says fp16 only needs 26 GB, so why do I get OOM on a V100-32G even with batch size 1? Is there some hidden catch?
1. GPU memory usage during training;
2. GPU memory usage during inference, and whether multi-GPU inference is supported;
3. GPU memory usage for quantized inference.
We evaluated the Baichuan2-7B-Base / Baichuan2-7B-Chat / Baichuan2-13B-Base / Baichuan2-13B-Chat models on OpenCompass; the results are as follows:
For more evaluation details, see https://opencompass.org.cn/leaderboard-llm
1. Python 3.10
2. Installed requirements.txt as instructed
3. git clone https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits
4. Modified the model-loading code in cli_demo.py:
model = AutoModelForCausalLM.from_pretrained(
    r"D:\2-huggingface\Baichuan2-13B-Chat-4bits",  # raw string so "\2" is not treated as an escape
    device_map="auto",
    trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(
    r"D:\2-huggingface\Baichuan2-13B-Chat-4bits"
)
tokenizer = AutoTokenizer.from_pretrained(
    r"D:\2-huggingface\Baichuan2-13B-Chat-4bits",
    use_fast=False,
    trust_remote_code=True
)
5. Running python cli_demo.py errors:
Exception has occurred: ImportError
Needs import model weight init func to run quantize.
File "C:\Users\WX.cache\huggingface\modules\transformers_modules\Baichuan2-13B-Chat-4bits\modeling_baichuan.py", line 606, in from_pretrained
from .quantizer import init_model_weight_int4
File "C:\Users\WX.cache\huggingface\modules\transformers_modules\Baichuan2-13B-Chat-4bits\quantizer.py", line 1, in
import bitsandbytes as bnb
ModuleNotFoundError: No module named 'scipy'
During handling of the above exception, another exception occurred:
File "C:\Users\WX.cache\huggingface\modules\transformers_modules\Baichuan2-13B-Chat-4bits\modeling_baichuan.py", line 611, in from_pretrained
raise ImportError(f"Needs import model weight init func to run quantize.")
File "D:\1-github\Baichuan2\cli_demo.py", line 13, in init_model
model = AutoModelForCausalLM.from_pretrained(
File "D:\1-github\Baichuan2\cli_demo.py", line 47, in main
model, tokenizer = init_model()
File "D:\1-github\Baichuan2\cli_demo.py", line 86, in
main()
ImportError: Needs import model weight init func to run quantize.
Help, everyone! Using baichuan-inc/Baichuan2-13B-Chat-4bits throws an error; the code was copied directly from HF.
Error message:
self.quant_state[0] = self.quant_state[0].cuda(device)
TypeError: 'NoneType' object is not subscriptable
As at https://huggingface.co/baichuan-inc/Baichuan2-13B-Base/blob/main/modeling_baichuan.py#L180,
should this be replaced with LowerTriangularMaskWithTensorBias?
A V100 32G now seems barely able to run baichuan2; it frequently reports OOM errors.
The vocabulary appears to have roughly doubled; is that what's driving up the memory usage?
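Back-of-envelope arithmetic suggests the larger vocabulary alone cannot explain the OOM. A quick sketch, under assumed figures (Baichuan 1 vocab 64,000, Baichuan 2 vocab 125,696, hidden size 5,120 for the 13B model, 2 bytes per fp16 value):
hidden = 5120                    # assumed 13B hidden size
for vocab in (64_000, 125_696):  # Baichuan 1 vs. Baichuan 2 vocabulary
    gib = vocab * hidden * 2 / 2**30
    print(f"vocab {vocab}: {gib:.2f} GiB per table (embedding or lm_head)")
# ~0.61 GiB vs. ~1.20 GiB per table: roughly +1.2 GiB across both tables,
# noticeable but far short of explaining a 32 GB OOM on its own.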
I didn't change the code at all; I only downloaded the model, renamed it, and saved it under baichuan-inc. Launching web_demo.py throws this error.
XTuner now supports single-GPU QLoRA fine-tuning of Baichuan2; feel free to join the WeChat group to discuss.
git clone https://github.com/internLM/xtuner
cd xtuner
pip install -e .
xtuner train configs/baichuan/baichuan2_7b_base/baichuan2_7b_base_qlora_alpaca_e3.py
# equivalent to:
# python xtuner/tools/train.py configs/baichuan/baichuan2_7b_base/baichuan2_7b_base_qlora_alpaca_e3.py
Multiple datasets such as alpaca, arxiv_gentitle, and codealpaca are supported out of the box, and the datasets are downloaded automatically.
May I ask how the C-Eval results were computed? They don't match the leaderboard.
We are a financial company. The commercial-license application requires a copy of the legal representative's ID card plus the business license. Is there any alternative? Our legal representative is the chairman, which makes this difficult right now. We emailed earlier but received no reply, so we're asking here.
Per the paper, the baichuan2 chat models went through an RLHF pipeline and collected data similar to hh_rlhf. Are there plans to open-source the RLHF data and training framework? Or could part of the reward-model training data be released first?
Since lm_head in modeling_baichuan.py is not an instance of <class 'torch.nn.modules.linear.Linear'>, running model.resize_token_embeddings() raises an error, so new tokens cannot be added to the tokenizer.
TypeError: Old language model head is of type <class 'transformers_modules.13B.modeling_baichuan.NormHead'>, which is not an instance of <class 'torch.nn.modules.linear.Linear'>.
You should either use a different resize function or make sure that `old_lm_head` are an instance of <class 'torch.nn.modules.linear.Linear'>.
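Until resize_token_embeddings understands NormHead, one hypothetical workaround is to grow the head's weight matrix by hand. A hedged sketch (it assumes NormHead stores weight as (vocab_size, hidden_size), as in the snippet quoted further down this page; the init for the new rows is a guess):
import torch
from torch import nn

def resize_norm_head(head: nn.Module, new_vocab_size: int) -> nn.Module:
    old_vocab, hidden = head.weight.shape
    new_weight = torch.empty(
        new_vocab_size, hidden,
        dtype=head.weight.dtype, device=head.weight.device,
    )
    new_weight[:old_vocab] = head.weight.data          # keep the trained rows
    nn.init.normal_(new_weight[old_vocab:], std=0.02)  # assumed init for new tokens
    head.weight = nn.Parameter(new_weight)
    return head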
CUDA SETUP: Loading binary /root/anaconda3/envs/llama2/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Traceback (most recent call last):
File "/usr/local/Baichuan2/cli_demo.py", line 88, in
main()
File "/usr/local/Baichuan2/cli_demo.py", line 49, in main
model, tokenizer = init_model()
File "/usr/local/Baichuan2/cli_demo.py", line 14, in init_model
model = AutoModelForCausalLM.from_pretrained(
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
return model_class.from_pretrained(
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat-4bits/modeling_baichuan.py", line 664, in from_pretrained
dispatch_model(model, device_map=device_map)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/big_modeling.py", line 371, in dispatch_model
attach_align_device_hook_on_blocks(
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 507, in attach_align_device_hook_on_blocks
attach_execution_device_hook(module, execution_device[module_name])
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in attach_execution_device_hook
attach_execution_device_hook(child, execution_device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in attach_execution_device_hook
attach_execution_device_hook(child, execution_device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in attach_execution_device_hook
attach_execution_device_hook(child, execution_device)
[Previous line repeated 2 more times]
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 340, in attach_execution_device_hook
add_hook_to_module(module, AlignDevicesHook(execution_device, skip_keys=skip_keys))
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 155, in add_hook_to_module
module = hook.init_hook(module)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/hooks.py", line 253, in init_hook
set_module_tensor_to_device(module, name, self.execution_device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 292, in set_module_tensor_to_device
new_value = old_value.to(device)
File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 191, in to
s[-2][0] = s[-2][0].to(device) # offset
AttributeError: 'str' object has no attribute 'to'
Thanks.
Looking at https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/670d17ee403f45334f53121d72feff623cc37de1/config.json#L24,
the weight files are bfloat16, but here:
Line 15 in 93543b9
The quantized 4-bit model does not seem to infer any faster than the unquantized one; could you explain why?
The README only compares GPU memory and accuracy; could you also provide an inference-speed comparison?
Can baichuan2 still be fine-tuned with the llama-efficient-tuning project?
import math

import torch
from torch import nn


class NormHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.first_flag = True

    def forward(self, hidden_states):
        if self.training:
            norm_weight = nn.functional.normalize(self.weight)
        elif self.first_flag:
            # On the first eval forward, bake the normalized weight into the
            # parameter so later eval calls can skip the normalization.
            self.first_flag = False
            self.weight = nn.Parameter(nn.functional.normalize(self.weight))
            norm_weight = self.weight
        else:
            norm_weight = self.weight
        return nn.functional.linear(hidden_states, norm_weight)
1. During eval the weight gets reassigned; if I continue training after an eval, won't that cause problems?
2. Is creating the new weight during eval just a speed optimization? Could both train and eval simply use norm_weight = nn.functional.normalize(self.weight) with no ill effects?
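For reference, a sketch of the always-normalize variant the second question proposes (an assumption about its equivalence, not the official implementation): it keeps train and eval numerically consistent at the cost of one extra normalize per forward, and sidesteps the resume-after-eval question entirely.
import math
import torch
from torch import nn

class AlwaysNormHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(vocab_size, hidden_size))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Normalize on every call, in train and eval alike, instead of
        # reassigning self.weight during the first eval forward.
        return nn.functional.linear(
            hidden_states, nn.functional.normalize(self.weight)
        )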
Just ran web_demo.py and got this error: OSError: /media/calvin/new_disk1/models/Baichuan2-7B-Base does not appear to have a file named generation_config.json. Checkout 'https://huggingface.co//media/calvin/new_disk1/models/Baichuan2-7B-Base/None' for available files.
Checking Hugging Face, generation_config.json is indeed missing. Was it not uploaded?
In the technical report, the baichuan2-13B evaluations at different steps show a second abrupt jump in benchmark performance at around 1000B pretraining tokens (unlike the initial phase, where scores oscillate around 25% before rising). Is this related to the model scale?
Fine-tuning on top of the quantized model errors out:
The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you installed bitsandbytes>=0.37.0
But I have confirmed that bitsandbytes>=0.37.0 is installed.
register_conv_template(
    Conversation(
        name="baichuan-chat",
        roles=("<reserved_102>", "<reserved_103>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
        stop_token_ids=[],
    )
)
But looking at baichuan2, shouldn't roles be changed to the following?
roles=("<reserved_106>", "<reserved_107>")
>>> model.generation_config.user_token_id
195
>>> model.generation_config.assistant_token_id
196
>>> tokenizer.decode([195])
'<reserved_106>'
>>> tokenizer.decode([196])
'<reserved_107>'
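Going purely by that REPL output (an inferred convention, not an official API), a chat prompt could be assembled like this:
def build_prompt_ids(tokenizer, messages, user_id=195, assistant_id=196):
    # user_id/assistant_id are the reserved-token ids shown in the REPL above
    ids = []
    for msg in messages:
        role_id = user_id if msg["role"] == "user" else assistant_id
        ids += [role_id] + tokenizer.encode(msg["content"])
    return ids + [assistant_id]  # trailing assistant token cues the model to reply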
After changing float16 to float32, it reports that there is no GPU device:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Is there any llama.cpp-style acceleration? The existing libraries don't seem to support accelerating baichuan2.
The technical report "Baichuan 2: Open Large-scale Language Models" mentions the Baichuan Harmless Evaluation Dataset; will this dataset be open-sourced?
The code:
model = AutoModelForCausalLM.from_pretrained(r".\Baichuan2-13B-Chat",
    load_in_8bit=True, device_map="auto", trust_remote_code=True)
model.save_pretrained(r'.\8bit')
model = AutoModelForCausalLM.from_pretrained(r'.\8bit', device_map="auto", trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(r".\Baichuan2-13B-Chat")
tokenizer = AutoTokenizer.from_pretrained(r".\Baichuan2-13B-Chat",
    use_fast=False, trust_remote_code=True)
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})
response = model.chat(tokenizer, messages)
print(response)
The error:
in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2676 # sample
2677 probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2678 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
2680 # finished sentences should have their next token be a padding token
2681 if eos_token_id is not None:
RuntimeError: probability tensor contains either inf, nan or element < 0
Thank you very much for your contributions!
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("pretrained/baichuan2-7b/base", load_in_8bit=True, device_map="auto", trust_remote_code=True)
[2023-09-06 16:03:27,691] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-09-06 16:03:28.999241: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 16:03:30.314803: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/liutianwei/.conda/envs/starwhale/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
return model_class.from_pretrained(
File "/home/liutianwei/.cache/huggingface/modules/transformers_modules/base/modeling_baichuan.py", line 779, in from_pretrained
return super(BaichuanForCausalLM, cls).from_pretrained(
File "/home/liutianwei/.conda/envs/starwhale/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/liutianwei/.cache/huggingface/modules/transformers_modules/base/modeling_baichuan.py", line 638, in __init__
and config.quantization_config["load_in_4bit"]
TypeError: 'BitsAndBytesConfig' object is not subscriptable
>>>
Baichuan2 7b-base, 7b-chat, 13b-base, and 13b-chat all throw this error.
https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/modeling_baichuan.py#L537 needs to check the type of config.quantization_config.
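A minimal sketch of the kind of guard being suggested (hypothetical, not an official patch): accept either a plain dict or a BitsAndBytesConfig before indexing, with config standing in for the loaded model config.
from transformers import BitsAndBytesConfig

quant_config = getattr(config, "quantization_config", None)
if isinstance(quant_config, dict):
    load_in_4bit = quant_config.get("load_in_4bit", False)
elif isinstance(quant_config, BitsAndBytesConfig):
    load_in_4bit = getattr(quant_config, "load_in_4bit", False)
else:
    load_in_4bit = False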
I'd like to know why eos_token_id is added to labels here instead of ignore_index:
if from_ == "human":
    input_ids += self.user_tokens + value_ids
    labels += [self.tokenizer.eos_token_id] + [self.ignore_index] * len(
        value_ids
    )
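One plausible reading (an interpretation, not confirmed by the authors): with the usual HF label alignment, putting eos_token_id at the position of the user role token trains the model to emit EOS exactly where the next human turn would begin, i.e. to stop after finishing its reply, while the human content itself is masked out. A toy illustration with made-up ids:
eos_id, ignore_index = 2, -100               # made-up ids; -100 is the HF mask value
user_tokens, value_ids = [195], [11, 12, 13]
labels = [eos_id] + [ignore_index] * len(value_ids)
print(labels)  # [2, -100, -100, -100]: EOS is supervised, human tokens are not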
I understand the logits output as [b, s, v] (b: batch size, s: sequence length, v: vocab size).
Is the max taken over the s dimension or the v dimension? And is the reduction a sum or a mean?
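A minimal sketch of the standard HF causal-LM objective (assuming Baichuan2 follows this convention): the softmax/argmax runs over the vocab dimension v, and CrossEntropyLoss defaults to a mean over non-ignored positions rather than a sum.
import torch
from torch import nn

b, s, v = 2, 8, 125_696                  # batch, sequence length, vocab size
logits = torch.randn(b, s, v)
labels = torch.randint(0, v, (b, s))

shift_logits = logits[:, :-1, :].reshape(-1, v)  # predict token t+1 from prefix
shift_labels = labels[:, 1:].reshape(-1)
loss = nn.CrossEntropyLoss(ignore_index=-100)(shift_logits, shift_labels)
print(loss)                         # scalar: mean over the non-ignored tokens
print(logits.argmax(dim=-1).shape)  # greedy decoding argmaxes over v -> [b, s]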
Deploying the baichuan-13B-chat model errors out.
As per the title: will models larger than 13B be open-sourced later?
In the first place, thanks for creating and open sourcing the model!
While @bddppq and I were looking into running the model with a standard huggingface pipeline, we found a minor bug that prevents it from loading the model. Specifically, if you directly create a pipeline, for example using code like
pipeline = pipeline(task=task, model=model, revision=revision, **kwargs)
it will produce a pipeline, but when you run it, it tells you that it encountered a None object; if you look deeper, the tokenizer is missing.
Note that the tutorial code (using AutoModel and AutoTokenizer) is correct, but ideally one would want to use a single-line pipeline call to load the model.
Basically, when loading a pipeline and the underlying model, huggingface uses the following strategy to determine the tokenizer:
1. It checks whether the model type is in the TOKENIZER_MAPPING class, which is basically a manually maintained list here: https://github.com/huggingface/transformers/blob/fb7d246951d5f60aa36a7958841dfea72f51fc6b/src/transformers/models/auto/tokenization_auto.py#L403
2. If the above does not exist, it checks whether the model's config.json has a tokenizer_class field. See https://github.com/huggingface/transformers/blob/fb7d246951d5f60aa36a7958841dfea72f51fc6b/src/transformers/pipelines/__init__.py#L836 .
Right now, Baichuan does not have a TOKENIZER_MAPPING entry manually committed to the transformers code repo, and the model config.json does not have tokenizer_class defined. As a result, when loading the model via the pipeline interface, the tokenizer is simply not loaded.
Of course, one way is to manually send a pull request to huggingface/transformers and update the TOKENIZER_MAPPING. This comes with two shortcomings, though:
Thus, it is actually easier to simply add one line to the model config.json file (https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/config.json) as follows:
"tokenizer_class": "BaichuanTokenizer",
and everything will be all good.
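For reference, once tokenizer_class is in config.json, the single-line load would look roughly like this (a sketch; the task name and generation call are assumptions):
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="baichuan-inc/Baichuan2-7B-Chat",
    trust_remote_code=True,  # needed for the custom model and tokenizer classes
)
print(pipe("Hello, Baichuan!")[0]["generated_text"])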
Why do we care about this: as part of our work, we are automating the process of launching LLM models, and we would love to see Baichuan usable much more smoothly via the hf pipeline instead of having to write the Python code explicitly. Thanks so much for looking into it!