llamafamily / llama-chinese Goto Github PK

View Code? Open in Web Editor NEW

13.0K 13.0K 1.2K 19.09 MB

Llama中文社区，Llama3在线体验和微调模型已开放，实时汇总最新Llama3学习资料，已将所有代码更新适配Llama3，构建最好的中文Llama大模型，完全开源可商用

Home Page: https://llama.family

Python 94.48% Shell 3.62% Dockerfile 0.62% Jupyter Notebook 1.29%

finetune-llm llama llama3 llm pretraining

llama-chinese's People

Contributors

Stargazers

Watchers

Forkers

cvcuiwei ylsir rayrtfr joshill staccats liucr lycokie tutuna 0x8235 mistyr0se nicbair adambear tttonytan hannah-ymc wuzerun-888 iffy-oo yqnt418 leing2021 henryhesz zhougx88 gitcodev shuanlotus sxm1129 icecube0-0 600ml servucn sylar003 roclee81 geekcheng 1156721874 skyrookieyu lostmanwang shunligo xiyuan-code revavo flyforfreedom lylyone kaye0110 chrisyang2017 linhong00316 tim-taoxq gooqi zomens sevenchao iamkomen itsharex justloveben kurtding feihuamantian so349mng huzp123 fdkl123 tiantianlecheng lidachuan211 tony163163 williamfangca 1530426574 yangguangcccaa saxonzhang2 souloki mrdaoyuan zhanfish taurusduan cryptovbl chenghongyun tufubaba eltociear liolio202070 sojjuu jackoelv ole-e-ole masemxiao maga315 jbluv techthiyanes cryptoman0463636 hyejdy hongdangshao fiyo nuoan ouyangchucai yuanmeng1120 unixcrh sunfj totoro-li erickong2012 web3creator stormdragongardin junking1 acondess liyandan fengzhongye nycleaner mustangcoder vvgjlshn5xk liunix61 maoyikun nanchengjiatu dannyshcn liuhao-0666

llama-chinese's Issues

请问怎么将模型与langchain-ChatGLM整合

请问怎么将代码放进langchain-ChatGLM里面呢，配置到model_config.py文件里面启动，目前是不兼容的

module 'bitsandbytes' has no attribute 'nn'

在windows上运行
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('model/Llama-2-7b-chat-hf',device_map='auto',torch_dtype=torch.float16,load_in_8bit=True)
model =model.eval()
tokenizer = AutoTokenizer.from_pretrained('model/Llama-2-7b-chat-hf',use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer(['~~Human: 介绍一下**\n~~Assistant: '], return_tensors="pt",add_special_tokens=False).input_ids.to('cuda')
generate_input = {
"input_ids":input_ids,
"max_new_tokens":512,
"do_sample":True,
"top_k":50,
"top_p":0.95,
"temperature":0.3,
"repetition_penalty":1.3,
"eos_token_id":tokenizer.eos_token_id,
"bos_token_id":tokenizer.bos_token_id,
"pad_token_id":tokenizer.pad_token_id
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)

出现错误
Traceback (most recent call last):
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\test.py", line 3, in
model = AutoModelForCausalLM.from_pretrained('model/Llama-2-7b-chat-hf',device_map='auto',torch_dtype=torch.float16,load_in_8bit=True)
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\modeling_utils.py", line 2749, in from_pretrained
model = replace_with_bnb_linear(
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 212, in replace_with_bnb_linear
model, has_been_replaced = _replace_with_bnb_linear(
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
_, has_been_replaced = _replace_with_bnb_linear(
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
_, has_been_replaced = _replace_with_bnb_linear(
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 173, in _replace_with_bnb_linear
_, has_been_replaced = _replace_with_bnb_linear(
[Previous line repeated 1 more time]
File "C:\Users\Hasee\Desktop\Llama2-Chinese-main\venv\lib\site-packages\transformers\utils\bitsandbytes.py", line 144, in _replace_with_bnb_linear
model._modules[name] = bnb.nn.Linear8bitLt(
AttributeError: module 'bitsandbytes' has no attribute 'nn'

3个群都满200人了，想入群

使用 4bit 參數報錯

command: python chat_gradio.py --model_name_or_path meta-llama/Llama-2-7b-chat-hf --is_4bit
error msg:
Traceback (most recent call last):
File "chat_gradio.py", line 90, in
model = AutoGPTQForCausalLM.from_quantized(args.model_name_or_path, low_cpu_mem_usage=True, device="cuda:0", use_triton=False, inject_fused_attention=False, inject_fused_mlp=False)
File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/auto.py", line 105, in from_quantized
return quant_func(
File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 734, in from_quantized
quantize_config = BaseQuantizeConfig.from_pretrained(model_name_or_path, **kwargs)
File "/home/asus/llama2/.venv/lib/python3.8/site-packages/auto_gptq/modeling/_base.py", line 90, in from_pretrained
with open(resolved_config_file, "r", encoding="utf-8") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

OSError: No such device (os error 19)

from transformers import AutoTokenizer, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf',device_map='auto',torch_dtype=torch.float16,load_in_8bit=True)
model =model.eval()
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf',use_fast=False)
tokenizer.pad_token = tokenizer.eos_token
input_ids = tokenizer( [' \<s\>Human: 介绍一下**\n</s><s>Assistant: '], return_tensors="pt",add_special_tokens=False).input_ids.to('cuda')        
generate_input = {
    "input_ids":input_ids,
    "max_new_tokens":512,
    "do_sample":True,
    "top_k":50,
    "top_p":0.95,
    "temperature":0.3,
    "repetition_penalty":1.3,
    "eos_token_id":tokenizer.eos_token_id,
    "bos_token_id":tokenizer.bos_token_id,
    "pad_token_id":tokenizer.pad_token_id
}
generate_ids  = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)

按照案例执行，把“meta-llama/Llama-2-7b-chat-hf”改成自己的目录，会报错
model │
│ ing_utils.py:447 in load_state_dict │
│ │
│ 444 │ """ │
│ 445 │ if checkpoint_file.endswith(".safetensors") and is_safetensors_available(): │
│ 446 │ │ # Check format of the archive │
│ ❱ 447 │ │ with safe_open(checkpoint_file, framework="pt") as f: │
│ 448 │ │ │ metadata = f.metadata() │
│ 449 │ │ if metadata.get("format") not in ["pt", "tf", "flax"]: │
│ 450 │ │ │ raise OSError( │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: No such device (os error 19)

请问是为啥？

预训练的代码会开源吗

增量训练的也行，期待能早日上线，跑一个自己的base model

训练所需资源？7b-chat，lora

如题，llama2-7b-chat，lora微调所需资源大概是？

FlagAlpha/Llama2-Chinese-13b-Chat-4bit模型需要什么配置可以运行

请问我尝试使用colab运行示例量化程序，但因内存问题无法启动，这是我的使用的示例程序

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
model = AutoGPTQForCausalLM.from_quantized('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained('FlagAlpha/Llama2-Chinese-13b-Chat-4bit',use_fast=False)
input_ids = tokenizer(['<s>Human: 怎么登上火星\n</s><s>Assistant: '], return_tensors="pt",add_special_tokens=False).input_ids.to('cuda')        
generate_input = {
    "input_ids":input_ids,
    "max_new_tokens":512,
    "do_sample":True,
    "top_k":50,
    "top_p":0.95,
    "temperature":0.3,
    "repetition_penalty":1.3,
    "eos_token_id":tokenizer.eos_token_id,
    "bos_token_id":tokenizer.bos_token_id,
    "pad_token_id":tokenizer.pad_token_id
}
generate_ids  = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)

colab分配的配置为12.7G内存 T4显卡

感谢开发者开源，想问一下 huggingface 上两个模型 https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat, https://huggingface.co/FlagAlpha/Llama2-Chinese-7b-Chat finetune 使用的中文数据集是哪些呢？

感谢开发者开源，想问一下 huggingface 上两个模型 https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat, https://huggingface.co/FlagAlpha/Llama2-Chinese-7b-Chat finetune 使用的中文数据集是哪些呢？

Llama2-Chinese 13B以及其他模型在CEVAL表现怎麽样

有没有CEVAL的结果了?

bitsandbytes库始终提示You might need to add them to your LD_LIBRARY_PATH.

如图
环境安装好了，但运行始终报错
bitsandbytes库始终提示You might need to add them to your LD_LIBRARY_PATH.
目前系统环境变量已经把LD_LIBRARY_PATH和本地CUDA lib目录添加了，可始终报这个错无法启动

模型效果与chatglm2对比如何？

如题，谢谢！

Llama-2-13b-chat-hf模型好像有点问题

请问，方便检查一下您分享的Llama-2-13b-chat-hf模型吗？我发现Llama-2-13b-chat-hf和Llama-2-13b-hf的模型权值文件的sha256是一样的。

建个 qq 大群吧, 微信群人数上限500 太少了，而且超过200 还不能主动加入

请问下，微调后的合并、推理有示例吗？

cpu运行，加速

开源的13B模型和网站上体验的demo感觉完全不一样

https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat 下载了这个模型，在本地测试了一下他的写诗能力，发现和在线网站上的demo （在线体验：llama.family）完全不一样，请问是什么原因。
本地：

在线：

模型调用示例代码prompt格式不对

参考 https://github.com/facebookresearch/llama/blob/main/llama/generation.py
https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat/blob/main/model.py

如果我想训练llama2cn写小说，应该如何组织训练集？

比如，提示词是【请续写以下句子】

问题是提示词加上上一句，回答是下一句，这样子吗？

能否提供sha256

下载了国内地址的7b模型后, 用最新的llama.cpp convert会出错, 怀疑是下载过程中出错了, 能否提供下各个模型的sha256 方便对比

请问推理时如何batch样本预测

在使用13b模型预测结果时，单条样本运行时间较长，请问如何batch预测结果

有做中文词表扩充吗

请问为什么我部署的LLaMa2-chinese-13b-chat的输出结果和您这边在线提供的LLama-family上差距很大？

怎样quantize？

https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat/tree/main
如果我的本机用llama.cpp 跑是不是还得自己quantize？

练了个不错的中文llama2 13b chat模型

llama2-13b-Chinese-chat

昨天练的，体验对话效果不错，方便的话可以记录一下。

安装 Deepspeed时，Unable to pre-compile async_io

Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
No hostfile exists at /job/hostfile, installing locally
Building deepspeed wheel
test.c
LINK : fatal error LNK1181: 无法打开输入文件“aio.lib”
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
Traceback (most recent call last):
File "C:\Llama2-Chinese\DeepSpeed\setup.py", line 165, in
abort(f"Unable to pre-compile {op_name}")
File "C:\Llama2-Chinese\DeepSpeed\setup.py", line 51, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
Error on line 155
Fail to install deepspeed

RuntimeError: The expanded size of the tensor (768) must match the existing size (2048) at non-singleton dimension 0.  Target sizes: [768, 8192].  Tensor sizes: [2048, 8192]
 #033[2m#033[3mrank#033[0m#033[2m=#033[0m3#033[0m
#033[2m2023-07-21T07:52:53.999740Z#033[0m #033[31mERROR#033[0m #033[2mtext_generation_launcher#033[0m#033[2m:#033[0m Shard 2 failed to start:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 237, in get_model
    return FlashLlamaSharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 185, in __init__
    self.load_weights(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 289, in load_weights
    module._parameters[param_name][: tensor.shape[0]] = tensor
RuntimeError: The expanded size of the tensor (768) must match the existing size (2048) at non-singleton dimension 0.  Target sizes: [768, 8192].  Tensor sizes: [2048, 8192]
#033[2m2023-07-21T07:52:53.999793Z#033[0m #033[32m INFO#033[0m #033[2mtext_generation_launcher#033[0m#033[2m:#033[0m Shutting down shards
#033[2m2023-07-21T07:52:54.729163Z#033[0m #033[32m INFO#033[0m #033[2mtext_generation_launcher#033[0m#033[2m:#033[0m Shard 3 terminated
Error: ShardCannotStart

$ pip install -r requirements.txt 报错

$ pip install -r requirements.txt
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, http://pypi.mirrors.ustc.edu.cn/simple/
Collecting git+https://github.com/PanQiWei/AutoGPTQ.git (from -r requirements.txt (line 4))
Cloning https://github.com/PanQiWei/AutoGPTQ.git to c:\users\admin\appdata\local\temp\pip-req-build-2gh473wc
Running command git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc'
fatal: unable to access 'https://github.com/PanQiWei/AutoGPTQ.git/': Recv failure: Connection was reset
fatal: could not fetch d047af6e8e361b71bb7a5b915a8c9cff4f00f1e9 from promisor remote
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

error: subprocess-exited-with-error

git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc' did not run successfully.
exit code: 128

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git 'C:\Users\Admin\AppData\Local\Temp\pip-req-build-2gh473wc' did not run successfully.
exit code: 128

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

（1）请问一下几种类型的数据集之前类型差异巨大，如何去保证训练的模型泛化性不被降低？
（2）这边针对README中的各类数据集预处理方法可以详细说明一下吗？
感谢！