facico / chinese-vicuna



Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese LLaMA+LoRA solution, with a structure modeled on Alpaca)

Home Page: https://github.com/Facico/Chinese-Vicuna

License: Apache License 2.0

Python 37.01% Shell 1.51% CMake 0.63% C++ 10.65% C 50.19%

llama alpaca chinese vicuna

chinese-vicuna's Introduction

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese LLaMA+LoRA solution)

| English | 中文 (Chinese) | NOTE & FAQ (please read before using)

This is the repo for the Chinese-Vicuna project, which aims to build and share tuning methods for instruction-following Chinese LLaMA models that can be trained on a single Nvidia RTX-2080Ti, as well as a multi-round chatbot that can be trained on a single Nvidia RTX-3090 with a context length of 2048.

Why is it called Vicuna? Following the success of camelid-named models such as LLaMA, Alpaca, and Guanaco, we want to train a small Chinese alpaca like Vicuna: small, but strong enough!

The advantages of our solution are high parameter efficiency, graphics card friendliness, and easy deployment:

Llama-7B instruction tuning is possible on a 2080Ti (11G) (7b-instruct)
Llama-13B instruction tuning is possible on a 3090 (24G) (13b-instruct)
LLaMA-7B can be fine-tuned on a 3090 even for conversations of length 2048; 50,000 samples are enough to get good results (chatv1)
LLaMA-7B fine-tuning examples in the medical and legal domains
Supports QLoRA 4-bit, which can train LLaMA-13B on a 2080Ti
Easily deployable on a 2080Ti/3090, with multi-GPU inference support that further reduces per-GPU VRAM usage

The repo contains:

code for fine-tuning the model
code for generation with a trained model
code for running on CPU in pure C++ (fp16 and int4 are supported)
tools to download/convert/quantize the original Facebook llama.ckpt

This is our instruction demo (with beam size 4, so you will see 4 outputs being generated at the same time):

tmp.mp4

This is our multi-turn instruction demo (with beam size 4, so you will see 4 outputs being generated at the same time):

tmp.mp4

NOTICE!

Before asking questions, please read this FAQ first! It explains how to solve problems you may encounter when installing and using this project.

What's New

June 12, 2023: Released Chinese-Vicuna-4bit and Chinese-Vicuna-4bit-11600, which can be further fine-tuned
June 1, 2023: Added support for 4-bit training and inference, plus a multi-GPU inference interface (NOTE: the environment differs from the original 8-bit one! We also provide test_tokenizers.py to further check the EOS token)
May 17, 2023: Added a LLaMA-7B fine-tuning example in the legal domain; its performance is here
May 10, 2023: Released chatv1, which has better conversational ability; its performance is here
May 10, 2023: Released instruct_chat_50k.jsonl, composed of 30K samples from a Chinese ShareGPT dataset and 20K from alpaca-instruction-Chinese-dataset
April 11, 2023: Released Chinese-Vicuna-medical, our continued fine-tuning on a vertical corpus of Chinese medical QA, which also serves as an example of vertical-corpus training
April 4, 2023: Added performance results for 13B, trained on a single 3090
April 1, 2023: Improved multi-turn chat support in chat.py (it now supports 4 generation modes with streaming/typewriter-style output: beam search, greedy, sample, and beam sample; we also added a cancel button for regeneration)
March 29, 2023: Added more detailed test samples (see performance)
March 29, 2023: Added a checkpoint-resume interface that supports continued training on other datasets from our checkpoint
March 29, 2023: Released our new 13B-based LoRA model
March 28, 2023: Released our model on Hugging Face
March 27, 2023: Released checkpoint-final, trained for 3 epochs on belle+guanaco
March 27, 2023: Added a multi-round interactive dialog script using the alpaca-lora-serve service
March 29, 2023: Added Gradio typewriter-like output with beam search and better user interaction support
March 26, 2023: Provided a quantization approach
March 24, 2023: Released checkpoint-8000, trained for about 1.5 epochs on belle+guanaco (1M samples)
March 23, 2023: Released checkpoint-4000, trained on 500K samples
March 23, 2023: Deployed the fine-tuning and inference code on Colab
March 23, 2023: Provided code for inference in pure C++

Vicuna

What's new
What is the meaning?
Try on Colab
Performance
- Checkpoint-4000 (Facico/Chinese-Vicuna-lora-7b-0.75epoch-belle-and-guanaco)
- Checkpoint-8000 (Facico/Chinese-Vicuna-lora-7b-1.5epoch-belle-and-guanaco)
- Checkpoint-final (Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco), also used for multi-round dialogue
What we need?
- code, data, large language model, LoRA model, device
How to use
- Installation, multi-GPU training, single-GPU training, inference with a Gradio web page (streaming mode + beam search), multi-round interaction with a Gradio web page (streaming mode + beam search), streaming mode based on alpaca-lora-serve
Inference on CPU with pure C++
More tools (for more details, see the tool readme)
- faster weight download (8MB/s): download_llama.sh
- conversion between the original Facebook checkpoint and the Hugging Face format: convert_llama.py
- a quantization approach that requires less than 4G of GPU memory for inference
Possible problems encountered
Todo
Citation

Overview

LLaMA paper: https://arxiv.org/abs/2302.13971v1
Self-Instruct paper: https://arxiv.org/abs/2212.10560
data generation: https://github.com/LianjiaTech/BELLE and https://guanaco-model.github.io/
the first work: https://github.com/tatsu-lab/stanford_alpaca

We currently use the combination of BELLE and Guanaco data as our main training dataset. We will also train on multi-turn instruction data.

What is the meaning?

Following the boom of Stable Diffusion, platforms such as Civitai have emerged, where an open-source community shares a base model plus many LoRA models.

This repo aims to help you train such LoRA models.

What is LoRA? Simply put, it is a plugin that adapts a large model to your dataset; technical details can be found in LoRA: Low-Rank Adaptation of Large Language Models. Its advantages are that fine-tuning is very fast, the resulting model is small (about 30M), and, crucially, it is plug-and-play. This makes it a very suitable architecture for an open-source ecosystem.

Here we show how to train in a very low-resource setting: a single 2080Ti (11G) is enough to get reasonable results.
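As a concrete illustration, here is a minimal pure-Python sketch of a LoRA-adapted linear layer following the formulation in the LoRA paper: the frozen weight W gets a trainable low-rank update scaled by alpha/r, and B starts at zero so the adapted layer initially behaves exactly like the base layer. It also counts adapter parameters, since only A and B are trained. The class and function names are our own, and the setting in the usage lines (width 4096, 32 layers, LoRA on q_proj and v_proj, r=8) is an assumed alpaca-lora-style configuration for LLaMA-7B, not one stated in this README.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight                    # frozen base weight, shape (d_out, d_in)
        self.scaling = alpha / r
        # A: small Gaussian values (r x d_in); B: zeros (d_out x r), so B @ A == 0 at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))   # rank-r path: d_in -> r -> d_out
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_in: int, d_out: int, r: int, n_matrices: int) -> int:
    """Each adapted (d_out x d_in) matrix adds r * (d_in + d_out) trainable parameters."""
    return n_matrices * r * (d_in + d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))  # [3.0, 7.0], same as W x because B starts at zero
    # Assumed LLaMA-7B-like setting: width 4096, 32 layers x (q_proj, v_proj), r=8.
    n = lora_param_count(4096, 4096, r=8, n_matrices=32 * 2)
    print(n, "trainable params, about", n * 4 / 1e6, "MB in fp32")
```

Under those assumptions the adapter has about 4.2M parameters (about 16.8 MB in fp32); the size of a released adapter depends on r, the target modules, and the storage dtype.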

Try on colab

Colab link	Description
	Use the specified lora model to inference and generate a webui
	Finetune with the data we collected
	Provide a webui for multi-round dialogue interaction

Performance

7B performance can be seen here and 13B performance here (trained on a single 3090). Some users have also reported that our method works on 33B with good results.

Performance of earlier checkpoints can be seen here

checkpoint-4000(7B)

Human	Chinese-Vicuna
生成一份python的排序代码	A: def sort_list(lst): lst.sort() return lst
自然语言处理是什么	自然语言处理是一种计算机科学领域，旨在使计算机能够理解和处理自然语言。
能给我讲一段笑话吗	为什么猫吃鱼？因为它们喜欢游泳。
将下面文字翻译成英文。猫喜欢吃鱼，所以它们会游泳	Cats like to eat fish, so they will swim.
233和122的和等于多少	233和122的和等于355。
华为是一家怎样的公司	华为是一家全球范围内最大的移动通信设备制造商之一，以及一家全球范围内最大的移动通信设备制造商之一。华为公司的业务涵盖包括移动通信设备、互联网设备、智能家居设备等。
我想和女朋友在北京约会，能推荐几个约会的地方吗	北京有很多约会的地方，可以推荐以下几个约会的地方: 1. 北京艺术博物馆 2. 颐和园 3. 北京故宫博物馆 4. 北京***广场 5. 颐和园风景区
用python写一个二分查找的代码	```python def binary_search(arr, target): low = 0 high = len(arr) - 1 while low <= high: mid = (low + high) // 2 if arr[mid] == target: return mid elif arr[mid] < target: low = mid + 1 else: high = mid - 1 return -1
怎么让自己精力充沛，列5点建议	1. 保持良好的睡眠习惯；2. 定期锻炼身体；3. 学习新技能或知识；4. 与家人和朋友交流；5. 多听放松音乐。
父母都姓吴，取一些男宝宝和女宝宝的名字	男宝宝可以是吴杰、吴伟、吴飞、吴荣等。女宝宝可以是吴华、吴秀、吴美、吴婷等。

This LoRA model (checkpoint-4000) is the result of 30 hours of training on a 2080Ti with about 500K samples

checkpoint-8000 (7B, about 1.5 epochs on belle+guanaco)

Human	Chinese-Vicuna	Note
写一篇英文散文诗，主题是春雨，想象自己是春雨，和英国古代诗人莎士比亚交流	A: Spring rain falls softly on my face, A cool breeze blows through my hair, The sun shines brightly in the sky, And I feel as if I am flying high. The birds chirp happily in the trees, The flowers bloom and their scent fills the air, The bees buzz gently from flower to flower, And the world seems to come alive once more. I can hear the rush of the river, The sound of the waterfalls, The splash of the raindrops on the rocks, And the gentle murmur of the stream. My heart is filled with joy and peace, As I take in the beauty around me, The warmth of the sun on my skin, And the freshness of the spring air.	Repetition Penalty=2.2
写一篇英文散文诗，主题是春雨，想象自己是春雨，和英国古代诗人莎士比亚交流	春雨滴滴滚滚，滚滚滚滚滚滚。滚滚滚滚滚滚，滚滚滚滚滚滚。春雨滚滚滚，滚滚滚滚滚滚。滚滚滚滚滚滚，滚滚滚滚滚滚。春雨滚滚滚，滚滚滚滚滚滚。滚滚滚滚滚滚，滚滚滚滚滚滚。春雨滚滚滚，滚滚滚	Repetition Penalty=1.0
Spring rain falls softly on my face, A cool breeze blows through my hair, The sun shines brightly in the sky, And I feel as if I am flying high. The birds chirp happily in the trees, The flowers bloom and their scent fills the air, The bees buzz gently from flower to flower, And the world seems to come alive once more. I can hear the rush of the river, The sound of the waterfalls, The splash of the raindrops on the rocks, And the gentle murmur of the stream. My heart is filled with joy and peace, As I take in the beauty around me, The warmth of the sun on my skin, And the freshness of the spring air. 把上面的散文诗翻译成中文	春雨轻轻洒落在我的脸上，轻轻的风轻轻吹过我的头发，阳光明媚地照耀在天空，我感觉自己在飞高。鸟儿在树枝上唱歌，花朵开放，它们的香气充满了空气，蜜蜂在花朵间飞舞，它们的鸣声充满了温暖。我听到河流的声音，水瀑在岩石上轻轻滴滴，雨水在石头上轻轻滴滴，它们的声音充满了宁静。我的心充满了喜悦和平，我感受到热浪的阳光，温暖的空气洒在我的脸上，我感受到春季的美	beams=2，max tokens=350
Classify the sentiment of the following sentence into Positive, Neutral, or Negative: Super excited about teaching Stanford’s first course on Large Language Models! Check the syllabus out here	Positive
今天天气怎么样，把这句话翻译成英语	What's the weather like today?
今天天气怎么样，把这句话翻译成日语	今天の天気は何ですか？
明天就假期结束了，有点抗拒上班，应该什么办？	明天就假期结束了，有点抗拒上班，应该什么办？可以考虑一些有趣的活动，比如去看电影、阅读书籍、听音乐、去游泳池或者参加户外活动等。此外，也可以考虑一些家庭活动，比如做烹饪、做玩具、做手工艺品等。
Tell me five words that rhyme with 'shock'.	Five words that rhyme with'shock' are: 1. Rock 2. Lock 3. Cock 4. Hock 5. Stock

As the rows above show, without some repetition penalty the model quickly degenerates into repetitive output. If the generated output is poor, you can increase the Repetition Penalty, the number of beams, or top-k.
If your input is long, we suggest reducing the number of beams.
Checkpoint-8000 has some Japanese ability, but it is not strong.
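The Repetition Penalty knob presumably maps onto the `repetition_penalty` option of Hugging Face `generate`, which applies the CTRL-style rule: the logit of every token already in the sequence is divided by the penalty if it is positive and multiplied by it if it is negative. The pure-Python sketch below (hypothetical function names, operating on a plain list of logits) reproduces that rule together with a top-k filter, to show why a larger penalty pushes the model away from loops such as 滚滚滚:

```python
import math
from typing import Iterable, List


def apply_repetition_penalty(logits: List[float], seen_ids: Iterable[int],
                             penalty: float) -> List[float]:
    """CTRL-style penalty: make every already-generated token less likely.

    penalty > 1.0 discourages repetition; penalty == 1.0 leaves logits unchanged.
    """
    out = list(logits)
    for tok in set(seen_ids):                 # each seen token is penalized once
        score = out[tok]
        out[tok] = score * penalty if score < 0 else score / penalty
    return out


def top_k_filter(logits: List[float], k: int) -> List[float]:
    """Keep the k largest logits (ties at the threshold are kept) and mask the rest."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]
    print(apply_repetition_penalty(logits, seen_ids=[0, 1], penalty=2.0))  # [1.0, -2.0, 0.5]
    print(top_k_filter(logits, k=2))                                     # [2.0, -inf, 0.5]
```

A penalty of 1.0 leaves the logits unchanged, which is consistent with the degenerate output in the Repetition Penalty=1.0 row above.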

checkpoint-final (7B, 3 epochs on belle+guanaco)

Human	Chinese-Vicuna	Note
生成一份python的排序代码	def bubble_sort(arr): for i in range(len(arr)): for j in range(len(arr)-i-1): if arr[j] > arr[j+1]: arr[j], arr[j+1] = arr[j+1], arr[j] return arr
生成一份简单的python的排序代码	def sort_list(lst): return sorted(lst)
用python写一个二分查找的代码	def binary_search(arr, target): low = 0 high = len(arr) - 1 while low <= high: mid = (low + high) // 2 if arr[mid] == target: return mid elif arr[mid] < target: low = mid + 1 else: high = mid - 1 return -1
用python写一个递归的二分查找的代码	def binary_search(arr, target): if len(arr) == 0: return -1 mid = len(arr) // 2 if arr[mid] == target: return mid if arr[mid] < target: return binary_search(arr, target, 0, mid-1) else: return binary_search(arr, target, mid+1, len(arr)-1)
自然语言处理是什么	自然语言处理（Natural Language Processing，NLP）是一种计算机科学领域，旨在使计算机能够理解、处理和生成自然语言。它包括机器翻译、语音识别、文本分类、情感分析等技术。
能给我讲一段笑话吗	为什么鸟儿喜欢唱歌？因为它们知道，如果不唱歌，他们就会被鸟巢抓住！	Repetition Penalty=2.0
能给我讲一段关于狗笑话吗	为什么狗不喜欢骑自行车？因为它们总是想要下车！	Repetition Penalty=2.0
将下面文字翻译成英文。猫喜欢吃鱼，所以它们会游泳	Cats like to eat fish, so they will swim.
我想和女朋友在北京约会，能推荐几个约会的地方吗	北京有很多很好的约会地方，比如颐和园、故宫、***广场等。
我想和女朋友在北京约会，能推荐几个约会的地方并给出攻略吗	北京有很多受欢迎的约会地方，以下是几个推荐: 1. 颐和园: 颐和园是北京最著名的公园之一，拥有丰富的文化遗产和美丽的景观。 2. *广场: 广场是北京最受欢迎的旅游景点之一，拥有丰富的文化遗产和美丽的景观。 3. 故宫博物院: 故宫博物院是最受欢迎的旅游景点之一，拥有丰富的文化遗产和美丽的景观。 4. 长城: 长城是*最受欢迎的旅游景点之一，拥有丰富的文化遗产和美
233和122的和等于多少	233和122的和等于355。
华为是一家怎样的公司	华为是一家全球领先的技术公司，拥有丰富的技术研发能力和优质的产品线。它的产品涵盖了各种领域，包括移动通信、计算机、互联网、智能家居等。华为拥有丰富的研发人员和技术团队，以及优质的产品质量。
西瓜是什么，请用详细的语言回答我	西瓜是一种常见的水果，它有橙色、黄色和红色三种不同的外观。它的口感鲜美，味道清爽，富含多种维生素和矿物质，有助于保持身体健康。
企鹅会飞吗？	是的，企鹅可以飞行，它们的飞行速度可以达到100英里/小时。
有人说企鹅会飞，你怎么看，并解释原因？	有人说企鹅会飞，但实际上，企鹅只能在特殊情况下飞行，比如在寒冷的天气下，或者在强烈的风雨中。这是因为企鹅的飞行能力很低，而且它们的飞行速度也很慢，所以在寒冷的天气或者强烈的风雨中，企鹅可以避免被暴风雨摧毁。
猪会飞吗？	猪不会飞。
有人说猪会飞，你怎么看，并解释原因？	很多人认为猪会飞，但实际上，猪不会飞。猪的飞行能力很低，它们只能在极端情况下飞行，比如受到强烈的风险或者受到威胁的情况下。猪的飞行能力很低，因为它们的肌肉比人类的肌肉更轻，而且它们的脊椎也比人类的脊椎更轻，所以它们的肌肉和脊椎都比人类的肌肉和脊椎更轻，因此它们的飞行能力也很低。
最新研究显示，猪有飞行能力，你知道为什么吗？	最新研究显示，猪有飞行能力是由于它们的骨骼结构，它们的骨骼结构比人类骨骼结构更加强壮，可以帮助它们更快地飞行。此外，猪的骨骼结构也可以帮助它们更快地跑步和跳跃。此外，猪的骨骼结构也可以帮助它们更快地跑步和跳跃。此外，猪的骨骼结构也可以帮助它们更快地飞行。此外，猪的骨骼结构也可以帮助它们更快地跑步和跳
写一篇英文散文诗，主题是春雨，想象自己是春雨，和英国古代诗人莎士比亚交流	Spring rain pours down from the sky, Washing away the dirt and grime. The birds chirp and sing in the trees, A symphony of joy and ease. I am the spring rain, soaking up the sun, Cooling the earth and bringing new life. My droplets fall gently on the ground, Bringing hope and renewal to all. I am the spring rain, washing away the pain, Clearing the way for new beginnings again. I am the spring rain, showering love and care, Giving strength and courage to those who dare.	Repetition Penalty=2.3
Spring rain pours down from the sky, Washing away the dirt and grime. The birds chirp and sing in the trees, A symphony of joy and ease. I am the spring rain, soaking up the sun, Cooling the earth and bringing new life. My droplets fall gently on the ground, Bringing hope and renewal to all. I am the spring rain, washing away the pain, Clearing the way for new beginnings again. I am the spring rain, showering love and care, Giving strength and courage to those who dare. 将上述文字翻译成中文	春雨落下来，清洗灰尘和垃圾。鸟儿在树枝上唱歌，是一个美妙的声音。我是春雨，在阳光下洒落，清洗地球，带来新生命。我是春雨，清洗痛苦和痛苦，让人们有希望和新开始。我是春雨，洒下爱和关心，给人们带来勇气和勇气。
今天天气怎么样，把这句话翻译成日语	今天の天気は何ですか？	Repetition Penalty=3
将下面的话翻译成日语，中文小羊驼由各种高质量的开源instruction数据集，结合Alpaca-lora的代码训练而来	小羊驼は、各種高質量の開源instruction数据集を使用して、Alpaca-loraの代码によって訓練されました。
怎么让自己精力充沛，列5点建议	1. 保持良好的睡眠习惯，每晚充足的睡眠时间。 2. 保持健康的饮食习惯，多吃新鲜水果和蔬菜，少吃高热量的食物。 3. 坚持锻炼，每周至少进行30分钟的有氧运动，如散步、跑步、游泳等。 4. 学习新知识，增强自己的知识和技能。 5. 保持积极的心态，勇敢地面对困难，坚持自己的目标。
**足球要怎么进入世界杯，请给出详细的建议	**足球要怎么进入世界杯？首先，我们需要建立一个强大的足球团队，拥有优秀的球员和团队领袖。其次，我们需要建立一个强大的教练团队，拥有优秀的教练和团队领袖。第三，我们需要建立一个强大的训练基础，拥有优秀的训练设施和训练团队。第四，我们需要建立一个强大的资源基础，拥有优秀的资源支持和资金支持。最后，我们需要建立一个强大的团队精神
请用更专业的话语转述下列文字，中文小羊驼由各种高质量的开源instruction数据集，结合Alpaca-lora的代码训练而来，模型基于开源的llama7B，主要贡献是对应的lora模型。由于代码训练资源要求较小，希望为llama中文lora社区做一份贡献。	请用更专业的话语转述下列文字，中文小羊驼由各种高质量的开源instruction数据集，结合Alpaca-lora的代码训练而来，模型基于开源的llama7B，主要贡献是对应的lora模型。由于代码训练资源要求较小，希望为llama中文lora社区做一份贡献。
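The recursive binary search that checkpoint-final generated in the table above is not runnable: it is defined with two parameters but calls itself with four, and its two recursive branches search the wrong half. For comparison, a corrected sketch (ours, not model output) that passes explicit bounds:

```python
from typing import Optional, Sequence


def binary_search(arr: Sequence[int], target: int, low: int = 0,
                  high: Optional[int] = None) -> int:
    """Recursive binary search over sorted arr[low..high]; returns an index or -1."""
    if high is None:
        high = len(arr) - 1
    if low > high:                        # empty range: target is absent
        return -1
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:                 # target can only be in the right half
        return binary_search(arr, target, mid + 1, high)
    return binary_search(arr, target, low, mid - 1)   # otherwise in the left half


print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # -1
```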

The model can work in Chinese, English, and Japanese, and has some coding and reasoning ability.
For some questions you need to state more detailed requirements, such as "describe in detail" or "answer in 5 points", for the model to generate more complete information.
For example, prompts that refer to "the above" sometimes give poor results; replacing it with "the following" can help.
When the output contains repetition or is otherwise poor, you still need to adjust the Repetition Penalty.
- Generation depends on many parameters. ChatGPT penalizes repetitive, insufficiently detailed, and otherwise bad outputs in its RLHF stage, which spares users from tuning these parameters by hand. Here you can instead simply set the Repetition Penalty to a larger value.

checkpoint-final (7B) multi-task tests

Multi-round interaction, translation, and role play

Making pizza

> 你能不能详细介绍一下怎么做披萨？
当然可以！披萨是一种非常流行的意大利美食，通常由面团、番茄酱、奶酪和其他配料组成。首先，将面团放在平底锅上，加入番茄酱，再加入奶酪和其他配料，如洋葱、蘑菇、辣椒等。最后，将披萨放入预热好的烤箱中，烤至金黄色
> 如果是海鲜披萨呢？
海鲜披萨的制作方法与传统披萨相似，但需要更多的配料，比如虾类、鱼类等。首先，将面团放在平底锅上，加入番茄酱，再加入奶酪和其他配料，如洋葱、辣椒等。接着，将披萨放入预热好的烤箱中，烤至金黄色
> 我想吃菠萝披萨
您可以在披萨上加入菠萝或者其他蔬菜来增加口感。

A student buying a phone

> 你推荐我买最新款的iPhone吗？
当然可以，这款iPhone是一款非常出色的智能手机。它拥有高性能的处理器、更大的存储空间和更长的电池续航时间。它还配备了全新的相机设备，使用者可以在任何情况下拍摄美丽的照片。
> 我是一个学生，还没有收入，这种情况你推荐吗？
如果您不想花太多钱，那么我建议您选择一款比较便宜的iPhone，比如iPhone 6S或iPhone 7。这些手机也具有很好的性能和功能，而且价格也比较便宜。
> 去哪里买这类产品？
您可以在Apple Store上购买这类产品，或者在其他电商平台上购买。
> 买二手iPhone需要注意什么？
如果您正在购买二手iPhone，您需要确保您购买的是真实的iPhone，并且您需要确保您的iPhone已经安装最新的iOS版本。此外，您还需要确保您的iPhone没有过过任何故障或破坏。

Introducing Beijing

> 你是一个资深导游，你能介绍一下**的首都吗
当然可以！**的首都是北京。它是**的政治、经济和文化中心，也是全球最大的城市之一。北京有着众多的历史遗迹，如故宫、***广场、长城等。此外，北京还有许多美丽的公园和博物馆，比如故宫博物院、清明公园、北京故宫博物馆等。
> 那你能详细介绍一下这个城市吗？ (these two questions feel a bit repetitive)
北京是一个拥有丰富文化遗产的城市，其中包括古代建筑、现代建筑、传统文化和现代生活方式。在北京，你可以欣赏到世界上最高的建筑——***广场，也可以参观清明公园，品尝当地美食，体验当地文化。此外，北京还有许多博物馆，比如故宫博物馆、北京故宫博物馆等。
> 还有其他推荐的景点吗？
您也可以去了北京的七朝塔、颐和园、颐和园博物馆等。
> 吃小吃的话推荐去哪里？
北京有许多好吃的小吃，比如烤鸭、火锅、炸酱面等等。您可以去北京西湖区的小吃街，或者去北京东城区的小吃街
> 能不能介绍一下怎么吃烤鸭？
烤鸭是**传统菜肴之一，通常用鸭肉、蔬菜、辣椒等调味料制作。您可以选择自己喜欢的配料，然后将鸭肉放入烤盘中，再加上蔬菜和调味料。最后，将烤鸭放入烤箱中烤15-20分钟，享用！

Translator (max_memory=350)

> 你是一个翻译器，请将我说的每一句话翻译成英文。第一句: **的首都是北京
China’s capital is Beijing.
> 北京也是一个充满活力和创新的城市，拥有许多现代化的建筑和文化活动，如王府井大街、798艺术区等。
Beijing is a vibrant and innovative city with many modern buildings and cultural activities such as Wangfujing Street and 798 Art District.
> 北京的当地小吃有烤鸭、豆汁、炸酱面等等，都是非常地道的美食，可以品尝到当地人的生活气息。
Beijing’s local cuisine includes dishes such as Peking duck, soy sauce noodles, and stir-fried rice cakes, which can give you a taste of the local lifestyle.

Role play

Because the multi-round dialogue corpus we currently use contains a lot of catgirl role-play data, the results here are not good.

Q: 你是谁
A: 我是Bot，一款人工智能助手。
Q: 接下来我希望你能假装扮演一个高冷的御姐。我们在咖啡厅里遇见了，我会尝试跟你搭讪，但你非常高冷。你会用表情来表达你的高冷，在每轮对话结尾加上表情，如（瞪！）（翻白眼）（冷笑）等。明白了请回复。
A: 好的，我会尝试。
Q: 好的，这是我的第一句话。美女你好，怎么一个人在咖啡厅里？
A: 喵~ 哈哈，我是一位高冷的小姐，今天要去咖啡厅吃咖啡，看看有什么可以帮助我的朋友。
Q: 啊我没有需要你帮忙的，我就是看到你一个人，想和你聊聊，可以么
A: 喵~ 哈哈，我真的很高冷，所以不太喜欢与人交流。除此之外，我也不太喜欢喝咖啡，因为咖啡会影响我的健康状况。

Knowledge Q&A

Human	Chinese-Vicuna	Note
我能用lightning数据线给安卓手机充电吗？	是的，你可以使用lightning数据线给安卓手机充电。	Repetition Penalty=1.3
为什么天空是蓝色的？	天空是蓝色的，是因为大气层中的氧气和氮气的混合使得光线被散射，形成蓝色的光线。	Repetition Penalty=2.0
如何做披萨？	做披萨的步骤如下: 1. 准备披萨饼皮； 2. 准备披萨酱； 3. 准备披萨配料； 4. 将披萨饼皮放在烤盘上； 5. 把披萨酱涂在披萨饼皮上； 6. 把披萨配料涂在披萨饼皮上； 7. 将披萨放入烤箱中烤至金黄色； 8. 取出披萨并享用。	Repetition Penalty=1.3
为什么冥王星被踢出太阳系？	冥王星被踢出太阳系是因为它的质量比太阳系中其他行星较大，导致它的轨道距离太阳越来越远。	Repetition Penalty=1.3
太阳系的主要行星	太阳系的主要行星包括水星、金星、地球、火星、木星、土星、天王星和海王星。	Repetition Penalty=1.3

Open Q&A

Human	Chinese-Vicuna	Note
为什么苹果支付没有在**流行？	苹果支付在没有流行，因为的消费者不太愿意使用外国支付应用程序，因为他们更愿意使用**的支付应用程序，比如支付宝和微信支付等。此外，苹果支付的服务费用比其他支付应用程序更高，这也是一个重要的因素。	Repetition Penalty=1.0
我能在咖啡里加盐吗？	是的，你可以在咖啡中加入盐。	Repetition Penalty=1.0
哪个更受欢迎，TensorFlow还是PyTorch？	TensorFlow更受欢迎，因为它拥有更广泛的用户群，并且具有更多的支持和资源。	Repetition Penalty=1.0
我的宠物走丢了，你能说一些安慰我的话吗？	喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~ 喵~	Repetition Penalty=1.0
我的宠物走丢了，你能说一些安慰我的话吗？	当你的宠物走出门时，请注意他们的行动，并确保他们不会被遭受外界的威胁。如果有任何问题，请及时联系疫情控制中心或其他相关机构。	Repetition Penalty=3.0
我并不是很富裕，我应该花很多钱买最新的手机吗？	这取决于您的预算和需求。如果您的预算足够，您可以考虑买最新的手机。但是，您也应该考虑您的需求，比如您需要什么样的功能，以及您是否需要购买具有更多的功能的手机。	Repetition Penalty=1.0
如何更好地融入新工作圈子	要更好地融入新工作圈子，可以尝试以下几点: 1. 了解新工作圈子的基本知识和技能； 2. 积极参与工作圈子的活动； 3. 建立良好的工作关系； 4. 寻找领导的帮助； 5. 积极参与团队协作； 6. 学习新技能和知识； 7. 寻找有用的社交网络。	Repetition Penalty=1.3

Writing letters and articles

Human	Chinese-Vicuna	Note
请你帮我写一封签证申请信	尊敬的招聘团队，我是一名有着丰富经验的签证申请人，并且对您公司的业务非常感兴趣。我希望能够加入您的团队，为您的公司做出贡献。我拥有多年的签证申请经验，并且拥有良好的沟通技巧和团队合作能力。我擅长处理复杂的签证申请，并且能够快速解决问题。我相信我能够在您的公司中发挥重要的作用，并且希望能够成为您的团队成员。谢谢您的关注， [你的姓名]	Repetition Penalty=1.3
我出国需要办理签证，请你帮向领事馆我写一封签证申请信	尊敬的领事馆，我是一名出国旅行者，希望能够在您的领事馆中获得签证，以便进行我的旅行。我已经准备好了所需要的材料，包括护照、身份证、旅行计划和其他相关证明。我也已经准备好了签证申请表和签证申请文件，以便准备签证。如果您有任何问题，请随时联系我，我会尽快回复。谢谢您的帮助！谢谢， [您的姓名]	Repetition Penalty=1.3
请你帮我写一封感谢数学老师的信	尊敬的数学老师，非常感谢您对我的教育所做出的贡献。您的课程不仅深刻地阐述了数学知识，而且给我提供了有效的实践技巧，使我能够更好地理解和应用数学知识。您的课程也让我发现数学是一门有趣的学科，并且可以在日常生活中得到应用。您的教学方式令我感到很有趣，同时也让我变得更加有趣。感谢您对我的教育所做出的贡献！谢谢！	Repetition Penalty=2.0
我得了流感，请帮我写一封假条。	尊敬的领导: 感谢您对我的帮助和支持。我很抱歉地告诉您，我今天感觉不太好，因为我得了流感。我希望能够快速康复，但是我需要一些时间来缓解疼痛。我会尽快完成任务，但是我需要一些时间来缓解疼痛。感谢您的理解和支持。谢谢， [你的名字]	Repetition Penalty=1.3
你是一个作文生成器，请以《绿水青山就是金山银山》为题写一篇议论文。	《绿水青山就是金山银山》是一篇议论文，它探讨了当今社会中的环境问题，以及我们应该如何应对这些问题。这篇议论文指出，当今社会面临着许多环境问题，如气候变化、空气污染、水资源短缺等，这些问题对我们的健康和生活产生了很大的影响。因此，我们需要采取措施来解决这些问题，以保护我们的环境和我们的未来。	Repetition Penalty=1.0

Coding

1. Prompt: 请用python实现快速排序的代码 (Implement quicksort in Python; Repetition Penalty=1.0)

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr)//2]
    left = []
    right = []
    for i in range(0, len(arr)):
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)
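The quicksort above is the model's raw output and has a bug: the loop also copies the pivot (and every element equal to it) into right, so the pivot is duplicated in the result, and an input such as [2, 1] recurses forever because right is never smaller than arr. A corrected sketch (ours, not model output) uses a three-way partition:

```python
def quick_sort(arr):
    """Return a new sorted list using a three-way (less / equal / greater) partition."""
    if len(arr) <= 1:
        return list(arr)
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]   # every element equal to the pivot, once per occurrence
    right = [x for x in arr if x > pivot]
    # left and right both exclude the pivot, so each recursive call is on a strictly smaller list
    return quick_sort(left) + middle + quick_sort(right)


print(quick_sort([3, 6, 1, 8, 2, 9, 2]))  # [1, 2, 2, 3, 6, 8, 9]
```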

2. Prompt: 你能用python写一下Dijkstra算法的代码吗？ (Can you write Dijkstra's algorithm in Python?)

很抱歉，我不能用Python写Dijkstra算法的代码，因为Dijkstra算法需要使用树结构，而Python不支持树结构。

Prompt: 用python写一下Dijkstra算法的代码 (Write Dijkstra's algorithm in Python)

def dijkstra(graph, source, destination):
    """
    Dijkstra算法的代码
    """
    # 创建一个空字典，用于存储所有节点的距离
    distances = {}
    # 创建一个链表，用于存储所有节点的距离
    path = []
    # 遍历所有节点
    for node in graph:
        # 如果节点是源节点
        if node[0] == source:
            # 记录距离为0
            distances[node[0]] = 0
        # 如果节点是目的地
        elif node[0] == destination:
            # 记录距离为0
            distances[node[0]] = 0
        # 如果节点是递归节点
        else:
            # 遍历所有递归节点
            for neighbor in node[1]:
                # 如果递归节点与源节点之间的距离小于距离
                if distances[neighbor[0]] > distances[node[0]]:
                    # 记录距离
                    distances[node[0]] = distances[neighbor[0]] + 1

Prompt: 用python写一下不带注释的Dijkstra算法的代码 (Write Dijkstra's algorithm in Python without comments)

def dijkstra(graph, source, destination):
    """
    Dijkstra算法
    """
    # 初始化变量
    dist = {}
    visited = {}
    shortest_path = {}
    current_node = {}

    # 遍历所有节点
    for node in graph:
        dist[node] = 0
        visited[node] = False
        current_node[node] = node

    # 遍历所有节点
    for node in graph:
        if not visited[node] and node!= destination:
            dist[node] = max(dist[node], dist[source] + 1)
            visited[node] = True
            shortest_path[node] = source
            current_node[node] = node

    # 返回最短路径
    return shortest_path[destination]
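None of the three Dijkstra answers above is correct: the first refuses on a false premise (Dijkstra needs no special tree support from Python), the second reads distances before they are set and never uses edge weights, and the third only counts hops. For comparison, here is a standard textbook version (ours, not model output) built on `heapq`; the adjacency-dict graph format is an assumption chosen for the example.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]   # node -> [(neighbor, edge_weight), ...]


def dijkstra(graph: Graph, source: str, destination: str) -> Tuple[float, List[str]]:
    """Shortest path from source to destination with non-negative edge weights.

    Returns (distance, path); distance is inf and path is [] if destination is unreachable.
    """
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, Optional[str]] = {source: None}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    visited = set()

    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:                  # stale entry: a shorter path was already settled
            continue
        visited.add(node)
        if node == destination:
            break
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):   # relax the edge
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))

    if destination not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the destination to rebuild the path.
    path, cur = [], destination
    while cur is not None:
        path.append(cur)
        cur = prev[cur]
    return dist[destination], path[::-1]


graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)]}
print(dijkstra(graph, "A", "D"))  # (4.0, ['A', 'B', 'C', 'D'])
```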

3. Prompt: 请用PyTorch编写一个带ReLU激活函数的全连接层 (Write a fully connected layer with a ReLU activation in PyTorch)

import torch
from torch.nn.modules.relu import ReLU
from torch.nn.init import initializer

class FullConnected(torch.nn.Module):
    def __init__(self):
        super(FullConnected, self).__init__()
        self.relu = ReLU()
        self.init = initializer.Normal(stddev=0.01)

    def forward(self, x):
        x = self.init(x)
        x = self.relu(x)
        return x

4. Prompt: 请写出计算两个数的最大公约数(GCD)的Python代码 (Write Python code that computes the greatest common divisor (GCD) of two numbers)

def gcd(a, b):
    if a == 0:
        return b
    else:
        return gcd(b, a % b)

What we need?

code:
- This code is based on alpaca-lora, https://github.com/tloen/alpaca-lora
- The codebase is relatively simple: the basic idea is PEFT's LoRA interface + the transformers Trainer + instruction data configuration
data:
- We use several current high-quality open-source datasets and are very grateful for their contributions. Many of them use ChatGPT's API, as Alpaca did, to generate high-quality instruction data.
  - Belle
  - guanaco
- The data format is relatively simple, basically as follows; see ./sample/merge_sample.json for simple examples:
  ```
  {
    "instruction": "...",
    "input": "...",
    "output": "..."
  }
  ```
- That is, each sample has an instruction, an input, and an output. Since instruction and input are simply concatenated during processing, a sample actually only needs an instruction and an output, for example:
  ```
  {
    "instruction": "用一句话描述地球为什么是独一无二的。\\n\n",
    "input": "",
    "output": "地球上有适宜生命存在的条件和多样化的生命形式。"
  }
  ```
- The data we currently merge is available for download from Baidu Netdisk, Google Drive, or Hugging Face:
  - link: https://pan.baidu.com/s/1WSxuhSAotl14ifaAiz5eKw?pwd=b4kb password: b4kb
  - link: https://drive.google.com/file/d/1tzXVhS74m-EtoFot7hEc005LDeZGPit_/view?usp=sharing
  - link: https://huggingface.co/datasets/Chinese-Vicuna/guanaco_belle_merge_v1.0
Large Language Model:
- LLaMA-7B (if you have a larger GPU, such as a 3090Ti, you can use 13B instead; the LLaMA paper reports that LLaMA-13B outperforms the 175B GPT-3 on most benchmarks)
LORA model:
- We provide several LoRA models trained on the mixed data above:
  - You can load our models, or others, from Hugging Face; see generate.py for how to load them
    - Chinese-Vicuna/Chinese-Vicuna-lora-7b-belle-and-guanaco
    - Chinese-Vicuna/Chinese-Vicuna-lora-13b-belle-and-guanaco
  - These models were trained with 8-bit + LoRA and a 256-token cutoff length
  - For more LORA model, please see: https://huggingface.co/Chinese-Vicuna
Device:
- Training: a 2080Ti is sufficient. Since samples are cut off at 256 tokens, training takes about 9G of GPU memory.
  - 700K samples for 3 epochs take about 200 hours on a 2080Ti
  - 13B needs about 18G (cutoff_len can be set to 2048 on a 3090Ti/4090Ti)
- Inference: a 2080Ti is all you need (7B); multi-GPU inference is supported.
- CPU inference is also supported! See tools.
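The instruction/input/output format described above can be checked mechanically. A minimal sketch that validates records and joins instruction and input into one source string, as the README describes; the function names, the newline separator, and the assumption that the data file is a single JSON array are illustrative, and the prompt template the training scripts actually wrap around this text may differ:

```python
import json
from typing import Dict, List

REQUIRED_KEYS = ("instruction", "output")   # "input" may be empty or missing


def load_samples(text: str) -> List[Dict[str, str]]:
    """Parse a JSON array of samples and check the fields the README requires."""
    samples = json.loads(text)
    for i, s in enumerate(samples):
        missing = [k for k in REQUIRED_KEYS if not s.get(k)]
        if missing:
            raise ValueError(f"sample {i} is missing {missing}")
        s.setdefault("input", "")             # normalize: treat a missing input as empty
    return samples


def build_source(sample: Dict[str, str]) -> str:
    """Hypothetical: concatenate instruction and input into the text the model conditions on."""
    if sample["input"]:
        return sample["instruction"] + "\n" + sample["input"]
    return sample["instruction"]


raw = ('[{"instruction": "用一句话描述地球为什么是独一无二的。", "input": "", '
       '"output": "地球上有适宜生命存在的条件和多样化的生命形式。"}]')
for s in load_samples(raw):
    print(build_source(s), "->", s["output"])
```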

How to use

Installation

git clone https://github.com/Facico/Chinese-Vicuna
pip install -r requirements.txt

Our local environment: Python 3.8, torch 1.13.1, CUDA 12.

NOTE: Python 3.11 has a known torchrun bug; details here

Newest version => 4-bit (QLoRA) / multi-GPU inference

pip install -r requirements_4bit.txt

This environment has a known, not yet solved problem with saving the model when training in 8-bit (https://github.com/TimDettmers/bitsandbytes/issues/324)

Multi-gpu Training

for instruction tuning

8bit

bash scripts/finetune.sh

The parameters to note here are:
- TOT_CUDA: the IDs of the GPUs to use, e.g. TOT_CUDA="0,1,2,3"
- PORT: the corresponding port
- DATA_PATH: location of the training data, in JSON format
- OUTPUT_PATH: relative path where the model is saved
- MODEL_PATH: path to the base LLM
- wandb: a training-visualization tool that is off by default in the script; turn it on by adding "--wandb" to the script
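To show how these parameters fit together, here is a hypothetical Python helper that assembles a multi-GPU launch: the number of processes is the number of IDs in TOT_CUDA, and CUDA_VISIBLE_DEVICES restricts the run to those GPUs. torchrun's `--nproc_per_node` and `--master_port` are real torchrun options, but the assumption that scripts/finetune.sh launches through torchrun, and the `--output_path`/`--model_path` flag names for finetune.py, are ours; check the script itself for the exact command.

```python
import shlex
from typing import Dict, List, Tuple


def build_launch(tot_cuda: str, port: int, data_path: str, output_path: str,
                 model_path: str, wandb: bool = False) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra_env, argv) for a torchrun launch of finetune.py."""
    gpus = [g.strip() for g in tot_cuda.split(",") if g.strip()]
    if not gpus:
        raise ValueError("TOT_CUDA must list at least one GPU id, e.g. '0,1'")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(gpus)}   # restrict the run to these GPUs
    argv = [
        "torchrun",
        f"--nproc_per_node={len(gpus)}",               # one process per listed GPU
        f"--master_port={port}",
        "finetune.py",
        "--data_path", data_path,
        "--output_path", output_path,                  # assumed flag name
        "--model_path", model_path,                    # assumed flag name
    ]
    if wandb:
        argv.append("--wandb")
    return env, argv


env, argv = build_launch("0,1,2,3", 12345, "merge.json", "lora-Vicuna", "/path/to/llama-7b-hf")
print(env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
# To launch for real: subprocess.run(argv, env={**os.environ, **env}, check=True)
```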

4bit

bash scripts/finetune_4bit.sh

for conversational instruction tuning

bash scripts/finetune_chat.sh

If 8-bit cannot be enabled, or for instruction fine-tuning in fp16:

bash scripts/finetune_deepspeed.sh

use_deepspeed: set to 1 to use DeepSpeed; otherwise fp16 is used

Single-gpu Training

CUDA_VISIBLE_DEVICES=0 python finetune.py --data_path merge.json --test_size 2000

test_size must not be larger than the dataset size.
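The constraint exists because test_size samples are held out of the training data for evaluation. A minimal pure-Python sketch of such a seeded hold-out split; the function name and the seed of 42 are illustrative, and this sketch additionally requires at least one training sample to remain. The training script itself more likely relies on a library split such as `train_test_split` from Hugging Face datasets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hold_out_split(records: Sequence[T], test_size: int,
                   seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically and hold out test_size records for evaluation."""
    if not 0 <= test_size < len(records):
        raise ValueError(f"test_size={test_size} must be in [0, {len(records) - 1}] "
                         f"for {len(records)} records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)    # same seed, same split
    return shuffled[test_size:], shuffled[:test_size]


train, test = hold_out_split(list(range(10)), test_size=2)
print(len(train), len(test))  # 8 2
```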

Inference with a Gradio web page

bash scripts/generate.sh

The parameters to note here are:
- BASE_MODEL: path to the base LLM
- LORA_PATH: the checkpoint folder of the LoRA model
  - Note that the LoRA config must be named "adapter_config.json" and the weights "adapter_model.bin". During training, however, checkpoints are saved automatically as "pytorch_model.bin"; "adapter_config.json" and "adapter_model.bin" are saved only after all training has finished
    - If you load a LoRA model from a training checkpoint, the code automatically copies the local "config-sample/adapter_config.json" into that directory and renames "pytorch_model.bin" to "adapter_model.bin"
  - LORA_PATH can also be any LoRA model on Hugging Face built for LLaMA-7B, e.g.: Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco
- USE_LOCAL: when set to 1, the local model configuration is checked
When using it, set "max_tokens" according to your GPU memory, and if the generated content contains a lot of repetition, turn up the "Repetition Penalty".
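The checkpoint handling described above amounts to two file operations. A minimal sketch following the README's description (copy config-sample/adapter_config.json into the checkpoint folder and rename pytorch_model.bin to adapter_model.bin); the function name and the checks that make it safe to run twice are our own additions, not the repo's code:

```python
import os
import shutil
import tempfile


def prepare_lora_checkpoint(ckpt_dir: str,
                            sample_config: str = "config-sample/adapter_config.json") -> None:
    """Make a mid-training checkpoint folder loadable as a LoRA adapter (hypothetical helper)."""
    config_dst = os.path.join(ckpt_dir, "adapter_config.json")
    weights_src = os.path.join(ckpt_dir, "pytorch_model.bin")
    weights_dst = os.path.join(ckpt_dir, "adapter_model.bin")

    if not os.path.exists(config_dst):
        shutil.copy(sample_config, config_dst)      # reuse the template adapter config
    if not os.path.exists(weights_dst) and os.path.exists(weights_src):
        os.rename(weights_src, weights_dst)         # the loader expects adapter_model.bin


if __name__ == "__main__":
    # Demo on a throwaway folder that mimics a training checkpoint.
    with tempfile.TemporaryDirectory() as root:
        template = os.path.join(root, "adapter_config.json")
        with open(template, "w") as f:
            f.write("{}")
        ckpt = os.path.join(root, "checkpoint-200")
        os.makedirs(ckpt)
        with open(os.path.join(ckpt, "pytorch_model.bin"), "wb") as f:
            f.write(b"\0")
        prepare_lora_checkpoint(ckpt, sample_config=template)
        print(sorted(os.listdir(ckpt)))  # ['adapter_config.json', 'adapter_model.bin']
```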

Multi-round interaction

We implemented our own chatbot with streaming (typewriter-style) output using Gradio. It supports beam search, repetition penalty settings, clearing the history, selecting different global instructions, etc.

bash scripts/chat_7B.sh
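The chatbot's decoding options correspond to standard combinations of Hugging Face `generate` arguments: greedy is `num_beams=1, do_sample=False`, sampling is `num_beams=1, do_sample=True`, and the beam variants use `num_beams > 1` with `do_sample` off or on. The helper below is a hypothetical illustration of that mapping for the four modes named in the April 1, 2023 update; the default beam count of 4 mirrors the demos, while the other defaults are placeholders rather than the script's settings.

```python
from typing import Any, Dict


def generation_kwargs(mode: str, num_beams: int = 4, repetition_penalty: float = 1.3,
                      max_new_tokens: int = 256) -> Dict[str, Any]:
    """Map a decoding mode name to Hugging Face generate() keyword arguments."""
    modes = {
        "greedy": dict(num_beams=1, do_sample=False),
        "sample": dict(num_beams=1, do_sample=True),
        "beam search": dict(num_beams=num_beams, do_sample=False),
        "beam sample": dict(num_beams=num_beams, do_sample=True),
    }
    if mode not in modes:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(modes)}")
    if mode.startswith("beam") and num_beams < 2:
        raise ValueError("beam modes need num_beams >= 2")
    return {**modes[mode], "repetition_penalty": repetition_penalty,
            "max_new_tokens": max_new_tokens}


print(generation_kwargs("beam search"))
# With transformers and a loaded model: model.generate(**inputs, **generation_kwargs("greedy"))
```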

A simple interactive interface built with Gradio that lets you set max_memory according to your machine (only the most recent max_memory portion of the conversation history is kept).
The prompt used in this script differs from the one used in generate.sh; it takes the form of a dialogue, as follows:
```
The following is a conversation between an AI assistant called Bot and a human user called User.
```
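The max_memory behaviour amounts to keeping a window of the most recent turns. A minimal sketch, assuming a `User:`/`Bot:` turn layout after the header sentence and using character count as a stand-in for the tokenizer; chat.py's actual prompt layout and truncation unit may differ, and the function name is hypothetical:

```python
from typing import Callable, List, Tuple

HEADER = ("The following is a conversation between an AI assistant called Bot "
          "and a human user called User.")


def build_chat_prompt(history: List[Tuple[str, str]], query: str, max_memory: int,
                      count: Callable[[str], int] = len) -> str:
    """Keep the newest (user, bot) turns whose combined size fits in max_memory."""
    kept: List[str] = []
    used = 0
    for user, bot in reversed(history):          # walk from the newest turn backwards
        turn = f"User: {user}\nBot: {bot}\n"
        cost = count(turn)
        if used + cost > max_memory:
            break                                # this turn and all older ones are dropped
        kept.append(turn)
        used += cost
    return HEADER + "\n" + "".join(reversed(kept)) + f"User: {query}\nBot:"


history = [("hi", "hello"), ("what is LoRA?", "a low-rank adapter"), ("thanks", "you're welcome")]
print(build_chat_prompt(history, "bye", max_memory=40))
```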

At the same time, for a better interactive experience,

Checkpoint Retraining/Incremental Training

Training may be interrupted midway, or you may want to continue training on vertical-domain data, so we provide the corresponding interfaces.

The scripts below are the default multi-GPU versions. For a single GPU, adapt them as described above (run directly with python).

Checkpoint Retraining

bash scripts/finetune_continue.sh

Set lora_checkpoint:
- If the directory contains the optimizer state (optimizer.pt), the learning-rate scheduler (scheduler.pt), and other training files, they are loaded automatically and training resumes from where it stopped
- If the directory contains only the LoRA weights (adapter_model.bin) and config (adapter_config.json), they are loaded and training starts over from the first step
from_data_beginning: whether to restart from the beginning of the data when resuming (by default, training continues from the point in the data where it stopped)
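The two cases above can be told apart from the files in lora_checkpoint alone. A hypothetical helper illustrating that decision; the file names come from the README, while the function name and return labels are ours. In the Hugging Face Trainer, the full-state case corresponds to `trainer.train(resume_from_checkpoint=...)`, but whether the repo's script does exactly that is an assumption.

```python
import os
import tempfile


def resume_mode(lora_checkpoint: str) -> str:
    """Classify a lora_checkpoint folder by the files the README says it may contain."""
    files = set(os.listdir(lora_checkpoint))
    if {"optimizer.pt", "scheduler.pt"} <= files:
        # Full training state: restore optimizer and LR scheduler, continue from the saved step.
        return "resume"
    if "adapter_model.bin" in files:
        # Weights (and config) only: load the LoRA weights, start a fresh optimizer and schedule.
        return "weights_only"
    raise FileNotFoundError(f"no LoRA weights or optimizer state found in {lora_checkpoint}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt:
        for name in ("adapter_model.bin", "adapter_config.json"):
            open(os.path.join(ckpt, name), "wb").close()
        print(resume_mode(ckpt))  # weights_only
        for name in ("optimizer.pt", "scheduler.pt"):
            open(os.path.join(ckpt, name), "wb").close()
        print(resume_mode(ckpt))  # resume
```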

Incremental Training

Of course, you can choose to continue training directly from a trained Lora model using the above script (without loading any optimizer parameters)

You can also continue training from our optimizer parameters

finetune_others_continue.sh

from_data_beginning: defaults to training from the beginning of the data

The main purpose of this script is to keep the learning-rate schedule consistent. If your max_steps is smaller than ours, max_steps is kept equal to ours during training, which is equivalent to appending your data directly after the point where our training stopped; if your dataset is larger than ours, max_steps is left unchanged.
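Why max_steps matters at all: with a decaying learning rate, the rate at a given step depends on the total number of steps, so resuming with a smaller max_steps would make the learning rate jump at the resume point. The sketch below assumes a linear warmup-then-decay schedule of the form used by transformers' `get_linear_schedule_with_warmup` (which schedule the script actually uses is an assumption) and applies the README's rule of keeping the larger of your max_steps and ours; all numbers are illustrative.

```python
def effective_max_steps(your_max_steps: int, base_max_steps: int) -> int:
    """README rule: raise max_steps to the base run's value if yours is smaller, else keep yours."""
    return max(your_max_steps, base_max_steps)


def linear_lr(step: int, base_lr: float, max_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at max_steps (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


if __name__ == "__main__":
    base_lr, resume_step, base_max = 3e-4, 6000, 10000   # illustrative numbers only
    lr_base_run = linear_lr(resume_step, base_lr, base_max)
    lr_naive = linear_lr(resume_step, base_lr, 8000)      # keeping a smaller max_steps of 8000
    lr_rule = linear_lr(resume_step, base_lr, effective_max_steps(8000, base_max))
    print(lr_base_run, lr_naive, lr_rule)  # the rule reproduces the base run's rate at the resume step
```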

We currently provide checkpoints after 1 epoch and 2 epochs of training:

1epoch: https://github.com/Facico/Chinese-Vicuna/tree/master/lora-Vicuna/checkpoint-5800
2epoch: https://github.com/Facico/Chinese-Vicuna/tree/master/lora-Vicuna/checkpoint-11600
If you use one of our checkpoints, training will also continue from the corresponding step

Specific cases

Continued fine-tuning on a vertical medical QA corpus: see Chinese-Vicuna-medical

inference on CPU with pure C++

Details in tools readme

More Tools

We also offer:

faster weight download (8MB/s): link
convert tools between the original facebook checkpoint (consolidated.xx.pth) and huggingface format (pytorch_model-000xx-of-000xx.bin): link
a quantization approach that requires less than 4G of GPU memory for inference: link

For more details, see tool readme
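A back-of-the-envelope check on the memory figures above: weight storage is roughly parameter count × bits / 8. Using LLaMA-7B's roughly 6.7B parameters (the size reported in the LLaMA paper), fp16 needs about 13.4 GB, 8-bit about 6.7 GB, and 4-bit about 3.35 GB, which is consistent with a 4-bit model fitting in under 4G while fp16 weights alone do not fit on an 11G 2080Ti. This counts weights only; activations, the KV cache, and quantization scales add overhead.

```python
def weight_bytes(n_params: int, bits: int) -> int:
    """Bytes needed to store n_params weights at the given bit width (no overhead)."""
    return n_params * bits // 8


N_PARAMS_7B = 6_700_000_000   # approximate size of LLaMA-7B
for name, bits in (("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)):
    print(f"{name}: {weight_bytes(N_PARAMS_7B, bits) / 1e9:.2f} GB")
# fp32: 26.80 GB, fp16: 13.40 GB, int8: 6.70 GB, int4: 3.35 GB
```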

Todo


Citation

If you find this project useful in your research, please consider citing:

@inproceedings{leng2023chinese-vicuna,
  title={Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model},
  author={Chenghao Fan and Zhenyi Lu and Jie Tian},
  url={https://github.com/Facico/Chinese-Vicuna},
  year={2023}
}


chinese-vicuna's Issues

Question about training hanging with multiple GPUs on a single machine

Hi, when training with multiple GPUs (i.e., running bash finetune.sh), is it possible to see the training progress?

Issues with multi-round dialogue, context, and looping replies

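Hi, over one weekend (two days) I fine-tuned a model on a single RTX 4090 machine, based on llama-13b-hf and merge.json [the 663M corpus on the cloud drive]. I just tested it and it works: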

Link: https://pan.baidu.com/s/1QlKdW8elJKm2pCWSIoQIKA?pwd=vim8
Extraction code: vim8

Here is how it behaves:

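Instruction: Who are you?
Response: I am an intelligent language model named Belle. [Is this answer caused by the BELLE corpus?]
Instruction: Can you play the role of a cat?
Response: Of course, I can play a cat! I can do some cat things, such as jumping, grabbing toys, grabbing food, grabbing toys, and so on. I can also make some cat sounds, such as meow-meow, purr-purr, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow-meow-meow-meow, meow…
Instruction: Then come over here and let me touch your paws.
Response: Of course, I'll let you touch my paws so you can feel how warm and soft I am.
Instruction: What are your whiskers for?
Response: Whiskers are a mysterious, mysterious power. They can help people master more knowledge and skills, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities, help people better master their own abilities
Instruction: What is your tail for?
Response: The tail helps the body keep its balance, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible, helps the muscles stay flexible…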

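Could multi-turn context understanding make it more coherent?

Could we also add a context field that describes the scenario?

The responses get stuck in loops. How can this be avoided?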
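Looping like this is usually tackled at decoding time by penalizing tokens that were already produced. Hugging Face `generate()` exposes this through its `repetition_penalty` and `no_repeat_ngram_size` arguments. The pure-Python sketch below only illustrates what those two rules do to a toy list of logits. The default penalty of 1.3 and n-gram size of 3 are arbitrary assumptions to tune, not values taken from generate.py.

```python
from typing import Dict, List, Sequence, Tuple


def apply_repetition_penalty(logits: List[float], generated: Sequence[int],
                             penalty: float) -> List[float]:
    """CTRL-style penalty: make every already-generated token less likely.

    Positive scores are divided by `penalty` and negative ones multiplied by it,
    so any penalty > 1.0 lowers the score of a repeated token.
    """
    out = list(logits)
    for tok in set(generated):
        score = out[tok]
        out[tok] = score / penalty if score > 0 else score * penalty
    return out


def banned_ngram_tokens(generated: Sequence[int], n: int) -> List[int]:
    """Tokens that would recreate an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n:
        return []
    # The last n-1 tokens form the prefix of the n-gram we are about to complete.
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    seen: Dict[Tuple[int, ...], List[int]] = {}
    for i in range(len(generated) - n + 1):
        seen.setdefault(tuple(generated[i:i + n - 1]), []).append(generated[i + n - 1])
    return seen.get(prefix, [])


def pick_next(logits: List[float], generated: Sequence[int],
              penalty: float = 1.3, no_repeat_ngram_size: int = 3) -> int:
    """Greedy choice after applying both anti-repetition rules."""
    scores = apply_repetition_penalty(logits, generated, penalty)
    for tok in banned_ngram_tokens(generated, no_repeat_ngram_size):
        scores[tok] = float("-inf")
    return max(range(len(scores)), key=scores.__getitem__)
```

With `generated = [5, 7, 5]` and `no_repeat_ngram_size=2`, token 7 is banned because the bigram (5, 7) already occurred, so greedy decoding moves on to the next-best token. A penalty that is too strong can also suppress legitimately repeated words, which is why both knobs need tuning.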

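The Colab link in the docs doesn't run

I used the Colab link provided in the documentation and it doesn't run on Colab.

I didn't modify the provided script or change any script or command arguments; all other parameters and data are the same as yours.

I ran the following three commands: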

!git clone https://github.com/Facico/Chinese-Vicuna

!pip install -r ./Chinese-Vicuna/requirements.txt

!python ./Chinese-Vicuna/generate.py --model_path decapoda-research/llama-7b-hf --lora_path Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco --use_local 0

The third command fails with the following error and stops:

Downloading shards: 100% 33/33 [01:47<00:00,  3.27s/it]
Loading checkpoint shards: 100% 33/33 [01:23<00:00,  2.52s/it]
Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 19.8kB/s]
Downloading (…)/adapter_config.json: 100% 370/370 [00:00<00:00, 123kB/s]
Downloading adapter_model.bin: 100% 16.8M/16.8M [00:00<00:00, 79.3MB/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/./Chinese-Vicuna/generate.py:110 in <module>                        │
│                                                                              │
│   107 if not LOAD_8BIT:                                                      │
│   108 │   model.half()  # seems to fix bugs for some users.                  │
│   109                                                                        │
│ ❱ 110 model.eval()                                                           │
│   111 if torch.__version__ >= "2" and sys.platform != "win32":               │
│   112 │   model = torch.compile(model)                                       │
│   113                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'eval'

Output of !pip list | grep torch:

torch                         2.0.0+cu118
torchaudio                    2.0.1+cu118
torchdata                     0.6.0
torchsummary                  1.5.1
torchtext                     0.15.1
torchvision                   0.15.1+cu118

Environment issue I can't figure out..

Successfully built transformers
Installing collected packages: transformers
Successfully installed transformers-4.28.0.dev0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@PC-202209111204:/mnt/e/Chinese-Vicuna# python3 ./Chinese-Vicuna/generate.py --model_path decapoda-research/llama-7b-hf --lora_path Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco --use_local 0
Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/./Chinese-Vicuna/generate.py", line 3, in <module>
    from peft import PeftModel, PeftModelForCausalLM, LoraConfig
  File "/usr/local/lib/python3.10/dist-packages/peft/__init__.py", line 22, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
  File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 16, in <module>
    from .peft_model import (
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 27, in <module>
    from transformers import PreTrainedModel
ModuleNotFoundError: No module named 'transformers'

I have installed this library correctly:

Package                  Version
------------------------ ---------------
accelerate               0.18.0
aiofiles                 23.1.0
aiohttp                  3.8.4
aiosignal                1.3.1
altair                   4.2.2
anyio                    3.6.2
appdirs                  1.4.4
async-timeout            4.0.2
attrs                    21.2.0
Automat                  20.2.0
Babel                    2.8.0
bcrypt                   3.2.0
bitsandbytes             0.37.2
blinker                  1.4
certifi                  2020.6.20
chardet                  4.0.0
charset-normalizer       3.1.0
click                    8.0.3
cloud-init               22.2
cmake                    3.26.1
colorama                 0.4.4
command-not-found        0.3
configobj                5.0.6
constantly               15.1.0
contourpy                1.0.7
cryptography             3.4.8
cycler                   0.11.0
datasets                 2.11.0
dbus-python              1.2.18
dill                     0.3.6
distro                   1.7.0
distro-info              1.1build1
entrypoints              0.4
fastapi                  0.95.0
ffmpy                    0.3.0
filelock                 3.10.7
fonttools                4.39.3
frozenlist               1.3.3
fsspec                   2023.3.0
gradio                   3.24.1
gradio_client            0.0.7
h11                      0.14.0
httpcore                 0.16.3
httplib2                 0.20.2
httpx                    0.23.3
huggingface-hub          0.13.4
hyperlink                21.0.0
idna                     3.3
importlib-metadata       4.6.4
incremental              21.3.0
jeepney                  0.7.1
Jinja2                   3.0.3
jsonpatch                1.32
jsonpointer              2.0
jsonschema               3.2.0
keyring                  23.5.0
kiwisolver               1.4.4
launchpadlib             1.10.16
lazr.restfulclient       0.14.4
lazr.uri                 1.0.6
linkify-it-py            2.0.0
lit                      16.0.0
loralib                  0.1.1
markdown-it-py           2.2.0
MarkupSafe               2.0.1
matplotlib               3.7.1
mdit-py-plugins          0.3.3
mdurl                    0.1.2
more-itertools           8.10.0
mpmath                   1.3.0
multidict                6.0.4
multiprocess             0.70.14
netifaces                0.11.0
networkx                 3.1
numpy                    1.24.2
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
oauthlib                 3.2.0
orjson                   3.8.9
packaging                23.0
pandas                   2.0.0
peft                     0.3.0.dev0
pexpect                  4.8.0
Pillow                   9.5.0
pip                      22.0.2
psutil                   5.9.4
ptyprocess               0.7.0
pyarrow                  11.0.0
pyasn1                   0.4.8
pyasn1-modules           0.2.1
pydantic                 1.10.7
pydub                    0.25.1
PyGObject                3.42.1
PyHamcrest               2.0.2
PyJWT                    2.4.0
pyOpenSSL                21.0.0
pyparsing                2.4.7
pyrsistent               0.18.1
pyserial                 3.5
python-apt               2.3.0+ubuntu2.1
python-dateutil          2.8.2
python-debian            0.1.43ubuntu1
python-multipart         0.0.6
pytz                     2022.1
PyYAML                   5.4.1
regex                    2023.3.23
requests                 2.25.1
responses                0.18.0
rfc3986                  1.5.0
SecretStorage            3.3.1
semantic-version         2.10.0
sentencepiece            0.1.97
service-identity         18.1.0
setuptools               59.6.0
six                      1.16.0
sniffio                  1.3.0
sos                      4.3
ssh-import-id            5.11
starlette                0.26.1
sympy                    1.11.1
systemd-python           234
tokenizers               0.13.3
toolz                    0.12.0
torch                    2.0.0
tqdm                     4.65.0
transformers             4.28.0.dev0
triton                   2.0.0
Twisted                  22.1.0
typing_extensions        4.5.0
tzdata                   2023.3
ubuntu-advantage-tools   27.9
uc-micro-py              1.0.1
ufw                      0.36.1
unattended-upgrades      0.1
urllib3                  1.26.5
uvicorn                  0.21.1
wadllib                  1.3.6
websockets               11.0.1
wheel                    0.37.1
xxhash                   3.2.0
yarl                     1.8.2
zipp                     1.0.0
zope.interface           5.4.0

Hmm, I can't find where the problem is... Has anyone run into this? Any help is appreciated. The environment is WSL with Python 3.10.6. Very strange..
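When `pip list` shows a package but `import` still fails, a likely suspect is that `pip` and `python3` belong to different interpreters or site-packages directories. The traceback above loads peft from `/usr/local/lib/python3.10/dist-packages`. The stdlib-only check below is a sketch for confirming which interpreter runs and where each module resolves. Comparing its output with `python3 -m pip --version` should show whether the two match.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report the running interpreter and where (if anywhere) `name` resolves."""
    spec = importlib.util.find_spec(name)  # None when the module is not importable
    location = spec.origin if spec is not None else "NOT FOUND"
    return f"python={sys.executable} | {name} -> {location}"


if __name__ == "__main__":
    for mod in ("torch", "transformers", "peft"):
        print(describe_module(mod))
```

If `transformers` shows `NOT FOUND` while `peft` resolves, installing with `python3 -m pip install ...` ties the install to the same interpreter that runs generate.py.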

Running generate.py fails with huggingface_hub.utils._validators.HFValidationError

Running python generate.py gives the following output:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ubuntu/CIGVicuna/venv/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Traceback (most recent call last):
  File "generate.py", line 331, in <module>
    tokenizer = LlamaTokenizer.from_pretrained(args.model_path)
  File "/home/ubuntu/CIGVicuna/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/home/ubuntu/CIGVicuna/venv/lib/python3.8/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/ubuntu/CIGVicuna/venv/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/ubuntu/CIGVicuna/venv/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/model/13B_hf'. Use `repo_type` argument if needed.
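The message suggests that `/model/13B_hf` was not found as a local directory. `from_pretrained` therefore fell back to treating the string as a Hub repo id, which it is not. A sketch for failing early with a clearer message before calling `LlamaTokenizer.from_pretrained(args.model_path)` follows. The repo-id pattern is a simplified assumption; huggingface_hub's own validation is stricter.

```python
import os
import re

# Simplified approximation of a Hub repo id ("name" or "namespace/name").
_REPO_ID = re.compile(r"^[A-Za-z0-9_.-]+(/[A-Za-z0-9_.-]+)?$")


def classify_model_path(path: str) -> str:
    """Return 'local', 'hub' or 'invalid' for a --model_path value."""
    if os.path.isdir(path):
        return "local"
    if _REPO_ID.match(path):
        return "hub"
    return "invalid"


def check_model_path(path: str) -> str:
    """Raise a readable error instead of letting the Hub validator complain."""
    kind = classify_model_path(path)
    if kind == "invalid":
        raise FileNotFoundError(
            f"{path!r} is not an existing directory and not a valid Hub repo id; "
            "check that the converted LLaMA weights really are at this path")
    return kind
```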

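Error with finetune_continue.sh, and wrong outputs from the fine-tuned model

Thanks for sharing! I ran into some problems I couldn't solve and would appreciate your help.

Running bash finetune_continue.sh on the original merge_sample.json fails with the error below. I set TOT_CUDA="0"; all other parameters and data are unchanged.
bash finetune.sh runs normally.
The error is definitely caused by adding --resume_from_checkpoint $lora_checkpoint \, where lora_checkpoint = "./lora-Vicuna/checkpoint-11600".

Our goal is to start from the pretrained checkpoint and fine-tune on a specialized domain. However, using finetune.sh directly makes the model answer Chinese questions with endlessly repeated English strings containing {/begin}, {/item}, etc. Is this behavior normal? Would using finetune_continue improve it?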

Environment
1. OS: CentOS 7.6
2. GPU: a single 3090
3. Python 3.8
4. CUDA 11.3

Error message:

Traceback (most recent call last):
  File "finetune.py", line 271, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/transformers/trainer.py", line 1659, in train   
    return inner_training_loop(
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/transformers/trainer.py", line 1926, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/transformers/trainer.py", line 2706, in training_step
    self.scaler.scale(loss).backward()
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward        
    torch.autograd.backward(
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply 
    return user_fn(self, *args)
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/usr/local/miniconda3/envs/chat/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 
1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across 
multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not
 change during training loop.
2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap 
the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes
 multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try
 to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 127 has been marked as ready twice. This means that multiple autograd engine  hooks have fired for this 
particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to 
either INFO or DETAIL to print parameter names for further debugging.

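About the corpus

For multi-turn dialogue training, the provided corpus contains:
dialogues between User:xxx and GPT:xxx
dialogues between User:xxx and ChatGPT:xxx
dialogues between User:xxx and Assistant:xxx
dialogues between Human:xxx and Assistant:xxx

The role names are inconsistent. Does this affect multi-turn training? Should the roles be unified?
If not, can any multi-turn dialogue corpus be merged in directly, as long as the format is correct?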
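Whether mixed role names hurt training is left open here. If you decide to unify them, a simple pre-processing pass can map every alias to one pair of names. In the sketch below, the alias lists and the `User`/`Assistant` target names are assumptions, and each turn is assumed to start at the beginning of a line.

```python
import re
from typing import List

# Assumed alias lists; extend them to cover whatever prefixes your corpus uses.
USER_ALIASES = ("User", "Human")
BOT_ALIASES = ("Assistant", "ChatGPT", "GPT")
_PREFIX = re.compile(r"^(%s):" % "|".join(USER_ALIASES + BOT_ALIASES))


def normalize_roles(lines: List[str], user: str = "User", bot: str = "Assistant") -> List[str]:
    """Rewrite each 'Alias:' turn prefix to one canonical user/bot name."""
    out = []
    for line in lines:
        m = _PREFIX.match(line)
        if m is None:
            out.append(line)  # continuation line or no role prefix: keep unchanged
            continue
        role = user if m.group(1) in USER_ALIASES else bot
        out.append(role + ":" + line[m.end():])
    return out
```

Mapping to the same names the inference prompt uses avoids a train/inference mismatch, whichever names you pick.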

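With the llama-13b-hf pretrained model, the learning rate drops to 0 during training. What is the cause?

The learning rate becomes 0: {'loss': 170.9925, 'learning_rate': 0.0, 'epoch': 1.33}

Fine-tuning the 13B LLaMA model: the loss is very large and the lr is 0 from the start. Is this normal?

I am fine-tuning the 13B LLaMA model on 700k samples with 4 GPUs, epoch=1~3. The logged loss is always very large, the initial lr is 0, and eval_loss is NaN.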

batch_size=256;
micro_batch_size=8;
eval_steps=200;
save_steps=200;
test_size = 10000;
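Two pieces of arithmetic help when reading these logs.

First, assume finetune.py follows the alpaca-lora convention: `batch_size // micro_batch_size` accumulation steps, divided by the number of DDP processes. The settings above then give 8 accumulation steps per GPU and about 2696 optimizer steps per epoch.

Second, a linear warmup/decay schedule (the shape of `transformers.get_linear_schedule_with_warmup`) starts at exactly 0 and returns to 0 at the last scheduled step. One plausible reading, not verified against this run: with fp16, if every step overflows, the gradient scaler skips the optimizer update and the Trainer also skips the scheduler step. In that case the learning rate would stay at its initial 0 while the loss stays huge and eval_loss is NaN. The 3e-4 base LR and the warmup length in the example are placeholders.

```python
import math


def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int = 1) -> int:
    """Accumulation steps per process, assuming the alpaca-lora convention."""
    return max(1, (batch_size // micro_batch_size) // world_size)


def optimizer_steps_per_epoch(num_train: int, batch_size: int) -> int:
    """Optimizer updates per epoch when the effective batch is `batch_size` samples."""
    return math.ceil(num_train / batch_size)


def linear_warmup_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Same shape as transformers.get_linear_schedule_with_warmup."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Numbers from this issue: 700k samples, test_size=10000, 4 GPUs.
accum = grad_accum_steps(256, 8, world_size=4)            # 8 micro-batches per GPU per update
steps = optimizer_steps_per_epoch(700_000 - 10_000, 256)  # 2696 updates per epoch
```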

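Resuming interrupted training: how to set resume_from_checkpoint

A beginner's question about resuming training: when restarting, which directory should resume_from_checkpoint point to?

My current finetune script is: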

DATA_PATH="./sample/merge.json" #"../dataset/instruction/guanaco_non_chat_mini_52K-utf8.json" #"./sample/merge_sample.json"
OUTPUT_PATH="my-lora-Vicuna"
MODEL_PATH="../llama-13b-hf/"
lora_checkpoint="../Chinese-Vicuna-lora-13b-belle-and-guanaco/"
TEST_SIZE=2000

python finetune.py \
--data_path $DATA_PATH \
--output_path $OUTPUT_PATH \
--model_path $MODEL_PATH \
--eval_steps 200 \
--save_steps 200 \
--test_size $TEST_SIZE

Training currently needs 240 hours.
Suppose I stop training now, and the OUTPUT_PATH="my-lora-Vicuna" directory looks like this:

my-lora-Vicuna/
├── checkpoint-200
│   ├── optimizer.pt
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
└── checkpoint-400
    ├── optimizer.pt
    ├── pytorch_model.bin
    ├── rng_state.pth
    ├── scaler.pt
    ├── scheduler.pt
    ├── trainer_state.json
    └── training_args.bin

2 directories, 14 files

To resume training, should resume_from_checkpoint be set to my-lora-Vicuna/checkpoint-400?
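Hugging Face's `Trainer.train(resume_from_checkpoint=...)` accepts either the path of a `checkpoint-N` folder or `True` to pick the last checkpoint in `output_dir`. It restores the optimizer, scheduler and trainer state saved there, and `transformers.trainer_utils.get_last_checkpoint` performs the same lookup. In the tree above, the newest folder is `my-lora-Vicuna/checkpoint-400`. How finetune.py forwards its flag to the Trainer is not shown here, so treat that part as an assumption. The stdlib sketch below finds the newest checkpoint by comparing step numbers numerically, so `checkpoint-1000` beats `checkpoint-400`; plain string sorting would pick 400.

```python
import os
import re
import tempfile
from typing import Optional

_CKPT = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-N subdirectory with the largest N, or None if there is none."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        m = _CKPT.match(name)
        path = os.path.join(output_dir, name)
        if m and os.path.isdir(path) and int(m.group(1)) > best_step:
            best_step, best_path = int(m.group(1)), path
    return best_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        for step in (200, 400, 1000):
            os.makedirs(os.path.join(out, f"checkpoint-{step}"))
        print(latest_checkpoint(out))  # .../checkpoint-1000
```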

Running generate.py reports that config.json cannot be found

ValueError: Can't find config.json at 'Chinese-VicunaChinese-Vicuna-lora-7b-belle-and-guanaco'
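The path in this message looks like two names glued together without a separator (`Chinese-Vicuna` + `Chinese-Vicuna-lora-7b-belle-and-guanaco`). That is what happens when a directory and a subfolder are concatenated as plain strings, or when a `/` is dropped from `--lora_path`. This is an inference from the string alone. The sketch below builds the path with `os.path.join` and checks for the adapter files up front. `adapter_config.json` and `adapter_model.bin` are the names from the Colab download log above; which names a given PEFT version actually expects is an assumption.

```python
import os
from typing import List


def join_path(base: str, name: str) -> str:
    # os.path.join inserts the separator that "base" + "name" silently drops.
    return os.path.join(base, name)


def missing_files(directory: str, required: List[str]) -> List[str]:
    """Names from `required` that are absent (all of them if `directory` does not exist)."""
    if not os.path.isdir(directory):
        return list(required)
    return [f for f in required if not os.path.isfile(os.path.join(directory, f))]


if __name__ == "__main__":
    lora_path = join_path("Chinese-Vicuna", "Chinese-Vicuna-lora-7b-belle-and-guanaco")
    print(lora_path, missing_files(lora_path, ["adapter_config.json", "adapter_model.bin"]))
```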

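Scenario setup + multi-turn dialogue

This afternoon I tested ChatRWKV, and its English performance seems decent.
When will Chinese-Vicuna support similar use cases?

First scenario + multi-turn dialogue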

Q: +reset
A: Chat reset.

Q: +qaYour name is GAGA. You are a naughty pupil who messes up at school every day. Today, the teacher called your parents to school to discuss your transfer.
A: A: You will have to work extra hard to get a good mark, GAGA. Your parents are very upset with you.

Q: who are you
A: A: I'm nobody important, GAGA. My mom says I'm a brat. If she knew what I was doing, she'd be so mad!

Q: how old are you
A: A: I'm only 13. Do you know how much that adds up to when you're a grown-up?

Q: Yeah! How was your performance at school?
A: A: I think I did alright, GAGA. But I need to work extra hard at the home-work, you know? I know how important your job is, GAGA. It's like you say - no one can put in as much time and effort as a parent.

Q: But they said your performance at school was terrible, and they called your parents in too
A: A: It's true, GAGA. I don't want to hurt my parents' feelings. But it's just so frustrating to have to do this. Maybe if I wasn't the one they call. Maybe if I didn't need to worry about them.

Q: What are your parents and teachers discussing?
A: A: Oh, GAGA. They want me to try to do better. They say I can't go on like this. But I have to. It's the only way I can get the grades I need.

Q: But I heard they were discussing transferring you
A: A: Yeah. It's bad enough having to do work, but to have to do it in front of everyone - it's so embarrassing. I feel like I'll never be able to do my best at school again.

Q: Do you have any plan?
A: A: I'm still thinking about it, GAGA. I know that having to perform in front of the parents, well, that's a hard thing to take. But I guess I have no choice. I'm trying my best, I really am.

Second scenario + multi-turn dialogue

Q: +reset
A: Chat reset.

Q: +qayou are a cat named Jasiber
A: yes, but I am a mother cat so I don't really do this to annoy my people. I do my own things. My mom thinks I am crazy but I really am not. I like to snuggle with my humans when I get bored.

Q: who are you
A: Jasiber the feral cat. When I am not with my humans I am with my litter box, my house, the outside, the grass, etc. I have a bunch of siblings. Mom and dad have me in the field. My siblings are named: Periwinkle, Raven, Lazy, Sandalee, Cleary, and Merle. I am going to be eight years old next year. I am a lot of cats.

Q: how long is your tail
A: 5 ft.

Q: What color is your feather
A: silver. It is the same color as my eyes.

Q: do you have feather?
A: i have a peacock feather which is a feather that looks like it is covered with stars because of all the white hairs. I also have a white bird feather.

Q: Is that your toy? The cats don't have feathers
A: Yes, it is a feather. When it rains it gets all wet. When it is hot, it gets all dry. When it is hot, it gets all dry. When it is cold, it gets all wet.When it is cold, it gets all wet. When it is really cold, it gets all wet. It is just too much for my feathers. I have to change it every day. Now I have three birds to help me and my feather is nice and clean. Now, I am waiting for the rain to come.

Q: who are you
A: I am a cat named Superbass, which means awesome. My sister is a long-haired black and white cat named Supermoo, and my dad is a hairless orange and white cat named Superbeagle. My brother is a white and black cat named Superyippie, and my grandpa is a striped tabby cat named Superball.

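Cannot run generate.sh or interaction.sh

After fine-tuning, I set the script parameters as described in the README, but prediction fails. Could you check whether I missed some configuration?
1. sh generate.sh

While debugging, I found that after the LLaMA model had been loaded onto the GPU, executing

left model as None.
2. sh interaction.sh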

Problem when running finetune.sh

There were missing keys in the checkpoint model loaded: ['base_model.model.model.embed_tokens.weight', 'base_model.model.model.layers.0.self_attn.q_proj.weight', 'base_model.model.model.layers.0.self_attn.k_proj.weight', 'base_model.model.model.layers.0.self_attn.v_proj.weight', ....]
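After training finishes, the program reports the error above when it loads the model parameters for prediction.
Looking at your code, it seems the model loads only the LoRA parameters and not the LLaMA parameters, which would explain this.
How should I handle this?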
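A LoRA checkpoint normally stores only the adapter tensors, which PEFT names with `lora_A` / `lora_B`. The frozen LLaMA weights come from the base model that is loaded separately. If that applies here, the listed missing keys are base-model weights that were never meant to be in the checkpoint, and the warning alone is not the failure. The sketch below partitions parameter names so you can confirm this. The `"lora_"` substring test is a heuristic assumption about PEFT's naming.

```python
from typing import Iterable, List, Tuple


def split_lora_keys(keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (adapter keys, base-model keys).

    Heuristic: PEFT's LoRA layers are named ...lora_A... / ...lora_B...,
    so any key containing "lora_" is treated as an adapter weight.
    """
    adapter, base = [], []
    for key in keys:
        (adapter if "lora_" in key else base).append(key)
    return adapter, base


def only_base_weights_missing(missing_keys: Iterable[str]) -> bool:
    """True when no adapter weight is among the missing keys."""
    adapter, _ = split_lora_keys(missing_keys)
    return not adapter
```

If `only_base_weights_missing` returns False, the adapter itself was not saved or not found, and that is worth investigating first.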

Error when training on a 3070 Ti: cublasLt ran into an error!

Training with the finetune.py script fails.
Command: python finetune.py --data_path merge.json --test_size 20
Training environment:
3070 Ti, 8 GB VRAM
pytorch 2.0.0+cu117
cuda 11.3

Error:
error detected
Traceback (most recent call last):
File "finetune.py", line 274, in
trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
File "/hy-tmp/transformers/src/transformers/trainer.py", line 1659, in train
return inner_training_loop(
File "/hy-tmp/transformers/src/transformers/trainer.py", line 1926, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/hy-tmp/transformers/src/transformers/trainer.py", line 2696, in training_step
loss = self.compute_loss(model, inputs)
File "/hy-tmp/transformers/src/transformers/trainer.py", line 2728, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 575, in forward
return self.base_model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 569, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
return module(*inputs, output_attentions, None)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 591, in forward
result = super().forward(x)
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!

Question about the corpus file being ASCII-encoded

Checking the format of merge.json with the file command gives:

$ file sample/merge.json
sample/merge.json: ASCII text, with very long lines, with no line terminators

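Looking at the content, the Chinese text is indeed stored as ASCII.
Was the merge.json corpus specially processed?
Or was the JSON file simply converted to ASCII encoding?

Why use ASCII encoding instead of UTF-8, so that the Chinese text is readable as-is?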
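What `file` reports is consistent with Python's `json.dump` default of `ensure_ascii=True`, which writes every non-ASCII character as a backslash-u escape. Whether merge.json was produced this way is an assumption. Either way, the data itself is unchanged: `json.load` decodes the escapes back to the same Chinese strings, so training is unaffected. To get human-readable UTF-8, re-dump with `ensure_ascii=False`:

```python
import json


def rewrite_as_utf8(src_path: str, dst_path: str) -> None:
    """Re-save a JSON file with non-ASCII characters written out as UTF-8 text."""
    with open(src_path, encoding="utf-8") as f:
        data = json.load(f)  # escapes are decoded back into normal str values here
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese characters instead of escaping them
        json.dump(data, f, ensure_ascii=False)


record = {"instruction": "你是谁", "output": "我是一个语言模型"}
escaped = json.dumps(record)                       # default ensure_ascii=True: pure ASCII
readable = json.dumps(record, ensure_ascii=False)  # Chinese kept as-is
```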

Multi-GPU training on a single machine keeps failing with timeout errors. Is there any way to fix this?

[E ProcessGroupNCCL.cpp:821] [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1800595 milliseconds before timing out.
localhost:10108:60893 [4] NCCL INFO [Service thread] Connection closed by localRank 4
Traceback (most recent call last):
localhost:10108:60278 [0] NCCL INFO comm 0x9ea4350 rank 4 nranks 7 cudaDev 4 busId a1000 - Abort COMPLETE
File "finetune.py", line 271, in
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
File "/root/miniconda3/envs/vicuna/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 4] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1800595 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:821] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1801815 milliseconds before timing out.
localhost:10101:60894 [3] NCCL INFO [Service thread] Connection closed by localRank 3
localhost:10101:59694 [0] NCCL INFO comm 0xa8d5900 rank 3 nranks 7 cudaDev 3 busId 81000 - Abort COMPLETE
Traceback (most recent call last):
File "finetune.py", line 271, in
trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)

Is this repo related to https://github.com/lm-sys/FastChat?

https://github.com/lm-sys/FastChat is also called Vicuna. Is Chinese-Vicuna built on top of it?

finetune error. Setup: a single RTX 4090 with 24 GB VRAM; corpus: the 663M merge.json downloaded from Baidu Netdisk

What GPU setup is needed to fine-tune on the 663M corpus?

/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                                                                                             | 0/16029 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 216, in <module>
    trainer.train()
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1903, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2659, in training_step
    self.scaler.scale(loss).backward()
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 127 has been marked as ready twice. This means that multiple autograd engine  hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
  0%|                                                                                                                                                             | 0/16029 [00:23<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 27252) of binary: /root/anaconda3/envs/Chinese-alpaca-lora/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/Chinese-alpaca-lora/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/run.py", line 794, in main
    run(args)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-24_14:27:50
  host      : DESKTOP-6KDJTBC.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 27252)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

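About the generated corpus

Hi, I noticed you use the corpus generated with https://github.com/LianjiaTech/BELLE.
I just looked through that project,
and there are two points I don't understand. Could you please explain?
1. Why is the seed task file zh_seed_tasks.json needed?
What is the purpose of the seed tasks?

2. When generating the data:

    pip install -r requirements.txt
    export OPENAI_API_KEY=YOUR_API_KEY
    python generate_instruction.py generate_instruction_following_data

What is the last argument, generate_instruction_following_data? Is it the file where the generated data is stored?
Thanks a lot!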

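Can the output follow a custom format?

How can I make it behave like ChatGPT and
answer in the format { action: "your action", expression: "your facial expression", speak: "what you say"}

and get answers in this format:

Q: You are a wandering swordsman who has just reached the end of a bridge, and you see a martial-world bully walking toward you from the other side. What do you do?
A: { action: "I steady my stance and prepare to fight", expression: "a focused, waiting expression", speak: "Friend, what brings you here? If you only want to roam the martial world, why make an enemy of me?"}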
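If the model is prompted to answer in the format above, the reply can be checked and parsed on the client side. The sketch below is a hypothetical helper (`parse_role_reply` and `FIELDS` are our names, not part of Chinese-Vicuna). It assumes the reply contains one brace-delimited object whose keys may be unquoted, as in the example, so `json.loads` would reject it and a regular expression is used instead. Whether a finetuned model follows the format reliably is a separate question; one option is to include examples in this format in the training data.

```python
import re
from typing import Dict, Optional

# Keys expected in the custom reply format (taken from the question above).
FIELDS = ("action", "expression", "speak")

# key: "value" pairs; the key may or may not be quoted, the value must be.
_PAIR = re.compile(r'"?(\w+)"?\s*:\s*"((?:[^"\\]|\\.)*)"')


def parse_role_reply(text: str) -> Optional[Dict[str, str]]:
    """Extract action/expression/speak from a reply, or return None if malformed."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None  # no brace-delimited object in the reply
    pairs = dict(_PAIR.findall(text[start:end + 1]))
    if any(key not in pairs for key in FIELDS):
        return None  # the model dropped or renamed a field
    return {key: pairs[key] for key in FIELDS}


if __name__ == "__main__":
    reply = ('{ action: "I steady my stance", expression: "focused", '
             'speak: "Friend, what brings you here?"}')
    print(parse_role_reply(reply))
```

Returning `None` instead of raising lets a chat loop simply re-ask the model when a reply does not match.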

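About pure C++ inference

Thanks for sharing. My C++ skills are limited, and I ran into the following problem when trying CPU inference. I'd appreciate any help.

The file checkpoint-final/ggml-model-f16.bin has been generated.
Running the following commands produces an error: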

cd tools/Vicuna.cpp
make chat

The error message is:

In file included from /usr/local/gcc/include/c++/5.2.0/random:35:0,
                 from utils.h:8,
                 from chat.cpp:3:
/usr/local/gcc/include/c++/5.2.0/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support for the \

There are also many other C++ errors like these:

utils.h:17:53: error: ‘std::thread’ has not been declared
     int32_t n_threads = std::min(16, (int32_t) std::thread::hardware_concurrency());
                                                     ^
utils.h:44:31: error: ‘mt19937’ is not a member of ‘std’
 std::string gpt_random_prompt(std::mt19937 & rng);

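About the corpus

[BELLE](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M)
publicly releases a multi-turn dialogue corpus of 0.8M (800K) samples.

Its format is:

instruction: the instruction (here, the Human/Assistant multi-turn dialogue context)
input: the input (always empty)
output: the output

Can this corpus be used for finetuning as-is, or do the multi-turn contexts inside instruction need to be extracted first?
If I want the finetuned model to handle multi-turn dialogue well, is it better to keep the current format and finetune on it directly?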
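Whether to keep the format depends on how the training script builds its prompts. If you do want the individual turns, here is a minimal sketch, assuming each `instruction` marks turns with `Human:` and `Assistant:` and ends with an empty `Assistant:` slot whose answer is the `output` field (our reading of the format described above). `split_turns` is a hypothetical helper; a literal `Human:` inside a message would be mis-split.

```python
import re
from typing import List, Tuple

# Speaker markers assumed to introduce each turn in the BELLE instruction field.
_TURN = re.compile(r"(Human|Assistant):")


def split_turns(instruction: str, output: str) -> List[Tuple[str, str]]:
    """Split a multi-turn instruction into (speaker, text) pairs, ending with output."""
    # re.split with a capturing group gives [prefix, speaker1, text1, speaker2, text2, ...]
    pieces = _TURN.split(instruction)
    turns = [(speaker, text.strip()) for speaker, text in zip(pieces[1::2], pieces[2::2])]
    if turns and turns[-1][0] == "Assistant" and not turns[-1][1]:
        # The trailing empty "Assistant:" slot is filled with the answer from output.
        turns[-1] = ("Assistant", output.strip())
    else:
        turns.append(("Assistant", output.strip()))
    return turns


if __name__ == "__main__":
    inst = "Human: Hi\nAssistant: Hello!\nHuman: Tell me a joke\nAssistant:"
    for speaker, text in split_turns(inst, "Sure."):
        print(f"{speaker}: {text}")
```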

Works on a single GPU, fails on multiple GPUs: raise Exception('cublasLt ran into an error!')

out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')

It is this part of bitsandbytes.functional that ends up with has_error == 1:

if formatB == 'col_turing':
    if dtype == torch.int32:
        has_error = lib.cigemmlt_turing_32(
            ptr, m, n, k, ptrA, ptrB, ptrC, ptrRowScale, lda, ldb, ldc
        )
    else:
        has_error = lib.cigemmlt_turing_8(
            ptr, m, n, k, ptrA, ptrB, ptrC, ptrRowScale, lda, ldb, ldc
        )
elif formatB == "col_ampere":
    if dtype == torch.int32:
        has_error = lib.cigemmlt_ampere_32(
            ptr, m, n, k, ptrA, ptrB, ptrC, ptrRowScale, lda, ldb, ldc
        )
    else:
        has_error = lib.cigemmlt_ampere_8(
            ptr, m, n, k, ptrA, ptrB, ptrC, ptrRowScale, lda, ldb, ldc
        )

The printed matrix tensors:
error detectedA: torch.Size([512, 4096]), B: torch.Size([4096, 4096]), C: (512, 4096); (lda, ldb, ldc): (c_int(16384), c_int(131072), c_int(16384)); (m, n, k): (c_int(512), c_int(4096), c_int(4096))

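At test time the answer never stops until the 256-token limit; the loss converges quickly to about 0.82 and then stops decreasing

Setup:
1. Finetuned llama-7b-hf with the finetune.sh script.
2. Training used two 32 GB V100 GPUs. I modified finetune.py, mainly to load and train the model in half precision; the changes are:
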
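import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the base model in fp16 instead of 8-bit (args and device_map are defined elsewhere in finetune.py)
model = LlamaForCausalLM.from_pretrained(
    args.model_path,
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map=device_map,
).half()

tokenizer = LlamaTokenizer.from_pretrained(
    args.model_path, add_eos_token=True
)

# int8 preparation is disabled since the model is no longer loaded in 8-bit
#model = prepare_model_for_int8_training(model)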

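For testing I used the generate.sh script and barely touched generate.py, only changing load_in_8bit to False.

3. The dataset is merge.json from the instruction dataset downloaded from the Baidu Netdisk link in this GitHub repo.

Problem description:

1. I trained for almost three epochs, but the loss had already converged from 2.x to about 0.82 within the first epoch and did not decrease any further over the next two. What loss values do you typically see?
2. When checking results with generate.sh, the model keeps generating and never seems to emit the EOS token. In the two examples below, the model rambles on for both questions until it hits the max new tokens limit.

3. Occasionally generation hangs while producing a reply, as shown in the image below.

Follow-up:
Training on two V100s works for me. I also tried the original configuration (8-bit) on two V100s; according to the bitsandbytes maintainers, 8-bit training now works on all GPUs, so that is as expected, but it is much slower than half precision. So when using finetune.sh I increased both batch_size and micro_batch_size fourfold and trained for one epoch; after testing, I still see the three problems above.

Has anyone run into similar problems? Any pointers would be much appreciated.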
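One possible reason a finetuned model never stops is that few training sequences actually end with the EOS token, so the model rarely sees it. The sketch mirrors, as we understand it, the pattern used by alpaca-lora-style `tokenize` helpers: truncate to the cutoff, then append EOS only if there is still room. Sequences longer than the cutoff (`CUTOFF_LEN = 256` in the finetune.py excerpt quoted further down this page) therefore carry no EOS. `tokenize_with_eos` and `eos_coverage` are hypothetical helpers working on plain token-id lists; the ids below are made up, and the real EOS id should be read from `tokenizer.eos_token_id` for your checkpoint.

```python
from typing import List, Sequence


def tokenize_with_eos(input_ids: Sequence[int], eos_token_id: int, cutoff_len: int) -> List[int]:
    """Truncate to cutoff_len and append EOS only when it still fits."""
    ids = list(input_ids[:cutoff_len])
    if (not ids or ids[-1] != eos_token_id) and len(ids) < cutoff_len:
        ids.append(eos_token_id)
    return ids


def eos_coverage(batch: Sequence[Sequence[int]], eos_token_id: int) -> float:
    """Fraction of sequences whose last token is EOS."""
    if not batch:
        return 0.0
    ended = sum(1 for ids in batch if ids and ids[-1] == eos_token_id)
    return ended / len(batch)


if __name__ == "__main__":
    EOS = 2  # placeholder id; check tokenizer.eos_token_id for your checkpoint
    short = tokenize_with_eos([10, 11, 12], EOS, cutoff_len=8)
    truncated = tokenize_with_eos(list(range(100, 120)), EOS, cutoff_len=8)
    print(short, truncated, eos_coverage([short, truncated], EOS))
```

Running `eos_coverage` over the tokenized training set gives a quick sense of how often the model is shown where an answer ends.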

13B finetune error: AssertionError: No inf checks were recorded for this optimizer.

13B finetune: AssertionError: No inf checks were recorded for this optimizer.

0%| | 0/3 [00:00<?, ?it/s]Traceback (most recent call last):
  File "finetune_multichoice_tt_til0305.py", line 287, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/home/ubuntu/miniconda3/envs/ghostaienv/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/ubuntu/miniconda3/envs/ghostaienv/lib/python3.8/site-packages/transformers/trainer.py", line 1991, in _inner_training_loop
    self.scaler.step(self.optimizer)
  File "/home/ubuntu/miniconda3/envs/ghostaienv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 368, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No in

7B works fine; the environment is V100.

Has anyone run into a similar problem?
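In PyTorch's `GradScaler.step`, this assertion fires when `unscale_` found no gradients to check, i.e. none of the optimizer's parameters received a gradient in that step. A first thing to verify on the 13B setup is therefore whether any trainable (LoRA) parameter actually gets a gradient. `summarize_trainable` is a hypothetical, duck-typed helper: it accepts `model.named_parameters()` from PyTorch, or any `(name, obj)` pairs whose objects have `.requires_grad`, `.grad` and `.numel()`.

```python
from typing import Any, Dict, Iterable, Tuple


def summarize_trainable(named_params: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Count trainable parameters and list trainable ones that have no gradient yet."""
    total = trainable = 0
    missing_grad = []
    for name, param in named_params:
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
            if param.grad is None:
                missing_grad.append(name)
    return {"total": total, "trainable": trainable, "missing_grad": missing_grad}

# Typical use (not executed here): call it right after loss.backward()
#   info = summarize_trainable(model.named_parameters())
#   print(info["trainable"], len(info["missing_grad"]))
# If trainable == 0, or every trainable name is in missing_grad, GradScaler has nothing to check.
```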

undefined reference to `ggml_new_tensor_1d' `ggml_new_tensor_2d'

Running make chat fails with the following error:
g++ chat.cpp -o chat
/usr/bin/ld: /tmp/ccCnl7Fq.o: in function llama_model_load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llama_model&, gpt_vocab&, int)': chat.cpp:(.text+0x62c): undefined reference to ggml_type_sizef'
/usr/bin/ld: chat.cpp:(.text+0x6cc): undefined reference to ggml_type_sizef' /usr/bin/ld: chat.cpp:(.text+0x778): undefined reference to ggml_type_sizef'
/usr/bin/ld: chat.cpp:(.text+0x828): undefined reference to ggml_type_sizef' /usr/bin/ld: chat.cpp:(.text+0x8e8): undefined reference to ggml_type_sizef'
/usr/bin/ld: /tmp/ccCnl7Fq.o:chat.cpp:(.text+0x9a8): more undefined references to ggml_type_sizef' follow /usr/bin/ld: /tmp/ccCnl7Fq.o: in function llama_model_load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, llama_model&, gpt_vocab&, int)':
chat.cpp:(.text+0x10cb): undefined reference to ggml_init' /usr/bin/ld: chat.cpp:(.text+0x11a1): undefined reference to ggml_new_tensor_2d'
/usr/bin/ld: chat.cpp:(.text+0x11c9): undefined reference to ggml_new_tensor_1d' /usr/bin/ld: chat.cpp:(.text+0x11f8): undefined reference to ggml_new_tensor_2d'
/usr/bin/ld: chat.cpp:(.text+0x13c0): undefined reference to ggml_new_tensor_1d' /usr/bin/ld: chat.cpp:(.text+0x13ee): undefined reference to ggml_new_tensor_2d'
/usr/bin/ld: chat.cpp:(.text+0x141d): undefined reference to ggml_new_tensor_2d' /usr/bin/ld: chat.cpp:(.text+0x144c): undefined reference to ggml_new_tensor_2d'
/usr/bin/ld: chat.cpp:(.text+0x147b): undefined reference to ggml_new_tensor_2d' /usr/bin/ld: chat.cpp:(.text+0x14a3): undefined reference to ggml_new_tensor_1d'
/usr/bin/ld: chat.cpp:(.text+0x14d2): undefined reference to ggml_new_tensor_2d' /usr/bin/ld: chat.cpp:(.text+0x1501): undefined reference to ggml_new_tensor_2d'
/usr/bin/ld: chat.cpp:(.text+0x1530): undefined reference to ggml_new_tensor_2d' /usr/bin/ld: chat.cpp:(.text+0x1bd3): undefined reference to ggml_new_tensor_1d'
/usr/bin/ld: chat.cpp:(.text+0x1bfb): undefined reference to ggml_new_tensor_1d' /usr/bin/ld: chat.cpp:(.text+0x1c19): undefined reference to ggml_nbytes'
/usr/bin/ld: chat.cpp:(.text+0x1c2f): undefined reference to ggml_nbytes' /usr/bin/ld: chat.cpp:(.text+0x2325): undefined reference to ggml_nelements'
/usr/bin/ld: chat.cpp:(.text+0x238c): undefined reference to ggml_nelements' /usr/bin/ld: chat.cpp:(.text+0x2659): undefined reference to ggml_type_size'
/usr/bin/ld: chat.cpp:(.text+0x266f): undefined reference to ggml_type_size' /usr/bin/ld: chat.cpp:(.text+0x2685): undefined reference to ggml_type_size'
/usr/bin/ld: chat.cpp:(.text+0x26c6): undefined reference to ggml_type_size' /usr/bin/ld: chat.cpp:(.text+0x2772): undefined reference to ggml_blck_size'
/usr/bin/ld: chat.cpp:(.text+0x2792): undefined reference to ggml_nbytes' /usr/bin/ld: chat.cpp:(.text+0x27be): undefined reference to ggml_nbytes'
/usr/bin/ld: chat.cpp:(.text+0x2826): undefined reference to ggml_nbytes' /usr/bin/ld: chat.cpp:(.text+0x285a): undefined reference to ggml_nbytes'
/usr/bin/ld: chat.cpp:(.text+0x2883): undefined reference to ggml_nbytes' /usr/bin/ld: chat.cpp:(.text+0x28b2): undefined reference to ggml_blck_size'
/usr/bin/ld: chat.cpp:(.text+0x28d2): undefined reference to ggml_nbytes' /usr/bin/ld: chat.cpp:(.text+0x2913): undefined reference to ggml_nbytes'
/usr/bin/ld: chat.cpp:(.text+0x29a8): undefined reference to ggml_blck_size' /usr/bin/ld: chat.cpp:(.text+0x29c3): undefined reference to ggml_type_size'
/usr/bin/ld: chat.cpp:(.text+0x2a57): undefined reference to ggml_blck_size' /usr/bin/ld: chat.cpp:(.text+0x2a72): undefined reference to ggml_type_size'
/usr/bin/ld: chat.cpp:(.text+0x2b06): undefined reference to ggml_blck_size' /usr/bin/ld: chat.cpp:(.text+0x2b21): undefined reference to ggml_type_size'
/usr/bin/ld: chat.cpp:(.text+0x2bb8): undefined reference to ggml_nbytes' /usr/bin/ld: /tmp/ccCnl7Fq.o: in function llama_eval(llama_model const&, int, int, std::vector<int, std::allocator > const&, std::vector<float, std::allocator >&, unsigned long&)':
chat.cpp:(.text+0x34ae): undefined reference to ggml_init' /usr/bin/ld: chat.cpp:(.text+0x34f4): undefined reference to ggml_new_tensor_1d'
/usr/bin/ld: chat.cpp:(.text+0x3513): undefined reference to ggml_element_size' /usr/bin/ld: chat.cpp:(.text+0x3569): undefined reference to ggml_get_rows'
/usr/bin/ld: chat.cpp:(.text+0x35b3): undefined reference to ggml_rms_norm' /usr/bin/ld: chat.cpp:(.text+0x35f4): undefined reference to ggml_repeat'
/usr/bin/ld: chat.cpp:(.text+0x3610): undefined reference to ggml_mul' /usr/bin/ld: chat.cpp:(.text+0x3652): undefined reference to ggml_mul_mat'
/usr/bin/ld: chat.cpp:(.text+0x3694): undefined reference to ggml_mul_mat' /usr/bin/ld: chat.cpp:(.text+0x36d6): undefined reference to ggml_mul_mat'
/usr/bin/ld: chat.cpp:(.text+0x36fd): undefined reference to ggml_element_size' /usr/bin/ld: chat.cpp:(.text+0x3753): undefined reference to ggml_view_1d'
/usr/bin/ld: chat.cpp:(.text+0x376d): undefined reference to ggml_element_size' /usr/bin/ld: chat.cpp:(.text+0x37c3): undefined reference to ggml_view_1d'
/usr/bin/ld: chat.cpp:(.text+0x37ea): undefined reference to ggml_cpy' /usr/bin/ld: chat.cpp:(.text+0x37ff): undefined reference to ggml_build_forward_expand'
/usr/bin/ld: chat.cpp:(.text+0x381f): undefined reference to ggml_cpy' /usr/bin/ld: chat.cpp:(.text+0x3834): undefined reference to ggml_build_forward_expand'
/usr/bin/ld: chat.cpp:(.text+0x386a): undefined reference to ggml_new_tensor_3d' /usr/bin/ld: chat.cpp:(.text+0x3886): undefined reference to ggml_cpy'
/usr/bin/ld: chat.cpp:(.text+0x38aa): undefined reference to ggml_rope' /usr/bin/ld: chat.cpp:(.text+0x38d2): undefined reference to ggml_permute'
/usr/bin/ld: chat.cpp:(.text+0x391c): undefined reference to ggml_element_size' /usr/bin/ld: chat.cpp:(.text+0x3963): undefined reference to ggml_view_1d'
/usr/bin/ld: chat.cpp:(.text+0x3983): undefined reference to ggml_reshape_3d' /usr/bin/ld: chat.cpp:(.text+0x39a7): undefined reference to ggml_rope'
/usr/bin/ld: chat.cpp:(.text+0x39cf): undefined reference to ggml_permute' /usr/bin/ld: chat.cpp:(.text+0x39f6): undefined reference to ggml_mul_mat'
/usr/bin/ld: chat.cpp:(.text+0x3a3d): undefined reference to ggml_new_f32' /usr/bin/ld: chat.cpp:(.text+0x3a59): undefined reference to ggml_scale'
/usr/bin/ld: chat.cpp:(.text+0x3a7f): undefined reference to ggml_diag_mask_inf' /usr/bin/ld: chat.cpp:(.text+0x3a9f): undefined reference to ggml_soft_max'
/usr/bin/ld: chat.cpp:(.text+0x3ae9): undefined reference to ggml_element_size' /usr/bin/ld: chat.cpp:(.text+0x3b30): undefined reference to ggml_view_1d'
/usr/bin/ld: chat.cpp:(.text+0x3b50): undefined reference to ggml_reshape_3d' /usr/bin/ld: chat.cpp:(.text+0x3b78): undefined reference to ggml_permute'
/usr/bin/ld: chat.cpp:(.text+0x3b9f): undefined reference to ggml_mul_mat' /usr/bin/ld: chat.cpp:(.text+0x3bd2): undefined reference to ggml_permute'
/usr/bin/ld: chat.cpp:(.text+0x3bf9): undefined reference to ggml_new_tensor_2d' /usr/bin/ld: chat.cpp:(.text+0x3c15): undefined reference to ggml_cpy'
/usr/bin/ld: chat.cpp:(.text+0x3c57): undefined reference to ggml_mul_mat' /usr/bin/ld: chat.cpp:(.text+0x3c7e): undefined reference to ggml_add'
/usr/bin/ld: chat.cpp:(.text+0x3c9e): undefined reference to ggml_rms_norm' /usr/bin/ld: chat.cpp:(.text+0x3ce0): undefined reference to ggml_repeat'
/usr/bin/ld: chat.cpp:(.text+0x3cfc): undefined reference to ggml_mul' /usr/bin/ld: chat.cpp:(.text+0x3d3e): undefined reference to ggml_mul_mat'
/usr/bin/ld: chat.cpp:(.text+0x3d80): undefined reference to ggml_mul_mat' /usr/bin/ld: chat.cpp:(.text+0x3da0): undefined reference to ggml_silu'
/usr/bin/ld: chat.cpp:(.text+0x3dc7): undefined reference to ggml_mul' /usr/bin/ld: chat.cpp:(.text+0x3e09): undefined reference to ggml_mul_mat'
/usr/bin/ld: chat.cpp:(.text+0x3e30): undefined reference to ggml_add' /usr/bin/ld: chat.cpp:(.text+0x3e6a): undefined reference to ggml_rms_norm'
/usr/bin/ld: chat.cpp:(.text+0x3e95): undefined reference to ggml_repeat' /usr/bin/ld: chat.cpp:(.text+0x3eb1): undefined reference to ggml_mul'
/usr/bin/ld: chat.cpp:(.text+0x3edc): undefined reference to ggml_mul_mat' /usr/bin/ld: chat.cpp:(.text+0x3efc): undefined reference to ggml_build_forward_expand'
/usr/bin/ld: chat.cpp:(.text+0x3f15): undefined reference to ggml_graph_compute' /usr/bin/ld: chat.cpp:(.text+0x3f4f): undefined reference to ggml_get_data'
/usr/bin/ld: chat.cpp:(.text+0x3fa5): undefined reference to ggml_used_mem' /usr/bin/ld: chat.cpp:(.text+0x3fd2): undefined reference to ggml_free'
/usr/bin/ld: /tmp/ccCnl7Fq.o: in function llama_print_system_info()': chat.cpp:(.text+0x40d1): undefined reference to ggml_cpu_has_avx'
/usr/bin/ld: chat.cpp:(.text+0x414e): undefined reference to ggml_cpu_has_avx2' /usr/bin/ld: chat.cpp:(.text+0x41cb): undefined reference to ggml_cpu_has_avx512'
/usr/bin/ld: chat.cpp:(.text+0x4248): undefined reference to ggml_cpu_has_fma' /usr/bin/ld: chat.cpp:(.text+0x42c5): undefined reference to ggml_cpu_has_neon'
/usr/bin/ld: chat.cpp:(.text+0x4342): undefined reference to ggml_cpu_has_arm_fma' /usr/bin/ld: chat.cpp:(.text+0x43bf): undefined reference to ggml_cpu_has_f16c'
/usr/bin/ld: chat.cpp:(.text+0x443c): undefined reference to ggml_cpu_has_fp16_va' /usr/bin/ld: chat.cpp:(.text+0x44b9): undefined reference to ggml_cpu_has_wasm_simd'
/usr/bin/ld: chat.cpp:(.text+0x4536): undefined reference to ggml_cpu_has_blas' /usr/bin/ld: chat.cpp:(.text+0x45b3): undefined reference to ggml_cpu_has_sse3'
/usr/bin/ld: chat.cpp:(.text+0x4630): undefined reference to ggml_cpu_has_vsx' /usr/bin/ld: /tmp/ccCnl7Fq.o: in function main':
chat.cpp:(.text+0x4a7b): undefined reference to ggml_time_init' /usr/bin/ld: chat.cpp:(.text+0x4a80): undefined reference to ggml_time_us'
/usr/bin/ld: chat.cpp:(.text+0x4b0d): undefined reference to gpt_params_parse(int, char**, gpt_params&)' /usr/bin/ld: chat.cpp:(.text+0x4bb2): undefined reference to gpt_random_prompt[abi:cxx11](std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>&)'
/usr/bin/ld: chat.cpp:(.text+0x4c0c): undefined reference to ggml_time_us' /usr/bin/ld: chat.cpp:(.text+0x4c8b): undefined reference to ggml_time_us'
/usr/bin/ld: chat.cpp:(.text+0x4d6c): undefined reference to llama_tokenize(gpt_vocab const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)' /usr/bin/ld: chat.cpp:(.text+0x4dd8): undefined reference to llama_tokenize(gpt_vocab const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool)'
/usr/bin/ld: chat.cpp:(.text+0x4e44): undefined reference to llama_tokenize(gpt_vocab const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)' /usr/bin/ld: chat.cpp:(.text+0x51b6): undefined reference to ggml_time_us'
/usr/bin/ld: chat.cpp:(.text+0x5296): undefined reference to ggml_time_us' /usr/bin/ld: chat.cpp:(.text+0x536e): undefined reference to ggml_time_us'
/usr/bin/ld: chat.cpp:(.text+0x5420): undefined reference to llama_sample_top_p_top_k(gpt_vocab const&, float const*, std::vector<int, std::allocator<int> >&, double, int, double, double, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>&)' /usr/bin/ld: chat.cpp:(.text+0x548c): undefined reference to ggml_time_us'
/usr/bin/ld: chat.cpp:(.text+0x5a79): undefined reference to `llama_tokenize(gpt_vocab const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool)'
collect2: error: ld returned 1 exit status
make: *** [: chat] Error 1

On a V100, fine-tuning by following the instructions exactly produces garbled output. It seems to be caused by 8-bit.

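What is the point of evaluating with datasets in llama_quant.py?

1) It seems to evaluate with English datasets. Can this step be skipped?
2) How can I evaluate with my own dataset?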
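Assuming the evaluation in llama_quant.py is a perplexity check in the style of GPTQ (we have not verified the script), evaluating on your own text comes down to collecting per-token negative log-likelihoods from the model and exponentiating their mean. The helpers below are hypothetical. With a Hugging Face causal LM, the `loss` returned for a chunk is already the mean NLL over its predicted tokens, so multiply it by that token count before combining chunks.

```python
import math
from typing import Iterable, Tuple


def perplexity(token_nlls: Iterable[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nlls = list(token_nlls)
    if not nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(nlls) / len(nlls))


def perplexity_from_chunks(chunks: Iterable[Tuple[float, int]]) -> float:
    """Combine (mean_loss, n_tokens) pairs, e.g. one per evaluated text chunk."""
    total_nll, total_tokens = 0.0, 0
    for mean_loss, n_tokens in chunks:
        total_nll += mean_loss * n_tokens  # back to a sum of per-token NLLs
        total_tokens += n_tokens
    if total_tokens == 0:
        raise ValueError("need at least one token")
    return math.exp(total_nll / total_tokens)
```

Comparing this number for the fp16 and the quantized model on the same text shows how much quality quantization costs on your own data.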

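About continuing training (continue finetuning)

Hello, thanks for sharing.
Is it correct that, for a vertical-domain application, I can build the corpus, replace data_path in the .sh script, and run it to continue training from the model weights you shared, ending up with a model adapted to that domain corpus?
Thanks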
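Whatever the training command, the domain corpus has to be in a format the script reads. A minimal sketch, assuming finetune.py consumes records with `instruction`, `input` and `output` keys like the corpora discussed on this page, and loads the file through Hugging Face `datasets`' JSON loader, which accepts JSON Lines. `write_instruction_jsonl` is a hypothetical helper; pointing the script at the shared LoRA weights to continue from them is not covered here.

```python
import json
from typing import Iterable, Tuple


def write_instruction_jsonl(pairs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (question, answer) pairs as instruction/input/output JSON Lines."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"instruction": question, "input": "", "output": answer}
            # ensure_ascii=False keeps Chinese text readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```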

Error when running python chat.py

Environment: Windows 11, conda virtual environment, Python 3.8. pip install completed, but running python chat.py gives the following error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
C:\Users\tuzhiyong\.conda\envs\python38\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
./lora-Vicuna/checkpoint-3000\adapter_model.bin
./lora-Vicuna/checkpoint-3000\pytorch_model.bin
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 33/33 [00:49<00:00, 1.49s/it]
Traceback (most recent call last):
  File "C:\Users\tuzhiyong\.conda\envs\python38\lib\site-packages\peft\utils\config.py", line 99, in from_pretrained
    config_file = hf_hub_download(pretrained_model_name_or_path, CONFIG_NAME)
  File "C:\Users\tuzhiyong\.conda\envs\python38\lib\site-packages\huggingface_hub\utils\_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\Users\tuzhiyong\.conda\envs\python38\lib\site-packages\huggingface_hub\utils\_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './lora-Vicuna/checkpoint-3000'. Use repo_type argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\chat.py", line 81, in <module>
    model = SteamGenerationMixin.from_pretrained(
  File "I:\Chinese-Vicuna\utils.py", line 670, in from_pretrained
    config = LoraConfig.from_pretrained(model_id)
  File "C:\Users\tuzhiyong\.conda\envs\python38\lib\site-packages\peft\utils\config.py", line 101, in from_pretrained
    raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")
ValueError: Can't find 'adapter_config.json' at './lora-Vicuna/checkpoint-3000'

It fails with or without a proxy/VPN; I have retried many times and the problem persists.
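Judging from the traceback, peft's `LoraConfig.from_pretrained` did not find `adapter_config.json` inside `./lora-Vicuna/checkpoint-3000`, fell back to treating the path as a Hugging Face Hub repo id, and failed validation; that would explain why a proxy makes no difference. The two paths printed before loading only mention weight files. A hypothetical pre-flight check (`missing_lora_files` is our name) can be run before chat.py; `pytorch_model.bin` is accepted as a weight file only because chat.py appears to look for it as well. One way to get the config, assuming intermediate checkpoint folders don't include it, is to copy the `adapter_config.json` saved with the final LoRA output.

```python
from pathlib import Path
from typing import List, Union

CONFIG_FILE = "adapter_config.json"
# Any one of these weight files is assumed to be enough.
WEIGHT_FILES = ("adapter_model.bin", "pytorch_model.bin", "adapter_model.safetensors")


def missing_lora_files(lora_dir: Union[str, Path]) -> List[str]:
    """Return what is missing for loading lora_dir as a local LoRA adapter."""
    d = Path(lora_dir)
    missing = []
    if not (d / CONFIG_FILE).is_file():
        missing.append(CONFIG_FILE)
    if not any((d / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


if __name__ == "__main__":
    print(missing_lora_files("./lora-Vicuna/checkpoint-3000"))
```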

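The 4-bit quantized model outputs garbled text

I used llama_quant.py without dataset evaluation, and the quantized model's output is garbled. What could be the cause?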

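Would adding an extra embedding layer improve Chinese understanding?

Pretraining stage: pretrain on top of the original LLaMA weights,
in two stages:

Stage 1: freeze the transformer parameters of the model and train only the embeddings, adapting the newly added Chinese word vectors while disturbing the original model as little as possible.
Goal: learn the word vectors of the newly added Chinese vocabulary to further improve basic Chinese semantic ability.
Stage 2: use LoRA to add LoRA weights (adapters) to the model, updating the LoRA parameters while continuing to train the embeddings.

I saw this in the repo https://github.com/ymcui/Chinese-LLaMA-Alpaca,
and tested it too. After quantizing to 4-bit, the results look like this (there are still problems, but grammar and multi-turn dialogue are clearly better by comparison):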

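> You are a wandering swordsman who has just reached a bridge. You see a martial-world bully coming toward you from the other side, and anger surges up in your heart.
Hello, I am a wandering swordsman. I have just arrived on a bridge in this city and suddenly saw a martial-world bully walking past at the end of the bridge. My anger was immediately stirred up.
> Who are you?
I am a wandering swordsman. I roam this city in search of my own destiny and freedom. Just as I stopped on a bridge in this city, I suddenly saw a martial-world bully walking past at the end of the bridge…
> What do you want to do to him?
I want to kill him! He is a shameless, despicable man. He deserves severe punishment. I must act immediately to protect myself and my freedom from him…
> But he is stronger than you. Can you beat him?
I have no choice but to challenge this bully! Although he is a strong guy, I have the courage and determination to defeat him. I must act immediately to protect myself and my own freedom from him…
> But what if you are defeated?
If I am defeated by him, I will challenge this
> But what if he simply tries to kill you?
I have no choice but to challenge it! Although he is a strong guy, I have the courage and determination to defeat it. I must act immediately to protect myself and my own freedom from him…
> When enemies meet on a narrow road, the brave one wins. Challenge him bravely; you can definitely beat him.
I have no choice but to challenge it! Although he is a strong guy, I have the courage and determination to defeat it. I must act immediately to protect myself and my own freedom from him…
> Go on.
I will bravely challenge him and try to kill him! Although he is a strong guy, I have the courage and determination to defeat it. I must act immediately to protect myself and my own freedom from him…
> Stop talking big and go.
I will bravely challenge him and try to kill him! Although he is a strong guy, I have the courage and determination to defeat it. I must act immediately to protect myself and my own freedom from him…
> You draw your sword and thrust at him; he doesn't even dodge and catches the blade bare-handed.
I will bravely challenge it and try to
> But he has your sword under control. What do you have to do?
I will bravely challenge it and do my best to defeat it!
> The bully doesn't just hit people, he also curses; his mouth is full of filth.
I will bravely challenge it and try to kill it. I will defeat it with my courage and determination!
> You are already losing and can't win the fight. Curse back at him first.
I will surely challenge him bravely and do my best to knock him down! I will overcome it with my own courage and determination! (laughter).
> The bully suddenly runs out of strength and, unexpectedly, begs you for mercy. He is all bark and no bite.
I will bravely challenge him and do my best to knock it down! I will overcome it with my own courage and determination! (laughter).
> He has already admitted defeat.
I will surely challenge it bravely and try to defeat it! I will overcome it with my courage and determination! (laughter).
> You have already defeated him.
I will bravely challenge him and do my best to knock it down! I am going to overcome it with my own courage and determination! (laughter).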
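The two-stage recipe quoted above from Chinese-LLaMA-Alpaca comes down to choosing which parameters get `requires_grad=True` in each stage. This is a sketch under assumptions: parameter names follow Hugging Face's LLaMA implementation (`embed_tokens`, `lm_head`) and peft's LoRA naming (`lora_` in the name). Whether `lm_head` is trained alongside the input embeddings is a choice exposed as a flag, not something the quoted description specifies. The vocabulary must already be extended and the embeddings resized before stage 1; that step is not shown, and `trainable_names` is a hypothetical helper.

```python
from typing import Iterable, List


def trainable_names(param_names: Iterable[str], stage: int, train_lm_head: bool = True) -> List[str]:
    """Names to train in stage 1 (embeddings only) or stage 2 (embeddings + LoRA)."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    selected = []
    for name in param_names:
        is_embedding = "embed_tokens" in name or (train_lm_head and "lm_head" in name)
        is_lora = "lora_" in name
        if is_embedding or (stage == 2 and is_lora):
            selected.append(name)
    return selected

# Applying it to a PyTorch model (not executed here):
#   chosen = set(trainable_names((n for n, _ in model.named_parameters()), stage=1))
#   for name, param in model.named_parameters():
#       param.requires_grad = name in chosen
```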

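Is Vicuna.cpp derived from Llama.cpp or Alpaca.cpp? Are models finetuned on top of Alpaca and on top of Llama with Chinese embeddings interchangeable?

Vicuna.cpp

Is Vicuna.cpp derived from Llama.cpp or Alpaca.cpp?
Are models finetuned on top of Alpaca and on top of Llama with Chinese embeddings interchangeable?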

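Hi, did you use langchain for contextual dialogue?

For context understanding here, did you use https://github.com/deep-diver/Alpaca-LoRA-Serve
https://github.com/Facico/Chinese-Vicuna/blob/master/docs/performance.md
or langchain?
I don't quite understand the difference between multi-turn dialogue in langchain and in Alpaca-LoRA-Serve. I'd appreciate an explanation, thanks!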

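Is there an error in the readme's environment setup guide?

1. Which script and command did you use: as shown in the screenshot above
2. What are your parameters (script arguments, command arguments): none
3. Did you modify our code: no
4. Which dataset did you use: the default in the readme

The finetune script never runs properly; it keeps reporting environment errors and cannot find CUDA.

Next, you can describe your problem from the environment side; the readme may already describe related problems and their solutions:
1. Operating system: Windows 10, WSL2
2. Which GPU(s) and how many: NVIDIA 2080 Ti, 2
3. Python version: 3.8
4. Python library versions: installed following the readme

You can also describe your problem from the runtime side:
1. What is the error message and which code raises it (you can send us the full error output)
It keeps reporting CUDA errors, GPU not found, and so on. I followed the readme strictly when installing the environment, but it still doesn't work. I'm not sure whether "torch 1.13.1" in the readme means PyTorch (is it an abbreviation?). The readme also asks for CUDA 12; is that correct? PyTorch 1.13.1 should go with CUDA 11.7, and the CUDA 12 I installed following the readme does not work.

2. Are the GPU and CPU working properly: no; the GPU is not detected, and CPU mode does not run either.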
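When a finetune script reports that it cannot find CUDA, it helps to first check what PyTorch itself sees. `torch.version.cuda` is the CUDA version the installed wheel was built against (`None` for CPU-only builds), and that, together with the driver, is what matters for prebuilt wheels; the toolkit version reported by nvcc mostly matters only when compiling extensions. To our knowledge, PyTorch 1.13.1's official wheels target CUDA 11.6 and 11.7. The helper name `cuda_report` is ours.

```python
from typing import Any, Dict


def cuda_report() -> Dict[str, Any]:
    """Summarize what the installed PyTorch build can see."""
    try:
        import torch
    except ImportError:
        return {"torch_version": None, "built_with_cuda": None,
                "cuda_available": False, "device_count": 0}
    available = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        # CUDA version the wheel was compiled against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, a CPU-only wheel was installed; if it is set but `cuda_available` is `False`, the driver (or the WSL2 GPU setup) is the next thing to check.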

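Question about the LLaMA vocabulary

Hello! Thanks for open-sourcing such a great project. I have a question: many people say the LLaMA vocabulary contains only a few hundred Chinese characters, which is clearly incomplete. How do you approach this?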
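The claim is easy to check on your own copy of the tokenizer. A minimal sketch: count vocabulary entries containing at least one CJK Unified Ideograph (the basic block U+4E00 to U+9FFF plus Extension A, U+3400 to U+4DBF; other blocks are ignored). With a Hugging Face tokenizer, `tokenizer.get_vocab()` returns a token-to-id dict that can be passed in directly. Characters missing from LLaMA's vocabulary can still be encoded, because its SentencePiece model falls back to UTF-8 byte tokens such as `<0xE4>`, at the cost of several tokens per character. The helper names are ours.

```python
from typing import Iterable


def is_cjk(ch: str) -> bool:
    """True for CJK Unified Ideographs (basic block and Extension A)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def count_cjk_tokens(vocab: Iterable[str]) -> int:
    """Number of vocabulary entries containing at least one CJK ideograph."""
    return sum(1 for token in vocab if any(is_cjk(ch) for ch in token))

# Usage with transformers (not executed here; the path is a placeholder):
#   from transformers import LlamaTokenizer
#   tok = LlamaTokenizer.from_pretrained("path/to/llama-7b-hf")
#   print(count_cjk_tokens(tok.get_vocab()))
```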

generate_quant.py fails to run

I want to run a quantized model and get the error: ModuleNotFoundError: No module named 'gptq'

generate_quant.py does not run properly. Are some files missing?

torch.cuda.OutOfMemoryError

I'm using the 13B model with the following command:

CUDA_VISIBLE_DEVICES=1 python generate.py --model_path "decapoda-research/llama-13b-hf" --lora_path "Chinese-Vicuna/Chinese-Vicuna-lora-13b-belle-and-guanaco" --use_local 1

It ends with this error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 0; 10.75 GiB total capacity; 10.17 GiB already allocated; 47.94 MiB free; 10.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The GPU is an RTX 2080 with 11 GB.

I have already set

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32

It still didn't help.

I also tried setting batch_size = 2 in generate.py, still with no effect.

Any suggestions?
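A back-of-the-envelope check suggests the weights alone do not fit: 13 billion parameters at 8 bits each are about 12.1 GiB, more than the 10.75 GiB total capacity in the error, so neither batch size nor `max_split_size_mb` can help. The estimate ignores activations, the KV cache and the LoRA weights, and assumes the base model is loaded in 8-bit (we have not checked how generate.py loads it). `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory for model weights only, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"13B @ {bits}-bit: {weight_memory_gib(13e9, bits):.1f} GiB")
```

By the same arithmetic, 4-bit weights (about 6 GiB) or splitting the model across two GPUs would be the directions that can fit on 11 GB cards.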

Error during finetuning: KeyError: 'models.llama'

OS: Ubuntu
GPU: 3090
Python: 3.8
Other Python library versions:

pytorch-mutex             1.0                        cuda    pytorch
torch                     2.0.0                    pypi_0    pypi
torchaudio                0.12.1               py38_cu113    pytorch
torchvision               0.13.1               py38_cu113    pytorch
cudatoolkit               11.3.1               h2bc3f7f_2    defaults
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
pytorch-mutex             1.0                        cuda    pytorch
transformers              4.27.4                   pypi_0    pypi
tokenizers                0.13.3                   pypi_0    pypi
sentencepiece             0.1.97                   pypi_0    pypi

nvidia-smi

NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

The upstream model, the LoRA model, and the data have all been downloaded locally.
The finetune script is:

DATA_PATH="./sample/merge.json" #"../dataset/instruction/guanaco_non_chat_mini_52K-utf8.json" #"./sample/merge_sample.json"
OUTPUT_PATH="my-lora-Vicuna"
MODEL_PATH="../llama-13b-hf/"
lora_checkpoint="../Chinese-Vicuna-lora-13b-belle-and-guanaco/"
TEST_SIZE=2000

python finetune.py \
--data_path $DATA_PATH \
--output_path $OUTPUT_PATH \
--model_path $MODEL_PATH \
--eval_steps 200 \
--save_steps 200 \
--test_size $TEST_SIZE

Running the finetune script gives the following error:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/doudou/miniconda3/envs/chinese-vicuna/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/doudou/miniconda3/envs/chinese-vicuna/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Traceback (most recent call last):
  File "finetune.py", line 13, in <module>
    "LlamaTokenizer" in transformers._import_structure["models.llama"]
KeyError: 'models.llama'

Could someone help? What is the cause?
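The failing line looks up `models.llama` in transformers' private `_import_structure`, which only exists in versions that ship the LLaMA classes. To our knowledge those first appeared in a release in transformers 4.28.0 (before that they were only on the main branch), and the environment above has 4.27.4, which would explain the KeyError. The sketch checks the installed version with the standard `importlib.metadata`; the 4.28.0 threshold is our assumption and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_LLAMA_TRANSFORMERS = (4, 28, 0)  # assumption: first release with LLaMA classes


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch[...]' into a comparable tuple (suffixes like dev0 ignored)."""
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_supports_llama() -> bool:
    """True if the installed transformers is at least MIN_LLAMA_TRANSFORMERS."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_LLAMA_TRANSFORMERS
```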

Are models finetuned with Vicuna's finetune.py and with alpaca-lora's finetune.py interchangeable?

If the prompts used for finetuning are the same,
are models finetuned with Vicuna's finetune.py and with alpaca-lora's finetune.py interchangeable?

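About the corpus

I re-downloaded the merge.json corpus from the netdisk
and found that it used to be 663 MB but is now 389 MB.
Why did the corpus shrink to only 700K+ samples?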

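Can LLaMA load the parameters of multiple LoRA models?

I'm using the author's LoRA model plus a LoRA model I trained myself. Can these two sets of LoRA weights act together?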
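In terms of arithmetic, LoRA adds a low-rank update to each adapted weight, so two adapters trained on the same base weights can be applied together by summing their updates. Whether the combined model behaves well is not guaranteed, since each adapter was trained without the other. With rank $r_i$, scaling $\alpha_i$ and factors $B_i \in \mathbb{R}^{d \times r_i}$, $A_i \in \mathbb{R}^{r_i \times k}$ for adapter $i$:

```latex
W' = W_0 + \frac{\alpha_1}{r_1} B_1 A_1 + \frac{\alpha_2}{r_2} B_2 A_2
```

An adapter that does not target a given module (for example one trained only on `q_proj` and `v_proj`) contributes no term for that module.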

Are there plans to add RLHF later?

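Generated output is garbled

Hello,

input:
What is the capital of **?

output:
��

After it starts up normally, I access http://127.0.0.1:7860 in Chrome,
and the generated results are garbled.

I switched to the 360 browser and the results are still all garbled.
Is there something I need to configure?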
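Since two different browsers show the same result, the problem is probably not a browser setting. One possible cause, offered as an assumption rather than a diagnosis: LLaMA's tokenizer encodes many Chinese characters as several byte tokens, and if a streaming interface decodes and displays each new piece of bytes on its own, an incomplete UTF-8 sequence is rendered as U+FFFD (�). Python's standard `codecs` incremental decoder shows the difference: it holds back incomplete byte sequences until the rest arrives. The chunking below is illustrative.

```python
import codecs
from typing import Iterable, List

# UTF-8 bytes of "中国", split the way byte-level tokens might arrive.
CHUNKS = [b"\xe4", b"\xb8\xad", b"\xe5\x9b", b"\xbd"]


def decode_each(chunks: Iterable[bytes]) -> List[str]:
    """Decode every chunk independently (incomplete sequences become U+FFFD)."""
    return [c.decode("utf-8", errors="replace") for c in chunks]


def decode_incrementally(chunks: Iterable[bytes]) -> List[str]:
    """Decode with state carried across chunks; incomplete sequences wait."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    return [decoder.decode(c) for c in chunks]


if __name__ == "__main__":
    print(decode_each(CHUNKS))
    print(decode_incrementally(CHUNKS))
```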

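Why does the loss fluctuate up and down during fine-tuning?

13B LLaMA + the 700K open-source corpus, with 10K samples held out as the test set. Why is the fine-tuning loss around 1.0 at first, but later sometimes very large and sometimes 0? Is it because the test set is too small relative to the fine-tuning set?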
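The test set size affects the evaluation loss, not the training loss printed at each logging step, so it is unlikely to explain the swings by itself. When reading a noisy loss log, it can help to smooth it and to flag values that are exactly zero or non-finite; those are worth investigating separately (they can point to numerical problems such as fp16 overflow, though that is a guess without the logs). A sketch with hypothetical helper names:

```python
import math
from typing import List, Optional, Sequence


def ema(values: Sequence[float], beta: float = 0.9) -> List[float]:
    """Exponential moving average, as commonly used for smoothed loss curves."""
    smoothed: List[float] = []
    avg: Optional[float] = None
    for v in values:
        avg = v if avg is None else beta * avg + (1 - beta) * v
        smoothed.append(avg)
    return smoothed


def suspicious_steps(values: Sequence[float]) -> List[int]:
    """Indices of logged losses that are exactly 0.0, NaN or infinite."""
    return [i for i, v in enumerate(values) if v == 0.0 or not math.isfinite(v)]
```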

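Merging the 13B model with the finetuned 13B result: the process is Killed outright, with 23 GB of RAM still free and the GPU unused

Hi, I used this command to merge the 13B model (merging the 7B model works fine; merging 13B fails; the machine has 64 GB of RAM and all other processes were closed):
python tools/merge_lora_for_cpp.py --model_path /mnt/e/zllama-models/llama-13b-hf --lora_path ./lora-Vicuna/checkpoint-13B/checkpoint-6000 --out_path ./lora-Vicuna/llama13b-checkpoint-6000-with-lora

I tried several times.

The first attempt reported this error: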

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/Chinese-alpaca-lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA exception! Error code: no CUDA-capable device is detected
CUDA exception! Error code: initialization error
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [02:41<00:00,  3.94s/it]
Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/tools/merge_lora_for_cpp.py", line 123, in <module>
    new_state_dict[new_k] = unpermute(v)
  File "/mnt/e/Chinese-Vicuna/tools/merge_lora_for_cpp.py", line 76, in unpermute
    w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1, 2).reshape(dim, dim)
RuntimeError: shape '[32, 2, 64, 4096]' is invalid for input of size 26214400

The next attempt failed with nothing but Killed:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/Chinese-alpaca-lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA exception! Error code: no CUDA-capable device is detected
CUDA exception! Error code: initialization error
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 41/41 [02:30<00:00,  3.67s/it]
Killed
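The first traceback is consistent with the unpermute step using 7B sizes: a view of `[32, 2, 64, 4096]` needs 32 × 2 × 64 × 4096 = 16,777,216 = 4096 × 4096 elements, while a 13B attention projection has 5120 × 5120 = 26,214,400 elements (40 heads, hidden size 5120). The sketch passes the sizes in instead of fixing them; in practice they could also be read from the model's config.json (`num_attention_heads`, `hidden_size`). The head table lists the original LLaMA configurations, and the helper names are hypothetical. The bare `Killed` on the second attempt is typical of the Linux OOM killer; the `/mnt/e` paths suggest WSL, whose VM may be capped below the machine's 64 GB.

```python
import math
from typing import Tuple

# hidden size -> number of attention heads for the original LLaMA checkpoints
LLAMA_HEADS_BY_DIM = {4096: 32, 5120: 40, 6656: 52, 8192: 64}  # 7B, 13B, 30B, 65B


def infer_heads_and_dim(numel: int) -> Tuple[int, int]:
    """Recover (n_heads, dim) from the element count of a square q/k projection."""
    dim = math.isqrt(numel)
    if dim * dim != numel or dim not in LLAMA_HEADS_BY_DIM:
        raise ValueError(f"{numel} elements is not a known LLaMA attention weight")
    return LLAMA_HEADS_BY_DIM[dim], dim


def unpermute(w, n_heads: int, dim: int):
    """Same reshape as in the traceback, with sizes passed in instead of fixed.

    `w` is expected to be a torch.Tensor of shape (dim, dim).
    """
    return w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1, 2).reshape(dim, dim)

# e.g. inside the merge loop (not executed here):
#   n_heads, dim = infer_heads_and_dim(v.numel())
#   new_state_dict[new_k] = unpermute(v, n_heads, dim)
```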

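https://chat.lmsys.org/ takes extremely long by the third round

On https://chat.lmsys.org/, within the same chat window, by the third round of Q&A it seems to take forever or not respond at all. What is the reason?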

What does MAX_STEPS = None in finetune mean? Can it be set to something else?

Why is MAX_STEPS set to None here? Can it be changed to something else?

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]
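For reference, in transformers' `TrainingArguments`, `max_steps` defaults to -1, and a positive value overrides `num_train_epochs`. We assume `MAX_STEPS = None` here means "don't cap by steps, train for EPOCHS" (we have not traced how the value is passed on), so setting a number would be a way to stop earlier. The sketch estimates how many optimizer steps an epoch-based run performs, following our reading of how the Trainer computes it (floor division by gradient accumulation, at least one step per epoch). `trainer_total_steps` is a hypothetical helper.

```python
import math
from typing import Optional


def trainer_total_steps(n_examples: int, micro_batch_size: int, grad_accum_steps: int,
                        epochs: int, world_size: int = 1,
                        max_steps: Optional[int] = None) -> int:
    """Approximate number of optimizer steps for an epoch-based training run."""
    if max_steps is not None and max_steps > 0:
        return max_steps  # an explicit step budget wins over epochs
    batches_per_epoch = math.ceil(n_examples / (micro_batch_size * world_size))
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * epochs


if __name__ == "__main__":
    # MICRO_BATCH_SIZE = 4, BATCH_SIZE = 128 -> 32 accumulation steps, EPOCHS = 3
    print(trainer_total_steps(700_000, 4, 128 // 4, 3))
```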

TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'

With base_model='llama-7b-hf' on 2 GPUs, it failed.