thudm / agenttuning

1.2K 1.2K 82.0 28.66 MB

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Home Page: https://thudm.github.io/AgentTuning/

Python 37.00% JavaScript 8.10% CSS 1.16% HTML 22.00% Shell 0.98% C++ 28.69% Makefile 1.43% CMake 0.64% TypeScript 0.01%

agenttuning's People

Contributors

Stargazers

Watchers

Forkers

snoopycn codeaudit darcstar-solutions-tech dumpmemory userbox020 tarantulatechnology onecany eltociear contropist f901107 ai-dialogos-chatbot-with-llms sorokinvld evdcush gxkim shresthasurav minhtien2405 wwxfromtju kingler lizeyuan-z zheleska techthiyanes universewill goswamig xfg0913 hanhanhong jsheng112 harshhere905 jansystemic bhargavshirin hhy5277 tonywhite11 qcwthu mascobot shawnzhang31 yzxzero coldra1n shraddha761 killer2op 0armaan025 knowledgehacker mohitd404 linsnowx nashuju chgg joskid eric-doug strategist922 flazerain meslubi2021 phuocnguyenhuu cinalin zainlau zhubao315 ftgreat silasdao adambear audiebant sundogs8603 richelynscott tianmingxtu fucheng830 albertbj stone-wlg startime-h cylonspace xinhen shitianyu-hue novellll mu-l liyz2poly crazyboystop syno8 bocchi810 wangshuai199696 zhaopufeng akrichikov xiaozhiob yuhaibao39 zhoumz123 zhousteven

agenttuning's Issues

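Local model

If my model is deployed locally, can I still run the Held-in and Held-out task evaluations? Could you give an overview of how to do this? Many thanks!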

Hi, thanks for the interesting work and awesome codebase. Recently, I tested the 7B model with TGI (config file: docker/agentlm-7b.yml) and found that the model's output seems to be truncated:

Query: I want to have a journey to California, please help me to build up a tourism plan. Please think step by step.
Answer: {"generated_text":"Sure, I'd be happy to help you plan a tour of California! Here's a"}

I wonder if I need to revise the config to enable the model to generate more words. Many thanks
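One possible cause, assuming the request did not set a generation length: TGI's `/generate` endpoint accepts a `parameters` object with `max_new_tokens`, and a small default would cut the answer off after a couple of dozen tokens. The sketch below only builds such a request body. The helper name, the address in the comment, and the value 512 are illustrative assumptions, not settings taken from docker/agentlm-7b.yml.

```python
import json

def build_generate_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build a JSON body for TGI's /generate endpoint (hypothetical helper)."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

payload = build_generate_payload(
    "I want to have a journey to California, please help me to build up a "
    "tourism plan. Please think step by step."
)
print(json.dumps(payload, indent=2))

# To send it (assumed address; needs the `requests` package and a running server):
#   requests.post("http://127.0.0.1:30070/generate", json=payload, timeout=600)
```

The server-side token limits configured at launch still cap how long a response can be, so a larger `max_new_tokens` may also require adjusting the launcher settings.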

An open question: What's the difference between Agents and Tools?

In your paper, I appreciate the evident enhancements achieved by "agents" across various benchmarks. I am intrigued by your approach to identifying and defining agents, a matter of ongoing concern within what some refer to as an "Agent Group." Personally, I perceive these entities as valuable tools, cleverly branded as agents to garner attention. 😂

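Instructions and model behavior do not match in the training data

Looking at the ALFWorld and Mind2Web training data in the Hugging Face dataset, I found that the model could not possibly produce the expected behavior from the given instructions, for example in the two samples below. Is this expected?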

Training code

May I ask if there is code available for training AgentLM?

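When will this be available on ModelScope?

Question about the loss function E in the paper

Hello, I read your paper. Is the change made to the final loss? The objective is

$$ J(\theta) = \eta \cdot \mathbb{E}_{(x,y) \sim D_{\text{agent}}} [\log \pi_\theta(y \mid x)] + (1 - \eta) \cdot \mathbb{E}_{(x,y) \sim D_{\text{general}}} [\log \pi_\theta(y \mid x)] $$

I could not find an explanation of E(x,y) in the paper. Is it the cross-entropy (CE) loss?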
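Read literally, E denotes an expectation over (x, y) pairs drawn from each dataset, so each term is an average log-likelihood, and maximizing it corresponds to minimizing a cross-entropy-style loss. The following is a minimal numeric sketch of that reading, not the authors' training code; the per-example log-likelihood values are made up for illustration.

```python
from statistics import mean

def mixed_objective(agent_logps, general_logps, eta):
    """J = eta * E_agent[log pi(y|x)] + (1 - eta) * E_general[log pi(y|x)]."""
    return eta * mean(agent_logps) + (1 - eta) * mean(general_logps)

# Hypothetical per-example log-likelihoods log pi_theta(y | x).
agent_logps = [-1.0, -3.0]
general_logps = [-2.0, -4.0]

j = mixed_objective(agent_logps, general_logps, eta=0.2)
loss = -j  # maximizing J is equivalent to minimizing this negative log-likelihood
print(j, loss)
```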

Finetuning with Mistral or Yi?

As titled. Are there plans to use a newer base model to finetune agents, such as Mistral 7B or Yi 34B?

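Cannot find how the reward is computed in Dataset details

I would like to ask the following questions. Thank you very much for your answers :)

1. In the Dataset details of the AgentBench paper's appendix, I cannot find how the reward is computed. For example, Section C.1 for DB only says "Metrics. We measure the Success Rate of agents in completing instructions." That is not a reward score computed over a trajectory (and the DB data in AgentBench has no trajectories).

2. Why is #Dev larger than #Test in AgentBench? For example, DB has #Dev=60 and #Test=300. Is the training set larger than the test set?

AgentTuning 7B evaluation on HH does not match the paper's results

I tried https://huggingface.co/THUDM/agentlm-7b, but the result is far below 84% on alfworld-std. Is it the wrong model?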

Is it possible to conduct RLHF from the environment?

Thanks for open-sourcing the AgentTuning code. I am quite interested in training the model, and I see that the training framework is not open-sourced (#1).

The discussion mentioned that it could support P-Tuning or LoRA. I am also wondering whether it could support RLHF.

Recently, I read a paper: https://arxiv.org/abs/2312.14878. I am curious how AgentLM would perform if we let it learn from interacting with environments (see Finetune type II in that paper).

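Question about trajectory filtering

Is reward used as the filtering metric for all six datasets? Or is it like AgentBench, where OS uses SR, KG uses F1, and DCG uses reward?

Can it run outside Docker?

I want to run the code on a server. Is it possible to do so without Docker, and could you recommend a Python version to install?

Start TGI worker

Hello, I have two questions about evaluating mt_bench: how do I start the TGI worker? And are host=127.0.0.1 and port=30070 fixed? Looking forward to your reply.

About the dataset

Filt. Traj. is the number of rows in the data on Hugging Face, but how are # Inst. and Avg # Filt. Traj. Turns computed? For instance, how would they be computed for the example below?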

[ { "from": "human", "loss": null, "value": "You are web shopping.\nI will give you instructions about what to do.\nYou have to follow the instructions.\nEvery round I will give you an observation and a list of available actions, you have to respond an action based on the state and instruction.\nYou can use search action if search is available.\nYou can click one of the buttons in clickables.\nAn action should be of the following structure:\nsearch[keywords]\nclick[value]\nIf the action is not valid, perform nothing.\nKeywords in search are up to you, but the value in click must be a value in the list of available actions.\nRemember that your keywords in search should be carefully designed.\nYour response should use the following format:\n\nThought:\nI think ... \n\nAction: \nclick[something]" }, { "from": "gpt", "loss": false, "value": "Ok." }, { "from": "human", "loss": null, "value": "WebShop [SEP] Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Search" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should use the search bar to look for a mid-century coffee table for my living room.\n\nAction:\nsearch[mid-century coffee table]" }, { "from": "human", "loss": null, "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] Page 1 (Total results: 50) [SEP] Next > [SEP] B09FDCPKRR [SEP] Home Square 2 Piece Living Room Set with Coffee Table and End Table in Walnut [SEP] $757.76 [SEP] B07XQ712TW [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] $100.0 [SEP] B01DBKYOV6 [SEP] Baxton Studio 424-6898-AMZ Coffee Table, Brown [SEP] $239.99 [SEP] B08WRSWF5X [SEP] Modway Render Mid-Century Modern Coffee Table in Walnut [SEP] $272.12 [SEP] B09333CV7P [SEP] Modway Envision 70\" Mid-Century Modern Low Profile Entertainment TV Stand in Walnut White & 
Transmit Mid-Century Coffee Table, Walnut White [SEP] $453.5 [SEP] B078KKB8Z9 [SEP] Modway Transmit Mid-Century Coffee Table, Walnut White [SEP] $199.5 [SEP] B077Y987TL [SEP] Furniture of America Regols Rego Mid-Century Coffee Table with Glass Top, Gray [SEP] $100.0 [SEP] B06W52GBKW [SEP] Archian Walnut Wood High Coffee Table [SEP] $88.79 [SEP] B07NKG7DF9 [SEP] Modway Origin 47\" Mid-Century Modern Wood Coffee Table In Walnut White [SEP] $218.34 [SEP] B01HQGZEO2 [SEP] Baxton Studio Pierce Mid-Century Modern Coffee Table Walnut Brown/Contemporary [SEP] $105.53" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should click on the 'IWELL Mid Century Coffee Table for Living Room' option as it fits my requirements and has a price of $100.0.\n\nAction:\nclick[B07XQ712TW]" }, { "from": "human", "loss": null, "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] < Prev [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] Price: $100.0 [SEP] Rating: N.A. [SEP] Description [SEP] Features [SEP] Reviews [SEP] Buy Now" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should click on the 'Buy Now' button to purchase the IWELL Mid Century Coffee Table.\n\nAction:\nclick[Buy Now]" } ]

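Hugging Face is not accessible for me. Is there a recommended way to download?

What is the minimum GPU memory needed for agentlm-7b inference?

I downloaded the agentlm-7b model from Hugging Face to run inference and found that the GPU memory on a 4090 is not enough.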

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = '/home/nickkf/Documents/models/agentlm-7b'

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Select GPU if available, otherwise CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Define the chat input text
chat_round = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. 
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST]"""

# Tokenize the input
inputs = tokenizer([chat_round], return_tensors="pt").to(device)

# Generate the response
reply_ids = model.generate(**inputs)
reply = tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]

print("AI:", reply)
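A rough estimate may explain the out-of-memory error, assuming the weights are loaded in float32 because no `torch_dtype` was passed: about 7 billion parameters at 4 bytes each already exceed the 24 GB of a 4090. The sketch below only does the arithmetic. The 7e9 parameter count is an approximation, and activations and the KV cache need additional memory on top of the weights.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

fp32 = weight_memory_gib(7e9, 4)  # float32 weights
fp16 = weight_memory_gib(7e9, 2)  # float16 / bfloat16 weights
print(f"fp32: {fp32:.1f} GiB, fp16: {fp16:.1f} GiB")
```

Under this estimate, passing `torch_dtype=torch.float16` to `AutoModelForCausalLM.from_pretrained` would roughly halve the weight footprint and may let the model fit on a 24 GB card.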

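Could you provide a simple tool-calling example?

The hotpotqa test script doesn't seem to run?

As titled: I set up the environment as required, but the script does not run. Is there a clearer environment setup guide?

How was the general data filtered?

The general dataset ShareGPT_Vicuna_unfiltered has about 90K entries. How did you filter it down to 50K? Could you provide the script?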

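Question about the various cases of the reward score

The reward appears in the following cases:
1. Section 2 of the paper says "Each trajectory has a final reward r ∈ [0, 1], reflecting the completion status of the task."
2. Section 2.1.3 of the paper says "due to the difficulty of the Mind2Web task, we use a threshold of r ≥ 2/3 to ensure we obtain a sufficient number of trajectories."
3. Section E.1 of the paper "AgentBench: Evaluating LLMs as Agents" says "ultimately we provide a final reward score according to the above metrics:
$\text{reward} = 0.7 \times \text{metric}_{\text{winrate}} + 0.3 \times \text{metric}_{\text{damagerate}}$"
4. Section H.1 of the paper "AgentBench: Evaluating LLMs as Agents" says "As there might be more than one suitable item for a given query, Webshop adopts a matching reward as its evaluation metric:"
$$ \text{Reward} = \frac{|U_{att} \cap Y_{att}| + |U_{opt} \cap Y_{opt}| + \mathbb{I}[y_{price} \le u_{price}]}{|U_{att}| + |U_{opt}| + 1} \cdot r_{type} $$

Questions for the authors:

Do 1 and 2 conflict? (1 says r can only take the values 0 and 1.)
The formulas in 3 and 4 compute r for DCG and WB respectively. What are the formulas for r for each task in this paper?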
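As a sanity check on the WebShop formula quoted in item 4, here is a minimal sketch that evaluates it on made-up attribute and option sets. The function name and the example values are hypothetical, and `r_type` is simply passed in, since its definition is not quoted in this thread.

```python
def webshop_reward(u_att, y_att, u_opt, y_opt, y_price, u_price, r_type=1.0):
    """Matching reward from the quoted formula; r_type is supplied by the caller."""
    matched = len(u_att & y_att) + len(u_opt & y_opt) + int(y_price <= u_price)
    return matched / (len(u_att) + len(u_opt) + 1) * r_type

r = webshop_reward(
    u_att={"mid century", "walnut"},  # attributes the user asked for
    y_att={"mid century"},            # attributes of the chosen product
    u_opt={"dark walnut"},            # options the user asked for
    y_opt={"dark walnut"},            # options actually selected
    y_price=100.0,
    u_price=150.0,
)
print(r)           # 0.75
print(r >= 2 / 3)  # True: this trajectory would pass an r >= 2/3 threshold
```

The example yields a fractional value, which is consistent with reading r ∈ [0, 1] as the closed interval from 0 to 1 rather than the two-element set {0, 1}.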

Can you point to the ShareGPT filtered/cleaned data used?

Hey, thank you for your great work. I was replicating the training of AgentLM and was searching for the ShareGPT data used. The paper mentions this as the data used, but I cannot find the filtered/cleaned version anywhere. Can you please tell me how to get the final version of the ShareGPT data used for training AgentLM?

fine-tune code

Hi, will you open-source the fine-tuning code as well?

Add license

Add MIT License
Can I work on this?

Inference with `vllm`

Hi, thanks for your great work!
I notice that you use Text-Generation-Inference to accelerate the evaluation process.
Does your model work with vllm?

Auto comment

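Add an auto-comment workflow that posts a comment when an issue is opened or closed, or when a PR is opened.

Is the weight decay really 0.1?

As titled. I don't think I have ever seen a weight decay this large.

Difference between AgentTuning and ToolBench

I previously read your ToolBench paper, and its approach to dataset construction and instruction tuning seems quite similar to this AgentTuning paper.
I would like to ask about the differences between the two papers and their respective focuses. Thank you very much!

About reward

I would like to ask the following questions. Thank you very much for your answers :)

Is reward used both as a model metric and as the trajectory filtering criterion? In AgentBench, DCG and WS both use reward as the model's metric on the task, but in AgentTuning it is used to filter interaction trajectories: "Recall that each interaction trajectory receives a reward r, this allows us to automatically select high-quality trajectories based on the reward."
AgentTuning has six held-in tasks. What are their reward formulas? AgentBench contains two reward formulas, used for DCG and WS respectively. Both are designed specifically for DCG and WS and cannot be generalized to other tasks such as DB.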

Number of training steps

Great work creating the AgentTuning data! I'm wondering how many training steps you used to train the final AgentLM-7B model.

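The conversation field of the AgentInstruct dataset on ModelScope is empty

Hello, I downloaded the AgentInstruct dataset from ModelScope, but every conversation field is empty. Is the data no longer open-sourced?

Abnormal inference when deploying with FastChat

When I deploy AgentLM-13B with FastChat, the output format is garbled during inference, especially in streaming mode: each line has only a few characters before it breaks to the next line, and a single word gets split into several letters or letter groups. If I load the model manually with transformers and display it with Gradio, the problem does not occur. I checked with FastChat's debug mode, and the model is loaded with LlamaForCausalLM, which should be correct.

Grammar mistakes in README

I found some grammar mistakes in the README.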

requests.exceptions.MissingSchema: Invalid URL '127.0.0.123332/generate': No scheme supplied. Perhaps you meant https://127.0.0.123332/generate?

Hello, I got the error above when running python eval_general/eval_gsm8k_tgi.py --port 30070. How can I fix it?
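The error message suggests the request URL was assembled without a scheme and without a ":" between host and port. A minimal sketch of how such a URL could be built is shown below; `build_generate_url` is a hypothetical helper for illustration, not a function from the eval_general scripts.

```python
def build_generate_url(host: str, port: int, scheme: str = "http") -> str:
    """Join scheme, host, and port into a TGI-style /generate URL."""
    return f"{scheme}://{host}:{port}/generate"

url = build_generate_url("127.0.0.1", 30070)
print(url)  # http://127.0.0.1:30070/generate
```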

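Question about the paper

Recognizing the superior quality of GPT-4 responses as highlighted by (Wang et al., 2023a), we adopted a sampling ratio of 1:4 between GPT-4 and GPT-3.5 for better performance.
Isn't GPT-4's response quality higher? Why is the sampling ratio for GPT-4 the lower one?

Looking forward to a model trained with Qwen72B.

Do the maintainers have any plans for this?

About dataset statistics and download

Filt. Traj. is the number of rows in the data on Hugging Face, but how are # Inst. and Avg # Filt. Traj. Turns computed? For instance, how would # Inst. and Avg # Filt. Traj. Turns be computed for the example below?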

[
    {
        "from": "human",
        "loss": null,
        "value": "You are web shopping.\xxx"
    },
    {
        "from": "gpt",
        "loss": false,
        "value": "Ok."
    },
    {
        "from": "human",
        "loss": null,
        "value": "WebShop [SEP] Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Search"
    },
    {
        "from": "gpt",
        "loss": true,
        "value": "Thought:\nI think I should use the search bar to look for a mid-century coffee table for my living room.\n\nAction:\nsearch[mid-century coffee table]"
    },
    {
        "from": "human",
        "loss": null,
        "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] Page 1 (Total results: 50) [SEP] Next > [SEP] B09FDCPKRR [SEP] Home Square 2 Piece Living Room Set with Coffee Table and End Table in Walnut [SEP] $757.76 [SEP] B07XQ712TW [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] $100.0 [SEP] B01DBKYOV6 [SEP] Baxton Studio 424-6898-AMZ Coffee Table, Brown [SEP] $239.99 [SEP] B08WRSWF5X [SEP] Modway Render Mid-Century Modern Coffee Table in Walnut [SEP] $272.12 [SEP] B09333CV7P [SEP] Modway Envision 70\" Mid-Century Modern Low Profile Entertainment TV Stand in Walnut White & Transmit Mid-Century Coffee Table, Walnut White [SEP] $453.5 [SEP] B078KKB8Z9 [SEP] Modway Transmit Mid-Century Coffee Table, Walnut White [SEP] $199.5 [SEP] B077Y987TL [SEP] Furniture of America Regols Rego Mid-Century Coffee Table with Glass Top, Gray [SEP] $100.0 [SEP] B06W52GBKW [SEP] Archian Walnut Wood High Coffee Table [SEP] $88.79 [SEP] B07NKG7DF9 [SEP] Modway Origin 47\" Mid-Century Modern Wood Coffee Table In Walnut White [SEP] $218.34 [SEP] B01HQGZEO2 [SEP] Baxton Studio Pierce Mid-Century Modern Coffee Table Walnut Brown/Contemporary [SEP] $105.53"
    },
    {
        "from": "gpt",
        "loss": true,
        "value": "Thought:\nI think I should click on the 'IWELL Mid Century Coffee Table for Living Room' option as it fits my requirements and has a price of $100.0.\n\nAction:\nclick[B07XQ712TW]"
    },
    {
        "from": "human",
        "loss": null,
        "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] < Prev [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] Price: $100.0 [SEP] Rating: N.A. [SEP] Description [SEP] Features [SEP] Reviews [SEP] Buy Now"
    },
    {
        "from": "gpt",
        "loss": true,
        "value": "Thought:\nI think I should click on the 'Buy Now' button to purchase the IWELL Mid Century Coffee Table.\n\nAction:\nclick[Buy Now]"
    }
]
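How the paper defines a "turn" is not stated in this thread, so the following is only a sketch of two plausible counts for a conversation in this format: all messages, and the gpt replies that have "loss" set to true. The function name and the shortened sample are illustrative assumptions.

```python
def count_turns(conversation):
    """Return (total messages, gpt replies with loss == True)."""
    total = len(conversation)
    trained = sum(
        1 for m in conversation
        if m["from"] == "gpt" and m.get("loss") is True
    )
    return total, trained

# Shortened stand-in for the conversation above (same keys, abbreviated values).
sample = [
    {"from": "human", "loss": None, "value": "You are web shopping."},
    {"from": "gpt", "loss": False, "value": "Ok."},
    {"from": "human", "loss": None, "value": "WebShop [SEP] Instruction: ..."},
    {"from": "gpt", "loss": True, "value": "Thought: ... Action: search[...]"},
]
print(count_turns(sample))  # (4, 1)
```

Applied to the full eight-message example above, this gives 8 messages, of which 3 are gpt replies with loss set to true.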

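2. Hugging Face is not accessible for me. Is there a recommended way to download?

Besides running with Docker, are there other ways to run AgentLM?

For example, could it be run with a script or configured in Langchain-chatchat?

Meaning and calculation of the numbers in Table 2 of the paper

Could you explain the meaning of the numbers in Table 2 and how they are computed? :)

Table 2: Ablation study on trajectory filtering.

The base model is LLaMA 2. Does it support Chinese?

Fine-tuning

Hoping the maintainers can answer!
Are there any details on the fine-tuning process?
Did you directly use FastChat's fine-tuning code?
Looking forward to your reply~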

Can I run AgentInstruct data on the AgentBench?

Hey, thank you for your great work. I just wanted to know how I can run evaluation on the open-source AgentInstruct data in the AgentBench repo. I would be really grateful if you could share the files defined here for running AgentInstruct on AgentBench. I am asking because I want to evaluate some models on the AgentInstruct data.

Adding Contributors Section in readme.md file.

Why Contributors section:- A "Contributors" section in a repo gives credit to and acknowledges
the people who have helped with the project, fosters a sense of community, and helps others
know who to contact for questions or issues related to the project.

Issue type

[✅] Docs

Expected output :-

@ kindly assign this issue to me! I would love to work on it. Thank you!

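GPU memory for fine-tuning

How much GPU memory is needed to fine-tune a model (e.g., 7B)?

Can AgentLM be deployed locally with an OpenAI-style API?

Can AgentLM be deployed locally with an OpenAI-style API? I tried deploying it with FastChat, but it reported that the model could not be found. How can AgentLM be deployed with an OpenAI-style API?

How was the training data sampled?

I noticed that #36 says a total of 2000 steps were run, so there should be 2000 × 64 = 128,000 samples. Given η = 0.2, the ShareGPT data should amount to 102,400 samples, but #16 says this data was already upsampled. Where does this gap come from?
Also, given η = 0.2, does that mean the AgentTuning data was upsampled 60000 * 0.2 / 1800 = 8 times within one epoch? (Specifically, over 2000 steps, roughly how many times was each AgentTuning/ShareGPT sample trained on?)
And why did you choose 60,000 as the ShareGPT data size (rather than, say, 10,000 or 40,000)?

How is the reward of interaction trajectories obtained?

How is the reward for the interaction trajectories in the paper obtained? What is used for scoring?

Where can I find the database-related training data used in this work?

Specifically, the part that collects reasoning trajectories for BIRD.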
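The arithmetic in this question can be written out as a short sketch. The step count, batch size, and η are the values quoted in the question itself; whether they match the actual training run is not confirmed here.

```python
steps, batch_size, eta = 2000, 64, 0.2

total_samples = steps * batch_size             # samples seen during training
agent_samples = round(total_samples * eta)     # share drawn from the agent data
general_samples = total_samples - agent_samples

print(total_samples, agent_samples, general_samples)  # 128000 25600 102400

# The upsampling expression exactly as written in the question.
ratio = 60000 * 0.2 / 1800
print(round(ratio, 2))  # 6.67
```

Note that the expression as written evaluates to about 6.67 rather than 8, so the factor of 8 in the question presumably relies on slightly different numbers.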
