thudm / agenttuning Goto Github PK
View Code? Open in Web Editor NEWAgentTuning: Enabling Generalized Agent Abilities for LLMs
Home Page: https://thudm.github.io/AgentTuning/
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Home Page: https://thudm.github.io/AgentTuning/
如果我的模型部署在本地,能否进行Held-in和Held-out任务的评测呢?可以给一些方法概述吗,不尽感激!
Hi, thanks for the interesting work and awesome codebase. Recently, I tested the 7b model with TGI (used config file: docker/agentlm-7b.yml) and I just found that the model's output seems to be trunked:
Query: I want to have a journey to California, please help me to build up a tourism plan. Please think step by step.
Answer: {"generated_text":"Sure, I'd be happy to help you plan a tour of California! Here's a"}
I wonder if I need to revise the config to enable the model to generate more words. Many thanks
In your paper, I appreciate the evident enhancements achieved by "agents" across various benchmarks. I am intrigued by your approach to identifying and defining agents, a matter of ongoing concern within what some refer to as an "Agent Group." Personally, I perceive these entities as valuable tools, cleverly branded as agents to garner attention. 😂
May I ask if there is code available for training agentLM?
什么时候上魔塔社区
您好,看了下您这边的论文,是针对最后损失loss进行修改吗?其中J(θ) = η · E(x,y)∼Dagent [log πθ (y | x)] + (1 − η) · E(x,y)∼Dgeneral [log πθ (y | x)] E(x,y)在论文中并没有找到对应的解释说明,是CE—loss吗
As titled. Are there plans to use a newer base model to finetune agents, such as Mistral 7B or Yi 34B?
请教以下问题,非常感谢您的回答:)
1.AgentBench 论文附录中数据集的 Dataset details 中找不到reward的计算方式!?比如DB的C.1中只是提到”Metrics. We measure the Success Rate of agents in completing instructions.“ 这个不是计算trajectory的reward分数(而且AgentBench中DB数据并没有trajectory)。
https://huggingface.co/THUDM/agentlm-7b , I try it,but far below 84% in alfworld-std. Is it the wrong model?
Thanks for open-sourced agentTuning code , I am quite interested in training the model, i see the training framework is not open-sourced #1,
The discussion mentioned that it could support ptuning or LORA, i am also wondering if it could also support RLHF?
Recently, i read a paper: https://arxiv.org/abs/2312.14878, i am curious how the AgentLM performance would be if we could let it learn from interacting with environments. (refer to Finetune type II in that paper)
是6个数据集都用reward作为指标进行筛选?还是像agentbench里面的那样,os用SR,KG用F1,DCG用reward
我想在服务器上运行代码,是否可以不用docker,能否提供一下比较推荐的python安装版本
[ { "from": "human", "loss": null, "value": "You are web shopping.\nI will give you instructions about what to do.\nYou have to follow the instructions.\nEvery round I will give you an observation and a list of available actions, you have to respond an action based on the state and instruction.\nYou can use search action if search is available.\nYou can click one of the buttons in clickables.\nAn action should be of the following structure:\nsearch[keywords]\nclick[value]\nIf the action is not valid, perform nothing.\nKeywords in search are up to you, but the value in click must be a value in the list of available actions.\nRemember that your keywords in search should be carefully designed.\nYour response should use the following format:\n\nThought:\nI think ... \n\nAction: \nclick[something]" }, { "from": "gpt", "loss": false, "value": "Ok." }, { "from": "human", "loss": null, "value": "WebShop [SEP] Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Search" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should use the search bar to look for a mid-century coffee table for my living room.\n\nAction:\nsearch[mid-century coffee table]" }, { "from": "human", "loss": null, "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] Page 1 (Total results: 50) [SEP] Next > [SEP] B09FDCPKRR [SEP] Home Square 2 Piece Living Room Set with Coffee Table and End Table in Walnut [SEP] $757.76 [SEP] B07XQ712TW [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] $100.0 [SEP] B01DBKYOV6 [SEP] Baxton Studio 424-6898-AMZ Coffee Table, Brown [SEP] $239.99 [SEP] B08WRSWF5X [SEP] Modway Render Mid-Century Modern Coffee Table in Walnut [SEP] $272.12 [SEP] B09333CV7P [SEP] Modway Envision 70\" Mid-Century Modern Low Profile Entertainment TV Stand in Walnut White & Transmit Mid-Century Coffee Table, Walnut White [SEP] $453.5 [SEP] B078KKB8Z9 [SEP] Modway Transmit Mid-Century Coffee Table, Walnut White [SEP] $199.5 [SEP] B077Y987TL [SEP] Furniture of America Regols Rego Mid-Century Coffee Table with Glass Top, Gray [SEP] $100.0 [SEP] B06W52GBKW [SEP] Archian Walnut Wood High Coffee Table [SEP] $88.79 [SEP] B07NKG7DF9 [SEP] Modway Origin 47\" Mid-Century Modern Wood Coffee Table In Walnut White [SEP] $218.34 [SEP] B01HQGZEO2 [SEP] Baxton Studio Pierce Mid-Century Modern Coffee Table Walnut Brown/Contemporary [SEP] $105.53" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should click on the 'IWELL Mid Century Coffee Table for Living Room' option as it fits my requirements and has a price of $100.0.\n\nAction:\nclick[B07XQ712TW]" }, { "from": "human", "loss": null, "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] < Prev [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] Price: $100.0 [SEP] Rating: N.A. [SEP] Description [SEP] Features [SEP] Reviews [SEP] Buy Now" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should click on the 'Buy Now' button to purchase the IWELL Mid Century Coffee Table.\n\nAction:\nclick[Buy Now]" } ]
下载了hugging face上的agentlm-7b模型进行推理,发现4090上显存不够。
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = '/home/nickkf/Documents/models/agentlm-7b'
# 加载模型和tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 设置设备为gpu或cpu
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# 定义聊天输入文本
chat_round = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
There's a llama in my garden 😱 What should I do? [/INST]"""
# 对输入进行tokenize
inputs = tokenizer([chat_round], return_tensors="pt").to(device)
# 生成回复response
reply_ids = model.generate(**inputs)
reply = tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]
print("AI:", reply)
如题,我按照规定要求设置的环境,但是跑不起来,有更清晰的环境设置么?
想问一下,通用数据ShareGPT_Vicuna_unfiltered有9w条,你们是如何筛选到5w条的?能提供一下脚本吗
reward出现了以下情况:
1.论文第2节说”Each trajectory has a final reward r ∈ [0, 1], reflecting the completion status of the task。“
2.论文第2.1.3节说 "due to the difficulty of the Mind2Web task, we use a threshold of r ≥ 2/3 to ensure we obtain a sufficient number of trajectories."
3. 论文”AgentBench: Evaluating LLMs as Agents“ 第E.1节说 ”ultimately we provide a final reward score according to the above metrics:
reward = 0.7 × metricwinrate + 0.3 × metricdamagerate“
4.论文”AgentBench: Evaluating LLMs as Agents“ 第H.1节说 ”As there might be more than one suitable item for a given query, Webshop adopts a matching reward as its evaluation metric:“
$$ Reward =\frac{|Uatt ∩ Yatt| + |Uopt ∩ Yopt| + I[yprice ≤ uprice]}{|Uatt| + |Uopt| + 1}·rtype $$
请教作者:
Hey, thank you for your great work. I was replicating the training of AgentLM and was searching for the ShareGPT data used. The paper mentions this as the data used but I cannot find the filtered/cleaned version anywhere. Can you pls tell how to get the final version of the ShareGPT data used for training AgentLM?
Hi, will you open-source the fine-tuning code as well?
Add MIT License
Can I work on this?
Hi, thanks for ur great work!
I notice that you use Text-Generation-Inference to accelerate the evaluation process.
Does your model work with vllm
?
Auto comment workflow add comment while issue is opened , closed pr is opened.
如题,我好像还没有见过这么大的weight decay
之前读过你们toolbench的论文,感觉数据集构建和指令微调的思路也和这篇agent tuning差不太多。
想请教一下两篇paper的差别和侧重点,非常感谢!
请教以下问题,非常感谢您的回答:)
Great work for creating the AgentTuning data! I'm wondering how many training steps did you use for training the final AgentLM-7B model?
你好,我下了魔塔上的 AgentInstruct 数据集,但 conversation 都是空值,请问是数据不开源了嘛?
用fastchat部署AgentLM-13B,推理的时候格式是乱的,尤其是streaming的模式,每行只有几个字符就切换到下一行了,一个单词被切成了好几个字母或字母组合。如果手动用transformer加载并用gradio展示的话就没有这个问题,用fastchat的debug模式看了一下,用的是LlamaForCausalLM加载的模型,应该没错
I found some grammars mistake in readme.
Recognizing the superior quality of GPT-4 responses as highlighted by (Wang et al., 2023a), we adopted a sampling ratio of 1:4 between GPT-4 and GPT-3.5 for better performance.
GPT-4的响应质量不是更高吗,为什么采样比例GPT-4反而低呢?
请问大佬们有这个想法吗。
[
{
"from": "human",
"loss": null,
"value": "You are web shopping.\xxx"
},
{
"from": "gpt",
"loss": false,
"value": "Ok."
},
{
"from": "human",
"loss": null,
"value": "WebShop [SEP] Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Search"
},
{
"from": "gpt",
"loss": true,
"value": "Thought:\nI think I should use the search bar to look for a mid-century coffee table for my living room.\n\nAction:\nsearch[mid-century coffee table]"
},
{
"from": "human",
"loss": null,
"value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] Page 1 (Total results: 50) [SEP] Next > [SEP] B09FDCPKRR [SEP] Home Square 2 Piece Living Room Set with Coffee Table and End Table in Walnut [SEP] $757.76 [SEP] B07XQ712TW [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] $100.0 [SEP] B01DBKYOV6 [SEP] Baxton Studio 424-6898-AMZ Coffee Table, Brown [SEP] $239.99 [SEP] B08WRSWF5X [SEP] Modway Render Mid-Century Modern Coffee Table in Walnut [SEP] $272.12 [SEP] B09333CV7P [SEP] Modway Envision 70\" Mid-Century Modern Low Profile Entertainment TV Stand in Walnut White & Transmit Mid-Century Coffee Table, Walnut White [SEP] $453.5 [SEP] B078KKB8Z9 [SEP] Modway Transmit Mid-Century Coffee Table, Walnut White [SEP] $199.5 [SEP] B077Y987TL [SEP] Furniture of America Regols Rego Mid-Century Coffee Table with Glass Top, Gray [SEP] $100.0 [SEP] B06W52GBKW [SEP] Archian Walnut Wood High Coffee Table [SEP] $88.79 [SEP] B07NKG7DF9 [SEP] Modway Origin 47\" Mid-Century Modern Wood Coffee Table In Walnut White [SEP] $218.34 [SEP] B01HQGZEO2 [SEP] Baxton Studio Pierce Mid-Century Modern Coffee Table Walnut Brown/Contemporary [SEP] $105.53"
},
{
"from": "gpt",
"loss": true,
"value": "Thought:\nI think I should click on the 'IWELL Mid Century Coffee Table for Living Room' option as it fits my requirements and has a price of $100.0.\n\nAction:\nclick[B07XQ712TW]"
},
{
"from": "human",
"loss": null,
"value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] < Prev [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] Price: $100.0 [SEP] Rating: N.A. [SEP] Description [SEP] Features [SEP] Reviews [SEP] Buy Now"
},
{
"from": "gpt",
"loss": true,
"value": "Thought:\nI think I should click on the 'Buy Now' button to purchase the IWELL Mid Century Coffee Table.\n\nAction:\nclick[Buy Now]"
}
]```
2. huggingface不让访问,有推荐的下载方式吗?
比如用脚本运行或者配置到Langchain-chatchat可以吗?
可以解释一下Table 2中的数字的含义和计算方式吗 :)
Table 2: Ablation study on trajec�tory filtering.
麻烦大神回答了!
有没有微调过程成的细节?
直接就用fastchat的那个微调的代码吗?
期待您的回复~
Hey, thank you for your great work. I just wanted to know how can I run evaluation on the open source AgentInstruct data on the AgentBench repo. I will be really grateful if you can share the files defined here for running AgentInstruct on AgentBench. I am asking this because I want to see eval some models on the AgentInstruct data.
Why Contributors section:- A "Contributors" section in a repo gives credit to and acknowledges
the people who have helped with the project, fosters a sense of community, and helps others
know who to contact for questions or issues related to the project.
Issue type
@ kindly assign this issue to me ! I would love to work on it ! Thank you !
请问微调一个模型(例如7B)需要多大的显存?
请问AgentLM能支持openai.api类的接口本地部署吗?尝试用了fastchat进行部署,提示说找不到这个模型。请问AgentLM如何进行openai.api类的部署?
我观察到 #36 里面说一共进行了2000个step,所以应该有2000x64=128000个样本,考虑到 &eta = 0.2,所以sharegpt的数据应该为102400,但是 #16 中说这个数据已经是上采样了,这之间的gap是为什么呢?
以及考虑到 &eta = 0.2 所以是把agenttuning的数据在一个epoch里上采样了60000 * 0.2 / 1800 = 8倍吗?(具体来说,在2000个step下,每个agenttuning/sharept样本大概被训练了几次呢?)
还有你们为什么会选择sharegpt用60000这个数据量呢?(而不是10000或者40000之类的)
论文中的交互轨迹的Reward如何得到?用什么进行打分?
就是收集BIRD的思维轨迹那一块
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.