Comments (2)
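- For the Held-in tasks, yes. During SFT, AgentLM learns from high-quality GPT-4 interaction dialogues (filtered by reward) and performs well on those tasks (evaluated by reward); it also generalizes to the remaining Held-out agent tasks.
- These 6 Held-in tasks are a subset of AgentBench. How the reward is computed can be found under "Dataset details" for each dataset in the appendix of the AgentBench paper.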
from agenttuning.
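I would like to ask the following questions. Thank you very much for your answers :)
- I cannot find how the reward is computed in the Dataset details of the AgentBench paper's appendix. For example, section C.1 for DB only says "Metrics. We measure the Success Rate of agents in completing instructions." That is not a reward score computed for a trajectory (and the DB data in AgentBench contains no trajectories).
- The DB data in AgentBench contains no interaction trajectories, so how is CoT with Actions applied?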
- Why is #Dev smaller than #Test in AgentBench? For DB, #Dev=60 and #Test=300. Is the training set larger than the test set?
For reference:
- CoT with Actions in AgentBench:
Section 2 of the AgentBench paper says: "Since LLM-as-Agent requires LLMs' strong reasoning ability, CoT (Wei et al., 2022b), which has been considered a de facto strategy in related evaluation together with actions (Yao et al., 2023b), is also adopted in AGENTBENCH."
- DB data in AgentBench:
{
"description": "how many weeks did julie covington's \"don't cry for me argentina\" spend at the top of australia's singles chart?",
"label": [
"7"
],
"create": {
"database": "wikitq",
"init": "wikitq_init.sql"
},
"table": {
"table_name": "Music Chart History",
"table_info": {
"columns": [
{
"name": "#",
"type": "INT"
},
{
"name": "Title",
"type": "TEXT"
},
{
"name": "Artist",
"type": "TEXT"
},
{
"name": "Highest pos. reached",
"type": "INT"
},
{
"name": "weeks at No. 1",
"type": "TEXT"
}
],
"rows": [
[
"1.",
"\"Don't Cry for Me Argentina\"",
"Julie Covington",
"1",
"7"
],
[
"2.",
"\"The Way You That You Do It\"",
"Pussyfoot",
"1",
"7"
],
[
"3.",
"\"I Just Want to Be Your Everything\"",
"Andy Gibb",
"1",
"7"
],
[
"4.",
"\"That's Rock and Roll\"",
"Shaun Cassidy",
"2",
""
],
[
"5.",
"\"Living Next Door to Alice\"",
"Smokie",
"2",
""
],
[
"6.",
"\"I Go To Rio\"",
"Peter Allen",
"1",
"5"
],
[
"7.",
"\"Torn Between Two Lovers\"",
"Mary McGregor",
"1",
"4"
],
[
"8.",
"\"Walk Right In\"",
"Dr Hook",
"1",
"5"
],
[
"9.",
"\"You're Moving Out Today\"",
"Carole Bayer Sager",
"1",
"4"
],
[
"10.",
"\"If You Leave Me Now\"",
"Chicago",
"1",
"5 (pkd #1 in 76 & 77)"
],
[
"11.",
"\"Don't Give Up on Us\"",
"David Soul",
"1",
"3"
],
[
"12.",
"\"Lido Shuffle\" / \"What Can I Say\"",
"Boz Scaggs",
"2",
""
],
[
"13.",
"\"You and Me\"",
"Alice Cooper",
"2",
""
],
[
"14.",
"\"Dance Little Lady Dance\"",
"Tina Charles",
"4",
""
],
[
"15.",
"\"When I Need You\"",
"Leo Sayer",
"8",
""
],
[
"16.",
"\"Don't Fall in Love\"",
"Ferrets",
"2",
""
],
[
"17.",
"\"I Feel Love\"",
"Donna Summer",
"1",
"1"
],
[
"18.",
"\"Help is on its Way\"",
"Little River Band",
"1",
"1"
],
[
"19.",
"\"You Gotta Get Up and Dance\"",
"Supercharge",
"3",
""
],
[
"20.",
"\"Mull of Kintyre\"",
"Wings",
"1",
"11 (pkd #1 in 77 & 78)"
],
[
"21.",
"\"Don't Leave Me This Way\"",
"Thelma Houston",
"6",
""
],
[
"22.",
"\"Ain't Gonna Bump No More with No Big Fat Woman\"",
"Joe Tex",
"2",
""
],
[
"23.",
"\"You're in My Heart\"",
"Rod Stewart",
"1",
"1"
],
[
"24.",
"\"Ma Baker\"",
"Boney M",
"5",
""
],
[
"25.",
"\"Lucille\"",
"Kenny Rogers",
"7",
""
],
[
"26.",
"\"Livin' la Vida Loca\"",
"Ricky Martin",
"1",
"3"
],
[
"27.",
"\"Smooth\"",
"Santana featuring Rob Thomas",
"1",
"12"
],
[
"28.",
"\"No Scrubs\"",
"TLC",
"3",
""
],
[
"29.",
"\"All Star\"",
"Smash Mouth",
"4",
""
],
[
"30.",
"\"Baby One More Time\"",
"Britney Spears",
"1",
"2"
],
[
"31.",
"\"Say My Name\"",
"Destiny's Child",
"1",
"3"
],
[
"32.",
"\"Genie in a Bottle\"",
"Christina Aguilera",
"1",
"5"
],
[
"33.",
"\"Smooth Criminal\"",
"Michael Jackson",
"7",
""
],
[
"34.",
"\"I Will Always Love You\"",
"Whitney Houston",
"1",
"10"
],
[
"35.",
"\"You Are Not Alone\"",
"Michael Jackson",
"1",
"5"
]
]
}
},
"evaluation": "",
"example": "",
"type": [
"other"
],
"heads": [
"#",
"Title",
"Artist",
"Highest pos. reached",
"weeks at No. 1"
],
"add_description": "The name of this table is Music Chart History, and the headers of this table are #,Title,Artist,Highest pos. reached,weeks at No. 1.",
"sql": {
"query": "SELECT weeks_at_No_1 FROM `Music Chart History` WHERE Artist = 'Julie Covington' AND Title = 'Don\\'t Cry for Me Argentina';",
"length": 123
},
"source": "wikitq"
}
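To make the Success Rate question concrete, here is a minimal sketch of how a record like the one above could be scored. It is not AgentBench's evaluation code. It assumes an in-memory `sqlite3` database as a stand-in for the real backend, an exact string match between the query result and `label`, and a trimmed three-row copy of the record. The helpers `quote_ident`, `build_table`, `is_success`, and `success_rate` are hypothetical names introduced for illustration.

```python
import sqlite3

# Hypothetical trimmed copy of the record above: same fields, only three rows.
record = {
    "label": ["7"],
    "table": {
        "table_name": "Music Chart History",
        "table_info": {
            "columns": [
                {"name": "#", "type": "INT"},
                {"name": "Title", "type": "TEXT"},
                {"name": "Artist", "type": "TEXT"},
                {"name": "Highest pos. reached", "type": "INT"},
                {"name": "weeks at No. 1", "type": "TEXT"},
            ],
            "rows": [
                ["1.", "\"Don't Cry for Me Argentina\"", "Julie Covington", "1", "7"],
                ["6.", "\"I Go To Rio\"", "Peter Allen", "1", "5"],
                ["17.", "\"I Feel Love\"", "Donna Summer", "1", "1"],
            ],
        },
    },
}


def quote_ident(name):
    """Quote an identifier for SQLite; the column names contain spaces and dots."""
    return '"' + name.replace('"', '""') + '"'


def build_table(conn, table):
    """Create the table described by `table_info` and insert its rows."""
    cols = table["table_info"]["columns"]
    col_defs = ", ".join(f"{quote_ident(c['name'])} {c['type']}" for c in cols)
    name = quote_ident(table["table_name"])
    conn.execute(f"CREATE TABLE {name} ({col_defs})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {name} VALUES ({placeholders})",
        table["table_info"]["rows"],
    )


def is_success(record, sql):
    """Return True if the query result equals the label (assumed exact match)."""
    conn = sqlite3.connect(":memory:")
    try:
        build_table(conn, record["table"])
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            return False  # a query that fails to run counts as a failure
        answer = [str(value) for row in rows for value in row]
        return answer == record["label"]
    finally:
        conn.close()


def success_rate(outcomes):
    """Fraction of successful episodes; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


good_sql = (
    'SELECT "weeks at No. 1" FROM "Music Chart History" '
    "WHERE Artist = 'Julie Covington'"
)
wrong_sql = (
    'SELECT "weeks at No. 1" FROM "Music Chart History" '
    "WHERE Artist = 'Peter Allen'"
)
# Column name written with underscores, as in the record's "sql" field.
bad_column_sql = 'SELECT weeks_at_No_1 FROM "Music Chart History"'

outcomes = [is_success(record, q) for q in (good_sql, wrong_sql, bad_column_sql)]
```

Under these assumptions each episode yields a binary outcome, and the Success Rate is their mean. Whether such a binary outcome is what the authors used as the trajectory reward for filtering is not stated in this thread.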