Comments (5)

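lr-tsinghua11 commented on June 12, 2024
  1. The reward is computed the same way as in AgentBench; see the per-dataset metrics in its appendix.
  2. Both #Dev and #Test are test sets. #Dev is randomly sampled from #Test; it is not a training set and is used only to reduce evaluation time.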

from agenttuning.

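DryPilgrim commented on June 12, 2024

Thank you very much for the answer :-) I have some follow-up questions:

  1. The reward I mentioned is the one AgentTuning uses to evaluate the quality of an interaction trajectory; AgentBench does not mention evaluating trajectory quality. A reward is not the same as a metric. Taking DB as an example, the metric is the success rate (SR) of the model's SQL operations, which is defined at the level of the whole dataset, whereas the reward measures the quality of a trajectory, at the level of a single interaction trajectory. So for the DB task, the AgentBench metrics cannot be used directly as the reward for scoring interaction trajectories in AgentTuning.

  2. The AgentBench paper says the datasets are open-sourced, but there are only #dev and #test. Is there no training set?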

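lr-tsinghua11 commented on June 12, 2024

Thank you for your continued interest as well! :-)

  1. As you said, the metric does measure the whole dataset. During our experiments we modified part of the AgentBench code to obtain a reward for each training sample (instead of computing the final mean SR, we saved the whole per-sample list), which lets us filter the interaction data.
  2. Yes, AgentBench only evaluates models; training sets need to be obtained by other means, such as other open-source repositories.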
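
The relationship between the dataset-level SR and the per-trajectory reward described in point 1 can be sketched roughly as follows. This is only an illustration under assumptions, not the actual AgentBench or AgentTuning code: the `Trajectory` class, the function names, the 0/1 reward values, and the threshold of 1.0 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    """One interaction trajectory with its per-sample score (hypothetical structure)."""
    sample_id: str
    messages: list[str]
    reward: float  # assumed here: 1.0 if the task succeeded, 0.0 otherwise


def success_rate(trajectories: list[Trajectory]) -> float:
    """Dataset-level metric: the mean of the per-sample scores."""
    return mean(t.reward for t in trajectories)


def filter_by_reward(
    trajectories: list[Trajectory], threshold: float = 1.0
) -> list[Trajectory]:
    """Keep only trajectories whose per-sample reward reaches the threshold."""
    return [t for t in trajectories if t.reward >= threshold]


# Toy data: keeping the whole list of scores (rather than only their mean)
# is what makes per-trajectory filtering possible.
trajectories = [
    Trajectory("db-0", ["..."], 1.0),
    Trajectory("db-1", ["..."], 0.0),
    Trajectory("db-2", ["..."], 1.0),
]
kept = filter_by_reward(trajectories)
```

In this sketch the SR is just the average of the stored rewards, so the same evaluation run yields both the dataset-level metric and the per-sample scores used for filtering.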

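DryPilgrim commented on June 12, 2024

1. Why does the AgentTuning repository have no evaluation for the held_in tasks (there are already eval_heldout and eval_general)?
2. Where in the AgentBench code is the SR for DB computed? I read the code and found that only webshop implements an SR computation (THUDM/AgentBench/src/server/tasks/webshop/baseline_models/test.py). Also, there is no webshop data under data.
3. Does the AgentBench training set have to be constructed from scratch? For the DB task, for example, one would need to collect and mix WikiSQL, WikiTableQuestions, and so on. Is there a script for processing the training set?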

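lr-tsinghua11 commented on June 12, 2024
  1. The held_in tasks are under the ./AgentBench.old folder.
  2. For the new version of AgentBench, the code that computes the DB metric is at line 173 of THUDM/AgentBench/src/server/tasks/dbbench/__init__.py; for the old version, it is at line 176 of THUDM/AgentTuning/AgentBench.old/src/server/tasks/dbbench/__init__.py.
  3. Some tasks included in AgentBench have official training sets, such as AlfWorld, WebShop, Mind2Web, and KG. For tasks without a training set, such as DB and OS, we constructed training data from the same distribution ourselves. For example, part of the DB training data in AgentInstruct was collected from the BIRD dataset, and some other DB and OS training tasks were constructed with GPT-4; the constructed results were filtered and then included in AgentInstruct. For now, we do not plan to open-source the scripts used to construct the training data.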
