thudm / agenttuning

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Home Page: https://thudm.github.io/AgentTuning/

Python 37.00% JavaScript 8.10% CSS 1.16% HTML 22.00% Shell 0.98% C++ 28.69% Makefile 1.43% CMake 0.64% TypeScript 0.01%

agenttuning's People

Contributors

btlmd, eltociear, harshhere905, lr-tsinghua11, rohan37kumar, sengxian, shraddha761, shresthasurav

agenttuning's Issues

Local model

If my model is deployed locally, can I still run the Held-in and Held-out evaluations? Could you outline how to do it? Many thanks!

Model Output Length

Hi, thanks for the interesting work and awesome codebase. Recently, I tested the 7B model with TGI (config file: docker/agentlm-7b.yml) and found that the model's output seems to be truncated:

Query: I want to have a journey to California, please help me to build up a tourism plan. Please think step by step.
Answer: {"generated_text":"Sure, I'd be happy to help you plan a tour of California! Here's a"}

I wonder if I need to revise the config to enable the model to generate more words. Many thanks
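
One possible cause, offered as an assumption rather than a confirmed diagnosis: TGI applies a default `max_new_tokens` when the request does not set one, so a short answer like the one above may come from the request parameters rather than from the model config. A minimal sketch of building a `/generate` request body with an explicit limit follows. The value 512 is arbitrary, and `generate` expects a URL for a locally running server, which is not part of the original report.

```python
import json
import urllib.request


def build_generate_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build the JSON body for a TGI /generate request."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(url: str, prompt: str, max_new_tokens: int = 512) -> str:
    """POST the request and return the generated text (needs a running server)."""
    body = json.dumps(build_generate_payload(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]


# Only the payload is built here; no network call is made.
payload = build_generate_payload("Please think step by step.")
print(json.dumps(payload))
```

If the output is still cut off with a larger limit, the server-side token limits set when launching TGI would be the next thing to check.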

An open question: What's the difference between Agents and Tools

In your paper, I appreciate the evident enhancements achieved by "agents" across various benchmarks. I am intrigued by your approach to identifying and defining agents, a matter of ongoing concern within what some refer to as an "Agent Group." Personally, I perceive these entities as valuable tools, cleverly branded as agents to garner attention. 😂

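Mismatch between instructions and model behavior in the training data

Looking at the ALFWorld and Mind2Web training data in the Hugging Face dataset, I found that the model could not possibly produce the expected behavior from the instructions provided. See, for example, the two samples below. Is this expected?

[Two screenshots of training samples; images not preserved.]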

train code

May I ask if there is code available for training agentLM?

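Question about E in the paper's loss function

Hello, I read your paper. Is the modification made to the final loss? In $$ J(\theta) = \eta \cdot \mathbb{E}_{(x,y) \sim D_{agent}}[\log \pi_\theta(y \mid x)] + (1 - \eta) \cdot \mathbb{E}_{(x,y) \sim D_{general}}[\log \pi_\theta(y \mid x)] $$ I could not find an explanation of E(x,y) in the paper. Is it the cross-entropy (CE) loss?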
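
As a reading aid, not the authors' answer: E in the objective is most naturally read as an expectation over (x, y) pairs drawn from each dataset. Under that reading J(θ) is a weighted average log-likelihood, and maximizing it is the same as minimizing an η-weighted cross-entropy (negative log-likelihood). The toy sketch below uses made-up log-probabilities, and `mixture_objective` is a hypothetical helper.

```python
from statistics import fmean


def mixture_objective(agent_logps, general_logps, eta):
    """eta * mean log-likelihood on agent data + (1 - eta) * mean on general data."""
    return eta * fmean(agent_logps) + (1.0 - eta) * fmean(general_logps)


# Made-up per-example values of log pi_theta(y | x).
agent_logps = [-1.0, -3.0]
general_logps = [-2.0, -2.0, -2.0]

j = mixture_objective(agent_logps, general_logps, eta=0.2)
loss = -j  # the quantity a trainer would minimize (weighted cross-entropy)
print(j, loss)
```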

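Reward computation is missing from Dataset details

I have the following questions. Thank you very much for your answers :)

  1. I cannot find how the reward is computed in the Dataset details in the appendix of the AgentBench paper. For example, C.1 for DB only says "Metrics. We measure the Success Rate of agents in completing instructions." That is not a reward score for a trajectory (and the DB data in AgentBench has no trajectories).

  2. Why is #Dev larger than #Test in AgentBench? For example, DB has #Dev=60 and #Test=300. Is the training set larger than the test set?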

Is it possible to conduct RLHF from the environment?

Thanks for open-sourcing the AgentTuning code. I am quite interested in training the model, and I see that the training framework is not open-sourced (#1).

The discussion mentioned that it could support P-tuning or LoRA. I am wondering whether it could also support RLHF.

Recently I read a paper, https://arxiv.org/abs/2312.14878, and I am curious how AgentLM would perform if we let it learn from interacting with environments (see fine-tuning type II in that paper).

Start TGI worker

Hello, I have two questions about evaluating on mt_bench. How do I start the TGI worker? And are host=127.0.0.1 and port=30070 fixed? Looking forward to your reply.

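About the dataset

  1. Filt. Traj. is the number of rows in the Hugging Face data, but how are # Inst. and Avg # Filt. Traj. Turns computed? For example, how would they be computed for the sample below?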

[ { "from": "human", "loss": null, "value": "You are web shopping.\nI will give you instructions about what to do.\nYou have to follow the instructions.\nEvery round I will give you an observation and a list of available actions, you have to respond an action based on the state and instruction.\nYou can use search action if search is available.\nYou can click one of the buttons in clickables.\nAn action should be of the following structure:\nsearch[keywords]\nclick[value]\nIf the action is not valid, perform nothing.\nKeywords in search are up to you, but the value in click must be a value in the list of available actions.\nRemember that your keywords in search should be carefully designed.\nYour response should use the following format:\n\nThought:\nI think ... \n\nAction: \nclick[something]" }, { "from": "gpt", "loss": false, "value": "Ok." }, { "from": "human", "loss": null, "value": "WebShop [SEP] Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Search" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should use the search bar to look for a mid-century coffee table for my living room.\n\nAction:\nsearch[mid-century coffee table]" }, { "from": "human", "loss": null, "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] Page 1 (Total results: 50) [SEP] Next > [SEP] B09FDCPKRR [SEP] Home Square 2 Piece Living Room Set with Coffee Table and End Table in Walnut [SEP] $757.76 [SEP] B07XQ712TW [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] $100.0 [SEP] B01DBKYOV6 [SEP] Baxton Studio 424-6898-AMZ Coffee Table, Brown [SEP] $239.99 [SEP] B08WRSWF5X [SEP] Modway Render Mid-Century Modern Coffee Table in Walnut [SEP] $272.12 [SEP] B09333CV7P [SEP] Modway Envision 70\" Mid-Century Modern Low Profile Entertainment TV Stand in Walnut White & 
Transmit Mid-Century Coffee Table, Walnut White [SEP] $453.5 [SEP] B078KKB8Z9 [SEP] Modway Transmit Mid-Century Coffee Table, Walnut White [SEP] $199.5 [SEP] B077Y987TL [SEP] Furniture of America Regols Rego Mid-Century Coffee Table with Glass Top, Gray [SEP] $100.0 [SEP] B06W52GBKW [SEP] Archian Walnut Wood High Coffee Table [SEP] $88.79 [SEP] B07NKG7DF9 [SEP] Modway Origin 47\" Mid-Century Modern Wood Coffee Table In Walnut White [SEP] $218.34 [SEP] B01HQGZEO2 [SEP] Baxton Studio Pierce Mid-Century Modern Coffee Table Walnut Brown/Contemporary [SEP] $105.53" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should click on the 'IWELL Mid Century Coffee Table for Living Room' option as it fits my requirements and has a price of $100.0.\n\nAction:\nclick[B07XQ712TW]" }, { "from": "human", "loss": null, "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] < Prev [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] Price: $100.0 [SEP] Rating: N.A. [SEP] Description [SEP] Features [SEP] Reviews [SEP] Buy Now" }, { "from": "gpt", "loss": true, "value": "Thought:\nI think I should click on the 'Buy Now' button to purchase the IWELL Mid Century Coffee Table.\n\nAction:\nclick[Buy Now]" } ]

  2. I cannot access Hugging Face. Is there a recommended way to download the data?

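What is the minimum GPU memory needed to run inference with agentlm-7b?

I downloaded the agentlm-7b model from Hugging Face to run inference and ran out of GPU memory on a 4090.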

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = '/home/nickkf/Documents/models/agentlm-7b'

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use the GPU if available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Define the chat input text
chat_round = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. 
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST]"""

# Tokenize the input
inputs = tokenizer([chat_round], return_tensors="pt").to(device)

# Generate the reply
reply_ids = model.generate(**inputs)
reply = tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]

print("AI:", reply)
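
A likely explanation, stated as an assumption: `from_pretrained` called without a `torch_dtype` argument typically loads the weights in float32, and roughly 7 billion parameters at 4 bytes each already exceed the 24 GB of a 4090 before any activations are allocated. The back-of-the-envelope sketch below counts weights only (no KV cache, no activations) and uses a nominal parameter count of 7e9. Passing `torch_dtype=torch.float16` to `from_pretrained` is the usual way to halve this footprint.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


N_PARAMS = 7e9  # nominal size assumed for a "7b" model
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(N_PARAMS, dtype):.1f} GiB")
```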

How was the general data filtered?

The general dataset ShareGPT_Vicuna_unfiltered has about 90K samples. How did you filter it down to 50K? Could you share the script?

Questions about the different reward definitions

The reward appears in the following forms:
1. Section 2 of the paper says "Each trajectory has a final reward r ∈ [0, 1], reflecting the completion status of the task."
2. Section 2.1.3 of the paper says "due to the difficulty of the Mind2Web task, we use a threshold of r ≥ 2/3 to ensure we obtain a sufficient number of trajectories."
3. Section E.1 of the paper "AgentBench: Evaluating LLMs as Agents" says "ultimately we provide a final reward score according to the above metrics:
$$ reward = 0.7 \times metric_{winrate} + 0.3 \times metric_{damagerate} $$"
4. Section H.1 of the paper "AgentBench: Evaluating LLMs as Agents" says "As there might be more than one suitable item for a given query, Webshop adopts a matching reward as its evaluation metric:"
$$ Reward = \frac{|U_{att} \cap Y_{att}| + |U_{opt} \cap Y_{opt}| + \mathbb{I}[y_{price} \le u_{price}]}{|U_{att}| + |U_{opt}| + 1} \cdot r_{type} $$

Questions for the authors:

  • Do 1 and 2 conflict? (1 says r can only take the values 0 and 1.)
  • The formulas in 3 and 4 compute r for DCG and WebShop respectively. What are the formulas for r for each task in this paper?
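
A small worked sketch of the WebShop formula quoted in item 4 may help with the first question: the notation r ∈ [0, 1] usually denotes the closed interval, and a matching reward of this form can land strictly between 0 and 1, which is what makes a threshold such as r ≥ 2/3 meaningful. The function name, argument names, and example product data below are hypothetical. Only the formula comes from the quoted text.

```python
def webshop_reward(u_att, y_att, u_opt, y_opt, u_price, y_price, r_type=1.0):
    """Matching reward from the formula quoted above.

    u_* describe the goal, y_* the chosen product; r_type is the type-match factor.
    """
    u_att, y_att, u_opt, y_opt = map(set, (u_att, y_att, u_opt, y_opt))
    matched = len(u_att & y_att) + len(u_opt & y_opt) + int(y_price <= u_price)
    return matched / (len(u_att) + len(u_opt) + 1) * r_type


# Hypothetical goal: two attributes, one option, price cap of 120.
r = webshop_reward(
    u_att={"mid century", "living room"},
    y_att={"mid century"},
    u_opt={"walnut"},
    y_opt={"walnut"},
    u_price=120.0,
    y_price=100.0,
)
print(r)
```

Here one of two attributes, the single option, and the price constraint are matched, so the reward is 3/4: a partial score that would still pass a 2/3 threshold.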

Can you point to the ShareGPT filtered/cleaned data used?

Hey, thank you for your great work. I was replicating the training of AgentLM and was searching for the ShareGPT data used. The paper mentions this as the data used, but I cannot find the filtered/cleaned version anywhere. Could you please tell me how to get the final version of the ShareGPT data used for training AgentLM?

Auto comment

The auto-comment workflow adds a comment when an issue is opened or closed, or when a PR is opened.

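Difference between AgentTuning and ToolBench

I previously read your ToolBench paper, and its approach to dataset construction and instruction tuning seems quite similar to this AgentTuning paper.
Could you explain how the two papers differ and what each one focuses on? Many thanks!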

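About the reward

I have the following questions. Thank you very much for your answers :)

  1. Is the reward used both as a model metric and as a trajectory-filtering criterion? In AgentBench, DCG and WS both use the reward as the model's metric on the task, but in AgentTuning it is used to filter interaction trajectories: "Recall that each interaction trajectory receives a reward r, this allows us to automatically select high-quality trajectories based on the reward."
  2. AgentTuning has 6 held-in tasks. What are their reward formulas? AgentBench gives two reward formulas, one for DCG and one for WS. Both are designed specifically for those tasks and do not generalize to others, such as DB.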
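
The two uses of the reward are compatible: the same scalar can be reported as a metric and also used as a filter. A minimal sketch of reward-based trajectory filtering follows. The 2/3 threshold for Mind2Web is the one quoted from the paper elsewhere on this page. The default of 1.0 for other tasks, the record layout, and the example rewards are assumptions made for illustration.

```python
def filter_trajectories(trajectories, threshold=1.0):
    """Keep trajectories whose final reward reaches the threshold."""
    return [t for t in trajectories if t["reward"] >= threshold]


# Hypothetical trajectories with final rewards in [0, 1].
trajectories = [
    {"id": "a", "reward": 0.5},
    {"id": "b", "reward": 0.75},
    {"id": "c", "reward": 1.0},
]

strict = filter_trajectories(trajectories)                   # assumed default
relaxed = filter_trajectories(trajectories, threshold=2 / 3)  # Mind2Web-style
print([t["id"] for t in strict], [t["id"] for t in relaxed])
```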

Number of training steps

Great work on creating the AgentTuning data! I'm wondering how many training steps you used to train the final AgentLM-7B model.

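Abnormal inference output when deployed with FastChat

I deployed AgentLM-13B with FastChat, and the output formatting is garbled at inference time, especially in streaming mode: each line holds only a few characters before breaking to the next, and a single word gets split into several letters or letter groups. Loading the model manually with transformers and displaying it with Gradio does not have this problem. I checked with FastChat's debug mode, and the model is loaded with LlamaForCausalLM, which should be correct.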

Question about the paper

Recognizing the superior quality of GPT-4 responses as highlighted by (Wang et al., 2023a), we adopted a sampling ratio of 1:4 between GPT-4 and GPT-3.5 for better performance.
Isn't the response quality of GPT-4 higher? Why, then, is GPT-4 sampled at the lower ratio?

About dataset statistics and download

  1. Filt. Traj. is the number of rows in the Hugging Face data, but how are # Inst. and Avg # Filt. Traj. Turns computed? For example, how would # Inst. and Avg # Filt. Traj. Turns be computed for the sample below?

[
    {
        "from": "human",
        "loss": null,
        "value": "You are web shopping.\xxx"
    },
    {
        "from": "gpt",
        "loss": false,
        "value": "Ok."
    },
    {
        "from": "human",
        "loss": null,
        "value": "WebShop [SEP] Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Search"
    },
    {
        "from": "gpt",
        "loss": true,
        "value": "Thought:\nI think I should use the search bar to look for a mid-century coffee table for my living room.\n\nAction:\nsearch[mid-century coffee table]"
    },
    {
        "from": "human",
        "loss": null,
        "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] Page 1 (Total results: 50) [SEP] Next > [SEP] B09FDCPKRR [SEP] Home Square 2 Piece Living Room Set with Coffee Table and End Table in Walnut [SEP] $757.76 [SEP] B07XQ712TW [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] $100.0 [SEP] B01DBKYOV6 [SEP] Baxton Studio 424-6898-AMZ Coffee Table, Brown [SEP] $239.99 [SEP] B08WRSWF5X [SEP] Modway Render Mid-Century Modern Coffee Table in Walnut [SEP] $272.12 [SEP] B09333CV7P [SEP] Modway Envision 70\" Mid-Century Modern Low Profile Entertainment TV Stand in Walnut White & Transmit Mid-Century Coffee Table, Walnut White [SEP] $453.5 [SEP] B078KKB8Z9 [SEP] Modway Transmit Mid-Century Coffee Table, Walnut White [SEP] $199.5 [SEP] B077Y987TL [SEP] Furniture of America Regols Rego Mid-Century Coffee Table with Glass Top, Gray [SEP] $100.0 [SEP] B06W52GBKW [SEP] Archian Walnut Wood High Coffee Table [SEP] $88.79 [SEP] B07NKG7DF9 [SEP] Modway Origin 47\" Mid-Century Modern Wood Coffee Table In Walnut White [SEP] $218.34 [SEP] B01HQGZEO2 [SEP] Baxton Studio Pierce Mid-Century Modern Coffee Table Walnut Brown/Contemporary [SEP] $105.53"
    },
    {
        "from": "gpt",
        "loss": true,
        "value": "Thought:\nI think I should click on the 'IWELL Mid Century Coffee Table for Living Room' option as it fits my requirements and has a price of $100.0.\n\nAction:\nclick[B07XQ712TW]"
    },
    {
        "from": "human",
        "loss": null,
        "value": "Instruction: [SEP] i'm looking for a mid century coffee table for my living room [SEP] Back to Search [SEP] < Prev [SEP] IWELL Mid Century Coffee Table for Living Room, Modern Cocktail Table with Storage Shelf, Couch Table, Sofa Table for Office, Easy to Assemble, Dark Walnut [SEP] Price: $100.0 [SEP] Rating: N.A. [SEP] Description [SEP] Features [SEP] Reviews [SEP] Buy Now"
    },
    {
        "from": "gpt",
        "loss": true,
        "value": "Thought:\nI think I should click on the 'Buy Now' button to purchase the IWELL Mid Century Coffee Table.\n\nAction:\nclick[Buy Now]"
    }
]

  2. I cannot access Hugging Face. Is there a recommended way to download the data?
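
The definitions behind the "# Inst." and "Turns" columns are not stated on this page, so the sketch below only shows candidate counts for a conversation in the format above: total messages, model replies, and model replies that contribute to the loss. The message texts are abbreviated stand-ins for the sample, and `count_turns` is a hypothetical helper, not the authors' script.

```python
# Abbreviated stand-in for the sample above (same roles and loss flags).
conversation = [
    {"from": "human", "loss": None, "value": "You are web shopping. ..."},
    {"from": "gpt", "loss": False, "value": "Ok."},
    {"from": "human", "loss": None, "value": "WebShop [SEP] Instruction: ..."},
    {"from": "gpt", "loss": True, "value": "Thought: ... Action: search[...]"},
    {"from": "human", "loss": None, "value": "Instruction: ... Page 1 ..."},
    {"from": "gpt", "loss": True, "value": "Thought: ... Action: click[B07XQ712TW]"},
    {"from": "human", "loss": None, "value": "Instruction: ... Buy Now"},
    {"from": "gpt", "loss": True, "value": "Thought: ... Action: click[Buy Now]"},
]


def count_turns(messages):
    """Return three candidate turn counts for one trajectory."""
    gpt = [m for m in messages if m["from"] == "gpt"]
    return {
        "messages": len(messages),
        "gpt_replies": len(gpt),
        "trained_replies": sum(m["loss"] is True for m in gpt),
    }


counts = count_turns(conversation)
print(counts)
```

Averaging one of these counts over all rows would give a per-trajectory average. Which count matches the paper's table is a question for the authors.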

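Fine-tuning

Thanks in advance for your answer!
Are there any details on the fine-tuning process?
Did you simply use FastChat's fine-tuning code?
Looking forward to your reply.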

Can I run AgentInstruct data on the AgentBench?

Hey, thank you for your great work. I wanted to know how I can run evaluation with the open-source AgentInstruct data on the AgentBench repo. I would be really grateful if you could share the files defined here for running AgentInstruct on AgentBench. I am asking because I want to evaluate some models on the AgentInstruct data.

Adding Contributors Section in readme.md file.

Why Contributors section:- A "Contributors" section in a repo gives credit to and acknowledges
the people who have helped with the project, fosters a sense of community, and helps others
know who to contact for questions or issues related to the project.

Issue type

  • [✅] Docs

Expected output :-
5

@ kindly assign this issue to me ! I would love to work on it ! Thank you !

GPU memory for fine-tuning

How much GPU memory is needed to fine-tune a model (for example, 7B)?

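How is the training data sampled?

I noticed that #36 says training ran for 2000 steps in total, so there should be 2000 × 64 = 128000 samples. Given η = 0.2, the ShareGPT data should amount to 102400 samples, but #16 says this data is already upsampled. What explains the gap?
Also, given η = 0.2, does that mean the AgentTuning data was upsampled 60000 * 0.2 / 1800 = 8 times within one epoch? (Specifically, over 2000 steps, roughly how many times was each AgentTuning/ShareGPT sample trained on?)
And why did you choose 60000 as the amount of ShareGPT data (rather than, say, 10000 or 40000)?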
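
The arithmetic in the question can be laid out explicitly. The sketch assumes that every training example is drawn from the agent data with probability η and from the general data otherwise, and it takes the step count, batch size, and dataset sizes exactly as quoted in the question. None of these are confirmed by the authors here.

```python
def sample_budget(steps: int, batch_size: int, eta: float):
    """Split the total number of training examples by the mixture ratio eta."""
    total = steps * batch_size
    agent = round(total * eta)
    return total, agent, total - agent


total, agent_draws, general_draws = sample_budget(steps=2000, batch_size=64, eta=0.2)

# Dataset sizes as quoted in the question above.
agent_size, general_size = 1800, 60000
agent_repeats = agent_draws / agent_size
general_repeats = general_draws / general_size

print(total, agent_draws, general_draws)
print(round(agent_repeats, 2), round(general_repeats, 2))
```

Under these assumptions each agent example would be seen about 14 times and each general example fewer than 2 times. That shows the size of the gap the question asks about, though not its cause.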
