
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution

  • paper: https://arxiv.org/abs/2307.08775
  • TL;DR: GEAR is a computationally efficient query-tool grounding algorithm that generalizes to a variety of tasks requiring tool use without relying on task-specific demonstrations

Requirements

conda create --name GEAR python=3.7.10
conda activate GEAR
bash install.sh

What the data looks like

The sampled datasets used in our experiments can be found in /datasets.

/gear/read_dataset.py provides functions for reading those dataset files.
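For illustration, here is a minimal loader sketch assuming a JSON-lines layout; the actual file formats, field names, and the functions exported by /gear/read_dataset.py may differ, so treat this as a hypothetical example only.

import json

def load_samples(path):
    # Hypothetical helper: read one JSON object per line.
    samples = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                samples.append(json.loads(line))
    return samples

# Hypothetical usage; replace the path with an actual file under /datasets.
# samples = load_samples("datasets/sampled_example.jsonl")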

Instructions

Add your OpenAI API key in api.py and OpenAIModels.py.
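For example, a minimal sketch of wiring in the key, assuming the pre-1.0 openai Python SDK (implied by the Python 3.7 environment); the exact variable names inside api.py and OpenAIModels.py may differ.

import os
import openai

# Read the key from the environment instead of hard-coding it in the source files.
openai.api_key = os.environ["OPENAI_API_KEY"]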

Args Explanation

parser.add_argument("--slm1", help="small language model",
                    type=str, default="EleutherAI/gpt-neo-1.3B")
parser.add_argument("--slm2", help="small language model2",
                    type=str, default="sentence-transformers/all-mpnet-base-v2")
parser.add_argument("-v", "--verbose", help="verbose",
                    action="store_true")
parser.add_argument("-M", "--max_tokens", help="max tokens",
                     type=int, default=512)
parser.add_argument("-t", "--top_k", help="Return top k APIs",
                    type=int, default=1)
parser.add_argument("--llm", help="large language model",
                    type=str, default="EleutherAI/gpt-j-6B")
parser.add_argument("-e", "--early_stop", help="early stop the program", 
                    type=int, default = 0)
parser.add_argument("-c", "--check_point", help="checkpoint file path", 
                    type=str, default=None)
parser.add_argument("-d", "--dataset", help="dataset path",
                    type=str)
parser.add_argument("-o", "--output", help="output file path",
                    type=str, default=None)
parser.add_argument('--experiment', choices=['gptj', 'gptj_zero_shot', 'openai', "gptj_few_shot", "openai_zero_shot", "openai_few_shot", "ground"], nargs="+")
parser.add_argument('--openai_model', choices=['chatgpt', 'gpt3'], default="gpt3")
parser.add_argument("--tool", choices=["calculator", "wiki", "qa", "mt", "multilingualqa", "timezone", "sleep", "log", "exp", "robot"], nargs="+")
parser.add_argument('--prompt', choices=['mt', 'wiki', 'qa', 'calculator'], default="mt")
parser.add_argument('--fdevice', type=str, default="cuda:0")
parser.add_argument('--ALPHA', type=float, default=0.75)
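As a quick illustration of how these flags combine, the following example parses an argument list equivalent to the GPT-J command in the next section. It assumes the parser built above is in scope; the dataset path is hypothetical.

# Illustrative only: mirrors the GPT-J command shown below.
args = parser.parse_args([
    "-v", "-t", "1", "-e", "1000",
    "-d", "datasets/example.jsonl",   # hypothetical dataset path
    "--experiment", "gptj",
    "--tool", "calculator", "wiki", "qa", "mt",
    "--fdevice", "cuda:0",
])
print(args.tool)    # ['calculator', 'wiki', 'qa', 'mt']
print(args.top_k)   # 1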

Run GEAR on GPT-J with four basic tools

cd gear
python -u main.py -v -t 1 -e 1000 \
-d {DATASET_PATH} \
-c {CHECKPOINT_JSON_PATH} \
-o {OUTPUT_JSON_PATH} \
--experiment gptj \
--tool calculator wiki qa mt \
--fdevice {DEVICE} \
> {OUTPUT_TXT_PATH}

Run GEAR on GPT-3 Model with four basic tools

cd gear
python -u main.py -v -t 1 -e 1000 \
-d {DATASET_PATH} \
-c {CHECKPOINT_JSON_PATH} \
-o {OUTPUT_JSON_PATH} \
--experiment openai \
--tool calculator wiki qa mt \
--openai_model gpt3 \
--fdevice {DEVICE} \
> {OUTPUT_TXT_PATH}

The OpenAI model can be switched to ChatGPT by passing --openai_model chatgpt instead of gpt3.

Run zero-shot and few-shot experiments for GPT-J and GPT-3 with four basic tools

cd gear
python -u main.py -v -t 1 -e 1000 \
-d {DATASET_PATH} \
-c {CHECKPOINT_JSON_PATH} \
-o {OUTPUT_JSON_PATH} \
--experiment gptj_zero_shot gptj_few_shot openai_zero_shot openai_few_shot \
--prompt {TASK_NAME} \
--tool calculator wiki qa mt \
--openai_model gpt3 \
--fdevice {DEVICE} \
> {OUTPUT_TXT_PATH}

FAQ

The program runs very slowly and WikiSearch returns nothing: this is caused by an AWS connection issue with the URL used by the Wikipedia Search tool. Our tests between April and June 2023 showed the server working well, typically returning a result within 2-3 seconds; after June 20th, retrieval times rose to about 120 seconds without returning anything. One potential workaround is to change the WikiSearch URL or to use the Python Wikipedia Search package (a sketch follows below) until the issue is fixed, but this may not guarantee the same experimental results.
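As one possible workaround, here is a hedged sketch that swaps in the third-party wikipedia package (pip install wikipedia) for the original WikiSearch URL; this is not the implementation used in the paper and may not reproduce the reported results.

import wikipedia
from wikipedia.exceptions import DisambiguationError, PageError

def wiki_search(query, sentences=2):
    # Return a short summary, or an empty string when the lookup fails.
    try:
        return wikipedia.summary(query, sentences=sentences)
    except (DisambiguationError, PageError):
        return ""

# print(wiki_search("Alan Turing"))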

Reach out to Yining [email protected] or Haoping [email protected] if you have any other questions! :)

How to Cite

@inproceedings{lu-etal-2024-gear,
    title = "{GEAR}: Augmenting Language Models with Generalizable and Efficient Tool Resolution",
    author = "Lu, Yining  and
      Yu, Haoping  and
      Khashabi, Daniel",
    editor = "Graham, Yvette  and
      Purver, Matthew",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-long.7",
    pages = "112--138",
    abstract = "Augmenting large language models (LLM) to use external tools enhances their performance across a variety of tasks. However, prior works over-rely on task-specific demonstration of tool use that limits their generalizability and computational cost due to making many calls to large-scale LLMs. We introduce GEAR, a computationally efficient query-tool grounding algorithm that is generalizable to various tasks that require tool use while not relying on task-specific demonstrations. GEAR achieves better efficiency by delegating tool grounding and execution to small language models (SLM) and LLM, respectively; while leveraging semantic and pattern-based evaluation at both question and answer levels for generalizable tool grounding. We evaluate GEAR on 14 datasets across 6 downstream tasks, demonstrating its strong generalizability to novel tasks, tools and different SLMs. Despite offering more efficiency, GEAR achieves higher precision in tool grounding compared to prior strategies using LLM prompting, thus improving downstream accuracy at a reduced computational cost. For example, we demonstrate that GEAR-augmented GPT-J and GPT-3 outperform counterpart tool-augmented baselines because of better tool use.",
}
