zorazrw / trove

[ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks

Home Page: https://arxiv.org/pdf/2401.12869.pdf

TROVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks 🛠️

Setup

Install the required packages:

pip install -r requirements.txt

Tasks and datasets are organized as follows:

├── MATH
│   ├── algebra
│   ├── counting_and_probability
│   ├── geometry
│   ├── intermediate_algebra
│   ├── number_theory
│   ├── prealgebra
│   └── precalculus
├── TableQA
│   ├── TabMWP
│   ├── WTQ
│   └── HiTab
└── VQA
    └── GQA

Running Experiments

Our Method: TroVE

python run_trove.py --task_name "math/algebra"
  • For MATH tasks, specify the task name as math/${dataset_name}, e.g., math/algebra.
  • For TableQA and VQA tasks, use the dataset name directly: one of [tabmwp, wtq, hitab, gqa].

Note that the value passed to --task_name must be lowercase.
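
The naming rules above can be sketched as a small helper. This is only an illustration: `to_task_name` and `trove_command` are hypothetical names that are not part of the repository, and the list of MATH subsets is assumed from the directory tree shown earlier.

```python
# Hypothetical helper, not part of the repository: it only encodes the
# --task_name rules described above.

# Assumed from the directory tree in the Setup section.
MATH_DATASETS = [
    "algebra", "counting_and_probability", "geometry",
    "intermediate_algebra", "number_theory", "prealgebra", "precalculus",
]
# Dataset names listed for the TableQA and VQA tasks.
OTHER_DATASETS = ["tabmwp", "wtq", "hitab", "gqa"]


def to_task_name(dataset: str) -> str:
    """Return the lowercase value to pass to --task_name."""
    name = dataset.strip().lower()
    # Accept an already-prefixed name such as "MATH/algebra".
    if name.startswith("math/"):
        name = name[len("math/"):]
    if name in MATH_DATASETS:
        return f"math/{name}"
    if name in OTHER_DATASETS:
        return name
    raise ValueError(f"unknown dataset: {dataset!r}")


def trove_command(dataset: str) -> list:
    """Build the argument list for the run_trove.py command shown above."""
    return ["python", "run_trove.py", "--task_name", to_task_name(dataset)]
```

For example, `trove_command("TabMWP")` yields an argument list that could be handed to `subprocess.run`, with the task name already lowercased.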

Baseline Methods: Primitive & Instance

python baseline.py --task_name "math/algebra" --suffix "primitive"  # or "instance"

Note that for the GQA dataset, the locate_objects and visual_qa functions are implemented as FastAPI endpoints, so you need to launch the server first (as below) and then run the TroVE or baseline experiments.

uvicorn server.gqa:app
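
Because the experiments depend on the server being up, it can help to wait until it accepts connections before starting a run. The sketch below is not part of the repository; `wait_for_port` is a hypothetical helper, and it assumes uvicorn is listening on its default address, 127.0.0.1:8000, since the command above passes no host or port options.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8000,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if
    `timeout` seconds pass first. Host and port defaults are an
    assumption (uvicorn's defaults), not values taken from the repo.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

A run script could call `wait_for_port()` and only launch the GQA experiment when it returns True. This checks that the port is open, not that the endpoints themselves respond correctly.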

Evaluation

python -m utils.eval --results_path ${RESULTS_PATH}

trove's Issues

Plans to release model outputs?

Hi,

Interesting work! Are there any plans to release the model outputs (especially the final set of functions and corresponding solutions) for the models and datasets studied in the paper?

System vs. user prompt in api.py?

Hi, I noticed that you included the code used to call ChatGPT, but as far as I can tell, didn't include any code that calls it.

def chat_api_wait(

Could you quickly explain how you split the prompt between the system & user prompts in your experiments?
Thanks!
