
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Code for Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency, International Conference on Machine Learning (ICML), 2024.

Project page: https://agentification.github.io/RAFA.

Authors: Zhihan Liu*, Hao Hu*, Shenao Zhang*, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang (* indicates equal contribution)

RAFA diagram

Please follow the instructions in the respective directories to reproduce our results on the four benchmarks:


Game of 24

Environment setup

  • Set OPENAI_API_KEY environment variable to your OpenAI API key:
export OPENAI_API_KEY=<your key>

Run the code

Experiment replication

python run.py --backend gpt-4 --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --prompt_sample standard --n_generate_sample 10 --method_generate propose --method_evaluate value --method_select greedy --n_select_sample 1 --n_evaluate_sample 3 --feedback

Parameters for the different methods

  • baseline ToT method (b=1, b=2)
python run.py --backend gpt-4 --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --prompt_sample standard --n_generate_sample 10 --method_generate propose --method_evaluate value --method_select greedy --n_select_sample 1 --n_evaluate_sample 3 --planning tot
python run.py --backend gpt-4 --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --prompt_sample standard --n_generate_sample 10 --method_generate propose --method_evaluate value --method_select greedy --n_select_sample 2 --n_evaluate_sample 3 --planning tot
  • baseline Reflexion method
python run.py --backend gpt-4 --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --prompt_sample standard --n_generate_sample 10 --method_generate propose --method_evaluate value --method_select greedy --n_select_sample 1 --n_evaluate_sample 3 --planning naive --feedback
  • RAFA (b=1, b=2)
python run.py --backend gpt-4 --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --prompt_sample standard --n_generate_sample 10 --method_generate propose --method_evaluate value --method_select greedy --n_select_sample 1 --n_evaluate_sample 3 --planning tot --feedback
python run.py --backend gpt-4 --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --prompt_sample standard --n_generate_sample 10 --method_generate propose --method_evaluate value --method_select greedy --n_select_sample 2 --n_evaluate_sample 3 --planning tot --feedback

GPT-3.5

To run with GPT-3.5, replace --backend gpt-4 with --backend gpt-3.5-turbo. If you run into context-length errors, use --backend gpt-3.5-turbo-16k instead.
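For reference, the Game of 24 task asks whether four given numbers can be combined with +, -, *, / (and parentheses) to reach 24. A minimal brute-force checker can be handy for sanity-checking model outputs; this is an illustrative sketch, not part of the repository:

```python
from itertools import permutations, product

def solve24(nums, target=24, eps=1e-6):
    """Return a solving expression string, or None if no combination reaches the target."""
    for a, b, c, d in permutations(map(float, nums)):
        for o1, o2, o3 in product("+-*/", repeat=3):
            # The five parenthesizations cover every binary expression tree on four leaves.
            for expr in (
                f"(({a}{o1}{b}){o2}{c}){o3}{d}",
                f"({a}{o1}{b}){o2}({c}{o3}{d})",
                f"({a}{o1}({b}{o2}{c})){o3}{d}",
                f"{a}{o1}(({b}{o2}{c}){o3}{d})",
                f"{a}{o1}({b}{o2}({c}{o3}{d}))",
            ):
                try:
                    if abs(eval(expr) - target) < eps:
                        return expr
                except ZeroDivisionError:
                    continue
    return None
```

For example, solve24([4, 7, 8, 8]) finds an expression such as (7 - 8/8) * 4, while solve24([1, 1, 1, 1]) returns None.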


ALFWorld

Environment setup

  • Install the required packages:
pip install -r requirements.txt
export OPENAI_API_KEY=<your key>

Run the code

./run_rafa.sh

BlocksWorld

Environment setup

  • Our experiments are conducted with Vicuna-13B/33B (v1.3). The required packages can be installed by
    pip install -r requirements.txt
    

Run the code

  • To run the RAP baseline experiments, use a command like the following:

    CUDA_VISIBLE_DEVICES=0,1,2 nohup python -m torch.distributed.run --master_port 1034 --nproc_per_node 1 run_mcts.py --task mcts --model_name Vicuna --verbose False --data data/blocksworld/step_6.json --max_depth 6 --name m6ct_roll60 --rollouts 60 --model_path lmsys/vicuna-33b-v1.3 --num_gpus 3
  • To run the RAFA experiments, use a command like the following:

    CUDA_VISIBLE_DEVICES=0,1,2 nohup python -m torch.distributed.run --master_port 36977 --nproc_per_node 1 run_rafa_mcts.py --model_name Vicuna --verbose False --data data/blocksworld/step_6.json --max_depth 6 --name rafm_step6_33b_try60 --rollouts 60 --model_path lmsys/vicuna-33b-v1.3 --num_gpus 3
  • For details on the runtime arguments, run python run_rafa_mcts.py --help.


Tic-Tac-Toe

Environment setup

  • Set OPENAI_API_KEY environment variable to your OpenAI API key:
export OPENAI_API_KEY=<your key>

Run the code

Experiment replication

python run.py --X gpt-4 --O gpt-4 --O_MPC 3 --num_train_epochs 12 --num_eval_epochs 10

Parameters

--X, --O: the backend model for the X and O players (default: gpt-3.5-turbo-16k)
--X_MPC, --O_MPC: number of candidate actions to propose for the X and O players (default: 1, i.e., the base model without MPC)
--temperature: sampling temperature for the GPT backend (default: 0.2)
--eval_freq: evaluation frequency (default: 1)
--num_train_epochs: number of training epochs (default: 1)
--num_eval_epochs: number of evaluation epochs (default: 1)
--verbose: print auxiliary outputs (default: 1)
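For context, the experiment pits an LLM X player against an LLM O player on the standard 3x3 board. As an illustration of the underlying game logic only (a sketch, not the repository's implementation), a minimal win check looks like:

```python
# Minimal Tic-Tac-Toe win check (illustrative sketch, not the repository's code).
# A board is a list of 9 cells in row-major order, each "X", "O", or " ".
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for i, j, k in WIN_LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None
```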

Citation

@article{liu2023reason,
      title={Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency},
      author={Liu, Zhihan and Hu, Hao and Zhang, Shenao and Guo, Hongyi and Ke, Shuqi and Liu, Boyi and Wang, Zhaoran},
      journal={arXiv preprint arXiv:2309.17382},
      year={2023}
}


