A simple, modular, non-brittle framework for implementing text-based Reflexion agents. Designed for compatibility with diverse problem domains and interoperability with existing workflows.
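The example below shows a Reflexion agent that implements a function using feedback from a code execution environment. The agent's internal testing environment generates its own tests, and the final implementation is then evaluated against the dataset's ground-truth tests: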
from reflexion.agents.programming import PythonReflexionAgent
from reflexion.datasets.programming import HumanEvalDataset
from reflexion.environments.programming import (
    InternalTestingEnv,
    PythonTestingEnv,
)
from reflexion.llms import OpenAIChatLLM
# Load a task from a dataset
LANG = "python"
dataset = HumanEvalDataset(language=LANG)
task_id, signature, docstring, tests = dataset[0]
# Instantiate an LLM
llm = OpenAIChatLLM(model_name="gpt-4", temperature=0)
# Instantiate a code execution environment
local_env = PythonTestingEnv(timeout=10)
# Instantiate a Reflexion agent with an internal testing environment
agent = PythonReflexionAgent(
    function_signature=signature,
    docstring=docstring,
    testing_env=InternalTestingEnv(
        function_signature=signature,
        docstring=docstring,
        language=LANG,
        local_env=local_env,
        llm=llm,
    ),
    llm=llm,
)
# Run the agent for a few steps
for _ in range(3):
    reward, message = agent.step()
# Evaluate the agent's implementation against the ground truth tests
rewards, messages = local_env.step(program=agent.implementation, tests=tests)
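To make the control flow behind the example easier to picture, here is a minimal, self-contained sketch of a generate, test, and reflect loop. It is not the framework's implementation: `run_tests`, `reflexion_loop`, `fake_generate`, and `fake_reflect` are hypothetical names introduced only for illustration, the LLM is replaced by a hard-coded stub, and tests are assumed to be plain `assert` statements. Unlike a real code execution environment, the sketch has no sandboxing or timeout.

```python
from typing import Callable, List, Tuple


def run_tests(program: str, tests: List[str]) -> Tuple[bool, str]:
    """Load the program, run each assert-style test, return (passed, feedback)."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # no sandbox or timeout: illustration only
    except Exception as exc:
        return False, f"program failed to load: {exc!r}"
    failures = []
    for test in tests:
        try:
            exec(test, namespace)
        except Exception as exc:
            failures.append(f"{test} -> {exc!r}")
    return (not failures), "\n".join(failures)


def reflexion_loop(
    generate: Callable[[List[str]], str],
    reflect: Callable[[str, str], str],
    tests: List[str],
    max_steps: int = 3,
) -> Tuple[str, bool]:
    """Generate, test, and reflect until the tests pass or steps run out."""
    reflections: List[str] = []
    program = ""
    for _ in range(max_steps):
        program = generate(reflections)
        passed, feedback = run_tests(program, tests)
        if passed:
            return program, True
        # Store a verbal reflection so the next attempt can condition on it
        reflections.append(reflect(program, feedback))
    return program, False


def fake_generate(reflections: List[str]) -> str:
    # Hypothetical stand-in for an LLM: wrong at first, fixed after one reflection
    if not reflections:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def fake_reflect(program: str, feedback: str) -> str:
    # Hypothetical stand-in for an LLM-written self-reflection
    return f"The previous attempt failed these tests:\n{feedback}"


program, solved = reflexion_loop(
    fake_generate, fake_reflect, ["assert add(1, 2) == 3"]
)
```

In this sketch the first attempt fails its test, the failure message becomes a reflection, and the second attempt passes, so the loop stops before using all three steps.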
Follow these steps to get reflexion-framework up and running:
- Clone the repository with submodules:
git clone --recurse-submodules https://github.com/becklabs/reflexion-framework.git && cd reflexion-framework
- Install the package:
pip3 install -e .
- If using the Rust programming environment, install Cargo
- If using OpenAI LLMs, set the OPENAI_API_KEY environment variable to your API key:
export OPENAI_API_KEY=yourkey
- If using transformers LLMs, install the necessary libraries:
pip3 install transformers torch
- If evaluating with the LeetCodeHard benchmark, build the dataset according to the instructions in the README.
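The optional prerequisites in the setup steps above can be checked before running anything with a short preflight script. This is a sketch that uses only the Python standard library. The `preflight` helper and its result keys are hypothetical names and are not part of reflexion-framework.

```python
import importlib.util
import os
import shutil


def preflight(environ=None):
    """Report which optional prerequisites from the setup steps are available."""
    environ = os.environ if environ is None else environ
    return {
        # OpenAI LLMs need a non-empty API key in the environment
        "openai_api_key": bool(environ.get("OPENAI_API_KEY")),
        # The Rust programming environment needs Cargo on the PATH
        "cargo": shutil.which("cargo") is not None,
        # transformers LLMs need both libraries to be importable
        "transformers": importlib.util.find_spec("transformers") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Only the entries relevant to your setup need to report `ok`; for example, a missing `cargo` does not matter if you are not using the Rust programming environment.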
- Wider language support for programming tasks
- HotpotQA and Alfworld agent and environment implementations