alckasoc / discussion-agents

🔔🧠 The encyclopedia of LLM-based agents

Home Page: https://agential.readthedocs.io

License: MIT License

Languages: Makefile 0.39%, Python 95.71%, Jupyter Notebook 3.90%
Topics: llm-based-agent, llms, agential

discussion-agents's Introduction

Agential


Features

Our primary goal is to provide easy-to-use and clean implementations of popular LLM-based agent methods: an encyclopedia! This library is one of our contributions to our research project, which empirically surveys and investigates the performance of these methods across a diverse set of reasoning/decision-making tasks. Learn more about this here!

  • Easy-to-Use Interface: Provides intuitive and user-friendly functions for rapid prototyping and development.

  • Clean Functions: Offers clean and well-structured functions, promoting readability and maintainability of code.

  • Modularized Implementations: Includes modularized implementations of popular LLM-based agents and agent-related methods, allowing users to leverage cutting-edge innovations from the literature.

Getting Started

First, install the library with pip:

pip install agential

Next, let's query the ReActAgent!

# Import paths are illustrative and may differ slightly across versions of
# LangChain and agential.
from langchain.chat_models import ChatOpenAI
from agential.cog.agent.react import ReActAgent

question = 'Who was once considered the best kick boxer in the world, however he has been involved in a number of controversies relating to his "unsportsmanlike conducts" in the sport and crimes of violence outside of the ring?'

llm = ChatOpenAI(openai_api_key="YOUR_API_KEY")
agent = ReActAgent(llm=llm)
out = agent.generate(question=question)
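If you prefer not to hard-code the key, ChatOpenAI can also read it from the OPENAI_API_KEY environment variable (standard LangChain behavior):

import os

# Set the key via the environment instead of passing openai_api_key directly.
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
llm = ChatOpenAI()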

Project Organization


├── data
│   ├── external                   <- Data from third party sources.
│   ├── interim                    <- Intermediate data that has been transformed.
│   ├── processed                  <- The final, canonical data sets for modeling.
│   └── raw                        <- The original, immutable data dump.
│
├── agential                       <- Source code for this project.
│   ├── cog
│   │   ├── agent                  <- Model/agent-related modules.
│   │   │
│   │   ├── eval                   <- Agent core modules.
│   │   │
│   │   ├── functional
│   │   │
│   │   ├── modules
│   │   │   ├── memory             <- Memory-related modules.
│   │   │   ├── plan               <- Planning-related modules.
│   │   │   ├── reflect            <- Reflecting-related modules.
│   │   │   └── score              <- Scoring-related modules.
│   │   │
│   │   ├── persona
│   │   │
│   │   └── prompts
│   │
│   └── utils                      <- Utility methods.
│
├── docs                           <- An mkdocs project.
│
├── models                         <- Trained and serialized models, model predictions,
│                                     or model summaries.
│
├── notebooks                      <- Jupyter notebooks. Naming convention is a number
│                                     (for ordering), the creator's initials, and a short
│                                     `-` delimited description, e.g. `1.0-jqp-initial-data-exploration`.
│
├── references                     <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports                        <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures                    <- Generated graphics and figures to be used in reporting.
│
└── tests                          <- Tests.

Contributing

If you want to contribute, please check the contributing.md for guidelines!

discussion-agents's People

Contributors

alckasoc, igiotto12, dependabot[bot], tsomoriri, gaodalie, berniewu2, tedasdf, joepehlke, abbasquettawala, jsgage7


discussion-agents's Issues

[Feature Request]: Generalize ReAct and Reflexion

Feature Description

Let's start with ReAct. In the paper, ReAct is tested on multiple benchmarks, but our current implementation only includes prompts for HotpotQA (one of the benchmarks in the paper). By generalizing ReAct, you will make it such that:

  • ReAct prompts can be swapped out for prompts for other benchmarks besides HotpotQA
  • users have the ability to add their own prompts (a rough sketch of what this could look like follows this list)
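As a rough illustration (the variable and parameter names below are hypothetical, not the current agential API), the few-shot examples and instruction template would be looked up per benchmark and passed in rather than hard-coded:

# Hypothetical sketch: prompts keyed by benchmark, passed to the agent at call time.
REACT_PROMPTS = {
    "hotpotqa": {"examples": "...", "prompt": "Solve a question answering task ..."},
    "fever": {"examples": "...", "prompt": "Determine if the claim is SUPPORTED or REFUTED ..."},
}

def get_react_prompts(benchmark: str) -> dict:
    """Return the few-shot examples and instruction template for a benchmark."""
    return REACT_PROMPTS[benchmark]

# A user-supplied prompt would simply be another entry:
# REACT_PROMPTS["my_task"] = {"examples": my_examples, "prompt": my_instruction}
# out = agent.generate(question=question, **get_react_prompts("fever"))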

Reason

No response

[Feature Request]: MBPP for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and MBPP.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: MBPP for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and MBPP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add MBPP prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to modify cog/prompts but also test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for MBPP
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC was not tested on MBPP. To test CRITIC on MBPP, reference how other method papers test on MBPP. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on MBPP, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: AgentBench for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and AgentBench.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the relevant prompts and logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: SVAMP for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and SVAMP.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: AgentBench for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and AgentBench.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: Refactoring & Adding a Planning Feature

Feature Description

  • Refactor the current architecture to provide functionality for both low/medium level cognitive modules and high level agent classes.
  • Add a planning feature to the LangChain Generative Agents implementation.
    • The agent describes a daily plan. These plans are added to the agent's memory bank like any other memory and are iteratively refined throughout the day (a rough sketch follows the reference link below).

Reference link: https://github.com/joonspk-research/generative_agents/blob/main/reverie/backend_server/persona/cognitive_modules/plan.py#L461
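A rough sketch of the idea (the function and method names here are hypothetical, not the Generative Agents or agential API):

from typing import Callable, List

def generate_daily_plan(llm: Callable[[str], str], persona_summary: str) -> List[str]:
    """Ask the LLM for a broad-strokes plan for the day, one step per line."""
    prompt = f"{persona_summary}\nWrite a rough plan for today, one step per line."
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def refine_plan(llm: Callable[[str], str], plan: List[str], current_time: str) -> List[str]:
    """Iteratively revise the remaining steps as the day progresses."""
    prompt = (
        f"It is {current_time}. The current plan is:\n" + "\n".join(plan)
        + "\nRevise the remaining steps, one per line."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

# Each plan step would then be written to the agent's memory bank like any
# other memory, e.g. something along the lines of memory.add_memory(step).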

Reason

Planning gives the agent a set of directions and something to work with. It also keeps the agent's actions consistent with its personality traits, among other things. I am implementing this because it is missing from the current Generative Agents implementation.

More details can be found in PR #5.

[Feature Request]: Specify Max Steps in ReAct and Reflexion

Feature Description

In the prompts used by ReflexionCoT, ReflexionReAct, and ReAct, we should specify the maximum number of steps. Within the prompt there should be a line that says: "You have a maximum of {n} steps."
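For illustration (the actual agential prompt templates may differ), the step budget would be formatted into the instruction alongside the question:

# Illustrative prompt template with an explicit step budget.
REACT_INSTRUCTION = (
    "Solve a question answering task with interleaving Thought, Action, and "
    "Observation steps. You have a maximum of {max_steps} steps.\n"
    "Question: {question}{scratchpad}"
)

prompt = REACT_INSTRUCTION.format(max_steps=6, question="...", scratchpad="")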

Reason

The agent is unaware that the current setup has to finish in n steps. This specification in the prompt will help it at least come up with an answer by the end of the task generation/trajectory.

[Feature Request]: Base Agent Class

Feature Description

  • write a base Agent class (not sure how to go about this; a rough sketch follows this list)
    • the agent class should offer both high- and mid-level control over interaction methods
    • need some way to generalize TimeWeightedVectorStoreRetriever (so GenerativeAgentMemory doesn't necessarily require it)
    • need some way to generalize GenerativeAgentMemory (so GenerativeAgent doesn't necessarily require it)
    • create test cases for instantiating an agent
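A minimal sketch of what such a base class could look like (hypothetical, not the library's actual interface), with memory left generic so GenerativeAgentMemory and TimeWeightedVectorStoreRetriever are no longer hard requirements:

from abc import ABC, abstractmethod
from typing import Any, Optional

class BaseAgent(ABC):
    """Abstract interface that concrete agents would implement."""

    def __init__(self, llm: Any, memory: Optional[Any] = None) -> None:
        self.llm = llm
        self.memory = memory  # any memory/retriever implementation, or None

    @abstractmethod
    def generate(self, *args: Any, **kwargs: Any) -> Any:
        """Run the agent end-to-end and return its output (high-level control)."""

    def reset(self) -> None:
        """Clear per-episode state; no-op by default (mid-level control hook)."""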

Reason

No response

[Feature Request]: AmbigNQ for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and AmbigNQ.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: AmbigNQ for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and AmbigNQ.

There are 3 question answering (QA) tasks tested in the paper: HotpotQA, TriviaQA, and AmbigNQ.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the AmbigNQ prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for AmbigNQ
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on AmbigNQ (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: HumanEval for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and HumanEval.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Docs]: Create an Interactive Table for all Agent Implementations

Issue with current documentation:

No response

Idea or request for content:

Maybe some interactive table? Like the table below, but interactive, where each row can be expanded like an accordion bar. This would be a great addition for the documentation and also for usage.

[image: example table of agent implementations]

[Feature Request]: Eval Framework

Feature Description

  • a standard way to evaluate that these agents work as expected (performance)
  • a standard way to benchmark speed (a rough sketch of these first two follows this list)
  • other evaluation methods for measuring usability, ease of use, etc.
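A rough sketch of the first two points (all names below are hypothetical, not an existing agential API): run an agent over a benchmark split, score exact-match accuracy, and record wall-clock time per example.

import time
from typing import Callable, Dict, List, Tuple

def evaluate_agent(
    generate: Callable[[str], str],   # e.g. lambda q: agent.generate(question=q)
    examples: List[Tuple[str, str]],  # (question, gold_answer) pairs
) -> Dict[str, float]:
    """Return exact-match accuracy and average latency for an agent."""
    correct = 0
    start = time.perf_counter()
    for question, gold in examples:
        prediction = generate(question)
        correct += int(str(prediction).strip().lower() == gold.strip().lower())
    elapsed = time.perf_counter() - start
    n = max(len(examples), 1)
    return {"em_accuracy": correct / n, "seconds_per_example": elapsed / n}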

Reason

No response

[Feature Request]: TabMWP for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and TabMWP.
There are 3 math reasoning tasks tested in the paper: GSM8k, SVAMP, TabMWP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the TabMWP prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for TabMWP
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on TabMWP (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: ALFWorld for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and ALFWorld.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: GSM8k for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and GSM8k.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: SVAMP for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and SVAMP.

There are 3 math reasoning tasks tested in the paper: GSM8k, SVAMP, TabMWP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the SVAMP prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for SVAMP
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on SVAMP (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: Improving Unit Tests

Feature Description

  • Improve unit tests
    • fix type: ignore problems with mypy
    • can they be made faster?
    • more controlled?
    • entirely free of API cost?
    • mock the model part entirely? (a sketch follows this list)
    • configure code coverage arguments
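One way to make the tests free and deterministic (a sketch, assuming a LangChain chat model is being mocked; the import path varies across LangChain versions): replace the real model with FakeListChatModel, which replays canned responses instead of calling an API.

from langchain_community.chat_models.fake import FakeListChatModel

def test_fake_llm_is_deterministic_and_free():
    fake_llm = FakeListChatModel(responses=["Thought: I know this.\nAction: Finish[42]"])
    result = fake_llm.invoke("What is 6 x 7?")  # no API call, no cost
    assert "Finish[42]" in result.content
    # The same fake model could then be passed to an agent under test, e.g.
    # agent = ReActAgent(llm=fake_llm); out = agent.generate(question="...")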

Reason

No response

[Feature Request]: HumanEval for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and HumanEval.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add HumanEval prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to modify cog/prompts but also test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for HumanEval
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC was not tested on HumanEval. To test CRITIC on HumanEval, reference how other method papers test on HumanEval. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on HumanEval, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: TriviaQA for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and TriviaQA.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: GSM8k for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and GSM8k.

There are 3 math reasoning tasks tested in the paper: GSM8k, SVAMP, TabMWP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the GSM8k prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for GSM8k
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on GSM8k (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

Deadline: 5/2/2024

[Docs]: Write a CONTRIBUTING.md/Onboarding Guide

Issue with current documentation:

This document can include (but is not limited to):

  • what resources you found most useful in getting up to speed
  • details about environment setup, etc.
  • how to navigate around the repo and what issues to take a look at
  • what's the best way to get started?
  • code of conduct and values

Idea or request for content:

No response

[Feature Request]: WebShop for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and WebShop.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

Reason

No response

[Feature Request]: ALFWorld for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and ALFWorld.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add ALFWorld prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to modify cog/prompts but also test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for ALFWorld
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC was not tested on ALFWorld. To test CRITIC on ALFWorld, reference how other method papers test on ALFWorld. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on ALFWorld, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: TabMWP for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and TabMWP.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Docs]: README Update (repo structure, features)

Issue with current documentation:

It just has a setup guide.

Idea or request for content:

Below is an example README. Cross-reference it with popular library READMEs. It doesn't need to follow the below structure exactly.

<banner image>
# Title

Short description of the project.

<badges>

## Table of Contents

## Features

## What does this library offer?

## What doesn't this library offer?

...

Some reference READMEs:

[Feature Request]: More Unit tests for Reflexion

Feature Description

Unit test Reflexion's generate() method to make sure the multi-retry logic (the outer while loop) works properly and that reflections carry across multiple retries of the same task (intra-task learning).

Some cases (not-comprehensive):

  • default settings (already tested)
  • self.max_tries > 1
  • self.max_tries < 1
  • 0 < self.patience < self.max_tries
  • self.patience = 0
  • also take a look at the current test cases for ReflexionCoT and ReflexionReAct and see if there's anything to test
    • max_tries
    • max_reflections
    • patience

Also, because max_reflections is used within ReflexionCoTReflector and ReflexionReActReflector, those classes should also have a couple more unit tests.
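A sketch of how these cases could be organized (the constructor arguments and attribute names in the comments are hypothetical; check the actual ReflexionCoT/ReflexionReAct signatures before wiring this up):

import pytest

@pytest.mark.parametrize(
    "max_tries,patience",
    [(1, 1), (3, 1), (3, 2), (3, 3), (3, 0)],  # covers patience below, equal to, and at zero
)
def test_reflexion_retry_settings(max_tries, patience):
    # Hypothetical wiring with a mocked LLM (see the unit-testing issue above):
    # agent = ReflexionCoTAgent(llm=fake_llm, max_trials=max_tries, patience=patience)
    # out = agent.generate(question="...", key="wrong answer to force retries")
    # assert len(out) <= max_tries          # never exceeds the retry budget
    # assert agent.reflector.reflections    # reflections persist across retries
    assert patience <= max_tries            # placeholder check for the sketch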

Reason

Better unit test coverage, and useful for understanding the Reflexion implementation with CoT or ReAct.

[Feature Request]: Write a basic "Getting Started" Guide in the README

Feature Description

This guide should:

  • be concise but clearly showcase this library's uses
  • demonstrate the overall library structure
  • explain what the library provides and what it doesn't
  • show a simple use case defining a module (any module or all; though keep it minimal, this is a getting started guide)
  • show a simple use case defining an agent (ReAct is a good start)
  • show how to run an agent on sample input

Reason

No response

[Feature Request]: FEVER for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and FEVER.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add relevant prompts and logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: Rethinking the Modules Design

Feature Description

  • factor out thinking and acting at the module level
  • add an extra pre/post-action layer under modules that encapsulates what happens before and after an action is taken (a rough sketch follows this list)
  • this would require a change in the base agent class (because it has plan and reflect), but that will be taken care of later (we will first implement the first two changes above to see how things work)
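A hypothetical sketch of the pre/post-action layer (not the current module API): each module wraps its action with hooks that run before and after it.

from abc import ABC, abstractmethod
from typing import Any

class BaseActionModule(ABC):
    """Module that factors an action out from what happens around it."""

    def pre_action(self, state: Any) -> Any:
        """Runs before the action (e.g. planning or retrieval); no-op by default."""
        return state

    @abstractmethod
    def act(self, state: Any) -> Any:
        """The thinking or acting step itself."""

    def post_action(self, state: Any, result: Any) -> Any:
        """Runs after the action (e.g. reflection or scoring); no-op by default."""
        return result

    def step(self, state: Any) -> Any:
        state = self.pre_action(state)
        result = self.act(state)
        return self.post_action(state, result)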

Reason

No response

[Feature Request]: CRITIC

Implement:

The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work. Swapping out the prompts won't suffice.
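Swapping prompts is insufficient because these benchmarks are interactive: the agent must repeatedly act in an environment and react to its feedback. A generic sketch of the required loop (the env interface below is hypothetical, not any specific benchmark's API):

from typing import Any, Callable

def run_decision_making_episode(
    agent_step: Callable[[str], str],  # maps an observation to an action string
    env: Any,                          # hypothetical env with reset()/step()
    max_steps: int = 30,
) -> float:
    """Run one observation -> action -> feedback episode and return the final reward."""
    observation = env.reset()
    reward, done = 0.0, False
    for _ in range(max_steps):
        action = agent_step(observation)
        observation, reward, done = env.step(action)
        if done:
            break
    return reward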

Run:

  • HotpotQA
  • TriviaQA
  • AmbigNQ
  • GSM8k
  • SVAMP
  • TabMWP
  • MBPP
  • HumanEval
  • ALFWorld
  • WebShop
  • AgentBench (includes ALFWorld & WebShop)

[Feature Request]: ReAct

Feature Description

Implement:

The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work. Swapping out the prompts won't suffice.

Run:

  • HotpotQA
  • TriviaQA
  • AmbigNQ
  • GSM8k
  • SVAMP
  • TabMWP
  • MBPP
  • HumanEval
  • ALFWorld
  • WebShop
  • AgentBench (includes ALFWorld & WebShop)

[Feature Request]: WebShop for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and WebShop.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the relevant prompts and logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!
