alckasoc / discussion-agents

🔔🧠 The encyclopedia of LLM-based agents

Home Page: https://agential.readthedocs.io

License: MIT License

Languages: Makefile 0.39%, Python 95.71%, Jupyter Notebook 3.90%
Topics: llm-based-agent, llms, agential

discussion-agents's Introduction

Agential


Features

Our primary goal is to provide easy-to-use and clean implementations of popular LLM-based agent methods: an encyclopedia! This library is one of our contributions to our research project, which empirically surveys and investigates the performance of these methods across a diverse set of reasoning/decision-making tasks. Learn more about this here!

  • Easy-to-Use Interface: Provides intuitive and user-friendly functions for rapid prototyping and development.

  • Clean Functions: Offers clean and well-structured functions, promoting readability and maintainability of code.

  • Modularized Implementations: Includes modularized implementations of popular LLM-based agents and agent-related methods, allowing users to leverage cutting-edge innovations from the literature.

Getting Started

First, install the library with pip:

pip install agential

Next, let's query the ReActAgent!

# Import paths are illustrative and may differ slightly across versions of
# LangChain and agential.
from langchain.chat_models import ChatOpenAI
from agential.cog.agent.react import ReActAgent

question = 'Who was once considered the best kick boxer in the world, however he has been involved in a number of controversies relating to his "unsportsmanlike conducts" in the sport and crimes of violence outside of the ring?'

llm = ChatOpenAI(openai_api_key="YOUR_API_KEY")
agent = ReActAgent(llm=llm)
out = agent.generate(question=question)
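If you prefer not to hard-code the key, ChatOpenAI can also read it from the OPENAI_API_KEY environment variable (standard LangChain behavior):

import os

# Set the key via the environment instead of passing openai_api_key directly.
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
llm = ChatOpenAI()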

Project Organization


├── data
│   ├── external                   <- Data from third party sources.
│   ├── interim                    <- Intermediate data that has been transformed.
│   ├── processed                  <- The final, canonical data sets for modeling.
│   └── raw                        <- The original, immutable data dump.
│
├── agential                       <- Source code for this project.
│   ├── cog
│   │   ├── agent                  <- Model/agent-related modules.
│   │   │
│   │   ├── eval                   <- Agent core modules.
│   │   │
│   │   ├── functional
│   │   │
│   │   ├── modules
│   │   │   ├── memory             <- Memory-related modules.
│   │   │   ├── plan               <- Planning-related modules.
│   │   │   ├── reflect            <- Reflecting-related modules.
│   │   │   └── score              <- Scoring-related modules.
│   │   │
│   │   ├── persona
│   │   │
│   │   └── prompts
│   │
│   └── utils                      <- Utility methods.
│
├── docs                           <- An mkdocs project.
│
├── models                         <- Trained and serialized models, model predictions,
│                                     or model summaries.
│
├── notebooks                      <- Jupyter notebooks. Naming convention is a number
│                                     (for ordering), the creator's initials, and a short
│                                     `-` delimited description, e.g. `1.0-jqp-initial-data-exploration`.
│
├── references                     <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports                        <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures                    <- Generated graphics and figures to be used in reporting.
│
└── tests                          <- Tests.

Contributing

If you want to contribute, please check the contributing.md for guidelines!

discussion-agents's People

Contributors

alckasoc, igiotto12, dependabot[bot], tsomoriri, gaodalie, berniewu2, tedasdf, joepehlke, abbasquettawala, jsgage7


discussion-agents's Issues

[Feature Request]: Generalize ReAct and Reflexion

Feature Description

Let's start with ReAct. In the paper, ReAct is tested on multiple benchmarks, but our current implementation only includes prompts for HotpotQA (one of the benchmarks in the paper). By generalizing ReAct, you will make it such that:

  • ReAct prompts can be swapped out for prompts for other benchmarks besides HotpotQA
  • users have the ability to add their own prompts (a rough sketch of what this could look like follows this list)
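As a rough illustration (the variable and parameter names below are hypothetical, not the current agential API), the few-shot examples and instruction template would be looked up per benchmark and passed in rather than hard-coded:

# Hypothetical sketch: prompts keyed by benchmark, passed to the agent at call time.
REACT_PROMPTS = {
    "hotpotqa": {"examples": "...", "prompt": "Solve a question answering task ..."},
    "fever": {"examples": "...", "prompt": "Determine if the claim is SUPPORTED or REFUTED ..."},
}

def get_react_prompts(benchmark: str) -> dict:
    """Return the few-shot examples and instruction template for a benchmark."""
    return REACT_PROMPTS[benchmark]

# A user-supplied prompt would simply be another entry:
# REACT_PROMPTS["my_task"] = {"examples": my_examples, "prompt": my_instruction}
# out = agent.generate(question=question, **get_react_prompts("fever"))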

Reason

No response

[Feature Request]: MBPP for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and MBPP.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: MBPP for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and MBPP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add MBPP prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to modify cog/prompts but also test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for MBPP
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC was not tested on MBPP. To test CRITIC on MBPP, reference how other method papers test on MBPP. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on MBPP, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: AgentBench for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and AgentBench.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the relevant prompts and logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: SVAMP for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and SVAMP.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: AgentBench for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and AgentBench.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: Refactoring & Adding a Planning Feature

Feature Description

  • Refactor the current architecture to provide functionality for both low/medium level cognitive modules and high level agent classes.
  • Add a planning feature to the LangChain Generative Agents implementation.
    • The agent describes a daily plan. These plans are added to the agent's memory bank like any other memory and are iteratively refined throughout the day (a rough sketch follows the reference link below).

Reference link: https://github.com/joonspk-research/generative_agents/blob/main/reverie/backend_server/persona/cognitive_modules/plan.py#L461
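A rough sketch of the idea (the function and method names here are hypothetical, not the Generative Agents or agential API):

from typing import Callable, List

def generate_daily_plan(llm: Callable[[str], str], persona_summary: str) -> List[str]:
    """Ask the LLM for a broad-strokes plan for the day, one step per line."""
    prompt = f"{persona_summary}\nWrite a rough plan for today, one step per line."
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def refine_plan(llm: Callable[[str], str], plan: List[str], current_time: str) -> List[str]:
    """Iteratively revise the remaining steps as the day progresses."""
    prompt = (
        f"It is {current_time}. The current plan is:\n" + "\n".join(plan)
        + "\nRevise the remaining steps, one per line."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

# Each plan step would then be written to the agent's memory bank like any
# other memory, e.g. something along the lines of memory.add_memory(step).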

Reason

Planning gives the agent a set of directions and something to work with. It also keeps the agent's actions consistent with its personality traits, among other things. I am implementing this because it is missing from the current Generative Agents implementation.

More details can be found in PR #5.

[Feature Request]: Specify Max Steps in ReAct and Reflexion

Feature Description

In the prompts used by ReflexionCoT, ReflexionReAct, and ReAct, we should specify the maximum number of steps. Within the prompt there should be a line that says: "You have a maximum of {n} steps."
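For illustration (the actual agential prompt templates may differ), the step budget would be formatted into the instruction alongside the question:

# Illustrative prompt template with an explicit step budget.
REACT_INSTRUCTION = (
    "Solve a question answering task with interleaving Thought, Action, and "
    "Observation steps. You have a maximum of {max_steps} steps.\n"
    "Question: {question}{scratchpad}"
)

prompt = REACT_INSTRUCTION.format(max_steps=6, question="...", scratchpad="")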

Reason

The agent is unaware that the current setup has to finish in n steps. This specification in the prompt will help it at least come up with an answer by the end of the task generation/trajectory.

[Feature Request]: Base Agent Class

Feature Description

  • write a base Agent class (not sure how to go about this; a rough sketch follows this list)
    • the agent class should offer both high- and mid-level control over interaction methods
    • need some way to generalize TimeWeightedVectorStoreRetriever (so GenerativeAgentMemory doesn't necessarily require it)
    • need some way to generalize GenerativeAgentMemory (so GenerativeAgent doesn't necessarily require it)
    • create test cases for instantiating an agent
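A minimal sketch of what such a base class could look like (hypothetical, not the library's actual interface), with memory left generic so GenerativeAgentMemory and TimeWeightedVectorStoreRetriever are no longer hard requirements:

from abc import ABC, abstractmethod
from typing import Any, Optional

class BaseAgent(ABC):
    """Abstract interface that concrete agents would implement."""

    def __init__(self, llm: Any, memory: Optional[Any] = None) -> None:
        self.llm = llm
        self.memory = memory  # any memory/retriever implementation, or None

    @abstractmethod
    def generate(self, *args: Any, **kwargs: Any) -> Any:
        """Run the agent end-to-end and return its output (high-level control)."""

    def reset(self) -> None:
        """Clear per-episode state; no-op by default (mid-level control hook)."""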

Reason

No response

[Feature Request]: AmbigNQ for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and AmbigNQ.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: AmbigNQ for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and AmbigNQ.

There are 3 question answering (QA) tasks tested in the paper: HotpotQA, TriviaQA, and AmbigNQ.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the AmbigNQ prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for AmbigNQ
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on AmbigNQ (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: HumanEval for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and HumanEval.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Docs]: Create an Interactive Table for all Agent Implementations

Issue with current documentation:

No response

Idea or request for content:

Maybe some interactive table? Like the table below, but interactive, where each row can be expanded like an accordion bar. This would be a great addition for the documentation and also for usage.

[image: example table of agent implementations]

[Feature Request]: Eval Framework

Feature Description

  • a standard way to evaluate that these agents work as expected (performance)
  • a standard way to benchmark speed (a rough sketch of these first two follows this list)
  • other evaluation methods for measuring usability, ease of use, etc.
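A rough sketch of the first two points (all names below are hypothetical, not an existing agential API): run an agent over a benchmark split, score exact-match accuracy, and record wall-clock time per example.

import time
from typing import Callable, Dict, List, Tuple

def evaluate_agent(
    generate: Callable[[str], str],   # e.g. lambda q: agent.generate(question=q)
    examples: List[Tuple[str, str]],  # (question, gold_answer) pairs
) -> Dict[str, float]:
    """Return exact-match accuracy and average latency for an agent."""
    correct = 0
    start = time.perf_counter()
    for question, gold in examples:
        prediction = generate(question)
        correct += int(str(prediction).strip().lower() == gold.strip().lower())
    elapsed = time.perf_counter() - start
    n = max(len(examples), 1)
    return {"em_accuracy": correct / n, "seconds_per_example": elapsed / n}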

Reason

No response

[Feature Request]: TabMWP for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and TabMWP.
There are 3 math reasoning tasks tested in the paper: GSM8k, SVAMP, TabMWP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the TabMWP prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for TabMWP
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on TabMWP (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: ALFWorld for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and ALFWorld.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: GSM8k for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and GSM8k.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: SVAMP for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and SVAMP.

There are 3 math reasoning tasks tested in the paper: GSM8k, SVAMP, TabMWP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the SVAMP prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for SVAMP
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on SVAMP (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: Improving Unit Tests

Feature Description

  • Improve unit tests
    • fix type: ignore problems with mypy
    • can they be made faster?
    • more controlled?
    • entirely free of API cost?
    • mock the model part entirely? (a sketch follows this list)
    • configure code coverage arguments
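One way to make the tests free and deterministic (a sketch, assuming a LangChain chat model is being mocked; the import path varies across LangChain versions): replace the real model with FakeListChatModel, which replays canned responses instead of calling an API.

from langchain_community.chat_models.fake import FakeListChatModel

def test_fake_llm_is_deterministic_and_free():
    fake_llm = FakeListChatModel(responses=["Thought: I know this.\nAction: Finish[42]"])
    result = fake_llm.invoke("What is 6 x 7?")  # no API call, no cost
    assert "Finish[42]" in result.content
    # The same fake model could then be passed to an agent under test, e.g.
    # agent = ReActAgent(llm=fake_llm); out = agent.generate(question="...")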

Reason

No response

[Feature Request]: HumanEval for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and HumanEval.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add HumanEval prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to modify cog/prompts but also test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for HumanEval
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC was not tested on HumanEval. To test CRITIC on HumanEval, reference how other method papers test on HumanEval. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on HumanEval, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: TriviaQA for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and TriviaQA.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: GSM8k for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and GSM8k.

There are 3 math reasoning tasks tested in the paper: GSM8k, SVAMP, TabMWP.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the GSM8k prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for GSM8k
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Request a review from @alckasoc

If there is any additional logic for how CRITIC is tested on GSM8k (refer to the paper's repo), include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

Deadline: 5/2/2024

[Docs]: Write a CONTRIBUTING.md/Onboarding Guide

Issue with current documentation:

This document can include (but is not limited to):

  • what resources you found most useful in getting up to speed
  • details about environment setup, etc.
  • how to navigate around the repo and what issues to take a look at
  • what's the best way to get started?
  • code of conduct and values

Idea or request for content:

No response

[Feature Request]: WebShop for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and WebShop.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

Reason

No response

[Feature Request]: ALFWorld for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and ALFWorld.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add ALFWorld prompts and relevant logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to modify cog/prompts but also test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for ALFWorld
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC was not tested on ALFWorld. To test CRITIC on ALFWorld, reference how other method papers test on ALFWorld. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on ALFWorld, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: TabMWP for ReAct

Feature Description

Familiarize yourself with the repository and take a look at the ReAct repo, paper, and TabMWP.

Currently, the ReAct implementation only has prompts for HotpotQA and FEVER.

Add relevant prompts and logic to the current ReAct implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

ReAct may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing ReAct on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Docs]: README Update (repo structure, features)

Issue with current documentation:

It just has a setup guide.

Idea or request for content:

Below is an example README. Cross-reference it with popular library READMEs. It doesn't need to follow the below structure exactly.

<banner image>
# Title

Short description of the project.

<badges>

## Table of Contents

## Features

## What does this library offer?

## What doesn't this library offer?

...

Some reference READMEs:

[Feature Request]: More Unit tests for Reflexion

Feature Description

Unit test Reflexion's generate() method to make sure the multi-retry logic (the outer while loop) works properly and that reflections carry across multiple retries of the same task (intra-task learning).

Some cases (not-comprehensive):

  • default settings (already tested)
  • self.max_tries > 1
  • self.max_tries < 1
  • 0 < self.patience < self.max_tries
  • self.patience = 0
  • also take a look at the current test cases for ReflexionCoT and ReflexionReAct and see if there's anything to test
    • max_tries
    • max_reflections
    • patience

Also, because max_reflections is used within ReflexionCoTReflector and ReflexionReActReflector, those classes should also have a couple more unit tests.
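A sketch of how these cases could be organized (the constructor arguments and attribute names in the comments are hypothetical; check the actual ReflexionCoT/ReflexionReAct signatures before wiring this up):

import pytest

@pytest.mark.parametrize(
    "max_tries,patience",
    [(1, 1), (3, 1), (3, 2), (3, 3), (3, 0)],  # covers patience below, equal to, and at zero
)
def test_reflexion_retry_settings(max_tries, patience):
    # Hypothetical wiring with a mocked LLM (see the unit-testing issue above):
    # agent = ReflexionCoTAgent(llm=fake_llm, max_trials=max_tries, patience=patience)
    # out = agent.generate(question="...", key="wrong answer to force retries")
    # assert len(out) <= max_tries          # never exceeds the retry budget
    # assert agent.reflector.reflections    # reflections persist across retries
    assert patience <= max_tries            # placeholder check for the sketch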

Reason

Better unit test coverage, and useful for understanding the Reflexion implementation with CoT or ReAct.

[Feature Request]: Write a basic "Getting Started" Guide in the README

Feature Description

This guide should:

  • be concise but clearly showcase this library's uses
  • demonstrate the overall library structure
  • explain what the library provides and what it doesn't
  • show a simple use case defining a module (any module or all; though keep it minimal, this is a getting started guide)
  • show a simple use case defining an agent (ReAct is a good start)
  • show how to run an agent on sample input

Reason

No response

[Feature Request]: FEVER for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and FEVER.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add relevant prompts and logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!

[Feature Request]: Rethinking the Modules Design

Feature Description

  • factor out thinking and acting at the module level
  • add an extra pre/post-action layer under modules that encapsulates what happens before and after an action is taken (a rough sketch follows this list)
  • this would require a change in the base agent class (because it has plan and reflect), but that will be taken care of later (we will first implement the first two changes above to see how things work)
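A hypothetical sketch of the pre/post-action layer (not the current module API): each module wraps its action with hooks that run before and after it.

from abc import ABC, abstractmethod
from typing import Any

class BaseActionModule(ABC):
    """Module that factors an action out from what happens around it."""

    def pre_action(self, state: Any) -> Any:
        """Runs before the action (e.g. planning or retrieval); no-op by default."""
        return state

    @abstractmethod
    def act(self, state: Any) -> Any:
        """The thinking or acting step itself."""

    def post_action(self, state: Any, result: Any) -> Any:
        """Runs after the action (e.g. reflection or scoring); no-op by default."""
        return result

    def step(self, state: Any) -> Any:
        state = self.pre_action(state)
        result = self.act(state)
        return self.post_action(state, result)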

Reason

No response

[Feature Request]: CRITIC

Implement:

The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work. Swapping out the prompts won't suffice.
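Swapping prompts is insufficient because these benchmarks are interactive: the agent must repeatedly act in an environment and react to its feedback. A generic sketch of the required loop (the env interface below is hypothetical, not any specific benchmark's API):

from typing import Any, Callable

def run_decision_making_episode(
    agent_step: Callable[[str], str],  # maps an observation to an action string
    env: Any,                          # hypothetical env with reset()/step()
    max_steps: int = 30,
) -> float:
    """Run one observation -> action -> feedback episode and return the final reward."""
    observation = env.reset()
    reward, done = 0.0, False
    for _ in range(max_steps):
        action = agent_step(observation)
        observation, reward, done = env.step(action)
        if done:
            break
    return reward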

Run:

  • HotpotQA
  • TriviaQA
  • AmbigNQ
  • GSM8k
  • SVAMP
  • TabMWP
  • MBPP
  • HumanEval
  • ALFWorld
  • WebShop
  • AgentBench (includes ALFWorld & WebShop)

[Feature Request]: ReAct

Feature Description

Implement:

The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work. Swapping out the prompts won't suffice.

Run:

  • HotpotQA
  • TriviaQA
  • AmbigNQ
  • GSM8k
  • SVAMP
  • TabMWP
  • MBPP
  • HumanEval
  • ALFWorld
  • WebShop
  • AgentBench (includes ALFWorld & WebShop)

[Feature Request]: WebShop for CRITIC

Feature Description

Familiarize yourself with the repository and take a look at the CRITIC repo, paper, and WebShop.

Currently, the CRITIC implementation only has prompts for HotpotQA and TriviaQA.

Add the relevant prompts and logic to the current CRITIC implementation. You'll see that an agent's current structure is divided into cog/agent, cog/modules, and cog/functional/. This task will require you to make modifications in cog/prompts, but will also require you to test your code in all the other relevant modules cog/functional and cog/agent. CRITIC does not have any cog/modules.

What to submit:

  • Set up your environment via the CONTRIBUTING.md
  • Make a Pull Request (PR)
  • Add the prompts for the specified benchmark
  • Write a short notebook tmp.ipynb in cog/agent showcasing the agent running on a sample question from the benchmark
    • Add print statements for all calls to the LLM, for easier debugging and so that I can easily verify the outputs
  • Include a thorough description of your changes within the PR
  • Request a review from @alckasoc

CRITIC may not have been tested on this benchmark. If this is true, refer to other methods that have been tested on this benchmark. Check the project lifecycle document.
If there is any additional logic for testing CRITIC on this benchmark, include these specifications in the PR description.

Feel free to ask me questions on Slack if you're confused! Good luck!
