humancompatibleai / overcooked_ai

A benchmark environment for fully cooperative human-AI performance.

Home Page: https://arxiv.org/abs/1910.05789

License: MIT License

Python 38.43% Jupyter Notebook 50.38% Shell 0.99% Dockerfile 0.05% JavaScript 4.08% CSS 0.20% HTML 0.97% PureBasic 4.90%
artificial-intelligence deep-learning machine-learning pytorch reinforcement-learning

overcooked_ai's Introduction


Overcooked-AI 🧑‍🍳🤖

5 of the available layouts. New layouts are easy to hardcode or generate programmatically.

Introduction 🥘

Overcooked-AI is a benchmark environment for fully cooperative human-AI task performance, based on the wildly popular video game Overcooked.

The goal of the game is to deliver soups as fast as possible. Each soup requires placing up to 3 ingredients in a pot, waiting for the soup to cook, and then having an agent pick up the soup and deliver it. The agents should split up tasks on the fly and coordinate effectively in order to achieve high reward.

You can try out the game here (playing with some previously trained DRL agents). To play with your own trained agents using this interface, or to collect more human-AI or human-human data, you can use the code here. You can find some human-human and human-AI gameplay data already collected here.

DRL implementations compatible with the environment are included in the repo as a submodule under src/human_aware_rl.

The old human_aware_rl is being deprecated and should only be used to reproduce the results in the 2019 paper: On the Utility of Learning about Humans for Human-AI Coordination (also see our blog post).

For simple usage of the environment, it's worth considering this environment wrapper.
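If you prefer to drive the core classes directly, the snippet below is a minimal sketch of a random rollout. The OvercookedGridworld.from_layout_name and OvercookedEnv.from_mdp calls mirror the usage shown in the issues further down this page; the actions import path, the Action.ALL_ACTIONS constant, and the 4-tuple returned by env.step are assumptions that may differ slightly between versions.

import random
from overcooked_ai_py.mdp.overcooked_mdp import OvercookedGridworld
from overcooked_ai_py.mdp.overcooked_env import OvercookedEnv
from overcooked_ai_py.mdp.actions import Action  # assumed location of the Action class

# Build the MDP for a stock layout and wrap it in an environment with a fixed horizon.
mdp = OvercookedGridworld.from_layout_name("cramped_room")
env = OvercookedEnv.from_mdp(mdp, horizon=400)

env.reset()
done = False
while not done:
    # A joint action is one action per player; here both players act uniformly at random.
    joint_action = (random.choice(Action.ALL_ACTIONS), random.choice(Action.ALL_ACTIONS))
    state, reward, done, info = env.step(joint_action)  # assumed 4-tuple return
print(env)  # OvercookedEnv renders as terminal graphics when printed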

Research Papers using Overcooked-AI 📑

Installation ☑️

Installing from PyPI 🗜

You can install the pre-compiled wheel file using pip.

pip install overcooked-ai

Note that PyPI releases are stable but infrequent. For the most up-to-date development features, build from source with pip install -e ..

Building from source 🔧

It is useful to set up a conda environment with Python 3.7 (virtualenv works too):

conda create -n overcooked_ai python=3.7
conda activate overcooked_ai

Clone the repo

git clone https://github.com/HumanCompatibleAI/overcooked_ai.git

Finally, use pip to install the package locally.

If you just want to use the environment:

pip install -e .

If you also need the DRL implementations (depending on your shell, you may need to quote the extra, i.e. pip install -e '.[harl]'):

pip install -e .[harl]

Verifying Installation 📈

When building from source, you can verify the installation by running the Overcooked unit test suite. The following commands should all be run from the overcooked_ai project root directory:

python testing/overcooked_test.py

To check whether human_aware_rl is installed correctly, you can run the following command from the src/human_aware_rl directory:

$ ./run_tests.sh

⚠️ Be sure to change your CWD to the human_aware_rl directory before running the script, as the test script uses the CWD to dynamically generate a path for saving temporary training runs/checkpoints. The script will fail if it is not run from the correct directory.

This will run all tests belonging to the human_aware_rl module. You can check out the README in the submodule for instructions on running target-specific tests, which can be initiated from any directory.

If you're thinking of using the planning code extensively, you should run the full testing suite that verifies all of the Overcooked accessory tools (this can take 5-10 mins):

python -m unittest discover -s testing/ -p "*_test.py"

Code Structure Overview 🗺

overcooked_ai_py contains:

mdp/:

  • overcooked_mdp.py: main Overcooked game logic
  • overcooked_env.py: environment classes built on top of the Overcooked mdp
  • layout_generator.py: functions to generate random layouts programmatically

agents/:

  • agent.py: location of agent classes
  • benchmarking.py: sample trajectories of agents (both trained and planners) and load various models

planning/:

  • planners.py: near-optimal agent planning logic
  • search.py: A* search and shortest path logic

human_aware_rl contains:

ppo/:

  • ppo_rllib.py: Primary module containing the code for training a PPO agent. This includes an rllib-compatible wrapper on OvercookedEnv, utilities for converting rllib Policy classes to Overcooked Agents, as well as utility functions and callbacks
  • ppo_rllib_client.py: Driver code for configuring and launching the training of an agent. More details about usage below
  • ppo_rllib_from_params_client.py: train one agent with PPO in Overcooked with variable MDPs
  • ppo_rllib_test.py: Reproducibility tests for local sanity checks
  • run_experiments.sh: Script for training agents on 5 classical layouts
  • trained_example/: Pretrained model for testing purposes

rllib/:

  • rllib.py: rllib agent and training utils that utilize Overcooked APIs
  • utils.py: utils for the above
  • tests.py: preliminary tests for the above

imitation/:

  • behavior_cloning_tf2.py: Module for training, saving, and loading a BC model
  • behavior_cloning_tf2_test.py: Contains basic reproducibility tests as well as unit tests for the various components of the bc module.

human/:

  • process_data.py: script to process human data in specific formats to be used by DRL algorithms
  • data_processing_utils.py: utils for the above

utils.py: utils for the repo

overcooked_demo contains:

server/:

  • app.py: The Flask app
  • game.py: The main logic of the game. State transitions are handled by the OvercookedGridworld object embedded in the game environment
  • move_agents.py: A script that simplifies copying checkpoints to the agents directory. Instructions on how to use it can be found inside the file or by running python move_agents.py -h

up.sh: Shell script to spin up the Docker server that hosts the game

Python Visualizations 🌠

See this Google Colab for some sample code for visualizing trajectories in python.

We have also incorporated a notebook that guides users through training, loading, and evaluating agents. Ideally, we would like users to be able to execute the notebook in Google Colab; however, because Colab's default kernel is Python 3.10 and this repository is optimized for Python 3.7, some functions are currently incompatible with Colab. To provide a seamless experience, we have pre-executed all the cells in the notebook, so you can see the expected output from running it locally after the appropriate setup.
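For a quick static render outside of the notebook, the repo also ships a pygame-based state visualizer. The sketch below assumes a StateVisualizer class living under overcooked_ai_py/visualization/ with a display_rendered_state method; treat the exact module path, method name, helper calls, and keyword arguments as assumptions and check the Colab/notebook for the authoritative usage.

from overcooked_ai_py.mdp.overcooked_mdp import OvercookedGridworld
# Assumed import path and class name; verify against the visualization code in the repo.
from overcooked_ai_py.visualization.state_visualizer import StateVisualizer

mdp = OvercookedGridworld.from_layout_name("cramped_room")
state = mdp.get_standard_start_state()  # assumed helper for the initial state

# Render the start state; the grid/img_path keyword names are assumptions.
StateVisualizer().display_rendered_state(state, grid=mdp.terrain_mtx, img_path="start_state.png")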

Overcooked_demo can also start an interactive game in the browser for visualizations. Details can be found in its README

Raw Data 📒

The raw data used in training is >100 MB, which makes it inconvenient to distribute via git. The code uses pickled dataframes for training and testing, but in case one needs the original data, it can be found here.
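Since the data ships as pickled pandas dataframes, inspecting a file is straightforward; the path below is a placeholder, as the actual filenames depend on which data release you download.

import pandas as pd

# Placeholder path: substitute the pickled dataframe you downloaded or found in the repo.
df = pd.read_pickle("path/to/human_data.pickle")
print(df.columns)
print(df.head())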

Further Issues and questions ❓

If you have issues or questions, you can contact Micah Carroll at [email protected].

overcooked_ai's People

Contributors

alexlichtenstein, andrefpoliveira, bmielnicki, btjanaka, cassidylaidlaw, davidmlin, decodyng, dependabot[bot], jyan1999, markkho, mesutyang97, micahcarroll, nathan-miller23, paulk444, rohinmshah, wduguay-air, xihuai18


overcooked_ai's Issues

Compliance with standard gym API

Hi, it seems that the current repo does not fully comply with the standard gym API? E.g., if I create an env using gym.make('Overcooked-v0'), its action space will be None. Am I doing something wrong here?

Feature request: Gymnasium & PettingZoo support

Hi, would it be possible to add PettingZoo support in the future? The current setup.py also uses OpenAI Gym, which has not been maintained for a few years now; Gymnasium is the maintained version of it.

Gymnasium and PettingZoo are compatible with current RL training libraries (rllib, tianshou and CleanRL have already migrated, and stable-baselines3 will soon) as well as other tools such as Comet and WandB.

For information about upgrading and compatibility, see the migration guide and gym compatibility. The main difference is that the API has switched to returning truncated and terminated, rather than done, in order to give more information and mitigate edge-case issues.
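To make the difference concrete, a minimal compatibility shim (not part of either library, purely a sketch) that maps the old (obs, reward, done, info) return onto the Gymnasium-style (obs, reward, terminated, truncated, info) might look like this, assuming the only source of truncation is the episode horizon:

class StepAPICompat:
    """Hypothetical shim: wrap an env returning (obs, reward, done, info)
    so that callers receive (obs, reward, terminated, truncated, info)."""

    def __init__(self, env, horizon):
        self.env = env
        self.horizon = horizon
        self._t = 0

    def reset(self, **kwargs):
        self._t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._t += 1
        truncated = done and self._t >= self.horizon  # episode cut off by the time limit
        terminated = done and not truncated           # a genuine terminal state
        return obs, reward, terminated, truncated, info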

Adding PettingZoo support would be a bit more complicated, but if needed I would be happy to help look over code or answer any questions. It would be really helpful for future researchers, and we would be excited to list it in the third party environments lists (PettingZoo, Gymnasium).

Clean up Human-AI / Human-human data collection

Uncover what's going on in the Human-AI data collection repo

This repo was used for HH data collection, and has the newer changes from Nathan.

I think once you understand the HH data collection repo, it might be easiest to just have an H-AI data collection modality within it (maybe taking inspiration from the repo below on how to structure it).

This repo was used for the original H-AI data collection, and is out of date.

The goal is to have only one repo that allows you to do either. One challenge is that psiturk might not work anymore (idk the state of development), but let's cross that bridge when we get there (if it doesn't work). Sandbox psiturk mode and testing things in localhost are your friends.

Is It Possible to Configure the Environment for Single Agent?

Hi, thanks for all of the great work and the nice environment! May I ask if it is possible to configure the environment to have only a single agent? In that case, the task could easily be set up as a single-agent RL task. Thank you in advance!
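The repo itself is built around two players, but a common workaround (sketched below, not an official API) is to expose a single-agent view in which the second player always takes a fixed action; the Action.STAY constant and the OvercookedEnv interface used here are assumptions based on the classes referenced elsewhere on this page.

from overcooked_ai_py.mdp.overcooked_mdp import OvercookedGridworld
from overcooked_ai_py.mdp.overcooked_env import OvercookedEnv
from overcooked_ai_py.mdp.actions import Action  # assumed location of the Action class

class FixedPartnerEnv:
    """Hypothetical single-agent view: the learner controls player 0,
    while player 1 always issues a fixed action (stay, by default)."""

    def __init__(self, layout="cramped_room", horizon=400, partner_action=Action.STAY):
        mdp = OvercookedGridworld.from_layout_name(layout)
        self.base_env = OvercookedEnv.from_mdp(mdp, horizon=horizon)
        self.partner_action = partner_action

    def reset(self):
        self.base_env.reset()
        return self.base_env.state

    def step(self, ego_action):
        # Pair the learner's action with the fixed partner action to form the joint action.
        return self.base_env.step((ego_action, self.partner_action))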

Tutorial Google Colab Notebook

Creating a tutorial Google Colab notebook on how to use the environment, visualize rollouts, etc. Most useful after the python visualizations in #45 are completed.

Pass action into env.step for gym env

The action should be a tuple with the joint action of the primary and secondary agents in index format. I tried the following code:

mdp = OvercookedGridworld.from_layout_name("cramped_room")
base_env = OvercookedEnv.from_mdp(mdp, horizon=500)
env = gym.make("Overcooked-v0",base_env = base_env, featurize_fn =base_env.featurize_state_mdp)
env.reset()
env.step((0,0))

I got the following error:

AttributeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 env.step((0,0))

File ~/.pyenv/versions/3.9.1/envs/myvenv/lib/python3.9/site-packages/gym/wrappers/order_enforcing.py:37, in OrderEnforcing.step(self, action)
35 if not self._has_reset:
36 raise ResetNeeded("Cannot call env.step() before calling env.reset()")
---> 37 return self.env.step(action)

File ~/.pyenv/versions/3.9.1/envs/myvenv/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:52, in StepAPICompatibility.step(self, action)
43 def step(self, action):
44 """Steps through the environment, returning 5 or 4 items depending on new_step_api.
45
46 Args:
(...)
50 (observation, reward, terminated, truncated, info) or (observation, reward, done, info)
51 """
---> 52 step_returns = self.env.step(action)
53 if self.new_step_api:
54 return step_to_new_api(step_returns)

File ~/.pyenv/versions/3.9.1/envs/myvenv/lib/python3.9/site-packages/gym/wrappers/env_checker.py:39, in PassiveEnvChecker.step(self, action)
37 return env_step_passive_checker(self.env, action)
38 else:
...
-> 1851 for obj in state.objects.values():
1852 if obj.position in counters_considered:
1853 counter_objects_dict[obj.name].append(obj.position)

AttributeError: 'OvercookedGridworld' object has no attribute 'objects'

I tried many alternatives but none of them worked. Can you share an example of what I should pass in the env.step function?

Will the trained models (BC and H_proxy) be publicly available?

I want to use these models as the baseline in my work.
Can I have access to these models? Or is there any way I can make sure that my implementation of the baseline (BC and H_proxy) is correct (e.g., training loss)?

edit: I found this file, but it still uses a GAIL model to train behaviour cloning. Is this the version that was used in the paper?

Native python visualization - trajectory slider

Taken from @micahcarroll's comment in #45
Steps:

  1. Convert entire trajectory into images in a single function.
  2. Combine the trajectory images from step 1 with ipywidgets to flick between images of the same trajectory

This task is not blocking the native python visualization PR #53, as the core visualization functionality (converting OvercookedState to jpeg) is already there.

Layout compatibility

I've come across the agent documentation located here: https://github.com/HumanCompatibleAI/overcooked-demo/tree/master/server/static/assets/agents/RllibSelfPlay_CrampedRoom

Regarding the layout compatibility part, it says that an agent trained on layout X can only be run on layout X, and will otherwise achieve poor performance.
So does that mean I can't test an agent on an unseen layout?

In other words, I want to train a custom agent on one or more randomly generated layouts (with the same constraints) and test the agent's performance on a new, unseen layout. Is that possible?

Thanks in advance!

Should probably merge this repo with overcooked-demo

There seems to be little reason not to merge this repo with overcooked-demo. Currently there is no way to visualize trajectories directly in this repo other than the terminal graphics in python (which take some time to get used to).

In this sense, having an extra repo just for interactive play with agents (trained or untrained) seems like an additional barrier to entry. One reason we did the split in the first place was that the hard-coded agents in overcooked-demo came from the DRL runs for the paper, so the demo seemed more tied to the paper than not. However, I feel like once we make the demo more agent-agnostic (supporting random and custom agents too), this shouldn't be an issue anymore.

Add support for Windows

Make all path/directory code OS-agnostic. This would require using os.path.join rather than manually adding slashes between directory names in many places.
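For reference, this is the pattern the change would apply throughout the codebase (placeholder names, not actual repo paths):

import os

data_dir = "data"              # placeholder directory name
layout_name = "cramped_room"   # placeholder layout name

# Manually inserting "/" assumes a POSIX separator and breaks on Windows:
bad_path = data_dir + "/" + layout_name + "/trajectories.json"

# os.path.join builds the same path with the correct separator on any OS:
good_path = os.path.join(data_dir, layout_name, "trajectories.json")
print(good_path)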

Switch to pytest

Switch the repo to use pytest for tests, instead of our current unittest-based framework, which is kind of janky and requires running shell scripts. This is something that should be done across all repos, or only in overcooked_ai once we have merged them.

Cannot build from source

Hey. I tried to follow the instructions to install the repo from source, but I'm getting the following error (installing through pip worked fine):

Obtaining file:///C:/Users/Andre.LAPTOP01/overcooked_ai
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\Andre.LAPTOP01\anaconda3\envs\overcooked_ai\python.exe' -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Andre.LAPTOP01\\overcooked_ai\\setup.py'"'"'; __file__='"'"'C:\\Users\\Andre.LAPTOP01\\overcooked_ai\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ANDRE~1.LAP\AppData\Local\Temp\pip-pip-egg-info-5295cu1v'
         cwd: C:\Users\Andre.LAPTOP01\overcooked_ai\
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Andre.LAPTOP01\overcooked_ai\setup.py", line 6, in <module>
        long_description = fh.read()
      File "C:\Users\Andre.LAPTOP01\anaconda3\envs\overcooked_ai\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 843: character maps to <undefined>
    ----------------------------------------
WARNING: Discarding file:///C:/Users/Andre.LAPTOP01/overcooked_ai. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Agent reset during activation overrides agent's index and mdp

Hello Micah and Nathan,
I found out that my inability to run a GreedyHumanModel agent in overcooked demo comes from the agent being reset during the game's activation in overcooked demo's game.py. I don't know how to handle that without breaking everything, though for now commenting out the "super" call here makes the agent run. Thanks for your help!

Metrics for layout difficulty

Summary and more detailed description of the layout metrics task discussed with @mesutyang97 so they are open to discussion with everyone.

  1. Score difference between agents self-play and play with other agents:

    • function (or method) input:
      • layout
      • agents list
      • number of games for every agent pair
    • brief description of implementation:
      • every agent from the list plays with every agent (including itself), and the final score is calculated
    • result:
      • avg of non-self-play scores divided by avg of self-play scores
  2. Measuring how much irrational play can lower the score (i.e., how fragile play is to random perturbations):

  3. Calculate the number of corridor tiles (tiles that are part of paths wide enough for only 1 chef) to detect layouts with possible movement-coordination problems. It is tricky to figure out how to find the relevant corridors, possibly by measuring how often they are used by a list of agents.

  4. How many counters can be used to pass an object and how much time it can save - @mesutyang97 said he implemented something like that, but the code is currently outdated. In any case, finding the relevant counters used for passes can also be tricky; possibly testing which counters are used by self-play agents could reveal some of them (as passing an item over a counter usually requires high coordination).

  5. Measuring how much the agent's actions reveal its intent - I've discussed details with @mesutyang97, but forgot the exact details here.

Besides the metrics, another thing to do is to use them somewhere in the overcooked repositories. My ideas are:
a) Adding metrics to trajectory dicts (to allow evaluation of how well agents do on layouts with certain metrics)
b) Generating layouts until there is one that passes some metric requirements.

1, 2, a), and b) seem straightforward to implement, and I will code them first. Then probably 3 and 4, using agents to find the relevant terrain tiles.

If you have some trained agents that do well (or at least not very poorly) on the generated random layouts, let me know - they would be useful as defaults for the agent-related metrics.

Code Coverage Monitor

Incorporating code-coverage automation reports into the project to better enhance testing and confidence in the code. An example of such a tool is here.

Create Trajectory class

Creating a Trajectory class to better abstract away a lot of the complex dictionary parsing in a more Object Oriented fashion.
Moved from #44 as this issue will probably wait a bit.

Add additional ingredients

Adapting the Recipe class to handle additional ingredients for a more diverse soup space. Might involve changes to the OvercookedGridworld and OvercookedEnv classes to handle the new ingredients

Default Recipe config

Currently, when the recipe config is not specified, the user gets an error: ValueError: Recipe class must be configured before recipes can be created.
The fix for this is to insert Recipe.configure({}) somewhere in the code, which adds no information beyond the user's intent to prevent the error. I would like to do something that won't force the user to add Recipe.configure({}) to their code, without sacrificing the positives of the current solution.
Why was this check added? Is there anything wrong with assuming that if recipes are not configured, the user chose to have the default configuration (maybe with a warning print)?
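For reference, the workaround described above looks roughly like this (the import path is an assumption, and the Recipe constructor call is purely illustrative):

# Assumed import path for the Recipe class (it lives alongside the core MDP code).
from overcooked_ai_py.mdp.overcooked_mdp import Recipe

Recipe.configure({})  # accept the default recipe configuration to avoid the ValueError

# Hypothetical usage once configured, e.g. a three-onion soup:
soup_recipe = Recipe(("onion", "onion", "onion"))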

ERROR: test_resume_functionality (ppo.ppo_rllib_test.TestPPORllib)

Hi,

I cloned this repo by following the instructions in the README (didn't change anything) and I got the following error message when running python -m unittest discover -s testing/ -p "*_test.py".

It seems to point to a FileNotFoundError, but it mentions a very detailed timestamp in the filename, so I suspect the test itself is supposed to create that file but isn't doing so correctly. Was that just a bug in the test itself? Or is there anything I can do to fix it?

======================================================================
ERROR: test_resume_functionality (ppo.ppo_rllib_test.TestPPORllib)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_test.py", line 353, in test_resume_functionality
    options={"--loglevel": "ERROR"},
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
    run()
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 470, in main
    result = run(params)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/ppo/ppo_rllib_from_params_client.py", line 407, in run
    trainer = load_trainer(save_path=saved_path, true_num_workers=False)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 856, in load_trainer
    trainer = gen_trainer_from_params(config)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 801, in gen_trainer_from_params
    logger_creator=custom_logger_creator,
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 308, in __init__
    super().__init__(config=config, logger_creator=logger_creator, **kwargs)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 132, in __init__
    self._create_logger(self.config, logger_creator)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 823, in _create_logger
    self._result_logger = logger_creator(config)
  File "/home/sophiag/overcooked_ai/src/human_aware_rl/rllib/rllib.py", line 747, in custom_logger_creator
    logdir = tempfile.mkdtemp(prefix=logdir_prefix, dir=results_dir)
  File "/home/sophiag/anaconda3/envs/overcooked_ai/lib/python3.7/tempfile.py", line 366, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/home/sophiag/ray_results/PPO_cramped_room_False_nw=16_vf=0.000100_es=0.200000_en=0.000500_kl=0.200000_11_2023-04-13_19-10-53t8mqlh9x'

----------------------------------------------------------------------
Ran 7 tests in 597.026s

FAILED (errors=1)

event_infos tag bug

Reported via email:


I’m looking to use the ‘event_infos’ tags as propositional labels for my particular application, but it seems that sometimes these tags don’t get correctly updated.

For example, in the following (state, action, new_state) triple:

X X P X X
O →1 T
Xo →0 X
X D X S X

(Action.INTERACT, Action.INTERACT)

X X P X X
O →1t T
Xo →0 X
X D X S X

I can see, by inspecting the relevant variables, that state.players[1].held_object has value None and new_state.players[1].held_object has value tomato@(3, 1), but that if I look at:

_, info = env.mdp.get_state_transition(state, (Action.INTERACT, Action.INTERACT,), False, env.mp)

Then the value of info['event_infos']['tomato_pickup'] is [False, False], when I believe it should be [False, True]. This may well happen with other info tags as well, but I haven't dug into that yet. Do you have any idea what might be causing this problem and/or how to fix it?

For the record, I’m using whatever version of the repo is installed when running pip install overcooked-ai.

Add Dynamic Recipes

Update the OvercookedState and OvercookedGridworld classes to include recipe lists that can change with time. This would involve adding timestep dependencies to all_orders and bonus_orders

Unify `OvercookedGridworld` and `OvercookedEnv` classes

Refactoring the code base to merge the functionality of the two high-level Overcooked classes. OvercookedState should still include all the dynamic state data and should be as lightweight as possible. OvercookedGridworld will assume the functionality of OvercookedEnv, including all static data, such as terrain and episode horizon, as well as trajectory-specific data.

PettingZoo

Do the authors use the PettingZoo library to implement the Overcooked environment?

Have only one single player?

Thanks for the great work. Is it possible to configure the environment to have just one player instead of multiple? Then the collaboration component would be removed and it would become a standard RL task.

Clean up pre-trained examples

Currently we have just one available, but we should have multiple and have them better documented.

  • Run all scripts and visualize training runs and get an intuition for them
  • Improve documentation for loading your saved model

Steps:

  1. Pick best performing model for each layout from sweep
  2. Use hyperparameters from that run and put them in a .sh file with 5 different seeds, for each layout
  3. Run bash file
  4. Look at results (check reasonably good), and save models somewhere (in repo if not too big, otherwise google drive)
  5. Put best performing out of 5 seeds into demo as pre-loaded defaults for each layout

the import package problem

Dear Authors,
I think there have been some changes in the overcooked_ai repo. Can you check this error (i.e., MediumLevelPlanner)?

Sincerely,

Understanding Training and Self-play agents

Hello,

Firstly, thank you for providing such a comprehensive GitHub repository on multi-agent RL. I'm new to the field of Reinforcement Learning and had some questions regarding the project:

In the human_aware_rl/ppo directory, it appears that a PPO agent is trained alongside a pre-trained Behavioral Cloning (BC) agent. Could you provide some guidance on how to modify this setup to train two PPO agents together, similar to the approach taken in PantheonRL?

The human_aware_rl/imitation directory suggests that a BC agent is trained using previously collected human data. Could you confirm this?

I'm particularly interested in understanding which of these setups qualifies as self-play. My assumption is that the first case might be considered self-play, but given that one agent is a BC agent, I'm not sure if this meets the traditional definition of self-play, such as the approach used in PantheonRL where you can train a PPO ego agent and PPO alt agent in stable-baselines3.

Thank you for your time and looking forward to your response.

Best regards

trajectory replay code

Hi,
I have been trying to work out how to get the individual frames for each time step of a trajectory, so I was looking for the code that replays the trajectory JSON file.
That is, the code that does this: https://humancompatibleai.github.io/overcooked-demo/replay
I am not able to understand how a 30 s gameplay is equivalent to 204 steps, whereas a 60 s gameplay is equivalent to 404 steps.

It would be great if some guidance could be given regarding this.

Thank you!
