toybox-rs / toybox

The Machine Learning Toybox for testing the behavior of autonomous agents.

Home Page: http://toybox.rs

Languages: Python 59.14%, Shell 0.72%, Dockerfile 0.05%, HTML 40.09%
Topics: causality, explanation, explainable-ai, xai, atari, reinforcement-learning

The Reinforcement Learning Toybox

A set of games designed for testing deep RL agents. This repo contains Python wrappers and an intervention API for Toybox games. Python wrappers for the Atari games are constructed to mock the Arcade Learning Environment and subclass the gym.envs.atari.AtariEnv wrapper. ToyboxBaseEnv may be a good entry point for the gym wrappers.
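
For instance, here is a minimal sketch of driving one of the wrappers through gym. The env ID follows the ToyboxNoFrameskip naming we believe this package registers; treat both the ID and the registering import as assumptions and check the registration code for the exact names.

import gym
import toybox  # assumption: importing the package registers the Toybox envs

# "BreakoutToyboxNoFrameskip-v4" is an assumed env ID.
env = gym.make("BreakoutToyboxNoFrameskip-v4")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()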

If you use this code, or otherwise are inspired by our white-box testing approach, please cite our NeurIPS workshop paper:

@inproceedings{foley2018toybox,
  title={{Toybox: Better Atari Environments for Testing Reinforcement Learning Agents}},
  author={Foley, John and Tosch, Emma and Clary, Kaleigh and Jensen, David},
  booktitle={{NeurIPS 2018 Workshop on Systems for ML}},
  year={2018}
}

We have a lengthier paper on arXiv and can provide a draft of a non-public paper on our acceptance-testing framework by request (email etosch at cs dot umass dot edu).

How accurate are your games?

Watch four minutes of agents playing each game. Both the ALE implementations and the Toybox implementations have their idiosyncrasies, but the core gameplay and concepts have been captured. Pull requests to improve fidelity are always welcome.

Where is the actual Rust code?

The Rust implementations of the games have moved to a separate repository: toybox-rs/toybox-rs

Installation

  1. Create a virtual environment using your python3 installation: ${python} -m venv venv
    • OSX
      • On OSX, ${python} is likely python3; thus, your command will be python3 -m venv venv
      • If you are not sure of your version, run python --version
    • Windows (not fully tested!)
      • If you are on Windows, your command will likely be: python -m venv venv
  2. Activate your virtual environment:
    • BSD-ish: source venv/bin/activate
    • Windows: venv\Scripts\activate
  3. Install Toybox:
pip install ctoybox
pip install git+https://github.com/toybox-rs/Toybox

  4. Install requirements: run pip install -r REQUIREMENTS.txt
  5. Run python setup.py install

Note: if you are trying to run from Windows, you will need to build from source. See instructions for building here.

Play the games (using pygame)

pip install ctoybox pygame
python -m ctoybox.human_play breakout
python -m ctoybox.human_play amidar
python -m ctoybox.human_play space_invaders

Run the tests

Sample behavioral tests developed with Toybox are frozen and available here. These tests come with an OpenAI Baselines integration to facilitate off-the-shelf model training.

Python

TensorFlow, OpenAI Gym, OpenCV, and other libraries may or may not break with various Python versions. We have confirmed that the code in this repository works with the following Python versions:

  • 3.6, 3.7

Get starting images for reference from ALE / atari_py

./scripts/utils/start_images --help

Contributors

etosch, jjfiv, kclary, miffyli


Issues

Fix sprite rendering in human_play

Right now human_play just draws each sprite as a rectangle. Maybe we can even use software rendering to a texture for this backend, so it always stays in sync?
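
One possibility, sketched below with pygame; the frame source here is a zero-filled placeholder, since the real fix would pull the software-rendered buffer from the Rust side.

import numpy as np
import pygame

# Placeholder: an RGB frame (height x width x 3, uint8) that would come
# from the game's software renderer instead of this zero array.
frame = np.zeros((210, 160, 3), dtype=np.uint8)

pygame.init()
screen = pygame.display.set_mode((frame.shape[1], frame.shape[0]))
# pygame surfaces are (width, height), so transpose the (H, W, 3) array.
surface = pygame.surfarray.make_surface(frame.transpose(1, 0, 2))
screen.blit(surface, (0, 0))
pygame.display.flip()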

GridWorld game environment

@kclary says this is a very common RL benchmark, and it should serve as a simpler, tutorial-style game in our toybox to which we can direct people who want to implement their own games.

Assigning to me for now.

toybox.py: toybox.get_score() returning zero

@kclary was working on this. Now with the graphics up and running, I can tell that toybox (Rust) knows what the score is, but it's getting lost somewhere on the way out. Maybe we just need a Python/ctypes type annotation?
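
If the value is being mangled at the FFI boundary, a missing ctypes annotation is a plausible culprit. A minimal sketch of that kind of fix, with hypothetical library and symbol names:

import ctypes

# Hypothetical library and symbol names, for illustration only.
lib = ctypes.CDLL("libctoybox.so")

# Without explicit annotations, ctypes assumes int arguments and an int
# return value, which can silently truncate or garble the score.
lib.simulation_score.argtypes = [ctypes.c_void_p]
lib.simulation_score.restype = ctypes.c_int32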

Testing for paper

Breakout

  • last brick
  • ez channel
  • polar angles

Amidar

  • last segment
  • ez caught
  • enemy avoidance

Stealing @kclary's board notation.

Breakout paddle should allow user to control bounce

This is not real physics, but people usually alter the velocity of the ball coming off the paddle as if the paddle were curved. This lets the player reach all the bricks (otherwise the ball just bounces at the same angle forever).
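
A sketch of the usual trick, not tied to the actual Breakout implementation: map where the ball hits the paddle to an outgoing angle, so the player can steer the ball.

import math

def bounce_velocity(ball_x, paddle_x, paddle_width, speed, max_angle=math.radians(60)):
    """Return (vx, vy) for a ball leaving the paddle.

    hit is -1.0 at the paddle's left edge, 0.0 at the center, and +1.0 at
    the right edge; the outgoing angle scales with it, as if the paddle
    surface were curved.
    """
    hit = (ball_x - paddle_x) / (paddle_width / 2.0)
    hit = max(-1.0, min(1.0, hit))
    angle = hit * max_angle
    return speed * math.sin(angle), -speed * math.cos(angle)

# A center hit goes straight up; edge hits leave at up to 60 degrees.
print(bounce_velocity(ball_x=52.0, paddle_x=48.0, paddle_width=24.0, speed=4.0))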

Score behavior after end of game

While watching trained models play both Breakout and Amidar, occasionally during game-over the score will jump up from what was last seen on screen. Could this be due to the 4-step frame concatenation before checking whether the player died? There seem to be many 4-step blocks between when you see the agent die and the screen go black, and when the rendering step sees that the game is over; that's when the score increase happens.

For Amidar, these can be large increases (e.g., from 74 to 125, or from 278 to 379).

Migrate Breakout to use real physics computation

Rust has a nice collision API, ncollide2d, that would let us trivially add arbitrary shapes to our Breakout game. It also supports time-of-impact-style queries, so that when the ball ends up going too fast it won't "make mistakes".

Right now, we can just simulate tiny timesteps in our game, but that's going to be less efficient than using a real solver...
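
To illustrate the tunneling problem those time-of-impact queries would solve, here is a toy sketch of the tiny-timestep workaround (illustrative only, not the game's actual code): faster balls get more substeps, so they cannot skip over a brick in a single update.

def step_ball(pos, vel, dt, collide, max_travel=1.0):
    """Advance the ball, splitting dt into substeps short enough that the
    ball moves at most max_travel units per substep, a naive guard
    against tunneling through thin bricks."""
    speed = (vel[0] ** 2 + vel[1] ** 2) ** 0.5
    n = max(1, int(speed * dt / max_travel))  # more substeps when faster
    sub = dt / n
    for _ in range(n):
        pos = (pos[0] + vel[0] * sub, pos[1] + vel[1] * sub)
        vel = collide(pos, vel)  # resolve any brick/wall contact
    return pos, vel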

Identify agent and training regimen that takes the least amount of time

We need to evaluate our system by training an agent on both our system and the OpenAI Gym one. It doesn't matter how good the agent is, so long as it does well enough that we can compare runtime performance meaningfully. Therefore, we should identify the setup that takes the least amount of time, so that we can iterate rapidly.

Amidar enemies are too fast

Probably explains why the agent sucks? idk, player and enemy speeds are synced so that should be OK, but I no longer know anything.

Implement configurable input

Everything is parameterized in https://github.com/KDL-umass/Amidar; right now the Rust implementation is totally hard-coded. It should take input files the way that Amidar does. This should live in a branch until we are done training, since it is only needed for experimentation.
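
For what it's worth, the Python intervention layer already passes game config around as JSON; here is a sketch of how file-driven parameters could flow through it. Method names like config_to_json and write_config_json reflect our reading of the ctoybox API and should be treated as assumptions.

import json
from ctoybox import Toybox  # assumes Toybox is exported at the package top level

with Toybox("amidar") as tb:
    config = tb.config_to_json()        # assumed accessor for the game config
    with open("amidar_config.json") as f:
        config.update(json.load(f))     # parameters come from a file, not code
    tb.write_config_json(config)        # assumed setter; applies the new config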
