The notebooks in this repo build an A2C from scratch in PyTorch. They start with a Monte Carlo version that takes four floats as input (CartPole) and gradually increase in complexity up to the final model, an n-step A2C with multiple actors that takes raw pixels as input. The models are kept simple to make them easier to understand. For a more production-strength A2C, check out this model converted from OpenAI Baselines.
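The Monte Carlo starting point boils down to a small amount of arithmetic per episode: compute discounted returns backwards, subtract the critic's value estimates to get advantages, then form an advantage-weighted actor loss and a squared-error critic loss. The following is a minimal plain-Python sketch of that arithmetic, not the notebooks' code. The function names and the `gamma` default are hypothetical, the entropy bonus often used in A2C is omitted, and in a PyTorch version these lists would be tensors with the advantage detached in the actor term so the policy loss does not update the critic.

```python
from typing import List, Tuple


def monte_carlo_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards over one complete episode."""
    returns: List[float] = []
    g = 0.0  # nothing follows the final step: the episode has ended
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns


def a2c_loss_terms(
    log_probs: List[float], values: List[float], returns: List[float]
) -> Tuple[List[float], float, float]:
    """Advantages plus mean actor and critic losses for one episode (hypothetical helper)."""
    n = len(returns)
    # Advantage A_t = G_t - V(s_t): how much better the outcome was than the critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    # Actor: push up log pi(a_t | s_t) in proportion to the advantage (treated as a constant here).
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: regress V(s_t) toward the observed return (mean squared advantage).
    critic_loss = sum(a * a for a in advantages) / n
    return advantages, actor_loss, critic_loss


print(monte_carlo_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Because the whole episode must finish before any return is known, this version only updates once per episode; the n-step notebooks relax exactly that constraint.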
Notebooks:
- Monte Carlo A2C
- Adding N-Step
- Code walk-through TUTORIAL: A simplified version of 2a used for teaching purposes. Complement to the comic.
- Adding in multiple actors
- Allowing the model to take in a stack of "frames" rather than a single frame. This is in preparation for the next step, where we add in a stack of frames from raw pixels.
- Transitioning to raw pixel input: the fully connected network is replaced with a CNN. Training takes hours on a p2.xlarge rather than seconds on a laptop.
- MC A2C that is also trained to predict its own next state and reward. Currently used for experiments in transfer learning, prediction, and data generation. If a model can predict its own future states, can it use this predictor to generate data for "mental training"?
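Going from Monte Carlo to n-step returns with multiple actors mainly changes how the critic's targets are computed: each actor's rollout is cut after n steps and bootstrapped from the critic's estimate of the state it reached, unless the episode ended inside the rollout. Below is a minimal plain-Python sketch under assumed conventions, not the notebooks' implementation: `rewards` and `dones` are indexed as `[step][actor]`, `dones[t][i]` means actor `i`'s episode ended after step `t`, and `bootstrap_values[i]` is the critic's value for the state actor `i` reached after the last step. The function name is hypothetical.

```python
from typing import List


def n_step_returns(
    rewards: List[List[float]],
    dones: List[List[bool]],
    bootstrap_values: List[float],
    gamma: float = 0.99,
) -> List[List[float]]:
    """Bootstrapped n-step returns for a batch of parallel actors (hypothetical helper)."""
    n_steps = len(rewards)
    n_actors = len(bootstrap_values)
    returns = [[0.0] * n_actors for _ in range(n_steps)]
    # Start each actor's backward pass from the critic's estimate of its final state.
    g = list(bootstrap_values)
    for t in reversed(range(n_steps)):
        for i in range(n_actors):
            # If the episode ended at step t, do not carry value across the episode boundary.
            mask = 0.0 if dones[t][i] else 1.0
            g[i] = rewards[t][i] + gamma * g[i] * mask
            returns[t][i] = g[i]
    return returns
```

The advantage for each step and actor is then `returns[t][i] - V(s_t)`, which feeds the same actor and critic losses as the Monte Carlo version. Stepping several environments in lockstep and computing returns per actor like this is what lets the multi-actor notebooks update on a batch that mixes experience from independent episodes.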
For a deeper dive into deep RL, these are my favorite resources: