waznop / simple-a2c

A simple A2C made from scratch in PyTorch. Accompanying comic at https://hackernoon.com/intuitive-rl-intro-to-advantage-actor-critic-a2c-4ff545978752

Jupyter Notebook 100.00%

simple-a2c's Introduction

Simple Advantage Actor Critic (A2C)

The notebooks in this repo build an A2C from scratch in PyTorch, starting with a Monte Carlo version that takes four floats as input (CartPole) and gradually increasing in complexity until the final model, an n-step A2C with multiple actors that takes in raw pixels. The models are kept simple to make them easier to understand. For a more production-strength A2C, check out this model converted from OpenAI Baselines.
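To make the difference between the Monte Carlo and n-step variants concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It is not taken from the notebooks: the function names `monte_carlo_returns`, `n_step_returns`, and `advantages`, and the default `gamma`, are assumptions chosen for illustration. The Monte Carlo version waits for the episode to finish and sums discounted rewards; the n-step version stops after a short rollout and bootstraps from the critic's value estimate of the state that follows.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return G_t for every step of one finished episode."""
    returns: List[float] = []
    running = 0.0  # nothing follows a terminal state
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def n_step_returns(
    rewards: Sequence[float], bootstrap_value: float, gamma: float = 0.99
) -> List[float]:
    """Targets for a short rollout that may stop mid-episode.

    bootstrap_value is the critic's estimate V(s) for the state reached
    after the last reward (pass 0.0 if that state is terminal).
    """
    returns: List[float] = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage A_t = target return minus the critic's estimate V(s_t)."""
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    mc = monte_carlo_returns([1.0, 1.0, 1.0], gamma=0.5)
    print(mc)
    print(n_step_returns([1.0, 1.0], bootstrap_value=2.0, gamma=0.5))
    print(advantages(mc, [1.0, 1.0, 1.0]))
```

In an actor-critic update, the advantages would typically weight the actor's log-probabilities while the returns serve as regression targets for the critic; the sketch only covers the target computation.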

Notebooks:

1. Monte Carlo A2C
2a. Adding N-Step
2b. Code walk-through TUTORIAL: A simplified version of 2a used for teaching purposes. Complement to the comic.
3. Adding in multiple actors
4. Allowing the model to take in a stack of "frames" rather than a single frame. This is in preparation for the next step, when we add in a stack of frames from raw pixels.
5. Transitioning to raw pixel input. Changing the FC NN to a CNN. Takes hours to train on a p2.xlarge rather than seconds on a laptop.
6. MC A2C which is also trained to predict its own next state and reward. Currently being used for experiments in transfer learning, prediction, and data generation. If a model can predict its own future states, can it use this predictor to generate data for "mental training"?
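The frame-stacking step in the list above can be sketched with a small buffer that always holds the most recent observations. This is an illustrative sketch, not the notebooks' code: the class name `FrameStack` and its methods are hypothetical, and frames are plain lists of floats (as in CartPole) that get flattened for a fully connected network. For raw pixels, one would more likely stack frames along a channel axis for the CNN.

```python
from collections import deque
from typing import Deque, List, Sequence


class FrameStack:
    """Keeps the last `num_frames` observations and returns them as one input."""

    def __init__(self, num_frames: int) -> None:
        self.num_frames = num_frames
        # A deque with maxlen drops the oldest frame automatically.
        self.frames: Deque[List[float]] = deque(maxlen=num_frames)

    def reset(self, first_frame: Sequence[float]) -> List[float]:
        """Start an episode by filling the buffer with copies of the first frame."""
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(list(first_frame))
        return self.observation()

    def step(self, frame: Sequence[float]) -> List[float]:
        """Add the newest frame and return the updated stacked observation."""
        self.frames.append(list(frame))
        return self.observation()

    def observation(self) -> List[float]:
        """Flatten the frames, oldest first, into a single feature vector."""
        return [x for frame in self.frames for x in frame]


if __name__ == "__main__":
    stack = FrameStack(num_frames=3)
    print(stack.reset([1.0, 2.0]))
    print(stack.step([3.0, 4.0]))
```

Repeating the first frame on reset keeps the input size constant from the very first step, which is one common choice; padding with zeros would be another.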

For a deeper dive into deep RL, these are my favorite resources:

Reinforcement Learning: An Introduction. Sutton & Barto

David Silver's course

Denny Britz' RL repo
