player-ConvNN

Implementation of a Convolutional Neural Network that applies the deep Q-learning algorithm to play PyGame video games.
Specifically, I set out to replicate DeepMind's paper on a simple arcade game from my GitHub page, Asteroids. I ended up making slight variations to DeepMind's implementation (see more below).
The interesting part of the code is that the game's state is expressed as raw pixel data, so estimating its value requires a CNN. Check out the YouTube video that shows the CNN's behaviour under different levels of difficulty.

As inputs, it accepts:

image: array, image made of raw pixels captured from the game screen;
reward: float, reward received at that particular state;
is_terminal: bool, indication of whether the player is at a terminal state;

and returns:

an integer representing the chosen action.
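To make the input/output contract above concrete, here is a minimal sketch of an object with the same signature. The class name `AgentInterfaceSketch`, the method name `step`, and the `n_actions` parameter are hypothetical and not taken from CNN_Agent.py; the sketch also picks actions uniformly at random instead of querying a CNN.

```python
import numpy as np


class AgentInterfaceSketch:
    """Hypothetical stand-in for the agent: same inputs and output as described
    above, but it chooses actions at random instead of using a trained CNN."""

    def __init__(self, n_actions: int, seed: int = 0):
        self.n_actions = n_actions  # assumed size of the game's discrete action set
        self.rng = np.random.default_rng(seed)
        self.episode_reward = 0.0

    def step(self, image: np.ndarray, reward: float, is_terminal: bool) -> int:
        # image: raw screen pixels, e.g. an array of shape (height, width, 3)
        self.episode_reward += reward
        if is_terminal:
            self.episode_reward = 0.0  # a new episode starts after a terminal state
        # return an integer in [0, n_actions) as the chosen action
        return int(self.rng.integers(self.n_actions))
```

A game loop would call `step` once per frame with the captured screen, the reward for that frame and the terminal flag, then apply the returned action.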

Quick start

Download the files CNN_Agent.py and Parameters_CNN.ckpt to the same folder as the game asteroids v1.0.py, which can be found here.
Then launch the game asteroids v1.0.py and enjoy watching the AI play it.

Below is a short clip of the game. Note that reaching this level required only 150,000 frame observations, i.e. approximately 5 hours of play.

Here, I modified the game's settings to make it very difficult, even for a human. The agent nevertheless appears to do very well (certainly better than I would!).

Optimal strategy vs State-Action value function

One of the challenges of the project was measuring the algorithm's performance.
I couldn't come up with any way to compare the agent's performance to the optimal strategy (assuming one exists). I could only compare it with human play.

What never stops impressing me, however, is that Q-learning (like other RL methods) does not merely approximate the optimal strategy: it also estimates the discounted value of future rewards for each decision.
The graph below shows the 50-step moving average of the ratio between the estimated future rewards and the realized discounted rewards obtained by following the strategy. The fact that this ratio hovers around 1.0 gives me confidence that the agent estimates the value of its strategy accurately.
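The ratio in the graph can be computed along these lines. This is a sketch, not the code behind the graph: it assumes the Q-value estimate recorded at each step and the per-step rewards are logged for one episode, that `rewards[t]` is the reward received after step `t`, and a discount factor of 0.99. Realized returns must be nonzero for the ratio to be defined, and returns near the end of the log are truncated.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Realized discounted return G_t = r_{t+1} + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def moving_average(x, window=50):
    """Trailing moving average; returns len(x) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


def estimate_to_realized_ratio(q_estimates, rewards, gamma=0.99, window=50):
    """50-step moving average of (estimated value / realized discounted return)."""
    realized = discounted_returns(rewards, gamma)
    ratio = np.asarray(q_estimates, dtype=float) / realized
    return moving_average(ratio, window)
```

If the estimates matched the realized returns exactly, every smoothed value would be 1.0; values persistently above or below 1.0 would indicate over- or under-estimation.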

Pseudocode

The agent uses the deep Q-learning algorithm: the state-action value function is calibrated by minimizing the following cost function over small minibatches of previous observations:

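$$\left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)^2$$

where $Q(s_t, a_t)$ is the value associated with taking action $a_t$ in state $s_t$, $r_{t+1}$ is the reward observed at time $t+1$, and $\gamma \in [0, 1)$ is the discount factor applied to future rewards.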
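As an illustration, the cost can be evaluated over a minibatch with NumPy. This is a sketch rather than the project's TensorFlow graph: it assumes a discount factor `gamma` (0.99 by default) on the max term, and it drops that term when `s_{t+1}` is terminal, a common deep Q-learning convention that the cost function above does not spell out.

```python
import numpy as np


def q_learning_loss(q_current, actions, rewards, q_next, terminal, gamma=0.99):
    """Mean squared TD error over a minibatch.

    q_current: (batch, n_actions) Q-values for s_t
    actions:   (batch,) integer actions a_t
    rewards:   (batch,) rewards r_{t+1}
    q_next:    (batch, n_actions) Q-values for s_{t+1}
    terminal:  (batch,) bools, True where s_{t+1} is terminal (assumed convention)
    """
    q_current = np.asarray(q_current, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    not_terminal = ~np.asarray(terminal, dtype=bool)
    # Target: r_{t+1} + gamma * max_a Q(s_{t+1}, a), without the bootstrap at terminal states
    targets = rewards + gamma * q_next.max(axis=1) * not_terminal
    # Q(s_t, a_t): each row's value for the action actually taken
    chosen = q_current[np.arange(len(actions)), actions]
    return float(np.mean((targets - chosen) ** 2))
```

In a TensorFlow version the targets would normally be fed in as constants, so gradients flow only through $Q(s_t, a_t)$.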

More specifically, the algorithm involves:
Initialize the agent's action-value Q function with random weights;
While running:
-- Take an epsilon-greedy decision on what to do at the current state;
-- Observe the following reward and screen's image;
-- Convert the observed image to greyscale and compress it to a pre-determined smaller format [I chose 84x84];
-- Store the transition from the current state to the (now observed) following state and following reward;
-- Run a step of the learning algorithm, i.e. select a minibatch of transitions from previous observations [I chose size 32] and take one optimization step minimizing the above cost function [I chose AdamOptimizer].
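The storage and decision steps of the loop above can be sketched as follows. The class `ReplayMemory`, its `capacity` of 100,000 transitions and the function `epsilon_greedy` are hypothetical helpers, not names from the project; the value of epsilon and any schedule for lowering it are left to the caller.

```python
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, terminal) transitions."""

    def __init__(self, capacity=100_000, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = np.random.default_rng(seed)

    def store(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size=32):
        # draw distinct stored transitions uniformly at random
        idx = self.rng.choice(len(self.buffer), size=batch_size, replace=False)
        batch = [self.buffer[int(i)] for i in idx]
        states, actions, rewards, next_states, terminals = zip(*batch)
        return (np.stack(states), np.array(actions), np.array(rewards, dtype=float),
                np.stack(next_states), np.array(terminals, dtype=bool))

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

Each frame, the agent would call `epsilon_greedy` on the network's Q-values, `store` the resulting transition, and, once at least 32 transitions are stored, `sample` a minibatch for one optimizer step.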

Key differences with DeepMind's architecture

I chose to use:

a different activation function: I used the leaky ReLU function, defined as max(a*z, z) with 0 < a << 1, as opposed to the plain ReLU function, defined as max(0, z). The reason was that, with ReLU, many neurons "died" during training: they got stuck in permanently negative territory, where the ReLU gradient is zero, and could no longer be trained. It took me a long time to identify this issue;
a different network: this CNN is deeper (3 convolutional layers), but uses fewer parameters (smaller filters and fewer kernels) as well as a smaller feed-forward hidden layer;
a more flexible graph that makes it easy to add max-pooling layers and dropout. In fairness, I ended up using neither of these extra features, as they didn't seem to add any value.
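The "dying ReLU" argument comes down to the gradients of the two activations. A small NumPy sketch, with the leak coefficient `a = 0.01` chosen here as an assumed value (the README only says a << 1):

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def leaky_relu(z, a=0.01):
    # max(a*z, z) equals z for z >= 0 and a*z for z < 0, provided 0 < a < 1
    return np.maximum(a * z, z)


def relu_grad(z):
    # zero for every negative input: a unit stuck there receives no updates
    return (z > 0).astype(float)


def leaky_relu_grad(z, a=0.01):
    # small but nonzero slope for negative inputs keeps the unit trainable
    return np.where(z > 0, 1.0, a)
```

Newer TensorFlow releases also provide `tf.nn.leaky_relu` (default `alpha=0.2`), so a hand-written version is not strictly needed there.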

Visual processing

Before the screen's image is fed to the CNN, it is pre-processed: it is converted to greyscale, compressed to a smaller resolution, and stacked with previous frames [I chose to feed the CNN the 3 most recent frames stacked together].
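A NumPy-only sketch of this pipeline follows. The BT.601 luminance weights, the nearest-neighbour resize and the padding of the stack with the first frame of an episode are assumptions made for illustration; the project may use a different library routine for greyscale conversion and resizing.

```python
from collections import deque

import numpy as np

FRAME_SIZE = 84   # 84x84, as in the pseudocode
STACK_DEPTH = 3   # number of most recent frames fed to the CNN


def to_greyscale(rgb):
    """Luminance-weighted greyscale (assumed ITU-R BT.601 weights), scaled to [0, 1]."""
    return (rgb[..., :3] @ np.array([0.299, 0.587, 0.114])) / 255.0


def downsample(img, size=FRAME_SIZE):
    """Nearest-neighbour resize to size x size by keeping evenly spaced rows and columns."""
    rows = np.linspace(0, img.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size).round().astype(int)
    return img[np.ix_(rows, cols)]


class FrameStacker:
    """Keeps the most recent pre-processed frames and stacks them along the last axis."""

    def __init__(self, depth=STACK_DEPTH):
        self.frames = deque(maxlen=depth)

    def push(self, rgb_screen):
        frame = downsample(to_greyscale(rgb_screen))
        if not self.frames:
            # first frame of an episode: fill the whole stack with it (assumed convention)
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)
        return np.stack(self.frames, axis=-1)  # shape (84, 84, depth)
```

Stacking several frames lets the network infer motion (e.g. the direction of an asteroid), which a single still image cannot convey.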

Here is an example of an input image and the resulting compressed, stacked images fed to the CNN:

Out of interest, here is a visualization of the CNN's first-layer filters.
Unlike in other applications, what these first-layer filters do is not at all intuitive.
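One way to produce such a picture is to tile the filter weights into a single greyscale image. This sketch assumes the weights are available as a NumPy array in TensorFlow's conv2d filter layout `(height, width, in_channels, out_channels)`; the function name `tile_filters` and its parameters are hypothetical. Each input channel corresponds to one of the stacked frames, so the function shows one channel at a time.

```python
import numpy as np


def tile_filters(weights, channel=0, pad=1):
    """Arrange conv filters of shape (kh, kw, in_ch, out_ch) into one 2-D image.

    Each filter's slice for `channel` is min-max scaled to [0, 1] independently,
    so the image shows each filter's pattern but not its absolute magnitude.
    """
    kh, kw, _, n = weights.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    # white background; `pad` pixels separate neighbouring filters
    canvas = np.ones((rows * (kh + pad) - pad, cols * (kw + pad) - pad))
    for i in range(n):
        f = weights[:, :, channel, i]
        span = f.max() - f.min()
        f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
        r, c = divmod(i, cols)
        top, left = r * (kh + pad), c * (kw + pad)
        canvas[top:top + kh, left:left + kw] = f
    return canvas
```

The result can be displayed with, for example, matplotlib's `imshow(canvas, cmap="gray")`.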

Resources & Acknowledgements

Playing Atari with Deep Reinforcement Learning, by DeepMind Technologies;
Daniel Slater's blog, and in particular his PyGamePlayer code, which I used as the starting point for mine.

Requirements

Python 3. I recommend this version as it's the only one I found compatible with the libraries below;
PyGame, I used version 1.9.2a0. Download it from here;
TensorFlow, I only managed to install it on my Mac. Download it from here;
Asteroids, arcade game from my GitHub page, whose code can be found here.

flankme / player-convnn