player-ConvNN

Implementation of a Convolutional Neural Network that applies the deep Q-learning algorithm to play PyGame video games.
Specifically, I set out to replicate DeepMind's paper on a simple arcade game from my GitHub page, Asteroids. I ended up making slight variations to DeepMind's implementation (see more below).
The interesting bit of the code is that the game's state is expressed as raw pixel data, whose value estimation requires a CNN. Check out the YouTube video that shows the CNN's behaviour under different degrees of difficulty.

As inputs, it accepts:

  • image: array, image made of raw pixels captured from the game screen;
  • reward: float, reward received at that particular state;
  • is_terminal: bool, indication of whether the player is at a terminal state;

and returns:

  • an integer, as the chosen action.
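
In other words, the agent behaves like a function from (image, reward, is_terminal) to an action index. The sketch below illustrates that interface; the class and method names are my own illustrative choices, not the actual names used in CNN_Agent.py.

```python
import numpy as np

class ConvNNAgent:
    """Minimal sketch of the interface described above (names are illustrative)."""

    def __init__(self, n_actions):
        self.n_actions = n_actions

    def act(self, image, reward, is_terminal):
        """image: raw-pixel array captured from the game screen,
        reward: float received at this state,
        is_terminal: whether the player is at a terminal state.
        Returns an integer index of the chosen action."""
        # In the real agent: pre-process the image, store the transition,
        # run a learning step, then choose an epsilon-greedy action.
        return int(np.random.randint(self.n_actions))  # placeholder choice
```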

Quick start

Download the files CNN_Agent.py and Parameters_CNN.ckpt to the same folder as the game asteroids v1.0.py, which can be found here.
Then launch asteroids v1.0.py and enjoy watching the AI play it.

Below is a brief clip of the game. Notice that to reach this level the CNN only requires 150,000 frame observations, i.e. approximately 5 hours of play.

Here, I modified the settings of the video game to make it very difficult, even for a human. The agent appears to do very well anyway (certainly better than I would!).

Optimal strategy vs State-Action value function

One of the challenges of the project was measuring the algorithm's performance.
I couldn't come up with any way to compare the agent's performance against the optimal strategy (assuming one exists); I could only compare it against human players.

However, the bit that never stops impressing me is that Q-learning (like other RL methods) doesn't merely approximate the optimal strategy: it also estimates the discounted value of future rewards for each decision.
The graph below shows the 50-step moving average of the ratio between the estimated future rewards and the realized discounted rewards obtained by pursuing the strategy. The fact that the average ratio gravitates around 1.0 gives me confidence that the agent is accurately estimating the value of its strategy.
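
For reference, this is roughly how such a diagnostic can be computed: record max_a Q(s_t, a) at each step while playing, compute the realized discounted return backwards from the observed rewards, and smooth the ratio with a 50-step moving average. The discount factor gamma = 0.99 and all names below are illustrative assumptions, not values taken from the original code.

```python
import numpy as np

def realized_discounted_returns(rewards, gamma=0.99):
    """Backward pass computing G_t = r_{t+1} + gamma * G_{t+1}."""
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def moving_average(x, window=50):
    return np.convolve(x, np.ones(window) / window, mode="valid")

def estimate_ratio(predicted_values, rewards, gamma=0.99, window=50):
    """predicted_values[t] = max_a Q(s_t, a) recorded during play.
    A ratio gravitating around 1.0 means the value estimates match
    the returns the strategy actually realized."""
    realized = realized_discounted_returns(rewards, gamma)
    realized = np.where(realized == 0.0, 1e-8, realized)  # avoid division by zero
    return moving_average(np.asarray(predicted_values, dtype=float) / realized, window)
```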

Pseudocode

The agent adopts the deep Q-learning algorithm: the state-action value function is calibrated by minimizing (over small minibatches of previous observations) the cost function below

( [r_{t+1} + gamma * max_a Q(s_{t+1}, a)] - Q(s_t, a_t) )^2

where Q(s_t, a_t) is the value associated with taking action a_t in state s_t, r_{t+1} is the reward observed at time t+1, and gamma is the discount factor (for a terminal state the target reduces to r_{t+1} alone).
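
Below is a minimal sketch of this cost for a single transition, in plain numpy. The Q-value vectors would come from the CNN; gamma = 0.99 and the function and argument names are illustrative assumptions rather than the actual code.

```python
import numpy as np

def td_squared_error(q_current, action, reward, q_next, is_terminal, gamma=0.99):
    """q_current: Q(s_t, .) as a vector over actions,
    q_next:    Q(s_{t+1}, .) as a vector over actions."""
    # No future rewards are expected from a terminal state.
    target = reward if is_terminal else reward + gamma * np.max(q_next)
    return (target - q_current[action]) ** 2
```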

More specifically, the algorithm involves (a sketch in code follows the steps below):
Initialize the agent's action-value Q function with random weights;
While running:
-- Take an epsilon-greedy decision on what to do at the current state;
-- Observe the resulting reward and screen image;
-- Convert the observed image to grey scale and compress it to a pre-determined smaller format [I chose 84x84];
-- Store the transition from the current state to the (now observed) next state, together with the observed reward;
-- Run a step of the learning algorithm, i.e. select a minibatch of transitions from previous observations [I chose size 32] and take one optimization step by minimizing the above cost function [I chose AdamOptimizer].
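
The following is a rough sketch of that loop in Python. The objects game, q_network and preprocess (and methods such as predict and train_on_batch) are hypothetical stand-ins for the actual implementation in CNN_Agent.py; only the structure of the steps above (epsilon-greedy choice, 84x84 grey-scale pre-processing, transition storage, minibatches of 32) is taken from the pseudocode.

```python
import random
import numpy as np

def training_loop(game, q_network, preprocess, n_actions,
                  epsilon=0.1, gamma=0.99, batch_size=32, n_steps=150_000):
    """Sketch of the loop above; all dependencies are hypothetical stand-ins."""
    replay = []                            # stored (s, a, r, s', terminal) transitions
    state = preprocess(game.reset())       # grey-scale, 84x84, stacked frames

    for _ in range(n_steps):
        # Take an epsilon-greedy decision at the current state.
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = int(np.argmax(q_network.predict(state)))

        # Observe the resulting reward and screen image.
        image, reward, terminal = game.step(action)
        next_state = preprocess(image)

        # Store the transition and advance (restart the game if it ended).
        replay.append((state, action, reward, next_state, terminal))
        state = preprocess(game.reset()) if terminal else next_state

        # One optimization step on a random minibatch of past transitions.
        if len(replay) >= batch_size:
            batch = random.sample(replay, batch_size)
            q_network.train_on_batch(batch, gamma=gamma)
```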

Key differences with DeepMind's architecture

I chose to use:

  • a different activation function: I used the leaky ReLU function, defined as max(a*z, z) with a << 1, as opposed to the plain ReLU function defined as max(0, z). The reason for this was that the ReLU function led to many neurons "dying" during training: they got stuck in permanently negative territory and became impossible to train further. It took me a long time to identify this issue (a minimal definition of both functions is sketched after this list);
  • a different network: this CNN is deeper (3 convolutional layers), but uses fewer parameters (smaller filters and fewer kernels) as well as a smaller feed-forward hidden layer;
  • a more flexible graph that makes it easy to add max-pooling layers and an implementation of the dropout technique. In fairness, I ended up using none of these extra features, as they didn't seem to add any value.
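
As a reference, here is a minimal numpy definition of the two activation functions mentioned in the first bullet. The slope a = 0.01 is an illustrative choice, not necessarily the value used in the actual network.

```python
import numpy as np

def leaky_relu(z, a=0.01):
    """Leaky ReLU: max(a*z, z) with a << 1, so negative inputs keep a small
    slope instead of "dying" as with the plain ReLU."""
    return np.maximum(a * z, z)

def relu(z):
    """Plain ReLU: max(0, z); neurons stuck in negative territory stop learning."""
    return np.maximum(0.0, z)
```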

Visual processing

Before the screen's image is fed to the CNN, the input is pre-processed: the image is converted to grey scale, compressed to a smaller resolution, and stacked with previous frames [I chose to feed the CNN the 3 most recent frames stacked together].
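
A minimal sketch of this pre-processing step is shown below. The grey-scale conversion uses a simple channel mean and the resize uses nearest-neighbour sampling; the actual code may use different formulas, and all names are illustrative.

```python
import numpy as np
from collections import deque

def to_grey(frame_rgb):
    """frame_rgb: HxWx3 uint8 array captured from the screen."""
    return frame_rgb.mean(axis=2)

def resize_nearest(img, size=(84, 84)):
    """Nearest-neighbour downsampling of a 2-D image to the target resolution."""
    rows = np.linspace(0, img.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size[1]).astype(int)
    return img[np.ix_(rows, cols)]

class FrameStack:
    """Keeps the 3 most recent pre-processed frames stacked together."""

    def __init__(self, n_frames=3):
        self.frames = deque(maxlen=n_frames)

    def push(self, frame_rgb):
        grey = resize_nearest(to_grey(frame_rgb))
        self.frames.append(grey)
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(grey)            # pad on the first frames
        return np.stack(self.frames, axis=-1)   # shape (84, 84, 3)
```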

Here I show an example of an input image and its resulting (compressed) stacked images fed to the CNN:

Out of interest, here is a visualization of the CNN's first-layer filters.
Unlike in other applications, it is not at all intuitive to understand what these first-layer filters do.

Resources & Acknowledgements

Requirements

  • Python 3. I recommend this version, as it's the only one I found compatible with the libraries below;
  • PyGame, I used version 1.9.2a0. Download it from here;
  • TensorFlow, I only managed to install it on my Mac. Download it from here;
  • Asteroids, arcade game from my GitHub page, whose code can be found here.
