Giter Site home page Giter Site logo

snake-rl's Introduction

snake-rl

Snake learns to survive using reinforcement learning.

Snake

Q-learning algorithm

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. This is done by means of a look-up table called Q-table which stores a quality value Q for every pair of state S and action a to be taken in that state. Meaning that in a given state S, the best move a is the one associated with the highest Q value.

At the start, the Q-table is initialized with zeros and its entries are updated iteratively during the game using the Bellman equation. To fill up the Q-table quickly, a strategy called ε-greedy is implemented to encourage exploration of game states by choosing random moves with probability ε or optimal moves using the Q-table values with probability (1-ε). At the start of the game, ε is set to 1.0 (exploration phase) and is slowly decreased throughout the training to 0.0 (exploitation phase).

State representation

The Q-learning algorithm, despite the strategies to encourage exploration, falters with increasing numbers of states/actions since the likelihood of the agent visiting a particular state and performing a particular action is increasingly small.

However, since Snake is a really simple game we can engineer a state representation S which only takes up 8-bits of information, meaning we only have to deal with 256 states!

Here's how:

  • Bit0: Apple Y coordinate is greater than Snake's head Y coordinate (Apple is above)
  • Bit1: Apple Y coordinate is smaller than Snake's head Y coordinate (Apple is below)
  • Bit2: Apple X coordinate is smaller than Snake's head X coordinate (Apple is on the left)
  • Bit3: Apple X coordinate is greater than Snake's head X coordinate (Apple is on the right)
  • Bit4: Snake's body or wall just above Snake's head
  • Bit5: Snake's body or wall just below Snake's head
  • Bit6: Snake's body or wall just left of Snake's head
  • Bit7: Snake's body or wall just right of Snake's head

In this way, we can compress all the redundant information given by the full game matrix in a single byte, also making the state independent of the matrix dimensions. Given that there are 5 actions { IDLE, UP, DOWN, LEFT, RIGHT }, the shape of the Q-table will be: (256, 5).

Reward

Finally, the reward given to the Snake during training is the following:

  • +10: If the Snake eats an apple
  • -10: If the Snake dies
  •    0: Otherwise

This value is used in the Bellman equation as the immediate reward R for taking action a in state S.

Performance

The Snake typically learns how to play in about 15/20 minutes and, if more time is given, it can even get to grow 90 blocks in length! However, given its short sightedness due to the simplistic state space representation, this is the maximum that he was able to achieve.

Installation

snake-rl is written in the C language and uses the raylib simple cross-platform graphics library built on top of OpenGL. You can follow the installation steps of raylib for your operating system at the raylib GitHub repository.

Disclamer: Although raylib is also supported on Windows and macOS, the game has been compiled and tested only on GNU Linux.

Build using make

You can build snake-rl using make, just use the following:

git clone https://github.com/Gualor/snake-rl.git
cd snake-rl
make

Usage

For running the application just use the following:

./build/snake

This will create a new game window and Snake will start learning right away how to play at lightning speed!

Save/load state

Training sessions can take up some time until we can see Snake surviving and getting longer and longer. You can save the Q-table values to file and load it next time you want to resume the training:

./build/snake -s .

This will create a snake.qtable file which contains the Q-table values in CSV format. If the file already exists it will be overwritten.

For loading the snake.qtable file and resume training, run the following:

./build/snake -l ./snake.qtable -s .

This will load the Q-table and save further progress onto the same file.

NB: ε parameter is not stored inside the snake.qtable file and, therefore, when loading the file ε is automatically set to 0.0 since it is assumed that the exploration phase has already ended.

snake-rl's People

Contributors

gualor avatar

Stargazers

Tony Vu avatar Manuel Carzaniga avatar  avatar chattershuts avatar Andrea Bonvini avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.