
Teaching the computer to play Tic Tac Toe using Deep Q Networks

License: MIT License

Python 100.00%
deep-learning deep-neural-networks reinforcement-learning deep-reinforcement-learning q-learning tic-tac-toe artificial-intelligence machine-learning

tic_tac_toe's Introduction

Tic Tac Toe played by Double Deep Q-Networks

tic_tac_toe

This repository contains a (successful) attempt to train a Double Deep Q-Network (DDQN) agent to play Tic-Tac-Toe. It learned to:

  • Distinguish valid from invalid moves
  • Comprehend how to win a game
  • Block the opponent when it poses a threat

Key formulas of the algorithms used:

Double Deep Q-Networks:

Based on the DDQN algorithm by van Hasselt et al. [1]. The cost function used is:

$$\text{cost} = \Big( r + \gamma\, Q_{\vartheta}\big(s',\ \arg\max_{a'} Q_{\theta}(s', a')\big) - Q_{\theta}(s, a) \Big)^2$$

where θ represents the trained Q-Network and ϑ represents the semi-static Q-Target network.

The Q-Target update rule is based on the DDPG algorithm by Lillicrap et al. [2]:

$$\vartheta \leftarrow \tau\,\theta + (1 - \tau)\,\vartheta$$

for some 0 ≤ τ ≤ 1.
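
As a rough illustration only (not the repository's actual implementation), the two formulas translate to something like the sketch below, where q_online_next and q_target_next are the next-state Q-value vectors produced by θ and ϑ respectively:

    import numpy as np

    def ddqn_target(reward, q_online_next, q_target_next, gamma, done):
        # the trained network (theta) selects the best next action...
        best_action = np.argmax(q_online_next)
        # ...and the semi-static target network (vartheta) evaluates it
        bootstrap = q_target_next[best_action]
        return reward if done else reward + gamma * bootstrap

    def soft_update(theta_weights, vartheta_weights, tau):
        # Polyak averaging: vartheta <- tau * theta + (1 - tau) * vartheta
        return [tau * w + (1.0 - tau) * wt
                for w, wt in zip(theta_weights, vartheta_weights)]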

Maximum Entropy Learning:

Based on a paper by Haarnoja et al. [3] and designed according to a blog post by BAIR [4]. Q-values are computed using the Soft Bellman Equation:

$$Q(s_t, a_t) = r_t + \gamma\, \alpha \log \sum_{a'} \exp\big( Q(s_{t+1}, a')\,/\,\alpha \big)$$

where α is the entropy temperature (α = 1 gives the plain log-sum-exp "soft maximum").
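
A minimal sketch of that backup, for illustration only (the temperature argument and its default are assumptions, not taken from the code):

    import numpy as np

    def soft_bellman_target(reward, q_next, gamma, alpha=1.0, done=False):
        # replace the hard max over next-state Q-values with a log-sum-exp
        # "soft maximum", scaled by the temperature alpha
        soft_max = alpha * np.log(np.sum(np.exp(np.asarray(q_next) / alpha)))
        return reward if done else reward + gamma * soft_max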

Trained models:

Two types of agents were trained:

  • a regular DDQN agent, named 'Q'
  • an agent which learns using maximum entropy, named 'E'

Both models use a cyclic memory buffer as their experience-replay memory.
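
A cyclic buffer simply overwrites the oldest stored transition once the memory is full. A minimal sketch of the idea (not the repository's implementation):

    import random
    from collections import deque

    class CyclicMemory:
        def __init__(self, size):
            # a deque with maxlen silently drops the oldest item when full
            self.buffer = deque(maxlen=size)

        def append(self, transition):
            # transition = (state, action, reward, next_state, done)
            self.buffer.append(transition)

        def sample(self, batch_size):
            return random.sample(list(self.buffer), batch_size)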

All pre-trained models are stored under the models/ directory, which holds several trained models for each variant. Q files refer to DDQN models and E files refer to DDQN-Max-Entropy models.

Do it yourself:

main.py holds several useful functions (see their doc-strings for more details; a short usage sketch follows the list):

  • train will initiate a single training process. It will save the weights and plot graphs. Using the current settings, training took me around 70 minutes on a 2018 MacBook Pro
  • multi_train will train several DDQN and DDQN-Max-Entropy models
  • play allows a human player to play against a saved model
  • face_off can be used to compare models by letting them play against each other
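
For illustration only (the real functions may require arguments, such as model names or file paths; check the doc-strings):

    import main

    main.train()        # train a single model, save its weights and plot graphs
    main.multi_train()  # train several DDQN and DDQN-Max-Entropy models
    main.play()         # play against a saved model as a human
    main.face_off()     # let two saved models play against each other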

The DeepQNetworkModel class can be easily configured using these parameters (among others); an illustrative construction sketch follows the list:

  • layers_size: set the number and size of the hidden layers of the model (only fully-connected layers are supported)
  • memory: set memory type (cyclic buffer or reservoir sampling)
  • double_dqn: set whether to use DDQN or a standard DQN
  • maximize_entropy: set whether to use maximum entropy learning or not

See the class doc-string for all possible parameters.
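
For illustration, constructing the model might look roughly like the sketch below. The import path and all values are assumptions rather than excerpts from the code; only the parameter names come from the list above:

    from deep_q_networks import DeepQNetworkModel  # module path is an assumption

    memory = ...  # an experience-replay instance (cyclic buffer or reservoir sampling)

    model = DeepQNetworkModel(
        layers_size=[128, 128],   # two fully-connected hidden layers (example sizes)
        memory=memory,            # which memory type to use
        double_dqn=True,          # DDQN rather than a standard DQN
        maximize_entropy=False,   # True would give the max-entropy ('E') variant
    )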




References:

  1. Hado van Hasselt et al., Deep Reinforcement Learning with Double Q-learning
  2. Lillicrap et al., Continuous control with deep reinforcement learning
  3. Haarnoja et al., Reinforcement Learning with Deep Energy-Based Policies
  4. Tang & Haarnoja, Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning (blogpost)


tic_tac_toe's Issues

Network never copied

Hello,

I read what you wrote on medium and was studying your approach. I noticed this condition:

if self.memory.counter % (self.batches_to_q_target_switch * self.learning_batch_size) == 0:

is never True, so the Q-Network is never copied to the Q-Target network. I even tried stopping the run as soon as the condition became True, but it was never reached.

I'm guessing that instead of self.memory.counter you wanted to use self.learn_counter, since they're just one unit apart anyway.
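
For illustration, the suggested (untested) change would look like:

    # count completed learning steps instead of stored memories
    if self.learn_counter % (self.batches_to_q_target_switch * self.learning_batch_size) == 0:
        ...  # copy the Q-Network weights into the Q-Target network here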

Am I missing something? If I'm right, I don't understand how the average reward can get better if the target never changes.

Would you kindly review this and get back to me?

Thank you
