
2048NN

Train a neural network to play 2048

This project uses a policy network and tree search to find the optimal moves. The neural network is trained through self-play reinforcement learning.

The nibble update

  • Changed board engine to use nibbles and bitwise operators, as proposed in github/nneonneo/2048-ai.
  • play_fixed is 80 times faster
  • play_fixed_batch is 11 times faster (the previous batch methods were 7.5x faster than non-batch play; the new method has no batch acceleration)
  • mcts_fixed_batch (mean log score) is 4.7 times faster
  • Unfortunately, network forward is still the bottleneck.
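The nibble representation above can be illustrated with a minimal sketch (function names here are illustrative, not the project's actual API): each cell stores log2 of its tile in one 4-bit nibble of a single 64-bit integer, so a whole row is a 16-bit value that can drive precomputed move lookup tables.

```python
# Minimal sketch of a nibble bitboard, assuming 4x4 cells packed
# row-major into one 64-bit int, each nibble holding log2(tile)
# (0 means empty). Names are illustrative, not the project's API.

ROW_MASK = 0xFFFF

def get_row(board: int, i: int) -> int:
    """Extract row i (0..3) as a 16-bit value of four nibbles."""
    return (board >> (16 * (3 - i))) & ROW_MASK

def get_cell(board: int, i: int, j: int) -> int:
    """Return log2 of the tile at (i, j); 0 means empty."""
    return (get_row(board, i) >> (4 * (3 - j))) & 0xF

# Example: row 0 holds tiles [2, 2, 4, 0] -> nibbles 1, 1, 2, 0
board = 0x1120_0000_0000_0000
assert get_row(board, 0) == 0x1120
assert get_cell(board, 0, 2) == 2  # log2(4)
```

Because a row fits in 16 bits, every possible row move can be precomputed once into a 65536-entry table, which is where the large speedups come from.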

Other changes

  • Soft classification targets during training.
  • min_move_dead playouts
  • Fixed10: On 20200213, remade all training data using fixed LUDR with 10 playouts, min_move_dead averaged over 4 runs.

Milestones:

Network name                                                    % policy games achieving 2048
20200126/soft3.5_20_200_c64b3_p10_bs2048lr0.08d0.0_s4_best      0.4
20200128/20_400_soft3.5c64b3_p10_bs2048lr0.08d0.0_s2pre_best    1.88
20200130/0_600_soft3.5c64b3_p10_bs2048lr0.08d0.0_s7pre_best     4.2
20200205/0_800_soft3.0c64b3_p10_bs2048lr0.1d0.0_s0_best         6.16
20200207/0_1000_soft3.0c64b3_p10_bs2048lr0.1d0.0_s7_best        7.32
20200213/0_800_soft3.0c64b3_p10_bs2048lr0.1d0.0_s2_best         9.0
20200213/0_3400_soft3.0c64b3_p10_bs2048lr0.1d0.0_s0_best        24.0

Findings:

  • Using fixed move order (Left, Up, Right, Down) can reach 2048 occasionally.
  • Hyperparameter optimization is necessary for training strong models.
  • Models tend to play better during the 'late game' (higher-score boards), possibly due to the training data distribution.
  • Strong trained models prefer the move order (Left, Up, Down, Right). This fixed order is indeed slightly stronger than the initially proposed order. It makes sense in retrospect.

Monte Carlo playout process:

Given the current board, for each legal move, a number (e.g. 50) of games starting from that move are played to the end. Subsequent moves in each playout game are made according to either a fixed move order or the output of a neural network model (i.e. a policy network). The log of the scores of each playout are averaged to produce a final log score for each initial legal move. The chosen move for the initial board state is the one with the highest log score. No bias is used for the initial move.
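The selection step above could be sketched as follows. This is a hedged sketch, not the project's code: `legal_moves`, `apply_move`, and `rollout` are hypothetical callables standing in for the engine functions, and playout scores are assumed positive so the log is defined.

```python
import math

def choose_move(board, legal_moves, apply_move, rollout, n_playouts=50):
    """Pick the move whose playouts have the highest mean log score.

    `rollout` plays one game to the end from the given position
    (subsequent moves follow a fixed order or a policy network)
    and returns its final score.
    """
    best_move, best_log = None, -math.inf
    for move in legal_moves(board):
        after = apply_move(board, move)  # no bias on the initial move
        total = 0.0
        for _ in range(n_playouts):
            total += math.log(rollout(after))
        mean_log = total / n_playouts
        if mean_log > best_log:
            best_move, best_log = move, mean_log
    return best_move
```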

This Monte Carlo playout process results in much stronger moves than the policy generating the moves during playouts. These stronger moves allow the main game line to reach much higher scores and tile values. The stronger moves from the playout process are then used for training the neural network to increase the strength of its policy, which feeds back into stronger playout results.

TODO: describe min_move_dead

Dependencies:

  • numpy
  • torch [pytorch]
  • ax [ax-platform] (for optimization)
  • curses [ncurses / windows-curses] (for manual play)

Contributors

fqjin

Issues

Test tile value scaling

Boards can be scaled so that the maximum tile is unit-valued. However, this removes information about the minimum tile value (2), which is transformed into a fraction. A minimum scale value may need to be input as a channel. This can be tested by scaling boards right before training.
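A minimal sketch of this experiment, assuming the board is a numpy array of raw tile values (0 = empty); `scale_board` is a hypothetical helper, not an existing project function:

```python
import numpy as np

def scale_board(board: np.ndarray):
    """Scale tiles so the maximum tile becomes 1, and return the
    scale (the max tile value) as an extra constant channel so the
    network retains the absolute magnitude that scaling discards
    (e.g. that the smallest tile is 2)."""
    max_tile = board.max()
    scaled = board / max_tile                     # max tile -> 1.0, 2 -> fraction
    scale_channel = np.full_like(board, max_tile, dtype=float)
    return scaled, scale_channel
```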

Try conservative strategy

The current mcts algorithm chooses the move with the highest average log score. A more conservative strategy might choose the move with the highest minimum score. This strategy would choose moves that lead to higher probability of near-term survival, hopefully leading to games that choose high-probability but low-reward lines rather than low-probability but high-reward lines.

An added benefit is that the mcts process can be terminated as soon as a single game dies. I estimate that this will save a significant amount of computation time, up to 2 orders of magnitude.

I hypothesize that the resulting games will likely not achieve high scores, but should be more consistent in terms of not dying early. If the goal is to achieve 2048 consistently but nothing higher, this might be the way to go.
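The max-min selection with early termination could be sketched as below; `playout` is a hypothetical callable returning a final score, with scores at or below `death_score` meaning the game died:

```python
import math

def choose_move_maxmin(board, moves, playout, n_playouts=50, death_score=0):
    """Conservative variant: score each move by its *worst* playout,
    and abandon a move's remaining playouts as soon as one game dies,
    since its minimum can no longer improve."""
    best_move, best_min = None, -math.inf
    for move in moves:
        worst = math.inf
        for _ in range(n_playouts):
            worst = min(worst, playout(board, move))
            if worst <= death_score:   # one dead game: stop early
                break
        if worst > best_min:
            best_move, best_min = move, worst
    return best_move
```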

Draw pretty boards

Implement method to convert boards to pretty graphics for displaying on main page.

Move generate_tile inside moves

Add a generate_tile() at the end of each move so that it does not need to be called separately after each move in the mcts and play functions.
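A sketch of the proposed refactor; `shift_board` and `generate_tile` are hypothetical stand-ins for the engine functions:

```python
def move(board, direction, shift_board, generate_tile):
    """Apply one move and, if the board changed, spawn the new tile
    immediately, so callers in the mcts and play functions never pair
    the two steps by hand."""
    new_board, score, changed = shift_board(board, direction)
    if changed:                      # only spawn a tile after a legal move
        new_board = generate_tile(new_board)
    return new_board, score, changed
```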

Try neuroevolution

Monte Carlo search takes too long and may not converge to the best move due to the non-normal stopping behavior of the game. Neuroevolution is an alternative, where game playouts are used to evaluate a population, and the best players are selected. This allows direct selection of models that lead to high scores / move counts rather than trying to fit models to a proxy, the estimated best move.

Rewrite training for pytorch

Ideally, training should sample from a set of many games. This is achievable when the game generating process (batch MCTS) is accelerated (see #10 ).

Add unit tests

Unit tests are helpful to make sure core mechanics are not broken.
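As a minimal example, a core mechanic such as row merging can be pinned down with plain assert-based tests. The `merge_row` below implements the standard 2048 merge rule (each pair merges at most once per move, left to right) and is a hypothetical helper, not necessarily the project's function.

```python
def merge_row(row):
    """Merge a 4-cell row to the left; return (new_row, score_gained)."""
    tiles = [t for t in row if t != 0]   # slide out the gaps
    out, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)     # merge a pair once
            score += tiles[i] * 2
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (4 - len(out)), score

def test_merge_row():
    assert merge_row([2, 2, 0, 0]) == ([4, 0, 0, 0], 4)
    assert merge_row([2, 2, 2, 2]) == ([4, 4, 0, 0], 8)   # pairs merge once
    assert merge_row([2, 0, 2, 4]) == ([4, 4, 0, 0], 4)   # merge across a gap
    assert merge_row([2, 4, 2, 4]) == ([2, 4, 2, 4], 0)   # nothing merges
```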

Batch process returns poor game results

Using make_data with mcts_nn and mcts_batch gives different results: mcts_batch games end earlier on average, with scores up to half as high. I briefly tested both algorithms and currently have no explanation. Training is not possible without good-quality data.

Speed up mcts

Current self-play is still too slow. The biggest own-time bottlenecks are:

CPU (7 seconds; for some reason, loading the model takes much longer here, +3 sec):

  • conv2d: 31%
  • move_batch: 14%
  • merge_row_batch: 13%
  • generate_tile: 12%

GPU (15 seconds):

  • move_batch: 32%
  • zeros: 18%
  • generate_tile: 15%
  • merge_row_batch: 15%
