Project Deep Reinforcement Learning Summer Term 2022 at University of Ulm
AlphaZero is a reinforcement learning algorithm that combines neural networks, Monte Carlo Tree Search and self-play to achieve superhuman performance in the classic games of Go, chess and shogi. In this replication study, we focus on the different network architectures that can be employed within the algorithm and compare their playing and learning performance against one another.
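During the tree search, AlphaZero-style MCTS decides which child to explore using a PUCT rule. The rule trades off the mean action value Q(s, a) against an exploration bonus weighted by the network's prior P(s, a). The sketch below uses the constant-c_puct form from AlphaGo Zero. The `Node` class, `puct_score`, `select_child` and the default `c_puct = 1.5` are illustrative assumptions, not this repository's code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal MCTS node (hypothetical field names, not the repository's)."""
    prior: float                 # P(s, a) from the policy head
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # W(s, a), sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q_value(self) -> float:
        # Mean action value Q(s, a); unvisited children default to 0.
        # Assumes value_sum is stored from the perspective of the player choosing at the parent.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # U(s, a) = c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    exploration = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + exploration


def select_child(parent: Node):
    # Return the (action, child) pair that maximizes Q + U
    return max(parent.children.items(), key=lambda item: puct_score(parent, item[1]))
```

An unvisited child with a modest prior can win the selection over a well-explored one, because its exploration bonus is not yet divided down by visits.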
- Working Monte Carlo Tree Search
- Different neural network architectures
- Functional training loop for self-play
- Working implementation for tic-tac-toe and k-in-a-row
- A (web) visualization which shows the current state of the game, as well as the current policy and value estimates
- A distributed version of the Monte Carlo Tree Search as described in the papers
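For k-in-a-row games, the terminal check generalizes the three-in-a-row test of tic-tac-toe. Here is a minimal sketch, assuming a board stored as a list of rows with `None` for empty cells. The repository's board representation may differ, and `has_k_in_a_row` is a hypothetical name.

```python
from typing import List, Optional

Board = List[List[Optional[str]]]  # rows of cells; None marks an empty cell


def has_k_in_a_row(board: Board, player: str, k: int) -> bool:
    """Return True if `player` has k consecutive stones in any row, column or diagonal."""
    rows, cols = len(board), len(board[0])
    # Four directions suffice: each line is found starting from its first stone
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != player:
                continue
            for dr, dc in directions:
                count = 1
                rr, cc = r + dr, c + dc
                # Walk along the direction while the line continues
                while count < k and 0 <= rr < rows and 0 <= cc < cols and board[rr][cc] == player:
                    count += 1
                    rr += dr
                    cc += dc
                if count >= k:
                    return True
    return False
```

Tic-tac-toe is the special case of a 3x3 board with k = 3.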
- AI agents often return invalid moves, so a random baseline has to be used as a fallback
- The C++ wrapper loses nodes when executed multithreaded
- Depending on how the weights are adapted, the C++ wrapper can sometimes fail multiple times
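A common way to reduce invalid moves is to mask the network's policy to the legal moves before picking one, and to use a random fallback only when no probability mass remains on legal moves. The sketch below shows this under the assumption of a flat policy vector indexed by move id. `choose_move` is a hypothetical helper, not the API of ai_player.py.

```python
import random
from typing import Sequence


def choose_move(policy: Sequence[float], legal_moves: Sequence[int], rng: random.Random) -> int:
    """Pick the highest-probability legal move, else a uniformly random legal move."""
    if not legal_moves:
        raise ValueError("no legal moves available")
    # Keep only the probabilities of legal moves
    masked = {move: policy[move] for move in legal_moves if 0 <= move < len(policy)}
    if sum(masked.values()) <= 0.0:
        # The network put no mass on any legal move: random baseline fallback
        return rng.choice(list(legal_moves))
    return max(masked, key=masked.get)
```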
We use conda to manage our Python dependencies. To create a new environment with the required dependencies, run:
conda env create -n alpha-zero --file environment.yml
All parameters needed to train a model are supplied in a JSON configuration file. An example can be found in config.json.
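A minimal sketch of loading such a configuration into a typed object and rejecting unknown keys. The field names below are hypothetical placeholders; consult config.json for the keys the project actually uses.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TrainingConfig:
    # Hypothetical keys for illustration only; the real ones are in config.json
    game: str = "tictactoe"
    num_simulations: int = 100
    learning_rate: float = 1e-3
    self_play_games: int = 50


def load_config(text: str) -> TrainingConfig:
    """Parse JSON text into a TrainingConfig, failing loudly on unknown keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(TrainingConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return TrainingConfig(**raw)
```

Usage: `load_config(pathlib.Path("config.json").read_text())`. Rejecting unknown keys catches typos that would otherwise silently fall back to defaults.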
To start a new self-play training run or resume an existing one, use barracks.py. Use --out to supply the name of the directory the model weights are saved to:
python barracks.py config.json --out name_of_run
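The command-line interface above can be mirrored with `argparse`. This is a sketch, not barracks.py itself. It assumes that `--out` is required and that an existing, non-empty run directory means "resume". Both are hypothetical conventions, and the real script may define more options.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    # Mirrors `python barracks.py config.json --out name_of_run`
    parser = argparse.ArgumentParser(description="Start or resume a self-play training run.")
    parser.add_argument("config", type=Path, help="JSON file with the training parameters")
    parser.add_argument("--out", required=True, help="directory where the model weights are saved")
    return parser.parse_args(argv)


def is_resume(out_dir: Path) -> bool:
    # Hypothetical convention: a non-empty run directory is resumed, otherwise a new run starts
    return out_dir.is_dir() and any(out_dir.iterdir())
```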
If you want to play against a network agent, use playing_demo.py.
The whole self-play pipeline works without the SWIG wrapper, but training performed this way is considerably slower. For a better experience, we suggest using the C++/SWIG implementation of the Monte Carlo Tree Search.
LibTorch is required to compile the wrapper. We used LibTorch 1.11.0. The PyTorch and LibTorch versions must match.
Both must also use the same CUDA Toolkit version. This means the version installed manually on your system must match the version specified in environment.yml.
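Because mismatched versions are a common cause of build or runtime failures, a quick consistency check can help. Here is a minimal sketch that compares version strings, ignoring PyTorch's local build suffix (such as `+cu113`). `versions_match` is a hypothetical helper. The LibTorch and CUDA values must be filled in from whatever you installed manually.

```python
from typing import Optional


def base_version(version: str) -> str:
    """Strip a local build suffix such as '+cu113' from a version string."""
    return version.split("+", 1)[0]


def versions_match(pytorch_version: str, libtorch_version: str,
                   pytorch_cuda: Optional[str], libtorch_cuda: Optional[str]) -> bool:
    # Both the library version and the CUDA toolkit version must agree
    return (base_version(pytorch_version) == base_version(libtorch_version)
            and pytorch_cuda == libtorch_cuda)
```

Inside the conda environment, `torch.__version__` and `torch.version.cuda` report the installed PyTorch and CUDA versions and can be passed as the first and third arguments.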
The wrapper is built using CMake. On Windows, we recommend using the CMake GUI to generate the project files for Visual Studio. Then compile the generated project in the Release configuration. The generated files are automatically copied to the correct location so that they can be used by the Python code.
The wrapper code can be found in distributed_mcts.
The SWIG wrapper uses the following third party dependencies:
- Catch2 licensed under the Boost Software License
- SpdLog licensed under the MIT License
- Threadpool Header-Only Library by Andreas Franz Borchert licensed under the MIT License
- neural_network.py: Contains the code for the neural networks in Python
- ai_player.py: Wrapper around the neural network and the MCTS; can differentiate between the Python model and the SWIG wrapper
- baselines.py: Contains the baselines for the project
- colosseum.py: Duels two agents against each other and reports how many games are won/lost
- mcts: Contains the Python implementation of the MCTS
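Here is a minimal sketch of such a duel on tic-tac-toe. It assumes agents are plain functions that map a board and a player mark to a cell index. The repository's agent interface is not described here, so `play_game`, `duel` and `random_agent` are hypothetical names. The sketch alternates which agent moves first so the first-player advantage does not skew the tally.

```python
import random
from typing import Callable, Dict, List, Optional

Board = List[Optional[str]]          # 9 cells, None = empty
Agent = Callable[[Board, str], int]  # returns the index of the chosen cell

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_game(first: Agent, second: Agent) -> Optional[str]:
    """Play one game: 'X' if the first agent wins, 'O' if the second wins, None for a draw."""
    board: Board = [None] * 9
    agents = {"X": first, "O": second}
    player = "X"
    for _ in range(9):
        move = agents[player](board, player)
        if board[move] is not None:
            raise ValueError(f"illegal move {move} by {player}")
        board[move] = player
        if winner(board):
            return player
        player = "O" if player == "X" else "X"
    return None


def duel(agent_a: Agent, agent_b: Agent, games: int) -> Dict[str, int]:
    """Play `games` games, alternating the starting agent, and tally results for agent_a."""
    stats = {"won": 0, "lost": 0, "draw": 0}
    for i in range(games):
        a_first = i % 2 == 0
        result = play_game(agent_a, agent_b) if a_first else play_game(agent_b, agent_a)
        a_mark = "X" if a_first else "O"
        if result is None:
            stats["draw"] += 1
        elif result == a_mark:
            stats["won"] += 1
        else:
            stats["lost"] += 1
    return stats


def random_agent(rng: random.Random) -> Agent:
    # Uniformly random legal move, similar in spirit to a random baseline
    return lambda board, _player: rng.choice([i for i, cell in enumerate(board) if cell is None])
```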
Files we used to generate the plots for the report:
- elo_generator.py
- generate_perf_diagram.py
- render_elo_diagram.py
- render_timing_diagram.py
- render_perf_diagram.py
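For reference, here is the standard Elo update that a script like elo_generator.py could be based on. This is an assumption: the method and the K-factor the project uses are not stated here. The expected score of A against B is 1 / (1 + 10^((R_B - R_A) / 400)), and each rating moves by K times the difference between the actual and the expected score. The function names and `k = 32` are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # Expected score of A against B under the Elo model
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b); score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate_matches(results, initial: float = 1000.0, k: float = 32.0):
    """results: iterable of (player_a, player_b, score_a); returns the final ratings."""
    ratings = {}
    for a, b, score_a in results:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = update_elo(ra, rb, score_a, k)
    return ratings
```

Because the update is sequential, the final ratings depend on the order of the games. Replaying the match list several times or in shuffled order is a common way to reduce this effect.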