Chess Match Predictor

What if you could predict the results of a chess match without knowing the players' individual moves?

The main goal of our project was to create a neural network that could anticipate the outcome of a chess game, given only a small number of variables. More specifically, our group wanted to create a machine learning model that was more effective at predicting the results of chess matches than simpler methods, such as random guessing. Our report below describes how the neural network was created, the results the model yielded, and the conclusions that can be drawn from this information.

Main Sections of Report:

  1. Introduction to Neural Networks
  2. Our Data
  3. Neural Network Overview
  4. Results and Analysis
  5. Future Work and Improvements
  6. Applications
  7. Sources and Acknowledgements

Introduction to Neural Networks

Defined formally, neural networks are computer systems modeled on the human brain and nervous system. These algorithms employ machine learning in order to produce increasingly accurate outputs based on data sets. Before understanding this process, one must first understand the architecture of a neural network. As shown in Fig. 1, a network is made up of connected layers of nodes, which work sequentially to manipulate the input in different ways. Each node represents some component of the input variables, and its value changes throughout the learning process. The nodes have different purposes depending on what kind of layer they are in. In basic sequential networks, for example, there are three main types of layers: the input layer, which takes the data in and applies weights (numerical values representing each variable's importance); the hidden layers, which apply weights to the values given by the previous layer and pass the generated values on to the next layer; and the output layer, which contains the values that represent the network's prediction.

Fig. 1 Example of basic neural network architecture (image source: W3C)
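
To make the idea of weighted layers concrete, here is a minimal NumPy sketch of what a single dense layer computes; the input values and weights are made-up placeholders, not taken from our model:

```python
import numpy as np

# A toy input with 3 features (arbitrary values for illustration).
x = np.array([0.5, -1.2, 3.0])

# One dense layer: each of the 4 nodes has one weight per input,
# plus a bias. These numbers are random placeholders.
W = np.random.randn(3, 4)
b = np.zeros(4)

# The layer multiplies inputs by weights, adds the bias, and applies
# an activation (ReLU here) before passing values to the next layer.
hidden = np.maximum(0, x @ W + b)
print(hidden)
```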

Network Learning

In order to improve a neural network's accuracy, the structure described above repeatedly produces outputs, changing the weights of each node after every pass. It makes these changes based on the backpropagation algorithm, which minimizes the network's loss: a measure of the distance between the output value and the expected value. As this is done after each output is produced, the network becomes increasingly accurate. This process is called training. During training, the network's accuracy is also tested on validation data, in order to make sure the algorithm generalizes to new data, not just the training data. If a neural network is found to have lower validation accuracy than training accuracy, it is considered to be suffering from overfitting (displayed in Fig. 2). One solution to this issue is to decrease the number of epochs (repetitions) in training, as doing so makes the network more applicable to other data. When the optimal number of layers, nodes, and epochs is found, the network can then be used on new input data.

Fig. 2 Plot showing the effects of overfitting
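
In Keras (one of the libraries we used), capping the number of epochs can also be automated with early stopping, which halts training once validation loss stops improving. A minimal sketch; `model`, `x_train`, `y_train`, `x_val`, and `y_val` are placeholders for a compiled model and prepared data:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss stops improving, and restore the
# weights from the best epoch seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=2,
                           restore_best_weights=True)

# Hypothetical usage with a compiled `model` and prepared data splits:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[early_stop])
```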

Our Data

The original data set we used for our neural network contained information about 6.3 million chess games played on Lichess.org (part of which is shown in Fig. 3), but that amount had to be decreased to about 1 million due to a lack of RAM. Initially, this set contained several variables for each game, such as each player's username, the time at which each match was played, and the date of each game. Given that most of these details were irrelevant to our specific goal of predicting which player would win, the number of columns was eventually whittled down to five.

Fig. 3 Excerpt from the original data set

Variables Used

The first variable in our input data set (Fig. 4) was the event, which represented what kind of chess game each match was. This variable is important because the amount of time in a game has a significant influence on each player's moves. There were 7 possibilities: blitz, blitz tournament, bullet, bullet tournament, classical, classical tournament, and correspondence.

The second and third columns contained the ELO of black and white, respectively. This rating is essentially a representation of a player's overall performance as a chess player. Each time a player wins or loses, their ELO increases or decreases based on their opponent's rating. For example, if one player's ELO is significantly higher than the other's, that player will most likely win the game. Given this large difference in rating, each player's ELO will only change according to the level of disadvantage: you gain more points for beating someone significantly better than you, but lose fewer if you lose to someone better. On the other hand, if two players are very evenly matched, their ELO will shift only slightly based on whether they win or lose. The theoretical maximum of this rating system is 3,000, although no one is currently at that level; if a player managed to approach it, a single loss would tank their score.

The fourth column in our data set contained the Encyclopedia of Chess Openings (ECO) code for each match. Each of these codes consists of a letter and two digits (A00-E99), which identify the specific variation of the opening. Finally, the fifth column was the game termination, which was either normal (checkmate) or time forfeit. The combination of these variables gave the network an adequate amount of information to make accurate predictions, without simply knowing all of each player's moves in a match.

Fig. 4 Visual representation of the input data set
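
For illustration, the rating dynamics described above can be sketched with the classic Elo formulas (a simplified model: the K-factor of 32 is a conventional placeholder, and Lichess's actual rating system differs in detail):

```python
def elo_expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating, expected, actual, k=32):
    """New rating after a game: actual is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (actual - expected)

# A 2000-rated player is heavily favored over a 1600-rated one...
e = elo_expected_score(2000, 1600)   # ~0.91
# ...so a win barely moves their rating, while a loss costs a lot.
print(elo_update(2000, e, 1.0))      # ~2002.9
print(elo_update(2000, e, 0.0))      # ~1970.9
```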

Target Data Set

The target data set contained the expected outputs for each chess match. The three possible outcomes were black wins, white wins, and tie. Fig. 5 shows this data set in table format.

Fig. 5 Visual representation of the target data set

Data Manipulation

In order to input our training data and target data into the neural network, they had to be converted from their raw format into a normalized, network-compatible structure. To do so, we one-hot encoded the data values using the Pandas function "get_dummies." This function changes each value in the data set from a single number into a binary array of possible values. More specifically, each column is split into multiple columns of zeros and ones, the number of which is defined by the number of possible states of the variable. For example, since the event column has 7 possibilities, it is separated into 7 columns, each containing a 0 or a 1 (displayed in Fig. 6); the column corresponding to a match's event type contains a one, and the rest contain zeros. This process normalizes the data, making it compatible with the neural network. Once we had finished one-hot encoding our data, it simply had to be converted into a NumPy array and fed into the network.

Fig. 6 Table of the one-hot encoded data set
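
A minimal sketch of this encoding step on a made-up three-game table (the column names are illustrative, not our exact schema):

```python
import pandas as pd

# A miniature stand-in for the input data set.
df = pd.DataFrame({
    "Event": ["Blitz", "Bullet", "Classical"],
    "WhiteElo": [1850, 2100, 1500],
    "BlackElo": [1900, 2050, 1480],
})

# get_dummies expands each categorical column into one 0/1 column per
# possible value; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["Event"])
print(encoded)
```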

In order to ensure that our network was not suffering from overfitting, we split the data and target arrays into training data/targets and validation/testing data/targets. Doing so allowed us to train the network using most of the data set while testing the model on "new" data, to make sure it wasn't biased towards a single data set. As shown in the image below, we placed the first 800,000 data points into the training data and target arrays, and allotted the rest of the data set (about 200,100 data points) to the validation/testing data and target arrays.

Separation of the data set into training data/target arrays and validation/testing data/target arrays
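
A sketch of this split using placeholder arrays (the 20-column shape is illustrative; the real data set had about 1,000,100 one-hot encoded rows):

```python
import numpy as np

# Placeholder arrays standing in for the one-hot encoded inputs and
# targets (~1,000,100 rows; one-hot targets over the 3 outcomes).
data = np.random.rand(1_000_100, 20)
targets = np.eye(3)[np.random.randint(0, 3, size=1_000_100)]

# First 800,000 games train the network; the remaining ~200,100 are
# held out for validation/testing on "new" data.
train_data, train_targets = data[:800_000], targets[:800_000]
val_data, val_targets = data[800_000:], targets[800_000:]
```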

Neural Network Overview

Architecture

The final build of our neural network was a dense six-layer model. After the input layer (which had as many nodes as there were columns in the data set), our 4 hidden layers had 16, 16, 8, and 4 nodes, respectively (displayed visually in Fig. 7). Our output layer then contained three nodes, as the possible outcomes of each chess game are black wins, white wins, and tie. This network structure was effective for our project because it forced the computer not to overcomplicate the weights of each variable during the prediction process. More specifically, the relatively small number of nodes per layer decreased gradually, making the value manipulation between the penultimate layer and the output layer simpler (which made the network's predictions more consistently accurate).

Fig. 7 Visualization of the neural network
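
A minimal Keras sketch of this architecture; `num_features` is a placeholder for the number of one-hot encoded input columns, which depends on the encoding:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

num_features = 20  # placeholder: one column per one-hot encoded input

model = Sequential([
    Input(shape=(num_features,)),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(8, activation="relu"),
    Dense(4, activation="relu"),
    Dense(3, activation="softmax"),  # black wins, white wins, tie
])
```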

Other Network Components

To make sure our model trained efficiently, our group chose network functions that were the most effective at performing their designated tasks in the specific context of our project. For example, we chose the rectified linear (ReLU) activation function for the input and hidden layers over sigmoid activation, because it doesn't cause issues with vanishing gradients (where the gradient of a function approaches 0, so little to no learning occurs). For the output layer activation, the clear choice was softmax, because it is particularly effective for multi-class prediction models such as our network. The function works by assigning a decimal probability to each possible result; the most likely outcome becomes the prediction.

The loss function utilized by our network was categorical cross-entropy, which works by calculating the difference between two probability distributions; in the context of our project, this means the difference between each of the model's predictions and the respective targets. The accuracy of the network was represented as a simple percentage, as our predictions were discrete classes (as opposed to continuous outputs). Lastly, our network employed the RMSProp optimization algorithm for updating weights during backpropagation. This algorithm is especially effective because it normalizes gradients using a moving average of squared gradients, which prevents exploding (overstepping) updates for large gradients and vanishing (understepping) updates for small gradients. The code required to employ these functions in a network is displayed below.

Written implementation of the loss function, layer activations, accuracy tracker, and optimizer
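
Based on the description above, the compile step would look roughly like the following (a hedged reconstruction continuing the architecture sketch from Fig. 7, not the verbatim project code):

```python
from tensorflow.keras.optimizers import RMSprop

# Categorical cross-entropy loss, RMSProp optimizer, and a plain
# accuracy metric, matching the components described above.
model.compile(optimizer=RMSprop(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```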

Results and Analysis

Our network was able to predict the outcome of a completely new chess match 62.08% of the time (after epoch 7), with a validation/testing loss of ~0.75. Figs. 8 and 9 display the evolution of this accuracy up until epoch 7, as well as afterwards, when the network began to suffer from overfitting.

Fig. 8

Fig. 9
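
Continuing the earlier sketches, a training run like the one behind these plots might look as follows (the batch size is an assumption, not taken from the project):

```python
# Train on the 800,000-game split and track validation accuracy each
# epoch; in our runs, validation accuracy peaked around epoch 7.
history = model.fit(train_data, train_targets,
                    validation_data=(val_data, val_targets),
                    epochs=7, batch_size=512)

# Evaluate on the held-out games to get the final loss and accuracy.
val_loss, val_acc = model.evaluate(val_data, val_targets)
```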

But without context, the significance of this percentage is difficult to interpret, which is why it's important to compare it to other prediction methods. The simplest algorithm to compare against is random guessing, as its expected accuracy is easy to calculate: given that there were three possible outputs, guessing the result of a chess game randomly would (theoretically) yield an accuracy of 33.33%. Our model clearly performed significantly better than this, as its accuracy is almost double that of the compared strategy. Another relatively simple prediction algorithm is to choose the outcome of each match based solely on the players' ELO (meaning the higher-rated player always wins, unless the ratings are equal, in which case the match would end in a tie). Using this strategy on our data set yielded an accuracy of 58%, further demonstrating the effectiveness of our model. Fig. 10 provides a visual representation of the accuracies of random guessing, ELO-only prediction, and our network.

Fig. 10
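
A sketch of the ELO-only baseline on made-up games (the ratings and outcomes below are placeholders; on our real validation set this strategy scored about 58%):

```python
import numpy as np

def elo_only_prediction(white_elo, black_elo):
    """Baseline: the higher-rated player is predicted to win;
    equal ratings predict a tie. 0 = black wins, 1 = white wins, 2 = tie."""
    pred = np.where(white_elo > black_elo, 1, 0)
    return np.where(white_elo == black_elo, 2, pred)

# Made-up example games (ratings and true outcomes are placeholders).
white = np.array([2000, 1500, 1700])
black = np.array([1800, 1500, 1900])
true_results = np.array([1, 2, 0])

baseline_acc = np.mean(elo_only_prediction(white, black) == true_results)
print(baseline_acc)  # 1.0 on this toy example; ~0.58 on our data set
```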

To summarize our findings, these calculations and comparisons demonstrate two main things about our data set and network. Firstly, the model overfitting at epoch 7 shows that there were imperfections in our data set and/or neural network. One possible culprit is that the data set simply wasn't large enough to prevent the model from becoming biased towards the training data. Another aspect of the project that could have caused this problem is that the network structure wasn't in its most effective form (meaning the number of layers and nodes was not completely optimized). On a more positive note, these results show that our model was successful in learning the importance of the variables in the data set other than just ELO (one of the most significant factors in predicting a match's outcome).

Future Work and Improvements

While our model worked well, there are still a few things that we want to improve in the future. First of all, we want to be able to use more RAM. We tried to run all 6 million games we had collected (and a few million more that we had stored in case we needed them), but we ran out of RAM after only a million games, even with 24 GB available. If we are able to run this on a powerful computer with hundreds of gigabytes of RAM, the model will have more games to learn from and thus will be more accurate. Another thing that we would like to do in the future is better optimize the number of layers and the number of nodes in each layer. Finding the optimal balance between these two would allow us to get the highest accuracy possible with our data and type of network.

Applications

Our model can be used to improve the openings of amateur chess players. They can look at the data, see which openings performed the best for the ELO range that they are in, and implement those openings in their own games. Obviously, chess openings are determined by both players, but with enough practice a player can steer the game somewhat in the direction they want in order to gain an advantage. Additionally, spectators who are not very familiar with chess can use this tool to get an idea of which player is most likely to win after only a few moves.

Sources and Acknowledgements

Guidance and Feedback

Professor Donald Smith, Guilford College: https://www.guilford.edu/profile/dsmith4

Libraries Used

- NumPy
- TensorFlow
- Pandas
- Keras
- Matplotlib

Original Data set

Link: https://www.kaggle.com/arevel/chess-games

Sigmoid vs. ReLU

Article: https://medium.com/geekculture/relu-vs-sigmoid-5de5ff756d93

General NN Information

Book: https://www.manning.com/books/deep-learning-with-python-second-edition

Contributors

qkdistributor, kevalshah2005, gatesgang
