
Bitcoin Price Prediction Using Transformers

In this project we used a Transformer (encoder-decoder) architecture to predict the Bitcoin price over a chosen future horizon. Our raw data holds almost one year of per-minute Bitcoin prices (closing, opening, etc.). We extracted additional statistics from the data using common financial technical indicators (via FinTA), while making sure the resulting features have low correlation with one another. We then fed the data to the model and trained it to predict the chosen future horizon based on past values. We optimized the hyperparameters of the model using Optuna.

Previous Work

Bitcoin price prediction using an RNN:

https://www.kaggle.com/muharremyasar/btc-historical-with-rnn

IBM stock price prediction using Transformer-Encoder:

https://github.com/JanSchm/CapMarket/blob/master/bot_experiments/IBM_Transformer%2BTimeEmbedding.ipynb

Short-term stock price prediction using an LSTM, with a simple trading bot:

https://github.com/roeeben/Stock-Price-Prediction-With-a-Bot/blob/main/README.md

Data Processing

We are using Bitcoin historical one-minute records from (UTC+8) 2021-01-01 00:00:00 to 2021-12-05 23:59:00, containing 488,160 records from the OKEx exchange. We got the data from https://www.kaggle.com/aipeli/btcusdt, and it is also included in our repository (data/okex_btcusdt_kline_1m.csv.zip).

Setting aside the timestamp feature, the data contains 5 features: the opening price, highest price, lowest price, closing price, and volume of transactions per minute. We calculated the correlations between the features and noticed that the first 4 (the prices) are highly correlated with one another, so we wanted to add more meaningful features to the data before handing it to the model. For that we used FinTA, which implements common financial technical indicators in Pandas. We kept only the features that are low-correlated with all the others and made sure they all use only past samples (so we won't accidentally use the future). After choosing them we dropped the rows containing NaNs and ended up with a total of 34 features and 488,029 samples (the first 131 samples are lost to indicator warm-up).
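A minimal sketch of this step follows. It is not our exact notebook code: the particular indicators shown and the 0.9 correlation threshold are illustrative assumptions.

```python
import pandas as pd
from finta import TA

# FinTA expects lowercase OHLCV column names: open, high, low, close, volume.
df = pd.read_csv("data/okex_btcusdt_kline_1m.csv.zip")

# Add a few example indicators (the real notebook adds many more).
df["rsi"] = TA.RSI(df)
df["obv"] = TA.OBV(df)
df["atr"] = TA.ATR(df)

# Indicators only use past samples, but their warm-up periods produce NaNs
# at the start of the series; drop those rows.
df = df.dropna().reset_index(drop=True)

# Greedily drop one feature out of every highly correlated pair
# (threshold 0.9 is an assumption for illustration).
corr = df.corr().abs()
to_drop = set()
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a not in to_drop and b not in to_drop and corr.loc[a, b] > 0.9:
            to_drop.add(b)
features = df.drop(columns=sorted(to_drop))
```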

After that we split the data into train (80%), validation (10%), and test (10%), in chronological order as can be seen here:

(see images/Data_Separation.png)

The training data is then scaled, and the validation and test sets are scaled with the scaler fitted on the training data.
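A minimal sketch of the split and scaling, assuming the `features` DataFrame from the previous sketch; the scaler is fitted on the training portion only, so no information leaks from the future:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = features.to_numpy(dtype=np.float32)
n = len(values)
train_end, val_end = int(0.8 * n), int(0.9 * n)

scaler = MinMaxScaler()  # or StandardScaler, per the `scaler` hyperparameter
train = scaler.fit_transform(values[:train_end])   # fit on train only
val = scaler.transform(values[train_end:val_end])  # scaled accordingly
test = scaler.transform(values[val_end:])
```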

Finally, we divided the training set into tensors of large sequential batches. During training we sample from each batch a sequence of length bptt_src to use as the source and a sequence of length bptt_tgt to use as the target. To create more diverse data, each epoch can start sampling from a random start point by setting the flag random_start_point to True, as sketched below.
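A sketch of this batching, in the spirit of the PyTorch language-modeling tutorial; the helper name and the batch size shown are illustrative:

```python
import torch

def batchify(data: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Trim (total_len, num_features) data to a multiple of batch_size and
    reshape it to (seq_len, batch_size, num_features), so every column of a
    batch is one contiguous slice of the time series."""
    seq_len = data.size(0) // batch_size
    data = data[: seq_len * batch_size]
    return data.view(batch_size, seq_len, -1).transpose(0, 1).contiguous()

train_data = batchify(torch.as_tensor(train).float(), batch_size=32)

# random_start_point: shift the window origin each epoch so the sampled
# source/target sequences differ between epochs.
bptt_src, random_start_point = 10, True
start = int(torch.randint(0, bptt_src, ())) if random_start_point else 0
epoch_data = train_data[start:]
```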

Architecture

We used PyTorch's nn.Transformer as the basis of our model. Before both the encoder and the decoder we inserted a time embedding layer, and after the decoder output a linear layer.

In the time embedding layer we implement a version of Time2Vec. It adds features to the data in 2 ways:

  1. Periodic features, implemented as a linear layer followed by a sine activation - a total of periodic_features features.
  2. Linear features, implemented as a linear layer.

Both kinds of features are concatenated to the existing ones, creating a total of out_features at the output, as sketched below.
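A sketch of this layer; the class and attribute names are ours, not necessarily the notebook's:

```python
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Time2Vec-style embedding: concatenates the original features with
    periodic (sine of a linear map) and plain linear features."""
    def __init__(self, in_features: int, periodic_features: int, out_features: int):
        super().__init__()
        # out_features must exceed in_features + periodic_features (see the
        # hyperparameter list below).
        linear_features = out_features - in_features - periodic_features
        self.periodic = nn.Linear(in_features, periodic_features)
        self.linear = nn.Linear(in_features, linear_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([x, torch.sin(self.periodic(x)), self.linear(x)], dim=-1)
```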

The linear layer at the output projects back to the same number of features as the target - in_features.

The model structure is shown in images/Model_Structure.png.
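A sketch of how the pieces fit together, reusing the TimeEmbedding above; whether the embedding is shared between encoder and decoder, and the default values shown, are our assumptions for illustration:

```python
import torch.nn as nn

class BitcoinTransformer(nn.Module):
    def __init__(self, in_features=34, periodic_features=10, out_features=60,
                 nhead=15, num_encoder_layers=4, num_decoder_layers=4,
                 dim_feedforward=384, dropout=0.0, activation="gelu"):
        super().__init__()
        self.embed = TimeEmbedding(in_features, periodic_features, out_features)
        self.transformer = nn.Transformer(
            d_model=out_features, nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout, activation=activation)
        # Project the decoder output back to the target's feature count.
        self.out = nn.Linear(out_features, in_features)

    def forward(self, src, tgt, tgt_mask=None):
        # src: (bptt_src, batch, in_features), tgt: (bptt_tgt, batch, in_features)
        return self.out(self.transformer(self.embed(src), self.embed(tgt),
                                         tgt_mask=tgt_mask))
```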

Hyperparameters

  • num_features = int, number of features to choose from the full set (1 - 34)
  • scaler = str, the kind of scaler to use to scale the data (Standard Scaler - 'standard', Min Max Scaler - 'minmax')
  • train_batch_size = int, size of train batch
  • eval_batch_size = int, size of validation/test batch
  • epochs = int, number of epochs to run the training
  • bptt_src = int, the length of the source sequence
  • bptt_tgt = int, the length of the target sequence
  • overlap = int, number of overlapping samples between the source and the target
  • num_encoder_layers = int, number of encoder layers in the transformer
  • num_decoder_layers = int, number of decoder layers in the transformer
  • periodic_features = int, number of periodic features to add in the time embedding layer
  • out_features = int, number of output features after the time embedding layer (must be > in_features + periodic_features)
  • nhead = int, number of heads in the multi-head attention layers in the transformer (both encoder and decoder; must be a divisor of out_features)
  • dim_feedforward = int, dimension of the feed forward layers in the transformer (both encoder and decoder)
  • dropout = float, the dropout probability of the dropout layers in the model (0.0 - 1.0)
  • activation = str, activation function to use in the transformer (ReLU - 'relu', GeLU - 'gelu')
  • random_start_point = bool, start each epoch from a random start point within the first bptt_src samples
  • clip_param = float, the max norm of the gradients in the clip_grad_norm layer
  • lr = float, starting learning rate
  • gamma = float, multiplicative factor of learning rate decay (0.0 - 1.0)
  • step_size = int, period of learning rate decay in epochs

The most crucial thing to understand here is the relation between bptt_src, bptt_tgt, and overlap: we use bptt_src past samples to predict the following bptt_tgt - overlap samples, as illustrated below.
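A worked example with illustrative values (indices refer to positions within epoch_data from the batching sketch above):

```python
bptt_src, bptt_tgt, overlap = 10, 6, 1

i = 0  # window start
src = epoch_data[i : i + bptt_src]   # samples 0..9
j = i + bptt_src - overlap           # target starts at sample 9
tgt = epoch_data[j : j + bptt_tgt]   # samples 9..14
# The model sees samples 0..9 and predicts bptt_tgt - overlap = 5 new ones.
```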

Optuna

We used Optuna to find the optimal hyperparameters in terms of the validation loss.

We fixed or constrained some of the hyperparameters by using the knowledge we gained during the manual tuning, to make runtime more reasonable:

Hyperparameter      Value
num_features        34
train_batch_size    32
eval_batch_size     32
epochs              50
overlap             1
num_decoder_layers  num_encoder_layers
periodic_features   (out_features - num_features // 10) x 4 + 2
nhead               out_features / 4
lr                  0.5
step_size           1
gamma               0.95

For the other hyperparameters, we chose the range of possible values to optimize over.
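A sketch of what such an Optuna objective can look like; the search ranges shown are illustrative assumptions, and train_and_evaluate is a hypothetical helper that trains the model and returns the validation loss:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    params = {
        "scaler": trial.suggest_categorical("scaler", ["standard", "minmax"]),
        "bptt_src": trial.suggest_int("bptt_src", 4, 20),
        "bptt_tgt": trial.suggest_int("bptt_tgt", 2, 10),
        "num_encoder_layers": trial.suggest_int("num_encoder_layers", 1, 6),
        # step=4 keeps nhead = out_features / 4 an integer divisor of out_features
        "out_features": trial.suggest_int("out_features", 40, 80, step=4),
        "dim_feedforward": trial.suggest_int("dim_feedforward", 128, 512, step=128),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
        "random_start_point": trial.suggest_categorical("random_start_point", [False, True]),
        "clip_param": trial.suggest_float("clip_param", 0.25, 1.0),
    }
    return train_and_evaluate(params)  # hypothetical helper, returns validation loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
```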

These are the hyperparameters that were chosen:

Hyperparameter      Value
scaler              'minmax'
bptt_src            10
bptt_tgt            6
num_encoder_layers  4
out_features        60
dim_feedforward     384
dropout             0.0
random_start_point  False
clip_param          0.75

The impact of these hyperparameters on the loss is visualized in images/Optuna_Result.jpeg.

The most important one is the scaler. We also saw that bptt_src and bptt_tgt were not as important as we thought they would be.

After our final fine-tuning, we only changed bptt_tgt from the 6 suggested by Optuna to 2.

The full analysis by Optuna can be found in bitcoin_price_prediction_optuna.ipynb.

Results

We trained the model with the hyperparameters above.

Model statistics:

  • Train Loss:
  • Validation Loss:
  • Test Loss:

After this training we checked the real-time performance of the model on the test set: we fed the first 10 samples as the source (bptt_src = 10), used the 10th sample of that sequence as the start of the target (overlap = 1), and predicted the next value (bptt_tgt - overlap = 1). We then shifted the source window by one sample and predicted the next value in the same way, repeating the process until we had a prediction for every possible minute in the test set. You can see the result in images/Test_Prediction.png.
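A sketch of this rolling evaluation loop, assuming the trained model and the scaled test split from the earlier sketches:

```python
import torch

bptt_src, overlap = 10, 1
model.eval()
preds = []
with torch.no_grad():
    for i in range(len(test) - bptt_src):
        window = torch.as_tensor(test[i : i + bptt_src]).float().unsqueeze(1)  # (10, 1, F)
        # The target input is the last source sample (overlap = 1).
        tgt_in = window[-overlap:]        # (1, 1, F)
        out = model(window, tgt_in)       # (1, 1, F) prediction for the next minute
        preds.append(out[-1, 0])          # keep the newly predicted sample
```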

We can see that the general trend of the prediction is similar to the real one. That is not surprising, because we are looking at a large scale of minutes, 48,802, while each prediction is based only on the last 10 samples, so at this scale we expect both the real and the predicted series to sit around the same values. For a better analysis we need to look closer, so here is a zoomed-in view (images/Test_Presiction_Zoom_In.png):

Here we can see the differences between the real and predicted values. The trends are still somewhat similar, but sometimes the model predicts a rise or fall before it happens, for example the rise around minute 42,560 or the small fall at minute 42,580, and sometimes it just follows the existing trend, as around minute 42,600.

Usage

To retrain the model, run bitcoin_price_prediction.ipynb after choosing your hyperparameters in the first cell. Setting the flag plot_data_process to False hides all the produced data-processing images.

If you would like to do further hyperparameter tuning with Optuna, run bitcoin_price_prediction_optuna.ipynb. In the define_model function we declared the values of the fixed or constrained hyperparameters, and in the objective function we declared the hyperparameters we want to tune along with their ranges.

Files in the repository

Folder  File name                              Purpose
code    bitcoin_price_prediction.ipynb         Notebook which includes all data processing, training, and inference
        bitcoin_price_prediction_optuna.ipynb  Optuna hyperparameter tuning
data    okex_btcusdt_kline_1m.csv.zip          Zip file containing the data we used in this project
images  Data_Separation.png                    Our train-validation-test split
        Model_Structure.png                    Our model architecture
        Optuna_Result.jpeg                     Hyperparameter importances produced by Optuna
        Test_Prediction.png                    Our results on the test set
        Test_Presiction_Zoom_In.png            Our results on the test set, zoomed in
        presentation_preview.gif               GIF preview of the project presentation

Further Work

The work we presented here achieved good results, but there are definitely aspects left to improve and examine, such as:

  • Try running the model on a different stock.
  • Examine the feature extraction process and check which features are the most helpful.
  • Further tuning of the hyperparameters, releasing the constraints we put on some of them.
  • Checking the performance in real-time trading (better to start with a trading bot on the test set).

We hope this was helpful. Please let us know if you have any comments on this work:

https://github.com/yuvalaya

https://github.com/baruch1192

