leonardoemili / stock-price-forecasting

Distributed stock price forecasting system to predict S&P 500 stock prices.

Jupyter Notebook 91.84% TeX 7.80% Python 0.35%
deep-learning stock-market trading fundamental-analysis sp500 pytorch pytorch-lightning distributed-systems pyspark technical-analysis

Stock Price Forecasting

A neural stock price forecasting system that uses fundamental and technical analysis to predict the trend of stocks in the S&P 500 index. The main contributions of this work are summarized as follows:

  • A first approach that uses PyTorch Lightning as the learning framework and employs attention and Recurrent Neural Networks (RNNs). For further insights, read the dedicated report or the related notebook.
  • A distributed approach built with PyTorch, PySpark, and Petastorm that leverages a cluster of nodes to parallelize the computation. It builds on the first approach and extends it with Spark SQL queries, enabling the system to scale to large amounts of data. For an overview of the system, see the slides or the related notebook.
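The attention component of the first approach can be illustrated with a minimal, framework-free sketch. It assumes dot-product attention pooling over the RNN's per-step hidden states with a single scoring vector; the function names and the `query` vector are hypothetical, and the actual PyTorch Lightning model in the notebook may use a different attention variant.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention pooling over per-step RNN hidden states.

    hidden_states: one vector per time step (e.g. LSTM outputs over a window of days).
    query: hypothetical scoring vector; in a trained model it would be learned.
    Returns (weights, context): the attention weights and the weighted sum of states.
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted average of the hidden states.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Usage: three time steps with 2-dimensional hidden states.
states = [[0.1, 0.2], [0.4, -0.1], [0.3, 0.5]]
weights, context = attention_pool(states, query=[1.0, 1.0])
```

The context vector summarizes the whole input window, weighting the time steps the scoring vector considers most relevant; a regression head would then map it to the predicted price.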

Datasets

We use two datasets from Kaggle's public challenges: the first contains financial reports of S&P 500 companies from 2003 to 2013, and the second contains stock market data. By aligning the two datasets and removing outliers (see the notebooks for how the alignment is performed), we obtain an enriched dataset that supports both fundamental and technical analysis.
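One common way to align periodic financial reports with daily prices is an "as-of" join, in which each trading day receives the most recent report published on or before it. The sketch below assumes that strategy plus a simple z-score outlier filter; both are illustrative choices, the `eps` field and the helper names are hypothetical, and the notebooks describe the alignment this project actually uses.

```python
import bisect
import statistics
from datetime import date
from typing import Dict, List, Optional, Tuple

Report = Dict[str, float]


def align_reports(
    price_dates: List[date], reports: List[Tuple[date, Report]]
) -> List[Optional[Report]]:
    """As-of join: attach to each trading day the latest report dated on or before it.

    `reports` must be sorted by date; days before the first report get None.
    """
    report_dates = [d for d, _ in reports]
    aligned: List[Optional[Report]] = []
    for day in price_dates:
        # Index of the last report whose date is <= day (-1 if there is none).
        i = bisect.bisect_right(report_dates, day) - 1
        aligned.append(reports[i][1] if i >= 0 else None)
    return aligned


def drop_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Keep values whose z-score (population std) is at most `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / std <= threshold]


# Usage with made-up dates and a hypothetical "eps" field.
days = [date(2010, 1, 4), date(2010, 3, 1), date(2011, 1, 3)]
reports = [(date(2010, 2, 1), {"eps": 1.0}), (date(2010, 12, 31), {"eps": 2.0})]
aligned = align_reports(days, reports)  # [None, {'eps': 1.0}, {'eps': 2.0}]
```

Joining only on reports dated on or before each day avoids leaking future fundamentals into past samples. The z-score threshold is an arbitrary example; IQR or percentile clipping are equally common filters.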

Results

The following benchmark shows the performance of our trading strategy algorithm (details in the slides, pages 14-16).

Model                  MSE    R2     Adjusted R2  Operation accuracy  Profit
DecisionTreeRegressor  0.078  0.852  -            55.45%              35.97%
RandomForestRegressor  0.104  0.803  -            57.01%              51.61%
LSTM                   0.021  0.939  0.897        56.52%              58.35%
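The regression columns of the benchmark follow standard definitions, sketched below; adjusted R2 penalizes R2 by the number of predictors p given n samples. "Operation accuracy" is read here, as an assumption, as directional accuracy (whether the predicted move relative to the previous price matches the actual one). That reading and all helper names are hypothetical, and Profit is omitted because it depends on the trading strategy described in the slides.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; undefined (division by zero) if y_true is constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2_value: float, n_samples: int, n_features: int) -> float:
    """Penalize R2 for the number of predictors: 1 - (1 - R2)(n - 1)/(n - p - 1)."""
    return 1.0 - (1.0 - r2_value) * (n_samples - 1) / (n_samples - n_features - 1)


def direction_accuracy(
    y_true: Sequence[float], y_pred: Sequence[float], y_prev: Sequence[float]
) -> float:
    """Hypothetical "operation accuracy": share of days on which the predicted
    move relative to the previous price (up or not up) matches the actual move."""
    hits = sum(
        (p > prev) == (t > prev) for t, p, prev in zip(y_true, y_pred, y_prev)
    )
    return hits / len(y_true)


# Usage on toy values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
y_prev = [0.5, 1.0, 2.0, 3.0]
scores = (mse(y_true, y_pred), r2(y_true, y_pred), direction_accuracy(y_true, y_pred, y_prev))
```

A model can score well on MSE and R2 while still missing the direction of many daily moves, which is why the table reports a trading-oriented accuracy alongside the regression metrics.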

How to train the distributed system?

To install and configure PySpark on your local machine, follow the instructions described here. Alternatively, clone the notebook and import it into Databricks as described here.

How to test the system?

For a quick, ready-to-use test, run the test/evaluate.py script, which evaluates the distributed system using pre-trained weights for the LSTM model. Alternatively, you can re-train the system with a model of your choice and use the new weights for the evaluation.

Project structure

.
├── data/                     # Stock prices and fundamental data
├── report/
│   ├── main.pdf              # Project report for the dlai-2021 course
│   ├── main.tex
│   └── ...
├── test/
│   ├── data/                 # Model weights and test data
│   ├── evaluate.py           # Evaluation script
│   └── ...
├── dist_forecasting.ipynb    # PySpark distributed stock prediction system
├── forecasting.ipynb         # Stock prediction system
├── environment.yml           # Training environment
└── ...

Contributors

alessioluciani, leonardoemili