sidmohan0 / quant-train

Python repository to ingest data, engineer features, and train, backtest, and run a random forest model that predicts the direction of the S&P 500 at the start of the next day's trading session.

License: MIT License

Python 100.00%
black-scholes ml options-pricing options-trading polygon quant quantitative-finance sp500 sp500-data-analysis

quant-train

Overview

A systematic overnight trading strategy for the S&P 500, built on a random forest model. It covers data ingestion through the Polygon API, feature engineering, storage in a MySQL database, a backtesting framework with visualizations, and code for a production deployment.

I'll be updating this regularly to add more functionality; see the Future Updates section for details. Interested in collaborating, contributing, or just want to chat? Please reach out to me directly at [email protected].

Overview

This repo is a collection of Python scripts that do the following:

  • download SPY spot and options data
  • calculate overnight returns and volatility
  • solve for the Greeks of the option contracts (check out feature_functions.py)
  • train a random forest model to predict overnight returns
  • backtest the model
  • calculate the performance metrics of the model
  • send an email with the prediction using Gmail
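As a rough illustration of the "overnight returns" and "performance metrics" steps above, here is a minimal sketch using only the standard library. It assumes the overnight return is measured from one session's close to the next session's open, and that the prediction target is simply whether that return is positive. The `Bar` class, the function names, and the prices are hypothetical and are not taken from the repo's scripts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bar:
    """One daily bar. Field names are illustrative, not the repo's schema."""
    date: str
    open: float
    close: float


def overnight_returns(bars: List[Bar]) -> List[float]:
    """Close-to-next-open returns, one per consecutive pair of bars."""
    return [nxt.open / cur.close - 1.0 for cur, nxt in zip(bars, bars[1:])]


def direction_labels(returns: List[float]) -> List[int]:
    """Label each overnight return 1 (up) or 0 (flat or down)."""
    return [1 if r > 0 else 0 for r in returns]


def hit_rate(predicted: List[int], actual: List[int]) -> float:
    """Fraction of predictions that match the realised direction."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up prices, for illustration only.
    bars = [
        Bar("day 1", 100.0, 102.0),
        Bar("day 2", 103.0, 101.0),
        Bar("day 3", 100.5, 104.0),
    ]
    rets = overnight_returns(bars)
    labels = direction_labels(rets)
    print(rets, labels, hit_rate([1, 1], labels))
```

In a real backtest the predicted labels would come from the trained model, and hit rate would be only one of several metrics.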

Installation

  1. Clone the repo
  2. Use a virtual environment. I use pyenv with Python 3.8.10 on Ubuntu 20.04
  3. Run pip install -r requirements.txt

Future Updates

Quality of Life

  • dependency management with pipenv or Poetry
  • refactor feature engineering parts
  • refactor backtesting parts
  • refactor into packages and modules

Data

  • Make fuller use of the Polygon Python client to speed up ingestion of raw data such as options contracts

Automation

  • run_backtest.sh
  • run_prod.sh
  • some sort of cron job

Containerization

  • Dockerfile
  • Orchestrator (Kubernetes, Docker Swarm, etc.) and deployment scripts, which could also make the Automation section redundant

Processing

By far the slowest part of the process is the feature engineering. Although the program ultimately holds a lot of contract data in memory, the feature engineering step, where we calculate the Greeks of the contracts, runs many operations over the dataframes. Some things I'm thinking about and am open to suggestions on:

  • Polars for faster processing of DataFrames
  • GPU acceleration
  • PySpark for distributed processing, used alongside or instead of Polars
  • Possibly add to or update the options functions in feature_functions.py to use tf-quant-finance
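For context on what the Greek calculation involves per contract, here is a minimal sketch of closed-form Black-Scholes Greeks using only the standard library. It assumes a European option with no dividends, which is a simplification for SPY options, and it is not the implementation in feature_functions.py. The `Greeks` class, function names, and parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Greeks:
    delta: float
    gamma: float
    vega: float   # per 1.00 change in volatility (not per 1%)
    theta: float  # per year (divide by 365 for a per-day figure)


def _norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_greeks(spot: float, strike: float, t: float, r: float,
                         sigma: float, is_call: bool = True) -> Greeks:
    """Greeks for a European option with no dividends.

    t is time to expiry in years, r the continuously compounded risk-free
    rate, and sigma the annualised volatility.
    """
    if min(spot, strike, t, sigma) <= 0:
        raise ValueError("spot, strike, t and sigma must be positive")
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    pdf_d1 = _norm_pdf(d1)
    discounted_strike = strike * math.exp(-r * t)

    # Gamma and vega are the same for calls and puts.
    gamma = pdf_d1 / (spot * sigma * sqrt_t)
    vega = spot * pdf_d1 * sqrt_t
    decay = -spot * pdf_d1 * sigma / (2.0 * sqrt_t)
    if is_call:
        delta = _norm_cdf(d1)
        theta = decay - r * discounted_strike * _norm_cdf(d2)
    else:
        delta = _norm_cdf(d1) - 1.0
        theta = decay + r * discounted_strike * _norm_cdf(-d2)
    return Greeks(delta, gamma, vega, theta)
```

Calling a scalar function like this once per row is the kind of work that gets slow across many contracts, which is why a columnar or vectorised approach is attractive for this step.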

Modeling

  • Hyperparameter tuning
  • Feature selection framework

Acknowledgement:

Credit to QuantGalore for the original code. I've dressed it up a bit, but as of the initial commit I owe the core pieces to them. Check out their repo and blog post.

Changelog

11/3/23

