This is a collection of notebooks, recipes, and scripts demonstrating how to use AlgoSeek as a data provider for quantitative finance, algorithmic trading, and machine learning. The samples cover data ingestion (real-time and batch), stock and universe selection, backtesting, and feature engineering. There are also examples showing how to process financial data with AWS EMR, PySpark, and AWS SageMaker to build end-to-end ML pipelines and research intraday strategies.
The notebooks in the root of this repo are the starting point; they give broad introductions to the topics covered in more detail in the notebooks below. The remaining subdirectories are as follows:
Directory | Description |
---|---|
algoseek | Library for algoseek-specific functions |
data | Data folder (gitignored; created by the first notebook) |
Datasets | Samples and descriptions of AlgoSeek's datasets |
eda | Exploratory Data Analysis and Data Visualizations |
integrations | Using AlgoSeek with external data sources |
ML | Machine Learning Scripts |
Notebooks | Miscellaneous Notebooks |
Strategies | Trading Strategies |
WIP | Work In Progress Notebooks |
The following dataset-specific notebooks explore data at daily and intraday frequencies.
These equity datasets contain the core market data: trades, quotes, and OHLC bars.
Dataset | Description |
---|---|
BasicOHLCDaily | |
BasicAdjustedOHLCDaily | |
PrimaryOHLCDaily | |
PrimaryAdjustedOHLCDaily | |
StandardOHLCDaily | |
StandardAdjustedOHLCDaily | |
TradeAndQuote | |
TradeAndQuoteMinuteBar | |
TradeAndQuoteMinuteBarExcludingTRF | |
TradeOnly | |
TradeOnlyAdjusted | |
TradeOnlyAdjustedMinuteBar | |
TradeOnlyAdjustedMinuteBarBBG | |
TradeOnlyAdjustedMinuteBarExcludingTRF | |
TradeOnlyMinuteBar | |
TradeOnlyMinuteBarBBG | |
TradeOnlyMinuteBarExcludingTRF | |
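To relate the intraday bar datasets to the daily ones, minute bars can be rolled up into daily bars. The pandas sketch below shows one way to do this. It assumes the bars are indexed by a `DatetimeIndex` in exchange-local time and that the columns are named `open`, `high`, `low`, `close`, and `volume`; these names are illustrative assumptions, not the actual AlgoSeek schema. The result may also differ from the published daily datasets if those use different session or exchange rules.

```python
import pandas as pd


def minute_to_daily(bars: pd.DataFrame) -> pd.DataFrame:
    """Aggregate minute OHLCV bars (DatetimeIndex) into one bar per calendar day.

    Column names are assumptions for this sketch, not the AlgoSeek schema.
    """
    daily = bars.resample("1D").agg(
        {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    )
    # Days with no bars (weekends, holidays) come back with NaN prices; drop them.
    return daily.dropna(subset=["open"])


# Small example: three minute bars on one day, two on the next.
idx = pd.date_range("2023-01-03 09:30", periods=3, freq="min").append(
    pd.date_range("2023-01-04 09:30", periods=2, freq="min")
)
bars = pd.DataFrame(
    {
        "open": [10, 11, 12, 20, 21],
        "high": [11, 12, 13, 22, 23],
        "low": [9, 10, 11, 19, 20],
        "close": [11, 12, 12.5, 21, 22],
        "volume": [100, 200, 300, 50, 50],
    },
    index=idx,
)
daily = minute_to_daily(bars)
```

The `first`/`max`/`min`/`last`/`sum` mapping is the standard OHLCV roll-up. Binning by calendar day only lines up with trading days when timestamps are in exchange-local time.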
These reference datasets provide additional information about the securities covered by the equity datasets above.
Dataset | Description |
---|---|
BasicAdjustments | |
DetailedAdjustments | |
LookupBase | |
SecMasterBase | |
There are currently two ways to access the data: the AlgoSeek SDK or Boto3 (the AWS SDK for Python). Notebooks for both methods are included here; start with the introduction notebook for each.
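For the Boto3 route, a minimal sketch of listing and downloading the files under an S3 prefix might look like the following. The bucket and prefix you pass in are placeholders: use the values and AWS credentials from your own AlgoSeek access details. The helper names (`make_s3_client`, `iter_keys`, `download_prefix`) are hypothetical and are not part of the AlgoSeek SDK.

```python
import os
from typing import Iterator, List, Optional


def make_s3_client(profile_name: Optional[str] = None):
    """Create an S3 client from a named AWS profile (or the default credential chain)."""
    import boto3  # imported lazily so iter_keys can be exercised without boto3 installed

    return boto3.Session(profile_name=profile_name).client("s3")


def iter_keys(s3_client, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`.

    A single list_objects_v2 call returns at most 1,000 keys, so a paginator
    is used to walk all pages.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def download_prefix(s3_client, bucket: str, prefix: str, dest_dir: str) -> List[str]:
    """Download every object under `prefix` into dest_dir (flat layout); return local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for key in iter_keys(s3_client, bucket, prefix):
        if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
            continue
        local_path = os.path.join(dest_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_path)
        paths.append(local_path)
    return paths
```

Usage would be along the lines of `download_prefix(make_s3_client(), "your-bucket", "your/prefix/", "data")`. The flat layout keeps the sketch short, but it will overwrite files if two keys share a basename. Mirror the key structure instead if that matters for the dataset you are pulling.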
5.1) ML Preprocessing
5.2) Popular Libraries
Introductions to using AlgoSeek datasets with several popular libraries
5.3) ML Models
Some sample machine learning models to get you started
Model | Frequency | Description |
---|---|---|
Keras Univariate LSTM | Intraday | Intraday Regression |
Linear Regression | Intraday | |
LightGBM Regression | Intraday | |
Random Forest Regressor | Intraday | |
XGBoost | Intraday | |
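Most of the models above are intraday regressors. As a hedged illustration of that framing, and not the repo's own preprocessing, a linear-regression baseline that predicts the next return from lagged returns can be sketched with NumPy. The synthetic AR(2) series stands in for real minute-bar returns, and the function names are hypothetical.

```python
import numpy as np


def lagged_features(returns: np.ndarray, n_lags: int) -> tuple:
    """Return X (the n_lags previous returns, most recent first) and y (the next return)."""
    n = len(returns)
    X = np.column_stack([returns[n_lags - k - 1 : n - k - 1] for k in range(n_lags)])
    y = returns[n_lags:]
    return X, y


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [intercept, beta_1, ..., beta_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Synthetic stand-in for minute-bar returns: an AR(2) process with known coefficients.
rng = np.random.default_rng(0)
n = 20_000
noise = rng.normal(scale=1e-3, size=n)
returns = np.zeros(n)
for t in range(2, n):
    returns[t] = 0.3 * returns[t - 1] - 0.2 * returns[t - 2] + noise[t]

# Chronological split: fit on the first 80% and predict the rest (no shuffling).
X, y = lagged_features(returns, n_lags=2)
split = int(0.8 * len(y))
coef = fit_ols(X[:split], y[:split])
pred = np.column_stack([np.ones(len(X) - split), X[split:]]) @ coef
```

The same lagged-feature matrix can be fed to the tree-based models in the table. The chronological split matters more than the model choice: shuffling intraday rows leaks future information into training.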
5.4) MLOps Framework
Model | Frequency | Description |
---|---|---|
LightGBM Regressor | Intraday | [Link](ml/mlflow/lightgbm/) |
Strategy | Frequency | Description |
---|---|---|