Giter Site home page Giter Site logo

typicle / master Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sjtu-quant/master

0.0 0.0 0.0 12.09 MB

This is the official code and supplementary materials for our AAAI-2024 paper: MASTER: Market-Guided Stock Transformer for Stock Price Forecasting. MASTER is a stock transformer for stock price forecasting, which models the momentary and cross-time stock correlation and guide feature selection with market information.

Python 100.00%

master's Introduction

Readme

This is the official code and supplementary materials for our AAAI-2024 paper: MASTER: Market-Guided Stock Transformer for Stock Price Forecasting. ArXiv preprint

MASTER is a stock transformer for stock price forecasting, which models the momentary and cross-time stock correlation and guides feature selection with market information.

MASTER framework

Our original experiments were conducted in a complex business codebase developed based on Qlib. The original code is confidential and exhaustive. In order to enable anyone to quickly use MASTER and reproduce the paper's results, here we publish our well-processed data and core code.

Usage

  1. Install dependencies.
  • pandas == 1.5.3
  • torch == 1.11.0
  1. Install Qlib. We have minimized the reliance on Qlib, and you can simply install it by
  • pip install pyqlib
  • pylib == 0.9.1.99
  1. Download data from OneDrive link (or an alternative choice MEGA link) and unpack it into data/ .

  2. Run main.py.

  3. We provide two trained models: model/csi300master_0.pkl, model/csi800master_0.pkl

Dataset

Form

The downloaded data is split into training, validation, and test sets, with two stock universes. Note the csi300 data is a subset of the csi800 data. You can use the following code to investigate the datetime, instrument, and feature formulation.

with open(f'data/csi300/csi300_dl_train.pkl', 'rb') as f:
    dl_train = pickle.load(f)
    dl_train.data # a Pandas dataframe

In our code, the data will be gathered chronically and then grouped by prediction dates. the data iterated by the data loader is of shape (N, T, F), where:

  • N - number of stocks. For CSI300, N is around 300 on each prediction date; For CSI800, N is around 800 on each prediction date.
  • T - length of lookback_window, T=8.
  • F - 222 in total, including 158 factors, 63 market information, and 1 label.

Market information

For convenient reference, we extract and organize market information from the published data into data/csi_market_information.csv. You can check the datetime and feature formulation in the file. Note that m is shared by all stocks. The market data is generated by the following pseudo-code.

m = []
for S in csi300, csi500, csi800:
  m += [market_index(S,-1)]
  for d in [5, 10, 20, 30, 60]:
    m += [historical_market_index_mean(S, d), historical_market_index_std(S, d)]
    m += [historical_amount_mean(S, d), historical_amount_std(S, d)]

Preprocessing

The published data went through the following necessary preprocessing.

  1. Drop NA features, and perform robust daily Z-score normalization on each feature dimension.
  2. Drop NA labels and 5% of the most extreme labels, and perform daily Z-score normalization on labels.
  • Daily Z-score normalization is a common practice in Qlib to standardize the labels for stock price forecasting. To mitigate the difference between a normal distribution and groundtruth distribution, we filtered out 5% of most extreme labels in training. Note that the reported RankIC compares the output ranking with the groundtruth, whose value is not affected by the label normalization.

Cite

If you use the data or the code, please cite our work! ๐Ÿ˜„

master's People

Contributors

litong99 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.