Giter Site home page Giter Site logo

tashares's Introduction

tashares

Tashares is a python module to forecast China A-shares price trend in 1, 2 and 5 days. It's an open-source tool, that utilizes yfinance and talib to generate 155 techinical analysis features, and then leverage catboost to build three ranking models that order all stock prices of interest from trending up to trending down relatively.

installation

TA Lib (technical analysis library) is a prerequisite, which can be installed from source code https://ta-lib.org/hdr_dw.html.

On a Linux machine, download source code ta-lib-0.4.0-src.tar.gz , and then run the following to complete installation.

tar -xzf ta-lib-0.4.0-src.tar.gz
cd ta-lib/
./configure --prefix=/usr
make
sudo make install

Ref to Python wrapper for TA-LIB for installatiob instructions on other platforms.

Then this is the recommended way to install tashares:

pip install tashares

The latest code on github can be installed with:

pip install git+https://github.com/joeycw/tashares

quick start

After installation, launch python3, run a test

>>> from tashares.tashares import Tashares
>>> tas = Tashares()
>>> tas()

The test result looks like the following, and the ashares list is under the data folder, tashares/data/list_of_interest.

   rank     symbol       date     day_1     day_2     day_5     score                        shortname
0      0  300059.SZ 2022-02-18  0.316329  0.256538  0.338447  0.303771               east_money_informa
1      1  600000.SS 2022-02-18  0.086610  0.229318  0.278299  0.198076  shanghai_pudong_development_ban
2      2  000100.SZ 2022-02-18 -0.004794  0.180493  0.398047  0.191249               tcl_technology_gro
3      3  000725.SZ 2022-02-18 -0.056116  0.084442  0.306973  0.111766                boe_technology_gp
4      4  601899.SS 2022-02-18  0.029995 -0.122044 -0.079928 -0.057326        zijin_mining_group_co.ltd
5      5  600660.SS 2022-02-18 -0.038651 -0.057926 -0.117553 -0.071377  fuyao_glass_industry_group_co._
6      6  000001.SZ 2022-02-18 -0.151842 -0.119519  0.037139 -0.078074                     ping_an_bank
7      7  601888.SS 2022-02-18 -0.243156 -0.091782  0.088246 -0.082231  china_tourism_group_duty_free_c
8      8  000333.SZ 2022-02-18 -0.191822 -0.098220  0.019681 -0.090121               midea_group_co_ltd
9      9  002594.SZ 2022-02-18 -0.152458 -0.182644  0.015600 -0.106501                  byd_company_ltd
10    10  688256.SS 2022-02-18 -0.207314 -0.168954 -0.126835 -0.167701  cambricon_technologies_corporat
11    11  601111.SS 2022-02-18 -0.222781 -0.134231 -0.374176 -0.243729                    air_china_ltd
12    12  601168.SS 2022-02-18 -0.231550 -0.242365 -0.306838 -0.260251          western_mining_co._ltd.
13    13  002415.SZ 2022-02-18 -0.178105 -0.295894 -0.322221 -0.265407               hangzhou_hikvision
14    14  601318.SS 2022-02-18 -0.476346 -0.201118 -0.211716 -0.296394  ping_an_insurance(group)co.of_c
15    15  601668.SS 2022-02-18 -0.351012 -0.250289 -0.473890 -0.358397  china_construction_engineering_
16    16  600196.SS 2022-02-18 -0.357091 -0.418911 -0.440672 -0.405558  shanghai_fosun_pharmaceutical_g
17    17  002460.SZ 2022-02-18 -0.382765 -0.583493 -1.123627 -0.696628               ganfeng_lithium_co
18    18  600111.SS 2022-02-18 -0.716760 -0.801826 -1.196879 -0.905155  china_nthn_rare_earth_(gp)_hi-t
19    19  002466.SZ 2022-02-18 -1.019046 -0.866705 -1.299832 -1.061861               tianqi_lithium_cor

Once a customized symbol list of interest is available, run the following to produce predictive results.

>>> tas = Tashares(symbol_list="/absolute/path/to/list_of_ashares")
>>> tas()

The format of symbol list is the following. There is a longer list list_of_all of more than 1000 under the data folder.

002415.SZ
000100.SZ
601168.SS
002466.SZ
601668.SS

experimental results

Trained catboost models on 5 years trading history, from 2015-01-01 to 2020-01-01, and tested on about 2 years trading history after 2020-01-01. Here each trading day is treated as a ranking list.

Here are the test metrics for the model to predict next day's trending:

Metric Value
ranking AUC 0.5548286031164161
ranking QueryAUC 0.5656376778982241
ranking DCG:top=20 4.602394544886906
ranking NDCG:top=20 0.6537243035654637
ranking MAP:top=10 0.5476869971732982
ranking MRR:top=10 0.8856164383561643
ranking PrecisionAt:top=10 0.6618721461187219
ranking RecallAt:top=10 0.009447016096515872
ranking PrecisionAt:top=20 0.6239726027397255
ranking RecallAt:top=20 0.017115635510283153

There are some trivial cases in the whole test set that inflate the precision. Testing on a short symbol list of interest for next 1 day, PrecisionAt:top=10 gets 0.522763, about 2.3% more accurate than random guess, and about 5.1% over random at top=5.

A-Shares Metric Day 1 Day 2 Day 5
ranking PrecisionAt:top=5 0.537354 0.505058 0.512451
ranking PrecisionAt:top=10 0.522763 0.500973 0.510700
random 0.511253 0.498915 0.513445

Note that in the Table forecasting on the next fifth day is no better than random guess.

Here is a real-world experiment lasting half year so far. The dotted vertical line indicates the date of forecasting launch, coupled with certain trading strategy.

drawing

On US stocks, the same approach shows similar ranking metrics that hasn't been applied to real world.

other content

  • send out daily email with the forecasting results enclosed. Find the script sendemail.py under the docs folder. In case you are using gmail free service, your own passcode obtained from gmail is required in Line 29.

  • dump training/test/forecast samples into three csv files rspectively for further study.

    1. (optional) specify StartFromDate in cfg.ini, Default 2015-01-01
    2. (optional) specify MaxTrainingDate in cfg.ini, Default = 2020-01-01
    3. (optional) prepare a customized symbol list, Default list_of_all under the data folder
    4. run the following to dump samples in current working directory. Three files are generated,
      1. training_ashares.csv, containing samples from StartFromDate to MaxTrainingDate
      2. test_ashares.csv, containing samples from MaxTrainingDate to yesterday
      3. forecasting_ashares.csv, containing samples of yesterday (just one day)
    >>> from tashares.wrapper import dump_datafiles
    >>> dump_datafiles()
    
    1. to get samples of customized list
    >>> dump_datafiles(symbol_list='/absolute/path/to/list')
    
    1. (optional) specify the directory to dump the three cvs files.
    >>> dump_datafiles(symbol_list='/absolute/path/to/list', data_dir='/tmp')
    

reference

disclaimer

This software is released under the mit license.

tashares's People

Contributors

joeycw avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.