scottfreellc / alphapy

1.1K 61.0 194.0 33.98 MB

Python AutoML for Trading Systems and Sports Betting

License: Apache License 2.0

Python 100.00%

machine-learning predictive-analytics classification regression scikit-learn pandas trading stocks sports portfolio

alphapy's People

loudermilk 42trading spark-lin allensmile zhanglae gongqingyi-github jiangwm python3pkg superaja gucasbrg meghs91 genliu777 dsnoor zwang1986 ikhaled hemantmshah26 bjornl eycab ralic krzysiekpi jane-hnatiuk cmstewart fredwang222 somu-analytics clustersdata ravi-code-ranjan randomwak stevenlol autodataplatform aborodya githubbla tqangxl zapper199 ianmadlenya scilear hujiaogen rizplate michaelfriedberg praveenmunagapati sonhoang198 lukw00heck linuxster tengben0905 vskynet liuweiping2020 ltoscano vishalbelsare eswdevplus bsjung kcompher seanlty nandopacheco memazouni zenghanfu fantasticer renwoox agentsmith7 sanjayarora hacknuke tookennysupreme oldrichsmejkal arcayi prasaddeshpande sudarsan-sridharan magic9911 celinecheng1029 coolsnake james-bao zhaofinance afcarl maxpolokov sylinuxhy thincat kaplanemrah softdzx quanttrade joe-nano bitchaincapital modi975 bvrgs arungahlot ondrocks rydeen7 fagan2888 ai-hub-deep-learning-fundamental abouitforextradingblog raaraa abhisam lxngoddess5321 marchanero czr1803 karagul mrkoooooo cirrusdriver arachchi shubhampachori12110095 ds-madhavan-ramani laokpa nirvananimbusa rouzbe

alphapy's Issues

Export train and test files after feature extraction.

We have been experimenting recently with H2O AutoML and other "driverless" programs. After creating the original train and test files, we also want to save date-stamped files after feature extraction. These date-stamped files can then be imported into H2O or any other ML tool for testing. This task has been completed and will be included in the next release.

Error while Trading Model example running

Describe the bug
While running the tutorial, I get the following error:

Traceback (most recent call last):
  File "/Users/snusik_zzz/GIT/ap/venv/bin/mflow", line 8, in <module>
    sys.exit(main())
  File "/Users/snusik_zzz/GIT/ap/venv/lib/python3.8/site-packages/alphapy/market_flow.py", line 435, in main
    model = market_pipeline(model, market_specs)
  File "/Users/snusik_zzz/GIT/ap/venv/lib/python3.8/site-packages/alphapy/market_flow.py", line 302, in market_pipeline
    run_analysis(a, lag_period, forecast_period, leaders, predict_history)
  File "/Users/snusik_zzz/GIT/ap/venv/lib/python3.8/site-packages/alphapy/analysis.py", line 270, in run_analysis
    analysis.model = main_pipeline(model)
  File "/Users/snusik_zzz/GIT/ap/venv/lib/python3.8/site-packages/alphapy/__main__.py", line 436, in main_pipeline
    model = training_pipeline(model)
  File "/Users/snusik_zzz/GIT/ap/venv/lib/python3.8/site-packages/alphapy/__main__.py", line 230, in training_pipeline
    model = sample_data(model)
  File "/Users/snusik_zzz/GIT/ap/venv/lib/python3.8/site-packages/alphapy/data.py", line 280, in sample_data
    X, y = sampler.fit_sample(X_train, y_train)
AttributeError: 'RandomUnderSampler' object has no attribute 'fit_sample'

Desktop:

OS: macOS 10.15.2

I think the call in data.py should be X, y = sampler.fit_resample(X_train, y_train) instead.
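Current imbalanced-learn samplers expose `fit_resample`; the older `fit_sample` name has since been removed, which is what this AttributeError reports. If data.py has to run against both old and new installs, a minimal version-tolerant sketch could look like this (`resample_compat` is a hypothetical helper, not part of AlphaPy or imblearn):

```python
def resample_compat(sampler, X, y):
    """Resample (X, y) with an imbalanced-learn style sampler.

    Hypothetical helper: prefer fit_resample(), the method name in current
    imbalanced-learn releases, and fall back to fit_sample() on old installs.
    """
    if hasattr(sampler, "fit_resample"):
        return sampler.fit_resample(X, y)
    if hasattr(sampler, "fit_sample"):
        return sampler.fit_sample(X, y)
    raise AttributeError(
        f"{type(sampler).__name__} has neither fit_resample nor fit_sample"
    )
```

In sample_data the call would then read `X, y = resample_compat(sampler, X_train, y_train)`; calling `fit_resample` directly is simpler if only recent imbalanced-learn versions need to be supported.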

Missing LICENSE

Please add a LICENSE file to the repository :-)

Samples not working

The examples do not work. Could you provide working, ready-to-use examples, or show how to fix the errors?

MarketFlow fractal

Convert the fractal field to pandas offset aliases. This will allow the user to specify custom analysis periods. AlphaPy will fetch data from the source (specified in schema) and then the user can choose to resample to the given duration, as specified by fractal in the market.yml file.
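As a sketch of the proposed behavior, assuming daily bars in a DataFrame with a DatetimeIndex and `open`/`high`/`low`/`close`/`volume` columns (the column names and `resample_bars` are assumptions, not AlphaPy's API), a pandas offset alias such as `W-FRI` can drive the resampling directly:

```python
import pandas as pd

# How each OHLCV column is aggregated when bars are combined.
OHLCV_RULES = {"open": "first", "high": "max", "low": "min",
               "close": "last", "volume": "sum"}

def resample_bars(df: pd.DataFrame, fractal: str) -> pd.DataFrame:
    """Resample OHLCV bars to the period named by a pandas offset alias."""
    out = df.resample(fractal).agg(OHLCV_RULES)
    # Periods with no bars (e.g. weekends for '1D') have a NaN close; drop them.
    return out.dropna(subset=["close"])

# Example: two weeks of business-day bars rolled up into weeks ending Friday.
days = pd.bdate_range("2017-01-02", periods=10)
closes = [float(i) for i in range(10)]
daily = pd.DataFrame({"open": closes,
                      "high": [c + 1 for c in closes],
                      "low": [c - 1 for c in closes],
                      "close": closes,
                      "volume": [100] * 10}, index=days)
weekly = resample_bars(daily, "W-FRI")
print(weekly)
```

Any alias pandas accepts (`1D`, `W-FRI`, `1h`, and so on) could then be placed in the fractal field, with the aggregation rules kept fixed.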

KeyError: 'fb_stock_prices_1d'

I am working on a "Trading System" using this as an example: https://www.linkedin.com/pulse/developing-trading-system-alphapy-mark-conway but I am getting this error:

Short version:

[06/25/17 03:21:33] INFO	Getting fb data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: fb
[06/25/17 03:21:33] INFO	No DataFrame for fb
[06/25/17 03:21:33] INFO	Getting nflx data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: nflx
[06/25/17 03:21:33] INFO	No DataFrame for nflx
[06/25/17 03:21:33] INFO	Getting aapl data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: aapl
[06/25/17 03:21:33] INFO	No DataFrame for aapl
[06/25/17 03:21:33] INFO	Getting amzn data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: amzn
[06/25/17 03:21:33] INFO	No DataFrame for amzn
[06/25/17 03:21:33] INFO	Getting googl data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: googl
[06/25/17 03:21:33] INFO	No DataFrame for googl

Long version:

--> mflow --pdate 2017-01-01
None
/Users/user/.virtualenvs/ml/lib/python2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated since IPython 4.0. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)
/Users/user/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/Users/user/.virtualenvs/ml/lib/python2.7/site-packages/sklearn/learning_curve.py:23: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
[06/25/17 03:21:33] INFO	********************************************************************************
[06/25/17 03:21:33] INFO	MarketFlow Start
[06/25/17 03:21:33] INFO	********************************************************************************
[06/25/17 03:21:33] INFO	Training Date: 1900-01-01
[06/25/17 03:21:33] INFO	Prediction Date: 2017-01-01
[06/25/17 03:21:33] INFO	MarketFlow Configuration
[06/25/17 03:21:33] INFO	Getting Features
[06/25/17 03:21:33] INFO	Defining Groups
[06/25/17 03:21:33] INFO	Added: set(['fb', 'nflx', 'aapl', 'amzn', 'googl'])
[06/25/17 03:21:33] INFO	Defining Aliases
[06/25/17 03:21:33] INFO	Getting System Parameters
[06/25/17 03:21:33] INFO	Defining Variables
[06/25/17 03:21:33] INFO	No Variables Found
[06/25/17 03:21:33] INFO	Getting Variable Functions
[06/25/17 03:21:33] INFO	No Variable Functions Found
[06/25/17 03:21:33] INFO	MARKET PARAMETERS:
[06/25/17 03:21:33] INFO	features        = ['hc', 'lc']
[06/25/17 03:21:33] INFO	forecast_period = 1
[06/25/17 03:21:33] INFO	fractal         = 1d
[06/25/17 03:21:33] INFO	leaders         = []
[06/25/17 03:21:33] INFO	data_history    = 1000
[06/25/17 03:21:33] INFO	predict_history = 50
[06/25/17 03:21:33] INFO	schema          = prices
[06/25/17 03:21:33] INFO	system          = {'scale': False, 'shortentry': 'lc', 'name': 'closer', 'shortexit': None, 'longexit': None, 'longentry': 'hc', 'holdperiod': 0}
[06/25/17 03:21:33] INFO	target_group    = faang
[06/25/17 03:21:33] INFO	Model Configuration
[06/25/17 03:21:33] INFO	No Treatments Found
[06/25/17 03:21:33] INFO	MODEL PARAMETERS:
[06/25/17 03:21:33] INFO	algorithms        = ['XGB']
[06/25/17 03:21:33] INFO	balance_classes   = True
[06/25/17 03:21:33] INFO	calibration       = False
[06/25/17 03:21:33] INFO	cal_type          = sigmoid
[06/25/17 03:21:33] INFO	calibration_plot  = False
[06/25/17 03:21:33] INFO	clustering        = False
[06/25/17 03:21:33] INFO	cluster_inc       = 3
[06/25/17 03:21:33] INFO	cluster_max       = 30
[06/25/17 03:21:33] INFO	cluster_min       = 3
[06/25/17 03:21:33] INFO	confusion_matrix  = True
[06/25/17 03:21:33] INFO	counts            = False
[06/25/17 03:21:33] INFO	cv_folds          = 3
[06/25/17 03:21:33] INFO	directory         = .
[06/25/17 03:21:33] INFO	extension         = csv
[06/25/17 03:21:33] INFO	drop              = ['date', 'tag', 'open', 'high', 'low', 'close', 'adjclose']
[06/25/17 03:21:33] INFO	encoder           = <Encoders.factorize: 3>
[06/25/17 03:21:33] INFO	esr               = 30
[06/25/17 03:21:33] INFO	factors           = []
[06/25/17 03:21:33] INFO	features [X]      = *
[06/25/17 03:21:33] INFO	feature_selection = False
[06/25/17 03:21:33] INFO	fs_percentage     = 10
[06/25/17 03:21:33] INFO	fs_score_func     = <function f_classif at 0x10e1861b8>
[06/25/17 03:21:33] INFO	fs_uni_grid       = [5, 10, 15, 20, 25]
[06/25/17 03:21:33] INFO	grid_search       = False
[06/25/17 03:21:33] INFO	gs_iters          = 100
[06/25/17 03:21:33] INFO	gs_random         = True
[06/25/17 03:21:33] INFO	gs_sample         = True
[06/25/17 03:21:33] INFO	gs_sample_pct     = 0.250000
[06/25/17 03:21:33] INFO	importances       = True
[06/25/17 03:21:33] INFO	interactions      = True
[06/25/17 03:21:33] INFO	isomap            = False
[06/25/17 03:21:33] INFO	iso_components    = 2
[06/25/17 03:21:33] INFO	iso_neighbors     = 5
[06/25/17 03:21:33] INFO	isample_pct       = 5
[06/25/17 03:21:33] INFO	learning_curve    = True
[06/25/17 03:21:33] INFO	logtransform      = False
[06/25/17 03:21:33] INFO	lv_remove         = True
[06/25/17 03:21:33] INFO	lv_threshold      = 0.100000
[06/25/17 03:21:33] INFO	model_type        = <ModelType.classification: 1>
[06/25/17 03:21:33] INFO	n_estimators      = 501
[06/25/17 03:21:33] INFO	n_jobs            = -1
[06/25/17 03:21:33] INFO	ngrams_max        = 1
[06/25/17 03:21:33] INFO	numpy             = False
[06/25/17 03:21:33] INFO	pca               = False
[06/25/17 03:21:33] INFO	pca_inc           = 3
[06/25/17 03:21:33] INFO	pca_max           = 15
[06/25/17 03:21:33] INFO	pca_min           = 3
[06/25/17 03:21:33] INFO	pca_whiten        = False
[06/25/17 03:21:33] INFO	poly_degree       = 2
[06/25/17 03:21:33] INFO	pvalue_level      = 0.010000
[06/25/17 03:21:33] INFO	rfe               = False
[06/25/17 03:21:33] INFO	rfe_step          = 10
[06/25/17 03:21:33] INFO	roc_curve         = True
[06/25/17 03:21:33] INFO	rounding          = 3
[06/25/17 03:21:33] INFO	sampling          = True
[06/25/17 03:21:33] INFO	sampling_method   = <SamplingMethod.under_random: 12>
[06/25/17 03:21:33] INFO	sampling_ratio    = 0.500000
[06/25/17 03:21:33] INFO	scaler_option     = True
[06/25/17 03:21:33] INFO	scaler_type       = <Scalers.standard: 2>
[06/25/17 03:21:33] INFO	scipy             = False
[06/25/17 03:21:33] INFO	scorer            = roc_auc
[06/25/17 03:21:33] INFO	seed              = 10231
[06/25/17 03:21:33] INFO	sentinel          = -1
[06/25/17 03:21:33] INFO	separator         = ,
[06/25/17 03:21:33] INFO	shuffle           = True
[06/25/17 03:21:33] INFO	split             = 0.400000
[06/25/17 03:21:33] INFO	submission_file   = None
[06/25/17 03:21:33] INFO	submit_probas     = False
[06/25/17 03:21:33] INFO	target [y]        = wr
[06/25/17 03:21:33] INFO	target_value      = 1
[06/25/17 03:21:33] INFO	treatments        = None
[06/25/17 03:21:33] INFO	tsne              = False
[06/25/17 03:21:33] INFO	tsne_components   = 2
[06/25/17 03:21:33] INFO	tsne_learn_rate   = 1000.000000
[06/25/17 03:21:33] INFO	tsne_perplexity   = 30.000000
[06/25/17 03:21:33] INFO	vectorize         = False
[06/25/17 03:21:33] INFO	verbosity         = 1
[06/25/17 03:21:33] INFO	Creating directory ./data
[06/25/17 03:21:33] INFO	Creating directory ./input
[06/25/17 03:21:33] INFO	Creating directory ./model
[06/25/17 03:21:33] INFO	Creating directory ./output
[06/25/17 03:21:33] INFO	Creating directory ./plots
[06/25/17 03:21:33] INFO	Creating directory ./systems
[06/25/17 03:21:33] INFO	Creating Model
[06/25/17 03:21:33] INFO	Running MarketFlow Pipeline
[06/25/17 03:21:33] INFO	Running Long/Short System closer
[06/25/17 03:21:33] INFO	All Members: set(['fb', 'nflx', 'aapl', 'amzn', 'googl'])
[06/25/17 03:21:33] INFO	Getting Daily Data
[06/25/17 03:21:33] INFO	Getting fb data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: fb
[06/25/17 03:21:33] INFO	No DataFrame for fb
[06/25/17 03:21:33] INFO	Getting nflx data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: nflx
[06/25/17 03:21:33] INFO	No DataFrame for nflx
[06/25/17 03:21:33] INFO	Getting aapl data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: aapl
[06/25/17 03:21:33] INFO	No DataFrame for aapl
[06/25/17 03:21:33] INFO	Getting amzn data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: amzn
[06/25/17 03:21:33] INFO	No DataFrame for amzn
[06/25/17 03:21:33] INFO	Getting googl data for last 1000 days
[06/25/17 03:21:33] INFO	Could not retrieve data for: googl
[06/25/17 03:21:33] INFO	No DataFrame for googl
[06/25/17 03:21:33] INFO	Applying variable: hc
[06/25/17 03:21:33] INFO	Applying variable: lc
[06/25/17 03:21:33] INFO	Applying variable: wr
[06/25/17 03:21:33] INFO	Generating Trades for System closer
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/ml/bin/mflow", line 11, in <module>
    load_entry_point('alphapy==2.0.1', 'console_scripts', 'mflow')()
  File "build/bdist.macosx-10.11-x86_64/egg/alphapy/market_flow.py", line 379, in main
  File "build/bdist.macosx-10.11-x86_64/egg/alphapy/market_flow.py", line 264, in market_pipeline
  File "build/bdist.macosx-10.11-x86_64/egg/alphapy/system.py", line 418, in run_system
  File "build/bdist.macosx-10.11-x86_64/egg/alphapy/system.py", line 174, in long_short
KeyError: 'fb_stock_prices_1d'

I think the daily data doesn't exist. Could you please give me an example data list?

Thanks,

Very interesting, but need some richer examples

I think you need a few more ready-to-go, richer examples, preferably notebooks that can be run directly in a working directory. As currently set up, the rather skinny examples cannot be run in place, e.g., by cloning a copy of the repo and starting up one of the notebook examples. They also seem pretty incomplete compared with the full capabilities. I'm referring specifically to the two trading examples (I have not looked at the others).

Nonetheless, I appreciate your work. When I can, I may be able to create something along these lines to contribute back, but it will take some time since the current examples are so sparse!

IEX Cloud Migration

Previously, we could access IEX data through the Pandas Data Reader, but now we have to migrate to the IEX Cloud endpoint. You will have to obtain your own API key to get IEX data, and we will shortly add support to AlphaPy for accessing historical data through IEX Cloud.

Blended Model in Prediction Mode

A saved blended model does not currently work in prediction mode. This is an open bug.

AIMetatrader.com

Hello,

I am a private person and I am trying to learn programming. I have a project to integrate the latest Google AI into MetaTrader 4/5. Specifically, I want to embed MT4/5 in a domain called www.AIMetatrader.com, connect MT4/5 to Python to include AlphaPy, and open-source the result as a free web-trading AI.

This is difficult for me because I am not yet a programmer. Nor do I have money to pay a programmer, so I would like to ask you to partner on this project. I hope you are interested and agree.

Error in importing BalanceCascade from imblearn.ensemble

Describe the bug
ImportError: cannot import name 'BalanceCascade' from 'imblearn.ensemble' (/opt/conda/lib/python3.7/site-packages/imblearn/ensemble/__init__.py)

To Reproduce
Steps to reproduce the behavior:

Following instructions from here: https://alphapy.readthedocs.io/en/latest/tutorials/kaggle.html
Running step 2 (alphapy) throws the error above. It seems there is no BalanceCascade in imblearn.

Expected behavior
No error thrown.

Desktop (please complete the following information):

OS: Google Cloud Platform Virtual Machine running Linux

Additional context
Not sure what I'm doing wrong. I looked up the imblearn package, and it doesn't seem to have a BalanceCascade class in ensemble: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/imblearn/ensemble/__init__.py
Any help is appreciated!
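As the issue notes, current imbalanced-learn releases no longer provide `BalanceCascade` in `imblearn.ensemble`. One way to keep the rest of a pipeline importable is to look the class up optionally and disable that sampling option when it is missing. A minimal sketch; `optional_class` is a hypothetical helper, not AlphaPy code:

```python
import importlib

def optional_class(module_name: str, class_name: str):
    """Return module_name.class_name if it can be imported, otherwise None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or the module inside it) is not installed.
        return None
    # The module exists but may no longer define the class.
    return getattr(module, class_name, None)

BalanceCascade = optional_class("imblearn.ensemble", "BalanceCascade")
if BalanceCascade is None:
    print("BalanceCascade unavailable; skipping that sampling method")
```

The alternative is to pin an older imbalanced-learn release that still ships the class, at the cost of also pinning a compatible scikit-learn.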

features.py get_text_features pandas 1x support

the following lines...

    min_length = int(feature.str.len().min())
    max_length = int(feature.str.len().max())

seem to fall foul of the stricter rules in pandas 1.x. The following seems to work:

    min_length = int(feature.astype(str).str.len().min())
    max_length = int(feature.astype(str).str.len().max())
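The `.str` accessor is only defined for string-like values, so a column that pandas has inferred as numeric raises an AttributeError before `len()` is reached; casting with `astype(str)` first avoids that. A small sketch of the patched computation (`text_length_range` is a hypothetical name). Note that `astype(str)` also turns missing values into the literal string `nan`, which counts as length 3:

```python
import pandas as pd

def text_length_range(feature: pd.Series) -> tuple:
    """Return (min_length, max_length) of the values rendered as strings."""
    # astype(str) makes the .str accessor valid even for numeric columns.
    lengths = feature.astype(str).str.len()
    return int(lengths.min()), int(lengths.max())

numeric = pd.Series([7, 42, 1000])
try:
    numeric.str.len()  # .str is only defined for string-like values
except AttributeError as err:
    print("without astype:", err)
print(text_length_range(numeric))  # (1, 4)
```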

How to adapt custom market datasource (another market)?

Hi, AlphaPy is an amazing project; it completely separates indicator and feature generation, and I really appreciate it.
I'd like to use it with the Chinese market and the Binance market. I have already built a local data warehouse, but I can't find any documentation on adapting AlphaPy to it.
It would be nice to support a local file-based warehouse, since the datasets are very large and downloading them again would be inefficient.
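As a sketch of what a local file-based source could look like, assuming one CSV file per symbol named `<symbol>.csv` with a `date` column (the file layout and `load_local_bars` are hypothetical; AlphaPy's actual data-source hooks may differ):

```python
import tempfile
from pathlib import Path
from typing import Optional

import pandas as pd

def load_local_bars(directory: str, symbol: str) -> Optional[pd.DataFrame]:
    """Load bars for one symbol from <directory>/<symbol>.csv, or None if absent."""
    path = Path(directory) / f"{symbol.lower()}.csv"
    if not path.is_file():
        return None
    df = pd.read_csv(path, parse_dates=["date"], index_col="date")
    # Warehouse files are not guaranteed to be in date order.
    return df.sort_index()

# Example: write a tiny warehouse file and read it back.
with tempfile.TemporaryDirectory() as warehouse:
    (Path(warehouse) / "btcusdt.csv").write_text(
        "date,open,high,low,close,volume\n"
        "2020-01-02,7200,7250,7100,7180,10\n"
        "2020-01-01,7150,7210,7120,7200,12\n"
    )
    bars = load_local_bars(warehouse, "BTCUSDT")
    missing = load_local_bars(warehouse, "ETHUSDT")
print(bars)
```

Returning None for a missing symbol mirrors the "No DataFrame for ..." handling seen in the MarketFlow logs, so a caller can skip that symbol instead of failing.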

Sequence-to-Sequence Input/Output

Allow multiple rows of lagged data to produce forecasts. Currently, columns are shifted in situ, and this is confusing when identifying which features have been lagged and which have not. We have defined a convention for variable names, e.g., close[1] is the previous close, but we want to make this explicit in the data frame itself, so that you can clearly see when multiple variables have been lagged, e.g., close[1], close[2], etc. This also applies to future forecasts, e.g., target[+2] or target[+10]. This capability will allow easy transformation for Keras sequence-to-sequence learning in the future. Expect a new version around the end of August, along with other fixes.
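A minimal sketch of that naming convention using pandas `shift`, where the original column stays unshifted and every lagged or future copy gets its own explicitly named column (`add_lags` and `add_leads` are hypothetical helpers; the bracket names follow the close[1] and target[+2] examples):

```python
import pandas as pd

def add_lags(df: pd.DataFrame, column: str, lags) -> pd.DataFrame:
    """Append past values as explicitly named columns, e.g. close[1]."""
    out = df.copy()
    for n in lags:
        out[f"{column}[{n}]"] = df[column].shift(n)
    return out

def add_leads(df: pd.DataFrame, column: str, leads) -> pd.DataFrame:
    """Append future values as explicitly named columns, e.g. close[+2]."""
    out = df.copy()
    for n in leads:
        out[f"{column}[+{n}]"] = df[column].shift(-n)
    return out

prices = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0]})
framed = add_leads(add_lags(prices, "close", [1, 2]), "close", [2])
print(framed)
```

The leading rows of each lag column and the trailing rows of each lead column are NaN; dropping or masking them is left to the caller.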

Features Generation BugFixes

Please see attached patch that fixes a bunch of bugs around feature generation in mflow.

It includes the following fixes:

- A fix for issue #33: train mode now correctly ignores the --pdate argument rather than falling over.
- Arrays of NaN in feature generation are now added as a column of sentinels instead of being dropped.
- Multi-feature generation was failing because the feature name count did not match the feature count; the asserts around this are improved and the feature names are fixed.
- The scipy signal-to-noise ratio is disabled, as it seems to have been deprecated long ago (my understanding is that the scipy version currently in use would fail for this feature).

mflow_bugfixes.txt

NCAAB game file name

[09/28/17 10:15:30] INFO Loading data from ./data/NCAAB_game_scores_1g.cs
[09/28/17 10:15:31] INFO Could not find or access ./data/NCAAB_game_scores_1g.csv

We only have /data/ncaab_game_scores_1g.csv; renaming the file to use uppercase NCAAB resolved the problem.

variables.vexec inconsistent with docs

Hi,

Hopefully these contributions are useful. It seems very quiet here; I hope you are all in good health!

The vexec docstring says:
"To write your own variable function, your function must have
a pandas DataFrame as an input parameter and must return
a pandas DataFrame with the new variable(s)."

However it doesn't quite work that way. Perhaps something like the change below would be useful?

diff --git a/alphapy/variables.py b/alphapy/variables.py
index 8477647..ed3bc9a 100644
--- a/alphapy/variables.py
+++ b/alphapy/variables.py
@@ -448,7 +448,11 @@ def vexec(f, v, vfuncs=None):
                         func = None
             if func:
                 # Create the variable by calling the function
-                f[v] = func(*newlist)
+                r = func(*newlist)
+                if(type(r) is pd.core.frame.DataFrame):
+                    f = pd.concat([f, r], axis=1, join='inner')
+                else:
+                    f[v] = r
             elif func_name not in dir(builtins):
                 module_error = "*** Could not find module to execute function {} ***".format(func_name)
                 logger.error(module_error)

All the best.
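A self-contained sketch of the behavior the patch proposes, using `isinstance` rather than an exact type check (`apply_variable` is a hypothetical stand-in for the relevant part of `vexec`, not AlphaPy code): a DataFrame result is joined column-wise, while any other result is stored under the variable name.

```python
import pandas as pd

def apply_variable(f: pd.DataFrame, v: str, func, *args) -> pd.DataFrame:
    """Call func(*args) and merge its result into frame f."""
    result = func(*args)
    if isinstance(result, pd.DataFrame):
        # Multi-column result: join on the shared index, keeping common rows.
        # The name v is not used in this case; the result's columns are kept.
        return pd.concat([f, result], axis=1, join="inner")
    # Single result (Series or scalar): store it as column v, modifying f in
    # place just as the original f[v] = func(*newlist) assignment does.
    f[v] = result
    return f

def prev_and_diff(series):
    # Hypothetical variable function returning two new columns at once.
    return pd.DataFrame({"prev": series.shift(1), "diff": series.diff()})

frame = pd.DataFrame({"close": [1.0, 2.0, 4.0]})
frame = apply_variable(frame, "pair", prev_and_diff, frame["close"])
frame = apply_variable(frame, "double", lambda s: s * 2, frame["close"])
print(frame)
```

Because the DataFrame branch returns a new object, the caller has to use the returned frame, which is why vexec would need to return `f` rather than rely on in-place mutation.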

Support Multiple Systems

This change makes it possible to define multiple systems in market.yml.

The modified YAML would look like this:

system:
  closer_scaled:
    holdperiod : 0
    longentry  : hc
    longexit   :
    shortentry : lc
    shortexit  :
    scale      : True
  closer_unscaled:
    holdperiod : 0
    longentry  : hc
    longexit   :
    shortentry : lc
    shortexit  :
    scale      : False

@@ -298,26 +299,27 @@ def market_pipeline(model, market_specs):
     system_specs = market_specs['system']
     if system_specs:
         # get the system specs
-        system_name = system_specs['name']
-        longentry = system_specs['longentry']
-        shortentry = system_specs['shortentry']
-        longexit = system_specs['longexit']
-        shortexit = system_specs['shortexit']
-        holdperiod = system_specs['holdperiod']
-        scale = system_specs['scale']
-        logger.info("Running System %s", system_name)
-        logger.info("Long Entry  : %s", longentry)
-        logger.info("Short Entry : %s", shortentry)
-        logger.info("Long Exit   : %s", longexit)
-        logger.info("Short Exit  : %s", shortexit)
-        logger.info("Hold Period : %d", holdperiod)
-        logger.info("Scale       : %r", scale)
-        # create and run the system
-        system = System(system_name, longentry, shortentry,
-                        longexit, shortexit, holdperiod, scale)
-        tfs = run_system(model, system, group, intraday)
-        # generate a portfolio
-        gen_portfolio(model, system_name, group, tfs)
+        for system_name in system_specs:
+            longentry = system_specs[system_name]['longentry']
+            shortentry = system_specs[system_name]['shortentry']
+            longexit = system_specs[system_name]['longexit']
+            shortexit = system_specs[system_name]['shortexit']
+            holdperiod = system_specs[system_name]['holdperiod']
+            scale = system_specs[system_name]['scale']
+            logger.info("Running System %s", system_name)
+            logger.info("Long Entry  : %s", longentry)
+            logger.info("Short Entry : %s", shortentry)
+            logger.info("Long Exit   : %s", longexit)
+            logger.info("Short Exit  : %s", shortexit)
+            logger.info("Hold Period : %d", holdperiod)
+            logger.info("Scale       : %r", scale)
+            # create and run the system
+            system = System(system_name, longentry, shortentry,
+                            longexit, shortexit, holdperiod, scale)
+            tfs = run_system(model, system, group, intraday)
+            # generate a portfolio
+            gen_portfolio(model, system_name, group, tfs)
 
     # Return the completed model
     return model

KeyError: 'lag_period'

Commands used on OSX:

pip install xgboost
pip install -U alphapy
git clone https://github.com/ScottFreeLLC/AlphaPy.git
cd AlphaPy/alphapy/examples/Trading\ Model/
mflow --pdate 2017-01-01

Output:


/usr/local/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
None
/usr/local/lib/python3.6/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
[11/22/17 08:53:30] INFO	********************************************************************************
[11/22/17 08:53:30] INFO	MarketFlow Start
[11/22/17 08:53:30] INFO	********************************************************************************
[11/22/17 08:53:30] INFO	Training Date: 1900-01-01
[11/22/17 08:53:30] INFO	Prediction Date: 2017-01-01
[11/22/17 08:53:30] INFO	MarketFlow Configuration
Traceback (most recent call last):
  File "/usr/local/bin/mflow", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/alphapy/market_flow.py", line 363, in main
    market_specs = get_market_config()
  File "/usr/local/lib/python3.6/site-packages/alphapy/market_flow.py", line 94, in get_market_config
    specs['lag_period'] = cfg['market']['lag_period']
KeyError: 'lag_period'
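The traceback shows `get_market_config` reading `cfg['market']['lag_period']` directly, so any market.yml without that key fails with a bare KeyError. A hedged sketch of a lookup that names the missing key and the keys that were found (`require_key` is hypothetical; it deliberately does not pick a default, since the correct `lag_period` for this example is not stated here):

```python
def require_key(section: dict, key: str, where: str = "market.yml:market"):
    """Return section[key], or raise a KeyError that explains what is missing."""
    if key not in section:
        raise KeyError(f"{where} is missing required key '{key}'; "
                       f"found keys: {sorted(section)}")
    return section[key]

# A market section lacking lag_period, as in the failing example.
cfg = {"market": {"forecast_period": 1, "fractal": "1d"}}
try:
    lag_period = require_key(cfg["market"], "lag_period")
except KeyError as err:
    print(err)
```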

Error During save_predictions

Describe the bug
When I run the MarketFlow tutorial, the code throws an error after plot generation, apparently while trying to write the data.

To Reproduce
Steps to reproduce the behavior:

~> mflow --train 2019-01-01
~> mflow --predict 2020-07-21

Both the train and predict flags seem to exit at the same place.

Expected behavior
I expected the code to finish and write the output file to the ./output/ directory.

Desktop (please complete the following information):

OS: Linux Mint 18.3
Python: v3.7.7

Traceback
[07/20/20 21:06:02] INFO Writing feature map to ./model/feature_map_20200720.pkl
[07/20/20 21:06:02] INFO Loading data from ./input/test_20200720.csv
Traceback (most recent call last):
  File "/home/michael/miniconda3/bin/mflow", line 8, in <module>
    sys.exit(main())
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/market_flow.py", line 430, in main
    model = market_pipeline(model, market_specs)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/market_flow.py", line 292, in market_pipeline
    run_analysis(a, lag_period, forecast_period, leaders, predict_history)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/analysis.py", line 270, in run_analysis
    analysis.model = main_pipeline(model)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/__main__.py", line 426, in main_pipeline
    model = training_pipeline(model)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/__main__.py", line 289, in training_pipeline
    save_model(model, 'BEST', Partition.test)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/model.py", line 1315, in save_model
    preds, probas = save_predictions(model, tag, partition)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/alphapy/model.py", line 1208, in save_predictions
    pd_indices = pf[pf.date >= predict_date].index.tolist()
  File "/home/michael/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'date'
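The failing line filters with `pf.date`, which only works when `date` is a regular column; if the frame carries `date` as its index instead (or lacks it entirely), attribute access raises this AttributeError. Whether that is the cause here is an assumption. A defensive sketch that accepts either layout (`rows_on_or_after` is a hypothetical helper, not AlphaPy code):

```python
import pandas as pd

def rows_on_or_after(pf: pd.DataFrame, predict_date) -> list:
    """Return index labels of rows dated on or after predict_date."""
    cutoff = pd.Timestamp(predict_date)
    if "date" in pf.columns:
        # Dates stored as a column: compare element-wise, then mask the index.
        mask = pd.to_datetime(pf["date"]) >= cutoff
        return pf.index[mask.to_numpy()].tolist()
    if pf.index.name == "date":
        # Dates stored as the index: the comparison yields a boolean array.
        mask = pd.to_datetime(pf.index) >= cutoff
        return pf.index[mask].tolist()
    raise KeyError("frame has neither a 'date' column nor a 'date' index")

by_column = pd.DataFrame({"date": ["2020-07-20", "2020-07-21", "2020-07-22"],
                          "x": [1, 2, 3]})
by_index = by_column.set_index("date")
print(rows_on_or_after(by_column, "2020-07-21"))  # [1, 2]
print(rows_on_or_after(by_index, "2020-07-21"))   # ['2020-07-21', '2020-07-22']
```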

Quandl WIKI Prices

We will soon add functionality for obtaining end-of-day stock data from Quandl.

ValueError: model.yml features:encoding:type target unrecognized

I'm getting this error when running the examples.

Any ideas?

Traceback (most recent call last):
  File "/usr/local/bin/alphapy", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/alphapy/__main__.py", line 471, in main
    specs = get_model_config()
  File "/usr/local/lib/python3.7/site-packages/alphapy/model.py", line 270, in get_model_config
    raise ValueError("model.yml features:encoding:type %s unrecognized" % encoder)
ValueError: model.yml features:encoding:type target unrecognized
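The message comes from `get_model_config` rejecting the features:encoding:type string. Judging from the logged `encoder = <Encoders.factorize: 3>`, accepted values are member names of an `Encoders` enum, so a value such as `target` fails when the installed AlphaPy version does not define it (an inference from the logs, not confirmed). A sketch of a name lookup that lists the accepted names on failure, using a hypothetical two-member enum rather than AlphaPy's real one:

```python
from enum import Enum

class ExampleEncoders(Enum):
    # Hypothetical members for illustration; not AlphaPy's actual enum.
    factorize = 1
    onehot = 2

def parse_enum(enum_cls, name: str):
    """Map a config string to an enum member, listing valid names on failure."""
    try:
        return enum_cls[name]
    except KeyError:
        valid = ", ".join(member.name for member in enum_cls)
        raise ValueError(
            f"{name!r} unrecognized; expected one of: {valid}") from None

print(parse_enum(ExampleEncoders, "onehot"))  # ExampleEncoders.onehot
```

Listing the accepted names makes a mismatch between the example model.yml files and the installed package version much easier to spot.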

Startup problems

Hello Scott,

I like this project a lot, but I am having trouble getting it up and running. I tried both Windows and Ubuntu. On Ubuntu, at least, pip3 install alphapy worked fine.

But when I move to one of the example directories and try to run the commands you show in the documentation, it just does not find the alphapy or market_flow files. Even when I put those files inside the same directory and try to run them from the Python editor environment, I get all kinds of errors.

How can you run the examples directly from the Python editor environment?

Data is not pulling from internet

When I run mflow on the "Trading Model" tutorial, the code that tries to retrieve the data always goes to the else branch for some reason. Is there something I'm doing wrong? Thanks in advance for the help!

Error while making a prediction with sflow

Description of the error

When following the NCAA Basketball Tutorial and trying to make a prediction on a date with the model created and trained, the system throws the following error:

"IndexError: boolean index did not match indexed array along dimension 1; dimension is 96 but corresponding boolean dimension is 106"

It is thrown at this part of the code:

Traceback (most recent call last):
  File "/Users/alejandrovargasperez/opt/anaconda3/bin/sflow", line 8, in <module>
    sys.exit(main())
  File "/Users/alejandrovargasperez/opt/anaconda3/lib/python3.8/site-packages/alphapy/sport_flow.py", line 912, in main
    model = main_pipeline(model)
  File "/Users/alejandrovargasperez/opt/anaconda3/lib/python3.8/site-packages/alphapy/__main__.py", line 434, in main_pipeline
    model = prediction_pipeline(model)
  File "/Users/alejandrovargasperez/opt/anaconda3/lib/python3.8/site-packages/alphapy/__main__.py", line 364, in prediction_pipeline
    X_all = create_interactions(model, X_all)
  File "/Users/alejandrovargasperez/opt/anaconda3/lib/python3.8/site-packages/alphapy/features.py", line 1316, in create_interactions
    pfeatures, pnames = get_polynomials(X[:, support], poly_degree)
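The error says a boolean column mask (`support`) covering 106 features was applied to a prediction matrix with only 96 columns, i.e., the prediction run did not reproduce the training feature set. A hedged sketch of an explicit width check that fails with a clearer message before NumPy indexing (`select_supported` is a hypothetical helper, not AlphaPy code):

```python
import numpy as np

def select_supported(X: np.ndarray, support) -> np.ndarray:
    """Select the columns of X flagged True in support, checking widths first."""
    support = np.asarray(support, dtype=bool)
    if X.shape[1] != support.shape[0]:
        raise ValueError(
            f"feature mismatch: data has {X.shape[1]} columns but the saved "
            f"support mask covers {support.shape[0]}; the prediction features "
            "do not match the training features"
        )
    return X[:, support]

X = np.arange(6).reshape(2, 3)
print(select_supported(X, [True, False, True]))  # [[0 2] [3 5]]
```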

To Reproduce
Steps to reproduce the behavior:

1. Run the command: "sflow --pdate 2016-03-01"
2. When finished, run SportFlow in predict mode: "sflow --predict --pdate 2016-03-15"
3. See the error.

Deep Learning Models

Implement deep learning classifiers and regressors using the scikit-learn wrapper for Keras.

https://keras.io/scikit-learn-api/

Transforms Dictionary Error

Fix the bug with multiple transforms per column: because the entries share the same key, all but one of them are dropped when loaded into a Python dictionary.

transforms:
    date              : ['alphapy.transforms', 'extract_bizday']
    date              : ['alphapy.transforms', 'extract_date']
    date              : ['alphapy.transforms', 'extract_time']

ValueError: max() arg is an empty sequence and NameError: name 'search_path' is not defined

Running Python 3.7.6 from the command line: NCAAB tutorial on AlphaPy 2.4.1

[03/10/20 17:05:51] INFO Running Model
[03/10/20 17:05:51] INFO Predict Mode
[03/10/20 17:05:51] INFO Loading Data
[03/10/20 17:05:51] INFO Loading data from ./input/train.csv
[03/10/20 17:05:51] INFO Could not find or access ./input/train.csv
[03/10/20 17:05:51] INFO Loading Data
[03/10/20 17:05:51] INFO Loading data from ./input/predict.csv
[03/10/20 17:05:51] INFO Found target won_on_spread in data frame
[03/10/20 17:05:51] INFO Labels (y) found for Partition.predict
Traceback (most recent call last):
  File "c:\users\home\anaconda3\lib\site-packages\alphapy\model.py", line 576, in load_feature_map
    file_name = most_recent_file(search_dir, 'feature_map_*.pkl')
  File "c:\users\home\anaconda3\lib\site-packages\alphapy\utilities.py", line 93, in most_recent_file
    file_name = max(glob.iglob(search_path), key=os.path.getctime)
ValueError: max() arg is an empty sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\home\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\home\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\home\Anaconda3\Scripts\sflow.exe\__main__.py", line 7, in <module>
  File "c:\users\home\anaconda3\lib\site-packages\alphapy\sport_flow.py", line 909, in main
    model = main_pipeline(model)
  File "c:\users\home\anaconda3\lib\site-packages\alphapy\__main__.py", line 424, in main_pipeline
    model = prediction_pipeline(model)
  File "c:\users\home\anaconda3\lib\site-packages\alphapy\__main__.py", line 336, in prediction_pipeline
    model = load_feature_map(model, directory)
  File "c:\users\home\anaconda3\lib\site-packages\alphapy\model.py", line 582, in load_feature_map
    logging.error("Could not find feature map in %s", search_path)
NameError: name 'search_path' is not defined
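The `NameError` hides the real failure: `most_recent_file` found no `feature_map_*.pkl` file, and the handler in `load_feature_map` then logged `search_path`, a name that is not defined in that function. Given the earlier log line `Could not find or access ./input/train.csv`, the likely root cause is that predict mode ran before any model had been trained and saved. A minimal sketch of a lookup that reports the missing file instead of raising, assuming it only needs a directory and a glob pattern (an illustration, not AlphaPy's actual `utilities.most_recent_file`):

```python
import glob
import logging
import os


def most_recent_file(directory, file_spec):
    """Return the newest file in directory matching file_spec, or None.

    Hypothetical variant: logs and returns None when nothing matches,
    instead of letting max() raise on an empty sequence.
    """
    search_path = os.path.join(directory, file_spec)
    matches = glob.glob(search_path)
    if not matches:
        # Log the path that was actually searched, so the name is always defined.
        logging.error("Could not find %s", search_path)
        return None
    # Newest match by creation time, the same key used in the traceback.
    return max(matches, key=os.path.getctime)
```

A caller such as `load_feature_map` could then check for `None` and stop with a clear message that a trained model is required first.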

How to use saved model

How do I use the saved .pkl models for prediction after writing them to disk?
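Within AlphaPy, predict mode (for example `mflow --predict`) appears to be the intended way to reuse a trained model. Outside the pipeline, a pickled estimator can be loaded and called directly. A minimal sketch, assuming the `.pkl` file holds a single object with a scikit-learn style `predict()` method and that `X` has been built with the same features used in training (the function name is hypothetical):

```python
import pickle


def load_and_predict(model_path, X):
    """Unpickle the estimator stored at model_path and return its predictions for X."""
    with open(model_path, "rb") as f:
        estimator = pickle.load(f)
    # X must contain the same features, in the same order, as at training time.
    return estimator.predict(X)
```

If the models were written with `joblib.dump`, load them with `joblib.load` instead of `pickle.load`.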

Cryptocurrency Prices

You can get historical cryptocurrency pricing from Quandl, but the MarketFlow pipeline needs to be modified to read directly from the data directory if no feeds are available.

Here are some sources of historical daily and intraday cryptocurrency data:

Daily: https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory
Intraday (1-minute): https://www.kaggle.com/smitad/bitcoin-trading-strategy-simulation/data
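Until MarketFlow can read from the data directory, a downloaded daily file can at least be normalized into a date-indexed frame. A minimal pandas sketch, assuming a CSV with a `Date` column plus price columns (the Kaggle files' exact layouts may differ) and assuming lowercase column names are what downstream code expects; `load_daily_prices` is a hypothetical helper:

```python
import pandas as pd


def load_daily_prices(csv_path, date_column="Date"):
    """Load a daily price CSV into a DataFrame indexed and sorted by date."""
    frame = pd.read_csv(csv_path, parse_dates=[date_column], index_col=date_column)
    frame = frame.sort_index()
    # Normalize column names, e.g. "Close" -> "close" (an assumed convention).
    frame.columns = [str(c).strip().lower() for c in frame.columns]
    frame.index.name = "date"
    return frame
```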

Model.yml encoding error

I run this under Windows WSL (winbash) with Python 3.7.

Running mflow produces this error:

Traceback (most recent call last):
  File "/home/d/.local/bin/mflow", line 8, in <module>
    sys.exit(main())
  File "/home/d/.local/lib/python3.7/site-packages/alphapy/market_flow.py", line 412, in main
    model_specs = get_model_config()
  File "/home/d/.local/lib/python3.7/site-packages/alphapy/model.py", line 274, in get_model_config
    raise ValueError("model.yml features:encoding:type %s unrecognized" % encoder)
ValueError: model.yml features:encoding:type factorize unrecognized

This is the configuration section of the console output:

[03/01/20 01:54:19] INFO ********************************************************************************
[03/01/20 01:54:19] INFO MarketFlow Start
[03/01/20 01:54:19] INFO ********************************************************************************
[03/01/20 01:54:19] INFO Training Date: 1900-01-01
[03/01/20 01:54:19] INFO Prediction Date: 2020-03-01
[03/01/20 01:54:19] INFO MarketFlow Configuration
[03/01/20 01:54:19] INFO Getting Features
[03/01/20 01:54:19] INFO No Features Found
[03/01/20 01:54:19] INFO Defining Groups
[03/01/20 01:54:19] INFO Added: {'googl', 'fb', 'aapl', 'amzn', 'nflx'}
[03/01/20 01:54:19] INFO Defining Aliases
[03/01/20 01:54:19] INFO Getting System Parameters
[03/01/20 01:54:19] INFO Defining AlphaPy Variables [phigh, plow]
[03/01/20 01:54:19] INFO Defining User Variables
[03/01/20 01:54:19] INFO No Variables Found
[03/01/20 01:54:19] INFO Getting Variable Functions
[03/01/20 01:54:19] INFO No Variable Functions Found
[03/01/20 01:54:19] INFO MARKET PARAMETERS:
[03/01/20 01:54:19] INFO api_key = None
[03/01/20 01:54:19] INFO api_key_name = None
[03/01/20 01:54:19] INFO create_model = False
[03/01/20 01:54:19] INFO data_fractal = 1d
[03/01/20 01:54:19] INFO data_history = 500
[03/01/20 01:54:19] INFO features = {}
[03/01/20 01:54:19] INFO forecast_period = 1
[03/01/20 01:54:19] INFO fractal = 1d
[03/01/20 01:54:19] INFO lag_period = 1
[03/01/20 01:54:19] INFO leaders = []
[03/01/20 01:54:19] INFO predict_history = 50
[03/01/20 01:54:19] INFO schema = yahoo
[03/01/20 01:54:19] INFO subject = stock
[03/01/20 01:54:19] INFO subschema = None
[03/01/20 01:54:19] INFO system = {'name': 'closer', 'holdperiod': 0, 'longentry': 'hc', 'longexit': None, 'shortentry': 'lc', 'shortexit': None, 'scale': False}
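The error suggests that the installed version of `get_model_config` does not accept `factorize` as a `features:encoding:type`, so the tutorial's `model.yml` and the installed AlphaPy release may be out of step. For reference, factorize encoding replaces each category with an integer code. A pure-Python sketch of the idea (pandas provides the real implementation as `pandas.factorize`, which additionally maps missing values to `-1`):

```python
def factorize(values):
    """Map each distinct value to an integer code in order of first appearance.

    Returns (codes, uniques), the same shape of result as pandas.factorize.
    """
    index = {}
    codes = []
    for value in values:
        if value not in index:
            # The next unused code goes to a value seen for the first time.
            index[value] = len(index)
        codes.append(index[value])
    uniques = list(index)
    return codes, uniques
```

For example, `factorize(["b", "a", "b", "c"])` returns `([0, 1, 0, 2], ["b", "a", "c"])`.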

fit() got an unexpected keyword argument 'eval_set'

When I run the Kaggle example, I get this error message:

  File "/home/******/anaconda/lib/python2.7/site-packages/alphapy/model.py", line 723, in first_fit
    early_stopping_rounds=esr)
TypeError: fit() got an unexpected keyword argument 'eval_set'
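The traceback shows `first_fit` passing XGBoost-style options (`eval_set`, `early_stopping_rounds`) to an estimator whose `fit()` does not accept them, which points to a version or algorithm mismatch rather than a data problem. A defensive sketch, assuming it is acceptable to drop unsupported options (which silently disables early stopping for that estimator); `fit_supported` is a hypothetical helper, not part of AlphaPy:

```python
import inspect


def fit_supported(estimator, X, y, **fit_kwargs):
    """Call estimator.fit(X, y, ...) with only the keyword arguments fit() accepts.

    Options such as eval_set or early_stopping_rounds are dropped for
    estimators that do not declare them, instead of raising TypeError.
    """
    params = inspect.signature(estimator.fit).parameters
    takes_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not takes_var_kwargs:
        fit_kwargs = {k: v for k, v in fit_kwargs.items() if k in params}
    return estimator.fit(X, y, **fit_kwargs)
```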

Can't get --predict working

Hi,

Firstly, really great project! There's an awful lot here, and I'm enjoying getting my head around it, although I feel like I might need a bigger head! :-D

I'm running into this issue, and since I'm mostly a Java developer and no one else seems to be complaining, I'm struggling to tell whether this is a gap in my Python knowledge or a genuine bug.

Running the following...

cd Trading\ Model/
mflow
mflow --predict --pdate 2020-05-13

I get the following...

[05/13/20 10:06:23] INFO	Original Features : Index(['abovema_10[1]', 'abovema_20[1]', 'abovema_3[1]', 'abovema_5[1]',
       'abovema_50[1]', 'adx[1]', 'atr[1]', 'atr_10[1]', 'atr_14[1]',
       'atr_20[1]',
       ...
       'wr_2[1]', 'wr_3[1]', 'wr_5[1]', 'wr_6[1]', 'wr_7[1]', 'gap',
       'gapbadown', 'gapbaup', 'gapdown', 'gapup'],
      dtype='object', length=149)
[05/13/20 10:06:23] INFO	Feature Count     : 149
[05/13/20 10:06:23] INFO	Creating Base Features
Traceback (most recent call last):
  File "/usr/local/bin/mflow", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/alphapy/market_flow.py", line 430, in main
    model = market_pipeline(model, market_specs)
  File "/usr/local/lib/python3.6/site-packages/alphapy/market_flow.py", line 292, in market_pipeline
    run_analysis(a, lag_period, forecast_period, leaders, predict_history)
  File "/usr/local/lib/python3.6/site-packages/alphapy/analysis.py", line 270, in run_analysis
    analysis.model = main_pipeline(model)
  File "/usr/local/lib/python3.6/site-packages/alphapy/__main__.py", line 424, in main_pipeline
    model = prediction_pipeline(model)
  File "/usr/local/lib/python3.6/site-packages/alphapy/__main__.py", line 351, in prediction_pipeline
    X_all = create_features(model, X_all, X_train, X_predict, y_train)
  File "/usr/local/lib/python3.6/site-packages/alphapy/features.py", line 1049, in create_features
    features, fnames = get_text_features(fnum, fname, X, nunique, vectorize, ngrams_max)
  File "/usr/local/lib/python3.6/site-packages/alphapy/features.py", line 418, in get_text_features
    min_length = int(feature.str.len().min())
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 5269, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/strings.py", line 2039, in __init__
    self._inferred_dtype = self._validate(data)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/strings.py", line 2096, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

I'm running this in a Docker environment based on python:3.6-stretch with the following requirements.txt...

bokeh==1.3
ipython==7.2
keras-applications==1.0.8
keras-preprocessing==1.0.5
keras==2.2.4
matplotlib==3.0
numpy==1.18.4
pandas==1.0
pyyaml==5.1
scikit-learn==0.22.1
scipy==1.4.1
seaborn==0.9
tensorflow==1.15
xgboost==0.90
arrow==0.13
category_encoders==2.1
iexfinance==0.4.3
imbalanced-learn==0.5
pandas-datareader==0.8
pyfolio==0.9
joblib==0.14.1
alphapy==2.4.2
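The failing line calls `.str.len()` on a column that `create_features` passed to `get_text_features` as text, but pandas refuses the `.str` accessor because the column's values are not all strings (for example, numbers or nothing but missing values). A minimal guard, assuming it is acceptable to measure non-string values by their string representation; `min_text_length` is a hypothetical helper:

```python
import pandas as pd


def min_text_length(feature):
    """Return the shortest string length in a column that may mix strings, numbers and NaN."""
    # The .str accessor requires string values, so cast the non-null values first.
    as_text = feature.dropna().astype(str)
    if as_text.empty:
        return 0
    return int(as_text.str.len().min())
```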

IEX Data

The next release of AlphaPy will support the IEX API.

https://iextrading.com/developer/docs/

Trading System phigh/plow etc

Hi,

Looking into the Trading System in more detail, unless I'm missing something, I don't think it can work as currently implemented.

Using the phigh/plow feature, you run into a problem: the predictions loaded from the training run are a single consolidated list for the entire group. The system then tries to join these predictions with the price data for a single stock symbol, which leads to a mismatch.

It seems it would be better if the Trading System made its own predictions, using a pre-trained model, for each stock symbol it iterates through. I can't see any obvious way to join the predictions back to the stock symbols; running a separate prediction would avoid this problem and would also let the Trading System run on new test data without repeating the whole training cycle.

I'm thinking of looking into this, but before I leap in, it would be good to know what the consensus is on this...?
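One alternative to a separate prediction run is to keep the symbol on every prediction row, so group-level predictions can be joined back to a single symbol's prices. A minimal pandas sketch, assuming a hypothetical predictions layout with `symbol`, `date` and `prediction` columns (not the layout AlphaPy currently writes):

```python
import pandas as pd


def predictions_for_symbol(predictions, prices, symbol):
    """Attach group-level predictions to one symbol's price history.

    predictions: DataFrame with 'symbol', 'date' and 'prediction' columns.
    prices: DataFrame for a single symbol with a 'date' column.
    """
    subset = predictions.loc[predictions["symbol"] == symbol, ["date", "prediction"]]
    # A left join keeps every price bar; bars without a prediction get NaN.
    return prices.merge(subset, on="date", how="left")
```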

Error in Binary Encoding

local variable 'pd_features' referenced before assignment
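That message is Python's `UnboundLocalError`: execution reached a line that reads `pd_features` along a path where no branch had assigned it, typically an `if`/`elif` chain with no case for the input it received. A hypothetical illustration of the pattern and the usual fix of failing loudly on unsupported input (this is not AlphaPy's binary encoder):

```python
def encode_binary(values, width):
    """Hypothetical encoder: fixed-width binary string for each integer value."""
    if width > 0:
        pd_features = [format(v, f"0{width}b") for v in values]
    else:
        # Without this branch, width <= 0 would reach the return below with
        # pd_features never assigned and raise UnboundLocalError.
        raise ValueError(f"width must be positive, got {width}")
    return pd_features
```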

Yahoo Finance Daily Data through icharts no longer available

If you haven't been able to download daily data through Yahoo lately, here's why:

pydata/pandas-datareader#315

Yahoo has discontinued its free Finance API after many years, so we will search for another source of historical data.

Google Daily Data Restrictions

It now looks like Google daily data history is being limited to one year. We are rapidly running out of options for both free daily and intraday market data. To effectively train market models, we need as much historical data as possible. If anyone has any suggestions for alternate feeds (e.g., Quandl), please post them here.

AttributeError: type object 'DataFrame' has no attribute 'from_items'

Running mflow with Python 3.7 under WSL (winbash):

Traceback (most recent call last):
  File "/home/freefall/.local/bin/mflow", line 8, in <module>
    sys.exit(main())
  File "/home/d/.local/lib/python3.7/site-packages/alphapy/market_flow.py", line 430, in main
    model = market_pipeline(model, market_specs)
  File "/home/d/.local/lib/python3.7/site-packages/alphapy/market_flow.py", line 318, in market_pipeline
    tfs = run_system(model, system, group, intraday)
  File "/home/fredefall/.local/lib/python3.7/site-packages/alphapy/system.py", line 370, in run_system
    tf = DataFrame.from_items(gtlist, orient='index', columns=Trade.states)
AttributeError: type object 'DataFrame' has no attribute 'from_items'
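`DataFrame.from_items` was deprecated in pandas 0.23 and removed in pandas 1.0, so this line fails on any current pandas. A sketch of an equivalent that builds the frame directly from the `(label, values)` pairs; unlike `DataFrame.from_dict(dict(items), orient="index", ...)`, it keeps repeated row labels such as two trades on the same date. The helper name is hypothetical:

```python
import pandas as pd


def frame_from_items(items, columns):
    """Build a DataFrame with one row per (row_label, row_values) pair.

    Stands in for DataFrame.from_items(items, orient='index', columns=columns).
    """
    items = list(items)
    labels = [label for label, _ in items]
    rows = [list(values) for _, values in items]
    # Passing rows and labels separately preserves order and duplicate labels.
    return pd.DataFrame(rows, index=labels, columns=columns)
```

In `run_system`, the call would then read `tf = frame_from_items(gtlist, Trade.states)`.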

AI to Metatrader 4

Hello,

I'm interested in your application.

I want to use the AI with MetaTrader 4. My idea is to use Google's servers and technology to host a website that clones the MetaTrader 4 library. To clone the website, you have to import the code listed on this site: https://www.mql5.com/de/articles/3024?utm_source=MetaTrader+4+WebTerminal&utm_campaign=Share+WebTerminal

Another forum thread about this project: https://www.mql5.com/de/forum/223405

Could you make this open source?

How do I run this with nvidia-docker?

This happens when I launch mflow --predict:

root@b1efe70104d6:/app# pip freeze | grep tensorflow
You are using pip version 18.0, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
tensorflow-gpu==1.11.0
root@b1efe70104d6:/app# mflow --predict
/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Using TensorFlow backend.
Illegal instruction (core dumped)
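`Illegal instruction (core dumped)` immediately after `Using TensorFlow backend.` commonly means the TensorFlow binary uses CPU instructions the host does not support; the prebuilt TensorFlow packages have been compiled with AVX since version 1.6. A quick check of the host's CPU flags, as a sketch assuming a Linux host that exposes `/proc/cpuinfo`:

```python
def cpu_has_avx(cpuinfo_text):
    """Return True if the first 'flags' line of /proc/cpuinfo-style text lists avx."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Exact token match, so "avx2" alone does not count as "avx".
            return "avx" in value.split()
    return False
```

Usage: `cpu_has_avx(open("/proc/cpuinfo").read())`. If it returns `False`, a TensorFlow build compiled without AVX would be needed on that machine.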

Index out of range when running sportflow

Describe the bug
I have just installed SportFlow and, following the README, I get an 'index out of range' error.

To Reproduce
Steps to reproduce the behavior:

Execute sflow -pdate 2016-03-01
See error
Traceback (most recent call last):
  File "/home/cif/anaconda3/bin/sflow", line 8, in <module>
    sys.exit(main())
  File "/home/cif/anaconda3/lib/python3.8/site-packages/alphapy/sport_flow.py", line 678, in main
    args = parser.parse_args()
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 1771, in parse_args
    self.error(msg % ' '.join(argv))
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 2519, in error
    self.print_usage(_sys.stderr)
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 2489, in print_usage
    self._print_message(self.format_usage(), file)
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 2455, in format_usage
    return formatter.format_help()
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 282, in format_help
    help = self._root_section.format_help()
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 213, in format_help
    item_help = join([func(*args) for func, args in self.items])
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 213, in <listcomp>
    item_help = join([func(*args) for func, args in self.items])
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 320, in _format_usage
    action_usage = format(optionals + positionals, groups)
  File "/home/cif/anaconda3/lib/python3.8/argparse.py", line 395, in _format_actions_usage
    start = actions.index(group._group_actions[0])
IndexError: list index out of range

Expected behavior
See sportflow output.

Desktop:

OS: Ubuntu 20.10
Browser: NA
Version: NA
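The traceback shows `parse_args` rejecting the command line (`-pdate` with a single dash is not a recognized option; the `mflow` example elsewhere on this page uses `--pdate`) and then crashing while printing the usage message, because `group._group_actions[0]` indexes an empty mutually exclusive group. So the `IndexError` masks an ordinary usage error. A sketch of a parser with a populated group, whose option names are hypothetical rather than SportFlow's actual definition, accepts the double-dash form:

```python
import argparse


def build_parser():
    """Hypothetical SportFlow-style parser; option names are illustrative."""
    parser = argparse.ArgumentParser(prog="sflow")
    # Each mutually exclusive group holds at least one option; usage formatting
    # in the traceback above failed while indexing an empty group.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--train", action="store_true", help="train a new model")
    mode.add_argument("--predict", action="store_true", help="predict with a saved model")
    parser.add_argument("--pdate", help="prediction date, YYYY-MM-DD")
    return parser
```

For example, `build_parser().parse_args(["--predict", "--pdate", "2016-03-01"])` sets `predict=True` and `pdate="2016-03-01"`.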
