
deep-trading's People

Contributors

rachnog


deep-trading's Issues

Slice indices issue in Simple time series forecasting.ipython

Hi Alex,

Thanks for sharing your code. I am new to Python and CNN. I found your post on Medium and would like to try out your code, logic, and approaches on my own Mac OS.
I am able to follow through your code in "Neural networks for algorithmic trading. Part One Simple time series forecasting.ipython" up to the following error.


TypeError Traceback (most recent call last)
in ()
6 X, Y = split_into_chunks(timeseries, TRAIN_SIZE, TARGET_TIME, LAG_SIZE, binary=False, scale=False)
7 X, Y = np.array(X), np.array(Y)
----> 8 X_train, X_test, Y_train, Y_test = create_Xt_Yt(X, Y, percentage=0.9)
9
10 Xp, Yp = split_into_chunks(timeseries, TRAIN_SIZE, TARGET_TIME, LAG_SIZE, binary=False, scale=False)

in create_Xt_Yt(X, y, percentage)
85 print ("len(X):", len(X))
86 print ("len(y):", len(y))
---> 87 X_train = X[0:len(X) * percentage]
88 Y_train = y[0:len(y) * percentage]
89

TypeError: slice indices must be integers or None or have an index method

Can you please help shed some light on this issue?

Thanks,
Kiko
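For what it's worth, the error comes from multiplying a length by a float: in Python 3, `len(X) * percentage` is a float, and slice bounds must be integers. A minimal sketch of the split with the cast added (assuming the 90/10 split from the notebook):

```python
import numpy as np

def create_Xt_Yt(X, y, percentage=0.9):
    # Slice bounds must be integers in Python 3, so cast the split point.
    split = int(len(X) * percentage)
    X_train, Y_train = X[:split], y[:split]
    X_test, Y_test = X[split:], y[split:]
    return X_train, X_test, Y_train, Y_test

X = np.arange(10)
y = np.arange(10)
X_train, X_test, Y_train, Y_test = create_Xt_Yt(X, y)
print(len(X_train), len(X_test))  # 9 1
```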

Need versions of python used along with libraries

Hi Alex, your experiments look very interesting. I would love to try and execute them on my own machine and follow your logic and methods. However, I am having some problems getting the Python code to execute. To give you a little background: I am also a software developer, only on the Microsoft stack, so programming does not intimidate me, but Python is fairly new to me. I've taken your code and tried to open it in PyCharm (JetBrains' Python editor) with limited luck, and I've been having a heck of a time getting all of the libraries downloaded, since there are a lot of unresolved references in your code. This is probably because I don't have the same module stack as you.

A couple of questions: what version of the Python interpreter are you using? And is there a Python scientific library set that you are using to execute your code? For example, I downloaded the "WinPython 64-bit" distribution, which contained most of the libraries that you reference; however, it seems to be missing some of them (e.g. indicators.py). If there isn't a specific library set, what are all of the libraries/modules you are using to run this?

Any nudge in the right direction will help.

Unable to execute code due to TypeError: 'generator' object is not subscriptable

Traceback (most recent call last):
File "C:\Users\alok.saw\Downloads\Deep-Trading-master\Deep-Trading-master\hyperparameters\hyper.py", line 164, in
best = fmin(experiment, space, algo=tpe.suggest, max_evals=50, trials=trials)
File "C:\Users\alok.saw\AppData\Local\Programs\Python\Python35\lib\site-packages\hyperopt\fmin.py", line 307, in fmin
return_argmin=return_argmin,
File "C:\Users\alok.saw\AppData\Local\Programs\Python\Python35\lib\site-packages\hyperopt\base.py", line 635, in fmin
return_argmin=return_argmin)
File "C:\Users\alok.saw\AppData\Local\Programs\Python\Python35\lib\site-packages\hyperopt\fmin.py", line 314, in fmin
pass_expr_memo_ctrl=pass_expr_memo_ctrl)
File "C:\Users\alok.saw\AppData\Local\Programs\Python\Python35\lib\site-packages\hyperopt\base.py", line 786, in init
pyll.toposort(self.expr)
File "C:\Users\alok.saw\AppData\Local\Programs\Python\Python35\lib\site-packages\hyperopt\pyll\base.py", line 715, in toposort
assert order[-1] == expr
TypeError: 'generator' object is not subscriptable
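This failure is usually a library-version mismatch rather than a bug in the repo: networkx 2.0 changed `topological_sort` to return a generator, which older hyperopt releases still index directly. The commonly reported workaround is pinning `networkx==1.11` (or upgrading hyperopt). The underlying Python behaviour can be reproduced without either library:

```python
# networkx >= 2.0 returns a generator from topological_sort(); indexing
# a generator raises exactly the error hyperopt's toposort() hits.
gen = (i for i in range(3))
try:
    gen[-1]
except TypeError as err:
    msg = str(err)

print(msg)  # 'generator' object is not subscriptable

# Materializing the generator first restores indexing:
order = list(i for i in range(3))
print(order[-1])  # 2
```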

Indexing error in multimodel/process_data.py

On line 170, you have

y_i = np.std(data_chng_train[i:i+window+forecast][3])

However, this chained indexing just pulls the single row at index 3 out of the 30-row window, not the 3rd column.
I think what you meant to write was:

y_i = np.std(data_chng_train[i:i+window+forecast, 3])

But I still don't really understand what you are trying to do with np.std() here.
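The difference is easy to verify on a small array: `[i:i+n][3]` chains two indexing operations (slice the rows, then take row 3 of the slice), while `[i:i+n, 3]` selects column 3 of the sliced rows:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Chained indexing: the slice keeps all 4 rows, then [3] picks row index 3.
print(a[0:4][3])   # [12 13 14 15]

# Tuple indexing: column 3 of every row in the slice.
print(a[0:4, 3])   # [ 3  7 11 15]
```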

StandardScaler() for OHLCV data

I'm working on a time-series classification on financial data (not regression, but similar).

I'm using sklearn.StandardScaler() although after reading all of your posts on Medium (thanks for the help!) I'm not entirely sure that I'm not screwing it up...

I'm doing something like this to create 'lagged' data for the time window I'm trying to classify:

def lag_data(df_data):
    features_to_add = []
    for each in channels:
        features_to_add.append(pd.concat(
            [df_data[[each]].shift(i).add_prefix("lag_{}_".format(i))
             for i in range((lookforward * -1), lookback)], axis=1))
    return pd.concat(features_to_add, axis=1)

from sklearn.preprocessing import StandardScaler
# Instantiate scaler
scaler = StandardScaler()
# Scale the dataframe
df_scaled = pd.DataFrame(scaler.fit_transform(OHLCV_data_with_lag.values))

And then like this to prepare for Conv1D in Keras:


def X_to_Conv1D_arrays(X):
    # Convert X to 3D arrays
    X = np.array(X)
    
    # Reshape data for Conv1D
    X = X.reshape(X.shape[0], X.shape[1], 1)
    
    print("X: ", X.shape)
    print("X: ", type(X))

    return X

I've gotten some decent accuracy, but I'm wondering if this is a faulty way to prepare the data...
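On the Conv1D side, the reshape itself is mechanically fine: it turns a (samples, features) matrix into (samples, steps, 1), which is what Keras' Conv1D expects with input_shape=(steps, 1). Whether it is meaningful depends on the column order after lagging, since the convolution slides across the lagged features as if they were consecutive time steps. A quick shape check (the sizes here are made up):

```python
import numpy as np

# Hypothetical sizes: 50 windows, 120 lagged features each.
X = np.zeros((50, 120))
X3d = X.reshape(X.shape[0], X.shape[1], 1)
print(X3d.shape)  # (50, 120, 1) -> (samples, steps, channels) for Conv1D
```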

Versions of dependent libraries needed - keras-2.0.8 fails

Could you please provide a list of the third-party libraries required to run your code? I installed, for example, keras-2.0.8, but it no longer provides Graph, and running training.py fails.

Tims-MBP-2:simple_forecasting cmt$ python training.py
Using TensorFlow backend.
Traceback (most recent call last):
File "training.py", line 15, in
from keras.models import Sequential, Graph
ImportError: cannot import name Graph
Tims-MBP-2:simple_forecasting cmt$
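For reference, `Graph` was removed when Keras reached 1.0; code importing it needs either a pre-1.0 Keras or a port to the functional API. Below is a minimal sketch of a functional-API multi-input model replacing the old Graph container (the layer sizes are placeholders, not the repo's actual architecture, and it assumes a Keras 2.x install):

```python
from keras.models import Model
from keras.layers import Input, Dense, concatenate

# Two input branches merged into one output, as Graph used to express.
a = Input(shape=(20,))
b = Input(shape=(20,))
merged = concatenate([Dense(32, activation='relu')(x) for x in (a, b)])
out = Dense(1)(merged)
model = Model(inputs=[a, b], outputs=out)
```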

Graphs not showing the training/test data

plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='best')
plt.show()

plt.figure()
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='best')
plt.show()

^ for me the above cell is not producing data on the graphs, just 2 blank plots. I was wondering what I am doing wrong?
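Blank axes usually mean the plotted lists are empty rather than the plotting code being wrong. A first diagnostic (a guess, since the notebook state isn't shown) is to inspect what `history.history` actually contains: `val_loss`/`val_acc` exist only when validation data was passed to `fit()`, and `acc` only when the model was compiled with `metrics=['accuracy']`:

```python
# Stand-in for the dict returned by model.fit().history; inspect it
# before plotting to see which series exist and whether they are empty.
history_dict = {'loss': [0.9, 0.5], 'acc': [0.6, 0.7]}

for key in ('loss', 'val_loss', 'acc', 'val_acc'):
    values = history_dict.get(key, [])
    print(key, len(values))  # a length of 0 plots as a blank line
```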

Wrong data preprocessing leading to better results | Multimodal project

Hi Alex,
I really like your tutorials and used them as a good example for starting my own projects ;) but I think there is a major error in the preprocessing performed by the split_into_XY function in the process_data module of the multimodal project.

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i:i+window+forecast][3])

With the above code generating the regression labels, the training data contain the labels!
In general, the idea behind it isn't clear to me.
First, the code should be replaced with (that's for sure):

x_i = data_chng_train[i:i+window]
y_i = np.std(data_chng_train[i+window+forecast])

But on the other hand, I don't understand why you are using the standard deviation along that axis.
Shouldn't it be:

x_i = data_chng_train[i:i+window]
y_i = data_chng_train[i+window+forecast][3]  # Using the close price [3] as the label

Then, obviously, all the results change substantially and get worse:
[figure_1: results plot]
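To make the point concrete, here is a sketch of a windowing function where the label lies strictly after the feature window (column 3 taken as the close price, matching the snippet above). This is one reading of the intended behaviour, not the repo's actual code:

```python
import numpy as np

def split_into_XY(data, window, forecast, price_col=3):
    # Features: rows i .. i+window-1. Label: the close price `forecast`
    # steps after the window ends -- the label row is never inside X.
    X, Y = [], []
    for i in range(len(data) - window - forecast):
        X.append(data[i:i + window])
        Y.append(data[i + window + forecast][price_col])
    return np.array(X), np.array(Y)

data = np.arange(40.0).reshape(10, 4)  # 10 rows of fake OHLC data
X, Y = split_into_XY(data, window=3, forecast=2)
print(X.shape, Y.shape)  # (5, 3, 4) (5,)
```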

simple_forecasting: table.csv is opened as a binary file

I'm not sure why no one has reported this before, which makes me wonder if the problem is on my end. I got stuck trying to plot the very first chart, and it took some investigation to determine that load_snp_close() was quietly failing to read any data because of a swallowed exception: TypeError: a bytes-like object is required, not 'str'.

I had to change
f = open('table.csv', 'rb').readlines()[1:]
to
f = open('table.csv', 'r').readlines()[1:]

After that, the data was read as expected.

License is missing or I can't find it

Would you mind adding a license to the code?
It's not advisable (in finance) to touch or derive from code without an explicit license.

Thank you for your effort
P

FileNotFoundError: File b'AAPL.csv' does not exist

When I run habrahabr.ipynb, I get an error saying b'AAPL.csv' does not exist.
Could you share the 'AAPL.csv' file? Thanks.

FileNotFoundError Traceback (most recent call last)
in ()
----> 1 data = pd.read_csv('AAPL.csv')[::-1]
2 data = data.ix[:, 'Adj Close'].tolist()
3
4 # Uncomment below to use price change time series
5 # data = data.ix[:, 'Adj Close'].pct_change().dropna().tolist()

... (pandas internals: parsers.py parser_f -> _read -> TextFileReader -> CParserWrapper -> pandas._libs.parsers.TextReader) ...

FileNotFoundError: File b'AAPL.csv' does not exist
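The notebook expects a Yahoo-Finance-style CSV next to it, and the repo doesn't ship one. If the original file can't be recovered, a stand-in with the `Adj Close` column the notebook reads at least lets the code run (entirely synthetic data, for unblocking only):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for AAPL.csv with the column the notebook reads.
dates = pd.date_range('2015-01-01', periods=250)
prices = 100 + np.cumsum(np.random.randn(250))
pd.DataFrame({'Date': dates, 'Adj Close': prices}).to_csv('AAPL.csv', index=False)

data = pd.read_csv('AAPL.csv')[::-1]
series = data['Adj Close'].tolist()
print(len(series))  # 250
```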

Standardization of test dataset uses future values (conceptual error)

Hi Alex. Thanks for giving out your code, it's a very good example.

I've read your medium post, checked your code and re-coded the data pre-processing functions (and some other parts) by myself to use different standardization methods and check for possible explanations for the exceptional results you've obtained.

I think you may have committed a conceptual error by writing
if scale: timeseries = preprocessing.scale(timeseries)
https://github.com/Rachnog/Deep-Trading/blob/master/simple_forecasting/processing.py#L65

It's a good idea to standardize the data for each sliding-window sample using all and only the data inside the window (instead of methods like scenario 3 at http://sebastianraschka.com/faq/docs/scale-training-test.html). This is fine for the train dataset but not for the test dataset (your standardization method is similar, in some respects, to scenario 2 at the above URL): for the test set you shouldn't use the full window, only the data for X_test, not for Y_test. This is because Y_test isn't known at test time, so you cannot use it when calculating the mean and std that normalize the full sliding-window sample.

Taking that single number out of the mean/std calculation takes away the astonishing results of the network.

I've tried different standardization methods, and the only one reproducing the exceptional results you've obtained is the method you used. The other standardization methods I coded work fine but give results far from yours.

My simple explanation is that "hiding" the information of Y_test(t) inside the mean and std of each window (then used to standardize the [X_test(t), Y_test(t)] sample) is enough to let the neural network reconstruct Y_test almost perfectly: X_test_std (standardized with future information) goes in as input, and the inverse standardization with the stored mean and std (also computed with future information) does the rest.

Please, let me know your thoughts.
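The fix this report implies can be sketched as scaling each window with statistics computed from the inputs only, then reusing those statistics for the label and for inverting predictions. This is a generic sketch, not the repo's code:

```python
import numpy as np

def scale_window(x_window, y_value):
    # Mean/std come from the feature window only; y is transformed with
    # the same statistics but never contributes to them.
    mu, sigma = x_window.mean(), x_window.std()
    return (x_window - mu) / sigma, (y_value - mu) / sigma, (mu, sigma)

x = np.array([1.0, 2.0, 3.0, 4.0])
x_std, y_std, (mu, sigma) = scale_window(x, 5.0)

# Invert a prediction with the stored statistics:
print(round(y_std * sigma + mu, 6))  # 5.0
```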

Input shape for simple_forecasting

I'm using Python 3 and getting a dimension mismatch in Keras:
Exception: Error when checking model input: expected dense_input_1 to have shape (None, 20) but got array with shape (0, 1)
Can you please tell me what the input to the model should be?

This is my output using model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_37 (Dense)                 (None, 500)           10500       dense_input_13[0][0]             
____________________________________________________________________________________________________
activation_30 (Activation)       (None, 500)           0           dense_37[0][0]                   
____________________________________________________________________________________________________
dropout_11 (Dropout)             (None, 500)           0           activation_30[0][0]              
____________________________________________________________________________________________________
dense_38 (Dense)                 (None, 250)           125250      dropout_11[0][0]                 
____________________________________________________________________________________________________
activation_31 (Activation)       (None, 250)           0           dense_38[0][0]                   
____________________________________________________________________________________________________
dense_39 (Dense)                 (None, 1)             251         activation_31[0][0]              
____________________________________________________________________________________________________
activation_32 (Activation)       (None, 1)             0           dense_39[0][0]                   
====================================================================================================
Total params: 136001
____________________________________________________________________________________________________
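The summary shows the first Dense layer consuming 20 inputs (10500 params = 20 × 500 weights + 500 biases), so the model wants X of shape (n_samples, 20). An input reported as (0, 1) means zero samples reached fit(), which usually points at the windowing or data-loading step producing an empty list rather than at the model itself. A quick check of the expected shape (sizes inferred from the summary above):

```python
import numpy as np

# 20 features per sample, as implied by 10500 = 20*500 + 500.
X_train = np.random.randn(100, 20)
Y_train = np.random.randn(100, 1)
print(X_train.shape)  # (100, 20) -- what dense_input expects
assert X_train.shape[0] > 0, "an empty X usually means data loading failed"
```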
