seanny123 / da-rnn

Dual-Stage Attention-Based Recurrent Neural Net for Time Series Prediction

Python 100.00%

deep-learning forecasting neural-networks pytorch

da-rnn's Introduction

Hey! I'm Sean Aubin!

I'm a punk trained in Electrical Engineering and Theoretical Neuroscience, which means I'm a data generalist competent in Machine Learning. I have a blog that usually updates once a year.

Further details about my professional experience are (unfortunately) on LinkedIn.

Feel free to contact me via email or LinkedIn. I'm trying to avoid using Twitter and have not yet adopted another social network.

Computery Interests

Dynamicland: Making computing collaborative and humane by embodying it in the world.
Zig: Replacing C for low-level computation.
Oils: An upgrade path from Bash to a better shell language.

Other Interests

Social System Design: Designing systems (both digital and in-person) which allow for meaningful collaborations.

AFK

I run the Toronto Applied Rationality Meetup where we discuss complex topics with compassion, nuance, and mindfulness.

da-rnn's People

Stargazers

Watchers

Forkers

callingwisdom vgoklani chriszhangpodo isaacmg shubhampachori12110095 vpquant selvamshan metricle motjumi aryancodify vanducdoan guanhuazhan whidbey taogeanton2 zhangjiekui cxz llmlin tiara424 josephstalin117 jtiscione emrul goshaq matthiasbouquet jessicajiangdx mahaalkh himikk bballamudi ahmedengu thetrueharvey sunghoonyang wildcat47 arita37 vfp1 mindis futuregoingon rezaarmand nintharacha concenterate crystal22 joss13aws binguidata jjd209 htw5295 wolfhu kdcro101 entrpn shreeeyash xianyuhe cczhgit dgai91 jerudamaja vcjy2017 sahanduiuc qshan2170 arithmeticjia lucky7chess deep-learning-trader dawnywu ringwraith rainerheintzmann gitouyou maurice1979 royshan chen-lc cherrypiecoco tdg2088 gre-examination zarasmn goldenwalden jianpei-w karumanchi yvonnelu dan-r95 2017wxyzwxyz vmbbc jiepingwu jangdonghae yfulin prajwalthakur gurami85 valeman rovedream quentinambard huqy wtwong316 aramnasser quanthao kurucan yujiandiao freshklauser yiyg510 fancy1573 hello-starry tripleess passion4energy zelinwang123 afcarl asasapd jimmyiskandar bigandsweet

da-rnn's Issues

Can't find tanh function in eqn. 8

As the title says, I can't find the tanh function.

It's weird that this code can only perform well when predicting 'NDX'

I found that this da-rnn can only predict 'NDX'.
If I try to use it to predict other columns such as 'YHOO' or 'XLNX', the results are bad.
And here is the weirdest thing: I modified it to do single-variable prediction (e.g. using 'NDX' as both the input and the target), and it still only works well on 'NDX'. The results on other stocks are bad (the loss doesn't decrease).
Can this be explained?

the performance on 'NDX'

the performance on 'AAL'

Data overlapping in train/test split

The current version of the predict function creates overlapping first-element indexes between the train and test batches of the X and y_history tensors: the last item of X in train is the first item of X in test. Also, because of the gap between y_hist and y_targ mentioned in issue #4, one sequence is missing from the last chunk of the split y_pred. For example, with a dummy dataset whose targets are the numbers 1 to 60, the last item in the last batch would be 58 with y_targ = [60], which leaves the time window containing 59 out.
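
A minimal sketch of one way to index the windows so that train and test targets neither overlap nor leave a gap. The function names, the window length T and the split fraction are illustrative assumptions, not the repository's code. Each window pairs T - 1 history steps with the target that immediately follows them.

```python
def window_indices(n_rows, T):
    """Return (history_start, target_index) pairs.

    Each window uses rows history_start .. target_index - 1 as history
    (T - 1 steps) and predicts the row at target_index.
    """
    return [(t - (T - 1), t) for t in range(T - 1, n_rows)]


def split_windows(n_rows, T, train_frac):
    """Split the windows chronologically into train and test lists."""
    windows = window_indices(n_rows, T)
    n_train = int(len(windows) * train_frac)
    return windows[:n_train], windows[n_train:]


# Dummy dataset of 60 rows, as in the example above (0-based indexes here).
train, test = split_windows(n_rows=60, T=10, train_frac=0.7)
train_targets = [t for _, t in train]
test_targets = [t for _, t in test]
```

Test windows still read history rows that fall inside the training period. That is expected for inputs; only the targets need to be disjoint.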

FileNotFoundError: [Errno 2] No such file or directory: '/da-rnn/plots/pred_0.png'

Why? The code itself seems to have no problem.
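
One plausible cause, assuming the script writes into a plots directory that does not exist in a fresh checkout, is that the directory has to be created before anything is saved. The sketch below uses a hypothetical helper and placeholder bytes instead of a real figure to show the idea with os.makedirs.

```python
import os
import tempfile


def save_plot_bytes(payload, directory, filename):
    """Create the output directory if needed, then write the file."""
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    path = os.path.join(directory, filename)
    with open(path, "wb") as fh:
        fh.write(payload)
    return path


# Demonstration in a temporary location; "plots" does not exist beforehand.
with tempfile.TemporaryDirectory() as tmp:
    out_path = save_plot_bytes(b"placeholder", os.path.join(tmp, "plots"), "pred_0.png")
    created = os.path.exists(out_path)
```

Creating a plots folder by hand next to main.py should have the same effect.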

Error using CUDA

I get this error when trying to use the code with GPU (it works fine with CPU):

2019-05-14 19:51:48,675 - VOC_TOPICS - INFO - Shape of data: (40560, 82).
Missing in data: 0.
2019-05-14 19:51:48,785 - VOC_TOPICS - INFO - Training size: 28392.
2019-05-14 19:51:51,329 - VOC_TOPICS - INFO - Iterations per epoch: 221.812 ~ 222.
Traceback (most recent call last):
  File "main.py", line 196, in <module>
    iter_loss, epoch_loss = train(model, data, config, n_epochs=10, save_plots=save_plots)
  File "main.py", line 84, in train
    loss = train_iteration(net, t_cfg.loss_func, feats, y_history, y_target)
  File "main.py", line 143, in train_iteration
    input_weighted, input_encoded = t_net.encoder(numpy_to_tvar(X))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/da-rnn/da-rnn/modules.py", line 43, in forward
    input_data.permute(0, 2, 1)), dim=2)  # batch_size * input_size * (2*hidden_size + T - 1)
RuntimeError: Expected object of backend CPU but got backend CUDA for sequence element 2 in sequence argument at position #1 'tensors'
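
The message says that one element of the list passed to torch.cat lives on CUDA while the others are on the CPU. A minimal sketch of the usual remedy follows, assuming the hidden and cell states are created on the CPU while the input batch is on the GPU. The tensor names and shapes here are made up for illustration.

```python
import torch

# Use the GPU when present; the sketch also runs on a CPU-only machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def cat_on_device(tensors, dim, device):
    """Move every tensor to the same device before concatenating."""
    return torch.cat([t.to(device) for t in tensors], dim=dim)


hidden = torch.zeros(4, 3, 8)            # created on the CPU by default
cell = torch.zeros(4, 3, 8)              # created on the CPU by default
inputs = torch.ones(4, 3, 5).to(device)  # batch already on the target device

combined = cat_on_device([hidden, cell, inputs], dim=2, device=device)
```

Creating the state tensors directly on the right device, for example with torch.zeros(..., device=device), avoids the copy altogether.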

RuntimeError

win7 64bit, python 3.7.0, cuda 9

Warning (from warnings module):
File "E:\python370\lib\site-packages\sklearn\externals\joblib\externals\cloudpickle\cloudpickle.py", line 47
import imp
DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
2018-11-06 22:37:50,303 - VOC_TOPICS - INFO - Using computation device: cuda:0
2018-11-06 22:37:52,505 - VOC_TOPICS - INFO - Shape of data: (40560, 82).
Missing in data: 0.
2018-11-06 22:37:52,637 - VOC_TOPICS - INFO - Training size: 28392.
2018-11-06 22:38:02,391 - VOC_TOPICS - INFO - Iterations per epoch: 221.812 ~ 222.
Traceback (most recent call last):
  File "E:\python370\Scripts\da-rnn-master\main.py", line 196, in <module>
    iter_loss, epoch_loss = train(model, data, config, n_epochs=10, save_plots=save_plots)
  File "E:\python370\Scripts\da-rnn-master\main.py", line 84, in train
    loss = train_iteration(net, t_cfg.loss_func, feats, y_history, y_target)
  File "E:\python370\Scripts\da-rnn-master\main.py", line 143, in train_iteration
    input_weighted, input_encoded = t_net.encoder(numpy_to_tvar(X))
  File "E:\python370\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\python370\Scripts\da-rnn-master\modules.py", line 43, in forward
    input_data.permute(0, 2, 1)), dim=2)  # batch_size * input_size * (2*hidden_size + T - 1)
RuntimeError: Expected a Tensor of type torch.FloatTensor but found a type torch.cuda.FloatTensor for sequence element 2 in sequence argument at position #1 'tensors'

The resulting values differ from the raw data because of StandardScaler(). How can I get the plots and calculate the MSE using raw data?
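
A small sketch of the arithmetic involved, in plain Python: standardization is undone by multiplying by the standard deviation and adding the mean, after which the MSE can be computed in raw units. The numbers and helper names are made up for illustration. With scikit-learn, the same mean and standard deviation are available on a fitted StandardScaler as mean_ and scale_, and inverse_transform applies the reverse mapping to all fitted columns at once.

```python
def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


def unstandardize(values, mean, std):
    """Map scaled values back to raw units."""
    return [v * std + mean for v in values]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


raw_true = [100.0, 102.0, 104.0, 106.0]
mean = sum(raw_true) / len(raw_true)
std = (sum((v - mean) ** 2 for v in raw_true) / len(raw_true)) ** 0.5

scaled_true = standardize(raw_true, mean, std)
scaled_pred = [v + 0.1 for v in scaled_true]  # stand-in for model output

raw_pred = unstandardize(scaled_pred, mean, std)
mse_scaled = mse(scaled_true, scaled_pred)
mse_raw = mse(raw_true, raw_pred)  # equals mse_scaled * std ** 2
```

For a single target column, the raw-unit MSE is the scaled MSE multiplied by the squared standard deviation of that column.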

Regarding scaling of data

I have seen that standardscaler.fit(X) is used, which scales the entire dataset. But the usual practice is to fit on the training data and apply the same mean to the test and validation sets. I am new to this field and don't know how to preprocess time series data. Kindly reply.
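
The practice described above can be sketched in plain Python as follows: column statistics are computed on the chronologically first rows only and then reused for the later rows. The data, split point and helper names are illustrative assumptions. With scikit-learn the equivalent is to call fit on the training rows and transform on the rest.

```python
def column_stats(rows):
    """Per-column mean and (population) standard deviation."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 or 1.0)  # fall back to 1.0 for constant columns
    return means, stds


def scale(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
train_size = 4  # chronological split: no shuffling for time series
train_rows, test_rows = data[:train_size], data[train_size:]

means, stds = column_stats(train_rows)         # statistics from train only
train_scaled = scale(train_rows, means, stds)
test_scaled = scale(test_rows, means, stds)    # reuse the train statistics
```

Test values can then fall outside the range seen in training, which is the honest picture of what the model faces at prediction time.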

Is there any room for gpu memory improvements?

Hi, it looks like the loop inside the decoder consumes too much gpu memory very quickly if you try to increase history size because we keep tracking hidden and cell units. I wonder if there is room for any improvements. Would it kill the purpose of the model if we detach some things somehow?

Does this model generalize well on your (other) datasets?

Thank you very much for the implementation. I wonder whether anyone has applied this method to other datasets, and how it performed?

When I apply this method to my datasets (a traffic dataset and a disease dataset), I see two problems: 1. The loss is very large, i.e., the model cannot learn the pattern and does much worse than a vanilla LSTM, which is weird. 2. In some cases, the validation loss drops quickly but then increases explosively (within 1-2 epochs).

I have tried a min-max scaler and gradient norm clipping to address the problem, but neither works. Because the encoder uses the whole sequence for attention, T cannot be too large, which limits the information fed into the model. I still think this model is hard to tune on other datasets. Does anyone have a similar experience?

Regarding predicting the output

The paper treats this as a NARX problem, so the predicted value at one time step should be used for the next prediction. But that is not done here. When the code is used as is, the predictions are very good. But when I modified the code to use predicted values for the next prediction, it does not give good results. According to the paper, their RMSE is very low even in this setting, yet I couldn't find any difference between this implementation and the paper.

I got an error when running main_predict.py after running main.py successfully

Traceback (most recent call last):
  File "D:/graduate/Code/Research/DA_RNN/Seanny123/main_predict.py", line 74, in <module>
    final_y_pred = predict(enc, dec, data, **da_rnn_kwargs)
  File "D:/graduate/Code/Research/DA_RNN/Seanny123/main_predict.py", line 49, in predict
    y_pred[y_slc] = decoder(input_encoded, y_history).cpu().data.numpy()
  File "B:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\graduate\Code\Research\DA_RNN\Seanny123\modules.py", line 108, in forward
    y_tilde = self.fc(torch.cat((context, y_history[:, t]), dim=1))  # (batch_size, out_size)
  File "B:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "B:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "B:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1369, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [128 x 40624], m2: [65 x 1] at C:\w\1\s\tmp_conda_3.7_021303\conda\conda-bld\pytorch_1565316900252\work\aten\src\TH/generic/THTensorMath.cpp:752
I have read the similar issue #2, but there is no "unsqueeze(1)" in the newest code.
I tried adding unsqueeze(1) back at the old location (as Chandler Zuo did), but then neither main.py nor main_predict.py runs correctly.
Could someone tell me how I can solve it, please?

Thank you.

NaN issues when changing dataset.

test.csv.zip

When trying to change the dataset to a different one (in this case one that is generated randomly within certain ranges, see attached) the encoding step produces only NaN values.

Why not use tanh in the encoder when it is used in the decoder?

First, thanks for your code; it really helps me understand the paper.
But when I debugged the code, I found that in modules.py seanny uses tanh in the decoder but omits it in the encoder, whereas in the paper formulas 8 and 12 both use tanh to calculate part of the attention weight.
I don't know why. Can anybody offer some help? Thanks in advance!
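
For reference, here is a plain-Python sketch of an additive attention score of the form the paper's formula 8 describes, as I read it: a linear map of the previous hidden and cell state plus a linear map of each driving series, passed through tanh, projected to a scalar, then normalized with a softmax. The weights, sizes and the use_tanh switch are illustrative assumptions and not the repository's modules.py.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attention_weights(W, U, v, hidden_cell, series, use_tanh=True):
    """Softmax over one score per driving series.

    score_k = v . tanh(W @ hidden_cell + U @ series[k])
    """
    base = matvec(W, hidden_cell)
    scores = []
    for x_k in series:
        pre = [b + u for b, u in zip(base, matvec(U, x_k))]
        if use_tanh:
            pre = [math.tanh(p) for p in pre]
        scores.append(sum(vi * pi for vi, pi in zip(v, pre)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


W = [[0.5, -0.2], [0.1, 0.3]]
U = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
v = [1.0, 1.0]
hidden_cell = [1.0, 2.0]
series = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]  # two driving series, T = 3

with_tanh = attention_weights(W, U, v, hidden_cell, series, use_tanh=True)
without_tanh = attention_weights(W, U, v, hidden_cell, series, use_tanh=False)
```

In this toy case tanh bounds the scores, so the weights stay less peaked than without it. Omitting tanh is therefore a real modelling difference rather than a cosmetic one.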

Why use companies' stock prices to predict the NASDAQ-100 Index?

In main.py, line 189, the code that loads the data is shown below. Why do you use companies' stock prices to predict the NASDAQ-100 Index?

raw_data = pd.read_csv(os.path.join("data", "nasdaq100_padding.csv"), nrows=100 if debug else None)
logger.info(f"Shape of data: {raw_data.shape}.\nMissing in data: {raw_data.isnull().sum().sum()}.")
targ_cols = ("NDX",)
data, scaler = preprocess_data(raw_data, targ_cols)

NDX is calculated from these stock prices, isn't it? Why do you need an RNN to learn the calculation formula?
The DA-RNN paper presents a time series prediction model, right? But where is the time series prediction in your code? I am confused.

That's what I found after reading the code repeatedly. If I got something wrong or missed something, please tell me.
Thank you.

Error while executing main_predict.py

I got this error just trying to run the code from GitHub:
Traceback (most recent call last):
  File "main_predict.py", line 76, in <module>
    final_y_pred = predict(enc, dec, data, **da_rnn_kwargs)
  File "main_predict.py", line 51, in predict
    y_pred[y_slc] = decoder(input_encoded, y_history).cpu().data.numpy()
  File "/home/halcyon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/halcyon/darnn/modules.py", line 106, in forward
    y_tilde = self.fc(torch.cat((context, y_history[:, t]), dim=1))  # (batch_size, out_size)
  File "/home/halcyon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/halcyon/anaconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/halcyon/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1024, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [128 x 40624], m2: [65 x 1] at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/THC/generic/THCTensorMathBlas.cu:249
Running main.py gives no error, so it looks like the tensor sizes somehow get mismatched at the prediction stage.
The only modification I made to the code was adding this line at the beginning:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
Without that line I got a 'type mismatch' error from torch.
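
One way to read the numbers in the size mismatch, offered as an inference rather than a confirmed diagnosis: m2 is [65 x 1], so the final linear layer expects 65 input features. If one of those is the single target value, the context vector accounts for 64, and the remaining columns of m1 must come from y_history[:, t]. The helper below is hypothetical and only does that subtraction.

```python
def implied_shapes(m1_cols, fc_in_features, n_targets=1):
    """Split the concatenated width into context and y_history parts.

    Assumes the layer input is cat(context, y_history[:, t]) and that the
    layer was built for `n_targets` target columns.
    """
    context_size = fc_in_features - n_targets
    history_cols = m1_cols - context_size
    return context_size, history_cols


context_size, history_cols = implied_shapes(m1_cols=40624, fc_in_features=65)
```

The result for y_history[:, t] is 40560 columns where 1 is expected. That figure equals the row count in the "Shape of data: (40560, 82)" log lines quoted in other issues on this page, which hints that the target array reaches the decoder with its time axis in the wrong position during prediction.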

tensor problem

Hi, I got this error:
RuntimeError: All input tensors must be on the same device. Received cpu and cuda:0

Any idea?

Are current values of the external factors used for prediction in your code?

I am confused about Chandler's change to the code. In his blog he writes: "Unlike the experiment presented in the paper, which uses the contemporary values of exogenous factors to predict the target variable, I exclude them." Does this mean that, instead of feeding in the contemporaneous external factors when predicting the target series, he uses only the past values of all external series? In other words, are the current values of the external factors not used for prediction in his code?

Multi-Step Prediction

With the current implementation (afaik) only a single step is predicted. A modification of the decoder part to support multi-step prediction would be great.
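
Without touching the decoder, one generic workaround is a recursive rollout: predict one step, append the prediction to the history window, and repeat. The sketch below uses a toy stand-in for the one-step model and ignores the exogenous driving series, which a real DA-RNN would also need for every future step. All names here are hypothetical.

```python
from collections import deque


def rollout(one_step, y_history, n_steps):
    """Feed each prediction back in as the newest history value."""
    window = deque(y_history, maxlen=len(y_history))  # fixed-length window
    preds = []
    for _ in range(n_steps):
        y_next = one_step(list(window))
        preds.append(y_next)
        window.append(y_next)  # the oldest value drops out automatically
    return preds


def toy_model(history):
    """Stand-in predictor: continue the last observed difference."""
    return history[-1] + (history[-1] - history[-2])


preds = rollout(toy_model, [1.0, 2.0, 3.0], n_steps=3)
```

Errors compound in this scheme, since every step consumes earlier predictions, so accuracy usually degrades as the horizon grows.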

How many epochs should I choose?

I noticed that Seanny used 10 epochs in this project, but the training loss is still falling at that point.
So I want to ask: how many epochs should I choose for this project?
If I train on my data for 200 epochs, it takes quite a long time. How can I speed it up? If you know how to switch this project from CPU to GPU, could you please reply in detail?
Thank you in advance!
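
A common alternative to fixing the epoch count in advance is early stopping on a validation loss: keep training while the validation loss improves and stop once it has not improved for a set number of epochs. The sketch below shows only the stopping rule on a made-up list of losses. The function name, patience and min_delta are illustrative assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

In a real training loop the model weights from best_epoch would be saved and restored after stopping.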

Evaluation mode missing on validation and predict

This section in predict()

da-rnn/main.py

Lines 178 to 181 in 8585806

y_history = numpy_to_tvar(y_history)
_, input_encoded = t_net.encoder(numpy_to_tvar(X))
y_pred[y_slc] = t_net.decoder(input_encoded, y_history).cpu().data.numpy()

should be changed to

with torch.no_grad():
  y_history = numpy_to_tvar(y_history)
  _, input_encoded = t_net.encoder(numpy_to_tvar(X))
  y_pred[y_slc] = t_net.decoder(input_encoded, y_history).cpu().data.numpy()

This is required to disable gradient calculation during validation / prediction in the training loop. From my understanding, if it is not included, the weights will be updated on each validation / prediction iteration.
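
A small, self-contained sketch of the two switches involved, using a toy model rather than the repository's encoder and decoder. As far as I know, weights only change when an optimizer step runs, so a forward pass alone does not update them. What torch.no_grad() saves is the memory and time spent recording the autograd graph, while model.eval() turns off training-only behaviour such as dropout.

```python
import torch
from torch import nn

# Toy model with dropout so that eval mode makes a visible difference.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.randn(16, 4)
before = [p.detach().clone() for p in model.parameters()]

model.eval()             # dropout becomes a no-op
with torch.no_grad():    # no autograd graph is recorded
    y_a = model(x)
    y_b = model(x)
model.train()            # switch back before training continues

unchanged = all(torch.equal(b, p) for b, p in zip(before, model.parameters()))
```

In eval mode the two forward passes give identical outputs, and the outputs carry no gradient history.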

This is original final_prediction output

This is with grad turned off during validation / prediction.

one of the variables needed for gradient computation has been modified by an inplace operation

When I run main.py, the program fails with the error 'one of the variables needed for gradient computation has been modified by an inplace operation'. I am trying to modify an item with +=, but it doesn't work. Can someone help solve this problem?