zalandoresearch / pytorch-ts

PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend

License: MIT License

Python 100.00%

pytorch time-series probabilistic deepar lstnet n-beats

pytorch-ts's Issues

Cannot import because of tensorboard error, yet tensorboard is installed

I have a fresh install of pytorchts in its own environment on my Ubuntu 18 machine, but I am unable to import it because of a TensorBoard logging error, even though I have TensorBoard installed.

Tensorboard, PyTorch, and PyTorchTS versions

(timeseries) amruch@wit:~/graphika/SBIR_COVID$ conda list | grep "board"
tensorboard               2.2.1              pyh532a8cf_0
tensorboard-plugin-wit    1.6.0                      py_0
(timeseries) amruch@wit:~/graphika/SBIR_COVID$ conda list | grep "torch"
pytorch                   1.6.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
pytorchts                 0.2.0                    pypi_0    pypi
torchvision               0.7.0                py36_cu101    pytorch

Error thrown when importing various pts modules:

(timeseries) amruch@wit:~/graphika/SBIR_COVID$ python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pts.dataset import ListDataset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/__init__.py", line 6, in <module>
    from .trainer import Trainer
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/trainer.py", line 7, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above
>>> from pts.model.deepar import DeepAREstimator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/__init__.py", line 6, in <module>
    from .trainer import Trainer
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/trainer.py", line 7, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above

It seems to me that preventing any kind of import due to tensorboard is a bit overkill. Why not just throw a warning that progress/results cannot be logged?
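
The warning-instead-of-failure behaviour suggested above could look roughly like the sketch below. It is not the pts implementation: `optional_import` and `make_writer` are hypothetical helper names, and the only real import path used is `torch.utils.tensorboard.SummaryWriter`, taken from the traceback.

```python
import importlib
import warnings


def optional_import(module_name, attr_name):
    """Return module_name.attr_name, or None (with a warning) if it cannot be loaded."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError) as exc:
        warnings.warn(
            f"{module_name}.{attr_name} is unavailable ({exc}); logging is disabled."
        )
        return None


def make_writer(log_dir):
    """Hypothetical trainer helper: build a writer only if TensorBoard imports cleanly."""
    writer_cls = optional_import("torch.utils.tensorboard", "SummaryWriter")
    return writer_cls(log_dir) if writer_cls is not None else None
```

With this pattern a trainer would check `if writer is not None` before logging, so a broken TensorBoard install would cost the logs but not the import.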

DeepVAR: name 'SetField' is not defined

Description

I am trying to use DeepVAREstimator from the issue-3 branch, but training throws NameError: name 'SetField' is not defined.

To Reproduce

import torch
from pts import Trainer
from pts.model.deepvar import DeepVAREstimator
from pts.transform import SetField

# pred_h, target_dim, card_static and train_ds are defined earlier in the reporter's notebook.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trainer = Trainer(device = device, epochs = 10)

estimator = DeepVAREstimator(input_size = 401,
                             freq = "1M",
                             prediction_length = pred_h,
                             context_length = pred_h*2,
                             target_dim = target_dim,
                             use_feat_static_cat = True,
                             cardinality = card_static,
                             trainer = trainer)
predictor = estimator.train(training_data = train_ds)

Error message output

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-27-375c015eb18b> in <module>
     20                              # time_features = feat_dynamic_real_train,
     21                              trainer = trainer)                              
---> 22 predictor = estimator.train(training_data = train_ds)
     23 predictor.__dict__["prediction_net"]

~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/estimator.py in train(self, training_data)
    132 
    133     def train(self, training_data: Dataset) -> Predictor:
--> 134         return self.train_model(training_data).predictor

~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/estimator.py in train_model(self, training_data)
     98 
     99     def train_model(self, training_data: Dataset) -> TrainOutput:
--> 100         transformation = self.create_transformation()
    101         transformation.estimate(iter(training_data))
    102 

~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/deepvar/deepvar_estimator.py in create_transformation(self)
    154                 else []
    155             )
--> 156             + [
    157                 AsNumpyArray(
    158                     field=FieldName.FEAT_STATIC_CAT, expected_ndim=1, dtype=np.long,

NameError: name 'SetField' is not defined

Potential Solution

Include the following into deepvar_estimator.py

from pts.transform import (
...
    SetField
)

TQDM issue, perhaps with versioning

/databricks/python/lib/python3.7/site-packages/pts/trainer.py in __call__(self, net, train_iter, validation_iter)
     96             if validation_iter is not None:
     97                 cumm_epoch_loss_val = 0.0
---> 98                 with tqdm(validation_iter, total=total, colour="green") as it:
     99 
    100                     for batch_no, data_entry in enumerate(it, start=1):

/databricks/python/lib/python3.7/site-packages/tqdm/std.py in __init__(self, iterable, desc, total, leave, file, ncols, mininterval, maxinterval, miniters, ascii, disable, unit, unit_scale, dynamic_ncols, smoothing, bar_format, initial, position, postfix, unit_divisor, write_bytes, lock_args, gui, **kwargs)
    946                     fp_write=getattr(file, 'write', sys.stderr.write))
    947                 if "nested" in kwargs else
--> 948                 TqdmKeyError("Unknown argument(s): " + str(kwargs)))
    949 
    950         # Preprocess the arguments

The installed tqdm version has an incompatible signature: it does not accept the colour argument that trainer.py passes.
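
One defensive pattern, sketched here as an assumption and not as what pts does, is to pass optional keyword arguments only when the callee names them in its signature. `supported_kwargs` is a hypothetical helper, and `old_style_bar` is a stand-in that mimics the older signature from the traceback (no `colour` parameter, unknown names rejected through `**kwargs`).

```python
import inspect


def supported_kwargs(func, **candidates):
    """Keep only the keyword arguments that func names explicitly."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(func).parameters
    named = {name for name, p in params.items() if p.kind in named_kinds}
    return {k: v for k, v in candidates.items() if k in named}


# Stand-in for a progress bar with the older signature shape.
def old_style_bar(iterable, total=None, **kwargs):
    if kwargs:
        raise KeyError("Unknown argument(s): " + str(kwargs))
    return list(iterable)


extra = supported_kwargs(old_style_bar, colour="green", total=3)
print(extra)                             # {'total': 3}
print(old_style_bar(range(3), **extra))  # [0, 1, 2]
```

The catch-all `**kwargs` is deliberately ignored by the filter, because in the traceback that is exactly where the unknown `colour` argument ends up before being rejected. Upgrading tqdm to a release that accepts `colour` avoids the need for any such filtering.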

Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'

While trying to forecast predictions and plot the confidence intervals, I received the following error: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'. I'm not sure what's going on, as in my code I also transformed the index datetime dtype into a float manually. Is this related to some other error in my code and/or setup? My notebook is available here: https://drive.google.com/file/d/1B7kmDmdqY-zYFscL-LyV2nTJd_GGfAJ1/view?usp=sharing. The code uses public data and loads it via a request call, so you should be able to run it as is. There are no external dependencies beyond those used in the tutorial.

This is especially odd to me because I confirmed that the datetime type (datetime64[ns]) is the same for my dataset as for the dataset used in the example before I transform it to a float (it fails either way):

My data

>>> # Assess Index dtype
>>> daily_covid_df.index
DatetimeIndex(['2020-01-22', '2020-01-22', '2020-01-23', '2020-01-23',
               '2020-01-24', '2020-01-24', '2020-01-25', '2020-01-25',
               '2020-01-26', '2020-01-26',
               ...
               '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-11',
               '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-11',
               '2020-09-11', '2020-09-11'],
              dtype='datetime64[ns]', name='date', length=10738, freq=None)

Your data

>>> import pandas as pd
>>> url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
>>> df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
>>> df.index
DatetimeIndex(['2015-02-26 21:42:53', '2015-02-26 21:47:53',
               '2015-02-26 21:52:53', '2015-02-26 21:57:53',
               '2015-02-26 22:02:53', '2015-02-26 22:07:53',
               '2015-02-26 22:12:53', '2015-02-26 22:17:53',
               '2015-02-26 22:22:53', '2015-02-26 22:27:53',
               ...
               '2015-04-22 20:07:53', '2015-04-22 20:12:53',
               '2015-04-22 20:17:53', '2015-04-22 20:22:53',
               '2015-04-22 20:27:53', '2015-04-22 20:32:53',
               '2015-04-22 20:37:53', '2015-04-22 20:42:53',
               '2015-04-22 20:47:53', '2015-04-22 20:52:53'],
              dtype='datetime64[ns]', name='timestamp', length=15831, freq=None)

Also, I'm not sure if I understand the logic of what is done when the test_data object is created.

# Create Test Split
test_data = ListDataset(
    [{"start": example_ny_df.index[0], "target": example_ny_df.positiveIncrease}],
    freq = FREQ
)

I would have presumed that, because the start is the earliest date in the dataset and the target is the full data up through the last date, the plotting code

for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')

would plot predictions from the start date to the end date; however, in my code it plots from mid-August to the first week of September. Is this because of the -60 in the to_pandas(test_entry)[-60:] part?

Is there documentation available that explains these functions a bit more or should I just reference the code itself?

Thanks for your time and attention!

When I try your Quick start, I have some trouble

BrokenPipeError Traceback (most recent call last)

in
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data)

D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\model\estimator.py in train(self, training_data)
146
147 def train(self, training_data: Dataset) -> Predictor:
--> 148 return self.train_model(training_data).predictor

D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\model\estimator.py in train_model(self, training_data)
134 net=trained_net,
135 input_names=get_module_forward_input_names(trained_net),
--> 136 data_loader=training_data_loader,
137 )
138

D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\trainer.py in __call__(self, net, input_names, data_loader)
46
47 with tqdm(data_loader) as it:
---> 48 for batch_no, data_entry in enumerate(it, start=1):
49 optimizer.zero_grad()
50 inputs = [data_entry[k].to(self.device) for k in input_names]

D:\software\anaconda\envs\tensorflow\lib\site-packages\tqdm\std.py in __iter__(self)
1163
1164 try:
-> 1165 for obj in iterable:
1166 yield obj
1167 # Update and possibly print the progressbar.

D:\software\anaconda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
289 return _SingleProcessDataLoaderIter(self)
290 else:
--> 291 return _MultiProcessingDataLoaderIter(self)
292
293 @property

D:\software\anaconda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
735 # before it starts, and __del__ tries to join but will get:
736 # AssertionError: can only join a started process.
--> 737 w.start()
738 self._index_queues.append(index_queue)
739 self._workers.append(w)

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

BrokenPipeError: [Errno 32] Broken pipe
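
The traceback ends in popen_spawn_win32.py, so worker processes are being started with the spawn method from what looks like an interactive session. A common workaround, offered here as an assumption and not as a confirmed fix for this report, is to load data in the main process in that situation. `running_interactively` and `safe_num_workers` are hypothetical helpers; how the worker count actually reaches the DataLoader depends on the pts version.

```python
import sys


def running_interactively():
    """Best-effort check for a REPL or a Jupyter kernel."""
    return hasattr(sys, "ps1") or "ipykernel" in sys.modules


def safe_num_workers(requested, platform=None, interactive=None):
    """Return 0 (load in the main process) for interactive Windows sessions."""
    platform = sys.platform if platform is None else platform
    if interactive is None:
        interactive = running_interactively()
    if platform == "win32" and interactive:
        return 0
    return requested


print(safe_num_workers(4, platform="win32", interactive=True))  # 0
print(safe_num_workers(4, platform="linux", interactive=True))  # 4
```

When running a script instead of a notebook on Windows, the usual alternative is to keep the workers and put the training call under an `if __name__ == "__main__":` guard.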

get_dataset() function failing

get_dataset() function failing with call

dataset = get_dataset("pts_m5", regenerate=False)

Fails with error message,

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-3674bc0c6fce> in <module>
----> 1 dataset = get_dataset("pts_m5", regenerate=False)

~/code/geo-deep-forecast/.geo-deep-env3.8-v5/lib/python3.8/site-packages/gluonts/dataset/repository/datasets.py in get_dataset(dataset_name, path, regenerate)
    189         dataset obtained by either downloading or reloading from local file.
    190     """
--> 191     dataset_path = materialize_dataset(dataset_name, path, regenerate)
    192 
    193     return load_datasets(

~/code/geo-deep-forecast/.geo-deep-env3.8-v5/lib/python3.8/site-packages/gluonts/dataset/repository/datasets.py in materialize_dataset(dataset_name, path, regenerate)
    142         the path where the dataset is materialized
    143     """
--> 144     assert dataset_name in dataset_recipes.keys(), (
    145         f"{dataset_name} is not present, please choose one from "
    146         f"{dataset_recipes.keys()}."

AssertionError: pts_m5 is not present, please choose one from odict_keys(['constant', 'exchange_rate', 'solar-energy', 'electricity', 'traffic', 'exchange_rate_nips', 'electricity_nips', 'traffic_nips', 'solar_nips', 'wiki-rolling_nips', 'taxi_30min', 'm3_monthly', 'm3_quarterly', 'm3_yearly', 'm3_other', 'm4_hourly', 'm4_daily', 'm4_weekly', 'm4_monthly', 'm4_quarterly', 'm4_yearly', 'm5']).

Report errors when using pip3

ERROR: Could not find a version that satisfies the requirement pytorchts (from versions: none)
ERROR: No matching distribution found for pytorchts

What's the best practice for handling a holiday feature?

Right now I have placed the holiday feature in 'target', but in DeepAR it should go in 'feat_dynamic_real'.
I'm not sure that placing it in 'target' is the best way.
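
A minimal sketch of moving the holiday indicator out of 'target' and into 'feat_dynamic_real' is given below. The holiday dates, the `holiday_flags` helper, and the sample values are made up for illustration; the field names follow the GluonTS-style entries used elsewhere on this page.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; replace with the dates relevant to the data.
HOLIDAYS = {date(2021, 1, 1), date(2021, 1, 6)}


def holiday_flags(start, num_days):
    """Return one 0.0/1.0 flag per day, starting at start."""
    return [
        1.0 if start + timedelta(days=i) in HOLIDAYS else 0.0
        for i in range(num_days)
    ]


start = date(2020, 12, 30)
target = [12.0, 9.0, 3.0, 14.0, 11.0]  # observed values only
prediction_length = 3

entry = {
    "start": start.isoformat(),
    "target": target,
    # One row per feature. The calendar is known in advance, so the row
    # can also cover the prediction_length steps after the last observation.
    "feat_dynamic_real": [holiday_flags(start, len(target) + prediction_length)],
}
print(entry["feat_dynamic_real"])
# [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
```

As far as I understand the convention, the extended length applies to entries passed to the predictor, while training entries typically carry a feature row of the same length as the target. The estimator also has to be configured to consume the field, and the exact option depends on the version in use.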

unable to reproduce results from notebook

I am unable to reproduce the results from the TimeGrad notebook. The loss diverges and then becomes NaN.

predictor = estimator.train(dataset_train, num_workers=8)

99it [00:22, 4.39it/s, avg_epoch_loss=0.945, epoch=0]
99it [00:22, 4.40it/s, avg_epoch_loss=0.495, epoch=1]
99it [00:22, 4.39it/s, avg_epoch_loss=0.466, epoch=2]
99it [00:22, 4.35it/s, avg_epoch_loss=0.795, epoch=3]
99it [00:22, 4.33it/s, avg_epoch_loss=0.852, epoch=4]
99it [00:22, 4.32it/s, avg_epoch_loss=nan, epoch=5]
99it [00:22, 4.33it/s, avg_epoch_loss=nan, epoch=6]
99it [00:22, 4.30it/s, avg_epoch_loss=nan, epoch=7]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=8]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=9]
99it [00:23, 4.29it/s, avg_epoch_loss=nan, epoch=10]
99it [00:23, 4.28it/s, avg_epoch_loss=nan, epoch=11]
99it [00:22, 4.33it/s, avg_epoch_loss=nan, epoch=12]
99it [00:23, 4.21it/s, avg_epoch_loss=nan, epoch=13]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=14]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=15]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=16]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=17]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=18]
99it [00:23, 4.20it/s, avg_epoch_loss=nan, epoch=19]

unable to reproduce the results of TimeGrad on some datasets

I'm very glad to have read the paper "Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting". It is very interesting work, and TimeGrad achieves state-of-the-art results on multivariate time series forecasting tasks. However, I cannot reproduce the results on some datasets with the model implemented in pytorch-ts. Could you release the hyperparameter settings used for these datasets in the paper? Thanks a lot!

Error: wrong Input size

Hello, I’m trying the example in the README with a very simple univariate series of about 1300 data points.
It consists of just the date (y:m:d) and the values.
I keep getting errors saying that the input size received is 37 and does not match the expected one (I tried using the length of the data frame), but I can’t understand where that 37 comes from.

How was the input_size = 43 in the example calculated?

Multivariate question

Hello, I'm trying to train on a simple dataframe made of {timestamp, a, b}, in which a at time t takes the values 0, 1, 2, 3 and b at time t + 1 equals a, so the model should be able to predict b from a alone.

But I can't understand how MultivariateGrouper works and what "max_target_dim" should be.
Also, why was "quantiles=(np.arange(20)/20.0)[1:]" used for MultivariateEvaluator in the example?

And is the "target_dim" in TempFlowEstimator the last one because that is what we want to predict, or not?
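
For the quantiles expression, it may help to see what it evaluates to. The snippet below is a pure-Python equivalent written for illustration: the grid runs from 0.05 to 0.95 in steps of 0.05, and the `[1:]` drops the leading 0.

```python
# Pure-Python equivalent of (np.arange(20) / 20.0)[1:]
quantiles = [i / 20.0 for i in range(20)][1:]

print(len(quantiles))  # 19
print(quantiles[0])    # 0.05
print(quantiles[-1])   # 0.95
```

A plausible reading, stated as an assumption, is that the evaluator is simply asked to score the forecasts on an evenly spaced grid of quantile levels.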

Issue with fourier time-series features at weekly frequency

Hey!

My pandas version is 1.1.0. In pts/feature/fourier_date_feature.py on line 52, pandas.tseries.frequencies.to_offset is used to normalize frequency, but when the freq_str parameter is W this function produces the following:

offset = to_offset('W')
multiple, granularity = offset.n, offset.name

print(granularity)
# this prints 'W-SUN' which is equivalent to 'W'

Because of the assertion on line 66, W-SUN, and thus the initial W, is not accepted. Changing W to W-SUN in the features dictionary (or adding both) fixed this issue for me.
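
An alternative to editing the features dictionary, sketched here as an assumption and not as the project's fix, is to strip the anchor suffix from the offset name before the lookup. `base_granularity` is a hypothetical helper that works on the name string only.

```python
def base_granularity(offset_name):
    """Drop an anchor suffix such as '-SUN' from a pandas offset name."""
    return offset_name.split("-", 1)[0]


print(base_granularity("W-SUN"))  # W
print(base_granularity("W"))      # W
print(base_granularity("H"))      # H
```

Note that this also collapses other anchored names (for example a quarterly name with a month suffix) to their base letter, which may or may not be what the feature lookup should do.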

Thanks for looking into this.

pip3 install failing

Outputs

ERROR: Could not find a version that satisfies the requirement pytorchts (from versions: none)
ERROR: No matching distribution found for pytorchts

Problem running the example on Ubuntu

Dataset: url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
Example: https://github.com/zalandoresearch/pytorch-ts

RuntimeError Traceback (most recent call last)

in
7 device=device))
8 # predictor = estimator.train(training_data=training_data, num_workers=4)
----> 9 predictor = estimator.train(training_data=training_data, num_workers=1)
10

~/Documents/pytorch-ts/pts/model/estimator.py in train(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
179 shuffle_buffer_length=shuffle_buffer_length,
180 cache_data=cache_data,
--> 181 **kwargs,
182 ).predictor

~/Documents/pytorch-ts/pts/model/estimator.py in train_model(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
147 net=trained_net,
148 train_iter=training_data_loader,
--> 149 validation_iter=validation_data_loader,
150 )
151

~/Documents/pytorch-ts/pts/trainer.py in __call__(self, net, train_iter, validation_iter)
70
71 inputs = [v.to(self.device) for v in data_entry.values()]
---> 72 output = net(*inputs)
73
74 if isinstance(output, (list, tuple)):

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
252 future_time_feat=future_time_feat,
253 future_target=future_target,
--> 254 future_observed_values=future_observed_values,
255 )
256

~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
226 past_observed_values=past_observed_values,
227 future_time_feat=future_time_feat,
--> 228 future_target=future_target,
229 )
230

~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
198
199 # unroll encoder
--> 200 outputs, state = self.rnn(inputs)
201
202 # outputs: (batch_size, seq_len, num_cells)

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
657 hx = self.permute_hidden(hx, sorted_indices)
658
--> 659 self.check_forward_args(input, hx, batch_sizes)
660 if batch_sizes is None:
661 result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
603 # See torch/nn/modules/module.py::_forward_unimplemented
604 def check_forward_args(self, input: Tensor, hidden: Tuple[Tensor, Tensor], batch_sizes: Optional[Tensor]): # type: ignore
--> 605 self.check_input(input, batch_sizes)
606 self.check_hidden_size(hidden[0], self.get_expected_hidden_size(input, batch_sizes),
607 'Expected hidden[0] size {}, got {}')

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
202 raise RuntimeError(
203 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 204 self.input_size, input.size(-1)))
205
206 def get_expected_hidden_size(self, input: Tensor, batch_sizes: Optional[Tensor]) -> Tuple[int, int, int]:

RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19
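
Reading the final message against the traceback: "Expected" is the RNN's configured `self.input_size`, and "got" is `input.size(-1)`, the feature width the data pipeline actually produced. The sketch below only extracts the two numbers from such a message; `sizes_from_error` is a hypothetical helper, not part of pts or PyTorch.

```python
import re

_SIZE_ERROR = re.compile(r"Expected (\d+), got (\d+)")


def sizes_from_error(message):
    """Return (configured_input_size, actual_feature_width), or None if absent."""
    match = _SIZE_ERROR.search(message)
    return (int(match.group(1)), int(match.group(2))) if match else None


msg = "input.size(-1) must be equal to input_size. Expected 43, got 19"
print(sizes_from_error(msg))  # (43, 19)
```

Under that reading, the estimator in this report was built with input_size=43 while the data yields 19 features per step, so setting input_size to the "got" value would presumably make the shapes agree. That is an inference from the message, not a documented procedure.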

jit compiled error with DeepAR prediction net

I want to compile a trained DeepAR prediction net with

net = predictor.prediction_net
scripted_module = torch.jit.script(net)

and got the following error:
Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults

Tweedie Loss

Is it possible to add this distribution to the DeepAR model?

https://discuss.pytorch.org/t/custom-tweedie-loss-throwing-an-error-in-pytorch/76349/6

Inserting Target features

Hello,

As far as I know, in order to include a dynamic feature using deepar-estimator, feat_dynamic_real size should be target size + prediction length size.

Example - 3 days of prediction
{"target": [10, 5, 0, 0, 0], "feat_dynamic_real": [2, 30, 3, 1, 6, 2, 10, 8]}

If I want to use the same features as this Kaggle solution, how should I include:

**Sale values:**

Lag 1 value
Moving average of 7, 28 days
Continuous zero-sale days until today

For the lag 1 value I just use the lags_seq parameter: lags_seq=[1]

However, the moving average and the continuous zero-sale days until today will have the same array size as the target. How should I include them?
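
To make the size problem concrete, here is a small sketch of the two derived features computed from the example target above. `trailing_mean` and `zero_streak` are hypothetical helpers. Because both are computed from the target, they only exist for observed time steps, which is why they come out the same length as the target.

```python
def trailing_mean(values, window):
    """Mean of the last `window` values up to and including each position."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def zero_streak(values):
    """Number of consecutive zero values ending at each position."""
    out, run = [], 0
    for v in values:
        run = run + 1 if v == 0 else 0
        out.append(run)
    return out


target = [10, 5, 0, 0, 0]
print(trailing_mean(target, 2))  # [10.0, 7.5, 2.5, 0.0, 0.0]
print(zero_streak(target))       # [0, 0, 1, 2, 3]
```

Extending such features over the prediction window would require an explicit assumption (for instance repeating the last known value), since their true future values depend on sales that have not happened yet.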

Add early stopping to model training

Currently, model training only measures the training loss per epoch. Add a feature that uses the validation loss as an early stopping criterion.
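
A patience-based criterion of the kind requested could be sketched as follows. `EarlyStopping` is a hypothetical class written for illustration, not an existing pts API, and the loss values are made up.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.6]):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")  # stopping after epoch 3
        break
```

A trainer would call `step` once per epoch with the averaged validation loss and break out of its epoch loop when it returns True.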

Matrix multiplication error during training TFT

Environment Details

Python version: 3.7.10
Operating System: Google Colab web platform

Error Description

I get a matrix multiplication error during training TFT

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4608x2 and 1x32)

Steps to reproduce

Open Colab https://colab.research.google.com/
Run the code

!pip install pytorchts -q
!curl https://forecasters.org/data/m3comp/M3C.xls --create-dirs -o /root/.mxnet/gluon-ts/datasets/M3C.xls 

from gluonts.dataset.repository.datasets import get_dataset
from pts.model.tft import TemporalFusionTransformerEstimator
from pts import Trainer

dataset = get_dataset("m3_monthly", regenerate=False)

estimator = TemporalFusionTransformerEstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    context_length=dataset.metadata.prediction_length,
    dropout_rate=0.1,
    num_outputs=15,
    trainer=Trainer(device='cpu',
                    epochs=20,
                    learning_rate=1e-3,
                    num_batches_per_epoch=100,
                    batch_size=128))

predictor = estimator.train(dataset.train)

Getting error in embedding

Hi, while trying to reproduce the simple example from the "quick start" section, I keep getting the following error message:

`Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/binance_conda/NEW/test-123.py", line 73, in <module>
    predictor = estimator.train(training_data=training_data)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\estimator.py", line 148, in train
    return self.train_model(training_data).predictor
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\estimator.py", line 133, in train_model
    self.trainer(
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\trainer.py", line 52, in __call__
    output = net(*inputs)
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 246, in forward
    distr = self.distribution(
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 221, in distribution
    rnn_outputs, _, scale, _ = self.unroll_encoder(
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 168, in unroll_encoder
    embedded_cat = self.embedder(feat_static_cat)
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\modules\feature.py", line 30, in forward
    [
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\modules\feature.py", line 31, in <listcomp>
    embed(cat_feature_slice.squeeze(-1))
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
`

Scalable Representation Learning for Multivariate Time Series

Summary

It would be great to add Scalable Representation Learning for Multivariate Time Series as an additional feature input to the models implemented in PyTorch-TS.

Potential Benefits

Both univariate and multivariate models can potentially benefit from this approach, as it helps to bring more information to the model for prediction tasks. Also, the embeddings can be used for classification to better understand the data at hand.

Description

The basic idea is to learn embeddings of time series from which similarities of the time series can be derived. The objective is to ensure that similar time series obtain similar representations that can be used as an input for modelling. As for image embeddings, the learned representations may also be used to define a meaningful measure between time series, e.g., comparing time series using a distance measure between their representations with dimensionality reduction and/or clustering.

The criterion to select pairs of similar time series follows word2vec’s intuition. For word embeddings, the representation of the context of a word should be, on one hand, close to that of the word itself and, on the other hand, distant from that of randomly chosen words, since they are probably unrelated to the original word’s context. The corresponding loss then pushes pairs of (context, word) and (context, random word) to be linearly separable. This is called negative sampling. It can be visualized as follows:

The loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.

To adapt this principle to time series, one can consider a random subseries x_ref of a given time series y_i. Then, on one hand, the representation of x_ref should be close to that of any of its subseries x_pos (a positive example). On the other hand, if one considers another subseries x_neg (a negative example) chosen at random (in a different random time series y_j if several series are available, or in the same time series if it is long enough and not stationary), then its representation should be distant from that of x_ref. Following the analogy with word2vec, x_pos corresponds to a word, x_ref to its context, and x_neg to a random word. To improve the stability and convergence of the training procedure as well as the experimental results of the learned representations, one can introduce, as in word2vec, several negative samples (x_neg_k) chosen independently at random.

The loss pushes the computed representations to distinguish between x_ref and x_neg, and to assimilate x_ref and x_pos. Overall, the training procedure consists of traveling through the training dataset for several epochs (possibly using mini-batches), picking tuples (x_ref, x_pos, (x_neg_k)) at random and performing a minimization step on the corresponding loss for each tuple, until training ends.
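
As a rough illustration of the loss described above, the sketch below uses the word2vec-style logistic form with dot products between embeddings. Both the exact form and the names are assumptions made for this example: the vectors z_ref, z_pos and z_negs stand for the outputs of some encoder that is not shown, and plain Python lists are used instead of tensors.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def negative_sampling_loss(z_ref, z_pos, z_negs):
    """-log s(z_ref . z_pos) - sum_k log s(-z_ref . z_neg_k), with s the sigmoid."""
    loss = -log_sigmoid(dot(z_ref, z_pos))
    for z_neg in z_negs:
        loss -= log_sigmoid(-dot(z_ref, z_neg))
    return loss


z_ref = [1.0, 0.0]
close = negative_sampling_loss(z_ref, [1.0, 0.0], [[-1.0, 0.0]])
far = negative_sampling_loss(z_ref, [-1.0, 0.0], [[1.0, 0.0]])
print(close < far)  # True
```

The loss is lower when the positive embedding aligns with the reference and the negatives point away from it, which is the behaviour the description asks for.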

Some Initial Comments

The approach needs to be a two step procedure:

We need to learn the embeddings first, i.e., train an embedding model
Once the embeddings are learned, we can incorporate them as a feat_static_real into any of the available model implementations, i.e., DeepAR, DeepVAR, TransformerTempFlowEstimator, etc.
We should also output the embeddings as they are useful in their own right

References

Unsupervised Scalable Representation Learning for Multivariate Time Series:

Where to find documentation?

I'm new to this project. Aside from the README, which doesn't cover the entire API surface (e.g. Temporal fusion transformer), how are we supposed to discover the API? Is there more detailed API documentation somewhere?

Thank you.

Continue training after making predictor

I would like to perform time series cross-validation on a fairly large dataset, to see if the results are consistent over multiple folds.

The way I approach this is to stop training at certain points in time, then predict the next 2 weeks of my data. After this I can continue training (see figure below.)

Looking at the source code pts/model/estimator.py
I can see a function train_model :

pytorch-ts/pts/model/estimator.py

Line 89 in 5da3be5

def train_model(

Which outputs a trained neural network

The other function, train, which I use now:

pytorch-ts/pts/model/estimator.py

Line 164 in 5da3be5

def train(

Creates a predictor object

I'm not sure how to combine these functions to achieve the desired result.
Does anyone have experience with this?
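Independent of how training is resumed, the fold layout described above can be sketched without any library support. This is only an illustration of the splitting step, assuming a sorted index and a fixed test horizon; the helper name `expanding_window_folds` is hypothetical, and it does not address continuing training between folds.

```python
import pandas as pd

def expanding_window_folds(index, n_folds, horizon):
    """Yield (train_index, test_index) pairs.

    The training window grows by `horizon` steps per fold, and each test
    window covers the `horizon` steps right after the training window.
    """
    first_cut = len(index) - n_folds * horizon
    if first_cut <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        cut = first_cut + k * horizon
        yield index[:cut], index[cut:cut + horizon]

# 100 daily observations, 3 folds, each predicting the next 2 weeks
index = pd.date_range("2020-01-01", periods=100, freq="D")
folds = list(expanding_window_folds(index, n_folds=3, horizon=14))
```

Each training slice could then be turned into a dataset and passed to `estimator.train`, which retrains from scratch for every fold rather than continuing from the previous one.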

Implement Temporal Fusion Transformer?

Hi!

First of all, thank you for this library. It is the only Pytorch lib for ts with a usable API!

I am pretty sure you already know this, but one of the latest and most famous deep learning architectures for ts forecasting is the Temporal Fusion Transformer (TFT): https://arxiv.org/abs/1912.09363

There are already two Pytorch implementations out there: this one and this other one (that is heavily inspired by the former), but both lack a nice API and there are some implementation issues.

IMO it would be a great addition to pytorch-ts.

Thank you!

How to set input_size

Description

What does the input_size argument in DeepAREstimator or TransformerTempFlowEstimator stand for and how to properly set meaningful values for each of them? Would it be possible to derive the values directly from the input data?

Working Example Notebooks

Description

Referring to #6 (comment), it would be good to have notebooks that illustrate the usage of each estimator available in PyTorch-TS. That would make it much easier to get hands-on with the package.

AttributeError: 'TempFlowTrainingNetwork' object has no attribute '_wandb_hook_names'

After your update of Trainer, running your example raises this error.

I found that you have added wandb in the trainer.

TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day'

Despite making sure I am on pandas 1.0.5 (cf. awslabs/gluonts#958), I am still getting a TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day' error when running the following:

# Define DL Time Series Model
estimator = DeepAREstimator(
    freq = FREQ,
    prediction_length = 1, #predict 1 day ahead
    input_size = 32,
    trainer = Trainer(
        epochs = 100,
        device = DEVICE
    )
)
predictor = estimator.train(training_data=training_data)

Which returned

0it [00:00, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-b7c68ebabaa3> in <module>
----> 1 predictor = estimator.train(training_data=training_data)

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train(self, training_data)
    146 
    147     def train(self, training_data: Dataset) -> Predictor:
--> 148         return self.train_model(training_data).predictor

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train_model(self, training_data)
    131         trained_net = self.create_training_network(self.trainer.device)
    132 
--> 133         self.trainer(
    134             net=trained_net,
    135             input_names=get_module_forward_input_names(trained_net),

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/trainer.py in __call__(self, net, input_names, data_loader)
     46 
     47             with tqdm(data_loader) as it:
---> 48                 for batch_no, data_entry in enumerate(it, start=1):
     49                     optimizer.zero_grad()
     50                     inputs = [data_entry[k].to(self.device) for k in input_names]

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/tqdm/std.py in __iter__(self)
   1128 
   1129         try:
-> 1130             for obj in iterable:
   1131                 yield obj
   1132                 # Update and possibly print the progressbar.

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1
    365         if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    987             else:
    988                 del self._task_info[idx]
--> 989                 return self._process_data(data)
    990 
    991     def _try_put_index(self):

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1012         self._try_put_index()
   1013         if isinstance(data, ExceptionWrapper):
-> 1014             data.reraise()
   1015         return data
   1016 

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
    393             # (https://bugs.python.org/issue2651), so we work around it.
    394             msg = KeyErrorMessage(msg)
--> 395         raise self.exc_type(msg)

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/dataset/transformed_iterable_dataset.py", line 39, in __iter__
    data_entry = next(self._cur_iter)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 128, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 85, in __call__
    raise e
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 83, in __call__
    yield self.map_transform(data_entry.copy(), is_train)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 195, in map_transform
    self._update_cache(start, length)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 169, in _update_cache
    end = shift_timestamp(start, length)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/split.py", line 33, in shift_timestamp
    return _shift_timestamp_helper(ts, ts.freq, offset)
TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day'

Where the following preceded that code:

# Print Timestamp Statistics
earliest_time = min(example_ny_df.index)
latest_time = max(example_ny_df.index)
time_range_full = (max(example_ny_df.index) - min(example_ny_df.index)).days

# Determine Cut-point for 80/20 Training/Testing Splits
TRAININGSPLIT = 0.8
time_range_split = int(time_range_full * TRAININGSPLIT)
time_split = min(example_ny_df.index) + datetime.timedelta(days=time_range_split)

# Create Training Split / Predictor Object
FREQ = "1D"
training_data = ListDataset(
    [{"start": earliest_time, "target": example_ny_df.positiveIncrease[:time_split]}],
    freq = FREQ
)

# Setup GPU, if Exists
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Processing device:", DEVICE)

I'm going to try to redo this from a 100% clean install without even trying the GPU version of torch, as mentioned in #22.

ERROR: Could not install packages due to an OSError:

Hello, I was trying to install pytorch-ts using pip install pytorchts in the Anaconda environment, and got the following error message. Any hints or suggestions on how to fix it will be highly appreciated. I did not get this kind of permission error when using pip to install other packages.

Successfully built pytorchts subprocess32 pathtools
Installing collected packages: smmap, gitdb, subprocess32, shortuuid, sentry-sdk, promise, pathtools, GitPython, docker-pycreds, configparser, wandb, torch, pytorchts
  Attempting uninstall: torch
    Found existing installation: torch 1.6.0
    Uninstalling torch-1.6.0:
      Successfully uninstalled torch-1.6.0
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\users\\anaconda3\\envs\\jane_street_kaggle\\lib\\site-packages\\~orch\\lib\\asmjit.dll'
Consider using the `--user` option or check the permissions.

how to extract the predicted cross-covariance matrix from estimators?

Hi, first of all, great work.

I was wondering how to recover the information shown in Fig. 5 in the tempflow paper https://arxiv.org/abs/2002.06103

Thanks!

Potential bug in calculating `lags_for_fourier_time_features_from_frequency` function

Hello,

I was trying to understand the code execution and stumbled across a potential bug.
The lags_for_fourier_time_features_from_frequency() returns an incorrect result when you pass a minute level freq arg.

Example : lags_for_fourier_time_features_from_frequency(freq='10min') == [1] when it should return [1, 4, 12, 24, 48]

For other frequencies it works as expected -

lags_for_fourier_time_features_from_frequency(freq='1D') == [1, 7, 14]
lags_for_fourier_time_features_from_frequency(freq='10M') == [1, 12]

Could you please explain why max(lags) is added to self.history_length to increase the context length here?

Can I turn off the frequency argument in the DeepAREstimator?

Hi,

I am working on some time series data that do not have a stable frequency. I have a datetime column of when X occurs, and this X can occur at any time after the previous occurrence, i.e., Xt can be 1 minute after Xt-1, or even months after it.

Is there a way to adapt the frequency argument to be suitable for my case?

Thanks in advance!
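One possible workaround, sketched here under the assumption that counting events per fixed-width bin is acceptable for the problem, is to aggregate the irregular timestamps onto a regular grid first, so that a frequency can be passed as usual. The helper name `events_to_regular_counts` is hypothetical, and this changes what is being modeled (counts per bin rather than individual events).

```python
import pandas as pd

def events_to_regular_counts(timestamps, freq="D"):
    """Count events per fixed-width bin; bins without events get a count of 0."""
    events = pd.Series(1, index=pd.DatetimeIndex(timestamps)).sort_index()
    return events.resample(freq).sum()

# Two events one minute apart, then a gap of several days
timestamps = ["2021-01-01 10:00", "2021-01-01 10:01", "2021-01-04 08:00"]
daily_counts = events_to_regular_counts(timestamps, freq="D")
```

The resulting series has one value per day, including the empty days in between, and can be used as a target with a daily frequency.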

Multivariate Target Dim errors

Hello, I'm trying to use the dataset at https://github.com/smallGum/MLCNN-Multivariate-Time-Series/blob/master/data/nasdaq100_padding.csv to train TransformerTempFlowEstimator, but I keep getting errors related to target_dim. Here I'm using only 2 columns:

df = pd.read_csv ("./data/nasdaq100_padding.csv")

leng = len(df.NDX)

train = int(leng/2)
test = int(leng/2)
prediction_length = 15

training_data1 = ListDataset(
    [{"start":pd.Timestamp(2017, 1, 1, 12) , "target":df.AAPL[:train]},
     {"start":pd.Timestamp(2017, 1, 1, 12) , "target":df.AMZN[:train]}
    ],
    one_dim_target=False,
    freq = "min"
)
device = torch.device("cuda" )
estimator = TransformerTempFlowEstimator (freq="min", 
                            prediction_length=prediction_length,
                            input_size=600,
                            target_dim = 2,                          
                            trainer=Trainer(epochs=15,
                                            #learning_rate = 0.00001,
                                            device=device, 
                                            num_batches_per_epoch=500, 
                                            batch_size=20))
predictor = estimator.train(training_data=training_data1)

I'm getting errors like: RuntimeError: Sizes of tensors must match except in dimension 0. Got 20 and 10 (The offending index is 0). Changing target_dim usually gets past that one, but then I get:
RuntimeError: shape '[-1, 30, 3]' is invalid for input of size 600
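For comparison, a multivariate entry is usually a single dictionary whose target is a 2-D array, rather than several 1-D entries. The sketch below assumes the GluonTS convention that `one_dim_target=False` expects a target of shape (target_dim, T); the helper name `stack_multivariate_target` and the toy values are hypothetical. Applying MultivariateGrouper to a list of univariate entries is the other common route.

```python
import numpy as np

def stack_multivariate_target(columns):
    """Stack equal-length 1-D series into one array of shape (target_dim, T)."""
    arrays = [np.asarray(col, dtype=np.float32) for col in columns]
    lengths = {arr.shape[0] for arr in arrays}
    if len(lengths) != 1:
        raise ValueError(f"all series must have the same length, got {sorted(lengths)}")
    return np.stack(arrays, axis=0)

# Toy stand-ins for two price columns of equal length
aapl = [116.1, 116.3, 116.2, 116.5]
amzn = [753.7, 754.0, 753.9, 754.4]

target = stack_multivariate_target([aapl, amzn])  # shape (2, 4)
entry = {"start": "2017-01-01 12:00", "target": target}
```

With a target shaped like this, the first dimension is the value that target_dim should match.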

File not found: datasets/pts_m5/metadata.json

Excuse me if this is a newbie question but when I try to run https://github.com/zalandoresearch/pytorch-ts/blob/master/examples/m5-tft.ipynb I am running into the following error:

Traceback (most recent call last):
  File "C:\Users\Gili\Documents\myproject\pytorch-ts-test.py", line 17, in <module>
    dataset = get_dataset("pts_m5", regenerate=False)
  File "C:\Users\Gili\Documents\myproject\python\lib\site-packages\gluonts\dataset\repository\datasets.py", line 232, in get_dataset
    return load_datasets(
  File "C:\Users\Gili\Documents\myproject\python\lib\site-packages\gluonts\dataset\common.py", line 491, in load_datasets
    meta = MetaData.parse_file(Path(metadata) / "metadata.json")
  File "pydantic\main.py", line 613, in pydantic.main.BaseModel.parse_file
  File "pydantic\parse.py", line 57, in pydantic.parse.load_file
  File "C:\Python39\lib\pathlib.py", line 1248, in read_bytes
    with self.open(mode='rb') as f:
  File "C:\Python39\lib\pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "C:\Python39\lib\pathlib.py", line 1109, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Gili\\.mxnet\\gluon-ts\\datasets\\pts_m5\\metadata.json'

Looking at https://www.kaggle.com/c/m5-forecasting-accuracy/data this file does not seem to exist. What am I missing here?

Anomaly detection

Hi, would it be possible to use the code for anomaly detection? I'm interested in applying the conditional normalizing flows model to detecting gravitational waves, and in comparing it with the performance of other models like VAE + LSTM, GANs, etc.

Thanks !!

data format - from pandas to gluonts data format for both uni-variate and multivariate data

Hi @kashif

I hope you're well!
I am just trying to get familiar with the library, and I generally work with the pandas data format.
Could you help me format my data and create a basic uni-variate model for predicting into the future?

import numpy as np

import pandas as pd
import yfinance as yf
data = yf.download("SPY", start="2012-01-01", end="2017-04-30")['Adj Close']
data=pd.DataFrame(data)
data

from gluonts.dataset.common import ListDataset
training_data = ListDataset(
    [{"start": data.index[0], "target": data.values[:1300]}],
    freq = "1D"
)

test_data = ListDataset(
    [{"start": data.index[1301], "target": data.values[1301:1339]}],
    freq = "1D"
)


from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer

estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=38,
    context_length=100,
    freq='1D',
    trainer=Trainer(ctx="cpu", 
                    epochs=5, 
                    learning_rate=1e-3, 
                    num_batches_per_epoch=100
                   )
)
predictor = estimator.train(training_data=training_data)

Where to place m5 data?

When I want to use

from pts.dataset.repository import get_dataset

dataset = get_dataset("m5", regenerate=False)

I get the warning that the files from Kaggle are not present in the directory:
RuntimeError: M5 data is available on Kaggle (https://www.kaggle.com/c/m5-forecasting-accuracy/data). You first need to agree to the terms of the competition before being able to download the data. After you have done that, please copy the files into /root/.pytorch/pytorch-ts/datasets/m5.

However, I have no idea where to put these files. I'm working in Google Colab. The root directory is called "content". Should I make a ./pytorch/pytorch-ts/datasets/m5 directory myself?
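A minimal sketch of staging the files is shown below. It assumes the Kaggle CSV files have already been downloaded and unpacked into some local folder (the `/content/m5` path in the comment is hypothetical) and uses the destination directory quoted in the error message. The helper name `stage_m5_files` is also hypothetical.

```python
import shutil
from pathlib import Path

def stage_m5_files(src_dir, dest_dir):
    """Create dest_dir (including parents) and copy every CSV file from src_dir into it.

    Returns the sorted list of copied file names.
    """
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_file in sorted(Path(src_dir).glob("*.csv")):
        shutil.copy2(csv_file, dest / csv_file.name)
        copied.append(csv_file.name)
    return copied

# Example call in Colab, with the destination taken from the error message:
# stage_m5_files("/content/m5", "/root/.pytorch/pytorch-ts/datasets/m5")
```

Note that the path in the error message starts at `/root`, which is the home directory in Colab, not the `/content` working directory.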

Readme example not working

Description

Many thanks to the authors for making the implementation available! Great initiative.

I am trying to run the README.md example, but it is not working.

Code Snippet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data)

Error

Running the code in the README up to the snippet works just fine. When I run estimator.train(training_data=training_data), the above snippet throws the following error:

0it [00:02, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-57e06c572bad> in <module>
      6                             trainer=Trainer(epochs=10,
      7                                             device=device))
----> 8 predictor = estimator.train(training_data=training_data)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\estimator.py in train(self, training_data)
    133 
    134     def train(self, training_data: Dataset) -> Predictor:
--> 135         return self.train_model(training_data).predictor

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\estimator.py in train_model(self, training_data)
    118         trained_net = self.create_training_network(self.trainer.device)
    119 
--> 120         self.trainer(
    121             net=trained_net,
    122             input_names=get_module_forward_input_names(trained_net),

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\trainer.py in __call__(self, net, input_names, data_loader)
     50                     inputs = [data_entry[k].to(self.device) for k in input_names]
     51 
---> 52                     output = net(*inputs)
     53                     if isinstance(output, (list, tuple)):
     54                         loss = output[0]

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
    244         future_observed_values: torch.Tensor,
    245     ) -> torch.Tensor:
--> 246         distr = self.distribution(
    247             feat_static_cat=feat_static_cat,
    248             feat_static_real=feat_static_real,

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
    219         future_observed_values: torch.Tensor,
    220     ) -> Distribution:
--> 221         rnn_outputs, _, scale, _ = self.unroll_encoder(
    222             feat_static_cat=feat_static_cat,
    223             feat_static_real=feat_static_real,

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
    166 
    167         # (batch_size, num_features)
--> 168         embedded_cat = self.embedder(feat_static_cat)
    169 
    170         # in addition to embedding features, use the log scale as it can help

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\modules\feature.py in forward(self, features)
     28 
     29         return torch.cat(
---> 30             [
     31                 embed(cat_feature_slice.squeeze(-1))
     32                 for embed, cat_feature_slice in zip(

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\modules\feature.py in <listcomp>(.0)
     29         return torch.cat(
     30             [
---> 31                 embed(cat_feature_slice.squeeze(-1))
     32                 for embed, cat_feature_slice in zip(
     33                     self.__embedders, cat_feature_slices

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
    110 
    111     def forward(self, input):
--> 112         return F.embedding(
    113             input, self.weight, self.padding_idx, self.max_norm,
    114             self.norm_type, self.scale_grad_by_freq, self.sparse)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482         # remove once script supports set_grad_enabled
   1483         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485 
   1486 

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

Environment

Operating system: Windows 10
Python version: 3.8.2
Pytorch: 1.4.0
Torchvision: 0.5.0
Cudatoolkit:10.0

Referencing gluon-ts and copyright

The vast majority of this repo seems to be copy-pasted from gluon-ts. It is a bit problematic as this is not really stated clearly, for instance in the README.md (which is also copy-pasted from gluon-ts).

I assume there is no ill intent. Could you perhaps state clearly which files do not come from gluon-ts? Note also that all files that have been copy-pasted should still hold the initial copyright according to the Apache license (see https://opensource.stackexchange.com/questions/5528/removing-copyright-notice-in-uis-of-apache-2-licensed-software).

Thanks!

TempFlowEstimator deserialize error

I trained a TempFlow model and saved (serialized) it to a folder, but when I load (deserialize) the model, an error occurs:

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
p = Predictor.deserialize(get_model_path('tempflow'))
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\model\predictor.py", line 82, in deserialize
return tpe.deserialize(path, device)
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\model\predictor.py", line 172, in deserialize
transformation = load_json(fp.read())
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 173, in load_json
return decode(json.loads(s))
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 354, in decode
kwargs = decode(r["kwargs"]) if "kwargs" in r else {}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in decode
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in decode
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 354, in decode
kwargs = decode(r["kwargs"]) if "kwargs" in r else {}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in decode
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in decode
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 355, in decode
return cls(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'normalized'

Forecast Reconciliation

Description

It would be very useful to allow for forecast reconciliation of hierarchical and/or grouped time series. This means that the sum of all forecasts that make up a hierarchy matches the forecast of the hierarchy. Say you forecast several time series that are within the same hierarchy, plus the time series of the total (e.g., tourism visits in each Australian territory, plus total tourism across the territories as an aggregate). Forecast reconciliation makes sure that the bottom-level forecasts add up to the top-level aggregate forecast. As PyTorch-TS is a probabilistic framework, we also need to make sure that the uncertainty attached to the forecasts is corrected.

Besides cross-sectional hierarchies, you may also want to include temporal hierarchies, so that you train the model on daily, weekly and monthly data, and you make sure that all sum up to the temporal hierarchy of interest, e.g., monthly forecast.

Several papers show that cross-temporally coherent forecasts improve accuracy compared to not taking this information into account.
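To make the coherence requirement concrete, here is a minimal sketch of the simplest approach, bottom-up aggregation, applied to sample paths. It assumes forecasts are available as samples for the bottom-level series and that the hierarchy is encoded in a summing matrix; the function name `bottom_up_samples` is hypothetical. More elaborate reconciliation methods adjust all levels jointly and are not shown here.

```python
import numpy as np

def bottom_up_samples(bottom_samples, summing_matrix):
    """Aggregate bottom-level sample paths into coherent samples for all levels.

    bottom_samples: (num_samples, horizon, n_bottom)
    summing_matrix: (n_all, n_bottom), row i lists which bottom series add up to series i
    returns:        (num_samples, horizon, n_all)
    """
    return bottom_samples @ summing_matrix.T

# Hierarchy: total = territory_1 + territory_2
S = np.array([
    [1.0, 1.0],  # total
    [1.0, 0.0],  # territory_1
    [0.0, 1.0],  # territory_2
])

rng = np.random.default_rng(0)
bottom = rng.normal(loc=10.0, size=(100, 24, 2))  # stand-in for model samples
coherent = bottom_up_samples(bottom, S)
```

Because every sample path is aggregated individually, the quantiles of the total are computed from coherent paths, so the uncertainty at the top level is consistent with the bottom level by construction.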

References

This is a non-exhaustive list of references intended to give a first overview over the topic:

How to plot forecasts of multivariate time series

I'd like to plot the predictions of the TempFlowEstimator on a multivariate time series dataset, similar to what is done in the README of this repository.

When I make the forecasts as follows:

from pts.evaluation import make_evaluation_predictions
from pts.evaluation import MultivariateEvaluator
import numpy as np

evaluator = MultivariateEvaluator(quantiles=(np.arange(20)/20.0)[1:],
                                  target_agg_funcs={'sum': np.sum})

forecast_it, ts_it = make_evaluation_predictions(dataset=dataset_test,
                                             predictor=predictor,
                                             num_samples=100)
forecasts = list(forecast_it)
targets = list(ts_it)

targets[0] is a Pandas dataframe containing the true values for each of the (in my case 12) time series for all time steps. forecasts[0] is a SampleForecast object whose samples is a Numpy array of shape (100, 365, 12). This means that we have 100 samples for each of the 365 time steps of the test set, for each of the 12 time series.

However, how can I plot the samples of the first time series for example? I tried to set the samples of the forecasts[0] object to the samples of the first series (i.e. forecasts[0].samples = forecasts[0].samples[:,:,0]), but when I call the plot function on that I get

Exception                                 Traceback (most recent call last)
<ipython-input-83-b629071ff750> in <module>()
----> 1 samples_first_time_series.plot()

2 frames
/usr/local/lib/python3.6/dist-packages/pts/model/forecast.py in plot(self, prediction_intervals, show_mean, color, label, output_file, *args, **kwargs)
    132 
    133         p50_data = ps_data[i_p50]
--> 134         p50_series = pd.Series(data=p50_data, index=self.index)
    135         p50_series.plot(color=color, ls="-", label=f"{label_prefix}median")
    136 

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

/usr/local/lib/python3.6/dist-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional
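As a library-independent workaround, the quantiles for a single dimension can be computed directly from the samples array and plotted by hand. The sketch below assumes samples of shape (num_samples, horizon, target_dim), as described above; the helper name `quantile_bands` is hypothetical and random numbers stand in for real forecasts.

```python
import numpy as np

def quantile_bands(samples, dim, levels=(0.05, 0.5, 0.95)):
    """Per-time-step quantiles for one dimension of a multivariate sample forecast.

    samples: (num_samples, horizon, target_dim)
    returns: dict mapping each quantile level to a 1-D array of length horizon
    """
    one_dim = samples[:, :, dim]  # (num_samples, horizon)
    return {q: np.quantile(one_dim, q, axis=0) for q in levels}

# Stand-in for forecasts[0].samples with 100 samples, 365 steps, 12 series
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 365, 12))

bands = quantile_bands(samples, dim=0)
median = bands[0.5]                    # 1-D, one value per time step
lower, upper = bands[0.05], bands[0.95]
```

Each returned array is 1-dimensional, so it can be wrapped in a pandas Series with the forecast index, or passed to matplotlib (for example as the bounds of a shaded interval around the median).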

How to incorporate covariate information using TransformerTempFlowEstimator

Description

I am currently using the Australian retail trade turnover data set to get familiar with PyTorch-TS in general and with TransformerTempFlowEstimator in particular. The data looks as follows:

Each series (133 series in total) has 417 months of training observations and is uniquely identified using two keys:

State: The Australian state (or territory)
Industry: The industry of retail trade

All series show quite some positive dependencies, as the correlation matrix shows:

As such, TransformerTempFlowEstimator seems to be a good option. I want to make use of both State and Industry as covariates in the model. For each categorical covariate, a generalized linear mixed model is fit to the outcome and the coefficients are returned as the encodings. The cardinality of State and Industry is [7, 20]. After bringing the data into the right format, I create the train data as follows:

train_ds = ListDataset([{FieldName.TARGET: target, 
                         FieldName.START: start,
                         FieldName.ITEM_ID: item_id,
                         FieldName.FEAT_DYNAMIC_REAL: feat_dynamic_real,
                         FieldName.FEAT_STATIC_REAL: feat_static_real,
                         FieldName.FEAT_TIME: time_feat
                        } 
                        for (target, 
                             start, 
                             item_id, 
                             feat_dynamic_real, 
                             feat_static_real, 
                             time_feat
                            ) in zip(target_train,
                                     start_train,
                                     item_id_train,
                                     feat_dynamic_real_train,
                                     feat_static_real_train,
                                     time_feat_train
                                    )],
                      freq = "1M")

feat_static_real_train contains the embeddings and time_feat_train the month information. To transform the data into a multivariate data set, I use

grouper_train = MultivariateGrouper(max_target_dim = 133) # as there are 133 unique series 
train_ds = grouper_train(train_ds)

However, after using grouper_train(train_ds), none of the covariate information is included anymore. To bring them back, I use

train_ds.list_data[0]["feat_dynamic_real"] = feat_dynamic_real_train
train_ds.list_data[0]["feat_static_real"] = feat_static_real_train

I then train the model as follows:

np.random.seed(123)
torch.manual_seed(123)
trainer = Trainer( epochs = 40) 

estimator = TransformerTempFlowEstimator(input_size = 401,
                                         freq = "1M", 
                                         prediction_length = 24,
                                         context_length = 48,
                                         target_dim = 133,
                                         cardinality = [7, 20],
                                         trainer = trainer)                              
predictor = estimator.train(training_data = train_ds)

The model summary is

predictor.__dict__["prediction_net"]
pts.model.transformer_tempflow.transformer_tempflow_network.TransformerTempFlowPredictionNetwork(act_type="gelu", cardinality=[7, 20], conditioning_length=200, context_length=48, d_model=32, dequantize=False, dim_feedforward_scale=4, dropout_rate=0.1, embedding_dimension=5, flow_type="RealNVP", hidden_size=100, history_length=60, input_size=401, lags_seq=[1, 12], n_blocks=3, n_hidden=2, num_decoder_layers=3, num_encoder_layers=3, num_heads=8, prediction_length=24, scaling=True, target_dim=133)

I also compared the forecast to some competing models, even though I am not sure that all models are correctly specified (i.e., covariate information, no parameter tuning).

Given the strong dependencies between the different series, I would suspect that TransformerTempFlowEstimator should outperform models that treat the series as being independent.

Question

Based on the above summary, I have the following questions concerning the proper use of TransformerTempFlowEstimator:

How can covariates be included, in particular categorical information?
Does the model automatically include, e.g., month and/or age information that it derives from the data itself, or do we need to pass it using time_features in the function call?
Does the model automatically derive holiday information from the data, or do we need to derive it ourselves as described here?
Does the model automatically select an appropriate lag structure from the data, or do we need to derive it ourselves as described here?
Which of the following field names are currently supported?

FieldName.START = 'start'
FieldName.TARGET = 'target'
FieldName.FEAT_STATIC_CAT = 'feat_static_cat'
FieldName.FEAT_STATIC_REAL = 'feat_static_real'
FieldName.FEAT_DYNAMIC_CAT = 'feat_dynamic_cat'
FieldName.FEAT_DYNAMIC_REAL = 'feat_dynamic_real'
FieldName.FEAT_TIME = 'time_feat'
FieldName.FEAT_CONST = 'feat_dynamic_const'
FieldName.FEAT_AGE = 'feat_dynamic_age'
FieldName.OBSERVED_VALUES = 'observed_values'
FieldName.IS_PAD = 'is_pad'
FieldName.FORECAST_START = 'forecast_start'

How to customize the dataset?

Great work! I want to apply the model to predict air quality, such as PM10 and PM2.5, based on temperature, wind speed, wind direction and so on. Can this model directly accept a PyTorch DataLoader object? If not, how can I build a custom dataset?
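
The examples elsewhere on this page pass a ListDataset of plain dicts to estimator.train rather than a DataLoader, so one way to approach a custom dataset is to build those dicts by hand. Below is a minimal sketch with made-up readings. The helper make_entry and all values are hypothetical, and whether a given estimator uses feat_dynamic_real is an assumption to verify.

```python
from datetime import datetime

# Hypothetical hourly readings; names and values are illustrative only.
pm25 = [35.0, 38.5, 41.2, 39.9]          # target series to forecast
temperature = [12.1, 12.4, 13.0, 13.3]   # covariate, one value per target step
wind_speed = [3.2, 2.9, 3.5, 4.1]        # covariate, one value per target step


def make_entry(start, target, covariates):
    """Build one dataset entry as a plain dict with the field names used above."""
    for row in covariates:
        if len(row) != len(target):
            raise ValueError("each covariate needs one value per target step")
    return {
        "start": start,
        "target": target,
        "feat_dynamic_real": covariates,  # shape (num_covariates, T)
    }


entry = make_entry(datetime(2020, 1, 1, 0), pm25, [temperature, wind_speed])
# A list of such dicts would then be wrapped, e.g. ListDataset([entry], freq="1H").
```

Keeping the entry construction separate from the library call makes it easy to validate lengths before any training code runs.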

Relation to gluonts

First of all, thanks a lot for the interesting paper and for open-sourcing the corresponding model!

I was wondering about the precise relation of this project to gluonts. In the readme you say that this project uses gluonts for data loading, transformations etc., but looking at the source code, it seems like you essentially ported the existing gluonts code to PyTorch? So in that sense you're using the gluonts API, and if I have some function (like a transform) coded for gluonts, chances are that it is compatible with this project due to Python's duck typing?
Is this the correct understanding?

Working Example of TransformerTempFlowEstimator

I have been trying to get TransformerTempFlowEstimator working without success.
Can you provide an example script? Issues include RuntimeError: Sizes of tensors must match except in dimension 2. Got 1 and 32 in dimension 0 and not understanding how the data loading works for multivariate data.
My example below:

from pts.dataset import MultivariateGrouper
import pandas as pd
import torch

from pts.dataset import ListDataset
from pts.model.transformer_tempflow import TransformerTempFlowEstimator
from pts import Trainer

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)

train_ds = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]+i}
        for i in range(2)],
    freq="5min"
)

grouper_train = MultivariateGrouper(max_target_dim=2)
gt = grouper_train(train_ds)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

trainer = Trainer(epochs=10)

estimator = TransformerTempFlowEstimator(input_size=1,
                                         freq="5min",
                                         prediction_length=100,
                                         context_length=4,
                                         target_dim=64,
                                         cardinality=[7, 20],
                                         trainer=trainer)

predictor = estimator.train(training_data=gt)
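
One thing that stands out in the snippet above is that the grouper is built with max_target_dim=2 while the estimator is given target_dim=64, and cardinality=[7, 20] is passed although no feat_static_cat is present. Whether this is the only cause of the size error is not confirmed here. The sketch below is a simplified stand-in for the grouping step (the real MultivariateGrouper also aligns timestamps), with a hypothetical helper group_targets, to show where target_dim would come from.

```python
def group_targets(series_list):
    """Stack equal-length univariate targets as rows of one multivariate target."""
    lengths = {len(s) for s in series_list}
    if len(lengths) != 1:
        raise ValueError(f"series lengths differ: {sorted(lengths)}")
    return [list(s) for s in series_list]


base = [10.0, 12.0, 11.0, 13.0]
# Mirrors `df.value + i for i in range(2)` from the snippet above.
series = [[v + i for v in base] for i in range(2)]

grouped = group_targets(series)
target_dim = len(grouped)  # one row per series in the grouped target
```

Deriving target_dim from the grouped data, instead of hard-coding it, keeps the estimator and the dataset in agreement when the number of series changes.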

Installation issue

Hi, with the latest update of making gluonts a requirement, I think there seems to be an issue with the installation via pip.

The error I get:

ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
pytorchts depends on gluonts@ git+https://github.com/awslabs/gluon-ts.git@master#egg=gluonts

Issue running example on Windows pc

Hey,

I ran into an issue while testing your example code. I am using a Windows PC (CPU only) with the latest version of torch, 1.7.1, installed.

Any idea what could resolve the issue?

Thanks,
Pieter

RuntimeError Traceback (most recent call last)
in
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data, num_workers=2)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\estimator.py in train(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
171 shuffle_buffer_length=shuffle_buffer_length,
172 cache_data=cache_data,
--> 173 **kwargs,
174 ).predictor

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\estimator.py in train_model(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
143 net=trained_net,
144 train_iter=training_data_loader,
--> 145 validation_iter=validation_data_loader,
146 )
147

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\trainer.py in __call__(self, net, train_iter, validation_iter)
68 inputs = [v.to(self.device) for v in data_entry.values()]
69
---> 70 output = net(*inputs)
71 if isinstance(output, (list, tuple)):
72 loss = output[0]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
252 future_time_feat=future_time_feat,
253 future_target=future_target,
--> 254 future_observed_values=future_observed_values,
255 )
256

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
226 past_observed_values=past_observed_values,
227 future_time_feat=future_time_feat,
--> 228 future_target=future_target,
229 )
230

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
166
167 # (batch_size, num_features)
--> 168 embedded_cat = self.embedder(feat_static_cat)
169
170 # in addition to embedding features, use the log scale as it can help

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\modules\feature.py in forward(self, features)
35 embed(cat_feature_slice.squeeze(-1))
36 for embed, cat_feature_slice in zip(
---> 37 self.__embedders, cat_feature_slices
38 )
39 ],

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\modules\feature.py in <listcomp>(.0)
34 [
35 embed(cat_feature_slice.squeeze(-1))
---> 36 for embed, cat_feature_slice in zip(
37 self.__embedders, cat_feature_slices
38 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1850 # remove once script supports set_grad_enabled
1851 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1852 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1853
1854

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
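
The message says the embedding layer received 32-bit integer indices where it expected 64-bit ones. A plausible cause, stated here as an assumption, is that NumPy's default integer on Windows has historically been 32-bit, so an index array created without an explicit dtype becomes an IntTensor once converted to torch. The sketch below shows an explicit cast with a hypothetical helper as_long_indices. Where exactly such a cast belongs in the pts pipeline is not confirmed here.

```python
import numpy as np


def as_long_indices(feat_static_cat):
    """Return categorical indices as int64, whatever the platform default is."""
    arr = np.asarray(feat_static_cat)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"expected integer category indices, got {arr.dtype}")
    return arr.astype(np.int64)


# Simulate what a 32-bit default integer would produce.
cat_int32 = np.array([0, 3, 6], dtype=np.int32)
cat_int64 = as_long_indices(cat_int32)
print(cat_int64.dtype)  # int64
```

On the torch side, calling .long() on the index tensor before it reaches the embedding layer has the same effect.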

README example is failing with "RuntimeError: input.size(-1) must be equal to input_size"

Hi, I'm executing the following code from the README and

import pandas as pd
import matplotlib.pyplot as plt

import torch
print(torch.__version__)

import gluonts
from gluonts.dataset.common import ListDataset
from gluonts.dataset.util import to_pandas

import pts
from pts.model.deepar import DeepAREstimator
from pts import Trainer

print(pts.__version__, gluonts.__version__)

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)

df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()


training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)


device = "cpu"
estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data, num_workers=4)

and got the following error:

1.9.0
0.0.0-unknown 0.8.0

    203                     expected_input_dim, input.dim()))
    204         if self.input_size != input.size(-1):
--> 205             raise RuntimeError(
    206                 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
    207                     self.input_size, input.size(-1)))

RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19

Version:

pip list | grep pytorchts
pytorchts                     0.5.1

Any suggestions?
Thanks
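
The error reports both the configured input_size (43) and the feature width the network actually received (19). The README value may have been written for a different version, which is an assumption. A common workaround is to rebuild the estimator with the reported width. The sketch below, with a hypothetical helper actual_input_size, pulls that number out of the message.

```python
import re


def actual_input_size(message):
    """Extract the 'got N' value from the input-size mismatch message, or None."""
    match = re.search(r"Expected (\d+), got (\d+)", message)
    return int(match.group(2)) if match else None


msg = "input.size(-1) must be equal to input_size. Expected 43, got 19"
print(actual_input_size(msg))  # 19
```

With this dataset and frequency, passing input_size=19 to DeepAREstimator would be the first thing to try. The value may change again if lags, time features or static features are altered.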
