zalandoresearch / pytorch-ts
PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend
License: MIT License
I have a fresh install of pytorchts in its own environment on my Ubuntu 18 machine, but I am unable to import it because of a TensorBoard logging issue, even though I have TensorBoard installed.
Tensorboard, PyTorch, and PyTorchTS versions
(timeseries) amruch@wit:~/graphika/SBIR_COVID$ conda list | grep "board"
tensorboard 2.2.1 pyh532a8cf_0
tensorboard-plugin-wit 1.6.0 py_0
(timeseries) amruch@wit:~/graphika/SBIR_COVID$ conda list | grep "torch"
pytorch 1.6.0 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
pytorchts 0.2.0 pypi_0 pypi
torchvision 0.7.0 py36_cu101 pytorch
Error thrown when importing various pts modules:
(timeseries) amruch@wit:~/graphika/SBIR_COVID$ python
Python 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pts.dataset import ListDataset
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/__init__.py", line 6, in <module>
from .trainer import Trainer
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/trainer.py", line 7, in <module>
from torch.utils.tensorboard import SummaryWriter
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above
>>> from pts.model.deepar import DeepAREstimator
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/__init__.py", line 6, in <module>
from .trainer import Trainer
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/trainer.py", line 7, in <module>
from torch.utils.tensorboard import SummaryWriter
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above
It seems to me that preventing any kind of import due to tensorboard is a bit overkill. Why not just throw a warning that progress/results cannot be logged?
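A minimal sketch of the "warn instead of fail" behaviour suggested above. The helper names `optional_attr` and `make_writer` are hypothetical and are not part of pts; the sketch only shows the general pattern of treating a logging backend as an optional dependency.

```python
import importlib
import warnings


def optional_attr(module_name, attr_name):
    """Return `module.attr`, or None (after a warning) if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both a missing package and an ImportError raised on purpose,
        # such as the TensorBoard version check shown in the traceback.
        warnings.warn(f"{module_name} unavailable ({exc}); logging is disabled.")
        return None
    return getattr(module, attr_name, None)


def make_writer():
    """Hypothetical trainer hook: build a writer only if the backend imports."""
    writer_cls = optional_attr("torch.utils.tensorboard", "SummaryWriter")
    return writer_cls() if writer_cls is not None else None
```

With this pattern the trainer would skip logging calls whenever `make_writer()` returns None, so importing the package no longer depends on TensorBoard.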
I am trying to use DeepVAREstimator from the issue-3 branch, but it throws NameError: name 'SetField' is not defined.
from pts.transform import SetField
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trainer = Trainer(device = device, epochs = 10)
estimator = DeepVAREstimator(input_size = 401,
freq = "1M",
prediction_length = pred_h,
context_length = pred_h*2,
target_dim = target_dim,
use_feat_static_cat = True,
cardinality = card_static,
trainer = trainer)
predictor = estimator.train(training_data = train_ds)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-27-375c015eb18b> in <module>
20 # time_features = feat_dynamic_real_train,
21 trainer = trainer)
---> 22 predictor = estimator.train(training_data = train_ds)
23 predictor.__dict__["prediction_net"]
~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/estimator.py in train(self, training_data)
132
133 def train(self, training_data: Dataset) -> Predictor:
--> 134 return self.train_model(training_data).predictor
~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/estimator.py in train_model(self, training_data)
98
99 def train_model(self, training_data: Dataset) -> TrainOutput:
--> 100 transformation = self.create_transformation()
101 transformation.estimate(iter(training_data))
102
~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/deepvar/deepvar_estimator.py in create_transformation(self)
154 else []
155 )
--> 156 + [
157 AsNumpyArray(
158 field=FieldName.FEAT_STATIC_CAT, expected_ndim=1, dtype=np.long,
NameError: name 'SetField' is not defined
Add the following import to deepvar_estimator.py:
from pts.transform import (
...
SetField
)
/databricks/python/lib/python3.7/site-packages/pts/trainer.py in __call__(self, net, train_iter, validation_iter)
96 if validation_iter is not None:
97 cumm_epoch_loss_val = 0.0
---> 98 with tqdm(validation_iter, total=total, colour="green") as it:
99
100 for batch_no, data_entry in enumerate(it, start=1):
/databricks/python/lib/python3.7/site-packages/tqdm/std.py in __init__(self, iterable, desc, total, leave, file, ncols, mininterval, maxinterval, miniters, ascii, disable, unit, unit_scale, dynamic_ncols, smoothing, bar_format, initial, position, postfix, unit_divisor, write_bytes, lock_args, gui, **kwargs)
946 fp_write=getattr(file, 'write', sys.stderr.write))
947 if "nested" in kwargs else
--> 948 TqdmKeyError("Unknown argument(s): " + str(kwargs)))
949
950 # Preprocess the arguments
The installed tqdm version has an incompatible signature: it does not accept the colour argument.
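One way to stay compatible with both older and newer tqdm releases is to pass only the keyword arguments that the constructor declares by name. The sketch below is an illustration, not the fix adopted in pts; `named_kwargs_only`, `old_bar`, and `new_bar` are hypothetical names, and the two stand-in functions merely mimic constructors with and without a `colour` parameter.

```python
import inspect


def named_kwargs_only(func, kwargs):
    """Keep only kwargs that `func` declares as explicit named parameters.

    A catch-all **kwargs parameter is ignored on purpose, because a
    constructor may accept **kwargs and still reject unknown keys.
    """
    params = inspect.signature(func).parameters
    named = {
        name
        for name, p in params.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in named}


# Stand-ins for an older and a newer progress-bar constructor (hypothetical).
def old_bar(iterable, total=None, **kwargs):
    return sorted(kwargs)


def new_bar(iterable, total=None, colour=None, **kwargs):
    return colour
```

In the trainer this could look like `tqdm(validation_iter, **named_kwargs_only(tqdm.__init__, {"total": total, "colour": "green"}))`, assuming the installed tqdm exposes a plain `__init__` signature. Upgrading tqdm to a release that supports `colour` is the simpler alternative.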
While trying to forecast and plot the confidence intervals, I received the following error: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'. I'm not sure what's going on, since in my code I also converted the datetime index to floats manually. Is this related to some other error in my code and/or setup? My notebook is available here: https://drive.google.com/file/d/1B7kmDmdqY-zYFscL-LyV2nTJd_GGfAJ1/view?usp=sharing. The code uses public data and loads it via a request call, so you should be able to run it as is. There are no external dependencies beyond those used in the tutorial.
This is especially odd to me because I confirmed that the datetime type (datetime64[ns]) is the same for my dataset as for the dataset used in the example before I transform it to a float (it fails either way):
My data
>>> # Assess Index dtype
>>> daily_covid_df.index
DatetimeIndex(['2020-01-22', '2020-01-22', '2020-01-23', '2020-01-23',
'2020-01-24', '2020-01-24', '2020-01-25', '2020-01-25',
'2020-01-26', '2020-01-26',
...
'2020-09-11', '2020-09-11', '2020-09-11', '2020-09-11',
'2020-09-11', '2020-09-11', '2020-09-11', '2020-09-11',
'2020-09-11', '2020-09-11'],
dtype='datetime64[ns]', name='date', length=10738, freq=None)
Your data
>>> import pandas as pd
>>> url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
>>> df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
>>> df.index
DatetimeIndex(['2015-02-26 21:42:53', '2015-02-26 21:47:53',
'2015-02-26 21:52:53', '2015-02-26 21:57:53',
'2015-02-26 22:02:53', '2015-02-26 22:07:53',
'2015-02-26 22:12:53', '2015-02-26 22:17:53',
'2015-02-26 22:22:53', '2015-02-26 22:27:53',
...
'2015-04-22 20:07:53', '2015-04-22 20:12:53',
'2015-04-22 20:17:53', '2015-04-22 20:22:53',
'2015-04-22 20:27:53', '2015-04-22 20:32:53',
'2015-04-22 20:37:53', '2015-04-22 20:42:53',
'2015-04-22 20:47:53', '2015-04-22 20:52:53'],
dtype='datetime64[ns]', name='timestamp', length=15831, freq=None)
Also, I'm not sure I understand the logic of what is done when the test_data object is created.
# Create Test Split
test_data = ListDataset(
[{"start": example_ny_df.index[0], "target": example_ny_df.positiveIncrease}],
freq = FREQ
)
I would have presumed that, because the start is the earliest date in the dataset and the target is the full data up to the last date, the plotting code
for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
to_pandas(test_entry)[-60:].plot(linewidth=2)
forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')
would plot predictions from the start date to the end date; however, in my code it plots from mid-August to the first week of September. Is this because of the [-60:] in the to_pandas(test_entry)[-60:] part?
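On the plotting window: the slice `[-60:]` keeps only the last 60 observations of the series, and a GluonTS-style forecast covers the `prediction_length` steps that follow the last observed value, not the whole history. The sketch below works this out for daily data using only the standard library; `plotted_window` and its parameters are hypothetical names introduced for illustration.

```python
from datetime import date, timedelta


def plotted_window(start, num_observations, history_shown, prediction_length):
    """Return (first_plotted_day, forecast_start, forecast_end) for daily data."""
    # Cannot show more history than exists.
    history_shown = min(history_shown, num_observations)
    last_observed = start + timedelta(days=num_observations - 1)
    # series[-history_shown:] starts this many days before the last observation.
    first_plotted = last_observed - timedelta(days=history_shown - 1)
    # The forecast begins one step after the last observed value.
    forecast_start = last_observed + timedelta(days=1)
    forecast_end = last_observed + timedelta(days=prediction_length)
    return first_plotted, forecast_start, forecast_end


window = plotted_window(date(2020, 1, 1), num_observations=100,
                        history_shown=60, prediction_length=7)
```

To see forecasts for earlier periods, the target passed to the predictor would have to be truncated at an earlier date, so that the forecast horizon falls inside the data you already have.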
Is there documentation available that explains these functions a bit more or should I just reference the code itself?
Thanks for your time and attention!
BrokenPipeError Traceback (most recent call last)
in
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data)
D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\model\estimator.py in train(self, training_data)
146
147 def train(self, training_data: Dataset) -> Predictor:
--> 148 return self.train_model(training_data).predictor
D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\model\estimator.py in train_model(self, training_data)
134 net=trained_net,
135 input_names=get_module_forward_input_names(trained_net),
--> 136 data_loader=training_data_loader,
137 )
138
D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\trainer.py in __call__(self, net, input_names, data_loader)
46
47 with tqdm(data_loader) as it:
---> 48 for batch_no, data_entry in enumerate(it, start=1):
49 optimizer.zero_grad()
50 inputs = [data_entry[k].to(self.device) for k in input_names]
D:\software\anaconda\envs\tensorflow\lib\site-packages\tqdm\std.py in __iter__(self)
1163
1164 try:
-> 1165 for obj in iterable:
1166 yield obj
1167 # Update and possibly print the progressbar.
D:\software\anaconda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
289 return _SingleProcessDataLoaderIter(self)
290 else:
--> 291 return _MultiProcessingDataLoaderIter(self)
292
293 @property
D:\software\anaconda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
735 # before it starts, and del tries to join but will get:
736 # AssertionError: can only join a started process.
--> 737 w.start()
738 self._index_queues.append(index_queue)
739 self._workers.append(w)
D:\software\anaconda\envs\tensorflow\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect
D:\software\anaconda\envs\tensorflow\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):
D:\software\anaconda\envs\tensorflow\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):
D:\software\anaconda\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)
D:\software\anaconda\envs\tensorflow\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
BrokenPipeError: [Errno 32] Broken pipe
The get_dataset() function fails with this call:
dataset = get_dataset("pts_m5", regenerate=False)
It fails with the error message:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-3674bc0c6fce> in <module>
----> 1 dataset = get_dataset("pts_m5", regenerate=False)
~/code/geo-deep-forecast/.geo-deep-env3.8-v5/lib/python3.8/site-packages/gluonts/dataset/repository/datasets.py in get_dataset(dataset_name, path, regenerate)
189 dataset obtained by either downloading or reloading from local file.
190 """
--> 191 dataset_path = materialize_dataset(dataset_name, path, regenerate)
192
193 return load_datasets(
~/code/geo-deep-forecast/.geo-deep-env3.8-v5/lib/python3.8/site-packages/gluonts/dataset/repository/datasets.py in materialize_dataset(dataset_name, path, regenerate)
142 the path where the dataset is materialized
143 """
--> 144 assert dataset_name in dataset_recipes.keys(), (
145 f"{dataset_name} is not present, please choose one from "
146 f"{dataset_recipes.keys()}."
AssertionError: pts_m5 is not present, please choose one from odict_keys(['constant', 'exchange_rate', 'solar-energy', 'electricity', 'traffic', 'exchange_rate_nips', 'electricity_nips', 'traffic_nips', 'solar_nips', 'wiki-rolling_nips', 'taxi_30min', 'm3_monthly', 'm3_quarterly', 'm3_yearly', 'm3_other', 'm4_hourly', 'm4_daily', 'm4_weekly', 'm4_monthly', 'm4_quarterly', 'm4_yearly', 'm5']).
ERROR: Could not find a version that satisfies the requirement pytorchts (from versions: none)
ERROR: No matching distribution found for pytorchts
Now, I have placed the holidays feature in 'target', but in DeepAR it should be 'feat_dynamic_real'.
I'm not sure placing it in 'target' is the best way.
I am unable to reproduce the results from the TimeGrad notebook. The loss diverges and then becomes NaN.
predictor = estimator.train(dataset_train, num_workers=8)
99it [00:22, 4.39it/s, avg_epoch_loss=0.945, epoch=0]
99it [00:22, 4.40it/s, avg_epoch_loss=0.495, epoch=1]
99it [00:22, 4.39it/s, avg_epoch_loss=0.466, epoch=2]
99it [00:22, 4.35it/s, avg_epoch_loss=0.795, epoch=3]
99it [00:22, 4.33it/s, avg_epoch_loss=0.852, epoch=4]
99it [00:22, 4.32it/s, avg_epoch_loss=nan, epoch=5]
99it [00:22, 4.33it/s, avg_epoch_loss=nan, epoch=6]
99it [00:22, 4.30it/s, avg_epoch_loss=nan, epoch=7]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=8]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=9]
99it [00:23, 4.29it/s, avg_epoch_loss=nan, epoch=10]
99it [00:23, 4.28it/s, avg_epoch_loss=nan, epoch=11]
99it [00:22, 4.33it/s, avg_epoch_loss=nan, epoch=12]
99it [00:23, 4.21it/s, avg_epoch_loss=nan, epoch=13]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=14]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=15]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=16]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=17]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=18]
99it [00:23, 4.20it/s, avg_epoch_loss=nan, epoch=19]
I'm very glad to read the paper "Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting". It's very interesting work, and TimeGrad achieves state-of-the-art results on multivariate time series forecasting tasks. However, I cannot reproduce the results on some datasets with the model implemented in pytorch-ts. Could you release the hyperparameter settings used for these datasets in the paper? Thanks a lot!
Hello, I'm trying the example in the README with a very simple univariate series of about 1300 data points.
It consists of just the date (y:m:d) and the values.
I keep getting errors like "input size got is 37 != expected" (I tried using the length of the data frame), but I can't understand what that 37 is.
How was the input size of 43 in the example calculated?
Hello, I'm trying to train using a simple dataframe made of {timestamp, a, b}, in which a at time t is 0, 1, 2, 3 and b at time t + 1 equals a, so the model should be able to predict b simply from a.
But I can't understand how MultivariateGrouper works and what the "max_target_dim" should be.
Also, why was MultivariateEvaluator used with "quantiles=(np.arange(20)/20.0)[1:]" in the example?
And is the "target_dim" in TempFlowEstimator the last one because that is what we want to predict, or not?
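For reference, the quantiles expression from the example evaluates to 19 evenly spaced levels from 0.05 to 0.95: `np.arange(20) / 20.0` yields 0.0, 0.05, ..., 0.95, and the `[1:]` slice drops the leading 0.0. The pure-Python equivalent below shows the same values without NumPy. Why the example excludes the endpoints is not stated in this thread; a plausible reading is that the 0 and 1 quantiles are not meaningful for sample-based forecasts.

```python
# Same levels as (np.arange(20) / 20.0)[1:], written without NumPy.
quantile_levels = [i / 20 for i in range(1, 20)]

# Spacing between neighbouring levels, rounded to hide float noise.
steps = {round(b - a, 10) for a, b in zip(quantile_levels, quantile_levels[1:])}
```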
Hey!
My pandas version is 1.1.0. In pts/feature/fourier_date_feature.py on line 52, pandas.tseries.frequencies.to_offset is used to normalize the frequency, but when the freq_str parameter is W this function produces the following:
offset = to_offset('W')
multiple, granularity = offset.n, offset.name
print(granularity)
# this prints 'W-SUN' which is equivalent to 'W'
Because of the assertion on line 66, W-SUN (and thus the initial W) is not accepted. Changing W to W-SUN in the features dictionary (or adding both) fixed this issue for me.
Thanks for looking into this.
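An alternative to listing every anchored alias in the features dictionary is to strip the anchor suffix before the lookup. This is only a sketch of that idea; `base_granularity` is a hypothetical helper, not a function in pts or pandas.

```python
def base_granularity(offset_name):
    """Strip an anchor suffix such as '-SUN', so 'W-SUN' maps to 'W'."""
    return offset_name.split("-", 1)[0]


# Names as reported by pandas offsets, e.g. to_offset('W').name == 'W-SUN'.
examples = {name: base_granularity(name) for name in ["W-SUN", "W-MON", "D", "H"]}
```

Note that this also collapses other anchored aliases (for example 'Q-DEC' to 'Q'), which may or may not be desirable depending on how the features dictionary is keyed.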
Outputs
ERROR: Could not find a version that satisfies the requirement pytorchts (from versions: none)
ERROR: No matching distribution found for pytorchts
Dataset: url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
Example: https://github.com/zalandoresearch/pytorch-ts
RuntimeError Traceback (most recent call last)
in
7 device=device))
8 # predictor = estimator.train(training_data=training_data, num_workers=4)
----> 9 predictor = estimator.train(training_data=training_data, num_workers=1)
10
~/Documents/pytorch-ts/pts/model/estimator.py in train(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
179 shuffle_buffer_length=shuffle_buffer_length,
180 cache_data=cache_data,
--> 181 **kwargs,
182 ).predictor
~/Documents/pytorch-ts/pts/model/estimator.py in train_model(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
147 net=trained_net,
148 train_iter=training_data_loader,
--> 149 validation_iter=validation_data_loader,
150 )
151
~/Documents/pytorch-ts/pts/trainer.py in __call__(self, net, train_iter, validation_iter)
70
71 inputs = [v.to(self.device) for v in data_entry.values()]
---> 72 output = net(*inputs)
73
74 if isinstance(output, (list, tuple)):
~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
252 future_time_feat=future_time_feat,
253 future_target=future_target,
--> 254 future_observed_values=future_observed_values,
255 )
256
~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
226 past_observed_values=past_observed_values,
227 future_time_feat=future_time_feat,
--> 228 future_target=future_target,
229 )
230
~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
198
199 # unroll encoder
--> 200 outputs, state = self.rnn(inputs)
201
202 # outputs: (batch_size, seq_len, num_cells)
~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
657 hx = self.permute_hidden(hx, sorted_indices)
658
--> 659 self.check_forward_args(input, hx, batch_sizes)
660 if batch_sizes is None:
661 result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
603 # See torch/nn/modules/module.py::_forward_unimplemented
604 def check_forward_args(self, input: Tensor, hidden: Tuple[Tensor, Tensor], batch_sizes: Optional[Tensor]): # type: ignore
--> 605 self.check_input(input, batch_sizes)
606 self.check_hidden_size(hidden[0], self.get_expected_hidden_size(input, batch_sizes),
607 'Expected hidden[0] size {}, got {}')
~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
202 raise RuntimeError(
203 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 204 self.input_size, input.size(-1)))
205
206 def get_expected_hidden_size(self, input: Tensor, batch_sizes: Optional[Tensor]) -> Tuple[int, int, int]:
RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19
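The two numbers in this error are the input_size the RNN was constructed with (43) and the feature width the data pipeline actually produced for this dataset (19). A pragmatic workaround, offered here as an assumption rather than maintainer guidance, is to read the second number and pass it as input_size. The helper below only parses the message; `parse_input_size_error` is a hypothetical name.

```python
import re

# Matches the tail of torch's message: "... Expected 43, got 19".
_SIZE_PATTERN = re.compile(r"Expected (\d+), got (\d+)")


def parse_input_size_error(message):
    """Return (expected, got) from an RNN input-size error, or None."""
    match = _SIZE_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


message = "input.size(-1) must be equal to input_size. Expected 43, got 19"
expected, got = parse_input_size_error(message)
```

Here `got` is the value to try as input_size for this particular dataset and frequency; it changes when lags, time features, or static features change.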
I want to compile a trained DeepAR prediction net with
net = predictor.prediction_net
scripted_module = torch.jit.script(net)
and got the following error:
Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults
Is it possible to add this distribution (Tweedie) to the DeepAR model?
https://discuss.pytorch.org/t/custom-tweedie-loss-throwing-an-error-in-pytorch/76349/6
Hello,
As far as I know, in order to include a dynamic feature with the DeepAR estimator, the feat_dynamic_real size should be the target size plus the prediction length.
Example - 3 days of prediction
{"target": [10, 5, 0, 0, 0], "feat_dynamic_real": [2, 30, 3, 1, 6, 2, 10, 8]}
If I want to use the same features as this Kaggle solution, how should I include:
Sale values:
Lag 1 value
Moving average of 7, 28 days
Continuous zero-sale days until today
For the lag 1 value I just use the lags_seq parameter: lags_seq=[1].
However, the moving average and the continuous zero-sale days until today will have the same size as the target's array. How should I include them?
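One possible approach, given as a sketch rather than a confirmed recipe: compute both features from the observed target, then pad them across the forecast horizon so that each reaches target length plus prediction length. Padding by repeating the last known value is an assumption made purely for illustration, and the helper names below are hypothetical (they are not part of pts).

```python
def moving_average(values, window):
    """Trailing mean over the `window` most recent values (fewer at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def zero_streak(values):
    """Number of consecutive zero-sale days ending at each position."""
    out, run = [], 0
    for v in values:
        run = run + 1 if v == 0 else 0
        out.append(run)
    return out


def extend_with_last(feature, prediction_length):
    """ASSUMPTION: carry the last known value across the forecast horizon."""
    return feature + [feature[-1]] * prediction_length


target = [10, 5, 0, 0, 0]
prediction_length = 3
feat_dynamic_real = [
    extend_with_last(moving_average(target, 2), prediction_length),
    extend_with_last(zero_streak(target), prediction_length),
]
```

Each row now has length 8, matching the 5 observed days plus 3 predicted days in the example above. If a feature must not see the current day's sales, shift it by one step before padding.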
Currently, model training only measures the training loss per epoch. Add a feature to include validation loss as an early stopping criterion.
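A minimal sketch of what such a criterion could look like, independent of the pts Trainer. The class name `EarlyStopping` and its `patience` and `min_delta` parameters are hypothetical and only illustrate the usual patience-based rule.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            # A NaN loss also lands here, because NaN comparisons are False.
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
decisions = [stopper.step(loss) for loss in [1.0, 0.8, 0.9, 0.85, 0.95]]
```

In a training loop, the epoch loop would break as soon as `step` returns True, ideally after restoring the weights saved at the best epoch.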
I get a matrix multiplication error while training TFT:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4608x2 and 1x32)
!pip install pytorchts -q
!curl https://forecasters.org/data/m3comp/M3C.xls --create-dirs -o /root/.mxnet/gluon-ts/datasets/M3C.xls
from gluonts.dataset.repository.datasets import get_dataset
from pts.model.tft import TemporalFusionTransformerEstimator
from pts import Trainer
dataset = get_dataset("m3_monthly", regenerate=False)
estimator = TemporalFusionTransformerEstimator(
freq=dataset.metadata.freq,
prediction_length=dataset.metadata.prediction_length,
context_length=dataset.metadata.prediction_length,
dropout_rate=0.1,
num_outputs=15,
trainer=Trainer(device='cpu',
epochs=20,
learning_rate=1e-3,
num_batches_per_epoch=100,
batch_size=128))
predictor = estimator.train(dataset.train)
Hi, while trying to reproduce the simple example from the "quick start" section, I keep getting the following error message:
Traceback (most recent call last):
File "C:/Users/User/PycharmProjects/binance_conda/NEW/test-123.py", line 73, in <module>
predictor = estimator.train(training_data=training_data)
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\estimator.py", line 148, in train
return self.train_model(training_data).predictor
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\estimator.py", line 133, in train_model
self.trainer(
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\trainer.py", line 52, in __call__
output = net(*inputs)
File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 246, in forward
distr = self.distribution(
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 221, in distribution
rnn_outputs, _, scale, _ = self.unroll_encoder(
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 168, in unroll_encoder
embedded_cat = self.embedder(feat_static_cat)
File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\modules\feature.py", line 30, in forward
[
File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\modules\feature.py", line 31, in <listcomp>
embed(cat_feature_slice.squeeze(-1))
File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
return F.embedding(
File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\functional.py", line 1852, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
It would be great to add Scalable Representation Learning for Multivariate Time Series as an additional feature input to the models implemented in PyTorch-TS.
Both univariate and multivariate models can potentially benefit from this approach, as it helps to bring more information to the model for prediction tasks. Also, the embeddings can be used for classification to better understand the data at hand.
The basic idea is to learn embeddings of time series from which similarities of the time series can be derived. The objective is to ensure that similar time series obtain similar representations that can be used as an input for modelling. As for image embeddings, the learned representations may also be used to define a meaningful measure between time series, e.g., comparing time series using a distance measure between their representations with dimensionality reduction and/or clustering.
The criterion to select pairs of similar time series follows word2vec's intuition. For word embeddings, the representation of the context of a word should probably be, on one hand, close to the one of this word, and, on the other hand, distant from the one of randomly chosen words, since they are probably unrelated to the original word's context. The corresponding loss then pushes pairs of (context, word) and (context, random word) to be linearly separable. This is called negative sampling. It can be visualized as follows:
The loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.
To adapt this principle to time series, one can consider a random subseries x_ref of a given time series y_i. Then, on one hand, the representation of x_ref should be close to the one of any of its subseries x_pos (a positive example). On the other hand, if one considers another subseries x_neg (a negative example) chosen at random (in a different random time series y_j if several series are available, or in the same time series if it is long enough and not stationary), then its representation should be distant from the one of x_ref. Following the analogy with word2vec, x_pos corresponds to a word, x_ref to its context, and x_neg to a random word. To improve the stability and convergence of the training procedure as well as the experimental results of the learned representations, one can introduce, as in word2vec, several negative samples (x_neg_k) chosen independently at random.
The loss pushes the computed representations to distinguish between x_ref and x_neg, and to assimilate x_ref and x_pos. Overall, the training procedure consists of iterating over the training dataset for several epochs (possibly using mini-batches), picking tuples (x_ref, x_pos, (x_neg_k)) at random and performing a minimization step on the corresponding loss for each tuple, until training ends.
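To make the loss concrete, here is a small sketch on already-computed representation vectors. It assumes the logistic form used in word2vec-style negative sampling, i.e. -log sigmoid(f(x_ref) . f(x_pos)) minus the sum over k of log sigmoid(-f(x_ref) . f(x_neg_k)); the encoder f itself is omitted, and the function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(ref, pos, negatives):
    """Negative-sampling loss on representations of x_ref, x_pos, and x_neg_k."""
    # Pull the reference and positive representations together ...
    loss = -log_sigmoid(dot(ref, pos))
    # ... and push the reference away from every negative sample.
    for neg in negatives:
        loss -= log_sigmoid(-dot(ref, neg))
    return loss


well_separated = triplet_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
confused = triplet_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

The loss is lower when the positive is aligned with the reference and the negative points away from it, which is the behaviour described in the paragraph above.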
The approach needs to be a two step procedure:
Unsupervised Scalable Representation Learning for Multivariate Time Series:
I'm new to this project. Aside from the README, which doesn't cover the entire API surface (e.g. Temporal fusion transformer), how are we supposed to discover the API? Is there more detailed API documentation somewhere?
Thank you.
I would like to perform time series cross-validation on a fairly large dataset, to see if the results are consistent over multiple folds.
The way I approach this is to stop training at certain points in time, then predict the next 2 weeks of my data. After this I can continue training (see figure below.)
Looking at the source code in pts/model/estimator.py, I can see a function train_model (pytorch-ts/pts/model/estimator.py, line 89 in 5da3be5), which outputs a trained neural network.
The other function, which I use now, is train (pytorch-ts/pts/model/estimator.py, line 164 in 5da3be5), which creates a predictor object.
I'm not sure how to combine these functions to achieve the desired result. Does anyone have experience with this?
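Independently of which estimator method is used, the fold boundaries for this kind of rolling-origin evaluation can be computed up front. The sketch below is a generic illustration; `rolling_origin_splits` is a hypothetical helper, and the horizon of 14 steps stands in for the two weeks mentioned above under the assumption of daily data.

```python
def rolling_origin_splits(n_obs, horizon, n_folds):
    """Return (train_end, test_end) index pairs, oldest fold first.

    Fold k trains on observations [0:train_end] and is evaluated on
    [train_end:test_end], so later folds see strictly more history.
    """
    splits = []
    for fold in range(n_folds, 0, -1):
        train_end = n_obs - fold * horizon
        if train_end <= 0:
            raise ValueError("not enough observations for the requested folds")
        splits.append((train_end, train_end + horizon))
    return splits


splits = rolling_origin_splits(n_obs=100, horizon=14, n_folds=3)
```

For each pair, one would build a training dataset from the target truncated at train_end, call estimator.train on it, and compare the forecast with the held-out slice. Retraining from scratch per fold is the simple option; whether training can be resumed between folds is not covered here.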
Hi!
First of all, thank you for this library. It is the only Pytorch lib for ts with a usable API!
I am pretty sure you already know this, but one of the latest and most famous deep learning architectures for ts forecasting is the Temporal Fusion Transformer (TFT): https://arxiv.org/abs/1912.09363
There are already two Pytorch implementations out there: this one and this other one (that is heavily inspired by the former), but both lack a nice API and there are some implementation issues.
IMO it would be a great addition to pytorch-ts.
Thank you!
What does the input_size argument in DeepAREstimator or TransformerTempFlowEstimator stand for and how to properly set meaningful values for each of them? Would it be possible to derive the values directly from the input data?
Referring to #6 (comment), it would be good to have notebooks that illustrate the usage of each estimator available in PyTorch-TS. That would make it much easier to get hands-on with the package.
After your update to Trainer, running your example reports a bug.
I found that you have added wandb to the trainer.
Despite making sure I am on pandas 1.0.5 (cf. awslabs/gluonts#958), I am still getting a TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day' error when running the following:
# Define DL Time Series Model
estimator = DeepAREstimator(
    freq = FREQ,
    prediction_length = 1,  # predict 1 day ahead
    input_size = 32,
    trainer = Trainer(
        epochs = 100,
        device = DEVICE
    )
)  # closing parenthesis for DeepAREstimator was missing
predictor = estimator.train(training_data=training_data)
Which returned
0it [00:00, ?it/s]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-76-b7c68ebabaa3> in <module>
----> 1 predictor = estimator.train(training_data=training_data)
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train(self, training_data)
146
147 def train(self, training_data: Dataset) -> Predictor:
--> 148 return self.train_model(training_data).predictor
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train_model(self, training_data)
131 trained_net = self.create_training_network(self.trainer.device)
132
--> 133 self.trainer(
134 net=trained_net,
135 input_names=get_module_forward_input_names(trained_net),
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/trainer.py in __call__(self, net, input_names, data_loader)
46
47 with tqdm(data_loader) as it:
---> 48 for batch_no, data_entry in enumerate(it, start=1):
49 optimizer.zero_grad()
50 inputs = [data_entry[k].to(self.device) for k in input_names]
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/tqdm/std.py in __iter__(self)
1128
1129 try:
-> 1130 for obj in iterable:
1131 yield obj
1132 # Update and possibly print the progressbar.
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
361
362 def __next__(self):
--> 363 data = self._next_data()
364 self._num_yielded += 1
365 if self._dataset_kind == _DatasetKind.Iterable and \
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
987 else:
988 del self._task_info[idx]
--> 989 return self._process_data(data)
990
991 def _try_put_index(self):
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
1012 self._try_put_index()
1013 if isinstance(data, ExceptionWrapper):
-> 1014 data.reraise()
1015 return data
1016
~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
393 # (https://bugs.python.org/issue2651), so we work around it.
394 msg = KeyErrorMessage(msg)
--> 395 raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
data.append(next(self.dataset_iter))
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/dataset/transformed_iterable_dataset.py", line 39, in __iter__
data_entry = next(self._cur_iter)
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 128, in __call__
for data_entry in data_it:
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
for data_entry in data_it:
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
for data_entry in data_it:
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 85, in __call__
raise e
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 83, in __call__
yield self.map_transform(data_entry.copy(), is_train)
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 195, in map_transform
self._update_cache(start, length)
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 169, in _update_cache
end = shift_timestamp(start, length)
File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/split.py", line 33, in shift_timestamp
return _shift_timestamp_helper(ts, ts.freq, offset)
TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day'
Where the following preceded that code:
# Print Timestamp Statistics
earliest_time = min(example_ny_df.index)
latest_time = max(example_ny_df.index)
time_range_full = (max(example_ny_df.index) - min(example_ny_df.index)).days
# Determine Cut-point for 80/20 Training/Testing Splits
TRAININGSPLIT = 0.8
time_range_split = int(time_range_full * TRAININGSPLIT)
time_split = min(example_ny_df.index) + datetime.timedelta(days=time_range_split)
# Create Training Split / Predictor Object
FREQ = "1D"
training_data = ListDataset(
[{"start": earliest_time, "target": example_ny_df.positiveIncrease[:time_split]}],
freq = FREQ
)
# Setup GPU, if Exists
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Processing device:", DEVICE)
I'm going to try to redo this from a 100% clean install without even trying the GPU version of torch, as mentioned in #22.
Hello, I was trying to install pytorch-ts using pip install pytorchts in an Anaconda environment and got the following error message. Any hints or suggestions on how to fix it would be highly appreciated. I did not get this kind of permission error when using pip to install other packages.
Successfully built pytorchts subprocess32 pathtools
Installing collected packages: smmap, gitdb, subprocess32, shortuuid, sentry-sdk, promise, pathtools, GitPython, docker-pycreds, configparser, wandb, torch, pytorchts
Attempting uninstall: torch
Found existing installation: torch 1.6.0
Uninstalling torch-1.6.0:
Successfully uninstalled torch-1.6.0
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\users\\anaconda3\\envs\\jane_street_kaggle\\lib\\site-packages\\~orch\\lib\\asmjit.dll'
Consider using the `--user` option or check the permissions.
Hi, first of all, great work.
I was wondering how to recover the information shown in Fig. 5 in the tempflow paper https://arxiv.org/abs/2002.06103
Thanks!
Hello,
I was trying to understand the code execution and stumbled across a potential bug.
The lags_for_fourier_time_features_from_frequency() function returns an incorrect result when you pass a minute-level freq arg.
Example: lags_for_fourier_time_features_from_frequency(freq='10min') == [1]
when it should return [1, 4, 12, 24, 48]
For other frequencies it works as expected -
lags_for_fourier_time_features_from_frequency(freq='1D') == [1, 7, 14]
lags_for_fourier_time_features_from_frequency(freq='10M') == [1, 12]
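When looking into a report like this, it may help to check how pandas itself splits a frequency string into a multiple and a base name, since any lag helper has to handle the multiple ("10") separately from the unit ("min"). The snippet below is only a diagnostic sketch: split_frequency is a hypothetical helper, not part of pytorch-ts, and it makes no claim about where the actual bug is.

```python
from pandas.tseries.frequencies import to_offset


def split_frequency(freq: str):
    """Return (multiple, base_name) for a pandas frequency string.

    Hypothetical helper for debugging; not a pytorch-ts function.
    """
    offset = to_offset(freq)
    # offset.n is the multiple; offset.name is the unit without the multiple.
    # The unit alias for minutes differs between pandas versions.
    return offset.n, offset.name


for freq in ["10min", "1D"]:
    print(freq, split_frequency(freq))
```

If the multiple for '10min' comes back as 10 here but the lag helper behaves as if the frequency were unrecognised, that would point at the helper's own parsing of the unit name rather than at pandas.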
Could you please explain why max(lags) is added to self.history_length to increase the context length here?
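One plausible reading, offered as an assumption rather than a statement from the maintainers: to compute a lag-k feature for the first step of the context window, the model needs the observation k steps before that step, so the slice of history it receives has to be context_length + max(lags) long. This matches a model summary quoted elsewhere on this page (context_length=48, lags_seq=[1, 12], history_length=60). The sketch below illustrates the indexing with NumPy; lagged_features is a hypothetical helper and not the library's implementation.

```python
import numpy as np


def lagged_features(series: np.ndarray, context_length: int, lags: list) -> np.ndarray:
    """Build a (context_length, len(lags)) matrix of lagged values for the
    last `context_length` steps of `series`. Illustrative only."""
    history_length = context_length + max(lags)
    if len(series) < history_length:
        raise ValueError(f"need at least {history_length} observations")
    window = series[-history_length:]
    columns = []
    for lag in lags:
        # Start of the slice holding the values observed `lag` steps
        # before each step of the context window.
        begin = history_length - context_length - lag
        columns.append(window[begin:begin + context_length])
    return np.stack(columns, axis=1)


series = np.arange(100, dtype=float)
features = lagged_features(series, context_length=48, lags=[1, 12])
print(features.shape)  # (48, 2)
```

With only context_length observations available, the lag-12 column for the first 12 context steps could not be filled, which is why the extra max(lags) steps are needed.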
Hi,
I am working on some time series data that do not have a stable frequency. I have a datetime column of when X occurs, and this X can occur at any time after the time of the previous occurrence i.e Xt can be 1 minute after Xt-1 but even months after Xt-1.
Is there a way to adapt the frequency argument to be suitable for my case?
Thanks in advance!
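A common workaround for irregular event data, independent of whether the freq argument itself can be adapted, is to bin the events onto a regular grid before building the dataset. The sketch below uses pandas resampling on made-up timestamps; the data, the daily bin width and the choice of count/sum as aggregations are all assumptions for illustration. Binning discards the exact event times within each bin, so it only fits cases where that loss is acceptable.

```python
import pandas as pd

# Hypothetical irregular events: two close together, then gaps of days.
events = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(
        [
            "2021-01-01 00:01",
            "2021-01-01 00:02",
            "2021-01-03 12:00",
            "2021-01-06 08:30",
        ]
    ),
)

# Aggregate onto a regular daily grid; days without events get 0.
daily_counts = events["value"].resample("1D").count()
daily_sums = events["value"].resample("1D").sum()

print(daily_counts)
print(daily_sums)
```

The resulting series has a fixed frequency and could then be passed as the target of a ListDataset with the matching freq string.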
Hello, I'm trying to use the dataset here https://github.com/smallGum/MLCNN-Multivariate-Time-Series/blob/master/data/nasdaq100_padding.csv to train TransformerTempFlowEstimator, but I keep getting errors related to target_dim. Here I'm using only 2 columns:
df = pd.read_csv ("./data/nasdaq100_padding.csv")
leng = len(df.NDX)
train = int(leng/2)
test = int(leng/2)
prediction_length = 15
training_data1 = ListDataset(
[{"start":pd.Timestamp(2017, 1, 1, 12) , "target":df.AAPL[:train]},
{"start":pd.Timestamp(2017, 1, 1, 12) , "target":df.AMZN[:train]}
],
one_dim_target=False,
freq = "min"
)
device = torch.device("cuda" )
estimator = TransformerTempFlowEstimator (freq="min",
prediction_length=prediction_length,
input_size=600,
target_dim = 2,
trainer=Trainer(epochs=15,
#learning_rate = 0.00001,
device=device,
num_batches_per_epoch=500,
batch_size=20))
predictor = estimator.train(training_data=training_data1)
I'm getting errors like: RuntimeError: Sizes of tensors must match except in dimension 0. Got 20 and 10 (The offending index is 0)
(changing target_dim usually gets past this one, but then I get:)
RuntimeError: shape '[-1, 30, 3]' is invalid for input of size 600
Excuse me if this is a newbie question but when I try to run https://github.com/zalandoresearch/pytorch-ts/blob/master/examples/m5-tft.ipynb I am running into the following error:
Traceback (most recent call last):
File "C:\Users\Gili\Documents\myproject\pytorch-ts-test.py", line 17, in <module>
dataset = get_dataset("pts_m5", regenerate=False)
File "C:\Users\Gili\Documents\myproject\python\lib\site-packages\gluonts\dataset\repository\datasets.py", line 232, in get_dataset
return load_datasets(
File "C:\Users\Gili\Documents\myproject\python\lib\site-packages\gluonts\dataset\common.py", line 491, in load_datasets
meta = MetaData.parse_file(Path(metadata) / "metadata.json")
File "pydantic\main.py", line 613, in pydantic.main.BaseModel.parse_file
File "pydantic\parse.py", line 57, in pydantic.parse.load_file
File "C:\Python39\lib\pathlib.py", line 1248, in read_bytes
with self.open(mode='rb') as f:
File "C:\Python39\lib\pathlib.py", line 1241, in open
return io.open(self, mode, buffering, encoding, errors, newline,
File "C:\Python39\lib\pathlib.py", line 1109, in _opener
return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Gili\\.mxnet\\gluon-ts\\datasets\\pts_m5\\metadata.json'
Looking at https://www.kaggle.com/c/m5-forecasting-accuracy/data this file does not seem to exist. What am I missing here?
Hi, would it be possible to use the code for anomaly detection? I'm interested in applying the conditional normalizing flows model to detecting gravitational waves and comparing its performance with other models such as VAE + LSTM, GANs, etc.
Thanks !!
Hi @kashif
I hope you're well!
I am just trying to get familiar with the library, and I generally work with the pandas data format.
Could you help me format my data and create a basic univariate model for predicting into the future?
import numpy as np
import pandas as pd
import yfinance as yf
data = yf.download("SPY", start="2012-01-01", end="2017-04-30")['Adj Close']
data=pd.DataFrame(data)
data
from gluonts.dataset.common import ListDataset
training_data = ListDataset(
[{"start": data.index[0], "target": data.values[:1300]}],
freq = "1D"
)
from gluonts.dataset.common import ListDataset
test_data = ListDataset(
[{"start": data.index[1301], "target": data.values[1301:1339]}],
freq = "1D"
)
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer
estimator = SimpleFeedForwardEstimator(
num_hidden_dimensions=[10],
prediction_length=38,
context_length=100,
freq='1D',
trainer=Trainer(ctx="cpu",
epochs=5,
learning_rate=1e-3,
num_batches_per_epoch=100
)
)
predictor = estimator.train(training_data=training_data)
When I want to use
from pts.dataset.repository import get_dataset
dataset = get_dataset("m5", regenerate=False)
I get the warning that the files from Kaggle are not present in the directory:
RuntimeError: M5 data is available on Kaggle (https://www.kaggle.com/c/m5-forecasting-accuracy/data). You first need to agree to the terms of the competition before being able to download the data. After you have done that, please copy the files into /root/.pytorch/pytorch-ts/datasets/m5.
However, I have no idea where to put these files. I'm working in Google Colab. The root directory is called "content". Should I make a ./pytorch/pytorch-ts/datasets/m5 directory myself?
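The path in the error message is absolute (/root/...), so a relative ./pytorch/... directory under /content would not match it. As far as I know, /content is only Colab's working directory and the notebook runs as root with /root as its home, so the target directory can be created directly. The sketch below is one way to do that; copy_m5_files is a hypothetical helper, and /content/m5 is an assumed upload location, not something the library prescribes.

```python
import shutil
from pathlib import Path


def copy_m5_files(source_dir, target_dir):
    """Copy every CSV file from source_dir into target_dir.

    Creates target_dir (and parents) if needed and returns the copied
    file names. Hypothetical helper, not part of pytorch-ts.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_file in sorted(Path(source_dir).glob("*.csv")):
        shutil.copy(csv_file, target / csv_file.name)
        copied.append(csv_file.name)
    return copied


# Assumed usage in Colab, after uploading the Kaggle CSVs to /content/m5:
# copy_m5_files("/content/m5", "/root/.pytorch/pytorch-ts/datasets/m5")
```

After the files are in place, calling get_dataset("m5", regenerate=False) again should at least get past this particular RuntimeError.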
Many thanks to the authors for making the implementation available! Great initiative.
I am trying to run the README.md example, but it is not working.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
estimator = DeepAREstimator(freq="5min",
prediction_length=12,
input_size=43,
trainer=Trainer(epochs=10,
device=device))
predictor = estimator.train(training_data=training_data)
Running the code in the README up to this snippet works just fine. When I run estimator.train(training_data=training_data), the above snippet throws the following error:
0it [00:02, ?it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-57e06c572bad> in <module>
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data)
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\estimator.py in train(self, training_data)
133
134 def train(self, training_data: Dataset) -> Predictor:
--> 135 return self.train_model(training_data).predictor
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\estimator.py in train_model(self, training_data)
118 trained_net = self.create_training_network(self.trainer.device)
119
--> 120 self.trainer(
121 net=trained_net,
122 input_names=get_module_forward_input_names(trained_net),
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\trainer.py in __call__(self, net, input_names, data_loader)
50 inputs = [data_entry[k].to(self.device) for k in input_names]
51
---> 52 output = net(*inputs)
53 if isinstance(output, (list, tuple)):
54 loss = output[0]
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
244 future_observed_values: torch.Tensor,
245 ) -> torch.Tensor:
--> 246 distr = self.distribution(
247 feat_static_cat=feat_static_cat,
248 feat_static_real=feat_static_real,
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
219 future_observed_values: torch.Tensor,
220 ) -> Distribution:
--> 221 rnn_outputs, _, scale, _ = self.unroll_encoder(
222 feat_static_cat=feat_static_cat,
223 feat_static_real=feat_static_real,
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
166
167 # (batch_size, num_features)
--> 168 embedded_cat = self.embedder(feat_static_cat)
169
170 # in addition to embedding features, use the log scale as it can help
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\modules\feature.py in forward(self, features)
28
29 return torch.cat(
---> 30 [
31 embed(cat_feature_slice.squeeze(-1))
32 for embed, cat_feature_slice in zip(
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\modules\feature.py in <listcomp>(.0)
29 return torch.cat(
30 [
---> 31 embed(cat_feature_slice.squeeze(-1))
32 for embed, cat_feature_slice in zip(
33 self.__embedders, cat_feature_slices
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
110
111 def forward(self, input):
--> 112 return F.embedding(
113 input, self.weight, self.padding_idx, self.max_norm,
114 self.norm_type, self.scale_grad_by_freq, self.sparse)
C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1482 # remove once script supports set_grad_enabled
1483 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1485
1486
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)
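The message says the embedding lookup received 32-bit integer indices where it expected 64-bit (Long) ones. A generic way around this in plain PyTorch is to cast the index tensor with .long() before the lookup, as in the minimal sketch below. The tensor names and sizes are made up for illustration; where the cast belongs in pytorch-ts (data pipeline or network) is not established here.

```python
import torch
import torch.nn as nn

# Toy embedding table standing in for a static categorical feature embedder.
embedding = nn.Embedding(num_embeddings=5, embedding_dim=3)

# Indices arriving as 32-bit integers, shaped (batch_size, 1).
indices_int32 = torch.tensor([[0], [4]], dtype=torch.int32)

# Cast to int64 (Long) before the lookup.
indices_long = indices_int32.long()
embedded = embedding(indices_long.squeeze(-1))

print(indices_long.dtype)  # torch.int64
print(embedded.shape)      # torch.Size([2, 3])
```

If the categorical features are built by hand, creating them as 64-bit integers in the first place would avoid the need for a cast later.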
The vast majority of this repo seems to be copy-pasted from gluon-ts. This is a bit problematic as it is not stated clearly anywhere, for instance in the README.md (which is also copy-pasted from gluon-ts).
I assume there is no ill intent. Could you perhaps state clearly which files do not come from gluon-ts? Note also that all files that have been copy-pasted should still carry the original copyright notice, as required by the Apache license (see https://opensource.stackexchange.com/questions/5528/removing-copyright-notice-in-uis-of-apache-2-licensed-software).
Thanks!
I trained a TempFlow model and then saved (serialized) it to a folder, but when I load (deserialize) the model, an error occurs:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
p = Predictor.deserialize(get_model_path('tempflow'))
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\model\predictor.py", line 82, in deserialize
return tpe.deserialize(path, device)
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\model\predictor.py", line 172, in deserialize
transformation = load_json(fp.read())
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 173, in load_json
return decode(json.loads(s))
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 354, in decode
kwargs = decode(r["kwargs"]) if "kwargs" in r else {}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in decode
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in decode
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 354, in decode
kwargs = decode(r["kwargs"]) if "kwargs" in r else {}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in decode
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in decode
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 355, in decode
return cls(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'normalized'
It would be very useful to allow for forecast reconciliation of hierarchical and/or grouped time series. This means that the forecasts of the series that make up a hierarchy sum to the forecast of their aggregate. Say you forecast several time series within the same hierarchy plus the time series of the total (e.g., tourism visits in each Australian territory plus total tourism across all territories as an aggregate). Forecast reconciliation makes sure that the bottom-level forecasts match the top-level aggregate forecast. As PyTorch-TS is a probabilistic framework, we also need to make sure that the uncertainty attached to the forecasts is corrected.
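To make the idea concrete, here is a minimal sketch of the simplest scheme, bottom-up reconciliation, applied to sample forecasts with NumPy. It is not the optimal reconciliation the literature proposes and not an existing PyTorch-TS feature: the two-series hierarchy, the summing matrix S and the random samples are all invented for illustration. Aggregating each sample path separately keeps the sampled uncertainty coherent across levels by construction.

```python
import numpy as np

# Summing matrix for a hypothetical hierarchy: total = A + B.
# Rows are all series (total, A, B); columns are bottom-level series (A, B).
S = np.array(
    [
        [1, 1],  # total
        [1, 0],  # A
        [0, 1],  # B
    ]
)

# Stand-in for bottom-level sample forecasts with shape
# (num_samples, prediction_length, num_bottom_series).
rng = np.random.default_rng(0)
bottom_samples = rng.gamma(shape=2.0, scale=10.0, size=(100, 24, 2))

# Bottom-up: aggregate every sample path through the hierarchy.
coherent_samples = bottom_samples @ S.T  # (100, 24, 3)

# In every sample and time step, the total equals the sum of its children.
assert np.allclose(
    coherent_samples[..., 0], coherent_samples[..., 1:].sum(axis=-1)
)
print(coherent_samples.shape)  # (100, 24, 3)
```

Bottom-up ignores any forecast made directly for the total, which is exactly the information that more elaborate reconciliation methods try to use.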
Besides cross-sectional hierarchies, you may also want to include temporal hierarchies, so that you train the model on daily, weekly and monthly data and make sure that the forecasts all sum up to the temporal level of interest, e.g., the monthly forecast.
Several papers show that cross-temporal coherent forecasts improve accuracy compared to not taking this information into account.
This is a non-exhaustive list of references intended to give a first overview over the topic:
I'd like to plot the predictions of the TempFlowEstimator on a multivariate time series dataset, similar to what is done in the README of this repository.
When I make the forecasts as follows:
from pts.evaluation import make_evaluation_predictions
from pts.evaluation import MultivariateEvaluator
import numpy as np
evaluator = MultivariateEvaluator(quantiles=(np.arange(20)/20.0)[1:],
target_agg_funcs={'sum': np.sum})
forecast_it, ts_it = make_evaluation_predictions(dataset=dataset_test,
predictor=predictor,
num_samples=100)
forecasts = list(forecast_it)
targets = list(ts_it)
targets[0] is a Pandas dataframe containing the true values for each of the (in my case 12) time series for all time steps. forecasts[0] is a SampleForecast object whose samples attribute is a Numpy array of shape (100, 365, 12). This means that we have 100 samples for each of the 365 time steps of the test set, for each of the 12 time series.
However, how can I plot the samples of the first time series, for example? I tried to set the samples of the forecasts[0] object to the samples of the first series (i.e. forecasts[0].samples = forecasts[0].samples[:,:,0]), but when I call the plot function on that I get
Exception Traceback (most recent call last)
<ipython-input-83-b629071ff750> in <module>()
----> 1 samples_first_time_series.plot()
2 frames
/usr/local/lib/python3.6/dist-packages/pts/model/forecast.py in plot(self, prediction_intervals, show_mean, color, label, output_file, *args, **kwargs)
132
133 p50_data = ps_data[i_p50]
--> 134 p50_series = pd.Series(data=p50_data, index=self.index)
135 p50_series.plot(color=color, ls="-", label=f"{label_prefix}median")
136
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
303 data = data.copy()
304 else:
--> 305 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
306
307 data = SingleBlockManager(data, index, fastpath=True)
/usr/local/lib/python3.6/dist-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
480 elif subarr.ndim > 1:
481 if isinstance(data, np.ndarray):
--> 482 raise Exception("Data must be 1-dimensional")
483 else:
484 subarr = com.asarray_tuplesafe(data, dtype=dtype)
Exception: Data must be 1-dimensional
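One way to sidestep the forecast object's plot method is to reduce the samples for a single series to per-time-step quantiles with NumPy and plot those directly. The sketch below assumes the (num_samples, prediction_length, target_dim) layout described above and uses random data in place of real forecasts; summarize_dimension is a hypothetical helper, not part of pts.

```python
import numpy as np


def summarize_dimension(samples: np.ndarray, dim: int, quantiles=(0.05, 0.5, 0.95)):
    """Return {quantile: 1-D array over time} for one target dimension.

    samples has shape (num_samples, prediction_length, target_dim).
    Hypothetical helper for illustration.
    """
    series_samples = samples[:, :, dim]  # (num_samples, prediction_length)
    return {q: np.quantile(series_samples, q, axis=0) for q in quantiles}


# Random stand-in for forecasts[0].samples with shape (100, 365, 12).
samples = np.random.default_rng(0).normal(size=(100, 365, 12))
bands = summarize_dimension(samples, dim=0)

print(bands[0.5].shape)  # (365,)
```

Each entry of bands is one-dimensional, so it can be wrapped in a pd.Series using the forecast's index (the traceback shows the forecast object exposing one as self.index) and drawn with matplotlib, for example with fill_between for the 5% to 95% band.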
I am currently using the Australian retail trade turnover data set to get familiar with PyTorch-TS in general and with TransformerTempFlowEstimator in particular. The data looks as follows:
Each series (133 series in total) has 417 months of training observations and is uniquely identified using two keys:
All series show quite some positive dependencies, as the correlation matrix shows:
As such, TransformerTempFlowEstimator seems to be a good option. I want to make use of both State and Industry as covariates in the model. For each categorical covariate, a generalized linear mixed model is fit to the outcome and the coefficients are returned as the encodings. The cardinality of State and Industry is [7, 20]. After bringing the data into the right format, I create the train data as follows:
train_ds = ListDataset([{FieldName.TARGET: target,
FieldName.START: start,
FieldName.ITEM_ID: item_id,
FieldName.FEAT_DYNAMIC_REAL: feat_dynamic_real,
FieldName.FEAT_STATIC_REAL: feat_static_real,
FieldName.FEAT_TIME: time_feat
}
for (target,
start,
item_id,
feat_dynamic_real,
feat_static_real,
time_feat
) in zip(target_train,
start_train,
item_id_train,
feat_dynamic_real_train,
feat_static_real_train,
time_feat_train
)],
freq = "1M")
feat_static_real_train contains the embeddings and time_feat_train the month information. To transform the data into a multivariate data set, I use
grouper_train = MultivariateGrouper(max_target_dim = 133) # as there are 133 unique series
train_ds = grouper_train(train_ds)
However, after using grouper_train(train_ds), none of the covariate information is included anymore. To bring them back, I use
train_ds.list_data[0]["feat_dynamic_real"] = feat_dynamic_real_train
train_ds.list_data[0]["feat_static_real"] = feat_static_real_train
I then train the model as follows:
np.random.seed(123)
torch.manual_seed(123)
trainer = Trainer( epochs = 40)
estimator = TransformerTempFlowEstimator(input_size = 401,
freq = "1M",
prediction_length = 24,
context_length = 48,
target_dim = 133,
cardinality = [7, 20],
trainer = trainer)
predictor = estimator.train(training_data = train_ds)
The model summary is
predictor.__dict__["prediction_net"]
pts.model.transformer_tempflow.transformer_tempflow_network.TransformerTempFlowPredictionNetwork(act_type="gelu", cardinality=[7, 20], conditioning_length=200, context_length=48, d_model=32, dequantize=False, dim_feedforward_scale=4, dropout_rate=0.1, embedding_dimension=5, flow_type="RealNVP", hidden_size=100, history_length=60, input_size=401, lags_seq=[1, 12], n_blocks=3, n_hidden=2, num_decoder_layers=3, num_encoder_layers=3, num_heads=8, prediction_length=24, scaling=True, target_dim=133)
I also compared the forecast to some competing models, even though I am not sure that all models are correctly specified (i.e., covariate information, no parameter tuning).
Given the strong dependencies between the different series, I would suspect that TransformerTempFlowEstimator should outperform models that treat the series as being independent.
Based on the above summary, I have the following questions concerning the proper use of TransformerTempFlowEstimator:
"FieldName.START = 'start'",
"FieldName.TARGET = 'target'",
"FieldName.FEAT_STATIC_CAT = 'feat_static_cat'",
"FieldName.FEAT_STATIC_REAL = 'feat_static_real'",
"FieldName.FEAT_DYNAMIC_CAT = 'feat_dynamic_cat'",
"FieldName.FEAT_DYNAMIC_REAL = 'feat_dynamic_real'",
"FieldName.FEAT_TIME = 'time_feat'",
"FieldName.FEAT_CONST = 'feat_dynamic_const'",
"FieldName.FEAT_AGE = 'feat_dynamic_age'",
"FieldName.OBSERVED_VALUES = 'observed_values'",
"FieldName.IS_PAD = 'is_pad'",
"FieldName.FORECAST_START = 'forecast_start'"]
Great work! I want to apply the model to predict air quality measures such as PM10 and PM2.5 based on temperature, wind speed, wind direction and so on. Can this model directly accept a PyTorch DataLoader object? If not, how can I customize the dataset?
First of all, thanks a lot for the interesting paper and for open-sourcing the corresponding model!
I was wondering about the precise relation of this project to gluonts. In the readme you're saying that this project uses gluonts for data loading, transformations etc., but looking at the source code, it seems like you essentially did a port of the existing gluonts code to pytorch? So in that sense you're using the gluonts API and if I have some function (like a transform) coded for gluonts, chances are that it is compatible with this project due to python's duck typing?
Is this the correct understanding?
I have been trying to get TransformerTempFlowEstimator working without success.
Can you provide an example script? Issues include RuntimeError: Sizes of tensors must match except in dimension 2. Got 1 and 32 in dimension 0
and not understanding how the data loading works for multivariate data.
My example below:
from pts.dataset import MultivariateGrouper
import pandas as pd
import torch
from pts.dataset import ListDataset
from pts.model.transformer_tempflow import TransformerTempFlowEstimator
from pts import Trainer
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
train_ds = ListDataset(
[{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]+i}
for i in range(2)],
freq="5min"
)
grouper_train = MultivariateGrouper(max_target_dim=2)
gt = grouper_train(train_ds)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trainer = Trainer(epochs=10)
estimator = TransformerTempFlowEstimator(input_size=1,
freq="5min",
prediction_length=100,
context_length=4,
target_dim=64,
cardinality=[7, 20],
trainer=trainer)
predictor = estimator.train(training_data=gt)
Hi, with the latest update of making gluonts a requirement, I think there seems to be an issue with the installation via pip.
The error I get:
ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
pytorchts depends on gluonts@ git+https://github.com/awslabs/gluon-ts.git@master#egg=gluonts
Hey,
I ran into an issue while testing your example code. I use a Windows PC with CPU only. The latest version of torch (1.7.1) is installed.
Any idea what could resolve the issue?
Thanks,
Pieter
RuntimeError Traceback (most recent call last)
in
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data, num_workers=2)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\estimator.py in train(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
171 shuffle_buffer_length=shuffle_buffer_length,
172 cache_data=cache_data,
--> 173 **kwargs,
174 ).predictor
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\estimator.py in train_model(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
143 net=trained_net,
144 train_iter=training_data_loader,
--> 145 validation_iter=validation_data_loader,
146 )
147
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\trainer.py in __call__(self, net, train_iter, validation_iter)
68 inputs = [v.to(self.device) for v in data_entry.values()]
69
---> 70 output = net(*inputs)
71 if isinstance(output, (list, tuple)):
72 loss = output[0]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
252 future_time_feat=future_time_feat,
253 future_target=future_target,
--> 254 future_observed_values=future_observed_values,
255 )
256
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
226 past_observed_values=past_observed_values,
227 future_time_feat=future_time_feat,
--> 228 future_target=future_target,
229 )
230
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
166
167 # (batch_size, num_features)
--> 168 embedded_cat = self.embedder(feat_static_cat)
169
170 # in addition to embedding features, use the log scale as it can help
~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\modules\feature.py in forward(self, features)
35 embed(cat_feature_slice.squeeze(-1))
36 for embed, cat_feature_slice in zip(
---> 37 self.__embedders, cat_feature_slices
38 )
39 ],
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\modules\feature.py in <listcomp>(.0)
34 [
35 embed(cat_feature_slice.squeeze(-1))
---> 36 for embed, cat_feature_slice in zip(
37 self.__embedders, cat_feature_slices
38 )
~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1850 # remove once script supports set_grad_enabled
1851 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1852 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1853
1854
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
Hi, I'm executing the following code from the README
import pandas as pd
import matplotlib.pyplot as plt
import torch
print(torch.__version__)
import gluonts
from gluonts.dataset.common import ListDataset
from gluonts.dataset.util import to_pandas
import pts
from pts.model.deepar import DeepAREstimator
from pts import Trainer
print(pts.__version__, gluonts.__version__)
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()
training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq="5min"
)
device = "cpu"
estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data, num_workers=4)
and got the following error:
1.9.0
0.0.0-unknown 0.8.0
203 expected_input_dim, input.dim()))
204 if self.input_size != input.size(-1):
--> 205 raise RuntimeError(
206 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
207 self.input_size, input.size(-1)))
RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19
Version:
pip list | grep pytorchts
pytorchts 0.5.1
Any suggestions?
Thanks
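One reading of the error, offered as a sketch rather than a confirmed answer: the message comes from PyTorch's RNN input check, which compares the `input_size` the layer was built with (43) against the last dimension of the tensor it actually receives (19). If that reading is right, the number of input features produced for this dataset and frequency with the installed pts/gluonts versions is 19, and passing `input_size=19` to `DeepAREstimator` may resolve the mismatch. The plain-PyTorch snippet below only reproduces the size check; the batch shape, hidden size, and the helper `sizes_match` are made up for illustration.

```python
import torch
import torch.nn as nn


def sizes_match(rnn: nn.LSTM, batch: torch.Tensor) -> bool:
    """Return True if the batch's feature dimension matches rnn.input_size."""
    return rnn.input_size == batch.size(-1)


# Hypothetical batch: 2 series, 24 time steps, 19 features per step
# (19 is the "got" value from the error message).
batch = torch.zeros(2, 24, 19)

mismatched = nn.LSTM(input_size=43, hidden_size=8, batch_first=True)
matched = nn.LSTM(input_size=19, hidden_size=8, batch_first=True)

print(sizes_match(mismatched, batch))  # False: this layer would reject the batch
print(sizes_match(matched, batch))     # True

# The matching layer accepts the batch.
output, _ = matched(batch)
print(output.shape)  # torch.Size([2, 24, 8])
```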