johannfaouzi / pyts
A Python package for time series classification
Home Page: https://pyts.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
In the documentation, it is specified that most algorithms only work with fixed-length sequences. What is the recommended approach for dealing with Multichannel time series event classification using samples with variable length? Zero padding perhaps?
Thank you for your contribution to implementing this awesome algo!
I was working on your WEASEL code, but I just figured out that the windowing is not overlapping. Do you mind if I ask you why?
windowed_view(X,12,12)[0]
Hi,
I am creating MTF matrices from the Lightning2 time-series dataset using the pyts module. When entering a high number for the quantile bins (n_bins) the MTF transformation does not succeed:
from scipy.io import arff
import pandas as pd
from pyts.image import GASF, GADF, MTF
lighttrain = arff.loadarff('Datasets/Lightning2/Lightning2_TRAIN.arff')
dftrain0 = pd.DataFrame(lighttrain[0])
dftrain = dftrain0.drop('target', axis=1) #taking away the class labels
matsize = 16
qsize = 40
dfmtftr = MTF(matsize,n_bins=qsize).fit_transform(dftrain)
Error in console:
File "/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 462, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/anaconda3/lib/python3.6/site-packages/pyts/image/image.py", line 310, in transform
for i in range(n_samples)])
File "/anaconda3/lib/python3.6/site-packages/pyts/image/image.py", line 310, in
for i in range(n_samples)])
ValueError: bins must be monotonically increasing or decreasing
Theoretically I should be able to sort the data into as many bins as I want, shouldn't I? I cannot see a reason in the image.py source code why the error should occur at n_bins=40 but not at n_bins=8. Are there specific boundaries for image_size and n_bins?
The source time-series has a length of 637.
ImportError Traceback (most recent call last)
in
----> 1 from pyts.image import GASF, GADF, MTF
ImportError: cannot import name 'GASF' from 'pyts.image' (/home/jamazzon/anaconda3/envs/fastaiv3_m2/lib/python3.7/site-packages/pyts/image/init.py)
from pyts.image import GASF, GADF, MTF
NumPy 1.16.2
SciPy 1.2.1
Scikit-Learn 0.20.3
Numba 0.43.1
Pyts 0.8.0
I don't have a bug or issue to report, just some questions. First, the word-bank size for the WEASEL transformation is limited to 26. Apparently this is because there are 26 letters in the English alphabet, but this seems like a rather arbitrary limitation. Why was it designed this way? Second, is it possible to serialize a WEASEL transformer? Suppose I have a large time series that takes a while to model: shouldn't there be a warm-start process? Ideally it would be serializable to str or JSON, but even pickle would be nice. If it is possible, let me know!
Thanks for the great package!
Andrew
I have data in the same format used for load_basic_motions(return_X_y=True).
But when I created my data set I had to pad some time series with zeros.
This made all the lengths the same, and I ended up with an X_train ndarray of shape (177, 12, 111) and a y_train of length 177.
When I run clf.fit I get a ValueError that says the following:
At least two consecutive quantiles are equal. Consider trying with a smaller number of
bins or removing timestamps with low variation.
The bin edges seem to all be zero.
Is this because of the padding?
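from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from pyts.multivariate.transformation import WEASELMUSE
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
transformer = WEASELMUSE()
logistic = LogisticRegression(random_state=1, max_iter=10000, solver='liblinear', multi_class='ovr')
clf = make_pipeline(transformer, logistic)
clf.fit(X_train, y_train)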
NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0
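One plausible reading of the error above, offered as an assumption rather than a confirmed diagnosis: quantile bin edges are taken from the distribution of values, so when a large share of the values are identical (as happens with zero padding), several quantiles fall on the same value. The NumPy-only sketch below illustrates that effect with made-up numbers; it does not reproduce pyts's internal binning.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical values of one feature across 177 samples:
# 150 of them come from zero padding, 27 are real measurements.
values = np.concatenate([np.zeros(150), rng.randn(27)])

# Interior quantile edges for 4 bins (25%, 50%, 75%).
edges = np.percentile(values, [25, 50, 75])
print(edges)

# Because most values are tied at zero, consecutive edges coincide.
has_equal_edges = bool(np.any(np.diff(edges) == 0))
print(has_equal_edges)
```

If this is what happens in your data, the options named in the error message (fewer bins, or removing the low-variation part of the series) address the same cause.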
Is it possible to encode 2 x 20 timeseries into a GAF? Trying to encode the Open and Volume fields of the S&P index. Thank you for your effort.
If each sample is not the same length of n_timestamps, is it possible to perform WEASEL+MUSE on that dataset?
If I construct an array of (n_samples, n_features, n_timestamps) with n_timestamps that aren't equal, I get an exception from the validation of the array.
If I pad the samples with None or if I pad with some constant value, I get an exception (NaNs not allowed / quantiles equal).
Is there a solution/workaround to this problem, or am I chasing something that isn't allowed?
Thank you for a great package!
from pyts.multivariate.transformation import WEASELMUSE
# Create a fake simple dataset:
# 3 samples, 2 features, different n_timestamps
X = [[[3, 1, 0, 2, 2], [2, 2, 3, 3, 5]],
     [[0, 1, 0, 5, 2, 4], [3, 0, 0, 2, 4, 2]],
     [[1, 0, 1, 3, 5], [1, 3, 1, 2, 3]]]
y = [1, 0, 1]
transformer = WEASELMUSE(word_size=4, n_bins=2, window_sizes=[8],
                         chi2_threshold=15, sparse=False)
X_new = transformer.fit_transform(X, y)
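A possible workaround, sketched here as an assumption and not as something pyts provides: resample every channel to a common length by linear interpolation before building the 3D array. This avoids both NaN padding and constant padding, at the price of stretching or compressing the time axis. The helper name resample_to_length is hypothetical.

```python
import numpy as np

def resample_to_length(sample, n_timestamps):
    """Linearly interpolate every channel of one sample to n_timestamps points."""
    resampled = []
    for channel in sample:
        channel = np.asarray(channel, dtype=float)
        old_grid = np.linspace(0.0, 1.0, num=channel.size)
        new_grid = np.linspace(0.0, 1.0, num=n_timestamps)
        resampled.append(np.interp(new_grid, old_grid, channel))
    return np.stack(resampled)

# Same ragged toy data as in the question: 3 samples, 2 features.
X = [[[3, 1, 0, 2, 2], [2, 2, 3, 3, 5]],
     [[0, 1, 0, 5, 2, 4], [3, 0, 0, 2, 4, 2]],
     [[1, 0, 1, 3, 5], [1, 3, 1, 2, 3]]]

# Stretch everything to the longest series in the dataset.
target_length = max(len(channel) for sample in X for channel in sample)
X_fixed = np.stack([resample_to_length(sample, target_length) for sample in X])
print(X_fixed.shape)  # (3, 2, 6)
```

Whether interpolation is acceptable depends on the data: if the absolute duration of an event carries information, resampling removes it.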
Version 0.8.0 is available on PyPI. A lot of changes have been made to the package, unfortunately with no backward compatibility (but it was necessary). The package was significantly improved in terms of optimization, code cleanliness, and documentation. Here is a summary of the changes:
No more Python 2 support
New package required: numba
Updated required versions of packages
Modification of the API:
quantization module merged in approximation and removed
bow module renamed bag_of_words
Fewer acronyms used for the names of the classes: if an algorithm has a name with three words or fewer, the whole name is used.
More preprocessing tools in preprocessing module
New module metrics with metrics specific to time series
Improved tests using pytest tools
Reworked documentation
Updated continuous integration scripts
More optimized code using numba
Hi,
I wonder why the approximation functions PAA and DFT are applied to the rows? In my opinion based on what I found in the papers and dissertation of Patrick Schäfer, this should be applied to the columns (along the time axis). Am I wrong?
For example the code below returns an error:
import numpy as np
import pyts.approximation as pya
x = np.random.randn(100,2)
paa = pya.PAA(window_size=10)
x_paa = paa.fit_transform(x)
print('Shape of x {}'.format(x.shape))
print('Shape of x_paa {}'.format(x_paa.shape))
ValueError: 'window_size' must be lower or equal than the size of each time series.
However, what I've been expecting is the following:
import numpy as np
import pyts.approximation as pya
x = np.random.randn(100,2)
paa = pya.PAA(window_size=10)
x_paa = paa.fit_transform(x)
print('Shape of x {}'.format(x.shape))
print('Shape of x_paa {}'.format(x_paa.shape))
Shape of x (100, 2)
Shape of x_paa (10, 2)
Regards,
legout
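To make the axis convention concrete, here is a NumPy-only sketch of piecewise aggregate approximation with non-overlapping windows. It is not pyts's implementation; it only assumes, as the error message above suggests, that each row is treated as one time series. Under that assumption, data stored with time along the rows has to be transposed first.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100, 2)            # 100 time points (rows), 2 series (columns)

x_rows = x.T                     # shape (2, 100): one series per row
window_size = 10
n_series, n_timestamps = x_rows.shape

# PAA with non-overlapping windows: average each block of window_size points.
x_paa_rows = x_rows.reshape(
    n_series, n_timestamps // window_size, window_size
).mean(axis=2)

x_paa = x_paa_rows.T             # back to time along the rows
print('Shape of x {}'.format(x.shape))
print('Shape of x_paa {}'.format(x_paa.shape))
```

With this layout the result has the (10, 2) shape expected in the question.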
I am using pyts version 0.7.5 and using GAF to convert time series data into images. When I use transform(self, X), the shape of X is defined as:
X : array-like, shape = [n_samples, n_features].
Here, what is the purpose of n_features?
Hi,
Great work with time series, congrats!
This is a question rather than an issue. I'm working on a multivariate time series classification problem, and the dataset is in the form:
feature 1 | feature 2 | ... | feature n | label | time_series_group |
---|---|---|---|---|---|
23.4 | 12.5 | ... | 1.34 | Class 1 | 1 |
27.3 | 14.7 | ... | 2.46 | Class 1 | 1 |
... | ... | ... | ... | Class 1 | 1 |
26.5 | 11.2 | ... | 2.32 | Class 1 | 1 |
28.4 | 11.5 | ... | 1.54 | Class 1 | 2 |
27.3 | 14.3 | ... | 2.09 | Class 1 | 2 |
... | ... | ... | ... | Class 1 | 2 |
21.2 | 14.9 | ... | 3.34 | Class 1 | 2 |
25.4 | 12.5 | ... | 1.34 | Class 2 | 3 |
28.3 | 14.7 | ... | 2.46 | Class 2 | 3 |
... | ... | ... | ... | Class 2 | 3 |
29.5 | 11.9 | ... | 1.35 | Class 2 | 3 |
27.4 | 10.5 | ... | 1.54 | Class 2 | 4 |
21.3 | 17.3 | ... | 2.09 | Class 2 | 4 |
... | ... | ... | ... | Class 2 | 4 |
26.4 | 13.6 | ... | 2.47 | Class 2 | 4 |
The dataset above isn't real; it is only an example. The column time_series_group identifies the rows that belong to the same multidimensional time series. For instance, let us assume that each group has the same number of rows m, so the first m rows (identified by time_series_group = 1) hold the n features that represent Class 1. Within group 1, each feature column is a time series of that feature's behavior; therefore, Class 1 is identified by n time series (one per feature).
Is there a way to solve this kind of problem with pyts? If not, do you have any thoughts about how to approach it?
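Assuming the multivariate tools expect an array of shape (n_samples, n_features, n_timestamps), a long table like the one above can be regrouped as sketched below. The data and variable names are made up for illustration, and every group is assumed to have the same number of rows.

```python
import numpy as np

# Hypothetical long-format data: 4 groups, 3 time steps each, 2 features.
group = np.repeat([1, 2, 3, 4], 3)
labels = np.repeat(["Class 1", "Class 1", "Class 2", "Class 2"], 3)
features = np.arange(24, dtype=float).reshape(12, 2)   # rows are time steps

group_ids = np.unique(group)

# One sample per group; each (n_timestamps, n_features) block is transposed
# so that time runs along the last axis.
X = np.stack([features[group == g].T for g in group_ids])
# One label per group, taken from the group's first row.
y = np.array([labels[group == g][0] for g in group_ids])

print(X.shape)  # (4, 2, 3)
print(y)
```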
ValueError: could not broadcast input array from shape (48) into shape (1)
(both from the pip package and git)
Very useful toolbox, thanks.
Greetings. I've been trying to use your DTW library to shift two series (DEPTH in this case) relative to each other.
A value is generated; however, it is always positive, and in my case I need it to specify whether the shift is up (negative) or down (positive)
for each barrel.
https://github.com/sudomaze/core-dtw
Please guide me to the correct way.
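A DTW score is an accumulated distance, so it cannot be negative. The direction of a shift is found in the warping path instead. The sketch below is a plain textbook DTW written in NumPy (not the pyts implementation) on made-up, depth-like data; it takes the median index offset along the path as a signed shift. The function name dtw_path and the choice of the median are assumptions made for illustration.

```python
import numpy as np

def dtw_path(x, y):
    """Classic DTW with absolute-difference cost; returns (cost, path)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            acc[i, j] = d + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end to recover which index pairs were matched.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1, j - 1], i - 1, j - 1),
                 (acc[i - 1, j], i - 1, j),
                 (acc[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]

# Hypothetical data: each value of x appears 5 samples later in y.
base = np.sqrt(np.arange(90.0))
x, y = base[5:85], base[0:80]

cost, path = dtw_path(x, y)
offsets = np.array([j - i for i, j in path])   # index in y minus index in x
signed_shift = np.median(offsets)              # positive: y lags behind x
print(cost >= 0, signed_shift)
```

Swapping x and y flips the sign, which gives the up/down distinction asked for.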
Hi,
I was running the Markov Transition Field example and got the following error:
X_mtf = mtf.transform(X)
D:\Anaconda3\envs\tf-gpu4\lib\site-packages\pyts\image\image.py:321: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
MTF[np.meshgrid(list_values[i], list_values[j])] = MTM[i, j]
Traceback (most recent call last):
File "", line 1, in
X_mtf = mtf.transform(X)
File "D:\Anaconda3\envs\tf-gpu4\lib\site-packages\pyts\image\image.py", line 301, in transform
remainder)
File "D:\Anaconda3\envs\tf-gpu4\lib\site-packages\numpy\lib\shape_base.py", line 357, in apply_along_axis
res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
File "D:\Anaconda3\envs\tf-gpu4\lib\site-packages\pyts\image\image.py", line 336, in _mtf
np.arange(start[j], end[j])].mean()
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (5,)
GASF and GADF example worked fine. Kindly assist.
Regards,
Avi
A common approach to constructing a sample for time series classification is to take one long time series and break it into smaller, overlapping series which each have their own label. However, if you have overlapping sequences, the ShapeletTransform and LearningShapelets algorithms in pyts return the same shapelets, since these will occur in multiple samples. Setting remove_similar = True in ShapeletTransform does not resolve this since it only excludes similar shapelets "taken from the same time series". It would be great if these algos would consider only a single instance of a shapelet that is shared across multiple time series samples. At the moment, I manually remove identical shapelets from the output (so that the number of shapelets I actually end up with is < n_shapelets).
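The manual clean-up described above could look like the following sketch. It assumes the shapelets are available as a list of 1D arrays of possibly different lengths; the helper name and the rounding tolerance are hypothetical.

```python
import numpy as np

def drop_duplicate_shapelets(shapelets, decimals=8):
    """Keep the first occurrence of each shapelet, comparing rounded values."""
    seen = set()
    unique = []
    for shapelet in shapelets:
        values = np.asarray(shapelet, dtype=float)
        key = tuple(np.round(values, decimals))   # hashable fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(values)
    return unique

# Toy input: two shapelets occur twice because the source windows overlap.
shapelets = [[1, 2, 3], [1, 2, 3], [4, 5], [1, 2, 3, 4], [4, 5]]
unique_shapelets = drop_duplicate_shapelets(shapelets)
print(len(unique_shapelets))  # 3
```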
The documentation commonly uses this tuple: (samples, timestamps). That doesn't make any sense in my brain as I've always thought of those being the same thing. If I'm sampling something, I'm reading a sensor value periodically. I could create a timestamp for that sample, but I also have the sensor's value at that time. My input data is (samples, sensor values). It has one row for each time I read the sensors, and a column for the value of each sensor. I think this is called the "wide" data format. Is pyts compatible with the wide data format? Or is there an easy way to transform my data into something compatible with pyts?
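As far as I can tell from the shapes quoted elsewhere on this page, a "sample" in pyts means one whole time series and "timestamps" are the points within it. Under that reading (an assumption, not a statement from the documentation), a wide recording with one row per reading and one column per sensor only needs a transpose:

```python
import numpy as np

rng = np.random.RandomState(0)
readings = rng.randn(500, 3)     # 500 readings (rows), 3 sensors (columns)

# Each sensor treated as its own univariate series:
# shape (n_samples, n_timestamps)
X_univariate = readings.T

# The whole recording treated as one multivariate sample:
# shape (n_samples, n_features, n_timestamps)
X_multivariate = readings.T[np.newaxis, :, :]

print(X_univariate.shape, X_multivariate.shape)
```

For classification you would normally need many such samples, for example one per recording session, each with its own label.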
Great code, thanks.
Could you clarify:
Will it work for multivariate time series prediction, both regression and classification,
1. where all values are continuous,
2. or even for multivariate time series where the values are a mixture of continuous and categorical values?
For example, some dimensions have continuous values and others are categorical:
color weight gender height age
1 black 56 m 160 34
2 white 77 f 170 54
3 yellow 87 m 167 43
4 white 55 m 198 72
5 white 88 f 176 32
Just a few thoughts from quickly running the examples, regarding the relationship of some of these estimators to those of scikit-learn.
Maybe mentioning some of it in the docstrings could be useful for users familiar with scikit-learn.
pyts.transformer.StandardScaler is similar to sklearn.preprocessing.StandardScaler except that the scaling is done along axis=1 instead of 0 (i.e. the equivalent of using sklearn.preprocessing.scale(X, axis=1)), with an additional "empirical variance" added during the normalization.
For VSM(window_size=4, numerosity_reduction=False) + SAXVSMClassifier, I wonder if it's equivalent to running CountVectorizer(analyzer='char', ngram_range=(4, 4)) on the SAX output, followed by class-wise normalization with TfidfTransformer and finally applying NearestCentroid(metric='cosine'). I've not read the SAX-VSM paper in detail, so I might be missing something.
In any case these are interesting analogies, and something that might be used for unit tests #3 (if true). Also, using scikit-learn's TF-IDF methods, which have been optimized for a while, will probably be computationally faster (in particular they use sparse arrays instead of lists).
I know you don't plan to actively develop this package at the moment, so just writing it for future reference..
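To illustrate the axis difference mentioned for the scaler, here is a NumPy-only sketch (neither the pyts nor the scikit-learn code) contrasting per-row and per-column standardization on made-up data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 50) * 3.0 + 10.0    # 5 time series, 50 time points each

# Per-series scaling (axis=1): each row ends up with mean 0 and std 1.
row_scaled = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Per-feature scaling (axis=0): each column ends up with mean 0 and std 1.
col_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(row_scaled.mean(axis=1), 0.0))
print(np.allclose(col_scaled.mean(axis=0), 0.0))
```

For time series stored one per row, the per-row version removes each series' own offset and amplitude, which is usually the intent.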
I get an error when importing this library.
I installed it with pip3 install pyts
and get the following error:
$ python3
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyts.image import RecurrencePlots
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'RecurrencePlots' from 'pyts.image' (/usr/local/lib/python3.8/site-packages/pyts/image/__init__.py)
NumPy 1.18.1
SciPy 1.5.2
Scikit-Learn 0.23.2
Numba 0.51.2
Pyts 0.11.0
I have a ZeroDivisionError trying to fit LearningShapelets
data.txt
import pandas as pd
from pyts.classification.learning_shapelets import LearningShapelets
train_data = pd.read_csv('data.txt', index_col=0)
y_train = train_data['class']
x_train = train_data.drop('class', axis=1)
pls = LearningShapelets()
pls.fit(x_train, y_train)
NumPy 1.18.5
SciPy 1.4.1
Scikit-Learn 0.22.2
Numba 0.51.2
Pyts 0.11.0
Version 0.9.0 is available on PyPI (and on Anaconda Cloud via the conda-forge channel for the first time!). Brief summary of the changes:
datasets module with dataset loading utilities
multivariate module with utilities for multivariate time series
Examples section in most of the public functions and classes
from pyts.image import mtf
Traceback (most recent call last):
File "C:\Users\BBD\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-33-02178b70bcc1>", line 1, in <module>
from pyts.image import mtf
File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\image\__init__.py", line 6, in <module>
from .gaf import GramianAngularField
File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\image\gaf.py", line 8, in <module>
from ..approximation import PiecewiseAggregateApproximation
File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\approximation\__init__.py", line 4, in <module>
from .sax import SymbolicAggregateApproximation
File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\approximation\sax.py", line 6, in <module>
from ..preprocessing import KBinsDiscretizer
File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\preprocessing\__init__.py", line 4, in <module>
from .transformer import PowerTransformer, QuantileTransformer
File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\preprocessing\transformer.py", line 4, in <module>
from sklearn.preprocessing import PowerTransformer as SklearnPowerTransformer
ImportError: cannot import name 'PowerTransformer'
Just wondering, are the image pixel values in the range [0, 255] for all image encodings (GAF, MTF, recurrence plot)?
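For the Gramian angular summation field at least, the textbook definition gives cosines, so the values lie in [-1, 1] rather than [0, 255]. The sketch below computes it directly from that definition with NumPy; it is not pyts's code, and whether a given pyts version rescales its output is not checked here.

```python
import numpy as np

def gasf(series):
    """Gramian angular summation field from its textbook definition."""
    x = np.asarray(series, dtype=float)
    # Rescale the series to [-1, 1] so that arccos is defined.
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    x = np.clip(x, -1.0, 1.0)
    phi = np.arccos(x)                        # polar angle of each point
    return np.cos(phi[:, None] + phi[None, :])

rng = np.random.RandomState(0)
image = gasf(rng.randn(32))
print(image.shape, image.min(), image.max())
```

To save such an image as 8-bit pixels you would map [-1, 1] to [0, 255] yourself.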
So the recurrence plot pyts.image.RecurrencePlot works really nicely, but can be a bit slow. I want to write a function that would make a movie for threshold='point' and percentage in the range 0 to 100. I can do it trivially by repeating the plot independently 100 times, but I presume that (theoretically) some part of the work can be re-used if percentage is changed but data stays the same. Can this be done?
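In principle the pairwise distance matrix does not depend on the percentage, so it can be computed once and thresholded many times. The NumPy sketch below shows the idea for a trajectory of dimension 1. It assumes that the 'point' threshold is the distance below which the given percentage of points fall, and it uses `<=` for the comparison; both are assumptions about the convention, not a copy of pyts's code.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(200)

# Pairwise distances between points: computed only once.
dist = np.abs(x[:, None] - x[None, :])

# All thresholds in one call: the p-th percentile of the distances.
percentages = np.arange(0, 101)
thresholds = np.percentile(dist, percentages)

# frames[k] is the binary recurrence plot for percentages[k].
frames = dist[None, :, :] <= thresholds[:, None, None]
print(frames.shape)  # (101, 200, 200)
```

The unthresholded distance matrix dist is also useful on its own as a grayscale recurrence plot.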
I'm trying to get a WEASEL transform of just one single time series and am running into issues, see below.
Please advise.
Thanks.
import numpy as np
import matplotlib.pyplot as plt
from pyts.transformation import WEASEL
n_samples, n_timestamps = 1, 100
n_classes = 1
rng = np.random.RandomState(41)
X = rng.randn(n_samples, n_timestamps)
y = rng.randint(n_classes, size=n_samples)
weasel = WEASEL(word_size=2, n_bins=2, window_sizes=[12, 36])
X_weasel = weasel.fit_transform(X, y)  # raises the ValueError shown below
plt.figure(figsize=(12, 8))
vocabulary_length = len(weasel.vocabulary_)
width = 0.3
plt.bar(np.arange(vocabulary_length) - width / 2, X_weasel[0],
        width=width, label='First time series')
plt.xticks(np.arange(vocabulary_length),
           np.vectorize(weasel.vocabulary_.get)(np.arange(X_weasel[0].size)),
           fontsize=12, rotation=60)
plt.yticks(np.arange(np.max(X_weasel[:2] + 1)), fontsize=12)
plt.xlabel("Words", fontsize=18)
plt.ylabel("Frequencies", fontsize=18)
plt.title("WEASEL transformation", fontsize=20)
plt.legend(loc='best')
plt.show()
ValueError Traceback (most recent call last)
in
13 weasel = WEASEL(word_size = 2, n_bins = 2, window_sizes=[12, 36])
14 # X_weasel = weasel.fit_transform(X, y).toarray()
---> 15 X_weasel = weasel.fit_transform(X, y)
16 # X_weasel = weasel.fit_transform(np.array(X), np.array(y)).toarray()
17
~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/transformation/weasel.py in fit_transform(self, X, y)
258 )
259 y_repeated = np.repeat(y, n_windows)
--> 260 X_sfa = sfa.fit_transform(X_windowed, y_repeated)
261
262 X_word = np.asarray([''.join(X_sfa[i])
~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/sfa.py in fit_transform(self, X, y)
157 )
158 self.pipeline = Pipeline([('dft', dft), ('mcb', mcb)])
--> 159 X_sfa = self.pipeline.fit_transform(X, y)
160 self.support = self.pipeline.named_steps['dft'].support
161 self.bin_edges = self.pipeline.named_steps['mcb'].bin_edges
~/anaconda3/envs/tf36/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
391 return Xt
392 if hasattr(last_step, 'fit_transform'):
--> 393 return last_step.fit_transform(Xt, y, **fit_params)
394 else:
395 return last_step.fit(Xt, y, **fit_params).transform(Xt)
~/anaconda3/envs/tf36/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
554 else:
555 # fit method of arity 2 (supervised transformation)
--> 556 return self.fit(X, y, **fit_params).transform(X)
557
558
~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in fit(self, X, y)
113 self.check_constant(X)
114 self.bin_edges = self._compute_bins(
--> 115 X, y, n_timestamps, self.n_bins, self.strategy)
116 return self
117
~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in _compute_bins(self, X, y, n_timestamps, n_bins, strategy)
207 )
208 else:
--> 209 bins_edges = self._entropy_bins(X, y, n_timestamps, n_bins)
210 return bins_edges
211
~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in _entropy_bins(self, X, y, n_timestamps, n_bins)
221 "The number of bins is too high for feature {0}. "
222 "Try with a smaller number of bins or remove "
--> 223 "this feature.".format(i)
224 )
225 bins[i] = threshold
ValueError: The number of bins is too high for feature 0. Try with a smaller number of bins or remove this feature.
If I understand the WEASEL+MUSE algorithm correctly it should be possible to use it with samples of different lengths.
This is currently not possible with the API of the WEASELMUSE class which expects a 3d array in the shape = (n_samples, n_features, n_timestamps) since a numpy array has the same shape for all samples.
I tried to fill the time series of all samples to the length of the longest samples with nan values, but the input checks reject nan values.
Is there a way to achieve using samples of different lengths?
Hi, I am getting many of these warnings with ShapeletTransform:
D:\Anaconda3\envs\tipjar\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order)
st = ShapeletTransform(n_jobs=cpus)
features = st.fit_transform(X_trn_seg, y_trn_seg)
NumPy 1.19.1
SciPy 1.4.1
Scikit-Learn 0.23.1
Numba 0.49.1
Pyts 0.11.0
Hi --
Do you have an example of using WEASEL to get results close to those reported in the paper? I'm playing around w/ using WEASEL as a featurizer on some of the UCR datasets, but my results aren't nearly as good as in the paper -- I'm guessing I'm not using the right hyperparameters. Any ideas?
Thanks!
Hi, I am getting this error message
st = ShapeletTransform(n_jobs=cpus)
st.fit(X_trn_seg, y_trn_seg)
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
r = call_item()
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
return self.fn(*self.args, **self.kwargs)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 253, in __call__
for func, args, kwargs in self.items]
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 253, in <listcomp>
for func, args, kwargs in self.items]
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 522, in _fit_one_time_series
X, window_sizes, shapelets, lengths, fit=True)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 115, in _derive_all_distances
X, n_samples, n_timestamps, window_sizes, shapelets, lengths
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/numba/core/dispatcher.py", line 404, in _compile_for_args
error_rewrite(e, 'unsupported_error')
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/numba/core/dispatcher.py", line 344, in error_rewrite
reraise(type(e), e, None)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/numba/core/utils.py", line 80, in reraise
raise value.with_traceback(tb)
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure IR is legal prior to lowering)
Tuple 'shapelets' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.
File "../anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 71:
@njit()
def _derive_all_squared_distances_fit(
^
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "pipeline.py", line 25, in <module>
X_trn, st = shaplet(X_trn_seg, y_trn_seg, X_trn) # Shapelet features
File "/home/jmrichardson/tipjar/tipjar/prepare/shapelet.py", line 14, in shaplet
st.fit(X_trn_seg, y_trn_seg)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 304, in fit
window_steps, n_jobs, rng) = self._check_params(X, y)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 499, in _check_params
window_sizes = self._auto_length_computation(X, y, rng, n_jobs)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 623, in _auto_length_computation
self.remove_similar, n_jobs, rng
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 570, in _fit
for i in range(n_samples))
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 1042, in __call__
self.retrieve()
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 921, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure IR is legal prior to lowering)
Tuple 'shapelets' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.
File "../anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 71:
@njit()
def _derive_all_squared_distances_fit(
^
NumPy 1.19.2
SciPy 1.4.1
Scikit-Learn 0.23.1
Numba 0.49.1
Pyts 0.12.dev0
When using WEASELMUSE for multivariate time series classification, the result of the transformer has a very large number of features (650,000). Also, the counts in the histogram are sometimes all zero for some examples. Is this expected?
I used my own data set; the resulting X_weasel was an ndarray of size 1500 x 650000.
The 1500 makes sense, as this is the number of examples I had, but the 650000 seems large.
I use the code below. Also, when running the same code as in the example that loads basic motions, I get similar results: a large number of features and some examples with all zeros. Thus, if I plot the histogram, there is nothing to plot.
transformer = WEASELMUSE(strategy='uniform', word_size=4, window_sizes=np.arange(5, 70), sparse=False)
X_weasel = transformer.fit_transform(X_train, y_train)
NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0
The MTF example ( https://pyts.readthedocs.io/en/latest/auto_examples/image/plot_mtf.html ) ran on my machine, so I know the setup is complete. But for some reason I am getting this error when I pass my 2-D array:
user@ubuntu:~/Desktop/pyts_ex$ python prog.py
Traceback (most recent call last):
File "prog.py", line 61, in
x_train_np_new = mtf.fit_transform(x_train_np)
File "/home/user/.local/lib/python2.7/site-packages/sklearn/base.py", line 464, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/usr/local/lib/python2.7/dist-packages/pyts/image/mtf.py", line 142, in transform
X_binned = discretizer.fit_transform(X)
File "/home/user/.local/lib/python2.7/site-packages/sklearn/base.py", line 464, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/usr/local/lib/python2.7/dist-packages/pyts/preprocessing/discretizer.py", line 119, in transform
self._check_constant(X)
File "/usr/local/lib/python2.7/dist-packages/pyts/preprocessing/discretizer.py", line 140, in _check_constant
raise ValueError("At least one sample is constant.")
ValueError: At least one sample is constant.
My interpretation of this error is that pyts requires some kind of constant, which seems to be missing from the array I provided. Any thoughts on how to fix this? Thanks.
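Read literally, the message says the opposite: at least one row of the input has the same value at every time point, which leaves nothing to bin. That reading is based only on the message and on the `_check_constant` call in the traceback. A small NumPy sketch, with made-up data, for finding and dropping such rows before the transformation:

```python
import numpy as np

X = np.array([[0.1, 0.5, 0.3, 0.9],
              [2.0, 2.0, 2.0, 2.0],    # constant row
              [1.0, 0.0, 1.0, 0.0]])

# A row is constant when its maximum equals its minimum.
is_constant = X.max(axis=1) == X.min(axis=1)
print(np.flatnonzero(is_constant))  # [1]

# Keep only the rows that vary; remember to filter the labels the same way.
X_kept = X[~is_constant]
print(X_kept.shape)  # (2, 4)
```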
Hello! Is there a way to inverse_transform any of the 'approximation' methods? I have a dataset of 20,000 simple 1D time series, each only 30 units long (i.e., [20,000 x 30]), and I want to reduce the length to, say, 5 (i.e., to [20,000 x 5]). So far I have been using PCA and kernel PCA, neither of which is meant for time series work, and I would like to compare performance with the various algorithms in this package, but unfortunately I did not find a way to inverse-transform the reduced time series.
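For piecewise aggregate approximation specifically, an approximate inverse is easy to write by hand: repeat each segment mean over the window it came from. The sketch below uses plain NumPy on a small made-up array (20 series instead of 20,000) and is not a pyts feature. The reconstruction is a step function, so only the segment means are recovered exactly.

```python
import numpy as np

rng = np.random.RandomState(0)
X = np.cumsum(rng.randn(20, 30), axis=1)      # 20 toy series of length 30

window_size = 6                               # 30 / 6 = 5 segments
n_samples, n_timestamps = X.shape

# Forward: PAA with non-overlapping windows, shape (20, 5).
X_reduced = X.reshape(n_samples, n_timestamps // window_size, window_size).mean(axis=2)

# Approximate inverse: repeat every segment mean window_size times, shape (20, 30).
X_restored = np.repeat(X_reduced, window_size, axis=1)

rmse = np.sqrt(np.mean((X - X_restored) ** 2))
print(X_reduced.shape, X_restored.shape, rmse)
```

The reconstruction error computed this way can be compared directly with the one obtained from PCA's inverse_transform.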
Hi, first of all, I really appreciate this wonderful library for processing time series. I am using pyts to load UEA datasets, and I found that when I load a binary-class dataset, the loaded labels are not binary. After debugging, I suspect there is an issue with the line referenced below.
Line 297 in 1aa4558
I am not sure whether my interpretation is correct. I look forward to hearing from you, and I hope this powerful tool becomes better and better. Thanks so much. :)
Very nice package. I was just wondering if there were any plans to add unit tests ?
I was wondering, is fitted.shape equal to (6, 6, 6) because it fitted and transformed for each feature and generated an image for each one, so I have six 6x6 images?
This code shows what I mean. fitted.shape is (6, 6, 6).
import numpy
from pyts.image import GramianAngularField
gasf = GramianAngularField(image_size=6, method='summation')
cueSamples = numpy.zeros((6, 314))  # shape = (6, 314)
fitted = gasf.fit_transform(cueSamples)
fitted.shape  # (6, 6, 6)
NumPy 1.20.2
SciPy 1.6.2
Scikit-Learn 0.24.1
Numba 0.53.1
Pyts 0.11.0
I am using the Gramian Angular Field transformation to encode my time series data into images. For reconstruction purposes, I need to perform the inverse transformation on the GAF matrix. Is there any way to do that?
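For the summation field there is a known identity that allows a partial inverse: the diagonal holds cos(2 * phi_i) = 2 * x_i^2 - 1, so x_i = sqrt((diag_i + 1) / 2) when the series was rescaled to [0, 1] (with a [-1, 1] range the sign is lost). The NumPy sketch below builds the field from its definition and inverts it under that assumption. It is not a pyts function, and if the image was built from a shortened series, only that shortened series can be recovered.

```python
import numpy as np

series = np.array([3.0, 7.0, 4.0, 9.0, 5.0])

# Forward transform, assuming rescaling to [0, 1].
lo, hi = series.min(), series.max()
x = (series - lo) / (hi - lo)
phi = np.arccos(x)
gasf = np.cos(phi[:, None] + phi[None, :])

# Inverse from the main diagonal: cos(2 * phi) = 2 * x**2 - 1.
diag = np.clip(np.diag(gasf), -1.0, 1.0)
x_rec = np.sqrt((diag + 1.0) / 2.0)

# Undo the rescaling; this needs the original minimum and maximum.
series_rec = x_rec * (hi - lo) + lo
print(series_rec)
```

The original minimum and maximum are not stored in the image, so they have to be kept separately if the absolute scale matters.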
I'm looking to train a Connectionist Temporal Classification (CTC) classifier. The input is a sequence of tensors of length N and the output a sequence of length M, M<N. I want to use a Gramian Angular Field to encode the input sequence.
From what I understand pyts Gramian Angular Field encodes the entire input to a single output? So given a series of 1x1000 where 1 is the batch dimension and 1000 is the series length, I get back a single tensor 1x32x32, what I want is Bx32x32 where B is the number of windows.
Is there a way of doing this using pyts? I'm guessing I could just reshape the input from 1x1000 to say 10x100 but is there a transform which does this, perhaps with overlap etc?
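Independently of what pyts offers, the windowing itself is a few lines of NumPy. The sketch below cuts one long series into fixed-size windows with a chosen step; the resulting 2D array has one window per row, which can then be passed to an image transformer as a batch. The helper name sliding_windows is hypothetical.

```python
import numpy as np

def sliding_windows(series, window_size, step):
    """Return an array of shape (n_windows, window_size) cut from a 1D series."""
    series = np.asarray(series)
    starts = np.arange(0, series.size - window_size + 1, step)
    index = starts[:, None] + np.arange(window_size)[None, :]
    return series[index]

series = np.arange(1000, dtype=float)

# step == window_size: plain reshape-like split, no overlap.
non_overlapping = sliding_windows(series, window_size=100, step=100)
# step < window_size: consecutive windows share 50 points.
overlapping = sliding_windows(series, window_size=100, step=50)

print(non_overlapping.shape)  # (10, 100)
print(overlapping.shape)      # (19, 100)
```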
This is a question, and I hope it is ok to ask it here.
While looking at the pyts.image.gaf module I've come across the parameter 'overlapping' that says "If True, reduce the size of each time series using PAA with possible overlapping windows."
However, I am not clear on what effect this parameter has on the GAF image calculation. By how much does it reduce the size of the time series using piecewise aggregate approximation, and how does it determine the possible overlapping windows (what are those windows)?
Also, in the GramianAngularField example (https://pyts.readthedocs.io/en/latest/auto_examples/image/plot_gaf.html), I see that fit_transform is not passed the y values. Are they encoded in the variable X somehow?
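For reference, textbook PAA replaces each window of a series by its mean. The sketch below covers only the simple case where the series length is divisible by the output size, so the windows do not overlap. How pyts chooses windows when the length is not divisible (the case the 'overlapping' parameter refers to) is not shown here; paa is a hypothetical helper, not the pyts implementation.

```python
import numpy as np

def paa(series, output_size):
    """Textbook PAA: mean of each of `output_size` equal, non-overlapping windows."""
    n = series.size
    if n % output_size != 0:
        raise ValueError("this sketch only handles lengths divisible by output_size")
    return series.reshape(output_size, n // output_size).mean(axis=1)

series = np.arange(12, dtype=float)
reduced = paa(series, output_size=4)
# Windows [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11] have means 1, 4, 7, 10
print(reduced)
```

With a GAF image size of 4, a series of length 12 would first be reduced to 4 values in this way, and the 4x4 image would be built from those values.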
This is a question. My understanding is that WEASEL transforms the original continuous-valued features into a bag of patterns. How should classification be done after the transformation?
Hi,
Please add the following code to the documentation. It was not clear to me that the input is multiple time series in the form of a 2D array. I thought it was a plot image of the function, which turned out to be completely wrong. Each row of this 2D input is a different epoch/time series. Below is an example of a single time series (a sine wave with 126 values) and the resulting single recurrence plot.
import numpy as np
import matplotlib.pyplot as plt
from pyts.image import RecurrencePlot

# A single time series: a sine wave sampled on [0, 4*pi) with step 0.1
t = np.arange(0, 4 * np.pi, 0.1)  # start, stop, step
y = np.sin(t)
X = np.reshape(y, (1, y.size))  # 1 row because we have only one time series
# Recurrence plot transformation
rp = RecurrencePlot(threshold='point', percentage=20)
X_rp = rp.fit_transform(X)
# Top: the time series; bottom: its recurrence plot
plt.subplot(211)
plt.plot(y)
plt.subplot(212)
plt.imshow(X_rp[0], cmap='binary', origin='lower')
plt.show()
Does pyts library support the generation of unthresholded recurrence plots?
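Independently of what pyts offers, an unthresholded recurrence plot is the matrix of pairwise distances between the embedded trajectories of the series, before any binarization. A minimal NumPy sketch; unthresholded_recurrence_plot is a hypothetical name, and the Euclidean distance and the embedding parameters are assumptions made for illustration.

```python
import numpy as np

def unthresholded_recurrence_plot(series, dimension=1, time_delay=1):
    """Pairwise Euclidean distances between time-delay embedded trajectories."""
    n_traj = series.size - (dimension - 1) * time_delay
    # Row i holds (x[i], x[i + time_delay], ..., x[i + (dimension - 1) * time_delay])
    idx = np.arange(n_traj)[:, None] + time_delay * np.arange(dimension)[None, :]
    trajectories = series[idx]
    diff = trajectories[:, None, :] - trajectories[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

y = np.sin(np.arange(0, 4 * np.pi, 0.1))
rp_1d = unthresholded_recurrence_plot(y)
rp_3d = unthresholded_recurrence_plot(y, dimension=3, time_delay=2)
print(rp_1d.shape, rp_3d.shape)  # (126, 126) (122, 122)
```

Thresholding this matrix (for example, keeping distances below some value) gives the usual binary recurrence plot.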
There are two problems.
The first is a mismatch between the documentation (https://pyts.readthedocs.io/en/latest/), which tracks the latest commit, and the version on conda and pip, which is still 0.10 (while the latest is 0.11).
Maybe it is not a problem, but I think it would be useful for the docs to be in sync with the package.
The second problem is the setup.py of the repository.
At line 18 there is:
INSTALL_REQUIRES = ['numpy>=1.15.4'
'scipy>=1.3.0'
'scikit-learn>=0.22.1'
'joblib>=0.12'
'numba==0.46.0']
Some commas are missing, so running pip install -e . does not work.
It should simply be replaced with:
INSTALL_REQUIRES = ['numpy>=1.15.4',
'scipy>=1.3.0',
'scikit-learn>=0.22.1',
'joblib>=0.12',
'numba==0.46.0']
Thank you
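The reason the missing commas matter is that Python concatenates adjacent string literals, so the list ends up with a single, invalid requirement string. A short illustration with a shortened list:

```python
# Without commas, adjacent string literals are joined into one string
broken = ['numpy>=1.15.4'
          'scipy>=1.3.0'
          'scikit-learn>=0.22.1']

# With commas, each requirement is its own list element
fixed = ['numpy>=1.15.4',
         'scipy>=1.3.0',
         'scikit-learn>=0.22.1']

print(len(broken), broken[0])  # 1 numpy>=1.15.4scipy>=1.3.0scikit-learn>=0.22.1
print(len(fixed))              # 3
```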
I was trying this Python library on my Mac and noticed that the visualisation functions do not automatically show the plot unless I explicitly show it. I think it would be a nice feature to show the plot automatically when I use the visualisation features. It is probably as simple as adding plt.show() at the end.
Please forgive my poor English; it is not my native language.
I have read the source code and the Sakoe-Chiba band generation examples from tslearn and pyts. The way the Sakoe-Chiba band is generated seems to differ between pyts and tslearn, which leads to different results when comparing two sequences of different lengths. I have also read the source code of fastdtw (https://pypi.org/project/fastdtw/); when the radius parameter is set, the results from fastdtw and pyts also differ.
import numpy as np
from tslearn.metrics import dtw as ts_dtw
from pyts.metrics import dtw as py_dtw
from fastdtw import fastdtw
if __name__ == "__main__":
    np.random.seed(2020)
    seq_0 = np.random.randn(140)
    seq_1 = np.random.randn(50)

    # Experiment 1: Different DTW calculation (Consistent)
    # [INFO] DTW Calculation:
    # -- tslearn dtw: 8.26240
    # -- pyts dtw: 8.26240
    print("[INFO] DTW Calculation:")
    print("-- tslearn dtw: {:.5f}".format(ts_dtw(seq_0, seq_1)))
    print("-- pyts dtw: {:.5f}".format(py_dtw(seq_0, seq_1)))

    # Experiment 2: FastDTW calculation (Inconsistent)
    # [INFO] FastDTW Calculation:
    # -- FastDTW results: 8.67608
    # -- pyts FastDTW: 9.12243
    print("\n[INFO] FastDTW Calculation:")
    print("-- FastDTW results: {:.5f}".format(
        np.sqrt(fastdtw(seq_0, seq_1, radius=2, dist=lambda x, y: (x-y)**2)[0])))
    print("-- pyts FastDTW: {:.5f}".format(
        py_dtw(seq_0, seq_1, method="fast", options={"radius": 2})))

    # Experiment 3: Sakoe_Chiba calculation (Inconsistent)
    # [INFO] Sakoe_Chiba Calculation:
    # -- tslearn Sakoe_Chiba dtw: 8.26240
    # -- pyts Sakoe_Chiba dtw: 10.49161
    print("\n[INFO] Sakoe_Chiba Calculation:")
    print("-- tslearn Sakoe_Chiba dtw: {:.5f}".format(
        ts_dtw(seq_0, seq_1, sakoe_chiba_radius=5)))
    print("-- pyts Sakoe_Chiba dtw: {:.5f}".format(
        py_dtw(seq_0, seq_1, method="sakoechiba", options={"window_size": 5})))

    # Experiment 4: itakura calculation (consistent in this example; however, I haven't read the source code yet)
    # [INFO] itakura Calculation:
    # -- tslearn itakura dtw: 8.51087
    # -- pyts itakura dtw: 8.51087
    print("\n[INFO] itakura Calculation:")
    print("-- tslearn itakura dtw: {:.5f}".format(
        ts_dtw(seq_0, seq_1, itakura_max_slope=6)))
    print("-- pyts itakura dtw: {:.5f}".format(
        py_dtw(seq_0, seq_1, method="itakura", options={"max_slope": 6})))
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Numba 0.49.1
Pyts 0.11.0
tslearn: '0.4.1'
fastdtw: See pypi
Reading the docs, there does not appear to be any explanation of the colours and square values inside the MTF image. Do you think this info is worth adding?
pip install pyts
Failed building wheel for llvmlite
RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
Hi, thanks for this great library.
I am trying to create recurrence plots for a time series of geometric Brownian motion, but however I set the parameters in RecurrencePlot, I keep getting errors.
import numpy as np
from pyts.image import RecurrencePlot
from sklearn.preprocessing import MinMaxScaler
# create a univariate (one feature) time series
x = np.random.normal(0, 0.01, 10000).cumsum()
x = MinMaxScaler().fit_transform(x.reshape(-1, 1))
print(x.shape)
# (10000, 1)
rp = RecurrencePlot(dimension=20, threshold=1.)
rp.fit_transform(x)
# ValueError: If 'dimension' is an integer, it must be greater than or equal to 1 and lower than or equal to n_timestamps (got 20).
I have dimension as an integer, it is set to 20, so it is greater than or equal to 1. What I don't understand is how 20 isn't lower than n_timestamps, because I do not fully understand what n_timestamps is.
As far as I understand my data x, it is shaped (10000, 1), which is (n_samples, n_features), i.e. each row is a unique 'sample' ordered chronologically, and it only has one column (one 'feature') as the time series is univariate. In addition, the documentation for RecurrencePlot.fit_transform states that the input X must have shape [n_samples, n_features], which, as far as I understand, my data does have.
What am I doing wrong here? What is the difference between n_samples and n_timestamps? Thanks in advance
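The error message suggests that the second axis is being read as n_timestamps, so a (10000, 1) array looks like 10000 separate series of length 1. Assuming each row is meant to be one whole time series, the single series would need shape (1, 10000). A NumPy-only sketch of the two layouts:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 0.01, 10000).cumsum()

# Min-max scale the 1D series first, then choose the 2D layout
x = (x - x.min()) / (x.max() - x.min())

as_column = x.reshape(-1, 1)  # (10000, 1): read as 10000 series with 1 timestamp each
as_row = x.reshape(1, -1)     # (1, 10000): read as 1 series with 10000 timestamps

n_samples, n_timestamps = as_row.shape
print(as_column.shape, as_row.shape)  # (10000, 1) (1, 10000)
```

With the row layout, a dimension of 20 is no larger than n_timestamps. Note that a 10000-point series would produce a very large recurrence plot, so shortening or windowing the series first may be necessary.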
There is an error in the MTF code that results in a shape mismatch whenever it is run. I believe the repository is up to date and the error has been corrected there, where it seems to work fine. The MTF in the pip package, however, does not work and needs to be updated. Patching image.py locally works, but the package does not work out of the box.
Great package overall, thanks. :D