
johannfaouzi / pyts

1.7K stars · 25 watchers · 162 forks · 7.7 MB

A Python package for time series classification

Home Page: https://pyts.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
python classification machine-learning time-series-classification time-series-analysis time-series

pyts's People

Contributors

avisp, darigovresearch, hareyakana, johannfaouzi, kinow, lgtm-com[bot], lucasplagwitz, rth, svenbarray, tobcar


pyts's Issues

Inquiry about Window sliding in WEASEL

Description

Thank you for implementing this awesome algorithm!
I was working with your WEASEL code and noticed that the windowing is not overlapping. Do you mind if I ask why?

Steps/Code to Reproduce

windowed_view(X,12,12)[0]
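For context, a minimal sketch (not pyts's internal implementation) contrasting non-overlapping windows, as in the call above, with overlapping ones obtained by moving the window in steps smaller than its length:

import numpy as np

def sliding_windows(x, window_size, step):
    # Return a 2D array with one window of x per row, taken every `step` samples.
    starts = np.arange(0, len(x) - window_size + 1, step)
    return np.array([x[s:s + window_size] for s in starts])

x = np.arange(48)
non_overlapping = sliding_windows(x, window_size=12, step=12)  # 4 windows
overlapping = sliding_windows(x, window_size=12, step=1)       # 37 windows
print(non_overlapping.shape, overlapping.shape)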

Versions

MTF np.digitize error: bins must be monotonically increasing or decreasing

Hi,

I am creating MTF matrices from the Lightning2 time-series dataset using the pyts module. When entering a high number for the quantile bins (n_bins), the MTF transformation fails:

from scipy.io import arff
import pandas as pd
from pyts.image import GASF, GADF, MTF

lighttrain = arff.loadarff('Datasets/Lightning2/Lightning2_TRAIN.arff')
dftrain0 = pd.DataFrame(lighttrain[0])
dftrain = dftrain0.drop('target', axis=1) #taking away the class labels

matsize = 16
qsize = 40

dfmtftr = MTF(matsize,n_bins=qsize).fit_transform(dftrain)

Error in console:

File "/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 462, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/anaconda3/lib/python3.6/site-packages/pyts/image/image.py", line 310, in transform
for i in range(n_samples)])
File "/anaconda3/lib/python3.6/site-packages/pyts/image/image.py", line 310, in
for i in range(n_samples)])
ValueError: bins must be monotonically increasing or decreasing

Theoretically I should be able to sort the data into as many bins as I want, shouldn't I? I cannot see a reason in the image.py source code why the error should occur at n_bins=40 but not at n_bins=8. Are there specific boundaries for image_size and n_bins?
The source time-series has a length of 637.
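As a general illustration (and only a guess at the exact code path in pyts 0.7/0.8): quantile-based bin edges stop being strictly monotonic as soon as a series has too few distinct values for the requested number of bins, which is the condition that np.digitize, and the "At least two consecutive quantiles are equal" check in later pyts versions, complain about:

import numpy as np

x = np.repeat(np.arange(8.0), 2)                   # 16 values, only 8 distinct
edges = np.quantile(x, np.linspace(0, 1, 40 + 1))  # 41 edges for 40 bins
print(np.unique(edges).size)                       # < 41: duplicated edges
print(np.all(np.diff(edges) > 0))                  # False: not strictly increasing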

Error when importing GASF

Description


ImportError                               Traceback (most recent call last)
<ipython-input> in <module>
----> 1 from pyts.image import GASF, GADF, MTF

ImportError: cannot import name 'GASF' from 'pyts.image' (/home/jamazzon/anaconda3/envs/fastaiv3_m2/lib/python3.7/site-packages/pyts/image/__init__.py)

Steps/Code to Reproduce

from pyts.image import GASF, GADF, MTF

Versions

NumPy 1.16.2
SciPy 1.2.1
Scikit-Learn 0.20.3
Numba 0.43.1
Pyts 0.8.0
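For reference, these classes were renamed in later releases; assuming pyts >= 0.9, the equivalent imports are GramianAngularField (with method='summation' or 'difference') and MarkovTransitionField:

from pyts.image import GramianAngularField, MarkovTransitionField

gasf = GramianAngularField(method='summation')   # replaces GASF
gadf = GramianAngularField(method='difference')  # replaces GADF
mtf = MarkovTransitionField()                    # replaces MTF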

WEASEL Implementation

I don't have a bug or issue to report, just some questions. First, the word-bank size for the WEASEL transformation is limited to 26. Apparently this is because there are 26 letters in the English alphabet, but this seems like a rather arbitrary limitation. Why was it designed this way? Secondly, is it possible to serialize a WEASEL transformer? Suppose I have a large time series that takes a while to model; shouldn't there be a warm-start process? Serialization to str or JSON would be best, but even pickle would be nice. If it is possible, let me know!

Thanks for the great package!

Andrew
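Regarding serialization: pyts transformers follow the scikit-learn estimator API, so a fitted WEASEL object should at least be picklable (str/JSON export would need custom code). A minimal sketch, with an illustrative toy dataset and file name:

import pickle

import numpy as np
from pyts.transformation import WEASEL

rng = np.random.RandomState(0)
X, y = rng.randn(20, 100), rng.randint(2, size=20)

weasel = WEASEL(word_size=2, n_bins=2, window_sizes=[12, 36]).fit(X, y)

with open("weasel.pkl", "wb") as f:   # hypothetical file name
    pickle.dump(weasel, f)

with open("weasel.pkl", "rb") as f:
    weasel_loaded = pickle.load(f)

X_new = weasel_loaded.transform(X)    # reuse the fitted transformer without refitting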

Bin Edges Are all zero - Value Error

Description

I have data in the same format used for load_basic_motions(return_X_y=True).
But when I created my dataset, I had to pad some time series with zeros.
This made all the lengths the same, and I ended up with an X_train ndarray of shape (177, 12, 111) and a y_train of shape (177,).

When I run clf.fit I get a ValueError that says the following:

 At least two consecutive quantiles are equal. Consider trying with a smaller number of 
 bins or removing timestamps with low variation.

The bin edges all seem to be zero.

Is this because of the padding?

Steps/Code to Reproduce

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
transformer = WEASELMUSE()
logistic = LogisticRegression(random_state=1, max_iter=10000, solver='liblinear', multi_class='ovr')
clf = make_pipeline(transformer, logistic)
clf.fit(X_train, y_train)
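One thing worth checking (a hypothesis, since the real data isn't shown): any window that falls entirely inside the zero padding is constant, so all of its quantiles coincide, which is exactly what the error reports. A self-contained illustration:

import numpy as np

# A zero-padded series of length 111: windows lying in the padding are constant,
# so their quantile bin edges collapse to a single value.
series = np.concatenate([np.random.randn(60), np.zeros(51)])
window = series[70:94]                                  # entirely inside the padding
print(np.quantile(window, [0, 0.25, 0.5, 0.75, 1.0]))   # all zeros

Trimming the padding, or using strategy='uniform' as in another issue further down, may avoid this.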

Versions

NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0

WEASEL+MUSE with Samples of Different Lengths

If each sample is not the same length of n_timestamps, is it possible to perform WEASEL+MUSE on that dataset?

If I construct an array of (n_samples, n_features, n_timestamps) with n_timestamps that aren't equal, I get an exception from the validation of the array.

If I pad the samples with None or if I pad with some constant value, I get an exception (NaNs not allowed / quantiles equal).

Is there a solution/workaround to this problem, or am I chasing something that isn't allowed?

Thank you for a great package!

from pyts.multivariate.transformation import WEASELMUSE

# Create Fake Simple Dataset
# 3 Samples, 2 Features, Different n_timestamps
X = [[[3, 1, 0, 2, 2], [2, 2, 3, 3, 5]],
     [[0, 1, 0, 5, 2, 4], [3, 0, 0, 2, 4, 2]],
     [[1, 0, 1, 3, 5], [1, 3, 1, 2, 3]]]
y = [1, 0, 1]

transformer = WEASELMUSE(word_size=4, n_bins=2, window_sizes=[8],
                         chi2_threshold=15, sparse=False)
X_new = transformer.fit_transform(X, y)
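pyts itself requires equal-length arrays, so a common workaround (my suggestion, not an official pyts feature) is to resample every series to a shared length before building the 3D array, e.g. with linear interpolation:

import numpy as np

def resample(series, length):
    # Linearly interpolate a 1D sequence onto `length` evenly spaced points.
    series = np.asarray(series, dtype=float)
    old = np.linspace(0.0, 1.0, series.size)
    new = np.linspace(0.0, 1.0, length)
    return np.interp(new, old, series)

target_len = max(len(feature) for sample in X for feature in sample)
X_resampled = np.array([[resample(feature, target_len) for feature in sample]
                        for sample in X])
print(X_resampled.shape)   # (3, 2, 6) for the toy dataset above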

Release of version 0.8.0

Version 0.8.0 is available on PyPI. A lot of changes have been made to the package, unfortunately with no backward compatibility (but it was necessary). The package was significantly improved in terms of optimization, code cleanliness and documentation. Here is a summary of the changes:

  • No more Python 2 support

  • New package required: numba

  • Updated required versions of packages

  • Modification of the API:

    • quantization module merged in approximation and removed

    • bow module renamed bag_of_words

    • Fewer acronyms used for the names of the classes: if an algorithm has a name
      with three words or fewer, the whole name is used.

    • More preprocessing tools in preprocessing module

    • New module metrics with metrics specific to time series

  • Improved tests using pytest tools

  • Reworked documentation

  • Updated continuous integration scripts

  • More optimized code using numba

Dimensionality Reduction (approximation) along columns/time axis?

Hi,

I wonder why the approximation functions PAA and DFT are applied to the rows. In my opinion, based on what I found in the papers and the dissertation of Patrick Schäfer, they should be applied to the columns (along the time axis). Am I wrong?

For example the code below returns an error:

import numpy as np
import pyts.approximation as pya

x = np.random.randn(100,2)

paa = pya.PAA(window_size=10)
x_paa = paa.fit_transform(x)

print('Shape of x {}'.format(x.shape))
print('Shape of x_paa {}'.format(x_paa.shape))

ValueError: 'window_size' must be lower or equal than the size of each time series.

However, what I've been expecting is the following:

import numpy as np
import pyts.approximation as pya

x = np.random.randn(100,2)

paa = pya.PAA(window_size=10)
x_paa = paa.fit_transform(x)

print('Shape of x {}'.format(x.shape))
print('Shape of x_paa {}'.format(x_paa.shape))

Shape of x (100, 2)
Shape of x_paa (10, 2)

Regards,
legout
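For what it's worth, pyts treats each row as one time series, i.e. it expects the shape (n_samples, n_timestamps), so an array shaped (n_timestamps, n_channels) like the one above has to be transposed before (and, if desired, after) the transformation; a quick sketch:

import numpy as np
import pyts.approximation as pya

x = np.random.randn(100, 2)          # 100 timestamps, 2 channels

paa = pya.PAA(window_size=10)
x_paa = paa.fit_transform(x.T).T     # transpose so each channel is one "sample"

print('Shape of x {}'.format(x.shape))          # (100, 2)
print('Shape of x_paa {}'.format(x_paa.shape))  # (10, 2)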

Query about pyts 0.7.5

I am using pyts version 0.7.5 and GAF to convert time series data into images. When I use transform(self, X), the shape of X is defined as:
X : array-like, shape = [n_samples, n_features].

Here, what is the purpose of n_features?

Multivariate time series classification

Hi,

Great work with time series, congrats!

This is a question rather than an issue. I'm working on a multivariate time series classification problem; the dataset is in the following form:

feature 1   feature 2   ...   feature n   label     time_series_group
23.4        12.5        ...   1.34        Class 1   1
27.3        14.7        ...   2.46        Class 1   1
...         ...         ...   ...         Class 1   1
26.5        11.2        ...   2.32        Class 1   1
28.4        11.5        ...   1.54        Class 1   2
27.3        14.3        ...   2.09        Class 1   2
...         ...         ...   ...         Class 1   2
21.2        14.9        ...   3.34        Class 1   2
25.4        12.5        ...   1.34        Class 2   3
28.3        14.7        ...   2.46        Class 2   3
...         ...         ...   ...         Class 2   3
29.5        11.9        ...   1.35        Class 2   3
27.4        10.5        ...   1.54        Class 2   4
21.3        17.3        ...   2.09        Class 2   4
...         ...         ...   ...         Class 2   4
26.4        13.6        ...   2.47        Class 2   4

The dataset above isn't real; it is only an example. The time_series_group column identifies the group of rows that together represent one multivariate time series. For instance, assume that each group has the same number of rows m: the first m rows (identified by time_series_group = 1) contain the n features that represent Class 1. Each feature column within group 1 is a time series of that feature's behaviour, so Class 1 is described by n time series (one per feature).

Is there a way to solve this kind of problem with pyts? If not, do you have any thoughts about how to approach it?
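Sketching one way to go from that stacked format to the (n_samples, n_features, n_timestamps) array used by pyts.multivariate, assuming a pandas DataFrame with the columns shown above and equal-sized groups (the tiny DataFrame here is only a stand-in):

import numpy as np
import pandas as pd

# A stand-in for the table above: 2 features, 2 groups of 3 rows each.
df = pd.DataFrame({
    'feature 1': [23.4, 27.3, 26.5, 28.4, 27.3, 21.2],
    'feature 2': [12.5, 14.7, 11.2, 11.5, 14.3, 14.9],
    'label': ['Class 1'] * 6,
    'time_series_group': [1, 1, 1, 2, 2, 2],
})

feature_cols = [c for c in df.columns if c.startswith('feature')]
groups = df.groupby('time_series_group', sort=True)
X = np.stack([g[feature_cols].to_numpy().T for _, g in groups])  # (n_samples, n_features, n_timestamps)
y = groups['label'].first().to_numpy()                           # one label per group
print(X.shape, y.shape)   # (2, 2, 3) (2,)

X and y can then be fed to the estimators in pyts.multivariate, e.g. WEASELMUSE, provided every group has the same number of rows.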

plot_ssa.py example fails on 0.7.0

ValueError: could not broadcast input array from shape (48) into shape (1)

(both from the pip package and git)

Very useful toolbox, thanks.

no negative values

Greetings. I've been trying to use your DTW library to shift two series (DEPTH in this case) relative to each other.

A value is generated, however it's always positive, and in my case I need it to specify whether the shift is up (negative) or down (positive) for each barrel.

https://github.com/sudomaze/core-dtw

Please guide me to the correct way.

Error executing MTF example

Hi,

I was running the Markov Transition Field example and got the following error:

X_mtf = mtf.transform(X)
D:\Anaconda3\envs\tf-gpu4\lib\site-packages\pyts\image\image.py:321: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
MTF[np.meshgrid(list_values[i], list_values[j])] = MTM[i, j]
Traceback (most recent call last):

File "", line 1, in
X_mtf = mtf.transform(X)

File "D:\Anaconda3\envs\tf-gpu4\lib\site-packages\pyts\image\image.py", line 301, in transform
remainder)

File "D:\Anaconda3\envs\tf-gpu4\lib\site-packages\numpy\lib\shape_base.py", line 357, in apply_along_axis
res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))

File "D:\Anaconda3\envs\tf-gpu4\lib\site-packages\pyts\image\image.py", line 336, in _mtf
np.arange(start[j], end[j])].mean()

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (5,)

The GASF and GADF examples worked fine. Kindly assist.

Regards,
Avi

Feature request: Handle overlapping samples in shapelet methods

A common approach to constructing a sample for time series classification is to take one long time series and break it into smaller, overlapping series which each have their own label. However, if you have overlapping sequences, the ShapeletTransform and LearningShapelets algorithms in pyts return the same shapelets, since these will occur in multiple samples. Setting remove_similar=True in ShapeletTransform does not resolve this since it only excludes similar shapelets "taken from the same time series". It would be great if these algorithms considered only a single instance of a shapelet that is shared across multiple time series samples. At the moment, I manually remove identical shapelets from the output (so that the number of shapelets I actually end up with is < n_shapelets).
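For reference, the manual deduplication mentioned at the end can be scripted; a rough sketch, assuming the fitted transformer exposes the extracted shapelets in a shapelets_ attribute (shapelets may have different lengths, hence the tuple keys):

import numpy as np

def unique_shapelets(shapelets):
    # Drop exact duplicates from an iterable of 1D shapelet arrays.
    seen, kept = set(), []
    for s in shapelets:
        key = tuple(np.round(np.asarray(s, dtype=float), 12))
        if key not in seen:
            seen.add(key)
            kept.append(np.asarray(s))
    return kept

# After st.fit(X, y):
# shapelets = unique_shapelets(st.shapelets_)
# print(len(shapelets), 'unique shapelets out of', len(st.shapelets_))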

I need help understanding the terminology in the docs

The documentation commonly uses this tuple: (samples, timestamps). That doesn't make any sense in my brain as I've always thought of those being the same thing. If I'm sampling something, I'm reading a sensor value periodically. I could create a timestamp for that sample, but I also have the sensor's value at that time. My input data is (samples, sensor values). It has one row for each time I read the sensors, and a column for the value of each sensor. I think this is called the "wide" data format. Is pyts compatible with the wide data format? Or is there an easy way to transform my data into something compatible with pyts?
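A short sketch of the mapping, assuming "wide" data with one row per time step and one column per sensor: pyts wants the opposite orientation, one row per series, so each sensor column becomes one row, and several labeled recordings become the samples of a multivariate dataset:

import numpy as np

wide = np.random.randn(500, 3)   # 500 time steps (rows), 3 sensors (columns)

# Univariate view: each sensor becomes one series of 500 timestamps.
X_univariate = wide.T            # (3, 500) = (n_samples, n_timestamps)

# For classification you normally have many labeled recordings; stacking them
# gives the multivariate shape (n_samples, n_features, n_timestamps).
recordings = [np.random.randn(500, 3) for _ in range(10)]
X_multivariate = np.stack([r.T for r in recordings])   # (10, 3, 500)
print(X_univariate.shape, X_multivariate.shape)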

will it work for multivariate time series prediction/classification

Great code, thanks!
Could you clarify: will it work for multivariate time series prediction, both regression and classification,

1. where all values are continuous values?

2. or where the values are a mixture of continuous and categorical values? For example, 2 dimensions have continuous values and 3 dimensions are categorical values:

    color    weight   gender   height   age
1   black    56       m        160      34
2   white    77       f        170      54
3   yellow   87       m        167      43
4   white    55       m        198      72
5   white    88       f        176      32

Relationship with existing scikit-learn IR methods

Just a few thoughts from quickly running the examples, regarding the relationship of some of these estimators to those of scikit-learn.

Maybe mentioning some of it in the docstrings could be useful for users familiar with scikit-learn,

  • pyts.transformer.StandardScaler is similar to sklearn.preprocessing.StandardScaler except that the scaling is done along axis=1 instead of 0, (i.e. equivalent of using sklearn.preprocessing.scale(X, axis=1)) with an additional "empirical variance" added during the normalization.

  • For VSM(window_size=4, numerosity_reduction=False) + SAXVSMClassifier, I wonder if it's equivalent to running
    CountVectorizer(analyzer='char', ngram_range=(4, 4)) on the SAX output, followed by class-wise normalization with TfidfTransformer and finally applying the NearestCentroid(metric='cosine'). I've not read the SAX-VSM paper in detail, so I might be missing something.

In any case these are interesting analogies, and something that might be used for unit tests #3 (if true). Also, using scikit-learn's TF-IDF methods, which have been optimized for a while, will probably be computationally faster (in particular they use sparse arrays instead of lists).

I know you don't plan to actively develop this package at the moment, so I'm just writing this down for future reference.

ImportError: cannot import name 'RecurrencePlots' from 'pyts.image'

Description

I got an error when importing this library.

Steps/Code to Reproduce

I just used pip3 install pyts to install this library, and I get the following error:

$ python3
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyts.image import RecurrencePlots
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'RecurrencePlots' from 'pyts.image' (/usr/local/lib/python3.8/site-packages/pyts/image/__init__.py)
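The class was renamed along the way; assuming pyts 0.9 or later (0.11.0 is listed below), the import is the singular RecurrencePlot, as used elsewhere in these issues:

from pyts.image import RecurrencePlot

rp = RecurrencePlot(threshold='point', percentage=20)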

Versions

NumPy 1.18.1
SciPy 1.5.2
Scikit-Learn 0.23.2
Numba 0.51.2
Pyts 0.11.0

ZeroDivisionError in LearningShapelets

Description

I have a ZeroDivisionError trying to fit LearningShapelets
data.txt

Steps/Code to Reproduce

import pandas as pd
from pyts.classification.learning_shapelets import LearningShapelets

train_data = pd.read_csv('data.txt', index_col=0)
y_train = train_data['class']
x_train = train_data.drop('class', axis=1)
pls = LearningShapelets()
pls.fit(x_train, y_train)

Versions

NumPy 1.18.5
SciPy 1.4.1
Scikit-Learn 0.22.2
Numba 0.51.2
Pyts 0.11.0

Release of version 0.9.0

Version 0.9.0 is available on PyPI (and on Anaconda Cloud via the conda-forge channel for the first time!). Brief summary of the changes:

  • Add datasets module with dataset loading utilities
  • Add multivariate module with utilities for multivariate time series
  • Revamp the tests using pytest.mark.parametrize
  • Add an Examples section in most of the public functions and classes
  • Require version 1.3.0 of scipy: this is required to load ARFF files with relational attributes using scipy.io.arff.loadarff

ImportError: cannot import name 'PowerTransformer'

Description

Steps/Code to Reproduce

from pyts.image import mtf

Traceback (most recent call last):
  File "C:\Users\BBD\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-33-02178b70bcc1>", line 1, in <module>
    from pyts.image import mtf
  File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\image\__init__.py", line 6, in <module>
    from .gaf import GramianAngularField
  File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\image\gaf.py", line 8, in <module>
    from ..approximation import PiecewiseAggregateApproximation
  File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\approximation\__init__.py", line 4, in <module>
    from .sax import SymbolicAggregateApproximation
  File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\approximation\sax.py", line 6, in <module>
    from ..preprocessing import KBinsDiscretizer
  File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\preprocessing\__init__.py", line 4, in <module>
    from .transformer import PowerTransformer, QuantileTransformer
  File "E:\pycharm\PyCharm 2018.3.2\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\BBD\Anaconda3\lib\site-packages\pyts\preprocessing\transformer.py", line 4, in <module>
    from sklearn.preprocessing import PowerTransformer as SklearnPowerTransformer
ImportError: cannot import name 'PowerTransformer'



Versions

Range of values for images

Just wondering, are the image pixel values in the range [0, 255] for all image encodings (GAF, MTF, RecurrencePlot)?

Question on whether it is possible to optimize many Recurrence Plots production

So recurrence plot pyts.image.RecurrencePlot works really nicely, but can be a bit slow. I want to write a function that would make a movie for threshold='point' and percentage in range 0 to 100. I can do it trivially by repeating the plot independently 100 times, but I presume that (theoretically) some part of the work can be re-used if percentage is changed but data stays the same. Can this be done?
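Conceptually, yes: with threshold='point' the binary plot is the pairwise distance matrix thresholded at a percentile, so the distances can be computed once and re-thresholded for every frame. A hand-rolled sketch for the default dimension=1 (an approximation of the idea, not pyts's API):

import numpy as np

x = np.random.randn(300)                      # one time series
D = np.abs(x[:, None] - x[None, :])           # pairwise distances, computed once

frames = []
for percentage in range(0, 101, 5):
    threshold = np.percentile(D, percentage)
    frames.append((D <= threshold).astype(np.uint8))   # one recurrence plot per frame
print(len(frames), frames[0].shape)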

Issues getting WEASEL transform of just one single time series

I'm trying to get a WEASEL transform of just one single time series and am running into issues, see below.

Please advise.

Thanks.

import numpy as np
import matplotlib.pyplot as plt

from pyts.transformation import WEASEL

# Parameters
n_samples, n_timestamps = 1, 100
n_classes = 1

# Toy dataset
rng = np.random.RandomState(41)
X = rng.randn(n_samples, n_timestamps)
y = rng.randint(n_classes, size=n_samples)

# WEASEL transformation
weasel = WEASEL(word_size = 2, n_bins = 2, window_sizes=[12, 36])
# X_weasel = weasel.fit_transform(X, y).toarray()
X_weasel = weasel.fit_transform(X, y)
# X_weasel = weasel.fit_transform(np.array(X), np.array(y)).toarray()

# Visualize the transformation for the first time series
plt.figure(figsize=(12, 8))
vocabulary_length = len(weasel.vocabulary_)
width = 0.3
plt.bar(np.arange(vocabulary_length) - width / 2, X_weasel[0],
        width=width, label='First time series')
plt.xticks(np.arange(vocabulary_length),
           np.vectorize(weasel.vocabulary_.get)(np.arange(X_weasel[0].size)),
           fontsize=12, rotation=60)
plt.yticks(np.arange(np.max(X_weasel[:2] + 1)), fontsize=12)
plt.xlabel("Words", fontsize=18)
plt.ylabel("Frequencies", fontsize=18)
plt.title("WEASEL transformation", fontsize=20)
plt.legend(loc='best')
plt.show()


ValueError Traceback (most recent call last)
in
13 weasel = WEASEL(word_size = 2, n_bins = 2, window_sizes=[12, 36])
14 # X_weasel = weasel.fit_transform(X, y).toarray()
---> 15 X_weasel = weasel.fit_transform(X, y)
16 # X_weasel = weasel.fit_transform(np.array(X), np.array(y)).toarray()
17

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/transformation/weasel.py in fit_transform(self, X, y)
258 )
259 y_repeated = np.repeat(y, n_windows)
--> 260 X_sfa = sfa.fit_transform(X_windowed, y_repeated)
261
262 X_word = np.asarray([''.join(X_sfa[i])

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/sfa.py in fit_transform(self, X, y)
157 )
158 self.pipeline = Pipeline([('dft', dft), ('mcb', mcb)])
--> 159 X_sfa = self.pipeline.fit_transform(X, y)
160 self.support_ = self.pipeline.named_steps['dft'].support_
161 self.bin_edges_ = self.pipeline.named_steps['mcb'].bin_edges_

~/anaconda3/envs/tf36/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
391 return Xt
392 if hasattr(last_step, 'fit_transform'):
--> 393 return last_step.fit_transform(Xt, y, **fit_params)
394 else:
395 return last_step.fit(Xt, y, **fit_params).transform(Xt)

~/anaconda3/envs/tf36/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
554 else:
555 # fit method of arity 2 (supervised transformation)
--> 556 return self.fit(X, y, **fit_params).transform(X)
557
558

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in fit(self, X, y)
113 self._check_constant(X)
114 self.bin_edges_ = self._compute_bins(
--> 115 X, y, n_timestamps, self.n_bins, self.strategy)
116 return self
117

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in _compute_bins(self, X, y, n_timestamps, n_bins, strategy)
207 )
208 else:
--> 209 bins_edges = self._entropy_bins(X, y, n_timestamps, n_bins)
210 return bins_edges
211

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in _entropy_bins(self, X, y, n_timestamps, n_bins)
221 "The number of bins is too high for feature {0}. "
222 "Try with a smaller number of bins or remove "
--> 223 "this feature.".format(i)
224 )
225 bins[i] = threshold

ValueError: The number of bins is too high for feature 0. Try with a smaller number of bins or remove this feature.
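For context, the traceback ends in the supervised, entropy-based binning: with a single sample belonging to a single class there is no split to find, hence the "number of bins is too high" error. A minimal sketch with a larger labeled toy set (same parameters otherwise) that should go through:

import numpy as np
from pyts.transformation import WEASEL

rng = np.random.RandomState(41)
n_samples, n_timestamps, n_classes = 40, 100, 2
X = rng.randn(n_samples, n_timestamps)
y = rng.randint(n_classes, size=n_samples)

weasel = WEASEL(word_size=2, n_bins=2, window_sizes=[12, 36])
X_weasel = weasel.fit_transform(X, y)
print(X_weasel.shape)   # (40, n_selected_words)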

samples with different length

If I understand the WEASEL+MUSE algorithm correctly, it should be possible to use it with samples of different lengths.
This is currently not possible with the API of the WEASELMUSE class, which expects a 3D array of shape (n_samples, n_features, n_timestamps), since a numpy array has the same shape for all samples.

I tried to pad the time series of all samples to the length of the longest sample with NaN values, but the input checks reject NaN values.
Is there a way to use samples of different lengths?

VisibleDeprecationWarning

Description

Hi, I am getting many of these warnings with ShapeletTransform:

D:\Anaconda3\envs\tipjar\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)

Steps/Code to Reproduce

st = ShapeletTransform(n_jobs=cpus)
features = st.fit_transform(X_trn_seg, y_trn_seg)

Versions

NumPy 1.19.1
SciPy 1.4.1
Scikit-Learn 0.23.1
Numba 0.49.1
Pyts 0.11.0

Reproducing results from WEASEL paper

Hi --

Do you have an example of using WEASEL to get results close to those reported in the paper? I'm playing around w/ using WEASEL as a featurizer on some of the UCR datasets, but my results aren't nearly as good as in the paper -- I'm guessing I'm not using the right hyperparameters. Any ideas?

Thanks!

Tuple 'shapelets' length must be smaller than 1000.

Description

Hi, I am getting this error message

st = ShapeletTransform(n_jobs=cpus)
st.fit(X_trn_seg, y_trn_seg)
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 253, in __call__
    for func, args, kwargs in self.items]
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 253, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 522, in _fit_one_time_series
    X, window_sizes, shapelets, lengths, fit=True)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 115, in _derive_all_distances
    X, n_samples, n_timestamps, window_sizes, shapelets, lengths
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/numba/core/dispatcher.py", line 404, in _compile_for_args
    error_rewrite(e, 'unsupported_error')
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/numba/core/dispatcher.py", line 344, in error_rewrite
    reraise(type(e), e, None)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/numba/core/utils.py", line 80, in reraise
    raise value.with_traceback(tb)
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure IR is legal prior to lowering)
Tuple 'shapelets' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.

File "../anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 71:
@njit()
def _derive_all_squared_distances_fit(
^

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pipeline.py", line 25, in <module>
    X_trn, st = shaplet(X_trn_seg, y_trn_seg, X_trn)  # Shapelet features
  File "/home/jmrichardson/tipjar/tipjar/prepare/shapelet.py", line 14, in shaplet
    st.fit(X_trn_seg, y_trn_seg)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 304, in fit
    window_steps, n_jobs, rng) = self._check_params(X, y)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 499, in _check_params
    window_sizes = self._auto_length_computation(X, y, rng, n_jobs)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 623, in _auto_length_computation
    self.remove_similar, n_jobs, rng
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 570, in _fit
    for i in range(n_samples))
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 1042, in __call__
    self.retrieve()
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/home/jmrichardson/anaconda3/envs/tipjar/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure IR is legal prior to lowering)
Tuple 'shapelets' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.

File "../anaconda3/envs/tipjar/lib/python3.7/site-packages/pyts/transformation/shapelet_transform.py", line 71:
@njit()
def _derive_all_squared_distances_fit(
^

Versions

NumPy 1.19.2
SciPy 1.4.1
Scikit-Learn 0.23.1
Numba 0.49.1
Pyts 0.12.dev0

WEASEL+MUSE large number of features

Description

When using WEASELMUSE for multivariate time series classification, the transformer produces a very large number of features (650,000). Also, the counts in the histogram are sometimes zero for some examples. Is this expected?

Steps/Code to Reproduce

I used my own dataset; the resulting X_weasel was an ndarray of size 1500 x 650000.
The 1500 makes sense as this is the number of examples I have, but the 650000 seems large.
I used the code below. Also, when using the same code on the basic motions example dataset I get similar results: a large number of features and some examples that are all zeros, so if I plot the histogram there is nothing to plot.

transformer = WEASELMUSE(strategy='uniform', word_size=4, window_sizes=np.arange(5, 70), sparse=False)
X_weasel = transformer.fit_transform(X_train, y_train)

Versions

NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0

MTF some arrays with no constant error!

The MTF example ( https://pyts.readthedocs.io/en/latest/auto_examples/image/plot_mtf.html ) ran on my machine, so I know the setup is complete. But for some reason I am getting this error when I pass my 2-D array:

user@ubuntu:~/Desktop/pyts_ex$ python prog.py
Traceback (most recent call last):
File "prog.py", line 61, in
x_train_np_new = mtf.fit_transform(x_train_np)
File "/home/user/.local/lib/python2.7/site-packages/sklearn/base.py", line 464, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/usr/local/lib/python2.7/dist-packages/pyts/image/mtf.py", line 142, in transform
X_binned = discretizer.fit_transform(X)
File "/home/user/.local/lib/python2.7/site-packages/sklearn/base.py", line 464, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/usr/local/lib/python2.7/dist-packages/pyts/preprocessing/discretizer.py", line 119, in transform
self._check_constant(X)
File "/usr/local/lib/python2.7/dist-packages/pyts/preprocessing/discretizer.py", line 140, in _check_constant
raise ValueError("At least one sample is constant.")
ValueError: At least one sample is constant.

My interpretation of this error is that pyts requires some kind of constant, which seems to be missing from my provided array. Any thoughts on how to fix this? Thanks.
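Reading the message literally, it means the opposite: at least one row of the input is constant (all of its values are equal), which often happens with zero-padded or flat segments. A quick, generic numpy check (not a pyts function) to find the offending rows:

import numpy as np

# x_train_np has shape (n_samples, n_timestamps); constant rows have zero range.
constant_rows = np.where(np.ptp(x_train_np, axis=1) == 0)[0]
print(constant_rows)   # indices of the samples that MTF cannot discretize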

Inverse transform for dimensionality reduction algos

Description

Hello! Is there a way to inverse_transform any of the 'approximation' methods? I have a dataset of 20,000 simple 1D time series, each only 30 units long (i.e., [20,000 x 30]), and I want to reduce this to say 5 (i.e. to [20,000 x 5]). So far I have been using PCA and kernel PCA, neither of which are meant for time series work, and I would like to compare performance with the various algorithms in this package, but unfortunately I did not find a way to inverse-transform the reduced time series.

Steps/Code to Reproduce

<< your code here >>

Versions

Questing regarding loading UEA public datasets

Hi, first of all, I really appreciate this wonderful library for processing time-series related issues. I am using pyts to load UEA datasets, and I found that when I load a binary-class dataset the loaded labels are not binary. After debugging, I guess the issue might be with the line provided below.

y.append(X_data[i][1])

I think the last element, X_data[i][-1], should be the label, either -1 or 1, instead of X_data[i][1]. Moreover, X should drop the last column, which stands for the labels.

I am not sure whether my interpretation is correct. I look forward to hearing from you, and I hope this powerful tool becomes better and better. Thanks so much. :)

Unit tests

Very nice package. I was just wondering if there were any plans to add unit tests ?

Generate multiple image from multivariate time series

Description

I was wondering, is fitted.shape equal to (6, 6, 6) because it fitted and transformed for each feature and generated an image for each one, so I have 6 6x6 images?

Steps/Code to Reproduce

This code shows what I mean. fitted.shape is (6, 6, 6).

import numpy
from pyts.image import GramianAngularField

gasf = GramianAngularField(image_size=6, method='summation')
cueSamples = numpy.zeros((6, 314)) # shape = (6, 314)

fitted = gasf.fit_transform(cueSamples)
fitted.shape # (6, 6, 6)

Versions

NumPy 1.20.2
SciPy 1.6.2
Scikit-Learn 0.24.1
Numba 0.53.1
Pyts 0.11.0

Creating a sequence of GAF's from a timeseries

I'm looking to train a Connectionist Temporal Classification (CTC) classifier. The input is a sequence of tensors of length N and the output a sequence of length M, M<N. I want to use a Gramian Angular Field to encode the input sequence.

From what I understand, pyts' Gramian Angular Field encodes the entire input to a single output. So given a series of 1x1000, where 1 is the batch dimension and 1000 is the series length, I get back a single tensor 1x32x32; what I want is Bx32x32, where B is the number of windows.

Is there a way of doing this using pyts? I'm guessing I could just reshape the input from 1x1000 to say 10x100 but is there a transform which does this, perhaps with overlap etc?
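A sketch of the manual route (window and step sizes are arbitrary choices here): cut the long series into possibly overlapping windows first, then feed all windows to the GAF transformer at once, so the output is (B, 32, 32):

import numpy as np
from pyts.image import GramianAngularField

series = np.random.randn(1000)     # the 1x1000 input, flattened
window_size, step = 100, 50        # step < window_size gives overlapping windows

starts = np.arange(0, series.size - window_size + 1, step)
windows = np.stack([series[s:s + window_size] for s in starts])   # (B, 100)

gaf = GramianAngularField(image_size=32, method='summation')
images = gaf.fit_transform(windows)   # (B, 32, 32)
print(images.shape)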

Parameter 'overlapping' in GramianAngularField clarification

This is a question, and I hope it is ok to ask it here.
While looking at the pyts.image.gaf module I've come across the parameter 'overlapping' that says "If True, reduce the size of each time series using PAA with possible overlapping windows."

However, I am not clear what effect does this parameter have on GAF image calculation? By how much does it reduce the size of the time series using Piecewise aggregate approximation, and also how does it determine possible overlapping windows (what are those windows)?

Also, in the GramianAngularField example (https://pyts.readthedocs.io/en/latest/auto_examples/image/plot_gaf.html) I see that fit_transform does not pass the y values; or are they encoded in the variable X somehow?

How do you use weasel to classify?

This is a question. My understanding is that WEASEL transforms the original continuous-valued features into a bag of patterns; how do I do classification after the transformation?
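The usual recipe, also visible in other issues on this page, is to treat the WEASEL output as a feature matrix and feed it to any scikit-learn classifier, for example logistic regression; a sketch on toy data:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from pyts.transformation import WEASEL

rng = np.random.RandomState(0)
X, y = rng.randn(60, 150), rng.randint(2, size=60)   # toy data

clf = make_pipeline(WEASEL(word_size=2, n_bins=2, window_sizes=[12, 36]),
                    LogisticRegression(max_iter=10000))
clf.fit(X, y)
print(clf.score(X, y))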

Recurrence plot example

Hi,

Please add the following code to the documentation. For me it was not clear that the input is multiple time series in the form of a 2D array. I thought it was a plot image of the function, which turned out to be completely false. It is indeed a different epoch/time series for each line of this 2D ... let's call it "input". Below is an example of a single time series (the output of a sin with 126 values) and the resulting single recurrence plot.

import numpy as np
import matplotlib.pyplot as plt
import datetime
import sys


from pyts.image import RecurrencePlot
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas


sin_func = np.arange(0,4*np.pi,0.1)   # start,stop,step
y = np.sin(sin_func)

X = np.reshape(y,(1, y.size)) #1 because we have only one input

# Recurrence plot transformation
rp = RecurrencePlot(threshold='point', percentage=20)
X_rp = rp.fit_transform(X)

plt.subplot(211)
plt.plot(y)
plt.subplot(212)
plt.imshow(X_rp[0],  cmap='binary', origin='lower')

Setup.py missing comma

There are 2 problems.
The first one is the misalignment of the documentation (https://pyts.readthedocs.io/en/latest/), which points to the latest commit, while the version on conda and pip is still 0.10 (the latest is 0.11).
Maybe it is not a problem, but I think it would be useful if the docs were in sync with the package.

The second problem is the setup.py of the repository.
At line 18 there is:

INSTALL_REQUIRES = ['numpy>=1.15.4'
                    'scipy>=1.3.0'
                    'scikit-learn>=0.22.1'
                    'joblib>=0.12'
                    'numba==0.46.0']

It is missing some commas, and as a result pip install -e . doesn't work.

It must be replaced simply with:

INSTALL_REQUIRES = ['numpy>=1.15.4',
                    'scipy>=1.3.0',
                    'scikit-learn>=0.22.1',
                    'joblib>=0.12',
                    'numba==0.46.0']

Thank you

plt.show() for visualisation

I was trying this Python library on my Mac and noticed that the visualisation functions from this library do not automatically show the plot unless I explicitly show it. I think it would be a nice feature if the plot were shown automatically when I use the visualisation features. It is probably as simple as adding plt.show() at the end.

Inconsistent calculation results between tslearn, pyts and fastdtw

Description

Please forgive my poor English; it is not my native language.

I have read the source code and the Sakoe-Chiba band generation examples from tslearn and pyts. The way the Sakoe-Chiba band is generated seems to differ between pyts and tslearn, which leads to different results when comparing two sequences of different lengths. I have also read the source code of fastdtw (https://pypi.org/project/fastdtw/): when the radius parameter differs, the results of fastdtw and pyts also vary.

Codes

import numpy as np
from tslearn.metrics import dtw as ts_dtw
from pyts.metrics import dtw as py_dtw
from fastdtw import fastdtw

if __name__ == "__main__":
    np.random.seed(2020)
    seq_0 = np.random.randn(140)
    seq_1 = np.random.randn(50)

    # Experiment 1: Different DTW caculation(Consistent)
    # [INFO] DTW Calculation:
    # -- tslearn dtw: 8.26240
    # -- pyts dtw: 8.26240
    print("[INFO] DTW Calculation:")
    print("-- tslearn dtw: {:.5f}".format(ts_dtw(seq_0, seq_1)))
    print("-- pyts dtw: {:.5f}".format(py_dtw(seq_0, seq_1)))

    # Experiment 2: FastDTW calculation(Inconsistent)
    # [INFO] FastDTW Calculation:
    # -- FastDTW results: 8.67608
    # -- pyts FastDTW: 9.12243
    print("\n[INFO] FastDTW Calculation:")
    print("-- FastDTW results: {:.5f}".format(
        np.sqrt(fastdtw(seq_0, seq_1, radius=2, dist=lambda x, y: (x-y)**2)[0])))
    print("-- pyts FastDTW: {:.5f}".format(
        py_dtw(seq_0, seq_1, method="fast", options={"radius": 2})))

    # Experiment 3: Sakoe_Chiba calculation(Inconsistent)
    # [INFO] Sakoe_Chiba Calculation:
    # -- tslearn Sakoe_Chiba dtw: 8.26240
    # -- pyts Sakoe_Chiba dtw: 10.49161
    print("\n[INFO] Sakoe_Chiba Calculation:")
    print("-- tslearn Sakoe_Chiba dtw: {:.5f}".format(
        ts_dtw(seq_0, seq_1, sakoe_chiba_radius=5)))
    print("-- pyts Sakoe_Chiba dtw: {:.5f}".format(
        py_dtw(seq_0, seq_1, method="sakoechiba",  options={"window_size": 5})))

    # Experiment 4: itakura calculation(In this example, they are consistent, however, I haven't read the source code yet)
    # [INFO] itakura Calculation:
    # -- tslearn itakura dtw: 8.51087
    # -- pyts itakura dtw: 8.51087
    print("\n[INFO] itakura Calculation:")
    print("-- tslearn itakura dtw: {:.5f}".format(
        ts_dtw(seq_0, seq_1, itakura_max_slope=6)))
    print("-- pyts itakura dtw: {:.5f}".format(
        py_dtw(seq_0, seq_1, method="itakura",  options={"max_slope": 6})))

Versions

NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Numba 0.49.1
Pyts 0.11.0
tslearn: '0.4.1'
fastdtw: See pypi

Error with llvmlite

pip install pyts

Failed building wheel for llvmlite

RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config

Ambiguity between `n_samples` and `n_timestamps`

Hi, thanks for this great library.

I am trying to create recurrence plots for a time series of geometric brownian motion, but however I try to set the parameters in RecurrencePlot, I keep getting errors.

import numpy as np
from pyts.image import RecurrencePlot
from sklearn.preprocessing import MinMaxScaler

# create a univariate (one feature) time series
x = np.random.normal(0, 0.01, 10000).cumsum()
x = MinMaxScaler().fit_transform(x.reshape(-1, 1))
print(x.shape)
>>> (10000, 1)

rp = RecurrencePlot(dimension=20, threshold=1.)
rp.fit_transform(x)
>>> ValueError: If 'dimension' is an integer, it must be greater than or equal to 1 and lower than or equal to n_timestamps (got 20).

I have dimension as an integer, it is set to 20, so it is greater than or equal to 1. What I don't understand is how 20 isn't lower than n_timestamps, because I do not fully understand what n_timestamps is.

As far as I understand my data x, it is shaped (10000, 1) which is (n_samples, n_features) i.e. each row is a unique 'sample' ordered chronologically, and it only has one column (one 'feature') as the time series is univariate. In addition, the documentation for RecurrencePlot.fit_transform states that the input X must have shape [n_samples, n_features], which as far as I understand, my data does have that shape.

What am I doing wrong here? What is the difference between n_samples and n_timestamps? Thanks in advance
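In pyts, each row is one sample and n_timestamps is the length of that row, so a single univariate series of length 10000 should be passed as one row of shape (1, 10000) rather than as a column; a sketch of the fix:

import numpy as np
from pyts.image import RecurrencePlot

x = np.random.normal(0, 0.01, 10000).cumsum()

X = x.reshape(1, -1)                          # (1, 10000): one sample, 10000 timestamps
rp = RecurrencePlot(dimension=20, threshold=1.)
X_rp = rp.fit_transform(X)                    # note: one roughly 10000 x 10000 plot, memory-hungry
print(X_rp.shape)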

The pip package needs to be updated

There is an error in the code corresponding to MTF that results in a shape mismatch whenever it is run. I believe that the repository is up to date, the error has been corrected there, and it seems to work fine. The MTF in the pip package, however, does not work and needs to be updated. Changing image.py manually works just fine, but it does not work right out of the box.
Great package overall, thanks. :D
