Comments (5)
Hi,
First of all, I would like to mention that variable length time series are unfortunately badly supported in this library for the moment. The reasons for this are twofold: (i) algorithms introduced in their original papers were rarely meant to deal with variable length time series (because of the lack of such data sets in the UCR Time Series Classification repository) and I wanted to implement the algorithms as they were described in the papers, and (ii) it's obviously easier and more efficient to work with fixed length time series using NumPy arrays. Therefore, padding shorter time series with a fixed value is likely to introduce some issues.
More specifically on the WEASEL algorithm, the first steps are:
- Extracting non-overlapping subsequences of each time series.
- Applying the SymbolicFourierApproximation algorithm on these subsequences, which consists of two steps:
i. Extracting some discrete Fourier coefficients for these subsequences.
ii. Discretizing (i.e. binning) these Fourier coefficients.
An error is raised when two back-to-back bin edges are equal, because in this case a bin is empty ([a, a) is an empty interval for any real number a). It would be possible to simply remove this bin, but in this case the number of bins would be smaller for this feature, which would be an issue.
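The empty-bin situation can be reproduced with plain NumPy. This is only a sketch: the values and edges are made up for illustration, and `np.digitize` stands in for the library's own binning code, which may differ.

```python
import numpy as np

# Hypothetical values of one feature (e.g. one Fourier coefficient).
values = np.array([-1.2, -0.2, 0.0, 0.0, 0.0, 0.0, 0.3, 2.5])

# Three inner edges define four bins:
# (-inf, -0.5), [-0.5, 0), [0, 0), [0, +inf).
# The two equal edges bound the empty interval [0, 0).
edges = np.array([-0.5, 0.0, 0.0])

# For each value, np.digitize returns the index of the bin it falls into.
bin_index = np.digitize(values, edges)

# Count how many values land in each of the four bins.
counts = np.bincount(bin_index, minlength=len(edges) + 1)
print(counts)  # the bin with index 2 receives no values
```

All the zeros end up in the last bin, so the symbol associated with the bin [0, 0) can never be produced.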
Now let's have a look at the different strategies to compute the bin widths:
- 'uniform': All bins in each sample have identical widths
- 'quantile': All bins in each sample have the same number of points
- 'normal': Bin edges are quantiles from a standard normal distribution
- 'entropy': Bin edges are computed using information gain
With strategy='uniform', you're unlikely to face this issue, because it only uses the minimum and maximum values to compute the extreme bin edges, and the intermediate bin edges are computed by simple linear interpolation (the only way to hit it would be a constant feature). With strategy='normal', you will never face this issue because the bin edges are the quantiles of the standard normal distribution. With strategy='entropy', you may face this issue. Finally, strategy='quantile' is the strategy for which you're the most likely to face this issue, because it uses the quantiles of the feature to compute the bin edges. If one value is very common in your feature, then it's possible that two back-to-back bin edges will be equal. A natural solution in that case is to decrease the number of bins (or to change the way the bin edges are computed).
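To make the comparison concrete, here is a minimal sketch that computes the inner bin edges of a single feature for three of the strategies. The helper names `inner_edges` and `has_equal_edges` are hypothetical and not part of pyts, the feature is made up, and the 'entropy' strategy is left out because it needs class labels.

```python
from statistics import NormalDist

import numpy as np


def inner_edges(feature, n_bins, strategy):
    """Return the n_bins - 1 inner bin edges of a 1-D feature."""
    probs = np.linspace(0, 1, n_bins + 1)[1:-1]
    if strategy == "uniform":
        # Linear interpolation between the minimum and the maximum.
        return np.linspace(feature.min(), feature.max(), n_bins + 1)[1:-1]
    if strategy == "quantile":
        # Empirical quantiles of the feature itself.
        return np.percentile(feature, 100 * probs)
    if strategy == "normal":
        # Quantiles of the standard normal distribution (data-independent).
        return np.array([NormalDist().inv_cdf(p) for p in probs])
    raise ValueError(f"Unsupported strategy: {strategy!r}")


def has_equal_edges(edges):
    """True if two consecutive edges are equal, i.e. a bin is empty."""
    return bool(np.any(np.diff(edges) == 0))


# Hypothetical feature dominated by a single value (8 zeros out of 10).
feature = np.array([0.0] * 8 + [1.0, 2.0])

for strategy in ("uniform", "normal", "quantile"):
    edges = inner_edges(feature, n_bins=4, strategy=strategy)
    print(strategy, edges, has_equal_edges(edges))

# With 2 bins the quantile strategy has a single inner edge (the median),
# so no two edges can coincide.
print(has_equal_edges(inner_edges(feature, n_bins=2, strategy="quantile")))
```

On this feature only the quantile edges collapse onto the dominant value, and reducing the number of bins removes the collision.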
Going back to your use case, this is where zero padding becomes an issue. You may have some subsequences that contain only zeros. All the Fourier coefficients will also be equal to zero. And if you have many subsequences that contain only zeros, then you will have a feature (i.e. a Fourier coefficient) that will have many zeros, and the aforementioned issue will occur.
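The effect of zero padding can be sketched with NumPy alone. Everything below is an assumption made for illustration (the data, the lengths, the window size, and using `np.fft.rfft` with a single coefficient as the feature); it is not the library's actual WEASEL pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

n_series, padded_length, window = 20, 32, 8

# Hypothetical data set: series of true length 4 to 8, zero-padded to 32.
X = np.zeros((n_series, padded_length))
for row in X:
    true_length = rng.integers(4, 9)
    row[:true_length] = rng.normal(size=true_length)

# Non-overlapping windows: each series yields 32 / 8 = 4 subsequences.
windows = X.reshape(-1, window)
n_zero_windows = int(np.sum(~windows.any(axis=1)))

# An all-zero window has all-zero Fourier coefficients.
coefs = np.fft.rfft(windows, axis=1)

# One feature: the real part of the first non-constant coefficient.
feature = coefs[:, 1].real

# Inner edges for 4 bins with the quantile strategy.
quantile_edges = np.percentile(feature, [25, 50, 75])

print(n_zero_windows, "of", len(windows), "windows contain only zeros")
print(quantile_edges)
print(bool(np.any(np.diff(quantile_edges) == 0)))
```

Because three quarters of the windows are pure padding, at least two of the three quantile edges are exactly zero, which is the condition that triggers the error.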
Hope this helps you a bit to understand the reasoning of what's going on under the hood.
from pyts.
I ran into the problem of "constants" as well when I first started using this library. This is to do with the self checks where the library evaluates consecutive constants as an error. The issue can be resolved by reducing the number of bins, but that is not practical in most cases.
The solution is to edit the discretizer.py in "site-packages\pyts\preprocessing" under the library location. Find the following line:
self._check_constant(X)
and comment it out. That should resolve your problem.
from pyts.
Hi I tried commenting out the line of code you mentioned but I still get the error. It is a ValueError from the following function:
    def _compute_bins(self, X, y, n_timestamps, n_bins, strategy):
        if strategy == 'normal':
            bins_edges = norm.ppf(np.linspace(0, 1, self.n_bins + 1)[1:-1])
        elif strategy == 'uniform':
            timestamp_min, timestamp_max = np.min(X, axis=0), np.max(X, axis=0)
            bins_edges = _uniform_bins(timestamp_min, timestamp_max,
                                       n_timestamps, n_bins)
        elif strategy == 'quantile':
            bins_edges = np.percentile(
                X, np.linspace(0, 100, self.n_bins + 1)[1:-1], axis=0
            ).T
            if np.any(np.diff(bins_edges, axis=0) == 0):
                raise ValueError(
                    "At least two consecutive quantiles are equal. "
                    "Consider trying with a smaller number of bins or "
                    "removing timestamps with low variation."
                )
        else:
            bins_edges = self._entropy_bins(X, y, n_timestamps, n_bins)
        return bins_edges
from pyts.
Something interesting I found out is that when I change the strategy to uniform, it seems to work. I didn't need to comment out the line you suggested above. Also, all strategies work except quantile, which throws the error.
transformer = WEASELMUSE(strategy='uniform', word_size=4, window_sizes=np.arange(5, 105))
from pyts.
Thanks for the clear explanation @johannfaouzi. So the fix seems to be that we lower the number of bins with the strategy='quantile' method.
It would be great to see the strategies working with variable length time-series data, as these occur in the majority of use cases where time-series data is involved.
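Lowering the number of bins can be automated outside the library. The sketch below reuses the same `np.percentile` / `np.diff` test as the `_compute_bins` code quoted earlier in this thread; the helper name `largest_valid_n_bins` is hypothetical and not part of pyts, and the data is made up.

```python
import numpy as np


def largest_valid_n_bins(X, n_bins_max):
    """Largest n_bins <= n_bins_max whose quantile edges are distinct
    for every column (feature) of the 2-D array X."""
    for n_bins in range(n_bins_max, 1, -1):
        probs = np.linspace(0, 100, n_bins + 1)[1:-1]
        # edges has shape (n_bins - 1, n_features)
        edges = np.percentile(X, probs, axis=0)
        if not np.any(np.diff(edges, axis=0) == 0):
            return n_bins
    raise ValueError("n_bins_max must be at least 2.")


# Hypothetical data: one well-spread feature, one dominated by zeros.
X = np.column_stack([
    np.arange(10.0),
    np.array([0.0] * 8 + [1.0, 2.0]),
])

print(largest_valid_n_bins(X, n_bins_max=4))         # limited by the second column
print(largest_valid_n_bins(X[:, :1], n_bins_max=4))  # first column alone
```

Note that with 2 bins there is a single inner edge per feature, so this check always passes there, even for a constant feature. It only says the quantile edges are distinct, not that the resulting bins are informative.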
from pyts.