Sorry for my mistake; what I said is indeed totally wrong.
When you mentioned KBinsDiscretizer, I was actually thinking of MultipleCoefficientBinning, which performs the discretization column-wise and is used in Symbolic Fourier Approximation.
KBinsDiscretizer is indeed very similar to SAX; the only difference is that it does not support the alphabet argument. If you look at the source code of SAX, you can see that it simply uses KBinsDiscretizer and takes care of returning symbols (instead of integers) if necessary. Like most of the preprocessing tools in pyts, KBinsDiscretizer is mainly there to bring a scikit-learn preprocessing tool (applied column-wise, i.e. feature-wise) to pyts (where it is applied row-wise, i.e. sample-wise).
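As a rough illustration of that symbol step (not the actual pyts code), the sketch below maps integer bin indices to letters. It assumes the alphabet convention of SAX as I understand it: `None` means the first `n_bins` lowercase letters, `'ordinal'` keeps the integers (the KBinsDiscretizer output), and any other array-like must hold exactly `n_bins` symbols. The helper name `bins_to_symbols` is hypothetical.

```python
import string

import numpy as np


def bins_to_symbols(X_binned, n_bins, alphabet=None):
    """Map integer bin indices (0 .. n_bins - 1) to symbols.

    Hypothetical helper mimicking the assumed alphabet handling of SAX.
    """
    X_binned = np.asarray(X_binned)
    # 'ordinal' keeps the integer output, like KBinsDiscretizer
    if isinstance(alphabet, str) and alphabet == 'ordinal':
        return X_binned
    # Default: the first n_bins letters of the Latin alphabet
    if alphabet is None:
        alphabet = list(string.ascii_lowercase[:n_bins])
    alphabet = np.asarray(alphabet)
    if alphabet.shape != (n_bins,):
        raise ValueError("alphabet must contain exactly n_bins symbols")
    # Fancy indexing replaces each bin index by its symbol
    return alphabet[X_binned]


X_binned = np.array([[0, 2, 1, 3], [3, 3, 0, 1]])
print(bins_to_symbols(X_binned, n_bins=4))
print(bins_to_symbols(X_binned, n_bins=4, alphabet='ordinal'))
```

The discretization itself is untouched; only the final lookup differs, which is why the two outputs carry the same information.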
Edit: most of the tools in the preprocessing module are rarely used in practice (and, I think, rarely mentioned in the literature). They are just there for convenience and usually just reuse the implementation in scikit-learn (transposing the input matrix, applying the scikit-learn implementation, transposing the output matrix). This is not the case for KBinsDiscretizer, because I didn't want to allow the k-means strategy for computing the bins, and I wanted to add the 'normal' strategy.
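The transpose trick described in this edit can be sketched in a few lines. This is a minimal NumPy illustration, not the pyts source: `standardize_columns` stands in for a column-wise scikit-learn transformer, and both helper names are hypothetical.

```python
import numpy as np


def standardize_columns(A):
    """Column-wise (feature-wise) standardization, the scikit-learn convention.

    Stand-in for a scikit-learn transformer; a constant column would divide by zero.
    """
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0)


def apply_rowwise(columnwise_transform, X):
    """Turn a column-wise transform into a row-wise (sample-wise) one.

    Transpose the input, apply the column-wise transform, transpose the output back.
    """
    return columnwise_transform(np.asarray(X, dtype=float).T).T


# Each time series (row) ends up with zero mean and unit variance.
X = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
X_scaled = apply_rowwise(standardize_columns, X)
print(X_scaled)
```

A real scikit-learn transformer could be passed the same way, for instance `lambda A: StandardScaler().fit_transform(A)` with scikit-learn's `StandardScaler`.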
See pyts/pyts/approximation/sax.py, lines 94 to 102 in commit 2434592.
KBinsDiscretizer is different from SAX because SAX is a "row-wise" discretization (i.e., discretizing each time series independently) while KBinsDiscretizer is a "column-wise" discretization (i.e., discretizing each column independently). KBinsDiscretizer is used for the Symbolic Fourier Approximation (SFA) algorithm, which discretizes the Fourier coefficients of a time series.
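To make the column-wise idea concrete in the SFA setting: the Fourier coefficients are computed per time series, then each coefficient (column) gets its own bin edges computed across samples. This is a simplified NumPy sketch with quantile bins only, not how pyts implements SFA (as noted in the correction above, the class that bins column-wise in pyts is MultipleCoefficientBinning). The helper names and the choice to skip the constant coefficient are assumptions made for the illustration.

```python
import numpy as np


def fourier_features(X, n_coefs):
    """Real parts, then imaginary parts, of DFT coefficients 1..n_coefs of each row.

    The constant (index 0) coefficient is skipped in this sketch.
    """
    coefs = np.fft.rfft(np.asarray(X, dtype=float), axis=1)[:, 1:n_coefs + 1]
    return np.column_stack([coefs.real, coefs.imag])


def columnwise_quantile_bins(F, n_bins):
    """Discretize each column independently, using that column's empirical quantiles."""
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges = np.quantile(F, quantiles, axis=0)  # shape (n_bins - 1, n_columns)
    # Values equal to a cut-off go to the upper bin
    return np.column_stack([
        np.searchsorted(edges[:, j], F[:, j], side='right')
        for j in range(F.shape[1])
    ])


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))          # 4 time series of length 16
F = fourier_features(X, n_coefs=3)    # shape (4, 6)
B = columnwise_quantile_bins(F, n_bins=2)
print(B)
```

Because the edges are computed per column, the bins depend on the whole training set rather than on a single time series.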
If you want to reproduce the original SAX algorithm, you indeed need a pipeline consisting of:
- StandardScaler (so that each time series has zero mean and unit variance)
- PAA (to decrease the number of time points in the time series)
- SAX with the 'normal' strategy (to use the quantiles of the standard normal distribution as cut-off values)
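The three steps above can be sketched end to end in plain NumPy; in pyts they would correspond to StandardScaler, PiecewiseAggregateApproximation and SymbolicAggregateApproximation with strategy='normal'. This is a minimal sketch, not the pyts implementation: it assumes the series length is a multiple of the PAA output size and that no series is constant, and all function names are hypothetical. The cut-offs come from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

import numpy as np


def znormalize(X):
    """Give each time series (row) zero mean and unit variance (assumes no constant row)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def paa(X, output_size):
    """Piecewise Aggregate Approximation: average over equal-length segments.

    Simplification: n_timestamps must be a multiple of output_size.
    """
    n_samples, n_timestamps = X.shape
    if n_timestamps % output_size != 0:
        raise ValueError("n_timestamps must be a multiple of output_size")
    return X.reshape(n_samples, output_size, -1).mean(axis=2)


def normal_breakpoints(n_bins):
    """Cut-offs splitting N(0, 1) into n_bins equiprobable bins."""
    return np.array([NormalDist().inv_cdf(k / n_bins) for k in range(1, n_bins)])


def sax_normal(X, n_bins):
    """Replace each value by the index of its standard normal bin."""
    return np.searchsorted(normal_breakpoints(n_bins), X, side='right')


def sax_pipeline(X, output_size, n_bins):
    return sax_normal(paa(znormalize(X), output_size), n_bins)


X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
              [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]])
print(sax_pipeline(X, output_size=2, n_bins=4))
```

Using `side='right'` puts a value equal to a cut-off into the upper bin; the pyts implementation may break such ties differently.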
So, as I understand it, this means that given an input X in the form (n_samples, n_timestamps), the KBinsDiscretizer discretizes over the individual timestamps while SAX discretizes over the individual samples?
However, this would mean that KBinsDiscretizer could not compute bins when n_samples = 1, which contradicts what I observe.
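The two conventions in this question can be contrasted with a small NumPy sketch using quantile bins. It is an illustration under simplifying assumptions, not pyts code, and the helper names are hypothetical. Row-wise binning (what SAX does and, per the maintainer's correction, what pyts' KBinsDiscretizer does as well) computes the edges per time series, so a single sample is enough. Column-wise binning (what MultipleCoefficientBinning does) computes the edges per timestamp across samples, so with a single sample every column collapses into one bin.

```python
import numpy as np


def quantile_edges(values, n_bins):
    """Inner quantile cut-offs splitting `values` into n_bins bins."""
    return np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])


def discretize_rowwise(X, n_bins):
    """Edges computed per row (per time series); ties go to the upper bin."""
    return np.array([
        np.searchsorted(quantile_edges(row, n_bins), row, side='right')
        for row in np.asarray(X, dtype=float)
    ])


def discretize_columnwise(X, n_bins):
    """Edges computed per column (per timestamp, across samples)."""
    return discretize_rowwise(np.asarray(X, dtype=float).T, n_bins).T


X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0],
              [100.0, 200.0, 300.0, 400.0]])
print(discretize_rowwise(X, n_bins=2))
print(discretize_columnwise(X, n_bins=2))

single = X[:1]  # n_samples = 1
print(discretize_rowwise(single, n_bins=2))
print(discretize_columnwise(single, n_bins=2))
```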
A further test of both methods over different numbers of bins and strategies suggests that they are equivalent. For example:
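import numpy as np
from pyts.approximation import SymbolicAggregateApproximation
from pyts.datasets import load_gunpoint
from pyts.preprocessing import KBinsDiscretizer, StandardScaler  # pyts' row-wise StandardScaler

strategy = 'normal'
n_bins = 10
# First element of the returned tuple is X_train, shape (n_samples, n_timestamps)
ts = load_gunpoint(return_X_y=True)[0]
# Standardize each time series (row-wise)
ts = StandardScaler().fit_transform(ts)
# alphabet='ordinal' makes SAX return integers, like KBinsDiscretizer
sax = SymbolicAggregateApproximation(n_bins=n_bins, strategy=strategy, alphabet='ordinal')
kbins = KBinsDiscretizer(n_bins=n_bins, strategy=strategy)
same = np.array_equal(kbins.fit_transform(ts), sax.fit_transform(ts))
print(same)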
Outputs: True
Thank you for clearing up the misunderstanding and for the additional info about scikit-learn's implementation!