aeon-toolkit / aeon
A toolkit for machine learning from time series
Home Page: https://aeon-toolkit.org/
License: BSD 3-Clause "New" or "Revised" License
The main branch should be protected to avoid accidental pushes, and changes should be introduced through PRs. The rule should include admin/core devs as a means of clearly communicating changes.
Some options are;
Do we also do the same for the twitter and linkedin?
We are currently in communication with the current owners on obtaining access to the scikit-time PyPi.
Hi, what do you think we should do with sktime governance? Personally I would like to scrap it all and start again, but that might be too disruptive?
The current governance roles are:
Is your feature request related to a problem? Please describe.
The alignment module was introduced with no clear demand. It introduced a new soft dependency, dtw-python, rather than use our own implementations. There is a whole infrastructure of base class etc, but there are no examples I can see. The functionality
"computes the full alignment between X[0] and X[1]" is available in the distances module as follows.
import numpy as np
from sktime.distances import distance_alignment_path, dtw_alignment_path

x = np.random.rand(10)
y = np.random.rand(10)
# alignment path from the DTW-specific function
path = dtw_alignment_path(x, y)
print(path)
# the same alignment via the generic function, selecting DTW by name
path2 = distance_alignment_path(x, y, metric="dtw")
print(path2)
I don't think there is anyone here who wants to maintain this.
The only place it is used is the dists_kernels module in the class DistFromAligner, which itself is not used anywhere. In my view, distances or alignments between time series should be found by calling distance functions which can return alignments if required. If you want a distance matrix/kernel for a set of time series, call the pairwise_distances function. That is how it is done in distances. This is how other packages provide distances. In Aligner and dist_kerns there is a confusing set of things such as "Composer that creates distance from aligner."
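To make the "distance functions which can return alignments if required" idea concrete, here is a rough, dependency-free sketch. It is not the distances module implementation: `dtw_with_path` and `pairwise_dtw` are hypothetical names, and it uses plain squared differences with no window.

```python
from math import inf


def dtw_with_path(x, y):
    """Return (distance, alignment path) for two univariate series."""
    n, m = len(x), len(y)
    # cost[i][j] is the cost of the best alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # walk back from the end, always taking the cheapest predecessor
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def pairwise_dtw(series):
    """Symmetric distance matrix for a collection of series."""
    n = len(series)
    mat = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            mat[a][b] = mat[b][a] = dtw_with_path(series[a], series[b])[0]
    return mat
```

The caller who only wants a distance takes the first element; the caller who wants the alignment takes the second. No aligner object or composer is needed for either.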
Describe the solution you'd like
I want to delete the directory alignment and the class DistFromAligner
Describe alternatives you've considered
An alternative would be to attempt to merge it with dist_and_kerns and distances. I do not want to do this
Additional context
these are the only usages of this module. There are no docs related to it I can see. It should be simple to get rid of it.
To ensure operations and releases we would require:
Do we keep the contributors? I'm not sure of the protocol
PyPi now has a release to install from, but aeon does not have a conda install option.
I think we should remove all of this
https://github.com/scikit-time/scikit-time/blob/main/README.md
for now, and just have a simple message "This is a fork of sktime made by previous core developers, and is a work in progress" or something. Any thoughts on that?
just checking, but I assume we keep the same licence?
https://github.com/scikit-time/scikit-time/blob/main/LICENSE
The alignment module wraps a package called dtw-python which contains an implementation of dtw that is slower than ours. The functionality of alignment is available in distances using our own distance functions. The only place it is used is in KNN, which I would like to remove (#47, #39), and dists_kernels, which I would also like to deprecate. @chrisholder
I would like to take the opportunity of the reboot to restructure and partially rewrite the notebooks in the directory examples. There is an issue of whether we need separate notebooks and webpages. Let's discuss; I don't feel strongly, but whichever we choose, the structure needs looking at (scikit-learn has an examples and a docs dir).
Currently they look like this: just a dumping ground with inconsistent naming, such as AA_datatypes_and_datasetds
I think at the root dir we should have some basic info and then one notebook per module, e.g.
getting_started.ipynb
data_input.ipynb
annotation.ipynb
classification.ipynb
clustering.ipynb
distances.ipynb
forecasting.ipynb
networks.ipynb
regression.ipynb
transformations.ipynb
then subfolders with more detailed notebooks by module, linked from the root notebooks.
Python 3.7 reaches its end-of-life for security support in 4 months (https://endoflife.date/python). Maybe now is a good time to drop it if it is going to be a big deal?
Is your feature request related to a problem? Please describe.
There are two checks I think should happen in the classifier base class
Do we keep sktime as our package name or change it to avoid conflicts with the previous version?
Currently, it would be like scikit-learn, which calls their main package sklearn.
Describe the bug
Function plot_series has side effects: y forgets its index name after calling plot_series. It can probably be fixed by simply adding a y = y.copy() to the function.
To Reproduce
from sktime.datasets import load_airline
from sktime.utils.plotting import plot_series
y = load_airline()
y = y.to_frame()
y_name1 = y.index.name
plot_series(y)
y_name2 = y.index.name
assert y_name1 == y_name2
Expected behavior
Additional context
Versions
'0.15.0'
Should we change the parallel backend to threading in the classifiers?
Parallel(n_jobs=n_jobs, backend="threading")(
I have noticed issues when calling nested Parallel-constructs.
@MatthewMiddlehurst @TonyBagnall I recall, you noticed similarly?
Is your feature request related to a problem? Please describe.
For panel forecasting we use MultiIndex, and the inner index level has to be the time index. This is an established convention within the framework, so violations should be checked for and should raise a more informative error message. Users might not be aware of this convention.
The terms mtype/machine type and scitype/stype were introduced with the datatype module. These are non-standard terms that mean nothing to most people and are sure to cause confusion.
For example, the header for the _convert module is this
"""Machine type converters for scitypes."""
and comments like this
"obj : object to convert - any type, should comply with mtype spec for as_scitype"
There is no reference here to what these mean nor a link to an explanation. As far as I can tell it is a reinvention of the concept of abstract data types with some weird inclusion of objects. So a scitype would be an abstract data type such as a list, and an mtype would be an implementation of a list such as a linked list or an array. I see absolutely no need for this distinction and think all it does is confuse things.
It basically gives a form of automation to convert, but I find it convoluted and confusing.
Exports
-------
convert_to(obj, to_type: str, as_scitype: str, store=None)
converts object "obj" to type "to_type", considerd as "as_scitype"
convert(obj, from_type: str, to_type: str, as_scitype: str, store=None)
same as convert_to, without automatic identification of "from_type"
mtype(obj, as_scitype: str)
returns "from_type" of obj, considered as "as_scitype"
---
I would like to remove these terms completely, and replace them with, for example
input type/internal type/output type
then have simple converters that convert from one to another and remove the scitype
just looking at usages of this, to_type is a required parameter, so all this does
def convert_to(
    obj,
    to_type: str,
    as_scitype: str = None,
    store=None,
    store_behaviour: str = None,
):
is some convoluted checking, then calls convert! This adds no value I can see. In first instance I would just change the terminology, but really I doubt the below adds any value at all.
# input checks on to_type, as_scitype; coerce to_type, as_scitype to lists
to_type = _check_str_or_list_of_str(to_type, obj_name="to_type")
# sub-set a preliminary set of as_scitype from to_type, as_scitype
if as_scitype is not None:
    # if not None, subset to types compatible between to_type and as_scitype
    as_scitype = _check_str_or_list_of_str(as_scitype, obj_name="as_scitype")
    potential_scitypes = mtype_to_scitype(to_type)
    as_scitype = list(set(potential_scitypes).intersection(as_scitype))
else:
    # if None, infer from to_type
    as_scitype = mtype_to_scitype(to_type)
# now further narrow down as_scitype by inference from the obj
from_type = infer_mtype(obj=obj, as_scitype=as_scitype)
as_scitype = mtype_to_scitype(from_type)
# if to_type is a list, we do the following:
# if on the list, then don't do a conversion (convert to from_type)
# if not on the list, we find and convert to first mtype that has same scitype
if isinstance(to_type, list):
    # no conversion of from_type is in the list
    if from_type in to_type:
        to_type = from_type
    # otherwise convert to first element of same scitype
    else:
        same_scitype_mtypes = [
            mtype for mtype in to_type if mtype_to_scitype(mtype) == as_scitype
        ]
        if len(same_scitype_mtypes) == 0:
            raise TypeError(
                "to_type contains no mtype compatible with the scitype of obj,"
                f"which is {as_scitype}"
            )
        to_type = same_scitype_mtypes[0]
This includes:
I suggest we just delete the current version
https://github.com/scikit-time/scikit-time/blob/main/CODEOWNERS
and give admins ownership of everything. Any thoughts?
Some estimators don't support Python 3.7 anymore, so we should think about deprecating it. Also, scikit-learn>=1.1.0 no longer supports Python 3.7.
We can then probably bump to scikit-learn>=1.1.0 and possibly also require higher versions of other dependencies.
Is your feature request related to a problem? Please describe.
There seems to be quite a problem related to performance of some data checks and pd.MultiIndex operations.
Describe the solution you'd like
Related: sktime/sktime#4139
The KNN Regressor and Classifier were redesigned to replace the inheritance model with containment. The algorithms now always work out the pairwise distance for the train data, which is then never used, since the algorithms that can exploit the triangle inequality to find neighbours are only usable with hard-coded scikit-learn distances, not our elastic distances. Any option other than algorithm="brute" breaks the current version. This was all done against my opinion, but I lost the energy to fight it.
They also tightly couple to distances with a string list, rather than using the distance factory as before. This is bad practice, since the addition of new distances will also require a change to the classifier/regressor. Better to use the factory.
I will dig a little deeper into the scikit-learn algorithm, but my preference would be to abandon the scikit-learn version altogether and just implement KNN.
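For discussion, a bare-bones sketch of what "just implement knn" could look like. The class name `BruteForceKNN` and the default `squared_euclidean` are placeholders of mine, not existing code. The point is that the distance is any callable, so adding a new distance needs no change to the classifier.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Placeholder distance; any callable taking two series would do."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class BruteForceKNN:
    """Brute-force k-nearest-neighbour classifier with a pluggable distance."""

    def __init__(self, distance=squared_euclidean, n_neighbors=1):
        self.distance = distance
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # lazy learner: just store the training data, no pairwise matrix
        self._X = list(X)
        self._y = list(y)
        return self

    def predict(self, X):
        preds = []
        for query in X:
            # distance from the query to every training case
            dists = sorted(
                (self.distance(query, train), idx)
                for idx, train in enumerate(self._X)
            )
            nearest = [self._y[idx] for _, idx in dists[: self.n_neighbors]]
            # majority vote among the k nearest labels
            preds.append(Counter(nearest).most_common(1)[0][0])
        return preds
```

Nothing is precomputed in fit, so there is no unused train distance matrix, and an elastic distance can be passed in directly as the `distance` argument.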
generally need to look at EarlyClassification. There is a deprecation warning
not sure if we still want to do this, and also TEASER is failing some classification tests, due we think to my refactoring of _class_dictionary in the Classifier base class.
see this PR for more info
Describe the bug
A new bug was introduced with sktime==0.16.0: the cutoff attribute of a forecaster is now PeriodIndex but in the past it was Period; the same goes for Timestamp and DatetimeIndex.
To Reproduce
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster
y = load_airline()
forecaster = NaiveForecaster(strategy="drift")
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1,2,3])
forecaster.cutoff
With sktime==0.15.1 this returns Period('1960-12', 'M') and Timestamp('1960-12-31 00:00:00', freq='M')
With sktime==0.16.0 this returns PeriodIndex(['1960-12'], dtype='period[M]', name='Period') and DatetimeIndex(['1960-12-31'], dtype='datetime64[ns]', name='Period', freq='M')
Expected behavior
Additional context
Versions
scikit-time is just a temporary name, and we need to select a new one for the project. Some suggestions:
scikit-time is live under https://join.slack.com/t/scikit-timeworkspace/shared_invite/zt-1ph9lewat-x1ZgqoPIydbEzuswe4fJmQ
Describe the bug
ProximityForest is broken and at this point it is easiest to deprecate it and maybe look for someone to reimplement it. See here for a discussion
To Reproduce
from sktime.classification.distance_based import ProximityForest
from sktime.datasets import load_unit_test
pf = ProximityForest()
trainX, trainy = load_unit_test()
pf.fit(trainX, trainy)
this results in an infinite recursion in _fit leading to stack overflow.
RecursionError: maximum recursion depth exceeded while calling a Python object
this happens on all problems I tried (unit_test, arrow_head, GunPoint, ItalyPowerDemand)
Expected behaviour
It should fit and predict with default parameters on data in an allowed format.
Additional context
Removal previously blocked. The original author of PF is unavailable,
contrib was a folder we included at the beginning of sktime as a working area for code not suitable or ready for sktime proper. We made it private a year or so ago, but I think there is no need for it any more. My group certainly doesn't need it. The simplest solution, and the one I am proposing, is just deleting the directory.
We currently have duplicate versions of the same algorithm TSF, and for some reason I have never understood it plays a very prominent role in examples etc. I think we should simplify to a single version and not use it so much, it is a very obscure algorithm not used at all in practice.
we currently have a module distances, which contains elastic distance functions, and one called dists_kernels that wraps everything in objects. I would like to remove dists_kernels. It is confusing having both, and I see no benefit from wrapping distances and introducing things like PwTrafoPanelPipeline, BasePairwiseTransformerPanel etc
I think this description would make little sense to most people.
https://www.sktime.org/en/stable/api_reference/dists_kernels.html
Conceptually, insisting that functions be wrapped in objects (functor pattern) is much more Java-like than Python. Even Java recognises functors are a clumsy pattern and introduced lambdas. To go the other way with Python just seems perverse. @chrisholder can comment, but I believe all other relevant packages treat distances as functions.
quick review reveals a few possible improvements to EE, which was one of the earliest classifiers we introduced
1. When calculating derivatives, EE uses a DerivativeSlopeTransformer, which has internal type nested_univ. This means the calculation is shoving a numpy array back into a pandas DataFrame, doing inefficient calculations, then pulling it back into numpy.
# X is a 3D numpy array that then gets internally converted to pandas
der_X = DerivativeSlopeTransformer().fit_transform(X)
# it converts back to numpy: remove this and just create differences
if isinstance(der_X, pd.DataFrame):
    der_X = from_nested_to_3d_numpy(der_X)
Rather than changing DerivativeSlopeTransformer, which is embedded in a morass of code, I think this really simple transform should just be done in a static method in ElasticEnsemble using numpy only. Basically convert this
def get_der(x):
    der = []
    for i in range(1, len(x) - 1):
        der.append(((x[i] - x[i - 1]) + ((x[i + 1] - x[i - 1]) / 2)) / 2)
    return pd.Series([der[0]] + der + [der[-1]])

return [get_der(x) for x in X]
to use numpy. There are other possible optimisations in EE, there is no testing for correctness for EE and it does not include all of the original
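A possible numpy-only version of the transform above, as a sketch. The name `slope_derivative` is made up here, and it assumes the time axis is the last axis and each series has at least three points.

```python
import numpy as np


def slope_derivative(X):
    """Slope derivative along the last axis, same formula as get_der.

    X can be 1D (a single series) or e.g. 3D
    (n_cases, n_channels, n_timepoints); needs n_timepoints >= 3.
    """
    X = np.asarray(X, dtype=float)
    prev, curr, nxt = X[..., :-2], X[..., 1:-1], X[..., 2:]
    # ((x[i] - x[i-1]) + (x[i+1] - x[i-1]) / 2) / 2 for all interior points
    inner = ((curr - prev) + (nxt - prev) / 2) / 2
    # repeat the first and last values so output length equals input length
    return np.concatenate([inner[..., :1], inner, inner[..., -1:]], axis=-1)
```

This keeps the padding behaviour of the loop version (first and last derivative repeated) but never leaves numpy, so the round trip through pandas goes away.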
Is your feature request related to a problem? Please describe.
TimeSeriesSVC was introduced to sktime on 25th Dec 2022 without review, as a wrapper for the scikit-learn support vector machine. I see no issue associated with this classifier. Its purpose seems to be to find a use case for the dists_kerns module, but it's problematic and I think it should be removed for the following reasons.
def _fit(self, X, y):
    self._X = X
    kernel_mat = self._kernel(X)
    self.svc_estimator_.fit(kernel_mat, y)
    return self

def _predict(self, X):
    kernel_mat = self._kernel(X, self._X)
    y_pred = self.svc_estimator_.predict(kernel_mat)
    return y_pred
Describe the solution you'd like
Remove this classifier
Deprecate pip install sktime[mlflow,mlflow_tests]. Move mlflow to all_extras and mlflow_tests into dev. I think @ltsaprounis you were also supportive of this?
there seems to be a new dependency which is not installed and is not a soft dependency
# -*- coding: utf-8 -*-
"""Utility to check soft dependency imports, and raise warnings or errors."""
__author__ = ["fkiraly", "mloning"]
import io
import sys
import warnings
from importlib import import_module
from inspect import isclass
from packaging.requirements import InvalidRequirement, Requirement
from packaging.specifiers import InvalidSpecifier, SpecifierSet
from packaging.requirements import InvalidRequirement, Requirement
from packaging.specifiers import InvalidSpecifier, SpecifierSet
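One cheap way to spot a missing dependency like this before an import blows up is to ask the import system whether the module can be found. A small sketch using only the standard library; `is_installed` is a hypothetical helper, not something in the code base, and it is only meant for top-level module names.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # report which of these top-level modules are missing in this environment
    for name in ["packaging", "json"]:
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```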
Describe the bug
warn is throwing a new pre-commit problem about stack levels
To Reproduce
Expected behavior
adding stacklevel argument fixes, will fix in #52, putting this issue up for reference
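For reference, this is the general shape of the fix, sketched with a made-up function name (`deprecated_helper`); the real call sites will differ.

```python
import warnings


def deprecated_helper():
    # stacklevel=2 attributes the warning to the caller of this function
    # rather than to this line, which is usually what the user needs to see
    warnings.warn(
        "deprecated_helper is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42
```

With the default stacklevel of 1 the warning is reported at the warn call itself, which tells the user nothing about which of their lines triggered it.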
Should be a medium-term goal; it does not necessarily have to be done as soon as we make the repository public.
We actually have to decide what is different first!
Describe the bug
changing some docs has opened a can of worms it seems.
sktime/tests/test_softdeps.py::test_est_fit_without_modulenotfound[TEASER] - KeyError: 0
====== 1 failed, 1204 passed, 1 skipped, 30 warnings in 93.89s (0:01:33) =======
make: *** [Makefile:40: test_softdeps] Error 1
So it fails test_est_fit_without_modulenotfound for some reason. I'm guessing, but is this testing whether TEASER throws module not found when built without the correct soft dependency? Seems slow for the dictionary classifiers; another example of the completely OTT testing, see #59. It's like the test suite is meant to be an oracle that can spot all possible bugs. This is just unrealistic and impractical imo.
To Reproduce
https://github.com/scikit-time/scikit-time/actions/runs/4184490173/jobs/7250212056
Expected behavior
This is how it fails
it is imo a problem with the test not the classifier, I will look closer.
Versions
The GitHub organisation is currently on GitHub Free, which has a number of limitations (including limited actions minutes). https://github.com/organizations/scikit-time/settings/billing
How did sktime deal with this issue? I assume we can leverage our academic links i.e. https://docs.github.com/en/billing/managing-billing-for-your-github-account/discounted-subscriptions-for-github-accounts
Describe the bug
The documentation build breaks with Python 3.10 because of sphinx-doc/sphinx#9512
To Reproduce
pip install .[dev,docs]
cd docs
make html
Expected behavior
Additional context
The Sphinx version is currently pinned at 4.1.1, whereas the latest is 6.1.3. Does anyone remember why it is pinned?
Versions
Is your feature request related to a problem? Please describe.
Operator overloading (apparently called magic or dunder methods by python people) was introduced across the toolkit last year.
I simply do not see the need, for classification, clustering and regression at least. It seems like a contrivance, done just because you can. I can see no issues asking for it and no genuine use case, and to me at least the high-level logic of equating a pipeline with multiplication is not at all intuitive. It is not, for example, transitive. It may be that other people like it, in which case no problem.
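For anyone unfamiliar with the mechanism, here is a toy sketch of how `*` can be overloaded to build a pipeline. `Step` and `Chain` are invented for illustration and are not the toolkit's classes.

```python
class Step:
    """A named transformation wrapping a plain function."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def __mul__(self, other):
        # step * step builds a two-step chain
        return Chain([self, other])

    def apply(self, x):
        return self.func(x)


class Chain:
    """Applies its steps left to right."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __mul__(self, other):
        # chain * step appends the step
        return Chain(self.steps + [other])

    def apply(self, x):
        for step in self.steps:
            x = step.apply(x)
        return x


double = Step("double", lambda v: v * 2)
increment = Step("increment", lambda v: v + 1)
print((double * increment).apply(3))  # double first, then increment
print((increment * double).apply(3))  # increment first, then double
```

In this toy version the order of the operands changes the result, which ordinary multiplication of numbers never does, so the notation borrows a symbol without borrowing its usual behaviour.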
Describe the solution you'd like
In the push for a simpler toolkit, I would like to strip it out and focus instead on using sklearn pipelines, which are more familiar to users.
Describe alternatives you've considered
If these are popular we can leave them as is; I'm just looking to simplify and reduce the bloat of the base classes.
Additional context
I would like to do the same for clustering and regression
ONNX is a standardised ML persistence format for deployment, e.g. on edge devices.
We would have to write a custom ONNX "wrapper"; here is how that works for scikit-learn: https://onnx.ai/sklearn-onnx/
Useful links:
https://github.com/onnx/onnxmltools
https://github.com/onnx/onnx
Related:
sktime/sktime#1240