giotto-ai / giotto-time

Time series analysis suite

License: Other

Python 100.00%

giotto-time's Introduction

giotto-time

giotto-time is a machine-learning-based time series forecasting toolbox in Python. It is part of the Giotto collection of open-source projects and aims to provide feature extraction, analysis, causality testing and forecasting models based on the scikit-learn API.

License

giotto-time is distributed under the AGPLv3 license. If you need a different distribution license, please contact the L2F team at [email protected].

Documentation

API reference (stable release): https://giotto-ai.github.io/giotto-time/

Getting started

Get started with giotto-time by following the installation steps below. Simple tutorials and real-world use cases can be found as notebooks in the example folder.

Installation

User installation

Run this command in your favourite Python environment:

pip install giotto-time

Developer installation

Get the latest state of the source code with the command

git clone https://github.com/giotto-ai/giotto-time.git
cd giotto-time
pip install -e ".[tests, doc]"

Example

from gtime import *
from gtime.feature_extraction import *
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Create random DataFrame with DatetimeIndex
X_dt = pd.DataFrame(np.random.randint(4, size=(20)),
                    index=pd.date_range("2019-12-20", "2020-01-08"),
                    columns=['time_series'])

# Convert the DatetimeIndex to PeriodIndex and create y matrix
X = preprocessing.TimeSeriesPreparation().transform(X_dt)
y = model_selection.horizon_shift(X, horizon=2)

# Create some features
cal = feature_generation.Calendar(region="europe", country="Switzerland", kernel=np.array([1, 2]))
X_f = compose.FeatureCreation(
    [('s_2', Shift(2), ['time_series']),
     ('ma_3', MovingAverage(window_size=3), ['time_series']),
     ('cal', cal, ['time_series'])]).fit_transform(X)

# Train/test split
X_train, y_train, X_test, y_test = model_selection.FeatureSplitter().transform(X_f, y)

# Try sklearn's MultiOutputRegressor as time-series forecasting model
gar = forecasting.GAR(LinearRegression())
gar.fit(X_train, y_train).predict(X_test)

Contributing

We welcome new contributors of all experience levels. The Giotto community's goals are to be helpful, welcoming, and effective. To learn more about making a contribution to giotto-time, please see the CONTRIBUTING.rst file.

Community

Giotto Slack workspace: https://slack.giotto.ai/

Contacts

[email protected]

giotto-time's Issues

Check the requirements.txt file

Pin the versions of all the packages and check their compatibility with all supported Python versions.

Causality test should test correlation just with respect to the target column

Description

At the moment, the causality test classes let the user specify the target column. This column is actually used only in the 'transform' method. Therefore, during the fit, the causality is computed between all pairs of columns instead of only between each column and the target column. This makes the 'fit' method slow and wastes computation on pairs the user is not interested in.
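The proposed behaviour can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name `best_shifts_to_target` and the `max_shift` parameter are hypothetical, and the absolute Pearson correlation is assumed as the selection criterion.

```python
import numpy as np
import pandas as pd


def best_shifts_to_target(df: pd.DataFrame, target_col: str, max_shift: int = 5) -> dict:
    """Hypothetical helper: for each non-target column, pick the shift that
    maximises the absolute Pearson correlation with the target column only."""
    best = {}
    for col in df.columns:
        if col == target_col:
            continue  # no pair (target, target) and no pairs between other columns
        scores = {
            shift: abs(df[col].shift(shift).corr(df[target_col]))
            for shift in range(1, max_shift + 1)
        }
        best[col] = max(scores, key=scores.get)
    return best


rng = np.random.default_rng(0)
target = pd.Series(rng.normal(size=200))
df = pd.DataFrame({
    "A": target,
    "lead_3": target.shift(-3),      # "lead_3" runs 3 steps ahead of "A"
    "noise": rng.normal(size=200),
})
shifts = best_shifts_to_target(df, target_col="A")
```

With n columns this evaluates n - 1 pairs per shift rather than every pair of columns.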

Wrong direction of the shift in the causality tests

Description

Shift is done in the wrong direction when calling transform for the causality tests (for both Pearson and LinearCorrelation).

Steps/Code to Reproduce

import pandas as pd
from pandas.util import testing as testing
from giottotime.causality_tests import ShiftedLinearCoefficient, ShiftedPearsonCorrelation
from typing import List

def make_df_from_expected_shifts(expected_shifts: List[int]) -> pd.DataFrame:
    testing.N, testing.K = 500, 1

    df = testing.makeTimeDataFrame(freq="D")
    for sh, k in zip(expected_shifts, range(3)):
        df[f"shift_{k}"] = df["A"].shift(sh)
    df = df.dropna()

    return df

df = make_df_from_expected_shifts([2])

slc = ShiftedLinearCoefficient(target_col="A")
slc.fit(df)

slc.transform(df).head(10)

Expected Results

After the transformation both columns should be aligned.

Actual Results

The 'transform' step performs the shift in the wrong direction thereby doubling the shift.
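The doubling effect can be reproduced with pandas alone. The snippet below is only an illustration of shift signs and does not use the giotto-time classes; which sign is correct in 'transform' depends on how the fitted shift is defined.

```python
import pandas as pd

a = pd.Series(range(10), dtype=float)
lagged = a.shift(2)              # lagged[t] == a[t - 2]

realigned = lagged.shift(-2)     # opposite sign undoes the lag
doubled = lagged.shift(2)        # same sign again: doubled[t] == a[t - 4]

valid = realigned.dropna()
aligned_ok = valid.equals(a.loc[valid.index])
```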

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.6 (default, Jan 8 2020, 13:42:34)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.0
SciPy 1.4.1
joblib 0.14.1
Scikit-Learn 0.22
giotto-Learn 0.1.3
giotto-time 0.1.0

Change fit_transform in TDA features

Switch from fit_transform to transform in the TDA features.

Refactor of causality tests

Description

There is a lot of duplicated code in the causality tests. A refactor is needed to make the code cleaner and more maintainable.

Feature_creation should check for the output_name

Right now, feature_creation does not check the output_name of the features. Therefore, if the user gives the same name to multiple features, earlier features are silently overwritten. The FeatureCreation class should check the names of all the features and raise a ValueError if any name is duplicated.
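A minimal sketch of such a check, assuming each feature is described by a tuple whose first element is its name. The function name `check_unique_names` and the tuple layout are assumptions made for this illustration.

```python
def check_unique_names(steps):
    """Hypothetical check: raise ValueError if two features share a name."""
    names = [name for name, *_ in steps]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(
            f"Feature names must be unique; duplicated names: {duplicates}"
        )


# Passes silently: all names differ.
check_unique_names([("s_1", "shift"), ("s_2", "shift"), ("ma_3", "moving_average")])
```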

Change train_test_split

train_test_split should behave as expected, without trimming NaNs away.
Rename the function to "trim". It should have a flag for whether to trim NaNs from "X" and "y", and it should also return the trimmed pieces.

The causality tests should support a min_shift parameter

Description

It would be helpful to have a min_shift parameter, especially when there is a priori knowledge about the correlation and there is no need to test shifts below a certain value.
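A sketch of how a lower bound could restrict the search. The function `best_shift`, its `max_shift` argument and the use of the absolute Pearson correlation are assumptions for illustration, not the existing API.

```python
import numpy as np
import pandas as pd


def best_shift(x: pd.Series, y: pd.Series, min_shift: int = 1, max_shift: int = 10) -> int:
    """Hypothetical search over shifts in [min_shift, max_shift] only."""
    if not 0 <= min_shift <= max_shift:
        raise ValueError("Expected 0 <= min_shift <= max_shift")
    candidates = range(min_shift, max_shift + 1)
    return max(candidates, key=lambda s: abs(x.shift(s).corr(y)))


rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=300))
y = x.shift(4)                        # y lags x by 4 steps

found = best_shift(x, y, min_shift=3)         # shifts 1 and 2 are never evaluated
restricted = best_shift(x, y, min_shift=5)    # the true lag is excluded on purpose
```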

Pipeline for first test

A draft of a Pipeline class should be developed, in order to test the current status of the project and show it to Matteo with some real data.

Literature and sources section in the documentation

Description

This is not a bug. The library benefits from referencing the appropriate literature. I looked around the code and the project structure and could not find specific references.

Steps/Code to Reproduce

Not necessary

Expected Results

Better documentation.

Actual Results

Lacking documentation

Versions

Version as of 24/12/2019

Rewrite unittests for causality_test module

Description

The unit tests for the causality_test module need to be rewritten in a better way.

Numpy support

The features and the models should work with a NumPy array as input.

Use resample instead of reindex in CalendarFeature

If X is passed to the CalendarFeature, a resample (instead of a reindex) should be done with the same frequency of the index of X.

Score function

Description

The models should have a 'score' method, similar to the scikit-learn one. Moreover, models like GAR should have another 'horizon_score' method that is supposed to compute the score for each time step.
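One way to picture the difference between the two methods uses scikit-learn's `r2_score`, whose `multioutput` argument already supports per-column scores. The function name `horizon_score` follows the issue text; treating each column of y as one forecast step is an assumption.

```python
import numpy as np
from sklearn.metrics import r2_score


def horizon_score(y_true, y_pred):
    """Sketch: R^2 computed separately for each forecast step (column)."""
    return r2_score(y_true, y_pred, multioutput="raw_values")


y_true = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0]])
y_pred = y_true.copy()
y_pred[:, 1] = y_true[:, 1].mean()    # step 2 always predicts the mean

per_step = horizon_score(y_true, y_pred)   # one score per forecast step
overall = r2_score(y_true, y_pred)         # uniform average over the steps
```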

RegressorChain model options

GARFF model (GAR with feedforward=True) should have two options:

Training on X + y_preds
Training on X + y_truth (sklearn.multioutput.RegressorChain)
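Both options exist in scikit-learn's `RegressorChain` through its `cv` parameter, which may serve as a reference point. The sketch below uses synthetic data and is not a GAR implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
# Two-step target: the second step depends on the first one.
y1 = X @ np.array([1.0, -2.0, 0.5])
y2 = 0.8 * y1 + X[:, 0]
Y = np.column_stack([y1, y2])

# cv=None (default): later links are trained on X + the true earlier targets.
chain_truth = RegressorChain(LinearRegression()).fit(X, Y)
# cv=5: later links are trained on X + cross-validated predictions.
chain_preds = RegressorChain(LinearRegression(), cv=5).fit(X, Y)

pred_truth = chain_truth.predict(X)
pred_preds = chain_preds.predict(X)
```

At prediction time both variants feed the predictions of earlier links to later links; the difference is only in what the later links see during training.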

Implement calendar features

'UFuncTypeError' thrown in the DetrendedFeature function when changing the frequency to months or weeks

Description

'UFuncTypeError' thrown in the DetrendedFeature function when changing the frequency to months or weeks

Steps/Code to Reproduce

import pandas as pd
from giottotime.feature_creation import DetrendedFeature
from giottotime.models import PolynomialTrend
model = PolynomialTrend(order=2)
detrend_feature = DetrendedFeature(trend_model=model)
time_index = pd.date_range("2011-01-01", "2011-11-01", freq='M')
X = pd.DataFrame(range(0, len(time_index)), index=time_index)
detrend_feature.transform(X)

Comment

It works e.g. if I set the frequency to days or hours but not for weeks or months.

Error message

UFuncTypeError Traceback (most recent call last)
in
4 detrend_feature = DetrendedFeature(trend_model=model)
5
----> 6 detrend_feature.transform(pd.DataFrame(test_all, index=time_index))

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/feature_creation/index_dependent_features/trend_features.py in transform(self, time_series)
73 """
74 self.trend_model.fit(time_series)
---> 75 time_series_t = self.trend_model.transform(time_series)
76 time_series_t = self._rename_columns(time_series_t)
77 return time_series_t

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/models/trend_models/polynomial_trend.py in transform(self, ts)
118
119 p = np.poly1d(self.model_weights_)
--> 120 time_steps = (ts.index - self.t0_) / self.period_
121
122 predictions = pd.Series(index=ts.index, data=[p(t) for t in time_steps])

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/indexes/timedeltas.py in method(self, other)
45
46 def method(self, other):
---> 47 result = meth(self._data, maybe_unwrap_index(other))
48 return wrap_arithmetic_op(self, other, result)
49

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/arrays/timedeltas.py in __truediv__(self, other)
531 elif lib.is_scalar(other):
532 # assume it is numeric
--> 533 result = self._data / other
534 freq = None
535 if self.freq is not None:

UFuncTypeError: ufunc 'true_divide' cannot use operands with types dtype('<m8[ns]') and dtype('O')

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.0
SciPy 1.4.1
joblib 0.14.1
Scikit-Learn 0.22
giotto-Learn 0.1.3
giotto-time 0.1.0

Sphinx Issues -> ``

Add parent class for SeasonalFeature

Feature Transformation

Description

Another step in the Pipeline could be a feature transformation step, placed after feature creation and before the fitting of the model. However, if we want an interface similar to that of the FeatureCreation class, where a list of FeatureTransformation classes is passed and applied sequentially, designing a good interface requires some thought.
Some thoughts:

Some features, like PCA, require a fit step, while others do not.
Features should have input columns (the ones to transform) and output columns (and drop the others). What should be the interface for this?

Write tests for Causality module

base.py
shifted_linear_coefficient.py
shifted_pearson_correlation.py

Write tests for Models module

models/regressors/linear.py
models/time_series_models/gar.py
models/trend_models/base.py
models/trend_models/exponential_trend.py
models/trend_models/function_trend.py
models/trend_models/polynomial_trend.py
models/utils.py

Implement polynomial features

Remove experimental module from code coverage

Use experimental features and estimators at your own risk

Write tests for TimeSeriesPreparation module

time_series_preparation/time_series_preparation.py
time_series_preparation/time_series_conversion.py

Refactor:

time_series_preparation/tests/utils.py

Add examples for each class in the docstring

Architectural goals

More 'sklearn'-like API #29
Numpy array support #31

Proposal A
Decorators to convert into DataFrame

Proposal B
array-like transformers (personal preference)
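Proposal A could look roughly like the following. The decorator name `accepts_array` and the `Doubler` class are hypothetical and exist only to show the conversion step.

```python
import functools

import numpy as np
import pandas as pd


def accepts_array(method):
    """Hypothetical decorator: convert array-like input to a DataFrame
    before calling the wrapped method."""
    @functools.wraps(method)
    def wrapper(self, X, *args, **kwargs):
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(np.asarray(X))
        return method(self, X, *args, **kwargs)
    return wrapper


class Doubler:
    """Toy transformer written against the DataFrame API only."""

    @accepts_array
    def transform(self, X):
        return X * 2


out = Doubler().transform(np.array([[1, 2], [3, 4]]))
```

In this sketch the output is always a DataFrame, which is one possible answer to the return-type question raised under Issues.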

Issues
Should we return np.array or pd.DataFrame with np.array as input?
See scikit-learn/scikit-learn#5523 (comment)

Comments and links:
SL/DataFrame current support: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

SL is more likely to (better) support DataFrame in the future: https://scikit-learn.org/dev/roadmap.html
SL sample properties future support: scikit-learn/scikit-learn#4497
SL/Pandas mapping: https://github.com/scikit-learn-contrib/sklearn-pandas (not maintained)

Cesium is not fit_transform compliant: cesium-ml/cesium#243
Prophet doesn't use np.array at all and isn't sklearn API compliant (but still uses a fit/transform model). My suggestion would be to have something close to Prophet, so that people can use both libraries in the same project as smoothly as possible.

Scikit-learn transformers useful links:
https://scikit-learn.org/dev/developers/develop.html
https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py#L136

To be discussed:

Transformers name
Transformers category
TDA features management #28

Output name should be optional

If we are also going to support numpy arrays, the output_name should be required only if the input is a DataFrame. Moreover, we could have a default value for it (e.g. the name of the class).
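The class-name default could be sketched as follows. The `Feature` base class here is a stand-in written for this example, not the library's actual base class.

```python
class Feature:
    """Stand-in base class: output_name falls back to the class name."""

    def __init__(self, output_name=None):
        self.output_name = output_name or type(self).__name__


class ShiftFeature(Feature):
    def __init__(self, shift=1, output_name=None):
        super().__init__(output_name=output_name)
        self.shift = shift


default_name = ShiftFeature(shift=1).output_name
custom_name = ShiftFeature(shift=2, output_name="s_2").output_name
```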

Write tests for FeatureCreation class

The FeatureCreation class does not have any unit tests yet.

base.py
calendar_features.py
feature_creation/index_dependent_features/trend_features.py
feature_creation/standard_features/base.py
feature_creation/standard_features/standard_features.py

Remove unused/untested methods in:

test_base.py
test_time_series_features.py
test_feature_creation.py

Pick a convention for the FeatureTransformers class names

Interpretability: LIME

Description

Implement LIME in order to have a module that helps to better interpret time series models.

Architecture work

Refactor causality_tests tests

Integrate sklearn methods in the library

causality_tests
experimental
feature_creation
models

Originally posted by @nphilou in #45 (comment)

GAR test fails

Description

test_correct_fit_date in test_gar.py fails with Hypothesis testing

Steps/Code to Reproduce

pytest --hypothesis-profile dev giottotime/models/forecasting/tests/test_gar.py

Expected Results

Passed test

Actual Results

FAILED giottotime/models/forecasting/tests/test_gar.py::TestFitPredict::test_correct_fit_date - hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.

CalendarFeature should work with DatetimeIndex

Description

Currently, the CalendarFeature only accepts a PeriodIndex as the index of the input DataFrame. Since this requirement is too strict, it should also accept a DatetimeIndex.
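One possible approach is to normalise the index on entry. The helper `ensure_period_index` below is hypothetical and relies on pandas' `DatetimeIndex.to_period`, which needs a set or inferable frequency.

```python
import pandas as pd


def ensure_period_index(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of X indexed by a PeriodIndex."""
    if isinstance(X.index, pd.DatetimeIndex):
        X = X.copy()
        X.index = X.index.to_period()   # uses the frequency of the index
    return X


X_dt = pd.DataFrame(
    {"time_series": range(5)},
    index=pd.date_range("2019-12-20", periods=5, freq="D"),
)
X_period = ensure_period_index(X_dt)
```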

sklearn-like interface

Use sklearn base classes like BaseEstimator and Mixins.
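As a rough illustration of what the base classes provide, the toy transformer below (the name `ShiftSketch` is made up for this example) gets `get_params`, `set_params` and `fit_transform` without defining them.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ShiftSketch(BaseEstimator, TransformerMixin):
    """Toy transformer: shifts every column by a fixed number of steps."""

    def __init__(self, shift=1):
        self.shift = shift          # stored unchanged, as sklearn expects

    def fit(self, X, y=None):
        return self                 # nothing to learn

    def transform(self, X):
        return X.shift(self.shift).add_suffix(f"__shift_{self.shift}")


X = pd.DataFrame({"time_series": [1.0, 2.0, 3.0, 4.0]})
shifter = ShiftSketch(shift=2)
X_t = shifter.fit_transform(X)      # inherited from TransformerMixin
params = shifter.get_params()       # inherited from BaseEstimator
```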

Clearer documentation about kernel in the holiday feature

Description

The documentation does not explain how the kernel affects the holiday feature. A clearer explanation would help the user understand exactly how the feature behaves.

MultiIndex in GAR prediction DataFrame

Description

The output DataFrame of the predict method of GAR should be multi-indexed, so that every prediction carries its corresponding date. The date should be inferred using the frequency of the original index.
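A sketch of the requested layout, assuming predictions arrive as a DataFrame with a PeriodIndex of forecast origins and one column per step, where column k holds the forecast k steps ahead. The function name and the level names "origin" and "date" are made up for this example.

```python
import pandas as pd


def to_multiindex(preds: pd.DataFrame) -> pd.Series:
    """Hypothetical reshaping: one row per (forecast origin, predicted date)."""
    pieces = []
    for step, col in enumerate(preds.columns, start=1):
        # Integer addition on a PeriodIndex moves by multiples of its frequency.
        target_dates = preds.index + step
        index = pd.MultiIndex.from_arrays(
            [preds.index, target_dates], names=["origin", "date"]
        )
        pieces.append(pd.Series(preds[col].to_numpy(), index=index, name="prediction"))
    return pd.concat(pieces)


origins = pd.period_range("2020-01-01", periods=3, freq="D")
preds = pd.DataFrame({"y_1": [1.0, 2.0, 3.0], "y_2": [1.5, 2.5, 3.5]}, index=origins)
long_preds = to_multiindex(preds)
```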

Check 'seasonal_features' docstrings

Check that the docstrings in the seasonal features are correctly written

Add score in GAR

Maybe use the score of BaseEstimator?

Broken documentation

Truncated description in LinearRegressor doc
https://docs-time.giotto.ai/index.html#module-giottotime.models

Fix imports

Remove all '*' from the imports (not recommended according to the Python guidelines).
According to sklearn conventions, absolute imports should be used only for tests. Change to relative imports for non-test packages.

Refactor trend features

Write tests for Utils module

utils/hypothesis/utils.py
utils/hypothesis/feature_matrices.py

TypeError when using ShiftedPearsonCorrelation with a constant time series

Description

An error is thrown when using ShiftedPearsonCorrelation if one of the time series (i.e. a column in the data frame) is constant.

Steps/Code to Reproduce

This is the example as shown in the documentation with an additional column 'E' which is constant.

from giottotime.causality_tests.shifted_pearson_correlation import ShiftedPearsonCorrelation
import pandas.util.testing as testing
import numpy as np
data = testing.makeTimeDataFrame(freq="s")
data['E'] = np.ones(len(data))
spc = ShiftedPearsonCorrelation(target_col="A")
spc.fit(data)
spc.best_shifts_

spc.transform(data)

Expected Results

No error is thrown.

Actual Results

TypeError Traceback (most recent call last)
in
8 spc.best_shifts_
9
---> 10 spc.transform(data)

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/causality_tests/shifted_pearson_correlation.py in transform(self, data)
119 for col in data_t:
120 if col != self.target_col:
--> 121 data_t[col] = data_t[col].shift(self.best_shifts_[col][self.target_col])
122
123 if self.dropna:

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/series.py in shift(self, periods, freq, axis, fill_value)
4371 def shift(self, periods=1, freq=None, axis=0, fill_value=None):
4372 return super().shift(
-> 4373 periods=periods, freq=freq, axis=axis, fill_value=fill_value
4374 )
4375

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/generic.py in shift(self, periods, freq, axis, fill_value)
9397 if freq is None:
9398 new_data = self._data.shift(
-> 9399 periods=periods, axis=block_axis, fill_value=fill_value
9400 )
9401 else:

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/internals/managers.py in shift(self, **kwargs)
570
571 def shift(self, **kwargs):
--> 572 return self.apply("shift", **kwargs)
573
574 def fillna(self, **kwargs):

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
436 kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
437
--> 438 applied = getattr(b, f)(**kwargs)
439 result_blocks = _extend_blocks(applied, result_blocks)
440

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/internals/blocks.py in shift(self, periods, axis, fill_value)
1351 else:
1352 axis_indexer[axis] = slice(periods, None)
-> 1353 new_values[tuple(axis_indexer)] = fill_value
1354
1355 # restore original order

TypeError: slice indices must be integers or None or have an index method
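A possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the Pearson correlation is undefined for a constant series, so no valid integer shift can be selected for that column. The snippet shows the undefined correlation and a hypothetical pre-filter, `non_constant_columns`, that a caller could apply as a workaround.

```python
import math
import warnings

import pandas as pd

a = pd.Series([0.3, -1.2, 0.8, 2.1, -0.5])
e = pd.Series([1.0] * 5)                 # constant column, like 'E' above

with warnings.catch_warnings():
    warnings.simplefilter("ignore")      # NumPy may warn about a zero variance
    corr = a.corr(e)                     # undefined, so pandas returns NaN


def non_constant_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns with more than one distinct value."""
    return [col for col in df.columns if df[col].nunique(dropna=True) > 1]


kept = non_constant_columns(pd.DataFrame({"A": a, "E": e}))
```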

Versions

Write tests for ModelSelection module

train_test_splitters/base.py
splitters.py

Refactor:

test_splitter.py

Refactor of LinearRegressor

The LinearRegressor class should be refactored.

The 'fit' method should not have any arguments except for X
In general, the 'fit' method is too complex and should not modify input parameters such as the kwargs by adding or removing elements.

Rewrap FunctionTransformer

Description

CustomFeature can be a wrapper of sklearn.preprocessing.FunctionTransformer with a DataFrame as output.

Polynomial can also fit this design.
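A minimal sketch of the wrapping idea, using composition rather than subclassing. The factory name `as_dataframe_feature` and the column naming scheme are assumptions, and the wrapped function is assumed to keep the number of columns unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def as_dataframe_feature(func, output_name):
    """Hypothetical factory: apply func via FunctionTransformer and always
    return a DataFrame that keeps the input index."""
    transformer = FunctionTransformer(func, validate=False)

    def apply(X: pd.DataFrame) -> pd.DataFrame:
        values = np.asarray(transformer.fit_transform(X))
        columns = [f"{col}__{output_name}" for col in X.columns]
        return pd.DataFrame(values, index=X.index, columns=columns)

    return apply


X = pd.DataFrame(
    {"time_series": [0.0, 1.0, 3.0]},
    index=pd.period_range("2020-01-01", periods=3, freq="D"),
)
log_feature = as_dataframe_feature(np.log1p, "log1p")
X_log = log_feature(X)
```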

Refactor of X-independent features

The features that do not strictly require X should go into a different package and have a different base class.

Add minimum length and exclude NaN in series generator in hypothesis

Error in FeatureCreation when using the same feature type twice

Description

If I have multiple features of the same type (e.g. two ShiftFeatures) in the FeatureCreation function, I will get an error that the same name is used twice.

Steps/Code to Reproduce

import pandas as pd
from giottotime.feature_creation import FeatureCreation
from giottotime.feature_creation import ShiftFeature, MovingAverageFeature
ts = pd.DataFrame([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
shift_feature1 = ShiftFeature(shift=1)
shift_feature2 = ShiftFeature(shift=2)
mv_avg_feature = MovingAverageFeature(window_size=2)
feature_creation = FeatureCreation(horizon=3,
                                    time_series_features=[shift_feature1,
                                                          shift_feature2,
                                                          mv_avg_feature])

Expected Results

No error is thrown. Even though it is possible to specify a feature name in the corresponding step itself, it would be nice if this were not necessary and names were generated automatically.

Actual Results

ValueError Traceback (most recent call last)
in
9 time_series_features=[shift_feature1,
10 shift_feature2,
---> 11 mv_avg_feature])

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/feature_creation/feature_creation.py in __init__(self, time_series_features, horizon)
68
69 def __init__(self, time_series_features: List[Feature], horizon: int = 5):
---> 70 _check_feature_names(time_series_features)
71
72 self.time_series_features = time_series_features

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/feature_creation/feature_creation.py in _check_feature_names(time_series_features)
11 if len(set(feature_output_names)) != len(feature_output_names):
12 raise ValueError(
---> 13 "The input features should all have different names, instead "
14 f"they are {feature_output_names}."
15 )

ValueError: The input features should all have different names, instead they are ['ShiftFeature', 'ShiftFeature', 'MovingAverageFeature'].
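Automatic naming, as requested under Expected Results, could be sketched like this. The function `unique_names` and the `_1`, `_2` suffix scheme are assumptions; a real implementation would also need to handle a user-supplied name that already ends in such a suffix.

```python
from collections import Counter


def unique_names(names):
    """Hypothetical helper: number repeated names, leave unique ones untouched."""
    totals = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if totals[name] == 1:
            result.append(name)
        else:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
    return result


generated = unique_names(["ShiftFeature", "ShiftFeature", "MovingAverageFeature"])
```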
