
giotto-time's Introduction


giotto-time

giotto-time is a machine learning based time series forecasting toolbox in Python. It is part of the Giotto collection of open-source projects and aims to provide feature extraction, analysis, causality testing and forecasting models based on scikit-learn API.

License

giotto-time is distributed under the AGPLv3 license. If you need a different distribution license, please contact the L2F team at [email protected].

Documentation

Getting started

Get started with giotto-time by following the installation steps below. Simple tutorials and real-world use cases can be found as notebooks in the example folder.

Installation

User installation

Run this command in your favourite Python environment:

pip install giotto-time

Developer installation

Get the latest state of the source code with the command

git clone https://github.com/giotto-ai/giotto-time.git
cd giotto-time
pip install -e ".[tests, doc]"

Example

from gtime import compose, feature_generation, forecasting, model_selection, preprocessing
from gtime.feature_extraction import MovingAverage, Shift
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Create random DataFrame with DatetimeIndex
X_dt = pd.DataFrame(np.random.randint(4, size=(20)),
                    index=pd.date_range("2019-12-20", "2020-01-08"),
                    columns=['time_series'])

# Convert the DatetimeIndex to PeriodIndex and create y matrix
X = preprocessing.TimeSeriesPreparation().transform(X_dt)
y = model_selection.horizon_shift(X, horizon=2)

# Create some features
cal = feature_generation.Calendar(region="europe", country="Switzerland", kernel=np.array([1, 2]))
X_f = compose.FeatureCreation(
    [('s_2', Shift(2), ['time_series']),
     ('ma_3', MovingAverage(window_size=3), ['time_series']),
     ('cal', cal, ['time_series'])]).fit_transform(X)

# Train/test split
X_train, y_train, X_test, y_test = model_selection.FeatureSplitter().transform(X_f, y)

# Use GAR (built on sklearn's MultiOutputRegressor) with a LinearRegression base model as the forecaster
gar = forecasting.GAR(LinearRegression())
gar.fit(X_train, y_train).predict(X_test)

Contributing

We welcome new contributors of all experience levels. The Giotto community goals are to be helpful, welcoming, and effective. To learn more about making a contribution to giotto-time, please see the CONTRIBUTING.rst file.

Links

Community

Giotto Slack workspace: https://slack.giotto.ai/

Contacts

[email protected]

giotto-time's People

Contributors

alexbacce, ben-j-l2f, ckae95, davide-burba, deatinor, giotto-learn, hella, matteocao, niko992, nphilou, sburyachenko, srinidhi-patil

giotto-time's Issues

Causality test should test correlation just with respect to the target column

Description

At the moment, the causality test classes let the user specify the target column. However, this column is only used in the 'transform' method. During 'fit', the causality is computed between all pairs of columns instead of only between each column and the target column. As a result, 'fit' is slow and performs unnecessary computations when the user only cares about the relationship with the target.
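A minimal sketch of the proposed behaviour follows. It assumes the causality score is a Pearson correlation evaluated over a range of non-negative shifts. The helper best_shifts_to_target and its max_shift parameter are hypothetical and are not giotto-time API. Only the n-1 (column, target) pairs are scored, instead of every pair of columns:

```python
import numpy as np
import pandas as pd


def best_shifts_to_target(data: pd.DataFrame, target_col: str, max_shift: int = 10) -> pd.Series:
    """Hypothetical helper: for each non-target column, find the shift that
    maximises its Pearson correlation with ``target_col`` only."""
    best = {}
    for col in data.columns:
        if col == target_col:
            continue  # only (col, target) pairs are scored, never (col, other_col)
        corrs = {
            shift: data[target_col].corr(data[col].shift(shift))
            for shift in range(max_shift + 1)
        }
        best[col] = max(corrs, key=corrs.get)
    return pd.Series(best, name="best_shift")
```

With this design, the cost of 'fit' grows linearly with the number of columns rather than quadratically.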

Wrong direction of the shift in the causality tests

Description

The shift is applied in the wrong direction when calling transform on the causality tests (both ShiftedPearsonCorrelation and ShiftedLinearCoefficient).

Steps/Code to Reproduce

import pandas as pd
from pandas.util import testing as testing
from giottotime.causality_tests import ShiftedLinearCoefficient, ShiftedPearsonCorrelation
from typing import List

def make_df_from_expected_shifts(expected_shifts: List[int]) -> pd.DataFrame:
    testing.N, testing.K = 500, 1

    df = testing.makeTimeDataFrame(freq="D")
    for sh, k in zip(expected_shifts, range(3)):
        df[f"shift_{k}"] = df["A"].shift(sh)
    df = df.dropna()

    return df

df = make_df_from_expected_shifts([2])

slc = ShiftedLinearCoefficient(target_col="A")
slc.fit(df)

slc.transform(df).head(10)

Expected Results

After the transformation both columns should be aligned.

Actual Results

The 'transform' step performs the shift in the wrong direction thereby doubling the shift.

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.6 (default, Jan 8 2020, 13:42:34)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.0
SciPy 1.4.1
joblib 0.14.1
Scikit-Learn 0.22
giotto-Learn 0.1.3
giotto-time 0.1.0
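The expected direction can be checked with plain pandas. The 2-step lag mirrors the reproduction above, but nothing in this sketch is giotto-time code. A column lagged by +2 is realigned by shifting it by -2. Shifting it by +2 again doubles the lag, which matches the reported symptom:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.normal(size=10)})
lag = 2
df["shift_0"] = df["A"].shift(lag)  # shift_0[t] == A[t - lag]

# Realigning a lagged column requires the opposite sign
aligned = df["shift_0"].shift(-lag)  # aligned[t] == A[t]

# Re-applying the same sign doubles the lag instead (the reported bug)
doubled = df["shift_0"].shift(lag)  # doubled[t] == A[t - 2 * lag]
```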

Refactor of causality tests

Description

There is a lot of duplicated code in the causality tests. A refactor is needed to make the code cleaner and more maintainable.

Feature_creation should check for the output_name

Right now, FeatureCreation does not check the output_name of the features. If the user gives the same name to multiple features, those features overwrite each other. The FeatureCreation class should check the names of all features and raise a ValueError if any name is duplicated.
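One way to implement the check is sketched below. It assumes features are passed as (name, transformer, columns) tuples, as in the compose.FeatureCreation example in the README. The helper check_unique_names is hypothetical:

```python
from collections import Counter
from typing import List, Tuple


def check_unique_names(features: List[Tuple[str, object, List[str]]]) -> None:
    """Hypothetical validation helper for FeatureCreation: raise if two
    features share the same output name, listing the offending names."""
    counts = Counter(name for name, _, _ in features)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"Feature names must be unique; duplicated: {duplicates}")
```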

Change train_test_split

  • train_test_split should behave as expected, without trimming NaNs away
  • rename the function to "trim". It should have a flag for whether to trim NaNs from "X" and "y", and it should also return the trimmed pieces
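A possible shape for the "trim" function is sketched below. The signature, the trim_X/trim_y flags and the four-element return value are assumptions that illustrate the idea. They are not an existing giotto-time API:

```python
import numpy as np
import pandas as pd


def trim(X: pd.DataFrame, y: pd.DataFrame, trim_X: bool = True, trim_y: bool = True):
    """Hypothetical ``trim`` helper: drop rows with NaNs in X and/or y and
    return both the kept rows and the trimmed-away rows."""
    mask = pd.Series(True, index=X.index)
    if trim_X:
        mask &= X.notna().all(axis=1)  # e.g. NaNs at the start from Shift features
    if trim_y:
        mask &= y.notna().all(axis=1)  # e.g. NaNs at the end from horizon_shift
    return X[mask], y[mask], X[~mask], y[~mask]
```

Returning the trimmed rows lets the caller reuse them, for example the last rows of X, which have no known y yet, as the input for forecasting.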

Pipeline for first test

A draft of a Pipeline class should be developed, in order to test the current status of the project and show it to Matteo with some real data.

Literature and sources section in the documentation

Description

This is not a bug. The library would benefit from referencing the relevant literature. I looked through the code and the project structure and could not find any specific references.

Steps/Code to Reproduce

Not necessary

Expected Results

Better documentation.

Actual Results

Lacking documentation

Versions

Version as of 24/12/2019

Numpy support

The Features and the models should work with a numpy array as input

Score function

Description

The models should have a 'score' method, similar to the scikit-learn one. Moreover, multi-step models like GAR should have an additional 'horizon_score' method that computes the score for each forecast step.
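A sketch of the two methods as plain functions follows. It assumes R^2 as the metric, which is what scikit-learn regressors use in 'score'. It relies on sklearn.metrics.r2_score and its multioutput parameter. The function names simply mirror the proposal:

```python
import numpy as np
from sklearn.metrics import r2_score


def score(y_true, y_pred) -> float:
    """Single number, like scikit-learn's ``score``: R^2 averaged over horizons."""
    return r2_score(y_true, y_pred)  # default multioutput="uniform_average"


def horizon_score(y_true, y_pred) -> np.ndarray:
    """One R^2 value per forecast step, i.e. one per column of y."""
    return r2_score(y_true, y_pred, multioutput="raw_values")
```

With this definition, 'score' equals the mean of 'horizon_score', so the two stay consistent.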

RegressorChain model options

GARFF model (GAR with feedforward=True) should have two options:

  • Training on X + y_preds
  • Training on X + y_truth (sklearn.multioutput.RegressorChain)
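The two options differ only in what is appended to the features before training step h. The loop below is a hypothetical sketch, not GARFF's implementation. Option 2 corresponds to how sklearn.multioutput.RegressorChain trains by default. At prediction time, both options must feed their own predictions forward:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def fit_chain(base_estimator, X, y, use_predictions: bool):
    """Hypothetical feed-forward GAR training loop.

    Step h is trained on X plus the targets of steps 0..h-1, taken either
    from the model's own predictions (option 1) or from the ground truth
    (option 2, as in sklearn.multioutput.RegressorChain).
    """
    models = []
    features = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    for h in range(y.shape[1]):
        model = clone(base_estimator).fit(features, y[:, h])
        models.append(model)
        extra = model.predict(features) if use_predictions else y[:, h]
        features = np.column_stack([features, extra])
    return models


def predict_chain(models, X):
    """At prediction time both options feed their own predictions forward."""
    features = np.asarray(X, dtype=float)
    preds = []
    for model in models:
        p = model.predict(features)
        preds.append(p)
        features = np.column_stack([features, p])
    return np.column_stack(preds)
```

Option 1 trains each step on inputs that resemble what it will see at prediction time. Option 2 trains on cleaner inputs but can suffer from that mismatch.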

'UFuncTypeError' thrown in the DetrendedFeature function when changing the frequency to months or weeks

Description

'UFuncTypeError' thrown in the DetrendedFeature function when changing the frequency to months or weeks

Steps/Code to Reproduce

import pandas as pd
from giottotime.feature_creation import DetrendedFeature
from giottotime.models import PolynomialTrend
model = PolynomialTrend(order=2)
detrend_feature = DetrendedFeature(trend_model=model)
time_index = pd.date_range("2011-01-01", "2011-11-01", freq='M')
X = pd.DataFrame(range(0, len(time_index)), index=time_index)
detrend_feature.transform(X)

Comment

It works, e.g., if I set the frequency to days or hours, but not if I set it to weeks or months.

Error message

UFuncTypeError Traceback (most recent call last)
in
4 detrend_feature = DetrendedFeature(trend_model=model)
5
----> 6 detrend_feature.transform(pd.DataFrame(test_all, index=time_index))

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/feature_creation/index_dependent_features/trend_features.py in transform(self, time_series)
73 """
74 self.trend_model.fit(time_series)
---> 75 time_series_t = self.trend_model.transform(time_series)
76 time_series_t = self._rename_columns(time_series_t)
77 return time_series_t

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/models/trend_models/polynomial_trend.py in transform(self, ts)
118
119 p = np.poly1d(self.model_weights_)
--> 120 time_steps = (ts.index - self.t0_) / self.period_
121
122 predictions = pd.Series(index=ts.index, data=[p(t) for t in time_steps])

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/indexes/timedeltas.py in method(self, other)
45
46 def method(self, other):
---> 47 result = meth(self._data, maybe_unwrap_index(other))
48 return wrap_arithmetic_op(self, other, result)
49

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/arrays/timedeltas.py in truediv(self, other)
531 elif lib.is_scalar(other):
532 # assume it is numeric
--> 533 result = self._data / other
534 freq = None
535 if self.freq is not None:

UFuncTypeError: ufunc 'true_divide' cannot use operands with types dtype('<m8[ns]') and dtype('O')

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.0
SciPy 1.4.1
joblib 0.14.1
Scikit-Learn 0.22
giotto-Learn 0.1.3
giotto-time 0.1.0
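The traceback suggests that `(ts.index - self.t0_) / self.period_` divides a timedelta array by an object-dtype value. For weekly or monthly frequencies, this value is presumably a calendar offset rather than a fixed Timedelta. A possible workaround, sketched under that assumption, computes integer time steps from a PeriodIndex. Subtracting a Period yields offsets such as `<3 * MonthEnds>`, whose `.n` attribute is the step count. This is plain pandas/NumPy and not PolynomialTrend's code:

```python
import numpy as np
import pandas as pd

# Monthly index from the report, expressed as periods
index = pd.period_range("2011-01", "2011-11", freq="M")
series = pd.Series(np.arange(len(index), dtype=float) ** 2, index=index)

# Period differences are calendar offsets; ``.n`` gives integer steps,
# so no Timedelta is ever divided by a non-fixed offset
time_steps = np.array([offset.n for offset in series.index - series.index[0]])

# Fit a 2nd-order polynomial trend on the integer steps and remove it
weights = np.polyfit(time_steps, series.to_numpy(), deg=2)
trend = np.poly1d(weights)(time_steps)
detrended = series - trend
```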

Feature Transformation

Description

Another step in the Pipeline could be a feature transformation step, placed after feature creation and before model fitting. However, if we want an interface similar to that of the FeatureCreation class, where a list of FeatureTransformation classes is passed and applied sequentially, the interface needs careful design.
Some thoughts:

  • Some features, like PCA, require a fit step, while others do not.
  • Features should have input columns (the ones to transform) and output columns (and drop the others). What should be the interface for this?

Write tests for Models module

  • models/regressors/linear.py
  • models/time_series_models/gar.py
  • models/trend_models/base.py
  • models/trend_models/exponential_trend.py
  • models/trend_models/function_trend.py
  • models/trend_models/polynomial_trend.py
  • models/utils.py

Architectural goals

  • More 'sklearn'-like API #29
  • Numpy array support #31

Proposal A
Decorators to convert into DataFrame

Proposal B
array-like transformers (personal preference)
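Proposal A could look roughly like the sketch below. The decorator accepts_arrays and the ToyShift class are hypothetical and exist only for illustration. This version always returns a DataFrame, which is one possible answer to the open question that follows:

```python
import functools

import numpy as np
import pandas as pd


def accepts_arrays(method):
    """Hypothetical decorator for Proposal A: wrap NumPy input into a
    DataFrame so the decorated method only ever sees DataFrames."""
    @functools.wraps(method)
    def wrapper(self, X, *args, **kwargs):
        if isinstance(X, np.ndarray):
            X = pd.DataFrame(X.reshape(len(X), -1))  # 1-D arrays become one column
        return method(self, X, *args, **kwargs)
    return wrapper


class ToyShift:
    """Toy transformer used only to demonstrate the decorator."""

    def __init__(self, shift: int = 1):
        self.shift = shift

    @accepts_arrays
    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return X.shift(self.shift)
```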

Issues
When the input is an np.array, should we return an np.array or a pd.DataFrame?
See scikit-learn/scikit-learn#5523 (comment)

Comments and links:
SL/DataFrame current support: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

SL is more likely to (better) support DataFrame in the future: https://scikit-learn.org/dev/roadmap.html
SL sample properties future support: scikit-learn/scikit-learn#4497
SL/Pandas mapping: https://github.com/scikit-learn-contrib/sklearn-pandas (not maintained)

Cesium is not fit_transform compliant: cesium-ml/cesium#243
Prophet doesn't use np.array at all and isn't sklearn-API compliant (but it still uses a fit/predict model). My suggestion would be to have something close to Prophet, so that people can use both libraries in the same project as smoothly as possible.

Scikit-learn transformers useful links:
https://scikit-learn.org/dev/developers/develop.html
https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py#L136

To be discussed:

  • Transformers name
  • Transformers category
  • TDA features management #28

Output name should be optional

If we are going to support also numpy arrays, the output_name should be required only if the input is a DataFrame. Moreover, we could have a default value for it (e.g. the name of the class).

Write tests for FeatureCreation class

The FeatureCreation class does not have any unit tests yet.

  • base.py
  • calendar_features.py
  • feature_creation/index_dependent_features/trend_features.py
  • feature_creation/standard_features/base.py
  • feature_creation/standard_features/standard_features.py

Remove unused/untested methods in:

  • test_base.py
  • test_time_series_features.py
  • test_feature_creation.py

Interpretability: LIME

Description

Implement LIME to provide a module that helps interpret time series models.

Architecture work

  • Refactor causality_tests tests

Integrate sklearn methods in the library

  • causality_tests
  • experimental
  • feature_creation
  • models

Originally posted by @nphilou in #45 (comment)

GAR test fails

Description

test_correct_fit_date in test_gar.py fails with Hypothesis testing

Steps/Code to Reproduce

pytest --hypothesis-profile dev giottotime/models/forecasting/tests/test_gar.py

Expected Results

Passed test

Actual Results

FAILED giottotime/models/forecasting/tests/test_gar.py::TestFitPredict::test_correct_fit_date - hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.

CalendarFeature should work with DatetimeIndex

Description

Currently, CalendarFeature only accepts a PeriodIndex as the index of the input DataFrame. Since this requirement is too strict, it should also accept a DatetimeIndex.
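One way to relax the requirement is to normalise the input before the existing PeriodIndex logic runs. The sketch below relies on pandas' DatetimeIndex.to_period. The helper to_period_index is hypothetical:

```python
import pandas as pd


def to_period_index(df: pd.DataFrame, freq=None) -> pd.DataFrame:
    """Hypothetical input normalisation: accept a DatetimeIndex by converting
    it to a PeriodIndex, leave a PeriodIndex untouched, reject anything else."""
    if isinstance(df.index, pd.PeriodIndex):
        return df
    if isinstance(df.index, pd.DatetimeIndex):
        out = df.copy()
        # freq=None reuses (or infers) the frequency of the DatetimeIndex
        out.index = df.index.to_period(freq)
        return out
    raise TypeError("Index must be a PeriodIndex or a DatetimeIndex")
```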

MultiIndex in GAR prediction DataFrame

Description

The output DataFrame of GAR's predict method should be multi-indexed, so that every prediction is paired with its corresponding date. The date should be inferred from the frequency of the original index.
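A sketch of the reshaping follows. It assumes predictions come as an array whose entry [i, h] is the forecast made at index[i] for h + 1 steps ahead, and that the index is a PeriodIndex. Adding an integer to a Period then moves it forward by that many periods of its own frequency. The function and level names are hypothetical:

```python
import numpy as np
import pandas as pd


def to_multiindex_predictions(predictions: np.ndarray, index: pd.PeriodIndex) -> pd.Series:
    """Hypothetical reshaping of GAR output into (origin, target_date) rows.

    ``predictions[i, h]`` is the forecast made at ``index[i]`` for ``h + 1``
    steps ahead; ``index[i] + (h + 1)`` is the date being forecast.
    """
    n_rows, horizon = predictions.shape
    tuples = [
        (index[i], index[i] + (h + 1))
        for i in range(n_rows)
        for h in range(horizon)
    ]
    multi = pd.MultiIndex.from_tuples(tuples, names=["origin", "target_date"])
    # ravel() is row-major, matching the (i, h) order of the tuples
    return pd.Series(predictions.ravel(), index=multi, name="prediction")
```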

Fix imports

  • Remove all '*' from the imports (wildcard imports are discouraged by PEP 8).
  • According to sklearn conventions, absolute imports should be used only for tests. Change to relative imports for non-tests packages

TypeError when using ShiftedPearsonCorrelation with a constant time series

Description

An error is thrown when using ShiftedPearsonCorrelation if one of the time series (i.e. a column in the DataFrame) is constant.

Steps/Code to Reproduce

This is the example as shown in the documentation with an additional column 'E' which is constant.

from giottotime.causality_tests.shifted_pearson_correlation import ShiftedPearsonCorrelation
import pandas.util.testing as testing
import numpy as np
data = testing.makeTimeDataFrame(freq="s")
data['E'] = np.ones(len(data))
spc = ShiftedPearsonCorrelation(target_col="A")
spc.fit(data)
spc.best_shifts_

spc.transform(data)

Expected Results

No error is thrown.

Actual Results


TypeError Traceback (most recent call last)
in
8 spc.best_shifts_
9
---> 10 spc.transform(data)

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/causality_tests/shifted_pearson_correlation.py in transform(self, data)
119 for col in data_t:
120 if col != self.target_col:
--> 121 data_t[col] = data_t[col].shift(self.best_shifts_[col][self.target_col])
122
123 if self.dropna:

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/series.py in shift(self, periods, freq, axis, fill_value)
4371 def shift(self, periods=1, freq=None, axis=0, fill_value=None):
4372 return super().shift(
-> 4373 periods=periods, freq=freq, axis=axis, fill_value=fill_value
4374 )
4375

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/generic.py in shift(self, periods, freq, axis, fill_value)
9397 if freq is None:
9398 new_data = self._data.shift(
-> 9399 periods=periods, axis=block_axis, fill_value=fill_value
9400 )
9401 else:

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/internals/managers.py in shift(self, **kwargs)
570
571 def shift(self, **kwargs):
--> 572 return self.apply("shift", **kwargs)
573
574 def fillna(self, **kwargs):

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
436 kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
437
--> 438 applied = getattr(b, f)(**kwargs)
439 result_blocks = _extend_blocks(applied, result_blocks)
440

~/anaconda3/envs/time/lib/python3.7/site-packages/pandas/core/internals/blocks.py in shift(self, periods, axis, fill_value)
1351 else:
1352 axis_indexer[axis] = slice(periods, None)
-> 1353 new_values[tuple(axis_indexer)] = fill_value
1354
1355 # restore original order

TypeError: slice indices must be integers or None or have an index method

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.0
SciPy 1.4.1
joblib 0.14.1
Scikit-Learn 0.22
giotto-Learn 0.1.3
giotto-time 0.1.0
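A likely cause is that a constant column has zero variance, so its Pearson correlation with any series is NaN. The stored best shift would then be NaN, which `Series.shift` rejects, consistent with the traceback. The sketch below guards against this under that assumption, falling back to a shift of 0. The function best_shift is hypothetical and not the library's code:

```python
import numpy as np
import pandas as pd


def best_shift(target: pd.Series, candidate: pd.Series, max_shift: int = 10) -> int:
    """Hypothetical NaN-safe best-shift search.

    A constant candidate has zero variance, so every correlation is NaN;
    in that case return 0 instead of a NaN shift.
    """
    with np.errstate(invalid="ignore", divide="ignore"):  # silence 0/0 warnings
        corrs = pd.Series(
            {s: target.corr(candidate.shift(s)) for s in range(max_shift + 1)}
        )
    if corrs.isna().all():
        return 0
    return int(corrs.idxmax())  # idxmax skips NaN entries by default
```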

Refactor of LinearRegressor

The LinearRegressor class should be refactored.

  • The 'fit' method should not have any arguments except for X
  • In general, the 'fit' method is too complex and should not modify input parameters such as the kwargs by adding or removing elements.

Rewrap FunctionTransformer

Description

CustomFeature can be a wrapper of sklearn.preprocessing.FunctionTransformer with a DataFrame as output.

Polynomial can also fit this design.
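A sketch of such a wrapper follows, subclassing sklearn.preprocessing.FunctionTransformer. The class body and the output_name/column-naming scheme are assumptions. A polynomial feature could be expressed the same way by passing, e.g., a squaring function as func:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


class CustomFeature(FunctionTransformer):
    """Hypothetical CustomFeature built on FunctionTransformer: apply ``func``
    and always return a DataFrame that keeps the input index."""

    def __init__(self, func=None, output_name: str = "custom", kw_args=None):
        super().__init__(func=func, validate=False, kw_args=kw_args)
        self.output_name = output_name

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        result = np.asarray(super().transform(X))
        result = result.reshape(len(X), -1)  # 1-D results become one column
        columns = [f"{self.output_name}_{i}" for i in range(result.shape[1])]
        return pd.DataFrame(result, index=X.index, columns=columns)
```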

Error in FeatureCreation when using the same feature type twice

Description

If I pass multiple features of the same type (e.g. two ShiftFeatures) to the FeatureCreation class, I get an error saying that the same name is used twice.

Steps/Code to Reproduce

import pandas as pd
from giottotime.feature_creation import FeatureCreation
from giottotime.feature_creation import ShiftFeature, MovingAverageFeature
ts = pd.DataFrame([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
shift_feature1 = ShiftFeature(shift=1)
shift_feature2 = ShiftFeature(shift=2)
mv_avg_feature = MovingAverageFeature(window_size=2)
feature_creation = FeatureCreation(horizon=3,
                                    time_series_features=[shift_feature1,
                                                          shift_feature2,
                                                          mv_avg_feature])

Expected Results

No error is thrown. Even though it is possible to specify a name when creating each feature, it would be nice if this were not necessary and names were generated automatically.

Actual Results


ValueError Traceback (most recent call last)
in
9 time_series_features=[shift_feature1,
10 shift_feature2,
---> 11 mv_avg_feature])

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/feature_creation/feature_creation.py in __init__(self, time_series_features, horizon)
68
69 def __init__(self, time_series_features: List[Feature], horizon: int = 5):
---> 70 _check_feature_names(time_series_features)
71
72 self.time_series_features = time_series_features

~/anaconda3/envs/time/lib/python3.7/site-packages/giottotime/feature_creation/feature_creation.py in _check_feature_names(time_series_features)
11 if len(set(feature_output_names)) != len(feature_output_names):
12 raise ValueError(
---> 13 "The input features should all have different names, instead "
14 f"they are {feature_output_names}."
15 )

ValueError: The input features should all have different names, instead they are ['ShiftFeature', 'ShiftFeature', 'MovingAverageFeature'].

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.0
SciPy 1.4.1
joblib 0.14.1
Scikit-Learn 0.22
giotto-Learn 0.1.3
giotto-time 0.1.0
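Automatic naming could follow the scheme below. It uses the class name and appends a counter only when the same class appears more than once, so the default name from the error message is kept whenever it is already unique. The helper unique_feature_names is hypothetical:

```python
from collections import Counter, defaultdict
from typing import List


def unique_feature_names(features: List[object]) -> List[str]:
    """Hypothetical naming scheme: use the class name, and append a counter
    only when the same class appears more than once."""
    class_names = [type(feature).__name__ for feature in features]
    totals = Counter(class_names)
    seen = defaultdict(int)
    names = []
    for name in class_names:
        if totals[name] == 1:
            names.append(name)
        else:
            names.append(f"{name}_{seen[name]}")
            seen[name] += 1
    return names
```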
