tinkoff-ai / etna

ETNA – Time-Series Library

Home Page: https://etna.tinkoff.ru

License: Apache License 2.0

Makefile 0.05% Python 91.19% Shell 0.09% Jupyter Notebook 8.61% Dockerfile 0.06%

python machine-learning time-series forecasting timeseries deep-learning

etna's Introduction

Predict your time series the easiest way

Homepage | Documentation | Tutorials | Contribution Guide | Release Notes

ETNA is an easy-to-use time series forecasting framework. It includes built-in toolkits for time series preprocessing and feature generation, a variety of predictive models with a unified interface (from classic machine learning to SOTA neural networks), model combination methods, and smart backtesting. ETNA is designed to make working with time series simple, productive, and fun.

ETNA is the first open-source Python framework of the Tinkoff.ru Artificial Intelligence Center. The library started as an internal product in our company. We now use it in more than 10 projects, so we release updates often. Contributions are welcome: check our Contribution Guide.

Get started

Let's load and prepare the data.

import pandas as pd
from etna.datasets import TSDataset

# Read the data
df = pd.read_csv("examples/data/example_dataset.csv")

# Create a TSDataset
df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")

# Choose a horizon
HORIZON = 14

# Make train/test split
train_ts, test_ts = ts.train_test_split(test_size=HORIZON)

Define transformations and model:

from etna.models import CatBoostMultiSegmentModel
from etna.transforms import DateFlagsTransform
from etna.transforms import DensityOutliersTransform
from etna.transforms import FourierTransform
from etna.transforms import LagTransform
from etna.transforms import LinearTrendTransform
from etna.transforms import MeanTransform
from etna.transforms import SegmentEncoderTransform
from etna.transforms import TimeSeriesImputerTransform
from etna.transforms import TrendTransform

# Prepare transforms
transforms = [
    DensityOutliersTransform(in_column="target", distance_coef=3.0),
    TimeSeriesImputerTransform(in_column="target", strategy="forward_fill"),
    LinearTrendTransform(in_column="target"),
    TrendTransform(in_column="target", out_column="trend"),
    LagTransform(in_column="target", lags=list(range(HORIZON, 122)), out_column="target_lag"),
    DateFlagsTransform(week_number_in_month=True, out_column="date_flag"),
    FourierTransform(period=360.25, order=6, out_column="fourier"),
    SegmentEncoderTransform(),
    MeanTransform(in_column=f"target_lag_{HORIZON}", window=12, seasonality=7),
    MeanTransform(in_column=f"target_lag_{HORIZON}", window=7),
]

# Prepare model
model = CatBoostMultiSegmentModel()

Fit Pipeline and make a prediction:

from etna.pipeline import Pipeline

# Create and fit the pipeline
pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)
pipeline.fit(train_ts)

# Make a forecast
forecast_ts = pipeline.forecast()

Let's plot the results:

from etna.analysis import plot_forecast

plot_forecast(forecast_ts=forecast_ts, test_ts=test_ts, train_ts=train_ts, n_train_samples=50)

Print the metric value across the segments:

from etna.metrics import SMAPE

metric = SMAPE(mode="macro")
metric_value = metric(y_true=test_ts, y_pred=forecast_ts)
>>> {'segment_b': 3.3017151519000967, 'segment_c': 5.270557433427279, 'segment_a': 5.272811627335398, 'segment_d': 4.689085450895735}

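For readers unfamiliar with the metric, SMAPE for one segment is commonly defined as 100/n times the sum of 2|y - ŷ| / (|y| + |ŷ|). The sketch below is a plain-Python illustration of that formula, not ETNA's implementation. The helper name `smape`, the zero-denominator convention, and the sample values are assumptions made for illustration.

```python
from typing import Sequence


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        denominator = abs(actual) + abs(predicted)
        # Assumed convention: a point where both values are zero adds no error.
        if denominator > 0:
            total += 2.0 * abs(actual - predicted) / denominator
    return 100.0 * total / len(y_true)


# Hypothetical values, one score per segment.
per_segment = {
    "segment_a": smape([100.0, 110.0], [100.0, 90.0]),
    "segment_b": smape([50.0, 50.0], [50.0, 50.0]),
}
```

Lower is better: a perfect forecast scores 0, and the score is bounded above by 200.
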
Installation

ETNA is available on PyPI, so you can use pip to install it.

Install default version:

pip install --upgrade pip
pip install etna

The default version doesn't contain all the dependencies, because some of them are needed only for specific models, e.g. Prophet, PyTorch. Available user extensions are the following:

prophet: adds prophet model,
torch: adds models based on neural nets,
wandb: adds wandb logger,
auto: adds AutoML functionality,
classification: adds time series classification functionality.

Install extension:

pip install etna[extension-name]

Install all extensions:

pip install etna[all]

There are also developer extensions. All the extensions are listed in pyproject.toml.

Without the appropriate extension you will get an ImportError when you try to import a model that needs it. For example, etna.models.ProphetModel needs the prophet extension and can't be used without it.

Configuration

ETNA supports configuration files. This means the library checks that all the specified packages are installed before the script starts, NOT at runtime.

To set up a configuration for your project, create a .etna file at the project's root. To see the available options, look at Settings. There is also an example of a configuration file.

Tutorials

We have also prepared a set of tutorials for an easy introduction:

Notebook	Interactive launch
Get started
Backtest
EDA
Regressors and exogenous data
Custom model and transform
Deep learning models
Ensembles
Outliers
Forecasting strategies
Forecast interpretation
Clustering
AutoML
Inference: using saved pipeline on a new data
Hierarchical time series
Classification
Feature selection

Documentation

ETNA documentation is available here.

Community

To ask questions or discuss the library, you can join our Telegram chat. The Discussions section on GitHub is also open for this purpose.

Resources

Forecasting with ETNA: Fast and Furious on Medium
ETNA Regressors on Medium
Time series forecasting with ETNA: first steps on Medium
ETNA: Time Series Analysis. What, why and how? on Medium
Time Series Forecasting Strategies in ETNA on Medium
Tabular Playground Series - Mar 2022 (7th place!) on Kaggle
Store sales prediction with etna library on Kaggle
Tabular Playground Series - Jan 2022 on Kaggle
EDA notebook for Ubiquant Market Prediction on Kaggle
PyCon Russia September 2021 talk on YouTube
ETNA Meetup Jun 2022 on YouTube
DUMP May 2022 talk on YouTube
Applied Problems of Data Analysis, Lecture 8: Time Series 2 (in Russian) on YouTube

Acknowledgments

ETNA.Team

Andrey Alekseev, Nikita Barinov, Dmitriy Bunin, Aleksandr Chikov, Vladislav Denisov, Martin Gabdushev, Sergey Kolesnikov, Artem Makhin, Ivan Mitskovets, Albina Munirova, Julia Shenshina, Yuriy Tarasyuk, Konstantin Vedernikov, Ivan Nedosekov, Rodion Petrov

ETNA.Contributors

WinstonDovlatov, mvakhmenin, Carlosbogo, Pacman1984, looopka, Artem Levashov, Aleksey Podkidyshev

License

Feel free to use our library in your commercial and private applications.

ETNA is covered by Apache 2.0. Read more about this license here

Please note that etna[prophet] is covered by GPL 2.0 due to pystan package.

etna's Issues

MyPy type checks

🚀 Feature Request

Add Mypy type checks as a CI step.

Motivation

Usually developers forget to add types to their code, however types help while writing code and make code maintainable.

Proposal

Add Mypy type check as CI step.

Test cases

No need

Alternatives

Manual type checking
flake8-plugin annotation checking

Checklist

feature proposal description
motivation
extra proposal context / proposal alternatives review

[BUG] cap/floor in ProphetModel

🐛 Bug Report

ProphetModel filters out columns based on this rule. However, Prophet has reserved column names, such as "cap" and "floor": censored

These columns are filtered out by this rule:
https://github.com/tinkoff-ai/etna-ts/blob/26d88da5f1c305da666e1ea218a3a57222603cd2/etna/models/prophet.py#L87-L91

Expected behavior

Should not filter out columns "cap" and "floor"

How To Reproduce

Add a "cap" column to a TSDataset and apply ProphetModel to it.

Code

Screenshots

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

Add regressor property to TSDataset

🚀 Feature Request

Add regressor property which works similar to segment property:

https://github.com/tinkoff-ai/etna-ts/blob/6c12eb6ab66c5bd1c8d5018243e13f5e32f8bce0/etna/datasets/tsdataset.py#L219-L222

Motivation

It will eliminate need for such code checks:
https://github.com/tinkoff-ai/etna-ts/blob/6c12eb6ab66c5bd1c8d5018243e13f5e32f8bce0/etna/models/prophet.py#L113

Proposal

Add regressor property to TSDataset which returns list of all regressors in dataset

Test cases

Check that all regressor names are returned

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Add minimal examples to Prophet model methods

🚀 Feature Request

Add an Example section to the Prophet model.
It should be added to the class description and show how to use the Prophet model.
The expected result should be a TSDataset (not its values)

Motivation

Make the documentation as helpful as possible

Proposal

Create Example section in ProphetModel.
Generate dataframe
Fit model
Create future dataset
Forecast
Check that the result is a TSDataset or that all future dates are filled (not NaN)

Test cases

This should be tested by doc test

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

`inverse_transform` in `TimeSeriesImputerTransform`

🚀 Feature Request

Add inverse_transform methods for TimeSeriesImputerTransform

Motivation

For an inplace transformation we expect something like inverse_transform(transform(x)) == x.

Proposal

Save list of missing values on fit_transform stage, replace the corresponding timestamps with None in inverse_transform
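
A minimal sketch of this proposal, assuming a series is modelled as a plain dict from timestamp string to value. `ForwardFillImputer` is a hypothetical stand-in and not the real TimeSeriesImputerTransform; it only illustrates remembering the missing timestamps at fit time and restoring them later.

```python
from typing import Dict, Optional, Set

# Assumption for illustration: a series is an insertion-ordered dict, timestamp -> value.
Series = Dict[str, Optional[float]]


class ForwardFillImputer:
    """Toy forward-fill imputer that remembers which timestamps it filled."""

    def __init__(self) -> None:
        self._missing_timestamps: Optional[Set[str]] = None

    def fit_transform(self, series: Series) -> Series:
        # Save the gaps so that inverse_transform can put them back.
        self._missing_timestamps = {ts for ts, value in series.items() if value is None}
        filled: Series = {}
        last_seen: Optional[float] = None
        for timestamp, value in series.items():
            if value is not None:
                last_seen = value
            # Leading gaps stay None because there is nothing to carry forward.
            filled[timestamp] = last_seen
        return filled

    def inverse_transform(self, series: Series) -> Series:
        if self._missing_timestamps is None:
            raise ValueError("call fit_transform before inverse_transform")
        # Only timestamps seen as missing during fitting are reset to None.
        return {ts: (None if ts in self._missing_timestamps else value) for ts, value in series.items()}


train = {"2021-01-01": 1.0, "2021-01-02": None, "2021-01-03": 3.0, "2021-01-04": None}
future = {"2021-01-05": 5.0, "2021-01-06": 6.0}

imputer = ForwardFillImputer()
filled = imputer.fit_transform(train)
restored = imputer.inverse_transform(filled)
future_restored = imputer.inverse_transform(future)
```

Keying the saved gaps by timestamp rather than by position is what leaves a future dataset unchanged, which is the second property the test cases below ask for.
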

Test cases

For TimeSeriesImputerTransform add tests that check inverse_transform(transform(x)) == x + check that inverse_transform(future_x) == future_x

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

No prefix "regressor" in `TimeFlagsTransform` class

🐛 Bug Report

Columns generated by TimeFlagsTransform should have prefix "regressor", but they don't.

Expected behavior

Columns generated by TimeFlagsTransform should have prefix "regressor".

How To Reproduce

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

More obvious Exception Error message while transforming with unfitted transform

🐛 Bug Report

If we use a transform that requires fitting without fitting it first, we get different errors for different classes, and they are difficult to understand. This is relevant for the following classes:

TimeSeriesImputerTransform (with strategy='mean')
PytorchForecastingTransform
SegmentEncoderTransform
SklearnTransform and its children
SpecialDaysTransform

Expected behavior

Raise an easy-to-understand exception when an unfitted transform is used.

How To Reproduce

Take one of transform classes with non-trivial fit method.
Run transform.
See error.

Code

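from etna.transforms import TimeSeriesImputerTransform
# ts is an existing TSDataset; fit is deliberately skipped to trigger the error
transform = TimeSeriesImputerTransform(strategy="mean")
transform.transform(ts)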
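
One way to get the expected behavior is an explicit fitted check at the start of transform. The sketch below is only an illustration with made-up names (`NotFittedError`, `MeanImputer`); it is not the library's code and does not claim ETNA has such an exception.

```python
from typing import List, Optional


class NotFittedError(ValueError):
    """Hypothetical exception raised when transform is called before fit."""


class MeanImputer:
    """Toy transform with a non-trivial fit step: it learns the mean of the series."""

    def __init__(self) -> None:
        self._mean: Optional[float] = None

    def fit(self, values: List[Optional[float]]) -> "MeanImputer":
        observed = [value for value in values if value is not None]
        if not observed:
            raise ValueError("cannot fit on a series with no observed values")
        self._mean = sum(observed) / len(observed)
        return self

    def transform(self, values: List[Optional[float]]) -> List[float]:
        # Fail early with a message that tells the user what to do.
        if self._mean is None:
            raise NotFittedError(f"{type(self).__name__} is not fitted; call fit() before transform()")
        return [self._mean if value is None else value for value in values]
```

With this guard, every transform reports the same kind of error regardless of which internal attribute happens to be unset.
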

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

CLI

🚀 Feature Request

etna-ts should have a CLI

Motivation

It will make integration of etna-ts in production easier

Proposal

etna should be able to run prediction from CLI.
Parameters may be stored in json/yml files.
Input and outputs are csv files.

Test cases

No response

Alternatives

https://github.com/fursovia/allenai-common
https://github.com/catalyst-team/hydra-slayer
https://github.com/facebookresearch/hydra

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Models missing documentation

Additional context

For some classes, documentation for init exists only for the private version of the class, so it is not visible in the generated docs.
Classes with missing docs:

catboost models
sklearn models

CLI api. Part 1. Forecast.

🚀 Feature Request

A CLI command forecast which can be used for automatic forecasting without any code.

Motivation

It will be helpful for novices.
Most importantly, it will help us integrate the new major version into our infrastructure.

Proposal

Using

etna forecast pipeline.yaml --dataset=dataset.csv --exog=exog.csv --output=output.csv

For initialization from yaml we should use hydra-slayer
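
The command line proposed above could be parsed roughly as follows. This is a sketch using the standard-library argparse module, chosen only to keep the example self-contained; the issue does not say which CLI framework should be used, and loading the pipeline from YAML with hydra-slayer is left out.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `etna forecast <config> --dataset=... --exog=... --output=...`."""
    parser = argparse.ArgumentParser(prog="etna")
    subparsers = parser.add_subparsers(dest="command", required=True)

    forecast = subparsers.add_parser("forecast", help="forecast with a pipeline described in YAML")
    forecast.add_argument("config", help="path to the pipeline YAML file")
    forecast.add_argument("--dataset", required=True, help="CSV file with the target series")
    # Treating exogenous data as optional is an assumption of this sketch.
    forecast.add_argument("--exog", default=None, help="CSV file with exogenous data")
    forecast.add_argument("--output", required=True, help="CSV file to write the forecast to")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse_args(
    ["forecast", "pipeline.yaml", "--dataset=dataset.csv", "--exog=exog.csv", "--output=output.csv"]
)
```
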

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Links

Part of #65

Add prefix "regressor_*" in features

🚀 Feature Request

Some features should add prefix "regressor_*".

Motivation

Some features (such as lags) can be computed for future timestamps, so these features must add the "regressor" prefix to the new columns.

Proposal

Should add "regressor_" prefix to column/columns name

Test cases

Check if newly created columns have proper names

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Add missing dependency GitPython for generating docs

The pyproject.toml file is missing a dependency that is needed to generate the docs: GitPython.

Clustering example notebook

🚀 Feature Request

Add example notebook for clustering pipeline

Motivation

We need to show how to use it

Proposal

I think we should use series generated with etna.datasets.generate_datasets

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Write in documentation about inplace nature of forecast

Additional context

Currently the documentation does not mention that forecast makes predictions inplace, which confuses users of the library.
For example, a user fits three models, generates a TSDataset for forecasting, and then forecasts on that same dataset with all three models sequentially. The results will be the same, which can be confusing.
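
The pitfall can be reproduced without the library. In the sketch below `ConstantModel` and the dict-based dataset are toy stand-ins (assumptions, not ETNA classes); the only point is that a forecast method which writes into its argument makes all results alias one object.

```python
import copy
from typing import Dict, List

# Assumption for illustration: a dataset is a dict, segment -> future target values.
Dataset = Dict[str, List[float]]


class ConstantModel:
    """Toy model that fills every future value with a constant."""

    def __init__(self, value: float) -> None:
        self.value = value

    def forecast(self, dataset: Dataset) -> Dataset:
        # Writes predictions into the dataset it was given and returns that same object.
        for segment in dataset:
            dataset[segment] = [self.value] * len(dataset[segment])
        return dataset


models = [ConstantModel(1.0), ConstantModel(2.0), ConstantModel(3.0)]

# Pitfall: every model overwrites the same object, so all three results are identical.
shared_future: Dataset = {"segment_a": [0.0, 0.0]}
aliased = [model.forecast(shared_future) for model in models]

# Workaround: give each model its own copy of the future dataset.
template: Dataset = {"segment_a": [0.0, 0.0]}
independent = [model.forecast(copy.deepcopy(template)) for model in models]
```

In library terms the equivalent workaround would be to build a fresh future dataset for each model, which is the behavior the documentation should spell out.
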

docstring typing issue TimeSeriesCrossValidation

Issue:
We pass list of metrics in TimeSeriesCrossValidation, not a dictionary.

https://github.com/tinkoff-ai/etna-ts/blob/29f821bb429d85b233a65651692b60ac8fb3554a/etna/model_selection/backtest.py#L46

Fully parallel backtest

🚀 Feature Request

I propose making TimeSeriesCrossValidation.backtest fully parallel. Currently the folds are generated in a loop, while only train/forecast runs in parallel.

Motivation

It can make applying backtest faster, because dataset transformations can take a lot of time.

Proposal

Rewrite calling joblib.Parallel inside TimeSeriesCrossValidation.backtest in a way that TimeSeriesCrossValidation._generate_folds_dataframes will be inside joblib.delayed.
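
The idea can be sketched with the standard library, using ThreadPoolExecutor as a stand-in for joblib.Parallel and joblib.delayed so the example has no dependencies. The function names, the expanding-window split, and the naive last-value model are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def make_fold(series: List[float], fold_index: int, n_folds: int, horizon: int) -> Tuple[List[float], List[float]]:
    """Expanding-window split; the last fold's test part ends at the end of the series."""
    test_end = len(series) - (n_folds - 1 - fold_index) * horizon
    test_start = test_end - horizon
    return series[:test_start], series[test_start:test_end]


def run_fold(series: List[float], fold_index: int, n_folds: int, horizon: int) -> float:
    # The fold is generated inside the task, so this step is parallelised as well.
    train, test = make_fold(series, fold_index, n_folds, horizon)
    prediction = train[-1]  # naive last-value forecast as a stand-in model
    return sum(abs(actual - prediction) for actual in test) / len(test)


def backtest(series: List[float], n_folds: int, horizon: int, n_jobs: int = 2) -> List[float]:
    """Return the mean absolute error of each fold, in fold order."""
    if n_folds * horizon >= len(series):
        raise ValueError("series is too short for the requested folds")
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(run_fold, series, i, n_folds, horizon) for i in range(n_folds)]
        return [future.result() for future in futures]


series = [float(i) for i in range(10)]
fold_errors = backtest(series, n_folds=3, horizon=2)
```

Each task receives only the fold index and builds its own train/test split, so expensive dataset preparation is no longer serialized in the calling loop.
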

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

[BUG] Forecast link in EDA notebook

🐛 Bug Report

There is a link in the table of contents that refers to nothing

Expected behavior

No empty links

How To Reproduce

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

Transformers missing documentation

Additional context

For some classes, documentation for __init__ exists only for the _OneSegment version of the class, so it is not visible in the generated docs.
Classes with missing docs:

TimeSeriesImputerTransform
LagTransform (not missing, but has a typo and doesn't have enough description)
YeoJohnsonTransform
BoxCoxTransform
SpecialDaysTransform

Partial dependencies

🚀 Feature Request

In etna, users should be able to install only part of its dependencies.

Motivation

I'm always frustrated when I can't install etna because of Prophet, even though I don't need it.

Proposal

Make prophet and pytorch installation optional.

etna <- install everything except prophet and pytorch and pytorch-forecasting
etna[pytorch] <- install pytorch and pytorch-forecasting
etna[prophet] <- install prophet
etna[all] <- install all models

For the ease of importing, I suggest we use Catalyst way
https://github.com/catalyst-team/catalyst/blob/3f18d6b6543f007a18c59e11be66617a7debb5c2/catalyst/settings.py#L201-L202

This way we can check deps only once, during the first initialisation of etna-ts
https://github.com/catalyst-team/catalyst/blob/3f18d6b6543f007a18c59e11be66617a7debb5c2/catalyst/settings.py#L162-L168
and then we can import packages this way
https://github.com/catalyst-team/catalyst/blob/3f18d6b6543f007a18c59e11be66617a7debb5c2/catalyst/loggers/wandb.py#L10-L11

Test cases

No response

Alternatives

Keep as is

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

STLTransform

🚀 Feature Request

Add transform that applies STL decomposition

Motivation

It is widely used for series exploration, we can use it as a series preprocessing method

Proposal

Add STLTransform that applies inplace trend and seasonality subtraction.
https://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.seasonal_decompose.html

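class _OneSegmentSTLTransform(Transform):
    def fit(self, x):
        # Fit the decomposition once and keep it for later calls
        self._fit_seasonal_decompose(x)
        return self

    def transform(self, x):
        # Subtract trend and seasonality inplace
        trend, seasonal = self._get_seasonal_decomposition(x)
        x -= trend + seasonal
        return x

    def inverse_transform(self, x):
        # Add back the same components that transform removed
        trend, seasonal = self._get_seasonal_decomposition(x)
        x += trend + seasonal
        return x

class STLTransform(PerSegmentTransform):
    ...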

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Adding `BaseMixin` with `repr` method

In the linked PR I am going to add a class BaseMixin with __repr__ method in order to add loggers in the future.

[BUG] fail test_outliers_detection

🐛 Bug Report

Test fails on DensityOutliersTransform(in_column="target"), get_anomalies_density

self = <DatetimeArray>
[]
Length: 0, dtype: datetime64[ns]
other = [numpy.datetime64('2021-01-11T00:00:00.000000000')]

Expected behavior

Test should pass

How To Reproduce

Run test test_outliers_detection

https://github.com/tinkoff-ai/etna-ts/blob/3268b7e0d11a64d7bd3bdf836875c920b1c1833c/tests/test_transforms/test_outliers_transform.py#L22-L43

Environment

MacOS, from master

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

Adding poetry.lock

Add poetry.lock to the repository to make installing the library faster.

NaNs in the end of time series after OutliersTransform

🐛 Bug Report

When the dataset contains outliers at the end of some (but not all) time series, OutliersTransform replaces them with NaNs, so the time series no longer end at the same timestamp. This causes errors in the subsequent transforms.

Expected behavior

Perform subsequent transforms without raising errors

How To Reproduce

Code

import pandas as pd
from etna.datasets.tsdataset import TSDataset
from etna.transforms import DensityOutliersTransform, TimeSeriesImputerTransform

classic_df = pd.read_csv("examples/data/example_dataset.csv")
df = TSDataset.to_dataset(classic_df)
ts = TSDataset(df, freq="D")

params = {"window_size":18,
          "distance_coef":1,
          "n_neighbors":4}
outliers_remover = DensityOutliersTransform(in_column="target",**params)
outliers_imputer = TimeSeriesImputerTransform(in_column="target",strategy="running_mean",window=30)
ts.fit_transform([outliers_remover])
ts.fit_transform([outliers_imputer])

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

Create class Pipeline

🚀 Feature Request

Class Pipeline should:

combine model and features
have fit and forecast methods.
implement logic for feature generation
apply inverse_transform to answers

Motivation

This structure will help make prediction easier with less boilerplate code.

Proposal

Create class Pipeline.
As arguments Pipeline expects model, features, horizon.
Implement 2 methods: fit and forecast.

.fit:

expects TSDataset
fit and apply features on given TSDataset
fit model

.forecast:

call make_future method with horizon parameter
apply model to future dataset
call inverse_transform method on forecast dataset
return forecast dataset
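
The fit and forecast steps above can be sketched with toy components. Everything here is a simplified stand-in: series are plain lists, `ShiftTransform` and `LastValueModel` are made up for illustration, and the make_future step is reduced to passing the horizon to the model.

```python
from typing import List, Optional


class ShiftTransform:
    """Toy invertible transform: subtracts the training mean."""

    def __init__(self) -> None:
        self._mean = 0.0

    def fit(self, values: List[float]) -> "ShiftTransform":
        self._mean = sum(values) / len(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [value - self._mean for value in values]

    def inverse_transform(self, values: List[float]) -> List[float]:
        return [value + self._mean for value in values]


class LastValueModel:
    """Toy model: repeats the last training value."""

    def __init__(self) -> None:
        self._last: Optional[float] = None

    def fit(self, values: List[float]) -> "LastValueModel":
        self._last = values[-1]
        return self

    def forecast(self, horizon: int) -> List[float]:
        if self._last is None:
            raise ValueError("model is not fitted")
        return [self._last] * horizon


class Pipeline:
    def __init__(self, model: LastValueModel, transforms: List[ShiftTransform], horizon: int) -> None:
        self.model = model
        self.transforms = transforms
        self.horizon = horizon

    def fit(self, values: List[float]) -> "Pipeline":
        # Fit and apply each transform in order, then fit the model on the result.
        for transform in self.transforms:
            values = transform.fit(values).transform(values)
        self.model.fit(values)
        return self

    def forecast(self) -> List[float]:
        # Forecast in the transformed space, then undo the transforms in reverse order.
        forecast = self.model.forecast(self.horizon)
        for transform in reversed(self.transforms):
            forecast = transform.inverse_transform(forecast)
        return forecast


pipeline = Pipeline(model=LastValueModel(), transforms=[ShiftTransform()], horizon=3)
forecast = pipeline.fit([1.0, 2.0, 3.0]).forecast()
```

Applying inverse_transform in reverse order matters as soon as more than one inplace transform is stacked.
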

Test cases

Features fitted
Model fitted
.forecast method returns same result as script without it

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Create instruction for custom model and transform creation

🚀 Feature Request

Add jupyter notebook with example for custom model and transform creation

Motivation

Make contributing to etna easier

Proposal

Create custom LGBM model and custom floor and ceil transform

Test cases

No need

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Correlation_heatmap example -> EDA notebook

🚀 Feature Request

Add example of correlation_heatmap to EDA notebook.

Motivation

Forgot about it 😿

Proposal

Add heatmap for example dataset + lags

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Weekend flag as a part of DateFlagsTransform

🚀 Feature Request

Add an is_weekend argument to DateFlagsTransform(..., is_weekend: bool = True).
If True, DateFlagsTransform.transform creates an is_weekend column.

Motivation

Return WeekendFeature as a part of DateFlagsTransform

Proposal

is_weekend is categorical

Test cases

Should pass WeekendFeature tests

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Add StackingEnsemble class

🚀 Feature Request

StackingEnsemble is a class that combines results from several Pipelines.
It uses meta model for combination.

The logic of this Ensemble follows https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingRegressor.html#sklearn.ensemble.StackingRegressor

Motivation

Make Ensembles in etna-ts even more useful

Proposal

Because we need to train our meta model we ought to use backtest.
The algorithm is assumed to be as follows:

Initialise StackingEnsemble with a model as the final model and cv. cv can be None or an integer of at least 2. By default cv=3.
During fit all pipelines are run through backtest to get predictions.
Then the final model is trained on these predictions.
Finally the pipelines are trained on the whole dataset.
During forecast the pipelines return predictions, which are then run through final_model.

se = StackingEnsemble([Pipeline1, Pipeline2], horizon = 7, final_model=Model, cv=None)
se.fit(data)
se.forecast()

If Pipeline1 and Pipeline2 have different horizon parameters, StackingEnsemble should raise an error saying that the horizons must be equal.

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Default params for `plot_correlation_heatmap`

🚀 Feature Request

Set default params suitable for the method

Motivation

For example, now colormap is scaled automatically so it gives confusing visualization

Proposal

add check of heatmap_kwargs:

if not given, set heatmap_params = {"vmin": -1, "vmax": 1}
do nothing otherwise
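
The check described above is small enough to show in full. This is a sketch with a hypothetical helper name (`resolve_heatmap_kwargs`), not the actual body of plot_correlation_heatmap.

```python
from typing import Any, Dict, Optional

# Correlations lie in [-1, 1], so a fixed colour scale keeps plots comparable.
DEFAULT_HEATMAP_KWARGS: Dict[str, Any] = {"vmin": -1, "vmax": 1}


def resolve_heatmap_kwargs(heatmap_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Use the defaults only when the caller passed nothing; otherwise keep the input as given."""
    if heatmap_kwargs is None:
        return dict(DEFAULT_HEATMAP_KWARGS)
    return dict(heatmap_kwargs)
```
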

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Jupyter Notebook with NN examples

🚀 Feature Request

Add Jupyter Notebook with NN examples

Motivation

It will make usage of neural networks easier for new users

Proposal

Use datasets from get_started.ipynb and create example with neural networks

Test cases

No need for ipynb

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Make backtest work with pipeline

🚀 Feature Request

Backtest should work with pipeline not bare model. It makes backtest easier as a class.

Motivation

Backtest implements logic that is now a part of the Pipeline class.

Proposal

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

`out_column` in Transforms

🚀 Feature Request

My case
I want to compute two scale-based features for my series: in macro and per-segment modes. Now I can't do it because they create the same named columns.
Example:

import pandas as pd
import numpy as np
from etna.datasets import TSDataset
from etna.transforms.scalers import MinMaxScalerTransform

periods = 100
df1 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
df1["segment"] = "segment_1"
df1["target"] = np.random.uniform(10, 20, size=periods)

df2 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
df2["segment"] = "segment_2"
df2["target"] = np.random.uniform(1, 5, size=periods)

df = pd.concat([df1, df2]).reset_index(drop=True)
df = TSDataset.to_dataset(df)
tsds = TSDataset(df, freq="D")

transforms = [
    MinMaxScalerTransform(in_column="target", inplace=False),
    MinMaxScalerTransform(in_column="target", inplace=False, mode="macro"),
]

tsds.fit_transform(transforms)
tsds[:, "segment_1", :]

Motivation

It's a confusing behavior

Proposal

Add optional arg out_column to all the transforms that can be used in non inplace mode. Default value is None. If out_column is given use it, use f"[regressor_]?{in_column}_{self.__repr__}" otherwise.
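
The naming rule can be sketched as a small helper. The function name, its parameters, and the repr strings below are illustrative assumptions; the optional "regressor_" prefix is modelled with a boolean flag.

```python
from typing import Optional


def resolve_out_column(
    in_column: str,
    transform_repr: str,
    out_column: Optional[str] = None,
    is_regressor: bool = False,
) -> str:
    """Return the user-supplied name if given, otherwise build one from the transform's repr."""
    if out_column is not None:
        return out_column
    prefix = "regressor_" if is_regressor else ""
    return f"{prefix}{in_column}_{transform_repr}"


# Two transforms on the same column no longer collide, because their reprs differ.
per_segment_name = resolve_out_column("target", "MinMaxScalerTransform(mode='per-segment')")
macro_name = resolve_out_column("target", "MinMaxScalerTransform(mode='macro')")
```
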

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Hardcoded frequency in `PytorchForecastingTransform`

🐛 Bug Report

Now we have this line in PytorchForecastingTransform.fit:

ts = TSDataset(df, "1d")

The problem is that this df can have any frequency.

Expected behavior

Frequency is inferred with pd.infer_freq instead of being hardcoded.

How To Reproduce

Use a non-daily dataset with neural models.

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

Did you write any new necessary tests?

BinsegTrendTransform

🚀 Feature Request

Add BinsegTrendTransform that works with multiple trends in series.

Motivation

Works great in experiments!

Proposal

BinSegTrendTransform finds changepoints, fits detrend model for each trend interval and subtracts the trend from the origin series.

Test cases

No response

Alternatives

Needs discussion:
we can use it as a simple detrend model, or add an inplace option and generate a column with the trend.

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Delete `offset` in `WindowStatisticsTransform`

🚀 Feature Request

Delete offset arg because the correct way is to use lags first and statistics after it

Motivation

Too much responsibility for this class

Proposal

in request

Test cases

Update actual test cases

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Refactoring of linear trend models

There are some errors in signatures of methods in etna.transforms.detrend. There is also some discrepancy in naming classes inside this file with naming in other transforms.

More obvious Exception Error message while forecasting with unfitted model

Issue:
It's not obvious from the error how to fix the problem in the following case.

from etna.models import NaiveModel

model = NaiveModel(lag=12)

#model.fit(train_ts)  this step is important!

future_ts = train_ts.make_future(HORIZON)
forecast_ts = model.forecast(future_ts)

Traceback:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_5058/597995025.py in <module>
      8 #Make the forecast
      9 future_ts = train_ts.make_future(HORIZON)
---> 10 forecast_ts = model.forecast(future_ts)

~/projects/etna-ts/etna/models/base.py in forecast(self, ts)
    100 
    101         result_list = list()
--> 102         for segment in self._segments:
    103             model = self._models[segment]
    104 

TypeError: 'NoneType' object is not iterable

`inverse_transform` in `*OutliersTransform`

🚀 Feature Request

Add inverse_transform methods for all the OutliersTransforms

Motivation

For an inplace transformation we expect something like inverse_transform(transform(x)) == x.

Proposal

Save outliers dict (with real outliers values) in OutliersTransform.transform method. In inverse_transform set the series with values from the dict.

Test cases

For all OutliersTransforms add tests that check inverse_transform(transform(x)) == x + check that inverse_transform(future_x) == future_x

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Add minimal examples to CatBoost model methods

🚀 Feature Request

Add an Example section to the CatBoost models.
It should be added to the class description and show how to use the CatBoost model.
The expected result should be a TSDataset (not its values)

Motivation

Make the documentation as helpful as possible

Proposal

Create Example section in CatBoostModelPerSegment and CatBoostModelMultiSegment.
Generate dataframe (not constant)
Calculate some simple features (lags for example)
Fit model
Create future dataset
Forecast
Check that the result is a TSDataset or that all future dates are filled (not NaN)

Test cases

This should be tested by doc test

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Absence of return block in `get_anomalies_median`

Additional context

There is also a problem with missing return type in get_anomalies_density (return types should be typed to be visible in documentation).

Add minimal examples to all TSDatasets methods

🚀 Feature Request

Add Example sections to all public methods in TSDataset class.

Motivation

Example section in documentation makes library usage easier.

Proposal

Add minimal examples to all methods

Test cases

Run doctest.
Switch on doctest in pytest by default.

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Remove black and add flake8 plugins instead

🚀 Feature Request

Motivation

Black does not comply with PEP (PyCQA/pycodestyle#373, psf/black#315)
Flake8 plugins can be tuned as we wish and make developers pay more attention to code style.

Proposal

Remove black.
Add following plugins:

Test cases

No need

Alternatives

Keep as is

Checklist

feature proposal description
motivation
extra proposal context / proposal alternatives review

Adding poetry.lock to .gitignore

Adding poetry.lock to .gitignore makes it easier to add changes to the repository using `git add .`

SARIMAX -> new arch

parameter grid-search option within model fitting step

🚀 Feature Request

With the structure of this package it seems like there is a great opportunity to further improve the model selection process using the existing cross-validation and metrics infrastructure by adding grid-search option to model fitting.
I believe it would require only one more class function in the base model, which could be called either from the 'fit' function (i.e. if 'grid-search' selection was turned on) or as a separate 'fit_grid_search'.
If there were any configurable parameters in the model, the function would test the combinations using the timeseries cross-validation settings, and fit the final model with the best performing parameters. By default, for models without configurable parameter sets, it would just return the fitted model as it does now.

Motivation

I'm currently developing tools to fit many models to a dataset and then select the best-performing ones for prediction. The TSDataset, model, transform, and selection infrastructure in this package is excellent, but somewhat difficult to navigate externally. I think it would be easier to add grid search to the system (where parameter ranges are relatively well defined) than to expect users to struggle to get it working outside the package.

Alternatively, if someone believes there is an easy way to do grid search from outside the package, it would be great to add as an example notebook.

Proposal

Identify parameter ranges and add these as class attributes at the model level. Add a fit_grid_search function to the base model class that takes data, metrics, and horizon, searches through the parameter sets with time series cross-validation, and then fits the final model with the best-performing set.
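
The idea can be sketched in plain Python, independent of the etna API. Everything below is an assumption made for illustration: `MovingAverageModel` is a hypothetical toy model, `param_grid` is a hypothetical class attribute, and `fit_grid_search` is written as a free function over a plain list rather than a method on the real base model class.

```python
from itertools import product
from statistics import mean


class MovingAverageModel:
    """Hypothetical toy model: forecasts the mean of the last `window` points."""

    # Hypothetical class-level parameter grid, as suggested in the proposal.
    param_grid = {"window": [1, 2, 4]}

    def __init__(self, window=1):
        self.window = window
        self._level = None

    def fit(self, series):
        self._level = mean(series[-self.window:])
        return self

    def forecast(self, horizon):
        return [self._level] * horizon


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_grid_search(model_cls, series, horizon, n_folds=3, metric=mae):
    """Pick the best parameters by expanding-window CV, then refit on all data."""
    if len(series) <= n_folds * horizon:
        raise ValueError("Series is too short for the requested folds.")
    names = list(model_cls.param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(model_cls.param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for fold in range(n_folds, 0, -1):
            # Each fold holds out `horizon` points; earlier folds see less history.
            split = len(series) - fold * horizon
            train, test = series[:split], series[split:split + horizon]
            forecast = model_cls(**params).fit(train).forecast(horizon)
            scores.append(metric(test, forecast))
        score = mean(scores)
        if score < best_score:
            best_params, best_score = params, score
    # Refit on the full series with the best-performing parameter set.
    return model_cls(**best_params).fit(series), best_params, best_score


series = [float(x) for x in range(1, 13)]
model, params, score = fit_grid_search(MovingAverageModel, series, horizon=2)
```

A model without configurable parameters would correspond to an empty `param_grid`, in which case the loop evaluates a single empty parameter set and returns the fitted model.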

Test cases

perSegmentModels vs. multiSegmentModels
benchmarking performance with and without grid-search for each model

Alternatives

Adding example notebooks explaining how to do grid-search of etna models manually using the framework.

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Running notebooks during the CI/CD long build step

🚀 Feature Request

Notebook testing.

Motivation

Example notebooks are very important, and they should stay up to date as the API changes.
That's why we should run and check them on every build.

Proposal

Add a command that runs the notebooks to the long build step in CI.

jupyter nbconvert --to notebook --execute mynotebook.ipynb
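
One possible way to apply this command to a whole folder is a small Python wrapper like the sketch below. The function names and the idea of passing a notebook directory are assumptions for illustration; the only part taken from the proposal is the `jupyter nbconvert` invocation itself.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(notebook):
    """Return the nbconvert invocation from the proposal for one notebook."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(notebook)]


def run_notebooks(directory):
    """Execute every notebook in `directory`, stopping at the first failure."""
    executed = []
    for notebook in sorted(Path(directory).glob("*.ipynb")):
        # check=True turns a non-zero exit code into CalledProcessError,
        # which would fail the CI step.
        subprocess.run(build_nbconvert_command(notebook), check=True)
        executed.append(notebook)
    return executed
```

A CI job could then call `run_notebooks` with the path to the example notebooks; a notebook that raises during execution makes nbconvert exit with an error, so the build fails.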

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal

Unable to run parallel backtest with neural network models

🐛 Bug Report

An error occurs when using TimeSeriesCrossValidation.backtest with n_jobs > 1 and CUDA.

Expected behavior

Running parallel backtest.

How To Reproduce

Code

Example dataset was taken from datasets for example notebooks.

import pandas as pd
from etna.models.nn import DeepARModel
from etna.datasets.tsdataset import TSDataset
from etna.model_selection import TimeSeriesCrossValidation
from etna.metrics import SMAPE
from etna.transforms import PytorchForecastingTransform


classic_df = pd.read_csv("data/example_dataset.csv")
df = TSDataset.to_dataset(classic_df)
ts = TSDataset(df, freq="D")

HORIZON = 7

transform_deepar = PytorchForecastingTransform(
    max_encoder_length=HORIZON,
    max_prediction_length=HORIZON,
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["target"],
    target_normalizer=None,
)

model = DeepARModel(max_epochs=3, learning_rate=[0.1], gpus=[0], batch_size=64)
tscv_deepar = TimeSeriesCrossValidation(
    model=model, horizon=HORIZON, metrics=[SMAPE()], n_folds=3, n_jobs=2
)

tscv_deepar.backtest(ts, transforms=[transform_deepar])

Environment

Standard colab environment with installed etna-ts using pip.

Additional context

No response

Checklist

Bug appears at the latest library version
Bug description added
Steps to reproduce added
Expected behavior added

Add VotingEnsemble class

🚀 Feature Request

VotingEnsemble is a class that combines the results of several Pipelines.
It combines them using a weight for each model. By default, each model's weight is 1/N, where N is the number of models.

The logic of this Ensemble follows https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingRegressor.html#sklearn.ensemble.VotingRegressor

Motivation

Make it easier to construct ensembles from etna-ts models.

Proposal

ve = VotingEnsemble([Pipeline1, Pipeline2], horizon=7, weights=None)
ve.fit(data)
ve.forecast()

If Pipeline1 and Pipeline2 have different horizon parameters, VotingEnsemble should raise an error stating that the horizons must be equal.
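
The behavior described above can be sketched in plain Python. This is not the library's implementation: `ConstantPipeline` is a hypothetical stand-in for a real pipeline, forecasts are plain lists, and normalizing the weights by their sum is an assumption modeled on weighted averaging.

```python
class ConstantPipeline:
    """Hypothetical stand-in for a pipeline: always forecasts `value`."""

    def __init__(self, value, horizon):
        self.value = value
        self.horizon = horizon

    def fit(self, data):
        return self

    def forecast(self):
        return [self.value] * self.horizon


class VotingEnsemble:
    """Sketch of an ensemble that averages pipeline forecasts with weights."""

    def __init__(self, pipelines, horizon, weights=None):
        if any(p.horizon != horizon for p in pipelines):
            raise ValueError("All pipelines must have the same horizon.")
        if weights is None:
            # Equal weights, i.e. 1/N per model after normalization.
            weights = [1.0] * len(pipelines)
        if len(weights) != len(pipelines):
            raise ValueError("Expected one weight per pipeline.")
        total = sum(weights)
        self.pipelines = pipelines
        self.horizon = horizon
        self.weights = [w / total for w in weights]

    def fit(self, data):
        for pipeline in self.pipelines:
            pipeline.fit(data)
        return self

    def forecast(self):
        forecasts = [p.forecast() for p in self.pipelines]
        # Weighted sum of the individual forecasts at each future step.
        return [
            sum(w * f[step] for w, f in zip(self.weights, forecasts))
            for step in range(self.horizon)
        ]


ve = VotingEnsemble(
    [ConstantPipeline(10.0, horizon=7), ConstantPipeline(20.0, horizon=7)],
    horizon=7,
)
ve.fit(None)
combined = ve.forecast()
```

Checking the horizons in the constructor means a mismatch is reported before any model is fitted.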

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

Added feature request
Added motivation
Added proposal