timeeval / gutentag

GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.

License: MIT License

Python 100.00%
time-series dataset-generation datasets anomaly-detection time-series-anomaly-detection multivariate-timeseries univariate-timeseries

gutentag's Introduction

A good Timeseries Anomaly Generator.

GutenTAG is an extensible tool to generate time series datasets with and without anomalies. A GutenTAG time series consists of a single (univariate) or multiple (multivariate) channels containing a base oscillation with different anomalies at different positions and of different kinds.

tl;dr

  1. Install GutenTAG from PyPI:

    pip install timeeval-gutenTAG

    GutenTAG supports Python 3.7, 3.8, 3.9, 3.10, and 3.11; all other requirements are installed by the pip call above.

  2. Create a generation configuration file example-config.yaml with the instructions to generate a single time series with two anomalies: a pattern anomaly in the middle and an amplitude anomaly at the end of the series. You can use the following content:

    timeseries:
    - name: demo
      length: 1000
      base-oscillations:
      - kind: sine
        frequency: 4.0
        amplitude: 1.0
        variance: 0.05
      anomalies:
      - position: middle
        length: 50
        kinds:
        - kind: pattern
          sinusoid_k: 10.0
      - position: end
        length: 10
        kinds:
        - kind: amplitude
          amplitude_factor: 1.5
  3. Execute GutenTAG with a seed and let it plot the time series:

    gutenTAG --config-yaml example-config.yaml --seed 11 --no-save --plot

    You should see the following time series:

    [Figure: example unsupervised time series with two anomalies]

Documentation

GutenTAG's documentation can be found here.

Citation

If you use GutenTAG in your project or research, please cite our demonstration paper:

Phillip Wenig, Sebastian Schmidl, and Thorsten Papenbrock. TimeEval: A Benchmarking Toolkit for Time Series Anomaly Detection Algorithms. PVLDB, 15(12): 3678 - 3681, 2022. doi:10.14778/3554821.3554873

@article{WenigEtAl2022TimeEval,
  title = {TimeEval: {{A}} Benchmarking Toolkit for Time Series Anomaly Detection Algorithms},
  author = {Wenig, Phillip and Schmidl, Sebastian and Papenbrock, Thorsten},
  date = {2022},
  journaltitle = {Proceedings of the {{VLDB Endowment}} ({{PVLDB}})},
  volume = {15},
  number = {12},
  pages = {3678 -- 3681},
  doi = {10.14778/3554821.3554873}
}

Contributing

We welcome contributions to GutenTAG. If you have spotted an issue with GutenTAG or if you want to enhance it, please open an issue first. See Contributing for details.

gutentag's People

Contributors

arrrrrmin, b-deforce, codelionx, dependabot[bot], gezelligheid, wenig

gutentag's Issues

Amplitude anomaly fails with `amplitude_bell` size offset

Hi there 👋

First of all, thanks for the great repository.
I came across this example, which seems to fail:

timeseries:
  - name: test-amp
    length: 1000
    base-oscillations:
      - kind: sine
    anomalies:
      - position: end
        length: 47
        channel: 0
        kinds:
          - kind: amplitude
            amplitude_factor: 2.0

Running it with

python -m gutenTAG --config-yaml tests/configs/config-amp.yaml --seed 42 --no-save --plot

fails to create the anomaly:

Generating datasets:   0%|                                                                                      | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/gutenTAG.py", line 132, in generate
    results: List[Tuple[Dict, Dict[str, Any], Optional[List[ExtTimeSeries]]]] = Parallel(n_jobs=n_jobs)(
                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/gutenTAG.py", line 157, in internal_generate
    ts.generate(ctx.seed)
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/generator/timeseries.py", line 37, in generate
    self.timeseries, self.labels = consolidator.generate(GenerationContext(seed=self._create_new_seed(random_seed)))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/consolidator.py", line 43, in generate
    self.generate_anomalies(ctx)
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/consolidator.py", line 67, in generate_anomalies
    anomaly_protocol = anomaly.generate(ctx.to_anomaly(current_base_oscillation, positions))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/anomalies/__init__.py", line 65, in generate
    protocol = anomaly.generate(protocol)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/anomalies/types/amplitude.py", line 55, in generate
    subsequence = anomaly_protocol.base_oscillation.timeseries[anomaly_protocol.start:anomaly_protocol.end] * amplitude_bell
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
ValueError: operands could not be broadcast together with shapes (47,) (46,) 

Maybe I'm wrong, but as far as I can see the error originates in:

transition_length = int(length * 0.2)

combined with the creeping length that is passed in:

anomaly_protocol, custom_anomaly_length=int(anomaly_length * 0.8)

I'll open a PR soon, if that is fine with you.
Thanks again for the repo 👍
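
The mismatch can be reproduced without GutenTAG. The sketch below only mirrors the two truncations quoted above. How GutenTAG combines the two values internally is an assumption here, and the function names are hypothetical. For a length of 47, the two `int(...)` calls give 9 and 37, which add up to 46 and match the `(46,)` shape in the traceback. Deriving the second length from the first is one possible way to keep the total exact.

```python
from typing import Tuple


def split_lengths_truncating(length: int) -> Tuple[int, int]:
    """Mirror the two independent truncations quoted in the issue."""
    transition_length = int(length * 0.2)
    plateau_length = int(length * 0.8)
    return transition_length, plateau_length


def split_lengths_exact(length: int) -> Tuple[int, int]:
    """Hypothetical fix: derive the second part from the first one."""
    transition_length = int(length * 0.2)
    plateau_length = length - transition_length
    return transition_length, plateau_length


length = 47
print(sum(split_lengths_truncating(length)))  # 46, one sample short
print(sum(split_lengths_exact(length)))       # 47
```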

Incompatible types: pattern -> cylinder_bell_funnel

Hi, it seems that the pattern anomaly is not compatible with CBF base oscillations.

Here is my config file:

timeseries:
- name: demo
  length: 1000
  semi-supervised: true
  supervised: false
  channels: 1
  base-oscillations:
  - kind: cylinder_bell_funnel
    frequency: 2.0
    amplitude: 1.0
    variance: 0.05
  anomalies:
  - position: beginning
    length: 100
    kinds:
    - kind: pattern
      cbf_pattern_factor: 2.0

Mapping of anomaly-types on time domain

Is there a clean way to label the individual time steps of a generated time series with their anomaly types? (I am working with the GutenTAG benchmark dataset.)

So far, I have used a run-length encoding of is_anomaly to map the respective positions and anomaly types from the YAML file onto the time steps.

Unfortunately, this cannot produce a correct labeling if, for example, a time series contains two different anomaly types that both have the position 'middle'.

Another example:

image
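
For reference, the run-length approach described above could look roughly like the following sketch. The label vector and the list of kinds are made-up examples, and `anomaly_runs` is a hypothetical helper, not part of GutenTAG. The sketch also shows why the approach is fragile: it assumes that the order of anomalies in the series is known and that two anomalies never touch, because adjacent anomalies would merge into a single run.

```python
from itertools import groupby
from typing import List, Tuple


def anomaly_runs(is_anomaly: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of 1s."""
    runs = []
    index = 0
    for value, group in groupby(is_anomaly):
        run_length = sum(1 for _ in group)
        if value == 1:
            runs.append((index, index + run_length))
        index += run_length
    return runs


labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
# Hypothetical anomaly kinds, listed in the order the anomalies occur in the series.
kinds_in_order = ["pattern", "amplitude"]

per_step = ["normal"] * len(labels)
for (start, end), kind in zip(anomaly_runs(labels), kinds_in_order):
    per_step[start:end] = [kind] * (end - start)

print(anomaly_runs(labels))  # [(2, 5), (7, 9)]
print(per_step)
```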

Add a smooth transition to the anomaly API

Some existing anomaly kinds use smooth transitions. We should allow all (except extremum) anomalies to do this, and therefore move it to the anomaly itself instead of the anomaly kind.

Example configuration:

timeseries:
  - name: "test"
    base-oscillations:
      - kind: "sine"
    anomalies:
      - position: beginning
        length: 100
        transition-window: 10  # <--
        kinds:
          - kind: "amplitude"
            amplitude_factor: 2
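
To make the idea concrete, here is a minimal sketch of what a transition window could mean for an amplitude anomaly. It assumes that the window lies inside the anomaly length on both sides and that the transition is a linear ramp. Neither is specified above, and `amplitude_envelope` is a hypothetical name. The plateau length is derived from the total length, so the envelope always has exactly as many samples as the anomaly.

```python
from typing import List


def amplitude_envelope(length: int, factor: float, transition_window: int) -> List[float]:
    """Per-sample scaling factors: ramp from 1 towards factor, hold, ramp back."""
    if 2 * transition_window > length:
        raise ValueError("transition windows do not fit into the anomaly length")
    # Linear ramp that approaches, but does not yet reach, the full factor.
    ramp_up = [
        1.0 + (factor - 1.0) * (i + 1) / (transition_window + 1)
        for i in range(transition_window)
    ]
    plateau = [factor] * (length - 2 * transition_window)
    return ramp_up + plateau + ramp_up[::-1]


envelope = amplitude_envelope(length=100, factor=2.0, transition_window=10)
print(len(envelope))  # 100
```

Multiplying the anomalous subsequence element-wise by such an envelope would scale the middle by the full factor while leaving the borders close to the original signal.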

Type Mismatch Error When Using Integer Data with Custom Input

Hello,

I encountered an issue when using the custom_input feature with my own dataset, which contains integer values. When I tried to inject an anomaly into my data, I received a numpy.core._exceptions._UFuncOutputCastingError. This error occurred because the apply_variations function in the Consolidator class tried to add float64 values (from bo.noise, bo.trend_series, and bo.offset) to my int64 time series data, resulting in a type mismatch.

Here is the error message:

numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

This error occurred at the following line in the consolidator.py file:

self.timeseries[:, c] += bo.noise + bo.trend_series + bo.offset

The documentation for the CustomInput class does not explicitly state that the input data must be of type float, but I guess in general gutenTAG is designed to work with floating-point time series data.

Suggested Fix:

To prevent this error, I suggest adding a simple check in the CustomInput class to ensure that the input data is of type float. If the data is of type integer, we can automatically convert it to float (maybe with a warning to inform the user).

if df.dtypes[0] == 'int64':
    df = df.astype(float)
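
The underlying NumPy behavior can be shown in isolation. This is a small sketch with made-up arrays and does not use GutenTAG's `CustomInput` class. An in-place addition has to write its float64 result back into the int64 array, which the `same_kind` casting rule forbids. Converting the data to float beforehand avoids the error.

```python
import numpy as np

values = np.array([1, 2, 3, 4], dtype=np.int64)
noise = np.array([0.1, -0.2, 0.05, 0.0])  # float64, like the noise term

try:
    values += noise  # in-place add would have to store float64 results as int64
except TypeError as err:  # the casting error is a subclass of TypeError
    print(type(err).__name__)

# Converting up front lets the in-place addition succeed.
values = values.astype(np.float64)
values += noise
print(values.dtype)  # float64
```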

Add post-processing options

Add configuration options to post-process each individual channel of a time series.

Post-processing options include:

  • smoothing using convolution along t
  • segment smoothing (smooth different parts of the time series differently)
  • scaling and standardization (MinMax, z-score, ...)

Example configuration:

timeseries:
  - name: "mytest"
    length: 1000
    base-oscillations:
      - kind: "sine"
        frequency: 1
      [...]
    anomalies:
      - position: "beginning"
        [...]
    post-processing:
      - channel: 0
        kind: "smoothing"
        factor: 2.0
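
As an illustration of the first option, smoothing along t with a box kernel is a moving average. The sketch below is one possible reading and not an existing GutenTAG feature. The `half_width` parameter is hypothetical, and how the `factor` option above would map to a kernel size is left open.

```python
from typing import List


def moving_average(values: List[float], half_width: int) -> List[float]:
    """Smooth with a centred box kernel of 2 * half_width + 1 samples.

    At the edges, only the part of the window that fits is averaged.
    """
    if half_width < 0:
        raise ValueError("half_width must not be negative")
    smoothed = []
    for t in range(len(values)):
        lo = max(0, t - half_width)
        hi = min(len(values), t + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


channel = [0.0, 0.0, 3.0, 0.0, 0.0]
print(moving_average(channel, half_width=1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Segment smoothing could reuse the same function by applying it to slices of the channel with different `half_width` values.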

New anomaly: Normalize

Normalization anomalies can be constructed by combining mean and amplitude anomaly kinds; however, this is difficult to achieve. Add a new anomaly kind that allows specifying how a subsequence should be normalized. Normalization options:

  • normalize-z
  • normalize-minmax
  • normalize-median
  • normalize-mean
  • normalize-logistic
  • normalize-tanh

Example configuration:

timeseries:
  - name: "test"
    base-oscillations:
      - kind: "sine"
    anomalies:
      - position: beginning
        length: 100
        kinds:
          - kind: "normalize-minmax"
            min: 0
            max: 1
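
A minimal sketch of what normalize-minmax could do to the anomalous subsequence, assuming that `min` and `max` give the target range and that only the subsequence itself is rescaled. The function and parameter names are hypothetical.

```python
from typing import List


def normalize_minmax(values: List[float], start: int, end: int,
                     new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Return a copy where values[start:end] is rescaled to [new_min, new_max]."""
    segment = values[start:end]
    lo, hi = min(segment), max(segment)
    result = list(values)
    if hi == lo:
        # A constant segment has no range; map it to new_min.
        result[start:end] = [new_min] * len(segment)
    else:
        scale = (new_max - new_min) / (hi - lo)
        result[start:end] = [new_min + (v - lo) * scale for v in segment]
    return result


series = [-2.0, 0.0, 2.0, 4.0, -4.0, 0.0]
print(normalize_minmax(series, start=1, end=5))  # [-2.0, 0.5, 0.75, 1.0, 0.0, 0.0]
```

The other options would differ only in the formula applied to the segment, for example subtracting the mean and dividing by the standard deviation for normalize-z.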

New anomaly: pattern-flip

Take a subsequence and flip it vertically:

image

This anomaly can be used for periodic datasets only.
Add an option for a smooth transition in case the start and end of the flipped subsequence do not line up with the surrounding time series.
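
One way to read "flip vertically" is to mirror the subsequence around its own mid-level, halfway between its minimum and maximum. That choice of axis is an assumption, and `flip_vertically` is a hypothetical helper.

```python
from typing import List


def flip_vertically(values: List[float], start: int, end: int) -> List[float]:
    """Return a copy where values[start:end] is mirrored around its mid-level."""
    segment = values[start:end]
    mid = (max(segment) + min(segment)) / 2.0
    result = list(values)
    result[start:end] = [2.0 * mid - v for v in segment]
    return result


series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
print(flip_vertically(series, start=4, end=8))
# [0.0, 1.0, 0.0, -1.0, 0.0, -1.0, 0.0, 1.0]
```

In this example, the flipped segment starts at the same value as the original, so no jump appears at the start. When the borders do not line up, the values jump, which is where the smooth transition would be needed.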

Use existing TS file as base oscillation

Use an existing time series as base oscillation for a new dataset. This allows injecting our anomalies into existing time series. The file must be formatted in our canonical file format.

GutenTAG tries to parse the ground truth (label) information to prevent anomaly overlaps.

Example configuration:

timeseries:
  - name: "test"
    base-oscillations:
      - kind: file
        path: path/to/file.csv
        channel: 0

Create new mode - ts_augmentation

We use this repository to create time series. We would like to introduce a new mode in addition to supervised and semi-supervised, called "ts-augmentation", which produces:

  • time series (original)
  • same time series with anomaly (augmented time series)
  • label

We can provide a small code sample.
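
The three outputs listed above could be produced along these lines. This is only a sketch of the proposed output shape. It uses a plain amplitude scaling as a stand-in anomaly, and `augment` is a hypothetical function, not GutenTAG code.

```python
from typing import List, Tuple


def augment(original: List[float], start: int, end: int,
            factor: float) -> Tuple[List[float], List[float], List[int]]:
    """Return (original, augmented, label) with a scaled segment in [start, end)."""
    augmented = list(original)
    augmented[start:end] = [v * factor for v in original[start:end]]
    label = [1 if start <= t < end else 0 for t in range(len(original))]
    return list(original), augmented, label


original, augmented, label = augment([1.0, -1.0, 1.0, -1.0], start=1, end=3, factor=2.0)
print(augmented)  # [1.0, -2.0, 2.0, -1.0]
print(label)      # [0, 1, 1, 0]
```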
