timeeval / gutentag
GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.
License: MIT License
Hello,
I encountered an issue when using the custom_input feature with my own dataset, which contains integer values. When I tried to inject an anomaly into my data, I received a numpy.core._exceptions._UFuncOutputCastingError. This error occurred because the apply_variations function in the Consolidator class tried to add float64 values (from bo.noise, bo.trend_series, and bo.offset) to my int64 time series data, resulting in a type mismatch.
Here is the error message:
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
This error occurred at the following line in the consolidator.py file:
self.timeseries[:, c] += bo.noise + bo.trend_series + bo.offset
The documentation for the CustomInput class does not explicitly state that the input data must be of type float, but I guess that, in general, GutenTAG is designed to work with floating-point time series data.
To prevent this error, I suggest adding a simple check in the CustomInput class to ensure that the input data is of type float. If the data is of type integer, we can automatically convert it to float (maybe with a warning to inform the user).
if df.dtypes[0] == 'int64':
    df = df.astype(float)
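The failure and the proposed conversion can be reproduced outside GutenTAG with plain NumPy and pandas. The sketch below is only an illustration: the helper name ensure_float and the column name "value-0" are assumptions, not part of GutenTAG's CustomInput class. It also checks every column for any integer dtype instead of only comparing the first column against 'int64'.

```python
import warnings

import numpy as np
import pandas as pd


def ensure_float(df: pd.DataFrame) -> pd.DataFrame:
    """Cast integer columns to float64 and warn when a cast happens (hypothetical helper)."""
    int_cols = [c for c in df.columns if pd.api.types.is_integer_dtype(df[c])]
    if int_cols:
        warnings.warn(f"Casting integer columns {int_cols} to float64")
        df = df.astype({c: np.float64 for c in int_cols})
    return df


# Reproduce the failure: in-place addition of floats into an int64 array.
ints = np.arange(5, dtype=np.int64)
noise = np.full(5, 0.5)
try:
    ints += noise
    failed = False
except TypeError:  # NumPy's output casting error is a TypeError subclass
    failed = True

# After the cast, the same in-place addition succeeds.
df = ensure_float(pd.DataFrame({"value-0": [1, 2, 3, 4, 5]}))  # assumed column name
values = df["value-0"].to_numpy().copy()
values += noise
```

Casting once at load time keeps the rest of the pipeline unchanged, at the cost of silently changing the dtype of the user's data, which is why the warning seems worthwhile.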
Use an existing time series as base oscillation for a new dataset. This allows injecting our anomalies into existing time series. The file must be formatted in our canonical file format.
GutenTAG tries to parse the ground truth (label) information to prevent anomaly overlaps.
Example configuration:
timeseries:
- name: "test"
  base-oscillations:
  - kind: file
    path: path/to/file.csv
    channel: 0
Normalization anomalies can be constructed by combining the mean and amplitude anomaly kinds; however, this is difficult to achieve. Add a new anomaly kind that allows specifying how a subsequence should be normalized. Normalization options:
Example configuration:
timeseries:
- name: "test"
  base-oscillations:
  - kind: "sine"
  anomalies:
  - position: beginning
    length: 100
    kinds:
    - kind: "normalize-minmax"
      min: 0
      max: 1
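As a rough illustration of what a "normalize-minmax" anomaly could do to a subsequence, here is a minimal NumPy sketch. The function name and signature are hypothetical and only mirror the min, max, position, and length fields of the configuration above; this is not GutenTAG code.

```python
import numpy as np


def normalize_minmax(ts, start, length, new_min=0.0, new_max=1.0):
    """Return a copy of `ts` with ts[start:start+length] rescaled to [new_min, new_max]."""
    out = np.asarray(ts, dtype=np.float64).copy()
    seg = out[start:start + length]
    lo, hi = seg.min(), seg.max()
    if hi == lo:
        # Constant subsequence: map everything to the lower bound.
        out[start:start + length] = new_min
    else:
        out[start:start + length] = (seg - lo) / (hi - lo) * (new_max - new_min) + new_min
    return out


# A sine wave with amplitude 2; the first 100 points are squeezed into [0, 1].
t = np.arange(1000)
sine = 2.0 * np.sin(2 * np.pi * t / 100)
anomalous = normalize_minmax(sine, start=0, length=100, new_min=0.0, new_max=1.0)
```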
This doesn't return a dataset with timestamps. I can add my own timestamps, but it would be better if I could configure the start date and time and get a timestamp for each data point.
Example configuration:
timeseries:
- name: "mytest"
  length: 1000
  base-oscillations:
  - kind: "formula"
    formula:
      base:
        kind: "ecg"
        frequency: 1
        variance: 0
      operation:
        kind: "*"
        operand: 1
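Until such an option exists, timestamps can be attached after generation. The following is a minimal sketch with pandas; the start date, the one-minute frequency, and the column names "timestamp" and "value-0" are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for a generated series of length 1000.
values = np.sin(np.linspace(0, 20 * np.pi, 1000))
df = pd.DataFrame({"value-0": values})

# Hypothetical start date and sampling interval.
start = pd.Timestamp("2024-01-01 00:00:00")
df.insert(0, "timestamp", pd.date_range(start=start, periods=len(df), freq="1min"))
```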
Some existing anomaly kinds use smooth transitions. We should allow all (except extremum) anomalies to do this, and therefore move it to the anomaly itself instead of the anomaly kind.
Example configuration:
timeseries:
- name: "test"
  base-oscillations:
  - kind: "sine"
  anomalies:
  - position: beginning
    length: 100
    transition-window: 10 # <--
    kinds:
    - kind: "amplitude"
      amplitude_factor: 2
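One way to picture a transition window is a scaling profile that ramps from 1 to the anomaly's factor at the start of the anomaly and back to 1 at its end. The sketch below uses a linear ramp for an amplitude anomaly; the function and its ramp shape are assumptions for illustration, not how GutenTAG implements transitions.

```python
import numpy as np


def amplitude_with_transition(ts, start, length, amplitude_factor, transition_window):
    """Scale ts[start:start+length], ramping linearly in and out over `transition_window` points."""
    factor = np.full(length, float(amplitude_factor))
    w = min(transition_window, length // 2)  # the two ramps must not overlap
    if w > 0:
        ramp = np.linspace(1.0, amplitude_factor, w, endpoint=False)
        factor[:w] = ramp                  # ramp up at the start
        factor[length - w:] = ramp[::-1]   # ramp down at the end
    out = np.asarray(ts, dtype=np.float64).copy()
    out[start:start + length] *= factor
    return out


flat = np.ones(300)
result = amplitude_with_transition(flat, start=100, length=100,
                                   amplitude_factor=2, transition_window=10)
```

Building the profile with np.full(length, ...) first and then overwriting both ends guarantees that its length always matches the subsequence, also for odd lengths.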
Is there a clean way to label the individual time steps of a generated time series (I am working with the GutenTAG benchmark dataset) with the related anomaly types?
So far, I have used a run-length encoding on is_anomaly to map the respective positions and anomaly types from the YAML.
Unfortunately, a correct labeling cannot be achieved if, for example, a time series contains two different anomaly types that both use the position 'middle'.
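The run-length-encoding approach described above can be sketched as follows. Both function names are hypothetical, and the sketch assumes that the anomaly kinds are already known in temporal order, which is exactly what cannot be derived from the configuration when two anomalies share the same position.

```python
import numpy as np


def anomaly_segments(is_anomaly):
    """Run-length encode a 0/1 label array into (start, end) pairs, end exclusive."""
    labels = np.asarray(is_anomaly).astype(int)
    padded = np.concatenate(([0], labels, [0]))
    changes = np.flatnonzero(np.diff(padded))
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))


def label_points(is_anomaly, kinds_in_order):
    """Assign one anomaly kind per segment; kinds must be given in temporal order."""
    segments = anomaly_segments(is_anomaly)
    if len(segments) != len(kinds_in_order):
        raise ValueError("number of segments does not match number of anomaly kinds")
    point_labels = np.full(len(is_anomaly), "normal", dtype=object)
    for (start, end), kind in zip(segments, kinds_in_order):
        point_labels[start:end] = kind
    return point_labels


is_anomaly = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
segments = anomaly_segments(is_anomaly)
labels = label_points(is_anomaly, ["amplitude", "mean"])
```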
Further exemplary sample:
Hi there,
first of all, thanks for the great repository.
I came across this example which seems to fail:
timeseries:
- name: test-amp
  length: 1000
  base-oscillations:
  - kind: sine
  anomalies:
  - position: end
    length: 47
    channel: 0
    kinds:
    - kind: amplitude
      amplitude_factor: 2.0
Running with: python -m gutenTAG --config-yaml tests/configs/config-amp.yaml --seed 42 --no-save --plot
This fails to create the anomaly:
Generating datasets: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File ".../.venv/lib/python3.11/site-packages/gutenTAG/gutenTAG.py", line 132, in generate
results: List[Tuple[Dict, Dict[str, Any], Optional[List[ExtTimeSeries]]]] = Parallel(n_jobs=n_jobs)(
^^^^^^^^^^^^^^^^^^^^^^^^
File ".../.venv/lib/python3.11/site-packages/joblib/parallel.py", line 1863, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File ".../.venv/lib/python3.11/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File ".../.venv/lib/python3.11/site-packages/gutenTAG/gutenTAG.py", line 157, in internal_generate
ts.generate(ctx.seed)
File ".../.venv/lib/python3.11/site-packages/gutenTAG/generator/timeseries.py", line 37, in generate
self.timeseries, self.labels = consolidator.generate(GenerationContext(seed=self._create_new_seed(random_seed)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../.venv/lib/python3.11/site-packages/gutenTAG/consolidator.py", line 43, in generate
self.generate_anomalies(ctx)
File ".../.venv/lib/python3.11/site-packages/gutenTAG/consolidator.py", line 67, in generate_anomalies
anomaly_protocol = anomaly.generate(ctx.to_anomaly(current_base_oscillation, positions))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../.venv/lib/python3.11/site-packages/gutenTAG/anomalies/__init__.py", line 65, in generate
protocol = anomaly.generate(protocol)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../.venv/lib/python3.11/site-packages/gutenTAG/anomalies/types/amplitude.py", line 55, in generate
subsequence = anomaly_protocol.base_oscillation.timeseries[anomaly_protocol.start:anomaly_protocol.end] * amplitude_bell
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
ValueError: operands could not be broadcast together with shapes (47,) (46,)
Maybe I'm wrong, but it looks like the error corresponds to
As far as I can see, the error originates in:
creeping-length
passed in
I'll open a PR soon, if this is fine with you.
Thanks again for the repo.
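The shapes in the traceback, (47,) and (46,), are what NumPy reports whenever a scaling window ends up one sample shorter than the subsequence it is multiplied with. The sketch below shows one generic way such an off-by-one can arise for odd lengths; the window construction is purely hypothetical and is not taken from GutenTAG's amplitude.py.

```python
import numpy as np

length = 47
subsequence = np.ones(length)

# Hypothetical window built from two halves of length // 2 samples each.
half = length // 2  # 23, so both halves together give only 46 samples
bad_window = np.concatenate([np.linspace(1, 2, half), np.linspace(2, 1, half)])

try:
    subsequence * bad_window
    mismatch = False
except ValueError:  # operands could not be broadcast together
    mismatch = True

# Giving the second half the remaining samples keeps the total at `length`.
good_window = np.concatenate([np.linspace(1, 2, half), np.linspace(2, 1, length - half)])
result = subsequence * good_window
```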
Hi, it seems that the pattern anomaly is not compatible with CBF base oscillations.
Here is my config file:
timeseries:
- name: demo
  length: 1000
  semi-supervised: true
  supervised: false
  channels: 1
  base-oscillations:
  - kind: cylinder_bell_funnel
    frequency: 2.0
    amplitude: 1.0
    variance: 0.05
  anomalies:
  - position: beginning
    length: 100
    kinds:
    - kind: pattern
      cbf_pattern_factor: 2.0
We are users of this repo and use it to create time series. We would like to introduce a new mode on top of supervised and semi-supervised, called "ts-augmentation", where we produce
We can provide a small code sample.
Creeping Anomaly is a better-fitting name/description.
Add configuration options to post-process each individual channel of a time series.
Post-processing options include:
Example configuration:
timeseries:
- name: "mytest"
  length: 1000
  base-oscillations:
  - kind: "sine"
    frequency: 1
    [...]
  anomalies:
  - position: "beginning"
    [...]
  post-processing:
  - channel: 0
    kind: "smoothing"
    factor: 2.0
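A per-channel post-processing step could look roughly like the sketch below, which smooths one channel of a (length, channels) array with a moving average. The function name and the window parameter are hypothetical; the proposal above does not specify how the smoothing factor should be interpreted, so no mapping from factor to window is implied here.

```python
import numpy as np


def smooth_channel(timeseries, channel, window):
    """Return a copy with one channel replaced by its moving average (zero-padded at the edges)."""
    out = np.asarray(timeseries, dtype=np.float64).copy()
    kernel = np.ones(window) / window
    out[:, channel] = np.convolve(out[:, channel], kernel, mode="same")
    return out


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))       # two noisy channels
smoothed = smooth_channel(data, channel=0, window=5)
```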