
lucciola111 / stream_autoencoder_windowing


Stream Autoencoder Windowing (SAW) - Change Detection Framework for high-dimensional data streams

License: MIT License

Language: Python (100.00%)

Topics: concept-drift, high-dimensional-data, data-streams, data-science


Stream Autoencoder Windowing (SAW) - Change Detection in High-Dimensional Data Streams

This repository contains the code of the change detection framework Stream Autoencoder Windowing (SAW) for detecting concept drift in high-dimensional data streams. We train an autoencoder on the incoming data stream and monitor its reconstruction error with a sliding window of adaptive size to detect “when” and “where” a drift occurs.
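The "train online, monitor the reconstruction error" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's model: it assumes a single-hidden-layer *linear* autoencoder updated by one SGD step per incoming sample, and the class name `OnlineAutoencoder`, the hidden size, and the learning rate are hypothetical choices.

```python
import numpy as np


class OnlineAutoencoder:
    """Hypothetical linear autoencoder trained one sample at a time.

    Illustration only: SAW's actual architecture and training schedule may differ.
    """

    def __init__(self, n_features, n_hidden, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Encoder maps R^d -> R^k, decoder maps R^k -> R^d (small random init).
        self.W_enc = rng.normal(scale=0.1, size=(n_hidden, n_features))
        self.W_dec = rng.normal(scale=0.1, size=(n_features, n_hidden))
        self.lr = lr

    def reconstruct(self, x):
        return self.W_dec @ (self.W_enc @ x)

    def errors(self, x):
        """Per-dimension squared reconstruction error for one sample."""
        return (x - self.reconstruct(x)) ** 2

    def partial_fit(self, x):
        """One SGD step on 0.5 * ||W_dec W_enc x - x||^2.

        Returns the mean squared error measured *before* the update
        (test-then-train), so the error reflects how well the current
        model explains the new sample.
        """
        h = self.W_enc @ x
        residual = self.W_dec @ h - x                      # dLoss/d(output)
        err = float(np.mean(residual ** 2))
        grad_dec = np.outer(residual, h)                   # dLoss/dW_dec
        grad_enc = np.outer(self.W_dec.T @ residual, x)    # dLoss/dW_enc
        self.W_dec -= self.lr * grad_dec
        self.W_enc -= self.lr * grad_enc
        return err
```

Feeding each new sample to `partial_fit` produces a stream of reconstruction errors that a windowing detector can monitor for "when", while `errors` exposes the per-dimension errors that a "where" analysis can inspect. A linear model keeps the sketch short; a nonlinear network would follow the same test-then-train loop.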

Abstract

The data collected in many real-world scenarios, such as environmental analysis, manufacturing, and e-commerce, are high-dimensional and arrive as a stream, i.e., data properties evolve over time, a phenomenon known as “concept drift”. This brings numerous challenges: data-driven models become outdated, and one is typically interested in detecting specific events, e.g., the critical wear and tear of industrial machines. Hence, detecting change, i.e., concept drift, is crucial for designing a reliable and adaptive predictive system for streaming data. However, existing techniques can only detect the “change point”, i.e., “when” a drift occurs. As drifts may occur only in certain dimensions, a change detector should also be able to identify the drifting dimensions, i.e., “where” a change occurs. This is particularly challenging when data streams are high-dimensional because of the so-called “curse of dimensionality”: the notion of a neighborhood becomes meaningless, and a concept drift might only be visible in subspaces.
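To make the "where" question concrete: if an autoencoder has learned the old concept, the dimensions whose reconstruction error rises sharply after a change point are natural candidates for the drifting subspace. The sketch below assumes per-dimension error vectors collected before and after a detected change and applies a simple hypothetical rule (flag a dimension when its mean error grows by more than a factor `ratio`); the function name and the rule are illustrative assumptions, not necessarily the criterion SAW uses.

```python
from statistics import fmean


def drifting_dimensions(errors_before, errors_after, ratio=2.0, eps=1e-12):
    """Hypothetical rule: return indices of dimensions whose mean
    reconstruction error grew by more than `ratio` after a change point.

    errors_before / errors_after: lists of per-sample error vectors
    (one float per dimension), e.g. squared reconstruction errors
    collected before and after a detected drift.
    """
    n_dims = len(errors_before[0])
    flagged = []
    for j in range(n_dims):
        before = fmean(row[j] for row in errors_before)
        after = fmean(row[j] for row in errors_after)
        # eps guards against a zero baseline error.
        if after > ratio * (before + eps):
            flagged.append(j)
    return flagged


if __name__ == "__main__":
    before = [[0.1, 0.1, 0.1, 0.1]] * 50
    after = [[0.1, 0.9, 0.1, 0.5]] * 50
    print(drifting_dimensions(before, after))  # [1, 3]
```

A ratio on per-dimension means is the simplest choice; a statistical test per dimension would give better control over false positives when errors are noisy.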

We introduce Stream Autoencoder Windowing (SAW), an unsupervised change detection framework based on the online training of an autoencoder whose reconstruction error is monitored via a sliding window of adaptive size. Our approach allows us to efficiently and effectively detect “when” and “where” drift occurs in high-dimensional data streams. Unsupervised methods do not require ground truth or labels, which is an advantage in the data streaming setting, where obtaining labels can be very expensive or even impossible. We evaluate the performance of our method on synthetic data, in which the characteristics of the drifts are known. We then show how our method improves the accuracy of existing classifiers for predictive systems on real data streams. Finally, we compare SAW against state-of-the-art methods.
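A "sliding window of adaptive size" over the reconstruction error can be illustrated with a simplified version of ADWIN (Bifet and Gavaldà), a well-known adaptive windowing detector. Using ADWIN here is an assumption for illustration; this page does not state which windowing rule SAW applies. The window grows while the error stream is stable; when two sub-windows have means that differ by more than a Hoeffding-style bound, the older part is dropped and a change ("when") is reported. The bound assumes values in [0, 1], so reconstruction errors would need to be scaled or clipped first. The class `AdaptiveWindow` and its `delta` parameter are hypothetical names.

```python
import math


class AdaptiveWindow:
    """Simplified ADWIN-style change detector for values in [0, 1].

    Checks every split point of the window; the original ADWIN uses
    exponential histograms to make this cheaper.
    """

    def __init__(self, delta=0.002):
        self.delta = delta  # confidence parameter: smaller -> fewer false alarms
        self.window = []

    def _cut_threshold(self, n0, n1):
        # Hoeffding-style bound on the difference of two sub-window means,
        # with m = 1 / (1/n0 + 1/n1) and delta' = delta / n as in ADWIN.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        delta_prime = self.delta / len(self.window)
        return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

    def update(self, value):
        """Add one value; return True if a change was detected."""
        self.window.append(value)
        changed = False
        shrunk = True
        # Keep cutting while some split still shows a significant difference.
        while shrunk and len(self.window) >= 2:
            shrunk = False
            total = sum(self.window)
            head = 0.0
            for split in range(1, len(self.window)):
                head += self.window[split - 1]
                n0, n1 = split, len(self.window) - split
                mean0 = head / n0
                mean1 = (total - head) / n1
                if abs(mean0 - mean1) > self._cut_threshold(n0, n1):
                    del self.window[:split]  # drop the outdated prefix
                    changed = shrunk = True
                    break
        return changed
```

For example, a stream of errors that sits at 0.1 and then jumps to 0.6 is flagged a few dozen samples after the jump, and the window then shrinks to the recent, higher-error data. Checking every split keeps the code short at the cost of linear work per sample.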

Contributors: lucciola111
