Giter Site home page Giter Site logo

nets's Introduction

NETS:Extremely Fast Outlier Detection from a Data Stream via Set-Based Processing

This is the implementation of the paper published in PVLDB 2019 [Paper] [Slide] [Poster] [Video]

1. Overview

This paper addresses the problem of efficiently detecting outliers from a data stream as old data points expire from and new data points enter the window incrementally. The proposed method is based on a newly discovered characteristic of a data stream, that the change in the locations of data points in the data space is typically very insignificant. This observation has led to the finding that the existing distance-based outlier detection algorithms perform excessive unnecessary computations that are repetitive and/or canceling out the effects. Thus, in this paper, we propose a novel set-based approach to detecting outliers, whereby data points at similar locations are grouped and the detection of outliers or inliers is handled at the group level. Specifically, a new algorithm NETS is proposed to achieve a remarkable performance improvement by realizing set-based early identification of outliers or inliers and taking advantage of the "net effect" between expired and new data points. Additionally, NETS is capable of achieving the same efficiency even for a high-dimensional data stream through two dimensional-level filtering. Comprehensive experiments using six real-world data streams show 5 to 25 times faster processing time than state-of-the-art algorithms with comparable memory consumption. We assert that NETS opens a new possibility to real-time data stream outlier detection.

2. Algorithms

Reference
[1] M. Kontaki, A. Gounaris, A. N. Papadopoulos, K. Tsichlas, and Y. Manolopoulos, "Continuous monitoring of distance-based outliers over data streams," in 2011 IEEE 27th International Conference on Data Engineering, pp. 135-146, 2011.
[2] L. Cao, D. Yang, Q. Wang, Y. Yu, J. Wang, and E. A. Rundensteiner, "Scalable distance-based outlier detection over high-volume data streams," in 2014 IEEE 30th International Conference on Data Engineering, pp. 76-87, 2014.
[3] L. Tran, L. Fan, and C. Shahabi, "Distance-based outlier detection in data streams," in Proceedings of the VLDB Endowment, vol. 9, pp. 1089-1100, 2016.

3. Data Sets

Name # data points # Dim Size Link
GAU 1M 1 7.74MB link
STK 1.05M 1 7.57MB link
TAO 0.58M 3 10.7MB link
HPC 1M 7 28.4MB link
GAS 0.93M 10 70.7MB link
EM 1M 16 119MB link
FC 1M 55 72.2MB link

4. Configuration

NETS algorithm was implemented in JAVA and run on JDK 1.8.0_191.

  • Compile
cd ~/NETS/src
javac test/testBase.java

5. How to run

  • Parameter options
--dataset: title of datasets (string, one of {GAU, STK, TAO, HPC, GAS, EM, FC})
--W: the size of a window (integer)
--S: the size of a slide (integer)
--R: the distance threshold (double)
--K: the number of neighbors threshold (integer)
--D: the number of full dimensions (integer)
--sD: the number of sub dimensions (integer)
--nW: the number of windows (integer)
  • Run a sample experiment
cd ~/NETS/src
java test.testBase --dataset TAO --W 10000 --S 500 --R 1.9 --K 50 --D 3 --sD 3 --nW 1
  • Check the result
cd ~/NETS/src/Result
cat Result_TAO_NETS_D3_sD3_rand0_R1.9_K50_S500_W10000_nW1.txt
At window 0, # outliers: 169
# Dataset: TAO
Method: NETS
Dim: 3
subDim: 3
R/K/W/S: 1.9/50/10000/500
# of windows: 1
Avg CPU time(s) 	 Peak memory(MB)
0.0025	4.3

6. Citation

@article{yoon2019nets,
  title={NETS: Extremely Fast Outlier Detection from a Data Stream via Set-Based Processing},
  author={Yoon, Susik and Lee, Jae-Gil and Lee, Byung Suk},
  journal={Proceedings of the VLDB Endowment},
  volume={12},
  number={11},
  pages={1303--1315},
  year={2019},
  publisher={VLDB Endowment}
}

nets's People

Contributors

cliveyn avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

nets's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.