datamllab / pyodds

An End-to-end Outlier Detection System

License: MIT License

Python 99.69% Shell 0.31%
anomaly-detection outlier-detection time-series time-series-analysis deep-learning machine-learning database tdengine

pyodds's Introduction

PyODDS

Official Website: http://pyodds.com/

PyODDS is an end-to-end Python system for outlier detection with database support. It provides outlier detection algorithms that meet the needs of users in different fields, with or without a data science or machine learning background. PyODDS can execute machine learning algorithms in-database, without moving data out of the database server or over the network. It also provides access to a wide range of outlier detection algorithms, from statistical analysis to more recent deep learning based approaches. It is developed by DATA Lab at Texas A&M University.

PyODDS is featured for:

  • Full Stack Service, which supports operation and maintenance from a lightweight SQL-based database to back-end machine learning algorithms and improves throughput;

  • State-of-the-art Anomaly Detection Approaches, including statistical, machine learning, and deep learning models with unified APIs and detailed documentation;

  • Powerful Data Analysis Mechanism, which supports both static and time-series data analysis with flexible time-slice (sliding-window) segmentation;

  • Automated Machine Learning: PyODDS is the first attempt to incorporate automated machine learning into outlier detection, and is among the first attempts to extend automated machine learning concepts to real-world data mining tasks.
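As an illustration of the time-slice (sliding-window) segmentation mentioned in the feature list, here is a minimal standard-library sketch. The function name `sliding_windows` and its `window` and `step` parameters are hypothetical and are not part of the PyODDS API; the sketch only shows the general idea of cutting a series into overlapping segments.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], window: int, step: int = 1) -> List[List[float]]:
    """Split a series into windows of length `window`, advancing by `step` each time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Only full windows are produced; a series shorter than `window` yields no segments.
    return [
        list(series[start:start + window])
        for start in range(0, len(series) - window + 1, step)
    ]


readings = [1.0, 1.1, 0.9, 5.0, 1.0, 1.2]
segments = sliding_windows(readings, window=3, step=2)
print(segments)
```

Each segment can then be passed to a detector as one sample. A smaller `step` gives more overlap between consecutive windows at the cost of more segments to score.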

The full API reference can be found in the handbook.

API Demo:

from utils.import_algorithm import algorithm_selection
from utils.utilities import output_performance, connect_server, query_data

# connect to the database (host, user, and password are placeholders)
conn, cursor = connect_server(host, user, password)

# query data from a specific time range
data = query_data(conn, cursor, database_name, table_name, start_time, end_time)

# train the anomaly detection algorithm
# (X_train and X_test are splits of the queried data, prepared by the caller)
clf = algorithm_selection(algorithm_name)
clf.fit(X_train)

# get outlier results and scores
prediction_result = clf.predict(X_test)
outlierness_score = clf.decision_function(X_test)

# visualize the prediction result
# (visualize_distribution must be imported separately; it is not among the imports above)
visualize_distribution(X_test, prediction_result, outlierness_score)

Cite this work

Yuening Li, Daochen Zha, Praveen Kumar Venugopal, Na Zou, Xia Hu. "PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning"

Biblatex entry:

@inproceedings{10.1145/3366424.3383530,
    author = {Li, Yuening and Zha, Daochen and Venugopal, Praveen and Zou, Na and Hu, Xia},
    title = {PyODDS: An End-to-End Outlier Detection System with Automated Machine Learning},
    year = {2020},
    isbn = {9781450370240},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3366424.3383530},
    doi = {10.1145/3366424.3383530},
    booktitle = {Companion Proceedings of the Web Conference 2020},
    pages = {153--157},
    numpages = {5},
    keywords = {Automated Machine Learning, Outlier Detection, Open Source Package, End-to-end System},
    location = {Taipei, Taiwan},
    series = {WWW '20}
}

Quick Start

python demo.py --ground_truth --visualize_distribution

The results are displayed as follows:

connect to TDengine success
Load dataset and table
Loading cost: 0.151061 seconds
Load data successful
Start processing:
100%|████████████████████████████████████| 10/10 [00:00<00:00, 14.02it/s]
==============================
Results in Algorithm dagmm are:
accuracy_score: 0.98
precision_score: 0.99
recall_score: 0.99
f1_score: 0.99
roc_auc_score: 0.99
processing time: 15.330137 seconds
==============================
connection is closed
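For readers who want to see how the first four metrics in the report above are defined, here is a minimal sketch that computes them from ground-truth and predicted labels. It assumes that label 1 marks an outlier and label 0 an inlier, which may differ from the convention PyODDS uses internally, and the helper name `binary_metrics` is hypothetical. The ROC-AUC score is omitted because it needs the continuous outlier scores rather than the binary predictions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1, treating label 1 as 'outlier'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outliers correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inliers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # outliers that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inliers correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


ground_truth = [0, 0, 0, 1, 1, 0, 1, 0]
prediction = [0, 0, 1, 1, 1, 0, 0, 0]
metrics = binary_metrics(ground_truth, prediction)
print(metrics)
```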

Installation

To install the package from PyPI, use pip:

pip install pyodds

Alternatively, install the latest version from the GitHub repository:

pip install git+https://github.com/datamllab/PyODDS.git

Note: PyODDS is only compatible with Python 3.6 and above.

Required Dependencies

- pandas>=0.25.0
- taos==1.4.15
- tensorflow==2.0.0b1
- numpy>=1.16.4
- seaborn>=0.9.0
- torch>=1.1.0
- luminol==0.4
- tqdm>=4.35.0
- matplotlib>=3.1.1
- scikit_learn>=0.21.3
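Because some of these pins are exact (for example `tensorflow==2.0.0b1`), it can help to see which versions are already installed before running pip. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and `importlib.metadata` requires Python 3.8 or later, which is newer than the minimum version stated above. It only reports versions and does not compare them against the required ranges.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the dependency list above
# (scikit_learn is published on PyPI as "scikit-learn").
REQUIRED = [
    "pandas", "taos", "tensorflow", "numpy", "seaborn",
    "torch", "luminol", "tqdm", "matplotlib", "scikit-learn",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside a fresh virtual environment before installation shows which existing packages the exact pins would replace.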

To compile and package the JDBC driver source code, you need Java JDK 8 or higher and Apache Maven 2.7 or higher installed. To install OpenJDK 8 on Ubuntu:

sudo apt-get install openjdk-8-jdk

To install Apache Maven on Ubuntu:

sudo apt-get install maven

To install TDengine as the back-end database service, please refer to the TDengine installation instructions.

To enable the Python client APIs for TDengine, please follow the TDengine handbook.

To ensure that the locale in the config file is valid:

sudo locale-gen "en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
locale

To start the service after installation, in a terminal, use:

taosd

Implemented Algorithms

Statistical Based Methods

| Methods | Algorithm | Class API |
|---|---|---|
| CBLOF | Clustering-Based Local Outlier Factor | algo.cblof.CBLOF |
| HBOS | Histogram-based Outlier Score | algo.hbos.HBOS |
| IFOREST | Isolation Forest | algo.iforest.IFOREST |
| KNN | k-Nearest Neighbors | algo.knn.KNN |
| LOF | Local Outlier Factor | algo.lof.LOF |
| OCSVM | One-Class Support Vector Machines | algo.ocsvm.OCSVM |
| PCA | Principal Component Analysis | algo.pca.PCA |
| RobustCovariance | Robust Covariance | algo.robustcovariance.RCOV |
| SOD | Subspace Outlier Detection | algo.sod.SOD |
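To give a feel for how a method in this family scores points, here is a minimal sketch of the generic k-nearest-neighbors idea: each point is scored by the distance to its k-th nearest neighbor, so isolated points receive high scores. This is a textbook illustration written for this page, not the implementation behind `algo.knn.KNN`; the function name `knn_outlier_scores` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(data, k=2)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous, scores)
```

The brute-force loop is quadratic in the number of points, which is fine for a demonstration but not for large tables.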

Deep Learning Based Methods

| Methods | Algorithm | Class API |
|---|---|---|
| autoencoder | Outlier detection using replicator neural networks | algo.autoencoder.AUTOENCODER |
| dagmm | Deep autoencoding Gaussian mixture model for unsupervised anomaly detection | algo.dagmm.DAGMM |

Time Series Methods

| Methods | Algorithm | Class API |
|---|---|---|
| lstmad | Long short-term memory networks for anomaly detection in time series | algo.lstm_ad.LSTMAD |
| lstmencdec | LSTM-based encoder-decoder for multi-sensor anomaly detection | algo.lstm_enc_dec_axl.LSTMED |
| luminol | LinkedIn's luminol | algo.luminol.LUMINOL |

APIs Cheatsheet

The full API reference can be found in the handbook.

  • connect_server(hostname,username,password): Connect to the TDengine back-end service.

  • query_data(connection,cursor,database_name,table_name,start_time,end_time): Query data from table table_name in database database_name within a given time range.

  • algorithm_selection(algorithm_name,contamination): Select an algorithm as the detector.

  • fit(X): Fit the detector to X.

  • predict(X): Predict whether each instance in X is an outlier.

  • decision_function(X): Output the anomaly score of each instance in X.

  • output_performance(algorithm_name,ground_truth,prediction_result,outlierness_score): Output the evaluation metrics for the prediction result: accuracy, precision, recall, F1 score, ROC-AUC score, and processing time.

  • visualize_distribution(X,prediction_result,outlierness_score): Visualize the detection result together with the data distribution.

  • visualize_outlierscore(outlierness_score,prediction_result,contamination): Visualize the detection result together with the outlier scores.
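The fit / predict / decision_function pattern listed above can be illustrated with a toy detector. The class below is a hypothetical z-score detector written for this page, not a PyODDS class. It assumes that `contamination` is the expected fraction of outliers in the training data and that `predict` returns 1 for an outlier and 0 for an inlier; both are assumptions about the convention, not statements about PyODDS itself.

```python
import statistics
from typing import List, Sequence


class ZScoreDetector:
    """Toy univariate detector following the fit / predict / decision_function pattern."""

    def __init__(self, contamination: float = 0.1):
        if not 0.0 < contamination < 0.5:
            raise ValueError("contamination must be in (0, 0.5)")
        self.contamination = contamination

    def fit(self, X: Sequence[float]) -> "ZScoreDetector":
        self.mean_ = statistics.mean(X)
        self.std_ = statistics.pstdev(X) or 1.0  # fall back to 1.0 for constant data
        # Pick the threshold so that roughly `contamination` of the training data is flagged.
        train_scores = sorted(self.decision_function(X), reverse=True)
        n_outliers = max(1, int(round(self.contamination * len(X))))
        self.threshold_ = train_scores[n_outliers - 1]
        return self

    def decision_function(self, X: Sequence[float]) -> List[float]:
        """Return the absolute z-score of each value; larger means more anomalous."""
        return [abs(x - self.mean_) / self.std_ for x in X]

    def predict(self, X: Sequence[float]) -> List[int]:
        """Return 1 for values scoring at or above the fitted threshold, else 0."""
        return [1 if s >= self.threshold_ else 0 for s in self.decision_function(X)]


X_train = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 25.0]
detector = ZScoreDetector(contamination=0.1).fit(X_train)
print(detector.predict([10.0, 30.0]))
print(detector.decision_function([10.0, 30.0]))
```

Keeping the score (`decision_function`) separate from the label (`predict`) lets callers apply their own threshold or rank instances by score.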

License

You may use this software under the MIT License.

pyodds's People

Contributors

daochenzha, haifeng-jin, praveenvenugopal, pysods, yli96


pyodds's Issues

Sliding window usage

Do you have any example using a sliding window for incremental learning? Is this possible? Thanks. This package looks super cool, btw.

demo.py

When I run demo.py in the terminal, the terminal asks me to enter my password. What does this mean?

LSTM-AD/LSTM-ED input-size documentation

Is there any guidance on the minimum input size for LSTM-AD and LSTM-ED, other than digging through the code and calculating it by hand?

I have found that LSTM-AD requires 40 datapoints minimum.
LSTM-ED is still above this and I have settled at around 140.

Inconsistent API for timeseries?

I noticed that when using luminol, the first column is taken as the timestamp. However, when using the LSTM-based approaches, I do not see the same behavior. Perhaps the docs could be clearer.

Missing files in sdist

It appears that the manifest is missing at least one file necessary to build
from the sdist for version 1.0.0rc1. You're in good company, about 5% of other
projects updated in the last year are also missing files.

+ /tmp/venv/bin/pip3 wheel --no-binary pyodds -w /tmp/ext pyodds==1.0.0rc1
Looking in indexes: http://10.10.0.139:9191/root/pypi/+simple/
Collecting pyodds==1.0.0rc1
  Downloading http://10.10.0.139:9191/root/pypi/%2Bf/987/8cf3b9087dafd/pyodds-1.0.0rc1.tar.gz (37 kB)
    ERROR: Command errored out with exit status 1:
     command: /tmp/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-wheel-kvsisbl_/pyodds/setup.py'"'"'; __file__='"'"'/tmp/pip-wheel-kvsisbl_/pyodds/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-wheel-kvsisbl_/pyodds/pip-egg-info
         cwd: /tmp/pip-wheel-kvsisbl_/pyodds/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-wheel-kvsisbl_/pyodds/setup.py", line 8, in <module>
        with open('requirements.txt') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
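The traceback shows `setup.py` opening `requirements.txt`, which is absent from the sdist. The usual remedy is to ship the file in the source distribution, for example with an `include requirements.txt` line in `MANIFEST.in`. As a complementary safeguard, a `setup.py` can read the file defensively. The helper below is a hypothetical sketch, not the project's actual `setup.py`.

```python
from pathlib import Path
from typing import List


def read_requirements(path: str = "requirements.txt") -> List[str]:
    """Return requirement specifiers from `path`, or an empty list if the file is absent."""
    req_file = Path(path)
    if not req_file.is_file():
        # Fall back to no pinned requirements instead of failing during egg_info.
        return []
    lines = (line.strip() for line in req_file.read_text(encoding="utf-8").splitlines())
    # Skip blank lines and comments.
    return [line for line in lines if line and not line.startswith("#")]


print(read_requirements())
```

In a `setup.py`, the result would be passed as `install_requires=read_requirements()`. Note that falling back to an empty list hides the packaging mistake, so including the file in the manifest remains the primary fix.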

Not one or two but a number of issues in the library

I wasted my two days in making the library work. Firstly, the installation downgraded my Tensorflow 2.4.0 to 2.0 Beta. Obviously, the library has just too many bugs that need to be corrected. I would rather suggest to use the algorithms directly through the class files rather than using them as prescribed in the research paper.
There are gaps in the code vs explanation in the research paper.
In short, please do not put in half baked code as people start with trust that the code is going to work. Also, it should not fiddle with the base packages of Python during installation.
