CFOF (Concentration Free Outlier Factor)

🚧 Work in progress.

Python implementation of Concentration Free Outlier Factor (CFOF) [1].

CFOF properties

  • Concentration free
  • Does not suffer from the hubness problem
  • Semi-locality
  • The fast-CFOF algorithm reliably computes CFOF scores at a cost linear in both the dataset size and the dimensionality

Installation

To install the latest release:

$ pip install cfof

Usage

Import CFOF and FastCFOF.

>>> from cfof import CFOF, FastCFOF
>>> import numpy as np

Load data.

>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

Instantiate CFOF or FastCFOF, then call .compute(X) to calculate the scores. .compute(X) returns an array sc, where sc[i, l] is the score of object i for ϱ_l (rhos[l]).

You can also calculate CFOF scores from a precomputed distance matrix using .compute_from_distance_matrix().

CFOF (hard-CFOF)

Use compute to compute CFOF scores directly from data.

>>> cfof_clf = CFOF(metric='euclidean', rhos=[0.5, 0.6], n_jobs=1)
>>> cfof_clf.compute(X)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
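Each column of the returned array holds the scores for one ϱ value, so ranking objects for a given ϱ amounts to sorting one column. The following is a minimal sketch that hard-codes the score matrix printed above instead of calling the library; the variable names are illustrative only.

```python
import numpy as np

# Scores copied from the CFOF example output (columns follow rhos = [0.5, 0.6]).
rhos = [0.5, 0.6]
sc = np.array([[0.5, 0.66666667],
               [0.33333333, 0.83333333],
               [0.5, 1.0],
               [0.5, 0.66666667],
               [0.33333333, 0.83333333],
               [0.5, 1.0]])

# Pick the column for rho = 0.6 and order objects from highest to lowest score.
# A stable sort keeps tied objects in index order.
l = rhos.index(0.6)
ranking = np.argsort(-sc[:, l], kind="stable")
print(ranking)  # objects 2 and 5 come first, both with score 1.0
```

Higher scores indicate objects that need a larger neighborhood before enough of the dataset counts them as neighbors, so the head of the ranking lists the strongest outlier candidates for that ϱ.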

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
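To make the numbers above easier to interpret, here is a minimal NumPy sketch of the hard-CFOF definition as described in [1]: the score of an object is the smallest fraction k/n such that at least ϱ·n objects have it among their k nearest neighbors. The sketch assumes that each object counts as its own first neighbor and that ties are broken by index order; hard_cfof_sketch is a hypothetical helper, not part of the library, and the library's implementation may differ.

```python
import numpy as np

def hard_cfof_sketch(X, rhos):
    """Illustrative hard-CFOF; not the library's implementation."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # ranks[j, i] = 1-based position of object i in the neighbor list of j.
    # Assumption: each object is its own nearest neighbor (rank 1).
    order = np.argsort(dist, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)[None, :]

    # Column i, once sorted, lists the neighborhood sizes at which
    # successive objects start to include object i among their neighbors.
    sorted_ranks = np.sort(ranks, axis=0)

    scores = np.empty((n, len(rhos)))
    for l, rho in enumerate(rhos):
        needed = max(int(np.ceil(n * rho)), 1)  # objects that must include i
        scores[:, l] = sorted_ranks[needed - 1, :] / n
    return scores

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
print(hard_cfof_sketch(X, rhos=[0.5, 0.6]))
```

On this toy dataset the sketch yields the same values as the library output shown above. It stores and sorts an n × n matrix, so it is only suitable for small inputs.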

FastCFOF (soft-CFOF)

Use compute to compute CFOF scores directly from data.

>>> np.random.seed(10)
>>> X = np.random.randint(0, 100, size=(1000, 3))
>>>
>>> fast_cfof_clf = FastCFOF(metric='euclidean',
...                          rhos=[0.001, 0.005, 0.01, 0.05, 0.1],
...                          epsilon=0.1, delta=0.1, n_bins=50, n_jobs=1)
>>> fast_cfof_clf.compute(X)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> fast_cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
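The epsilon and delta arguments suggest an accuracy/confidence trade-off for the sampling step of fast-CFOF. As a rough sketch only, and assuming a standard Hoeffding-style bound (an assumption; the library may derive its sample size differently), the number of sampled objects needed for an absolute error of at most epsilon with probability at least 1 - delta could be estimated as follows. hoeffding_sample_size is a hypothetical helper, not a library function.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Sample size from Hoeffding's inequality: s >= ln(2 / delta) / (2 * epsilon**2).

    Assumption: this mirrors how epsilon and delta are meant to be read;
    it is not taken from the library's source code.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Values used in the FastCFOF example above.
print(hoeffding_sample_size(epsilon=0.1, delta=0.1))
```

Under this bound the sample size depends only on epsilon and delta, not on the dataset size, which is consistent with the linear cost claimed for fast-CFOF.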

CFOFiSAX

This library provides a wrapper for pyCFOFiSAX [2].

>>> from cfof.cfof_isax import CFOFiSAXWrapper

Refer to the pyCFOFiSAX documentation for more details.

TODOs

  • Add support for faiss (GPU).
  • Parallelize FastCFOF.
  • Add unit tests.
  • Add benchmarks.
  • Wrap pyCFOFiSAX.

References

[1] ANGIULLI, Fabrizio. CFOF: a concentration free measure for anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020, vol. 14, no. 1, p. 1-53.

[2] FOULON, Lucas, FENET, Serge, RIGOTTI, Christophe, et al. Scoring Message Stream Anomalies in Railway Communication Systems. In: 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019, p. 769-776.
