jmbhughes / crcf

Combination Robust Cut Forests: Merging Isolation Forests and Robust Random Cut Forests

Home Page: https://jmbhughes.github.io/crcf/

License: MIT License

Python 100.00%
anomaly-detection machine-learning trees isolation-forest robust-random-cut-forest

crcf's Introduction

Combination Robust Cut Forests

Isolation Forests [Liu+2008] and Robust Random Cut Trees [Guha+2016] are closely related, as outlined in the supporting overview. Most notably, they are the two extremes of the same outlier scoring function:

$$\theta \textrm{Depth} + (1 - \theta) \textrm{[Co]Disp}$$

The combination robust cut forest lets you blend both scores by choosing a $\theta$ strictly between 0 and 1.
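As a minimal sketch of the formula above, the snippet below evaluates the weighted sum for a single point. The function name `combined_score` and the sample values are hypothetical and are not part of the crcf API.

```python
def combined_score(depth: float, codisp: float, theta: float) -> float:
    """Weighted sum theta * Depth + (1 - theta) * [Co]Disp."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * depth + (1.0 - theta) * codisp


# Made-up depth and co-displacement values for one point.
depth, codisp = 3.0, 12.0

print(combined_score(depth, codisp, theta=1.0))  # depth term only
print(combined_score(depth, codisp, theta=0.0))  # co-displacement term only
print(combined_score(depth, codisp, theta=0.5))  # equal mix of both terms
```

With $\theta = 1$ only the depth term remains, and with $\theta = 0$ only the co-displacement term remains; any value in between mixes the two.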

Install

You can install from PyPI with `pip install crcf`. Alternatively, download the repository and run either `python3 setup.py install` or `pip3 install .` from its root directory. Please note that this package uses features from Python 3.7+ and is not compatible with earlier Python versions.

Tasks

  • complete basic implementation
  • provide clear documentation and usage instructions
  • ensure interface allows for fitting and scoring on multiple points at the same time
  • implement a better saving method than pickling
  • use random tests with Hypothesis
  • implement tree down in Cython
  • accelerate forests with multi-threading
  • incorporate categorical variable support, including categorical rules
  • complete the write-up document with a benchmarking of performance

References

crcf's People

Contributors

jmbhughes

crcf's Issues

Improve code coverage

For such a small and straightforward package, the code coverage should be higher.

Revise combination equation

$\theta \textrm{Depth} + (1 - \theta) \textrm{[Co]Disp}$ doesn't make sense because an outlier has low depth but high co-displacement, so the two terms pull in opposite directions. One of the terms should be inverted so that both increase for outliers.
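The sketch below illustrates the problem and one possible fix. Inverting the depth as $1 / (1 + \textrm{Depth})$ is an assumption made here for illustration, not the form the package has settled on, and the function names and sample values are hypothetical.

```python
def naive_score(depth: float, codisp: float, theta: float) -> float:
    """Combination as currently written: depth enters directly."""
    return theta * depth + (1.0 - theta) * codisp


def revised_score(depth: float, codisp: float, theta: float) -> float:
    """Assumed fix: invert depth so that shallow points score high."""
    return theta / (1.0 + depth) + (1.0 - theta) * codisp


# Made-up values: the outlier is isolated early and displaces many points.
outlier = {"depth": 2.0, "codisp": 20.0}
inlier = {"depth": 10.0, "codisp": 1.0}

for theta in (0.0, 0.5, 1.0):
    naive_ok = naive_score(theta=theta, **outlier) > naive_score(theta=theta, **inlier)
    revised_ok = revised_score(theta=theta, **outlier) > revised_score(theta=theta, **inlier)
    print(f"theta={theta}: naive ranks outlier higher={naive_ok}, revised={revised_ok}")
```

At $\theta = 1$ the naive form ranks the inlier above the outlier, while the inverted form keeps the outlier on top for every $\theta$ tried. The two terms still live on different scales, so some normalization would likely be needed in practice.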

Create readthedocs

This package needs a refresh and clear instructions on how to use it.

Create save/load system

Right now the saving and loading is done with pickles. We want a more transparent and safer system, potentially using HDF5.
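As a rough sketch of what a more transparent format could look like, the example below round-trips a forest through JSON from the standard library. JSON is swapped in here instead of HDF5 only to keep the example dependency-free. The nested-dict tree layout and the names `save_forest` and `load_forest` are hypothetical and do not reflect crcf's internal representation.

```python
import json
import os
import tempfile

# Hypothetical tree layout: internal nodes hold a cut, leaves hold a point.
tree = {
    "dimension": 0,
    "value": 0.5,
    "left": {"point": [0.1, 0.2]},
    "right": {"point": [0.9, 0.4]},
}


def save_forest(trees, path):
    """Write a list of nested-dict trees as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"format_version": 1, "trees": trees}, f, indent=2)


def load_forest(path):
    """Read trees back; parsing JSON does not execute code, unlike unpickling."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    if payload.get("format_version") != 1:
        raise ValueError("unsupported format version")
    return payload["trees"]


path = os.path.join(tempfile.mkdtemp(), "forest.json")
save_forest([tree], path)
restored = load_forest(path)
print(restored == [tree])
```

An explicit `format_version` field lets later releases detect and reject or migrate old files, which is harder to do with pickles.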
