rulelist's Introduction

MDL Rule Lists for prediction and subgroup discovery.

License: MIT

This repository contains code for learning rule lists for univariate or multivariate classification and regression, together with their counterparts in data mining and subgroup discovery (subgroup lists). These models use the Minimum Description Length (MDL) principle as the optimality criterion.
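To give a rough intuition for what an MDL criterion rewards, here is a minimal sketch of a two-part code, L(model) + L(data | model), for binary labels. It is not the encoding used by this package: the maximum-likelihood Bernoulli code and the flat `bits_per_rule` cost are simplifying assumptions made only for illustration.

```python
import math


def data_bits(labels):
    """Bits to encode binary labels with their maximum-likelihood Bernoulli code.

    Simplification: the cost of encoding the Bernoulli parameter is ignored.
    """
    n = len(labels)
    ones = sum(labels)
    bits = 0.0
    for count in (ones, n - ones):
        if count:  # 0 * log(0) is treated as 0
            bits -= count * math.log2(count / n)
    return bits


def total_bits(groups, bits_per_rule=8.0):
    """Two-part code length: L(model) + L(data | model).

    `groups` is the partition of the labels induced by a rule list,
    one group per rule, with the last group covered by the default rule.
    `bits_per_rule` is a made-up flat model cost (hypothetical).
    """
    model_bits = bits_per_rule * (len(groups) - 1)
    return model_bits + sum(data_bits(group) for group in groups)


labels = [1] * 40 + [0] * 40
no_rules = total_bits([labels])              # default rule only
one_rule = total_bits([[1] * 40, [0] * 40])  # one rule that separates the classes

print(no_rules, one_rule)
```

Under these assumptions, adding the rule costs a few bits of model description but saves many more bits when encoding the labels, so the shorter total description is preferred.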

Dependencies

This project was written for Python 3.7. All required PyPI packages are listed in requirements.txt.

NOTE: This list of packages includes the gmpy2 package.

Installation

For the latest version, clone this repository and use it directly:

git clone https://github.com/HMProenca/RuleList

For the latest stable release from PyPI (which may be older than the current GitHub version), use:

pip install rulelist

If you run into issues with the gmpy2 package mentioned above, please refer to the gmpy2 documentation for help.

For the current version, you can clone the repository and install the dependencies locally:

git clone https://github.com/HMProenca/RuleList.git
cd RuleList
pip install -r requirements.txt
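After installing, a quick way to check that the package and its gmpy2 dependency are visible to your interpreter is sketched below. The helper name `missing_packages` is hypothetical and not part of rulelist; it only uses the standard library.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["rulelist", "gmpy2"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("rulelist and gmpy2 are importable")
```

If gmpy2 is reported as missing, that usually points to the build issue mentioned above rather than to rulelist itself.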

Example of usage for prediction:

import pandas as pd
from rulelist import RuleListClassifier, RuleListRegressor
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the breast cancer dataset as pandas objects
data = datasets.load_breast_cancer()
Y = pd.Series(data.target)
X = pd.DataFrame(data.data)

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Fit a rule list classifier (discretization mode "static")
model = RuleListClassifier(discretization="static")
model.fit(X_train, y_train)

# Evaluate on the held-out rows
y_pred = model.predict(X_test)
print(accuracy_score(y_test.values, y_pred))

# Print the learned rule list
print(model)
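The printed model is an ordered list of rules followed by a default rule. As a minimal sketch of how such a list is usually read (assuming first-match semantics, and not the package's actual implementation), the toy code below predicts with hand-written rules. The names `predict_one`, `rules`, `x1` and `x2` are hypothetical.

```python
def predict_one(rules, default, row):
    """Return the label of the first rule whose condition matches `row`.

    `rules` is an ordered list of (condition, label) pairs; if no rule
    matches, the default label is returned.
    """
    for condition, label in rules:
        if condition(row):
            return label
    return default


# Two toy rules, checked in order
rules = [
    (lambda row: row["x1"] >= 5, "A"),
    (lambda row: row["x2"] < 2, "B"),
]

print(predict_one(rules, "C", {"x1": 7, "x2": 1}))  # both rules match; the first one wins
print(predict_one(rules, "C", {"x1": 1, "x2": 1}))  # only the second rule matches
print(predict_one(rules, "C", {"x1": 1, "x2": 3}))  # no rule matches; the default applies
```

Because order matters, each rule only applies to rows not already covered by the rules before it.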

Example of usage for subgroup discovery:

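import pandas as pd
from rulelist import SubgroupListCategorical, SubgroupListGaussian
from sklearn import datasets

# Note: load_boston was removed in scikit-learn 1.2; on newer versions,
# substitute another regression dataset, e.g. datasets.load_diabetes()
data = datasets.load_boston()
y = pd.Series(data.target)
X = pd.DataFrame(data.data)

# Fit a subgroup list for a numeric target
model = SubgroupListGaussian()
model.fit(X, y)

# Print the discovered subgroup list
print(model)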

Contact

If you have any questions or run into issues, please contact me by email at [email protected] or open an issue here on GitHub.

Citation

In a machine learning (prediction) context, that is, for classification, regression, multi-label classification, multi-category classification, or multivariate regression, please cite the first classification application of MDL rule lists:

@article{proencca2020interpretable,
  title={Interpretable multiclass classification by MDL-based rule lists},
  author={Proen{\c{c}}a, Hugo M and van Leeuwen, Matthijs},
  journal={Information Sciences},
  volume={512},
  pages={1372--1393},
  year={2020},
  publisher={Elsevier}
}

In the context of data mining and subgroup discovery, please cite the work on subgroup lists:

@article{proencca2020discovering,
  title={Discovering outstanding subgroup lists for numeric targets using MDL},
  author={Proen{\c{c}}a, Hugo M and Gr{\"u}nwald, Peter and B{\"a}ck, Thomas and van Leeuwen, Matthijs},
  journal={arXiv preprint arXiv:2006.09186},
  year={2020}
} 

and

@article{proencca2021robust,
  title={Robust subgroup discovery},
  author={Proen{\c{c}}a, Hugo Manuel and B{\"a}ck, Thomas and van Leeuwen, Matthijs},
  journal={arXiv preprint arXiv:2103.13686},
  year={2021}
}
