icot-example's Introduction

Overview

This is the documentation repository for the clustering algorithm of the paper "Interpretable Clustering: An Optimization Approach" by Dimitris Bertsimas, Agni Orfanoudaki, and Holly Wiberg. The purpose of this method, ICOT, is to generate interpretable tree-based clustering models.

Academic License and Installation

This code runs in Julia version 1.1.0, which can be downloaded at the following links:

Note: version 1.1.0 is required for compatibility with the package.

The ICOT software package uses tools from the Interpretable AI suite and thus it requires an academic license.

You can download the system image from the following links:

You can find detailed installation guidelines for the system image here.

Once you have completed the installation you will be presented with a machine ID. You can request an academic license by emailing [email protected] with your academic institution address and the subject line "Request for ICOT License". Please include the machine ID in the email, and Interpretable AI will generate a license file for your machine.

Algorithm Guidelines

The main command to run the algorithm on a dataset X is ICOT.fit!(learner, X, y); where y can be a data partition (for example, a set of labels) associated with the dataset. The learner is an ICOT.InterpretableCluster() object with the following parameters:

  • criterion: defines the internal validation criterion used to train the ICOT algorithm. The algorithm accepts two options: :dunnindex (Dunn 1974) and :silhouette (Rousseeuw 1987).
  • ls_warmstart_criterion: defines the internal validation criterion used to create the initial solution of the warmstart. It accepts the same options as the criterion parameter.
  • kmeans_warmstart: provides a warmstart solution to initialize the algorithm. Details are provided in Section 3.3.2 of the paper. It accepts :none, :greedy, and :oct. The :oct option uses user-selected labels (e.g. from K-means) to fit an Optimal Classification Tree as a supervised learning problem, which provides a warm start to the algorithm. The :greedy option fits a CART tree to these labels.
  • geom_search: is a boolean parameter that controls whether the algorithm enables the geometric component of the feature space search. See details in Section 3.3.1 of the paper.
  • geom_threshold: refers to the percentile of gaps that will be considered by the geometric search for each feature. For example: 0.99.
  • minbucket: controls the minimum number of points that must be present in every leaf node of the fitted tree.
  • max_depth: accepts a non-negative Integer to control the maximum depth of the fitted tree. This parameter must always be explicitly set or tuned. We recommend tuning this parameter using the grid search process described in the guide to parameter tuning.
  • ls_random_seed: is an integer controlling the randomized state of the algorithm. We recommend setting the seed to ensure reproducibility of results.
  • ls_num_tree_restarts: is an integer specifying the number of random restarts to use in the local search algorithm. Must be positive and defaults to 100. The performance of the tree typically increases as this value is increased, but with quickly diminishing returns. The computational cost of training increases linearly with this value.
  • cp: the complexity parameter that determines the tradeoff between the accuracy and complexity of the tree to control overfitting, as commonly seen in supervised learning problems. Because the internal validation criteria used for this unsupervised algorithm naturally limit the tree complexity, we recommend setting the value to 0.0.
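As a rough illustration of what a gap percentile such as geom_threshold = 0.99 could mean for a single feature, the sketch below sorts the feature values, measures the gaps between consecutive values, and keeps only the gaps at or above the chosen quantile as candidate split locations. This is an assumption about the idea, not the ICOT implementation: gap_split_candidates is a hypothetical helper, and placing each split at the midpoint of a gap is an illustrative choice. See Section 3.3.1 of the paper for the actual procedure.

```julia
using Statistics  # quantile

# Hypothetical helper, not part of ICOT: candidate split points for one feature,
# restricted to the widest gaps between consecutive sorted values.
function gap_split_candidates(values::AbstractVector{<:Real}, threshold::Real)
    sorted = sort(unique(values))
    length(sorted) < 2 && return Float64[]
    gaps = diff(sorted)                      # spacing between neighbouring values
    cutoff = quantile(gaps, threshold)       # e.g. the 0.99 quantile of all gaps
    keep = findall(g -> g >= cutoff, gaps)   # indices of the gaps that pass the cutoff
    # Place each candidate split at the midpoint of a retained gap (illustrative choice).
    return [(sorted[i] + sorted[i + 1]) / 2 for i in keep]
end

# Toy feature with two tight groups separated by one wide gap.
feature = [1.0, 1.1, 1.2, 1.3, 9.0, 9.1, 9.2]
candidates = gap_split_candidates(feature, 0.99)
println(candidates)
```

With a high threshold, only the single wide gap between the two groups survives, so the search would consider far fewer split points than one per distinct value.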

You can visualize your model in a browser using the ICOT.showinbrowser() command.

You can compute the score of a trained ICOT model using the score_al_ws_oct = ICOT.score(learner, X, y, criterion); command.
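To make concrete what an internal validation score such as :dunnindex measures, here is a minimal, self-contained sketch of the classic Dunn index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. It does not call ICOT, and dunn_index and the toy data are illustrative names. Whether ICOT.score uses exactly this variant of the index is an assumption.

```julia
using LinearAlgebra  # norm

# Illustrative Dunn index, not the ICOT implementation.
# Rows of X are points; labels[i] is the cluster assigned to row i.
function dunn_index(X::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    n = size(X, 1)
    @assert length(labels) == n "expected one label per row of X"
    min_between = Inf   # closest pair of points in different clusters
    max_within = 0.0    # farthest pair of points in the same cluster
    for i in 1:n-1, j in i+1:n
        d = norm(X[i, :] .- X[j, :])
        if labels[i] == labels[j]
            max_within = max(max_within, d)
        else
            min_between = min(min_between, d)
        end
    end
    return min_between / max_within
end

# Four points forming two tight pairs that are far apart.
X = [0.0 0.0; 0.0 1.0; 5.0 0.0; 5.0 1.0]
good = dunn_index(X, [1, 1, 2, 2])   # each pair is its own cluster
bad = dunn_index(X, [1, 2, 1, 2])    # clusters cut across the pairs
println((good, bad))
```

Higher values indicate compact, well-separated clusters, which is why the partition that respects the two pairs scores higher than the one that splits them.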

We have added an example for the ruspini dataset in the src folder called runningICOT_example.jl.

Citing ICOT

If you use ICOT in your research, we kindly ask that you reference the original paper that first introduced the algorithm:

@article{bertsimas2020interpretable,
  title={Interpretable clustering: an optimization approach},
  author={Bertsimas, Dimitris and Orfanoudaki, Agni and Wiberg, Holly},
  journal={Machine Learning},
  pages={1--50},
  year={2020},
  publisher={Springer}
}

icot-example's Issues

CSV failed to precompile

I am attempting to run runningICOT_example.jl in Julia version 1.8.5 (2023-01-08) but I am running into the following error message:
ERROR: LoadError: Failed to precompile CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b] to /home/matthewvandergrift/.julia/compiled/v1.8/CSV/jl_uWIaQE.

I have tried running ] up and ] add CSV. I tried downgrading my CSV version to v0.10.2 but then the readtable() command was not recognized. I was wondering if there is a fix to this issue that I am missing?