wildwood's Introduction

WildWood is a Python package providing improved random forest algorithms for multiclass classification and regression, introduced in the paper Wildwood: a new random forest algorithm by S. Gaïffas, I. Merad and Y. Yu (2021). It follows scikit-learn's API and can be used as a drop-in replacement for its Random Forest algorithms (although multilabel/multiclass training is not supported yet). Compared to standard Random Forest algorithms, WildWood mainly provides the following:

  • Improved predictions with fewer trees
  • Faster training times (using a histogram strategy similar to LightGBM)
  • Native support for categorical features
  • Parallel training of the trees in the forest
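
To give an intuition for the histogram strategy mentioned in the list above, here is a minimal sketch in plain Python. It is not WildWood's implementation: the helpers `bin_edges` and `to_bins`, and the choice of evenly spaced ranks as thresholds, are assumptions made for illustration. The idea is that each continuous feature is replaced by a small integer bin index once, so that later split searches only need to scan a few bins rather than every distinct value.

```python
from bisect import bisect_right
from collections import Counter


def bin_edges(values, max_bins=4):
    """Pick at most max_bins - 1 thresholds at evenly spaced ranks (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, max_bins):
        edge = ordered[k * n // max_bins]
        # Keep thresholds strictly increasing so that bins are well defined.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def to_bins(values, edges):
    """Replace each value by the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


feature = [0.1, 0.4, 0.35, 0.8, 0.95, 0.2, 0.6, 0.7]
edges = bin_edges(feature, max_bins=4)
bins = to_bins(feature, edges)

# A histogram of bin counts is all a split search needs to look at.
histogram = Counter(bins)
print(edges, bins, dict(histogram))
```

With bins encoded as small integers, the cost of evaluating candidate splits depends on the number of bins rather than on the number of samples.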

Multiclass classification is performed with ForestClassifier, and regression with ForestRegressor.

Documentation

Documentation is available here:

http://wildwood.readthedocs.io

Installation

The easiest way to install wildwood is with pip:

pip install wildwood

You can also install the latest development version directly from GitHub with

pip install git+https://github.com/pyensemble/wildwood.git

Basic usage

Basic usage follows the standard scikit-learn API. You can simply use

from wildwood import ForestClassifier

clf = ForestClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:, 1]

to train a classifier with all default hyper-parameters. Some of the most useful hyper-parameters are described below.

Categorical features

Avoid one-hot encoding categorical features. Instead, tell WildWood which features are categorical using the categorical_features argument, which is either a boolean mask or an array of indices corresponding to the categorical features.

from wildwood import ForestClassifier

# Assuming columns 0 and 2 are categorical in X
clf = ForestClassifier(categorical_features=[0, 2])
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:, 1]
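
The two accepted forms of categorical_features (boolean mask or array of indices) carry the same information. The sketch below shows the correspondence in plain Python; `indices_to_mask` and `mask_to_indices` are hypothetical helpers written for illustration, not part of WildWood.

```python
def indices_to_mask(indices, n_features):
    """Boolean mask with True at each categorical column (hypothetical helper)."""
    selected = set(indices)
    return [j in selected for j in range(n_features)]


def mask_to_indices(mask):
    """Column indices where the mask is True (hypothetical helper)."""
    return [j for j, is_categorical in enumerate(mask) if is_categorical]


# Columns 0 and 2 are categorical in a dataset with 4 features.
indices = [0, 2]
mask = indices_to_mask(indices, n_features=4)
print(mask)                   # [True, False, True, False]
print(mask_to_indices(mask))  # [0, 2]
```

Either `indices` or `mask` describes the same set of columns, so which one to pass is a matter of convenience.
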

For now, WildWood uses a maximum of 256 modalities (distinct values) for categorical features, since features are encoded internally using a memory-efficient uint8 data type. This will change in the near future.
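
To see where the 256 limit comes from, here is a minimal sketch of an 8-bit encoding of a categorical column in plain Python. It is an illustration under assumptions, not WildWood's internal code: `encode_categorical` is a hypothetical helper, and raising an error when there are too many modalities is a choice made here (the library may handle that case differently).

```python
def encode_categorical(column, max_modalities=256):
    """Map each distinct value to an integer code that fits in one byte (hypothetical helper)."""
    modalities = sorted(set(column))
    if len(modalities) > max_modalities:
        # Assumption: this sketch refuses the column; an unsigned 8-bit
        # integer can only represent the codes 0..255.
        raise ValueError(
            f"{len(modalities)} modalities found, at most {max_modalities} fit in uint8"
        )
    code_of = {value: code for code, value in enumerate(modalities)}
    # A bytes object stores every code as one unsigned 8-bit integer.
    return bytes(code_of[value] for value in column), modalities


codes, modalities = encode_categorical(["red", "blue", "red", "green"])
print(list(codes))  # [2, 0, 2, 1]
print(modalities)   # ['blue', 'green', 'red']
```

Each sample costs a single byte per categorical feature, which is what makes the encoding memory efficient.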

Improved predictions through aggregation with exponential weights

By default (aggregation=True), the predictions produced by WildWood are an aggregation with exponential weights (computed on out-of-bag samples) of the predictions given by all the possible prunings of each tree. This is computed exactly and very efficiently, at a cost close to that of a standard Random Forest (which averages the predictions of leaves). See the documentation for a deeper description of WildWood.
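
The principle of aggregation with exponential weights can be sketched on a toy case. The code below is not WildWood's algorithm, which aggregates over all prunings of a tree exactly without enumerating them. Instead it lists three hypothetical candidate predictions explicitly, each with a made-up out-of-bag loss, and weights them proportionally to exp(-step * loss). The names `exponential_weights`, `aggregate` and `step` are assumptions made for illustration.

```python
import math


def exponential_weights(losses, step=1.0):
    """Normalized weights proportional to exp(-step * loss)."""
    # Subtracting the smallest loss avoids underflow and leaves the
    # normalized weights unchanged.
    smallest = min(losses)
    raw = [math.exp(-step * (loss - smallest)) for loss in losses]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(predictions, losses, step=1.0):
    """Weighted average of the candidate predictions."""
    weights = exponential_weights(losses, step)
    return sum(w * p for w, p in zip(weights, predictions))


# Three hypothetical prunings of one tree: the probability each predicts for
# a test point, and the loss each one accumulated on out-of-bag samples.
predictions = [0.9, 0.6, 0.5]
oob_losses = [2.0, 1.0, 3.0]

weights = exponential_weights(oob_losses)
y_hat = aggregate(predictions, oob_losses)
print(weights, y_hat)
```

Candidates with a smaller out-of-bag loss receive a larger weight, so the aggregated prediction leans toward the prunings that generalize best while still using the others.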

Contributors

stephanegaiffas, yiyang-yu, imerad
