EvoML

Can evolutionary sampling improve bagged ensembles?

Data Efficient Machine Learning Workshop, ICML 2016 NY

https://arxiv.org/pdf/1610.00465.pdf

Introduction

Bagging and its variants have been widely popular and extensively studied over the last two decades. There has been notable work on the theoretical underpinnings of bootstrap aggregating and on what makes it such a powerful method.

In traditional bagging, each training example is sampled with replacement and with probability 1/N, where N is the number of training examples. Adaptive Resampling and Combining techniques, which use heuristics to modify the probability of each training example being sampled, have also been developed and widely used.
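The uniform case can be illustrated with a minimal sketch using only the Python standard library. The helper names `bootstrap_indices` and `out_of_bag` are hypothetical and are not part of EvoML. Each draw picks any of the N rows with probability 1/N, and the rows that are never drawn form the out-of-bag set for that sample.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows indices with replacement; every draw is uniform over the rows."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def out_of_bag(n_rows, sample):
    """Return the row indices that never appear in the bootstrap sample."""
    chosen = set(sample)
    return [i for i in range(n_rows) if i not in chosen]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_rows = 1000
sample = bootstrap_indices(n_rows, rng)
oob = out_of_bag(n_rows, sample)

# For large N the expected out-of-bag fraction is (1 - 1/N)^N, roughly 0.37.
oob_fraction = len(oob) / n_rows
```

An adaptive scheme would replace the uniform `randrange` draw with a weighted one, for example `rng.choices(range(n_rows), weights=...)`.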

Motivation

Several directions are being studied to improve bagged ensembles: random sampling, error-based resampling algorithms that try to drive the train-set error to zero, bagged ensembles designed with minimal intersection (Papakonstantinou et al., 2014), diversity and uncorrelated errors, importance sampling, and so on. Either there are multiple answers to the question of what makes a bagged ensemble better, or the answer changes with each dataset.

How?

Instead of working out precisely which sampling and combination of training sets makes a bagged ensemble better, we fix the definition of better and let the bootstrapped training sets evolve to align with that definition.

We generate multiple sampled candidate training sets for the final ensemble and let them compete, mutate and mate their way to the optimal sampling and combination.
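The compete, mate and mutate cycle can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the EvoML implementation. The toy data, the threshold model, the function names and all parameter values are hypothetical. The fitness used here (accuracy on the rows a candidate did not sample) is only a stand-in; the paper defines the actual fitness functions.

```python
import random


def make_data(n, rng):
    """Toy 1-D two-class data: class 0 centred at 0.0, class 1 centred at 2.0."""
    rows = []
    for _ in range(n):
        label = rng.randrange(2)
        rows.append((rng.gauss(2.0 * label, 1.0), label))
    return rows


def fit_threshold(rows, idx):
    """Toy model: threshold halfway between the class means of the sampled rows."""
    xs0 = [rows[i][0] for i in idx if rows[i][1] == 0]
    xs1 = [rows[i][0] for i in idx if rows[i][1] == 1]
    if not xs0 or not xs1:
        return None  # degenerate sample containing a single class
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0


def fitness(rows, idx):
    """Stand-in fitness (assumption): accuracy on rows this candidate did not sample."""
    threshold = fit_threshold(rows, idx)
    held_out = set(range(len(rows))) - set(idx)
    if threshold is None or not held_out:
        return 0.0
    hits = sum((rows[i][0] > threshold) == (rows[i][1] == 1) for i in held_out)
    return hits / len(held_out)


def mate(a, b, rng):
    """One-point crossover of two equal-length lists of row indices."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(idx, n_rows, rate, rng):
    """Swap each sampled index for a random row with probability `rate`."""
    return [rng.randrange(n_rows) if rng.random() < rate else i for i in idx]


def evolve(rows, pop_size=20, sample_size=30, generations=15, rate=0.05, seed=0):
    """Evolve a population of candidate training sets (lists of row indices)."""
    rng = random.Random(seed)
    n = len(rows)
    pop = [[rng.randrange(n) for _ in range(sample_size)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        pop.sort(key=lambda idx: fitness(rows, idx), reverse=True)
        history.append(fitness(rows, pop[0]))
        parents = pop[: pop_size // 2]  # compete: the fitter half survives
        children = [
            mutate(mate(rng.choice(parents), rng.choice(parents), rng), n, rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    pop.sort(key=lambda idx: fitness(rows, idx), reverse=True)
    return pop, history


rows = make_data(200, random.Random(1))
population, history = evolve(rows)
```

Because the surviving parents are carried over unchanged, the best fitness in `history` never decreases from one generation to the next. The sketch scores each candidate on its own; scoring the ensemble as a whole is a different design choice.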

We use evolutionary sampling in two domains: subsampling, which samples rows of the data, and subspacing, which samples features.
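The difference between the two domains can be shown on a small list-of-lists table. The functions `subsample` and `subspace` are hypothetical names for this sketch. Drawing rows with replacement and columns without replacement are assumptions made for illustration.

```python
import random


def subsample(table, n_rows, rng):
    """Subsampling: draw rows with replacement and keep every feature column."""
    picked = [rng.randrange(len(table)) for _ in range(n_rows)]
    return [table[i] for i in picked]


def subspace(table, n_features, rng):
    """Subspacing: keep every row but only a random subset of feature columns."""
    cols = sorted(rng.sample(range(len(table[0])), n_features))
    return [[row[c] for c in cols] for row in table], cols


rng = random.Random(0)
# 5 rows, 4 features; the values encode (row, column) so they are easy to trace.
table = [[10 * r + c for c in range(4)] for r in range(5)]

rows_view = subsample(table, 5, rng)          # same width, resampled rows
cols_view, kept_cols = subspace(table, 2, rng)  # same height, fewer columns
```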

Playground

We've developed basic evolutionary-sampling-based prediction models for subspacing and subsampling separately. The API is built upon sklearn's fit and predict API. Currently we are experimenting with 3 different fitness functions:

  • FEMPO (Fitness Each Model Private OOB)
  • FEGT (Fitness Ensemble Global Test)
  • FEMPT (Fitness Each Model Private Test)

You can read about them in our paper, Can evolutionary sampling improve bagged ensembles?

Usage

Check out EvoML - Example Usage.ipynb.

Requirements

Python libraries: DEAP, Pandas, sklearn and numpy.

Contribute

In the spirit of reproducibility we've kept the research open and would be thrilled to have contributors and collaborators. Please get in touch with any ideas, or submit an issue if you find a bug.

License

GNU GENERAL PUBLIC LICENSE

Authors

Harsh Nisar and Bhanu Pratap
