ck37 / superlearner-guide

SuperLearner guide: fitting models, ensembling, prediction, hyperparameters, parallelization, timing, feature selection, etc.

superlearner cross-validation statistical-learning ensembles tmle targeted-learning

superlearner-guide's Introduction

SuperLearner Guide

A guide to using SuperLearner for prediction. This is now included as a vignette in the SuperLearner package.

Note: this tutorial is a bit out of date; some supplemental methods are now in my ck37r package.

  • Installing
  • Background
  • Create dataset
  • Review available models
  • Fit single models
  • Fit ensemble
  • Predict on new dataset
  • Customize a model setting
  • External cross-validation
  • Test multiple hyperparameter settings
  • Parallelize across CPUs
  • Distribution of ensemble weights
  • Feature selection (screening)
  • Optimize for AUC
  • XGBoost hyperparameter exploration

Intermediate

(To be created)

  • create.Learner() custom environments
  • SL.caret wrapper
  • Custom learner wrapper
  • Custom screener
  • Library analysis - cumulative
  • Library analysis - individual algorithms
  • Recombine SuperLearner

Advanced

(To be created)

  • Parallelize across computers (SLURM)
  • Repeated cross-validation
  • Data-adaptive V-selection for cross-validation
  • Multi-level meta-learning

Resources

Books:

Campus Groups:

Courses at Berkeley:

  • Stat 154 - Statistical Learning
  • CS 189 / CS 289A - Machine Learning
  • PH 252D - Causal Inference
  • PH 295 - Big Data
  • PH 295 - Targeted Learning for Biomedical Big Data
  • INFO - TBD

There are also many Coursera offerings and other online classes.

References

LeDell, E., Petersen, M. L., & van der Laan, M. J. (2015). Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electronic Journal of Statistics, 9(1).

Polley, E. C., & van der Laan, M. J. (2010). Super learner in prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 266. http://biostats.bepress.com/ucbbiostat/paper266/

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.

superlearner-guide's People

Contributors

ck37


superlearner-guide's Issues

SL.predict annoying failure...

Just as it took the entire math/philosophy world something like 15 years to realize their definition of a set failed miserably due to Russell's paradox, this bug has gone on way too long...

```r
# This should not happen with SL.predict when an algorithm in the library fails.
library(SuperLearner)

# Set up easy regression data.
X = rnorm(100)
Y = 5*X + rnorm(100) + 2
X = data.frame(X = X)

# Set up a bad algorithm.
SL.nothing = function(x) {x}

# Declare the library.
SL.library = c("SL.glm", "SL.nothing")

# Call SuperLearner with the bad algorithm in the library.
fit = SuperLearner(Y, X, family = gaussian(), SL.library = SL.library)

# SL.predict breaks.
preds = fit$SL.predict[1:100]

# library.predict is still fine.
fit$library.predict

# And this workaround breaks if only one learner works:
# preds = fit$library.predict[, fit$coef != 0] %*% fit$coef[fit$coef != 0]

# So I code the following (maybe dumb and inefficient, but it works).
if (length(fit$coef[fit$coef != 0]) == 1) {
  preds = fit$library.predict[, fit$coef != 0]
} else {
  preds = fit$library.predict[, fit$coef != 0] %*% fit$coef[fit$coef != 0]
}

preds
```
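The failure in the report is deliberate. `SL.nothing` takes a single argument `x`, but SuperLearner calls every library entry with named arguments (`Y`, `X`, `newX`, `family`, `obsWeights`, `id`), so that call errors out. For contrast, here is a minimal sketch of a well-formed wrapper. It is modeled on the package's built-in `SL.mean`, and the name `SL.wmean` is hypothetical.

A wrapper that works with SuperLearner needs two things:

- It accepts the standard arguments, absorbing any extras through `...`.
- It returns a list with `pred` and `fit` elements.

A matching `predict()` method, dispatched on the class of `fit`, lets `predict.SuperLearner()` score new data.

```r
library(SuperLearner)

# Hypothetical wrapper name, modeled on the package's built-in SL.mean.
# SuperLearner calls wrappers with named arguments, so the signature must
# accept Y, X, newX, family, obsWeights and absorb the rest (e.g. id) via "...".
SL.wmean = function(Y, X, newX, family, obsWeights, ...) {
  # "Fit": the observation-weighted mean of the outcome.
  meanY = weighted.mean(Y, w = obsWeights)
  # Predict that constant for every row of newX.
  pred = rep.int(meanY, times = nrow(newX))
  fit = list(object = meanY)
  class(fit) = "SL.wmean"
  # Wrappers return a list with elements pred and fit.
  list(pred = pred, fit = fit)
}

# predict() method, used when predicting on new data from the fitted ensemble.
predict.SL.wmean = function(object, newdata, ...) {
  rep.int(object$object, times = nrow(newdata))
}

set.seed(1)
X = data.frame(X = rnorm(100))
Y = 5 * X$X + rnorm(100) + 2

fit = SuperLearner(Y, X, family = gaussian(),
                   SL.library = c("SL.glm", "SL.wmean"))
fit$coef
```

With a well-formed wrapper, no library entry errors out. The default NNLS meta-learner then returns non-negative weights that sum to 1.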

Include HAL maybe: it works and is stellar...

I have been using David's SL wrapper for HAL that comes with his package:

```r
if (!require(devtools)) install.packages("devtools")
devtools::install_github("benkeser/halplus")
```

It is a fabulous wrapper, so maybe we can just include it in the vignette with a warning that one should time it before putting it in the library. Anyway, I'm sure that HAL with screeners will also be excellent even when the dimension is high.
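One way to "time it first" is to call the candidate wrapper once on its own and scale up the result. This is a minimal sketch under stated assumptions:

- `SL.glm` stands in for the slow learner. The exact name of the HAL wrapper in `halplus` is not given here, so substitute it yourself.
- The simulated data are made up for illustration.
- SuperLearner refits each learner once per cross-validation fold plus once on the full data. With the default V = 10, the total is roughly 11 times a single fit.

```r
library(SuperLearner)

set.seed(1)
n = 1000
X = data.frame(X1 = rnorm(n), X2 = rnorm(n))
Y = 5 * X$X1 + rnorm(n) + 2

# Call the wrapper directly, with the arguments SuperLearner would pass,
# and time a single fit. SL.glm is a stand-in for the learner being vetted.
one_fit = system.time(
  SL.glm(Y = Y, X = X, newX = X, family = gaussian(), obsWeights = rep(1, n))
)

# Rough cost inside SuperLearner: V cross-validation fits plus one full fit.
V = 10
est_seconds = (V + 1) * one_fit[["elapsed"]]
est_seconds
```

This is only a rough estimate. Fit time varies with the training-set size in each fold and with any screeners in the library.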
