ILAMM

Nonconvex Regularized Robust Regression via I-LAMM (Iterative Local Adaptive Majorize-Minimization) Algorithm

Description

This package uses the I-LAMM algorithm to solve regularized Huber regression. The available penalty functions are the l1-norm, the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP). The regularization parameter λ is chosen by cross-validation. The Huber robustification parameter τ is calibrated either by cross-validation or via a tuning-free principle. As a by-product, this package also produces regularized least squares estimators, including the Lasso, SCAD and MCP.
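For reference, the three penalties can be written down directly. The R sketch below follows the standard definitions of SCAD (Fan and Li, 2001) and MCP (Zhang, 2010). The defaults a = 3.7 and gamma = 3 are common choices in the literature. They are assumptions here, not necessarily the values ILAMM uses. The function names are hypothetical and not part of the package.

```r
# l1 (Lasso) penalty: lambda * |t|
penL1 <- function(t, lambda) lambda * abs(t)

# SCAD penalty (Fan and Li, 2001); a = 3.7 is an assumed default
penSCAD <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda, lambda * t,
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

# MCP (Zhang, 2010); gamma = 3 is an assumed default
penMCP <- function(t, lambda, gamma = 3) {
  t <- abs(t)
  ifelse(t <= gamma * lambda, lambda * t - t^2 / (2 * gamma), gamma * lambda^2 / 2)
}

# Derivatives with respect to |t|: unlike the constant l1 derivative lambda,
# both vanish once |t| is large, so large coefficients are not shrunk
derivSCAD <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda, lambda, pmax(a * lambda - t, 0) / (a - 1))
}
derivMCP <- function(t, lambda, gamma = 3) pmax(lambda - abs(t) / gamma, 0)
```

For example, with λ = 0.5 and t = 2, `penL1` gives 1, `penSCAD` gives 0.5875 and `penMCP` gives 0.375. Both `derivSCAD` and `derivMCP` return 0 there. This flat tail is why SCAD and MCP reduce the bias that the l1 penalty puts on large coefficients.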

Assume that the observed data (Y, X) follow a linear model Y = X β + ε. Here Y is an n-dimensional response vector, X is an n × d design matrix, and β is a sparse d-dimensional coefficient vector. The noise ε is an n-vector whose distribution can be asymmetric and/or heavy-tailed. The package computes the regularized Huber regression estimator of β.
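To make the estimator concrete, here is a minimal R sketch of the Huber loss and a penalized empirical Huber risk. It assumes the common normalization in which the estimator minimizes (1/n) Σ ℓ_τ(Y_i − x_i'β) + Σ_j p_λ(|β_j|). The exact scaling inside ILAMM may differ. The helpers `huberLoss`, `huberPsi` and `huberObjective` are hypothetical illustrations, not package functions.

```r
# Huber loss: quadratic for |u| <= tau, linear beyond tau
huberLoss <- function(u, tau) {
  ifelse(abs(u) <= tau, u^2 / 2, tau * abs(u) - tau^2 / 2)
}

# Its derivative: residuals clipped to [-tau, tau], so a single large
# residual has bounded influence on the fit
huberPsi <- function(u, tau) pmax(pmin(u, tau), -tau)

# Penalized empirical Huber risk for a coefficient vector beta.
# `penalty` is any function p(t, lambda) applied coordinatewise (l1 by default)
huberObjective <- function(beta, X, Y, tau, lambda,
                           penalty = function(t, lambda) lambda * abs(t)) {
  r <- as.vector(Y - X %*% beta)
  mean(huberLoss(r, tau)) + sum(penalty(beta, lambda))
}
```

The squared loss corresponds to letting τ grow without bound. A finite τ caps the gradient contribution of each observation at τ, which is what gives robustness to heavy-tailed noise.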

With this package, the simulation results in Section 5 of this paper can be reproduced.

Update 2022-05-09

We are wrapping up the package and will submit it to CRAN soon.

Installation

Install ILAMM from GitHub:

install.packages("devtools")
library(devtools)
devtools::install_github("XiaoouPan/ILAMM")
library(ILAMM)

Getting help

Help on the functions can be accessed by typing ? followed by the function name at the R command prompt.

For example, ?ncvxHuberReg displays detailed documentation for the function ncvxHuberReg, including its inputs, outputs and examples.

Common error messages

The package ILAMM is implemented in Rcpp and RcppArmadillo, so the following error messages might appear when you first install it. We will keep updating this list of common errors based on feedback from users.

  • Error: "...could not find build tools necessary to build ILAMM": on Windows you need Rtools; on Mac OS X you need to install the Command Line Tools for Xcode. See this link for details.

  • Error: "library not found for -lgfortran/-lquadmath": your gfortran binaries are out of date. This is a common environment-specific issue.

    1. For R 3.0.0 to R 3.3.0: upgrading to R 3.4 is strongly recommended; then go to the next step. Alternatively, you can try the instructions here.

    2. For R 3.4.* and later: download the installer here, then run it.

Functions

There are five functions, all of which are based on the I-LAMM algorithm.

  • ncvxReg: Nonconvex regularized regression (Lasso, SCAD, MCP).
  • ncvxHuberReg: Nonconvex regularized Huber regression (Huber-Lasso, Huber-SCAD, Huber-MCP).
  • cvNcvxReg: K-fold cross-validation for nonconvex regularized regression.
  • cvNcvxHuberReg: K-fold cross-validation for nonconvex regularized Huber regression.
  • tfNcvxHuberReg: Tuning-free nonconvex regularized Huber regression.
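All five functions rely on I-LAMM. The R sketch below roughly illustrates the idea from Fan, Liu, Sun and Zhang (2018), applied to SCAD-penalized Huber regression. It runs in two parts:

- a contraction stage, which solves a Huber-Lasso problem;
- tightening stages, each solving an l1 problem reweighted by the SCAD derivative at the previous estimate.

Every stage is solved by LAMM. LAMM is a proximal gradient method whose quadratic coefficient φ is inflated until the local surrogate majorizes the loss. This is a simplified sketch under stated assumptions: no intercept, a fixed number of stages, a step-size stopping rule and SCAD only. It is not the package's C++ implementation. The names `lamm` and `ilammSCAD` and the defaults `phi0`, `gammaU`, `eps` and `nStages` are hypothetical.

```r
# Simplified I-LAMM sketch for SCAD-penalized Huber regression (no intercept).
# Not the ILAMM package's implementation; names and defaults are hypothetical.

# Soft-thresholding: proximal operator of the weighted l1 penalty
softThresh <- function(z, w) sign(z) * pmax(abs(z) - w, 0)

# Empirical Huber risk (1/n) sum l_tau(y_i - x_i' beta) and its gradient
huberRisk <- function(beta, X, Y, tau) {
  r <- as.vector(Y - X %*% beta)
  mean(ifelse(abs(r) <= tau, r^2 / 2, tau * abs(r) - tau^2 / 2))
}
huberGrad <- function(beta, X, Y, tau) {
  r <- as.vector(Y - X %*% beta)
  -as.vector(crossprod(X, pmax(pmin(r, tau), -tau))) / nrow(X)
}

# LAMM: proximal gradient steps for min risk(beta) + sum(w * |beta|).
# phi is multiplied by gammaU until the isotropic quadratic surrogate
# majorizes the risk at the proposed point.
lamm <- function(beta, X, Y, tau, w, phi0 = 1e-4, gammaU = 2,
                 eps = 1e-5, maxIter = 1000) {
  phi <- phi0
  for (k in seq_len(maxIter)) {
    loss <- huberRisk(beta, X, Y, tau)
    grad <- huberGrad(beta, X, Y, tau)
    repeat {
      betaNew <- softThresh(beta - grad / phi, w / phi)
      delta <- betaNew - beta
      surrogate <- loss + sum(grad * delta) + phi / 2 * sum(delta^2)
      if (huberRisk(betaNew, X, Y, tau) <= surrogate) break
      phi <- gammaU * phi
    }
    beta <- betaNew
    if (sqrt(sum(delta^2)) <= eps) break   # simplified stopping rule
    phi <- max(phi0, phi / gammaU)         # allow larger steps next iteration
  }
  beta
}

# Stage 1 starts from beta = 0, where every SCAD weight equals lambda
# (a Huber-Lasso fit); later stages reweight by the SCAD derivative.
ilammSCAD <- function(X, Y, lambda, tau, a = 3.7, nStages = 4) {
  beta <- rep(0, ncol(X))
  for (s in seq_len(nStages)) {
    absb <- abs(beta)
    w <- ifelse(absb <= lambda, lambda, pmax(a * lambda - absb, 0) / (a - 1))
    beta <- lamm(beta, X, Y, tau, w)   # warm start from the previous stage
  }
  beta
}
```

Coordinates that are already large receive zero weight in the tightening stages, so they are fitted essentially without shrinkage. In practice, λ and τ would be chosen by cross-validation or by the tuning-free principle, as the package functions do.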

Examples

Here we generate data from a sparse linear model Y = X β + ε. β is sparse, and ε consists of independent coordinates drawn from a centered log-normal distribution, which is asymmetric and heavy-tailed.

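library(ILAMM)
n = 50
d = 100
set.seed(2018)
# Gaussian design: n observations of d covariates
X = matrix(rnorm(n * d), n, d)
# Sparse coefficients: the first 3 equal 2, the remaining d - 3 are zero
beta = c(rep(2, 3), rep(0, d - 3))
# Log-normal noise centered at zero: E[exp(Z)] = exp(1.2^2 / 2) for Z ~ N(0, 1.2^2)
Y = X %*% beta + rlnorm(n, 0, 1.2) - exp(1.2^2 / 2)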
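The centering constant exp(1.2^2 / 2) is the mean of a log-normal(0, 1.2) variable. The median of that variable is exp(0) = 1. A quick Monte Carlo check in base R shows both facts. After centering, the noise has mean close to 0 but a clearly negative median, which reflects its right skew. The variable names are illustrative only.

```r
# Monte Carlo check of the centered log-normal noise used above
set.seed(1)
sigma <- 1.2
noise <- rlnorm(1e6, meanlog = 0, sdlog = sigma) - exp(sigma^2 / 2)

# Sample mean is near 0; sample median is near 1 - exp(sigma^2 / 2), about -1.05
c(mean = mean(noise),
  median = median(noise),
  theoreticalMedian = 1 - exp(sigma^2 / 2))
```

Because mean and median disagree, a loss that treats positive and negative residuals symmetrically has to balance robustness against bias. This is why τ must be calibrated rather than fixed.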

First, we fit the Lasso to (Y, X) as a benchmark. The cross-validated Lasso produces an overfitted model with many false positives.

fitLasso = cvNcvxReg(X, Y, penalty = "Lasso")
betaLasso = fitLasso$beta

Next, we apply two non-convex regularized least squares methods, SCAD and MCP, to the data. Non-convex penalties reduce the bias introduced by the l1 penalty.

fitSCAD = cvNcvxReg(X, Y, penalty = "SCAD")
betaSCAD = fitSCAD$beta
fitMCP = cvNcvxReg(X, Y, penalty = "MCP")
betaMCP = fitMCP$beta

We then fit Huber regression with non-convex penalties, Huber-SCAD and Huber-MCP, to (Y, X). Under heavy-tailed noise, Huber-SCAD and Huber-MCP show clear advantages over their least squares counterparts, SCAD and MCP.

fitHuberSCAD = cvNcvxHuberReg(X, Y, penalty = "SCAD")
betaHuberSCAD = fitHuberSCAD$beta
fitHuberMCP = cvNcvxHuberReg(X, Y, penalty = "MCP")
betaHuberMCP = fitHuberMCP$beta

Finally, we demonstrate non-convex regularized Huber regression with τ calibrated via a tuning-free procedure. This is computationally more efficient than cvNcvxHuberReg, because cross-validation is used only to choose the regularization parameter λ. More details on the tuning-free procedure can be found in Wang et al. (2018).

fitHuberSCAD.tf = tfNcvxHuberReg(X, Y, penalty = "SCAD")
betaHuberSCAD.tf = fitHuberSCAD.tf$beta
fitHuberMCP.tf = tfNcvxHuberReg(X, Y, penalty = "MCP")
betaHuberMCP.tf = fitHuberMCP.tf$beta

The table below summarizes the performance of the above methods by six measures: true positives (TP), false positives (FP), true positive rate (TPR), false positive rate (FPR), l1 error and l2 error. These results can easily be reproduced.

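| Method | TP | FP | TPR | FPR | l1 error | l2 error |
|---|---|---|---|---|---|---|
| Lasso | 3 | 17 | 1 | 0.175 | 5.014 | 1.356 |
| SCAD | 3 | 3 | 1 | 0.031 | 1.219 | 0.741 |
| MCP | 3 | 0 | 1 | 0 | 1.156 | 0.795 |
| Huber-SCAD | 3 | 1 | 1 | 0.010 | 0.710 | 0.402 |
| Huber-MCP | 3 | 0 | 1 | 0 | 0.611 | 0.354 |
| TF-Huber-SCAD | 3 | 1 | 1 | 0.010 | 0.710 | 0.402 |
| TF-Huber-MCP | 3 | 0 | 1 | 0 | 0.611 | 0.354 |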

To obtain more reliable results, users can repeat the above simulation on many independently generated datasets, possibly of larger scale, and average the summary statistics.
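Below is a minimal R sketch of such a repeated simulation. It is consistent with the table above: for example, FPR = FP / (d − 3), so the Lasso's 17 false positives give 17/97 ≈ 0.175. The helpers `summarizeFit` and `repeatSim` are hypothetical, and `fitFun` is any user-supplied fitting function. One assumption concerns the output format: if the returned coefficient vector has length d + 1, the sketch treats its first entry as an intercept and drops it. Check `?cvNcvxReg` for the actual layout.

```r
# Support recovery and estimation error for one fitted coefficient vector
summarizeFit <- function(betaHat, betaTrue) {
  sel <- betaHat != 0
  truth <- betaTrue != 0
  TP <- sum(sel & truth)
  FP <- sum(sel & !truth)
  c(TP = TP, FP = FP,
    TPR = TP / sum(truth), FPR = FP / sum(!truth),
    l1 = sum(abs(betaHat - betaTrue)),
    l2 = sqrt(sum((betaHat - betaTrue)^2)))
}

# Regenerate data as in the example nRep times, fit, and average the statistics
repeatSim <- function(fitFun, nRep = 50, n = 50, d = 100, seed = 2018) {
  set.seed(seed)
  betaTrue <- c(rep(2, 3), rep(0, d - 3))
  stats <- replicate(nRep, {
    X <- matrix(rnorm(n * d), n, d)
    Y <- X %*% betaTrue + rlnorm(n, 0, 1.2) - exp(1.2^2 / 2)
    betaHat <- as.vector(fitFun(X, Y))
    # Assumption: a length-(d + 1) vector stores the intercept first
    if (length(betaHat) == d + 1) betaHat <- betaHat[-1]
    summarizeFit(betaHat, betaTrue)
  })
  rowMeans(stats)   # one averaged value per summary statistic
}
```

With the package installed, a call might look like `repeatSim(function(X, Y) cvNcvxHuberReg(X, Y, penalty = "MCP")$beta, nRep = 100)`. Because every replicate runs a full cross-validation, expect this to take a while.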

Notes

The function cvNcvxHuberReg is slower than the others because it carries out a two-dimensional grid search to choose both λ and τ via cross-validation.
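The extra cost comes from the grid size. K-fold cross-validation over a λ grid and a τ grid needs K × (number of λ values) × (number of τ values) fits. Tuning λ alone needs only K × (number of λ values). The generic R sketch below shows that search. `cvGrid2D` and its argument `fitFun(X, Y, lambda, tau)` are hypothetical placeholders. The squared-error validation criterion is an assumption, not necessarily what cvNcvxHuberReg uses internally.

```r
# Generic K-fold cross-validation over a two-dimensional (lambda, tau) grid
cvGrid2D <- function(X, Y, lambdas, taus, fitFun, K = 5, seed = 1) {
  set.seed(seed)
  Y <- as.vector(Y)
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))
  err <- matrix(0, length(lambdas), length(taus))
  for (k in seq_len(K)) {
    test <- folds == k
    for (i in seq_along(lambdas)) {
      for (j in seq_along(taus)) {
        b <- fitFun(X[!test, , drop = FALSE], Y[!test], lambdas[i], taus[j])
        res <- Y[test] - as.vector(X[test, , drop = FALSE] %*% b)
        err[i, j] <- err[i, j] + mean(res^2)  # assumed validation criterion
      }
    }
  }
  best <- arrayInd(which.min(err), dim(err))  # (row, column) of smallest error
  list(lambda = lambdas[best[1]], tau = taus[best[2]],
       nFits = K * length(lambdas) * length(taus))
}
```

With K = 5, 50 λ values and 10 τ values, this amounts to 2500 fits, compared with 250 when only λ is cross-validated.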

License

GPL (>= 2)

System requirements

C++11

Authors

Xiaoou Pan, Qiang Sun, Wen-Xin Zhou

Maintainer

Xiaoou Pan

Reference

Eddelbuettel, D. and Francois, R. (2011). Rcpp: Seamless R and C++ integration. J. Stat. Softw. 40(8) 1-18.

Eddelbuettel, D. and Sanderson, C. (2014). RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Comput. Statist. Data Anal. 71 1054-1063.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360.

Fan, J., Li, Q. and Wang, Y. (2017). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 247-265.

Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann. Statist. 46 814-841.

Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73-101.

Pan, X., Sun, Q. and Zhou, W.-X. (2019). Iteratively reweighted l1-penalized robust regression. Preprint.

Sanderson, C. and Curtin, R. (2016). Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 1 26.

Sun, Q., Zhou, W.-X. and Fan, J. (2019). Adaptive Huber regression. J. Amer. Statist. Assoc. 0 1-12.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267-288.

Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2018). A new principle for tuning-free Huber regression. Preprint.

Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942.
