jolars / slope

Sorted L1 Penalized Estimation

Home Page: https://jolars.github.io/SLOPE

License: GNU General Public License v3.0

Languages: R 66.80%, C++ 30.76%, TeX 1.36%, Makefile 0.57%, C 0.51%
Topics: slope, sparse-regression, r, generalized-linear-models

slope's Introduction

SLOPE


Efficient implementations of Sorted L-One Penalized Estimation (SLOPE): generalized linear models regularized with the sorted L1 norm. There is support for ordinary least-squares regression, binomial regression, multinomial regression, and Poisson regression, as well as both dense and sparse predictor matrices. In addition, the package features predictor screening rules that enable efficient solutions to high-dimensional problems.
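A minimal usage sketch on simulated data (the data-generating code below is purely illustrative; only SLOPE() with its family argument and coef() are the package's API):

library(SLOPE)

# simulated data: 100 observations, 10 predictors, 2 nonzero coefficients
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- x %*% c(3, -2, rep(0, 8)) + rnorm(100)

# fit a full regularization path with the sorted L1 penalty
fit <- SLOPE(x, y, family = "gaussian")
coef(fit)  # coefficients, one column per penalty strength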

Installation

You can install the current stable release from CRAN with

install.packages("SLOPE")

or the development version from GitHub with

# install.packages("remotes")
remotes::install_github("jolars/SLOPE")

Versioning

SLOPE uses semantic versioning.

Code of conduct

Please note that the 'SLOPE' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

slope's People

Contributors

dependabot[bot], dominikrafacz, jolars, kolodziejczykjakub, krystynagrzesiak, michbur


slope's Issues

Erroneous result when x is identity

SLOPE() returns the wrong solution for the following problem:

library(SLOPE)

n <- p <- 4

X <- diag(n)
y <- c(8, 6, 4, 2)
lambda <- c(4, 3, 2, 1)

A <- SLOPE(
  X,
  y,
  family = "gaussian",
  intercept = FALSE,
  center = FALSE,
  scale = "none",
  lambda = lambda / n,
  alpha = 1,
  verbosity = 3
)
#> penalty:  0, dev: 6.907e-310, dev ratio:   1.000, dev change:   0.000, n var:     0, n unique:     0

beta <- coef(A)
0.5 * norm(X %*% beta - y, "2")^2 + sum(sort(abs(beta), decreasing = TRUE)*lambda)
#> [1] 60

beta <- c(4, 3, 2, 1)
0.5 * norm(X %*% beta - y, "2")^2 + sum(sort(abs(beta), decreasing = TRUE)*lambda)
#> [1] 45

Created on 2021-03-16 by the reprex package (v1.0.0)

The result should clearly be c(4, 3, 2, 1).
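For reference, with X equal to the identity, the SLOPE estimate is by definition the proximal operator of the sorted L1 norm evaluated at y. Here y is already sorted and y - lambda is nonnegative and nonincreasing, so the prox reduces to elementwise thresholding; a quick hand check (not package code) confirms the expected solution:

y <- c(8, 6, 4, 2)
lambda <- c(4, 3, 2, 1)

# y - lambda is nonincreasing and nonnegative, so no averaging (PAVA) step
# is needed and the prox is simple thresholding
pmax(y - lambda, 0)
#> [1] 4 3 2 1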

refactor: switch from armadillo to Eigen

The SLOPE package appears to be slow for some people because their R installations use the reference BLAS/LAPACK libraries that ship with R, which are far slower than optimized alternatives such as OpenBLAS. If we switch from Armadillo to Eigen, we can avoid this issue entirely, since all of Eigen's linear algebra is implemented in C++ headers and does not route through R's BLAS.
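As a quick diagnostic, users can check which BLAS/LAPACK their R session links against (these fields are available in sessionInfo() since R 3.4.0); paths pointing at R's bundled libraries indicate the reference implementations:

si <- sessionInfo()
si$BLAS    # e.g. a path ending in libRblas indicates the reference BLAS
si$LAPACK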

Release SLOPE 0.3.0

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

trainSLOPE for multinomial case

In the case of the multinomial family, the measures calculated by trainSLOPE are incorrect.

Using the following code

set.seed(42)
xy <- SLOPE:::randomProblem(100, p = 20, response = "multinomial")
x <- xy$x
y <- xy$y
fit <- trainSLOPE(x, y, q = c(0.1, 0.2), number = 2, family = "multinomial")

we obtain:

Call:
trainSLOPE(x = x, y = y, q = c(0.1, 0.2), number = 2, family = "multinomial")

Optimum values:
    q        alpha  measure       mean         se         lo        hi
1 0.2 1.965713e-04 deviance 0.03233713 0.01546872 -0.1642116 0.2288858
2 0.2 1.210581e-04      mae 0.07141966 0.04495552 -0.4997944 0.6426337
3 0.2 4.591359e-05      mse 0.09408138 0.03977632 -0.4113247 0.5994874

When we change the line https://github.com/jolars/SLOPE/blob/master/R/score.R#L108 to

   mse = apply((y - y_hat)^2, c(1, 3), mean) + 1,

we get

Call:
trainSLOPE(x = x, y = y, q = c(0.1, 0.2), number = 2, family = "multinomial")

Optimum values:
    q        alpha  measure     mean         se        lo       hi
1 0.2 1.965713e-04 deviance 1.032337 0.01546872 0.8357884 1.228886
2 0.2 1.210581e-04      mae 1.071420 0.04495552 0.5002056 1.642634
3 0.2 4.591359e-05      mse 1.094081 0.03977632 0.5886753 1.599487

Thus, a change to the mse computation shifts the other measures as well, which suggests that the three measures share state rather than being computed independently.
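For comparison, here is a minimal sketch of computing the three measures independently of one another; the array shapes and names are illustrative assumptions, not the package's internals:

# y_hat: n x m x k array of predicted class probabilities (k path points);
# y: matching one-hot response array, replicated along the path dimension
n <- 5; m <- 3; k <- 2
set.seed(1)
y_hat <- array(runif(n * m * k), c(n, m, k))
y <- array(rep(diag(m)[sample(m, n, replace = TRUE), ], k), c(n, m, k))

mse <- apply((y - y_hat)^2, 3, mean)        # one value per path point
mae <- apply(abs(y - y_hat), 3, mean)
deviance <- apply(-2 * y * log(y_hat), 3, sum) / n

Each score here is a pure function of y and y_hat, so an edit to one line cannot shift the others.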

Release SLOPE 0.2.0

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Outputs of FISTA and ADMM don't match even after setting 'intercept=FALSE'

I came across the following test:

library(SLOPE)

set.seed(1)
n <- 10
p <- 20
d <- SLOPE:::randomProblem(n, p, response = "gaussian")
fit1 <- SLOPE(d$x, d$y, alpha = c(1, 0), intercept = FALSE, verbosity = 3)
fit2 <- SLOPE(d$x, d$y, alpha = c(1, 0), solver = "admm", intercept = FALSE, verbosity = 3)
print(coef(fit1) - coef(fit2))

The coefficients for alpha = 0 don't match. The two solvers also disagree when alpha = c(1.0, 0.005), which I am currently looking into. This seemed like a critical issue for the SLOPE package as well, so I raised it here. From a 10,000-foot view, the duality gap is blowing up.
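A possibly useful triage step (the column indexing below assumes the second path point corresponds to alpha = 0): with alpha = 0 and p > n the unpenalized least-squares problem has infinitely many minimizers, so comparing the loss each solver attains distinguishes a genuinely wrong solution from two different minimizers of the same objective:

loss <- function(beta) 0.5 * norm(d$x %*% beta - d$y, "2")^2

loss(coef(fit1)[, 2]) - loss(coef(fit2)[, 2])  # near zero would mean both are optimal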

Release SLOPE 0.3.1

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Consider removing `caretSLOPE()`

Apparently there are issues with the caret package that cause errors in the latest R-devel. I received the following e-mail just now:

See the logs for D2MCS SLOPE TSGS ampir varrank at

https://www.stats.ox.ac.uk/pub/bdr/LENGTH1/

(with reproduction details in the 00README.txt file).

All are showing issues in the use of && inside caret.

Even if this is incorrect usage, caret needs to add sanity checks
and we would ask you to liaise with the maintainers if they need to
make changes.

Please correct before 2022-03-11 to safely retain your package on CRAN.

Note that this will be the final reminder.

--

Here's our log: https://www.stats.ox.ac.uk/pub/bdr/LENGTH1/SLOPE.out

I'm thinking that we should just deprecate (or rather make defunct) the caret wrapper, since caret is more or less superseded by tidymodels anyway. Thoughts?

Reconsider penalty scaling for SLOPE

In SLOPE version 0.3.0 and above, the penalty in the SLOPE objective is scaled depending on the type of scaling that is used in the call to SLOPE(). The behavior is:

  • for scaling = "l1", no scaling is applied
  • for scaling = "l2", the penalty is scaled by sqrt(n)
  • for scaling = "sd", the penalty is scaled by n.

There are advantages and disadvantages of doing this kind of scaling, and I think a discussion is warranted regarding what the correct behavior should be.

Pros

  • Regularization strength is independent from the number of observations, which means that the same level of regularization is applied over, for instance, differently sized resamples in cross-validation or when fitting a trained model on a test data set.
  • Scaling the penalty is standard practice in many implementations of l1-regularized models, such as glmnet, ncvreg, and biglasso.
  • Having regularization strength independent from the number of observations means that the model can still control for misspecification as n becomes large.

Cons

  • The fact that the penalty scaling differs depending on type of standardization can be confusing.
  • Overfitting becomes less and less of an issue as n becomes larger, so it makes sense to decrease the regularization strength as n grows.
  • The model definition is now somewhat different from the definitions used in almost all publications, which also means that the interpretation of the alpha parameter as variance in the orthogonal X case is lost.
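To make the trade-off concrete, here is a closed-form sketch (not package code) using the lasso special case with X'X = n * I, where the solution is soft-thresholding of the least-squares estimate:

soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

b_ols <- 2.5  # least-squares estimate of a single coefficient
alpha <- 1
n <- c(10, 100, 1000)

# unscaled penalty: the effective threshold alpha / n vanishes as n grows,
# so the regularization washes out in large samples
soft(b_ols, alpha / n)
#> [1] 2.400 2.490 2.499

# penalty scaled by n: the threshold stays at alpha for every n,
# so the same shrinkage is applied regardless of sample size
soft(b_ols, alpha)
#> [1] 1.5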

Possible solutions

Whichever way we go with this, I think we should keep the other option available as a toggle, i.e. add an argument along the lines of penalty_scaling to turn penalty scaling on or off, or even to provide a more fine-grained type of penalty scaling. That way, it would be possible to achieve either behavior, which means this discussion is really about what the default should be.

Thoughts? Ideas?

References

Hastie et al. (2015) mention that scaling with n is "useful for cross-validation" and makes lambda values comparable across differently sized samples, but otherwise don't seem to discuss it.

  • Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations (1st ed.). Chapman and Hall/CRC.

scikit-learn has a brief article covering these things here: https://scikit-learn.org/stable/auto_examples/svm/plot_svm_scale_c.html

Release SLOPE 0.3.2

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
