
boost-r / fdboost


Boosting Functional Regression Models. The current release version can be found on CRAN (http://cran.r-project.org/package=FDboost).

Language: R
Topics: boosting, boosting-algorithms, cran, function-on-function-regression, function-on-scalar-regression, machine-learning, scalar-on-function-regression, variable-selection

Contributors: almond-s, davidruegamer, druegamer, eva2703, sbrockhaus


fdboost's Issues

applyFolds() with factor variable that has small group levels

If a level of a factor has only very few observations, that level can be empty in the training set of a resampling fold. applyFolds() then fails in this fold and simply does not use it.

require(FDboost)
require(fda)

data("CanadianWeather", package = "fda")
CanadianWeather$l10precip <- t(log(CanadianWeather$monthlyPrecip))
CanadianWeather$temp <- t(CanadianWeather$monthlyTemp)
CanadianWeather$region <- factor(CanadianWeather$region)
CanadianWeather$month.s <- CanadianWeather$month.t <- 1:12

## center the temperature curves per time-point
CanadianWeather$temp <- scale(CanadianWeather$temp, scale = FALSE)
rownames(CanadianWeather$temp) <- NULL ## delete row-names

## fit model with cyclic splines over the year
mod3 <- FDboost(l10precip ~ bols(region, df = 2.5, contrasts.arg = "contr.dummy") 
                + bsignal(temp, month.s, knots = 11, cyclic = TRUE, 
                          df = 2.5, boundary.knots = c(0.5,12.5), check.ident = FALSE), 
                timeformula = ~ bbs(month.t, knots = 11, cyclic = TRUE, 
                                    df = 3, boundary.knots = c(0.5, 12.5)), 
                offset = "scalar", offset_control = o_control(k_min = 5), 
                control = boost_control(mstop = 60), 
                data = CanadianWeather) 

## Not run:                   
#### find the optimal mstop using 5 bootstrap samples
## using the function applyFolds 
set.seed(123)
folds3 <- cv(rep(1, length(unique(mod3$id))), B = 5)


## error in the first fold: one factor level is empty in the training set
## but needed in the test set
appl3 <- applyFolds(mod3, folds = folds3, grid = 1:200)

## just for comparison: use cvrisk
cvm3 <- cvrisk(mod3, folds = folds3[mod3$id, ], grid = 1:200)

par(mfrow= c(1,2))
plot(appl3)
plot(cvm3)

@davidruegamer do we want to fix applyFolds() for this case? If yes, predict 0 for unobserved factor levels?
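Whichever fix is chosen, the affected folds can be detected up front by checking, for each fold, which factor levels receive no weight in the training set. The sketch below uses a hypothetical helper, `empty_levels_per_fold()`, which is not part of FDboost. It assumes that `folds` is a weight matrix with one row per observation (curve) and one column per fold, as returned by `cv()`, and that an observation is in the training set of fold b whenever its weight in column b is positive.

```r
## Hypothetical helper, not part of FDboost:
## for every fold (column of `folds`), list the levels of `fac`
## that have no observation with positive weight in the training set.
empty_levels_per_fold <- function(fac, folds) {
  stopifnot(is.factor(fac), is.matrix(folds), nrow(folds) == length(fac))
  lapply(seq_len(ncol(folds)), function(b) {
    in_training <- folds[, b] > 0
    setdiff(levels(fac), as.character(unique(fac[in_training])))
  })
}

## Toy example: level "c" has a single observation, which is
## left out of the training set in the second fold.
fac <- factor(c("a", "a", "b", "b", "c"))
folds <- cbind(c(1, 1, 1, 1, 1),
               c(2, 0, 1, 1, 0))
res <- empty_levels_per_fold(fac, folds)
res
```

For the example above, `empty_levels_per_fold(CanadianWeather$region, folds3)` should list, per bootstrap sample, the regions missing from the training data; those are the folds in which a "predict 0 for unobserved levels" rule would have to kick in.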

predict for bsignal seems broken

The example from the FLAM_fuel vignette appears to be broken: the predict() step throws an error.

# From the vignette:
library(FDboost)
data(fuelSubset)
fuel <- fuelSubset
fuel$dUVVIS <- t(apply(fuel$UVVIS, 1, diff))
fuel$dNIR <- t(apply(fuel$NIR, 1, diff))
fuel$duvvis.lambda <- fuel$uvvis.lambda[-1]
fuel$dnir.lambda <- fuel$nir.lambda[-1]

modH2O <- FDboost(h2o ~ bsignal(UVVIS, uvvis.lambda, knots=40, df=4)
                    + bsignal(NIR, nir.lambda, knots=40, df=4)
                    + bsignal(dUVVIS, duvvis.lambda, knots=40, df=4)
                    + bsignal(dNIR, dnir.lambda, knots=40, df=4),
                    timeformula=~bols(1), data=fuel)

# predict() throws an error
predict(modH2O, fuel)
#> Error in h(simpleError(msg, call)) :
#>   error in evaluating the argument 'x' in selecting a method for function 'which': subscript out of bounds

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] FDboost_0.3-3     mboost_2.9-2      stabs_0.6-3       checkmate_2.0.0  
[5] mlr_2.17.1.9004   testthat_2.3.2    ParamHelpers_1.14

Fix applyFolds for irregular grid

The following lines have been removed from the FDboost help page because they throw an error:

  ## Find optimal mstop, small grid/low B for a fast example
  set.seed(123)
  folds4 <- cv(rep(1, length(unique(mod4$id))), B = 3)
  appl4 <- applyFolds(mod4, folds = folds4, grid = 1:50)
  ## val4 <- validateFDboost(mod4, folds = folds4, grid = 1:50)

  set.seed(123)
  folds4long <- cvLong(id = mod4$id, weights = model.weights(mod4), B = 3)
  cvm4 <- cvrisk(mod4, folds = folds4long, grid = 1:50)
  mstop(cvm4)

@Almond-S found that this is related to applyFolds(), which does not work correctly with irregular data.

Check applyFolds for scalar response

applyFolds() does not correctly handle matrix covariates:

library(FDboost)
data("fuelSubset", package = "FDboost")

# center the functional covariates per observed wavelength
fuelSubset$UVVIS <- scale(fuelSubset$UVVIS, scale = FALSE)
fuelSubset$NIR <- scale(fuelSubset$NIR, scale = FALSE)

# fit the scalar-on-function regression model 
sof <- FDboost(heatan ~ bbs(h2o, df = 4) 
               + bsignal(UVVIS, uvvis.lambda, knots = 40, df = 4) 
               + bsignal(NIR, nir.lambda, knots = 40, df = 4), 
               timeformula = NULL, data = fuelSubset)


test <- applyFolds(sof)

The error occurs because UVVIS and the other matrices are treated as long-format variables and passed as such to reweightData(), which then builds an incorrect new data set.

Also: fitfct in applyFolds() neither needs nor uses oobweights:
https://github.com/fdboost/FDboost/blob/master/R/crossvalidation.R#L176
Bug? Just remove it?
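To illustrate the reweightData() problem: when data are resampled by observation, a matrix covariate has to be subset by row while an evaluation grid such as uvvis.lambda must stay untouched; treating the matrix as a long vector subsets individual entries instead. The sketch below is a hypothetical base-R helper, `subset_obs()`, not FDboost's reweightData(). It assumes that every vector whose length equals the number of observations `n` is a per-observation variable, so a grid that happens to have length `n` would be subset by mistake.

```r
## Hypothetical helper, not FDboost's reweightData():
## keep the observations `idx` of a data list with n observations.
subset_obs <- function(data, idx, n) {
  lapply(data, function(v) {
    if (is.matrix(v) && nrow(v) == n) {
      v[idx, , drop = FALSE]        # functional covariate: one row per observation
    } else if (is.null(dim(v)) && length(v) == n) {
      v[idx]                        # scalar variable: one entry per observation
    } else {
      v                             # e.g. an evaluation grid: keep as is
    }
  })
}

## Toy data: 4 observations, a functional covariate on a grid of 3 points
dat <- list(y = c(1.5, 2.0, 0.7, 3.1),
            X = matrix(1:12, nrow = 4),
            grid = c(0, 0.5, 1))
sub <- subset_obs(dat, idx = c(1, 3), n = 4)
str(sub)
```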

Remaining errors in test file

  • plot throws an error for model m1: length(x) == length(y) & length(y) == length(id) is not TRUE

  • plot.bootstrapCI fails for m4, m4i, m4iii, m4iv, ms1a and ms2. For the historical models, this is because the function cannot handle interaction effects, which are stored as lists of matrices. For the scalar models, it is because the line temp$value <- split(temp_CI, seq(nrow(temp_CI))) in plot.bootstrapCI cannot handle scalar intercepts.

  • predict, and potentially many methods that build on it, throw an error for m4ii (most likely because of the use of myBlg)

  • plotResiduals, validateFDboost and applyFolds for factor response ('not implemented yet', see also #7 )

  • coef(object[0]) fails for m4iv

bug in bhist() when limits are specified within bhist()

@davidruegamer reported the following bug, which only occurs in bhist() when limits is specified as an inline function within the bhist() call and the variable passed to s is not named s (here: mys):

library(FDboost)
library(refund)

############################################
# model with functional historical effect, use bhist() 
# Y(t) = f(t)  + \int_0^t X1(s)\beta(s,t)ds + eps
set.seed(2121)
mylimits <- function(s, t){
  (s < t) | (s == t)
}
data2 <- pffrSim(scenario = "ff", n = 40, limits = mylimits)
data2$X1 <- scale(data2$X1, scale = FALSE)
dat2_list <- as.list(data2)
dat2_list$myt <- attr(data2, "yindex")
dat2_list$mys <- attr(data2, "xindex")

## works
m1 <- FDboost(Y ~ 1 + bhist(x = X1, s = mys, time = myt, knots = 5, limits = mylimits), 
              timeformula = ~ bbs(myt, knots = 5), data = dat2_list, 
              control = boost_control(mstop = 40))


## bug
m2 <- FDboost(Y ~ 1 + bhist(x = X1, s = mys, time = myt, knots = 5, limits = function(s, t){(s < t) | (s == t)}), 
              timeformula = ~ bbs(myt, knots = 5), data = dat2_list, 
              control = boost_control(mstop = 40))

predict fails for interaction of bsignal baselearners

library(FDboost)
#> Loading required package: mboost
#> Loading required package: parallel
#> Loading required package: stabs
#> This is mboost 2.9-1. See 'package?mboost' and 'news(package  = "mboost")'
#> for a complete list of changes.
#> This is FDboost 0.3-2.

######## Example for scalar-on-function-regression with bsignal()  
data("fuelSubset", package = "FDboost")

## center the functional covariates per observed wavelength
fuelSubset$UVVIS <- scale(fuelSubset$UVVIS, scale = FALSE)
fuelSubset$NIR <- scale(fuelSubset$NIR, scale = FALSE)

## to make mboost:::df2lambda() happy (all design matrix entries < 10)
## reduce range of argvals to [0,1] to get smaller integration weights
fuelSubset$uvvis.lambda <- with(fuelSubset, (uvvis.lambda - min(uvvis.lambda)) /
                                  (max(uvvis.lambda) - min(uvvis.lambda) ))
fuelSubset$nir.lambda <- with(fuelSubset, (nir.lambda - min(nir.lambda)) /
                                (max(nir.lambda) - min(nir.lambda) ))

mod_int <- FDboost(heatan ~  
                  bsignal(UVVIS, uvvis.lambda, knots = 10, df = 4, check.ident = FALSE) %X%
                  bsignal(NIR, nir.lambda, knots = 10, df=4, check.ident = FALSE), 
                timeformula = NULL, data = fuelSubset)
#> Warning in df2lambda(X, df = args$df, lambda = args$lambda, dmat =
#> K, weights = w, : estimated degrees of freedom differ from 'df' by
#> -0.00387383609195169

predict(mod_int, fuelSubset)
#> Error in predict.FDboost(mod_int, fuelSubset): Can only predict effect of one functional effect in %X% or %Xc%.

Created on 2019-03-26 by the reprex package (v0.2.1)

why?

Collection of non-meaningful error messages / handling

  • For SOF models specified with a data list whose response has the wrong length (e.g., the response has length 7 while the covariates have length 8 or 8 rows), FDboost does not throw an error until df2lambda() attempts the Cholesky decomposition with wrong weights (credits to Joanna):
library(FDboost)
data("fuelSubset", package = "FDboost")
fuelSubset$heatan <- fuelSubset$heatan[1:100]
sof <- FDboost(heatan ~ bsignal(NIR, nir.lambda, knots = 20, df = 4),
                         timeformula = NULL, data = fuelSubset)
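A more informative message could come from checking the response length before any fitting starts. The following is a minimal sketch for the scalar-on-function case, assuming a data list in which the response is a vector and functional covariates are matrices with one row per observation; `check_response_length()` is a hypothetical helper, not an FDboost function.

```r
## Hypothetical input check, not part of FDboost:
## compare the length of a scalar response with the number of
## observations implied by the covariates used in the model.
check_response_length <- function(data, response, covariates) {
  n_y <- length(data[[response]])
  n_x <- vapply(covariates, function(v) {
    x <- data[[v]]
    if (is.matrix(x)) nrow(x) else length(x)   # rows for matrices, length otherwise
  }, numeric(1))
  bad <- covariates[n_x != n_y]
  if (length(bad) > 0) {
    stop(sprintf("Response '%s' has length %d, but covariate(s) %s have %s observations.",
                 response, n_y, paste(sQuote(bad), collapse = ", "),
                 paste(n_x[n_x != n_y], collapse = ", ")), call. = FALSE)
  }
  invisible(TRUE)
}

## Toy data reproducing the mismatch: 7 responses, 8 curves
dat <- list(y = rnorm(7), X = matrix(rnorm(8 * 5), nrow = 8))
try(check_response_length(dat, "y", "X"))
```

Applied to the truncated data above, `check_response_length(fuelSubset, "heatan", "NIR")` should stop immediately with a message naming NIR, rather than failing later inside df2lambda().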

Fix CI

The repository still uses an old Travis CI setup, so the automatic checks are broken.

Fix bootstrapCI

bootstrapCI throws errors for all m4 models (due to nested hmatrix indexing) and for ms1 and ms2, because in these models some base-learners are never selected (a case that is still not handled). Everything else in the test file seems to work, except for the models for which applyFolds() or other validation functions fail (e.g., the binomial case).

defaults for cross-validation for non-cyclical FDboostLSS

When fitting an FDboostLSS model with method = "noncyclic" the resulting S3-object has the classes nc_mboostLSS, FDboostLSS and mboostLSS in this order. Thus, the defaults specified in FDboost:::cvrisk.FDboostLSS won't apply. (See code below.)
Two quick fixes: (A) change the class order, if that is unproblematic, or (B) state in the documentation that the folds have to be specified manually, and deprecate FDboost:::cvrisk.FDboostLSS.

library(FDboost)
library(gamboostLSS)

########### simulate Gaussian function-on-scalar data
n <- 50 ## number of observations
G <- 12 ## number of evaluation points per response curve

set.seed(123) ## ensure reproducibility
n_innerknots <- 4
B <- mboost:::bsplines(1:G, knots = 4, boundary.knots = c(1,G), degree = 2)
theta <- rnorm(ncol(B)) ## sample coefficients for x = 1

x <- runif(n) ## sample covariates
y <- B %*% matrix( rnorm(n*length(theta), mean = rep(theta, n) * rep(x, each = length(theta)), sd = rep(x, each = length(theta))), ncol = n) ## sample response

dat_list <- list(y = t(y), x = x, t = 1:G)

## model fit assuming Gaussian location scale model 
model <- FDboostLSS(formula = y ~ bols(x, df = 2), 
                    timeformula = ~ bbs(t, df = 2), 
                    data = dat_list, method = "noncyclic")

class(model)

## -> cvrisk.nc_mboostLSS is directly applied
debug(cvrisk)
cvrisk(model)
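The behaviour follows from S3 dispatch: a generic calls the method for the first class in the class vector that has one. The toy example below reproduces this without fitting a model; the generic `cv_demo()` and its methods are hypothetical stand-ins, not FDboost or gamboostLSS functions, and the second object shows the effect of option (A).

```r
## Toy generic with methods for two of the three classes
cv_demo <- function(object, ...) UseMethod("cv_demo")
cv_demo.nc_mboostLSS <- function(object, ...) "method for nc_mboostLSS (no FDboost defaults)"
cv_demo.FDboostLSS   <- function(object, ...) "method for FDboostLSS (FDboost defaults)"

## class order as described for method = "noncyclic"
obj <- structure(list(), class = c("nc_mboostLSS", "FDboostLSS", "mboostLSS"))
cv_demo(obj)

## option (A): FDboostLSS first
obj_A <- structure(list(), class = c("FDboostLSS", "nc_mboostLSS", "mboostLSS"))
cv_demo(obj_A)
```

With option (A), the FDboostLSS method would need to handle the non-cyclic case itself or hand over to the next method via NextMethod().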

update.FDboost() with scalar response

update.FDboost() breaks when applied to an FDboost object with a scalar response. See the following MWE:

library(FDboost)

######## Example for scalar-on-function-regression
data("fuelSubset", package = "FDboost")
fuelSubset$UVVIS <- scale(fuelSubset$UVVIS, scale = FALSE)
fuelSubset$uvvis.lambda <- with(fuelSubset, (uvvis.lambda - min(uvvis.lambda)) /
                                  (max(uvvis.lambda) - min(uvvis.lambda) ))

mod <- FDboost(heatan ~ 1 + bsignal(UVVIS, uvvis.lambda),
               timeformula = NULL, data = fuelSubset)

## use update.FDboost() ... 
test <- update(mod, formula = heatan ~ 1 + bsignal(UVVIS, uvvis.lambda, df = 5))

## ... breaks as 
## yind <- all.vars(as.formula(object$timeformula))[[1]]
## does not exist 
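A possible guard, sketched under the assumption that `timeformula` is `NULL` for scalar responses (as in the call above). `get_yind()` is a hypothetical helper, not the actual code of update.FDboost(); since all.vars() only inspects the formula, the sketch runs without FDboost loaded.

```r
## Hypothetical helper: name of the time variable, or NULL for scalar responses
get_yind <- function(timeformula) {
  if (is.null(timeformula)) return(NULL)
  all.vars(as.formula(timeformula))[[1]]
}

get_yind(~ bbs(month.t, knots = 11, df = 3))
get_yind(NULL)
```

update.FDboost() could then skip the steps that need the time variable whenever the result is NULL.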

FDboost does not deal correctly with `is.bdfamily` families.

These families now expect matrix-valued responses cbind(#successes, #failures) (gamboostLSS > 1.5), but FDboost treats matrix-valued responses as functional responses, so the model is set up incorrectly.

Code examples are in the supplement to the Statmod discussion paper; they are too long and confidential to post here.

`Matrix::qrR` usage in R/factorize.R

Hello,

Your package, FDboost, has:

FDboost/R/factorize.R

Lines 171 to 174 in d948f05

R <- lapply(QR, lapply, function(x) {
  if (inherits(x, "qr"))
    qr.R(x)[, order(x$pivot)] else
    qrR(x, backPermute = TRUE) })

but qrR may be deprecated in favour of qr.R as soon as Matrix 1.6-0 (to be released in July), although to give package maintainers more time we may delay the deprecation to a later version.

To be backwards compatible with earlier versions of Matrix, I would patch your code like so:

R <- lapply(QR, lapply, function(x) {
    j <- if(isS4(x)) x@q else x[["pivot"]]
    if(is.unsorted(j))
        qr.R(x)[, order(j), drop = FALSE]
    else qr.R(x) })

rather than simply replacing qrR with qr.R. Let me know if you think that you could make such a change before July, or if you think that you need more time. Well, in any case, it would be good to verify that your package passes its checks under Matrix 1.6-0, which you can install with install.packages("Matrix", repos="http://R-Forge.R-project.org").

Thanks,

Mikael { Matrix package co-author }
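The suggested patch relies on the identity X[, pivot] = QR, so that X = Q R[, order(pivot)]. The base-R sketch below wraps the suggestion in a hypothetical helper, `backpermuted_R()`, and checks the identity for a dense `qr()` decomposition with column pivoting (`LAPACK = TRUE`) of a full-rank matrix. The sparse `Matrix` branch (`x@q`) is copied from the suggestion above and is not exercised here.

```r
## Helper following the patch suggested above (the name is hypothetical)
backpermuted_R <- function(x) {
  j <- if (isS4(x)) x@q else x[["pivot"]]   # column permutation
  if (is.unsorted(j)) qr.R(x)[, order(j), drop = FALSE] else qr.R(x)
}

set.seed(1)
X <- matrix(rnorm(6 * 4), nrow = 6)
dec <- qr(X, LAPACK = TRUE)   # Householder QR with column pivoting
R <- backpermuted_R(dec)

## Q %*% R reproduces X in its original column order
all.equal(qr.Q(dec) %*% R, X)
```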

options("mboost_indexmin") and bhistx() gives errors in coef()

If an index is used internally when fitting a model that contains bhistx(), some methods, e.g., coef(), no longer work:

library(FDboost)

options("mboost_indexmin")

require(refund)
## simulate some data from a historical model
## the interaction effect is in this case not necessary
n <- 100
nygrid <- 35
data1 <- pffrSim(scenario = c("int", "ff"), limits = function(s,t){ s <= t }, 
                 n = n, nygrid = nygrid)
data1$X1 <- scale(data1$X1, scale = FALSE) ## center functional covariate                  
dataList <- as.list(data1)
dataList$tvals <- attr(data1, "yindex")

## create the hmatrix-object
X1h <- with(dataList, hmatrix(time = rep(tvals, each = n), id = rep(1:n, nygrid), 
                              x = X1, argvals = attr(data1, "xindex"), 
                              timeLab = "tvals", idLab = "wideIndex", 
                              xLab = "myX", argvalsLab = "svals"))
dataList$X1h <- I(X1h)   
dataList$svals <- attr(data1, "xindex")


#################################

options("mboost_indexmin" = 10000)

## fit the model with the main effect of bhistx() only
mod <- FDboost(Y ~ bhistx(x = X1h, df = 5, knots = 5), 
               timeformula = ~ bbs(tvals, knots = 10), data = dataList)

coef_mod <- coef(mod)


###################################

options("mboost_indexmin" = 10)

## fit the model with the main effect of bhistx() only
mod2 <- FDboost(Y ~ bhistx(x = X1h, df = 5, knots = 5), 
               timeformula = ~ bbs(tvals, knots = 10), data = dataList)

### breaks within predict
coef_mod2 <- coef(mod2)

Suboptimal usage of all.equal in if-statement in applyFolds()

On some PCs, line 508 in applyFolds()

if (all.equal(papply, mclapply) == TRUE) {
produced the error

Error in if (all.equal(papply, mclapply) == TRUE) { :
the condition has length > 1

for me. It can be easily fixed by changing the line to
if (isTRUE(all.equal(papply, mclapply))) {
(It also says in ?all.equal not to use all.equal directly in if expressions, but use isTRUE(all.equal(....)) instead.)
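The underlying reason: all.equal() returns TRUE only when the objects agree; otherwise it returns a character vector describing the differences, which can have more than one element, and since R 4.2.0 an if() condition of length greater than one is an error. A minimal base-R sketch of both behaviours; `papply` stands in for the argument of applyFolds().

```r
library(parallel)

## all.equal() returns a character vector (not FALSE) when objects differ
res <- all.equal(c(a = 1, b = 2), c(c = 1, d = 3))
res          # one message for the names, one for the values
length(res)

## isTRUE() always yields a single TRUE or FALSE, so it is safe in if()
if (isTRUE(res)) "equal" else "not equal"

## the pattern proposed for applyFolds()
papply <- mclapply
isTRUE(all.equal(papply, mclapply))
```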
