boyiguo1 / bham Goto Github PK

View Code? Open in Web Editor NEW

0.0 2.0 0.0 1.04 MB

Bayesian Hierarchical Additive Models

Home Page: https://boyiguo1.github.io/BHAM

License: Other

R 99.46% C 0.48% Rez 0.06%

bayesian spike-and-slab additive-models r-package r rstats

bham's Introduction

BHAM

The goal of BHAM is to fit Bayesian hierarchical generalized additive models, and provide ancillary functions for high-dimensional model specification, model summary, variable selection, curve fitting.

Installation

The development version from GitHub with:

# install.packages("devtools")
devtools::install_github("boyiguo1/BHAM")

Example

Please visit the vignettes for a quick introduction

bham's People

Contributors

Watchers

bham's Issues

Inference for IWLS model

Automate Tuning of bacoxph

Develop Cox additive Model

We rely on some of the c functions from the library survival for constructing counting process. However, these functions are internal. Without spending a lot of time, we will just to copy the internal functions to the current project

Curve Plotting Functions

To develop the curve plotting functions

Track if the variable name is in the set of variables names
- If yes, pull all the relevant variables
- If no, return an error message, "No variable found in the model predictor space. Please double check."
Use a loop to create a graph for each of the variables of interest
- retrieve the range of interested variable
- create design matrix of the coefficient
- calculate the risk ratio or linear predictor
- make the plots
Assemble the graphs

Parameters:

model, a object of bham
variable names, a vector of characteristic string including the variables of interest
plot, a logical variable indicating if a ggplot object are returned, or a list containing the ggplots
type, one of the two options "linear predictor scale", "outcome scale"

Implement Spline Plotting Function

Add dev version
Merge to main branch

Writing Test Case for `bamlasso_var_selection`

Test:

Contains only linear part
Contains only parametric part
contains no variable
contains only non-linear part

Debug Cox Model bacoxph Fitting in Simulation

When running simulation with 190b084, errors are reported for bacoxph.

Error message for bacoxph is

initial values lead to overflow or underflow of the exp function

The minimum reproducible example is

library(tidyverse)
library(mgcv)
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
library(BHAM)
library(BhGLM)
#> 
#> Attaching package: 'BhGLM'
#> The following objects are masked from 'package:BHAM':
#> 
#>     De, mde, mt, NegBin, Student
library(survival)
library(simsurv)


# Fixed Parameters --------------------------------------------------------
# * Basic Parameters --------------------------------------------------------
k <- 10               # Smoothing function degrees
n_test <- 1000        # Training set Sample size

n_train <- 200
p <- c(4, 10, 50, 100, 200)[3]
rho <- c(0, 0.5)[1]
pi_cns <- c(0.15, 0.3, 0.4)[1]

# * Survival Parameters ---------------------------------------------------
shape.t <- 1.2     # Shape par of hazard distribution
scale.t <- 1       # Scale par of hazard distribution
shape.c <- 0.8     # Shape par of censoring distribution


# * Derived Parameters
n_total <- n_train + n_test


# Functions ---------------------------------------------------

# * Nonliner Functions ----------------------------------------------------
f_1 <- function(x) (x+1)^2/10
f_2 <- function(x) exp(x+1)/100
f_3 <- function(x) 3*sin(x)/20
f_4 <- function(x) (1.4*x+0.5)/10

Auto-regressive correlation matrix

@param p integer, the number of dimension
@param rho double, range between 0-1, 0 means mutually independent, 1 means complete correlated

@return A p*p dimension matrix

@examples
p <- 5
rho <- 0.8
X_mat <- MASS::mvrnorm(n=100, rep(0, p), AR(p, rho))

AR <- function(p, rho){
  rho^abs(outer(1:p,1:p,"-"))
}



it <- 1
set.seed(it)


# Helper Functions --------------------------------------------------------

Estimate the Weibull scale parameter for controlled censoring proportion

find_censor_parameter <- function(
  lambda, pi.cen,
  shape_hazard, shape_censor
){
  # Estimate the density function of the linear predictor
  dens <- density(lambda, bw = "nrd0",na.rm=TRUE)

  x<-dens$x
  y<-dens$y
  y.loess <-loess(y ~ x,span=0.1)


  density.fun.lambda<-function(x){
    pred.y <- predict(y.loess, newdata=x)
    return(pred.y)
  }

  integration_over_time <- function(
    u,               # Different values of lambda
    # Arguments for integration
    theta, shape_hazard, shape_censor
  ){
    ret <- integrate(            # Start (Integrate of time)
      f = function(ti, theta, alpha.t, alpha.c){
        lambda.i<-u
        part1<-density.fun.lambda(lambda.i)
        part2<-dweibull(ti,alpha.c,theta)
        part3<-exp(-(ti/lambda.i)^alpha.t)
        return(part1*part2*part3)
      },
      0, Inf,
      theta = theta, alpha.t = shape_hazard, alpha.c = shape_censor
    )
    return(ret$value)
  }


  censor.prop <- function(theta, p, shape_hazard, shape_censor){
    cen.P <- integrate(         # Start (Integrate of lambda)
      function(u, theta, shape_hazard, shape_censor){
        sapply(u, integration_over_time,
               # Arguments of integration_over_time
               theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
        )
      },
      # Limits of integration
      lower = min(lambda), upper = max(lambda),
      ## Arguments for integration function
      theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
    )$value     # End(int of lambda)
    return(cen.P-p)
  }

  theta<-uniroot(censor.prop, interval = c(0.1,200),
                 ## Argument for censor.prop
                 p = pi.cen, shape_hazard = shape_hazard, shape_censor = shape_censor,
                 tol=0.00000001)$root

  return(theta)
}

Create spline formula for high-dimension applications

create_HD_formula <- function(formula, data, spl_df, rm_overlap = TRUE, verbose=T){

  if(missing(formula) & missing(data))
    stop("Either formula or data must be provided. The response variable of the model is not supplied.")

  if(missing(formula)){
    if(!is.data.frame(data)) data <- data.frame(data)
    formula = DF2formula(data)
  }

  if(!missing(data)){
    warning("Both formula and dat provided, dat is ignored.")
    warning("Please consider use the function DF2formula and update to improve your formula.")
  }


  if(missing(spl_df)) {
    warning(" No additional spline terms supplied")
  } else {
    # Manipulate spl_df
    sp_trm <-  spl_df %>%
      dplyr::filter(Func!="")  %>% # Removing unnecessary terms
      glue::glue_data("{Func}( {Var}{ifelse(is.na(Args)||Args=='', '', paste0(',', Args))})") %>%
      paste(collapse  = " + ")
    sp_trm <- paste0("~ . + " , sp_trm)
  }
  # Adding Spline Terms
  ret <- update(formula, sp_trm)

  if(verbose){
    cat("Create formula:\n")
    print(ret)
  }

  return(ret)
}


# * Generate Data -------------------------------------------------
x_all <- MASS::mvrnorm(n_train+n_test, rep(0, p), AR(p, rho)) %>%
  data.frame
eta_all <- with(x_all, f_1(X1) + f_2(X2) + f_3(X3) + f_4(X4))
dat_all <- simsurv::simsurv(dist = "weibull",
                            lambdas = scale.t,
                            gammas = shape.t,
                            x = data.frame(eta = eta_all) ,
                            beta = c(eta = 1)) %>%
  data.frame( x_all, eta = eta_all, .)

train_dat <- dat_all[1:n_train, ]
test_dat <- dat_all[(n_train+1):n_total, ]


## Censoring Distribution, Weibull(alpha.c, scale.p)
scale.p <- find_censor_parameter(lambda = exp(-1*train_dat$eta/shape.t),
                                 pi.cen = pi_cns,
                                 shape_hazard = shape.t, shape_censor = shape.c)

train_dat <-  train_dat %>%
  data.frame(
    c_time = rweibull(n = n_train, shape = shape.c, scale = scale.p)
  ) %>%
  mutate(
    cen_ind = (c_time < eventtime),
    status = (!cen_ind)*1
  ) %>%
  rowwise() %>%
  mutate(time = min(c_time, eventtime)) %>%
  ungroup()



# Fit Models------------------------------------------------------------------


# * Spline Specification --------------------------------------------------
mgcv_df <- data.frame(
  Var = grep("X", names(train_dat), value = TRUE),
  Func = "s",
  Args = paste0("bs='cr', k=", k)
)


train_sm_dat <- construct_smooth_data(mgcv_df, train_dat)
train_smooth_data <- train_sm_dat$data

test_sm_dat <- BHAM::make_predict_dat(train_sm_dat$Smooth, dat = test_dat)

bacox_raw_mdl <- bacoxph(Surv(time, event = status) ~ ., data = data.frame(time = train_dat$time, status = train_dat$status,
                                                                           train_smooth_data),
                         prior = mde(), group = make_group(names(train_smooth_data)))

s0_seq <- seq(0.005, 0.1, length.out = 20)    # TODO: need to be optimized
cv_res <- tune.bgam(bacox_raw_mdl, nfolds = 5, s0= s0_seq, verbose = FALSE)
#> Fitting ncv*nfolds = 5 models:
#> Error in coxph(formula = formula, init = init, weights = weights, control = coxph.control(iter.max = 1, : initial values lead to overflow or underflow of the exp function

^{Created on 2022-01-02 by the reprex package (v2.0.1)}

Create a vignettes to explain cross-validation

Wrongful Offset when using non-jointly updating algorithm for `bacoxph`

The group updating algorithm via the argument method.coef uses offset when fitting coxph internally, which reserves the wrong offset when outputting the final model.

Fix Nested Cross Validation

When we use nested cross-validation in cv.bgam via the argument ncv = 10, the outcome is not organized well and misaligned due to the default vector concatenation.

cv.bgam(mdl, ncv = 10)

Algorithm update problem when using penalize_null = FALSE function

mdl3 <- bgam(y~.-y,
         data = data.frame(train_smooth_data, y = dat$y), family = "binomial", prior = mt(df=Inf), group = make_group(names(train_smooth_data), penalize_null = FALSE))

Rename Bases in make_predict_data

We need to rename the bases in align with construct_smooth_data, as var1.nullX and var1.penX

Expand the `construct_smooth_data` to accommodate multivariate spline function.

if smoothCon()[[1]] still makes sense

Remove unnecessary packages imported

I was using some packages from tidyverse to ease the programming. For example, I imported %>% function from magrittr package. However, in the DESCRIPTION file, I imported the whole magrittr package. Some other examples include glue_data from glue package, unbglue_unnest from unglue package. Is there any way to include one function instead of the whole package.

Remove the function sim_Bai from the package

Debug Cox Model bamlasso(family="cox") Fitting in Simulation

When running simulation with 190b084, errors are reported for bamlasso(family = "cox")

Error message for bamlasso is

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 't': 'data' must be of a vector type, was 'NULL'

The minimum reproducible example is

library(tidyverse)
library(mgcv)
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
library(BHAM)
library(survival)
library(simsurv)


# Fixed Parameters --------------------------------------------------------a
# * Basic Parameters --------------------------------------------------------
k <- 10               # Smoothing function degrees
n_test <- 1000        # Training set Sample size

n_train <- 200
p <- c(4, 10, 50, 100, 200)[1]
rho <- c(0, 0.5)[1]
pi_cns <- c(0.15, 0.3, 0.4)[1]

# * Survival Parameters ---------------------------------------------------
shape.t <- 1.2     # Shape par of hazard distribution
scale.t <- 1       # Scale par of hazard distribution
shape.c <- 0.8     # Shape par of censoring distribution


# * Derived Parameters
n_total <- n_train + n_test


# Functions ---------------------------------------------------

# * Nonliner Functions ----------------------------------------------------
f_1 <- function(x) (x+1)^2/10
f_2 <- function(x) exp(x+1)/100
f_3 <- function(x) 3*sin(x)/20
f_4 <- function(x) (1.4*x+0.5)/10

Auto-regressive correlation matrix

@param p integer, the number of dimension
@param rho double, range between 0-1, 0 means mutually independent, 1 means complete correlated

@return A p*p dimension matrix

@examples
p <- 5
rho <- 0.8
X_mat <- MASS::mvrnorm(n=100, rep(0, p), AR(p, rho))

AR <- function(p, rho){
  rho^abs(outer(1:p,1:p,"-"))
}



it <- 1
set.seed(it)


# Helper Functions --------------------------------------------------------

Estimate the Weibull scale parameter for controlled censoring proportion

find_censor_parameter <- function(
  lambda, pi.cen,
  shape_hazard, shape_censor
){
  # Estimate the density function of the linear predictor
  dens <- density(lambda, bw = "nrd0",na.rm=TRUE)
  
  x<-dens$x
  y<-dens$y
  y.loess <-loess(y ~ x,span=0.1)
  
  
  density.fun.lambda<-function(x){
    pred.y <- predict(y.loess, newdata=x)
    return(pred.y)
  }
  
  integration_over_time <- function(
    u,               # Different values of lambda
    # Arguments for integration
    theta, shape_hazard, shape_censor
  ){
    ret <- integrate(            # Start (Integrate of time)
      f = function(ti, theta, alpha.t, alpha.c){
        lambda.i<-u
        part1<-density.fun.lambda(lambda.i)
        part2<-dweibull(ti,alpha.c,theta)
        part3<-exp(-(ti/lambda.i)^alpha.t)
        return(part1*part2*part3)
      },
      0, Inf,
      theta = theta, alpha.t = shape_hazard, alpha.c = shape_censor
    )
    return(ret$value)
  }
  
  
  censor.prop <- function(theta, p, shape_hazard, shape_censor){
    cen.P <- integrate(         # Start (Integrate of lambda)
      function(u, theta, shape_hazard, shape_censor){
        sapply(u, integration_over_time,
               # Arguments of integration_over_time
               theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
        )
      },
      # Limits of integration
      lower = min(lambda), upper = max(lambda),
      ## Arguments for integration function
      theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
    )$value     # End(int of lambda)
    return(cen.P-p)
  }
  
  theta<-uniroot(censor.prop, interval = c(0.1,200),
                 ## Argument for censor.prop
                 p = pi.cen, shape_hazard = shape_hazard, shape_censor = shape_censor,
                 tol=0.00000001)$root
  
  return(theta)
}

Create spline formula for high-dimension applications

create_HD_formula <- function(formula, data, spl_df, rm_overlap = TRUE, verbose=T){
  
  if(missing(formula) & missing(data))
    stop("Either formula or data must be provided. The response variable of the model is not supplied.")
  
  if(missing(formula)){
    if(!is.data.frame(data)) data <- data.frame(data)
    formula = DF2formula(data)
  }
  
  if(!missing(data)){
    warning("Both formula and dat provided, dat is ignored.")
    warning("Please consider use the function DF2formula and update to improve your formula.")
  }
  
  
  if(missing(spl_df)) {
    warning(" No additional spline terms supplied")
  } else {
    # Manipulate spl_df
    sp_trm <-  spl_df %>%
      dplyr::filter(Func!="")  %>% # Removing unnecessary terms
      glue::glue_data("{Func}( {Var}{ifelse(is.na(Args)||Args=='', '', paste0(',', Args))})") %>%
      paste(collapse  = " + ")
    sp_trm <- paste0("~ . + " , sp_trm)
  }
  # Adding Spline Terms
  ret <- update(formula, sp_trm)
  
  if(verbose){
    cat("Create formula:\n")
    print(ret)
  }
  
  return(ret)
}


# * Generate Data -------------------------------------------------
x_all <- MASS::mvrnorm(n_train+n_test, rep(0, p), AR(p, rho)) %>%
  data.frame
eta_all <- with(x_all, f_1(X1) + f_2(X2) + f_3(X3) + f_4(X4))
dat_all <- simsurv::simsurv(dist = "weibull",
                            lambdas = scale.t,
                            gammas = shape.t,
                            x = data.frame(eta = eta_all) ,
                            beta = c(eta = 1)) %>%
  data.frame( x_all, eta = eta_all, .)

train_dat <- dat_all[1:n_train, ]
test_dat <- dat_all[(n_train+1):n_total, ]


## Censoring Distribution, Weibull(alpha.c, scale.p)
scale.p <- find_censor_parameter(lambda = exp(-1*train_dat$eta/shape.t),
                                 pi.cen = pi_cns,
                                 shape_hazard = shape.t, shape_censor = shape.c)

train_dat <-  train_dat %>%
  data.frame(
    c_time = rweibull(n = n_train, shape = shape.c, scale = scale.p)
  ) %>%
  mutate(
    cen_ind = (c_time < eventtime),
    status = (!cen_ind)*1
  ) %>%
  rowwise() %>%
  mutate(time = min(c_time, eventtime)) %>%
  ungroup()



# Fit Models------------------------------------------------------------------


# * Spline Specification --------------------------------------------------
mgcv_df <- data.frame(
  Var = grep("X", names(train_dat), value = TRUE),
  Func = "s",
  Args = paste0("bs='cr', k=", k)
)


train_sm_dat <- construct_smooth_data(mgcv_df, train_dat)
train_smooth_data <- train_sm_dat$data

test_sm_dat <- BHAM::make_predict_dat(train_sm_dat$Smooth, dat = test_dat)

bamlasso_raw_mdl <- bamlasso( x = train_smooth_data, y = Surv(train_dat$time, event = train_dat$status),
                              family = "cox", group = make_group(names(train_smooth_data)),
                              ss = c(0.04, 0.5))
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 't': 'data' must be of a vector type, was 'NULL'

^{Created on 2021-12-29 by the reprex package (v2.0.1)}

Session info

sessioninfo::session_info()
#> ─ Session info  🛶  🕥  🛀🏽   ─────────────────────────────────────────────────
#>  hash: canoe, ten-thirty, person taking bath: medium skin tone
#> 
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Chicago
#>  date     2021-12-29
#>  pandoc   2.14.0.3 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.3.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  BHAM        * 0.1.0.9002 2021-12-29 [1] local
#>  broom         0.7.10     2021-10-31 [1] CRAN (R 4.1.0)
#>  cellranger    1.1.0      2016-07-27 [1] CRAN (R 4.1.0)
#>  cli           3.1.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  codetools     0.2-18     2020-11-04 [1] CRAN (R 4.1.1)
#>  colorspace    2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
#>  crayon        1.4.2      2021-10-29 [1] CRAN (R 4.1.0)
#>  DBI           1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
#>  dbplyr        2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
#>  digest        0.6.28     2021-09-23 [1] CRAN (R 4.1.0)
#>  dplyr       * 1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  forcats     * 0.5.1      2021-01-27 [1] CRAN (R 4.1.0)
#>  foreach       1.5.1      2020-10-15 [1] CRAN (R 4.1.0)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
#>  generics      0.1.1      2021-10-25 [1] CRAN (R 4.1.0)
#>  ggplot2     * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
#>  glmnet        4.1-3      2021-11-02 [1] CRAN (R 4.1.0)
#>  glue          1.5.0      2021-11-07 [1] CRAN (R 4.1.0)
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
#>  haven         2.4.3      2021-08-04 [1] CRAN (R 4.1.0)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  httr          1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
#>  iterators     1.0.13     2020-10-15 [1] CRAN (R 4.1.0)
#>  jsonlite      1.7.2      2020-12-09 [1] CRAN (R 4.1.0)
#>  knitr         1.36       2021-09-29 [1] CRAN (R 4.1.0)
#>  lattice       0.20-45    2021-09-22 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS          7.3-54     2021-05-03 [1] CRAN (R 4.1.1)
#>  Matrix        1.3-4      2021-06-01 [1] CRAN (R 4.1.1)
#>  mgcv        * 1.8-38     2021-10-06 [1] CRAN (R 4.1.0)
#>  modelr        0.1.8      2020-05-19 [1] CRAN (R 4.1.0)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
#>  nlme        * 3.1-153    2021-09-07 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4      2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  plyr          1.8.6      2020-03-03 [1] CRAN (R 4.1.0)
#>  pROC          1.18.0     2021-09-03 [1] CRAN (R 4.1.0)
#>  purrr       * 0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
#>  readr       * 2.0.2      2021-09-27 [1] CRAN (R 4.1.0)
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12     2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  rvest         1.0.2      2021-10-16 [1] CRAN (R 4.1.1)
#>  scales        1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.1      2021-11-02 [1] CRAN (R 4.1.0)
#>  shape         1.4.6      2021-05-19 [1] CRAN (R 4.1.0)
#>  simsurv     * 1.0.0      2021-01-13 [1] CRAN (R 4.1.0)
#>  stringi       1.7.5      2021-10-04 [1] CRAN (R 4.1.0)
#>  stringr     * 1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2      2021-09-23 [1] CRAN (R 4.1.0)
#>  survival    * 3.2-13     2021-08-24 [1] CRAN (R 4.1.0)
#>  tibble      * 3.1.5      2021-09-30 [1] CRAN (R 4.1.0)
#>  tidyr       * 1.1.4      2021-09-27 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
#>  tidyverse   * 1.3.1      2021-04-15 [1] CRAN (R 4.1.0)
#>  tzdb          0.2.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  unglue        0.1.0      2020-06-11 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.28       2021-11-04 [1] CRAN (R 4.1.0)
#>  xml2          1.3.2      2020-04-23 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.1.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Tuning Fucntion for bamlasso(family="cox")

Examine the prior update of the procedures

Check if the update of the penalty makes sense
- Set up bglm_spline one predictor per group and compare to Lasso and bglm
Check if the group update makes sense
- Iterate over the update of the groups

Rename the bases from construct_smooth_data

Currently, the "design matrix" returned from contruct_smooth_data are named in the pattern var_name.baseX. However, such a naming convention doesn't help to distinguish the null space from the penalized space, which creates more problems during variable/function selection.

The naming convention should be a combination of var_name.nullX and var_name.penX. It also helps with creating groups.

Feature Selection function

We should be implementing the feature selection function for bglm_spline and bmlasso_spline model.

For bmlasso_spline model,

we can check the p or ptheta of the fitted model, and soft-thresholding to get which one would be selected in the model.

For bglm_spline, there could be two ways

Soft thresholding p, like described for bmlasso_spline model
Hypothesis testing, we need to estimate the degree of freedom of the components in order to do hypothesis testing.
- The null space should have the degree of freedom corresponding to the number of dimensions
- The penalized space, we need to calculate the effective degree of freedom.

Investigation of the theta prior implementation in 408874c

In the commit 408874c

TODO:

Rename the previous simulation results with the commit number
Create a branch shared_indicator
Modify the code, make_group as well as update.scale.p.group similar to the branch eff_hrd via 6244a7d
Perform simulations and analyze the performance
See Simulation Result at /data/user/boyiguo1/bgam/sim_res_6244a7d/ . The simulation results is not fine-tuned.
Merge to main

Find a example dataset for examples

Implement Effect Heredity for Linear & Non-linear Component Selection

Create a new branch eff_hrd
Implement the new prior and updating algorithm according to the attached pdf.SS_GAM.pdf via 3270a96
Examine performance of the new algorithm with simulation
See Simulation Result at /data/user/boyiguo1/bgam/sim_res_3270a96/ . The simulation results is not fine-tuned.
Remove TODO(boyiguo1) message as inline comments
Pull request to main