Giter Site home page Giter Site logo

boyiguo1 / bham Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 0.0 1.04 MB

Bayesian Hierarchical Additive Models

Home Page: https://boyiguo1.github.io/BHAM

License: Other

R 99.46% C 0.48% Rez 0.06%
bayesian spike-and-slab additive-models r-package r rstats

bham's Introduction

BHAM

Lifecycle: experimental R-CMD-check

The goal of BHAM is to fit Bayesian hierarchical generalized additive models, and provide ancillary functions for high-dimensional model specification, model summary, variable selection, curve fitting.

Installation

The development version from GitHub with:

# install.packages("devtools")
devtools::install_github("boyiguo1/BHAM")

Example

Please visit the vignettes for a quick introduction

bham's People

Contributors

boyiguo1 avatar

Watchers

 avatar  avatar

bham's Issues

Develop Cox additive Model

We rely on some of the c functions from the library survival for constructing counting process. However, these functions are internal. Without spending a lot of time, we will just to copy the internal functions to the current project

Curve Plotting Functions

To develop the curve plotting functions

  1. Track if the variable name is in the set of variables names
    • If yes, pull all the relevant variables
    • If no, return an error message, "No variable found in the model predictor space. Please double check."
  2. Use a loop to create a graph for each of the variables of interest
    • retrieve the range of interested variable
    • create design matrix of the coefficient
    • calculate the risk ratio or linear predictor
    • make the plots
  3. Assemble the graphs

Parameters:

  • model, a object of bham
  • variable names, a vector of characteristic string including the variables of interest
  • plot, a logical variable indicating if a ggplot object are returned, or a list containing the ggplots
  • type, one of the two options "linear predictor scale", "outcome scale"

Debug Cox Model bacoxph Fitting in Simulation

When running simulation with 190b084, errors are reported for bacoxph.

Error message for bacoxph is

initial values lead to overflow or underflow of the exp function


The minimum reproducible example is

library(tidyverse)
library(mgcv)
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
library(BHAM)
library(BhGLM)
#> 
#> Attaching package: 'BhGLM'
#> The following objects are masked from 'package:BHAM':
#> 
#>     De, mde, mt, NegBin, Student
library(survival)
library(simsurv)


# Fixed Parameters --------------------------------------------------------
# * Basic Parameters --------------------------------------------------------
k <- 10               # Smoothing function degrees
n_test <- 1000        # Training set Sample size

n_train <- 200
p <- c(4, 10, 50, 100, 200)[3]
rho <- c(0, 0.5)[1]
pi_cns <- c(0.15, 0.3, 0.4)[1]

# * Survival Parameters ---------------------------------------------------
shape.t <- 1.2     # Shape par of hazard distribution
scale.t <- 1       # Scale par of hazard distribution
shape.c <- 0.8     # Shape par of censoring distribution


# * Derived Parameters
n_total <- n_train + n_test


# Functions ---------------------------------------------------

# * Nonliner Functions ----------------------------------------------------
f_1 <- function(x) (x+1)^2/10
f_2 <- function(x) exp(x+1)/100
f_3 <- function(x) 3*sin(x)/20
f_4 <- function(x) (1.4*x+0.5)/10

Auto-regressive correlation matrix

@param p integer, the number of dimension
@param rho double, range between 0-1, 0 means mutually independent, 1 means complete correlated

@return A p*p dimension matrix

@examples
p <- 5
rho <- 0.8
X_mat <- MASS::mvrnorm(n=100, rep(0, p), AR(p, rho))

AR <- function(p, rho){
  rho^abs(outer(1:p,1:p,"-"))
}



it <- 1
set.seed(it)


# Helper Functions --------------------------------------------------------

Estimate the Weibull scale parameter for controlled censoring proportion

find_censor_parameter <- function(
  lambda, pi.cen,
  shape_hazard, shape_censor
){
  # Estimate the density function of the linear predictor
  dens <- density(lambda, bw = "nrd0",na.rm=TRUE)

  x<-dens$x
  y<-dens$y
  y.loess <-loess(y ~ x,span=0.1)


  density.fun.lambda<-function(x){
    pred.y <- predict(y.loess, newdata=x)
    return(pred.y)
  }

  integration_over_time <- function(
    u,               # Different values of lambda
    # Arguments for integration
    theta, shape_hazard, shape_censor
  ){
    ret <- integrate(            # Start (Integrate of time)
      f = function(ti, theta, alpha.t, alpha.c){
        lambda.i<-u
        part1<-density.fun.lambda(lambda.i)
        part2<-dweibull(ti,alpha.c,theta)
        part3<-exp(-(ti/lambda.i)^alpha.t)
        return(part1*part2*part3)
      },
      0, Inf,
      theta = theta, alpha.t = shape_hazard, alpha.c = shape_censor
    )
    return(ret$value)
  }


  censor.prop <- function(theta, p, shape_hazard, shape_censor){
    cen.P <- integrate(         # Start (Integrate of lambda)
      function(u, theta, shape_hazard, shape_censor){
        sapply(u, integration_over_time,
               # Arguments of integration_over_time
               theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
        )
      },
      # Limits of integration
      lower = min(lambda), upper = max(lambda),
      ## Arguments for integration function
      theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
    )$value     # End(int of lambda)
    return(cen.P-p)
  }

  theta<-uniroot(censor.prop, interval = c(0.1,200),
                 ## Argument for censor.prop
                 p = pi.cen, shape_hazard = shape_hazard, shape_censor = shape_censor,
                 tol=0.00000001)$root

  return(theta)
}

Create spline formula for high-dimension applications

create_HD_formula <- function(formula, data, spl_df, rm_overlap = TRUE, verbose=T){

  if(missing(formula) & missing(data))
    stop("Either formula or data must be provided. The response variable of the model is not supplied.")

  if(missing(formula)){
    if(!is.data.frame(data)) data <- data.frame(data)
    formula = DF2formula(data)
  }

  if(!missing(data)){
    warning("Both formula and dat provided, dat is ignored.")
    warning("Please consider use the function DF2formula and update to improve your formula.")
  }


  if(missing(spl_df)) {
    warning(" No additional spline terms supplied")
  } else {
    # Manipulate spl_df
    sp_trm <-  spl_df %>%
      dplyr::filter(Func!="")  %>% # Removing unnecessary terms
      glue::glue_data("{Func}( {Var}{ifelse(is.na(Args)||Args=='', '', paste0(',', Args))})") %>%
      paste(collapse  = " + ")
    sp_trm <- paste0("~ . + " , sp_trm)
  }
  # Adding Spline Terms
  ret <- update(formula, sp_trm)

  if(verbose){
    cat("Create formula:\n")
    print(ret)
  }

  return(ret)
}


# * Generate Data -------------------------------------------------
x_all <- MASS::mvrnorm(n_train+n_test, rep(0, p), AR(p, rho)) %>%
  data.frame
eta_all <- with(x_all, f_1(X1) + f_2(X2) + f_3(X3) + f_4(X4))
dat_all <- simsurv::simsurv(dist = "weibull",
                            lambdas = scale.t,
                            gammas = shape.t,
                            x = data.frame(eta = eta_all) ,
                            beta = c(eta = 1)) %>%
  data.frame( x_all, eta = eta_all, .)

train_dat <- dat_all[1:n_train, ]
test_dat <- dat_all[(n_train+1):n_total, ]


## Censoring Distribution, Weibull(alpha.c, scale.p)
scale.p <- find_censor_parameter(lambda = exp(-1*train_dat$eta/shape.t),
                                 pi.cen = pi_cns,
                                 shape_hazard = shape.t, shape_censor = shape.c)

train_dat <-  train_dat %>%
  data.frame(
    c_time = rweibull(n = n_train, shape = shape.c, scale = scale.p)
  ) %>%
  mutate(
    cen_ind = (c_time < eventtime),
    status = (!cen_ind)*1
  ) %>%
  rowwise() %>%
  mutate(time = min(c_time, eventtime)) %>%
  ungroup()



# Fit Models------------------------------------------------------------------


# * Spline Specification --------------------------------------------------
mgcv_df <- data.frame(
  Var = grep("X", names(train_dat), value = TRUE),
  Func = "s",
  Args = paste0("bs='cr', k=", k)
)


train_sm_dat <- construct_smooth_data(mgcv_df, train_dat)
train_smooth_data <- train_sm_dat$data

test_sm_dat <- BHAM::make_predict_dat(train_sm_dat$Smooth, dat = test_dat)

bacox_raw_mdl <- bacoxph(Surv(time, event = status) ~ ., data = data.frame(time = train_dat$time, status = train_dat$status,
                                                                           train_smooth_data),
                         prior = mde(), group = make_group(names(train_smooth_data)))

s0_seq <- seq(0.005, 0.1, length.out = 20)    # TODO: need to be optimized
cv_res <- tune.bgam(bacox_raw_mdl, nfolds = 5, s0= s0_seq, verbose = FALSE)
#> Fitting ncv*nfolds = 5 models:
#> Error in coxph(formula = formula, init = init, weights = weights, control = coxph.control(iter.max = 1, : initial values lead to overflow or underflow of the exp function

Created on 2022-01-02 by the reprex package (v2.0.1)

Fix Nested Cross Validation

When we use nested cross-validation in cv.bgam via the argument ncv = 10, the outcome is not organized well and misaligned due to the default vector concatenation.

cv.bgam(mdl, ncv = 10)

Remove unnecessary packages imported

I was using some packages from tidyverse to ease the programming. For example, I imported %>% function from magrittr package. However, in the DESCRIPTION file, I imported the whole magrittr package. Some other examples include glue_data from glue package, unbglue_unnest from unglue package. Is there any way to include one function instead of the whole package.

Debug Cox Model bamlasso(family="cox") Fitting in Simulation

When running simulation with 190b084, errors are reported for bamlasso(family = "cox")

Error message for bamlasso is

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 't': 'data' must be of a vector type, was 'NULL'


The minimum reproducible example is

library(tidyverse)
library(mgcv)
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
library(BHAM)
library(survival)
library(simsurv)


# Fixed Parameters --------------------------------------------------------a
# * Basic Parameters --------------------------------------------------------
k <- 10               # Smoothing function degrees
n_test <- 1000        # Training set Sample size

n_train <- 200
p <- c(4, 10, 50, 100, 200)[1]
rho <- c(0, 0.5)[1]
pi_cns <- c(0.15, 0.3, 0.4)[1]

# * Survival Parameters ---------------------------------------------------
shape.t <- 1.2     # Shape par of hazard distribution
scale.t <- 1       # Scale par of hazard distribution
shape.c <- 0.8     # Shape par of censoring distribution


# * Derived Parameters
n_total <- n_train + n_test


# Functions ---------------------------------------------------

# * Nonliner Functions ----------------------------------------------------
f_1 <- function(x) (x+1)^2/10
f_2 <- function(x) exp(x+1)/100
f_3 <- function(x) 3*sin(x)/20
f_4 <- function(x) (1.4*x+0.5)/10

Auto-regressive correlation matrix

@param p integer, the number of dimension
@param rho double, range between 0-1, 0 means mutually independent, 1 means complete correlated

@return A p*p dimension matrix

@examples
p <- 5
rho <- 0.8
X_mat <- MASS::mvrnorm(n=100, rep(0, p), AR(p, rho))

AR <- function(p, rho){
  rho^abs(outer(1:p,1:p,"-"))
}



it <- 1
set.seed(it)


# Helper Functions --------------------------------------------------------

Estimate the Weibull scale parameter for controlled censoring proportion

find_censor_parameter <- function(
  lambda, pi.cen,
  shape_hazard, shape_censor
){
  # Estimate the density function of the linear predictor
  dens <- density(lambda, bw = "nrd0",na.rm=TRUE)
  
  x<-dens$x
  y<-dens$y
  y.loess <-loess(y ~ x,span=0.1)
  
  
  density.fun.lambda<-function(x){
    pred.y <- predict(y.loess, newdata=x)
    return(pred.y)
  }
  
  integration_over_time <- function(
    u,               # Different values of lambda
    # Arguments for integration
    theta, shape_hazard, shape_censor
  ){
    ret <- integrate(            # Start (Integrate of time)
      f = function(ti, theta, alpha.t, alpha.c){
        lambda.i<-u
        part1<-density.fun.lambda(lambda.i)
        part2<-dweibull(ti,alpha.c,theta)
        part3<-exp(-(ti/lambda.i)^alpha.t)
        return(part1*part2*part3)
      },
      0, Inf,
      theta = theta, alpha.t = shape_hazard, alpha.c = shape_censor
    )
    return(ret$value)
  }
  
  
  censor.prop <- function(theta, p, shape_hazard, shape_censor){
    cen.P <- integrate(         # Start (Integrate of lambda)
      function(u, theta, shape_hazard, shape_censor){
        sapply(u, integration_over_time,
               # Arguments of integration_over_time
               theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
        )
      },
      # Limits of integration
      lower = min(lambda), upper = max(lambda),
      ## Arguments for integration function
      theta = theta, shape_hazard = shape_hazard, shape_censor = shape_censor
    )$value     # End(int of lambda)
    return(cen.P-p)
  }
  
  theta<-uniroot(censor.prop, interval = c(0.1,200),
                 ## Argument for censor.prop
                 p = pi.cen, shape_hazard = shape_hazard, shape_censor = shape_censor,
                 tol=0.00000001)$root
  
  return(theta)
}

Create spline formula for high-dimension applications

create_HD_formula <- function(formula, data, spl_df, rm_overlap = TRUE, verbose=T){
  
  if(missing(formula) & missing(data))
    stop("Either formula or data must be provided. The response variable of the model is not supplied.")
  
  if(missing(formula)){
    if(!is.data.frame(data)) data <- data.frame(data)
    formula = DF2formula(data)
  }
  
  if(!missing(data)){
    warning("Both formula and dat provided, dat is ignored.")
    warning("Please consider use the function DF2formula and update to improve your formula.")
  }
  
  
  if(missing(spl_df)) {
    warning(" No additional spline terms supplied")
  } else {
    # Manipulate spl_df
    sp_trm <-  spl_df %>%
      dplyr::filter(Func!="")  %>% # Removing unnecessary terms
      glue::glue_data("{Func}( {Var}{ifelse(is.na(Args)||Args=='', '', paste0(',', Args))})") %>%
      paste(collapse  = " + ")
    sp_trm <- paste0("~ . + " , sp_trm)
  }
  # Adding Spline Terms
  ret <- update(formula, sp_trm)
  
  if(verbose){
    cat("Create formula:\n")
    print(ret)
  }
  
  return(ret)
}


# * Generate Data -------------------------------------------------
x_all <- MASS::mvrnorm(n_train+n_test, rep(0, p), AR(p, rho)) %>%
  data.frame
eta_all <- with(x_all, f_1(X1) + f_2(X2) + f_3(X3) + f_4(X4))
dat_all <- simsurv::simsurv(dist = "weibull",
                            lambdas = scale.t,
                            gammas = shape.t,
                            x = data.frame(eta = eta_all) ,
                            beta = c(eta = 1)) %>%
  data.frame( x_all, eta = eta_all, .)

train_dat <- dat_all[1:n_train, ]
test_dat <- dat_all[(n_train+1):n_total, ]


## Censoring Distribution, Weibull(alpha.c, scale.p)
scale.p <- find_censor_parameter(lambda = exp(-1*train_dat$eta/shape.t),
                                 pi.cen = pi_cns,
                                 shape_hazard = shape.t, shape_censor = shape.c)

train_dat <-  train_dat %>%
  data.frame(
    c_time = rweibull(n = n_train, shape = shape.c, scale = scale.p)
  ) %>%
  mutate(
    cen_ind = (c_time < eventtime),
    status = (!cen_ind)*1
  ) %>%
  rowwise() %>%
  mutate(time = min(c_time, eventtime)) %>%
  ungroup()



# Fit Models------------------------------------------------------------------


# * Spline Specification --------------------------------------------------
mgcv_df <- data.frame(
  Var = grep("X", names(train_dat), value = TRUE),
  Func = "s",
  Args = paste0("bs='cr', k=", k)
)


train_sm_dat <- construct_smooth_data(mgcv_df, train_dat)
train_smooth_data <- train_sm_dat$data

test_sm_dat <- BHAM::make_predict_dat(train_sm_dat$Smooth, dat = test_dat)

bamlasso_raw_mdl <- bamlasso( x = train_smooth_data, y = Surv(train_dat$time, event = train_dat$status),
                              family = "cox", group = make_group(names(train_smooth_data)),
                              ss = c(0.04, 0.5))
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 't': 'data' must be of a vector type, was 'NULL'

Created on 2021-12-29 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> โ”€ Session info  ๐Ÿ›ถ  ๐Ÿ•ฅ  ๐Ÿ›€๐Ÿฝ   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
#>  hash: canoe, ten-thirty, person taking bath: medium skin tone
#> 
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Chicago
#>  date     2021-12-29
#>  pandoc   2.14.0.3 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)
#> 
#> โ”€ Packages โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
#>  package     * version    date (UTC) lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.3.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  BHAM        * 0.1.0.9002 2021-12-29 [1] local
#>  broom         0.7.10     2021-10-31 [1] CRAN (R 4.1.0)
#>  cellranger    1.1.0      2016-07-27 [1] CRAN (R 4.1.0)
#>  cli           3.1.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  codetools     0.2-18     2020-11-04 [1] CRAN (R 4.1.1)
#>  colorspace    2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
#>  crayon        1.4.2      2021-10-29 [1] CRAN (R 4.1.0)
#>  DBI           1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
#>  dbplyr        2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
#>  digest        0.6.28     2021-09-23 [1] CRAN (R 4.1.0)
#>  dplyr       * 1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  forcats     * 0.5.1      2021-01-27 [1] CRAN (R 4.1.0)
#>  foreach       1.5.1      2020-10-15 [1] CRAN (R 4.1.0)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
#>  generics      0.1.1      2021-10-25 [1] CRAN (R 4.1.0)
#>  ggplot2     * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
#>  glmnet        4.1-3      2021-11-02 [1] CRAN (R 4.1.0)
#>  glue          1.5.0      2021-11-07 [1] CRAN (R 4.1.0)
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
#>  haven         2.4.3      2021-08-04 [1] CRAN (R 4.1.0)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  httr          1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
#>  iterators     1.0.13     2020-10-15 [1] CRAN (R 4.1.0)
#>  jsonlite      1.7.2      2020-12-09 [1] CRAN (R 4.1.0)
#>  knitr         1.36       2021-09-29 [1] CRAN (R 4.1.0)
#>  lattice       0.20-45    2021-09-22 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS          7.3-54     2021-05-03 [1] CRAN (R 4.1.1)
#>  Matrix        1.3-4      2021-06-01 [1] CRAN (R 4.1.1)
#>  mgcv        * 1.8-38     2021-10-06 [1] CRAN (R 4.1.0)
#>  modelr        0.1.8      2020-05-19 [1] CRAN (R 4.1.0)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
#>  nlme        * 3.1-153    2021-09-07 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4      2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  plyr          1.8.6      2020-03-03 [1] CRAN (R 4.1.0)
#>  pROC          1.18.0     2021-09-03 [1] CRAN (R 4.1.0)
#>  purrr       * 0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
#>  readr       * 2.0.2      2021-09-27 [1] CRAN (R 4.1.0)
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12     2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  rvest         1.0.2      2021-10-16 [1] CRAN (R 4.1.1)
#>  scales        1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.1      2021-11-02 [1] CRAN (R 4.1.0)
#>  shape         1.4.6      2021-05-19 [1] CRAN (R 4.1.0)
#>  simsurv     * 1.0.0      2021-01-13 [1] CRAN (R 4.1.0)
#>  stringi       1.7.5      2021-10-04 [1] CRAN (R 4.1.0)
#>  stringr     * 1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2      2021-09-23 [1] CRAN (R 4.1.0)
#>  survival    * 3.2-13     2021-08-24 [1] CRAN (R 4.1.0)
#>  tibble      * 3.1.5      2021-09-30 [1] CRAN (R 4.1.0)
#>  tidyr       * 1.1.4      2021-09-27 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
#>  tidyverse   * 1.3.1      2021-04-15 [1] CRAN (R 4.1.0)
#>  tzdb          0.2.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  unglue        0.1.0      2020-06-11 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.28       2021-11-04 [1] CRAN (R 4.1.0)
#>  xml2          1.3.2      2020-04-23 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.1.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#> 
#> โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

Examine the prior update of the procedures

  • Check if the update of the penalty makes sense
    • Set up bglm_spline one predictor per group and compare to Lasso and bglm
  • Check if the group update makes sense
    • Iterate over the update of the groups

Rename the bases from construct_smooth_data

Currently, the "design matrix" returned from contruct_smooth_data are named in the pattern var_name.baseX. However, such a naming convention doesn't help to distinguish the null space from the penalized space, which creates more problems during variable/function selection.

The naming convention should be a combination of var_name.nullX and var_name.penX. It also helps with creating groups.

Feature Selection function

We should be implementing the feature selection function for bglm_spline and bmlasso_spline model.

For bmlasso_spline model,

we can check the p or ptheta of the fitted model, and soft-thresholding to get which one would be selected in the model.

For bglm_spline, there could be two ways

  • Soft thresholding p, like described for bmlasso_spline model
  • Hypothesis testing, we need to estimate the degree of freedom of the components in order to do hypothesis testing.
    • The null space should have the degree of freedom corresponding to the number of dimensions
    • The penalized space, we need to calculate the effective degree of freedom.

Investigation of the theta prior implementation in 408874c

In the commit 408874c

TODO:

  • Rename the previous simulation results with the commit number
  • Create a branch shared_indicator
  • Modify the code, make_group as well as update.scale.p.group similar to the branch eff_hrd via 6244a7d
  • Perform simulations and analyze the performance
  • See Simulation Result at /data/user/boyiguo1/bgam/sim_res_6244a7d/ . The simulation results is not fine-tuned.
  • Merge to main

Implement Effect Heredity for Linear & Non-linear Component Selection

  • Create a new branch eff_hrd

  • Implement the new prior and updating algorithm according to the attached pdf.SS_GAM.pdf via 3270a96

  • Examine performance of the new algorithm with simulation

  • See Simulation Result at /data/user/boyiguo1/bgam/sim_res_3270a96/ . The simulation results is not fine-tuned.

  • Remove TODO(boyiguo1) message as inline comments

  • Pull request to main

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.