The drtmle from benkeser

R Build Warnings/Errors

@nhejazi -- Mind checking out these warnings and build errors when you have a minute?

CC Super Learner methods fail sometimes

Seems like the convex combination methods in SuperLearner fail with error

Error: matrix D in quadratic function is not positive definite!

too frequently. Should include method based on Rsolnp with tryCatch fail safes?

Add section on common warnings and their meanings to vignette

Error when guard = "Q" or guard = "g" only

Hi! I somehow ran into the following error when I only specify guard = "Q" or guard = "g" only:
object 'grn' not found
or
object 'QrnStar' not found

Could you please help me on this issue?
Many thanks!

use of foreach in tests

The current CRAN build report/check identifies a warning that must be fixed before 2018-07-02.

Check Details:

Version: 1.0.2
Check: for unstated dependencies in ‘tests’
Result: WARN
    '::' or ':::' import not declared from: ‘foreach’
    'library' or 'require' call not declared from: ‘foreach’

Fixing user-specified gn and Qn, the result doesn't change from iteration to iteration

Hi all, when I'm trying to use drtmle with my own specified initial estimates in gn and Qn, the result doesn't change from iteration to iteration. Am I missing something?
Thanks!

drtmle not changing from iteration to iteration if specifying outcome model as zero

Hi, I run into another issue while using drtmle: when I specify glm_Q = '0' (putting constant zeros as starting outcome model estimates), drtmle doesn't seem to run its iterative algorithm, but glm_Q = '1' still works.

X <- runif(100, 0, 1)

Q <- X

g <- exp(X) / (1 + exp(X))

A <- rbinom(100, 1, g)

Y <- Q + runif(100, -0.1, 0.1)

X <- as.data.frame(X)

Zero to Q

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '0', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 1)

a$drtmle$est

[1] -0.03679124

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '0', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 2)

a$drtmle$est

[1] -0.03679124

Intercept only model

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '1', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 1)

a$drtmle$est

[1] 0.4682421

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '1', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 2)

a$drtmle$est

[1] 0.4671978

fk_regression... A faster implementation of univariate kernel regression

In case this is of interest, I stumbled upon an implementation of kernel regression in one dimension that offers significant speed improvements and scalability compared to npreg (and this is an understatement). The package is called FKSUM (an interesting name choice), and you can find a link to it here.

Below is the SuperLearner wrapper I have been using. I used the Nadaraya-Watson implementation, but `fk_regression' also supports local linear regression, which may be a better option.

SL.fastkernel <- function (Y, X, newX, family = gaussian(), obsWeights = rep(1,
                                                                               length(Y)), rangeThresh = 1e-07, ...)
{

  X <- as.matrix(X)
  newX <- as.matrix(newX)
  if(ncol(X) > 1) {
    stop("Univariate X kernel smooths only.")
  }
  fit <- FKSUM::fk_regression(X, Y, type = 'NW')
  pred <- predict(fit, xtest = newX)

  fit <- list(object = fit)
  class(fit) <- "SL.fastkernel"
  out <- list(pred = pred, fit = fit)
  return(out)
}

predict.SL.fastkernel <- function (object, newdata, ...)
{

  pred <- predict(object, xtest = as.matrix(newdata))

  return(pred)
}

future::plan(): always undo changes

Please undo the changes to future::plan() you do when you exit your function in for instance,

drtmle/R/adaptive_iptw.R

Lines 147 to 168 in 922844d

    
           if (!parallel) { 
        
             future::plan(future::transparent) 
        
           } else { 
        
             doFuture::registerDoFuture() 
        
             if (all(c("sequential", "uniprocess") %in% class(future::plan())) & 
        
               is.null(future_hpc)) { 
        
               future::plan(future::multiprocess) 
        
             } else if (!is.null(future_hpc)) { 
        
               if (future_hpc == "batchtools_torque") { 
        
                 future::plan(future.batchtools::batchtools_torque) 
        
               } else if (future_hpc == "batchtools_slurm") { 
        
                 future::plan(future.batchtools::batchtools_slurm) 
        
               } else if (future_hpc == "batchtools_sge") { 
        
                 future::plan(future.batchtools::batchtools_sge) 
        
               } else if (future_hpc == "batchtools_lsf") { 
        
                 future::plan(future.batchtools::batchtools_lsf) 
        
               } else if (future_hpc == "batchtools_openlava") { 
        
                 future::plan(future.batchtools::batchtools_openlava) 
        
               } else { 
        
                 stop("The currently specified HPC backend is not (yet) available.") 
        
               } 
        
             }

From ?future::plan:

For package developers

Please refrain from modifying the future strategy inside your packages / functions, i.e. do not call plan() in your code. Instead, leave the control on what backend to use to the end user. This idea is part of the core philosophy of the future framework - as a developer you can never know what future backends the user have access to. Moreover, by not making any assumptions about what backends are available, your code will also work automatically with any new backends developed after you wrote your code.

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using:
  oplan <- plan(new_set_of_strategies)
  on.exit(plan(oplan), add = TRUE)
  [...]
This is important because the end-user might have already set the future strategy elsewhere for other purposes and will most likely not known that calling your function will break their setup. Remember, your package and its functions might be used in a greater context where multiple packages and functions are involved and those might also rely on the future framework, so it is important to avoid stepping on others' toes.

BTW,

The & in all(c("sequential", "uniprocess") %in% class(future::plan())) & is.null(future_hpc) could be replaced with a && to make that statement more robust.
I think inherits(future::plan(), c("sequential", "uniprocess")) is better than c("sequential", "uniprocess") %in% class(future::plan()).

drtmle errors when user inputs Qn and gn and returnModels = TRUE

Issue with delta method and 'contrast'

Hi,

I'm noticing an issue with the risk ratio 'contrast' option in ci() in cases with a multi-level treatment. Here is the example from the vignette where I've computed the risk ratio and confidence interval 3 ways (two manually, which agree, and one with ci(), which is much wider):

library(drtmle)
library(dplyr)
library(SuperLearner)


#simulate data as in vignette
set.seed(1234)
n <- 200
W <- data.frame(W1 = runif(n), W2 = rbinom(n, 1, 0.5))
A <- rbinom(n, 2, plogis(W$W1 + W$W2))
Y <- rbinom(n, 1, plogis(W$W1 + W$W2*A))

glm_fit_multiA <- drtmle(W = W, A = A, Y = Y, stratify = FALSE,
                                      glm_g = "W1 + W2", glm_Q = "W1 + W2*A",
                                      glm_gr = "Qn", glm_Qr = "gn",
                                      family = binomial(), a_0 = c(0,1,2))
##

#compute RR as per example
riskRatio <- list(f = function(eff){ log(eff) },
                  f_inv = function(eff){ exp(eff) },
                  h = function(est){ est[1]/est[2] },
                  fh_grad =  function(est){ c(1/est[1],-1/est[2], 0) }) #added 0
CI.builtin <- ci(glm_fit_multiA, contrast = riskRatio)

#try manually two ways: 

#1 with gradient manually 
#get TMLE estimates 
psi.0 <- glm_fit_multiA$drtmle$est[1]
psi.1 <- glm_fit_multiA$drtmle$est[2]

#compute gradient 
grad <- (c(1/psi.0,-1/psi.1, 0))%*%glm_fit_multiA$drtmle$cov%*%c(1/psi.0,-1/psi.1, 0)

#CI
CI.grad <- 
  c(log(psi.0/psi.1), 
  log(psi.0/psi.1) - 1.96*sqrt(grad),
  log(psi.0/psi.1) + 1.96*sqrt(grad)) %>% exp()


#2 with influence curve 

#get influence 
d.0 <- glm_fit_multiA$ic_drtmle$ic[,1]
d.1 <- glm_fit_multiA$ic_drtmle$ic[,2]

logRR <- log(psi.0/psi.1) #log risk ratio
eic.log.rr <- d.0/psi.0 - d.1/psi.1 #functional delta method for EIC

var.log.rr <- var(eic.log.rr)/n #variance from EIC

#output on RR scale
CI.ic <- 
  c(logRR,
  (logRR - 1.96*sqrt(var.log.rr)),
  (logRR + 1.96*sqrt(var.log.rr))) %>% exp()


CI.ic #ageees with CI.grad
CI.grad
CI.builtin

Vignette rendering on CRAN

I recently noticed that the html vignette on CRAN does not look good. Seems to take issue with the inclusion of latex directly in the document.

TO DO

Add test function that tests whether any means are different (multi-df test?)
Clean up splitting into validation and training code at beginning of each estimateX function?

Warnings on CRAN checks

Looks like v1.0.3 still has some warnings in CRAN checks.

transparent future: please use sequential instead (transparent is not meant for production)

Hi. As part of taking the future package to the next level, I'm narrowing down what futures may do. This is mostly backward compatible, but one thing that needs to go is the support for future(..., local = FALSE). Most of this has already been done, cf. HenrikBengtsson/future#382. However, it is still used internally by transparent futures, which I'll keep supporting for a while, but they should only be used for debugging and troubleshooting purposes. They should never for production use The reason for this is that local = FALSE introduces behavior that is an outlier in the future framework. (This will be clarified further in the next release of future).

Running reverse dependency checks, I spotted that drtmle uses transparent futures in two places:

drtmle/R/adaptive_iptw.R

Line 148 in 922844d

future::plan(future::transparent)

drtmle/R/drtmle.R

Line 243 in 922844d

future::plan(future::transparent)

Could you please switch to use sequential here instead?

BTW, I have a vague memory that you've asked about how to run futures with near-zero overhead. Is this why you used transparent here rather than sequential? If so, I'm not sure how much difference that makes these days relative to the other types of overhead that exist, e.g. relaying of stdout and conditions.

Fitting with W = 1

In attempting to apply drtmle as a replacement for tmle I've run into an odd error related to the use of the predict.lm method somewhere in drtmle. An example that illustrates my problem is given below

n <- 100
W <- rep(1, n)
A <- rbinom(n, 1, 0.5)
Y <- rnorm(n)
y_star <- (Y - min(Y)) / (max(Y) - min(Y))
Q_lib <- g_lib <- c("SL.mean", "SL.glm")
family <- "binomial"

out <- drtmle(Y = as.numeric(y_star),
              A = as.numeric(A),
              W = as.data.frame(W),
              family = family,
              SL_Q = Q_lib,
              SL_g = g_lib,
              SL_Qr = "SL.npreg",
              SL_gr = "SL.npreg"
             )

which then results in

Error: $ operator is invalid for atomic vectors

use of traceback yields

❯ traceback()
7: stop(FutureError(future))
6: value.Future(tmp, ...)
5: value(tmp, ...)
4: values.list(fs)
3: values(fs)
2: future::future_lapply(x = validRows, FUN = estimateQ, Y = Y,
       A = A, W = W, DeltaA = DeltaA, DeltaY = DeltaY, verbose =
verbose,
       returnModels = returnModels, SL_Q = SL_Q, a_0 = a_0, strat
ify = stratify,
       glm_Q = glm_Q, family = family)
1: drtmle(Y = as.numeric(y_test_star), A = as.numeric(A), W = as.
data.frame(W),
       family = family, SL_Q = Q_lib, SL_g = g_lib, SL_Qr = "SL.n
preg",
       SL_gr = "SL.npreg")

Lines 172 to 199 in 8efedb7

    
           #> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations 
        
           #> ('future_lapply-1') unexpectedly generated random numbers without declaring so. 
        
           #> There is a risk that those random numbers are not statistically sound and the 
        
           #> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This 
        
           #> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer- 
        
           #> CMRG method. To disable this check, use 'future.seed = NULL', or set option 
        
           #> 'future.rng.onMisuse' to "ignore". 
        
           #> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations 
        
           #> ('future_lapply-1') unexpectedly generated random numbers without declaring so. 
        
           #> There is a risk that those random numbers are not statistically sound and the 
        
           #> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This 
        
           #> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer- 
        
           #> CMRG method. To disable this check, use 'future.seed = NULL', or set option 
        
           #> 'future.rng.onMisuse' to "ignore". 
        
           #> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations 
        
           #> ('future_lapply-1') unexpectedly generated random numbers without declaring so. 
        
           #> There is a risk that those random numbers are not statistically sound and the 
        
           #> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This 
        
           #> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer- 
        
           #> CMRG method. To disable this check, use 'future.seed = NULL', or set option 
        
           #> 'future.rng.onMisuse' to "ignore". 
        
           #> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations 
        
           #> ('future_lapply-1') unexpectedly generated random numbers without declaring so. 
        
           #> There is a risk that those random numbers are not statistically sound and the 
        
           #> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This 
        
           #> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer- 
        
           #> CMRG method. To disable this check, use 'future.seed = NULL', or set option 
        
           #> 'future.rng.onMisuse' to "ignore".

@nhejazi, mind taking a look?

	if (!parallel) {
	future::plan(future::transparent)
	} else {
	doFuture::registerDoFuture()
	if (all(c("sequential", "uniprocess") %in% class(future::plan())) &
	is.null(future_hpc)) {
	future::plan(future::multiprocess)
	} else if (!is.null(future_hpc)) {
	if (future_hpc == "batchtools_torque") {
	future::plan(future.batchtools::batchtools_torque)
	} else if (future_hpc == "batchtools_slurm") {
	future::plan(future.batchtools::batchtools_slurm)
	} else if (future_hpc == "batchtools_sge") {
	future::plan(future.batchtools::batchtools_sge)
	} else if (future_hpc == "batchtools_lsf") {
	future::plan(future.batchtools::batchtools_lsf)
	} else if (future_hpc == "batchtools_openlava") {
	future::plan(future.batchtools::batchtools_openlava)
	} else {
	stop("The currently specified HPC backend is not (yet) available.")
	}
	}

	#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
	#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
	#> There is a risk that those random numbers are not statistically sound and the
	#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
	#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
	#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
	#> 'future.rng.onMisuse' to "ignore".
	#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
	#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
	#> There is a risk that those random numbers are not statistically sound and the
	#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
	#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
	#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
	#> 'future.rng.onMisuse' to "ignore".
	#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
	#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
	#> There is a risk that those random numbers are not statistically sound and the
	#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
	#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
	#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
	#> 'future.rng.onMisuse' to "ignore".
	#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
	#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
	#> There is a risk that those random numbers are not statistically sound and the
	#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
	#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
	#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
	#> 'future.rng.onMisuse' to "ignore".

benkeser / drtmle Goto Github PK

drtmle's People

Contributors

Stargazers

Watchers

Forkers

drtmle's Issues

Zero to Q

a$drtmle$est

[1] -0.03679124

a$drtmle$est

[1] -0.03679124

Intercept only model

a$drtmle$est

[1] 0.4682421

a$drtmle$est

[1] 0.4671978

For package developers

Recommend Projects

Recommend Topics

Recommend Org