Giter Site home page Giter Site logo

benkeser / drtmle Goto Github PK

View Code? Open in Web Editor NEW
18.0 3.0 7.0 3.07 MB

Nonparametric estimators of the average treatment effect with doubly-robust confidence intervals and hypothesis tests

Home Page: https://benkeser.github.io/drtmle/

License: Other

R 97.01% Makefile 0.55% TeX 2.44%
tmle causal-inference ensemble-learning statistical-inference iptw

drtmle's People

Contributors

benkeser avatar ck37 avatar nhejazi avatar resekneb avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

drtmle's Issues

CC Super Learner methods fail sometimes

Seems like the convex combination methods in SuperLearner fail with error

Error: matrix D in quadratic function is not positive definite!

too frequently. Should include method based on Rsolnp with tryCatch fail safes?

Error when guard = "Q" or guard = "g" only

Hi! I somehow ran into the following error when I only specify guard = "Q" or guard = "g" only:
object 'grn' not found
or
object 'QrnStar' not found

Could you please help me on this issue?
Many thanks!

use of foreach in tests

The current CRAN build report/check identifies a warning that must be fixed before 2018-07-02.

Check Details:

Version: 1.0.2
Check: for unstated dependencies intestsResult: WARN
    '::' or ':::' import not declared from:foreach'library' or 'require' call not declared from:foreach

drtmle not changing from iteration to iteration if specifying outcome model as zero

Hi, I run into another issue while using drtmle: when I specify glm_Q = '0' (putting constant zeros as starting outcome model estimates), drtmle doesn't seem to run its iterative algorithm, but glm_Q = '1' still works.

X <- runif(100, 0, 1)

Q <- X

g <- exp(X) / (1 + exp(X))

A <- rbinom(100, 1, g)

Y <- Q + runif(100, -0.1, 0.1)

X <- as.data.frame(X)

Zero to Q

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '0', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 1)

a$drtmle$est

[1] -0.03679124

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '0', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 2)

a$drtmle$est

[1] -0.03679124

Intercept only model

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '1', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 1)

a$drtmle$est

[1] 0.4682421

a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '1', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 2)

a$drtmle$est

[1] 0.4671978

fk_regression... A faster implementation of univariate kernel regression

In case this is of interest, I stumbled upon an implementation of kernel regression in one dimension that offers significant speed improvements and scalability compared to npreg (and this is an understatement). The package is called FKSUM (an interesting name choice), and you can find a link to it here.

Below is the SuperLearner wrapper I have been using. I used the Nadaraya-Watson implementation, but `fk_regression' also supports local linear regression, which may be a better option.

SL.fastkernel <- function (Y, X, newX, family = gaussian(), obsWeights = rep(1,
                                                                               length(Y)), rangeThresh = 1e-07, ...)
{

  X <- as.matrix(X)
  newX <- as.matrix(newX)
  if(ncol(X) > 1) {
    stop("Univariate X kernel smooths only.")
  }
  fit <- FKSUM::fk_regression(X, Y, type = 'NW')
  pred <- predict(fit, xtest = newX)

  fit <- list(object = fit)
  class(fit) <- "SL.fastkernel"
  out <- list(pred = pred, fit = fit)
  return(out)
}

predict.SL.fastkernel <- function (object, newdata, ...)
{

  pred <- predict(object, xtest = as.matrix(newdata))

  return(pred)
}

future::plan(): always undo changes

Please undo the changes to future::plan() you do when you exit your function in for instance,

drtmle/R/adaptive_iptw.R

Lines 147 to 168 in 922844d

if (!parallel) {
future::plan(future::transparent)
} else {
doFuture::registerDoFuture()
if (all(c("sequential", "uniprocess") %in% class(future::plan())) &
is.null(future_hpc)) {
future::plan(future::multiprocess)
} else if (!is.null(future_hpc)) {
if (future_hpc == "batchtools_torque") {
future::plan(future.batchtools::batchtools_torque)
} else if (future_hpc == "batchtools_slurm") {
future::plan(future.batchtools::batchtools_slurm)
} else if (future_hpc == "batchtools_sge") {
future::plan(future.batchtools::batchtools_sge)
} else if (future_hpc == "batchtools_lsf") {
future::plan(future.batchtools::batchtools_lsf)
} else if (future_hpc == "batchtools_openlava") {
future::plan(future.batchtools::batchtools_openlava)
} else {
stop("The currently specified HPC backend is not (yet) available.")
}
}

From ?future::plan:

For package developers

Please refrain from modifying the future strategy inside your packages / functions, i.e. do not call plan() in your code. Instead, leave the control on what backend to use to the end user. This idea is part of the core philosophy of the future framework - as a developer you can never know what future backends the user have access to. Moreover, by not making any assumptions about what backends are available, your code will also work automatically with any new backends developed after you wrote your code.

If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using:

  oplan <- plan(new_set_of_strategies)
  on.exit(plan(oplan), add = TRUE)
  [...]

This is important because the end-user might have already set the future strategy elsewhere for other purposes and will most likely not known that calling your function will break their setup. Remember, your package and its functions might be used in a greater context where multiple packages and functions are involved and those might also rely on the future framework, so it is important to avoid stepping on others' toes.

BTW,

  1. The & in all(c("sequential", "uniprocess") %in% class(future::plan())) & is.null(future_hpc) could be replaced with a && to make that statement more robust.
  2. I think inherits(future::plan(), c("sequential", "uniprocess")) is better than c("sequential", "uniprocess") %in% class(future::plan()).

Issue with delta method and 'contrast'

Hi,

I'm noticing an issue with the risk ratio 'contrast' option in ci() in cases with a multi-level treatment. Here is the example from the vignette where I've computed the risk ratio and confidence interval 3 ways (two manually, which agree, and one with ci(), which is much wider):

library(drtmle)
library(dplyr)
library(SuperLearner)


#simulate data as in vignette
set.seed(1234)
n <- 200
W <- data.frame(W1 = runif(n), W2 = rbinom(n, 1, 0.5))
A <- rbinom(n, 2, plogis(W$W1 + W$W2))
Y <- rbinom(n, 1, plogis(W$W1 + W$W2*A))

glm_fit_multiA <- drtmle(W = W, A = A, Y = Y, stratify = FALSE,
                                      glm_g = "W1 + W2", glm_Q = "W1 + W2*A",
                                      glm_gr = "Qn", glm_Qr = "gn",
                                      family = binomial(), a_0 = c(0,1,2))
##

#compute RR as per example
riskRatio <- list(f = function(eff){ log(eff) },
                  f_inv = function(eff){ exp(eff) },
                  h = function(est){ est[1]/est[2] },
                  fh_grad =  function(est){ c(1/est[1],-1/est[2], 0) }) #added 0
CI.builtin <- ci(glm_fit_multiA, contrast = riskRatio)

#try manually two ways: 

#1 with gradient manually 
#get TMLE estimates 
psi.0 <- glm_fit_multiA$drtmle$est[1]
psi.1 <- glm_fit_multiA$drtmle$est[2]

#compute gradient 
grad <- (c(1/psi.0,-1/psi.1, 0))%*%glm_fit_multiA$drtmle$cov%*%c(1/psi.0,-1/psi.1, 0)

#CI
CI.grad <- 
  c(log(psi.0/psi.1), 
  log(psi.0/psi.1) - 1.96*sqrt(grad),
  log(psi.0/psi.1) + 1.96*sqrt(grad)) %>% exp()


#2 with influence curve 

#get influence 
d.0 <- glm_fit_multiA$ic_drtmle$ic[,1]
d.1 <- glm_fit_multiA$ic_drtmle$ic[,2]

logRR <- log(psi.0/psi.1) #log risk ratio
eic.log.rr <- d.0/psi.0 - d.1/psi.1 #functional delta method for EIC

var.log.rr <- var(eic.log.rr)/n #variance from EIC

#output on RR scale
CI.ic <- 
  c(logRR,
  (logRR - 1.96*sqrt(var.log.rr)),
  (logRR + 1.96*sqrt(var.log.rr))) %>% exp()


CI.ic #ageees with CI.grad
CI.grad
CI.builtin 


TO DO

  • Add test function that tests whether any means are different (multi-df test?)
  • Clean up splitting into validation and training code at beginning of each estimateX function?

transparent future: please use sequential instead (transparent is not meant for production)

Hi. As part of taking the future package to the next level, I'm narrowing down what futures may do. This is mostly backward compatible, but one thing that needs to go is the support for future(..., local = FALSE). Most of this has already been done, cf. HenrikBengtsson/future#382. However, it is still used internally by transparent futures, which I'll keep supporting for a while, but they should only be used for debugging and troubleshooting purposes. They should never for production use The reason for this is that local = FALSE introduces behavior that is an outlier in the future framework. (This will be clarified further in the next release of future).

Running reverse dependency checks, I spotted that drtmle uses transparent futures in two places:

future::plan(future::transparent)

future::plan(future::transparent)

Could you please switch to use sequential here instead?

BTW, I have a vague memory that you've asked about how to run futures with near-zero overhead. Is this why you used transparent here rather than sequential? If so, I'm not sure how much difference that makes these days relative to the other types of overhead that exist, e.g. relaying of stdout and conditions.

Fitting with W = 1

In attempting to apply drtmle as a replacement for tmle I've run into an odd error related to the use of the predict.lm method somewhere in drtmle. An example that illustrates my problem is given below

n <- 100
W <- rep(1, n)
A <- rbinom(n, 1, 0.5)
Y <- rnorm(n)
y_star <- (Y - min(Y)) / (max(Y) - min(Y))
Q_lib <- g_lib <- c("SL.mean", "SL.glm")
family <- "binomial"

out <- drtmle(Y = as.numeric(y_star),
              A = as.numeric(A),
              W = as.data.frame(W),
              family = family,
              SL_Q = Q_lib,
              SL_g = g_lib,
              SL_Qr = "SL.npreg",
              SL_gr = "SL.npreg"
             )

which then results in

Error: $ operator is invalid for atomic vectors

use of traceback yields

❯ traceback()
7: stop(FutureError(future))
6: value.Future(tmp, ...)
5: value(tmp, ...)
4: values.list(fs)
3: values(fs)
2: future::future_lapply(x = validRows, FUN = estimateQ, Y = Y,
       A = A, W = W, DeltaA = DeltaA, DeltaY = DeltaY, verbose =
verbose,
       returnModels = returnModels, SL_Q = SL_Q, a_0 = a_0, strat
ify = stratify,
       glm_Q = glm_Q, family = family)
1: drtmle(Y = as.numeric(y_test_star), A = as.numeric(A), W = as.
data.frame(W),
       family = family, SL_Q = Q_lib, SL_g = g_lib, SL_Qr = "SL.n
preg",
       SL_gr = "SL.npreg")

future warning in readme

In adaptive_iptw apparently we're doing some seed setting that future doesn't like.

drtmle/README.md

Lines 172 to 199 in 8efedb7

#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
#> There is a risk that those random numbers are not statistically sound and the
#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
#> 'future.rng.onMisuse' to "ignore".
#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
#> There is a risk that those random numbers are not statistically sound and the
#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
#> 'future.rng.onMisuse' to "ignore".
#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
#> There is a risk that those random numbers are not statistically sound and the
#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
#> 'future.rng.onMisuse' to "ignore".
#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
#> ('future_lapply-1') unexpectedly generated random numbers without declaring so.
#> There is a risk that those random numbers are not statistically sound and the
#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
#> ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
#> CMRG method. To disable this check, use 'future.seed = NULL', or set option
#> 'future.rng.onMisuse' to "ignore".

@nhejazi, mind taking a look?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.