benkeser / drtmle
Nonparametric estimators of the average treatment effect with doubly-robust confidence intervals and hypothesis tests
Home Page: https://benkeser.github.io/drtmle/
License: Other
@nhejazi -- Mind checking out these warnings and build errors when you have a minute? It seems like the convex combination methods in SuperLearner fail too frequently with the error
Error: matrix D in quadratic function is not positive definite!
Should we include a method based on Rsolnp, with tryCatch fail-safes?
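For what it's worth, here is a minimal sketch of the kind of tryCatch fail-safe discussed above; the wrapper name and the choice of method.NNLS as the fallback are illustrative assumptions, not part of drtmle or SuperLearner.
safe_sl <- function(Y, X, SL.library, ...) {
  # try the convex-combination meta-learner first
  tryCatch(
    SuperLearner::SuperLearner(Y = Y, X = X, SL.library = SL.library,
                               method = "method.CC_LS", ...),
    error = function(e) {
      # if the quadratic program fails, fall back to non-negative least squares
      message("method.CC_LS failed (", conditionMessage(e),
              "); falling back to method.NNLS")
      SuperLearner::SuperLearner(Y = Y, X = X, SL.library = SL.library,
                                 method = "method.NNLS", ...)
    }
  )
}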
Hi! I somehow ran into the following error when I specify only guard = "Q" or only guard = "g":
object 'grn' not found
or
object 'QrnStar' not found
Could you please help me on this issue?
Many thanks!
The current CRAN build report/check identifies a warning that must be fixed before 2018-07-02.
Check Details:
Version: 1.0.2
Check: for unstated dependencies in ‘tests’
Result: WARN
'::' or ':::' import not declared from: ‘foreach’
'library' or 'require' call not declared from: ‘foreach’
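For reference, the usual fix for this kind of WARN is to declare foreach as a suggested dependency of the package; a one-line sketch (assuming usethis is used for development):
# add foreach to Suggests in DESCRIPTION so the use in tests is declared
usethis::use_package("foreach", type = "Suggests")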
Hi all, when I'm trying to use drtmle with my own specified initial estimates in gn and Qn, the result doesn't change from iteration to iteration. Am I missing something?
Thanks!
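For context, here is a minimal sketch of what passing user-specified estimates might look like, assuming (per the documentation) that Qn and gn are lists with one entry per value of a_0, each evaluated at the observed W; the simulated data and library choices are purely illustrative.
# illustrative data, not from the issue
set.seed(123)
n <- 200
W <- data.frame(W1 = runif(n))
A <- rbinom(n, 1, plogis(W$W1))
Y <- rbinom(n, 1, plogis(W$W1 + A))
# user-specified initial estimates, one list entry per value of a_0
Qn_user <- list(plogis(W$W1 + 1), plogis(W$W1))  # E[Y | A = 1, W], E[Y | A = 0, W]
gn_user <- list(plogis(W$W1), 1 - plogis(W$W1))  # P(A = 1 | W), P(A = 0 | W)
fit <- drtmle(W = W, A = A, Y = Y, a_0 = c(1, 0), family = binomial(),
              Qn = Qn_user, gn = gn_user,
              SL_Qr = "SL.npreg", SL_gr = "SL.npreg")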
Hi, I ran into another issue while using drtmle: when I specify glm_Q = '0' (using constant zeros as the starting outcome model estimates), drtmle doesn't seem to run its iterative algorithm, but glm_Q = '1' still works.
X <- runif(100, 0, 1)
Q <- X
g <- exp(X) / (1 + exp(X))
A <- rbinom(100, 1, g)
Y <- Q + runif(100, -0.1, 0.1)
X <- as.data.frame(X)
a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '0', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 1)
a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '0', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 2)
a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '1', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 1)
a <- drtmle(W = X, A = A, Y = Y, a_0 = 1, glm_Q = '1', glm_g = 'X', SL_Qr = 'SL.npreg', guard = 'Q', returnModel = TRUE, maxIter = 2)
In case this is of interest, I stumbled upon an implementation of one-dimensional kernel regression that offers substantial speed and scalability improvements over npreg (and that is an understatement). The package is called FKSUM (an interesting name choice), and you can find a link to it here.
Below is the SuperLearner wrapper I have been using. I used the Nadaraya-Watson implementation, but fk_regression also supports local linear regression, which may be a better option.
SL.fastkernel <- function(Y, X, newX, family = gaussian(),
                          obsWeights = rep(1, length(Y)),
                          rangeThresh = 1e-07, ...) {
  X <- as.matrix(X)
  newX <- as.matrix(newX)
  if (ncol(X) > 1) {
    stop("Univariate X kernel smooths only.")
  }
  # Nadaraya-Watson fit; fk_regression also supports local linear regression
  fit <- FKSUM::fk_regression(X, Y, type = "NW")
  pred <- predict(fit, xtest = newX)
  fit <- list(object = fit)
  class(fit) <- "SL.fastkernel"
  out <- list(pred = pred, fit = fit)
  return(out)
}
predict.SL.fastkernel <- function(object, newdata, ...) {
  # object is the fit component returned above, so the fk_regression
  # fit lives in object$object
  pred <- predict(object$object, xtest = as.matrix(newdata))
  return(pred)
}
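As a quick usage sketch (not from the original post), the wrapper can then be passed to drtmle wherever SL.npreg would otherwise go; the simulated data below is only for illustration.
# illustrative data
set.seed(1)
n <- 200
W <- data.frame(W1 = runif(n))
A <- rbinom(n, 1, plogis(W$W1))
Y <- rbinom(n, 1, plogis(W$W1 + A))
# use the fast kernel wrapper for the reduced-dimension regressions
fit <- drtmle(W = W, A = A, Y = Y, a_0 = c(0, 1), family = binomial(),
              SL_Q = c("SL.mean", "SL.glm"), SL_g = c("SL.mean", "SL.glm"),
              SL_Qr = "SL.fastkernel", SL_gr = "SL.fastkernel")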
Please undo the changes you make to future::plan() when you exit your function, for instance in
Lines 147 to 168 in 922844d
From ?future::plan:
For package developers
Please refrain from modifying the future strategy inside your packages / functions, i.e. do not call plan() in your code. Instead, leave the control of what backend to use to the end user. This idea is part of the core philosophy of the future framework - as a developer you can never know what future backends the user has access to. Moreover, by not making any assumptions about what backends are available, your code will also work automatically with any new backends developed after you wrote your code.
If you think it is necessary to modify the future strategy within a function, then make sure to undo the changes when exiting the function. This can be done using:
oplan <- plan(new_set_of_strategies)
on.exit(plan(oplan), add = TRUE)
[...]
This is important because the end user might have already set the future strategy elsewhere for other purposes and will most likely not know that calling your function will break their setup. Remember, your package and its functions might be used in a greater context where multiple packages and functions are involved and those might also rely on the future framework, so it is important to avoid stepping on others' toes.
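As a sketch of the pattern the documentation describes (the function name and the multisession backend here are illustrative assumptions, not drtmle's actual internals):
run_step_in_parallel <- function(x) {
  # temporarily change the plan; plan() returns the previous strategy
  oplan <- future::plan(future::multisession)
  # restore the user's plan when the function exits
  on.exit(future::plan(oplan), add = TRUE)
  future.apply::future_lapply(x, function(xi) xi^2)
}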
BTW, the & in all(c("sequential", "uniprocess") %in% class(future::plan())) & is.null(future_hpc) could be replaced with && to make that statement more robust. Also, inherits(future::plan(), c("sequential", "uniprocess")) is better than c("sequential", "uniprocess") %in% class(future::plan()).
Hi,
I'm noticing an issue with the risk ratio 'contrast' option in ci() in cases with a multi-level treatment. Here is the example from the vignette where I've computed the risk ratio and confidence interval 3 ways (two manually, which agree, and one with ci(), which is much wider):
library(drtmle)
library(dplyr)
library(SuperLearner)
#simulate data as in vignette
set.seed(1234)
n <- 200
W <- data.frame(W1 = runif(n), W2 = rbinom(n, 1, 0.5))
A <- rbinom(n, 2, plogis(W$W1 + W$W2))
Y <- rbinom(n, 1, plogis(W$W1 + W$W2*A))
glm_fit_multiA <- drtmle(W = W, A = A, Y = Y, stratify = FALSE,
glm_g = "W1 + W2", glm_Q = "W1 + W2*A",
glm_gr = "Qn", glm_Qr = "gn",
family = binomial(), a_0 = c(0,1,2))
##
#compute RR as per example
riskRatio <- list(f = function(eff){ log(eff) },
f_inv = function(eff){ exp(eff) },
h = function(est){ est[1]/est[2] },
fh_grad = function(est){ c(1/est[1],-1/est[2], 0) }) #added 0
CI.builtin <- ci(glm_fit_multiA, contrast = riskRatio)
#try manually two ways:
#1 with gradient manually
#get TMLE estimates
psi.0 <- glm_fit_multiA$drtmle$est[1]
psi.1 <- glm_fit_multiA$drtmle$est[2]
#compute delta-method variance (gradient %*% cov %*% gradient)
grad <- (c(1/psi.0,-1/psi.1, 0))%*%glm_fit_multiA$drtmle$cov%*%c(1/psi.0,-1/psi.1, 0)
#CI
CI.grad <-
c(log(psi.0/psi.1),
log(psi.0/psi.1) - 1.96*sqrt(grad),
log(psi.0/psi.1) + 1.96*sqrt(grad)) %>% exp()
#2 with influence curve
#get influence
d.0 <- glm_fit_multiA$ic_drtmle$ic[,1]
d.1 <- glm_fit_multiA$ic_drtmle$ic[,2]
logRR <- log(psi.0/psi.1) #log risk ratio
eic.log.rr <- d.0/psi.0 - d.1/psi.1 #functional delta method for EIC
var.log.rr <- var(eic.log.rr)/n #variance from EIC
#output on RR scale
CI.ic <-
c(logRR,
(logRR - 1.96*sqrt(var.log.rr)),
(logRR + 1.96*sqrt(var.log.rr))) %>% exp()
CI.ic #agrees with CI.grad
CI.grad
CI.builtin
I recently noticed that the html vignette on CRAN does not look good. It seems to take issue with the inclusion of LaTeX directly in the document.
test function that tests whether any means are different (multi-df test?)
Looks like v1.0.3 still has some warnings in CRAN checks.
Hi. As part of taking the future package to the next level, I'm narrowing down what futures may do. This is mostly backward compatible, but one thing that needs to go is support for future(..., local = FALSE). Most of this has already been done, cf. HenrikBengtsson/future#382. However, it is still used internally by transparent futures, which I'll keep supporting for a while, but they should only be used for debugging and troubleshooting purposes; they should never be used in production. The reason for this is that local = FALSE introduces behavior that is an outlier in the future framework. (This will be clarified further in the next release of future.)
Running reverse dependency checks, I spotted that drtmle uses transparent futures in two places:
Line 148 in 922844d
Line 243 in 922844d
Could you please switch to using sequential here instead?
BTW, I have a vague memory that you've asked about how to run futures with near-zero overhead. Is this why you used transparent here rather than sequential? If so, I'm not sure how much difference that makes these days relative to the other types of overhead that exist, e.g. relaying of stdout and conditions.
In attempting to apply drtmle as a replacement for tmle, I've run into an odd error related to the use of the predict.lm method somewhere in drtmle. An example that illustrates my problem is given below.
n <- 100
W <- rep(1, n)
A <- rbinom(n, 1, 0.5)
Y <- rnorm(n)
y_star <- (Y - min(Y)) / (max(Y) - min(Y))
Q_lib <- g_lib <- c("SL.mean", "SL.glm")
family <- "binomial"
out <- drtmle(Y = as.numeric(y_star),
A = as.numeric(A),
W = as.data.frame(W),
family = family,
SL_Q = Q_lib,
SL_g = g_lib,
SL_Qr = "SL.npreg",
SL_gr = "SL.npreg"
)
which then results in
Error: $ operator is invalid for atomic vectors
Use of traceback() yields
❯ traceback()
7: stop(FutureError(future))
6: value.Future(tmp, ...)
5: value(tmp, ...)
4: values.list(fs)
3: values(fs)
2: future::future_lapply(x = validRows, FUN = estimateQ, Y = Y,
       A = A, W = W, DeltaA = DeltaA, DeltaY = DeltaY, verbose = verbose,
       returnModels = returnModels, SL_Q = SL_Q, a_0 = a_0, stratify = stratify,
       glm_Q = glm_Q, family = family)
1: drtmle(Y = as.numeric(y_test_star), A = as.numeric(A), W = as.data.frame(W),
       family = family, SL_Q = Q_lib, SL_g = g_lib, SL_Qr = "SL.npreg",
       SL_gr = "SL.npreg")
https://www.stats.ox.ac.uk/pub/bdr/LENGTH1/drtmle.out appears to be from the earth package?