romainkp / stremr Goto Github PK
View Code? Open in Web Editor NEWStreamlined Estimation for Static, Dynamic and Stochastic Treatment Regimes in Longitudinal Data
License: MIT License
Streamlined Estimation for Static, Dynamic and Stochastic Treatment Regimes in Longitudinal Data
License: MIT License
:when calling fitPropensity:
Version 0.8.99 of stremr when the treatment is continuous give this error:
Error: 'Lrnr_condensier' is not an exported object from 'namespace:sl3'
When I install version 1 or 2, then even the bionary treatment does not work. All of the cases give me this error:
Error: outvar is not a character vector
my project is on hold until I can solve this problem. Your help is appreciated. Best Regards
Is there documentation for all of the package dependencies of stremr?
All the functionality is already there, except that extraction of the prediction results requires writing custom access functions. Specifically we need to extract the final predictions for each strata of Q prediction and combine them into N predictions.
Current set-up only permits the input of probabilities/indicators of observation following an intervention-specific treatment and/or monitoring rule (arguments gstar.TRT
and gstar.MONITOR
).
In some cases it might be more intuitive to have as input the counterfactual values of treatment and monitoring based on some intervention rule (the current functionality of ltmle
and tmlenet
). For dynamic rules, these would be evaluated for each observation and could also be specified as probabilities of TRT[t]=1 and MONITOR[t]=1, under some intervention rule. In this case, the rule followers/non-followers (gstar.TRT
& gstar.N
) would be evaluated automatically by comparing the observed TRT/MONITOR to their corresponding counterfactual values/probabilities.
DT <- data.table(data, key = c("ID", "t"))
I think "ID" and "t" should be replaced with IDvar and tvar
When a factor variable can take on only one value, importData creates a dummy of class "list" which cannot be handled by subsequent stremr routines (Error message is produced).
Stremr should catch the problem, print a warning and create a single dummy variable of class "integer".
When either one of the arguments CENS
, TRT,
MONITOR` is not specified, do not to intervene on it and do not attempt to fit the corresponding model for censoring, treatment or monitoring. Currently will return an error if either is not specified.
Hello,
I'm trying to conduct an intent-to-treat analysis and am having difficulty fitting the treatment propensity score using only t == 0 (baseline) and then using that estimated baseline treatment propensity across all time points with time-varying censoring.
It looks like when I stratify and only calculate it on t == 0, the treatment propensity score for all remaining time points (1 to 170) may be set to 1 by default? Any recs or hints on how to apply the t = 0 propensity score to all future time points? Censoring on the other hand I do want to be estimated at each time point (or in a pooled manner). Also just fyi at t == 0, 8.7% of the sample is treated.
This is using scrambled data from Romain and attempting to replicate Marcus, J. L., Neugebauer, R. S., Leyden, W. A., Chao, C. R., Xu, L., Quesenberry Jr, C. P., ... & Silverberg, M. J. (2016). Use of abacavir and risk of cardiovascular disease among HIV-infected individuals. JAIDS Journal of Acquired Immune Deficiency Syndromes, 71(4), 413-419.
Here is my rough code:
vars = list(
id = "ID",
time = "intnum",
treatment = "exposure",
censoring = "censor",
outcome = "outcome",
covars_time_varying =
c("cd4", "I.cd4", "vl", "I.vl", "egfr.low", "I.egfr.low", "diabetes", "htn.meds",
"llt.meds")
# covars_baseline will be defined after factors_to_indicators conversion.
)
odata =
importData(dt,
ID = vars$id,
t = vars$time,
covars = c(vars$covars_baseline, vars$covars_time_varying),
CENS = vars$censoring,
TRT = vars$treatment,
OUTCOME = vars$outcome)
# Try simpler propensity score specification to get a better distribution.
(gform_TRT = paste(vars$treatment, "~", paste(c("eversmoke", "everdrug", "sex_m", "art.naive"), collapse = " + ")))
# For ITT parameter only estimate treatment propensity at baseline.
# (uses rlang::list2 and !! so that we can substitute the treatment variable into the list name)
(stratify_TRT = list2(!!vars$treatment := paste0(vars$time, " == 0L")))
(gform_CENS = paste(vars$censoring, "~", paste(c(vars$covariates), collapse = " + ")))
odata = fitPropensity(odata,
gform_TRT = gform_TRT,
gform_CENS = gform_CENS,
stratify_TRT = stratify_TRT)
# Excessive right skewing: 1st quartile to max are all 1.0
# Min is 6.7%.
# Perhaps because we need to examine only t = 0?
summary(odata$g_preds$g0.A)
qplot(odata$g_preds$g0.A)
Thank you and happy to provide any more context if it would be helpful. Sorry if this is an easy fix.
Add a test that the ID variable provided by the user is NOT a factor within importData and return an error to the user otherwise.
Make it clear in the doc that the ID variable should be not be provided as a factor. Otherwise, the current implementation will create dummies for all study ID.
Make it clear in the doc that all character variables are ignored by stremr (i.e., they won't be converted to factors).
Sometimes the user may set the intervention node (g^*) to NA
. If this value is actually being applied, then we should produce a warning, since the resulting weights will be also NA
.
The best place to catch it is probably here:
https://github.com/osofr/stremr/blob/master/R/DeterministicBinaryOutcomeModel.R#L29-L44
fit = function(overwrite = FALSE, data, ...) { # Move overwrite to a field? ... self$overwrite
self$n <- data$nobs
self$define.subset.idx(data)
private$probA1 <- data$get.outvar(TRUE, self$gstar.Name)
# private$.isNA.probA1 <- is.na(private$probA1)
# self$subset_idx <- rep.int(TRUE, self$n)
self$subset_idx <- seq_len(self$n)
private$.outvar <- data$get.outvar(TRUE, self$getoutvarnm) # Always a vector of 0/1
# private$.isNA.outvar <- is.na(private$.outvar)
self$is.fitted <- TRUE
# **********************************************************************
# to save RAM space when doing many stacked regressions wipe out all internal data:
# self$wipe.alldat
# **********************************************************************
invisible(self)
},
Or here (private$probA1
is the g^*):
https://github.com/osofr/stremr/blob/master/R/DeterministicBinaryOutcomeModel.R#L53-L67
predictAeqa = function(newdata, ...) { # P(A^s[i]=a^s|W^s=w^s) - calculating the likelihood for indA[i] (n vector of a`s)
assert_that(self$is.fitted)
if (missing(newdata)) {
indA <- self$getoutvarval
} else {
indA <- newdata$get.outvar(self$getsubset, self$getoutvarnm) # Always a vector of 0/1
}
assert_that(is.integerish(indA)) # check that observed exposure is always a vector of integers
probAeqa <- rep.int(1L, self$n) # for missing values, the likelihood is always set to P(A = a) = 1.
# probA1 <- private$probA1[self$getsubset]
probA1 <- private$probA1
probAeqa[self$getsubset] <- probA1^(indA) * (1 - probA1)^(1L - indA)
self$wipe.alldat # to save RAM space when doing many stacked regressions wipe out all internal data:
return(probAeqa)
},
`
The trial is a sequentially randomized trial with possible re-randomization based on previous response. At baseline, participants are randomized 1:1:1 into three arms, texting intervention, mobile app intervention, and standard of care. At 3 month increments participants in the text and app arms who do not "improve" their risk behaviors are eligible for re-randomization into a third intervention, e-coaching.
Because it's a randomized trial, at each time I know the probability of receiving each treatment given the past. I don't necessarily have to input the known probabilities, but I can't figure out how to properly stratify the fitPropensity
command and can't digest what the error messages are actually telling me.
Attached is an example data table.
ID
= partic. id
covariate
= a binary baseline covariateY
= the outcome at month 12txt
= on the texting interventionapp
= on the app interventiont
= time (0 = baseline, 3, 6, 9, 12 months)monitor
= whether they are seen at t + 3risk
= risk behavior score, if risk at t+3 is bigger than or equal to risk at t (and risk at t != 0) and -txt == 1 | app == 1
, then they are eligible at t+3 for rerandomizaiton to e-coaching. Y
is the sum of risk at 3, 6, 9, and 12.trt
= multinomial treatment indicator 1 = txt, 2 = appY_scale
= Y scaled by maximum value of Ytrt.setXX
= treatment levels for rules I'm interested in (no int, all txt int, all app int, txt + e-coaching when risk doesn't improve, app + e-coaching when risk doesn't improve)eligible
= whether or not someone is eligible for re-randomization at tecoach
= on the e-coach interventioncoach.tminus1
= on e-coach intervention at previous time pointcens
= censoring = 0 for everyone, since it seemed like providing monitoring status made stremr happier than censoring + missing valuesHere's what I've tried:
gform_MONITOR <- "monitor ~ risk + covariate"
# would perhaps prefer this to be intercept only, but ~1 syntax seems to piss everything off
gform_TRT <- "trt ~ risk"
stratify_TRT <- list(trt = c(
"t == 0", # baseline randomization prob.
"t == 3 & eligible == 1", # at time 3 prob. of getting e-coaching amongst those eligible (should be 1/2)
"t == 6 & eligible == 1", # at time 6 prob. of getting e-coaching amongst those eligible (should be 1/2)
"t == 9 & eligible == 1", # at time 9, etc...
"t == 3 & txt == 1 & eligible == 0", # at time 3, txt folks should stay on txt with prob. 1 if not eligible for e-coach
"t == 6 & txt == 1 & eligible == 0", # at time 6, txt folks should stay on txt with prob. 1 if not eligible for e-coach
"t == 9 & txt == 1 & eligible == 0", # etc...
"t == 3 & app == 1 & eligible == 0", # ditto app people who are not eligible
"t == 6 & app == 1 & eligible == 0", # etc
"t == 9 & app == 1 & eligible == 0", # etc
"t == 3 & app == 0 & txt == 0", # ditto standard of care people, i.e. trt == 0
"t == 6 & app == 0 & txt == 0", # etc
"t == 9 & app == 0 & txt == 0", # etc
"t == 6 & ecoach.tminus1 == 1", # once you get to e-coaching, you stay on e-coaching with prob. 1
"t == 9 & ecoach.tminus1 == 1"))
I'm just wondering how to properly specify these regressions and whether there is a way to make them intercept only or not.
The results tables provided by estimating functions are poorly documented.
This could be used as a start: https://github.com/osofr/stremr/#simulated-data-example
In the report and in the argument values of the make report routine
This is to be match the most common terminology used in the literature
Add 95th, 99th, and 99.9th percentiles and maximum value of IP weights in relevant section of report
Documentation of getIPWeights states:
"The output is person-specific data with evaluated weights, ‘wts.DT’, only observation-times with non-zero weight are kept"
The returned object can actually contain person-time obs with cum.IPAW = 0
Can you help me understand the output of stremr? In the optimal dynamic treatment rule example, in the output table IPW_estimates, there are the variables: sum_Y_IPAW, sum_all_IPAW, ht, St.IPTW, ht.KM. Can you help me understand how to interpret each of these variables?
when the MSM parameterization results in no event for a time interval under a given rule
Different from stratified ps models: how do I specify different variable sets in g_form_TRT?
Multivaraite_TRT__4_dummy_exposure_.docx
# ----------------------------------------------------------------------
# Instal stremr Version 0.8.99 Data
# ----------------------------------------------------------------------
#knitr::opts_chunk$set(echo = TRUE)
library(devtools)
#install_github("osofr/stremr", ref = "experimental_master")
# ----------------------------------------------------------------------
# Instal stremr Version 0.8.99 Data
# ----------------------------------------------------------------------
library(stremr)
library(data.table)
library(magrittr)
library(h2o)
options(stremr.verbose=TRUE)
sessionInfo()
library(repmis)
source_data("https://github.com/Soudi00/Multi-Treatment-Causal-Modeling/blob/master/sampleAD.RData?raw=True")
AD = as.data.table(AD, key= c(ID,SEQ))
#I have 4 different treatment , should I only use 3 of the dummies in the importData or I should include all of them?
# ----------------------------------------------------------------------
# Define intervention (always on TRT1):
# ----------------------------------------------------------------------
AD[, ("zero.set1") := 0L]
AD[, ("zero.set2") := 0L]
AD[, ("zero.set3") := 0L]
AD[, ("TRT1.set") := 1L]
# ----------------------------------------------------------------------
# Import Data in to stremr object
# ----------------------------------------------------------------------
OData.1 <- importData(AD, ID = "ID", t_name = "SEQ",
covars = c("CAT_VAR1","CAT_VAR2","CONT_VAR1"),
CENS = c("CNS","ADM_CNS"),
TRT = c("TRT1","TRT2","TRT3","TRT4"),
MONITOR = NULL, OUTCOME = "STATUS",
weights = NULL, remove_extra_rows = TRUE,
verbose = getOption("stremr.verbose"))
# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData.1)
# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData.1)
# ----------------------------------------------------------------------
# Model the Right Censroing and Adminstrative Censoring and Exposure
# ----------------------------------------------------------------------
gform_CENS <- "CNS + ADM_CNS ~ CAT_VAR1 + CONT_VAR1"
gform_TRT = "TRT1+TRT2+TRT3+TRT4 ~ CAT_VAR1 + CAT_VAR2 + CONT_VAR1"
# ----------------------------------------------------------------------
# Fit Propensity Scores
# ----------------------------------------------------------------------
OData.1 <- fitPropensity(OData.1, gform_CENS = gform_CENS,ngform_TRT = gform_TRT )
I have my own defined dynamic treatment patterns of interest (5 dummy variables for the 5 patterns). That is:
# ----------------------------------------------------------------------
# Error: length(intervened_NODE) not equal to length(NodeNames)
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData.1, intervened_TRT="PATH1")
# ----------------------------------------------------------------------
# Error in modelfit.g$getPsAsW.models()[[i]] : subscript out of bounds
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData.1, intervened_TRT=c("TRT1.set","zero.set1","zero.set2","zero.set3"))
# ----------------------------------------------------------------------
# useing diffrent intervened_TRT didnt make a diffrence in the result
# ----------------------------------------------------------------------
wts.DT.1 <- getIPWeights(OData = OData.1, useonly_t_TRT="PATH1==1",rule_name ="Only TRT1")
wts.DT.1
wts.DT.2 <- getIPWeights(OData = OData.1, useonly_t_TRT="PATH2==1", rule_name = "Only TRT2")
wts.DT.2
Hi, thanks a lot for this wonderful package. I wanted to run CVTMLE, and I am getting this error:
Error in UseMethod("predict_SL") :
no applicable method for 'predict_SL' applied to an object of class "c('Lrnr_sl', 'Lrnr_base', 'R6')"
I guess it is something to do with gridsl::predict_SL.
My function:
params <- gridisl::defModel(estimator = "speedglm__glm")
IPW <- getIPWeights(Odata, intervened_TRT = "Intervention_1", holdout = TRUE)
qwigh <- quantile(IPW[cum.IPAW>0,][["cum.IPAW"]], c(0.99))
gcomp_est <- fit_GCOMP(Odata, tvals = tvals,
TMLE = TRUE,
CVTMLE = TRUE,
Qforms = Qforms,
intervened_TRT = "Intervention_1"
models = params,
stratifyQ_by_rule = FALSE,
stratify_by_last = TRUE,
trunc_weights = qwigh,
fit_method = "cv",
byfold_Q = FALSE,
IPWeights = IPW)
Regards!
Example: UACR/TI 90days.
dx with x=7.5
bar{n}* = continous 1, 101010..., 1001001...
In this example, we are interested in an MSM for 3 rules.
survMSM
however ignores the fact that each intervention is indexed by a particular monitoring intervention and defines model terms solely based on the various levels of treatment interventions. In this case, there is only one dx so survMSM
gives:
MSM.IPAW$MSM.fit
$coef
Periods.0to0_new.d7.5 Periods.1to1_new.d7.5 Periods.2to2_new.d7.5 Periods.3to3_new.d7.5
-5.362056 -3.953137 -4.067354 -3.152000
Periods.4to4_new.d7.5 Periods.5to5_new.d7.5 Periods.6to6_new.d7.5 Periods.7to7_new.d7.5
-6.936838 -3.870630 -4.138810 -2.931822
Periods.8to8_new.d7.5 Periods.9to17_new.d7.5
-5.218133 -5.028533$linkfun
[1] "logit_linkinv"$fitfunname
[1] "h2o.glm"
I'll email template code that describes the problem in more details
RDtables.t0.1.2year <- get_MSM_RDs(MSM.IPAW, c(3,7), getSEs = TRUE) ## BUG
RDtables.t0.1.year <- get_MSM_RDs(MSM.IPAW, c(3), getSEs = TRUE) ## BUG
Error in se.RDscale.Sdt.K[d1.idx, d2.idx] <- getSE.RD.d1.minus.d2(nID = nID, :
replacement has length zero
Note that when there is only a single t0 specified in the argument (value 3), there is an error while the routine runs without a problem with 2+ t0's.
IPW_for_Categorical_Exposure_with_4_Levels.docx
stremr version 0.8.99 and Installing stremr package from GitHub and loading the packages
# ----------------------------------------------------------------------
# Instal stremr Version 0.8.99 Data
# ----------------------------------------------------------------------
#knitr::opts_chunk$set(echo = TRUE)
library(devtools)
#install_github("osofr/stremr", ref = "experimental_master")
# ----------------------------------------------------------------------
# Get Libraries
# ----------------------------------------------------------------------
library(stremr)
library(data.table)
library(magrittr)
library(h2o)
options(stremr.verbose=TRUE)
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rmarkdown_1.8 repmis_0.5 sl3_1.0.0 h2o_3.16.0.2
[5] magrittr_1.5 data.table_1.10.5 devtools_1.13.4 haven_1.1.0
[9] stremr_0.8.99
Read sampleAD.RData from Soudi00 GitHub repository
library(repmis)
source_data("https://github.com/Soudi00/Multi-Treatment-Causal-Modeling/blob/master/sampleAD.RData?raw=True")
AD = as.data.table(AD, key= c(ID,SEQ))
use importData to prepare the data to get porpensity scores, also define censoring and exposrure regressions
# ----------------------------------------------------------------------
# Import Data
# ----------------------------------------------------------------------
OData.2 <- importData(AD, ID = "ID", t_name = "SEQ",
covars = c("CAT_VAR1","CAT_VAR2","CONT_VAR1"),
CENS = c("CNS","ADM_CNS"),
TRT = "TRTN",
MONITOR = NULL, OUTCOME = "STATUS",
weights = NULL, remove_extra_rows = TRUE,
verbose = getOption("stremr.verbose"))
# ----------------------------------------------------------------------
# Look at the input data object
# ----------------------------------------------------------------------
print(OData.2)
# ----------------------------------------------------------------------
# Access the input data
# ----------------------------------------------------------------------
get_data(OData.2)
# ----------------------------------------------------------------------
# Regression formula for Right Censoring and Administrative
# Censoring and Exposure
# ----------------------------------------------------------------------
gform_CENS <- "CNS + ADM_CNS ~ CAT_VAR1 + CONT_VAR1"
gform_TRT = "TRTN ~ CAT_VAR1 + CAT_VAR2 + CONT_VAR1"
#Error in fitPropensity (not meaningful for factors)
I tried different options in fitPropensity but none of them works for categorical with more than 2 levels.
# ----------------------------------------------------------------------
# Estimate Propensity Scores
# fitPRopensity score with all defult option has an error
# ----------------------------------------------------------------------
OData.2 <- fitPropensity(OData.2, gform_CENS = gform_CENS,ngform_TRT = gform_TRT )
Using the default regression formula: TRTN ~ CAT_VAR1_2+CAT_VAR1_3+CAT_VAR1_4+CAT_VAR1_6+CAT_VAR1_7+CAT_VAR2_2+CAT_VAR2_3+CAT_VAR2_4+CAT_VAR2_5+CONT_VAR1
[1] "New 'ModelBinomial' regression defined:"
[1] "P(CNS|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: ;\ N: NA"
[1] "New 'ModelBinomial' regression defined:"
[1] "P(ADM_CNS|CNS, CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: CNS == 0;\ N: NA"
[1] "New 'ModelCategorical' regression defined:"
[1] "P(TRTN|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CAT_VAR2_2, CAT_VAR2_3, CAT_VAR2_4, CAT_VAR2_5, CONT_VAR1);\ outvar.class: categorical;\ Stratify: ;\ N: NA"
[1] "fitting the model: P(CNS|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: ;\ N: NA"
[1] "fitting the model: P(ADM_CNS|CNS, CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CONT_VAR1);\ outvar.class: binomial;\ Stratify: CNS == 0;\ N: NA"
[1] "fitting the model: P(TRTN|CAT_VAR1_2, CAT_VAR1_3, CAT_VAR1_4, CAT_VAR1_6, CAT_VAR1_7, CAT_VAR2_2, CAT_VAR2_3, CAT_VAR2_4, CAT_VAR2_5, CONT_VAR1);\ outvar.class: categorical;\ Stratify: ;\ N: NA"
Failed on Lrnr_condensier_c("equal.mass", "equal.len", "dhist")_5_20_FALSE_NA_FALSE_NULL
Error in Summary.factor(structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, :
‘max’ not meaningful for factors
sl3 error debugging info:
[1] "Error in Summary.factor(structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, : \n ‘max’ not meaningful for factors\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in Summary.factor(structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 2L, 2L, 2L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 3L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 4L, 2L, 4L, 2L, 2L, 4L, 2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L), .Label = c("1", "2", "3", "4"), class = "factor"), na.rm = FALSE): 'max' not meaningful for factors>
...trying to run Lrnr_glm_fast as a backup...
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(eta, offset): '-' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(weights, y): '*' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Error in private$PsAsW.models[[k_i]]$predictAeqa(newdata = newdata, n = n, : some of the modeling predictions resulted in NAs, which indicates an error in a prediction routine
# ----------------------------------------------------------------------
# Fitting treatment model with Gradient Boosting machines:
# ----------------------------------------------------------------------
require("h2o")
h2o::h2o.init(nthreads = -1)
gform_CENS <- "CNS + ADM_CNS ~ CAT_VAR1 + CONT_VAR1"
models_TRT <- sl3::Lrnr_h2o_grid$new(algorithm = "gbm")
OData.2 <- fitPropensity(OData.2, gform_CENS = gform_CENS,
gform_TRT = gform_TRT,
models_TRT = models_TRT)
# Use `H2O-3` distributed implementation of GLM for treatment model estimator:
models_TRT <- sl3::Lrnr_h2o_glm$new(family = "multinomial")
OData.2 <- fitPropensity(OData.2, gform_CENS = gform_CENS,
gform_TRT = gform_TRT,
models_TRT = models_TRT)
Hello,
I mentioned this in person but thought I'd start a quick feature request issue to support observation weights. I believe Mark is on board with simply passing the weights through to the Q & g estimation as well as the fluctuation. For the variance of the IC he suggested that the weights be normalized to sum to 1 (possibly also applied to the Q & g estimation as well, not sure of the specifics).
Personally I could use this for a study where we have 30 million observations but only a few categorical covariates, so we can aggregate the replicated observations and incorporate observation weights to drastically speed up the superlearning & reduce memory usage.
Susan's tmle package doesn't support observation weights but ltmle does, so I'm using that right now. Mark suggested that observation weights would generally be important to support because they can be used to solve many different problems.
Thanks,
Chris
The code below leads to this error:
Error in runglmMSM(OData, wts_data, all_dummies, Ynode, verbose) :
trying to get slot "model" from an object (class "try-error") that is not an S4 object
Note that the function g.Nstatic10001 below implies that the monitoring intervention is 11111 when number.0.between.1s <- 0
# ESTIMATE SURVIVAL FOR ONE TREATMENT REGIMEN WITH STATIC (1,0,0,0,1,0,...) INTERVENTIONS ON MONITORING REGIME
# get the likelihood for the following static g^* N(t): 1,0,0,0,1,0,0,0,1,0
g.Nstatic10001 <- function(){
function(OdataDT, gn.N = "g0.N", SHIFTED.OUTCOME = "outcome.tplus1", ID = "ID", MONITOR = "N", t = "t", ...){
ID.expression <- as.name(ID)
Odata_sel <- OdataDT[, c(ID, MONITOR, t, SHIFTED.OUTCOME, gn.N), with = FALSE]
number.0.between.1s <- 0 ## every k weeks means k-1 0's between 1's and this shoudl be set to k-1
N.star.t <- as.integer((Odata_sel[[t]]%%(number.0.between.1s+1))%in%number.0.between.1s)
Odata_sel[, g.N := as.numeric(as.integer(Odata_sel[[MONITOR]]) == N.star.t)]
return(Odata_sel[["g.N"]])
}
}
# Define N(t) rule followers under static N.g.star: (1,0,1,0) and a column to the observed data.table in OData:
g.star.N.follow <- g.Nstatic10001()
OData$dat.sVar[, N.star.stat10001 := g.star.N.follow(OData$dat.sVar, gn.N = "g0.N", SHIFTED.OUTCOME = "outcome.tplus1", ID = "StudyID", MONITOR = "N", t = "intnum")]
#1. Obtain weighted data sets by rule:
wts.St.d7 <- getIPWeights(OData, gstar_TRT = "new.d7", gstar_MONITOR = "N.star.stat10001")
wts.St.d7.5 <- getIPWeights(OData, gstar_TRT = "new.d7.5", gstar_MONITOR = "N.star.stat10001")
wts.St.d8 <- getIPWeights(OData, gstar_TRT = "new.d8", gstar_MONITOR = "N.star.stat10001")
wts.St.d8.5 <- getIPWeights(OData, gstar_TRT = "new.d8.5", gstar_MONITOR = "N.star.stat10001")
wts.all <- list(d7 = wts.St.d7, d7.5 = wts.St.d7.5, d8 = wts.St.d8, d8.5 = wts.St.d8.5)
print(object.size(wts.all), units = "MB")
wts.all <- rbindlist(wts.all)
wts.all <- wts.all[!is.na(cumm.IPAW) & !is.na(outcome.tplus1) & (cumm.IPAW > 0), ]
print(object.size(wts.all), units = "MB")
data.table::setthreads(20) # will help load data into h2o
# MSM for hazard with regular weights:
t.breaks.byquarter <- c(1:9)-1 ## need to change or will get error due to bins with no event when computing IC - need to add as bug on github
MSM.IPAW <- survMSM(OData, wts_data = wts.all, t_breaks = t.breaks.byquarter, use_weights = TRUE, est_name = "IPAW", getSEs = getSEs)
# MSM for hazard with truncated weights:
MSM.trunc <- survMSM(OData, wts_data = wts.all, t_breaks = t.breaks.byquarter, use_weights = TRUE, trunc_weights = 20, est_name = "IPAWtrunc", getSEs = getSEs)
# crude MSM for hazard without any weights:
MSM.crude <- survMSM(OData, wts_data = wts.all, t_breaks = t.breaks.byquarter, use_weights = FALSE, est_name = "crude", getSEs = getSEs)
# save(list = c("MSM.IPAW", "MSM.trunc", "MSM.crude"), file = "./MSMs.Rdata")
Currently, a call to make_report_rmd leads to a browser or a pdf reader to open up automatically and display the report of analysis results. This is problematic when make_report_rmd is called several times from a larger program and one ends up with a cluttered desktop.
Is it possible to add an argument that prevents the pop-up of the display windows?
The documentation in fit_GCOMP()
for stratifyQ_by_rule
says,
Set to 'TRUE' for stratifying the fit of Q (the outcome model) by rule-followers only.
Does this mean that non-rule-followers are excluded at each time point from the hazard estimate? I don't see separate "stratified" estimates.
Thanks,
For example:
est_name time sum_Y_IPW sum_IPW St.directIPW rule.name
1: directIPW 0 9.828827 225.6710 0.9564462 TI.set1
2: directIPW 1 14.841714 308.6067 0.9519073 TI.set1
3: directIPW 2 21.627479 430.0012 0.9497037 TI.set1
4: directIPW 3 21.627479 606.0012 0.9643112 TI.set1
5: directIPW 4 43.571094 868.0025 0.9498030 TI.set1
6: directIPW 5 61.760385 1241.4314 0.9502507 TI.set1
7: directIPW 6 66.382274 1802.6454 0.9631751 TI.set1
8: directIPW 7 116.322109 2638.1985 0.9559085 TI.set1
9: directIPW 8 198.486284 3871.7562 0.9487348 TI.set1
10: directIPW 9 254.941362 5714.0214 0.9553832 TI.set1
11: directIPW 10 254.941362 8456.0006 0.9698508 TI.set1
12: directIPW 11 397.563386 12563.9928 0.9683569 TI.set1
13: directIPW 12 1225.200094 18784.3937 0.9347756 TI.set1
14: directIPW 13 1816.300699 27581.8941 0.9341488 TI.set1
15: directIPW 14 1816.300699 40628.9102 0.9552954 TI.set1
16: directIPW 15 4927.103677 60323.6409 0.9183222 TI.set1
17: directIPW 16 4927.103677 88105.7038 0.9440774 TI.set1
See stats::isoreg(y = y, x = ...)
I inted to submit paper about observational study in urological field.
I think TMLE is double estimation. If that is right, I could not understand the meaning of the result from TMLE.
Please, tell me, the script for drawing the Doubly robust adjusted Kaplan Meier curve (if possible 95% CI) and estimate log-rank test.
Say user specifies stratified PS models for A (resp. C or N) nodes such that the union of the person-time observations on A that fall in each stratum is not equal to the total number of person-time observations in the dataset.
Say that the user then asks stremr to compute an effect estimate for interventions on a subset of the A nodes (resp. C or N) such some elements of the subsets of person-time observations on A do not fall into any of the strata specified for gA.
Stremr should then return an error message to warn the user than they did not specify a sufficient number of strata for gA. Currently, stremr will return an output which will be incorrect.
Idea to fix this: add an argument check in getIPWeights that checks that the A (resp. C and N) person-time observations on which the user wish to intervene are all contained in the union of the person-time observations on A that fall within each stratum specified for gA. If not, stremr should give an error that asks the user to modify/add how they specified the strata for the PS model for gA.
Please see the example from the documentation of the functions bellow:
the return_wts option dose not return value when survNPMSM function is called.
data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c(ID, t))
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")
print(OData)
get_data(OData)
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT = "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))
OData <- fitPropensity(OData, gform_CENS = gform_CENS,
gform_TRT = gform_TRT,
gform_MONITOR = gform_MONITOR,
stratify_CENS = stratify_CENS)
require("magrittr")
AKME.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
survNPMSM(OData, return_wts = TRUE) %$%
estimates
AKME.St.1$wts_data
IPW.St.1 <- getIPWeights(OData, intervened_TRT = "TI.set1") %>%
directIPW(OData, return_wts = TRUE)
IPW.St.1$wts_data
When doing Q learning, especially with stratification, a single algorithm might fail. In this case a fall back fitting either with glm or speedglm should be performed. For some reason this doesn't happen in this case (./tests/examples/2_building_blocks_example.R):
The following error is produced:
unable to run randomForest with h2o for: intercept only models or designmat with zero rows or constant outcome (y) ...
Error in UseMethod("predictP1") :
no applicable method for 'predictP1' applied to an object of class "try-error"
In addition: Warning messages:
The code from ./tests/examples/2_building_blocks_example.R:
data(OdataNoCENS)
OdataDT <- as.data.table(OdataNoCENS, key=c(ID, t))
OdataDT[, ("N.tminus1") := shift(get("N"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.tminus1") := shift(get("TI"), n = 1L, type = "lag", fill = 1L), by = ID]
OdataDT[, ("TI.set1") := 1L]
OdataDT[, ("TI.set0") := 0L]
OData <- importData(OdataDT, ID = "ID", t = "t", covars = c("highA1c", "lastNat1", "N.tminus1"),
CENS = "C", TRT = "TI", MONITOR = "N", OUTCOME = "Y.tplus1")
gform_CENS <- "C ~ highA1c + lastNat1"
gform_TRT = "TI ~ CVD + highA1c + N.tminus1"
gform_MONITOR <- "N ~ 1"
stratify_CENS <- list(C=c("t < 16", "t == 16"))
require("h2o")
h2o::h2o.init(nthreads = -1)
params_TRT = list(fit.package = "h2o", fit.algorithm = "gbm", ntrees = 50,
learn_rate = 0.05, sample_rate = 0.8, col_sample_rate = 0.8,
balance_classes = TRUE)
params_CENS = list(fit.package = "speedglm", fit.algorithm = "glm")
params_MONITOR = list(fit.package = "speedglm", fit.algorithm = "glm")
OData <- fitPropensity(OData,
gform_CENS = gform_CENS, stratify_CENS = stratify_CENS, params_CENS = params_CENS,
gform_TRT = gform_TRT, params_TRT = params_TRT,
gform_MONITOR = gform_MONITOR, params_MONITOR = params_MONITOR)
t.surv <- c(0:5)
Qforms <- rep.int("Q.kplus1 ~ CVD + highA1c + N + lastNat1 + TI + TI.tminus1", (max(t.surv)+1))
params_Q = list(fit.package = "h2o", fit.algorithm = "randomForest",
ntrees = 100, learn_rate = 0.05, sample_rate = 0.8,
col_sample_rate = 0.8, balance_classes = TRUE)
tmle_est <- fitTMLE(OData, t_periods = t.surv, intervened_TRT = "TI.set1",
Qforms = Qforms, params_Q = params_Q,
stratifyQ_by_rule = TRUE)
Is it possible to use stremr without a monitoring variable? There does not seem to be a default for MONITOR in defineIntervedTRT(), so do I have to construct a pseudo monitoring variable for the function to work?
Thanks
Seems some definitions are missing in stratify_ if when following your example code.
> OData <- fitPropensity(OData, gform_CENS = gform_CENS, gform_TRT = gform_TRT, gform_MONITOR = gform_MONITOR, stratify_CENS = stratify_CENS)
Error in create_subset_expr(outvars = res$outvars, stratify.EXPRS = stratify.EXPRS) :
Could not locate the appropriate regression variable(s) within the supplied stratification list stratify_CENS, stratify_TRT or stratify_MONITOR.
The regression outcome variable(s) specified in gform_CENS, gform_TRT or gform_MONITOR were: ( 'C,TI,N' )
However, the item names in the matching stratification list were: ( 'C' )
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.