
Tutorial_Computational_Causal_Inference_Estimators: Introduction

Educational notes: Introduction to computational causal inference using reproducible Stata, R and Python code

Authors

Matthew J. Smith (1) | Mohammad Ali Mansournia (2) | Camille Maringe (1) | Paul N. Zivich (3,4) | Stephen R. Cole (3) | Clemence Leyrat (1) | Aurelien Belot (1) | Bernard Rachet (1) | Miguel Angel Luque-Fernandez (*1,5,6) |

Affiliations

  1. Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, U.K.
  2. Department of Epidemiology and Biostatistics, Tehran University of Medical Sciences, Tehran, Iran.
  3. Department of Epidemiology, University of North Carolina at Chapel Hill, North Carolina, U.S.
  4. Carolina Population Center, University of North Carolina at Chapel Hill, North Carolina, U.S.
  5. Non-communicable Disease and Cancer Epidemiology Group, Instituto de Investigación Biosanitaria de Granada (ibs.GRANADA), Andalusian School of Public Health, University of Granada, Granada, Spain.
  6. Biomedical Network Research Centers of Epidemiology and Public Health (CIBERESP), Madrid, Spain.

*Correspondence: Miguel Angel Luque-Fernandez. Email: [email protected]

This repository makes the data and code used in the manuscript available to the scientific community.

Link to the published article

ABSTRACT

In research studies it can be unethical to assign a treatment to individuals in a randomised controlled trial; instead, observational data and an appropriate study design must be used. The purpose of many observational health studies is to estimate the causal effect of a treatment on an outcome. However, observational studies pose major challenges, one of which is confounding, which can bias estimates of the causal effect. Confounding is commonly controlled for by simple adjustment for measured confounders, although this approach is sometimes suboptimal. Recent advances in the field of causal inference have addressed confounding by building on classical standardisation methods. However, these advances have progressed quickly, with a relative paucity of computation-oriented applied tutorials, contributing to some confusion about the use of these methods among applied researchers. In this tutorial, we show the computational implementation of different causal inference estimators from a historical perspective, in which new estimators were developed to overcome the limitations of the previous ones (i.e., the non-parametric and parametric g-formula, inverse probability weighting, double-robust, and data-adaptive estimators). Furthermore, we illustrate the use of the different methods with an empirical example from the Connors study in intensive care medicine and, most importantly, we provide reproducible and commented code in Stata, R and Python for researchers to adapt to their own observational studies. The code can be accessed at

https://github.com/migariane/Tutorial_Computational_Causal_Inference_Estimators

KEYWORDS: Causal Inference; Regression adjustment; G-methods; G-formula; Propensity score; Inverse probability weighting; Double-robust methods; Machine learning; Targeted maximum likelihood estimation; Epidemiology; Statistics; Tutorial
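As a concrete flavour of the estimators listed above, here is a minimal, hypothetical Python sketch of the simplest of them, the non-parametric g-formula (standardization), on simulated data with a single binary confounder. This is not the Connors data and not the tutorial's own code; all variable names and coefficients are illustrative assumptions.

```python
# Hypothetical sketch: non-parametric g-formula (standardization) on simulated
# data with one binary confounder W. Not the Connors data; all names and
# coefficients are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
W = rng.binomial(1, 0.4, n)                    # confounder
A = rng.binomial(1, 0.2 + 0.4 * W)             # treatment depends on W
Y = rng.binomial(1, 0.1 + 0.2 * A + 0.3 * W)   # outcome; true ATE = 0.2
df = pd.DataFrame({"W": W, "A": A, "Y": Y})

# Standardize the stratum-specific means E[Y | A = a, W = w] over the
# marginal distribution of W: the non-parametric g-formula for the ATE.
pW = df["W"].value_counts(normalize=True)
means = df.groupby(["A", "W"])["Y"].mean()
ate = sum(pW[w] * (means[1, w] - means[0, w]) for w in pW.index)
print(round(ate, 3))                           # close to the true ATE of 0.2
```

The same logic underlies the Stata, R and Python boxes in the tutorial; the later estimators (IPW, double-robust, data-adaptive) refine how the conditional means and treatment probabilities are modelled.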

Contributors

mattyjsmith, migariane, pzivich


Issues

Possible error in the R code in Box 27?

Hi there,

Is it possible that there is an error in the R code in the simulation part? In the generateData function in Box 27, we currently have:

 generateData <- function(n){
      w1 <- round(runif(n, min=1, max=5), digits=0) 
      w2 <- rbinom(n, size=1, prob=0.45)
      w3 <- round(runif(n, min=0, max=1), digits=0 + 0.75*w2 + 0.8*w1)
      w4 <- round(runif(n, min=0, max=1), digits=0 + 0.75*w2 + 0.2*w1)
      A  <- rbinom(n, size=1, prob= plogis(-1 -  0.15*w4 + 1.5*w2 + 0.75*w3 + 0.25*w1 + 0.8*w2*w4))
      # Counterfactuals
      Y.1 <- rbinom(n, size=1, prob = plogis(-3 + 1 + 0.25*w4 + 0.75*w3 + 0.8*w2*w4 + 0.05*w1))
      Y.0 <- rbinom(n, size=1, prob = plogis(-3 + 0 + 0.25*w4 + 0.75*w3 + 0.8*w2*w4 + 0.05*w1)) 
      # Observed outcome
      Y <- Y.1*A + Y.0*(1 - A)
      # return data.frame
      data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
      }

As w3 and w4 are meant to be categorical variables, shouldn't it be this instead:

     w3 <- round(runif(n, min=0, max=1) + 0.75*w2 + 0.8*w1, digits=0)
     w4 <- round(runif(n, min=0, max=1) + 0.75*w2 + 0.2*w1, digits=0)

Also, still in Box 27, what is the reason for choosing family = poisson(link="log") in the naive ATE approach rather than family = binomial(link = "logit")?

Many thanks,

Christel
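For readers following this thread, a small hypothetical Python sketch (not from the tutorial) of why the placement of the `digits` argument matters: with the linear terms inside `digits`, the confounder dependence is lost and `w3` collapses to two values, whereas rounding the full sum, as the proposed fix does, yields the intended multi-valued categorical variable.

```python
# Hypothetical Python sketch of the issue raised above (not from the tutorial):
# in the R line `round(runif(n, 0, 1), digits = 0 + 0.75*w2 + 0.8*w1)` the
# linear terms land inside the `digits` argument, so the confounder dependence
# is lost; the proposed fix rounds the full sum instead.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
w1 = np.round(rng.uniform(1, 5, n))            # categories 1..5
w2 = rng.binomial(1, 0.45, n)
u = rng.uniform(0, 1, n)

w3_bug = np.round(u)                           # roughly what the buggy line yields
w3_fix = np.round(u + 0.75 * w2 + 0.8 * w1)    # the intended categorical variable

print(sorted(set(w3_bug)))                     # only two values, 0.0 and 1.0
print(len(set(w3_fix)))                        # several categories
```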

Query: zEpid non-parametric vs parametric g-formula

Hi there,

Many thanks for a really interesting paper and tutorial!

This is more a Discussion query than an Issue. In the Python code boxes,

  • Box 6: Non-parametric g-formula using a fully saturated model with zEpid
  • Box 8: Parametric regression adjustment using zEpid

The zEpid model specifications in these boxes look identical, although the Box 6 header states it is for the non-parametric g-formula whereas the Box 8 header is for the parametric g-formula. Is this because zEpid applies the parametric g-formula either way?

The parametric g-formula seems to be the default application according to the docs, as I understand it. Apologies if I missed something.

Thanks again.
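One way to see the distinction the question raises, in a hypothetical Python sketch with simulated data (this is not zEpid's internals): a fully saturated outcome model, one including the treatment, the confounder and their interaction, reproduces the empirical stratum means exactly, so the g-formula computed from it coincides with non-parametric standardization; dropping the interaction would make it a genuinely parametric approximation.

```python
# Hypothetical sketch: a saturated outcome model (A, W and A*W) fits each
# (A, W) stratum mean exactly, so the g-formula from it equals non-parametric
# standardization. Simulated data; all names and coefficients are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 50_000
W = rng.binomial(1, 0.4, n)
A = rng.binomial(1, 0.2 + 0.4 * W)
Y = rng.binomial(1, 0.1 + 0.2 * A + 0.3 * W + 0.1 * A * W).astype(float)
df = pd.DataFrame({"W": W, "A": A, "Y": Y})

# Saturated linear model via least squares: intercept, A, W, A*W.
X = np.column_stack([np.ones(n), A, W, A * W])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
X1 = np.column_stack([np.ones(n), np.ones(n), W, W])             # set A = 1
X0 = np.column_stack([np.ones(n), np.zeros(n), W, np.zeros(n)])  # set A = 0
ate_sat = (X1 @ beta - X0 @ beta).mean()

# Non-parametric standardization over the empirical distribution of W.
pW = df["W"].value_counts(normalize=True)
m = df.groupby(["A", "W"])["Y"].mean()
ate_np = sum(pW[w] * (m[1, w] - m[0, w]) for w in pW.index)

print(abs(ate_sat - ate_np) < 1e-8)  # the two estimates coincide
```

With one binary confounder the saturated model has as many parameters as strata, which is why "parametric machinery on a saturated model" and "non-parametric g-formula" give the same answer.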

Marginal structural model in R code for box 20

The marginal structural model used with the non-stabilised weights appears to include covariates:
msm <- lm(Y ~ A + C + w1 + w2 + as.factor(w3) + as.factor(w4), data = data, weights = data$w) # MSM

Is there a reason for that?
I would have expected it to look exactly like the MSM with stabilised weights (just with different weights):
msm <- lm(Y ~ A, data = data, weights = sw)

The corresponding Stata code in the paper does not appear to include any covariates in either case.
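To illustrate the point in this thread with a hypothetical Python sketch on simulated data (not the paper's code): in an IPW marginal structural model the confounder enters only through the weights, and the weighted regression of Y on A alone recovers the marginal treatment effect.

```python
# Hypothetical sketch: IPW marginal structural model on simulated data.
# The confounder W enters only through the stabilised weights; the MSM
# itself regresses Y on A alone. All names and coefficients are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
W = rng.binomial(1, 0.4, n)                          # binary confounder
pA = 0.2 + 0.4 * W                                   # true propensity score
A = rng.binomial(1, pA)
Y = 0.1 + 0.2 * A + 0.3 * W + rng.normal(0, 0.1, n)  # true ATE = 0.2

# Stabilised weights: sw = P(A = a) / P(A = a | W)
pA_marg = A.mean()
num = np.where(A == 1, pA_marg, 1 - pA_marg)
den = np.where(A == 1, pA, 1 - pA)                   # known here; estimated in practice
sw = num / den

# Weighted least squares of Y on A only (the MSM); the slope estimates the ATE.
X = np.column_stack([np.ones(n), A])
XtWX = X.T @ (X * sw[:, None])
XtWy = X.T @ (sw * Y)
beta = np.linalg.solve(XtWX, XtWy)
print(round(beta[1], 2))                             # close to the true ATE of 0.2
```

Including the confounders again in the weighted outcome model, as in the Box 20 snippet quoted above, makes it a covariate-adjusted weighted regression rather than the usual MSM of Y on A alone.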
