
Tutorial_Computational_Causal_Inference_Estimators: Introduction

Educational notes: Introduction to computational causal inference using reproducible Stata, R and Python code

Authors

Matthew J. Smith (1) | Mohammad Ali Mansournia (2) | Camille Maringe (1) | Paul N. Zivich (3,4) | Stephen R. Cole (3) | Clemence Leyrat (1) | Aurelien Belot (1) | Bernard Rachet (1) | Miguel Angel Luque-Fernandez (*1,5,6) |

Affiliations

  1. Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, U.K.
  2. Department of Epidemiology and Biostatistics, Tehran University of Medical Sciences, Tehran, Iran.
  3. Department of Epidemiology, University of North Carolina at Chapel Hill, North Carolina, U.S.
  4. Carolina Population Center, University of North Carolina at Chapel Hill, North Carolina, U.S.
  5. Non-communicable Disease and Cancer Epidemiology Group, Instituto de Investigación Biosanitaria de Granada (ibs.GRANADA), Andalusian School of Public Health, University of Granada, Granada, Spain.
  6. Biomedical Network Research Centers of Epidemiology and Public Health (CIBERESP), Madrid, Spain.

*Correspondence: Miguel Angel Luque-Fernandez. Email: [email protected]

This repository makes the data and code used in the manuscript available to the scientific community.

Link to the published article

ABSTRACT

In research studies it can be unethical to assign a treatment to individuals in a randomised controlled trial; instead, observational data and an appropriate study design must be used. The purpose of many observational health studies is to estimate the causal effect of a treatment on an outcome. However, observational studies pose major challenges, one of which is confounding, which can bias estimates of the causal effect. Confounding is commonly controlled for by simple adjustment for measured confounders, although this approach is sometimes suboptimal. Recent advances in the field of causal inference have addressed confounding by building on classical standardisation methods. However, these advances have progressed quickly, with a relative paucity of computation-oriented applied tutorials, contributing to some confusion about the use of these methods among applied researchers. In this tutorial, we show the computational implementation of different causal inference estimators from a historical perspective, in which new estimators were developed to overcome the limitations of the previous ones (i.e., the non-parametric and parametric g-formula, inverse probability weighting, double-robust, and data-adaptive estimators). Furthermore, we illustrate the use of the different methods with an empirical example from the Connors study in intensive care medicine and, most importantly, we provide reproducible and commented code in Stata, R and Python for researchers to adapt to their own observational studies. The code can be accessed at

https://github.com/migariane/Tutorial_Computational_Causal_Inference_Estimators

KEYWORDS: Causal Inference; Regression adjustment; G-methods; G-formula; Propensity score; Inverse probability weighting; Double-robust methods; Machine learning; Targeted maximum likelihood estimation; Epidemiology; Statistics; Tutorial
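As a concrete flavour of the estimators listed above, here is a minimal, hypothetical Python sketch of the simplest of them, the non-parametric g-formula (standardization), on simulated data with a single binary confounder. This is not the Connors data and not the tutorial's own code; all variable names and coefficients are illustrative assumptions.

```python
# Hypothetical sketch: non-parametric g-formula (standardization) on simulated
# data with one binary confounder W. Not the Connors data; all names and
# coefficients are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
W = rng.binomial(1, 0.4, n)                    # confounder
A = rng.binomial(1, 0.2 + 0.4 * W)             # treatment depends on W
Y = rng.binomial(1, 0.1 + 0.2 * A + 0.3 * W)   # outcome; true ATE = 0.2
df = pd.DataFrame({"W": W, "A": A, "Y": Y})

# Standardize the stratum-specific means E[Y | A = a, W = w] over the
# marginal distribution of W: the non-parametric g-formula for the ATE.
pW = df["W"].value_counts(normalize=True)
means = df.groupby(["A", "W"])["Y"].mean()
ate = sum(pW[w] * (means[1, w] - means[0, w]) for w in pW.index)
print(round(ate, 3))                           # close to the true ATE of 0.2
```

The same logic underlies the Stata, R and Python boxes in the tutorial; the later estimators (IPW, double-robust, data-adaptive) refine how the conditional means and treatment probabilities are modelled.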

Contributors

mattyjsmith, migariane, pzivich


Issues

Possible error in the R code in Box 27?

Hi there,

Is it possible that there is an error in the R code in the simulation part? In the generateData function in Box 27, we currently have:

 generateData <- function(n){
      w1 <- round(runif(n, min=1, max=5), digits=0) 
      w2 <- rbinom(n, size=1, prob=0.45)
      w3 <- round(runif(n, min=0, max=1), digits=0 + 0.75*w2 + 0.8*w1)
      w4 <- round(runif(n, min=0, max=1), digits=0 + 0.75*w2 + 0.2*w1)
      A  <- rbinom(n, size=1, prob= plogis(-1 -  0.15*w4 + 1.5*w2 + 0.75*w3 + 0.25*w1 + 0.8*w2*w4))
      # Counterfactuals
      Y.1 <- rbinom(n, size=1, prob = plogis(-3 + 1 + 0.25*w4 + 0.75*w3 + 0.8*w2*w4 + 0.05*w1))
      Y.0 <- rbinom(n, size=1, prob = plogis(-3 + 0 + 0.25*w4 + 0.75*w3 + 0.8*w2*w4 + 0.05*w1)) 
      # Observed outcome
      Y <- Y.1*A + Y.0*(1 - A)
      # return data.frame
      data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
      }

As w3 and w4 are meant to be categorical variables, shouldn't it be this instead:

     w3 <- round(runif(n, min=0, max=1) + 0.75*w2 + 0.8*w1, digits=0)
     w4 <- round(runif(n, min=0, max=1) + 0.75*w2 + 0.2*w1, digits=0)

Also, still in Box 27, what is the reason for choosing family = poisson(link="log") in the naive ATE approach rather than family = binomial(link = "logit")?

Many thanks,

Christel
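For readers following this thread, a small hypothetical Python sketch (not from the tutorial) of why the placement of the `digits` argument matters: with the linear terms inside `digits`, the confounder dependence is lost and `w3` collapses to two values, whereas rounding the full sum, as the proposed fix does, yields the intended multi-valued categorical variable.

```python
# Hypothetical Python sketch of the issue raised above (not from the tutorial):
# in the R line `round(runif(n, 0, 1), digits = 0 + 0.75*w2 + 0.8*w1)` the
# linear terms land inside the `digits` argument, so the confounder dependence
# is lost; the proposed fix rounds the full sum instead.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
w1 = np.round(rng.uniform(1, 5, n))            # categories 1..5
w2 = rng.binomial(1, 0.45, n)
u = rng.uniform(0, 1, n)

w3_bug = np.round(u)                           # roughly what the buggy line yields
w3_fix = np.round(u + 0.75 * w2 + 0.8 * w1)    # the intended categorical variable

print(sorted(set(w3_bug)))                     # only two values, 0.0 and 1.0
print(len(set(w3_fix)))                        # several categories
```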

Query: zEpid non-parametric vs parametric g-formula

Hi there,

Many thanks for a really interesting paper and tutorial!

This is more a Discussion query than an Issue. In the Python code boxes,

  • Box 6: Non-parametric g-formula using a fully saturated model with zEpid
  • Box 8: Parametric regression adjustment using zEpid

The zEpid model specifications in these boxes look identical, although the Box 6 header states it is for the non-parametric g-formula whereas the Box 8 header is for the parametric g-formula. Is this because zEpid applies the parametric g-formula either way?

The parametric g-formula seems to be the default application according to the docs, as I understand it. Apologies if I missed something.

Thanks again.
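One way to see the distinction the question raises, in a hypothetical Python sketch with simulated data (this is not zEpid's internals): a fully saturated outcome model, one including the treatment, the confounder and their interaction, reproduces the empirical stratum means exactly, so the g-formula computed from it coincides with non-parametric standardization; dropping the interaction would make it a genuinely parametric approximation.

```python
# Hypothetical sketch: a saturated outcome model (A, W and A*W) fits each
# (A, W) stratum mean exactly, so the g-formula from it equals non-parametric
# standardization. Simulated data; all names and coefficients are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 50_000
W = rng.binomial(1, 0.4, n)
A = rng.binomial(1, 0.2 + 0.4 * W)
Y = rng.binomial(1, 0.1 + 0.2 * A + 0.3 * W + 0.1 * A * W).astype(float)
df = pd.DataFrame({"W": W, "A": A, "Y": Y})

# Saturated linear model via least squares: intercept, A, W, A*W.
X = np.column_stack([np.ones(n), A, W, A * W])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
X1 = np.column_stack([np.ones(n), np.ones(n), W, W])             # set A = 1
X0 = np.column_stack([np.ones(n), np.zeros(n), W, np.zeros(n)])  # set A = 0
ate_sat = (X1 @ beta - X0 @ beta).mean()

# Non-parametric standardization over the empirical distribution of W.
pW = df["W"].value_counts(normalize=True)
m = df.groupby(["A", "W"])["Y"].mean()
ate_np = sum(pW[w] * (m[1, w] - m[0, w]) for w in pW.index)

print(abs(ate_sat - ate_np) < 1e-8)  # the two estimates coincide
```

With one binary confounder the saturated model has as many parameters as strata, which is why "parametric machinery on a saturated model" and "non-parametric g-formula" give the same answer.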

Marginal structural model in R code for box 20

The marginal structural model used with the non-stabilised weights appears to include covariates:
msm <- lm(Y ~ A + C + w1 + w2 + as.factor(w3) + as.factor(w4), data = data, weights = data$w) # MSM

Is there a reason for that?
I would have expected it to look exactly like the MSM with stabilised weights (just with different weights):
msm <- lm(Y ~ A, data = data, weights = sw)

The corresponding Stata code in the paper does not appear to include any covariates in either case.
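To illustrate the point in this thread with a hypothetical Python sketch on simulated data (not the paper's code): in an IPW marginal structural model the confounder enters only through the weights, and the weighted regression of Y on A alone recovers the marginal treatment effect.

```python
# Hypothetical sketch: IPW marginal structural model on simulated data.
# The confounder W enters only through the stabilised weights; the MSM
# itself regresses Y on A alone. All names and coefficients are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
W = rng.binomial(1, 0.4, n)                          # binary confounder
pA = 0.2 + 0.4 * W                                   # true propensity score
A = rng.binomial(1, pA)
Y = 0.1 + 0.2 * A + 0.3 * W + rng.normal(0, 0.1, n)  # true ATE = 0.2

# Stabilised weights: sw = P(A = a) / P(A = a | W)
pA_marg = A.mean()
num = np.where(A == 1, pA_marg, 1 - pA_marg)
den = np.where(A == 1, pA, 1 - pA)                   # known here; estimated in practice
sw = num / den

# Weighted least squares of Y on A only (the MSM); the slope estimates the ATE.
X = np.column_stack([np.ones(n), A])
XtWX = X.T @ (X * sw[:, None])
XtWy = X.T @ (sw * Y)
beta = np.linalg.solve(XtWX, XtWy)
print(round(beta[1], 2))                             # close to the true ATE of 0.2
```

Including the confounders again in the weighted outcome model, as in the Box 20 snippet quoted above, makes it a covariate-adjusted weighted regression rather than the usual MSM of Y on A alone.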
