Improving statistical power in severe malaria genetic association studies by augmenting phenotypic precision (eLife, 2021)
https://elifesciences.org/articles/69698
The underlying work is broken down into a few RMarkdown and R scripts:
- Data_Prep: assembles the analysis datasets into a single manageable set. This script can only be run with access to the original data files, which requires permission from the relevant data access committees (see details below)
- Mixture_modelling: this is the main script for the diagnostic model of severe malaria. It generates Figures 1 and 3 of the paper and can be run using the curated dataset provided. Note that the Bayesian fitting for the diagnostic model is slow, taking approximately 10 hours on a standard computer
- Direct_typed_Association_Study: association study using the directly typed SNPs. This can only be run with access to the directly typed polymorphism data
- Extra_analyses: additional analyses based on the estimated probabilities of severe malaria, using data that are not open access (positive blood cultures, haematocrits)
- Simulation_study_weightedLikelihood: this implements the simulations given in Appendix 12 of the paper, showing how the weighted likelihood works
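As a rough illustration of the weighted likelihood idea, the sketch below simulates a case-control study in which some "cases" are misdiagnosed, and compares an unweighted logistic regression with one that weights each case by its probability of truly having severe malaria. This is not the code from Simulation_study_weightedLikelihood: the sample sizes, the single hypothetical `biomarker`, the carrier frequency, and the odds ratio are all assumptions chosen for illustration, and `glm` with case weights stands in for whatever fitting procedure the script actually uses.

```r
set.seed(2021)

# Illustrative settings (assumptions, not values from the paper)
n_cases      <- 20000
n_controls   <- 20000
prior_sm     <- 0.7   # assumed fraction of cases that truly have severe malaria
carrier_freq <- 0.2   # protective-variant carrier frequency in the population
true_or      <- 0.5   # odds ratio for the variant in true severe malaria

# Latent true status of each diagnosed case
is_true_sm <- rbinom(n_cases, 1, prior_sm)

# Carrier frequency among true cases implied by the odds ratio
odds_controls <- carrier_freq / (1 - carrier_freq)
freq_true_sm  <- (true_or * odds_controls) / (1 + true_or * odds_controls)

# Misdiagnosed cases carry the variant at the population frequency
g_cases    <- rbinom(n_cases, 1, ifelse(is_true_sm == 1, freq_true_sm, carrier_freq))
g_controls <- rbinom(n_controls, 1, carrier_freq)

# Hypothetical biomarker: two Gaussian components with known parameters
biomarker <- rnorm(n_cases, mean = ifelse(is_true_sm == 1, 2, 0), sd = 1)

# Posterior probability of true severe malaria given the biomarker (Bayes' rule)
lik_sm  <- prior_sm * dnorm(biomarker, mean = 2, sd = 1)
lik_not <- (1 - prior_sm) * dnorm(biomarker, mean = 0, sd = 1)
prob_sm <- lik_sm / (lik_sm + lik_not)

sim <- data.frame(
  case    = c(rep(1, n_cases), rep(0, n_controls)),
  carrier = c(g_cases, g_controls),
  w       = c(prob_sm, rep(1, n_controls))  # controls keep weight 1
)

# Unweighted fit treats every diagnosed case as a true case
fit_unweighted <- glm(case ~ carrier, family = binomial, data = sim)

# Weighted fit: quasibinomial gives the same point estimates as binomial
# but avoids the warning about non-integer weights
fit_weighted <- glm(case ~ carrier, family = quasibinomial, data = sim, weights = w)

log_or <- c(
  unweighted = unname(coef(fit_unweighted)["carrier"]),
  weighted   = unname(coef(fit_weighted)["carrier"]),
  truth      = log(true_or)
)
print(round(log_or, 2))
```

Under these assumptions the unweighted log odds ratio is pulled towards zero by the misdiagnosed cases, while the weighted estimate should be less attenuated. The weights only down-weight doubtful cases rather than remove them, so some attenuation is expected to remain.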
For any questions, contact me: jwatowatson at gmail dot com
To run the models you will need a few widely used R packages plus two key model fitting packages:
- rstan (run the Bayesian models)
- mgcv (fit the generalised additive models using penalised splines)
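A minimal sketch for checking that the two model fitting packages are installed before running the scripts. The helper `find_missing_packages` is a hypothetical name introduced here, not a function in this repository; the two rstan settings are the ones rstan itself suggests at startup and are optional.

```r
# Hypothetical helper: return the names of packages that are not installed
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(c("rstan", "mgcv"))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("rstan and mgcv are both installed")
}

# Optional rstan settings: run chains in parallel and cache compiled models
if (requireNamespace("rstan", quietly = TRUE)) {
  options(mc.cores = parallel::detectCores())
  rstan::rstan_options(auto_write = TRUE)
}
```

Running the chains in parallel may shorten the long fitting time of the diagnostic model, depending on how many cores are available.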
A curated minimal clinical dataset is available alongside the code in this GitHub repository. It does not contain any genetic data apart from sickle genotyping. The GWAS reported in the paper used genome-wide genotyping data generated by the MalariaGEN consortium. These data are available on request from the European Genome-Phenome Archive (dataset accession ID: EGAD00010001742).
Requests for access to appropriately anonymized clinical data and directly typed genetic variants for the Kenyan severe malaria cohort can be made by application to the data access committee at the KEMRI–Wellcome Trust Research Programme by e-mail to mmunene at kemri hyphen wellcome dot org.
The FEAST trial datasets are available from the principal investigator on reasonable request (k dot maitland at imperial dot ac dot uk). Requests for access to appropriately anonymized clinical data from the AQ and AAV Vietnam study and the Asian paediatric cohort can be made via the Mahidol Oxford Tropical Medicine Research Unit data access committee by emailing me (jwatowatson at gmail dot com).