covpn / correlates_reporting_usgcove_archive

Reproducible reporting workflows for the immune correlates statistical analyses of the Moderna and Janssen COVID-19 vaccine efficacy trials by the USG/COVE Response Biostatistics Team [Archived 16 October 2021]

License: GNU General Public License v3.0

Makefile 0.41% Shell 0.16% R 23.43% TeX 2.03% HTML 73.97%

r data-science immune-correlates covid-19 causal-inference machine-learning statistics exploratory-data-analysis vaccine-efficacy

correlates_reporting_usgcove_archive's Introduction

CoVPN/USG Correlates Analysis Reporting Archive

Note: As of 16 October 2021, this repository has been archived. Generalized versions of its data processing and analysis modules have been split in two and migrated to a correlates data processing workflow and a correlates of risk and protection analysis workflow. This archive serves as a reference of the workflows used in developing reports for the immune correlates analyses of the Moderna and Janssen (ENSEMBLE) COVID-19 vaccine efficacy trials.

The Statistical Analysis Plan is available

Collaboration Guide

Please consult this blog post, which outlines most aspects of our project organization recommendations.

Code style guide, with some modifications; this will largely be enforced with styler.
Project organization: mostly independent subdirectories, each incorporating here for path resolution.
Package version control and virtual environments using renv.
Code review procedure: see our contribution guidelines.

Citation

When citing the analysis workflow or analytic results produced by its use, please use the following BibTeX entry or equivalent:

    @software{gilbert2021usgcove,
      author = {Gilbert, Peter B and Fong, Youyi and Benkeser, David and
        Hejazi, Nima S and Hughes, Ellis and Borate, Bhavesh and Yu,
        Chenchen and Lu, Yiwen and Li, Kendrick Q and {van der Laan}, Lars
        WP and Simpkins, Brian},
      title = {{COVID-19 Prevention Network Immune Correlates Analyses}},
      year  = {2021},
      doi = {10.5281/zenodo.5593129},
      url = {https://github.com/CoVPN/correlates_reporting_usgcove_archive}
    }

License

The contents of this repository are distributed under the GPL-3 license. See file LICENSE.md for details.

correlates_reporting_usgcove_archive's People

Contributors

dilu963 cyu-hvtn yinghuang124 kenli93 bdwilliamson sijia95 atombaby yiwenlu brborate youyifong briansimpkins larsvanderlaan scarlett422301 wxyrc5 hqraiwxy

correlates_reporting_usgcove_archive's Issues

ongoing travis build issues

Current build fails in a couple places:

Can't find the threeparttablex LaTeX package, and the build is apparently unable to download it on the fly.

tlmgr: package repository http://ftp.math.purdue.edu/mirrors/ctan.org/systems/texlive/tlnet (not verified: unknown)
[1/1, ??:??/??:??] install: threeparttablex [3k]
TLPDB::_install_package: downloading did not succeed
tlmgr: package log updated: /home/travis/texlive/texmf-var/web2c/tlmgr.log
tlmgr update --self
No connection to the internet.
Unable to download the checksum of the remote TeX Live database,
but found a local copy so using that.
You may want to try specifying an explicit or different CTAN mirror;
see the information and examples for the -repository option at
http://tug.org/texlive/doc/install-tl.html
(or in the output of install-tl --help).
tlmgr: package repository http://ftp.math.purdue.edu/mirrors/ctan.org/systems/texlive/tlnet (not verified: unknown)
[1/1, ??:??/??:??] install: threeparttablex [3k]
TLPDB::_install_package: downloading did not succeed
tlmgr: package log updated: /home/travis/texlive/texmf-var/web2c/tlmgr.log
! LaTeX Error: File `threeparttablex.sty' not found.

Also getting an error about xtable, which I thought was already in the lockfile.

Error in loadNamespace(name) : there is no package called ‘xtable’
Calls: mytex ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
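The tlmgr log above suggests trying an explicit CTAN mirror. A minimal sketch of that suggestion follows; the function name install_tex_packages and the mirror URL in the example call are assumptions, and whether the mirror (rather than network access on the build machine) was the actual cause is not established in this thread.

```shell
# Hypothetical helper: point tlmgr at an explicit repository, then install
# the named packages. Both subcommands are standard tlmgr actions.
install_tex_packages() {
  tl_repo="$1"
  shift
  tlmgr option repository "$tl_repo" && tlmgr install "$@"
}

# Example call (commented out so that sourcing this file has no side effects;
# the mirror URL is an assumption, not taken from the build configuration):
# install_tex_packages https://mirror.ctan.org/systems/texlive/tlnet threeparttablex
```

Setting the repository once with `tlmgr option repository` makes later installs in the same build use the same mirror, so on-the-fly downloads would not fall back to the failing one.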

mock data documentation

Could we add descriptions of the following to data_clean/README.md?

wt.2
NAs in wt and wt.2, and how to interpret them

horizontal tables in immuno report

When the immuno report is knitted to a pdf, some tables run off the page.

Maybe this solution could help?

more transparency in spaghetti plots

horizontal table in cor report

@youyifong what do you think about turning this figure sideways?

Move tables in front of figures in immuno report

issues with RCDF plots

NAs present
text in legend cut off
Also, what do o: and e: mean in the legend?

threshold CI questions

Why, when the risk dips to 0, does the CI go from 0 to 1? I also wonder whether it is possible to render the superscripts in math font?

Originally posted by @benkeser in #85 (comment)

cor report with only subset of assays

I am just wrapping up #194, which allows the immuno report to build for any subset of assays and days (29/57).

It seems that our first analyses may not include all four assays and possibly only Day 57 markers, so I'd like to get our code to a place where it can run without requiring this.

@yiwenlu cor_graphical currently breaks (see line 6008 here).

I haven't run through @youyifong's cor_coxph or @Larsvanderlaan's cor_threshold. If there is time this week, it would be good to have PRs submitted as needed for all of these directories.

warnings in `cor_coxph`

Just wanted to check that the warning generated in cor_coxph is nothing to worry about.

Warning messages:
1: In fitter(X, Y, istrat, offset, init, control, weights = weights,  :
  Loglik converged before variable  2 ; coefficient may be infinite. 
2: In fitter(X, Y, istrat, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2 ; coefficient may be infinite. 
3: In fitter(X, Y, istrat, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge

Add study name to report titles

tagging compiled reports with a build ID

@nhejazi @thebioengineer
It occurs to me that we may want to modify the deploy script for travis builds. Right now, it just clears out the entire gh-pages branch and puts the compiled reports there and pushes back.

Maybe better would be to have a folder for master builds and a folder for PR builds that includes reports where the name includes the PR number or the travis build ID or something. That way, submitters could track down their actual report built from their PR without the hassle of going through commits.

What do you think? Worth the few lines of bash in the travis file?
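A rough sketch of the "few lines of bash" this would take, assuming the deploy step has a local checkout of gh-pages. The function names and the folder layout (master/ and pr/<number>-build<number>/) are hypothetical; TRAVIS_PULL_REQUEST and TRAVIS_BUILD_NUMBER are Travis CI's default environment variables, with TRAVIS_PULL_REQUEST set to "false" for non-PR builds.

```shell
# Hypothetical helper: choose a gh-pages subfolder for a build's reports.
# $1 mirrors TRAVIS_PULL_REQUEST ("false" or the PR number),
# $2 mirrors TRAVIS_BUILD_NUMBER.
report_dest() {
  if [ "$1" = "false" ]; then
    echo "master"
  else
    echo "pr/$1-build$2"
  fi
}

# Copy compiled reports into the chosen folder of a gh-pages checkout,
# replacing only that folder instead of clearing the whole branch.
# Usage: stage_reports <report_dir> <gh_pages_dir> <pull_request> <build_number>
stage_reports() {
  sr_dest="$2/$(report_dest "$3" "$4")"
  rm -rf "$sr_dest"
  mkdir -p "$sr_dest"
  cp -R "$1"/. "$sr_dest"/
}
```

In a Travis deploy script this might be called as `stage_reports _report gh-pages "$TRAVIS_PULL_REQUEST" "$TRAVIS_BUILD_NUMBER"`, where both paths are placeholders. Reports from master builds are overwritten in place, while each PR build gets its own folder that a submitter can find by PR number.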

`immuno_graphical` requests

@KenLi93 (cc @nhejazi)

I took a hack through your code to try to get a bit more modularity and fixed a few bugs that came up along the way. This has now been merged into master.

A few FYI points:

I added a code/params.R file where some of the common parameters for the script are saved.
I have removed the compiled figures from the repo. Ideally we will just be storing the source code on GitHub. Output files can be saved locally, but please add to .gitignore so they are not pushed to GitHub for the time being.
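One way to keep compiled output out of the repository, sketched under the assumption that figures are written to a figs/ folder. The helper name ignore_pattern and the example patterns are hypothetical, not taken from the repository's actual .gitignore.

```shell
# Hypothetical helper: append a pattern to a .gitignore file only if that
# exact line is not already present.
# Usage: ignore_pattern <gitignore_file> <pattern>
ignore_pattern() {
  gi_file="$1"
  gi_pattern="$2"
  touch "$gi_file"
  grep -qxF -e "$gi_pattern" "$gi_file" || printf '%s\n' "$gi_pattern" >> "$gi_file"
}

# Example calls (commented out so that sourcing this file changes nothing):
# ignore_pattern .gitignore 'figs/*'          # ignore everything inside figs/
# ignore_pattern .gitignore '!figs/.gitkeep'  # but keep the placeholder file
```

The negated pattern keeps a .gitkeep placeholder tracked, so the otherwise empty output folder still exists after a fresh clone.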

I have a few additional requests for you:

Please look through Makefile and confirm you can successfully execute make twophase_plots and make demo_plots to confirm that my re-coding worked correctly.
code/descriptive_graphics_barcharts.R doesn't seem to be called anywhere that I could find. Either remove this file or add a make command.
For all of your functions (e.g., in code/covid_corr_plot_functions.R), please add roxygen2 style documentation.
Please update README.md to add information about the contents of your directory, the purpose of the code, etc... Anything you think someone would need to know to execute your code properly.

parameters in `_common.R` vs. `code/params.R`

@KenLi93 @cyu-hvtn @yiwenlu @youyifong @nhejazi @thebioengineer

Currently, there is redundancy in the definition of certain variables across _common.R, which is sourced by all scripts, and subdirectory-specific files (e.g., cor_coxph/code/params.R).

To make the code base more robust, we would like to remove this redundancy. The idea is to have _common.R define parameters that span multiple analyses and reserve code/params.R for parameters specific to a subdirectory-level analysis.

For example, what should be in _common.R?

names of assays
labels for variables
LLOQ/LLOD (these are all over the place right now across the scripts)
times of the analysis (D0 vs. 29 vs. 57)
etc...

Basically all the things that were previously coming from the COVIDcorr package.

What should be in params.R? Anything specific to your analysis that may change across company runs; for example,

number of bootstrap replications
limits of axes for plots
table options
etc...

If something is defined in _common.R, it should not be re-defined in your params.R script. This will save us headache later.

Could you please ensure that you are able to build your subdirectory-specific analysis after modifying any params.R file as necessary to remove redundancies with _common.R?

parameters defined in immuno_tabular

@cyu-hvtn

The variable labels.assays.short is defined both in _common.R, as well as in immuno_tabular/code/make_parameters.R.

Is this redundancy needed? Could we stick to the version used in _common.R? Or does it need to be re-defined here for some reason?

problems with `glmnet` learner in `cor_thresh`

The current build fails in cor_thresh running Run_Threshold_analysis.R

[1] "Day29bindSpike"
[1] "Running analysis for threshold: 3.221"
Failed on Lrnr_glmnet_NULL_deviance_10_1_100_TRUE
Error in if (!all(o)) { : missing value where TRUE/FALSE needed
> traceback()
10: stop(first_error$value)
9: self$compute_step()
8: scheduler$compute()
7: delayed_fit$compute(job_type = sl3_delayed_job_type(), progress = verbose)
6: lrnr_Delta$train(tasks_Delta$train_u) at tmleThresh.R#290
5: get_preds_TSM(task_list, lrnr_A, lrnr_Y, lrnr_Delta) at tmleThresh.R#45
4: thresholdTMLE(data_full, node_list, thresholds = thresholds, 
       biased_sampling_strata = "grp", biased_sampling_indicator = "TwophasesampInd", 
       lrnr_A = lrnr, lrnr_Y = lrnr, lrnr_Delta = Lrnr_glmnet$new()) at #12
3: withCallingHandlers(expr, warning = function(w) if (inherits(w, 
       classes)) tryInvokeRestart("muffleWarning"))
2: suppressWarnings(thresholdTMLE(data_full, node_list, thresholds = thresholds, 
       biased_sampling_strata = "grp", biased_sampling_indicator = "TwophasesampInd", 
       lrnr_A = lrnr, lrnr_Y = lrnr, lrnr_Delta = Lrnr_glmnet$new())) at #12
1: run_threshold_analysis(marker)

Originally posted by @benkeser in #96 (comment)

caption + figure placement

In new report builds, the formatting can get a bit gory.

There are very small vertical margins.
Figure captions can be separated from the figures by a page break.

This is a relatively minor issue, but maybe it's a quick fix. Or was this something @nhejazi had to do in order to get the section headings to go to the right spot?

Broken Cor coxph

@youyifong The changes you introduced in fd02b6b to revert to old, broken code are causing cor report builds to silently break.

Because this slipped through so long ago, there may have been additional bugs introduced after this commit that we haven't discovered yet. Please replace report code with the working code that I had inserted and re-submit a PR ASAP.

I also see that report.Rmd hard codes in mock as the study name. Please remove that code in favor of reading study_name from _common.R.

Flag for adding watermark

If a report is built with mock data, add a watermark that says Mock on every page.

Random Selection Reproducibility

I wanted to open an issue to document how we wanted to approach reproducibility with respect to random selections.
Normally I approach this by setting a seed just before the random selection, but I wanted to solicit thoughts and feedback.

This will be important for the verification of plots where we do random subsetting, possibly other scenarios as well.

@benkeser @nhejazi

>= rendering in forest plots

Reviewing _common.R, I see the variables Bstratum.labels and demo.stratum.labels defined.

I do not see these variables referenced anywhere except for @youyifong 's forest plots. Could we wrap these in a call to expression() so that they render as the mathematical symbol in the plots rather than >=?

Rplots.pdf in cor_graphical

When I run make all in cor_graphical, a stray plot called Rplots.pdf appears in the cor_graphical directory. @yiwenlu @KenLi93 could you figure out what code is producing this plot and fix it?

Low priority issue.

Resolving build errors in immuno report

I think Chenchen is better suited to lead the response to this one. It is
probably just some light touching-up that needs to be done in the immuno
report code in the merged PR.

Originally posted by @youyifong in #169 (comment)

page numbers off in TOC

???

Is this a case of needing to run pdflatex multiple times?

Googling did not turn up much...
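If repeated passes are indeed the cause (not confirmed in this thread), the usual remedy is to rerun LaTeX until the table-of-contents file stops changing. A minimal sketch of that loop follows; the function name compile_until_stable and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: rerun a LaTeX command until the .toc file stops
# changing between passes, or until a pass limit is reached.
# Usage: compile_until_stable <max_passes> <toc_file> <command> [args...]
compile_until_stable() {
  cs_max="$1"
  cs_toc="$2"
  shift 2
  cs_pass=0
  cs_prev=""
  while [ "$cs_pass" -lt "$cs_max" ]; do
    "$@" || return 1
    cs_pass=$((cs_pass + 1))
    # Fingerprint the table of contents written by this pass.
    if [ -f "$cs_toc" ]; then
      cs_now=$(cksum < "$cs_toc")
    else
      cs_now="missing"
    fi
    # Stop once two consecutive passes produce the same .toc file.
    if [ "$cs_now" = "$cs_prev" ]; then
      break
    fi
    cs_prev="$cs_now"
  done
  echo "passes: $cs_pass"
}
```

A call might look like `compile_until_stable 5 report.toc pdflatex -interaction=nonstopmode report.tex`. The latexmk tool automates the same idea (`latexmk -pdf report.tex`), which may be simpler if it is available on the build machine.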

subsections in `cor_coxph`

@youyifong , could you add a few subsection heads in your report.Rmd to help the TOC render more informatively?

`cor_threshold` subdirectory

@youyifong , we set a placeholder for threshold finding methods as part of the CoR report, but did I see that perhaps some of those methods are applied in cor_nonlinear analysis? If so, we'll just delete the cor_threshold directory.

`ylim` plotting error in `cor_coxph`

The automated Travis builds are running into an error due to non-finite ylim values in a call to plot(), in cor_coxph (https://travis-ci.com/github/CoVPN/correlates_reporting/builds/218921367#L2899). With a fresh clone from the travis_fix branch, are you able to replicate this @youyifong?

NA's in violin plots

@yiwenlu @KenLi93

I'm seeing an NA category generated now in the violin plots in cor_graphical. I think it's probably down to the missing values for wt.

Long build times on travis

Travis seems to enforce a 10 min limit. Is that new?

That's always been the case. We've had, at times, to request that code print messages providing status updates so that the builds continue.

I think we need to go back to the travis build logs and git diffs to see what changes may have caused this.

Originally posted by @benkeser in #158 (comment)
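The status-message workaround mentioned above can be sketched as a wrapper that prints a heartbeat while a long command runs. The function name with_heartbeat and the interval are assumptions; Travis CI also documents a travis_wait helper for the same purpose.

```shell
# Hypothetical wrapper: run a long command while printing a heartbeat line at
# a fixed interval, so that the CI log keeps receiving output.
# Usage: with_heartbeat <interval_seconds> <command> [args...]
with_heartbeat() {
  hb_interval="$1"
  shift
  # Background loop; sleep is detached from the log so that it cannot hold
  # the output stream open after the loop is stopped.
  (
    while :; do
      sleep "$hb_interval" </dev/null >/dev/null 2>&1
      echo "[heartbeat] still running: $*"
    done
  ) &
  hb_pid=$!
  # Run the real command and remember its exit status.
  hb_status=0
  "$@" || hb_status=$?
  # Stop the heartbeat and pass the command's status through.
  kill "$hb_pid" 2>/dev/null || true
  wait "$hb_pid" 2>/dev/null || true
  return "$hb_status"
}
```

For example, `with_heartbeat 300 make all` would print a line every five minutes while the build runs. The interval should sit comfortably below the CI's no-output limit.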

Axis size in `cor_threshold`

Minor detail, but the font size on the axes is a bit large.

Originally posted by @benkeser in #85 (comment)

build branch

@nhejazi @thebioengineer
Brainstorm here -- and it's late in the day, so this might not make sense -- what if we instituted a build branch of the repo? The idea is that only Travis would push to this branch after successful builds, but this branch would include derived (binary) output files.

This may be convenient for contributors for pulling derived objects from other folders to confirm that the report builds when they've updated their code.

Example: I want to update cop_mediation, but I don't want to have to re-build all the other cop analyses from scratch. So I populate those folders' output/fig directories from the build branch, so that I can quickly compile the report with my own cop results, to confirm that the report builds.

Maybe this is overkill, as I could just specify my own report build command that checks just my report code.

I don't know: any benefit to this? Or am I under-caffeinated?

automated code checks via precommit

In order to reduce the potential of introducing erroneous code and to enforce uniformity in code style, we should adopt the pre-commit framework. By adding a simple configuration file and installing the pre-commit package, we can run automatic checks that ensure R code is valid, correctly styled (via the styler package), and more — all prior to commits being made. More at https://lorenzwalthert.github.io/precommit/articles/available-hooks.html. Worth getting this set up @benkeser @thebioengineer? If so, I can take an initial pass at it.

`renv` creep

I suspect that we have considerable creep of the renv libraries needed for this project. It might be a good idea at some point to provision a fresh set of renv libraries to reduce the burden of activating renv.

Empty subfolders

@cyu-hvtn @Larsvanderlaan @youyifong @yiwenlu @KenLi93

Just a note that if you save output to a subfolder of output or figs or data_clean, you need to manually upload that folder to GitHub. Do this by placing an empty file named .gitkeep in the empty directory and committing the result.

For example, say my code in cop_mediation generates a figure that I want to save in cop_mediation/figs/mediation_graphics. To submit this code for a PR, I also need to manually create this folder and drop a .gitkeep file in it.

    # create the folder (-p also creates any missing parent directories)
    mkdir -p cop_mediation/figs/mediation_graphics
    # add an empty file so that git tracks the directory
    touch cop_mediation/figs/mediation_graphics/.gitkeep
    # commit the result
    git add cop_mediation/figs/mediation_graphics
    git commit -m "add mediation_graphics folder"

@nhejazi or @thebioengineer , could we drop this somewhere in contributing guidelines? It's happened numerous times on PRs.
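Since this keeps happening on PRs, the manual steps above could also be automated. The sketch below assumes a find that supports -empty (GNU and BSD both do); the function name add_gitkeep is hypothetical.

```shell
# Hypothetical helper: drop a .gitkeep file into every empty directory under
# the given folders so that git will track them. Missing folders are skipped.
# Usage: add_gitkeep <dir> [<dir>...]
add_gitkeep() {
  for gk_root in "$@"; do
    [ -d "$gk_root" ] || continue
    find "$gk_root" -type d -empty -exec sh -c 'touch "$1/.gitkeep"' _ {} \;
  done
}
```

Running `add_gitkeep cop_mediation/figs cop_mediation/output` before `git add` would cover any newly created empty subfolders in one step. Folders that already contain files are left untouched.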

boxplots in baseline positive

@yiwenlu @KenLi93

Currently, plots with label boxplots_Delta57overB_trt_vaccine_x_cc_BaselinePos_ are not being generated as part of the Makefile, but they are included in report.Rmd. Should these plots be included in the report? If so, how can the code be modified to generate these plots?

NA in plot

@KenLi93 @yiwenlu

This figure in the immuno report needs some tweaks. What's up with the NA? And the legend is cut off.

overwriting reports

It seems that currently, if an immuno report is built prior to a cor report, the cor report will overwrite the file for immuno. Perhaps this is due to how bookdown handles placing the output (re-writing the contents of _book every time).

legend too big in RCDF plot

@KenLi93 what do you think about moving the confidence interval information to the caption of the plot for these figures so there is more room for the figure itself?

license

We should probably pick a license to release under.

COVIDcorr location in lockfile

The location in the lockfile points to the old installation as opposed to the CoVPN directory.

warning in base_riskscore

In a recent build, I am seeing the following warning (line 1435 of build log) when code/tables_figures.R is executed:

Warning message:
Problem with `mutate()` input `Learner`.
ℹ Unknown levels in `f`: SL.ranger
ℹ Input `Learner` is `fct_relevel(...)`.

@brborate can you confirm whether this warning is OK to ignore? The report still builds, but not clear if it's as expected or not. For verification purposes, I can't look into the issue further myself.

scatter plots with smoothers

I find it difficult to read these plots. The gray lines are hard to line up with the colored figures. And the points seem overly large. @peterbriangilbert @youyifong @nhejazi @yiwenlu @KenLi93 . Any ideas?

scatter_pnAb_id50_Vaccine_BaselineNeg_Day29_mock.pdf

Immunogenicity graphics build issues

Hi @KenLi93 (cc @benkeser) ---
As part of assembling the full immunogenicity report, we need to run make book_immuno from the top level of this repository. After merging the latest PRs and running make all in the immuno_graphics directory, the immuno_graphics/report.Rmd appears to contain references to some graphics that are not produced as part of your make recipes. I've manually added the RMarkdown chunk option eval=FALSE to all such code chunks in your report.Rmd; please take a look and implement any necessary fixes (I'm not sure why these files don't exist: they could be outdated and no longer necessary, or it could be that they're named incorrectly in the process intended to generate them). Once you've identified and fixed the underlying issues, please submit a new PR with the corrections.

When working through this, please make sure to pull the latest from this master branch as a starting point, check out a new branch (e.g., git checkout -b immuno_graphics_fixes), implement your changes, and then generate a PR to address this issue. To check that the full book builds, please try running make book_immuno from the top level of this repository.

removing imagemagick dependencies

@nhejazi @Larsvanderlaan

Getting the threshold report to build on travis was proving to be quite a pain; around a46d530, I started noticing that the table files rendered to pdf were not being generated. It seems to be an issue with needing to configure kableExtra. This all just seemed like too much, so I moved away from that structure in 13ded38 to have the kable tables generated as latex output directly in sub_report.Rmd. It seems to be building on travis now, but waiting for the full build to confirm.

ALL THAT SAID: can we now remove the imagemagick libraries and dependencies? @Larsvanderlaan is this used anywhere else in your code? It'd be nice to get rid of it to speed up automatic builds.

Immunogenicity tables build issues

Hi @cyu-hvtn (cc @benkeser) ---
As I'm building out the full report (using make book_immuno from the top level of this repo), I'm running into an error in immuno_tabular/report.Rmd. Specifically, I'm getting the failure

Error: Can't subset columns that don't exist.
✖ Column `responder_cat` doesn't exist.

which points to (at least as a first instance) https://github.com/CoVPN/correlates_reporting/blob/master/immuno_tabular/report.Rmd#L172. Would you mind looking into this error, and fixing this and any subsequent errors in the building of your report?

When working through this, please make sure to pull the latest from this master branch as a starting point, check out a new branch (e.g., git checkout -b immuno_tabular_fixes), implement your changes, and then generate a PR to address this issue. To check that the full book builds, please try running make book_immuno from the top level of this repository.

RCDF plots

@KenLi93 @peterbriangilbert
In the reverse CDF plots like the one below, it looks like there is some linear interpolation that occurs below the LLOQ. Is that desired? Or should it be more like KM, where the curve would stay at 1 until it hits the LLOQ and then step down to 1 minus the fraction at the LLOQ?

error building immuno report with current master

I'm getting a build error on the immuno_report that I wasn't seeing previously. Something is happening in code/descriptive_graphics_two_phase_plots.R, in particular in calls to covid_corr_pairplots. The error is

Error in floor(rr[1]):ceiling(rr[2]) : NA/NaN argument

Originally posted by @benkeser in #66 (comment)

2: Missing column names filled in: 'X1' [1]

Could you check whether there is supposed to be a name there?