Problem
During a computational replication, many sources of error can arise and cause the replication to fail. One critical component is the computing environment -- if, for example, any dependencies, such as R packages, are no longer available, anyone wishing to reproduce your analyses in R will be unable to do so. Tools like Docker and The Rocker Project provide completely containerised environments -- including all dependencies -- for reproducing R analyses and projects.
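For a sense of what that containerisation involves, a minimal Rocker-based Dockerfile for an R analysis might look like the sketch below (the image tag, package list, and file names are assumptions for illustration, not a prescription):

```dockerfile
# Pin a specific R version via a Rocker Project image
FROM rocker/r-ver:4.3.2

# Install the R packages the analysis depends on
RUN R -e "install.packages(c('dplyr', 'ggplot2'))"

# Copy the project's code and data into the image
COPY . /home/project
WORKDIR /home/project

# Run the analysis script when the container starts
CMD ["Rscript", "analysis.R"]
```

Even this small example shows why the approach is powerful: the R version, packages, code, and data all travel together in one image.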
Unfortunately, this model of facilitating computational reproducibility across machines and analysts is extremely difficult to implement for the regular R user wishing to time-capsule their work. Specialised knowledge, and a good deal of time, is needed to get Docker up and running. Some folk might not even know that Docker exists!
Consequently, one of the most common models of open science involves authors submitting data and code to repositories like Dryad, then providing the link inside their journal article. Whilst this ticks the transparency box of open science, it certainly does not guarantee reproducibility, for the reasons outlined above.
Proposed solution
The fundamental objective is to create some sort of a time-capsule:
- R package -- a set of commands, akin to what blogdown provides for Hugo in RStudio, with which the user can time-capsule their R project.
- Shiny app -- the package will be loaded so that its functions are available in a Shiny app, where the user can simply upload their data, code, etc., hit "Time-capsule my R project", and the app, using the package, builds a Docker image that the user can then download.
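As a rough sketch of that upload-and-click flow (every project-specific name here is hypothetical -- the package and its `timecapsule()` function do not exist yet), the Shiny app might look something like:

```r
library(shiny)

ui <- fluidPage(
  # User uploads a zipped project containing data + code
  fileInput("project", "Upload your R project (.zip)"),
  actionButton("capsule", "Time-capsule my R project"),
  downloadButton("image", "Download Docker image")
)

server <- function(input, output) {
  # Hypothetical: timecapsule() would inspect the project, resolve
  # its package dependencies, and build a downloadable Docker image
  capsule <- eventReactive(input$capsule, {
    timecapsule(input$project$datapath)  # not a real function (yet!)
  })

  output$image <- downloadHandler(
    filename = "my-project-image.tar",
    content = function(file) file.copy(capsule(), file)
  )
}

shinyApp(ui, server)
```

The Shiny scaffolding (`fileInput`, `eventReactive`, `downloadHandler`) is standard; all the hard work would sit inside the package function the app wraps.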
The goal of the package -- and the Shiny app, if we get there -- is to create a "Docker-like" system where the user can:
a) match the environment such that you can at least get the code to run
b) run the code, in a make-like manner
c) access the computing environment such that you can engage with raw, intermediate, and output objects in the data analysis pipeline of a scientific study to check the validity of the coding implementation of its analyses.
It should make the process of going from code, data, packages, and a set of assembly instructions to a Docker image EASY!! The ultimate aim of making this process easy is to generate more reproducible scientific outputs, such that independent analysts can 1. obtain, and 2. re-run scientific analyses -- and, hopefully, reproduce them!
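Putting goals a) to c) together, the hoped-for user experience might reduce to a single call (all function and argument names below are hypothetical sketches of an interface that does not exist yet):

```r
# Hypothetical interface -- nothing here is implemented yet.
# timecapsule() would: detect the R version and package dependencies
# and write a Dockerfile (goal a), record a make-like run recipe for
# the analysis (goal b), and build an image the analyst can enter to
# inspect raw, intermediate, and output objects (goal c).
timecapsule(
  project = "path/to/my-analysis",    # code + data + assembly instructions
  run     = "analysis.R",             # entry point, executed make-style
  output  = "my-analysis-image.tar"   # shareable Docker image
)
```

Everything else -- writing the Dockerfile, pinning package versions, building the image -- would happen behind the scenes.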
Thanks:
Thank you to @smwindecker and @stevekambouris for the initial ideas and impromptu workshopping today!