Reproducibility has long been an expectation of scientific work, and communities and funding sources are increasingly demanding it. Within the Python ecosystem, a variety of tools support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we take a closer look at the concept of reproducibility, examine the technologies that provide building blocks, and survey the landscape of tools. We spend the majority of the time on one solution in particular, Renku, and work through an end-to-end scenario with it.
There are several easy ways to set up an environment for working through the tutorial. The easiest is to use a hosted environment.
- Renkulab is a Renku environment hosted by SDSC. Follow these instructions to use Renkulab.
- Alternatively, you can use a MyBinder environment.
If you wish to run the tutorial on your own computer, you can create an environment with conda or docker.
If you prefer to use something else (e.g., pipenv), you will need to ensure that `git`, `git-lfs`, `curl`, and `node` are installed in your environment, but you should be able to pip install the requirements.txt file for the rest.
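
If you are setting up your own environment, a quick check along the following lines can confirm that the prerequisite tools are available before you install the Python requirements. This is a minimal sketch, not part of the tutorial materials: the helper name `missing_tools` is hypothetical, and it assumes each tool (including `git-lfs`) is installed as an executable on your `PATH`.

```python
import shutil

# Command-line tools the setup notes list as prerequisites.
REQUIRED_TOOLS = ["git", "git-lfs", "curl", "node"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH.

    Hypothetical helper for illustration; it only checks that an
    executable with that name exists, not which version is installed.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
        print("Next step: pip install -r requirements.txt")
```

If the script reports nothing missing, running `pip install -r requirements.txt` in your environment should cover the remaining dependencies.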
Note for Windows users: If you are on Windows, we recommend using one of the hosted environments, either Renkulab or MyBinder.
Introduction (1h) | | |
---|---|---|
15 min | Background & Theory | Terminology, history, and philosophy of reproducibility |
30 min | Building Blocks | Building blocks for achieving reproducibility |
15 min | Tools | Survey of the current tool landscape |
Break (10 min) | | |
Hands-on with Renku (1h 30m) | | |
30 min | Starting | Starting a project, importing data, building a workflow |
30 min | Iterating | Updating code and data to improve analysis |
30 min | Details and Reflection | What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers? |
Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinska from the Harvard School of Public Health for their helpful comments and suggestions!
The icons used are from Icons8.