Reproducible Data Science in Python using Renku

Description

The expectation of reproducibility in scientific work is long established, and communities and funding sources are increasingly demanding it. Within the Python ecosystem, a variety of tools support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we take a closer look at the concept of reproducibility, examine the technologies that provide its building blocks, and survey the landscape of tools. We spend the majority of the time on one solution in particular, Renku, and work through an end-to-end scenario with it.

Set Up

There are several easy ways to set up an environment for working through the tutorial. The easiest is to use a hosted environment.

Hosted

Renkulab is a Renku environment hosted by SDSC. Follow these instructions to use Renkulab.
Alternatively, you can use a MyBinder environment.

Local

If you wish to run the tutorial on your own computer, you can create an environment with conda or docker.

If you prefer to use something else (e.g., pipenv), you will need to ensure that git, git-lfs, curl, and node are installed in your environment; the remaining dependencies can be installed with pip from the requirements.txt file.
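As a quick sanity check before starting, the prerequisites listed above can be verified from Python. The snippet below is a minimal sketch and not part of the tutorial materials: the function name `find_missing_tools` is hypothetical, and the executable names are assumed to match the tool names given above (on some systems node is installed as `nodejs`, in which case the list needs adjusting).

```python
import shutil

# Command-line tools the tutorial expects, taken from the note above.
# Assumption: each tool is installed under exactly this executable name.
REQUIRED_TOOLS = ["git", "git-lfs", "curl", "node"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If the script reports nothing missing, the remaining Python dependencies can then be installed with `pip install -r requirements.txt`.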

Note for Windows users: If you are on Windows, we recommend using one of the hosted environments, either Renkulab or MyBinder.

Schedule

Introduction (1h)
15 min	Background & Theory	Terminology, history, and philosophy of reproducibility
30 min	Building Blocks	Building blocks for achieving reproducibility
15 min	Tools	Survey of the current tool landscape

Break (10 min)

Hands-on with Renku (1h 30m)
30 min	Starting	Starting a project, importing data, building a workflow
30 min	Iterating	Updating code and data to improve analysis
30 min	Details and Reflection	What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers?

Acknowledgements

Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinksa from the Harvard School of Public Health for their helpful comments and suggestions!

The icons used are from Icons8.
