
ibonaranburu / reproducible-data-science


This project forked from swissdatasciencecenter/reproducible-data-science


Repository for the Reproducible Data Science tutorial

License: Apache License 2.0

Languages: Jupyter Notebook 99.12%, Python 0.88%


Reproducible Data Science in Python using Renku

Description

The expectation of reproducibility in scientific work has long been established, and communities and funding sources increasingly demand it. Within the Python ecosystem, a variety of tools support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we will take a closer look at the concept of reproducibility, examine the technologies that provide its building blocks, and survey the landscape of tools. We will spend most of the time on one solution in particular, Renku, and work through an end-to-end scenario with it.

Set Up

There are several easy ways to set up an environment for working through the tutorial. The easiest is to use a hosted environment.

Hosted

Renkulab

Binder

Local

If you wish to run the tutorial on your own computer, you can create an environment with conda or docker.

If you prefer to use something else (e.g., pipenv), you will need to make sure that git, git-lfs, curl, and node are installed in your environment. You should then be able to install the remaining dependencies with pip install -r requirements.txt.
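Before installing the Python requirements in a custom environment, it can help to confirm that the command-line prerequisites listed above are on your PATH. The following is a minimal sketch using only the standard library (shutil.which). The helper name missing_tools is hypothetical and is not part of the tutorial repository.

```python
import shutil

# Command-line tools this README lists as prerequisites when not using
# conda or Docker (e.g. with pipenv).
REQUIRED_TOOLS = ["git", "git-lfs", "curl", "node"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Hypothetical helper: return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found; next: pip install -r requirements.txt")
```

This check only confirms that each executable can be located. It does not verify versions, and for git-lfs it does not check whether `git lfs install` has been run.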

Note for Windows users: we recommend using one of the hosted environments, either Renkulab or Binder.

Schedule

Introduction (1h)
15 min  Background & Theory: terminology, history, and philosophy of reproducibility
30 min  Building Blocks: building blocks for achieving reproducibility
15 min  Tools: survey of the current tool landscape
Break (10 min)
Hands-on with Renku (1h 30m)
30 min  Starting: starting a project, importing data, building a workflow
30 min  Iterating: updating code and data to improve the analysis
30 min  Details and Reflection: What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers?

Acknowledgements

Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinska from the Harvard School of Public Health for their helpful comments and suggestions!

The icons used are from Icons8.


Contributors

cramakri, ciyer, rokroskar
