
wiser-reproducibility's Introduction

Part 1: Data

Abstract

We apply WiSER to three datasets to investigate factors related to intra-individual variability: the Women’s Health Study (WHS) accelerometry study (available via dbGaP), the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial (available via BioLINCC), and S&P 500/President Trump Twitter data (publicly available). The WHS data contain accelerometer measurements on over 15,000 women collected over 7 days. The ACCORD data come from a multi-center trial in patients with type 2 diabetes. The S&P 500/Trump Twitter data were downloaded from publicly available web APIs and contain Trump's tweets and daily historical stock data for the stocks in the S&P 500.

Availability

  • Data are publicly available.

Publicly available data

Description

File format(s)

  • CSV or other plain text.

Data dictionary

Part 2: Code

Abstract

Code to download data (when applicable), clean data, and reproduce results is provided as Jupyter notebooks in the GitHub repository for this project. Each subfolder contains the code and related content for one analysis in the paper.

Description

Code format(s)

Supporting software requirements

Version of primary software used

WiSER.jl version v0.0.2.

Libraries and dependencies used by the code

  • R packages (used for plotting): data.table, facetscales, ggplot2, gridExtra, scales
  • Julia packages: WiSER (and its dependencies, listed at https://GitHub.com/OpenMendel/WiSER.jl/blob/master/Project.toml), CodecZlib, CSV, DataFrames, DelimitedFiles, GLM, KNITRO [academic license], MarketData, RCall, Roots, SpecialFunctions, StatsBase, TimeZones.

Parallelization used

  • No parallel code used

License

  • MIT License (default)

Additional information (optional)

The analyses did not use parallel code, but WiSER supports parallelization, as shown in its GitHub documentation.

Each subfolder includes a Manifest.toml and Project.toml pair, which makes the Julia environment easy to reproduce. From within that directory, run ] activate . in Julia to activate the environment with the package versions used in the analysis; on first use, ] instantiate installs any packages that are not yet present.
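
The same activation step can also be scripted, which may be convenient at the top of a notebook. The following is a minimal sketch, not code from the repository: the helper name activate_analysis_env is hypothetical, and it assumes the subfolder contains both Project.toml and Manifest.toml as described above. Pkg.activate and Pkg.instantiate are standard functions in Julia's Pkg library.

```julia
using Pkg

# Hypothetical helper (not part of the repository): activate the environment
# stored in `dir` and install the package versions recorded in its Manifest.toml.
function activate_analysis_env(dir::AbstractString)
    # Fail early if the environment files described above are missing.
    for f in ("Project.toml", "Manifest.toml")
        isfile(joinpath(dir, f)) || error("$f not found in $dir")
    end
    Pkg.activate(dir)    # same effect as `] activate .` run from inside `dir`
    Pkg.instantiate()    # download the pinned dependencies if not yet installed
    return dir
end

# Example call, run from inside one of the analysis subfolders:
# activate_analysis_env(".")
```

Note that KNITRO is listed with an academic license, so installing that dependency may additionally require a local KNITRO installation and license.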

Part 3: Reproducibility workflow

Scope

The Jupyter notebooks and code provided can be used to reproduce all results (including tables and figures) in Sections 5 and 6, and their accompanying supplementary material sections (S.5-S.8).

Workflow

Format(s)

  • Self-contained R Markdown file, Jupyter notebook, or other literate programming approach

Instructions

Each subfolder in the GitHub repository corresponds to a section of the paper (Simulations, Women's Health Study, ACCORD, Twitter/Stock data). Each contains Jupyter notebooks (.ipynb files) that walk step by step through the analyses presented in the paper: downloading the data (when applicable), cleaning the data, and analyzing the data. Once you have access to the datasets that require researcher requests, you can run these notebooks on the data to produce the results shown in the paper. For readability, rendered .html versions of the notebooks are also included; these can be opened to view the notebook contents without launching Jupyter.

Note: In order to run Julia in a Jupyter notebook, you must install Julia and the IJulia package. After downloading and launching Julia, IJulia can be installed and Jupyter notebook can be launched by running the following code in Julia:

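using Pkg
Pkg.add("IJulia")    # install the Julia kernel for Jupyter
Pkg.build("IJulia")  # run the package build step
using IJulia
notebook()           # launch the Jupyter notebook server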

Expected run-time

Approximate time needed to reproduce the analyses on a standard desktop machine:

  • > 8 hours

Additional information (optional)

The simulations take the bulk of the time. The real data analyses, including cleaning the data, should take under an hour on a standard desktop machine.

Contributors

  • chris-german
