Synthetic Medicare Data for Environmental Health Studies

Authors: Naeem Khoshnevis, Xiao Wu, Danielle Braun
Github: https://github.com/Naeemkh

Summary

We present an example of a computational workflow in R. The original workflow and documentation are located here. In this workflow, we generate public data sets for benchmarking and illustration purposes in air pollution and health studies. In most of these studies, the health care data cannot be shared with the public; as a result, there are no public data sets that can serve as benchmarks for testing packages or illustrating their functionality. The Centers for Medicare & Medicaid Services (CMS) has generated synthetic Medicare data covering 2008-2010. This report combines these synthetic data with census and exposure data to compile the study data set.

Set up project environment

To reproduce the results, you need to download the raw data. All data are publicly available. Throughout the processing workflow, we keep data (input, output, and cache) and code in separate locations. This has several advantages:

  • Some input data are used in several projects; separating code and data lets us store each data set only once.
  • Reduces the chance of committing data to the version control system.
    • For public data, committing it would needlessly duplicate it and increase disk usage.
    • For private data, committing it would increase the risk of a data breach.
  • Helps developers manage disk space easily. For example, one can connect the project to an external disk without changing a single character of the code. The following figure shows the relation between the project and data folders.
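A minimal shell sketch of this layout, assuming hypothetical folder names (my_project, external_disk, data/input, data/output) and a temporary directory standing in for real disks. It is not the project's initialize_project.sh; it only illustrates how soft links let the code use fixed, project-relative paths while the data live somewhere else.

```shell
#!/usr/bin/env bash
# Sketch: keep code and data in separate locations and connect them with
# soft links. All directory names below are hypothetical placeholders.
set -euo pipefail

BASE_DIR="$(mktemp -d)"                 # stand-in for your file system
PROJECT_DIR="$BASE_DIR/my_project"      # code (under version control)
DATA_ROOT="$BASE_DIR/external_disk"     # data (outside version control)

mkdir -p "$PROJECT_DIR/code" "$DATA_ROOT/public_data" "$DATA_ROOT/output"
echo "id,value" > "$DATA_ROOT/public_data/example.csv"

# The project only holds links; the data themselves stay on the other disk.
mkdir -p "$PROJECT_DIR/data"
ln -s "$DATA_ROOT/public_data" "$PROJECT_DIR/data/input"
ln -s "$DATA_ROOT/output" "$PROJECT_DIR/data/output"

# Optionally ignore the link folder so machine-specific links are not committed.
printf 'data/\n' > "$PROJECT_DIR/.gitignore"

# Code refers to the stable, project-relative path.
head -n 1 "$PROJECT_DIR/data/input/example.csv"
```

Moving the data to another disk then only requires recreating the two links (for example with `ln -sfn`); the R code that reads `data/input` does not change.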

Input files are shared here for direct download. The following steps reproduce the results (the workflow has not been tested on Windows).

  • Step 1: Create a project_path_info.md file and add the following fields:

    PROJECT_NAME=your_project_name
    PUBLIC_DATA_DIR=path_to_public_data_folder_on_your_system
    PRIVATE_DATA_DIR=path_to_private_data_folder_on_your_system
    OUTPUT_DATA_DIR=path_to_output_folder_on_your_system

    Make sure the file ends with an empty line.

  • Step 2: Run initialize_project.sh. (Windows users should skip this step and create the soft links manually; see lines 76-78 of initialize_project.sh.)

  • Step 3: Copy the downloaded files into the public data directory.

  • Step 4: Create the R conda environment.

    conda env create -n r_env -f r_env.yaml
  • Step 5: Activate the R conda environment.

    conda activate r_env
  • Step 6: Launch RStudio from the terminal in which r_env is active.

    In my case, I am using macOS, and the RStudio executable is located at the following path:

    /Applications/RStudio.app/Contents/MacOS/RStudio
  • Step 7: Double-check that RStudio is using the conda environment.

    Inside RStudio, run:

    > R.home()

    [1] "/Users/[your username]/anaconda3/envs/r_env/lib/R"

    > .libPaths()

    [1] "/Users/[your username]/anaconda3/envs/r_env/lib/R/library"

  • Step 8: Navigate to the code folder and run synthetic_county_2010.Rmd.

  • Step 9: A Study_dataset_2010.csv file will be created in the output/results folder.

  • Done!
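Before running synthetic_county_2010.Rmd, a hypothetical helper such as check_project_paths below can confirm that every folder listed in project_path_info.md exists. It assumes the KEY=VALUE format from Step 1; the function name is illustrative and not part of the repository. It also assumes, without confirmation from initialize_project.sh, that the file is read line by line with the shell's `read` builtin. Under that assumption it shows one likely reason for the empty last line: `read` reports failure on a final line that has no trailing newline, so a `while read` loop silently skips that line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): validate the folders
# listed in project_path_info.md, assuming the KEY=VALUE format of Step 1.
set -euo pipefail

check_project_paths() {
  local info_file="$1" key value status=0
  # `read` returns non-zero on a last line without a trailing newline, so
  # the loop body never runs for it (assumed reason for the empty last line).
  while IFS='=' read -r key value; do
    if [[ -z "$key" ]]; then
      continue                            # skip blank lines
    fi
    if [[ "$key" == *_DIR ]]; then        # only *_DIR entries are folders
      if [[ -d "$value" ]]; then
        echo "OK      $key=$value"
      else
        echo "MISSING $key=$value"
        status=1
      fi
    fi
  done < "$info_file"
  return "$status"
}

# Example run against a throwaway configuration (paths are placeholders).
TMP="$(mktemp -d)"
mkdir -p "$TMP/public" "$TMP/private" "$TMP/output"
printf 'PROJECT_NAME=demo\nPUBLIC_DATA_DIR=%s\nPRIVATE_DATA_DIR=%s\nOUTPUT_DATA_DIR=%s\n\n' \
  "$TMP/public" "$TMP/private" "$TMP/output" > "$TMP/project_path_info.md"
check_project_paths "$TMP/project_path_info.md"
```

The function returns a non-zero status if any *_DIR folder is missing, so it can gate the rest of a script. Values are used literally; a `~` in a path is not expanded.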