
d3b-warehouse-redcap's Introduction

D3b REDCap Warehouser

Purpose

  1. Extracts clinical data from REDCap
  2. De-identifies it via BRP-eHB
  3. Loads everything into a warehouse database.
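At a high level, the flow is extract, de-identify, load. The sketch below is purely illustrative: it assumes the standard REDCap records API, while the BRP/eHB call, all function names, and the parameters are hypothetical, not the script's real internals.

    # Illustrative sketch only; the BRP step and all names are hypothetical.
    import requests
    from sqlalchemy import create_engine

    def extract_from_redcap(api_url, token):
        """Pull all records for one REDCap project through its records API."""
        resp = requests.post(
            api_url, data={"token": token, "content": "record", "format": "json"}
        )
        resp.raise_for_status()
        return resp.json()

    def deidentify_via_brp(records, brp_token, protocol):
        """Swap direct identifiers for BRP/eHB subject identifiers (stub)."""
        ...  # call the BRP with brp_token and the protocol number

    def load_into_warehouse(records, db_url):
        """Write de-identified records to the warehouse database (stub)."""
        engine = create_engine(db_url)
        ...  # insert records via the engine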

CLI Help

python warehouse_project.py --help

Known invocations

Where:

  • REDCAP_TOKEN_##### is an environment key storing the API token for that REDCap project
  • BRP_TOKEN is an environment key storing the user API token for the BRP
  • CID_MAGIC_NUMBER is an environment key storing the magic number used for generating CIDs from eHB IDs
  • D3B_WAREHOUSE_DB_URL is an environment key storing an authenticated URL (protocol://user:password@address:port/db) for the D3b warehouse
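Note that the positional arguments name environment variables rather than containing the secrets themselves, so presumably the script resolves them along the lines of this hypothetical helper:

    import os
    import sys

    def resolve_env_key(env_key):
        """Return the secret stored under the environment variable named on the CLI."""
        try:
            return os.environ[env_key]
        except KeyError:
            sys.exit(f"Environment variable {env_key!r} is not set")

    # e.g. the first positional argument names the REDCap API token variable
    redcap_token = resolve_env_key("REDCAP_TOKEN_27084")

This keeps secrets out of shell history and process listings, since only variable names appear on the command line.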

Oligo Nation, REDCap project 27084, BRP protocol 159, eHB organization 102

python warehouse_project.py REDCAP_TOKEN_27084 BRP_TOKEN 159 CID_MAGIC_NUMBER D3B_WAREHOUSE_DB_URL --redcap_organization_override_value 102 --redact description_of_chemotherap --redact other_rad_treat --redact describe_predis

DGD, REDCap project 33723, BRP protocol 95 (only warehouse if a CID already exists)

python warehouse_project.py REDCAP_TOKEN_33723 BRP_TOKEN 95 CID_MAGIC_NUMBER D3B_WAREHOUSE_DB_URL --redcap_id_within_organization_field mrn --only_warehouse_if_CID_already_exists --fillmask diagnosis_id=dgd_diagnosis=d3b_event_identifiers

d3b-warehouse-redcap's People

Contributors

fiendish, liberaliscomputing, parimalak, dependabot[bot]

Watchers

Nick Van Kuren

d3b-warehouse-redcap's Issues

Add organization name and id in redcap instrument

To keep organizations consistent across different d3b-center projects, we should add the BRP organization id and name to the Enrollment instrument, because the BRP's query parameters do not work and its organizations are inconsistent.
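As a rough illustration of the proposal (the field names and organization name here are hypothetical; the id 102 is the eHB organization from the Oligo Nation invocation above), the extractor could stamp the BRP organization onto every Enrollment record:

    # Hypothetical sketch: attach a fixed BRP organization id and name to each
    # extracted Enrollment record so organizations stay consistent across projects.
    def annotate_with_organization(records, org_id, org_name):
        for record in records:
            record["brp_organization_id"] = org_id      # field names are hypothetical
            record["brp_organization_name"] = org_name
        return records

    records = annotate_with_organization(
        [{"subject_id": "ABC-001"}], org_id=102, org_name="Example Hospital"
    )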

Containerize Extraction and De-Identification Processes

Containerize the extraction and de-identification Python processes for REDCap and Nautilus data. It would be helpful to create a Dockerfile that builds an image which installs all the dependencies and runs the scripts necessary for the ETL, to be deployed to the K8s cluster on the IS premises.
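A minimal Dockerfile sketch for such an image might look like the following; the base image version, the requirements.txt file, and the entrypoint are all assumptions:

    # Minimal sketch; base image, file names, and entrypoint are assumptions.
    FROM python:3.9-slim

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    COPY . .
    ENTRYPOINT ["python", "warehouse_project.py"]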

Consider Changing the Repo Name

Cross-referencing d3b-center/d3b-chordoma#11, let's consider changing this repo name to a more representative name following KF's naming convention. Possible candidates include:

  • d3b-lib-datawarehouse-redcap
  • d3b-lib-data-warehouse-redcap
  • d3b-datawarehouse-redcap
  • d3b-data-warehouse-redcap

Support upsert on global tables

We need to support upsert for project_info and sample_information. Currently we append to these tables, which creates duplicate entries if we run the same study more than once.
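Assuming the warehouse is PostgreSQL (the authenticated URL format suggests a SQL database, but this is an assumption), the standard fix is INSERT ... ON CONFLICT. A sketch with SQLAlchemy, where the conflict key is hypothetical:

    # Hypothetical upsert sketch; assumes PostgreSQL and that project_id is a
    # unique key on project_info.
    import os

    from sqlalchemy import MetaData, Table, create_engine
    from sqlalchemy.dialects.postgresql import insert

    engine = create_engine(os.environ["D3B_WAREHOUSE_DB_URL"])
    project_info = Table("project_info", MetaData(), autoload_with=engine)

    def upsert_project_info(rows):
        stmt = insert(project_info).values(rows)
        stmt = stmt.on_conflict_do_update(
            index_elements=["project_id"],  # hypothetical unique key
            set_={
                c.name: stmt.excluded[c.name]
                for c in project_info.columns
                if c.name != "project_id"
            },
        )
        with engine.begin() as conn:
            conn.execute(stmt)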

Discuss Infrastructure for the Implementation of Data Warehousing

Discuss infrastructure for the implementation of a data warehouse for D3b reporting. We need to discuss the following items, among others:

  • Purpose
    • Who is the audience of the data warehouse: CRU/BRU, engineers, and/or leadership?
    • Do we aim to 1) simply provide automated reporting pulls per study protocol or 2) add additional layers of analytical capability against approved protocols?
    • Do we want to run reporting/analytics per study only, or across all protocols?
  • Location
    • Where will the warehouse reside? (REDCap, Nautilus, and BRP reside inside the CHOP network)
    • What level of security must the warehouse meet?
  • Data flow
    • Does any part of the system contain PHI? Where and how does PHI move around?
    • Do we need to ingest raw data pulls from REDCap, Nautilus, and eHB (historical), or just periodic reports?
    • How often will the ETL run, and will the periodic ETL preserve the history of changes?

Handle Multiple Protocols Import

The current pipeline ETLs only one study at a time, against the REDCAP_TOKEN_ENV_KEY and BRP_PROTOCOL given in compose.env. Implement importing multiple studies.
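One simple approach (a sketch only; per-study options such as --redact are elided) is to drive the existing single-study entry point from a list of studies:

    # Hypothetical sketch: run the existing single-study pipeline once per study.
    import subprocess

    STUDIES = [
        # (REDCap token env key, BRP protocol)
        ("REDCAP_TOKEN_27084", "159"),
        ("REDCAP_TOKEN_33723", "95"),
    ]

    for token_env_key, brp_protocol in STUDIES:
        subprocess.run(
            [
                "python", "warehouse_project.py", token_env_key, "BRP_TOKEN",
                brp_protocol, "CID_MAGIC_NUMBER", "D3B_WAREHOUSE_DB_URL",
            ],
            check=True,
        )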

Create docker-compose

It would be helpful for local development if there was a docker-compose.yml file that ran the ETL code and also ran the target database. The compose file should also configure all the environment variables for the ETL job so that it may be run without additional configuration.

Bonus points for being able to mock out the source data layers in some way.
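A minimal docker-compose.yml sketch along those lines; the Postgres backend, service names, and credentials are all assumptions, and the source layers are not mocked here:

    # Minimal sketch; the database engine and all credentials are placeholders.
    version: "3.8"
    services:
      warehouse-db:
        image: postgres:13
        environment:
          POSTGRES_USER: d3b
          POSTGRES_PASSWORD: d3b
          POSTGRES_DB: warehouse
      etl:
        build: .
        depends_on:
          - warehouse-db
        environment:
          REDCAP_TOKEN_27084: ${REDCAP_TOKEN_27084}
          BRP_TOKEN: ${BRP_TOKEN}
          CID_MAGIC_NUMBER: ${CID_MAGIC_NUMBER}
          D3B_WAREHOUSE_DB_URL: postgresql://d3b:d3b@warehouse-db:5432/warehouse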

Design Data Warehouse Schemas

Design new data warehouse schemas to 1) stage raw and transformed data and 2) load reports. The following schemas can be considered:

  1. Staging: this schema stores raw data pulls (possibly de-normalized) and transformed, normalized data. These data sets can live in separate namespaces, conventionally staging and user_maintained: tables in staging hold history for sanity checking, while tables in user_maintained hold integrated, merged data.
  2. Warehouse (target): this schema stores periodic reports run against pre-defined metrics.
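As a starting point, the namespaces proposed above could be bootstrapped like this; the schema names come from this issue, while the CREATE SCHEMA IF NOT EXISTS syntax assumes PostgreSQL:

    # Hypothetical sketch: create the proposed warehouse namespaces.
    import os

    from sqlalchemy import create_engine, text

    engine = create_engine(os.environ["D3B_WAREHOUSE_DB_URL"])

    with engine.begin() as conn:
        for schema in ("staging", "user_maintained", "warehouse"):
            conn.execute(text(f'CREATE SCHEMA IF NOT EXISTS "{schema}"'))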
