data-quality's Introduction

Data Quality Methods and Tools to Support CTSA Hub Data Sharing

Electronic Health Record (EHR) data must be tested for data quality when being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Many CTSA institutions have harmonized their EHR data to the Observational Medical Outcomes Partnership (OMOP) data model, yet no publicly available tool with a standard operating procedure (SOP) exists to easily assess and visualize data quality tests, particularly across institutions. This project will launch a publicly available data quality testing tool and SOP, configurable to any database environment for N OMOP datasets.

Project description

EHR data must be tested for data quality when being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Harmonized datasets need to conform to an established standard format and vocabulary before any analysis can be done. They need to meet a bare minimum threshold of completeness (i.e., what percentage of values are null or empty). They also need to demonstrate a certain level of plausibility (i.e., do the data make sense for what is expected, and are they believable and credible). To date, most data sharing networks have developed internal protocols and tools to manage data harmonization, but no publicly available tool with a standard operating procedure exists to easily assess and visualize data quality tests across institutions. As a result, data quality remains a problem that is tackled inconsistently, and only by high-level analytic teams where they are available.

Alignment to program objectives

TODO see here

Contact person

Point person (github handle) | Site | Program Director
Kari Stephens (@kstephen0909) | UW | Sean Mooney (@sdmooney)

Leads

Lead(s) (github handle) | Site
Kari Stephens (@kstephen0909) | UW
Adam Wilcox (@abwilcox) | UW

Team members

Team members can be found here

Repositories

Originally Developed DQe-c Tool
https://github.com/data2health/DQe-c

Ongoing Re-Engineering of DQe-c Tool
https://github.com/data2health/DQe-c-v2

Deliverables

  • Data quality testing tool (DQe-c) available to CTSA hubs and affiliates
  • Data quality testing tool standard operating procedures and documentation supporting local configuration
  • List of recommended minimum level data quality tests to help with data sharing assurance

Milestones

View the project milestones here

Evaluation

View the Evaluation component here

Education

View the education component here.

Get involved

View the engagement component here

Working documents

Team collaborative working folder can be found here

Slack channel

#data-quality is accessible to participants who have been onboarded

data-quality's People

Contributors

annbwes, eichmann, ezampino, jmcmurry, kstephen0909

Forkers

ezampino annbwes

data-quality's Issues

Develop the Load.py module

This module will create the DQTBL object and output the tablelist.csv file. The main purpose of the module is to calculate the size of the Common Data Model and the difference between the expected Common Data Model and actual queried Common Data Model.
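The comparison between the expected and the actually queried Common Data Model could be sketched roughly as follows. This is only an illustration, not the module's actual design: the function names `build_dqtbl` and `write_tablelist`, the row layout, and the column names written to the CSV are all assumptions, and row counts (the "size" part of the issue) are left out.

```python
import csv
import io


def build_dqtbl(expected, actual):
    """Compare expected and actual schemas (hypothetical DQTBL layout).

    Both arguments map a table name to an iterable of column names.
    Returns one dict per (table, column) pair seen in either schema.
    """
    rows = []
    for table in sorted(set(expected) | set(actual)):
        exp_cols = set(expected.get(table, ()))
        act_cols = set(actual.get(table, ()))
        for col in sorted(exp_cols | act_cols):
            rows.append({
                "table": table,
                "column": col,
                "expected": col in exp_cols,
                "loaded": col in act_cols,
            })
    return rows


def write_tablelist(rows, handle):
    """Write the comparison rows as CSV to an open text handle."""
    fields = ["table", "column", "expected", "loaded"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Toy schemas; a real run would read these from the CDM file and the database.
expected = {
    "person": ["person_id", "year_of_birth"],
    "visit_occurrence": ["visit_occurrence_id"],
}
actual = {"person": ["person_id"]}

dqtbl = build_dqtbl(expected, actual)
missing = [(r["table"], r["column"]) for r in dqtbl
           if r["expected"] and not r["loaded"]]

buffer = io.StringIO()  # stands in for an open tablelist.csv file
write_tablelist(dqtbl, buffer)
```

Here `missing` lists the expected columns that were not found in the queried schema, which is one way to express the difference the issue describes.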

Build the Prep.py module

This module will load the CDM csv file, configure the database connection, and collect all necessary configurations for the rest of the pipeline.
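A minimal sketch of what such a preparation step might look like is given below. It assumes a CDM CSV with `table` and `column` headers and uses an in-memory SQLite database as a stand-in for the site database; the `PipelineConfig` class, the `prepare` function, and the option names are hypothetical and not taken from the project.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    """Bundle of everything later pipeline steps need (hypothetical)."""
    cdm_spec: list
    connection: sqlite3.Connection
    options: dict = field(default_factory=dict)


def load_cdm_spec(handle):
    """Read the CDM definition CSV into a list of dict rows."""
    return list(csv.DictReader(handle))


def prepare(cdm_handle, database=":memory:", **options):
    """Load the CDM spec, open the database connection, collect options."""
    spec = load_cdm_spec(cdm_handle)
    conn = sqlite3.connect(database)
    return PipelineConfig(cdm_spec=spec, connection=conn, options=options)


# Toy CDM file contents; a real run would open the CDM csv file from disk.
sample_cdm = io.StringIO("table,column\nperson,person_id\nperson,year_of_birth\n")
config = prepare(sample_cdm, organization="example-site")
```

Keeping the spec, connection, and options in one object means later modules can take a single argument instead of re-reading configuration.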

License Agreement

Work with UW CoMotion staff to develop a license that fosters open source collaboration with reasonable boundaries in place.

-What do we want from the pilot site? What sort of feedback?

Informative Community Engager - one pager

One page document describing the mission of the project, goals and desirable outcomes.
Address how members can contribute.
Communication Plan for convene meetings.

Webinar (2nd Convening Event)

2nd Meeting

  • Choose a date (early May?)
  • Finalize attendees/audience (Analysts)
  • Develop marketing materials
  • Develop communications plan
  • Define/finalize purpose:
    Convening meeting of analysts in early May for Nicole’s webinar and getting input from analysts about tests for DQe-c enhancements.

Develop Orphan.py Module

This module will calculate the number of foreign keys that are present in a table but are not present in the host reference table.
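One way to count such orphan keys is a left join from the referencing table to the reference table, keeping only rows with no match. The sketch below uses SQLite and two toy tables named after OMOP tables; the function `count_orphans` and its parameters are assumptions made for illustration, and counting distinct key values (rather than rows) is also an assumed choice.

```python
import sqlite3


def count_orphans(conn, child_table, fk_column, parent_table, pk_column):
    """Count distinct foreign key values with no match in the parent table.

    Identifiers are interpolated into the SQL text, so they must come from
    trusted configuration, never from user input.
    """
    query = (
        f"SELECT COUNT(DISTINCT c.{fk_column}) "
        f"FROM {child_table} AS c "
        f"LEFT JOIN {parent_table} AS p ON c.{fk_column} = p.{pk_column} "
        f"WHERE c.{fk_column} IS NOT NULL AND p.{pk_column} IS NULL"
    )
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER
);
INSERT INTO person VALUES (1), (2);
INSERT INTO visit_occurrence VALUES (10, 1), (11, 3), (12, 3), (13, 4), (14, NULL);
""")

# person_id values 3 and 4 appear in visit_occurrence but not in person.
orphans = count_orphans(conn, "visit_occurrence", "person_id",
                        "person", "person_id")
```

Null foreign keys are excluded here because they are a completeness issue rather than an orphan key.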

Convene Community

Determine the communication plan, audience (attendees list), marketing materials and scope of convening community

Initial Convening event

Convening meeting of iDTF and PIs in early to mid April

  • Choose a date for initial meeting
  • Finalize attendees: DTF and PIs
  • Develop marketing materials
  • Develop communications plan
  • Define/finalize purpose

Outline new architecture

Hold whiteboarding sessions to lay out the architecture of the new tools
Outline the set of modules and mini-milestones that need to be met.

Develop Indicators.py Module

This module will calculate the percent missingness for the patient population for specific clinical indicators.
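As a rough illustration, percent missingness for one indicator can be computed as the share of patients with no non-null record of that indicator. The record layout (`patient_id`, `indicator`, `value`), the function name, and the indicator names below are hypothetical.

```python
def percent_missing_indicator(patient_ids, records, indicator):
    """Percent of patients with no non-null record for the given indicator."""
    patients = set(patient_ids)
    if not patients:
        return 0.0
    with_indicator = {
        r["patient_id"]
        for r in records
        if r["indicator"] == indicator and r["value"] is not None
    }
    missing = patients - with_indicator
    return 100.0 * len(missing) / len(patients)


# Toy population of four patients with three clinical records.
patients = [1, 2, 3, 4]
records = [
    {"patient_id": 1, "indicator": "blood_pressure", "value": 120},
    {"patient_id": 2, "indicator": "blood_pressure", "value": None},
    {"patient_id": 3, "indicator": "weight", "value": 70},
]

bp_missing = percent_missing_indicator(patients, records, "blood_pressure")
```

In this toy data only patient 1 has a usable blood pressure value, so three of the four patients count as missing that indicator.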

Requirements Matrix

Tech requirements needed at pilot site.
Support
Implementation
Documentation
