Data Quality Methods and Tools to Support CTSA Hub Data Sharing
Electronic Health Record (EHR) data must be tested for data quality when being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Many CTSA institutions have harmonized their EHR data to the Observational Medical Outcomes Partnership (OMOP) data model, yet no publicly available tool with a standard operating procedure (SOP) exists to easily assess and visualize data quality tests, particularly across institutions. This project will launch a publicly available data quality testing tool and SOP, configurable to any database environment for N OMOP datasets.
Project description
EHR data must be tested for data quality when being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Harmonized datasets need to conform to an established standard format and vocabulary before any analysis can be done. They need to meet a minimum threshold of completeness (i.e., what percentage of values are null or empty). They also need to demonstrate a certain level of plausibility (i.e., do the data make sense for what is expected, and are they believable and credible). To date, most data sharing networks have developed internal protocols and tools to manage data harmonization, but no publicly available tool with a standard operating procedure exists to easily assess and visualize data quality tests across institutions. As a result, data quality is tackled inconsistently, and only by high-level analytic teams where such teams are available.
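To make the three categories concrete, here is a minimal Python sketch of one check per category, run on a few made-up records. It is not the DQe-c implementation: the helper names (`percent_missing`, `conformance_failures`, `implausible_count`), the sample rows, and the 1900 to 2025 plausibility range are all assumptions chosen for illustration. Only the `person_id` and `year_of_birth` field names are borrowed from the OMOP person table.

```python
from typing import Any, Iterable, Mapping

Row = Mapping[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def percent_missing(rows: Iterable[Row], column: str) -> float:
    """Completeness: percentage of rows whose value in `column` is null or empty."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if is_missing(row.get(column)))
    return 100.0 * missing / len(rows)


def conformance_failures(rows: Iterable[Row], column: str, expected_type: type) -> int:
    """Conformance: count of non-missing values that are not of the expected type."""
    return sum(
        1
        for row in rows
        if not is_missing(row.get(column))
        and not isinstance(row.get(column), expected_type)
    )


def implausible_count(rows: Iterable[Row], column: str, low: int, high: int) -> int:
    """Plausibility: count of integer values falling outside [low, high]."""
    return sum(
        1
        for row in rows
        if isinstance(row.get(column), int) and not (low <= row[column] <= high)
    )


# Hypothetical sample records; not real patient data.
people = [
    {"person_id": 1, "year_of_birth": 1980},
    {"person_id": 2, "year_of_birth": None},    # missing value
    {"person_id": 3, "year_of_birth": "1975"},  # wrong type (string)
    {"person_id": 4, "year_of_birth": 1780},    # implausible year
]

print(percent_missing(people, "year_of_birth"))            # 25.0
print(conformance_failures(people, "year_of_birth", int))  # 1
print(implausible_count(people, "year_of_birth", 1900, 2025))  # 1
```

In practice these checks would run as queries against the database rather than over in-memory rows, and the thresholds would be set per site; the sketch only shows how each category reduces to a simple, countable test.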
Alignment to program objectives
TODO see here
Contact person
Point person (github handle) | Site | Program Director |
---|---|---|
Kari Stephens (@kstephen0909) | UW | Sean Mooney (@sdmooney) |
Leads
Lead(s) (github handle) | Site |
---|---|
Kari Stephens (@kstephen0909) | UW |
Adam Wilcox (@abwilcox) | UW |
Team members
Team members can be found here
Repositories
Originally Developed DQe-c Tool
https://github.com/data2health/DQe-c
Ongoing Re-Engineering of DQe-c Tool
https://github.com/data2health/DQe-c-v2
Deliverables
- Data quality testing tool (DQe-c) available to CTSA hubs and affiliates
- Data quality testing tool standard operating procedures and documentation supporting local configuration
- List of recommended minimum level data quality tests to help with data sharing assurance
Milestones
View the project milestones here
Evaluation
View the Evaluation component here
Education
View the education component here.
Get involved
View the engagement component here
Working documents
Team collaborative working folder can be found here
Slack channel
#data-quality is accessible to participants who have been onboarded