HPO 2 EHR

Problem statement

Electronic health records (EHRs) contain rich phenotype information that can be used to stratify diseases and to develop hypotheses. Despite the great potential of EHR data, phenotyping patients from EHRs remains challenging. Phenotype information is distributed across many EHR locations (laboratory results, notes, problem lists, imaging data, etc.), and EHR structures vary widely across sites. This lack of integration is a substantial barrier to the widespread use of EHR data in translational research. In the first phase of this project, we developed a method for mapping LOINC-encoded laboratory test results transmitted using the FHIR standard to Human Phenotype Ontology (HPO) terms (Zhang et al. (2019) npj Digital Medicine 2:3). In the current phase, we will use the software to search for biomarkers in the EHR data of participating CTSA centers. In future work, we will extend the resource to additional phenotype sources in the EHR.
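As an illustration of the kind of mapping described above, the sketch below looks up an HPO term for a single lab result, keyed by its LOINC code and a FHIR interpretation code (H = high, L = low, N = normal). It is a minimal sketch under stated assumptions, not the LOINC2HPO library's API. The table, the `negated` flag used to record that a normal result rules a phenotype out, and the function name are all hypothetical. The potassium codes are illustrative and should be checked against current LOINC and HPO releases.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class HpoAnnotation:
    hpo_id: str
    label: str
    negated: bool  # True = a normal result explicitly rules the phenotype out


# Hypothetical annotation table keyed by (LOINC code, FHIR interpretation code).
# Identifiers are illustrative; verify them against current LOINC/HPO releases.
ANNOTATIONS: Dict[Tuple[str, str], HpoAnnotation] = {
    ("2823-3", "H"): HpoAnnotation("HP:0002153", "Hyperkalemia", False),
    ("2823-3", "L"): HpoAnnotation("HP:0002900", "Hypokalemia", False),
    ("2823-3", "N"): HpoAnnotation(
        "HP:0011042", "Abnormal blood potassium concentration", True
    ),
}


def observation_to_hpo(loinc_code: str, interpretation: str) -> Optional[HpoAnnotation]:
    """Return the HPO annotation for one lab result, or None if it is unmapped."""
    return ANNOTATIONS.get((loinc_code, interpretation))


# Usage: a high serum potassium result maps to hyperkalemia.
print(observation_to_hpo("2823-3", "H"))
```

Results whose (code, interpretation) pair is missing from the table come back as `None`, which is where curation gaps would surface.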

Project description

The Human Phenotype Ontology (HPO) is a freely available, open-source, logically defined vocabulary for describing abnormal human phenotypes. The HPO has become the de facto standard for computational phenotype analysis in genomics and rare disease. It is used by the NIH Undiagnosed Diseases Network, the 100,000 Genomes Project, and many other academic, clinical, and commercial entities. As of February 2019, the HPO contained 14,184 terms.

A phenotype-driven approach opens up entirely new ways of mining EHR data for correlations that might be important in understanding disease pathophysiology, gender or age differences, and biomarkers. Realizing this potential requires analysis methods that can distinguish interesting correlations from expected ones. We expect many phenotype abnormalities to be highly correlated across all disease states, so identifying such an "obvious" correlation would not be an interesting result. For instance, Abnormal hematocrit and Abnormal hemoglobin level are expected to be highly correlated.

Here, we propose adapting the approach developed to characterize synergy networks in gene expression data, which finds gene-gene interactions that are specifically associated with a phenotype (such as a particular cancer). The method is based on an information-theoretic analysis of multivariate synergy that decomposes sets of genes into submodules, each of which contains synergistically interacting genes. It can be extended to phenotypes to search for pairs of markers (HPO terms) that show mutual information conditional on the presence of a specific diagnosis (e.g., an ICD-9 code, or possibly an eMERGE classification). The result would be a data-driven way of defining pairs of features that show a surprising correlation in the presence of a disease. This might lead to the discovery of potential biomarkers: if one finds one HPO term of a synergistic pair in a person with the disease, the other HPO term of the pair would be more likely to be present than expected by chance. We also believe this might be a good opportunity to engage CTSA hubs in data exploration, or in using this approach and the resulting derived data for DREAM challenges.
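For a pair of markers A and B and a diagnosis C, the synergy-network papers cited in the issues below (Anastassiou 2007; Watkinson et al. 2008) define pairwise synergy as Syn(A,B;C) = I(A,B;C) − [I(A;C) + I(B;C)]. The sketch below computes it with plug-in (empirical frequency) estimates of mutual information. It assumes each patient is reduced to 0/1 indicators for two HPO terms and a 0/1 diagnosis label; the function names are hypothetical, and this is not the project's Java implementation. Plug-in estimates are biased for small samples, so a real analysis would also need a significance test.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> float:
    """Syn(A,B;C) = I(A,B;C) - [I(A;C) + I(B;C)].

    a, b: per-patient 0/1 indicators for two HPO terms (hypothetical encoding).
    c:    per-patient 0/1 indicator for the diagnosis of interest.
    Positive values mean the pair is more informative about C jointly
    than the two terms are separately.
    """
    pair = list(zip(a, b))
    return mutual_information(pair, c) - (
        mutual_information(a, c) + mutual_information(b, c)
    )


# Two terms that are individually uninformative but jointly determine C.
print(synergy([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]))  # 1.0
# Two redundant terms carrying the same information about C.
print(synergy([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]))  # -1.0
```

Redundant pairs, like the hematocrit/hemoglobin example, score negative, which is why ranking by synergy rather than by raw correlation filters out "obvious" associations.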

A detailed implementation protocol is available in this Google Doc. We are currently testing the implementations with a public dataset of intensive care unit patients (see MIMIC_HPO).

Contact person

Peter Robinson (@pnrobinson), JAX

Leads

Peter Robinson (@pnrobinson), JAX
Aaron Zhang (@kingmanzhang), JAX
Amy Yates (@aeyates), OHSU

Team members

See https://github.com/data2health/project-repo-template/tree/master/team.md

Repositories

  • https://github.com/TheJacksonLaboratory/loinc2hpoAnnotation
  • https://github.com/monarch-initiative/loinc2hpo

Deliverables

Current 6 month period

  • Publication of LOINC2HPO in NPJ Digit Med. 2019 (PMID: 31119199) and demonstration of LOINC2HPO (May 2019)
  • Mutual information content algorithm for biomarker identification, implemented in Java and ready for use at CTSA sites (August 2019)
  • Analysis of test datasets (Asthma, ICU) (September 2019)
  • IRB approval to implement the algorithm at two CTSA sites and start of initial analysis (September 2019)

Subsequent six month period

  • Publication about novel algorithm and results in collaboration with CTSA sites
  • Cross-site clustering algorithm
  • Analysis of test datasets (cross-clustering)
  • Implementation of cross-site clustering at CTSA sites
  • Implementation plan for adding additional datatypes (e.g., radiology) to EHR2HPO software

Evaluation plan

The operational architecture evaluation plan is here.

Education

See here

Engagement

See here

Contributors

cgcook, eichmann, jmcmurry, kingmanzhang, mellybelly, pnrobinson, tricfran


Issues

Determine goal of multi-CTSA project

In the original project, we looked at a cohort that was enriched for, but not exclusively composed of, asthma patients. A cohort of this type would be good to have for a larger project.

  • Determine areas of medical interest and opportunity among the chosen CTSA sites
  • Determine method for assigning a patient to a diagnosis or excluding the diagnosis (e.g., ICD codes, or some other means offered by existing CTSA technologies such as OMOP)
  • Analyze our current LOINC2HPO annotations and try to ensure that all tests relevant to the disease of interest are included in our library.
  • If possible, design a test of whether we are correctly identifying our diagnosis of interest in the cohort, for instance, by manual inspection of 100 charts.
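One simple way to assign patients to a diagnosis, as raised in the second bullet, is by ICD code prefix. The sketch below assumes a hypothetical asthma case definition (ICD-9-CM 493.x or ICD-10-CM J45.x) and treats every other patient as a control. It is only an illustration with hypothetical function names; a real study would need an agreed, validated phenotype definition (for example, OMOP concept sets), and the naive "everyone else is a control" rule is exactly what the chart-review test in the last bullet would check.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical case definition: ICD-9-CM 493.x or ICD-10-CM J45.x (asthma).
ASTHMA_PREFIXES: Tuple[str, ...] = ("493", "J45")


def is_case(codes: Iterable[str], prefixes: Tuple[str, ...] = ASTHMA_PREFIXES) -> bool:
    """True if any diagnosis code (dots removed) starts with one of the prefixes."""
    return any(code.replace(".", "").upper().startswith(prefixes) for code in codes)


def split_cohort(
    diagnoses: Dict[str, List[str]], prefixes: Tuple[str, ...] = ASTHMA_PREFIXES
) -> Tuple[Set[str], Set[str]]:
    """Split patient IDs into (cases, controls); every non-case is a control."""
    cases = {pid for pid, codes in diagnoses.items() if is_case(codes, prefixes)}
    controls = set(diagnoses) - cases
    return cases, controls


cases, controls = split_cohort(
    {"p1": ["J45.909"], "p2": ["E11.9"], "p3": ["493.90", "401.9"]}
)
print(sorted(cases), sorted(controls))
```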

Documentation for HPO website

We will write documentation, to live on the main HPO website, that explains LOINC and the LOINC2HPO project. It will be linked to the new pages on the HPO website and will also use material from #9.

Demo needs prepopulated examples and better functionality

  • Prepopulate examples

The text in the header is easy to miss. The examples need to be more plug-and-play, whether as a hyperlink (e.g., https://hpo.jax.org/app/) or as a button (e.g., https://alpha.monarchinitiative.org/).

  • Phenotype search

Moreover, searching by patient is the least interesting thing LOINC2HPO makes possible. We should be able to search by phenotype, preferably one shared by many patients. Ultimately this should be possible using autocomplete; for now, however, some hardcoded examples (prepopulated as described above) would be fine.
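Searching by phenotype amounts to an inverted index from HPO term IDs to the patients annotated with them. The minimal sketch below uses hypothetical function names and exact term matching only (it does not traverse the HPO hierarchy, so a search for a general term would not yet return patients annotated with more specific descendants). It also ranks terms by how many patients share them, which is one possible way to choose the hardcoded examples. The patient data and term IDs are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_phenotype_index(patient_terms: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert patient -> HPO terms into HPO term -> set of patient IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for patient_id, terms in patient_terms.items():
        for term in terms:
            index[term].add(patient_id)
    return dict(index)


def most_shared_terms(index: Dict[str, Set[str]], k: int = 10) -> List[Tuple[str, int]]:
    """The k terms annotated to the most patients (ties broken by term ID)."""
    ranked = sorted(index.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(term, len(patients)) for term, patients in ranked[:k]]


index = build_phenotype_index(
    {"p1": ["HP:0001903", "HP:0002153"], "p2": ["HP:0001903"], "p3": []}
)
print(sorted(index["HP:0001903"]))  # patients annotated with this term
print(most_shared_terms(index, k=1))
```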

Determine most likely group of participating CTSAs

We should try to identify at least 2 CTSAs (ideally 3-5) that would be willing to implement a LOINC2HPO analysis project.

  • Determine contact people
  • Determine whether the centers can use FHIR
  • Determine what level of data sharing will be possible

Improve evaluation plan

Right now the evaluation plan is largely the same as the deliverables. It would be good to have some evaluation measures, for example:
What functionality will there be, and how will you know the app has achieved it?
How will you measure user satisfaction or query-answering capabilities?
What constitutes success?

Here is a simple example of an evaluation plan with more of these sorts of items; you could copy its template: https://github.com/data2health/Operational-architecture/blob/master/evalulation.md

Implement synergy algorithm (test case)

Implement the algorithm described in the papers below and test it using artificial or genomic data in preparation for the CTSA analysis.

Watkinson J, Wang X, Zheng T, Anastassiou D. Identification of gene interactions associated with disease from gene expression data using synergy networks. BMC Syst Biol. 2008;2:10.
Anastassiou D. Computational analysis of the synergy among multiple interacting genes. Mol Syst Biol. 2007;3:83.

Analysis of reasons for mapping failures on real data

Can we make a list of LOINC terms that were not mapped at one real-life CTSA, and look at the most common ones to identify any systematic reasons for failure? That is, if we are expecting the wrong codes or something like that, the failure might not be due to incomplete curation (which we know about) but to a logic problem in our code that we are not aware of. This analysis will try to figure that out by looking at the 250 most common failures. If possible, it would be good to share the results with the entire team in a de-identified way.
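A minimal sketch of this analysis follows. It assumes each lab result has been reduced to a (LOINC code, FHIR interpretation code) pair and that the annotations are a dictionary keyed by such pairs; this representation, the function name, and the placeholder code "X-1" are hypothetical and not the loinc2hpo data model. Splitting failures into "LOINC code not annotated" and "interpretation not annotated" separates incomplete curation from the kind of logic problem described above, for example a site reporting critical-high (HH) results when the annotations only cover H.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (LOINC code, FHIR interpretation code)


def top_failures(
    results: Iterable[Key], annotations: Dict[Key, str], n: int = 250
) -> List[Tuple[Tuple[str, str, str], int]]:
    """Count unmapped lab results by (LOINC, interpretation, reason); return the top n."""
    known_loinc = {loinc for loinc, _ in annotations}
    failures: Counter = Counter()
    for loinc, interpretation in results:
        if (loinc, interpretation) in annotations:
            continue  # mapped successfully
        reason = (
            "interpretation not annotated"
            if loinc in known_loinc
            else "LOINC code not annotated"
        )
        failures[(loinc, interpretation, reason)] += 1
    return failures.most_common(n)


annotations = {("2823-3", "H"): "HP:0002153"}
results = [("2823-3", "H"), ("2823-3", "HH"), ("2823-3", "HH"), ("X-1", "N")]
print(top_failures(results, annotations))
```

Because only counts of codes leave the function, the output is also a natural candidate for the de-identified summary shared with the team.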

Begin work on DREAM ontology mapping

Not funded by CD2H but related.
The HPO team has been contacted by a European radiology project with lots of data on radiology findings, with the goal of mapping these to HPO terms.

naming and privacy

Should this repo/project be renamed ehr2HPO rather than HPO2ehr?
And is there any reason it is private? The tracking repos must be public. Ideally the code is too ;-).

Write implementation protocol

  • draft by Peter
  • revisions by Amy Yates and Justin Ramsdill: implementation in FHIR/Epic system
  • discuss with Chris Chute.
