
mh_mapping_initiative's People

Contributors

anitacaron, matentzn, sbello

Forkers

anitacaron

mh_mapping_initiative's Issues

Capture curation rules for MP-HP mapping decisions

Can we get (1) a set of rules (under what conditions do you apply each mapping relation?) and (2) examples (when do you curate broad, narrow, close, etc.?) for when you are curating:

  • broad
  • narrow
  • exact
  • close
  • related

Some notes from our meetings

  • Neoplasm/tumor phenotypes are handled very differently (incidence vs. abnormal) -> related
  • Terms that are so human-specific that they just do not make sense for animals.
  • When is broad too broad?
  • Under what circumstances should we add new terms?

Review GXD mappings

http://www.informatics.jax.org/downloads/reports/MP_EMAPA.rpt

Terry H:

In 2017, I did a broad-scale mapping between MP, MA, EMAPA and Uberon terms, initially using the MA-EMAPA, Uberon-MA, Uberon-EMAPA and MP-Uberon associations already present in the MA, Uberon and MP files. The goal of this effort was to create links from MP to EMAPA via Uberon (http://www.informatics.jax.org/downloads/reports/MP_EMAPA.rpt), so the focus was on anatomical terms relevant to MP. I submitted a list of several hundred terms, many of which had been added to EMAPA more recently, to the Uberon group along with what I determined to be their Uberon mappings, and worked with Nicole Vasilevsky to have them added as xrefs.

There were many terms for which the MP-Uberon association, the Uberon-EMAPA association, or both did not exist. I have made some efforts to address this and to monitor updates, but not in a concerted way. Certainly my spreadsheets (which are what I used for the analysis) have not kept up with the status of all of the relevant ontologies.

2017: generated an alignment report for MP, Uberon, EMAPA and MA terms. (Note that there are 1:n and n:1 associations, as might be expected, especially for the phenotype terms.) It might be useful to have it rerun.
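The MP to EMAPA linking described above is essentially a join of two association tables on the Uberon ID. A minimal sketch, assuming both tables are available as (source, target) ID pairs; the function name and the toy IDs are hypothetical. It keeps 1:n and n:1 associations and reports the MP-Uberon pairs whose Uberon term has no EMAPA xref, which are the gaps mentioned above. MP terms with no Uberon xref at all never appear in the input, so they would need a separate check against the full MP term list.

```python
from collections import defaultdict


def compose_via_uberon(mp_to_uberon, uberon_to_emapa):
    """Chain MP->Uberon and Uberon->EMAPA pairs into MP->EMAPA links.

    Both inputs are iterables of (source_id, target_id) tuples. 1:n and n:1
    associations are kept, so one MP term may reach several EMAPA terms.
    Returns (links, gaps): links maps each MP ID to a sorted list of EMAPA IDs;
    gaps lists (MP ID, Uberon ID) pairs whose Uberon term has no EMAPA xref.
    """
    emapa_by_uberon = defaultdict(set)
    for uberon_id, emapa_id in uberon_to_emapa:
        emapa_by_uberon[uberon_id].add(emapa_id)

    links = defaultdict(set)
    gaps = []
    for mp_id, uberon_id in mp_to_uberon:
        targets = emapa_by_uberon.get(uberon_id)  # .get does not create entries
        if targets:
            links[mp_id].update(targets)
        else:
            # Missing Uberon-EMAPA association: a candidate for curation.
            gaps.append((mp_id, uberon_id))
    return {mp: sorted(t) for mp, t in links.items()}, gaps
```

Rerunning the report would then amount to reloading current xrefs from the ontology files and recomputing both outputs.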

Problems when inferring human gene-phenotype associations from model organism by sequence orthology

Model organisms, especially the laboratory mouse Mus musculus, provide useful knowledge about human diseases. I am studying human gene-HPO term annotations and want to utilize phenotype annotations of animal models to improve the prediction of HPO annotations of human genes.

However, I have run into a strange problem. Taking keratoconjunctivitis sicca as an example, the associated human genes and mouse genes are largely different:

If I map these mouse genes to their orthologous human genes, the intersection of two gene sets is empty. Why are the genes related to the same phenotype so different between human and model organism? Is there something wrong here?
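Three separate steps can each produce an empty intersection: projecting mouse genes onto human orthologs, comparing with the human set, and mouse genes silently dropped because they have no ortholog. A minimal sketch that keeps these apart, assuming the orthology table is passed in as a dict (hypothetical input, e.g. loaded from an orthology report); the function name and gene IDs are illustrative only.

```python
def ortholog_overlap(human_genes, mouse_genes, mouse_to_human):
    """Project mouse genes onto human orthologs and compare with human genes.

    mouse_to_human maps a mouse gene ID to an iterable of human ortholog IDs.
    Returns (projected, shared, unmapped):
      projected: human orthologs of the mouse genes
      shared:    projected genes that are also annotated in human
      unmapped:  mouse genes with no human ortholog in the table
    """
    projected, unmapped = set(), set()
    for gene in mouse_genes:
        orthologs = set(mouse_to_human.get(gene, ()))
        if orthologs:
            projected |= orthologs
        else:
            unmapped.add(gene)
    return projected, projected & set(human_genes), unmapped
```

Reporting `unmapped` alongside `shared` makes it easier to tell whether an empty overlap comes from the annotations themselves or from the ID mapping step.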

Moreover, when I check the human and mouse genes related to DOID:12895 (keratoconjunctivitis sicca), they are:

Some genes here are inferred from sequence orthology by RGD. But it is strange that the annotated genes here are quite different from those in HP/MP annotations. Why are the genes associated with the same phenotype so different? Is it feasible/reliable to infer gene-phenotype associations from sequence orthology like what RGD does here?

Use case for predicting human phenotypes of unannotated genes based on mouse

See obophenotype/upheno#572 for the original ticket.

The author (@xianshu1994) is working on developing an algorithm to predict HPO annotations of unannotated human genes. They want to transfer the knowledge of phenotype data in mouse or other model organisms to human in order to facilitate the HPO annotation prediction; similar to what other groups here are trying to achieve.

This ticket is a reminder to loop them in when we have a better answer!

Add score for how close two terms are when using broad/narrow/close predicates

At the last meeting (8/3/23), @matentzn proposed adding a score to the manual mappings for how close two terms are when mapped using broad/narrow/close. I am using this ticket to write up initial thoughts and track proposals for implementing this.

My initial proposal is to estimate how often the given match would result in inclusion of unwanted data when traversing from the narrower term to the broad term. Basically, how much noise is inherent in the match. The consideration has to be from narrow to broad as all annotations to the narrower term are or should be applicable to the broad term. If this is not the case then you should use related.

The proposed scale is 0-1, where 1 is an exact match; you should never use 1, as those mappings should use the skos:exactMatch predicate.

I've essentially been treating close as almost, but not quite, an exact match, so these should have a high score on the scale.

Given that this is at best a rough estimate, I'm going to stick with 1 decimal place for now. So a score of:
  • 0.9 = little noise, almost everything should be useful; I think these should mostly be skos:closeMatch
  • 0.5 = moderately noisy; should be broad/narrow/related
  • 0.1 = very noisy; still broad/narrow, but several steps away from each other in the hierarchies of the ontologies. I often question the value of even making the mapping, and I would not make 'related' mappings that were this noisy
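The rules in this proposal (score in 0-1, one decimal place, 1 reserved for skos:exactMatch, no very noisy related mappings) can be enforced when a mapping is recorded. A minimal sketch, assuming a hypothetical `closeness_score` field; this is not an existing SSSOM slot or tool, just an illustration of the proposed constraints.

```python
from dataclasses import dataclass

# Predicates the proposal applies the score to; exact matches are excluded.
SCORED_PREDICATES = {
    "skos:closeMatch", "skos:broadMatch", "skos:narrowMatch", "skos:relatedMatch",
}


@dataclass(frozen=True)
class ScoredMapping:
    subject_id: str
    predicate_id: str
    object_id: str
    closeness_score: float  # hypothetical field name; 0-1, one decimal place

    def __post_init__(self):
        if self.predicate_id not in SCORED_PREDICATES:
            raise ValueError(
                f"{self.predicate_id} is not scored; use one of {sorted(SCORED_PREDICATES)}"
            )
        # Round to one decimal first, so e.g. 0.97 is caught as an exact match.
        score = round(self.closeness_score, 1)
        if not 0.0 <= score < 1.0:
            raise ValueError("score must be in [0, 1); a score of 1 means skos:exactMatch")
        object.__setattr__(self, "closeness_score", score)

    def warnings(self):
        """Flag combinations the proposal discourages."""
        notes = []
        if self.predicate_id == "skos:relatedMatch" and self.closeness_score <= 0.1:
            notes.append("related mapping is too noisy to be worth making")
        return notes
```

Rounding before the range check is a deliberate choice: otherwise a value such as 0.97 would pass validation and then be stored as 1.0.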

Incorrect MP IDs in the mp_hp_pat_impc.sssom.tsv file

File: mp_hp_pat_impc.sssom.tsv

  • Invalid MP ID: MP:000095 abnormal spinal cord morphology skos:closeMatch HP:0002143 Abnormality of the spinal cord semapv:LexicalMatching
    The ID is missing its final 5; it should be MP:0000955.
  • Invalid MP ID: MP:0007180 decreased brown fat amount skos:closeMatch HP:0005995 Decreased adipose tissue around neck semapv:LogicalReasoning
    The ID should be MP:0001780.
  • Invalid MP ID: MP:0001782 increased white adipose tissue amount skos:closeMatch HP:0008993 Increased intraabdominal fat semapv:LogicalReasoning
    The ID should be MP:0000008.
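Errors like these fall into two kinds: malformed IDs (MP:000095) and well-formed IDs pointing at the wrong term (MP:0007180). A minimal sketch of a check for both, assuming the MP IDs sit in the SSSOM `subject_id` column and that an {ID: label} dict from the current MP release is supplied (hypothetical input); the function name is illustrative.

```python
import csv
import re

MP_CURIE = re.compile(r"^MP:\d{7}$")  # MP IDs use seven zero-padded digits


def check_mp_subjects(sssom_tsv_text, mp_labels=None):
    """Return (subject_id, subject_label, problem) for suspicious MP subjects.

    sssom_tsv_text: SSSOM TSV content; lines starting with '#' are the metadata block.
    mp_labels: optional {MP ID: label} dict used to catch well-formed but wrong IDs.
    """
    lines = [ln for ln in sssom_tsv_text.splitlines() if ln and not ln.startswith("#")]
    problems = []
    for row in csv.DictReader(lines, delimiter="\t"):
        mp_id, label = row["subject_id"], row.get("subject_label", "")
        if not MP_CURIE.match(mp_id):
            problems.append((mp_id, label, "malformed ID"))
        elif mp_labels is not None and mp_labels.get(mp_id) != label:
            problems.append((mp_id, label, "label does not match MP"))
    return problems
```

The label comparison is strict string equality, so it would also flag rows whose `subject_label` is outdated even when the ID is right; that is arguably useful in a QC step.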

Unable to find appropriate mappings in OXO

Reference use case:


Problem description:

I find the term “high density lipoprotein cholesterol measurement” among the traits in the GWAS output for “cardiovascular disease”, which the GWAS Catalog tells me is EFO:0004612

Mapping EFO:0004612 using OXO: https://www.ebi.ac.uk/spot/oxo/search does not find:

  • MP abnormal circulating HDL cholesterol level MP:0000184
  • HP Increased HDL cholesterol concentration HP:0012184
  • HP Abnormal HDL cholesterol concentration HP:0031888

I am not sure whether this is a problem inherent to OXO or whether the mappings are lacking.
Some time ago I looked for other mapping tools, but I am not sure there is anything better out there.
Searching OLS for “high density lipoprotein cholesterol” (https://www.ebi.ac.uk/ols/search?q=high-density+lipoprotein+cholesterol&groupField=iri&start=0), I can filter by MP and get: https://www.ebi.ac.uk/ols/search?q=high-density+lipoprotein+cholesterol&groupField=iri&start=0&ontology=mp
For efficiency, we would need an automated way to conduct these mappings.
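One simple automated starting point is lexical matching on labels, of the kind recorded as semapv:LexicalMatching in the SSSOM files above. A minimal sketch using token overlap (Jaccard similarity); the abbreviation table, the list of "framing" words and the threshold are all assumptions for illustration, and a real pipeline would draw synonyms from the ontologies themselves.

```python
import re

# Hypothetical abbreviation table; in practice synonyms would come from the ontologies.
ABBREVIATIONS = {"hdl": "high density lipoprotein", "ldl": "low density lipoprotein"}
# Assumed list of words that frame a trait or phenotype rather than name the entity.
FRAMING_WORDS = {"abnormal", "increased", "decreased", "circulating", "level",
                 "concentration", "measurement", "of", "the"}


def content_tokens(label):
    """Lowercase, split on non-alphanumerics, expand abbreviations, drop framing words."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", label.lower()):
        tokens.extend(ABBREVIATIONS.get(word, word).split())
    return {t for t in tokens if t not in FRAMING_WORDS}


def jaccard(a, b):
    ta, tb = content_tokens(a), content_tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def candidate_matches(query_label, candidates, threshold=0.5):
    """Rank {id: label} candidates by token overlap with query_label."""
    scored = [(cid, jaccard(query_label, lbl)) for cid, lbl in candidates.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])
```

With the three MP/HP terms listed above as candidates, all of them match "high density lipoprotein cholesterol measurement" equally, precisely because words like "increased" and "abnormal" are discarded. That is what lets the trait find the phenotypes, but it also means the predicate (broad, narrow, close) still has to be decided by a curator.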

Add Machine learning based cross species mapping table

We want to add a table with all the cross-species phenotype matches based on graph machine learning.

  • Input graph: phenio

  • Learning set:

    • known equivalencies?
    • An interesting negative edge set (known false predications) could be anything within a prefix space (MP-MP), but I think this may screw up the algorithm
  • Output:

    • a table of predicted cross-species equivalencies (cut off at some threshold of your choosing)
    • a table of known equivalencies (training set) which are not predicted well by the final model (for QC purposes, and because it is interesting)

The equivalencies we are interested in are those where the subject and the object are from different prefix spaces. The prefix spaces of interest are: HP, MP, ZP, XPO, WBPhenotype, DPO (FBcv), PLANP, FYPO, DDPHENO, MGPO (the order is from most to least important for the time being).
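The two output tables could be produced from the model's scored pairs by a post-processing step like the one below. A minimal sketch, assuming predictions arrive as (subject, object, score) tuples and that DPO terms carry the FBcv prefix; the function names and the ordering rule (most important prefix first, then highest score) are illustrative choices, not part of the plan.

```python
# Prefix spaces from the issue, most to least important; DPO IDs are assumed to use FBcv.
PREFIX_PRIORITY = ["HP", "MP", "ZP", "XPO", "WBPhenotype", "FBcv",
                   "PLANP", "FYPO", "DDPHENO", "MGPO"]
RANK = {p: i for i, p in enumerate(PREFIX_PRIORITY)}


def prefix(curie):
    return curie.split(":", 1)[0]


def cross_species_table(predictions, threshold):
    """Keep predicted (subject, object, score) equivalences that cross prefix spaces."""
    rows = []
    for subj, obj, score in predictions:
        ps, po = prefix(subj), prefix(obj)
        if ps == po or ps not in RANK or po not in RANK or score < threshold:
            continue  # same prefix space, prefix not of interest, or below the cut-off
        rows.append((subj, obj, score))
    # Most important prefix involved first, then highest score.
    rows.sort(key=lambda r: (min(RANK[prefix(r[0])], RANK[prefix(r[1])]), -r[2]))
    return rows


def poorly_predicted(known_pairs, predictions, threshold):
    """Training equivalences the model scores below threshold (in either direction)."""
    scores = {(s, o): sc for s, o, sc in predictions}
    return [(s, o) for s, o in known_pairs
            if max(scores.get((s, o), 0.0), scores.get((o, s), 0.0)) < threshold]
```

Checking both orientations in `poorly_predicted` avoids counting a known equivalence as missed just because the model emitted it as (object, subject).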

First draft for trait to phenotype mappings

In recent months we have migrated OBA to a modern infrastructure, so we should be able to create a draft mapping based on the following process:

  • @rays22 Publish a VT-OBA mapping in SSSOM here and share with Elissa's team [oba_vt.sssom.tsv]
  • Elissa's team will make sure that all relevant VT codes are present
  • @rays22 Merge OBA, HP and MP (robot merge) and run the reasoner. Run a ROBOT (SPARQL) query to determine all VT-MP/HP mappings (all of these are necessarily narrow from OBA to HP/MP!) [oba_upheno.sssom.tsv]
  • @rays22 Share the SSSOM mappings with Elissa's team
  • @matentzn will share MONDO - OMIM mapping with Elissa's team
  • Elissa's team will merge HPOA annotations (HP to OMIM p2d's) with [oba_upheno.sssom.tsv] and [oba_vt.sssom.tsv] to produce [vt_mondo.sssom.tsv, vt_hp.sssom.tsv, vt_mp.sssom.tsv] and provide feedback here
  • If these "mappings" (not really mappings in the strict sense, but let's say vaguely mappings) are any good, then we can start thinking about adding other sources of p2d and streamlining this process for other trait-to-disease use cases like GWAS
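The merge step on Elissa's side amounts to following pairs through several tables in turn (VT to OBA, OBA to HP, HP to OMIM via HPOA, OMIM to MONDO). A minimal sketch, assuming each table has been reduced to (subject, object) ID pairs and that oba_vt.sssom.tsv is stored as OBA to VT, so it is inverted first; the function names and IDs are hypothetical. As noted above, the resulting pairs are associations rather than mappings in the strict SKOS sense.

```python
from collections import defaultdict


def chain(first, *hops):
    """Follow (subject, object) pairs through successive tables.

    Each table must already be oriented in the direction of travel
    (e.g. VT->OBA, OBA->HP, HP->OMIM, OMIM->MONDO).
    Returns the set of (start, end) pairs reachable through every hop.
    """
    current = set(first)
    for hop in hops:
        index = defaultdict(set)
        for s, o in hop:
            index[s].add(o)
        current = {(start, nxt) for start, mid in current for nxt in index.get(mid, ())}
    return current


def invert(pairs):
    """Swap subject and object, for tables stored in the opposite direction."""
    return [(o, s) for s, o in pairs]
```

Stopping after the OBA to HP hop gives a vt_hp-style table, while continuing through HPOA and the MONDO-OMIM mapping gives a vt_mondo-style table.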
