mapping-commons / mh_mapping_initiative

Repo to organise the mouse-human phenotype mapping initiative and reconcile resources.

Makefile 12.86% Jupyter Notebook 55.56% Python 31.59%

mh_mapping_initiative's Issues

Capture curation rules for MP-HP mapping decisions

Can we get a set of (1) rules (under what conditions do you apply a mapping relation) and (2) examples (can you give some examples for when you curate broad/narrow/close etc) for when you are curating:

broad
narrow
exact
close
related

Some notes from our meetings

Very different way of handling neoplasm/tumor phenotypes: incidence vs abnormal -> related
if there is something so human-specific that it just does not make sense for animals.
When is broad too broad?
Under what circumstances should we add new terms?

Review GXD mappings

http://www.informatics.jax.org/downloads/reports/MP_EMAPA.rpt

Terry H:

In 2017, I did a broad-scale mapping between MP, MA, EMAPA and Uberon terms, starting from the MA-EMAPA, Uberon-MA, Uberon-EMAPA and MP-Uberon associations already present in the MA, Uberon and MP files. The goal of this effort was to create links from MP to EMAPA using Uberon (http://www.informatics.jax.org/downloads/reports/MP_EMAPA.rpt), so the focus was on anatomical terms relevant to the MP. I submitted a list of several hundred terms, many of which had more recently been added to the EMAPA, to the Uberon group along with what I determined to be their Uberon mappings. I worked with Nicole Vasilevsky to have them added as xrefs.

There were many terms for which the MP-Uberon association, the Uberon-EMAPA association, or both did not exist. I have made some efforts to address this and to monitor updates, but not in a concerted way. Certainly my spreadsheets (which are what I used for the analysis) have not kept up with the status of all of the relevant ontologies.

2017: generated an alignment report for MP, Uberon, EMAPA and MA terms. (Note that there are 1:n and n:1 associations, as might be expected, especially for the phenotype terms.) It might be useful to have it rerun.

Files do not parse with the SSSOM parser

Missing mapping justification

see

mapping-commons/sssom#211
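
A quick way to find the offending rows before running the real parser is sketched below. It is only a rough check written with the standard library, not the SSSOM parser itself. It assumes the usual SSSOM/TSV layout (optional metadata lines starting with `#`, then a tab-separated table with a `mapping_justification` column). The function name and the sample text are made up for illustration; the sample rows reuse IDs that appear elsewhere on this tracker.

```python
import csv

def rows_missing_justification(sssom_text):
    """Return (line_number, subject_id, object_id) for rows without a justification."""
    # Keep the original line numbers so they can be reported, but skip
    # blank lines and the '#'-prefixed metadata block.
    numbered = [
        (number, line)
        for number, line in enumerate(sssom_text.splitlines(), start=1)
        if line.strip() and not line.startswith("#")
    ]
    if not numbered:
        return []
    reader = csv.DictReader((line for _, line in numbered), delimiter="\t")
    missing = []
    # numbered[0] is the header row, so data rows start at numbered[1].
    for (number, _), row in zip(numbered[1:], reader):
        if not (row.get("mapping_justification") or "").strip():
            missing.append((number, row.get("subject_id"), row.get("object_id")))
    return missing

# Illustrative input only; the metadata line is a placeholder.
SAMPLE = (
    "# mapping_set_id: https://example.org/sample.sssom.tsv\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "MP:0000955\tskos:closeMatch\tHP:0002143\tsemapv:LexicalMatching\n"
    "MP:0001780\tskos:closeMatch\tHP:0005995\t\n"
)

for problem in rows_missing_justification(SAMPLE):
    print(problem)
```

If the `mapping_justification` column is absent from the header altogether, every data row is reported.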

Problems when inferring human gene-phenotype associations from model organism by sequence orthology

Model organisms, especially the laboratory mouse Mus musculus, provide useful knowledge about human diseases. I am studying human gene-HPO term annotations and want to utilize phenotype annotations of animal models to improve the prediction of HPO annotations of human genes.

However, I find a strange problem. Taking keratoconjunctivitis sicca as an example, the associated human genes and mouse genes are largely different:

HP:0001097 There are 41 related human genes, such as AEBP1, B2M, BTNL2, etc. https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=HP:0001097&species=Human#annot
MP:0013466 There are only 3 mouse genes associated with keratoconjunctivitis sicca, i.e. Chd7, Ctnnb1, and Nrtn. https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=MP:0013466&species=Mouse#annot

If I map these mouse genes to their orthologous human genes, the intersection of two gene sets is empty. Why are the genes related to the same phenotype so different between human and model organism? Is there something wrong here?

Moreover, I checked the human and mouse genes associated with DOID:12895 (keratoconjunctivitis sicca). They are:

Human: CCL20, CCR5, IL6, MUC4, MUC5AC, NRTN, TNF https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=DOID:12895&species=Human#annot
Mouse: Ccl20, Ccr5, Il6, Muc4, Muc5ac, Nrtn, Tnf https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=DOID:12895&species=Mouse#annot

Some genes here are inferred from sequence orthology by RGD. But it is strange that the annotated genes here are quite different from those in the HP/MP annotations. Why are the genes associated with the same phenotype so different? Is it feasible and reliable to infer gene-phenotype associations from sequence orthology, as RGD does here?
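
The comparison described in this question can be reproduced roughly as follows. This is only a sketch: it assumes, for illustration, that upper-casing a mouse gene symbol gives the human ortholog symbol, which is a crude shortcut and not how RGD or any orthology resource actually assigns orthologs. The gene lists are the ones quoted above; only three of the 41 HP:0001097 genes are named in the question, so that set is incomplete.

```python
def naive_human_orthologs(mouse_symbols):
    # ASSUMPTION: symbol upper-casing stands in for a real orthology table.
    return {symbol.upper() for symbol in mouse_symbols}

# Gene sets as quoted in the question.
hp_human_listed = {"AEBP1", "B2M", "BTNL2"}   # 3 of the 41 genes for HP:0001097
mp_mouse = {"Chd7", "Ctnnb1", "Nrtn"}         # MP:0013466
doid_human = {"CCL20", "CCR5", "IL6", "MUC4", "MUC5AC", "NRTN", "TNF"}
doid_mouse = {"Ccl20", "Ccr5", "Il6", "Muc4", "Muc5ac", "Nrtn", "Tnf"}

mp_as_human = naive_human_orthologs(mp_mouse)

# Overlap between the phenotype-based sets (HP vs. MP).
print(sorted(mp_as_human & hp_human_listed))

# Overlap between the MP genes and the disease-based human set.
print(sorted(mp_as_human & doid_human))

# The two DOID lists are the same genes under this naive mapping.
print(naive_human_orthologs(doid_mouse) == doid_human)
```

Under this shortcut the two DOID lists are the same seven genes, while the MP gene set shares only NRTN with the DOID set.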

Migrate mapping commons to cookie cutter

@ehartley, can you try to migrate this mapping commons to the cookie cutter to see if it all works fine, and provide some instructions on how to update it moving forward?

Add phecode - HPO mappings

emcarthur/phecode-HPO-map#1

Use case for predicting human phenotypes of unannotated genes based on mouse

See obophenotype/upheno#572 for original ticket;

The author (@xianshu1994) is working on developing an algorithm to predict HPO annotations of unannotated human genes. They want to transfer the knowledge of phenotype data in mouse or other model organisms to human in order to facilitate the HPO annotation prediction; similar to what other groups here are trying to achieve.

This ticket is a reminder to loop them in when we have a better answer!

Add score for how close two terms are when using broad/narrow/close predicates

At the last meeting (8/3/23) @matentzn proposed adding a score to the manual mappings for how close two terms are when mapped using broad/narrow/close. I am using this ticket to write up initial thoughts and track proposals for implementing this.

My initial proposal is to estimate how often the given match would result in inclusion of unwanted data when traversing from the narrower term to the broad term. Basically, how much noise is inherent in the match. The consideration has to be from narrow to broad as all annotations to the narrower term are or should be applicable to the broad term. If this is not the case then you should use related.

Proposed scale is 0-1, where 1 is an exact match; you should never use 1 as those should use the skos:exactMatch predicate.

I've essentially been treating close as almost but not quite an exact match, so these should have a high score on the scale.

Given that this is at best a rough estimate I'm going to stick with 1 decimal place for now. So a score of:
0.9 = little noise, almost everything should be useful; I think these should mostly be skos:closeMatch
0.5 = moderately noisy, should be broad/narrow/related
0.1 = very noisy, still broad/narrow but several steps away from each other in the hierarchy of the ontologies; I often question the value of even making the mapping, and I would not make 'related' mappings that were this noisy
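
The constraints in this proposal (a 0-1 scale, never 1, one decimal place) could be checked with something like the sketch below. The function name and message wording are made up for illustration, and nothing here is part of SSSOM or of any agreed implementation; whether 'related' mappings should carry a score is left open, so the check does not restrict it.

```python
def check_closeness_score(predicate_id, score):
    """Return a list of problems with a proposed closeness score (empty if none)."""
    problems = []
    if predicate_id == "skos:exactMatch":
        # Per the proposal, exact matches are expressed by the predicate, not a score.
        problems.append("exact matches do not take a closeness score")
    if not 0 <= score < 1:
        problems.append("score must be at least 0 and below 1")
    elif round(score, 1) != score:
        problems.append("score should use one decimal place")
    return problems

print(check_closeness_score("skos:closeMatch", 0.9))
print(check_closeness_score("skos:broadMatch", 1.0))
print(check_closeness_score("skos:narrowMatch", 0.85))
```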

Try AgreementMakerLight

https://github.com/AgreementMakerLight/AML-Project

Incorrect MP IDs in the mp_hp_pat_impc.sssom.tsv file

File: mp_hp_pat_impc.sssom.tsv
Invalid MP ID: MP:000095 abnormal spinal cord morphology skos:closeMatch HP:0002143 Abnormality of the spinal cord semapv:LexicalMatching
The final 5 is missing; the ID should be MP:0000955
Invalid MP ID: MP:0007180 decreased brown fat amount skos:closeMatch HP:0005995 Decreased adipose tissue around neck semapv:LogicalReasoning
ID should be MP:0001780
Invalid MP ID: MP:0001782 increased white adipose tissue amount skos:closeMatch HP:0008993 Increased intraabdominal fat semapv:LogicalReasoning
ID should be MP:0000008
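
Errors like these fall into two kinds: a malformed ID (too few digits) and a well-formed ID that does not belong to the label. A minimal sketch of both checks follows. It assumes MP IDs are `MP:` followed by seven digits, and it uses a small label-to-ID table built from the corrections in this issue rather than from the MP ontology itself; in practice that table would have to be loaded from MP. The function and variable names are hypothetical.

```python
import re

MP_ID = re.compile(r"MP:\d{7}")

# Label -> ID pairs taken from the corrections listed in this issue.
EXPECTED = {
    "abnormal spinal cord morphology": "MP:0000955",
    "decreased brown fat amount": "MP:0001780",
    "increased white adipose tissue amount": "MP:0000008",
}

def check_subject(subject_id, subject_label, label_to_id=EXPECTED):
    """Return a list of problems for one subject_id / subject_label pair."""
    problems = []
    if not MP_ID.fullmatch(subject_id):
        problems.append(f"malformed ID {subject_id}")
    expected = label_to_id.get(subject_label)
    if expected is not None and expected != subject_id:
        problems.append(f"label '{subject_label}' belongs to {expected}")
    return problems

rows = [
    ("MP:000095", "abnormal spinal cord morphology"),
    ("MP:0007180", "decreased brown fat amount"),
    ("MP:0001782", "increased white adipose tissue amount"),
]
for subject_id, label in rows:
    print(subject_id, check_subject(subject_id, label))
```

The format check alone catches only the first row; the other two need the label lookup.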

Unable to find appropriate mappings in OXO

Reference use case:

Problem description:

I find the term “high density lipoprotein cholesterol measurement” among the traits in the GWAS output for “cardiovascular disease”, which the GWAS Catalog tells me is EFO:0004612

Mapping EFO:0004612 using OXO: https://www.ebi.ac.uk/spot/oxo/search does not find:

MP abnormal circulating HDL cholesterol level MP:0000184
HP Increased HDL cholesterol concentration HP:0012184
HP Abnormal HDL cholesterol concentration HP:0031888

Not sure if this is a problem inherent to OXO or the mappings are lacking?
Some time ago I looked for other mapping tools – not sure if there is anything better out there.
Searching in OLS for “high density lipoprotein cholesterol” https://www.ebi.ac.uk/ols/search?q=high-density+lipoprotein+cholesterol&groupField=iri&start=0
Can filter by MP and I get: https://www.ebi.ac.uk/ols/search?q=high-density+lipoprotein+cholesterol&groupField=iri&start=0&ontology=mp
For efficiency, we would need an automated way to conduct these mappings.
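
As a small first step toward automating the lookups above, the OLS search links can at least be generated rather than typed by hand. The sketch below only rebuilds the browser search URLs quoted in this issue from a query string and an optional ontology filter; it does not call any API or fetch results, and the function name is made up for illustration.

```python
from urllib.parse import urlencode

OLS_SEARCH = "https://www.ebi.ac.uk/ols/search"

def ols_search_url(query, ontology=None):
    """Build an OLS search-page URL with the same parameters as the links above."""
    params = {"q": query, "groupField": "iri", "start": 0}
    if ontology:
        params["ontology"] = ontology  # e.g. "mp" to restrict results to MP
    return f"{OLS_SEARCH}?{urlencode(params)}"

print(ols_search_url("high-density lipoprotein cholesterol"))
print(ols_search_url("high-density lipoprotein cholesterol", ontology="mp"))
```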

Add Machine learning based cross species mapping table

We want to add a table with all the cross-phenotype matches based on graph-machine learning.

Input graph: phenio
Learning set:
- known equivalencies?
- An interesting negative edge set (known false predications) could be anything within a single prefix space (MP-MP), but I think this may screw up the algorithm
Output:
- a table of predicted cross-species equivalencies (cut off at some threshold of your choosing)
- a table of known equivalencies (training set) which are not predicted well by the final model (for QC purposes, and because it is interesting)

The equivalencies we are interested in are those where the subject and the object are from different prefix spaces. The prefix spaces of interest are: HP, MP, ZP, XPO, WBPhenotype, DPO (FBcv), PLANP, FYPO, DDPHENO, MGPO (the order is from most to least important for the time being).
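
The post-processing described here (keep only pairs from different prefix spaces, then cut at a threshold) might look like the sketch below. The prediction scores are invented for illustration, the function names are hypothetical, and `FBcv` is used as the CURIE prefix for DPO on the assumption that this is what "DPO (FBcv)" means. Nothing here depends on a particular graph-learning library.

```python
# Prefix spaces of interest, in the priority order given above.
PREFIXES = ["HP", "MP", "ZP", "XPO", "WBPhenotype", "FBcv",
            "PLANP", "FYPO", "DDPHENO", "MGPO"]

def prefix_of(curie):
    return curie.split(":", 1)[0]

def cross_species_matches(predictions, threshold):
    """Keep (subject, object, score) rows that cross prefix spaces and pass the cut-off."""
    kept = []
    for subject, obj, score in predictions:
        subject_prefix, object_prefix = prefix_of(subject), prefix_of(obj)
        if subject_prefix not in PREFIXES or object_prefix not in PREFIXES:
            continue
        if subject_prefix == object_prefix:
            continue  # e.g. MP-MP pairs are not cross-species
        if score >= threshold:
            kept.append((subject, obj, score))
    # Highest-scoring predictions first.
    return sorted(kept, key=lambda row: row[2], reverse=True)

# Made-up scores; the term IDs are ones mentioned elsewhere on this tracker.
predictions = [
    ("MP:0013466", "HP:0001097", 0.92),
    ("MP:0000184", "HP:0031888", 0.71),
    ("MP:0000184", "MP:0013466", 0.88),
    ("MP:0000184", "HP:0001097", 0.12),
]
for row in cross_species_matches(predictions, threshold=0.5):
    print(row)
```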

First draft for trait to phenotype mappings

In recent months we have migrated OBA to a modern infrastructure, so we should be able to create a draft mapping based on the following process:

@rays22 Publish a VT-OBA mapping in SSSOM here and share with Elissa's team [oba_vt.sssom.tsv]
Elissa's team will make sure that all relevant VT codes are present
@rays22 Merge OBA, HP and MP (robot merge) and run the reasoner. Run a ROBOT (SPARQL) query to determine all VT-MP/HP mappings (all of these are necessarily narrow from OBA to HP/MP!) [oba_upheno.sssom.tsv]
@rays22 Share the SSSOM mappings with Elissa's team
@matentzn will share MONDO - OMIM mapping with Elissa's team
Elissa's team will merge HPOA annotations (HP to OMIM p2d's) with [oba_upheno.sssom.tsv] and [oba_vt.sssom.tsv] to produce [vt_mondo.sssom.tsv, vt_hp.sssom.tsv, vt_mp.sssom.tsv] and provide feedback here
If these "mappings" (not really mappings in the strict sense, but let's say vaguely mappings) are any good, then we can start thinking about adding other sources of p2d and streamlining this process for other trait to disease use cases like GWAS
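
The core of the process above is a join: VT-OBA mappings are chained with OBA-HP/MP mappings through the shared OBA term. A minimal sketch of that join is below. All IDs are placeholders invented for illustration, not real VT, OBA, HP or MP terms, and the sketch assumes the VT-OBA step is close enough to exact that the OBA-to-phenotype predicate can simply be carried over. The HPOA and MONDO steps are not covered.

```python
def compose(vt_to_oba, oba_to_pheno):
    """Chain VT->OBA and OBA->HP/MP mappings through the shared OBA ID."""
    by_oba = {}
    for oba_id, pheno_id, predicate in oba_to_pheno:
        by_oba.setdefault(oba_id, []).append((pheno_id, predicate))
    composed = []
    for vt_id, oba_id in vt_to_oba:
        for pheno_id, predicate in by_oba.get(oba_id, []):
            # Keep the intermediate OBA term so the chain stays traceable.
            composed.append((vt_id, predicate, pheno_id, oba_id))
    return composed

def split_by_prefix(rows):
    """Group composed rows by object prefix (e.g. vt_hp vs. vt_mp tables)."""
    tables = {}
    for row in rows:
        tables.setdefault(row[2].split(":", 1)[0], []).append(row)
    return tables

# Placeholder IDs only.
vt_to_oba = [("VT:0000001", "OBA:0000001"), ("VT:0000002", "OBA:0000002")]
oba_to_pheno = [
    ("OBA:0000001", "HP:0000001", "skos:narrowMatch"),
    ("OBA:0000001", "MP:0000001", "skos:narrowMatch"),
]

tables = split_by_prefix(compose(vt_to_oba, oba_to_pheno))
print(tables["HP"])
print(tables["MP"])
```

VT terms whose OBA term has no HP/MP mapping (the second placeholder row here) simply drop out, which is one way to spot gaps worth reporting back.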

Update registry yaml to latest SSSOM model

https://github.com/mapping-commons/mh_mapping_initiative/blob/master/registry.yml

Update this to the latest SSSOM spec for mapping registries
Request, on the SSSOM issue tracker, any elements that are not yet in the SSSOM data model