mapping-commons / mh_mapping_initiative
Repo to organise the mouse-human phenotype mapping initiative and reconcile resources.
Can we get a set of (1) rules (under what conditions do you apply a mapping relation) and (2) examples (can you give some examples for when you curate broad/narrow/close etc) for when you are curating:
Some notes from our meetings
http://www.informatics.jax.org/downloads/reports/MP_EMAPA.rpt
Terry H:
In 2017, I did a broad-scale mapping between MP, MA, EMAPA and Uberon terms, initially using the MA-EMAPA, Uberon-MA, Uberon-EMAPA and MP-Uberon associations already present in the MA, Uberon and MP files. The goal of this effort was to create links from MP to EMAPA using Uberon (http://www.informatics.jax.org/downloads/reports/MP_EMAPA.rpt), so the focus was on anatomical terms relevant to the MP. I submitted a list of several hundred terms, many of which had more recently been added to the EMAPA, to the Uberon group along with what I determined to be their Uberon mappings. I worked with Nicole Vasilevsky to have them added as xrefs.
There were many terms for which the MP-Uberon association, the Uberon-EMAPA association, or both did not exist. I have made some efforts to address this and to monitor updates, but not in a concerted way. Certainly my spreadsheets (which are what I used for the analysis) have not kept up with the status of all of the relevant ontologies.
2017: generated an alignment report for MP, Uberon, EMAPA and MA terms. (Note that there are 1:n and n:1 associations, as might be expected, especially for the phenotype terms.) It might be useful to have it rerun.
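If the report is rerun, the 1:n and n:1 associations mentioned above could be flagged automatically. The following is a minimal sketch, assuming the alignment is available as simple (subject, object) pairs; the identifiers in the example are placeholders, not real terms, and `mapping_cardinality` is a hypothetical helper.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Label each (subject, object) pair as 1:1, 1:n, n:1 or n:n."""
    objects_of = defaultdict(set)   # subject -> objects it maps to
    subjects_of = defaultdict(set)  # object -> subjects mapped to it
    for subj, obj in pairs:
        objects_of[subj].add(obj)
        subjects_of[obj].add(subj)

    labels = {}
    for subj, obj in pairs:
        left = "1" if len(subjects_of[obj]) == 1 else "n"
        right = "1" if len(objects_of[subj]) == 1 else "n"
        labels[(subj, obj)] = f"{left}:{right}"
    return labels


# Placeholder identifiers, for illustration only.
example_pairs = [
    ("MP:a", "UBERON:x"),
    ("MP:a", "UBERON:y"),  # one phenotype term, two anatomy terms
    ("MP:b", "UBERON:z"),
    ("MP:c", "UBERON:z"),  # two phenotype terms, one anatomy term
    ("MP:d", "UBERON:w"),
]
example_labels = mapping_cardinality(example_pairs)
for pair, label in example_labels.items():
    print(pair, label)
```

Here "1:n" means one subject maps to several objects, and "n:1" means several subjects share one object.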
Model organisms, especially the laboratory mouse Mus musculus, provide useful knowledge about human diseases. I am studying human gene-HPO term annotations and want to utilize phenotype annotations of animal models to improve the prediction of HPO annotations of human genes.
However, I find a strange problem. Taking keratoconjunctivitis sicca as an example, the associated human genes and mouse genes are largely different:
HP:0001097 There are 41 related human genes, such as AEBP1, B2M, BTNL2, etc. https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=HP:0001097&species=Human#annot
MP:0013466 There are only 3 mouse genes associated with keratoconjunctivitis sicca, i.e. Chd7, Ctnnb1, and Nrtn. https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=MP:0013466&species=Mouse#annot
If I map these mouse genes to their orthologous human genes, the intersection of two gene sets is empty. Why are the genes related to the same phenotype so different between human and model organism? Is there something wrong here?
Moreover, I checked the human and mouse genes related to DOID:12895 (keratoconjunctivitis sicca); they are:
Human: CCL20, CCR5, IL6, MUC4, MUC5AC, NRTN, TNF https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=DOID:12895&species=Human#annot
Mouse: Ccl20, Ccr5, Il6, Muc4, Muc5ac, Nrtn, Tnf https://rgd.mcw.edu/rgdweb/ontology/annot.html?acc_id=DOID:12895&species=Mouse#annot
Some genes here are inferred from sequence orthology by RGD. But it is strange that the annotated genes here are quite different from those in HP/MP annotations. Why are the genes associated with the same phenotype so different? Is it feasible/reliable to infer gene-phenotype associations from sequence orthology like what RGD does here?
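The comparison described above can be sketched as a set overlap after mapping mouse genes to human orthologs. This is only an illustration: the gene lists are the ones quoted in this ticket, while the `mouse_to_human` table is a hand-written assumption standing in for a real orthology resource, and `to_human` and `jaccard` are hypothetical helper names.

```python
# Mouse genes annotated to MP:0013466, as listed in this ticket.
mp_mouse_genes = {"Chd7", "Ctnnb1", "Nrtn"}

# Human genes annotated to DOID:12895, as listed in this ticket.
doid_human_genes = {"CCL20", "CCR5", "IL6", "MUC4", "MUC5AC", "NRTN", "TNF"}

# Assumption: a hand-written 1:1 orthology table for illustration only.
# A real analysis would load orthologs from a curated resource.
mouse_to_human = {"Chd7": "CHD7", "Ctnnb1": "CTNNB1", "Nrtn": "NRTN"}


def to_human(mouse_genes, orthologs):
    """Map mouse gene symbols to human symbols, dropping unmapped genes."""
    return {orthologs[g] for g in mouse_genes if g in orthologs}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


mp_as_human = to_human(mp_mouse_genes, mouse_to_human)
shared = mp_as_human & doid_human_genes
print("shared genes:", sorted(shared))
print("Jaccard similarity:", round(jaccard(mp_as_human, doid_human_genes), 2))
```

The same two helpers could be applied to the full HP:0001097 gene list, which is only partially quoted in this ticket.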
@ehartley, can you try to migrate this mapping commons to the cookie cutter to see if it all works fine, and provide some instructions on how to update it moving forward?
See obophenotype/upheno#572 for the original ticket.
The author (@xianshu1994) is working on developing an algorithm to predict HPO annotations of unannotated human genes. They want to transfer the knowledge of phenotype data in mouse or other model organisms to human in order to facilitate the HPO annotation prediction; similar to what other groups here are trying to achieve.
This ticket is a reminder to loop them in when we have a better answer!
At the last meeting (8/3/23) @matentzn proposed adding a score to the manual mappings for how close two terms are when mapped using broad/narrow/close. Using this ticket to write up initial thoughts and track proposals for implementing this.
My initial proposal is to estimate how often the given match would result in inclusion of unwanted data when traversing from the narrower term to the broad term. Basically, how much noise is inherent in the match. The consideration has to be from narrow to broad as all annotations to the narrower term are or should be applicable to the broad term. If this is not the case then you should use related.
Proposed scale is 0-1, where 1 is an exact match; you should never use 1 as those should use the skos:exactMatch predicate.
I've essentially been treating close as almost but not quite an exact match, so these should have a high score on the scale.
Given that this is at best a rough estimate I'm going to stick with 1 decimal place for now. So a score of:
0.9 = little noise, almost everything should be useful; I think these should be mostly skos:closeMatch
0.5 = moderately noisy, should be broad/narrow/related
0.1 = very noisy, still broad/narrow but several steps away from each other in the hierarchy of the ontologies; I often question the value of even making the mapping, and I would not make 'related' mappings that were this noisy
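A curated file could be checked against this proposal with something like the sketch below. It is not an agreed implementation: `check_score` is a hypothetical helper, and the cut-offs used to flag a low-scoring close match (0.5 or below) and a very noisy related match (0.1 or below) are my reading of the scale above, not settled rules.

```python
CLOSE = "skos:closeMatch"
RELATED = "skos:relatedMatch"


def check_score(predicate, score):
    """Return a list of problems with a proposed closeness score."""
    problems = []
    if not 0.0 <= score <= 1.0:
        return ["score must be between 0 and 1"]
    if round(score, 1) != score:
        problems.append("use one decimal place")
    if score == 1.0:
        # A score of 1 means exact; the proposal says to use
        # skos:exactMatch instead of scoring such a mapping.
        problems.append("score 1 is an exact match; use skos:exactMatch")
    # Assumed cut-offs, derived from the 0.9 / 0.5 / 0.1 descriptions.
    if predicate == CLOSE and score <= 0.5:
        problems.append("close matches are expected to score high")
    if predicate == RELATED and score <= 0.1:
        problems.append("too noisy to be worth a related mapping")
    return problems


print(check_score("skos:closeMatch", 0.9))
print(check_score("skos:relatedMatch", 0.1))
print(check_score("skos:broadMatch", 1.0))
```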
File: mp_hp_pat_impc.sssom.tsv
Invalid MP ID: MP:000095 abnormal spinal cord morphology skos:closeMatch HP:0002143 Abnormality of the spinal cord semapv:LexicalMatching
The final 5 is missing from the ID; it should be MP:0000955
Invalid MP ID: MP:0007180 decreased brown fat amount skos:closeMatch HP:0005995 Decreased adipose tissue around neck semapv:LogicalReasoning
ID should be MP:0001780
Invalid MP ID: MP:0001782 increased white adipose tissue amount skos:closeMatch HP:0008993 Increased intraabdominal fat semapv:LogicalReasoning
ID should be MP:0000008
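Some of these errors could be caught before review with a simple format check. The sketch below assumes MP and HP identifiers always have seven digits after the prefix; `malformed_ids` is a hypothetical helper. It only catches malformed IDs such as MP:000095. An ID that is well-formed but wrong, such as MP:0007180, would still need a lookup against the ontology itself.

```python
import re

# Assumption: MP and HP CURIEs are a prefix followed by exactly 7 digits.
CURIE_PATTERN = re.compile(r"^(MP|HP):\d{7}$")


def malformed_ids(ids):
    """Return the IDs that do not look like valid MP/HP CURIEs."""
    return [i for i in ids if not CURIE_PATTERN.match(i)]


# IDs taken from the rows reported above.
reported_ids = ["MP:000095", "MP:0000955", "MP:0007180", "HP:0002143"]
bad_ids = malformed_ids(reported_ids)
print(bad_ids)
```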
Reference use case:
Problem description:
I find the term “high density lipoprotein cholesterol measurement” among the traits in the GWAS output for “cardiovascular disease”, which the GWAS Catalog tells me is EFO:0004612
Mapping EFO:0004612 using OXO: https://www.ebi.ac.uk/spot/oxo/search does not find:
Not sure if this is a problem inherent to OXO or the mappings are lacking?
Some time ago I looked for other mapping tools – not sure if there is anything better out there.
Searching in OLS for “high density lipoprotein cholesterol” https://www.ebi.ac.uk/ols/search?q=high-density+lipoprotein+cholesterol&groupField=iri&start=0
Can filter by MP and I get: https://www.ebi.ac.uk/ols/search?q=high-density+lipoprotein+cholesterol&groupField=iri&start=0&ontology=mp
For efficiency, we would need an automated way to conduct these mappings.
We want to add a table with all the cross-phenotype matches based on graph-machine learning.
Input graph: phenio
Learning set:
Output:
The equivalencies we are interested in are those where the subject and the object are from different prefix spaces. The prefix spaces of interest are: HP, MP, ZP, XPO, WBPhenotype, DPO (FBcv), PLANP, FYPO, DDPHENO, MGPO (the order is from most to least important for the time being).
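The filter described above can be sketched as follows, assuming each predicted equivalence is a (subject, object) pair of CURIEs. Using `FBcv` as the prefix for DPO terms follows the parenthetical above and should be checked against the actual data; `cross_prefix_pairs` is a hypothetical helper, and the example pairs reuse identifiers mentioned elsewhere in these tickets purely as test input.

```python
# Prefix spaces of interest, most to least important.
# Assumption: DPO terms carry the FBcv prefix.
PREFIXES = ["HP", "MP", "ZP", "XPO", "WBPhenotype", "FBcv",
            "PLANP", "FYPO", "DDPHENO", "MGPO"]
PREFIX_SET = set(PREFIXES)


def prefix(curie):
    """Return the prefix part of a CURIE such as 'HP:0001097'."""
    return curie.split(":", 1)[0]


def cross_prefix_pairs(pairs):
    """Keep pairs whose two terms come from different prefixes of interest."""
    kept = []
    for subj, obj in pairs:
        ps, po = prefix(subj), prefix(obj)
        if ps in PREFIX_SET and po in PREFIX_SET and ps != po:
            kept.append((subj, obj))
    return kept


candidate_pairs = [
    ("MP:0000955", "HP:0002143"),   # different prefixes: kept
    ("HP:0001097", "HP:0002143"),   # same prefix: dropped
    ("EFO:0004612", "MP:0000955"),  # EFO is not in the list: dropped
    ("MP:0013466", "HP:0001097"),   # different prefixes: kept
]
kept_pairs = cross_prefix_pairs(candidate_pairs)
print(kept_pairs)
```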
In recent months we have migrated OBA to a modern infrastructure, so we should be able to create a draft mapping based on the following process:
oba_vt.sssom.tsv
narrow from OBA to HP/MP!) oba_upheno.sssom.tsv
oba_upheno.sssom.tsv and oba_vt.sssom.tsv to produce vt_mondo.sssom.tsv, vt_hp.sssom.tsv, vt_mp.sssom.tsv and provide feedback here
https://github.com/mapping-commons/mh_mapping_initiative/blob/master/registry.yml
Update this to the latest SSSOM spec for mapping registries
request on sssom issue tracker all elements not in the sssom data model