mapping-commons / disease-mappings
Repo to host disease ontology mappings
License: Creative Commons Zero v1.0 Universal
Let's focus on MONDO/ICD10 related ones for now.
The idea is to figure out a clear recipe with which we can determine a match between two phenotypes or between two diseases.
@sabrinatoro Can you help me with that? I would like to capture all the possible mapping rules that can lead to a mapping. This does not include your fine-grained work on distinguishing when to do "exact" vs "narrow" that you captured in your ICD10 work - just the general "thought processes" that can be applied to determine whether a mapping (exact or otherwise) holds.
When matching diseases, potentially across species, the following matching disease rules (MDR) can be applied:
http://w3id.org/sssom/commons/disease/[a-z][a-z0-9-]+.sssom.tsv
i.e. no underscores, upper case, or non-ascii characters in the id part.
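A minimal sketch of how the ID convention above could be checked, assuming the pattern is meant as a regular expression in which the dots are literal (they are escaped below). The function name `is_valid_mapping_set_id` and the example IDs are hypothetical and only serve as illustration.

```python
import re

# The ID pattern quoted above, with the literal dots escaped (an assumption
# about the intended meaning of the pattern).
MAPPING_SET_ID = re.compile(
    r"http://w3id\.org/sssom/commons/disease/[a-z][a-z0-9-]+\.sssom\.tsv"
)


def is_valid_mapping_set_id(mapping_set_id: str) -> bool:
    """Return True if the whole string matches the mapping set ID pattern."""
    return MAPPING_SET_ID.fullmatch(mapping_set_id) is not None


if __name__ == "__main__":
    base = "http://w3id.org/sssom/commons/disease/"
    # Hypothetical IDs, used only to exercise the check.
    for candidate in ("mondo-icd10cm", "mondo_icd10cm", "Mondo-icd10cm"):
        url = f"{base}{candidate}.sssom.tsv"
        print(url, is_valid_mapping_set_id(url))
```

Using `fullmatch` rather than `search` means trailing or leading characters are rejected as well, which seems closest to the "no underscores, upper case, or non-ascii characters" intent.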
To be continued
Implement goal in the Makefile that takes as an input a set of mappings, then
While we are still working on propagating license statements, can we use
license: https://github.com/mapping-commons/mapping-commons.github.io/blob/main/docs/original_license_applies.md
as a default license for all mapping sets in dmc (disease mapping commons) that do not have one already?
We need a way to understand how far we are along the mapping process and how far we still need to go.
- unmapped-in-scope: unmapped, but in scope (it is a disease)
- unmapped-excluded: unmapped, but excluded (because out of scope)
- mapped.
- unmapped-in-scope, unmapped-excluded and mapped add up to 100%.
- Each code gets a category that is either unmapped-excluded, mapped or unmapped-in-scope

### Example Table:
code | category |
---|---|
ICD10CM:ABC | unmapped-in-scope |
ICD10CM:A12 | unmapped-excluded |
Makefile goal here:
analysis/icd10cm-mapping-progress.tsv: mirror/icd10cm.owl exclusions/icd10.tsv mappings/icd10cm.sssom.tsv
	python........
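The categorisation described above could be sketched roughly as follows. This is not the script behind the Makefile goal, only an illustration under stated assumptions: the inputs are already available as plain collections of codes (extracting them from the OWL mirror, the exclusions file and the SSSOM file is left out), and a code that is both mapped and excluded is counted as mapped. The function names `categorise` and `progress` are hypothetical.

```python
from collections import Counter


def categorise(all_codes, excluded_codes, mapped_codes):
    """Assign exactly one category to every code, so the shares sum to 100%."""
    excluded = set(excluded_codes)
    mapped = set(mapped_codes)
    rows = []
    for code in all_codes:
        if code in mapped:
            # Assumption: a mapping takes precedence over an exclusion.
            category = "mapped"
        elif code in excluded:
            category = "unmapped-excluded"
        else:
            category = "unmapped-in-scope"
        rows.append((code, category))
    return rows


def progress(rows):
    """Return the percentage of codes in each category."""
    total = len(rows)
    if total == 0:
        return {}
    counts = Counter(category for _, category in rows)
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    # ICD10CM:B34 is a made-up code, added so that all three categories occur.
    rows = categorise(
        all_codes=["ICD10CM:ABC", "ICD10CM:A12", "ICD10CM:B34"],
        excluded_codes=["ICD10CM:A12"],
        mapped_codes=["ICD10CM:B34"],
    )
    print("code\tcategory")
    for code, category in rows:
        print(f"{code}\t{category}")
    print(progress(rows))
```

Because every code receives exactly one category, the three percentages add up to 100% by construction.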
mirror/do.owl:
)ntbt
for example)
Provide a list of URLs that you think hold good quality ICD10CM mappings, e.g.:
We should start thinking about capturing confidence in mappings from the side of the registry.
My suggestion is to have a separate element on the registry for each of the referenced mapping sets:
mapping_set_id: x:y
registry_confidence: 0.5
which captures how much we "trust" a mapping set. We can also specify a default_registry_confidence directly in the registry metadata, which captures the confidence for all registered mapping sets that do not have a registry_confidence value. I would suggest setting it to 0.75 or something similar.
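The fallback behaviour proposed here could look roughly like this. It is a sketch only: it assumes the registry and its mapping set references have been loaded into plain dictionaries using the key names from the proposal, and the helper name `effective_confidence` is hypothetical.

```python
from typing import Optional


def effective_confidence(mapping_set_ref: dict, registry: dict) -> Optional[float]:
    """Return the confidence the registry assigns to one mapping set.

    A registry_confidence on the mapping set reference wins; otherwise the
    registry-wide default_registry_confidence is used. None is returned if
    neither is present.
    """
    if "registry_confidence" in mapping_set_ref:
        return mapping_set_ref["registry_confidence"]
    return registry.get("default_registry_confidence")


if __name__ == "__main__":
    registry = {"default_registry_confidence": 0.75}
    with_value = {"mapping_set_id": "x:y", "registry_confidence": 0.5}
    # "x:z" is a made-up ID for a mapping set without its own confidence.
    without_value = {"mapping_set_id": "x:z"}
    print(effective_confidence(with_value, registry))
    print(effective_confidence(without_value, registry))
```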
From Mondo:
https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html
This should be realised as an sssom-py extension, analogous to:
If this turns out to be cumbersome, a standalone Python script here in mapping commons will do as well!
See here: monarch-initiative/mondo#3122
These will then naturally be taken into account by a future "boomer" run (#9). Once the boomer pipeline is implemented, reconciled mappings will go straight back into mondo.owl.
No need to try and use sssom-py mapping extraction on Mondo!
So far I get this list of warnings during generation of mondo.sssom.tsv:
WARNING:root:ICD9CM is used in the data frame but does not exist in prefix map
WARNING:root:Orphanet is used in the data frame but does not exist in prefix map
WARNING:root:Wikidata is used in the data frame but does not exist in prefix map
WARNING:root:SCITD is used in the data frame but does not exist in prefix map
WARNING:root:DERMO is used in the data frame but does not exist in prefix map
WARNING:root:PO_GIT is used in the data frame but does not exist in prefix map
WARNING:root:MTH is used in the data frame but does not exist in prefix map
WARNING:root:MEDGEN is used in the data frame but does not exist in prefix map
WARNING:root:KUPO is used in the data frame but does not exist in prefix map
WARNING:root:Reactome is used in the data frame but does not exist in prefix map
WARNING:root:HGNC is used in the data frame but does not exist in prefix map
WARNING:root:CSP is used in the data frame but does not exist in prefix map
WARNING:root:GC_ID is used in the data frame but does not exist in prefix map
WARNING:root:SUBSET_SIREN is used in the data frame but does not exist in prefix map
WARNING:root:url is used in the data frame but does not exist in prefix map
WARNING:root:LOINC is used in the data frame but does not exist in prefix map
WARNING:root:NDFRT is used in the data frame but does not exist in prefix map
WARNING:root:IMDRF is used in the data frame but does not exist in prefix map
WARNING:root:ICDO is used in the data frame but does not exist in prefix map
WARNING:root:OMOP is used in the data frame but does not exist in prefix map
WARNING:root:MeSH is used in the data frame but does not exist in prefix map
WARNING:root:ICD9 is used in the data frame but does not exist in prefix map
WARNING:root:GARD is used in the data frame but does not exist in prefix map
WARNING:root:COHD is used in the data frame but does not exist in prefix map
WARNING:root:Fyler is used in the data frame but does not exist in prefix map
WARNING:root:ICD11 is used in the data frame but does not exist in prefix map
WARNING:root:MEDDRA is used in the data frame but does not exist in prefix map
WARNING:root:Wikipedia is used in the data frame but does not exist in prefix map
WARNING:root:GTR is used in the data frame but does not exist in prefix map
WARNING:root:CALOHA is used in the data frame but does not exist in prefix map
WARNING:root:ONCOTREE is used in the data frame but does not exist in prefix map
WARNING:root:NIFSTD is used in the data frame but does not exist in prefix map
WARNING:root:EPCC is used in the data frame but does not exist in prefix map
Is there an automated way of deducing prefix_maps, @matentzn?
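As a partial step towards that, the prefixes that are used but undeclared can at least be collected automatically. The sketch below is not part of sssom-py; `missing_prefixes` is a hypothetical helper that assumes the identifiers are CURIEs of the form `PREFIX:local_id`. It only lists the missing prefixes; the URI expansion for each one still has to come from somewhere else.

```python
def missing_prefixes(curies, prefix_map):
    """Return the sorted prefixes used in `curies` that `prefix_map` lacks."""
    found = set()
    for curie in curies:
        prefix, separator, _local_id = curie.partition(":")
        # Strings without a colon are not CURIEs and are skipped.
        if separator and prefix not in prefix_map:
            found.add(prefix)
    return sorted(found)


if __name__ == "__main__":
    prefix_map = {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}
    # Made-up identifiers that reuse prefixes from the warnings above.
    curies = ["MONDO:0000001", "Orphanet:123", "ICD9CM:001.0", "Orphanet:456"]
    for prefix in missing_prefixes(curies, prefix_map):
        print(f"{prefix}: <expansion still needed>")
```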