mapping-commons / semantic-mapping-vocabulary


https://mapping-commons.github.io/semantic-mapping-vocabulary/


semantic-mapping-vocabulary's Introduction

SEMAPV: A Vocabulary for Semantic Mappings

The Semantic Mapping Vocabulary (SEMAPV) is a vocabulary describing the processes, entities and agents involved in the curation of mappings. It is being developed in conjunction with the Simple Standard for Sharing Ontology Mappings (SSSOM) and provides a detailed vocabulary to describe, for example, different kinds of matching processes (lexical, logical, etc.), as well as the pre- and post-processing techniques employed.

To cite: http://doi.org/10.5281/zenodo.7672104

SEMAPV is currently in beta and is likely to change.

Core editorial Team

Nicolas Matentzoglu (Semanticly Ltd; @matentzn)
Chris Mungall (LBL; @cmungall)
Ernesto Jimenez-Ruiz (City, University of London)
Catia Pesquita (University of Lisbon)
John Graybeal (Stanford)
Charlie Hoyt (Harvard Medical School; @cthoyt)
Thomas Liener (Pistoia Alliance)

Please join the team by making an issue in the issue tracker.

Overview

A snapshot of the current SEMAPV hierarchy can be seen here:

Preliminary LODE documentation can be found here: https://mapping-commons.github.io/semantic-mapping-vocabulary/.


semantic-mapping-vocabulary's Issues

scope of "matching process"

Hi, thanks a lot for this work. I'd like to use semapv terms to fill mapping_justification in SSSOM, but I have a question first.
Is "matching process" relevant for matching performed by a human expert?
In particular, I'd like to use the subclass sempav_voc:BackgroundKnowledgeBasedMatching to, e.g., explain the correspondence between resistance + and + plant response. E.g.: "Stripe rust plant response (CO_321:0000179)" and "Resistance to Stripe Rust (WTO:0000562)"

Would it be correct?

Definitions for two matching processes

Dear developer,

I am impressed by your work on semantic mapping and how it relates to SSSOM, so I would very much like to reuse your terms. However, the definitions for two mapping_justification values are missing: BackgroundKnowledgeBasedMatching and InstanceBasedMatching. Could you provide an interpretation for them?

Many thanks!

Proposal for a new classification of mapping relations

This is the initial draft of a better classification of semantic mapping relations. It is based on the SKOS classification of mapping relations, and extends it to allow for additional kinds of mapping relationships such as cross-species mappings and potentially other kinds of conflation relations (gene-reference, disease-phenotype protein, etc.). It has often been noted that we are massively over-using skos:exactMatch for isomorphic concepts. A new vocabulary of mapping relations, conservatively evolved (to avoid proliferation), will not only allow us to cater for these use cases, but also avoid watering down exactMatch further. In order to group all kinds of isomorphic match properties (definition below) we introduce a new mapping relationship, semapv:isomorphicMatch, which groups relationships like skos:exactMatch / semapv:crossSpeciesExactMatch under one parent. semapv:isomorphicMatch is not intended (at least not in the use case considered here) to be used as a mapping predicate itself, only to group other mapping predicates.

skos:semanticRelation
- skos:mappingRelation
  - skos:closeMatch: Two concepts are very closely related but not necessarily the same
    - skos:exactMatch: Two concepts correspond to the same real-world entity.
  - semapv:isomorphicMatch: The subject is isomorphic to the object in the object_source, i.e., considered of identical or similar form, shape, or structure; and vice versa.
    - skos:exactMatch: see above
    - semapv:crossSpeciesExactMatch: see above
  - semapv:nonIsomorphicMatch: The subject cannot be considered isomorphic to the object in the object_source, i.e., considered of identical or similar form, shape, or structure. The object corresponds to exactly one subject in the subject_source.
    - skos:broadMatch
    - semapv:crossSpeciesBroadMatch
    - skos:narrowMatch
      - semapv:crossSpeciesNarrowMatch
    - skos:relatedMatch
      - semapv:crossSpeciesExactMatch: Two concepts correspond to an analogous concept across species (e.g., homologous structures, but could be anything, e.g., a bird's eye and a human eye)

Add new term MappingDerivation

DerivedMapping:

Def: A matching process based on interpreting an existing mapping provided without an explicit semantic mapping predicate.

Example: An ad-hoc two column mapping provided by a research paper is used as a source to provide a semantic mapping (skos:exactMatch).

Motivation:

This happens often when translating mappings from non-SSSOM formats into SSSOM. We should recommend adding a comment to the mapping sets that describes how the predicate decision was made.
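The translation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EX1:`/`EX2:` identifiers and the function name `derive_mappings` are hypothetical, and `semapv:MappingDerivation` is the term proposed in this issue, not necessarily one that exists in SEMAPV. The slot names `subject_id`, `predicate_id`, `object_id`, `mapping_justification` and `comment` are SSSOM slots.

```python
import csv
import io

# Hypothetical two-column mapping, e.g. copied from a table in a paper.
# It states no mapping predicate; the curator has to decide on one.
TWO_COLUMN = "EX1:0001\tEX2:A\nEX1:0002\tEX2:B\n"


def derive_mappings(
    text,
    predicate="skos:exactMatch",
    # Assumption: the term proposed in this issue, not a confirmed SEMAPV term.
    justification="semapv:MappingDerivation",
    comment="Source gave no predicate; skos:exactMatch was chosen by the curator.",
):
    """Turn tab-separated (subject, object) pairs into SSSOM-style rows."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(record) != 2:
            continue  # skip blank or malformed lines
        subject_id, object_id = record
        rows.append(
            {
                "subject_id": subject_id,
                "predicate_id": predicate,
                "object_id": object_id,
                "mapping_justification": justification,
                # Records how the predicate decision was made.
                "comment": comment,
            }
        )
    return rows


derived = derive_mappings(TWO_COLUMN)
```

Keeping the explanation in the `comment` slot of every row is one option; the issue text suggests a comment on the mapping set, which would work equally well.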

Obtain basic review of skeleton from core team members

Add semapv:BoundedPathMatching to SEMAPV

A matching process based on the comparison of matched super- and subclasses (paths) of two entities.

from @sven-h

need to clarify some terms

Hi. I feel there are some sources of confusion in the definitions of the semapv elements:

- matching: in semapv, this sounds like the activity/process. Should it be performed by machines only (as it looks like today), or also by humans?
- mapping: in semapv, this sounds like the result.
- curation:
  - in semapv, ManualMappingCuration is defined as "A matching process that is performed by a human agent and is based on human judgement and domain knowledge." With this definition, the element should rather be named "ManualMatching" (as it is a process).
  - otherwise, according to Merriam-Webster, curation is "the act or process of selecting and organizing (something, such as articles or images) for distribution or publication", which means that the mappings already exist, i.e. have been computed. This would then be close to the semapv definition of review.
- review: "A process that is concerned with determining if a mapping “candidate” (otherwise determined) is reasonable/correct." This should be applicable to mappings created by either a human or a machine.

Could you please clarify or harmonize the lexicon used?

Add semapv:MappingInversion

Analogous to semapv:MappingChaining we introduce "mapping flipping-based matching process", which is defined as:

A matching process based on the reversing or flipping of the subject with the object of a mapping in accordance with the semantics of the mapping predicate.
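To illustrate what "in accordance with the semantics of the mapping predicate" could mean in practice, here is a minimal Python sketch. It assumes mappings are plain dictionaries with SSSOM slot names, and the function name `invert_mapping` and the `EX1:`/`EX2:` identifiers are hypothetical. The inverse table relies on SKOS itself: `skos:broadMatch` and `skos:narrowMatch` are inverses of each other, while `skos:exactMatch`, `skos:closeMatch` and `skos:relatedMatch` are symmetric.

```python
# Inverse of each SKOS mapping predicate. Symmetric predicates map to themselves.
INVERSE_PREDICATE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def invert_mapping(mapping):
    """Return a new mapping with subject and object swapped.

    Raises ValueError when no inverse is known for the predicate,
    since flipping such a mapping could silently change its meaning.
    """
    predicate = mapping["predicate_id"]
    if predicate not in INVERSE_PREDICATE:
        raise ValueError(f"No known inverse for predicate {predicate!r}")
    inverted = dict(mapping)  # copy, so the input is left untouched
    inverted["subject_id"] = mapping["object_id"]
    inverted["object_id"] = mapping["subject_id"]
    inverted["predicate_id"] = INVERSE_PREDICATE[predicate]
    # Term proposed in this issue, used here as the justification.
    inverted["mapping_justification"] = "semapv:MappingInversion"
    return inverted


original = {
    "subject_id": "EX1:0001",  # hypothetical identifiers
    "predicate_id": "skos:broadMatch",
    "object_id": "EX2:A",
    "mapping_justification": "semapv:ManualMappingCuration",
}
flipped = invert_mapping(original)
```

Refusing unknown predicates is a design choice of this sketch; a real implementation might instead consult the ontology for declared inverse properties.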

Add versioning to semapv

Already done, just documenting here.

f1aecb7

Add semapv:EntityCloning

I want to add a new "mapping activity" which I can use as a "mapping justification" in SSSOM:

name: "entity cloning"
definition: "A process that involves cloning an entity from one semantic space to another."
comment: "Many semantic spaces, such as Wikidata or dbpedia, frequently incorporate concepts or classes from other semantic spaces, such as ontologies. For example, the class DOID:0060392 was migrated/cloned to wikidata:Q21124537 as part of a DO - Wikidata alignment process."

A cloning process like this one can serve as a mapping justification in SSSOM: If the term was migrated / copied / cloned, an exact mapping can be safely assumed.

How to deal with Grouping terms for matching approaches

We typically want to represent the matching approaches themselves, i.e. the techniques used, rather than their groupings, such as:

Entity Alignment Matching
Ontology Matching
Knowledge Graph Matching

because we do not want these to crop up as part of our justifications in SSSOM. However, there is a valid case for wanting these properly defined here. We could accept them as terms, and exclude them from the SSSOM mapping_justification?

See #8

Using mappings to replace obsolete terms

One possible use of a SSSOM mapping set is to perform mass renaming in a given database, ontology or other data vault. For example, given an ontology and a mapping set, if the IRI of an entity in the ontology matches the subject ID of a mapping in the set, then replace that IRI with the corresponding object ID.
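The renaming step described above can be sketched as follows. This is a hedged illustration, not an SSSOM tool: the function name `replace_ids` and the `EX:` identifiers are hypothetical, mappings are assumed to be dictionaries with the SSSOM slots `subject_id` and `object_id`, and the sketch assumes that a subject with two different objects is an error rather than something to resolve automatically.

```python
def replace_ids(entity_ids, mappings):
    """Replace every ID that is the subject of a mapping with the mapped object ID."""
    replacements = {}
    for mapping in mappings:
        subject_id, object_id = mapping["subject_id"], mapping["object_id"]
        # Assumption of this sketch: one subject must not map to two objects.
        if replacements.get(subject_id, object_id) != object_id:
            raise ValueError(f"Conflicting replacements for {subject_id}")
        replacements[subject_id] = object_id
    # IDs without a mapping are kept unchanged.
    return [replacements.get(entity_id, entity_id) for entity_id in entity_ids]


# Hypothetical data: EX:old1 is replaced, EX:keep is left alone.
mapping_set = [
    {"subject_id": "EX:old1", "predicate_id": "skos:exactMatch", "object_id": "EX:new1"},
]
renamed = replace_ids(["EX:old1", "EX:keep"], mapping_set)
```

Note that this sketch applies every mapping regardless of its predicate, which is exactly the ambiguity the question below is about.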

Should we have a way to explicitly indicate that a mapping is intended to be used for this kind of replacement? That is, instead of a mapping that merely indicates that the subject and the object are an “exact match”, we would have a mapping that explicitly indicates that the subject is to be replaced by the object – which is slightly different from saying that the subject and the object can be used interchangeably (the normal meaning of an exact match).

I can see three ways of making such a statement explicit:

a) Using IAO:0100001 (“term replaced by”) as the mapping predicate. That’s the easiest way as it does not require anything that does not already exist.

b) Having a new dedicated mapping relation in SEMAPV, such as semapv:ReplacementTerm or similar. It would probably be a subproperty of skos:exactMatch.

c) Instead of using the mapping predicate, we use another field, probably sssom:MappingJustification. That is, the mapping predicate would remain skos:exactMatch, but the mapping justification would be a new value like semapv:TermReplacement or similar.

I have no strong opinion on which way would be better (though I slightly dislike c as I feel this is overloading the meaning of sssom:MappingJustification somehow). But I think it would be nice to have one recommended way of doing replacements with SSSOM, otherwise I am concerned that all three methods (and possibly other methods I have not thought of!) will end up being used in the wild.

VoiD defines

http://rdfs.org/ns/void#Linkset

Add transformer, llm, ML and graph-rl matching processes

Due to some work with @cassiatrojahn and @sven-h I am suggesting to add the following categories of match types related to neural networks in the wider sense.

| IRI | skos:prefLabel | skos:definition | skos:example | Parent |
| --- | --- | --- | --- | --- |
| semapv:TransformerBasedMatching | transformer-based matching process | A matching process that utilizes transformer models, which are a type of deep learning model architecture designed to handle sequential data, particularly for natural language processing tasks. | Matches between entities are established based on the contextual relationships learned by the transformer from large datasets. | semapv:Matching |
| semapv:LLMBasedMatching | LLM-based matching process | A matching process that employs large language models (LLMs) which are pre-trained on vast amounts of text data and can understand and generate human-like text, making them suitable for tasks requiring a deep understanding of language. | Matches between entities are determined through the language understanding capabilities of LLMs, such as semantic context and language inference. | semapv:Matching |
| semapv:MachineLearningBasedMatching | machine learning-based matching process | A matching process that involves machine learning algorithms which learn from data to find patterns or make decisions with minimal human intervention. | Matches between entities are made by applying learned models to data points to predict similarities or relationships. | semapv:Matching |
| semapv:GraphRepresentationLearningBasedMatching | graph representation learning-based matching process | A matching process that uses graph representation learning which is a method in machine learning that focuses on learning a compact representation for graphs, capturing their structural information. | Matches between entities are identified by analyzing the learned representations that encode the structural features and relationships within graph data. | semapv:Matching |

Unexpected prefix displayed in documentation

Why does ns2 appear as the prefix in https://mapping-commons.github.io/semantic-mapping-vocabulary/? Shouldn't this be semapv?