fair-mi / miiid-schema

A metadata schema for the Minimum Information about Intermicrobial Interaction Data (MIIID) using LinkML

Home Page: https://fair-mi.github.io/miiid-schema/

License: GNU General Public License v3.0

Makefile 27.53% Python 71.23% Shell 1.24%
fair-data linkml linkml-schema metadata-standard microbial-ecology microbial-interactions

miiid-schema's People

Contributors

cpauvert

miiid-schema's Issues

Intermicrobial interaction modelling decisions

This is a rolling issue listing the pros and cons of modelling decisions, especially regarding how to model relationships (see docs), as well as the assumptions made during the development of the MIIID metadata schema.

0. Model all properties from the Perspective paper as strings

Pros:

  • Easy to model
  • Can accommodate n-ary relationships
  • Easy-ish as a TSV

Cons:

  • No mapping of items/enums to WikiData or ontologies
  • No constraints or validation (except with convoluted regexes)
  • Not FAIR, in the sense that the schema does not know what each element is
  • Burden on the user for formatting input

Status: not considered

1. microbial Participant as a separate class and participants is a slot accepting multiple Participant

Pros:

Cons:

  • Conversion from YAML -> TSV -> YAML does not preserve data integrity (#2)
  • Will prove difficult for integration with DataHarmonizer
  • Inlining this nested structure in a TSV will make it hard for people to contribute
  • Inlining also prevents conversion to other formats that are not tree-like

Status: Tried as a first approach. Superseded by (3)
Commit: 9a2016a
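
As an illustration of approach (1), here is a minimal Python sketch of the nested structure, not the LinkML schema itself. The field names `id`, `name` and `tax_id` and the values come from the toy example in issue #2; the dataclasses are only a stand-in for the generated model.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Participant:
    """One microbial participant; name and tax_id stay in the same object."""
    name: str
    tax_id: int


@dataclass
class IntermicrobialInteraction:
    """An interaction record holding any number of participants."""
    id: str
    participants: List[Participant] = field(default_factory=list)


interaction = IntermicrobialInteraction(
    id="example:IntermicrobialInteraction001",
    participants=[
        Participant(name="Acidobacteria", tax_id=57723),
        Participant(name="Gammaproteobacteria", tax_id=1236),
    ],
)

# The record is a tree: "participants" is a list of objects, which has no
# natural single-cell representation in a flat TSV row.
record = asdict(interaction)
```

The nesting keeps each name next to its identifier, but it is exactly this list of objects that has to be inlined when exporting to a tabular format.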

2. Model interaction using the biolink

Pros:

  • Reusing existing schema and being part of larger association schema
  • So all pros of (1)

Cons:

Status: not considered yet because of complexity

3. participants is a slot accepting multivalued names, tax_id

Pros:

  • Easier to model
  • Each value of a multivalued slot has a homogeneous range
  • Can accommodate n-ary relationships

Cons:

  • Names and taxonomic identifiers (as well as future properties) DO NOT belong to the same object, which increases the risk of errors during data input.

Status: considered and implemented
Commit: 10fadb6
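
The con listed for approach (3) can be made concrete with a small Python sketch. The keys `participant_names` and `participant_tax_ids` are hypothetical, chosen only for illustration; the actual slot names in the schema may differ. Because names and identifiers live in parallel lists, nothing ties the i-th name to the i-th identifier, so a consistency check has to be added by hand.

```python
from typing import Dict, List


def check_parallel_slots(record: Dict[str, List]) -> List[str]:
    """Return the problems found in the parallel participant lists."""
    problems = []
    # Hypothetical slot names, used only for this illustration.
    names = record.get("participant_names", [])
    tax_ids = record.get("participant_tax_ids", [])
    if len(names) != len(tax_ids):
        problems.append(f"{len(names)} names but {len(tax_ids)} tax_ids")
    return problems


ok_record = {
    "participant_names": ["Acidobacteria", "Gammaproteobacteria"],
    "participant_tax_ids": [57723, 1236],
}
# A tax_id was forgotten during data input: the lists no longer line up.
bad_record = {
    "participant_names": ["Acidobacteria", "Gammaproteobacteria"],
    "participant_tax_ids": [57723],
}

ok_problems = check_parallel_slots(ok_record)
bad_problems = check_parallel_slots(bad_record)
```

A length check like this catches a missing value but not two values entered in the wrong order, which is the kind of input error this modelling choice makes possible.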

add comments and examples to slots

  1. to help during the modelling
  2. to provide more documentation when exporting to DataHarmonizer

For instance, in the AMBR schema in DataHarmonizer:

  sample collection project name:
    name: sample collection project name
    title: sample collection project name
    description: The name of the project/initiative/program for which the sample was
      collected.
    comments: Provide the name of the project and/or the project ID here. If the information
      is unknown or cannot be provided, leave blank or provide a null value.
    examples:
    - value: Children's Hospital biofilm study (A3-701-01)
    slot_uri: GENEPIO:0100429
    range: WhitespaceMinimizedString
    recommended: true

will appear as follows in the DH help:

[image]

Inlining does not preserve data integrity

Using the toy example of interaction data: https://github.com/FAIR-MI/miiid-schema/blob/23a565a843aec06f6073bbf0bb3e559911480d99/examples/IntermicrobialInteraction-001.yaml

cd miiid-schema/examples
poetry shell
# YAML -> TSV
linkml-convert -o IntermicrobialInteraction-001.tsv -s ../src/miiid_schema/schema/miiid_schema.yaml IntermicrobialInteraction-001.yaml

The data was "understood", but the formatting is not great. In addition, the field names (name, tax_id, ...) are repeated for every participant.

#  IntermicrobialInteraction-001.tsv
id	participants	evidence_type	reference
example:IntermicrobialInteraction001	[{'name': 'Acidobacteria', 'tax_id': 57723}|{'name': 'Gammaproteobacteria', 'tax_id': 1236}]	high throughput evidence used in automatic assertion	https://doi.org/10.1038/ismej.2011.119

But when converting back to YAML

# TSV -> YAML
linkml-convert -o IntermicrobialInteraction-001-from-tsv.yaml -s ../src/miiid_schema/schema/miiid_schema.yaml IntermicrobialInteraction-001.tsv
#  IntermicrobialInteraction-001-from-tsv.yaml
entries:
- id: example:IntermicrobialInteraction001
  participants:
  - '{''name'': ''Acidobacteria'', ''tax_id'': 57723}'
  - '{''name'': ''Gammaproteobacteria'', ''tax_id'': 1236}'
  evidence_type: high throughput evidence used in automatic assertion
  reference: https://doi.org/10.1038/ismej.2011.119

Each participant is now interpreted as a string instead of an object with a name and a tax_id.
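
To show what is lost in the round trip, the sketch below parses the inlined TSV cell back into objects by hand. This is an illustrative workaround using only the Python standard library, not something `linkml-convert` does; the helper name `parse_inlined_cell` is made up for this example, and the split on `|` assumes that no value contains that character.

```python
import ast
from typing import Dict, List


def parse_inlined_cell(cell: str) -> List[Dict]:
    """Parse a '[{...}|{...}]' TSV cell into a list of dicts."""
    inner = cell.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    if not inner:
        return []
    # Naive split: assumes '|' never occurs inside a value.
    # Each part is a Python-literal dict (single quotes), not JSON.
    return [ast.literal_eval(part) for part in inner.split("|")]


cell = (
    "[{'name': 'Acidobacteria', 'tax_id': 57723}"
    "|{'name': 'Gammaproteobacteria', 'tax_id': 1236}]"
)
participants = parse_inlined_cell(cell)
```

The information is still present in the cell, but recovering it requires custom parsing that a contributor editing the TSV could easily break.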

all classes are displayed in the artefacts

For instance, the container class IntermicrobialInteractionCollection gets its own Excel sheet, as does NamedThing. Not sure how to prevent this.

UPDATE: it's not a bug, it's a feature!

So maybe related to #6 to enforce a display

add the possibility of missing data

see https://linkml.io/linkml/schemas/advanced.html#unions-as-ranges

As in DataHarmonizer, the suggestion is to use the INSDC missing value reporting terms: https://ena-docs.readthedocs.io/en/latest/submit/samples/missing-values.html#insdc-missing-value-reporting-terms

  • not applicable
  • not collected
  • not provided
  • restricted access

But how would they fit MIIID?

  • tax_id: when novel species -> not provided
  • sequence_id: for presequencing era -> not applicable
  • env_broad_scale: for engineered ecosystems -> not applicable
  • participants_outcome: for unknown -> not provided
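
The idea of a slot that accepts either a real value or a missing-value term can be sketched in plain Python. This is not the LinkML mechanism from the page linked above; it only illustrates the intended behaviour for `tax_id`, assuming that valid identifiers are positive integers. The function name `is_valid_tax_id` is hypothetical.

```python
from typing import Union

# The four INSDC missing value reporting terms listed above.
INSDC_MISSING_TERMS = {
    "not applicable",
    "not collected",
    "not provided",
    "restricted access",
}


def is_valid_tax_id(value: Union[int, str]) -> bool:
    """Accept a positive integer or one of the INSDC missing-value terms."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        return False
    if isinstance(value, int):
        return value > 0
    return isinstance(value, str) and value in INSDC_MISSING_TERMS


known_species = is_valid_tax_id(1236)
novel_species = is_valid_tax_id("not provided")
free_text = is_valid_tax_id("unknown")
```

With this kind of union, a novel species can be recorded as `not provided` while arbitrary free text such as `unknown` is still rejected.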
