
ejp-rd-vp / resource-metadata-schema


Metadata model and schemas for the EJP virtual platform

Home Page: https://ejp-rd-vp.github.io/resource-metadata-schema/

License: Creative Commons Zero v1.0 Universal

rare-disease-registries ejp biosample-registries registries metadata

resource-metadata-schema's Introduction


Metadata for EJP rare disease patient registries, biobanks and catalogs

As part of the European Joint Programme (EJP) for Rare Disease, we are developing metadata standards that rare disease registries can use to describe themselves, in order to improve the FAIRness of these resources.

The core model is designed to represent metadata about rare disease patient registries and biosample registries. It builds on existing standards, such as the European Rare Disease Registry Infrastructure and the Common Data Elements from the rare disease community, as well as more general data-sharing standards such as the W3C DCAT vocabulary. We are also working to align with similar schema standardisation efforts such as the RD-Connect semantic model, schema.org, Bioschemas, MIABIS and GA4GH (see also schema blocks and phenopackets). A proposed semantic model for the Common Data Elements can be found here.

Status: This is version 1.0 of the EJP RD metadata model.

Metadata modules overview

The figure below gives an overview of upper level concepts and properties used in our metadata model.

You can browse different metadata modules by visiting the links below.

Ontologies used in the metadata model

To describe the metadata modules listed above, we used various existing ontologies and vocabularies, which are listed below.

Concepts needed to describe resources in the rare disease domain, such as biobanks and patient registries, are not defined in a general resource description vocabulary such as DCAT. To overcome this, we have extended DCAT by adding the missing concepts.

Implementation of metadata model

The metadata modules described in this repository come with RDF examples and RDF validation artifacts, so implementing the metadata model in RDF lets you reuse them directly. However, implementations of the metadata model are not limited to RDF.

Use cases

Below you can find some use cases which can be addressed by proposed metadata modules.

  • Provide minimal metadata to describe a rare disease registry or biobank, or a catalog of registries or biobanks. The metadata should be sufficient to expose data about these resources through the virtual platform
  • Provide a uniform way for resources to expose the primary disease using an Orphanet code, so that resources can be searched by disease in the virtual platform
  • Provide a mechanism to identify resources and harmonise duplicate resources across catalogs
  • Provide geographical information so resources can be filtered by country in the VP
  • Expose whether resources provide metrics about individuals, such as number of cases
  • Expose whether a resource has access to biological samples, such as tissue or cell lines
  • Expose whether the resource has further contact information
  • ...
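As a rough illustration of the disease- and country-based search use cases, the sketch below filters resource records by Orphanet code, country and resource type. The `ResourceRecord` class, its field names and the sample records are hypothetical simplifications for illustration, not the schema's actual properties.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """Hypothetical, simplified view of a registry or biobank metadata record."""
    title: str
    resource_type: str  # e.g. "PatientRegistry" or "Biobank"
    orphacodes: set[str] = field(default_factory=set)  # e.g. {"ORPHA:65286"}
    country: str = ""  # ISO 3166-1 alpha-2 country code (assumed)


def find_resources(records, orphacode=None, country=None, resource_type=None):
    """Return titles of records matching every filter that is given (None = no filter)."""
    return [
        rec.title
        for rec in records
        if (orphacode is None or orphacode in rec.orphacodes)
        and (country is None or rec.country == country)
        and (resource_type is None or rec.resource_type == resource_type)
    ]


records = [
    ResourceRecord("Registry A", "PatientRegistry", {"ORPHA:65286"}, "NL"),
    ResourceRecord("Biobank B", "Biobank", {"ORPHA:65286", "ORPHA:75286"}, "IT"),
    ResourceRecord("Registry C", "PatientRegistry", {"ORPHA:75286"}, "NL"),
]
print(find_resources(records, orphacode="ORPHA:65286"))  # ['Registry A', 'Biobank B']
print(find_resources(records, country="NL"))  # ['Registry A', 'Registry C']
```

In the virtual platform the same filtering would more likely run as a query over the published metadata; the in-memory list only shows which fields the use cases depend on.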

Configurations to provide metadata records according to the EJP-RD schemas

  • FDP reference implementation or FAIR in a Box (FiaB): here
  • Spreadsheet to FDP reference implementation: here

resource-metadata-schema's People

Contributors

brunasv, esthervanenckevort, haithamabaza, hbcesar, henrietteharmse, holubp, luizbonino, markwilkinson, matentzn, orphanet, philipvd, rajaram5, ronaldcornet, s2ola, simonjupp


resource-metadata-schema's Issues

Decision on VPQueryable

A long discussion resulted in a decision to change the ontology term from ejprd:VPQueryable to ejprd:VPContentDiscovery.

Decision will be ratified at P2 meeting on Friday

capturing number of cases at the registry level

Both the RD connect finder and ERDRI.dor capture the number of cases for a given disease within a registry. Sometimes a registry might house multiple different diseases with different counts for each. Obviously this value could be computed if you can get access to the individual records, but let's suppose for now this isn't always going to be possible and assume there is a use-case for getting at these counts at a registry level.

Summary level metadata about the cases is what I called StudyDesign in the schema diagram. This was loosely based on the description of https://schema.org/MedicalStudy. ERDRI call this section Structure and it covers metadata for things like number of cases and inclusion/exclusion criteria.

I initially suggest we go for something simple in the modelling such as

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ejp: <http://purl.org/ejp-rd/vocabulary/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ordo: <http://www.orphanet.org/ORDO/> .

<http://catalogue.rd-connect.eu/apiv1/regbb/organization-id/10779>
  a ejp:PatientRegistryDataset ;
  dc:title "3q29 deletion Registry" ;
  ejp:disease_case [
    ejp:is_about ordo:ORPHA_65286 ;
    ejp:number_of_patients 38
  ] , [
    ejp:is_about ordo:ORPHA_75286 ;
    ejp:number_of_patients 23
  ] .

In this scenario, using dcat:theme to capture the disease becomes somewhat redundant. We wouldn't want two places in the schema where the disease associated with the registry is described.
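One way to keep a single source of truth is to treat the disease_case entries as authoritative and derive any theme-like disease list, or a registry-level total, from them on demand. The sketch below assumes the Turtle record above has already been loaded into plain Python dicts with the counts parsed as integers; the dict keys are illustrative, not schema field names.

```python
# Disease cases for one registry, mirroring the ejp:disease_case blank nodes
# in the Turtle example (keys are illustrative, not schema field names).
registry = {
    "title": "3q29 deletion Registry",
    "disease_case": [
        {"is_about": "ordo:ORPHA_65286", "number_of_patients": 38},
        {"is_about": "ordo:ORPHA_75286", "number_of_patients": 23},
    ],
}


def derived_themes(record):
    """Diseases covered by the registry, derived from disease_case (sorted, unique)."""
    return sorted({case["is_about"] for case in record["disease_case"]})


def total_patients(record):
    """Registry-level case count, summed over all disease_case entries."""
    return sum(case["number_of_patients"] for case in record["disease_case"])


print(derived_themes(registry))  # ['ordo:ORPHA_65286', 'ordo:ORPHA_75286']
print(total_patients(registry))  # 61
```

The summed total is only meaningful if no patient is counted under more than one disease; otherwise the per-disease counts should be reported separately.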

Data Service "conformsTo" is incorrectly described

The property name "ConformTo" should be corrected to "conformsTo".

The definition is: "the standard that the output data from the service will conform to, e.g. JSON-LD or CSV".

Its value should be a URI that points to the specification document of that data standard.
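A minimal check along these lines could flag conformsTo values that are bare labels such as "JSON-LD" rather than URIs. This sketch uses only `urllib.parse` and deliberately accepts only absolute http(s) URIs; a real check might also allow other schemes and confirm that the URI resolves to a specification document.

```python
from urllib.parse import urlparse


def is_absolute_uri(value: str) -> bool:
    """True if value is an absolute http(s) URI with a host part."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ["https://www.w3.org/TR/json-ld11/", "https://www.rfc-editor.org/rfc/rfc4180", "JSON-LD", ".CSV"]:
    print(candidate, is_absolute_uri(candidate))
```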

JSON schema visualizer

Identify an appropriate JSON schema visualizer tool or library to a) sketch an understandable view on the actual state of the model and b) keep the visualization on the website (#5) synchronized with the changes in the repo.

Metadata model diagram is missing accessService property on Distribution

In the FDP reference implementation, the only way to create a Data Service ("type 2") is to create it as a connection from dcat:Distribution.

The property connecting a distribution to a data service is:
dcat:accessService

In the diagram (and in the reference implementation) there is a property of Data Service "dcat:servesDataset" which points to the Dataset, but in practice that field is EXTREMELY difficult to fill correctly! So the missing property described above is the more important one, and the one that is required by the FDP.
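For illustration, the snippet below emits the Dataset-to-Distribution-to-DataService chain as N-Triples. dcat:distribution and dcat:accessService are standard DCAT properties; the example.org IRIs are placeholders, and an FDP would normally generate these statements itself.

```python
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose object is an IRI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def distribution_with_service(dataset: str, distribution: str, service: str) -> list[str]:
    """Link dataset -> distribution -> data service using DCAT properties."""
    return [
        triple(dataset, RDF_TYPE, DCAT + "Dataset"),
        triple(distribution, RDF_TYPE, DCAT + "Distribution"),
        triple(service, RDF_TYPE, DCAT + "DataService"),
        triple(dataset, DCAT + "distribution", distribution),
        triple(distribution, DCAT + "accessService", service),
    ]


for line in distribution_with_service(
    "https://example.org/dataset/1",
    "https://example.org/distribution/1",
    "https://example.org/service/1",
):
    print(line)
```

The accessService link starts from the distribution, which matches how the FDP reference implementation creates a Data Service; the harder-to-fill dcat:servesDataset link is not needed for this chain.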

Data Service "description"

Currently:

Description
A description of this data service.
Note This field is optional

Description
A description of the services available via the end-points, including their operations, parameters etc.
Note This field is optional

This should be:

Description
A human-readable (narrative) description of the functionality and features of the Service
Field is optional (but highly recommended!)

endpointDescription
A machine-readable document defining the API of the service (e.g. in openAPI)
Field is OPTIONAL for services that are attached to dcat:Catalog
Field is MANDATORY for services that are attached to dcat:Dataset
(this is regulated at the level of policy, not at the level of schema)
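Because the endpointDescription rule is enforced by policy rather than by the schema, it could be checked by a separate script. A minimal sketch, assuming each service is available as a dict with hypothetical keys `attached_to`, `description` and `endpointDescription`:

```python
def check_data_service(service: dict) -> list[str]:
    """Policy findings for one Data Service record (hypothetical dict keys)."""
    findings = []
    if not service.get("description"):
        findings.append("description: optional but highly recommended")
    # "dcat:Catalog" or "dcat:Dataset": which record the service hangs off
    if service.get("attached_to") == "dcat:Dataset" and not service.get("endpointDescription"):
        findings.append("endpointDescription: mandatory for services attached to a dcat:Dataset")
    return findings


print(check_data_service({"attached_to": "dcat:Catalog"}))
print(check_data_service({
    "attached_to": "dcat:Dataset",
    "description": "Search API for registry records",
    "endpointDescription": "https://example.org/openapi.yaml",
}))  # []
```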

Change the Biobank model to be compliant with MIABIS

MIABIS aims to standardize the data elements used to describe biobanks, research on samples, and associated data.
It defines general attributes to describe biobanks and sample collections at an aggregated (metadata) level.
The EJPRD model can represent a biobank resource in a way that is compliant with MIABIS concepts and attributes.
The current EJPRD model uses a custom term from the EJPRD vocabulary for a biobank resource (ejprd:Biobank). To be compliant with MIABIS, I propose adopting the OBIB ontology to denote these concepts.

The list of changes I propose to make the model compliant with MIABIS, is the following:

  • Redefine the ejp:Biobank entity as both a dcat:Resource (so that it fits in the DCAT model) and an obo:OBIB_0000616 (obib:biobank). The OBIB term makes the concept compatible with the MIABIS collection.
  • Change the dct:publisher object of the Biobank (i.e. the current :organisationShape) as follows:
    • add the type obo:OBIB_0000623 (obib:biobank_organization) to specify that it is a Biobank Organization as defined by MIABIS. It will be both a foaf:Organization and an obib:biobank_organization.
    • add the obo:OBIB_0000732 (obib:owns) predicate, which relates the biobank organization to the obib:biobank. This may not be strictly necessary (there is already dct:publisher), but it better defines the nature of the link between the biobank organization and its collections as defined by MIABIS.
    • add the obo:RO_0000053 (obo:has_characteristic) predicate to specify the legal entity of the organization. This field, which may be optional, models the "Juristic Person" attribute of a Biobank Organization in MIABIS. Since there is no specific property for this, I propose a generic predicate that links to an object of type obo:OMRSE_00000038 (obo:legal_entity).

The changes can be seen in PR #39.
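The proposed typing can be sketched as Turtle generated from Python. The OBIB, RO and OMRSE identifiers are the ones named in the proposal, expanded with the standard OBO PURL namespace http://purl.obolibrary.org/obo/; the example.org IRIs are placeholders, and the shapes actually adopted in PR #39 may differ.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    "@prefix obo: <http://purl.obolibrary.org/obo/> .\n"
)


def turtle_string(text: str) -> str:
    """Quote text as a single-line Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def miabis_biobank_turtle(biobank: str, organisation: str, legal_entity: str, title: str) -> str:
    """Turtle for a biobank, its owning organisation and that organisation's legal entity."""
    return PREFIXES + f"""
<{biobank}> a dcat:Resource, obo:OBIB_0000616 ;  # obib:biobank
    dct:title {turtle_string(title)} ;
    dct:publisher <{organisation}> .

<{organisation}> a foaf:Organization, obo:OBIB_0000623 ;  # obib:biobank_organization
    obo:OBIB_0000732 <{biobank}> ;  # obib:owns
    obo:RO_0000053 <{legal_entity}> .  # has characteristic

<{legal_entity}> a obo:OMRSE_00000038 .  # legal entity
"""


print(miabis_biobank_turtle(
    "https://example.org/biobank/1",
    "https://example.org/org/1",
    "https://example.org/legal-entity/1",
    "Example Biobank",
))
```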

testing framework for the schema

On each commit, the tests should:

  • check all schemas are valid JSON schema files
  • test some example datasets against the schema
  • convert example JSON to RDF using the JSON-LD context
  • validate the resulting RDF using the SHACL files
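The first two checks could be prototyped with the standard library alone, as in the sketch below: every schema file must parse as JSON and declare `$schema`, and an example record must contain the schema's top-level required properties. Full JSON Schema validation, the JSON-LD-to-RDF conversion and SHACL validation would need dedicated libraries and are not shown. The `*.json` file layout and the function names are assumptions.

```python
import json
from pathlib import Path


def check_schema_files(schema_dir: str) -> list[str]:
    """Error messages for *.json files in schema_dir that are not valid JSON or lack "$schema"."""
    errors = []
    for path in sorted(Path(schema_dir).glob("*.json")):
        try:
            schema = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: invalid JSON ({exc.msg}, line {exc.lineno})")
            continue
        if not isinstance(schema, dict) or "$schema" not in schema:
            errors.append(f"{path.name}: missing $schema declaration")
    return errors


def missing_required(schema: dict, example: dict) -> list[str]:
    """Top-level properties listed in the schema's "required" array that the example lacks."""
    return [name for name in schema.get("required", []) if name not in example]


def run_checks(schema_dir: str) -> int:
    """Print problems and return a CI-style exit code (0 means all schema files passed)."""
    problems = check_schema_files(schema_dir)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A CI job could call `run_checks` on the schema directory and fail the commit on a non-zero result; the RDF steps would then run on the example files that pass.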

accessService incorrectly placed

In the current schema, accessService is a property of a DataService. In DCAT, accessService is a property of a Distribution (its domain is dcat:Distribution).

boolean modelisation

During a DCAT-AP workshop with an EC expert on DCAT, we were given the following advice regarding booleans:

"Modeling advice: Boolean properties are to be avoided, use properties with a controlled vocabulary instead."

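To illustrate the advice, the sketch below replaces a yes/no flag about sample availability with a small controlled vocabulary whose terms map to IRIs. The `SampleAvailability` terms and the IRIs under https://example.org/vocab/ are hypothetical placeholders, not terms from the EJP RD vocabulary.

```python
from enum import Enum


class SampleAvailability(Enum):
    """Hypothetical controlled vocabulary replacing a boolean 'has_samples' flag."""
    AVAILABLE = "https://example.org/vocab/SamplesAvailable"
    NOT_AVAILABLE = "https://example.org/vocab/SamplesNotAvailable"
    ON_REQUEST = "https://example.org/vocab/SamplesOnRequest"
    UNKNOWN = "https://example.org/vocab/SamplesAvailabilityUnknown"


def from_legacy_flag(has_samples):
    """Map a legacy boolean (or a missing value) onto the controlled vocabulary."""
    if has_samples is None:
        return SampleAvailability.UNKNOWN
    return SampleAvailability.AVAILABLE if has_samples else SampleAvailability.NOT_AVAILABLE


print(from_legacy_flag(True).value)  # https://example.org/vocab/SamplesAvailable
```

Unlike a boolean, the vocabulary can state "unknown" explicitly and can gain new terms (such as ON_REQUEST) without changing the property's range.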
Catalogue location Organization

I'm guessing that you're using schema.org for these structures?

My reading of schema.org is that Organization cannot be the range of the location predicate (it can be the domain, but not the range). (See the Catalog model.)

the Location component

This is a "heads up" that there is a disconnect between the Location model in the metadata, and the model that is expected by Beacon.

I need to talk to Tony (Friday) about how to resolve this.

Can you flag "location" as being potentially subject to change?

Compatibility Check for Static vp-index with Latest Metadata Schema Changes

Describe your problem.

We are currently actively developing the vp-portal. To ensure seamless integration with the latest changes in the metadata schema, we're sharing the current version of the static vp-index. We kindly request your assistance in reviewing the provided list of attributes and their compatibility with the recent schema updates.

Describe the solution you'd like

Please evaluate the list of attributes we've shared and verify their compatibility with the latest metadata schema. If any discrepancies or incompatibilities are identified, we kindly request your guidance on the adjustments needed to ensure full compatibility with the latest schema changes.

Attachments

STATIC-VP-INDEX

Additional context

If you have any questions or require further explanations, please don't hesitate to reach out to us. We're also available to schedule a meeting in order to go over all the necessary details.

Onboarding document review

Upon review of the onboarding document, the team has a few items that require confirmation (https://ejprd.sharepoint.com/:w:/r/sites/pillar2-central9/Shared%20Documents/General/3.3_WF_FAIRification/OnboardingTasks/Onboarding%20document%202.0.docx?d=wf386b4e9247b47b3bdc049bb9cff57e5&csf=1&web=1&e=aSim66):

Can you kindly take a look at the questions under the following sections:

  1. DataService properties
  2. Guideline: no properties found (p. 30). Should we remove it from the documentation altogether? Question for @henrietteharmse

Drop VPDiscoverable and VPQueryable

This is based on the discussion during General Meeting 2023/05/12:

[11:29] Brookes, Anthony J. (Prof.) (Guest)
Mark Wilkinson

I think we have discarded the "Discoverable" tag, because the FDPs are now required to register themselves in the VP index, so there's no benefit to that tag anymore... is that not correct?

Agreed. We don't need the 'discoverable' tag. And as Marc said, we don't need the 'queryable' tag either, as DCAT has a placeholder for service URLs.

[11:29] Marc Hanauer
Mark Wilkinson
I think we have discarded the "Discoverable" tag, because the FDPs are now required to register themselves in the VP index, so there's no benefit to that tag anymore... is that not correct?
Same for "queryable": having a dcat:service properly described with a URL endpoint is enough...

License

We have decided on a default license for EJP: https://w3id.org/ejp-rd/resources/licenses/v1.0

If people use FiaB, the configuration documents tell them how to set it as the default inside the FDP. Those who have not configured the reference implementation will need to add it manually in the "license" field.
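For records prepared outside a configured FiaB, the manual step could be scripted roughly as follows: fill in the agreed default license URL when a record has none. The record is assumed to be a plain dict, and the `license` key and function name are illustrative.

```python
EJP_DEFAULT_LICENSE = "https://w3id.org/ejp-rd/resources/licenses/v1.0"


def with_default_license(record: dict) -> dict:
    """Return a copy of the record with the EJP default license set if none is given."""
    updated = dict(record)  # do not mutate the caller's record
    if not updated.get("license"):
        updated["license"] = EJP_DEFAULT_LICENSE
    return updated


print(with_default_license({"title": "Example registry"}))
```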

Type of the registry

In the registry json schema, the type attribute is described as the primary type of the registry.
However, in ERDRI.dor a registry can have multiple types, with no way of knowing which one is primary. How can these types then be represented with such a schema?

PopulationCoverage

Currently PopulationCoverage is modelled to have type sio:SIO_001166 that can have the values ["National" "International" "Regional"]. A potential problem with this modelling is that the meanings of National, International and Regional are not well defined. It may be better to add these as concepts to the EJPRD ontology, possibly extending http://purl.org/dc/terms/spatial.

logoURL and homepage

Because of #40 (comment), various levels of a DCAT record are now independently discoverable. We therefore need to hang the logoURL and homepage properties off dcat:Resource, so that they propagate down to the catalog, dataset, distribution, and/or data service.
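In object terms, hanging the properties off dcat:Resource means that every subclass inherits them. The dataclasses below sketch that idea with hypothetical field names. Note that in DCAT 2 and 3, dcat:Catalog (a subclass of dcat:Dataset), dcat:Dataset and dcat:DataService are subclasses of dcat:Resource, but dcat:Distribution is not, so a distribution would not pick these properties up through inheritance alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Stand-in for dcat:Resource; subclasses inherit logo_url and homepage."""
    title: str
    logo_url: Optional[str] = None
    homepage: Optional[str] = None


@dataclass
class Dataset(Resource):
    pass


@dataclass
class Catalog(Dataset):  # dcat:Catalog is a subclass of dcat:Dataset
    pass


@dataclass
class DataService(Resource):
    endpoint_url: Optional[str] = None


service = DataService("Registry API", logo_url="https://example.org/logo.png")
print(service.logo_url)  # https://example.org/logo.png
```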

Thanks!

description should not be optional

"description" is a required field for EOSC-compliant metadata, and the FAIR evaluator tools will be checking for it, so we should change this from 0..* to 1..*

Building on standards

The README states that we build on standards.
I would like to take this as far as possible, using URIs wherever we can.
Currently there is mention of "catalog", which I interpret as "ejp_rd:catalog".
I would really like to see this changed to "dcat:catalog".
In other words: adopt or explain. If dcat:catalog doesn't work, then make clear why, and how we adhere to and diverge from it.
In MarkW's strawman demo, which really helps to provide insight, a catalog has catalog_of_registries. In DCAT, a catalog has datasets.

I strongly suggest adopting the DCAT modeling, or showing why it's broken.
And this also goes for other elements in our current model.

The definition and use of Location is confusing

My aim here is to start a discussion on this. It may be purely that I do not understand the intent here well enough.

I find the definition and use of Location shapes in the metadata schema confusing. There seem to be several approaches to defining a location schema in different shape files. These are the approaches I noted:

  1. ejp:location: used in location.shex, dataset.shex, dataservice.shex, catalog.shex

  2. dc:Location: used in biobank.shex, resource.shex, patientRegistry.shex, catalog.shex

:locationShape IRI {
  a [dct:Location];
  dct:title xsd:string;
  dct:description xsd:string*;
}

  3. dc:spatial: used in organization.shex.

It may be perfectly valid to have these different options for representing a location. However, might it be better to define the different options in a single .shex file and import it into the other .shex files where needed? The advantage is that if we need to change how we handle locations, it can be done in one place.

@rajaram5 Do you have any thoughts on this?

Biobank and PatientRegistry as disjoint sub-types of Resource

During the L1/L2 meeting an issue was raised concerning resources that present themselves as both Biobank and PatientRegistry. In the group's discussion, we concluded that a given resource should only be declared as one of these types, not both. Although in the metadata schema they have the same set of properties, these entities are of different nature. Naturally, a biobank (or multiple) and a patient registry (or multiple) can be managed by or belong to a single organisation. However, we consider that the focus of EJP-RD is on the informational resources and not on the organisations managing them.

The request is to make clear that these sub-classes are disjoint in the diagram, related ontology and documentation.
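Besides an owl:disjointWith axiom in the ontology, the rule can also be checked on published metadata. A minimal sketch, assuming each resource's rdf:type values are available as a set of CURIE strings; the class CURIEs ejp:Biobank and ejp:PatientRegistry are assumptions and should be taken from the actual vocabulary.

```python
DISJOINT_CLASSES = ("ejp:Biobank", "ejp:PatientRegistry")  # assumed class CURIEs


def disjointness_violations(resources: dict[str, set[str]]) -> list[str]:
    """IRIs (sorted) of resources typed as more than one of the disjoint classes."""
    violations = []
    for iri, types in sorted(resources.items()):
        if sum(cls in types for cls in DISJOINT_CLASSES) > 1:
            violations.append(iri)
    return violations


print(disjointness_violations({
    "https://example.org/bb/1": {"dcat:Resource", "ejp:Biobank"},
    "https://example.org/mixed/1": {"ejp:Biobank", "ejp:PatientRegistry"},
}))  # ['https://example.org/mixed/1']
```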

RDF version of json-schema

To have a single point of reference for defining SPARQL queries over our schema, it would be good to have a canonical RDF representation of the JSON version of the schema. @rajaram5 @S2Ola before we start getting our hands dirty, could you both please write up your proposal on how to maintain an RDF representation of the schema that is automatically updated with respect to changes in the JSON schema?

Discussion 27 Oct wrt model changes

  1. PatientRegistries & Biobank extend Datasets. Should it not be extending Catalog? Main difference is that Catalog is a recursive structure and Dataset is not.
  2. IsPartOf is internal to FDP and can be removed from Distribution.
  3. Changed edam:operation_2421 to edam:operation_0004 and linked it to DataService rather than Resource.
  4. accessService in Distribution have cardinality 0..*

Error in ejp-rd prefix in image

Image on main page of the Resource Semantic Metadata Schema has a typo in the ejp-rd prefix, now ending with "cvocabulary" instead of "vocabulary".

Looks great otherwise!
Any chance that some of the open issues can be closed now? 😀

Decision about logo

logo should be associated with a dcat:Resource (not an agent)

logo should be optional

dcat:Resource already has a landingPage, so no additional web page element is necessary.

Decision will be ratified at P2 meeting on Friday

endpointURL in Data Service

endpointURL should be [0..1].

There are two different kinds of services. Some services provide a website as their interface, while others are more "Swaggery".
For the first type, we require a dcat:landingPage.
For the second type, we require a dcat:endpointURL and a dcat:endpointDescription (pointing to the YAML-style interface definition).
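The two-kinds-of-service rule could be checked roughly as below. The `kind` key that distinguishes "website" from "api" services is a hypothetical field introduced for illustration; the other keys follow the DCAT property names dcat:landingPage, dcat:endpointURL and dcat:endpointDescription.

```python
REQUIRED_BY_KIND = {
    "website": ["landingPage"],
    "api": ["endpointURL", "endpointDescription"],
}


def missing_service_properties(service: dict) -> list[str]:
    """Properties required for this kind of service that are absent or empty."""
    kind = service.get("kind")  # hypothetical discriminator: "website" or "api"
    if kind not in REQUIRED_BY_KIND:
        raise ValueError(f"unknown service kind: {kind!r}")
    return [prop for prop in REQUIRED_BY_KIND[kind] if not service.get(prop)]


print(missing_service_properties({"kind": "api", "endpointURL": "https://example.org/api"}))
# ['endpointDescription']
```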

clarify accessRights vs odrl:hasPolicy

In the documentation, it needs to be clear that accessRights carries a rights statement OTHER THAN an ODRL rights statement, and that the alternative property (odrl:hasPolicy) should be used if you are providing ODRL.

(these are properties of dcat:Resource)

Make license optional in spreadsheet template

This was raised in the resource onboarding meeting of 20 Jan 2023. Users may not have exact licence information which then is a hurdle to the onboarding of their resources. The suggestion is to make the licence optional and then have the FDP default to a very restrictive licence. Thus from a FDP it is mandatory, but from a user uploading the resource metadata information, it is optional.
