TERN Ontology

Home Page: https://linkeddata.tern.org.au/viewers/tern-ontology

License: Creative Commons Attribution 4.0 International

The TERN Ontology is an OWL Ontology with SHACL profiles to facilitate the representation of ecological site-based survey and opportunistic observation data. The TERN Ontology is used as a common information model to represent and facilitate the sharing of survey data across different systems.

View classes: https://linkeddata.tern.org.au/viewers/tern-ontology
Online documentation: https://linkeddata.tern.org.au/information-models/tern-ontology
Specification document: https://ternaustralia.github.io/ontology_tern

Releases

The TERN Ontology makes GitHub Releases for each version. See TERN Ontology releases for a list of releases.

Source files

Source files are maintained as RDF Turtle files, and they are located in the docs/ directory as files ending in .ttl.

Only edit the source files in TopBraid Composer.

Version control

The main branch (master) is the working branch of the TERN Ontology. Changes must be made in another branch, along with a GitHub pull request to merge into the main branch.

Each push to a branch will trigger GitHub Actions to run validations and tests. These validations and tests must pass before merging the branch into the main branch.

Editing the TERN Ontology

We use ontotools, a Python command-line application, to normalise the source files.

Whenever you edit the source files, perform the following steps before committing to git.

Create a Python 3 virtual environment

python3 -m venv venv

Activate the virtual environment

source venv/bin/activate

Install the required packages

pip install -r requirements.txt

Run ontotools to normalize the source file for TERN Ontology

This will normalize the tern.ttl file.

ontotools file normalize docs/tern.ttl

Run ontotools to normalize the source file for the TERN Ontology SHACL shapes

This will normalize the tern.shacl.ttl file.

ontotools file normalize docs/tern.shacl.ttl

Making modifications

  • Bump the version number in the ontology, the version information, and the modified date.
  • Enter the new changes into CHANGELOG.md following the conventions of semantic versioning.

Each version should:

  • List its release date in the above format.
  • Group changes to describe their impact on the project, as follows:
    • Added for new features.
    • Changed for changes in existing functionality.
    • Deprecated for once-stable features removed in upcoming releases.
    • Removed for deprecated features removed in this release.
    • Fixed for any bug fixes.
    • Security to invite users to upgrade in case of vulnerabilities.

License

The contents of this repository are made available for use under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for the deed.

Contact

TERN Support
[email protected]

ontology_tern's People

Contributors

edmondchuc, junrongyu, kitchenprinzessin3880, nicholascar, smguru

ontology_tern's Issues

Rename tern:dimension to tern:dimensions

tern:dimension would be better named "dimensions", as you clearly want more than one (e.g. the example of 100x100).

Also, the definition contains a misspelling.

Also, consider providing guidance for the use of GeoSPARQL 1.1's geo:hasArea property somewhere.

Alternative to SHACL validation requiring inferencing

At the moment, the target instances must contain the target class to pass validation.

Example, an observation's sosa:hasFeatureOfInterest must have a value which states it is a tern:FeatureOfInterest to pass validation even if the target data states it is one of the specialised classes, e.g., tern:MaterialSample.

The current solution is to either explicitly state the required class in the data or enable inferencing during validation.

What I propose is to run the SHACL shapes definitions through a processor which expands the target classes to include all of the specialised classes of the original target class. This will ensure the data is conformant without downstream users requiring inferencing or the addition of explicit statements to validate their data.
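The expansion step proposed above could look roughly like the following sketch. It is a plain-Python illustration, not the project's tooling: the `SUBCLASS_OF` pairs and the `expand_target_classes` helper are hypothetical, and a real processor would read `rdfs:subClassOf` triples from the ontology and add the resulting classes as extra `sh:targetClass` values in the shapes graph.

```python
from collections import defaultdict

# Hypothetical rdfs:subClassOf pairs (child, parent), for illustration only;
# a real processor would read these from the ontology.
SUBCLASS_OF = [
    ("tern:Sample", "tern:FeatureOfInterest"),
    ("tern:MaterialSample", "tern:Sample"),
    ("tern:Site", "tern:Sample"),
]


def expand_target_classes(target, subclass_pairs):
    """Return the target class plus all of its direct and indirect subclasses."""
    children = defaultdict(set)
    for child, parent in subclass_pairs:
        children[parent].add(child)

    expanded, stack = set(), [target]
    while stack:
        cls = stack.pop()
        if cls in expanded:
            continue  # guards against cycles in the class hierarchy
        expanded.add(cls)
        stack.extend(children[cls])
    return expanded
```

With the expanded set written into the shapes, a node typed only as tern:MaterialSample would be targeted without inferencing at validation time.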

Relax shape constraints

The current set of SHACL constraints on classes is too strict and too biased towards TERN's implementation. Relax the constraints so that they are still valid and still interoperable with the other aligned standards.

Add admin area Feature Type

The Feature Type vocab at http://linked.data.gov.au/def/tern-cv/68af3d25-c801-4089-afff-cf701e2bd61d does not include some FTs needed for things the BDR is encountering, e.g. Sampling might be done to show the existence of a Taxon within an administrative area, such as a mine site. The Administrative area really is an FoI since it is a property of it - existence/non of a Taxon - that we want to know about.

Please add:

  • PrefLabel
    • administrative area
  • Description
    • human-defined region for business purposes such as allocated for mineral exploitation, or reserved for nature preservation or zoned for suburbs
  • Definition
    • A region of the earth's surface, terrestrial or marine or mixed, that is defined for human purposes.
  • Top concept of
    • yes
  • source
    • BDR feature types
  • example
    • Tom Price Mine, Kakadu National Park, Southern Kangaroo Island Marine Park, Shorncliffe (suburb of Brisbane), New South Wales, Meshblock 11234 (ABS census region)
  • notation
    • administrative-area

TERN loc ontology not needed

The TERN Geometry class, https://w3id.org/tern/ontologies/loc/Geometry, is defined as a subclass of geo:Geometry but imposes no additional required properties*, thus it's not needed.

The purpose of tern:Geometry seems to be to indicate that a Geometry could/should have a geometry type, indicated by dcterms:type, but this can be realised by just using that property on a regular geo:Geometry instance.

The TERN Geometry also includes a Shape to ensure that, if dcterms:type is present, then it must be of type sh:IRI but again, this Shape can be defined for geo:Geometry.

So: remove the tern:Geometry class but retain the Shape! This will improve ease of use since we just have a GeoSPARQL Geometry but you may type it.

Likewise, tern:LineString, tern:Point & tern:Polygon are not needed as they can be replaced with the SF classes they subclass.

* sure, it says to use only WKT for the geometry literal but GeoSPARQL has conversion functions defined

Site geometry

For many sites there is only a point location.
The definition of tern:Site should allow this directly.

I suggest just using geo:hasGeometry in the OWL/RDFS, and then add an OR SHACL rule to constrain it to a Point or Polygon.

Create specialised classes of `tern:Site`

Some of the well-defined concepts in ecology such as quadrats and transects are modelled as a Site. And the type of site is currently represented by a controlled vocabulary. Since transects are well-known concepts and have well-known properties such as transect direction and transect start point, it makes sense to model transects as a specialised class of tern:Site.

The current site type controlled vocabulary is also small (3 concepts - plot, transect, quadrat) so the cost of creating these specialised classes is small.

Rename `tern:dimension` to `tern:extent`

The term extent is more commonly understood and less ambiguous than the term dimension, especially in GIS land. See spatial extent. See notes on dimensional extent.

Note that geosparql:Geometry is already used to describe the spatial extent and is the preferred way of representing extent. The property tern:extent is only used when the value space is a text literal. E.g. 100mx100m as seen in some of the plot datasets.

Update W3ID redirects to point to GH pages instead of https://raw.githack.com/

We are currently using https://raw.githack.com/ with their development URLs for serving the Turtle files out of this repository. The development URLs rate limit the number of requests to the server which may become blockers on TERN's side.

We had to use https://raw.githack.com/ because it served the files back with the correct content-type header while GitHub's raw.githubusercontent.com did not (defaults to text/html).

I checked with GitHub Pages just now and it serves the files back with the correct content-type header for Turtle files (text/turtle). We need to switch the W3ID redirects from the https://raw.githack.com/ service to GH pages.

Over/Under specification of temporal values

A tern:Observation MUST have:

  • exactly 1 sosa:resultTime property indicating a xsd:dateTime literal value
  • exactly 1 sosa:phenomenonTime property indicating a tern:Instant

The sosa:resultTime is ok - literal dateTime(Stamp) value (see Issue $57) - but sosa:phenomenonTime is either over- or under-specified.

For sosa:phenomenonTime, a complex object, an instance of tern:Instant, is required as the range value but then that instance is itself required to have exactly 1 time:inXSDDateTimeStamp property with a literal value. What then is the purpose of requiring a complex object if it's always locked to a literal in that way? Why not just have:

  • exactly 1 sosa:resultTime property indicating a xsd:dateTime literal value
  • exactly 1 sosa:phenomenonTime property indicating a xsd:dateTime literal value

If the purpose of requiring an instance of tern:Instant is to allow for multiple ways of indicating temporality, the tern:Instant property restriction should be removed. If a xsd:dateTime(Stamp) literal value is required, why the tern:Instant?

Separate Shapes from Ontology definitions

Currently the TERN Ontology uses a legitimate but restrictive form of Shapes declarations where the Shapes are bound to class definitions. E.g., for the class tern:Sample:

tern:Sample
  a owl:Class ;
  a sh:NodeShape ;
  ...
  sh:property [
      a sh:PropertyShape ;
      sh:path sosa:isResultOf ;
      sh:class tern:Sampling ;
      sh:minCount 1 ;
      sh:name "is result of" ;
      sh:nodeKind sh:IRI ;
    ] ;
  sh:property [
      a sh:PropertyShape ;
      sh:path sosa:isSampleOf ;
      sh:class tern:FeatureOfInterest ;
      sh:minCount 1 ;
      sh:name "is sample of" ;
      sh:nodeKind sh:IRI ;
    ] ;
.

This pattern, while technically OK in RDF, makes it hard to identify individual requirements because the Shapes aren't distinct from the ontology items, e.g. Class. So a Requirement for the BDR taken from this Shape but articulated in English, such as "All Samples must be the result of exactly one Sampling, indicated with a sosa:isResultOf property", can't be IDed. Again, not an issue in execution of the RDF but a pain for Requirements listing in Specification documents, test cases and so on.

A better pattern would be this:

tern:Sample
  a owl:Class ;
  ...
.

tern:SampleShape
  a sh:NodeShape ;
  sh:property
    tern:SampleIsResultOfShape ,
    tern:SampleIsSampleOfShape ;
.

tern:SampleIsResultOfShape
      a sh:PropertyShape ;
      sh:path sosa:isResultOf ;
      sh:class tern:Sampling ;
      sh:minCount 1 ;
      sh:name "is result of" ;
      sh:nodeKind sh:IRI ;
.

tern:SampleIsSampleOfShape
      a sh:PropertyShape ;
      sh:path sosa:isSampleOf ;
      sh:class tern:FeatureOfInterest ;
      sh:minCount 1 ;
      sh:name "is sample of" ;
      sh:nodeKind sh:IRI ;
.

With this form, I can indicate specific Shapes by their unique IRIs.

Also we - BDR builders - have a strong preference for separating ontology from Shapes. This is due to the way our BDR application layer (SURROUND Ontology Platform) manages different assets, so, if the above split can be implemented, could the Shapes all be in a separate file to the ontology, perhaps:

  • tern.ttl - ontology
  • tern.shapes.ttl - shapes

You can, of course, combine them for your tooling (e.g. the documentation tool used for https://linkeddata-dev.tern.org.au/tern-ontology).

This linked Shape/ontology patterning also prevents the reuse of Property Shapes, e.g. Observation, SiteVisit and FeatureOfInterest all have the following Shape:

  sh:property [
      a sh:PropertyShape ;
      sh:path <http://rdfs.org/ns/void#inDataset> ;
      sh:class tern:RDFDataset ;
      sh:maxCount 1 ;
      sh:minCount 1 ;
      sh:nodeKind sh:IRI ;
    ] ;

But this is re-articulated 3 times because, in its current form, it can't be reused.

Best practices around non-persistent URIs without using blank nodes

Avoiding blank nodes simplifies queries and downstream applications. When we validate data using SHACL, we produce sub-graphs to avoid loading the entire knowledge graph into memory. Having blank nodes complicates this and makes it very hard to scale, as each target class for SHACL validation requires modification to the SPARQL query.

Our current solution is to replace all uses of blank nodes with a non-persistent URI. All individuals use the base URI just like persistent and named individuals except their local name is prefixed with an underscore (_).

Example:

http://linked.data.gov.au/dataset/ausplots-rangelands/_bc4a8844-cb51-49f6-837b-d5b8f884af6e
......................................................^....................................

The nice thing with this is that all individuals in our knowledge graph are now named, therefore we can retrieve data from a SPARQL endpoint at any granularity level as we wish (example use case - using a graph viewer like Ontodia). The downside to this practice is that we are using a non-standard way of representing unnamed individuals (blank nodes). And if people are unfamiliar with our practice, they may interpret it as a persistent name.
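A minimal sketch of the naming convention described above, assuming slash-terminated base URIs and UUID local names as in the example. The helper names `mint_non_persistent_uri` and `is_non_persistent` are hypothetical and are not part of any TERN tooling.

```python
import uuid
from urllib.parse import urlsplit

# Base URI taken from the example above; any slash-terminated base would do.
BASE = "http://linked.data.gov.au/dataset/ausplots-rangelands/"


def mint_non_persistent_uri(base=BASE):
    """Mint a URI whose local name is a random UUID prefixed with an underscore."""
    return f"{base}_{uuid.uuid4()}"


def is_non_persistent(uri):
    """True if the last path segment of the URI starts with an underscore."""
    local_name = urlsplit(uri).path.rsplit("/", 1)[-1]
    return local_name.startswith("_")
```

A check like `is_non_persistent` is one way a consumer who knows the convention could tell the two kinds of individuals apart; it does nothing for consumers who are unaware of it, which is why the practice still needs documenting.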

We need to document this somewhere.

What is the purpose of Sample geometry?

Since a Sampling has a geometry - where it was done - we can infer that that's where a Sample comes from. What then is the value in recording a location for the Sample? Additionally, I think it misleading to record Sample location as the sample may actually change its place (i.e. when I take it back to the lab).

I think it best to require location for the Sampling (as discussed: either directly with geometry or indirectly via links to a Site or indirectly via other spatial relations to a Feature), to infer the initial location for a Sample from that, and to either discourage or, better, prevent recording a location for a Sample directly.

Observation missing spatiality indicator requirement

tern:Observation instances must have temporality indicated by both a sosa:phenomenonTime & sosa:resultTime property but seem not to require a spatiality indicator.

They are allowed to have one, via geo:hasGeometry but why is this not a requirement?

They are allowed to be linked to a Site via a SiteVisit but this is not required.

If the spatiality of tern:Observation instances is either to be given or to be inferred, perhaps by linking to a tern:Site, then either or both should be required, not neither.

Gaia and SURROUND expect many Observations to not have explicit SiteVisit or even Site information, so it doesn't seem wise to require an Observation → SiteVisit → Site chain. We suggest that spatiality for an Observation should be mandated, not only via geo:hasGeometry but via any other spatial relations property, e.g. the Observation is geo:sfWithin SomeNamedLocation (i.e. not a Site).

Simplify the model by removing convenience predicates and inverse relationships

Simplify the model by removing convenience relationships from the primary entities to sites and site visits. This allows us to have single sources of truth when creating/transforming data into the RDF model.

Also, where possible, remove inverse relationships to form an opinionated way of creating the source data in RDF. This declutters the data graph and promotes interoperability between different data sources. It also makes delta updates to the data graph much easier. If required, inverse relationships can always be produced by materialised rules.

Further reading on best practices https://www.topquadrant.com/modeling-graph-relationships/

Violation of time:Instant + time:inXSDDateTimeStamp ontology requirements

The tern:Instant class is declared a subclass of time:Instant but the attached Shapes allow for values which violate time:Instant requirements.

time:inXSDDateTimeStamp has a range value of xsd:dateTimeStamp and yet the TERN ontology, when requiring use of this property for class tern:Instant allows for xsd:dateTimeStamp & xsd:dateTime:

tern:Instant
  a owl:Class ;
  a sh:NodeShape ;
  rdfs:label "Instant" ;
  rdfs:subClassOf time:Instant ;
  sh:property [
      a sh:PropertyShape ;
      sh:path time:inXSDDateTimeStamp ;
      sh:maxCount 1 ;
      sh:minCount 1 ;
      sh:name "in XSDDate time stamp" ;
      sh:or (
          [ sh:datatype xsd:dateTimeStamp ; ]
          [ sh:datatype xsd:dateTime ; ]
        ) ;
    ] ;
.

I'm pretty sure <x> time:inXSDDateTimeStamp "y"^^xsd:dateTime will fail in some validators derived from OWL TIME.
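The difference between the two datatypes is that xsd:dateTimeStamp makes the time zone part mandatory, while xsd:dateTime leaves it optional. The sketch below illustrates only that difference: it checks for a time zone suffix and does not validate the rest of the lexical form. The helper name `has_explicit_timezone` is hypothetical.

```python
import re

# Time zone suffix that xsd:dateTimeStamp requires: 'Z' or a +hh:mm / -hh:mm offset.
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def has_explicit_timezone(lexical):
    """True if a dateTime lexical form ends with 'Z' or a +hh:mm / -hh:mm offset."""
    return bool(_TZ_SUFFIX.search(lexical))
```

A value such as "2021-03-01T10:00:00" is a valid xsd:dateTime but has no time zone, so it falls outside the value space that the range of time:inXSDDateTimeStamp expects.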

Discussion around specialised classes of `tern:Sample` and relationships to taxon identification

For tern:Sample, we should stick to the same definition as sosa:Sample. Currently, tern:Sample is the class that represents all physical samples. Create a specialised class tern:MaterialSample to represent physical samples. See Darwin Core Terms MaterialSample.

Depending on the Surveillance protocol, other specialised classes of tern:Sample may be required (if they have specific properties). E.g. tern:AnimalSample?

Site's transitively-required sosa:isResultOf is problematic

Site is a subclass of Sample which is a subclass of Result.

Result must have exactly one sosa:isResultOf property.

This is ontologically correct but difficult to work with: what is a sensible sosa:isResultOf for a permanent monitoring station, or even a temporary survey site?

The only sosa:isResultOf object allowances are Observation, Sampling or Attribute, and the last seems oddly placed compared to the first two: the first two are temporal events, while the last is something very different, a property of something else.

So what would make a sensible object value for a Site's sosa:isResultOf property? Sampling events may be carried out with the Site as the FoI but I don't think the Site is a result of those Sampling instances. Is the Site established as a result of an Observation? Of what? An ultimate FoI perhaps that isn't a site itself? Attribute seems irrelevant unless the Site is a result of an Attribute of an ultimate FoI, but this seems very difficult...

Inconsistency of RDFDataset definition

The tern:RDFDataset class is defined as:

?? -- no definition given in the ontology!

Inferring its definition from void:Dataset and dcat:Distribution that it is a subclass of:

  • VoID Dataset: "A set of RDF triples that are published, maintained or aggregated by a single provider."
  • DCAT Distribution: "A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above)."

So we have something like:

A specific representation of a dataset that is a set of RDF triples that are published, maintained or aggregated by a single provider...

But now we have a problem as we can't both conform to VoID's "A set of RDF triples" and DCAT's "...multiple serializations that may differ in various ways, including natural language, media-type or format...". Even if we allowed only RDF formats, what about "A dataset might be available in ... [different forms of] ... temporal and spatial resolution, level of detail or profiles"?

Clearly we have a conflation of a conceptual package of information (VoID Dataset) and a resource (DCAT Distribution). Even though VoID's Dataset is RDF-locked, it's still not the same as a DCAT Distribution.

sosa:resultTime datatype limitations relative to data

The data that we have been investigating for the BDR, in fields equivalent to dwc:eventDate and dwc:dateIdentified, is not precise enough to include a recorded time (just a date). To fulfil the validation requirements on sosa:resultTime we need xsd:dateTime. Is there a particular default time we should use to fulfil the requirement, or can the datatype options be expanded to reflect the data precision more accurately?
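If the datatype options were expanded, a transformation could choose the datatype from the precision of the source value rather than padding a default time. The following is only a sketch under that assumption: `typed_literal` is a hypothetical helper, and Python's ISO 8601 parsing is used as an approximation of the XSD lexical rules, not an exact match for them.

```python
import re
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(lexical):
    """Return (lexical, datatype IRI), choosing xsd:date or xsd:dateTime by precision."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", lexical):
        date.fromisoformat(lexical)  # raises ValueError for e.g. 2021-02-30
        return lexical, XSD + "date"
    datetime.fromisoformat(lexical)  # raises ValueError if not a date-time
    return lexical, XSD + "dateTime"
```

This keeps a date-only value such as "2021-03-01" as an xsd:date instead of turning it into a timestamp with an invented time of day.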

Use `sosa:resultTime` once changes are accepted in SOSA

Original issue: #135
Change request: w3c/sdw-sosa-ssn#38
Original pull request: #137

Changes required:

  • Deprecate tern:resultDateTime
  • Provide deprecation warnings or failures in the form of property shapes to inform users to use sosa:resultTime instead of tern:resultDateTime
  • Update the sh:path value in the two property shapes tern-shacl:sosa-resultTime and tern-shacl:ObservationCollection-resultTime to sosa:resultTime
  • Update OWL restrictions on tern:Observation and tern:Sampling

The value of `tern:Attribute` should not be represented with a `tern:Result`

An act of sampling or observation produces a result whereas an attribute of a feature of interest is just a value. Therefore, the value of an Attribute should have a class representative of a value.

Proposal: create class tern:Value and fix up relationships for tern:Attribute

Question: how to model the current subclasses of tern:Result (tern:Boolean, tern:QuantitativeMeasure, etc)? They are specialisations of tern:Value but they may be a tern:Result. How to model the may part? Do we still need tern:Result?

Is SiteVisit required for things like subplots, quadrats and transects?

Currently, subplots, quadrats and transects of a sampling site are modelled as tern:Site. They are represented using dcterms:isPartOf and dcterms:hasPart.

Observations made within a tern:Site usually expect a relationship to a tern:SiteVisit. It is quite verbose to also have to declare these for the subplots, quadrats and transects within a main sampling site.

In these situations, should we create rules to ensure that if an activity (observation, sampling) has a relationship to a tern:Site, then it also has a relationship to a tern:SiteVisit, and that the tern:SiteVisit links to the top-level tern:Site?
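Such a rule needs a way to find the top-level site for any subplot, quadrat or transect. A minimal sketch of that lookup follows; the `IS_PART_OF` mapping, the `ex:` identifiers and the `top_level_site` helper are all hypothetical, standing in for the dcterms:isPartOf links in the data.

```python
# Hypothetical dcterms:isPartOf links: child site -> parent site.
IS_PART_OF = {
    "ex:quadrat-1": "ex:transect-a",
    "ex:transect-a": "ex:plot-1",
}


def top_level_site(site, is_part_of):
    """Follow dcterms:isPartOf links upwards until a site with no parent is reached."""
    seen = set()
    while site in is_part_of:
        if site in seen:
            raise ValueError(f"cycle in isPartOf chain at {site}")
        seen.add(site)
        site = is_part_of[site]
    return site
```

A validation rule could then compare the site of an activity's tern:SiteVisit with `top_level_site` of the site the activity itself points to.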

Provide guidance for Agent roles

Biodiversity observations data tends to include Observation/Agent relations with named roles, for example this data that TERN has seen, https://github.com/ternaustralia/bdr-ibsa-sample-data/blob/master/source-data/fauna.csv, includes "Author" and Darwin Core includes the following properties:

  • "owner" (via dwc:ownerInstitutionCode)
  • "current holder" (via dwc:institutionCode)
  • "rights holder" (via dwc:rightsHolder)
  • "collector" (via dwc:recordedById)

...and perhaps others.

Can the TERN Ontology, or TERN in general, provide guidance on how to express these relationships? Should it be that dwc or other (which other, dcterms??) properties should be used and then prov:wasAssociatedWith will be inferred, e.g.:

ex:observation-x dwc:recordedById <http://orcid.org/1234-456-789> .

ex:observation-x prov:wasAssociatedWith <http://orcid.org/1234-456-789> .

etc??

If an agent roles vocab is preferred, where is it and how shall it be used?

Remove `tern:Instrument` and use `tern:Sampler` instead

tern:Instrument was created only to have a more idiomatic name than the Sampler class. In last week's meeting, the decision was to remove the tern:Instrument class and use tern:Sampler class directly.

  • Apply the constraints that are on tern:Instrument to tern:Sampler.
  • Rename the property tern:instrumentType to tern:samplerType.
  • Remove tern:usedInstrument and use sosa:madeBySampler instead.

See meeting notes for reference.

Inconsistency with PROV for wasAssociatedWith use

A tern:Observation may have a property prov:wasAssociatedWith which, according to TERN Shapes, may indicate a sh:IRIOrLiteral but PROV-O does not allow for literal use. From PROV:

:wasAssociatedWith
    ...
    rdfs:domain :Activity ;
    ...
    rdfs:range :Agent ;
    rdfs:subPropertyOf :wasInfluencedBy ;
    owl:propertyChainAxiom (:qualifiedAssociation :agent) ;
    ...
.

So you need to either restrict your Shape or invent a looser property.

Alignments to VoID and DCAT for metadata

We have defined tern:RDFDataset as a subclass to dcat:Distribution and void:Dataset via tern:Distribution. tern:Dataset is a subclass of dcat:Dataset.

The individual properties are not finished yet. Basically what is available as basic metadata in void:Dataset will be moved to tern:Distribution, if they are shared.

Relationships between a Distribution and a Dataset

DCAT provides a way to relate Dataset and Distribution using dcat:distribution. But we also need a way to relate a Distribution or RDFDataset to Dataset.

It'd be nice to use dcat:dataset to relate from a Distribution to a Dataset, but its domain is defined as dcat:Catalog.

Diagram of alignment

[Figure: diagram of the alignment; image not reproduced]

nodeKind of geosparql:hasGeometry must be sh:BlankNodeOrIRI not sh:IRI

Shapes like this:

  sh:property [
      a sh:PropertyShape ;
      sh:path geosparql:hasGeometry ;
      sh:class loc:Geometry ;
      sh:name "has geometry" ;
      sh:nodeKind sh:IRI ;
    ] ;

Don't allow us to do this:

<http://example.org/2>
    a tern:Site ;
    geo:hasGeometry [
        geo:asWKT "POINT (145.6075 -35.2383)" ;
    ] ;
.

So we need this restriction changed to include Blank Nodes.

Remove the use of tern-org

Remove tern-org to enable the use of more general data models which tern-org is based on.

E.g., PROV, schema.org, ORG.

GitHub PR checks don't run on forks

The GitHub PR checks run properly on a normal branching model but they do not get triggered on a forking model.

Need to investigate if it's a permission issue at the GitHub organisations level for ternaustralia, or if it's a permission issue on the repository level or if the workflow actions need to be updated to work for PRs from forks.
