pySHACL for yaml? (pyshacl issue, closed)

rdflib commented on June 12, 2024
pySHACL for yaml?

from pyshacl.

Comments (5)

vsoch commented on June 12, 2024

Hey @ashleysommer and @nicholascar, thank you for your prompt and detailed responses! First I'll quickly answer your questions:

> This is really interesting stuff you're working on. If I'm understanding correctly, you're using map2model to generate specification documents in YAML. These specification documents are used to validate schema.org content, but you're looking for a way to validate the YAML specification documents themselves. Please correct me if I've misunderstood that.

I'm generating the specification documents using greatly modified code from the original map2model, provided by the openschemas library that I'm developing. The README in the map2model repo here now just serves as an example for the module here. The user doesn't need to know the complexities of any library or installation, though: I've created a Docker container and associated CI templates, so you can literally just drop some Google Drive exported files into a folder and have it run for you in continuous integration. See an example builder template.

> YAML to JSON-LD: I know YAML can be converted to JSON easily, but I'm not sure if there are tools which can convert your YAML spec documents into semantically valid JSON-LD.

I think this must be possible, because I know that the bioschemas team has (at least it seems like they have) generated json-ld in their repos! Given the need for json-ld, this is something I can add to openschemas. I've created an issue to do that here.

> pySHACL can only validate against SHACL Shape files, with SHACL constraints. You would need a way of converting the rules in your criteria.yaml file to SHACL shapes and SHACL constraints, before you can run the validator against your target.

I think I could totally do this too! I do need to learn more about SHACL though - so I'll need some time to do my homework and read up. Is the best resource the publication, or is there any friendly "getting started" guide? I created an issue for that here as well.

> One of our main drivers for creating pySHACL was actually to allow easier validation of schema.org content, so this is of interest to us. There is already a SHACL Shapes file for Schema.org here: http://datashapes.org/schema (in RDF format) which should be able to validate against normal schema.org data (in JSON-LD format), though we haven't tested that out yet.

This is hugely helpful! My goal as well is to make it easy to write checks / criteria. Today I finished up a first go at a validator where you write the criteria in json (here is the base set for a simple specification) where the function key there is in reference to a python function in the library. But it doesn't have to be in that library, and it doesn't have to be that particular specification.yml! This model makes it easy to write custom validation files, or edit them, and there is even a BasicValidator that doesn't assume a schema yaml at all, and takes the stance of "validate some input file against some yaml specification." I'll have better documentation for this soon, but a basic example looks like this:

image
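
To make the idea of criteria whose function key points at a Python function more concrete, here is a minimal sketch. It is an illustration only: the names `CHECKS`, `has_required_fields`, `name_is_nonempty`, and `run_criteria`, and the layout of the criteria mapping, are all hypothetical and are not the openschemas API.

```python
# Hypothetical sketch: none of these names come from openschemas.


def has_required_fields(spec, fields):
    """Return the required fields that are missing from the spec."""
    return [f for f in fields if f not in spec]


def name_is_nonempty(spec, **_):
    """Return ['name'] if the 'name' field is missing or blank."""
    return [] if str(spec.get("name", "")).strip() else ["name"]


# Registry mapping the 'function' key of a criteria entry to real code.
CHECKS = {
    "has_required_fields": has_required_fields,
    "name_is_nonempty": name_is_nonempty,
}


def run_criteria(spec, criteria):
    """Run every criterion against spec; return {criterion name: problems}."""
    report = {}
    for name, entry in criteria.items():
        check = CHECKS[entry["function"]]
        problems = check(spec, **entry.get("kwargs", {}))
        if problems:
            report[name] = problems
    return report


# In practice both mappings would be loaded from files (YAML or JSON).
criteria = {
    "required": {
        "function": "has_required_fields",
        "kwargs": {"fields": ["name", "version", "description"]},
    },
    "named": {"function": "name_is_nonempty"},
}
spec = {"name": "Container", "version": "0.0.1"}

report = run_criteria(spec, criteria)
print(report)
```

An empty report means the specification passed every criterion. Keeping the checks in a registry is one way to let a criteria file name functions that live outside the library.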

That Container.html is the first specification I'm developing, the one from the spec-container that I linked above. And I find it very funny that I missed so many fields and it's invalid, haha. So what I'm going to do next is add this in continuous integration to the specifications repo so anyone contributing a new one will have these checks! And if there is json-ld generated too I hope I can figure out the pySHACL here so we can add a check for that file too.

@nicholascar I definitely agree about the development here being an extension too - and I'd make it part of openschemas.

Anyway - if it's okay with you it would be helpful to leave this issue open so I can come back with questions as I'm learning these things! Thank you again for your help and interest, it's been super fun so far building all these tools!

ashleysommer commented on June 12, 2024

Hi Vanessa, thanks for creating this issue to share your idea and questions.

This is really interesting stuff you're working on. If I'm understanding correctly, you're using map2model to generate specification documents in YAML. These specification documents are used to validate schema.org content, but you're looking for a way to validate the YAML specification documents themselves. Please correct me if I've misunderstood that.

I think pySHACL might be able to do what you want, but there are some things that need to be in place before that can happen:

  • YAML to JSON-LD: I know YAML can be converted to JSON easily, but I'm not sure if there are tools which can convert your YAML spec documents into semantically valid JSON-LD.
  • pySHACL can only validate against SHACL Shape files, with SHACL constraints. You would need a way of converting the rules in your criteria.yaml file to SHACL shapes and SHACL constraints, before you can run the validator against your target.
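
As a rough sketch of the second point, a "required fields" rule could be turned into a SHACL node shape written as Turtle text. The function name `required_properties_shape`, the `ex:` namespace, the example class, and the choice of `http://schema.org/` as the vocabulary IRI are assumptions made for illustration; only the `sh:` terms (`sh:NodeShape`, `sh:targetClass`, `sh:property`, `sh:path`, `sh:minCount`) come from the SHACL vocabulary.

```python
# Hypothetical sketch: build a SHACL NodeShape (as Turtle text) from a list
# of required property names. Only the standard library is used.

PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix schema: <http://schema.org/> .\n"
    "@prefix ex: <http://example.org/shapes#> .\n"
)


def required_properties_shape(target_class, required):
    """Return Turtle for a shape requiring at least one value per property."""
    if not required:
        raise ValueError("at least one required property is needed")
    lines = [
        PREFIXES,
        f"ex:{target_class}Shape",
        "    a sh:NodeShape ;",
        f"    sh:targetClass schema:{target_class} ;",
    ]
    constraints = [
        f"    sh:property [ sh:path schema:{prop} ; sh:minCount 1 ]"
        for prop in required
    ]
    # Constraints are separated by ';' and the whole shape ends with '.'.
    lines.append(" ;\n".join(constraints) + " .")
    return "\n".join(lines) + "\n"


shape_ttl = required_properties_shape(
    "SoftwareApplication", ["name", "description"]
)
print(shape_ttl)
```

The resulting text could then be parsed into an rdflib graph and used as the shapes graph for the validator; that step is left out here so the sketch stays dependency-free.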

One of our main drivers for creating pySHACL was actually to allow easier validation of schema.org content, so this is of interest to us. There is already a SHACL Shapes file for Schema.org here: http://datashapes.org/schema (in RDF format) which should be able to validate against normal schema.org data (in JSON-LD format), though we haven't tested that out yet.

nicholascar commented on June 12, 2024

Good topic: I suggest that supporting YAML would be an extension to pySHACL, not within the core. The reason is that pySHACL is "as per the spec", i.e. RDF validation by shapes graphs, and I think it should stay there. However, in the same way pySHACL builds on rdflib (and OWL-RL), pySHACL-for-YAML or pySHACL-for-X could be added quite nicely.

As Ashley's already said, there would need to be a defined way of adding the namespacing context to YAML to be able to use SHACL at all, and this may be hard or, if not hard, not widely recognised. If you already have a deterministic YAML/JSON-LD mapping, then this extension is just a format converter for the data graph being validated, as long as you have your validating shapes graph in RDF. If you want both your data and validating shapes graph in YAML, then you'll need to allow that YAML/JSON-LD converter to work on both graphs.

If you don't yet have a YAML/JSON-LD converter, this will be a bit harder as you'll likely have to implement JSON-LD context mechanics in YAML, and check you've not missed other parts of the JSON-LD or RDF specs perhaps unsupported in YAML.
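
For a sense of what the simplest deterministic YAML/JSON-LD mapping might look like, the sketch below attaches a JSON-LD context to a plain mapping, such as the dict a YAML parser returns. The function name `to_jsonld`, the `@vocab`-only context, and the example record are assumptions for illustration; a real converter would also have to handle identifiers, nested nodes, datatypes, and the other JSON-LD features mentioned above.

```python
import json

# Hypothetical context: map every plain key onto the schema.org vocabulary.
CONTEXT = {"@vocab": "http://schema.org/"}


def to_jsonld(record, type_name, context=CONTEXT):
    """Wrap a plain mapping (e.g. parsed from YAML) as a JSON-LD document."""
    doc = {"@context": context, "@type": type_name}
    for key, value in record.items():
        # Keys starting with '@' are reserved for JSON-LD keywords.
        if str(key).startswith("@"):
            raise ValueError(f"key {key!r} clashes with a JSON-LD keyword")
        doc[str(key)] = value
    return doc


# Stand-in for the result of parsing a small YAML document.
record = {"name": "Container", "version": "0.0.1"}

doc = to_jsonld(record, "SoftwareApplication")
print(json.dumps(doc, indent=2))
```

Because the context is supplied by the converter rather than written in the YAML, the same function could in principle be applied to both the data and a shapes document, provided each has its own context.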

ashleysommer commented on June 12, 2024

Closing this issue. Please reopen if there is any further development in this area.

vsoch commented on June 12, 2024

Thanks anyway. The discussion died, so I pursued my goals without these tools. For future readers of this post, you can use the schemaorg Python module, which has a validator for this purpose.
