vemonet / shapes-of-you

An index for linked open data & standard knowledge descriptions (ontologies, vocabularies, shapes, queries, mappings)

Home Page: http://index.semanticscience.org

License: MIT License

CSS 0.20% TypeScript 51.10% Dockerfile 0.41% JavaScript 0.06% Python 46.56% Shell 1.68%
shacl-shapes shapes shacl registry shex sparql grlc owl ontologies skos

shapes-of-you's Introduction

Access standard knowledge indexed from code repositories, connected to the Linked Open Data access points

Access the web app at index.semanticscience.org

Query our knowledge graph using the OpenAPI at grlc.io/api-git/vemonet/shapes-of-you/subdir/api (powered by grlc.io and SPARQL)

Directly query the SPARQL endpoint on YASGUI at https://graphdb.dumontierlab.com/repositories/shapes-registry.

The SPARQL endpoint is also conveniently accessible in the webapp Active endpoints tab, since Shapes of You indexes its own SPARQL query files, and computes metadata for its SPARQL endpoint.

Shapes of You is a global index for semantically descriptive files published to public Git repositories (GitHub, GitLab, and Gitee). It enables semantic web enthusiasts to connect those standard knowledge definitions to active Linked Open Data access points (SPARQL endpoints).

To be found by our indexer, make sure your repository description or topics on GitHub, GitLab, or Gitee include one of the resources mentioned below. We automatically index files from public repositories every week on Saturday at 1:00 GMT+1.

  • SHACL shapes: we index RDF files (such as .ttl, .rdf, .jsonld, etc.), with all sh:NodeShape they contain
  • ShEx expressions: we index .shex files, and ShEx shapes defined in RDF files
  • SPARQL queries: we index .rq and .sparql files, and parse grlc.io APIs metadata
  • OWL ontologies: we index all RDF files with all owl:Class they contain
  • SKOS vocabularies: we index all RDF files with all skos:Concept they contain
  • RML mappings: we index RDF files, with all r2rml:SubjectMap and rml:LogicalSource they contain
  • R2RML mappings: we index RDF files, with all r2rml:SubjectMap they contain
  • CSVW metadata: we index RDF files, with all csvw:Column they contain
  • Nanopublication templates: we index RDF files, with all nt:AssertionTemplate and inputs they contain
  • OBO ontologies: we index all .obo files with all terms they contain
  • OpenAPI specifications: we index .yml, .yaml and .json files, and parse the spec to retrieve API metadata
  • DCAT datasets: we index RDF files, with all dcat:Dataset they contain
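
As a rough illustration of the list above, the file extensions it names can be turned into a first-pass lookup. This is only a sketch: the function name, the category labels and the "RDF file" bucket are assumptions made for the example, not the indexer's actual code, and RDF files still need to be parsed to find out which classes (sh:NodeShape, owl:Class, skos:Concept, etc.) they contain.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions transcribed from the list above. The labels are illustrative only.
RDF_EXTENSIONS = {".ttl", ".rdf", ".jsonld"}
EXTENSION_CATEGORIES = {
    ".shex": "ShEx expressions",
    ".rq": "SPARQL queries",
    ".sparql": "SPARQL queries",
    ".obo": "OBO ontologies",
    ".yml": "OpenAPI specifications",
    ".yaml": "OpenAPI specifications",
    ".json": "OpenAPI specifications",
}


def candidate_category(path: str) -> Optional[str]:
    """Return a first-pass category for a file path, or None if it is not indexed.

    Hypothetical helper: RDF files are only flagged as "RDF file" here, because
    deciding between SHACL, OWL, SKOS, etc. requires parsing their content.
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in RDF_EXTENSIONS:
        return "RDF file"
    return EXTENSION_CATEGORIES.get(suffix)
```

For instance, `candidate_category("queries/get-shapes.rq")` falls in the SPARQL queries bucket, while a file such as `README.md` is ignored.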

If your repository or endpoint is missed by our indexer:

Technical overview

This web service is composed of 4 main parts, described in more detail below:

  • A Python script to retrieve SPARQL queries, SHACL & ShEx shapes files with some metadata from GitHub repositories. The retrieved data is defined using RDF.
    • A GitHub Actions workflow runs every week on Saturday night to execute the Python script, and publish the RDF output to the triplestore
  • A React web app written in TypeScript, which displays the files and metadata from the SPARQL endpoint with filters and search
  • A triplestore with a publicly available SPARQL endpoint at https://graphdb.dumontierlab.com/repositories/shapes-registry
  • A grlc.io powered OpenAPI to query the SPARQL endpoint at http://grlc.io/api-git/vemonet/shapes-of-you
    • Most SPARQL queries used by the webapp are also provided as API calls

Shapes of You architecture


Data model

We defined and published a simple schema for our data as an OWL ontology, mainly re-using schema.org concepts.

Check out the OWL ontology in website/assets/shapes-of-you-ontology.ttl

Here is an overview of the ontology (generated by gra.fo):

Ontology overview

Prefixes

Just copy/paste this if you are missing some prefixes to query the Shapes of You knowledge graph:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/SIO_>
PREFIX schema: <https://schema.org/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX shex: <http://www.w3.org/ns/shex#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
PREFIX sdm: <https://w3id.org/vocab/sdm#>
PREFIX r2rml: <http://www.w3.org/ns/r2rml#>
PREFIX rml: <http://semweb.mmlab.be/ns/rml#>
PREFIX nt: <https://w3id.org/np/o/ntemplate/>
PREFIX csvw: <http://www.w3.org/ns/csvw#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

Classes

  • "Shape" files: schema:SoftwareSourceCode
    • Properties:
      • dct:hasPart
      • rdfs:comment
      • schema:codeRepository > schema:DataCatalog
    • Subclasses:
      • sh:Shape (SHACL shape)
      • shex:Schema (ShEx schema)
      • sh:SPARQLFunction (SPARQL query) - additional properties: void:sparqlEndpoint, schema:query
      • owl:Ontology (OWL ontology)
      • skos:ConceptScheme (SKOS vocabulary)
      • sio:000623 (OBO ontology)
      • schema:APIReference (OpenAPI)
      • rml:LogicalSource (RML and YARRRML mappings)
      • r2rml:TriplesMap (R2RML mappings)
      • nt:AssertionTemplate (Nanopublication templates)
      • dcat:Dataset (DCAT datasets)
  • Git repositories: schema:DataCatalog
    • Properties:
      • rdfs:comment
  • Active SPARQL endpoints: schema:EntryPoint
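
To show how the prefixes and classes above fit together, here is a minimal sketch that lists a few indexed files with their repository, using only the Python standard library and a plain SPARQL 1.1 Protocol GET request. It assumes that files are typed directly as schema:SoftwareSourceCode (they may only carry one of the subclasses listed above), and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint given earlier in this README.
ENDPOINT = "https://graphdb.dumontierlab.com/repositories/shapes-registry"

# Assumption: files are typed directly as schema:SoftwareSourceCode.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
SELECT ?file ?repository ?comment WHERE {
  ?file a schema:SoftwareSourceCode ;
        schema:codeRepository ?repository .
  OPTIONAL { ?file rdfs:comment ?comment }
} LIMIT 10"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request asking the endpoint for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str = QUERY) -> list:
    """Send the query and return the result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query()` returns a list of bindings, where each row exposes `row["file"]["value"]` and `row["repository"]["value"]`, provided the endpoint is reachable.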

Run the web app

Requirements: npm and yarn installed.

In development

Clone the repository:

git clone https://github.com/vemonet/shapes-of-you
cd shapes-of-you

Install dependencies

yarn

Run the web app on http://localhost:19006. It should reload automatically at each change to the code.

yarn dev

Upgrade the package versions in yarn.lock

yarn upgrade

In production

This website is automatically deployed by a GitHub Actions workflow to GitHub Pages which is accessed from http://index.semanticscience.org

You can also build locally in the /web-build folder and serve on http://localhost:5000 (check out the Dockerfile)

yarn build
yarn serve

Deploy the backend

Deploy the Oxigraph triplestore and ElasticSearch index using Docker (requires docker installed)

  1. Make sure the folder for ElasticSearch has the right permissions

mkdir -p /data/shapes-of-you/elasticsearch
sudo chown -R 1000:0 /data/shapes-of-you/elasticsearch

  2. Deploy the stack

docker-compose up -d

Check out the docker-compose.yml file to see how we run the Docker image.


Index structured and semantic files

Requirements: Python 3.6+, git

Index files from code repositories

This script is run every day by the mighty .github/workflows/index-shapes.yml workflow

The Python script retrieves shapes files from the APIs of various popular Git services (GitHub GraphQL API, GitLab API, Gitee API), and generates RDF data. The RDF data is then automatically published to the publicly available triplestore by the GitHub workflow.

You can find the python scripts and requirements in the etl folder.

Use this command to locally define the API_GITHUB_TOKEN, GITLAB_TOKEN and GITEE_TOKEN environment variables required to run the script (you might need to adapt on Windows, but you should know better than me):

export API_GITHUB_TOKEN=MYGITHUBTOKEN000
export GITLAB_TOKEN=MYGITLABTOKEN000
export GITEE_TOKEN=MYGITEETOKEN000

Add those commands to your .zshrc or .bashrc to make them permanent.

For GitHub you can create a new GitHub API key (aka. personal access token) at https://github.com/settings/tokens
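
Before launching the script, it can help to check that the three environment variables described above are actually set. The helper below is a small sketch of such a check. It is not part of the etl code, and its name is hypothetical.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_TOKENS = ("API_GITHUB_TOKEN", "GITLAB_TOKEN", "GITEE_TOKEN")


def missing_tokens(env=None):
    """Return the names of required tokens that are unset or empty.

    `env` defaults to the current process environment. Passing a dict
    makes the check easy to try out without touching real variables.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_TOKENS if not env.get(name)]
```

Calling `missing_tokens()` in the shell where the exports were run should return an empty list. Any name it returns still needs to be exported.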

Go to the etl folder:

cd etl

Install the requirements:

pip install -e .

Retrieve shapes files by searching the GitHub GraphQL API (you can also search by topic, e.g. topic:sparql):

python3 main.py github vemonet/shapes-of-you

Retrieve shapes files from GitLab API using the python-gitlab package:

python3 main.py gitlab sparql

Retrieve shapes files from Gitee API:

python3 main.py gitee ontology

Generate SPARQL endpoints metadata

This task is performed every day by the swifty .github/workflows/analyze-endpoints.yml workflow

We use the d2s tool (aka. data2services) to generate HCLS metadata for a SPARQL endpoint:

pip install d2s
d2s metadata analyze https://graphdb.dumontierlab.com/repositories/shapes-registry -o metadata.ttl

We commit the generated metadata file to the metadata branch, to experiment using git to version and keep track of changes of the metadata generated for the SPARQL endpoints over time.

Enable Virtuoso Linked Data Platform

Enable WebDAV LDP on Virtuoso 7 (from the official Virtuoso documentation)

Start the virtuoso-opensource-7 docker image

docker-compose up -d

The first time you start Virtuoso, or after you reset the database, you will need to run this script to prepare the Linked Data Platform:

./prepare_virtuoso.sh

To prepare for shapes-of-you, create folders github, gitlab, gitee, apis and endpoints using the same owner and permission as for the ldp folder.

Test by uploading a Turtle file to the LDP (change the password first):

curl -u ldp:$ENDPOINT_PASSWORD --data-binary @shapes-rdf.ttl -H "Accept: text/turtle" -H "Content-type: text/turtle" -H "Slug: test-shapes-rdf" https://data.index.semanticscience.org/DAV/home/ldp/github
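
The same upload can be sketched in Python with the standard library. Like curl with --data-binary, it sends an HTTP POST with basic authentication and the Slug header. The function name and its parameters are hypothetical. The container URL, user and headers are the ones from the curl command above.

```python
import base64
import urllib.request


def build_ldp_upload(container_url, turtle_bytes, slug, user, password):
    """Build a POST request that uploads Turtle data to an LDP container.

    The request is only built here. Send it with urllib.request.urlopen().
    """
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        container_url,
        data=turtle_bytes,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Accept": "text/turtle",
            "Content-Type": "text/turtle",
            "Slug": slug,  # name suggested to the server for the new resource
        },
    )
```

To reproduce the curl test, read shapes-rdf.ttl in binary mode, build the request for https://data.index.semanticscience.org/DAV/home/ldp/github with the slug test-shapes-rdf and the ldp user, then pass it to `urllib.request.urlopen`.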

Enable CORS to query the Virtuoso SPARQL endpoint from JavaScript. See the Virtuoso CORS documentation.

  • Go to Web Application Server > Virtual Domains & Directories
  • Expand Interface for the Default Web Site
  • Locate the /sparql Logical Path > click Edit
  • Enter * in the Cross-Origin Resource Sharing input field.

Contribute

Contributions are welcome! See the guidelines to contribute.

Acknowledgements

RDF data hosted in an Oxigraph triplestore (open source)

OpenAPI powered by grlc.io

SPARQL query UI powered by Triply's YASGUI

Ontology built with gra.fo

Data processing workflows run for free using GitHub Actions open source plan

Files parsed using python libraries: rdflib, obonet, prance

shapes-of-you's People

Contributors

daniel-mietchen, dependabot[bot], egonw, vemonet

shapes-of-you's Issues

Add a new page for each file

Describe the bug/improvement

When clicking on a file, open a new page giving more information about the file content and metadata:

  • Retrieve the file metadata and concepts in our graph
  • Download the file, parse it with JavaScript, and display additional relevant infos? e.g. metadata, classes, properties, instances, subClassOf hierarchy...

Implement SOLID

  • Implement SOLID login
  • Store user preferences to their SOLID pod (star shapes)
  • Retrieve user preferences from their SOLID pod
  • Enable user to choose to make their user preferences private (default) or public

Checkout the SOLID React SDK docs: https://docs.inrupt.com/developer-tools/javascript/react-sdk/

Seems only to be UI components to edit the SOLID user profile

Try: https://github.com/solid/react-components (and https://github.com/solid/query-ldflex)

solid.data['https://ruben.verborgh.org/profile/#me']['https://maastrichtu-ids.github.io/shapes-of-you#preferences'] = 1

Replace Like by Star? https://github.com/solid/react-components#-social-interactions

Use this URI to store preferences in SOLID pod: https://maastrichtu-ids.github.io/shapes-of-you#preferences

Example to create a custom Star button:

star.tsx:

import { customActivityButton } from './ActivityButton';

/** Button to view and perform a "Star" action on an item. */
export default customActivityButton('ShapeStar', 'Star', 'You starred', 'Starred');

Then star it:

<Star object="https://shape-file-url">Star Icon</Star>

Fix indexing issues

python3 etl/index_shapes.py gitlab shacl-shapes

Found 3 relevant SPARQL endpoints, but encoding error

Returns:

INFO:[04/19/2021, 09:55:12] SPARQL endpoint failed: http://digital-agenda-data.eu/data/sparqlbyte indices must be integers or slices, not str
INFO:[04/19/2021, 09:55:21] SPARQL endpoint failed: http://el.dbpedia.org/sparqlEndPointInternalError: endpoint returned code 500 and response. Response:b'Virtuoso 42000 Error The estimated execution time 2928956 (sec) exceeds the limit of 900 (sec).\n\nSPARQL query:\nSELECT * WHERE { ?s ?p ?o } LIMIT 10'
INFO:[04/19/2021, 09:55:25] SPARQL endpoint failed: http://cr.eionet.europa.eu/sparqlbyte indices must be integers or slices, not str

Report what tests have been run on a resource

Reporting of tests done

It would be really nice if the GUI would report what tests have been run on a resource, similar to what YummyData does, so that we can learn how resources can be made more FAIR.

Categorize resources

Describe the bug/improvement

Filtering/categorizing files, repositories and SPARQL endpoints based on categories?

Perform community detection?

Improve scalability of the user interface

Describe the bug/improvement

The user interface currently loads the 60k+ files from a simple JSON file, which can slow down or freeze the user's computer.

To scale properly, it would be interesting to look into using a local database. IndexedDB (a W3C recommendation) and Dexie.js seem to be good candidates.

Relevant links

Implement better filtering

Improve repositories and shapes files display and filtering:

  • Add faceted search with categories? (biomedical, organization, etc)
  • Group shapes files under their respective github repository?
  • Enable to sort files by date of last update?

Make it a React component easy to reuse in other apps?

We have a use-case with multiple types of inputs for the different facets (checkboxes for file types, MaterialUI autocomplete for repositories, text box for full text search), and we want the options (e.g. filter displayed repositories based on the search input)

We could not find a good dynamic faceted search, the most relevant options for React are:

Currently we prefer to implement it ourselves (search for "faceted" in the SemanticIndex page). But it could become tricky once we start to have more categories to filter on.

Use Virtuoso instead of GraphDB as triplestore

Use Virtuoso WebDAV to easily upload RDF files through HTTP.

We might need to play around with the configuration, as the current Virtuoso version (7) does not include LDP capabilities out-of-the-box

GitHub repo with some instructions to enable LDP in Virtuoso: https://github.com/markwilkinson/to_build_virtuoso_ldp
With the corresponding image on DockerHub: https://hub.docker.com/r/markw/ldp_server

Note: delete a file from the webDAV with iSQL

select DB.DBA.DAV_DELETE ('/DAV/home/dba/rdf_sink/gitee.ttl', 0, 'dba', 'dba');

Trying with current Virtuoso deployment

Create a folder:

curl -iX PUT -u dav:password -H 'Content-Type: text/turtle' https://data.index.semanticscience.org/DAV/home/dav/rdf_sink/test-folder

Upload with HTTP:

curl --verbose -iX PUT -F "[email protected]" -u dav:password -H 'Content-Type: text/turtle' https://data.index.semanticscience.org/DAV/home/dav/rdf_sink/shapes-of-you.ttl

Error (even if the prefix is properly defined in the file):

<p>TURTLE RDF loader, line 6: SP029: TURTLE RDF loader, line 6: Undefined namespace prefix at ns1:EntryPoint</p>%
