rdf's People

Contributors: aamedina
rdf's Issues

Investigate Asami for MOP Bootstrapping environment

https://github.com/quoll/asami

I used XTDB when bootstrapping because I needed a datalog database without a schema, so I could prepare one for Datomic based on the RDF models required by the user. XTDB also supports full-text search with Lucene, which has proven useful for exploring datasets. Asami, however, supports the open-world assumption by default and offers transitive queries out of the box. Furthermore, it supports CLJS. I think it has significant potential to be a superior bootstrapping environment to XTDB for RDF data in Clojure.

I think having the ability to choose a bootstrapping database for users would also be a good enhancement in general.
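A minimal sketch of what this could look like, assuming Asami's asami.core API (connect/transact/q) and its transitive attribute-path syntax (the `+` suffix on an attribute in a where clause):

```clojure
(require '[asami.core :as d])

;; In-memory connection for experimentation; :mem could be :local.
(def conn (d/connect "asami:mem://bootstrap"))

;; Asami is schemaless, so raw triples with keywords in any
;; position can be asserted directly.
@(d/transact conn
   {:tx-triples [[:schema/Movie        :rdfs/subClassOf :schema/CreativeWork]
                 [:schema/CreativeWork :rdfs/subClassOf :schema/Thing]]})

;; Transitive query out of the box: the + suffix walks
;; :rdfs/subClassOf to any depth.
(d/q '[:find [?super ...]
       :where [:schema/Movie :rdfs/subClassOf+ ?super]]
     (d/db conn))
;; e.g. a collection containing :schema/CreativeWork and :schema/Thing
```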

Use RDFa and JSON-LD vocabulary to describe all Clojure metadata

Problem:
Right now I use a made-up attribute, :rdf/ns-prefix-map, to annotate a namespace with its prefixes. This is non-standard and can't be represented in RDF automatically.

Solution:
Instead, I think it should be something I can embed in the :owl/Ontology instance directly, using RDFa combined with JSON-LD vocabulary.

{:rdf/type :owl/Ontology
 "@context" {"ex" "http://example.com/"}}

The "@context" key can be processed by JSON-LD processors to expand into a JSON-LD context.

Alternative:
I was also thinking of using a reader macro like #jsonld/Context {...}, but including that in the ontology is too verbose, I think.

{:rdf/type :owl/Ontology
 :jsonld/context #jsonld/Context {"ex" "http://example.com/"}}

The reader macro would expand into something like:

{:rdf/type          :jsonld/Context
 :jsonld/definition [{:rdf/type    :jsonld/TermDefinition
                      :jsonld/term "ex"
                      :jsonld/id   {:rdfa/uri "http://example.com/"}}]}
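A sketch of the reader function behind such a macro (hypothetical names; it would be registered in data_readers.clj as {jsonld/Context my.ns/read-context}):

```clojure
(defn read-context
  "Reader for #jsonld/Context: expands a prefix map into a
   :jsonld/Context metaobject."
  [prefix-map]
  {:rdf/type          :jsonld/Context
   :jsonld/definition (mapv (fn [[term iri]]
                              {:rdf/type    :jsonld/TermDefinition
                               :jsonld/term term
                               :jsonld/id   {:rdfa/uri iri}})
                            prefix-map)})
```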

Questions:
Does sh:declare fit into this?

Separate class finalization from Universal Translator

Problem:
The Universal Translator component does too much right now. If people want to use a metaobject protocol, it isn't clear how to configure it effectively. It was useful while developing the MOP, but it now makes it harder to introduce the technology to other people. Instead, I want to perform the minimum amount of metaobject inferencing when bootstrapping, and move the full MOP (which does class finalization on every loaded OWL class) into its own component. I originally did write it like this, but the cyclic dependencies between the RDF module and the MOP module made it convenient to move everything into one library.

Solution:
The essential elements of the MOP I have found are:

  • compute-class-precedence-list / class-precedence-list
  • compute-slots / class-slots / class-direct-slots
  • class-direct-subclasses / class-direct-superclasses

In the MOP RDF vocabulary these are represented by the following RDF properties:

  • :mop/classPrecedenceList
  • :mop/classSlots / :mop/classDirectSlots
  • :mop/classDirectSubclasses

class-direct-superclasses just looks at :rdfs/subClassOf, so it doesn't need to be materialized beyond RDFS.

The approach I am considering taking during bootstrap is as follows:

  1. Gather your vocabulary by requiring Clojure namespaces that represent RDF models
  2. Transact all of the vocabulary into Asami local storage "asami:local://.vocab"
  3. Query Asami for all classes (not individuals)
  4. Run the "poor man's" finalize in parallel, materializing those essential properties per class.
  5. Transact the updated metaobjects into Asami which becomes your RDF graph
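The steps above might be sketched as follows, assuming Asami's asami.core API; `vocabulary` and `finalize` are hypothetical names:

```clojure
(require '[asami.core :as d])

;; 1-2. Transact the gathered vocabulary (entity maps from required
;;      namespaces) into local Asami storage.
(def conn (d/connect "asami:local://.vocab"))
@(d/transact conn {:tx-data vocabulary}) ; `vocabulary` is assumed

;; 3. Query for all classes (not individuals).
(def classes
  (d/q '[:find [?class ...]
         :where [?class :rdf/type :owl/Class]]
       (d/db conn)))

;; 4. "Poor man's" finalize in parallel; `finalize` is a hypothetical
;;    function that materializes :mop/classPrecedenceList,
;;    :mop/classSlots, :mop/classDirectSlots, and
;;    :mop/classDirectSubclasses for one class.
(def metaobjects (pmap finalize classes))

;; 5. Transact the updated metaobjects back in; this database is now
;;    your RDF graph.
@(d/transact conn {:tx-data (vec metaobjects)})
```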

Document RDF/EDN

Problem:
I need a way to serialize arbitrary RDF models as EDN.

Solution:
I use Apache Jena and Aristotle to read from some RDF source (JSON-LD, Turtle, RDF/XML, etc.) and convert the triples into maps grouped by subject. Blank nodes are represented by subjects with a :db/id, and named resources are given a :db/ident if and only if a prefix mapping exists at the time the RDF model is parsed by Jena. If no prefix mapping is found, resources are identified by :rdfa/uri.

Typed literals are read as EDN tagged literals using the IRI of the datatype as the tag.

Language tagged literals are read as #rdf/langString "hello@en".

#rdf/List [:owl/Class :rdfs/Class]  
=> 
{:rdf/type :rdf/List,
 :rdf/first :owl/Class,
 :rdf/rest
 {:rdf/type :rdf/List, :rdf/first :rdfs/Class, :rdf/rest :rdf/nil}}
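The reading direction might be sketched like this, assuming Jena's org.apache.jena.rdf.model API; `subject->key`, `predicate->key`, and `object->value` are hypothetical helpers implementing the :db/ident / :rdfa/uri / :db/id rules above:

```clojure
(import '(org.apache.jena.rdf.model ModelFactory))

(defn model->edn
  "Reads an RDF source into maps grouped by subject (sketch)."
  [url lang]
  (let [model (.read (ModelFactory/createDefaultModel) url lang)]
    (->> (iterator-seq (.listStatements model))
         (group-by #(.getSubject %))
         (map (fn [[subject statements]]
                ;; Hypothetical helpers: :db/ident when a prefix
                ;; mapping exists, :rdfa/uri otherwise, :db/id for
                ;; blank nodes; typed and language-tagged literals
                ;; become the tagged literals described above.
                (into {:db/id (subject->key subject)}
                      (for [st statements]
                        [(predicate->key (.getPredicate st))
                         (object->value (.getObject st))])))))))
```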

Subsumes #6

Document exporting RDF graph from Datomic

Problem:
It isn't obvious how to create an RDF graph out of a Datomic database.

Solution:
Document the process and offer a Clojure API that makes this possible with one function, given a database and arguments (time- or transaction-based filters?).
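A sketch of such a one-function API, assuming Datomic's peer API (`datomic.api`'s `datoms` and `since`); `->rdf-term` is a hypothetical mapping from Datomic entities, attributes, and values to RDF terms:

```clojure
(require '[datomic.api :as d])

(defn export-rdf-graph
  "Returns a seq of RDF-style triples from a Datomic database.
   When :since is given, only datoms asserted after that t are
   included."
  [db & {:keys [since]}]
  (let [db (cond-> db since (d/since since))]
    (for [[e a v] (d/datoms db :eavt)]
      ;; `->rdf-term` is hypothetical: it would resolve entity ids to
      ;; :db/ident keywords or :rdfa/uri maps, and box typed literals.
      [(->rdf-term db e) (->rdf-term db a) (->rdf-term db v)])))
```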

Questions:
Do I want to make this a conforming RDFa processor that operates on a "document" which is a Datomic database?

Materialize "The Semantics of Schema Vocabulary" OWL RL subset for Datomic

Problem:
In order to get to OWL RL reasoning I need to bootstrap a Datomic schema that will support the necessary datalog queries.

Proposed solution:
Implement the following inferences over the bootstrapping environment:

scm-cls
  If:   T(?c, rdf:type, owl:Class)
  Then: T(?c, rdfs:subClassOf, ?c)
        T(?c, owl:equivalentClass, ?c)
        T(?c, rdfs:subClassOf, owl:Thing)
        T(owl:Nothing, rdfs:subClassOf, ?c)

scm-sco
  If:   T(?c1, rdfs:subClassOf, ?c2)
        T(?c2, rdfs:subClassOf, ?c3)
  Then: T(?c1, rdfs:subClassOf, ?c3)

scm-eqc1
  If:   T(?c1, owl:equivalentClass, ?c2)
  Then: T(?c1, rdfs:subClassOf, ?c2)
        T(?c2, rdfs:subClassOf, ?c1)

scm-eqc2
  If:   T(?c1, rdfs:subClassOf, ?c2)
        T(?c2, rdfs:subClassOf, ?c1)
  Then: T(?c1, owl:equivalentClass, ?c2)

scm-op
  If:   T(?p, rdf:type, owl:ObjectProperty)
  Then: T(?p, rdfs:subPropertyOf, ?p)
        T(?p, owl:equivalentProperty, ?p)

scm-dp
  If:   T(?p, rdf:type, owl:DatatypeProperty)
  Then: T(?p, rdfs:subPropertyOf, ?p)
        T(?p, owl:equivalentProperty, ?p)

scm-spo
  If:   T(?p1, rdfs:subPropertyOf, ?p2)
        T(?p2, rdfs:subPropertyOf, ?p3)
  Then: T(?p1, rdfs:subPropertyOf, ?p3)

scm-eqp1
  If:   T(?p1, owl:equivalentProperty, ?p2)
  Then: T(?p1, rdfs:subPropertyOf, ?p2)
        T(?p2, rdfs:subPropertyOf, ?p1)

scm-eqp2
  If:   T(?p1, rdfs:subPropertyOf, ?p2)
        T(?p2, rdfs:subPropertyOf, ?p1)
  Then: T(?p1, owl:equivalentProperty, ?p2)

scm-dom1
  If:   T(?p, rdfs:domain, ?c1)
        T(?c1, rdfs:subClassOf, ?c2)
  Then: T(?p, rdfs:domain, ?c2)

scm-dom2
  If:   T(?p2, rdfs:domain, ?c)
        T(?p1, rdfs:subPropertyOf, ?p2)
  Then: T(?p1, rdfs:domain, ?c)

scm-rng1
  If:   T(?p, rdfs:range, ?c1)
        T(?c1, rdfs:subClassOf, ?c2)
  Then: T(?p, rdfs:range, ?c2)

scm-rng2
  If:   T(?p2, rdfs:range, ?c)
        T(?p1, rdfs:subPropertyOf, ?p2)
  Then: T(?p1, rdfs:range, ?c)

scm-hv
  If:   T(?c1, owl:hasValue, ?i)
        T(?c1, owl:onProperty, ?p1)
        T(?c2, owl:hasValue, ?i)
        T(?c2, owl:onProperty, ?p2)
        T(?p1, rdfs:subPropertyOf, ?p2)
  Then: T(?c1, rdfs:subClassOf, ?c2)

scm-svf1
  If:   T(?c1, owl:someValuesFrom, ?y1)
        T(?c1, owl:onProperty, ?p)
        T(?c2, owl:someValuesFrom, ?y2)
        T(?c2, owl:onProperty, ?p)
        T(?y1, rdfs:subClassOf, ?y2)
  Then: T(?c1, rdfs:subClassOf, ?c2)

scm-svf2
  If:   T(?c1, owl:someValuesFrom, ?y)
        T(?c1, owl:onProperty, ?p1)
        T(?c2, owl:someValuesFrom, ?y)
        T(?c2, owl:onProperty, ?p2)
        T(?p1, rdfs:subPropertyOf, ?p2)
  Then: T(?c1, rdfs:subClassOf, ?c2)

scm-avf1
  If:   T(?c1, owl:allValuesFrom, ?y1)
        T(?c1, owl:onProperty, ?p)
        T(?c2, owl:allValuesFrom, ?y2)
        T(?c2, owl:onProperty, ?p)
        T(?y1, rdfs:subClassOf, ?y2)
  Then: T(?c1, rdfs:subClassOf, ?c2)

scm-avf2
  If:   T(?c1, owl:allValuesFrom, ?y)
        T(?c1, owl:onProperty, ?p1)
        T(?c2, owl:allValuesFrom, ?y)
        T(?c2, owl:onProperty, ?p2)
        T(?p1, rdfs:subPropertyOf, ?p2)
  Then: T(?c2, rdfs:subClassOf, ?c1)

scm-int
  If:   T(?c, owl:intersectionOf, ?x)
        LIST[?x, ?c1, ..., ?cn]
  Then: T(?c, rdfs:subClassOf, ?c1)
        T(?c, rdfs:subClassOf, ?c2)
        ...
        T(?c, rdfs:subClassOf, ?cn)

scm-uni
  If:   T(?c, owl:unionOf, ?x)
        LIST[?x, ?c1, ..., ?cn]
  Then: T(?c1, rdfs:subClassOf, ?c)
        T(?c2, rdfs:subClassOf, ?c)
        ...
        T(?cn, rdfs:subClassOf, ?c)
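As an illustration, one rule, scm-sco, as a Datomic datalog query that computes the assertions to transact (a sketch, assuming :rdfs/subClassOf is installed as a ref-typed attribute):

```clojure
(require '[datomic.api :as d])

(defn scm-sco-tx
  "scm-sco: if ?c1 subClassOf ?c2 and ?c2 subClassOf ?c3,
   then assert ?c1 subClassOf ?c3. Returns tx-data."
  [db]
  (for [[c1 c3] (d/q '[:find ?c1 ?c3
                       :where
                       [?c1 :rdfs/subClassOf ?c2]
                       [?c2 :rdfs/subClassOf ?c3]]
                     db)]
    [:db/add c1 :rdfs/subClassOf c3]))
```

Rules like this would need to be applied to a fixpoint: transact, re-query, and repeat until no new triples are produced.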

Rethink and rewrite datomic schema inferencing

Problem:

Right now there are multimethods used to navigate the metaobject type hierarchy, and while this works, I find it cumbersome to write and reason about.

Solution:
Instead, I think the user interface needs to be EDN that expands into multimethod implementations, hiding the multimethods from the user. The hierarchical dispatch can still be useful. Another possibility is adding an interactive mode where, if a schema attribute is ambiguous, the user is prompted to disambiguate.
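A sketch of the idea (the EDN shape and names are assumptions): the user supplies plain data, and the library expands it into defmethod implementations.

```clojure
;; User-facing EDN: dispatch value -> schema fragment (shape assumed).
(def schema-rules
  {:owl/ObjectProperty   {:db/valueType :db.type/ref}
   :owl/DatatypeProperty {:db/valueType :db.type/string}})

;; Hidden machinery: one multimethod, with methods installed from data.
(defmulti infer-schema :rdf/type)

(defmethod infer-schema :default [metaobject] metaobject)

(doseq [[dispatch-val fragment] schema-rules]
  ;; defmethod evaluates its dispatch value, so this installs one
  ;; method per EDN entry; hierarchical dispatch via the metaobject
  ;; type hierarchy still applies.
  (defmethod infer-schema dispatch-val
    [metaobject]
    (merge metaobject fragment)))
```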

Document approach to RDF datatype boxing

Problem:
Boxing RDF literals can be complex and confusing. Most of the time RDF literals are used with vocabularies declaring properties that are well-typed, but occasionally you need to refer to a data value with an IRI, which presents complications for Datomic attributes.

Solution:
Document the current approach which is based on creating a blank node with a single property: the IRI of the XSD datatype and the corresponding value.
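For example, a boxed value looks something like this (illustrative values; the property is the compact IRI of the XSD datatype):

```clojure
;; A boxed :xsd/decimal value: a blank node whose single property is
;; the datatype IRI, paired with the corresponding value.
{:xsd/decimal 1.1M}

;; A boxed :xsd/anyURI value:
{:xsd/anyURI "http://example.com/"}
```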

:xsd/min(max)Inclusive(Exclusive) datomic schema should be a bigdec

Problem:
When I defined the XSD datatype schema for Datomic I used doubles, which forced me to special-case the use of these datatypes as properties when boxing RDF literal numeric values for database serialization. However, Datomic supports bigdec, and I would like to uniformly represent :xsd/decimal as a bigdec when both reading and writing RDF.

Solution:
Update the schema definitions in net.wikipunk.rdf.xsd and the special case boxing logic in rdf.clj, and test bootstrapping the boot + ext JSON-LD context to verify behavior and catch edge cases.
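The change amounts to something like the following (a sketch; the actual attribute definitions in net.wikipunk.rdf.xsd may differ):

```clojure
;; Current shape (sketch): facet attributes stored as doubles.
{:db/ident       :xsd/minInclusive
 :db/valueType   :db.type/double
 :db/cardinality :db.cardinality/one}

;; Proposed: represent :xsd/decimal values uniformly as bigdecs,
;; using Datomic's :db.type/bigdec.
{:db/ident       :xsd/minInclusive
 :db/valueType   :db.type/bigdec
 :db/cardinality :db.cardinality/one}
```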

Automate MOP bootstrap environment with embedded Blazegraph + OWL Axioms

  • The dev system should use a vocabulary written to a Blazegraph journal
  • The environment should support full-text search over metaobjects in Blazegraph and not require re-initialization unless new ontologies are loaded by the user
  • Datafy with materialization using embedded Blazegraph via DESCRIBE, and consider linked-data use cases over HTTP
  • Need to figure out Blazegraph namespace management
  • Is it possible to SPARQL-federate in-process over the namespaces?
  • Redesign the universal translator component around Blazegraph's embedded capabilities; what can it enable when Blazegraph is available in-process to a Clojure program with reasoning?
  • Datomic schema generation should be redesigned so that inference is done on a materialized ontology + RDF in process
  • Is it possible to remove all dependencies EXCEPT Blazegraph and Clojure outside of the main library?
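An in-process DESCRIBE might be sketched like this, assuming Blazegraph's Sesame-style SAIL API (com.bigdata.rdf.sail.BigdataSail / BigdataSailRepository) and the journal-file property key from Blazegraph's configuration docs:

```clojure
(import '(com.bigdata.rdf.sail BigdataSail BigdataSailRepository)
        '(org.openrdf.query QueryLanguage))

(defn describe-resource
  "Evaluates DESCRIBE <iri> against an embedded Blazegraph journal
   and returns the resulting statements (sketch)."
  [journal-file iri]
  (let [props (doto (java.util.Properties.)
                (.setProperty "com.bigdata.journal.AbstractJournal.file"
                              journal-file))
        repo  (doto (BigdataSailRepository. (BigdataSail. props))
                (.initialize))]
    (with-open [conn (.getConnection repo)]
      (let [result (.evaluate (.prepareGraphQuery
                               conn QueryLanguage/SPARQL
                               (str "DESCRIBE <" iri ">")))]
        ;; Materialize statements before the connection closes.
        (loop [acc []]
          (if (.hasNext result)
            (recur (conj acc (.next result)))
            acc))))))
```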

Fully specify what a bootstrapping environment is & MOP vocabulary

Problem:
The mop/*env* is a dynamic variable that specifies an environment where metaobjects are resolved during processing. If it is nil, the assumption is that you are searching required Clojure namespaces for the RDF terms that have names. Usually it is bound to either an XTDB node or Datomic database.

While convenient, having this be a dynamic variable complicates many aspects of the protocol, since the implementation of each method depends on the class of the environment.

Furthermore, I think this is a good time to start thinking about what a fully specified MOP ontology in OWL would look like.
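One way to make the environment explicit rather than dynamic is a protocol whose methods take the environment as an argument (a sketch; the protocol and helper names are hypothetical):

```clojure
;; Hypothetical protocol: the environment is an explicit argument
;; rather than the mop/*env* dynamic variable.
(defprotocol MOPEnvironment
  (find-metaobject [env ident]
    "Resolves the metaobject named by `ident` in this environment."))

;; nil environment: fall back to searching required Clojure
;; namespaces for RDF terms that have names.
(extend-protocol MOPEnvironment
  nil
  (find-metaobject [_ ident]
    (find-ns-metaobject ident))) ; hypothetical helper

;; Further implementations would extend the protocol to XTDB nodes
;; and Datomic databases, keeping each dispatch visible in one place.
```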

Consider :mop/descendants, :mop/ancestors, :mop/parents

I was thinking maybe I should persist the multimethod hierarchy by attaching these relations to each entity, to enable restoration from a versioned database in the cloud. For example, for :schema/Movie:

(parents :schema/Movie)
;;=>
#{:schema/CreativeWork :rdfs/Class}

Could be persisted by attaching :mop/parents to the :schema/Movie entity and transacting that into the db.

Then restoration would be: construct the multimethod hierarchy by querying the database for :mop/parents etc., and reduce the results into {:parents {:schema/Movie ...}}.
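A sketch of the restoration step, assuming a Datomic-style database with :mop/parents persisted as a ref attribute:

```clojure
(require '[datomic.api :as d])

(defn restore-hierarchy
  "Rebuilds a Clojure hierarchy by deriving each entity from its
   persisted :mop/parents."
  [db]
  (reduce (fn [h [child parent]] (derive h child parent))
          (make-hierarchy)
          (d/q '[:find ?child-ident ?parent-ident
                 :where
                 [?child :mop/parents ?parent]
                 [?child :db/ident ?child-ident]
                 [?parent :db/ident ?parent-ident]]
               db)))
```

After restoration, (isa? (restore-hierarchy db) :schema/Movie :schema/CreativeWork) would then hold.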

Document RDFa initial context & JSON-LD context design

I need to write documentation explaining that the net.wikipunk.boot namespace contains the RDFa initial context for "wikipunk" RDF processors.

net.wikipunk.ext contains an extended context that may be included if desired.

The design concept is to treat Clojure data structures, which either describe data themselves or contain metadata in source, as potential sources of RDF triples. Using RDFa vocabulary I can annotate Clojure data and retrieve embedded triples. To do this I needed to settle on some vocabulary that I could rely on the processor being "aware" of, so the RDFa initial context fits the bill.

Usually these prefix mappings exist as blank nodes in a graph, but how do you figure out what they are before you have a graph? To sort through the potential mappings, I treat Clojure namespaces with an :rdf/type of :jsonld/Context as the places where these prefix or term mappings reside. Any required namespaces with this type will be searched for mappings when the system is started.
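An illustrative shape for such a namespace (the exact metadata and mapping shapes are assumptions):

```clojure
;; A namespace whose metadata types it as a JSON-LD context; a
;; processor that requires it can search it for prefix mappings.
(ns my.app.vocab
  "Prefix mappings for an application's RDF processors."
  {:rdf/type :jsonld/Context})

;; One possible place for the mappings themselves (shape assumed).
(def prefixes
  {"ex"     "http://example.com/"
   "schema" "https://schema.org/"})
```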
