Giter Site home page Giter Site logo

prolog_rdf's Introduction

Prolog-based RDF library

This library provides advanced support for working with RDF in Prolog.

Dependencies

Install SWI-Prolog.

Installation

Install this library:

swipl -g 'pack_install(prolog_rdf)' -t halt

Use

Libraries can be loaded in the following way:

:- [library(rdf_term)].

RDF-specific Prolog types

This library uses the following extended Prolog types in the documentation headers of predicates:

Type Definition
rdf_bnode An atom that starts with _:.
rdf_graph Either a term of type iri or the atom default.
rdf_iri An atom that can be decomposed with uri_components/2 from library(uri).
rdf_literal A compound term of the form literal(type(iri,atom)) or literal(lang(atom,atom)).
rdf_name An RDF name (IRI or literal).
rdf_quad A compound term of the form rdf(rdf_nonliteral,iri,rdf_term,rdf_graph).
rdf_term An RDF term (blank node, IRI, or literal).
rdf_triple A compound term of the form rdf(rdf_nonliteral,iri,rdf_term).
rdf_tuple A term of type rdf_quad or rdf_triple.

Modules

This section enumerates the various modules that are included in this library.

library(rdf_clean)

This module contains data cleaning predicates that were previously part of LOD Laundromat. They can be used to clean RDF tuples that are streamed from an RDF source. See module [[semweb/rdf_deref]] for creating streams over RDF sources.

In order to use this module, library prolog_uriparser must be installed.

Blank node cleaning

The parsers in the Semantic Web standard library emit blank node labels that contain characters that are not allowed in standards-compliant output formats (e.g., forward slashes). This is unfortunate, since writing the data into standard-compliant formats requires maintaining a state that ensures that Prolog internal blank node labels are consistently emitted by the same standard-compliant external blank node label. See this Github issue for context.

Besides the above considerations, blank nodes form a scalability issue in general. Since blank node labels are only guaranteed to be unique within the context of an RDF document, combining data from multiple documents requires a check of all blank node labels in the to be combined documents. Furthermore, all blank node labels that appear in more than one RDF document must be consistently renamed prior to combining the data.

Since Pro-RDF focusses on scalability, it cannot rely on maintaining an internal state that consistently maps internal Prolog blank node labels to external standards-compliant blank node labels. For the same reasons, it also cannot rely on full document inspection and blank node relabeling approaches. For these reasons, the data cleaning prediates in library(rdf_clean) replace blank nodes with well-known IRIs, in line with the RDF 1.1 standard. This means that every data cleaning predicate must bind a valid well-known IRI to the BNodePrefix argument. It also means that Prolog internal blank node labels are hashed using the MD5 algorithm to provide the local names for the generated well-known IRIs. The latter ensures consistent relabeling without maintaining an internal state.

Graph cleaning

The parsers from the Semantic Web standard library denote the default graph with atom user. This is translated to atom default. For named graphs, this library checks whether they are well-formed IRIs.

IRI cleaning

IRI cleaning is the most difficult part of syntactic RDF data cleaning. To date, the IRI grammar (RFC 3987) has not yet been implemented. Since this grammar was published over a decade ago, we must anticipate a future in which the main syntactic component of the Semantic Web cannot be validated.

While there are implementations of the URI grammar (RFC 3986), the one provided by the SWI-Prolog standard library (library(uri)) is incorrect.

Because of the above two reasons we currently only check the following:

  • Whether an IRI can be decomposed into scheme, authority, path, query, and fragment components using the Prolog standard library grammar (uri_components/2).

  • Whether the scheme, authority, and path components are non-empty.

  • Whether the scheme components conforms to the IRI grammar.

Literal cleaning

For language-tagged strings, cleaning involves downcasing the language tag. While there are implementations of the language tag grammar (RFC 5646), we are not yet using these.

Simple literals, i.e., literals with neither language tag not datatype IRI, are translated to typed literals with datatype IRI xsd:string.

For typed literals, cleaning involves:

  • Cleaning the datatype IRI (see [[IRI cleaning]]).

  • Making sure the datatype IRI is not rdf:langString.

  • Cleaning the lexical form according to the datatype IRI. Lexical form cleaning is the most involved step, since there are many different datatype IRIs. Since it is impractical to implement lexical form cleaning for all datatype IRIs, we focus on those that are most widely used. For this we use rdf_literal_value/3, which is part of library library(semweb/rdf_term).

Predicates

This module provides the following predicates.

rdf_clean_quad(+Site:uri, +Dirty:rdf_quad, -Clean:rdf_quad)

Cleans quadruple compound terms.

rdf_clean_triple(+Site:uri, +Dirty:rdf_triple, -Clean:rdf_triple)

Cleans triple compound terms.

rdf_clean_tuple(+Site:uri, +Dirty:rdf_tuple, -Clean:rdf_tuple)

Cleans quadruple and/or triple compound terms.

library(rdf_deref)

This module implements RDF dereferencing, i.e., the act of obtaining interpreted RDF statements based on a given RDF document, stream, or HTTP(S) URI.

Predicates

This library provides the following predicates.

rdf_deref_file/[2,3]

Calls RDF dereferencing on local RDF documents. Uses heuristics in order to determine the RDF serialization of the file.

rdf_deref_stream/[3,4]

Performs RDF dereferencing on an input stream containing one of the standardized RDF serialization formats.

rdf_dered_uri/[2,3]

Performs RDF dereferencing on a URI, typically an HTTP(S) URI. Uses heuristics in order to determine the RDF serialization of the reply body.

library(rdf_dot)

This library provides primitives for generating GraphViz DOT exports of RDF terms and tuples.

This module requires library prolog_graphviz to be installed.

library(rdf_export)

This module writes RDF data in a simple and standards-compliant serialization format. It contains the following predicates:

  • rdf_write_iri/2
  • rdf_write_literal/2
  • rdf_write_name/2
  • rdf_write_quad/[2,3,5]
  • rdf_write_triple/[2,4]
  • rdf_write_tuple/2

library(rdf_guess)

This module peeks at the beginning of a file, stream, or string in order to heuristically guesstimate the RDF serialization formats (if any) containing in that input:

  • rdf_guess_file/3
  • rdf_guess_stream/3
  • rdf_guess_string/2

library(rdf_media_type)

This module provides support for the standardized RDF serialization format Media Types:

rdf_file_name_media_type/2

Guesses the RDF serialization format based on the file name extension alone.

rdf_media_type/1

Enumerates all standardized RDF Media Types.

'rdf_media_type_>'/2

Succeeds if the former argument is an RDF Media Type that syntactically encompasses the latter argument (e.g., TriG > Turtle > N-Triples, N-Quads > N-Triples).

rdf_media_type_extension

Gives a standard file name extension for RDF serializations that are not RDFa (which is part of HTML or XHTML content).

rdfa_media_type/1

Succeeds for RDFa Media Types.

library(rdf_prefix)

This module provides extended support for working with RDF prefix declarations:

rdf_prefix/[1,2]

Enumerates the currently declared RDF prefix declarations.

rdf_prefix_any/2

rdf_prefix_append/[2,3]

rdf_prefix_iri/[2,3]

Succeeds for (alias,local-name) pairs and full IRIs.

rdf_prefix_maplist/[2,3]

rdf_prefix_member/2

rdf_prefix_memberchk/2

Provide the corresponding popular Prolog predicates, but apply RDF prefix notation expansion on their arguments.

RDF prefix expansion must be specifically declared for arguments in predicates. In the SWI-Prolog standard libraries, such declarations have only been added for predicates in the Semantic Web libraries, but not for predicates in other standard libraries. For example, the following will not check whether P is bound to either of the four RDFS properties, because the prefix notation is not expanded:

memberchk(P, [rdfs:domain,rdfs:range,rdfs:subClassOf,rdfs:subPropertyOf]),

With the SWI-Prolog standard library, the above call must be spelled out using rdf_equal/2 in the following way:

(   rdf_equal(P, rdfs:domain)
->  true
;   rdf_equal(P, rdfs:range)
->  true
;   rdf_equal(P, rdfs:subClassOf)
->  true
;   rdf_equal(P, rdfs:subPropertyOf)
->  true
)

When library(rdf_prefix) is loaded, the above can be written as follows:

rdf_prefix_memberchk(P, [rdfs:domain,rdfs:range,rdfs:subClassOf,rdfs:subPropertyOf]),

rdf_prefix_selectchk/3

rdf_prefix_term/2

rdf_register_prefix/[1-3]

rdf_register_prefixes/0

library(rdf_print)

This module provides DCG rules for printing RDF terms and tuples.

library(rdf_term)

This module provides advanced support for composing, decomposing, parsing, and generating RDF terms.

prolog_rdf's People

Contributors

wouterbeek avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

prolog_rdf's Issues

Re-implement updating datetime literals.

    rdf_update_now/3, % +Term:rdf_term
                      % +Predicate:iri
                      % +Graph:atom
    rdf_update_today/3 % +Term:rdf_term
                       % +Predicate:iri
                       % +Graph:atom
%! rdf_update_now(+Term:rdf_term, +Predicate:iri, +Graph:atom) is det.

rdf_update_now(S, P, Graph):-
  rdf_retractall_datatype(S, P, xsd:dateTime, Graph),
  rdf_assert_now(S, P, Graph).


%! rdf_update_today(+Term:rdf_term, +Predicate:iri, +Graph:atom) is det.

rdf_update_today(Term, P, Graph):-
  rdf_retractall_datatype(Term, P, xsd:date, Graph),
  rdf_assert_today(Term, P, Graph).

Depends on #3 .

Add support for Geography Markup Language (GML)

Module semweb/rdf_geo currnetly supported Well-Known Text (WKT), but not Geography Markup Language (GML), which is also part of the GeoSPARQL standard. Let's add this support to implement more of this standard.

Merge plRdf_api(rdfs_read)

Merge plRdf_api(rdfs_read) with entailment modules or with semweb/rdfs.

Implements backward chaining but unsure about correctness, completeness.

Add a module for operations on RDF graphs

This module should incluce the following predicates:

  • rdf_isomorphic_graphset/2 determines whether two RDF graphs are
    isomorphic or not, based on the Prolog notion of a /variant/.

  • rdf_triples_graphset/2 translates a list of triples into an
    ordered graph set.

How to run tests?

I see that there are some tests in the test directory. I would like to know how to run them (could be documented in the README).

Re-implement RDF literal updating.

:- module(
  rdf_literal_build,
  [
    rdf_update_literal/7 % ?Subject:oneof([bnode,iri])
                         % ?Predicate:iri
                         % ?FromLexicalForm:atom
                         % ?FromDatatypeIri:iri
                         % ?FromLanguageTag:atom
                         % ?Graph:atom
                         % +Action:compound
  ]
).

/** <module> RDF literal: Build

Support for asserting/retracting triples with literal object terms.

@author Wouter Beek
@version 2013/10, 2014/03, 2014/10
*/

:- use_module(library(semweb/rdf_db)).

:- use_module(generics(meta_ext)).

:- use_module(plRdf(api/rdf_build)).
:- use_module(plRdf(term/rdf_literal)).

:- rdf_meta(rdf_update_literal(r,r,?,r,?,?,t)).
:- rdf_meta(rdf_update_literal_preview(r,r,?,r,?,?,+)).
:- rdf_meta(rdf_update_literal_rule(?,?,:,:,:)).



%! rdf_convert_literal(
%!   +FromLexicalForm:atom,
%!   +FromDatatypeIri:iri,
%!   ?FromLanguageTag:atom,
%!   -ToLexicalForm:atom,
%!   ?ToDatatypeIri:iri,
%!   ?ToLanguageTag:atom
%! ) is det.
% If `ToDatatype` is uninstantiated, it is equated to `FromDatatype`.

rdf_convert_literal(
  FromLexicalForm, FromDatatype, FromLangTag,
  ToLexicalForm,   ToDatatype,   ToLangTag
):-
  atom(FromLexicalForm),
  nonvar(FromDatatype),
  default(FromDatatype, ToDatatype),
  rdf_literal_map(FromLexicalForm, FromDatatype, FromLangTag, Value),
  rdf_literal_map(ToLexicalForm,   ToDatatype,   ToLangTag,   Value).



%! rdf_update_literal(
%!   ?Subject:oneof([bnode,uri]),
%!   ?Predicate:iri,
%!   ?LexicalForm:atom,
%!   ?Datatype:iri,
%!   +LangTag:list(atom),
%!   +Graph:atom,
%!   +Action:compound
%! ) is det.
% Updates triples with literal objtect terms.
%
% The following actions are supported:
%   - `datatype(+Iri:atom)`
%   - `langtag(+LangTag:list(atom))`
%   - `lexical_form(+LexicalForm:atom)`
%   - `lexical_form_convert(+callable)`
%     Updates only the lexical form of matching literals algorithmically.
%     `Goal` takes the additional arguments `FromLexicalFrom`
%     and `ToLexicalForm`.
%   - `lexical_form_split(+Split:atom)`
%     Splits lexical forms, resulting into a separate triple for each split.

rdf_update_literal(Term, P, LexicalForm, Datatype, LangTag, Graph, Action):-
  rdf_update_literal_rule(
    [S,P,LexicalForm,Datatype,LangTag,Graph,Action],
      _,
      Antecedent,
      Consequent,
      _
    )
  ),
  forall(Antecedent, Consequent).


%! rdf_update_literal_rule(
%!   ?Match:list,
%!   ?Header:list,
%!   :Antecedent,
%!   :Consequent,
%!   :Preview
%! ) is nondet.
% Rules that can be used by:
%   * rdf_update_literal/7
%   * rdf_update_literal_preview/7

% Replace literal components.
rdf_update_literal_rule(
  [S,P,FromLexicalForm,FromDatatype,FromLangTag,G,
      literal(ToLexicalForm,ToDatatype,ToLangTag)],
  [S,P,FromLiteral,ToLiteral,G],
  rdf_literal(S, P, FromLexicalForm, FromDatatype, FromLangTag, G),
  (
    rdf_convert_literal(
      FromLexicalForm,
      FromDatatype,
      FromLangTag,
      ToLexicalForm,
      ToDatatype,
      ToLangTag
    ),
    rdf_retractall_literal(
      S,
      P,
      FromLexicalForm,
      FromDatatype,
      FromLangTag,
      G
    ),
    rdf_assert_literal(S, P, ToLexicalForm, ToDatatype, ToLangTag, G)
  ),
  (
    rdf_literal(FromLiteral, FromLexicalForm, FromDatatype, FromLangTag),
    rdf_literal(ToLiteral,   ToLexicalForm,   ToDatatype,   ToLangTag  )
  )
).
% Transformation of lexical forms.
rdf_update_literal_rule(
  [S,P,FromLexicalForm,Datatype,LangTag,G,lexical_form(Goal)],
  [S,P,FromLiteral,ToLiteral,G],
  rdf_literal(S, P, FromLexicalForm, Datatype, LangTag, G),
  (
    call(Goal, FromLexicalForm, ToLexicalForm),

    % The lexical form must be altered.
    FromLexicalForm \== ToLexicalForm,

    % Make sure the altered lexical form still results in an valid literal.
    rdf_literal_map(ToLexicalForm, Datatype, LangTag, _),

    % Assert the new lexical form and retract the old one.
    rdf_assert_literal(    S, P, ToLexicalForm,   Datatype, LangTag, G),
    rdf_retractall_literal(S, P, FromLexicalForm, Datatype, LangTag, G)
  ),
  (
    rdf_literal(FromLiteral, FromLexicalForm, Datatype, LangTag),
    rdf_literal(ToLiteral,   ToLexicalForm,   Datatype, LangTag)
  )
).
% Splitting of lexical forms.
rdf_update_literal_rule(
  [S,P,FromLexicalForm,Datatype,LangTag,G,split_lexical_form(Split)],
  [S,P,FromLiteral,ToLiteral,G],
  rdf_literal(S, P, FromLexicalForm, Datatype, LangTag, G),
  (
    atomic_list_concat(ToLexicalForms, Split, FromLexicalForm),

    % The split must be non-trivial.
    ToLexicalForms \== [_],

    % Make sure the altered lexical forms still result in valid literals.
    forall(
      member(ToLexicalForm, ToLexicalForms),
      rdf_literal_map(ToLexicalForm, Datatype, LangTag, _)
    ),

    % Assert all the splitted lexical forms.
    forall(
      member(ToLexicalForm, ToLexicalForms),
      rdf_assert_literal(    S, P, ToLexicalForm,   Datatype, LangTag, G)
    ),
    % Retract the single non-splitted lexical form.
    rdf_retractall_literal(S, P, FromLexicalForm, Datatype, LangTag, G)
  ),
  (
    rdf_literal(FromLiteral, FromLexicalForm, Datatype, LangTag),
    rdf_literal(ToLiteral,   ToLexicalForm,   Datatype, LangTag)
  )
).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.