A general Inference API based on two of the most popular Big Data processing engines: Apache Spark and Apache Flink

License: Apache License 2.0

reasoning semantic-web owl rdfs spark flink distributed-computing

sansa-inference's Introduction

Archived Repository - Do not use this repository anymore!

SANSA got easier to use! All its code has been consolidated into a single repository at https://github.com/SANSA-Stack/SANSA-Stack

SANSA Inference Layer



Structure

sansa-inference-common

  • common data structures
  • rule dependency analysis

sansa-inference-spark

Contains the core Inference API based on Apache Spark.

sansa-inference-flink

Contains the core Inference API based on Apache Flink.

sansa-inference-tests

Contains common test classes and data.

Setup

Prerequisites

  • Maven 3.x
  • Java 8
  • Scala 2.11 (support for Scala 2.12 will follow once Spark itself has moved to Scala 2.12)
  • Apache Spark 2.x
  • Apache Flink 1.x

From source

To install the SANSA Inference API, you need to download it via Git and install it via Maven.

git clone https://github.com/SANSA-Stack/SANSA-Inference.git
cd SANSA-Inference
mvn clean install
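
If you want to speed up the build, the test suites can be skipped via the standard Maven flag:

mvn clean install -DskipTests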

Afterwards, you have to add the dependency to your pom.xml

For Apache Spark

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-inference-spark_2.11</artifactId>
  <version>VERSION</version>
</dependency>

and for Apache Flink

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-inference-flink_2.11</artifactId>
  <version>VERSION</version>
</dependency>

with VERSION being the released version you want to use.

Using Maven pre-built artifacts

The latest release is available in Maven Central, thus, you only have to add the following dependency to your pom.xml:

For Apache Spark

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-inference-spark_2.11</artifactId>
  <version>0.6.0</version>
</dependency>

and for Apache Flink

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-inference-flink_2.11</artifactId>
  <version>0.6.0</version>
</dependency>

Using SBT

Add the following lines to your SBT file:

For Apache Spark add

libraryDependencies += "net.sansa-stack" % "sansa-inference-spark_2.11" % "0.6.0"

and for Apache Flink add

libraryDependencies += "net.sansa-stack" % "sansa-inference-flink_2.11" % "0.6.0"

Using Snapshots

Snapshot versions are only available via our custom Maven repository located at http://maven.aksw.org/archiva/repository/snapshots.
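
To resolve snapshots, this repository has to be added to your build. For SBT this could look like the following (the resolver name is arbitrary):

resolvers += "AKSW Snapshots" at "http://maven.aksw.org/archiva/repository/snapshots"

For Maven, a corresponding <repository> entry with this URL has to be added to the <repositories> section of your pom.xml.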

Usage

Besides using the Inference API in your application code, we also provide a command line interface with various options that allow for a convenient way to use the core reasoning algorithms:

RDFGraphMaterializer 0.6.0
Usage: RDFGraphMaterializer [options]

  -i, --input <path1>,<path2>,...
                           path to file or directory that contains the input files (in N-Triples format)
  -o, --out <directory>    the output directory
  --properties <property1>,<property2>,...
                           list of properties for which the transitive closure will be computed (used only for profile 'transitive')
  -p, --profile {rdfs | rdfs-simple | owl-horst | transitive}
                           the reasoning profile
  --single-file            write the output to a single file in the output directory
  --sorted                 sorted output of the triples (per file)
  --parallelism <value>    the degree of parallelism, i.e. the number of Spark partitions used in the Spark operations
  --help                   prints this usage text

This can easily be used when submitting the Job to Spark (resp. Flink), e.g. for Spark

/PATH/TO/SPARK/sbin/spark-submit [spark-options] /PATH/TO/INFERENCE-SPARK-DISTRIBUTION/FILE.jar [inference-api-arguments]

and for Flink

/PATH/TO/FLINK/bin/flink run [flink-options] /PATH/TO/INFERENCE-FLINK-DISTRIBUTION/FILE.jar [inference-api-arguments]

In addition, we also provide Shell scripts that wrap the Spark (resp. Flink) deployment and can be used by first setting the environment variable SPARK_HOME (resp. FLINK_HOME) and then calling

/PATH/TO/INFERENCE-DISTRIBUTION/bin/cli [inference-api-arguments]
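
For Spark, for instance:

export SPARK_HOME=/PATH/TO/SPARK
/PATH/TO/INFERENCE-DISTRIBUTION/bin/cli [inference-api-arguments]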

(Note that setting Spark (resp. Flink) options isn't supported here and has to be done via the corresponding config files.)

Example

RDFGraphMaterializer -i /PATH/TO/FILE/test.nt -o /PATH/TO/TEST_OUTPUT_DIRECTORY/ -p rdfs

will compute the RDFS materialization on the data contained in test.nt and write the inferred RDF graph to the given directory TEST_OUTPUT_DIRECTORY.
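
The same materialization can also be triggered from Spark application code via the forward-chaining reasoner classes. The following is a minimal sketch based on the SANSA examples; the package paths and signatures of the loader/writer helpers (RDFGraphLoader, RDFGraphWriter) and of the ForwardRuleReasonerRDFS constructor may differ between releases, so treat it as an illustration rather than a verbatim recipe:

import net.sansa_stack.inference.spark.data.loader.RDFGraphLoader
import net.sansa_stack.inference.spark.data.writer.RDFGraphWriter
import net.sansa_stack.inference.spark.forwardchaining.ForwardRuleReasonerRDFS
import org.apache.spark.sql.SparkSession

object RDFSMaterializationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDFS materialization example")
      .master("local[*]") // in a real deployment the master is set via spark-submit
      .getOrCreate()

    // load the input N-Triples file(s) as an RDF graph
    val graph = RDFGraphLoader.loadFromDisk(spark, "/PATH/TO/FILE/test.nt", 4)

    // compute the RDFS closure with the forward-chaining reasoner
    val reasoner = new ForwardRuleReasonerRDFS(spark.sparkContext)
    val inferredGraph = reasoner.apply(graph)

    // write the materialized graph back to disk as N-Triples
    RDFGraphWriter.writeToDisk(inferredGraph, "/PATH/TO/TEST_OUTPUT_DIRECTORY/")

    spark.stop()
  }
}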

Supported Reasoning Profiles

Currently, the following reasoning profiles are supported:

RDFS

The RDFS reasoner can be configured to work at two different compliance levels:

RDFS (Default)

This implements all of the RDFS closure rules with the exception of bNode entailments and datatypes (rdfD 1). RDFS axiomatic triples are also omitted. This is an expensive mode because all statements in the data graph need to be checked for possible use of container membership properties. It also generates type assertions for all resources and properties mentioned in the data (rdf1, rdfs4a, rdfs4b).

RDFS Simple

A fragment of RDFS that covers the most relevant vocabulary, provably preserves the original RDFS semantics, and avoids vocabulary and axiomatic information that only serves to reason about the structure of the language itself rather than about the data it describes. It is restricted to the reserved vocabulary rdfs:subClassOf, rdfs:subPropertyOf, rdf:type, rdfs:domain and rdfs:range. This mode implements just the transitive closure of rdfs:subClassOf and rdfs:subPropertyOf relations, the rdfs:domain and rdfs:range entailments, and the implications of rdfs:subPropertyOf and rdfs:subClassOf in combination with instance data. It omits all of the axiomatic triples. This is probably the most useful mode, but it is a less complete implementation of the standard.

More details can be found in

Sergio Muñoz, Jorge Pérez, Claudio Gutierrez: Simple and Efficient Minimal RDFS. J. Web Sem. 7(3): 220-234 (2009)
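
As a schematic illustration (prefixed notation used for readability; the actual CLI input has to be in N-Triples format), given the triples

:Dog rdfs:subClassOf :Animal .
:hasPet rdfs:domain :Person .
:pluto rdf:type :Dog .
:alice :hasPet :pluto .

the RDFS Simple profile would additionally derive

:pluto rdf:type :Animal .   (rdfs:subClassOf applied to instance data)
:alice rdf:type :Person .   (rdfs:domain)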

OWL Horst

OWL Horst is a fragment of OWL proposed by Herman ter Horst [1]. It defines an "intensional" version of OWL, sometimes also referred to as pD*, which can be materialized using a rule set that extends the RDFS rules. OWL Horst is considered one of the most common OWL flavours for scalable OWL reasoning, as it bridges the gap between the infeasible OWL Full and the low expressiveness of RDFS.

[1] Herman J. ter Horst: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. J. Web Sem. 3(2-3): 79-115 (2005)
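
For instance, in addition to the RDFS entailments, the OWL Horst rule set covers schema vocabulary such as owl:TransitiveProperty, owl:inverseOf, owl:sameAs and owl:equivalentClass. Schematically, from

:ancestorOf rdf:type owl:TransitiveProperty .
:a :ancestorOf :b .
:b :ancestorOf :c .

the transitivity rule derives

:a :ancestorOf :c .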

How to Contribute

We always welcome new contributors to the project! Please see our contribution guide for more details on how to get started contributing to SANSA.

sansa-inference's People

Contributors

aklakan, gezimsejdiu, hebaallahibrahim, lorenzbuehmann, patrickwestphal, simonbin


sansa-inference's Issues

Running RDFS forward chaining generates triples with literals in the subject position

I ran the RDFS forward chaining (on GO) through the example set up in SANSA-Examples/sansa-examples-spark and got triples looking like this

<"Reactome:REACT_90070"^^<http://www.w3.org/2001/XMLSchema#string>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Resource> .

in the output.
I think a literal check is needed here: https://github.com/SANSA-Stack/SANSA-Inference/blob/develop/sansa-inference-spark/src/main/scala/net/sansa_stack/inference/spark/forwardchaining/ForwardRuleReasonerRDFS.scala#L186
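
A fix along those lines could drop derived triples whose subject is a literal before they are added to the result. An illustrative sketch (the function and variable names are made up, assuming the entailments are an RDD of Jena Triple objects):

import org.apache.jena.graph.Triple
import org.apache.spark.rdd.RDD

// keep only entailments with a valid (non-literal) subject
def dropLiteralSubjects(triples: RDD[Triple]): RDD[Triple] =
  triples.filter(t => !t.getSubject.isLiteral)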

Conformance tests not executable on command line

When calling mvn clean test e.g. in sansa-inference-spark/ I get

net.sansa_stack.inference.spark.conformance.RDFSConformanceTest *** ABORTED ***
  java.lang.RuntimeException: Unable to load a Suite class that was discovered in the runpath: net.sansa_stack.inference.spark.conformance.RDFSConformanceTest
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  ...
  Cause: java.lang.NullPointerException:
  at scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
  at scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
  at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
  at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
  at net.sansa_stack.test.conformance.TestCases$.loadTestCases(TestCases.scala:29)
  at net.sansa_stack.test.conformance.ConformanceTestBase.<init>(ConformanceTestBase.scala:38)
  at net.sansa_stack.test.conformance.RDFSConformanceTestBase.<init>(RDFSConformanceTestBase.scala:17)
  at net.sansa_stack.inference.spark.conformance.RDFSConformanceTest.<init>(RDFSConformanceTest.scala:21)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

for RDFSConformanceTest and OWLHorstConformanceTest. The problem seems to be that in net.sansa_stack.test.conformance.TestCases (part of sansa-inference-tests) the directory: File variable that is passed in holds a path such as file:/home/me/.m2/repository/net/sansa-stack/sansa-inference-tests_2.11/0.3.0-SNAPSHOT/sansa-inference-tests_2.11-0.3.0-SNAPSHOT-tests.jar!/data/conformance/owl2rl, i.e. a location inside a jar, so directory.listFiles() returns null.
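
One way to make the test case loading robust when the data ends up inside a jar would be to resolve the resource directory via the classpath instead of calling File.listFiles() on a jar:... path. A rough sketch (names are illustrative and not the actual TestCases API):

import java.nio.file.{FileSystems, Files, Path, Paths}
import scala.collection.JavaConverters._

// List the entries of a resource directory, working both for plain directories
// on the classpath and for directories packaged inside a jar.
def listResourceDir(resourceDir: String): Seq[Path] = {
  val uri = getClass.getClassLoader.getResource(resourceDir).toURI
  if (uri.getScheme == "jar") {
    val fs = FileSystems.newFileSystem(uri, java.util.Collections.emptyMap[String, Any]())
    try Files.list(fs.getPath("/" + resourceDir)).iterator().asScala.toList
    finally fs.close()
  } else {
    Files.list(Paths.get(uri)).iterator().asScala.toList
  }
}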
