d2rq's Introduction

D2RQ – A Database to RDF Mapper

D2RQ exposes the contents of relational databases as RDF. It consists of:

  • The D2RQ Mapping Language. Use it to write mappings between database tables and RDF vocabularies or OWL ontologies.
  • The D2RQ Engine, a SPARQL-to-SQL rewriter that can evaluate SPARQL queries over your mapped database. It extends ARQ, the query engine that is part of Apache Jena.
  • D2R Server, a web application that provides access to the database via the SPARQL Protocol, as Linked Data, and via a simple HTML interface.

Homepage and Documentation

Learn more about D2RQ at its homepage: http://d2rq.org/

License

Apache License, Version 2.0

http://www.apache.org/licenses/LICENSE-2.0.html

Contact, feedback, discussion

Please use the issue tracker here on GitHub for feature/bug discussion and support requests.

Building from source

Prerequisites

You need a few tools to build D2RQ. Depending on your operating system, they may already be installed.

  • git, for forking the source code repository from GitHub. Run git on the command line to see if it's there.
  • Java JDK v5 or later, for compiling Java sources. Run java -version and javac on the command line to see if they're there.
  • Apache Ant, for building D2RQ. Run ant on the command line to see if it's there.

Getting the source

Get the code by forking the GitHub repository and cloning your fork, or directly clone the main repository:

git clone git@github.com:d2rq/d2rq.git

Doing Ant builds

D2RQ uses Apache Ant as its build system. You can run ant -p from the project's main directory to get an overview of available targets.

To run the D2RQ tools, you need to run at least ant jar.

ant all             Generate distribution files in zip and tar.gz formats
ant clean           Deletes all generated artefacts
ant compile         Compile project classes
ant compile.tests   Compile test classes
ant jar             Generate project jar file
ant javadoc         Generate Javadoc API documentation
ant tar             Generate distribution file in tar.gz format
ant test            Run tests
ant vocab.config    Regenerate Config vocabulary files from Turtle source
ant vocab.d2rq      Regenerate D2RQ vocabulary files from Turtle source
ant war             Generate war archive for deployment in servlet container
ant zip             Generate distribution file in zip format

Running D2RQ

After building with ant jar, you can test-run the various components. Let's assume you have a MySQL database called mydb on your machine.

Generating a default mapping file

./generate-mapping -u root -o mydb.ttl jdbc:mysql:///mydb

This generates a mapping file mydb.ttl for your database.

Dumping the database

./dump-rdf -m mydb.ttl -o dump.nt

This creates dump.nt, a dump containing the mapped RDF in N-Triples format.

Running D2R Server

./d2r-server mydb.ttl

This starts up a server at http://localhost:2020/

Deploying D2R Server into a servlet container

Edit /webapp/WEB-INF/web.xml to point the configFile parameter to the location of your mapping file.

Build a war file with ant war.

Deploy the war file, e.g., by copying it into the servlet container's webapps directory.

Running the unit tests

The unit tests can be executed with ant test.

Some unit tests rely on MySQL being present, and require that two databases are created:

  1. A database called iswc that contains the data from /doc/example/iswc-mysql.sql:

    echo "CREATE DATABASE iswc" | mysql -u root
    mysql -u root iswc < doc/example/iswc-mysql.sql

  2. An empty database called D2RQ_TEST.

d2rq's People

Contributors

aftiqb, arekinath, copierrj, cygri, devwout, hannes, jgeluk, luiseufrasio, petervandenabeele, tayfunhalac, tslifset

d2rq's Issues

Exclude servlet-api.jar from war

The war target in build.xml must exclude servlet-api-*.jar. Currently it just excludes one specific version, which is not the one that's actually present. This causes warnings in Tomcat, and might have worse consequences in other servlet containers.

Joins don't work properly if translateWith is involved

Let's say we have two class maps that are supposed to produce the same URIs, but using slightly different transformations:

  • one uses translateWith, the other doesn't
  • one uses translateWith, the other uses a different translateWith
  • one uses urlify, the other doesn't

SPARQL queries that require joins over these node makers don't work properly. The generated SQL will have joins as if the translateWith or urlify wasn't there.

30-character limit for identifier names in Oracle

Oracle has a 30-character limit for identifiers. The T1_<schema name>_<table name> pattern used by D2RQ can easily result in identifiers that are too long. Just use A1, A2, … if the name is longer than 30 characters? Or even always?
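One option from the note above, sketched with hypothetical names (this is not D2RQ's actual code): keep the descriptive alias when it fits, and fall back to a short A<n> alias whenever it would exceed Oracle's limit.

```java
// Hypothetical sketch: fall back to short A<n> aliases when a generated
// identifier would exceed Oracle's 30-character limit.
public class AliasShortener {
    static final int ORACLE_MAX_IDENTIFIER_LENGTH = 30;
    private int counter = 0;

    /** Returns the alias unchanged if it fits, otherwise a fresh short alias. */
    String shorten(String alias) {
        if (alias.length() <= ORACLE_MAX_IDENTIFIER_LENGTH) {
            return alias;
        }
        return "A" + (++counter);
    }

    public static void main(String[] args) {
        AliasShortener s = new AliasShortener();
        System.out.println(s.shorten("T1_MYSCHEMA_A_VERY_LONG_TABLE_NAME")); // A1
        System.out.println(s.shorten("T1_SHORT"));                           // T1_SHORT
    }
}
```

The "or even always?" variant would simply mean dropping the length check.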

Handling of views in generate-mapping

Views generally don't have a primary key and therefore should be excluded in generate-mapping.

There should be an option --include-views.

This becomes particularly interesting if we implement some way of auto-detecting uniqueness constraints.

Support reasoning/inference

Limited inference can be done by applying the RDFS rules (and potentially other rules) to the D2RQ mapping. This would add additional triple relations to the mapping.

As a first step, the trivial rules that don't rely on schema knowledge could be implemented (?x a rdfs:Class; ?x a rdf:Property). This is already partially done?

As a next step, the engine could be provided with an RDFS/OWL file, and further rules are taken from that file. This would allow implementing subclass, subproperty, equivalent class/property, domain, range, and more. Most if not all of RDFS-Plus could be implemented that way, perhaps with the exception of transitive properties.

Even OWL-QL might be possible.

This implies adding command line and configuration options for enabling inference (should be off by default), and for pointing to the RDFS/OWL file(s). Loading them automatically by dereferencing term IRIs would also be a good option.

FixedNodeMaker cannot be cast to TypedNodeMaker

Query:

SELECT ?x ?y ?z
WHERE { {<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite1/Review6712>  rev:reviewer ?x .
     ?x ?y ?z.}
UNION
    {<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromRatingSite1/Review6712>  rev:reviewer ?a.
     ?x ?y ?z.
        FILTER(?z = ?a).} }

This yields the following exception:

de.fuberlin.wiwiss.d2rq.nodes.FixedNodeMaker cannot be cast to de.fuberlin.wiwiss.d2rq.nodes.TypedNodeMaker

Make SPARQL Basic Federated Query extension work

This would be a first step towards addressing a frequent D2R use case.

ARQ implements this, it's a matter of working out and documenting how exactly to enable and invoke it, and see what needs to be done in the query engine.

This gives us opportunities towards making multiple DBs in a single D2R instance work. The default dataset could make no-join queries by running the entire query against one DB after the other. To make queries that involve joins across DBs, one would have to use SERVICE to address the individual DBs (using their d2rq:Database resource IRI?)

Issues with backslashes in bNodeIdColumns

Reportedly, if a column value listed in d2rq:bNodeIdColumns contains a backslash, then no triples will show up. In the specific case, the backslash-containing value was also the object of the generated triple.

Refactor to introduce a mapping API

The map package already contains a pretty decent representation of a D2RQ mapping as Java objects. This should be refactored:

  • should have setters that allow programmatic creation of these objects (currently they are always initialized from an RDF graph)
  • mapping generator should produce these objects rather than directly producing Turtle
  • code for producing the mapping objects from RDF should be separated
  • code for producing TripleRelations and so on from the objects (compile()) should be separated
  • code for validating the mapping objects (currently in validate() and in the RDF-to-object code) should be separated
  • there should be code that writes mapping objects to Turtle
  • should also make things like rdfs:label available – useful for HTML frontend

This would allow many good things:

  • simplify implementation of different generate-mapping algorithms, for example for the W3C Direct Mapping
  • simplify implementation of alternate mapping languages, in particular R2RML
  • allow third-party code to produce mappings using its own algorithms
  • simplify implementation of inference by expanding the mapping objects
  • easier testing of components in isolation

Unsupported database column types

This is a frequent class of interoperability problems in D2R.

The mapping should be made configurable and moved into an external file. Mapping authors should perhaps be able to add additional type mappings to their mapping files.

The error message for unsupported type codes could be slightly more helpful: it should also show the type name of the column.

Also, is it wise to die whenever we don't know the column type? Perhaps we should just cast all unknown types to string, and only complain if that fails.

Examples for things that have been reported not to work:

  • TIMESTAMP in Oracle produces “Unsupported database type code (1111) for column xxxx” (fixed)
  • NVARCHAR2 in Oracle (fixed)
  • ENUM in Postgres (type code 1111)
  • BLOB (partially fixed, see #115)
  • Any binary type should be treated as xsd:hexBinary (fixed)
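A lenient fallback along the lines suggested above could look like this sketch (the category names are hypothetical, not D2RQ's real type mapper): unknown JDBC type codes are treated as text instead of raising an error, and complaints are deferred until a cast actually fails.

```java
import java.sql.Types;

// Hypothetical sketch: map JDBC type codes to coarse categories,
// falling back to "text" for unknown codes instead of throwing.
public class ColumnTypeMapper {
    static String category(int jdbcTypeCode) {
        switch (jdbcTypeCode) {
            case Types.INTEGER:
            case Types.BIGINT:
            case Types.SMALLINT:
                return "numeric";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.BLOB:
                return "binary";   // would be mapped to xsd:hexBinary
            case Types.VARCHAR:
            case Types.CHAR:
            case Types.CLOB:
                return "text";
            default:
                return "text";     // lenient fallback instead of failing
        }
    }

    public static void main(String[] args) {
        // Types.OTHER is 1111, the code from the reports above
        System.out.println(category(Types.OTHER));
    }
}
```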

Exclude certain tables from generate-mapping

Add --exclude-table and --include-table options that take table names (incl. “*” wildcard) as argument. By default, all tables are included. Only tables that are included and not explicitly excluded will be taken into account.

Later we could use this infrastructure to exclude certain tables by default, such as system tables.

D2R Server deployment issues with Tomcat

Observed on Windows. Happens both when deploying a .war built with the usual instructions, and when deploying just by copying the D2R Server webapp directory into webapps. This is for a non-root webapp.

31-Mar-2010 12:03:08 org.apache.catalina.core.StandardContext addApplicationListener
INFO: The listener "de.fuberlin.wiwiss.d2rs.WebappInitListener" is already configured for this context. The duplicate definition has been ignored.
31-Mar-2010 12:03:09 org.apache.catalina.core.StandardContext start
SEVERE: Error listenerStart
31-Mar-2010 12:03:09 org.apache.catalina.core.StandardContext start
SEVERE: Context [/idac] startup failed due to previous errors

Allow generation of SQL statement without executing it

And allow turning a (virtual) DB result into SPARQL results/triples.

Use cases:

  • debugging
  • unit tests
  • Snorql and d2r-query could support “SQL query” as additional result format (again mostly for debugging)
  • use cases where third-party code must run the SQL query for some reason (no JDBC driver)

Issues with TEXT columns in MySQL

Reportedly, TEXT columns don't work in MySQL.

Christian Brenninkmeijer reports that he fixed it in de.fuberlin.wiwiss.d2rq.sql.ConnectedDB.columnType() like so:

default:
    if ("NVARCHAR2".equals(type.typeName()))
        return TEXT_COLUMN;
    else if ("text".equals(type.typeName()))
        return TEXT_COLUMN;
    else
        throw new D2RQException("Unsupported database type code ("
                + type + ") for column " + column.qualifiedName());

Relative path in metadata template doesn't work

Setting d2r:metadataTemplate to "metadata.n3" results in an error message:

com.hp.hpl.jena.shared.NotFoundException: Not found: /d2rqwebapp/WEB-INF/templates/metadata.n3

The path here seems to come from this property in velocity.properties:

file.resource.loader.path=webapp/WEB-INF/templates

A relative-directory reference should be interpreted as relative to WEB-INF!

Guess primary key in generate-mapping

This came up in discussion with Christian Brenninkmeijer, who has partially implemented this in his own code.

Add a --guess-pk switch. Analyze columns to see if they are unique. If one is, use it as the key. Perhaps, if none is found, move on to pairs of columns, then triplets, etc.
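The uniqueness check could be a single SQL probe per column. This sketch (hypothetical; not the partial implementation mentioned above) only builds the query string: a column qualifies as a key candidate when COUNT(*) equals COUNT(DISTINCT col) and also equals COUNT(col), since COUNT(col) skips NULLs and thereby rules them out too.

```java
// Hypothetical sketch: build the SQL probe that tests whether a single
// column is unique and NULL-free, making it a primary-key candidate.
public class UniquenessCheck {
    static String uniquenessQuery(String table, String column) {
        return "SELECT COUNT(*) = COUNT(DISTINCT " + column + ")"
                + " AND COUNT(*) = COUNT(" + column + ")"
                + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(uniquenessQuery("person", "ID"));
    }
}
```

The boolean-valued SELECT works on MySQL and Postgres; other dialects would need a CASE expression instead.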

SPARQL DESCRIBE not working in some cases where d2rq:join is used

Reported by Evangelos Theodoridis.

When adding a property to a table that uses information from another database table, the HTML view seems to have a problem, but the SPARQL endpoint works properly.

For example (schema used attached) when adding

a) a property that lists the interactions of a person using info from the interactions table

map:person_interactsWith a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:person;
    d2rq:property vocab:person_interactsWith;
    d2rq:join "person.ID = interactions.PersonA";
    d2rq:uriSqlExpression "CONCAT('http://192.168.2.4:8080/resource/person/', interactions.PersonB)";
    d2rq:propertyDefinitionLabel "Has Interacted With the following";
    .

or

b) a property that lists the Categories of persons that a person has interacted with

map:person_CategoryInteractions a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:person;
    d2rq:property vocab:person_CategoryInteractions;
    d2rq:join "person.ID <= interactions.PersonA";
    d2rq:sqlExpression "SELECT person.Category from person WHERE person.ID = interactions.PersonB";

we get (in the HTML view) the following error

HTTP ERROR: 500

Column not found in DESCRIBE result: @@interactions.PersonA@@ (E0)
RequestURI=/page/person/1

SPARQL view works just fine however.

Validate mapping against an RDFS/OWL file

Add validation that checks whether all classes and properties generated by the mapping are actually defined in an ontology (either by dereferencing, or provided in an RDFS/OWL file).

This is related to reasoning/inferencing, which would also benefit from access to an external RDFS/OWL file.

Dynamically-generated language tags

There are situations where the language tag of a property bridge is not static, but comes from a column in the DB. The BSBM schema is an example.

Currently this can only be handled by duplicating the bridge with different d2rq:conditions.

There could be a new property d2rq:langMap which points to a sub-resource that allows specification of a d2rq:column etc.

Serve RDFS/OWL file on the server

Make it easy to serve an RDFS/OWL file (which might come from the web or from the file system) alongside the data in D2R Server.

Ideally, use it for rdfs:label etc, and mix it into the SPARQL dataset.

This could be used to serve the auto-generated vocabulary.

Drop or update the Sesame support

Either drop or update the Sesame API support. It hasn't been touched since 2006 and most likely no longer works for recent Sesame versions. It has no tests.

Make subqueries work in d2rq:sqlExpression and d2rq:condition

Making subqueries work would be extremely handy. Currently the problem is that all columns mentioned there will be renamed with aliases. This should be done only for columns on tables that are not in the FROM clause of the subquery (?).

Maybe it works if one uses only aliases as table names in the FROM clause? If so, document that. If not, some other way has to be found.

SPARQLing for a DATE in SQL Server doesn't work

Reportedly, querying for a specific DATE value in SQL Server doesn't work.

The generated query contains: t.foo=DATE '1999-02-13'.

Reportedly, the correct syntax for SQL Server is foo='1999-02-13'.
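A dialect switch for date literals, sketched minimally with a hypothetical helper (D2RQ's SQL generator would do this where it currently emits the ANSI form):

```java
// Hypothetical sketch: render a date literal per SQL dialect.
// The ANSI form DATE '1999-02-13' breaks on SQL Server, which
// reportedly wants a plain quoted string.
public class DateLiteral {
    static String render(String isoDate, boolean sqlServer) {
        return sqlServer
                ? "'" + isoDate + "'"
                : "DATE '" + isoDate + "'";
    }

    public static void main(String[] args) {
        System.out.println(render("1999-02-13", true));  // '1999-02-13'
        System.out.println(render("1999-02-13", false)); // DATE '1999-02-13'
    }
}
```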

Supporting multiple databases in a single mapping

Doing this right is very difficult because it requires query planning.

A cheap way to address the use case without actually supporting multiple mappings is to support SPARQL basic federated query, but that's a separate issue.

If we really don't support multiple d2rq:Databases in a single mapping, then we should just consider that a validation failure and abort.

Otherwise, there are a number of things that can be done. Most importantly, NodeMakers over relations from different DBs should never be considered compatible. This should allow at least non-join federation of databases.

VoID support

Add VoID support:

  • void:inDataset on all RDF representations
  • assign a dataset IRI
  • serve a VoID description that includes all information we know, such as SPARQL endpoint and perhaps even stats
  • allow mixing a user-provided RDF file into the VoID description

Also, SPARQL Service Description should be served from the SPARQL endpoint URI – could likely be the same document as the VoID representation.

Issues with booleans in Postgres

Ensure that returning booleans works in Postgres.

Reportedly, they come out of the DB as “t” instead of “true”, and are then evaluated to false in TypedNodeMaker.
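One possible fix, sketched with hypothetical names: normalize the driver's single-letter values to canonical xsd:boolean lexical forms before typed-node creation.

```java
// Hypothetical sketch: map Postgres's single-letter boolean result
// values to the canonical xsd:boolean lexical forms.
public class BooleanNormalizer {
    static String normalize(String dbValue) {
        if ("t".equals(dbValue)) return "true";
        if ("f".equals(dbValue)) return "false";
        return dbValue; // already "true"/"false", or another driver's form
    }

    public static void main(String[] args) {
        System.out.println(normalize("t"));    // true
        System.out.println(normalize("false")); // false
    }
}
```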

SQL query run twice when containsDuplicates set to true

Reported by Phil Ashworth:

I've created a classmap on a table that contains duplicates, so I've used the d2rq:containsDuplicates "true"; statement (see below). However, I noticed that the results were slower to return with this statement. Running on a local host, I checked the debug statements for the SQL calls.

It runs the correct query (with the distinct clause) but it runs it twice before the results are displayed.

If I comment out the d2rq:containsDuplicates statement, the sql is only run once.

map:Entity a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "entity/@@NBE_READER.UCB_NBE_DETAIL_VW.COMPOUND@@";
    d2rq:class vocabClass:Entity;
    d2rq:classDefinitionLabel "Entity Class definition";
    d2rq:classDefinitionComment "Class defining Entities";
    d2rq:containsDuplicates "true";
    d2rq:condition "NBE_READER.UCB_NBE_DETAIL_VW.TEST_YN = 'N'";
    . 

Graceful handling of filenames where IRIs are expected

Users often accidentally specify a file name where an IRI is expected. This results in errors such as “unknown protocol C” or similar (Windows).

There should be more graceful handling of this common user error: either make filenames work by converting them to file: IRIs, or report “IRI expected, file name found as value of foo: xyz”.

See Issue #52.
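The file:-IRI conversion could be as simple as this sketch (hypothetical helper; the scheme test deliberately requires at least two characters, so a Windows drive letter like C: is treated as a path rather than producing the “unknown protocol C” error):

```java
import java.nio.file.Paths;

// Hypothetical sketch: accept either an IRI or a file name where an
// IRI is expected, converting file names to absolute file: IRIs.
public class FileOrIri {
    static String toIri(String value) {
        // A scheme of two or more letters/digits followed by ':' counts
        // as an IRI; a single letter (Windows drive) does not.
        if (value.matches("[A-Za-z][A-Za-z0-9+.-]+:.*")) {
            return value;
        }
        return Paths.get(value).toUri().toString();
    }

    public static void main(String[] args) {
        System.out.println(toIri("http://example.org/map.ttl")); // unchanged
        System.out.println(toIri("mapping.ttl")); // absolute file: IRI
    }
}
```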

ClassMap with d2rq:constantValue is rejected by map parser

This map:

map:Throot a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:class skos:ConceptScheme;
    d2rq:constantValue "http://example.com/thesaurus/0.1/";

yields this error:

d2rq:ClassMap map:Throot needs a column/pattern/bNodeID specification (E0)

Either it should complain that d2rq:constantValue is a literal, or it should just work.

Extract best-effort RDF Schema file from mapping

Add code that infers a best-effort RDF Schema file from a mapping.

I guess this could be done in an “open-world” or a “closed-world” fashion. In closed-world mode, if property foo is only used to relate A and B, then those are its domain and range. In open-world mode, it would just mean that the domain and range intersect A and B.

There should be a script for saving that schema to a file. This could be part of generate-mapping if the implementation only works for mappings generated by that script; or a separate script if its input is any mapping.

Separate issue: serve the extracted schema in D2R Server (#42).

Support arbitrary characters in table/column names in generate-mapping

With escaping, any character can occur in a table/column name. The generate-mapping script isn't prepared for that; it derives IRIs from those names in a way that fails quite often. A generic escaping mechanism that works for absolutely any character is needed.

Examples for characters known to cause problems: “.”, “@”

Also, the current design causes clashes: "XXX_YYY.ID" and "XXX.YYY_ID" result in the same IRI in the mapping file.
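A generic escaping mechanism might percent-encode every non-alphanumeric character, which also resolves the XXX_YYY.ID / XXX.YYY_ID clash. This is a hypothetical sketch: it encodes char values directly, so it is safe for ASCII identifiers only; a real implementation would need proper UTF-8 percent-encoding for arbitrary characters.

```java
// Hypothetical sketch: percent-encode everything outside [A-Za-z0-9]
// when deriving IRI local names from table/column names, so that
// "_" (0x5F) and "." (0x2E) can no longer be confused.
public class IdentifierEscaper {
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append('%').append(String.format("%02X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("XXX_YYY.ID")); // XXX%5FYYY%2EID
        System.out.println(escape("XXX.YYY_ID")); // XXX%2EYYY%5FID
    }
}
```

Encoding even the underscore is unusual, but it is what makes the two example names distinguishable after escaping.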

DISTINCT with TEXT and BLOB in Oracle and MS SQL Server

D2RQ routinely generates queries that result in this error if run on MS SQL Server:

The text data type cannot be selected as DISTINCT because it is not comparable.

Similar things happen on Oracle for Text and BLOB.

Example SPARQL query:

SELECT ?title ?text WHERE {
    ?review bsbm:reviewFor <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer30/Product1358> .
    ?review dc:title ?title .
    ?review rev:text ?text .
}

Resulting SQL query:

SELECT DISTINCT "T1_review"."reviewID", "T2_review"."title", "T1_review"."publisher", "T3_review"."text" FROM "review" AS "T3_review", "review" AS "T1_review", "product" AS "T1_product", "review" AS "T2_review" WHERE ("T1_product"."producerID" = 30 AND "T1_product"."productID" = "T1_review"."productID" AND "T1_product"."productID" = 1358 AND "T1_review"."publisher" = "T2_review"."publisher" AND "T1_review"."publisher" = "T3_review"."publisher" AND "T1_review"."reviewID" = "T2_review"."reviewID" AND "T1_review"."reviewID" = "T3_review"."reviewID" AND ("T3_review"."language"='zh'))

This is part of the BSBM suite.

Load database dump into a SPARQL store

There should be a script that loads the mapped contents of a DB directly into a SPARQL store using SPARQL Update or the Graph Store Protocol.

This would make D2R a nicely complete ETL tool.

Duplicates in directory pages

Reportedly, a class map over a denormalized table will have duplicates in the directory pages even if d2rq:containsDuplicates is set to true.

Incorrect SQL query for property bridges with different d2rq:lang

Mapping (this comes from BSBM):

map:review_text1 a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:review;
    d2rq:property rev:text;
    d2rq:column "review.text";
    d2rq:lang "en";
    d2rq:condition "review.language='en'";
    .
# snip more languages
map:review_text8 a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:review;
    d2rq:property rev:text;
    d2rq:column "review.text";
    d2rq:lang "ko";
    d2rq:condition "review.language='ko'";

SPARQL query:

SELECT ?title ?text WHERE {
    ?review bsbm:reviewFor <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer30/Product1358> .
    ?review dc:title ?title .
    ?review rev:text ?text .
}

Resulting SQL:

SELECT DISTINCT "T1_review"."reviewID", "T2_review"."title", "T1_review"."publisher", "T3_review"."text" FROM "review" AS "T3_review", "review" AS "T1_review", "product" AS "T1_product", "review" AS "T2_review" WHERE ("T1_product"."producerID" = 30 AND "T1_product"."productID" = "T1_review"."productID" AND "T1_product"."productID" = 1358 AND "T1_review"."publisher" = "T2_review"."publisher" AND "T1_review"."publisher" = "T3_review"."publisher" AND "T1_review"."reviewID" = "T2_review"."reviewID" AND "T1_review"."reviewID" = "T3_review"."reviewID" AND ("T3_review"."language"='zh'))

So the SQL query explicitly asks only for zh, which isn't what was queried for.

Replace all N3 references with Turtle throughout

We say everywhere that D2RQ uses N3, but it actually uses only the Turtle subset. With Turtle on its way to becoming a W3C Recommendation, it's time to replace all references to N3 throughout the codebase and documentation with Turtle.

As part of this, make sure that owl:sameAs doesn't get serialized as = like it did in Pubby!

Issues with xsd:gYear comparison

Reportedly, FILTER (?y >= "1936"^^xsd:gYear) doesn't work even if there are newer years. Perhaps because it doesn't do numeric comparison?
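Numeric comparison of gYear lexical forms would behave as expected. A hypothetical sketch, which parses only the leading year digits and ignores an optional timezone suffix such as "Z":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: compare xsd:gYear values numerically rather
// than lexically, tolerating a trailing timezone designator.
public class GYearCompare {
    private static final Pattern YEAR = Pattern.compile("^(-?\\d{4,})");

    static int yearOf(String gYear) {
        Matcher m = YEAR.matcher(gYear);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a gYear: " + gYear);
        }
        return Integer.parseInt(m.group(1));
    }

    static boolean greaterOrEqual(String a, String b) {
        return yearOf(a) >= yearOf(b);
    }

    public static void main(String[] args) {
        System.out.println(greaterOrEqual("2005", "1936"));  // true
        System.out.println(greaterOrEqual("1910Z", "1936")); // false
    }
}
```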
