
sparql-anything / sparql.anything


SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.

Home Page: https://sparql-anything.cc/

License: Apache License 2.0

Java 98.54% ANTLR 0.06% HTML 0.88% TeX 0.03% Shell 0.20% Dockerfile 0.29%
sparql semantic-web rdf json knowledge-graph-construction linked-data xml csv

sparql.anything's People

Contributors

dependabot[bot], emidiostani, enridaga, justin2004, kvistgaard, luigi-asprino, mathiasvda


sparql.anything's Issues

[html] Invalid local name

An exception occurs with details:

Exception in thread "main" org.apache.jena.shared.InvalidPropertyURIException: http://www.w3.org/1999/xhtml#http:
	at org.apache.jena.rdf.model.impl.PropertyImpl.checkLocalName(PropertyImpl.java:66)
	at org.apache.jena.rdf.model.impl.PropertyImpl.<init>(PropertyImpl.java:55)
	at org.apache.jena.rdf.model.ResourceFactory$Impl.createProperty(ResourceFactory.java:296)
	at org.apache.jena.rdf.model.ResourceFactory.createProperty(ResourceFactory.java:144)

Support CSV first row as header

If my CSV has a first row as header, as in the example:

email,id,first name,last name
[email protected],2070,Laura,Grey
[email protected],4081,Craig,Johnson
[email protected],9346,Mary,Jenkins
[email protected],5079,Jamie,Smith

could I supply an option for the headers to provide additional property bindings (escaping spaces and all) instead of rdf:_x?

Based on the code at CSVTriplifier.java, I tried to pass -Dcsv.headers=true, but it doesn't seem to be picked up by the way the engine passes Java properties to the triplifier.
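One way the requested behaviour could look, as a minimal sketch: each header cell is percent-encoded and appended to a namespace to obtain a property IRI. The class CsvHeaderProperties and its methods are hypothetical and are not part of the SPARQL Anything code base; the comma split is naive and ignores quoted cells.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of SPARQL Anything): maps CSV header cells to property IRIs.
final class CsvHeaderProperties {

    // Percent-encode a header cell so it can be appended to a namespace.
    static String toPropertyIri(String namespace, String header) {
        String encoded = URLEncoder.encode(header.trim(), StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder writes spaces as '+'; use %20 instead
        return namespace + encoded;
    }

    // Build one property IRI per header cell, in column order.
    static List<String> propertiesFor(String namespace, String headerRow) {
        List<String> result = new ArrayList<>();
        for (String cell : headerRow.split(",")) { // naive split: quoted commas are not handled
            result.add(toPropertyIri(namespace, cell));
        }
        return result;
    }
}
```

With the header row from the example, "first name" would become a property ending in first%20name rather than rdf:_3.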

Triple pattern filtering

Currently, triplifiers transform all the data before executing the query. We could use the query expression and extract triple patterns to limit the triples added to the model. The resulting graph could be a subset of the full graph, including all the triples useful to evaluate the query. This approach is easy to implement and may improve performance in some (but not all) cases.
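The idea can be illustrated with a small sketch, assuming triples and patterns are plain string tuples and that a null position in a pattern stands for a variable. TriplePatternFilter is a hypothetical class for illustration only; a real implementation would work on the query algebra and on Jena nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep only the triples that some query pattern could match.
final class TriplePatternFilter {

    // A triple, or a pattern where null means "variable" (matches anything).
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static boolean matches(Triple pattern, Triple t) {
        return (pattern.s == null || pattern.s.equals(t.s))
            && (pattern.p == null || pattern.p.equals(t.p))
            && (pattern.o == null || pattern.o.equals(t.o));
    }

    // A triple is kept if at least one pattern extracted from the query matches it.
    static List<Triple> filter(List<Triple> patterns, List<Triple> triples) {
        List<Triple> kept = new ArrayList<>();
        for (Triple t : triples) {
            for (Triple p : patterns) {
                if (matches(p, t)) { kept.add(t); break; }
            }
        }
        return kept;
    }
}
```

A query consisting only of ?s ?p ?o yields an all-variable pattern, so nothing is filtered out; this is one of the cases where the approach brings no benefit.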

Exception when reading remote CSV

It seems that the CSV Triplifier has a problem reading remote resources and throws an exception.
The issue can be reproduced with the query:

PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT *
WHERE {

    SERVICE <x-sparql-anything:csv.headers=true,location=https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale-20200409.csv> {
        ?s ?p ?o
    }

}

Add CLI parameter 'strategy'

We are developing alternative approaches to query execution. These should also be available as options in the CLI.

Guess output format from output file extension

At the moment, the CLI wants you to specify the output format as a separate parameter:

fx -q titles.rq -o titles.xml -f xml

The following, however, generates a JSON file (named .xml), because the format falls back to the default, JSON:

fx -q titles.rq  -o titles.xml
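A possible resolution order, sketched under assumptions: an explicit -f value wins, otherwise the extension of the output file is used if it names a known format, otherwise the default JSON applies. OutputFormatGuesser is a hypothetical class, and the set of format names is an assumption (lower-cased identifiers for JSON, XML, CSV, TEXT, TTL and NT).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of format resolution for the CLI; not the actual implementation.
final class OutputFormatGuesser {

    // Assumed format identifiers; the real CLI may use different names.
    private static final Set<String> KNOWN = Set.of("json", "xml", "csv", "text", "ttl", "nt");

    static String guess(String explicitFormat, String outputFile) {
        if (explicitFormat != null) {
            return explicitFormat.toLowerCase(Locale.ROOT); // -f always wins
        }
        if (outputFile != null) {
            int dot = outputFile.lastIndexOf('.');
            if (dot >= 0 && dot < outputFile.length() - 1) {
                String ext = outputFile.substring(dot + 1).toLowerCase(Locale.ROOT);
                if (KNOWN.contains(ext)) {
                    return ext;
                }
            }
        }
        return "json"; // current fallback behaviour
    }
}
```

Under this sketch, guess(null, "titles.xml") resolves to "xml", while a file name without a recognised extension still falls back to JSON.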

RDF files as resources

It could be handy to allow users to query static RDF files with SPARQL through the SERVICE operator. I think SPARQL.anything should include this feature, although it is not related to Facade-X, so we probably don't want to use the same protocol handler.

CLI: support multiple output formats

The CLI should pick the right serialisation format among the following: JSON, XML, CSV, TEXT, TTL, NT.

In the future, we may reuse the io.github.basilapi:rendering library which is based on mime types as format identifiers and supports a large variety of formats.

Use input parameters to customise the output filename

When iterating over input parameters generates multiple files, it would be useful to customise the output filename.
This helps when creating a collection of RDF files, each one named meaningfully after the parameters.

For example, having an input including tuples such as

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="artistUrl"/>
    <variable name="artistNickname"/>
  </head>
  <results>
    <result>
      <binding name="artistUrl">
        <literal>https://imma.ie/artists/william-leech/</literal>
      </binding>
      <binding name="artistNickname">
        <literal>leech-william</literal>
      </binding>
    </result>
    <result>
      <binding name="artistUrl">
        <literal>https://imma.ie/artists/marie-foley/</literal>
      </binding>
      <binding name="artistNickname">
        <literal>foley-marie</literal>
      </binding>
    </result>

it would be useful to be able to instruct the tool to use artistNickname as (part of) the output filename.
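A minimal sketch of such a filename pattern, assuming a hypothetical template syntax in which ?name refers to a variable of the input result set. OutputNameTemplate and the ?name syntax are illustrations only, not an existing CLI feature.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expand ?variable placeholders in an output filename pattern.
final class OutputNameTemplate {

    // A placeholder is '?' followed by a variable name (assumed syntax).
    private static final Pattern VAR = Pattern.compile("\\?([A-Za-z_][A-Za-z0-9_]*)");

    static String expand(String template, Map<String, String> bindings) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = bindings.get(m.group(1));
            // Unbound variables are left in place rather than dropped.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For the first result above, the pattern artist-?artistNickname.ttl would expand to artist-leech-william.ttl.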

Handling namespaces for properties and entities

At the moment, the namespace option parameter is used as the default prefix for both schema elements and named entities. This may create problems, for example, when joining two CSVs: users have to declare different namespaces to avoid clashes. An easier way may be to use the root entity (file name) as the default prefix.

facade-x IRI not compliant with IETF scheme

Jena complains as follows:

Bad IRI: <facade-x:media-type=text/html,html.selector=#az-group,location=https://imma.ie/artists/> Code: 45/UNREGISTERED_NONIETF_SCHEME_TREE in SCHEME: The scheme name has a "-" in it, but it does not start in "x-" and the prefix is not known as the prefix of an alternative tree for URI schemes.

We may change the URI scheme; the discussion on alternatives is open.
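The rule quoted in the Jena message can be checked for a candidate scheme with a small sketch. SchemeCheck is a hypothetical helper that mirrors only the part of the rule stated in the message (a scheme containing "-" is expected to start with "x-"); it ignores the "alternative tree" prefixes the message also mentions.

```java
// Hypothetical sketch: tests a candidate scheme against the rule quoted by Jena.
final class SchemeCheck {

    // Returns the scheme part of an IRI, or null if there is none.
    static String schemeOf(String iri) {
        int colon = iri.indexOf(':');
        return colon > 0 ? iri.substring(0, colon) : null;
    }

    // A scheme with a "-" in it must start with "x-" (known alternative trees are not modelled).
    static boolean hyphenRuleOk(String iri) {
        String scheme = schemeOf(iri);
        if (scheme == null) {
            return false;
        }
        return !scheme.contains("-") || scheme.startsWith("x-");
    }
}
```

Under this check, facade-x: fails while a scheme such as x-sparql-anything: passes.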

Performance: slicing

A simple way of improving usability with large files is to support a general option to 'slice' content in some ways. Some ideas:

  • by indicating the number of containers to be produced, or the maximum number of values
  • by indicating the top N slots, or the bottom N
  • by indicating a limit in the triples produced.

We may think of others.
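Two of the ideas above (top N and bottom N) could be sketched as follows, assuming the triplifier can expose its slots or triples as an iterator. Slicer is a hypothetical class; firstN stops reading as soon as the limit is reached, and lastN keeps at most n items in memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of 'slicing' a stream of slots or triples.
final class Slicer {

    // Take at most `limit` items, without consuming the rest of the source.
    static <T> List<T> firstN(Iterator<T> source, int limit) {
        List<T> out = new ArrayList<>();
        while (out.size() < limit && source.hasNext()) {
            out.add(source.next());
        }
        return out;
    }

    // Keep only the last `n` items using a sliding window (items must be non-null).
    static <T> List<T> lastN(Iterator<T> source, int n) {
        ArrayDeque<T> window = new ArrayDeque<>();
        while (source.hasNext()) {
            window.addLast(source.next());
            if (window.size() > n) {
                window.removeFirst();
            }
        }
        return new ArrayList<>(window);
    }
}
```

Note that lastN still has to read the whole input, so only firstN and a triple limit would avoid parsing a large file to the end.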

New parameter: input

This feature will allow chaining the output of a previous query to a new one.

When the input is a SPARQL result set, it will be used as a source of query parameters, following the BASIL convention.

Consistent parameter names in URIs

I suggest using location, namespace (applied to domain specific properties/types), media-type for general-purpose parameters and specify type-specific ones using, for example, csv.format=DEFAULT or json.root ...
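To make the naming scheme concrete, here is a minimal sketch that splits the part of a service IRI after the scheme into key=value options. ServiceOptions is a hypothetical class; the parsing is naive (it splits on every comma, so values containing commas would break) and is not the engine's actual parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: read key=value options from a service IRI.
final class ServiceOptions {

    static Map<String, String> parse(String iri) {
        // Drop the scheme: everything up to and including the first ':'.
        String body = iri.substring(iri.indexOf(':') + 1);
        Map<String, String> options = new LinkedHashMap<>();
        for (String pair : body.split(",")) { // naive: commas inside values are not supported
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip fragments without a key
            }
            options.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return options;
    }
}
```

General-purpose keys (location, namespace, media-type) and type-specific ones (csv.format, json.root) then end up in the same map, distinguished only by the prefix before the dot.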

New parameter: load

This is useful to chain the output of one query into another. RDF files can be loaded and queried along with the output of the service clause(s).

JSON iterator breaks with Unicode escaped characters

Unicode escaped characters should be expected in JSON serialisations. However, JsonIterator does not seem to support them and horribly breaks. Could not find a configuration parameter or anything else related to this issue so far.
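For reference, decoding such escapes amounts to the following sketch. JsonUnicodeEscapes is a hypothetical helper, not the JsonIterator code; it only turns backslash-u sequences into characters and copies every other escape through unchanged, so it is not a full JSON string decoder.

```java
// Hypothetical sketch: decode backslash-u escapes in a JSON string body.
final class JsonUnicodeEscapes {

    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'u' && i + 6 <= s.length()) {
                    String hex = s.substring(i + 2, i + 6);
                    if (hex.matches("[0-9A-Fa-f]{4}")) {
                        // Four hex digits: emit the corresponding UTF-16 code unit.
                        out.append((char) Integer.parseInt(hex, 16));
                        i += 6;
                        continue;
                    }
                }
                // Any other escape (including an escaped backslash): copy both characters as-is.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Characters outside the Basic Multilingual Plane arrive as two consecutive escapes and are decoded into a surrogate pair, which Java strings represent natively.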

Add default type for root element

It would be useful to be able to distinguish root elements from normal containers.

What about adding, for example, rdf:type <urn:facade-x:default#root>?

Audit graph

We can add a new boolean option audit=1 (off by default) to include a graph with information about the generated and queried graphs, using the SPARQL Service Description Vocabulary and VoID. This would be an optional meta-graph, useful for debugging and troubleshooting.

HTML profiles: DOM, Microdata, RDFa, ...

HTML can be a data source in many different ways. Currently, we support a CSS-selector and generate a Facade-X representation of the related DOM portions (each solution in a separate graph). We should consider also alternative approaches, for example, referring to well-known approaches to embed data, such as Microdata or RDFa.

Relational database

Support relational databases, for example, by developing a connector using JDBC.
