
comunica / comunica


📬 A knowledge graph querying framework for JavaScript

Home Page: https://comunica.dev

License: Other

TypeScript 99.10% JavaScript 0.83% Dockerfile 0.02% Shell 0.06%
decentralization federation graphql hacktoberfest heterogeneity javascript query-engine rdf sparql triple-pattern-fragments

comunica's Introduction

Comunica

A knowledge graph querying framework for JavaScript
Flexible SPARQL and GraphQL over decentralized RDF on the Web.


Learn more about Comunica on our website.

Comunica is an open-source project that is used by many other projects, and is being maintained by a group of volunteers. If you would like to support this project, you may consider:

Supported by

Comunica is a community-driven project, sustained by the Comunica Association. If you are using Comunica, becoming a sponsor or member is a way to make Comunica sustainable in the long-term.

Our top sponsors are shown below!

Query with Comunica

Read one of our guides to get started with querying:

Or jump right into one of the available query engines:

Modify or Extend Comunica

Read one of our guides to get started with modifying Comunica, or have a look at some examples:

Contribute

Interested in contributing? Have a look at our contribution guide.

Development Setup

(JSDoc: https://comunica.github.io/comunica/)

This repository should be used by Comunica module developers as it contains multiple Comunica modules that can be composed. This repository is managed as a monorepo using Lerna.

If you want to develop new features or use the (potentially unstable) in-development version, you can set up a development environment for Comunica.

Comunica requires Node.js 8.0 or higher and the Yarn package manager. Comunica is tested on OSX, Linux and Windows.

This project can be set up by cloning and installing it as follows:

$ git clone https://github.com/comunica/comunica.git
$ cd comunica
$ yarn install

Note: npm install is not supported at the moment, as this project makes use of Yarn's workspaces functionality.

This will install the dependencies of all modules, and bootstrap the Lerna monorepo. After that, all Comunica packages are available in the packages/ folder and can be used in a development environment, such as querying with Comunica SPARQL (@comunica/query-sparql).

Furthermore, this will add pre-commit hooks using husky to build, lint and test. These hooks can temporarily be disabled at your own risk by adding the -n flag to the commit command.

Benchmarking

If you want to do benchmarking with Comunica in Node.js, make sure to run Node.js in production mode as follows:

> NODE_ENV=production node packages/some-package/bin/some-bin.js

The reason for this is that Comunica extensively generates internal Error objects. In non-production mode, these also produce long stacktraces, which may in some cases impact performance.

Cite

If you are using or extending Comunica as part of a scientific publication, we would appreciate a citation of our article.

@inproceedings{taelman_iswc_resources_comunica_2018,
  author    = {Taelman, Ruben and Van Herwegen, Joachim and Vander Sande, Miel and Verborgh, Ruben},
  title     = {Comunica: a Modular SPARQL Query Engine for the Web},
  booktitle = {Proceedings of the 17th International Semantic Web Conference},
  year      = {2018},
  month     = oct,
  url       = {https://comunica.github.io/Article-ISWC2018-Resource/}
}

License

This code is copyrighted by the Comunica Association and Ghent University – imec and released under the MIT license.

comunica's People

Contributors

albaike, bcommeine, brechtvdv, constraintautomaton, danielbeeke, florianfv, greenkeeper[bot], jacoscaz, jasmineleonard, jeswr, jitsedesmet, joachimvh, laurin-w, maartyman, peeja, renovate-bot, renovate[bot], rubeneschauzier, rubensworks, rubenverborgh, sandervanhove, simonvbrae, smessie, stephaniech97, surilindur, timplication, tpt, vinnl, woutermont, wschella

comunica's Issues

Offline unit testing

Make it so that all unit tests still work when not connected to the internet.

All HTTP-related unit tests currently perform actual HTTP requests, so we have to make sure to mock this.

Add identity-based DISTINCT actor

The current DISTINCT implementation works based on hashes. While the chance of clashes is quite small, we should add an identity-based implementation as well.

This could work by for example just stringifying each object, instead of hashing it.
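A minimal sketch of the stringify-based idea, assuming bindings can be modelled as a plain record from variable name to term string. The `Bindings` type and both function names are hypothetical and not part of the Comunica codebase. Keys are sorted before stringifying so that two bindings with the same content but a different key order get the same identity.

```typescript
// Hypothetical binding shape: variable name -> RDF term as a string.
type Bindings = Record<string, string>;

// Build a canonical string for a binding. Sorting the keys first ensures
// that key order does not influence the identity.
function bindingsKey(bindings: Bindings): string {
  const entries = Object.keys(bindings)
    .sort()
    .map((key): [string, string] => [key, bindings[key]]);
  return JSON.stringify(entries);
}

// Returns a stateful predicate that accepts each distinct binding only once.
// Unlike a hash-based filter, it stores the full string, so no clashes occur,
// at the cost of more memory.
function createDistinctFilter(): (bindings: Bindings) => boolean {
  const seen = new Set<string>();
  return (bindings: Bindings): boolean => {
    const key = bindingsKey(bindings);
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  };
}

const input: Bindings[] = [
  { movie: 'http://example.org/a', director: 'http://example.org/x' },
  { director: 'http://example.org/x', movie: 'http://example.org/a' },
  { movie: 'http://example.org/b', director: 'http://example.org/x' },
];
const distinct = input.filter(createDistinctFilter());
console.log(distinct.length); // 2
```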

Feature-rdf-metadata

A way to extract metadata from an already-fetched and parsed RDF document, with a Hydra actor. (rdf-metadata)

After that, also a way to fetch paged Hydra RDF documents in a lazy manner, together with metadata. (hydra-paged)

Internally use asynciterator<RDF.Quad> to represent the quad stream, and metadata as a property.

This depends on #6.

Deduplicate webpacked dependencies

Webpack support was added in #64.

Webpack will by default deduplicate dependencies, but because we are using a (Lerna) monorepo, Webpack can sometimes still duplicate dependencies (webpack/webpack#5593).

There is a plugin that is supposed to fix this (https://github.com/RoboBurned/dedup-resolve-webpack-plugin), but it can cause issues in dependency resolution when packaged in a monorepo.
(Note: line 44 in the plugin should be replaced with if (fs.realpathSync(request.path) !== fs.realpathSync(cacheEntry.path)) {, and the failing modules should be blacklisted via the plugin.)

In practice, when creating a web bundle in a regular installation (non-monorepo), this issue shouldn't occur, so let's test this when we get there.

Rename 'entrypoint' in context

Currently, 'entrypoint' is used in the context to indicate a TPF entrypoint.
We should make this more specific, and rename this to 'tpf' or something similar,
to avoid confusion with other source types.

Scripts not working (on Windows of course)

Since all build scripts changed from tsc to ../../node_modules/.bin/tsc, Windows has been having issues. Windows can interpret paths with slashes, but not when they are part of the path to the command you're trying to execute: '..' is not recognized as an internal or external command, etc. I'll have a look to see if I can find a workaround.

actor-http-native not fully tested

It looks like the tests in actor-http-native do not reach a full coverage.

(the block on line 34 in ActorQueryOperationLeftJoinNestedLoop should be commented/changed a bit so that coverage for that also becomes 100%)

Metadata not always resolved

This is related to one of my comments in #30, but since I see the "problem" also occurs in another init actor I made a separate issue.

I'm also not sure if this is expected behaviour or not.

When running ActorInitRdfDereferencePaged with the pattern ?movie dbpedia-owl:starring dbpedia:Brad_Pitt. I get the following output:

Metadata: {
  "isFulfilled": false,
  "isRejected": false
}
{"subject":{"value":"http://dbpedia.org/resource/12_Monkeys"},"predicate":{"value":"http://dbpedia.org/ontology/starring"},"object":{"value":"http://dbpedia.org/resource/Brad_Pitt"},"graph":{"value":""}}
...

Meaning the metadata wasn't resolved (yet) but was printed. This is easily solved by adding an await on the metadata output line:

readable.push('Metadata: ' + JSON.stringify(await result.firstPageMetadata, null, '  ') + '\n');

Now the question is, is this the expected behaviour that the metadata can still be a promise at this point or is this a bug? (If it's expected behaviour the paged dereference init actor will need that change).

Browser support

It should be possible to use comunica using browserify and/or webpack.

Implement promise cancellations

As we use cancellable Bluebird promises, we should start adding support for the cancellation behaviour. This will be important for things such as HTTP requests.

For instance, when performing a query with a certain limit, execution should be able to stop immediately after reaching this limit.

Allow HTTP timeout configuration

We should allow users to pass an HTTP timeout value via the context (httpTimeout).

This could be implemented using our own setTimeout and the fetch AbortController: node-fetch/node-fetch#95

We should keep in mind here that we should clear our own timeout once the request completes (response object is available).

Additionally, we need an extra context option (boolean: httpTimeoutOnBody) to make it so that the timeout not only applies to the time until response starts coming in, but also to the time until the response body is fully available. The latter could take longer, or potentially be infinite for e.g. continuous data streams. This should also take into account that response bodies can be cancelled from within Comunica.
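A rough sketch of how the two options could interact, assuming a runtime that provides the standard `AbortController` global. The `startHttpTimeout` helper and its return shape are hypothetical; only the option names `httpTimeout` and `httpTimeoutOnBody` come from the description above.

```typescript
interface TimeoutOptions {
  httpTimeout: number; // milliseconds
  httpTimeoutOnBody?: boolean; // keep the timer running until the body ends
}

interface TimedRequest {
  signal: AbortSignal; // pass this to fetch(url, { signal })
  onResponse(): void; // call once the response object is available
  onBodyEnd(): void; // call once the body is fully read or cancelled
}

// Hypothetical helper: starts our own timer and aborts the request when it fires.
function startHttpTimeout(options: TimeoutOptions): TimedRequest {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined = setTimeout(
    () => controller.abort(),
    options.httpTimeout,
  );
  const clear = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };
  return {
    signal: controller.signal,
    // Without httpTimeoutOnBody, the timeout only covers the time until
    // the response starts coming in.
    onResponse: (): void => {
      if (!options.httpTimeoutOnBody) {
        clear();
      }
    },
    // Always clear here, so a body cancelled from within the engine
    // does not leave a pending timer behind.
    onBodyEnd: clear,
  };
}

// Possible usage (not executed here):
//   const timed = startHttpTimeout({ httpTimeout: 5000 });
//   const response = await fetch(url, { signal: timed.signal });
//   timed.onResponse();
//   ... read the body, then call timed.onBodyEnd();
```

Clearing the timer in both callbacks keeps the process from being held open by a stale timeout after the request has completed.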


Bounty

A bounty has been placed on this issue by:

Netwerk Digitaal Erfgoed
€1088

Click here to learn more if you're interested in claiming this bounty by resolving this issue.

Support SPARQL operations

The following operations should be supported (assuming the SPARQL algebra types):

  • BGP
  • Construct
  • Describe
  • Distinct (sha1 hash-based)
  • Graph (not required if sparqlalgebrajs gets called with quad option)
  • Join
  • Leftjoin (has expression parameter, but can be implemented by calling Filter implementation)
  • Pattern
  • Project
  • Slice
  • Union
  • Ask

Dependent on Expression implementation:

  • Filter
  • Orderby
  • Expression

Dependent on Expression implementation, but not supported in the current client:

  • Aggregate
  • Extend
  • Group

Not supported in current LDF client (so should not be supported right away):

  • Reduced
  • Values
  • Minus

Path-related operators. Not supported in current LDF client (so should not be supported right away):

  • Alt
  • Inv
  • Link
  • Nps
  • OneOrMorePath
  • Path
  • Seq
  • ZeroOrMorePath
  • ZeroOrOnePath

Abstract configurations

In the future, we could provide component sets which provide a certain specific functionality. These sets could provide importable config files (using owl:imports) to simplify config files in cases where knowledge of the deeper component levels is not needed.

Add convenience implementations for IHeaders and IBody

The HTTP bus provides the IHeaders and IBody interfaces which are based on the node-fetch types.

A default implementation should be provided for implementing HTTP actors that don't necessarily use the fetch API internally.

Feature/rdf-dereference

A way to dereference a URI to a quad stream.

This depends on #2 and #4.

  • bus-rdf-dereference: Bus and abstract actor for dereferencing a URI to an RDF/JS stream. Actor in: URI, Actor out: RDF.Stream
  • actor-rdf-dereference-http-parse: Uses bus-rdf to get an overview of all available media types, uses bus-http to fetch the contents of the URI with an accept header based on these media types, and bus-rdf to parse these contents.

In the future, we should provide a way to give priorities to media types, either dynamically or statically at config-level.

Memento support

Just realized this, but we don't have Memento time conneg support planned yet.

Ideally, this should also be implemented before we release.

@mielvds Are you up for this?

Query local files

There should be a way to query files on the local filesystem.

One way of doing this would be to add an HTTP actor that proxies the file system. One problem with this would be that conneg would only be best-effort.

Note: We need to ensure somehow that remote resources can not somehow trigger local files to be queried for security. This could be done by adding an additional flag to the context when initializing so that it is stated that local files must be queried.
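One possible shape for such a guard, as a sketch only. The `allowLocalFiles` flag name and both functions are hypothetical; the actual context key would still have to be decided.

```typescript
interface QueryContext {
  // Hypothetical flag: must be set explicitly when initializing the engine.
  allowLocalFiles?: boolean;
}

// A source counts as local when it uses the file: scheme or has no URL scheme.
// Schemes need at least two characters here, so that a Windows drive letter
// such as "C:" is not mistaken for a scheme.
function isLocalSource(source: string): boolean {
  const hasScheme = /^[a-z][a-z0-9+.-]+:/i.test(source);
  return /^file:/i.test(source) || !hasScheme;
}

// Throws unless the context explicitly allows querying local files.
function assertSourceAllowed(source: string, context: QueryContext): void {
  if (isLocalSource(source) && context.allowLocalFiles !== true) {
    throw new Error(`Refusing to query local source "${source}"`);
  }
}

assertSourceAllowed('http://fragments.dbpedia.org/2016-04/en', {});
assertSourceAllowed('file:///tmp/data.ttl', { allowLocalFiles: true });
```

Because the flag lives in the context that is passed at initialization, a remote document that merely links to a file: URL cannot switch it on by itself.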

Different output serializers

Just like the old LDF client, we should support different result writers.

Currently, this is hardcoded to print JSON bindings to the console.
This should be bus-ified, so that different writers can be easily added.

Also extend the HTTP interface to support this when done.
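To illustrate the idea of pluggable writers, here is a small sketch that uses a registry keyed by media type. It is not the bus architecture itself; `registerWriter`, `writeResults` and the `Bindings` shape are hypothetical names chosen for this example.

```typescript
// Hypothetical binding shape: variable name -> RDF term as a string.
type Bindings = Record<string, string>;
type ResultWriter = (results: Bindings[]) => string;

const writers = new Map<string, ResultWriter>();

// Adding a new output format only requires registering another writer.
function registerWriter(mediaType: string, writer: ResultWriter): void {
  writers.set(mediaType, writer);
}

function writeResults(mediaType: string, results: Bindings[]): string {
  const writer = writers.get(mediaType);
  if (writer === undefined) {
    throw new Error(`No result writer registered for ${mediaType}`);
  }
  return writer(results);
}

registerWriter('application/json', (results) => JSON.stringify(results));
registerWriter('text/tab-separated-values', (results) => {
  if (results.length === 0) {
    return '';
  }
  // Use the sorted variables of the first binding as column headers.
  const variables = Object.keys(results[0]).sort();
  const rows = results.map((bindings) =>
    variables.map((variable) => bindings[variable] ?? '').join('\t'));
  return [variables.join('\t')].concat(rows).join('\n');
});

console.log(writeResults('application/json', [{ movie: 'a' }]));
```

An HTTP interface could pick the writer from the request's Accept header in the same way.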

Federation support

We'll have to support multiple entrypoints.

This could be done by creating a generic sources entry in the context, which can contain multiple sources ('sub-contexts') of different types (key: entrypoint, file, hdtFile, value: any). A certain actor could delegate these sources to a mediator.
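As a sketch of what such a sources entry might look like, assuming each source is a type/value pair. The `Source` and `QueryContext` shapes and `groupSourcesByType` are hypothetical, and the value is narrowed to a string here for simplicity even though the proposal allows any value.

```typescript
// The type names mirror the keys suggested above.
type SourceType = 'entrypoint' | 'file' | 'hdtFile';

interface Source {
  type: SourceType;
  value: string;
}

interface QueryContext {
  sources: Source[];
}

// Group sources by type, so that each group can be delegated to whichever
// actor (via a mediator) knows how to handle that type.
function groupSourcesByType(context: QueryContext): Map<SourceType, Source[]> {
  const groups = new Map<SourceType, Source[]>();
  for (const source of context.sources) {
    const group = groups.get(source.type) ?? [];
    group.push(source);
    groups.set(source.type, group);
  }
  return groups;
}

const context: QueryContext = {
  sources: [
    { type: 'entrypoint', value: 'http://fragments.dbpedia.org/2016-04/en' },
    { type: 'entrypoint', value: 'http://data.linkeddatafragments.org/viaf' },
    { type: 'file', value: './local-data.ttl' },
  ],
};
const grouped = groupSourcesByType(context);
console.log(grouped.get('entrypoint')?.length); // 2
```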

Add debug mode

  • Add profiler (at bus-level?) so that the execution time of each run/test per actor can be seen.
  • Allow query plan to be dumped. (Probably only after execution, as the left-deep-smallest actor results in dynamic plans)
  • Make debug mode not disable stack traces.

Add support for prefixes

Allow prefixes to be defined externally from the query AND allow RDF serializations to use prefixes somehow.
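For the first half (prefixes defined outside the query), a minimal sketch could expand prefixed names against an externally supplied map. `PrefixMap` and `expandPrefixedName` are hypothetical names used only for illustration.

```typescript
type PrefixMap = Record<string, string>;

// Expand "prefix:local" to a full IRI when the prefix is known.
// Names with an unknown prefix (including full IRIs such as "http://...")
// are returned unchanged.
function expandPrefixedName(name: string, prefixes: PrefixMap): string {
  const separator = name.indexOf(':');
  if (separator < 0) {
    return name;
  }
  const prefix = name.slice(0, separator);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    return name;
  }
  return prefixes[prefix] + name.slice(separator + 1);
}

const prefixes: PrefixMap = {
  'dbpedia-owl': 'http://dbpedia.org/ontology/',
  dbpedia: 'http://dbpedia.org/resource/',
};
console.log(expandPrefixedName('dbpedia-owl:starring', prefixes));
// http://dbpedia.org/ontology/starring
```

The same map could be handed to an RDF serializer so that output is compacted with the prefixes the user supplied.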

Add query operation actor generator

We already have a code generator for actors. Make sure to add one for query operation actors as well, which all share certain properties.

Cache wired engine in query API

Currently, when calling query via the JS API, the engine will be rewired every time.

Change this so that first, the developer has to instantiate an engine, and only then can the query method be called.
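The intended usage pattern could look roughly like the following sketch, in which `wireEngine` is a stand-in for the expensive wiring step and `Engine` is a hypothetical class, not the actual API.

```typescript
let wiringRuns = 0;

// Stand-in for the expensive step of wiring all actors together.
function wireEngine(configPath: string): { configPath: string } {
  wiringRuns++;
  return { configPath };
}

class Engine {
  private readonly wired: { configPath: string };

  // Wiring happens exactly once, when the engine is instantiated.
  public constructor(configPath: string) {
    this.wired = wireEngine(configPath);
  }

  // Placeholder: a real engine would evaluate the query and return results.
  public query(query: string): string {
    return `evaluated "${query}" with ${this.wired.configPath}`;
  }
}

const engine = new Engine('config-default.json');
engine.query('SELECT * WHERE { ?s ?p ?o }');
engine.query('ASK { ?s ?p ?o }');
console.log(wiringRuns); // 1
```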

Add query API

Currently, the init actor can only run from the command line and print to the console.

We should add an init actor (and a runner-?) that allows queries to be evaluated and results to be returned via JavaScript.

Filter

Support filter expressions by simply copy-pasting the impl of the current client.

Add SPARQL optimize bus

This bus should allow queries (in SPARQL algebra) to be rewritten by actors on its bus.

Actors could use this bus to optimize certain query types, or to modify certain operations so that certain specific actors can evaluate them.

Add SPARQL protocol interface

We should add a HTTP-based actor that accepts SPARQL queries.
This should do something similar to what ldf-client-http does.

Unable to build from clean install

Been fighting with this while trying to fix #1 (pretty sure this one is not Windows related!).

When running lerna bootstrap, the following steps get executed in order:

lerna info lifecycle preinstall
lerna info Symlinking packages and binaries
lerna info lifecycle postinstall
lerna info lifecycle prepublish
lerna info lifecycle prepare

The packages have been configured to build typescript to javascript in the prepare step.
In the Symlinking step, lerna tries to link all packages and binaries, which are defined in the bin field of package.json.
runner-cli has the following in its package.json:

"bin": {
    "comunica-run": "./bin/run.js"
  }

This file does not exist yet at this point since the build has not happened yet, only run.ts exists, causing the lerna bootstrap process to fail at this point.

Moving the build process to preinstall is not a solution since there would be missing dependencies due to the symlink of packages not having happened yet.

Only solution I see is to write the binaries in javascript instead of typescript?

ActorQueryOperationQuadpattern not returning all results

When changing config-example-quadpattern.json to use entrypoint http://fragments.dbpedia.org/2016-04/en and the pattern to ?movie dbpedia-owl:direct ?director. I no longer get all results, only the first few pages. The actual number of results differs every time I run this making me think this is a timing problem.

Make SPARQL init actor more user-friendly

Add things like support for URI prefixes, defining entrypoints as CLI argument, query files, ...

A default config file should be available in a separate package that contains all required actors for resolving SPARQL queries. This package should then become the main entrypoint of the query engine.

Make BGP resolving make use of join actors

Incorporate the join actors into BGP resolving as introduced by #30.
This could be done by making a new BGP actor that simply delegates joining.

This is probably only needed after version 1.0.0

Make public

A couple of things that need to be done before we make the project public.

  • Support QPF queries: #8
  • Release new major Components.js version: #27
  • General code cleanup:
    • Remove todo's
    • Check if some defaultScopeds can be changed to defaults in component files
    • Cleanup tests #12
  • SPARQL query support
  • Add more information to wiki about architecture
  • Improve README.md
  • Add contribution guidelines + issue template
  • CLA
  • Auto-deploy docs to github pages.
  • Setup Travis (also add tests for browser compilation)
  • Coveralls
  • Greenkeeper + greenkeeper-lockfile
  • Publish to npm (first try locally using sinopia).

Test helpers

Add test helpers in the root package, for things that are commonly used in tests, such as creating streams from triples and strings, converting from stream to array, or checking if all given elements exist in a stream.

Dynamic actor loading

Investigate if actors could dynamically be loaded. (using Components.js?)

For example, certain RDF parsers should only be loaded if they are actually needed. The JSON-LD parser for example takes a long time to load, so this should be avoided until a server only supports JSON-LD.

https://webpack.js.org/guides/code-splitting/

Reorder subpatterns in BGP

The ReorderingGraphPatternIterator in the current LDF client seems to reorder triple patterns based on the number of free variables. Investigate what this does exactly, and where we can plug it in. (Either in a/the BGP actor, or in the new SPARQL optimize bus, #46)
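Pending that investigation, here is a sketch of the simplest reading of the heuristic: sort patterns so that those with the fewest distinct variables come first. This is an assumption about the behaviour, not a description of what ReorderingGraphPatternIterator actually does, and the `TriplePattern` shape is hypothetical.

```typescript
interface TriplePattern {
  subject: string;
  predicate: string;
  object: string;
}

// Treat any term starting with '?' as a variable (an assumption of this sketch)
// and count each variable name only once per pattern.
function countVariables(pattern: TriplePattern): number {
  const terms = [pattern.subject, pattern.predicate, pattern.object];
  return new Set(terms.filter((term) => term.charAt(0) === '?')).size;
}

// Returns a new array with the most selective-looking patterns first.
// The input array is left untouched.
function reorderPatterns(patterns: TriplePattern[]): TriplePattern[] {
  return patterns
    .slice()
    .sort((left, right) => countVariables(left) - countVariables(right));
}

const reordered = reorderPatterns([
  { subject: '?person', predicate: 'dbpedia-owl:birthPlace', object: '?place' },
  { subject: '?place', predicate: 'rdfs:label', object: '"San Francisco"@en' },
]);
console.log(reordered[0].predicate); // rdfs:label
```

The same function could live either inside a BGP actor or in an actor on an optimization bus, since it only rewrites the pattern order.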

Make client compilation more convenient

Make it possible for the componentsjs compilation to be done on the engine more easily, so that it can for example also be used for the command line script.

Federated query failure

The following query crashes:

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?name ?book ?title {
  ?person dbpedia-owl:birthPlace [ rdfs:label "San Francisco"@en ].
  ?viafID schema:sameAs ?person;
               schema:name ?name.
  ?book dc:contributor [ foaf:name ?name ];
              dc:title ?title.
}

Sources: http://fragments.dbpedia.org/2016-04/en http://data.linkeddatafragments.org/viaf http://data.linkeddatafragments.org/harvard

Execution only works when in correct root folder

When testing the master branch, I tried to run the command from the root folder (i.e. packages\actor-init-hello-world\node_modules\.bin\comunica-run packages\actor-init-hello-world\config\config-example.json Desmond Hume ) which resulted in an error, while I did get the correct output when executing from the actor-init-hello-world folder.

The error was

Error: Invalid components file "packages\actor-init-hello-world\config\packages\actor-init-hello-world\config\config-example.json":
Error: No valid parser was found, both N3 and JSON-LD failed:
...
    name: 'jsonld.InvalidUrl',
    message: 'Dereferencing a URL did not result in a valid JSON-LD object. Possible causes are an inaccessible URL perhaps due to a same-origin policy (ensure the server uses CORS if you are using client-side JavaScript), too many redirects, a non-JSON response, or more than one HTTP Link Header was provided for a remote context.',
    details:
     { code: 'loading remote context failed',
       url: 'https://linkedsoftwaredependencies.org/contexts/comunica-actor-init-hello-world.jsonld',
...

So I assume this has something to do again with components.js having to find those jsonld files and linking them to the URL. (And in this case not finding them due to the path being different).

Feature-bgp

An actor should be created that listens on bus-query-operator and resolves BGPs.

This could be done based on a 'join' bus, for joining bindings streams.

This depends on #8.

Feature-quad-pattern-query

A way to perform a QPF query against an entrypoint and get an s, p, o, g binding stream.
This stream must be an asynciterator of immutable binding objects.

Do this based on the RDFJS Source interface, so that any implementation can work with it.

  • bus-rdf-resolve-quad-pattern Returns quad stream based on a quad pattern with options. An actor based on a RDFJS.Source factory with query options.
  • All available context and metadata entries must be documented on the wiki.
  • bus-query-operation Based on SPARQL Algebra operator. Returns (immutable) bindings asynciterator. For now, just an actor that can handle 'quadpattern'. Streams also have 'metadata', for things such as order and estimated number of elements.

This depends on #7 and #16.
