comunica / comunica
💬 A knowledge graph querying framework for JavaScript
Home Page: https://comunica.dev
License: Other
In the future, we could provide component sets which each provide a certain specific piece of functionality. These sets could provide importable config files (using `owl:imports`) to simplify config files in cases where knowledge of the deeper component levels is not needed.
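As a rough sketch, such a component set could ship a config file that other configs pull in via `owl:imports` (both IRIs below are illustrative, not existing packages):

```json
{
  "@context": "https://linkedsoftwaredependencies.org/contexts/components.jsonld",
  "@id": "urn:comunica:example-engine",
  "owl:imports": "https://example.org/comunica-set-query-resolvers/config.jsonld"
}
```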
Webpack support was added in #64.
Webpack will by default deduplicate dependencies, but because we use a (Lerna) monorepo, Webpack can sometimes still duplicate dependencies (webpack/webpack#5593).
There is a plugin that is supposed to fix this (https://github.com/RoboBurned/dedup-resolve-webpack-plugin), but it can cause issues in dependency resolution when packaged in a monorepo.
(Note: line 44 in the plugin should be replaced with `if (fs.realpathSync(request.path) !== fs.realpathSync(cacheEntry.path)) {`, and the failing modules should be blacklisted via the plugin.)
In practice, when creating a web bundle in a regular installation (non-monorepo), this issue shouldn't occur, so let's test this when we get there.
Just realized this, but we don't have Memento time conneg support planned yet.
Ideally, this should also be implemented before we release.
@mielvds Are you up for this?
Add test helpers in the root package for things that are commonly used in tests, such as creating streams from triples and strings, converting from stream to array, or checking if all given elements exist in a stream.
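A minimal sketch of two such helpers, built on Node's stream module (the names `streamifyArray` and `arrayifyStream` are placeholders, not existing comunica utilities):

```typescript
import { Readable } from 'stream';

// Turn an array of items (e.g. triples) into an object-mode readable stream.
export function streamifyArray<T>(items: T[]): Readable {
  return Readable.from(items, { objectMode: true });
}

// Collect all items of an object-mode stream into an array.
export function arrayifyStream<T>(stream: Readable): Promise<T[]> {
  return new Promise((resolve, reject) => {
    const items: T[] = [];
    stream.on('data', (item: T) => items.push(item));
    stream.on('error', reject);
    stream.on('end', () => resolve(items));
  });
}
```

Checking whether given elements exist in a stream could then just reuse `arrayifyStream` and an array lookup.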
Make something so that this commonly occurring pattern can be abstracted: https://github.com/rubensworks/comunica/blob/master/packages/actor-init-query-operation/lib/ActorInitQueryOperation.ts#L18-L20
Possibly do this by adding something to Components.js, such as marking a parameter as required.
I did not yet have time to test this one fully, but my small tests seem to indicate that these streams do not fire an 'end' event when they're finished. But the code does terminate, so something stops at least. The lack of an 'end' event is a problem though when combining iterators.
Currently, the init actor can only run from the command line and print to the console.
We should add an init actor (and a runner-?) that allows queries to be evaluated and results to be returned via JavaScript.
Add a basic actor in which an RDFJS source can be plugged in via the constructor.
When changing `config-example-quadpattern.json` to use entrypoint http://fragments.dbpedia.org/2016-04/en and the pattern to `?movie dbpedia-owl:direct ?director`, I no longer get all results, only the first few pages. The actual number of results differs every time I run this, making me think this is a timing problem.
An actor should be created that listens on `bus-query-operation` and resolves BGPs.
This could be done based on a 'join' bus, for joining bindings streams.
This depends on #8.
As we use cancellable Bluebird promises, we should start adding support for the cancellation behaviour. This will be important for things such as HTTP requests.
For instance, when performing a query with a certain limit, execution should be able to stop immediately after reaching this limit.
Add things like support for URI prefixes, defining entrypoints as CLI argument, query files, ...
A default config file should be available in a separate package that contains all required actors for resolving SPARQL queries. This package should then become the main entrypoint of the query engine.
This file should be ignored according to .npmignore. So either .npmignore should change, or we should have a TypeScript version of this file.
Since all build scripts have now changed from `tsc` to `../../node_modules/.bin/tsc`, Windows is having issues. Windows can interpret paths with slashes, but not if they're part of the path pointing to the command you're trying to execute: `'..' is not recognized as an internal or external command`, etc. I'll have a look to see if I can find a workaround.
Currently, when calling `query` via the JS API, the engine will be rewired every time. Change this so that the developer first has to instantiate an engine, and only then can the `query` method be called.
Support filter expressions by simply copy-pasting the implementation of the current client.
This is related to one of my comments in #30, but since I see the "problem" also occurs in another init actor, I made a separate issue.
I'm also not sure if this is expected behaviour or not.
When running `ActorInitRdfDereferencePaged` with the pattern `?movie dbpedia-owl:starring dbpedia:Brad_Pitt`, I get the following output:

```
Metadata: {
  "isFulfilled": false,
  "isRejected": false
}
{"subject":{"value":"http://dbpedia.org/resource/12_Monkeys"},"predicate":{"value":"http://dbpedia.org/ontology/starring"},"object":{"value":"http://dbpedia.org/resource/Brad_Pitt"},"graph":{"value":""}}
...
```
Meaning the metadata wasn't resolved (yet) but was printed anyway. This is easily solved by adding an await on the metadata output line:

```js
readable.push('Metadata: ' + JSON.stringify(await result.firstPageMetadata, null, ' ') + '\n');
```
Now the question is, is this the expected behaviour that the metadata can still be a promise at this point or is this a bug? (If it's expected behaviour the paged dereference init actor will need that change).
A way to perform a QPF query against an entrypoint and get an `s, p, o, g` binding stream. This stream must be an `asynciterator` of immutable binding objects. Do this based on the RDFJS Source interface, so that any implementation can work with it.
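As an illustration of the 'immutable binding objects' requirement, a quad from an RDF/JS-style source could be mapped to a frozen `s, p, o, g` binding like this (the `Quad` shape and function name are simplified placeholders, not the RDFJS interfaces themselves):

```typescript
// Simplified stand-in for an RDF/JS quad; real terms are objects, not strings.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

// Map a quad to an immutable s, p, o, g binding.
export function quadToBinding(quad: Quad): Readonly<Record<string, string>> {
  // Freeze so downstream actors cannot mutate bindings flowing through the stream.
  return Object.freeze({
    s: quad.subject,
    p: quad.predicate,
    o: quad.object,
    g: quad.graph,
  });
}
```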
`bus-rdf-resolve-quad-pattern`: Returns a quad stream based on a quad pattern with options. An actor based on an `RDFJS.Source` factory with query options. The `context` and `metadata` entries must be documented on the wiki.

`bus-query-operation`: Based on a SPARQL Algebra operator. Returns an (immutable) bindings asynciterator. For now, just an actor that can handle 'quadpattern'. Streams also have 'metadata', for things such as order and estimated number of elements.

Make it possible for the Components.js compilation to be done on the engine more easily, so that it can for example also be used for the command line script.
We'll have to support multiple entrypoints.
This could be done by creating a generic `sources` entry in the context, which can contain multiple sources ('sub-contexts') of different types (key: `entrypoint`, `file`, `hdtFile`; value: any). A certain actor could delegate these sources to a mediator.
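A sketch of what such a generic `sources` context entry could look like, using the key names from above (the exact JSON shape is hypothetical):

```json
{
  "sources": [
    { "type": "entrypoint", "value": "http://fragments.dbpedia.org/2016-04/en" },
    { "type": "file", "value": "path/to/data.ttl" },
    { "type": "hdtFile", "value": "path/to/data.hdt" }
  ]
}
```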
A way to dereference a URI and get a stream of quads that lazily follows pages.
Been fighting with this while trying to fix #1 (pretty sure this one is not Windows related!).
When running `lerna bootstrap`, the following steps get executed in order:
lerna info lifecycle preinstall
lerna info Symlinking packages and binaries
lerna info lifecycle postinstall
lerna info lifecycle prepublish
lerna info lifecycle prepare
The packages have been configured to build typescript to javascript in the prepare step.
In the Symlinking step, lerna tries to link all packages and binaries, which are defined in the `bin` field of package.json.
runner-cli has the following in its package.json:

```json
"bin": {
  "comunica-run": "./bin/run.js"
}
```
This file does not exist at this point since the build has not happened yet; only run.ts exists, causing the lerna bootstrap process to fail.
Moving the build process to preinstall is not a solution since there would be missing dependencies due to the symlink of packages not having happened yet.
The only solution I see is to write the binaries in JavaScript instead of TypeScript?
We need extensive documentation.
Implement it like in the current LDF client.
When testing the master branch, I tried to run the command from the root folder (i.e. `packages\actor-init-hello-world\node_modules\.bin\comunica-run packages\actor-init-hello-world\config\config-example.json Desmond Hume`), which resulted in an error, while I did get the correct output when executing from the `actor-init-hello-world` folder.
The error was:

```
Error: Invalid components file "packages\actor-init-hello-world\config\packages\actor-init-hello-world\config\config-example.json":
Error: No valid parser was found, both N3 and JSON-LD failed:
...
name: 'jsonld.InvalidUrl',
message: 'Dereferencing a URL did not result in a valid JSON-LD object. Possible causes are an inaccessible URL perhaps due to a same-origin policy (ensure the server uses CORS if you are using client-side JavaScript), too many redirects, a non-JSON response, or more than one HTTP Link Header was provided for a remote context.',
details:
  { code: 'loading remote context failed',
    url: 'https://linkedsoftwaredependencies.org/contexts/comunica-actor-init-hello-world.jsonld',
...
```
So I assume this again has something to do with Components.js having to find those JSON-LD files and link them to the URL (and in this case not finding them due to the path being different).
It looks like the tests in actor-http-native do not reach full coverage.
(The block on line 34 in `ActorQueryOperationLeftJoinNestedLoop` should be commented/changed a bit so that coverage for that also becomes 100%.)
We should add a HTTP-based actor that accepts SPARQL queries.
This should do something similar to what `ldf-client-http` does.
A way to extract metadata from an already-fetched and parsed RDF document, with a Hydra actor (`rdf-metadata`).
After that, also a way to fetch paged Hydra RDF documents in a lazy manner, together with metadata (`hydra-paged`).
Internally use `asynciterator<RDF.Quad>` to represent the quad stream, and `metadata` as a property.
This depends on #6.
It should be possible to use comunica using browserify and/or webpack.
This bus should allow queries (in SPARQL algebra) to be rewritten by actors on its bus.
Actors could use this bus to optimize certain query types, or to modify certain operations so that certain specific actors can evaluate them.
We should allow users to pass an HTTP timeout value via the context (`httpTimeout`).
This could be implemented using our own `setTimeout` and the fetch `AbortController`: node-fetch/node-fetch#95
We should keep in mind here that we should clear our own timeout once the request completes (response object is available).
Additionally, we need an extra context option (boolean: `httpTimeoutOnBody`) to make the timeout apply not only to the time until the response starts coming in, but also to the time until the response body is fully available. The latter could take longer, or potentially be infinite for e.g. continuous data streams. This should also take into account that response bodies can be cancelled from within Comunica.
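A minimal sketch of the timeout part, assuming an AbortController-capable environment; `fetchWithTimeout` and its parameters are illustrative, not the actual actor wiring:

```typescript
// Abort the request after `httpTimeout` milliseconds via an AbortController,
// and clear our own timer as soon as the request settles. The fetch function
// is injected so any fetch-compatible implementation (e.g. node-fetch) works.
export function fetchWithTimeout<T>(
  url: string,
  httpTimeout: number,
  fetchFn: (url: string, init: { signal: AbortSignal }) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), httpTimeout);
  const promise = fetchFn(url, { signal: controller.signal });
  // Clear the timeout once the request completes (success or failure).
  promise.then(() => clearTimeout(timer), () => clearTimeout(timer));
  return promise;
}
```

Supporting `httpTimeoutOnBody` would additionally keep the timer alive until the body stream ends, instead of clearing it when the response headers arrive.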
A bounty of €1088 has been placed on this issue.
Just like the old LDF client, we should support different result writers.
Currently, this is hardcoded to print JSON bindings to the console.
This should be bus-ified, so that different writers can be easily added.
Also extend the HTTP interface to support this when done.
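To illustrate the bus-ified idea, a sketch of a writer registry keyed on media types (the interface and class names are hypothetical, not comunica's API):

```typescript
// Each writer declares the media types it can produce.
interface IResultWriter {
  mediaTypes: string[];
  write(bindings: Record<string, string>[]): string;
}

export class ResultWriterRegistry {
  private writers: IResultWriter[] = [];

  register(writer: IResultWriter): void {
    this.writers.push(writer);
  }

  // Pick the first writer that can produce the requested media type.
  find(mediaType: string): IResultWriter | undefined {
    return this.writers.find((w) => w.mediaTypes.includes(mediaType));
  }
}
```

An HTTP interface could then pick a writer based on the request's Accept header.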
The current DISTINCT implementation works based on hashes. While the chance of clashes is quite small, we should add an identity-based implementation as well.
This could work by for example just stringifying each object, instead of hashing it.
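A sketch of such an identity-based DISTINCT over plain arrays (a real actor would operate on a bindings stream; the function name is illustrative):

```typescript
// Identity-based DISTINCT: use the full stringified object as the key instead
// of a hash, so equal objects can never clash. Trades memory for correctness.
export function distinctByIdentity<T>(items: T[]): T[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    const key = JSON.stringify(item);
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}
```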
A couple of things that need to be done before we make the project public.
The following query crashes:
```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name ?book ?title {
  ?person dbpedia-owl:birthPlace [ rdfs:label "San Francisco"@en ].
  ?viafID schema:sameAs ?person;
          schema:name ?name.
  ?book dc:contributor [ foaf:name ?name ];
        dc:title ?title.
}
```
Sources: http://fragments.dbpedia.org/2016-04/en http://data.linkeddatafragments.org/viaf http://data.linkeddatafragments.org/harvard
Currently, 'entrypoint' is used in the context to indicate a TPF entrypoint. We should make this more specific and rename it to 'tpf' or something similar, to avoid confusion with other source types.
Make it so that all unit tests still work when not connected to the internet.
All HTTP-related unit tests currently perform actual HTTP requests, so we have to make sure to mock this.
The `ReorderingGraphPatternIterator` in the current LDF client seems to reorder triple patterns based on the number of free variables. Investigate what this does exactly, and where we can plug it in (either in a/the BGP actor, or in the new SPARQL optimize bus, #46).
There should be a way to query files on the local filesystem.
One way of doing this would be to add an HTTP actor that proxies the file system. One problem with this would be that conneg would only be best-effort.
Note: for security, we need to ensure somehow that remote resources cannot trigger local files to be queried. This could be done by adding an additional flag to the context at initialization, explicitly stating that local files may be queried.
The following operations should be supported (assuming the SPARQL algebra types):
Dependent on Expression implementation:
Dependent on Expression implementation, but not supported in the current client:
Not supported in the current LDF client (so need not be supported right away):
Path-related operators. Not supported in the current LDF client (so need not be supported right away):
The HTTP bus provides the `IHeaders` and `IBody` interfaces, which are based on the node-fetch types. A default implementation should be provided for implementing HTTP actors that don't necessarily use the fetch API internally.
Investigate whether actors could be loaded dynamically (using Components.js?).
For example, certain RDF parsers should only be loaded if they are actually needed. The JSON-LD parser for example takes a long time to load, so loading it should be avoided unless a server only supports JSON-LD.
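A sketch of what such lazy loading could look like: each media type gets a loader function that is only invoked, and cached, the first time that type is encountered (names are illustrative; in practice the loaders could be dynamic `import()` calls wired up by Components.js):

```typescript
// Lazily load heavy parser modules per media type, caching the result so
// each loader runs at most once.
export class LazyParserLoader {
  private cache = new Map<string, Promise<unknown>>();

  constructor(private loaders: Record<string, () => Promise<unknown>>) {}

  load(mediaType: string): Promise<unknown> {
    let parser = this.cache.get(mediaType);
    if (!parser) {
      parser = this.loaders[mediaType]();
      this.cache.set(mediaType, parser);
    }
    return parser;
  }
}
```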
A way to dereference a URI to a quad stream.

`bus-rdf-dereference`: Bus and abstract actor for dereferencing a URI to an RDF/JS stream. Actor in: URI; actor out: `RDF.Stream`.

`actor-rdf-dereference-http-parse`: Uses `bus-rdf` to get an overview of all available media types, uses `bus-http` to fetch the contents of the URI with an accept header based on these media types, and `bus-rdf` to parse these contents.

In the future, we should provide a way to give dynamic priorities to media types, either statically at config-level.
Incorporate the join actors into BGP resolving as introduced by #30.
This could be done by making a new BGP actors that simply delegates joining.
This is probably only needed after version 1.0.0
We already have a code generator for actors. Make sure to add one for query operation actors as well, which all share certain properties.
(the `left-deep-smallest` actor results in dynamic plans)

Add an actor that can resolve full SPARQL queries against a SPARQL endpoint.
Allow prefixes to be defined externally from the query AND allow RDF serializations to use prefixes somehow.
Some old browsers do not support the fetch API (which we require).
In these browsers, the cryptic error "Expected a ReadableStream" will be shown.
We should show a more user-friendly error for browsers that do not support the fetch API: https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility
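A sketch of an explicit feature check that could replace the cryptic error (the function name and the injectable global object are illustrative; in a browser one would pass `window`):

```typescript
// Throw an actionable error when the environment lacks the fetch API.
export function checkFetchSupport(globalObj: { fetch?: unknown }): void {
  if (typeof globalObj.fetch !== 'function') {
    throw new Error(
      'This browser does not support the fetch API, which Comunica requires. ' +
      'See https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility',
    );
  }
}
```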