fedx's Introduction

Welcome to the FedX Repository

FedX is a practical framework for transparent access to Linked Data sources through a federation. It incorporates new sophisticated optimization techniques combined with effective variants of existing techniques and is thus a highly scalable solution for practical federated query processing.

FedX has moved

Veritas contributed FedX to the Eclipse RDF4J project. Development continues there; please see https://github.com/eclipse/rdf4j.

fedx's People

Contributors

aschwarte10, veritasosrb

fedx's Issues

Large number of threads / limit number of threads?

Hello

For a query that involves a large number of entities (tens of thousands) and a join between two sources in the federation (each of these entities has a property linking it to an entity in the other source), I see many errors like the following, and the query does not terminate:

[269,472s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

My hypothesis is that FedX creates a large number of threads and thread creation fails. How can I limit the number of threads being created to avoid such errors?
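One common way to cap thread usage is to submit subquery tasks to a fixed-size pool instead of starting a thread per task. The sketch below illustrates that general idea in plain Java; it is not FedX's scheduler, and the class BoundedSubqueryRunner and its runAll method are hypothetical names. It also records the peak number of concurrently running tasks, which cannot exceed the pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not FedX's scheduler: runs many subquery tasks
// on a fixed number of worker threads.
class BoundedSubqueryRunner {

    // Runs all tasks on at most maxThreads threads, waits for completion and
    // returns the peak number of tasks that were running at the same time.
    static int runAll(List<Runnable> tasks, int maxThreads) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // The pool never grows beyond maxThreads; extra tasks wait in its queue
        // instead of each one requesting a new OS thread.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable task : tasks) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        task.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure raised inside a task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With 200 tasks and maxThreads = 4, only four worker threads are created and the remaining tasks wait in the executor's queue, so the process never asks the OS for hundreds of threads at once.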

Adjust data structures used for identifiers

Identifiers of queries, nodes, and parallel executors are currently (mostly) implemented as atomic integers.

We should use more suitable data structures that support longer runtimes without the risk of overflow.

For query identifiers it makes sense to switch to BigInteger, as this ensures that all queries issued over time can be identified.
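To illustrate the overflow risk: an int counter wraps from 2147483647 to -2147483648, and even a long is bounded by 9223372036854775807. A minimal sketch of a BigInteger-based identifier source follows; the class QueryIdGenerator is a hypothetical name and not FedX's actual code.

```java
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical query-ID source that cannot overflow, unlike int/long counters.
class QueryIdGenerator {
    private final AtomicReference<BigInteger> last;

    // The first call to next() returns start + 1.
    QueryIdGenerator(BigInteger start) {
        this.last = new AtomicReference<>(start);
    }

    QueryIdGenerator() {
        this(BigInteger.ZERO);
    }

    // Atomically increments the counter and returns the new identifier.
    BigInteger next() {
        return last.updateAndGet(v -> v.add(BigInteger.ONE));
    }
}
```

AtomicReference.updateAndGet retries its compare-and-set on contention, so concurrent callers still receive distinct identifiers; the cost compared to an AtomicLong is one object allocation per identifier.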

Evaluation of OPTIONAL runs into deadlock with large intermediate results

When there are large intermediate result sets during the evaluation of a query, the FedX query engine could run into a deadlock.

This is because FedX requires a controlled execution order of parallel operators: as the query engine evaluates subqueries concurrently, it is important that this happens in a controlled order.

Dynamic federations (endpoints different for each query)

Hello

I am in a situation where the endpoints participating in a federation are selected at runtime by the user before issuing a query.
This means that I need to create a different federation for each query; however, FederationManager is a singleton and throws an exception when I try to create a second federation after the first one has been created.
Even if I call shutdown() on the FederationManager after each query, only one FederationManager can exist at a given time, which would block concurrent queries in my application.

I think I need to use FederationManager removeAll / addAll to update the list of endpoints for each query while keeping the same FederationManager instance. Is that correct?
Do I need to create a new Repository object from the updated FederationManager? If so, how? Or should I keep the same Repository object and only update the FederationManager?
Sample pseudo-code showing how to handle this situation would be very welcome!

Thanks a lot

Revise logging infrastructure

Currently FedX uses Log4J 1.x as a logging framework. This is EOL and needs to be replaced.

The goal is to switch to SLF4J as a compile time dependency, and bundle Log4J2 as an optional runtime implementation (e.g. for the CLI).

Support and document use of FedX in RDF4J workbench

FedX is built on top of the RDF4J framework and thus should support all of its underlying functionality.

This particularly includes the RDF4J workbench: it should be possible to use FedX in the RDF4J workbench, and how to do so should be well documented.

Improve federation shutdown behavior

The current shutdown behavior is neither clean nor intuitive from an API perspective: the examples in the documentation instruct users to shut down the singleton federation manager.

Instead it is more intuitive to invoke a shutdown on the Repository instance.

Desired example

Config.initialize(fedxConfig);
List<Endpoint> members = ...            // e.g. use EndpointFactory methods
Repository repo = FedXFactory.initializeFederation(members);
RepositoryConnection conn = repo.getConnection();

// Do something with the connection, e.g. query evaluation
conn.close();
repo.shutDown();

Evaluate Bound Joins using VALUES clause by default

FedX already has two strategies for evaluating a bound join: a pure SPARQL 1.0 implementation, and an implementation that uses the VALUES clause.

Performance tests have revealed that the implementation with the VALUES clause is more efficient.

As today's endpoints support SPARQL 1.1, we should switch the default to the VALUES clause implementation.
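The VALUES-based strategy sends one subquery per block of input bindings and tags each row with an ?__index value, so results can be mapped back to the binding that produced them. The sketch below assembles such a subquery as a string with the VALUES block inside the WHERE clause. The class ValuesBoundJoin, its parameters, and the restriction to a single IRI-valued join variable are assumptions for illustration; this is not FedX's implementation.

```java
import java.util.List;

// Hypothetical sketch, not FedX's implementation: builds one bound join
// subquery for a block of IRI bindings of a single join variable.
class ValuesBoundJoin {

    // projection:    variables of the original pattern, e.g. "?x ?name"
    // joinVar:       name of the bound variable without '?', e.g. "x"
    // iris:          values for joinVar taken from the left-hand side of the join
    // triplePattern: the pattern to evaluate once per binding
    static String build(String projection, String joinVar, List<String> iris, String triplePattern) {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT ").append(projection).append(" ?__index WHERE { ");
        // VALUES is placed inside the WHERE block; ?__index records which
        // input binding each result row belongs to.
        sb.append("VALUES (?").append(joinVar).append(" ?__index) { ");
        for (int i = 0; i < iris.size(); i++) {
            sb.append("(<").append(iris.get(i)).append("> \"").append(i).append("\") ");
        }
        sb.append("} ").append(triplePattern).append(" }");
        return sb.toString();
    }
}
```

For example, build("?x ?name", "x", List.of("urn:1", "urn:2"), "?x <http://xmlns.com/foaf/0.1/name> ?name .") yields SELECT ?x ?name ?__index WHERE { VALUES (?x ?__index) { (<urn:1> "0") (<urn:2> "1") } ?x <http://xmlns.com/foaf/0.1/name> ?name . }.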

SPARQL Bound Join with VALUES clause does not yield correct result in certain endpoints

The implementation of the bound join operator in FedX has been changed to use the VALUES clause.

In practical tests (particularly against Virtuoso endpoints) it turned out that the query execution does not yield correct results.

An analysis has shown that the scope of the VALUES clause behaves differently in different query engines. FedX used to have an implementation where the VALUES clause was placed outside the WHERE block.

Examples:

Not working:

SELECT ?x ?name ?__index WHERE {
   ?x <http://xmlns.com/foaf/0.1/name> ?name .
}
VALUES (?x ?name ?__index) {
   (<urn:1> UNDEF "0")
   (<urn:2> UNDEF "1")
}

When the VALUES clause is placed inside the WHERE block, the subquery is evaluated correctly both in Virtuoso endpoints and in the tests running in the RDF4J engine:

SELECT ?x ?name ?__index WHERE {
   VALUES (?x ?name ?__index) {
      (<urn:1> UNDEF "0")
      (<urn:2> UNDEF "1")
   }
   ?x <http://xmlns.com/foaf/0.1/name> ?name .
}

Unify namespaces used in the data configuration

Currently FedX uses two different namespaces in the configuration files:

Data config

@prefix fluid: <http://fluidops.org/config#>.

<http://DBpedia> fluid:store "SPARQLEndpoint";
fluid:SPARQLEndpoint "http://dbpedia.org/sparql".

Repository config:

@prefix fedx: <http://www.fluidops.com/config/fedx#>.

We should align these and use the second one consistently.

Note that this is a non-backwards-compatible change.

Seeing lots of "Repository for endpoint ... not initialized" and "Error executing union operator: null"

I am currently testing FedX.
I am trying to query the following federation:

List<Endpoint> endpoints = new ArrayList<Endpoint>();
endpoints.add(EndpointFactory.loadSPARQLEndpoint("arsol", "http://localhost:7200/repositories/openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr"));
endpoints.add(EndpointFactory.loadSPARQLEndpoint("aerba", "http://localhost:7200/repositories/openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr"));
endpoints.add(EndpointFactory.loadSPARQLEndpoint("referentiels", "http://localhost:7200/repositories/openarchaeo-referentiels"));

With Log4J debug logging activated, I see a lot of errors like this:

31930 [Join Scheduler-10] WARN com.fluidops.fedx.evaluation.concurrent.ControlledWorkerScheduler  - Exception encountered while evaluating task (QueryEvaluationException): com.fluidops.fedx.exception.FedXRuntimeException: Repository for endpoint sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr not initialized

And also a few like these:

47057 [Join Scheduler-8] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47099 [Join Scheduler-2] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47110 [Join Scheduler-4] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47118 [Join Scheduler-3] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47132 [Join Scheduler-6] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47212 [Join Scheduler-7] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47229 [Join Scheduler-9] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null
47244 [Join Scheduler-5] WARN com.fluidops.fedx.evaluation.union.UnionExecutorBase  - Error executing union operator: null

And the query seems to hang and never returns.

Any idea what could cause these two errors?
Interestingly, if I remove one of the endpoints in the federation, the query performs fine.
The triplestore is GraphDB.

How can I get more debug information on what is currently happening inside FedX?

FWIW, here are the SPARQL query and its execution plan:

SELECT DISTINCT  ?this ?thisLabel
WHERE
  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object> .
    ?this (<http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by>/<http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within>)|<http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in> ?Site1 .
    ?Site1  a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site> .
    ?Site1 <http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed>/<http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by> <https://halshs.archives-ouvertes.fr/search/index/q/*/contributorId_i/103825/> .
    OPTIONAL
      { ?this  <http://www.w3.org/2004/02/skos/core#prefLabel>  ?thisLabel}
  }
7248 [main] DEBUG com.fluidops.fedx.FedXConnection  - Optimization start
7277 [main] DEBUG com.fluidops.fedx.optimizer.StatementGroupOptimizer  - Join arguments could be reduced to a single argument, replacing join node.
7277 [main] DEBUG com.fluidops.fedx.FedXConnection  - Optimization duration: 29
Optimized query execution plan: 
QueryRoot
   Distinct
      Projection
         ProjectionElemList
            ProjectionElem "this"
            ProjectionElem "thisLabel"
         LeftJoin
            NJoin
               ExclusiveGroup
                  ExclusiveStatement
                     Var (name=this)
                     Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
                     Var (name=_const_e4a0c91_uri, value=http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object, anonymous)
                     StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
                  ExclusiveStatement
                     Var (name=_anon_d5e73e6a_4e89_4b72_bfa1_450d50e722b8, anonymous)
                     Var (name=_const_35bfec94_uri, value=http://www.cidoc-crm.org/cidoc-crm/P14_carried_out_by, anonymous)
                     Var (name=_const_775f2f14_uri, value=https://halshs.archives-ouvertes.fr/search/index/q/*/contributorId_i/103825/, anonymous)
                     StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
               StatementSourcePattern
                  Var (name=Site1)
                  Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
                  Var (name=_const_9d328667_uri, value=http://www.cidoc-crm.org/cidoc-crm/E27_Site, anonymous)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr, type=REMOTE)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
               StatementSourcePattern
                  Var (name=Site1)
                  Var (name=_const_9c6af60f_uri, value=http://www.cidoc-crm.org/cidoc-crm/P8i_witnessed, anonymous)
                  Var (name=_anon_d5e73e6a_4e89_4b72_bfa1_450d50e722b8, anonymous)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Faerba.univ-tours.fr, type=REMOTE)
                  StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
               NUnion
                  ExclusiveGroup
                     ExclusiveStatement
                        Var (name=this)
                        Var (name=_const_440507c3_uri, value=http://www.ics.forth.gr/isl/CRMsci/O19i_was_object_found_by, anonymous)
                        Var (name=_anon_db9424a9_ce5b_4399_9647_687bf8269037, anonymous)
                        StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
                     ExclusiveStatement
                        Var (name=_anon_db9424a9_ce5b_4399_9647_687bf8269037, anonymous)
                        Var (name=_const_e772e6db_uri, value=http://www.cidoc-crm.org/cidoc-crm/P8_took_place_on_or_within, anonymous)
                        Var (name=Site1)
                        StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
                  ExclusiveStatement
                     Var (name=this)
                     Var (name=_const_2dea2a6c_uri, value=http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in, anonymous)
                     Var (name=Site1)
                     StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)
            ExclusiveStatement
               Var (name=this)
               Var (name=_const_c9f3cb8c_uri, value=http://www.w3.org/2004/02/skos/core#prefLabel, anonymous)
               Var (name=thisLabel)
               StatementSource (id=sparql_localhost:7200_repositories_openarchaeo?default-graph-uri=http%3A%2F%2Farsol.univ-tours.fr, type=REMOTE)

Type BindingSetAssignment not supported for cost estimation.

Hello

I need FedX to work with the VALUES keyword, which it currently does not. See the test query below:

SELECT DISTINCT  ?this
WHERE
  { ?this  a                     <http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object> .
    ?this <http://www.ics.forth.gr/isl/CRMarchaeo/AP21i_is_contained_in> ?Site1 .
    ?Site1  a                     <http://www.cidoc-crm.org/cidoc-crm/E27_Site>
    VALUES ?Site1 { <http://arsol.univ-tours.fr/4DACTION/WFICHEWEB/isiteAA> }
  }
Caused by: com.fluidops.fedx.exception.FedXRuntimeException: Type BindingSetAssignment not supported for cost estimation. If you run into this, please report a bug.
	at com.fluidops.fedx.optimizer.JoinOrderOptimizer.getFreeVars(JoinOrderOptimizer.java:162)
	at com.fluidops.fedx.optimizer.JoinOrderOptimizer.optimizeJoinOrder(JoinOrderOptimizer.java:79)
	at com.fluidops.fedx.optimizer.StatementGroupOptimizer.meetNJoin(StatementGroupOptimizer.java:170)
	at com.fluidops.fedx.optimizer.StatementGroupOptimizer.meetOther(StatementGroupOptimizer.java:78)
	at com.fluidops.fedx.algebra.NJoin.visit(NJoin.java:49)
	at org.eclipse.rdf4j.query.algebra.BinaryTupleOperator.visitChildren(BinaryTupleOperator.java:101)
	at org.eclipse.rdf4j.query.algebra.LeftJoin.visitChildren(LeftJoin.java:87)
	at org.eclipse.rdf4j.query.algebra.helpers.AbstractQueryModelVisitor.meetNode(AbstractQueryModelVisitor.java:661)
	at org.eclipse.rdf4j.query.algebra.helpers.AbstractQueryModelVisitor.meetBinaryTupleOperator(AbstractQueryModelVisitor.java:607)
	at org.eclipse.rdf4j.query.algebra.helpers.AbstractQueryModelVisitor.meet(AbstractQueryModelVisitor.java:367)
	at org.eclipse.rdf4j.query.algebra.LeftJoin.visit(LeftJoin.java:76)

How can this be improved?

Introduce hash join operator as alternative to bound joins

In certain situations, bound joins (or, more generally, any nested loop evaluation) might not be the optimal evaluation algorithm.

Consider the following example:

?city a :City .
?city :inRegion :BW .

There might be a large number of cities in the database, which would cause a large intermediate result set as input to the nested loop join.

If there are factors like latency involved, it might make sense to fetch the result sets of both graph patterns individually, and perform a hash join locally.

This issue is about adding the appropriate operator.

It is yet to be decided how and in which cases to activate it.
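A hash join along these lines would fetch both result sets, index one of them by the join variable, and probe the index with the other. The sketch below shows this in plain Java over rows represented as Map<String, String> (variable name to value); the class LocalHashJoin is hypothetical, and it assumes the two inputs share exactly one join variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory hash join over binding rows (variable name -> value).
class LocalHashJoin {

    // Joins two fully fetched result sets on a single shared variable.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          String joinVar) {
        // Build phase: index the left input by the value of the join variable.
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> row : left) {
            String key = row.get(joinVar);
            if (key == null) {
                continue; // rows without the join variable cannot match
            }
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each right row once and merge matching rows.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : right) {
            String key = row.get(joinVar);
            if (key == null) {
                continue;
            }
            for (Map<String, String> match : table.getOrDefault(key, List.of())) {
                Map<String, String> merged = new HashMap<>(match);
                merged.putAll(row);
                out.add(merged);
            }
        }
        return out;
    }
}
```

For the example above, the left input would hold the ?city a :City bindings and the right input the ?city :inRegion :BW bindings; each input is fetched once, so no per-binding requests are sent to the endpoints, at the cost of holding one result set in memory.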

Support to use fresh RepositoryConnections when interacting with federation members

Currently FedX uses a singleton RepositoryConnection per endpoint when interacting with a federation member.

According to the API, RepositoryConnections are not thread-safe.

We have sporadically observed strange behavior that we cannot fully explain. This might be caused by reusing the connections across threads.

This issue is about adding the option to use fresh connections for each operation with a federation member.
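A minimal sketch of the per-operation approach, assuming a hypothetical MemberConnection interface as a stand-in for a real repository connection: a factory supplies a new connection for every operation, and try-with-resources closes it afterwards, so no connection instance is ever shared between threads. FreshConnectionPolicy and withConnection are likewise hypothetical names.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for a (non-thread-safe) connection to one federation member.
interface MemberConnection extends AutoCloseable {
    String evaluate(String subquery);

    @Override
    void close(); // narrowed: no checked exception
}

// Hypothetical policy: every operation gets its own connection.
class FreshConnectionPolicy {
    private final Supplier<MemberConnection> factory;

    FreshConnectionPolicy(Supplier<MemberConnection> factory) {
        this.factory = factory;
    }

    <T> T withConnection(Function<MemberConnection, T> operation) {
        // try-with-resources closes the connection even if the operation
        // fails, and the instance is never visible to another operation.
        try (MemberConnection conn = factory.get()) {
            return operation.apply(conn);
        }
    }
}
```

The trade-off is the cost of opening a connection per operation; a pool of connections, each used by one thread at a time, would be a middle ground.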

Support proper query timeouts

FedX should support proper query timeouts, i.e. users should be able to define the maximum execution time.

If the maximum execution time is exceeded, the query execution (including the evaluation of subqueries) should be stopped.
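One way to enforce a maximum execution time is to run the evaluation as a Future and cancel it when Future.get times out. The sketch below uses only standard java.util.concurrent classes; QueryTimeout is a hypothetical name, and cancellation only stops work that reacts to thread interruption, so subquery tasks would need to honor interrupts as well.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper enforcing a maximum execution time on a query.
class QueryTimeout {

    // Returns the query result, or cancels the evaluation and throws
    // TimeoutException once maxMillis have elapsed.
    static <T> T runWithTimeout(Callable<T> query, long maxMillis)
            throws TimeoutException, InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(query);
        try {
            return future.get(maxMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker thread running the evaluation.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```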

Remove VariableScopeOptimizer

The VariableScopeOptimizer was used to identify variables that are local to a BGP. The main idea was to use this information when creating subqueries, i.e. to avoid projecting unneeded data from remote sources. Today, transferred data is typically compressed, so the potentially irrelevant data does not add significant cost; moreover, this is an edge case.

It turned out that a correct implementation of this optimizer is very hard to achieve compared to the potential performance gain.

Thus we propose to remove this optimizer.

Support to set bindings on the query using the RDF4J Api

The RDF4J API allows setting bindings for a query via Query#setBinding().

Currently such bindings are ignored during evaluation in FedX.

FedX should support such bindings properly and consider them in the evaluation.

Filter Optimizer does not correctly push filters to BGPs

The FilterOptimizer currently has an issue when pushing filter expressions to statement expressions:

the same filter expression is attached to two statement expressions (and is thus potentially applied twice), yielding invalid results in the join.

Example AST

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "person"
         ProjectionElem "author"
      NJoin
         StatementSourcePattern
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
            StatementSource (id=sparql_localhost:18080_repositories_endpoint1, type=REMOTE)
            StatementSource (id=sparql_localhost:18080_repositories_endpoint2, type=REMOTE)
            FilterExpr
               Compare (=)
                  Str
                     Var (name=person)
                  ValueConstant (value="http://namespace2.org/Person_7")
         ExclusiveStatement
            Var (name=author)
            Var (name=_const_9f24f144_uri, value=http://www.w3.org/2002/07/owl#sameAs, anonymous)
            Var (name=person)
            StatementSource (id=sparql_localhost:18080_repositories_endpoint3, type=REMOTE)
            FilterExpr
               Compare (=)
                  Str
                     Var (name=person)
                  ValueConstant (value="http://namespace2.org/Person_7")

Optimization: push LIMIT clauses as early as possible

Often users write queries like the following when exploring datasets:

SELECT * WHERE { ?s ?p ?o } LIMIT 10

or

SELECT * WHERE { ?x a foaf:Person } LIMIT 10

Currently FedX does not push the limits to the endpoint, thus potentially reading the full database although the user is only interested in a very small subset indicated by the LIMIT.

For simple queries the LIMIT should be pushed to the endpoint to reduce the transfer of data.
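A sketch of the rewrite for the simple case, assuming the query consists of a single graph pattern with no ORDER BY, DISTINCT, or join above it: the LIMIT is appended to the subquery sent to each relevant endpoint, and because several endpoints may each return up to LIMIT rows, the merged result must be cut to LIMIT again. LimitPushdown and its methods are hypothetical names, not FedX's optimizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch of LIMIT pushdown for simple single-pattern queries.
class LimitPushdown {

    // Builds the subquery sent to one endpoint, appending the user's LIMIT if present.
    static String subqueryFor(String graphPattern, OptionalLong limit) {
        String q = "SELECT * WHERE { " + graphPattern + " }";
        return limit.isPresent() ? q + " LIMIT " + limit.getAsLong() : q;
    }

    // Each endpoint may return up to `limit` rows, so the union of their
    // results is truncated to `limit` again.
    static <T> List<T> mergeWithLimit(List<List<T>> perEndpointResults, long limit) {
        List<T> merged = new ArrayList<>();
        for (List<T> rows : perEndpointResults) {
            for (T row : rows) {
                if (merged.size() >= limit) {
                    return merged;
                }
                merged.add(row);
            }
        }
        return merged;
    }
}
```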

Optimize exclusive groups in presence of SERVICE clauses

If a SERVICE expression results in an ExclusiveGroup, there might be other ExclusiveStatements with the same owner. These should be evaluated in the same ExclusiveGroup.

Example query plans

Before

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "person"
         ProjectionElem "name"
      NJoin
         ExclusiveGroup
            ExclusiveStatement
               Var (name=person)
               Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
               Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
               StatementSource (id=endpoint1, type=REMOTE)
            ExclusiveStatement
               Var (name=person)
               Var (name=_const_23b7c3b6_uri, value=http://xmlns.com/foaf/0.1/name, anonymous)
               Var (name=name)
               StatementSource (id=endpoint1, type=REMOTE)
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_d7b490e6_uri, value=http://namespace1.org/Person, anonymous)
            StatementSource (id=endpoint1, type=REMOTE)

After

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "person"
         ProjectionElem "name"
      ExclusiveGroup
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
            StatementSource (id=endpoint1, type=REMOTE)
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_23b7c3b6_uri, value=http://xmlns.com/foaf/0.1/name, anonymous)
            Var (name=name)
            StatementSource (id=endpoint1, type=REMOTE)
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_d7b490e6_uri, value=http://namespace1.org/Person, anonymous)
            StatementSource (id=endpoint1, type=REMOTE)
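The core of such an optimization is to collect the exclusive statements by their single owning source, so that all statements owned by the same endpoint can be evaluated as one ExclusiveGroup. The sketch below shows only this grouping step over a simplified, hypothetical Stmt class; it assumes all statements belong to the same join and ignores the SERVICE-specific handling and scoping rules a real optimizer would have to respect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified statement: a pattern plus the single source that can answer it.
final class Stmt {
    final String pattern;
    final String owner;

    Stmt(String pattern, String owner) {
        this.pattern = pattern;
        this.owner = owner;
    }
}

class ExclusiveGroupMerger {

    // Groups statements by owner, preserving first-seen order. Each list with
    // more than one entry would become one ExclusiveGroup, i.e. a single
    // subquery sent to that owner.
    static Map<String, List<Stmt>> groupByOwner(List<Stmt> exclusiveStatements) {
        Map<String, List<Stmt>> groups = new LinkedHashMap<>();
        for (Stmt s : exclusiveStatements) {
            groups.computeIfAbsent(s.owner, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }
}
```

Applied to the plan above, the three statements owned by endpoint1 end up in one group, matching the single ExclusiveGroup in the "After" plan.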
