Giter Site home page Giter Site logo

ftsrg / ingraph Goto Github PK

View Code? Open in Web Editor NEW
47.0 47.0 10.0 37.17 MB

Incremental view maintenance for openCypher graph queries.

Home Page: http://docs.inf.mit.bme.hu/ingraph/

License: Eclipse Public License 1.0

Java 4.32% TeX 20.21% Scala 44.62% Python 0.18% Shell 0.69% HTML 0.24% Makefile 0.06% CSS 0.75% Gherkin 27.69% Perl 0.90% SQLPL 0.33%
cypher cypher-query-language graph graph-processing graphs incremental ingraph opencypher pattern-matching query query-engine research

ingraph's People

Contributors

alippai avatar dszakallas avatar gkate90 avatar hegyibalint avatar imbur avatar jmakai avatar jmarton avatar makaijozsef avatar marci543 avatar nyirit avatar szarnyasg avatar wafle avatar zsambek avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ingraph's Issues

Use different VariableFactories for query parts where applicable

For certain query parts, using the same name does not mean they are the same variables. So separate VariableFactory instances need to be used for:

  • each singleQuery (i.e. those that are unioned together) (see also: #9)
  • subqueries (i.e. those delimited by WITH or RETURN)

Note: for the subqueries, the variables present in the WITH clause are chained forward and available in the subsequent subqueries

Why do we need bags?

Let's gather some use cases for multisets/bags.

SQL keeps duplicates, because:

  • Duplicates are important for aggregate queries.
  • In general, bags can be more “efficient” than sets.

(source)

Type check for comparison expressions

Not all expression-pairs can be operand of comparison operators. Some type check needs to be done when compiling the cypher query to te relational algebra expressions.

These are legal:

  • nodeVar1 = nodeVar2
  • nodeVar1 != nodeVar2
  • nodeVar1.numericAttribute < 5

However, these are not, and a compile-time error should be omitted:

  • nodeVar1 < 5
  • nodeVar1 < nodeVar2

In general, these cases should emit a warning (list can be continued):

vertex and edge variables:

  • can't be tested for less than, greater than etc. with each other
  • can't be compared to arithmetic expressions and string expressions
  • can't be compared to attribute variables

Support OPTIONAL MATCH as the first MATCH clause of a query

We should support single queries that has an OPTIONAL MATCH clause as their first MATCH clause.

E.g. the following should return a singlle null-row, provided, there is no vertex having the label NON_EXISTENT_LABEL.

OPTIONAL MATCH (n:NON_EXISTENT_LABEL)
RETURN n

Note: this implies that all MATCH clauses in this single query is OPTIONAL MATCH.

Handle Union in opencypher2relalg compilation

  • add Union as BetaOperator to EMF model

  • handle Union in cyp AST -> relalg compilation

  • use different VariableFactories fo each singleQuery (i.e. those that are unioned together)

  • make distinction between UNION ALL/UNION

    match (n:A)
    return n
    union
    match (n:D)
    return n
    
  • add requirement: variable aliases need to be matched (see also: #16)
    otherwise exception is thrown in Neo4j, see:

    CREATE (n1:A{name:"World"}) create (n2:B{whatever:"Universe"});
    MATCH (a:A) RETURN a AS a
    UNION
    MATCH (b:B) RETURN b AS a;
    

    Neo4j: All sub queries in an UNION must have the same column names

Handle Edge label collision in Cypher2RelalgUtil.ensureLabel

See also: #10

  • ensureLabel should indicate (possibly in EMF model) when a labelset is already set for a variable, and the new is incompatible.
  • This has similar effect as a filter(false) condition. query optimizer can replace these branches with an empty data source of specified schema (list of variables)

"Don't care" vs "anonymous" variables

Check this query:

MATCH ()
RETURN *

In this case, our parser generated a variable and marks it as "don't care". I don't think this is an appropriate name - we should call these variables "anonymous" variables.

Improve tech report

  • Fix broken references
  • Add header
  • Improve description of relational algebraic operators
    • Unwind
    • Aggregation operators, e.g. collect()
    • Bag semantics
  • Add new query sets
    • Train Benchmark queries
    • Social Network Benchmark queries
    • Movie database queries
    • Static analysis - JS
    • Static analysis - Java (jQAssistant)

Allow many labels for VertexVariable and EdgeVariable

See also: #11

  • improve EMF relalg model for VertexVariable
  • improve EMF relalg model for EdgeVariable
  • handle in compilation (Cypher2RelalgUtil.ensureLabel)
    • for VertexVariable
    • for EdgeVariable

Effective semantics of the labelset constraints:

  • multiple labels of a VertexVariable are and-ed together (in the relalg model, this is the semantics of VertexLabelSet)
  • multiple occurences of a VertexVariable with possible different label sets enforce a contraint that a matching variable needs to match the union of the label sets.
  • multiple labels of an EdgeVariable are or-ed together (in the relalg model, this is the semantics of EdgeLabelSet)
  • multiple occurences of an EdgeVariable with possible different label sets enforce a constraint that a matching variable needs to match the intersection of the label sets.
  • for EdgeVariables, a clear distinction is to be made between the cases when there are no labelset constraints given for an edge variable and when their intersection is the empty set (this is handled in LabelSetStatus)

"Operator" is ambiguous

We currently use Operator multiple times in the relalg model:

  • Operator, UnaryOperator, etc. are relational algebraic operators
  • ArithmeticComparisonOperator, UnaryArithmeticOperator, etc. are enums

Sometimes this makes "programming by Ctrl+Space" a bit more difficult :-)

Idea: add a Type suffix to enums, e.g. ArithmeticComparisonOperatorType, UnaryArithmeticOperatorType, ...

"mutual" vs "common" attributes

Currently, we use the terminology "mutual attributes" in the code (LaTeX/Xcore) and "common attributes" in the submitted paper. We should unify this.

Add support for lists

This a tough one, so let's break it down to simpler ones.

First, we should aim at simple queries:

MATCH (n)
RETURN COLLECT(n) AS nodes
WITH [1,2,3] AS x
UNWIND x AS y
RETURN y
WITH [1,2,3] AS x
UNWIND x AS y
RETURN COLLECT(y) AS ys

Handle transitive closure operations

We should consider how to handle minHops and maxHops in Cypher queries: https://neo4j.com/docs/developer-manual/current/cypher/#_variable_length_relationships

-[:TYPE*minHops..maxHops]->

The default values are:

  • minHops: 1
  • maxHops: infinity

Possible cases (projections [π operator] omitted):

  • If maxHops is limited: we should unfold the expression to a union of joins.
    • If minHops is 0 -> e:E*0..2 can be defined as r ∪ r ⨝ (E ∪ (E ⨝ E))
    • If minHops is larger than 0 -> e:E*1..3 can be defined as r ⨝ (E ∪ (E ⨝ E) ∪ (E ⨝ E ⨝ E))
  • If maxHops is infinity (default):
    • If minHops is 0 -> e:E*0.. can be defined as r ∪ r ⨝ E+
    • If minHops is 1 (default), we should simply use transitive closure -> e:E*1.. can be defined as r ⨝ E+
    • If minHops is larger than 1 and maxHops is infinity, we should unfold exactly minHops hops -> e:E*2.. can be defined as r ⨝ E ⨝ E+

compiler: Standard Cypher constructions

(Related ticket: #51)

Use this issue to keep track of supported Cypher constructs.

Clauses

  • Pattern matching constructs
    • node pattern
    • relationship pattern of a single hop
    • relationship pattern of multiple hops
    • direction of the relationship pattern
    • attribute constraints in node and relationship patterns
  • Data manipulation
    • CREATE
    • MERGE
    • SET
    • DELETE
    • DETACH DELETE
    • REMOVE
  • Procedure calls
    • CALL ... YIELD (as this call may run an arbitrary Java code, there is no way we can handle these incrementally)

Sub-clauses

We can safely ignore these for now.

  • ON CREATE
  • ON MATCH

Operators

  • String operations
    • STARTS WITH
    • CONTAINS
    • ENDS WITH
  • Query constructs
    • MATCH
      • Sub-clause:
        • WHERE
      • MATCH, without OPTIONAL
      • OPTIONAL MATCH
        • as non-first MATCH clause of a single query
        • as first MATCH clause of a single query, see #40
    • RETURN
      • RETURN *
      • expression on the return list
      • alias handling
      • RETURN DISTINCT
      • Sub-clauses:
        • ORDER BY (handles variables and expressions as well as name resolution for aliased RETURN items)
        • SKIP (SKIP and LIMIT only implemented for constants)
        • LIMIT
    • UNWIND
    • WITH
      • Sub-clause:
    • UNION
  • Arithmetic comparison
    • =
    • <> (also !=, though non-standard)
    • <
    • >
    • <=
    • >=
  • Arithmetic operators
    • +
    • -
    • /
    • *
    • ^
    • %
  • Logical operators
    • NOT
    • AND
    • OR
    • XOR
  • NULLs
    • <variable> IS [NOT] NULL
    • <expression> IS [NOT] NULL
  • Accessors
    • . (property access)
    • [] (subscript) (Note: we don't support p['foo'], which is a syntax sugar for p.foo)

Expressions

We can safely ignore these for now:

Functions

Types

  • primitives
  • list
  • map
  • node
  • relationship
  • path

Handle expression aliases for RETURN and WITH clauses

  • update EMF model to accomodate expression aliases
  • WONTFIX: in EMF model, change expression aliases from String to Variable. INSTEAD that, ExpressionVariable was introdiced to the relalg EMF model to accommodate maned expressions.
  • handle name collision between model variables and alias variables (see explanation below in 2nd comment)
  • handle in union (see #9) either in cypher2relalg or at a later stage
  • handle WITH clause
  • handle WITH ... WHERE subclause (moved to #124)
  • implement variable chaining

Enable regression testing by introducing two test sets

There are a few hundred tests that we have imported from several sources. Our project, however, is a work-in-progress and succeeds only for a subset of the tests. We would like to mark those tests that are known to work at some time point (test set 1). We require these tests to pass for future commits on Travis in order to recognize if something breaks.

To achieve this, we need to

  • mark tests that are required to pass (test set 1)
  • run test set 1 on Travis
  • create a convenient way to upgrade test set 1 from all tests with the newly covered tests

Note: test set 1 is intended to expand only, i.e. to form a set-series where each element is a superset of any previous elements.

engine: Incremental evaluation of standard Cypher constructions

(Related ticket: #35)

Use this issue to keep track of supported Cypher constructs.

Clauses

  • Query constructs
    • MATCH
      • Transitive closure
    • RETURN (Production node)
    • UNWIND
    • OPTIONAL MATCH (left outer join)
    • WITH this is translated to joins in the query plan
    • UNION
  • Data manipulation (moved to #170)
    • CREATE
    • MERGE
    • SET
    • DELETE
    • DETACH DELETE
    • REMOVE
  • Procedure calls
    • CALL ... YIELD (as this call may run an arbitrary Java code, there is no way we can handle these incrementally)

Sub-clauses

  • Conditions
  • Ordering
    • ORDER BY
    • SKIP
    • LIMIT
  • Data manipulation (moved to #170)
    • ON CREATE
    • ON MATCH

Operators

  • DISTINCT
  • String operations
    • STARTS WITH
    • CONTAINS
    • ENDS WITH
  • Arithmetic comparison #43
    • =
    • <>
    • <
    • >
    • <=
    • >=
  • Arithmetic operators
    • +
    • -
    • /
    • *
    • ^
    • %
  • Logical operators https://github.com/FTSRG/ingraph/tree/master/queries/ldbc-snb
    • NOT
    • AND
    • OR
    • XOR
  • NULLs
    • IS NULL
    • IS NOT NULL
  • Accessors
    • . (property access) not required as the query plan will be translated to tuples
    • [] (subscript)

Expressions

Functions

Types

  • primitives
  • list
  • map
  • node
  • relationship
  • path

Implement SortAndTopNode operator for "ORDER BY ... SKIP/LIMIT ..." expressions

Implement a SKIP...LIMIT operator, which works on ordered expressions: https://neo4j.com/docs/developer-manual/current/cypher/#_skip_and_limit

We only support SKIP and LIMIT if they are preceded by an ORDER BY expression:
https://neo4j.com/docs/developer-manual/3.0/cypher/#query-order

A sub-clause following RETURN or WITH, specifying that the output should be sorted in particular way.
The user of the library should specify a list of variable-direction pairs (e.g. person-ASC, city-DESC) for the ordering.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.