cleishm / libcypher-parser

Cypher Parser Library

License: Apache License 2.0

Languages: Shell 0.38%, C 93.54%, Makefile 0.59%, M4 3.87%, Dockerfile 0.08%, XSLT 0.16%, CMake 1.20%, Roff 0.18%

libcypher-parser's Issues

Where is the grammar file

I found that libcypher-parser does not fully match the openCypher 9 syntax.
So where can I find the grammar file that libcypher-parser follows?
Thanks!

A question regarding numbers representation

Hi
Could you please explain the reasoning behind representing a number literal (integer or float) only as a string of non-negative digits, while negative values are expressions that apply a unary minus operator to a non-negative number?

float-string =
      < [0-9]+ '.'? [0-9]* [eE] [-+]? [0-9] sym-part* >
                                       { strbuf_append_block(); }
    | < [0-9]* '.' [0-9] sym-part* >   { strbuf_append_block(); }

integer-string = < [0-9] sym-part* >   { strbuf_append_block(); }

Your answer is appreciated.
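The thread does not record the maintainer's answer. One common reason for this design, offered here as an assumption and not as libcypher-parser's stated rationale: a tokenizer cannot tell from a '-' alone whether it means subtraction (`x-1`) or negation (`-1`), so literals stay unsigned and the expression grammar applies unary minus. The AST in the APOC issue below has exactly this shape for `[-1]` (a `unary operator` over `integer 1`). A minimal C sketch of such a tokenizer; the token kinds and `next_token` are hypothetical and this is not the library's lexer:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for this sketch only. */
enum tok_kind { TOK_INT, TOK_MINUS, TOK_IDENT, TOK_END, TOK_OTHER };

struct token {
    enum tok_kind kind;
    const char *start; /* first character of the token */
    size_t len;        /* number of characters in the token */
};

/* Scan one token at *p and advance *p past it. An integer literal is a
 * run of digits only: a leading '-' is never part of the literal, so
 * "-1" always yields TOK_MINUS followed by TOK_INT. */
static struct token next_token(const char **p)
{
    while (**p == ' ')
        (*p)++;
    struct token t = { TOK_END, *p, 0 };
    const char *s = *p;
    if (*s == '\0')
        return t;
    if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        t.kind = TOK_INT;
    } else if (*s == '-') {
        s++;
        t.kind = TOK_MINUS; /* the parser decides: unary or binary */
    } else if (isalpha((unsigned char)*s)) {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        t.kind = TOK_IDENT;
    } else {
        s++;
        t.kind = TOK_OTHER;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}
```

With this split, `x-1` and `-1` produce the same `TOK_MINUS` token, and only the parser, which knows whether an operand precedes the '-', chooses between subtraction and negation.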

APOC support

Hi @cleishm,

It seems that some APOC procedures are not supported. The following example query throws an error because of the parenthesis after apoc.convert.toBoolean.

Query:

    MATCH (:NodeType)-[rel:relationship]->(:NodeType2)
    UNWIND rel.list_data as data
    WITH data, split(data, " ")[-1] as flag
    WHERE apoc.convert.toBoolean(flag)
    RETURN data

Note here: rel.list_data is of the format ["some_value flag", "some_value flag",...] where the flag string is being converted into a boolean via the APOC procedure.

This gives the output:

 @0    5..188  statement                  body=@1
 @1    5..188  > query                    clauses=[@2, @12, @17, @35]
 @2    5..60   > > MATCH                  pattern=@3
 @3   11..55   > > > pattern              paths=[@4]
 @4   11..55   > > > > pattern path       (@5)-[@7]-(@10)
 @5   11..22   > > > > > node pattern     (:@6)
 @6   12..21   > > > > > > label          :`NodeType`
 @7   22..43   > > > > > rel pattern      -[@8:@9]->
 @8   24..27   > > > > > > identifier     `rel`
 @9   27..40   > > > > > > rel type       :`relationship`
@10   43..55   > > > > > node pattern     (:@11)
@11   44..54   > > > > > > label          :`NodeType2`
@12   60..93   > > UNWIND                 expression=@13, alias=@16
@13   67..81   > > > property             @14.@15
@14   67..70   > > > > identifier         `rel`
@15   71..80   > > > > prop name          `list_data`
@16   84..88   > > > identifier           `data`
@17   93..165  > > WITH                   projections=[@18, @20], WHERE=@29
@18   98..102  > > > projection           expression=@19
@19   98..102  > > > > identifier         `data`
@20  104..137  > > > projection           expression=@21, alias=@28
@21  104..125  > > > > subscript          @22[@26]
@22  104..120  > > > > > apply            @23(@24, @25)
@23  104..109  > > > > > > function name  `split`
@24  110..114  > > > > > > identifier     `data`
@25  116..119  > > > > > > string         " "
@26  121..123  > > > > > unary operator   - @27
@27  122..123  > > > > > > integer        1
@28  128..132  > > > > identifier         `flag`
@29  143..165  > > > property             @30.@33
@30  143..155  > > > > property           @31.@32
@31  143..147  > > > > > identifier       `apoc`
@32  148..155  > > > > > prop name        `convert`
@33  156..165  > > > > prop name          `toBoolean`
@34  165..175  > > error                  >>(flag)\n   <<
@35  176..188  > > RETURN                 projections=[@36]
@36  183..188  > > > projection           expression=@37
@37  183..187  > > > > identifier         `data`

This is a valid use of the APOC procedure, and it works as expected in Neo4j. Is there any current support for APOC procedures, or are there plans to include it in a future release?

Thanks in advance!

UPDATE

This seems similar to #19, which has been closed. What was the outcome of that?

Similarly, putting backticks around apoc.convert.toBoolean to escape the projection removes the parse error, but the query then no longer works in Neo4j.

Build error on GCC 8.2.1 (20180831) due to -Werror

On Arch Linux, the system's current default GCC won't build the code because of a warning that is treated as an error. Personally, I'm against releasing code to end users with -Werror: it makes sense on the author's testing grid, but not on the myriad of end users' configurations, since warnings vary between compilers and compiler versions. Sorry, that's why I'm reporting the occurrence without disclosing the error itself; had it been kept as a warning, I would probably have reported and disclosed it instead.

ASC or DESC in the return clause

Hi,

Is ASC/DESC not implemented yet? I use your parser in my project and have tested some queries.
Here is one of the test queries: MATCH (n:N),(m:M) WHERE n.num = m.num RETURN n.num ASC
and here are the last three lines of the parser output:

@23  47..50  > > > > > prop name     `num`
@24  45..51  > > > > identifier      `n.num`
@25  51..55  > > error               >>ASC\n<<

Will this feature be added in an upcoming release?

-Daizy

Operator precedence

Hi
There might be an issue with operator precedence involving the "IN" operator and the subscript operator.
Given this openCypher TCK scenario, the query
RETURN 3 IN [[1, 2, 3]][0] AS r should return true.

The AST generated by libcypher-parser is as follows:

I believe the "IN" operator should be higher in the tree, while the subscript should be much lower.
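In a precedence-climbing (Pratt) parser, a postfix operator such as subscript is usually given a higher binding power than any binary operator, so `[0]` attaches to its operand before IN is considered and IN ends up at the root. The C sketch below illustrates this under stated simplifications: it is not libcypher-parser's grammar, the binding powers and all names are hypothetical, atoms are plain letters/digits, and the identifier `y` stands in for the list literal `[[1, 2, 3]]`. It prints the tree fully parenthesized so the grouping is visible:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical binding powers: postfix subscript binds tighter than IN. */
enum { BP_IN = 10, BP_SUBSCRIPT = 90 };

struct node {
    char op;       /* 'a' = atom, 'I' = IN, 'S' = subscript */
    char text[16]; /* atom spelling (only used when op == 'a') */
    struct node *lhs, *rhs;
};

static struct node pool[64]; /* fixed node pool; reset npool per parse */
static int npool;
static const char *cur;      /* current input position */

static struct node *new_node(char op, struct node *lhs, struct node *rhs)
{
    assert(npool < 64);
    struct node *n = &pool[npool++];
    n->op = op;
    n->text[0] = '\0';
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

static void skip_ws(void)
{
    while (*cur == ' ')
        cur++;
}

/* An atom is a run of letters/digits, e.g. `x`, `y` or `0`. */
static struct node *parse_atom(void)
{
    struct node *n = new_node('a', NULL, NULL);
    size_t len = 0;
    skip_ws();
    while (isalnum((unsigned char)*cur) && len < sizeof(n->text) - 1)
        n->text[len++] = *cur++;
    n->text[len] = '\0';
    return n;
}

/* Keep extending lhs while the next operator binds at least min_bp. */
static struct node *parse_expr(int min_bp)
{
    struct node *lhs = parse_atom();
    for (;;) {
        skip_ws();
        if (*cur == '[' && BP_SUBSCRIPT >= min_bp) {
            cur++;                              /* consume '[' */
            struct node *index = parse_expr(0);
            skip_ws();
            if (*cur == ']')
                cur++;                          /* consume ']' */
            lhs = new_node('S', lhs, index);
        } else if (strncmp(cur, "IN", 2) == 0 && BP_IN >= min_bp) {
            cur += 2;                           /* consume IN */
            /* BP_IN + 1 makes IN left-associative */
            lhs = new_node('I', lhs, parse_expr(BP_IN + 1));
        } else {
            return lhs;
        }
    }
}

/* Render the tree fully parenthesized. */
static void render(const struct node *n, char *buf, size_t size)
{
    char l[128], r[128];
    if (n->op == 'a') {
        snprintf(buf, size, "%s", n->text);
        return;
    }
    render(n->lhs, l, sizeof l);
    render(n->rhs, r, sizeof r);
    if (n->op == 'S')
        snprintf(buf, size, "(%s[%s])", l, r);
    else
        snprintf(buf, size, "(%s IN %s)", l, r);
}
```

Parsing `x IN y[0]` with this sketch renders as `(x IN (y[0]))`: IN is the root and the subscript sits beneath it, which is the shape the TCK scenario expects.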

Is there a guide for compiling libcypher-parser on 64-bit Windows? Thank you.

FYI: ALE support for (Neo)Vim available

For your information, I made the linter available in ALE, so anyone who uses ALE with Vim or NeoVim and has cypher-lint in their PATH will get instant lint feedback while editing Cypher source files.

Publish New Release

The last tag was in April 2019, and quite a few things have been added since then. Can we get a release published? I'm specifically looking to use the fix for #19.

More specifically, I am referring to getting it published to add-apt-repository ppa:cleishm/neo4j

support `CALL {} (subquery)`

Neo4j has supported CALL {} (subquery) since v4.0. It is an amazing feature.

https://neo4j.com/docs/cypher-manual/4.1/clauses/call-subquery/

Syntax error when combining CALL and WHERE

Procedure calls cannot currently be followed by WHERE conditions:

$ echo "CALL db.labels() YIELD label WHERE label = 'fruit' RETURN label" | ./src/cypher-lint -a
<stdin>:1:31: Invalid input 'H': expected WITH
CALL db.labels() YIELD label WHERE label = 'fruit' RETURN label

Inserting a WITH projection between them lets the combination parse:

$ echo "CALL db.labels() YIELD label WITH label WHERE label = 'fruit' RETURN label" | ./src/cypher-lint -a
 @0   0..75  statement              body=@1
 @1   0..75  > query                clauses=[@2, @6, @12]
 @2   0..29  > > CALL               name=@3, YIELD=[@4]
 @3   5..14  > > > proc name        `db.labels`
 @4  23..29  > > > projection       expression=@5
 @5  23..28  > > > > identifier     `label`
 @6  29..62  > > WITH               projections=[@7], WHERE=@9
 @7  34..40  > > > projection       expression=@8
 @8  34..39  > > > > identifier     `label`
 @9  46..62  > > > binary operator  @10 = @11
@10  46..51  > > > > identifier     `label`
@11  54..61  > > > > string         "fruit"
@12  62..75  > > RETURN             projections=[@13]
@13  69..75  > > > projection       expression=@14
@14  69..74  > > > > identifier     `label`

The CALL...WHERE construction is supported in Neo4j.

Incorrect constraint error

$ cypher-lint -a < scripts/upload_t10_data.cypher
<stdin>:2:19: Invalid input 'u': expected '=' or CREATE CONSTRAINT ON
CREATE CONSTRAINT uniqueT10 IF NOT EXISTS ON (n:T10)
                  ^
<stdin>:5:19: Invalid input 'u': expected '=' or CREATE CONSTRAINT ON
CREATE CONSTRAINT uniqueMonth IF NOT EXISTS ON (m:Month)
                  ^
  @0     2..33    line_comment                // Make sure we have unique nodes
  @1    34..145   error                       >>CREATE CONSTRAINT uniqueT10 IF NOT EXISTS ON (n:T10)\n      ASSERT (n.date, n.name, n.result, n.m50) IS NODE KEY<<
  @2   149..175   line_comment                // Constraint for Month/Year
  @3   176..275   error                       >>CREATE CONSTRAINT uniqueMonth IF NOT EXISTS ON (m:Month)\n      ASSERT (m.month, m.year) IS NODE KEY<<
  @4   280..327   line_comment                // Needs apoc.import.file.enabled=true  in config

According to the Cypher refcard, the following constraint format is valid (and works as expected in Neo4j):

// Make sure we have unique nodes
CREATE CONSTRAINT uniqueT10 IF NOT EXISTS ON (n:T10)
      ASSERT (n.date, n.name, n.result, n.m50) IS NODE KEY;
// Constraint for Month/Year
CREATE CONSTRAINT uniqueMonth IF NOT EXISTS ON (m:Month)
      ASSERT (m.month, m.year) IS NODE KEY;

Accepting Binary Inputs

I'm using RedisGraph and implementing some custom graph algorithms, and was hoping I'd be able to pass a serialized binary blob into my functions, so that I receive my inputs pre-structured and avoid having to parse them.

Something like... CALL algo.abc(binaryBlob)

Of course, this isn't possible because Cypher relies on string parsing. But maybe you have another suggestion for how I could accomplish this?

Linting a composite index CREATE statement fails unexpectedly

Creating a file point.cypher with the contents:

create index on :Point(latitude, longitude);

and running

$ cypher-lint point.cypher

gets the following result:

point.cypher:1:32: Invalid input ',': expected ')'
create index on :Point(latitude, longitude);
                               ^

I expected this statement to lint correctly, per the spec here: https://neo4j.com/docs/developer-manual/current/cypher/schema/index/#create-a-composite-index

Are composite index statements not supported by the linter?

Possible to reconstruct query

Is it possible to take the parser output and reconstruct the query using that output?

In my use case I would like to allow my users to enter a query, but I would like to modify it on their behalf before finally querying the DB. Is this doable using this library, or am I barking up the wrong tree?

Thank you!
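One approach, offered as a sketch and not as a libcypher-parser feature: every node in the AST dumps above carries a byte range in the original query (for example, `@11 54..61 string "fruit"` in the CALL ... WITH output), so a query can be modified by splicing replacement text into the original at a node's range instead of regenerating the whole query from the tree. How you obtain those offsets programmatically (the library has a `struct cypher_input_range` type, as the `cypher_ast_query` issue below shows) is an assumption to check against `cypher-parser.h`; `splice_range` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into out, replacing the bytes in [start, end) with repl.
 * Returns 0 on success, or -1 if the range is invalid or out is too
 * small to hold the result plus its terminating NUL. */
static int splice_range(const char *src, size_t start, size_t end,
        const char *repl, char *out, size_t out_size)
{
    size_t src_len = strlen(src);
    size_t repl_len = strlen(repl);
    if (start > end || end > src_len)
        return -1;
    size_t total = start + repl_len + (src_len - end);
    if (total + 1 > out_size)
        return -1;
    memcpy(out, src, start);                                /* prefix */
    memcpy(out + start, repl, repl_len);                    /* new text */
    memcpy(out + start + repl_len, src + end, src_len - end); /* suffix */
    out[total] = '\0';
    return 0;
}
```

Splicing `'apple'` into the range 54..61 of the CALL ... WITH query replaces only the string literal. When applying several edits, splice from the highest offset down, so that the ranges of earlier nodes remain valid.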

Q: Parser generation

Can I assume that all of the ast_ files are generated from a grammar, and if so, is it the grammar from http://www.opencypher.org/resources?

Namespaced functions (e.g. `duration.between`) are rejected

Functions like duration.between are incorrectly rejected as they are parsed as property accesses instead of function calls.

root@36c23cede604:/tmp# cat t.c
MATCH (a) WHERE duration.between(a, b) < 4 RETURN a;
root@36c23cede604:/tmp# cypher-lint --ast t.c
t.c:1:33: Invalid input '(': expected '.', AND, OR, XOR, NOT, '=', '<>', '+', '-', '*', '/', '%', '^', IN, '=~', CONTAINS, STARTS WITH, ENDS WITH, '<=', '>=', '<', '>', IS NULL, IS NOT NULL, '[', '{', a label, ';' or a clause
MATCH (a) WHERE duration.between(a, b) < 4 RETURN a;
                                ^
t.c:
 @0   0..52  statement               body=@1
 @1   0..52  > query                 clauses=[@2, @11]
 @2   0..32  > > MATCH               pattern=@3, where=@7
 @3   6..9   > > > pattern           paths=[@4]
 @4   6..9   > > > > pattern path    (@5)
 @5   6..9   > > > > > node pattern  (@6)
 @6   7..8   > > > > > > identifier  `a`
 @7  16..32  > > > property          @8.@9
 @8  16..24  > > > > identifier      `duration`
 @9  25..32  > > > > prop name       `between`
@10  32..42  > > error               >>(a, b) < 4<<
@11  43..51  > > RETURN              projections=[@12]
@12  50..51  > > > projection        expression=@13
@13  50..51  > > > > identifier      `a`

The expected behavior would be to parse this as a function application with the function name as duration.between.
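The behavior requested here (and in the APOC and `abc.fn` issues) amounts to one character of lookahead: after scanning a dotted name, a following '(' makes the whole name a function name; otherwise the dots are property accesses. A minimal C sketch of that classification; it is not libcypher-parser's grammar, all names are hypothetical, and a real grammar may also need to allow whitespace before '(' and backtick-quoted name segments, both omitted here:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification used only in this sketch. */
enum primary_kind { PRIMARY_NONE, PRIMARY_IDENT, PRIMARY_PROPERTY, PRIMARY_APPLY };

/* Length of the identifier starting at s (0 if there is none). */
static size_t scan_ident(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]) && s[0] != '_')
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Length of a dotted name ident ('.' ident)* starting at s. */
static size_t scan_dotted_name(const char *s)
{
    size_t n = scan_ident(s);
    if (n == 0)
        return 0;
    while (s[n] == '.') {
        size_t m = scan_ident(s + n + 1);
        if (m == 0)
            break; /* a trailing '.' is not part of the name */
        n += 1 + m;
    }
    return n;
}

/* Look one character past the dotted name: '(' means the whole name is
 * a function name (duration.between, apoc.convert.toBoolean); otherwise
 * a name containing dots is a chain of property accesses. */
static enum primary_kind classify_primary(const char *s, size_t *name_len)
{
    size_t n = scan_dotted_name(s);
    *name_len = n;
    if (n == 0)
        return PRIMARY_NONE;
    if (s[n] == '(')
        return PRIMARY_APPLY;
    return memchr(s, '.', n) != NULL ? PRIMARY_PROPERTY : PRIMARY_IDENT;
}
```

With this rule, `duration.between(a, b)` is an application of the 16-character name `duration.between`, while `rel.list_data as data` remains a property access.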

cmake build issue when running `tests`

libcypher-parser/lib/test/check_libcypher-parser_suite.c:1:10: fatal error: check.h: No such file or directory
    1 | #include <check.h>
      |          ^~~~~~~~~

https://stackoverflow.com/questions/63697460/installed-check-for-c-but-check-h-not-found

I saw this post. Do I need to pass any CMake arguments to get this working? I installed check with sudo apt install check.

I have this in CMakeLists.txt:

SET(CMAKE_C_FLAGS "-lpthread -lX11 -ldrm -lcheck -lm  -lrt -lsubunit")

Thank you in advance!

parse error on extraneous parentheses

For the query

MATCH p=((anna)-[:FriendOf*]->(bob))
RETURN p

cypher-lint gives:

<stdin>:1:10: Invalid input '(': expected an identifier, a label, '{', a parameter or ')'
MATCH p=((anna)-[:FriendOf*]->(bob))
         ^

but Neo4j seems to accept it. I'm running the latest release, not master, so apologies if this has already been fixed!

#26 addresses this, I think?

Support for REGEX operator?

I've posted a question in Stackoverflow, but just cross-posting here since I could not use the libcypher-parser tag.

https://stackoverflow.com/questions/54573401/does-libcypher-parser-has-support-for-regex-operator

A more succinct example:

echo "RETURN word =~ '.*'" | cypher-lint -a
<stdin>:1:14: Invalid input '~': expected NOT, '+', '-', TRUE, FALSE, NULL, "...string...", a float, an integer, '[', a parameter, '{', CASE, FILTER, EXTRACT, REDUCE, ALL, ANY, NONE, SINGLE, shortestPath, allShortestPaths, '(', a function name or an identifier
RETURN word =~ '.*'
             ^
@0   0..20  statement           body=@1
@1   0..20  > query             clauses=[@2]
@2   0..12  > > RETURN          projections=[@3]
@3   7..12  > > > projection    expression=@4
@4   7..11  > > > > identifier  `word`
@5  12..20  > > error           >>=~ '.*'\n<<

I can see references to CYPHER_OP_REGEX in the code, but I cannot find why it isn't parsing as expected.
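The error points at column 14, which is the '~', and the expected-token list is what may start an expression, which suggests that '=' was already consumed as the equality operator. A common cause of that symptom in operator matching (an assumption here, not a diagnosis of libcypher-parser's grammar) is testing the one-character operator before the two-character one. A minimal longest-match-first sketch in C; the operator table and `match_operator` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator table, longest spellings first, so that "=~" is
 * tried before "=" and "<=" before "<" (maximal munch). */
static const char *const operators[] = {
    "=~", "<>", "<=", ">=",
    "=", "<", ">", "+", "-", "*", "/", "%", "^",
};

/* Return the operator spelled at the start of s, or NULL if none. */
static const char *match_operator(const char *s)
{
    for (size_t i = 0; i < sizeof operators / sizeof operators[0]; i++) {
        size_t len = strlen(operators[i]);
        if (strncmp(s, operators[i], len) == 0)
            return operators[i];
    }
    return NULL;
}
```

If "=" were listed before "=~", the input `=~ '.*'` would match "=" and leave '~' as unexpected input, reproducing the error above.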

Question on setting parameters

What is the recommended way to set parameters in libcypher-parser?

":param x => [1,2];" results in:

@0   0..17  command   name=@1, args=[@2, @3, @4]
@1   1..6   > string  "param"
@2   7..8   > string  "x"
@3   9..11  > string  "=>"
@4  12..17  > string  "[1,2]"

Looking at the client-arg-string definition in the source, it seems that arguments are always interpreted as simple strings. Would it be difficult to parse parameter values as atoms instead, or does that pose issues I haven't considered?
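One way to approach this, sketched here as an assumption rather than how libcypher-parser handles client commands: keep the command syntax as-is, but split off everything after `=>` as raw text so it can be handed to an expression parser instead of being stored as a plain string argument. The sketch covers only the splitting step, for the `:param NAME => VALUE` shape in the example above; `split_param_command` is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split ":param NAME => VALUE;" into NAME and the raw VALUE text.
 * Returns 0 on success, or -1 if the line has a different shape or a
 * buffer is too small. */
static int split_param_command(const char *line, char *name, size_t name_size,
        char *value, size_t value_size)
{
    const char *p = line;
    if (strncmp(p, ":param", 6) != 0 || !isspace((unsigned char)p[6]))
        return -1;
    p += 6;
    while (isspace((unsigned char)*p))
        p++;
    /* NAME: letters, digits and underscores */
    size_t n = 0;
    while (isalnum((unsigned char)p[n]) || p[n] == '_')
        n++;
    if (n == 0 || n >= name_size)
        return -1;
    memcpy(name, p, n);
    name[n] = '\0';
    p += n;
    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "=>", 2) != 0)
        return -1;
    p += 2;
    while (isspace((unsigned char)*p))
        p++;
    /* VALUE: the rest of the line, minus trailing whitespace and ';' */
    size_t v = strlen(p);
    while (v > 0 && (isspace((unsigned char)p[v - 1]) || p[v - 1] == ';'))
        v--;
    if (v == 0 || v >= value_size)
        return -1;
    memcpy(value, p, v);
    value[v] = '\0';
    return 0;
}
```

For `:param x => [1,2];`, this yields the name `x` and the value text `[1,2]`, which could then be parsed as an expression, for example by parsing it as the projection of a `RETURN` statement. That last step is only an idea and is not shown here.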

Violation of safety checks on children in `cypher_ast_query` constructor

Hi @cleishm,

Thanks very much for the annotation API! I've started migrating our anonymous identifiers to leverage it, and it's working very smoothly so far.

Now that RedisGraph is synced with master here, we're running afoul of a safety check you introduced in 8c76cc5. Specifically, the cypher_ast_query children check:

libcypher-parser/lib/src/ast_query.c, lines 51 to 52 in 776d96e:

    REQUIRE_CHILD_ALL(children, nchildren, clauses, nclauses,
            CYPHER_AST_QUERY_CLAUSE, NULL);

To differentiate between scopes in multi-part queries, we invoke this constructor to make sub-ASTs that only represent a slice of clauses (punctuated by WITH and RETURN clauses). With appropriate shame, I admit that this approach involves a bit of dodgy casting:

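	/* Collect pointers to a contiguous slice of the master query's clauses. */
	cypher_astnode_t *clauses[n];
	for(unsigned int i = 0; i < n; i++) {
		clauses[i] = (cypher_astnode_t *)cypher_ast_query_get_clause(master_ast->root, i + start_offset);
	}
	/* Build a query node over the slice: no options, no children, empty range. */
	struct cypher_input_range range = {0};
	ast->root = cypher_ast_query(NULL, 0, (cypher_astnode_t *const *)clauses, n, NULL, 0, range);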

This now produces the error:
ast_query.c:52: cypher_ast_query: Assertion 'nchildren >= nclauses' failed.
We can avoid this by introducing the same variables as children, but this causes a lot of unnecessary allocations.

Do you have any advice for how we could construct ephemeral sub-ASTs to represent a slice of clauses in a more orthodox way?

Thank you!

Neo4j 4 database management commands unimplemented

Neo4j 4 offers new syntax for database management that I suppose libcypher-parser doesn’t know about yet.

Reproduce

Running libcypher-parser 0.6.2 via @majensen’s fork majensen/libneo4j-client@7bd6dfd:

neo4j> :status
Connected to 'neo4j://neo4j@localhost:7687' (insecure) [Neo4j/4.2.1]
neo4j> :dbname system
db set
neo4j> SHOW DEFAULT DATABASE;

Expected

"name","address","role","requestedStatus","currentStatus","error"
"neo4j","localhost:7687","standalone","online","online",""

Actual

<interactive>:1:2: error: Invalid input 'H': expected 't/T' or 'e/E'
SHOW DEFAULT DATABASE
 ▲

libcypher-parser allows clause sequences prohibited by the grammar

Cypher has some restrictions in its grammar file regarding clause sequences. For example, in a single-part query, a reading clause cannot follow a writing clause, and no clause can follow RETURN.

libcypher-parser, however, accepts these constructions and builds ASTs for queries such as:

CREATE (a) MATCH (a)
RETURN 1 DELETE a
RETURN 1 RETURN 1

Shouldn't these constructions trigger parser errors?
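If the grammar stays permissive, a semantic check over the parsed clause list is one way to flag these sequences. The C sketch below paraphrases the openCypher rules as: each part of a query is reading clauses, then updating clauses, then WITH; the final part ends either with RETURN or with at least one updating clause; and nothing may follow RETURN. The clause categories and `clause_sequence_ok` are hypothetical, not libcypher-parser's AST types, and UNION and CALL subqueries are ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause categories for this sketch. */
enum clause_kind { CLAUSE_READING, CLAUSE_UPDATING, CLAUSE_WITH, CLAUSE_RETURN };

/* Returns true if the clause sequence has the openCypher shape
 *   (READING* UPDATING* WITH)* (READING* RETURN | READING* UPDATING+ RETURN?)
 */
static bool clause_sequence_ok(const enum clause_kind *c, size_t n)
{
    bool seen_update = false; /* an updating clause occurred in this part */
    for (size_t i = 0; i < n; i++) {
        switch (c[i]) {
        case CLAUSE_READING:
            if (seen_update)
                return false; /* read after write within one part */
            break;
        case CLAUSE_UPDATING:
            seen_update = true;
            break;
        case CLAUSE_WITH:
            seen_update = false; /* WITH starts a new part */
            break;
        case CLAUSE_RETURN:
            return i == n - 1; /* nothing may follow RETURN */
        }
    }
    /* Without RETURN, the final part must contain an updating clause. */
    return seen_update;
}
```

Under this check, all three examples above are rejected, while `MATCH (a) RETURN a` and `CREATE (a) WITH a MATCH (b) RETURN b` pass.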

unable to parse `apply` operator with `.` (membership operator) in funcName

This works:

MATCH (a:Node) RETURN abc("args1") AS a;

output:

 @0   0..40  statement                body=@1
 @1   0..40  > query                  clauses=[@2, @8]
 @2   0..15  > > MATCH                pattern=@3
 @3   6..14  > > > pattern            paths=[@4]
 @4   6..14  > > > > pattern path     (@5)
 @5   6..14  > > > > > node pattern   (@6:@7)
 @6   7..8   > > > > > > identifier   `a`
 @7   8..13  > > > > > > label        :`Node`
 @8  15..39  > > RETURN               projections=[@9]
 @9  22..39  > > > projection         expression=@10, alias=@13
@10  22..34  > > > > apply            @11(@12)
@11  22..25  > > > > > function name  `abc`
@12  26..33  > > > > > string         "args1"
@13  38..39  > > > > identifier       `a`

But this doesn't:

MATCH (a:Node) RETURN abc.fn("args1") AS a;

output:

<stdin>:1:29: Invalid input '(': expected '.', AND, OR, XOR, NOT, '=~', '=', '<>', '+', '-', '*', '/', '%', '^', IN, CONTAINS, STARTS WITH, ENDS WITH, '<=', '>=', '<', '>', IS NULL, IS NOT NULL, '[', '{', a label, AS, ',', ORDER BY, SKIP, LIMIT, ';' or a clause
MATCH (a:Node) RETURN abc.fn("args1") AS a;
                            ^
 @0   0..43  statement               body=@1
 @1   0..43  > query                 clauses=[@2, @8]
 @2   0..15  > > MATCH               pattern=@3
 @3   6..14  > > > pattern           paths=[@4]
 @4   6..14  > > > > pattern path    (@5)
 @5   6..14  > > > > > node pattern  (@6:@7)
 @6   7..8   > > > > > > identifier  `a`
 @7   8..13  > > > > > > label       :`Node`
 @8  15..28  > > RETURN              projections=[@9]
 @9  22..28  > > > projection        expression=@10, alias=@13
@10  22..28  > > > > property        @11.@12
@11  22..25  > > > > > identifier    `abc`
@12  26..28  > > > > > prop name     `fn`
@13  22..28  > > > > identifier      `abc.fn`
@14  28..42  > > error               >>("args1") AS a<<

I think apply should take priority over property access.
