jakewins / gqlite



Rust 99.23% Python 0.77%

graphdb graph database

gqlite's Introduction

Structure

gqlite exposes three main API surfaces:

the gqlite Rust library, at [src/lib.rs]
the g program, at [src/main.rs]
the libgqlite C bindings to the Rust library, at [gqlite-capi/src/lib.rs]

The g program and the libgqlite C bindings are both wrappers around the gqlite Rust library.

Internal structure

gqlite is organized into a "frontend" and a "backend". The frontend contains the parser and planner. Backends contain storage and provide executable implementations of the logical operators emitted by the frontend.
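To make the frontend/backend split concrete, here is a minimal sketch. It assumes a heavily simplified LogicalPlan with only the Argument, NodeScan and Return operators that appear in the plan g prints in the Run section. The real plan refers to labels and property keys by token ids rather than strings. Backend, InMemoryBackend and plan_match_return are hypothetical names for illustration, not gqlite's actual API.

```rust
use std::collections::HashMap;

// Hypothetical, simplified operators (gqlite's real LogicalPlan has more
// variants and uses token ids instead of strings for labels and keys).
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Argument,
    // Scan all nodes carrying `label` and bind each one to `slot`.
    NodeScan { src: Box<LogicalPlan>, slot: usize, label: Option<String> },
    // Project property `key` of the node in `slot` for every row.
    Return { src: Box<LogicalPlan>, slot: usize, key: String },
}

// Frontend side: the only thing it hands to a backend is a LogicalPlan.
// Roughly `MATCH (n:<label>) RETURN n.<key>`.
pub fn plan_match_return(label: &str, key: &str) -> LogicalPlan {
    LogicalPlan::Return {
        src: Box::new(LogicalPlan::NodeScan {
            src: Box::new(LogicalPlan::Argument),
            slot: 0,
            label: Some(label.to_string()),
        }),
        slot: 0,
        key: key.to_string(),
    }
}

// Backend side: owns storage and knows how to execute each operator.
pub trait Backend {
    fn execute(&self, plan: &LogicalPlan) -> Result<Vec<String>, String>;
}

pub struct Node {
    pub labels: Vec<String>,
    pub props: HashMap<String, String>,
}

// A toy in-memory backend standing in for something like the gram backend.
pub struct InMemoryBackend {
    pub nodes: Vec<Node>,
}

impl InMemoryBackend {
    // Returns the indexes of the nodes produced by a NodeScan subtree.
    fn rows(&self, plan: &LogicalPlan) -> Result<Vec<usize>, String> {
        match plan {
            LogicalPlan::NodeScan { label, .. } => Ok((0..self.nodes.len())
                .filter(|&i| label.as_ref().map_or(true, |l| self.nodes[i].labels.contains(l)))
                .collect()),
            other => Err(format!("unsupported operator: {:?}", other)),
        }
    }
}

impl Backend for InMemoryBackend {
    fn execute(&self, plan: &LogicalPlan) -> Result<Vec<String>, String> {
        match plan {
            LogicalPlan::Return { src, key, .. } => Ok(self
                .rows(src)?
                .into_iter()
                .filter_map(|i| self.nodes[i].props.get(key).cloned())
                .collect()),
            other => Err(format!("unsupported operator: {:?}", other)),
        }
    }
}
```

Because the backend only ever sees a LogicalPlan, a different storage engine can be swapped in by implementing the same trait. Operators a backend doesn't support surface as errors rather than silently wrong results.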

Getting Started

To build everything, ensure that you have Cargo and Rust installed.

Build

cargo build

Run

The repo comes with a small graph in gram file format, representing the characters in Les Miserables. To run a "hello world" example, let's apply a simple Cypher query that pulls out the character names.

$ ./target/debug/g -f miserables.gram 'MATCH (n:Person) RETURN n.name'
built pg: PatternGraph { e: {8: PatternNode { identifier: 8, labels: [0], props: [], solved: false }}, e_order: [8], v: [] }
plan: Return { src: NodeScan { src: Argument, slot: 0, labels: Some(0) }, projections: [Projection { expr: Prop(Slot(0), [2]), alias: 9, dst: 1 }] }
----
9
----
"Napoleon"
"Myriel"
"Mlle.Baptistine"
"Mme.Magloire"
"CountessdeLo"
"Geborand"

Test

$ cargo test
(...)
test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Limitations

This code is currently under development and only supports a small subset of the Cypher language. Trying certain Cypher queries may result in errors such as "The gram backend does not support this expression type yet", or in syntax errors.

The subset of Cypher that is currently supported is best described by the grammar found in src/backend, and should expand over time.

License

This is not (yet) available under an open source license; the source is simply available for reading.

gqlite's People


knutwalker moxious s1ck akollegger

gqlite's Issues

Add SET support to planner

See how we add CREATE support here: https://github.com/jakewins/gqlite/pull/4/files#diff-e6ecf87708389c5c6e42b32a67b1b2c1R516

You can check whether an identifier is "bound", which lets you tell that this query is invalid (because n doesn't point to anything): SET n.name = "bob", while this one is fine: MATCH (n:Person) SET n.name = "Bob". See the description here.
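The bound-identifier check can be sketched as follows. This assumes a hypothetical PlanningContext whose tokenize and is_bound methods echo the pc.tokenize(..) and pc.is_bound(..) calls mentioned in the DELETE issue. The real planner's types and signatures will differ, and bind and plan_set are illustrative names only.

```rust
use std::collections::HashMap;

// Hypothetical planning context: maps identifier names to tokens and
// remembers which tokens a MATCH or CREATE clause has bound to a slot.
#[derive(Default)]
pub struct PlanningContext {
    tokens: HashMap<String, usize>,
    slots: HashMap<usize, usize>, // token -> slot
}

impl PlanningContext {
    // Same name always yields the same token.
    pub fn tokenize(&mut self, name: &str) -> usize {
        let next = self.tokens.len();
        *self.tokens.entry(name.to_string()).or_insert(next)
    }

    // Called by MATCH/CREATE planning: assigns a slot to the token.
    pub fn bind(&mut self, tok: usize) -> usize {
        let next = self.slots.len();
        *self.slots.entry(tok).or_insert(next)
    }

    pub fn is_bound(&self, tok: usize) -> bool {
        self.slots.contains_key(&tok)
    }
}

// Hypothetical planner step for `SET <identifier>.<key> = <value>`:
// rejects unbound identifiers, otherwise returns the slot to write to.
pub fn plan_set(pc: &mut PlanningContext, identifier: &str) -> Result<usize, String> {
    let tok = pc.tokenize(identifier);
    if !pc.is_bound(tok) {
        return Err(format!("SET refers to unbound identifier `{}`", identifier));
    }
    Ok(pc.slots[&tok])
}
```

With this check, SET n.name = "bob" on its own fails at planning time. After MATCH (n:Person) has bound n, the same SET resolves to n's slot.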

Use OpenCypher Technology Compatibility Kit to test gqlite cypher implementation

https://www.opencypher.org/resources
"OpenCypher Technology Compatibility Kit is a Cucumber-based set of tests that can be used for any Cypher implementation in any of the many languages supported by Cucumber. "

We could test in Rust using cucumber-rust,

or we could use the Node.js wrapper to test in Node using cucumber-js, which looks much more mature and better supported.

Python/pip packaging

Make gqlite usable in Python by creating a Python package that accesses the gqlite DLL.

Publish to pip

Gram file format question

Notice the output syntax in the text of #36.

Note that the "x" param is in quotes when it needn't be, and that the standard miserables.gram doesn't use this same format:

(`Child1`:Person {name: "Child1", group:10})

Probably not harmful, but it's notable that the gram backend seems to be outputting unnecessary extra syntax, presumably not required by the gram format.
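One way to avoid the unnecessary quoting is to emit an identifier bare when it is a simple identifier, and quote it only otherwise. This is a minimal sketch under two assumptions. First, "simple" means an ASCII letter or underscore followed by ASCII alphanumerics or underscores. Second, embedded backticks are escaped by doubling them. Neither rule is taken from the gram format definition, and format_identifier and format_node are hypothetical helpers.

```rust
// Hypothetical gram-writer helper: emit the identifier bare when it is
// "simple" (assumed rule: [A-Za-z_][A-Za-z0-9_]*), else wrap it in backticks.
pub fn format_identifier(id: &str) -> String {
    let mut chars = id.chars();
    let simple = match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    };
    if simple {
        id.to_string()
    } else {
        // Escaping embedded backticks by doubling them is also an assumption.
        format!("`{}`", id.replace('`', "``"))
    }
}

// Writes a node such as (Child1:Person {name: "Child1"}).
pub fn format_node(id: &str, label: &str, name: &str) -> String {
    format!("({}:{} {{name: {:?}}})", format_identifier(id), label, name)
}
```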

Set up CircleCI, Travis, or a similar CI service

Add CREATE support to the gram backend

This should enable some of the OpenCypher TCK tests to actually pass

Add DELETE support to planner

e.g.

MATCH (n:Person)
DELETE n

This would involve adding delete <identifier> to the parser and extending the frontend to understand it, so that it emits an additional LogicalPlan operator. See how the CREATE operator is added here. The delete operator would say "delete the thing in slot X", where slot X is the slot that the n identifier got assigned to.

You can check that an identifier has a slot assignment (as a result of a match or a create clause) by converting the string you get out of the parse to a Token (via pc.tokenize(..)) and then calling pc.is_bound(..), like this.

The create PR has examples of one approach to writing tests here.

Finally, you'll need to modify the GRAM backend to panic or something if it gets a logical plan with delete in it, for now.
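The DELETE plumbing described above might look roughly like this. The LogicalPlan variants, plan_delete and gram_check_supported are hypothetical. slot_of stands in for the tokenize / is_bound / slot lookup on the planning context. The gram check takes the "or something" route and returns an error instead of panicking.

```rust
// Hypothetical operators; not gqlite's actual LogicalPlan.
#[derive(Debug, PartialEq)]
pub enum LogicalPlan {
    Argument,
    NodeScan { src: Box<LogicalPlan>, slot: usize, label: Option<String> },
    // "Delete the thing in slot X"
    Delete { src: Box<LogicalPlan>, slot: usize },
}

// Planner step for `DELETE <identifier>`: `slot_of` returns the slot an
// identifier was bound to by MATCH/CREATE, or None if it is unbound.
pub fn plan_delete(
    src: LogicalPlan,
    identifier: &str,
    slot_of: impl Fn(&str) -> Option<usize>,
) -> Result<LogicalPlan, String> {
    match slot_of(identifier) {
        Some(slot) => Ok(LogicalPlan::Delete { src: Box::new(src), slot }),
        None => Err(format!("DELETE refers to unbound identifier `{}`", identifier)),
    }
}

// For now the gram backend refuses any plan that contains a Delete.
pub fn gram_check_supported(plan: &LogicalPlan) -> Result<(), String> {
    match plan {
        LogicalPlan::Argument => Ok(()),
        LogicalPlan::NodeScan { src, .. } => gram_check_supported(src),
        LogicalPlan::Delete { .. } => {
            Err("The gram backend does not support DELETE yet".to_string())
        }
    }
}
```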

Nodejs packaging

Create a Node.js module wrapper for the gqlite DLL

project neon provides fancy tooling: https://github.com/neon-bindings/neon
some more info (including how to do it without using neon) here:
https://blog.risingstack.com/node-js-native-modules-with-rust/

Publish to npm

Add LDBC benchmarks

Rust has a built-in benchmark framework (#[bench], which currently requires nightly Rust) that seems like a good place to start?

One suggested approach would be something like:

Get hold of the LDBC CSV files for some reasonable scale factor (i.e. something small enough to check into the repo)
Convert those CSV files to gram and gzip them
Add support for reading gzipped gram files
Write a few benchmarks that import the gzipped gram file and run some subset of the LDBC workload
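For the "reading gzipped gram files" step, a loader can sniff the two gzip magic bytes (0x1f 0x8b, per RFC 1952) before deciding how to read the file. This is a dependency-free sketch, and open_gram is a hypothetical name. The actual decompression would need a crate such as flate2 (for example by wrapping the reader in flate2::read::GzDecoder), and that part is deliberately left out.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};
use std::path::Path;

// gzip streams start with the magic bytes 0x1f 0x8b (RFC 1952).
pub fn is_gzip(header: &[u8]) -> bool {
    header.len() >= 2 && header[0] == 0x1f && header[1] == 0x8b
}

// Hypothetical loader entry point: peeks at the buffered first bytes
// without consuming them, and reports whether the file is gzipped.
// A real loader would wrap the reader in a gzip decoder when it is.
pub fn open_gram(path: &Path) -> io::Result<(bool, Box<dyn Read>)> {
    let mut reader = BufReader::new(File::open(path)?);
    let gz = is_gzip(reader.fill_buf()?);
    Ok((gz, Box::new(reader)))
}
```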

Create gram file if none exists

In the g tool, if you run against a non-existent file:

g -f nofile.db "CREATE (n:Person)"

Then it fails, since the file doesn't exist.

We should create the file if it's missing.
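In Rust, creating the file on demand amounts to opening it through std::fs::OpenOptions with create(true). This is a minimal sketch, assuming g keeps the gram file open for reading and appending. open_or_create is a hypothetical helper, and whether CREATE should append to or rewrite the file is not decided here.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

// Open the gram file for reading and appending, creating an empty file if
// `path` does not exist yet (create(true) needs write or append access).
pub fn open_or_create(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .append(true)
        .create(true)
        .open(path)
}
```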

Add WHERE support to planner

This is more interesting than it seems. There is currently a limited set of constraints you can add to MATCH clauses:

MATCH (n:User {name: "Bob"})

Is really shorthand for:

MATCH (n)
WHERE n:User AND n.name = "Bob"

So, the interesting thing here is that ideally you pull the WHERE clause up into the PatternGraph, as part of parsing the MATCH, and then solve any constraints in WHERE as part of solving the PatternGraph.
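This is a minimal sketch of that normalization, assuming a hypothetical Predicate type. The inline label and property constraints of (n:User {name: "Bob"}) are desugared into the same predicates that WHERE n:User AND n.name = "Bob" would produce, so both spellings can be attached to the PatternGraph identically. NodePattern, Predicate and desugar are illustrative names, and property values are simplified to strings.

```rust
// Hypothetical predicate representation shared by inline constraints and WHERE.
#[derive(Debug, Clone, PartialEq)]
pub enum Predicate {
    HasLabel { id: String, label: String },
    PropEq { id: String, key: String, value: String },
}

// A node pattern as parsed from MATCH, e.g. (n:User {name: "Bob"}).
pub struct NodePattern {
    pub id: String,
    pub labels: Vec<String>,
    pub props: Vec<(String, String)>,
}

// (n:User {name: "Bob"}) -> [n:User, n.name = "Bob"]
pub fn desugar(node: &NodePattern) -> Vec<Predicate> {
    let labels = node.labels.iter().map(|l| Predicate::HasLabel {
        id: node.id.clone(),
        label: l.clone(),
    });
    let props = node.props.iter().map(|(k, v)| Predicate::PropEq {
        id: node.id.clone(),
        key: k.clone(),
        value: v.clone(),
    });
    labels.chain(props).collect()
}
```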

However, the general where clause will make it clear that our current implementation is missing an important aspect: It currently ignores data dependencies between expressions when it solves the graph. In a case like this:

MATCH (n)-[:KNOWS]->(p)
WHERE n.name = p.name

You've actually effectively added another edge into the pattern graph: You can't resolve the predicate until both n and p are in scope.
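To notice that kind of hidden edge, the solver first needs to know which identifiers a predicate reads. Here is a minimal sketch over a hypothetical Expr tree, not gqlite's actual expression type. It collects the identifiers referenced by a predicate and treats any predicate that touches more than one identifier as joining those nodes.

```rust
use std::collections::BTreeSet;

// Hypothetical expression tree for WHERE predicates.
pub enum Expr {
    Lit(String),
    Prop(String, String), // identifier.key
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Collect every identifier an expression reads; all of them must be in
// scope before the predicate can be evaluated.
pub fn dependencies(e: &Expr, out: &mut BTreeSet<String>) {
    match e {
        Expr::Lit(_) => {}
        Expr::Prop(id, _) => {
            out.insert(id.clone());
        }
        Expr::Eq(a, b) | Expr::And(a, b) => {
            dependencies(a, out);
            dependencies(b, out);
        }
    }
}

// A predicate that reads two or more identifiers behaves like an extra
// edge in the pattern graph between those nodes.
pub fn acts_like_edge(e: &Expr) -> bool {
    let mut deps = BTreeSet::new();
    dependencies(e, &mut deps);
    deps.len() > 1
}
```

For n.name = p.name the dependency set is {n, p}, so the predicate links n and p. For n.name = "Bob" it is just {n}, so the predicate can be checked as soon as n is bound.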

So, the solver needs to become smart enough to understand that some expressions depend on others, and that these dependencies help dictate the possible ways you can solve the query. As a minor aside, I think gqlite technically allows the following currently:

MATCH (n)-[:KNOWS]->(p {name: n.name})

Which.. is kind of cool. Maybe we should allow that. Maybe GQL already does, I'm not sure.

And a final observation: when there are data dependencies like in the last example query above, ideally we represent them in the PatternGraph such that both ways of viewing that dependency are clear, recognizing that the last query is identical to this one:

MATCH (n {name: p.name})-[:KNOWS]->(p)

So you can either do a NodeScan from p and expand to n-nodes matching the predicate, or do a NodeScan from n and expand to p-nodes matching the predicate.
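Both orders can be generated mechanically once each predicate carries its dependencies. This sketch assumes a single edge and represents the hypothetical operators (NodeScan, Expand, Filter) as strings. For each choice of start node, every Filter is placed right after the operator that brings its last dependency into scope.

```rust
use std::collections::BTreeSet;

// Hypothetical predicate tagged with the identifiers it depends on.
pub struct Pred {
    pub text: String,
    pub deps: BTreeSet<String>,
}

// For the edge (from)-[:rel]->(to), build one candidate plan per start node.
pub fn plans(from: &str, rel: &str, to: &str, preds: &[Pred]) -> Vec<Vec<String>> {
    let orders = [
        (from, format!("Expand({})-[:{}]->({})", from, rel, to), to),
        (to, format!("Expand({})<-[:{}]-({})", to, rel, from), from),
    ];
    orders
        .iter()
        .map(|(start, expand, other)| {
            let mut plan = Vec::new();
            let mut scope = BTreeSet::new();
            let mut applied = vec![false; preds.len()];
            // Each operator brings one more identifier into scope.
            for (op, id) in [(format!("NodeScan({})", start), *start), (expand.clone(), *other)] {
                plan.push(op);
                scope.insert(id.to_string());
                // Apply every predicate whose dependencies are now all in scope.
                for (i, p) in preds.iter().enumerate() {
                    if !applied[i] && p.deps.is_subset(&scope) {
                        plan.push(format!("Filter({})", p.text));
                        applied[i] = true;
                    }
                }
            }
            plan
        })
        .collect()
}
```

A real solver would then pick between the candidates using cost estimates. The point here is only that representing the dependency symmetrically lets both orders fall out of the same procedure.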