google / badwolf
Temporal graph store abstraction layer.
License: Apache License 2.0
Build a test corpus to validate BQL behavior. This workbench also needs to support multiple backends.
Is there any Wikidata example available?
If not, what would be the rough steps to use BadWolf with Wikidata?
create graph ?world;
insert data into ?world {
/room<Hallway> "connects_to"@[] /room<Kitchen>.
/room<Kitchen> "connects_to"@[] /room<Hallway>.
/room<Kitchen> "connects_to"@[] /room<Bathroom>.
/room<Kitchen> "connects_to"@[] /room<Bedroom>.
/room<Bathroom> "connects_to"@[] /room<Kitchen>.
/room<Bedroom> "connects_to"@[] /room<Kitchen>.
/room<Bedroom> "connects_to"@[] /room<Fire Escape>.
/room<Fire Escape> "connects_to"@[] /room<Kitchen>.
/item/book<000> "in"@[2016-04-10T4:21:00.000000000Z] /room<Hallway>.
/item/book<000> "in"@[2016-04-10T4:23:00.000000000Z] /room<Kitchen>.
/item/book<000> "in"@[2016-04-10T4:25:00.000000000Z] /room<Bedroom>
};
select ?item, ?t from ?world where {
?item "in"@[?t] /room<Bedroom>
};
drop graph ?world;
results in an infinite loop at
Processing statement (3/4):
select ?item, ?t from ?world where { ?item "in"@[?t] /room<Bedroom> };
The Boolean expression evaluator is required to implement the HAVING clause described in issue #19.
The internal representation of time can lead to differences when comparing equal instants. Also, since the text version is used for serialization, it can affect the stability of GUIDs for triples.
Add a method to retrieve all available graph names from the store.
Implement binding projection for the resulting Table.
The idea of using an immutable graph store in conjunction with event sourcing makes a lot of sense for an upcoming project; however, it looks like BadWolf has been abandoned. Is this the case?
This is not the expected result. It seems like the merged table is the product of a bad merge.
Welcome to BadWolf vCli (0.4.2-dev)
Using driver "VOLATILE". Type quit; to exit
Session started at 2016-05-17 12:46:28.098374381 -0700 PDT
bql> create graph ?family;
[OK]
bql> load /tmp/family.txt ?family;
Successfully processed 6 lines from file "/tmp/family.txt".
Triples loaded into graphs:
- ?family
bql> select ?grandparent from ?family where {?s "parent of"@[] /person<Amy Schumer> . ?grandparent "parent of"@[] ?s};
?grandparent
/person<Gavin Belson>
/person<Gavin Belson>
/person<Mary Belson>
/person<Mary Belson>
[OK]
bql>
The data used to run this command, stored in /tmp/family.txt, is:
/person<Gavin Belson> "born in"@[] /city<Springfield>
/person<Gavin Belson> "parent of"@[] /person<Peter Belson>
/person<Gavin Belson> "parent of"@[] /person<Mary Belson>
/person<Mary Belson> "parent of"@[] /person<Amy Schumer>
/person<Mary Belson> "parent of"@[] /person<Joe Schumer>
According to the BQL overview document, the "as" keyword can be used to return a different name for a variable. However, the keyword causes an error when the program runs; it only works when used with an aggregation.
When running this program:
# Create a graph.
CREATE GRAPH ?family;
# Insert some data into the graph.
INSERT DATA INTO ?family {
/u<joe> "parent_of"@[] /u<mary> .
/u<joe> "parent_of"@[] /u<peter> .
/u<peter> "parent_of"@[] /u<john> .
/u<peter> "parent_of"@[] /u<eve>
};
# Find all Joe's offspring names.
# Works fine without "as" keyword.
SELECT ?name
FROM ?family
WHERE {
/u<joe> "parent_of"@[] ?offspring ID ?name
};
# Find all Joe's offspring names.
# Fails with "as" keyword.
SELECT ?name as ?n
FROM ?family
WHERE {
/u<joe> "parent_of"@[] ?offspring ID ?name
};
# Count offspring.
# Works with "as" keyword.
SELECT ?parent_name, count(?name) as ?n
FROM ?family
WHERE {
?parent ID ?parent_name "parent_of"@[] ?offspring ID ?name
}
GROUP BY ?parent_name;
# Drop the graph.
DROP GRAPH ?family;
The output is:
Processing file bug.bql
Processing statement (1/6):
CREATE GRAPH ?family;
Result:
OK
Processing statement (2/6):
INSERT DATA INTO ?family { /u<joe> "parent_of"@[] /u<mary> . /u<joe> "parent_of"@[] /u<peter> . /u<peter> "parent_of"@[] /u<john> . /u<peter> "parent_of"@[] /u<eve> };
Result:
OK
Processing statement (3/6):
SELECT ?name FROM ?family WHERE { /u<joe> "parent_of"@[] ?offspring ID ?name };
Result:
?name
mary
peter
OK
Processing statement (4/6):
SELECT ?name as ?n FROM ?family WHERE { /u<joe> "parent_of"@[] ?offspring ID ?name };
[FAIL] [ERROR] Failed to execute BQL statement with error cannot project against unknow binding ?n; known bindinds are [?offspring ?name]
Processing statement (5/6):
SELECT ?parent_name, count(?name) as ?n FROM ?family WHERE { ?parent ID ?parent_name "parent_of"@[] ?offspring ID ?name } GROUP BY ?parent_name;
Result:
?parent_name ?n
joe "2"^^type:int64
peter "2"^^type:int64
OK
Processing statement (6/6):
DROP GRAPH ?family;
Result:
OK
Implement the result returning LIMIT clause.
Revisit the covariant definition after the comment on https://news.ycombinator.com/reply?id=10432492.
$ bw --driver=VOLATILE bql
...
bql> CREATE GRAPH ?foo;
[OK]
bql> INSERT DATA INTO ?foo {
/u<joe> "parent_of"@[2016-12-12T15:00Z] /u<julia>
};
[ERROR] failed to parse BQL statement with error predicate.Parse failed to parse time anchor
2016-12-12T15:00Z in "parent_of"@[2016-12-12T15:00Z] with error parsing time
"2016-12-12T15:00Z" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "Z" as ":"
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<julia> };
[ERROR] failed to parse BQL statement with error hook.DataAccumulator requires a predicate to
create a predicate, got &{NODE /u<joe> } instead
# This second error is spurious, but sticky. Only quitting and restarting bw seems to allow data to be
# inserted. If I enter the same sequence but with an acceptable timestamp, the second error does not
# occur.
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<fred> };
[ERROR] failed to parse BQL statement with error hook.DataAccumulator requires a predicate to create a predicate, got &{NODE /u<joe> } instead
bql> quit;
Thanks for all those BQL queries!
$ bw --driver=VOLATILE bql
Welcome to BadWolf vCli (0.5.1-dev @141940248)
Using driver "VOLATILE". Type quit; to exit
Session started at 2016-12-21 13:50:07.001652809 -0500 EST
bql> CREATE GRAPH ?foo;
[OK]
bql> INSERT DATA INTO ?foo {
/u<joe> "parent_of"@[2016-12-12T15:00:00Z] /u<julia>
};
[OK]
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<julia> };
[OK]
bql> INSERT DATA INTO ?foo { /u<joe> "parent_of"@[] /u<fred> };
[OK]
Excuse the probably very naive question. I think I have a working BadWolf instance, which I obtained by
go get golang.org/x/net/context
go get github.com/peterh/liner
go get github.com/google/badwolf/...
(Is this the right way? How to install is not mentioned anywhere.)
In any case, I am able to use the bw tool and follow the examples, use bw bql to get a REPL, and so on.
The question is: how do I leave a long-running instance of BadWolf? Even assuming I want to keep the data in RAM (persistence is not a priority right now, even though I see there are persistent backends), each time I run bw an entirely new instance of BadWolf is created and apparently destroyed.
I assume there must be some way to leave BadWolf running in the background and keep querying the existing graphs (even using the bw tool, preferably with some kind of driver/network interface), but I could not find any information on this.
For instance, it is not clear to me how to use the bw export command: by the time I run a new bw process, everything from the previous runs is lost, hence there is nothing to export. Similarly, I can run bw load, but then the data is lost as soon as the command returns. I am sure I am missing something obvious and fundamental here.
Hey! I'm the maintainer of the other Google graph project, https://github.com/google/cayley
I know I've been out of the Google-sphere for a year now, but I've still been contributing to Cayley and it's been used in a number of projects and a few production instances. In short, it works pretty well and I'd like to grow it more.
I've read over your repo (you've written it in Go too, good choice ;) ). For storage, you've got 99% the same primitives as Cayley: triples, or rather quads (went down that road, believe me), a memory store, indices. Your methods for things like "TriplesForPredicateAndObject" are pretty standard Cayley iterators.
You're doing some nice things with regard to RDF literals I'd be excited to add to Cayley.
It seems like most of your novel work is in BQL. I'm just reading up on it now, so I haven't quite gotten the full notion of what makes this a new and interesting query language (would love to discuss), but even proposed as a black box, I'd be happy to add it as a query language in Cayley.
I've always been more of a storage guy, so if your interest is in inference and query languages, that works great. The advantages you'd get would be all sorts of backends, various optimizations on the iterators, while still being able to push forward with your temporal graph idea. Everybody wins. What do you think?
In bql/semantic/semantic_test.go, there is a typo in a test name that causes it to be ignored:
func TesIsEmptyClause(t *testing.T) {
	testTable := []struct {
		in  *GraphClause
		out bool
	}{
		{
			in:  &GraphClause{},
			out: true,
		},
		{
			in:  &GraphClause{SBinding: "?foo"},
			out: true,
		},
	}
	for _, entry := range testTable {
		if got, want := entry.in.IsEmpty(), entry.out; got != want {
			t.Errorf("IsEmpty for %v returned %v, but should have returned %v", entry.in, got, want)
		}
	}
}
After changing the name to the proper TestIsEmptyClause and running the tests, the test fails:
--- FAIL: TestIsEmptyClause (0.00s)
semantic_test.go:148: IsEmpty for &{<nil> ?foo <nil> <nil> <nil> false <nil> <nil> <nil> false} returned false, but should have returned true
FAIL
FAIL github.com/google/badwolf/bql/semantic 0.028s
Add the collection of bindings and directions to the Statement, enforce validation, and extend the query planner to use the table sort functionality.
Plumb context.Context (https://godoc.org/golang.org/x/net/context) to all methods on the storage interfaces and fix the volatile memory driver accordingly.
Extend the Table functionality to allow arbitrary row filtering with the provided filtering functions. This is required to solve issue #19.
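As a rough sketch of the requested extension (all names here are illustrative, not the actual BadWolf Table API), a HAVING clause such as ?gender == /gender&lt;male&gt; could be compiled into a predicate function and applied row by row:

```go
package main

import "fmt"

// Row maps a binding name to its value; Table holds the result rows.
// Both are hypothetical stand-ins for the real Table types.
type Row map[string]string

type Table struct {
	rows []Row
}

// Filter keeps only the rows for which keep returns true.
func (t *Table) Filter(keep func(Row) bool) {
	out := t.rows[:0]
	for _, r := range t.rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	t.rows = out
}

func main() {
	tbl := &Table{rows: []Row{
		{"?gender": "/gender<male>"},
		{"?gender": "/gender<female>"},
	}}
	// A HAVING clause like `?gender == /gender<male>` becomes a predicate.
	tbl.Filter(func(r Row) bool { return r["?gender"] == "/gender<male>" })
	fmt.Println(len(tbl.rows)) // 1
}
```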
You can create bindings for TYPE and ID for nodes. For predicates you have ID for the id, and AT for extracting the id and time anchor. The documentation should reflect these.
Right now there is no insight into how SELECT statements behave. This feature would help gain it.
Implementing #45 will require extending the planner to be able to create and insert the new facts based on the retrieved data.
The test workbench built in issue #25 should be available to run via the command line tool.
In preparation for 2017, besides working on extending BQL (see issues #45, #46, #47, and #48), we are planning to start exploring support for graph structural query operations. Some examples we could focus on include:
At this point we are considering the list above more or less in the order we would approach them. Is there any other operation you would need to get added? Do you have a pressing operation that would simplify your usage?
Construct queries allow creating new facts to be added to graphs. The facts are defined based on the bindings provided in the WHERE clause. Basic filtering capabilities are provided by adding a HAVING clause.
A simple example adding new facts based on the current ones:
CONSTRUCT {
?p "grandmother of"@[] ?g .
?g "grandchild of"@[] ?p
}
INTO ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
?p "parent of"@[] ?parent .
?parent "parent of"@[] ?g .
?p "gender"@[] ?gender
}
HAVING ?gender == /gender<male>;
It is worth mentioning that the above query could be simplified as shown below. Nevertheless, the goal was to show the full structure of a CONSTRUCT query.
CONSTRUCT {
?p "grandmother of"@[] ?g .
?g "grandchild of"@[] ?p
}
INTO ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
?p "parent of"@[] ?parent .
?parent "parent of"@[] ?g .
?p "gender"@[] /gender<male>
};
Subjects are allowed to specify _ instead of a WHERE clause binding. This will inject a new blank node.
As a first step towards unifying efforts with http://github.com/google/cayley we are going to target creating a driver implementation against Cayley together with @barakmich.
Implementing #45 requires modifying the grammar to accept the new CONSTRUCT statement.
How does one use type covariance in a query? The documentation doesn't cover it.
The check needs validation that all bindings outside the GROUP BY ones have aggregation functions.
https://github.com/google/badwolf/blob/master/bql/grammar/grammar.go#L612
It does not contain any of the elements that form an object, making the comparison limited in flexibility.
Do one last pass over the initial conformance tests and, if everything checks out, cut the first release.
Tables generated via graph clauses need to be projected via the SELECT variables.
Héllo,
First and foremost thanks for sharing this project! This is very interesting!
I am a database modeling enthusiast. I created a database in Python called AjguDB, which is a graphdb on top of EAV (on top of wiredtiger, an ordered key-value store similar to boltdb). I did a similar project in Scheme which can be queried using miniKanren (a logic language embedded in Scheme). My inspiration is mostly the Datomic database, even if I skipped the immutable part.
I used to think that EAV was a triplestore; I am reconsidering that. It seems like the EAV model is less generic than the triplestore model. My understanding is that both are good at modeling sparse matrices / multidimensional data, but EAV is really good at representing documents whereas triplestores are good at representing triples (or facts). One might say that a document is a set of triples. But in the EAV model you don't have control over the entity; it's randomly generated. At the end of the day, I think a triplestore is just EAV where E is not a unique identifier. WDYT?
Is it possible to adapt Gremlin to work on quads?
I am surprised that there is no mention of geographical data in some way. Is it something you plan to add?
How do you cope with immutability during querying? Here is a practical example: say there is a triple that says «there are a hundred people in a town in 2017». Now it's 2018; do I need to create a new triple or update the old triple? Do triples have a history? It seems to me that a database in BadWolf must be kept clean: you cannot fix typos or it will kludge the results.
Can you recommend me stuff to read about BadWolf?
I will dive into boltdb drivers.
Use the Table grouping implemented in issue #17 to provide a functional GROUP BY clause.
create graph ?world;
insert data into ?world {
/room<000> "named"@[] "Hallway"^^type:text.
/room<000> "connects_to"@[] /room<001>
};
fails with:
[FAIL] [ERROR] Failed to parse BQL statement with error Parser.parse: Failed to consume symbol INSERT_OBJECT, with error Parser.consume: could not consume token &{ERROR "Hallway" [lexer:0:57] predicates require time anchor information; missing "@[} in production INSERT_OBJECT
To improve execution debugging and performance improvements, we should add a simple tracing mechanism to see detailed traces of query execution.
In commit 9845651, DeleteRow introduced non-deterministic behavior. This is a bit strange and should be investigated further.
Deconstruct queries allow removing derived facts from graphs. The facts are defined based on the bindings provided in the WHERE clause. Basic filtering capabilities are provided by adding a HAVING clause. This is the complementary statement for CONSTRUCT, introduced in issue #45.
DECONSTRUCT {
?p "grandmother of"@[] ?g .
?g "grandchild of"@[] ?p
}
AT ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
?p "parent of"@[] ?parent .
?parent "parent of"@[] ?g .
?p "gender"@[] ?gender
}
HAVING ?gender == /gender<male>;
It is worth mentioning that the above query could be simplified as shown below. Nevertheless, the goal was to show the full structure of a DECONSTRUCT query.
DECONSTRUCT {
?p "grandmother of"@[] ?g .
?g "grandchild of"@[] ?p
}
AT ?graph1, ?graph2
FROM ?graph3, ?graph4
WHERE {
?p "parent of"@[] ?parent .
?parent "parent of"@[] ?g .
?p "gender"@[] /gender<male>
};
_ is not allowed in DECONSTRUCT clauses.
Extend the Table functionality to allow arbitrary aggregation with the provided aggregation functions.
Implementation of #45 requires updating the Statement to collect the relevant information on how to construct the new triples.
Do another pass over the documentation and compliance stories. Once done, label the latest master commit as RC1 after updating the version number.
Right now I am manually running all tests before commits. We should set up continuous testing for the whole project to run at least all the available unit tests.
Add the collection of anchor bounds and properly compute the intervals, enforce validation, and extend the query planner to properly use the provided bounds.
Given the following data set
/_<c175b457-e6d6-4ce3-8312-674353815720> "_predicate"@[] "/some/immutable/id"@[]
/_<c175b457-e6d6-4ce3-8312-674353815720> "_owner"@[2017-05-23T16:41:12.187373-07:00] /gid<0x9>
/_<c175b457-e6d6-4ce3-8312-674353815720> "_subject"@[] /aid</some/subject/id>
/_<c175b457-e6d6-4ce3-8312-674353815720> "_object"@[] /aid</some/object/id>
/_<cd8bae87-be96-41af-b1a8-27df990c9825> "_object"@[2017-05-23T16:41:12.187373-07:00] /aid</some/object/id>
/_<cd8bae87-be96-41af-b1a8-27df990c9825> "_owner"@[2017-05-23T16:41:12.187373-07:00] /gid<0x6>
/_<cd8bae87-be96-41af-b1a8-27df990c9825> "_predicate"@[2017-05-23T16:41:12.187373-07:00] "/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00]
/_<cd8bae87-be96-41af-b1a8-27df990c9825> "_subject"@[2017-05-23T16:41:12.187373-07:00] /aid</some/subject/id>
/aid</some/subject/id> "/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00] /aid</some/object/id>
/aid</some/subject/id> "/some/immutable/id"@[] /aid</some/object/id>
/aid</some/subject/id> "/some/ownerless_temporal/id"@[2017-05-23T16:41:12.187373-07:00] /aid</some/object/id>
The following query succeeds as expected.
bql> SELECT ?bn,?p, ?o
FROM ?test
WHERE {
?bn "_subject"@[,] /aid</some/subject/id>.
?bn "_predicate"@[,] ?p .
?bn "_object"@[,] ?o
};
?bn ?p ?o
/_<cd8bae87-be96-41af-b1a8-27df990c9825> "/some/temporal/id"@[2017-05-23T16:41:12.187373-07:00] /aid</some/object/id>
[OK] Time spent: 578.963µs
However, when you specify the object for ?o directly, the query fails with a filtering error.
bql> SELECT ?bn,?p
FROM ?test
WHERE {
?bn "_subject"@[,] /aid</some/subject/id>.
?bn "_predicate"@[,] ?p .
?bn "_object"@[,] /aid</some/object/id>
};
[ERROR] planner.Execute: failed to execute insert plan with error failed to fully specify clause { ?bn "_object"@[,] /aid</some/object/id> } for row map[?bn:/_<cd8bae87-be96-41af-b1a8-27df990c9825>]
Time spent: 514.294µs
Given that this is just an update of the query above, it should not have failed and should have returned one row with the ?bn and ?p bindings.
A triple is asserted with an anchor time, but there is no mechanism for unanchoring the triple, that is, invalidating it. One approach I thought about was to have a nil type that denotes the triple has been retracted. Another method could be to implement this in the logic of the storage layer: when a triple is "deleted", it is stored in a retracted set, and when triples are requested, matched triples would be evaluated against the retracted set before being returned. Have you thought about this at all?
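The retracted-set variant described in this thread can be sketched in a few lines of Go (all names are illustrative, not the BadWolf storage API): deletions go into a tombstone set, and lookups filter against it before returning.

```go
package main

import "fmt"

// store keeps asserted triples plus a tombstone set of retracted ones.
type store struct {
	triples   map[string]bool
	retracted map[string]bool
}

func newStore() *store {
	return &store{triples: map[string]bool{}, retracted: map[string]bool{}}
}

func (s *store) Assert(t string)  { s.triples[t] = true }
func (s *store) Retract(t string) { s.retracted[t] = true }

// Triples returns the asserted triples that have not been retracted.
func (s *store) Triples() []string {
	var out []string
	for t := range s.triples {
		if !s.retracted[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	s := newStore()
	s.Assert(`/u<joe> "parent_of"@[] /u<mary>`)
	s.Assert(`/u<joe> "parent_of"@[] /u<peter>`)
	s.Retract(`/u<joe> "parent_of"@[] /u<peter>`)
	fmt.Println(len(s.Triples())) // 1
}
```

The trade-off versus a nil-typed retraction triple is that the tombstone check stays in the storage layer and the data model is untouched, at the cost of every read paying for the extra lookup.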
As part of the effort to enable Cayley (http://github.com/google/cayley) as a backend for BadWolf, its efficient usage would require providing an iterator tree as an output of the BQL parsing for Cayley to pick up. This would enable Cayley to use its optimizer before executing the query. The iterator tree should be translatable to the final Cayley one, as discussed with @barakmich.
Should be the last pre release cut before the stable initial 0.1.0 release.
The CONTRIBUTING.md file has a cryptic sentence at the end:
This commit can be part of your first [Differential][] code review.
This appears to be missing a link and is out of context. What's "Differential"? Is it a component of Phabricator? But then how does it relate to GitHub PRs, which are presumably the way to contribute to this project?
Either there should be a link there to explain where Differential comes in, or this sentence should be removed entirely.
In W3C SPARQL, a collection of triples can be structured in terms of named graphs, and a query expression can refer to these (directly by identifier, or using variables). Have you considered applying this structure for your temporal query facilities? e.g. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#namedGraphs and nearby.
Add the collection of bindings, conditions, and directions to the Statement, enforce validation, and extend the query planner to use the table filter functionality.