
EagerDB

EagerDB is a database management layer for preloading queries. Think of it as a predictive cache-warming tool: it listens to your incoming SQL queries and warms your cache by running the queries you are likely to make next.

Development Progress

EagerDB is currently in development (Version 0) and not quite ready to release yet. If you find any issues or would like to collaborate, please email [email protected] and I'd love to talk!

Getting Started

Manually Specifying Preloads

If you know exactly what your application is going to need, you can manually specify queries to preload: whenever the application makes a certain SQL query, you can tell EagerDB to load another query alongside it. For example, say a user is looking for the apple pies that grandma made:

SELECT * FROM pies WHERE name = 'apple_pie' AND creator = 'grandma'

With high probability, you know that the user will next look for the recipe for that specific apple pie. Let's say your original query returned a result like:

{ id: 553, name: 'apple_pie', creator: 'grandma', recipe_id: 234 }

Then, you might want to go ahead and warm your database's cache with the following query:

SELECT * FROM recipes WHERE id = 234 AND creator = 'grandma'

However, you'd really like this to be more general. You know that whenever a user looks up any kind of pie by any creator, you should preload the recipe for that pie and creator. You can do just that in EagerDB by writing the following:

- "SELECT * FROM pies WHERE name = ? AND creator = ?"
    => "SELECT * FROM recipes WHERE id = ? AND creator = ?", match_result.recipe_id, match_bind_value(1)

There are a couple of things to note about the syntax. The dash - at the beginning of the first line signifies that you want to match on that statement: it tells EagerDB that whenever it sees that statement, it should preload the statements that follow, each preceded by a hash rocket =>.

You can use match_bind_value(index) to get a bind value from the matching statement. Bind values are the values represented by the question marks ?. Since bind values are indexed from 0, match_bind_value(1) corresponds to the second bind value, which in our example is 'grandma'.

You can also use match_result to use the result of the matched query. By specifying match_result.recipe_id, you are getting the recipe_id column from the result of the original matched query. Of course, if you try to use a column that doesn't exist for a particular result, you'll get an error, so make sure to check your schema beforehand!
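
To make these semantics concrete, here is a minimal Ruby sketch of how such a rule could be evaluated. The Preloader class and process method below are hypothetical illustrations of the idea, not EagerDB's actual API:

# Hypothetical sketch of rule evaluation -- not EagerDB's real API.
class Preloader
  def initialize(match_sql, preload_sql, &arg_builder)
    @match_sql   = match_sql    # statement to match on (the "-" line)
    @preload_sql = preload_sql  # statement to preload (the "=>" line)
    @arg_builder = arg_builder  # builds bind values for the preload query
  end

  # Called with each query the database sees, its bind values, and its result.
  def process(sql, bind_values, result)
    return unless sql == @match_sql
    [@preload_sql, @arg_builder.call(bind_values, result)]
  end
end

# The apple pie rule: match_result.recipe_id and match_bind_value(1).
rule = Preloader.new(
  "SELECT * FROM pies WHERE name = ? AND creator = ?",
  "SELECT * FROM recipes WHERE id = ? AND creator = ?"
) { |binds, result| [result[:recipe_id], binds[1]] }

rule.process(
  "SELECT * FROM pies WHERE name = ? AND creator = ?",
  ["apple_pie", "grandma"],
  { id: 553, name: "apple_pie", creator: "grandma", recipe_id: 234 }
)
# => ["SELECT * FROM recipes WHERE id = ? AND creator = ?", [234, "grandma"]]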


eager_db's Issues

Implement bind value matching.

Once we have a set of high-probability transitions, go back and find the bind values for these transitions.

Doing match_bind_value(i) should be easy since the bind values are the same in a large number of the queries.

However, doing match_result.column_name will require knowing the database schema. Try to think about a way around this.
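
For the match_bind_value(i) half, one possible approach is a sketch like the following, assuming we can observe (source binds, target binds) pairs for a transition from the logs; infer_bind_mapping is a hypothetical helper, not part of EagerDB. For each target position, keep the source positions whose values agree in every observation:

# Hypothetical sketch: infer which source bind position feeds each target
# bind position from observed (source_binds, target_binds) pairs.
def infer_bind_mapping(observations)
  target_size = observations.first.last.size
  (0...target_size).map do |t|
    candidates = observations.map do |src, tgt|
      src.each_index.select { |s| src[s] == tgt[t] }
    end.reduce(:&) # source positions consistent across every observation
    candidates.first # nil means the value can't come from the binds
  end
end

pairs = [
  [["apple_pie", "grandma"], [234, "grandma"]],
  [["pecan_pie", "mom"],     [901, "mom"]],
]
infer_bind_mapping(pairs) # => [nil, 1]

A nil slot is exactly the match_result.column_name case, which is where the schema problem above comes in.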

Allow result to be an array

Right now, we assume a single result, but we could have multiple. We want to handle preloads for all possible results.
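
A sketch of what the fix could look like, where build_preload is a hypothetical stand-in for the existing single-result path:

# Hypothetical sketch: emit one preload per result row instead of one total.
def preloads_for(rule, bind_values, result)
  rows = result.is_a?(Array) ? result : [result] # accept one row or many
  rows.map { |row| rule.build_preload(bind_values, row) }
end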

Easy to use file format

There should be a way that we read in some text file and create preloaders based on that file. Something like the following:

- "SELECT * FROM users WHERE name = ?"
  => "SELECT * FROM products WHERE user_name = ?", bind_value(0)

- "SELECT * FROM users WHERE name = ?"
  => "SELECT * FROM user_accounts WHERE user_id = ?", result.id
  => "SELECT * FROM user_tables WHERE user_table_name = ?", result.table_name

More lazy eager_db

Neat project! I really like the idea of warming caches based on queries that are likely to come next after given queries. I don't like the idea of doing any more work than I already do. I would really just want to sit down, install the gem, specify some learning parameters, and have everything Just Be Faster™.

One way of doing this might be to build Markov chains on the fly from the incoming stream of queries and then fire off preloads based on them.

Let's say you see this stream of queries in your logs, roughly laid out as date, parameterized query, and parameters in JSON form. (I also changed one of the queries from your example for a specific reason explained below.)

[2013:12:7:14:30:00] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"pecan_pie","name":"grandma","id":234}
[2013:12:7:14:30:30] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"meatloaf","name":"mom","id":234}
[2013:12:7:14:30:45] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"pasta","name":"mom","id":24601}
[2013:12:7:14:31:00] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"apple_pie","name":"grandma","id":234}
[2013:12:7:14:31:30] "SELECT * FROM recipes WHERE id = $id AND creator = $name" {"id":234,"name":"grandma"}

Then I would specify some learning parameters, such as a windowing threshold. Let's say it's 60 seconds and see what we can get.

The inferred Markov chain I have in mind would be something like the following:

Vertices with query labels:

1:"SELECT * FROM pies WHERE name = $recipe AND creator = $name"
2:"SELECT * FROM recipes WHERE id = $id AND creator = $name"

Edges with probability of occurrence and pseudoparameters from the previous query to pass into the next query:

1->1  1.0 {"name": $name, "recipe": "*", "id": "*"}
1->2  0.5 {"name": $name, "id": $id}  
2->1  0.0 {}
2->2  0.0 {}

And now, whenever a vertex gets hit, you just precache all the edges above a certain probability threshold (or sample each edge according to its calculated probability if you want to get really fancy).

One of the big things to note is that the inference seems pretty easy to do if no new information is needed for the next query, or if a parameter can be assumed to be *. That's why I changed a query to include the recipe's id; otherwise the recipes query would be much harder to infer (and would probably require looking at the results of each query, which is way too much work). Depending on how smart you made the Markov chain inference, it might be able to infer things people never even imagined. I mean, who really wants to read all those logs and derive the queries themselves? Not me, I'm hella lazy.
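
A sketch of that inference, assuming queries arrive already parameterized and using the 60-second window from above (nothing here is real EagerDB code):

# Hypothetical sketch: estimate P(dst runs within WINDOW seconds | src ran)
# for every pair of parameterized queries seen in the log.
WINDOW = 60 # seconds, the windowing threshold

def markov_edges(log) # log: array of { time: Time, query: String }
  sources = Hash.new(0)
  counts  = Hash.new(0)
  log.each_with_index do |entry, i|
    sources[entry[:query]] += 1
    seen = {}
    log[(i + 1)..].each do |nxt|
      break if nxt[:time] - entry[:time] >= WINDOW
      seen[nxt[:query]] = true
    end
    seen.each_key { |dst| counts[[entry[:query], dst]] += 1 }
  end
  counts.to_h { |(src, dst), n| [[src, dst], n.to_f / sources[src]] }
end

On each incoming query you would then precache every outgoing edge whose probability clears your threshold (or sample edges by probability, as suggested above). Carrying the pseudoparameters across an edge is the hard part this sketch leaves out.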

Anyway, this is a neat project and I'm interested to see what it can do.
