
EagerDB

EagerDB is a database management layer for preloading queries. Think of it as a predictive cache-warming tool: it listens to your incoming SQL queries and warms your cache by running the queries you are likely to make next.

Development Progress

EagerDB is currently in development (Version 0) and not quite ready to release yet. If you find any issues or would like to collaborate, please email [email protected] and I'd love to talk!

Getting Started

Manually Specifying Preloads

If you know exactly what your application is going to need, you can manually specify queries to preload: whenever the application makes a certain SQL query, you can tell EagerDB to load another query alongside it. For example, say a user is looking for the apple pies that grandma made:

SELECT * FROM pies WHERE name = 'apple_pie' AND creator = 'grandma'

With high probability, you know that the user will next look for the recipe for that specific apple pie. Let's say your original query returned a result like:

{ id: 553, name: 'apple_pie', creator: 'grandma', recipe_id: 234 }

Then, you might want to go ahead and warm your database's cache with the following query:

SELECT * FROM recipes WHERE id = 234 AND creator = 'grandma'

However, you'd really like this to be more general. You know that whenever a user looks up any kind of pie by any creator, you should preload the recipe for that pie and creator. You can do just that in EagerDB by writing the following:

- "SELECT * FROM pies WHERE name = ? AND creator = ?"
    => "SELECT * FROM recipes WHERE id = ? AND creator = ?", match_result.recipe_id, match_bind_value(1)

There are a couple of things to note about the syntax. The dash - at the beginning of the first line signifies that you want to match on that statement: it tells EagerDB that whenever it sees that statement, it should preload the statements that follow, each preceded by a hash rocket =>.

You can use match_bind_value(index) to get a bind value from the matching statement. Bind values are the values represented by the question marks ?. Since bind values are indexed from 0, match_bind_value(1) corresponds to the second bind value, which in our example is 'grandma'.

You can also use match_result to use the result of the matched query. By specifying match_result.recipe_id, you are getting the recipe_id column from the result of the original matched query. Of course, if you try to use a column that doesn't exist for a particular result, you'll get an error, so make sure to check your schema beforehand!
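
To make these semantics concrete, here is a minimal Ruby sketch of how such a rule could be evaluated. The Preloader class and process method below are hypothetical illustrations of the idea, not EagerDB's actual API:

# Hypothetical sketch of rule evaluation -- not EagerDB's real API.
class Preloader
  def initialize(match_sql, preload_sql, &arg_builder)
    @match_sql   = match_sql    # statement to match on (the "-" line)
    @preload_sql = preload_sql  # statement to preload (the "=>" line)
    @arg_builder = arg_builder  # builds bind values for the preload query
  end

  # Called with each query the database sees, its bind values, and its result.
  def process(sql, bind_values, result)
    return unless sql == @match_sql
    [@preload_sql, @arg_builder.call(bind_values, result)]
  end
end

# The apple pie rule: match_result.recipe_id and match_bind_value(1).
rule = Preloader.new(
  "SELECT * FROM pies WHERE name = ? AND creator = ?",
  "SELECT * FROM recipes WHERE id = ? AND creator = ?"
) { |binds, result| [result[:recipe_id], binds[1]] }

rule.process(
  "SELECT * FROM pies WHERE name = ? AND creator = ?",
  ["apple_pie", "grandma"],
  { id: 553, name: "apple_pie", creator: "grandma", recipe_id: 234 }
)
# => ["SELECT * FROM recipes WHERE id = ? AND creator = ?", [234, "grandma"]]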


eager_db's Issues

Implement bind value matching.

Once we have a set of high-probability transitions, go back and find the bind values for these transitions.

Doing match_bind_value(i) should be easy since the bind values are the same in a large number of the queries.

However, doing match_result.column_name will require knowing the database schema. Try to think about a way around this.
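
For the match_bind_value(i) half, one possible approach is a sketch like the following, assuming we can observe (source binds, target binds) pairs for a transition from the logs; infer_bind_mapping is a hypothetical helper, not part of EagerDB. For each target position, keep the source positions whose values agree in every observation:

# Hypothetical sketch: infer which source bind position feeds each target
# bind position from observed (source_binds, target_binds) pairs.
def infer_bind_mapping(observations)
  target_size = observations.first.last.size
  (0...target_size).map do |t|
    candidates = observations.map do |src, tgt|
      src.each_index.select { |s| src[s] == tgt[t] }
    end.reduce(:&) # source positions consistent across every observation
    candidates.first # nil means the value can't come from the binds
  end
end

pairs = [
  [["apple_pie", "grandma"], [234, "grandma"]],
  [["pecan_pie", "mom"],     [901, "mom"]],
]
infer_bind_mapping(pairs) # => [nil, 1]

A nil slot is exactly the match_result.column_name case, which is where the schema problem above comes in.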

Allow result to be an array

Right now, we assume a single result, but we could have multiple. We want to handle preloads for all possible results.
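
A sketch of what the fix could look like, where build_preload is a hypothetical stand-in for the existing single-result path:

# Hypothetical sketch: emit one preload per result row instead of one total.
def preloads_for(rule, bind_values, result)
  rows = result.is_a?(Array) ? result : [result] # accept one row or many
  rows.map { |row| rule.build_preload(bind_values, row) }
end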

Easy to use file format

There should be a way that we read in some text file and create preloaders based on that file. Something like the following:

- "SELECT * FROM users WHERE name = ?"
  => "SELECT * FROM products WHERE user_name = ?", bind_value(0)

- "SELECT * FROM users WHERE name = ?"
  => "SELECT * FROM user_accounts WHERE user_id = ?", result.id
  => "SELECT * FROM user_tables WHERE user_table_name = ?", result.table_name

More lazy eager_db

Neat project! I really like the idea of warming caches based on queries that are likely to come next after given queries. I don't like the idea of doing any more work than I already do. I would really just want to sit down, install the gem, specify some learning parameters, and have everything Just Be Faster™.

One way of doing this might be to build Markov chains on the fly from the incoming stream of queries and then fire off preloads based on them.

Let's say you see this stream of queries in your logs, roughly laid out as date, parameterized query, and parameters in JSON form. (I also changed one of the queries from your example for a specific reason explained below.)

[2013:12:7:14:30:00] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"pecan_pie","name":"grandma","id":234}
[2013:12:7:14:30:30] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"meatloaf","name":"mom","id":234}
[2013:12:7:14:30:45] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"pasta","name":"mom","id":24601}
[2013:12:7:14:31:00] "SELECT * FROM pies WHERE name = $recipe AND creator = $name AND id = $id" {"recipe":"apple_pie","name":"grandma","id":234}
[2013:12:7:14:31:30] "SELECT * FROM recipes WHERE id = $id AND creator = $name" {"id":234,"name":"grandma"}

Then I would specify some learning parameters, such as a windowing threshold. Let's say it's 60 seconds and see what we can get.

The inferred Markov chain I have in mind would be something like the following:

Vertices with query labels:

1:"SELECT * FROM pies WHERE name = $recipe AND creator = $name"
2:"SELECT * FROM recipes WHERE id = $id AND creator = $name"

Edges with probability of occurrence and pseudoparameters from the previous query to pass into the next query:

1->1  1.0 {"name": $name, "recipe": "*", "id": "*"}
1->2  0.5 {"name": $name, "id": $id}  
2->1  0.0 {}
2->2  0.0 {}

And now, whenever a vertex gets hit, you just precache all the edges above a certain probability threshold (or sample each edge according to its calculated probability if you want to get really fancy).

One of the big things to note is that the inference seems pretty easy to do if no new information is needed for the next query, or if a parameter can be assumed to be *. That's why I changed a query to include the recipe's id; otherwise the recipes query would be much harder to infer (and would probably require looking at the results of each query, which is way too much work). Depending on how smart you made the Markov chain inference, it might be able to infer things people never even imagined. I mean, who really wants to read all those logs and derive the queries themselves? Not me, I'm hella lazy.
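
A sketch of that inference, assuming queries arrive already parameterized and using the 60-second window from above (nothing here is real EagerDB code):

# Hypothetical sketch: estimate P(dst runs within WINDOW seconds | src ran)
# for every pair of parameterized queries seen in the log.
WINDOW = 60 # seconds, the windowing threshold

def markov_edges(log) # log: array of { time: Time, query: String }
  sources = Hash.new(0)
  counts  = Hash.new(0)
  log.each_with_index do |entry, i|
    sources[entry[:query]] += 1
    seen = {}
    log[(i + 1)..].each do |nxt|
      break if nxt[:time] - entry[:time] >= WINDOW
      seen[nxt[:query]] = true
    end
    seen.each_key { |dst| counts[[entry[:query], dst]] += 1 }
  end
  counts.to_h { |(src, dst), n| [[src, dst], n.to_f / sources[src]] }
end

On each incoming query you would then precache every outgoing edge whose probability clears your threshold (or sample edges by probability, as suggested above). Carrying the pseudoparameters across an edge is the hard part this sketch leaves out.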

Anyway, this is a neat project and I'm interested to see what it can do.
