Giter Site home page Giter Site logo

tihku's Introduction

Tihku

Tihku is an work-in-progress, open-source implementation of the Hekaton multi-version concurrency control (MVCC) written in Rust. The project aims to provide a foundational building block for implementing database management systems.

One of the projects using Tihku is an experimental libSQL branch with MVCC that aims to implement BEGIN CONCURRENT with Tihku improve SQLite write concurrency.

Features

  • Main memory architecture, rows are accessed via an index
  • Optimistic multi-version concurrency control
  • Rust and C APIs

Experimental Evaluation

Single-threaded micro-benchmarks

Operations Throughput
begin_tx, read, and commit 2.2M ops/second
begin_tx, update, and commit 2.2M ops/second
read 12.9M ops/second
update 6.2M ops/second

(The cargo bench was run on a AMD Ryzen 9 3900XT 2.2 GHz CPU.)

Development

Run tests:

cargo test

Test coverage report:

cargo tarpaulin -o html

Run benchmarks:

cargo bench

Run benchmarks and generate flamegraphs:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench my_benchmark -- --profile-time=5

References

Larson et al. High-Performance Concurrency Control Mechanisms for Main-Memory Databases. VLDB '11

Paper errata: The visibility check in Table 2 is wrong and causes uncommitted delete to become visible to transactions (fixed in commit 6ca3773).

tihku's People

Contributors

avinassh avatar penberg avatar psarna avatar tpisto avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

tihku's Issues

Log structured memory allocator?

We currently store row versions in a Vec, which is not optimal. One possible solution is to decouple the index from storage and implement a log structured memory allocator.

Row data type

This is more of a question than an issue. Have you planned to use generic row data type instead of String?

pub struct Row<T> {
    pub id: RowID,
    pub data: T,
}

I tried to implement the Row in my fork (https://github.com/tpisto/tihku) and it's mostly working, tests are passing, etc. But the self.rows.remove(&id) gives a lifetime issue I have not yet managed to solve... If Row is in your interest, maybe you could have better insight what's wrong the lifetime there.

ps. congrats for the cool project ๐Ÿ‘๐Ÿป

Transaction log trait

The commit_tx() function needs to write to a transaction log and wait for it to persist on stable storage. Let's add a transaction log trait that mvcc-rs users can provide.

Switch to lockless data structures

The current code uses a mutex to essentially make commit() and rollback() and other operations atomic. Let's switch to lockless data structures like Hekaton.

cargo test

Even though this repo is WIP, just quickly to let you know (you might already do) but when running tests, one test fails:
test database::tests::test_lost_update ... FAILED

Serializability

We currently target snapshot isolation, but we also want serializability. The Hekaton paper does talk about how to do this, but we haven't implemented any of that.

Fix lost update anomaly

There's a test_lost_update() test that's disabled because we don't detect the write-write conflict and abort later transactions.

Row manager

Implement a row manager to decouple the in-memory index represented by Database and the actual stored rows. For example, let's use a log structured memory allocator to allocate records and make RowVersion point to the LSA-backed row. With reference counting, we can now return rows from the index using zero-copy.

[Clarification] is ScanSet required?

Hi,
the optimistic transaction in paper contains 3 sets: ReadSet / WriteSet / ScanSet, I only see ReadSet / WriteSet here, is ScanSet not needed for the implementation?

Thanks.

Index support

The MVCC database is already an index internally, but we also need the ability for clients to insert row IDs instead of the records.

Copy-on-write cloning

Many database management systems implement "branching", which allows users to create a copy-on-write clones of a database. With a row manager (#20), we can make a copy of the in-memory index and bump the reference counts. This essentially gives us copy-on-write semantics with MVCC because updates create a new version rather than modify the rows in-place.

For example, let's say we have a database A and we make a clone B. Initially, they both point to the same set of rows. If either one of them updates a row, the other one will not be affected because the updating clone just marks the shared row version as deleted, but the actual row is unchanged.

As a future optimization, we could also use a copy-on-write hash table to even share the index between clones.

Garbage collect transactions

We need to garbage collect transactions that are no longer needed to avoid eventually overflowing the memory.

Bug: Committed rows can become invisible to new transactions

The paper has a typo, which I have confirmed with one of the authors. Since the code closely follows the paper and the bug has crept in here.

Let me explain with a diagram, following the same conventions as the paper:

Untitled-2023-04-10-1232

  1. At time 60, we observe our initial state, where the value of Larry is 170. This is a committed row.
  2. At time 75, a transaction is started and is in Active state. It wants to update the value to 150, so it appended row 2
  3. At time 80, another new transaction, Tx80 is started, and it wants to read the value of Larry. Both Tx75 and Tx80 are in Active state.

What is the value Tx80 going to read?

Tx80 can't see row 2 because of Table 1 rules. If Tx75 is in Active state, only Tx75 can see row 2. This makes sense because row 2 is fresh, and Tx75 might drop the changes later. We don't want any other transactions to see this row.

But can Tx80 see row 1? The rules from the paper contradict that since Tx75 is in Active state:

Screenshot 2023-04-09 at 11 51 24 PM

This is a typo, and Tx80 should be able to see row 1 since it is a committed row.

Code

This is the line which causes this bug - https://github.com/penberg/mvcc-rs/blob/44df6c/database/src/database.rs#L438

We can reproduce this error:

  1. Lets first insert a new row and commit it
  2. Start a new transaction te, updating this row. Let te be in active state
  3. Start one more transaction tx. Let this be in the active state too. According to the code, tx won't see the committed value.

Fix

  • tx should be able to see this row
  • te should not see this, instead the version it is updating (e.g. row2 from the previous diagram)
  • usual rules about the time ranges apply

Column store indexes

The "Real-Time Analytical Processing with SQL Server" paper by Larson et al. (VLDB 2015) describe how they build column store indexes for Hekathon. Let's look into adopting something like that to make the MVCC implementation work better for HTAP workloads.

Tracing

Let's add tracing to the database operations such as begin_tx(), commit_tx() and so on for debuggability.

Compact row versions

As transactions commit, we keep creating new versions. But we need to compact the versions at some point to avoid running out of memory.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.