Giter Site home page Giter Site logo

fjall-rs / fjall Goto Github PK

View Code? Open in Web Editor NEW
132.0 3.0 6.0 2.32 MB

LSM-based embedded key-value storage engine written in safe Rust

License: Apache License 2.0

Rust 99.22% JavaScript 0.78%
database lsm lsm-tree lsmt rust rust-lang storage-engine mit-license embedded-database embedded-kv

fjall's People

Contributors

marvin-j97 avatar renovate[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

fjall's Issues

Data partitioning/Column families

Column families/partitions are the missing piece to having a great storage engine. Each column family is its own partition inside the LSM, but all share the same journal, enabling atomic writes. So each column family is physically stored separately from the others. Compaction then looks at each partition instead, never mixing tables of different partitions. It's like a big meta-LSM tree, instead of using multiple trees, which don't have atomic semantics, unless putting another super structure on top. That super structure will be the new LSM tree which contains partitions basically. Compaction may even be set differently for each column family.

image

When creating no column family, everything gets put into the "default" column family.
Creating column families is pretty simple, you add a new memtable. Dropping a column family is simple. Delete its memtable and all the segments inside the partition. The journal needs to be handled accordingly, because flushing one column family doesn't necessarily mean the others are flushed too. And if the column family was deleted, the journal should not flush parts to the partition at all.

There needs to be new semantics for writing to a column family:

  • insert(col_family, key, value)
  • remove(col_family, key)
  • batch() ... needs col_family support

https://github.com/facebook/rocksdb/wiki/Column-Families

Example use cases

  • One partition could be an index in a relational database for example, so writing a row into "default" also atomically inserts/updates rows inside "index-1" (https://github.com/facebook/mysql-5.6/wiki/MyRocks-data-dictionary-format)
  • Dropping the index just drops the column family "index-1", which is a O(1) operation
  • another example would be locality groups in Bigtable and column families in RocksDB and Cassandra

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

cargo
Cargo.toml
  • byteorder 1.5.0
  • crc32fast 1.4.0
  • lsm-tree 0.6.3
  • log 0.4.21
  • std-semaphore 0.1.0
  • tempfile 3.10.1
  • fs_extra 1.3.0
  • path-absolutize 3.1.1
  • criterion 0.5.1
  • nanoid 0.4.0
  • test-log 0.2.15
  • rand 0.8.5
github-actions
.github/workflows/release.yml
  • actions/checkout v4
  • katyo/publish-crates v2
.github/workflows/test.yml
  • actions/checkout v4
  • Swatinem/rust-cache v2
  • actions/checkout v4

  • Check this box to trigger a request for Renovate to run again on this repository

Transactions

We should have all the pieces to get transactions working:

  • Snapshots achieved by MVCC
  • Atomicity achieved by Batch

Optional TTL for FIFO strategy

  1. Look at each segment's metadata
  2. Compare with TTL setting
  3. If now > (segment.created_at + ttl_seconds), drop that segment (use DropSegments)
  4. Then check the max L0 size setting, and use that to maybe drop segments as well

Allow more characters in partition name.

motivation

I am seeking a lsm-tree based database as my in-memory database backend. Firstly, I found crate lsm_tree, but it doesn't contain auto flushing and partition, the only thing i can do is to store different table in one folder, that's fine. Then I find this crate, it exactly is the wheel that I am looking for. After a few minutes migration, a big problem occurs.

the problem

I was using a reflection of a type as the name of the lsm_tree file to store different types in different trees. Let's say, the dictionary-like generic api for my storage is Dense<K, V, D>, D is for delta. When lib user want to store a type like <u64,u64,()>, the name after reflection of this type is something like Dense<u64, u64, ()>
As you can see, the name contains character <>, and blank space, which cannot used as the name of partition.

solution

remove the limit for partition name
or, use a AsRef<[u8]> as the partition name api.

Bloom Filters

Need a bloom (or XOR or ribbon or cuckoo) filter, that:

  1. can be restored from disk/raw bytes

This is obvious. On recovery, load all bloom filters back into memory from disk. So the bloom filter needs to be able to give us its internal bytes and needs a constructor to recreate it from raw bytes.

  1. can be resized while writing to it

RocksDB used to store a bloom filter per data block, so filter construction is simple. However, its read path is much worse because you need to travel through the SST file. The new full filter format just stores one big bloom filter per SST file. That requires much more memory when flushing and compacting because all keys or hashes need to be buffered until the SST is written because the number of items is unknown until everything is written. Then the bloom filter needs to be built from the buffer.

Using a scalable filter may solve this memory issue.

  1. Also...

M O N K E Y may maximize efficiency for a given amount of memory: http://daslab.seas.harvard.edu/monkey/.

Find MSRV

cargo msrv said 1.68.2, but it's not building on Mac.


Search for TODO: #8

Periodic (maintenance) compaction

On a regular, but infrequent, interval, check each level and maybe compact it into itself (need a SingleLevelCompactor), or pull down into last level ideally (to remove tombstones).

Optimistic concurrency control through sequence number

Is your feature request related to a problem? Please describe.
Currently it's not possible to ensure you work with latest key version.

Describe the solution you'd like
Make possible to keys have sequence number, which may be used for denying modifying operations, if last sequence of stream and requested number doesn't match.

`tree.get` function returned the unexpected value

tree.get function seems using prefix to match key.

Here is my code:

let folder = "";

// A tree is a single physical keyspace/index/...
// and supports a BTreeMap-like API
// but all data is persisted to disk.
let tree = Config::new(folder).open().unwrap();

// Note compared to the BTreeMap API, operations return a Result<T>
// So you can handle I/O errors if they occur
tree.insert("hello-key-999991", "hello-value-999991")
    .unwrap();

let item = tree.get("hello-key-99999").unwrap();
match item {
    Some(value) => {
        println!("value: {}", std::str::from_utf8(&value).unwrap());
    }
    None => println!("Not Found"),
}

// Flush to definitely make sure data is persisted
tree.flush().unwrap();

Expect Not Found, but printed value: hello-value-999991

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.