ebay / jungle

An embedded key-value store library specialized for building state machines and log stores.

License: Apache License 2.0

Languages: CMake 1.44%, C++ 96.34%, Shell 0.18%, Python 0.91%, C 1.13%
Topics: key-value-store, embedded-kv, lsm-tree, b-tree, state-machine, logstore, hybrid

jungle's Introduction

Jungle

An embedded key-value storage library based on a combined index of an LSM-tree and a copy-on-write (append-only) B+tree. For details, please refer to our paper.

Jungle is specialized for building the replicated state machines of consensus protocols such as Paxos or Raft, as it provides chronological ordering and lightweight persistent snapshots. It can also be used to build a log store.

Features

  • Ordered mapping of key and its value on disk (file system). Both key and value are arbitrary-length binary data.
  • Monotonically increasing sequence number for each key-value modification.
  • Point lookup by key or by sequence number.
  • Range lookup by key or by sequence number, using an iterator:
    • Snapshot isolation: each individual iterator is a snapshot.
    • Bi-directional traversal and jump: prev, next, gotoBegin, gotoEnd, and seek.
  • Lightweight persistent snapshot, based on sequence number:
    • Creating a snapshot has almost no overhead.
    • Snapshots are durable: they are preserved across process restarts.
  • Tunable configurations:
    • The number of threads for log flushing and compaction.
    • Custom size ratio between LSM levels.
    • Compaction factor (please refer to the paper).
  • Log store mode:
    • Ordered mapping of sequence number and value, eliminating key indexing.
    • Lightweight log truncation based on sequence number.

Things we DO NOT (and also WILL NOT) support

  • Secondary indexing, or SQL-like query:
    • Jungle does not interpret the contents of a value; from Jungle's point of view, a value is just opaque binary data.
  • Client-server style services, or any other network-related tasks such as replication:
    • Jungle is a library that should be embedded into your process.

Benefits

Compared to other widely used LSM-based key-value storage libraries, benefits of Jungle are as follows:

  • Smaller write amplification.
    • Jungle can have 4-5 times less write amplification while providing a similar level of write performance.
  • Chronological ordering of key-value pairs.
    • Combined with persistent logical snapshots, this feature is very useful when Jungle is used as the replicated state machine for Paxos or Raft.

How to Build

1. Install CMake:

  • Ubuntu
$ sudo apt-get install cmake
  • OSX
$ brew install cmake

2. Build

jungle$ ./prepare.sh -j8
jungle$ mkdir build
jungle$ cd build
jungle/build$ cmake ../
jungle/build$ make

3. Run unit tests:

jungle/build$ ./runtests.sh

How to Use

Please refer to this document.

Example Implementation

Please refer to examples.

Supported Platforms

  • Ubuntu (tested on 14.04, 16.04, and 18.04)
  • CentOS (tested on 7)
  • OSX (tested on 10.13 and 10.14)

Platforms to be supported in the future

  • Windows

Contributing to This Project

We welcome contributions. If you find bugs, potential flaws, or edge cases, or if you have improvements, new feature suggestions, or topics for discussion, please submit an issue or a pull request.

Contact

Coding Convention

  • Keep lines within 90 characters where possible.
  • Indent: 4 spaces, K&R (1TBS).
  • Class & struct name: UpperCamelCase.
  • Member function and member variable name: lowerCamelCase.
  • Local variable, helper function, and parameter name: snake_case.
class MyClass {
public:
    void myFunction(int my_parameter) {
        int local_var = my_parameter + 1;
        if (local_var < myVariable) {
            // ...
        } else {
            // ...
        }
    }
private:
    int myVariable;
};

int helper_function() {
    return 0;
}
  • Header include order: local to global.
    1. Header file corresponding to this source file (if applicable).
    2. Header files in the same project (i.e., Jungle).
    3. Header files from the other projects.
    4. C++ system header files.
    5. C system header files.
    • Note: alphabetical order within the same category.
    • Example (my_file.cc):
#include "my_file.h"            // Corresponding header file.

#include "table_file.h"         // Header files in the same project.
#include "table_helper.h"

#include "forestdb.h"           // Header files from the other projects.

#include <cassert>              // C++ header files.
#include <iostream>
#include <vector>

#include <sys/stat.h>           // C header files.
#include <sys/types.h>
#include <unistd.h>

License Information

Copyright 2017-present eBay Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

3rd Party Code

  1. URL: https://github.com/couchbase/forestdb
    License: https://github.com/couchbase/forestdb/blob/master/LICENSE
    Originally licensed under the Apache 2.0 license.

  2. URL: https://github.com/stbrumme/crc32
    Copyright 2011-2016 Stephan Brumme
    License: https://github.com/stbrumme/crc32/blob/master/LICENSE
    Originally licensed under the zlib license.

  3. URL: https://github.com/greensky00/simple_logger
    License: https://github.com/greensky00/simple_logger/blob/master/LICENSE
    Originally licensed under the MIT license.

  4. URL: https://github.com/greensky00/testsuite
    License: https://github.com/greensky00/testsuite/blob/master/LICENSE
    Originally licensed under the MIT license.

  5. URL: https://github.com/greensky00/latency-collector
    License: https://github.com/greensky00/latency-collector/blob/master/LICENSE
    Originally licensed under the MIT license.

  6. URL: https://github.com/eriwen/lcov-to-cobertura-xml/blob/master/lcov_cobertura/lcov_cobertura.py
    License: https://github.com/eriwen/lcov-to-cobertura-xml/blob/master/LICENSE
    Copyright 2011-2012 Eric Wendelin
    Originally licensed under the Apache 2.0 license.

  7. URL: https://github.com/bilke/cmake-modules
    License: https://github.com/bilke/cmake-modules/blob/master/LICENSE_1_0.txt
    Copyright 2012-2017 Lars Bilke
    Originally licensed under the BSD license.

  8. URL: https://github.com/aappleby/smhasher/tree/master/src
    Copyright 2016 Austin Appleby
    Originally licensed under the MIT license.

jungle's People

Contributors

awitten1, dong-ho-kim, erfanz, greensky00, mohiuddin-shuvo, smallsmallc, tobecontinued, yfinkelstein, yong-li

jungle's Issues

How to tune parameters?

Do Jungle's parameters have counterparts in RocksDB, such as memtable size, SSTable size, block size, number of background jobs, direct I/O, cache size, etc.?

Could you please give some advice on tuning Jungle?

Hi, why is the space amplification of Jungle larger than that of tiering?

  • First, Jungle is excellent work. Thanks for your efforts!
  • After reading the paper, I have a question: why is the space amplification of Jungle larger than that of tiering?
    • According to the earlier analysis, the space amplification of Jungle may be similar to that of tiering. However, the evaluation results show that the space amplification of Jungle with C=5 or C=10 is larger than that of tiering, and it also seems to fluctuate more. I am confused about this.

Alpine (musl) build error

In file included from jungle/src/logger.cc:24:

jungle/src/backtrace.h:36:10: fatal error: 'execinfo.h' file not found
#include <execinfo.h>
         ^~~~~~~~~~~~

Some questions about using Jungle with Raft

In Raft, if the master (leader) handles read operations, do those reads enter the log store? If so, do follower nodes need to actively ignore them?

In the implementation, I see that a follower node can forward operations to the master node. In this case, must the operation enter the log store?