
kipdata / kipdb


A lightweight, asynchronous KV database based on LSM Leveled Compaction

License: Apache License 2.0

Rust 99.68% Dockerfile 0.19% Shell 0.13%
async database kv-store rust hash lsm-tree sled docker key-value-store lightweight

kipdb's Introduction

KipDB - Keep it Public DB


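KipDB: a lightweight key-value storage engine

The overall design is modeled on LevelDB and is intended to serve as the storage engine of a NewSQL distributed database.

  • Supports multiple usage scenarios: embedded, standalone storage, and remote calls
  • Follows KISS as its development philosophy; the design favors simplicity and efficiency
  • Implements MVCC to support ACID
  • High performance: in benchmarks, write throughput is roughly twice that of Sled, and the average sequential read latency on large data sets is around 1μs
  • Remote connections are implemented with ProtoBuf, allowing communication across languages
  • Very small memory footprint (when idle or holding large amounts of cold data)
  • Concurrency-safe: read-read and read-write operations run in parallel

Wiki on component internals: https://github.com/KKould/KipDB/wiki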

Quick Start 🤞

Tips: When using RPC, make sure the Protocol Buffer Compiler is installed.

Adding the dependency

kip_db = "0.1.2-alpha.15"

Building

Basic build

# Build
cargo build

# Build (release)
cargo build --release

# Unit tests
cargo test

# Benchmarks
cargo bench

Building the Docker image

# Build the image
docker build -t kould/kip-db:v1 .

# Run the image
docker run kould/kip-db:v1

Direct calls (basic usage)

// Open a KvStore in the given directory
let kip_db = LsmStore::open("/welcome/kip_db").await?;

// Insert data
kip_db.set(&b"https://github.com/KKould/KipDB", Bytes::from(&b"your star plz"[..])).await?;
// Get data
let six_pence = kip_db.get(&b"my deposit").await?;
// Disk space currently in use
let just_lot = kip_db.size_of_disk().await?;
// Number of entries stored
let how_many_times_you_inserted = kip_db.len().await?;
// Remove data
kip_db.remove(&b"ex girlfriend").await?;

// Create a transaction
let mut transaction = kip_db.new_transaction().await?;
// Insert data into the transaction
transaction.set(&b"this moment", Bytes::from(&b"hope u like it"[..]));
// Remove the value for a key within this transaction
transaction.remove(&b"trouble")?;
// Get the value for a key within this transaction
let ping_cap = transaction.get(&b"dream job")?;
// Commit the transaction
transaction.commit().await?;

// Create an iterator over the persisted data
let guard = kip_db.iter().await?;
let mut iterator = guard.iter()?;

// Get the next element
let hello = iterator.next_err()?;
// Seek to the last element
let world = iterator.seek(Seek::Last)?;

// Force data to be flushed to disk
kip_db.flush().await?;

Remote usage

Starting the server

// Start the server
let listener = TcpListener::bind("127.0.0.1:8080").await?;

kip_db::net::server::run(listener, tokio::signal::ctrl_c()).await;

Remote calls

// Call from the client
let mut client = Client::connect("127.0.0.1:8080").await?;

// Insert data
client.set(&vec![b'k'], vec![b'v']).await?;
// Get data
client.get(&vec![b'k']).await?;
// Disk space currently in use
client.size_of_disk().await?;
// Number of entries stored
client.len().await?;
// Flush data to disk
client.flush().await?;
// Remove data
client.remove(&vec![b'k']).await?;
// Execute a batch of commands (optionally in parallel or sequentially)
let vec_batch_cmd = vec![CommandData::get(b"k1".to_vec()), CommandData::get(b"k2".to_vec())];
client.batch(vec_batch_cmd, true).await?;

Multiple built-in persistence kernels 👍

  • LsmStore: LSM storage using the Leveled Compaction strategy (default kernel)
  • HashStore: Bitcask-like
  • SledStore: a wrapper around the Sled database
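
To make the "Bitcask-like" description more concrete, here is a minimal sketch of the general idea: an append-only log plus an in-memory hash index pointing at the latest record for each key. This is not HashStore's code. `MiniBitcask` and its methods are hypothetical names, a `Vec<u8>` stands in for the log file, and removal is simplified to dropping the index entry (a real Bitcask also appends a tombstone record).

```rust
use std::collections::HashMap;

/// Hypothetical illustration of a Bitcask-style store (not KipDB's HashStore).
/// A `Vec<u8>` stands in for the append-only log file.
pub struct MiniBitcask {
    log: Vec<u8>,
    /// Maps each live key to (offset, length) of its latest value in the log.
    index: HashMap<Vec<u8>, (usize, usize)>,
}

impl MiniBitcask {
    pub fn new() -> Self {
        Self { log: Vec::new(), index: HashMap::new() }
    }

    /// Appends the value to the log and points the index at the new record.
    pub fn set(&mut self, key: &[u8], value: &[u8]) {
        let offset = self.log.len();
        self.log.extend_from_slice(value);
        self.index.insert(key.to_vec(), (offset, value.len()));
    }

    /// Finds the latest value with one index probe and one log read.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        let &(offset, len) = self.index.get(key)?;
        Some(&self.log[offset..offset + len])
    }

    /// Drops the key from the index; stale bytes stay in the log until compaction.
    pub fn remove(&mut self, key: &[u8]) -> bool {
        self.index.remove(key).is_some()
    }

    /// Total bytes appended so far, including overwritten values.
    pub fn log_len(&self) -> usize {
        self.log.len()
    }
}
```

Overwriting a key only appends, so the log keeps growing until a compaction pass rewrites the live records. That is the trade-off this layout makes for fast writes and single-seek reads.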

Usage examples ⌨️

Server

PS D:\Workspace\kould\KipDB\target\release> ./server -h
KipDB-Server 0.1.0
Kould <[email protected]>
A KV-Store server

USAGE:
server.exe [OPTIONS]

OPTIONS:
-h, --help           Print help information
--ip <IP>
--port <PORT>
-V, --version        Print version information

PS D:\Workspace\kould\KipDB\target\release> ./server   
2022-10-13T06:50:06.528875Z  INFO kip_db::kernel::lsm::ss_table: [SsTable: 6985961041465315323][restore_from_file][TableMetaInfo]: MetaInfo { level: 0, version: 0, data_len: 118, index_len: 97, part_size: 64, crc_code: 43553795 }, Size of Disk: 263
2022-10-13T06:50:06.529614Z  INFO kip_db::net::server: [Listener][Inbound Connections]
2022-10-13T06:50:13.437586Z  INFO kip_db::net::server: [Listener][Shutting Down]

Client

PS D:\Workspace\kould\KipDB\target\release> ./cli --help
KipDB-Cli 0.1.0
Kould <[email protected]>
Issue KipDB Commands

USAGE:
    cli.exe [OPTIONS] <SUBCOMMAND>

OPTIONS:
    -h, --help                   Print help information
        --hostname <hostname>    [default: 127.0.0.1]
        --port <PORT>            [default: 6333]
    -V, --version                Print version information

SUBCOMMANDS:
    batch-get
    batch-remove
    batch-set
    flush
    get
    help                     Print this message or the help of the given subcommand(s)
    len
    remove
    set
    size-of-disk
    
PS D:\Workspace\kould\KipDB\target\release> ./cli batch-set kould kipdb welcome !
2022-09-27T09:50:11.768931Z  INFO cli: ["Done!", "Done!"]

PS D:\Workspace\kould\KipDB\target\release> ./cli batch-get kould kipdb          
2022-09-27T09:50:32.753919Z  INFO cli: ["welcome", "!"]

Features🌠

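  • Major Compaction
    • Multi-level incremental cyclic compaction ✅
    • Mutual exclusion of SSTable compaction state
      • Avoids overlapping data ranges during parallel compaction ✅
  • KVStore
    • Additional APIs modeled on Sled
      • size_of_disk ✅
      • clear
      • contains_key
      • iter ✅
      • len ✅
      • is_empty ✅
      • ...
    • Multi-process lock ✅
      • Prevents data corruption caused by multiple processes reading and writing the same files
  • SSTable
    • Bloom filter ✅
      • Speeds up key lookups
    • MetaBlock ✅
      • Stores statistics and the Bloom filter
  • Block
    • DataBlock and IndexBlock share one implementation and a common cache ✅
    • Prefix compression, varint encoding, and LZ4 to reduce space usage ✅
    • Prefix-based binary search ✅
  • Cache
    • TableCache: lazy loading through SSTableLoader ✅
    • BlockCache: cache of data blocks located through the sparse index ✅
    • LevelDB-style concurrent LruCache: ShardingLruCache ✅
  • Iterator
    • BlockIterator ✅
    • SSTableIterator ✅
    • LevelIterator ✅
    • VersionIterator ✅
  • WAL (write-ahead log for crash recovery)
    • Data recovery on restart after a failure while flushing to disk ✅
    • Falls back to the WAL when requested data is not found ✅
  • MVCC single-node transactions ✅
    • Multi-version persistence of the Manifest ✅
    • Multi-version persistence of SSTables ✅
  • Network communication
    • Cross-language serialization with ProtoBuf ✅
    • Ruby of KipDB
    • Java of KipDB
    • Rust of KipDB ✅
  • Distributed
    • Keep state consistent using the Raft replication protocol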
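Flame graph profiling with Perf

  • To make profiling for performance tuning easier, two Dockerfiles are provided
    • Dockerfile: the KipDB Server and Cli
    • Dockerfile-perf: external Perf profiling

Steps

  1. Build the KipDB image: docker build -t kould/kip-db:v1 .
  2. Build the Perf profiling image: docker build -f Dockerfile-perf -t kould/perf:v1 .
  3. Run kould/kip in any way you like
    • Example: docker run kould/kip-db:v1
  4. Run attach-win.sh <kip-db container ID>
    • Example: ./attach-win.sh 263ad21cc56169ebec79bbf614c6986a78ec89a6e0bdad5e364571d28bee2bfc
  5. In that bash session, run . record.sh <PID of the kip-db server process>
    • If you do not know the process ID, run ps; it is usually 1
    • Note: do not close this bash session, or the recording will fail!
  6. Then perform the KipDB operations you want to profile
  7. When finished, return to the bash session from step 5 and press ctrl + c to stop recording; this produces perf.data
  8. Still in that bash session, run . plot.sh <image name.svg> to generate the flame graph
    • Exported images can generally be retrieved with docker cp, docker exec, or a mounted volume. To make previewing and copying files easier, the container ships with a lightweight web server: run thttpd -p <port>. Because the scripts do not set up port forwarding, run docker inspect <target container ID> | grep IPAddress to find the target container's IP, then open it in a browser. For more flexibility, skip the scripts above and run the container manually with your own arguments.

Adapted from: https://chinggg.github.io/post/docker-perf/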

If you would like to take part in KipDB or KipSQL, feel free to contact me through the WeChat QR code below

WeChat contact information

Thanks For

JetBrains

kipdb's People

Contributors

arlottang, kkmaaan, kkould, lewiszlw, loloxwg, sacloudy, yongxin-hu


kipdb's Issues

Improve the Wiki content

Write documentation describing how each LSMStore component works

  • VersionEdit snapshots
  • WAL
  • Iterators

Remove does not work

Bug Report

After data is stored and flushed, removing it has no effect: the data can still be read afterwards.

#[tokio::main]
async fn main() -> Result<(), KernelError> {
    let temp_dir = TempDir::new().expect("unable to create temporary working directory");
    let config = Config::new(temp_dir.into_path()).enable_level_0_memorization();
    let kip_storage = KipStorage::open_with_config(config).await?;

    println!("Set KeyValue -> (apple, banana)");
    kip_storage
        .set(b"apple", Bytes::copy_from_slice(b"banana"))
        .await?;

    println!(
        "Get Key: apple -> Value: {:?}",
        kip_storage.get(b"apple").await?
    );
    println!("SizeOfDisk: {}", kip_storage.size_of_disk().await?);
    println!("Len: {}", kip_storage.len().await?);
    println!("IsEmpty: {}", kip_storage.is_empty().await);

    kip_storage.flush().await?;

    let join_cmd_1 = vec![
        CommandData::set(b"moon".to_vec(), b"star".to_vec()),
        CommandData::remove(b"apple".to_vec()),
    ];
    println!(
        "Join 1: {:?} -> {:?}",
        join_cmd_1.clone(),
        kip_storage.join(join_cmd_1).await?
    );
    let join_cmd_2 = vec![
        CommandData::get(b"moon".to_vec()),
        CommandData::get(b"apple".to_vec()),
    ];
    println!(
        "Join 2: {:?} -> {:?}",
        join_cmd_2.clone(),
        kip_storage.join(join_cmd_2).await?
    );

    Ok(())
}

Show:

Set KeyValue -> (apple, banana)
Get Key: apple -> Value: Some(b"banana")
SizeOfDisk: 0
Len: 1
IsEmpty: false
Join 1: [Set { key: [109, 111, 111, 110], value: [115, 116, 97, 114] }] -> [None]
Join 2: [Get { key: [109, 111, 111, 110] }, Get { key: [97, 112, 112, 108, 101] }] -> [Some([115, 116, 97, 114]), Some([98, 97, 110, 97, 110, 97])]
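
For reference, the behavior this report expects can be modeled with tombstones, which is one common way for LSM stores to make a delete shadow values that were already flushed. The sketch below is only a toy model under that assumption. `TinyLsm` is a hypothetical type, and nothing here describes how KipDB implements or fixed `remove`.

```rust
use std::collections::BTreeMap;

/// Hypothetical two-layer model: a mutable memtable over one immutable
/// flushed table. A `None` value is a tombstone marking the key as deleted.
#[derive(Default)]
pub struct TinyLsm {
    mem: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
    flushed: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl TinyLsm {
    pub fn set(&mut self, key: &[u8], value: &[u8]) {
        self.mem.insert(key.to_vec(), Some(value.to_vec()));
    }

    /// Writes a tombstone instead of erasing, so older flushed values are shadowed.
    pub fn remove(&mut self, key: &[u8]) {
        self.mem.insert(key.to_vec(), None);
    }

    /// Moves the memtable contents (tombstones included) into the flushed layer.
    pub fn flush(&mut self) {
        let mem = std::mem::take(&mut self.mem);
        self.flushed.extend(mem);
    }

    /// The newest layer wins; a tombstone in that layer reads as "not found".
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.mem
            .get(key)
            .or_else(|| self.flushed.get(key))
            .and_then(|slot| slot.as_deref())
    }
}
```

Replaying the report's sequence on this model (set `apple`, flush, remove `apple`, get `apple`) returns `None`, which is the result the reporter expected from the second `join` call.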

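Isolate files of different Stores in the benchmark

Bug Report

In the benchmark, Sled and KipDB run interleaved to provide comparative performance data, but T::open does not distinguish the directory name per store, so the test data of the two databases ends up mixed in the same folder.

Following the way benchmark test names are built, T::name could be inserted into the format call that names the instance directory passed to T::open.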
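
A minimal sketch of that suggestion follows. The real benchmark trait is not shown in this issue, so `BenchStore`, `KipDbBench`, `SledBench`, and `bench_dir` are hypothetical stand-ins; only the idea of putting `T::name()` into the directory name comes from the report.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the benchmark's storage trait; only `name` matters here.
pub trait BenchStore {
    fn name() -> &'static str;
}

pub struct KipDbBench;
pub struct SledBench;

impl BenchStore for KipDbBench {
    fn name() -> &'static str {
        "KipDB"
    }
}

impl BenchStore for SledBench {
    fn name() -> &'static str {
        "Sled"
    }
}

/// Builds a per-store directory such as `<base>/KipDB_bench`,
/// so interleaved benchmark runs never share files.
pub fn bench_dir<T: BenchStore>(base: &str) -> PathBuf {
    PathBuf::from(base).join(format!("{}_bench", T::name()))
}
```

Each store then opens its own directory, for example `bench_dir::<SledBench>("target")` and `bench_dir::<KipDbBench>("target")`.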

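Refactor the CLI client

Currently clap is used to provide a simple test client; it would be great if it also supported an interactive command mode.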

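Optimize VersionStatus statistics

Currently, when VersionStatus restores its state from the VersionLog, it loads the SSTable referenced by each VersionEdit in order to obtain its statistics.

SSTable loading can be reduced here to speed up recovery:
Example: add a parameter such as SSTableMeta to Version::NewFile and use it when computing statistics, avoiding the SSTable load.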
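
The proposal can be sketched as follows: each edit record carries the table's statistics, and recovery folds over the log without opening any SSTable. The types below are hypothetical simplifications made up for illustration (field names such as `table_id`, `size_of_disk`, and `len` are assumptions), and KipDB's real edit and metadata types may look different.

```rust
/// Hypothetical per-table metadata stored inside the edit log entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SsTableMeta {
    pub size_of_disk: u64,
    pub len: usize,
}

/// Hypothetical, simplified edit records that embed the metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VersionEdit {
    NewFile { table_id: i64, level: usize, meta: SsTableMeta },
    DeleteFile { table_id: i64, level: usize, meta: SsTableMeta },
}

/// Aggregated statistics for the recovered version.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct VersionStats {
    pub size_of_disk: u64,
    pub len: usize,
}

/// Replays the log and rebuilds the statistics from the embedded metadata
/// alone, so no SSTable has to be opened during recovery.
pub fn replay(edits: &[VersionEdit]) -> VersionStats {
    let mut stats = VersionStats::default();
    for edit in edits {
        match edit {
            VersionEdit::NewFile { meta, .. } => {
                stats.size_of_disk += meta.size_of_disk;
                stats.len += meta.len;
            }
            VersionEdit::DeleteFile { meta, .. } => {
                stats.size_of_disk = stats.size_of_disk.saturating_sub(meta.size_of_disk);
                stats.len = stats.len.saturating_sub(meta.len);
            }
        }
    }
    stats
}
```

The cost is a slightly larger log entry per edit, in exchange for recovery time that no longer depends on the number of SSTables that must be loaded.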

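Refactor the SSTable FilterBlock

Feature Request

The SSTable is currently simplified and uses only the MetaBlock to store the BloomFilter.
With later optimizations to the BF in mind, and to reduce extra dependencies, it should be reimplemented and refactored.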

Some questions about cargo run

This is the output of my rustup show

Default host: x86_64-unknown-linux-gnu
rustup home:  /root/.rustup

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

nightly-x86_64-unknown-linux-gnu (overridden by '/root/rs_code/KipDB/rust-toolchain')
rustc 1.72.0-nightly (e6d4725c7 2023-06-05)

When I run cargo run --bin server on Ubuntu, it reports two errors

error: expected one of `)`, `,`, `.`, `?`, or an operator, found `:`
   --> src/kernel/lsm/compactor.rs:145:71
    |
145 |                         (future::try_join_all(ss_table_futures).await?: Vec<(SSTable, Scope)>)
    |                                                                       ^ expected one of `)`, `,`, `.`, `?`, or an operator


error: calls to `std::mem::drop` with a value that implements `Copy` does nothing
   --> src/kernel/utils/lru_cache.rs:389:17
    |
389 |                 drop(node.as_ptr());
    |                 ^^^^^-------------^
    |                      |
    |                      argument has type `*mut Node<K, V>`
    |
    = note: use `let _ = ...` to ignore the expression or result

After making the following changes, the server starts

let (vec_new_ss_table, vec_new_scope): (Vec<SSTable>, Vec<Scope>) =
    future::try_join_all(ss_table_futures)
        .await?
        .into_iter()
        .unzip();
let _ = node.as_ptr();

Am I using it incorrectly, or is this something that will be resolved later in development?

Supported Data Clearing Method: `clear`

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:
We need a method, such as clear, that cleans up data both on disk and in memory.
Tips: MVCC needs to be taken into account when clearing; the latest Version should be the target of the cleanup.

some question about kernel.io.directio

Question

Direct IO usually means O_DIRECT [1] and related IO, which can bypass parts of the IO stack [2]. It usually requires IO sizes to be aligned, and it can bypass the page buffer/cache.

Should we rename this to Local IO or something else?

Besides, reading with the current reader is limited to a single thread, since read only supports a single thread (it takes &mut self). Should we support a readAt (pread [3]) for possible multi-threaded reads?

impl Read for DirectIoReader {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.fs.read(buf)
    }
}

[1] https://stackoverflow.com/questions/41257656/what-does-o-direct-really-mean
[2] https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
[3] https://man7.org/linux/man-pages/man2/pwrite.2.html
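
As a rough illustration of the `pread` idea, Rust's standard library exposes positional reads on Unix through `std::os::unix::fs::FileExt`, whose methods take `&self` and leave the file cursor untouched. The sketch below assumes a Unix target. `read_range` and `parallel_read` are hypothetical helpers and not part of KipDB.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::sync::Arc;
use std::thread;

/// Reads `len` bytes at `offset` without moving the file cursor.
/// `read_exact_at` takes `&self`, so one handle can be shared across threads.
pub fn read_range(file: &File, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}

/// Reads `chunks` consecutive chunks of `chunk` bytes each, one thread per chunk.
pub fn parallel_read(file: Arc<File>, chunk: usize, chunks: usize) -> io::Result<Vec<Vec<u8>>> {
    let handles: Vec<_> = (0..chunks)
        .map(|i| {
            let file = Arc::clone(&file);
            thread::spawn(move || read_range(&file, (i * chunk) as u64, chunk))
        })
        .collect();
    // Join in spawn order so the chunks come back in file order.
    handles
        .into_iter()
        .map(|h| h.join().expect("reader thread panicked"))
        .collect()
}
```

Because no reader needs `&mut self`, the handle can sit behind an `Arc` with no lock around it. A portable version would need a different API on Windows, such as `std::os::windows::fs::FileExt::seek_read`.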
