bytestorage / flydb

A high-performance KV storage engine based on the Bitcask paper, written in Go

License: Apache License 2.0

Go 99.83% Shell 0.14% Makefile 0.01% Dockerfile 0.02%
database golang go key-value kv-store redis flydb raft

flydb's Introduction

FlyDB aims to serve as an alternative to in-memory key-value stores (such as Redis) in some cases, striking a balance between performance and storage cost. It does this by optimizing resource allocation and using cost-effective storage media, so that data is managed efficiently while storage costs stay low.

📚 What is FlyDB ?

FlyDB is a high-performance key-value (KV) storage engine based on the Bitcask model. It offers fast and reliable data storage and retrieval. The Bitcask model keeps reads and writes simple and efficient, which makes FlyDB a good fit for scenarios that need fast, responsive data access and for applications that prioritize performance while balancing storage costs.
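To make the Bitcask model concrete, here is a minimal sketch of the idea: an append-only log file plus an in-memory map from each key to the position of its latest value. It is illustrative only. The names `miniCask` and `entryPos` and the record layout are assumptions made for this example, not FlyDB's actual types or on-disk format.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"path/filepath"
)

// entryPos records where a value lives in the log file (hypothetical type).
type entryPos struct {
	offset int64  // byte offset of the value within the log
	size   uint32 // length of the value in bytes
}

// miniCask is a toy Bitcask: one append-only file plus an in-memory key directory.
type miniCask struct {
	file   *os.File
	keydir map[string]entryPos
	end    int64 // current end of the log
}

// openMiniCask creates a fresh, empty log; a real engine would replay an existing one.
func openMiniCask(path string) (*miniCask, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0o644)
	if err != nil {
		return nil, err
	}
	return &miniCask{file: f, keydir: make(map[string]entryPos)}, nil
}

// Put appends [keyLen][valLen][key][value] and points the key at the new value.
// Older records for the same key stay in the file until a merge removes them.
func (c *miniCask) Put(key, val []byte) error {
	record := make([]byte, 8, 8+len(key)+len(val))
	binary.LittleEndian.PutUint32(record[0:4], uint32(len(key)))
	binary.LittleEndian.PutUint32(record[4:8], uint32(len(val)))
	record = append(record, key...)
	record = append(record, val...)
	if _, err := c.file.WriteAt(record, c.end); err != nil {
		return err
	}
	c.keydir[string(key)] = entryPos{offset: c.end + 8 + int64(len(key)), size: uint32(len(val))}
	c.end += int64(len(record))
	return nil
}

// Get finds the position in memory, then issues a single positioned read.
func (c *miniCask) Get(key []byte) ([]byte, bool, error) {
	pos, ok := c.keydir[string(key)]
	if !ok {
		return nil, false, nil
	}
	buf := make([]byte, pos.size)
	if _, err := c.file.ReadAt(buf, pos.offset); err != nil {
		return nil, false, err
	}
	return buf, true, nil
}

// demo writes a key twice and returns the value that Get sees afterwards.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "minicask")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	c, err := openMiniCask(filepath.Join(dir, "000000000.data"))
	if err != nil {
		return "", err
	}
	defer c.file.Close()
	if err := c.Put([]byte("name"), []byte("old")); err != nil {
		return "", err
	}
	if err := c.Put([]byte("name"), []byte("flydb-example")); err != nil {
		return "", err
	}
	val, _, err := c.Get([]byte("name"))
	return string(val), err
}

func main() {
	fmt.Println(demo())
}
```

Every write is sequential and every read costs one map lookup plus one positioned file read. The price is that all keys must fit in memory.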

🏁 Fast Start: FlyDB

You can install FlyDB using the Go command line tool:

go get github.com/ByteStorage/FlyDB@v1.1.0

Or clone this project from GitHub:

git clone https://github.com/ByteStorage/FlyDB.git

🖥 How to use FlyDB ?

Used by Golang SDK

Here is a simple example of how to use the Linux version:

See flydb/examples for details.

package main

import (
	"fmt"

	"github.com/ByteStorage/FlyDB/config"
	"github.com/ByteStorage/FlyDB/flydb"
)

func main() {
	options := config.DefaultOptions
	options.DirPath = "/tmp/flydb"

	// Open (or create) a database instance in the configured directory.
	db, err := flydb.NewFlyDB(options)
	if err != nil {
		fmt.Println("Open Error => ", err)
		return
	}

	// Store a key-value pair.
	err = db.Put([]byte("name"), []byte("flydb-example"))
	if err != nil {
		fmt.Println("Put Error => ", err)
	}

	// Read the value back.
	val, err := db.Get([]byte("name"))
	if err != nil {
		fmt.Println("Get Error => ", err)
	}
	fmt.Println("name value => ", string(val))

	// Remove the key. Note "=" rather than ":=": err is already declared.
	err = db.Delete([]byte("name"))
	if err != nil {
		fmt.Println("Delete Error => ", err)
	}
}

Used By Shell Command

./build.sh

Used By Docker

docker run -d --name flydb-server --network=host -p 8999:8999 bytestorage/flydb:v1.0

Used By Kubernetes

kubectl apply -f kubernetes/flydb-namespace.yaml
kubectl apply -f kubernetes/flydb-deployment.yaml
kubectl apply -f kubernetes/flydb-service.yaml
kubectl wait --for=condition=ready pod -l app=flydb -n flydb-system
kubectl port-forward svc/flydb-service -n flydb-system 8999:8999

After installing the FlyDB server via shell, Docker, or Kubernetes, you can use flydb-cli to connect to it:

./bin/flydb-client 127.0.0.1:8999

🚀 Performance test

We ran a simple performance test on FlyDB V1.0.4. The test focused on reading and writing data at scale, using 500,000 random records.

In V1.0.4, with 500,000 records, we measured:

BTree Index

PUT performance: 572.265968ms

GET performance: 355.943926ms

v1.0.4-btree

ARTree Index

PUT performance: 569.610614ms

GET performance: 297.781977ms

v1.0.4-art

If you know a better way to optimize read and write performance, please submit a PR.

📢 Benchmark test

We benchmarked FlyDB V1.0.4 against other KV databases written in Go and found that its read/write results exceeded those of most open-source KV databases.

See in detail: https://github.com/ByteStorage/contrast-benchmark

goos: linux
goarch: amd64
pkg: contrast-benchmark
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz

Benchmark_PutValue_FlyDB
Benchmark_PutValue_FlyDB-16        	   95023	     13763 ns/op	    2904 B/op	      16 allocs/op
Benchmark_GetValue_FlyDB
Benchmark_GetValue_FlyDB-16    	 	 2710143	     463.5 ns/op	     259 B/op	       5 allocs/op
Benchmark_PutValue_Badger
Benchmark_PutValue_Badger-16       	   59331	     22711 ns/op	    6006 B/op	      48 allocs/op
Benchmark_GetValue_Badger
Benchmark_GetValue_Badger-16       	  158686	      7686 ns/op	   10844 B/op	      42 allocs/op
Benchmark_PutValue_BoltDB
Benchmark_PutValue_BoltDB-16       	   32637	     56519 ns/op	   21009 B/op	     123 allocs/op
Benchmark_GetValue_BoltDB
Benchmark_GetValue_BoltDB-16       	  655971	     24327 ns/op	     723 B/op	      26 allocs/op 
Benchmark_PutValue_GoLevelDB
Benchmark_PutValue_GoLevelDB-16    	   71931	     14709 ns/op	    2226 B/op	      12 allocs/op
Benchmark_GetValue_GoLevelDB
Benchmark_GetValue_GoLevelDB-16    	  500736	      2520 ns/op	    1278 B/op	      15 allocs/op
Benchmark_PutValue_NutsDB
Benchmark_PutValue_NutsDB-16       	   78801	     13582 ns/op	    3242 B/op	      22 allocs/op
Benchmark_GetValue_NutsDB
Benchmark_GetValue_NutsDB-16       	  373124	      5702 ns/op	    1392 B/op	      14 allocs/op
Benchmark_PutValue_RoseDB
Benchmark_PutValue_RoseDB-16       	   69776	     19166 ns/op	    6242 B/op	      59 allocs/op
Benchmark_GetValue_RoseDB
Benchmark_GetValue_RoseDB-16       	 4155183	     298.0 ns/op	     167 B/op	       4 allocs/op
Benchmark_PutValue_Pebble
Benchmark_PutValue_Pebble-16       	   91304	     21877 ns/op	    2720 B/op	       8 allocs/op
Benchmark_GetValue_Pebble
Benchmark_GetValue_Pebble-16       	   66135	     15837 ns/op	   17193 B/op	      22 allocs/op
PASS

🔮 How to contact us ?

If you have any questions, you can email our developer team and we will reply:

Team Email: [email protected]

Or add me on WeChat and I will invite you to the project community, where you can exchange ideas and learn together with other developers.

When you add me on WeChat, please mention GitHub in the request.

✅ TODO List

  • Extended data structure support: including but not limited to string, list, hash, set, etc.
  • Compatible with Redis protocols and commands.
  • Support HTTP services.
  • Support TCP services.
  • Log aggregation.
  • Data backup.
  • Distributed cluster model.

📜 Version update doc

See in detail: Version-update-document

👀 Contributor

📝 How to contribute ?

If you have any ideas or suggestions for FlyDB, please feel free to submit issues or PRs on GitHub. We welcome your contributions!

For the complete contribution process, please refer to: CONTRIBUTEING

📋 Licence

FlyDB is released under the Apache license. For details, see the LICENSE file.

Thanks To JetBrains

Thanks to JetBrains for the free open source license.

flydb's People

Contributors

bigboss2063, chdlvy, crazyjius, kronlal, lim-yoona, lovevivi121, moeen89, qishenonly, saeid-a, sandtripper, sjcsjc123, uncle-justice, wangchenguang123, zhaoshuaiup

flydb's Issues

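How is the in-memory index updated during the merge process?

A question for the author:

  1. In Bitcask, the merge process merges all the old data files and then creates a hint file.

After the merge completes, the in-memory index hash table must be updated, because the files that hold the data have changed. What exactly does this update process look like?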
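One common way to handle the index update asked about here is sketched below. It is an assumption about how a Bitcask-style merge can work, not a description of FlyDB's code. While rewriting the sealed files, a record is copied only if the index still points at exactly that location, and only those index entries are repointed to the merged file. Keys that have been overwritten in the active file are left alone. Data files are modeled as in-memory slices, deletions (tombstones) are omitted, and every name (`store`, `pos`, `merge`) is hypothetical.

```go
package main

import "fmt"

// pos locates a record: which data file and which slot inside it.
type pos struct{ fileID, index int }

type record struct{ key, val string }

// store models data files as in-memory slices to keep the sketch short.
type store struct {
	files  map[int][]record
	keydir map[string]pos
	active int // ID of the file currently receiving writes
}

func newStore() *store {
	return &store{files: map[int][]record{0: nil}, keydir: map[string]pos{}}
}

func (s *store) put(key, val string) {
	s.files[s.active] = append(s.files[s.active], record{key, val})
	s.keydir[key] = pos{s.active, len(s.files[s.active]) - 1}
}

func (s *store) get(key string) (string, bool) {
	p, ok := s.keydir[key]
	if !ok {
		return "", false
	}
	return s.files[p.fileID][p.index].val, true
}

// rotate seals the active file and starts a new one.
func (s *store) rotate() {
	s.active++
	s.files[s.active] = nil
}

// merge rewrites the live records of all sealed files into one merged file,
// which reuses the ID of the newest sealed file.
func (s *store) merge() {
	if s.active == 0 {
		return // nothing is sealed yet
	}
	mergedID := s.active - 1
	var merged []record
	updates := map[string]pos{}
	for id := 0; id < s.active; id++ {
		for i, rec := range s.files[id] {
			// Live only if the index still points at this exact location.
			if s.keydir[rec.key] == (pos{id, i}) {
				merged = append(merged, rec)
				updates[rec.key] = pos{mergedID, len(merged) - 1}
			}
		}
		delete(s.files, id) // the old file is no longer needed
	}
	s.files[mergedID] = merged
	for k, p := range updates {
		s.keydir[k] = p // repoint only the entries that were moved
	}
}

func main() {
	s := newStore()
	s.put("a", "1")
	s.put("a", "2")
	s.put("b", "1")
	s.rotate()
	s.put("a", "3") // newer write lands in the active file
	s.merge()
	fmt.Println(s.get("a"))
	fmt.Println(s.get("b"))
	fmt.Println(len(s.files[0]))
}
```

In a real engine the same positions would typically also be written to the hint file, so that the next startup can rebuild the index without scanning the merged data.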

Optimization of multi-version concurrency control MVCC (Based on issue#94)

FlyDB's storage engine draws on the design of Bitcask and keeps all keys in memory. If MVCC is implemented on this basis, the corresponding version information must also be kept in memory, which may sharply increase memory usage.
If you take on issue #94, you will have to consider this problem and, hopefully, come up with a good optimization.

Optimize data read and write speed

You can change the data file format to 32 KB blocks, based on the WAL format of RocksDB, to speed up data reads and writes.
The WAL log format already exists in lib and can be optimized based on it.
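As a rough illustration of the block-based layout, the sketch below splits one record into chunks the way the legacy LevelDB/RocksDB log format does: 32 KB blocks, a 7-byte chunk header (4-byte checksum, 2-byte length, 1-byte type), and FULL/FIRST/MIDDLE/LAST chunk types. It only computes chunk boundaries and does not write checksums or bytes. The function name `fragment` and the types are hypothetical and are not taken from FlyDB's lib.

```go
package main

import "fmt"

const (
	blockSize  = 32 * 1024 // 32 KB blocks
	headerSize = 7         // 4-byte checksum + 2-byte length + 1-byte type
)

type chunkType byte

const (
	chunkFull   chunkType = iota + 1 // record fits in a single chunk
	chunkFirst                       // first chunk of a split record
	chunkMiddle                      // interior chunk of a split record
	chunkLast                        // final chunk of a split record
)

type chunk struct {
	kind chunkType
	size int // payload bytes carried by this chunk
}

// fragment splits a record of payloadLen bytes into chunks, starting at
// blockOffset (0..blockSize) within the current block. No chunk crosses a
// block boundary.
func fragment(payloadLen, blockOffset int) []chunk {
	var chunks []chunk
	first := true
	for {
		// Too little room left for a header: the tail of the block is
		// padding and the next chunk starts a new block.
		if blockSize-blockOffset < headerSize {
			blockOffset = 0
		}
		avail := blockSize - blockOffset - headerSize
		n := payloadLen
		if n > avail {
			n = avail
		}
		last := n == payloadLen
		var kind chunkType
		switch {
		case first && last:
			kind = chunkFull
		case first:
			kind = chunkFirst
		case last:
			kind = chunkLast
		default:
			kind = chunkMiddle
		}
		chunks = append(chunks, chunk{kind, n})
		payloadLen -= n
		blockOffset += headerSize + n
		first = false
		if last {
			return chunks
		}
	}
}

func main() {
	fmt.Println(fragment(100, 0))   // small record: one FULL chunk
	fmt.Println(fragment(40000, 0)) // larger than a block: FIRST + LAST
}
```

Because chunks never straddle a block boundary, a reader can resynchronize at the next 32 KB boundary after hitting a corrupted chunk, and reads can be issued in whole blocks.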

HintIndex file bug !

Similarly, a low-level bug occurs when testing the merge file function. After the hintIndex file is created, opening a database instance fails with the error: no such file or directory.

When I looked at the merge file's method for loading the hint index, the loadIndexFromHintFile function, I found that it returns err instead of nil when the file doesn't exist and needs to be created.

This is a very basic problem, and it surprised me.

Hopefully, in subsequent iterations, you'll pay attention to small details like this.

The bugs are as follows:
uTools_1683452263423
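A minimal sketch of the behaviour described in this report, under stated assumptions: the loader treats a missing hint file as "nothing to load" and returns nil, while every other error is still reported. The function `loadHintIndex` and the file name `hint-index` are hypothetical stand-ins, not FlyDB's actual loadIndexFromHintFile.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

const hintFileName = "hint-index" // assumed name, for illustration only

// loadHintIndex reports whether a hint file was found and loaded.
// A missing file is not an error: a database that has never been merged
// simply has no hint file yet.
func loadHintIndex(dir string) (bool, error) {
	f, err := os.Open(filepath.Join(dir, hintFileName))
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err // permission problems and the like still surface
	}
	defer f.Close()
	// ... decode hint records and rebuild the in-memory index here ...
	return true, nil
}

// demo calls the loader before and after the hint file exists.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "hint")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	before, err := loadHintIndex(dir)
	if err != nil {
		return false, false, err
	}
	if err := os.WriteFile(filepath.Join(dir, hintFileName), nil, 0o644); err != nil {
		return false, false, err
	}
	after, err := loadHintIndex(dir)
	return before, after, err
}

func main() {
	fmt.Println(demo())
}
```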

Data recovery

If a data operation, such as data insertion or deletion, fails, you need to perform the operation again according to the operation log.
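The recovery idea can be sketched as replaying an operation log over an empty state. This is only an illustration of the technique. The `logEntry` type and `replay` function are hypothetical, and a real log would be read from disk rather than held in a slice.

```go
package main

import "fmt"

type opKind int

const (
	opPut opKind = iota
	opDelete
)

// logEntry is one logged operation (hypothetical layout).
type logEntry struct {
	kind opKind
	key  string
	val  string // unused for deletes
}

// replay applies the log in order and returns the resulting state.
// Replaying the same log again yields the same state, so it is safe to
// rerun after a failure partway through.
func replay(entries []logEntry) map[string]string {
	state := make(map[string]string)
	for _, e := range entries {
		switch e.kind {
		case opPut:
			state[e.key] = e.val
		case opDelete:
			delete(state, e.key)
		}
	}
	return state
}

func main() {
	log := []logEntry{
		{opPut, "name", "flydb"},
		{opPut, "lang", "go"},
		{opDelete, "lang", ""},
		{opPut, "name", "flydb-example"},
	}
	fmt.Println(replay(log))
}
```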

Thesis address

Hello,
Please share the address of the bitcask thesis!
Thanks!

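Looking for like-minded people to join FlyDB development

FlyDB's functionality is not yet very complete, so I hope people will get involved. Whether you are a DB enthusiast, an expert, or even a complete beginner, you can contribute to FlyDB.

The project is currently short of people. If you are interested, feel free to add me on WeChat (vx: qishen_on) or QQ (1050026498) to talk!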

Questions about the Raft module

Questions about the Raft module:

  1. What are the benefits of dividing nodes into Master and Slave? In Raft, both read and write operations should go through the Raft Leader, so read-write separation is not possible in the first place. Concurrency can only be increased through Regions: each Region contains multiple replicas, each on a Raft node, and the Leaders of different Regions can be different Raft nodes, which allows concurrent operations across Regions. If the state machine is manipulated without going through the Leader, linear consistency cannot be guaranteed. For a key-value store, it is not only the metadata that needs linear consistency. Or is the goal not strong consistency?

  2. Does the Master store the operation logs sent by the clients, such as Get, Put, Del, etc.?

  3. What is the workflow for client operations? Is it that the client sends operation commands to the Leader of the Master cluster, and after the Leader replicates the operation logs, it commits and applies the logs to update the state machine? However, the state machine is not on the Master node but on the Slave nodes. How should the Apply operation be performed on the Slave cluster? Is it directly invoking RPC methods on each Slave to update the state machine? How is successful Apply ensured?

  4. Do the nodes in the Slave store the same data? If they are the same, how can linear consistency be maintained in the Slave cluster when it is not a Raft cluster?

Some personal suggestions:

  1. Combine the Raft nodes and the state machine nodes (DB), with the state machine on top of the Raft module. The client operation flow can be designed as follows: the client sends operation requests to the Raft Leader; the Leader replicates and commits the logs, applies them to the upper-level state machine, and then returns the result directly to the client. (Read operations can be optimized with Lease Index, which skips both the disk write of a read log entry and the network IO the Leader would otherwise need to confirm that it is still the Leader, and therefore performs well.) This reduces the complexity of the system architecture, better ensures linear consistency for reads and writes, and removes extra network IO (no Client-to-Slave or Master-to-Slave traffic, only Client-to-Raft-node).

  2. In this case, the Master does not need to maintain additional metadata such as heartbeat and directory tree for the Slave.

  3. Partition the keys based on hash or range, and implement operations such as automatic splitting when the partition becomes too large or automatic merging when it becomes too small.

  4. I think TiKV's distributed architecture is really worth learning.
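The hash-based partitioning mentioned in the suggestions above can be sketched as follows. This is a generic illustration using FNV-1a from the Go standard library. The function name `partitionFor` is hypothetical, and splitting or merging partitions is out of scope here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to one of `partitions` buckets (partitions > 0).
// The same key always lands in the same partition.
func partitionFor(key []byte, partitions int) int {
	h := fnv.New32a()
	h.Write(key) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(partitions))
}

func main() {
	for _, k := range []string{"name", "user:1", "user:2", "order:42"} {
		fmt.Printf("%-10s -> partition %d\n", k, partitionFor([]byte(k), 4))
	}
}
```

Plain modulo hashing moves most keys when the partition count changes. Range partitioning, as TiKV does with Regions, makes splitting and merging cheaper at the cost of possible hot ranges.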

New memory index —— SkipList

We need a new in-memory index, SkipList, to allow users to choose the index that is more efficient for the business during development.

Access to raft protocol

1. Define the function

You need to define a method NewFlyDbCluster whose parameters can be a master address list and a slave address list. Nodes interact through gRPC to build the master and slave clusters.

2. Use hashicorp/raft

Implementing leader election for the internal master cluster with the raft library requires defining several interfaces, such as obtaining the leader and joining the cluster.

3. Create tests

Several tests need to be created to cover this.

Optimize database startup speed

In the current FlyDB startup process, all the data is loaded and the index is built, which can take a long time if the database contains a lot of data.
So we need a new strategy to speed up the start-up.

Implementation of concurrent version control MVCC

FlyDB currently implements only a very simple ACID transaction model, SSI, which relies on a single global lock to serialize transactions.
We need to implement more complex multi-version concurrency control so that this simple SSI concurrency-control model does not cause incalculable losses in a production environment.
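For context, the global-lock approach described in this issue can be sketched as below. This is a minimal illustration of serializing transactions with one mutex, not FlyDB's transaction code. `txStore`, `Update`, and `errAbort` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAbort = errors.New("abort") // used to demonstrate rollback

// txStore serializes every transaction behind one global mutex.
type txStore struct {
	mu   sync.Mutex
	data map[string]int
}

// Update runs fn while holding the global lock. fn works on a private copy;
// the copy replaces the live data only if fn returns nil.
func (s *txStore) Update(fn func(tx map[string]int) error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	tx := make(map[string]int, len(s.data))
	for k, v := range s.data {
		tx[k] = v
	}
	if err := fn(tx); err != nil {
		return err // rollback: the copy is simply discarded
	}
	s.data = tx
	return nil
}

// demoCounter increments one counter from many goroutines.
func demoCounter(workers int) int {
	s := &txStore{data: map[string]int{}}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Update(func(tx map[string]int) error {
				tx["counter"]++
				return nil
			})
		}()
	}
	wg.Wait()
	return s.data["counter"]
}

func main() {
	fmt.Println(demoCounter(50))
}
```

The result is always correct, but only one transaction runs at a time and each one copies the whole data set. MVCC aims to remove exactly that bottleneck by letting readers see a consistent snapshot without blocking writers.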

New memory index —— HashMap

We need a new in-memory index, HashMap, to allow users to choose the index that is more efficient for the business during development.
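A selectable index is easiest to support when every implementation sits behind one small interface. The sketch below shows a map-backed implementation. The `Indexer` interface, `recordPos`, and `hashIndex` are hypothetical names chosen for this example and may differ from FlyDB's real index API.

```go
package main

import (
	"fmt"
	"sync"
)

// recordPos is where a record lives on disk (hypothetical layout).
type recordPos struct {
	FileID uint32
	Offset int64
}

// Indexer is the contract each in-memory index would satisfy, whether it is
// backed by a BTree, an ART, a skip list, or a hash map.
type Indexer interface {
	Put(key []byte, pos recordPos)
	Get(key []byte) (recordPos, bool)
	Delete(key []byte) bool
}

// hashIndex is a map guarded by a read-write mutex: O(1) point lookups,
// but no ordered iteration or range scans.
type hashIndex struct {
	mu sync.RWMutex
	m  map[string]recordPos
}

var _ Indexer = (*hashIndex)(nil) // compile-time interface check

func newHashIndex() *hashIndex {
	return &hashIndex{m: make(map[string]recordPos)}
}

func (h *hashIndex) Put(key []byte, pos recordPos) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.m[string(key)] = pos
}

func (h *hashIndex) Get(key []byte) (recordPos, bool) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	pos, ok := h.m[string(key)]
	return pos, ok
}

// Delete reports whether the key was present.
func (h *hashIndex) Delete(key []byte) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	_, ok := h.m[string(key)]
	delete(h.m, string(key))
	return ok
}

func main() {
	var idx Indexer = newHashIndex()
	idx.Put([]byte("name"), recordPos{FileID: 1, Offset: 42})
	fmt.Println(idx.Get([]byte("name")))
	fmt.Println(idx.Delete([]byte("name")))
	fmt.Println(idx.Get([]byte("name")))
}
```

A hash map suits workloads made up of point lookups. Workloads that need prefix or range scans are better served by an ordered structure such as a BTree, ART, or skip list.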

implement `AssignData`

This method requires cutting the data into several parts and then having a map. Just use the reedsolomon API.

System adaptation problem

When I ran the project under linux, everything worked fine.
But when I ran the project in windows, I got an error when I started the database instance. The file path format was incorrect.
Please take the time to fix this bug, although it is not very important.
The bugs are as follows,
err
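Path-format errors of this kind usually come from building paths with a hard-coded "/" separator. A portable approach is to assemble every path with `path/filepath`, as in the sketch below. The helper `dataFilePath` and the `%09d.data` naming scheme are assumptions for illustration, not FlyDB's actual file naming.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const dataFileSuffix = ".data" // assumed suffix, for illustration only

// dataFilePath builds the path of a data file using the separator of the
// operating system the program runs on ("/" on Linux, "\" on Windows).
func dataFilePath(dir string, fileID uint32) string {
	return filepath.Join(dir, fmt.Sprintf("%09d%s", fileID, dataFileSuffix))
}

func main() {
	fmt.Println(dataFilePath(filepath.Join("data", "flydb"), 7))
}
```

For the default directory, `os.TempDir()` is a portable alternative to a literal such as "/tmp".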

Data backup

In order to avoid serious problems such as data directory failure or data file corruption resulting in data loss, we need to perfect a data backup function for FlyDB.

Mkdir permission denied!

While functionally testing FlyDB's merge code, I found a file read/write bug. After creating a database instance, writing data to it, and finally running the merge operation, I got an error message saying there was insufficient permission to create the directory.

After some debugging, I found that the problem is in the getMergePath function of the merge file. The base directory of the database is missing, which causes the error. I hope this can be fixed as soon as possible.

The bugs are as follows:
uTools_1683449969184
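One way to avoid this class of failure is to create the merge directory with `os.MkdirAll`, which also creates any missing parent directories and succeeds if the directory already exists. The sketch below assumes the merge directory sits next to the database directory with a `-merge` suffix. That layout and the names `mergePath` and `prepareMergeDir` are hypothetical, not FlyDB's actual getMergePath.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mergePath derives a sibling directory for merge output (assumed layout).
func mergePath(dbDir string) string {
	clean := filepath.Clean(dbDir)
	return filepath.Join(filepath.Dir(clean), filepath.Base(clean)+"-merge")
}

// prepareMergeDir creates the merge directory together with any missing
// parents. Plain os.Mkdir would fail here when the parent does not exist.
func prepareMergeDir(dbDir string) (string, error) {
	p := mergePath(dbDir)
	if err := os.MkdirAll(p, 0o755); err != nil {
		return "", err
	}
	return p, nil
}

// demo uses a database directory whose parent has not been created yet.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "merge")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)
	p, err := prepareMergeDir(filepath.Join(root, "missing", "flydb"))
	if err != nil {
		return false, err
	}
	info, err := os.Stat(p)
	if err != nil {
		return false, err
	}
	return info.IsDir(), nil
}

func main() {
	fmt.Println(demo())
}
```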
