Comments (2)
I hope fixed-length keys and values can be considered when designing the format. Keys and values are often fixed-length (for example, a u64 ID mapped to a file hash), and I believe fixed-length fields leave a lot of room for optimization.
I think you could take a cue from DuckDB and consider regularly writing data to the log and compressing it into Parquet format.
https://duckdb.org/docs/data/parquet/overview.html
https://parquet.apache.org
I believe this format applies a lot of optimizations to the data.
You can use this library to read and write it: https://docs.rs/parquet/latest/parquet/
from fjall.
I hope fixed-length keys and values can be considered when designing the format. Keys and values are often fixed-length (for example, a u64 ID mapped to a file hash), and I believe fixed-length fields leave a lot of room for optimization.
I'm not sure fixed lengths can really be optimized in block-based tables. You would save at most 3 bytes per K-V pair, at the cost of a lot of added complexity. That could add up to a decent amount of space for huge data sets, but not in block-based tables, and right now I don't plan on adding other types of tables.
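To put a rough number on that trade-off, here is a back-of-the-envelope sketch. The prefix sizes (1 byte for the key length, 2 bytes for the value length) and the 32-byte hash are assumptions chosen to add up to the 3 bytes mentioned above; they are not taken from fjall's actual on-disk encoding, and the function names are hypothetical.

```rust
// Hypothetical per-pair length prefixes (assumed for illustration only,
// not fjall's real encoding).
const KEY_LEN_PREFIX: usize = 1; // assumed: 1 byte for the key length
const VALUE_LEN_PREFIX: usize = 2; // assumed: 2 bytes for the value length

/// Payload size of `n` pairs when every pair carries its own length prefixes.
fn variable_length_size(n: usize, key_len: usize, value_len: usize) -> usize {
    n * (KEY_LEN_PREFIX + VALUE_LEN_PREFIX + key_len + value_len)
}

/// Payload size of `n` pairs when the lengths are known up front
/// and therefore omitted from each pair.
fn fixed_length_size(n: usize, key_len: usize, value_len: usize) -> usize {
    n * (key_len + value_len)
}

fn main() {
    // u64 id (8 bytes) mapped to a file hash (assumed to be 32 bytes).
    let (n, key_len, value_len) = (1_000_000, 8, 32);

    let var = variable_length_size(n, key_len, value_len);
    let fixed = fixed_length_size(n, key_len, value_len);
    let saved = var - fixed;

    println!(
        "variable: {var} B, fixed: {fixed} B, saved: {saved} B ({:.1}%)",
        100.0 * saved as f64 / var as f64
    );
}
```

Under these assumptions the saving is a constant 3 bytes per pair, so it only becomes noticeable when the pairs themselves are small and numerous, and this is before block compression, which may already absorb much of the redundancy.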
compressing it into parquet format.
Parquet is a column-based format with row groups. There is no notion of columns or rows here, so I'm not sure it offers an advantage over packed K-V blocks. I do have some interest in implementing an alternative, row-group-based block format. The current blocks are laid out as KVKVKVKV; a Parquet-esque alternative could be KKKKVVVV, which would allow for better compression, depending on the values.
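A minimal sketch of the two layouts, to make the difference concrete. It assumes fixed-length keys (8 bytes) and values (4 bytes) purely to keep the offsets simple; the constants and function names are hypothetical and do not reflect fjall's actual block format.

```rust
// Assumed sizes for illustration only.
const KEY_LEN: usize = 8;
const VALUE_LEN: usize = 4;

type Pair = ([u8; KEY_LEN], [u8; VALUE_LEN]);

/// Interleaved layout: K V K V K V ...
fn encode_interleaved(pairs: &[Pair]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pairs.len() * (KEY_LEN + VALUE_LEN));
    for (k, v) in pairs {
        out.extend_from_slice(k);
        out.extend_from_slice(v);
    }
    out
}

/// Grouped layout: K K K ... V V V ...
fn encode_grouped(pairs: &[Pair]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pairs.len() * (KEY_LEN + VALUE_LEN));
    for (k, _) in pairs {
        out.extend_from_slice(k);
    }
    for (_, v) in pairs {
        out.extend_from_slice(v);
    }
    out
}

/// Reads pair `i` from a grouped block that holds `n` pairs.
/// Returns `None` if the index or the block size does not match.
fn get_grouped(block: &[u8], n: usize, i: usize) -> Option<(&[u8], &[u8])> {
    if i >= n || block.len() != n * (KEY_LEN + VALUE_LEN) {
        return None;
    }
    let k_start = i * KEY_LEN;
    // Values start after all `n` keys.
    let v_start = n * KEY_LEN + i * VALUE_LEN;
    Some((
        &block[k_start..k_start + KEY_LEN],
        &block[v_start..v_start + VALUE_LEN],
    ))
}

fn main() {
    // Four pairs: big-endian u64 ids as keys, a constant 4-byte value.
    let pairs: Vec<Pair> = (0u64..4).map(|id| (id.to_be_bytes(), [7u8; VALUE_LEN])).collect();

    let interleaved = encode_interleaved(&pairs);
    let grouped = encode_grouped(&pairs);

    println!("interleaved: {interleaved:?}");
    println!("grouped:     {grouped:?}");
    println!("pair 2 from grouped block: {:?}", get_grouped(&grouped, pairs.len(), 2));
}
```

Both layouts take the same number of bytes before compression. The grouped form simply places similar bytes next to each other (all keys, then all values), which is what may help a general-purpose compressor, depending on the values.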
Related Issues (20)
- thinking through: persistence if keyspace is dropped before opened partitions HOT 2
- level_ratio should be changeable retroactively
- Write with sync vs Write with flush only vs Write without any flushing
- Create axum-kv
- Create rocket-kv
- How to flush manually HOT 1
- Think about backup strategies HOT 1
- unexpected warning : shard.rs:106: Invalid batch: found batch start inside batch HOT 6
- Concurrency control through SSI
- SingleDelete
- impl Drop flush for Journal, resolve cyclic Arcs on Drop HOT 1
- Level ratio is not recovered
- Recreating a deleted partition may be undefined behaviour HOT 2
- Correctly track lowest closed instant/snapshot seqno HOT 1
- Make fjall::WriteTransaction Send HOT 4
- Feature request: memory backend HOT 3
- Database reached a seemingly irrecoverable state HOT 5
- Lock version file to prevent multi-process access
- Insert-during-iterate deadlock