Immutable Ordered Key-Value Database Engine

Home Page: http://pumpkindb.org

License: Mozilla Public License 2.0

Languages: Rust 99.29%, Makefile 0.09%, Papyrus 0.61%
Topics: event-sourcing, database, rust, forth, query, indexing, key-value, storage, concatenative


PumpkinDB


Project status: Usable, between alpha and beta
Production-readiness: Depends on your risk tolerance

PumpkinDB is an immutable ordered key-value database engine, featuring:

  • ACID transactions
  • Persistent storage
  • An embedded programming language (PumpkinScript)
  • Binary keys and values (allows any encoding to be used: JSON, XML, Protobuf, Cap'n Proto, etc.)
  • Standalone and embedded scenarios

Why immutable?

Simply put, replaced data is deleted data, and deleting data is an unsafe way to manage it. Bugs, misunderstandings, changing scope and requirements, and other factors can all influence what data (especially past data) means and how it can be used.

By guaranteeing the immutability of a key's value once it is set, PumpkinDB forces its users to think of their data from a temporal perspective.

This approach is highly beneficial for implementing event sourcing and similar types of architectures.

What is PumpkinDB?

PumpkinDB is essentially a database programming environment, largely inspired by the core ideas behind MUMPS. Instead of M, it has a Forth-inspired stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn't allow overriding values once they are set. The core motivation for immutability is that, with the cost of storage declining, erasing data is effectively a strategic mistake.

While not intended for general-purpose programming, its main objective is to facilitate building specialized, application-specific and generic databases, with a particular focus on immutability and on processing data as close to storage as possible, incurring as little communication penalty as possible.

Applications communicate with PumpkinDB by sending small PumpkinScript programs over a network interface (or API when using PumpkinDB as an embedded solution).

PumpkinDB offers a wide array of primitives for concurrency, storage, journalling, indexing and other common building blocks.

Why is it a database engine?

The core ideas behind PumpkinDB stem from the so called lazy event sourcing approach which is based on storing and indexing events while delaying domain binding for as long as possible. That said, the intention of this database is to be a building block for different kinds of architectures, be it classic event sourcing (using it as an event store), lazy event sourcing (using indices) or anything else. It's also possible to implement different approaches within a single database for different parts of the domain.

Instead of devising custom protocols for talking to PumpkinDB, the communication protocol has become a pipeline to a script executor. This gives us enormous flexibility and room for extension.

While an external application can talk to PumpkinDB over a network connection, PumpkinDB's engine itself is embeddable and can be used directly. Currently, it is available for Rust applications only, but this may one day extend to all languages that can interface with C.

Client libraries

Language Library Status
Rust pumpkindb_client Early release (0.2.0)
Java pumpkindb-client Pre-release

Trying it out

You can download PumpkinDB releases from GitHub.

Docker

You can try out the latest PumpkinDB HEAD revision by using a Docker image:

$ docker pull pumpkindb/pumpkindb

Alternatively, you can build the image yourself:

$ docker build . -t pumpkindb/pumpkindb

Run the server:

$ docker run -p 9981:9981 -ti pumpkindb/pumpkindb
2017-04-12T02:52:47.440873517+00:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-12T02:52:47.440983318+00:00 INFO pumpkindb - Starting up
2017-04-12T02:52:47.441122740+00:00 INFO pumpkindb_engine::storage - Available disk space is approx. 56Gb, setting database map size to it
2017-04-12T02:52:47.441460231+00:00 INFO pumpkindb - Starting 4 schedulers
2017-04-12T02:52:47.442375937+00:00 INFO pumpkindb - Listening on 0.0.0.0:9981

Finally, connect to it using pumpkindb-term:

$ docker run -ti pumpkindb/pumpkindb pumpkindb-term 172.17.0.1:9981 # replace IP with the docker host IP

Building from the source code

You are also welcome to clone the repository and build it yourself. You will need Rust Nightly to do this. The easiest way to get it is to use rustup:

$ rustup install nightly
$ rustup override set nightly # in PumpkinDB directory

After that, you can run the PumpkinDB server this way:

$ cargo build --all
$ ./target/debug/pumpkindb
2017-04-03T10:43:49.667667-07:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-03T10:43:49.668660-07:00 INFO pumpkindb - Starting up
2017-04-03T10:43:49.674139-07:00 INFO pumpkindb_engine::storage - Available disk space is approx. 7Gb, setting database map size to it
2017-04-03T10:43:49.675759-07:00 INFO pumpkindb - Starting 8 schedulers
2017-04-03T10:43:49.676113-07:00 INFO pumpkindb - Listening on 0.0.0.0:9981

You can connect to it using pumpkindb-term:

$ ./target/debug/pumpkindb-term
Connected to PumpkinDB at 0.0.0.0:9981
To send an expression, end it with `.`
Type \h for help.
PumpkinDB> ["Name" HLC CONCAT "Jopn Doe" ASSOC COMMIT] WRITE.

PumpkinDB> ["Name" HLC CONCAT "John Doe" ASSOC COMMIT] WRITE.

PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP CURSOR/VAL] READ (Get last value).
"John Doe"
PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP DUP CURSOR/PREV DROP CURSOR/VAL] READ (Get previous value).
"Jopn Doe"

(The above example shows how one can query and navigate values submitted at different times, using low-level primitives.)

You can change some of the server's parameters by creating pumpkindb.toml:

[storage]
path = "path/to/db"
# By default, mapsize will equal the size of the
# available space on the disk, except on Windows,
# where the default is 1Gb.
# `mapsize` is a theoretical limit the database can
# grow to. However, on Windows, this also means that
# the database file will take up that much space.
# This parameter allows specifying the mapsize
# in megabytes.
# mapsize = 2048

[server]
port = 9981

Components

The PumpkinDB project is split into several separate components (crates):

  • pumpkinscript — PumpkinScript parser. Converts PumpkinScript from its text form into its binary form.
  • pumpkindb_engine — Core PumpkinDB library. Provides the PumpkinScript scheduler and a standard library of instructions.
  • pumpkindb_mio_server — Async MIO-based PumpkinDB server library. Useful for building custom PumpkinProtocol-compatible servers.
  • pumpkindb_client — PumpkinProtocol client library.
  • pumpkindb_server — Stock PumpkinDB server. Built on top of pumpkindb_mio_server.
  • pumpkindb_term — Console-based PumpkinDB client.
  • doctests — A small utility to run instruction doctests.

Contributing

This project is in its very early days, and we always welcome contributors.

Our goal is to encourage frictionless contributions to the project. In order to achieve that, we use the Unprotocols C4 process. Please read it; it answers a lot of questions. Our goal is to merge pull requests as quickly as possible and make new stable releases regularly.

In a nutshell, this means:

  • We merge pull requests rapidly (we try!)
  • We are open to diverse ideas
  • We prefer code now over consensus later

To learn more, read our contribution guidelines.

We also maintain a list of issues that we think are good starters for new contributors.

Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]

Sponsors

Become a sponsor and get your logo on our README on Github with a link to your site. [Become a sponsor]


Contributors

dhardy, eav, mardiros, matt8898, omarkj, piamancini, rushmorem, stuarth, theduke, yrashk


Issues

Problem: no easy way to figure out if a word returns "optional" values

Optional values are [multiple] values wrapped into a closure. The SOME? and NONE? words refer to this concept.

We do, however, have the CURSOR/* family of words, which don't indicate what they return. Unless you've read the documentation, you can't really know.

Proposed solution: introduce a naming convention for words that return an optional value.

For example, ? as a prefix.

This way TRY can become ?EVAL

> [DUP] ?EVAL

"Soft" cursor words:

> ?CURSOR/NEXT

This can be pronounced as "maybe-EVAL" or "maybe-CURSOR/NEXT". (The boolean counterparts, such as CURSOR/NEXT?, which return a boolean, can be pronounced as "is-CURSOR/NEXT".)

Problem: capturing stack at once is not trivial

This is an upcoming problem. My current thinking is that the server should only be a binary packet streaming server, and those who want to play with PumpkinDB using the text form should use a command-line PumpkinDB client (we can call it pumpkindb-term) that will compile code to the binary form and send it over.

The idea is that when a binary form is received by the server and evaluated, the resulting stack is simply discarded, as there's no way to know just how much of the stack the requestor actually wants.

But the stack will sometimes need to be captured. The only way to communicate back from the script to the requestor (or other parties, for that matter) is to use the pubsub capabilities of the server (not in yet, just something I am thinking about, but I strongly believe this is the way to go). So let's imagine publishing to a pubsub channel would be something like

<data> <channel> SEND

So, what if we need to send the entire stack? How do we capture it?

Proposed solution: introduce a STACK word that will capture the stack and put it as a binary on top of the stack, so that capturing the stack and sending it over will be as simple as

STACK <channel> SEND

Problem: wall clock going back in time between restarts will break HLC guarantee

Basically, it means that after a restart, values returned by HLC will be less than those generated before it.

Proposed solution: let timestamp module persist last generated HLC.

Storing it in the same database is rather intricate: there might be no write txn, the write txn might never succeed, etc. So I propose that PumpkinDB maintain a separate "meta" database within the environment, with its own transactions.

Is there any better solution (perhaps one that avoids the I/O penalty)?
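
As a sketch of the proposed approach in Rust: restore the HLC floor from the persisted value on startup. The Hlc representation and the recovery function here are illustrative stand-ins, not PumpkinDB's actual types.

use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative HLC representation: wall-clock nanoseconds plus a logical counter.
struct Hlc { wall: u64, logical: u32 }

// On startup, take the persisted HLC (read from the proposed "meta" database)
// into account so that newly generated values can never go backwards.
fn restore_hlc(last_persisted: Option<Hlc>) -> Hlc {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() as u64;
    match last_persisted {
        // Wall clock went backwards across the restart: keep the persisted
        // wall time and bump the logical counter instead.
        Some(prev) if prev.wall >= now => Hlc { wall: prev.wall, logical: prev.logical + 1 },
        _ => Hlc { wall: now, logical: 0 },
    }
}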

Problem: impossible to implement script context scope

Here's an example. When we do a storage transaction, it'd be nice to have something like this:

["Key" "Value" ASSOC COMMIT] WRITE

However, ASSOC and COMMIT need to have a reference to their transaction. Even though we can make this transaction serializable to a byte array and push it onto the stack before evaluating the closure, my concern is that this will unnecessarily mess up the stack and make reading it really difficult.

Hence, there needs to be a mechanism in script::VM to provide context to the code that's being evaluated.
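
One possible shape for such a mechanism, sketched in Rust. All names here are hypothetical stand-ins, not the actual script::VM types.

// Sketch: give instruction handlers access to a per-script context
// (e.g. the current write transaction) without pushing it onto the data stack.
struct TxnHandle(u64); // stand-in for an LMDB transaction

struct ScriptContext {
    txn: Option<TxnHandle>, // set by WRITE, read by ASSOC/COMMIT
}

struct Env {
    stack: Vec<Vec<u8>>,
    ctx: ScriptContext, // carried alongside the stack, invisible to scripts
}

fn handle_assoc(env: &mut Env) -> Result<(), &'static str> {
    let txn = env.ctx.txn.as_ref().ok_or("no write transaction in scope")?;
    let value = env.stack.pop().ok_or("empty stack")?;
    let key = env.stack.pop().ok_or("empty stack")?;
    let _ = (txn, key, value); // a real handler would write through the txn
    Ok(())
}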

Problem: penalizing valid closures

Right now, all words that evaluate given closures parse them to ensure they are valid before injecting them into the scheduler. The reason is that, once scheduled, it's impossible to extract them due to possible misinterpretations and the generally unknown size of the extension at evaluation time. However, this carries a significant performance penalty.

Proposed solution: let the scheduler operate not on a single Vec, but on a vector of slices, enabling ejection of an invalid slice (or, if we don't want to allocate code on the stack, a Vec of Vecs; we haven't done this yet in passes for code). The error will still occur, but as a decoding_error rather than an invalid_error, and the code will be ejected.

Problem: impossible to do logical operations

Proposed solution:

  • AND: takes two items off the top of the stack and pushes [1] back if both of them are equal to [1]; otherwise pushes [0]
  • NOT: takes one item off the top of the stack and pushes [1] back if the item is equal to [0], and pushes [0] back if the item is equal to [1]
  • OR: takes two items off the top of the stack and pushes [1] back if either of them is equal to [1]; otherwise pushes [0]
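
A minimal Rust sketch of these semantics, assuming booleans are the one-byte binaries [0] and [1] as described above:

// Each word pops its operands and pushes a one-byte boolean result.
fn and(a: &[u8], b: &[u8]) -> Vec<u8> {
    if a == [1u8] && b == [1u8] { vec![1] } else { vec![0] }
}

fn or(a: &[u8], b: &[u8]) -> Vec<u8> {
    if a == [1u8] || b == [1u8] { vec![1] } else { vec![0] }
}

fn not(a: &[u8]) -> Vec<u8> {
    if a == [0u8] { vec![1] } else { vec![0] }
}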

Problem: hashing primitives needed for equality indexing

In order to avoid storing the whole piece of data in the key (which often won't work because of the key size limitation), equality indexing should write keys like:

[index][value hash] => [key]

Proposed solution: find out what hashing algorithms are typically used by databases for their HASH index types, and implement the most reasonable one.

(Previously, I used SHA-1 there, but I am not sure if it is the best candidate)
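
For illustration, composing such a key could look like the sketch below, using SHA-256 via the sha2 crate as one plausible choice; the issue deliberately leaves the actual algorithm open.

use sha2::{Digest, Sha256};

// key layout: [index][value hash] => [key]
fn index_key(index: &[u8], value: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(index.len() + 32);
    key.extend_from_slice(index);
    key.extend_from_slice(Sha256::digest(value).as_slice());
    key
}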

Problem: working with storage data is not zero-copy

This issue is mentioned in ad6da52

There was a realization that Env<'a> wants all referenced elements on the stack to have the same lifetime 'a. However, the lifetime of the values extracted through lmdb is limited by the lifetime of the transaction.

Potential solution:

This means that right now we have to resort to copying values. However, I suspect there's still a chance we can have an optimization here. We might be able to have a temporary "transaction context" stack that has a different lifetime. The feasibility and consequences of such an idea are still to be researched. But if defined well, this should definitely improve the overall quality and performance of PumpkinDB.

Problem: text protocol is limited

It doesn't allow for streaming or compact communication.

Proposed solution: use binary-frames protocol with binary form encoding and write a CLI tool (pumpkindb-term or whatever it should be called) to be the client REPL.

Also, it would make sense to switch to Mio from Tokio as I think it might fit into our patterns better. But that's to be decided.

Related to #29

Problem: persisting keys in binary script form is bad for traversing

Right now, both keys and values are persisted with their size prefixes (according to the binary form data representation rules). The motivation for this was to avoid allocating memory to put a prefix in front of them upon retrieval.

However, when traversing a range of keys (CURSOR/SEEK and then /NEXT), it'll trip up the cursor when a "composite key" size changes, changing the very beginning of it.

Proposed solution: make the Env stack store references to data without their size prefixes (slices already carry their length), so we can move to persisting data as-is in the database. Having the stack work that way is not a big deal, because we can write size prefixes on demand when sending data back, avoiding allocations there.
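
For illustration, the difference between the two layouts for a small binary, assuming the single-byte length prefix used for sizes up to 120:

// Current on-disk layout: length prefix (one byte for sizes up to 120), then data.
fn with_prefix(data: &[u8]) -> Vec<u8> {
    let mut out = vec![data.len() as u8];
    out.extend_from_slice(data);
    out
}
// Proposed: persist `data` as-is; slices on the Env stack already carry their
// length, and the prefix can be re-added on demand when sending results back.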

Problem: inability to iterate through keys

This is the key functionality required to implement timestamped data, indexing, etc.

Proposed solution: implement a set of CURSOR-related words

[...code...] CURSOR

Cursor-words:

  • CURSOR/FIRST
  • CURSOR/LAST
  • CURSOR/SEEK
  • CURSOR/NEXT
  • CURSOR/PREV

Problem: processing JSON events

In a lot of cases, events are serialized to JSON. We can't really process them for indexing or other needs right now.

Proposed solution: start a JSON collection of words. Here's the starting proposal:

  • JSON? — tests if given data has a valid JSON syntax
  • JSON/OBJECT? — tests if given JSON is an object
  • JSON/STRING? — tests if given JSON is a string
  • JSON/BOOLEAN? — tests if given JSON is a boolean
  • JSON/ARRAY? — tests if given JSON is an array
  • JSON/NULL? — tests if given JSON is a null
  • JSON/NUMBER? — tests if given JSON is a number
  • JSON/HAS? — tests if given JSON has a field with a given name
  • JSON/GET — extract JSON object's field (or array index), as a JSON (if not present, returns null)
  • JSON/SET — sets JSON object's field to a value (or array index), as a JSON
  • JSON/STRING->BINARY — extract a UTF-8 binary from JSON's string ("Hello" -> Hello)
  • JSON/BINARY->STRING — opposite of JSON/STRING->BINARY
  • JSON/NUMBER->BINARY — extract a JSON number as a binary (the serialization format depends on the actual number type, but should always follow the numeric ordering; deferring till 0.2)

TBC

Problem: data needs to be communicated

Instead of collecting "query results" data on the stack, data needs to be sent off elsewhere.

First of all, this will be useful for sending the resulting stack over (see #29).

Proposed solution: implement a pubsub mechanism
where consumers can subscribe to certain keys and
scripts can send data over to those keys.

Something like

<data> <key> SEND

Problem: server will fail if script is invalid

Connected to localhost.
Escape character is '^]'.
["script
Connection closed by foreign host.

On the console:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Incomplete', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/result.rs:868

Proposed solution: don't simply unwrap the result of script parsing.
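
A minimal sketch of the fix, with a stand-in parser (the real entry point lives in the pumpkinscript crate):

// Stand-in for the parser: rejects an unterminated string literal.
fn parse(input: &str) -> Result<Vec<u8>, String> {
    if input.matches('"').count() % 2 != 0 {
        return Err("Incomplete".into());
    }
    Ok(input.as_bytes().to_vec()) // a real parser emits the binary form
}

fn handle_input(input: &str) {
    match parse(input) {
        // hand the binary form to the scheduler
        Ok(program) => { let _ = program; }
        // reply with an error and keep serving, instead of unwrapping and panicking
        Err(e) => eprintln!("parse error: {}", e),
    }
}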

Problem: handle_set placement allows to override some words

                           self => handle_set,
                           // storage
                           self.storage => handle_write,
                           self.storage => handle_read,
...

Because handle_set checks Env's dictionary for unknown words, all words checked below the handle_set entry can be overridden.

Proposed solution: move it to the bottom of the list and add a comment saying it shouldn't be moved.

Problem: SEND implementation is blocking

Right now, because of how pubsub::PublisherAccessor.send is implemented, it'll wait until the message has fully gone through, which blocks the scheduler until then.

This is, of course, unacceptable!

Proposed solution: have a send_async that returns a receiver, if receiver's try_recv on the next round is not yielding anything, keep rescheduling.
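
A rough sketch of this pattern using std::sync::mpsc; the actual pubsub::PublisherAccessor API may differ.

use std::sync::mpsc::{channel, Receiver, TryRecvError};

// Kick off the delivery and hand back a receiver for the acknowledgement.
fn send_async(data: Vec<u8>) -> Receiver<()> {
    let (done_tx, done_rx) = channel();
    std::thread::spawn(move || {
        // deliver `data` to subscribers, then acknowledge
        let _ = data;
        let _ = done_tx.send(());
    });
    done_rx
}

// Scheduler side: poll without blocking; if the ack isn't in yet,
// reschedule this instruction for the next round.
fn poll(done: &Receiver<()>) -> bool {
    match done.try_recv() {
        Ok(()) => true,                          // done, continue the program
        Err(TryRecvError::Empty) => false,       // not yet, keep rescheduling
        Err(TryRecvError::Disconnected) => true, // publisher gone; treat as done
    }
}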

Problem: impossible to write multiline scripts in pumpkindb-term

This is very inconvenient when you want to write something large or paste a multi-line script.

Proposed solution: make pumpkindb-term read lines until it sees a period at the very end. This should work in a way that allows modifying the lines above and below within that one input (similar to how a shell behaves when you type \, or even better).

Problem: heap [re]allocation failure will lead to a process crash

Right now, the returned pointer is not checked for being a null pointer (which indicates failure), so the returned slice will point to unallocated memory, crashing the entire process once accessed.

Proposed solution: make Env.alloc() and the Env initializers return a Result, returning an Err if [re]allocation failed, so the error can be caught and properly handled.
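
A sketch of the checked-allocation idea (names are illustrative, not Env's actual internals):

use std::alloc::{realloc, Layout};

// `layout` must describe the existing allocation at `ptr`, per realloc's contract.
unsafe fn grow(ptr: *mut u8, layout: Layout, new_size: usize) -> Result<*mut u8, &'static str> {
    let new_ptr = realloc(ptr, layout, new_size);
    if new_ptr.is_null() {
        Err("[re]allocation failed") // surface it instead of handing out a dangling slice
    } else {
        Ok(new_ptr)
    }
}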

Problem: Server runs out of memory when parsing invalid input

There is a bug in the parser which will cause memory exhaustion:

$ cargo run --bin pumpkindb                                                                                                                                       
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/pumpkindb`
Available disk space is approx. 392Gb, setting database map size to it
Listening on 0.0.0.0:9981
fatal runtime error: out of memory

In another terminal

$ telnet localhost 9981                                                                                                                                      
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
["script
Connection closed by foreign host.

Problem: server is blocked by channel recv

Currently, Service for server::PlainServer resorts to using channel recv to get a result from the VM. While it's currently "fast enough", this is not great.

Proposed solution: switch it to futures.

Problem: doc tests are not executable and don't verify much

Most of the time I end up just copying examples as is into the examples section.

Proposed solution: write actual doctests as a set of programs (builtins format) that should return 0 or 1 at the top of the stack

Something like

test_swap : 1 2 SWAP 2 WRAP 2 1 2 WRAP EQUAL?.

Eventually, this can be used to run them automatically.

Problem: collecting query results in a streaming fashion

We can't really wait for the stack to be complete before sending the results back (nor would that be efficient in any way), so we need to figure out a way to do this.

Proposed solution:

Implement a YIELD word (val -- ), the results of which can be collected at runtime, for example:

[ [ [... YIELD] CURSOR] READ] "queue_name" ->QUEUE

or

[ [ [... YIELD] CURSOR] READ] "queue_name" ->STACK

(the latter would collect values and put them into the stack at once).

This can be implemented as a field in Env (for example) holding a stack of YieldConsumers or something like that. The top one would be a simple discarding consumer.

Problem: not easy to iterate over cursors

It requires a little bit of ceremony.

Proposed solution: introduce helper words:

CURSOR/DOWHILE : ['iterator SET 'closure SET 'c SET
                   [c closure EVAL [c iterator EVAL] [0] IFELSE] DOWHILE] EVAL/SCOPED.
?CURSOR/DOWHILE : ['iterator SET 'closure_ SET 'c SET
                   c [?CURSOR/CUR closure_ EVAL] iterator CURSOR/DOWHILE] EVAL/SCOPED.
?CURSOR/DOWHILE-PREFIXED : ['closure__ SET
                            'prefix SET
                            CURSOR 'c SET
                            c prefix CURSOR/SEEK?
                            [c [UNWRAP OVER 0 prefix LENGTH SLICE prefix EQUAL?
                              closure__ [0] IFELSE
                            ] 'CURSOR/NEXT? ?CURSOR/DOWHILE]
                            IF] EVAL/SCOPED.

Problem: EnvHeap allocator is wasteful

When it tries to allocate, if the size doesn't fit into the remainder of the chunk, it allocates a new chunk and allocates there. Next time around, though, it only looks at the last chunk and therefore wastes all the space left in previous chunks.

Proposed solution: let the allocator scan through the chunks to see if any of them have enough space left. This way we can saturate chunks a lot better.
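
A sketch of the proposed first-fit scan over an illustrative Chunk type:

struct Chunk { data: Vec<u8>, used: usize }

// Returns (chunk index, offset) of the allocation.
fn alloc(chunks: &mut Vec<Chunk>, size: usize, chunk_size: usize) -> (usize, usize) {
    // First fit: reuse any earlier chunk that still has enough space left,
    // instead of only looking at the last one.
    for (i, chunk) in chunks.iter_mut().enumerate() {
        if chunk.data.len() - chunk.used >= size {
            let offset = chunk.used;
            chunk.used += size;
            return (i, offset);
        }
    }
    // No chunk fits: allocate a fresh one.
    chunks.push(Chunk { data: vec![0; chunk_size.max(size)], used: size });
    (chunks.len() - 1, 0)
}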

Problem: querying "the last update" isn't trivial

Since PumpkinDB doesn't allow overriding keys, one has to write to a new key every time something is recorded. We have at least one primitive for that right now: HLC. Since it is guaranteed to generate unique and monotonically growing values, these can be used to sequence any collection of values. All one has to do is compose the key this way:

["key" HLC CONCAT "value" ASSOC COMMIT] WRITE

Similarly, things like journalling events can be done the same way:

["journal" HLC CONCAT "id" "value" 2 WRAP ASSOC COMMIT] WRITE

However, how do we quickly find the last element in the collection?

Proposed solution: introduce a number of words to work with this concept. For now, let's refer to it as an "ordered collection" (ORDCOLL), but we might want to find a better name.

I can think of some of the primitives:

ORDCOLL/PAIR : ROT ROT SWAP EVAL CONCAT SWAP

This is used to produce key value pairs for ordered collections:

>[[HLC] "testkey" 1 ORDCOLL/PAIR ASSOC COMMIT] WRITE
>[[HLC] "testkey" 2 ORDCOLL/PAIR ASSOC COMMIT] WRITE
>[[HLC] "testkey" 3 ORDCOLL/PAIR ASSOC COMMIT] WRITE
>[[HLC] "testkey" "Hello" ORDCOLL/PAIR ASSOC COMMIT] WRITE

The reason it takes the value in is to make it more future-proof (what if this or other collections do something with the value, too?).

ORDCOLL/LAST : 2DUP CONCAT SWAP DROP CURSOR DUP ROT CURSOR/SEEK? [DUP CURSOR/CUR UNWRAP DROP ROT DUP LENGTH SWAP ROT ROT 0 SWAP SLICE EQUAL? [CURSOR/CUR] [CURSOR/PREV] IFELSE ] [CURSOR/LAST SWAP DROP] IFELSE UNWRAP SWAP DROP

This one returns the last element in the collection:

> ["testkey" HLC/MAX ORDCOLL/LAST] READ
"Hello"

I can also see ORDCOLL/FIRST being added for completeness (although it should be relatively trivial).

P.S. Keep in mind that the above implementations haven't been thoroughly tested!

Problem: set of stack words is incomplete

While following Forth to the letter isn't our goal (many stack operations can be implemented using the very basic ones), ignoring its experience isn't particularly wise either.

Proposed solution:

Implement the following words:

  • NIP
  • TUCK
  • 2DUP
  • 2DROP
  • 2SWAP
  • 2ROT
  • 2NIP
  • 2TUCK

I'm personally less happy about PICK and ROLL, as they make one feel like the stack is an array, but are there good arguments to implement them?
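
For reference, two of the proposed words sketched over a Vec-backed stack, using the usual Forth stack effects NIP (a b -- b) and TUCK (a b -- b a b):

fn nip(stack: &mut Vec<Vec<u8>>) -> Result<(), &'static str> {
    let b = stack.pop().ok_or("stack underflow")?;
    stack.pop().ok_or("stack underflow")?; // drop a
    stack.push(b);
    Ok(())
}

fn tuck(stack: &mut Vec<Vec<u8>>) -> Result<(), &'static str> {
    let b = stack.pop().ok_or("stack underflow")?;
    let a = stack.pop().ok_or("stack underflow")?;
    stack.push(b.clone()); // copy of b goes under a
    stack.push(a);
    stack.push(b);
    Ok(())
}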

Problem: no conditional control flow word

It's impossible to do something if some condition is true or false.

Proposed solution: IF and IF_ELSE words.

IF would take cond code and eval code if cond is [1].
IF_ELSE would take cond code code_else and eval code if cond is [1], or eval code_else otherwise.

Examples

Exit if false:

... NOT [EXIT] IF

Duplicate if true, drop if false

... [DUP] [DROP] IF_ELSE

Problem: allocating binaries of a certain length

I ran into this issue while working on a simple algorithm for finding the last item in an ordered collection. Creating a binary of N bytes (0xFF) wastes cycles and requires lots of allocations.

Proposed solution: a size byte ALLOC word that would allocate size bytes with the value byte in one go.
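
In Rust terms, the underlying operation amounts to a single filled allocation; a minimal sketch:

// One filled allocation instead of repeated CONCATs.
fn alloc_filled(size: usize, byte: u8) -> Vec<u8> {
    vec![byte; size]
}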

Feature: scoped_dictionary

Introduced through #67

Problem: implementing words without subwords is difficult

Technically speaking, loaded words can define their own words, but those will leak into the remainder of the program, which is less than ideal.

Using the stack alone requires a significant amount of juggling that makes writing code incredibly frustrating. I personally believe that the value of stack-based programming languages is in their concatenative abilities (composition via the stack), not just being able to do everything by juggling items on the stack.

Solution: make all SETs and DEFs done within any closure local to that closure.

This means that if we eventually need to do code injection that's not injecting what effectively amounts to closures, we'll need a separate pass result type that can indicate that.

Problem: compact numeric byte arrays are encoded with 2 bytes

The most annoying case is the upcoming usage of "boolean" values, [0] and [1]. Right now they are encoded as [1, 0] and [1, 1], respectively.

Proposed solution: allocate a band for very small numbers, at the very least 0 and 1 (as these are the most expected small numbers). Under the current scheme, anything in the [124u8..128u8] range can be used.
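
A hypothetical encoding under that proposal; the concrete byte assignments are assumptions, not a decided scheme:

// Hypothetical single-byte encodings for [0] and [1] taken from the free band;
// 124 and 125 are arbitrary picks here, not an agreed assignment.
fn encode_bool(b: bool) -> Vec<u8> {
    vec![if b { 125u8 } else { 124 }]
}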

Problem: small numbers range 0..120 is arbitrary

It leaves very little reserved space (only the range between 123 and 127, because words start at 128) and the choice makes no sense: why 120?

Proposed solution: decrease this range to 0..99; let 100..110 represent the numbers 0..10 themselves; move the 121-123 prefixes down to 111-113; and let 114-127 become the new reserved pool. This is also related to #1.

Problem: impossible to extract a portion of a byte array

This is important when (for example) comparing key prefixes

Proposed solution: a SLICE word that would take a byte array, a start index and an end index, and push back a slice of that array. In conjunction, a LENGTH word would be very useful, making it easy to implement something like SLICE_FROM that always slices to the end.
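
A sketch of the intended semantics:

// SLICE: byte array, start index, end index -> sub-array.
fn slice(data: &[u8], start: usize, end: usize) -> Result<&[u8], &'static str> {
    if start > end || end > data.len() {
        return Err("invalid slice bounds");
    }
    Ok(&data[start..end])
}
// LENGTH is data.len(); SLICE_FROM is then slice(data, start, data.len()).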

Problem: server's frame size is too large

I believe a 64-bit frame size is just too much, as 32 bits would have sufficed: is there any reason to send more than 4GB in a single frame?

Proposed solution: switch to 32-bit sizes.
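
A sketch of a 32-bit length-prefixed frame; the actual PumpkinProtocol framing may differ:

// 32-bit big-endian length prefix followed by the payload.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}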
