Immutable Ordered Key-Value Database Engine

Home Page: http://pumpkindb.org

License: Mozilla Public License 2.0

Languages: Rust 99.29%, Makefile 0.09%, Papyrus 0.61%
Topics: event-sourcing, database, rust, forth, query, indexing, key-value, storage, concatenative


PumpkinDB


Project status: Usable, between alpha and beta
Production-readiness: Depends on your risk tolerance

PumpkinDB is an immutable ordered key-value database engine, featuring:

  • ACID transactions
  • Persistent storage
  • An embedded programming language (PumpkinScript)
  • Binary keys and values (allows any encoding to be used: JSON, XML, Protobuf, Cap'n Proto, etc.)
  • Standalone and embedded scenarios

Why immutable?

Simply put, replaced data is deleted data, and deleting data is an unsafe way to manage it. Bugs, misunderstandings, changing scope and requirements, and other factors can all influence what data (especially past data) means and how it can be used.

By guaranteeing the immutability of a key's value once it is set, PumpkinDB forces its users to think of their data from a temporal perspective.

This approach is highly beneficial for implementing event sourcing and similar types of architectures.

What is PumpkinDB?

PumpkinDB is essentially a database programming environment, largely inspired by the core ideas behind MUMPS. Instead of M, it has a Forth-inspired stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn't allow overriding values once they are set. The core motivation for immutability is that, with the cost of storage declining, erasing data is effectively a strategic mistake.

While not intended for general-purpose programming, its main objective is to facilitate building specialized, application-specific and generic databases, with a particular focus on immutability and on processing data as close to storage as possible, incurring as little communication penalty as possible.

Applications communicate with PumpkinDB by sending small PumpkinScript programs over a network interface (or API when using PumpkinDB as an embedded solution).

PumpkinDB offers a wide array of primitives for concurrency, storage, journalling, indexing and other common building blocks.

Why is it a database engine?

The core ideas behind PumpkinDB stem from the so called lazy event sourcing approach which is based on storing and indexing events while delaying domain binding for as long as possible. That said, the intention of this database is to be a building block for different kinds of architectures, be it classic event sourcing (using it as an event store), lazy event sourcing (using indices) or anything else. It's also possible to implement different approaches within a single database for different parts of the domain.

Instead of devising custom protocols for talking to PumpkinDB, the communication protocol has become a pipeline to a script executor. This gives us enormous flexibility and room for extension.

While an external application can talk to PumpkinDB over a network connection, PumpkinDB's engine itself is embeddable and can be used directly. Currently, it is available for Rust applications only, but this may one day extend to all languages that can interface with C.

Client libraries

Language Library Status
Rust pumpkindb_client Early release (0.2.0)
Java pumpkindb-client Pre-release

Trying it out

You can download PumpkinDB releases from GitHub.

Docker

You can try out the latest PumpkinDB HEAD revision by using a Docker image:

$ docker pull pumpkindb/pumpkindb

Alternatively, you can build the image yourself:

$ docker build . -t pumpkindb/pumpkindb

Run the server:

$ docker run -p 9981:9981 -ti pumpkindb/pumpkindb
2017-04-12T02:52:47.440873517+00:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-12T02:52:47.440983318+00:00 INFO pumpkindb - Starting up
2017-04-12T02:52:47.441122740+00:00 INFO pumpkindb_engine::storage - Available disk space is approx. 56Gb, setting database map size to it
2017-04-12T02:52:47.441460231+00:00 INFO pumpkindb - Starting 4 schedulers
2017-04-12T02:52:47.442375937+00:00 INFO pumpkindb - Listening on 0.0.0.0:9981

Finally, connect to it using pumpkindb-term:

$ docker run -ti pumpkindb/pumpkindb pumpkindb-term 172.17.0.1:9981 # replace IP with the docker host IP

Building from the source code

You are also welcome to clone the repository and build it yourself. You will need Rust Nightly to do this. The easiest way to get it is to use rustup:

$ rustup install nightly
$ rustup override set nightly # in PumpkinDB directory

After that, you can run the PumpkinDB server this way:

$ cargo build --all
$ ./target/debug/pumpkindb
2017-04-03T10:43:49.667667-07:00 WARN pumpkindb - No logging configuration specified, switching to console logging
2017-04-03T10:43:49.668660-07:00 INFO pumpkindb - Starting up
2017-04-03T10:43:49.674139-07:00 INFO pumpkindb_engine::storage - Available disk space is approx. 7Gb, setting database map size to it
2017-04-03T10:43:49.675759-07:00 INFO pumpkindb - Starting 8 schedulers
2017-04-03T10:43:49.676113-07:00 INFO pumpkindb - Listening on 0.0.0.0:9981

You can connect to it using pumpkindb-term:

$ ./target/debug/pumpkindb-term
Connected to PumpkinDB at 0.0.0.0:9981
To send an expression, end it with `.`
Type \h for help.
PumpkinDB> ["Name" HLC CONCAT "Jopn Doe" ASSOC COMMIT] WRITE.

PumpkinDB> ["Name" HLC CONCAT "John Doe" ASSOC COMMIT] WRITE.

PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP CURSOR/VAL] READ (Get last value).
"John Doe"
PumpkinDB> [CURSOR DUP "Name" CURSOR/SEEKLAST DROP DUP CURSOR/PREV DROP CURSOR/VAL] READ (Get previous value).
"Jopn Doe"

(The above example shows how one can query and navigate values submitted at different times, using low-level primitives.)

You can change some of the server's parameters by creating pumpkindb.toml:

[storage]
path = "path/to/db"
# By default, mapsize will equal the size of the
# available space on the disk, except on Windows,
# where the default is 1Gb.
# `mapsize` is a theoretical limit the database can
# grow to. However, on Windows, this also means that
# the database file will take up that much space.
# This parameter allows specifying the mapsize
# in megabytes.
# mapsize = 2048

[server]
port = 9981

Components

The PumpkinDB project is split into several separate components (crates):

  • pumpkinscript — PumpkinScript parser. Converts PumpkinScript from its text form into its binary form.
  • pumpkindb_engine — Core PumpkinDB library. Provides the PumpkinScript scheduler and a standard library of instructions.
  • pumpkindb_mio_server — Async MIO-based PumpkinDB server library. Useful for building custom PumpkinProtocol-compatible servers.
  • pumpkindb_client — PumpkinProtocol client library.
  • pumpkindb_server — Stock PumpkinDB server. Built on top of pumpkindb_mio_server.
  • pumpkindb_term — Console-based PumpkinDB client.
  • doctests — A small utility to run instruction doctests.

Contributing

This project is in its very early days, and we always welcome contributors.

Our goal is to encourage frictionless contributions to the project. In order to achieve that, we use the Unprotocols C4 process. Please read it; it answers a lot of questions. Our goal is to merge pull requests as quickly as possible and make new stable releases regularly.

In a nutshell, this means:

  • We merge pull requests rapidly (we try!)
  • We are open to diverse ideas
  • We prefer code now over consensus later

To learn more, read our contribution guidelines.

We also maintain a list of issues that we think are good starters for new contributors.

Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]

Sponsors

Become a sponsor and get your logo on our README on Github with a link to your site. [Become a sponsor]


Contributors

dhardy, eav, mardiros, matt8898, omarkj, piamancini, rushmorem, stuarth, theduke, yrashk


Issues

Problem: no easy way to figure out if a word returns "optional" values

Optional values are [multiple] values wrapped into a closure. The SOME? and NONE? words refer to this concept.

We do, however, have the CURSOR/* family of words, which don't indicate what they return. Unless you've read the documentation, you can't really know.

Proposed solution: introduce a naming convention for words that return an optional value.

For example, ? as a prefix.

This way TRY can become ?EVAL

> [DUP] ?EVAL

"Soft" cursor words:

> ?CURSOR/NEXT

This can be pronounced as "maybe-EVAL" or "maybe-CURSOR/NEXT". (The boolean counterparts, such as CURSOR/NEXT?, which return a boolean, can be pronounced as "is-CURSOR/NEXT".)

Problem: capturing stack at once is not trivial

This is an upcoming problem. My current thinking is that the server should only be a binary packet streaming server, and those who want to play with PumpkinDB using the text form should use a command-line PumpkinDB client (we can call it pumpkindb-term) that will compile code to the binary form and send it over.

The idea is that when a binary form is received by the server and evaluated, the resulting stack is simply discarded, as there's no way to know just how much of the stack the requestor actually wants.

But the stack will sometimes need to be captured. The only way to communicate back from the script to the requestor (or other parties, for that matter) is to use the pubsub capabilities of the server (not in yet, just something I am thinking about, but I strongly believe this is the way to go). So let's imagine publishing to a pubsub channel would be something like

<data> <channel> SEND

So, what if we need to send the entire stack? How do we capture it?

Proposed solution: introduce a STACK word that will capture the stack and put it as a binary on top of the stack, so that capturing the stack and sending it over will be as simple as

STACK <channel> SEND

Problem: wall clock going back in time between restarts will break HLC guarantee

Basically, it means that after a restart, values returned by HLC will be less than those generated before it.

Proposed solution: let timestamp module persist last generated HLC.

Storing it in the same database is rather intricate: there might be no write txn, the write txn might never succeed, etc. So I propose that PumpkinDB maintain a separate "meta" database within the environment, with its own transactions.

Is there any better solution (perhaps one that avoids the I/O penalty)?
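
As a sketch of the proposed approach in Rust: restore the HLC floor from the persisted value on startup. The Hlc representation and the recovery function here are illustrative stand-ins, not PumpkinDB's actual types.

use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative HLC representation: wall-clock nanoseconds plus a logical counter.
struct Hlc { wall: u64, logical: u32 }

// On startup, take the persisted HLC (read from the proposed "meta" database)
// into account so that newly generated values can never go backwards.
fn restore_hlc(last_persisted: Option<Hlc>) -> Hlc {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() as u64;
    match last_persisted {
        // Wall clock went backwards across the restart: keep the persisted
        // wall time and bump the logical counter instead.
        Some(prev) if prev.wall >= now => Hlc { wall: prev.wall, logical: prev.logical + 1 },
        _ => Hlc { wall: now, logical: 0 },
    }
}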

Problem: impossible to implement script context scope

Here's an example. When we do a storage transaction, it'd be nice to have something like this:

["Key" "Value" ASSOC COMMIT] WRITE

However, ASSOC and COMMIT need to have a reference to their transaction. Even though we can make this transaction serializable to a byte array and push it onto the stack before evaluating the closure, my concern is that this will unnecessarily mess up the stack and make reading it really difficult.

Hence, there needs to be a mechanism in script::VM to provide context to the code that's being evaluated.
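
One possible shape for such a mechanism, sketched in Rust. All names here are hypothetical stand-ins, not the actual script::VM types.

// Sketch: give instruction handlers access to a per-script context
// (e.g. the current write transaction) without pushing it onto the data stack.
struct TxnHandle(u64); // stand-in for an LMDB transaction

struct ScriptContext {
    txn: Option<TxnHandle>, // set by WRITE, read by ASSOC/COMMIT
}

struct Env {
    stack: Vec<Vec<u8>>,
    ctx: ScriptContext, // carried alongside the stack, invisible to scripts
}

fn handle_assoc(env: &mut Env) -> Result<(), &'static str> {
    let txn = env.ctx.txn.as_ref().ok_or("no write transaction in scope")?;
    let value = env.stack.pop().ok_or("empty stack")?;
    let key = env.stack.pop().ok_or("empty stack")?;
    let _ = (txn, key, value); // a real handler would write through the txn
    Ok(())
}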

Problem: penalizing valid closures

Right now, all words that evaluate given closures parse them to ensure they are valid before injecting them into the scheduler. The reason is that, once scheduled, it's impossible to extract them due to possible misinterpretations and the generally unknown size of the extension at evaluation time. However, this carries a significant performance penalty.

Proposed solution: let the scheduler operate not on a single Vec, but on a vector of slices, enabling ejection of an invalid slice (or, if we don't want to allocate code on the stack, a Vec of Vecs; we haven't done this yet in passes for code). The error will still occur, but as a decoding_error rather than an invalid_error, and the code will be ejected.

Problem: impossible to do logical operations

Proposed solution:

  • AND: takes two items off the top of the stack and pushes [1] back if both of them are equal to [1]; otherwise pushes [0]
  • NOT: takes one item off the top of the stack and pushes [1] back if the item is equal to [0], and pushes [0] back if the item is equal to [1]
  • OR: takes two items off the top of the stack and pushes [1] back if either of them is equal to [1]; otherwise pushes [0]
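
A minimal Rust sketch of these semantics, assuming booleans are the one-byte binaries [0] and [1] as described above:

// Each word pops its operands and pushes a one-byte boolean result.
fn and(a: &[u8], b: &[u8]) -> Vec<u8> {
    if a == [1u8] && b == [1u8] { vec![1] } else { vec![0] }
}

fn or(a: &[u8], b: &[u8]) -> Vec<u8> {
    if a == [1u8] || b == [1u8] { vec![1] } else { vec![0] }
}

fn not(a: &[u8]) -> Vec<u8> {
    if a == [0u8] { vec![1] } else { vec![0] }
}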

Problem: hashing primitives needed for equality indexing

In order to avoid storing the whole piece of data in the key (which often won't work because of the key size limitation), equality indexing should write keys like:

[index][value hash] => [key]

Proposed solution: find out what hashing algorithms are typically used by databases for their HASH index types, and implement the most reasonable one.

(Previously, I used SHA-1 there, but I am not sure if it is the best candidate)
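
For illustration, composing such a key could look like the sketch below, using SHA-256 via the sha2 crate as one plausible choice; the issue deliberately leaves the actual algorithm open.

use sha2::{Digest, Sha256};

// key layout: [index][value hash] => [key]
fn index_key(index: &[u8], value: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(index.len() + 32);
    key.extend_from_slice(index);
    key.extend_from_slice(Sha256::digest(value).as_slice());
    key
}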

Problem: working with storage data is not zero-copy

This issue is mentioned in ad6da52

There was a realization that Env<'a> wants all referenced elements on the stack to have the same lifetime 'a. However, the lifetime of the values extracted through lmdb is limited by the lifetime of the transaction.

Potential solution:

This means that right now we have to resort to copying values. However, I suspect there's still a chance we can have an optimization here. We might be able to have a temporary "transaction context" stack that has a different lifetime. The feasibility and consequences of such an idea are still to be researched. But if defined well, this should definitely improve the overall quality and performance of PumpkinDB.

Problem: text protocol is limited

It doesn't allow for streaming or compact communication.

Proposed solution: use binary-frames protocol with binary form encoding and write a CLI tool (pumpkindb-term or whatever it should be called) to be the client REPL.

Also, it would make sense to switch to Mio from Tokio as I think it might fit into our patterns better. But that's to be decided.

Related to #29

Problem: persisting keys in binary script form is bad for traversing

Right now, both keys and values are persisted with their size prefixes (according to the binary form data representation rules). The motivation for this was to avoid allocating memory to put a prefix in front of them upon retrieval.

However, when traversing a range of keys (CURSOR/SEEK and then /NEXT), it'll trip up the cursor when a "composite key" size changes, changing the very beginning of it.

Proposed solution: make the Env stack store references to data without their size prefixes (slices already carry their length), so we can move to persisting data as-is in the database. Having the stack work that way is not a big deal, because we can write size prefixes on demand when sending data back, avoiding allocations there.
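
For illustration, the difference between the two layouts for a small binary, assuming the single-byte length prefix used for sizes up to 120:

// Current on-disk layout: length prefix (one byte for sizes up to 120), then data.
fn with_prefix(data: &[u8]) -> Vec<u8> {
    let mut out = vec![data.len() as u8];
    out.extend_from_slice(data);
    out
}
// Proposed: persist `data` as-is; slices on the Env stack already carry their
// length, and the prefix can be re-added on demand when sending results back.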

Problem: inability to iterate through keys

This is the key functionality required to implement timestamped data, indexing, etc.

Proposed solution: implement a set of CURSOR-related words

[...code...] CURSOR

Cursor-words:

  • CURSOR/FIRST
  • CURSOR/LAST
  • CURSOR/SEEK
  • CURSOR/NEXT
  • CURSOR/PREV

Problem: processing JSON events

In a lot of cases, events are serialized to JSON. We can't really process them for indexing or other needs right now.

Proposed solution: start a JSON collection of words. Here's the starting proposal:

  • JSON? — tests if given data has a valid JSON syntax
  • JSON/OBJECT? — tests if given JSON is an object
  • JSON/STRING? — tests if given JSON is a string
  • JSON/BOOLEAN? — tests if given JSON is a boolean
  • JSON/ARRAY? — tests if given JSON is an array
  • JSON/NULL? — tests if given JSON is a null
  • JSON/NUMBER? — tests if given JSON is a number
  • JSON/HAS? — tests if given JSON has a field with a given name
  • JSON/GET — extract JSON object's field (or array index), as a JSON (if not present, returns null)
  • JSON/SET — sets JSON object's field to a value (or array index), as a JSON
  • JSON/STRING->BINARY — extract a UTF-8 binary from JSON's string ("Hello" -> Hello)
  • JSON/BINARY->STRING — opposite of JSON/STRING->BINARY
  • JSON/NUMBER->BINARY — extract a JSON number as a binary (the serialization format depends on the actual number type, but should always follow the numeric ordering; deferring till 0.2)

TBC

Problem: data needs to be communicated

Instead of collecting "query results" data on the stack, data needs to be sent off elsewhere.

First of all, this will be useful for sending the resulting stack over (see #29).

Proposed solution: implement a pubsub mechanism
where consumers can subscribe to certain keys and
scripts can send data over to those keys.

Something like

<data> <key> SEND

Problem: server will fail if script is invalid

Connected to localhost.
Escape character is '^]'.
["script
Connection closed by foreign host.

On the console:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Incomplete', /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/libcore/result.rs:868

Proposed solution: don't simply unwrap the result of script parsing.
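
A minimal sketch of the fix, with a stand-in parser (the real entry point lives in the pumpkinscript crate):

// Stand-in for the parser: rejects an unterminated string literal.
fn parse(input: &str) -> Result<Vec<u8>, String> {
    if input.matches('"').count() % 2 != 0 {
        return Err("Incomplete".into());
    }
    Ok(input.as_bytes().to_vec()) // a real parser emits the binary form
}

fn handle_input(input: &str) {
    match parse(input) {
        // hand the binary form to the scheduler
        Ok(program) => { let _ = program; }
        // reply with an error and keep serving, instead of unwrapping and panicking
        Err(e) => eprintln!("parse error: {}", e),
    }
}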

Problem: handle_set placement allows to override some words

                           self => handle_set,
                           // storage
                           self.storage => handle_write,
                           self.storage => handle_read,
...

Because handle_set checks Env's dictionary for unknown words, all words checked below the handle_set entry can be overridden.

Proposed solution: move it to the bottom of the list and add a comment saying it shouldn't be moved.

Problem: SEND implementation is blocking

Right now, because of how pubsub::PublisherAccessor.send is implemented, it'll wait until the message has fully gone through, which blocks the scheduler until then.

This is, of course, unacceptable!

Proposed solution: have a send_async that returns a receiver, if receiver's try_recv on the next round is not yielding anything, keep rescheduling.
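
A rough sketch of this pattern using std::sync::mpsc; the actual pubsub::PublisherAccessor API may differ.

use std::sync::mpsc::{channel, Receiver, TryRecvError};

// Kick off the delivery and hand back a receiver for the acknowledgement.
fn send_async(data: Vec<u8>) -> Receiver<()> {
    let (done_tx, done_rx) = channel();
    std::thread::spawn(move || {
        // deliver `data` to subscribers, then acknowledge
        let _ = data;
        let _ = done_tx.send(());
    });
    done_rx
}

// Scheduler side: poll without blocking; if the ack isn't in yet,
// reschedule this instruction for the next round.
fn poll(done: &Receiver<()>) -> bool {
    match done.try_recv() {
        Ok(()) => true,                          // done, continue the program
        Err(TryRecvError::Empty) => false,       // not yet, keep rescheduling
        Err(TryRecvError::Disconnected) => true, // publisher gone; treat as done
    }
}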

Problem: impossible to write multiline scripts in pumpkindb-term

This is very inconvenient when you want to write something large or paste a multi-line script.

Proposed solution: make pumpkindb-term read lines until it sees a period at the very end. This should work in a way that allows modifying the lines above and below within that one input (similar to how a shell behaves when you type \, or even better).

Problem: heap [re]allocation failure will lead to a process crash

Right now, the returned pointer is not checked for being a null pointer (which indicates failure), so the returned slice will point to unallocated memory, crashing the entire process once accessed.

Proposed solution: make Env.alloc() and the Env initializers return a Result, returning an Err if [re]allocation failed, so the error can be caught and properly handled.
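
A sketch of the checked-allocation idea (names are illustrative, not Env's actual internals):

use std::alloc::{realloc, Layout};

// `layout` must describe the existing allocation at `ptr`, per realloc's contract.
unsafe fn grow(ptr: *mut u8, layout: Layout, new_size: usize) -> Result<*mut u8, &'static str> {
    let new_ptr = realloc(ptr, layout, new_size);
    if new_ptr.is_null() {
        Err("[re]allocation failed") // surface it instead of handing out a dangling slice
    } else {
        Ok(new_ptr)
    }
}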

Problem: Server runs out of memory when parsing invalid input

There is a bug in the parser which will cause memory exhaustion:

$ cargo run --bin pumpkindb                                                                                                                                       
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/pumpkindb`
Available disk space is approx. 392Gb, setting database map size to it
Listening on 0.0.0.0:9981
fatal runtime error: out of memory

In another terminal

$ telnet localhost 9981                                                                                                                                      
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
["script
Connection closed by foreign host.

Problem: server is blocked by channel recv

Currently, Service for server::PlainServer resorts to using channel recv to get a result from the VM. While it's currently "fast enough", this is not great.

Proposed solution: switch it to futures.

Problem: doc tests are not executable and don't verify much

Most of the time I end up just copying examples as is into the examples section.

Proposed solution: write actual doctests as a set of programs (builtins format) that should return 0 or 1 at the top of the stack

Something like

test_swap : 1 2 SWAP 2 WRAP 2 1 2 WRAP EQUAL?.

Eventually, this can be used to run them automatically.

Problem: collecting query results in a streaming fashion

We can't really wait for the stack to be complete before sending the results back (nor would that be efficient in any way), so we need to figure out a way to do this.

Proposed solution:

Implement a YIELD word (val -- ), the results of which can be collected at runtime, for example:

[ [ [... YIELD] CURSOR] READ] "queue_name" ->QUEUE

or

[ [ [... YIELD] CURSOR] READ] "queue_name" ->STACK

(the latter would collect values and put them into the stack at once).

This can be implemented as a field in Env (for example) holding a stack of YieldConsumers or something like that. The top one would be a simple discarding consumer.

Problem: not easy to iterate over cursors

It requires a little bit of ceremony.

Proposed solution: introduce helper words:

CURSOR/DOWHILE : ['iterator SET 'closure SET 'c SET
                   [c closure EVAL [c iterator EVAL] [0] IFELSE] DOWHILE] EVAL/SCOPED.
?CURSOR/DOWHILE : ['iterator SET 'closure_ SET 'c SET
                   c [?CURSOR/CUR closure_ EVAL] iterator CURSOR/DOWHILE] EVAL/SCOPED.
?CURSOR/DOWHILE-PREFIXED : ['closure__ SET
                            'prefix SET
                            CURSOR 'c SET
                            c prefix CURSOR/SEEK?
                            [c [UNWRAP OVER 0 prefix LENGTH SLICE prefix EQUAL?
                              closure__ [0] IFELSE
                            ] 'CURSOR/NEXT? ?CURSOR/DOWHILE]
                            IF] EVAL/SCOPED.

Problem: EnvHeap allocator is wasteful

When it tries to allocate, if the size doesn't fit into the remainder of the chunk, it allocates a new chunk and allocates there. Next time around, though, it only looks at the last chunk and therefore wastes all the space left in previous chunks.

Proposed solution: let the allocator scan through the chunks to see if any of them have enough space left. This way we can saturate chunks a lot better.
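
A sketch of the proposed first-fit scan over an illustrative Chunk type:

struct Chunk { data: Vec<u8>, used: usize }

// Returns (chunk index, offset) of the allocation.
fn alloc(chunks: &mut Vec<Chunk>, size: usize, chunk_size: usize) -> (usize, usize) {
    // First fit: reuse any earlier chunk that still has enough space left,
    // instead of only looking at the last one.
    for (i, chunk) in chunks.iter_mut().enumerate() {
        if chunk.data.len() - chunk.used >= size {
            let offset = chunk.used;
            chunk.used += size;
            return (i, offset);
        }
    }
    // No chunk fits: allocate a fresh one.
    chunks.push(Chunk { data: vec![0; chunk_size.max(size)], used: size });
    (chunks.len() - 1, 0)
}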

Problem: querying "the last update" isn't trivial

Since PumpkinDB doesn't allow overriding keys, one has to write to a new key every time something is recorded. We have at least one primitive for that right now: HLC. Since it is guaranteed to generate unique and monotonically growing values, these can be used to sequence any collection of values. All one has to do is compose the key this way:

["key" HLC CONCAT "value" ASSOC COMMIT] WRITE

Similarly, things like journalling events can be done the same way:

["journal" HLC CONCAT "id" "value" 2 WRAP ASSOC COMMIT] WRITE

However, how do we quickly find the last element in the collection?

Proposed solution: introduce a number of words to work with this concept. For now, let's refer to it as an "ordered collection" (ORDCOLL), but we might want to find a better name.

I can think of some of the primitives:

ORDCOLL/PAIR : ROT ROT SWAP EVAL CONCAT SWAP

This is used to produce key value pairs for ordered collections:

>[[HLC] "testkey" 1 ORDCOLL/PAIR ASSOC COMMIT] WRITE
>[[HLC] "testkey" 2 ORDCOLL/PAIR ASSOC COMMIT] WRITE
>[[HLC] "testkey" 3 ORDCOLL/PAIR ASSOC COMMIT] WRITE
>[[HLC] "testkey" "Hello" ORDCOLL/PAIR ASSOC COMMIT] WRITE

The reason it takes the value in is to make it more future-proof (what if this or other collections do something with the value, too?).

ORDCOLL/LAST : 2DUP CONCAT SWAP DROP CURSOR DUP ROT CURSOR/SEEK? [DUP CURSOR/CUR UNWRAP DROP ROT DUP LENGTH SWAP ROT ROT 0 SWAP SLICE EQUAL? [CURSOR/CUR] [CURSOR/PREV] IFELSE ] [CURSOR/LAST SWAP DROP] IFELSE UNWRAP SWAP DROP

This one returns the last element in the collection:

> ["testkey" HLC/MAX ORDCOLL/LAST] READ
"Hello"

I can also see ORDCOLL/FIRST being added for completeness (although it should be relatively trivial).

P.S. Keep in mind that the above implementations haven't been thoroughly tested!

Problem: set of stack words is incomplete

While following Forth to the letter isn't our goal (many stack operations can be implemented using the very basic ones), ignoring its experience isn't particularly wise either.

Proposed solution:

Implement the following words:

  • NIP
  • TUCK
  • 2DUP
  • 2DROP
  • 2SWAP
  • 2ROT
  • 2NIP
  • 2TUCK

I'm personally less happy about PICK and ROLL, as they make one feel like the stack is an array, but are there good arguments to implement them?
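
For reference, two of the proposed words sketched over a Vec-backed stack, using the usual Forth stack effects NIP (a b -- b) and TUCK (a b -- b a b):

fn nip(stack: &mut Vec<Vec<u8>>) -> Result<(), &'static str> {
    let b = stack.pop().ok_or("stack underflow")?;
    stack.pop().ok_or("stack underflow")?; // drop a
    stack.push(b);
    Ok(())
}

fn tuck(stack: &mut Vec<Vec<u8>>) -> Result<(), &'static str> {
    let b = stack.pop().ok_or("stack underflow")?;
    let a = stack.pop().ok_or("stack underflow")?;
    stack.push(b.clone()); // copy of b goes under a
    stack.push(a);
    stack.push(b);
    Ok(())
}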

Problem: no conditional control flow word

It's impossible to do something if some condition is true or false.

Proposed solution: IF and IF_ELSE words.

IF would take cond code and eval code if cond is [1].
IF_ELSE would take cond code code_else and eval code if cond is [1], or eval code_else otherwise.

Examples

Exit if false:

... NOT [EXIT] IF

Duplicate if true, drop if false

... [DUP] [DROP] IF_ELSE

Problem: allocating binaries of a certain length

I ran into this issue while working on a simple algorithm for finding the last item in an ordered collection. Creating a binary of N bytes (0xFF) wastes cycles and requires lots of allocations.

Proposed solution: a size byte ALLOC word that would allocate size bytes with the value byte in one go.
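
In Rust terms, the underlying operation amounts to a single filled allocation; a minimal sketch:

// One filled allocation instead of repeated CONCATs.
fn alloc_filled(size: usize, byte: u8) -> Vec<u8> {
    vec![byte; size]
}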

Feature: scoped_dictionary

Introduced through #67

Problem: implementing words without subwords is difficult

Technically speaking, loaded words can define their own words, but those will leak into the remainder of the program, which is less than ideal.

Using the stack alone requires a significant amount of juggling that makes writing code incredibly frustrating. I personally believe that the value of stack-based programming languages is in their concatenative abilities (composition via the stack), not just being able to do everything by juggling items on the stack.

Solution: make all SETs and DEFs done within any closure local to that closure.

This means that if we eventually need to do code injection that's not injecting what effectively amounts to closures, we'll need a separate pass result type that can indicate that.

Problem: compact numeric byte arrays are encoded with 2 bytes

The most annoying case is the upcoming usage of "boolean" values, [0] and [1]. Right now they are encoded as [1, 0] and [1, 1], respectively.

Proposed solution: allocate a band for very small numbers, at the very least 0 and 1 (as these are the most expected small numbers). Under the current scheme, anything in the [124u8..128u8] range can be used.
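
A hypothetical encoding under that proposal; the concrete byte assignments are assumptions, not a decided scheme:

// Hypothetical single-byte encodings for [0] and [1] taken from the free band;
// 124 and 125 are arbitrary picks here, not an agreed assignment.
fn encode_bool(b: bool) -> Vec<u8> {
    vec![if b { 125u8 } else { 124 }]
}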

Problem: small numbers range 0..120 is arbitrary

It leaves very little reserved space (only the range between 123 and 127, because words start at 128) and the choice makes no sense: why 120?

Proposed solution: decrease this range to 0..99; let 100..110 represent the numbers 0..10 themselves; move the 121-123 prefixes down to 111-113; and let 114-127 become the new reserved pool. This is also related to #1.

Problem: impossible to extract a portion of a byte array

This is important when (for example) comparing key prefixes

Proposed solution: a SLICE word that would take a byte array, a start index and an end index, and push back a slice of that array. In conjunction, a LENGTH word would be very useful, making it easy to implement something like SLICE_FROM that always slices to the end.
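
A sketch of the intended semantics:

// SLICE: byte array, start index, end index -> sub-array.
fn slice(data: &[u8], start: usize, end: usize) -> Result<&[u8], &'static str> {
    if start > end || end > data.len() {
        return Err("invalid slice bounds");
    }
    Ok(&data[start..end])
}
// LENGTH is data.len(); SLICE_FROM is then slice(data, start, data.len()).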

Problem: server's frame size is too large

I believe a 64-bit frame size is just too much, as 32 bits would have sufficed: is there any reason to send more than 4GB in a single frame?

Proposed solution: switch to 32-bit sizes.
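
A sketch of a 32-bit length-prefixed frame; the actual PumpkinProtocol framing may differ:

// 32-bit big-endian length prefix followed by the payload.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}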
