pot's Introduction

Pot

A concise storage format, written for BonsaiDb.

Pot is an encoding format used within BonsaiDb. Its purpose is to provide a serde format that:

  • Is self-describing.

  • Is safe to run in production.

  • Is compact. While remaining self-describing, Pot saves space mainly by never writing the same symbol/identifier more than once while serializing. When serializing arrays of structures, this can make a major difference. The logs.rs example demonstrates this:

    $ cargo test --example logs -- average_sizes --nocapture
    Generating 1000 LogArchives with 100 entries.
    +-----------------+-----------+-----------------+
    | Format          | Bytes     | Self-Describing |
    +-----------------+-----------+-----------------+
    | pot             | 2,627,586 | yes             |
    +-----------------+-----------+-----------------+
    | cbor            | 3,072,369 | yes             |
    +-----------------+-----------+-----------------+
    | msgpack(named)  | 3,059,915 | yes             |
    +-----------------+-----------+-----------------+
    | msgpack         | 2,559,907 | no              |
    +-----------------+-----------+-----------------+
    | bincode(varint) | 2,506,844 | no              |
    +-----------------+-----------+-----------------+
    | bincode         | 2,755,137 | no              |
    +-----------------+-----------+-----------------+
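The idea of writing each identifier only once can be illustrated with a toy encoder. This is a minimal sketch using only the standard library; the byte layout and the function names `encode_interned` and `encode_repeated` are assumptions made for illustration and are not Pot's actual wire format.

```rust
use std::collections::HashMap;

/// Toy encoder: writes a field name in full the first time it appears and a
/// one-byte id on every later occurrence. This is NOT Pot's wire format.
fn encode_interned(records: &[Vec<(&str, u8)>]) -> Vec<u8> {
    let mut symbols: HashMap<String, u8> = HashMap::new();
    let mut out = Vec::new();
    for record in records {
        for (name, value) in record {
            match symbols.get(*name) {
                // Seen before: tag 1 followed by the symbol's id.
                Some(id) => out.extend_from_slice(&[1, *id]),
                // First occurrence: tag 0, length, then the name bytes.
                None => {
                    let id = symbols.len() as u8;
                    symbols.insert(name.to_string(), id);
                    out.push(0);
                    out.push(name.len() as u8);
                    out.extend_from_slice(name.as_bytes());
                }
            }
            out.push(*value);
        }
    }
    out
}

/// Baseline encoder: repeats every field name in every record.
fn encode_repeated(records: &[Vec<(&str, u8)>]) -> Vec<u8> {
    let mut out = Vec::new();
    for record in records {
        for (name, value) in record {
            out.push(name.len() as u8);
            out.extend_from_slice(name.as_bytes());
            out.push(*value);
        }
    }
    out
}

fn main() {
    // 100 identical-shaped records, similar in spirit to an array of structs.
    let records = vec![vec![("level", 1u8), ("message", 2u8)]; 100];
    println!("repeated: {} bytes", encode_repeated(&records).len());
    println!("interned: {} bytes", encode_interned(&records).len());
}
```

With many records of the same shape, the interned output pays for each field name once, which is the effect the table above measures for real formats.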

Example

use serde_derive::{Deserialize, Serialize};
#[derive(Serialize, Deserialize, Debug, Eq, PartialEq)]
pub struct User {
    id: u64,
    name: String,
}

fn main() -> Result<(), pot::Error> {
    let user = User {
        id: 42,
        name: String::from("ecton"),
    };
    let serialized = pot::to_vec(&user)?;
    println!("User serialized: {serialized:02x?}");
    let deserialized: User = pot::from_slice(&serialized)?;
    assert_eq!(deserialized, user);

    // Pot also provides a "Value" type for deserializing Pot-encoded payloads
    // without needing the original structure.
    let user: pot::Value<'_> = pot::from_slice(&serialized)?;
    println!("User decoded as value: {user}");

    Ok(())
}

Outputs:

User serialized: [50, 6f, 74, 00, a2, c4, 69, 64, 40, 2a, c8, 6e, 61, 6d, 65, e5, 65, 63, 74, 6f, 6e]
User decoded as value: {id: 42, name: ecton}
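The first four bytes of the serialized output (50, 6f, 74, 00) are the ASCII text "Pot" followed by a zero byte. As a rough sketch, assuming the header is always those three letters plus a single version byte, it can be split from the payload with the standard library alone. The helper name `split_header` is hypothetical and is not part of the pot crate.

```rust
/// Splits a buffer into (version byte, payload), assuming a 4-byte header of
/// `b"Pot"` followed by one version byte. Hypothetical helper, not pot's API.
fn split_header(bytes: &[u8]) -> Option<(u8, &[u8])> {
    let rest = bytes.strip_prefix(b"Pot")?;
    let (version, payload) = rest.split_first()?;
    Some((*version, payload))
}

fn main() {
    // The bytes printed by the example above.
    let serialized = [
        0x50, 0x6f, 0x74, 0x00, 0xa2, 0xc4, 0x69, 0x64, 0x40, 0x2a, 0xc8, 0x6e,
        0x61, 0x6d, 0x65, 0xe5, 0x65, 0x63, 0x74, 0x6f, 0x6e,
    ];
    match split_header(&serialized) {
        Some((version, payload)) => {
            println!("version: {version}, payload: {payload:02x?}");
        }
        None => println!("not a Pot header"),
    }
}
```

The field names and the string value are visible in the payload as plain ASCII: 69 64 is "id", 6e 61 6d 65 is "name", and 65 63 74 6f 6e is "ecton".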

Benchmarks

Because benchmarks can be subjective and often don't mirror real-world usage, this library's authors aren't making any specific performance claims. The way Pot achieves space savings requires some computational overhead. As such, it is expected that a hypothetically perfect CBOR implementation could outperform a hypothetically perfect Pot implementation.

The results from the current benchmark suite executed on GitHub Actions are viewable here. The current suite is only aimed at comparing the default performance for each library.

Serialize into new Vec<u8>

[Figure: Serialize Benchmark Violin Chart]

Serialize into reused Vec<u8>

[Figure: Serialize with Reused Buffer Benchmark Violin Chart]

Deserialize

[Figure: Deserialize Benchmark Violin Chart]

Open-source Licenses

This project, like all projects from Khonsu Labs, is open-source. This repository is available under the MIT License or the Apache License 2.0.

To learn more about contributing, please see CONTRIBUTING.md.

pot's People

Contributors

baptiste0928, ecton, wackbyte

pot's Issues

`Value::from_serialize` panics when serialization returns an error

Example:

use pot::Value;
use serde::{Serialize, Serializer};

#[derive(Debug)]
struct Fallible;

impl Serialize for Fallible {
    fn serialize<S>(&self, _serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        Err(serde::ser::Error::custom("oh no!"))
    }
}

fn main() {
    let value = Value::from_serialize(Fallible); // panics here
    println!("{value:?}");
}

Serialization can return an error for types such as Path, whose Serialize implementation serializes it as a string and errors when the path contains invalid UTF-8.

The panic occurs because the Infallible error type invokes unreachable!() in its implementation of serde::ser::Error::custom.

Investigate supporting `serde(flatten)`

@ModProg reports that using serde(flatten) fails with a SequenceSizeMustBeKnown error. I remembered specifically looking at Bincode when implementing my sequence serialization code. Turns out that Bincode doesn't support serde(flatten) either.

However, I believe we should be able to add support for this, possibly without introducing a new format revision.

Docs say `Float` must be 4 or 8 bytes but it can be 2

pot/pot/src/format.rs

Lines 157 to 158 in 7a1ac59

/// A floating point value. Argument is the byte length, minus one. Must be
/// either 4 or 8 bytes. The following bytes are the value, stored in little

let f16 = pot::to_vec(&1.5).unwrap();
println!("f16: {f16:?}"); //  f16: [80, 111, 116, 0, 97, 0, 62]

80, 111, 116, 0 is the header and 97 is the atom, so the number itself occupies only 2 bytes: 0, 62.
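To confirm that those two bytes really are a half-precision 1.5, they can be decoded by hand. The sketch below assumes the bytes are little-endian IEEE 754 binary16, as the quoted doc comment suggests; `f16_bits_to_f32` is a hypothetical helper written for this illustration, not a function from the pot crate.

```rust
/// Converts IEEE 754 binary16 bits to an f32.
/// Hypothetical helper for illustration; not part of the pot crate.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exponent = ((bits >> 10) & 0x1f) as i32;
    let mantissa = (bits & 0x03ff) as f32;
    match exponent {
        // Subnormal: no implicit leading one, fixed exponent of -14.
        0 => sign * (mantissa / 1024.0) * 2f32.powi(-14),
        // All-ones exponent: infinity or NaN.
        0x1f => if mantissa == 0.0 { sign * f32::INFINITY } else { f32::NAN },
        // Normal: implicit leading one, exponent bias of 15.
        e => sign * (1.0 + mantissa / 1024.0) * 2f32.powi(e - 15),
    }
}

fn main() {
    // The two value bytes from the issue, after the header and the atom byte.
    let payload = [0u8, 62];
    // Assumption: the value is stored little-endian, giving bits 0x3E00.
    let bits = u16::from_le_bytes(payload);
    println!("bits: {bits:#06x}, value: {}", f16_bits_to_f32(bits));
}
```

The bits 0x3E00 have a zero sign bit, an exponent field of 15 (unbiased 0), and a mantissa of 512/1024, giving (1 + 0.5) × 2⁰ = 1.5.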

Add optional user-controlled version tag

One common problem with long-lived serialized types is version compatibility. Sometimes, it's best to just start fresh with a structure design. I realized that since Pot has a built-in version header, we could adopt a new version that supports encoding a single integer value at the front of the Pot file.

Parsing a Pot file would use an API like this:

pot::from_slice(&data, |version, deserializer| {
    match version {
        0 => deserializer.deserialize::<V0Struct>().map(CurrentStruct::from),
        1 => deserializer.deserialize::<CurrentStruct>(),
        _ => Err(pot::Error::UnknownVersion)
    }
})?

I'm sure an even better pattern can be devised, but that snippet should get across the idea of what I want to accomplish. Parsing a file with an older header would call the callback with 0, allowing someone to upgrade to this new flow from existing data without any breaking changes.
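The dispatch-and-upgrade pattern itself can be sketched without Pot. Everything below is hypothetical: the `V0Struct` and `CurrentStruct` fields, the `Error` enum, and the toy layout (one leading version byte followed by a hand-rolled body) are assumptions for illustration, and none of it reflects an existing or planned pot API.

```rust
#[derive(Debug, PartialEq)]
struct V0Struct {
    name: String,
}

#[derive(Debug, PartialEq)]
struct CurrentStruct {
    name: String,
    visits: u32,
}

impl From<V0Struct> for CurrentStruct {
    // Upgrade old data by filling the new field with a default.
    fn from(old: V0Struct) -> Self {
        CurrentStruct { name: old.name, visits: 0 }
    }
}

#[derive(Debug, PartialEq)]
enum Error {
    Empty,
    Malformed,
    UnknownVersion(u8),
}

// Toy v0 layout: the body is just the UTF-8 name.
fn parse_v0(body: &[u8]) -> Result<V0Struct, Error> {
    let name = String::from_utf8(body.to_vec()).map_err(|_| Error::Malformed)?;
    Ok(V0Struct { name })
}

// Toy v1 layout: 4-byte little-endian visit count, then the UTF-8 name.
fn parse_current(body: &[u8]) -> Result<CurrentStruct, Error> {
    if body.len() < 4 {
        return Err(Error::Malformed);
    }
    let (head, tail) = body.split_at(4);
    let visits = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    let name = String::from_utf8(tail.to_vec()).map_err(|_| Error::Malformed)?;
    Ok(CurrentStruct { name, visits })
}

/// Reads the leading version byte and dispatches to the matching parser,
/// upgrading older versions to `CurrentStruct`.
fn load(data: &[u8]) -> Result<CurrentStruct, Error> {
    let (version, body) = data.split_first().ok_or(Error::Empty)?;
    match *version {
        0 => parse_v0(body).map(CurrentStruct::from),
        1 => parse_current(body),
        other => Err(Error::UnknownVersion(other)),
    }
}

fn main() {
    println!("{:?}", load(&[0, b'a', b'b']));
    println!("{:?}", load(&[1, 7, 0, 0, 0, b'a', b'b']));
    println!("{:?}", load(&[9]));
}
```

Keeping the conversion in a `From` implementation means each old version only needs one upgrade path to the current type.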
