hit9 / bitproto

The bit level data interchange format for serializing data structures (long term maintenance).

Home Page: https://bitproto.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Python 8.18% C 90.26% Go 0.68% Makefile 0.54% Assembly 0.28% Vim Script 0.06%

bitproto protocol data-interchange data-exchange embedded serialization marshalling serialization-library

bitproto's Introduction

The bit level data interchange format

Introduction

Bitproto is a fast, lightweight and easy-to-use bit level data interchange format for serializing data structures.

The schema syntax looks like the well-known Protocol Buffers, but works at the bit level:

proto example

message Data {
    uint3 the = 1
    uint3 bit = 2
    uint5 level = 3
    uint4 data = 4
    uint11 interchange = 6
    uint6 format = 7
}  // 32 bits => 4B

The Data above is called a message. It consists of 6 fields (field number 5 is unused) and occupies a total of 4 bytes (32 bits) after encoding.
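The 4-byte figure follows from adding the bit widths of the six fields and rounding up to whole bytes. A minimal sketch, assuming the fields are laid out back to back in the order listed with no gaps; data_layout and bytes_for_bits are hypothetical helpers, not part of the generated code:

```c
#include <stddef.h>

/* Bit widths of the fields of message Data, in the order listed. */
static const unsigned kDataWidths[] = {3, 3, 5, 4, 11, 6};
#define DATA_NFIELDS (sizeof kDataWidths / sizeof kDataWidths[0])

/* Store the bit offset of each field in offsets[] (fields placed back to
 * back, no gaps) and return the total number of bits. */
static unsigned data_layout(unsigned offsets[DATA_NFIELDS]) {
    unsigned bit = 0;
    for (size_t i = 0; i < DATA_NFIELDS; i++) {
        offsets[i] = bit;
        bit += kDataWidths[i];
    }
    return bit;
}

/* Number of whole bytes needed to hold nbits (rounds up). */
static unsigned bytes_for_bits(unsigned nbits) { return (nbits + 7) / 8; }
```

For Data this gives offsets 0, 3, 6, 11, 15 and 26, a total of 32 bits, and therefore 4 bytes.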

This image shows the layout of data fields in the encoded bytes buffer:

Code Example

Code example to encode bitproto message in C:

struct Data data = {.the = 7,
                    .bit = 7,
                    .level = 31,
                    .data = 15,
                    .interchange = 2047,
                    .format = 63};
unsigned char s[BYTES_LENGTH_DATA] = {0};
EncodeData(&data, s);
// length of s is 4, and the hex format is
// 0xFF 0xFF 0xFF 0xFF

And the decoding example:

struct Data d = {0};
DecodeData(&d, s);
// values of d's fields are now:
// 7 7 31 15 2047 63

Simple and green, isn't it?

The encoding code follows the same pattern in C, Go and Python.
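To see where each decoded value comes from, here is a hand-written decoder for the Data message. It is not the generated DecodeData; it is an illustration that assumes LSB-first bit order (bit 0 is the least significant bit of the first byte), which is consistent with the byte dumps reported in the issues on this page. get_bits, struct DataFields and decode_data_sketch are hypothetical names.

```c
#include <stdint.h>

/* Read `width` bits (width <= 32) starting at bit `offset` of buf,
 * where bit 0 is the least significant bit of buf[0] (LSB-first). */
static uint32_t get_bits(const unsigned char *buf, unsigned offset, unsigned width) {
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = offset + i;
        uint32_t b = (uint32_t)((buf[bit / 8] >> (bit % 8)) & 1u);
        value |= b << i;
    }
    return value;
}

/* Hypothetical plain struct mirroring the fields of message Data. */
struct DataFields { uint32_t the, bit, level, data, interchange, format; };

/* Read each field from its fixed bit offset in the 4-byte buffer. */
static void decode_data_sketch(const unsigned char s[4], struct DataFields *d) {
    d->the         = get_bits(s, 0, 3);
    d->bit         = get_bits(s, 3, 3);
    d->level       = get_bits(s, 6, 5);
    d->data        = get_bits(s, 11, 4);
    d->interchange = get_bits(s, 15, 11);
    d->format      = get_bits(s, 26, 6);
}
```

Decoding the bytes 0xFF 0xFF 0xFF 0xFF with this sketch yields 7 7 31 15 2047 63, the same values as the DecodeData example.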

Features

Supports bit level data serialization, born for embedded development.
Supports protocol extensibility, for forward compatibility.
Easy to start: the syntax is similar to the well-known protobuf.
Supports languages: C (without dynamic memory allocation), Go, Python.
Blazing fast encoding/decoding (see benchmark).
The size and arrangement of encoded data are known exactly; fields are packed without a single bit gap.

Schema Example

An example for a simple overview of the bitproto schema grammar:

proto pen

// Constant value
const PEN_ARRAY_SIZE = 2 * 3;

// Bit level enum.
enum Color : uint3 {
    COLOR_UNKNOWN = 0
    COLOR_RED = 1
    COLOR_BLUE = 2
    COLOR_GREEN = 3
}

// Type alias
type Timestamp = int64

// Composite structure
message Pen {
    Color color = 1
    Timestamp produced_at = 2
    uint3 number = 3
    uint13 value = 4
}

message Box {
    // Fixed-size array
    Pen[PEN_ARRAY_SIZE] pens = 1;
}

Run the bitproto compiler to generate C files:

$ bitproto c pen.bitproto

This generates two files: pen_bp.h and pen_bp.c.

Here is an overview of the generated C code:

// Constant value
#define PEN_ARRAY_SIZE 6

// Bit level enum.
typedef uint8_t Color; // 3bit

#define COLOR_UNKNOWN 0
#define COLOR_RED 1
#define COLOR_BLUE 2
#define COLOR_GREEN 3

// Type alias
typedef int64_t Timestamp; // 64bit

// Number of bytes to encode struct Pen
#define BYTES_LENGTH_PEN 11

// Composite structure
struct Pen {
    Color color; // 3bit
    Timestamp produced_at; // 64bit
    uint8_t number; // 3bit
    uint16_t value; // 13bit
};

// Number of bytes to encode struct Box
#define BYTES_LENGTH_BOX 63

struct Box {
    // Fixed-size array
    struct Pen pens[6]; // 498bit
};
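The BYTES_LENGTH values in the generated header follow directly from the bit widths annotated in the comments, rounded up to whole bytes:

```latex
\begin{aligned}
\text{Pen:}\quad & 3 + 64 + 3 + 13 = 83 \text{ bits}, && \lceil 83/8 \rceil = 11 \text{ bytes} \\
\text{Box:}\quad & 6 \times 83 = 498 \text{ bits}, && \lceil 498/8 \rceil = 63 \text{ bytes}
\end{aligned}
```

Box needs 63 bytes rather than 6 × 11 = 66, which is consistent with the array elements being packed bit by bit instead of each Pen being rounded up to whole bytes.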

See the example directory for a larger example.

Why bitproto?

There is protobuf, why bitproto?

Origin

bitproto was originally made while I was working on embedded programs for microcontrollers, where there are usually many programming constraints:

tight communication size.
limited compiled code size.
preferably no dynamic memory allocation.

Protobuf does not target the embedded field natively; it doesn't support ANSI C out of the box.

Scenario

It's recommended to use bitproto over protobuf when:

Working on or with microcontrollers.
You want bit-level message fields.
You want to know exactly how many bytes the encoded data will occupy.

For other scenarios, I recommend using protobuf over bitproto.

Vs Protobuf

The differences between bitproto and protobuf are:

bitproto supports bit level data serialization, like bit fields in C.
bitproto doesn't use any dynamic memory allocation. Few protobuf C implementations support this; nanopb is an exception.
bitproto doesn't support variable-sized data; all types are fixed-size.

bitproto doesn't encode type or size reflection information into the buffer. It encodes only the data itself, with no additional data. The encoded data is arranged the way it would be in memory: fixed size and without padding. Think of setting the aligned attribute to 1 on structs in C.
Protobuf works well for forward compatibility. For bitproto, this was the main shortcoming of its serialization until v0.4.0. Since that version, bitproto supports message extensibility by adding two bytes that indicate the message size at the head of the message's encoded buffer. This breaks the traditional data layout design by encoding some minimal size information, so it is designed as an optional feature.

Known Shortcomings

bitproto doesn't support variable-sized types. For example, a uint37 always occupies 37 bits, even if you assign it a small value like 1.

This means there will be lots of zero bytes if the meaningful data occupies only a small part of the type. For instance, n-1 bytes will be left zero if only one byte of an n-byte type is used.

Generally, this doesn't matter much, since not many bytes are exchanged in communication with embedded devices, and the protocol itself is meant to be tight and compact. Consider wrapping a compression mechanism like zlib around the encoded buffer if you really care.
bitproto can't provide the best encoding performance together with extensibility.

There's an optimization mode in bitproto that generates plain encoding/decoding statements directly at code-generation time: since all types in bitproto are fixed-size, how to encode them can be determined in advance. This mode gives a huge performance improvement, but I still haven't found a way to make it work together with bitproto's extensibility mechanism.
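To illustrate what plain encoding statements can look like, here is a hand-written straight-line encoder for the Data message from the introduction. It is a sketch, not bitproto's actual generated output: it assumes LSB-first bit order (bit 0 is the least significant bit of s[0]), and struct DataFields and encode_data_unrolled are hypothetical names. Every shift and mask is a constant, so no per-bit loop or runtime field descriptor is needed.

```c
#include <stdint.h>

/* Hypothetical plain struct mirroring the fields of message Data. */
struct DataFields { uint32_t the, bit, level, data, interchange, format; };

/* Straight-line encoder: offsets and widths are fixed, so each output
 * byte is computed with constant shifts and masks (assumed LSB-first). */
static void encode_data_unrolled(const struct DataFields *d, unsigned char s[4]) {
    s[0] = (unsigned char)((d->the & 0x7u)                  /* bits 0-2 */
                         | ((d->bit & 0x7u) << 3)           /* bits 3-5 */
                         | ((d->level & 0x3u) << 6));       /* level bits 0-1 */
    s[1] = (unsigned char)(((d->level >> 2) & 0x7u)         /* level bits 2-4 */
                         | ((d->data & 0xFu) << 3)          /* data, bits 11-14 */
                         | ((d->interchange & 0x1u) << 7)); /* interchange bit 0 */
    s[2] = (unsigned char)((d->interchange >> 1) & 0xFFu);  /* interchange bits 1-8 */
    s[3] = (unsigned char)(((d->interchange >> 9) & 0x3u)   /* interchange bits 9-10 */
                         | ((d->format & 0x3Fu) << 2));     /* format, bits 26-31 */
}
```

With every field at its maximum value it produces 0xFF 0xFF 0xFF 0xFF, the same bytes as the EncodeData example in the introduction.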

Documentation and Links

Documentation:

Website: https://bitproto.readthedocs.io
Documentation in Chinese: https://bitproto.readthedocs.io/zh/latest
Quick start tutorial
Grammar guide, in one page

Editor syntax highlighting plugins:

FAQ:

What’s the advantage of this over a bit field?

Blog posts:

Dev notes in Chinese: https://writings.sh/post/bitproto-notes

License

BSD3

bitproto's People

Contributors

Stargazers

Watchers

Forkers

haifenghuang baojiweicn uttaravadina cattle-call fatty-bird fs000x waszil solarcaratuva maybelaterornot sec-fork mikimotoh tordf aiyou0731 thecupcakeisalie faustpy lanjackg2003

bitproto's Issues

bitproto.c compilation errors using TI's CCS.

I made a simple library project in Code Composer Studio (v12.4) using the default TI toolchain.
Unfortunately bitproto.c (v1.1.1) does not compile, with the following errors:

"../src/bitproto.c", line 50: error #29: expected an expression
"../src/bitproto.c", line 50: error #20: identifier "k" is undefined
"../src/bitproto.c", line 155: error #29: expected an expression
"../src/bitproto.c", line 155: error #20: identifier "k" is undefined
"../src/bitproto.c", line 165: error #29: expected an expression
"../src/bitproto.c", line 165: error #20: identifier "k" is undefined
"../src/bitproto.c", line 402: remark #1532-D: (ULP 5.3) Detected vsprintf() operation(s). Recommend moving them to RAM during run time or not using as these are processing/power intensive
"../src/bitproto.c", line 402: remark #2553-D: (ULP 14.1) Array index (involving "ctx") of type "int". Recommend using "unsigned int"
"../src/bitproto.c", line 414: error #29: expected an expression
"../src/bitproto.c", line 414: error #20: identifier "k" is undefined
"../src/bitproto.c", line 527: error #29: expected an expression
"../src/bitproto.c", line 527: error #20: identifier "k" is undefined
10 errors detected in the compilation of "../src/bitproto.c".

The BpMessageDescriptor structure, defined as a macro, doesn't seem to work with this compiler.

Ambiguity about the functionality of Encode/Decode

For byte-aligned fields, Encode/Decode overwrite the destination. For non-byte-aligned fields, Encode/Decode just perform a bitwise OR.

Message definition:

message Fipr {
    uint1 api_type = 1
    uint5 engine_id = 2
    uint13 pre_install_data = 3
}
message Flit {
    uint32 dw0 = 1
    uint32 dw1 = 2
    uint32 dw2 = 3
    uint32 dw3 = 4
}

Test code:

TEST(BitProto, 0) {
    Fipr fipr{};
    uint8_t buf[256] = {};

    fipr.engine_id = 1;
    EncodeFipr(&fipr, buf);
    fipr.engine_id = 0;
    DecodeFipr(&fipr, buf);
    EXPECT_EQ(fipr.engine_id, 1);

    fipr.engine_id = 2;
    DecodeFipr(&fipr, buf);
    EXPECT_EQ(fipr.engine_id, 1); // FAILED, It's 3

    fipr.engine_id = 8;
    EncodeFipr(&fipr, buf);
    fipr.engine_id = 0;
    DecodeFipr(&fipr, buf);
    EXPECT_EQ(fipr.engine_id, 8); // FAILED, It's 9
}

TEST(BitProto, 1) {
    Flit flit{};
    uint8_t buf[256] = {};

    flit.dw0 = 1;
    EncodeFlit(&flit, buf);
    flit.dw0 = 0;
    DecodeFlit(&flit, buf);
    EXPECT_EQ(flit.dw0, 1);

    flit.dw0 = 2;
    EncodeFlit(&flit, buf);
    flit.dw0 = 0;
    DecodeFlit(&flit, buf);
    EXPECT_EQ(flit.dw0, 2);

    flit.dw0 = 4;
    EncodeFlit(&flit, buf);
    flit.dw0 = 0;
    DecodeFlit(&flit, buf);
    EXPECT_EQ(flit.dw0, 4);
}

The implementation of BpCopyBufferBits in lib/c/bitproto.c causes this.

What do you think? Is this ambiguity reasonable?
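The accumulation in the test above (writing 1 and then 8 reads back as 9) is what an OR-only bit copy produces. Below is a standalone sketch contrasting it with a clear-then-set copy. or_bits, set_bits and get_bits are hypothetical helpers, not bitproto's BpCopyBufferBits; the sketch assumes LSB-first bit order and that engine_id starts at bit 1, right after the 1-bit api_type.

```c
#include <stdint.h>

/* OR-only write (LSB-first): sets the 1-bits of value but never clears
 * bits that are already 1, so stale data leaks into the field. */
static void or_bits(unsigned char *buf, unsigned offset, unsigned width, uint32_t value) {
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = offset + i;
        if ((value >> i) & 1u)
            buf[bit / 8] |= (unsigned char)(1u << (bit % 8));
    }
}

/* Clear-then-set write: each target bit is cleared first, then set from
 * value, so the field is overwritten whatever buf held before. */
static void set_bits(unsigned char *buf, unsigned offset, unsigned width, uint32_t value) {
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = offset + i;
        unsigned char mask = (unsigned char)(1u << (bit % 8));
        buf[bit / 8] &= (unsigned char)~mask;
        if ((value >> i) & 1u)
            buf[bit / 8] |= mask;
    }
}

/* Read `width` bits starting at bit `offset` (LSB-first). */
static uint32_t get_bits(const unsigned char *buf, unsigned offset, unsigned width) {
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = offset + i;
        value |= (uint32_t)((buf[bit / 8] >> (bit % 8)) & 1u) << i;
    }
    return value;
}
```

Given the behavior described above, a possible workaround is to zero the buffer before each EncodeFipr call and zero the struct before each DecodeFipr call (for example with memset).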

Invalid enum field name for an imported proto in Python code generation

Test on bitproto v1.1.0:

// b.bitproto
proto b;

enum X : uint1 {
    OK = 0;
}

// a.bitproto
proto a

import "b.bitproto"


message A {
    b.X x = 1;
}

Generates for python:

# b_bp.py

# Code generated by bitproto. DO NOT EDIT.


import json
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List, Union
from enum import IntEnum, unique

from bitprotolib import bp


@unique
class X(IntEnum): # 1bit
    OK = 0


# Aliases for backwards compatibility
OK: X = X.OK


_X_VALUE_TO_NAME_MAP: Dict[X, str] = {
    X.OK: "OK",
}

def bp_processor_X() -> bp.Processor:
    return bp.EnumProcessor(bp.Uint(1))

# a_bp.py


import json
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List, Union
from enum import IntEnum, unique

from bitprotolib import bp

import b_bp as b


@dataclass
class A(bp.MessageBase):
    # Number of bytes to serialize class A
    BYTES_LENGTH: ClassVar[int] = 1

    x: Union[int, b.X] = b.X.b.OK          # !!!!!!!!!!!!! ERROR here
    # This field is a proxy to hold integer value of enum field 'x'
    _enum_field_proxy__x: int = field(init=False, repr=False) # 1bit

    def __post_init__(self):
        # initialize handling of enum field 'x' as `enum.IntEnum`
        if not isinstance(getattr(A, "x", False), property):
            self._enum_field_proxy__x = self.x
            A.x = property(A._get_x, A._set_x)  # type: ignore

    @staticmethod
    def dict_factory(kv_pairs):
        return {k: v for k, v in kv_pairs if not k.startswith('_enum_field_proxy__')}

    def _get_x(self) -> b.X:
        """property getter for enum proxy field"""
        return b.X(self._enum_field_proxy__x)

    def _set_x(self, val):
        """property setter for enum proxy field"""
        self._enum_field_proxy__x = val

    def bp_processor(self) -> bp.Processor:
        field_processors: List[bp.Processor] = [
            bp.MessageFieldProcessor(1, b.bp_processor_X()),
        ]
        return bp.MessageProcessor(False, 1, field_processors)

    def bp_set_byte(self, di: bp.DataIndexer, lshift: int, b: bp.byte) -> None:
        if di.field_number == 1:
            self.x |= (b.X(b) << lshift)
        return

    def bp_get_byte(self, di: bp.DataIndexer, rshift: int) -> bp.byte:
        if di.field_number == 1:
            return (self.x >> rshift) & 255
        return bp.byte(0)  # Won't reached

    def bp_get_accessor(self, di: bp.DataIndexer) -> bp.Accessor:
        return bp.NilAccessor() # Won't reached

    def encode(self) -> bytearray:
        """
        Encode this object to bytearray.
        """
        s = bytearray(self.BYTES_LENGTH)
        ctx = bp.ProcessContext(True, s)
        self.bp_processor().process(ctx, bp.NIL_DATA_INDEXER, self)
        return ctx.s

    def decode(self, s: bytearray) -> None:
        """
        Decode given bytearray s to this object.
        :param s: A bytearray with length at least `BYTES_LENGTH`.
        """
        assert len(s) >= self.BYTES_LENGTH, bp.NotEnoughBytes()
        ctx = bp.ProcessContext(False, s)
        self.bp_processor().process(ctx, bp.NIL_DATA_INDEXER, self)

    def bp_process_int(self, di: bp.DataIndexer) -> None:
        return

This seems similar to FlatBuffers and Cap'n Proto. Is there any comparison with these projects?

Image in Readme.md does not represent what the library is doing

The description is:

message Data {
    uint3 the = 1
    uint3 bit = 2
    uint5 level = 3
    uint4 data = 4
    uint11 interchange = 6
    uint6 format = 7
}

As far as I understand from playing with the generated Python code, the 3 bits of 'the' are not the 3 MSBs of the first byte; they are the 3 LSBs of the first byte. So the image should be:

Is that correct?
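The two readings differ only in where the first field starts inside byte 0. A hypothetical sketch (not generated code) that computes the first byte of Data for the = 5 and bit = 3 under each convention; the reporter's observation corresponds to the LSB-first function, which also matches the bytes printed in the packed-flag issue on this page.

```c
/* First byte of Data if fields fill each byte from its least significant
 * bit upward: 'the' in bits 0-2, 'bit' in bits 3-5. */
static unsigned char byte0_lsb_first(unsigned the, unsigned bit) {
    return (unsigned char)((the & 7u) | ((bit & 7u) << 3));
}

/* First byte if fields fill each byte from its most significant bit
 * downward: 'the' in bits 7-5, 'bit' in bits 4-2. */
static unsigned char byte0_msb_first(unsigned the, unsigned bit) {
    return (unsigned char)(((the & 7u) << 5) | ((bit & 7u) << 2));
}
```

For the = 5 and bit = 3, the LSB-first layout gives 0x1D, while the MSB-first layout gives 0xAC.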

Feature: add a "packed" or `-p` flag

Problem

Currently, bitproto's serialized buffer is not packed.

Here's an example, using my simple schema:

proto mytest

message Data {
    uint20 preamble = 1
    uint15 start = 2
    uint64 data = 3
    uint15 crc = 4
}

We compile using:

$ bitproto c .\myschema.bitproto -O

Using the newly generated file, we can do the following tests:

uint8_t buffer[BYTES_LENGTH_DATA] = {0};

struct Data data = {
    .preamble = 0x123,
    .start = 32767, //< 2^15 -1
    .data = 0,
    .crc = 0,
};

EncodeData(&data, buffer);
printf("0x%02X\n", buffer[0]);
printf("0x%02X\n", buffer[1]);
printf("0x%02X\n", buffer[2]);
printf("0x%02X\n", buffer[3]);

When executing this test code, we print

$ make && ./test
0x23
0x01
0xF0

Therefore, this structure is padded.
If the structure were packed (no padding), I would expect:

0x48
0xFF
0xFF

Here's a visualization of the packed structure.

Solution

Add a -p (packed) flag.

Why is this valuable

This would enable bitproto's users to have better control over how their data is serialized.
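For comparison, a minimal sketch that packs the same four fields back to back with no gaps, LSB-first (bit 0 is the least significant bit of buffer[0]), reproduces the three bytes printed above. Under that assumption the 0xF0 in byte 2 holds preamble bits 16-19 (all zero) in its low nibble and the first four bits of start in its high nibble. put_bits, pack_mytest and the MYTEST_* macros are hypothetical names, not bitproto output.

```c
#include <stdint.h>
#include <string.h>

#define MYTEST_BITS  (20 + 15 + 64 + 15)       /* 114 bits */
#define MYTEST_BYTES ((MYTEST_BITS + 7) / 8)   /* 15 bytes */

/* OR the low `width` bits of value into buf at bit `offset`, LSB-first.
 * Assumes the target bits are already zero. */
static void put_bits(unsigned char *buf, unsigned offset, unsigned width, uint64_t value) {
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = offset + i;
        if ((value >> i) & 1u)
            buf[bit / 8] |= (unsigned char)(1u << (bit % 8));
    }
}

/* Pack the fields of message Data from the schema above, gap-free. */
static void pack_mytest(unsigned char buf[MYTEST_BYTES], uint32_t preamble,
                        uint16_t start, uint64_t data, uint16_t crc) {
    memset(buf, 0, MYTEST_BYTES);      /* start from a zeroed buffer */
    put_bits(buf, 0, 20, preamble);    /* bits 0-19 */
    put_bits(buf, 20, 15, start);      /* bits 20-34 */
    put_bits(buf, 35, 64, data);       /* bits 35-98 */
    put_bits(buf, 99, 15, crc);        /* bits 99-113 */
}
```

With preamble = 0x123 and start = 32767, the first five bytes come out as 0x23 0x01 0xF0 0xFF 0x07.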

Floating point support

Hi,

Is it a possible goal to add floating point support to bitproto? If it seems feasible, I can try to work on it.
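Until native support exists, one possible workaround (an assumption, not a bitproto feature) is to carry a 32-bit float in a uint32 field by copying its bit pattern. A minimal sketch, assuming both sides use 32-bit IEEE 754 floats; float_to_bits and bits_to_float are hypothetical helpers:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy a float's bit pattern into a uint32_t (memcpy avoids undefined
 * behavior from pointer-based type punning). */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse: rebuild the float from its 32-bit pattern. */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

The sender would store float_to_bits(x) in a uint32 field, and the receiver would call bits_to_float on the decoded value.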

Simple example fails Python 3.10.8

The simple Pen example (pen.py is main.py) fails on the latest Ubuntu with a Python 3.10.8 virtualenv as follows:

import-im6.q16: attempt to perform an operation not allowed by the security policy `PS' @ error/constitute.c/IsCoderAuthorized/421.
./pen.py: line 4: syntax error near unexpected token `('
./pen.py: line 4: `p = bp.Pen(color=bp.COLOR_RED, produced_at=1611515729966)'

Other test examples throw up a similar error. Would very much appreciate a fix. No doubt something simple.
In any case thanks for all the great work.

Does bitproto support `union`?

Hi. First, thanks for this very cool project!
I have one question: does bitproto support union (or something like protobuf's oneof)? It would be useful for protocol frames where the deserialization of the remaining bytes depends on some kind of frame id in the header.

I tried to look into https://bitproto.readthedocs.io/en/latest/language.html, but unfortunately didn't find anything.

[Bug report] Parsing error when the message name is contained in the message fields

I think the parser will fail if the message name is contained in the message field type.

Steps to reproduce
Using the following schema:

proto mytest

enum payload_data_type_e: uint8 {
    PAYLOAD_DATA_TYPE_UNKNOWN = 1
    PAYLOAD_DATA_TYPE_DETECTOR = 2
}

message payload_data_t {
    payload_data_type_e type = 1 
}

When compiling

$ bitproto -O c .\mytest.bitproto 
error:  .\myschema.bitproto:L9  { => Some grammar error occurred during bitproto grammar parsing.

My guess is that this error is due to a kind of name collision in the lexer.

Representing enums as enum.IntEnum

Hi, great library! I have one question: do you plan to implement enum type representation with enum.IntEnum? It would be nice to be able to use enum classes instead of simple constants.

Example proto:

proto enums;

enum MyEnum : uint2 {
    MY_ENUM_UNKNOWN = 0;
    MY_ENUM_ONE = 1;
    MY_ENUM_TWO = 2;
    MY_ENUM_THREE = 3;
}

message EnumContainer {
    MyEnum my_enum = 2;
}

If the generated python code would be something like this:

import json
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List
from enum import IntEnum, unique

from bitprotolib import bp


@unique
class MyEnum(IntEnum):
    MY_ENUM_UNKNOWN = 0
    MY_ENUM_ONE = 1
    MY_ENUM_TWO = 2
    MY_ENUM_THREE = 3


_MYENUM_VALUE_TO_NAME_MAP: Dict[MyEnum, str] = {
    MyEnum.MY_ENUM_UNKNOWN: "MY_ENUM_UNKNOWN",
    MyEnum.MY_ENUM_ONE: "MY_ENUM_ONE",
    MyEnum.MY_ENUM_TWO: "MY_ENUM_TWO",
    MyEnum.MY_ENUM_THREE: "MY_ENUM_THREE",
}


def bp_processor_MyEnum() -> bp.Processor:
    return bp.EnumProcessor(bp.Uint(2))


@dataclass
class EnumContainer(bp.MessageBase):
    # Number of bytes to serialize class EnumContainer
    BYTES_LENGTH: ClassVar[int] = 1
 
    # based on https://stackoverflow.com/a/61709025/1169220
    my_enum: MyEnum = MyEnum.MY_ENUM_UNKNOWN
    _my_enum: int = field(init=False, repr=False)

    def __post_init__(self):
        if not isinstance(getattr(EnumContainer, "my_enum", False), property):
            self._my_enum = self.my_enum
            EnumContainer.my_enum = property(EnumContainer._get_my_enum, EnumContainer._set_my_enum)

    def _get_my_enum(self):
        return MyEnum(self._my_enum)

    def _set_my_enum(self, val):
        self._my_enum = val

...

Then it could be used like this:

import enums_bp as bp

enum_container = bp.EnumContainer(my_enum=bp.MyEnum.MY_ENUM_ONE)
s = enum_container.encode()
enum_container_new = bp.EnumContainer()
enum_container_new.decode(s)
assert enum_container_new.my_enum == enum_container.my_enum
assert enum_container_new.encode() == s
assert isinstance(enum_container.my_enum, bp.MyEnum)
assert isinstance(enum_container_new.my_enum, bp.MyEnum)
assert enum_container.my_enum is bp.MyEnum.MY_ENUM_ONE
assert enum_container_new.my_enum is bp.MyEnum.MY_ENUM_ONE

What do you think?

Auto-generate rst documentation for serialization structures

Add a script that generates .rst documentation for a given bitproto schema.

Rationale: Schemas often represent messages exchanged between devices, and those messages need documentation in human-readable form. bitproto already supports generating serialization/deserialization source code; auto-generating the documentation would add great value to bitproto.