atilaneves / cerealed

Powerful binary serialisation library for D

License: BSD 3-Clause "New" or "Revised" License

D 100.00%
serialisation d custom-serialisation unmarshall packets dlang dlanguage

cerealed's Introduction

cerealed

My DConf 2014 talk mentioning Cerealed.

Binary serialisation library for D. Minimal to no boilerplate necessary. Example usage:

    import cerealed;

    assert(cerealise(5) == [0, 0, 0, 5]); // returns ubyte[]
    cerealise!(a => assert(a == [0, 0, 0, 5]))(5); // faster than using the bytes directly

    assert(decerealise!int([0, 0, 0, 5]) == 5);

    struct Foo { int i; }
    const foo = Foo(5);
    // alternate spelling
    assert(foo.cerealize.decerealize!Foo == foo);

The example below shows off a few features. First and foremost, members are serialised automatically but can be opted out with the @NoCereal attribute. Just as importantly, a member that must be serialised in a specific number of bits (common in binary protocols) is marked with the @Bits attribute, which takes a compile-time integer giving the number of bits to use.

    struct MyStruct {
        ubyte mybyte1;
        @NoCereal uint nocereal1; //won't be serialised
        @Bits!4 ubyte nibble;
        @Bits!1 ubyte bit;
        @Bits!3 ubyte bits3;
        ubyte mybyte2;
    }

    assert(MyStruct(3, 123, 14, 1, 2, 42).cerealise == [ 3, 0xea /*1110 1 010*/, 42]);
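To make the 0xea byte above less mysterious, here is a minimal sketch of the bit packing done by hand, using only the language itself. `Field` and `packBits` are hypothetical names invented for this illustration; this is not how cerealed implements @Bits, only a way to check the arithmetic.

```d
// Hypothetical illustration, not cerealed API: a value and the number of bits it occupies.
struct Field {
    uint value;
    uint width; // assumed to be between 1 and 8 here
}

// Packs the fields most-significant-bit first and emits a byte whenever 8 bits are ready.
ubyte[] packBits(const Field[] fields) {
    ubyte[] result;
    uint acc = 0;   // bits collected so far
    uint nbits = 0; // how many bits of acc are valid
    foreach (f; fields) {
        acc = (acc << f.width) | (f.value & ((1u << f.width) - 1));
        nbits += f.width;
        while (nbits >= 8) {
            result ~= cast(ubyte)(acc >> (nbits - 8));
            nbits -= 8;
            acc &= (1u << nbits) - 1; // keep only the bits not yet emitted
        }
    }
    assert(nbits == 0, "fields must add up to whole bytes");
    return result;
}

void main() {
    // mybyte1 = 3 (8 bits), nibble = 14 (4), bit = 1 (1), bits3 = 2 (3), mybyte2 = 42 (8)
    const bytes = packBits([Field(3, 8), Field(14, 4), Field(1, 1), Field(2, 3), Field(42, 8)]);
    assert(bytes == [3, 0xea, 42]);
}
```

The three sub-byte fields add up to exactly 8 bits (4 + 1 + 3), which is why they collapse into the single byte 1110 1 010.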

What if custom serialisation is needed and the default, even with opt-outs, won't work? If an aggregate type defines a member function void accept(C)(ref C cereal) it will be used instead. To get the usual automatic serialisation from within the custom accept, the grainAllMembers member function of Cereal can be called, as shown in the example below. This function takes a ref argument so rvalues need not apply.

The function to use on Cereal to marshall or unmarshall a particular value is grain. This is essentially what Cerealiser.~= and Decerealiser.value are calling behind the scenes (and therefore cerealise and decerealise).

    struct CustomStruct {
        ubyte mybyte;
        ushort myshort;
        void accept(C)(auto ref C cereal) {
             //do NOT call cereal.grain(this), that would cause an infinite loop
             cereal.grainAllMembers(this);
             ubyte otherbyte = 4; //make it an lvalue
             cereal.grain(otherbyte);
        }
    }

    assert(CustomStruct(1, 2).cerealise == [ 1, 0, 2, 4]);

    //because of the custom serialisation, passing in just [1, 0, 2] would throw
    assert([1, 0, 2, 4].decerealise!CustomStruct == CustomStruct(1, 2));

The other option for custom serialisation, which avoids boilerplate, is to define a void postBlit(C)(ref C cereal) function instead of accept. Marshalling or unmarshalling is done as it would be in the absence of customisation, and postBlit is then called to fix things up. It is a compile-time error to define both accept and postBlit. Example below.

    struct CustomStruct {
        ubyte mybyte;
        ushort myshort;
        @NoCereal ubyte otherByte;
        void postBlit(C)(auto ref C cereal) {
             //no need to handle mybyte and myshort, already done
             if(mybyte == 1) {
                 cereal.grain(otherByte);
             }
        }
    }

    //otherByte is only serialised when mybyte == 1
    assert(CustomStruct(1, 2, 4).cerealise == [ 1, 0, 2, 4]);
    assert(CustomStruct(3, 2).cerealise == [ 3, 0, 2]);

For more examples of how to serialise structs, check the tests directory or real-world usage in my MQTT broker also written in D.

Arrays are by default serialised with a ushort denoting the array length, followed by the array contents. Networking protocols often have an explicit length parameter for the whole packet, with array lengths implicitly determined from it. For this use case, the @RestOfPacket attribute tells cerealed not to add the length parameter. As the name implies, it will "eat" all bytes until there aren't any left.

    private struct StringsStruct {
        ubyte mybyte;
        @RestOfPacket string[] strings;
    }

    //no length encoding for the array, but strings still get a length each
    const ubyte[] bytes = [ 5, 0, 3, 'f', 'o', 'o', 0, 6, 'f', 'o', 'o', 'b', 'a', 'r',
                            0, 6, 'o', 'h', 'w', 'e', 'l', 'l'];
    const strs = StringsStruct(5, ["foo", "foobar", "ohwell"]);
    assert(strs.cerealise == bytes);
    assert(bytes.decerealise!StringsStruct == strs);
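The wire layout described above can be reproduced by hand with std.bitmanip. This is only a sketch under the assumption that lengths are big-endian ushorts, as the byte arrays in the examples suggest; `encodePacket` and `decodeStrings` are hypothetical helpers and not part of cerealed.

```d
import std.bitmanip : nativeToBigEndian, read;

// Hypothetical helper: one leading byte, then each string as a big-endian
// ushort length followed by its bytes. The array itself gets no element count.
ubyte[] encodePacket(ubyte first, const string[] strings) {
    ubyte[] result = [first];
    foreach (s; strings) {
        assert(s.length <= ushort.max, "string too long for a ushort length");
        const ubyte[2] len = nativeToBigEndian(cast(ushort) s.length);
        result ~= len[];
        result ~= cast(const(ubyte)[]) s;
    }
    return result;
}

// Hypothetical helper: keeps reading length-prefixed strings until no bytes remain,
// which is the "eat everything that is left" behaviour.
string[] decodeStrings(const(ubyte)[] rest) {
    string[] result;
    while (rest.length > 0) {
        const len = rest.read!ushort(); // big-endian by default; advances rest
        result ~= (cast(const(char)[]) rest[0 .. len]).idup;
        rest = rest[len .. $];
    }
    return result;
}

void main() {
    const bytes = encodePacket(5, ["foo", "foobar", "ohwell"]);
    assert(bytes.length == 22);
    assert(bytes[0 .. 3] == [5, 0, 3]);
    assert(decodeStrings(bytes[1 .. $]) == ["foo", "foobar", "ohwell"]);
}
```

Because the decoder stops only when the input is empty, a field decoded this way has to be the last one in the packet.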

Derived classes can be serialised via a reference to the base class, but the child class must be registered first:

    class BaseClass  { int a; this(int a) { this.a = a; } }
    class ChildClass : BaseClass { int b; this(int a, int b) { super(a); this.b = b; } }
    Cereal.registerChildClass!ChildClass;
    BaseClass obj = new ChildClass(3, 7);
    assert(obj.cerealise == [0, 0, 0, 3, 0, 0, 0, 7]);

There is now support for InputRange and OutputRange objects. Examples can be found in the tests directory.

Advanced Usage

Frequently in network programming, the packets themselves encode the length of elements to follow. This happens often enough that Cerealed has two UDAs to automate this kind of serialisation: @ArrayLength and @LengthInBytes. The former specifies how to get the length of an array (usually a variable). The latter specifies how many bytes the array takes. Examples:

    struct Packet {
        ushort length;
        @ArrayLength("length") ushort[] array;
    }
    auto pkt = decerealise!Packet([
        0, 3, //length
        0, 1, 0, 2, 0, 3]); //array of 3 ushorts
    assert(pkt.length == 3);
    assert(pkt.array == [1, 2, 3]);

    struct Packet {
        static struct Header {
            ubyte ub;
            ubyte totalLength;
        }
        enum headerSize = unalignedSizeof!Header; //2 bytes

        Header header;
        @LengthInBytes("totalLength - headerSize") ushort[] array;
    }
    auto pkt = decerealise!Packet([
        7, //ub
        6, //totalLength in bytes
        0, 1, 0, 2]); //array of 2 ushorts
    assert(pkt.header.ub == 7);
    assert(pkt.header.totalLength == 6);
    assert(pkt.array == [1, 2]);
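For comparison, this is roughly what the two attributes save you from writing. The sketch below decodes the first packet by hand with std.bitmanip and shows the byte-count arithmetic for the second. `decodePacket` and `elementCount` are hypothetical names, and nothing here claims to match cerealed's internals.

```d
import std.bitmanip : read;

// Same shape as the @ArrayLength example: the element count comes from an earlier field.
struct Packet {
    ushort length;
    ushort[] array;
}

// Hypothetical hand-written decoder for that layout.
Packet decodePacket(const(ubyte)[] bytes) {
    Packet pkt;
    pkt.length = bytes.read!ushort(); // big-endian by default; advances bytes
    foreach (_; 0 .. pkt.length)
        pkt.array ~= bytes.read!ushort();
    return pkt;
}

// In the @LengthInBytes case the expression yields a size in bytes, so the
// number of elements is that size divided by the element size.
size_t elementCount(size_t totalLength, size_t headerSize) {
    return (totalLength - headerSize) / ushort.sizeof;
}

void main() {
    auto pkt = decodePacket([0, 3, 0, 1, 0, 2, 0, 3]);
    assert(pkt.length == 3);
    assert(pkt.array == [1, 2, 3]);
    assert(elementCount(6, 2) == 2); // 6 bytes total, 2 of header, 2 ushorts left
}
```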

cerealed's People

Contributors

atilaneves, coldencullen, colonelthirtytwo, tchaloupka, vicencb, vitalfadeev

cerealed's Issues

Deserialization of string from mutable buffer leads to unexpected behaviour

Here is the test case:

import cerealed;

struct Foo { string value; }

auto data = Foo("bar").cerealise;
auto f1 = data.decerealise!Foo;
assert(f1.value == "bar");
//f1.value = f1.value.idup;

auto data2 = Foo("baz").cerealise;
data[] = data2[];
auto f2 = data.decerealise!Foo;
assert(f2.value == "baz");
//f2.value = f2.value.idup;
assert(f1.value == "bar");

When a string is deserialized, it points into the original mutable buffer, so when other data is written to that buffer the string is modified even though it is immutable.
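The aliasing can be shown without cerealed at all. The sketch below contrasts a string that is merely a view of a mutable buffer with one that was copied via idup, which is what the commented-out lines in the test case would do. `viewAsString` and `copyAsString` are hypothetical helpers; whether cerealed does the equivalent of the cast internally is an assumption based on the behaviour reported here.

```d
// Hypothetical helpers, not cerealed API.

// Reinterprets the buffer as a string without copying: the result shares memory with buf.
string viewAsString(ubyte[] buf) {
    return cast(string) buf;
}

// Copies the bytes into fresh immutable memory, so later writes to buf cannot reach it.
string copyAsString(const(ubyte)[] buf) {
    return (cast(const(char)[]) buf).idup;
}

void main() {
    ubyte[] buffer = cast(ubyte[]) "bar".dup;
    string aliased = viewAsString(buffer);
    string copied = copyAsString(buffer);

    // The view points at the buffer itself; the copy does not.
    assert(cast(const(void)*) aliased.ptr is cast(const(void)*) buffer.ptr);
    assert(cast(const(void)*) copied.ptr !is cast(const(void)*) buffer.ptr);

    buffer[2] = cast(ubyte) 'z'; // overwrite the buffer, as the test case does
    assert(copied == "bar");     // the copy is unaffected
}
```

Copying costs an allocation per string, so a library might reasonably keep the zero-copy path and leave the idup to callers that reuse their buffers.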

Feature request: Explicit endianness

Some protocols/file formats may use a different endianness than the one native to the platform it is (de)serialised on.

Should a Big-Endian architecture ever gain D support, an application serialising data on it and then deserialising it on a Little-Endian platform will fail to do so.

File formats, such as PNG, use Big Endian ("network byte ordering"), yet the dominant Endianness today is Little Endian. Having a shortcut for saying "Please bswap if I need it" would be handy, instead of having to define this as custom behaviour.
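For reference, Phobos already exposes explicit byte-order conversions in std.bitmanip, which is the kind of behaviour being requested. The snippet is a minimal sketch of those functions on their own; how they would be wired into cerealed (for example through an attribute) is left open.

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative,
                      nativeToBigEndian, nativeToLittleEndian;

void main() {
    const int value = 5;

    // Same integer, two explicit byte orders, independent of the host platform.
    ubyte[4] big = nativeToBigEndian(value);
    ubyte[4] little = nativeToLittleEndian(value);
    assert(big == [0, 0, 0, 5]);    // matches the README's cerealise(5) output
    assert(little == [5, 0, 0, 0]);

    // Converting back recovers the original value in both cases.
    assert(bigEndianToNative!int(big) == 5);
    assert(littleEndianToNative!int(little) == 5);
}
```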

Error: variable length used before set

with DMD64 D Compiler v2.075.0

struct Sample {}
void main(string[] args) {
  import cerealed;
  import std.file : read;
  const(ubyte)[] bytes = cast(const(ubyte)[])(read("samples.dat"));
  bytes.decerealise!(Sample[]);
}
../../.dub/packages/cerealed-0.6.8/cerealed/src/cerealed/cereal.d(192,18): Error: variable length used before set

Did I miss something?

Custom structs as keys in an associative array are not serialized

module test;

import cerealed;
import std.stdio;

struct Pair {
  string s1;
  int a;
}


void main() {
  auto p = Pair("foo", 5);

  int[Pair] map;
  map[p] = 105;

  auto ser = new Cerealiser();
  ser ~= map;

  auto deser = new Decerealiser(ser.bytes);
  int[Pair] outcu = deser.value!(int[Pair]);
  writeln(outcu);
}

Expected output: [Pair("foo", 5): 105]
Actual output : [Pair("", 0): 105]

Feature request: support for multidimensional arrays

Hello Atila,
As you suggested I've signed in and posted my feature request here.
I would like to use cerealed with a struct like this:

import cerealed;

struct nested_associative_array { nested_associative_array[int] x; }

struct some_struct {
    string[] x;
    int[][] y;
    nested_associative_array[] z;
};

void main() {
    some_struct original, restored;

    auto enc = Cerealiser();
    enc ~= original;
    auto dec = Decerealiser(enc.bytes);
    restored = dec.value!(some_struct);

    assert(original==restored);
}

The serialization process compiles fine, but the restoring part fails to compile.
I've tried to figure out how to do it, but I am still a beginner on D.
How difficult do you think it is?
The alternative, orange, works fine, but the struct I'm trying to export to disk is a few hundreds of MB in size and orange takes hours to complete and may also run out of memory. Your implementation, instead, is really fast and efficient!

Serializing ubyte[] fails to compile

Attempting to serialize a ubyte[] array results in a compiler error. Line 130 in cereal.d attempts to call dup on length, which isn't an array.

Test Program:

void main() {
    auto test = new ubyte[5];


    auto cerealiser = Cerealiser();
    cerealiser ~= test;

    auto deceralizer = Decerealizer(cerealiser.bytes);
    auto testcpy = deceralizer.value!(ubyte[]);

    assert(test == testcpy);
}

Output

> dub build
Target cerealed 0.6.1 is up to date. Use --force to rebuild.
Building test ~master configuration "application", build type debug.
Compiling using dmd...
../.dub/packages/cerealed-0.6.1/src/cerealed/cereal.d(130): Error: template object.dup cannot deduce function from argument types !()(short), candidates are:
/usr/include/dmd/druntime/import/object.d(1785):        object.dup(T : V[K], K, V)(T aa)
/usr/include/dmd/druntime/import/object.d(1821):        object.dup(T : V[K], K, V)(T* aa)
/usr/include/dmd/druntime/import/object.d(3122):        object.dup(T)(T[] a) if (!is(const(T) : T))
/usr/include/dmd/druntime/import/object.d(3138):        object.dup(T)(const(T)[] a) if (is(const(T) : T))
/usr/include/dmd/druntime/import/object.d(3149):        object.dup(T : void)(const(T)[] a)
../.dub/packages/cerealed-0.6.1/src/cerealed/cereal.d(110): Error: template instance cerealed.cereal.decerealiseArrayImpl!(Decerealiser, ubyte[], short) error instantiating
../.dub/packages/cerealed-0.6.1/src/cerealed/decerealiser.d(63):        instantiated from here: grain!(Decerealiser, ubyte[], short)
source/app.d(18):        instantiated from here: value!(ubyte[], short)
FAIL .dub/build/application-debug-linux.posix-x86_64-dmd_2068-52E1E9557A38487A968910C60DCC3586/ test executable
Error executing command build:
dmd failed with exit code 1.

ArrayLength attribute doesn't work properly on char[] array

The following Testcase fails at runtime:

struct TestStruct {
    ubyte len;
    @ArrayLength("len") char[] foo;
}

auto decerealiser = Decerealiser([2, 1, 2]);
// CerealException: "@ArrayLength of 2 units of type dchar (4 bytes) larger than remaining byte array (2 bytes)"
auto ts = decerealiser.value!TestStruct;
assert(ts.foo == ['\x01', '\x02']);
assert(ts.foo.length == 2);

Alias this issue

This reduced test case doesn't compile as long as there is an alias this.

    struct TestStruct {
        char[] txt;
        alias txt this;
    }

    auto decerealiser = Decerealiser([0, 2, 0x90, 0x91]);
    auto ts = decerealiser.value!TestStruct;
    assert(ts.txt == ['\x90', '\x91']);
    assert(ts.txt.length == 2);

Max size for serialization is 64KiB

Why does Cerealed limit max size of serialized data to 64KiB?

Consider:

import std.stdio;

import cerealed;

struct ubyted {
    ubyte[] payload;
}
void main() {
    ubyte[] buff = new ubyte[2 ^^ 16];
    Cerealiser cer;

    cer ~= buff;
}

This code will fail with assert error on line 87 in cereal.d
Error: cerealed-0.6.6/src/cerealed/cereal.d(87): overflow

A buffer one element shorter does not trigger the assert.
Is it by design, or is it a bug?
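The boundary at exactly 2 ^^ 16 elements is what one would expect if the length is stored in a ushort, which is how the README describes the default array encoding. The sketch below only demonstrates the ushort limit with the standard library; that the assert in cereal.d guards this particular conversion is an assumption.

```d
import std.conv : ConvOverflowException, to;
import std.exception : assertThrown;

void main() {
    // The largest length a ushort prefix can represent.
    assert(ushort.max == 65_535);
    assert(to!ushort(65_535) == 65_535);

    // One more element no longer fits: a checked conversion throws...
    assertThrown!ConvOverflowException(to!ushort(2 ^^ 16));

    // ...and an unchecked cast silently wraps around to zero.
    assert(cast(ushort)(2 ^^ 16) == 0);
}
```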

Typo in example code

I didn't want to go through the trouble of making a full pull request for so little but on the other hand it's important that example code work without any problem:

diff --git a/README.md b/README.md
index 192b10f..a700ea0 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Example usage:
     cerealiser ~= cast(ubyte)42;
     assert(cerealiser.bytes == [ 0, 0, 0, 5, 42]);

-    auto deceralizer = Decerealizer([ 0, 0, 0, 5, 42]); //US spelling works too
+    auto decerealizer = Decerealizer([ 0, 0, 0, 5, 42]); //US spelling works too
     assert(decerealizer.value!int == 5);
     assert(decerealizer.value!ubyte == 42);

Add endianness support

Cerealed seems to have no support for different byte orders.
The library should clearly support both little and big endian.

Overflow on associative arrays with length > 65535

Thanks for the great lib, using it everyday now.

One of my associative arrays went above 65535 elements and I got an overflow error from cerealise, there's code below to reproduce the issue. I noticed the ushorts in cerealed's source code and tried somewhat haphazardly to replace them with size_t, got it half working but not fully.

Would it be possible to have an optional type parameter, defaulting to ushort (as now)? Or any other solution supporting bigger associative arrays?

Overflow use case on associative arrays:

import cerealed;

import std.algorithm;
import std.range;
import std.stdio;

void main()
{
  test( 10 );  // ok
  test( 65535 );  // ok
  test( 65536 );  // overflow
}

ubyte[] test( in size_t n )
{
  bool[size_t] data;
  n.iota.each!( (i) { data[ i ] = true; } );

  writeln( "About to cerealise data for n: ", data.length );
  auto ret = cerealise( data );
  writeln( "Done." );
  
  return ret;
}

Deserialize through base class reference?

I see that there is a sample for serializing through base class reference:

Cereal.registerChildClass!ChildClass;
BaseClass obj = ChildClass(3, 7);
assert(obj.cerealise == [0, 0, 0, 3, 0, 0, 0, 7]);

However, as the assert shows, the derived class name isn't stored anywhere. So when deserializing, you need to know the type anyway. Imagine serializing a BaseClass[]. Some elements of the array are instances of BaseClass, some are instances of ChildClass. Shouldn't in the case of classes an identifier be stored saying what is the real class that is being stored? Something like "assert(obj.cerealise == ["BaseClass", 0, 0, 0, 3, 0, 0, 0, 7]);", so that later on in the code you can do:

Cereal.registerChildClass!ChildClass;
BaseClass obj = datastream.decerealise()

and obj will be actually the ChildClass instance?
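As a rough illustration of what such an identifier could be, D can report the dynamic class of an object behind a base class reference at run time. The sketch below uses only the language's typeid. Whether cerealed would write this name into the stream, and in what form, is purely hypothetical.

```d
import std.algorithm.searching : endsWith;

class BaseClass  { int a; this(int a) { this.a = a; } }
class ChildClass : BaseClass { int b; this(int a, int b) { super(a); this.b = b; } }

// Hypothetical helper: the fully qualified name of the object's dynamic class,
// which a serialiser could store as a tag ahead of the member bytes.
string dynamicClassName(Object obj) {
    return typeid(obj).name;
}

void main() {
    BaseClass obj = new ChildClass(3, 7);

    // The static type is BaseClass, but the tag reflects the real class.
    assert(dynamicClassName(obj).endsWith("ChildClass"));
    assert(dynamicClassName(new BaseClass(1)).endsWith("BaseClass"));
}
```

A deserialiser could then map the stored name back to a registered type, at the cost of extra bytes per object.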
