kaitai-io / kaitai_struct Goto Github PK

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby

Home Page: https://kaitai.io

Shell 100.00%

reverse-engineering protocol-analyser file-format declarative-language ruby java javascript python csharp golang

kaitai_struct's Introduction

Kaitai Struct

Note: if you want to make changes to the project, do not fork this repository kaitai_struct. Instead, choose the component you want to modify in the file tree above and fork that individual component instead.

This is an umbrella repository, containing the components only as submodules to make it easier to check out the entire project. Unless you want to modify this README, it is not the repo where you can make edits.

What is Kaitai Struct?

Kaitai Struct is a declarative language used for describing various binary data structures laid out in files or in memory: i.e. binary file formats, network stream packet formats, etc.

The main idea is that a particular format is described in Kaitai Struct language only once and then can be compiled with a ksc into source files in one of the supported programming languages. These modules will include a generated code for a parser that can read described data structure from a file / stream and give access to it in a nice, easy-to-comprehend API.

What it's used for?

Have you ever found yourself writing repetitive, error-prone and hard-to-debug code that reads binary data structures from file / network stream and somehow represents them in memory for easier access?

Kaitai Struct tries to make this job easier — you only have to describe the binary format once and then everybody can use it from their programming languages — cross-language, cross-platform.

Kaitai Struct includes a growing collection of format descriptions, available in formats submodule repository.

Can you give me a quick example?

Sure. Consider this simple .ksy format description file that describes the header of a GIF file (a popular web image format):

meta:
  id: gif
  file-extension: gif
  endian: le
seq:
  - id: header
    type: header
  - id: logical_screen
    type: logical_screen
types:
  header:
    seq:
      - id: magic
        contents: 'GIF'
      - id: version
        size: 3
  logical_screen:
    seq:
      - id: image_width
        type: u2
      - id: image_height
        type: u2
      - id: flags
        type: u1
      - id: bg_color_index
        type: u1
      - id: pixel_aspect_ratio
        type: u1

It declares that GIF files usually have a .gif extension and use little-endian integer encoding. The file itself starts with two blocks: first comes header and then comes logical_screen:

"Header" consists of "magic" string of 3 bytes ("GIF") that identifies that it's a GIF file starting and then there are 3 more bytes that identify format version (87a or 89a).
"Logical screen descriptor" is a block of integers:
- image_width and image_height are 2-byte unsigned ints
- flags, bg_color_index and pixel_aspect_ratio take 1-byte unsigned int each

This .ksy file can be compiled it into Gif.cs / Gif.java / Gif.js / Gif.php / gif.py / gif.rb and then instantly one can load .gif file and access, for example, it's width and height.

In C#

Gif g = Gif.FromFile("path/to/some.gif");
Console.WriteLine("width = " + g.LogicalScreen.ImageWidth);
Console.WriteLine("height = " + g.LogicalScreen.ImageHeight);

In Java

Gif g = Gif.fromFile("path/to/some.gif");
System.out.println("width = " + g.logicalScreen().imageWidth());
System.out.println("height = " + g.logicalScreen().imageHeight());

In JavaScript

See JavaScript notes in the documentation for a more complete quick start guide.

var g = new Gif(new KaitaiStream(someArrayBuffer));
console.log("width = " + g.logicalScreen.imageWidth);
console.log("height = " + g.logicalScreen.imageHeight);

In Lua

local g = Gif:from_file("path/to/some.gif")
print("width = " .. g.logical_screen.image_width)
print("height = " .. g.logical_screen.image_height)

In Nim

let g = Gif.fromFile("path/to/some.gif")
echo "width = " & $g.logicalScreen.imageWidth
echo "height = " & $g.logicalScreen.imageHeight

In PHP

$g = Gif::fromFile('path/to/some.gif');
printf("width = %d\n", $g->logicalScreen()->imageWidth());
printf("height = %d\n", $g->logicalScreen()->imageHeight());

In Python

g = Gif.from_file("path/to/some.gif")
print "width = %d" % (g.logical_screen.image_width)
print "height = %d" % (g.logical_screen.image_height)

In Ruby

g = Gif.from_file("path/to/some.gif")
puts "width = #{g.logical_screen.image_width}"
puts "height = #{g.logical_screen.image_height}"

Of course, this example shows only a very limited subset of what Kaitai Struct can do. Please refer to the tutorials and documentation for more insights.

Supported languages

Official Kaitai Struct compiler now supports compiling .ksy into source modules for the following languages:

C#
Java
JavaScript
Lua
Nim
PHP
Python
Ruby

Downloading and installing

The easiest way to check out the whole Kaitai Struct project is to download the main project repository that already imports all other parts as submodules. Use:

git clone --recursive https://github.com/kaitai-io/kaitai_struct.git

Note the --recursive option.

Alternatively, one can check out individual subprojects that consitute Kaitai Struct suite. They are:

kaitai_struct_compiler — compiler that translates .ksy into a parser source code written in a target programming language
kaitai_struct_tests — tests & specs to ensure that compiler work as planned
Runtime libraries
- kaitai_struct_cpp_stl_runtime — for C++/STL
- kaitai_struct_csharp_runtime — for C#
- kaitai_struct_java_runtime — for Java
- kaitai_struct_javascript_runtime — for JavaScript
- kaitai_struct_nim_runtime — for Nim
- kaitai_struct_lua_runtime — for Lua
- kaitai_struct_python_runtime — for Python
- kaitai_struct_ruby_runtime — for Ruby
- kaitai_struct_swift_runtime — for Swift
kaitai_struct_formats — library of widely used formats and binary structures described as .ksy files

Using KS in your project

Typically, using formats described in KS in your project, involves the following steps:

Describe the format — i.e. create a .ksy file
Compile .ksy file into target language source file and include that file in your project
Add KS runtime library for your particular language into your project (don't worry, it's small and it's there mostly to ensure readability of generated code)
Use generated class(es) to parse your binary file / stream and access its components

Check out the tutorial and documentation for more information.

Licensing

Compiler — GPLv3+
Runtime libraries — MIT or Apache v2 (=> you can include generated code even into proprietary applications) — see individual libraries for details

kaitai_struct's People

Contributors

Stargazers

Watchers

Forkers

4144 nnamon andrewsvsg wwebb-r7 gitcollect mewbak mrmugiwara gitter-badger coperius rnz ikthap carrot-garden rflaperuta rybnik longjohncoder geezer-workshop lexxat h1d3r vsheludchenkov heruix xee5ch govanify tmvector aokolovskis jaapaap tecknicaltom xiazixi tzubal vasudevram brolertools tempelmann alpex29 houcy opt9 shobhitsharma codeaudit ufwt sigma-random yssource lcarst latestalexey vpmanske bibekdw khasanov wowdevs corbane rhencke jazzlly mitch354 x1unix elliottecton iamwenbozhang difcareer filipdominec smcculle attackgithub gregcopenhaver excitoon-favorites copywrite anusorn mbrukman chinaren2003 dl4pd fudgepop01 flysnail100 tlindner ciarant e7dal joncampbell123 superf0sh freemanzyq arieltorti idoimaging acidburn0zzz mmmika hhy5277 erssebaggala chanyk-joseph jeffli678 zhangchengkai826 simonsan phoenixstar7 wang-zifu wopeizl mingun reversetools maysky-blue singhrahul007 mewpull xreyrobert modulexcite jpventura julianvolodia vogelsgesang rasata ly774508966 5l1v3r1 rollno748 yang123vc userunknownfactor

kaitai_struct's Issues

Data views / substreams support

Currently, all languages use something similar to this code, when it's time to do a substream and parse objects from substream:

seq:
  - id: foo
    size: some_size
    type: foo_class

this._raw_foo = io.readBytes(someSize());
KaitaiStream _io__raw_foo = new KaitaiStream(_raw_foo);
this.foo = new FooClass(_io__raw_foo, this, _root);

This is inefficient, especially for larger data streams - it needs to load everything into memory and then parse it from there. A more efficient approach would use using some sort of substreams in manner of data views, i.e. something like:

KaitaiStream _io__foo = _io.substream(_io.pos(), someSize());
this.foo = new FooClass(_io__foo, this, _root);

or, for instances that have known pos field, something like that:

instances:
  foo:
    pos: some_pos
    size: some_size
    type: foo_class

KaitaiStream _io__foo = _io.substream(somePos(), someSize());
this.foo = new FooClass(_io__foo, this, _root);

The devil, of course, is in the details:

We need to make sure a substream functionality for every target language exists (or implement it there)
Probably at least for some time we'll have to support two different ways: i.e. parsing via reading _raw_* byte arrays and parsing using substreams.
We need to decide what to do in lots of other cases:

What do we return when reading a type-less value: i.e. user would expect a byte array or we'll switch to use substream there too? or a substream-that-can-be-used-as-bytearray? or it should be used-choosable?
What do to when repeat constructs exist on this field?
How would that mix with process - these actually require reading and re-processing the whole byte array in memory. Or should we re-implement them as stream-in => stream-out converters as well?

Similar project, bin-parser for Python

Hi Mikhail,

Nice project. It looks a lot like something I have been working on last year. Perhaps we could exchange some ideas?

External processing subroutines

Sometimes data is encrypted with custom algorithms which requires from us to create two .ksy files: one for a file which contains that encrypted data and one for decrypted data itself. Then, while working with compiled modules, we have to create an object of the first module, locate the encrypted data, decrypt it, and instantiate an object of the second module with the decrypted data as input. Now we can work with the decrypted data.
In my opinion, this process can be simplified by two things:

Ability to specify an external script file or library and procedure to call. For example:

encrypted_data:
  seq:
    - id: enc_size
      type: u4
    - id: dec_size
      type: u4
    - id: data
      size: enc_size
      process: external
      file: script.pl
      sub: decrypt(enc_size, dec_size)

In addition to the previous item, ability to specify type of processed data (to eliminate the need for another .ksy file):

    - id: data
      size: enc_size
      process: external
      file: script.pl
      sub: decrypt(enc_size, dec_size)
      type: decrypted_data

  decrypted_data:
    ...

If we want to use external library or work with non-scripting languages, the library can be specified as file: library.dll. Then, we should be able to declare function prototype: sub: void* __stdcall decrypt(void* data, int enc_size, int dec_size).

Of course, generation a code to call external library functions for each supported language is the main problem of my proposal, and deallocation of processed data is another problem. What do you think about it all?

Fixed-length string with terminating character

I can't work out if this is possible currently, but it doesn't seem obvious: Many formats reserve a fixed space for a file name but also have a terminating character (usually \0) to support variable length names.

For example:

typedef struct {
    char name[256];
} example;

example e;
e.name = "test";
writeToFile(e); // Writes { 'T', 'E', 'S', 'T', '\0', ... [+ 251 more bytes] }

Right now there's two options currently available that I can think of:

Option 1. Use str with a fixed length and the consumer has to deal with the terminator manually:

KSY:
      - id: name
        type: str
        size: 256
        encoding: ASCII

Code:
    var e = Example.FromFile("...");
    e.Name = FixTerminatedString(e.Name);

Option 2. Use strz with a terminator and then a second value for the rest of the array:

      - id: name
        type: strz
        encoding: ASCII
        terminator: 0x00
        consume: false
      - id: junk
        size: 256 - name.length

It'd be nice to support an optional terminator in the str type to automatically truncate the string in the runtime:

KSY:
      - id: name
        type: str
        size: 256
        encoding: ASCII
        terminator: 0x00

Runtime:
    public string ReadStrByteLimit(long length, bool terminated, byte terminator, string encoding)
    {
        var bytes = ReadBytes(length);
        if (terminated) bytes = bytes.TakeWhile(b => b != terminator).ToArray();
        return System.Text.Encoding.GetEncoding(encoding).GetString(bytes);
    }

Alternatively, a size option in the strz type would work as well, but I think it'd be more suited for the str type because the data is still fixed-width in the end.

You can see a real-world example in the doom_wad.ksy format (boxes = null characters):

Support "if" in "instances" with a value

"if" is not implemented for calculated instances, but it probably should be:

http://stackoverflow.com/questions/38334665/kaitai-struct-calculated-instances-with-a-condition

Alignment options

I've been working on disassembling Unity3D archives, and it turns out that they frequently use 4- and 8-byte alignment, that is, the same structures get arbitrary zero padding up to next address divisible by 4 or 8, i.e. illustrating 4-byte alignment that exists between XX and YY structures:

001000: XX XX XX XX|YY YY YY YY ...
001000: XX XX XX XX|XX 00 00 00|YY YY YY YY ...
001000: XX XX XX XX|XX XX 00 00|YY YY YY YY ...
001000: XX XX XX XX|XX XX XX 00|YY YY YY YY ...
001000: XX XX XX XX|XX XX XX XX|YY YY YY YY ...

Right now, doing this manually is extremely tedious. In theory, one can add extra "padding" field with calculated size to accomodate these 00s. But XX structure might be of variable size (and it can be intrivial to calculate it), structures before XX might be of variable size, etc.

I'm not sure what would be the best syntax to use for alignment. A few ideas:

align per-type directive that affects all seq fields in a given type (or maybe even its subtypes)
Some kind of special type that can be sandwiched between regular types to request alignment, i.e. something like

seq:
  - id: first
    type: u2
  - align: 8     # skips next 6 bytes to align to position 8
  - id: second   # reads from position 8
    type: u2

Some kind of attribute that introduced pre- and post-parsing position alignment, i.e.:

seq:
  - id: first
    type: u2
  - id: second
    type: u2
    pre-align: 8

or, equivalently:

seq:
  - id: first
    type: u2
    post-align: 8
  - id: second
    type: u2

Given that you have #9, you might want to introduce (or at least think of) both byte-sized alignments and bit-sized alignments. I guess there are 2 very different modes for bit parsing: "aligned" to bytes and "unaligned", where requesting 2 bits + 1 byte results in that latter logical byte read partially from 2 distinct physical bytes.

repeat-until support

Quoting a proposal by @LogicAndTrick:

I was going to make a proposal at some point for something like this to repeat until a certain condition is met:
seq:
  - id: number
    type: u1
    repeat: until
    repeat-until: value == 0 # where `value` is the last value that was read
Hopefully something like that can be possible with whatever the new syntax might be.

Perl support

Hello, I've written a runtime for Perl, it's a draft without an usage example and tests, but it works. Please check it: https://github.com/sergeyzelenyuk/kaitai_struct_perl_runtime.

Switchable default endianness

There are quite a few binary formats that allow 2 versions: little-endian and big-endian and have some sort of indicator near the beginning of the file to determine which version we're reading now. 2 "versions" differ only in one aspect: default endianness.

Examples of such format:

TIFF: it starts with 2 bytes, II for l-e and MM for b-e version
pcap: it starts with 4 bytes, [0xd4, 0xc3, 0xb2, 0xa1] for l-e and [0xa1, 0xb2, 0xc3, 0xd4] for b-e
elf: byte 5 determines endianness of all integers in the file

Such formats are definitely readable with KS right now, but it means duplicating whole format description twice - once with a endian: le and once with endian: be and then switching between them with if: ....

There's a proposal to introduce a concept of switchable default endianness:

meta:
  id: tiff
seq:
  - id: byte_order_id
    type: str
    encoding: ASCII
    size: 2
  - id: header
    type: tiff_header
types:
  tiff_header:
    meta:
      little-endian-expr: '_root.byte_order_id == "II"'
    seq:
      - id: version
        type: u2
      - id: img_dir_ofs
        type: u4
      # ...

Basically, it means for tiff_header type and all types defined underneath it should determine its endianness by the result of evaluation of expression given in little-endian-expr.

Things I'd like to discuss:

I don't like the name of little-endian-expr
Implementation of these ones

If we want to aim for maximum performance, probably that means that compiler should generate two distinct versions of the classes like tiff_header (and everything beneath that in hierarchy) and choose which one they would use once. This would result in generation of classes like _be_TiffHeader and _le_TiffHeader - and, for statically typed languages, probably it would be a good idea to generate a common interface TiffHeader for them.

IDA support

IDA allows you to mark parts of binaries as data, code, structs, enums, etc. Some binaries have structs/tables in them, for example PnP Expansion Header or PE format, which can be described using katai-struct language. It'd be nice to generate idc/idapython scripts from ks description to automatically parse tables.

README: add installation info

https://github.com/kaitai-io/kaitai_struct#downloading-and-installing seems to have download instructions but no installation instructions

CI broken on broken C# tests

Breaking C# tests breaks CI build somehow:
https://travis-ci.org/kaitai-io/kaitai_struct/builds/172869789

Looks like that Travis's shell behaves some other way than on our boxes :(

YAML schema for .ksy

Another suggestion that might be interesting: do a YAML schema to:

Make .ksy structure more obvious
Allow some validation using standard YAML schema tools, such as kwalify

Default encoding for strings

As per #13 (comment), @LogicAndTrick proposed to add default encoding to strings in meta, in the same fashion as default endianness works:

meta:
  endian: le
  encoding: UTF-8
seq:
  - id: str1
    type: str
    size: 32
    # use the default encoding
  - id: str2
    type: str
    size: 64
    encoding: ASCII   # encoding can still be overridden

RFC converter

RFC by IETF often have descriptions of data structures as ascii tables, for example https://tools.ietf.org/html/rfc4271#section-4.2 . It'd be nice to have a tool to convert such descriptions to descriptions available for such tools as KS.

Wireshark dissectors as target language

A few people have already asked for compiling .ksy into Wireshark dissector.

This would be an umberlla issue to track progress for this goal.

C#: automatic generation of a project file?

While adding yet another test, I've finally got tired of adding all the required stuff manually to a C# project file, so I thought - why don't we just make some sort of .csproj file generator? From what I see now, it should just:

take a template
add compiled/csharp/*.cs stuff into second Project/ItemGroup in the following way:

    <Compile Include="..\..\..\compiled\csharp\ZlibWithHeader78.cs">
      <Link>compiled\ZlibWithHeader78.cs</Link>
    </Compile>

add tests/*.cs to the same location in the following way:

    <Compile Include="tests\SpecZlibWithHeader78.cs" />

add src/* to the third Project/ItemGroup in the following way:

    <None Include="..\..\..\src\zlib_with_header_78.bin">
      <CopyToOutputDirectory>Always</CopyToOutputDirectory>
      <Link>src\zlib_with_header_78.bin</Link>
    </None>

and, probably, that's all. Unless there is a simpler way to automate it, can I just hack some sort of quick script to generate this file? All our scripts so far are in either UNIX shell or Ruby — would it be ok to use it there as well?

Examples/strategies for dealing with broken file format implementations.

Parser for EN 13757-3:2012 needed

I need to create a parser for data defined in the standard EN 13757-3:2012 (Communication systems for and remote reading of meters - Part 3: Dedicated application layer) and you can get an example on page 4 of the following PDF. In the end it's like many other binary formats: Some fields of binary data, some with fixed length, some with variable one, all those fields have a special purpose and some implement a bit mask to decide what this purpose is etc.

http://fastforward.ag/downloads/docu/FAST_EnergyCam-Protocol-wirelessMBUS.pdf

I need to decide what/if I build from scratch, where some tools can be of benefit and I like your approach very much. Reading through your documentation, one question came into my mind, though:

How do I deal with the always present exceptions from the standards because of implementation errors of some vendor?

Currently I have already two of those exceptions, where the vendor claims to support EN 13757-3:2012, but actually doesn't in one case by purpose to optimize transferred data and in the other one simply because a firmware bug has been introduced. I'm unsure currently how I would be able to combine default data parsing and recognizing those errors and maybe X more into your approach. In my mind examples for dealing with such things or thinking of strategies how it could be supported at all are of common interest, so I decided to create this issue. If you think otherwise, feel free to close it...

One missing byte

The first problem I have is interesting, because it changes the data format in the first two bytes already: By default the datagrams I need to parse start with 4 bytes, the first containing the length, the second some communication flag and 3/4 are the vendor providing the package. One of my vendors simply removes the second byte with the communication flag, the C-field in the linked PDF with value 44h, to reduce the amount of data transferred, because this byte doesn't ever change in his use cases. But of course I can only know about that case if I read byte 2 and 3 and check if it's that special vendor or not and if so, the parser mode needs to be changed somehow.

With your approach I would create some root type like "header" or such and that would already be different for standard compatible implementation vs. not like I need it.

So would I need to create one of those header types for each implementation I have to deal with? And all of those would need to check something using "if"? And if it's 20 or 50, would all of those be in the same file? Even if I have to heavily document why such a header is needed? Or would be a better approach to generate a different parser for each such case? But what with all the other types then, would I need to redundantly maintain them per file instead of providing some kind of include/link to other types?

In the end, the first example is missing one byte, which influences the position of all the other types created.

Some flipped bits

The second problem is that in the end some arbitrary values are transferred in the datagram, which consist of a triple: One byte to explain the encoding of the value, one byte to explain the unit of the value, one or more bytes for the value itself. It's e.g. "BCD encoded integer" and "hours" and "10" to tell me that "something" last for 10 hours.

Now one of my vendor has a introduced a firmware bug which results in that some datagrams have a flipped bit e.g. for the first of those three bytes. What I need to do is check the vender, the device type and version and then decide that I need to flip some bit in some byte.

Checking for vendor etc. should be easy using "if", but how do I apply the bit flip? Do I need to implement a "process" statement like "zlib", "xor" etc.? Or is using an instance with a "value" calculation the better approach? And what if things get arbitrary complex, like I need to e.g. check a configuration for all versions of devices where this error occurs? Such configuration is not available to your compiler. Additionally, the same question like before applies: Do I combine 50 of such workarounds in the same ksy file?

I would be very thankful for your ideas, because currently I tend to not use your lib and instead implement the parser by hand to have full control to access things like configuration and such.

multiple terminators for strz

23 16 34 13 56 23 10 00 74 76 77 65 23 22 00 11 1A 47 2E 23 0A 00 7E 4A 23 12 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

I have file that has script commands (0x23) and no size markers. File ends with varying amount of 0x0 bytes.

23 [command number u2 ] [parameters]
Unexpected fixed contents: got 00, was waiting for 23 (because trailing 0x00's).

Is there 'correct' way to read this without getting error? Visualizer shows right values, just this popup happens when opening file.

meta:
    id: dat
    application: script
    endian: le
seq:
    - id: commands
      repeat: eos
      type: script_cmd
types:
    script_cmd:
        seq:
            - id: cmd_start
              contents: [0x23] 
            - id: num
              type: u2
            - id: params
              type: strz
              encoding: SJIS
              terminator: 0x23
              consume: false
              if: num != 0x12 # command num for end of file

RegEx support

Hi
Are you planning to support regular expression for parsing text fields ?

repeat: size, repeat: pos

Sometimes one need to describe repeated blocks which number is unknown and known only a size of all blocks or a position where they end. Consider the following .ksy file:

meta:
  id: test
  file-extension: ext
  endian: le

seq:
  - id: table_1_size
    type: u4
  - id: table_1
    type: table_1_entry
  - id: garbage
    size-eos: true

types:
  table_1_entry:
    seq:
      - id: len
        type: u4
      - id: data
        size: len

Here we want to read full table_1 but don't know how many entries exist. My suggestion is to add repeat: size attribute to make it possible to read as much table_1 entries as they fits in a size specified:

seq:
  - id: table_1_size
    type: u4
  - id: table_1
    type: table_1_entry
    repeat: size
    repeat-size: table_1_size
  - id: garbage
    size-eos: true

The other option is to read to a position in stream. It may looks like that:

seq:
  - id: offset_to_garbage
    type: u4
  - id: table_1
    type: table_1_entry
    repeat: pos
    repeat-pos: offset_to_garbage
  - id: garbage
    size-eos: true

Partial compilation for C#

Normal CIs work assuming that at the very least, the project is compiling: normally, no one commits stupid syntax errors.

However, it's not exactly the case with Kaitai Struct tests: it's normal course of development to have some generated sources, and only some of them would compile correctly. And even only some of the sources compile, it's still worthwhile to try testing them. There are high chance that the rest would pass.

I've implemented this thing I call "partial compilation" for Java - see run-java. The general idea is very crude, but it works: first it tries to compile all the sources together, if it fails, it tries to compile files one-by-one.

It's worth to implement the similar build scheme for any compilable language (i.e. C#, C++, Java). The general overview is as follows:

run-$LANG compiles language runtime; if this fails, the rest is abandoned, it's a fatal error
run-$LANG tries to compile format classes generated by KS compiler; it requires only the source of the class itself + runtime (which is already compiled at step 1); if anything fails, it's not a fatal error and we proceed, trying to keep as many format classes compiled as possible;
run-$LANG tries to compile test specs; there is 1 test spec file per 1 test; if anything fails, we just abandon failed file and try to continue with the rest
run-$LANG runs tests (obviously, only those that compiled successfully)
aggregate/aggregate later parses results of steps 1-4 and outputs distinct statuses like "spec build failed" and "format build failed", to give an idea of what's broken

For steps 2 and 3 we keep a list of files failed to build in test_out/$LANG/build.fails and detailed compilation error logs in test_out/$LANG/compile_log/$FILE.log - these are used for aggregation and display of errors at http://kaitai.io/ci/

Is it possible somehow to do that for C#?

Rust support

Is it possible to add Rust language support to kaitai?
Thanks.

Switch operation instead of multiple `if`s

Also a long requested feature, it is proposed to have some sort of alternative of doing repetitive ifs like that:

      - id: el_type
        type: u1
        enum: bson_type
      - id: el_string
        type: bson_string
        if: el_type == bson_type::string
      - id: el_document
        type: bson_document
        if: el_type == bson_type::document
      - id: el_boolean
        type: u1
        if: el_type == bson_type::boolean
...

I propose to discuss, choose the best possible plan and implement it :)

XOR processing with multiple bytes

Okay, I give up and start to file my requests properly, as you've requested ;)

So, I'd love to have XOR processing for a byte arrays or user subtypes with multiple bytes. That is, for example, given source bytes src and XOR argument arg[0..2] I'd like to have the following as a result:

[src[0] ^ arg[0], src[1] ^ arg[1], src[2] ^ arg[2],
 src[3] ^ arg[0], src[4] ^ arg[1], src[5] ^ arg[2],
 src[6] ^ arg[0], ...

Suggested syntax: process: xor([0x11, 0x22, 0x33, ...]).

Ideally, it will be a blessing if it would also work with byte arrays (i.e. keys) extracted as some other field, i.e.

- id: key
  size: 4
- id: encrypted
  process: xor(key)

Bit field data types

Sometimes a file format doesn't use whole bytes for every field, so it would be nice to support bit fields. Something like b[1-64]{le,be}.

MIDI running status

While trying to implement MIDI messaging specification for KS, I've ran into an issue with a certain thing called "running status" in MIDI core spec.

MIDI events in standard MIDI file normally look like that:

00 b5 00 00
00 b5 31 00
00 b5 0a 17

It breaks down like:

00 - time spec, zero (start of the track) in this case
b5 - change controller at channel 5
00 - controller ID
00 - controller value

However, when multiple similar event types go sequentially (as in this example), all the b5s except for the first one can be omitted:

00 b5 00 00
00 31 00
00 0a 17

The distinction between a "proper" new event and continuation of previous event type is made by lack of high bit (0x80) in that byte after time spec: if there is a bit there (like b5) - that's a new type of event, if there isn't (like 31) - that's "running status".

Although "running status" is mostly obsolete and many MIDI implementations doesn't seem to support it, it's still an interesting question on how to ponder this in KS syntax. Any ideas?

→ Current MIDI file spec

C#: suboptimal instances already calculated check

It dawned upon me that the current "is instance already calculated" check in C# is so-so. It was copied from Java, and, while Java has both nullable and non-nullable types (as Integer and int), C# seems to only have non-nullable primitive types, such as int, and thus they always have a default value (which is 0), not null. Because of that, this turns out bad:

        private int _foo;
        public int Foo
        {
            get
            {
                if (_foo != default(int))
                    return _foo;
                _foo = ...
                return _foo;
            }
        }

If calculated value of _foo is 0, it will get calculated again and again. This (1) breaks API contract - instance's value is supposed to be calculated once and never change afterwards, (2) might get really slow, if instance calculation is some kind of really slow operation, which will get executed repeatedly.

Is there something we can do about it? For example, C++ solves the same problem by introducing distinct flag variables - i.e. f_foo - false by default, set to true after the instance have been calculated.

Serialization

You have deserialization. How about serialization?

Graphviz diagram generation

The basic idea is to generate nice and readable diagrams from declarative .ksy format descriptions.

Early prototypes are below.

Microsoft PE executable file format

GIF file format

Generate only required imports / includes

Currently, all generated code includes a hefty amount of imports / includes / similar header-like code that import/include all kinds of stuff that might be or might not be used, like classes for arrays, zlib compression, etc.

For example, in Java, a typical HelloWorld generates:

import io.kaitai.struct.KaitaiStruct;
import io.kaitai.struct.KaitaiStream;

import java.io.IOException;
import java.util.Arrays; // unused
import java.util.ArrayList; // unused
import java.util.HashMap; // unused
import java.util.Map; // unused

That is 4 unused imports out of 7.

The problem now lies in a way the code is generated - there's only one stream. I propose to create a unified infrastructure for all compilers to be able to generate these "header" stuff separately, if the need arises.

_io.pos, _io.eof

According to a problem described in #24, we need an ability to use _io.pos and _io.eof in expressions to use, for example, inside size or if attributes. They will be additions to _io.size which can be used already.

_io.pos - get position in current stream;
_io.eof - true if current position at the end of file, false otherwise.

Unified YAML parsing

Given current situation with YAML libraries (there are no native Scala libraries, which could have compiled smoothing both to JVM and JS, and, in short notice, natively, and given that YAML is so weird and complex that it's infeasible to write our own YAML parsing library, I want to discuss another possible approach.

Raw SnakeYAML gives us Java Object, which is actually a conglomerate of java.util.List, java.util.Map, and simpler Java types (i.e. Integer, Long, Double, String). Passing objects from JavaScript gives us a scala.scalajs.js.Object, which is actually also a conglomerate of simpler types, such as js.Dictionary, js.Array, etc. Note that once we've got the parsing in terms of these objects, all the "precious" information about line numbers / columns, etc, is already lost. You can't get anything from just a java.util.List with only objects.

What I want to propose is that we do:

Two relatively thin and generic layers that convert platform-specific YAML parser output (i.e. JVM or JS) into regular Scala primitives (List, Map, and primitive types). Actually, we don't really use any of the fancy YAML stuff, like custom serialization or anything, so it should be pretty simple.
Single manually written layer that converts data defined in terms of List/Map to our *Spec stuff (ClassSpec, AttrSpec, etc). Probably it should be implemented approximately the same as our current create(...) in companion objects work, but should just take raw AnyRef and cast it as it will see fit.
During this process, we'll push out some parsing exceptions (i.e. "certain key expected, but not found", "unknown key encountered", etc), and to make these error messages more meaningful, I propose to address them not by the line numbers, but by some sort of path inside YAML, i.e. something like types/foo/seq/3/repeat-expr. It is obviously worse than line numbers, but it's at least something helpful that we can do right now in our current situation with YAML parsers.

It can, in theory, allow us to solve problems like https://github.com/kaitai-io/kaitai_struct_compiler/issues/23 and https://github.com/kaitai-io/kaitai_struct_compiler/issues/22

What do you think about it?

Parent class type detector doesn't check instances (C#/Java)

This one's pretty obscure, but there's definitely a bug here.

The compiler seems to have some logic to check if a type is used in multiple locations to determine whether the parent is a known type or whether it should just a a generic KaitaiStruct type. (ClassCompiler.scala@markupParentTypes)

This check doesn't check the usages of the type inside instances. If only one type uses it in the seq section, but other types use it in the instances section, the resulting output won't compile.

I tried to fix this one by adding to markupParentTypes, but I couldn't work it out (I'm not familiar enough with the type system yet).

KSY to replicate the bug:

meta:
  id: multiple_use
  endian: le
seq:
  - id: t1
    type: type_1
  - id: t2
    type: type_2
types:
  multi:
    seq:
      - id: value
        type: s4
  type_1:
    seq:
      - id: first_use
        type: multi  # parent type = type_1
  type_2:
    seq: []
    instances:
      second_use:
        pos: 0
        type: multi  # parent type = type_2

Compiled result (similar problems in both C# and Java):

// Error CS1503 - Argument 2: cannot convert from 'MultipleUse.Type2' to 'MultipleUse.Type1'
class MultipleUse : KaitaiStruct
{
    ...

    // Note here that `parent` is `Type1`. It should be `KaitaiStruct`.
    class Multi : KaitaiStruct
    {
        public Multi(KaitaiStream io, Type1 parent = null, MultipleUse root = null) : base(io) { ... }
    }

    class Type1 : KaitaiStruct
    {
        ...

        private void _parse()
        {
            _firstUse = new Multi(m_io, this, m_root);
        }
    }

    class Type2 : KaitaiStruct
    {
        public Multi SecondUse
        {
            get
            {
                ...
                _secondUse = new Multi(m_io, this, m_root); // <--- compile error here
                ...
            }
        }
    }
}

Temporary work around:

# Add this to `seq`:
  - id: unused
    type: multi
    if: false

Nested Subtypes

Consider the following YAML type:

    seq:
      - id: frame_entry
        repeat: eos

        # The pages are always 512 byte blocks / pages
        # for some reason... not sure why considering a lot of
        # the data is wasted...
        size: 512

        types:
          junk:
            size: 174
          other:
            size: 338

It'd be nice if we could support the "subtypes", that is types within types. How is this done in Kaistruct today? Is everything top level?

Unable to determine type for `field` in type List(kaitai_struct)

Reproduce gist: https://gist.github.com/aki017/70266d164354696535b9eabcd5a9306c

It occur when nest type and using _parent.field

work

- id: work
  size: _parent.size

broken

- id: broken
  size: _parent.size * 2

== Error

... compiling it for ruby... ... => ./ruby/broken.rb
java.lang.RuntimeException: Unable to determine type for main_size in type List(kaitai_struct)
    at io.kaitai.struct.ClassCompiler.determineType(ClassCompiler.scala:162)
    at io.kaitai.struct.translators.BaseTranslator.detectType(BaseTranslator.scala:260)
    at io.kaitai.struct.translators.BaseTranslator.translate(BaseTranslator.scala:45)
    at io.kaitai.struct.languages.LanguageCompiler.expression(LanguageCompiler.scala:74)
    at io.kaitai.struct.languages.RubyCompiler.parseExpr(RubyCompiler.scala:201)
    at io.kaitai.struct.languages.EveryReadIsExpression$class.attrBytesTypeParse(EveryReadIsExpression.scala:80)
    at io.kaitai.struct.languages.RubyCompiler.attrBytesTypeParse(RubyCompiler.scala:10)
    at io.kaitai.struct.languages.EveryReadIsExpression$class.attrParse2(EveryReadIsExpression.scala:61)
    at io.kaitai.struct.languages.RubyCompiler.attrParse2(RubyCompiler.scala:10)
    at io.kaitai.struct.languages.EveryReadIsExpression$class.attrParse(EveryReadIsExpression.scala:39)
    at io.kaitai.struct.languages.RubyCompiler.attrParse(RubyCompiler.scala:10)
    at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$3.apply(ClassCompiler.scala:100)
    at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$3.apply(ClassCompiler.scala:100)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at io.kaitai.struct.ClassCompiler.compileClass(ClassCompiler.scala:100)
    at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$5.apply(ClassCompiler.scala:108)
    at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$5.apply(ClassCompiler.scala:108)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
    at io.kaitai.struct.ClassCompiler.compileClass(ClassCompiler.scala:108)
    at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$5.apply(ClassCompiler.scala:108)
    at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$5.apply(ClassCompiler.scala:108)
    at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
    at io.kaitai.struct.ClassCompiler.compileClass(ClassCompiler.scala:108)
    at io.kaitai.struct.ClassCompiler.compile(ClassCompiler.scala:80)
    at io.kaitai.struct.Main$.compileOne(Main.scala:84)
    at io.kaitai.struct.Main$$anonfun$compileAll$1.apply(Main.scala:97)
    at io.kaitai.struct.Main$$anonfun$compileAll$1.apply(Main.scala:93)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at io.kaitai.struct.Main$.compileAll(Main.scala:93)
    at io.kaitai.struct.Main$$anonfun$main$1.apply(Main.scala:118)
    at io.kaitai.struct.Main$$anonfun$main$1.apply(Main.scala:111)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at io.kaitai.struct.Main$.main(Main.scala:111)
    at io.kaitai.struct.Main.main(Main.scala)

Rust support

Adding this so it can be tracked, but I plan on doing most of this myself. I've not yet had a need for Rust but I've been wanting to try it out anyway, figured this is a good excuse.

I've created a runtime for Rust: https://github.com/kaitai-io/kaitai_struct_rust_runtime

I'll be adding matching tests and a compiler for Rust in the future.

Note that I don't know Rust at all so I'm learning as I go. If anybody has coded with it before I would very much appreciate advice, suggestions, or even a code review of the runtime!

Theoretical background of binary parsing algorithms

What first reading papers can you recommend on theme of error-resistant binary parsing ?
What papers can you recommend to dive into binary parsing at all ?

My neighbors @ http://ikp.ssau.ru/ have longtime neglected problem with parsing telemetry data with lot of brokken packets. I found your project as draft decision, but is kaitai algorithms resistant for partial/inconsistent data ?

I'm thinking about some algorithm with sliding parse window and backtracking validation, can you name some widely known methods on it ?

Mod operation consistency

Regarding "positive mod" operation consistency. Unfortunately, many C-derived languages (including C++, Java, C# and PHP) do not feature "mod" operation, but instead use remainder calculation, which may result in negative answer. Wikipedia article on modulo has comprehensive list of how languages behave here.

I'm strongly vouching to have consistent "positive mod" operation in KS, so here's a summary of what can be done:

in Java - there is Math.floorMod which does exactly what we want — yet, unfortunately, it is available only since Java 8.
in C++ - there seems to be no simple solution (SO people suggest macros, inline functions, etc)
in JavaScript - these people basically suggest that there's no ready-made function and propose to roll your own

Stuff to be investigated:

in C#
in PHP
in Perl

PHP 7 support

We would like to add support of PHP 7, we could assist you in that without knowledge of Scala and with knowledge of PHP. Is it a possible?

_parent breaks when nested below a switch statement

This ksy breaks :

meta:
  id: asterix
seq:
  - id: asterix
    type: asterix_block
    repeat: eos
types:
  asterix_block:
    seq:
      - id: category
        type: u1
      - id: asterix_len
        type: u2be
      - id: content
        size: asterix_len - 3
        type:
          switch-on: category
          cases:
            '30': asterix_cat030
  asterix_cat030:
    seq:
      - id: fspec
        type: u1
        repeat: until
        repeat-until: _ & 0x1 == 0
      - id: cat030_content
        type: asterix_cat030_content
  asterix_cat030_content:
    seq:
      - id: item_010
        size: 2
        if: _parent.fspec[0] & 0x80 != 0

Exception in thread "main" scala.MatchError: UnknownClassSpec (of class io.kaitai.struct.format.UnknownClassSpec$)
	at io.kaitai.struct.ClassTypeProvider.makeUserType(ClassTypeProvider.scala:51)
	at io.kaitai.struct.ClassTypeProvider.determineType(ClassTypeProvider.scala:24)
	at io.kaitai.struct.ClassTypeProvider.determineType(ClassTypeProvider.scala:16)
	at io.kaitai.struct.translators.BaseTranslator.detectType(BaseTranslator.scala:228)
	at io.kaitai.struct.translators.BaseTranslator.translate(BaseTranslator.scala:69)
	at io.kaitai.struct.languages.components.ObjectOrientedLanguage$class.expression(ObjectOrientedLanguage.scala:9)
	at io.kaitai.struct.languages.RubyCompiler.expression(RubyCompiler.scala:11)
	at io.kaitai.struct.languages.RubyCompiler.condIfHeader(RubyCompiler.scala:172)
	at io.kaitai.struct.languages.components.EveryReadIsExpression$class.attrParse(EveryReadIsExpression.scala:23)
	at io.kaitai.struct.languages.RubyCompiler.attrParse(RubyCompiler.scala:11)
	at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$3.apply(ClassCompiler.scala:44)
	at io.kaitai.struct.ClassCompiler$$anonfun$compileClass$3.apply(ClassCompiler.scala:44)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at io.kaitai.struct.ClassCompiler.compileClass(ClassCompiler.scala:44)
	at io.kaitai.struct.ClassCompiler$$anonfun$compileSubclasses$1.apply(ClassCompiler.scala:86)
	at io.kaitai.struct.ClassCompiler$$anonfun$compileSubclasses$1.apply(ClassCompiler.scala:86)
	at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
	at io.kaitai.struct.ClassCompiler.compileSubclasses(ClassCompiler.scala:86)
	at io.kaitai.struct.ClassCompiler.compileClass(ClassCompiler.scala:59)
	at io.kaitai.struct.ClassCompiler.compile(ClassCompiler.scala:19)
	at io.kaitai.struct.Main$.compileOne(Main.scala:92)
	at io.kaitai.struct.Main$.compileOne(Main.scala:88)
	at io.kaitai.struct.Main$$anonfun$main$1.apply(Main.scala:124)
	at io.kaitai.struct.Main$$anonfun$main$1.apply(Main.scala:119)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at io.kaitai.struct.Main$.main(Main.scala:119)
	at io.kaitai.struct.Main.main(Main.scala)

This one works fine :

meta:
  id: asterix
seq:
  - id: asterix
    type: asterix_block
    repeat: eos
types:
  asterix_block:
    seq:
      - id: category
        type: u1
      - id: asterix_len
        type: u2be
      - id: content
        size: asterix_len - 3
        type: asterix_cat030
        if: category == 30
  asterix_cat030:
    seq:
      - id: fspec
        type: u1
        repeat: until
        repeat-until: _ & 0x1 == 0
      - id: cat030_content
        type: asterix_cat030_content
  asterix_cat030_content:
    seq:
      - id: item_010
        size: 2
        if: _parent.fspec[0] & 0x80 != 0

I don't know wether I'm misusing Kaitai-struct or it's bug.

For reference, I'm trying to use Kaitai-struct to parse ASTERIX data.

Specification of the binary format is here : https://www.eurocontrol.int/sites/default/files/field_tabs/content/documents/single-sky/specifications/20120401-asterix-spec-v2.0.pdf

Floating point data types

This is a proposal to add IEEE 754 floating points data types to the list of supported primitive types. The single and double precision types being the key types as the other types in the specification are rare.

Suggested specification pattern: (f)(4|8)(le|be)

First group - always f - Indicates floating point
Second group - 4 or 8 - The length of the type in bytes, 4 being single precision (float) and 8 being double precision (double)
Third group - le or be - The endianness of the stored value (optional)

Examples: f4, f4be, f8le

Suggested stream API methods:

read_f4le
read_f4be
read_f8le
read_f8be

Required changes (I'm mostly guessing here):

Compiler core: Add type handling for floating points
Language compilers: Add native type mappings for new types
Language Runtimes: Add new API methods
Documentation: Update the API docs and the page listing primitive types
Tests: Add/modify test cases to add floating point values

I've only ever encountered a half-precision floating point value once, and I'm not sure I ever will again. It's probably not worth adding any FP formats outside of the standard single and double.

Docstrings

Just another simple idea that was floating around for ages and I keep forgetting it.

The proposal is to add docstrings into .ksy format, something like:

seq:
  - id: num_files
    type: s4
    doc: Number of files in the archive

This should result in rendition of proper docstring for every language, for example, for Java:

/**
 * @return Number of files in the archive
 */
public int numFiles() { return this.numFiles; }

Should be easy enough to do (although almost impossible to check with tests), but a huge help for proper format documentation.

Lengths and other relations in data

Is it possible to define a structure where the first field is the count (let's call it n) followed by array of n items of some smaller structure followed by the rest of structure.
In pseudo-c++

struct a{
//.....
};
struct b{
size_t count;
a data[count];
//rest of structure
}

Also variants are needed, where the type of field depends on combinational function of bit flags. For example if we process network packets, their content depends on the header. This can be japped to the language using polymophism.

Add to FAQ differences between this and cap'nproto

Set the stream position inside a repeat loop

This one's a bit wild, but I don't think it's currently possible so I wanted to log it.

Quake's BSP format defines some structures like this:

// pseudocode
typedef struct {
    int miptexnum;
    int offsets[miptexnum]; // offsets to each miptex's header
} miptex_lump_t;

typedef struct
{
    char name[16];
    uint32_t width;
    uint32_t height;
    ... etc ...
} miptex_t;

// pseudocode to read the data
var stream = open(filename);
miptex_lump_t lump = stream->read_miptex_lump();
foreach (int o in lump->offsets) {
    stream->seek(o);
    miptex_t tex = stream->read_miptex();
}

Basically to read this data you need to first read an array of offsets, and then loop through that array and read the data at each offset. Is something like this currently possible?

If not, could support for this kind of structure be added? I was thinking something like this would probably be the least amount of change (important part is at the end):

types:
  miptex:
    seq:
      - id: name
        type: str
        size: 16
        encoding: ASCII
      - id: width
        type: u4
      - id: height
        type: u4
        # ... etc ...
  miptex_lump:
    seq:
      - id: num_textures
        type: s4
      - id: offsets
        type: s4
        repeat: expr
        repeat-expr: num_textures
    instances:
      miptex_list:
        type: miptex
        repeat: expr
        repeat-expr: num_textures
        repeat-pos: offsets[_index]    # `_index` is the index of the for loop:
                                       # for (int _index = 0; ...) { }

The idea being that pos runs once for the whole instance, and repeat-pos runs once for each item in the list.

The generated code would look something like this:

// generated C# code
public List<Miptex> MiptexList
{
    get
    {
        if (_miptexList != default(List<Miptex>))
            return _textures;
        long _pos = m_io.Pos();
        _miptexList = new List<Miptex>();
        for (var _index = 0; _index < NumTextures; _index++) {
            m_io.Seek(Offsets[_index]);    // <--- New code here
            _miptexList.Add(new Miptex(m_io, this, m_root));
        }
        m_io.Seek(_pos);
        return _miptexList;
    }
}

I've managed to get a (kinda hacky) version of this working, so I don't think massive changes are required - I'm able work on the proper implementation if we can work out the best way to do it. I've come across this kind of 'list of offsets' structure in other formats as well (e.g. Half-Life 1 models), so I think it would be a useful addition to have.

Maven coordinates

Hello
Would you please provide maven coordinates of java artifacts in the documentation/dev guide?
Thanks

Basic C# support

C# support is 78% done, according to the CI.

@LogicAndTrick, do you have plans to continue working on this support in a next week or two? I'd like to release v0.4 about that time, and it would be great to have working C# support in it. If you're busy right now, just tell me, I'll try to take on it myself - probably it shouldn't be that difficult given Java vs C# similarities.

Processing with real cryptographic ciphers

From time to time I get requests to implement "real" cryptographic ciphers in KS (like DES, AES, IDEA, RC*, etc). I'll try to outline what I have in mind on how to implement that.

We start creating a KS standard for cipher naming and parameters. Currently, there are tons of names available in different implementations. We'd want to have a unified naming that will be used in .ksy file, that will be mapped into existing implementations.
We'll slowly gather implementations for these algorithms in all target languages. We'll try to map stuff to standard libraries as much as possible, but, of course, not everything would be there. Thus we'll have to include external libraries - that's ok, but we'll strive as much as possible for that to be optional (i.e. if .ksy is not using algorithm X, there is no need to link/include library for X support in target project).

Example

- id: key
  size: 16
- id: iv
  size: 16
- id: buf
  size: 1024
  process: aes(128, key, iv, cbc)

It should be translated into something like that in Ruby:

@buf = KaitaiStream::unpack_aes(128, @_io.read(1024), key, iv, :CBC)

# ...

def unpack_aes(data, key_len, key, iv, mode)
  require 'openssl'
  decipher = OpenSSL::Cipher::AES.new(key_len, mode)  
  decipher.decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.update(data) + decipher.final
end

If anyone's really interested in this, I suggest to start with gathering a list of ciphers with parameters and list of libraries in target languages that will support them.

Online visualizer / IDE

The idea is to create a HTML/JS based online visualizer / IDE for parsing / visualizing binary formats on the web.

This could be extended to a fully fledged IDE where you can edit .ksy descriptors, compile them to different languages, etc.