bjdata's Introduction

Binary JData Format Specification Development Guide

We use this repository to gather feedback from the community regarding the "Binary JData Format Specification", or Binary JData (BJData) format. Such feedback is crucial for finalizing this file specification and for improving it in the future once it is disseminated.

The latest version of the BJData specification can be found in the file named Binary_JData_Specification.md. The specification is written in the Markdown format for convenient editing and version control.

This specification was derived from the Universal Binary JSON (UBJSON) Specification Draft 12 developed by Riyad Kalla and other UBJSON contributors. The Markdown version of this specification was derived from the documentation included in the Py-ubjson repository (Commit 5ce1fe7). The BJData format is no longer backward compatible with UBJSON.

Libraries that support this specification include:

- Python: pybj (PIP: https://pypi.org/project/bjdata/, GitHub: https://github.com/NeuroJSON/pybj)
- MATLAB/Octave: JSONLab (Debian/Ubuntu/Fedora: sudo apt-get install octave-jsonlab, GitHub: https://github.com/fangq/jsonlab)
- C: ubj (GitHub: https://github.com/NeuroJSON/ubj)
- C++: JSON for Modern C++ (v3.11.0 or later) (https://github.com/nlohmann/json/)
- JavaScript: bjd (npm: https://www.npmjs.com/package/bjd, GitHub: https://github.com/NeuroJSON/js-bjdata)

Acknowledgement: This specification was developed as part of the NeuroJSON project (http://neurojson.org), with funding support from the US National Institutes of Health (NIH) under grant U24-NS124027 (PI: Qianqian Fang).

What is BJData

BJData is a binary JSON format. It is similar to JSON but allows storing strongly-typed binary data. The BJData format improves upon the widely supported UBJSON (https://ubjson.org) format by adding the following key features:

- added 4 new data types previously missing from UBJSON: [u] - uint16, [m] - uint32, [M] - uint64 and [h] - half/float16,
- first among all binary JSON formats to support packed N-dimensional arrays (ndarray), a data type that is of great importance to the scientific community,
- adopted Little-Endian as the default byte order, as opposed to Big-Endian for UBJSON/MessagePack/CBOR,
- only non-zero-fixed-length data types are allowed in optimized container types ($), which means [{SHTFN cannot follow $, but UiuIlmLMhdDC are allowed.
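To make the byte-order and optimized-container points concrete, here is a minimal sketch using only Python's standard struct module. The helper names encode_uint16 and encode_uint16_array are hypothetical and are not part of pybj or any other listed library; the sketch assumes a 1-D array with fewer than 256 elements, so the count fits in a uint8 ([U]).

```python
import struct


def encode_uint16(value):
    """Encode one uint16 as marker 'u' followed by 2 little-endian bytes."""
    return b"u" + struct.pack("<H", value)


def encode_uint16_array(values):
    """Encode a 1-D optimized array as: [ $ u # U <count> <payload>.

    With both type ($) and count (#) given, no closing ']' is written.
    This sketch only supports counts that fit in a uint8.
    """
    if len(values) > 255:
        raise ValueError("this sketch only writes a uint8 count")
    header = b"[$u#U" + struct.pack("<B", len(values))
    payload = struct.pack("<%dH" % len(values), *values)
    return header + payload


single = encode_uint16(258)              # 258 == 0x0102
packed = encode_uint16_array([1, 2, 3])

print(single)   # low byte first: b'u\x02\x01'
print(packed)   # b'[$u#U\x03\x01\x00\x02\x00\x03\x00'
```

A big-endian encoder would instead write the two payload bytes of 258 as 01 02, which is why BJData and UBJSON payloads are not interchangeable for multi-byte types.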

How to participate

You can use a number of methods to provide feedback on the working draft of this file specification, including:

Create an "Issue"
- This is the recommended method for providing detailed feedback or starting a discussion. An "Issue" on GitHub is highly versatile: one can ask a question, report a bug, submit a feature request, or simply propose a general discussion. Please use URLs or keywords to link your discussion to a specific line/section/topic in the document.
Write short comments on Request for Comments (RFC) commits
- A milestone version of the specification will be associated with an RFC (Request for comments) commit (where the entire file is removed and re-added so that every line appears in such commit). One can write short comments as well as post replies on this RFC page.
- The latest stable release is Version 1 Draft 2. Please use this link to comment.
- To add a comment, you first need to register a GitHub account and then browse the above RFC page. When you hover your cursor over a line, a "plus" icon is displayed; clicking it allows you to comment on that specific line (or reply to others' comments).
- The RFC page can get busy if too many comments appear. Please consider using the Issues section if this happens.
- One can browse the commit history of the specification document. If you are interested in commenting on a particular update, you can also comment on any of the commit pages using the same method.
Use NeuroJSON mailing list
- You may send your comments to the neurojson mailing list (neurojson at googlegroups.com). Subscribers will discuss by email, and if a consensus is reached, proposals will be resubmitted as an Issue, and changes to the specification will be associated with this issue page.

For anyone who wants to contribute to the writing or revision of this document, please follow the steps below:

Fork this repository and make updates, then create a pull-request
- Please first register an account on GitHub, then browse the BJData Spec repository; at the top-right of the page, find and click the "Fork" button.
- Once you have forked the BJData project to your own repository, you may edit the files directly in your browser, or download them to a local folder and edit them using a text editor.
- Once your revision is complete, please "commit" and "push" it to your forked git repository. Then create a pull-request (PR) against the upstream repository (i.e., NeuroJSON/bjdata). Please select "Compare across forks" and select "NeuroJSON/bjdata" as the "base fork". Please write a descriptive title for your pull-request. The project maintainer will review your updates and either merge them into the upstream files or request a revision from you.

bjdata's Issues

Spec for a compact format for object persistence

I would like to contribute a spec for optimized serialization of any kind of struct, class or object.

It's similar to the metadata node you already proposed.

It may seem off-topic, but personally I think it matches nicely with BJData and UBJSON's compact form: people usually come to a binary format because they do not want to waste space repeating the same strings over and over. When plain UBJ_OBJECT containers are used for this purpose, that is exactly what happens.

In contrast to opaque indices or ndarrays, this spec allows for much better readability of the data and guarantees correct interpretation of the data in the future by preventing you from losing track of which field is what or where. This is especially useful when data types and fields change often. I'm guessing you already know that.

In two cases I'm proposing new UBJSON tags to allow nesting inside the values of those types, but the same could be done with reserved strings, as you proposed, without changing or depending on UBJSON.

I think this spec would be especially useful when combined with reflection features that could allow automatic serialization and deserialization.

Inheritance could also be supported either with simple Single Table Inheritance or more sophisticated means.

I've turned a similar scheme of object serialization with UBJSON into a draft for this spec idea. See if you think that belongs somewhere in your project.

The idea is a UBJSON container that holds both the metadata and the object data, stored in a compact way.

The metadata part is another container whose values specify the fields of all object types that will be stored later in the file.

Example of metadata:

[["Time", ["year", "month", "day", "hour", "minute"]],
 ["Place", ["longitude", "latitude"]],
 ["Appointment", ["start", "end", "place"]],
]

Then each instance is represented in the data portion as an array of the appropriate type: UBJ_MIXED or something else or even ndarrays if it's scalar data.

The type of each data instance is identified either by a preceding integer "tag", in the case of a heterogeneous array, or, in the case of a homogeneous representation, by its location in the file.

If the array index or integer tag matches the index of a metadata type entry, then that type entry describes the fields of that data entry.

Simple example of heterogeneous data:

[[0, [2020, 05, 13, 2, 1]],
 [1, [34.234, 21.342342]],
 [0, [2001, 05, 01, 16, 30]],
 [0, [1970, 01, 01, 01, 01]],
 [1, [74.234, -5.342342]],
]
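As a rough sketch of how a reader might interpret such tagged records, the following Python snippet looks up each integer tag in the metadata and pairs the field names with the values. The function name decode_tagged is hypothetical, and plain Python lists stand in for the UBJSON containers; this is only an illustration of the lookup, not a proposed implementation.

```python
# Metadata from the example above: index in this list == integer tag.
METADATA = [
    ["Time", ["year", "month", "day", "hour", "minute"]],
    ["Place", ["longitude", "latitude"]],
    ["Appointment", ["start", "end", "place"]],
]


def decode_tagged(record, metadata=METADATA):
    """Turn [tag, values] into (type_name, {field: value})."""
    tag, values = record
    type_name, fields = metadata[tag]
    if len(values) != len(fields):
        raise ValueError("%s expects %d values, got %d"
                         % (type_name, len(fields), len(values)))
    return type_name, dict(zip(fields, values))


DATA = [
    [0, [2020, 5, 13, 2, 1]],
    [1, [34.234, 21.342342]],
    [0, [2001, 5, 1, 16, 30]],
]

decoded = [decode_tagged(record) for record in DATA]
for type_name, fields in decoded:
    print(type_name, fields)
```

The field names are stored once in the metadata, so each record only carries a small integer and its values.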

In another example using heterogeneous data, object nesting is achieved using a new UBJSON tag "t". This tag would be followed by the integer tag or index of the class or type, then by its contents.

UBJSON numeric type tags are omitted for readability:

An "Appointment" entry would look like this:

[[]
[t] [2]
	[t] [0] [2020, 05, 13, 2, 1]
	[t] [0] [2020, 05, 13, 2, 31]
	[t] [1] [20.555555, 30.777777]
(...)
[]]
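A recursive reader for this nested form could look roughly like the sketch below. Since "t" is only a proposed tag, it is modelled here as a Python tuple ("t", type_index, contents); that tuple encoding and the name decode_nested are assumptions made for illustration.

```python
METADATA = [
    ["Time", ["year", "month", "day", "hour", "minute"]],
    ["Place", ["longitude", "latitude"]],
    ["Appointment", ["start", "end", "place"]],
]


def decode_nested(node, metadata=METADATA):
    """Recursively expand ("t", type_index, contents) into a dict of fields."""
    if isinstance(node, tuple) and len(node) == 3 and node[0] == "t":
        _, type_index, contents = node
        type_name, fields = metadata[type_index]
        if len(contents) != len(fields):
            raise ValueError("wrong number of values for " + type_name)
        return {field: decode_nested(value, metadata)
                for field, value in zip(fields, contents)}
    return node  # plain scalar value


# The "Appointment" entry shown above, with "t" modelled as a tuple.
appointment = ("t", 2, [
    ("t", 0, [2020, 5, 13, 2, 1]),
    ("t", 0, [2020, 5, 13, 2, 31]),
    ("t", 1, [20.555555, 30.777777]),
])

result = decode_nested(appointment)
print(result["start"], result["end"], result["place"])
```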

Example of linear homogeneous data in UBJ_MIXED arrays:

[
  [
    [2020, 05, 13, 2, 1],
    [2020, 05, 13, 2, 31],
    [2001, 05, 01, 16, 30],
    [1970, 01, 01, 01, 01],
  ],
  [
    [34.234, 21.342342],
    [74.234, -5.342342],
    [20.555555, 30.777777],
  ]
]
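In this homogeneous layout the position of each inner array plays the role of the tag. A minimal sketch of a reader, with the hypothetical name decode_homogeneous and Python lists standing in for the UBJSON arrays, might be:

```python
METADATA = [
    ["Time", ["year", "month", "day", "hour", "minute"]],
    ["Place", ["longitude", "latitude"]],
    ["Appointment", ["start", "end", "place"]],
]

# Outer index == metadata index; each inner list holds instances of one type.
TABLES = [
    [
        [2020, 5, 13, 2, 1],
        [2020, 5, 13, 2, 31],
        [2001, 5, 1, 16, 30],
        [1970, 1, 1, 1, 1],
    ],
    [
        [34.234, 21.342342],
        [74.234, -5.342342],
        [20.555555, 30.777777],
    ],
]


def decode_homogeneous(tables, metadata):
    """Map each table to {type_name: [ {field: value}, ... ]} by position."""
    decoded = {}
    for type_index, rows in enumerate(tables):
        type_name, fields = metadata[type_index]
        decoded[type_name] = [dict(zip(fields, row)) for row in rows]
    return decoded


by_type = decode_homogeneous(TABLES, METADATA)
print(by_type["Place"][2])
```

Because every row in a table has the same shape, the scalar tables are also natural candidates for packed ndarrays, as mentioned earlier.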

A similar nested representation can be achieved with index-based references, using another new UBJSON tag, this time "R" for reference.

This representation is useful when there are many repetitions.

Each R tag will be followed by a type tag or index followed by the index of the instance being referenced.

Of course, this representation is much less stream-friendly.

Again, other UBJSON types are omitted for simplicity:

[
  [
    [2020, 05, 13, 2, 1],
    [2020, 05, 13, 2, 31],
    [2001, 05, 01, 16, 30],
    [1970, 01, 01, 01, 01],
  ],
  [
    [34.234, 21.342342],
    [74.234, -5.342342],
    [20.555555, 30.777777],
  ],
  [
    [[R] [0] [0]
     [R] [0] [1]
     [R] [1] [2]
    ]
  ]
]

This example serializes the same Appointment instance data presented before.
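Resolving such references could be sketched as follows. The proposed "R" tag is modelled as a Python tuple ("R", type_index, instance_index), and the names resolve and build are hypothetical; the sketch assumes the references contain no cycles.

```python
METADATA = [
    ["Time", ["year", "month", "day", "hour", "minute"]],
    ["Place", ["longitude", "latitude"]],
    ["Appointment", ["start", "end", "place"]],
]

TABLES = [
    [  # type 0: Time
        [2020, 5, 13, 2, 1],
        [2020, 5, 13, 2, 31],
        [2001, 5, 1, 16, 30],
        [1970, 1, 1, 1, 1],
    ],
    [  # type 1: Place
        [34.234, 21.342342],
        [74.234, -5.342342],
        [20.555555, 30.777777],
    ],
    [  # type 2: Appointment, stored as references into the tables above
        [("R", 0, 0), ("R", 0, 1), ("R", 1, 2)],
    ],
]


def resolve(value, tables, metadata):
    """Follow an ("R", type_index, instance_index) reference, else pass through."""
    if isinstance(value, tuple) and len(value) == 3 and value[0] == "R":
        _, type_index, instance_index = value
        return build(type_index, instance_index, tables, metadata)
    return value


def build(type_index, instance_index, tables, metadata):
    """Materialize one instance as a dict, resolving nested references."""
    _type_name, fields = metadata[type_index]
    row = tables[type_index][instance_index]
    return {field: resolve(value, tables, metadata)
            for field, value in zip(fields, row)}


appointment = build(2, 0, TABLES, METADATA)
print(appointment)
```

A reader needs the referenced tables before it can resolve anything, which is the reason this form is less stream-friendly than the nested one.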

That's it. Thanks for reading this long issue.
