jq-bencode

Bencode encoder/decoder module for jq

Description

This is a module for jq whose purpose is to provide ways to convert JSON structures into Bencode-d strings and the other way around.

The conversion is lossy because Bencode, unlike JSON, has no notion of null or boolean values. Null values are converted to empty strings, and boolean values to the integers 0 or 1.
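The effect of this mapping can be previewed with plain jq, without the module. The following is only a sketch: the helper name normalise_for_bencode is hypothetical, the choice of false as 0 and true as 1 is an assumption, and walk/1 needs jq 1.6 or later.

```shell
# Hypothetical helper, not part of jq-bencode: rewrites a JSON document the
# way the lossy conversion is described (null -> "", booleans -> 0 or 1).
normalise_for_bencode() {
  jq --compact-output '
    walk(
      if . == null then ""
      elif type == "boolean" then (if . then 1 else 0 end)  # assumed: true is 1
      else . end
    )
  '
}

echo '[null, true, false, {"k": null}]' | normalise_for_bencode
# prints ["",1,0,{"k":""}]
```

After this step a decoder cannot tell an original empty string from a null, or an original 1 from true, which is why a round trip is not guaranteed.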

Interface

The module provides three pairs of filters for external use, each pair consisting of a Bencode encoder and a decoder. There is more than one pair because of the author's original motivation for writing the module: processing the serialised metadata that Zimbra, a Collaboration Suite, uses internally. That metadata is serialised with Bencode, but the string length is computed as the number of UTF-16 code units used to represent the string rather than the number of bytes the string encodes into (the main component of the software is Java-based). Behind the scenes, every pair calls the same internal encoding/decoding filters and passes them, as an argument, another filter that computes the length of a single character according to the intended scheme.
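The idea of passing a per-character length filter as an argument can be sketched with jq builtins alone. The names strlen, utf8_bytes, codepoints and utf16_units below are hypothetical and are not the module's internal filters; the sketch only shows how one summing filter can serve all three length schemes.

```shell
# Sketch only: illustrates the "length filter as argument" pattern.
string_lengths() {
  jq --null-input --compact-output '
    # Sum a per-character length over every codepoint of the input string.
    def strlen(charlen): [explode[] | charlen] | add // 0;

    # Per-character length filters; each receives one codepoint as a number.
    def utf8_bytes:  if . < 128 then 1 elif . < 2048 then 2 elif . < 65536 then 3 else 4 end;
    def codepoints:  1;
    def utf16_units: if . > 65535 then 2 else 1 end;

    "a😃" | [strlen(utf8_bytes), strlen(codepoints), strlen(utf16_units)]
  '
}

string_lengths
# prints [5,2,3]
```

The three numbers correspond to the length prefixes that the standard, character-count and UTF-16 variants would be expected to write for the same string.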

Standard Bencode

String length as the count of bytes resulting from its encoding via UTF-8

bencode/0: JSON to Bencode conversion
bdecode/0: Bencode to JSON conversion

String length as character count

String length as the count of Unicode codepoints in the string, regardless of how many bytes each codepoint takes in UTF-8.

strbencode/0: JSON to Bencode conversion
strbdecode/0: Bencode to JSON conversion

String length as count of UTF-16 code units needed to represent the data

String length as the count of UTF-16 code units needed to encode the string. Each character outside the Basic Multilingual Plane (BMP) needs two UTF-16 code units, a surrogate pair, to encode.

u16strbencode/0: JSON to Bencode conversion
u16strbdecode/0: Bencode to JSON conversion
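To make the two-code-unit case concrete, the surrogate pair for a codepoint above U+FFFF can be worked out with a little arithmetic. This is a standalone sketch using only jq builtins; surrogate_pair is a hypothetical helper name, and the values are printed in decimal because jq has no hexadecimal output format.

```shell
# Hypothetical helper: prints the UTF-16 surrogate pair (in decimal) for a
# codepoint above U+FFFF, passed as a decimal number.
surrogate_pair() {
  jq --null-input --compact-output --argjson cp "$1" '
    ($cp - 65536) as $v                # offset above the BMP (0x10000)
    | [ 55296 + ($v / 1024 | floor),   # high surrogate: 0xD800 + top 10 bits
        56320 + ($v % 1024) ]          # low surrogate: 0xDC00 + low 10 bits
  '
}

surrogate_pair 128515   # U+1F603
# prints [55357,56835], that is D83D DE03 in hexadecimal
```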

Implementation notes

The encoder and decoder rely on jq's streaming form, which represents a data structure as a sequence of path expressions, some paired with a leaf value and some not. The internal _bencode filter takes the streaming form of the JSON input and processes it with reduce to generate the Bencode-d output. The internal _bdecode filter instead processes the Bencode-d string character by character with reduce, generating a streaming form that fromstream converts into a JSON data structure at the end.
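For readers unfamiliar with the streaming form, the jq builtins tostream and fromstream show what it looks like. This sketch does not use the module; stream_form and round_trip are hypothetical helper names.

```shell
# Show the streaming form of a small document: events with a value carry a
# leaf, events without one close the array or object at that path.
stream_form() {
  jq --null-input --compact-output '{"a":[1,{"b":2}]} | [tostream]'
}

# Rebuild the original structure from its own streaming form.
round_trip() {
  jq --null-input --compact-output '{"a":[1,{"b":2}]} | fromstream(tostream)'
}

stream_form
# prints [[["a",0],1],[["a",1,"b"],2],[["a",1,"b"]],[["a",1]],[["a"]]]
round_trip
# prints {"a":[1,{"b":2}]}
```

The closing events (the ones without a value) are what lets a reduce-based encoder know where to emit the terminating "e" of a Bencode list or dictionary.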

Examples

Copy bencode.jq into one of the directories of the jq modules search path (see jq's documentation), or use jq's -L option to point at the directory containing the module file:

$ jq --null-input -L ~/jq-bencode '
import "bencode" as bencdec;

["a",[],{},[{}],{"a":[{"b":[2]}],"f":""},1,{"c":{"d":[]}},"a"] | bencdec::bencode
'

"l1:aledeldeed1:ald1:bli2eeee1:f0:ei1ed1:cd1:dleee1:ae"


$ jq --null-input -L ~/jq-bencode '
import "bencode" as bencdec;

["a",[],{},[{}],{"a":[{"b":[2]}],"f":""},1,{"c":{"d":[]}},"a"] | bencdec::bencode | bencdec::bdecode
'
["a",[],{},[{}],{"a":[{"b":[2]}],"f":""},1,{"c":{"d":[]}},"a"]

# In the case above we can achieve round-trip encoding/decoding since there are no null or boolean values in the input.


# Let's see the different string length implementations below using an emoji
# 😃 encodes to F0 9F 98 83, a 4-byte sequence using UTF-8; it's 1 codepoint (U+1F603) but it requires 2 UTF-16 code units (D83D DE03) to be represented.

$ jq --null-input -L ~/jq-bencode '
import "bencode" as bencdec;

[ "😃" ] | bencdec::bencode
'
"l4:😃e"


$ jq --null-input -L ~/jq-bencode '
import "bencode" as bencdec;

[ "😃" ] | bencdec::strbencode
'
"l1:😃e"

$ jq --null-input -L ~/jq-bencode '
import "bencode" as bencdec;

[ "😃" ] | bencdec::u16strbencode
'
"l2:😃e"

Limitations

Bencode does not specify the encoding used in its string data. This implementation takes for granted the use of UTF-8 for string encoding, since that's the only legal encoding for JSON string data. Problems may arise if, for instance, byte strings within a Bencode-d string are the end result of a different encoding.
