jlusiardi / tlv8_python

Python module to handle type-length-value (TLV) encoded data (8-bit type, 8-bit length, and N-byte value) as described in the Apple HomeKit Accessory Protocol Specification Non-Commercial Version Release R2.

License: Apache License 2.0

tlv8_python's Introduction

Type-Length-Value8 (TLV8) for python

Type-Length-Value (TLV) structures are used to encode arbitrary data. In this case the type and the length are represented by 1 byte each, hence the name TLV8.

A TLV8 entry consists of the following parts:

  • the type: this 8 bit field denotes the type of information that is represented by the data.
  • the length: this 8 bit field denotes the length of the data (this does not include the 2 bytes for type and length). For data longer than 255 bytes, there is a defined procedure available.
  • the value: these length bytes represent the value of this TLV. The different types of data are represented differently:
    • bytes: this is raw binary data and will be used as is, no further interpretation takes place
    • tlv8: this is a specialized case of bytes values. Using this instead of pure bytes enables nesting of data and creating a hierarchy.
    • integer: integers are stored in little-endian byte order and are encoded with the minimal number of bytes possible (1, 2, 4 or 8)
    • float: floats are stored as little-endian ieee754 numbers
    • string: strings are always UTF-8 encoded and do not contain the terminating NULL byte
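
The value rules above can be illustrated with the standard library alone. This is only a sketch and not the module's implementation: encode_value is a hypothetical helper, it assumes that integers are non-negative and packed as unsigned, and it assumes that floats use the 4-byte IEEE 754 single-precision format.

```python
import struct


def encode_value(value):
    """Hypothetical helper: turn a Python value into TLV8 value bytes."""
    if value is None:
        return b''
    if isinstance(value, (bytes, bytearray)):
        # Raw binary data is used as is.
        return bytes(value)
    if isinstance(value, int):
        # Assumption: only non-negative integers, packed as unsigned.
        if value < 0:
            raise ValueError('negative integers are not covered by this sketch')
        for width in (1, 2, 4, 8):
            if value < 1 << (8 * width):
                # Little-endian, using the smallest of the allowed widths.
                return value.to_bytes(width, 'little')
        raise ValueError('integer does not fit into 8 bytes')
    if isinstance(value, float):
        # Assumption: 4-byte single precision, little-endian.
        return struct.pack('<f', value)
    if isinstance(value, str):
        # UTF-8 without a terminating NULL byte.
        return value.encode('utf-8')
    raise ValueError('unsupported value type')


print(encode_value(1024))     # b'\x00\x04'
print(encode_value('Hello'))  # b'Hello'
```

Because the float is stored in single precision, reading it back yields the nearest 32-bit value rather than the exact Python float.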

TLV8 entries whose content is longer than 255 bytes are split up into fragments. The type is repeated in each fragment, and only the last fragment may contain fewer than 255 bytes. The fragments of one TLV8 entry must be contiguous.
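
A minimal sketch of the fragmentation rule, using only built-in types. The helper name encode_entry is hypothetical and the code is an illustration of the rule as described here, not the module's own code.

```python
def encode_entry(type_id, payload):
    """Hypothetical helper: emit one TLV8 entry, fragmenting long payloads."""
    if not 0 <= type_id <= 255:
        raise ValueError('type id must fit into 8 bits')
    if not payload:
        # An empty entry still needs its type and a zero length.
        return bytes([type_id, 0])
    out = bytearray()
    for start in range(0, len(payload), 255):
        chunk = payload[start:start + 255]
        # The type is repeated in front of every fragment.
        out += bytes([type_id, len(chunk)]) + chunk
    return bytes(out)


encoded = encode_entry(6, bytes(range(256)))
print(len(encoded))  # 260: two headers of 2 bytes each plus 256 payload bytes
```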

Multiple TLV8 entries can be combined to create larger structures. Entries of different types can be placed one after another. Entries of the same type must be separated by a TLV8 entry of a different type (typically of zero length).
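
The separator rule can be sketched as follows. join_entries is a hypothetical helper that works on already encoded values of at most 255 bytes. The default separator type 0xff is the one this document names for the encode function.

```python
def join_entries(entries, separator_type_id=0xFF):
    """Hypothetical helper: concatenate (type_id, value_bytes) pairs.

    A zero-length separator entry is inserted between two neighbouring
    entries that share the same type id.
    """
    out = bytearray()
    previous_type = None
    for type_id, value in entries:
        if type_id == separator_type_id:
            raise ValueError('separator type id must not be used by an entry')
        if len(value) > 255:
            raise ValueError('this sketch does not fragment long values')
        if type_id == previous_type:
            out += bytes([separator_type_id, 0])
        out += bytes([type_id, len(value)]) + value
        previous_type = type_id
    return bytes(out)


print(join_entries([(1, b'\x7b'), (2, b'Hello')]))  # b'\x01\x01{\x02\x05Hello'
```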

TLV8 entries of unknown or unwanted type are to be silently ignored.
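
Reading the data back follows the same rules in reverse. The sketch below uses the hypothetical helpers parse and keep_wanted. It assumes that directly adjacent entries of the same type are fragments of one entry, as the rules above imply. It is not the module's decoder.

```python
def parse(data):
    """Hypothetical helper: split TLV8 bytes into (type_id, value) pairs."""
    pairs = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError('truncated header')
        type_id, length = data[pos], data[pos + 1]
        value = bytes(data[pos + 2:pos + 2 + length])
        if len(value) < length:
            raise ValueError('not enough data left')
        if pairs and pairs[-1][0] == type_id:
            # Same type directly after the previous entry: treat as fragment.
            pairs[-1] = (type_id, pairs[-1][1] + value)
        else:
            pairs.append((type_id, value))
        pos += 2 + length
    return pairs


def keep_wanted(pairs, wanted_types):
    """Silently drop entries whose type id is not wanted."""
    return [(t, v) for t, v in pairs if t in wanted_types]


pairs = parse(b'\x01\x01\x01\xff\x00\x01\x01\x02')
print(keep_wanted(pairs, {1}))  # [(1, b'\x01'), (1, b'\x02')]
```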

Examples

simple TLV8s

Encoding of some atomic examples:

  • an empty TLV of type 42: [42, None] will be encoded as b'\x2a\x00'.
  • a TLV of type 2 with 2 bytes 0x12, 0x34: [2, b'\x12\x34'] will be encoded as b'\x02\x02\x12\x34'
  • a TLV of type 3 that contains the TLV from above: [3, [2, b'\x12\x34']] will be encoded as b'\x03\x04\x02\x02\x12\x34'
  • a TLV of type 4 that contains 1024: [4, 1024] will be encoded as b'\x04\x02\x00\x04'
  • a TLV of type 5 that contains 3.141: [5, 3.141] will be encoded as b'\x05\x04\x25\x06\x49\x40'
  • a TLV of type 23 with string Hello 🌍: [23, 'Hello 🌍'] will be encoded as b'\x17\x0a\x48\x65\x6c\x6c\x6f\x20\xf0\x9f\x8c\x8d'

fragmented TLV8s

Encoding of a fragmented TLV8 entry:

  • a TLV of type 6 that contains 256 bytes from 0 to 255: [6, b'\x00\x01...\xfe\xff'] will be encoded as b'\x06\xff\x00...\xfe\x06\x01\xff'

combined TLV8s

Encoding of two TLV8 Entries that follow each other in the input list:

  • the combination of 2 TLV8 entries ([1, 123] and [2, 'Hello']) will be encoded as b'\x01\x01\x7b\x02\x05\x48\x65\x6c\x6c\x6f'

sequences of TLV8s of same type:

  • a sequence of 3 TLV8 entries of type 1 ([1, 1], [1, 2] and [1, 3]) will be encoded as b'\x01\x01\x01\xff\x00\x01\x01\x02\xff\x00\x01\x01\x03'

Using in code

There are two main use cases of this module.

Create a bytes representation

Here we want to have a comfortable way to create a data structure in python and to encode this structure into a bytes value.

encode a simple list

For example, create a representation containing the following structure:

  • Type: 1, Value: 23
  • Type: 2, Value: 2345

This can be coded like this:

import tlv8

structure = [
    tlv8.Entry(1, 23),
    tlv8.Entry(2, 2345)
]
bytes_data = tlv8.encode(structure)
print(bytes_data)

And this will result in: b'\x01\x01\x17\x02\x02)\t'

Nesting structures

A line ([x: 10, y: 10] - [x: 30, y: 40]) between two points could be represented like this:

  • Type: 1, Value:
    • Type: 3, Value: 10
    • Type: 4, Value: 10
  • Type: 2, Value:
    • Type: 3, Value: 30
    • Type: 4, Value: 40

import tlv8

structure = [
    tlv8.Entry(1, [
        tlv8.Entry(3, 10),
        tlv8.Entry(4, 10),
    ]),
    tlv8.Entry(2, [
        tlv8.Entry(3, 30),
        tlv8.Entry(4, 40),
    ])
]
bytes_data = tlv8.encode(structure)
print(bytes_data)

And this will result in: b'\x01\x06\x03\x01\n\x04\x01\n\x02\x06\x03\x01\x1e\x04\x01('

Decode a bytes representation

Decoding TLV8 entries from bytes data will return all bytes from all first level entries. This includes possible separator entries between entries of the same type.

Decoding can be assisted by hinting with an expected structure. To represent the structure in Python, dict objects are used and nested. The keys of the dict objects are the type ids of the TLV8 entries. If the id of an entry is not contained in the structure, it will be ignored.
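
How a nested dict can drive the interpretation is sketched below with the standard library only. The helpers parse and interpret are hypothetical, and the strings 'integer', 'string' and 'bytes' are stand-ins for the module's data type markers. Integers are assumed to be unsigned. The module's own decode function may work differently.

```python
def parse(data):
    """Hypothetical helper: split TLV8 bytes into (type_id, value) pairs."""
    pairs = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError('truncated header')
        type_id, length = data[pos], data[pos + 1]
        value = bytes(data[pos + 2:pos + 2 + length])
        if len(value) < length:
            raise ValueError('not enough data left')
        pairs.append((type_id, value))
        pos += 2 + length
    return pairs


def interpret(data, expected):
    """Hypothetical helper: decode data according to a nested dict."""
    result = []
    for type_id, raw in parse(data):
        if type_id not in expected:
            # Ids missing from the expected structure are ignored.
            continue
        kind = expected[type_id]
        if isinstance(kind, dict):
            # A nested dict means the value is itself TLV8 data.
            value = interpret(raw, kind)
        elif kind == 'integer':
            value = int.from_bytes(raw, 'little')
        elif kind == 'string':
            value = raw.decode('utf-8')
        else:
            value = raw
        result.append((type_id, value))
    return result


point = {3: 'integer', 4: 'integer'}
print(interpret(b'\x01\x06\x03\x01\n\x04\x01\n', {1: point}))
# [(1, [(3, 10), (4, 10)])]
```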

decode the simple list

import tlv8

in_data = b'\x01\x01\x17\x02\x02)\t'
expected_structure = {
    1: tlv8.DataType.INTEGER,
    2: tlv8.DataType.INTEGER
}
result = tlv8.decode(in_data, expected_structure)

print(tlv8.format_string(result))

This will result in:

[
  <1, 23>,
  <2, 2345>,
]

decode nested data

import tlv8

in_data = b'\x01\x06\x03\x01\n\x04\x01\n\x02\x06\x03\x01\x1e\x04\x01('
sub_struct = {
    3: tlv8.DataType.INTEGER,
    4: tlv8.DataType.INTEGER
}
expected_structure = {
    1: sub_struct,
    2: sub_struct
}
result = tlv8.decode(in_data, expected_structure)

print(tlv8.format_string(result))

This will result in:

[
  <1, [
    <3, 10>,
    <4, 10>,
  ]>,
  <2, [
    <3, 30>,
    <4, 40>,
  ]>,
]

Using IntEnum data during encoding and decoding

Using enumerations might increase the readability of the encode and decode processes.

During encoding

It is possible to use enum.IntEnum for encoding:

import tlv8
import enum

class Keys(enum.IntEnum):
    X = 42
    # ...

class Values(enum.IntEnum):
    Y = 23
    # ...

result = tlv8.encode([
    tlv8.Entry(Keys.X, Values.Y)
])

print(result)

This will result in:

b'*\x01\x17'

During decoding

As during encoding, enum.IntEnum can be used for keys and values during decoding:

import tlv8
import enum

class Keys(enum.IntEnum):
    X = 42
    # ...

class Values(enum.IntEnum):
    Y = 23
    # ...

result = tlv8.decode(b'*\x01\x17', {
    Keys.X: Values
})

print(tlv8.format_string(result))
print(type(result[0].type_id), type(result[0].data))

This will result in:

[
  <Keys.X, Values.Y>,
]
<enum 'Keys'> <enum 'Values'>

So the type_id and data fields are no longer plain int instances but members of their enumerations. This also helps when using format_string to get easier-to-read output.

Coding

The module offers the following primary functions and classes.

function format_string

This function formats a list of TLV8 Entry objects as str. The hierarchy of the entries will be represented by increasing the indentation of the output.

The parameters are:

  • entries: a python list of tlv8.Entry objects
  • indent: the level of indentation to be used, this defaults to 0 and is increased on recursive calls for nested entries.

The function returns a str instance and raises ValueError instances if the input is not a list of tlv8.Entry objects.

Example:

import tlv8

data = [
    tlv8.Entry(1, 3.141),
    tlv8.Entry(2, [
        tlv8.Entry(3, 'hello'),
        tlv8.Entry(4, 'world'),
    ]),
    tlv8.Entry(1, 2)
]
print(tlv8.format_string(data))

This will become:

[
  <1, 3.141>,
  <2, [
    <3, hello>,
    <4, world>,
  ]>,
  <1, 2>,
]

function encode

Function to encode a list of tlv8.Entry objects into a sequence of bytes following the rules for creating TLVs. The separator_type_id is used for the separating entries between two entries of the same type.

The parameters are:

  • entries: a list of tlv8.Entry objects
  • separator_type_id: the 8-bit type id of the separator to be used. The default is (as defined in table 5-6, page 51 of HomeKit Accessory Protocol Specification Non-Commercial Version Release R2) 0xff.

The function returns an instance of bytes. This is empty if nothing was encoded. The function raises ValueError if the input parameter is not a list of tlv8.Entry objects or a data value is not encodable. A ValueError will also be raised if the separator_type_id is used as type_id in one of the entries as well.

Example:

import tlv8

data = [
    tlv8.Entry(1, 3.141),
    tlv8.Entry(2, [
        tlv8.Entry(3, 'hello'),
        tlv8.Entry(4, 'world')
    ]),
    tlv8.Entry(1, 2)
]
print(tlv8.encode(data))

This will result in:

b'\x01\x04%\x06I@\x02\x0e\x03\x05hello\x04\x05world\x01\x01\x02'

function decode

Function to decode a bytes or bytearray instance into a list of tlv8.Entry instances. This reverses the process done by the encode function.

The parameters are:

  • data: a bytes or bytearray instance to be parsed
  • expected: a dict that maps type ids onto the expected tlv8.DataType values. If the expected entry is again a tlv8.Entry that should be parsed, use another dict to describe the hierarchical structure. This defaults to None, which means that no filtering is performed but also that no interpretation of the entries is done, so they will be returned as bytes sequences.
  • strict_mode: This defaults to False. If set to True, this will raise additional ValueError instances if there are possible missing separators between entries of the same type.

The function returns a list instance and raises ValueError instances if the input is either not a bytes object or an invalid tlv8 structure.

Example:

import tlv8

data = b'\x01\x04%\x06I@\x02\x0e\x03\x05hello\x04\x05world\x03\x01\x02'

structure = {
    1: tlv8.DataType.FLOAT,
    2: {
        3: tlv8.DataType.STRING,
        4: tlv8.DataType.STRING
    },
    3: tlv8.DataType.INTEGER
}

print(tlv8.decode(data, structure))

This will result in:

[
  <1, 3.1410000324249268>,
  <2, [
    <3, hello>,
    <4, world>,
  ]>,
  <3, 2>,
]

function deep_decode

This function works like the decode function but tries to do it recursively. That means it decodes the first level of a TLV8 structure first, then looks at each entry and tries to decode that as well. This is mostly meant for debugging purposes in combination with format_string.

Example:

import tlv8

data = b'\x01\x01\x23\x02\x03\x04\x01\x42\x01\x01\x23'
print(tlv8.deep_decode(data))

This will result in:

[
  <1, b'#'>,
  <2, [
    <4, b'B'>,
  ]>,
  <1, b'#'>,
]

Notice:

This function might misinterpret data as TLV8 data. For example:

import tlv8

data = tlv8.encode([
    tlv8.Entry(1, 16843330),
    tlv8.Entry(2, b'\x01')
])

# here data is b'\x01\x04B\x02\x01\x01\x02\x01\x01'

print(tlv8.deep_decode(data))

This will result in a misinterpretation of the entry with ID 1:

[
  <1, [
    <66, b'\x01\x01'>,
  ]>,
  <2, b'\x01'>,
]
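
Why this happens can be checked with the standard library alone. The sketch below only inspects the four value bytes of the integer 16843330. The variable names are illustrative.

```python
# The little-endian bytes of the integer used in the example above.
raw = (16843330).to_bytes(4, 'little')
print(raw)  # b'B\x02\x01\x01'

# Read the first two bytes as if they were a TLV8 header.
candidate_type, candidate_length = raw[0], raw[1]

# The "length" byte happens to match the number of remaining bytes,
# so the value is indistinguishable from a nested TLV8 entry.
looks_like_tlv = candidate_length == len(raw) - 2
print(candidate_type, candidate_length, looks_like_tlv)  # 66 2 True
```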

class DataType

This enumeration is used to represent the data type of a tlv8.Entry.

Enumeration Entry | TLV8 type | Python type
BYTES | bytes | bytes, also bytearray for encoding
TLV8 | tlv8 | custom class tlv8.Entry for encoding and dict for the expected structure during decoding
INTEGER | integer | int
FLOAT | float | float
STRING | string | str
AUTODETECT | n/a | used to declare that a data type is not preset but will be determined by the python type of the data

class Entry

This class represents a single entry in a TLV8 data set. The class overrides the methods __eq__, __str__ and __repr__ to fit the needs of the application.

constructor

The constructor takes the following parameters:

  • type_id: the type id of the entry. Must be between 0 and 255 (8-bit type id).
  • data: the data to be stored in this entry.
  • data_type: the data type of the entry. Defaults to DataType.AUTODETECT.
  • length: if set, this overrides the automatic length detection. This is used for integers when there is a special need to set a higher byte count than the value would need.

The constructor raises a ValueError if the type_id is not within the 8-bit range.
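
What a length override means for an integer on the wire can be sketched with plain Python. encode_int_entry is a hypothetical helper that assumes non-negative integers. It illustrates the behaviour the length parameter is described as having and is not the class's implementation.

```python
def encode_int_entry(type_id, value, length=None):
    """Hypothetical helper: encode one integer entry, optionally padded."""
    if not 0 <= type_id <= 255:
        raise ValueError('type id must fit into 8 bits')
    if value < 0:
        raise ValueError('negative integers are not covered by this sketch')
    if length is None:
        # Automatic detection: smallest of the allowed widths that fits.
        for width in (1, 2, 4, 8):
            if value < 1 << (8 * width):
                length = width
                break
        else:
            raise ValueError('integer does not fit into 8 bytes')
    # Little-endian, so the padding bytes follow the significant ones.
    payload = value.to_bytes(length, 'little')
    return bytes([type_id, length]) + payload


print(encode_int_entry(42, 1).hex())            # 2a0101
print(encode_int_entry(42, 1, length=8).hex())  # 2a080100000000000000
```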

encode() -> bytes

This function is called to encode the data stored in this Entry. The data type of the data will be used to decide how to encode the data. It uses the tlv8.encode() function to encode nested lists of tlv8.Entry objects.

format_string() -> str

This function formats the data stored in this entry as readable string. It is mostly called by tlv8.format_string().

class EntryList

This class represents a list of entries. The class overrides the methods __repr__, __eq__, __len__, __getitem__ and __iter__ to fit the needs of the application.

constructor

The constructor takes the following parameters:

  • data: if set, this list of tlv8.Entry instances is used to initialize the EntryList.

The constructor raises a ValueError if the data is either not a list or not a list of tlv8.Entry instances.

append(entry)

Append the tlv8.Entry to the EntryList. It performs type checks, so only tlv8.Entry instances can be appended.

assert_has(type_id, message)

Looks for a tlv8.Entry instance with type_id in the first level of the EntryList. If none is found, it raises an AssertionError with the given message. This does not iterate recursively, because the same type id may have different meanings on different levels (and in different contexts).

encode(self, separator_type_id)

Encodes the EntryList using the given separator type id. This relies on tlv8.encode().

by_id(type_id)

Filters the EntryList and returns only the Entry instances whose type_id matches the given one. If no Entry instances are found, it returns an empty list.

first_by_id(type_id)

Search the EntryList for the first Entry with the given type_id. If no such Entry was found, it returns None.

tlv8_python's People

Contributors

amanaplan, jlusiardi, kvaellning

tlv8_python's Issues

Tests are not contained in sdist tarball on pypi.org

Hi! I package this project for Arch Linux.
Like many other downstreams we rely on the sdist tarball on pypi.org and we run a project's test suite to ensure its integration with our Python ecosystem.
Unfortunately the sdist tarball on pypi.org does not contain the tests for this project. Could you add them? :)

enable encoding of bytearray data

For the use in homekit_python it would be of much use to be able to use bytearray instances as data values in tlv8.Entry objects.

Simplify the access to the decoded structure

Think about a way to simplify the handling of this pattern:

    d1 = TLV.decode_bytes(decrypted)
    d1 = TLV.reorder(d1, [TLV.kTLVType_Identifier, TLV.kTLVType_Signature])
    assert d1[0][0] == TLV.kTLVType_Identifier, 'get_session_keys: no identifier'
    assert d1[1][0] == TLV.kTLVType_Signature, 'get_session_keys: no signature'

    # 5) look up pairing by accessory name
    accessory_name = d1[0][1].decode()

Oversized TLV8 might be parsed properly.

import binascii

import tlv8

data = '01ff0601040440371bf8c6cd6a7fb85e77a7b436a1b5300ed2023ca7e23f61303856a57358fd8ac03f288472776854765eb3' \
       'a2fcb4497c8b5497c29c88a574479030ec36b176ae05ff85c337879f0a4146a9cc089e0cb48ba3c9c21af0b206d493b224de' \
       'ee52ff0f9b1d8710db6531748ee6d1d66b8b4a6d0690670fb8f1233010d190c4ede1776cb10806eae66c78881647e82e9ba7' \
       'c806f52184c6f108275719cf425c4f8ea0e86c6534712343f88a1482c986e3dd252715872dee506520903c17d27f02ea8957' \
       '719c255631b78a9f2ecb7af0dc245b370cefef28f4652eebbe34afda0138039714665dd880559d1f2667294207892137820c' \
       'd80533d8c0b22601ffa49d1bdc1b641a33297fe59672a89d69391417c77e31283cd7f0d40920004d1bf1fc38357d9599ac2b' \
       '4d8ce3ac7ab8725a01500d198e94b00da80aac64ead393b266dcf9d4a07c05ff34548f7ebebd63f8a00ae2c82f6ee8ac6bcc' \
       'e0ab1030e9268c36714e2ec11c3bf21331129d62978e069dd087cbbdc31bdd6e0cf4ca825b91ab3c8b240de19aa097fc01cd' \
       '471e8c1b5598044d21be12b84c97a1d70e46681e5ecebbde1c33bae9bbd9b3ad41ba2aff8f1f952d0ef0cfcb8a674d5b4c7f' \
       '515ba94341334e86aac277920bd9080b9bf702e16671a3e41c0930beb8a552aefc28a3a9a7f2818e8fbb84c37ae10fb6c5d2' \
       '2e6ba9899e01f082381c9a3344ecbaf801ff85e33306ec9823e72ec4c93f9a45aa657b16f46757aaaf7c74daf35840e68749' \
       '42c132f4a639562920318b9f9867d8e5b0d50deac48c4e14842c91d565b0dd1fec667d092d123a4e1a05fff0b7070f184b4f' \
       '399532c0d1cc0aaea326efea765bc88ace048040a3e07a741e26ef55203bb3f76c075e3b6d20ede89a2eaa63e23376b0dff4' \
       'ef3a797df34a39d8130f60316e86cafcb264b0b570376ba911dc2bda031328ea1c915a724bc69ad2700623ed19c3ae75f946' \
       '4e3b1669adb916ad58e3252580911db2b535af3f2b2207aea880a0d24f1f759888bd5e25b6cf7b2e5ce825ab0fd943a8378a' \
       'eb12906e0965540af14dc0bfa3fae2eb5361992efaf56501ff6ed5c6e686a4af3c2fa121c810cb84cd5abac94f6d618af493' \
       '29e34fa613d2b758e3bc79eb03cb78328f9cd34df43566589615e42088681b4f69775350c9abf68c107d312b0f2421bb53cf' \
       '05ff50c14d6ff0da74b50d9b080c5c06175d66b24b35eb1e25f940170c0815a0ead23703b86da2103cd1b33021fd981d95c6' \
       'a32a3752dc903b0acba949d7d51a1bcabaebc52941bb25d558132feb1794481c0a5911e53553407a8771503d7673d4c3061a' \
       '4d2d41a2897fe507423509760fbe4847423a51155b99b67bf43c72958ce9409a459b5ce42e61309e96091411b256ec294fb0' \
       'f32782efc80d9d548f3cee4fdb21babd011e118238ec7545b24e5af74317a2670179930156512875653dce4e957e94a7596f' \
       'a7e2b533a3eacb9781634c79c094e2cbfcfc62128a25431f9b56cc40b6097614e7a4b08c32b3a7f2e471f55a295a9a06e5b0' \
       'e07dec2ad282842aa6f176052acd544cb5d67c206a3e38e80e32f560a57edda173892a39b021d616d8f2862a5d111e6610c1e7'

data = binascii.unhexlify(data)
resp_data = tlv8.decode(data, {1: tlv8.DataType.BYTES})

Failed with

2020-05-12 19:56:52,627 pair_ble.py:0075 DEBUG Not enough data left. 45 vs 194
Traceback (most recent call last):
  File "homekit/pair_ble.py", line 68, in <module>
    finish_pairing(pin_function())
  File "/home/jlusiardi/Dokumente/src/homekit_python/homekit/controller/controller.py", line 505, in finish_pairing
    response = write_fun(request, expected)
  File "/home/jlusiardi/Dokumente/src/homekit_python/homekit/controller/ble_impl/__init__.py", line 671, in write
    expected={AdditionalParameterTypes.Value: tlv8.DataType.BYTES})
  File "/home/jlusiardi/Dokumente/src/homekit_python/venv/lib/python3.7/site-packages/tlv8/__init__.py", line 273, in decode
    tmp = _internal_decode(data, strict_mode)
  File "/home/jlusiardi/Dokumente/src/homekit_python/venv/lib/python3.7/site-packages/tlv8/__init__.py", line 219, in _internal_decode
    raise ValueError('Not enough data left. {} vs {}'.format(len(remaining_data[2:]), tlv_len))
ValueError: Not enough data left. 45 vs 194

Entry::length is not applied during encoding

Found out that I need to specify a specific length for an integer. From the documentation this appears to be the case by passing the length argument to the constructor of Entry. Unfortunately, length is not honored on encoding.

Repro:

import tlv8

entry = tlv8.Entry(42, 1, length=8)
print(entry.encode().hex())  # 2a 01 01, expected 2a 08 01 00 00 00 00 00 00 00
