
multiformats: Python implementation of multiformat protocols


Multiformats is a compliant Python implementation of multiformat protocols.

Install

You can install the latest release from PyPI as follows:

$ pip install --upgrade multiformats

The following are mandatory dependencies for this module:

The following are optional dependencies for this module:

  • blake3, for the blake3 hash function.
  • pyskein, for the skein hash functions.
  • mmh3, for the murmur3 hash functions.
  • pycryptodomex, for the ripemd-160 hash function, the kangarootwelve hash function, the keccak hash functions and the sha2-512-224/sha2-512-256 hash functions.

You can install the latest release together with all optional dependencies as follows:

$ pip install --upgrade multiformats[full]

Usage

You can import multiformat protocols directly from the top level:

>>> from multiformats import *

The above will import the following names:

varint, multicodec, multibase, multihash, multiaddr, CID

The first five are modules implementing the homonymous specifications, while CID is a class for Content IDentifiers. Below are some basic usage examples to get you started; for detailed documentation, see https://multiformats.readthedocs.io/

Varint encode/decode

>>> varint.encode(128)
b'\x80\x01'
>>> varint.decode(b'\x80\x01')
128
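For reference, the unsigned-varint encoding used here (LEB128: seven payload bits per byte, with the high bit as a continuation flag) can be sketched in pure Python. This is an illustrative re-implementation, not the library's own code; multiformats ships this as `varint.encode`/`varint.decode`:

```python
# Illustrative unsigned-varint (LEB128) codec.
def uvarint_encode(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F          # low seven bits of the remaining value
        n >>= 7
        if n:                    # more bytes follow: set the continuation bit
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def uvarint_decode(data: bytes) -> int:
    n = 0
    for shift, byte in enumerate(data):
        n |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:      # continuation bit clear: last byte
            break
    return n
```

For example, `uvarint_encode(128)` yields `b'\x80\x01'`, matching the library output above.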

Multicodec wrap/unwrap

Procedural style:

>>> raw_data = bytes([192, 168, 0, 254])
>>> multicodec_data = multicodec.wrap("ip4", raw_data)
>>> raw_data.hex()
'c0a800fe'
>>> multicodec_data.hex()
'04c0a800fe'
>>> codec, _raw_data = multicodec.unwrap(multicodec_data)
>>> _raw_data.hex()
'c0a800fe'
>>> codec
Multicodec(name='ip4', tag='multiaddr', code='0x04', status='permanent', description='')

Object-oriented style:

>>> ip4 = multicodec.get("ip4")
>>> ip4
Multicodec(name='ip4', tag='multiaddr', code='0x04', status='permanent', description='')
>>> raw_data = bytes([192, 168, 0, 254])
>>> multicodec_data = ip4.wrap(raw_data)
>>> raw_data.hex()
'c0a800fe'
>>> multicodec_data.hex()
'04c0a800fe'
>>> ip4.unwrap(multicodec_data).hex()
'c0a800fe'

Multibase encode/decode

Procedural style:

>>> multibase.encode(b"Hello World!", "base32")
'bjbswy3dpeblw64tmmqqq'
>>> multibase.decode('bjbswy3dpeblw64tmmqqq')
b'Hello World!'

Object-oriented style:

>>> base32 = multibase.get("base32")
>>> base32.encode(b"Hello World!")
'bjbswy3dpeblw64tmmqqq'
>>> base32.decode('bjbswy3dpeblw64tmmqqq')
b'Hello World!'

Multihash digest

Procedural style:

>>> data = b"Hello world!"
>>> digest = multihash.digest(data, "sha2-256")
>>> digest.hex()
'1220c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'

Object-oriented style:

>>> sha2_256 = multihash.get("sha2-256")
>>> digest = sha2_256.digest(data)
>>> digest.hex()
'1220c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'

Optional truncated digests:

>>> digest = multihash.digest(data, "sha2-256", size=20)  # optional truncated hash size, in bytes
>>> digest.hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'

Multihash wrap/unwrap

Procedural style:

>>> digest.hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> raw_digest = multihash.unwrap(digest)
>>> raw_digest.hex()
'c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> multihash.wrap(raw_digest, "sha2-256").hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'

Object-oriented style:

>>> sha2_256 = multihash.get("sha2-256")
>>> raw_digest = sha2_256.unwrap(digest)
>>> raw_digest.hex()
'c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> sha2_256.wrap(raw_digest).hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
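The byte layout being wrapped and unwrapped above (a varint hash-function code, a varint digest length, then the raw digest) can be reproduced by hand with the standard library alone. In this sketch, 0x12 is the multihash code for sha2-256 and 0x14 is the 20-byte truncated length, both small enough to fit in a single varint byte:

```python
# Hand-rolled multihash for the truncated sha2-256 digest shown above.
import hashlib

raw = hashlib.sha256(b"Hello world!").digest()[:20]  # truncate digest to 20 bytes
wrapped = bytes([0x12, 0x14]) + raw                  # 0x12 = sha2-256, 0x14 = 20
# wrapped.hex() reproduces '1214c0535e4be2b79ffd93291305436bf889314e4a3f'
```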

CID encode/decode

Decoding from multibase encoded strings:

>>> cid = CID.decode("zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA")
>>> cid
CID('base58btc', 1, 'raw', '12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95')
>>> cid.base
Multibase(name='base58btc', code='z', status='default', description='base58 bitcoin')
>>> cid.codec
Multicodec(name='raw', tag='ipld', code='0x55', status='permanent', description='raw binary')
>>> cid.digest.hex()
'12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'
>>> cid.hashfun
Multicodec(name='sha2-256', tag='multihash', code='0x12', status='permanent', description='')
>>> cid.raw_digest.hex()
'6e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'

Multibase encoding:

>>> str(cid)  # encode with own multibase 'base58btc'
'zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA'
>>> cid.encode("base32")  # encode with different multibase
'bafkreidon73zkcrwdb5iafqtijxildoonbwnpv7dyd6ef3qdgads2jc4su'

PeerID creation

Creation of CIDv1 PeerIDs:

>>> pk_bytes = bytes.fromhex(  # hex-string of 32-byte Ed25519 public key
...     "1498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93")
>>> peer_id = CID.peer_id(pk_bytes)
>>> peer_id
CID('base32', 1, 'libp2p-key', '00201498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93')
#^^ 0x00 = 'identity' multihash used (public key length <= 42)
#  ^^ 0x20 = 32-bytes of raw hash digest length
>>> str(peer_id)
'bafzaaiautc2um6td375c3soz4bu4v4dv2fx4gp65jq5qdp5nvzsdg5t5sm'

Multiaddr parse/decode

>>> s = '/ip4/127.0.0.1/udp/9090/quic'
>>> multiaddr.parse(s)
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
>>> b = bytes.fromhex('047f00000191022382cc03')
>>> multiaddr.decode(b)
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))

Multiaddr protocols/addresses

Accessing multiaddr protocols:

>>> ip4 = multiaddr.proto("ip4")
>>> ip4
Proto("ip4")
>>> udp = multiaddr.proto("udp")
>>> quic = multiaddr.proto("quic")

Creating protocol addresses from human-readable strings:

>>> a = ip4/"192.168.1.1"
>>> a
Addr('ip4', '192.168.1.1')
>>> str(a)
'/ip4/192.168.1.1'
>>> a.value
'192.168.1.1'
>>> bytes(a).hex()
'04c0a80101'
>>> a.value_bytes.hex()
'c0a80101'

Creating protocol addresses from bytestrings:

>>> a = ip4/bytes([192, 168, 1, 1])
>>> a
Addr('ip4', '192.168.1.1')

Multiaddr encapsulation/decapsulation

Creating multiaddresses by protocol encapsulation:

>>> ma = ip4/"127.0.0.1"/udp/9090/quic
>>> ma
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
>>> str(ma)
'/ip4/127.0.0.1/udp/9090/quic'

Bytes for multiaddrs are computed according to the (TLV)+ multiaddr format:

>>> bytes(ip4/"127.0.0.1").hex()
'047f000001'
>>> bytes(udp/9090).hex()
'91022382'
>>> bytes(quic).hex()
'cc03'
>>> bytes(ma).hex()
'047f00000191022382cc03'
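The TLV components above can be composed by hand with the standard library, which makes the format concrete. This is an illustrative sketch, not library code; the protocol codes are taken from the multicodec table (ip4 = 0x04, udp = 0x0111) and each code is written as an unsigned varint followed by the fixed-size address value:

```python
# Hand-rolled (TLV)+ composition for the multiaddr components shown above.
import ipaddress

def uvarint(n: int) -> bytes:
    # Minimal unsigned-varint encoder: 7 payload bits per byte,
    # high bit set on all but the last byte.
    out = bytearray()
    while True:
        b, n = n & 0x7F, n >> 7
        out.append(b | 0x80 if n else b)
        if not n:
            return bytes(out)

ip4_bytes = uvarint(0x04) + ipaddress.IPv4Address("127.0.0.1").packed
udp_bytes = uvarint(0x0111) + (9090).to_bytes(2, "big")  # port as 2 bytes, big-endian
```

Here `ip4_bytes.hex()` gives '047f000001' and `udp_bytes.hex()` gives '91022382', matching the library output above.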

Protocol decapsulation by indexing and slicing:

>>> ma[0]
Addr('ip4', '127.0.0.1')
>>> ma[:2]
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'))
>>> ma[1:]
Multiaddr(Addr('udp', '9090'), Proto('quic'))

API

For the full API documentation, see https://multiformats.readthedocs.io/

The tables specifying all multicodecs and multibases known to this package are maintained as part of the multiformats-config repository.

Contributing

Please see CONTRIBUTING.md.

License

MIT © Hashberg Ltd.

multiformats's People

Contributors

mcamou, sg495, yabirgb


multiformats's Issues

Install breaks with typing-extensions >= 4.6.0

It looks like the typing-extensions package released version 4.6.x on May 23rd, and this breaks multiformats for me on any Python version older than 3.11.

Stacktrace is:

  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/multiformats/multibase/__init__.py", line 22, in <module>
    from bases import (base2, base16, base8, base10, base36, base58btc, base58flickr, base58ripple,
  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/bases/__init__.py", line 40, in <module>
    from . import encoding as encoding
  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/bases/encoding/__init__.py", line 293, in <module>
    base8 = FixcharBaseEncoding(alphabet.base8, pad_char="=", padding="include")
  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/bases/encoding/fixchar.py", line 106, in __init__
    validate(char_nbits, Union[int, Literal["auto"]])
  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/typing_validation/validation.py", line 635, in validate
    _validate_union(val, t)
  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/typing_validation/validation.py", line 515, in _validate_union
    validate(val, member_t)
  File "/home/markus/.local/share/micromamba/envs/kiara-mono/lib/python3.10/site-packages/typing_validation/validation.py", line 691, in validate
    raise unsupported_type_error
ValueError: Unsupported validation for type typing_extensions.Literal['auto'].

Pinning typing-extensions to 4.5.0 makes it work again...

Implement `black` & `ruff`

Implement black and ruff for best practices, as I'm sure this module will be very popular in the coming days

CID equivalence vs identity: how to implement `==`?

CIDs can be encoded in various formats, while referring to identical objects. This can lead to subtle problems regarding equality.

In particular, I expected the following code snippet to return True:

In [1]: from multiformats import CID

In [2]: cid = CID("base32", 1, "raw", ("sha2-256", "2C26B46B68FFC68FF99B453C1D30413413422D706483BFA0F98A5E886266E7AE"))

In [3]: CID.decode(bytes(cid)) == cid
Out[3]: False

it returns False, because currently equality is based on the tuple representation and CID.decode defaults to 'base58btc' instead of 'base32'.

Question: Wouldn't it be better to implement equality based on whether the CID identifies the same thing, instead of whether the CID looks the same?


While in the scenario above, it would be easy to add another method (like .refers_to_same(other) etc...) to explicitly check if the CID refers to the same thing, things get more complicated when combining CID with dag-cbor (continuing the code above):

In [4]: import dag_cbor

In [5]: o = {"foo": cid}

In [6]: o2 = dag_cbor.decode(dag_cbor.encode(o))

In [7]: o2 == o
Out[7]: False

I really would like objects round-tripped through dag-cbor to compare equal.

As far as I understand, the bytes-representation of any CID is uniquely defined. If that's true, I'm suggesting to implement __eq__ and __hash__ based on bytes. I could also prepare a PR.
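A minimal sketch of what such byte-based equality could look like; ByteEqMixin, DemoCID and _to_bytes are hypothetical names for illustration, not multiformats internals:

```python
# Hypothetical byte-based equality: two values compare equal (and hash
# equal) whenever their canonical byte representations coincide.
class ByteEqMixin:
    def _to_bytes(self) -> bytes:
        raise NotImplementedError

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ByteEqMixin):
            return NotImplemented
        return self._to_bytes() == other._to_bytes()

    def __hash__(self) -> int:
        return hash(self._to_bytes())

class DemoCID(ByteEqMixin):
    def __init__(self, base: str, digest: bytes):
        self.base, self.digest = base, digest

    def _to_bytes(self) -> bytes:
        return self.digest  # equality deliberately ignores the multibase
```

With this, `DemoCID("base32", d) == DemoCID("base58btc", d)` for the same digest `d`, which is the behaviour the snippet above expects from `CID.decode(bytes(cid)) == cid`.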

Various objects in library incompatible with pickle deserialization

multiformats is an upstream module in a package I'm using (MarshalX/atproto) and I've been having some issues with pickling objects from multiformats. (I need to pickle them to serialize them between threads.)

It seems that pickle requires that all classes with a custom implementation of __new__() also provide a __getnewargs__() method (see StackOverflow post). For instance, attempting to unpickle a pickled CID object with pickle.loads() gives an error

TypeError: CID.__new__() missing 4 required positional arguments: 'base', 'version', 'codec', and 'digest'

which can be fixed by adding the following function to multiformats/cid/__init__.py:CID:

def __getnewargs__(self):
    return self.base, self.version, self.codec, self.digest

allowing unpickling as expected.
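A toy reproduction of the issue and fix, using a hypothetical Frozen class rather than the actual CID implementation: once __getnewargs__() is defined, pickle's protocol-2+ machinery can call the custom __new__() with the right arguments and the round trip works.

```python
import pickle

class Frozen:
    # Custom __new__ taking required arguments, as CID does.
    def __new__(cls, base, version):
        self = super().__new__(cls)
        self.base, self.version = base, version
        return self

    # Without this, pickle.loads raises:
    # TypeError: Frozen.__new__() missing required positional arguments
    def __getnewargs__(self):
        return (self.base, self.version)

f2 = pickle.loads(pickle.dumps(Frozen("base32", 1)))
```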

Would be happy to submit a PR to fix this later this week (would do it right now but I should really go to bed...)

Thanks for your help!

importlib_resource new API

When running tests with python 3.11, I get the following warning:

../../../root/.cache/pypoetry/virtualenvs/bovine-herd-jY87IYlK-py3.11/lib/python3.11/site-packages/multiformats_config/multicodec.py:80
  /root/.cache/pypoetry/virtualenvs/bovine-herd-jY87IYlK-py3.11/lib/python3.11/site-packages/multiformats_config/multicodec.py:80: DeprecationWarning: open_text is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.
    with importlib_resources.open_text("multiformats_config", "multicodec-table.json", encoding="utf8") as _table_f:

My reading of https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy says that fixing this would mean going Python 3.9 only. Is there any objection to doing this?

If not, I can prepare a pull request.
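The migration would presumably look something like this: a sketch against the stdlib importlib.resources API (available from Python 3.9), where files() returns a Traversable and read_text() replaces the deprecated open_text(). The helper name is illustrative:

```python
from importlib.resources import files

def load_table_text(package: str, resource: str) -> str:
    # files() returns a Traversable; joinpath + read_text replaces
    # the deprecated importlib_resources.open_text(...)
    return files(package).joinpath(resource).read_text(encoding="utf8")

# e.g. load_table_text("multiformats_config", "multicodec-table.json")
```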

Implementation difference of 'hash' type multiformat and 'multihash'

I have been working on making a CAR block decoder in python as a personal project. When testing HAMT-shards, I got the exception: multiformats.multihash.err.MultihashValueError: Multicodec 'murmur3-x64-64' exists, but it is not a multihash multicodec.

It seems that the type of the murmur3 was recently changed to that of a 'hash' as it is not cryptographic (multiformats/multihash#157).

Changing the lines

if multihash.tag != "multihash":

and

if codec.tag != "multihash":

to allow the 'hash' tag allowed me to hash normally with:

>>> from multiformats import multihash
>>> mh = multihash.get(code=0x22)
>>> mh.digest(b'test')
b'"\x08\x9a\x12\x821\xf9\xbdM\x82'

Should the 'hash' type be allowed with multihashes, or should something else be done with them?

Migrating from Legacy

Version: multiformats==0.2.1
Code:

""" IPFS Network send and receive by pubsub
"""
from json import loads
from typing import Iterator
from urllib.parse import urljoin

import requests
from multiformats.multibase import decode, encode

from .basic_network_model import BasicNetworkModel


class IPFSNetwork(BasicNetworkModel):
    def __init__(self, api_url: str):
        self.api_url = api_url

    def send(self, data: bytes, net_type: str, timeout: float) -> bool:
        """Send string data to topic"""
        topic = net_type.encode("utf-8")
        topic_encode: str = encode(topic, "base64url")
        url = urljoin(self.api_url, f"/api/v0/pubsub/pub?arg={topic_encode}")
        files = {"file": ("d", data)}
        rsp = requests.post(url, files=files, timeout=timeout)
        return rsp.status_code == 200

    def recv(self, net_type: str, timeout: float) -> Iterator[bytes]:
        """Receive data from topic"""
        topic = net_type.encode("utf-8")
        topic_encode: str = encode(topic, "base64url")
        url = urljoin(self.api_url, f"/api/v0/pubsub/sub?arg={topic_encode}")
        with requests.post(url, stream=True, timeout=timeout) as rsp:
            cache = b""
            for chunk in rsp.iter_content(8196):
                cache += chunk
                data_list = cache.split(b"\n")
                data = data_list.pop(0)
                cache = b"".join(data_list)
                if data_list:
                    data_json = loads(data.decode("utf-8"))
                    result = {}
                    result["data"] = decode(data_json["data"])
                    result["seqno"] = decode(data_json["seqno"])
                    result["topicIDs"] = [
                        decode(b).decode("utf-8") for b in data_json["topicIDs"]
                    ]
                    yield result

Printed warning:

.venv/lib/python3.11/site-packages/multiformats_config/multicodec.py:80
  /home/xz/Code/FragThing/base_network/.venv/lib/python3.11/site-packages/multiformats_config/multicodec.py:80: DeprecationWarning: open_text is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.
    with importlib_resources.open_text("multiformats_config", "multicodec-table.json", encoding="utf8") as _table_f:

.venv/lib/python3.11/site-packages/multiformats_config/multibase.py:53
  /home/xz/Code/FragThing/base_network/.venv/lib/python3.11/site-packages/multiformats_config/multibase.py:53: DeprecationWarning: open_text is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.
    with importlib_resources.open_text("multiformats_config", "multibase-table.json", encoding="utf8") as _table_f:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

make 'pyskein' dependency optional?

Hi there, thanks for this library, it's been tremendously useful for me so far.

I'm currently in the process of packaging my application (and all its dependencies, incl. multiformats) for conda. I'm having problems packaging pyskein (which multiformats depends on, but I don't need), though: it includes C extensions, which is proving more difficult than I anticipated (I haven't figured out how to cross-compile for all the different target architectures).

So I had a look in the multiformats source code to check how it's used, and it looks like https://github.com/hashberg-io/multiformats/blob/main/multiformats/multihash/raw.py#L29 is the only place it is imported directly.

So, I was wondering, would it be possible to do something like:

try:
    import skein
except ImportError:
    log.warning("pyskein dependency not available, skein-based algorithms won't work")

That way, the default behavior would not change if the package is available in the virtualenv, but my conda packages would also be usable without throwing an import error when I try to use multiformats.

Additionally, pyskein could maybe be moved to an optional dependency in setup.cfg and the overall pip install would be a bit quicker since no compilation step is involved, but for my case that would not be required, I'd just leave out the pyskein dependency in my conda package description.

No worries if this is not wanted behavior for this package, but I figured I'd ask before I build my conda package off a patched tracking fork.

Use `files()` to open *-table JSONs (importlib deprecated `open_text`)

I'm using python 3.11 and I'm getting these a lot:

> multiformats_config/multicodec.py:80:
>> with importlib_resources.open_text("multiformats_config", "multicodec-table.json", encoding="utf8") as _table_f:
> multiformats_config/multibase.py:53
>> with importlib_resources.open_text("multiformats_config", "multibase-table.json", encoding="utf8") as _table_f:
DeprecationWarning:
    open_text is deprecated. Use files() instead.
    Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.

And since I'm here, let me state that this is the best implementation of the multiformats protocol I've seen and it is only thanks to this package that I was able to get multiformats adopted where I work.

CIDs should be flyweight

Currently, it is possible to create different copies of the same CID. In applications where CIDs are used repeatedly to address content, this is unnecessarily wasteful. Introduce a static weak-value dictionary of instances to make them flyweight.
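A minimal sketch of the proposed flyweight pattern using a weak-value dictionary; the class and attribute names here are illustrative, not multiformats code, and the key is assumed to be the CID's canonical bytes:

```python
import weakref

class FlyweightCID:
    # Weak values: instances are shared while alive, but can still be
    # garbage-collected once no external references remain.
    _instances: "weakref.WeakValueDictionary[bytes, FlyweightCID]" = weakref.WeakValueDictionary()

    def __new__(cls, raw: bytes) -> "FlyweightCID":
        inst = cls._instances.get(raw)
        if inst is None:
            inst = super().__new__(cls)
            inst.raw = raw
            cls._instances[raw] = inst
        return inst
```

With this, constructing the same CID twice returns the identical object (`FlyweightCID(b) is FlyweightCID(b)`), so repeated addressing of the same content costs one instance instead of many.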

Demonstration of Support for Multiformat Global Standards

Hi,

We are trying to promote Multiformats onto the global standards track at the Internet Engineering Task Force (IETF), the standards-setting body for the Internet. Given that you are a member of the Multiformats community and have implemented a sub-module of the Python module multiformats, we need your help to demonstrate that there is implementer and developer support for this technology.

Manu Sporny, editor for the Multibase[1] and Multihash[2] I-D specifications at the Internet Engineering Task Force (IETF), recently requested[3] the adoption of these specifications onto the global standards track (RFC).

Here's how this process works:

1. Manu has written the specifications[1][2] and published them as "Internet Drafts". [DONE]
2. Manu has requested[3] that they be moved onto the standards track. [DONE]
3. We need YOU, as an implementer or developer that uses this technology, to send an email in support of standardizing these specifications to [email protected].
4. The Area Directors at IETF then decide on the best path for standardization (Working Group, Area Director Sponsored work item, Informational Draft, etc.)

So, we need your help to standardize the Multiformats technologies as global Internet standards; it won't happen without your support.

To help, this is what we need you to do:

1. Write an email to [email protected].
2. Set the subject line to: "Re: Finding a home for Multibase and Multihash".
3. In the email, introduce yourself and what you do in the Multiformats ecosystem, and why you find Multiformats useful.
4. Clearly state if you have implemented software that uses Multibase or Multihash and are supportive of the standardization of those technologies at IETF.
5. Clearly state that you are willing to provide implementation feedback on the specification (we will email you asking for feedback in a few months). NOTE: We're just documenting what's already implemented, so the specification shouldn't deviate from what's already deployed.

We are trying to get these emails demonstrating support into the IETF during the next two weeks. Please help us do that and ensure Multiformats become a global standard.

If you have any questions you can reach Manu at [email protected]. Thank you, in advance, for any support that you can provide.

~ Morgan

PS: We will be asking you to demonstrate similar support for use of Multiformats at the World Wide Web Consortium, the global standards-setting body for the Web, in the following weeks. You'll get a separate issue about that next week.

[1] https://datatracker.ietf.org/doc/html/draft-multiformats-multibase-06
[2] https://www.ietf.org/archive/id/draft-multiformats-multihash-05.html
[3] https://mailarchive.ietf.org/arch/msg/dispatch/Q9aUoF01Upbvl7STjJvjoU8hlHM/

`io.BufferedIOBase` or `typing.BinaryIO` ?

Disclaimer: I'm kind of new to typing in Python.

multiformats makes heavy use of type information 👍. However, I'm wondering if it should be io.BufferedIOBase or typing.BinaryIO? Naively, I'd say the binary-ness is more important than the buffered-ness.

In particular, I came across a case where I wanted to read from a file and use the resulting object in varint.decode. As varint.decode requires io.BufferedIOBase, type-checking my code with mypy fails.

Minimal example (e.g. test would be a function like varint.decode from the multiformats package):

This fails using mypy --strict:

from io import BufferedIOBase, BytesIO

def test(stream: BufferedIOBase) -> None:
    pass

def main() -> None:
    with open("test.bin", "rb") as testfile:
        test(testfile)
    test(BytesIO(b"test"))

this version passes mypy --strict:

from io import BytesIO
from typing import BinaryIO

def test(stream: BinaryIO) -> None:
    pass

def main() -> None:
    with open("test.bin", "rb") as testfile:
        test(testfile)
    test(BytesIO(b"test"))
