ulid's Introduction

ulid

Universally Unique Lexicographically Sortable Identifier in Python 3.

Status

This project is actively maintained.

Installation

To install ulid from pip:

    $ pip install ulid-py

To install ulid from source:

    $ git clone git@github.com:ahawker/ulid.git
    $ cd ulid && python setup.py install

Usage

Create a brand new ULID.

The timestamp value (48-bits) is from time.time() with millisecond precision.

The randomness value (80-bits) is from os.urandom().

>>> import ulid
>>> ulid.new()
<ULID('01BJQE4QTHMFP0S5J153XCFSP9')>
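
As a rough illustration of the layout described above (not the library's actual implementation), the 16 raw bytes can be assembled with the standard library alone. The helper name `new_ulid_bytes` is hypothetical.

```python
import os
import time


def new_ulid_bytes() -> bytes:
    """Build 16 raw ULID bytes: 6 timestamp bytes followed by 10 random bytes."""
    # 48-bit timestamp: Unix time in whole milliseconds, big-endian.
    timestamp_ms = int(time.time() * 1000)
    timestamp = timestamp_ms.to_bytes(6, byteorder="big")
    # 80-bit randomness from the operating system's random source.
    randomness = os.urandom(10)
    return timestamp + randomness


raw = new_ulid_bytes()
print(len(raw))  # 16
```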

Create a new ULID from an existing 128-bit value, such as a UUID.

Supports ULID values as int, bytes, str, and UUID types.

>>> import ulid, uuid
>>> value = uuid.uuid4()
>>> value
UUID('0983d0a2-ff15-4d83-8f37-7dd945b5aa39')
>>> ulid.from_uuid(value)
<ULID('09GF8A5ZRN9P1RYDVXV52VBAHS')>
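
The conversion above can be reproduced by hand, since a UUID and a ULID are both 128-bit values. The following is a minimal sketch using only the standard library; `encode_128` is a hypothetical helper, not a function exported by this package.

```python
import uuid

# Crockford's Base32 alphabet (no I, L, O, or U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_128(value: int) -> str:
    """Encode a 128-bit integer as a 26-character Crockford Base32 string."""
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


value = uuid.UUID("0983d0a2-ff15-4d83-8f37-7dd945b5aa39")
print(encode_128(value.int))  # 09GF8A5ZRN9P1RYDVXV52VBAHS
```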

Create a new ULID from an existing timestamp value, such as a datetime object.

Supports timestamp values as int, float, str, bytes, bytearray, memoryview, datetime, Timestamp, and ULID types.

>>> import datetime, ulid
>>> ulid.from_timestamp(datetime.datetime(1999, 1, 1))
<ULID('00TM9HX0008S220A3PWSFVNFEH')>
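
To see where the first ten characters come from, the timestamp can be encoded on its own. This sketch assumes the datetime is interpreted as UTC; `encode_timestamp` is a hypothetical helper. Its output matches the first ten characters of the example above, and the remaining sixteen characters are random.

```python
import datetime

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_timestamp(moment: datetime.datetime) -> str:
    """Encode an aware datetime as the 10-character ULID timestamp prefix."""
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    # Integer milliseconds since the Unix epoch, avoiding float rounding.
    millis = (moment - epoch) // datetime.timedelta(milliseconds=1)
    chars = []
    for _ in range(10):
        chars.append(ALPHABET[millis & 0x1F])
        millis >>= 5
    return "".join(reversed(chars))


moment = datetime.datetime(1999, 1, 1, tzinfo=datetime.timezone.utc)
print(encode_timestamp(moment))  # 00TM9HX000
```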

Create a new ULID from an existing randomness value.

Supports randomness values as int, float, str, bytes, bytearray, memoryview, Randomness, and ULID types.

>>> import os, ulid
>>> randomness = os.urandom(10)
>>> ulid.from_randomness(randomness)
<ULID('01BJQHX2XEDK0VN0GMYWT9JN8S')>

For cases when you don't necessarily control the data type (input from external system), you can use the parse method which will attempt to make the correct determination for you. Please note that this will be slightly slower than creating the instance from the respective from_* method as it needs to make a number of type/conditional checks.

Supports values as int, float, str, bytes, bytearray, memoryview, uuid.UUID, and ULID types.

>>> import ulid
>>> value = db.model.get_id()  # Unsure about datatype: could be int, UUID, or string?
>>> ulid.parse(value)
<ULID('0K0EDFETFM8SH912DBBD4ABXSZ')>

Once you have a ULID object, there are a number of ways to interact with it.

The timestamp method will give you a snapshot view of the first 48-bits of the ULID while the randomness method will give you a snapshot of the last 80-bits.

>>> import ulid
>>> u = ulid.new()
>>> u
<ULID('01BJQM7SC7D5VVTG3J68ABFQ3N')>
>>> u.timestamp()
<Timestamp('01BJQM7SC7')>
>>> u.randomness()
<Randomness('D5VVTG3J68ABFQ3N')>

The ULID, Timestamp, and Randomness classes all derive from the same base class, a MemoryView.

A MemoryView provides the bin, bytes, hex, int, oct, and str methods for changing any value's representation.

>>> import ulid
>>> u = ulid.new()
>>> u
<ULID('01BJQMF54D093DXEAWZ6JYRPAQ')>
>>> u.timestamp()
<Timestamp('01BJQMF54D')>
>>> u.timestamp().int
1497589322893
>>> u.timestamp().bytes
b'\x01\\\xafG\x94\x8d'
>>> u.timestamp().datetime
datetime.datetime(2017, 6, 16, 5, 2, 2, 893000, tzinfo=datetime.timezone.utc)
>>> u.randomness().bytes
b'\x02F\xde\xb9\\\xf9\xa5\xecYW'
>>> u.bytes[6:] == u.randomness().bytes
True
>>> u.str
'01BJQMF54D093DXEAWZ6JYRPAQ'
>>> u.int
1810474399624548315999517391436142935
>>> u.bin
'0b1010111001010111101000111100101001000110100000010010001101101111010111001010111001111100110100101111011000101100101010111'
>>> u.hex
'0x015caf47948d0246deb95cf9a5ec5957'
>>> u.oct
'0o12712750745106402215572712717464573054527'
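
The relationship between the timestamp bytes, integer, and datetime shown above can be checked without the library. This is a small sketch using the values from the session above.

```python
import datetime

timestamp_bytes = b"\x01\\\xafG\x94\x8d"  # the 6 timestamp bytes shown above

# The bytes are a big-endian count of milliseconds since the Unix epoch.
millis = int.from_bytes(timestamp_bytes, byteorder="big")
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
moment = epoch + datetime.timedelta(milliseconds=millis)

print(millis)  # 1497589322893
print(moment.isoformat())  # 2017-06-16T05:02:02.893000+00:00
```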

A MemoryView also provides rich comparison functionality.

>>> import datetime, time, ulid
>>> u1 = ulid.new()
>>> time.sleep(5)
>>> u2 = ulid.new()
>>> u1 < u2
True
>>> u3 = ulid.from_timestamp(datetime.datetime(2039, 1, 1))
>>> u1 < u2 < u3
True
>>> [u.timestamp().datetime for u in sorted([u2, u3, u1])]
[datetime.datetime(2017, 6, 16, 5, 7, 14, 847000, tzinfo=datetime.timezone.utc), datetime.datetime(2017, 6, 16, 5, 7, 26, 775000, tzinfo=datetime.timezone.utc), datetime.datetime(2039, 1, 1, 8, 0, tzinfo=datetime.timezone.utc)]

Monotonic Support

This library supports two implementations for stronger guarantees of monotonically increasing randomness.

To use one of these implementations, simply import it and alias it as ulid. They support an interface identical to ulid, so no additional changes should be necessary.

Thread lock

The "thread lock" implementation is a simple implementation that follows that of the ulid/spec. When two or more identifiers are created within the same millisecond, each subsequent identifier uses the previous identifier's randomness value + 1. See PR 473 for more details.

>>> import time
>>> from ulid import monotonic as ulid

>>> ts = time.time()
>>> ulid.from_timestamp(ts)
<ULID('01EFZ62V7VTEQR4Q788PSBBQP8')>
>>> ulid.from_timestamp(ts)
<ULID('01EFZ62V7VTEQR4Q788PSBBQP9')>
>>> ulid.from_timestamp(ts)
<ULID('01EFZ62V7VTEQR4Q788PSBBQPA')>
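
The behavior described above can be sketched with a lock and a remembered previous value. This is an illustration of the idea only, not the code from the library; the class `MonotonicFactory` and its `create` method are hypothetical names.

```python
import os
import threading


class MonotonicFactory:
    """Hypothetical generator of monotonically increasing 16-byte values."""

    MAX_RANDOMNESS = 2**80 - 1

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = -1
        self._last_randomness = 0

    def create(self, timestamp_ms: int) -> bytes:
        with self._lock:
            if timestamp_ms == self._last_ms:
                # Same millisecond: previous randomness value + 1.
                if self._last_randomness == self.MAX_RANDOMNESS:
                    raise OverflowError("randomness exhausted for this millisecond")
                self._last_randomness += 1
            else:
                # New millisecond: start from fresh random bytes.
                self._last_ms = timestamp_ms
                self._last_randomness = int.from_bytes(os.urandom(10), "big")
            return timestamp_ms.to_bytes(6, "big") + self._last_randomness.to_bytes(10, "big")


factory = MonotonicFactory()
first = factory.create(1598000000000)
second = factory.create(1598000000000)
print(first < second)  # True
```

The lock matters because the read-increment-write of the previous randomness must not interleave between threads.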

Microsecond

The "microsecond" implementation is not defined in the ulid/spec. It uses a microsecond clock and places those additional 10 bits into the first two bytes of the randomness value. This means that two identifiers generated within the same millisecond will be monotonically ordered. If two identifiers are generated within the same microsecond, they are ordered entirely by the randomness bytes. See PR 476 for more details.

>>> from ulid import microsecond as ulid
>>> ulid.new()
<ULID('01EH0VVVEC0BKJHF0370TNGQ4Z')>
>>> ulid.new()
<ULID('01EH0VVWPG0C6VDD0529CAHPNJ')>
>>> ulid.new()
<ULID('01EH0VVX8R0AN45DBYZZYMXVKT')>
>>> ulid.new()
<ULID('01EH0VVYA406BDKKRVCDJQZHYQ')>
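
One possible reading of this scheme is sketched below. The exact bit placement is an assumption (10 microsecond bits in the high bits of the first two randomness bytes, followed by 6 random bits); the library's real layout may differ, and `microsecond_bytes` is a hypothetical helper.

```python
import os
import time


def microsecond_bytes(now_ns: int) -> bytes:
    """Build 16 ULID-shaped bytes from a nanosecond clock reading."""
    micros = now_ns // 1_000
    millis, sub_ms = divmod(micros, 1_000)  # sub_ms is 0..999, fits in 10 bits
    timestamp = millis.to_bytes(6, "big")
    # Assumed layout: 10 bits of microseconds, then 6 random bits, in 2 bytes.
    head = (sub_ms << 6) | (os.urandom(1)[0] & 0x3F)
    # The remaining 8 randomness bytes are fully random.
    return timestamp + head.to_bytes(2, "big") + os.urandom(8)


value = microsecond_bytes(time.time_ns())
print(len(value))  # 16
```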

Contributing

If you would like to contribute, simply fork the repository, push your changes and send a pull request. Pull requests will be brought into the master branch via a rebase and fast-forward merge with the goal of having a linear branch history with no merge commits.

License

Apache 2.0

Why not UUID?

UUID can be suboptimal for many use cases because:

  • It isn't the most character efficient way of encoding 128 bits of randomness
  • UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
  • UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
  • UUID v4 provides no other information than randomness which can cause fragmentation in many data structures

ULID provides:

  • 128-bit compatibility with UUID
  • 1.21e+24 unique ULIDs per millisecond
  • Lexicographically sortable!
  • Canonically encoded as a 26 character string, as opposed to the 36 character UUID
  • Uses Crockford's base32 for better efficiency and readability (5 bits per character)
  • Case insensitive
  • No special characters (URL safe)
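
The figures in the list above follow from simple arithmetic, which can be checked directly:

```python
import math

# 80 random bits give this many distinct values per millisecond.
per_ms = 2**80
print(f"{per_ms:.2e}")  # 1.21e+24

# 128 bits at 5 bits per Base32 character need 26 characters.
ulid_chars = math.ceil(128 / 5)
print(ulid_chars)  # 26

# The canonical UUID text form is 32 hex digits plus 4 hyphens.
uuid_chars = len("0983d0a2-ff15-4d83-8f37-7dd945b5aa39")
print(uuid_chars)  # 36
```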

Specification

Below is the current specification of ULID as implemented in this repository.

The binary format is implemented.

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
  10chars            16chars
   48bits             80bits

Components

Timestamp

  • 48 bit integer
  • UNIX-time in milliseconds
  • Won't run out of space till the year 10889 AD.
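
A rough check of when a 48-bit millisecond counter overflows. Using the mean Gregorian year (365.2425 days) gives the year 10889, while dividing by a flat 365-day year, which ignores leap days, gives 10895.

```python
MAX_TIMESTAMP_MS = 2**48 - 1

days = MAX_TIMESTAMP_MS / 1000 / 86_400

# Mean Gregorian year, leap days included.
overflow_year = 1970 + int(days / 365.2425)
print(overflow_year)  # 10889

# Flat 365-day years, leap days ignored.
flat_year = 1970 + int(days / 365)
print(flat_year)  # 10895
```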

Randomness

  • 80 bits
  • Cryptographically secure source of randomness, if possible

Sorting

The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set must be used. Within the same millisecond, sort order is not guaranteed.
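
Lexical order matches numeric order because the encoded strings have a fixed width and the alphabet is already in ascending ASCII order. A small sketch that checks this on random 128-bit values (`encode_128` is a hypothetical helper):

```python
import random

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_128(value: int) -> str:
    """Encode a 128-bit integer as a 26-character Crockford Base32 string."""
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


values = [random.getrandbits(128) for _ in range(1000)]
by_number = [encode_128(v) for v in sorted(values)]
by_text = sorted(encode_128(v) for v in values)
print(by_number == by_text)  # True
```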

Encoding

Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse.

0123456789ABCDEFGHJKMNPQRSTVWXYZ
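
Decoding with this alphabet is a matter of shifting in 5 bits per character. The sketch below accepts only the canonical uppercase alphabet; `decode` is a hypothetical helper, and the sample strings and expected values are taken from the usage examples earlier in this document.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
DECODE = {char: index for index, char in enumerate(ALPHABET)}


def decode(text: str) -> int:
    """Decode a canonical Crockford Base32 string to an integer."""
    value = 0
    for char in text:
        # Raises KeyError for I, L, O, U, lowercase, or any other character.
        value = (value << 5) | DECODE[char]
    return value


print(decode("01BJQMF54D"))  # 1497589322893
print(f"{decode('01BJQMF54D093DXEAWZ6JYRPAQ'):032x}")  # 015caf47948d0246deb95cf9a5ec5957
```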

Binary Layout and Byte Order

The components are encoded as 16 octets. Each component is encoded with the Most Significant Byte first (network byte order).

0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32_bit_uint_time_high                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     16_bit_uint_time_low      |       16_bit_uint_random      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       32_bit_uint_random                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
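
The five fields in the diagram map directly onto a big-endian `struct` format. This sketch packs the timestamp and randomness values from the usage examples earlier in this document.

```python
import struct

timestamp_ms = 1497589322893
randomness = bytes.fromhex("0246deb95cf9a5ec5957")

# Split the values into the five big-endian fields from the diagram.
time_high = timestamp_ms >> 16  # upper 32 bits of the 48-bit timestamp
time_low = timestamp_ms & 0xFFFF  # lower 16 bits
random_16, random_32_a, random_32_b = struct.unpack(">HII", randomness)

packed = struct.pack(">IHHII", time_high, time_low, random_16, random_32_a, random_32_b)
print(len(packed))  # 16
print(packed.hex())  # 015caf47948d0246deb95cf9a5ec5957
```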

String Representation

ttttttttttrrrrrrrrrrrrrrrr

where
t is Timestamp
r is Randomness
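
Splitting a string into its two components is a fixed-position slice. `split_ulid` below is a hypothetical helper, shown with a value from the usage examples earlier in this document.

```python
from typing import Tuple


def split_ulid(text: str) -> Tuple[str, str]:
    """Split a 26-character ULID string into (timestamp, randomness) parts."""
    if len(text) != 26:
        raise ValueError("Expected 26 characters; got {}".format(len(text)))
    return text[:10], text[10:]


print(split_ulid("01BJQM7SC7D5VVTG3J68ABFQ3N"))
# ('01BJQM7SC7', 'D5VVTG3J68ABFQ3N')
```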

ulid's People

Contributors

ahawker, kentac55, mfogel, pyup-bot, stefanor, xkortex

ulid's Issues

Update Changelog

Would it be possible to update the changelog with the more recent version enhancements? Specifically, I'm upgrading from 0.0.6 to 0.0.7 and was hoping to get some high-level context. I've looked through the commits, I just wanted to make sure I wasn't missing the forest for the trees on anything.

Problems with using ulid with `mypy --strict`

Hi,
my project is using mypy --strict.
While importing ulid I'm getting a problem:

import ulid

MY_ULID = ulid.new()
error: Module has no attribute "new"

I found a workaround:

import ulid

MY_ULID = ulid.api.new()

But I'm sure the first way is a bit more preferable.

Investigation

I investigated the problem.
The following modified content of __init__.py should fix the problem:

from .api import from_bytes, from_int, from_randomness, from_str, from_timestamp, from_uuid, new, parse
from .ulid import Randomness, Timestamp, ULID


__all__ = [
    # from .api
    'new', 'parse', 'from_bytes', 'from_int', 'from_str', 'from_uuid', 'from_timestamp', 'from_randomness',
    # from .ulid
    'Timestamp', 'Randomness', 'ULID',
]

__version__ = '0.0.14' 

So I explicitly imported the items and explicitly listed them in __all__. This is some code duplication, but it doesn't look fatal to me.

Questions

Q1. Should I create a PR with these changes to __init__.py?

Q2. Should I create a PR to fix all mypy --strict errors for the whole ulid project? The fixes are going to be trivial from my experience. Here is the full list of mypy errors:

ulid\ulid.py:23: error: Function is missing a type annotation
ulid\ulid.py:26: error: Function is missing a type annotation
ulid\ulid.py:39: error: Function is missing a type annotation
ulid\ulid.py:52: error: Function is missing a type annotation
ulid\ulid.py:67: error: Function is missing a type annotation
ulid\ulid.py:82: error: Function is missing a type annotation
ulid\ulid.py:97: error: Function is missing a type annotation
ulid\ulid.py:112: error: Function is missing a return type annotation
ulid\ulid.py:115: error: Function is missing a return type annotation
ulid\ulid.py:118: error: Function is missing a return type annotation
ulid\ulid.py:121: error: Function is missing a return type annotation
ulid\ulid.py:124: error: Function is missing a return type annotation
ulid\ulid.py:127: error: Function is missing a return type annotation
ulid\ulid.py:275: error: Returning Any from function declared to return "Timestamp"
ulid\ulid.py:275: error: Call to untyped function "Timestamp" in typed context
ulid\ulid.py:284: error: Returning Any from function declared to return "Randomness"
ulid\ulid.py:284: error: Call to untyped function "Randomness" in typed context
ulid\api.py:47: error: Returning Any from function declared to return "ULID"
ulid\api.py:47: error: Call to untyped function "ULID" in typed context
ulid\api.py:104: error: Returning Any from function declared to return "ULID"
ulid\api.py:104: error: Call to untyped function "ULID" in typed context
ulid\api.py:124: error: Returning Any from function declared to return "ULID"
ulid\api.py:124: error: Call to untyped function "ULID" in typed context
ulid\api.py:137: error: Returning Any from function declared to return "ULID"
ulid\api.py:137: error: Call to untyped function "ULID" in typed context
ulid\api.py:149: error: Returning Any from function declared to return "ULID"
ulid\api.py:149: error: Call to untyped function "ULID" in typed context
ulid\api.py:198: error: Returning Any from function declared to return "ULID"
ulid\api.py:198: error: Call to untyped function "ULID" in typed context
ulid\api.py:244: error: Returning Any from function declared to return "ULID"
ulid\api.py:244: error: Call to untyped function "ULID" in typed context

API: from_timestamp should support ULID/Timestamp objects.

The from_timestamp function in ulid/api.py supports creating ULID instances with a timestamp from a given value. In addition to the currently supported types, it should also support Timestamp and ULID types.

  • When the given value is a Timestamp, a straight copy of all bytes should suffice.
  • When the given value is a ULID, a straight copy of the first 6 bytes should suffice.
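
On raw bytes, the two cases above reduce to a length check and a slice. This is only a sketch of the proposal; `timestamp_bytes_from` is a hypothetical helper that works on plain bytes rather than on the library's Timestamp and ULID classes.

```python
def timestamp_bytes_from(value: bytes) -> bytes:
    """Return the 6 timestamp bytes from raw Timestamp (6) or ULID (16) bytes."""
    if len(value) == 6:
        return bytes(value)  # Timestamp: copy all bytes
    if len(value) == 16:
        return bytes(value[:6])  # ULID: copy the first 6 bytes
    raise ValueError("Expected 6 or 16 bytes; got {}".format(len(value)))


ulid_bytes = bytes.fromhex("015caf47948d0246deb95cf9a5ec5957")
print(timestamp_bytes_from(ulid_bytes).hex())  # 015caf47948d
```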

Assert on ValueError exception messages

There are many cases where a ValueError can be raised by any number of functions across most of the modules in this package.

I am relatively confident that all of the @pytest.raises(ValueError) calls are correct based on code coverage metrics. However, I was proven wrong today and had to address some of them with #61.

The scope of this task is to go through all tests that use @pytest.raises, capture the exception and perform an additional assertion of the exception message to confirm that we're hitting the exact code path expected.
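
A sketch of what such a test could look like, using the `match` argument of `pytest.raises`. The function `from_bytes` here is a hypothetical stand-in with an invented message, not the library's real function.

```python
import pytest


def from_bytes(value: bytes) -> bytes:
    """Hypothetical stand-in for a constructor that validates its input."""
    if len(value) != 16:
        raise ValueError("Expects bytes to be 128 bits; got {} bytes".format(len(value)))
    return value


def test_from_bytes_rejects_short_input():
    # `match` is a regular expression searched against str(exception), so a
    # ValueError raised from a different code path would fail this test.
    with pytest.raises(ValueError, match=r"Expects bytes to be 128 bits; got 4 bytes"):
        from_bytes(b"\x00" * 4)
```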

Add bounds checking for max timestamp overflow case

We need to add validation for handling the max timestamp value, 2 ^ 48 - 1, 281474976710655. Spec notes are at https://github.com/ulid/spec#overflow-errors-when-parsing-base32-strings

Parsing of the t value in the following example should raise an exception.

>>> import ulid
>>> s = '7ZZZZZZZZZZZZZZZZZZZZZZZZZ'
>>> t = '8ZZZZZZZZZZZZZZZZZZZZZZZZZ'
>>> ulid.parse(s)
<ULID('7ZZZZZZZZZZZZZZZZZZZZZZZZZ')>
>>> ulid.parse(t)
<ULID('0ZZZZZZZZZZZZZZZZZZZZZZZZZ')>
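
A minimal sketch of such a check, decoding the first ten characters and comparing against the 48-bit maximum. `decode_timestamp` is a hypothetical helper, and raising `ValueError` is an assumption about the desired behavior.

```python
MAX_TIMESTAMP = 2**48 - 1  # 281474976710655
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def decode_timestamp(text: str) -> int:
    """Decode the 10-character timestamp prefix, rejecting values over 48 bits."""
    value = 0
    for char in text[:10]:
        value = value * 32 + ALPHABET.index(char)
    if value > MAX_TIMESTAMP:
        raise ValueError("Timestamp overflows 48 bits: {!r}".format(text[:10]))
    return value


print(decode_timestamp("7ZZZZZZZZZZZZZZZZZZZZZZZZZ"))  # 281474976710655
```

Since ten Base32 characters carry 50 bits, this is equivalent to requiring that the first character be 7 or lower.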

Values for range queries

To do range selection on time with ULIDs one needs to generate values with the lowest/highest possible randomness.

While this is doable with some effort, I feel it should be offered by the API. For example:

ulid.from_timestamp(timestamp, randomness=ulid.MIN_RANDOM)
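
Until something like this exists in the API, the bounds can be built as strings. The sketch below is one way to do it; `time_range_bounds` is a hypothetical helper, not part of the library.

```python
from typing import Tuple

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def time_range_bounds(timestamp_ms: int) -> Tuple[str, str]:
    """Return the lowest and highest ULID strings for one millisecond."""
    chars = []
    for _ in range(10):
        chars.append(ALPHABET[timestamp_ms & 0x1F])
        timestamp_ms >>= 5
    prefix = "".join(reversed(chars))
    # All-zero randomness is "0" * 16; all-one randomness is "Z" * 16.
    return prefix + "0" * 16, prefix + "Z" * 16


low, high = time_range_bounds(1497589322893)
print(low)   # 01BJQMF54D0000000000000000
print(high)  # 01BJQMF54DZZZZZZZZZZZZZZZZ
```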

ULID.hex skips leading zero

ulid-py 1.1.0

The .hex attribute does not correctly pad to 32 characters. It skips the leading zero, giving a len-31 string (33 with the 0x).

import ulid
import binascii 

u = ulid.from_randomness(0)
print(len(u.hex))
print(u.hex)
print(f"0x{binascii.hexlify(u.bytes).decode()}")

Out:

33
0x17b0c9d5b3b00000000000000000000
0x017b0c9d5b3b00000000000000000000
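
The difference comes down to how the integer is formatted. `hex()` drops leading zeros, while a fixed-width format specifier keeps all 32 digits:

```python
value = 0x017B0C9D5B3B00000000000000000000

unpadded = hex(value)
padded = f"0x{value:032x}"

print(unpadded)  # 0x17b0c9d5b3b00000000000000000000
print(padded)    # 0x017b0c9d5b3b00000000000000000000
```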

Fix non-ascii character tests

There are a number of tests based off the invalid_str_encoding fixture that are passing but the assert is being fulfilled by an incorrect code path.

  • Investigate best way to assert against exception message (pattern matching hopefully)
  • Fix tests and assert correct exception from expected code path

Fix CI (Replace Travis)

Travis CI is no longer free for open source projects. Switch to CircleCI or GitHub Actions, or move everything to AppVeyor.

Document how to get the next ULID

If you receive a ULID from some external source (e.g. a database) you might want to compute the next following ULID. This is useful for range-style queries where you are trying to retrieve every item after the aforementioned ULID. The library already does so internally to provide monotonic values but it's not entirely clear how to get the monotonically "next" ULID, given another one.

Example:

prev = ulid.parse(some_str)  # From external source
next = ...  # ???

I was playing with ulid.create but I couldn't quite figure it out. It seems that what we want is to bump the randomness by one and, if that overflows, bump the timestamp by one.

A ULID.next method would be really nice.
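
Working on the 128-bit integer form, the carry from randomness into timestamp happens automatically when adding one. The sketch below shows the idea; `next_ulid_int` is a hypothetical helper, not an existing method of the library.

```python
MAX_ULID = 2**128 - 1


def next_ulid_int(value: int) -> int:
    """Return the integer of the next ULID in sort order."""
    if not 0 <= value < MAX_ULID:
        raise OverflowError("No ULID follows the given value")
    # If the low 80 randomness bits are all ones, the addition carries
    # into the 48-bit timestamp held in the high bits.
    return value + 1


timestamp_ms = 1497589322893
last_in_ms = (timestamp_ms << 80) | (2**80 - 1)
print(next_ulid_int(last_in_ms) == (timestamp_ms + 1) << 80)  # True
```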

Add Performance Benchmarking

I did some very basic work with pytest-benchmark during development. However, a more complete and robust set of performance tests for common API calls/flows should be written.

Completeness Criteria:

  • New benchmark module, say test_performance.py.
  • Use pytest groups on the module or filter it out so make test doesn't always run it.
  • Add make benchmark or some similar target to execute them.
  • Add as a new tox and Travis CI target.
  • Pick a stable machine to run benchmarks and add a baseline to the README.

Full Windows Support

This issue should track related work for making Windows a first class citizen for this package.

  • All tests passing
  • CI/CD pipeline via AppVeyor
  • Attempt to merge travis* commands in Makefile into generalized ci* commands

Remove "development" requirements from base.txt

Currently the requirements/base.txt requirements file (for runtime) contains dependencies that are only useful for a development/deployment environment. These should be broken out into a separate file.

deepcopy doesn't work on ULID object

Running into a cryptic error when trying to deepcopy a ULID object:

>>> import ulid
>>> a = ulid.new()
>>> a
<ULID('01EAZF1038723PE2SS9BXRQC80')>
>>> import copy
>>> copy.deepcopy(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 173, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 147, in deepcopy
    y = copier(x, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 211, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 211, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 147, in deepcopy
    y = copier(x, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/ashu/.pyenv/versions/3.8.2/lib/python3.8/copy.py", line 162, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'memoryview' object

Any ideas?
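
The traceback suggests the object holds a memoryview attribute, which the copy module cannot pickle. One possible workaround, sketched on a hypothetical stand-in class `View` rather than the library's own classes, is to define `__deepcopy__` so the underlying bytes are copied instead:

```python
import copy


class View:
    """Hypothetical stand-in for a class that wraps a memoryview."""

    def __init__(self, buffer: bytes) -> None:
        self.memory = memoryview(buffer)

    def __deepcopy__(self, memo: dict) -> "View":
        # memoryview objects cannot be pickled or deep-copied, so copy the
        # underlying bytes and wrap them in a fresh instance instead.
        return type(self)(self.memory.tobytes())


original = View(b"\x01\x02\x03")
clone = copy.deepcopy(original)
print(clone.memory.tobytes() == original.memory.tobytes())  # True
print(clone is original)  # False
```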

API: from_randomness should support ULID/Randomness objects

The from_randomness function in ulid/api.py supports creating ULID instances with a randomness value from a given value. In addition to the currently supported types, it should also support Randomness and ULID types.

  • When the given value is a Randomness, a straight copy of all bytes should suffice.
  • When the given value is a ULID, a straight copy of the last 10 bytes should suffice.

Address reported pylint issues

Either fix the reported issue or explicitly add an ignore to silence the warning if it's "written as intended".

Issues:

hawker@mbp:~/src/github.com/ahawker/ulid|master⚡
⇒  make lint
************* Module ulid
W: 11, 0: Wildcard import api (wildcard-import)
W: 13, 0: Wildcard import ulid (wildcard-import)
************* Module ulid.api
C: 21, 0: Invalid constant name "TimestampPrimitive" (invalid-name)
C: 27, 0: Invalid constant name "RandomnessPrimitive" (invalid-name)
************* Module ulid.hints
C: 12, 0: Invalid constant name "Buffer" (invalid-name)
************* Module ulid.ulid
C:191, 8: Invalid variable name "ms" (invalid-name)
make: *** [lint] Error 20

Non-Crockford's Base32 letters converted differently in Java or Python implementations

Hi Andrew,

first of all, thanks for the amazing library, we've been using it a lot!

I have a question about how ULIDs that do not follow the Crockford's Base32 standard are converted.

We are using Lua to generate some guids (https://github.com/Tieske/ulid.lua) and for some reason, we get from time to time letters outside the Crockford's Base32.
While trying to fix this on our side (we're not sure how this is happening to be honest), we realised that the Java and Python implementations silently correct this issue in different ways:

Java

ULID.Value ulidValueFromString = ULID.parseULID("01BX73KC0TNH409RTFD1JXKmO0")
--> "01BX73KC0TNH409RTFD1JXKM00"

mO is silently converted into M0

Python

In [1]: import ulid

In [2]: u = ulid.from_str('01BX73KC0TNH409RTFD1JXKmO0')

In [3]: u
Out[3]: <ULID('01BX73KC0TNH409RTFD1JXKQZ0')>

In [4]: u.str
Out[4]: '01BX73KC0TNH409RTFD1JXKQZ0'

mO is silently converted into QZ

Shouldn't the Python library behave like the Java one, as per the Crockford's Base32 spec, converting L and I to 1 and O to 0, and only uppercasing lowercase letters instead of changing them?

Thanks a lot in advance!

Eddie
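
The normalization asked for here can be sketched in a few lines. `normalize` is a hypothetical helper that applies the aliases mentioned in the question (I and L to 1, O to 0) after uppercasing; it is not what the library does today.

```python
# Aliases from the question above: I and L read as 1, O reads as 0.
ALIASES = str.maketrans("ILO", "110")


def normalize(text: str) -> str:
    """Uppercase the input and map ambiguous letters to their digits."""
    return text.upper().translate(ALIASES)


print(normalize("01BX73KC0TNH409RTFD1JXKmO0"))  # 01BX73KC0TNH409RTFD1JXKM00
```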

No module named 'ulid.api'

Starting with ulid 0.2.0, I get this error when I simply try to install the library

(venv) ➜  pip install ulid-py==1.0.0               
Collecting ulid-py==1.0.0
  Using cached https://files.pythonhosted.org/packages/3f/9e/deba154963e4eb00cd31b60f35329359dcbf8ad34a01371c10f32faf3867/ulid_py-1.0.0-py2.py3-none-any.whl
Installing collected packages: ulid-py
  Found existing installation: ulid-py 0.1.0
    Uninstalling ulid-py-0.1.0:
      Successfully uninstalled ulid-py-0.1.0
Successfully installed ulid-py-1.0.0
(venv) ➜  python3 <<< "import ulid" 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.8/site-packages/ulid/__init__.py", line 10, in <module>
    from .api import default, microsecond, monotonic
ModuleNotFoundError: No module named 'ulid.api'

Also, when I download the package from PyPI and unpack it, there is no api folder inside

Add mypy support

This package was written with type hints (PEP484) so it should perform some static analysis checks on build.

  • Add make target for invoking checks
  • Add tox target?
  • Add pytest support?
  • Add TravisCI support

Properly handle invalid base32 characters

As of today, it is possible to input non-base32 characters, uU for example, into any of the api calls.

Doing this will cause the library to fail silently and perform an incorrect base32 decode on the string.

The API should provide a feedback mechanism that informs the caller of the bad input. The implementation of that feedback is still TBD (separate API call vs. exception vs. ??).

Considerations:

  • Performance of this computation for every decode call?
  • Double penalty for callers that have already made this guarantee?
  • Separate API call to validate? Are there use cases for this outside of the normal hot path?
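
As one of the options above, a separate validation call that raises an exception could look like the following sketch. The name `validate_base32` and the message format are hypothetical.

```python
ALPHABET = frozenset("0123456789ABCDEFGHJKMNPQRSTVWXYZ")


def validate_base32(text: str) -> None:
    """Raise ValueError if the string contains non-base32 characters."""
    # A set difference avoids a per-character Python loop on the valid path.
    invalid = sorted(set(text) - ALPHABET)
    if invalid:
        raise ValueError("Non-base32 characters: {}".format("".join(invalid)))


validate_base32("01BX73KC0TNH409RTFD1JXKM00")  # passes silently
```

Keeping it as a separate call lets callers that already guarantee valid input skip the check.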

Read the Docs Support

The codebase is relatively well covered with comments and docstrings. We need to get the repository hooked up to an online documentation source, likely Read the Docs, and get the API documentation updating as part of the build/release process.

API: Add from_* style method

Currently, the API exposes multiple methods for creating ulid.ULID instances from other data types. However, it does not support a "catch all" call that attempts to make the determination based on type and requires the caller to do that.

Let's imagine that a user of the library has read an input value from somewhere that they have a relatively high confidence is a ULID. However, they don't know the format in which it was stored. In order to support this mechanism, the user of the library needs to write the following code:

if isinstance(value, bytes):
    return ulid.from_bytes(value)
if isinstance(value, int):
    return ulid.from_int(value)
if isinstance(value, str):
    return ulid.from_str(value)
if isinstance(value, uuid.UUID):
    return ulid.from_uuid(value)

raise ValueError('Cannot create ULID from type {}'.format(value.__class__.__name__))

This is pretty verbose, especially since we could hide this logic inside the library in a separate API call itself. It will be slightly slower than calling the correct method directly, since we have to run the if/else tree every time and don't know the "hot path", but should be helpful for this scenario.

Potential thoughts:

  • from_(value)
  • from_value(value)
  • from_obj(value)
  • from_unknown(value)
  • parse(value)
  • decode(value)
  • load(value)

Use random.randbytes() instead of os.urandom()

[mmarkk@asus home]$ python -m timeit -s 'import random' 'random.randbytes(8)'
5000000 loops, best of 5: 93.9 nsec per loop
[mmarkk@asus home]$ python -m timeit -s 'import os' 'os.urandom(8)'
1000000 loops, best of 5: 248 nsec per loop

Datetime objects are naive

ulid.timestamp().datetime returns a naive datetime object (lacking time zone information), but yet the time is in UTC.

A naive datetime is ambiguous. Can the datetime be made aware by explicitly attaching the UTC time zone? The datetime module documentation has reasons why it is preferred to use aware datetimes to represent times in UTC.
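
The difference between the two kinds of object is easy to show with the standard library. The timestamp value below is taken from the usage examples earlier in this document.

```python
import datetime

timestamp_ms = 1497589322893

# Passing tz attaches the UTC time zone, producing an aware datetime.
aware = datetime.datetime.fromtimestamp(timestamp_ms / 1000, tz=datetime.timezone.utc)

# Dropping tzinfo gives the same wall-clock fields with no zone attached.
naive = aware.replace(tzinfo=None)

print(aware.tzinfo)  # UTC
print(naive.tzinfo)  # None
```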

Enforce ULID Timestamp Range

A number of the test fixtures that generate data are powered by os.urandom. This works fine until it generates a random sequence of bytes that starts with a leading zero. This causes tests to fail because int.bit_length ignores leading zeros in its computation.

Example test failure: https://travis-ci.org/ahawker/ulid/jobs/294263189

All of the above is a side-effect of the fact that there is no validation logic for the timestamp portion of a ULID. It should never contain a zero leading byte since the minimum value is the Unix epoch.

Items to address this issue:

  • Validation rules to enforce minimum and maximum timestamp values upon creation
  • Update test fixtures to specifically generate values within valid or invalid ranges

Example:

>>> import ulid
>>> data = b"\x00\xcdh\x95}\xd9\xb2Yp':y0\xe4\xce\xdc"
>>> ulid.from_bytes(data)
<ULID('00SNM9AZESP9CQ09STF4RE9KPW')>
>>> ulid.from_int(int.from_bytes(data, byteorder='big'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hawker/src/github.com/ahawker/ulid/ulid/api.py", line 76, in from_int
    raise ValueError('Expects integer to be 128 bits; got {} bytes'.format(length))
ValueError: Expects integer to be 128 bits; got 15 bytes
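
The mismatch can be reproduced without the library. Deriving a byte count from `int.bit_length` undercounts whenever the leading byte is zero, while converting with an explicit width does not:

```python
data = b"\x00\xcdh\x95}\xd9\xb2Yp':y0\xe4\xce\xdc"
value = int.from_bytes(data, byteorder="big")

# bit_length ignores leading zero bits, so the byte count comes up short.
length = (value.bit_length() + 7) // 8
print(length)  # 15

# Converting with an explicit width restores the leading zero byte.
restored = value.to_bytes(16, byteorder="big")
print(restored == data)  # True
```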

Backport to Python 2.7?

Here are some initial thoughts; this is definitely an incomplete list of the changes necessary.

  • Switch hard coded bytes to be configurable to str.
  • Loss of int.to_bytes() and int.from_bytes().
  • Loss of datetime.timestamp()
  • Differences between memoryview and buffer?
