@bradleypeabody, @kyzer-davis,
This is a continuation of my comment on PR #10
I thought of another approach. Instead of treating the integer part and the fractional part separately, the entire timestamp can be encoded in a fixed-point number.
I imagined another structure that reserves the last 32 bits for random bits or application-specific data. I'm not sure if anyone will need more than 90 bits for the timestamp. A field with 38 bits for the integer part and 52 bits for the fractional part is enough store a decimal number with 15 digits after the decimal point, like this: 0.123456789012345.
The structure described below also works with 48 bits reserved for milliseconds, requiring very small changes. By the way, I think that 48 bits for milliseconds may be more convenient in contexts where the best precision you can get is milliseconds, as in Java 8, because you don't need to encode the sub-millisecond part.
Sorry for sending spam to your project. I am very happy with the progress of this RFC update. I'm just trying to contribute some ideas, and I don't expect them to be accepted.
Structure with 90 bits for timestamp
This proposed structure consists of two fields: unixtime and random. The only mandatory requirements are to fill in the first 38 bits (48 bits for ms is better?) of the unixtime
field with the current seconds and fill in the 6 bits reserved for version and variant with their standardized values. The number of sub-second bits in the unixtime
field is variable/arbitrary. And the random
field does not have to be filled with random bits. The name of this field is just a hint that its value by default is random.
unixtime: a 90-bit field reserved for the Unix timestamp. This field is encoded as a fixed-point number in the form fixed<38,n>
, where 38
is the number of bits before the binary point, and n
is the number of bits after the binary point. The size n
is arbitrary so that it can be defined by the system that generates the UUID. The bits not used in this field can be filled with a sequence counter and/or some random bits, for example. The timestamp can represent dates up to the year 10680 A.D. (2^38/60/60/24/365.25 + 1970).
random: a 32-bit field reserved for random bits or application-specific data. Part or all of this field can be used, for example, to identify the UUID generator. This field is opaque, so its bits have no meaning, except for the system that generates the UUID.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unixtime |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unixtime | ver | unixtime |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| unixtime |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| random |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Encoding and decoding the sub-second part
Say you have a fractional value between 0 and 1 represented as f
, you can encode that fractional value with the precision of n
bits like this:
To decode a subsec
with n
precision you do the inverse operation:
This tutorial describes an efficient method for converting floating-point to fixed-point in MATLAB: Fixed-Point.pdf.
Generic algorithm to generate a UUIDv7
- Get the current Unix second and sub-second;
- Fill the first 38 bits of the UUID with the current Unix seconds;
- If the sub-second is not a fraction between 0 and 1, divide it by the time precision provided by the system. If the precision is millisecond, divide the sub-second value by 1,000. If the precision is microsecond, divide it by 1,000,000 and so on.
- Multiply the fraction of sub-seconds by
2^n
, where n
is the number of bits that fits to the system time precision. If the precision is millisecond, n
is equal to 10 (2^10 = 1,024). If the precision is microsecond, n
is equal to 20 (2^20 = 1.048.576) and so on;
- Fill in the next
n
bits of the UUID with the n
bits of the previous multiplication product;
- Fill the rest of the bits with random data;
- Insert the version and the variant bits in their standard positions using bitwise operations;
- Return the UUID.
Generic algorithm to extract the unix second
- Extract the first 38 bits of the UUID;
- Return the resulting unsigned integer.
Generic algorithm to extract the sub-second fraction
- Extract the first
n
bits starting from the 38th bit position of the UUID, where n
is the number of bits that fits the system time precision;
- Divide the integer value obtained by
2^n
to get a fractional value;
- Return the resulting fractional value.
To obtain a floating-point representation of the timestamp you can sum the two parts.
An implementation in python
Yesterday I implemented this simple python generator to test the concept:
#!/usr/bin/python3
#
# @date: 2021-03-06
# @author: Fabio Lima
#
# Changelog:
# - 2021-03-16: add parameter `subsec_decimal_digits` to function `uuid7()`
# - 2021-03-08: use `time.time_ns()` instead of `time.time()` to fix nanosecond precision
#
from uuid import UUID
from time import time_ns
from random import randint
TOTAL_BITS=128
VERSION_BITS = 4
VARIANT_BITS = 2
# Binary digits before the binary point
SEC_BITS=38
# Binary digits after the binary point
SUBSEC_BITS_S=0
SUBSEC_BITS_MS=10
SUBSEC_BITS_US=20
SUBSEC_BITS_NS=30
SUBSEC_BITS_DEFAULT=SUBSEC_BITS_NS
# Decimal digits after the decimal point
SUBSEC_DECIMAL_DIGITS_S=0 # 0
SUBSEC_DECIMAL_DIGITS_MS=3 # 0.999
SUBSEC_DECIMAL_DIGITS_US=6 # 0.999999
SUBSEC_DECIMAL_DIGITS_NS=9 # 0.999999999
SUBSEC_DECIMAL_DIGITS_DEFAULT=SUBSEC_DECIMAL_DIGITS_NS
SLICE_MASK_0 = 0xffffffffffff00000000000000000000
SLICE_MASK_1 = 0x0000000000000fff0000000000000000
SLICE_MASK_2 = 0x00000000000000003fffffffffffffff
def uuid7(t=None, subsec_bits=SUBSEC_BITS_DEFAULT, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
if t == None:
t = time_ns()
i = get_integer_part(t)
f = get_fractional_part(t, subsec_decimal_digits)
sec = i
subsec = round(f * (2 ** subsec_bits))
node_bits = (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS - subsec_bits)
uuid_sec = sec << (subsec_bits + node_bits)
uuid_subsec = subsec << node_bits
uuid_node = randint(0, (2 ** node_bits))
uuid_int = uuid_sec | uuid_subsec | uuid_node # 122 bits
uuid_int = __add_version__(uuid_int) # 128 bits
return UUID(int=uuid_int)
def uuid7_s(t=None):
return uuid7(t, SUBSEC_BITS_S, SUBSEC_DECIMAL_DIGITS_S)
def uuid7_ms(t=None):
return uuid7(t, SUBSEC_BITS_MS, SUBSEC_DECIMAL_DIGITS_MS)
def uuid7_us(t=None):
return uuid7(t, SUBSEC_BITS_US, SUBSEC_DECIMAL_DIGITS_US)
def uuid7_ns(t=None):
return uuid7(t, SUBSEC_BITS_NS, SUBSEC_DECIMAL_DIGITS_NS)
def __add_version__(uuid_int):
slice_mask_0 = SLICE_MASK_0 >> (VERSION_BITS + VARIANT_BITS)
slice_mask_1 = SLICE_MASK_1 >> (VARIANT_BITS)
slice_mask_2 = SLICE_MASK_2
slice_0 = (uuid_int & slice_mask_0) << (VERSION_BITS + VARIANT_BITS)
slice_1 = (uuid_int & slice_mask_1) << (VARIANT_BITS)
slice_2 = (uuid_int & slice_mask_2)
uuid_output = slice_0 | slice_1 | slice_2
uuid_output = uuid_output & 0xffffffffffff0fff3fffffffffffffff # clear version
uuid_output = uuid_output | 0x00000000000070008000000000000000 # apply version
return uuid_output
def __rem_version__(uuid_int):
slice_0 = (uuid_int & SLICE_MASK_0) >> (VERSION_BITS + VARIANT_BITS)
slice_1 = (uuid_int & SLICE_MASK_1) >> (VARIANT_BITS)
slice_2 = (uuid_int & SLICE_MASK_2)
uuid_output = slice_0 | slice_1 | slice_2
return uuid_output
def get_integer_part(t):
SUBSEC_DECIMAL_DIGITS_PYTHON=9
subsec_decimal_divisor = (10 ** SUBSEC_DECIMAL_DIGITS_PYTHON)
return int(t / subsec_decimal_divisor)
def get_fractional_part(t, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
SUBSEC_DECIMAL_DIGITS_PYTHON=9
subsec_decimal_divisor = (10 ** SUBSEC_DECIMAL_DIGITS_PYTHON)
return round((t % subsec_decimal_divisor) / subsec_decimal_divisor, subsec_decimal_digits)
def extract_sec(uuid):
uuid_int = __rem_version__(uuid.int)
uuid_sec = uuid_int >> (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS)
return uuid_sec
def extract_subsec(uuid, subsec_bits=SUBSEC_BITS_DEFAULT, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
uuid_int = __rem_version__(uuid.int)
node_bits = (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS - subsec_bits)
uuid_subsec = (uuid_int >> node_bits) & ((1 << (subsec_bits)) - 1)
return round(uuid_subsec / (2 ** subsec_bits), subsec_decimal_digits)
def list():
print("UUIDv7 sec in sec out subsec in subsec out")
for i in range(10):
t = time_ns()
u = uuid7(t)
i = get_integer_part(t)
f = get_fractional_part(t)
sec = extract_sec(u)
subsec = extract_subsec(u)
print(u, str(i).ljust(12), str(sec).ljust(12), str(f).ljust(12), str(subsec).ljust(12))
list()
OUTPUT:
UUIDv7 sec in sec out subsec in subsec out
018145e0-5fc5-7a65-ae82-507fbfe22749 1615951895 1615951895 0.943017418 0.943017418
018145e0-5fc5-7bad-b660-9ddd65978a4c 1615951895 1615951895 0.943095648 0.943095648
018145e0-5fc5-7c84-a49c-a9797ed70dc4 1615951895 1615951895 0.943146842 0.943146842
018145e0-5fc5-7d46-b435-4c76da9142c5 1615951895 1615951895 0.943193153 0.943193153
018145e0-5fc5-7de2-8716-009c3130d332 1615951895 1615951895 0.943230178 0.943230178
018145e0-5fc5-7e74-9231-05a35740fba0 1615951895 1615951895 0.943265028 0.943265028
018145e0-5fc5-7f4a-8022-e1ba32698bc3 1615951895 1615951895 0.943315983 0.943315983
018145e0-5fc5-7fd5-a568-b1d7bc223a91 1615951895 1615951895 0.943349262 0.943349262
018145e0-5fc6-7062-8b99-ba3f383d9b8e 1615951895 1615951895 0.943382783 0.943382783
018145e0-5fc6-70ed-98ce-fa5bf04836d8 1615951895 1615951895 0.943415972 0.943415972
Python time precision
The first version of the generator above used time.time()
. This method appears to return floating point numbers with 22 bits after the binary point. I had to replace it with time.time_ns()
to fix the nanosecond precision. If you need nanosecond precision, don't use time.time()
.
This loop prints the sub-seconds returned by time.time()
in binary format:
from time import time
for i in range(10):
n = 30 # system time precision
t = time() # get the current time
i = int(t) # integer part
f = t - int(t) # fractional part
# encode the fractional part to an integer
subsec = int(f * (2**n))
# print the result in binary format
print(bin(subsec))
This is the output of the previous loop:
|---------30 bits-----------|
0b10011010010000101110100000000
0b10011010010010000100100000000
0b10011010010010100100000000000
0b10011010010011010001000000000
0b10011010010011111000100000000
0b10011010010100001100100000000
0b10011010010100011111000000000
0b10011010010100110001100000000
0b10011010010101000011100000000
0b10011010010101010110000000000
|-------------------|-------|
empty
Note that the last 8 bits are zeros.