Giter Site home page Giter Site logo

uuid6-ietf-draft's Introduction

Updates

Name:     draft-ietf-uuidrev-rfc4122bis
Revision: 14
Title:    Universally Unique IDentifiers (UUID)
Date:     2023-11-06
Group:    uuidrev
Pages:    58
URL:      https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.txt
Status:   https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/
HTML:     https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.html
HTMLized: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
Diff:     https://author-tools.ietf.org/iddiff?url2=draft-ietf-uuidrev-rfc4122bis-14

New UUID Formats

This is the GitHub repo for the IETF draft surrounding the topic of new UUID formats. Various discussion will need to occur to arrive at a standard and this repo will be used to collect and organize that information.

High Level Overview

  1. UUID version 6: A re-ordering of UUID version 1 so it is sortable as an opaque sequence of bytes. Easy to implement given an existing UUIDv1 implementation.

    time_high|time_mid|time_low_and_version|clk_seq_hi_res|clk_seq_low|node

  2. UUID version 7: An entirely new time-based UUID bit layout sourced from the widely implemented and well known Unix Epoch timestamp source.

    unix_ts_ms|ver|rand_a|var|rand_b

  3. UUID version 8: A free-form UUID format which has no explicit requirements except maintaining backward compatibility.

    custom_a|ver|custom_b|var|custom_c

  4. Max UUID: A specialized UUID which is the inverse of the Nil UUID from RFC4122.

    FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF


RFC Scope

In order to keep things on track the following topics have been decided as in-scope or out of scope for this particular RFC. For more information on any of these items refer to the XML, TXT, HTML draft, research and the issue tracker for a particular discussion (follow hyperlinks below.)

In Scope Topics - UUID Generation

In Scope Topics - UUID Best Practices as it relates to the previous topics

  • Global and Local Uniqueness (collision resistance mechanisms)
  • Unguessability
  • Sorting/Ordering techniques
  • Storage and Opacity best practices
  • Big Endian vs Little Endian bit layout
  • Any and all UUID security concerns!

Out of Scope Topics (Rolled into a new Draft that can be found here: New UUID Encoding Techniques)

Out of Scope Topics


Out of Scope Topics (as as the result of discussion threads)

Out of Scope Topics (for backwards compatibility)


Contributing

  • The XML draft in the root folder is the most recent working draft for re-submission to the IETF.
    • An HTML and Textual (.txt) RFC representation will be provided in the root folder to ease reader input and discussion.
    • Older drafts are available for view here or on the IETF Datatracker.
  • The RFC Draft utilize an XML formatted document that follows RFC7742 markup. All XML changes MUST follow this format and pass conversion to .txt and .html via https://xml2rfc.tools.ietf.org/
  • Utilize the issue tracker to discuss topics, solutions, problems, typos and anything else.
    • Where possible contribute to an existing Discussion Thread vs creating a new thread.
    • Reviewing is the pre-Draft 01 Research efforts is encouraged before diving into discussion threads.
    • New threads that propose alternative text SHOULD utilize Proposed Draft Change GitHub issue template to ensure proper information is captured for the draft authors.
    • Be civil!
  • Pull requests will be accepted as long as the text is concise, clear and objective.
    • PRs will not be accepted for changes to the decision made for the draft without full discussion.
    • PRs MUST include the updated .xml and xml2rfc generated .txt and .html documents.
    • Draft versions are frozen until submission to the IETF; at which point new work constitutes a new draft version.

Prototyping

Remember first and foremost that this specification is still a draft. Breaking changes are to be expected. Prototypes SHOULD only be implemented to verify or discredit topics of the draft text.

uuid6-ietf-draft's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

uuid6-ietf-draft's Issues

Discussion: Unix Timestamp Granularity in UUIDv7

Question: Too many or too much?

Current Draft-02 implementation:

  • 36 bits to avoid the year 2038 problem.
  • If 32 bit timestamp, pad left/start of 32 bit to create 36 bit value in the UUIDv7
  • if 64 bit timestamp, truncate to last, right-most 36 bits for use in UUIDv7
  • 36 bits provides timestamp values up to the year 4147

Alternatives Proposed:

  • 34 bit timestamp, giving 2 bits back to entropy/node/randomness, valid up to the year 2514 source

Discussion: Max UUID (ffffffff-ffff-ffff-ffff-ffffffffffff)

DEL (named after ASCII) obviously isn't the best name, but how about reserving the UUID that has all 128 bits set to one, in addition to the Nil UUID? The primary use case of the Del UUID is as a sentinel when processing time-ordered UUID versions, though I don't believe the new RFC needs to specify the semantics of the reserved Nil and Del. For unordered UUIDs, the single Nil value may suffice, but now that the UUID spec gets ordered versions, it should be a good timing to reserve the other end.

At first, I though it was a good idea to invalidate the both ends (MIN and MAX) of UUIDv7 values for the purpose of sentinel, but on second thought after reading the Nil UUID discussion in #58, I believe just reserving the Del UUID is much simpler.

This change is not supposed to break existing applications because 0b111 var has been reserved for future definitions by the original RFC.

v7: Proposal: Incorporate counter into "subsecond time", explicit layouts for "subsecond time" field width

There's a lot to like about Draft 2, version 7 UUIDs. They solve many of the main issues people have raised in version 1. I especially appreciate the fractional binary format for subsecond values. However there are two issues that continue to nag at me.

Problem 1: The layout of time/counter bits .vs. random bits is ambiguous. For example, if you examine the 3 example layouts (Draft 2, §4.4.4.1), you'll note that the high order bits of the subsec_seq_node field may be used for subsecond time and seq information (Fig 4 & 5), or simply set randomly (Fig 3). Unfortunately there's no way of distinguishing which is which. This may prove problematic where such IDs become comingled and need to be sorted or sharded within a database.

Problem 2: Dedicating a fixed number of bits to monotonically increasing state (subsec_a and subsec_b fields) leads to endless debate about whether or not the size of those fields is insufficient, sufficient, or excessive. Use cases that require low rates of UUID creation result in wasted bits (bits that rarely change), while cases with high creation rates result in possible counter overflow issues. We have seen arguments from both sides of this problem. (I suspect most of us have argued both sides of this problem!)

Proposal:

[Note: This is highly derivative of the current Draft 2 proposal and incorporates ideas that have been floated in other comments elsewhere. Please forgive any verbiage that may sound like I'm claiming credit for any of this as being particularly original.]

There are two parts to this proposal...

Part 1: This is more a matter of semantics, but I believe being explicit about this will simplify both the spec, and the discourse we're having around it. Basically I'd like to take §4.2.1.2 of the original RFC where it talks about how a counter may be used to populate the low-order bits of the timestamp and generalize it thusly:

"Subsecond time" refers to a combination of two pieces of state. The high order bits MUST be obtained from the subsecond state provided by the system clock (expected to be at least millisecond precision). Any remaining low order bits MUST be obtained from a count of the number of IDs generated in the current system clock time interval. The distribution of bits between system clock vs UUID counter will be implementation dependent.

(Where exactly this goes in the spec is TBD, but maybe replace section 4.4.2 of Draft 2...?)

The idea here is that by explicitly incorporating the notion of a "counter" into "subsecond time", we can do away with debate about how many bits are / aren't allocated to seq or counter fields and, instead, focus on the rate at which IDs may be generated which is, after all, what people really care about.

However, in order to do that we need the second part of this proposal...

Part 2: tl:dr; Define a handful of "subsecond time" precisions and layouts that allow for a reasonable range of creation rates, and identify these layouts using a [new] "pr" field. The pr field can be small (two bits), and will allow implementors to select a layout most appropriate for their use case, thereby maximizing how the the remaining bits are distributed between subsecond time and random entropy. Specifically, if pr is a two-bit field, we can use it to define the following layouts:

pr subsecond field size (max ids per second) Bit allocation examples (system clock : counter)
0b00 10 bits (1024) 10 : 0 (msec clock, no counter)
0 : 10 (1-second clock, 10-bit counter)
0b01 20 bits (1 x 106) 10 : 10 (msec clock, 10-bit counter)
20 : 0 (usec clock, no counter)
0b10 30 bits (1.1 x 109) 10:20 (msec clock, 20-bit counter
20: 10 (usec-clock, 10-bit counter)
0b11 40 bits (1.1 x 1012) 20:20 (usec clock, 20-bit counter)
30: 10 (nsec-clock, 10-bit counter)

Location of the pr field is TBD (and I'm not sure how important this decision is), but immediately following the var field seems like an obvious choice. This would allow for the following layouts:

Layout 0b00 (10-bit subsec field)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |       subsec      |ran|  ver  | ...dom                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|0 0|                     rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Layout 0b01 (20-bit subsec field)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |       subsec          |  ver  |       subsec  | rand  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|0 1|                     rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Layout 0b10 (30-bit subsec field)

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |       subsec          |  ver  |       subsec          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|1 0|subsec     |   rand    |                 rand          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Layout 0b11 (40-bit subsec field)

 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |       subsec          |  ver  |       subsec          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|1 1|       subsec                  |          rand         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

[NOTE: subsec field here uses fractional-binary format as described in Draft 2]

Summary

Draft 2 Version 7 fixes the number of dedicated subsecond time bits at 24 (subsec_a and subsec_b). This allows for a generation rate of ~4M uuids/second. That's pretty much it. Creating IDs faster than that requires setting state in the subsec_seq_node field, which becomes meaningless as soon as the ids in question are comingled with IDs that may have been generated with other layouts. And in cases where IDs are created slowly (< 1 / msec), 12 bits go unused, resulting in only 62 bits of entropy to help avoid collisions.

Contrast to this proposal where the fast case allows for 10 trillion uuids/second (pr = 0b11) with 46 bits of entropy. And, in the slow case (pr = 0b00), allows for 1000 uuids /second and 74 bits of entropy. And has the added benefit of allowing recipients of IDs to identify exactly which bits are and are not monotonic in nature.

Section 3.3. - on random least significant bits

In the current draft the section 2.3 says that the random component of 64 bits may be:

(a) filled with random bits; or
(b) filled with application defined values for uniqueness.

I have some considerations about it:

(0) I would like to remember that to be consistent with the other RFC-4122 versions, the first 2 bits of the random component should be set to ONE and ZERO, respectively.

(1) For the item (a) I propose that, in the case that more than one UUID v6 is generated within the same timestamp, the random component is incremented by 1. This is the approach adopted by the ULID specification*.

(2) For the item (b) I also propose to recommend that the 14 bits (2^14 = 16384) that are used for clock sequence in the version 1 could be used to embed an application defined value for uniqueness. The name continues the same, but the purpose changes, and the application decides what to put in these bits. I think that it could be easier to the existing libraries that already implement clock sequence. But in this case the clock sequence should not be incremented.

So the structure could be one of these two:

(1) In the LSB, 2 bits are fixed, and the remaining 62 are random

|-------MSB--------|-------LSB-------|
 00000000-0000-6000-8000-000000000000
|------------------|-----------------|
     timestamp           random

(2) In the LSB 2, bits are fixed, 14 bits are uniquifier, the remaining 48 are random

|-------MSB--------|-------LSB-------|
 00000000-0000-6000-8000-000000000000
|------------------|----|------------|
     timestamp    clockseq random

0: any value
6: fixed version
8: fixed variant (2 first bits are ONE and ZERO)

(3) As a consequence of the item (1), the multicast bit is not required anymore as in the version 1. Is it necessary to mention that in the draft?

Footnote:

Yet another UUID layout

Goals:

  1. Be collision-free
    It is most important! If the format is to be widely used, even low collision rates can be dangerous. UUID with a timestamp has much less entropy than UUIDv4. 128 bits (really 122 or 125) isn't that much;
  2. Be sortable
    This is the main goal why a new standard is developed;
  3. Be as monotony as possible (even with UUIDs from different sources);
  4. Be easy to use and avoid pitfalls;
  5. Be interoperable with RFC4122.

Some decisions:

  1. UUID should be only an identifier, not a storage box for a timestamp, machine ID, or whatever. The reader is prohibited from relying on any internals #28 (goal n.4);
  2. Make the Epoch reasonably short (goal n.1)
    As mentioned in #23, this will reduce collision probability by several times (or even an order of magnitude);
  3. Do not include any machine-id or other pre-configured id (goal n.4)
    Bad id generation/selection can greatly increase the collision probability compared to a random of the same size;
  4. Do not include an explicit sequence id (goal n.1)
    A sequence id contains too little entropy, which increases the collision probability;
  5. Always leave a large enough part of UUID random so that the next UUID is unpredictable (goal n.4);
  6. Do not enable monotonicity guarantees by default (optional only)
    Monotonic sequence generation is difficult to scale/maintain because locks/atomic operations are required to prevent race conditions.
    Therefore, by default, we do not try to be monotony. But the high-resolution timestamp should provide good ordering (UUIDs will be monotonic or near monotonic in most cases);

Sample format

   0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                    unix_timestamp_100ns_u56                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    unix_timestamp_100ns_u56   |  filling_u8   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | var |                     random_u61                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           random_u61                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  • part1 (64 bit):
    unix_timestamp_100ns_u56: unix_timestamp in 100ns resolution, stripped down to 56 bit + BE byte order for sortablity;
    filling_u8: usually random (can be non-random to keep monotonicity).
  • part2 (64 bit):
    var: 0b111 for interporability with RFC4122;
    random_u61: must be random.

I prefered to use variant = 0b111, it leaves 125 bits for timestamp + random.
But the format can be easily adapted for variant = 0b10x, version = 0b0111 (122 effective bits) by shrinking filling_u8.

Timestamp length/resolution can also be adjusted during the standard review.

With the current proposal, the timestamp overflow will occur in 2198. Nothing bad should happen at this moment.
Timestamp reuse will occur at about 2250. After that, the UUID will still work, only the collision resistance will be slightly reduced (but still better than ULID and other proposals that have a large epoch). There may be minor issues (e.g. reusing old partitions in the DBMS if UUID is used as partitioning key), but I assume that larger UUIDs will be used on this day;

The proposed format is mostly ULID-like, but:

  • interoperable with RFC4122;
  • not trying to be 10k AD compatible, but trying to minimize the collision probability as much as possible;
  • not affected by the neighbor ID prediction issue mentioned at https://github.com/Sofya2003/ULID-with-sequence.

Pseudocode

Generation without guaranteed monotonicity

It is very simple

unix_timestamp_100ns_u56 = tv_sec*10000000 + tv_nsec/100
filling_u8 = random
part1 = unix_timestamp_100ns_u56<<8 | filling_u8
random_u61 = random
part2 = 0b111<<61 | random_u61
UUID = part1 <<64 | part2

Generation with guaranteed monotonicity

unix_timestamp_100ns_u56 = tv_sec*10000000 + tv_nsec/100

// the following line is for illustration only. real code must contain correct unsigned comparsion 
// if (part1_prev - 256e6) < (unix_timestamp_100ns_u56<<8) <= part1_prev {
if part1_prev - (unix_timestamp_100ns_u56<<8) < (unsigned) 256e6 {
	part1 = part1_prev + 1
} else {
	filling_u8 = random
	part1 = unix_timestamp_100ns_u56<<8 | filling_u8
}
part1_prev = part1
random_u61 = random
part2 = 0b111<<61 | random_u61
UUID = part1 <<64 | part2

This allows:

  • constantly generate not less than 2 monotonic UUIDs per nanosecond;
  • keep monotonicity in case of burst UUID generation;
  • keep monotonicity in case of small clock correction.

Discussion: UUIDv7 subsecond precision encoding

Question: Factions of a second representation or specific number of milliseconds/microseconds/nanoseconds for subsecond precision encoding within UUIDv7's subsec_a, subsec_b, and subsec_seq_node

Current draft-02 Implementation as per @bradleypeabody (who wrote the UUIDv7 section)

  • UUIDv7 encodes subsecond precision as fraction of a second:
  • Nanoseconds (ns) encoding utilizes 38 total bits
  • Microsecond (µs) encoding utilizes 24 total bits
  • Millisecond (ms) encoding utilizes 12 total bits

Sources from email between Brad and I on the topic:

[UUIDv7] supposed to be a fraction as opposed to the specific number of milliseconds.
If you want to indicate .75 seconds (three quarters of a second), you would write in decimal, following the period, the digits "75", right?
Well you do the exact same thing in binary, except that instead of "7" meaning 7 tenths, plus "5" meaning 5 hundredths, you do:
11
which means 1 half, plus one quarter.
Or, put another way, in decimal the digits go, from left to right:
tenths
hundredths
thousandths
ten thousandths
etc.
In binary it goes
halfs
quarters
eights
sixteeths
etc

To encode a fraction of a second s (expressed as a rational number in the range [0-1), e.g. 0.321 seconds) into a bit field of length n, calculate m as the maximum integer value representable with n bits plus one (aka 2^n, 1<<n, or "an int with the lower n bits set to 1, converted to a float64"). And then the formula is simply: s / (1/m) - the remaining fractional part is truncated and the number encoded as an integer. (e.g. 0.321 seconds encoded with 12 bits would be 0.321 divided by (1/4096) = 1314.816, truncated and expressed an integer value gives 1314 or in hex 0x522 or in binary 010100100010)

Taking a page from how floating point numbers are encoded, we could express fractions of a second as a bitfield divisor of whatever length a particular implementation uses. E.g. JS implementations which only have millisecond precision can reliably encode this information into 12 bits (probably 10 or 11 actually, but I digress). The first bit set to 1 would mean 0.5 seconds, two bits set to 1 and 1 would mean .75 seconds, a 0 and then 1 would be 0.25 seconds, etc.

One down side I see is that it requires floating point math (or some system that can represent fractional numbers) for encoding and decoding, which perhaps on some embedded systems may not necessarily be readily available - not sure how big a deal that concern is. It's also slightly more complicated than dealing with integer math, but I don't think to an extent that is problematic.


Alternative is specific number of milliseconds/microseconds/nanoseconds as integer.

  • Nanoseconds (ns) in a timestamp take 30 bits. (1,000,000,000 nanoseconds in 1 seconds so 1000000000 = 111011100110101100101000000000)
  • Microseconds (µs) in a timestamp take 20 bits. (1,000,000 microseconds in 1 second so 1000000 = 11110100001001000000)
  • Milliseconds (ms) in a timestamp take 10 bits. (1,000 millisecond in 1 second so 1000 = 1111101000)

Sources: https://en.wikipedia.org/wiki/Orders_of_magnitude_(time)

More Discussion on the alternative:

If alternative was utilized in Draft 02 we would encode as such:

  • Milliseconds (ms): All 12 bits of subsec_a is fully dedicated to millisecond information with a pad of two bits at the start since we only need 10. Easier to do this padding since Version bits come up right after.
  • Microseconds (µs): All 24 bits of both subsec_a and subsec_b utilized for microsecond information with 4 bits padded at the start since we only require 20. Easier to pad here because Variant bits are coming up.
  • Nanoseconds (ns): subsec_a, subsec_b , and 6 bits of subsec_seq_nod used for nanosecond information. Final 48 bits of subsec_seq_nod are used for random/entropy/node data. No padding required for this scenario.

v6: Retain definition and behavior of v1 fields (avoids confusion, allows for symmetric encoding.)

Version 6 changes how the node and clockseq fields behave. This is confusing, and also compromises one of the most important traits version 6 can have: symmetric encoding with version 1.

It's confusing because these fields behave in ways that are contradictory and overlapping between the two versions:

  • node is stable and (mostly) unchanging in version 1. In version 6 it is generated randomly each time.
  • clockseq is also (mostly) stable in version 1, but must be set randomly if node changes. In version 6 it's something completely different - a subsecond counter (which version 1 already allows for in low-order bits of the timestamp.)

Of more concern, however, is loss of ability to deterministically encode a v6 id from any v1 id, and vice-versa. Imagine a distributed legacy system where different components of the system are being migrated from version 1 to version 6 over time. Not being able to deterministically encode ids means there must be a centralized authority that contains the mapping of each v1 id to each corresponding v6 id. This is fundamentally counter to the entire thesis on which UUIDs are based, and in all likelihood will be (if you'll pardon the expression) an enormous pain in the ass for developers, who very likely chose UUIDs to avoid the need for a central authority.

I believe the community as a whole will be much better served if the version 6 spec consist of nothing more than a description of how version 1 fields are to be laid out (... while ensuring version 1 and 6 ids may be symmetrically encoded.)

Draft 03: Further clarify Method selection for embedded counters

Change Proposal Template

Source (Select one.)

  • IETF Published Draft
  • Work in Progress Draft

Change Reason (Select all that apply.)

  • Typos and grammatical issues
  • Bad Reference
  • IETF Verbiage modification (MAY, MUST, SHOULD, SHOULD NOT, etc)
  • New Text for additional context
  • Underlying XML Format Update
  • ASCII diagram updates (artwork, code samples, etc.)

Draft Number, Full Section, Name

Draft 02, Section 5.2.  Monotonicity and Counters

Current Text:

The following methods SHOULD be employed for single-node UUID implementations that do require batch UUID creation.
Fixed-Length Dedicated Counter Bits (Position A):
Fixed-Length Dedicated Counter Bits (Position B):
Monotonic Random:

Proposed Text:

Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation.
Fixed-Length Dedicated Counter Bits (Method 1):
Fixed-Length Dedicated Counter Bits (Method 2):
Monotonic Random (Method 3):

Other Supporting information below:

Current text reads as if all methods should be leveraged when in reality there are two methods and some supporting text about the fixed-length methods.

Clarify how to represent fractional seconds in UUIDv7

Some of the UUIDv7 implementations I've seen on GitHub have not implemented fractional seconds correctly. For instance, putting the number of milliseconds directly into the 12 bit field subsec_a, thereby using a 0-999 subset of the full 0-4095 range.

To prevent this, could the initial descriptions of the subsec_a and subsec_b fields (around L568 of the -02.txt draft) make it more explicit that these must use the full range, with an example of what not to do.

Currently this isn't spelled out (though discussed in issue #24). For instance:

Section 4.4.1. - UUIDv7 Timestamp Usage

Additional sub-second precision (millisecond, nanosecond,
microsecond, etc) MAY be provided for encoding and decoding in the
remaining bits in the layout. [note: but doesn't say how!]

Section 4.4.4.1. - UUIDv7 Encoding

All 12 bits of scenario subsec_a is fully dedicated to millisecond
information (msec). [note: it isn't clear what "fully dedicated" means here]

It's only once the reader gets down to L845 that the requirements are spelled out:

Section 4.4.4.2. UUIDv7 Decoding

Similarly as per Figure 2, the sub-second precision values lie within
subsec_a, subsec_b, and subsec_seq_node which are all interpreted as
sub-second information after skipping over the version (ver) and
(var) bits. These concatenated sub-second information bits are
interpreted in a way where most to least significant bits represent a
further division by two. This is the same normal place notation used
to express fractional numbers, except in binary. For example, in
decimal ".1" means one tenth, and ".01" means one hundredth. In this
subsec field, a 1 means one half, 01 means one quarter, 001 is one
eighth, etc. This scheme can work for any number of bits up to the
maximum available, and keeps the most significant data leftmost in
the bit sequence.

As an additional suggestion, it would be very helpful if the text could include some examples of uuids and their min/max implied timestamps, to serve as test cases in unit tests.

Current state of UUID v6 standard

What's the current state of UUID v6 draft?

I think that UUID v6 (or other standardized alternative for UUIDs ordered by time) would be very important and useful for usage in various data storage systems like databases. However, there are still some "concerns and TODOs" listed in this repository, and IETF draft will expire in less than a month (on August 27, 2020). There also don't seem to be any big changes about UUID v6 draft for a few months.

Does this mean draft will be abandoned? If so, will there be any alternative in the near future? What should developers that want to use UUID-like primary keys and identifiers in databases without loosing performance due to unordered bits of current UUID versions?

Also, in case if draft expires, what does this mean? Can it be then resubmitted when it is ready?

UUID version for proprietary formats

Note: Using the term "proprietary format" here to refer to any uuid formats that are user-specific in a way that doesn't [yet] merit formal RFC specification. (This notion comes from the world of MIME types, where the MIME standard allows for non-standard types, given a suitable "tree prefix" on the type string.)

In #30, I've argued that the current version 8 spec is too vague to carry much meaning. It defines timestamp, node, and clockseq fields in the most general way possible, allowing for arbitrary bit lengths in all fields. But, in doing so, fails to meaningfully define any of these fields. Furthermore, the spec is prefaced with a variety of provisions about when version 8 should and should not be used, all of which scream, "ONLY USE THIS FOR EXPERIMENTAL OR PROPRIETARY FORMATS".

In short, I believe version 8 is at risk of falling afoul of the classic "wanting the best of both worlds, but getting the worst" design process. It's a spec that is overly strict for proprietary use cases, and overly vague for timestamp cases. Thus, I believe the right approach is to combine the existing versions 6-8 into a single, new, timestamp uuid version, as discussed in issue #30).

Then, for use cases that require a proprietary format, have a new version that is dedicated to that purpose, and that makes as few assumptions about the nature of the contained proprietary format as possible. In other words, the spec should boil down to the following:

  • variant bits as defined in RFC4122
  • version bits set to 1000 (version 8, or whatever version # ends up being used)
  • All other bits available for user data. The one provision here being that users are encouraged (but not required) to structure fields in a way that maximizes database locality (by placing the most stable fields / bits in the most significant bits of the UUID).

Draft 03: Add guidance on CSPRNG

Change Proposal Template

Source (Select one.)

  • IETF Published Draft
  • Work in Progress Draft

Change Reason (Select all that apply.)

  • Typos and grammatical issues
  • Bad Reference
  • IETF Verbiage modification (MAY, MUST, SHOULD, SHOULD NOT, etc)
  • New Text for additional context
  • Underlying XML Format Update
  • ASCII diagram updates (artwork, code samples, etc.)

Draft Number, Full Section, Name

Draft 03, Section 5.6. Unguessability

Current Text:

No change to current text

Proposed Text:

Add:

Advice on generating cryptographic-quality random numbers can be found in RFC4086

Other Supporting information below:

From RFC 4122 Errata: https://www.rfc-editor.org/errata/eid3641

Discussion: Shared Knowledge Schemes and Node Identifiers

Premise: Despite the UUID name, "universally unique" values are only achievable with some sort of pre-arranged pattern that decides which systems will provide which values. Adding more bits and entropy to the random parts of a UUID reduces collision probability, but cannot eliminate it. And there is no such thing as "enough collision resistance for any use case anywhere". The use of the MAC address in UUIDv1 was an attempt to use this approach. The drawbacks with it were some security risk in exposing MAC addresses, and the fact that with virtual computing so commonplace these days MAC address uniqueness can no longer be guaranteed. But the core idea behind it is: if you want to guarantee uniqueness, you have to have some sort of global registry or rules system which ensures some part of the UUID will be different from any other system. I'll refer to this approach as "shared knowledge" for lack of a better term (i.e. the different machines on the network share the knowledge of the MAC address registry and depend on it for uniqueness guarantees).

Question/proposition: Should the draft/spec indicate some means guaranteeing global uniqueness through shared knowledge. While the approach of using the MAC address is flawed, it does address a vital need and expose a problem with the current proposal: We still don't have guaranteed universally unique identifiers.

Prior art: A number of "this MUST be unique everywhere" registry systems exist (it's different from UUID generation but the core problem is the same). There are probably a lot more such system than I'm aware of. But just to list a few:

  • Domains
  • IP addresses (v4 and probably now v6 is much more relevant)
  • USB vendor IDs
  • JEDEC for flash memory (although it's only 8 bits devoted to the organization ID :) )
  • And of course, MAC addresses

I'm sure there are many more. But you see what's happening here? When people REALLY need uniqueness, they create a registry, because there isn't any other way to achieve uniqueness guarantees.

Possible approaches:

  • Recommend the use of MAC addresses, like UUIDv1 did. (Problem: MAC addresses don't have uniqueness guarantees these days - random data provides better collision resistance against possible MAC address duplication)
  • Use IPv6 addresses. (Problem: not always available, and they are really long - 128-bits. Compression of zero values is possible, but still...)
  • Make a registry. This is the only practical course I can see so far, but the question is if it's worth the effort. But, for example, a registration scheme could be created so a prefix could be allocated that is guaranteed to be unique and not assigned to anyone else. Using this, such a value placed starting at some standardized position in the UUID could provide a full 100% uniqueness guarantee. It could have a variable length encoding, so as the numbers grow larger (e.g into the billions), it would take up a few more bytes in the UUID, but would still guarantee uniqueness. An encoding could be arrived at which ensures that later registry numbers with longer encodings don't conflict with earlier ones that happen to be followed by various random byte patterns - I'll lay this out in detail if there's enough agreement on the idea.

Thoughts? If we really want uniqueness, we need to confront the fact that there's only one way to guarantee it.

EDIT:

Looking at the language of RFC4122 again, it seems like they specifically wanted to avoid any central registry, but made an exception for MAC addresses:

One of the main reasons for using UUIDs is that no centralized
authority is required to administer them (although one format uses
IEEE 802 node identifiers, others do not). As a result, generation
on demand can be completely automated, and used for a variety of
purposes....

Indeed that's probably one of the main arguments against any such registry: It's inconvenient. And also, if we've learned anything from MAC addresses: If anything goes wrong with the system, using random data (which is generally fast, easy and pretty much always available) is better anyway, i.e. lower collision probability.

v7: Restrict use of the term "node" to version 6

node was originally defined as "MAC address or a random value". Then in this spec we've decided to be explicit about the fact it should not be a MAC address. Thus, the implication is that node MUST be a random value.

Meanwhile the definition of the subsec_seq_node field makes no mention of "node", other than in the name:

The remaining 62 bits which MAY be allocated to any combination of additional sub-second precision, sequence counter, or pseudo-random data.

(But it's at least explicit about the field holding subsecond time, sequence, or random data)

However in section 4.4.3 we define "node" as "the leftover bits" that aren't allocated to subsecond or sequence values, then rather tepidly state, "In most scenarios the node SHOULD be filled with pseudo-random data".

But that opens the door for it being something other than random values. But no mention is made of what else might be used there, or how that would be possible without falling afoul of the implied (legacy) definition.

Basically, its use is confusing and adds little value. We should just be explicit about the fact the "leftover bits" should be set to pseudo-random values and avoid the term altogether unless we're talking about the legacy definition (in version 6).

Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation?

Hello,

I understand that this may be beyond the scope of what you intended with your draft, but since it's slated to update 4122, and it has an impact on UUIDv6, I thought I'd drop this here in case you wanted some well-intentioned scope creep. 😄

While going through and manually revalidating my implementation of UUIDv6, I realized the library I maintained was generating UUIDv1 clock sequences subtly different than how it is defined in RFC 4122. This library was forked from an earlier one, and so this implementation detail of my library originated there. Here is what section 4.1.5 of RFC 4122 says:

4.1.5.  Clock Sequence

   For UUID version 1, the clock sequence is used to help avoid
   duplicates that could arise when the clock is set backwards in time
   or if the node ID changes.

   If the clock is set backwards, or might have been set backwards
   (e.g., while the system was powered off), and the UUID generator can
   not be sure that no UUIDs were generated with timestamps larger than
   the value to which the clock was set, then the clock sequence has to
   be changed.  If the previous value of the clock sequence is known, it
   can just be incremented; otherwise it should be set to a random or
   high-quality pseudo-random value.

This says the clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time. What this means is that the number of UUIDs a stateful generator can generate is limited by the precision of the timestamp, whereas a stateless generator using a random clock sequence does not face the same limitations. That seems like an unintentional side effect of the wording of the RFC. Especially when the RFC says you should either fail to generate the UUID or block until it's safe to do so:

4.2.1.2.  System Clock Resolution

   The timestamp is generated from the system time, whose resolution may
   be less than the resolution of the UUID timestamp.

  ...

   If a system overruns the generator by requesting too many UUIDs
   within a single system time interval, the UUID service MUST either
   return an error, or stall the UUID generator until the system clock
   catches up.

So the person who originally implemented the UUIDv1 generator I now maintain, did more than the RFC said to do to avoid this failure mode and also incremented the clock sequence whenever a UUIDv1 has already been generated for this timestamp. This effectively removed the limitation on the number of UUIDv1s the stateful generator could generate, albeit while technically violating the language of the RFC.

if timeNow <= g.lastTime {
	g.clockSequence++
}

After I discovered this, I became curious about other UUIDv1 implementations in Go, and other languages, and I came to find that every stateful UUIDv1 implementation I looked at chose to not follow this section of the RFC as it was written. Instead, they either implemented it the same as the code I maintain, or they took the simpler approach of incrementing the sequence on every generation. The C implementation provided in RFC 4122 was implemented as the RFC itself was written, but that was the only stateful one I found doing so.

So that makes me wonder if we should use this opportunity to update the clock sequence generation language from 4122, so that UUIDv6 stateful generator implementers aren't also encouraged to violate the RFC as it's currently written.

I'd be interested in your thoughts.

New Draft 03 Ideas

@kyzer-davis and @edo1 @sergeyprokhorenko @fabiolimace @nerg4l @broofa @nurked (hopefully I didn't miss anyone).

I typed up a summary of what I suggest should go into a new draft, along with the rationale.

https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md

I tried to keep it as succinct as possible and had to make some decisions, I'm sure some people will disagree on various things.

It would be great if we could review these issues being discussed against that and see how close this is to addressing everyone's concerns and try to start narrowing down what specifically should be in the next draft. The detailed discussion by issue can stay on the individual issue threads to keep them focused but if there are specific changes in the doc above and help move us toward an exact list of items to address, definitely let me know.

Some ideas for UUIDv7

@bradleypeabody, @kyzer-davis,

This is a continuation of my comment on PR #10

I thought of another approach. Instead of treating the integer part and the fractional part separately, the entire timestamp can be encoded in a fixed-point number.

I imagined another structure that reserves the last 32 bits for random bits or application-specific data. I'm not sure if anyone will need more than 90 bits for the timestamp. A field with 38 bits for the integer part and 52 bits for the fractional part is enough store a decimal number with 15 digits after the decimal point, like this: 0.123456789012345.

The structure described below also works with 48 bits reserved for milliseconds, requiring very small changes. By the way, I think that 48 bits for milliseconds may be more convenient in contexts where the best precision you can get is milliseconds, as in Java 8, because you don't need to encode the sub-millisecond part.

Sorry for sending spam to your project. I am very happy with the progress of this RFC update. I'm just trying to contribute some ideas, and I don't expect them to be accepted.

Structure with 90 bits for timestamp

This proposed structure consists of two fields: unixtime and random. The only mandatory requirements are to fill in the first 38 bits (48 bits for ms is better?) of the unixtime field with the current seconds and fill in the 6 bits reserved for version and variant with their standardized values. The number of sub-second bits in the unixtime field is variable/arbitrary. And the random field does not have to be filled with random bits. The name of this field is just a hint that its value by default is random.

unixtime: a 90-bit field reserved for the Unix timestamp. This field is encoded as a fixed-point number in the form fixed<38,n>, where 38 is the number of bits before the binary point, and n is the number of bits after the binary point. The size n is arbitrary so that it can be defined by the system that generates the UUID. The bits not used in this field can be filled with a sequence counter and/or some random bits, for example. The timestamp can represent dates up to the year 10680 A.D. (2^38/60/60/24/365.25 + 1970).

random: a 32-bit field reserved for random bits or application-specific data. Part or all of this field can be used, for example, to identify the UUID generator. This field is opaque, so its bits have no meaning, except for the system that generates the UUID.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           unixtime                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            unixtime           |  ver  |        unixtime       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |var|                       unixtime                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            random                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Encoding and decoding the sub-second part

Say you have a fractional value between 0 and 1 represented as f, you can encode that fractional value with the precision of n bits like this:

subsec = f * 2^n

To decode a subsec with n precision you do the inverse operation:

f = subsec / 2^n

This tutorial describes an efficient method for converting floating-point to fixed-point in MATLAB: Fixed-Point.pdf.

Generic algorithm to generate a UUIDv7

  1. Get the current Unix second and sub-second;
  2. Fill the first 38 bits of the UUID with the current Unix seconds;
  3. If the sub-second is not a fraction between 0 and 1, divide it by the time precision provided by the system. If the precision is millisecond, divide the sub-second value by 1,000. If the precision is microsecond, divide it by 1,000,000 and so on.
  4. Multiply the fraction of sub-seconds by 2^n, where n is the number of bits that fits to the system time precision. If the precision is millisecond, n is equal to 10 (2^10 = 1,024). If the precision is microsecond, n is equal to 20 (2^20 = 1.048.576) and so on;
  5. Fill in the next n bits of the UUID with the n bits of the previous multiplication product;
  6. Fill the rest of the bits with random data;
  7. Insert the version and the variant bits in their standard positions using bitwise operations;
  8. Return the UUID.

Generic algorithm to extract the unix second

  1. Extract the first 38 bits of the UUID;
  2. Return the resulting unsigned integer.

Generic algorithm to extract the sub-second fraction

  1. Extract the first n bits starting from the 38th bit position of the UUID, where n is the number of bits that fits the system time precision;
  2. Divide the integer value obtained by 2^n to get a fractional value;
  3. Return the resulting fractional value.

To obtain a floating-point representation of the timestamp you can sum the two parts.

An implementation in python

Yesterday I implemented this simple python generator to test the concept:

#!/usr/bin/python3
#
# @date: 2021-03-06
# @author: Fabio Lima
#
# Changelog:
# - 2021-03-16: add parameter `subsec_decimal_digits` to function `uuid7()`
# - 2021-03-08: use `time.time_ns()` instead of `time.time()` to fix nanosecond precision
#

from uuid import UUID
from time import time_ns
from random import randint

TOTAL_BITS=128
VERSION_BITS = 4
VARIANT_BITS = 2

# Binary digits before the binary point
SEC_BITS=38

# Binary digits after the binary point
SUBSEC_BITS_S=0
SUBSEC_BITS_MS=10
SUBSEC_BITS_US=20
SUBSEC_BITS_NS=30
SUBSEC_BITS_DEFAULT=SUBSEC_BITS_NS

# Decimal digits after the decimal point
SUBSEC_DECIMAL_DIGITS_S=0   # 0
SUBSEC_DECIMAL_DIGITS_MS=3  # 0.999
SUBSEC_DECIMAL_DIGITS_US=6  # 0.999999
SUBSEC_DECIMAL_DIGITS_NS=9  # 0.999999999
SUBSEC_DECIMAL_DIGITS_DEFAULT=SUBSEC_DECIMAL_DIGITS_NS

SLICE_MASK_0 = 0xffffffffffff00000000000000000000
SLICE_MASK_1 = 0x0000000000000fff0000000000000000
SLICE_MASK_2 = 0x00000000000000003fffffffffffffff

def uuid7(t=None, subsec_bits=SUBSEC_BITS_DEFAULT, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
	
	if t == None:
		t = time_ns()
	
	i = get_integer_part(t)
	f = get_fractional_part(t, subsec_decimal_digits)
	
	sec = i
	subsec = round(f * (2 ** subsec_bits))
	
	node_bits = (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS - subsec_bits)
	
	uuid_sec = sec << (subsec_bits + node_bits)
	uuid_subsec = subsec << node_bits
	uuid_node = randint(0, (2 ** node_bits))
	
	uuid_int = uuid_sec | uuid_subsec | uuid_node # 122 bits
	uuid_int = __add_version__(uuid_int) # 128 bits

	return UUID(int=uuid_int)
	
def uuid7_s(t=None):
	return uuid7(t, SUBSEC_BITS_S, SUBSEC_DECIMAL_DIGITS_S)
	
def uuid7_ms(t=None):
	return uuid7(t, SUBSEC_BITS_MS, SUBSEC_DECIMAL_DIGITS_MS)

def uuid7_us(t=None):
	return uuid7(t, SUBSEC_BITS_US, SUBSEC_DECIMAL_DIGITS_US)

def uuid7_ns(t=None):
	return uuid7(t, SUBSEC_BITS_NS, SUBSEC_DECIMAL_DIGITS_NS)

def __add_version__(uuid_int):
	
	slice_mask_0 = SLICE_MASK_0 >> (VERSION_BITS + VARIANT_BITS)
	slice_mask_1 = SLICE_MASK_1 >> (VARIANT_BITS)
	slice_mask_2 = SLICE_MASK_2

	slice_0 = (uuid_int & slice_mask_0) << (VERSION_BITS + VARIANT_BITS)
	slice_1 = (uuid_int & slice_mask_1) << (VARIANT_BITS)
	slice_2 = (uuid_int & slice_mask_2)
	
	uuid_output = slice_0 | slice_1 | slice_2
	uuid_output = uuid_output & 0xffffffffffff0fff3fffffffffffffff # clear version
	uuid_output = uuid_output | 0x00000000000070008000000000000000 # apply version

	return uuid_output

def __rem_version__(uuid_int):
	
	slice_0 = (uuid_int & SLICE_MASK_0) >> (VERSION_BITS + VARIANT_BITS)
	slice_1 = (uuid_int & SLICE_MASK_1) >> (VARIANT_BITS)
	slice_2 = (uuid_int & SLICE_MASK_2)
	
	uuid_output = slice_0 | slice_1 | slice_2
	
	return uuid_output
	
def get_integer_part(t):
	SUBSEC_DECIMAL_DIGITS_PYTHON=9
	subsec_decimal_divisor = (10 ** SUBSEC_DECIMAL_DIGITS_PYTHON)
	return int(t / subsec_decimal_divisor)

def get_fractional_part(t, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
	SUBSEC_DECIMAL_DIGITS_PYTHON=9
	subsec_decimal_divisor = (10 ** SUBSEC_DECIMAL_DIGITS_PYTHON)
	return round((t % subsec_decimal_divisor) / subsec_decimal_divisor, subsec_decimal_digits)

def extract_sec(uuid):
	uuid_int = __rem_version__(uuid.int)
	uuid_sec = uuid_int >> (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS)	
	return uuid_sec
	
def extract_subsec(uuid, subsec_bits=SUBSEC_BITS_DEFAULT, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
	uuid_int = __rem_version__(uuid.int)
	node_bits = (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS - subsec_bits)
	uuid_subsec = (uuid_int >> node_bits) & ((1 << (subsec_bits)) - 1)
	return round(uuid_subsec / (2 ** subsec_bits), subsec_decimal_digits)

def list():

	print("UUIDv7                               sec in       sec out      subsec in    subsec out")
	
	for i in range(10):
		t = time_ns()
		u = uuid7(t)
		i = get_integer_part(t)
		f = get_fractional_part(t)
		sec = extract_sec(u)
		subsec = extract_subsec(u)
		print(u, str(i).ljust(12), str(sec).ljust(12), str(f).ljust(12), str(subsec).ljust(12))

list()

OUTPUT:

UUIDv7                               sec in       sec out      subsec in    subsec out
018145e0-5fc5-7a65-ae82-507fbfe22749 1615951895   1615951895   0.943017418  0.943017418 
018145e0-5fc5-7bad-b660-9ddd65978a4c 1615951895   1615951895   0.943095648  0.943095648 
018145e0-5fc5-7c84-a49c-a9797ed70dc4 1615951895   1615951895   0.943146842  0.943146842 
018145e0-5fc5-7d46-b435-4c76da9142c5 1615951895   1615951895   0.943193153  0.943193153 
018145e0-5fc5-7de2-8716-009c3130d332 1615951895   1615951895   0.943230178  0.943230178 
018145e0-5fc5-7e74-9231-05a35740fba0 1615951895   1615951895   0.943265028  0.943265028 
018145e0-5fc5-7f4a-8022-e1ba32698bc3 1615951895   1615951895   0.943315983  0.943315983 
018145e0-5fc5-7fd5-a568-b1d7bc223a91 1615951895   1615951895   0.943349262  0.943349262 
018145e0-5fc6-7062-8b99-ba3f383d9b8e 1615951895   1615951895   0.943382783  0.943382783 
018145e0-5fc6-70ed-98ce-fa5bf04836d8 1615951895   1615951895   0.943415972  0.943415972

Python time precision

The first version of the generator above used time.time(). This method appears to return floating point numbers with 22 bits after the binary point. I had to replace it with time.time_ns() to fix the nanosecond precision. If you need nanosecond precision, don't use time.time().

This loop prints the sub-seconds returned by time.time() in binary format:

from time import time

for i in range(10):
	
	n = 30 # system time precision
	t = time() # get the current time
	i = int(t) # integer part
	f = t - int(t) # fractional part
	
	# encode the fractional part to an integer
	subsec = int(f * (2**n))	
	
	# print the result in binary format
	print(bin(subsec))

This is the output of the previous loop:

  |---------30 bits-----------|
0b10011010010000101110100000000
0b10011010010010000100100000000
0b10011010010010100100000000000
0b10011010010011010001000000000
0b10011010010011111000100000000
0b10011010010100001100100000000
0b10011010010100011111000000000
0b10011010010100110001100000000
0b10011010010101000011100000000
0b10011010010101010110000000000
  |-------------------|-------|
                        empty

Note that the last 8 bits are zeros.

Draft 03: Further Clarify Version to Variant Mappings

Change Proposal Template

Source (Select one.)

  • IETF Published Draft
  • Work in Progress Draft

Change Reason (Select all that apply.)

  • Typos and grammatical issues
  • Bad Reference
  • IETF Verbiage modification (MAY, MUST, SHOULD, SHOULD NOT, etc)
  • New Text for additional context
  • Underlying XML Format Update
  • ASCII diagram updates (artwork, code samples, etc.)

Draft Number, Full Section, Name

Draft 02, Section 4.1. Variant and Version Fields

Other Supporting information below:

My Suggested edits that will make this a bit more clear:

  • Add the Lower case epsilon (ε) nomenclature to the Draft 03 Variant and Version Fields if it helps drive the point home.
    • Anywhere I reference UUIDv7/UUIDv8/Version 7/Version 8 change to UUIDv7ε/UUIDv8ε/Version 7ε/Version 8ε
  • I can also split Table 2 UUID versions defined by this specification into RFC 4122 Updated version reservations and Draft 03 Version reservations.
    • Table A: Define RFC 4122 variant 10xx/89AB + Version 6 then fill out a table up to v15 with "Reserved for future definition"
    • Table B: Define Variant 1110/E + Version 7ε and 8ε filling out the rest as "Reserved for future definition"

Changes from #26 (comment)

Reference implementations and test vectors?

I was wondering if it would make sense to provide some kind reference implementations and test vectors like every hash function specification does?

For example for UUIDv6 every timestamp, clock_seq, and node input -> some exact output

Then in Python a reference implementation could be something like this making it easy to test with the provided test vectors:

def uuid6(timestamp: int = None, clock_seq: int = None, node: int = None) -> UUID:
    ...

And then of course you could use timestamp, clock_seq, and node values from the test vectors to check against your own implementations.

Discussion: MAC Address usage within UUIDs

In section 4.3.3 (UUIDv6 Node Usage) of this specification, there is the following text:

UUIDv6 node bits SHOULD be set to a 48-bit random or pseudo-random number. UUIDv6 nodes SHOULD NOT utilize an IEEE 802 MAC address or the [RFC4122], Section 4.5 method of generating a random multicast IEEE 802 MAC address.

May I ask what the motivation is for this section that tells me that I "SHOULD NOT"?

I know that using MAC addresses isn't fully secure, but to fix that, RFC 4122 also has Section 4.5. This section says if you cannot use or don't want to use MAC addresses, you can set one bit to 1 (that is always 0 in normal MAC addresses) and generate some 47-bit integer for the remaining bits. However, section 4.3.3 also says I "SHOULD NOT" use Section 4.5.

I don't see any problem with accepting Section 4.5, because setting that unicast/multicast bit makes it more secure (and the only difference is that your generated id isn't 48 bits, but just 47 bits).

Draft 03: Correct IETF Verbiage in Fixed-Length Dedicated Counter Rollover Handling

Change Proposal Template

Source (Select one.)

  • IETF Published Draft
  • Work in Progress Draft

Change Reason (Select all that apply.)

  • Typos and grammatical issues
  • Bad Reference
  • IETF Verbiage modification (MAY, MUST, SHOULD, SHOULD NOT, etc)
  • New Text for additional context
  • Underlying XML Format Update
  • ASCII diagram updates (artwork, code samples, etc.)

Draft Number, Full Section, Name

Draft 03, Section 5.2. Monotonicity and Counters

Current Text:

The general guidance is that applications should freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken.

Proposed Text:

The general guidance is that applications SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken.


Other Supporting information below:

should > SHOULD

UUIDv7: Is "the node MUST NOT be set to all 0s" necessary?

https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-02#section-4.4.3

The UUIDv7 Node MAY
contain any set of data an implementation desires however the node
MUST NOT be set to all 0s which does not ensure global uniqueness.
In most scenarios the node SHOULD be filled with pseudo-random data.

Is that necessary? It creates a special case in pretty much every programming language.

For example in Python you have to create a node with something like the following.

node = 0
while node == 0:
    node = secrets.randbits(56)

Instead of just writing.

node = secrets.randbits(56)

Freeze the counter until it overflows

Freeze the counter until it overflows. Fatal run-time errors are not allowed. Freezing the counter may only result in a slight decrease in database performance

Discussion: Redefine variant bit (111) definition

Continuation of separate #24 thread

Question:
Should we redefine UUID variant bits 111 (E/F) which are currently "Reserved for future definition." as per the original RFC 4122 Section 4.1.1 source?


Proposal:

  • In Draft 02 set definition of any UUID Version (UUIDv1/2/3/4/5/6/7/8) + Variant E (111) as a method for signaling an alternative bit layout to any previously defined UUID.
  • With that precedent set UUIDv6 could be converted to UUIDv1 (0001) + Variant E/F (111) as a method to signal the alternative encoding for UUIDv1 labeled as UUIDv1ε.

Possible text changes depending on feedback from this issue and #24

As such the Draft 01 goes from the current three definitions:

Name Version Variant Description
UUIDv6 0110 10x (8/9/A/B) UUIDv1 with Re-ordered Gregorian timestamp, explicit start sequence counter, no MAC address
UUIDv7 0111 10x (8/9/A/B) 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using floating point math and fractions (38 bits allocated to subsecond precision).
UUIDv8 1000 10x (8/9/A/B) Relaxed implementation, any timestamp goes, future proof the specification, 122 bits to do as you desire with general guidelines of timestamp, sequence, random in that order

To the possible four definitions in Draft 02:

Name Version Variant Description
UUIDv1ε 0001 111 (E/F) UUIDv1 with Re-ordered Gregorian timestamp, explicit start sequence counter, no MAC address. (What is UUIDv6 in draft 01)
UUIDv6 0110 10x (8/9/A/B) 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using floating point math and binary fractions (38 bits allocated to subsecond precision). (What is UUIDv7 in draft 01)
UUIDv6ε 0110 111 (E/F) UUIDv6 with 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using integers to represent total number of subseconds (30 bits allocated to subsecond precision). (Did not exist in draft 01)
UUIDv7 0111 10x (8/9/A/B) Relaxed implementation, any timestamp goes, future proof the specification, 122 bits to do as you desire with general guidelines of timestamp, sequence, random in that order. (What was UUIDv8 in draft 01)
UUIDv8 1000 10x (8/9/A/B) Goes away in draft 02 as it is no longer required.

Make a concise list of concerns and (tentative) decisions regarding them

There are a lot of different individual concerns along with ideas on how things could be solved. It would be valuable to have a concise list of these along with the current decision on what the outcome is as expressed in the current draft. It should summarize the concern/question, concisely list out the alternatives and/or the pros and cons, and the decision as currently expressed in the draft. This should make discussions about specific features very focused and avoid having to repeat the same concerns over and over. It also means that good ideas and valid concerns that are still being considered have a place they can be recorded and summarized and made easily accessible. The README for this repo is probably the appropriate place. Visible, and people can submit PRs to add things (which I will readily merge as long as they concisely include data and pros and cons and leave out the "therefore MUST do something this way" - what we need instead is "the concerns are A, B, C solution X solves A and B but not C, solution Y solves all three but introduces problem D"). If all of the data can be concisely collected, the ensuing discussions will be far more productive and then the exact reason for changes can be documented as well.

Examples of things to cover:

  • Should we have alternate text formats or just the existing hex with dashes.
  • Are we sticking with 128-bits or are we introducing variable-length
  • Should a sort order be defined, if so what.
  • Is this a solution only for globally unique IDs, or does it include locally unique IDs as well (unique within one system, not the whole planet/solar system/galaxy/universe)

There are a bunch of these and they have exact arguments for various solutions, and only a subset of that information is covered in the draft.

Related: bradleypeabody/gouuidv6#3

Typo: Bad UUIDv7 reference in Section 4.4

Need to fix still,

4.4. UUIDv7 Layout and Bit Order

Old:
The format for the 16-octet, 128-bit UUIDv6 is shown in Figure 2

New:
The format for the 16-octet, 128 bit UUIDv7 is shown in Figure 2

Reported by Blachly, James [email protected] on the mailer


Already fixed, under #14, found by Ville Skyttä [email protected] on the mailer

Old
The 4 bit UUIDv8 version (0111)

New:
The 4 bit UUIDv7 version (0111)

Treat UUID like a black box

Treat UUID like a black box:

  • Disallow extracting timestamp or any other data from UUID
  • Remove ver, var and node from the RFC (like in UUIDv4)
  • But allow database shard on left part of UUID of any length

Draft 03: Fix Bad Format section text copied from RFC4122

Change Proposal Template

Source (Select one.)

  • IETF Published Draft
  • Work in Progress Draft

Change Reason (Select all that apply.)

  • Typos and grammatical issues
  • Bad Reference
  • IETF Verbiage modification (MAY, MUST, SHOULD, SHOULD NOT, etc)
  • New Text for additional context
  • Underlying XML Format Update
  • ASCII diagram updates (artwork, code samples, etc.)

Draft Number, Full Section, Name

Draft 02, Section 4. Format

Current Text:

The UUID format is 16 octets; some bits of the eight octet variant field specified below determine finer structure.

Proposed Text:

The UUID format is 16 octets; The variant bits in conjunction with the version bits described in the next section in determine finer structure.

Other Supporting information below:

Text was copied from RFC4122 which has an open errata. We should fix it vs open ourselves for an errata.
https://www.rfc-editor.org/errata/eid4975

Long UUID

And I don't expect a very long lifespan for 128 bit UUIDs. It is very likely that in 50 or 100 years, 256-bit identifiers (or something radically new) will be used.

160-bit identifiers would be great right now. By the way, 160-bit identifiers in Crockford's base32 would be the same length as 128-bit identifiers in usual UUID string format. Therefore they are compartible.

Originally posted by @sergeyprokhorenko in #23 (comment)

Typo: Proposed UUIDv8 implementation

The proposed UUIDv8 implementation (section 4.5.4) contains a couple typos:

  1. Set 12-bit time_or_node to the 12 least significant bits from the timestamp
    • Typo, the referenced field is called time_or_seq
  2. If the state was unavailable (e.g., non-existent or corrupted) or the timestamp is greater than the current timestamp; set the 12-bit clock sequence value (seq_or_node) to 0
    • seq_or_node contains only 8 bits, not 12

Just as general feedback: UUIDv8 feels oddly specific to me when it should be the most flexible. Why must timestamp_48 be set to 0 for 32 bit timestamps? These bits could be used for a sequence counter instead. It is also odd that timestamp and sequence counter are required, however the node according to the spec is optional and can be filled with more timestamp and sequence counter data. I would have expected that UUIDv8 would just contain (in this order) timestamp, sequence counter and node, with each having a variable length greater than zero and the version and variant bits being fixed. I don't really understand why it needs to be more specific than that.

Have just one new UUID version rather than three(?)

Playing Devil's Advocate: One of my first impressions of this proposal is that introducing three new versions seems "excessive".

Collectively, these three versions introduce a number of much-needed improvements. However I'm not entirely clear on what purpose the multiple versions serve. 'Seems like this is unnecessarily complicating the development of this spec.

For example, I would suggest the spec for version 7 provides the most value at present. It supports Unix timestamps, db-friendly byte-ordering, arbitrary sub-second resolution, and prohibits the use of MAC addresses, all of which are great. I.e. This is pretty obviously what most applications that need timestamp UUIDs should be using moving forward.

Meanwhile version 6 has nothing to recommend it over version 1 other than the reordering of time_* fields. I assume the nominal use case for this is to make it trivial to migrate version 1 uuids to a more db-friendly form? But I don't think that actually works because v6 currently prohibits the use of MAC addresses in the node field. Thus, converting from v1 -> v6 will be necessarily non-deterministic. (The node field will have to be set randomly in order to remain in compliance with this spec, since there's no way of knowing for sure that the v1 node is not a MAC Address.)

For anyone wanting to convert v1 uuids to v6 across a large, distributed system, that's going to be a huge issue.

As for version 8... well, I confess I don't really understand the use case for this yet. My first impression is that it's so abstract as to be almost meaningless. Technically, you could have a single timestamp bit ("The Boolean Epoch"?!?) with 121 bits of random data and still conform to the spec, no? I suppose (???) there's some value in allowing for arbitrary timestamp epoch/resolution/random bit construction of uuids but a spec that starts with, "The only explicitly defined bits are the Version and Variant leaving 122 bits for implementation specific time-based UUIDs" doesn't seem like much of a spec. It's more just a hand-wavy catchall that's going to invite people to create ad-hoc, proprietary sub-versions.

"It's a version 8 UUID, but we [Blargle Fargle Inc.] set bits 73-78 to b110011 for our PulsyWulsy™ blood oximeter product."

See also #29.

Typo: Bad UUIDv7 references in Section 4.5

This is already fixed in #14 but opening an issue to track until V02 is submitted to IETF since these are generating lots of feedback emails.

"4.5. UUIDv8 Layout and Bit Order":

Old:

   UUIDv8 SHOULD only be utilized if an implementation cannot utilize
   UUIDv1, UUIDv6, or UUIDv8.  Some situations in which UUIDv8 usage

New:

   UUIDv8 SHOULD only be utilized if an implementation cannot utilize
   UUIDv1, UUIDv6, or UUIDv7.  Some situations in which UUIDv8 usage

Thanks:
Ville Skyttä [email protected]
Blachly, James [email protected]
Stefano Zuccaro [email protected]

UUIDv7 with Millisecond Precision and Clock Sequence Rollovers

Hello,

I was on the fence as to whether I should include this in #24, or file it as its own issue. If that other issue is better, let me know and I'll add a comment there instead.

I am working on an implementation of the latest draft in Go, and when using millisecond precision at full-bore I can easily increment the clock sequence to the point of rolling over the 12 bit integer. When discovering this I wanted to understand how the rollover should be handled, and I noticed some issues with how section 4.4.2 is worded that contradict themselves and ultimately don't provide explicit direction on how to handle the rollover.

4.4.2.  UUIDv7 Clock Sequence Usage

   UUIDv7 SHOULD utilize a monotonic sequence counter to provide
   additional sequencing guarantees when multiple UUIDv7 values are
   created in the same UNIXTS and SUBSEC timestamp.  The amount of bits
   allocates to the sequence counter depend on the precision of the
   timestamp.  For example, a more accurate timestamp source using
   nanosecond precision will require less clock sequence bits than a
   timestamp source utilizing seconds for precision.  For best
   sequencing results the sequence counter SHOULD be placed immediately
   after available sub-second bits.

   The clock sequence MUST start at zero and increment monotonically for
   each new UUIDv7 created on by the application on the same timestamp.
   When the timestamp increments the clock sequence MUST be reset to
   zero.  The clock sequence MUST NOT rollover or reset to zero unless
   the timestamp has incremented.  Care MUST be given to ensure that an
   adequate sized clock sequence is selected for a given application
   based on expected timestamp precision and expected UUIDv7 generation
   rates.

The initial issue is that the first and second paragraph contradict each other. The first says we SHOULD utilize a monotonic sequence counter, without offering guidance on how to implement it without using a sequence counter. The second paragraph then goes on to say "The clock sequence MUST...", which with the usage of the indicates that there is only one type of clock sequence that should be used. This makes me think the SHOULD in the first paragraph might benefit from becoming a MUST.

The other issue I ran into is that it's unclear how to proceed when the sequence counter would roll over without the timestamp having changed. I understand that I should not permit it to roll over, but I don't know whether that's failure condition and I should return an error or whether I should just generate UUIDv7 with the maximum sequence number until the timestamp changes. Would there be value in being explicit about how to handle that failure mode?

The truth is, I believe the majority of organizations would lean on publicly available UUID implementations over implementing their own. To me that means ambiguities are going to result in implementation differences between languages and libraries within single languages, which could be problematic in a polyglot environment if the different implementations have their own interpretations of the RFC.

Separately from the wording in the RFC, is there something we can do about the clock sequence hitting the maximum value when using millisecond precision? Seeing how easy it was to hit the maximum clock sequence value, it seems to strongly devalue the millisecond precision variant of UUIDv7 to the point I'd probably actively discourage its use in favor of the microsecond precision.

Would love to hear your thoughts on both of these.

UUID7 and 128-bit signed integer

UUID1 and UUID6 can be used with 128-bit signed integers, but UUID7 cannot.

UUID1 and UUID6 use 60 bits for the timestamp. If the timestamp is unsigned, the maximum possible year is around 5235 AD (2^60/10^7/60/60/24/365.25 + 1582 = ~5235). But the RFC-4122 says that "since a UUID is a fixed size and contains a time field, it is possible for values to rollover around A.D. 3400, depending on the specific algorithm used". It indicates that the timestamp in the RFC-4122 is signed (2^59/10^7/60/60/24/365.25 + 1582 = ~3408).

UUID7 timestamp is a 36 bit unsigned integer. It can contain dates up to 4147 AD (2^36/60/60/24/365.25 + 1970 = ~4147). If a UUID7 is stored in a signed 128-bit integer, it could be sorted incorrectly after 3059 AD (2^35/60/60/24/365.25 + 1970 = ~3059). I think this could be a problem, at least hypothetically.

Some languages and databases have signed and unsigned integer primitives:

But some languages have the signed primitive only:

Rust language has both signed and unsigned 128-bit integers:

There's also an old discussion about 128 primitive in Go: golang/go#9455

v7: Change "clock sequence" to "counter"

We should avoid using the term "clock sequence" in the version 7 section, as its use there conflicts with how it's defined in versions 1 and 6.

[Edit: Just realized version 6 redefines the "clock sequence" behavior. See #52. I don't think that changes the basic argument here for just calling it "counter"]

Specifically, in versions 1 & 6, a clock sequence is randomly initialized and stable unless the node changes or the system clock regresses. Contrast that to version 7 where the clock sequence (counter) is initialized to zero at the start of each time interval, and increments each time a new ID is generated in that interval.

Just call it what it is: "counter".

Thought-exercise / proposal: An 86-bit clock sequence

[Edit: 'did my math wrong. "90 bit" -> "86 bit" throughout]

Observation: Clock precision is not the same as clock accuracy. Even in systems that provide micro or nano-second precision timestamps, the actual time (when compared to "true" USNO naval time) is unlikely to be accurate to more than a few 10's of milliseconds. I.e. Where UUIDs are concerned subsecond timestamp information is arguably indistinguishable from a simple counter. This is evidenced by RFC4122's provision for low-precision system clocks:

A high resolution timestamp can be simulated by keeping a count of the number of UUIDs that have been generated with the same value of the system time, and using it to construct the low order bits of the timestamp.

Observation: In version 1, the main reason for having a distinction between node and clock_seq was because node was expected to be a MAC address. However, that's proven to be a poor design choice and we all agree that a random node is now preferred. As long as there's some reasonably stable, random bits somewhere in the UUID, there's not really much reason for an explicit node field.

Observation: v1 clock_seq is initialized randomly, and incremented as needed to insure uniqueness and monotonicity. It also has desirable behavior around system clock regressions.

With the above in mind, we can expand theclock_seq field to include all of the bits fromnode as well as the bits needed for subsecond timestamp precision:, giving us a UUID layout that looks something like this:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            unixts                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|unixts |         clock_seq     |  ver  |        clock_seq      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var |            clock_seq     |                clock_seq      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           clock_seq                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

ver and var fields are as we've seen previoiusly in RFC4122.

unixts field is the 36-bit unsigned integer, unix epoch, 1-second resolution, field proposed elsewhere. (Allows for dates from 1970 to the year 4147).

clock_seq is [effectively] an 86-bit, unsigned integer, encoded most-significant bits first, with ver and var fields injected in their usual positions.

clock_seq behaves as follows:

  • clock_seq bits are all initialized randomly, with the exception of the first (high-order) bit, which is set to 0.
  • clock_seq may be re-used for successive uuids, with the following provisions (it's worth noting that this behavior is consistent with how clock_seq works in v1 uuids):
    • If unixts is unchanged (i.e. in the same clock cycle), clock_seq MUST be incrementied by one (1)
    • If unixts has advanced (i.e. a new clock cycle) and clock_seq's high-order bit is zero, no change is needed
    • If unixts has advanced (i.e. a new clock cycle) and clock_seq's high-order bit is one, reinitialize as described above
    • If unixts regresses, reinitialize as described above

(Note; The high-order bit logic prevents clock_seq from rolling over mid clock-cycle, which might otherwise break monotonicity.)

Observations

UUIDs are guaranteed to be monotonic and sort in time-creation order (maximizes DB locality)

The high-order bits of clock_seq serve the roughly the same purpose as v1's node - they provide a uniqueness guarantee across different systems for uuids generated with the same timestamp. Which bits are considered "high-order" and "stable" vs "low-order" and "dynamic" depends on the uuid creation rate and what the definition of "stable" is. For example, this table shows the number of high order bits that will remain "stable" for 1 day and 1 year at the given creation rates:

uuid creation rate # of stable bits (1 day) # of stable bits (1 year)
> 1/sec 67 59
1,000 / sec (~1K) 57 49
1,000,000 / sec 47 39
1,000,000,000 / sec 37 29

Typo: Bad time_or_node Reference in Section 4.5.4

Sequence 4.5.4 references time_or_node multiple times.

Examples:

   10.  Format by concatenating the 128-bits as: timestamp_32|timestamp_
        48|version|time_or_node|variant|seq_or_node|node
   6.   Set 12-bit time_or_node to the 12 least significant bits from
        the timestamp

Section 4.5 only describes time_or_seq and every other section under Section 4.5 only references time_or_seq.

v7: Add example layout that employs exotic precision timestamp

One of the benefits of the sub-second fraction representation is that we no longer need to stick to decimal clock resolutions like msec, usec, and nsec. We can employ some exotic precision resolutions such as ~0.25 milliseconds (1 / 4096 seconds) and ~5 microseconds. Please take a look at the following quick C code that prints the leading 48 bits of a UUIDv7:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main() {
  struct timespec tp;
  clock_gettime(CLOCK_REALTIME, &tp);
  uint64_t timestamp = (uint64_t)tp.tv_sec << 12;

  // compute 12-bit (~0.25 msec precision) fraction from nsecs
  timestamp |= ((uint64_t)tp.tv_nsec << 12) / 1000000000;

  printf("%08llx-%04llx\n", timestamp >> 16, timestamp & 0xFFFF);
  return 0;
}

The above code truncates microsecond-precision (in my environment) sub-seconds to ~0.25-millisecond precision in order to fully utilize the 12-bit subsec_a space. With the sub-second fraction representation, truncating a decimal timestamp to a convenient size is always a good option to pack the timestamp into a binary field efficiently.

I think the RFC should include this approach in one way or another because, though some implementers might invent similar techniques, many might naively implement the microsecond-precision example layout as in the draft 02 and just waste ~4 bits. Also, this approach might help resolve the endless debate over the length of sub-second fields like #24.

Draft 2: Question about V6 clock sequence change, and whether it reduces the value of the current V6 spec

Hello,

Thank you for your work on trying to shepherd this draft through the standards track. I help maintain one of the Go UUID libraries, and I'm noodling on releasing a version that implements the draft so folks can begin playing with it and offering feedback.

I was starting to target revision 1, and noticed there are some changes in pending in the revision 2 WIP draft that would make me need to substantially refactor how I'm generating the UUID, specifically the change around how the clock sequence will be handled. No sweat, since I knew I'd be working with things that are subject to change. That said I wanted to see if there was more context behind the reason for that change, knowing that the V6 UUID is supposed to be very similar to the V1.

Ultimately it made me realize that my V6 generator would share even less code with my V1 generator, since the clock sequence would need to be generated differently. So I started to wonder if there is even value in having the V6 UUID in its current form at all, if we're going to make it even more different than V1, and whether we should just make the V7 the V6 and guide everyone towards that richer UUID value.

I'd love to hear the context behind the change, and your thoughts on whether the change devalues V6 UUID's goal of being similar to the V1 format.

Sorry, please delete.

Ugh. I brain-farted on what clock sequences were. I need to gather my thoughts on this. I'll submit a new issue if/when it's appropriate.

In the meantime just delete this one, please. Sorry for the bother.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.