Hello, I understand that this may be beyond the scope of what you in

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation? about uuid6-ietf-draft HOT 4 CLOSED

uuid6 commented on May 14, 2024

Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation?

from uuid6-ietf-draft.

Comments (4)

fabiolimace commented on May 14, 2024 7

TL;DR

The 'clock sequence' and the timestamp 'count' are not the same thing. They have different goals. The 'clock sequence' is a misnomer. The timestamp 'count' is just a counter with many names.

The 'count'

Many people confuse the 'clock sequence' with the 'count' used to simulate a high resolution timestamp. RFC-4122 says the following in the fourth paragraph of Section-4.2.1.2:

A high resolution timestamp can be simulated by keeping a COUNT of the number of UUIDs that have been generated with the same value of the system time, and using it to construct the low order bits of the timestamp. The count will range between zero and the number of 100-nanosecond intervals per system time interval.

Each implementation calls the 'count' with a different name. These are some names I found:

uuids_this_tick: used to "compensate for the fact that real clock resolution is less than 100ns."
adjustment: used to ajust the timestamp, assuming "that the gettimeofday() has microsecond granularity"
time_adjust: used to "check that we're not generating uuid's faster than we can accommodate"
nsecs: used to throw error if too many uuids are requested
_clockCounter: used to "compensate inadequate resolution of JDK system timer."
counter: used to 'simulate" a high resolution clock.

Note that all of the above 'count' implementations have the same function: simulate a high resolution timestamp. This is the common formula used, assuming a system time with millisecond precision: TIME * 10000 + COUNT. The formula is derived from this paragraph:

If UUIDs do not need to be frequently generated, the timestamp can simply be the system time multiplied by the number of 100-nanosecond intervals per system time interval.

For example, a generator that relies on a system clock that ticks every millisecond must use a 'count' variable that can be initialized to zero and increments whenever the system millisecond repeats. But it has a limit of 10,000 increment operations. When the 'count' reaches this limit, the Section-4.2.1.2 says that the generator should either throw an error or wait the clock to advance. If the generator is allowed to generate more than 10,000 UUIDs in the same millisecond, the generator would be violating the limit of 10,000,000 per second per machine:

The UUID generation algorithm described here supports very high allocation rates of up to 10 million per second per machine if necessary, so that they could even be used as transaction IDs.

The clock sequence

The clock sequence has a very different goal:

For UUID version 1, the clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or if the node ID changes.

The RFC-4122 says that the clock sequence must be changed when:

the clock is backwards;
the clock might have been set backwards;
the generator is not sure if the clock is backwards or not;
the node identifier changed;
node identifiers are getting mixed up.

The clock sequence 'can just be incremented' (like a sequence). But it is a MAY, not a SHOULD. I've found some implementations that always randomize the clock sequence here and here.

I think the word 'sequence' is the cause of the confusion. Although the 'clock sequence' behaves like a sequence when it is incremented, it is a random number. So, the UUID v1 has not just two parts, namely time (100-nanos of the Gregorian epoch) and place (IEEE address), but THREE: time, place and some entropy.

Some people add another task to the 'clock sequence': to be an extension of the timestamp. I don't see any problem with this, but it's not explicit in RFC-4122. Read this text from Wikipedia:

A 13- or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node... Since the time and clock sequence total 74 bits, 274 (1.8×1022, or 18 sextillion) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID.

If a developer wants to add the task of timestamp extension, he/she just needs to replace the comparison operator '<' with '<=':

    if (timestamp <= last_time)
        clockseq++;

Thanks for reading! :)

from uuid6-ietf-draft.

kyzer-davis commented on May 14, 2024

If I read your request correctly:
You want to update some text around 4.2.1.2. System Clock Resolution to explicitly detail the method of incrementing the clock sequence if the previous timestamp is less than or equal to the current timestamp being used for the current UUIDvX generation vs stopping/blocking/stalling/erroring UUID generation altogether.

Else new timestamp or unknown previous timestamp:
Set clock sequence to random (UUIDv1/6) or explicit 0 start (UUIDv7)

from uuid6-ietf-draft.

theckman commented on May 14, 2024

You want to update some text around 4.2.1.2. System Clock Resolution to explicitly detail the method of incrementing the clock sequence if the previous timestamp is less than or equal to the current timestamp being used for the current UUIDvX generation vs stopping/blocking/stalling/erroring UUID generation altogether.

@kyzer-davis I think your summarization is almost right, with one minor comment. I wasn't thinking of it in a sense of "versus" stalling when I wrote up this issue, but this would definitely reduce the number of situations where an implementer following the RFC as written would need to do that.

For UUIDv1, it seems extremely unlikely a single stateful generator could generate enough UUIDs (4096) to loop back to the same clock sequence value without the timestamp having changed on most systems. There may be systems in use that don't have a precise clock and could generate 4096 UUIDs before the timestamp changes, so that assumption may not hold true. Considering that, we may still want to include language that explicitly guards against the clock sequence looping back around and generating duplicate UUIDs. I think either erroring or stalling when you detect you're about to generate a duplicate clock sequence is appropriate, so this is why I wouldn't summarize it as a "versus" but more of a "yes and".

I do have a similar open question (#40) as to whether we should error/stall UUIDv7s when the sequence clock is about to roll over. Maybe we want to unify the language for both to have the same behaviors. With UUIDv1/6 starting at a random value, and UUIDv7 starting at 0.

from uuid6-ietf-draft.

bradleypeabody commented on May 14, 2024

@fabiolimace This is a terrific explanation! I'm not sure exactly how, but this clarification should absolutely appear in the new draft.

from uuid6-ietf-draft.

Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation? about uuid6-ietf-draft HOT 4 CLOSED

Comments (4)

TL;DR

The 'count'

The clock sequence

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent