Giter Site home page Giter Site logo

Comments (105)

bradleypeabody avatar bradleypeabody commented on July 23, 2024 3

Is there any way to distinguish a UUID with monotonic entropy (or whatever) which has been generated correctly, from one which has been generated incorrectly? I don't think so, right?

Correct. Outside the implementation used to generate the UUID value, there is no way to tell for any single UUID. (Just the resulting monotonic property of the sort order of a set of UUIDs generated with the same timestamp value.)

Is it just an implementation guide?

Unless someone wants to argue otherwise, yes, the entire point of this section in the spec is as an implementation guideline for those implementations that wish to implement monotonicity. It is not intended to be a requirement for all implementations. Two important reasons for this are a) not all implementations require monotonicity, and b) there are an increasing number of applications which cannot feasibly provide it anyway because UUIDs are being generated on different systems (e.g. a cluster of web servers each generating records separately, where the cost of providing true monotonicity across the cluster is not worth the effort).

I strongly believe that implementors of UUID generators should be free to implement monotonicity in essentially any way they see fit, since I don't see any one algorithm that will suit a majority of use cases (I can happily list half a dozen cases with radically different requirements if this is in question), and I don't see any material difference in the resulting values except to meet application specific needs - i.e. "UUIDs generated for my application absolutely must be monotonic when generated for xyz purpose" - fine, go ahead and do that, by all means. This does not mean everyone else must do the same. So the intention of the text is (IMO) just "for some cases monotonicity matters, if this is you (or you're making a library and want to include this option), here's a sensible recommendation on how it can be done, good luck".

from uuid6-ietf-draft.

fabiolimace avatar fabiolimace commented on July 23, 2024 3

Furthermore, with this method the actual increment of the counter MAY be a random integer

I also think MAY can be replaced by SHOULD. The "plus one" is a special case of "plus one a random number in a range", but it is obviously no longer random.

from uuid6-ietf-draft.

edo1 avatar edo1 commented on July 23, 2024 3

There is no need to list all possible options if some of them are obviously bad. It is better to use Occam's razor.

Completely agree.

@kyzer-davis, IMO current standard version is too loose, it is unclear and confusing.
You seem to be assuming that each application uses its own customized UUID generator. In the real world, almost all applications use a generic implementation from some library, this implementation should "fit all".

  1. Any reason to keep methods 1 and 2, types A and B?
  2. Why is the recommended counter size not specified?
  3. Why not use SHOULD here?
    Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized.
    SHOULD allows other implementations to be considered RFC-compliant, but makes it clear which approach is recommended.
  4. Why not use SHOULD here?
    Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:
    Compare the current timestamp against the previously stored timestamp.
    If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method and type.
    If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).
  5. The recommended approach to handle clock drift is not specified.
    Implementations SHOULD check if the the currently generated UUID is greater than the previously generated UUID. If this is not the case then any number of things could have occurred. Such as, but not limited to, clock rollbacks, leap second handling or counter rollovers
    My proposal was to reuse the last timestamp+counter if the jump is small enough (less than ≈2.5s), it will handle leap second and/or small clock drift due to NTP, VM migration, and so on.
  6. The recommended approach to handle counter overflow is not specified.
    My proposal was to increment the timestamp (it shouldn't hurt because the suggested timestamp resolution was 100ns).
    ULID approach is to fail.

from uuid6-ietf-draft.

fabiolimace avatar fabiolimace commented on July 23, 2024 3

SHOULD vs MAY is always tricky. I believe somebody asked me to change this to SHOULD or vice versa. Maybe it was @fabiolimace?

@kyzer-davis , I just wanted to suggest this change below as the Plus One increment type is a special case of Plus N (random increment).

With this increment the actual increment of the counter MAY SHOULD be a
random integer of any desired length larger than zero. ...

I gave up on the idea of creating a Change Proposal issue because the deadline for submitting the draft was running out. We all want to see our collective effort specification published as a standard as soon as possible.

However, it is important that we correct small problems as they are discovered. So I want to take the opportunity to suggest further changes. Forgive me for saying this just now. I was looking forward to the submission of the new draft.


I really liked the two counter methods and the two increment types. The combinations produced by them seemed to resolve all the differences of opinion that there were. I mean if an implementer wanted to use method 1 with type 2 they would be free to do so with full approval of the specification.

However, while implementing a GitHub Gist to test the four combinations of counter methods and increment types, I realized that maybe it's not necessary to have 4 combinations. I'll explain.

  1. The Method 1 and Type A combination seems to be accepted by many of us, although there are small details that are not unanimous, such as the rollover guard bit.

  2. The Method 1 and Type B combination does not seem to be of much practical use, as it greatly reduces the increment space and increases the risk of rollover. Furthermore, it is identical to the Method 2 and Type B combination if we put the two side by side.

  3. The Method 2 and Type A combination seems to be preferred by those who like the monotonic ULID. This combination has the problem of being easy to guess the next value, but it has the advantage of generating UUIDs at a very high speed, for applications that need it.

  4. The Method 2 and Type B combination seems to be preferred by people who don't like the Method 1 and Type A combination. In practice both combinations have the same effect. So the preference of one or the other will depend on the ease of implementing in a given programming language.

So I would like to propose that there are only 2 combinations: Method 1 with Type A and Method 2 with Type B. With that I hope to remove the cons and just keep the pros. Of course you are all free to disagree.

The changes I want to propose in the text are the ones below. I just copy-pasted and changed some modal verbs.

Fixed-Length Dedicated Counter Bits (Method 1):

This references the practice of allocating a specific number of
bits in the UUID layout to the sole purpose of tallying the total
number of UUIDs created during a given UUID timestamp tick.
Positioning of a fixed bit-length counter SHOULD be immediatly
after the embedded timestamp. This promotes sortability and
allows random data generation for each counter increment. With
this method rand_a section of UUIDv7 MAY SHOULD be utilized
as fixed-length dedicated counter bits. In the event more counter
bits are required the most significant, left-most, bits of rand_b MAY
be leveraged as additional counter bits.

With this increment logic the counter method is incremented by one
for every UUID generation. When this increment method is utilized
with Fixed-Length Dedicated Counter the trailing random generated
for each new UUID can help produce unguessable UUIDs.
(Copy-pasted from Type A)

Monotonic Random (Method 2):

With this method the random data is extended to also double as a
counter. This monotonic random can be thought of as a "randomly
seeded counter" which MUST be incremented in the least significant
position for each UUID created on a given timestamp tick.
UUIDv7's rand_b section SHOULD be utilized with this method to
handle batch UUID generation during a single timestamp tick.

With this increment the actual increment of the counter MAY SHOULD
be a random integer of any desired length larger than zero. When
this increment method is utilized with Monotonic Random Counters the
counter ensures the UUIDs retain the required level of unguessability
characters provided by the underlying entropy.
(Copy-pasted from Type B)

The increment of the countar MAY be 1. However when this increment
method is utilized with Monotonic Random the resulting values are
easily guessable. Implementations that favor unguessiblity SHOULD NOT
utilize this method with the monotonic random method.
(Copy-pasted and adapted from Type A)

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 2

I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024 2

implementers usually utilize counters to ensure monotonicity, isn't it?

Using plain counters (random_segment += 1) to implement monotonicity is a naïve approach which can work in a pinch but probably shouldn't be a sanctioned recommendation. I hate to keep harping on the point but the approach taken by oklog/ulid parameterizes the collision probability vs. entropy exhaustion within the time bounds, and is AFAICT a strictly superior approach.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024 2

@fabiolimace Thanks!
@peterbourgon I hit it in my latest commit. Let me know if the text looks good here: https://uuid6.github.io/uuid6-ietf-draft/#name-monotonicity-and-counters
@LiosK Peter's method for unguessibility in Monotonic random is sound from my point of view so I did not add any text pointing back to fixed-length counter method. Both methods solve unguessibility in slightly different ways.
@sergeyprokhorenko I added LiosK's suggestion for folks that care about sortability.


All, I can't say enough how much I appreciate the feedback and support while authoring this text. As a group we always think of new ways to improve what we are doing with this specification.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 2

I don't disagree to the randomly-incremented counter approach because it's another sensible approach to ensure some levels of unguessability, but I don't really like the description of the current draft that generally discourages the fixed-length counter approach and pushes implementers to the naive plus-one monotonic random approach, even though Method 1 and 2 are now essentially the same approach. I would make a few proposals to the current draft:

With this method the rand_a sectionrand_a and rand_b sections of UUIDv7 MAY be utilized as fixed-length dedicated counter bits.

This line recommends an unreasonably short counter length so should be revised or omitted.

This is similar to the Fixed-Length Dedicated Counter Bits method described in the previous bullet but does not suffer from rollover issues, length problems, or wasted entropy bits plagued by fixed-length counter bit positions.

Rollover is a common issue over Method 1 and 2 and wasted entropy bits are no longer an issue of fixed-length counters because they too are initialized randomly. This line sends a wrong message that the discussion about rollovers and seeding is not relevant to Method 2, which is not the case. Implementers must handle rollovers with Method 2 and they can also adopt the portion-counter-initialization approach for Method 2.

As such, UUIDv7 SHOULD utilize this method to handle batch UUID generation during a single timestamp tick.

Now I don't think Method 2 is a clearly superior approach to Method 1 (because they are essentially the same).

Furthermore, with this method the actual increment of the counter MAYSHOULD be a random integer of any desired length instead of using simplistic "plus one" logic found with Fixed Length Dedicated Counters.

The randomly-incremented approach should be considered default if the whole 74 bits are dedicated to a counter because the plus-one approach damages to unguessability for no reason.

However, when the timestamp has not incremented; the counter SHOULD be frozen and incremented by one or by a random integer larger than zero. When utilizing a randomly seeded counter alongside Method 1; the The random MAY be regenerated with each counter increment without impacting sortability.

The randomly-incremented counter approach is applicable to any types of counters. So is the regeneration.

Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision. General Guidance is that the counter SHOULD be at least 12 bits but can be longer. Care SHOULD also be given to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch. General guidance is that the counter SHOULD be at least 12 bits and at most 40 bits.

I'd provide a clearer guidance here for unguessability.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 2

I'd provide a couple of view points about recent updates. I don't expect these to be incorporated into the RFC but they will be helpful to make my points clear.

Plus-one approach vs. random increment approach

I don't necessarily disagree to the random increment approach because I did choose that a decade ago, but I'd point out that the random increment approach is not always superior to the plus-one approach. It's hard to analyze the statistical characteristics of the random increment approach, but based on the previous discussion and this Zendesk article, the random increment approach is likely to have weaker collision resistance than the plus-one approach if we measure collision resistance by the probability of at least one collision. Unguessability obtained by the random increment approach is not a free gift but comes at the sacrifice of collision resistance in the distributed use cases.

That being said, given the 74-bit counter/entropy space, it should be reasonable to spare, say, 32 bits for unguessability. The 74-bit counter with 32-bit random increment still provides stronger "collision resistance" (as defined above) than the entire 74-bit randomness. This is just a matter of finding a right balance, and the 32-bit random increment is likely a right balance.

42-bit fixed-length randomly seeded plus-one counter + trailing 32 bit random vs. 74-bit monotonic random + 32-bit random increment

Again, it's hard to analyze the statistical characteristics of the random increment approach, but the above two sets of techniques come out very similar bits; the leading 41 bits are relatively constant and the trailing 33 bits are almost random. So I assume they have similar characteristics in terms of collision resistance and unguessability. That's why I argued the fixed-length approach should be treated at least as prominent as the monotonic random in the draft RFC.

That said, IMO the 74-bit monotonic random + 32-bit random increment approach is worse than the fixed-length approach because of the following two reasons:

  • The monotonic random approach requires 74-bit arithmetics, which need BitInt math and software emulation in the vast majority of languages and hardware, respectively. The 42-bit counter fits in uint64, int64, and the safe int range of double.
  • It's more complex with the random increment approach to estimate the average number of IDs that can be covered by a given counter bit-length. Also, with the plus-one approach it's very easy to calculate the guaranteed number of IDs with the (n - 1)-bit random initialization technique, which is not the case at all with the random increment approach.

I usually ignore complexity because library authors are generally supposed to be professionals who make things done; however, I don't see any strong technical superiority of the 74-bit monotonic random + 32-bit random increment approach that justifies the above complexity.

Anyway, I don't necessarily disagree to the monotonic random + random increment approach because at least it is not unsafe. But this approach should never be introduced more prominently in the new RFC than the fixed-length + trailing random approach.

Priority of unguessability

I've been emphasizing unguessability, but I'd be clear that unguessability is not the priority of UUIDv7. We can't definitely say "XX-bit random provides sufficient unguessability" or something. Even 32 bits are not likely enough if an application is exposed to off-line attacks. Again, it's a matter of right balance. A 74-bit counter space is an overkill for most of applications, and thus I would recommend to spare some bits for unguessability by default so implementers do not expose a potential attack surface for no reason.

Conclusion

Implementers usually face many matters of finding right balances, so it's great if we can agree on and provide a sensible set of default parameters. But if we can't, I would like the new RFC to put guardrails around unsafe choices and extreme approaches to prevent implementers from falling in common pitfalls.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024 1

Please include this in version 7:

A counter MAY be placed instead of the left side of the random section, immediately after the timestamp. The counter is designed to keep UUIDs monotonic within a timestamp step, by default within a millisecond. The default counter length is 16 bits for the millisecond timestamp. The counter MUST be reset every timestamp step. The counter MUST be frozen just before overflowing up to the next timestep step. For each UUID instance, the random section MUST be updated despite the use of the counter.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024 1
  1. The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.

Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.

l beleive that all random bits of every uuid MUST be randomized despite using counter. Such monotonic counter scheme is secure.

  1. If random part is close to 111...111, then you have too short counter

Again, true of any counter scheme.

No, if we have 16 empty bits of separate counter for every millisecond.

Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 1015 chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.

You are correct.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024 1

@kyzer-davis

  1. Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.

  2. I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024 1

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

@kyzer-davis Yes please. That would be great!

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024 1

@sergeyprokhorenko ...did you even check #76? With this update both Method 1 and Method 3 are recommended counter solutions for v7E.

I deleted my comment

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 1

I am against discrimination in relation to (16 bit) zero-starting counter.

This is a SHOULD scope so zero-starting counters are permitted. Zero-starting counters are so dangerous that they should not be recommended explicitly.

metadata

metadata stuff is application-specific and should generally be used in v8 only IMO. I don't have a strong objection (except new var value) against the UUIDv7 design currently compiled in the draft RFC, and thus the above proposal does not intend to alter the current UUIDv7 design.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024 1

@LiosK,

but some implementers might get lost

I agree, I have iterated on that section a ton. It is close but I agree that it needs a bit more shape.

  • Personally I am leaning towards dropping method 2 altogether since it is basically method 3. That should help reduce some text and avoid some confusion.
  • As for the section, I mauled it over the weekend and have some ideas on how to reorganize. I mainly need to separate the methods from the method sub-topics. That will go a long way to promote readability.

@sergeyprokhorenko, do you think that line is that bad? I can yank it as the section conveys the same info with or without it.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024 1

@LiosK,

I updated counters section once more in #85 to drop method 2 and re-organize the text for readability purposes.
Let me know what you think and let's discuss here.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 1

Just one bit and shorter randomly seeded counter will save the day

Deal! Awesome. I will draft a change proposal to the current text later.

It's actually very important to let implementers realize that an n-bit counter can be initialized by any of n-bit random, (n - 1)-bit random, (n - 2)-bit random, ... or 0-bit random (i.e. zero) and they can choose the ones most appropriate to their applications.

using bit by bit of UUID to search using the binary search algorithm

That's actually a very good point.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 1

@kyzer-davis, I would propose the following change to "Fixed-Length Dedicated Counter Seeding" section. This proposes to initialize an n-bit counter at n-bit random numbers by default and at (n - 1)-bit random numbers (instead of zero) in the high work load scenarios where many IDs are generated per millisecond and/or the counter rollover should be avoided.


Implementations utilizing fixed-length counter method MAYSHOULD randomly initialize the counter with each new timestamp tick.

...

Implementations utilizing fixed-length counter method MAY also choose to initialize the counter at zero to ensure the full bit-space is utilizedsmaller-bit random numbers than the counter length (e.g. initialize a 24-bit counter at 23-bit random numbers) to ensure a certain number of possible increments is guaranteed and help avoid counter rollovers. This approach has less entropy and more guessibility but ensures the most of the counter bit spacetypically claims 1-bit entropy but in effect ensures no counter rollover occurs.


I would propose the following change to "Fixed-Length Dedicated Counter Length" because the above proposal obscures the reason why too long counters should be avoided. Also, I think now it's safe to recommend a longer counter than 24 bits because implementers will not easily choose a zero-starting counter thanks to the above proposal.


Care MUST be taken to select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision SHOULD require a larger counter than a timestamp with nanosecond precision. In addition, care MUST be taken to leave sufficient trailing bits for randomness to ensure adjacent UUIDs are not predictable. If the entire rand_a and rand_b fields of UUIDv7 are utilized as a single counter, the next UUID after one UUID in a given millisecond is easily predicted by adding one to the previous UUID, which is a situation that should generally be avoided. General Guidance is that the counter SHOULD be at least 12 bits and no more than a maximum of 2440 bits.


I'd also like to propose to omit Method 2 because Method 2 is essentially a 74-bit (v7) / 72-bit (v7E) fixed-length counter, so all the discussions about fixed-length counters apply. Plus, Method 1 isn't that different from Method 2 any longer after the above proposals because now Method 1 also resets counters at random numbers. I am also concerned that Method 2 pushes implementers to the same pitfall as that ULID Monotonic has fallen: perfectly predictable adjacent IDs like the following:

062273da-cfc3-75dd-a8ef-2ca2f3d45f71
062273da-cfc3-75dd-a8ef-2ca2f3d45f72
062273da-cfc3-75dd-a8ef-2ca2f3d45f73
062273da-cfc3-75dd-a8ef-2ca2f3d45f74

Wording might need to be reviewed and changed; English is not my primary language. Thanks.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024 1

I see some folks on, I am in here making edits to the XML. My current verbiage to add at the end of "Monotonic Random (method 2)"

Furthermore, with this method the actual increment of the counter MAY be a random integer of any desired length instead of using simplistic "plus one" logic found with Fixed Length Dedicated Counters. This random increment of the counter ensures the UUIDs retain the required level of unguessability characters provided by the underlying entropy.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 1

Method 1 proposes to utilize say 40 bits as a randomly initialized counter and leave the remaining 32 bits for entire randomness renewed every time. It's not guessable, is it?

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024 1

@LiosK, I merged the changes which are live now via https://uuid6.github.io/uuid6-ietf-draft/#name-monotonicity-and-counters

The text should now describe:

  • Counter Method 1, Increment Type A
  • Counter Method 1, Increment Type B
  • Counter Method 2, Increment Type A
  • Counter Method 2, Increment Type B

I removed the harsh verbiage against fixed-length counters. Both methods solve the problems. The bits being tallied are just in two different positions within the UUID layout. We are describing the solutions and their possible upsides/downfalls/challenges. Implementations pick the one that seems correct for them.

I also made the changes to the fixed-length seeding, length, rollover parts as per your comment #60 (comment)

Lastly, I abstracted the "when to increment" sub-section to point to Counter Methods and Increment Types.


@sergeyprokhorenko, the text reads:

applications that care about absolute monotonicity and sortability SHOULD freeze the counter

This should be sufficient to cover the topic.


Edit: Personal note, I have always been a proponent of the fixed-length counter method. Drafts 01 and 02 never had monotonic random because of this personal bias. Any harsh verbiage against fixed-length counters in the previous stages of Draft 03 was not intended.

As we have iterated via this thread, both methods have proven sufficient at solving this problem. As a result in this latest revision both should be presented as equal solutions. If this is not the case please suggest an edit.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024 1

As we have iterated via this thread, both methods have proven sufficient at solving this problem. As a result in this latest revision both should be presented as equal solutions. If this is not the case please suggest an edit.

@kyzer-davis, Now the text of section Monotonicity and Counters is confusing. It mixes the new and the outdated, the great and the bad. There is no need to list all possible options if some of them are obviously bad. It is better to use Occam's razor.

I'm convinced that both methods 1 and 2 should be dropped in favor of paragraph Fixed-Length Dedicated Counter Seeding. Both methods 1 and 2 are now obsolete thanks to LiosK's great idea.

We also need to drop Random Increment (Type B) in favor of Plus One Increment (Type A). See rationale from LiosK: #60 (comment) and from me: #60 (comment)

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024 1

@edo1, great feedback on the section.

On the topics:

  1. This section of document is written for any implementers (custom or those writing a library). But you are right, many will likely use a library and somebody has already made these decisions for them. However, the goal from Brad and I is to detail how to solve the problems so that the implementer of said library can make their choices based on our research here.
    1. Counter Methods 1 and 2 both solve the batch UUID creation problem when timestamp has not incremented. Each with some pro/cons that I want to clearly define.
    2. Increment Types A and Type B both provide some other pro/cons.
    3. There are many advocates for each; removing one will likely cause negative feedback from one group I am sure.
  2. We have a counter range specified. Different languages have different speeds and may produce more/less than others. They can test and choose a length that fits them.
  3. SHOULD vs MAY is always tricky. I believe somebody asked me to change this to SHOULD or vice versa. Maybe it was @fabiolimace?
  4. Same as previous bullet.
  5. I like this approach. I can add it to Counter Rollover Handling. Could you open a new Change Proposal issue for me to track?
  6. Again, N-1 works, increasing timestamp as rollover guard works. They both solve the problem. Neither are wrong. Similar to Method 1/2 and Increment Type A/B. I can describe this rollover guard along with N-1 in Counter Rollover Handling section and let an implementation choose. To differentiate them from Counter Method and Increment Type I can define these as Rollover Guard Option i and ii. Same as previous. If you could open a Change Proposal I can get this into Draft 04.

Plus, remember, everything in this section also pertains to UUIDv8 where somebody may want to do it their own way. Rather than spend a ton of cycles on R&D; we have already vetted most problems/solutions and documented the pros/cons. They can implement what they need and be on their way.


Edit: Saw your other comment on timestamp length thread:
I agree, implementers will likely not expose every single knob from the best practices section. They will likely have UUIDv7 libraries will likely have the minimum required parts:

  • Timestamp
  • Entropy
  • This is is the fastest as you said (uuid_generate_v7_fast())

And MAY layer in a counter of parts:

  • Counter Method
  • Increment Type
  • Rollover Guard Option
  • of which will make things slower but improve collision resistance i.e uuid_generate_v7()

The folks using those libraries will take what they get and not need to think about all the other "best practices" defined in that section.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024 1

I'd rather suggest to subsume the monotonic random under the fixed-length counter because the monotonic random is just a 74-bit variation of fixed-length counters, but I don't have a strong opinion here as long as unsafe choices are properly quarantined.

it has the advantage of generating UUIDs at a very high speed

It's a very good point. I've been observing that UUIDv7 with counter is much faster than that without counter because even the blazing fast arc4random implementations are still much slower than incrementing a counter.

from uuid6-ietf-draft.

broofa avatar broofa commented on July 23, 2024

A counter MAY be placed instead of...

Is there any benefit to this over simply treating the entire rand field as a counter, the way ULID does? In which case, the counter is effectively at the very end ("far right") side of the random field.

The advantage of ULID approach is that counter rollover so statistically unlikely as to not warrant mention. Whereas placement at the left side of the field requires reasoning about creation rates, counter sizes, and rollover error cases.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

Is there any benefit to this over simply treating the entire rand field as a counter, the way ULID does?

  1. The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.
  2. If random part is close to 111...111, then you have too short counter

from uuid6-ietf-draft.

broofa avatar broofa commented on July 23, 2024
  1. The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.

Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.

  1. If random part is close to 111...111, then you have too short counter

Again, true of any counter scheme.

Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 1015 chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

oklog/ulid.Monotonic provides monotonicity parameterized by a user-provided inc value. Higher inc values provide less predictable output at the cost of fewer overall possible outputs (i.e. higher probability of collision).

IMO — specs should not take any stance on any property of random portions of IDs. Specs should define them as random, and if implementations want to provide monotonicity within some bounds, that's above-and-beyond spec guarantees.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@peterbourgon If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either. Therefore, the specification should contain at least a more or less satisfactory solution as an option.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.

@LiosK I like your solution very much. I would add freezing just before overflowing the counter, although this is unlikely.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either.

I don't think I agree, but I also don't think this is a bad thing. In fact, I think it's correct! The spec defines the random portion of the ID to be random. If those bytes happen to be monotonic, that's a detail of the implementation.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@peterbourgon @sergeyprokhorenko
I just merged Draft 03 into master.
Could you take a look at the latest Monotonicity and Counters section? I believe it already hits on both of your points.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.

@peterbourgon This is not true. It is entirely possible to prevent errors during the generation of UUIDs. It is better to prevent errors if possible.

If you mean collisions of UUIDs, then such errors do not occur during generation, but when writing to the table.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@sergeyprokhorenko,

I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.

I assume you are referencing rand_a, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

I simply say you SHOULD use a monotonic random and for fixed-length counters you SHOULD use UUIDv8 because of the heated debates around counter length, rollover handling, and initialization of the counter have so many variations based on user input that it is better fit for UUIDv8.

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.

The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.

@peterbourgon It sounds like "run out of numbers" :))

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

I assume you are referencing rand_a, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

@kyzer-davis No, I mean #60 (comment) I like it very much

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)

System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.

@kyzer-davis It seems to me that the specification should suggest to developers a safer option, and not an unexpected pause in the generation of UUIDs when the counter overflows. Overflows always remind me of the Ariane flight V88 disaster

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)

System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

This is about the low quality of pseudo-random numbers, but not about their deficit. Poor quality generators generate a repeating sequence of pseudo-random numbers, which leads to collisions.
The problem of low quality of pseudo-random numbers is already solved in the spec: https://uuid6.github.io/uuid6-ietf-draft/#name-unguessability

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@sergeyprokhorenko

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

@kyzer-davis Yes please. That would be great!

Let me put together a Change Proposal template for this and get it in Draft 03 Phase 3 PR.

I assume you are referencing rand_a, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

@kyzer-davis No, I mean #60 (comment) I like it very much

I apologize, somehow I missed this comment entirely. It does seem like a valid approach but seems pretty similar to Fixed-Length Dedicated Counter Bits (Method 1) at the moment. The difference is in semantics. My text states to dedicated a portion of bits after the timestamp; this text states to dedicate a portion of the most significant, left-most, part of the random. 9/10 times they will likely be the same thing.

I could update Fixed-Length Dedicated Counter Seeding: to describe this practice. I took a quick swing below:


Fixed-Length Dedicated Counter Seeding:

  • Implementations utilizing either fixed-length counter method MAY randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented; the counter SHOULD be frozen and incremented by one.
  • When utilizing a randomly seeded counter alongside Method 1; the random MAY be regenerated with each counter increment without impacting storability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments.
  • A randomly seeded counter alongside Method 2 is more-or-less the same as Monotonic Random (Method 3).
  • Implementations utilizing either fixed-length counter method MAY also choose to initialize the counter at zero to ensure the full bit-space is utilized and help avoid counter rollovers. This approach has less entropy and more guessibility but ensures the most of the counter bit space.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

I deleted my comment

As did I :)

Let me know what you think about that proposed "Fixed-Length Dedicated Counter Seeding:" text.
If it looks good I can write up a change proposal template and possibly sneak it in the open PR.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

@sergeyprokhorenko

This is about the low quality of pseudo-random numbers, but not about their deficit.

I believe you're mistaken. Both are issues, but I'm pointing out the fact that entropy — crypto-grade and pseudorandom both — is a resource which can be depleted. If something consumes that resource, then it necessarily must deal with the possibility that there is no entropy available in a given period of time.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@peterbourgon You probably mean the insufficient speed of generating pseudo-random numbers and UUIDs. I think that multi-threaded generation and buffering of pseudo-random numbers could solve this problem.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

No, I don't mean that either :) I mean literally that entropy is a stream of bytes which can pause or stop altogether for unbounded periods of time. Please read the linked article.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

I feel the current content of Monotonicity and Counters section to be a little bit confusing. It's basically very permissive, so personally I feel okay with this because almost everything is permitted, but some implementers might get lost.

I think the decisions left up to implementers by this section could be summarized to the following two points, if the spec clearly required that the counter field be placed right after the timestamp and the remaining bits be filled with random:

  1. Zero-starting counter or randomly seeded counter
  2. Length of counter bits from 0 bits to 72 bits

Under this framework, the three methods can be described as:

  • Method 1: 12-24-bit zero-starting or randomly seeded counter
  • Method 2: 72-bit randomly seeded or partially (i.e. significant bits frozen for a given timestamp tick) randomly seeded counter
  • Method 3: 72-bit randomly seeded counter
  • Single-node without batch: 0-bit counter

By #60 (comment), I meant the spec should clearly state the counter should generally be limited to 40 or 56 bits. Although unguessability is definitely not the priority, I just don't want to see implementers carelessly leave a potential attack surface.

Then, how about omitting or amalgamating the Method 1, 2, 3 description and reorganizing the related sections like the following?

4.3. UUID Version 7E

Current content + rand_a and rand_b together MUST be dedicated to 0-72-bit counter and remaining random, in this order.

5.2. Monotonicity and Counters

AS IS

Fixed-Length Dedicated Counter Length

Current content + UUIDv7 SHOULD utilize 0-40-bit counter.

Fixed-Length Dedicated Counter Seeding:

Current content + UUIDv7 SHOULD utilize randomly seeded counter; UUIDv8 may utilize zero-starting counter.

Fixed-Length Dedicated Counter Rollover Handling

AS IS

Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:

AS IS

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

Current content + UUIDv7 SHOULD utilize randomly seeded counter; UUIDv8 may utilize zero-starting counter.

@LiosK I am against discrimination in relation to (16 bit) zero-starting counter.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

Current content + rand_a and rand_b together MUST be dedicated to 0-72-bit counter and remaining random, in this order.

@LiosK
I propose this text:

Current content + rand_a and rand_b together (72 bit) MUST be dedicated to counter, random and metadata, in this order.

If zero-starting counter is used then the segments MUST have following lenth:

  • 16-24-bit zero-starting counter
  • 40-56-bit random
  • 0-16-bit metadata

If randomly seeded counter is used then the segments MUST have following lenth:

  • 24-40-bit randomly seeded counter
  • 32-bit random
  • 0-16-bit metadata

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

Zero-starting counters are so dangerous

@LiosK
There is nothing dangerous in zero-starting counters

metadata stuff is application-specific and should generally be used in v8 only IMO

There is no logic here. IMO is not a rationale. Populating the metadata is simply specified by the function parameter, and so the metadata does not need to be detailed in the UUIDv7 specification. We want to get rid of composite keys, don't we? If metadata is prohibited in the UUID, then it will have to be dragged through additional fields, increasing chaos in the database.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

There is nothing dangerous in zero-starting counters

A zero-starting counter sacrifices entropy unless a constant number of IDs are generated per millisecond and an accurate estimate of the number is available. That doesn't happen in an ordinary use case.

metadata stuff is application-specific

is the rationale. Application-specific metadata damage to the universal uniqueness across applications and thus should be used with v8, which is application-specific by design. This is the same reason as why machine IDs and shared knowledge schemes are not utilized by v7.

We want to get rid of composite keys, don't we?

No. It's much better to use a separate metadata field and composite key than to parse UUID to extract the metadata. And, if you don't extract the metadata from UUID, then you don't either need to encode them in UUID.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

There is nothing dangerous in zero-starting counters

A zero-starting counter sacrifices entropy unless a constant number of IDs are generated per millisecond and an accurate estimate of the number is available. That doesn't happen in an ordinary use case.

sacrifices entropy? You forgot about 40-56-bit random

metadata stuff is application-specific

is the rationale. Application-specific metadata damage to the universal uniqueness across applications and thus should be used with v8, which is application-specific by design. This is the same reason as why machine IDs and shared knowledge schemes are not utilized by v7.

damage to the universal uniqueness? Uniqueness is not going anywhere. You forgot about random again

We want to get rid of composite keys, don't we?

No. It's much better to use a separate metadata field and composite key than to parse UUID to extract the metadata. And, if you don't extract the metadata from UUID, then you don't either need to encode them in UUID.

Let's leave it up to implementers to decide how many fields are more convenient for them, and which architecture provides better information system performance. Parsing is necessary in rare cases, while the use of a composite key always slows down the information system.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@LiosK You'd better protest sacrificing 8 bits of randomness to a completely useless var_ver. In addition, var_ver breaks rand into two segments, which complicates the UUID generation algorithm. It would be better to make var_ver optional, filling it with random content by default.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

You'd better protest sacrificing 8 bits of randomness to a completely useless var_ver.

I did it for the one bit wasted by the new var. My allegiance to the universal uniqueness of v7 is so consistent that I am against all of the new var, zero-starting counter, and metadata field.

Let's leave it up to implementers to decide how many fields are more convenient for them, and which architecture provides better information system performance.

v8 is there for such users.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

Let's leave it up to implementers to decide how many fields are more convenient for them, and which architecture provides better information system performance.

v8 is there for such users.

such users are the most users, and all advanced users who use Data Vault Modeling (most banks) or Anchor Modeling (leading banks and marketplaces). All these users suffer from the use of composite keys.

v8 is not recommended:
UUID version 8E provides an RFC-compatible format for experimental or vendor-specific use cases... UUIDv8E SHOULD only be utilized if an implementation cannot utilize another UUID in this document or [RFC4122]
This is the stigma, untouchables caste, unwanted child. The authors outlined the most primitive and non-life scenario for v8. Therefore v8 will not be implemented by DBMS developers. v8 will never be used, and it's safe to drop this concept from the spec. Your advice to use v8 is just a polite form of refusal.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

High-level question, please redirect me if this isn't the right forum —

Is there any way to distinguish a UUID with monotonic entropy (or whatever) which has been generated correctly, from one which has been generated incorrectly? I don't think so, right? Assuming not, why are these kinds of details part of the spec at all? What purpose is served by defining them here? Is it just an implementation guide?

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

here's a sensible recommendation on how it can be done, good luck

Love this.

Monotonicity must be optional as it requires some sort of coordination that is just an unnecessary complexity for distributed applications. That said, many use cases still need monotonicity, and counters involve several caveats. That's why I think a sensible recommendation should be provided in the spec. I just don't want to see readers carelessly invent a vulnerable implementation such as:

Inappropriate use of zero-starting counter that simply wastes entropy for no reason:

062273a6-93df-7000-0000-31912d9c4cd9
062273a6-a608-7000-0000-3c9592dd85f9
062273a6-bf3c-7000-0000-77c3d673fbe0
062273a6-dd45-7000-0000-a750c1a3e1ec

Tail counter that comes out perfectly predictable results:

062273da-cfc3-75dd-a8ef-2ca2f3d45f71
062273da-cfc3-75dd-a8ef-2ca2f3d45f72
062273da-cfc3-75dd-a8ef-2ca2f3d45f73
062273da-cfc3-75dd-a8ef-2ca2f3d45f74

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

Is it just an implementation guide?

@bradleypeabody's point hits the nail on the head!

Adding a quote I saw somewhere during my IETF onboarding research for how to define an internet standard draft:

"Internet standards tended to be those written for implementers. International standards were written as documents to be obeyed"

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

Inappropriate use of zero-starting counter that simply wastes entropy for no reason:

062273a6-93df-7000-0000-31912d9c4cd9
062273a6-a608-7000-0000-3c9592dd85f9
062273a6-bf3c-7000-0000-77c3d673fbe0
062273a6-dd45-7000-0000-a750c1a3e1ec

@LiosK,

No need to distort the truth! 16-bit zero-starting counter occupies 2 symbols only, not repeats 8 symbols like 7000-0000!

Collision resistance is determined not only by the random segment, but also by the rest of the UUID segments. For identifiers to match, they must be generated in the same millisecond, have the same random segments, the same counter values ​​(this is impossible if they are generated by the same generator) and metadata values, and fall into the same database table. It is wrong to reduce collision resistance to the random segment.

By the way, frozen for a millisecond 24-40-bit random in your favorite randomly seeded counter simply wastes entropy for no reason.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

16-bit zero-starting counter occupies 2 symbols only

16-bit zero-starting counter occupies 4 symbols and that is a big enough problem. Anyway, the above example is perfectly valid to illustrate the underlying concern.

Collision resistance is determined not only by the random segment

It doesn't mean the random segment can be wasted for no reason. In particular, as of the pre-draft 03, entropy is the key factor to guarantee the uniqueness of v7 and thus has to be handled carefully.

For the sake of understanding of readers, it is best to keep UUID specs to ensure universal uniqueness to a reasonable extent, and the current draft does a very good job by separating application-specific items that contradict with uniqueness (e.g. metadata) to v8 and saying explicitly "UUIDv8E's uniqueness will be implementation-specific and SHOULD NOT be assumed." I would support this approach.

randomly seeded counter simply wastes entropy for no reason.

Please take a look at the previous discussion and the article @fabiolimace found out. Randomly seeded counters do not sacrifice the universal uniqueness. Rather, they improve collision resistance in some scenarios.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

16-bit zero-starting counter occupies 4 symbols and that is a big enough problem. Anyway, the above example is perfectly valid to illustrate the underlying concern.

(4/128)*100%=3% is not a big problem, and these 3% are not wasted - they improve collision resistance in monolitic databases with only one generator per table (alike autoincrement) and in highload applications also (with thousands of records per millisecond). Therefore zero-starting counter does not sacrifice universal uniqueness. The same can be said about 16 bits of metadata (metadata also improves collision resistance).

Zero-starting counter is better for monolitic databases, and randomly seeded counter is better for distributed applications

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

the entire point of this section in the spec is as an implementation guideline for those implementations that wish to implement monotonicity. It is not intended to be a requirement for all implementations.

Got it, thanks for the explanation. It doesn't seem like there is consensus on this topic even among the participants in this discussion. Given that, I wonder if the spec should "recommend" anything at all.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

(4/128)*100%=3% is not a big problem

16 / 128 = 12.5% is obviously a big problem. Please double check your numbers.

Zero-starting counter is better for monolitic databases, and randomly seeded counter is better for distributed applications

A 40-bit randomly seeded counter accommodates ~550 billion IDs per millisecond on average. Although at a 2^16 / 2^40 = 0.000006% chance the 40-bit randomly seeded counter fails to provide a room for 65,536 IDs, an application can just wait up to one millisecond for the next tick. If the 0.000006% is not acceptable, an application can initialize the 40-bit counter with a 39-bit random number. This approach claims one bit, but it guarantees the space for ~550 billion IDs. Or you can use a 17-bit counter and 16-bit random numbers if you like 65,536 or care about unpredictability, though v4 should be used if unpredictability is the priority, as clearly stated in the Security Considerations section of the current pre-draft.

Zero-starting counters never win and thus should not be recommended explicitly to prevent readers from falling in this common pitfall.

@sergeyprokhorenko, please think of general readers of the RFC, not only of your specific applications. I'd appreciate your views based on your work on the high-load applications that generate no more or less than 65,536 records per millisecond on a single node (i.e. 16-bit zero-starting counters make sense only under this specific scenario; by the way, such an application claims 1 GB/second or 82.4 TB/day storage or one dedicated 1000BASE-T just to store/transfer 128-bit IDs), but few readers work on such a huge constant application or few general-purpose library authors design their work to support such a use case.

@peterbourgon, some sensible recommendations will definitely be helpful for readers, and it's also valuable to keep discussing over and over to refine the recommendations, isn't it?

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@LiosK,

please think of general readers of the RFC, not only of your specific applications.

I think that general readers of the RFC are highly qualified experts in DBMS and distributed systems, not developers of primitive websites.

I think about important applications, such as payment services, financial services, trading systems, marketplaces, online advertising, information retrieval, booking, logistics services etc. These are not my specific applications.

...such an application claims 1 GB/second...

I care about the peak performance of ordinary information systems, about the speed of searching for complexly structured data, and not about the supercomputers that you imagine.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

I care about the peak performance

That's the point. You got it. A 16-bit zero-starting counter makes sense only when the work load is constant, and randomly seeded counters work much better in any other cases. You've been trying to serve the unrealistically constant supercomputer use cases by sticking to 16-bit zero-starting counters. A 40-bit randomly seeded counter fits much better to your important applications by behaving just as entropy in low load times while supporting 550 billion monotonic IDs per millisecond at peak. Now do you agree not to recommend zero-starting counters?

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@LiosK,
It is not that simple. I care of write peaks to ensure monotonicity. Monotonicity, in turn, will provide a high speed search in the database. Ultimately, it is the search speed that matters. The zero-starting counter is much shorter and therefore it can provide faster search of records in a database table. I don't care at all if the counter fills up completely or not, because the risk of collisions is still low, especially in monolithic databases.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

UUID is a 128-bit opaque identifier and thus searching UUIDs based on a shorter component should not be a general use case. In addition, even if you really want to write such code, a 16-bit randomly seeded counter provides the same property. You can initialize the 16-bit counter with 15-bit random numbers to guarantee a reasonable space or use a 17-bit counter if you really stick to 65,536. Plus, if you don't even fill up the 65,536 counter space, the 48-bit timestamp component is generally sufficient for indexing/sharding purpose. Anyway, my impression here is that you're again talking about supercomputer use cases. Please consider general readers.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@LiosK,

UUID is a 128-bit opaque identifier and thus searching UUIDs based on a shorter component should not be a general use case.

Nothing prevents you from using bit by bit of UUID to search using the binary search algorithm.

In addition, even if you really want to write such code, a 16-bit randomly seeded counter provides the same property. You can initialize the 16-bit counter with 15-bit random numbers to guarantee a reasonable space

This is a great idea. If it is included into the specification, then I am ready to refuse 16 bit zero-starting counters.

Just one bit and shorter randomly seeded counter will save the day: combined counter 0rrrrrrrrrrrrrrr, where 0 is zero-starting counter (for more capacity of the combined counter) and r is 15-bit randomly seeded counter.

Plus, if you don't even fill up the 65,536 counter space, the 48-bit timestamp component is generally sufficient for indexing/sharding purpose.

It depends on the situation.

Anyway, my impression here is that you're again talking about supercomputer use cases. Please consider general readers.

I'm not worried about general readers, but about the costs of general companies, which have to purchase extra computer hardware, software licenses and technical support services due to low performance of information systems.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

@sergeyprokhorenko

I care of write peaks to ensure monotonicity. Monotonicity, in turn, will provide a high speed search in the database. Ultimately, it is the search speed that matters. The zero-starting counter is much shorter and therefore it can provide faster search of records in a database table.

Could you explain how search speed — presumably, finding a specific UUID in a sorted list of UUIDs — is impacted by the bitwidth of specific fields?

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@peterbourgon

87658765767657658765 this value is less
87658765767657658777 this value is greater
19th symbol of the ID is compared

8765876576constant7657658765 this value is less
8765876576constant7657658777 this value is greater
27th symbol of the ID is compared

IDs are compared bit by bit left to right. The shorter ID the faster comparison, which is part of search.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

Are you claiming that evaluating 0x01000000 == 0x02000000 is faster than evaluating 0x00000001 == 0x00000002?

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

It can be faster

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

Deal! Awesome. I will draft a change proposal to the current text later.

@LiosK I look forward to reviewing this one. Please use the text of #85 as the base for any update and I will review for inclusion in that PR.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@LiosK,

Thanks for the text.
Note that Method 2 has already been dropped.
That being said, let me see what I can do to get this into #85 then comment back here for you to review.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@LiosK please check out the changes in Commit

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

I would make a small correction to prevent counter rollover:

General Guidance is that the counter SHOULD be at least 1216 bits and no more than a maximum of 40 bits.

I would also drop these words: and wait for the timestamp to advance which ensures monotonicity is not broken, bacause small breach of monotonicity is not so dangerous for application as breach of UUID generation.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

@kyzer-davis,

I like the wording you have incorporated my proposals to the updated draft. Thanks.

By "Method 2" I meant the Monotonic Random approach. Monotonic random is equivalent to a 74-bit (v7) or 72-bit (v7E) fixed-length randomly-seeded counter and contradicts with the guidance to leave room for sufficient entropy in the random portion after the counter. That's why I would propose to drop the monotonic random section and provide general guidance to limit the counter length up to 40 bits or 56 bits as per this seemingly un-hated comment. Do you have any concerns about this approach?

@sergeyprokhorenko,

I would also drop these words: and wait for the timestamp to advance which ensures monotonicity is not broken,

This is a logical and consistent approach as general guidance because implementers usually utilize counters to ensure monotonicity, isn't it? Or, I'd suggest to tweak this line to limit the scope of this general guidance like: "The general guidance is that applications that care about sortability SHOULD freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken."

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@sergeyprokhorenko, 12 bits were chose as the lower bound due to UUIDv7's rand_a between ver and var which is the logical place for a counter in that bit layout. 16 bits in UUIDv7E makes sense but with both in the text I need to ensure I think about it from both perspectives. I could call it out but then we are splitting hairs within the text.

@LiosK I see what you mean with monotonic random + some trailing bits. Let me maul it over. I may be able to add a one-liner about how to achieve unguessibility with that approach which is again, basically Method 1. Or at least say something along the lines of "For Implementations that require unguessibility Method 1, Fixed Length Dedicated Counters, MUST be utilized."

@sergeyprokhorenko, I like @LiosK's verbiage. We really are just giving some guidance and that is specific to storability. If they don't care they can take the hit. I need that sentence to convey that point and I think adding that simple "that care about sortability" really hits the nail on the head.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@peterbourgon, if I missed a discussion about this I apologize. That being said this is a good point, Monotonic Random does not need to necessarily be a static +1 increment. I'll work this into the text tonight and report back.

from uuid6-ietf-draft.

fabiolimace avatar fabiolimace commented on July 23, 2024

@peterbourgon

I wouldn't say this is a superior approach. The technique used is essentially the same, that is, a succession of increments over a random number. The difference is that the increment steps are random within a range of values. The superiority of this method is only intuitive, that is, without theoretical or practical verification, therefore it is also naive.

However, in fact, this approach can make it a little difficult to predict the next value that will be generated. I'm actually thinking of including a parameter like this in a personal project that generates ULIDs. I think it's a good idea, although it's also an approach that could be considered naive.

As for including this parameterization in the specification, that will depend on a little more discussion here.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

However, in fact, this approach can make it a little difficult to predict the next value that will be generated.

Not "a little difficult" but "impossible", right? e.g.

The technique used is essentially the same, that is, a succession of increments over a random number. The difference is that the increment steps are random within a range of values.

It's categorically different to increment the random portion from a default state of 0 by a random amount each step, vs. incrementing the random portion from a default state of an arbitrary value by 1.

from uuid6-ietf-draft.

fabiolimace avatar fabiolimace commented on July 23, 2024

Not "a little difficult" but "impossible", right?

It was an euphemism. But, yes, that's right. Thanks.

from uuid6-ietf-draft.

fabiolimace avatar fabiolimace commented on July 23, 2024

Furthermore, with this method the actual increment of the counter MAY be a random integer of any desired length instead of using simplistic "plus one" logic found with Fixed Length Dedicated Counters. This random increment of the counter ensures the UUIDs retain the required level of unguessability characters provided by the underlying entropy.

Looks good to me. Maybe @peterbourgon can do a better review of the text, for having introduced us to the approach.

from uuid6-ietf-draft.

peterbourgon avatar peterbourgon commented on July 23, 2024

Unless I misunderstand something, Method 1 produces guessable IDs, Method 2 doesn't.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@LiosK and @fabiolimace

Good feedback. I ran out of time today but I may have some cycles tomorrow to revise the section based on these changes.
Hang tight for an updated set of text we can re-review.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

implementers usually utilize counters to ensure monotonicity, isn't it?

Using plain counters (random_segment += 1) to implement monotonicity is a naïve approach which can work in a pinch but probably shouldn't be a sanctioned recommendation. I hate to keep harping on the point but the approach taken by oklog/ulid parameterizes the collision probability vs. entropy exhaustion within the time bounds, and is AFAICT a strictly superior approach.

This, excuse me, is nonsense that everyone picked up despite LiosK's reasonable but weak objections. It would be shame to see it in such a thoughtful specification. A random increment quickly and unnecessarily depletes the capacity of the counter.

This is justified in oklog/ulid only because the WHOLE random segment is a frozen counter there. However, in the specification we are discussing, only the LEFT SIDE of the random is a counter, while the right side is truly random and changes for each UUID. This provides an excellent level of unguessability.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

@sergeyprokhorenko I added LiosK's suggestion for folks that care about sortability.

@kyzer-davis You should also add a suggestion for folks that care about continuity in generating UUIDs. It would be fair.

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@LiosK Great writeup.

I would like the new RFC to put guardrails around unsafe choices and extreme approaches to prevent implementers from falling in common pitfalls.

This statement hits the nail on the head for the best practices section. I am working on the counter changes. Another update to come.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

The new draft looks good! Thank you, @kyzer-davis. If I might add something, Counter Rollover Handling is also relevant to the monotonic random approach particularly when it's combined with the random increment strategy, and thus it should be placed under "The following sub-topics cover methods behind incrementing either type of counter method:".

from uuid6-ietf-draft.

kyzer-davis avatar kyzer-davis commented on July 23, 2024

@LiosK, good callout. I split into two sections for rollover handling and rollover guarding in the latest PR.

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024
  1. Why not use SHOULD here?
    Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized.
    SHOULD allows other implementations to be considered RFC-compliant, but makes it clear which approach is recommended.

IMO MAY is fine here. With a 24-bit counter, on average 8 million IDs can be generated per millisecond and with a 99.9% chance at least 10,000 IDs can be generated. BTW, I am an advocate of 42-bit counters and in this case 2.2 trillion on average and with a 99.999999% chance at least 10,000 can be generated. These numbers are more than enough for most applications and only few extremely high-load applications and those that prefer short counters are adversely affected by the initialize-full-counter approach. On the other hand, if the spec recommends the initialize-portion-counter approach, the generic libraries might force every application to pay one bit entropy for the overflow guard not really relevant to most applications.

The initialize-portion-counter approach definitely solves some problems but is not a clearly superior approach that should be recommended to everyone.

  1. The recommended approach to handle clock drift is not specified.
  2. The recommended approach to handle counter overflow is not specified.

Your approaches are actually very interesting and creative if combined together. The timestamp state is simply incremented at a counter overflow and such an advanced timestamp state is accepted so long as the difference from the real clock is small enough. The generator will stall, reset state, or raise error only when the real clock is rewound so much or when the generator is "overheated" (i.e. when it experiences too many counter overflows). Very interesting! I would try this in my prototype. I'm not sure if many people think this approach is "right", but I'm kind of sure this works well in common scenarios.

from uuid6-ietf-draft.

edo1 avatar edo1 commented on July 23, 2024

On the other hand, if the spec recommends the initialize-portion-counter approach, the generic libraries might force every application to pay one bit entropy for the overflow guard not really relevant to most applications.

So you're advocating not using "N-1 rollover protection" as the default approach?

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

N-1 approach shouldn't be default. Overflow protection isn't worth one bit entropy for most applications. Plus, counter overflow isn't a serious disaster if handled properly e.g. by your smart handling technique.

from uuid6-ietf-draft.

edo1 avatar edo1 commented on July 23, 2024

Plus, counter overflow isn't a serious disaster if handled properly e.g. by your smart handling technique.

I agree.

One note: in my proposal, the timestamp resolution was 100ns, which is overkill in most cases. So timestamp incrementing shouldn't hurt.
I have some doubts about incrementing the timestamp with 1ms resolution

from uuid6-ietf-draft.

LiosK avatar LiosK commented on July 23, 2024

1 ms is likely overkill in a scenario where distributed nodes use their own clocks, while even 100 ns increment can be problematic if a clock is shared by distributed nodes. I'm now trying to figure out in what conditions the timestamp increment can hurt and in what conditions not.

Anyway, using the timestamp field as a reserve counter is an eye opening but counterintuitive idea that not many implementers will come up with. I think it's worth discussing this here and in the RFC.

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

N-1 approach shouldn't be default. Overflow protection isn't worth one bit entropy for most applications. Plus, counter overflow isn't a serious disaster if handled properly e.g. by your smart handling technique.

I agree to trade the N-1 approach for a timestamp increment. Complex problems require complex but ultimate tools. It's not a choice between green and purple UUIDs with an A/B test among Instagrammers

from uuid6-ietf-draft.

sergeyprokhorenko avatar sergeyprokhorenko commented on July 23, 2024

6. Again, N-1 works, increasing timestamp as rollover guard works. They both solve the problem. Neither are wrong. Similar to Method 1/2 and Increment Type A/B. I can describe this rollover guard along with N-1 in Counter Rollover Handling section and let an implementation choose. To differentiate them from Counter Method and Increment Type I can define these as Rollover Guard Option i and ii. Same as previous. If you could open a Change Proposal I can get this into Draft 04.

It would be clearer if all options had short names instead of numbers, letters and counting sticks.

It seems to me that the merits and demerits of the options are less described in the RFC than in the GitHub issues. Therefore, the implementers will turn into Buridan donkeys. The RFC should indicate the preferred option, of course with justification. It seems to me that in the course of the discussion, the positions of the participants here quickly converge. Therefore, it should not be a problem to choose the preferred option. If something causes controversy, then it has not yet been sufficiently thought out.

It would be desirable to show an exact and unambiguous reference layout of UUID for example, for a marketplace or an international bank.

from uuid6-ietf-draft.

edo1 avatar edo1 commented on July 23, 2024

I've been observing that UUIDv7 with counter is much faster than that without counter because even the blazing fast arc4random implementations are still much slower than incrementing a counter.

I guess it will exactly the opposite in real life. Version with counter needs mutex or another lock.

from uuid6-ietf-draft.

fabiolimace avatar fabiolimace commented on July 23, 2024

I'd rather suggest to subsume the monotonic random under the fixed-length counter because the monotonic random is just a 74-bit variation of fixed-length counters

Yes, monotonic random can be thought of as a 74-bit variation of fixed-length counters. I just thought "hey, maybe we can reduce the cognitive load if we omit the fixed-length counter that increments by a random number" (Method 1 with Type B). I don't have a strong opinion either.

I guess it will exactly the opposite in real life. Version with counter needs mutex or another lock.

I think it might be true if the pseudo-random generator is not cryptographically secure. I just tested this hypothesis in a project of mine using Random() (fast generator) instead of SecureRandom() (secure generator). But the benchmarks did not confirm this. Monotonic ULIDs were always faster than non-monotonic ULIDs, even using a fast non-cryptographically secure generator. Maybe it's due to my implementation, but I'm not sure.

from uuid6-ietf-draft.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.