Comments (11)

ericelliott commented on August 29, 2024

Hi @broofa, you're right that the v1 implementations are much better than v4 implementations. Sadly, most people don't know the difference, and assume that all UUID/GUID implementations offer the same protections. This is a social / branding problem, not a problem with your implementation.

Most implementations in the wild are purely pseudo-random, and that's the real problem. People see guid or uuid and assume that it's OK. For the stronger v1 algorithm, it probably is. For the others, it clearly isn't.

A MAC address isn't unique at all across different processes running on the same virtual machine, and MAC uniqueness across virtual machines is a question worthy of detailed investigation. MAC addresses are not available at all in browsers, and many of today's applications are written using the same code in browsers, mobile clients, and Node, so a MAC-based approach is a non-starter.

How does your implementation work in a browser?
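
For comparison, a v4 UUID needs neither a MAC address nor a clock. It is 122 random bits plus fixed version and variant bits (RFC 4122, section 4.4). The sketch below is not node-uuid's code. It assumes Node's `crypto.randomBytes` as the entropy source; a browser build would fill the same 16 bytes with `crypto.getRandomValues(new Uint8Array(16))`. The name `uuidV4` is illustrative.

```typescript
import { randomBytes } from "node:crypto";

// Minimal RFC 4122 version-4 UUID built from 16 random bytes.
// In a browser, pass crypto.getRandomValues(new Uint8Array(16)) instead.
function uuidV4(bytes: Uint8Array = randomBytes(16)): string {
  if (bytes.length !== 16) throw new Error("need exactly 16 random bytes");
  const b = Uint8Array.from(bytes); // copy so the caller's buffer is untouched
  b[6] = (b[6] & 0x0f) | 0x40; // high nibble of byte 6 = version 4
  b[8] = (b[8] & 0x3f) | 0x80; // top two bits of byte 8 = variant 10
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Prints a random id of the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx (y is 8, 9, a, or b).
console.log(uuidV4());
```

The only moving part here is the RNG, so the quality of `randomBytes` / `getRandomValues` is the whole uniqueness story for v4.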

broofa commented on August 29, 2024

> you're right that the v1 implementations are much better than v4 implementations

To be clear, I didn't say v1 is better than v4. Only that cuid is similar to v1.

In general, the more moving parts you have in an id, the more difficult it is to reason about collision probability. With v1 and cuid, the quality of the clock, fingerprint, and RNG are all debatable. With v4, at least, you only need to worry about the RNG. As long as that's "good", the probability of collision is incredibly small (e.g. ~0.0000001%, roughly one in a billion, for 100 trillion ids).

... but we don't actually know if the RNG is "good". At least, not for Math.random(). For crypto, I believe we can take as given that it is. However, unless and until we have data on the quality of Math.random(), on how unique the components of cuid's fingerprint are, on how often client clocks regress, and so on, I don't think it's possible to have a meaningful conversation as to which is "better". :-/
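
The v4 figure comes from the birthday bound: with n ids drawn uniformly from N = 2^bits possible values, p ≈ 1 - exp(-n(n-1) / 2N). A minimal sketch, assuming the 122 random bits of a v4 UUID are uniformly distributed (exactly the property that is in question for Math.random()); `collisionProbability` is a hypothetical helper name.

```typescript
// Birthday-bound approximation of the probability of at least one collision
// among n ids drawn uniformly at random from 2^bits possible values:
//   p ≈ 1 - exp(-n(n-1) / (2 * 2^bits))
// Math.expm1 keeps precision when p is tiny.
function collisionProbability(n: number, bits: number): number {
  const space = 2 ** bits;
  const exponent = (n * (n - 1)) / (2 * space);
  return -Math.expm1(-exponent);
}

// A v4 UUID carries 122 random bits (6 of its 128 bits are fixed).
const p = collisionProbability(1e14, 122);
console.log(p.toExponential(2)); // 9.40e-10, roughly 1 in a billion
```

For 100 trillion ids this gives about 9.4 × 10^-10. The estimate says nothing about a biased or poorly seeded RNG; it only holds if the bits really are uniform.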

ericelliott commented on August 29, 2024

> I don't think it's possible to have a meaningful conversation as to which is "better"

Philosophically, I agree with you, but practically speaking, collision reliability can and should be measured.

It's easy to run collision tests across lots of hosts these days. It takes a bit more effort to encapsulate those tests in non-proprietary, non-host-specific ways to share the methodology with everybody (otherwise I'd share them and the results in this repo), but I spun up thousands of hosts and dumped many millions of ids into logs, aggregated those logged ids together, and reported errors on collisions.

That's how I developed cuid and decided on the various groupings.

If you haven't run a similar test with your uuid implementation, I recommend that you do.
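
The multi-host harness described above isn't published, so the following is only a single-process sketch of the same idea: several simulated "hosts" each generate ids, the logs are merged, and duplicates are counted. `collisionTest` and its parameters are hypothetical. The deliberately weak 24-bit generator is included to show that the harness does catch collisions.

```typescript
import { randomInt, randomUUID } from "node:crypto";

type IdGenerator = () => string;

// Each simulated host gets its own generator instance and produces idsPerHost ids;
// every id is checked against the merged log of all hosts.
function collisionTest(
  makeGenerator: (hostIndex: number) => IdGenerator,
  hosts: number,
  idsPerHost: number,
): { total: number; collisions: number } {
  const seen = new Set<string>();
  let collisions = 0;
  for (let h = 0; h < hosts; h++) {
    const next = makeGenerator(h);
    for (let i = 0; i < idsPerHost; i++) {
      const id = next();
      if (seen.has(id)) collisions++;
      else seen.add(id);
    }
  }
  return { total: hosts * idsPerHost, collisions };
}

// Deliberately weak: only 24 random bits, so roughly 300 collisions are expected in 100k ids.
const weak = collisionTest(() => () => randomInt(2 ** 24).toString(16), 100, 1_000);
// v4 UUIDs from Node's CSPRNG: no collisions are expected at this scale.
const v4 = collisionTest(() => () => randomUUID(), 100, 1_000);
console.log(weak, v4);
```

A simulation inside one process cannot reproduce what a real distributed test measures (different devices, interpreters, RNG seeding, and clocks), so it is useful for validating the harness and comparing generators, not for certifying one.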

ericelliott commented on August 29, 2024

BTW, I also ran cuid collision tests in production most of the time I was at Tout, testing ~50 million ids per month across all our production browser sessions, and did not log a single collision after the cuid format stabilized.

broofa commented on August 29, 2024

> It's easy to run collision tests across lots of hosts these days. It takes a bit more effort to encapsulate those tests in non-proprietary, non-host-specific ways to share the methodology with everybody (otherwise I'd share them and the results in this repo), but I spun up thousands of hosts and dumped many millions of ids into logs, aggregated those logged ids together, and reported errors on collisions.

> BTW, I also ran cuid collision tests in production most of the time I was at Tout, testing ~50 million IDs per month across all our production browser sessions, and did not log a single collision after the cuid format stabilized.

How do you test all the various client / browser permutations? Were you running on lots of EC2 instances (or a similarly consistent cloud environment), or were you actually generating IDs on the whole panoply of device + OS + JS interpreter permutations? Regardless, unless there is a truly egregious flaw in the uuid implementation (read, "increases collision probability by at least 10^15"), I wouldn't expect to see a collision.

I guess I'm content (for now) relying on the fact that node-uuid is downloaded 1.5M times/month and is depended on directly by 500+ modules. I've only ever had two reports of collisions. One was a Chrome bug (since fixed); in the other, it wasn't clear whether it was actually a uuid issue or some external issue.

ericelliott commented on August 29, 2024

Browser ID generation

v1 UUIDs can't be reliably generated in-browser, unless you know something I don't. Cuid was originally designed for universal JavaScript, meaning it has to be able to run on both the client and the server.

AFAIK, node-uuid is not a viable alternative for this use-case.
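
Without a MAC address, RFC 4122 (section 4.5) lets a v1 generator substitute a random 48-bit node id with the multicast bit set, so a v1-style id can be produced in a browser, but the node field is then only as unique as the RNG that filled it. A minimal sketch under those assumptions, not node-uuid's implementation: Node's `randomBytes` stands in for `crypto.getRandomValues`, and `uuidV1` is an illustrative name.

```typescript
import { randomBytes } from "node:crypto";

// 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
const GREGORIAN_OFFSET = 122_192_928_000_000_000n;

// No MAC available: random 48-bit node id with the multicast bit set (RFC 4122, 4.5).
const node = randomBytes(6);
node[0] |= 0x01;

// Random 14-bit clock sequence; bumped whenever the clock is seen going backwards.
let clockSeq = randomBytes(2).readUInt16BE(0) & 0x3fff;
let lastMs = -Infinity;
let ticksInMs = 0; // extra 100 ns ticks for ids created within the same millisecond

function uuidV1(): string {
  const nowMs = Date.now();
  if (nowMs < lastMs) {
    clockSeq = (clockSeq + 1) & 0x3fff; // clock regressed: avoid reusing old timestamps
    ticksInMs = 0;
  } else if (nowMs === lastMs) {
    ticksInMs++;
    if (ticksInMs >= 10_000) throw new Error("more than 10,000 v1 ids in one millisecond");
  } else {
    ticksInMs = 0;
  }
  lastMs = nowMs;

  const t = BigInt(nowMs) * 10_000n + BigInt(ticksInMs) + GREGORIAN_OFFSET; // 60-bit timestamp
  const hex = (v: bigint | number, width: number) => v.toString(16).padStart(width, "0");
  const timeLow = t & 0xffff_ffffn;
  const timeMid = (t >> 32n) & 0xffffn;
  const timeHiAndVersion = ((t >> 48n) & 0x0fffn) | 0x1000n; // version 1
  const clockSeqAndVariant = clockSeq | 0x8000; // variant bits 10
  return [
    hex(timeLow, 8),
    hex(timeMid, 4),
    hex(timeHiAndVersion, 4),
    hex(clockSeqAndVariant, 4),
    node.toString("hex"),
  ].join("-");
}

console.log(uuidV1());
```

The clock-sequence bump is the RFC's remedy for clocks that go backwards, which is one of the open questions about client clocks raised in this thread.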

> Were you running on lots of EC2 instances (or a similarly consistent cloud environment), or were you actually generating IDs on the whole panoply of device + OS + JS interpreter permutations?

Generated ids in every client that connected to the app and logged results.

uiteoi commented on August 29, 2024

Regarding the quality of Math.random(), the following article is very relevant, showing signs that it is/was somewhat weak and is being improved, at least in V8: "There's Math.random(), and then there's Math.random()"

On another note, this thread is very interesting; it's too bad that GitHub does not offer the possibility to keep these out of the issues section.

I currently use node-uuid v4 for toubkal, the reason I did not choose v1 was that I thought it would be unreliable in browsers, and therefore preferred the wider random numbers of v4, not really knowing which one is actually better.

I am considering other options, but it's hard to make a rational decision without data comparing various alternatives. Is there such a resource allowing this kind of comparison based on real tests, not just rational assumptions, however good these are?

Also, regardless of the method used, I am considering updating client-generated ids on the server out of security concerns that a hacker might try to increase the likelihood of collisions. I still need client-generated ids for performance and offline operation. This is actually trivial to do in Toubkal because application programmers would have nothing to write to make it happen thanks to our dataflow operation.

Any advice or pointers on these matters would be highly appreciated.

Happy new year 2016.

ericelliott commented on August 29, 2024

> There's Math.random(), and then there's Math.random()

Yes, very relevant. Firefox's implementation was much better than V8's, but still severely flawed as a source of unique ids.

> I wouldn't expect to see a collision.

I don't get into a car expecting to get into a collision, but car collisions, like ID collisions, happen quite a bit when you start looking at large numbers. The only way to really be sure about IDs for distributed apps (such as every web application) is to run collision tests using very large numbers of ids and parallel testing hosts.

As I've mentioned more than once, collision bugs are hard to detect until you get into the millions or tens of millions of monthly active users, so almost all apps will never detect a collision bug even if it exists, because most apps never get that popular.

But for those that do get that popular, by the time the app developer notices, it's often a big problem.

@uiteoi My advice to you is to test your solution in the contexts you'll use it for your apps. If it's a universal app, make sure that includes parallel tests for the browser clients you expect to see in production use.

Don't take my word for it. Run the tests.

uiteoi commented on August 29, 2024

Thanks for your feedback.

So, if collisions, however unlikely, do occur, and we agree they do, then we must, at least for reliable systems, detect collisions and fix them.

A system without collision detection is a hopeful system, an unreliable system.

The difference between implementations of unique ids is that some will fail less often than others, but they all fail, sooner or later.

For a system without collision detection, reducing the chances of collisions may still be desirable, making the system luckier in how long it keeps functioning. But as a computer scientist, I don't like relying too much on luck, at least for parts of applications that require high reliability.

As for testing, I don't think this could help much, because it is impractical to test collisions over trillions of ids, and because we already know it will fail at an increasing rate over time.

I am not building apps but a library for others to build apps, so my role is to provide my users with tools allowing them to achieve high reliability whenever required. I can therefore only conclude that I have to provide my users with tools that:
1/ Allow collision detection
2/ Don't rely on systems that cannot be trusted for id generation, such as clients
3/ Provide identifier substitution or translation between domains
4/ Make all of the above as easy as possible to specify, ideally making this transparent to application programmers, which the Toubkal dataflow programming model allows.

In such a reliable system we would not even need any uuid or cuid generators, simple local counters for each domain would work very well. We can still keep uuids for parts of the systems that do not require high-reliability and for which simplicity of implementation is more important than reliability.
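
A minimal sketch of per-domain counters with a translation table, assuming each domain (for example, a server) mints its own ids and maps every id arriving from another domain to a local one. `IdDomain`, `mint`, and `translate` are hypothetical names, not Toubkal's API.

```typescript
// Each domain mints ids with a plain counter; ids arriving from other domains
// are translated through a table keyed by (source domain, foreign id).
class IdDomain {
  private nextId = 1;
  private readonly translations = new Map<string, number>();

  // A new local entity: counters never collide inside one domain.
  mint(): number {
    return this.nextId++;
  }

  // Map a foreign id to a local one; the same (domain, id) pair always maps
  // to the same local id, so a resend is not mistaken for a new entity.
  translate(sourceDomain: string, foreignId: string): number {
    const key = JSON.stringify([sourceDomain, foreignId]);
    let local = this.translations.get(key);
    if (local === undefined) {
      local = this.mint();
      this.translations.set(key, local);
    }
    return local;
  }
}

const server = new IdDomain();
const a = server.translate("clientA", "1"); // both clients used "1" locally...
const b = server.translate("clientB", "1"); // ...but get distinct server ids
console.log(a, b, server.translate("clientA", "1")); // 1 2 1
```

Because the table is keyed by (source domain, foreign id), two clients that both used `1` get different local ids, and resending the same foreign id is idempotent rather than a collision.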

ericelliott commented on August 29, 2024

> As for testing, I don't think this could help much, because it is impractical to test collisions over trillions of ids, and because we already know it will fail at an increasing rate over time.

Testing helps you select the mechanisms least likely to fail, but is perhaps not necessary for every application and every library. Considering the damage I've seen done by colliding IDs, I personally wouldn't trust ID generation to blind faith in any important application.

> we would not even need any uuid or cuid generators, simple local counters for each domain would work very well

That very much depends on the mechanisms you use for ID juggling. In my experience, most such systems employed for web applications are slow and error prone. Starting with a fairly strong uniqueness guarantee can increase performance and reduce the surface area available for bugs by reducing the need to juggle ids in the first place to nearly zero.

Typical ID juggling implementations entail sending a cid (client id) to a server on entity creation, which gets supplanted on the server by a server id. This entails separate logic governing entity lookups on the client and the server, which cripples your ability to use universal logic for entity lookup across client and server.

An alternate mechanism shares lookup logic, but requires "dirty tracking" to indicate that the client id has not been replaced by the permanent server-issued id.
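
A sketch of that second mechanism: a client-side store with one shared lookup path that tracks which entities are still "dirty" (waiting for a server-issued id). `ClientStore`, `confirm`, and the `c` prefix for client ids are assumptions for illustration.

```typescript
interface Entity {
  cid: string; // client-generated id, usable immediately
  id?: string; // permanent server-issued id, filled in later
  dirty: boolean; // true until the server id has been assigned
  data: Record<string, unknown>;
}

class ClientStore {
  private counter = 0;
  private readonly byCid = new Map<string, Entity>();
  private readonly byId = new Map<string, Entity>();

  create(data: Record<string, unknown>): Entity {
    // The "c" prefix keeps client ids from overlapping server ids (an assumption).
    const entity: Entity = { cid: `c${++this.counter}`, dirty: true, data };
    this.byCid.set(entity.cid, entity);
    return entity;
  }

  // Called when the server acknowledges the entity with its permanent id.
  confirm(cid: string, serverId: string): void {
    const entity = this.byCid.get(cid);
    if (entity === undefined) throw new Error(`unknown client id ${cid}`);
    entity.id = serverId;
    entity.dirty = false;
    this.byId.set(serverId, entity);
  }

  // Shared lookup path: accepts either kind of id, preferring the server id.
  get(key: string): Entity | undefined {
    return this.byId.get(key) ?? this.byCid.get(key);
  }
}

const todos = new ClientStore();
const draft = todos.create({ title: "write tests" });
console.log(draft.dirty); // true until the server replies
todos.confirm(draft.cid, "srv-42");
console.log(draft.dirty, todos.get("srv-42") === draft); // false true
```

Even at this size, every caller has to cope with an entity whose canonical id changes mid-session, which is the extra surface area for bugs described above.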

Neither of those mechanisms scales in a straightforward manner when you start to scale the server logic itself horizontally, meaning that your code falls victim to the horizontal scaling performance hits I mentioned in the README. I strongly recommend you read that.

Cuid allows you to use the client-generated ID very nearly 100% of the time, and only replace it on the rarest occasion if an error is ever detected (which, as I mentioned before, I have never observed since the Cuid spec stabilized).
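
A sketch of the server side of that approach, assuming an in-memory map as the store and Node's `crypto.randomUUID` as the fallback generator; `Store` and `insert` are hypothetical names. The client id is kept unless it is already taken, and only then replaced.

```typescript
import { randomUUID } from "node:crypto";

// Server-side store that keeps client-generated ids as they are and only
// assigns a replacement when the id is already taken.
class Store<T> {
  private readonly rows = new Map<string, T>();

  // Returns the id the value was stored under; if it differs from clientId,
  // the caller must send the new id back to the client.
  insert(clientId: string, value: T): string {
    let id = clientId;
    while (this.rows.has(id)) {
      id = randomUUID(); // rare path: the client id collided
    }
    this.rows.set(id, value);
    return id;
  }

  get(id: string): T | undefined {
    return this.rows.get(id);
  }
}

const users = new Store<string>();
console.log(users.insert("abc", "first")); // abc
const reassigned = users.insert("abc", "second"); // collision: stored under a fresh UUID
console.log(reassigned === "abc"); // false
```

A real system would also need to tell a genuine collision apart from a retried insert of the same entity (for example, by comparing payloads or using an idempotency key); otherwise retries would be misreported as collisions.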

uiteoi commented on August 29, 2024

I understand the pros and cons of these choices, including scalability issues. With Toubkal I want to be able to allow users to choose which level of reliability they want for each part of their applications. Strong reliability inevitably goes with lower scalability.

Also note that actual scalability of uuid/cuid mechanisms is limited by decreasing reliability with the number of held ids. If ids are short-lived reliability would not decrease over time because their overall count would stay small, but for long-lived ids, the risk of collisions will increase over time as their counts become very large.

Updating ids dynamically with Toubkal is a non-issue from a programmer's perspective, thanks to the dataflow model that allows updates without actually writing code, but it does come at a performance cost, including additional data transfers.

Translating ids dynamically between domains might be a good solution from security, scalability, and performance standpoints, not requiring additional data exchange.
