typeid's Introduction

TypeID

A type-safe, K-sortable, globally unique identifier inspired by Stripe IDs

License: Apache 2.0

What is it?

TypeIDs are a modern, type-safe extension of UUIDv7, inspired by a similar use of prefixes in Stripe's APIs.

TypeIDs are canonically encoded as lowercase strings consisting of three parts:

  1. A type prefix (at most 63 characters in all lowercase ASCII [a-z])
  2. An underscore '_' separator
  3. A 128-bit UUIDv7 encoded as a 26-character string using a modified base32 encoding.

Here's an example of a TypeID of type user:

  user_2x4y6z8a0b1c2d3e4f5g6h7j8k
  └──┘ └────────────────────────┘
  type    uuid suffix (base32)

A formal specification defines the encoding in more detail.
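
As a rough illustration of the three-part layout described above, here is a minimal Python sketch that splits a TypeID string into its prefix and suffix. The name `split_typeid` and the regular expression are assumptions made for this example, not part of any official library. The sketch only checks the shape of the string and requires a non-empty prefix, which is a simplification; the formal specification remains the authority.

```python
import re

# Prefix: 1 to 63 lowercase ASCII letters (non-empty is an assumption of this sketch).
# Suffix: 26 characters from the base32 alphabet, which omits i, l, o and u.
TYPEID_RE = re.compile(r"([a-z]{1,63})_([0-9a-hjkmnp-tv-z]{26})")


def split_typeid(text: str) -> tuple[str, str]:
    """Return (prefix, suffix), or raise ValueError if the shape is wrong."""
    match = TYPEID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a well-formed TypeID: {text!r}")
    return match.group(1), match.group(2)


print(split_typeid("user_2x4y6z8a0b1c2d3e4f5g6h7j8k"))
# ('user', '2x4y6z8a0b1c2d3e4f5g6h7j8k')
```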

Benefits

  • Type-safe: you can't accidentally use a user ID where a post ID is expected. When debugging, you can immediately understand what type of entity a TypeID refers to thanks to the type prefix.
  • Compatible with UUIDs: TypeIDs are a superset of UUIDs. They are based on the upcoming UUIDv7 standard. If you decode the TypeID and remove the type information, you get a valid UUIDv7.
  • K-Sortable: TypeIDs are K-sortable and can be used as the primary key in a database while ensuring good locality. Compare this to entirely random global IDs, such as UUIDv4, which generally suffer from poor database locality.
  • Thoughtful encoding: the base32 encoding is URL safe, case-insensitive, avoids ambiguous characters, can be selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs (26 characters vs 36 characters).
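
The K-sortable property can be sketched in a few lines of Python. The example below builds UUIDv7-shaped 128-bit integers (48-bit millisecond timestamp first, then version, variant and random bits), encodes them with the base32 alphabet, and checks that plain string sorting recovers timestamp order. The helper names `uuid7_like` and `encode_suffix` are hypothetical and exist only for this illustration; real code should use a proper UUIDv7 generator.

```python
import os

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def uuid7_like(unix_ms: int) -> int:
    """Build a UUIDv7-shaped integer: timestamp, version 7, variant 10, random bits."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    timestamp = unix_ms & ((1 << 48) - 1)
    return (timestamp << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b


def encode_suffix(value: int) -> str:
    """Encode 128 bits as 26 base32 characters (two implicit leading zero bits)."""
    return "".join(ALPHABET[(value >> shift) & 0x1F] for shift in range(125, -1, -5))


# Generate IDs for out-of-order timestamps, then sort the strings.
timestamps = [1_700_000_000_000 + offset for offset in (5, 1, 4, 2, 3)]
ids = {f"user_{encode_suffix(uuid7_like(ts))}": ts for ts in timestamps}

# Lexicographic order of the strings matches chronological order.
assert [ids[s] for s in sorted(ids)] == sorted(timestamps)
```

This works because the alphabet is in ascending ASCII order and every suffix has the same length, so comparing strings is equivalent to comparing the underlying integers.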

Implementations

Implementations should adhere to the formal specification.

Official Implementations by jetpack.io

Language     Status
Go           ✓ Implemented
SQL          ✓ Implemented
TypeScript   ✓ Implemented

Community Provided Implementations

Language                 Author                     Validated Against Spec?
C# (.Net)                @TenCoKaciStromy           Yes, on 2023-06-30
C# (.Net Standard 2.1)   @cbuctok                   Yes, on 2023-07-03
C# (.NET)                @firenero                  Yes, on 2023-07-15
Dart                     @mistermoe @tbd54566975    Yes, on 2024-03-25
Elixir                   @sloanelybutsurely         Yes, on 2023-07-02
Haskell                  @MMZK1526                  Yes, on 2023-07-07
Java                     @fxlae                     Yes, on 2023-07-02
Java                     @softprops                 Yes, on 2023-07-04
OCaml                    @titouancreach             Yes, on 2024-03-07
PHP                      @BombenProdukt             Yes, on 2023-07-03
Python                   @akhundMurad               Yes, on 2023-06-30
Ruby                     @broothie                  Yes, on 2023-06-30
Rust                     @conradludgate             Yes, on 2023-07-01
Rust                     @johnnynotsolucky          Yes, on 2023-07-13
Scala                    @ant8e                     Yes, on 2023-07-14
Scala                    @guizmaii                  Not yet
Swift                    @Frizlab                   Yes, on 2023-07-07
T-SQL                    @uniteeio                  Yes, on 2023-08-25
TypeScript               @ongteckwu                 Yes, on 2023-06-30
Zig                      @tensorush                 Yes, on 2023-07-05

We are looking for community contributions to implement TypeIDs in other languages.

Command-line Tool

This repo includes a command-line tool for generating TypeIDs. To install it, run:

curl -fsSL https://get.jetpack.io/typeid | bash

To generate a new TypeID, run:

$ typeid new prefix
prefix_01h2xcejqtf2nbrexx3vqjhp41

To decode an existing TypeID into a UUID, run:

$ typeid decode prefix_01h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881

And to encode an existing UUID into a TypeID, run:

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41
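
For readers who want to see what the encode and decode commands do, here is a minimal Python sketch of the same conversion. It is not the command-line tool's source code; the function names `encode` and `decode` are chosen for this example, and rejecting suffixes that overflow 128 bits is a choice of this sketch rather than something stated above.

```python
import uuid

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode(prefix: str, value: uuid.UUID) -> str:
    """Encode a UUID as prefix + '_' + 26 base32 characters."""
    suffix = "".join(
        ALPHABET[(value.int >> shift) & 0x1F] for shift in range(125, -1, -5)
    )
    return f"{prefix}_{suffix}"


def decode(typeid: str) -> tuple[str, uuid.UUID]:
    """Split a TypeID into (prefix, UUID); raise ValueError on bad input."""
    prefix, _, suffix = typeid.rpartition("_")
    if len(suffix) != 26:
        raise ValueError("suffix must be 26 characters")
    number = 0
    for ch in suffix:
        # str.index raises ValueError for characters outside the alphabet.
        number = (number << 5) | ALPHABET.index(ch)
    if number >> 128:
        raise ValueError("suffix does not fit in 128 bits")
    return prefix, uuid.UUID(int=number)


example = uuid.UUID("0188bac7-4afa-78aa-bc3b-bd1eef28d881")
print(encode("prefix", example))  # prefix_01h2xcejqtf2nbrexx3vqjhp41
print(decode("prefix_01h2xcejqtf2nbrexx3vqjhp41"))
```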

Related Work

  • UUIDv7 - The upcoming UUID standard that TypeIDs are based on.

Alternatives to UUIDv7 that are also worth considering (but not type-safe like TypeIDs):

typeid's Issues

Origin of base 32 alphabet

Hello,

This is more a comment than a true issue.
I was surprised to see Crockford's alphabet
for something that looked familiar to me and was only from 2019.
Don't misunderstand me, I think this alphabet is convenient and maybe he was the first to propose exactly this choice of letters;
hence, he may deserve to be credited;
but for those interested in the topic, I suggest looking at:
https://ux.stackexchange.com/questions/53341/are-there-any-letters-numbers-that-should-be-avoided-in-an-id
from 2014
with link to:
https://github.com/tytso/pwgen/blame/master/pw_rand.c
from 2005
and if there had been GitHub since 1970, I would guess something earlier could be more easily found.
I think definitely https://www.crockford.com/base32.html should add bibliographic references.

Best regards,
Laurent Lyaudet

Add TypeScript Implementation

I wrote a TypeScript implementation, which I needed for my work. I will post it here soon, after I polish it up. Thanks for this project!

RFC: Consider asking the IETF to make a smarter move on UUID V7 before adopting

We should push for a better option than (or a revision to) UUID V7.

From an email I sent the authors of that draft:

Given that the "Unix Epoch" value is going to Y2K us in 2038, thus meaning all the sortability of V7 UUIDs would be broken, any chance you would consider revising that format slightly?

I would propose an Epoch-Period of one or two bits at the front of the UUID field. Then right-shift the actual Unix timestamp one or two bits before injecting those values in the rest of the timestamp field in V7 format. That would only lose us one or two bits of timestamp precision while buying us either 68 or 204 more YEARS before we get Y2Ked.

If we actually are building this to have some useful K-sortability, seems crazy to be asking for trouble and adopting a representation that will roll-over in 2038... that's not far away.
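
The figures in this proposal can be checked with a little arithmetic. The sketch below computes when a signed 32-bit count of seconds rolls over and how many years one or two extra bits would add, which matches the 68 and 204 quoted above. For comparison, it also computes the range of a 48-bit count of milliseconds, which is the timestamp width given in the UUIDv7 draft. That width comes from the draft rather than from this thread, so treat it as an assumption to verify against the current text.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Signed 32-bit seconds since the Unix epoch: the classic 2038 rollover.
rollover_32 = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print("signed 32-bit seconds roll over at", rollover_32.isoformat())

# Extra range gained by widening that counter by one or two bits.
extra_one_bit = (2**32 - 2**31) / SECONDS_PER_YEAR
extra_two_bits = (2**33 - 2**31) / SECONDS_PER_YEAR
print(f"one extra bit:  about {extra_one_bit:.0f} more years")
print(f"two extra bits: about {extra_two_bits:.0f} more years")

# Range of a 48-bit millisecond counter (assumed UUIDv7 timestamp width).
years_48_bit_ms = (2**48 / 1000) / SECONDS_PER_YEAR
print(f"48-bit milliseconds cover about {years_48_bit_ms:.0f} years from 1970")
```

If the 48-bit millisecond assumption holds, the timestamp field would not be affected by the 2038 rollover, and the concern would apply only to systems that store the time as a 32-bit count of seconds.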

Let's consider ways to automate spec validation

TypeID now has a lot of implementations in different programming languages. Therefore, there should be a way of tracking their validation status against the spec. I am thinking of a badge in the README that shows the validation status (failed, succeeded) of each library. There may well be better solutions, though.

More compact string encoding

Cool project! I've also recently been thinking about TypeIDs (I follow a similar pattern in some of my toy projects) and might be interested in collaborating on a Python implementation.

The approach I've been taking is to run a UUIDv7 through base58 (https://pypi.org/project/base58/) before prefixing in order to get an even shorter string encoding. I haven't done this at any particular scale, but I'd be curious if you considered an encoding like this, and if there are any pros/cons you see either way?

When I was looking for an encoding scheme, I had a similar set of requirements:

URL safe, case-insensitive, avoids ambiguous characters, can be selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs
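
To put numbers on the compactness question, the following sketch computes how many characters are needed to represent 128 bits in a given base. The function name `encoded_length` is made up for this example.

```python
import math


def encoded_length(bits: int, base: int) -> int:
    """Smallest number of base-`base` digits that can hold `bits` bits."""
    return math.ceil(bits / math.log2(base))


for name, base in [("hex", 16), ("base32", 32), ("base58", 58)]:
    print(f"{name}: {encoded_length(128, base)} characters")
# hex: 32 characters
# base32: 26 characters
# base58: 22 characters
```

Base58 would save four characters over base32. The usual base58 alphabet mixes uppercase and lowercase letters, though, so it would give up the case-insensitive property from the requirements listed above.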

Add C# .NET implementation to the list

Hi,

I've created a performance-oriented implementation of TypeId in C#: https://github.com/firenero/TypeId
What should be done to add it to the list and mark it as verified? I've seen the discussion about an automated flow for validation (#23), but I'm not sure if there is something in place already.

Also, thanks for the reference implementation and the examples of valid/invalid TypeIDs. They were really helpful while developing my library.

On the possible ambiguity when decoding.

Hi,

First of all, cool project! After implementing it myself, I wanted to share some thoughts.

Since TypeIDs have a fixed length with known padding, they can be encoded and decoded in a straightforward manner. However, this does not resolve a certain ambiguity that arises when decoding the suffix, depending on the leftmost character. This is likely already known, but I believe its implications could be made more explicit.

Imagine the first three bits of a UUID to be 100. With padding, that would be 00100. Now, encoding is simple:

encode(00100) = '4'

And so is decoding:

decode('4') = 00100

Then we strip the padding and get back our initial three bits: 100. However, decode('c'), decode('m'), and decode('w') lead to this exact same result, as their binary representation is XX100. After discarding the first two bits, 100 remains in all cases. In short, this implies that if two TypeIDs are identical except for their leftmost suffix characters, and both characters map to the same binary representation after stripping the first two padding bits, the resulting UUID is the same. 32 TypeId suffixes that only differ in the first character map to only 8 unique UUIDs.

Yes... strictly speaking, no TypeID suffix that was encoded as described in the formal specification can ever start with a character other than '0'-'7', as these are the only characters with a binary representation beginning with 00..., which is exactly the padding. But the specification does not explicitly forbid a TypeID suffix from beginning with '8'-'z'; syntactically, those are still valid TypeIDs.

I'm not suggesting this is a problem. The specification is not incorrect. It just does not (in mathematical terms) describe a bijective function, and I'm concerned that end users of TypeID libraries may intuitively expect the encoding and decoding process to be bijective.

An illustration


This behavior can be observed with the current implementation of the command-line tool from this repository.

First, let's decode and re-encode a TypeID suffix starting with a character from between '0' and '7':

$ typeid decode prefix_01h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

As expected, the encoded result is equal to the original TypeID.

Now, let's take the same TypeID, but replace the leftmost character of the suffix with something between '8' and 'z', which still constitutes a syntactically correct TypeID:

$ typeid decode prefix_81h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881 # same as above

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

But now: prefix_81h2xcejqtf2nbrexx3vqjhp41 != prefix_01h2xcejqtf2nbrexx3vqjhp41

As mentioned above, if we try this for all 32 characters, the command-line tool decodes 32 different TypeIDs to only 8 unique UUIDs:

[0,8,g,r]1h2xcejqtf2nbrexx3vqjhp41 -> 0188bac7-4afa-78aa-bc3b-bd1eef28d881
[1,9,h,s]1h2xcejqtf2nbrexx3vqjhp41 -> 2188bac7-4afa-78aa-bc3b-bd1eef28d881
[2,a,j,t]1h2xcejqtf2nbrexx3vqjhp41 -> 4188bac7-4afa-78aa-bc3b-bd1eef28d881
[3,b,k,v]1h2xcejqtf2nbrexx3vqjhp41 -> 6188bac7-4afa-78aa-bc3b-bd1eef28d881
[4,c,m,w]1h2xcejqtf2nbrexx3vqjhp41 -> 8188bac7-4afa-78aa-bc3b-bd1eef28d881
[5,d,n,x]1h2xcejqtf2nbrexx3vqjhp41 -> a188bac7-4afa-78aa-bc3b-bd1eef28d881
[6,e,p,y]1h2xcejqtf2nbrexx3vqjhp41 -> c188bac7-4afa-78aa-bc3b-bd1eef28d881
[7,f,q,z]1h2xcejqtf2nbrexx3vqjhp41 -> e188bac7-4afa-78aa-bc3b-bd1eef28d881
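
The grouping above can be reproduced with a short Python sketch of a lenient decoder, that is, one that silently drops the two padding bits. This is an illustration of the behavior described in this issue, not the command-line tool's actual code, and `lenient_decode` is a name invented for the example.

```python
import uuid

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"
MASK_128 = (1 << 128) - 1


def lenient_decode(suffix: str) -> uuid.UUID:
    """Decode 26 base32 characters, discarding the top two of the 130 bits."""
    number = 0
    for ch in suffix:
        number = (number << 5) | ALPHABET.index(ch)
    return uuid.UUID(int=number & MASK_128)


tail = "1h2xcejqtf2nbrexx3vqjhp41"
groups: dict[str, list[str]] = {}
for first in ALPHABET:
    groups.setdefault(str(lenient_decode(first + tail)), []).append(first)

for decoded, firsts in groups.items():
    print(firsts, "->", decoded)
```

Each of the eight groups contains four leading characters whose low three bits agree, which is why 32 suffixes collapse to 8 UUIDs.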

My thoughts:

  • You could argue that for properly generated TypeIDs, the leftmost suffix character is always between '0'-'7'. That's true, but the problem arises not during encoding, but during decoding. Input strings from external sources (users, clients, etc.) are not inherently trustworthy. Even syntactically correct TypeIDs lead to this ambiguity (as demonstrated above).
  • Possible solutions:
    • Keep everything as it is. Maybe it's not that much of a problem.
    • Or: Do not allow '8'-'z' as the leftmost characters, as no properly generated suffix should ever begin with those characters. This is what I did in my Java implementation that I submitted yesterday, because I initially assumed it was not permitted. Only later did I find out that it isn't explicitly specified.
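
The second option can be sketched as a purely syntactic check, shown here in Python rather than Java. The pattern and the name `is_canonical_suffix` are assumptions for illustration. The idea is simply that the first character must be '0'-'7', so the two padding bits are always zero.

```python
import re

# First character limited to 0-7; the remaining 25 use the full base32 alphabet.
STRICT_SUFFIX_RE = re.compile(r"[0-7][0-9a-hjkmnp-tv-z]{25}")


def is_canonical_suffix(suffix: str) -> bool:
    """True only for suffixes that a conforming encoder could have produced."""
    return STRICT_SUFFIX_RE.fullmatch(suffix) is not None


print(is_canonical_suffix("01h2xcejqtf2nbrexx3vqjhp41"))  # True
print(is_canonical_suffix("81h2xcejqtf2nbrexx3vqjhp41"))  # False
```

With this restriction, each accepted suffix decodes to exactly one UUID and re-encodes to the same string.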

I hope this feedback is in some way helpful.

RFC: Consider allowing `_` as an additional separator within the typeid prefix

The spec, as defined today, only allows for lowercase alphabetic characters in the type prefix. Some users, though, might need a way to have a "compound noun" in the prefix. Imagine you want the type to be "user accounts"; today you would have to encode that as a single word, useraccounts, but it might be preferable to allow a separator and encode it as user_accounts instead.
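
One consequence of this proposal is that a parser could no longer split on the first underscore. A minimal Python sketch of one way to handle it is to split on the last underscore instead, which is safe because the base32 suffix never contains '_'. The prefix rule used here (lowercase words joined by single underscores, at most 63 characters in total) is a hypothetical reading of the proposal, not an agreed specification.

```python
import re

# Hypothetical prefix rule: lowercase words separated by single underscores.
COMPOUND_PREFIX_RE = re.compile(r"[a-z]+(?:_[a-z]+)*")
SUFFIX_RE = re.compile(r"[0-9a-hjkmnp-tv-z]{26}")


def split_compound_typeid(text: str) -> tuple[str, str]:
    """Split on the LAST underscore so the prefix may itself contain underscores."""
    prefix, separator, suffix = text.rpartition("_")
    if not separator or len(prefix) > 63:
        raise ValueError(f"bad TypeID: {text!r}")
    if not COMPOUND_PREFIX_RE.fullmatch(prefix) or not SUFFIX_RE.fullmatch(suffix):
        raise ValueError(f"bad TypeID: {text!r}")
    return prefix, suffix


print(split_compound_typeid("user_accounts_01h2xcejqtf2nbrexx3vqjhp41"))
# ('user_accounts', '01h2xcejqtf2nbrexx3vqjhp41')
```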

Remove second rust implementation

Hi! Thank you again for adding my implementation :)
The library provided by @conradludgate is very good, and I don't think we should have two Rust implementations.
Could you please remove mine from the table?

Future specification on binary format

Is there any plan to add into the specification how to convert a typeid to binary format?

In my other personal project utilising typeid, I will need to serialise the ids. So far I'm implementing my own serialisation only for that specific project, but if there will be a formal specification, I can include that in the Haskell implementation as well.

RFC: Consider adding one or two extra characters to the encoding for a checksum

One of the top comments in the Hacker News discussion was:

I've been doing this kind of thing for years with two notable differences:
...

I add two base-32 characters as a checksum (salted of course). This prevents having to go look at the datastore when the value is bogus, either by accident or malice. I'm unsure why other implementations don't do this.

Should we do that as part of the official TypeID spec?
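
To make the idea concrete, here is one possible sketch of a salted two-character checksum in Python. Everything about it is an assumption for discussion: the comment quoted above does not say which hash is used, so this example picks keyed BLAKE2b from the standard library, keeps 10 bits of the digest, and renders them as two base32 characters. The function names and the salt value are hypothetical.

```python
import hashlib
import hmac

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def checksum(typeid: str, salt: bytes) -> str:
    """Two base32 characters (10 bits) derived from a keyed hash of the TypeID."""
    digest = hashlib.blake2b(typeid.encode("ascii"), key=salt, digest_size=2).digest()
    bits = int.from_bytes(digest, "big") & 0x3FF
    return ALPHABET[bits >> 5] + ALPHABET[bits & 0x1F]


def with_checksum(typeid: str, salt: bytes) -> str:
    """Append the checksum characters to the TypeID."""
    return typeid + checksum(typeid, salt)


def verify(text: str, salt: bytes) -> bool:
    """Check the trailing two characters without touching the datastore."""
    body, check = text[:-2], text[-2:]
    return hmac.compare_digest(check, checksum(body, salt))


salt = b"demo-salt"  # hypothetical secret; keep a real one out of source control
protected = with_checksum("prefix_01h2xcejqtf2nbrexx3vqjhp41", salt)
print(protected, verify(protected, salt))
```

With only 10 bits, roughly one in 1024 random strings would still pass, so a check like this filters out most bogus values but is not a substitute for authentication. It would also lengthen the suffix to 28 characters, which existing implementations would not accept.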
