jetpack-io / typeid

Type-safe, K-sortable, globally unique identifier inspired by Stripe IDs

License: Apache License 2.0

Go 80.19% Shell 19.81%

guid uuid uuidv7 typeid

typeid's Introduction

TypeID

A type-safe, K-sortable, globally unique identifier inspired by Stripe IDs

What is it?

TypeIDs are a modern, type-safe extension of UUIDv7, inspired by a similar use of prefixes in Stripe's APIs.

TypeIDs are canonically encoded as lowercase strings consisting of three parts:

A type prefix (at most 63 characters in all lowercase ASCII [a-z])
An underscore '_' separator
A 128-bit UUIDv7 encoded as a 26-character string using a modified base32 encoding.

Here's an example of a TypeID of type user:

  user_2x4y6z8a0b1c2d3e4f5g6h7j8k
  └──┘ └────────────────────────┘
  type    uuid suffix (base32)

A formal specification defines the encoding in more detail.
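The three-part structure above can be checked mechanically. The following is a minimal Go sketch, not the official implementation: the function name `parseTypeID` is hypothetical, the alphabet constant assumes the lowercase Crockford-style base32 alphabet, and the sketch only handles the prefixed form described above. It validates the shape of the string without decoding the UUID.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// alphabet is assumed to be the lowercase Crockford-style base32 alphabet;
// the formal specification is the authority on this.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// parseTypeID (hypothetical name) splits s into prefix and suffix and checks
// the three structural rules listed above. It does not decode the UUID.
func parseTypeID(s string) (string, string, error) {
	prefix, suffix, found := strings.Cut(s, "_")
	if !found {
		return "", "", errors.New("missing '_' separator")
	}
	if len(prefix) == 0 || len(prefix) > 63 {
		return "", "", errors.New("prefix must be 1 to 63 characters")
	}
	for _, c := range prefix {
		if c < 'a' || c > 'z' {
			return "", "", fmt.Errorf("invalid prefix character %q", c)
		}
	}
	if len(suffix) != 26 {
		return "", "", errors.New("suffix must be exactly 26 characters")
	}
	for _, c := range suffix {
		if !strings.ContainsRune(alphabet, c) {
			return "", "", fmt.Errorf("invalid suffix character %q", c)
		}
	}
	return prefix, suffix, nil
}

func main() {
	prefix, suffix, err := parseTypeID("user_2x4y6z8a0b1c2d3e4f5g6h7j8k")
	fmt.Println(prefix, suffix, err)
}
```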

Benefits

Type-safe: you can't accidentally use a user ID where a post ID is expected. When debugging, you can immediately understand what type of entity a TypeID refers to thanks to the type prefix.
Compatible with UUIDs: TypeIDs are a superset of UUIDs. They are based on the upcoming UUIDv7 standard. If you decode the TypeID and remove the type information, you get a valid UUIDv7.
K-Sortable: TypeIDs are K-sortable and can be used as the primary key in a database while ensuring good locality. Compare this to entirely random global IDs, such as UUIDv4, which generally suffer from poor database locality.
Thoughtful encoding: the base32 encoding is URL safe, case-insensitive, avoids ambiguous characters, can be selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs (26 characters vs 36 characters).
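The 26-character figure follows from packing 128 bits into 5-bit groups: 26 groups hold 130 bits, so two zero padding bits go in front. Here is a minimal Go sketch of that encoding, assuming the lowercase Crockford-style alphabet; the function name `encodeSuffix` is hypothetical and the code is illustrative rather than the reference implementation.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/big"
	"strings"
)

// alphabet is assumed to be the lowercase Crockford-style base32 alphabet.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// encodeSuffix (hypothetical name) turns a UUID in its usual hex form into a
// 26-character base32 suffix.
func encodeSuffix(uuid string) (string, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(uuid, "-", ""))
	if err != nil {
		return "", err
	}
	if len(raw) != 16 {
		return "", fmt.Errorf("expected 16 bytes, got %d", len(raw))
	}
	// Treat the UUID as one big-endian integer and peel off 5 bits at a
	// time from the right. The leftmost character only receives 3 real bits,
	// so its top 2 bits are the zero padding.
	n := new(big.Int).SetBytes(raw)
	mask := big.NewInt(31)
	out := make([]byte, 26)
	for i := len(out) - 1; i >= 0; i-- {
		digit := new(big.Int).And(n, mask).Int64()
		out[i] = alphabet[digit]
		n.Rsh(n, 5)
	}
	return string(out), nil
}

func main() {
	suffix, err := encodeSuffix("0188bac7-4afa-78aa-bc3b-bd1eef28d881")
	fmt.Println(suffix, err)
}
```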

Implementations

Implementations should adhere to the formal specification.

Official Implementations by `jetpack.io`

Language	Status
Go	✓ Implemented
SQL	✓ Implemented
TypeScript	✓ Implemented

Community Provided Implementations

Language	Author	Validated Against Spec?
C# (.Net)	@TenCoKaciStromy	Yes, on 2023-06-30
C# (.Net Standard 2.1)	@cbuctok	Yes, on 2023-07-03
C# (.NET)	@firenero	Yes, on 2023-07-15
Dart	@mistermoe @tbd54566975	Yes, on 2024-03-25
Elixir	@sloanelybutsurely	Yes, on 2023-07-02
Haskell	@MMZK1526	Yes, on 2023-07-07
Java	@fxlae	Yes, on 2023-07-02
Java	@softprops	Yes, on 2023-07-04
OCaml	@titouancreach	Yes, on 2024-03-07
PHP	@BombenProdukt	Yes, on 2023-07-03
Python	@akhundMurad	Yes, on 2023-06-30
Ruby	@broothie	Yes, on 2023-06-30
Rust	@conradludgate	Yes, on 2023-07-01
Rust	@johnnynotsolucky	Yes, on 2023-07-13
Scala	@ant8e	Yes, on 2023-07-14
Scala	@guizmaii	Not yet
Swift	@Frizlab	Yes, on 2023-07-07
T-SQL	@uniteeio	Yes, on 2023-08-25
TypeScript	@ongteckwu	Yes, on 2023-06-30
Zig	@tensorush	Yes, on 2023-07-05

We are looking for community contributions to implement TypeIDs in other languages.

Command-line Tool

This repo includes a command-line tool for generating TypeIDs. To install it, run:

curl -fsSL https://get.jetpack.io/typeid | bash

To generate a new TypeID, run:

$ typeid new prefix
prefix_01h2xcejqtf2nbrexx3vqjhp41

To decode an existing TypeID into a UUID, run:

$ typeid decode prefix_01h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881

And to encode an existing UUID into a TypeID, run:

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

Related Work

UUIDv7 - The upcoming UUID standard that TypeIDs are based on.

Alternatives to UUIDv7 that are also worth considering (but not type-safe like TypeIDs):

typeid's Issues

Existing PHP UUIDv7 implementation

https://github.com/oittaa/uuid-php

Haskell implementation of typeid

I would be glad if you could include this Haskell implementation in the README.

It has not been validated yet, but I will add validation soon.

Add Rust implementation

Implemented Rust version. Thanks for the project!
Please add https://github.com/alisa101rs/typeid-rs to implementation list.

But also check jetpack-io/typeid-go#2 issue.

This is more a comment than a true issue. I was surprised to see Crockford's alphabet for something that looked familiar to me and was only from 2019. Don't misunderstand me: I think this alphabet is convenient, and maybe he was the first to propose exactly this choice of letters; hence, he may deserve to be credited. But for those interested in the topic, I suggest looking at:

https://ux.stackexchange.com/questions/53341/are-there-any-letters-numbers-that-should-be-avoided-in-an-id (from 2014), which links to
https://github.com/tytso/pwgen/blame/master/pw_rand.c (from 2005).

If GitHub had existed since 1970, I would guess something earlier could be found more easily. I definitely think https://www.crockford.com/base32.html should add bibliographic references.

Best regards,
Laurent Lyaudet

Add TypeScript Implementation

I wrote a TypeScript implementation because I needed it for my work. I will post it here soon, after I polish it up. Thanks for this project!

RFC: Consider asking the IETF to make a smarter move on UUID V7 before adopting

We should push for a better option (or revision to) UUID V7

From an email I sent the authors of that draft:

Given that the "Unix Epoch" value is going to Y2K us in 2038, thus meaning all the sortability of V7 UUIDs would be broken, any chance you would consider revising that format slightly?

I would propose an Epoch-Period of one or two bits at the front of the UUID field. Then right-shift the actual Unix timestamp one or two bits before injecting those values in the rest of the timestamp field in V7 format. That would only lose us one or two bits of timestamp precision while buying us either 68 or 204 more YEARS before we get Y2Ked.

If we actually are building this to have some useful K-sortability, seems crazy to be asking for trouble and adopting a representation that will roll-over in 2038... that's not far away.
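A quick check of the arithmetic behind this concern may help. The sketch below assumes the UUIDv7 layout from the IETF draft, in which the timestamp field is a 48-bit count of milliseconds since the Unix epoch, whereas the 2038 rollover applies to a signed 32-bit count of seconds. The function names are illustrative only.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// timestampBits is the assumed width of the UUIDv7 timestamp field, which
// counts milliseconds since the Unix epoch.
const timestampBits = 48

// lastUUIDv7Time returns the latest instant a 48-bit millisecond counter
// can represent.
func lastUUIDv7Time() time.Time {
	maxMillis := int64(1)<<timestampBits - 1
	return time.UnixMilli(maxMillis).UTC()
}

// lastInt32Time returns the latest instant a signed 32-bit second counter
// can represent, which is the source of the 2038 problem.
func lastInt32Time() time.Time {
	return time.Unix(math.MaxInt32, 0).UTC()
}

func main() {
	fmt.Println("32-bit seconds run out in:", lastInt32Time().Year())
	fmt.Println("48-bit milliseconds run out in:", lastUUIDv7Time().Year())
}
```

Under that assumption, the timestamp field itself does not roll over in 2038, although a system that obtains its clock value from a 32-bit seconds counter could still be affected.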

[question] how to use in a Postgres database?

Add Dart implementation to list

Heyo y'all!

Just finished implementing and publishing typeid in Dart!

Repo
pub.dev link (dart / flutter package manager)
Tests using vectors provided in spec
Successful build with tests passing

Wanted to share and get a ✅ before opening a PR to add it to y'alls list of implementations

Add .NET (C#) implementation

Hi!

I implemented a .NET lib: https://github.com/TenCoKaciStromy/typeid-dotnet

Could you add it in the implementations list?

Thanks!

Let's consider ways to automate spec validation

TypeID now has many implementations in different programming languages, so there should be a way to track their validation status against the spec. One idea is a badge in the README that shows the validation status (passed or failed) of each library, though there may be better solutions.

More compact string encoding

Cool project! I've also recently been thinking about TypeIDs (I follow a similar pattern in some of my toy projects) and might be interested in collaborating on a Python implementation.

The approach I've been taking is to run a UUIDv7 through base58 (https://pypi.org/project/base58/) before prefixing in order to get an even shorter string encoding. I haven't done this at any particular scale, but I'd be curious if you considered an encoding like this, and if there are any pros/cons you see either way?

When I was looking for an encoding scheme, I had a similar set of requirements:

URL safe, case-insensitive, avoids ambiguous characters, can be selected for copy-pasting by double-clicking, and is a more compact encoding than the traditional hex encoding used by UUIDs
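To put numbers on the size trade-off, the digit count for a 128-bit value in a given base is the smallest k with base^k >= 2^128, that is, ceil(128 / log2(base)). The Go sketch below computes this for hex, base32, and base58; the function name `encodedLength` is made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// encodedLength returns how many digits are needed to write any value of the
// given bit width in the given base.
func encodedLength(bits int, base float64) int {
	return int(math.Ceil(float64(bits) / math.Log2(base)))
}

func main() {
	for _, base := range []float64{16, 32, 58} {
		fmt.Printf("base%v: %d characters\n", base, encodedLength(128, base))
	}
}
```

By this count base58 saves four characters over base32 (22 versus 26). The base58 alphabet mixes uppercase and lowercase letters, however, so it cannot meet the case-insensitivity requirement quoted above.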

Add C# .NET implementation to the list

Hi,

I've created a performance-oriented implementation of TypeId in C#: https://github.com/firenero/TypeId
What should be done to add it to the list and mark it as verified? I've seen the discussion about an automated flow for validation (#23), but I'm not sure whether anything is in place already.

Also, thanks for the reference implementation and the examples of valid and invalid TypeIDs. They were really helpful while developing my library.

Add Swift Implementation

Hi!

I’ve implemented a Swift lib for typeid: https://github.com/Frizlab/swift-typeid.
Can you add it in the implementations list?

Thanks!

On the possible ambiguity when decoding.

Hi,

First of all, cool project! After implementing it myself, I wanted to share some thoughts.

Since TypeIDs have a fixed length with known padding, they can be encoded and decoded in a straightforward manner. However, this does not resolve a certain ambiguity that arises when decoding the suffix, depending on the leftmost character. This is likely already known, but I believe its implications could be made more explicit.

Imagine the first three bits of a UUID to be 100. With padding, that would be 00100. Now, encoding is simple:

encode(00100) = '4'

And so is decoding:

decode('4') = 00100

Then we strip the padding and get back our initial three bits: 100. However, decode('c'), decode('m'), and decode('w') lead to this exact same result, as their binary representation is XX100. After discarding the first two bits, 100 remains in all cases. In short, this implies that if two TypeIDs are identical except for their leftmost suffix characters, and both characters map to the same binary representation after stripping the first two padding bits, the resulting UUID is the same. 32 TypeId suffixes that only differ in the first character map to only 8 unique UUIDs.

Yes... strictly speaking, no TypeID suffix that was encoded as described in the formal specification can ever start with a character other than '0'-'7', as these are the only characters with a binary representation beginning with 00..., which is exactly the padding. But the specification does not explicitly forbid a TypeID suffix from beginning with '8'-'z'; syntactically, those are still valid TypeIDs.

I'm not suggesting this is a problem. The specification is not incorrect. It just does not (in mathematical terms) describe a bijective function, and I'm concerned that end users of TypeID libraries may intuitively expect the encoding and decoding process to be bijective.
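The collision is easy to reproduce with a lenient decoder. The following Go sketch is my own illustration, not the code of the command-line tool: it assumes the lowercase Crockford-style alphabet, the name `decodeSuffix` is hypothetical, and it deliberately discards the two padding bits without checking them.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// alphabet is assumed to be the lowercase Crockford-style base32 alphabet.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// decodeSuffix (hypothetical name) leniently decodes a 26-character suffix
// into 16 bytes by dropping the top 2 of its 130 bits.
func decodeSuffix(suffix string) ([16]byte, error) {
	var id [16]byte
	if len(suffix) != 26 {
		return id, fmt.Errorf("expected 26 characters, got %d", len(suffix))
	}
	n := new(big.Int)
	for i := 0; i < len(suffix); i++ {
		idx := strings.IndexByte(alphabet, suffix[i])
		if idx < 0 {
			return id, fmt.Errorf("invalid character %q", suffix[i])
		}
		n.Lsh(n, 5)
		n.Or(n, big.NewInt(int64(idx)))
	}
	// Keep only the low 128 bits; the 2 padding bits are discarded unchecked,
	// which is what makes several suffixes decode to the same UUID.
	mask := new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 128), big.NewInt(1))
	n.And(n, mask)
	n.FillBytes(id[:])
	return id, nil
}

func main() {
	// '4', 'c', 'm' and 'w' share the same low three bits (100).
	for _, first := range "4cmw" {
		id, err := decodeSuffix(string(first) + "1h2xcejqtf2nbrexx3vqjhp41")
		fmt.Printf("%c -> %x (err: %v)\n", first, id[:], err)
	}
}
```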

An illustration

This behavior can be observed with the current implementation of the command-line tool from this repository.

First, let's decode and re-encode a TypeID suffix starting with a character from between '0' and '7':

$ typeid decode prefix_01h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

As expected, the encoded result is equal to the original TypeID.

Now, let's take the same TypeID, but replace the leftmost character of the suffix with something between '8' and 'z', which still constitutes a syntactically correct TypeID:

$ typeid decode prefix_81h2xcejqtf2nbrexx3vqjhp41
type: prefix
uuid: 0188bac7-4afa-78aa-bc3b-bd1eef28d881 # same as above

$ typeid encode prefix 0188bac7-4afa-78aa-bc3b-bd1eef28d881
prefix_01h2xcejqtf2nbrexx3vqjhp41

But now: prefix_81h2xcejqtf2nbrexx3vqjhp41 != prefix_01h2xcejqtf2nbrexx3vqjhp41

As mentioned above, if we try this for all 32 characters, the command-line tool decodes 32 different TypeIDs to only 8 unique UUIDs:

[0,8,g,r]1h2xcejqtf2nbrexx3vqjhp41 -> 0188bac7-4afa-78aa-bc3b-bd1eef28d881
[1,9,h,s]1h2xcejqtf2nbrexx3vqjhp41 -> 2188bac7-4afa-78aa-bc3b-bd1eef28d881
[2,a,j,t]1h2xcejqtf2nbrexx3vqjhp41 -> 4188bac7-4afa-78aa-bc3b-bd1eef28d881
[3,b,k,v]1h2xcejqtf2nbrexx3vqjhp41 -> 6188bac7-4afa-78aa-bc3b-bd1eef28d881
[4,c,m,w]1h2xcejqtf2nbrexx3vqjhp41 -> 8188bac7-4afa-78aa-bc3b-bd1eef28d881
[5,d,n,x]1h2xcejqtf2nbrexx3vqjhp41 -> a188bac7-4afa-78aa-bc3b-bd1eef28d881
[6,e,p,y]1h2xcejqtf2nbrexx3vqjhp41 -> c188bac7-4afa-78aa-bc3b-bd1eef28d881
[7,f,q,z]1h2xcejqtf2nbrexx3vqjhp41 -> e188bac7-4afa-78aa-bc3b-bd1eef28d881

My thoughts:

You could argue that for properly generated TypeIDs, the leftmost suffix character is always between '0'-'7'. That's true, but the problem arises not during encoding, but during decoding. Input strings from external sources (users, clients, etc.) are not inherently trustworthy. Even syntactically correct TypeIDs lead to this ambiguity (as demonstrated above).
Possible solutions:
- Keep everything as it is. Maybe it's not that much of a problem.
- Or: Do not allow '8'-'z' as the leftmost character, as no properly generated suffix should ever begin with those characters. This is what I did in my Java implementation that I submitted yesterday, because I initially assumed it was not permitted. Only later did I find out that it isn't explicitly specified.
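The second option amounts to one extra check during validation. A minimal Go sketch of such a strict check follows, assuming the lowercase Crockford-style alphabet; the name `validateSuffixStrict` is hypothetical and the code is not taken from any of the listed implementations.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// alphabet is assumed to be the lowercase Crockford-style base32 alphabet.
const alphabet = "0123456789abcdefghjkmnpqrstvwxyz"

// validateSuffixStrict (hypothetical name) accepts only suffixes that a
// conforming encoder could have produced.
func validateSuffixStrict(suffix string) error {
	if len(suffix) != 26 {
		return errors.New("suffix must be exactly 26 characters")
	}
	for i := 0; i < len(suffix); i++ {
		if strings.IndexByte(alphabet, suffix[i]) < 0 {
			return fmt.Errorf("invalid character %q at position %d", suffix[i], i)
		}
	}
	// 26 characters carry 130 bits but a UUID has 128, so the first
	// character may only use its low 3 bits: values '0' through '7'.
	if suffix[0] > '7' {
		return errors.New("first character must be in '0'-'7'")
	}
	return nil
}

func main() {
	fmt.Println(validateSuffixStrict("01h2xcejqtf2nbrexx3vqjhp41"))
	fmt.Println(validateSuffixStrict("81h2xcejqtf2nbrexx3vqjhp41"))
}
```

With this check in place, every accepted suffix corresponds to exactly one UUID, so decoding followed by re-encoding returns the original string.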

I hope this feedback is in some way helpful.

RFC: Consider allowing `_` as an additional separator within the typeid prefix

The spec, as defined today, only allows lowercase alphabetic characters in the type prefix. Some users, though, might need a way to have a "compound noun" in the prefix. Imagine you want the type to be "user accounts"; today you would have to encode that as a single word, useraccounts, but it might be preferable to allow a separator and encode it as user_accounts instead.
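If underscores were allowed inside the prefix, a parser could no longer split on the first underscore. Because the base32 suffix never contains an underscore, splitting on the last one would remain unambiguous. The Go sketch below illustrates that idea only; the name `splitCompound` is hypothetical, and questions such as whether a prefix may begin or end with an underscore are left open.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitCompound (hypothetical name) separates prefix and suffix at the LAST
// underscore, so that the prefix itself may contain underscores.
func splitCompound(s string) (string, string, error) {
	i := strings.LastIndexByte(s, '_')
	if i < 0 {
		return "", "", errors.New("missing '_' separator")
	}
	return s[:i], s[i+1:], nil
}

func main() {
	prefix, suffix, err := splitCompound("user_accounts_01h2xcejqtf2nbrexx3vqjhp41")
	fmt.Println(prefix, suffix, err)
}
```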

Remove second rust implementation

Hi! Thank you again for adding my implementation :)
The library provided by @conradludgate is very good, and I don't think we need two Rust implementations.
Could you please remove mine from the table?

Future specification on binary format

Is there any plan to add into the specification how to convert a typeid to binary format?

In my other personal project utilising typeid, I will need to serialise the ids. So far I'm implementing my own serialisation only for that specific project, but if there will be a formal specification, I can include that in the Haskell implementation as well.

I've been doing this kind of thing for years with two notable differences:
...

I add two base-32 characters as a checksum (salted, of course). This prevents having to go look at the datastore when the value is bogus, either by accident or malice. I'm unsure why other implementations don't do this.

Should we do that as part of the official TypeID spec?