rs / xid

xid is a globally unique id generator thought for the web

License: MIT License

Go 100.00%

xid's Issues

What is "K-ordered" and "sortable"?

Hi! Could you please explain what exactly "K-ordered" and "sortable" mean? I'm not sure I understand these properties compared to other approaches.

Does it mean that an xid generated at one moment (e.g. the start of today) will necessarily sort before any xid generated afterwards, behaving like a timestamp?

Thank you! :)
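
To make the question concrete, here is a minimal sketch of why a timestamp-prefixed ID sorts roughly by creation time. It does not use the xid package; it assumes the layout described elsewhere on this page (4 big-endian timestamp bytes followed by 8 bytes of machine, pid and counter) and an alphabet of 0-9a-v. The helper name makeID and the tail values are made up for illustration.

```go
package main

import (
	"encoding/base32"
	"encoding/binary"
	"fmt"
)

// enc is an assumed stand-in for the ID alphabet: lowercase base32hex, no padding.
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// makeID builds a 12-byte value: 4 big-endian bytes of seconds followed by
// 8 opaque bytes (machine, pid and counter in the real format), then encodes it.
func makeID(seconds uint32, tail [8]byte) string {
	var raw [12]byte
	binary.BigEndian.PutUint32(raw[:4], seconds)
	copy(raw[4:], tail[:])
	return enc.EncodeToString(raw[:])
}

func main() {
	// The earlier ID gets the largest possible tail, the later one the smallest.
	early := makeID(1700000000, [8]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff})
	late := makeID(1700000001, [8]byte{})
	fmt.Println(early, late, early < late)
}
```

Because the timestamp occupies the leading bytes, an ID from a later second compares greater regardless of the remaining bytes. Within the same second the order depends on the tail, so IDs from different machines are only approximately ordered by time.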

[Query] How can I verify that generated code is done by my code?

A version tag

Thank you for a nice implementation of the id generator.

I just wonder whether it's possible to add a semantic version tag, something like 1.0.0, so that I can specify it in my glide.yaml file.

what size should i set for xid in mysql?

I use gorm, and the id type is xid.ID. It generates a varbinary(255) column in MySQL. What is the correct size to set for an xid?
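
As a rough guide to sizing, the sketch below computes the two lengths involved, assuming the ID is 12 raw bytes and its text form is unpadded base32 (both figures appear in other issues on this page). How gorm maps the type to a column is not covered here.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// rawLen is the assumed binary size of one ID in bytes.
const rawLen = 12

// textLen returns the number of characters needed for the unpadded base32 form.
func textLen() int {
	enc := base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)
	return enc.EncodedLen(rawLen)
}

func main() {
	fmt.Printf("binary form: %d bytes, text form: %d characters\n", rawLen, textLen())
}
```

Under these assumptions a fixed-width column of 12 bytes (binary) or 20 characters (text) would be enough.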

`kern.uuid` is misinterpreted as a machine ID when it's actually a kernel version identifier

The readPlatformMachineID function in hostid_darwin.go performs a syscall to obtain kern.uuid, and it uses the result as the machine ID. This value isn't actually meant to be a unique ID for a machine, but rather a unique ID for the currently running kernel version.

To verify this, you can run sysctl kern.uuid and then search for the value on Google, and as long as your current version of macOS has been released for a while, you'll likely find other people with the same ID.

(For concrete examples of this, see shirou/gopsutil#1058.)

Could we use syscall instead of x/sys?

As far as I can see, syscall has everything that is used from x/sys, and switching would remove one external dependency that is 6.2Mb :)

Question around readMachineID

Thank you for your work! I've been using it as an inspiration for ID generation for a particular project.

I'm most likely wrong here, but I'm curious to verify my understanding. Feel free to ignore the question, and just close the ticket!

I cannot see how readMachineID is stable.

func readMachineID() []byte {
	id := make([]byte, 3)
	hid, err := readPlatformMachineID()
	if err != nil || len(hid) == 0 {
		hid, err = os.Hostname()
	}
	if err == nil && len(hid) != 0 {
		hw := md5.New()
		hw.Write([]byte(hid))
		copy(id, hw.Sum(nil))
	} else {
		// Fallback to rand number if machine id can't be gathered
		if _, randErr := rand.Reader.Read(id); randErr != nil {
			panic(fmt.Errorf("xid: cannot get hostname nor generate a random number: %v; %v", err, randErr))
		}
	}
	return id
}

Assuming a Linux platform, readPlatformMachineID would return according to the man machine-id:

The machine ID is a single newline-terminated, hexadecimal, 32-character, lowercase ID

If we successfully read it, and the length is correct, we then md5 hash it, and copy only the 3 first bytes out of 16 bytes.

Wouldn't this be prone to collisions as we are only using the 3 first bytes of the md5 hash?

I wrote a tiny test following the same approach, but randomly generating the string to be fed into the hash:

package unique

import (
	"crypto/md5"
	"math/rand"
	"testing"

	"github.com/stretchr/testify/require"
)

const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func randStr(r *rand.Rand) string {
	b := make([]byte, 32)
	for i := range b {
		b[i] = chars[r.Intn(len(chars))]
	}
	return string(b)
}

func genID(hid string) []byte {
	id := make([]byte, 3)
	hw := md5.New()
	hw.Write([]byte(hid))
	copy(id, hw.Sum(nil))
	return id
}

func TestUnique(t *testing.T) {
	r := rand.New(rand.NewSource(999))

	rounds := 100000

	mStr := map[string]interface{}{}
	mID := map[string]string{}

	for i := 0; i < rounds; i++ {
		src := randStr(r)
		_, ok := mStr[src]
		if ok { // skip when we've already randomly generated the same string as source
			continue
		}
		mStr[src] = struct{}{}

		genID := string(genID(src))
		storedSrc, ok := mID[genID]
		if ok {
			t.Fatalf("collision? round: %d, storedSrc: '%s', src: '%s', id: '%v'",
				i, storedSrc, src, genID)
		}
		mID[genID] = src
	}
}

func TestCollision(t *testing.T) {
	src1 := "pplbyfSYmSkuUQbjJvcOWsUuSwoPYOTk"
	src2 := "LRnRfzVvPAWbEhDNOegktwBvpaCnutyH"
	require.NotEqual(t, genID(src1), genID(src2))
}

➜  unique go test -v ./...
=== RUN   TestUnique
    unique_test.go:48: collision? round: 12209, storedSrc: 'pplbyfSYmSkuUQbjJvcOWsUuSwoPYOTk', src: 'LRnRfzVvPAWbEhDNOegktwBvpaCnutyH', id: '�(m'
--- FAIL: TestUnique (0.01s)
=== RUN   TestCollision
    unique_test.go:58: 
        	Error Trace:	unique_test.go:58
        	Error:      	Should not be: []byte{0xa3, 0x28, 0x6d}
        	Test:       	TestCollision
--- FAIL: TestCollision (0.00s)
FAIL
FAIL	github.com/sata/unique	0.013s
FAIL

My reading is that after 12209 attempts we ended up with two identical machine IDs even though their sources are different.

Is the xid ID still considered stable because it combines epoch time + machine identifier + local process id + a counter starting at a random value? I.e., is the likelihood of a collision between xid IDs so low because of the other factors?
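
The test result is about what the birthday bound predicts for a 3-byte value. The sketch below uses the standard approximation 1 - exp(-n(n-1)/(2N)) with N = 2^24; the function name is made up for illustration, and it treats the truncated hash as uniformly random.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly random values collide when there are `space` possible values.
func collisionProbability(n, space float64) float64 {
	return 1 - math.Exp(-n*(n-1)/(2*space))
}

func main() {
	const space = 1 << 24 // 3 bytes of machine ID
	for _, n := range []float64{100, 1000, 5000, 12209} {
		fmt.Printf("n=%6.0f  p=%.4f\n", n, collisionProbability(n, space))
	}
}
```

With 12209 distinct inputs the probability of at least one shared 3-byte prefix is already close to 0.99, so the failing test is expected. Two machines sharing a machine ID is not yet an ID collision, though: the timestamp, process id and counter would also have to match.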

Failing to get Xid embedded info

Let me start by saying I'm new to golang :)

I can't seem to get the embedded info out of the UUID. Could you shed some light on what might be the issue?

./test.go:15: undefined: xid.Time

package main

import (
	"log"
	"fmt"
	"github.com/rs/xid"
)

func getUuid() string {
	guid := xid.New()
	return guid.String()
}

func printUuidData(s string) {
	ID, err := xid.FromString(s)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ID)
	return
}

func printUuidTime(s string) {
	t := xid.Time(s)
	fmt.Println(t)
	return
}

func main() {
	testUuid := getUuid()
	printUuidData("Print UUID: " + testUuid)
	printUuidTime(testUuid)
}
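
Two things stand out in the snippet above: Time is called as a package-level function on a string, and printUuidData is handed a string with a "Print UUID: " prefix, which is not a valid ID. As far as I know the library exposes the timestamp as a method on the parsed ID value (id.Time()) rather than as xid.Time. To show what is embedded without depending on the package, here is a standard-library sketch that decodes the fields by hand. It assumes the layout of 4 bytes time, 3 bytes machine, 2 bytes pid and 3 bytes counter, and a 0-9a-v alphabet; the sample ID string is only an illustration.

```go
package main

import (
	"encoding/base32"
	"encoding/binary"
	"fmt"
	"time"
)

// enc is an assumed stand-in for the ID alphabet: lowercase base32hex, no padding.
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// parts holds the fields packed into the 12 raw bytes.
type parts struct {
	Time    time.Time
	Machine [3]byte
	Pid     uint16
	Counter uint32
}

// decode splits a 20-character ID string into its embedded fields.
func decode(s string) (parts, error) {
	var p parts
	if len(s) != 20 {
		return p, fmt.Errorf("want 20 characters, got %d", len(s))
	}
	raw, err := enc.DecodeString(s)
	if err != nil {
		return p, err
	}
	p.Time = time.Unix(int64(binary.BigEndian.Uint32(raw[0:4])), 0).UTC()
	copy(p.Machine[:], raw[4:7])
	p.Pid = binary.BigEndian.Uint16(raw[7:9])
	p.Counter = uint32(raw[9])<<16 | uint32(raw[10])<<8 | uint32(raw[11])
	return p, nil
}

func main() {
	p, err := decode("9m4e2mr0ui3e8a215n4g") // sample ID for illustration
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("time=%s machine=%x pid=%d counter=%d\n", p.Time, p.Machine, p.Pid, p.Counter)
}
```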

Exposing a Nil var

Actually, xid.ID{} is good enough, closing this

What if machine ids are the same?

Suppose

I have two machines, and both of them have the same hostname, they will get the same machine id.
I have two machines, and both of them failed to get the hostname, and their random number generators give the same number as machine id.

How do we prevent this from happening?

network partitioning

You use the time when generating the UUID, so if the network partitions, the clocks will skew.
https://en.wikipedia.org/wiki/Clock_skew

I remember once reading the Google paper on this for Spanner.
The buggers have atomic clocks in each data center, and they account for clock skew and also bandwidth latency during clock synchronisation.

Time and Space is the biggest computing problem :)

So it's a heuristic approach. I wonder if there is benchmarking code where someone has run tests with many servers and clients and controlled the partitioning and latency to find out how useful the Google approach is. If you can get ordered keys and use OT (operational transforms), all your chickens will come home to roost...

Could there be a conflict when using it in different Docker containers?

Usage in docker

I'm always running my golang executables within containers in docker.
The PID is then always 1, which then defeats the purpose of having this field.

Would there be any possible replacement?

The least would be to specify this caveat in the README I think.

Rewriting this: some recommendations

You can replace your misnamed RandomInt32 (which should be RandomUint32 because it returns a uint32) with a single return rand.Uint32()

Most of the bitwise mess, which is unintelligible to non-C developers and often makes them cry when they look at it, can be replaced with binary.(Big|Little)Endian.PutUint(16|32)

I also found a way to shrink the timestamp data to 2 bytes while keeping the most important aspects of it.

If there is any interest, I can make pull requests with some of these changes to make the code more readable to most Go programmers and easier to modify and update. Right now I imagine most developers who don't know bitwise operations very well look at this codebase and cry, and it's not making it more efficient.

There are tons of efficiency issues in the code; it uses way, way more memory than it needs to.
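
For readers comparing the two styles, this small sketch shows that manual shifting and encoding/binary produce the same big-endian bytes for a 32-bit value. The function names are made up for illustration and are not taken from the xid source.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shiftEncode writes v as 4 big-endian bytes using explicit shifts.
func shiftEncode(v uint32) [4]byte {
	return [4]byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)}
}

// binaryEncode writes v as 4 big-endian bytes using encoding/binary.
func binaryEncode(v uint32) [4]byte {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], v)
	return b
}

func main() {
	const ts = 1300816219 // arbitrary sample timestamp
	fmt.Println(shiftEncode(ts) == binaryEncode(ts))
}
```

Which form is faster or clearer is a judgment call; the sketch only shows that they are interchangeable in result.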

How do you pronounce xid?

I've been saying "zid", but sometimes "x id". Would be good to know how you say "xid" out loud so I can copy it.

How do I use/implement it?

Do I use it with a Google Cloud Function?

Or do I use a $5 Digital Ocean Droplet with a node/go app where I use it?

Is it 100% safe uniquely, or only 99.9999999999999999999999999999999999%?

If I need many unique IDs per second, I can just use ten $5 Digital Ocean droplets with the same node app, and they all gonna still spit out 100% unique IDs? Or also only 99.99999999999999999999%?

sorry for my dumbness, I am a beginner

Kind regards

XID as a 12 byte (bytea) unique indexed Postgres column

In my Postgres 9.6+ tables, I want to use an xid as a unique indexed column and also sometimes as primary key or foreign key column too. Is it more performant to store it as a 20 character string or 12 byte binary? I initially thought binary would be more efficient, but this article suggests otherwise.

Does anyone have any experience using a bytea as a primary key? If so, is it more efficient to use the default bytea hex format or the older bytea escape format?

[bug] xid.FromString returns a different string.

xid.FromString("c6e52g2mrqcjl44hf179")
Expected "c6e52g2mrqcjl44hf179", but currently get "c6e52g2mrqcjl44hf170".
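
A likely explanation is that 20 base32 characters carry 100 bits while the ID has only 96, so the last character holds one meaningful bit and four bits that are dropped on decode. The sketch below reproduces the reported difference with the standard library's base32 codec instead of xid itself, assuming the alphabet is 0-9a-v without padding.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// enc is an assumed stand-in for the ID alphabet: lowercase base32hex, no padding.
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// roundTrip decodes a 20-character string to 12 bytes and encodes it again.
func roundTrip(s string) (string, error) {
	raw, err := enc.DecodeString(s)
	if err != nil {
		return "", err
	}
	return enc.EncodeToString(raw), nil
}

func main() {
	out, err := roundTrip("c6e52g2mrqcjl44hf179")
	fmt.Println(out, err)
}
```

Under this reading the input ending in 9 was never a canonical encoding: only the top bit of the final character survives the decode, so the string comes back with a different last character.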

Rust port

Hi there,

In case it helps someone I wrote a Rust port of xid.

repo: https://github.com/jeromer/libxid
crates.io URL: https://crates.io/crates/libxid

Any feedback welcome.

Bug: hostid_windows.go: "Undefined: windows"

In hostid_windows.go you forgot to import the "windows" package :)

How does it compare to ULID?

Spec for ulid: https://github.com/ulid/spec

Remove panics

As this is a library used by many other applications, perhaps a "panic" is not the correct way to handle errors, since it causes the user's app to die when that app should be able to choose what to do. The "New()" function should then return (ID, error).

Breaking Change

Release Tagging

The releases section hasn't been updated since 2018. It might be beneficial to mark the tags as releases so the project page accurately reflects the latest info.

I came to this after running go get -u ./..., seeing there was an update, and wanting to find what changed from 1.4.0 to 1.5.0.

lookup performance

Hi,
is xid suitable for fast lookups?

http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html

Cheers,
Jens

'/proc/self/cpuset' not found inside the container

I noticed that xid XORs the PID with the CRC of '/proc/self/cpuset' to guard against the case where the process is running inside a container. Unfortunately, '/proc/self/cpuset' could not be found inside my containers.

Would '/proc/self/cgroup' be a better choice?

Thanks
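
A sketch of the idea with a fallback, for discussion only. It is not the library's code: the helper names are made up, and whether '/proc/self/cgroup' differs between containers in a given setup is an assumption that would need checking.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"os"
)

// mixPID folds a container-specific byte string into the process ID so that
// processes sharing a PID in different containers can still get different values.
func mixPID(pid int, containerInfo []byte) uint16 {
	if len(containerInfo) > 0 {
		pid ^= int(crc32.ChecksumIEEE(containerInfo))
	}
	return uint16(pid)
}

// readFirst returns the contents of the first readable, non-empty file.
func readFirst(paths ...string) []byte {
	for _, p := range paths {
		if b, err := os.ReadFile(p); err == nil && len(b) > 0 {
			return b
		}
	}
	return nil
}

func main() {
	// Try cpuset first, then fall back to cgroup as suggested above.
	info := readFirst("/proc/self/cpuset", "/proc/self/cgroup")
	fmt.Println(mixPID(os.Getpid(), info))
}
```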

Cryptographically secure ?

Hi, Thanks for the package. I have been using this in a project of mine and it is very helpful. This is more a usage question than an Issue.

Are the ids generated by this package cryptographically secure ? There are quite a few sources that you use (machine id, process id, counter etc.) but the documentation does not say anything about if it is cryptographically secure (unpredictable) to use this package when there is a necessity. It will be good to mention the answer to this in the README. Thanks once again.

What happens when the 4 byte time value overflows?

4-byte value representing the seconds since the Unix epoch,

It should be in 2038 if I'm not mistaken. It's sooner than you think. 20 years of my life went by like a finger snap.

Shouldn't a 64bit (8byte) value be used instead?
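
The 2038 limit applies to a signed 32-bit seconds counter. If the four bytes are read as an unsigned integer (an assumption worth confirming against the source), the range extends to 2106. A quick check with the standard library:

```go
package main

import (
	"fmt"
	"time"
)

// lastSecond returns the final instant representable by a seconds counter
// whose largest value is max.
func lastSecond(max int64) time.Time {
	return time.Unix(max, 0).UTC()
}

func main() {
	fmt.Println("signed 32-bit:  ", lastSecond(2147483647)) // math.MaxInt32
	fmt.Println("unsigned 32-bit:", lastSecond(4294967295)) // math.MaxUint32
}
```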

Out of Sequence ID

An out-of-sequence phenomenon can be seen by running this code (or there is a mistake in my approach). This code simply compares two ids generated consecutively:

package main

import (
	"bytes"
	"fmt"

	"github.com/rs/xid"
)

func main() {
	var (
		p, n []byte
	)

	var cnt int64
	for p, n = nil, []byte("0"); bytes.Compare(p, n) < 0; p, n = n, next() {
		cnt++
	}

	idp := conv(p)
	idn := conv(n)
	fmt.Printf("%v %s %s\n", cnt, idp.String(), idn.String())
}

func next() []byte {
	id := xid.New()
	return id[:]
}

func conv(b []byte) xid.ID {
	var id xid.ID
	copy(id[:], b)
	return id
}

Sample outputs:

6135364 bah53vtgl2r1p4vvvvvg bah53vtgl2r1p4o00000
2586623 bah540dgl2r1q1fvvvvg bah540dgl2r1q1800000
3508359 bah55elgl2r1sdfvvvvg bah55elgl2r1sd800000
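
In each sample pair the prefix is identical and only the suffix changes, from all high characters to all zeros. That pattern is consistent with a 3-byte counter wrapping around within the same second. A minimal sketch of the effect, assuming a 24-bit big-endian counter field (helper names are made up):

```go
package main

import (
	"bytes"
	"fmt"
)

// counterBytes returns the low 24 bits of c as 3 big-endian bytes, the way a
// 3-byte counter field would store it.
func counterBytes(c uint32) []byte {
	return []byte{byte(c >> 16), byte(c >> 8), byte(c)}
}

// wrapsBackwards reports whether the value after c sorts before c once both
// are truncated to 3 bytes.
func wrapsBackwards(c uint32) bool {
	return bytes.Compare(counterBytes(c), counterBytes(c+1)) > 0
}

func main() {
	fmt.Printf("% x -> % x\n", counterBytes(0xFFFFFF), counterBytes(0xFFFFFF+1))
	fmt.Println(wrapsBackwards(0xFFFFFF), wrapsBackwards(5))
}
```

If this is what happens, ordering holds across seconds but not necessarily between two IDs generated inside one second.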

Is the xid still following the sequence and unique when transferring the program to another server?

Is the xid still following the sequence and unique when transferring the program to another server?
If not, what kind of configuration do I need to do to make it work?
Thanks.

Current Time

Using time.Now() to get the current time is problematic when this package runs on different servers with different timezones. It seems logical to use time.Now().UTC() as a sane default, or to add a package-level (or type-level) default function for getting the current time that one could change if needed:

var (
	TimeFunc = func() time.Time {
		return time.Now().UTC()
	}
)
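
One point worth checking before changing the default: a Unix timestamp counts seconds since the epoch and does not depend on the time zone attached to a time.Time value. The sketch below demonstrates this with fixed-offset zones; if the ID only stores Unix seconds (an assumption), converting to UTC first would not change the stored value.

```go
package main

import (
	"fmt"
	"time"
)

// sameUnixAcrossZones reports whether the instant sec seconds after the epoch
// yields the same Unix timestamp when viewed in several time zones.
func sameUnixAcrossZones(sec int64) bool {
	base := time.Unix(sec, 0)
	zones := []*time.Location{
		time.UTC,
		time.FixedZone("UTC+9", 9*60*60),
		time.FixedZone("UTC-5", -5*60*60),
	}
	for _, z := range zones {
		if base.In(z).Unix() != sec {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameUnixAcrossZones(time.Now().Unix()))
}
```

A replaceable time function could still be useful for tests or for servers with badly skewed clocks.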

Is xid thread-safe?

Can we safely use the library within goroutines without worrying about locking the call?
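
One common way to make a generator safe for concurrent use without a mutex is an atomic counter. The sketch below shows that pattern in isolation; whether xid works exactly this way should be confirmed against its source, and uniqueCount is a made-up helper.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// uniqueCount starts `workers` goroutines that each draw `perWorker` values
// from one shared atomic counter and returns how many distinct values were seen.
func uniqueCount(workers, perWorker int) int {
	var counter uint32
	var wg sync.WaitGroup
	results := make([][]uint32, workers)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			vals := make([]uint32, perWorker)
			for i := range vals {
				vals[i] = atomic.AddUint32(&counter, 1)
			}
			results[w] = vals // each goroutine writes only its own slot
		}(w)
	}
	wg.Wait()

	seen := make(map[uint32]struct{})
	for _, vals := range results {
		for _, v := range vals {
			seen[v] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	fmt.Println(uniqueCount(8, 1000))
}
```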

Tag a release

Would be nice if you are able to tag a release @rs, including the latest fixes, even if there are not too many of them.

Pro-tip: The auto-generated release notes on GitHub are not too bad; it will generate a link to each PR that has been merged since the last release. I notice that there is no GitHub release for v1.3.0, just a v1.3.0 git tag. I think that still works as far as go.mod is concerned, but it would be possible to create a release for that tag first, if you want better auto-generated release notes.

Add an id pool layer

userid_pool := xid.NewPool()
orderid_pool := xid.NewPool()

userid := userid_pool.New()
orderid := orderid_pool.New()

Each pool keeps its own idCounter.

Ids in the same pool are unique.

But across different pools, ids do not need to be unique.

Is there any implementation in another language, such as Java?

I would like to generate xids in Go but decode them somewhere else in Java/Python for log processing purposes, but I cannot find any implementation in other languages.

Can a new release be released?

The latest release is v1.2.1, in 2018.

github.com/rs/xid v1.2.1

Great library, look forward to it.

Command-line tool

Dear @rs,
I'm not sure how to appreciate the beauty of your solution without being able to run it on Windows as a binary that produces the result. E.g. check ULID and CUIDv2

$ ulid.exe
01GPR0A4J919E253QDAGVR8MK7

$ cuidgen.exe
gib227a07c6a1njttd9jn982

Cryptographically secure comparing to ULID

There is a note in the README:

Xid is dependent on the system time, a monotonic counter and so is not cryptographically secure. If unpredictability of IDs is important, you should not use Xids. It is worth noting that most other UUID-like implementations are also not cryptographically secure. You should use libraries that rely on cryptographically secure sources (like /dev/urandom on unix, crypto/rand in golang), if you want a truly random ID generator.

Can XID be used with those random generators and how?

On the other hand, for ULID that should be possible: https://github.com/ulid/javascript#pseudo-random-number-generators, and here is an example of how: https://github.com/prometheus/prometheus/pull/6867/files

Is there a similar example for XID (if this is at all possible)?
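
If unpredictability matters more than ordering, one option following the README note is to skip the time/machine/counter layout entirely and draw every byte from crypto/rand. This is a sketch of that alternative, not a way of plugging a random source into xid; randomID and the 16-byte size are choices made for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/base32"
	"fmt"
)

// enc uses the same lowercase base32hex alphabet, without padding.
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// randomID returns n cryptographically random bytes encoded as text.
func randomID(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return enc.EncodeToString(buf), nil
}

func main() {
	id, err := randomID(16)
	if err != nil {
		fmt.Println("random source failed:", err)
		return
	}
	fmt.Println(id)
}
```

Such an ID is not sortable by creation time, which is the trade-off against the xid layout.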

Xid needs a Kotlin Multiplatform port

Hi guys,

I know this doesn't exactly fit in this repo, but it would be great to have a Kotlin port of XID. Especially since Kotlin is Multiplatform now, it would be great to generate the same ids on server and client.

Best,
Sebastian

upgrade go version to 1.13 or later?

There are so many errors with golint.
Although the error is reported, the package can be used normally.

invalid operation: signed shift count 16 (untyped int constant) requires go1.13 or later

big bug: not sortable

var list = make([]string, 0)
var busy = make(chan bool, 1)

func AddXidToList(i string) {
	busy <- true
	id := xid.New()
	list = append(list, id.String()+" "+i)
	<-busy
}

func Test_test6(t *testing.T) {
	wg := sync.WaitGroup{}
	wg.Add(2)
	go func() {
		for i := 0; i < 1000000; i++ {
			AddXidToList("1")
		}
		wg.Done()
	}()
	go func() {
		for i := 0; i < 1000000; i++ {
			AddXidToList("2")
		}
		wg.Done()
	}()
	wg.Wait()
	for i := 0; i < len(list); i++ {
		if i+1 < len(list) && list[i] > list[i+1] {
			//1820049 cdiebmnlt656j2nvvvvg 2 cdiebmnlt656j2g00000 1
			fmt.Println(i, list[i], list[i+1])
			t.Error("big bug")
		}
	}
	if list[len(list)-2] > list[len(list)-1] {
		t.Error("big bug2")
		fmt.Println(list[len(list)-2], list[len(list)-1])
	}
	fmt.Println("END")
}

func Test_test7(t *testing.T) {
	i := 0
	for {
		i++
		Test_test6(t)
		if i == 10 {
			break
		}
	}
}

Running the method Test_test7 prints the following:
1969577 cdj6vqvlt652mmnvvvvg 1 cdj6vqvlt652mmg00000 1
id_test.go:177: big bug

README: Update the example

The example in the README shows output looking like base64 (mixing lower and upper case letters), not base32.

Is it possible to generate XID from SQL?

Hi,
To set a DEFAULT value for the PK of a table, I wonder if anyone has tried to generate a valid xid value using SQL.

For example, for UUIDv4 we can generate a default value when the uuid-ossp extension is not installed for PostgreSQL as:

CREATE TABLE IF NOT EXISTS table (
   id VARCHAR(36) DEFAULT md5(random()::text || clock_timestamp()::text)::uuid  NOT NULL,
....

Of course it would be better to generate it from the application, but it makes it simpler to add a default value for manual interactions with tables.

Has anyone tried to develop an SQL function to generate xid?

Potential curse/profane/offensive words generated using xid (in a user-facing setting)

My question is around xid's that may be facing users of a software (i.e. not just stored as keys in a database).

As it is a 20-character string and its alphabet includes most letters, there is a decent probability that an accidental f*ck shows up or that even whole offensive short sentences are formed. Has anyone considered this aspect before and, if so, found reasonable solutions that would be practical?

I could think of:

re-encoding the bytes with a reduced alphabet
switching some of aeiou for wxyz
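
The second idea can be sketched as a reversible display transform, assuming the ID alphabet is 0-9a-v so that w, x, y and z never occur in a real ID. The function names and the choice of which four vowels to swap are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// toSafe swaps four vowels for letters assumed to be unused by the ID alphabet.
	toSafe = strings.NewReplacer("a", "w", "e", "x", "i", "y", "o", "z")
	// fromSafe reverses the substitution.
	fromSafe = strings.NewReplacer("w", "a", "x", "e", "y", "i", "z", "o")
)

// obscure returns a user-facing form of id without the letters a, e, i, o.
func obscure(id string) string { return toSafe.Replace(id) }

// restore converts the user-facing form back to the original id.
func restore(s string) string { return fromSafe.Replace(s) }

func main() {
	id := "9m4e2mr0ui3e8a215n4g" // sample ID for illustration
	shown := obscure(id)
	fmt.Println(shown, restore(shown) == id)
}
```

This leaves u in place, digits can still be read as letters, and the transformed strings no longer sort in the same order as the originals, so it is probably only suitable for display.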

pid in container managed by systemd

Hi guys, my server is managed by systemd and runs in a Docker container, so it probably has the same pid (and it's not 1) in different containers. This will cause xid conflicts when multiple instances are deployed on one host.
Any idea to solve this case?

Use raw []byte value in sql interfaces for smaller indices

As mentioned in #14, an XID may be stored as a bytea in Postgres, so that it takes up 16 bytes rather than 24.

While read/write performance is impacted slightly, as far as I understand, the benefit of BYTEA over TEXT is its smaller size, which in turn means smaller (and thus in theory faster) indices, for large tables in particular. In addition, there are fewer "special rules" (e.g. unicode / local encoding rules) for comparison, which again, in theory, should make it faster to query as well.

https://www.db-fiddle.com/f/jgYzsKTFGu3NU9ZjDjfRUw/0

The link above shows a simple table with an ID as either bytea or string. Given 50.000 entries, the index size is reduced from 2496 kB to 2048 kB by using bytea.

I don't know at which table sizes this becomes significant, or if it really matters. A proper benchmark with a few million rows and a few queries is probably wise before making any changes.

How to define bytea IDs

Given that bytea is used as an ID in the schema, test code to encode/decode XIDs from binary is provided here:

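import (
	"database/sql/driver"
	"fmt"

	"github.com/rs/xid"
)

// XID wraps xid.ID so that it is written to and read from SQL as raw bytes.
type XID struct {
	xid.ID
}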

// NewXID generates a new XID instance.
func NewXID() XID {
	return XID{ID: xid.New()}
}

// Value implements the driver.Valuer interface.
func (id XID) Value() (driver.Value, error) {
	if id.IsNil() {
		return nil, nil
	}
	return id.Bytes(), nil
}

// Scan implements the sql.Scanner interface.
func (id *XID) Scan(value interface{}) error {
	switch b := value.(type) {
	case []byte:
		_id, err := xid.FromBytes(b)
		if err != nil {
			return err
		}
		id.ID = _id
		return nil
	case nil:
		id.ID = xid.ID{}
		return nil
	default:
		return fmt.Errorf("xid: scanning unsupported type: %T", value)
	}
}