rs / xid
xid is a globally unique id generator thought for the web
License: MIT License
Hi! Could you please explain what exactly K-ordered and sortable mean? I'm not sure I understood these properties compared to other approaches.
Does that mean that an xid generated at a given moment, e.g. the start of today, will necessarily sort before any xid generated afterwards (behaving like a timestamp)?
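A minimal sketch of the sorting property asked about above, using only the standard library rather than xid itself. It assumes the layout mentioned elsewhere on this page (12 raw bytes starting with 4 bytes of big-endian Unix seconds) and a lowercase base32 "hex" alphabet without padding; makeID is a hypothetical helper, not part of the library. Because the timestamp comes first and the alphabet is in ascending character order, an ID from a later second compares greater as a string. IDs generated within the same second are ordered by the remaining bytes, so ordering is only approximate at that granularity.

```go
package main

import (
	"encoding/base32"
	"encoding/binary"
	"fmt"
)

// enc is a stand-in for the ID text encoding (assumption of this sketch):
// base32 with a lowercase "hex" alphabet and no padding.
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// makeID is a hypothetical helper: it builds a 12-byte ID whose first
// 4 bytes are big-endian Unix seconds and whose last 8 bytes come from tail.
func makeID(secs uint32, tail [8]byte) string {
	var raw [12]byte
	binary.BigEndian.PutUint32(raw[:4], secs)
	copy(raw[4:], tail[:])
	return enc.EncodeToString(raw[:])
}

func main() {
	// The earlier ID has the largest possible tail, the later one the smallest.
	early := makeID(1700000000, [8]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff})
	late := makeID(1700000001, [8]byte{})
	// The timestamp prefix decides the comparison before the tail is reached.
	fmt.Println(early, late, early < late)
}
```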
Thank you! :)
Thank you for a nice implementation of the id generator.
I just wonder if it's possible to add a semantic version tag, something like 1.0.0, so that I can specify it in my glide.yaml file.
I use gorm, and the id type is xid.ID. It generates a varbinary(255) column in MySQL. What is the correct size I should set for an xid?
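As a rough sizing aid for the question above: other posts on this page describe an xid as 12 raw bytes or a 20-character string. The sketch below only checks that arithmetic with the standard library; the field split in the comment (4 + 3 + 2 + 3) is an assumption based on the components named elsewhere on this page, and rawLen and encodedLen are names made up for the example. Under those assumptions a 12-byte binary column or a 20-character text column would be enough, but that should be confirmed against the library documentation.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// rawLen is the assumed binary size of an ID:
// 4-byte time + 3-byte machine + 2-byte pid + 3-byte counter.
const rawLen = 12

// encodedLen returns how many base32 characters are needed for rawLen
// bytes when no padding characters are emitted.
func encodedLen() int {
	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodedLen(rawLen)
}

func main() {
	fmt.Println("binary bytes:", rawLen, "text characters:", encodedLen())
}
```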
The readPlatformMachineID function in hostid_darwin.go performs a syscall to obtain kern.uuid, and it uses the result as the machine ID. This value isn't actually meant to be a unique ID for a machine, but rather a unique ID for the currently running kernel version.
To verify this, you can run sysctl kern.uuid and then search for the value on Google. As long as your current version of macOS has been released for a while, you'll likely find other people with the same ID.
(For concrete examples of this, see shirou/gopsutil#1058.)
As far as I can see, syscall has everything that is used from x/sys, so switching to it would remove one external dependency that is 6.2 MB :)
Thank you for your work! I've been using it as an inspiration for ID generation for a particular project.
I'm most likely wrong here, but I'm curious to verify my understanding. Feel free to ignore the question, and just close the ticket!
I cannot see how readMachineID is stable.
func readMachineID() []byte {
id := make([]byte, 3)
hid, err := readPlatformMachineID()
if err != nil || len(hid) == 0 {
hid, err = os.Hostname()
}
if err == nil && len(hid) != 0 {
hw := md5.New()
hw.Write([]byte(hid))
copy(id, hw.Sum(nil))
} else {
// Fallback to rand number if machine id can't be gathered
if _, randErr := rand.Reader.Read(id); randErr != nil {
panic(fmt.Errorf("xid: cannot get hostname nor generate a random number: %v; %v", err, randErr))
}
}
return id
}
Assuming a Linux platform, readPlatformMachineID would return the machine ID which, according to man machine-id, is:
The machine ID is a single newline-terminated, hexadecimal, 32-character, lowercase ID
If we successfully read it and the length is correct, we then MD5-hash it and copy only the first 3 bytes of the 16-byte digest.
Wouldn't this be prone to collisions, since we use only the first 3 bytes of the MD5 hash?
I wrote a tiny test following the same approach, but randomly generating the string to be fed into the hash:
package unique
import (
"crypto/md5"
"math/rand"
"testing"

"github.com/stretchr/testify/require" // needed by TestCollision below
)
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
func randStr(r *rand.Rand) string {
b := make([]byte, 32)
for i := range b {
b[i] = chars[r.Intn(len(chars))]
}
return string(b)
}
func genID(hid string) []byte {
id := make([]byte, 3)
hw := md5.New()
hw.Write([]byte(hid))
copy(id, hw.Sum(nil))
return id
}
func TestUnique(t *testing.T) {
r := rand.New(rand.NewSource(999))
rounds := 100000
mStr := map[string]interface{}{}
mID := map[string]string{}
for i := 0; i < rounds; i++ {
src := randStr(r)
_, ok := mStr[src]
if ok { // skip when we've already randomly generated the same string as source
continue
}
mStr[src] = struct{}{}
genID := string(genID(src))
storedSrc, ok := mID[genID]
if ok {
t.Fatalf("collision? round: %d, storedSrc: '%s', src: '%s', id: '%v'",
i, storedSrc, src, genID)
}
mID[genID] = src
}
}
func TestCollision(t *testing.T) {
src1 := "pplbyfSYmSkuUQbjJvcOWsUuSwoPYOTk"
src2 := "LRnRfzVvPAWbEhDNOegktwBvpaCnutyH"
require.NotEqual(t, genID(src1), genID(src2))
}
➜ unique go test -v ./...
=== RUN TestUnique
unique_test.go:48: collision? round: 12209, storedSrc: 'pplbyfSYmSkuUQbjJvcOWsUuSwoPYOTk', src: 'LRnRfzVvPAWbEhDNOegktwBvpaCnutyH', id: '�(m'
--- FAIL: TestUnique (0.01s)
=== RUN TestCollision
unique_test.go:58:
Error Trace: unique_test.go:58
Error: Should not be: []byte{0xa3, 0x28, 0x6d}
Test: TestCollision
--- FAIL: TestCollision (0.00s)
FAIL
FAIL github.com/sata/unique 0.013s
FAIL
My take on it is that after 12209 attempts we ended up with two identical machine IDs even though their sources are different.
Is the xid ID considered to be stable because we take epoch time + machine identifier + local process ID + a counter starting at a random value? I.e. is the likelihood of a collision between xid IDs so low because of the other factors?
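The collision seen in the test above is about what the birthday bound predicts for a 3-byte value. The sketch below uses the standard approximation p ≈ 1 − exp(−n(n−1)/(2N)) with N = 2^24; collisionProb is a name made up for this example, and it assumes the truncated hash behaves like a uniformly random 3-byte value. It only covers the machine-ID field in isolation; two full IDs collide only if the timestamp, process ID and counter fields also match.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the probability that at least two of n
// uniformly random values drawn from `buckets` possibilities are equal
// (birthday-bound approximation).
func collisionProb(n, buckets float64) float64 {
	return 1 - math.Exp(-n*(n-1)/(2*buckets))
}

func main() {
	const buckets = 1 << 24 // number of distinct 3-byte values
	for _, n := range []float64{10, 100, 1000, 12209} {
		fmt.Printf("%6.0f machine ids: p(collision) ~ %.6f\n", n, collisionProb(n, buckets))
	}
}
```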
Let me start by saying I'm new to golang :)
I can't seem to get the embedded info out of the UUID. Could you shed some light on what might be the issue?
./test.go:15: undefined: xid.Time
package main
import (
"log"
"fmt"
"github.com/rs/xid"
)
func getUuid() string {
guid := xid.New()
return guid.String()
}
func printUuidData(s string) {
ID, err := xid.FromString(s)
if err != nil {
log.Fatal(err)
}
fmt.Println(ID)
return
}
func printUuidTime(s string) {
t := xid.Time(s)
fmt.Println(t)
return
}
func main() {
testUuid := getUuid()
printUuidData("Print UUID: " + testUuid)
printUuidTime(testUuid)
}
Actually, xid.ID{} is good enough, closing this
Suppose
How do we prevent this from happening?
You use time when generating the UUID.
So if the network partitions, the clocks will skew.
https://en.wikipedia.org/wiki/Clock_skew
I remember once reading the Google paper on this for Spanner.
The buggers have atomic clocks in each data center, and they account for clock skew and also bandwidth latency during clock synchronisation.
Time and Space is the biggest computing problem :)
So it's a heuristic approach. I wonder if there is benchmarking code where someone has run tests with many servers and clients and controlled the partitioning and latency to find out how useful the Google approach is. If you can get ordered keys and use OT (operational transforms), all your chickens will come home to roost...
I'm always running my golang executables within containers in docker.
The PID is then always 1, which then defeats the purpose of having this field.
Would there be any possible replacement?
The least would be to specify this caveat in the README I think.
You can replace your misnamed RandomInt32 (which should be RandomUint32 because it returns a uint32) with a single return rand.Uint32().
Most of the bitwise mess, which is unintelligible to non-C developers and often makes them cry when they look at it, could be replaced with binary.(Big|Little)Endian.PutUint(16|32).
I also found a way to shrink the timestamp data to 2 bytes while keeping the most important aspects of it.
If there is any interest, I can make pull requests with some of these changes to make the code more readable to most Go programmers and easier to modify and update. Right now I imagine most developers who don't know bitwise operations very well look at this codebase and cry, and the bitwise code is not making it more efficient.
There are tons of efficiency issues in the code; it uses way more memory than it needs to.
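To make the encoding/binary suggestion above concrete, here is a small sketch comparing hand-written shifts with binary.BigEndian.PutUint32. The function names are made up for the example and are not taken from the xid source. One caveat: encoding/binary has no 24-bit helper, so 3-byte fields would still need shifts or a temporary 4-byte buffer.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// putShift writes v into b as 4 big-endian bytes using explicit shifts.
func putShift(b []byte, v uint32) {
	b[0] = byte(v >> 24)
	b[1] = byte(v >> 16)
	b[2] = byte(v >> 8)
	b[3] = byte(v)
}

// putBinary does the same through the standard library helper.
func putBinary(b []byte, v uint32) {
	binary.BigEndian.PutUint32(b, v)
}

// sameBytes reports whether both approaches produce identical output for v.
func sameBytes(v uint32) bool {
	a, b := make([]byte, 4), make([]byte, 4)
	putShift(a, v)
	putBinary(b, v)
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(sameBytes(0xdeadbeef))
}
```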
I've been saying "zid", but sometimes "x id". Would be good to know how you say "xid" out loud so I can copy it.
Do I use it with a Google Cloud Function?
Or do I use a $5 Digital Ocean Droplet with a node/go app where I use it?
Is it 100% guaranteed to be unique, or only 99.9999999999999999999999999999999999%?
If I need many unique IDs per second, can I just use ten $5 Digital Ocean droplets with the same node app, and will they all still spit out 100% unique IDs? Or also only 99.99999999999999999999%?
Sorry for my dumbness, I am a beginner.
Kind regards
In my Postgres 9.6+ tables, I want to use an xid as a unique indexed column and also sometimes as primary key or foreign key column too. Is it more performant to store it as a 20 character string or 12 byte binary? I initially thought binary would be more efficient, but this article suggests otherwise.
Does anyone have any experience using a bytea as a primary key? If so, is it more efficient to use the default bytea hex format or the older bytea escape format?
Parsing with xid.FromString("c6e52g2mrqcjl44hf179") and converting the result back to a string
is expected to give "c6e52g2mrqcjl44hf179", but currently gives "c6e52g2mrqcjl44hf170"
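A possible explanation for the report above, sketched with the standard library rather than xid's own decoder: 12 bytes are 96 bits, while 20 base32 characters carry 100 bits, so the last character holds only one meaningful bit and its four low bits are dropped on decoding. The encoding and the roundTrip helper below are assumptions made for illustration. With Go's encoding/base32, decoding the reported string and encoding the bytes again should change a trailing 9 into 0, which matches the result described above.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// enc is a stand-in for the ID text encoding (assumption of this sketch).
var enc = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv").WithPadding(base32.NoPadding)

// roundTrip decodes s to raw bytes and encodes those bytes again.
// Bits of the final character that do not fit in 12 bytes are lost.
func roundTrip(s string) (string, error) {
	raw, err := enc.DecodeString(s)
	if err != nil {
		return "", err
	}
	return enc.EncodeToString(raw), nil
}

func main() {
	in := "c6e52g2mrqcjl44hf179"
	out, err := roundTrip(in)
	fmt.Println(in, "->", out, err)
}
```

A stricter parser could reject inputs whose unused trailing bits are not zero, so that every accepted string has exactly one canonical form.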
Hi there,
In case it helps someone I wrote a Rust port of xid.
crates.io URL: https://crates.io/crates/libxid
Any feedback welcome.
:)
In hostid_windows.go you forgot to import the "windows" package :)
Spec for ulid: https://github.com/ulid/spec
For a library that is used by many other applications, perhaps a "panic" is not the correct way to handle errors, as it causes the user's app to die when that app should be able to choose what to do. The "New()" function should then return (ID, error).
Breaking Change
Hi,
is xid suitable for fast lookups?
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
Cheers,
Jens
I noticed that xid will XOR the PID with the CRC of '/proc/self/cpuset' to guard against the situation where the process is running inside a container. Unfortunately, '/proc/self/cpuset' could not be found inside my containers.
Would '/proc/self/cgroup' be a better choice?
Thanks
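A minimal sketch of the idea discussed above, with a fallback to the second file. It is not the library's code: mixPID is a hypothetical helper, and the order in which the two paths are tried is an assumption made for the example. The PID is XORed with the CRC-32 of the first candidate that has any content, so two containers that both report the same PID but have different cgroup or cpuset data end up with different values.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"os"
)

// mixPID is a hypothetical helper: it XORs pid with the CRC-32 of the first
// candidate that holds more than one byte, and returns pid unchanged when
// none does.
func mixPID(pid int, contents ...[]byte) int {
	for _, b := range contents {
		if len(b) > 1 {
			return pid ^ int(crc32.ChecksumIEEE(b))
		}
	}
	return pid
}

func main() {
	var contents [][]byte
	for _, path := range []string{"/proc/self/cpuset", "/proc/self/cgroup"} {
		b, _ := os.ReadFile(path) // an unreadable file simply contributes no data
		contents = append(contents, b)
	}
	fmt.Println(mixPID(os.Getpid(), contents...))
}
```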
Hi, Thanks for the package. I have been using this in a project of mine and it is very helpful. This is more a usage question than an Issue.
Are the IDs generated by this package cryptographically secure? You use quite a few sources (machine ID, process ID, counter, etc.), but the documentation does not say whether the IDs are cryptographically secure (unpredictable) when that is a requirement. It would be good to mention the answer to this in the README. Thanks once again.
4-byte value representing the seconds since the Unix epoch,
This should overflow in 2038 if I'm not mistaken. That's sooner than you think. 20 years of my life went by like a snap of the fingers.
Shouldn't a 64-bit (8-byte) value be used instead?
An out-of-sequence phenomenon can be seen by running this code (or there is a mistake in my approach). The code simply compares two IDs generated consecutively:
package main
import (
"bytes"
"fmt"
"github.com/rs/xid"
)
func main() {
var (
p, n []byte
)
var cnt int64
for p, n = nil, []byte("0"); bytes.Compare(p, n) < 0; p, n = n, next() {
cnt++
}
idp := conv(p)
idn := conv(n)
fmt.Printf("%v %s %s\n", cnt, idp.String(), idn.String())
}
func next() []byte {
id := xid.New()
return id[:]
}
func conv(b []byte) xid.ID {
var id xid.ID
copy(id[:], b)
return id
}
Sample outputs:
6135364 bah53vtgl2r1p4vvvvvg bah53vtgl2r1p4o00000
2586623 bah540dgl2r1q1fvvvvg bah540dgl2r1q1800000
3508359 bah55elgl2r1sdfvvvvg bah55elgl2r1sd800000
Is the xid still following the sequence and unique when transferring the program to another server?
If not, what kind of configuration do I need to do to make it work?
Thanks.
Using time.Now() to get the current time is problematic when this package runs on different servers with different timezones. It seems logical to use time.Now().UTC() as a sane default, or to add a package-level (or type-level) default function for getting the current time that one could change if needed:
var (
TimeFunc = func() time.Time {
return time.Now().UTC()
}
)
Can we safely use the library within goroutines without worrying about locking the call?
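Whether this holds for xid depends on how its shared state is updated, which should be confirmed in the library source. As an illustration of the general technique only, the sketch below assumes the sole shared mutable state is a counter advanced with atomic.AddUint32 and checks that concurrent goroutines never receive the same value; uniqueCount is a helper made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// uniqueCount starts `workers` goroutines that each draw `perWorker` values
// from one shared atomic counter, and returns how many distinct values were
// handed out in total.
func uniqueCount(workers, perWorker int) int {
	var counter uint32
	var mu sync.Mutex
	seen := make(map[uint32]struct{}, workers*perWorker)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := make([]uint32, 0, perWorker)
			for i := 0; i < perWorker; i++ {
				// No lock is needed around the counter itself.
				local = append(local, atomic.AddUint32(&counter, 1))
			}
			// The mutex protects only the bookkeeping map of this test.
			mu.Lock()
			for _, v := range local {
				seen[v] = struct{}{}
			}
			mu.Unlock()
		}()
	}
	wg.Wait()
	return len(seen)
}

func main() {
	fmt.Println(uniqueCount(8, 10000))
}
```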
It would be nice if you could tag a release, @rs, including the latest fixes, even if there are not too many of them.
Pro-tip: The auto-generated release notes on GitHub are not too bad; they will generate a link to each PR that has been merged since the last release. I notice that there is no GitHub release for v1.3.0, just a v1.3.0 git tag. I think that still works as far as go.mod is concerned, but it would be possible to create a release for that tag first, if you want better auto-generated release notes.
userid_pool := xid.NewPool()
orderid_pool := xid.NewPool()
userid := userid_pool.New()
orderid := orderid_pool.New()
Keep a separate idCounter in each pool.
IDs in the same pool are unique.
But IDs from different pools do not need to be unique with respect to each other.
I would like to generate xids in Go but decode them somewhere else in Java/Python for log processing purposes, but I cannot find any implementation in other languages.
The latest release is v1.2.1, in 2018.
github.com/rs/xid v1.2.1
Great library, look forward to it.
There is a note in the README:
Xid is dependent on the system time, a monotonic counter and so is not cryptographically secure. If unpredictability of IDs is important, you should not use Xids. It is worth noting that most other UUID-like implementations are also not cryptographically secure. You should use libraries that rely on cryptographically secure sources (like /dev/urandom on unix, crypto/rand in golang), if you want a truly random ID generator.
Can XID be used with those random generators and how?
On the other hand, for ULID that should be possible: https://github.com/ulid/javascript#pseudo-random-number-generators, and here is an example of how: https://github.com/prometheus/prometheus/pull/6867/files
Is there a similar example for XID (if this is at all possible)?
Hi guys,
I know this doesn't exactly fit in this repo, but it would be great to have a Kotlin port of XID. Especially since Kotlin is Multiplatform now, it would be great to generate the same IDs on server and client.
Best,
Sebastian
var list = make([]string, 0)
var busy = make(chan bool, 1)
func AddXidToList(i string) {
busy <- true
id := xid.New()
list = append(list, id.String()+" "+i)
<-busy
}
func Test_test6(t *testing.T) {
wg := sync.WaitGroup{}
wg.Add(2)
go func() {
for i := 0; i < 1000000; i++ {
AddXidToList("1")
}
wg.Done()
}()
go func() {
for i := 0; i < 1000000; i++ {
AddXidToList("2")
}
wg.Done()
}()
wg.Wait()
for i := 0; i < len(list); i++ {
if i+1 < len(list) && list[i] > list[i+1] {
//1820049 cdiebmnlt656j2nvvvvg 2 cdiebmnlt656j2g00000 1
fmt.Println(i, list[i], list[i+1])
t.Error("big bug")
}
}
if list[len(list)-2] > list[len(list)-1] {
t.Error("big bug2")
fmt.Println(list[len(list)-2], list[len(list)-1])
}
fmt.Println("END")
}
func Test_test7(t *testing.T) {
i := 0
for {
i++
Test_test6(t)
if i == 10 {
break
}
}
}
Running the Test_test7 method prints the following:
1969577 cdj6vqvlt652mmnvvvvg 1 cdj6vqvlt652mmg00000 1
id_test.go:177: big bug
The example in the README shows output looking like base64 (mixing lower and upper case letters), not base32.
Hi,
To set a DEFAULT value for the PK of a table, I wonder if anyone has tried to generate a valid xid value using SQL.
For example, for UUIDv4 we can generate a default value when the uuid-ossp extension is not installed for PostgreSQL as:
CREATE TABLE IF NOT EXISTS table (
id VARCHAR(36) DEFAULT md5(random()::text || clock_timestamp()::text)::uuid NOT NULL,
....
Of course it would be better to generate it from the application, but a default value makes manual interactions with tables simpler.
Has anyone tried to develop an SQL function to generate an xid?
My question is around xid's that may be facing users of a software (i.e. not just stored as keys in a database).
As it is a 20-character string and the alphabet includes most letters, there is a decent probability that an accidental f*ck shows up, or even that whole offensive short sentences are formed. Has anyone considered this aspect before and, if so, found reasonable solutions that would be practical?
I could think of:
aeiou
for wxyz
Hi guys, my server is managed by systemd and runs in a Docker container, so it probably has the same PID (and it's not 1) in different containers. This will cause xid conflicts when multiple instances are deployed on one host.
Any idea how to solve this case?
As mentioned in #14, an XID may be stored as a bytea in Postgres, so that it takes up 16 bytes rather than 24.
While read/write performance is impacted slightly, as far as I understand, the benefit of BYTEA over TEXT is a smaller size, which in turn means smaller (and thus in theory faster) indices, for large tables in particular. In addition, there are fewer "special rules" (e.g. Unicode / locale encoding rules) for comparison, which again, in theory, should make it faster to query as well.
https://www.db-fiddle.com/f/jgYzsKTFGu3NU9ZjDjfRUw/0
The link above shows a simple table with an ID as either bytea or string. Given 50,000 entries, the index size is reduced from 2496 kB to 2048 kB by using bytea.
I don't know at which table sizes this becomes significant, or if it really matters. A proper benchmark with a few million rows and a few queries is probably wise before making any changes.
Given that bytea is used as an ID in the schema, test code to encode/decode XIDs from binary is provided here:
type XID struct {
xid.ID
}
// NewXID generates a new XID instance.
func NewXID() XID {
return XID{ID: xid.New()}
}
// Value implements the driver.Valuer interface.
func (id XID) Value() (driver.Value, error) {
if id.IsNil() {
return nil, nil
}
return id.Bytes(), nil
}
// Scan implements the sql.Scanner interface.
func (id *XID) Scan(value interface{}) error {
switch b := value.(type) {
case []byte:
_id, err := xid.FromBytes(b)
if err != nil {
return err
}
id.ID = _id
return nil
case nil:
id.ID = xid.ID{}
return nil
default:
return fmt.Errorf("xid: scanning unsupported type: %T", value)
}
}