A Java library for generating Time-Sorted Unique Identifiers (TSID).

License: MIT License

TSID Creator

A Java library for generating Time-Sorted Unique Identifiers (TSID).

It brings together ideas from Twitter's Snowflake and ULID Spec.

In summary:

  • Sorted by generation time;
  • Can be stored as an integer of 64 bits;
  • Can be stored as a string of 13 chars;
  • String format is encoded to Crockford's base32;
  • String format is URL safe, is case insensitive, and has no hyphens;
  • Shorter than UUID, ULID and KSUID.

This project contains a micro benchmark and a good amount of unit tests.

The jar file can be downloaded directly from maven.org.

Usage

Create a TSID:

Tsid tsid = TsidCreator.getTsid();

Create a TSID as long:

long number = TsidCreator.getTsid().toLong(); // 38352658567418872

Create a TSID as String:

String string = TsidCreator.getTsid().toString(); // 01226N0640J7Q

The TSID generator is thread-safe.

Dependency

Add these lines to your pom.xml:

<!-- https://search.maven.org/artifact/com.github.f4b6a3/tsid-creator -->
<dependency>
  <groupId>com.github.f4b6a3</groupId>
  <artifactId>tsid-creator</artifactId>
  <version>5.2.6</version>
</dependency>

See more options in maven.org.

Modularity

Module and bundle names are the same as the root package name.

  • JPMS module name: com.github.f4b6a3.tsid
  • OSGi symbolic name: com.github.f4b6a3.tsid

TSID as Long

The Tsid.toLong() method simply unwraps the internal long value of a TSID.

long tsid = TsidCreator.getTsid().toLong();

Sequence of TSIDs:

38352658567418867
38352658567418868
38352658567418869
38352658567418870
38352658567418871
38352658567418872
38352658567418873
38352658567418874
38352658573940759 < millisecond changed
38352658573940760
38352658573940761
38352658573940762
38352658573940763
38352658573940764
38352658573940765
38352658573940766
         ^      ^ look
                                   
|--------|------|
   time   random
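The split shown in the diagram can be reproduced with plain bit operations. The following is a minimal sketch, assuming the default layout described in this README (42 time bits above 22 random bits) and the default epoch of 2020-01-01 00:00:00 UTC. The class `TsidParts` and its methods are hypothetical and are not part of the library.

```java
import java.time.Instant;

// Hypothetical helper, not part of tsid-creator: splits a TSID long
// into its time and random parts, assuming the default 42/22 layout.
final class TsidParts {

    static final int RANDOM_BITS = 22;
    static final long RANDOM_MASK = (1L << RANDOM_BITS) - 1;
    // 2020-01-01T00:00:00Z, the default epoch stated in this README
    static final long DEFAULT_EPOCH = Instant.parse("2020-01-01T00:00:00Z").toEpochMilli();

    // milliseconds elapsed since the epoch (upper 42 bits)
    static long time(long tsid) {
        return tsid >>> RANDOM_BITS;
    }

    // node and counter bits together (lower 22 bits)
    static long random(long tsid) {
        return tsid & RANDOM_MASK;
    }

    // creation instant, assuming the default epoch
    static Instant instant(long tsid) {
        return Instant.ofEpochMilli(DEFAULT_EPOCH + time(tsid));
    }

    public static void main(String[] args) {
        long before = 38352658567418874L; // last ID before the millisecond changed
        long after = 38352658573940759L;  // first ID after the millisecond changed
        System.out.println(time(before) + " " + random(before));
        System.out.println(time(after) + " " + random(after));
        System.out.println(instant(after));
    }
}
```

Applied to the two IDs around the "millisecond changed" marker, the time part differs by exactly one while the random part is unrelated.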

TSID as String

The Tsid.toString() method encodes a TSID to Crockford's base 32 encoding. The returned string is 13 characters long.

String tsid = TsidCreator.getTsid().toString();

Sequence of TSID strings:

01226N0640J7K
01226N0640J7M
01226N0640J7N
01226N0640J7P
01226N0640J7Q
01226N0640J7R
01226N0640J7S
01226N0640J7T
01226N0693HDA < millisecond changed
01226N0693HDB
01226N0693HDC
01226N0693HDD
01226N0693HDE
01226N0693HDF
01226N0693HDG
01226N0693HDH
        ^   ^ look
                                   
|-------|---|
   time random
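To illustrate why 13 characters are enough for 64 bits, here is a minimal sketch of Crockford's base32 encoding and decoding of a `long`. It is not the library's implementation: the class `Crockford32` is hypothetical, it does not map the look-alike letters I, L and O as Crockford's spec allows, and it does not reject a first character above `F`.

```java
// Hypothetical sketch, not the library's implementation: encodes a 64-bit
// value as 13 Crockford base32 characters and decodes it back.
final class Crockford32 {

    // Crockford's alphabet omits the letters I, L, O and U
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    static String encode(long value) {
        char[] chars = new char[13];
        // 13 chars x 5 bits = 65 bits, so the first char carries only 4 bits
        for (int i = 12; i >= 0; i--) {
            chars[i] = ALPHABET.charAt((int) (value & 31));
            value >>>= 5;
        }
        return new String(chars);
    }

    static long decode(String text) {
        if (text.length() != 13) {
            throw new IllegalArgumentException("expected 13 characters");
        }
        long value = 0;
        for (int i = 0; i < 13; i++) {
            // upper-casing makes the decoding case insensitive
            int digit = ALPHABET.indexOf(Character.toUpperCase(text.charAt(i)));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + text.charAt(i));
            }
            value = (value << 5) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        long number = decode("01226N0640J7K");
        System.out.println(number);
        System.out.println(encode(number));
    }
}
```

Because every character holds 5 bits and the most significant bits come first, comparing two such strings character by character gives the same order as comparing the underlying non-negative numbers.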

The string format can be useful for languages that store numbers in the IEEE 754 double-precision (64-bit) binary format, such as JavaScript, where integers above 2^53 cannot always be represented exactly.
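The precision problem can be observed from Java by pushing a TSID through a `double`, which is the format behind JavaScript's `Number` type. This is only a small demonstration; the class `DoublePrecisionDemo` is hypothetical.

```java
// Hypothetical demo: a TSID above 2^53 does not survive a round trip
// through an IEEE 754 double.
final class DoublePrecisionDemo {

    // largest integer n such that n and n + 1 are both exact as doubles
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long tsid = 38352658567418867L; // taken from the sequence of longs in this README
        System.out.println(tsid > MAX_SAFE_INTEGER);
        System.out.println(survivesDouble(tsid));
        System.out.println((long) (double) tsid);
    }
}
```

Sending the 13-character string instead of the number avoids this loss.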

TSID Structure

The term TSID stands for (roughly) Time-Sorted ID. A TSID is a number that is formed by the creation time along with random bits.

The TSID has 2 components:

  • Time component (42 bits)
  • Random component (22 bits)

The time component is the count of milliseconds since 2020-01-01 00:00:00 UTC.

The Random component has 2 sub-parts:

  • Node ID (0 to 20 bits)
  • Counter (2 to 22 bits)

The counter bits depend on the node bits. If the node bits are 10, the counter bits are limited to 12. In this example, the maximum node value is 2^10-1 = 1023 and the maximum counter value is 2^12-1 = 4095. So the maximum number of TSIDs that each node can generate per millisecond is 4096.

The node identifier uses 10 bits of the random component by default in the TsidFactory. It's possible to adjust the node bits to a value between 0 and 20. The counter bits are affected by the node bits.
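The trade-off between node bits and counter bits is simple arithmetic. The sketch below works it out and packs the three fields into a `long`; it assumes only the 42/22 layout described above, and the class `TsidBitBudget` is hypothetical rather than part of the library.

```java
// Hypothetical helper, not part of tsid-creator: shows how the 22 random
// bits are shared between the node ID and the counter.
final class TsidBitBudget {

    static final int RANDOM_BITS = 22;

    static int counterBits(int nodeBits) {
        if (nodeBits < 0 || nodeBits > 20) {
            throw new IllegalArgumentException("node bits must be between 0 and 20");
        }
        return RANDOM_BITS - nodeBits;
    }

    static int maxNode(int nodeBits) {
        return (1 << nodeBits) - 1;
    }

    static int maxCounter(int nodeBits) {
        return (1 << counterBits(nodeBits)) - 1;
    }

    // packs the three fields into one 64-bit value; out-of-range bits are masked off
    static long compose(long time, int node, int counter, int nodeBits) {
        int counterBits = counterBits(nodeBits);
        long n = (long) (node & maxNode(nodeBits)) << counterBits;
        long c = counter & maxCounter(nodeBits);
        return (time << RANDOM_BITS) | n | c;
    }

    public static void main(String[] args) {
        for (int nodeBits : new int[] {0, 6, 10, 16, 20}) {
            System.out.println(nodeBits + " node bits: max node " + maxNode(nodeBits)
                    + ", IDs per ms per node " + (maxCounter(nodeBits) + 1));
        }
    }
}
```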

This is the default TSID structure:

                                            adjustable
                                           <---------->
|------------------------------------------|----------|------------|
       time (msecs since 2020-01-01)           node       counter
                42 bits                       10 bits     12 bits

- time:    2^42 = ~69 years or ~139 years (with adjustable epoch)
- node:    2^10 = 1,024 (with adjustable bits)
- counter: 2^12 = 4,096 (initially random)

Notes:
The node is adjustable from 0 to 20 bits.
The node bits affect the counter bits.
The time component can be used for ~69 years if stored in a SIGNED 64-bit integer field.
The time component can be used for ~139 years if stored in an UNSIGNED 64-bit integer field.
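The two lifetimes follow from the number of usable time bits. As a worked example, assuming a year of 365.25 days: the time component occupies the top 42 bits, so its highest bit is the sign bit of a signed 64-bit field, and the value stays positive only while the time fits in 41 bits. The class `TsidLifetime` is hypothetical.

```java
// Hypothetical worked arithmetic for the lifetime figures in the notes.
final class TsidLifetime {

    // assumes an average year of 365.25 days
    static final double MILLIS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    // years that can be counted in milliseconds with the given number of bits
    static double years(int timeBits) {
        return Math.pow(2, timeBits) / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("42 bits (unsigned field): %.1f years%n", years(42));
        System.out.printf("41 bits (signed field, positive values): %.1f years%n", years(41));
    }
}
```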

The time component can be 1 ms or more ahead of the system time when necessary to maintain monotonicity and generation speed.
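One way to picture how the time can run ahead of the clock: when the counter overflows within a millisecond, the overflow is carried into the time value. The following sketch illustrates that carry technique under simplifying assumptions: it has no node ID, the counter restarts at zero instead of a random value, and the clock is injected so that the behavior can be tested. The class `MonotonicSketch` is hypothetical and the real TsidFactory may differ.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the carry technique; names are illustrative.
final class MonotonicSketch {

    private final LongSupplier clock; // returns milliseconds, e.g. System::currentTimeMillis
    private final int counterBits;
    private final int counterMask;
    private long lastTime = -1;
    private int counter;

    MonotonicSketch(LongSupplier clock, int counterBits) {
        this.clock = clock;
        this.counterBits = counterBits;
        this.counterMask = (1 << counterBits) - 1;
    }

    synchronized long next() {
        long time = clock.getAsLong();
        if (time <= lastTime) {
            // same millisecond, or the clock went backwards: bump the counter
            counter++;
            int carry = counter >>> counterBits; // 1 when the counter overflows
            counter &= counterMask;
            time = lastTime + carry; // borrow the next millisecond on overflow
        } else {
            counter = 0; // simplification: restart at zero instead of a random value
        }
        lastTime = time;
        return (time << counterBits) | counter;
    }

    public static void main(String[] args) {
        // a frozen clock and a 2-bit counter force an overflow after 4 IDs
        MonotonicSketch generator = new MonotonicSketch(() -> 1000L, 2);
        for (int i = 0; i < 6; i++) {
            System.out.println(generator.next());
        }
    }
}
```

With a frozen clock the values still increase by one on every call, because the fifth call moves the embedded time one millisecond past the clock.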

Node identifier

A simple way to avoid collisions is to make sure that each generator has its exclusive node ID. A "node" as we call it in this library can be a physical machine, a virtual machine, a container, a k8s pod, a running process, a database instance number, etc.

The node ID can be given to TsidFactory by defining the tsidcreator.node system property or the TSIDCREATOR_NODE environment variable. Otherwise, the node identifier will be chosen randomly.
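When node IDs are chosen randomly, two generators may end up with the same one. The sketch below estimates how likely that is with the standard birthday calculation, assuming each generator picks its node ID uniformly and independently. The class `NodeCollisionOdds` is hypothetical, and a shared node ID does not by itself mean that two TSIDs collide; it only removes the guarantee given by exclusive node IDs.

```java
// Hypothetical estimate: chance that at least two generators pick the
// same node ID when each one chooses uniformly at random.
final class NodeCollisionOdds {

    static double probability(int generators, int nodeBits) {
        double ids = Math.pow(2, nodeBits);
        if (generators > ids) {
            return 1.0; // more generators than IDs: a clash is certain
        }
        double allDistinct = 1.0;
        for (int i = 0; i < generators; i++) {
            allDistinct *= (ids - i) / ids;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        for (int generators : new int[] {2, 5, 10, 50}) {
            System.out.printf("%d generators, 10 node bits: %.4f%n",
                    generators, probability(generators, 10));
        }
    }
}
```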

The total number of nodes can be given to TsidFactory by defining the tsidcreator.node.count system property or the TSIDCREATOR_NODE_COUNT environment variable. If this property or variable is set, TsidFactory will adjust the amount of bits needed to fit the given node count. For example, if the value 100 is given, the number of bits reserved for the node ID is set to 7, which is the minimum number of bits needed to fit 100 nodes. Otherwise, the default number of bits is set to 10, which can accommodate 1024 nodes.
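The mapping from a node count to a number of bits can be sketched as a ceiling of the base-2 logarithm. This is an illustration of the arithmetic only: the class `NodeBits` is hypothetical, and the way TsidFactory actually computes the value may differ.

```java
// Hypothetical helper: the smallest number of bits that can hold
// `nodeCount` distinct node IDs.
final class NodeBits {

    static final int DEFAULT_NODE_BITS = 10;

    static int forCount(int nodeCount) {
        if (nodeCount < 1 || nodeCount > (1 << 20)) {
            throw new IllegalArgumentException("node count must be between 1 and 2^20");
        }
        // ceil(log2(nodeCount)) without floating point
        return 32 - Integer.numberOfLeadingZeros(nodeCount - 1);
    }

    // falls back to the default when the setting is absent or blank
    static int fromSetting(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_NODE_BITS;
        }
        return forCount(Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        // property name taken from this README
        System.out.println(fromSetting(System.getProperty("tsidcreator.node.count")));
    }
}
```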

System properties:

  • tsidcreator.node: the node identifier (machine-id).
  • tsidcreator.node.count: the total number of nodes.

Environment variables:

  • TSIDCREATOR_NODE: the node identifier (machine-id).
  • TSIDCREATOR_NODE_COUNT: the total number of nodes.

Using system properties:

// append to VM arguments
// node identifier: 1 of 1024
// default node count is 1024
-Dtsidcreator.node="1"

// append to VM arguments
// node identifier: 1 of 64
-Dtsidcreator.node="1" \
-Dtsidcreator.node.count="64"

Using environment variables:

# append to ~/.profile
# node identifier: 1 of 1024
# default node count is 1024
export TSIDCREATOR_NODE="1"

# append to ~/.profile
# node identifier: 1 of 64
export TSIDCREATOR_NODE="1"
export TSIDCREATOR_NODE_COUNT="64"

# append to ~/.profile
# node identifier: x of 256
# where x is the last part of the host's IPv4 (if it can be resolved)
# for example, if the host address is 192.168.0.42, the value of x is 42
export TSIDCREATOR_NODE="`hostname --ip-address | awk -F. '{print $4}'`"
export TSIDCREATOR_NODE_COUNT="256"

# append to ~/.profile
# node identifier: x of 1024
# remember that the default node count is 1024 and the maximum number of IDs is 4096K/second/node
# where x is the MODULO 1024 (2^10) of the first host's IPv4 (if there's 1 or more addresses)
# for example, if the first address of the host is 192.168.1.1, the value of x is 257 (1*256 + 1)
export TSIDCREATOR_NODE="`hostname -I | awk '{print $1}' | awk -F. '{print ($3*256 + $4) % 1024}'`"

# append to ~/.profile
# node identifier: x of 65536
# note that the maximum number of IDs per ms per node is reduced to 64, i.e., 64K/second/node
# where x is the MODULO 65536 (2^16) of the first host's IPv4 (if there's 1 or more addresses)
# for example, if the host address is 10.42.10.1 (e.g. k8s pod), the value of x is 2561 (10*256 + 1)
export TSIDCREATOR_NODE="`hostname -I | awk '{print $1}' | awk -F. '{print ($3*256 + $4) % 65536}'`"
export TSIDCREATOR_NODE_COUNT="65536"
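The same derivation can be done inside the application instead of the shell. The sketch below mirrors the awk expression `($3*256 + $4) % count` for a dotted IPv4 string; the class `NodeFromIpv4` is hypothetical, and how the address is obtained is left to the caller.

```java
// Hypothetical helper: derives a node ID from the last two octets of an
// IPv4 address, mirroring the awk expression in the shell examples.
final class NodeFromIpv4 {

    static int node(String ipv4, int nodeCount) {
        String[] parts = ipv4.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipv4);
        }
        int third = Integer.parseInt(parts[2]);
        int fourth = Integer.parseInt(parts[3]);
        return (third * 256 + fourth) % nodeCount;
    }

    public static void main(String[] args) {
        System.out.println(node("192.168.1.1", 1024));
        System.out.println(node("10.42.10.1", 65536));
    }
}
```

The result could then be passed to a factory with a fixed node identifier, as in the TsidFactory examples in this README.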

Notes:

  1. As a reference, 6,000 tweets are posted on Twitter every second as of 2022;
  2. According to the hostname manual:
    • --ip-address or -i (lowercase): Display the network address(es) of the host name. Note that this works only if the host name can be resolved. Avoid using this option; use hostname --all-ip-addresses instead.
    • --all-ip-addresses or -I (uppercase): Display all network addresses of the host. This option enumerates all configured addresses on all network interfaces. The loopback interface and IPv6 link-local addresses are omitted. Contrary to option -i, this option does not depend on name resolution. Do not make any assumptions about the order of the output.

More Examples

Create a quick TSID:

Tsid tsid = Tsid.fast();

Create a TSID from a canonical string (13 chars):

Tsid tsid = Tsid.from("0123456789ABC");

Convert a TSID into a canonical string in lower case:

String string = tsid.toLowerCase(); // 0123456789abc

Get the creation instant of a TSID:

Instant instant = tsid.getInstant(); // 2020-04-15T22:31:02.458Z

Encode a TSID to base-62:

String string = tsid.encode(62); // 0T5jFDIkmmy

Format a TSID to a string starting with a letter, where "K" is the letter and "%S" is a placeholder:

String string = tsid.format("K%S"); // K0AWE5HZP3SKTK

A key generator that makes substitution easy if necessary:

package com.example;

import com.github.f4b6a3.tsid.TsidCreator;

public class KeyGenerator {
    public static String next() {
        return TsidCreator.getTsid().toString();
    }
}

String key = KeyGenerator.next();

A TsidFactory with a FIXED node identifier:

int node = 256; // max: 2^10 - 1 = 1023
TsidFactory factory = new TsidFactory(node);

// use the factory
Tsid tsid = factory.create();

A TsidFactory with a FIXED node identifier and CUSTOM node bits:

// setup a factory for up to 64 nodes and 65536 ID/ms.
TsidFactory factory = TsidFactory.builder()
    .withNodeBits(6)      // max: 20
    .withNode(63)         // max: 2^nodeBits - 1
    .build();

// use the factory
Tsid tsid = factory.create();

A TsidFactory with a CUSTOM epoch:

// use a CUSTOM epoch that starts from the fall of the Berlin Wall
Instant customEpoch = Instant.parse("1989-11-09T00:00:00Z");
TsidFactory factory = TsidFactory.builder().withCustomEpoch(customEpoch).build();

// use the factory
Tsid tsid = factory.create();

A TsidFactory with java.util.Random:

// use a `java.util.Random` instance for fast generation
TsidFactory factory = TsidFactory.builder().withRandom(new Random()).build();

// use the factory
Tsid tsid = factory.create();

A TsidFactory with RandomGenerator (JDK 17+):

// use a random function that returns an int value
RandomGenerator random = RandomGenerator.getDefault();
TsidFactory factory = TsidFactory.builder()
    .withRandomFunction(() -> random.nextInt())
    .build();

// use the factory
Tsid tsid = factory.create();

A TsidFactory with ThreadLocalRandom:

// use a random function that returns an int value
TsidFactory factory = TsidFactory.builder()
    .withRandomFunction(() -> ThreadLocalRandom.current().nextInt())
    .build();

// use the factory
Tsid tsid = factory.create();

A TsidFactory that creates TSIDs similar to Twitter Snowflakes:

// Twitter Snowflakes have 5 bits for datacenter ID and 5 bits for worker ID
int datacenter = 1; // max: 2^5-1 = 31
int worker = 1;     // max: 2^5-1 = 31
int node = (datacenter << 5 | worker); // max: 2^10-1 = 1023

// Twitter Epoch is fixed in 1288834974657 (2010-11-04T01:42:54.657Z)
Instant customEpoch = Instant.ofEpochMilli(1288834974657L);

// a function that returns an array of ZEROS, making the factory
// RESET the counter to ZERO when the millisecond changes
IntFunction<byte[]> randomFunction = (x) -> new byte[x];

// a factory that returns TSIDs similar to Twitter Snowflakes
TsidFactory factory = TsidFactory.builder()
    .withRandomFunction(randomFunction)
    .withCustomEpoch(customEpoch)
    .withNode(node)
    .build();

// use the factory
Tsid tsid = factory.create();
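The node value in the Snowflake-like example is two 5-bit fields packed into 10 bits. A minimal sketch of packing and unpacking them follows; the class `SnowflakeNode` is hypothetical, and the epoch constant is the one quoted in the example above.

```java
import java.time.Instant;

// Hypothetical helper: packs a 5-bit datacenter ID and a 5-bit worker ID
// into a 10-bit node ID, and unpacks them again.
final class SnowflakeNode {

    // Twitter epoch as quoted in this README
    static final Instant TWITTER_EPOCH = Instant.ofEpochMilli(1288834974657L);

    static int pack(int datacenter, int worker) {
        if (datacenter < 0 || datacenter > 31 || worker < 0 || worker > 31) {
            throw new IllegalArgumentException("both IDs must be between 0 and 31");
        }
        return (datacenter << 5) | worker;
    }

    static int datacenter(int node) {
        return (node >>> 5) & 31;
    }

    static int worker(int node) {
        return node & 31;
    }

    public static void main(String[] args) {
        int node = pack(1, 1);
        System.out.println(node + " -> datacenter " + datacenter(node) + ", worker " + worker(node));
        System.out.println(TWITTER_EPOCH);
    }
}
```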

A TsidFactory that creates TSIDs similar to Discord Snowflakes:

// Discord Snowflakes have 5 bits for worker ID and 5 bits for process ID
int worker = 1;  // max: 2^5-1 = 31
int process = 1; // max: 2^5-1 = 31
int node = (worker << 5 | process); // max: 2^10-1 = 1023

// Discord Epoch starts in the first millisecond of 2015
Instant customEpoch = Instant.parse("2015-01-01T00:00:00.000Z");

// a factory that returns TSIDs similar to Discord Snowflakes
TsidFactory factory = TsidFactory.builder()
    .withCustomEpoch(customEpoch)
    .withNode(node)
    .build();

// use the factory
Tsid tsid = factory.create();

Benchmark

This section shows benchmarks comparing TsidCreator to java.util.UUID.

---------------------------------------------------------------------------
THROUGHPUT (operations/msec)       Mode  Cnt      Score      Error   Units
---------------------------------------------------------------------------
UUID_randomUUID                   thrpt    5   1630,938 ±  183,581  ops/ms
UUID_randomUUID_toString          thrpt    5   1604,916 ±  189,711  ops/ms
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Tsid_fast                         thrpt    5  37397,739 ± 1128,756  ops/ms
Tsid_fast_toString                thrpt    5  21144,662 ±  673,939  ops/ms
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
TsidCreator_getTsid256            thrpt    5  10727,236 ±  761,920  ops/ms
TsidCreator_getTsid256_toString   thrpt    5   6813,193 ±  867,041  ops/ms
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
TsidCreator_getTsid1024           thrpt    5  12146,561 ± 1533,959  ops/ms
TsidCreator_getTsid1024_toString  thrpt    5   6507,373 ±  729,444  ops/ms
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
TsidCreator_getTsid4096           thrpt    5  11589,976 ± 1757,076  ops/ms
TsidCreator_getTsid4096_toString  thrpt    5   6497,042 ± 1339,480  ops/ms
---------------------------------------------------------------------------
Total time: 00:03:22
---------------------------------------------------------------------------

Number of threads used in this benchmark: 4.

System: CPU i7-8565U, 16G RAM, Ubuntu 22.04, JVM 11, rng-tools installed.

To execute the benchmark, run ./benchmark/run.sh.

Ports, forks, implementations and other OSS

Ports, forks and implementations:

Language    Name
Go          vishal-bihani/go-tsid
Java        vladmihalcea/hypersistence-tsid
Java        vincentdaogithub/tsid
.NET        kgkoutis/TSID.Creator.NET
PHP         odan/tsid
Python      luismedel/tsid-python
Rust        jakudlaty/tsid
TypeScript  yubintw/tsid-ts

Other OSS:

Language    Name
Java        fillumina/id-encryptor
.NET        ullmark/hashids.net

License

This library is Open Source software released under the MIT license.

tsid-creator's Issues

Question: Is the ThreadLocalRandom approach reasonably safe in a clustered environment?

Hello, first of all I'd like to congratulate the maintainer on this wonderful project.
I can't provide a node identifier, so I'm wondering whether the ThreadLocalRandom approach described in the README is safe enough to generate IDs on a small cluster (fewer than 10 nodes) with a low probability of collision.

// use a random function that returns an array of bytes with a given length
TsidFactory factory = TsidFactory.builder()
    .withRandomFunction(length -> {
        final byte[] bytes = new byte[length];
        ThreadLocalRandom.current().nextBytes(bytes);
        return bytes;
    }).build();

// use the factory
Tsid tsid = factory.create();

My best regards.

Add a fast method to generate identifiers

Implement a convenience method to generate quick TSIDs.

Tsid tsid = Tsid.fast();

AtomicInteger will be employed by this method.

Security-sensitive applications that require a cryptographically strong pseudo-random generator should continue to use TsidCreator.getTsid().
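A rough idea of what such a fast method could look like: the current time shifted left by 22 bits, combined with the low 22 bits of a shared AtomicInteger. This is a sketch under assumptions and not the library's code: the class `FastTsidSketch` is hypothetical, the counter is seeded from ThreadLocalRandom (which is not cryptographically strong), and the clock value is passed in so that the result can be checked.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a lock-free "fast" generator; not the library's code.
final class FastTsidSketch {

    static final int RANDOM_BITS = 22;
    static final int RANDOM_MASK = (1 << RANDOM_BITS) - 1;
    static final long EPOCH = 1577836800000L; // 2020-01-01T00:00:00Z

    // shared counter that starts at a random value
    private static final AtomicInteger COUNTER =
            new AtomicInteger(ThreadLocalRandom.current().nextInt());

    // time in the upper 42 bits, low 22 bits of the counter below it
    static long next(long currentMillis) {
        long time = (currentMillis - EPOCH) << RANDOM_BITS;
        long tail = COUNTER.incrementAndGet() & RANDOM_MASK;
        return time | tail;
    }

    static long next() {
        return next(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        System.out.println(next());
        System.out.println(next());
    }
}
```

The atomic increment avoids locking, at the cost of the guarantees a synchronized factory can give: in this sketch the tail can wrap around within a single millisecond.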

TSID as database primary keys in multi tenant environment

Hi,

First of all, thanks for this library. I haven't used it yet, but it looks promising. I'm working on a project with a classic Spring/Hibernate/MySQL stack in a multi-tenant environment, where there is a database catalog per tenant. Additionally, some tenants' data could be consolidated in a different catalog when they belong to the same organization. Anyway, my requirement is to generate primary keys in a way that avoids future clashes. This is where this library fits. My initial thought was to assign a different node ID to each tenant, but the library is designed assuming that the node ID is immutable. So, my questions are:

  • Does my approach make sense?
  • If so, do you think that this requirement could fit in your library?
  • If not, my idea is to clone the repo and make an implementation where the create method receives an overriding node ID. What do you think about this approach?

Thanks for your time.
Best regards,

Juan Carlos

Increment the counter when the random function returns null or empty

Currently, TsidFactory does not accept a random function that returns null. The random function is used to reset the counter when the millisecond changes. If this function returns null, an exception is thrown.

TsidFactory can ignore the null return and just increment the counter value.

Add simple format methods

Goal 1: produce formatted TSID.

String format = "K%S";
String key = Tsid.fast().format(format); // key: K0AWG1PRQQFZ8D

String format = "DOC-%S.pdf";
String filename = Tsid.fast().format(format); // filename: DOC-0AWG1PRQQFZ8D.pdf

Goal 2: parse formatted TSID.

String format = "K%S";
String key = "K0AWG1PRQQFZ8D";
Tsid tsid = Tsid.unformat(key, format); // tsid: 0AWG1PRQQFZ8D

String format = "DOC-%S.pdf";
String filename = "DOC-0AWG1PRQQFZ8D.pdf";
Tsid tsid = Tsid.unformat(filename, format); // tsid: 0AWG1PRQQFZ8D
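Both goals amount to substituting and extracting the 13-character canonical string at the position of the `%S` placeholder. The following sketch shows that idea on plain strings; the class `TsidFormatSketch` is hypothetical, works on the canonical string rather than on Tsid objects, and is not the implementation proposed in this issue.

```java
// Hypothetical sketch of the two goals, operating on the canonical
// 13-character string instead of a Tsid object.
final class TsidFormatSketch {

    static final String PLACEHOLDER = "%S";
    static final int TSID_CHARS = 13;

    static String format(String canonical, String format) {
        int i = placeholderIndex(format);
        return format.substring(0, i) + canonical + format.substring(i + PLACEHOLDER.length());
    }

    static String unformat(String formatted, String format) {
        int i = placeholderIndex(format);
        String head = format.substring(0, i);
        String tail = format.substring(i + PLACEHOLDER.length());
        boolean matches = formatted.length() == head.length() + TSID_CHARS + tail.length()
                && formatted.startsWith(head)
                && formatted.endsWith(tail);
        if (!matches) {
            throw new IllegalArgumentException("string does not match format: " + formatted);
        }
        return formatted.substring(head.length(), formatted.length() - tail.length());
    }

    private static int placeholderIndex(String format) {
        int i = format.indexOf(PLACEHOLDER);
        if (i < 0) {
            throw new IllegalArgumentException("format has no %S placeholder");
        }
        return i;
    }

    public static void main(String[] args) {
        String filename = format("0AWG1PRQQFZ8D", "DOC-%S.pdf");
        System.out.println(filename);
        System.out.println(unformat(filename, "DOC-%S.pdf"));
    }
}
```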

Reduce synchronization scope when generating the TSID

Currently, the TsidFactory.create method is synchronized:

public synchronized Tsid create() {

	final long _time = getTime() << RANDOM_BITS;
	final long _node = (long) this.node << this.counterBits;
	final long _counter = (long) this.counter & this.counterMask;

	return new Tsid(_time | _node | _counter);
}

However, synchronization is needed only for getTime, because of the timestamp monotonicity issues. So getTime should be the synchronized method instead:

private synchronized long getTime() {

	long time = clock.millis();

	if (time <= this.lastTime) {
		this.counter++;
		// Carry is 1 if an overflow occurs after ++.
		int carry = this.counter >>> this.counterBits;
		this.counter = this.counter & this.counterMask;
		time = this.lastTime + carry; // increment time
	} else {
		// If the system clock has advanced as expected,
		// simply reset the counter to a new random value.
		this.counter = this.getRandomCounter();
	}

	// save current time
	this.lastTime = time;

	// adjust to the custom epoch
	return time - this.customEpoch;
}

This will narrow the scope of thread contention, especially when using a ThreadLocalRandom generator.

Remove the overrun exception

Make the TimeIdCreator wait for the next millisecond instead of throwing a TsidCreatorException.

Twitter Snowflake also waits for the next millisecond:

sequence number - 12 bits - rolls over every 4096 per machine (with protection to avoid rollover in the same ms)

  protected def tilNextMillis(lastTimestamp: Long): Long = {
    var timestamp = timeGen()
    while (timestamp <= lastTimestamp) {
      timestamp = timeGen()
    }
    timestamp
  }

See this line:
https://github.com/twitter-archive/snowflake/blob/b3f6a3c6ca8e1b6847baa6ff42bf72201e2c2231/src/main/scala/com/twitter/service/snowflake/IdWorker.scala#L97
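For reference, the Scala loop above translates to Java almost line by line. This is only a sketch of the busy-wait idea: the class `NextMillis` is hypothetical, and the clock is passed in as a LongSupplier (an assumption made here for testability) instead of calling the system clock directly.

```java
import java.util.function.LongSupplier;

// Hypothetical Java rendering of the busy-wait shown in the Scala snippet.
final class NextMillis {

    // spins until the clock returns a value greater than lastTimestamp
    static long await(long lastTimestamp, LongSupplier clock) {
        long timestamp = clock.getAsLong();
        while (timestamp <= lastTimestamp) {
            timestamp = clock.getAsLong();
        }
        return timestamp;
    }

    public static void main(String[] args) {
        long last = System.currentTimeMillis();
        long next = await(last, System::currentTimeMillis);
        System.out.println(next > last);
    }
}
```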

Make the Tsid class extensible

Could you make the Tsid class extensible?
Can you remove the final modifier from the class and change the number field from private to protected?

Use case: For some entities like Tenant and User I have specialized IDs that differentiate them from others.

To extend Tsid I had to copy the classes into my project and modify them, but I would like to use the original library for that.
