sqids / sqids-dotnet

Official .NET port of Sqids. Generate short unique IDs from numbers.

Home Page: https://sqids.org/dotnet

License: MIT License

C# 100.00%
dot-net dotnet id-generator sqids csharp hashids short-id short-url unique-id unique-id-generator

sqids-dotnet's Introduction

Sqids Specification

This is the main repository for the Sqids specification. It is meant to be the guide for future ports to other languages.

The code is optimized for readability and clarity; individual implementations should optimize for performance as needed.

All unit tests should have matching results.

Get started

npm install
npm test

The main Sqids library is in src/index.ts; unit tests are in src/tests.

Use the following to format & check changes:

npm run format
npm run lint

Improvements (over Hashids)

  1. The user is not required to provide randomized input anymore (there's still support for custom IDs).
  2. Better internal alphabet shuffling function.
  3. With the default alphabet, Hashids uses only base 49 for the encoding itself, whereas Sqids uses base 61.
  4. Safer public IDs, with support for custom blocklist of words.
  5. Separators are no longer limited to characters "c, s, f, h, u, i, t". Instead, it's one rotating separator assigned on the fly.
  6. Simpler & smaller implementation: only "encode" & "decode" functions.

How it works

Sqids is basically a decimal-to-hexadecimal conversion, but with a few extra features. The alphabet is larger, it supports encoding several numbers into a single ID, and it makes sure generated IDs are URL-safe (no common profanity).
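
The base-conversion core can be sketched roughly as follows. This is a minimal illustration and not the code from the spec: the function names `toId` and `toNumber` are assumptions made for this example, and it assumes non-negative integers and an alphabet of unique characters.

```typescript
// Convert a non-negative integer into a string over an arbitrary alphabet.
// With the alphabet "0123456789abcdef" this is plain hexadecimal.
function toId(num: number, alphabet: string): string {
  let id = "";
  let rest = num;
  do {
    // Take the least significant "digit" and prepend its character.
    id = alphabet.charAt(rest % alphabet.length) + id;
    rest = Math.floor(rest / alphabet.length);
  } while (rest > 0);
  return id;
}

// Inverse of toId: read the characters as digits in base alphabet.length.
function toNumber(id: string, alphabet: string): number {
  let num = 0;
  for (let i = 0; i < id.length; i++) {
    num = num * alphabet.length + alphabet.indexOf(id.charAt(i));
  }
  return num;
}
```

For example, `toId(255, "0123456789abcdef")` gives `"ff"`, and a longer alphabet simply produces shorter IDs for the same number.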

Here's how encoding works:

  1. An offset index is chosen from the given input
  2. Alphabet is split into two pieces using that offset and those two halves are swapped
  3. Alphabet is reversed
  4. For each input number:
    1. The first character from the alphabet is reserved to be used as separator
    2. The rest of the alphabet is used to encode the number into an ID
    3. If this is not the last number in the input array, the separator character is appended
    4. The alphabet is shuffled
  5. If the generated ID does not meet the minLength requirement:
    • The separator character is appended
    • If it still does not meet the requirement:
      • Another shuffle occurs
      • The separator character is appended again, plus however many characters are needed to meet the requirement
  6. If the blocklist function matches the generated ID:
    • offset index is incremented by 1, but never more than the length of the alphabet (in that case throw error)
    • Re-encode (repeat the whole procedure again)
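
Steps 1 to 4 and the retry in step 6 can be sketched as below. This is an illustration under stated assumptions, not the reference implementation in src/index.ts: the offset formula is a placeholder, the shuffle is passed in by the caller so that no particular shuffle algorithm is assumed, and the names `encodeDigits`, `encodeSketch` and `encodeAvoidingBlocked` are hypothetical. The minLength padding from step 5 is left out.

```typescript
// Illustrative sketch only; names and the offset formula are assumptions.
type Shuffle = (alphabet: string) => string;

// Base conversion of a single non-negative integer over the given alphabet.
function encodeDigits(num: number, alphabet: string): string {
  let id = "";
  let rest = num;
  do {
    id = alphabet.charAt(rest % alphabet.length) + id;
    rest = Math.floor(rest / alphabet.length);
  } while (rest > 0);
  return id;
}

function encodeSketch(
  numbers: number[],
  baseAlphabet: string,
  shuffle: Shuffle,
  increment: number = 0
): string {
  // Step 6 guard: the increment may never exceed the alphabet length.
  if (increment > baseAlphabet.length) {
    throw new Error("ran out of offsets while avoiding blocked IDs");
  }
  // Step 1: choose an offset from the input (placeholder formula).
  const sum = numbers.reduce((acc, n) => acc + n, numbers.length);
  const offset = (sum + increment) % baseAlphabet.length;
  // Step 2: split the alphabet at the offset and swap the two pieces.
  let alphabet = baseAlphabet.slice(offset) + baseAlphabet.slice(0, offset);
  // Step 3: reverse the alphabet.
  alphabet = alphabet.split("").reverse().join("");
  // Step 4: encode each number; the first character acts as the separator.
  let id = "";
  for (let i = 0; i < numbers.length; i++) {
    const separator = alphabet.charAt(0);
    id += encodeDigits(numbers[i], alphabet.slice(1));
    if (i < numbers.length - 1) id += separator;
    alphabet = shuffle(alphabet);
  }
  return id;
}

// Step 6: re-encode with an incremented offset while the ID is blocked.
// The loop ends either with an allowed ID or with the error thrown above.
function encodeAvoidingBlocked(
  numbers: number[],
  baseAlphabet: string,
  shuffle: Shuffle,
  isBlocked: (id: string) => boolean
): string {
  for (let increment = 0; ; increment++) {
    const id = encodeSketch(numbers, baseAlphabet, shuffle, increment);
    if (!isBlocked(id)) return id;
  }
}
```

Because the separator is removed from the alphabet before each number is encoded, it can never appear inside an encoded number, which is what lets a decoder split the ID back into its parts.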

Decoding is the same process but in reverse. A few things worth noting:

  • If two separators are right next to each other within the ID, that's fine - it just means the rest of the ID is junk characters used to satisfy the minLength requirement
  • The decoding function does not check if ID is valid/canonical, because we want blocked IDs to still be decodable (the user can check for this stuff themselves by re-encoding decoded numbers)
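
The re-encoding check mentioned in the last point could look roughly like this. It is a sketch: the `Codec` interface and the `isCanonical` helper are hypothetical names introduced for this example, standing in for whatever encode/decode pair a given port exposes.

```typescript
// Hypothetical minimal shape of an encoder; not the API of any specific port.
interface Codec {
  encode(numbers: number[]): string;
  decode(id: string): number[];
}

// An ID is treated as canonical when decoding it and re-encoding the
// resulting numbers yields exactly the same string.
function isCanonical(codec: Codec, id: string): boolean {
  const numbers = codec.decode(id);
  // An empty result means the ID could not be decoded at all.
  if (numbers.length === 0) return false;
  return codec.encode(numbers) === id;
}
```

A caller that wants to reject blocked or hand-crafted IDs can run this check after decoding, while callers that need old IDs to keep working can skip it.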

How to port Sqids to another language?

Implementations of new languages are more than welcome! To start:

  1. Make sure the language is not already implemented. At this point, if you see a Hashids implementation, but not a Sqids implementation: we could use your help on converting it.
  2. The main spec is here: https://github.com/sqids/sqids-spec/blob/main/src/index.ts. It's ~300 lines of code and heavily commented. The comments are there for clarity; they don't have to exist in your own implementation.
  3. Fork the repository/language you'd like to implement to your own Github account. If the repository/language does not exist under the Sqids Github account, open a new issue under the spec repo so we can create a blank repo first.
  4. Implement the main library + unit tests + Github Actions (if applicable). You do not need to port tests in the internal folder; they are there to test the algorithm itself.
  5. Add a README.md -- you can re-use any of the existing ones.
  6. Please use the blocklist from https://github.com/sqids/sqids-blocklist. It will contain the most up-to-date list. Do not copy and paste the blocklist from other implementations, as they might not be up-to-date.
  7. Create a pull request, so we can review & merge it.
  8. If the repo has no active maintainers, we'll invite you to manage it (and maybe even merge your own PR).
  9. Once the library is ready, we'll update the website.

Notes

  • The prefix character is used to randomize sequential inputs (e.g. [0, 1], [0, 2], [0, 3]). Without the extra prefix character embedded into the ID, the output would start with the same characters.
  • The internal shuffle function does not use random input. It consistently produces the same output.
  • If new words are blocked (or removed from the blocklist), the encode() function might produce new IDs, but the decode() function would still work for old/blocked IDs, plus new IDs. So, more than one ID can be produced for the same numbers.
  • FAQ section is here: https://sqids.org/faq
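
A shuffle with no random input can be sketched as below. This is one possible deterministic shuffle written for illustration; it is not claimed to match the spec's function line for line, and the name `consistentShuffle` is an assumption.

```typescript
// Deterministic shuffle: swap positions are derived only from the indices
// and the character codes, so the same input always gives the same output.
function consistentShuffle(alphabet: string): string {
  const chars = alphabet.split("");
  for (let i = 0, j = chars.length - 1; j > 0; i++, j--) {
    // Pick a swap target from the current indices and the two characters.
    const r = (i * j + chars[i].charCodeAt(0) + chars[j].charCodeAt(0)) % chars.length;
    const tmp = chars[i];
    chars[i] = chars[r];
    chars[r] = tmp;
  }
  return chars.join("");
}
```

Since the function only swaps characters, the result is always a permutation of the input, and calling it twice on the same string returns the same string both times.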

๐Ÿป License

Every official Sqids library is MIT-licensed.

sqids-dotnet's Issues

Suggestions on `SqidsEncoder`

SqidsEncoder now has two versions, non-generic and generic. The code for both is in one file with a lot of conditional compilation, which makes it a bit messy to read.

My suggestions:

  • Create a new file for SqidsEncoder<T>, for example, SqidsEncoder{T}.cs. This file would hold only the generic version, with a single #if NET7_0_OR_GREATER.
  • When the target framework is NET7_0_OR_GREATER, we can still keep a non-generic version and let it derive from SqidsEncoder<int>. That way, upgrading from versions below .NET 7 is easier and smoother.
  • Create a shared file for code reused by both versions, for example, SqidsEncoder.Shared.cs.

I'll test this first and open a pull request when I'm free.

Do you have an example of using this with a custom JSON converter?

I'm using minimal API with .NET 7 and I wanted to add an attribute to my ID fields so that when the JSON encoding/decoding happens, it automatically runs it through this library. The issue I'm having is that when I call builder.Services.ConfigureHttpJsonOptions there's not an IServiceProvider available to pass to the custom converter, meaning it doesn't have access to the injected SqidsEncoder.

I want to do something like this:

public sealed class SqidJsonConverter : JsonConverter<int> {
    public override int Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) {
        if (reader.GetString() is not string encoded)
            throw new JsonException();

        return options
            .GetServiceProvider()
            .GetRequiredService<SqidsEncoder<int>>()
            .Decode(encoded)
            .FirstOrDefault();
    }

    public override void Write(Utf8JsonWriter writer, int value, JsonSerializerOptions options) {
        var sqid = options.GetServiceProvider().GetRequiredService<SqidsEncoder<int>>();

        writer.WriteStringValue(sqid.Encode(value));
    }
}

A memory-alloc question in the constructor of `SqidsEncoder`

Sorry to interrupt you again. I have a small question: why do you use Span in the following code?

                // TODO: `sizeof(T)` doesn't work, so we resorted to `sizeof(long)`, but ideally we should get it to work somehow; see https://github.com/sqids/sqids-dotnet/pull/15#issue-1872663234
		Span<char> shuffledAlphabet = options.Alphabet.Length * sizeof(long) > MaxStackallocSize // NOTE: We multiply the number of characters by the size of a `char` to get the actual amount of memory that would be allocated.
			? new char[options.Alphabet.Length]
			: stackalloc char[options.Alphabet.Length];
		options.Alphabet.AsSpan().CopyTo(shuffledAlphabet);
		ConsistentShuffle(shuffledAlphabet);
		_alphabet = shuffledAlphabet.ToArray();

First of all, the allocation of a char array for _alphabet is unavoidable.

Then, if the alphabet size is less than MaxStackallocSize, shuffledAlphabet is allocated on the stack but the last code line converts it to a char[]. Otherwise, shuffledAlphabet.ToArray() will create another new char[] and shuffledAlphabet is going to be collected by GC, which means double allocation.

Why not just let shuffledAlphabet be a char array and assign it to _alphabet? Span seems to be unnecessary here.

Ensure decoding an invalid ID with a repeating reserved character does not crash

Spec has been adjusted to include an additional check: sqids/sqids-spec@f52b578

What happens without that check is that when a reserved character (eg: offset character) is removed from the alphabet during decoding, conversion from ID to number becomes impossible when that same character is once again present in the ID (under normal conditions this never happens during encoding).

The best way to check for this is to try & decode an ID fff (like in the above link). Please ensure the library tests for that & returns an empty array under these conditions.

Encode types other than int

Hello;

I am looking to use your library to encode/decode an id/timestamp value. I need to use a GUID for the ID and a ulong for the timestamp.

Is there any way to hash those values together, using this library?

When will this be implemented?

Hi. I want to use this in a .NET project. Is this being implemented, or should I just go back to Hashids.net instead?

Always decoding, even with random string

var sqids = new SqidsEncoder<long>(new SqidsOptions { Alphabet = _sqidAlphabet });
return sqids.Decode(sqid);

This code always decodes, even with a random string. But the playground returns Error: Invalid ID with the same string. Am I doing something wrong or missing a very primitive concept?
