Giter Site home page Giter Site logo

Documentation of Algorithm about xxhash HOT 37 CLOSED

cyan4973 avatar cyan4973 commented on August 21, 2024
Documentation of Algorithm

from xxhash.

Comments (37)

stbrumme avatar stbrumme commented on August 21, 2024 10

I wrote a few words about the algorithm on my website but it's definitely not a thorough documentation: http://create.stephan-brumme.com/xxhash/
In addition, you will find my own implementation which skips xxHash64 and alignment/endian issues and therefore boils down to just 182 lines - most of it are comments.

from xxhash.

EamonNerbonne avatar EamonNerbonne commented on August 21, 2024 8

The current code is quite long with quite a few tricks to improve performance - a version that's optimized for brevity and readability instead of raw speed might do the trick.

from xxhash.

mnot avatar mnot commented on August 21, 2024 4

Could I suggest you write it up as a (short) Internet-Draft and submit to be published an RFC?

That would allow it to be referenced by other RFCs (which is why I'm here: we're looking for a non-cryptographic hash algo for this spec).

Happy to help with prepping the I-D and navigating IETF process if you're interested.

from xxhash.

Bulat-Ziganshin avatar Bulat-Ziganshin commented on August 21, 2024 3

That's right - the current code has a lot of extras. This section, though, contains algoritmhs in almost bare state (skip XXH32/XXH64 since they are also utility):

xxHash/xxhash.c

Line 324 in 228d727

/* ***************************

from xxhash.

alnsn avatar alnsn commented on August 21, 2024 3

I think that Film/TV production community wants to see a wikipedia article :)

from xxhash.

dcwar avatar dcwar commented on August 21, 2024 3

Heartbeat ping. I know you're still busy with lz4 and other projects. :) However, I just realized it's been two years since this issue was opened.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024 2

For information on this item :

with Zstandard v1.0 released, I'll try to save time for a new release of LZ4 this month.
Once this is completed, my following objective will be xxHash documentation, and see if it can be moved towards RFC status.

from xxhash.

dcwar avatar dcwar commented on August 21, 2024 1

I'm absolutely certain that a good faith effort in documenting the algorithm is better than none at all. :)

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024 1

Thanks, it's an excellent presentation @stbrumme .
It's not a "specification", but is really useful to understand the hash function main principles.
Then the provided source code makes the function result non-ambiguous.

I guess it is a good basis for a potential Wikipedia article.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024 1

I agree with both suggestions,
especially as they can reinforce each other.

Also, having the support of someone who actually knows the RFC process is a huge plus in this direction.

from xxhash.

dcwar avatar dcwar commented on August 21, 2024 1

Checking in on this. I know you've been busy with lz4.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024 1

Indeed, you are right, this is a mistake.
Thanks for spotting it @masyos !

from xxhash.

exussum12 avatar exussum12 commented on August 21, 2024 1

Just implemented a PHP version using only that doc. So seems to be complete (at least for the 32 bit version)

Thank you!

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

Currently, the code source is the documentation.

Writing a specification as a dedicated document is an effort to be done.

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

Thanks for the quick response! I understand, of course. Any prospective time frame at present?

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

Not really. It's difficult to prioritize this action with most of my time currently tied to zstd development.

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

I completely understand - thanks!

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

The request was made because some end users in the Film/TV production community have expressed doubts as to the validity of an undocumented algorithm. Pointing them to source code, instead of documentation, doesn't address the underlying concern.

from xxhash.

Bulat-Ziganshin avatar Bulat-Ziganshin commented on August 21, 2024

But what they mean by documentation? Or what they can accept? We can copy C code of algorithm into pdf file, is it enough?

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

By documentation, I mean a plain English explanation of how the algorithm works. :)

from xxhash.

Bulat-Ziganshin avatar Bulat-Ziganshin commented on August 21, 2024

You mean translating "x=x*z" into "At next step x multiplied by z" or something higher-level? For algorithm of ~20 lines, sources is best description, so i really cannot understand what else they need.

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

Just like I said, a plain English explanation of how the algorithm works. Not everyone works with source code. :)

If you are looking for a style reference, the description of the algorithm in the MD5 wikipedia article is a starting point:
https://en.wikipedia.org/wiki/MD5#Algorithm

from xxhash.

Bulat-Ziganshin avatar Bulat-Ziganshin commented on August 21, 2024

It's informal description outlining main algorithm details, but not the whole algorithm. Note absence of g and a0 in the text. I think that formal description would be close to plain C-to-English translation, just because English isn't good at formal descriptions. Math or programming languages are better for that needs. Papers about crypto-algorithms including cryptohashes tend to use Math.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

We'll make an effort to document it.
And yes, even the documentation will contain some lines of codes,
as sometimes, it's the simplest and most exact way to express a formula.

It's just a question of time allocation, and currently it's very hard to find time.
But situation will improve in the future.

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

Thank you, Yann!

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

In which case, it's much less thorough than a formal spec.
It still requires time to do, but a lot more people can be involved

from xxhash.

dcwar avatar dcwar commented on August 21, 2024

Agreed that this is a fantastic start @stbrumme , and the graphic is very nice. The example source code certainly helps understand the operation.

Detailing the operation of the hash itself in prose seems like the next step. The FNV Wikipedia page that you linked to does a nice job of explaining the function of the hash in both prose and psuedocode, and I think we'd be on the right track for following that style in creating general purpose 'Wikipedia' documentation.

from xxhash.

stbrumme avatar stbrumme commented on August 21, 2024

@mnot : we should write a Wikipedia article before starting an RFC. That way we could collaboratively figure out the best way to represent the algorithm. I'm not a native speaker (English was actually only the second foreign language I was taught at school) and often have doubts about the proper choice of words.

from xxhash.

darkuranium avatar darkuranium commented on August 21, 2024

Seconded!

I have another reason to add as to why a documentation is needed, @Cyan4973. Namely, without xxHash being documented, LZ4 format documentation is incomplete, as it is impossible to make a full (de)compressor from scratch .

My use case is that a project I'm working on is going to use LZ4 --- although LZ4 is liberally licensed (indeed, I am a fan of the 2-clause BSD license and Copyfree licensing in general), it is not quite liberal enough for this particular project, which will primarily use the even more liberal Boost license (*note below). That means that LZ4 is license-incompatible with my project, and it is why I'm working on a clean-room implementation of LZ4.
But a clean-room implementation of LZ4 needs a clean-room implementation of xxHash ... and with xxHash being undocumented, an LZ4 implementation simply cannot be written. My current decompressor thus simply ignores hashing; making a compliant compressor is impossible because of the non-optional xxHash in the header (other hashes are optional, at least as far as I can tell).

(*note: I might release parts of the code [e.g. xxHash and LZ4] under a public domain-equivalent license instead [most likely CC0], which is --- of course --- also incompatible with the 2-clause BSD license)

from xxhash.

bitinn avatar bitinn commented on August 21, 2024

I am trying to add my vote but found comment from @Bulat-Ziganshin and @stbrumme pretty much explain my questions.

Being a noob at understanding fast hash functions, I think starting from a simple input, such as int32, where byte length is known and code is much simplified (no sub-dividing necessary), would be a very good idea.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

This action is not forgotten ;)
But it's hard to find time to prioritize correctly...

from xxhash.

darkuranium avatar darkuranium commented on August 21, 2024

Heartbeat ping. Please don't make us clean-room-reverse engineer something that's already open-source ...

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

I expected to have some time for this action during the Summer.
Clearly I failed.
My next time-slot for documentation-related activities is this coming October.
Let's put that one in.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

Thanks for the reminder @dcwar .
This gives me the incentive to finish the document in the near future.
I've got something in good shape, just need some time reading and polishing.

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

There is a tentative first version available at :
https://github.com/Cyan4973/xxHash/wiki/xxHash-specification

from xxhash.

masyos avatar masyos commented on August 21, 2024

Algorithm documentation, thank you!
As for the question, the document XXH64 step 5 is different from the source and PRIME, is the source the correct formula?

document

while (remainingLength >= 8) {
...
acc = acc + PRIME64_5;

xxhash.c

   while (p+8<=bEnd) {
        ...
        h64 = XXH_rotl64(h64,27) * PRIME64_1 + PRIME64_4;

from xxhash.

Cyan4973 avatar Cyan4973 commented on August 21, 2024

The specification is now integrated into the repository,
within /doc directory.

from xxhash.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.