Giter Site home page Giter Site logo

shockhash's Introduction

ShockHash

Small, heavily overloaded cuckoo hash tables inside the RecSplit framework. Constructs very compact perfect hash functions significantly faster than previous approaches.

Construction performance

Plots preview

Library Usage

Clone (with submodules) this repo and add it to your CMakeLists.txt:

add_subdirectory(path/to/ShockHash)
target_link_libraries(YourTarget PRIVATE ShockHash)

Then use one of the following classes:

  • ShockHash is the original ShockHash algorithm integrated into the RecSplit framework.
  • SIMDShockHash is the SIMD-parallel version of the original ShockHash algorithm. Both ShockHash and the RecSplit framework are SIMD-parallelized. If this implementation is used on a machine without SIMD support, it is slower than the non-SIMD version because SIMD operations are emulated.
  • ShockHash2 is the bipartite ShockHash algorithm. Only the inner ShockHash loop is SIMD-parallel, the RecSplit framework is not. If this implementation is used on a machine without SIMD support, the implementation uses sequential operations without explicitly emulating SIMD. To turn off SIMD, change to SIMD lanes of size 1 in ShockHash2-internal.h.

Constructing a ShockHash perfect hash function is then straightforward:

std::vector<std::string> keys = {"abc", "def", "123", "456"};
shockhash::ShockHash<30, false> shockHash(keys, 2000); // ShockHash base case size n=30, bucket size b=2000
std::cout << shockHash("abc") << " " << shockHash("def") << " "
          << shockHash("123") << " " << shockHash("456") << std::endl;
// Output: 1 3 2 0

We also give the base-case implementations without the RecSplit framework, which makes it easier to understand the main idea.

  • Original ShockHash.
  • Bipartite ShockHash. The outer loop that is also given in the pseudocode of the paper is given in BijectionsShockHash2::findSeed.

Licensing

ShockHash is licensed exactly like libstdc++ (GPLv3 + GCC Runtime Library Exception), which essentially means you can use it everywhere, exactly like libstdc++. You can find details in the COPYING and COPYING.RUNTIME files.

If you use ShockHash or bipartite ShockHash in an academic context or publication, please cite our papers:

@misc{lehmann2023shockhash,
      title={ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force},
      author={Hans-Peter Lehmann and Peter Sanders and Stefan Walzer},
      year={2023},
      eprint={2308.09561},
      archivePrefix={arXiv},
      primaryClass={cs.DS}
}

@misc{lehmann2023bipartite,
      title={Bipartite ShockHash: Pruning ShockHash Search for Efficient Perfect Hashing},
      author={Hans-Peter Lehmann and Peter Sanders and Stefan Walzer},
      year={2023},
      eprint={2310.14959},
      archivePrefix={arXiv},
      primaryClass={cs.DS}
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.