Giter Site home page Giter Site logo

jbapple / libfilter Goto Github PK

View Code? Open in Web Editor NEW
30.0 5.0 6.0 2.17 MB

High-speed Bloom filters and taffy filters for C, C++, and Java

License: Apache License 2.0

Makefile 3.10% C++ 41.18% C 31.66% Java 16.81% CMake 2.15% Go 2.25% Python 2.85%
c cpp data-structures library bloom-filter bloom-filters bloomfilter java

libfilter's People

Contributors

jbapple avatar mavam avatar tobim avatar ttsugriy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

libfilter's Issues

[Java] Block filter has high fpp

I noticed the com.github.jbapple.libfilter.TaffyBlockFilter has a high fpp (100% at some point). Likely there is a simple bug:

private void MakeMask(long hash, int[] result) {
    // The loops are done individually for easier auto-vectorization by the JIT
    for (int i = 0; i < 8; ++i) {
      result[0] = (int) hash * INTERNAL_HASH_SEEDS[i]; <== BUG HERE
    }
    for (int i = 0; i < 8; ++i) {
      result[i] = result[i] >>> (32 - 5);
    }
    for (int i = 0; i < 8; ++i) {
      result[i] = 1 << result[i];
    }
  }

To me it looks like it should be result[i] instead of result[0].

Benchmark for different filters

Thanks for the amazing work.
Can we please also add some benchmarking results for different filters.
Trying to understand the Build latency , query latency , fpp rates and bits per value for these different filters to choose something that fits under some of opur sla requirements.
use case is upper bound 50-100 million values to be added , for a given bits/value , what would be the memory size for the filter and build latency when adding huge number of elements to the filter

Save the filter to SSD

Thank you for implementing this awesome filter, I have a question I would like to ask, I wonder if you can give me some pointers.
I want to use C/C++ to save the filter to SSD so that it can be load again next time, or directly use mmap the filter to the memory instead of reading the entire filter into the memory.
grateful.

split block bloom filters implementation

While i am going through some amazing work you are doing , wanted to get your thoughts on some implementation details or caveats

IIUC SBBF is same as in https://github.com/apache/kudu/blob/master/src/kudu/util/block_bloom_filter.h ? Are there any implementation difference or caveats in your particular approach (any significant perf difference) ? Bit confused around terminology here between block and split block so wanted to understand better if your implementation is something additional on top of classical blocked bloom filter.

I found your repo through : https://arxiv.org/pdf/2101.01719.pdf , it says for the split block bloom fpp are somewhat fixed due to pre-determined number of hash functions used . In your repo example is see double fpp = 0.0065; , is this because we are using a fixed number of hashes and bits per value (pre-configured at the given fpp level) . Am i able to tune this for lower fpp by using more hash functions and may be more bits per value ? What is the current default bits per value set in current SBBF implementation ? which parameter is that ?

In the apache kudu project which uses block bloom filter , here : https://github.com/apache/kudu/blob/master/src/kudu/util/block_bloom_filter.h#L66 it says we are able to get 0.1% fpp with 15 bits per value , is it possible to achieve this in your split block BF implementation as well , or are we stuck to 0.4% - 19% range only ,

thanks for bearing with my naive questions as i am new into this area and still trying to figure out best filter for my case where i have strict latency requirement and need to insert ~10-100 million ids , these ids are not uuid but sequential range [0-100M] , so was trying to use roaring bitmaps before but now looking into this probabilistic filters since constructing filters is a potential bottleneck for me .

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.