This fork extends the implementation to function as a key-value store with similar properties to Cucko Filter.
The original implementation allowed counting the number of insertions, when storing values alongside a fingerprint we can't distinguish values with the same fingerprint anymore. Therefore this implementation drops support for counting. This does not limit its ability to delete entries.
CODE IS BARELY TESTED AND STILL REQUIRES SOME MAINTENANCE!
Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.
Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).
For details about the algorithm and citations please use this article for now
"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky
extern crate cuckoofilter;
...
let value: &str = "hello world";
// Create cuckoo filter with default max capacity of 1000000 items
let mut cf = cuckoofilter::new();
// Add data to the filter
let success = cf.add(value).unwrap();
// success ==> Ok(())
// Lookup if data is in the filter
let success = cf.contains(value);
// success ==> true
// Test and add to the filter (if data does not exists then add)
let success = cf.test_and_add(value).unwrap();
// success ==> Ok(false)
// Remove data from the filter.
let success = cf.delete(value);
// success ==> true
This crate has a C interface for embedding it into other languages than Rust. See the C Interface Documentation for more details.
- This implementation uses a a static bucket size of 4 fingerprints and a fingerprint size of 1 byte based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
- When the filter returns
NotEnoughSpace
, the element given is actually added to the filter, but some random other element gets removed. This could be improved by implementing a single-item eviction cache for that removed item. - There are no high-level bindings for other languages than C. One could add them e.g. for python using milksnake.