Giter Site home page Giter Site logo

xx-bloom's Introduction

Yet another Bloom filter implementation for node.js. Everybody has to write one, as you know. Backed by Xxhash via node-xxhash. Xxhash is a fast general-purpose hash, which is all a bloom filter needs. Three variations are provided: a straight Bloom filter, a counting filter (from which items can be removed), and a straight Bloom filter backed by redis. The first two have synchronous APIs. The redis one perforce requires callbacks.

To install: npm install bloomxx

on npm Build Status Coverage Status

Usage

BloomFilter

To create a filter, pass an options hash to the constructor:

var options =
{
	bits: 1024,
	hashes: 7,
	seeds: [1, 2, 3, 4, 5, 6, 7]
};
filter = new BloomFilter(options);

You can pass in seeds for the hash functions if you like, or they'll be randomly generated. Seeds must be integers.

You may also pass in a buffer as generated by filter.toBuffer().

createOptimal()

To create a filter optimized for the number of items you'll be storing and a desired error rate:

filter = BloomFilter.createOptimal(estimatedItemCount, errorRate);

The error rate parameter is optional. It defaults to 0.005, or a 0.5% rate.

add()

filter.add('cat');

Adds the given item to the filter. Can also accept buffers and arrays containing strings or buffers:

filter.add(['cat', 'dog', 'coati', 'red panda']);

has()

To test for membership:

filter.has('dog');

clear()

To clear the filter:

filter.clear();

toBuffer()

Returns a buffer with seeds and filter data.

fromBuffer()

Reconstitutes a filter from a freeze-dried buffer.

CountingFilter

Uses about 8 times as much space as the regular filter. Basic usage is exactly the same as the plain Bloom filter:

filter = new CountingFilter({ hashes: 8, bits: 1024 });`
filter2 = CountingFilter.createOptimal(estimatedItemCount, optionalErrorRate);

Add a list, test for membership, then remove:

filter.add(['cat', 'dog', 'coati', 'red panda']);
filter.has('cat'); // returns true
filter.remove('cat');
filter.has('cat'); // returns false most of the time

The counting filter tracks its overflow count in filter.overflow. Overflow will be non-zero if any bit has been set more than 255 times. Once the filter has overflowed, removing items is no longer reliable.

Check for overflow:

filter.hasOverflowed(); // returns boolean
filter.overflow; // integer count of number of times overflow occurred

RedisFilter

This is a plain vanilla bloom filter backed by redis. Its api is asychronous.

RedisFilter.createOrRead({
		key: 'cats', // the key used to store data in redis; will also set 'cats:meta'
		bits: 1024,  // filter size in bits
		hashes: 8,   // number of hash functions
		redis: redis.createClient(port, host)  // redis client to use
	}, function(err, filter)
	{
		filter.add(['cat', 'jaguar', 'lion', 'tiger', 'leopard'], function(err)
		{
			filter.has('caracal', function(err, result)
			{
				assert(result === false);
			});
		});
	});

The options hash can also specify host and port, which will be used to create a redis client. createOrRead() will attempt to find a filter saved at the given key and create one if it isn't found.

createOptimal(itemCount, errorRate, options)

Returns a filter sized for the given item count and desired error rate, with other options as specified in the options hash.

clear(function(err) {})

Clear all bits.

del(function(err) {})

Delete the filter from redis.

Licence

MIT.

xx-bloom's People

Contributors

ceejbot avatar greenkeeperio-bot avatar

Stargazers

Github Notification avatar Tuan Anh Tran avatar Mitch Crowe avatar yarik ponomarenko avatar Ava Howell avatar  avatar lagleki avatar  avatar Tom Thorogood avatar Val Packett avatar  avatar Luke avatar Avraam Mavridis avatar Francesc Rosas avatar Marcel Toben avatar 别天 avatar Jean Christophe Roy avatar Connor Peet avatar Angus H. avatar Joe Graham avatar Jon Baer avatar Matt avatar Michael Anthony avatar Aaron Lidman avatar Valentin Vichnal avatar Kumarajiva avatar Joshua Stauter avatar SDF avatar  avatar Isao Yagi avatar JT5D avatar Fatih Kaya avatar Edgar de Oliveira Maciel avatar  avatar Alexandru Vlăduţu avatar Steve Thomas avatar Umar Hansa avatar Joseph avatar Jarrett Cruger avatar Arnout Kazemier avatar Yann Collet avatar Carter Cole avatar Arjun Bajaj avatar Matteo Collina avatar

Watchers

 avatar James Cloos avatar yuan avatar Deepak Verma avatar Michael Anthony avatar  avatar

xx-bloom's Issues

Counting filter / overflow issue

I was playing around with the counting filter and I'm seeing some problems with the hasOverflowed function - it seems like it is pretty wildly off.

Take this example:

var
  bloom   = require('bloomxx'),
  cFilter = bloom.CountingFilter.createOptimal(75),
  i;

for (i = 0; i < 10000; i += 1) {
  cFilter.add(String(Math.random()));
}
cFilter.add('kyle');

console.log('overflowed?', cFilter.hasOverflowed(), cFilter.overflow);

console.log('has name: kyle?',cFilter.has('kyle'));
console.log('has name: bob?',cFilter.has('bob'));
console.log('removed kyle');
cFilter.remove('kyle');
console.log('has name: kyle?',cFilter.has('kyle'));

For me, this is resulting in the following output:

overflowed? false 0
has name: kyle? true
has name: bob? true
removed kyle
has name: kyle? true

Since the loop is only inserting random numbers and the we manually insert 'kyle', 'bob' should not be returning true. I would also guess that an optimal filter for size 75 would have overflowed way before 10,001 members. Or am I missing something in the implementation?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.