oldpanda / bloomfilter-py

Yet another Bloomfilter implementation in Python, compatible with Java's Guava library

Home Page: https://pypi.org/project/bloomfilter-py/

License: MIT License

bloomfilter-py's Introduction

bloomfilter-py

Overview

Yet another Bloomfilter implementation in Python, compatible with Java's Guava library.

I was looking for a Python library that can read what the Bloomfilter in Java's Guava library serializes and can also output a byte array that Java recognizes, but I could not find one. Hence I developed this library, borrowing heavily from how Guava implements Bloomfilter serialization and deserialization, so that Bloomfilters can be exchanged between the Python and Java sides.

For Bloomfilter usage in the Java world, please refer to this post.

Here's a brief introduction to Bloomfilter.

Requirements

  • Python 3.7+

This library has not been tested on Python 3.6 or earlier versions.

Install

pip install bloomfilter-py

Usage Examples

Basic Usage

>>> from bloomfilter import BloomFilter
>>> bloom_filter = BloomFilter(expected_insertions=500, err_rate=0.01)
>>> for i in range(100):
...     bloom_filter.put(i)
...
>>> 1 in bloom_filter
True
>>> 100 in bloom_filter
False
>>>
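
Note that `100 in bloom_filter` returns False here because 100 was never inserted, but a Bloom filter can occasionally report a false positive, at a rate of roughly `err_rate` once `expected_insertions` items have been added. As a rough illustration of what those two parameters imply, the sketch below applies the standard textbook sizing formulas. Whether this library sizes its bit array in exactly this way is an assumption, and the function names are hypothetical, not part of the library's API.

```python
import math

def optimal_num_bits(expected_insertions: int, err_rate: float) -> int:
    # Textbook formula: m = -n * ln(p) / (ln 2)^2, truncated to an integer.
    return int(-expected_insertions * math.log(err_rate) / (math.log(2) ** 2))

def optimal_num_hashes(expected_insertions: int, num_bits: int) -> int:
    # Textbook formula: k = (m / n) * ln 2, rounded, and never less than 1.
    return max(1, round(num_bits / expected_insertions * math.log(2)))

# Same parameters as the example above (hypothetical helper names).
num_bits = optimal_num_bits(expected_insertions=500, err_rate=0.01)
num_hashes = optimal_num_hashes(expected_insertions=500, num_bits=num_bits)
print(num_bits, num_hashes)
```

A lower `err_rate` or a higher `expected_insertions` increases the number of bits, and therefore the size of the serialized output.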

Serialize Bloomfilter

You can easily serialize a BloomFilter instance to a byte array

>>> dumps = bloom_filter.dumps()
>>> with open("dumps.out", "wb") as f:
...     f.write(dumps)
...
>>>

or to a hex string

>>> hex_str = bloom_filter.dumps_to_hex()

or to base64-encoded bytes

>>> base64_bytes = bloom_filter.dumps_to_base64()
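
For orientation, Guava documents its serial form as 1 signed byte for the strategy, 1 unsigned byte for the number of hash functions, 1 big-endian int giving the number of longs, and then that many big-endian longs. The sketch below parses that layout with the standard `struct` module. It assumes the bytes produced by `dumps()` follow the same layout (which the compatibility goal suggests but this README does not spell out), and both `parse_guava_layout` and the sample payload are hypothetical, built by hand for illustration.

```python
import struct

def parse_guava_layout(data: bytes):
    """Split a Guava-style serialized Bloom filter into header fields and 64-bit words."""
    # ">bBi": big-endian signed byte, unsigned byte, 32-bit int (6 bytes in total).
    strategy, num_hash_functions, num_longs = struct.unpack_from(">bBi", data, 0)
    # The bit array follows the header as num_longs big-endian signed 64-bit integers.
    words = struct.unpack_from(f">{num_longs}q", data, 6)
    return strategy, num_hash_functions, num_longs, words

# Hand-built stand-in payload (not real library output):
# strategy 1, 7 hash functions, 2 longs holding the values 16 and -1.
sample = struct.pack(">bBi2q", 1, 7, 2, 0x10, -1)
print(parse_guava_layout(sample))
```

Because the payload is raw binary rather than text, it should be written to files opened in binary mode, as in the example above.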

Deserialize Bloomfilter

And you can easily initialize a BloomFilter instance from a byte array

>>> with open("dumps.out", "rb") as f:
...     bf = BloomFilter.loads(f.read())
...
>>> 1 in bf
True
>>> 100 in bf
False
>>>

or from a hex string

>>> bf = BloomFilter.loads_from_hex(hex_str)
>>> 1 in bf
True
>>> 100 in bf
False

or from base64-encoded bytes

>>> bf = BloomFilter.loads_from_base64(base64_bytes)
>>> 100 in bf
False
>>> 200 in bf
False
>>> 1 in bf
True
>>> 99 in bf
True

bloomfilter-py's Issues

Byte decoding fails while converting to string.

Use case:

  • To convert the bloom filter object into a string in order to store and retrieve as and when required.

Problem:

  • Calling decode() on the bytes object returned by dumps() to turn it into a string fails in some cases (typically when the integers added to the filter are larger), as shown below:

    File ~/.pyenv/versions/3.10.6/lib/python3.10/encodings/utf_8.py:16, in decode(input, errors)
       15 def decode(input, errors='strict'):
    ---> 16     return codecs.utf_8_decode(input, errors, True)
    
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 6: invalid start byte
    

Code to reproduce:

import random
from bloomfilter import BloomFilter
bf = BloomFilter(expected_insertions=10, err_rate=0.1)
bf.put(random.randint(100000000, 10000000000))
bf.put(random.randint(100000000, 10000000000))
bf.dumps().decode()

My observations:

The encoding used when serializing the bloom filter to bytes does not appear to be any of utf-8, ascii, utf-16, or utf-32, as the byte values produced cannot be decoded by any of these codecs.

Probable solutions:

  1. Use the utf-8 encoding while serializing the bloom filter object.
  2. Allow users to specify the encoding to be used.
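
A possible workaround, sketched here as a note rather than a confirmed fix: the serialized output is presumably raw binary data, not encoded text, so no text codec is expected to decode it reliably. Binary data can still be stored as a string by mapping it to ASCII with base64 or hex, which is the idea behind the `dumps_to_base64()` and `dumps_to_hex()` methods shown in the README. The example below uses only the standard library, and `payload` is a made-up stand-in for the result of `bf.dumps()`, not real library output.

```python
import base64

# Hypothetical stand-in for bf.dumps(); the values are made up for illustration.
payload = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x01,
                 0x80, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x02])

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes happen to form valid UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Base64 and hex turn arbitrary bytes into ASCII text and back without loss.
as_base64 = base64.b64encode(payload).decode("ascii")
as_hex = payload.hex()

restored_from_base64 = base64.b64decode(as_base64)
restored_from_hex = bytes.fromhex(as_hex)

print(decodes_as_utf8(payload), as_base64, as_hex)
```

With the library itself, the equivalent would presumably be to store the result of `dumps_to_hex()`, or of `dumps_to_base64()` decoded as ASCII, and to load it back with `loads_from_hex()` or `loads_from_base64()`.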
