minhash2.0's Introduction

On the Privacy of Sublinear-Communication Jaccard Index Estimation via Min-hash Sketching

This repository is the official implementation of [On the Privacy of Sublinear-Communication Jaccard Index Estimation via Min-hash Sketching].

Requirements

To install requirements:

pip install -r requirements.txt

The experiments are implemented in Python 3 and expect pip version >= 22.0.2.

To run the experiment, run:

python3 main.py

A graph is generated and stored for each section. To adjust the size or scale of an output graph, e.g. to an 8x6 figure, use "fig, ax = plt.subplots(figsize=(8, 6))" in the corresponding graph function. The experiments take a while to finish, especially "number of iterations vs Jaccard index" for the public hash setting (MinhashGraphPBinomJaccardVsK()), because it uses binary search to find the optimal number of iterations. We therefore suggest running MinhashGraphPBinom() instead, which produces "epsilon vs delta" for a given intersection size and Jaccard index, to get a quick assessment of the trade-off between intersection size, Jaccard index, and privacy parameters. Raw experiment results, including the cases n=100k and n=1 million, are also included as an Excel chart with graphs.
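To illustrate why a binary search over the number of iterations can be slow, here is a minimal sketch of the general pattern. It is not the code in main.py. The function name smallest_k, the predicate meets_target, and the toy condition are all hypothetical, and the sketch assumes the condition is monotone in k (once it holds, it holds for every larger k). Whether the repository searches for the smallest or the largest admissible k is not stated here, so treat the direction as an assumption too.

```python
from typing import Callable


def smallest_k(meets_target: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest k in [lo, hi] for which meets_target(k) is True.

    Assumption: meets_target is monotone (False ... False True ... True).
    """
    if not meets_target(hi):
        raise ValueError("no k in the given range meets the target")
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_target(mid):
            hi = mid        # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1    # mid fails, so the answer is larger than mid
    return lo


# Toy stand-in for an expensive per-k evaluation (hypothetical condition).
evaluations = []


def toy_condition(k: int) -> bool:
    evaluations.append(k)
    return k * k >= 500


k_found = smallest_k(toy_condition, 1, 1024)
print(k_found, len(evaluations))
```

The search needs only about log2(hi - lo) evaluations of the condition, but if each evaluation is itself costly, and the search is repeated for every point on a curve, the total running time adds up quickly.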

Results

Below are the empirical evaluation results for the parameters in both the curator setting and the public hash setting. See Section 6 of our paper for details.

The results for the curator setting:

eps, delta: Figure

Iteration vs Jaccard: Figure

The results for the public hash setting:

eps, delta: Figure

Iteration vs Jaccard: Figure
