Giter Site home page Giter Site logo

near-set-reconciliation's Introduction

I like to make stuff work in Rust ๐Ÿฆ€.

I created following repos:

I constributed to, for example:

near-set-reconciliation's People

Contributors

pmnoxx avatar

Watchers

 avatar

near-set-reconciliation's Issues

Code review results

This is what I found after a quick review:

In blt.rs:

  1. You should not use DefaultHasher, it doesn't guarantee consistency across program instances.
  2. This check if i != 0 on line 130 seems to be incorrect.
  3. Even if it is not possible to recover all elements, it may still make sense to use the elements recovered so far to reduce the number of elements that will need to be recovered in the future.
  4. The approach to generate the list of indices (in generate_idx) is suboptimal, it may be more efficient to derive the indices from a single (sufficiently long) hash value.
  5. The values used to derive indices and to xor with xor_hash should be independent.
  6. Element size and hash size should probably be configurable.
  7. Consider removing count, this will make the sketch smaller and the code a bit simpler, but add some computation overhead (which will be small with a properly chosen hash function).

Difference size estimation algorithm

The following algorithm can be used to efficiently estimate the size of the difference between two sets:

Suppose that both peers sample their sets using the same predicate that selects an element with probability p. If the size of the symmetric difference between the sets is n, then the probability that the samples will be identical is (1-p)n. This probability roughly behaves as follows: if np >> 1, this probability is close to zero, if np << 1, it is close to one, otherwise it is somewhere in between.

This can be used to estimate the size of the difference like this: the peers choose a sequence of the values p, for example, 1/2i for some range of integers i. For each p, they sample the set multiple times using different predicates that select an element with probability p, compute the hash of each of the sampled sets, and send the hashes to the other peer. Now, each peer can compare their hashes with those of the other peer. For low p, most of the hashes will be identical. For high p, most of the hashes will probably be different. For some intermediate value of p, about half of the hashes will be different. The exact fraction can be used to estimate the size more precisely.

Some implementation notes:

  • The hashes can be small. In fact, for many parameter values, the highest precision is achieved (for the same amount of space) by using hashes that are just one bit long.
  • A hash of a set can be computed by XORing hashes of its elements. With small hashes, different samples should use different element hashes.
  • One long hash can be interpreted as multiple short ones by taking different bits.
  • The same hash can also be used for element selection (but the bits used for element selection should not be the same as those used for set hashing).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.