Giter Site home page Giter Site logo

countminsketch's Introduction

Count-Min Sketch

Count-Min Sketch is a probabilistic sub-linear space streaming algorithm which can be used to summarize a data stream in different ways. It's mostly used to find Heavy Hiiters in a data set. This data structure is pretty recent. It was introduced in 2003. Sine then, it has inspired many applications, extensions, and variations.

This probabilistic data structure was first proposed by Muthukrishnan and Cormode (see references below for more details).

Count-Min Sketch Internals

The data structure primarily consists of a fixed array of counters, of width w and depth d. Every entry is called a counter. The counters are all initialized to zeros. Each row of counters is associated with a different hash function (out of k-pairwise independent hash functions). The hash function maps items uniformly onto the range {1,2,...,w}. The hash functions do not need to be particularly strong or as complex as cryptographic hash functions. So you can just use a linear hash. But is important that the hash function for each row be distinct (and pairwise indepenent!).

There are two main operations: UPDATE(i,c), ESTIMATE(i). UPDATE(i,c) updates (incremenets) the count associated with the item i. On the other hand, ESTIMATE(i) returns an estimate of the count associated with an item i.

For more details on the data structure internals, see the summary in this repository.

Using the Program

You can create a new Count-Min skectch like this:

CountMinSketch c(0.1, 0.01);

The signature of the constructor is ContMinSketch(float eps, float gamma) where 0.01 < eps < 1 is the possible error to allow and 0 < gamma < 1 is the probability that the possible error to occur. The smaller these values the more accurate are the estimates of the Count-Min sketch but the more space would be used in estimations.

To update the count of an item in the Count-Min sketch, use c.update(item, int count) where item is either a string constant or an integer. For example,

 c.update("heylo", 1);
 c.update(20, 2);
 c.update("heylo", 1);

To estimate the total count of a specific item, use c.estimate(item) where item is either a string constant or an integer. For example,

  c.estimate(20);
  c.esimate("hello");
  c.estimate("heylo");
  // returns 2,0,2 respectively

To retrieve the total count of all the items in the data structure, use c.totalcount().

Running the Tests

To run the tests, invoke make. It produces the object file countmin which you can then call, like this: ./countmin.

References

countminsketch's People

Contributors

alabid avatar kpjoshi avatar neilcook avatar rustingsword avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.