Giter Site home page Giter Site logo

zeqinlin-666 / sf-sketch Goto Github PK

View Code? Open in Web Editor NEW

This project forked from grace-tl/sf-sketch

1.0 0.0 0.0 311 KB

SF-sketch: A Two-stage Sketch for Data Streams

License: GNU General Public License v3.0

C++ 45.74% C 33.64% Makefile 2.95% Shell 16.81% CMake 0.86%

sf-sketch's Introduction

The SF-sketch

Introduction

A sketch is a probabilistic data structure used to record frequencies of items in a multi-set. Sketches are widely used in various fields, especially those that involve processing and storing data streams. In streaming applications with high data rates, a sketch "fills up" very quickly. Thus, its contents are periodically transferred to a remote collector, which is responsible for answering queries. In this paper, we propose a new sketch, called Slim-Fat (SF) sketch, which has a significantly higher accuracy compared to prior art and at the same time achieves comparable speed as the best prior sketch. The key idea behind our proposed SF-sketch is to maintain two separate sketches: a large sketch called Fat-subsketch and a small sketch called Slim-subsketch. The Fat-subsketch is used to perform updates, and periodically produce the Slim-subsketch, which is periodically transferred to the remote collector for answering queries quickly and accurately. For the final version of our SF-sketch, we derived the error bound and an accurate correct rate formula verified by experiments. We implemented and extensively evaluated SF-sketch along with several prior sketches and compared them side by side. Our experimental results show that SF-sketch outperforms the most widely used CM-sketch by up to 33.1 times in terms of accuracy.

Building

We implement SF-sketch and 4 well known sketches (CMCU sketch, A sketch, CM sketch and C sketch) which are used for comparing with our proposed SF sketch. You can build thoses sketches by

cd [sketch folder]; ./cmkae .; make

There is an example in main.c, which shows the basic usage of the corresponding sketch. Type [sketch folder]/bin/[sketch] -h to show more information.

NOTICE: CMake 3.1 or higher is required

Usage

$ cd ./bench_sketch/; make;
$ sh ./experiment_on_timming.sh
$ sh ./experiment_on_aging.sh

Workloads specification

To test the performance of sketches in different scenarios, we harness YCSB to generate two kinds of workloads: uniform and skewed (zipfian). We also use memcached and the generated workloads to record the real frequency of each item as benchmarks. After that, we feed the generated workloads to our proposed SF-sketch to record the estimation of itme frequency using the SF-sketch. Then, we calculate the average relative error and empirical cumulative distribution function (CDF) of relative error using benchmarks and estimations. To compare the accuracy with other well known sketches (such as Count-min (CM) sketch, conservative update (CU) sketch, etc.), we can use the same method to get the average relative error and CDF.

sf-sketch's People

Contributors

paper2017 avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.