Giter Site home page Giter Site logo

zeqinlin-666 / pyramid_sketch_framework Goto Github PK

View Code? Open in Web Editor NEW

This project forked from zhouyangpkuer/pyramid_sketch_framework

0.0 0.0 0.0 18.02 MB

This repository contains the open source of the Pyramid sketch framework (accepted by VLDB 2017 and then submitted to ToN).

C++ 99.21% Makefile 0.42% C 0.38%

pyramid_sketch_framework's Introduction

The Pyramid sketch framework

Introduction

VLDB version

The sketch is a probabilistic data structure, and is used to store and query the frequency of any item in a given multiset. Due to its memory efficiency, it has been applied to various fields in computer science, such as streaming database, network traffic measurement, etc. The key factors of sketches for data streams are accuracy, speed, and memory usage. There are various sketches in the literature, but they cannot achieve both high accuracy and high speed using limited memory, especially for skewed datasets. To address this issue, we propose a sketch framework, the Pyramid sketch, which can significantly improve accuracy as well as update and query speed. To verify the effectiveness and efficiency of our framework, we applied our framework to four typical sketches. Extensive experimental results show that the accuracy is improved up to 3.50 times, while the speed is improved up to 2.10 times.

ToN version

Sketch is a probabilistic data structure, and is used to store and query the frequency of any item in a given multiset. Due to its high memory efficiency, it has been applied to various fields in computer science, such as stream database, network traffic measurement, etc. The key metrics of sketches for data streams are accuracy, speed, and memory usage. Various sketches have been proposed, but they cannot achieve both high accuracy and high speed using limited memory, especially for skewed datasets. To address this issue, we propose a sketch framework, the Pyramid sketch, which can significantly improve accuracy as well as update and query speed. To verify the effectiveness and efficiency of our framework, we applied our framework to four typical sketches. Extensive experimental results show that the accuracy is improved up to 3.50 times, while the speed is improved up to 2.10 times. We have released our source codes at Github

About the source codes, dataset and parameters setting

The source code contains the C++ implementation of the CM, CU, C, A sketch and PCM, PCU, PC, PA sketch (using our Pyramid sketch framework). We complete these codes on Linux 14.04.5 and compile successfully using g++ 4.8.4.

The file stream.dat is the subset of one of the encrypted IP traces used in experiments. This small dataset contains 1M items totally and 193,894 distinct items. The maximum frequency of those items is 4426. The full dataset can be download on our homepage (http://net.pku.edu.cn/~yangtong/uploads/stream_full.dat).

We set the memory allocated to each sketch 0.1MB. The other parameters setting is the same as mentioned in the paper.

How to run

Suppose you've already cloned the respository and start from the Pyramid_Sketch_Framework directory.

You just need:

$ make 
$ ./pyramid

Output format

Our program will print the throughput of insertion and query of these eight sketches and the ARE and AAE of these sketches. Note that to obtain convincing results of the throughput, you are supposed to set the micro "testcycles" in the main.cpp to a larger value (e.g. 100) and run the program on a Linux server.

Publication

(VLDB17) Pyramid Sketch: a Sketch Framework for Frequency Estimation of Data Streams

Technical Report

pyramid_sketch_framework's People

Contributors

zhouyangpkuer avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.