OmniSketch

OmniSketch is a C++ framework for designing, simulating and testing sketch, a telemetry technique that aims at collecting streaming statistics. It is designed to be efficient, consistent and easy to use. Currently it is supported on Windows, Linux and MacOS.

Feature Overview

A wide variety of sketches are implemented with minimal code. Some complicated sketches, such as ElasticSketch, are also supported. For a full list of the sketches available, check this table.
An auto-testing framework that tests sketch performance and collects interested metrics with minimal configurations.
A generally applicable Counter Hierarchy framework that enables a tremendous amount of space saving via hierarchical counter braiding, applicable to all value-based counters.
Pcap Parser parses .pcap and .pcapng files to generate streaming data that interoperates with OmniSketch. Data format of a streaming data record is customizable.
Flexible definitions of a flowkey are available. Currently, 1-tuple, 2-tuple and 5-tuple are supported.
No redundant code has to be rewritten. Designer can focus mainly on the sketch algorithm, and let the framework do the rest, e.g. data parsing and testing details.
Since sketching algorithms vary a lot, the framework includes only the most common methods. Do not panic! You can always write inject your own code and tell the framework to run it.

Download & Build

Download from GitHub release page:

git clone --recurse-submodules https://github.com/N2-Sys/OmniSketch.git
cd OmniSketch
mkdir build; cd build

The project is built with CMake. If an additional module of PcapParser, which is capable of parsing .pcap/.pcapng files in quite a versatile manner and is interoperable with OmniSketch, is requested, please define the BUILD_PCAP_PARSER macro by

cmake .. -DBUILD_PCAP_PARSER=True

Upon defining this macro, CMake automatically checks for dependencies on PcapPlusPlus and pcap libraries, so MAKE SURE you have installed these libraries in advance. The default included directory of the PcapPlusPlus header files is /usr/local/include/pcapplusplus (which is certainly not true on Windows). To change this default searching path, provide to CMake a new argument:

cmake .. -DBUILD_PCAP_PARSER=True -DPCPP_INCLUDE_PATH=[path on your system]
# e.g. cmake .. -DBUILD_PCAP_PARSER=True -DPCPP_INCLUDE_PATH=/usr/local/include/pcapplusplus

In the simplest case, if PcapParser is not requested, the PcapPlusPlus and pcap libraries are never needed, so you can just build with

cmake ..

Gee, it saves you a lot of work!

To verify that you indeed build a runnable copy of OmniSketch, in the same directory (build/) run

ctest

If you see the line

100% tests passed, 0 tests failed out of 8

you can proceed to poke around OmniSketch and design your new sketches.

Design New Sketches

Here is an overview of how to design your own sketch in OmniSketch. For a detailed description, please check the docs.

Sketch algorithm should be in src/sketch/. Suppose you add a file XXX.h to this directory.
Sketch testing procedure is defined in src/sketch_test/. You should name your testing file as XXXTest.h accordingly.
Add a new sketch target in CMakeLists.txt. This is done in a single line add_user_sketch(YYY XXX).
Write down the sketch config in a toml file. The default config file is src/sketch_config.toml. All the sketch configs are user-defined, but typically should contain (though not required)

Streaming data file
Format of the streaming data record
Metrics measured during testing
Sketch parameters
Other user-defined configs

It is the user who controls what, where and how to parse in the config file. Omnisketch imposes no restrictions on how you organize the toml file and what you put in there. Fortunately, OmniSketch provides a rich set of tools to help you do it in just a couple of lines.

Goto the building directory build/, cmake and make it. (CMake is needed since a new target has just been added) From the time on, if the code is modified, only make is needed.
At this point, the sketch is compiled and linked. It is callable from the terminal with ./YYY -c config. If no -c option is provided, src/sketch_config.toml is assumed to be the default config file. A possible output of Count Min Sketch runs as follows:

terminal> ./CM -c ../src/sketch_config.toml
   INFO| Loading config from ../src/sketch_config.toml... @utils.cpp:62
VERBOSE| Config loaded. @utils.cpp:76
VERBOSE| Preparing test data... @data.h:718
   INFO| Loading records from ../data/records.bin... @data.h:720
VERBOSE| Records Loaded. @data.h:746
DataSet: 1090120 records with 99999 keys (../data/records.bin)
============     Count Min      ============
  Mem Footprint: 1.52646 MB
    Update Rate: 1853.95 Mpac/s
     Query Rate: 1694.9 Mpac/s
      Query ARE: 0.161831
      Query AAE: 0.252943
============================================

From now on, every time your sketch is about to run on different data and formats, or to collect some new statistics, all you have to do is simply modifying the config file. If the template header should be changed, you have to make a new driver.

API Docs

Please follow this link.

Tables of Sketches

Here are a list of all the sketches implemented and tested. The last column indicates the name of generated executable drivers, which can be modified in lines starting with add_user_sketch in CMakeLists.txt. For example, suppose there is a line saying add_user_sketch(CM CMSketch). It means that one may run driver compiled from src/sketch/CMSketch.h and src/test/CMSketchTest.h by ./CM in the building directory after making it.

Status: have, checked, checked but with doubt, tested.

	Status	Name of the Executable
CM Sketch	t	CM
CH-optimized CM Sketch	t	CHCM
Count Sketch	t	CS
CU Sketch	t	CU
Bloom Filter	t	BF
counting bloom filter	t
LD-sketch	t
MV-sketch	t
HashPipe	t	HP
FM-sketch(PCSA)	t
Linear Counting
Kmin(KMV)
Deltoid	t
Flow Radar	t	FR
sketch learn
elastic sketch	t
univmon
nitro sketch	t
reversible sketch
Mrac	t
k-ary sketch	t
seqHash
TwoLevel	t
multi-resolution bitmap
lossy count	t
space saving	t
HyperLogLog	t
Misra-Gries	t
Fast Sketch	t
CounterBraids	t
HeavyKeeper	d

Inaccurate time measurement in test.h

In test.h, we use the following macros to measure the update and query time for each sketch:

#define DEFINE_TIMERS                                                          \
  auto timer = std::chrono::microseconds::zero();                              \
  auto tick = std::chrono::steady_clock::now();                                \
  auto tock = std::chrono::steady_clock::now();
#define START_TIMER tick = std::chrono::steady_clock::now();
#define STOP_TIMER                                                             \
  tock = std::chrono::steady_clock::now();                                     \
  timer += std::chrono::duration_cast<std::chrono::microseconds>(tock - tick);
#define TIMER_RESULT static_cast<int64_t>(timer.count())

Here the timer is measured in microseconds. However, in testUpdate, we measure the time for each update separately:

  DEFINE_TIMERS;
  for (auto ptr = begin; ptr != end; ptr++) {
    START_TIMER;
    ptr_sketch->update(ptr->flowkey,
                       cnt_method == Data::InLength ? ptr->length : 1);
    STOP_TIMER;
  }
  if (metric_vec.in(Metric::RATE))
    update[Metric::RATE] = 1.0 * (end - begin) / TIMER_RESULT * 1e6;

For many sketch algorithms (e.g. CMSketch), the update operation takes far less than 1us (~200ns on my machine), and the timer result is rounded to 0us. This causes significant inaccuracy for time measurement. The same problem exists in testInsert and testQuery, etc.

Changing microseconds to nanoseconds would be a solution.

Also notice that, std::chrono measures the wall clock time instead of CPU time. We might want to switch to std::clock(), which measures the CPU time and produces a more accurate result.

n2-sys / omnisketch Goto Github PK

omnisketch's Introduction

OmniSketch

Feature Overview

Download & Build

Design New Sketches

API Docs

Tables of Sketches

omnisketch's People

Contributors

Stargazers

Watchers

Forkers

omnisketch's Issues

Heavy hitter estimation is broken when there are multiple flows of the same size

Inaccurate time measurement in test.h

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent