Giter Site home page Giter Site logo

dairui777 / hashpipe Goto Github PK

View Code? Open in Web Editor NEW

This project forked from vibhaa/hashpipe

0.0 0.0 0.0 139.9 MB

Heavy hitter detection algorithm that is entirely in the dataplane

License: Apache License 2.0

Shell 0.03% C++ 0.01% Python 0.07% C 0.01% Java 0.39% CSS 0.01% TeX 0.34% Makefile 0.01% HTML 99.14% P4 0.01%

hashpipe's Introduction

Project aimed at identifying the flows that contribute to majority of packets between two points on a given network. The final paper can be found here.

All the evaluations used data from the CAIDA Traces from https://data.caida.org/datasets/passive-2016/equinix-chicago/20160218-130000.UTC/ and datacenter traces from http://pages.cs.wisc.edu/~tbenson/IMC10_Data.html. These were converted to .csv files with the srcip, dstip, protocol, srcport and dstport using tshark. They were then split them into chunks of 10M packets each after which TopKIdentifierFlowId was run on them. TopKIdentifierFlowId also needs a file with the actual flow size, which can be obtained by running FindFlowSize. This groups the packets from the csv by 5-tuple giving you a 5-tuple flowid and the associated count for a flow.

The output from TopKIdentifierFlowId itself involves rows of data one corresponding to each experiment (a run of HashPipe on that particular trace with those settings) and prints out the tablesize (number of flowid, counter pairs), K value, D (number of stages), false negative rate, false positive rate, weighted versions of the same (by size fo the flows), number of duplicates in the table, reported fraction of heavy hitters, number of reported heavy hitters, average deviation from the actual size across flows and so on. The header identifies the precise metrics reported. The variable "FlowSizes" has the map between flowid and their actual sizes.

runTopKIdentificationTrials function runs the experiments for you and runs only once across all the K values we want (for HashPipe and all of the incremental steps before that). To run this, you want to use the SummaryStructureType.RollingMinSingleLookup. So, you would want to run TopKIdentifierFlowId with the arguments flowSize file (actual flowsize per flow), the tracefile, "runTrial" and "Single" as its arguments to run HashPipe as per the algorithm in the paper.

runTrialsPerK on the other hand, runs it once for every K and this is relevant for Sample and Hold and the Count-Min Sketch. So, you would want to run TopKIdentifierFlowId with the arguments flowSize file (actual flowsize per flow), the tracefile, "PerThreshold" and "CM" or "SampleAndHold" as its arguments to run CM sketch or Sample And Hold respectively.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.