AutoSketch

1. Introduction

AutoSketch is a sketch-oriented compiler for query-driven network telemetry. It automatically compiles high-level data-stream operators into sketch instances that can be readily deployed with low resource usage and limited accuracy loss. This work was accepted at NSDI'24.
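As background (not part of AutoSketch's codebase), a sketch is a compact probabilistic counter structure. The minimal count-min sketch below illustrates the idea in Python; the width, depth, and hashing scheme are arbitrary choices for this example.

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch: depth counter rows, each indexed by its own hash."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive a per-row hash by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def update(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def query(self, key):
        # Collisions only inflate counters, so the minimum over rows
        # upper-bounds the true count and never underestimates it.
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))

cms = CountMinSketch()
for src in ["10.0.0.1"] * 5 + ["10.0.0.2"] * 2:
    cms.update(src)
print(cms.query("10.0.0.1"))  # ~5; may overestimate under hash collisions
```

Because the error is one-sided and shrinks with more memory, a compiler can trade the sketch's width and depth against the user's accuracy intent, which is the design space AutoSketch searches automatically.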

The major contributions of AutoSketch are as follows:

  • Combines the strengths of sketch-based telemetry algorithms and query-driven network telemetry
  • Extends conventional telemetry languages to express and enforce accuracy intent
  • Reduces the burden on users of selecting, configuring, and implementing sketch algorithms
  • Provides a framework capable of integrating many novel sketch optimization techniques (e.g., SketchLib [NSDI’22], FlyMon [SIGCOMM’22], Sketchovsky [NSDI’23], BitSense [SIGCOMM’23], OmniWindow [SIGCOMM’23])

2. Environment requirement

We require the following dependencies to run AutoSketch programs.

  • Software Dependencies

    pip3 install ply
    pip3 install jinja2
    sudo apt install libboost-all-dev -y
    sudo apt install libjsoncpp-dev -y
    sudo apt install libpcap-dev -y
    
    # spdlog
    git clone https://github.com/gabime/spdlog.git
    cd spdlog && mkdir build && cd build
    cmake .. && make -j && sudo make install
  • Switch SDE: Tofino SDE 9.13.1 is needed to compile the P4 code generated by AutoSketch. (Hint: older SDE versions should also work, but we have not fully verified them in other environments.)

  • Trace data: We provide an archive containing a pre-processed CAIDA trace file for running the experiments. Due to its large size, please download it from PKU Drive and extract it to the ${AutoSketch_dir}/data/ directory.

3. Trace Preprocessing [Optional]

AutoSketch requires traffic data for benchmark-based searching, which identifies the sketch configuration with the minimal resource overhead that meets the user's accuracy intent. The trace files we provide are already preprocessed; to use other trace files, preprocess them as follows.

$ cd trace; mkdir build; cd build
$ cmake ..; make
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130000.UTC.anon.pcap search_trace.bin
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130100.UTC.anon.pcap verify_trace.bin

4. Run command

  • One Command to generate the backend P4 program

    $ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4s [-p4v]

    The -i parameter specifies the source file for the input query code.

    The -p4o parameter specifies the path and filename for the generated P4 code.

    The -p4s parameter enables benchmark-based searching during compilation, which automatically searches for the configuration parameters.

    The -p4v parameter verifies the configuration parameters obtained during the search.

  • AutoSketch also supports step-by-step compilation to facilitate debugging.

    1. Generate the benchmark-based searching program

      $ python compiler.py -i examples/newconn.py -s output/newconn

      The -i parameter specifies the source file for the input query code.

      The -s parameter specifies the directory for generating the profiling program and related configuration files.

    2. Run the benchmark-based searching

      $ cd output/newconn
      $ ls
      autosketch-newconn.cpp  conf.json  Makefile
      $ make
      $ ./autosketch-newconn ./conf.json --search ./app-conf.json

      The --search parameter specifies the file to which the searched configuration is written.

    3. Verify the searched configuration

      $ ./autosketch-newconn ./conf.json --verify ./app-conf.json

      The --verify parameter verifies the configuration in the specified file.

    4. Generate the P4 program based on the searched configuration

      $ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4c output/newconn/app-conf.json

      The -p4c parameter specifies an existing configuration file to use for P4 code generation.

5. Input program requirements

The input program consists of several modules, each of which can be a User-Defined Function (UDF) or a definition of a task. Here is an example.

def remap_key(tcp.flags):
    if tcp.flags == SYNACK:
        nkey = ipv4.src_addr
    else:
        nkey = ipv4.dst_addr

def sf(tcp.flags, tcp.seq, tcp.ack): # cnt nextseq
    if tcp.flags == SYNACK:
        nextseq = tcp.seq + 1
        cnt += 1
    elif nextseq == tcp.ack:
        cnt -= 1

syn_flood = PacketStream()
            .filter(left_value="ipv4.protocol", op="eq", right_value="IP_PROTOCOLS_TCP")
            .filter(left_value="tcp.flags", op="eq", right_value="TCP_FLAG_ACK")
            .groupby(func_name="remap_key", index=[], args=["tcp.flags"], registers=[], out=["nkey"])
            .groupby(func_name="sf", index=["nkey"], args=["tcp.flags", "tcp.seq", "tcp.ack"], registers=["nextseq", "cnt"], out=["cnt"])
            .filter(left_value="cnt", op="gt", right_value="Thrd_SYN_FLOOD")
            .distinct(distinct_keys=["nkey"])

Description of the UDF format

The format for User-Defined Functions (UDF) takes inspiration from Python's function definition syntax but with some differences.

def func_name(args): # persist_state
    statements
  • args parameter specifies the arguments passed in, usually fields from the header.

  • persist_state defines variables that must be saved globally across multiple invocations, separated by spaces (think of it as defining a global table). This annotation can be omitted only if the function uses no global state. For more details, refer to the description of the groupby operation in the operators section.

Description of the telemetry application format

name = PacketStream()
        .operators()

Here, name is the name of the task, PacketStream is a fixed identifier, and operators denotes the chain of operators that follows it.

Description of AutoSketch operators

Here are the conventions for each type of operator format:

  • filter: The parameters are (left_value, op, right_value), fundamentally acting as a conditional expression.
  • map: The parameters are (map_keys, new_import), where map_keys selects which key-value pairs from the original set to continue processing, and new_import introduces new key-value pairs, formatted as {"key": "value"}.
  • reduce: The parameters are (reduce_keys, result), where reduce_keys indicates which key(s) to use as a reference for the reduce operation, and result stores the outcome of the reduce.
  • zip: The parameters are (stream_name, left_key, right_key), with stream_name indicating which stream's operator sequence results to merge, and left_key and right_key indicating which keys to use as the basis for merging from the current stream and the incoming stream, respectively.
  • distinct: The parameters are (distinct_keys), indicating which keys to use as the basis for deduplication.
  • groupby: The parameters are (func_name, index, args, registers, out), where func_name corresponds to the function name in the UDF described above, index indicates which keys to use as a basis for building the lookup table, args corresponds to the arguments passed, registers are the registers the function needs to save, and out defines the result output to the key-value pair.
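As a rough semantic analogy (plain Python over dicts, not AutoSketch's API; the field names are borrowed from the example above), filter, map, reduce, and distinct behave like the following stream transformations:

```python
# Illustration only: each packet record is modeled as a Python dict.
packets = [
    {"ipv4.src_addr": "10.0.0.1", "ipv4.protocol": 6,  "len": 100},
    {"ipv4.src_addr": "10.0.0.1", "ipv4.protocol": 6,  "len": 200},
    {"ipv4.src_addr": "10.0.0.2", "ipv4.protocol": 17, "len": 300},
]

# filter(left_value, op, right_value): keep records satisfying the condition.
tcp = [p for p in packets if p["ipv4.protocol"] == 6]

# map(map_keys, new_import): project existing keys and add new key-value pairs.
mapped = [{"ipv4.src_addr": p["ipv4.src_addr"], "count": 1} for p in tcp]

# reduce(reduce_keys, result): aggregate `result` per value of `reduce_keys`.
totals = {}
for p in mapped:
    totals[p["ipv4.src_addr"]] = totals.get(p["ipv4.src_addr"], 0) + p["count"]

# distinct(distinct_keys): deduplicate records on the chosen keys.
unique_sources = set(p["ipv4.src_addr"] for p in tcp)

print(totals)          # {'10.0.0.1': 2}
print(unique_sources)  # {'10.0.0.1'}
```

In AutoSketch, the stateful steps (reduce, groupby, distinct) are the ones compiled into sketch instances rather than exact tables.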
