cisco-ie / telemetry

Open-source datasets for anyone interested in working with network anomaly based machine learning, data science and research

Home Page: https://github.com/cisco-ie/telemetry

License: Other

network-analysis network-monitoring behavior-analysis machine-learning machinelearning data-science deep-learning automation network-automation auto-remediation

Objective

Our immediate goal is to share real-world datasets and documentation that are instrumental in developing, testing and comparing anomaly detection algorithms based on machine learning (both supervised and unsupervised).

Our longer-term goal is to systematically extend this collection with more complex datasets and event occurrences that reflect more realistic situations, helping the community move toward greater capability for automation, remediation, and behavior pattern recognition.

Related repositories

The datasets released in this repository are also instrumental in reproducing the results published in [ACM SIGCOMM BigDama'18] and demonstrated at [IEEE INFOCOM'18] (see the References section below).

This repository only contains the dataset, whereas related repositories contain

Usage

Each dataset includes the following:

  • .csv dataset
  • Header definition file: provides a definition of each header
  • Case file: information about the events, the time of each event, and the device(s) on which the event triggers were initiated
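As a minimal sketch of how these files fit together, the Python below reads a dataset .csv with the standard library, converts the `time` column to UTC datetimes, and labels each row by whether it falls inside an event window. It assumes `time` holds nanosecond epoch timestamps (the issue snippet further down converts it with `unit='ns'`) and that the event windows have been transcribed by hand from the case file as (start, end) pairs; the case-file format is not parsed here, and the function names and sample values are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns):
    """Convert a nanosecond epoch timestamp (int or numeric string) to a UTC datetime.

    Decimal keeps integer precision even if the value was written as a float string;
    the result is truncated to microseconds, the resolution of datetime.
    """
    return EPOCH + timedelta(microseconds=int(Decimal(str(ns))) // 1000)


def load_rows(fileobj):
    """Read a dataset .csv into dicts, converting the 'time' column to datetimes."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["time"] = ns_to_datetime(row["time"])
        rows.append(row)
    return rows


def label_rows(rows, windows):
    """Set row['label'] to 1 if the row's time lies in any (start, end) window, else 0.

    `windows` is assumed to be transcribed by hand from the case file.
    """
    for row in rows:
        row["label"] = int(any(start <= row["time"] <= end for start, end in windows))
    return rows


# Usage with a tiny hypothetical sample (two rows, 5 s apart).
sample = io.StringIO("time,Producer\n1000000000000000000,leaf2\n1000000005000000000,leaf2\n")
rows = load_rows(sample)
window = (ns_to_datetime(1000000004000000000), ns_to_datetime(1000000006000000000))
print([r["label"] for r in label_rows(rows, [window])])  # [0, 1]
```

For a real dataset, pass `open(path, newline="")` instead of the `StringIO` sample.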

Folders & Files

  • /topology_description_docs - Information about the topology, all connections, CDP neighbors, and device types

    • telemetry_topology_maps.pdf
      • Slide 1: Logical topology map with links colored by the number of ECMP links and their speed
      • Slide 2: Actual connected topology
      • Slide 3: Device types in position
    • CDP_ground_truth.pdf: Device connections for the network under test
  • Datasets:

      #    Traffic load   No. anomalies   Duration       Description
      0    0              0               1 h            Baseline (no anomalies)
      1    500 Gbps       0               1 h            Baseline (no anomalies)
      2    1 Tbps         11              1 h            BGP Clear
      3    1 Tbps         8               0.55 h         BGP Clear
      4    1 Tbps         5               0.72 h         Port Flap
      5    1 Tbps         12              2 h            BGP Clear
      6    0              12              2 h            BGP Clear
      7    0              130             72 h (VIRL)    BGP Clear
      8    0              238             262 h (VIRL)   BGP Clear
      9    2.9 Tbps       5               0.75 h         Port Admin Shut
      10   2 Tbps         5               0.55 h         Port Transceiver Pull and Reinsert
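For scripted experiments it can help to encode the dataset table above as data. The sketch below copies its rows (loads converted to Gbps, durations in hours) and derives anomalies per hour, for example to pick datasets with a similar event type or density for training and testing. The `DatasetInfo` type and the `select` helper are hypothetical, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    number: int
    load_gbps: float   # traffic load; 1 Tbps = 1000 Gbps
    anomalies: int
    hours: float
    virl: bool         # True for the datasets marked (VIRL)
    description: str

    @property
    def anomalies_per_hour(self) -> float:
        return self.anomalies / self.hours


# Rows transcribed from the dataset table.
DATASETS = [
    DatasetInfo(0, 0, 0, 1.0, False, "Baseline (no anomalies)"),
    DatasetInfo(1, 500, 0, 1.0, False, "Baseline (no anomalies)"),
    DatasetInfo(2, 1000, 11, 1.0, False, "BGP Clear"),
    DatasetInfo(3, 1000, 8, 0.55, False, "BGP Clear"),
    DatasetInfo(4, 1000, 5, 0.72, False, "Port Flap"),
    DatasetInfo(5, 1000, 12, 2.0, False, "BGP Clear"),
    DatasetInfo(6, 0, 12, 2.0, False, "BGP Clear"),
    DatasetInfo(7, 0, 130, 72.0, True, "BGP Clear"),
    DatasetInfo(8, 0, 238, 262.0, True, "BGP Clear"),
    DatasetInfo(9, 2900, 5, 0.75, False, "Port Admin Shut"),
    DatasetInfo(10, 2000, 5, 0.55, False, "Port Transceiver Pull and Reinsert"),
]


def select(description, include_virl=True):
    """Return the numbers of the datasets whose description matches exactly."""
    return [d.number for d in DATASETS
            if d.description == description and (include_virl or not d.virl)]


print(select("BGP Clear", include_virl=False))  # [2, 3, 5, 6]
```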

References

[ACM SIGCOMM BigDama'18] Andrian Putina, Dario Rossi, Albert Bifet, Steven Barth, Drew Pletcher, Cristina Precup, and Patrice Nivaggioli. "Telemetry-based stream-learning of BGP anomalies." ACM SIGCOMM Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA'18), Aug. 2018.

[IEEE INFOCOM'18] Andrian Putina, Dario Rossi, Albert Bifet, Steven Barth, Drew Pletcher, Cristina Precup, and Patrice Nivaggioli. "Unsupervised real-time detection of BGP anomalies leveraging high-rate and fine-grained telemetry data." IEEE INFOCOM, Demo Session, Apr. 2018.

License

Community Data License Agreement - Permissive 1.0 © Cisco Innovation Edge

Contributors

anrputina, apletche, apletcher, brh55, brockners, nonsns, parisa-foroughi

Issues

Duplicated entries for a given Producer and time

Hi,
I am seeing multiple entries at the same timepoint for a given producer, and I am not sure why this is happening. Roughly half of the timepoints are duplicates. Could you please help clarify? Below is an example using the data in 3/.

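import pandas as pd

# Load dataset 3 and drop columns that are entirely empty
df_3 = pd.read_csv('../Data/cisco_telemetry/3/bgpclear08042017.csv', low_memory=False).dropna(axis=1, how='all')
# 'time' holds nanosecond epoch timestamps; int64 avoids overflow where the default int is 32-bit
df_3['time'] = pd.to_datetime(df_3['time'].astype('int64'), unit='ns')
# Last 10 leaf2 rows, ordered by time
df_3.query('Producer == "leaf2"').sort_values('time')[['bytes-received', 'time', 'name', 'EncodingPath']].tail(10)

# Total leaf2 rows vs. distinct leaf2 timestamps
leaf2 = df_3.query('Producer == "leaf2"')
leaf2['time'].shape[0], leaf2['time'].unique().shape[0]

out: (31080, 17340)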
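One way to narrow this down is to check whether rows that share a (Producer, time) pair differ in another field such as `EncodingPath`. If they do, the repeated timestamps may be readings from different sensor paths rather than true copies; if they do not, the rows are exact-key duplicates. This is only a diagnostic sketch, not an explanation of the cause. It uses the column names from the snippet above; the `duplicate_report` helper and the sample rows are hypothetical.

```python
import csv
import io


def duplicate_report(rows, producer):
    """Count rows, distinct times, and distinct (time, EncodingPath) pairs for one producer."""
    mine = [r for r in rows if r["Producer"] == producer]
    times = {r["time"] for r in mine}
    time_path = {(r["time"], r["EncodingPath"]) for r in mine}
    return {
        "rows": len(mine),
        "unique_times": len(times),
        "unique_time_path": len(time_path),
    }


# Tiny hypothetical sample; for real data use csv.DictReader(open(path, newline="")).
sample = io.StringIO(
    "Producer,time,EncodingPath\n"
    "leaf2,1,path-a\n"
    "leaf2,1,path-b\n"
    "leaf2,2,path-a\n"
    "spine1,1,path-a\n"
)
report = duplicate_report(list(csv.DictReader(sample)), "leaf2")
print(report)  # {'rows': 3, 'unique_times': 2, 'unique_time_path': 3}
```

When `unique_time_path` equals `rows`, every repeated timestamp is accounted for by a different encoding path.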

Data headers not matching docx headers

There are a few mismatches between the .csv dataset headers and the "Metric name" entries in the .docx header definition files. When the .docx headers are used as the reference during data processing, this can lead to problems.
There are two kinds of typos:

  • Misspelling of "fragmentation" (docx header definition file) as "fragmenation" (csv dataset header)
  • Double underscores in some csv dataset headers where the docx header definition file uses a single underscore
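Until the files are reconciled, a small normalization step can map the .csv headers onto the names in the header definition file. The sketch below collapses runs of underscores and corrects the "fragmenation" misspelling, the two mismatch types reported here. It assumes the .docx names never contain double underscores; the function names and example header strings are hypothetical, and any other mismatches would need their own rules.

```python
import re

# Known misspellings in the .csv headers, mapped to the spelling in the .docx file.
SPELLING_FIXES = {"fragmenation": "fragmentation"}


def normalize_header(name: str) -> str:
    """Map a .csv header onto the assumed .docx 'Metric name' spelling."""
    name = re.sub(r"_{2,}", "_", name)  # collapse double (or longer) underscores
    for wrong, right in SPELLING_FIXES.items():
        name = name.replace(wrong, right)
    return name


def unmatched(csv_headers, docx_names):
    """Return the .csv headers that still have no .docx match after normalization."""
    known = set(docx_names)
    return [h for h in csv_headers if normalize_header(h) not in known]


print(normalize_header("ip__fragmenation_count"))  # ip_fragmentation_count
```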

Dataset 8 is missing

The 262-hour dataset #8 listed in your README is missing from the repository.
Also, telemetry_topology_maps.pdf is missing the 2nd and 3rd slides.
