Yatesbury: A Benchmark for East-West Network Security

License: MIT

This dataset serves as a benchmark for evaluating the performance and efficiency of anomaly detectors on east-west data center network traffic. Detailed information about the benchmark can be found in our NetVigil paper.

The dataset includes 13 distinct scenarios, each designated as an attack or a normal operation. For each scenario, a web-based e-commerce application is utilized to generate normal traffic patterns. Simultaneously, attacks are carried out using one or more compromised, malicious nodes. Traffic traces, sourced from NSG flow logs, are processed and converted into CSV files containing only the relevant properties.

Dataset Description

The dataset is available for download from Azure Blob Storage via this link. For efficient data transfer, we suggest running wget -i Dataset.txt, which reads the download URLs from Dataset.txt and fetches every archive. Each scenario is packaged as a separate .tar.gz archive.
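Once the archives are downloaded, they can be unpacked in one pass. The following is a minimal sketch using only the Python standard library; the function name `extract_archives` and the directory arguments are hypothetical and not part of the dataset tooling.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir: str, out_dir: str) -> list:
    """Extract every .tar.gz archive found in archive_dir into out_dir.

    Returns the archive file names that were extracted, in sorted order.
    ASSUMPTION: all archives sit directly in archive_dir (no subfolders).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.tar.gz")):
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(out)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives("downloads", "yatesbury")` would leave one folder per scenario under `yatesbury`, assuming the archives were saved to a `downloads` directory.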

After decompression, each folder corresponds to either a normal or an attack scenario and includes two files: nsg.csv and label.csv. The schema for these files is as follows:

nsg.csv

  1. time: Time in UTC when the event was logged.
  2. Source IP: Source IP address.
  3. Destination IP: Destination IP address.
  4. Source port: Source port.
  5. Destination port: Destination port.
  6. Protocol: Protocol of the flow. Valid values are T for TCP and U for UDP.
  7. Traffic flow: Direction of the traffic flow. Valid values are I for inbound and O for outbound.
  8. Traffic decision: Whether traffic was allowed or denied. Valid values are A for allowed and D for denied.
  9. Flow State: State of the flow. Possible states are:
    • B: Begin, when a flow is created. Statistics aren't provided.
    • C: Continuing for an ongoing flow. Statistics are provided at 5-minute intervals.
    • E: End, when a flow is terminated. Statistics are provided.
  10. Packets sent: Total number of TCP packets sent from source to destination since the last update.
  11. Bytes sent: Total number of TCP packet bytes sent from source to destination since the last update. Packet bytes include the packet header and payload.
  12. Packets received: Total number of TCP packets sent from destination to source since the last update.
  13. Bytes received: Total number of TCP packet bytes sent from destination to source since the last update. Packet bytes include packet header and payload.
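As an illustration of the schema above, the sketch below reads nsg.csv rows by column position and expands the single-letter codes. It makes several assumptions that the description does not confirm: that the columns appear in the listed order, that the file may start with a header row (controlled by the hypothetical `has_header` flag), and that the statistics fields are empty when none are provided (for example in state B). The `Flow` class and `read_flows` function are illustrative names, and the timestamp is kept as raw text because its exact format is not stated.

```python
import csv
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO

# Single-letter codes documented in the nsg.csv schema.
PROTOCOLS = {"T": "TCP", "U": "UDP"}
DIRECTIONS = {"I": "inbound", "O": "outbound"}
DECISIONS = {"A": "allowed", "D": "denied"}
STATES = {"B": "begin", "C": "continuing", "E": "end"}


@dataclass
class Flow:
    time: str  # raw text; the timestamp format is not assumed here
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    direction: str
    decision: str
    state: str
    packets_sent: Optional[int]
    bytes_sent: Optional[int]
    packets_received: Optional[int]
    bytes_received: Optional[int]


def _count(field: str) -> Optional[int]:
    """Parse a counter; an empty field becomes None."""
    field = field.strip()
    return int(field) if field else None


def read_flows(handle: TextIO, has_header: bool = True) -> Iterator[Flow]:
    """Yield one Flow per CSV row, reading the 13 columns by position."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # ASSUMPTION: the first row holds column names
    for row in reader:
        if len(row) != 13:
            raise ValueError(f"expected 13 columns, got {len(row)}")
        yield Flow(
            time=row[0],
            src_ip=row[1],
            dst_ip=row[2],
            src_port=int(row[3]),
            dst_port=int(row[4]),
            protocol=PROTOCOLS[row[5]],
            direction=DIRECTIONS[row[6]],
            decision=DECISIONS[row[7]],
            state=STATES[row[8]],
            packets_sent=_count(row[9]),
            bytes_sent=_count(row[10]),
            packets_received=_count(row[11]),
            bytes_received=_count(row[12]),
        )
```

A typical call would be `with open("nsg.csv", newline="") as f: flows = list(read_flows(f))`; pass `has_header=False` if the file turns out to have no header row.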

label.csv

  1. Source IP: Source IP address.
  2. Destination IP: Destination IP address.
  3. time: The starting time of each 2-minute window.
  4. label: A 0 indicates normal operation for this IP pair during the 2-minute window, while a 1 denotes the presence of an attack.
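To attach a label to an individual flow, one option is to map the flow's timestamp to the start of its 2-minute window and look up the (Source IP, Destination IP, window start) triple. The sketch below shows that idea under stated assumptions: timestamps are already parsed into timezone-aware UTC `datetime` objects, and windows are aligned to even 2-minute boundaries counted from the Unix epoch. The dataset description does not confirm the alignment, so check it against the time column of label.csv before relying on this. The names `window_start` and `label_for` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

WINDOW = timedelta(minutes=2)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Key: (source IP, destination IP, window start); value: 0 or 1.
LabelIndex = Dict[Tuple[str, str, datetime], int]


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timezone-aware timestamp to the start of its window.

    ASSUMPTION: windows are aligned to multiples of `window` since the epoch.
    """
    return ts - ((ts - EPOCH) % window)


def label_for(labels: LabelIndex, src: str, dst: str, ts: datetime) -> Optional[int]:
    """Return the label for this IP pair and window, or None if absent."""
    return labels.get((src, dst, window_start(ts)))
```

Returning `None` for a missing entry, rather than defaulting to 0, keeps unlabeled windows distinguishable from windows explicitly marked as normal.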

Attack Scenarios

Attack | Description | # flows | Ratio malicious
--- | --- | --- | ---
Vertical Port Scan | Run an exhaustive scan of open ports | 1429 | 0.0265
SYN Flood DoS attack | DoS attack where connections are rapidly initialized but not completed | 2817 | 0.0184
SYN Flood DDoS | DoS attack where connections are rapidly initialized but not completed (multiple attackers) | 2437 | 0.0439
UDP DDoS | DoS attack with UDP packets (multiple attackers) | 1473 | 0.0081
Distributed Stealth Port Scan | Run a targeted stealth scan of several key ports across many nodes with SYN packets | 4069 | 0.0058
Distributed Port Scan | Run a targeted scan of several key ports across many nodes | 4054 | 0.0051
Distributed UDP Port Scan | Run a targeted stealth scan of several key ports across many nodes with UDP packets | 4319 | 0.0050
Infection Monkey 1 | Scans key ports and launches network exploits | 2768 | 0.0122
Infection Monkey 2 | Scans key ports and launches network exploits (targets a limited number of hosts) | 1490 | 0.0107
Infection Monkey 3 | Scans key ports and launches network exploits (mounts a limited number of exploits) | 4677 | 0.0027
C&C communication | Compromised nodes receive commands, heartbeats, and file updates from C&C server | 2163 | 0.0254
DNS amplification | Attackers send DNS requests and direct responses to victim | 4410 | 0.0825

Reference Paper

If you use our benchmark in your work, we would appreciate a reference to the following paper:

Kevin Hsieh, Mike Wong, Santiago Segarra, Sathiya Kumaran Mani, Trevor Eberl, Anatoliy Panasyuk, Ravi Netravali, Ranveer Chandra, and Srikanth Kandula. NetVigil: Robust and Low-Cost Anomaly Detection for East-West Data Center Security. USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.


