owenhiggins / ml-network-intrusion-detection-model

Using machine learning with NetFlow data to detect anomalous events on a network, demonstrating the feasibility of preventing zero-day attacks by combining Network Intrusion Detection Systems with machine learning. Built on a data pipeline with AWS S3, EC2, RDS, Docker, Airflow, TensorFlow and more.

License: MIT License

Python 100.00%
airflow airflow-dags aws data-pipeline docker machine-learning nids python tensorflow

ml-network-intrusion-detection-model's Introduction

Authors

ML Network Intrusion Detection Model

Description

Using machine learning with NetFlow data to detect anomalous events on a network, demonstrating the feasibility of preventing zero-day attacks by combining Network Intrusion Detection Systems with machine learning. Built on a data pipeline with AWS S3, EC2, RDS, Docker, Airflow, TensorFlow and more.

Objective

Network Intrusion Detection Systems monitor networks for malicious activity and help prevent network or data breaches. These systems have one drawback: they fail to protect against zero-day exploits, which are attacks that have never been seen before. As the threat landscape in cyberspace grows, it is becoming increasingly important to be able to defend against these types of attacks. Using machine learning, we plan to detect anomalies in the traffic seen by Network Intrusion Detection Systems. A machine learning approach allows for rapid and continuous improvement of the model as new threats emerge and are documented. Lastly, we will use Tableau to visualize which attacks we would have been able to prevent using our machine learning model.

Dataset

This dataset, provided by the University of Queensland, Australia, contains NetFlow packet capture data labeled as benign or attack. It includes 12 features and around 12 million records, of which 77% are benign and 23% are attacks. The dataset was created to train machine learning based network intrusion detection systems. Because the NetFlow format is commonly used on networks, it allows for wide deployment and scaling of such systems. We chose the NF-UNSW-NB15 dataset for our project.

Features:

  • IPv4 source address
  • IPv4 destination address
  • IPv4 source port number
  • IPv4 destination port number
  • IP protocol identifier byte
  • Cumulative of all TCP flags
  • Layer 7 protocol (numeric)
  • Incoming number of bytes
  • Outgoing number of bytes
  • Incoming number of packets
  • Outgoing number of packets
  • Flow duration in milliseconds
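The "Cumulative of all TCP flags" feature can be illustrated with a short sketch. It assumes the cumulative value is the bitwise OR of the flag bytes of every packet in a flow, which is how NetFlow exporters commonly define this field; the function names and the sample flow are hypothetical and not taken from the project code.

```python
# Standard TCP flag bits (FIN is the least significant bit of the flags byte).
TCP_FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def cumulative_flags(packet_flags):
    """OR together the flag bytes of every packet in one flow."""
    combined = 0
    for flags in packet_flags:
        combined |= flags
    return combined


def decode_flags(value):
    """Return the names of the flags set in a cumulative flag value."""
    return [name for name, bit in TCP_FLAG_BITS.items() if value & bit]


# Hypothetical flow: SYN, then SYN+ACK, then ACK, then FIN+ACK.
flow = [0x02, 0x12, 0x10, 0x11]
total = cumulative_flags(flow)
print(total, decode_flags(total))
```

Because the feature is a single number, a model sees which flags occurred somewhere in the flow but not their order or how often they appeared.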

Dataset - Machine Learning-Based NIDS Datasets

Paper - NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems

Data Quality Assessment

We performed exploratory data analysis on the dataset: we checked that the data is valid, looked at its shape, and checked for nulls and duplicate values. Since this data comes directly from the University of Queensland, we believe it to be highly trustworthy and accurate.
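The checks described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the project's actual EDA code; the function name `assess_quality`, the column names, and the three sample rows are assumptions made for illustration.

```python
import csv
import io


def assess_quality(csv_text):
    """Report the shape, per-column null counts, and duplicate row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    null_counts = {name: 0 for name in columns}
    seen = set()
    rows = 0
    duplicates = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Missing trailing fields arrive as None, empty fields as "".
            if row[name] is None or row[name].strip() == "":
                null_counts[name] += 1
        key = tuple(row[name] for name in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "shape": (rows, len(columns)),
        "nulls": null_counts,
        "duplicates": duplicates,
    }


# Hypothetical sample with one empty value and one repeated row.
SAMPLE = """IPV4_SRC_ADDR,L4_SRC_PORT,IN_BYTES,Label
10.0.0.1,443,1200,0
10.0.0.2,80,,1
10.0.0.1,443,1200,0
"""

report = assess_quality(SAMPLE)
print(report)
```

For a file with millions of records, a dataframe library would be the more practical choice; the logic of the three checks stays the same.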

Tools & Technologies

Data Pipeline

We used a Batch - ML - Visualize data pipeline. Our data is downloaded via an API into an AWS S3 bucket. We use an EC2 instance to handle computation and storage, and to run our containerization software, Docker. Our Python files are stored inside the Docker container. We then use Apache Airflow to run our DAG (Directed Acyclic Graph), which is the collection of tasks, defined in our Python files, to perform on the data. The transformed data is prepared for storage in RDS. We can then access and visualize the data through Tableau.
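To make the idea of a DAG of tasks concrete, here is a small sketch that orders pipeline steps by their dependencies using Python's standard `graphlib` module. It is not Airflow code and not the project's DAG definition; the task names and the dependencies between them are assumptions based on the description above.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
PIPELINE = {
    "download_to_s3": set(),
    "explore_data": {"download_to_s3"},
    "transform_data": {"explore_data"},
    "train_model": {"transform_data"},
    "load_to_rds": {"train_model"},
}


def run_order(dag):
    """Return one valid execution order in which dependencies run first."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """A scheduler can only run the graph if it contains no cycles."""
    try:
        run_order(dag)
    except CycleError:
        return False
    return True


order = run_order(PIPELINE)
print(" -> ".join(order))
```

A scheduler such as Airflow does the same kind of ordering, and adds scheduling, retries, and logging around each task.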

Data Transformation Models

Minimal data transformation was needed for this dataset, as all but two fields were numeric and acceptable for the logistic regression model. The two IP address fields were split into separate columns to remove the periods. We then isolated the target variable 'Label' and split the data into training and test sets. Our machine learning model is a logistic regression for binary classification, built with TensorFlow and trained on the NetFlow dataset specified above. The logistic regression model was adapted from the TensorFlow guide, Logistic regression for binary classification. Note: After installing TensorFlow in the Docker container, it is necessary to upgrade the typing-extensions package by running the command "pip install typing-extensions --upgrade".
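The transformation steps above can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: it assumes each address is split into four integer octet columns, that the column names are `IPV4_SRC_ADDR`, `IPV4_DST_ADDR`, and `Label`, and that a simple shuffled 80/20 split is acceptable. The function names are hypothetical.

```python
import random


def split_ipv4(address):
    """Split a dotted IPv4 string into four integer octets."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {address!r}")
    return octets


def transform(row):
    """Replace both address fields with octet columns and separate the label."""
    features = dict(row)
    label = int(features.pop("Label"))
    for field in ("IPV4_SRC_ADDR", "IPV4_DST_ADDR"):
        for i, octet in enumerate(split_ipv4(features.pop(field)), start=1):
            features[f"{field}_{i}"] = octet
    return features, label


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then hold out the last test_fraction of rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical record with only a few of the twelve features.
row = {
    "IPV4_SRC_ADDR": "192.168.1.10",
    "IPV4_DST_ADDR": "10.0.0.5",
    "L4_DST_PORT": 443,
    "Label": "1",
}
features, label = transform(row)
train_rows, test_rows = train_test_split(range(10))
print(features, label, len(train_rows), len(test_rows))
```

Splitting into octets keeps every input numeric, which is what the logistic regression model expects. Whether octets are meaningful predictors is a separate question that the sketch does not address.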

Architecture

Diagram.png

Results

With this project we successfully implemented a data pipeline for a simple machine learning classification model for Network Intrusion Detection. This project could serve as a baseline for further training of the existing model, creation of additional classification models, or incorporation of additional datasets to learn from. While the performance of the existing model has much room for improvement, we have built the necessary infrastructure for further training and enhancements, which was more important to the project than actual model performance. The project built a reproducible framework to pull in datasets from a highly reputable source, perform EDA as needed, transform the data, store the data for processing, build and train the model to process the data, and build reporting on the model's performance. There is great potential to scale up this framework. For example, we limited the training epochs during development (configurable by variable), and a larger implementation without processing cost concerns could easily improve performance here. As next steps for this project, we would recommend configuring re-use of an existing model in a separate DAG. That DAG could apply incremental or delta updates from a particular dataset to the existing model, and/or feed it live network traffic data from NetFlow. We hope this project proves the viability of using an automated data pipeline to train a machine learning model for Network Intrusion Detection purposes.
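The two ideas mentioned above, a configurable number of training epochs and re-use of an existing model for incremental updates, can be sketched in a few lines. The project itself used TensorFlow; the code below is a plain-Python stand-in for logistic regression trained by batch gradient descent, with hypothetical function names, toy data, and hyperparameters chosen only for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, weights=None, bias=0.0, epochs=50, learning_rate=0.1):
    """Batch gradient descent on log loss; pass weights and bias to resume."""
    n_features = len(samples[0])
    weights = list(weights) if weights is not None else [0.0] * n_features
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = predict(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def log_loss(samples, labels, weights, bias):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(predict(x, weights, bias), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


# Toy one-feature data: small values are benign (0), large values are attacks (1).
X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]

EPOCHS = 20  # kept small, as one might during development
w1, b1 = train(X, y, epochs=EPOCHS)
first_loss = log_loss(X, y, w1, b1)

# Resume from the existing model instead of starting from scratch.
w2, b2 = train(X, y, weights=w1, bias=b1, epochs=200)
resumed_loss = log_loss(X, y, w2, b2)
print(first_loss, resumed_loss)
```

Passing the saved weights back into `train` is the plain-Python analogue of loading a stored model in a separate DAG and continuing training on new records.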

Tableau Visualization of Model Performance

TableauModel.png

ml-network-intrusion-detection-model's People

Contributors

caseygary, owenhiggins
