PyTorch implementation of GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

Details in blog post: https://blog.munhou.com/2020/07/12/Pytorch-Implementation-of-GEE-A-Gradient-based-Explainable-Variational-Autoencoder-for-Network-Anomaly-Detection/

How to Use

Install Dependencies

Create a new conda environment and install the dependencies:

conda create -n gee python=3.7.7
conda activate gee 
conda install pyspark=3.0.0 click=7.1.2 jupyterlab=2.1.5 seaborn=0.10.1
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.1 -c pytorch
conda install pytorch-lightning=0.8.4 shap=0.35.0 -c conda-forge
pip install petastorm==0.9.2

Feature Extraction

Download the processed data here or perform all the following steps.

  1. Download the raw data files march_week3_csv.tar.gz and july_week5_csv.tar.gz.
  2. Decompress the archives.
    tar -xvf march_week3_csv.tar.gz
    tar -xvf july_week5_csv.tar.gz
    
  3. Split the records into one file per date (>> appends, so remove any existing output files before re-running).
    grep '^2016-03-18' march.week3.csv.uniqblacklistremoved >> 20160318.csv
    grep '^2016-03-19' march.week3.csv.uniqblacklistremoved >> 20160319.csv
    grep '^2016-03-20' march.week3.csv.uniqblacklistremoved >> 20160320.csv
    grep '^2016-07-30' july.week5.csv.uniqblacklistremoved >> 20160730.csv
    grep '^2016-07-31' july.week5.csv.uniqblacklistremoved >> 20160731.csv
    
  4. Place 20160319.csv and 20160730.csv in the data/train folder, and 20160318.csv, 20160320.csv, and 20160731.csv in the data/test folder.
  5. Perform feature extraction.
    python feature_extraction.py --train data/train --test data/test --target_train feature/train.feature.parquet --target_test feature/test.feature.parquet
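
Where grep is not available, step 3 above can be approximated in Python. The following is a minimal sketch, not part of the repository: the function name split_by_date and its parameters are hypothetical, and it assumes, as the grep patterns do, that every record begins with a date in YYYY-MM-DD form.

```python
from collections import defaultdict
from pathlib import Path


def split_by_date(source, out_dir, dates):
    """Append each line of `source` that starts with one of `dates`
    to <out_dir>/<YYYYMMDD>.csv and return the number of lines written per date.

    Hypothetical helper mirroring `grep '^DATE' source >> YYYYMMDD.csv`.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = defaultdict(int)
    # Append mode matches the behaviour of `>>` in the shell commands.
    handles = {
        d: open(out_dir / (d.replace("-", "") + ".csv"), "a") for d in dates
    }
    try:
        with open(source, errors="replace") as src:
            for line in src:
                day = line[:10]  # assumed position of the YYYY-MM-DD prefix
                if day in handles:
                    handles[day].write(line)
                    counts[day] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

For example, split_by_date("march.week3.csv.uniqblacklistremoved", ".", ["2016-03-18", "2016-03-19", "2016-03-20"]) would produce the three March files in a single pass over the input, instead of one pass per date.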
    

Normalise and Prepare Input Data for Model

Download the processed data here or run the following command.

python build_model_input.py --train feature/train.feature.parquet --test feature/test.feature.parquet --target_train model_input/train.model_input.parquet --target_test model_input/test.model_input.parquet
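
This README does not describe the normalisation scheme that build_model_input.py applies. As an illustration of the general idea only, the sketch below assumes min-max scaling, with the statistics fitted on the training features and reused for the test features. The function names fit_min_max and apply_min_max are hypothetical and do not come from the repository.

```python
def fit_min_max(rows):
    """Return a (min, max) pair per column, computed from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, stats):
    """Scale each column with the given training statistics.

    Training values land in [0, 1]; constant columns map to 0.0.
    Test values outside the training range fall outside [0, 1].
    """
    scaled = []
    for row in rows:
        scaled.append([
            0.0 if hi == lo else (value - lo) / (hi - lo)
            for value, (lo, hi) in zip(row, stats)
        ])
    return scaled
```

Fitting on the training set only keeps information from the test set out of the model input, which is the usual reason for passing the train and test files separately.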

Train Model

Download the pre-trained model here or run the following command.

python train_vae.py --data_path model_input/train.model_input.parquet --model_path model/vae.model --gpu True

Evaluation

ROC
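
As a rough sketch of how an ROC summary can be computed, the function below derives the area under the ROC curve from per-record anomaly scores and ground-truth labels. It assumes that the score is the reconstruction error (higher means more anomalous) and that labels are 1 for anomalous and 0 for normal records. The function name roc_auc is hypothetical and this is not the repository's evaluation code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of records sharing this score.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The result is the probability that a randomly chosen anomalous record scores higher than a randomly chosen normal one, so 0.5 corresponds to chance and 1.0 to perfect separation.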

Reconstruction Error Distribution

Gradient
