This project forked from miawallace0618/dga_detection

DGA detection project that aims to rapidly build and deploy a machine learning system to detect domain names generated by malware.

Domain Generation Algorithm Detector

Rachel Qu - Coding Challenge - Jan 28, 2018

This project aims to rapidly research, build, and deploy a machine learning system that detects domain names likely to have been generated by malware. Following a review of prior work, three machine learning models were built. A bigram classifier served as the baseline. An LSTM outperformed the bigram classifier. An ensemble model that combines the LSTM's predicted probabilities with a one-hot encoding of common TLDs achieved the best performance, and it is the ensemble model that went into service.

Contents:

  • Get Predictions

    • To get predicted results back, follow the steps in this section.
  • Project Details

    • To learn more about the project, including data preparation, model training, and evaluation, see this section.

Get Predictions

Step 1: Build Docker

To get predictions on unseen data, you first need to get Docker ready. Two options are available: build the Docker image on your machine, or download a prebuilt image from the link below.

  • Follow the steps below to build the Docker image on your machine:

    Open a new terminal window and type these commands:

     $ git clone http://github.com/miaWallace0618/DGA_Detection.git
     $ cd DGA_Detection
     $ sudo docker build -t dga_detection .
     $ sudo docker run -p 80:80 dga_detection:latest
    
  • Click the link below to download the prebuilt image directly from Google Drive: https://drive.google.com/file/d/1DXPTtse12P29IwuEX8IdJNiiVWB_OaNS/view?usp=sharing

    Then load the downloaded image by running the following commands in a terminal window:

     $ sudo docker load < path_to_dga_detection.tar
     $ sudo docker run -p 80:80 dga_detection:latest
    

Step 2: Get Predicted Results

When Docker is ready, you can get your predicted results back by calling the REST API service. The tool used here to send unseen data is Postman. Follow the steps below carefully to get your predictions.

  1. Select the POST method and enter the URL http://0.0.0.0:80/predict in Postman's request URL box.
  2. Under the Body tab, select the form-data button.
  3. Under the KEY section, select File from the dropdown list and type file in the textbox.
  4. Click the Choose File button and upload your unseen data as a .txt file. (Image: Postman Sample)
  5. Your .txt file should be formatted with one URL per line. Each URL should contain both the domain and the TLD. Lines must not end with a Ctrl-A delimiter. A sample file named sample.txt is provided. A screenshot of sample input data is shown below: (Image: Input Sample)
  6. After loading the .txt file containing the unseen data, click the Send button. The predicted results should be returned as a list. Sample results are shown below: (Image: Postman Results)
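
The Postman steps above can also be scripted. The following is a minimal sketch using only the Python standard library, not the project's own client. The function names (clean_lines, build_multipart, predict) and the upload filename input.txt are hypothetical. The form key file and the URL come from the steps above. The response format is not specified beyond "a list", so the sketch returns the raw response text.

```python
import urllib.request
import uuid
from typing import List, Optional, Tuple


def clean_lines(raw: str) -> List[str]:
    """Keep one URL per line, dropping blank lines and any Ctrl-A (0x01) delimiter."""
    cleaned = []
    for line in raw.splitlines():
        url = line.replace("\x01", "").strip()
        if url:
            cleaned.append(url)
    return cleaned


def build_multipart(field: str, filename: str, payload: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(path: str, url: str = "http://0.0.0.0:80/predict") -> str:
    """POST the cleaned file under the form key "file"; return the raw response text."""
    with open(path, encoding="utf-8") as fh:
        urls = clean_lines(fh.read())
    body, content_type = build_multipart(
        "file", "input.txt", "\n".join(urls).encode("utf-8")
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the container running, calling predict("sample.txt") should send the same request as the Postman steps. If 0.0.0.0 is not reachable as a destination on your system, http://localhost:80/predict may work instead.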

Project Details

Data preparation, model training and evaluation are contained in DGA_Detecion.ipynb.

The basic idea of this project is to use an ensemble method to improve the model's performance and robustness. First, the data was cleaned by removing subdomains and separating primary domains from TLDs. Then an LSTM model was built on the primary domains only. The probabilities predicted by the LSTM were then combined with a one-hot encoding of the 250 most common TLDs in the dataset to build the final model. The overall architecture of the final proposed DGA detection model is shown below:

(Image: Architecture of Model)
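
The cleaning and feature-combination steps described above can be sketched as follows. This is an illustrative sketch, not the notebook's actual code. All function names are hypothetical, the TLD is assumed to be the last dot-separated label (so multi-label suffixes such as co.uk are not handled), and the LSTM probability is passed in as a plain number rather than computed.

```python
from collections import Counter
from typing import List, Tuple


def split_domain(url: str) -> Tuple[str, str]:
    """Return (primary_domain, tld), dropping any subdomains.

    Assumption: the TLD is the last dot-separated label.
    """
    labels = url.strip().lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected a domain and a TLD, got {url!r}")
    return labels[-2], labels[-1]


def top_tlds(tlds: List[str], k: int = 250) -> List[str]:
    """Return the k most frequent TLDs, most common first."""
    return [tld for tld, _ in Counter(tlds).most_common(k)]


def one_hot(tld: str, vocab: List[str]) -> List[int]:
    """One-hot encode a TLD; TLDs outside the vocabulary map to all zeros."""
    return [1 if tld == known else 0 for known in vocab]


def ensemble_features(lstm_prob: float, tld: str, vocab: List[str]) -> List[float]:
    """Concatenate the LSTM probability with the TLD one-hot vector."""
    return [lstm_prob] + [float(bit) for bit in one_hot(tld, vocab)]


if __name__ == "__main__":
    urls = ["www.google.com", "mail.example.com", "qwhdkjzx.net"]
    pairs = [split_domain(u) for u in urls]
    vocab = top_tlds([tld for _, tld in pairs], k=250)
    # 0.97 stands in for the LSTM's output on the primary domain "qwhdkjzx".
    print(ensemble_features(0.97, "net", vocab))
```

The resulting vectors have length 1 + k and could be fed to any final classifier. Which classifier the project uses for the ensemble is not stated here.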

References

  • Ryan R. Curtin, Andrew B. Gardner. Detecting DGA Domains with Recurrent Neural Networks and Side Information. arXiv. Oct 4, 2018.
  • Hyrum Anderson, Jonathan Woodbridge. Using Deep Learning To Detect DGAs. EndGame. Nov 18, 2016.
  • Anonymous authors. Character Level Based Detection of DGA Domain Names. ICLR. 2018.
