Feature Selection for Spam and Phishing Detection

The ease of communicating through email has led to a huge increase in Unsolicited Bulk Email (UBE). Unsolicited emails are broadly divided into two categories: spam (a mass-mailing approach to marketing) and phishing (impersonation for the purpose of stealing data). This project presents a machine learning approach to classifying such emails as spam or phishing. It considers a total of 40 features, broadly categorised as URL-based, body-based, sender-based, subject-based and script-based.

In addition to feature extraction, the project performs feature selection using:

  • Low Variance filter
  • High correlation filter
  • Feature importances
  • mRMR
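As a rough illustration of the first two filters, here is a minimal sketch using only the Python standard library. It is not the project's own code: the feature names, the toy values and both thresholds (`min_variance`, `max_abs_corr`) are assumptions chosen for the example.

```python
from math import sqrt
from statistics import pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def low_variance_filter(columns, min_variance=0.01):
    """Keep only columns whose population variance exceeds min_variance."""
    return {name: vals for name, vals in columns.items()
            if pvariance(vals) > min_variance}


def high_correlation_filter(columns, max_abs_corr=0.9):
    """Keep a column only if it is not strongly correlated with one already kept."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) <= max_abs_corr for other in kept.values()):
            kept[name] = vals
    return kept


# Hypothetical toy feature table: column name -> one value per email.
features = {
    "url_count":   [0, 3, 1, 5, 2, 4],
    "link_count":  [0, 6, 2, 10, 4, 8],   # exactly 2 * url_count, so redundant
    "has_script":  [0, 0, 0, 0, 0, 0],    # constant, so it carries no information
    "subject_len": [12, 40, 7, 33, 25, 18],
}

selected = high_correlation_filter(low_variance_filter(features))
print(sorted(selected))
```

In this toy table the constant column is dropped by the variance filter and the duplicated column by the correlation filter. Note that the correlation filter is order-dependent: of two strongly correlated columns, the one seen first is kept.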

Getting Started

The project is divided into 3 modules for convenience. The first module deals with extracting intricate features from the emails and preparing the dataset for use by the next module. The entire procedure is illustrated below.

Requirements

Running

  • Run the Feature Extraction module sequentially to obtain 3 datasets in CSV format.
  • Once the CSVs with 40 features have been generated, run the Feature Selection module sequentially.
  • In Feature Selection, the reference file is dataset_HSP.csv. To use a different file, change the arguments in the call mRMR_CSV('dataset_HSP', 'label').
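The internals of mRMR_CSV are not shown here, so the following is only a minimal sketch of the general mRMR (minimum redundancy, maximum relevance) idea, not the project's implementation. It assumes discrete feature values, estimates mutual information from raw counts, and uses the "relevance minus mean redundancy" scoring variant. The function names and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr(columns, labels, k):
    """Greedily pick k features, each maximising relevance minus mean redundancy."""
    relevance = {name: mutual_information(vals, labels)
                 for name, vals in columns.items()}
    selected = []
    while len(selected) < min(k, len(columns)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(columns[name], columns[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max((n for n in columns if n not in selected), key=score)
        selected.append(best)
    return selected


# Hypothetical binary features for 8 emails, with a 'label' column (1 = phishing).
label = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_form":     [1, 1, 1, 0, 0, 0, 0, 0],
    "has_form_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy of has_form
    "has_ip_url":   [1, 1, 0, 1, 0, 0, 1, 0],
}

top_two = mrmr(columns, label, k=2)
print(top_two)
```

On this toy data the duplicate column is as relevant as `has_form` but fully redundant with it, so the second pick is the weaker but less redundant `has_ip_url`.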

Testing and Results

A Voting Ensemble classifier combining SVM, Naive Bayes, LDA, AdaBoost, Random Forest and CART was used in the implementation of mRMR feature selection. The accuracy of the selected features was tested against that of the original 40 features using the following algorithms:

  • Voting Ensemble classifier (same as above)
  • SVM
  • Stochastic Gradient Boosting
  • Extra Trees classifier
  • Adaboost
  • Random Forest
  • Bagged Decision Tree classifier (CART)
  • Naive Bayes
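A comparison of this kind could be set up roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the project's test code: the data is synthetic (`make_classification`) rather than the project's CSVs, `selected` is a placeholder list of column indices rather than real mRMR output, and hard voting with default hyperparameters is assumed.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 40-feature dataset (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=40,
                           n_informative=8, random_state=0)

# Placeholder for the column indices a feature-selection step would return.
selected = list(range(10))

# Voting ensemble over the six base learners named in the text.
voter = VotingClassifier(
    estimators=[
        ("svm", SVC()),
        ("nb", GaussianNB()),
        ("lda", LinearDiscriminantAnalysis()),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("cart", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)

# Mean 5-fold cross-validated accuracy on all features vs. the selected subset.
acc_full = cross_val_score(voter, X, y, cv=5, scoring="accuracy").mean()
acc_selected = cross_val_score(voter, X[:, selected], y, cv=5,
                               scoring="accuracy").mean()
print(f"all 40 features: {acc_full:.3f}")
print(f"selected subset: {acc_selected:.3f}")
```

The same two `cross_val_score` calls can be repeated with any of the other listed classifiers in place of `voter` to compare the full and reduced feature sets per algorithm.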

Contributors

tushaargvs
