phishing's Introduction

Feature Selection for Spam and Phishing Detection

The ease of communicating through email has led to a huge increase in Unsolicited Bulk Email (UBE). Unsolicited emails are broadly divided into 2 categories: Spam (a mass-mailing approach to marketing) and Phishing (impersonation for the purpose of stealing data). This project presents a machine learning approach to classifying such emails into spam and phishing categories. It considers a total of 40 features, which are broadly categorised into URL based, body based, sender based, subject based and script based features.
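To make the five feature categories concrete, here is a minimal sketch that pulls one or two illustrative features per category out of a raw email using only the Python standard library. The feature names, the regular expressions and the extract_features helper are assumptions made for illustration; they are not the 40 features used in this project. The sketch is written for Python 3.

```python
import re
from email import message_from_string

# Hypothetical patterns: a generic URL and a URL whose host is a raw IPv4 address.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_URL_PATTERN = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}", re.IGNORECASE)


def extract_features(raw_email):
    """Return a small dict of illustrative features for one raw email string."""
    msg = message_from_string(raw_email)
    subject = msg.get("Subject", "") or ""
    sender = msg.get("From", "") or ""

    # Use the first text part as the body; a non-multipart message has one payload.
    if msg.is_multipart():
        parts = [p for p in msg.walk() if p.get_content_maintype() == "text"]
        body = parts[0].get_payload() if parts else ""
    else:
        body = msg.get_payload()

    urls = URL_PATTERN.findall(body)
    return {
        # URL based
        "url_count": len(urls),
        "url_has_ip": int(any(IP_URL_PATTERN.match(u) for u in urls)),
        # body based
        "body_length": len(body),
        # sender based
        "sender_has_digits": int(bool(re.search(r"\d", sender))),
        # subject based
        "subject_is_reply": int(subject.lower().startswith("re:")),
        # script based
        "has_script_tag": int("<script" in body.lower()),
    }


sample = (
    "From: support123@example.com\n"
    "Subject: Re: Verify your account\n"
    "\n"
    "Click http://192.168.0.1/login now <script>alert(1)</script>\n"
)
features = extract_features(sample)
```

Each email becomes one row of numeric values, which is the shape a CSV dataset for a classifier needs.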

In addition to feature extraction, the project performs feature selection using:

Low Variance filter
High correlation filter
Feature importances
mRMR
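The first two filters in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the Feature Selection notebook: the function names, the thresholds (0.01 and 0.9) and the toy columns are assumptions. It relies on dictionaries keeping insertion order, so it assumes Python 3.7 or later.

```python
from math import sqrt


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def low_variance_filter(columns, threshold=0.01):
    """Keep only columns whose variance exceeds the (assumed) threshold."""
    return {name: col for name, col in columns.items() if variance(col) > threshold}


def high_correlation_filter(columns, threshold=0.9):
    """Drop a column if it correlates too strongly with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return {name: columns[name] for name in kept}


# Toy data: one constant column and one exact multiple of another column.
columns = {
    "url_count": [0, 1, 2, 3, 4],
    "url_count_x2": [0, 2, 4, 6, 8],
    "constant": [1, 1, 1, 1, 1],
    "body_length": [5, 1, 4, 2, 3],
}
filtered = high_correlation_filter(low_variance_filter(columns))
remaining = list(filtered)
```

On the toy data, the constant column is removed by the low variance filter and the duplicated column by the high correlation filter. Which column of a correlated pair is dropped depends on column order here; other tie-breaking rules are equally valid.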

Getting Started

The project has been divided into 3 modules for convenience. The first module deals with extracting intricate features from the emails and preparing the dataset for the subsequent modules. The entire procedure is illustrated below.

Requirements

Python 2
Jupyter Notebook
Necessary Python libraries (see the first cells of Feature Extraction and Feature Selection)

Running

Run Feature Extraction sequentially to obtain 3 datasets in CSV format
Once the CSVs with 40 features have been generated, run Feature Selection sequentially
In Feature Selection, the reference file is dataset_HSP.csv; to use a different file, change the first argument in the call mRMR_CSV('dataset_HSP', 'label')
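For readers unfamiliar with mRMR (minimum redundancy, maximum relevance), the idea can be sketched as a greedy loop: at each step, pick the feature with the highest mutual information with the label minus its average mutual information with the features already chosen. The code below is a generic illustration for discrete features, written for Python 3. It is not the implementation behind mRMR_CSV, and the names mutual_information and mrmr_select, as well as the toy data, are assumptions.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, label, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, label) for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: the label depends on two independent features; a third is a duplicate.
features = {
    "has_url": [0, 0, 1, 1, 0, 0, 1, 1],
    "has_url_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "has_script": [0, 1, 0, 1, 0, 1, 0, 1],
}
label = [0, 1, 1, 2, 0, 1, 1, 2]
selected = mrmr_select(features, label, k=2)
```

The duplicate column is as relevant as the original but fully redundant with it, so the second pick goes to has_script instead.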

Testing and Results

A Voting Ensemble classifier with SVM, Naive Bayes, LDA, Adaboost, Random Forest and CART was used in the implementation of mRMR feature selection. The accuracy of the selected features was tested against that of the original 40 features using the following algorithms:

Voting Ensemble classifier (same as above)
SVM
Stochastic Gradient Boosting
Extra Trees classifier
Adaboost
Random Forest
Bagged Decision Tree classifier (CART)
Naive Bayes
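As a rough picture of how a voting ensemble combines its members and how accuracies are compared, here is a minimal hard-voting sketch in plain Python 3. Whether the project uses hard or soft voting is not stated, so majority voting is an assumption, and the predictions below are made-up toy values, not results from this project.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard (majority) voting."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy labels and toy predictions from three hypothetical models.
actual = ["spam", "phishing", "spam", "phishing"]
model_predictions = [
    ["spam", "phishing", "phishing", "phishing"],
    ["spam", "spam", "spam", "phishing"],
    ["phishing", "phishing", "spam", "phishing"],
]

ensemble = majority_vote(model_predictions)
ensemble_accuracy = accuracy(ensemble, actual)
single_accuracy = accuracy(model_predictions[0], actual)
```

The same accuracy function can be applied once to predictions made from all 40 features and once to predictions made from the selected subset to compare the two.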

Repository: tushaargvs / phishing