laxmanbalaraman / phishing-website-prediction

Building the best machine learning model to detect phishing websites.

Jupyter Notebook 100.00%
phishing hybrid-model machine-learning ensemble-classifier phishing-attacks classifier

Phishing-website-prediction

Phishing is a type of fraud wherein an attacker impersonates a reputable company or person in order to get sensitive information such as login credentials or account information via email or other communication channels. Phishing is popular among attackers because it is easier to persuade someone to click a malicious link that appears to be authentic than it is to break through a computer's protection measures.

This project examines how well machine learning algorithms predict phishing attacks and weighs their strengths and weaknesses. We evaluate several algorithms to identify the one best suited to predicting and preventing phishing attacks. From each URL we extract Address Bar based features, Abnormal based features, HTML & JavaScript based features, and Domain based features. We report numerical results for Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbour, Logistic Regression, Naive Bayes, AdaBoost Classifier, and Hybrid Ensembler, and use these models to determine whether a website is legitimate.

Proposed Architecture

[Image: proposed architecture diagram]

In our work we use different types of features from the given URL like:

• URL-Based Features: The URL is the first thing to look at when determining whether or not a website is phishing. Phishing domain URLs have several distinguishing characteristics, and parsing the URL yields the features linked to them.

• Page-Based Features: Page-Based Features make use of data from reputation ranking systems to calculate information about pages. Some of these characteristics indicate how trustworthy a website is.

• Content-Based Features: Obtaining these features necessitates an active scan of the target domain. The contents of the page are analysed to see if the target domain is being used for phishing.
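As a rough illustration of the URL-based group above, the sketch below parses a URL with Python's standard `urllib.parse` and derives a handful of address-bar signals. The function name `extract_url_features` and the specific features chosen are assumptions made for illustration; the project's notebook may compute a different or larger set.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few illustrative address-bar features for one URL.

    Hypothetical helper: the feature names are assumptions, not the
    project's actual feature set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Very long URLs can be used to hide the suspicious part.
        "url_length": len(url),
        # A raw IPv4 address in place of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Browsers treat everything before "@" as credentials, not the host.
        "has_at_symbol": "@" in url,
        # Dots in the host beyond the one separating domain and TLD.
        "subdomain_count": max(host.count(".") - 1, 0),
        # Hyphens are sometimes used to imitate a known brand name.
        "has_hyphen_in_host": "-" in host,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/", "http://192.168.0.1/login"):
        print(example, extract_url_features(example))
```

Each dictionary can be turned into one row of a feature matrix. Page-based and content-based features need external lookups or a fetch of the page, so they are left out of this sketch.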

We apply the following machine learning algorithms:

• Logistic Regression
• Naive Bayes
• K-Nearest Neighbours
• Decision Tree
• SVM
• Random Forest
• AdaBoost
• Hybrid Ensembler

We compare their error percentage and execution time, and propose the best-suited algorithm together with the additional features we think can improve prediction accuracy.
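The error-percent and execution-time comparison can be sketched as a small harness. It accepts any object with `fit` and `predict` methods, which is the convention scikit-learn estimators follow, so the classifiers listed above could be passed in directly. The `benchmark` function and the `MajorityClassBaseline` stand-in are hypothetical names used only for this example, and the toy data is invented purely to make the sketch runnable.

```python
import time
from collections import Counter


class MajorityClassBaseline:
    """Toy stand-in classifier: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def benchmark(model, X_train, y_train, X_test, y_test):
    """Fit and predict once; return test error percent and elapsed seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    errors = sum(1 for p, t in zip(predictions, y_test) if p != t)
    return {"error_percent": 100.0 * errors / len(y_test), "seconds": elapsed}


if __name__ == "__main__":
    # Invented toy data: 1 = phishing, 0 = legitimate.
    X_train, y_train = [[0], [1], [1], [1]], [0, 1, 1, 1]
    X_test, y_test = [[0], [1], [1], [0]], [1, 1, 0, 1]
    models = {"Majority baseline": MajorityClassBaseline()}
    for name, model in models.items():
        print(name, benchmark(model, X_train, y_train, X_test, y_test))
```

A single timed run is noisy; averaging over several runs gives a steadier execution-time figure.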

Conclusion

[Image: results table]

Phishing is a serious threat to users' personal information these days. Because detecting phishing websites is a time-consuming task, the number of phishers is continually expanding. Researchers and experts have worked on many approaches and techniques to overcome the issue, but these have often resulted in low detection rates. We used a range of algorithms in our work. Of these, the Hybrid Ensembler gave the highest training accuracy (0.9863) and test accuracy (0.9620), while the lowest training accuracy observed was 0.7096, with a test accuracy of 0.7015, as shown in the results table. Hence our Hybrid Ensembler is the best classifier for predicting phishing websites.
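To relate the reported accuracies to the error percent used for comparison, the short sketch below converts accuracy to error and computes the train/test gap for the Hybrid Ensembler figures quoted above. The helper names are assumptions for illustration; only the two accuracy values come from the text.

```python
def error_percent(accuracy: float) -> float:
    """Convert an accuracy in [0, 1] to an error percentage."""
    return round((1.0 - accuracy) * 100.0, 2)


def generalization_gap(train_accuracy: float, test_accuracy: float) -> float:
    """Difference between training and test accuracy (larger means more overfitting)."""
    return round(train_accuracy - test_accuracy, 4)


# Accuracies reported for the Hybrid Ensembler in the text.
hybrid_train, hybrid_test = 0.9863, 0.9620

if __name__ == "__main__":
    print("train error %:", error_percent(hybrid_train))
    print("test error %:", error_percent(hybrid_test))
    print("train/test gap:", generalization_gap(hybrid_train, hybrid_test))
```

A small gap between training and test accuracy suggests the model generalises reasonably to unseen URLs, though that depends on how the test split was drawn.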
