willer-lu / phishpedia

This project forked from lindsey98/phishpedia

Official Implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" USENIX'21

License: MIT License

Shell 1.08% Python 69.56% Jupyter Notebook 29.36%

phishpedia's Introduction

Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages

Paper | Website | Video | Dataset | Citation

  • This is the official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX Security '21): link to paper, link to our website, link to our dataset.

  • Limitations of existing reference-based phishing detectors:

    • ❌ Lack of interpretability
    • ❌ Lack of generalization performance in the wild
    • ❌ Lack of a large-scale phishing benchmark dataset
  • The contributions of our paper:

    • ✅ We propose Phishpedia, a phishing identification system with high identification accuracy and low runtime overhead, which outperforms the relevant state-of-the-art identification approaches.
    • ✅ Our system provides explainable annotations, which increase users' confidence in the model's predictions.
    • ✅ We conducted a phishing discovery experiment on emerging domains fed from CertStream and discovered 1,704 real phishing webpages, of which 1,133 are zero-day.

Framework

Input: A URL and its screenshot
Output: Phish/Benign, Phishing target

  • Step 1: Run the deep object detection model to get the predicted logos and inputs (the inputs are not used for the later prediction, only for explanation).

  • Step 2: Run the deep Siamese model.

    • If the Siamese model reports no target, return Benign, None.
    • Otherwise, the Siamese model reports a target: return Phish, Phishing target.
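
The decision in Step 2 can be illustrated with a minimal sketch. It is not the repository's implementation: the function names (`cosine_similarity`, `match_brand`, `classify`), the use of cosine similarity over plain feature vectors, and the toy reference features are all assumptions made for illustration. It only mirrors the rule above: if any detected logo matches a reference brand above a threshold, report Phish and that brand; otherwise report Benign.

```python
from math import sqrt
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_brand(
    logo_feat: Vector, reference_feats: Dict[str, Vector], threshold: float
) -> Tuple[Optional[str], float]:
    """Return (best brand, score) if the best score reaches the threshold, else (None, score)."""
    best_brand: Optional[str] = None
    best_score = 0.0
    for brand, ref in reference_feats.items():
        score = cosine_similarity(logo_feat, ref)
        if score > best_score:
            best_brand, best_score = brand, score
    if best_score >= threshold:
        return best_brand, best_score
    return None, best_score


def classify(
    logo_feats: List[Vector], reference_feats: Dict[str, Vector], threshold: float
) -> Tuple[str, Optional[str]]:
    """Hypothetical Step 2 rule: ("Phish", brand) on the first match, else ("Benign", None)."""
    for feat in logo_feats:
        brand, _ = match_brand(feat, reference_feats, threshold)
        if brand is not None:
            return "Phish", brand
    return "Benign", None


# Toy reference features standing in for the real logo embeddings (assumed values).
refs = {"BrandA": [1.0, 0.0], "BrandB": [0.0, 1.0]}
print(classify([[0.9, 0.1]], refs, threshold=0.8))
print(classify([[0.7, 0.7]], refs, threshold=0.8))
```

In this sketch a screenshot with no detected logos falls through to Benign, which matches the "no target" branch above.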

Project structure

- src
    - adv_attack: adversarial attacking scripts
    - detectron2_pedia: training script for object detector
     |_ output
      |_ rcnn_2
        |_ rcnn_bet365.pth 
    - siamese_pedia: inference script for siamese
     |_ siamese_retrain: training script for siamese
     |_ expand_targetlist
         |_ 1&1 Ionos
         |_ ...
     |_ domain_map.pkl
     |_ resnetv2_rgb_new.pth.tar
    - siamese.py: main script for siamese
    - pipeline_eval.py: evaluation script for general experiment

- tele: telegram scripts to vote for phishing 
- phishpedia_config.py: config script for phish-discovery experiment 
- phishpedia_main.py: main script for phish-discovery experiment 

Instructions

Requirements:

  1. Create a local clone of Phishpedia
git clone https://github.com/lindsey98/Phishpedia.git
  2. Setup
cd Phishpedia/
chmod +x ./setup.sh
./setup.sh

If you encounter any problems downloading the models, you can manually download them from https://huggingface.co/Kelsey98/Phishpedia and put them into the corresponding conda environment.

conda activate myenv

Run in Python to test a single website

from phishpedia.phishpedia_main import test
import matplotlib.pyplot as plt
from phishpedia.phishpedia_config import load_config

url = open("phishpedia/datasets/test_sites/accounts.g.cdcde.com/info.txt").read().strip()
screenshot_path = "phishpedia/datasets/test_sites/accounts.g.cdcde.com/shot.png"
ELE_MODEL, SIAMESE_THRE, SIAMESE_MODEL, LOGO_FEATS, LOGO_FILES, DOMAIN_MAP_PATH = load_config(None)

phish_category, pred_target, plotvis, siamese_conf, pred_boxes = test(url=url, screenshot_path=screenshot_path,
                                                                       ELE_MODEL=ELE_MODEL,
                                                                       SIAMESE_THRE=SIAMESE_THRE,
                                                                       SIAMESE_MODEL=SIAMESE_MODEL,
                                                                       LOGO_FEATS=LOGO_FEATS,
                                                                       LOGO_FILES=LOGO_FILES,
                                                                       DOMAIN_MAP_PATH=DOMAIN_MAP_PATH
                                                                      )

print('Phishing (1) or Benign (0) ?', phish_category)
print('What is its targeted brand if it is a phishing ?', pred_target)
print('What is the siamese matching confidence ?', siamese_conf)
print('Where is the predicted logo (in [x_min, y_min, x_max, y_max])?', pred_boxes)
plt.imshow(plotvis[:, :, ::-1])
plt.title("Predicted screenshot with annotations")
plt.show()

Or run in bash

python run.py --folder <folder you want to test e.g. phishpedia/datasets/test_sites> --results <where you want to save the results e.g. test.txt> 
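
Before running on a whole folder, it can help to check which site sub-folders are complete. The sketch below assumes each site folder follows the same layout as the single-site example above (an `info.txt` holding the URL and a `shot.png` screenshot); whether `run.py` requires exactly this layout is an assumption, and `find_testable_sites` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List


def find_testable_sites(folder: str) -> List[Path]:
    """Return sub-folders containing both info.txt and shot.png, sorted by name."""
    root = Path(folder)
    sites = []
    for site_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        has_url = (site_dir / "info.txt").is_file()
        has_shot = (site_dir / "shot.png").is_file()
        if has_url and has_shot:
            sites.append(site_dir)
    return sites


if __name__ == "__main__":
    # Path taken from the example above; adjust to your own folder.
    test_root = Path("phishpedia/datasets/test_sites")
    if test_root.is_dir():
        for site in find_testable_sites(str(test_root)):
            print(site.name)
```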

Miscellaneous

  • In our paper, we also implement several phishing detection and identification baselines; see here.
  • The logo targetlist described in our paper includes 181 brands; in this code repository, we have expanded it to 277 brands.
  • For the phish discovery experiment, we obtain the feed from Certstream phish_catcher and lower the score threshold to 40 to process more suspicious websites. Readers can refer to their repo for details.
  • We use Scrapy for website crawling (repo here).

Citation

If you find our work useful in your research, please consider citing our paper by:

@inproceedings{lin2021phishpedia,
  title={Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages},
  author={Lin, Yun and Liu, Ruofan and Divakaran, Dinil Mon and Ng, Jun Yang and Chan, Qing Zhou and Lu, Yiwen and Si, Yuxuan and Zhang, Fan and Dong, Jin Song},
  booktitle={30th {USENIX} Security Symposium ({USENIX} Security 21)},
  year={2021}
}

Contacts

If you have any issues running our code, you can raise an issue or send an email to [email protected], [email protected], and [email protected]

Contributors

lindsey98, llmhyy
