Giter Site home page Giter Site logo

transcendent-release's Introduction

Transcendent Code

Using conformal evaluation to detect concept drift affecting malware detection.

For more information, you can see the project page: https://s2lab.cs.ucl.ac.uk/projects/transcend/

What is Transcend and Conformal Evaluation?

Malware evolves rapidly which makes it hard---if not impossible---to generalize learning models to reflect future, previously-unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, becoming rapidly antiquated as malware continues to evolve.

Transcendent is a toolset which, together with a statistical framework called conformal evaluation, aims to identify aging classification models in vivo during deployment, before the machine learning model's performance starts to degrade.

Further details can be found in the paper Transcending TRANSCEND: Revisiting Malware Classification in the Presence of Concept Drift. by F. Barbero, F. Pendlebury, F. Pierazzi, and L. Cavallaro (IEEE S&P 2022).

If you end up using Transcendent as part of a project or publication, please include a citation of the S&P paper:

@inproceedings{barbero2022transcendent,
author = {Federico Barbero and Feargus Pendlebury and Fabio Pierazzi and Lorenzo Cavallaro},
title = {Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift},
booktitle = {{IEEE} Symposium on Security and Privacy},
year = {2022},
}

Transcendent is based on Transcend. Further details can be found in the paper Transcend: Detecting Concept Drift in Malware Classification Models. by R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro (USENIX Sec 2017). An associated presentation can be found at the Usenix site.

If you end up using Transcendent as part of a project or publication, please include a citation of the original Transcend Usenix paper as well:

@inproceedings {jordaney2017,
    author = {Roberto Jordaney and Kumar Sharad and Santanu K. Dash and Zhi Wang and Davide Papini and Ilia Nouretdinov and Lorenzo Cavallaro},
    title = {Transcend: Detecting Concept Drift in Malware Classification Models},
    booktitle = {26th {USENIX} Security Symposium ({USENIX} Security 17)},
    year = {2017},
    isbn = {978-1-931971-40-9},
    address = {Vancouver, BC},
    pages = {625--642},
    url = {https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/jordaney},
    publisher = {{USENIX} Association},
}

Getting Started

Installation

Transcend requires Python 3 (preferably >= 3.5) as well as the statistical learning stack of NumPy, SciPy, and Scikit-learn.

Package dependencies can be installed by using the listing in requirements.txt.

pip install -r requirements.txt

A full installation can be peformed using setup.py:

pip install -r requirements.txt
python setup.py install 

Features to reproduce the Android experiments can be downloaded from this link

Features for Marvin and Drebin can be downloaded from this link

Usage

Conformal evaluation can get a little bit fiddly, so it's advised that you become familiar with a typical testing pipeline such as the example given in ce.py as well as the following functions (which are particularly affected by different configuration settings):

  • utils.parse_args()
  • data.load_features()
  • thresholding.find_quartile_thresholds()
  • thresholding.find_random_search_thresholds()
  • thresholding.sort_by_predicted_label()
  • thresholding.get_performance_with_rejection()

ce.py

An example conformal evaluation pipeline using the Transcend library is given in ce.py. It can be run with a multitude of command line arguments.

Comparing quartiles of correct predictions using credibility only:

python3 ce.py	                  	    \
    --train drebin              	    \
    --test marvin_full          	    \
    -k 10                       	    \
    -n 10                       	    \
    --pval-consider full-train  	    \
    -t quartiles                	    \
    --q-consider correct                \
    -c cred                     	 

Random search for thresholds maximising F1 above threshold and minimising F1 of rejected predictions while enforcing thresholds for credibility and confidence:

python3 ce.py	                  	    \
    --train drebin              	    \
    --test marvin_full          	    \
    -k 10                       	    \
    -n -2                       	    \
    --pval-consider full-train  	    \
    -t random-search            	    \
    -c cred+conf                  	    \
    --rs-max f1_k           	 	    \
    --rs-min f1_r              		    \
    --rs-limit reject_total_perc:0.25   \
    --rs-samples 500

Random search for thresholds maximising F1 above threshold subject to the total percentage of rejected elements while enforcing credibility thresholds:

python3 ce.py 		                                \
	--train drebin                                  \
	--test marvin_half                              \
	-k 10                                           \
	-n -1                                           \
	--pval-consider full-train                      \
	-t constrained-search                           \
	-c cred                                         \
	--cs-max f1_k:0.95                              \
	--cs-con kept_pos_perc:0.76,kept_neg_perc:0.76  \
	--rs-samples 500

Acknowledgements

This research has been partially supported by the UK EPSRC grants EP/K033344/1, EP/L022710/1, EP/K006266/1, and EP/P009301/1 as well as the NVIDIA Corporation, NHS England, and Innovate UK.

transcendent-release's People

Contributors

fbpierazzi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.