This is my final-year thesis project. It combines static analysis with machine learning techniques to detect whether an APK file (i.e. an Android application package) is malware. The project's source code is written in Python for rapid development.
The work follows three steps: collect data (APK files), extract and inspect the data to gain insight into it, and build a classification model that makes predictions about new (unseen) APK files.
The dataset (i.e. the training set) is divided into two classes: benign apps and malware apps.
- The malware applications were collected by requesting access to the VirusShare repository and by request from the author of the Contagio blog.
- The benign applications were collected by a self-made crawler from 2 free Android application markets:
The dataset I used to build the model contains 4101 malware apps and 1276 benign apps.
I first used Androguard to extract and inspect data from the APK files. It is easy to use, but it consumes a lot of memory and takes a long time to extract. I therefore switched to Apktool, which moves part of the workload to disk.
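To illustrate the kind of static information that becomes available once an APK is decoded, here is a minimal sketch that lists the permissions requested in a decoded AndroidManifest.xml. It assumes the manifest has already been decoded to plain XML (for example by Apktool). The helper name `requested_permissions` and the sample manifest are hypothetical, and permissions are only an assumed example of a feature; this README does not state which features the project actually extracts.

```python
import xml.etree.ElementTree as ET

# Namespace used by Android for attributes such as android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def requested_permissions(manifest_xml):
    """Return the sorted, de-duplicated permission names in a decoded manifest.

    Hypothetical helper for illustration; not part of this project's code.
    """
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    perms = set()
    for node in root.iter("uses-permission"):
        name = node.get(name_attr)
        if name:
            perms.add(name)
    return sorted(perms)


# Made-up manifest, standing in for the decoded AndroidManifest.xml of an app.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application android:label="Demo"/>
</manifest>
"""

print(requested_permissions(SAMPLE_MANIFEST))
# ['android.permission.INTERNET', 'android.permission.SEND_SMS']
```

Reading the decoded file from disk instead of keeping the whole application in memory is in the same spirit as the move from Androguard to Apktool described above.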
I used XGBoost to build the model. It is an implementation of gradient boosted machines that has recently become very popular in Kaggle competitions. It trains efficiently enough that a small laptop like mine can process over 5000 records of raw data in just 4 to 5 minutes. I will post some experimental results for my model on my blog.
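The model-building step can be sketched roughly as follows. This is not the project's actual training code: the feature names, the 0/1 encoding, the helper names (`build_vocabulary`, `to_feature_matrix`, `train_model`) and the hyperparameters are all assumptions chosen for illustration. Only `xgboost.DMatrix` and `xgboost.train` are real XGBoost calls.

```python
def build_vocabulary(samples):
    """Map every feature name seen in the samples to a column index."""
    names = sorted({name for sample in samples for name in sample})
    return {name: idx for idx, name in enumerate(names)}


def to_feature_matrix(samples, vocabulary):
    """Encode each sample as a 0/1 row: 1 if the app has that feature."""
    rows = []
    for sample in samples:
        row = [0] * len(vocabulary)
        for name in sample:
            if name in vocabulary:  # features unseen at training time are ignored
                row[vocabulary[name]] = 1
        rows.append(row)
    return rows


def train_model(rows, labels, rounds=50):
    """Fit a gradient-boosted binary classifier (1 = malware, 0 = benign).

    Hyperparameters are illustrative assumptions, not the project's settings.
    """
    # Imported lazily so the encoding helpers above work without xgboost.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.array(rows, dtype=float), label=np.array(labels))
    params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.3}
    return xgb.train(params, dtrain, num_boost_round=rounds)


# Toy data: each app is described by a list of (assumed) feature names.
apps = [
    ["android.permission.INTERNET"],
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
]
labels = [0, 1]  # 0 = benign, 1 = malware

vocab = build_vocabulary(apps)
matrix = to_feature_matrix(apps, vocab)
print(matrix)
# [[1, 0], [1, 1]]
```

With XGBoost installed, `train_model(matrix, labels)` returns a trained booster, and calling its `predict` method on a `DMatrix` built the same way yields a malware probability per app. Two toy samples are of course far too few to learn anything; the real dataset has several thousand apps.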
This project supports Linux only. It has some problems with the file system on Windows.
- Python >= 2.7
- JRE 8
- Apktool 2.2.2
- xgboost
- Download Apktool and the Linux wrapper script into the same directory, then add that directory to your PATH:
$ cd apktool
$ export PATH=$PATH:$PWD
- To install the Python package for XGBoost, head to this site and follow the instructions.
After installing all the requirements, clone this repository and run the install:
$ git clone https://github.com/hunguyen1702/android_malware_detector.git
then, from inside the cloned android_malware_detector directory:
$ python setup.py install
or (if you want to install with pip):
$ pip install .
After that, the android-malware-detector command will be available on your system. Use the --help option for usage.