BotHunter is a machine-learning-based GitHub bot identification script that can be executed from the command line.
BotHunter accepts either a username, to determine the type of that contributor, or the name of a repository (format: repo_owner/repo_name), to determine the type of each contributor listed under 'repository --> insights --> contributors'.
To determine the contributor type, BotHunter relies on the following 19 features, obtained through the GitHub API:
Profile information:
- Account login
- Account name
- Account bio
- Number of followings
- Number of followers
- Account tag
Account activity:
- Total number of repository activities
- Total number of issue activities
- Total number of pull request activities
- Total number of commit activities
- Unique repository activities
- Unique issue activities
- Unique pull request activities
- Unique commit activities
- Median activity per day
- Median response time
Text similarity:
- Issue/Pull request comments
- Preceding comments
- Commit messages
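As an illustration of what a text similarity feature can capture, here is a minimal sketch that averages the pairwise Jaccard similarity of a contributor's texts (comments or commit messages). The choice of Jaccard similarity over word sets and the function names `jaccard` and `mean_pairwise_similarity` are assumptions made for this example; the measure BotHunter actually uses may differ.

```python
from itertools import combinations
from typing import Sequence


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_similarity(texts: Sequence[str]) -> float:
    """Average Jaccard similarity over all pairs of texts (0.0 if fewer than two)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

An account that posts near-identical messages repeatedly would score close to 1.0 under this measure, while varied, hand-written messages would score lower.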
The profile information features are computed from data obtained through the GitHub Users API, the account activity features from at most 3 queries to the GitHub Events API, and the text similarity features from the repository API.
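To make the profile information features concrete, the sketch below pulls the six values out of a GitHub Users API response (`GET https://api.github.com/users/<username>`). The helper names `fetch_user` and `profile_features` are hypothetical, and mapping "Account tag" to the API's `type` field is an assumption; this is not BotHunter's own extraction code.

```python
import json
import urllib.request


def fetch_user(username: str, token: str) -> dict:
    """Fetch a user's public profile from the GitHub Users API."""
    request = urllib.request.Request(
        f"https://api.github.com/users/{username}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def profile_features(user: dict) -> dict:
    """Select the profile fields listed above from a Users API payload."""
    return {
        "login": user.get("login") or "",
        "name": user.get("name") or "",       # may be null in the API response
        "bio": user.get("bio") or "",         # may be null in the API response
        "following": int(user.get("following") or 0),
        "followers": int(user.get("followers") or 0),
        "tag": user.get("type") or "",        # assumption: "Account tag" = API "type"
    }
```

Calling `profile_features(fetch_user("<USERNAME>", "<GH_TOKEN>"))` would return a plain dictionary with these six entries.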
Because BotHunter has many dependencies, it is recommended to install it in a virtual environment so that it does not conflict with already installed packages. You can use any virtual environment tool of your choice. Below are the steps to install virtualenv and create a virtual environment with it.
Use the following command to install virtualenv:
pip install virtualenv
Create a virtual environment in the folder where you want to place your files:
virtualenv <name>
Start using the environment by:
source <name>/bin/activate
After running this command your command line prompt will change to (<name>) ...
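Besides the changed prompt, you can confirm from Python itself that the environment is active. The following is a small sketch; the function name `in_virtual_environment` is hypothetical and the check is not part of BotHunter.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix points at the interpreter it was created from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return sys.prefix != base_prefix or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```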
Now you can fork the BotHunter repository from 'https://github.com/natarajan-chidambaram/BotHunter' and clone it to your local system.
Navigate to the location in which BotHunter is cloned using the terminal command
cd <BotHunter location>
and you can install BotHunter dependencies from the provided requirements.txt with the pip command
pip install -r requirements.txt
When you are finished running the script, you can quit the environment by:
deactivate
To execute BotHunter, you need to provide a GitHub personal access token (API key). GitHub's documentation on personal access tokens explains how to create one.
Parameters List:
--key <APIKEY>
GitHub personal access token (key) required to extract data from the GitHub API
--repo <REPO_OWNER/REPO_NAME>
Name of the GitHub repository to determine the type of all the contributors listed at 'https://github.com/repo_owner/repo_name/graphs/contributors'
Example: $ python BotHunter.py --key <GH_TOKEN> --repo <REPO_OWNER/REPO_NAME>
--file-repo <file containing multiple REPO_OWNER/REPO_NAMEs>
A file containing the names of GitHub repositories (one name per line), to determine the type of all the contributors listed at 'https://github.com/repo_owner/repo_name/graphs/contributors' for each of those repositories
Example: $ python BotHunter.py --key <GH_TOKEN> --file-repo <REPOS_FILE>
--u <USERNAME>
The username for which the type needs to be determined
Example: $ python BotHunter.py --key <GH_TOKEN> --u <USERNAME>
--file-u <file containing multiple USERNAMEs>
A file containing usernames (one username per line) for which the type needs to be determined
Example: $ python BotHunter.py --key <GH_TOKEN> --file-u <USERNAMES_FILE>
--csv <file name to save the prediction.csv>
Filename to save the predictions
Example: $ python BotHunter.py --key <GH_TOKEN> --file-u <USERNAMES_FILE> --csv <FILE_NAME>.csv
Note: Only one of --repo or --u can be given as input along with --key.
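The parameter rules above can be sketched with `argparse`. This is an illustrative parser, not BotHunter's own argument handling: the function name `build_parser`, the help strings, and the decision to place the two `--file-*` options in the same mutually exclusive group as `--repo` and `--u` (and to require exactly one of them) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line parameters."""
    parser = argparse.ArgumentParser(prog="BotHunter.py")
    parser.add_argument("--key", required=True,
                        help="GitHub personal access token")

    # Exactly one input source may be given alongside --key (assumption:
    # the file variants follow the same rule as --repo and --u).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--repo", metavar="REPO_OWNER/REPO_NAME",
                        help="repository whose contributors are classified")
    source.add_argument("--file-repo", metavar="FILE",
                        help="file with one repository name per line")
    source.add_argument("--u", metavar="USERNAME",
                        help="single username to classify")
    source.add_argument("--file-u", metavar="FILE",
                        help="file with one username per line")

    parser.add_argument("--csv", metavar="FILE_NAME",
                        help="file in which to save the predictions")
    return parser
```

With this sketch, `build_parser().parse_args(["--key", "<GH_TOKEN>", "--u", "bors"])` succeeds, whereas passing both `--u` and `--repo` makes `argparse` exit with a usage error.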
You can also run BotHunter using Docker. To do so, you need to have Docker installed on your system. Docker's official documentation provides installation instructions.
After installing Docker, you can build the Docker image using the following command:
docker build -t bothunter .
After building the image, you can run the Docker container using the following command:
docker run --rm bothunter --key <GH_TOKEN> <OTHER_ARGUMENTS>
To retrieve the file written by --csv, bind the current directory to the container working directory using the following command:
docker run --rm -v `pwd`:`pwd` bothunter --key <GH_TOKEN> <OTHER_ARGUMENTS> --csv <FILE_NAME>.csv
$ python BotHunter.py --key <GH_TOKEN> --u bors
contributor type
bors Bot
$ python BotHunter.py --key <GH_TOKEN> --file-u usernames.txt
contributor type
natarajan-chidambaram Human
bors Bot
rust-timer Bot
$ python BotHunter.py --key <GH_TOKEN> --repo natarajan-chidambaram/BotHunter
contributor type
natarajan-chidambaram Human
$ python BotHunter.py --key <GH_TOKEN> --file-repo reponames.txt
contributor type
natarajan-chidambaram Human
bors Bot
rust-timer Bot
dependabot Bot
<anonymised> Human
<anonymised> Human
$ python BotHunter.py --key <GH_TOKEN> --file-u usernames.txt --csv predictions.csv
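Once the predictions are saved, they can be post-processed with a few lines of Python. The sketch below counts how many accounts fall into each type. It assumes the saved file is comma-separated with a header row containing `contributor` and `type` columns, matching the console output above; the actual layout of the file may differ, and `count_types` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Iterable


def count_types(lines: Iterable[str]) -> Counter:
    """Count predictions per contributor type from CSV lines with a header row."""
    reader = csv.DictReader(lines)
    # Assumption: the prediction column is named "type".
    return Counter(row["type"] for row in reader)


# Usage with a saved file (assumed name and layout):
# with open("predictions.csv", newline="") as handle:
#     print(count_types(handle))
```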
This project is distributed under the parent repository's license, LGPL-2.1.