BotHunter

Description

BotHunter is a machine-learning-based script that identifies bot accounts on GitHub and is run from the command line.

BotHunter accepts either a username, to determine the type of that contributor, or the name of a repository (format: repo_owner/repo_name), to determine the type of each contributor listed under 'repository --> insights --> contributors'.
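As a rough illustration of the two input forms, the sketch below tells a bare username apart from a repo_owner/repo_name string. The helper name `classify_target` is hypothetical and is not taken from BotHunter's code; it only mirrors the format described above.

```python
def classify_target(value: str) -> tuple[str, str]:
    """Return ("repo", value) for repo_owner/repo_name input, ("user", value) otherwise.

    Hypothetical helper for illustration; BotHunter's own input handling may differ.
    """
    value = value.strip()
    if not value:
        raise ValueError("empty input")
    owner, sep, name = value.partition("/")
    if not sep:
        # No slash at all: treat the input as a username.
        return ("user", value)
    if not owner or not name or "/" in name:
        # Anything other than exactly one slash with text on both sides is rejected.
        raise ValueError(f"expected repo_owner/repo_name, got {value!r}")
    return ("repo", value)


if __name__ == "__main__":
    print(classify_target("bors"))
    print(classify_target("natarajan-chidambaram/BotHunter"))
```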

Features

To determine the contributor type, BotHunter relies on the following 19 features, which are obtained through the GitHub API:

Profile information:

  1. Account login
  2. Account name
  3. Account bio
  4. Number of followings
  5. Number of followers
  6. Account tag

Account activity:

  1. Total number of repository activities
  2. Total number of issue activities
  3. Total number of pull request activities
  4. Total number of commit activities
  5. Unique repository activities
  6. Unique issue activities
  7. Unique pull request activities
  8. Unique commit activities
  9. Median activity per day
  10. Median response time
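To make the difference between "total" and "unique" counts concrete, here is a minimal sketch computed over a list of simplified event records. The record layout (`repo` and `created_at` keys) and the definition of the per-day median (median of the event counts over days that have at least one event) are assumptions made for illustration, not BotHunter's actual definitions.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def activity_summary(events):
    """Summarise events given as dicts with 'repo' (str) and 'created_at' (ISO 8601 str).

    Illustrative only: field names and the median definition are assumptions.
    """
    events = list(events)
    # Count events per calendar day; "Z" is rewritten so fromisoformat accepts it.
    per_day = Counter(
        datetime.fromisoformat(e["created_at"].replace("Z", "+00:00")).date()
        for e in events
    )
    return {
        "total_activities": len(events),
        "unique_repos": len({e["repo"] for e in events}),
        "median_per_day": median(per_day.values()) if per_day else 0,
    }


if __name__ == "__main__":
    sample = [
        {"repo": "owner/a", "created_at": "2023-01-01T10:00:00Z"},
        {"repo": "owner/a", "created_at": "2023-01-01T11:00:00Z"},
        {"repo": "owner/b", "created_at": "2023-01-01T12:00:00Z"},
        {"repo": "owner/b", "created_at": "2023-01-02T09:00:00Z"},
    ]
    print(activity_summary(sample))
```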

Text similarity:

  1. Issue/Pull request comments
  2. Preceding comments
  3. Commit messages
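The text above does not say which similarity measure is used. As a stand-in, the sketch below averages the pairwise `difflib.SequenceMatcher` ratio over a set of texts, which is enough to show why templated bot messages tend to score higher than hand-written ones. The function name and the choice of metric are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(texts):
    """Average SequenceMatcher ratio over all pairs of texts (0.0 if fewer than two).

    Stand-in metric for illustration; not necessarily the measure BotHunter uses.
    """
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    templated = [
        "Bump lodash from 4.17.20 to 4.17.21",
        "Bump lodash from 4.17.19 to 4.17.20",
    ]
    handwritten = ["Fix crash on startup", "Add docs for the parser"]
    print(mean_pairwise_similarity(templated))
    print(mean_pairwise_similarity(handwritten))
```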

The profile information features are computed from data obtained through the GitHub Users API, the account activity features from at most 3 queries to the GitHub Events API, and the text similarity features from data obtained through the repository API.
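One way a cap of 3 queries could look in practice is sketched below, assuming the public user events endpoint (`/users/{username}/events`) with 100 events per page. The endpoint choice, the page size, and the early stop on a short page are assumptions; the only detail taken from the text is the limit of 3 queries. The page fetcher is injected so the sketch runs without network access.

```python
from urllib.parse import quote

API_ROOT = "https://api.github.com"
MAX_QUERIES = 3   # the cap stated in the text
PER_PAGE = 100    # assumption: request the largest page size


def events_url(username, page):
    """Build the URL for one page of a user's public events."""
    return f"{API_ROOT}/users/{quote(username, safe='')}/events?per_page={PER_PAGE}&page={page}"


def collect_events(username, fetch_page):
    """Collect events using at most MAX_QUERIES calls to fetch_page(url) -> list."""
    events = []
    for page in range(1, MAX_QUERIES + 1):
        batch = fetch_page(events_url(username, page))
        events.extend(batch)
        if len(batch) < PER_PAGE:
            # A short page means there is nothing further to request.
            break
    return events


if __name__ == "__main__":
    # Fake fetcher that always returns a full page, to show the cap in action.
    calls = []

    def fake_fetch(url):
        calls.append(url)
        return [{"id": i} for i in range(PER_PAGE)]

    print(len(collect_events("bors", fake_fetch)), len(calls))
```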

Installation

BotHunter has many dependencies, so to avoid conflicts with packages that are already installed, it is recommended to install and run it inside a Python virtual environment. Any virtual environment tool will do; the steps below use virtualenv.

Install virtualenv with:

pip install virtualenv

Create a virtual environment in the folder where you want to place your files:

virtualenv <name>

Activate the environment with:

source <name>/bin/activate

After running this command, your command-line prompt will change to (<name>) ...

Now you can fork the BotHunter repository from 'https://github.com/natarajan-chidambaram/BotHunter' and clone it to your local system.

In the terminal, navigate to the location where BotHunter was cloned:

cd <BotHunter location>

Then install the BotHunter dependencies from the provided requirements.txt with pip:

pip install -r requirements.txt

When you have finished running the script, you can exit the environment with:

deactivate

Usage

To execute BotHunter, you need to provide a GitHub personal access token (API key). You can follow GitHub's documentation on creating a personal access token to obtain one.
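For context on what the token is used for, the sketch below builds an authenticated GitHub API request with the standard library. Reading the token from an environment variable named `GH_TOKEN` and the helper names are assumptions made for illustration; BotHunter itself receives the token through its --key parameter.

```python
import os
from urllib.request import Request


def token_from_env(var_name="GH_TOKEN"):
    """Read the token from an environment variable (the variable name is an assumption)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"set {var_name} to a GitHub personal access token")
    return token


def github_request(url, token):
    """Build (but do not send) a GitHub API request that carries the token."""
    return Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = github_request("https://api.github.com/users/bors", "<GH_TOKEN>")
    print(req.full_url, req.get_header("Authorization"))
```

Keeping the token in an environment variable rather than typing it inline also keeps it out of the shell history.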

Parameters List:

--key <APIKEY> GitHub personal access token (key) required to extract data from the GitHub API

--repo <REPO_OWNER/REPO_NAME> Name of the GitHub repository whose contributors, as listed at 'https://github.com/repo_owner/repo_name/graphs/contributors', are to be classified

Example: $ python BotHunter.py --key <GH_TOKEN> --repo <REPO_OWNER/REPO_NAME>

--file-repo <file containing multiple REPO_OWNER/REPO_NAMEs> A file containing the names of GitHub repositories (one name per line); the contributors listed at 'https://github.com/repo_owner/repo_name/graphs/contributors' for each of those repositories are classified

Example: $ python BotHunter.py --key <GH_TOKEN> --file-repo <FILE_NAME>

--u <USERNAME> The username for which the type needs to be determined

Example: $ python BotHunter.py --key <GH_TOKEN> --u <USERNAME>

--file-u <file containing multiple USERNAMEs> A file containing usernames (one username per line) for which the type needs to be determined

Example: $ python BotHunter.py --key <GH_TOKEN> --file-u <FILE_NAME>

--csv <FILE_NAME>.csv Name of the CSV file in which to save the predictions

Example: $ python BotHunter.py --key <GH_TOKEN> --file-u <FILE_NAME> --csv <FILE_NAME>.csv

Note: Only one of --repo or --u can be given as input along with --key.
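The parameter list above maps naturally onto `argparse`. The sketch below is not BotHunter's actual parser; it only reuses the flag names listed above. Placing all four input flags in one mutually exclusive group is an assumption that extends the note about --repo and --u to the file variants.

```python
import argparse


def build_parser():
    """Illustrative parser that mirrors the documented flags (not BotHunter's own code)."""
    parser = argparse.ArgumentParser(prog="BotHunter.py")
    parser.add_argument("--key", required=True, metavar="APIKEY",
                        help="GitHub personal access token")
    # Assumption: exactly one of the four input flags must be given.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--repo", metavar="REPO_OWNER/REPO_NAME",
                        help="repository whose contributors are classified")
    target.add_argument("--file-repo", metavar="FILE",
                        help="file with one repository name per line")
    target.add_argument("--u", metavar="USERNAME",
                        help="single account to classify")
    target.add_argument("--file-u", metavar="FILE",
                        help="file with one username per line")
    parser.add_argument("--csv", metavar="FILE",
                        help="file in which to save the predictions")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--key", "<GH_TOKEN>", "--u", "bors"])
    print(args)
```

With this layout, passing both --repo and --u makes `argparse` exit with a usage error instead of silently picking one of them.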

Docker

You can also run BotHunter using Docker. To do so, Docker must be installed on your system; you can follow the official Docker documentation to install it.

After installing Docker, you can build the Docker image using the following command:

docker build -t bothunter .

After building the image, you can run the Docker container using the following command:

docker run --rm bothunter --key <GH_TOKEN> <OTHER_ARGUMENTS>

To retrieve the output of the argument --csv, bind the current directory with the container working directory using the following command:

docker run --rm -v `pwd`:`pwd` bothunter --key <GH_TOKEN> <OTHER_ARGUMENTS> --csv <FILE_NAME>.csv

Examples

$ python BotHunter.py --key <GH_TOKEN> --u bors
contributor   type
       bors    Bot
$ python BotHunter.py --key <GH_TOKEN> --file-u usernames.txt
          contributor    type
natarajan-chidambaram   Human
                 bors     Bot
           rust-timer     Bot

$ python BotHunter.py --key <GH_TOKEN> --repo natarajan-chidambaram/BotHunter
         contributor     type
natarajan-chidambaram   Human
$ python BotHunter.py --key <GH_TOKEN> --file-repo reponames.txt
         contributor     type
natarajan-chidambaram   Human
                 bors     Bot
           rust-timer     Bot
           dependabot     Bot
         <anonymised>   Human
         <anonymised>   Human
$ python BotHunter.py --key <GH_TOKEN> --file-u usernames.txt --csv predictions.csv
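The usernames.txt and reponames.txt files used in these examples hold one name per line. A minimal sketch of reading such a file follows; skipping blank lines and trimming whitespace are assumptions, and the helper names are hypothetical rather than taken from BotHunter.

```python
from pathlib import Path


def parse_names(text):
    """Split text into names, one per line, ignoring blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_names(path):
    """Read a names file such as usernames.txt or reponames.txt."""
    return parse_names(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Illustrative content matching the usernames shown in the examples above.
    print(parse_names("natarajan-chidambaram\nbors\n\nrust-timer\n"))
```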

License

This project is distributed under the parent repository's license, LGPL-2.1.

Contributors

stefanostone, natarajan-chidambaram, ahmad-abdellatif
