


Home Page: http://arxiv.org/abs/2006.06069

License: Apache License 2.0


Nash-Detect

Code for KDD 2020 paper Robust Spammer Detection by Nash Reinforcement Learning.
Yingtong Dou, Guixiang Ma, Philip S. Yu, Sihong Xie.
[Paper][Slides][Video][Toolbox][Chinese Blog]

Overview



Nash-Detect is an algorithm, proposed in the above paper, that trains a robust spam review detector using reinforcement learning. The robust detector is composed of five base detectors and is trained by playing a minimax game between the spammer and the defender. The spammer combines five base spamming strategies into a mixed spamming strategy.
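The minimax game described above can be illustrated, in a heavily simplified form, as a zero-sum matrix game: the spammer mixes over five base attacks, the defender mixes over five base detectors, and each payoff entry is the spammer's gain. This sketch is not the Nash-Detect training procedure in training.py. It swaps in multiplicative-weights (Hedge) self-play, a standard way to approximate an equilibrium of such a game, and it treats the defender as choosing a mixture over detectors rather than a detector configuration. The payoff matrix and the names `hedge_selfplay` and `game_value` are hypothetical.

```python
import math
import random


def normalize(weights):
    """Rescale non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def hedge_selfplay(payoff, rounds=5000, eta=0.05):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    payoff[i][j] is the spammer's gain (equivalently, the defender's loss)
    when the spammer plays attack i and the defender plays detector j.
    Both players run multiplicative weights (Hedge); their time-averaged
    mixed strategies approach an equilibrium as `rounds` grows.
    """
    n_att, n_det = len(payoff), len(payoff[0])
    p = [1.0 / n_att] * n_att  # spammer's mixed strategy over attacks
    q = [1.0 / n_det] * n_det  # defender's mixed strategy over detectors
    avg_p = [0.0] * n_att
    avg_q = [0.0] * n_det
    for _ in range(rounds):
        avg_p = [a + x / rounds for a, x in zip(avg_p, p)]
        avg_q = [a + y / rounds for a, y in zip(avg_q, q)]
        # Expected gain of each pure attack against the defender's current mix.
        att_gain = [sum(payoff[i][j] * q[j] for j in range(n_det)) for i in range(n_att)]
        # Expected loss of each pure detector against the spammer's current mix.
        det_loss = [sum(payoff[i][j] * p[i] for i in range(n_att)) for j in range(n_det)]
        # The spammer (maximizer) shifts mass toward profitable attacks; the
        # defender (minimizer) shifts mass away from detectors that lose most.
        # Renormalizing every round keeps the weights from overflowing.
        p = normalize([x * math.exp(eta * gain) for x, gain in zip(p, att_gain)])
        q = normalize([y * math.exp(-eta * loss) for y, loss in zip(q, det_loss)])
    return avg_p, avg_q


def game_value(payoff, p, q):
    """Expected spammer gain when both players use the given mixed strategies."""
    return sum(p[i] * payoff[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


if __name__ == "__main__":
    # Hypothetical 5x5 payoff matrix (5 attacks x 5 detectors) filled with
    # random values for illustration only; these are NOT results from the paper.
    rng = random.Random(0)
    payoff = [[rng.random() for _ in range(5)] for _ in range(5)]
    spammer_mix, defender_mix = hedge_selfplay(payoff)
    print("spammer mix:", [round(x, 3) for x in spammer_mix])
    print("defender mix:", [round(y, 3) for y in defender_mix])
    print("approx. game value:", round(game_value(payoff, spammer_mix, defender_mix), 3))
```

Because both players run a no-regret algorithm in a zero-sum game, their averaged strategies form an approximate equilibrium; a smaller `eta` combined with more rounds tightens the approximation.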

This repo includes the spamming attack implementation and generation code, the detector implementation code, and the training & testing code for Nash-Detect and all baselines.

Note that we only investigate shallow graph-based and behavior-based spam detectors in this paper; no text features or deep neural networks are involved. Nonetheless, nothing prevents applying Nash-Detect to train robust neural-network or text-based spam detectors.

Setup

To run the code, you need the Yelp Spam Review Datasets. Please send an email titled "Yelp Dataset Request" to [email protected] to obtain the file with metadata and ground truth, then unzip it under the root directory of the project.

You can download the project and install the required packages using the following commands:

git clone https://github.com/YingtongDou/Nash-Detect.git
cd Nash-Detect
pip3 install -r requirements.txt

The code requires Python 3.6 or later.
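If you want a script to fail fast on an older interpreter, a guard like the following could run before anything else. `check_python` and `MIN_PYTHON` are hypothetical names; the repository is not known to include such a check.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the README


def check_python(version_info=sys.version_info):
    """Raise RuntimeError if the interpreter is older than MIN_PYTHON."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(
            "Nash-Detect requires Python {}.{} or later, found {}".format(
                MIN_PYTHON[0], MIN_PYTHON[1], found
            )
        )
    return True


if __name__ == "__main__":
    check_python()  # checks the running interpreter
```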

Running

  1. Run attack_generation.py with mode = "Training" to generate fake reviews for training
  2. Run worst_case.py to compute the worst-case performance of single attacks vs. single detectors
  3. Run training.py to train a robust detector configuration using Nash-Detect
  4. Run attack_generation.py with mode = "Testing" to generate fake reviews for testing
  5. Run testing.py to test the performance of the optimal detector trained by Nash-Detect and other baselines

To facilitate training and testing, we have stored all generated fake reviews in the /Training and /Testing directories, so you can skip Steps 1 and 4 and run the game (training) and evaluation code directly. You can also run each single detector with the corresponding eval_XXX.py script under the /Detector directory, or with our UGFraud toolbox.
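The five steps, including the option to skip Steps 1 and 4, can be scripted as a small driver. This is a sketch under two assumptions: it is run from the repository root, and `mode` is set by editing a variable in attack_generation.py rather than passed as a command-line flag (the README does not say which). The driver therefore only prints a reminder for Steps 1 and 4. `run_pipeline`, `PIPELINE`, and `GENERATION_STEPS` are hypothetical names.

```python
import subprocess
import sys

# Steps 1-5 from the README, in order: (step number, script, note).
# Assumption: `mode` is edited inside attack_generation.py before Steps 1
# and 4; no command-line flag for it is documented.
PIPELINE = [
    (1, "attack_generation.py", 'set mode = "Training": generate training fake reviews'),
    (2, "worst_case.py", "worst-case single attack vs. single detector"),
    (3, "training.py", "train the robust detector configuration"),
    (4, "attack_generation.py", 'set mode = "Testing": generate testing fake reviews'),
    (5, "testing.py", "evaluate Nash-Detect and the baselines"),
]
GENERATION_STEPS = {1, 4}


def run_pipeline(skip_generation=False, dry_run=True):
    """Run (or, when dry_run is True, only print) the README steps.

    Returns the list of scripts scheduled, in execution order.
    """
    scheduled = []
    for step, script, note in PIPELINE:
        if skip_generation and step in GENERATION_STEPS:
            continue  # fake reviews are already stored in /Training and /Testing
        cmd = [sys.executable, script]
        print("Step {}: {}  # {}".format(step, " ".join(cmd), note))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
        scheduled.append(script)
    return scheduled


if __name__ == "__main__":
    run_pipeline(skip_generation=True)  # dry run over the stored fake reviews
```

With `skip_generation=False` and `dry_run=False`, the driver would run attack_generation.py twice with whatever `mode` the file currently holds, so the skip option is the safer default for a single unattended run.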

The experimental settings and model parameters can be found at the beginning of the main functions of training.py and testing.py.

Repo Structure

The repository is organized as follows:

  • Attack/ contains the implementations of four spamming attack strategies (the fifth, the Singleton attack, is implemented in attack_generation.py);
  • Detector/ contains the implementations and evaluations of five spam detectors;
  • Testing/ contains generated fake reviews for testing;
  • Training/ contains generated fake reviews for training;
  • Utils/ contains:
    • functions for loading graphs/features from dataset/manifest files (iohelper.py);
    • utility functions for training and testing (eval_helper.py);
    • functions for extracting and updating features and prior beliefs (yelpFeatureExtraction.py);
    • the manifest file for features (feature_configuration.py).

Citation

@inproceedings{dou2020robust,
  title={Robust Spammer Detection by Nash Reinforcement Learning},
  author={Dou, Yingtong and Ma, Guixiang and Yu, Philip S and Xie, Sihong},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  year={2020}
}
