satoshizpy / deep_spammer_detection_gcn

This project forked from mikel-brostrom/deep_spammer_detection_gcn

An introduction to graph convolutional neural networks with PyTorch Geometric

Home Page: https://github.com/rusty1s/pytorch_geometric

Deep_Spammer_Detection_GCN

The goal of this project is to classify whether each node in a graph is a spammer or not.

Introduction

A large amount of data in practical tasks naturally comes in the form of irregular, non-Euclidean structures, e.g. graphs or meshes. Many real-world representations, such as social networks, maps and molecular structures, take this form. Graph convolutional networks (GCNs) transfer the high performance of traditional convolutional neural networks to this kind of data. This set of methods has recently been brought together under the term geometric deep learning.

The input data to GCNs

A graph convolutional network takes a graph G = (V, E) as input, represented as follows:

  • Node features: N x D, where N is the number of nodes and D is the number of features per node
  • Graph connectivity (edge index): 2 x L, where L is the number of edges in the graph; each column holds the source and target node of one edge
  • Edge attributes: L x R, where R is the number of features per edge
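As a small illustration of these shapes, the NumPy sketch below builds the three arrays for a made-up toy graph with N = 4 nodes, D = 2 node features, L = 5 directed edges and R = 1 edge feature. The values are invented for the example; in PyTorch Geometric the same arrays would be stored as tensors in a `torch_geometric.data.Data` object under the names `x`, `edge_index` and `edge_attr`.

```python
import numpy as np

# Node feature matrix: N x D (4 nodes, 2 features per node; toy values)
x = np.array([
    [0.9, 1.0],
    [0.1, 0.0],
    [0.2, 0.5],
    [0.8, 0.3],
], dtype=np.float32)

# Graph connectivity in COO format: 2 x L.
# Column k is one directed edge from edge_index[0, k] (source)
# to edge_index[1, k] (target).
edge_index = np.array([
    [0, 0, 1, 2, 3],
    [1, 2, 0, 3, 0],
], dtype=np.int64)

# Edge attributes: L x R (one feature per edge, e.g. bytes transferred)
edge_attr = np.array([[0.7], [0.9], [0.1], [0.2], [0.8]], dtype=np.float32)

N, D = x.shape
L = edge_index.shape[1]
R = edge_attr.shape[1]

# Every edge needs exactly one attribute row, and indices must refer to existing nodes
assert edge_attr.shape[0] == L
assert edge_index.min() >= 0 and edge_index.max() < N

print(N, D, L, R)  # 4 2 5 1
```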

The convolutional operations

Node features are extracted by convolution operators that aggregate node features in local neighborhoods, weighted by a trainable, continuous kernel function. An illustration of this kind of spatial aggregation with trainable, continuous kernel functions on graph representations of images can be found in https://arxiv.org/pdf/1711.08920.pdf.
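The aggregation described above can be sketched in NumPy as follows. This is a simplified illustration, not PyTorch Geometric's implementation: it assumes each edge carries a single pseudo-coordinate u in [0, 1] (in PyTorch Geometric's SplineConv the edge attributes play roughly this role), uses K trainable weight matrices placed at evenly spaced knots, and blends the two enclosing matrices with a piecewise-linear (degree-1 B-spline) basis. Messages are averaged over incoming edges and a separate root weight is applied to each node's own features. The function names are hypothetical.

```python
import numpy as np

def linear_spline_weights(u, weights):
    """Interpolate a kernel weight matrix for a pseudo-coordinate u in [0, 1].

    weights has shape (K, D_in, D_out): K trainable matrices at evenly spaced
    knots. A degree-1 B-spline basis blends the two knots that enclose u.
    """
    K = weights.shape[0]
    pos = np.clip(u, 0.0, 1.0) * (K - 1)
    k0 = min(int(np.floor(pos)), K - 2)  # left knot (requires K >= 2)
    frac = pos - k0                      # relative position between the two knots
    return (1.0 - frac) * weights[k0] + frac * weights[k0 + 1]

def spline_conv_1d(x, edge_index, pseudo, weights, root_weight):
    """One simplified spline-convolution layer with mean aggregation.

    x: (N, D_in) node features, edge_index: (2, L) source/target indices,
    pseudo: (L,) pseudo-coordinates in [0, 1], weights: (K, D_in, D_out),
    root_weight: (D_in, D_out) applied to each node's own features.
    """
    N = x.shape[0]
    D_out = weights.shape[2]
    agg = np.zeros((N, D_out))
    count = np.zeros(N)
    for (src, dst), u in zip(edge_index.T, pseudo):
        # Message from neighbour src to node dst, weighted by the continuous kernel
        agg[dst] += x[src] @ linear_spline_weights(u, weights)
        count[dst] += 1
    # Mean over incoming edges; nodes without neighbours keep a zero message
    agg /= np.maximum(count, 1)[:, None]
    return agg + x @ root_weight
```

Because the kernel is a continuous function of u, edges with similar attributes receive similar weights, and only the K knot matrices (plus the root weight) need to be learned.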

Requirements

Python 3.7 or later with

  • numpy
  • torch-scatter
  • torch-sparse
  • torch-cluster
  • torch-spline-conv
  • torch-geometric

Make sure that the installed versions of these packages are compatible with your PyTorch version.
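A quick way to see which of the requirements are present is a small standard-library script like the one below. It is a hypothetical helper, not part of this repository: it maps the pip names above to their import names (dashes become underscores), adds `torch` itself, and only checks that each module can be found. It does not verify that the compiled extensions match your PyTorch build.

```python
import importlib.util
import sys

# Import names of the packages listed above, plus PyTorch itself
REQUIRED_MODULES = [
    "numpy",
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
]

def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7 or later is required")
    for name, found in check_environment().items():
        print(f"{name:20s} {'found' if found else 'MISSING'}")
```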

The generated data

We generate a synthetic graph that simulates an internet network containing spammer and non-spammer nodes.

Spammers share the following characteristics:

  • They have more edges (connections to different computers)
  • The edges to the nodes they are connected to have a high value (representing the number of bytes they send by email to their neighbours)
  • Each spammer has a high node value (representing the level of certainty that it is a spammer). The more likely a node is to be a spammer, the closer this value is to 1.

Non-spammers have the opposite characteristics.
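The rules above can be turned into a toy generator such as the NumPy sketch below. It is a hypothetical stand-in for the repository's data loader: the function name, the spammer ratio, the degree ranges and the value ranges are all assumptions chosen only to make spammers and non-spammers differ in the three ways listed.

```python
import numpy as np

def make_spam_graph(num_nodes=100, spam_ratio=0.2, seed=0):
    """Generate a hypothetical toy spam graph following the rules above.

    Returns x (N x 1 node value), edge_index (2 x L), edge_attr (L x 1 edge
    value) and y (N,) labels with 1 = spammer, 0 = non-spammer.
    """
    rng = np.random.default_rng(seed)
    y = (rng.random(num_nodes) < spam_ratio).astype(np.int64)

    # Node value: close to 1 for spammers, close to 0 for non-spammers
    x = np.where(y == 1,
                 rng.uniform(0.7, 1.0, num_nodes),
                 rng.uniform(0.0, 0.3, num_nodes)).reshape(-1, 1)

    sources, targets, values = [], [], []
    for node in range(num_nodes):
        # Spammers connect to more computers than non-spammers
        degree = rng.integers(8, 15) if y[node] == 1 else rng.integers(1, 4)
        degree = min(degree, num_nodes - 1)
        candidates = np.delete(np.arange(num_nodes), node)  # no self-loops
        neighbours = rng.choice(candidates, size=degree, replace=False)
        for nb in neighbours:
            # Edge value (bytes sent): high on edges created by a spammer
            value = rng.uniform(0.7, 1.0) if y[node] == 1 else rng.uniform(0.0, 0.3)
            # Store both directions so the graph is undirected
            # (duplicate edges are possible if two nodes pick each other)
            sources += [node, nb]
            targets += [nb, node]
            values += [value, value]

    edge_index = np.array([sources, targets], dtype=np.int64)
    edge_attr = np.array(values, dtype=np.float32).reshape(-1, 1)
    return x.astype(np.float32), edge_index, edge_attr, y
```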

Training

train.py runs the training on the graph generated by the dataloader:

python3 train.py

Training output example:

...
Epoch: 024, train_loss: 0.185, test_loss:0.192, train_acc: 0.99, test_acc: 0.95
Epoch: 025, train_loss: 0.138, test_loss:0.151, train_acc: 0.99, test_acc: 0.95
Epoch: 026, train_loss: 0.099, test_loss:0.117, train_acc: 0.99, test_acc: 0.95
Epoch: 027, train_loss: 0.068, test_loss:0.091, train_acc: 1.00, test_acc: 0.95
Epoch: 028, train_loss: 0.046, test_loss:0.067, train_acc: 1.00, test_acc: 1.00
Epoch: 029, train_loss: 0.030, test_loss:0.048, train_acc: 1.00, test_acc: 1.00
Epoch: 030, train_loss: 0.019, test_loss:0.035, train_acc: 1.00, test_acc: 1.00
Epoch: 031, train_loss: 0.011, test_loss:0.024, train_acc: 1.00, test_acc: 1.00
Epoch: 032, train_loss: 0.007, test_loss:0.016, train_acc: 1.00, test_acc: 1.00
Epoch: 033, train_loss: 0.004, test_loss:0.010, train_acc: 1.00, test_acc: 1.00
Epoch: 034, train_loss: 0.002, test_loss:0.006, train_acc: 1.00, test_acc: 1.00
Epoch: 035, train_loss: 0.001, test_loss:0.004, train_acc: 1.00, test_acc: 1.00
Epoch: 036, train_loss: 0.001, test_loss:0.002, train_acc: 1.00, test_acc: 1.00
Epoch: 037, train_loss: 0.001, test_loss:0.002, train_acc: 1.00, test_acc: 1.00
Epoch: 038, train_loss: 0.000, test_loss:0.001, train_acc: 1.00, test_acc: 1.00
Epoch: 039, train_loss: 0.000, test_loss:0.001, train_acc: 1.00, test_acc: 1.00
Epoch: 040, train_loss: 0.000, test_loss:0.001, train_acc: 1.00, test_acc: 1.00
Epoch: 041, train_loss: 0.000, test_loss:0.000, train_acc: 1.00, test_acc: 1.00
...
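Each log line reports the loss and accuracy on two disjoint sets of nodes from the same graph. A common way to do this for node classification, assumed here rather than taken from train.py, is to keep boolean train and test masks over the nodes and evaluate the model's per-node log-probabilities separately on each mask. The sketch below does that in NumPy and formats the result like the log above; `evaluate` is a hypothetical helper name.

```python
import numpy as np

def masked_nll(log_probs, y, mask):
    """Mean negative log-likelihood of the true class over the masked nodes."""
    lp, t = log_probs[mask], y[mask]
    return float(-lp[np.arange(len(t)), t].mean())

def masked_accuracy(log_probs, y, mask):
    """Fraction of masked nodes whose most likely class equals the label."""
    return float((log_probs[mask].argmax(axis=1) == y[mask]).mean())

def evaluate(log_probs, y, train_mask, test_mask, epoch):
    """Compute the four logged numbers and a log line for one epoch."""
    train_loss = masked_nll(log_probs, y, train_mask)
    test_loss = masked_nll(log_probs, y, test_mask)
    train_acc = masked_accuracy(log_probs, y, train_mask)
    test_acc = masked_accuracy(log_probs, y, test_mask)
    line = (f"Epoch: {epoch:03d}, train_loss: {train_loss:.3f}, test_loss:{test_loss:.3f}, "
            f"train_acc: {train_acc:.2f}, test_acc: {test_acc:.2f}")
    return train_loss, test_loss, train_acc, test_acc, line
```

Only the train mask contributes to the gradient during training; the test mask is used purely for reporting, which is why test_loss usually stays slightly above train_loss in the log.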

The network

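We use a stack of spline convolutions with exponential linear units (ELU) as activations and dropout for regularization. The loss function is the negative log-likelihood (nll_loss) applied to log-softmax outputs, which works for classification with an arbitrary number of classes.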
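The non-convolutional pieces of this network can be written out in a few lines of NumPy to show what they compute. This is an illustrative sketch, not the repository's code (which would use the PyTorch equivalents): ELU, inverted dropout that is disabled at evaluation time, a numerically stable log-softmax, and the negative log-likelihood, which only picks out the log-probability of each node's true class and therefore does not depend on how many classes there are.

```python
import numpy as np

def elu(z, alpha=1.0):
    """Exponential linear unit: z for z > 0, alpha * (exp(z) - 1) otherwise."""
    # np.minimum avoids overflow in exp for large positive inputs
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0)) - 1))

def dropout(h, p, rng, training=True):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    if not training or p == 0.0:
        return h
    keep = rng.random(h.shape) >= p
    return h * keep / (1.0 - p)

def log_softmax(logits):
    """Numerically stable log-softmax over the class dimension (axis 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, targets):
    """Mean negative log-likelihood; targets are integer class indices."""
    return float(-log_probs[np.arange(len(targets)), targets].mean())
```

Applying nll_loss to log_softmax outputs is equivalent to the usual cross-entropy loss, so the last layer only needs to produce one logit per class (two for spammer versus non-spammer).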
