This project forked from benedekrozemberczki/egosplitting


A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).

Home Page: https://karateclub.readthedocs.io/

License: GNU General Public License v3.0


Ego-Splitting Framework



Abstract

We propose a new framework called Ego-Splitting for detecting clusters in complex networks which leverage the local structures known as ego-nets (i.e. the subgraph induced by the neighborhood of each node) to de-couple overlapping clusters. Ego-Splitting is a highly scalable and flexible framework, with provable theoretical guarantees, that reduces the complex overlapping clustering problem to a simpler and more amenable non-overlapping (partitioning) problem. We can solve community detection in graphs with tens of billions of edges and outperform previous solutions based on ego-nets analysis.

More precisely, our framework works in two steps: a local ego-net analysis phase, and a global graph partitioning phase. In the local step, we first partition the nodes’ ego-nets using a partitioning algorithm. We then use the computed clusters to split each node into its persona nodes that represent the instantiations of the node in its communities. Then, in the global step, we partition the newly created graph to obtain an overlapping clustering of the original graph.
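The two steps above can be sketched in a few lines of NetworkX. This is a minimal illustration, not the repository's code: it assumes a simple undirected graph without self-loops, and it swaps in connected components as the partitioning algorithm for both the local and the global step so that the result is deterministic. The function name `ego_split` and the variable names are hypothetical.

```python
import networkx as nx


def ego_split(graph):
    """Map each node to the set of cluster ids it belongs to.

    Simplification: connected components are used as the partitioning
    algorithm in both the local and the global step.
    """
    persona_of = {}  # (node, neighbour) -> persona of `node` that `neighbour` attaches to
    owner = {}       # persona id -> original node
    next_id = 0

    # Local step: partition the ego-net of every node (its neighbours, without the node).
    for node in graph.nodes():
        ego_net = graph.subgraph(graph.neighbors(node))
        components = list(nx.connected_components(ego_net))
        if not components:  # isolated node: keep a single persona
            components = [set()]
        for component in components:
            owner[next_id] = node
            for neighbour in component:
                persona_of[(node, neighbour)] = next_id
            next_id += 1

    # Persona graph: every original edge links the two matching personas.
    persona_graph = nx.Graph()
    persona_graph.add_nodes_from(owner)
    for u, v in graph.edges():
        persona_graph.add_edge(persona_of[(u, v)], persona_of[(v, u)])

    # Global step: partition the persona graph and map personas back to nodes.
    memberships = {node: set() for node in graph.nodes()}
    for cluster_id, cluster in enumerate(nx.connected_components(persona_graph)):
        for persona in cluster:
            memberships[owner[persona]].add(cluster_id)
    return memberships


# Two triangles that share node 2.
toy_graph = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)])
memberships = ego_split(toy_graph)
print({node: sorted(clusters) for node, clusters in memberships.items()})
```

In the toy graph, node 2 is split into two personas, one per triangle, so it ends up in two clusters while every other node belongs to exactly one.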

This repository provides a lightweight NetworkX implementation of Ego-splitting as described in the paper:

Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters. Alessandro Epasto, Silvio Lattanzi, and Renato Paes Leme. KDD, 2017. [Paper]

A reference implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
pandas            0.23.4
texttable         1.5.0
argparse          1.1.0
python-louvain    0.13.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for `Facebook Politicians` and `Facebook TV Shows` are included in the `input/` directory.
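As an illustration of this format, the sketch below reads such a file with the standard library only. It is not the loader used by the repository: the helper name `read_edge_list`, the header names `node_1,node_2`, and the in-memory sample are all assumptions made for the example.

```python
import csv
import io


def read_edge_list(handle):
    """Read (source, target) integer pairs from a CSV stream, skipping the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # the first row is a header
    return [(int(row[0]), int(row[1])) for row in reader if row]


# Hypothetical sample in the described format; the header names are an assumption.
sample = io.StringIO("node_1,node_2\n0,1\n0,2\n1,2\n")
edges = read_edge_list(sample)

# Check the indexing convention: the smallest node id should be 0.
nodes = {node for edge in edges for node in edge}
starts_at_zero = min(nodes) == 0
print(edges, starts_at_zero)
```

For a real file, the same function can be called on `open(path, newline="")`, and the resulting list can be turned into a graph with `networkx.from_edgelist(edges)`.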

Options

Training an Ego-splitter model is handled by the `src/main.py` script, which provides the following command line arguments.

Input and output options

  --edge-path       STR     Edge list csv.            Default is `input/tvshow_edges.csv`.
  --output-path     STR     Membership json.          Default is `output/tvshow_cluster_memberships.json`.
  --resolution      FLOAT   Louvain resolution.       Default is 1.0.

Examples

The following commands create an overlapping community assignment by ego-net splitting. Training a model on the default dataset.

python src/main.py

Training a model with a higher resolution.

python src/main.py --resolution 2.5

Training a model with a lower resolution.

python src/main.py --resolution 0.5

Training a model on the Facebook TV shows dataset.

python src/main.py --edge-path input/tvshow_edges.csv --output-path output/tvshow_cluster_memberships.json
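The flags used in the commands above could be parsed along the following lines. This is a hypothetical sketch built from the documented defaults, not the parser shipped in `src/main.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options; not the repository's code."""
    parser = argparse.ArgumentParser(description="Run Ego-splitting.")
    parser.add_argument("--edge-path", default="input/tvshow_edges.csv",
                        help="Edge list csv.")
    parser.add_argument("--output-path", default="output/tvshow_cluster_memberships.json",
                        help="Membership json.")
    parser.add_argument("--resolution", type=float, default=1.0,
                        help="Resolution value.")
    return parser


# Equivalent to: python src/main.py --resolution 2.5
args = build_parser().parse_args(["--resolution", "2.5"])
print(args.edge_path, args.output_path, args.resolution)
```

Options that are not passed on the command line fall back to their defaults, which is why the first example command needs no arguments.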
