Giter Site home page Giter Site logo

frequentsubgraphmining's Introduction

Gaston Graph Mining with Python
This is a python implementation of the Gaston graph mining algorithm.

Author: Colin Conduff

"Gaston finds all frequent subgraphs by using a level-wise approach in which first simple paths are considered, 
then more complex trees and finally the most complex cyclic graphs. It appears that in practice most frequent 
graphs are not actually very complex structures; Gaston uses this quickstart observation to organize the search 
space efficiently. To determine the frequency of graphs, Gaston employs an occurrence list based approach in 
which all occurrences of a small set of graphs are stored in main memory." 
- Siegfried Nijssen (Gaston author)

Software Requirements:
 - python 3.2 or later
 - networkx 1.11
 - matplotlib

 Installation:
 `cd gaston_py`
 `pip install .`

Command Line Interface:
For usage instructions, use the following command.
`gaston -h`

Examples:
`gaston 0.95 test_files/medium_chemical.txt -o output_files/ -c -t`
`gaston 6 test_files/medium_chemical.txt`
`gaston 2 test_files/Chemical_340.txt`

Notes: 
 - Support is defined as frequency(subgraph) / count(graphs). See reference [1] below for details.
 - If an output directory is provided: 
     * Frequent subgraphs are drawn using matplotlib and saved under [output folder]/graphs/.
     * A `line_graph.txt` file is generated containing the frequent subgraphs in Line Graph format.
 - Test files are available in the `test_files` directory.  The Chemical_340 dataset was obtained 
     from Nijssen and Kok's website.

Run Tests:
`python setup.py test`

Performance:
 - This Python implementation is significantly slower than the original implementation 
      in C++ by Nijssen and Kok, as well as the Java implementation by the ParSeMis library.
 - It is also much slower than gSpan impelementations in Python.  This is likely due
      to the use of an occurrence list, which tracks all subgraphs that have been seen to 
      ensure the correctness of the algorithm.

References:
[1] Siegfried Nijssen and Joost Kok. A Quickstart in Frequent Structure Mining Can 
  Make a Difference. Proceedings of the SIGKDD, 2004.
http://liacs.leidenuniv.nl/~nijssensgr/gaston/index.html

Additional Resources:
Java implementation of the Gaston algorithm
https://www2.cs.fau.de/EN/research/zold/ParSeMiS/index.html
https://github.com/timtadh/parsemis

frequentsubgraphmining's People

Contributors

colinconduff avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.