pmacg / py-sgtl Goto Github PK

7.0 7.0 2.0 118 KB

A library of spectral graph theory methods in python.

License: MIT License

Python 99.06% Shell 0.94%

py-sgtl's Introduction

I’m a Lecturer in the School of Computer Science at the University of St Andrews. My research interests are broadly in algorithms for data science and machine learning and I always aim to develop theoretically sound algorithms which work well in practice. Please see my website for more information on my work.

If you'd like help with any of my work, or you'd just like to connect, then please do get in touch using the contact details on my website.

py-sgtl's People

Contributors

Stargazers

Watchers

Forkers

yonmaor rasmita404

py-sgtl's Issues

Generating graphs from the SBM is slow

The methods for generating graphs from the stochastic block model are a little slow. It would be nice to improve the implementation to speed them up.

Add a method for computing the number of connected components in a graph

It would be useful to have a single method for computing the number of connected components in a graph. Possibly by finding the number of eigenvalues of the Laplacian equal to 0 - might not be fast though!

Probably better to use a 'proper' connected components algorithm :) It's basically just repeated graph traversals.

Add method for constructing Margulis–Gabber–Galil expander graphs

Add a method to the graph module which constructs a Margulis–Gabber–Galil graph of order n.

Add method for constructing a barbell graph

Along with the existing standard graphs available, add a new method for constructing the barbell graph.

Graph methods should give meaningful errors when vertices out of range

Currently, if you ask for the volume of a set of vertices, but the set includes indices which are greater than the number of vertices in the graph, the code throws an IndexError from somewhere inside the graph code.

This is not totally unreasonable, but it would help debugging this kind of error if it gave a clear error message. Something like: "Vertex set includes indices larger than number of vertices".

Add spectrum method to graph

Add the methods

adjacency_spectrum()
laplacian_spectrum()
normalised_laplacian_spectrum()

To the graph object for returning the eigenvalues of the corresponding graph matrices.

Constructing the KNN graph should allow a sparse data matrix

Currently, when passing a sparse data matrix to the knn graph method, an error is thrown. This shouldn't happen.

Create labelled graph object

Add an object which extends the graph class to include labels for each vertex. Consider allowing arbitrary data associated with each vertex - maybe a dictionary of key-value pairs.

Look into behaviour of methods on graphs with negative weights

It may be sometimes useful to use graphs with negative weights. The library is not currently designed with this in mind, but many things might 'just work'. This issue covers investigating formally the behaviour of the library on graphs with negative weights.

Graph matrices are not symmetric for an undirected graph

For a large undirected graph generated from the stochastic block model, the generated graph matrices are not always symmetric. For example:

>>> big_graph = sgtl.random.ssbm(1000, 5, 0.8, 0.2)
>>> lap_mat = big_graph.normalised_laplacian_matrix()
>>> lap_mat_dense = lap_mat.toarray()
>>> np.allclose(lap_mat_dense, lap_mat_dense.T)
False

I haven't debugged this yet, but we should certainly be able to assume that the graph matrices are symmetric for an undirected graph.

Add Trevisan's max-cut approximation algorithm

Add an implementation of Trevisan's approximation algorithm for max cut.

The reference for this algorithm is here.

The weight function on the Graph object should return a float

At the moment, the weight function on sgtl.graph.Graph() rounds the weight to the nearest integer and returns an int.

This should return a float to allow fractionally weighted edges.

Test improvements - code coverage and hypothesis

Consider adding testing tools to the library, in particular:

measure code coverage and achieve 100%
investigate using the hypothesis testing tool for property-based tests

Typo in return value of bipartiteness method

The return value of the Graph.bipartiteness() method should display a beta in math mode in the rendered documentation. See the return value of the conductance method for how to do this.

Add weighted graph example to the Getting Started guide in the documentation

The 'Getting Started' page of the documentation currently only gives examples of unweighted graphs. It would be nice to also include a weighted example, or at least mention that this is possible.

Add methods for converting between graphml files

Add a method to construct a graph object from a graphml file. Should also add a method to save a graph object as a graphml file.

Create hypercube graph

Add a method to construct the hypercube graph of a given size.

Construct k-nearest neighbours graph

Add a method to construct an SGTL graph object from an arbitrary dataset using the k neraest neighbours construction.

Construct tensor-product graph

Add a method for constructing the tensor product of two graphs.

Consider overloading the '*' operator, so you can write graph3 = graph1 * graph2.

Add implementation of the cheeger cut

This issue covers moving the implementation of the Cheeger cut and the Trevisan cut from https://github.com/pmacg/pycheeg into SGTL.

Should include a fix for the issue raised under that project.

Combine graphs by adding their edges

Add a method to add two graphs together, equivalent to adding their adjacency matrices.

If the graphs do not have the same number of vertices, then this method should throw a suitable exception.

Consider overloading the '+' operator so that you can write graph3 = graph1 + graph2.

Add documentation example of finding cheeger cut from networkx graph

SGTL has now replaced the pycheeg package, and so the documentation should include a clear example of finding the cheeger cut of a networkx graph.

Add version specifiers to requirements.txt

To match best-practice, we should add version specifiers to the dependencies in requirements.txt.

Add tools for visualising the spectrum of a graph.

There should be tools available to visualise the spectrum of a graph - both the eigenvalues themselves and the spectral embedding of the eigenvectors.

Benchmarking and optimisation of basic graph operations

Many methods in the library are much slower than we'd like them to be. This issue covers doing some benchmarking and testing to identify the bottlenecks in the basic graph operations and implementing some speed-ups.

An ideal goal would be for basic operations (such as computing the conductance of a vertex set) on a graph with 1m vertices to be fast enough to be responsive to a user (e.g. sub 1 second) - is this achievable?

Construct the cartesian product of graphs

Add a method to the Graph object, similar to the tensor_product() method, which constructs the cartesian product of the two graphs, and returns the constructed graph as a new graph object.

Install requirements automatically through pip

When installing sgtl through pip, there are some other packages that it depends on. These should be installed automatically when you install sgtl.

Construct graph using a gaussian kernel with threshold

Generating SBM with different cluster sizes can fail

The following code for generating a graph from the stochastic block model fails with out-of-bounds errors.

import sgtl.random
import numpy as np


def main():
    cluster_sizes = [100, 50, 20, 20, 20]
    p = 0.8
    prob_mat_q = np.asarray([[p,   0.2, 0.5, 0.1, 0.1],
                             [0.2, p,   0.1, 0.6, 0.2],
                             [0.5, 0.1, p,   0.2, 0.1],
                             [0.1, 0.6, 0.2, p,   0.5],
                             [0.1, 0.2, 0.1, 0.5, p]])
    graph = sgtl.random.sbm(cluster_sizes, prob_mat_q)


if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File ".../main.py", line 17, in <module>
    main()
  File ".../main.py", line 13, in main
    graph = sgtl.random.sbm(cluster_sizes, prob_mat_q)
  File ".../venv/lib/python3.8/site-packages/sgtl/random.py", line 179, in sbm
    adj_mat[vertex_1, vertex_2] = 1
  File ".../venv/lib/python3.8/site-packages/scipy/sparse/_lil.py", line 330, in __setitem__
    return self._set_intXint(key[0], key[1], x)
  File ".../venv/lib/python3.8/site-packages/scipy/sparse/_lil.py", line 299, in _set_intXint
    _csparsetools.lil_insert(self.shape[0], self.shape[1], self.rows,
  File "_csparsetools.pyx", line 61, in scipy.sparse._csparsetools.lil_insert
  File "_csparsetools.pyx", line 87, in scipy.sparse._csparsetools.lil_insert
IndexError: column index (240) out of bounds

Generating graphs from the SBM with a sparse array as the probability matrix fails.

Minimal working example

The error message thrown depends on the precise sparse matrix format. For example, here are two slightly different behaviours.

Case 1

>>> import sgtl.random
>>> import scipy.sparse
>>> prob_mat = scipy.sparse.eye(5)
>>> graph = sgtl.random.sbm_equal_clusters(100, 5, prob_mat)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/peter/.local/lib/python3.8/site-packages/sgtl/random.py", line 204, in sbm_equal_clusters
    return sbm([int(n/k)] * k, prob_mat_q, directed=directed)
  File "/home/peter/.local/lib/python3.8/site-packages/sgtl/random.py", line 176, in sbm
    for (vertex_1, vertex_2) in _generate_sbm_edges(cluster_sizes, prob_mat_q, directed=directed):
  File "/home/peter/.local/lib/python3.8/site-packages/sgtl/random.py", line 108, in _generate_sbm_edges
    prob_mat_q[cluster_1_idx][cluster_2_idx])
TypeError: 'dia_matrix' object is not subscriptable

Case 2

>>> import sgtl.random
>>> import scipy.sparse
>>> prob_mat = scipy.sparse.eye(5).tocsr()
>>> graph = sgtl.random.sbm_equal_clusters(100, 5, prob_mat)
TypeError: float() argument must be a string or a number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/peter/.local/lib/python3.8/site-packages/sgtl/random.py", line 204, in sbm_equal_clusters
    return sbm([int(n/k)] * k, prob_mat_q, directed=directed)
  File "/home/peter/.local/lib/python3.8/site-packages/sgtl/random.py", line 176, in sbm
    for (vertex_1, vertex_2) in _generate_sbm_edges(cluster_sizes, prob_mat_q, directed=directed):
  File "/home/peter/.local/lib/python3.8/site-packages/sgtl/random.py", line 107, in _generate_sbm_edges
    num_edges = np.random.binomial(cluster_1_size - (vertex_2 - c1_base_index),
  File "mtrand.pyx", line 3362, in numpy.random.mtrand.RandomState.binomial
ValueError: setting an array element with a sequence.

Workaround

It is possible to work around this problem in the client code by calling the toarray() method on the probability matrix when passing it to the SBM code.

>>> import sgtl.random
>>> import scipy.sparse
>>> prob_mat = scipy.sparse.eye(5).tocsr()
>>> graph = sgtl.random.sbm_equal_clusters(100, 5, prob_mat.toarray())

Allow precomputed eigenvectors for spectral clustering method

It might be useful to pass a list of precomputed eigenvectors to the spectral clustering method.

The num_edges member of the Graph object is incorrect when graph is weighted

Given a weighted graph object, the graph.num_edges gives half of the total volume of the graph, rather than the actual number of weighted edges.

This should be replaced with two methods on the object, a num_edges method which gives the number of non-zero elements of the adjacency matrix (corrected for the symmetry) and a graph_volume method which returns the total weight of all the edges in the graph.

Plot spectral embedding

Add a method for drawing a graph, using its spectral embedding. May just call out to e.g. networkx code to do the drawing.

Implement CLSZ algorithm for clustering directed graphs

Implement the hermitian matrices clustering algorithm in this paper for clustering directed graphs.

Check graph methods are correct for directed graphs

The sgtl.graph.Graph object has been written with undirected graphs in mind, but is expected to work for directed graphs too.

This issue covers double checking the directed graph cases. In particular, it covers:

writing tests for each Graph method for the directed graph cases
fixing up any problems that are found and
updating the documentation to make it clear what the behaviour is for directed graphs

Add method for constructing a grid graph

Add a new method to the graph module for constructing an m by n grid graph.

Conductance of the empty set should return clear error message

Requesting the conductance (or bipartiteness) of an empty set should return a meaningful error message to the client.

Minimal working example

>>> import sgtl.graph
>>> graph = sgtl.graph.cycle_graph(10)
>>> graph.conductance([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/peter/wc/py-sgtl/sgtl/graph.py", line 144, in conductance
    return 1 - (2 * self.weight(vertex_set_s, vertex_set_s, sets_are_equal=True)) / self.volume(vertex_set_s)
ZeroDivisionError: division by zero

For the bipartiteness case:

>>> graph.bipartiteness([], [])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/peter/wc/py-sgtl/sgtl/graph.py", line 159, in bipartiteness
    return 1 - 2 * self.weight(vertex_set_l, vertex_set_r) / self.volume(vertex_set_l + vertex_set_r)
ZeroDivisionError: division by zero

Desired Behaviour

Something like:

>>> import sgtl.graph
>>> graph = sgtl.graph.cycle_graph(10)
>>> graph.conductance([])
<snip>
ValueError: conductance of the empty set is undefined

Convert between networkx and graph objects

Add two methods to the sgtl.graph.Graph object to convert to and from the networkx graph type:

@staticmethod
def from_networkx(nx_graph):
    pass

def to_networkx(self):
    pass

pmacg / py-sgtl Goto Github PK

py-sgtl's Introduction

py-sgtl's People

Contributors

Stargazers

Watchers

Forkers

py-sgtl's Issues

Minimal working example

Case 1

Case 2

Workaround

Minimal working example

Desired Behaviour

Recommend Projects

Recommend Topics

Recommend Org