The xgi from xgi-org

Replace get_edge_data?

The method in question is defined in classes/hypergraph.py as :

    def get_edge_data(self, id, default=None):
        """Returns the attribute dictionary associated with edge id.

        This is identical to `H._edge_attr[id]` except the default is returned
        instead of an exception if the edge doesn't exist.
        """
        try:
            # this may fail because the ID may not exist
            # or the property doesn't exist.
            return self.edges[id]
        except KeyError:
            return default

I suggest we move remove it from Hypergraph and make it a part of IDView, where it can just be called get. This will match more nicely with the dictionary interface:

# instead of this
H.get_edge_data(-1, default)

# how about this
H.edges.get(-1, default)

JSON File format and specifications

This issue consists of 4 components:

A JSON file format to standardize storage of hypergraphs (need to consider temporality, directedness, etc.)
A specification file that lays out best practices for this file
Functions to go from a Hypergraph to a JSON and vice-versa
A script that will check a JSON file and see if the format is correct (nodes and edges agree, tags are correct, etc.)

There are three different ways of accessing edge data

H.edge[id], xgi.get_edge_attributes, and H.get_edge_data. All of these should be handled via the same interface, perhaps a AttrView object that we used to have.

H.name and H['name'] are inconsistent

After #41, the code H['name'] will access hypergraph attributes and therefore should (hopefully) always return the same as H.name. However, if the name is not set, we will get, after that PR is merged, the following:

H.name
# -> ""

H['name']
# -> KeyError: 'name'

Improve the efficiency of the connected methods

It may be possible to speed up the methods in algorithms/connected.py by removing the dependence on networkx and doing a breadth-first search directly.

Add tutorial/example notebooks to test suite

Improve the performance of the matrix computation algorithms

The adjacency matrix is computed indirectly through the incidence matrix via the formula $A = II^T$, but this is weighted by occurrence and has a non-zero diagonal. Perhaps computing this directly would be more efficient.

module 'xgi' has no attribute 'version'

import xgi
xgi.__version__

yields module 'xgi' has no attribute '__version__'. It would be nice to support it.

Subhypergraphs are not views

Currently, the way we have implemented subhypergraphs are essentially as read-only (but separate) hypergraphs. They are not really views, though there is some language in the code base that would suggest that they are. What I mean by this is the following:

H = xgi.Hypergraph([(0,1,2), (0,3), (1,2)])
sub = H.subhypergraph([0,1,2])

# the subgraph contains the correct edges
sub.edges
# -> EdgeView((0, 2))

# now modify the original graph
H.add_edge([0, 1])
H.edges
# -> EdgeView((0, 1, 2, 3))

# the subgraph DOES NOT update
sub.edges
# -> EdgeView((0, 2))
# we want EdgeView((0, 2, 3))

My problem is this: since the subgraphs do not update automatically (i.e. they are not views), there is no point in making them read-only. So for now we can just return a new Hypergraph object whenever the user needs a subgraph.

Implement "cleanup" method

~~Make sure the documentation highlights the fact that the user should remember to call remove_isolates and other related functions.~~
Implement a cleanup method that performs various common clean up tasks, see comment down thread.

broken requirement links

In the requirement readme file, links for example.txt and doc.txt are broken.

Interface of NodeView

NodeView supports accessing the edge ids that contain a node in three different ways:

H = some_hypergraph()
H.nodes[1] == H.nodes(1) == H.members(1)
# -> True

I find this unpythonic. I'd much rather there was a single way of doing things if at all possible. If there was exactly one way of accessing the edges that contain a node, which would you prefer? If anyone can argue that we should support two (or more) ways of doing the same thing, I'm all ears.

Related to #20.

EdgeView and NodeView not uniform

We have the case that EdgeView([...]) contains a list while NodeView((...)) contains a tuple.

Fix Github action for unit tests

The .github/workflows/test.yml file needs to be edited so that the tests pass. I removed doctests for now. Having trouble getting it to work so would appreciate any help.

Should the edge members be stored as a set?

There may be unintended consequences to storing the hyperedge members as a set. When there are self-loops, the edge will shrink in cardinality. When constructing m-uniform hypergraphs with self-loops, the hypergraph will no longer will uniform.

unique_edge_sizes(): debug and move to functions

Fix the reference to edge_degree
Move to the function.py file since this is not a core part of the data structure.

Wrong dim of linalg degree when no neighbours

Running

import xgi

edges = [[0,1], [0,2], [1,2,3]]
H = xgi.Hypergraph(edges)

yields

xgi.degree(H, order=3).shape
## --> (0,)

but it should be (4,), the number of nodes.

This then yields an error when running xgi.laplacian(H, order=3), because it conflicts with the (correct) dimension of the adjacency matrix:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 xgi.laplacian(H, order=3).shape

File ~/Dropbox (ISI Foundation)/WORK/SCIENCE/xgi/xgi/linalg/matrix.py:274, in laplacian(H, order, rescale_per_node, index)
    270     return (np.array([]), {}) if index else np.array([])
    272 K = degree(H, order=order, index=False)
--> 274 L = order * np.diag(np.ravel(K)) - A  # ravel needed to convert sparse matrix
    275 L = np.asarray(L)
    277 if rescale_per_node:

ValueError: operands could not be broadcast together with shapes (0,0) (4,4)

sparse matrix warning in adjacency_matrix()

Calling

import xgi
H = xgi.random_hypergraph(10, [0.1, 0.01])
A = xgi.adjacency_matrix(H, weighted=True)

yields two warnings:

/usr/local/lib/python3.9/site-packages/scipy/sparse/compressed.py:291: SparseEfficiencyWarning: 
Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead. 
  warn(bad_scalar_msg, SparseEfficiencyWarning)
/usr/local/lib/python3.9/site-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning: 
Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray(i, j, x)

When weighted=False, I cannot reproduce a warning (but I might have had one in the past, unsure).
It seems to come from the last line in

if not weighted:
    A = (A >= s) * 1
else:
    A[A < s] = 0

Add function to get list of edges (not edge ids)

Currently,

H.edges 
# EdgeView([3, 2, 4, 9])

outputs edge ids, not edges as list of member nodes.
It's often useful to access edges as list of member nodes.
One current way is list(H._edge.values()) but it's long and hidden.

Should we have a two views?

EdgeIdView (to replace current EdgeView): for ids

H.edges_id
# EdgeIdView([3, 2, 4, 9])

EdgeView:for list of member nodes

H.edges
# EdgeView([[0], [3, 4], [4], [4]])

Better way of getting node and edge data

Currently the way of accessing nodal data of node 1 is H.nodes.data()[1] or H.nodes(1).data() and a more streamlined call would be preferable. Currently .nodes(id) and .nodes[id] access a node's bipartite neighbors so this probably is an undesirable way of doing it. The problem is that unlike NetworkX, we are using two dictionaries each for nodes and edges: one for nodes and one for attributes. So, I'm not sure what the function would return. I see three options 1) a separate property of Hypergraph called H.node_data or something 2) a flag data where if data=True, then it returns the node attributes and if false, it returns the nodes with the bipartite connections, or 3) return a tuple of the bipartite connections and data respectively. Would appreciate your thoughts!

Which should be the default? Edge ids or edge members?

I'm just wondering if we still wanted to implement this solution #28 (comment) that we seemed to agree on. But then we didn't implement it.

Namely, something more like

H = xgi.Hypergraph([[1, 2, 3], [3, 4], [4, 5, 6]])

H.edge_ids
# -> EdgeIDView((0, 1, 2))
H.edges
# -> EdgeView([[1, 2, 3], [3, 4], [4, 5, 6]])
H.edges(return_dict=True)
# -> {0: [1, 2, 3], 1: [3, 4], 2: [4, 5, 6]}

I think the reasoning was that it makes it shorter and more straightforward to see the members, as well as being consistent with networkx's edges.

Originally posted by @maximelucas in #70 (comment)

Add unit tests for XGI

Write initial tests for XGI using pytest for the following files:

Modify degree_counts function

Change API such that

degree_counts returns what degree_histogram currently returns, and
degree_histogram returns the same data structures as np.unique(i.e. one array with the heights and one array with the degree bins.)

Improve the has_edge method

Improve the efficiency of the has_edge method by implementing with generators.

Remove `create_using` from hypergraph conversion methods

For the methods in the convert.py file, the create_using keyword is used to indicate the constructor to be used to add edges from. This may be unnecessary and potentially should be implemented differently.

Update documentation to a standard format

Update the docstrings of the functions to the numpydoc format for the following files:

Same edge (by members) gets multiple IDs in erdos_renyi_hypergraph

Working on #29 to remove singletons, I came across this:
Running

import xgi
n = 10
m = 50
p = 0.1
H = xgi.erdos_renyi_hypergraph(n, m, p)

then in H._edge, a given edge sometimes appear more than once with different IDs, e.g. { 38: [9], 39: [9], 48: [9], ...}.

In addition, not all nodes are present in edges as a singleton. For example, there is a node 8 but no singleton edge [8].

Haven't checked if #33 fixes this?

Add ability create/look at subhypergraphs

Add a function which allows you to specify a list of nodes and an option to shrink edges (false if you want an induced subhypergraph) and it will return the subhypergraph.
Add a function which allows you to specify a list of edges and it will return the subhypergraph.

Add ability to specify hyperdegree order

The current functionality only specifies the number of hyperedges to which a node belongs. In principle, one should be able to specify a hyperedge size or list of sizes and it returns the number of hyperedges of that size of which that node is a part.

Add support for directed hypergraphs

It would be good to add a class that allows users to specify a directed hypergraph. I've seen several ways to do this, but not sure which is best.

Implement weak removal of edges

A weak removal of a node from a hypergraph means removing the node from each edge it belongs to and then removing the node itself from the node set. A strong removal means weak removal AND deleting all edges that contain that node. Currently, Hypergraph.remove_node implements strong removal only.

We should think whether or not this makes sense in the case of simp. comps.

Normalize edge "order" and "size"

The words "order" and "size" are used throughout the codebase to refer to the number of nodes in an edge (where order = size-1). I'd suggest we choose one and remove all instances of the other. For example, we have H.edge_size but H.edges_of_order.

Wrong dimension in sparse incidence matrix

The incidence matrix should always have dimension (num_nodes, num_edges) or (num_nodes, num_edges_of_order).

But when sparse=True, dimension 0 is wrong when some nodes are disconnected (overall, or at a given order when it is specified). This makes all the other functions using the incidence matrix wrong (adjacency, laplacian, etc.).

MWE:

nodes = range(5)
edges = [[0,1], [0,2], [1,2,3]]

H = xgi.empty_hypergraph()
H.add_nodes_from(nodes)
H.add_edges_from(edges)

xgi.incidence_matrix(H, sparse=False).shape
## --> (5, 3) # good
xgi.incidence_matrix(H, sparse=True).shape
## --> (4, 3) # not good

This is because when sparse=True, the sparse matrix is built by iterating over the members of edges (of order d):

https://github.com/ComplexGroupInteractions/xgi/blob/388114d72d3ba03ea7c04ced52532762f3588ef8/xgi/linalg/matrix.py#L71-L82

adjacency_matrix error when no edges (of order d)

Running

import xgi
H = xgi.Hypergraph()
H.add_edges_from([[0, 1, 2], [1, 2, 3], [2, 3, 4]]) #edgelist6

A, node_dict = xgi.adjacency_matrix(H, order=1, index=True)

returns

File ~/Dropbox (ISI Foundation)/WORK/SCIENCE/xgi/xgi/linalg/matrix.py:136, in adjacency_matrix(H, order, s, weighted, index)
    133     I = incidence_matrix(H, index=False, order=order)
    135 A = I.dot(I.T)
--> 136 A.setdiag(0)
    138 if not weighted:
    139     A = (A >= s) * 1

AttributeError: 'numpy.float64' object has no attribute 'setdiag'

That happens because, there is no edge of order 1 in the hypergraph. In that case, right now, we have

xgi.incidence_matrix(H, order=1, index=True)
# array([0])

so that I.dot(I.T) == 0 which is an int which has no diagonal to set.

To fix this, two options:

raise an error in incidence_matrix when there is no edge or no nodes, instead of returning [0].
have a if else in adjacency_matrix to return an NxN array of zeros if I == [0]. Probably need to add sparse argument there too then to deal with both cases coming from incidence_matrix.

What do you think? Do we want to return [0] for the incidence matrix when there are no edges/nodes or not?

Test of the Tutorial 1 notebook

The tutorial 1 notebook with basic functionality is ready.

What functionality should be added?
Do the calls make sense?
Any bugs in addition to the issues currently open?
Performance concerns?

Improve View object interface

In networkx, the (only?) way to get the largest degree is the following:

max(list(dict(G.degree()).values()))

I find this unacceptable. In xgi, I propose we implement something like the following:

max(H.degree().values())

That is, we should try to give the View objects a bit of an interface that makes them behave like the object they point to. Specifically, since H.degree() returns a DegreeView, which looks like a dictionary,

H.degree()
# -> DegreeView({0: 10, 1: 20, ...})

we should make the DegreeView object support some of the methods of a dictionary,

H.degree().values()
# -> [10, 20]

Incorporate this dynamics package

https://github.com/arnaudon/simplicial-kuramoto

neighbors

I was thinking that, when doing simulations (particularly for games), it is very useful to have the neighbors of a node divided by hyperedge. To explain better, here's what the current neighbors function returns:

hyperedge_list = [[1, 2], [2, 3, 4]]
H.neighbors(2)
# {1, 3, 4}

which is basically the neighbors in the graph projection of H. What I was imagining is actually a higher-order neighbor function that returns {[1], [3, 4]}. This is extremely useful in loops when you want to scroll over the hyperedges attached to a node and then the neighbors in those.

Do you think something like this could be integrated into the current neighbors function? Or maybe two different ones? In my old code I had H.graph_neighbors() and H.neighbors().

Add support for simplicial complexes

Because one can infer all the sub-interactions given a simplex, perhaps it is beneficial to store a simplicial complex using it's maximal edges to save space and perhaps use to develop efficient algorithms.

Implement an "easy mode" class

The proposal is to have a subclass of Hypergraph where every time the user does something fishy, it raises a warning.

H = some_hypergraph()
H.add_edge([1, 1, 2])      # an edge with a repeated node is added

easy = xgi.HypergraphWithWarnings(H)
easy.add_edge([1, 1, 2])
# -> Warning: An edge contains a repeated node

Implementing such a class would require executing many sanity checks each time an edge is added. This makes the class slow and inefficient. However, it is very safe and it "protects" the user at each step of the way. There would also be a way to easily convert one class to the other. The main caveat would be to make sure the user remembers to convert from one class to the other when they need to. We could perhaps highlight this in the documentation, and also include it in the warning text:

easy.add_edge([1, 1, 2])
# -> Warning: An edge contains a repeated node (call XXX method to disable these warnings)

add "reverse" keyword in the convert/read bipartite edgelist

The default behavior is that the first column specifies the nodes and the second column specifies the edges. Instead of calling this method and then H.dual(), it would be nice if there was reverse=True to switch the labels. This would affect the following functions:

read_bipartite_edgelist()
write_bipartite_edgelist()
from_bipartite_pandas_dataframe()

Add random seed option in the random hypergraph generators

For reproducibility, it would be nice to have a "random_seed" argument that can be set in the generators.

Add function to remove singleton from edges

Currently, edges contain singletons (nodes) :

list(H._edge.values())
# [[0], [3, 4], [4], [4]]

it's often useful to have a list of edges that does not contain nodes.
Add a function to get that.
Should the function output a new external list, of remove singletons from the edges stored internally in the HyperGraph?

Add functions to get largest connected component

Graph generators may sometimes return hypergraphs with isolated nodes (e.g. ER hypergraph with small p). In these cases, and others, it is usual to obtain the largest component. Ideally, we would like to do something like this:

G = xgi.erdos_renyi_hypergraph(n, m, p).largest_component()
G = xgi.erdos_renyi_hypergraph(n, m, p).remove_isolates()

Fix EdgeDataView class

Remove references to the _adj property

Create a frozen hypergraph object

We have discussed implementing a frozen object that would allow every statistic to be cached:

H = some_hyper_graph()

H.adj_tensor()    # compute the tensor
# ...more code here...
H.adj_tensor()    # has to recompute

H.freeze()
H.adj_tensor()    # since H is frozen, it computes the tensor once and *caches it*
# ...more code here...
H.adj_tensor() # no need to recompute!

The point is that after H is frozen, we can essentially cache everything!

H.freeze()
H.max_degree() # compute and cache
H.degree_histogram() # compute and cache
# etc

If at any point the user wants to modify H again, the cache is flushed,

H.thaw() # everything that was cached is erased
H.add_edge([1,2,3])
H.max_degree() # need to recompute
H.freeze()
H.max_degree() # compute and cache

order, shape, number_of_edges, number_of_nodes

All these functions do very similar things. In keeping with the idea of doing things only once, I vote we simplify them somehow.

I personally lean toward the nuclear option and just keeping shape. All the others are not necessary...

erdos_renyi_hypergraph() : does not have n nodes

As discussed today, trying the random hypergraph generator:

import xgi
n = 10
m = 100
p = 0.01
H = xgi.erdos_renyi_hypergraph(n, m, p)

H.nodes
# NodeView((1, 3, 6, 7, 9))

the hypergraph has 5 nodes, not 10, even though the doc says n is the number of nodes.

This discrepancy between n and the actual number of nodes is apparently expected for small n and p (correct me if I'm wrong).
If so, add a warning to the docstring. Otherwise, fix the bug.

pip install on readthedocs requires upper bound

pip on readthedocs seems to be trying to install pre-releases, so we had to upper bound numpy.

xgi-org / xgi Goto Github PK

xgi's People

Contributors

Stargazers

Watchers

Forkers

xgi's Issues

Recommend Projects

Recommend Topics

Recommend Org