Giter Site home page Giter Site logo

xgi-org / xgi Goto Github PK

View Code? Open in Web Editor NEW
172.0 9.0 27.0 26.36 MB

CompleX Group Interactions (XGI) is a Python package for higher-order networks.

Home Page: https://xgi.readthedocs.io

License: Other

Jupyter Notebook 85.42% Python 14.58%
hypergraphs higher-order-networks network-science higher-order

xgi's People

Contributors

acombretrenouard avatar acuschwarze avatar doabell avatar goznalo-git avatar iaciac avatar leotrs avatar marconurisso avatar maximelucas avatar mcontisc avatar nwlandry avatar saad1282 avatar thomasrobiglio avatar tlarock avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xgi's Issues

Replace get_edge_data?

The method in question is defined in classes/hypergraph.py as :

    def get_edge_data(self, id, default=None):
        """Returns the attribute dictionary associated with edge id.

        This is identical to `H._edge_attr[id]` except the default is returned
        instead of an exception if the edge doesn't exist.
        """
        try:
            # this may fail because the ID may not exist
            # or the property doesn't exist.
            return self.edges[id]
        except KeyError:
            return default

I suggest we move remove it from Hypergraph and make it a part of IDView, where it can just be called get. This will match more nicely with the dictionary interface:

# instead of this
H.get_edge_data(-1, default)

# how about this
H.edges.get(-1, default)

JSON File format and specifications

This issue consists of 4 components:

  • A JSON file format to standardize storage of hypergraphs (need to consider temporality, directedness, etc.)
  • A specification file that lays out best practices for this file
  • Functions to go from a Hypergraph to a JSON and vice-versa
  • A script that will check a JSON file and see if the format is correct (nodes and edges agree, tags are correct, etc.)

H.name and H['name'] are inconsistent

After #41, the code H['name'] will access hypergraph attributes and therefore should (hopefully) always return the same as H.name. However, if the name is not set, we will get, after that PR is merged, the following:

H.name
# -> ""

H['name']
# -> KeyError: 'name'

Subhypergraphs are not views

Currently, the way we have implemented subhypergraphs are essentially as read-only (but separate) hypergraphs. They are not really views, though there is some language in the code base that would suggest that they are. What I mean by this is the following:

H = xgi.Hypergraph([(0,1,2), (0,3), (1,2)])
sub = H.subhypergraph([0,1,2])

# the subgraph contains the correct edges
sub.edges
# -> EdgeView((0, 2))

# now modify the original graph
H.add_edge([0, 1])
H.edges
# -> EdgeView((0, 1, 2, 3))

# the subgraph DOES NOT update
sub.edges
# -> EdgeView((0, 2))
# we want EdgeView((0, 2, 3))

My problem is this: since the subgraphs do not update automatically (i.e. they are not views), there is no point in making them read-only. So for now we can just return a new Hypergraph object whenever the user needs a subgraph.

Implement "cleanup" method

Make sure the documentation highlights the fact that the user should remember to call remove_isolates and other related functions.
Implement a cleanup method that performs various common clean up tasks, see comment down thread.

Interface of NodeView

NodeView supports accessing the edge ids that contain a node in three different ways:

H = some_hypergraph()
H.nodes[1] == H.nodes(1) == H.members(1)
# -> True

I find this unpythonic. I'd much rather there was a single way of doing things if at all possible. If there was exactly one way of accessing the edges that contain a node, which would you prefer? If anyone can argue that we should support two (or more) ways of doing the same thing, I'm all ears.

Related to #20.

Fix Github action for unit tests

The .github/workflows/test.yml file needs to be edited so that the tests pass. I removed doctests for now. Having trouble getting it to work so would appreciate any help.

Should the edge members be stored as a set?

There may be unintended consequences to storing the hyperedge members as a set. When there are self-loops, the edge will shrink in cardinality. When constructing m-uniform hypergraphs with self-loops, the hypergraph will no longer will uniform.

Wrong dim of linalg degree when no neighbours

Running

import xgi

edges = [[0,1], [0,2], [1,2,3]]
H = xgi.Hypergraph(edges)

yields

xgi.degree(H, order=3).shape
## --> (0,)

but it should be (4,), the number of nodes.

This then yields an error when running xgi.laplacian(H, order=3), because it conflicts with the (correct) dimension of the adjacency matrix:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 xgi.laplacian(H, order=3).shape

File ~/Dropbox (ISI Foundation)/WORK/SCIENCE/xgi/xgi/linalg/matrix.py:274, in laplacian(H, order, rescale_per_node, index)
    270     return (np.array([]), {}) if index else np.array([])
    272 K = degree(H, order=order, index=False)
--> 274 L = order * np.diag(np.ravel(K)) - A  # ravel needed to convert sparse matrix
    275 L = np.asarray(L)
    277 if rescale_per_node:

ValueError: operands could not be broadcast together with shapes (0,0) (4,4)

sparse matrix warning in adjacency_matrix()

Calling

import xgi
H = xgi.random_hypergraph(10, [0.1, 0.01])
A = xgi.adjacency_matrix(H, weighted=True)

yields two warnings:

/usr/local/lib/python3.9/site-packages/scipy/sparse/compressed.py:291: SparseEfficiencyWarning: 
Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead. 
  warn(bad_scalar_msg, SparseEfficiencyWarning)
/usr/local/lib/python3.9/site-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning: 
Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray(i, j, x)

When weighted=False, I cannot reproduce a warning (but I might have had one in the past, unsure).
It seems to come from the last line in

if not weighted:
    A = (A >= s) * 1
else:
    A[A < s] = 0

Add function to get list of edges (not edge ids)

Currently,

H.edges 
# EdgeView([3, 2, 4, 9])

outputs edge ids, not edges as list of member nodes.
It's often useful to access edges as list of member nodes.
One current way is list(H._edge.values()) but it's long and hidden.

Should we have a two views?

  • EdgeIdView (to replace current EdgeView): for ids
H.edges_id
# EdgeIdView([3, 2, 4, 9])
  • EdgeView:for list of member nodes
H.edges
# EdgeView([[0], [3, 4], [4], [4]])

Better way of getting node and edge data

Currently the way of accessing nodal data of node 1 is H.nodes.data()[1] or H.nodes(1).data() and a more streamlined call would be preferable. Currently .nodes(id) and .nodes[id] access a node's bipartite neighbors so this probably is an undesirable way of doing it. The problem is that unlike NetworkX, we are using two dictionaries each for nodes and edges: one for nodes and one for attributes. So, I'm not sure what the function would return. I see three options 1) a separate property of Hypergraph called H.node_data or something 2) a flag data where if data=True, then it returns the node attributes and if false, it returns the nodes with the bipartite connections, or 3) return a tuple of the bipartite connections and data respectively. Would appreciate your thoughts!

Which should be the default? Edge ids or edge members?

I'm just wondering if we still wanted to implement this solution #28 (comment) that we seemed to agree on. But then we didn't implement it.

Namely, something more like

H = xgi.Hypergraph([[1, 2, 3], [3, 4], [4, 5, 6]])

H.edge_ids
# -> EdgeIDView((0, 1, 2))
H.edges
# -> EdgeView([[1, 2, 3], [3, 4], [4, 5, 6]])
H.edges(return_dict=True)
# -> {0: [1, 2, 3], 1: [3, 4], 2: [4, 5, 6]}

I think the reasoning was that it makes it shorter and more straightforward to see the members, as well as being consistent with networkx's edges.

Originally posted by @maximelucas in #70 (comment)

Add unit tests for XGI

Write initial tests for XGI using pytest for the following files:

  • algorithms/connected.py
  • classes/function.py
  • classes/hypergraph.py
  • classes/hypergraphviews.py
  • classes/reportviews.py
  • generators/classic.py
  • generators/nonuniform.py
  • generators/uniform.py
  • linalg/matrix.py
  • readwrite/bipartite.py
  • readwrite/edgelist.py
  • readwrite/incidence_matrix.py
  • readwrite/json.py
  • utils/utilities.py
  • convert.py

Modify degree_counts function

Change API such that

  1. degree_counts returns what degree_histogram currently returns, and
  2. degree_histogram returns the same data structures as np.unique(i.e. one array with the heights and one array with the degree bins.)

Update documentation to a standard format

Update the docstrings of the functions to the numpydoc format for the following files:

  • convert.py
  • algorithms/connected.py
  • classes/function.py
  • classes/hypergraph.py
  • classes/hypergraphviews.py
  • classes/reportviews.py
  • generators/classic.py
  • generators/nonuniform.py
  • generators/uniform.py
  • linalg/matrix.py
  • readwrite/bipartite.py
  • readwrite/edgelist.py
  • readwrite/incidence_matrix.py
  • readwrite/json.py
  • utils/utilities.py

Same edge (by members) gets multiple IDs in erdos_renyi_hypergraph

Working on #29 to remove singletons, I came across this:
Running

import xgi
n = 10
m = 50
p = 0.1
H = xgi.erdos_renyi_hypergraph(n, m, p)

then in H._edge, a given edge sometimes appear more than once with different IDs, e.g. { 38: [9], 39: [9], 48: [9], ...}.

In addition, not all nodes are present in edges as a singleton. For example, there is a node 8 but no singleton edge [8].

Haven't checked if #33 fixes this?

Add ability create/look at subhypergraphs

  • Add a function which allows you to specify a list of nodes and an option to shrink edges (false if you want an induced subhypergraph) and it will return the subhypergraph.
  • Add a function which allows you to specify a list of edges and it will return the subhypergraph.

Add ability to specify hyperdegree order

The current functionality only specifies the number of hyperedges to which a node belongs. In principle, one should be able to specify a hyperedge size or list of sizes and it returns the number of hyperedges of that size of which that node is a part.

Add support for directed hypergraphs

It would be good to add a class that allows users to specify a directed hypergraph. I've seen several ways to do this, but not sure which is best.

Implement weak removal of edges

A weak removal of a node from a hypergraph means removing the node from each edge it belongs to and then removing the node itself from the node set. A strong removal means weak removal AND deleting all edges that contain that node. Currently, Hypergraph.remove_node implements strong removal only.

We should think whether or not this makes sense in the case of simp. comps.

Normalize edge "order" and "size"

The words "order" and "size" are used throughout the codebase to refer to the number of nodes in an edge (where order = size-1). I'd suggest we choose one and remove all instances of the other. For example, we have H.edge_size but H.edges_of_order.

Wrong dimension in sparse incidence matrix

The incidence matrix should always have dimension (num_nodes, num_edges) or (num_nodes, num_edges_of_order).

But when sparse=True, dimension 0 is wrong when some nodes are disconnected (overall, or at a given order when it is specified). This makes all the other functions using the incidence matrix wrong (adjacency, laplacian, etc.).

MWE:

nodes = range(5)
edges = [[0,1], [0,2], [1,2,3]]

H = xgi.empty_hypergraph()
H.add_nodes_from(nodes)
H.add_edges_from(edges)

xgi.incidence_matrix(H, sparse=False).shape
## --> (5, 3) # good
xgi.incidence_matrix(H, sparse=True).shape
## --> (4, 3) # not good

This is because when sparse=True, the sparse matrix is built by iterating over the members of edges (of order d):

https://github.com/ComplexGroupInteractions/xgi/blob/388114d72d3ba03ea7c04ced52532762f3588ef8/xgi/linalg/matrix.py#L71-L82

adjacency_matrix error when no edges (of order d)

Running

import xgi
H = xgi.Hypergraph()
H.add_edges_from([[0, 1, 2], [1, 2, 3], [2, 3, 4]]) #edgelist6

A, node_dict = xgi.adjacency_matrix(H, order=1, index=True)

returns

File ~/Dropbox (ISI Foundation)/WORK/SCIENCE/xgi/xgi/linalg/matrix.py:136, in adjacency_matrix(H, order, s, weighted, index)
    133     I = incidence_matrix(H, index=False, order=order)
    135 A = I.dot(I.T)
--> 136 A.setdiag(0)
    138 if not weighted:
    139     A = (A >= s) * 1

AttributeError: 'numpy.float64' object has no attribute 'setdiag'

That happens because, there is no edge of order 1 in the hypergraph. In that case, right now, we have

xgi.incidence_matrix(H, order=1, index=True)
# array([0])

so that I.dot(I.T) == 0 which is an int which has no diagonal to set.

To fix this, two options:

  1. raise an error in incidence_matrix when there is no edge or no nodes, instead of returning [0].
  2. have a if else in adjacency_matrix to return an NxN array of zeros if I == [0]. Probably need to add sparse argument there too then to deal with both cases coming from incidence_matrix.

What do you think? Do we want to return [0] for the incidence matrix when there are no edges/nodes or not?

Improve View object interface

In networkx, the (only?) way to get the largest degree is the following:

max(list(dict(G.degree()).values()))

I find this unacceptable. In xgi, I propose we implement something like the following:

max(H.degree().values())

That is, we should try to give the View objects a bit of an interface that makes them behave like the object they point to. Specifically, since H.degree() returns a DegreeView, which looks like a dictionary,

H.degree()
# -> DegreeView({0: 10, 1: 20, ...})

we should make the DegreeView object support some of the methods of a dictionary,

H.degree().values()
# -> [10, 20]

neighbors

I was thinking that, when doing simulations (particularly for games), it is very useful to have the neighbors of a node divided by hyperedge. To explain better, here's what the current neighbors function returns:

hyperedge_list = [[1, 2], [2, 3, 4]]
H.neighbors(2)
# {1, 3, 4}

which is basically the neighbors in the graph projection of H. What I was imagining is actually a higher-order neighbor function that returns {[1], [3, 4]}. This is extremely useful in loops when you want to scroll over the hyperedges attached to a node and then the neighbors in those.

Do you think something like this could be integrated into the current neighbors function? Or maybe two different ones? In my old code I had H.graph_neighbors() and H.neighbors().

Add support for simplicial complexes

Because one can infer all the sub-interactions given a simplex, perhaps it is beneficial to store a simplicial complex using it's maximal edges to save space and perhaps use to develop efficient algorithms.

Implement an "easy mode" class

The proposal is to have a subclass of Hypergraph where every time the user does something fishy, it raises a warning.

H = some_hypergraph()
H.add_edge([1, 1, 2])      # an edge with a repeated node is added

easy = xgi.HypergraphWithWarnings(H)
easy.add_edge([1, 1, 2])
# -> Warning: An edge contains a repeated node

Implementing such a class would require executing many sanity checks each time an edge is added. This makes the class slow and inefficient. However, it is very safe and it "protects" the user at each step of the way. There would also be a way to easily convert one class to the other. The main caveat would be to make sure the user remembers to convert from one class to the other when they need to. We could perhaps highlight this in the documentation, and also include it in the warning text:

easy.add_edge([1, 1, 2])
# -> Warning: An edge contains a repeated node (call XXX method to disable these warnings)

add "reverse" keyword in the convert/read bipartite edgelist

The default behavior is that the first column specifies the nodes and the second column specifies the edges. Instead of calling this method and then H.dual(), it would be nice if there was reverse=True to switch the labels. This would affect the following functions:

  • read_bipartite_edgelist()
  • write_bipartite_edgelist()
  • from_bipartite_pandas_dataframe()

Add function to remove singleton from edges

Currently, edges contain singletons (nodes) :

list(H._edge.values())
# [[0], [3, 4], [4], [4]]

it's often useful to have a list of edges that does not contain nodes.
Add a function to get that.
Should the function output a new external list, of remove singletons from the edges stored internally in the HyperGraph?

Add functions to get largest connected component

Graph generators may sometimes return hypergraphs with isolated nodes (e.g. ER hypergraph with small p). In these cases, and others, it is usual to obtain the largest component. Ideally, we would like to do something like this:

G = xgi.erdos_renyi_hypergraph(n, m, p).largest_component()
G = xgi.erdos_renyi_hypergraph(n, m, p).remove_isolates()

Create a frozen hypergraph object

We have discussed implementing a frozen object that would allow every statistic to be cached:

H = some_hyper_graph()

H.adj_tensor()    # compute the tensor
# ...more code here...
H.adj_tensor()    # has to recompute

H.freeze()
H.adj_tensor()    # since H is frozen, it computes the tensor once and *caches it*
# ...more code here...
H.adj_tensor() # no need to recompute!

The point is that after H is frozen, we can essentially cache everything!

H.freeze()
H.max_degree() # compute and cache
H.degree_histogram() # compute and cache
# etc

If at any point the user wants to modify H again, the cache is flushed,

H.thaw() # everything that was cached is erased
H.add_edge([1,2,3])
H.max_degree() # need to recompute
H.freeze()
H.max_degree() # compute and cache

order, shape, number_of_edges, number_of_nodes

All these functions do very similar things. In keeping with the idea of doing things only once, I vote we simplify them somehow.

I personally lean toward the nuclear option and just keeping shape. All the others are not necessary...

erdos_renyi_hypergraph() : does not have n nodes

As discussed today, trying the random hypergraph generator:

import xgi
n = 10
m = 100
p = 0.01
H = xgi.erdos_renyi_hypergraph(n, m, p)

H.nodes
# NodeView((1, 3, 6, 7, 9))

the hypergraph has 5 nodes, not 10, even though the doc says n is the number of nodes.

This discrepancy between n and the actual number of nodes is apparently expected for small n and p (correct me if I'm wrong).
If so, add a warning to the docstring. Otherwise, fix the bug.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.