pyt-team / topomodelx Goto Github PK

Topological Deep Learning

License: MIT License

Python 29.16% Jupyter Notebook 70.84%

cell-complex-networks cell-complex-neural-networks cxn graph-neural-networks higher-order-models higher-order-networks topological-data-analysis topological-deep-learning cw-networks simplicial-neural-networks

topomodelx's People

Stargazers

Watchers

Forkers

neo4reo sherif-med ninamiolane mathildepapillon mhajij all-seeing-code nkalyanv godfrey-cw hussainmsajwani hrithiknambiar pragyasingh7 rballeba alexandervnikitin papamarkou mr-siddy hellsegga david-klindt adele-myers franciscoeacosta xwang38438 bongjinkoo abrahamrabinowitz spindro paulhausner abdelwahed gvbazhenov 7anooch latticetower andrew-herdering ilwwan pavlo-melnyk odinhg cookbook-ms as-l-c effrosyni-papanastasiou marc0041 ioanam25 rtviii leoyouli vardhandongre ccatbbella devendragovil aslansd shubham-kashyapi lcrypto ajbrent alizia369 cocoaaa snopoff giminsukju sadra-barikbin johmathe nmamie dongcf sourcesync gurug-dev yashkumaratri maneelusf jarpri zwimpee levtelyatnikov elenaviewsynthesis toggled standardgalactic techthiyanes zstreet87 nimarasekh jiaxingwang1998 raphaelpellegrin dtbinh rodroadl lnhutnam liyubov haikuoxin duydl

topomodelx's Issues

Simplicial: Add equations in docstrings of forward methods

What?

We consider the layers implemented for the simplicial complex domain, i.e. the ones in nn/simplicial/*_layer.py.

Each Layer is a Python class that has a forward() method.
The docstring of this forward method should give the message passing equation of the Layer that is being implemented.

For example, see how the docstring of the forward method of the HSNLayer here:

TopoModelX/topomodelx/nn/simplicial/hsn_layer.py

Line 70 in 712fe4d

def forward(self, x_0, incidence_1, adjacency_0):

properly gives the equations:

        r"""Forward pass.

        The forward pass was initially proposed in [HRGZ22]_.
        Its equations are given in [TNN23]_ and graphically illustrated in [PSHM23]_.

        .. math::
            \begin{align*}
            &🟥 \quad m_{{y \rightarrow z}}^{(0 \rightarrow 0)} = \sigma ((A_{\uparrow,0})_{xy} \cdot h^{t,(0)}_y \cdot \Theta^{t,(0)1})\\
            &🟥 \quad m_{z \rightarrow x}^{(0 \rightarrow 0)}  = (A_{\uparrow,0})_{xy} \cdot m_{y \rightarrow z}^{(0 \rightarrow 0)} \cdot \Theta^{t,(0)2}\\
            &🟥 \quad m_{{y \rightarrow z}}^{(0 \rightarrow 1)}  = \sigma((B_1^T)_{zy} \cdot h_y^{t,(0)} \cdot \Theta^{t,(0 \rightarrow 1)})\\
            &🟥 \quad m_{z \rightarrow x)}^{(1 \rightarrow 0)}  = (B_1)_{xz} \cdot m_{z \rightarrow x}^{(0 \rightarrow 1)} \cdot \Theta^{t, (1 \rightarrow 0)}\\
            &🟧 \quad m_{x}^{(0 \rightarrow 0)}  = \sum_{z \in \mathcal{L}_\uparrow(x)} m_{z \rightarrow x}^{(0 \rightarrow 0)}\\
            &🟧 \quad m_{x}^{(1 \rightarrow 0)}  = \sum_{z \in \mathcal{C}(x)} m_{z \rightarrow x}^{(1 \rightarrow 0)}\\
            &🟩 \quad m_x^{(0)}  = m_x^{(0 \rightarrow 0)} + m_x^{(1 \rightarrow 0)}\\
            &🟦 \quad h_x^{t+1,(0)}  = I(m_x^{(0)})
            \end{align*}

However, some forward functions of the layers on the simplicial domain do not provide the mathematical equations associated with this message passing. We should add them.

Why?

The code is easier to read and understand if one can refer to the mathematical equation directly.

While the equation does not render very well in the docstring, it will render on the documentation website which will help the users.

Where?

The files to modify are:

topomodelx/nn/simplicial/*_layer.py

NOTE: This issue only focus on layers within the simplicial domain. There will be other issues to add equations in docstrings for other topological domains.

How?

Go to the repository: https://github.com/awesome-tnns/awesome-tnns
Go to the file: Simplicial_Complexes.md

For each layer file simplicial/*_layer.py, for each forward() method in that file:

Find the equation corresponding to the forward you are looking at in Simplicial_Complexes.md.
Copy that equation into the correct format into the docstring of the forward() method.
! Verify that the code in the forward() method indeed corresponds to the equation. If not, raise an issue.
When merging your PR, verify that the documentation website has rendered the equation correctly.

Note: this last step might need to wait on the issue #165

Pytorch missing in dependencies

What

The pyproject.toml file is missing pytorch from the list of dependencies, when it is used by a lot of modules.

Why

Pytorch is required, for example pytest doesn't run at all (in a fresh environment):

Add tutorials to tmx doc website

Merge HOAN models from challenge's PRs

What?

There were two implementations of HOAN models from Hajij 22a by challenge participants, in the PRs:
https://github.com/pyt-team/TopoModelX/pull/145/files
https://github.com/pyt-team/TopoModelX/pull/104/files

Both PRs fail the unit-tests.
It is also unclear whether the PRs implement the same HOAN model, or different models from Hajij22a.

If they are implementing the same HOAN model, we need to merge them into 3 files:

hoan_layer.py
test_hoan_layer.py
hoan_train.ipynb

If they are different implementations, we need to check if one of them can be deleted.
Possible reasons to delete:

not faithful to the research paper introducing HOAN,
incomplete implementation.

If we decide to keep both of them, the file names and docstrings should be updated to highlight their difference.

Why?

These PRs represent our only implementation of the combinatorial_complex domain which is currently not represented in topomodelx. We need to merge them so that this domain is represented as well.

Where?

The files to add should look like:

topomodelx/nn/combinatorial/hoan_layer.py
test/nn/combinatorial/test_hoan_layer.py
tutorials/combinatorial/hoan_train.ipynb

If the two PRs are actually implementing different models, then we should have 2 sets of 3-files (one set for each model) with explicit names and documentation that reflect the differences between the models.

How?

See for example how the duplicate SCN and SCN2 has been solved and the use of the "See Also" sections in their docstrings.

Add path complex neural network in TopoModelX

What

Implement a path complex neural network.

Why

Currently TopoModelX has networks for hypergraphs, simp, cell and combinatorial complexes. TopoNetX supports all these complexes as well as Path Complex, yet the package does not have any neural networks build for path complexes.

How

Follow the templates we have for other networks, add layer and network classes. Add a tutorial in the tutorial folder under path folder, add unit tests.

Problem with test_reset_parameters() in test_hsn_layer.py

Hi!

In test_reset_parameters()

TopoModelX/test/nn/simplicial/test_hsn_layer.py

Line 34 in ed4bd96

if isinstance(module, torch.nn.Conv2d):

, the HSNLayer modules are checked to be instances of torch.nn.Conv2d. However, none of them are, since the layer consists of only Aggregation (from topomodelx.base.aggregation) and Conv from (topomodelx.base.conv).

Thus, the test will be passed without actually checking the parameter reset.

(The test itself is incorrect: after resetting, the parameters of a torch.nn.Conv2d layer are compared to zeros

TopoModelX/test/nn/simplicial/test_hsn_layer.py

Line 36 in ed4bd96

module.weight, torch.zeros_like(module.weight)

, which is not how they are initialized.)

Make 3 figures for the software paper

Make three figures for the software paper in overleaf. See overleaf.

Questions about Cellular Attention Mechanism/Network

I have a couple of clarifying questions about cellular attention networks and the code for the attention mechanism in the conv and message passing files.

I may be mistaken, but it seems like there is no normalization being done for the attention coefficients in the current implementation of attention (the reference paper uses softmax for this purpose). Should we leave the current attention mechanism as is or is it worth rewriting the code to implement the normalization?
With regards to the tensor diagram for CAN's when it comes to the neighborhood aggregation the tensor diagram here calls for applying a non-linearity to each within neighborhood aggregation and then performing the inter-neighborhood aggregation, whereas the referenced CAN paper performs the inter-neighborhood aggregation and then applies the non-linearity. Would it be ok to go with the formula given by the paper? This would also make it so our implementation can reduce to the Hodge Laplacian layer in the referenced Rodenberry et al. paper when the option to use attention is set to false.

Synchronization of implementations for layers applicable to complexes of arbitrary rank

Some simplicial(/cellular) complex layers are applicable to complexes of arbitrary high rank. This raises the question of how the input to these layers should be formatted.

In #129 we use a dictionary each for adjacencies, incidences and features, indexing each adjacency/incidence/feature matrix by their rank. In #142 lists are used instead. I must admit that I have not carefully checked all implementations to see if there are more examples. Perhaps neither of these solutions is robust enough and there should be some custom data structure for a complex, something like an analogue to the SimplicialComplex class in TopoNetX but using PyTorch instead of NumPy?

In any case it would be good to synchronize the implementations, so I started this issue for discussing!

Update badges on 3 repositories

Add python 3.10+ badge
Add doi

Clarify duplicate implementation of SCN/SCN2

What?

There were two implementations of SCN by challenge participants:

SCN2 only works for simplicial complexes of rank 2, and
SCN works for simplicial complexes of any rank.

However, we should check that the difference between these two implementations only lies in the ranks used.

Why?

Because it is confusing to have two very close implementations of SCN without knowing their precise difference.

Where?

The files to modify are:

topomodelx/nn/simplicial/scn_layer.py
topomodelx/nn/simplicial/scn2_layer.py
test/nn/simplicial/test_scn_layer.py
test/nn/simplicial/test_scn2_layer.py
tutorials/simplicial/scn_train.ipynb
tutorials/simplicial/scn2_train.ipynb

How?

We should look at the following:

Do they use the same neighborhood matrices?
- If not, the docstrings' sections "See Also" should explain additional differences in more details.
Do they use the same activation?
etc.

A small discrepancy between `MessagePassing::forward` return value and its docstring

Hello,
In MessagePassing::forward, it's stated that the shape of return value is [..., n_target_cells, out_channels] but its return value is actually of shape [..., n_target_cells_with_neighbors, out_channels] which its last but one dimension size is the number of target cells that have any neighbor in source cells. Which one is supposed to be correct? The implementation or the docstring?

To reproduce:

from topomodelx.base.message_passing import MessagePassing
m = MessagePassing()
x_source = torch.rand((3,10))
neighborhood = torch.tensor([[1, 0, 1], [0, 0, 0]]).to_sparse_coo()
m.forward(x_source, neighborhood).shape # result: torch.Size([1, 10])

Unify duplicate implementation of CAN

What?

There were two implementations of CAN by challenge participants.

If they are indeed the same implementation, we need to merge them into 3 files:

can_layer.py
test_can_layer.py
can_train.ipynb

If they are different implementations, we need to check if one of them can be deleted.
Possible reasons to delete:

not faithful to the research paper introducing CAN,
incomplete implementation.

If we decide to keep both of them, the file names and docstrings should be updated to highlight their difference.

Why?

Because it is confusing to have two implementations of CAN without knowing their difference.

Where?

The files to modify are:

topomodelx/nn/cell/can_layer.py
topomodelx/nn/cell/can_layer_bis.py
test/nn/cell/test_can_layer.py
test/nn/cell/test_can_layer_bis.py
tutorials/cell/can_train.ipynb
tutorials/cell/can_train_bis.ipynb

How?

See for example how the duplicate SCN and SCN2 has been solved and the use of the "See Also" sections in their docstrings.

Additionally:

Rename the Python class ending in _v2 in can_layer.py because:

v2 is not explanatory: it hinders readability
Python classes need to be in CamelCase and without underscore.

Address the comments left by the challenge's participants in can_layer_bis.py`, and copy-pasted again below:

# Some notes - The attention function provided for us does not normalize the attention coefficients. Should this be done?
# Where should we be able to customize the non-linearities? Seems important for the output. What about the attention non-linearities do we just use what is given?
# I wanted to make this so that without attention it ends up being the Hodge Laplacian network. Maybe ask the contest organizers about this?

Avoid conversions sparse -> dense -> sparse

Notebooks often use a sparse -> dense -> sparse conversion, e.g. sccnn_train.ipynb

laplacian_0 = torch.from_numpy(laplacian_0.todense()).to_sparse()

This is inefficient: it makes our tutorials, and therefore our unit-tests slower.

The sparse numpy array should be directly transformed into a sparse tensor, e.g. using torch.sparse_coo_tensor.

Create Tmx website pages

What?

The current documentation website does not include TMX.

We should add one to help contributors navigate the repository.

Why?

The documentation website is a great entry point for contributors.

Where?

Add API documentation pages in here:
https://github.com/pyt-team/TopoModelX/tree/main/docs

If necessary, change the docs workflow:
.github/workflows/docs.yml

How?

The API should be automatically generated from the code. Follow what was done for TNX.

Add tutorials' thumbnails on TopoModelX doc website

What?

The tutorials webpage of TopoModelX documentation website does not display any thumbnails.

https://pyt-team.github.io/topomodelx/tutorials/index.html

We should add thumbnails.

Why?

The tutorials webpage is one of the first pages that users see. It should look clean.

Additionally, thumbnails help the reader understanding what the tutorials are about.

Where and How?

In order to add thumbnails, we refer to the similar github issue that added thumbnails for the tutorials of TopoNetX:
pyt-team/TopoNetX#84

The PR solving the issue was:
pyt-team/TopoNetX#179

The persons solving the issue were: @devendragovil and @mhajij: ask them for guidelines if needed.

For each tutorial notebook:

Locate the tensor diagram corresponding to the neural network implemented in the notebook in Fig. 11 of the review Architectures of Topological Deep Learning: A Survey of Topological Neural Networks.
Crop the tensor diagram out of the figure.
Make this tensor diagram the thumbnail of the notebook.

Simplicial: Migrate Neural Network's Class inside topomodelx/nn

What?

We consider the neural networks (not the layers, the full networks) implemented for the simplicial domain. These neural networks are currently implemented as Python classes in the tutorials notebooks.

We should, instead, port their implementation into the core code base, specifically into: topomodelx/nn/simplicial/.

Why?

The neural networks are "hidden" in the tutorials.
They might also be less unit-tested than what they could be if they were inside the core codebase.

Where?

The files to modify are:

tutorials/simplicial/*_train.ipynb
topomodelx/nn/simplicial/

NOTE: This issue only focus on layers within the simplicial domain. There will be other issues to port the neural network python code into the core code base for the other domains.

How?

For each file tutorials/simplicial/[model-name]_train.ipynb:

Locate the code of the Python class that defines the neural network within the ipynb notebook,
Create a new file topomodelx/nn/simplicial/[model-name].py (note the absence of any _layer suffix).
Copy the code of the Python class that defines the neural network there.
Create a new file test/nn/simplicial/test_[model-name].py (note the absence of any _layer suffix).
Add unit-tests: one test for each of the method of the neural network's Python class.
Make sure that the unit-tests pass and that the methods are correctly documented.

Check that "References" render correctly on documentation websites

What?

The docstrings of several functions have a Reference section that looks like:

    References
    ----------
    .. [AGRW20] Devanshu Arya, Deepak K Gupta, Stevan Rudinac and Marcel Worring.
        HyperSAGE: Generalizing inductive representation learning on hypergraphs.
        arXiv preprint arXiv:2010.04558. 2020

Within the docstrings, the papers are cited via in [AGRW20] or maybe via in _[AGRW20] (using an underscore?).

We should make sure that each docstring:

cites the paper that their function is implementing,
cites it "correctly" in the sense that the References section appears cleanly on the documentation website.

Note: this issue is blocking on Issue , which should be completed first.

Why?

Because we want to give credit to the authors of the corresponding papers.
Because it will help the users of the codebase to know in which paper they can find the theory corresponding to the code they are trying to run.

Where?

Everywhere.

How?

Check on the documentation website that the references look great by going to:
https://pyt-team.github.io/topomodelx/api/index.html

Possible bug in scatter_sum

The bug happens when the last cell / simplex (index-wise) is isolated. Simple code to reproduce the issue:

from toponetx.classes.cell_complex import CellComplex
import torch
from topomodelx.base.conv import Conv


if __name__ == '__main__':
    complex = CellComplex()
    complex.add_cell([0,1,2], 2)
    complex.add_cell([0,1,3], 2)
    complex.add_node(4)
    vertex_feats = torch.zeros(5, 3)
    hidden_dim = 8
    A0 = torch.from_numpy(complex.adjacency_matrix(rank=0).todense()).to_sparse().float()
    
conv_0_to_0 = Conv(
            in_channels=in_channels_0, out_channels=in_channels_0, att=False
        )
    
    zv = conv_0_to_0(vertex_feats, A0)

`Dist2Cycle` never calls `Dist2CycleLayer`s

While Dist2CycleLayers are correctly constructed in Dist2Cyle, they are never actually called.

Correctly call layers in the forward method.
This should have been catched by tests. Fix them!

Complete writing tasks on overleaf

See TODO's in overleaf

No PR Drafts

Unfinished PRs can't be easily flagged right now. Pull Request Drafts require a public Repository, GitHub Team, GitHub Enterprise Server 2.17+ or GitHub Enterprise Cloud.

Fix API Reference on Topomodelx' doc website

What?

The API Reference Tab of TopoModelX's documentation website is broken.
https://pyt-team.github.io/topomodelx/api/index.html

It is broken because it does not show the Python classes and functions, and their docs, that are implemented in the package.
By contrast, the API Reference Tab of TopoNetX is not broken (at least at the time of submission of this issue), and looks like this:
https://pyt-team.github.io/toponetx/api/index.html

(see also Screenshot below, in the case the API Reference Tab of TopoNetX does break after this issue is submitted).

We need to fix the API Reference Tab of TopoModelX.

Why?

Because the API Reference is one of the most important aspect of the documentation website. It needs to show which functions are implemented in the package.

Where?

On TopoModelX, modify the documentation website's "api" tab by modifying files in the folder:
https://github.com/pyt-team/TopoModelX/tree/main/docs/api

How?

Look at how the API Reference Tab is coded in TopoNetX, by looking at the files in this folder:
https://github.com/pyt-team/TopoNetX/tree/main/docs/api

Detect what differs between TopoModelX and TopoNetX, and fix TopoModelX accordingly.

The main difference is probably that TopoModelX has an additional layer of folder structure, that needs to be correctly processed.

Hypergraph: Migrate Neural Network's Class inside topomodelx/nn

What?

We consider the neural networks (not the layers, the full networks) implemented for the hypergraph domain. These neural networks are currently implemented as Python classes in the tutorials notebooks.

We should, instead, port their implementation into the core code base, specifically into: topomodelx/nn/hypergraph/.

Why?

The neural networks are "hidden" in the tutorials.
They might also be less unit-tested than what they could be if they were inside the core codebase.

Where?

The files to modify are:

tutorials/hypergraph/*_train.ipynb
topomodelx/nn/hypergraph/

NOTE: This issue only focus on layers within the hypergraph domain. There will be other issues to port the neural network python code into the core code base for the other domains.

How?

For each file tutorials/hypergraph/[model-name]_train.ipynb:

Locate the code of the Python class that defines the neural network within the ipynb notebook,
Create a new file topomodelx/nn/hypergraph/[model-name].py (note the absence of any _layer suffix).
Copy the code of the Python class that defines the neural network there.
Create a new file test/nn/hypergraph/test_[model-name].py (note the absence of any _layer suffix).
Add unit-tests: one test for each of the method of the neural network's Python class.
Make sure that the unit-tests pass and that the methods are correctly documented.

Hypergraph: Add equations in docstrings of forward methods

What?

We consider the layers implemented for the hypergraph domain, i.e. the ones in nn/hypergraph/*_layer.py.

Each Layer is a Python class that has a forward() method.
The docstring of this forward method should give the message passing equation of the Layer that is being implemented.

For example, see how the docstring of the forward method of the UniGCNII here:
https://github.com/pyt-team/TopoModelX/blob/main/topomodelx/nn/hypergraph/unigcnii_layer.py

properly gives the equations:

        The forward pass consists of:
        - two messages, and
        - a skip connection with a learned update function.

        1. Every hyper-edge sums up the features of its constituent edges:
        .. math::
            \begin{align*}
            & 🟥 \quad m_{y \rightarrow z}^{(0 \rightarrow 1)} = (B^T_1)\_{zy} \cdot h^{t,(0)}_y \\
            & 🟧 \quad m_z^{(0\rightarrow1)} = \sum_{y \in \mathcal{B}(z)} m_{y \rightarrow z}^{(0 \rightarrow 1)}
            \end{align*}

        2. The second message is normalized with the node and edge degrees:
        .. math::
            \begin{align*}
            & 🟥 \quad m_{z \rightarrow x}^{(1 \rightarrow 0)}  = B_1 \cdot m_z^{(0 \rightarrow 1)} \\
            & 🟧 \quad m_{x}^{(1\rightarrow0)}  = \frac{1}{\sqrt{d_x}}\sum_{z \in \mathcal{C}(x)} \frac{1}{\sqrt{d_z}}m_{z \rightarrow x}^{(1\rightarrow0)} \\
            \end{align*}

        3. The computed message is combined with skip connections and a linear transformation using hyperparameters alpha and beta:
        .. math::
            \begin{align*}
            & 🟩 \quad m_x^{(0)}  = m_x^{(1 \rightarrow 0)} \\
            & 🟦 \quad m_x^{(0)}  = ((1-\beta)I + \beta W)((1-\alpha)m_x^{(0)} + \alpha \cdot h_x^{t,(0)}) \\
            \end{align*}

However, some forward functions of the layers on the hypergraph domain do not provide the mathematical equations associated with this message passing. We should add them.

Why?

The code is easier to read and understand if one can refer to the mathematical equation directly.

While the equation does not render very well in the docstring, it will render on the documentation website which will help the users.

Where?

The files to modify are:

topomodelx/nn/hypergraph/*_layer.py

NOTE: This issue only focus on layers within the hypergraph domain. There will be other issues to add equations in docstrings for other topological domains.

How?

Go to the repository: https://github.com/awesome-tnns/awesome-tnns
Go to the file: Hypergraphs.md

For each layer file hypergraph/*_layer.py, for each forward() method in that file:

Find the equation corresponding to the forward you are looking at in Hypergraphs.md.
Copy that equation into the correct format into the docstring of the forward() method.
! Verify that the code in the forward() method indeed corresponds to the equation. If not, raise an issue.
When merging your PR, verify that the documentation website has rendered the equation correctly.

Note: this last step might need to wait on the issue #165.

Missing Tutorial Thumbnail

What?

Two Issues:

Thumbnails are missing from the tutorial page in documentations.
I couldn't locate the thumbnails in the module itself.

Help Required

Please guide me regarding the location of thumbnails for the docs. Alternatively, will be helpful if I can get guidance on which images can serve as thumbnails.

@mhajij

Speed up unit-tests

torch_scatter does not import properly following README

I get:

Python 3.10.10 (main, Mar 21 2023, 13:41:39) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.

import torch_scatter
libc++abi: terminating with uncaught exception of type std::length_error: vector
Abort trap: 6

Check docstrings everywhere

What?

Some docstrings are erroneous, for example:


class SCNN(torch.nn.Module):
    """Simplicial convolutional neural network implementation for complex classification.

    Note: At the last layer, we obtain the output on simplcies, e.g., edges.
    To perform the complex classification task for this challenge, we consider pass the final output to a linear layer and compute the average.

    Parameters
    ----------
    in_channels: int
        Dimension of input features
    intermediate_channels: int
        Dimension of features of intermediate layers
    out_channels: int
        Dimension of output features
    conv_order_down: int
        Order of lower convolution
    conv_order_up: int
        Order of upper convolution
    n_layers: int
        Numer of layers
    """

    def __init__(
        self,
        in_channels,
        intermediate_channels,
        out_channels,
        conv_order_down,
        conv_order_up,
        aggr_norm=False,
        update_func=None,
        n_layers=2,
    ):

How?

Check docstrings in the whole codebase and correct them if needed. Check that:

the format of the docstring is correct
the docstring documents every input and output variable

Follow docstrings' guidelines written in our contribution file: https://pyt-team.github.io/topomodelx/contributing/index.html#write-documentation

Diagnose & Speed-up Hypergraph tutorials

What?

Testing the tutorials on hypergraphs takes ~15 minutes, whereas testing the tutorials on other domains takes ~2-5 minutes (see screenshot).

There is probably one tutorial on hypergraphs that takes very long and slows down the whole github action workflow.

Find out which one and whether it can be accelerated.

Why?

A slow testing workflow slows down all the contributors, who have to wait for all tests to pass before being able to move on.

Refactor topomodelx

What

Provide a structure of Python classes that inherit from torch.nn.Module and that allow future contributors to implement the architecture of topological deep learning in a consistent framework.

Provide example notebooks.

Provide unit-tests.

Why?

The current dev implementation is not tested, and does not follow a consistent API. We need to have a consistent, unit-tested and documented code architecture before asking contributors to contribute their models or develop new models.

Where?

In the repo topomodelx, follow a structure of folders that is inspired by torch_geometric.

How?

From the literature:

Once these tasks are done with unit-tested, documented code, the package is available for large-scale contributions.

Cell: Migrate Neural Network's Class inside topomodelx/nn/

What?

We consider the neural networks (not the layers, the full networks) implemented for the cell domain. These neural networks are currently implemented as Python classes in the tutorials notebooks.

We should, instead, port their implementation into the core code base, specifically into: topomodelx/nn/cell/.

Why?

The neural networks are "hidden" in the tutorials.
They might also be less unit-tested than what they could be if they were inside the core codebase.

Where?

The files to modify are:

tutorials/cell/*_train.ipynb
topomodelx/nn/cell/

NOTE: This issue only focus on layers within the cell domain. There will be other issues to port the neural network python code into the core code base for the other domains.

How?

For each file tutorials/cell/[model-name]_train.ipynb:

Locate the code of the Python class that defines the neural network within the ipynb notebook,
Create a new file topomodelx/nn/cell/[model-name].py (note the absence of any _layer suffix).
Copy the code of the Python class that defines the neural network there.
Create a new file test/nn/cell/test_[model-name].py (note the absence of any _layer suffix).
Add unit-tests: one test for each of the method of the neural network's Python class.
Make sure that the unit-tests pass and that the methods are correctly documented.

Questions about HSN tutorial

I have a couple of questions/found bugs regarding the HSN tutorial (and hence might impact other tutorials in the simplicial domain).

TopoModelX/tutorials/simplicial/hsn_train.ipynb

Line 326 in 18956de

" self.layers = layers\n",

should be self.layers = torch.nn.ModuleList(layers), so that the parameters get properly registered.
TopoModelX/tutorials/simplicial/hsn_train.ipynb

Line 355 in 18956de

" return torch.softmax(logits, dim=-1)"

should probably not have softmax, as later binary crossentropy on logits is used:

TopoModelX/tutorials/simplicial/hsn_train.ipynb

Line 415 in 18956de

" loss = torch.nn.functional.binary_cross_entropy_with_logits(\n",

TopoModelX/tutorials/simplicial/hsn_train.ipynb

Line 120 in 18956de

    
           "Since our task will be node classification, we must retrieve an input signal on the nodes. The signal will have shape $n_\\text{nodes} \\times$ in_channels, where in_channels is the dimension of each cell's feature. Here, we have in_channels = channels_nodes $ = 34$. This is because the Karate dataset encodes the identity of each of the 34 nodes as a one hot encoder."

"Here, we have in_channels = channels_nodes $ = 34$. This is because the Karate dataset encodes the identity of each of the 34 nodes as a one hot encoder." This seems to be incorrect as we get 2 dim features:

TopoModelX/tutorials/simplicial/hsn_train.ipynb

Line 145 in 18956de

"There are 34 nodes with features of dimension 2.\n"

and they are eigenvectors from the graph as defined in https://github.com/pyt-team/TopoNetX/blob/4c47ec24047a7af83d5a249a79c1945e7043ceea/toponetx/datasets/graph.py#L38 .

the class _MessagePassing should implement matrix multplication in the easiest case.

I am trying to perform matrix multiplication in the _MessagePassing class as a test case.

import torch
import numpy as np

import topomodelx as tmx
import toponetx as tnx
from topomodelx.nn.base import _MessagePassing

def coo_2_torch_tensor(sparse_mx, sparse=True):
"""Convert a scipy matrix to a torch tensor.

Parameters
----------
sparse_mx : scipy matrix
    Matrix to convert.
sparse: bool
    Specifies if the matrix is sparse or not.

Returns
-------
_ : torch.tensor
    Converted matrix.
"""
if sparse:
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(
        np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64)
    )
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse.FloatTensor(indices, values, shape)
return torch.FloatTensor(sparse_mx.todense())

SC= tnx.SimplicialComplex([[0,1],[1,2]])
B1 = coo_2_torch_tensor(SC.incidence_matrix(1))
mp = _MessagePassing(1,1)
x_e = torch.rand(2,10)
x_v = mp (x_e,B1)

getting the error:

TypeError: 'module' object is not callable

Unify duplicate implementation of HNHN

What?

There were two implementations of HNHN by challenge participants.

If they are indeed the same implementation, we need to merge them into 3 files:

hnhn_layer.py
test_hnhn_layer.py
hnhn_train.ipynb

If they are different implementations, we need to check if one of them can be deleted.
Possible reasons to delete:

not faithful to the research paper introducing HNHN,
incomplete implementation.

If we decide to keep both of them, the file names and docstrings should be updated to highlight their difference.

Why?

Because it is confusing to have two implementations of HNHN without knowing their difference.

Where?

The files to modify are:

topomodelx/nn/hypergraph/hnhn_layer.py
topomodelx/nn/hypergraph/hnhn_layer_bis.py
test/nn/hypergraph/test_hnhn_layer.py
test/nn/hypergraph/test_hnhn_layer_bis.py
tutorials/hypergraph/hnhn_train.ipynb
tutorials/hypergraph/hnhn_train_bis.ipynb

How?

See for example how the duplicate SCN and SCN2 has been solved and the use of the "See Also" sections in their docstrings.

Add Typing everywhere, for consistency

What?

Across the three packages, the code is not consistent in its use of Typing. Sometimes we have typing, sometimes we do not.

We should decide on one convention and stick to it. If we decide to include Typing, then we should check it with mypy.

The addition of Typing in the CI workflow can be done like this:
pyt-team/TopoNetX#163

Why?

Adding typing help debug the code.

Where?

Everywhere: in the code base of all repositories of pyt-team.

How:

In the code:

Observe how typing is done for example here:
https://github.com/pyt-team/TopoModelX/blob/main/topomodelx/utils/scatter.py
Add typing in every other file.
Add typing check in CI workflow as in: pyt-team/TopoNetX#163

Make unique doc website for the three packages

What?

Create a pyt-team website that serves as a single entry point that points towards the three documentation websites.

Why?

Users might want to circulate from one doc website to the other. Having one single entry point could help.

How?

Create a docs folder with the content of the website. See /docs folder in the PR referenced below.
Create a github action workflow that builds and deploys the content of the docs folder into a website. See .github/workflow/docs.yml in the PR referenced below.
Use Pyt-Team image as a welcome banner.
Make sure that the formatting of the website is intact.
On each package's doc website, create an option to go to the others: e.g. on TopoNetX's website, create an option to go to TopoEmbedX and TopoModelX, etc.

Linting

Can we somehow simplify the linting process? It seems like the current setup could create a entry barrier, because while black passes, flake8 sometimes doesn't and so you have to format manually.

Write coverletter

What?

Write coverletter that accompanies the software paper for JMLR software track. We need one.

How?

Guidelines about what to put in the coverletter can be found here: https://www.jmlr.org/mloss/mloss-info.html

Nina shared a template of cover letter on overleaf in the slack channel for the software paper. it has text that should be adapted.

Fix tutorials' display on TopoModelX doc's website

What?

The Tutorials Tab of TopoModelX's documentation website does not display the tutorials properly.
https://pyt-team.github.io/topomodelx/tutorials/index.html

Specifically, the headers of every tutorials shows up which clutters the page without providing any useful information:

Tutorials
[Train a Cellular Attention Network (CAN)](https://pyt-team.github.io/topomodelx/notebooks/cell/can_train.html)
[Set-up](https://pyt-team.github.io/topomodelx/notebooks/cell/can_train.html#Set-up)
[Pre-processing](https://pyt-team.github.io/topomodelx/notebooks/cell/can_train.html#Pre-processing)
[Create the Neural Network](https://pyt-team.github.io/topomodelx/notebooks/cell/can_train.html#Create-the-Neural-Network)
[Train the Neural Network](https://pyt-team.github.io/topomodelx/notebooks/cell/can_train.html#Train-the-Neural-Network)
[Train the Neural Network with Attention](https://pyt-team.github.io/topomodelx/notebooks/cell/can_train.html#Train-the-Neural-Network-with-Attention)
[Train a Convolutional Cell Complex Network (CCXN)](https://pyt-team.github.io/topomodelx/notebooks/cell/ccxn_train.html)
[Set-up](https://pyt-team.github.io/topomodelx/notebooks/cell/ccxn_train.html#Set-up)

Etc...

We need to prevent these headers to appear, so that only the thumbnails of the tutorials are shown.

Why?

Because displaying headers of tutorials does not provide any useful information and clutters the page.

Where?

On TopoModelX, modify the documentation website's "tutorials" tab by modifying:
https://github.com/pyt-team/TopoModelX/tree/main/docs/tutorials

How?

Maybe the :maxdepth: 1 parameter can be tuned?

Best Practices for batching?

Hi,

From looking at the existing tutorials for cell based networks, it doesnt seem like there is a convention to do batching for cell complexes. Is there some sort of best practice established within the TopoModelX framework? How should I go about creating batches?

Unify duplicate implementation of Scone

What?

There were two implementations of Scone by challenge participants.

If they are indeed the same implementation, we need to merge them into 3 files:

scone_layer.py
test_scone_layer.py
scone_train.ipynb

If they are different implementations, we need to check if one of them can be deleted.
Possible reasons to delete:

not faithful to the research paper introducing Scone,
incomplete implementation.

If we decide to keep both of them, the file names and docstrings should be updated to highlight their difference.

Why?

Because it is confusing to have two implementations of Scone without knowing their difference.

Where?

The files to modify are:

topomodelx/nn/simplicial/scone_layer.py
topomodelx/nn/simplicial/scone_layer_bis.py
test/nn/simplicial/test_scone_layer.py
test/nn/simplicial/test_scone_layer_bis.py
tutorials/simplicial/tscone_train.ipynb
tutorials/simplicial/tscone_train_bis.ipynb

How?

See for example how the duplicate SCN and SCN2 has been solved and the use of the "See Also" sections in their docstrings.

weights are not defined in the MessagePassingConv

trying to run the code:

import topomodelx as tmx
from topomodelx.nn.base import _MessagePassing
from topomodelx.nn.conv import MessagePassingConv
import toponetx as tnx
import torch
import numpy as np

def coo_2_torch_tensor(sparse_mx, sparse=True):
"""Convert a scipy matrix to a torch tensor.

Parameters
----------
sparse_mx : scipy matrix
    Matrix to convert.
sparse: bool
    Specifies if the matrix is sparse or not.

Returns
-------
_ : torch.tensor
    Converted matrix.
"""
if sparse:
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(
        np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64)
    )
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse.FloatTensor(indices, values, shape)
return torch.FloatTensor(sparse_mx.todense())

define the complex

SC= tnx.SimplicialComplex([[0,1],[1,2]])

define the neibhood function

B1 = coo_2_torch_tensor(SC.incidence_matrix(1))

define the data

x_e = torch.rand(2,10)

feedforward

model = MessagePassingConv(10,10)

xv = model(x_e,B1)

but I am getting the error :

AttributeError: 'MessagePassingConv' object has no attribute 'weight'

weights are not defined in the class MessagePassingConv

README setup doesn't explicitly install torch-cluster

What:

Setup instructions in Readme do not explicitly install torch-cluster (for. eg. if torch installation is done for cpu), need to ensure torch-cluster is also separately installed as pytest fails with missing module error if torch-cluster is not installed.

Extends TopoEmbedX to ColoredHypergraphs and Path complexes

What

Extends existing algorithms in TopoEmbedX to ColoredHyperGraph and PathComplex

Why

TopoEmbedX can support all compelxes that TopoNetX implements. Currently, only cell, simplicial and combintorial complexes are supported in the algorithms.

How

Most work needs to be done here in the neighborhood file :

https://github.com/pyt-team/TopoEmbedX/blob/main/topoembedx/neighborhood.py

to extends its functionalities to ColoredHyperGraph and PathComplex.

Clarification on `scatter` utilities

TopoModelX contains some scatter functions in utils.scatter, that -- according to the module docstring -- are adapted from torch_scatter. Is there a specific reason why we ship our own implementation? torch_scatter is a dependency already.

If yes, we should document that reason and how the build-in implementation differs from torch_scatter (which, for me, is not at all obvious from looking at the code). Otherwise remove it and use the existing implementation in torch_scatter everywhere.

Error with installations commands given in README

pip install -e ".[dev,full]"

Gives:

Obtaining file:///Volumes/GoogleDrive/My%20Drive/code/TopoModelX
  Preparing metadata (setup.py) ... done
Collecting toponetx@ git+https://[email protected]/pyt-team/TopoNetX.git
  Cloning https://****@github.com/pyt-team/TopoNetX.git to /private/var/folders/dz/k1hb2xr94k558sjs416njdp40000gn/T/pip-install-wwfbzseq/toponetx_ccc12cc924f6461eb2b665372872af30
  Running command git clone --filter=blob:none --quiet 'https://****@github.com/pyt-team/TopoNetX.git' /private/var/folders/dz/k1hb2xr94k558sjs416njdp40000gn/T/pip-install-wwfbzseq/toponetx_ccc12cc924f6461eb2b665372872af30
  remote: Invalid username or password.
  fatal: Authentication failed for 'https://github.com/pyt-team/TopoNetX.git/'
  error: subprocess-exited-with-error
  
  × git clone --filter=blob:none --quiet 'https://****@github.com/pyt-team/TopoNetX.git' /private/var/folders/dz/k1hb2xr94k558sjs416njdp40000gn/T/pip-install-wwfbzseq/toponetx_ccc12cc924f6461eb2b665372872af30 did not run successfully.
  │ exit code: 128
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet 'https://****@github.com/pyt-team/TopoNetX.git' /private/var/folders/dz/k1hb2xr94k558sjs416njdp40000gn/T/pip-install-wwfbzseq/toponetx_ccc12cc924f6461eb2b665372872af30 did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

RED LIST: Check before submitting

What?

This is the list of essentials that need to be checked before submitting the software paper.

Tests are passing on all repositories.
Code coverage is above 90% for each of the 3 repositories (see JMLR guidelines).
The documentation websites look polished for each of the 3 repositories.
The code snippets given in the README run without problem.
The three repositories and the pyt-team.github.io repository are packaged into a unique zip file, that is sent to ~5 beta testers with different OS (Mac, Windows, Ubuntu, etc)
The feedback from the 5 beta testers is addressed.
The authors are added to the list of authors and they have proof-read the manuscript. This includes the winners of the challenge.

Why?

JMLR guidelines: https://www.jmlr.org/mloss/mloss-info.html

They state, for example: "It is expected that test coverage is close to 100%."

Oauth App Restrictions

Oauth App Restrictions prevent the google Colab link from working:

Oauth App Restrictions could be disabled but I'm not sure if it is safe to do so, hence not doing that for now. You can still use the example .ipynb in google colab, you just have to manually upload them.

Fix datasets in topomodelx' test workflow + Merge SNN

What?

There is an issue upon loading the datasets from TopoNetX when testing TopoModelX through GitHub Actions.
See details:
pyt-team/TopoNetX#195

We need to fix it.

Why?

Because the merge of TopoModelX's PR 98 is blocked by this:
#98

This PR has the implementation of the SNN model.

Where?

On TopoModelX, this might mean modifying the github workflow file for tests, which is:
.github/workflows/test.yml

On TopoNetX, this might mean modifying how the datasets are loaded.

How?

Many tutorials manage to load and compute with datasets from TopoNetX. Get inspiration from this code and fix the dataset that is not working for the SNN PR 98.