Graphs are ubiquitous data structures and a universal framework for modeling complex systems, such as social networks, atomic interactions, and drug-protein interactions.
This repo contains practical tutorials for PyTorch Geometric (PyG). The tutorials are adapted from the PyTorch Geometric examples and expanded with additional explorations.
Specifically, this repo includes examples of increasing GNN architecture complexity, and it is recommended that you complete them in the following order:
- intro
  - `README.md`: Covers creating the conda environment needed to run the notebooks in this repo.
  - `Karate_Club_Node_GCN_Classifier.ipynb`: A simple neural message passing scheme is utilized for classification using the graph convolutional operator `GCNConv`.
    - Dataset: Zachary's karate club is a social network of a university karate club, described in the paper "An Information Flow Model for Conflict and Fission in Small Groups" by Wayne W. Zachary.
- node_classification
  - `Node_Classification_with_MLP.ipynb`: A multi-layer perceptron (MLP) is used to classify nodes in a single graph, using only the node features.
  - `Node_Classification_with_GNN.ipynb`: Two different GNN models (the graph convolutional operator `GCNConv` and the graph attentional operator `GATConv`) are utilized. Here, the graph structure is used explicitly and, unsurprisingly, a significant improvement in performance is observed.
    - Dataset: The Cora dataset consists of scientific papers, where nodes represent documents and edges represent citation relationships. Each paper is categorized into one of several research fields.
- graph_classification
  - `Graph_Classification.ipynb`: Given a dataset of graphs, the goal is to classify each (entire) graph based on its structural information and additional features.
    - A new PyG operator, `GraphConv`, is introduced to demonstrate the impact of including or omitting normalization of neighborhood information. The addition of a skip-connection layer is also explored.
    - Dataset: The MUTAG dataset represents chemical compounds, specifically mutagenic and non-mutagenic molecules. Each graph describes the structure of one compound, with nodes representing atoms and edges representing chemical bonds.

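
To make the normalization point above more concrete, here is a minimal pure-Python sketch of the two aggregation styles on a toy graph. The graph, the feature values, and the helper names (`sum_aggregate`, `sym_norm_aggregate`) are hypothetical and not taken from the notebooks. The sketch also leaves out the learned weight matrices that `GraphConv` and `GCNConv` apply, so it only illustrates how neighbor messages are scaled, not the full operators.

```python
import math

# Hypothetical toy graph: a star with hub node 0 and leaf nodes 1, 2, 3.
# Each node carries a single scalar feature of 1.0.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
features = {node: 1.0 for node in neighbors}


def sum_aggregate(neighbors, features):
    """Unnormalized aggregation: add up the raw neighbor features."""
    return {
        node: sum(features[nbr] for nbr in nbrs)
        for node, nbrs in neighbors.items()
    }


def sym_norm_aggregate(neighbors, features):
    """Symmetric normalization with self-loops: the message from j to i
    is weighted by 1 / sqrt(deg(i) * deg(j))."""
    # Degrees include the added self-loop.
    deg = {node: len(nbrs) + 1 for node, nbrs in neighbors.items()}
    return {
        node: sum(
            features[nbr] / math.sqrt(deg[node] * deg[nbr])
            for nbr in nbrs + [node]
        )
        for node, nbrs in neighbors.items()
    }


summed = sum_aggregate(neighbors, features)
normed = sym_norm_aggregate(neighbors, features)

for node in neighbors:
    print(f"node {node}: sum={summed[node]:.3f}  normalized={normed[node]:.3f}")
```

With the plain sum, the hub's aggregated value is three times that of a leaf. With symmetric normalization the gap shrinks to roughly 1.5 times, which is one reason the two choices can behave differently on graphs whose node degrees vary widely.
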
Embedding visualizations are used in all notebooks to assess how well the learned GNN representations reflect the community structure of the graph(s).
Finally, reading the first five chapters of Graph Representation Learning by William L. Hamilton is highly recommended to get comfortable with graph theory and graph neural network terminology before you dive into the code. The book provides the intuition, mathematical proofs, and explanations needed to understand why GNNs work, and why and how different architectures impact model performance.