yanluocityu / emgnn

This project forked from zhanglab-aim/emgnn

Explainable Multilayer Graph Neural Network for Cancer Gene Prediction

License: GNU General Public License v3.0

emgnn's Introduction

Explainable Multilayer Graph Neural Network for Cancer Gene Prediction

Published in Bioinformatics (Oxford University Press). Paper link: https://doi.org/10.1093/bioinformatics/btad643

The identification of cancer genes is a critical yet challenging problem in cancer genomics research. Existing computational methods, including deep graph neural networks, either fail to exploit multilayered gene-gene interactions or provide only limited explanations for their predictions. These methods are restricted to a single biological network, which cannot capture the full complexity of tumorigenesis. Models trained on different biological networks often yield different and even opposite cancer gene predictions, hindering their trustworthy adoption. Here, we introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach to identify cancer genes by leveraging multiple gene-gene interaction networks and pan-cancer multi-omics data. Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction.

[Figure: EMGNN architecture]

Requirements

  • Python 3
  • PyTorch
  • torch-scatter, torch-sparse, torch-cluster, torch-spline-conv, torch-geometric
  • networkx
  • captum
  • pandas
  • scikit-learn (imported as sklearn)
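
Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and the import names are assumed to be the usual ones for these packages (for example, torch-geometric is imported as torch_geometric).

```python
import importlib.util

# Import names assumed for the packages in the requirements list.
REQUIRED_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
    "networkx",
    "captum",
    "pandas",
    "sklearn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

The check only looks up each module without importing it, so it does not verify that the installed versions are compatible with each other.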

This repository contains the scripts for training and explaining the predictions of the EMGNN model.

How to Run

Data Preparation

For a fair comparison with EMOGI, we use the same data preprocessing as its official implementation (https://github.com/schulter/EMOGI). Follow the instructions there to download the PPI networks and the labels.

Training

To train the model, run the following command:

python train.py --gcn 1 

This trains the EMGNN model with GCN as the graph neural network, using the default settings and the datasets specified in the script. You can specify different settings and datasets with command-line arguments. For example, to train the model using the GAT architecture on the IREF_2015, PCNET and STRING PPI networks, and test it on STRING, run:

python train.py --gat 1 --dataset IREF_2015 PCNET STRING

Note that the last PPI network is always used as the test set. The script also supports loading a pretrained model, adding random features, adding identical features, adding structural noise, and running a multi-layer perceptron (MLP) as a baseline instead of the EMGNN. See the code for how each of these options works.
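
The convention that the last network passed to --dataset doubles as the test set can be illustrated with a short sketch. This is not the repository's train.py. The flag names come from the commands above, but their types, the required --dataset argument, and the helper select_networks are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown above (not the real train.py)."""
    parser = argparse.ArgumentParser(description="Sketch of EMGNN-style training options")
    # Assumed to be integer switches, as suggested by `--gcn 1` and `--gat 1`.
    parser.add_argument("--gcn", type=int, default=0, help="1 to use GCN layers")
    parser.add_argument("--gat", type=int, default=0, help="1 to use GAT layers")
    # The real script has default datasets; they are not reproduced here.
    parser.add_argument("--dataset", nargs="+", required=True,
                        help="PPI networks; the last one is used as the test set")
    return parser


def select_networks(datasets):
    """Return (networks used for training, network used for testing)."""
    return list(datasets), datasets[-1]


args = build_parser().parse_args(
    ["--gat", "1", "--dataset", "IREF_2015", "PCNET", "STRING"]
)
train_networks, test_network = select_networks(args.dataset)
print("train on:", train_networks, "| test on:", test_network)
```

With the arguments from the GAT example, all three networks are used for training and STRING is selected for testing, matching the description above.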

Explaining Predictions of EMGNN

The following script explains predictions made by an EMGNN (Explainable Multilayer Graph Neural Network) model for cancer gene prediction. It uses the Captum library to compute gradient-based attributions, specifically with the Integrated Gradients method. Use it to gain insight into why the model made specific predictions for individual genes.

python explain.py --model_dir <path_to_trained_model> --gene_label <gene_type>

Replace <path_to_trained_model> with the path to your trained EMGNN model and <gene_type> with one of the following options:

  • cancer: To explain cancer genes.

  • non-cancer: To explain non-cancer genes.

  • top_predicted: To explain the top predicted genes.

You can also use the --visualize flag to save network explanation visualizations if desired.

Note: Make sure <path_to_trained_model> points to an existing trained model.
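
Integrated Gradients, the attribution method mentioned above, scores each input feature by averaging the model's gradients along a straight path from a baseline input to the actual input, then scaling by the difference between input and baseline. The sketch below shows the idea in pure Python on a toy logistic model. It is neither Captum's implementation nor the EMGNN model, and the weights, input and all-zero baseline are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic model: f(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, n_steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight path from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Average gradient along the path, scaled by (input - baseline).
    return [(xi - bi) * t / n_steps for xi, bi, t in zip(x, baseline, total)]


# Illustrative values only.
w = [1.5, -2.0, 0.5]
b = 0.1
x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
# Completeness: attributions should sum to f(x) - f(baseline).
completeness_gap = sum(attributions) - (model(x, w, b) - model(baseline, w, b))
print(attributions, completeness_gap)
```

The attributions sum, up to numerical error, to the difference between the model output at the input and at the baseline. Checking this sum is a common sanity test for the number of integration steps.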

The script generates explanation outputs for both edge and node explainability. The explanations are saved as pickle files in the explain directory within your model directory.

  • Edge Explainability: Edge attributions are saved as edge_mask_explain_<idx>_<label>.pkl.
  • Node Explainability: Node feature attributions are saved as node_feat_mask_explain_<idx>_<label>.pkl.
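
The saved attributions can be read back with Python's pickle module. The sketch below assumes the file layout described above; the helper names are hypothetical, and the meaning of <idx> and <label> (taken here to be an index and the gene label passed to explain.py) and the type of the stored objects are assumptions.

```python
import pickle
from pathlib import Path


def explanation_paths(model_dir, idx, label):
    """Build the expected paths of the edge and node attribution files."""
    explain_dir = Path(model_dir) / "explain"
    return {
        "edge": explain_dir / f"edge_mask_explain_{idx}_{label}.pkl",
        "node": explain_dir / f"node_feat_mask_explain_{idx}_{label}.pkl",
    }


def load_explanations(model_dir, idx, label):
    """Load both attribution objects for one explained gene."""
    loaded = {}
    for kind, path in explanation_paths(model_dir, idx, label).items():
        with open(path, "rb") as fh:
            # Only unpickle files that you produced yourself or otherwise trust.
            loaded[kind] = pickle.load(fh)
    return loaded
```

For example, load_explanations("<path_to_trained_model>", 0, "cancer") would return a dictionary with the keys "edge" and "node", provided both files exist for that index and label.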

For a comprehensive explanation of the results and detailed code to generate the explainability plots of the paper, please refer to the analysis.ipynb Jupyter Notebook in this repository. The notebook provides step-by-step instructions and code snippets to perform an in-depth analysis of the EMGNN model's predictions and explanations.

Predictions for Unlabelled Genes

We provide the predictions for the unlabelled genes of our EMGNN model in the following link.

Citation

@article{10.1093/bioinformatics/btad643,
    author = {Chatzianastasis, Michail and Vazirgiannis, Michalis and Zhang, Zijun},
    title = "{Explainable Multilayer Graph Neural Network for Cancer Gene Prediction}",
    journal = {Bioinformatics},
    pages = {btad643},
    year = {2023},
    month = {10},
    abstract = "{The identification of cancer genes is a critical yet challenging problem in cancer genomics research. Existing computational methods, including deep graph neural networks, fail to exploit the multilayered gene-gene interactions or provide limited explanations for their predictions. These methods are restricted to a single biological network, which cannot capture the full complexity of tumorigenesis. Models trained on different biological networks often yield different and even opposite cancer gene predictions, hindering their trustworthy adaptation. Here, we introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach to identify cancer genes by leveraging multiple gene-gene interaction networks and pan-cancer multi-omics data. Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction. Our method consistently outperforms all existing methods, with an average 7.15\% improvement in area under the precision-recall curve (AUPR) over the current state-of-the-art method. Importantly, EMGNN integrated multiple graphs to prioritize newly predicted cancer genes with conflicting predictions from single biological networks. For each prediction, EMGNN provided valuable biological insights via both model-level feature importance explanations and molecular-level gene set enrichment analysis. Overall, EMGNN offers a powerful new paradigm of graph learning through modeling the multilayered topological gene relationships and provides a valuable tool for cancer genomics research. Our code is publicly available at https://github.com/zhanglab-aim/EMGNN.}",
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btad643},
    url = {https://doi.org/10.1093/bioinformatics/btad643},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btad643/52306228/btad643.pdf},
}
