TransformerGO

This repository contains the official implementation of the paper: "TransformerGO: Predicting protein-protein interactions by modelling the attention between sets of gene ontology terms"

For more details, see: TransformerGO: Predicting protein-protein interactions by modelling the attention between sets of gene ontology terms.

About

TransformerGO is a model based on the original Transformer architecture. It captures deep semantic similarities between gene ontology terms and uses them to predict protein-protein interactions. The model was introduced in our paper, TransformerGO: Predicting protein-protein interactions by modelling the attention between sets of gene ontology terms.

Contents

Datasets

The model is trained and evaluated on datasets for two organisms: S. cerevisiae and H. sapiens.

#Annotation data
url = 'http://geneontology.org/gene-associations/goa_human.gaf.gz'
#Protein to protein interactions
url = 'https://stringdb-static.org/download/protein.links.v11.5/9606.protein.links.v11.5.txt.gz'
#Protein aliases 
url = 'https://stringdb-static.org/download/protein.aliases.v11.5/9606.protein.aliases.v11.5.txt.gz'
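The three files above could be fetched with a short script such as the following. This is a minimal sketch, not part of the repository: the names `DATASET_URLS`, `local_path` and `download_all` are hypothetical, and the target directory is left to the caller.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs copied from the list above.
DATASET_URLS = {
    "annotations": "http://geneontology.org/gene-associations/goa_human.gaf.gz",
    "interactions": "https://stringdb-static.org/download/protein.links.v11.5/9606.protein.links.v11.5.txt.gz",
    "aliases": "https://stringdb-static.org/download/protein.aliases.v11.5/9606.protein.aliases.v11.5.txt.gz",
}


def local_path(url, target_dir):
    """Return target_dir joined with the file name taken from the URL."""
    file_name = os.path.basename(urlparse(url).path)
    return os.path.join(target_dir, file_name)


def download_all(target_dir):
    """Download every file in DATASET_URLS, skipping files that already exist."""
    os.makedirs(target_dir, exist_ok=True)
    for url in DATASET_URLS.values():
        destination = local_path(url, target_dir)
        if not os.path.exists(destination):
            urllib.request.urlretrieve(url, destination)
```

Calling `download_all("some/target/dir")` keeps the compressed files as they are. Where each file has to live for the notebooks to find it is determined by the paths configured further down.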

Generating GO term embeddings

To generate embeddings for Gene Ontology terms, we use the original implementation of node2vec: Scalable Feature Learning for Networks. The .obo file is parsed in obo-file-parsing.ipynb, and the edge list (the input to node2vec) is generated in node2vec-embeddings.ipynb. An example of running node2vec:

#!/bin/bash
input_path="../../../datasets/transformerGO-dataset/go-terms/graph/go-terms.edgelist"
output_path="../../../datasets/transformerGO-dataset/go-terms/emb/go-terms.emd"
python node2vec-master/src/main.py --input "$input_path" --output "$output_path" --dimensions 64 --iter 10
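For orientation, the step from an .obo file to a node2vec edge list might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in the notebooks: it follows only `is_a` relations inside `[Term]` stanzas, and the function names `obo_to_edges` and `edges_to_edgelist` are hypothetical. GO IDs are mapped to integers because the reference node2vec script reads its edge list with integer node IDs.

```python
def obo_to_edges(obo_lines):
    """Collect (child, parent) GO ID pairs from is_a lines of [Term] stanzas."""
    edges = []
    current_id = None
    in_term = False
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe GO terms.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("is_a: ") and current_id:
            # Drop the trailing "! name" comment after the parent ID.
            parent = line[len("is_a: "):].split("!")[0].strip()
            edges.append((current_id, parent))
    return edges


def edges_to_edgelist(edges):
    """Assign an integer to each GO ID and format one "child parent" line per edge."""
    node_ids = {}
    lines = []
    for child, parent in edges:
        for term in (child, parent):
            node_ids.setdefault(term, len(node_ids))
        lines.append(f"{node_ids[child]} {node_ids[parent]}")
    return node_ids, lines
```

The returned `node_ids` dictionary is needed later to translate node2vec's integer rows back to GO IDs. Whether the repository also includes other relation types, such as `part_of`, is not stated here.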

Training and testing the model

LSTM

Training and testing the implementation of protein2vec can be done via training-protein2vec.ipynb. To change the organism or the subset, change the following paths:

neg_path = "datasets/jains-TCSS-datasets/yeast_data/iea+/negatives.sgd.iea.f"
poz_path = "datasets/jains-TCSS-datasets/yeast_data/iea+/positives.sgd.iea.f"

TransformerGO model

Training and testing the implementation of TransformerGO can be done via training-transformerGO.ipynb.

Multiple datasets are available; choose one by running the code block corresponding to the desired dataset (TCSS, StringDB benchmark or our datasets). To run experiments on specific gene ontology terms or different annotation sizes, change the following variables:

intr_set_size_filter = [0,5000]
go_filter = 'CC'
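As a rough illustration of what these two variables could control, the sketch below filters annotations by GO namespace and then keeps only protein pairs whose combined number of GO terms falls inside the given range. Both interpretations are assumptions: the notebook may define the size differently, and `filter_go_terms`, `filter_by_set_size`, the input layout and any namespace code other than 'CC' are hypothetical.

```python
def filter_go_terms(protein_terms, go_filter):
    """Keep only the GO IDs whose namespace code equals go_filter (e.g. 'CC').

    protein_terms maps a protein ID to a list of (go_id, namespace) tuples.
    """
    return {
        protein: [go_id for go_id, namespace in terms if namespace == go_filter]
        for protein, terms in protein_terms.items()
    }


def filter_by_set_size(pairs, protein_go_ids, size_range):
    """Keep pairs whose combined GO term count lies in [low, high] (assumed meaning)."""
    low, high = size_range
    kept = []
    for protein_a, protein_b in pairs:
        size = len(protein_go_ids.get(protein_a, [])) + len(protein_go_ids.get(protein_b, []))
        if low <= size <= high:
            kept.append((protein_a, protein_b))
    return kept
```

With `intr_set_size_filter = [0,5000]` as above, such a filter would keep nearly every pair. Narrowing the range would restrict the experiment to sparsely or densely annotated interactions.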

To train or test the model using a new dataset, simply provide the paths to the interaction data, annotation file and the aliases file as follows:

organism = 9606
EMB_DIM = 64
data_path = 'datasets/transformerGO-dataset/'
go_embed_pth = data_path + f"go-terms/emb/go-terms-{EMB_DIM}.emd"
go_id_dict_pth = data_path + "go-terms/go_id_dict"
protein_go_anno_pth = data_path + "stringDB-files/goa_human.gaf.gz"
alias_path = data_path + f'stringDB-files/{organism}.protein.aliases.v11.5.txt.gz'

neg_path = data_path + f'interaction-datasets/{organism}.protein.negative.v11.5.txt'
poz_path = data_path + f'interaction-datasets/{organism}.protein.links.v11.5.txt'
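To make the role of the annotation and alias files concrete, here is a minimal sketch of how they could be combined so that each STRING protein ID ends up with a set of GO terms. It is not the repository's loader. It relies on the GAF layout (tab-separated, `!` comment lines, object ID in column 2, GO ID in column 5) and the STRING alias layout (tab-separated, `#` header, STRING ID in column 1, alias in column 2); the function names are hypothetical.

```python
def parse_gaf(lines):
    """Map each DB Object ID (column 2) to the set of GO IDs (column 5) annotating it."""
    annotations = {}
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        annotations.setdefault(fields[1], set()).add(fields[4])
    return annotations


def parse_aliases(lines):
    """Map each alias (column 2) to its STRING protein ID (column 1)."""
    alias_to_string = {}
    for line in lines:
        if line.startswith("#"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            alias_to_string[fields[1]] = fields[0]
    return alias_to_string


def annotate_string_proteins(annotations, alias_to_string):
    """Re-key the GO annotations by STRING ID, dropping accessions with no alias."""
    result = {}
    for accession, go_ids in annotations.items():
        string_id = alias_to_string.get(accession)
        if string_id is not None:
            result.setdefault(string_id, set()).update(go_ids)
    return result
```

Both parsers accept any iterable of text lines, so a compressed file can be passed directly as `gzip.open(path, "rt")`.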

Note that the embeddings can also be changed by running node2vec on a completely different Gene Ontology graph.
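When swapping in embeddings from a different graph, it can help to check the resulting .emd file before training. The sketch below assumes the file is in the word2vec text format written by the reference node2vec script (a "count dimension" header, then one "node v1 ... vD" row per node) and that node IDs are integers. `load_embeddings` is a hypothetical helper, not a function from this repository.

```python
def load_embeddings(lines):
    """Parse word2vec text format into a dict of node ID -> list of floats."""
    rows = iter(lines)
    count, dim = (int(value) for value in next(rows).split())
    embeddings = {}
    for line in rows:
        parts = line.split()
        if not parts:  # tolerate blank lines
            continue
        vector = [float(value) for value in parts[1:]]
        if len(vector) != dim:
            raise ValueError(f"node {parts[0]}: expected {dim} values, got {len(vector)}")
        embeddings[int(parts[0])] = vector
    if len(embeddings) != count:
        raise ValueError(f"header announces {count} nodes, file contains {len(embeddings)}")
    return embeddings
```

The dimension in the header should match `EMB_DIM`; a mismatch would most likely surface later as a shape error in the model.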

Attention analysis

Generating heatmaps

The heatmaps show the attention weights aggregated over all positive interactions in the dataset, each passed through the model in turn. Heatmaps can be generated via attention-plots.ipynb. Examples generated using the StringDB benchmark for S. cerevisiae and H. sapiens can be found in attention-heatmaps. To generate heatmaps on a new dataset, change the paths as shown in the example above. Note that the examples in the paper are generated using only positive interactions from the training dataset.
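One possible reading of this aggregation is sketched below: for every interaction, the attention matrix between the GO terms of the two proteins is added into a running total keyed by GO term pair. This is an assumption made for illustration; the notebook may aggregate differently (for example by position, head or layer), and `aggregate_attention` and its input layout are hypothetical.

```python
from collections import defaultdict


def aggregate_attention(interactions):
    """Sum attention weights per (GO term of protein A, GO term of protein B) pair.

    Each interaction is a tuple (terms_a, terms_b, attention), where
    attention[i][j] is the weight between terms_a[i] and terms_b[j].
    """
    totals = defaultdict(float)
    for terms_a, terms_b, attention in interactions:
        for i, go_a in enumerate(terms_a):
            for j, go_b in enumerate(terms_b):
                totals[(go_a, go_b)] += attention[i][j]
    return dict(totals)
```

The resulting dictionary can be laid out as a matrix, with rows and columns ordered by GO term, and passed to any plotting library to draw the heatmap.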

Analysing the attention for one interaction

In attention-per-interaction.ipynb we provide a notebook for analysing the attention values between GO terms for a single interaction. It generates a heatmap for each head and layer.

Authors

Ioan Ieremie, Rob M. Ewing, Mahesan Niranjan

Citation

@article{10.1093/bioinformatics/btac104,
    author = {Ieremie, Ioan and Ewing, Rob M and Niranjan, Mahesan},
    title = "{TransformerGO: Predicting protein-protein interactions by modelling the attention between sets of gene ontology terms}",
    journal = {Bioinformatics},
    year = {2022},
    month = {02},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac104},
    url = {https://doi.org/10.1093/bioinformatics/btac104},
    note = {btac104},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac104/42546304/btac104.pdf},
}

Contact

ii1g17 [at] soton [dot] ac [dot] uk
