[Bioinformatics 2021] This is the repo for the paper "SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization".

Home Page: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab207/6189090

drug-drug-interaction gnn graph-neural-networks drug-discovery subgraph knowledge-graph biomedical-knowledge-graph bioinformatics drugbank biosnap twosides

sumgnn's Introduction

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization

This is the code for our paper "SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization" (published in Bioinformatics'21) [link].

Install

git clone git@github.com:yueyu1030/SumGNN.git
cd SumGNN
pip install -r requirements.txt

Example

# Flags: -d task (dataset); -e name of the experiment log; --gpu ID of the GPU;
# --hop number of hops for the subgraph; --batch batch size for samples;
# --emb_dim size of the embedding for GNN layers; -b size of the basis for the relation kernel
python train.py \
    -d drugbank \
    -e ddi_hop3 \
    --gpu=0 \
    --hop=3 \
    --batch=256 \
    --emb_dim=32 \
    -b=10

You can also set -d to BioSNAP; please change -e accordingly. The trained model and the logs are stored in the experiments folder. Note that, to ensure a fair comparison, we test all models on the same negative triplets.
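One way to keep negative triplets identical across models is to draw them from a seeded random generator (or to sample once and store them). The sketch below is only an illustration under assumptions: the function name, the (head, relation, tail) column order, and tail corruption as the sampling scheme are hypothetical and are not taken from this repository.

```python
import random


def sample_negatives(positives, num_entities, seed=0):
    """Corrupt the tail of each positive (head, relation, tail) triplet once."""
    rng = random.Random(seed)  # local RNG: the output depends only on the seed
    known = set(positives)
    negatives = []
    for head, rel, _tail in positives:
        while True:
            candidate = (head, rel, rng.randrange(num_entities))
            # Reject true triplets and self-loops, then try again.
            if candidate not in known and candidate[2] != head:
                negatives.append(candidate)
                break
    return negatives


positives = [(0, 1, 2), (1, 0, 3), (2, 1, 4)]
neg_a = sample_negatives(positives, num_entities=10, seed=42)
neg_b = sample_negatives(positives, num_entities=10, seed=42)
print(neg_a == neg_b)  # True: the same seed gives the same negatives
```

Writing the sampled negatives to disk once and reloading them for every model removes the dependence on the random generator altogether.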

Dataset

We provide the dataset in the data folder.

Data | Source | Description
Drugbank | This link | A drug-drug interaction network between 1,709 drugs with 136,351 interactions.
TWOSIDES | This link | A drug-drug interaction network between 645 drugs with 46,221 interactions.
Hetionet | This link | The knowledge graph containing 33,765 nodes out of 11 types (e.g., gene, disease, pathway, molecular function, etc.) with 1,690,693 edges from 23 relation types after preprocessing (to ensure no information leakage, we remove all the overlapping edges between HetioNet and the dataset).
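The leakage filter mentioned for Hetionet can be pictured as a set difference on node pairs. The following is a minimal sketch, assuming that "overlapping" means a knowledge graph edge connects the same two drugs as an interaction in the dataset, regardless of direction or relation type. The function name and the toy identifiers are hypothetical, and the actual preprocessing may differ.

```python
def remove_overlapping_edges(kg_triples, ddi_pairs):
    """Drop KG edges whose endpoints form a drug pair present in the DDI dataset."""
    # Compare pairs without direction so (a, b) and (b, a) count as the same edge.
    blocked = {frozenset(pair) for pair in ddi_pairs}
    return [(h, r, t) for h, r, t in kg_triples if frozenset((h, t)) not in blocked]


kg = [
    ("DB01", "resembles", "DB02"),  # overlaps with the DDI pair below
    ("DB01", "binds", "G7"),
    ("DB03", "resembles", "DB01"),
]
ddi = [("DB02", "DB01")]
filtered = remove_overlapping_edges(kg, ddi)
print(filtered)
```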

We provide the mapping file between the ids in our pre-processed data and their original names/DrugBank ids, as well as a copy of the Hetionet data and its mapping file, at this link.

Knowledge Graph Embedding

We train the knowledge graph embedding based on the framework in OpenKE.

To obtain the embeddings on your own, first feed the triples in train.txt (edges in the dataset) and relations_2hop.txt (edges in the KG) into their toolkit as edges and obtain an embedding for each node. Then incorporate these embeddings into our framework by modifying lines 44-45 in model/dgl/rgcn_model.py.
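A minimal sketch of the merging step, under several assumptions that should be checked against the data files and the OpenKE documentation: both files hold whitespace-separated integer triples in (head, tail, relation) order, they share one entity and relation id space, and OpenKE expects a train2id.txt whose first line is the triple count followed by "head tail relation" lines. The function name is hypothetical.

```python
def to_openke_train2id(*sources):
    """Merge whitespace-separated integer triples into train2id.txt-style text."""
    triples = []
    seen = set()
    for lines in sources:
        for line in lines:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            # ASSUMPTION: columns are (head, tail, relation); reorder here if not.
            triple = tuple(int(p) for p in parts)
            if triple not in seen:  # drop duplicates, keep first-seen order
                seen.add(triple)
                triples.append(triple)
    body = [f"{h} {t} {r}" for h, t, r in triples]
    return "\n".join([str(len(triples))] + body) + "\n"


ddi_lines = ["0 1 5", "1 2 7"]     # stands in for open("train.txt")
kg_lines = ["2 3 9", "0 1 5", ""]  # stands in for open("relations_2hop.txt")
merged = to_openke_train2id(ddi_lines, kg_lines)
print(merged, end="")
```

Because the function accepts any iterable of lines, open file handles can be passed in place of the in-memory lists.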

Cite Us

Please kindly cite this paper if you find it useful for your research. Thanks!

@article{yu2021sumgnn,
  title={Sumgnn: Multi-typed drug interaction prediction via efficient knowledge graph summarization},
  author={Yu, Yue and Huang, Kexin and Zhang, Chao and Glass, Lucas M and Sun, Jimeng and Xiao, Cao},
  journal={Bioinformatics},
  year={2021}
}

Acknowledgement

The code framework is based on GraIL.

sumgnn's Issues

Questions about the code

  1. The TWOSIDES source link may be wrong? It should probably be this one: http://snap.stanford.edu/biodata/datasets/10017/10017-ChChSe-Decagon.html
  2. In the processed BioSNAP dataset (train.txt, test.txt, dev.txt and relations2hop.txt), the biomedical entities (drug, protein, gene) are given only as numeric ids. What is the correspondence between these ids and the real entities? You only provide the mapping for the drugbank dataset.

Incomplete requirements.txt

Your requirements.txt is incomplete. When I tried to install the requirements and run the train.py, several errors were thrown. The packages I had to install manually were the following:

  • torch (since 1.2.0 is not up to date anymore)
  • scipy
  • dglteam dgl (since the installation of dgl works differently now)
  • requests
  • tqdm
  • sklearn

Please consider adding them (with the appropriate versions) to the requirements.txt to help other future users and prevent further trouble installing the package.
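For anyone hitting the same problem, a quick check of which of the modules listed above are importable can be sketched as follows. The module list simply mirrors the bullet points in this issue and the function name is hypothetical. Note that sklearn is the import name of the scikit-learn package.

```python
import importlib.util

# Import names for the packages listed above; "sklearn" is installed as scikit-learn.
REQUIRED = ["torch", "scipy", "dgl", "requests", "tqdm", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

This only reports whether a module can be found; it does not check that the installed versions are compatible with each other.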

How to get drug fingerprint?

What network was used to obtain the drug fingerprint in your paper? I only saw the final results in the project.

File not found error

with open('cid2smiles.txt', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'cid2smiles.txt'

This error is raised while running read_data.py (external_data). How can I get the file?

Make code pip installable

Right now it's not obvious how to install or use the code other than as a script. It would be good if the code were properly packaged and made pip installable.

try/except in subgraph_extraction/datasets.py, lines 166 to 179

Excuse me, what is the purpose of the try/except block mentioned in the title? When I run the code, an error is reported while building the valid SubgraphDataset, with the message 'no edge from 0 to one'. I would be very grateful if someone could help.

A bug appears when running the model

Traceback (most recent call last):
  File "train.py", line 230, in <module>
    main(params)
  File "train.py", line 31, in main
    train = SubgraphDataset(params.db_path, 'train_pos', 'train_neg', params.file_paths,
  File "/home/aita/4444/chenwenqi/model/SUMGNN/SumGNN-master/subgraph_extraction/datasets.py", line 71, in __init__
    self.db_neg = self.main_env.open_db(db_name_neg.encode())
lmdb.ReadonlyError: mdb_dbi_open: Permission denied

When I tried to run the model, I got the error above.
Yesterday, when I ran the model for the first time, it completed without problems; the error appeared on the second run.
I set up the running environment according to requirements.txt.

Value error when using the BioSNAP dataset

When I run the model on the BioSNAP dataset, it raises a value error: "Using a target size (torch.Size([178, 178])) that is different to the input size (torch.Size([178, batch_size])) is deprecated. Please ensure they have the same size", where 178 is the number of samples in the last batch.

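Question about the knowledge graph

Hello, your SumGNN paper is great, and I have reproduced your code. The paper states that you use HetioNet; if I understand correctly, the result of processing it is the relation_2hop.txt file, right?
How did you process the data to obtain this file? I would also like to try using an external knowledge graph, but I do not know how to process it. One more question: since both are derived from the same knowledge graph, why are the relation_2hop.txt files for BioSNAP and DRUGBANK different?
I am just getting started and do not know much about these topics, and I have no good way to solve the problem on my own, so I am taking the liberty of asking here. I look forward to your reply. Thank you very much!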
