thinng / graphdta

GraphDTA: Predicting drug-target binding affinity with graph neural networks

Home Page: https://doi.org/10.1093/bioinformatics/btaa921

Python 100.00%

graphdta's Issues

Questions about 1D convolution

Thank you for your excellent work. I have read your code carefully and have a question about the one-dimensional convolution. You embed the protein sequence into 128 dimensions, so a batch of protein embeddings has shape [512, 1000, 128]. You do not swap the last two dimensions, so the convolution slides along the last (embedding) dimension. However, a one-dimensional convolution is usually applied along the sequence dimension, which requires permuting the last two dimensions. In short, your input to nn.Conv1d() is [batch_size, sequence_length, embedding_dim], whereas PyTorch expects [batch_size, embedding_dim, sequence_length]. I think there is a problem here.
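To make the shape question concrete, here is a minimal sketch in plain Python that applies the documented output-length formula for a 1D convolution over an input laid out as (N, C_in, L_in). The helper name `conv1d_output_shape`, the 32 output channels, and the kernel size of 8 are illustrative assumptions, not values confirmed from the repository.

```python
def conv1d_output_shape(input_shape, in_channels, out_channels, kernel_size,
                        stride=1, padding=0, dilation=1):
    """Shape produced by a 1D convolution on an (N, C_in, L_in) input."""
    n, c_in, l_in = input_shape
    if c_in != in_channels:
        raise ValueError(f"expected {in_channels} input channels, got {c_in}")
    # Standard output-length formula for a 1D convolution.
    l_out = (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    return (n, out_channels, l_out)


# Hypothetical layer settings, chosen only for illustration.
OUT_CHANNELS = 32
KERNEL_SIZE = 8

# Input passed as [batch, sequence_length, embedding_dim]:
# the 1000 sequence positions act as channels, and the kernel
# slides along the 128 embedding features.
as_is = conv1d_output_shape((512, 1000, 128), 1000, OUT_CHANNELS, KERNEL_SIZE)

# Input permuted to [batch, embedding_dim, sequence_length]:
# the 128 embedding features act as channels, and the kernel
# slides along the 1000 sequence positions.
permuted = conv1d_output_shape((512, 128, 1000), 128, OUT_CHANNELS, KERNEL_SIZE)

print(as_is)     # (512, 32, 121)
print(permuted)  # (512, 32, 993)
```

Both layouts are accepted as long as the layer's in_channels matches the second dimension, so no error is raised in either case. The difference lies only in which axis the kernel slides over.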

Question: How to get the embeddings of a list of SMILES strings

Is there an easy way to get the embeddings of a list of SMILES strings using the pre-trained models? For example, given a list such as [smiles_1, smiles_2, ..., smiles_n], how can I get the corresponding embedding vector for each SMILES string in the list?

Confusion about the data process code

Hi, I’m a little confused about the code in create_data.py:

This seems to indicate that the number 0 represents both the character 'A' and the default (padding) value. Will this affect the network's prediction results?
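One common way to avoid this kind of collision is to start character indices at 1 and reserve 0 for padding. The sketch below shows that convention. The alphabet, the names `SEQ_VOCAB`, `CHAR_TO_INDEX`, and `encode_sequence`, and the maximum length are all hypothetical, so whether create_data.py already does this should be checked against the code itself.

```python
# Hypothetical protein alphabet, for illustration only.
SEQ_VOCAB = "ABCDEFGHIKLMNOPQRSTUVWXYZ"

# Indices start at 1 so that 0 is never assigned to a real character.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(SEQ_VOCAB)}
PAD_INDEX = 0


def encode_sequence(seq, max_len):
    """Encode a sequence as integers, truncated or zero-padded to max_len."""
    encoded = [PAD_INDEX] * max_len
    for i, ch in enumerate(seq[:max_len]):
        encoded[i] = CHAR_TO_INDEX[ch]
    return encoded


print(encode_sequence("AC", 4))  # [1, 3, 0, 0]
```

With this layout, 'A' maps to 1, and a 0 in the encoded vector can only mean padding.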

Missing key(s) in state_dict

Hi,

When I ran python predict_with_pretrained_model.py
for the GATNet model, I got the exception shown below. How can I fix it?

Many thanks!
Ryan

Pretrained Models?

Hi. In the paper I see mention of pretrained models being available in this repo, however I don't see any .model files? Do you perhaps have those located elsewhere?

GPU is not used when running the KIBA dataset

Hello thinng,
I ran into an issue when running GraphDTA. The Davis dataset ran successfully on the GPU. However, when I switch to the KIBA dataset, GPU utilization is zero and the model takes a long time to run. Can you give me some advice on this problem? Thanks!

Invalid val_loss values for DAVIS dataset

While running the run_experimet.py file for DAVIS dataset, the training procedure produced unusual loss values. I attach the screenshot below:

Please help me fix this issue.

How to use a trained model to predict a new data?

I have some protein sequences and drugs, but no affinity data for them. How can I predict their affinity with your method? How should I prepare my data, and how do I call your code? I don't know how to do this, so any guidance would be appreciated. Thank you!

How to prevent overfitting?

I saw that you set epochs=1000 without early stopping. How do you prevent overfitting with so many epochs? Thanks.

In addition, I don't think the test dataset should be used to guide training (training.py). Normally, the test dataset must not be touched during training; otherwise the model overfits to it and may not generalize. Your training_validation.py may be the right way to obtain a model that generalizes.

In your paper, your results are better than those of DeepDTA and WideDTA, but I am not sure they would hold on a completely new test dataset.
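For reference, early stopping on a held-out validation set can be sketched in a few lines. This is a generic illustration, not the repository's training loop: the `EarlyStopping` class, the `epoch_to_stop` helper, the patience value, and the loss values are all made up for the example.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epoch_to_stop(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None


# Made-up validation losses: the best value (0.8) occurs at epoch 1.
print(epoch_to_stop([1.0, 0.8, 0.9, 0.85, 0.95], patience=3))  # 4
```

In a real run, the validation loss would come from a split carved out of the training data, the model with the best validation loss would be checkpointed, and the test set would be evaluated only once at the end.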

How to use the pre-trained optimal model to predict my own dataset?

Hello, thank you for sharing your code and results. I would like to ask how to use your optimal model to predict on my own dataset. I have converted my dataset from a CSV file to a .pt file, but when I try to use it for prediction, I get an error saying that the dataset cannot be input to the model. Could you please give me some guidance on how to solve this problem? Thank you very much!

How to extract the graph data for a single graph from the DataLoader batch?

Dear Team,
I'm trying to understand how graphs of variable sizes such as small molecules can be passed as a batch to a deep learning model with this code. While looking at the output from the DataLoader in training_validation.py, I get the following output with the default parameters set in the code.

`cuda_name: cuda:0
Learning rate: 0.0005
Epochs: 1000

running on GCNNet_davis
Pre-processed data found: data/processed/davis_train.pt, loading ...
Pre-processed data found: data/processed/davis_test.pt, loading ...
Batch(batch=[16399], c_size=[512], edge_index=[2, 36172], target=[512, 1000], x=[16399, 78], y=[512])
Batch(batch=[16332], c_size=[512], edge_index=[2, 36122], target=[512, 1000], x=[16332, 78], y=[512])
Batch(batch=[16269], c_size=[512], edge_index=[2, 36000], target=[512, 1000], x=[16269, 78], y=[512])
Batch(batch=[16193], c_size=[512], edge_index=[2, 35794], target=[512, 1000], x=[16193, 78], y=[512])
Batch(batch=[16418], c_size=[512], edge_index=[2, 36284], target=[512, 1000], x=[16418, 78], y=[512])`

I understand that one batch contains 512 molecular graphs together with their target proteins and affinity values. But I'm confused about how to extract the data for the 1st or 2nd graph in each batch from the DataLoader. I'm a beginner in PyTorch Geometric, so please explain in detail even if this seems like a very naive question. Another question: does c_size set an upper limit on the number of nodes in the batch? What happens if the c_size attribute is omitted?
Anticipating your reply and thanks in advance!
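The mechanism behind the `batch` vector can be sketched in plain Python. In the log above, `batch` has one entry per node (for example 16399), and each entry holds the index of the graph that node belongs to. Selecting the nodes whose entry equals i, and the edges among them, recovers graph i. The helper `extract_graph` and the toy data below are hypothetical. In PyTorch Geometric itself, `Batch.to_data_list()` splits a batch back into individual `Data` objects.

```python
def extract_graph(x, edge_index, batch, graph_id):
    """Return (node_features, edges) of one graph from a batched graph.

    x          : list of per-node feature lists
    edge_index : [sources, targets], using batch-wide node indices
    batch      : batch[i] is the graph that node i belongs to
    """
    node_ids = [i for i, g in enumerate(batch) if g == graph_id]
    # Map batch-wide node indices to indices local to the extracted graph.
    remap = {old: new for new, old in enumerate(node_ids)}
    sub_x = [x[i] for i in node_ids]
    sources, targets = edge_index
    sub_edges = [(remap[s], remap[t])
                 for s, t in zip(sources, targets)
                 if s in remap and t in remap]
    return sub_x, sub_edges


# Toy batch: graph 0 has nodes 0-1, graph 1 has nodes 2-4.
x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
batch = [0, 0, 1, 1, 1]
edge_index = [[0, 1, 2, 3, 3, 4],
              [1, 0, 3, 2, 4, 3]]

features, edges = extract_graph(x, edge_index, batch, graph_id=1)
print(features)  # [[3.0], [4.0], [5.0]]
print(edges)     # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

On the second question, the log shows c_size with shape [512], i.e. one value per graph in the batch, which suggests a per-graph attribute rather than a cap on the total number of nodes. That reading is an inference from the printed shapes only.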

What do the two outputs of the predicting function represent?

Hello Professor,
I would like to know what the two outputs of the prediction function in predict_with_pretrained_model.py are.
Do they represent the predicted and actual affinity values?

thank you very much

How to convert paired data into the matrix Y

Thank you for your repo.

Suppose my dataset looks like this:

SMILES Protein sequence affinity
CC1=C2C.....CC=C4)N MKKFFDSRR.....LLLVDQLIDL 7.365
CC1=C2C.....CC=C4) NSADAQSFLN.....MYTPHTVLQ 4.999

Do you have a function that converts this into Y?
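A conversion of this kind can be sketched as follows, assuming Y is a drug-by-protein affinity matrix with NaN marking pairs that have no measurement. That layout, the function name `pairs_to_matrix`, and the toy records are assumptions for illustration, and the file format the repository expects may differ.

```python
import math


def pairs_to_matrix(records):
    """Convert (smiles, sequence, affinity) triples into an affinity matrix.

    Returns (drugs, proteins, Y), where Y[i][j] is the affinity of
    drugs[i] against proteins[j], or NaN if that pair was not measured.
    """
    drugs, proteins = [], []
    drug_index, protein_index = {}, {}
    for smiles, sequence, _ in records:
        if smiles not in drug_index:
            drug_index[smiles] = len(drugs)
            drugs.append(smiles)
        if sequence not in protein_index:
            protein_index[sequence] = len(proteins)
            proteins.append(sequence)

    Y = [[math.nan] * len(proteins) for _ in drugs]
    for smiles, sequence, affinity in records:
        Y[drug_index[smiles]][protein_index[sequence]] = affinity
    return drugs, proteins, Y


# Toy records with short placeholder strings instead of real SMILES/sequences.
records = [
    ("CCO", "MKK", 7.365),
    ("CCO", "NSA", 4.999),
    ("CCN", "MKK", 5.5),
]
drugs, proteins, Y = pairs_to_matrix(records)
print(drugs)     # ['CCO', 'CCN']
print(proteins)  # ['MKK', 'NSA']
print(Y[0])      # [7.365, 4.999]
```

Rows and columns follow the order of first appearance, so the returned `drugs` and `proteins` lists are needed to interpret the matrix.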

Cannot run pretrained models

Dear Mr.Thin,

I ran into issues when running the pretrained models.

With model_GCNet_davis.model and model_GCNet_kiba.model, the script returns MSE and CI scores successfully, but the values may be wrong (GCNet_davis: ci = 0.651, mse = 1.274; GCNet_kiba: ci = 0.639, mse = 0.592).
With model_GINConvNet_davis.model and model_GINConvNet_kiba.model, it also returns MSE and CI scores successfully, but these may also be wrong (GINConvNet_davis: ci = 0.662, mse = 1.189; GINConvNet_kiba: ci = 0.648, mse = 0.608).
With the other pretrained models, it throws the error shown in the image below:

Steps to reproduce:

python create_data.py
python predict_with_pretrained_model.py

FileNotFoundError: Could not find module 'D:\Anaconda3\envs\geometric\Lib\site-packages\torch_sparse\_convert.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

(geometric) D:\Anaconda3\envs\geometric\GraphDTA>python create_data.py
Traceback (most recent call last):
  File "create_data.py", line 9, in <module>
    from utils import *
  File "D:\Anaconda3\envs\geometric\GraphDTA\utils.py", line 5, in <module>
    from torch_geometric.data import InMemoryDataset, DataLoader
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\__init__.py", line 2, in <module>
    import torch_geometric.nn
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\nn\__init__.py", line 2, in <module>
    from .data_parallel import DataParallel
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\nn\data_parallel.py", line 5, in <module>
    from torch_geometric.data import Batch
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\data\__init__.py", line 1, in <module>
    from .data import Data
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\data\data.py", line 7, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_sparse\__init__.py", line 12, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch\_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "D:\Anaconda3\envs\geometric\lib\ctypes\__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Anaconda3\envs\geometric\Lib\site-packages\torch_sparse\_convert.pyd' (or one of its dependencies). Try using the full path with constructor syntax.