
graphdta's People

Contributors

thinng

graphdta's Issues

Questions about 1D convolution

Thank you for your excellent work. I have read your code carefully and have a question about the one-dimensional convolution. You embed the protein sequence into 128 dimensions, so a batch of protein embeddings has shape [512, 1000, 128]. You do not swap the last two dimensions, so the one-dimensional convolution is executed over the last (embedding) dimension. However, one-dimensional convolution is usually performed along the sequence dimension, which requires exchanging the last two dimensions (permute). In short, your input to nn.Conv1d() is [batch_size, sequence_length, embedding_dim], while PyTorch expects [batch_size, embedding_dim, sequence_length]. I think there is a problem here.
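
To illustrate the point, here is a minimal sketch (the 512/1000/128 shapes come from the question above; the output channels and kernel size are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(512, 1000, 128)   # [batch, seq_len, embed_dim], as produced by the embedding layer

conv = nn.Conv1d(in_channels=128, out_channels=32, kernel_size=8)  # illustrative hyperparameters

# nn.Conv1d expects [batch, channels, length], so the embedding dimension must become the channel axis
out = conv(x.permute(0, 2, 1))    # -> [512, 32, 993]; the kernel now slides along the 1000 residues

# Without the permute, in_channels would have to equal the sequence length (1000) and the kernel
# would slide over the 128 embedding features instead of over the residues.
```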

Question: How to get the embeddings of a list of SMILES strings

I wonder if there is an easy way to get the embeddings of a set/list of SMILES strings using the pre-trained models. For example, given a list of SMILES like [smiles_1, smiles_2, ..., smiles_n], how can I get the corresponding embedding vector for each SMILES string from the pre-trained models?
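
There does not appear to be a dedicated helper for this in the repo, but a generic PyTorch forward hook can capture an intermediate drug representation. A rough sketch, assuming a pre-trained model is already loaded and `data_batch` is a torch_geometric Batch built from the SMILES list the same way create_data.py builds the training data; the layer name `fc_g1` is an assumption and depends on which model class you use:

```python
import torch

captured = {}

def save_embedding(module, inputs, output):
    # store the layer output for the whole batch (one row per SMILES graph)
    captured["drug_embedding"] = output.detach().cpu()

# `model` is a loaded pre-trained GraphDTA network; `fc_g1` is a hypothetical layer name --
# pick whichever layer's output you consider "the embedding".
hook = model.fc_g1.register_forward_hook(save_embedding)

with torch.no_grad():
    model(data_batch)   # data_batch: a Batch built from [smiles_1, ..., smiles_n]

hook.remove()
embeddings = captured["drug_embedding"]
print(embeddings.shape)
```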

Confusion about the data process code

Hi, I’m a little confused about the code in create_data.py:
[screenshot 1: code excerpt from create_data.py]

[screenshot 2: code excerpt from create_data.py]
This seems to indicate that the number 0 represents both the character 'A' and the default (padding) value. Will this have an impact on the network's prediction results?
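
For reference, one common way to avoid such a collision is to reserve index 0 for padding only. A minimal sketch, assuming a vocabulary string and maximum length that may differ from the exact values in create_data.py:

```python
import numpy as np

seq_voc = "ABCDEFGHIKLMNOPQRSTUVWXYZ"                    # assumed amino-acid vocabulary
seq_dict = {ch: i + 1 for i, ch in enumerate(seq_voc)}   # 'A' -> 1, ..., so 0 is never a residue
max_seq_len = 1000                                       # assumed maximum protein length

def encode_protein(seq, max_len=max_seq_len):
    x = np.zeros(max_len, dtype=np.int64)                # 0 = padding (and unknown characters)
    for i, ch in enumerate(seq[:max_len]):
        x[i] = seq_dict.get(ch, 0)
    return x
```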

Missing key(s) in state_dict

Hi,

When I tried to run python predict_with_pretrained_model.py with the GATNet model, I got the exception shown below. How can I fix it?

[screenshot of the exception]

Many thanks!
Ryan
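
A quick way to see exactly which keys differ is to compare the checkpoint's keys with those of the model you instantiate, before calling load_state_dict. The checkpoint filename below is a placeholder, and GATNet is assumed to be the class defined in this repo:

```python
import torch

model = GATNet()  # must be the same class definition the checkpoint was trained with
state = torch.load("model_GATNet_davis.model", map_location="cpu")  # placeholder path

print("missing keys:   ", sorted(set(model.state_dict()) - set(state)))
print("unexpected keys:", sorted(set(state) - set(model.state_dict())))

# Missing/unexpected keys usually mean the class definition has drifted from the checkpoint
# (for example, a different torch_geometric version renaming GAT parameters), not a corrupt file.
```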

Pretrained Models?

Hi. In the paper I see a mention of pretrained models being available in this repo; however, I don't see any .model files. Do you perhaps have them located somewhere else?

Cannot be accelerated by GPU when run KIBA dataset

Hello thinng,
I ran into an issue when running GraphDTA. The Davis dataset runs successfully on the GPU, but when I switch to the KIBA dataset the GPU utilization is zero and the model takes a very long time to run. Can you give me some advice about this problem? Thanks!
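
A quick sanity check, in case the model or the batches silently end up on the CPU; `model` and `train_loader` are assumed to be the objects created in the training script:

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("CUDA available:", torch.cuda.is_available())

model = model.to(device)
for data in train_loader:
    data = data.to(device)
    # both of these should report cuda:0; if not, the forward pass is running on the CPU
    print(next(model.parameters()).device, data.x.device)
    break
```

If both tensors already report cuda:0, the slowdown is more likely on the data-loading side, since KIBA is considerably larger than Davis.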

Invalid val_loss values for DAVIS dataset

While running the run_experimet.py file for the DAVIS dataset, the training procedure produced unusual loss values. I attach a screenshot below:
[screenshot of the training log with the invalid val_loss values]
Please help me fix this issue.

How to use a trained model to predict on new data?

I have sequences of proteins and drugs but no affinity data between them. How can I predict the affinity between them with your method? How should I prepare my data, and how do I call your method? I don't know how to do this; please give me some guidance, thank you!
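
For what it is worth, here is a rough sketch of scoring a single (drug, protein) pair with a trained model; `smile_to_graph` and `seq_cat` are assumed to be the helpers used in create_data.py to build the training tensors, and the SMILES/protein strings are placeholders:

```python
import torch
from torch_geometric.data import Data

c_size, features, edge_index = smile_to_graph("CC1=CC=CC=C1C")   # placeholder SMILES
target = seq_cat("MKKFFDSRREQ")                                  # placeholder protein fragment

data = Data(x=torch.tensor(features, dtype=torch.float),
            edge_index=torch.tensor(edge_index, dtype=torch.long).t().contiguous())
data.target = torch.tensor([target], dtype=torch.long)           # [1, max_seq_len]
data.batch = torch.zeros(c_size, dtype=torch.long)               # all atoms belong to graph 0

model.eval()                                                     # a trained GraphDTA model
with torch.no_grad():
    affinity = model(data)
print(affinity.item())
```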

How to prevent overfitting?

I saw that your epochs=1000 and you didn't set early stopping. So how do you prevent overfitting over so many epochs? Thanks.

In addition, I don't think you can use your test dataset to guide your training (training.py). Normally, when you train a model, you cannot touch the test dataset; otherwise the model overfits to it and will not generalize. Maybe your training_validation.py is the right way to obtain a generalized model.

In your paper, your results are better than DeepDTA and also WideDTA, but I am a little unsure whether they would hold on a completely new test dataset.
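
For anyone hitting the same question: a minimal early-stopping sketch on a held-out validation set, along the lines of training_validation.py; `train` and `predicting` are assumed to be the functions defined in that script, and the patience value is arbitrary:

```python
import torch

best_mse, patience, bad_epochs = float("inf"), 30, 0   # patience chosen arbitrarily

for epoch in range(1000):
    train(model, device, train_loader, optimizer, epoch + 1)
    labels, preds = predicting(model, device, valid_loader)   # numpy arrays of true/predicted affinities
    val_mse = ((labels - preds) ** 2).mean()

    if val_mse < best_mse:
        best_mse, bad_epochs = val_mse, 0
        torch.save(model.state_dict(), "best_model.pt")       # keep only the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early at epoch {epoch + 1}; best validation MSE = {best_mse:.4f}")
            break
```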

How to use the pre-trained optimal model to predict my own dataset?

Hello, thank you for sharing your code and results. I would like to ask how to use your best pre-trained model to make predictions on my own dataset. I have converted my dataset from a csv file to a pt file, but when I try to use it for prediction I get an error saying the dataset cannot be fed to the model. Could you please give me some guidance on how to solve this problem? Thank you very much!

How to extract the graph data for a single graph from the DataLoader batch?

Dear Team,
I'm trying to understand how graphs of variable sizes, such as small molecules, can be passed as a batch to a deep learning model with this code. Looking at the output from the DataLoader in training_validation.py, I get the following with the default parameters set in the code.

```
cuda_name: cuda:0
Learning rate: 0.0005
Epochs: 1000

running on GCNNet_davis
Pre-processed data found: data/processed/davis_train.pt, loading ...
Pre-processed data found: data/processed/davis_test.pt, loading ...
Batch(batch=[16399], c_size=[512], edge_index=[2, 36172], target=[512, 1000], x=[16399, 78], y=[512])
Batch(batch=[16332], c_size=[512], edge_index=[2, 36122], target=[512, 1000], x=[16332, 78], y=[512])
Batch(batch=[16269], c_size=[512], edge_index=[2, 36000], target=[512, 1000], x=[16269, 78], y=[512])
Batch(batch=[16193], c_size=[512], edge_index=[2, 35794], target=[512, 1000], x=[16193, 78], y=[512])
Batch(batch=[16418], c_size=[512], edge_index=[2, 36284], target=[512, 1000], x=[16418, 78], y=[512])
```

I understand that 512 molecular graphs with their corresponding target proteins and affinity values are present in one batch of data, but I'm confused about how to extract the data corresponding to, say, the 1st or 2nd graph of a batch from the DataLoader. I'm a beginner in PyTorch Geometric, so please explain in detail even if this seems like a very naive question. Another question: does c_size set an upper limit on the maximum number of nodes in the batch? What will happen if we omit the c_size attribute here?
Anticipating your reply and thanks in advance!
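
A sketch of two ways to pull individual graphs back out of a PyTorch Geometric Batch; `train_data` is assumed to be the dataset loaded from davis_train.pt in training_validation.py:

```python
from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader in older versions

loader = DataLoader(train_data, batch_size=512, shuffle=False)
batch = next(iter(loader))

# Option 1: split the Batch back into a list of individual Data objects
graphs = batch.to_data_list()
first, second = graphs[0], graphs[1]

# Option 2: use the `batch` vector, which maps every node to the index of its graph
node_mask = batch.batch == 0
x0 = batch.x[node_mask]        # node features of the first graph only

# c_size is just an extra per-graph attribute storing the atom count; the batching itself is
# governed by `batch` and `edge_index`, so omitting c_size would not cap the number of nodes.
```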

How to change paired data into the affinity matrix Y

Thank you for your repo.

If my data set is like

SMILES | Protein sequence | affinity
CC1=C2C.....CC=C4)N | MKKFFDSRR.....LLLVDQLIDL | 7.365
CC1=C2C.....CC=C4) | NSADAQSFLN.....MYTPHTVLQ | 4.999

Do you have a function to convert data in this form into the affinity matrix Y?
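
One way to do this conversion is a pandas pivot; a sketch assuming the column names match the table above and the pairs are stored in a hypothetical my_pairs.csv:

```python
import pandas as pd

df = pd.read_csv("my_pairs.csv")  # columns: SMILES, Protein sequence, affinity (hypothetical file)

Y = df.pivot_table(index="SMILES", columns="Protein sequence", values="affinity")

# Y.values is a drugs x proteins matrix; pairs that were never measured appear as NaN,
# which matches the sparse affinity matrices used for Davis/KIBA.
print(Y.shape)
```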

Cannot run pretrained models

Dear Mr. Thin,

I have run into some issues when running the pretrained models.

  • With model_GCNet_davis.model and model_GCNet_kiba.model, it returns mse and ci scores successfully, but the values might be wrong (GCNet_davis: ci = 0.651, mse = 1.274; GCNet_kiba: ci = 0.639, mse = 0.592).

  • With model_GINConvNet_davis.model and model_GINConvNet_kiba.model, it also returns mse and ci scores successfully, but the values might also be wrong (GINConvNet_davis: ci = 0.662, mse = 1.189; GINConvNet_kiba: ci = 0.648, mse = 0.608).

  • With the other pretrained models, it throws an error, as shown in the screenshots below:
    [screenshot 1 of the error]

[screenshot 2 of the error]

Steps to reproduce:

python create_data.py
python predict_with_pretrained_model.py

FileNotFoundError: Could not find module 'D:\Anaconda3\envs\geometric\Lib\site-packages\torch_sparse\_convert.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

(geometric) D:\Anaconda3\envs\geometric\GraphDTA>python create_data.py
Traceback (most recent call last):
  File "create_data.py", line 9, in <module>
    from utils import *
  File "D:\Anaconda3\envs\geometric\GraphDTA\utils.py", line 5, in <module>
    from torch_geometric.data import InMemoryDataset, DataLoader
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\__init__.py", line 2, in <module>
    import torch_geometric.nn
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\nn\__init__.py", line 2, in <module>
    from .data_parallel import DataParallel
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\nn\data_parallel.py", line 5, in <module>
    from torch_geometric.data import Batch
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\data\__init__.py", line 1, in <module>
    from .data import Data
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_geometric\data\data.py", line 7, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch_sparse\__init__.py", line 12, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "D:\Anaconda3\envs\geometric\lib\site-packages\torch\_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "D:\Anaconda3\envs\geometric\lib\ctypes\__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Anaconda3\envs\geometric\Lib\site-packages\torch_sparse\_convert.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

Please help me!

[screenshot of the error]

Hello, when I run the "python training.py 0 0 0" command on the server, I get the error shown above. What could be the reason?
I am reproducing your model and I am in a hurry; I would appreciate your help, thank you!
