
yvquanli / glam

Code for "An adaptive graph learning method for automated molecular interactions and properties predictions".

Home Page: https://www.nature.com/articles/s42256-022-00501-8

License: MIT License

Languages: Python 89.45%, Jupyter Notebook 10.55%
Topics: deep-learning, graph-neural-networks, pytorch

glam's People

Contributors: yvquanli

glam's Issues

How to get the file "ddi_total.csv"?

Hi, I encountered a problem when testing with my own data. How can I obtain the file "ddi_total.csv" so that I can place it at GLAM-DDI/raw/drugbank_caster/ddi_total.csv?

How to use the LIT-PCBA and DrugBank datasets?

  1. What do the mol files in the LIT-PCBA dataset represent? Under a single target there are often mol files for several proteins; with that many proteins, how do I determine which one the active and inactive molecules actually interact with? And how is the sequence in your data-processing script determined? I could not find this information in the dataset.
  2. In the downloaded DrugBank dataset, what do df_pair and dd_pair stand for, respectively?

CUDA out of memory

While reproducing your work, I keep running into CUDA out-of-memory errors.

like this:

Traceback (most recent call last):
File "run.py", line 62, in
trainer.train_and_test()
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 101, in train_and_test
self.train()
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 83, in train
trn_loss = self.train_iterations()
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 295, in train_iterations
output = self.model(mol_batch).view(-1)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/model.py", line 54, in forward
xm, hm = self.mol_conv(xm, data_mol.edge_index, data_mol.edge_attr, h=hm, batch=data_mol.batch)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 259, in forward
x = self.conv(x, edge_index, edge_attr)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 131, in forward
return self.conv(x, edge_index, edge_attr)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 40, in forward
return self.propagate(edge_index, x=x, edge_attr=edge_attr, size=size)
File "/tmp/layer_TripletMessage_propagate_6_8b4kbw.py", line 194, in propagate
out = self.message(
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 49, in message
alpha = (triplet * self.weight_triplet_att).sum(dim=-1) # time consuming 12.14s
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 866.00 MiB (GPU 4; 23.64 GiB total capacity; 10.45 GiB already allocated; 31.56 MiB free; 10.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How should I deal with this?
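The error message's own hint (max_split_size_mb) and a smaller batch are the two usual mitigations. A minimal sketch, assuming PyTorch's caching-allocator option and the --batch_size flag that run.py accepts in the commands quoted elsewhere on this page; the value 128 is an arbitrary starting point, not a recommendation from the authors:

import os

# Apply the allocator hint from the error message *before* torch
# initializes CUDA, otherwise the setting has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # deliberately imported after the env var is set

# The other common fix is simply a smaller batch, e.g.:
#   python3 run.py --batch_size 16 ...
# since the OOM occurs in TripletMessage.message, whose memory use grows
# with the number of edges (and hence molecules) in each batch.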

Cannot reproduce the results in your paper

Hi, I'm trying to reproduce your results on the ESOL dataset, but with the default params the test RMSE is much higher than the result reported in your paper.

Could you provide your best parameter settings? Below are the output logs I received.

Model saved at epoch 132
Testing...
{'dataset_root': 'chemrl_downstream_datasets/esol/', 'dataset': 'esol', 'split': 'scaffold', 'seed': 1234, 'split_seed': 1234, 'gpu': 0, 'note': 'None2', 'hid_dim_alpha': 4, 'mol_block': '_NNConv', 'e_dim': 1024, 'out_dim': 1, 'message_steps': 3, 'mol_readout': 'GlobalPool5', 'pre_norm': '_None', 'graph_norm': '_PairNorm', 'flat_norm': '_None', 'end_norm': '_None', 'pre_do': '_None()', 'graph_do': '_None()', 'flat_do': 'Dropout(0.2)', 'end_do': 'Dropout(0.2)', 'pre_act': 'RReLU', 'graph_act': 'RReLU', 'flat_act': 'RReLU', 'graph_res': 1, 'batch_size': 32, 'epochs': 999, 'loss': 'mse', 'optim': 'Adam', 'k': 6, 'lr': 0.001, 'lr_reduce_rate': 0.7, 'lr_reduce_patience': 20, 'early_stop_patience': 50, 'verbose_patience': 500}
{'testloss': 0.9559108018875122, 'valloss': 0.6927680969238281}|{'ci': 0.8770122343850612, 'mse': 0.97501713, 'rmse': 0.9874295571709956, 'r2': 0.779877562758926}|{'valci': 0.8853528843055108, 'valmse': 0.6884, 'valrmse': 0.8296987227490854, 'valr2': 0.8644862290531075}

Thanks for your time and contribution!
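For anyone retrying this run: one way to repeat it with exactly the settings logged above is to turn the printed config dict back into run.py flags, matching the "python3 run.py --dataset ..." command style quoted in the other issues on this page. A rough sketch; treating every dict key as a run.py flag name is an assumption:

# Hypothetical helper: rebuild a run.py command line from a logged config.
config = {
    'dataset_root': 'chemrl_downstream_datasets/esol/', 'dataset': 'esol',
    'split': 'scaffold', 'seed': 1234, 'split_seed': 1234,
    'batch_size': 32, 'lr': 0.001, 'loss': 'mse', 'optim': 'Adam',
    # ... remaining keys copied from the log above ...
}
cmd = "python3 run.py " + " ".join(f"--{k} {v}" for k, v in config.items())
print(cmd)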

I encountered two problems during the "run a demo" phase.

Hello, I'm very interested in your project, but I encountered two problems at the "run a demo" stage.

Test for dataset.py and run.py!
Loading dataset...
Traceback (most recent call last):
File "./run.py", line 55, in
args, dataset, Trainer = auto_dataset(args)
File "/root/prog/wzp/GLAM/GLAM/src_1gp/dataset.py", line 38, in auto_dataset
_dataset = Dataset(args.dataset_root,
File "/root/prog/wzp/GLAM/GLAM/src_1gp/dataset.py", line 107, in init
trn, val, test = self.split(type=split)
File "/root/prog/wzp/GLAM/GLAM/src_1gp/dataset.py", line 177, in split
trn, val, test = random_scaffold_split(dataset=shuffled, smiles_list=shuffled.data.smi,
File "/root/prog/wzp/GLAM/GLAM/src_1gp/utils.py", line 160, in random_scaffold_split
scaffold_sets = rng.perperturb(list(scaffolds.values()))
AttributeError: 'numpy.random.mtrand.RandomState' object has no attribute 'perperturb'
Demo running...
GLAMHelper pr of demo start...
Solver for demo running start @ Fri Jul 1 10:43:27 2022
1 gpus available
Configuration 0 start:
config_id is 532ed
config is {'dataset': 'demo', 'dataset_root': './demo', 'seed': 1234, 'split_seed': 1234, 'hid_dim_alpha': 2, 'e_dim': 2048, 'mol_block': '_TripletMessageLight', 'message_steps': 2, 'mol_readout': 'GlobalLAPool', 'pre_do': '_None()', 'graph_do': '_None()', 'flat_do': 'Dropout(0.1)', 'end_do': 'Dropout(0.5)', 'pre_norm': '_LayerNorm', 'graph_norm': '_None', 'flat_norm': '_LayerNorm', 'end_norm': '_None', 'pre_act': 'RReLU', 'graph_act': 'CELU', 'flat_act': 'ReLU', 'graph_res': 1, 'loss': 'bcel', 'batch_size': 768, 'optim': 'Ranger', 'k': 3, 'epochs': 30, 'lr': 0.001, 'early_stop_patience': 50}
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce GTX 1050 Ti
memory.free:4020
memory.total:4038
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce GTX 1050 Ti
memory.free:4020
memory.total:4038
python3 run.py --dataset demo --dataset_root ./demo --seed 12 --split_seed 1234 --hid_dim_alpha 2 --e_dim 2048 --mol_block _TripletMessageLight --message_steps 2 --mol_readout GlobalLAPool --pre_do _None() --graph_do _None() --flat_do Dropout(0.1) --end_do Dropout(0.5) --pre_norm _LayerNorm --graph_norm _None --flat_norm _LayerNorm --end_norm _None --pre_act RReLU --graph_act CELU --flat_act ReLU --graph_res 1 --loss bcel --batch_size 768 --optim Ranger --k 3 --epochs 30 --lr 0.001 --early_stop_patience 50 --note 532ed --gpu 0
Traceback (most recent call last):
File "run.py", line 7, in
import os; os.chdir(os.path.dirname(file))
FileNotFoundError: [Errno 2] No such file or directory: ''
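Two separate things seem to go wrong here. For the first traceback: numpy's RandomState has no perperturb method, while rng.permutation does exist and is the usual call in scaffold-split code, so a hedged one-line patch for utils.py would be:

# utils.py, line 160 -- numpy.random.RandomState has no 'perperturb';
# rng.permutation is the real numpy API and is presumably what was meant:
scaffold_sets = rng.permutation(list(scaffolds.values()))

The second traceback (FileNotFoundError from os.chdir) is the same __file__ issue reported under "Problem to run demo.py" below; see the sketch there.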

Reproduce results from paper.

Hi Yuquan Li,
Thanks for your previous reply.
I had another question: what are the configurations to reproduce the results in your paper? I.e., when I run python3 glam.py --n_init_configs ..., what do I set for each dataset?

Problem to run demo.py

After following all the commands in the section "Installation", I tried to run the demo with:

cd ./GLAM/src_1gp
python3 demo.py

but I got this error:
(GLAM) vito@vito-HP-ENVY-15-Notebook-PC:~/project/GLAM/src_1gp$ python3 demo.py
Traceback (most recent call last):
File "demo.py", line 2, in
os.chdir(os.path.dirname(file))
FileNotFoundError: [Errno 2] No such file or directory: ''

I don't know what's wrong or how I can resolve this problem.
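This looks like a plain Python path issue rather than anything GLAM-specific: when demo.py is launched as "python3 demo.py" from inside src_1gp, os.path.dirname(__file__) returns the empty string, and os.chdir('') raises exactly this FileNotFoundError. A minimal sketch of a workaround, resolving the script path before taking its dirname:

# demo.py, line 2 -- resolve the script path first so dirname is never '':
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

Alternatively, launching the script with an explicit path (e.g. python3 GLAM/src_1gp/demo.py from the repo root) sidesteps the empty dirname.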

Stuck in Training start ...

I downloaded the bindingdb and lit_pcba datasets and put them in the correct directories. However, the run gets stuck at "Training start ...".
demo.py works properly, so I'm not sure what's wrong. Please help!

Inference process of bindingdb_c start...
Solver for bindingdb_c running start @ Fri Jul 15 16:40:52 2022
1 gpus available
Configuration 0 start:
config_id is 311c2
config is {'dataset': 'bindingdb_c', 'hid_dim_alpha': 2, 'e_dim': 2048, 'mol_block': '_TripletMessage', 'pro_block': '_GATConv', 'message_steps': 2, 'mol_readout': 'GlobalPool5', 'pro_readout': 'GlobalPool5', 'pre_do': 'Dropout(0.1)', 'graph_do': 'Dropout(0.1)', 'flat_do': 'Dropout(0.2)', 'end_do': 'Dropout(0.1)', 'pre_norm': '_None', 'graph_norm': '_LayerNorm', 'flat_norm': '_None', 'end_norm': '_LayerNorm', 'pre_act': 'LeakyReLU', 'graph_act': 'RReLU', 'flat_act': 'ReLU', 'end_act': 'CELU', 'graph_res': 1, 'loss': 'ce', 'batch_size': 64, 'optim': 'Ranger', 'k': 6, 'epochs': 20, 'lr': 0.0001, 'early_stop_patience': 50}
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce RTX 3070 Ti
memory.free:7243
memory.total:8192
0
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce RTX 3070 Ti
memory.free:7243
memory.total:8192
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce RTX 3070 Ti
memory.free:7243
memory.total:8192
python3 run.py --dataset bindingdb_c --hid_dim_alpha 2 --e_dim 2048 --mol_block _TripletMessage --pro_block _GATConv --message_steps 2 --mol_readout GlobalPool5 --pro_readout GlobalPool5 --pre_do Dropout(0.1) --graph_do Dropout(0.1) --flat_do Dropout(0.2) --end_do Dropout(0.1) --pre_norm _None --graph_norm _LayerNorm --flat_norm _None --end_norm _LayerNorm --pre_act LeakyReLU --graph_act RReLU --flat_act ReLU --end_act CELU --graph_res 1 --loss ce --batch_size 64 --optim Ranger --k 6 --epochs 20 --lr 0.0001 --early_stop_patience 50 --note 311c2 --gpu 0 --seed 1
Loading dataset...
Training init...
################################################################################
dataset_root:../../Dataset/GLAM-DTI
dataset:bindingdb_c
seed:1
gpu:0
note:311c2
hid_dim_alpha:2
mol_block:_TripletMessage
pro_block:_GATConv
e_dim:2048
out_dim:2
message_steps:2
mol_readout:GlobalPool5
pro_readout:GlobalPool5
pre_norm:_None
graph_norm:_LayerNorm
flat_norm:_None
end_norm:_LayerNorm
pre_do:Dropout(0.1)
graph_do:Dropout(0.1)
flat_do:Dropout(0.2)
end_do:Dropout(0.1)
pre_act:LeakyReLU
graph_act:RReLU
flat_act:ReLU
end_act:CELU
graph_res:1
batch_size:64
epochs:20
loss:ce
optim:Ranger
k:6
lr:0.0001
lr_reduce_rate:0.7
lr_reduce_patience:20
early_stop_patience:50
verbose_patience:2000
################################################################################
save id: 2022-07-15_20:40:59.028_seed_1
run device: cuda:0
train set num:48006 valid set num:5475 test set num: 5371
total parameters:165232
################################################################################
Architecture(
(mol_lin0): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=15, out_features=30, bias=True)
(act): LeakyReLU(negative_slope=0.01)
)
(pro_lin0): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=49, out_features=30, bias=True)
(act): LeakyReLU(negative_slope=0.01)
)
(mol_conv): MessageBlock(
(norm): _LayerNorm(
(norm): LayerNorm(30)
)
(dropout): Dropout(p=0.1, inplace=False)
(conv): _TripletMessage(
(conv): TripletMessage()
)
(gru): GRU(30, 30)
(act): RReLU(lower=0.125, upper=0.3333333333333333)
)
(pro_conv): MessageBlock(
(norm): _LayerNorm(
(norm): LayerNorm(30)
)
(dropout): Dropout(p=0.1, inplace=False)
(conv): _GATConv(
(conv): GATConv(30, 30, heads=1)
)
(gru): None
(act): RReLU(lower=0.125, upper=0.3333333333333333)
)
(mol_readout): GlobalPool5()
(pro_readout): GlobalPool5()
(mol_flat): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.2, inplace=False)
(linear): Linear(in_features=150, out_features=30, bias=True)
(act): ReLU()
)
(pro_flat): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.2, inplace=False)
(linear): Linear(in_features=150, out_features=30, bias=True)
(act): ReLU()
)
(lin_out0): LinearBlock(
(norm): _LayerNorm(
(norm): LayerNorm(64)
)
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=64, out_features=2048, bias=True)
(act): CELU(alpha=1.0)
)
(lin_out1): LinearBlock(
(norm): _LayerNorm(
(norm): LayerNorm(2048)
)
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=2048, out_features=2, bias=True)
(act): _None()
)
)
################################################################################
Training start...
0%| | 0/20 [00:00<?, ?it/sbatch 0 training loss: 0.79090 time elapsed 0.00 hrs (0.0 mins)
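For what it's worth, the last line shows that batch 0 already produced a loss, so the process has most likely started computing rather than hung: with 48,006 training samples at batch size 64 that is roughly 750 iterations per epoch, and the verbose_patience of 2000 in the settings above suggests long quiet stretches between prints (an inference from the printed config, not from reading the code). A hedged liveness check from a second Python shell on the same machine:

import time
import torch

# torch.cuda.mem_get_info wraps cudaMemGetInfo, so it reports device-wide
# free/total memory and therefore sees the training process's allocations.
for _ in range(5):
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU 0 free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
    time.sleep(10)

If free memory stays low and fluctuates, training is running and it is just a matter of waiting for the next log line.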
