
yvquanli / glam

Code for "An adaptive graph learning method for automated molecular interactions and properties predictions".

Home Page: https://www.nature.com/articles/s42256-022-00501-8

License: MIT License

Languages: Python 89.45%, Jupyter Notebook 10.55%
Topics: deep-learning, graph-neural-networks, pytorch

glam's People

Contributors: yvquanli

glam's Issues

How to get the file "ddi_total.csv"?

Hi, I encountered a problem when testing with my own data. How can I obtain the file "ddi_total.csv" so that I can place it at GLAM-DDI/raw/drugbank_caster/ddi_total.csv?

How to use the LIT-PCBA and DrugBank datasets?

  1. What do the mol files in the LIT-PCBA dataset represent? Under a single target there are often mol files for several proteins; with that many proteins, how do I determine which one the active and inactive molecules actually interact with? And how is the sequence in your data-processing script determined? I could not find this information in the dataset.
  2. In the downloaded DrugBank dataset, what do df_pair and dd_pair stand for, respectively?

CUDA out of memory

While reproducing your work, I keep running into CUDA out-of-memory errors.

like this:

Traceback (most recent call last):
File "run.py", line 62, in
trainer.train_and_test()
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 101, in train_and_test
self.train()
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 83, in train
trn_loss = self.train_iterations()
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/trainer.py", line 295, in train_iterations
output = self.model(mol_batch).view(-1)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/model.py", line 54, in forward
xm, hm = self.mol_conv(xm, data_mol.edge_index, data_mol.edge_attr, h=hm, batch=data_mol.batch)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 259, in forward
x = self.conv(x, edge_index, edge_attr)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 131, in forward
return self.conv(x, edge_index, edge_attr)
File "/export/disk3/why/software/Miniforge3/envs/PyG252/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 40, in forward
return self.propagate(edge_index, x=x, edge_attr=edge_attr, size=size)
File "/tmp/layer_TripletMessage_propagate_6_8b4kbw.py", line 194, in propagate
out = self.message(
File "/export/disk7/why/workbench/MERGE/v0/GLAM_repeat/GLAM/src_1gp_EmbeddingCompare/layer.py", line 49, in message
alpha = (triplet * self.weight_triplet_att).sum(dim=-1) # time consuming 12.14s
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 866.00 MiB (GPU 4; 23.64 GiB total capacity; 10.45 GiB already allocated; 31.56 MiB free; 10.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How should I deal with this?
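The error message's own hint (max_split_size_mb) and a smaller batch are the two usual mitigations. A minimal sketch, assuming PyTorch's caching-allocator option and the --batch_size flag that run.py accepts in the commands quoted elsewhere on this page; the value 128 is an arbitrary starting point, not a recommendation from the authors:

import os

# Apply the allocator hint from the error message *before* torch
# initializes CUDA, otherwise the setting has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # deliberately imported after the env var is set

# The other common fix is simply a smaller batch, e.g.:
#   python3 run.py --batch_size 16 ...
# since the OOM occurs in TripletMessage.message, whose memory use grows
# with the number of edges (and hence molecules) in each batch.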

Cannot reproduce the results in your paper

Hi, I'm trying to reproduce your results on the ESOL dataset, but with the default params the test RMSE is much higher than the result reported in your paper.

Could you provide your best parameter settings? Below are the output logs I received.

Model saved at epoch 132
Testing...
{'dataset_root': 'chemrl_downstream_datasets/esol/', 'dataset': 'esol', 'split': 'scaffold', 'seed': 1234, 'split_seed': 1234, 'gpu': 0, 'note': 'None2', 'hid_dim_alpha': 4, 'mol_block': '_NNConv', 'e_dim': 1024, 'out_dim': 1, 'message_steps': 3, 'mol_readout': 'GlobalPool5', 'pre_norm': '_None', 'graph_norm': '_PairNorm', 'flat_norm': '_None', 'end_norm': '_None', 'pre_do': '_None()', 'graph_do': '_None()', 'flat_do': 'Dropout(0.2)', 'end_do': 'Dropout(0.2)', 'pre_act': 'RReLU', 'graph_act': 'RReLU', 'flat_act': 'RReLU', 'graph_res': 1, 'batch_size': 32, 'epochs': 999, 'loss': 'mse', 'optim': 'Adam', 'k': 6, 'lr': 0.001, 'lr_reduce_rate': 0.7, 'lr_reduce_patience': 20, 'early_stop_patience': 50, 'verbose_patience': 500}
{'testloss': 0.9559108018875122, 'valloss': 0.6927680969238281}|{'ci': 0.8770122343850612, 'mse': 0.97501713, 'rmse': 0.9874295571709956, 'r2': 0.779877562758926}|{'valci': 0.8853528843055108, 'valmse': 0.6884, 'valrmse': 0.8296987227490854, 'valr2': 0.8644862290531075}

Thanks for your time and contribution!
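For anyone retrying this run: one way to repeat it with exactly the settings logged above is to turn the printed config dict back into run.py flags, matching the "python3 run.py --dataset ..." command style quoted in the other issues on this page. A rough sketch; treating every dict key as a run.py flag name is an assumption:

# Hypothetical helper: rebuild a run.py command line from a logged config.
config = {
    'dataset_root': 'chemrl_downstream_datasets/esol/', 'dataset': 'esol',
    'split': 'scaffold', 'seed': 1234, 'split_seed': 1234,
    'batch_size': 32, 'lr': 0.001, 'loss': 'mse', 'optim': 'Adam',
    # ... remaining keys copied from the log above ...
}
cmd = "python3 run.py " + " ".join(f"--{k} {v}" for k, v in config.items())
print(cmd)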

I encountered two problems during the "run a demo" phase.

Hello, I'm very interested in your project, but I encountered two problems at the "run a demo" stage.

Test for dataset.py and run.py!
Loading dataset...
Traceback (most recent call last):
File "./run.py", line 55, in
args, dataset, Trainer = auto_dataset(args)
File "/root/prog/wzp/GLAM/GLAM/src_1gp/dataset.py", line 38, in auto_dataset
_dataset = Dataset(args.dataset_root,
File "/root/prog/wzp/GLAM/GLAM/src_1gp/dataset.py", line 107, in init
trn, val, test = self.split(type=split)
File "/root/prog/wzp/GLAM/GLAM/src_1gp/dataset.py", line 177, in split
trn, val, test = random_scaffold_split(dataset=shuffled, smiles_list=shuffled.data.smi,
File "/root/prog/wzp/GLAM/GLAM/src_1gp/utils.py", line 160, in random_scaffold_split
scaffold_sets = rng.perperturb(list(scaffolds.values()))
AttributeError: 'numpy.random.mtrand.RandomState' object has no attribute 'perperturb'
Demo running...
GLAMHelper pr of demo start...
Solver for demo running start @ Fri Jul 1 10:43:27 2022
1 gpus available
Configuration 0 start:
config_id is 532ed
config is {'dataset': 'demo', 'dataset_root': './demo', 'seed': 1234, 'split_seed': 1234, 'hid_dim_alpha': 2, 'e_dim': 2048, 'mol_block': '_TripletMessageLight', 'message_steps': 2, 'mol_readout': 'GlobalLAPool', 'pre_do': '_None()', 'graph_do': '_None()', 'flat_do': 'Dropout(0.1)', 'end_do': 'Dropout(0.5)', 'pre_norm': '_LayerNorm', 'graph_norm': '_None', 'flat_norm': '_LayerNorm', 'end_norm': '_None', 'pre_act': 'RReLU', 'graph_act': 'CELU', 'flat_act': 'ReLU', 'graph_res': 1, 'loss': 'bcel', 'batch_size': 768, 'optim': 'Ranger', 'k': 3, 'epochs': 30, 'lr': 0.001, 'early_stop_patience': 50}
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce GTX 1050 Ti
memory.free:4020
memory.total:4038
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce GTX 1050 Ti
memory.free:4020
memory.total:4038
python3 run.py --dataset demo --dataset_root ./demo --seed 12 --split_seed 1234 --hid_dim_alpha 2 --e_dim 2048 --mol_block _TripletMessageLight --message_steps 2 --mol_readout GlobalLAPool --pre_do _None() --graph_do _None() --flat_do Dropout(0.1) --end_do Dropout(0.5) --pre_norm _LayerNorm --graph_norm _None --flat_norm _LayerNorm --end_norm _None --pre_act RReLU --graph_act CELU --flat_act ReLU --graph_res 1 --loss bcel --batch_size 768 --optim Ranger --k 3 --epochs 30 --lr 0.001 --early_stop_patience 50 --note 532ed --gpu 0
Traceback (most recent call last):
File "run.py", line 7, in
import os; os.chdir(os.path.dirname(file))
FileNotFoundError: [Errno 2] No such file or directory: ''
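Two separate things seem to go wrong here. For the first traceback: numpy's RandomState has no perperturb method, while rng.permutation does exist and is the usual call in scaffold-split code, so a hedged one-line patch for utils.py would be:

# utils.py, line 160 -- numpy.random.RandomState has no 'perperturb';
# rng.permutation is the real numpy API and is presumably what was meant:
scaffold_sets = rng.permutation(list(scaffolds.values()))

The second traceback (FileNotFoundError from os.chdir) is the same __file__ issue reported under "Problem to run demo.py" below; see the sketch there.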

Reproduce results from paper.

Hi Yuquan Li,
Thanks for your previous reply.
I had another question: what are the configurations to reproduce the results in your paper? I.e., when I run python3 glam.py --n_init_configs ..., what do I set for each dataset?

Problem to run demo.py

After following all the commands in the section "Installation", I tried to run the demo with:

cd ./GLAM/src_1gp
python3 demo.py

but I got this error:
(GLAM) vito@vito-HP-ENVY-15-Notebook-PC:~/project/GLAM/src_1gp$ python3 demo.py
Traceback (most recent call last):
File "demo.py", line 2, in
os.chdir(os.path.dirname(file))
FileNotFoundError: [Errno 2] No such file or directory: ''

I don't know what's wrong or how I can resolve this problem.
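This looks like a plain Python path issue rather than anything GLAM-specific: when demo.py is launched as "python3 demo.py" from inside src_1gp, os.path.dirname(__file__) returns the empty string, and os.chdir('') raises exactly this FileNotFoundError. A minimal sketch of a workaround, resolving the script path before taking its dirname:

# demo.py, line 2 -- resolve the script path first so dirname is never '':
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

Alternatively, launching the script with an explicit path (e.g. python3 GLAM/src_1gp/demo.py from the repo root) sidesteps the empty dirname.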

Stuck in Training start ...

I downloaded the bindingdb and lit_pcba datasets and put them in the correct directories. However, the run gets stuck at "Training start ...".
demo.py works properly, so I'm not sure what's wrong. Please help!

Inference process of bindingdb_c start...
Solver for bindingdb_c running start @ Fri Jul 15 16:40:52 2022
1 gpus available
Configuration 0 start:
config_id is 311c2
config is {'dataset': 'bindingdb_c', 'hid_dim_alpha': 2, 'e_dim': 2048, 'mol_block': '_TripletMessage', 'pro_block': '_GATConv', 'message_steps': 2, 'mol_readout': 'GlobalPool5', 'pro_readout': 'GlobalPool5', 'pre_do': 'Dropout(0.1)', 'graph_do': 'Dropout(0.1)', 'flat_do': 'Dropout(0.2)', 'end_do': 'Dropout(0.1)', 'pre_norm': '_None', 'graph_norm': '_LayerNorm', 'flat_norm': '_None', 'end_norm': '_LayerNorm', 'pre_act': 'LeakyReLU', 'graph_act': 'RReLU', 'flat_act': 'ReLU', 'end_act': 'CELU', 'graph_res': 1, 'loss': 'ce', 'batch_size': 64, 'optim': 'Ranger', 'k': 6, 'epochs': 20, 'lr': 0.0001, 'early_stop_patience': 50}
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce RTX 3070 Ti
memory.free:7243
memory.total:8192
0
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce RTX 3070 Ti
memory.free:7243
memory.total:8192
Choosing the GPU device has largest free memory...
Sorted by free memory size
Using GPU 0:
index:0
gpu_name:NVIDIA GeForce RTX 3070 Ti
memory.free:7243
memory.total:8192
python3 run.py --dataset bindingdb_c --hid_dim_alpha 2 --e_dim 2048 --mol_block _TripletMessage --pro_block _GATConv --message_steps 2 --mol_readout GlobalPool5 --pro_readout GlobalPool5 --pre_do Dropout(0.1) --graph_do Dropout(0.1) --flat_do Dropout(0.2) --end_do Dropout(0.1) --pre_norm _None --graph_norm _LayerNorm --flat_norm _None --end_norm _LayerNorm --pre_act LeakyReLU --graph_act RReLU --flat_act ReLU --end_act CELU --graph_res 1 --loss ce --batch_size 64 --optim Ranger --k 6 --epochs 20 --lr 0.0001 --early_stop_patience 50 --note 311c2 --gpu 0 --seed 1
Loading dataset...
Training init...
################################################################################
dataset_root:../../Dataset/GLAM-DTI
dataset:bindingdb_c
seed:1
gpu:0
note:311c2
hid_dim_alpha:2
mol_block:_TripletMessage
pro_block:_GATConv
e_dim:2048
out_dim:2
message_steps:2
mol_readout:GlobalPool5
pro_readout:GlobalPool5
pre_norm:_None
graph_norm:_LayerNorm
flat_norm:_None
end_norm:_LayerNorm
pre_do:Dropout(0.1)
graph_do:Dropout(0.1)
flat_do:Dropout(0.2)
end_do:Dropout(0.1)
pre_act:LeakyReLU
graph_act:RReLU
flat_act:ReLU
end_act:CELU
graph_res:1
batch_size:64
epochs:20
loss:ce
optim:Ranger
k:6
lr:0.0001
lr_reduce_rate:0.7
lr_reduce_patience:20
early_stop_patience:50
verbose_patience:2000
################################################################################
save id: 2022-07-15_20:40:59.028_seed_1
run device: cuda:0
train set num:48006 valid set num:5475 test set num: 5371
total parameters:165232
################################################################################
Architecture(
(mol_lin0): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=15, out_features=30, bias=True)
(act): LeakyReLU(negative_slope=0.01)
)
(pro_lin0): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=49, out_features=30, bias=True)
(act): LeakyReLU(negative_slope=0.01)
)
(mol_conv): MessageBlock(
(norm): _LayerNorm(
(norm): LayerNorm(30)
)
(dropout): Dropout(p=0.1, inplace=False)
(conv): _TripletMessage(
(conv): TripletMessage()
)
(gru): GRU(30, 30)
(act): RReLU(lower=0.125, upper=0.3333333333333333)
)
(pro_conv): MessageBlock(
(norm): _LayerNorm(
(norm): LayerNorm(30)
)
(dropout): Dropout(p=0.1, inplace=False)
(conv): _GATConv(
(conv): GATConv(30, 30, heads=1)
)
(gru): None
(act): RReLU(lower=0.125, upper=0.3333333333333333)
)
(mol_readout): GlobalPool5()
(pro_readout): GlobalPool5()
(mol_flat): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.2, inplace=False)
(linear): Linear(in_features=150, out_features=30, bias=True)
(act): ReLU()
)
(pro_flat): LinearBlock(
(norm): _None()
(dropout): Dropout(p=0.2, inplace=False)
(linear): Linear(in_features=150, out_features=30, bias=True)
(act): ReLU()
)
(lin_out0): LinearBlock(
(norm): _LayerNorm(
(norm): LayerNorm(64)
)
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=64, out_features=2048, bias=True)
(act): CELU(alpha=1.0)
)
(lin_out1): LinearBlock(
(norm): _LayerNorm(
(norm): LayerNorm(2048)
)
(dropout): Dropout(p=0.1, inplace=False)
(linear): Linear(in_features=2048, out_features=2, bias=True)
(act): _None()
)
)
################################################################################
Training start...
0%| | 0/20 [00:00<?, ?it/sbatch 0 training loss: 0.79090 time elapsed 0.00 hrs (0.0 mins)
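For what it's worth, the last line shows that batch 0 already produced a loss, so the process has most likely started computing rather than hung: with 48,006 training samples at batch size 64 that is roughly 750 iterations per epoch, and the verbose_patience of 2000 in the settings above suggests long quiet stretches between prints (an inference from the printed config, not from reading the code). A hedged liveness check from a second Python shell on the same machine:

import time
import torch

# torch.cuda.mem_get_info wraps cudaMemGetInfo, so it reports device-wide
# free/total memory and therefore sees the training process's allocations.
for _ in range(5):
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU 0 free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
    time.sleep(10)

If free memory stays low and fluctuates, training is running and it is just a matter of waiting for the next log line.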
