luost26 / 3d-generative-sbdd
💊 A 3D Generative Model for Structure-Based Drug Design (NeurIPS 2021)
License: MIT License
Hi @luost26,
Thank you for this interesting work. I have a question about the bond inference process.
After sampling, we have the generated atom types and their positions, and the goal is to infer the bonds among these atoms. In this line, it seems that we can decide whether an added bond should be aromatic. For this line, I am confused about what the indicator is for the test data used in generation. As I understand it, all we have after sampling are the generated atom types and their positions. How can we determine whether the generated atoms/bonds should be aromatic?
Thank you in advance. Any discussions from others are also warmly welcome.
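For context on the question above: one generic way to recover connectivity from atom types and 3D positions alone is a covalent-radius distance cutoff. The sketch below is not the repository's reconstruction code; the radii table, the 0.4 Å tolerance, and the `infer_bonds` name are assumptions for illustration. It recovers only which atoms are bonded; bond orders and aromaticity still require a separate perception step.

```python
import numpy as np

# Approximate single-bond covalent radii in angstroms (illustrative values, not from the repo).
COVALENT_RADII = {"C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57, "P": 1.07, "S": 1.05, "Cl": 1.02}


def infer_bonds(elements, positions, tolerance=0.4):
    """Return index pairs (i, j) whose distance is at most r_i + r_j + tolerance.

    Only connectivity is inferred; bond order and aromaticity are not.
    """
    pos = np.asarray(positions, dtype=float)
    bonds = []
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            cutoff = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]] + tolerance
            if np.linalg.norm(pos[i] - pos[j]) <= cutoff:
                bonds.append((i, j))
    return bonds


# Two carbons 1.4 Å apart are bonded; a third carbon 3.6 Å further along is not.
print(infer_bonds(["C", "C", "C"], [[0, 0, 0], [1.4, 0, 0], [5.0, 0, 0]]))  # [(0, 1)]
```

The O(N²) double loop is fine for ligand-sized molecules; a KD-tree would only matter for much larger systems.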
Hi,
I recently used your Vina score script "docking.py", but I ran into the following error:
Traceback (most recent call last):
File "/sample_geo.py", line 519, in <module>
g_vina_score = vina_task.run_sync()
File "/evaluation/docking.py", line 168, in run_sync
while self.get_results() is None:
File "/evaluation/docking.py", line 182, in get_results
self.results = parse_qvina_outputs(self.docked_sdf_path)
File "/evaluation/docking.py", line 24, in parse_qvina_outputs
suppl = Chem.SDMolSupplier(docked_sdf_path)
OSError: File error: Invalid input file /tmp/olzbpempbkdjotcpxfbqvwjfnxltvc_ligand_out.sdf
I found that the error was caused by the missing script “prepare_receptor4.py”.
Could you please help me solve it or provide this script?
Thank you for your attention.
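The traceback above fails in `parse_qvina_outputs`, most likely because the docked SDF was never written, which the reporter traced to a missing `prepare_receptor4.py`. A minimal preflight sketch, assuming `docking.py` invokes its external tools by name: check that they resolve on `PATH` before docking starts. `prepare_receptor4.py` is the script named in this thread (it is distributed with MGLTools/AutoDockTools); `qvina2` is an assumed QuickVina binary name, and `missing_executables` is a hypothetical helper.

```python
import shutil


def missing_executables(names):
    """Return the subset of `names` that cannot be resolved on PATH (or as a direct path)."""
    return [name for name in names if shutil.which(name) is None]


# "qvina2" is an assumed binary name; adjust to whatever docking.py actually calls.
missing = missing_executables(["prepare_receptor4.py", "qvina2"])
print("Missing tools:", missing or "none")
```

Failing early with a list of missing tools is easier to debug than the downstream "Invalid input file" error from the SDF parser.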
I read your paper, which looks great.
Following the instructions, I ran sample.py using sample.yml and the two model checkpoints downloaded from Google Drive. After waiting a long time (about half an hour), it still had not produced a valid molecule; it shows [Pool] Queue 300 | Finished 0 | Failed 55.
While debugging, I found that most generated graphs do not pass the if statement "if data_next.status == STATUS_FINISHED:".
Even the few that do pass it fail at "rdmol = reconstruct_from_generated(data_next)", which throws an exception shown as "Ignoring, because reconstruction error encountered."
--------------------------------------------- Error Information --------------------------------------
(SBDD-3D) root@10-90-43-152:/home/jenkins/zhuyuxiao/SBDD_Project/3D-Generative-SBDD# python sample.py ./configs/sample.yml --data_id {i}
Traceback (most recent call last):
File "sample.py", line 7, in <module>
from torch_geometric.data import Batch
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/site-packages/torch_geometric/__init__.py", line 4, in <module>
import torch_geometric.data
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
from .data import Data
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/site-packages/torch_geometric/data/data.py", line 3, in <module>
from torch_geometric.typing import OptTensor, NodeType, EdgeType
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/site-packages/torch_geometric/typing.py", line 4, in <module>
from torch_sparse import SparseTensor
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/site-packages/torch_sparse/__init__.py", line 19, in <module>
torch.ops.load_library(spec.origin)
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/site-packages/torch/_ops.py", line 110, in load_library
ctypes.CDLL(path)
File "/home/miniconda3/envs/SBDD-3D/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libc10_cuda.so: cannot open shared object file: No such file or directory
--------------------------------------------- My Conda Env --------------------------------------
packages in environment at /home/miniconda3/envs/SBDD-3D:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.0.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.7.4.post0 py38h497a2fe_0 conda-forge
async-timeout 3.0.1 py_1000 conda-forge
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
biopython 1.79 py38h497a2fe_0 conda-forge
blas 1.0 mkl conda-forge
blinker 1.4 py_1 conda-forge
boost 1.74.0 py38hc10631b_3 conda-forge
boost-cpp 1.74.0 h9359b55_0 conda-forge
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f8727e_0 defaults
ca-certificates 2022.07.19 h06a4308_0 defaults
cachetools 4.2.4 pyhd8ed1ab_0 conda-forge
cairo 1.16.0 h3fc0475_1005 conda-forge
certifi 2022.6.15 py38h578d9bd_0 conda-forge
cffi 1.14.6 py38ha65f79e_0 conda-forge
chardet 4.0.0 py38h578d9bd_3 conda-forge
charset-normalizer 2.0.10 pyhd8ed1ab_0 conda-forge
click 8.0.3 py38h578d9bd_1 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cpuonly 2.0 0 pytorch
cryptography 35.0.0 py38ha5dfef3_0 conda-forge
cudatoolkit 11.3.1 h2bc3f7f_2 defaults
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dataclasses 0.8 pyhc8e2a94_3 conda-forge
decorator 4.4.2 py_0 conda-forge
easydict 1.9 py_0 conda-forge
ffmpeg 4.3 hf484d3e_0 pytorch
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.11.0 h70c0345_0 defaults
gettext 0.19.8.1 h0b5b191_1005 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
glib 2.69.1 h4ff587b_1 defaults
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.15 he1e5248_0 defaults
google-auth 2.3.3 pyh6c4a22f_0 conda-forge
google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge
googledrivedownloader 0.4 pyhd3deb0d_1 conda-forge
grpcio 1.42.0 py38hce63b2e_0 defaults
icu 67.1 he1b5a44_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
importlib-metadata 4.10.0 py38h578d9bd_0 conda-forge
intel-openmp 2021.4.0 h06a4308_3561 defaults
jinja2 3.0.3 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.1 py38h1fd1430_1 conda-forge
lame 3.100 h7f98852_1001 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.35.1 hea4e1c9_2 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
libiconv 1.15 h516909a_1006 conda-forge
libidn2 2.3.2 h7f98852_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libprotobuf 3.15.8 h780b84a_0 conda-forge
libstdcxx-ng 9.3.0 h6de172a_19 conda-forge
libtasn1 4.16.0 h27cfd23_0 defaults
libtiff 4.2.0 hbd63e13_2 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libuv 1.40.0 h7f98852_0 conda-forge
libwebp 1.2.0 h3452ae3_0 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.10 h72b56ed_2 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markdown 3.3.6 pyhd8ed1ab_0 conda-forge
markupsafe 2.0.1 py38h497a2fe_0 conda-forge
matplotlib-base 3.3.4 py38h0efea84_0 conda-forge
mkl 2021.4.0 h06a4308_640 defaults
mkl-service 2.4.0 py38h497a2fe_0 conda-forge
mkl_fft 1.3.1 py38hd3c417c_0 defaults
mkl_random 1.2.2 py38h1abd341_0 conda-forge
multidict 5.2.0 py38h7f8727e_2 defaults
ncurses 6.3 h7f8727e_2 defaults
nettle 3.7.3 hbbd107a_1 defaults
networkx 2.5.1 pyhd8ed1ab_0 conda-forge
numpy 1.21.2 py38h20f2e39_0 defaults
numpy-base 1.21.2 py38h79a1101_0 defaults
oauthlib 3.1.1 pyhd8ed1ab_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openbabel 3.1.1 py38hf4b5c11_1 conda-forge
openh264 2.1.1 h780b84a_0 conda-forge
openssl 1.1.1q h166bdaf_0 conda-forge
pandas 1.2.5 py38h1abd341_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 8.4.0 py38h5aabda8_0 defaults
pip 21.2.4 pyhd8ed1ab_0 conda-forge
pixman 0.38.0 h516909a_1003 conda-forge
protobuf 3.15.8 py38h709712a_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycairo 1.20.1 py38hf61ee4a_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyg 2.0.3 py38_torch_1.10.0_cpu pyg
pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge
pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.6 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 py38h578d9bd_5 conda-forge
python 3.8.12 h12debd9_0 defaults
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-lmdb 0.99 py38h709712a_0 conda-forge
python-louvain 0.15 pyhd8ed1ab_1 conda-forge
python_abi 3.8 2_cp38 conda-forge
pytorch 1.10.1 py3.8_cpu_0 pytorch
pytorch-cluster 1.5.9 py38_torch_1.10.0_cu113 pyg
pytorch-mutex 1.0 cpu pytorch
pytorch-scatter 2.0.9 py38_torch_1.10.0_cu113 pyg
pytorch-sparse 0.6.12 py38_torch_1.10.0_cu113 pyg
pytorch-spline-conv 1.2.1 py38_torch_1.10.0_cu113 pyg
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py38h497a2fe_0 conda-forge
rdkit 2020.09.5 py38h2bca085_0 conda-forge
readline 8.1.2 h7f8727e_1 defaults
reportlab 3.5.68 py38hadf75a6_0 conda-forge
requests 2.27.1 pyhd8ed1ab_0 conda-forge
requests-oauthlib 1.3.0 pyh9f0ad1d_0 conda-forge
rsa 4.8 pyhd8ed1ab_0 conda-forge
scikit-learn 1.0.2 py38h51133e4_1 defaults
scipy 1.7.3 py38hc147768_0 defaults
setuptools 58.0.4 py38h578d9bd_2 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlalchemy 1.3.23 py38h497a2fe_0 conda-forge
sqlite 3.37.0 hc218d9a_0 defaults
tensorboard 2.7.0 pyhd8ed1ab_0 conda-forge
tensorboard-data-server 0.6.0 py38h2b97feb_0 conda-forge
tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h21135ba_0 conda-forge
torchaudio 0.10.1 py38_cu113 pytorch
torchvision 0.11.2 py38_cu113 pytorch
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
typing-extensions 3.10.0.2 hd8ed1ab_0 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
urllib3 1.26.8 pyhd8ed1ab_1 conda-forge
werkzeug 2.0.1 pyhd8ed1ab_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yacs 0.1.6 py_0 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.6.3 py38h497a2fe_2 conda-forge
zipp 3.7.0 pyhd8ed1ab_1 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.4.9 ha95c52a_0 conda-forge
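In the listing above, `pytorch` is a CPU build (`py3.8_cpu_0`, alongside `cpuonly` and `pytorch-mutex ... cpu`), while `pytorch-sparse`, `pytorch-scatter`, `pytorch-cluster`, `pytorch-spline-conv`, `torchvision` and `torchaudio` are `cu113` builds. CUDA builds of `torch_sparse` link against `libc10_cuda.so`, which a CPU-only PyTorch does not provide, so this mix is a likely cause of the error. The sketch below flags such mixes in `conda list` output; the package filter and the `torch_build_variants` / `has_mismatch` names are illustrative assumptions.

```python
import re

TORCH_PACKAGES = {"pytorch", "torchvision", "torchaudio", "pyg"}


def build_variant(build):
    """Classify a conda build string as a CUDA tag (e.g. 'cu113'), 'cpu', or 'unknown'."""
    match = re.search(r"cu\d+", build)
    if match:
        return match.group(0)
    return "cpu" if "cpu" in build else "unknown"


def torch_build_variants(conda_list_text):
    """Map each PyTorch-related package in `conda list` output to its build variant."""
    variants = {}
    for line in conda_list_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or line.lstrip().startswith("#"):
            continue
        name, build = fields[0], fields[2]
        if name in TORCH_PACKAGES or name.startswith("pytorch-"):
            variants[name] = build_variant(build)
    return variants


def has_mismatch(variants):
    """True when the known variants disagree (e.g. a 'cpu' torch with 'cu113' extensions)."""
    return len({v for v in variants.values() if v != "unknown"}) > 1


listing = """pytorch 1.10.1 py3.8_cpu_0 pytorch
pytorch-sparse 0.6.12 py38_torch_1.10.0_cu113 pyg
torchvision 0.11.2 py38_cu113 pytorch"""
variants = torch_build_variants(listing)
print(variants)
print("mismatch:", has_mismatch(variants))  # mismatch: True
```

The usual remedy is to install one consistent set: either all CPU builds or all builds for the same CUDA version.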
Hi,
Can you please explain the data structure of crossdocked_pocket10?
For example:
1eqc_A_rec_1eqc_cts_lig_tt_docked_0_pocket10.pdb
So if we split the name on _, what do terms like 1eqc, cts, etc. mean?
The same question applies to .sdf files like 4m81_A_rec_4m81_glf_lig_tt_min_0.sdf.
Also, how did you convert the original CrossDocked dataset into this pocket dataset, and how did you dump the data into the index.pkl file?
Thanks
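A minimal sketch for splitting these names into fields. The labels are assumptions based on the common CrossDocked2020 naming pattern (receptor PDB ID, chain, ligand-source PDB ID, ligand residue code, pose type, pose index); the meaning of `tt` is not stated in this thread and is left unparsed, and reading `_pocket10` as a 10 Å pocket radius is likewise an assumption. `parse_crossdocked_name` is a hypothetical helper.

```python
import re

NAME_PATTERN = re.compile(
    r"^(?P<receptor_pdb>[0-9A-Za-z]{4})_(?P<chain>[^_]+)_rec_"
    r"(?P<ligand_pdb>[0-9A-Za-z]{4})_(?P<ligand_code>[^_]+)_lig_tt_"
    r"(?P<pose_type>docked|min)_(?P<pose_index>\d+)"
    r"(?:_pocket(?P<pocket_radius>\d+))?\.(?P<ext>pdb|sdf)$"
)


def parse_crossdocked_name(filename):
    """Split a CrossDocked-style file name into labeled fields (labels are assumptions)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized file name: {filename}")
    fields = match.groupdict()
    fields["pose_index"] = int(fields["pose_index"])
    if fields["pocket_radius"] is not None:
        fields["pocket_radius"] = int(fields["pocket_radius"])
    return fields


print(parse_crossdocked_name("1eqc_A_rec_1eqc_cts_lig_tt_docked_0_pocket10.pdb"))
print(parse_crossdocked_name("4m81_A_rec_4m81_glf_lig_tt_min_0.sdf"))
```

Index entries that include a directory prefix (e.g. `FKB1A_HUMAN_2_108_0/...`) would need `os.path.basename` applied first.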
Hello Luo, when will the source code be released? Your work gave me some new ideas, and I am very curious about it.
When I ran sample.py, it threw an error. I checked the data folder, and it does not include this file.
Thanks for your great work!
I got an error when running sample.py:
ValueError: zero-size array to reduction operation maximum which has no identity
The cause is that query_tmp is an empty list ([]) after sampling.
Line 282 in fabc986
Do you know how to fix it? Thanks.
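NumPy raises this `ValueError` whenever a maximum is taken over an empty array, which matches `query_tmp` being `[]`. A minimal guard, assuming the surrounding loop can simply skip a step that produced no candidates; `max_or_none` is a hypothetical helper, and whether skipping is the right recovery for this code path is an assumption.

```python
import numpy as np


def max_or_none(values):
    """Return the maximum of `values`, or None when there is nothing to reduce."""
    arr = np.asarray(values)
    if arr.size == 0:
        return None  # arr.max() here would raise "zero-size array to reduction operation maximum ..."
    return arr.max()


query_tmp = []
best = max_or_none(query_tmp)
if best is None:
    print("No candidates after sampling; skipping this step.")
```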
File "/sample.py", line 180, in <module>
data = testset[args.data_id]
File "/conda/envs/mol/lib/python3.9/site-packages/torch/utils/data/dataset.py", line 311, in __getitem__
return self.dataset[self.indices[idx]]
IndexError: list index out of range
When I ran sample.py, it threw the error above. I checked the dataset and the subset, and both have length 0.
I ran the sample.py script, and from the pool.finished object I get a list of data. For each data I call reconstruct_from_generated to get an RDKit molecule with a 3D conformation.
However, the conformation does not seem to be in the same coordinate space as the protein pocket.
I sampled idx 50 and looked at the samples_all.pt file; the corresponding protein pocket I loaded is FKB1A_HUMAN_2_108_0/1d7j_A_rec_1tco_fk5_lig_tt_docked_0_pocket10.pdb.
I see that the ligand (left) is far away from the pocket (right).
Am I doing something wrong?
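A quick way to quantify this is to compare the ligand centroid with the pocket centroid. The sketch assumes both coordinate sets are available as N×3 arrays in angstroms (for example, taken from the reconstructed molecule's conformer and the pocket PDB); `centroid_offset` is a hypothetical helper and the 10 Å threshold is an arbitrary illustrative choice. A large offset that is roughly constant across samples would suggest the generated positions live in a shifted frame, but whether `sample.py` shifts coordinates is an assumption to verify.

```python
import numpy as np


def centroid_offset(ligand_pos, pocket_pos):
    """Return (distance, vector) from the pocket centroid to the ligand centroid."""
    ligand_center = np.asarray(ligand_pos, dtype=float).mean(axis=0)
    pocket_center = np.asarray(pocket_pos, dtype=float).mean(axis=0)
    shift = ligand_center - pocket_center
    return float(np.linalg.norm(shift)), shift


# Toy coordinates for illustration only.
pocket = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
ligand = [[30.0, 0.0, 0.0], [31.0, 1.0, 0.0]]
distance, shift = centroid_offset(ligand, pocket)
if distance > 10.0:  # illustrative threshold in Å
    print(f"Ligand centroid is {distance:.1f} Å from the pocket centroid; check coordinate frames.")
```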
Sorry to bother you. I ran the test model successfully and got the SMILES of the generated molecules. However, I have no idea how to run our own example, even though we replaced the test system with our own. I also wonder what the number at the end of this tuple means: ('YPKA_YERPS_90_433_0/5ce3_B_rec_5ce3_adp_lig_tt_min_0_pocket10.pdb', 'YPKA_YERPS_90_433_0/5ce3_B_rec_5ce3_adp_lig_tt_min_0.sdf', 'YPKA_YERPS_90_433_0/5ce3_B_rec.pdb', 0.552965)
Hi,
I am trying to train the model on a new dataset, so I ran train.py and sample.py with new configuration files.
My training and sampling commands:
python train.py my_train_main_model.yml
python train.py my_train_frontier_model.yml
python sample.py my_sample.yml -i 0
my_train_main_model.yml and my_train_frontier_model.yml only change the dataset path, and my_sample.yml changes the checkpoint and dataset paths.
However, I got the following error message:
[2022-02-22 07:56:33,069::sample::INFO] Namespace(config='configs/fy_sample.yml', data_id=0, device='cuda', outdir='./outputs')
[2022-02-22 07:56:33,069::sample::INFO] {'dataset': {'name': 'pl', 'path': '/home/t-yafan/workspace/data/Tgt2Drug', 'split': '/home/t-yafan/workspace/data/Tgt2Drug/Binary/v4-2/src/split_by_name.pt'}, 'model': {'main': {'checkpoint': './logs/fy_train_main_model_2022_02_21__07_57_11/checkpoints/100000.pt'}, 'frontier': {'checkpoint': './logs/fy_train_frontier_model_2022_02_21__14_30_45/checkpoints/80000.pt'}}, 'sample': {'seed': 2020, 'num_samples': 100, 'beam_size': 300, 'logp_thres': -inf, 'num_retry': 5, 'max_steps': 50}}
[2022-02-22 07:56:33,070::sample::INFO] Loading data...
[2022-02-22 07:56:33,632::sample::INFO] Loading main model...
[2022-02-22 07:56:36,019::sample::INFO] Loading frontier model...
Traceback (most recent call last):
File "sample.py", line 194, in <module>
ftnet.load_state_dict(ckpt_ft['model'])
File "/anaconda/envs/SBDD-3D/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FrontierNetwork:
Missing key(s) in state_dict: "frontier_pred.layers.0.weight", "frontier_pred.layers.0.bias", "frontier_pred.layers.1.weight", "frontier_pred.layers.1.bias", "frontier_pred.layers.2.weight", "frontier_pred.layers.2.bias", "frontier_pred.layers.3.weight", "frontier_pred.layers.3.bias".
Unexpected key(s) in state_dict: "field.lin1.weight", "field.lin2.weight", "field.lin2.bias", "field.nn.0.weight", "field.nn.0.bias", "field.nn.2.weight", "field.nn.2.bias", "field.classifier.0.weight", "field.classifier.0.bias", "field.classifier.2.weight", "field.classifier.2.bias", "field.property_pred.0.weight", "field.property_pred.0.bias", "field.property_pred.2.weight", "field.property_pred.2.bias", "field.distance_expansion.offset".
Which step went wrong? How can I sample from the trained models?
Thanks!
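The missing keys all start with `frontier_pred.`, while the unexpected keys all start with `field.`, which suggests the checkpoint loaded as the frontier model holds weights for a different network (possibly the main model). This is an inference from the error message, not a confirmed cause. A minimal sketch for comparing the top-level prefixes of two key sets; with PyTorch one would pass `ftnet.state_dict().keys()` and `ckpt_ft['model'].keys()`. `key_prefixes` is a hypothetical helper.

```python
def key_prefixes(keys):
    """Collect the first dotted component of each state_dict key."""
    return sorted({key.split(".", 1)[0] for key in keys})


# Key names copied from the error message above.
expected = ["frontier_pred.layers.0.weight", "frontier_pred.layers.0.bias"]
loaded = ["field.lin1.weight", "field.nn.0.bias", "field.distance_expansion.offset"]
print("model expects:", key_prefixes(expected))  # ['frontier_pred']
print("checkpoint has:", key_prefixes(loaded))   # ['field']
```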
Hi,
Excuse me, I can't run the script sample.py. The error is No such file or directory: './data/crossdocked_pocket10_name2id.pt', and I have no idea what to do. Could you tell me how to solve this problem? Thank you!
Sorry to bother you. May I ask how to evaluate the performance of the model, since there is no evaluation script?
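I downloaded the original CrossDocked2020 dataset from gnina, but it does not include the pocket PDB files (which are present in the dataset from your paper). How were the pocket files obtained?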
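A common approach, and the one the `pocket10` name suggests, is to keep every protein residue that has at least one atom within 10 Å of any ligand atom. The sketch implements that selection on plain coordinate arrays; it is an assumption about how the pockets were built, not the authors' script, and `select_pocket_residues` is a hypothetical helper. Parsing the PDB/SDF files is left out.

```python
import numpy as np


def select_pocket_residues(protein_pos, residue_ids, ligand_pos, radius=10.0):
    """Return sorted residue IDs having any atom within `radius` Å of any ligand atom.

    protein_pos: (N, 3) protein atom coordinates
    residue_ids: length-N residue identifier per protein atom
    ligand_pos:  (M, 3) ligand atom coordinates
    """
    protein_pos = np.asarray(protein_pos, dtype=float)
    ligand_pos = np.asarray(ligand_pos, dtype=float)
    # Pairwise protein-ligand atom distances, shape (N, M).
    dists = np.linalg.norm(protein_pos[:, None, :] - ligand_pos[None, :, :], axis=-1)
    close = dists.min(axis=1) <= radius
    return sorted({rid for rid, keep in zip(residue_ids, close) if keep})


protein = [[0, 0, 0], [1, 0, 0], [20, 0, 0], [25, 0, 0]]
residues = [1, 1, 2, 3]
ligand = [[5, 0, 0]]
print(select_pocket_residues(protein, residues, ligand))  # [1]
```

Selecting whole residues rather than individual atoms keeps the pocket chemically complete, which is why the cutoff is applied per atom but the result is reported per residue.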
We replaced the empty ligand_dict with docked fragments in the script 'samples_for_pdb.py', but the output SDF files no longer contain our prepared fragments. It seems your model still starts from scratch without considering the input fragments at all.
# ligand_dict = {
# 'element': torch.empty([0,], dtype=torch.long),
# 'pos': torch.empty([0, 3], dtype=torch.float),
# 'atom_feature': torch.empty([0, 8], dtype=torch.float),
# 'bond_index': torch.empty([2, 0], dtype=torch.long),
# 'bond_type': torch.empty([0,], dtype=torch.long),
# }
path = 'ZINC_2_frag.sdf'
ligand_dict = parse_sdf_file(path)
ligand_dict = torchify_dict(ligand_dict)
Please share your code or pipeline for the linker prediction in your paper.
Hi, your work is fantastic!
May I ask two questions? First, what do the numbers (for example, 0.367042 below) in data/crossdocked_pocket10/index.pkl mean? The index looks like the following:
[('1B57_HUMAN_25_300_0/5u98_D_rec_5u98_1kx_lig_tt_min_0_pocket10.pdb', '1B57_HUMAN_25_300_0/5u98_D_rec_5u98_1kx_lig_tt_min_0.sdf', '1B57_HUMAN_25_300_0/5u98_D_rec.pdb', 0.367042), ...,]
Second, the index implies that we need to provide the receptor protein .pdb file. However, in crossdocked_pocket10 we only saw the pocket substructures and no protein .pdb file.
FileNotFoundError: [Errno 2] No such file or directory: './data/crossdocked_pocket10_name2id.pt'
Thanks
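Judging from the entries quoted in this thread, each index record is a 4-tuple of (pocket PDB path, ligand SDF path, full receptor PDB path, a float). A minimal sketch of writing and reading such a list, assuming `index.pkl` is a standard pickle of a list of tuples; the field names are illustrative, and since the meaning of the trailing float is not stated here it is kept as an opaque `score`.

```python
import os
import pickle
import tempfile
from typing import NamedTuple


class IndexEntry(NamedTuple):
    pocket_fn: str   # pocket substructure, e.g. '..._pocket10.pdb'
    ligand_fn: str   # ligand pose, e.g. '..._lig_tt_min_0.sdf'
    protein_fn: str  # full receptor, e.g. '..._rec.pdb'
    score: float     # meaning not documented in this thread


def dump_index(entries, path):
    with open(path, "wb") as f:
        pickle.dump([tuple(e) for e in entries], f)


def load_index(path):
    with open(path, "rb") as f:
        return [IndexEntry(*record) for record in pickle.load(f)]


entries = [(
    "1B57_HUMAN_25_300_0/5u98_D_rec_5u98_1kx_lig_tt_min_0_pocket10.pdb",
    "1B57_HUMAN_25_300_0/5u98_D_rec_5u98_1kx_lig_tt_min_0.sdf",
    "1B57_HUMAN_25_300_0/5u98_D_rec.pdb",
    0.367042,
)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.pkl")
    dump_index(entries, path)
    loaded = load_index(path)
print(loaded[0].protein_fn, loaded[0].score)
```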
Hello, I appreciate your excellent work. I am also attempting to train this model on a larger dataset and have started with the CrossDocked2020-v1.3 dataset, which is restricted to poses with RMSD < 2 Å. This dataset already includes clustered training and test splits, so I'm unsure why mmseqs2 clustering is needed. Could you please explain the reason for this step?
Additionally, the CrossDocked2020 dataset has various docking forms, such as Autodock Vina docked poses of ligands in the receptor and the first and second iterations of CNN-optimized poses. Which of these did you use in your training process?
Lastly, I noticed that the .PDB files in your training dataset are smaller than those in the CrossDocked2020 dataset. Did you perform any extra processing steps to obtain these smaller receptor files?
Thank you in advance for your time and assistance!
Hi!
The molecules generated with my own PDB file as input (which does not seem to be in the CrossDocked set) seem short overall. I have two questions:
How can we start with a given scaffold?
If I increase the pocket size, would the molecules get larger? My results show that they typically don't.
Thanks
Hi,
I've read your paper and the idea is impressive.
I successfully ran your sample code, but each time I run it on the same input I get different results. Although the seed is fixed, there still seems to be some randomness in the sampling procedure. This problem appears in both CPU and CUDA environments.
Could you explain which step may cause this? If possible, how can we fix it so that the same results can be reproduced?
Thanks!
Kris
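A minimal sketch of seeding every random number generator commonly involved in a PyTorch pipeline, assuming the sampling code draws from Python's `random`, NumPy, and PyTorch; `seed_everything` is a hypothetical helper. Even with all seeds fixed, some CUDA kernels (for example, scatter operations built on atomic additions) are not bitwise deterministic, and work run in separate processes may not share the seeded state, so this may narrow the variation without necessarily eliminating it.

```python
import os
import random

import numpy as np


def seed_everything(seed):
    """Seed Python, NumPy and (if installed) PyTorch RNGs with the same value."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started afterwards
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(2020)
first = (random.random(), np.random.rand())
seed_everything(2020)
second = (random.random(), np.random.rand())
print(first == second)  # True
```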
Hi, thanks for sharing.
Could you please tell me how to use SBDD to generate molecules for MY protein (which already has a ligand), rather than a protein from the given CrossDocked dataset?
Thank you very much!