
desert's People

Contributors

longlongman

desert's Issues

question regarding encoder-decoder

Hi longlongman,

I've tried using your encoder-decoder on CASP ligands, but I wasn't able to fully decode (recover) the ligands after encoding: molecules are cut off and some atoms are missing.

I suspect the patch size for the ShapePretrainingEncoder is too small, or maybe maxlen_coef is too small?

Any ideas on how to recover the full molecule?

thanks
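For diagnosing cut-off decodes like this, a quick first check is to compare heavy-atom counts of the input and decoded SMILES. With RDKit installed, `Chem.MolFromSmiles(...).GetNumHeavyAtoms()` is the robust way; the crude regex counter below is a dependency-free sketch (helper names are mine, not the repo's):

```python
import re

# Crude heavy-atom counter for SMILES strings (illustrative only; a real
# check should parse with RDKit instead of a regex).
ATOM_RE = re.compile(
    r"\[[^\]]+\]"      # bracket atoms like [NH+], [C@@H]
    r"|Cl|Br"          # two-letter organic-subset atoms
    r"|[BCNOSPFI]"     # one-letter organic-subset atoms
    r"|[bcnops]"       # aromatic organic-subset atoms
)

def heavy_atom_count(smiles: str) -> int:
    """Approximate number of heavy atoms in a SMILES string."""
    return len(ATOM_RE.findall(smiles))

def looks_truncated(original: str, decoded: str, tol: int = 0) -> bool:
    """Flag a decode whose heavy-atom count fell below the original's."""
    return heavy_atom_count(decoded) + tol < heavy_atom_count(original)
```

Running this over a batch of (input, decode) pairs would at least show whether the truncation correlates with molecule size, which is what you'd expect if the decoder's maximum length (governed by maxlen_coef) is the bottleneck.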

Dataset unable to download for preprocessing

I want to reproduce the results of this paper for my research, but I am unable to download the datasets from ZINC20 or ZINC15 for data preprocessing. The links I am using are https://zinc20.docking.org/substances/ and https://zinc15.docking.org/substances/.
The downloads keep failing after some time. Is there a Google Drive/OneDrive link where the data is already available, or any other reference that would help? Please help me sort out this issue.

Thanks in advance.

question about the vocab file

Dear Longlongman,

Is the vocab file you provided a partial vocab file? If it is, could you provide the full vocab file? Some fragments aren't being decoded properly compared to the originals, and maybe this is the reason why?

thanks

issue with get_training_data

After running get_fragment_vocab I get two .pkl files.

In get_training_data I put in the path to BRICS_RING_R.vocab.pkl, and there's an error:

Traceback (most recent call last):
  frag_idx = vocab[frag_smi][2]
TypeError: 'Mol' object is not subscriptable
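That error means the loaded pickle maps SMILES to RDKit Mol objects (which is what the raw fragment dump looks like) rather than to vocab entries, so indexing with `[2]` fails. A small hedged helper to inspect a pickle before wiring it into get_training_data (names here are mine, not the repo's):

```python
import pickle

def peek_pickle(path, n=3):
    """Load a pickle and show the types of the first few values, to check
    whether it is the vocab (SMILES -> tuple/list, so vocab[smi][2] works)
    or the raw fragment dump (SMILES -> rdkit Mol, which is not
    subscriptable)."""
    with open(path, "rb") as fr:
        obj = pickle.load(fr)
    print(type(obj).__name__)
    if isinstance(obj, dict):
        for k, v in list(obj.items())[:n]:
            print(repr(k), "->", type(v).__name__)
    return obj
```

If the values print as `Mol`, the path is pointing at the wrong one of the two .pkl files.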

how to reduce batch size during generation

hi, my GPU keeps running out of memory when I try to generate.

horovodrun -np 8 bycha-run \
    --config configs/generating.yaml \
    --lib shape_pretraining \
    --task.mode evaluate \
    --task.data.train.path data \
    --task.data.valid.path.samples /home/kiwoong/DESERT/data/sample_shapes.pkl \
    --task.data.test.path.samples /home/kiwoong/DESERT/data/sample_shapes.pkl \
    --task.dataloader.train.max_samples 1 \
    --task.dataloader.valid.sampler.max_samples 1 \
    --task.dataloader.test.sampler.max_samples 1 \
    --task.model.path /home/kiwoong/DESERT/trainer/save_model_dir/1WW_30W_5048064.pt \
    --task.evaluator.save_hypo_dir /home/kiwoong/DESERT/trainer/save_hypo_di

This is the bash file I'm running right now.
How do you reduce the batch size? In the generation config the only option seems to be max_samples, which I set to 1, but GPU memory usage still increases steadily. thanks.
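If the framework does not expose a smaller batch setting, one hedged workaround is to shard the sampled-shapes pickle and run generation once per shard, so each run holds fewer samples. This assumes sample_shapes.pkl is a pickled list (the helper and file layout are mine, not the repo's):

```python
import os
import pickle

def split_samples(path, chunk_size, out_dir):
    """Split a pickled list of sampled shapes into smaller pickles, so a
    separate generation run can be launched per chunk. Adapt the layout
    to however sample_shapes.pkl is actually structured."""
    with open(path, "rb") as fr:
        samples = pickle.load(fr)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(0, len(samples), chunk_size):
        out = os.path.join(out_dir, f"sample_shapes.{i // chunk_size}.pkl")
        with open(out, "wb") as fw:
            pickle.dump(samples[i:i + chunk_size], fw)
        paths.append(out)
    return paths
```

Each chunk path would then be passed as --task.data.test.path.samples in its own invocation.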

Some issues during installing and using

Hello longlongman,
I'm new to deep learning. Please bear with me for some dumb questions.

  1. Could you provide more details of the installation, e.g. which Python version and which pip
     version to use?
     I used Python 3.7.1, and at the last step of installing, 'pip install pybel scikit-image pebble
     meeko==0.1.dev1 vina pytransform3d', an error message prints:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.0 requires typing-extensions~=3.7.4, but you have typing-extensions 4.7.1 which is incompatible.
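If the conflict bites at runtime (as it does in the train.sh error below), one hedged workaround, assuming TensorFlow 2.4.0 really is required by the environment, is to repin typing-extensions after the other packages are installed:

```shell
# Repin typing-extensions to the range tensorflow 2.4.0 declares.
# Caveat: other packages may in turn require typing-extensions>=4, so
# this is a workaround, not a guarantee; verify with `pip check`.
pip install "typing-extensions~=3.7.4"
pip check
```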

  2. At the pretraining stage, after I executed 'python get_training_data.py', an error occurred:
File "get_training_data.py", line 31, in <module>
    with open(vocab_path, 'rb') as fr:
IsADirectoryError: [Errno 21] Is a directory: '/home/ubuntu/user_space/DESERT/preparation/vocab'

The vocab directory contains two files produced by 'get_fragment_vocab.py': 'BRICS_RING_R.vocab.pkl' and 'BRICS_RING_R.494789.pkl'
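The traceback shows get_training_data.py opening vocab_path directly, so it must point at the vocab file itself, not the directory containing it. A tiny hedged helper (my code, not the repo's; the default file name is the one reported above) that resolves a directory to the vocab pickle inside it:

```python
import os

def resolve_vocab_path(path: str,
                       default_name: str = "BRICS_RING_R.vocab.pkl") -> str:
    """If `path` is a directory, point at the vocab pickle inside it;
    otherwise return the path unchanged."""
    return os.path.join(path, default_name) if os.path.isdir(path) else path
```

In other words, passing '/home/ubuntu/user_space/DESERT/preparation/vocab/BRICS_RING_R.vocab.pkl' instead of the directory should get past this error.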

  3. Since I was stuck on step 2, I tried to use the training data and vocab uploaded online to train shape2mol. What's the difference between 0.pkl and 1.pkl? Do I need to use both? I'm a bit confused about how to fill out the blanks in training.yaml:
    train:
      class: ShapePretrainingDatasetShard
      path: ---TRAINING DATA PATH---
      vocab_path: ---VOCAB PATH---
      sample_each_shard: 500000
      shuffle: True
    valid:
      class: ShapePretrainingDataset
      path: 
        samples: ---VALID DATA PATH---
        vocab: ---VOCAB PATH---
    test:
      class: ShapePretrainingDataset
      path: 
        samples: ---TEST DATA PATH---
        vocab: ---VOCAB PATH---

The ---TRAINING DATA PATH--- is simply the path where I got 0.pkl and 1.pkl.
But what about the valid and test data paths? (I just set them to the same as the training data path.)
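One way the blanks might be filled in, assuming the downloaded shards 0.pkl and 1.pkl live in a single directory and a small held-out pickle is reused for valid and test (all paths below are hypothetical, not the repo's):

```yaml
    train:
      class: ShapePretrainingDatasetShard
      path: /data/desert/train            # directory holding 0.pkl, 1.pkl
      vocab_path: /data/desert/vocab.pkl
      sample_each_shard: 500000
      shuffle: True
    valid:
      class: ShapePretrainingDataset
      path:
        samples: /data/desert/valid.pkl   # small held-out sample file
        vocab: /data/desert/vocab.pkl
    test:
      class: ShapePretrainingDataset
      path:
        samples: /data/desert/valid.pkl   # reusing valid data here
        vocab: /data/desert/vocab.pkl
```

Pointing valid/test at the full training shards would likely make evaluation between epochs very slow, so a small sample file seems the safer choice.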

When executing train.sh, another error occurred:
pkg_resources.DistributionNotFound: The 'typing-extensions~=3.7.4' distribution was not found and is required by tensorflow
I believe this is related to the first error during installation.
  4. For the sketching process, where do I fill in the path of the input ligand .sdf in sketch.py?
When sketching the pocket with sketch.py, the following error occurred:

File "sketching.py", line 2, in <module>
    from shape_utils import get_atom_stamp
File "/home/ubuntu/user_space/DESERT/sketch/shape_utils.py", line 3, in <module>
    from common import ATOM_RADIUS, ATOMIC_NUMBER, ATOMIC_NUMBER_REVERSE
ImportError: cannot import name 'ATOM_RADIUS' from 'common' (/home/ubuntu/miniconda3/envs/DESERT/lib/python3.7/site-packages/common/__init__.py)
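The traceback shows `common` resolving to site-packages, i.e. a PyPI package named `common` is shadowing DESERT's local sketch/common.py. Besides `pip uninstall common`, a hedged workaround is to force the local directory to win the import search (helper name is mine, not DESERT's):

```python
import importlib
import sys

def import_local(module_name: str, local_dir: str):
    """Import `module_name` from `local_dir`, even if an installed package
    of the same name (here, the PyPI 'common' package) would otherwise
    shadow the local file."""
    sys.path.insert(0, local_dir)
    sys.modules.pop(module_name, None)  # forget any previously imported copy
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Running the scripts from inside the sketch directory has the same effect, since Python puts the script's own directory first on sys.path.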
  5. For the generating process:
bycha-run \
   --config configs/generating.yaml \
   --lib shape_pretraining \
   --task.mode evaluate \
   --task.data.train.path data \
   --task.data.valid.path.samples ❗❗❗FILL_THIS(MOLECULE SHAPES SAMPLED FROM CAVITY)❗❗❗ \
   --task.data.test.path.samples  ❗❗❗FILL_THIS❗❗❗ \
   --task.dataloader.train.max_samples 1 \
   --task.dataloader.valid.sampler.max_samples 1 \
   --task.dataloader.test.sampler.max_samples 1 \
   --task.model.path ❗❗❗FILL_THIS❗❗❗ \
   --task.evaluator.save_hypo_dir ❗❗❗FILL_THIS❗❗❗

What should the path for task.data.test.path.samples be?
Is task.model.path the path to 1WW_30W_5048064.pt?

Sorry for bothering you.

unable to generate

Hi !
thanks for sharing your excellent work!

After modifying part of your code I was able to train successfully and get pockets, but I am unable to run the final generation step. Could you please upload the latest, complete code?

Thanks!

about Preprocessing of data

Hello, I have a question. When running get_fragment_vocab.py, the fragment vocab is saved. Why does get_training_data.py need to re-acquire the fragments, align them with the previously saved vocab fragments, and finally compute a rotation matrix? Why do that? Thank you very much.
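One plausible reading (my interpretation, not confirmed by the repo): the vocab stores each fragment in a single canonical pose, so when building training data, each fragment occurrence must be re-extracted and aligned against its canonical vocab copy; the alignment yields the rotation that places the canonical fragment into the molecule's frame, which is what the model learns to predict. The standard tool for such an alignment is the Kabsch algorithm; a sketch:

```python
import numpy as np

def kabsch(P: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Rotation matrix R minimizing sum ||R @ p_i - q_i||^2 for point
    sets P, Q of shape (n, 3) (classic Kabsch algorithm). Illustrative
    sketch of the alignment step, not DESERT's actual code."""
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T
```

Here P would be the canonical vocab fragment's atom coordinates and Q the same fragment's coordinates inside the molecule; the returned R is the per-occurrence rotation matrix stored with the training sample.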

fail to work on mac(m1)

Hi!
Thank you for sharing your excellent work!
When I try to run this on my MacBook Pro (M1 Pro), it just doesn't work.
Could you please share a Dockerfile or something else that works on Mac?
thanks!
