
benevolentai / deeplytough


DeeplyTough: Learning Structural Comparison of Protein Binding Sites

License: Other

Languages: Dockerfile 2.24%, Shell 1.52%, Python 96.24%
Topics: 3d-models, deep-learning, drug-discovery, metric-learning, protein-structure

deeplytough's People

Contributors: joshuameyers, mys007


deeplytough's Issues

Docker image

Can you make the Docker image available for download? The computing cluster at my university does not permit Docker on our computers; instead, we have udocker. So I can run a Docker image, but I can't build the image from the Dockerfile that you provide.

Running custom dataset

Hi,
I just started out with your tool and ran into similar problems to those already reported, and I wanted to share the solutions that worked for me.

I followed the installation instructions as indicated here: https://github.com/JoshuaMeyers/DeeplyTough#code-setup

  • The first issue was the installation of mdtraj, for which the wheel could not be built; instead of the pinned version I used the current one.
  • After completing the installation, I tried to run the custom dataset as indicated here: https://github.com/BenevolentAI/DeeplyTough#evaluation and ran into an issue with the second conda environment: in deeplytough/misc/utils.py, line 142, you use "source activate", and my conda could no longer find the environment, so I changed it to "conda activate".
  • The third issue arose somewhere in the mgltools library, which needs numpy.oldnumeric. Following the installation instructions I did not have numpy installed in that environment, and found that version 1.8.1 fulfils this requirement.

Long story short:
change requirements.txt to point at the current version of mdtraj (1.9.7 at the time of writing)
run pip install numpy==1.8.1 after activating the deeplytough_mgltools environment
change the environment-activation call in deeplytough/misc/utils.py, line 142, to "conda activate" (depending on your conda version; mine is 4.6); a sketch of this change is shown below

I hope this helps you and others with the installation process.
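For reference, a hedged illustration of the third change; the real call in deeplytough/misc/utils.py is different, and the command string and environment name below are placeholders:

  # Illustration only, not the actual utils.py code; the environment name and
  # wrapped command are placeholders.
  import subprocess

  # With conda >= 4.6, "conda activate" only works once conda's shell hook has
  # been loaded, so source the profile script before activating the environment.
  cmd = ('source "$(conda info --base)/etc/profile.d/conda.sh" && '
         'conda activate deeplytough_mgltools && python --version')
  subprocess.run(cmd, shell=True, executable='/bin/bash', check=True)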

Custom evaluation failure

I've been using DeeplyTough for a while now, and recently our lab changed computers. I've been having issues with getting DeeplyTough to work on the new computer. I've been using the command
python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --output_dir $DEEPLYTOUGH/results --device 'cpu' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar
to initiate the comparison, and I've been using the sample dataset provided, but I can't seem to get it to work.

This is the output I've been getting

HTMD: Logging setup failed
INFO:misc.utils:Pre-processing /uufs/chpc.utah.edu/common/home/u1261874/storage/Nathan/research/learning_programs/DeeplyTough_new/DeeplyTough-master/datasets/custom/1a05B/1a05B.pdb with HTMD...
setting PYTHONHOME environment
adding gasteiger charges to peptide

*** Open Babel Warning in parseAtomRecord
WARNING: Problems reading a PDB file
Problems reading a HETATM or ATOM record.
According to the PDB specification,
columns 77-78 should contain the element symbol of an atom.
but OpenBabel found ' ' (atom 1)

(the same Open Babel warning repeats for atoms 2 through 6, and the pasted output is cut off partway through the next warning)
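For context, these warnings usually mean the ATOM/HETATM records are missing the element symbol in columns 77-78. A minimal, hedged text-level repair is sketched below; it guesses the element from the atom-name field and is only a heuristic (two-letter elements such as Fe or Zn are not handled), so a proper structure-fixing tool is preferable:

  # Hedged sketch: pad ATOM/HETATM records and fill columns 77-78 with a guessed
  # element symbol taken from the atom-name field (columns 13-16).
  def add_missing_elements(in_path, out_path):
      with open(in_path) as fin, open(out_path, 'w') as fout:
          for line in fin:
              if line.startswith(('ATOM', 'HETATM')):
                  line = line.rstrip('\n').ljust(80)
                  if line[76:78].strip() == '':
                      name = line[12:16].strip()
                      element = next((c for c in name if c.isalpha()), 'C')
                      line = line[:76] + element.rjust(2) + line[78:]
                  line += '\n'
              fout.write(line)

  add_missing_elements('1a05B.pdb', '1a05B_fixed.pdb')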

htmd.molecule.voxeldescriptors

I'm having an issue with import htmd.molecule.voxeldescriptors in the misc/utils.py file. I've installed the newest version of htmd through conda, but it does not appear to have the voxeldescriptors module. I've gone through my miniconda files searching for it, but for whatever reason it is not there. htmd.molecule.molecule is there, so I don't think I've installed htmd incorrectly. Is it possible a newer version of htmd removed it or moved it somewhere else? Any help would be appreciated.

Custom Evaluation =

I've been trying to set up the custom evaluation tool and I keep getting this error when I run it:

File "~/DeeplyTough-master/deeplytough/scripts/custom_evaluation.py", line 50
fname = f"Custom-{args.alg}-{os.path.basename(os.path.dirname(args.net))}.pickle"

Do you have any insight on what is going on?

General Question on Usage

I have just a general question about usage. I've been using the TOUGH-M1 dataset to compare a bunch of pockets. When I look at the results I get values ranging from 0 to -1. What would you say is generally the best cutoff for saying two pockets have the same binding capabilities without introducing too many false positives? Looking at figure 2 with the ROC curve in the paper, roughly 80% true positive rate at around 10% false positive rate seems like an ideal operating point, but the ROC curve doesn't indicate which cutoff value each point on the curve corresponds to. I would greatly appreciate this type of information. Thanks
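For reference, one hedged way to choose a cutoff is to score a set of pocket pairs with known labels and read the threshold off a ROC curve at the false-positive rate you can tolerate; the scores and labels below are made-up placeholders:

  # Hedged sketch: pick a score cutoff at a target false-positive rate.
  import numpy as np
  from sklearn.metrics import roc_curve

  scores = np.array([-0.12, -0.30, -0.45, -0.80, -0.95])  # closer to 0 = more similar
  labels = np.array([1, 1, 1, 0, 0])                       # 1 = same binding capability

  fpr, tpr, thresholds = roc_curve(labels, scores)
  target_fpr = 0.10
  idx = np.searchsorted(fpr, target_fpr, side='right') - 1
  print(f"cutoff at FPR <= {target_fpr}: score >= {thresholds[idx]:.3f} (TPR = {tpr[idx]:.2f})")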

Alternative Usage

Recently our lab has collected a large amount of data on a specific class of proteins and what they bind to. This protein class is not sampled in the TOUGH-M1 or Vertex datasets, so when we attempt to distinguish binding pockets in this class of proteins using DeeplyTough, it fails, presumably because the model was not trained on this class of proteins. My question is: is there some way we could train DeeplyTough on our own data so that it can then distinguish between these protein classes?

HTMD: Logging setup failed

Hi there,
after following the code setup, I tried to evaluate on a custom dataset, and it raised:
HTMD: Logging setup failed
Can someone show me how to fix this?

No module named 'htmd.molecule.voxeldescriptors'

Hi,
I encountered an error when running your code:

  Traceback (most recent call last):
    File "/data/zhangjunyi/DeeplyTough/deeplytough/scripts/toughm1_benchmark.py", line 6, in <module>
      from datasets import ToughM1
    File "/data/zhangjunyi/DeeplyTough/deeplytough/datasets/__init__.py", line 1, in <module>
      from .toughm1 import ToughM1
    File "/data/zhangjunyi/DeeplyTough/deeplytough/datasets/toughm1.py", line 16, in <module>
      from misc.utils import htmd_featurizer, voc_ap, RcsbPdbClusters, pdb_check_obsolete
    File "/data/zhangjunyi/DeeplyTough/deeplytough/misc/utils.py", line 10, in <module>
      import htmd.molecule.voxeldescriptors as htmdvox
  ModuleNotFoundError: No module named 'htmd.molecule.voxeldescriptors'

I have installed htmd version 1.27.0, and I think this might be an issue with the htmd version. May I ask which version of htmd you are using?
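As a pointer, recent HTMD releases moved the featurization code into the separate moleculekit package, so an import fallback along these lines may help; the module path is my assumption, and the function names inside may also differ between versions, so this alone is not guaranteed to be sufficient:

  # Hedged sketch of an import fallback for deeplytough/misc/utils.py.
  try:
      import htmd.molecule.voxeldescriptors as htmdvox       # old HTMD layout used by the repo
  except ModuleNotFoundError:
      import moleculekit.tools.voxeldescriptors as htmdvox   # newer layout (assumption)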

Using DeeplyTough as an embedder

Hello Josh,

I am thinking of the possibility of using DeeplyTough as an embedder for protein pockets, so that each pocket is mapped to a vector of descriptors. Could you provide some guidance on how these could be obtained?

Also, is it possible to process a custom pdb as the input containing only the pocket residues, instead of relying on the automated pocket detection?

Thank you very much
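For context, the traceback in a later issue on this page shows the relevant call chain (custom_evaluation.py builds a DeeplyTough matcher from matchers/deeply_tough.py and calls matcher.precompute_descriptors(entries)), which suggests a sketch along these lines; the constructor arguments, entry format and the key holding the descriptor are all assumptions that need checking against the code:

  # Hedged sketch only; everything marked "assumed" must be verified in the repo.
  from matchers.deeply_tough import DeeplyTough

  matcher = DeeplyTough('networks/deeplytough_toughm1_test.pth.tar', device='cpu')  # args assumed
  entries = [{'protein': 'datasets/custom/1a05B/1a05B.pdb',
              'pocket': 'datasets/custom/1a05B/pocket.pdb'}]                        # format assumed
  entries = matcher.precompute_descriptors(entries)   # call seen in custom_evaluation.py
  vectors = [e['descriptor'] for e in entries]        # hypothetical key name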

Pretrained model availability

Dear Authors,

Congratulations on an innovative publication! I was wondering if you would be willing to make the pretrained models available?

Thanks, Raghav

Custom dataset evaluation error

I ran the scripts for the Vertex, TOUGH-M1 and ProSPECCTs datasets and they work perfectly; I could reproduce the AUC values reported in your paper. Now I tried to run the custom script and I get this warning:
No HTMD could be found but {} PDB files were given, please call preprocess_once() on the dataset'.format(len(pdb_list))
AssertionError: No HTMD could be found but 8 PDB files were given, please call preprocess_once() on the dataset

I looked at the script and found that the entries were fine and the npz paths were identified and created, but nothing was actually written there (neither the numpy matrix nor the receptor pdbqt).

Can you help me with that?

Thanks
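For reference, the assertion message itself points at preprocess_once(), and the Vertex traceback further down this page shows datasets calling database.preprocess_once() before use; a hedged sketch for a custom set follows, where the dataset class name and constructor are assumptions (the --db_preprocessing 1 flag used in a later issue should achieve the same thing):

  # Hedged sketch; class name and constructor arguments are assumptions.
  from datasets import Custom

  db = Custom()           # custom dataset wrapper, name assumed
  db.preprocess_once()    # should regenerate the missing HTMD .npz files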

Warnings in output

While running the custom evaluation on the sample dataset, I get the following warnings at the very end:

/uufs/chpc.utah.edu/common/home/u1261874/software/pkg/miniconda3/envs/deeplytough/lib/python3.6/site-packages/torch/utils/checkpoint.py:21: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
/uufs/chpc.utah.edu/common/home/u1261874/software/pkg/miniconda3/envs/deeplytough/lib/python3.6/site-packages/torch/nn/functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:53<00:00, 53.59s/it]
7it [00:00, 7270.96it/s]

I still get output; I just want to make sure that this is normal and not affecting the results I'm getting. If it is a problem, what do I need to do to fix it?

Thanks

Cannot retrieve some cluster files

Hi.

I executed the commands to evaluate on the Vertex and ProSPECCTs datasets, but in both cases I get almost the same error, shown below.

(I exported STRUCTURE_DATA_DIR=$DEEPLYTOUGH/datasets_structure. I have also omitted the path to the repository.)

(curl progress output: 100% of 5324k downloaded)
INFO:datasets.vertex:Preprocessing: downloading data and extracting pockets, this will take time.
INFO:root:cluster file path: DeeplyTough/datasets_structure/bc-30.out
WARNING:root:Cluster definition not found, will download a fresh one.
WARNING:root:However, this will very likely lead to silent incompatibilities with any old 'pdbcode_mappings.pickle' files! Please better remove those manually.
Traceback (most recent call last):
  File "DeeplyTough/deeplytough/scripts/vertex_benchmark.py", line 68, in <module>
    main()
  File "DeeplyTough/deeplytough/scripts/vertex_benchmark.py", line 32, in main
    database.preprocess_once()
  File "DeeplyTough/deeplytough/datasets/vertex.py", line 49, in preprocess_once
    clusterer = RcsbPdbClusters(identity=30)
  File "DeeplyTough/deeplytough/misc/utils.py", line 248, in __init__
    self._fetch_cluster_file()
  File "DeeplyTough/deeplytough/misc/utils.py", line 262, in _fetch_cluster_file
    self._download_cluster_sets(cluster_file_path)
  File "DeeplyTough/deeplytough/misc/utils.py", line 253, in _download_cluster_sets
    request.urlretrieve(f'https://cdn.rcsb.org/resources/sequence/clusters/bc-{self.identity}.out', cluster_file_path)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "anaconda3/envs/deeplytough/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Evaluation on the TOUGH-M1 dataset succeeded, so I'm afraid one of the URLs used for the Vertex and ProSPECCTs data has expired.
Would you mind checking that?
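For context, the 404 comes from the bc-30.out download in misc/utils.py: RCSB has retired the old BLASTClust "bc-XX.out" cluster files. To my understanding the clusters are now published under a different name and in an entity-based format, so the parsing logic may also need adjusting; a hedged sketch of the changed download follows, with the URL to be verified against current RCSB documentation:

  # Hedged workaround sketch for misc/utils.py; URL and file format to be verified.
  from urllib import request

  identity = 30
  # old, now returning 404: https://cdn.rcsb.org/resources/sequence/clusters/bc-30.out
  new_url = f'https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-{identity}.txt'
  request.urlretrieve(new_url, f'clusters-by-entity-{identity}.txt')
  # Note: entries are per entity (e.g. "1ABC_1") rather than per chain (e.g. "1ABC_A"),
  # so RcsbPdbClusters' lookup logic may need changes as well.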

custom data

Hello, I am a graduate student majoring in computer science. I want to use this software to find binding site information for the protein PDB files I have and to compare the binding sites. Do I need to provide the binding site information myself, or can the software find it automatically?

datasets_downloader.sh

Hi,
I ran into a tricky problem. When I run datasets_downloader.sh, it does not download the full data. I hope someone can help me with this problem.

RuntimeError for custom_evaluation.py

Hello,

When I try to run custom_evaluation.py on the example custom pairs, I get the following error:

Traceback (most recent call last):
  File "deeplytough/scripts/custom_evaluation.py", line 69, in <module>
    main()
  File "deeplytough/scripts/custom_evaluation.py", line 41, in main
    entries = matcher.precompute_descriptors(entries)
  File "/data/applic/DeeplyTough/deeplytough/matchers/deeply_tough.py", line 46, in precompute_descriptors
    feats = load_and_precompute_point_feats(self.model, self.args, pdb_list, point_list, self.device, self.nworkers, self.batch_size)
  File "/data/applic/DeeplyTough/deeplytough/engine/predictor.py", line 45, in load_and_precompute_point_feats
    outputs = model(inputs)
  File "/data/miniconda3/envs/deeplytough/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/applic/DeeplyTough/deeplytough/engine/models.py", line 95, in forward
    inputs = module(inputs)
  File "/data/miniconda3/envs/deeplytough/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/miniconda3/envs/deeplytough/lib/python3.6/site-packages/se3cnn-0.0.0-py3.6.egg/se3cnn/blocks/gated_block.py", line 154, in forward
  File "/data/miniconda3/envs/deeplytough/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/miniconda3/envs/deeplytough/lib/python3.6/site-packages/se3cnn-0.0.0-py3.6.egg/se3cnn/batchnorm.py", line 266, in forward
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:258

I'm running DeeplyTough on an NVIDIA RTX A4500 GPU and set up the conda environment exactly as described on this page. What could be the issue here?
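For context, a common cause of this cublas failure on recent cards is a PyTorch build without kernels for the GPU's compute capability; the RTX A4500 is an Ampere card (sm_86), which the old PyTorch version pinned by this repository predates. A hedged diagnostic sketch, assuming a PyTorch recent enough to provide these helpers:

  # Diagnostic only; torch.cuda.get_arch_list() exists in newer PyTorch releases.
  import torch

  print(torch.__version__, torch.version.cuda)
  print(torch.cuda.get_device_name(0))
  print(torch.cuda.get_device_capability(0))   # (8, 6) expected for an RTX A4500
  print(torch.cuda.get_arch_list())            # should include 'sm_86' for this GPU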

Assertion Error: No HTMD could be found

I was using DeeplyTough on a user-defined dataset. I followed the steps mentioned in the "Custom Dataset" section of your article:
1. Added the path for the STRUCTURE_DATA_DIR environment variable in my bashrc file. For testing purposes, I took one pair of PDB structures, their pockets in .pdb format and a CSV file with their pairing, and kept all of this in the datasets/custom directory.
2. Executed "python $DEEPLYTOUGH/deeplytough/scripts/custom_evaluation.py --dataset_subdir 'custom' --db_preprocessing 1 --output_dir $DEEPLYTOUGH/results --device 'cuda:0' --nworkers 4 --net $DEEPLYTOUGH/networks/deeplytough_toughm1_test.pth.tar"

I am getting the following warning and error:

2020-08-11 11:42:54,118 - root - WARNING - HTMD featurization file not found: /home/iiitd/gayatrip/indigen_v2/final_pdbs/DeeplyTough master/datasets/processed/htmd/custom/ind_pdbs/6I83_clean.npz,corresponding pdb likely could not be parsed

AssertionError: No HTMD could be found but 11 PDB files were given, please call preprocess_once() on the dataset.

Can you suggest where I am going wrong and what I can do to rectify this error?
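For reference, the warning says the PDB likely could not be parsed; a quick hedged check is to load the offending file directly with HTMD's Molecule class (which an earlier issue on this page confirms is present in this HTMD layout) and see whether it raises an error:

  # Minimal parse check; import path follows the old HTMD layout used by the repo.
  from htmd.molecule.molecule import Molecule

  mol = Molecule('6I83_clean.pdb')   # the file named in the warning above
  print(mol.numAtoms)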

Vertex dataset lost some pocket PDB files

Hi, glad to see your excellent work!
I followed your code to preprocess TOUGH-M1 and Vertex for training and evaluation with --db_preprocessing set to 0, just trying to obtain the same splits used in your work. However, I encountered the problems below:

  1. I can't find the corresponding pocket PDB files at several pocket paths, e.g. DeeplyTough/STRUCTURE_DATA_DIR/Vertex/4cmt/4cmt_site_2.pdb, DeeplyTough/STRUCTURE_DATA_DIR/Vertex/4a9t/4a9t_site_2.pdb, and DeeplyTough/STRUCTURE_DATA_DIR/Vertex/4anu/4anu_site_2.pdb, etc., which results in 23,380 invalid pocket pairs and does not match the numbers in the paper (1,461,668 positive and 102,935 negative pocket pairs, 1,564,603 pairs in total).
  2. I got 6,580 structures left for training (not matching your result of 6,548 structures) after filtering TOUGH-M1 for the Vertex evaluation with --db_exclude_vertex set to 'seqclust', yet the number of pocket pairs constructed from these 6,580 TOUGH-M1 structures matches your result of 710,009 pairs exactly, which confuses me.

I would really appreciate it if I could get your help! Looking forward to your reply!
