qizhipei / fabind

FABind: Fast and Accurate Protein-Ligand Binding (NeurIPS 2023)

Home Page: https://arxiv.org/abs/2310.06763

License: MIT License

Python 100.00%

binding bioinformatics computational-biology docking machine-learning

fabind's Issues

Processing multiple smiles on one target

Hi,

This is not an issue per se, more a question and suggestions.

In your example, the structure of the file is:
Cleaned_SMILES,pdb_id
CCC@H C@HC(=O)NC@@HC(=O)NC@@HC(=O)NC@HC(C)C,6efk
CC(C)CCN1c2nc(Nc3cc(F)c(O)c(F)c3)ncc2N(C)C(=O)C1(C)C,6g3c
CC(C)(COP@@(O)OP@(O)OC[C@H]1OC@@H C@H[C@@h]1OP(=O)(O)O)C@@HC(=O)NCCC(=O)NCCO,6n93
O=C(O)c1ccccc1-n1cccc1,6npi

The outputs are SDF files named with the pdb_id plus a number. For a virtual-screening setup, where millions of poses are generated on one receptor, it would be more practical to name the files by a compound ID instead of the protein ID. This is just a suggestion for a future update, in case you do not already have a screening mode: if a cmpd_id column were present in the header, the SDF file would be named after the cmpd_id instead of having a number appended.
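The naming suggestion above could be prototyped outside FABind by post-processing the index file. The sketch below is only an illustration: the third column `cmpd_id`, the fallback pattern `<pdb_id>_<row number>.sdf`, and the function name `output_names` are all assumptions, not something the repository is known to provide.

```python
import csv
import io


def output_names(index_csv_text, id_column="cmpd_id"):
    """Map each row of an index CSV to an output file name.

    Uses the (hypothetical) compound-id column when it is present and
    non-empty; otherwise falls back to pdb_id plus the row number, which
    only approximates the naming described in this issue.
    """
    names = []
    reader = csv.DictReader(io.StringIO(index_csv_text))
    for row_number, row in enumerate(reader):
        cmpd_id = (row.get(id_column) or "").strip()
        if cmpd_id:
            names.append(f"{cmpd_id}.sdf")
        else:
            names.append(f"{row['pdb_id']}_{row_number}.sdf")
    return names


# Hypothetical index file: one receptor, two ligands, one without an ID.
example = """Cleaned_SMILES,pdb_id,cmpd_id
O=C(O)c1ccccc1-n1cccc1,6npi,CMPD-0001
CCO,6npi,
"""
print(output_names(example))
```

Keeping the fallback means existing two-column files would keep working unchanged.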

The generated poses are saved as SDF; could they also be saved as mol2 files? It is not a big problem, since I can convert them, but converting millions of files takes time, and most rescoring tools accept mol2 files, so writing mol2 directly would help.

Best,
Christian

Does FABind provide some confidence/affinity score?

Hi Qizhi,

Does FABind provide a confidence or affinity score that I can use to decide whether a protein-ligand pair binds or not?
I have tried setting the confidence score as

pred_index_true = pocket_cls_pred[i][:j].sigmoid().unsqueeze(-1) # pocket predicted probability 
pred_index_false = 1. - pred_index_true
pred_index_prob = torch.cat([pred_index_false, pred_index_true], dim=-1)

pred_index_log_prob = torch.log(pred_index_prob)
pred_index_one_hot = gumbel_softmax_no_random(pred_index_log_prob,
                                              tau=self.args.gs_tau,
                                              hard=self.args.gs_hard)
pred_index_one_hot_true = pred_index_one_hot[:, 1].unsqueeze(-1)
pred_confidence_gumbel = pred_index_one_hot_true * pred_index_true
pred_pocket_confidence[i] = pred_confidence_gumbel.sum(dim=0) / pred_index_one_hot_true.sum(dim=0)

However, the confidence scores come out very high (above 0.9) even for arbitrary protein-ligand pairs.
Therefore, I would like to ask if you have a better suggestion.
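One possible reading of why this score saturates is sketched below in plain Python. It assumes that `gumbel_softmax_no_random` with a hard one-hot output behaves like an argmax over the two classes (its implementation is not shown in this issue, so this is an assumption). Under that assumption the score is the mean pocket probability taken only over residues already classified as pocket, so it cannot fall below 0.5 and reflects how sharp the pocket classifier is rather than whether the ligand binds. The helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pocket_confidence(logits):
    """Mean predicted probability over residues classified as pocket.

    Assumption: the deterministic, hard selection picks a residue exactly
    when its pocket probability exceeds 0.5.
    """
    probs = [sigmoid(x) for x in logits]
    selected = [p for p in probs if p > 0.5]
    if not selected:
        return float("nan")  # no residue selected, score undefined
    return sum(selected) / len(selected)


# Mostly "not pocket" residues plus two confidently predicted pocket residues.
logits = [-4.0, -3.0, -5.0, 3.0, 4.0]
print(round(pocket_confidence(logits), 3))
```

The three low-probability residues do not affect the result at all, which is why the value stays high regardless of the pair.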

When will FABind+ be released?

The README says soon, any idea when that would be? I'd be really interested in trying it!

Alternatives to SMILES

How can I run inference using a PDB file for the protein and a mol2 or SDF file for the ligand?

Return complex hidden features in inference

Return complex hidden features in inference so people can also use these features to do further prediction or other tasks.

Reproducibility of Results with Inference Mode

Hi,

I'm quite impressed with the concept presented in your work; it has the potential to save considerable time. I attempted to reproduce your results in inference mode, as described in the README, but encountered discrepancies in both RMSD scores and visualizations compared to the reported results.

Here are the RMSD scores I obtained:

6g3c rmsd: 13.472411671189839
6npi rmsd: 13.028547491615917
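For reference when comparing numbers like these, a minimal RMSD sketch is shown below. It assumes the two coordinate lists are already atom-matched and in the same frame (no superposition, no symmetry correction); whether the reported values were computed exactly this way is not stated in this issue.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy two-atom example; real inputs would be ligand heavy-atom coordinates.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(predicted, reference))
```

A mismatch in atom ordering between the predicted and reference ligand inflates this value, so atom ordering is worth checking before concluding that the pose itself is wrong.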

Additionally, I observed differences in the docking visualizations between my replication attempts in inference mode and the results reported in your paper:

PDB ID: 6G3C (Cyan=FABind, Yellow=Ground Truth, Purple=Diffdock)

[Figure: Reproduced with Inference | Reported in Paper]

PDB ID: 6NPI (Cyan=FABind, Yellow=Ground Truth, Purple=Diffdock)

[Figure: Reproduced with Inference | Reported in Paper]

Could you kindly offer any advice or insights on how to replicate your published results accurately?

Best regards,
Ahmet

Sampling on custom complexes?

The README shows how to do inference on custom complexes. Is there a way to do sampling on custom complexes? I only see the test_sampling_fabind.py file which doesn't seem designed for custom complexes.

No CUDA runtime is found

Hi, while running the inference script after installing all the required packages, I encountered this error. I do have an Nvidia GPU and it was working fine for other projects but not this time.

======  preprocess molecules  ======
No CUDA runtime is found, using CUDA_HOME='/usr/bin/nvcc'
======  preprocess proteins  ======
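In the log above, CUDA_HOME is reported as '/usr/bin/nvcc', which looks like the path to the compiler binary rather than a toolkit directory. As a hedged first check (not a confirmed diagnosis), the sketch below tests whether a path looks like a CUDA toolkit root, i.e. a directory containing bin/nvcc. The helper name `looks_like_cuda_home` is hypothetical.

```python
import os


def looks_like_cuda_home(path):
    """Return True if `path` is a directory that contains bin/nvcc."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "bin", "nvcc"))


cuda_home = os.environ.get("CUDA_HOME", "")
print("CUDA_HOME =", repr(cuda_home), "valid:", looks_like_cuda_home(cuda_home))
```

In the same environment, running `python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"` shows whether the installed PyTorch build can see the GPU at all.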

No Results Written to uid_smiles_sdfname.csv After Running Inference on Custom Complexes

After following the instructions provided in the README to run inference on custom complexes, I noticed that the uid_smiles_sdfname.csv file was created, but no results were written into it.

I suspect the issue may be related to an error encountered during the execution of post_optim_mol at line 371 in fabind_inference.py. To further investigate the problem, I removed the try block surrounding this part of the code. This action led to the generation of the following error message:

Traceback (most recent call last):
  File "fabind_inference.py", line 382, in <module>
    post_optim_mol(args, accelerator, data, com_coord_pred, com_coord_pred_per_sample_list, com_coord_per_sample_list, compound_batch, LAS_tmp=LAS_tmp, rigid=args.rigid)
  File "fabind_inference.py", line 288, in post_optim_mol
    com_coord_i = data[i]['compound'].rdkit_coords
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 177, in __getitem__
    return self.get_example(idx) # type: ignore
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 124, in get_example
    data = separate(
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/separate.py", line 35, in separate
    attrs = slice_dict[key].keys()
KeyError: 'complex'

It seems the error might be preventing the successful writing of results to the uid_smiles_sdfname.csv file. Any assistance in resolving this issue and enabling the proper output to the file would be greatly appreciated.
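Removing the try block is one way to surface the error. An alternative, sketched below with a hypothetical wrapper (`run_step` and `failing_step` are illustrative names, not FABind functions), keeps the run going but logs the full traceback, so a failure like the one above is not silently swallowed.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference_sketch")


def run_step(step, *args, **kwargs):
    """Run `step`; on failure log the full traceback and return None."""
    try:
        return step(*args, **kwargs)
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("step %s failed", getattr(step, "__name__", step))
        return None


def failing_step(batch):
    # Stand-in for a post-processing call that raises a KeyError.
    return batch["complex"]


print(run_step(failing_step, {"compound": 1}))
```

With this pattern an empty output file comes with a logged reason instead of no message at all.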

No pretrained model

Could you please provide the pretrained model?

KeyError 'complex' when running inference on the examples

Hi,

When I used your code to run inference on your example PDBs, I got no results. I checked the error message and found the following:

Traceback (most recent call last):
  File "fabind_inference.py", line 372, in <module>
    post_optim_mol(args, accelerator, data, com_coord_pred, com_coord_pred_per_sample_list, com_coord_per_sample_list, compound_batch, LAS_tmp=LAS_tmp, rigid=args.rigid)
  File "fabind_inference.py", line 288, in post_optim_mol
    com_coord_i = data[i]['compound'].rdkit_coords
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 177, in __getitem__
    return self.get_example(idx) # type: ignore
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 124, in get_example
    data = separate(
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/separate.py", line 35, in separate
    attrs = slice_dict[key].keys()
KeyError: 'complex'

Could you please help me with this error? Thank you very much!

FABind+

Hi,

I just saw that fabind_inference.py was updated 2 weeks ago. Was it updated to FABind+, or is it just an update of FABind? Are you going to create a new project for FABind+ or update the existing FABind project?

Thank you,
Christian

Binding scores?

Hi,

I was able to run your examples without problems. However, I could not find a file containing binding scores. Does FABind generate only poses, so that I need a rescoring tool such as BR-Nib to rank them?

Thanks for your help,
Christian

not all ligands submitted get through

Segmentation fault

When running evaluation, an error occurs when iterating through the "new_dataset" in the test set at index 17300. The error message is as follows:

problem with preprocessing smiles in FABind+

Hi,

In FABind+, preprocessing the ligands gives me this error:

(fabind) christian@christian-linux01:/media/christian/VS/VS/VS_FABind_plus/fabind$ python inference_preprocess_mol_confs.py --index_csv ../inference_examples/example.csv --save_mols_dir ../inference_examples/temp_files
Traceback (most recent call last):
  File "inference_preprocess_mol_confs.py", line 22, in <module>
    smiles, pdb = line.strip().split(',')
ValueError: too many values to unpack (expected 2)

I think the example file originally did not include the molecule name next to the SMILES; it now contains 3 columns, which breaks the two-value unpacking.
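A tolerant parser for the failing line might look like the sketch below. It assumes the first two columns are still the SMILES and the pdb_id and that any further columns (for example a molecule name) come after them; the actual column order of the FABind+ example file is an assumption here, and `parse_index_line` is a hypothetical helper.

```python
def parse_index_line(line):
    """Split one index-CSV line into (smiles, pdb_id, extra_columns)."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 columns, got {len(fields)}: {line!r}")
    # Star-unpacking collects any columns beyond the first two.
    smiles, pdb_id, *extra = fields
    return smiles, pdb_id, extra


print(parse_index_line("O=C(O)c1ccccc1-n1cccc1,6npi"))
print(parse_index_line("O=C(O)c1ccccc1-n1cccc1,6npi,lig_001"))
```

The same function accepts both the old two-column and the newer three-column layout.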

If I preprocess a library of SMILES and want to reuse it with other targets, I assume I do not have to preprocess it again?

Thanks,
Christian
