qizhipei / fabind
FABind: Fast and Accurate Protein-Ligand Binding (NeurIPS 2023)
Home Page: https://arxiv.org/abs/2310.06763
License: MIT License
Hi,
This is not an issue per se, more a question and a couple of suggestions.
In your example, the index file is structured as follows:
Cleaned_SMILES,pdb_id
CCC@HC@HC(=O)NC@@HC(=O)NC@@HC(=O)NC@HC(C)C,6efk
CC(C)CCN1c2nc(Nc3cc(F)c(O)c(F)c3)ncc2N(C)C(=O)C1(C)C,6g3c
CC(C)(COP@@(O)OP@(O)OC[C@H]1OC@@HC@H[C@@h]1OP(=O)(O)O)C@@HC(=O)NCCC(=O)NCCO,6n93
O=C(O)c1ccccc1-n1cccc1,6npi
The outputs are SDF files named with the pdb_id plus a number. If I were to generate millions of poses on a single receptor, as in a virtual screening setup, it would be more practical to name the files by a compound ID instead of the protein ID. This is just a suggestion for a future update, in case you do not already have a screening mode: if a cmpd_id column were present in the header, the SDF file would be named after the cmpd_id instead of having a number appended to the pdb_id.
The generated poses are saved as SDF files. Can they be saved as mol2 files instead? It is not a big problem, since I can convert them, but converting several million files takes time, and most rescoring tools accept mol2 files, so writing mol2 directly would help.
Best,
Christian
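A minimal sketch of the naming scheme suggested above, assuming an extra `cmpd_id` column in the index CSV. The column name, the `output_names` helper, and the `pdb_id_number` fallback format are all hypothetical and are not part of FABind:

```python
import csv
import io


def output_names(index_csv_text, id_column="cmpd_id", ext="sdf"):
    """Map each row of the index CSV to an output file name.

    Uses the (hypothetical) compound-id column when it is present and
    non-empty; otherwise falls back to pdb_id plus the row number.
    """
    reader = csv.DictReader(io.StringIO(index_csv_text))
    names = []
    for row_number, row in enumerate(reader):
        cmpd_id = (row.get(id_column) or "").strip()
        if cmpd_id:
            names.append(f"{cmpd_id}.{ext}")
        else:
            names.append(f"{row['pdb_id']}_{row_number}.{ext}")
    return names


example = """Cleaned_SMILES,pdb_id,cmpd_id
O=C(O)c1ccccc1-n1cccc1,6npi,CMPD000001
CC(C)CCN1c2nc(Nc3cc(F)c(O)c(F)c3)ncc2N(C)C(=O)C1(C)C,6g3c,
"""
print(output_names(example))
```

Rows without a compound ID keep the existing behaviour, so an index file with only the two original columns would still work.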
Hi Qizhi,
Does FABind provide a confidence or affinity score that I can use to decide whether a protein-ligand pair binds?
I have tried defining a confidence score as follows:
pred_index_true = pocket_cls_pred[i][:j].sigmoid().unsqueeze(-1) # pocket predicted probability
pred_index_false = 1. - pred_index_true
pred_index_prob = torch.cat([pred_index_false, pred_index_true], dim=-1)
pred_index_log_prob = torch.log(pred_index_prob)
pred_index_one_hot = gumbel_softmax_no_random(pred_index_log_prob,
                                              tau=self.args.gs_tau,
                                              hard=self.args.gs_hard)
pred_index_one_hot_true = pred_index_one_hot[:, 1].unsqueeze(-1)
pred_confidence_gumbel = pred_index_one_hot_true * pred_index_true
pred_pocket_confidence[i] = pred_confidence_gumbel.sum(dim=0) / pred_index_one_hot_true.sum(dim=0)
However, the resulting confidence scores are very high (> 0.9) even for arbitrary protein-ligand pairs.
Do you have a better suggestion?
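One possible reading of why this score comes out high: it averages the pocket probability only over residues that are already classified as pocket, so it cannot fall below 0.5 and mostly reflects how sharp the pocket classifier is, not whether the ligand binds. The sketch below reproduces that arithmetic in plain Python. It assumes the hard, noise-free Gumbel-softmax reduces to picking the more likely class (p > 0.5); the `pocket_confidence` helper and the logits are illustrative only:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pocket_confidence(logits):
    """Mean predicted probability over residues classified as pocket.

    Assumption: the hard one-hot selection keeps residues with p > 0.5.
    """
    probs = [sigmoid(x) for x in logits]
    selected = [p for p in probs if p > 0.5]
    if not selected:
        return float("nan")  # corresponds to the 0/0 case in the snippet
    return sum(selected) / len(selected)


confident = [4.0, 3.5, -6.0, -5.0]   # sharp pocket / non-pocket split
uncertain = [0.1, 0.2, -0.1, -0.3]   # classifier barely above chance
print(pocket_confidence(confident))
print(pocket_confidence(uncertain))
```

Even the nearly uninformative logits give a value above 0.5, which suggests the quantity is bounded from below by construction rather than calibrated as a binding score.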
The README says soon, any idea when that would be? I'd be really interested in trying it!
How can I run inference using a PDB file for the protein and a mol2 or SDF file for the ligand?
Please return the complex hidden features during inference, so that people can use them for further prediction or other downstream tasks.
Hi,
I'm quite impressed with the concept presented in your work; it has the potential to save considerable time. I attempted to reproduce your results in inference mode, as described in the README, but found discrepancies in both the RMSD scores and the visualizations compared to the reported results.
Here are the RMSD scores I obtained:
6g3c rmsd: 13.472411671189839
6npi rmsd: 13.028547491615917
Additionally, I observed differences in the docking visualizations between my replication attempts in inference mode and the results reported in your paper:
[Images: docking visualizations for the two complexes, shown side by side as "Reproduced with Inference" and "Reported in Paper"]
Could you kindly offer any advice or insights on how to replicate your published results accurately?
Best regards,
Ahmet
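For reference, a minimal sketch of how a ligand RMSD like the values quoted above can be computed. It assumes both poses list the same atoms in the same order and are in the same coordinate frame (no alignment, no symmetry correction); the coordinates are made up for illustration:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by exactly 2 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0)]
print(rmsd(predicted, reference))  # 2.0
```

Values around 13 Å often mean the pose sits far from the reference, for example in a different pocket or a different coordinate frame, so it may be worth checking that the predicted and reference structures share the same protein coordinates before comparing.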
The README shows how to do inference on custom complexes. Is there a way to do sampling on custom complexes? I only see the test_sampling_fabind.py file which doesn't seem designed for custom complexes.
Hi, while running the inference script after installing all the required packages, I encountered this error. I do have an Nvidia GPU and it was working fine for other projects but not this time.
====== preprocess molecules ======
No CUDA runtime is found, using CUDA_HOME='/usr/bin/nvcc'
====== preprocess proteins ======
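The value printed in the log above, `/usr/bin/nvcc`, looks like a path to the compiler binary, whereas CUDA_HOME is normally expected to be the CUDA installation directory. The sketch below is a small standard-library check of that variable; the `describe_cuda_home` helper is hypothetical. It does not replace checking `torch.cuda.is_available()`, since the message can also appear when the installed PyTorch build cannot see the GPU at all:

```python
import os


def describe_cuda_home(cuda_home):
    """Return a short diagnosis of a CUDA_HOME value."""
    if not cuda_home:
        return "unset"
    if os.path.isfile(cuda_home):
        return "points to a file, expected a directory"
    if not os.path.isdir(cuda_home):
        return "directory does not exist"
    if not os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")):
        return "directory has no bin/nvcc"
    return "ok"


print(describe_cuda_home(os.environ.get("CUDA_HOME")))
```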
After following the instructions provided in the README to run inference on custom complexes, I noticed that the uid_smiles_sdfname.csv file was created, but no results were written into it.
I suspect the issue may be related to an error encountered during the execution of post_optim_mol at line 371 in fabind_inference.py. To further investigate the problem, I removed the try block surrounding this part of the code. This action led to the generation of the following error message:
Traceback (most recent call last):
  File "fabind_inference.py", line 382, in <module>
    post_optim_mol(args, accelerator, data, com_coord_pred, com_coord_pred_per_sample_list, com_coord_per_sample_list, compound_batch, LAS_tmp=LAS_tmp, rigid=args.rigid)
  File "fabind_inference.py", line 288, in post_optim_mol
    com_coord_i = data[i]['compound'].rdkit_coords
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 177, in __getitem__
    return self.get_example(idx) # type: ignore
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 124, in get_example
    data = separate(
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/separate.py", line 35, in separate
    attrs = slice_dict[key].keys()
KeyError: 'complex'
It seems the error might be preventing the successful writing of results to the uid_smiles_sdfname.csv file. Any assistance in resolving this issue and enabling the proper output to the file would be greatly appreciated.
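Instead of removing the try block to see the error, a step can log the full traceback and still let the loop continue. This is a generic sketch, not the code in fabind_inference.py; `run_step` and `failing_step` are hypothetical stand-ins:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")


def run_step(step, *args, **kwargs):
    """Run one step; on failure, log the full traceback and return None."""
    try:
        return step(*args, **kwargs)
    except Exception:
        logger.exception("step %s failed", getattr(step, "__name__", step))
        return None


def failing_step(batch):
    # Stand-in for a call such as post_optim_mol; raises the same error type.
    return batch["complex"]


# Logs the KeyError traceback, then prints None.
print(run_step(failing_step, {"compound": 1}))
```

With this pattern, a silent empty output file would instead come with a logged KeyError pointing at the failing line.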
Could you please provide the pretrained model?
Hi,
When I ran inference with your code on your example PDBs, I got no results. I checked the error message and found the following:
Traceback (most recent call last):
  File "fabind_inference.py", line 372, in <module>
    post_optim_mol(args, accelerator, data, com_coord_pred, com_coord_pred_per_sample_list, com_coord_per_sample_list, compound_batch, LAS_tmp=LAS_tmp, rigid=args.rigid)
  File "fabind_inference.py", line 288, in post_optim_mol
    com_coord_i = data[i]['compound'].rdkit_coords
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 177, in __getitem__
    return self.get_example(idx) # type: ignore
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 124, in get_example
    data = separate(
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/separate.py", line 35, in separate
    attrs = slice_dict[key].keys()
KeyError: 'complex'
Could you please help me with this error? Thank you very much!
Hi,
I just saw that fabind_inference.py was updated 2 weeks ago. Was it updated to FABind+, or is it just an update of FABind? Are you going to create a new project for FABind+ or update the existing FABind project?
Thank you,
Christian
Hi,
I was able to run your examples without problems. However, I did not find a file that contains the binding scores. Does FABind generate only poses, so that I need a rescoring tool such as BR-Nib to rank the binding poses?
Thanks for your help,
Christian
Hi,
In FABind+, preprocessing the ligands gives me this error:
(fabind) christian@christian-linux01:/media/christian/VS/VS/VS_FABind_plus/fabind$ python inference_preprocess_mol_confs.py --index_csv ../inference_examples/example.csv --save_mols_dir ../inference_examples/temp_files
Traceback (most recent call last):
  File "inference_preprocess_mol_confs.py", line 22, in <module>
    smiles, pdb = line.strip().split(',')
ValueError: too many values to unpack (expected 2)
I think the example file originally did not include the molecule name alongside the SMILES; it now contains 3 columns, while the script still expects 2.
Also, if I preprocess a library of SMILES and want to reuse it with other targets, I assume I do not have to preprocess it again?
Thanks,
Christian
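The unpacking error above comes from splitting each line into exactly two values. A minimal sketch of a reader that tolerates extra columns is shown below. It looks columns up by header name; `Cleaned_SMILES` and `pdb_id` are taken from the example index file quoted earlier on this page, while `mol_name` and the `read_index` helper are hypothetical, and the actual header of the FABind+ example file is not known here:

```python
import csv
import io


def read_index(csv_text, smiles_col="Cleaned_SMILES", id_col="pdb_id"):
    """Yield (smiles, identifier, extra_columns) for each data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {smiles_col, id_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"index CSV lacks columns: {sorted(missing)}")
    for row in reader:
        # Keep any additional columns instead of failing on them.
        extras = {k: v for k, v in row.items() if k not in (smiles_col, id_col)}
        yield row[smiles_col], row[id_col], extras


two_cols = "Cleaned_SMILES,pdb_id\nO=C(O)c1ccccc1-n1cccc1,6npi\n"
three_cols = "Cleaned_SMILES,pdb_id,mol_name\nO=C(O)c1ccccc1-n1cccc1,6npi,ligand_1\n"
print(list(read_index(two_cols)))
print(list(read_index(three_cols)))
```

Reading by column name means the same code handles both the two-column and three-column layouts, regardless of column order.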