cch1999 / posecheck

Pose checks for 3D Structure-based Drug Design methods

License: MIT License

Python 98.81% Jupyter Notebook 1.19%

docking drug-design generative-model molecule-generation

posecheck's Introduction

Hi there 👋

🔭 I'm a PhD student at the University of Cambridge in the Cambridge Centre for AI in Medicine. My interests are in deep learning applied to solving problems in structural biology and drug design. I am supervised by Prof Sir Tom Blundell and Prof Pietro Lio.
🌱 I was previously at Imperial College London where I gained a background in Biochemistry and Structural Biology. Whilst there, I developed novel Geometric Deep Learning methods for protein structure under the supervision of Prof Michael Bronstein.
📫 How to reach me:
- twitter: @charlieharris01
- linkedin: in/charlieharris
- email: <cch57 [at] cam dot ac dot uk>
- website: cch1999.github.io

posecheck's People

Contributors

Stargazers

Watchers

Forkers

modelturnedgeek kir- caiyingchun amorehead andchencm

posecheck's Issues

severe bug after updating version

Dear authors,

I have followed your work and have been using PoseCheck to calculate clash and strain energy since last year. Recently I pulled your update, recalculated those numbers and found a severe mismatch in the strain energies.

Previously, the 25th, 50th and 75th percentiles of strain energy on the CrossDocked test set were 36, 114 and 205, which is quite reasonable.
However, after pulling, they became 1.08, 3.56 and 6.19.

This is a BIG PROBLEM.
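To compare two versions on equal terms, the quartiles can be recomputed directly from the per-molecule strain energies each version produces. A minimal sketch using only the standard library; strain_quartiles is a hypothetical helper (not part of PoseCheck) and it assumes the strain energies are already available as a list of numbers:

```python
import math
import statistics


def strain_quartiles(strain_energies):
    """Return the 25th, 50th and 75th percentiles of per-molecule strain energies.

    Non-finite values (NaN or inf) are dropped first so that any failed
    calculations do not distort the summary. method="inclusive" interpolates
    linearly between order statistics, like NumPy's default np.percentile.
    """
    finite = [float(e) for e in strain_energies if math.isfinite(e)]
    if len(finite) < 2:
        raise ValueError("need at least two finite strain energies")
    q25, q50, q75 = statistics.quantiles(finite, n=4, method="inclusive")
    return q25, q50, q75


# Usage (hypothetical lists produced by the old and new PoseCheck versions):
# print(strain_quartiles(old_strains), strain_quartiles(new_strains))
```

Running the same helper on both outputs rules out differences in how the percentiles themselves were computed.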

Could you re-check the updated code to make sure the new version reproduces the results in your paper?

I think your last commit in 2023 corresponds to the version used for the paper's results.

Thanks,
Yanru

About pc.calculate_interactions and complete environment config

Dear authors,
Thanks for sharing this amazing work.
However, we encountered the following errors when running pc.calculate_interactions():

Traceback (most recent call last):
  File "/home/miniconda3/envs/pc/lib/python3.8/site-packages/loguru/_locks_machinery.py", line 29, in release_locks
    def release_locks():
  File "/home/miniconda3/envs/pc/lib/python3.8/site-packages/gsd/__init__.py", line 24, in <lambda>
    signal.signal(signal.SIGTERM, lambda n, f: sys.exit(1))
SystemExit: 1
Exception ignored in: <function release_locks at 0x7fa895bcb940>
Traceback (most recent call last):
  File "/home/miniconda3/envs/pc/lib/python3.8/site-packages/loguru/_locks_machinery.py", line 29, in release_locks
    def release_locks():
  File "/home/miniconda3/envs/pc/lib/python3.8/site-packages/gsd/__init__.py", line 24, in <lambda>
    signal.signal(signal.SIGTERM, lambda n, f: sys.exit(1))
SystemExit: 1

Can you suggest what to do?

Could you also provide the complete environment configuration for PoseCheck? That would be helpful.
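When reporting or sharing an environment, listing the installed versions of the packages involved is a useful start. A minimal sketch using importlib.metadata; the package list is an assumption (the libraries named in the traceback plus likely PoseCheck dependencies), and packages installed without Python distribution metadata, as may happen with some conda builds, will show as "not found":

```python
import sys
from importlib import metadata

# Assumed selection: libraries named in the traceback plus likely PoseCheck
# dependencies. Edit this list to match your own setup.
PACKAGES = ["posecheck", "rdkit", "prolif", "MDAnalysis", "gsd", "loguru", "numpy", "pandas"]


def package_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(names=PACKAGES):
    """Format the Python version and package versions as plain text lines."""
    lines = [f"python {sys.version.split()[0]}"]
    for name, version in package_versions(names).items():
        lines.append(f"{name} {version or 'not found'}")
    return "\n".join(lines)


# Usage: print(environment_report()) and paste the output into the issue.
```

For a complete snapshot rather than a hand-picked list, conda env export or pip freeze records every installed package.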

Data from the paper DOI not found

Dear Authors,

It looks like the Zenodo link has expired: when I try to access it, it shows an error page saying DOI NOT FOUND. Would you kindly fix it? Thank you!

Thanks,
Curtis(Yuchen) Wu

The position constraint of 0.1 angstrom did not work as expected

Dear authors,

The idea of doing a small local relaxation before computing strain energies, to fix minor inconsistencies, is great, but the current implementation does not seem to work as expected. You set the maximum atom displacement to 0.1 Å using RDKit's UFFAddPositionConstraint function, with forceConstant=1. However, if you check the atom displacements with the following code on a regular ligand,
6ten_ligand.txt (change the suffix to .sdf before testing), you will find that the displacement exceeds the set threshold:

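from rdkit import Chem
from rdkit.Chem import ChemicalForceFields

# Load the ligand and add hydrogens with 3D coordinates
m = Chem.SDMolSupplier('6ten_ligand.sdf')[0]
m = Chem.AddHs(m, addCoords=True)
ff = ChemicalForceFields.UFFGetMoleculeForceField(m)
conf = m.GetConformer()
# Record the starting position of atom 1
p = conf.GetAtomPosition(1)
# Restrain atom 1 to at most 0.1 Å of movement, with forceConstant=1 as in PoseCheck
ff.UFFAddPositionConstraint(1, maxDispl=0.1, forceConstant=1)
r = ff.Minimize()
q = conf.GetAtomPosition(1)
# This assertion fails: atom 1 moves further than maxDispl
assert((p - q).Length() < 0.1)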

I looked into this problem and found that the force constant must be large enough that an atom moving beyond maxDispl receives a large energy penalty during minimization. The solution is simply to change forceConstant to 1e5, as is done in an RDKit test script.
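The reasoning can be checked with a little arithmetic. Assuming the position constraint is a flat-bottomed harmonic term, E = 0.5 * k * (d - maxDispl)^2 for d > maxDispl and zero otherwise (our reading of how RDKit implements it), an atom pushed outward by a net force F from the rest of the force field settles where k * (d - maxDispl) = F, that is d = maxDispl + F / k. The sketch below uses a hypothetical force of 10 kcal/(mol·Å) purely for illustration:

```python
def restraint_energy(displacement, max_displ, force_constant):
    """Flat-bottomed harmonic penalty: zero up to max_displ, quadratic beyond.

    This is the functional form assumed here for RDKit's UFF position constraint.
    """
    excess = max(0.0, displacement - max_displ)
    return 0.5 * force_constant * excess ** 2


def equilibrium_displacement(force, max_displ, force_constant):
    """Distance at which a constant outward force is balanced by the restraint.

    Beyond max_displ the restoring force is force_constant * (d - max_displ),
    so the two balance at d = max_displ + force / force_constant.
    """
    if force <= 0:
        return 0.0  # nothing pushes the atom away from its start
    return max_displ + force / force_constant


# Hypothetical net force of 10 kcal/(mol*Å) from the rest of the force field
for k in (1.0, 1e5):
    d = equilibrium_displacement(10.0, 0.1, k)
    print(f"forceConstant={k:g}: atom settles about {d:.4f} Å from its start")
```

With forceConstant=1 the 0.1 Å bound barely constrains the atom, while 1e5 limits the overshoot to about 1e-4 Å, which matches the behaviour described in this issue.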

I have created a PR to fix this issue, along with some minor modifications. Please check if it is appropriate.

unexpected behavior when calculating key interactions, and asking for help when using your code

Hi,

Recently I've been using PoseCheck to evaluate clash, strain energy and key interactions for SBDD models, and I found the following unexpected behaviors:

When loading CrossDocked test-set proteins, a ValueError occurs: "vdw radii for types: CO. These can be defined manually using the keyword 'vdwradii'", so the protein fails to load. I did a quick search but could not find a relevant solution, except that Copilot suggests inserting the following code:

vdw_radii = {'CO': 1.7}  # Define the vdw radius for 'CO'. The value 1.7 is just an example, replace it with the correct value.
ag.guess_bonds(vdwradii=vdw_radii)

I'm not sure whether 1.7 is correct for CO. Also, what is ag, and how should guess_bonds be called?

Since running PoseCheck on more than 10K molecules is quite slow, multiprocessing is necessary. However, an error occurs if multiple processes load the same protein at the same time. After some debugging, I traced it to the following code (starting at line 60 in posecheck/utils/loading.py):

tmp_path = pdb_path.split(".pdb")[0] + "_tmp.pdb"

# Call reduce to make tmp PDB with waters
reduce_command = f"{reduce_path} -NOFLIP  {pdb_path} -Quiet > {tmp_path}"
subprocess.run(reduce_command, shell=True)

# Load the protein from the temporary PDB file
prot = load_protein_prolif(tmp_path)
os.remove(tmp_path)

I suspect that when multiple processes access the same tmp_path, one process may remove the file before another has loaded it. So I made the following quick fix:

tmp_path = pdb_path.split(".pdb")[0] + "_tmp.pdb"
while os.path.exists(tmp_path):
    hash_code = str(hash(tmp_path))[:4]
    tmp_path = tmp_path[:-8] + '_' + hash_code + "_tmp.pdb"
print(tmp_path)

This quick fix works for me. Please check whether it is correct, and consider updating your source code for better multiprocessing support.
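The os.path.exists loop still leaves a window between checking a name and reduce writing to it, during which another process can pick the same name. A sketch of an alternative, assuming it is acceptable to keep the temporary file next to the input PDB: tempfile.mkstemp creates the file atomically under a random name, so every call gets its own path. unique_tmp_pdb is a hypothetical helper, not part of PoseCheck:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def unique_tmp_pdb(pdb_path):
    """Yield a temporary PDB path next to pdb_path that no other call can share.

    tempfile.mkstemp creates the file atomically (O_EXCL) under a random name,
    so there is no gap between choosing the name and claiming it, unlike a
    loop that tests os.path.exists first.
    """
    directory = os.path.dirname(os.path.abspath(pdb_path))
    stem = os.path.splitext(os.path.basename(pdb_path))[0]
    fd, tmp_path = tempfile.mkstemp(prefix=stem + "_", suffix="_tmp.pdb", dir=directory)
    os.close(fd)  # reduce writes to the path through the shell redirect
    try:
        yield tmp_path
    finally:
        # Only this call knows tmp_path, so removing it cannot disturb other workers
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


# Usage inside the loading code quoted above:
# with unique_tmp_pdb(pdb_path) as tmp_path:
#     subprocess.run(f"{reduce_path} -NOFLIP {pdb_path} -Quiet > {tmp_path}", shell=True)
#     prot = load_protein_prolif(tmp_path)
```

The try/finally also removes the temporary file if loading raises, which the original os.remove call would skip.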

Apart from those bugs, I would appreciate your help with the following:

In the README tips, you say: "Reading and processing all the PDB files using reduce can take a while for a large test set. If you are running PoseCheck frequently, it might be worth pre-processing all proteins yourself using prot = posecheck.utils.loading.load_protein_from_pdb(pdb_path) and setting this directly within PoseCheck using pc.protein = prot." I guess you are referring to the following code, which can be quite time-consuming:

# Call reduce to make tmp PDB with waters
reduce_command = f"{reduce_path} -NOFLIP  {pdb_path} -Quiet > {tmp_path}"
subprocess.run(reduce_command, shell=True)

How about adding an interface to preprocess all protein files? I'm also curious how much time can be saved by preprocessing the proteins.
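Following the README tip, one option within a single process is to memoize the loader so each PDB goes through reduce only once, however many ligands share it. A minimal sketch; make_cached_loader is hypothetical, load_fn would presumably be posecheck.utils.loading.load_protein_from_pdb, and it assumes PoseCheck does not modify the protein object it is given via pc.protein = prot:

```python
from functools import lru_cache


def make_cached_loader(load_fn, maxsize=None):
    """Wrap an expensive protein loader so each PDB path is processed only once.

    Repeated calls with the same path return the same cached object instead of
    running reduce again. The cache lives in the current process only.
    """
    @lru_cache(maxsize=maxsize)
    def load(pdb_path):
        return load_fn(pdb_path)

    return load


# Usage (assuming the README's loader and a PoseCheck instance pc):
# load = make_cached_loader(posecheck.utils.loading.load_protein_from_pdb)
# for pdb_path, ligand in pairs:
#     pc.protein = load(pdb_path)
#     # ...score the ligand as usual
```

maxsize=None keeps every protein in memory; for a large test set, a bounded maxsize or processing ligands in protein order keeps memory in check.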

Could you provide official code for counting key interactions? Since pc.calculate_interactions() returns a complicated dataframe, I'm not sure whether my parsing is correct:

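import numpy as np

df = pc.calculate_interactions()
# Each column is a (ligand, protein residue, interaction type) tuple
columns = np.array([column[2] for column in df.columns])
# Boolean flags for the first (and here only) pose
flags = np.array([bool(df[column].iloc[0]) for column in df.columns])

def count_inter(inter_type):
    """Number of residues showing the given interaction type."""
    if len(columns) == 0:
        return 0
    count = int(np.sum((columns == inter_type) & flags))
    return count

# Interaction types seen in the output: ['Hydrophobic', 'HBDonor', 'VdWContact', 'HBAcceptor']
hb_donor = count_inter('HBDonor')
hb_acceptor = count_inter('HBAcceptor')
vdw = count_inter('VdWContact')
hydrophobic = count_inter('Hydrophobic')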

Please verify my code and provide some formal guidelines.
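A pandas-free way to get every count at once is to tally the interaction-type element of each flagged column. A sketch that assumes, as the parsing above does, that each column is a (ligand, residue, interaction) tuple; count_interactions is a hypothetical helper:

```python
from collections import Counter


def count_interactions(columns, flags):
    """Count residue-level interactions of each type for a single pose.

    columns: iterable of (ligand, residue, interaction_type) tuples.
    flags:   matching booleans saying whether each interaction is present.
    """
    counts = Counter()
    for column, present in zip(columns, flags):
        if present:
            counts[column[2]] += 1
    return counts
```

With the dataframe above this would be called as count_interactions(df.columns, df.iloc[0]). Because a Counter returns 0 for missing keys, interaction types that never occur need no special case.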

When using PoseCheck, many warning messages like these are printed:

/opt/conda/lib/python3.9/site-packages/MDAnalysis/topology/guessers.py:146: UserWarning: Failed to guess the mass for the following atom types: CO
  warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
/opt/conda/lib/python3.9/site-packages/MDAnalysis/converters/RDKit.py:473: UserWarning: No `bonds` attribute in this AtomGroup. Guessing bonds based on atoms coordinates
  warnings.warn(
WARNING: atom  ZN from ZN will be treated as zinc
*WARNING*: Residues GLN 84  and HIS 90  in chain  B appear unbonded 
            and will be treated as a chain break
*WARNING*: Residues GLN 84  and HIS 90  in chain  B appear unbonded 
            and will be treated as a chain break
*WARNING*: Residues LEU 98  and ASP 101  in chain  B appear unbonded 
            and will be treated as a chain break
*WARNING*: Residues LEU 98  and ASP 101  in chain  B appear unbonded 
            and will be treated as a chain break
*WARNING*: Res "ZN" not in HETATM Connection Database. Hydrogens not added.
*WARNING*: Res "K" not in HETATM Connection Database. Hydrogens not added.
*WARNING*: Res "K" not in HETATM Connection Database. Hydrogens not added.

I assume these messages don't change the results, right? Do you know how to suppress them? A verbose flag that turns them off would be quite useful.
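These messages seem to come from two sources. The UserWarning lines are Python warnings raised inside MDAnalysis and can be filtered with the warnings module. The *WARNING* lines appear to come from reduce, which runs as a subprocess; since they still reach the terminal while reduce's stdout is redirected to the temporary file, they are presumably written to stderr and can only be hidden by discarding it. A sketch of both; quiet_python_warnings and run_quietly are hypothetical helpers:

```python
import contextlib
import subprocess
import warnings


@contextlib.contextmanager
def quiet_python_warnings(module=r"MDAnalysis"):
    """Temporarily ignore UserWarnings issued from modules matching `module`.

    `module` is a regular expression matched against the start of the module
    name, so the default covers submodules such as MDAnalysis.topology.guessers.
    An empty string matches every module.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning, module=module)
        yield


def run_quietly(command):
    """Run a shell command with its stderr discarded (stdout is left alone)."""
    return subprocess.run(command, shell=True, stderr=subprocess.DEVNULL)


# Usage (hypothetical):
# with quiet_python_warnings():
#     df = pc.calculate_interactions()
# run_quietly(reduce_command)  # in place of subprocess.run(reduce_command, shell=True)
```

Discarding stderr also hides genuine reduce errors, so it is worth keeping behind a verbose flag rather than making it unconditional.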

I think many people building on your work use the CrossDocked dataset. An official, efficient evaluation script for CrossDocked (many proteins, each associated with many molecules) would be beneficial. I would be happy to discuss this with you and contribute to the script.
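Since each CrossDocked protein comes with many molecules, one way to structure such a script is to group ligands by protein and hand each protein to exactly one worker, so the protein is loaded once and no two processes run reduce on the same PDB. A minimal sketch; evaluate_protein is a hypothetical top-level function you would write (load the protein, set pc.protein, score its ligands), and it must be importable by the worker processes:

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def group_by_protein(pairs):
    """Group (protein_path, ligand_path) pairs so each protein appears once."""
    groups = defaultdict(list)
    for protein_path, ligand_path in pairs:
        groups[protein_path].append(ligand_path)
    return dict(groups)


def evaluate_all(pairs, evaluate_protein, max_workers=None):
    """Run evaluate_protein(protein_path, ligand_paths) once per protein in parallel.

    Each protein is assigned to a single task, so its temporary reduce output
    is never shared between processes.
    """
    groups = group_by_protein(pairs)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = {p: pool.submit(evaluate_protein, p, ligs) for p, ligs in groups.items()}
        return {p: f.result() for p, f in futures.items()}
```

When calling evaluate_all from a script, wrap the call in an if __name__ == "__main__": guard so that platforms which spawn worker processes do not re-run it on import.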

If you would like to discuss this, drop me an email at kevinqu16 [at] gmail [dot] com.
That's all. Thank you!
