choderalab / qmlify Goto Github PK

License: MIT License

Python 3.54% Jupyter Notebook 96.43% Shell 0.03%

qmlify's Introduction

qmlify

============================== []: # (Badges)

qmlify provides python executables for post-processing molecular mechanical (MM) receptor:ligand binding free energy calculations for the purpose of making hybrid machine learning/MM (ML/MM) free energy corrections. Current ML implementations are with Torchani's ANI2x model. MM free energy calculations are conducted with Perses

The code is experimental and subject to change.

Installation

Install qmlify master via git clone:

git clone https://github.com/choderalab/qmlify.git
cd qmlify
python setup.py install

You will also need the latest conda release of Perses:

conda install -c omnia perses

as well as a couple of classes from coddiwomple and Arsenic:

git clone https://github.com/choderalab/coddiwomple.git
cd coddiwomple
python setup.py install

git clone https://github.com/openforcefield/Arsenic.git
cd arsenic
python setup.py install

Use

qmlify postprocesses perses free energy calculations. You will need perses MM relative free energy calculation data before you can post-process it with qmlify

Perses

See experiments/mm_data/README.md for instructions on running perses replica exchange relative free energy calculations. A set of pre-generated perses simulation data is provided at experiments/mm_data, which are queried and analyzed in a demonstration at experiments/free_energy_corrections.ipynb.

qmlify

qmlify uses bidirectional nonequilibrium (NEQ) free energy calculations to anneal from the MM thermodynamic states (afforded by perses) to the hybrid ML/MM thermodynamics states (and back). Once perses free energy calculations are complete, you can extract and reformat the necessary perses files for the purpose of conducting a MM-to-ML/MM corrections using:

from qmlify.executables import perses_extraction_admin
ligand_indices = [(0,12),(0,7),(10,13),(10,6),(11,0),(11,14),(14,2),(14,8),(14,9),(15,14),(15,4),(1,13),(1,6),(2,7),(3,0),(3,13),(4,12),(4,13),(4,14),(4,5),(4,9),(5,0),(6,0),(8,0)]  # the full set of ligand index pairs for Tyk2
perses_extraction_admin(ligand_indices,
                        '<location of perses data directory to extract from>',
                        '<location of local directory to extract to>',
                        phases = ['solvent', 'complex'],
                        delete_execution=False)

Once you have switched to the extract_to directory, execute forward NEQ:

from qmlify.executables import propagation_admin
import os
propagation_admin(ligand_indices, annealing_steps=5000, direction = 'forward', parent_dir = os.getcwd(), extraction_indices = range(0, 200, 2), phases = ['solvent', 'complex'], write_log=True, cat_outputs=False, delete_outputs=False)

followed by resampling and decorrelation at the ML/MM endstate:

propagation_admin(ligand_indices, annealing_steps=5000, direction = 'ani_endstate', parent_dir = os.getcwd(), extraction_indices = 100, phases = ['solvent', 'complex'], write_log=True, cat_outputs=True, delete_outputs=False, eq_steps=5000)

and finally, backward NEQ:

propagation_admin(ligand_indices, annealing_steps=5000, direction = 'backward', parent_dir = os.getcwd(), extraction_indices = 100, phases = ['solvent', 'complex'], write_log=True, cat_outputs=True, delete_outputs=False, eq_steps=5000)

These commands will generate .npz files containing a numpy array of cumulative reduced work values (for each ligand in each pair, for each phase, and for each trajectory launched), the last of which is the total reduced work performed on the trajectory over the protocol. Each work file has a default form of lig{i}to{j}.{phase}.{old/new}.{direction}.idx_{snapshot_index}.{annealing_steps}.works.npz where i, j are ligand index pairs, phase is either 'complex' or 'solvent', old/new denotes whether the work array corresponds to ligand i (old) or j (new), direction is 'forward' or 'backward', snapshot_index is which configuration index is annealed, and annealing_steps denotes the number of integration steps in the NEQ protocol. Forward and backward work distributions can be extracted, and the free energy correction of each phase can be computed (in kT) with the Bennett Acceptance Ratio (BAR) to find the maximum likelihood estimate of the free energy.

The work files can be queried and saved for each phase with:

import numpy as np
from qmlify.analysis import aggregate_pair_works, fully_aggregate_work_dict
work_pair_dictionary = aggregate_pair_works(ligand_indices, annealing_steps = {'complex': 5000, 'solvent':5000}) #this may take a few minutes
aggregated_work_dictionary, concatenated_work_dictionary = fully_aggregate_work_dict(work_pair_dictionary) #process the pair dictionary into something that is easier to analyze
np.savez('work_dictionaries.npz', aggregated_work_dictionary, concatenated_work_dictionary) #save into a compressed file

.See the function documentation for kwargs in qmlify.analysis for a more complete description. Once the work_dictionary.npz is generated, see experiments/free_energy_corrections.ipynb, which demonstrates how to query the work dictionary, calculate BAR free energy estimates, plot the work distributions per ligand, per phase, and perform MM-to-MM/ML free energy corrections (with plots).

Copyright

Authors

dominic rufa
Hannah E. Bruce Macdonald
Josh Fass
Marcus Wieder

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.3.

If you found this repository helpful, consider citing:

@article {Rufa2020.07.29.227959,
        author = {Rufa, Dominic A. and Bruce Macdonald, Hannah E. and Fass, Josh and Wieder, Marcus and Grinaway, Patrick B. and Roitberg, Adrian E. and Isayev, Olexandr and Chodera, John D.},
        title = {Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials},
        elocation-id = {2020.07.29.227959},
        year = {2020},
        doi = {10.1101/2020.07.29.227959},
        publisher = {Cold Spring Harbor Laboratory},
        URL = {https://www.biorxiv.org/content/early/2020/07/30/2020.07.29.227959},
        eprint = {https://www.biorxiv.org/content/early/2020/07/30/2020.07.29.227959.full.pdf},
        journal = {bioRxiv}
}

qmlify's People

Contributors

Stargazers

Watchers

Forkers

lmacaya aunderbtommy888 livazquezs aizvorski hbioquant

qmlify's Issues

Some perses runs fail with "Exception: Programming error: OpenEye atom stereochemistry assumptions failed."

I'm trying to re-run perses using the experiments/mm_data/inputs/run.py script

I noticed that a few of the runs fail, for example:

$ python run.py 4
Warning: The aromaticity of the molecule needs to be perceived first.
Traceback (most recent call last):
  File "/home/alex/miniconda3/envs/qmlify-env/bin/perses-relative", line 10, in <module>
    sys.exit(run())
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/perses/app/setup_relative_calculation.py", line 610, in run
    setup_dict = run_setup(setup_options)
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/perses/app/setup_relative_calculation.py", line 356, in run_setup
    trajectory_directory=trajectory_directory, trajectory_prefix=setup_options['trajectory_prefix'])
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/perses/app/relative_setup.py", line 261, in __init__
    small_molecule_forcefield=small_molecule_forcefield, molecules=molecules, cache=small_molecule_parameters_cache)
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openmmforcefields/generators/system_generators.py", line 206, in __init__
    self.add_molecules(molecules)
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openmmforcefields/generators/system_generators.py", line 233, in add_molecules
    self.template_generator.add_molecules(molecules)
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openmmforcefields/generators/template_generators.py", line 124, in add_molecules
    self._molecules.update( { molecule.to_smiles() : molecule for molecule in molecules } )
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openmmforcefields/generators/template_generators.py", line 124, in <dictcomp>
    self._molecules.update( { molecule.to_smiles() : molecule for molecule in molecules } )
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openforcefield/topology/molecule.py", line 1841, in to_smiles
    smiles = to_smiles_method(self)
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openforcefield/utils/toolkits.py", line 1032, in to_smiles
    oemol = OpenEyeToolkitWrapper.to_openeye(molecule)
  File "/home/alex/miniconda3/envs/qmlify-env/lib/python3.7/site-packages/openforcefield/utils/toolkits.py", line 934, in to_openeye
    'Programming error: OpenEye atom stereochemistry assumptions failed.'
Exception: Programming error: OpenEye atom stereochemistry assumptions failed.
Relative calcluation of ligand 6 to 0 complete

The failing runs are any that involve ligand 6 or 13. All others seem to run okay.

Installed versions:
qmlify=g01abd1c (latest from github)
perses=0.5.0=py37_0
openeye-toolkits=2020.1.0=py37_0
openforcefield=0.6.0=py37_1
openforcefields=1.3.0=py_0

How do I fix this? Is this a version incompatibility or a bug in one of the other packages?

General free energy correction interface

I am very interested in using qmlify to calculate free energy corrections with methods other than ani and using trajectories extracted from different free energy packages like SOMD/Sire and was wondering if there would be interest or if anyone has started working on a general interface for such corrections? If not we would love to contribute such features and it would be great to get some advice on how to add them and any gotchas we should look out for while implementing them.

I think that a good first step might be making a general interface for the method used in the correction calculation, my plan would involve making an abstract base class with the same methods as the ani class which could then be subclassed and registered with qmlify so users can build interfaces easily without changes the main source code. Are there any other changes that would be needed for this to work?

I imagine that something similar could also be done with the extraction code as well.

cc @djcole56 @bieniekmateusz

Role of extraction_indices in forward direction

Hi Chodera lab,

I am trying to replicate your implementation of qmlify on some sample data and am having difficulty understanding the role of the extraction_indices used in the forward direction. As I understand, in the forward direction, the indices are used to extract the positions of the starting positions. However, the size of the positions in the positions.npz files are (1, M, N), resulting in different versions of the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-29-8f360d3045d1> in <module>
----> 1 position_extractor(cache_name, 2)

<ipython-input-27-c2adb15d3c20> in position_extractor(positions_cache_filename, index_to_extract)
     29         all_box_vectors = None
     30 
---> 31     positions = all_positions[index_to_extract]
     32     assert positions.shape[
     33         1] == 3, f"the extracted position shapes are: {positions.shape}"

IndexError: index 2 is out of bounds for axis 0 with size 1

I assume the size of the all_positions array in positions.npz must correspond to the extraction_indices somehow but I don't see how since the variable is not part of the Perses simulation preceding Qmlify where checkpoint.nc from which positions.npz is generated are created.

I'd really appreciate any insight you can give me into your work. Thank you, Sören.