choderalab / brokenyank

YANK: GPU-accelerated calculation of ligand binding affinities

License: GNU Lesser General Public License v3.0

Python 99.80% Shell 0.20%

brokenyank's Introduction

yank

YANK: GPU-accelerated calculation of ligand binding affinities in implicit and explicit solvent using alchemical free energy methodologies.

Description

YANK uses a sophisticated set of algorithms to rigorously compute (within a classical statistical mechanical framework) biomolecular ligand binding free energies. This is accomplished by performing an alchemical free energy calculation in either implicit or explicit solvent, in which the interactions between a ligand (usually a small molecule or peptide) and its environment are decoupled through a series of alchemical intermediates, creating an alternative thermodynamic cycle to the direct dissociation of ligand and target biomolecule. The free energy of decoupling the ligand from the environment is computed both in the presence and absence of the biomolecular target, yielding the overall binding free energy (in the strong single-binding-mode framework of Gilson et al.) once a standard state correction is applied to account for the restraint added between ligand and biomolecule in the complex.
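
The bookkeeping of the cycle described above can be sketched in a few lines. This is an illustration only, not YANK's own code: the function name, the argument names, and the sign convention are assumptions. Each decoupling free energy is taken in the coupled-to-decoupled direction, and the standard state correction is defined as the term that is added to close the cycle.

```python
def binding_free_energy(dg_decouple_solvent, dg_decouple_complex, dg_standard_state):
    """Combine the legs of the alchemical cycle into a binding free energy.

    All three values must share one unit (for example kT).
    dg_decouple_solvent: decoupling the ligand without the target present.
    dg_decouple_complex: decoupling the ligand while restrained in the complex.
    dg_standard_state:   correction for the ligand-target restraint (assumed
                         to be defined so that it is added).
    """
    return dg_decouple_solvent - dg_decouple_complex + dg_standard_state


# Made-up numbers for illustration only (units of kT).
example = binding_free_energy(10.0, 25.0, 3.0)
```

With this convention, a ligand that is harder to decouple in the complex than in solvent has a negative (favorable) binding free energy.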

Computation of the free energy for each leg of the thermodynamic cycle uses a modified replica-exchange simulation in which exchanges between alchemical states are permitted (Hamiltonian exchange) and faster mixing is achieved by using a Gibbs sampling framework.
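
One way such a mixing step is often written is to attempt many pairwise state swaps per iteration, so that the state assignments are close to an independent sample. The sketch below assumes a precomputed matrix of reduced potentials; the name `mix_replicas` and its arguments are hypothetical and do not come from repex.py.

```python
import numpy as np


def mix_replicas(u_kl, replica_states, n_attempts, rng):
    """Attempt many pairwise state swaps between replicas.

    u_kl[k, l] is the reduced potential of replica k's configuration
    evaluated in state l; replica_states[k] is the state index currently
    assigned to replica k. Returns a new array of state assignments.
    """
    states = np.array(replica_states, dtype=int)
    n_replicas = len(states)
    for _ in range(n_attempts):
        i, j = rng.integers(n_replicas, size=2)
        if i == j:
            continue
        si, sj = states[i], states[j]
        # Metropolis criterion for exchanging the states of replicas i and j.
        log_p_accept = (u_kl[i, si] + u_kl[j, sj]) - (u_kl[i, sj] + u_kl[j, si])
        if log_p_accept >= 0.0 or rng.random() < np.exp(log_p_accept):
            states[i], states[j] = sj, si
    return states


rng = np.random.default_rng(0)
flat = np.zeros((3, 3))  # every swap is accepted when all energies are equal
mixed = mix_replicas(flat, [0, 1, 2], n_attempts=27, rng=rng)
```

Because each accepted move only exchanges two assignments, the result is always a permutation of the input states.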

Authors

John D. Chodera
Kim Branson
Imran Haque
Michael Shirts

Copyright

Portions of this code copyright (c) 2009-2011 University of California, Berkeley, Vertex Pharmaceuticals, Stanford University, University of Virginia, Memorial Sloan-Kettering Cancer Center, and the Authors.

Prerequisites

Use of this module requires the following:

AmberTools (for setting up protein-ligand systems): http://ambermd.org/#AmberTools
OpenMM with Python wrappers: http://simtk.org/home/openmm
Python 2.6 or later: http://www.python.org
NetCDF (compiled with netcdf4 support): http://www.unidata.ucar.edu/software/netcdf/
HDF5 (required by NetCDF4): http://www.hdfgroup.org/HDF5/
netcdf4-python (a Python interface for netcdf4): http://code.google.com/p/netcdf4-python/
numpy and scipy: http://www.scipy.org/
mpi4py (if MPI support is desired): http://mpi4py.scipy.org/ (Note that the mpi4py installation must be compiled against the appropriate MPI implementation.)
OpenEye toolkit and Python wrappers (if mol2 and PDB reading features are used; requires academic or commercial license): http://www.eyesopen.com

Simplified Python prerequisite installation

The Enthought Python Distribution (EPD) provides many of these prerequisites (including Python, NetCDF 4, HDF5, netcdf4-python, numpy, and scipy): http://www.enthought.com/products/epd.php

Note that using EPD with OpenEye requires some care, as OpenEye tools are very selective about which Python and library versions are compatible.

For example, to use EPD 7.1-2 on OS X with OpenEye's latest toolkit, install OpenEye's toolkit and Python wrappers, then:

# Change to OpenEye libs directory
cd /path/to/openeye/python/openeye/libs

# Create a symlink for your EPD platform to trick OpenEye into thinking it is supported.
ln -s osx-10.7-g++4.2-x64-python2.7 osx-10.7-g++4.0-x64-python2.7

Running YANK from the command line

python yank.py --ligand_prmtop PRMTOP --receptor_prmtop PRMTOP --complex_prmtop PRMTOP { {--ligand_crd CRD | --ligand_mol2 MOL2} {--receptor_crd CRD | --receptor_pdb PDB} | {--complex_crd CRD | --complex_pdb PDB} } [-v | --verbose] [-i | --iterations ITERATIONS] [-o | --online] [-m | --mpi] [--restraints restraint-type]

EXAMPLES:

Serial execution:

# Specify AMBER prmtop/crd files for ligand and receptor.
python yank.py --ligand_prmtop ligand.prmtop --receptor_prmtop receptor.prmtop --complex_prmtop complex.prmtop --ligand_crd ligand.crd --receptor_crd receptor.crd --iterations 1000

# Specify (potentially multi-conformer) mol2 file for ligand and (potentially multi-model) PDB file for receptor.
python yank.py --ligand_prmtop ligand.prmtop --receptor_prmtop receptor.prmtop --complex_prmtop complex.prmtop --ligand_mol2 ligand.mol2 --receptor_pdb receptor.pdb --iterations 1000

# Specify (potentially multi-model) PDB file for complex, along with flat-bottom restraints (instead of harmonic).
python yank.py --ligand_prmtop ligand.prmtop --receptor_prmtop receptor.prmtop --complex_prmtop complex.prmtop --complex_pdb complex.pdb --iterations 1000 --restraints flat-bottom

MPI execution:

See example script mvapich2.pbs for an example using MVAPICH2.

Notes

Currently, YANK only accepts AMBER "new style" prmtop topology files to define molecular systems. For examples of how to set up your own systems using the free AmberTools suite, see the examples/ directory.

In these AMBER prmtop and inpcrd files, receptor atoms must come before ligand atoms. Atom orderings must be the same in all files (AMBER prmtop/crd, PDB, mol2). mol2 files must contain only copies of the same molecule in different geometries.
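
The ordering requirement can be checked before launching a run. The helper below is a hypothetical sketch, not part of YANK; it assumes you already have the atom indices of the receptor and the ligand within the complex.

```python
def receptor_precedes_ligand(receptor_atoms, ligand_atoms):
    """Return True if every receptor atom index is below every ligand atom index."""
    receptor_atoms = list(receptor_atoms)
    ligand_atoms = list(ligand_atoms)
    if not receptor_atoms or not ligand_atoms:
        raise ValueError("Both receptor and ligand must contain at least one atom.")
    return max(receptor_atoms) < min(ligand_atoms)


# A 100-atom receptor followed by a 20-atom ligand satisfies the requirement.
ordering_ok = receptor_precedes_ligand(range(0, 100), range(100, 120))
```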

Free energy calculations in both implicit and explicit solvent are supported. The presence of water is automatically detected.

Use the testrun.sh script as an example for serial execution, and the mvapich2.pbs script as an example of MPI execution (can be run with batch or interactive queues).

Using the YANK module from Python

YANK can also be used as a module from another Python program. See the command-line driver in 'main' in yank.py as an example:

# Import the YANK module:
from yank import Yank

# Initialize YANK object using OpenMM System objects and coordinates.
yank = Yank(receptor=receptor_system, ligand=ligand_system, complex=complex_system, complex_coordinates=complex_coordinates, output_directory=output_directory, verbose=verbose)

# Run the YANK simulation using serial execution; use run_mpi() method for MPI execution.
yank.run()

# Analyze results.
results = yank.analyze()

Note that the analysis routines can also be run asynchronously as the YANK object is running.

Testing

Three levels of testing frameworks are provided:

doctests

Doctests ensure that each of the individual functions that compose YANK runs on valid data without throwing exceptions. These are implemented in the 'main' part of each module in YANK (e.g. 'repex.py'), and are regularly run to ensure that there is no invalid code in YANK.
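
As an illustration of this pattern, a module can carry its doctests in docstrings and run them from its 'main' part. The function below is a made-up example and is not taken from YANK.

```python
import doctest
import math


def logsumexp(values):
    """Compute log(sum(exp(v) for v in values)) without overflow.

    >>> round(logsumexp([0.0, 0.0]), 6)
    0.693147
    """
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values))


if __name__ == "__main__":
    # Run every doctest in this module; nothing is printed when all pass.
    doctest.testmod()
```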

module tests

Module tests test that the code contained in the corresponding module (e.g. 'test_repex.py' for 'repex.py') generates the correct results for analytically-tractable test cases. This code ensures the correctness of individual components of YANK. Though it is impossible to test every conceivable input combination, some care is taken to ensure overall correctness of recommended codepaths.
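
A test against an analytically tractable case might look like the sketch below. It is not taken from YANK's test suite: it uses simple exponential averaging rather than the estimators YANK relies on, and all names, spring constants, and the tolerance are assumptions. For two one-dimensional harmonic oscillators with spring constants K1 and K2 at inverse temperature 1, the reduced free energy difference is 0.5 ln(K2/K1).

```python
import numpy as np


def analytical_delta_f(k1, k2):
    """Reduced free energy difference f2 - f1 between 1D harmonic oscillators."""
    return 0.5 * np.log(k2 / k1)


def estimated_delta_f(k1, k2, n_samples, seed):
    """Estimate f2 - f1 by exponential averaging over samples from state 1."""
    rng = np.random.default_rng(seed)
    # Exact samples from the Boltzmann distribution of oscillator 1.
    x = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k1), size=n_samples)
    delta_u = 0.5 * (k2 - k1) * x**2
    return -np.log(np.mean(np.exp(-delta_u)))


def test_harmonic_oscillators():
    expected = analytical_delta_f(1.0, 2.0)
    observed = estimated_delta_f(1.0, 2.0, n_samples=100000, seed=0)
    assert abs(observed - expected) < 0.01
```

The tolerance is chosen to be several times the expected statistical error for this sample size, so the test should not fail by chance.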

integration tests

Integration tests ensure that YANK as a whole, run on certain test problems, produces reliable free energy differences for well-characterized systems (such as harmonic oscillators, Lennard-Jones particles, etc.). Integration tests are run from the provided 'integration_tests.py' script.

Known Issues

Running the MPI version of YANK with certain MPI implementations (mpich2 with hydra in particular) appears to hang in 'nvcc' if the CUDA platform is used. The OpenCL platform seems to be unaffected.
The CUDA 5.0 runtime on Linux platforms appears to allocate an enormous amount (tens of GB) of virtual memory that is not used. This is a known bug in the CUDA runtime and is expected to be rectified in CUDA 5.5.

TODO

Remove dependence on deprecated pyopenmm pure-Python wrapper; require System objects for complex, ligand, and protein instead.
Add support for asynchronous execution of Yank.run()
Add support for on-the-fly analysis thread(s)
Speed up initialization and resuming runs
Change atom ordering so ligand is first, protein second, and solvent third.

Roadmap

Support for the following is planned:

Online analysis and automatic convergence detection/termination [in progress]
Explicit solvent support with NPT simulations [almost ready; only waiting on analytical dispersion correction additions]
General Markov chain Monte Carlo (MCMC) move sets in between Hamiltonian exchanges [refactoring of repex.py in progress]
Expanded ensemble simulations (as an alternative to Hamiltonian exchange)
Support for relative free energy calculations
Support for sampling over protein mutations
Generative factories, to allow searching over combinatorially large chemical spaces (both for ligand substituents and protein mutations)
Constant-pH and ligand tautomer sampling

License

All code in this repository is released under the GNU General Public License.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Acknowledgments

The authors are extremely grateful to the OpenMM development team for their help in the development of YANK, especially (but not limited to):

Peter Eastman, Stanford University
Mark Friedrichs
Vijay Pande, Stanford University
Randy Radmer
Christopher Bruns

The developers are very grateful to the following contributors for suggesting patches, bugfixes, or changes that have improved YANK:

Kai Wang, University of Virginia
Christoph Klein, University of Virginia
Levi Naden, University of Virginia

Citations

Please cite the following papers if you use YANK for a publication:

YANK

Parton DL, Shirts MR, Wang K, Eastman P, Friedrichs M, Pande VS, Branson K, Mobley DL, Chodera JD. YANK: A GPU-accelerated platform for alchemical free energy calculations. In preparation.

OpenMM GPU-accelerated molecular mechanics library

Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, LeGrand S, Beberg AL, Ensign DL, Bruns CM, and Pande VS. Accelerating molecular dynamic simulations on graphics processing units. J. Comput. Chem. 30:864, 2009. http://dx.doi.org/10.1002/jcc.21209

Eastman P and Pande VS. OpenMM: A hardware-independent framework for molecular simulations. Comput. Sci. Eng. 12:34, 2010. http://dx.doi.org/10.1109/MCSE.2010.27

Eastman P and Pande VS. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem. 31:1268, 2010. http://dx.doi.org/10.1002/jcc.21413

Eastman P and Pande VS. Constant constraint matrix approximation: A robust, parallelizable constraint method for molecular simulations. J. Chem. Theor. Comput. 6:434, 2010. http://dx.doi.org/10.1021/ct900463w

Eastman P, Friedrichs M, Chodera JD, Radmer RJ, Bruns CM, Ku JP, Beauchamp KA, Lane TJ, Wang LP, Shukla D, Tye T, Houston M, Stich T, Klein C, Shirts M, and Pande VS. OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation. J. Chem. Theor. Comput. 2012. http://dx.doi.org/10.1021/ct300857j

Replica-exchange with Gibbs sampling

Chodera JD and Shirts MR. Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing. J. Chem. Phys. 135:194110, 2011. http://dx.doi.org/10.1063/1.3660669

MBAR for estimation of free energies from simulation data

Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105, 2008. http://dx.doi.org/10.1063/1.2978177

Long-range dispersion corrections for explicit solvent free energy calculations

Shirts MR, Mobley DL, Chodera JD, and Pande VS. Accurate and efficient corrections for missing dispersion interactions in molecular simulations. J. Phys. Chem. B 111:13052, 2007. http://dx.doi.org/10.1021/jp0735987

brokenyank's Issues

Allow systems to be set up using OpenMM Modeller facility

It would be useful if YANK were able to parameterize System objects using the OpenMM app ForceField module. This would allow us to access multiple forcefields, instead of having to go through AmberTools LEaP externally.

To do this, a few things need to happen:

To allow arbitrary PDB files to be parameterized, we probably need to use the new pdbfixer module.
We would then use the app Modeller to add explicit solvent
Most critically, we need a way to deal with small molecules. Perhaps these could still be parameterized by AmberTools antechamber and those parameters used to write an XML parameter repository readable by ForceField?
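
For the protein-and-solvent part of this proposal, a minimal sketch using the OpenMM app layer might look as follows. It assumes the PDB file is already clean enough to parameterize (no pdbfixer step), uses the `simtk` import path and the bundled `amber99sb.xml` and `tip3p.xml` force field files, and does not address small molecules at all. The function name and the 1 nm padding are arbitrary choices for illustration.

```python
def build_solvated_system(pdb_filename):
    """Parameterize a protein PDB file and add explicit solvent with OpenMM."""
    # Imported here so this sketch can be loaded without OpenMM installed.
    from simtk import unit
    from simtk.openmm import app

    pdb = app.PDBFile(pdb_filename)
    forcefield = app.ForceField("amber99sb.xml", "tip3p.xml")

    # Modeller holds an editable copy of the topology and positions.
    modeller = app.Modeller(pdb.topology, pdb.positions)
    modeller.addHydrogens(forcefield)
    modeller.addSolvent(forcefield, padding=1.0 * unit.nanometer)

    system = forcefield.createSystem(
        modeller.topology,
        nonbondedMethod=app.PME,
        constraints=app.HBonds,
    )
    return system, modeller.topology, modeller.positions
```

The returned System could then stand in for one produced from an AMBER prmtop file, provided the atom ordering expected by YANK is preserved.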

Standardize our usage of delayed imports

Some requirements get imported at file headers, while others get imported in the heat of things.

My preference would be:

Move as many imports to the header as possible
Follow a recipe for handling whatever delayed imports we need.

Here's an example of how Robert has handled this. I'm not sure if his approach is optimal, but it's something. https://github.com/rmcgibbo/mdtraj/issues/47
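
One possible recipe, given here only as a sketch and not as the approach used in the linked issue, is a small helper that imports an optional dependency at the point of use and fails with a message that explains why it was needed. The helper name and message format are assumptions.

```python
import importlib


def delayed_import(module_name, purpose):
    """Import an optional dependency on demand, with a helpful error message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            "%s is required %s but could not be imported: %s"
            % (module_name, purpose, error)
        )


# Example with a standard-library module, so the sketch runs anywhere.
math_module = delayed_import("math", "for the usage example")
```

A call such as `delayed_import("mpi4py", "for MPI execution")` would then sit inside the function that actually needs MPI, keeping the header imports limited to hard requirements.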

Check everything with PyFlakes and (auto) pep8.

After the dust settles, we should run everything through pyflakes. It finds a few issues in repex.py.

We don't need to do this until closer to the release, however.

See if alchemical intermediates can be assigned global Context parameter to avoid the need to create and cache many Context objects

See if we can create groups of alchemical intermediates in which a global Context parameter can select the intermediate, rather than needing to create separate System objects for each intermediate.

Feature request: Protein mutations

Several laboratories would like to use YANK for computing free energy differences upon protein mutations, either for peptide ligands or protein targets.

This functionality would require a way to modify OpenMM topologies to allow alchemical transformation of protein sidechains from one residue identity to another.

Speed up alchemical intermediate creation

Currently, alchemical intermediate creation requires ~4 s per intermediate, which adds up to several minutes for many replicas. Profile this code and see if it can be sped up.
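
A starting point for the profiling could be the standard-library profiler, as in the sketch below. The wrapper name is hypothetical; in practice the profiled callable would be whatever function builds a single alchemical intermediate.

```python
import cProfile
import io
import pstats


def profile_call(function, *args, **kwargs):
    """Run function under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = function(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


# Stand-in workload; replace with the intermediate-creation call.
total, report = profile_call(sum, [1, 2, 3])
```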

Modify alchemy module to use group-based CustomNonbondedForce

By using the new feature of CustomNonbondedForce in OpenMM to restrict the computation of interactions to specific pairs of atom groups, we can potentially speed up execution by a factor of two.

test_repex_mpi.py hangs on context creation

I've tried this on two different systems (cluster and desktop) now. I'm finding that Yank hangs when creating the second cached context object. Have you seen anything like this before?

[kyleb@node013 ~]$ mpirun -np 6 ~/src/yank/src/test_repex_mpi.py 
Creating test systems...
Selecting MPI communicator and selecting a GPU device...
Creating test systems...
Selecting MPI communicator and selecting a GPU device...
Creating test systems...
Selecting MPI communicator and selecting a GPU device...
Creating test systems...
Selecting MPI communicator and selecting a GPU device...
Creating test systems...
Selecting MPI communicator and selecting a GPU device...
Creating test systems...
Selecting MPI communicator and selecting a GPU device...
node 'node013' deviceid 1 / 6, MPI rank 1 / 6
node 'node013' deviceid 4 / 6, MPI rank 4 / 6
node 'node013' deviceid 5 / 6, MPI rank 5 / 6
node 'node013' deviceid 3 / 6, MPI rank 3 / 6
node 'node013' deviceid 2 / 6, MPI rank 2 / 6
node 'node013' deviceid 0 / 6, MPI rank 0 / 6
Initialized node 0 / 6
Please cite the following:

        Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, LeGrand S, Beberg AL, Ensign DL, Bruns CM, and Pande VS. Accelerating molecular dynamic simulations on graphics processing units. J. Comput. Chem. 30:864, 2009. DOI: 10.1002/jcc.21209
        Eastman P and Pande VS. OpenMM: A hardware-independent framework for molecular simulations. Comput. Sci. Eng. 12:34, 2010. DOI: 10.1109/MCSE.2010.27
        Eastman P and Pande VS. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem. 31:1268, 2010. DOI: 10.1002/jcc.21413
        Eastman P and Pande VS. Constant constraint matrix approximation: A robust, parallelizable constraint method for molecular simulations. J. Chem. Theor. Comput. 6:434, 2010. DOI: 10.1021/ct900463w
        Chodera JD and Shirts MR. Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing. J. Chem. Phys., in press. arXiv: 1105.5749
Creating and caching Context objects...
Node 0 / 6 creating Context for state 0...
Initialized node 4 / 6
Initialized node 2 / 6Node 4 / 6 creating Context for state 4...

Node 2 / 6 creating Context for state 2...
Initialized node 3 / 6
Node 3 / 6 creating Context for state 3...
Initialized node 5 / 6Initialized node 1 / 6
Node 1 / 6 creating Context for state 1...

Node 5 / 6 creating Context for state 5...
Node 0 / 6: Using platform CUDA
Node 3 / 6: Using platform CUDA
Node 4 / 6: Using platform CUDA
Node 2 / 6: Using platform CUDA
Node 1 / 6: Using platform CUDA
Node 5 / 6: Using platform CUDA
Node 0 / 6: Context creation took 34.206 s
Note 0 / 6: Context creation done.  Waiting for MPI barrier...

Use separate MBar?

If MBar is set up correctly as a python package, it should be possible to have Yank automatically install mbar when installing via pip.

I'm not sure if this is preferable to bundling mbar inside yank, but it could be.

test_repex_mpi.py cuda platform

The "Cuda" platform should be renamed to "CUDA". (Line 38)

platform = simtk.openmm.Platform.getPlatformByName("Cuda")

should be

platform = simtk.openmm.Platform.getPlatformByName("CUDA")

In [4]: mm.Platform.getPlatform(1).getName()
Out[4]: 'CUDA'

[kyleb@node012 ~]$ python ~/src/yank/src/test_repex_mpi.py
Creating test systems...
Traceback (most recent call last):
  File "/home/kyleb/src/yank/src/test_repex_mpi.py", line 29, in <module>
    platform = simtk.openmm.Platform.getPlatformByName("Cuda")
  File "/home/kyleb/opt/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 10637, in getPlatformByName
    return _openmm.Platform_getPlatformByName(*args)
Exception: There is no registered Platform called "Cuda"
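
Beyond fixing the string, the lookup could be made tolerant of capitalization. The sketch below is a hypothetical helper, not part of YANK or OpenMM; it assumes the list of available names has been collected beforehand, for example with `getPlatform(i).getName()` as in the session above.

```python
def resolve_platform_name(requested, available):
    """Return the registered platform name matching requested, ignoring case."""
    matches = [name for name in available if name.lower() == requested.lower()]
    if not matches:
        raise ValueError(
            "No platform matching %r; available: %s"
            % (requested, ", ".join(available))
        )
    return matches[0]


# Example list of names; the real list depends on the OpenMM installation.
resolved = resolve_platform_name("Cuda", ["Reference", "CUDA", "OpenCL"])
```

The resolved name can then be passed to `Platform.getPlatformByName`.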

Multidimensional Repex and AMD

At some point, we should look into plugging accelerated MD into our repex code. This could be useful for general enhanced sampling.

The current AMD code is built using a CustomIntegrator, so it might be tricky to reframe it as a HREX potential.

It's also worth thinking about how to best combine multidimensional replica exchange. We currently have alchemical HREX, AMD, and temperature. I imagine we might want to combine these in the near future.

Feature request: Add support for GROMOS forcefields and small molecules via ATb

We should consider support for GROMOS forcefields in a future release.

GROMOS forcefields are not currently available in OpenMM, but could be added (likely requiring permission from the GROMOS maintainer):
http://www.gromos.net/page.pl?page=contact

Small molecules could be parameterized with Alan Mark's ATb automated topology builder:
http://compbio.biosci.uq.edu.au/atb/index.py?tab=home_tab

Deprecate pyopenmm

The stuff in pyopenmm is pretty questionable. We should move the needed features to a separate repo and provide some sort of testing / quality control / docs.

Repex large file sizes

For explicit solvent, the size of our NCFiles is probably going to be problematic for many users; a lot of clusters have quotas in the ~10-100 GB range.

In the distant past, I had an issue where the NCFile would fill up my space allocation on a cluster. The write would raise an error and produce an unreadable NCFile.

Once the online analysis feature is complete, it could make sense to have "multiple" output frequencies. For example, you might not want to store all coordinates with every RE iteration. The idea is that with online analysis, you can throw away a lot more information and save coordinates less often.

Here are some ideas:

[] Output solvent, protein, and ligand at different frequencies
[] Write conformations every nth RE iteration

I'm envisioning this working somewhat like Peter's Reporter objects.
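
A rough sketch of how per-component output frequencies could be expressed is shown below. The class, its method, and the component names are hypothetical; nothing like this exists in repex.py yet, and the actual writing to the NCFile is left out.

```python
class OutputSchedule(object):
    """Decide which components to write at a given replica-exchange iteration."""

    def __init__(self, intervals):
        # intervals maps a component name to the number of iterations
        # between successive writes of that component.
        for name, interval in intervals.items():
            if interval < 1:
                raise ValueError("Interval for %s must be at least 1." % name)
        self.intervals = dict(intervals)

    def components_to_write(self, iteration):
        """Return the sorted names of components due at this iteration."""
        return sorted(
            name
            for name, interval in self.intervals.items()
            if iteration % interval == 0
        )


schedule = OutputSchedule({"ligand": 1, "protein": 10, "solvent": 100})
due_now = schedule.components_to_write(10)
```

Here the ligand is stored every iteration, the protein every tenth, and the solvent every hundredth, which is where most of the file size would be saved.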

Obviously, this is low priority, but I think it might be useful for some people.

YANK does not correctly handle the situation where all MPI processes are attached to GPU nodes

Because only CPU MPI processes run the vacuum and solvent simulations, no processes will be allocated if all MPI processes are GPU processes.
