choderalab / pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR)

Home Page: http://pymbar.readthedocs.io

License: MIT License

Languages: Python 98.61%, Shell 0.42%, PowerShell 0.59%, Batchfile 0.36%, HTML 0.02%
Topics: pymbar, free-energies, python, thermodynamic-states, equilibrium, molecular-dynamics-simulations, single-molecule-pulling, mbar, multistate-bennett-acceptance-ratio, bennett-acceptance-ratio

pymbar's Introduction


pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR) method for estimating expectations and free energy differences from equilibrium samples from multiple probability densities. See our docs.

Installation

The easiest way to install the pymbar release is via conda:

conda install -c conda-forge pymbar

which comes with JAX to speed up the code. To get the non-JAX-accelerated version instead:

conda install -c conda-forge pymbar-core

You can also install JAX accelerated pymbar from the Python package index using pip:

pip install pymbar[jax]

or the non-JAX-accelerated version with

pip install pymbar

Whether you install the JAX-accelerated or non-JAX-accelerated version does not change any calls or how the code is run. The non-JAX version has a smaller footprint on disk because of its lighter dependencies, but may not run as fast.

The development version can be installed directly from github via pip:

# Get the compressed tarball
pip install https://github.com/choderalab/pymbar/archive/master.tar.gz
# Or obtain a temporary clone of the repo with git
pip install git+https://github.com/choderalab/pymbar.git

Usage

Basic usage involves importing pymbar and constructing an MBAR object from the reduced potentials of simulation or experimental data.

Suppose we sample a 1D harmonic oscillator from a few thermodynamic states:

>>> from pymbar import testsystems
>>> x_n, u_kn, N_k, s_n = testsystems.HarmonicOscillatorsTestCase().sample()

This gives the nsamples sampled oscillator positions x_n (with samples from all states concatenated), the reduced potentials in the (nstates, nsamples) matrix u_kn, the number of samples drawn from each state in the length-nstates array N_k, and the indices s_n denoting which thermodynamic state each sample was drawn from.
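
As a quick sanity check of that layout (a minimal sketch; the assertions only restate the shapes described above, not specific values):

>>> assert u_kn.shape == (len(N_k), len(x_n))  # one row per state, one column per sample
>>> assert len(s_n) == len(x_n)                # one state index per sample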

To analyze this data, we first import and initialize the MBAR object:

>>> from pymbar import MBAR
>>> mbar = MBAR(u_kn, N_k)

Estimating dimensionless free energy differences between the sampled thermodynamic states and their associated uncertainties (standard errors) simply requires a call to compute_free_energy_differences():

>>> results = mbar.compute_free_energy_differences()

Here results is a dictionary with keys Deltaf_ij, dDeltaf_ij, and Theta. Deltaf_ij[i,j] is the matrix of dimensionless free energy differences f_j - f_i, dDeltaf_ij[i,j] is the matrix of standard errors in these estimates, and Theta is a covariance matrix that can be used to propagate error into quantities derived from the free energies.
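
For example, the free energy of the last sampled state relative to the first, with its standard error, can be read off directly (a sketch using the result keys named above):

>>> df = results["Deltaf_ij"][0, -1]    # f_last - f_first, dimensionless
>>> ddf = results["dDeltaf_ij"][0, -1]  # standard error of that estimate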

Expectations and associated uncertainties can easily be estimated for observables A(x) for all states:

>>> A_n = x_n  # use the position of the harmonic oscillator as the observable
>>> results = mbar.compute_expectations(A_n)

where results is a dictionary with keys mu, sigma, and Theta: mu[i] is the estimate of the average of the observable in state i, sigma[i] is the estimated standard deviation of that estimate, and Theta[i,j] is the covariance matrix of the log weights.

See the docstring help for these individual methods for more information on exact usage; in Python or IPython, you can view the docstrings with help().
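
For example:

>>> help(mbar.compute_free_energy_differences)
>>> help(mbar.compute_expectations)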

JAX needs 64-bit mode

pymbar needs 64-bit floats to provide reliable answers; JAX uses 32-bit (single-precision) floats by default. pymbar therefore turns on JAX's 64-bit mode, which may cause issues for other uses of JAX in the same program as pymbar, such as existing neural network (NN) models for machine learning.
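
For reference, 64-bit mode is controlled through JAX's standard configuration mechanism (this is plain JAX, not a pymbar API); if other JAX code in the same program assumes 32-bit defaults, it will see this setting flipped:

import jax

# Enable double precision globally; pymbar requires this for reliable answers.
jax.config.update("jax_enable_x64", True)

# Equivalently, set the environment variable JAX_ENABLE_X64=1 before importing JAX.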

Authors

References

  • Please cite the original MBAR paper:

    Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105 (2008). DOI

  • Some timeseries algorithms can be found in the following reference:

    Chodera JD, Swope WC, Pitera JW, Seok C, and Dill KA. Use of the weighted histogram analysis method for the analysis of simulated and parallel tempering simulations. J. Chem. Theor. Comput. 3(1):26-41 (2007). DOI

  • The automatic equilibration detection method provided in pymbar.timeseries.detectEquilibration() is described here:

    Chodera JD. A simple method for automated equilibration detection in molecular simulations. J. Chem. Theor. Comput. 12:1799, 2016. DOI

License

pymbar is free software and is licensed under the MIT license.

Thanks

We would especially like to thank a large number of people for helping us identify issues and ways to improve pymbar, including Tommy Knotts, David Mobley, Himanshu Paliwal, Zhiqiang Tan, Patrick Varilly, Todd Gingrich, Aaron Keys, Anna Schneider, Adrian Roitberg, Nick Schafer, Thomas Speck, Troy van Voorhis, Gupreet Singh, Jason Wagoner, Gabriel Rocklin, Yannick Spill, Ilya Chorny, Greg Bowman, Vincent Voelz, Peter Kasson, Dave Caplan, Sam Moors, Carl Rogers, Josua Adelman, Javier Palacios, David Chandler, Andrew Jewett, Stefano Martiniani, and Antonia Mey.

Notes

pymbar's People

Contributors

badisa, bdice, chayast, chrisjonesbsu, dotsdl, ijpulidos, jaimergp, jchodera, kyleabeauchamp, lnaden, mattwthompson, maxentile, mikemhenry, mrshirts, richardjgowers, rmcgibbo, schuhmc, smcantab, tuckerburgin


pymbar's Issues

Create pymbar2.0 branch

So we should consider maintaining separate branches for 1.0 and 2.0, with future pull requests being merged towards the 2.0 branch.

Documentation improvements

I wanted to start a thread to discuss what additions/changes to the excellent documentation Kyle has started [http://pymbar.readthedocs.org] we might want now or in the future.

Some quick things:

  • I think I'll need to update the version available via the PyPI, last updated 11 Aug 2013: https://pypi.python.org/pypi/pymbar/2.0.1-beta
    Any thoughts on what sort of numbering/naming/revision scheme we should be using, since each version on PyPI must be a unique version number?
  • I think we can mention on the Getting Started page that easy_install pymbar should also work.
  • We should add some tutorials illustrating the application to the example systems we have in pymbar-examples that will eventually form the foundation for what we've been colloquially referring to as the "MBAR for Dummies" page.
    @mrshirts : What do you think we should focus on first, and how should the tutorials be structured?
    @kyleabeauchamp : Do you want to be the lead author on that paper, since this would also be a nice way for you to get credit for what you've done here? The notes for that paper are here in the old svn: https://simtk.org/websvn/wsvn/pymbar/manuscripts/#_manuscripts_
  • Rework main README.md to also include reference to the online documentation or potentially just direct the user there (to remove redundancy).

Avoid caching values except when necessary

So a lot of intermediate calculations are being cached as member variables in the various classes.

Various functions are then implemented as member functions that proceed to lookup the cached values.

I would suggest that we avoid caching values except when necessary for performance reasons (e.g. in the iterative solver).

I would also suggest that each "calculation" should have a function--not member function--that lists all parameters as arguments and has a clear docstring of what the arguments are.

The reason I suggest this is that I think it really makes it clearer what we are calculating and what the calculation depends on.

Obviously, there will be cases where we have to do some caching for performance reasons, but I think we're over-using it right now.
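
As a hypothetical illustration of the proposed style, a stand-alone function that takes all of its inputs explicitly and documents them (names and contents are made up for the example, not pymbar internals):

import numpy as np
from scipy.special import logsumexp

def mbar_log_denominators(u_kn, f_k, N_k):
    """Return log sum_k N_k exp(f_k - u_kn[k, n]) for every sample n.

    Parameters
    ----------
    u_kn : ndarray, shape (K, N)
        Reduced potentials of all N samples evaluated at all K states.
    f_k : ndarray, shape (K,)
        Current dimensionless free energy estimates.
    N_k : ndarray, shape (K,)
        Number of samples drawn from each state.
    """
    return logsumexp(f_k[:, np.newaxis] - u_kn, b=N_k[:, np.newaxis], axis=0)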

Move BAR to separate module

I think code would be easier to maintain and read if BAR, MBAR, and ExpGauss were all separate python files.

Create drop-in replacement for pymbar 2.0 MBAR object

This "legacy" interface could facilitate compatibility in existing codebases.

As a reminder, the key differences between 1.0 and 2.0 are probably:

  1. Member function names use Python naming convention
  2. Inputs are U_kn rather than U_kln
  3. Minor algorithmic and function argument differences.

The legacy interface might initially just implement the most important behaviors, such as calculating f_k and computeExpectations().
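
A minimal sketch of what such a shim could look like (hypothetical class and mapping; a real shim would also have to accept u_kln-shaped inputs and convert them):

from pymbar import MBAR

class LegacyMBAR(MBAR):
    """Hypothetical 1.0-style wrapper exposing camelCase method names."""

    def computeExpectations(self, A_n, **kwargs):
        # Forward the old-style call to the renamed method.
        return self.compute_expectations(A_n, **kwargs)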

pandas support?

Is there a way to only require pandas if those utilities are used? I noticed that it's not installed in the Python environment on my cluster, and most functionality doesn't need it.
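
One common way to make such a dependency optional (a sketch of the general pattern, not necessarily how pymbar resolves it) is to defer the import to the code paths that actually need it:

def to_dataframe(results):
    """Hypothetical utility that needs pandas only when it is actually called."""
    try:
        import pandas as pd
    except ImportError as err:
        raise ImportError("pandas is required for this utility; "
                          "install it with 'pip install pandas'") from err
    return pd.DataFrame(results)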

Fix long imports

So it seems that right now, our __init__.py has one too many levels of depth because of how we import:

In [1]: import pymbar

In [2]: pymbar.pymbar.[...]

We can instead use from pymbar import xyz in __init__.py to reduce the tree.
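
For example, the package-level __init__.py can re-export the public names (a sketch assuming the class lives in pymbar/pymbar.py, as the pymbar.pymbar path above suggests):

# pymbar/__init__.py
from pymbar.pymbar import MBAR  # exposes pymbar.MBAR instead of pymbar.pymbar.MBAR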

NumExpr and Cython

These are two possible things that could help code maintainability and performance:

I think Cython makes significantly cleaner code than, e.g., weave. For example, here is an RMSD wrapper written in Cython. The key is that all pointers are handled by numpy array indices. (https://github.com/rmcgibbo/mdtraj/blob/master/MDTraj/IRMSD/rmsd_wrap.pyx).

I also recently found out that NumExpr can dramatically accelerate simple arithmetic expressions involving transcendental functions. For example, here is a simple implementation of LogSumExp that speeds things up 10X. I know that this operation is probably not rate limiting, but I suspect that similar types of expressions are rate limiting...

import numpy as np
import numexpr as ne
import scipy.misc

x = np.random.normal(size=100000)
%timeit y = np.log(ne.evaluate("sum(exp(x))"))
%timeit y = scipy.misc.logsumexp(x)

Here are results:

1000 loops, best of 3: 341 us per loop

and

100 loops, best of 3: 2.15 ms per loop

Feel free to close this issue immediately; I just want to get this down for the record.

Move tests

Can we move the pymbar/pymbar/tests/ to pymbar/tests/?

We can keep pymbar/testsystems/ where it is.

Int vs. Float for N_k

Does it make sense to use float everywhere for N_k as a way to avoid silly integer arithmetic issues / casting issues? I suppose there is a slight reduction in resolution when doing this, but I don't think it's relevant--the resolution of a float64 is 1E-15...

removing (or moving) files in the 'old' directory

The statistical inefficiency code is in tests now, and after discussing with Kyle, it would make sense to move the other two to pymbar-examples. This would also remove the matplotlib requirement from anything in pymbar.

BAR errors from overflow should be caught or worked around

Right now, there are occasional errors with BAR. It ends up giving up and returning zero, but this error should be caught earlier.

pymbar.py:308: RuntimeWarning: overflow encountered in exp
log_f_R = - max_arg_R - numpy.log( numpy.exp(-max_arg_R) + numpy.exp(exp_arg_R - max_arg_R) ) - w_R
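
One way to sidestep the overflow (a sketch of an alternative formulation, not necessarily the fix adopted) is to let numpy.logaddexp do the exponentiation safely; the expression above is algebraically equal to -log(1 + exp(exp_arg_R)) - w_R:

import numpy

# numpy.logaddexp(0, x) computes log(1 + exp(x)) without overflowing for large x.
log_f_R = -numpy.logaddexp(0.0, exp_arg_R) - w_R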

GPU acceleration?

At one point, I wrote a CUDA GPU-accelerated kernel. Is this something we should include in a future version?

Error in computePMF example

I think I've spotted an error in the examples section of the MBAR.computePMF function, but it could also be me getting confused by the indices.

>>> from pymbar import oldtestsystems
>>> [x_kn, u_kln, N_k] = oldtestsystems.HarmonicOscillatorsSample(N_k=[100,100,100])
>>> mbar = MBAR(u_kln, N_k)
>>> u_kn = u_kln[0,:,:]

If $u_{kln} = u_l(x_n^{(k)})$, shouldn't it be

>>> u_kn=u_kln[:, 0, :]

in the 3rd line?

In other words, don't you use the potential energy at a fixed state $l_0$ for data generated at different thermodynamic states $k$ to estimate the PMF at the desired state $l_0$?

Create Branch for 1.XX

Before we start merging changes for the 2.0 release, we should probably create a branch for the last 1.XX release.

We have two options:

  1. Take the current git and copy it to a separate branch
  2. Copy the contents of the release tarball to a separate branch.

The problem with (1) is that I suspect the current codebase doesn't perfectly correspond to any particular released version.

The problem with (2) is that I think this "breaks" the Git model of code changes (fork and merge).

Not sure what the best option is, but I think we have to pick one of these two before we proceed on other merges.

Release PyPI package

This mainly involves doing a python setup.py upload after we update the package.

Differences between pseudoinverse and np.linalg.pinv

So in my refactor, I switched from our custom pseudoinverse to np.linalg.pinv. I noticed some differences in output for "svd-ev", which I tracked down to the tolerance inputs.

np.linalg.pinv defaults to 1E-15, while our custom pinv defaulted to 1E-12.

Switching np.linalg.pinv to use an rcond of 1E-12 eliminated the differences and restored agreement with pymbar 1.0.
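
For the record, the change amounts to passing the looser cutoff explicitly (a sketch; the matrix name is a placeholder):

import numpy as np

# Match the old custom pseudoinverse by relaxing the singular-value cutoff
# from numpy's default to 1e-12.
W_pinv = np.linalg.pinv(W, rcond=1e-12)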

Preserving svn history?

There's a lot of information in the pymbar svn repository -- for example, I'm now going back to figure out where we need to keep weights stored in logarithmic form by diffing against different versions of the log.

So, is there any way to preserve this history?

Convert to N x K storage for U_kln matrix

For data sets where we have different numbers of samples at different states, rather than storing the energies in a K x K x N_each matrix, it would be more efficient to store the data as a K x N_tot matrix, where N_tot is the total number of samples collected and K is the number of states at which we are evaluating the energies. This will take a bit of extra work, however.
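
A sketch of the conversion (hypothetical helper, assuming u_kln[k, l, n] holds the reduced potential of sample n drawn from state k evaluated at state l, with only the first N_k[k] entries along the last axis valid):

import numpy as np

def ukln_to_ukn(u_kln, N_k):
    """Flatten a (K, K, N_max) array into a (K, N_tot) array of real samples only."""
    K = len(N_k)
    N_tot = int(np.sum(N_k))
    u_kn = np.zeros((K, N_tot))
    start = 0
    for k in range(K):
        stop = start + int(N_k[k])
        # samples drawn from state k, evaluated at every state l
        u_kn[:, start:stop] = u_kln[k, :, : int(N_k[k])]
        start = stop
    return u_kn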

Switch license to LGPL 2.1

What would you all think about switching the license to LGPL? If the goal is maximal usage, maybe a less restrictive license would help?

Variable length function outputs

This is a minor style comment, but a lot of our functions look like this:

[...]

return returns

where returns is a list whose length varies depending on a number of optional arguments to the function.

To me, this style is unclear. Whenever possible, I would prefer our functions to return a fixed number of variables, returned by name:

return u_kn, f_i, kitchen_sink
