choderalab / bayesian-itc

Python tools for the analysis and modeling of isothermal titration calorimetry (ITC) experiments.

License: GNU General Public License v3.0


bayesian-itc's Introduction

Bayesian ITC

A Python library for the Bayesian analysis of isothermal titration calorimetry experiments. This library is currently under heavy development, and API changes are frequent. Not all of the available command line options have been implemented yet.

Manifest

  • scripts/bayesitc_integrate.py, a command line utility that integrates data from .itc files using Gaussian process regression.
  • scripts/bayesitc_mcmc.py, a command line utility to run MCMC using one of two available models.
  • bayesitc/ - the library with underlying tools.
  • scripts/bayesitc_util.py, a deprecated command line utility that can be used to both integrate and analyze data

Requirements

Python 2.7+ or Python 3.4+. We recommend using Anaconda as your Python distribution. Some of the requirements will need to be installed using pip.

numpy
nose
pandas
pymc
pint
docopt
schema
scikit-learn

Usage instructions and command line options:

Integrating ITC peaks.

The bayesitc_integrate.py script can integrate the data from .itc files using Gaussian process regression. Below are the options that the program accepts.

Integrate ITC data using Gaussian process regression. Uses MicroCal .itc files, or custom format .yml files for analysing experiments.

Usage:
  bayesitc_integrate.py <datafiles>... [-w <workdir> | --workdir=<workdir>] [-v | -vv | -vvv] [options]
  bayesitc_integrate.py (-h | --help)
  bayesitc_integrate.py --license
  bayesitc_integrate.py --version

Options:
  -h, --help                             Show this screen
  --version                              Show version
  --license                              Show license
  -l <logfile>, --log=<logfile>          File to write logs to. Will be placed in workdir.
  -v                                     Verbose output level. Multiple flags increase verbosity.
  <datafiles>                            Datafile(s) to perform the analysis on, .itc, .yml
  -w <workdir>, --workdir=<workdir>      Directory for output files                      [default: ./]
  -n <name>, --name=<name>               Name for the experiment. Will be used for output files. Defaults to input file name-integrated.dat.
  -i <ins>, --instrument=<ins>           The name of the instrument used for the experiment. Overrides .itc file instrument.
  -f <frac>, --fraction=<frac>           The fraction of the injection to fit, measured from the end [default: 0.2]
  --theta0=<theta0>                      The parameters in the autocorrelation model. [default: 5.0]
  --nugget=<nugget>                      Size of nugget effect to allow smooth predictions from noisy data. [default: 1.0]
  --plot                                 Generate plots of the baseline fit

Sample files have been included to test if the library is functional. You can find them under bayesitc/testdata.

This example command shows how to integrate a .itc file using default settings:

bayesitc_integrate.py bayesitc/testdata/sample.itc -w workdir -v
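The regression idea behind the integration can be sketched in a few lines. This is a minimal illustration assuming an RBF (squared-exponential) kernel; `theta0` and `nugget` loosely mirror the script's options above, but the library's actual implementation may differ.

```python
import numpy as np

def gp_smooth(t, y, theta0=5.0, nugget=1.0):
    """Posterior mean of a zero-mean GP with an RBF kernel, evaluated at the
    training inputs. theta0 controls the kernel width, and the nugget
    regularizes the fit so noisy power data yields a smooth baseline."""
    t = np.asarray(t, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)
    K = np.exp(-theta0 * (t - t.T) ** 2)  # RBF covariance between all input pairs
    alpha = np.linalg.solve(K + nugget * np.eye(len(y)), y)
    return K @ alpha  # smoothed signal at the inputs
```

With a tiny nugget the fit interpolates the data; a larger nugget trades fidelity for smoothness, which is what makes the baseline robust to noise.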


Bayesian inference using Markov chain Monte Carlo

The bayesitc_mcmc.py script runs Markov chain Monte Carlo (MCMC) on the supplied data, using a predefined model. Below are the options that the program accepts.

Analyze ITC data using Markov chain Monte Carlo (MCMC). Uses MicroCal .itc files, or custom format .yml files for modeling experiments.
When running the program, you can select one of two options:

competitive
  A competitive binding model. Requires multiple experiments to be specified.

twocomponent
  A two-component binding model. Analyzes only a single experiment.

Usage:
  bayesitc_mcmc.py twocomponent <datafile> <heatsfile> [-v | -vv | -vvv] [options]
  bayesitc_mcmc.py competitive (<datafile> <heatsfile>)... (-r <receptor> | --receptor <receptor>) [-v | -vv | -vvv] [options]
  bayesitc_mcmc.py (-h | --help)
  bayesitc_mcmc.py --license
  bayesitc_mcmc.py --version

Options:
  -h, --help                             Show this screen
  --version                              Show version
  --license                              Show license
  -l <logfile>, --log=<logfile>          File to write logs to. Will be placed in workdir.
  -v                                     Verbose output level. Multiple flags increase verbosity.
  -w <workdir>, --workdir <workdir>      Directory for output files                      [default: ./]
  -r <receptor>, --receptor <receptor>   The name of the receptor for a competitive binding model.
  -n <name>, --name <name>               Name for the experiment. Will be used for output files. Defaults to inputfile name.
  -i <ins>, --instrument <ins>           The name of the instrument used for the experiment. Overrides .itc file instrument.
  --nfit=<n>                             No. of iteration for maximum a posteriori fit   [default: 20000]
  --niters=<n>                           No. of iterations for mcmc sampling             [default: 2000000]
  --nburn=<n>                            No. of Burn-in iterations for mcmc sampling     [default: 500000]
  --nthin=<n>                            Thinning period for mcmc sampling               [default: 500]

Sample files have been included to test if the library is functional. You can find them under bayesitc/testdata.

For example, here is how to run MCMC on an experiment using a two-component binding model:

python scripts/bayesitc_mcmc.py twocomponent bayesitc/testdata/sample.itc bayesitc/testdata/sample-integrated.dat

Legacy examples:

Below are some examples of using the old bayesitc_util.py script.

An example for a two-component binding model

bayesitc_util.py mcmc bayesitc/testdata/sample.itc -w workdir -v -m TwoComponent --niters=20000 --nburn=1000 --nthin=10 --nfit=100

Analyze .itc files without mcmc

bayesitc_util.py bayesitc/testdata/sample.itc bayesitc/testdata/sample2.itc -w workdir

An example for a competitive binding model.

bayesitc_util.py mcmc bayesitc/testdata/acetyl_pepstatin.yml bayesitc/testdata/kni10033.yml bayesitc/testdata/kni10075.yml -w workdir -v -m Competitive -r "HIV protease" --niters=20000 --nburn=1000 --nthin=10 --nfit=100

bayesian-itc's People

Contributors

bas-rustenburg, jchodera, mikemhenry

bayesian-itc's Issues

Failing travis builds

@bas-rustenburg : Can you take a look at the failing travis builds?

https://travis-ci.org/choderalab/bayesian-itc?utm_source=email&utm_medium=notification

Looks like Python 3.5 is failing:

======================================================================
ERROR: Test the importing of all libraries.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/choderalab/bayesian-itc/bitc/tests/test_bitc.py", line 15, in test_imports
    self._imports('bitc.' + mod)
  File "/home/travis/build/choderalab/bayesian-itc/bitc/tests/test_bitc.py", line 9, in _imports
    self.assertIsInstance(__import__(mod), ModuleType)
  File "/home/travis/build/choderalab/bayesian-itc/bitc/models.py", line 11, in <module>
    import pymc
  File "/home/travis/miniconda3/envs/test/lib/python3.5/site-packages/pymc/__init__.py", line 30, in <module>
    from .CommonDeterministics import *
  File "/home/travis/miniconda3/envs/test/lib/python3.5/site-packages/pymc/CommonDeterministics.py", line 21, in <module>
    from .utils import safe_len, stukel_logit, stukel_invlogit, logit, invlogit, value, find_element
  File "/home/travis/miniconda3/envs/test/lib/python3.5/site-packages/pymc/utils.py", line 14, in <module>
    from . import flib
ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
----------------------------------------------------------------------

It looks like the .travis.yml is out of date. Can you update it to

  • Use omnia and conda-forge
  • Test 2.7, 3.5, 3.6
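A sketch of what the updated CI matrix might look like, using the channels suggested above; the layout below is illustrative, not the repo's actual .travis.yml:

```yaml
language: python
python:
  - "2.7"
  - "3.5"
  - "3.6"
install:
  # pull dependencies from the omnia and conda-forge channels as suggested
  - conda config --add channels omnia
  - conda config --add channels conda-forge
  - conda install --yes numpy pandas pymc pint scikit-learn nose
script:
  - nosetests bitc
```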

What determines initial log_sigma guess?

Right now, it seems to be the log of the standard deviation of the last 4 injection heats. The assumption would be that the last four injections are all the same, since they're dilution/mechanical heats (if I interpret this correctly). Is there any way to select that number, "4", in a more intelligent fashion? Or even estimate the noise differently, e.g. as deviation from a baseline model?

https://github.com/choderalab/bayesian-itc/blob/master/python/models.py#L188

Since it's just an initial guess for the noise, it might not be worth spending time on optimizing this. I'm just curious about possible approaches.
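For reference, the current heuristic amounts to something like the following sketch. The function name and signature are made up here, not the library's code:

```python
import math
import statistics

def initial_log_sigma(injection_heats, n_last=4):
    # current heuristic: log of the sample standard deviation of the last
    # n_last injection heats, which are assumed to be pure
    # dilution/mechanical heats once the receptor is saturated
    tail = injection_heats[-n_last:]
    return math.log(statistics.stdev(tail))
```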

Review parts of code

I left a number of review and todo Easter eggs (# style comments) marking parts of the code that should be inspected carefully, because the implementation might be a little flaky.

How to use this

Could you tell me how to use the analysis program?
I don't know how to open this program.
Do I need to install a new Python version?

I have studied microtubule-microtubule stabilizing interactions using ITC, so I want to analyze this data as soon as possible.
Please help me with this problem.

Constants.absolute_zero is not defined

Where does constants.absolute_zero get defined?

When I run the example ITC.py in the sampl4_notebook folder, I get an AttributeError: module 'constants' has no attribute 'absolute_zero'.

bitc_util.py not writing out all statistics

Some values are not properly extracted and are returned as 0.

DeltaH:      -430.72 +-   274.65 cal/mol     [ -920.26,   157.12] 
H_0:       -0.00 +-     0.00 cal         [   -0.00,     0.00] 
Ls:         2.06 +-     0.20 mM           [    1.68,     2.48] 
sigma:   0.00000 +-  0.00000 ucal/s^(1/2) [ 0.00000,  0.00000] 

How to use this software

Could you tell me how to use this software? When I tried to use it, I could not get it to work.

I have studied microtubule-microtubule stabilizing factors using ITC, so I want to use this analysis software.

Error for experiments without syringe concentration

See :

EXPERIMENT

Source filename: /home/bas/Git/choderalab/host-guest/SAMPL4-CB7/itc/data/02042015/20150204a15.itc
Number of injections: 11
Target temperature: 298.1 K
Equilibration time before first injection: 300.0 s
Syringe concentration: 0.000 mM
Cell concentration: 0.029 mM
Cell volume: 0.203 ml
Reference power: 5.000 ucal/s

INJECTIONS

       injection              volume (uL)             duration (s)      collection time (s)            time step (s)      evolved heat (ucal)
               1                    0.200                    0.400                   60.000                    1.000                    0.019
               2                    3.000                    6.000                  120.000                    0.500                    0.068
               3                    3.000                    6.000                  120.000                    0.500                    0.112
               4                    3.000                    6.000                  120.000                    0.500                    0.134
               5                    3.000                    6.000                  120.000                    0.500                    0.061
               6                    3.000                    6.000                  120.000                    0.500                    0.014
               7                    3.000                    6.000                  120.000                    0.500                    0.080
               8                    3.000                    6.000                  120.000                    0.500                    0.079
               9                    3.000                    6.000                  120.000                    0.500                    0.036
              10                    3.000                    6.000                  120.000                    0.500                    0.073
              11                    3.000                    6.000                  120.000                    0.500                    0.045

ERROR::bitc_util:L152
math domain error
ERROR::bitc_util:L153
Traceback (most recent call last):
  File "bitc_util.py", line 150, in <module>
    model = Model(experiment, instrument)
  File "/home/bas/anaconda/lib/python2.7/site-packages/bitc-0.0.0-py2.7.egg/bitc/models.py", line 220, in __init__
    self.Ls = pymc.Lognormal('Ls', mu=log(Ls_stated / ureg.micromolar), tau=1.0/log(1.0+(dLs/Ls_stated)**2), value=Ls_stated / ureg.micromolar)
ValueError: math domain error

Traceback (most recent call last):
  File "bitc_util.py", line 154, in <module>
    raise Exception("MCMC model could not me constructed!\n" + str(e))
Exception: MCMC model could not me constructed!
math domain error

This is likely easily fixed by an if statement.
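A sketch of the kind of guard that would avoid the error. The function name and fallback behavior here are hypothetical, but the mu/tau expressions mirror the Lognormal construction shown in the traceback:

```python
import math

def lognormal_params_or_none(Ls_stated, dLs):
    # guard: math.log(Ls_stated) raises "math domain error" when the stated
    # syringe concentration is zero (or negative), as in the experiment above
    if Ls_stated <= 0.0:
        return None  # caller should skip or replace the titrant prior
    mu = math.log(Ls_stated)
    tau = 1.0 / math.log(1.0 + (dLs / Ls_stated) ** 2)
    return mu, tau
```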

Reimplement broken command-line options

Some options, like -q (heats files) and -n (names), and possibly others, no longer work because the interface has changed to handle multiple experiments.

For the full set of changes, see #46.

Displacement volumes of iTC200 and Auto-iTC200

Is this data anywhere? (Is there a published reference for this?)

The V0 for these machines is around 200 µL, but I remember using 202.8 µL in my host-guest calculations (link). Does that include the volume correction, and what is the source for that number?

Doctest failure?

I'm noting a doctest failure:

[LSKI1497:~/projects/bayesian-itc/bayesian-itc.choderalab] choderaj% nosetests --with-doctest
F./Users/choderaj/anaconda/lib/python2.7/unittest/case.py:340: RuntimeWarning: TestResult has no addExpectedFailure method, reporting as passes
  RuntimeWarning)
................................
======================================================================
FAIL: Doctest: bitc.models.CompetitiveBindingModel.equilibrium_concentrations
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/choderaj/anaconda/lib/python2.7/doctest.py", line 2226, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for bitc.models.CompetitiveBindingModel.equilibrium_concentrations
  File "/Users/choderaj/projects/bayesian-itc/bayesian-itc.choderalab/bitc/models.py", line 634, in equilibrium_concentrations

----------------------------------------------------------------------
File "/Users/choderaj/projects/bayesian-itc/bayesian-itc.choderalab/bitc/models.py", line 655, in bitc.models.CompetitiveBindingModel.equilibrium_concentrations
Failed example:
    C_PLn = equilibrium_concentrations(Ka_n, x_R, x_Ln, V)
Exception raised:
    Traceback (most recent call last):
      File "/Users/choderaj/anaconda/lib/python2.7/doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest bitc.models.CompetitiveBindingModel.equilibrium_concentrations[4]>", line 1, in <module>
        C_PLn = equilibrium_concentrations(Ka_n, x_R, x_Ln, V)
    NameError: name 'equilibrium_concentrations' is not defined

-------------------- >> begin captured logging << --------------------
bitc.units: DEBUG: Using custom pint definition for molar unit.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 34 tests in 3.451s

FAILED (failures=1)

Making ITC code faster

Hey,

I was messing around with the code and discovered that stripping the units from the expected_injection_heats improves performance by something like two orders of magnitude. In other words, calling the function below (with units stripped in a wrapper):

    # excerpt from a model class in bitc/models.py; numpy and exp (from numpy)
    # are module-level imports there
    @staticmethod
    def _calculate_expected_heats(P0, V0, Ls, DeltaG, DeltaH, DeltaH_0, DeltaVn, beta, N):

        Kd = exp(beta * DeltaG)

        Pn = numpy.zeros([N])
        Ln = numpy.zeros([N])
        PLn = numpy.zeros([N])

        dcum = 1.0
        for n in range(N):
            d = 1.0 - (DeltaVn[n] / V0)  # dilution factor for this injection (dimensionless)
            dcum *= d  # cumulative dilution factor
            P = V0 * P0 * dcum  # total quantity of protein in sample cell after n injections (mol)
            L = V0 * Ls * (1.0 - dcum)  # total quantity of ligand in sample cell after n injections (mol)
            PLn[n] = 0.5 / V0 * ((P + L + Kd * V0) - ((P + L + Kd * V0) ** 2 - 4 * P * L) ** 0.5)  # complex concentration (M)
            Pn[n] = P / V0 - PLn[n]  # free protein concentration in sample cell after n injections (M)
            Ln[n] = L / V0 - PLn[n]  # free ligand concentration in sample cell after n injections (M)

        q_n = numpy.zeros([N])  # q_n[n] is the expected heat from injection n
        q_n[0] = DeltaH * V0 * PLn[0] + DeltaH_0  # first injection  # review: removed a factor 1000 here, presumably leftover from a unit conversion
        for n in range(1, N):
            d = 1.0 - (DeltaVn[n] / V0)  # dilution factor (dimensionless)
            q_n[n] = DeltaH * V0 * (PLn[n] - d * PLn[n - 1]) + DeltaH_0  # subsequent injections  # review: removed a factor 1000 here, presumably leftover from a unit conversion
        return q_n
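The closed-form complex concentration used inside the loop can be checked in isolation. A minimal restatement (not the library's API):

```python
import math

def complex_concentration(Ptot, Ltot, Kd):
    """Equilibrium [PL] for one-site binding P + L <-> PL, from the quadratic
    in total concentrations Ptot, Ltot and dissociation constant Kd (all M)."""
    b = Ptot + Ltot + Kd
    return 0.5 * (b - math.sqrt(b * b - 4.0 * Ptot * Ltot))
```

In the tight-binding limit (Kd → 0) this approaches min(Ptot, Ltot), and in the weak-binding limit it approaches Ptot·Ltot/Kd, which is a quick sanity check on the expression.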

Baseline fitting algorithm

The currently implemented algorithm in python/bayesianitc/experiments does not perform well.

@pgrinaway experimented once with gaussian processes.

That said, we can also use origin files now to override the integrated heats. NITPIC also produces a .dat file with origin style formatting that ITC.py can read with the -q command line option.

Optimize CompetitiveBindingModel

Did a little profiling; here is the output ordered by total internal time. Pint units are a big cost. I was running verbose, so arrayprint is also a top hit.

workdir/bitc.profile% sort tottime
workdir/bitc.profile% stats 10
Wed Feb 18 21:55:14 2015    workdir/bitc.profile

         158007904 function calls (156067107 primitive calls) in 209.440 seconds

   Ordered by: internal time
   List reduced from 5159 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   683637   21.679    0.000   41.510    0.000 /home/bas/anaconda/lib/python2.7/site-packages/numpy/core/arrayprint.py:547(fillFormat)
   987718   15.600    0.000   93.743    0.000 /home/bas/anaconda/lib/python2.7/site-packages/Pint-0.6-py2.7.egg/pint/quantity.py:981(__getattr__)
 34221455   11.034    0.000   12.454    0.000 {isinstance}
  2321267   10.394    0.000   10.394    0.000 {method 'reduce' of 'numpy.ufunc' objects}
4236243/4184919    7.559    0.000   21.419    0.000 /home/bas/anaconda/lib/python2.7/site-packages/Pint-0.6-py2.7.egg/pint/quantity.py:78(__new__)
   515637    6.364    0.000   11.861    0.000 /home/bas/anaconda/lib/python2.7/site-packages/numpy/core/arrayprint.py:598(__call__)
3343844/1844002    6.022    0.000   17.725    0.000 /home/bas/anaconda/lib/python2.7/copy.py:66(copy)
  5391245    5.252    0.000   12.280    0.000 /home/bas/anaconda/lib/python2.7/site-packages/Pint-0.6-py2.7.egg/pint/compat/__init__.py:85(_to_magnitude)
   956909    5.054    0.000   32.232    0.000 /home/bas/anaconda/lib/python2.7/site-packages/Pint-0.6-py2.7.egg/pint/quantity.py:559(_mul_div)
  2398555    4.586    0.000   10.041    0.000 /home/bas/anaconda/lib/python2.7/site-packages/numpy/core/numeric.py:2320(seterr)

And here is the output ordered by cumulative time; the lambda, expected_injection_heats, and equilibrium_concentrations functions are taking a long time.

workdir/bitc.profile% sort cumulative
workdir/bitc.profile% stats 10
Wed Feb 18 21:55:14 2015    workdir/bitc.profile

         158007904 function calls (156067107 primitive calls) in 209.440 seconds

   Ordered by: cumulative time
   List reduced from 5159 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002  209.520  209.520 scripts/bitc_util.py:28(<module>)
      201    0.001    0.000  207.866    1.034 bitc/models.py:463(<lambda>)
      201    0.626    0.003  207.865    1.034 bitc/models.py:622(expected_injection_heats)
      154    0.001    0.000  202.349    1.314 /home/bas/anaconda/lib/python2.7/site-packages/pymc/Node.py:25(logp_of_set)
     1267    0.002    0.000  202.348    0.160 /home/bas/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py:903(get_logp)
1786/1385    0.003    0.000  202.346    0.146 {method 'get' of 'pymc.LazyFunction.LazyFunction' objects}
1917/1090    0.002    0.000  202.338    0.186 /home/bas/anaconda/lib/python2.7/site-packages/pymc/Container.py:539(get_value)
1917/1090    0.004    0.000  202.336    0.186 {method 'run' of 'pymc.Container_values.DCValue' objects}
      519    0.002    0.000  202.334    0.390 /home/bas/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py:464(get_value)
     5762    0.351    0.000  185.745    0.032 bitc/models.py:506(equilibrium_concentrations)
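For anyone wanting to reproduce this, the numbers above come from the standard cProfile/pstats workflow. A self-contained sketch on a stand-in workload:

```python
import cProfile
import io
import pstats

def workload():
    # stand-in for a model evaluation; profile the real entry point instead
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# equivalent of "sort tottime" / "sort cumulative" plus "stats 10" in the
# interactive pstats browser
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```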

Naproxen experiments

Sodium concentration seems to affect the affinity
See this paper:
http://pubs.acs.org/doi/pdf/10.1021/jp062734p
Table 3 in particular.

Since we're using 100mM NaPO3, what shall I assume for the affinity in our guess?

Also, I looked up our stock; it seems we have naproxen as a powder.
I looked up the solubility for our particular item:
Soluble in DMSO, dimethyl formamide, and 100% ethanol.
Should we be worried? It does not mention water, but naproxen seems rather polar (carboxylic acid group, etc.).

This page suggests low solubility.
http://toxnet.nlm.nih.gov/cgi-bin/sis/search2/r?dbs+hsdb:@term+@rn+@rel+22204-53-1

Documentation strings

Documentation strings need to be updated. There have been many changes in the API in order to make the code more portable.

Add a global option for binding models to select titrant mixing model

Would be useful to have a categorical selection of titrant mixing model, e.g.:

  • perfusion with instantaneous mixing
  • perfusion with post-injection mixing
  • non-perfusion

These could be separated out into separate functions that take a mixing argument, so that each binding model doesn't need to keep reimplementing them.

Later, we can do automatic selection of the model that best fits the data.
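One possible shape for that shared helper. The mixing-model names follow the list above, but the function and the formulas are an illustrative sketch, not an implemented API:

```python
def dilution_factor(DeltaV, V0, mixing="instantaneous"):
    """Per-injection dilution factor d (cumulative contents scale by prod(d)).
    All names and formulas here are illustrative assumptions."""
    if mixing == "instantaneous":
        # perfusion cell: injected volume displaces fully mixed cell contents
        return 1.0 - DeltaV / V0
    if mixing == "post-injection":
        # mixing only after the injection: on average roughly half of the
        # displaced volume is fresh titrant rather than old cell contents
        return 1.0 - 0.5 * DeltaV / V0
    if mixing == "non-perfusion":
        # overfilled (non-perfusion) cell: nothing is displaced
        return 1.0
    raise ValueError("unknown mixing model: %s" % mixing)
```

A binding model would then call `dilution_factor` once per injection instead of hard-coding one displacement assumption.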

Finite difference vs. autodiff?

Just taking a random browse through the code here, and I'm curious about this function:

    def odegrad(c_n, t, Ka_n, x_Ln, x_R):
        # V (the cell volume) is taken from the enclosing scope in models.py
        N = c_n.size
        d2c = numpy.zeros([N, N], numpy.float64)
        for n in range(N):
            d2c[n, :] = -Ka_n[n] * (x_Ln[n] / V - c_n[n])
            d2c[n, n] += -(Ka_n[n] * (x_R / V - c_n[:].sum()) + 1.0)
        return d2c

in models.py. Would it be better to use a library that does automatic differentiation here, both for efficiency and accuracy? Admittedly somewhat low priority at the moment, but I thought I'd note it in case it turns out to be worthwhile.
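Short of adopting an autodiff library, one cheap way to vet a hand-written gradient like this is a central finite-difference check. A generic sketch, not part of the codebase:

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f: R^n -> R^m at x, useful for
    spot-checking an analytic gradient such as odegrad."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return J
```

Comparing `fd_jacobian(lambda c: odeq(c, ...), c_n)` against `odegrad(c_n, ...)` entrywise would catch sign or indexing mistakes before worrying about autodiff.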

What step method should we be using?

Copy-paste from #53

Here is something interesting.

I took a look at the different step methods for MCMC. Here is the host concentration, estimated at 1.0157 mM in this particular experiment (I just picked a host-guest titration at random from one of our previous sets of experiments).

Settings:
Using the TwoComponentBindingModel: about 320,000 iterations (I stopped it at around 16% of 2,000,000), 10,000 burn-in iterations, and a thinning period of 25. Not sure how good those settings are, but there is quite a striking difference.

Using the RescalingStep:

[trace plot of Ls]
Note that the value on the y-axis is offset by 1.0124

Using a default pymc Metropolis step:

[trace plot of Ls]
