Sam

Sam is a flexible MCMC sampler for Python and Cython (written in the latter), released under the MIT license. It was designed with astrophysics use cases in mind, as an alternative for situations where the popular emcee sampler is not a good fit. It implements multiple MCMC algorithms:

  1. Hamiltonian Monte Carlo (HMC)
  2. Metropolis-Hastings
  3. Adaptive Metropolis
  4. Gibbs

In addition, the sampler supports Gaussian process surrogate sampling, in which a Gaussian process is used to model the posterior probability, reducing the number of likelihood evaluations required when they are expensive. The surrogate includes gradient estimation, so HMC can be used with it even if the underlying posterior's gradients are unknown.
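
To illustrate the idea (this is only a conceptual sketch in plain Numpy, not Sam's surrogate interface; all names and hyperparameters below are illustrative), a Gaussian process fit to a handful of cached log-posterior evaluations yields both cheap predictions and an analytic gradient of the predicted log posterior:

import numpy as np

# Conceptual sketch: a squared-exponential GP fit to cached log-posterior
# evaluations.  Sam's actual surrogate machinery is implemented in Cython.
lengthScale, signalVar, noiseVar = 1.0, 1.0, 1e-8

def kernel(a, b):
    # Squared-exponential covariance between two sets of points.
    sqDist = ((a[:, None, :] - b[None, :, :])**2).sum(-1)
    return signalVar * np.exp(-0.5 * sqDist / lengthScale**2)

# Points where the expensive log posterior has already been evaluated,
# with a simple stand-in for the expensive calls themselves:
thetaTrain = np.random.randn(25, 2)
logPostTrain = -0.5 * (thetaTrain**2).sum(axis=1)

K = kernel(thetaTrain, thetaTrain) + noiseVar * np.eye(len(thetaTrain))
alpha = np.linalg.solve(K, logPostTrain)

def surrogateLogProb(theta):
    # Cheap GP estimate of the log posterior at a new point.
    return kernel(theta[None, :], thetaTrain)[0] @ alpha

def surrogateGradient(theta):
    # Gradient of the GP mean -- this is what allows HMC to run on the
    # surrogate even when the true posterior's gradients are unavailable.
    kStar = kernel(theta[None, :], thetaTrain)[0]
    dkStar = -(theta[None, :] - thetaTrain) / lengthScale**2 * kStar[:, None]
    return dkStar.T @ alpha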

Documentation

The full documentation can be found on Readthedocs at https://sam-mcmc.readthedocs.io.

Installation

Sam requires the following libraries to compile:

  • Cython
  • Numpy
  • Boost -- specifically the Random, Special, and Math libraries.
  • The Python dev libraries (e.g. python-dev or python3-dev)

Once these are installed, you can install any remaining missing Python dependencies (Scipy; the multiprocessing module ships with Python) and compile Sam by running the following command from the Sam directory:

pip install --user .

If you prefer not to use pip, you may instead use:

python setup.py install --user

Finally, for a system-wide install, you may omit --user, although this may require elevated privileges.
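
To check that the extension compiled and imports cleanly, a minimal test (assuming the install above succeeded) is:

import sam
print(sam.Sam)   # should print the Sam sampler class without raising an ImportError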

Basic usage

  1. Define the log probability of the target distribution (the posterior). This is the difficult part, and a proper discussion is well outside the scope of these instructions; Gelman et al.'s Bayesian Data Analysis gives a good overview of Bayesian statistics. Once you know what you want, create a function that takes a vector of parameters and returns the log probability. You will likely want to use either the Scipy statistics functions or the builtin Sam logPDF functions. For example (a quick sanity check follows the snippet):
import sam
from scipy.stats import norm

def logProb(theta):
    # 'data' is assumed to be an array of observations defined elsewhere.
    likelihood = norm.logpdf(data, theta[0], theta[1]).sum()
    prior = sam.normalLogPDF(theta[0], 0, 2) + sam.exponentialLogPDF(theta[1], 3)
    return likelihood + prior
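
As a quick sanity check of the function above (the data array here is purely hypothetical, standing in for whatever observations you actually have):

import numpy as np

data = 0.3 + 1.5*np.random.randn(50)      # hypothetical observations for testing
print(logProb(np.array([0.3, 1.5])))      # should print a single finite float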
  2. Initialize the sampler. You will need to provide the log probability function as well as a scale vector, a tuning parameter of the same length as the parameter vector. It serves different purposes depending on the sampler, but a rough guess of the posterior standard deviation usually works well; in the Metropolis-Hastings sampler, for example, it sets the default proposal standard deviation. You can optionally pick a sampler to use at this point, or stick with the default Metropolis sampler.
s = sam.Sam(logProb,[1.,1.])
# Optionally:
# s.addAdaptiveMetropolis()
# s.addMetropolis(proposalCovariance)
# s.addHMC(steps=10,stepSize=.1)
  3. Run the sampler. You must provide the number of samples to draw and an initial position to begin sampling from. You may run several identical samplers in parallel using the threads keyword.
s.run(10000, x=[1.,1.], burnIn=1000, thinning=10, threads=2)
  4. Examine the results. Once sampling is complete, the samples can be accessed as s.samples and s.results (the thread-flattened version; the two are the same if only one thread was used). Sam provides a few functions to assist in saving the results and checking for convergence. First, calling s.summary() prints a brief summary of the posterior sample for each dimension, including the acceptance fraction, Gelman-Rubin diagnostic, mean, standard deviation, and some percentiles. Additionally, s.save("filename.npz") saves the samples and supporting information to a Numpy .npz file. For example:
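As a minimal sketch of this step (assuming the sampler s from step 3 has finished running, and that s.results is returned as a Numpy array with one column per parameter):

import numpy as np

s.summary()                        # acceptance rate, Gelman-Rubin, mean, std., percentiles
flat = s.results                   # thread-flattened samples
print(flat.mean(axis=0))           # posterior mean of each parameter
s.save("filename.npz")             # write samples and supporting information to disk

# The saved file is an ordinary Numpy archive and can be reloaded later:
archive = np.load("filename.npz")
print(archive.files)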

Examples

First, consider a very simple case, sampling from a normal distribution:

import sam

def logProb(x):
    return sam.normalLogPDF(x[0],3.5,1.2)

s = sam.Sam(logProb,scale=[.5])
s.run(100000,x=[0.])
s.summary()

# Sampling: <==========> (100000 / 100000)          
# Dim. Accept    GR* |       Mean       Std. |        16%        50%        84%
# 0     86.7%  1.001 |       3.47      1.204 |      2.277      3.465      4.667

For a more involved case, here we fit a logistic regression model to some data using an adaptive Metropolis sampler on four parallel chains:

import numpy as np
import sam
from scipy.special import expit
from scipy.stats import bernoulli, norm

# Create data to use for sampling
betaTrue = np.array([7.5, -1.2])
n = 100
x = np.column_stack([np.ones(n), 10*np.random.rand(n)])
y = bernoulli.rvs(expit(np.matmul(x, betaTrue)))

def logProb(beta):
    logLikelihood = bernoulli.logpmf(y, expit(np.matmul(x, beta))).sum()
    logPrior = norm.logpdf(beta, [0, 0], 5.).sum()
    return logLikelihood + logPrior

# Run the MCMC
s = sam.Sam(logProb,[1.,1.])
s.addAdaptiveMetropolis()
s.run(10000,x=[1.,1.],burnIn=1000,thinning=10,threads=4)
s.summary()

# Sampling: <==========> (10000 / 10000)          
# Sampling: <==========> (10000 / 10000)          
# Sampling: <==========> (10000 / 10000)          
# Sampling: <==========> (10000 / 10000)          
# Dim. Accept     GR |       Mean       Std. |        16%        50%        84%
# 0      5.9%  1.001 |      7.914      1.626 |       6.33      7.761      9.544
# 1      5.9%  1.001 |     -1.144     0.2328 |     -1.374     -1.124    -0.9185
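
Once the run finishes, the thread-flattened samples can be compared with the parameters used to generate the data. This sketch assumes s.results comes back as a Numpy array with one column per parameter:

import numpy as np

posterior = s.results
print("Posterior medians:", np.median(posterior, axis=0))
print("True values:      ", betaTrue)
print("68% intervals:")
print(np.percentile(posterior, [16, 84], axis=0))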
