aloctavodia / bap

Bayesian Analysis with Python (Second Edition)

Home Page: https://www.amazon.com/dp/B07HHBCR9G

License: MIT License

Jupyter Notebook 99.99% Python 0.01%

arviz bayesian-analysis data-analysis data-visualization errata pymc3 python

bap's Issues

Add pip to bap.yml file

I just ran conda env create -f bap.yml (conda version 4.8.3) and it complained

Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies.  Conda may not use the correct pip to install your packages, and they may end up in the wrong place.  Please add an explicit pip dependency.  I'm adding one for you, but still nagging you.

Might as well stop conda from nagging us?

Chapter 4 - exercises question 8

trace_logistic = pm.sample(size=2000)

should be

trace_logistic = pm.sample(2000)

This figure shows the samples of theta (the expected mean of the Poisson) against the number of children.
So it is not a plot of posterior predictive samples of fish_caught, because $\psi$, the probability of going fishing, is not taken into account. In other words, Figure 4.12 shows the average number of fish caught given that you go fishing. The book uses this plot to explain the role of $\beta_1$, the effect of having a camper. The example illustrates one purpose of the ZIP model: separating the Poisson process from the observed counts.

Brilliant and a good example!

Extras: add Variational example

Chapter 7 Error - LinAlgError: Matrix is not positive definite

Hi - when I run the following code from Chapter 7:

X_new = np.linspace(np.floor(x_1.min()), np.ceil(x_1.max()), 200)[:, None]
with model_iris:
    f_pred = gp.conditional('f_pred', X_new)
    pred_samples = pm.sample_posterior_predictive(
        trace_iris, vars=[f_pred], samples=1000)

I get the following error:

LinAlgError: Matrix is not positive definite

Let me know if you have any thoughts. f_pred seems to be all zeros and I'm not sure how to fix it. I'm on Windows.

This is an outstanding book, thanks!

Here is the whole error message if helpful:

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-27-47a0dbdef087> in <module>
      4     f_pred = gp.conditional('f_pred', X_new)
      5     pred_samples = pm.sample_posterior_predictive(
----> 6         trace_iris, vars=[f_pred], samples=1000)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\sampling.py in sample_posterior_predictive(trace, samples, model, vars, var_names, size, keep_size, random_seed, progressbar)
   1730                 param = _trace[idx % len_trace]
   1731 
-> 1732             values = draw_values(vars, point=param, size=size)
   1733             for k, v in zip(vars, values):
   1734                 ppc_trace_t.insert(k.name, v, idx)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py in draw_values(params, point, size)
    781                     # This may fail for autotransformed RVs, which don't
    782                     # have the random method
--> 783                     value = _draw_value(next_, point=point, givens=temp_givens, size=size)
    784                     givens[next_.name] = (next_, value)
    785                     drawn[(next_, size)] = value

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py in _draw_value(param, point, givens, size)
    945             return point[param.name]
    946         elif hasattr(param, "random") and param.random is not None:
--> 947             return param.random(point=point, size=size)
    948         elif (
    949             hasattr(param, "distribution")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\model.py in __call__(self, *args, **kwargs)
    104 
    105     def __call__(self, *args, **kwargs):
--> 106         return getattr(self.obj, self.method_name)(*args, **kwargs)
    107 
    108 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\multivariate.py in random(self, point, size)
    278 
    279         if self._cov_type == "cov":
--> 280             chol = np.linalg.cholesky(param)
    281         elif self._cov_type == "chol":
    282             chol = param

<__array_function__ internals> in cholesky(*args, **kwargs)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\linalg\linalg.py in cholesky(a)
    762     t, result_t = _commonType(a)
    763     signature = 'D->D' if isComplexType(t) else 'd->d'
--> 764     r = gufunc(a, signature=signature, extobj=extobj)
    765     return wrap(r.astype(result_t, copy=False))
    766 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_nonposdef(err, flag)
     89 
     90 def _raise_linalgerror_nonposdef(err, flag):
---> 91     raise LinAlgError("Matrix is not positive definite")
     92 
     93 def _raise_linalgerror_eigenvalues_nonconvergence(err, flag):

two small typos in equations in chapter 5

in 5.14: I think in the term with the 1/S the sum should run up to 'S' not 's' (but perhaps it is already a capital s and I just have font size issues;))

in 5.18: terms in the second line have the wrong sign, e.g. the q_i term should be '-' and the r_i term '+'

things I'm not sure about:

comparing 5.5 and 5.6 I'm also wondering if in 5.5 pAIC should be p_{AIC}?
5.7 and 5.14 are the only places where you use a \sum^{N}{i} without giving a starting value for i, that is everywhere else you write \sum^{N}{i=1} or just \sum_i without an end value (perhaps in 4.16 the sum should gain a 'k', since it is the only place I found that doesn't have an index). 5.7 is also the only place I could find where the indices are to the right of the sum symbol and not on top and bottom

Typo in chapter 3: Lenght->Length

in data/babies.csv and the following code that uses it.

Will create a PR in a minute

Softmax Regression, results are very different

Hello,

I was interested in the Softmax Regression example that you describe in Chapter 4 of your book, and I tried to rerun the code. In general I am getting much lower accuracy (30% for the non-identifiable model, and 20% for the identifiable one). Any idea what the problem might be?

Google Colab Notebook

Thank you for your great work!

Confusion about the Regression with spatial autocorrelation sample of Ch7

I'm quite confused about what the x value is for each island when computing the exponential quadratic kernel in the sample code.

From the line f = gp.prior('f', X=islands_dist_sqr), the x value for island i should be the ith column of the 2-D matrix islands_dist_sqr; that is, the x values for the kernel function should be vectors (of length 10).

However, the code that generates Fig 7.9 apparently assumes x is a scalar. Moreover, the code that computes the exponential quadratic kernel for Fig 7.9 does not look right: it computes exp(-ℓ (x1-x2)^2), but the definition of the exponential quadratic kernel is exp(-(x1-x2)^2/(2ℓ^2)).
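
To make the two parameterizations in this report concrete, here is a minimal sketch in plain Python. The function names are hypothetical and the length-scale value is arbitrary; the first function is the expression the report attributes to the Fig 7.9 plotting code, the second is the definition quoted in the report. The two only agree if the parameter of the first form is read as 1/(2ℓ^2).

```python
import math

def kernel_plot_form(d, ell):
    """exp(-ell * d**2): the form the report says the Fig 7.9 code computes."""
    return math.exp(-ell * d ** 2)

def kernel_expquad(d, ell):
    """exp(-d**2 / (2 * ell**2)): the definition quoted in the report."""
    return math.exp(-d ** 2 / (2 * ell ** 2))

ell = 0.5                       # arbitrary illustrative length-scale
distances = [0.0, 0.5, 1.0, 2.0]

for d in distances:
    a = kernel_plot_form(d, ell)
    b = kernel_expquad(d, ell)
    # Re-expressing the parameter as 1 / (2 * ell**2) reconciles the two forms.
    c = kernel_plot_form(d, 1 / (2 * ell ** 2))
    print(f"d={d:.1f}  plot form={a:.4f}  definition={b:.4f}  reconciled={c:.4f}")
```

Under this reading, the posterior for ℓ from one form cannot be plugged into the other without converting it first, which may be the source of the confusion.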

Regression with spatial autocorrelation example

The Poisson regression with spatial similarity example on page 267 uses a GP with a Gaussian kernel:

cov = η * pm.gp.cov.ExpQuad(1, ls=ℓ)
gp = pm.gp.Latent(cov_func=cov)
f = gp.prior('f', X=islands_dist_sqr)

I interpret this to mean that the similarity between two islands will be judged by comparing their distances to every other island (which is fine for this small data set). I'm surprised though that the squared distances are used as features (which the ExpQuad kernel would square yet again before scaling, adding up and exponentiating). Also, the plotting code using the posterior samples for the kernel parameters seems to interpret the inputs to the kernel function as distances (not squared distances):
np.median(trace_η) * np.exp(-np.median(trace_ℓ) * xrange**2).

In short, it seems that the model is consistent with something like
f = gp.prior('f', X=islands_dist)
rather than with
f = gp.prior('f', X=islands_dist_sqr)

An additional question: why not add an extra scaling factor (γ, say) to modulate the influence of geography, i.e. μ = pm.math.exp(α + γ * f[index] + β * log_pop)? Is it because the zero-mean assumption on the GP would allow it to produce small values easily if needed?

Thanks, and please accept my apologies if I misunderstood the text.

mauna_loa_CO2.csv not found

Dear Author,
Do you know where mauna_loa_CO2.csv is located?

Thank you so much!

Please update the codes in Chapter 5

Dear Osvaldo,

Thank you very much for writing this book. I really enjoy reading it while learning Bayesian analysis.

Recently I got stuck in two places in Chapter 5; it seems the current code does not work with pymc3 3.8.

In the section Using Sequential Monte Carlo to compute Bayes factors:

model_BF_0.marginal_likelihood / model_BF_1.marginal_likelihood

should possibly be

np.exp(trace_BF_0.report.log_marginal_likelihood -
       trace_BF_1.report.log_marginal_likelihood)

In the section Bayes factors and Information Criteria:

I guess there is something wrong with the code block for Figure 5.13:

fig, ax = plt.subplots(1, 2, sharey=True)
labels = model_names
indices = [0, 0, 1, 1]

for i, (ind, d) in enumerate(zip(indices, waics)):
    mean = d.waic
    ax[ind].errorbar(mean, -i, xerr=d.waic_se, fmt='o')
    ax[ind].text(mean, -i+0.2, labels[i], ha='center')

ax[0].set_xlim(30, 50)
ax[1].set_xlim(330, 400)
plt.ylim([-i-0.5, 0.5])
plt.yticks([])
plt.subplots_adjust(wspace=0.05)
fig.text(0.5, 0, 'Deviance', ha='center', fontsize=14)

So please update this code when you have time.

Thank you.
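
For the Bayes factor part of this report, the suggested replacement works on the log scale: subtract the log marginal likelihoods, then exponentiate. The sketch below illustrates only that arithmetic. It does not use PyMC3 or Sequential Monte Carlo; as an assumption made for illustration, it uses a coin-flip model with Beta priors, where the marginal likelihood has a closed form. The data and prior parameters are made up.

```python
import math

def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_likelihood(heads, tails, a, b):
    """Log marginal likelihood of an ordered coin-flip sequence under a Beta(a, b) prior."""
    return log_beta(a + heads, b + tails) - log_beta(a, b)

heads, tails = 9, 21   # illustrative data, not taken from the book

log_ml_0 = log_marginal_likelihood(heads, tails, a=4, b=8)   # prior favours tails
log_ml_1 = log_marginal_likelihood(heads, tails, a=8, b=4)   # prior favours heads

# Same pattern as np.exp(log_ml_0 - log_ml_1) in the report above.
bayes_factor = math.exp(log_ml_0 - log_ml_1)
print(f"BF_01 = {bayes_factor:.2f}")   # prints "BF_01 = 11.25"
```

Working with log marginal likelihoods avoids underflow when the likelihoods themselves are tiny, which is presumably why the sampler report exposes the log value.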

Probability of superiority in Chapter2

Hi!
Thank you for the book!
In Chapter 2 I reached opposite conclusions from the HPD and from the probability of superiority.
I believe this is because Cohen's d may be negative, which affects the cumulative normal distribution used to calculate the probability of superiority.

For example, based on the HPD I can say that the average tip size differs between Thursday and Sunday. But based on the probability of superiority there is only a 39% chance that a person picked at random from the Sunday group will leave a higher tip than a person picked at random from the Thursday group.
However, when I pass the absolute value of Cohen's d to the cumulative normal distribution, the conclusions based on the probability of superiority are consistent with those based on the HPD.

So my suggestion: ps = dist.cdf(np.abs(d_cohen)/(2**0.5))
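
The effect of the sign can be checked with the standard library alone. The sketch below assumes the probability of superiority is computed as Φ(d/√2), with Φ the standard normal CDF, as in the suggestion above; the value of d is illustrative and the function name is hypothetical.

```python
from statistics import NormalDist

def probability_of_superiority(d_cohen):
    """Phi(d / sqrt(2)), where Phi is the standard normal CDF."""
    return NormalDist().cdf(d_cohen / 2 ** 0.5)

d = -0.4  # illustrative value only; the sign depends on the order of subtraction

ps_signed = probability_of_superiority(d)        # first group vs second group
ps_flipped = probability_of_superiority(-d)      # second group vs first group
ps_abs = probability_of_superiority(abs(d))      # group with the larger mean vs the other

print(f"signed:  {ps_signed:.3f}")
print(f"flipped: {ps_flipped:.3f}")
print(f"abs:     {ps_abs:.3f}")
```

The signed and flipped values sum to one, so they describe the same comparison from opposite directions. Taking the absolute value always reports the probability for the group with the larger mean, which is a choice of convention rather than a different result.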

I also attached screenshots from Jupyter:

Before fix:

After fix:

Should var_names be x_4, y_4?

Looking at model_t2 at the bottom should the array names be x_4 and y_4? If so I can make a PR, just wanted to double check first.

https://github.com/aloctavodia/BAP/blob/master/code/Chp3/03_Modeling%20with%20Linear%20Regressions.ipynb

Fig 3.22 is a duplicate of fig 3.26

Fig 3.22 does not match the text describing it, or the output of the code sample provided. Instead, it is an accidental duplicate of Fig 3.26.

Extras: add non-Gaussian Mixture model

Add link to license in readme

Thought it may be helpful/convenient to have a link to the license in the readme

Error setting up conda/pymc3

My environment:

MacOS 10.15.5
pyenv 1.2.13
miniconda3-4.3.30
miniconda3-4.3.30/envs/bap

When kicking off the notebook for Chapter2, I get the following error in the first cell:

You can find the C code in this temporary file: /var/folders/p6/65r6n4_j64vc0kb0mfszxw8r0000gn/T/theano_compilation_error_k62583q3
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/gof/lazylinker_c.py in <module>
     80                     version,
---> 81                     actual_version, force_compile, _need_reload))
     82 except ImportError:

ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/gof/lazylinker_c.py in <module>
    104                         version,
--> 105                         actual_version, force_compile, _need_reload))
    106         except ImportError:

ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-1-e9446852bb85> in <module>
      4 from scipy import stats
      5 import arviz as az
----> 6 import pymc3 as pm
      7 np.random.seed(123)

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/pymc3/__init__.py in <module>
      3 
      4 from .blocking import *
----> 5 from .distributions import *
      6 from .glm import *
      7 from . import gp

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/pymc3/distributions/__init__.py in <module>
----> 1 from . import timeseries
      2 from . import transforms
      3 
      4 from .continuous import Uniform
      5 from .continuous import Flat

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/pymc3/distributions/timeseries.py in <module>
----> 1 import theano.tensor as tt
      2 from theano import scan
      3 
      4 from pymc3.util import get_variable_name
      5 from .continuous import get_tau_sd, Normal, Flat

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/__init__.py in <module>
    108     object2, utils)
    109 
--> 110 from theano.compile import (
    111     SymbolicInput, In,
    112     SymbolicOutput, Out,

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/compile/__init__.py in <module>
     10 from theano.compile.function_module import *
     11 
---> 12 from theano.compile.mode import *
     13 
     14 from theano.compile.io import *

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/compile/mode.py in <module>
      9 import theano
     10 from theano import gof
---> 11 import theano.gof.vm
     12 from theano import config
     13 from six import string_types

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/gof/vm.py in <module>
    672     if not theano.config.cxx:
    673         raise theano.gof.cmodule.MissingGXX('lazylinker will not be imported if theano.config.cxx is not set.')
--> 674     from . import lazylinker_c
    675 
    676     class CVM(lazylinker_c.CLazyLinker, VM):

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/gof/lazylinker_c.py in <module>
    138             args = cmodule.GCC_compiler.compile_args()
    139             cmodule.GCC_compiler.compile_str(dirname, code, location=loc,
--> 140                                              preargs=args)
    141             # Save version into the __init__.py file.
    142             init_py = os.path.join(loc, '__init__.py')

~/.pyenv/versions/miniconda3-4.3.30/envs/bap/lib/python3.6/site-packages/theano/gof/cmodule.py in compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
   2386             # difficult to read.
   2387             raise Exception('Compilation failed (return status=%s): %s' %
-> 2388                             (status, compile_stderr.replace('\n', '. ')))
   2389         elif config.cmodule.compilation_warning and compile_stderr:
   2390             # Print errors just below the command line.

Exception: Compilation failed (return status=1): In file included from /Users/sashao/.theano/compiledir_Darwin-18.6.0-x86_64-i386-64bit-i386-3.6.7-64/lazylinker_ext/mod.cpp:1:. In file included from /Users/sashao/.pyenv/versions/miniconda3-4.3.30/envs/bap/include/python3.6m/Python.h:25:. /Users/sashao/.pyenv/versions/miniconda3-4.3.30/bin/../include/c++/v1/stdio.h:108:15: fatal error: 'stdio.h' file not found. #include_next <stdio.h>.               ^~~~~~~~~. 1 error generated..

Update: when I try a fresh Python 3.7.3 virtual environment and install pymc3 etc. with pip, it all seems to work:

Extras: Add interaction example

Error while importing arviz from provided conda environment

I just started reading your book and I used the env.yml file to create the conda environment with the required packages.

Unfortunately, importing arviz package results in the following error:

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-2-d3b40e9507e8> in <module>
      3 import pandas as pd
      4 from scipy import stats
----> 5 import arviz as az

~\Anaconda3\envs\bap\lib\site-packages\arviz\__init__.py in <module>
     20     _log.addHandler(handler)
     21 
---> 22 from .data import *
     23 from .plots import *
     24 from .stats import *

~\Anaconda3\envs\bap\lib\site-packages\arviz\data\__init__.py in <module>
      1 """Code for loading and manipulating data structures."""
----> 2 from .inference_data import InferenceData
      3 from .io_netcdf import load_data, save_data
      4 from .datasets import load_arviz_data, list_datasets, clear_data_home
      5 from .base import numpy_to_data_array, dict_to_dataset

~\Anaconda3\envs\bap\lib\site-packages\arviz\data\inference_data.py in <module>
      1 """Data structure for using netcdf groups with xarray."""
----> 2 import netCDF4 as nc
      3 import xarray as xr
      4 
      5 

~\Anaconda3\envs\bap\lib\site-packages\netCDF4\__init__.py in <module>
      1 # init for netCDF4. package
      2 # Docstring comes from extension module _netCDF4.
----> 3 from ._netCDF4 import *
      4 # Need explicit imports for names beginning with underscores
      5 from ._netCDF4 import __doc__, __pdoc__

include\membuf.pyx in init netCDF4._netCDF4()

AttributeError: type object 'netCDF4._netCDF4._MemBuf' has no attribute '__reduce_cython__'

Do we need to install anything else? I work on Windows 10 with Python 3.6.

Thanks in advance.

arviz._fast_kde deprecated

Dear Osvaldo

Thank you for writing BAP. I am learning a lot!

I wanted to point out an error I am getting with this line in the last cell of Chapter 3

density, l, u = az._fast_kde(y_ppc)

_fast_kde is deprecated and no longer exists in arviz 0.9.0 (latest version as of Jul 28 2020). What would be the equivalent in the new versions of arviz? Or maybe we need to use another module to retrieve the kernel density estimate from a pymc3 trace?

Thank you!

within-sample vs out-of-sample accuracy

"For any combination of data and models, the within-sample accuracy will be, on average, smaller than the out-of-sample accuracy." (p. 190)

Shouldn't within-sample and out-of-sample be switched in the above? Perhaps I am reading it wrong.
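
A small simulation can help check which way the inequality goes. This is a sketch under simple assumptions, not an example from the book: the "model" is just the sample mean of normally distributed training data, and accuracy is measured by mean squared error (lower error means higher accuracy).

```python
import random

random.seed(0)

def mse(values, prediction):
    """Mean squared error of a constant prediction."""
    return sum((v - prediction) ** 2 for v in values) / len(values)

n, n_repeats = 10, 2000
within, out = [], []
for _ in range(n_repeats):
    train = [random.gauss(0.0, 1.0) for _ in range(n)]
    test = [random.gauss(0.0, 1.0) for _ in range(n)]
    fitted_mean = sum(train) / n            # "model" fitted on the training sample
    within.append(mse(train, fitted_mean))  # error on the data used for fitting
    out.append(mse(test, fitted_mean))      # error on fresh data

avg_within = sum(within) / n_repeats
avg_out = sum(out) / n_repeats
print(f"average within-sample error: {avg_within:.3f}")
print(f"average out-of-sample error: {avg_out:.3f}")
```

In this setting the within-sample error is smaller on average, that is, the within-sample accuracy is higher, which is consistent with the reading suggested in the question.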

Confounding variables and redundant variables chapter

"As we can see, \beta_2 for model m_x1x2 is around zero, indicating an almost null contribution of the x_2 variable to explain y. "

The image in contrast shows a negative value far away from 0 for this coefficient.

Multivariate normal log-likelihood?

Hi,
I am trying to use a multivariate normal log-likelihood to find the posterior distribution, but I don't know how to use a multivariate normal log-likelihood in pymc3.

I have an equation of input and output:

mu = a * q0 ** 2 + b * q0 * q1 + c * q1 ** 2 - d * q0 * q2 + e * q0 - f * q1 * q2 + g * q2 ** 2 - h * q1 - i * q2 + j

where:
a = -1.2422069376608758
b = 0.010979502286122458
c = 0.005065108695132867
d = 0.0003998832861218647
e = 0.16015398278991752
f = 0.00013362145005939259
g = 2.0112165363058473e-06
h = 0.0020280971401678734
i = 4.0276945348026916e-05
j = 0.01657469140501759

with
q0= [0.05675, 0.05934, 0.05633, 0.0557 , 0.05702, 0.06401, 0.06322, 0.06571, 0.06099, 0.05832, 0.06196, 0.06463, 0.05507, 0.06351, 0.06287, 0.06122, 0.05407, 0.05985, 0.05774,0.06015]

q1 = [0.9486, 0.9095, 0.9856, 0.9318, 1.0477, 1.0489,1.0663, 0.9184, 0.9646, 1.0345, 1.0168, 1.0565, 0.9727, 0.9907, 0.9277, 0.9548, 1.0933, 1.0751,1.0026, 1.0231]

q2= [51.813, 54.279, 52.659, 51.197 , 46.629, 49.791, 48.581, 54.799, 46.413, 47.078, 52.367, 48.204, 50.389, 45.402, 47.893, 50.796 , 49.332, 53.323, 53.713, 45.757]
mu = [0.0204232,0.0205054,0.0204971,0.0204463,0.0206686,0.0206678,0.0206883,0.0204627,0.020426,0.0206532,0.0206322,0.020677,0.0204431,0.0205319,0.0204508,0.0204115,0.020721,0.0206988,0.0206179,0.0206418]
From the variable values above I fitted a surrogate model in the form of the mu equation. I also have normal prior distributions for (q0, q1, q2):
q0 ~ Normal(0.05, 0.06)
q1 ~ Normal(1.0, 0.1)
q2 ~ Normal(50, 5)

How do I find the posterior distribution p(mu | (q0, q1, q2))?

I hope you can help me.

Thank you.

the code "Multiple logistic regression" error

I ran the code from the chapter 4 section "Multiple logistic regression":

with pm.Model() as model_1:
    α = pm.Normal('α', mu=0, sd=10)
    β = pm.Normal('β', mu=0, sd=2, shape=len(x_n))

    μ = α + pm.math.dot(x_1, β)
    θ = pm.Deterministic('θ', 1 / (1 + pm.math.exp(-μ)))
    bd = pm.Deterministic('bd', -α/β[1] - β[0]/β[1] * x_1[:,0])

    yl = pm.Bernoulli('yl', p=θ, observed=y_1)

    trace_1 = pm.sample(2000)

but I get an error:

"Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...

You can find the C code in this temporary file: C:\Users\ADMINI~1\AppData\Local\Temp\theano_compilation_error_09yx1jkm
library blas is not found.
"
Running the other code in "chapter 4" works fine.
Python 3.8, pymc3 3.8, Windows 10

A question on chapter 2

Hi Osvaldo

First off, compliments on your book. I'm reading it on Safari and I decided to purchase the paper edition as well.
Sorry for my silly questions, as I am learning this subject for the first time.

In Model specification (chapter 2) it reads 'For the likelihood, we will use the binomial distribution with n=1'

A binomial distribution with n =1 is equivalent to a Bernoulli one. Right?

In the image 2.1 I would then expect y ~ B(n=1, p=Theta) or Bern(p=Theta) instead of Bern(n=1, p=Theta)

Does that also mean we are re-coding only a special case of the previous chapter, where n was unrestricted?
If so, how could we code the previous chapter's case in PyMC3?
I am a bit confused.

Thank you for your help,
Regards
Fabio
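
The equivalence asked about here can be checked numerically. The sketch below uses only the standard library; the function names and the data are made up for illustration. It first compares Binomial(n=1, p) with Bernoulli(p), then compares a product of Bernoulli terms with the Binomial probability of the total count.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Probability of a single 0/1 outcome."""
    return p if k == 1 else 1 - p

theta = 0.35  # arbitrary illustrative value

# With n=1 the two distributions assign identical probabilities.
for k in (0, 1):
    print(k, binomial_pmf(k, 1, theta), bernoulli_pmf(k, theta))

# For a whole sequence of tosses, the product of Bernoulli terms and the
# Binomial pmf of the total differ only by comb(n, heads), which does not
# depend on theta.
tosses = [1, 0, 0, 1, 0, 0, 0, 1]   # made-up data
n, heads = len(tosses), sum(tosses)
for theta in (0.2, 0.5, 0.8):
    product = 1.0
    for toss in tosses:
        product *= bernoulli_pmf(toss, theta)
    ratio = binomial_pmf(heads, n, theta) / product
    print(theta, ratio, comb(n, heads))
```

Because the two likelihoods differ only by a constant factor, they lead to the same posterior for theta. In PyMC3 the aggregated form would presumably use pm.Binomial with n set to the number of trials and the observed value set to the number of successes.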

Troubleshooting installation issues with M1P MBP, conda environment, and arviz

Hi, I just want to share this in the hope that it helps somebody with the same problem.

I am using an M1P MBP (Ventura 13.2.1).
I ran into some issues when running conda env create -f bap.yml.
Sorry that I can't reproduce the error, but it was about installing hdf5 and netcdf.
I ran brew install hdf5 netcdf first to solve the problem.

The env installed successfully, but I still got an error when trying to import arviz.
The error shows:
ImportError: dlopen(/Users/alvin/opt/anaconda3/envs/bap/lib/python3.6/site-packages/netCDF4/_netCDF4.cpython-36m-darwin.so, 0x0002): symbol not found in flat namespace '_nc_close'
There's also a warning:
WARNING (theano.configdefaults): install mkl with ``conda install mkl-service``: No module named 'mkl'

I ran conda install -c conda-forge netCDF4 to fix them.

Add section for links to books in readme

Thought it may be helpful to add a link to where the books can be found.

A link to the Packt version is in the introduction, and the Amazon link is in the repo's About section (which most people can overlook if they don't know about it).

Thought it may be helpful to have an explicit section in the readme for finding the books, in case readers stumble on the repository first.

Below is an example format, with explanations so people can choose whether they want to use an affiliate link:

Get the book

Non-affiliate links

Packt link (non affiliate link)
Amazon US link (non affiliate link)
Amazon UK link (non affiliate link)

Affiliate links

Packt link (affiliate link if any)
Amazon US link (affiliate link if any)
Amazon UK link (affiliate link if any)

Typos Chapter 3

p.123 (print version), eq. (3.16), I think there is a problem with the dimension of X
- If X is an (m x n)-Matrix (as stated in the book), then beta should be a (1 x m)-vector (row-vector) and eq. (3.16) should read as mu = alpha * ones(1, n) + beta * X. Accordingly, mu is also a (1 x n)-vector (row-vector).
- If X were an (n x m)-Matrix, then eq. (3.16) is OK, i.e., mu = alpha * ones(n, 1) + X * beta. Now, mu is an (n x 1)-vector (column-vector).
p.123, eq. (3.17), I believe the sum should run from i = 1 to n (not m)
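
The second convention in this report, with X as an (n x m) matrix of n observations and m variables, can be written out in plain Python. This is only a sketch with made-up numbers; it does not reproduce the book's equations.

```python
alpha = 1.0
beta = [0.5, -2.0]        # m = 2 coefficients, one per variable
X = [[1.0, 0.0],          # n = 4 observations (rows), m = 2 variables (columns)
     [0.0, 1.0],
     [2.0, 1.0],
     [1.0, 3.0]]

n, m = len(X), len(X[0])

# mu = alpha * ones(n, 1) + X * beta, written element by element:
# one entry of mu per observation, each summing over the m variables.
mu = [alpha + sum(X[i][j] * beta[j] for j in range(m)) for i in range(n)]

print(n, m, len(mu))
print(mu)   # prints [1.5, -1.0, 0.0, -4.5]
```

With this layout mu has n entries, matching the (n x 1) column vector described in the second bullet above.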

Extras: Add workflow example

Page 128

Figure 3.22 and its description paragraph immediately below do not match. The description says m_x2 is around 0.55, whereas the figure shows it around -0.2. Either the figure is wrong or the description is wrong. I think it should be this figure

and not this

A Typo on Cohen's d

Hi Osvaldo,

I found a typo in the Cohen's d definition (2.4) on page 67: sigma2^2 - sigma1^2 (- instead of +).

I'm enjoying your book, but I have some difficulty understanding what the sample method does on the model.

For example, on page 70, when you analyze the tip dataset and write:
trace_cg = pm.sample(5000)
my guess is that 5000 samples are drawn from the mu interval and 5000 from the sigma interval, and that for every such pair of values the procedure described in the first chapter is carried out.

That is to say, the likelihood of the data is evaluated for each pair and multiplied by the prior, giving the posterior distribution of sigma and mu... but then? How do you derive the overall distribution as in figure 2.10? I suppose the values are somehow aggregated, summed and normalized?

Thank you very much for your help in advance,
Bye
Fabio
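
One way to build intuition for this question is a hand-written sampler. The sketch below is a random-walk Metropolis sampler in plain Python, which is an assumption made for illustration: PyMC3 uses NUTS by default for continuous models, not this algorithm, and the data and priors here are made up. It shows that each draw is a joint (mu, sigma) pair and that no grid or explicit normalization is needed.

```python
import math
import random

random.seed(1)
data = [2.1, 3.4, 2.9, 3.8, 2.5, 3.1, 3.3, 2.7]   # made-up observations

def log_posterior(mu, sigma):
    """Unnormalized log posterior: log prior + log likelihood."""
    if sigma <= 0:
        return -math.inf
    # Wide Normal(0, 10) prior on mu and HalfNormal(10) prior on sigma.
    log_prior = -0.5 * (mu / 10) ** 2 - 0.5 * (sigma / 10) ** 2
    log_lik = sum(-math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2 for y in data)
    return log_prior + log_lik

mu, sigma = 0.0, 1.0
current = log_posterior(mu, sigma)
draws = []
for step in range(20000):
    # Propose a nearby (mu, sigma) pair.
    mu_new = mu + random.gauss(0, 0.3)
    sigma_new = sigma + random.gauss(0, 0.3)
    proposed = log_posterior(mu_new, sigma_new)
    # Accept with probability min(1, posterior ratio); otherwise keep the old pair.
    if random.random() < math.exp(min(0.0, proposed - current)):
        mu, sigma, current = mu_new, sigma_new, proposed
    draws.append((mu, sigma))

kept = draws[5000:]                      # discard warm-up draws
mu_draws = [m for m, _ in kept]          # marginal for mu: just read off one coordinate
sigma_draws = [s for _, s in kept]

print(f"posterior mean of mu:    {sum(mu_draws) / len(mu_draws):.2f}")
print(f"posterior mean of sigma: {sum(sigma_draws) / len(sigma_draws):.2f}")
```

The sampler only ever compares the unnormalized posterior at two points, so the normalizing constant cancels. Regions with higher posterior density are visited more often, and a histogram or KDE of the kept draws approximates the posterior, which is roughly what a plot of the trace summarizes.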

Typo: Extra closing parenthesis in eq 5.1 page 187

p(T_{sim} > T_{obs} ) | y)
                     ^^

I guess it should read:

    p(  T_{sim} > T_{obs} | y)

    p(  (T_{sim} > T_{obs}) | y)

Any plans to port to other frameworks ?

Hey @aloctavodia,
I loved your book as a beginner. This repository, in itself, is immensely helpful.
Do you have any plans to port the examples to other frameworks like pyro, edward or pymc4?

How to calculate correlation in pymc3?

Dear you,
I have three variables (nodes1, nodes2, nodes3) and a function f(nodes1, nodes2, nodes3). How do I calculate the correlation between each pair of variables, and between each variable and f(nodes1, nodes2, nodes3)?

The following is my code.

a= -1.24
b=0.011
c=0.005
d=-0.0004
e=0.16
f=-0.00013
g=-2.01e-6
k=-0.002
m=-4.03e-5
l=0.0165


basic_model = pm.Model()

with basic_model:
    nodes1 = pm.Normal('nodes1', 0.06, 0.006)
    nodes2 = pm.Normal('nodes2', 1, 0.06)
    nodes3 = pm.Normal('nodes3', 50.06, 2.88)


    # expected value from the surrogate model
    mu=a*nodes1**2+b*nodes1*nodes2+c*nodes2**2+d*nodes1*nodes3+e*nodes1+f*nodes2*nodes3+g*nodes3**2+k*nodes2+m*nodes3+l

    # likelihood
    y=pm.Normal('y', mu, 1, observed=evals)
    trace = pm.sample(10000, cores =1)

    pm.traceplot(trace,varnames= ['nodes1','nodes2','nodes3'])
    k = pm.summary(trace).round(2)
    pm.plot_posterior (trace, varnames= ['nodes1','nodes2','nodes3'])
    tracedf1 = pm.trace_to_dataframe (trace, varnames = ['nodes1','nodes2','nodes3'])
    sns.pairplot(tracedf1 )
print (k)
plt.show()

thank you
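
One possible approach to the question above is to compute Pearson correlations directly on the posterior draws. The sketch below is plain Python and makes several assumptions: the draws are simulated stand-ins (with a real trace they would come from something like trace['nodes1']), the helper names are hypothetical, and the surrogate function uses the coefficients listed in the question.

```python
import random

random.seed(42)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def surrogate(n1, n2, n3):
    """Quadratic surrogate with the coefficients given in the question."""
    return (-1.24 * n1 ** 2 + 0.011 * n1 * n2 + 0.005 * n2 ** 2
            - 0.0004 * n1 * n3 + 0.16 * n1 - 0.00013 * n2 * n3
            - 2.01e-6 * n3 ** 2 - 0.002 * n2 - 4.03e-5 * n3 + 0.0165)

# Stand-in for posterior draws (independent normals, for illustration only).
n_draws = 5000
samples = {
    'nodes1': [random.gauss(0.06, 0.006) for _ in range(n_draws)],
    'nodes2': [random.gauss(1.0, 0.06) for _ in range(n_draws)],
    'nodes3': [random.gauss(50.06, 2.88) for _ in range(n_draws)],
}
# Evaluate the function draw by draw so it can be correlated like any variable.
samples['f'] = [surrogate(a, b, c) for a, b, c in
                zip(samples['nodes1'], samples['nodes2'], samples['nodes3'])]

names = list(samples)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(f"corr({first}, {second}) = {pearson(samples[first], samples[second]):.3f}")
```

With the tracedf1 DataFrame from the question, pandas' DataFrame.corr() should give the same pairwise matrix once a column for the function values has been added.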

Understanding the plot_trace

Hej,

First I'd like to thank you for a great book! It's a pleasure reading through it.
There is something fundamental that I have not understood and is not explained in the book. This is the ArviZ plot_trace.

For theta there is one blue and one orange line. Why not just one? Is it one per chain?

Softmax function in aesara

Hi,

I was trying to run the code for the Softmax regression example in Chapter 4 using the Aesara (rather than Theano) softmax function. It looks like the aesara.tensor.nnet subpackage has been removed, so I was hoping aesara.tensor.special.softmax might do the trick, but I get "NotImplementedError: Cannot convert μ to a tensor variable." when running the following code:

with pm.Model() as model_s:
    α = pm.Normal('α', mu=0, sigma=5, shape=3)
    β = pm.Normal('β', mu=0, sigma=5, shape=(4,3))
    μ = pm.Deterministic('μ', α + pm.math.dot(x_s, β))
    θ = aesara.tensor.special.softmax(μ)
    yl = pm.Categorical('yl', p=θ, observed=y_s)
    idata_s = pm.sample(2000, target_accept=0.9, return_inferencedata=True)

Any suggestions greatly appreciated.

Thanks,

Rich

[BUG] Link to download of anaconda not rendering correctly

In the Installation section of the readme the link to anaconda does not work correctly

https:/ / [www.](http://www./) anaconda. com/ download/

https://github.com/aloctavodia/BAP#installation

Chapter 4 exercises - Question 6

The plot_hpd :
az.plot_hpd(x_3[:,0], trace_3['bd'], color='k')

fails with :

LinAlgError Traceback (most recent call last)
in
----> 1 az.plot_hpd(x_3[:,0], trace_3['bd'], color='k')
2
3 plt.xlabel(x_n[0])
4 plt.ylabel(x_n[1]);

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/arviz/plots/hpdplot.py in plot_hpd(x, y, credible_interval, color, circular, smooth, smooth_kwargs, fill_kwargs, plot_kwargs, ax, backend, backend_kwargs, show)
98 x_data = np.linspace(x.min(), x.max(), 200)
99 hpd_interp = griddata(x, hpd_, x_data)
--> 100 y_data = savgol_filter(hpd_interp, axis=0, **smooth_kwargs)
101 else:
102 idx = np.argsort(x)

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/scipy/signal/_savitzky_golay.py in savgol_filter(x, window_length, polyorder, deriv, delta, axis, mode, cval)
346 # the last window_length elements.
347 y = convolve1d(x, coeffs, axis=axis, mode="constant")
--> 348 _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
349 else:
350 # Any mode other than 'interp' is passed on to ndimage.convolve1d.

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/scipy/signal/_savitzky_golay.py in _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
219 halflen = window_length // 2
220 _fit_edge(x, 0, window_length, 0, halflen, axis,
--> 221 polyorder, deriv, delta, y)
222 n = x.shape[axis]
223 _fit_edge(x, n - window_length, n, n - halflen, n, axis,

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/scipy/signal/_savitzky_golay.py in _fit_edge(x, window_start, window_stop, interp_start, interp_stop, axis, polyorder, deriv, delta, y)
189 # where '-1' is the same as in xx_edge.
190 poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start),
--> 191 xx_edge, polyorder)
192
193 if deriv > 0:

<array_function internals> in polyfit(*args, **kwargs)

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/numpy/lib/polynomial.py in polyfit(x, y, deg, rcond, full, w, cov)
629 scale = NX.sqrt((lhs*lhs).sum(axis=0))
630 lhs /= scale
--> 631 c, resids, rank, s = lstsq(lhs, rhs, rcond)
632 c = (c.T/scale).T # broadcast scale coefficients
633

<array_function internals> in lstsq(*args, **kwargs)

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/numpy/linalg/linalg.py in lstsq(a, b, rcond)
2266 # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
2267 b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
-> 2268 x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
2269 if m == 0:
2270 x[...] = 0

~/Data/01-Software/03-Python/01-Anaconda3/lib/python3.7/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_lstsq(err, flag)
107
108 def _raise_linalgerror_lstsq(err, flag):
--> 109 raise LinAlgError("SVD did not converge in Linear Least Squares")
110
111 def get_linalg_error_extobj(callback):

LinAlgError: SVD did not converge in Linear Least Squares

Chapter 3 exercises - Question 5

dataset.prior["y,"] raises a KeyError.

Fig 3.26 does not agree with text

Fig 3.26 should show x_2 negatively correlating with y, but there is a slight positive correlation instead. It looks like an accidental duplicate of Fig 3.21.

Typos Chapter 7

First of all, thanks for sharing your knowledge and writing this book. I really enjoyed working through it. I thought you might appreciate it if I let you know about the typos I found. If I am mistaken, I am sorry for the inconvenience.

p. 257 (I am referring to the page numbers of the print version), I think the sentence For the purpose ... returns a value of >zero< i>n< ... should be changed to For the purpose ... returns a value of one if ..., compare to eq. (7.3).
p. 264, I think a code line is missing: trace_reg = pm.sample(...).
p. 269, the text says that you will use the column lon2. The code example, however, uses the lon column, i.e., x_data = [ ... , islands.lon.values[:, None]].

module 'arviz' has no attribute 'geweke'

Whenever I run the code examples in a Jupyter notebook, the cell that imports pymc3 fails with the error "module 'arviz' has no attribute 'geweke'". This happens only when I import pymc3, which every chapter after the first requires. I was able to run only the code examples in chapter 1, because they do not import pymc3.

Attached is a snapshot of the error message:

Please help.

Last two models in ch. 8 do not work

The last two models (GP and GP_periodic) in ch. 8 do not work with pymc3 version 3.3. To fix this, change

trace = pm.sample(1000)

to

trace = pm.sample(1000, njobs=1)

Both models do work with pymc3 version 3.2.

OSError: ../data/mauna_loa_CO2.csv not found.

This data file is missing for chapter 1.

help me attach the bayesian inference!!

Hi,
I am a new pymc3 user, and I am having trouble applying Bayesian inference to my problem.
My problem has a prior that is a uniform distribution on [0,1], and the likelihood is the normal distribution [1.35,0.05]. How do I find the posterior with a Bayesian method using an MCMC sampler? Could you show me how to approach it? Thank you very much.
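
The question above can be sketched without PyMC3 as a hand-written random-walk Metropolis sampler. This is only an illustration under stated assumptions, not the book's or the library's method: it reads "[1.35,0.05]" as a single observation y = 1.35 with known standard deviation 0.05, and treats the unknown mean theta as the parameter with the Uniform(0, 1) prior. The function names, step size, and number of draws are hypothetical choices.

```python
import math
import random


def log_posterior(theta, y=1.35, sigma=0.05):
    """Unnormalized log posterior: Uniform(0, 1) prior times Normal(y | theta, sigma).

    Assumption: y = 1.35 is one observation and sigma = 0.05 is known.
    """
    if not 0.0 <= theta <= 1.0:
        return -math.inf  # the uniform prior has zero density outside [0, 1]
    return -0.5 * ((y - theta) / sigma) ** 2


def metropolis(n_samples=5000, burn_in=1000, step=0.02, seed=0):
    """Random-walk Metropolis sampler for the one-parameter posterior."""
    rng = random.Random(seed)
    theta = 0.5  # start inside the prior support
    current = log_posterior(theta)
    draws = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)  # keep draws only after the burn-in phase
    return draws


samples = metropolis()
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

Under this reading the observation lies outside the prior support, so the posterior is a normal density truncated to [0, 1] and piles up just below 1. If the numbers mean something else (for example, several observations), only log_posterior needs to change.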

Errata.md page numbers shifted

Hey,

I was about to report some typos in the second edition of the book and update errata.md when I noticed that, for the second chapter, the page numbers are off by one. For example, the typo that the errata lists on page 59 is actually on page 60 in the book.

Did I miss something or do we need to correct the errata.md?

Best
Sven

Issue with installing theano and pymc3 on PyCharm

I have a persistent problem installing both theano and pymc3 in PyCharm. I have tried installing and uninstalling several times with both pip and conda, and nothing has worked so far. The error message I get is linked to theano. Please let me know how to solve this issue.

The x label of figure Choosing the likelihood

I think the x label of the figure in Choosing the likelihood should be y, since theta is fixed in each subplot and we are calculating the conditional probability of the observations y given theta.
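
The point about the axis can be made concrete with a small sketch. It assumes the figure shows a binomial likelihood, which the issue does not state, and the values of n and theta are hypothetical. With theta held fixed, the quantity that varies along the x axis is the number of successes y, and the probabilities over y sum to one.

```python
from math import comb


def binomial_pmf(y, n, theta):
    """Probability of y successes in n trials when the success probability is theta."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


n, theta = 4, 0.5  # hypothetical values; theta stays fixed for the whole panel
pmf = [binomial_pmf(y, n, theta) for y in range(n + 1)]

# Each row is one bar of the panel: x position y, bar height p(y | theta, n)
for y, p in enumerate(pmf):
    print(f"p(y={y} | theta={theta}, n={n}) = {p:.4f}")
```

Because the bars form a probability distribution over y for a given theta, labelling the x axis with y matches what is plotted.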

hdf5 installation error on windows when installing from bap.yml

Perhaps this has something to do with my setup, but it would be nice to know. I get the following error when running conda env create -f bap.yml

ERROR: Command errored out with exit status 1:
     command: 'C:\miniconda3\envs\bap\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\OPHERL~1\\AppData\\Local\\Temp\\pip-install-peqyy82t\\netcdf4\\setup.py'"'"'; __file__='"'"'C:\\Users\\OPHERL~1\\AppData\\Local\\Temp\\pip-install-peqyy82t\\netcdf4\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\OPHERL~1\AppData\Local\Temp\pip-pip-egg-info-m8pllb3k'
         cwd: C:\Users\OPHERL~1\AppData\Local\Temp\pip-install-peqyy82t\netcdf4\
    Complete output (28 lines):
    Package hdf5 was not found in the pkg-config search path.
    Perhaps you should add the directory containing `hdf5.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'hdf5' found