Code repository for Think Bayes: Bayesian Statistics Made Simple by Allen B. Downey
Available from Green Tea Press at http://thinkbayes.com.
Published by O'Reilly Media, October 2013.
Hey Allen,
I wrote "from thinkbayes.py import Pmf" in order to practice, but I get the error "No module named 'thinkbayes'".
Hi,
I tried to run the code in species.py. The method RunSubject('B1242', conc=1, high=100) requires a CSV file, "BBB_data_from_Rob.csv". If it is not confidential, could you please share it?
Thanks for the book and code; it is nicely explained and written, and I enjoyed reading it.
Thanks,
Qiang
Hi,
I have been trying to replicate some of your results and I might have found an issue in Chapter 9.
My results suggest a much tighter posterior after observing the four data points x=[15, 16, 18, 21] than what is suggested in your Figures 9.2 and 9.5, as well as in the reported posterior credible intervals.
In fact, I get something closer to your posterior plots if I only include the last datapoint x=[21].
You can see my results in this colab notebook.
In the Euro problem, when calculating the likelihood of the entire set at once, it seems like this should use the binomial distribution. The binomial distribution calculates the odds of seeing K instances in N draws if the probability is P, and it seems like that's exactly what the likelihood should be, with N being tails + heads, K being heads, and P being x.
How does this likelihood function differ from a binomial?
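Not from the book's code, but a sketch of why the two likelihoods give the same posterior: the binomial coefficient C(N, K) does not depend on x, so it cancels when the posterior is normalized. Per-toss Bernoulli updates and a single binomial update therefore agree (the grid of hypotheses and the 140/110 counts below are the Euro problem's data):

```python
from math import comb

heads, tails = 140, 110
n = heads + tails
xs = [i / 100 for i in range(101)]        # hypotheses for x = P(heads)

# Likelihood of the whole dataset applied toss by toss: x**heads * (1-x)**tails
like_a = [x**heads * (1 - x)**tails for x in xs]
total_a = sum(like_a)
post_a = [v / total_a for v in like_a]

# Single binomial likelihood: C(n, heads) * x**heads * (1-x)**tails
like_b = [comb(n, heads) * x**heads * (1 - x)**tails for x in xs]
total_b = sum(like_b)
post_b = [v / total_b for v in like_b]

# The constant factor comb(n, heads) cancels in normalization.
max_diff = max(abs(a - b) for a, b in zip(post_a, post_b))
```

So the answer to "how does it differ": only by a constant factor that normalization removes.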
Dear Professor Downey,
In chapter 7, predictions, we are calculating the probability of winning in sudden-death overtime:
We are creating a mixture of Exponential distributions, but we are taking the parameters of the Poisson distribution to do so. The posterior is our belief of what lambda is, which is the parameter for the Poisson distribution, which is also the expected goals per game, not the time between goals. So what is the rationale behind constructing the exponential mixture from goals per game posterior?
I am assuming this is done to find the distribution of "games until goal". But Figure 7.3 is titled "Distribution of time between goals". What is the relationship between "games until goal" (i.e., the Poisson parameter) and "time between goals" (i.e., the exponential parameter)?
A) Do you think it would be prudent to use the actual time between goals for this computation? For example, we could get the time between goals for each team up to the last 4 matches and use it as a prior, then update it with the time between goals of the last 4 games to get the posterior over the exponential distribution's parameter, and then make a mixture of it.
B) I feel this would also factor in situations where both teams score the same number of goals (like 2-2) and go to sudden-death overtime. In point 2, we are only considering going to overtime if neither team scores a goal, if I am not mistaken (unless that's what the rules are - please forgive my ignorance of hockey).
I would be grateful if you or anyone can throw some light on the matter.
Thanks a lot!
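Not an answer from the author, but the relationship the question asks about can be sketched: for a Poisson process, the same parameter lam is both the expected goals per game (the Poisson mean) and the rate of the exponential distribution of times between goals, measured in units of games. A small simulation (assuming nothing from the book's code; lam = 3 is arbitrary) shows both readings of lam at once:

```python
import random

random.seed(1)
lam = 3.0        # goals per game: Poisson mean AND exponential rate
n = 100_000

# Draw inter-goal times (in units of games) from Exponential(lam).
gaps = [random.expovariate(lam) for _ in range(n)]
mean_gap = sum(gaps) / n             # close to 1/lam games between goals

# Count how many goals land in each unit-length "game".
goals_per_game = []
t, count, game_end = 0.0, 0, 1.0
for g in gaps:
    t += g                           # time of this goal
    while t >= game_end:             # flush any games completed before this goal
        goals_per_game.append(count)
        count, game_end = 0, game_end + 1.0
    count += 1
mean_goals = sum(goals_per_game) / len(goals_per_game)   # close to lam

print(round(mean_gap, 3), round(mean_goals, 3))
```

This is why a posterior over lam (goals per game) can directly parameterize the exponential mixture for time between goals: they are two views of the same rate.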
import math
from scipy.special import erfinv

root2 = math.sqrt(2)

def GaussianCdfInverse(p, mu=0, sigma=1):
    """Evaluates the inverse CDF of the Gaussian distribution.

    See http://en.wikipedia.org/wiki/Normal_distribution#Quantile_function

    Args:
        p: float probability in [0, 1]
        mu: mean parameter
        sigma: standard deviation parameter

    Returns:
        float
    """
    x = root2 * erfinv(2 * p - 1)
    return mu + x * sigma
When trying to run code from hockey.py, I get:
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/ssrra/.spyder-py3/temp.py', wdir='C:/Users/ssrra/.spyder-py3')
  File "C:\Users\ssrra\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\Users\ssrra\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/ssrra/.spyder-py3/temp.py", line 541, in <module>
    main()
  File "C:/Users/ssrra/.spyder-py3/temp.py", line 435, in main
    goal_dist1 = MakeGoalPmf(suite1)
  File "C:/Users/ssrra/.spyder-py3/temp.py", line 127, in MakeGoalPmf
    metapmf.Set(pmf, prob)
  File "C:\Users\ssrra\.spyder-py3\thinkbayes.py", line 589, in Set
    self.d[x] = y
TypeError: unhashable type: 'Pmf'
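A guess, not verified against this setup: on Python 3, the usual cause of "unhashable type" is a class that defines __eq__ without defining __hash__, which makes its instances unusable as dictionary keys, and `self.d[x] = y` in thinkbayes.py is storing a Pmf object as a key of the meta-Pmf. A minimal demonstration of the mechanism (the class names below are made up for illustration):

```python
# In Python 3, defining __eq__ without __hash__ makes instances unhashable,
# so they cannot be dict keys. Whether this is what your copy of the Pmf
# class does is an assumption.
class WithEqOnly:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return self.x == other.x

class WithHash(WithEqOnly):
    def __hash__(self):
        return hash(self.x)     # restoring __hash__ makes keys legal again

d = {}
d[WithHash(1)] = 0.5            # fine: hashable
try:
    d[WithEqOnly(1)] = 0.5      # raises TypeError
except TypeError as err:
    msg = str(err)

print(msg)    # unhashable type: 'WithEqOnly'
```

If that is the cause here, running the code under the Python version it was written for, or adding a __hash__ method to the class, would both resolve it.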
These files are all symbolic links, rather than the actual files.
Hello,
Thanks for having your book available here. Would it also be possible to upload the figures so the book compiles?
If dungeons.py is run as is, an exception is thrown:
Traceback (most recent call last):
  File "dungeons.py", line 117, in <module>
    main()
  File "dungeons.py", line 63, in main
    colors = thinkplot.Brewer.Colors()
AttributeError: 'module' object has no attribute 'Brewer'
I think line 63 should be thinkplot._Brewer.Colors(). It works then; however, I am not sure what exactly you intend in terms of the single underscore (a weakly hidden method).
Chapter 8 makes an interesting point about Observer Bias on the Red Line, but it took me a while to understand why the distribution over passengers' observed wait times is greater than the true wait times. After some thought it turns out I was assuming a more complicated model than the text. I don't think either model is unreasonable; my intuition just wasn't on the same page and I didn't find an explicit reason in the text to invalidate my model. The correct model might be obvious to most but perhaps the clarification below will help someone in the future:
The text reads:
The average time between trains, as seen by a random passenger, is substantially higher than the true average.
Why? Because a passenger is more like (sic) to arrive during a large interval than a small one. Consider a simple example: suppose that the time between trains is either 5 minutes or 10 minutes with equal probability. In that case the average time between trains is 7.5 minutes.
But a passenger is more likely to arrive during a 10 minute gap than a 5 minute gap; in fact, twice as likely. If we surveyed arriving passengers, we would find that 2/3 of them arrived during a 10 minute gap, and only 1/3 during a 5 minute gap. So the average time between trains, as seen by an arriving passenger, is 8.33 minutes.
For this to be true, I believe we have to assume a passenger arriving 0 minutes after the previous train has the same observed waiting time as a passenger arriving any arbitrary n > 0 minutes after the train. In other words, a passenger who just missed the previous train and waited the full gap is treated the same as a passenger who just barely made the train.
My intuition was as follows: In reality, a passenger can arrive at the 9th minute of a 10 minute gap or the 4th minute of a 5 minute gap. Both passengers wait 1 minute. If you model it this way, the biased distribution actually shifts to the left. Why? Let's say there are two passengers arriving per minute (lam = 2). For a 2 minute gap, you might have the following wait times for 4 passengers: [0, 0, 1, 1]. For a 3 minute gap, you might have the following wait times for 6 passengers: [0, 0, 1, 1, 2, 2]. A passenger who waits 0 has arrived just before the train departs. For an n minute gap, wait time n-1 indicates the passenger arrived within the first minute after the previous train departed. From the 2-minute and 3-minute gaps above, you can deduce that across all trains P(wait n) < P(wait n-1). I.e., there is always a chance for a passenger to wait 0 minutes, but for a 5 minute gap, say, it's impossible to wait 6 minutes.
Here is some code to simulate the process and the resulting histogram.
from math import floor
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
n = 50000  # Number of trains.
l = 2      # Passengers arriving per minute.
T = np.random.normal(10, 2, n)  # True time between trains.
W1 = []  # Passengers' observed waiting time (my initial formulation).
W2 = []  # Passengers' observed waiting time (Think Bayes formulation).
for t in T:
    size = int(floor(t * l))  # This many passengers will end up on the next train.
    W1 += list(np.random.uniform(0, floor(t), size))
    W2 += list(np.ones(size) * t)
bins = int(T.max() - T.min())
plt.hist(T, color='red', bins=bins, alpha=0.3, density=True, label=r'True wait $\mu=%.3lf$' % T.mean())
plt.hist(W1, color='blue', bins=bins, alpha=0.3, density=True, label=r'Observed wait $\mu=%.3lf$' % np.mean(W1))
plt.hist(W2, color='green', bins=bins, alpha=0.3, density=True, label=r'Observed wait simplified $\mu=%.3lf$' % np.mean(W2))
plt.legend(fontsize=8)
plt.show()
Hello,
I have been trying to pull off exercise 2.1, the cookie example without replacement, but after failing miserably I checked the Git repository for ThinkBayes2 and found code with the solution for the second edition.
To my surprise, I saw that I was on the right track; however, when trying to rewrite the solution for the first version of Think Bayes, I still could not make it work.
if I use the following to set the hypos
bowl1=dict(vanilla=30,chocolate=10)
bowl2=dict(vanilla=20,chocolate=20)
pmf=Cookie([bowl1, bowl2])
I get:
TypeError: unhashable type: 'dict'
If I use the following (as in cookie3.py from ThinkBayes2)
bowl1=Hist(dict(vanilla=30,chocolate=10))
bowl2=Hist(dict(vanilla=20,chocolate=20))
I get:
AttributeError: 'Hist' object has no attribute 'Normalize'
I really want to see the solution to this, any hint or suggestion will be really appreciated!
Many thanks!
Leo
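Not the book's solution, but a minimal sketch of one way to sidestep the unhashable-dict error: keep each hypothesis's (mutable) bowl contents in a dict keyed by a hashable label, and track the probabilities separately, instead of using the dicts themselves as Pmf keys. All names below are made up for illustration:

```python
# State per hypothesis: each bowl's remaining cookies, keyed by a hashable label.
bowls = {
    'bowl1': dict(vanilla=30, chocolate=10),
    'bowl2': dict(vanilla=20, chocolate=20),
}
probs = {'bowl1': 0.5, 'bowl2': 0.5}   # uniform prior over the hypotheses

def update(flavor):
    """Bayesian update for drawing one cookie of `flavor`, without replacement."""
    for hypo, bowl in bowls.items():
        total = sum(bowl.values())
        like = bowl[flavor] / total if total else 0   # likelihood under this bowl
        probs[hypo] *= like
        if bowl[flavor] > 0:
            bowl[flavor] -= 1          # remove the cookie in this hypothetical world
    norm = sum(probs.values())         # renormalize the posterior
    for hypo in probs:
        probs[hypo] /= norm

update('vanilla')
print(probs['bowl1'])   # 0.6, same as the first with-replacement draw
```

Subsequent calls to update() then use the depleted bowls, which is where the without-replacement behavior diverges from the book's Chapter 2 version.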
It would be nice to have a place where budding Bayesians who are also Julia fanatics could submit translations of the book's Python code as PRs.