allendowney / bayesmadesimple

Code for a tutorial on Bayesian Statistics by Allen Downey.


bayesmadesimple's Introduction

Bayesian Statistics Made Simple

Bayesian statistical methods are becoming more common, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start.

In this tutorial, I introduce Bayesian methods using grid algorithms, which help develop understanding and prepare for MCMC, a powerful algorithm for real-world problems.
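As a rough illustration of what a grid algorithm looks like, here is a minimal sketch in plain Python. It is not taken from the tutorial notebooks: the coin-flip data (7 heads, 3 tails), the 101-point grid, and all variable names are assumptions chosen only for the example.

```python
# Grid approximation for the bias of a coin.
# The data below are illustrative, not from the tutorial.
heads, tails = 7, 3

# 1. Lay out a grid of candidate values for the probability of heads.
grid = [i / 100 for i in range(101)]

# 2. Start from a uniform prior over the grid.
prior = [1.0] * len(grid)

# 3. Multiply the prior by the likelihood of the data at each grid point.
unnorm = [pr * p**heads * (1 - p)**tails for pr, p in zip(prior, grid)]

# 4. Normalize so the posterior probabilities sum to 1.
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# Summaries of the posterior distribution.
post_mean = sum(p * w for p, w in zip(grid, posterior))
post_mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(f"posterior mean = {post_mean:.3f}, mode = {post_mode}")
```

The same four steps (grid, prior, likelihood, normalize) carry over to other one-parameter problems; only the likelihood line changes.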

It is based on my book, Think Bayes, a class I teach at Olin College, and my blog, “Probably Overthinking It.”

Slides for this tutorial are here.

Installation instructions

Note: Please try to install everything you need for this tutorial before you leave home!

To prepare for this tutorial, you have two options:

1. Install Jupyter on your laptop and download my code from GitHub.
2. Run the Jupyter notebooks on a virtual machine on Binder.

I'll provide instructions for both, but here's the catch: if everyone chooses Option 2, the wireless network might not be able to handle the load. So, I strongly encourage you to try Option 1 and only resort to Option 2 if you can't get Option 1 working.

Option 1A: If you already have Jupyter installed.

Code for this workshop is in a Git repository on GitHub.
You can download it in this zip file. When you unzip it, you should get a directory named BayesMadeSimple.

Or, if you have a Git client installed, you can clone the repo by running:

    git clone https://github.com/AllenDowney/BayesMadeSimple

It should create a directory named BayesMadeSimple.

To run the notebooks, you need Python 3 with Jupyter, NumPy, SciPy, matplotlib and Seaborn. If you are not sure whether you have those modules already, the easiest way to check is to run my code and see if it works.
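If you would rather check before opening a notebook, a short script along these lines can report which packages are missing. This is only a sketch: the function name `missing_packages` is hypothetical, and the list assumes that the import name for Jupyter's notebook server is `notebook`.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED = ["notebook", "numpy", "scipy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the same Python interpreter that Jupyter uses; otherwise the result may not reflect what the notebooks will see.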

You will also need a small library I wrote, called empyrical-dist. You can see it on PyPI and you can install it using pip:

    pip install empyrical-dist

To start Jupyter, run:

    cd BayesMadeSimple
    jupyter notebook

Jupyter should launch your default browser or open a tab in an existing browser window. If not, the Jupyter server should print a URL you can use. For example, when I launch Jupyter, I get:

    ~/BayesMadeSimple$ jupyter notebook
    [I 10:03:20.115 NotebookApp] Serving notebooks from local directory: /home/downey/BayesMadeSimple
    [I 10:03:20.115 NotebookApp] 0 active kernels
    [I 10:03:20.115 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
    [I 10:03:20.115 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

In this case, the URL is http://localhost:8888.
When you start your server, you might get a different URL. Whatever it is, if you paste it into a browser, you should see a home page with a list of the notebooks in the repository.

Click on 01_cookie.ipynb. It should open the first notebook for the tutorial.

Select the cell with the import statements and press "Shift-Enter" to run the code in the cell. If it works and you get no error messages, you are all set.

If you get error messages about missing packages, you can install the packages you need using your package manager, or try Option 1B and install Anaconda.

Option 1B: If you don't already have Jupyter.

I highly recommend installing Anaconda, which is a Python distribution that contains everything you need for this tutorial. It is easy to install on Windows, Mac, and Linux, and because it does a user-level install, it will not interfere with other Python installations.

Information about installing Anaconda is here.

Choose the Python 3.7 distribution.

After you install Anaconda, you can install the packages you need like this:

    conda install jupyter numpy scipy matplotlib seaborn
    pip install empyrical-dist

Or you can create a Conda environment just for the workshop, like this:

    cd BayesMadeSimple
    conda env create -f environment.yml
    conda activate BayesMadeSimple

Then go to Option 1A to make sure you can run my code.

Option 2: If Option 1 failed.

You can run my notebook in a virtual machine on Binder. To launch the VM, press this button:

You should see a home page with a list of the files in the repository.

If you want to try the exercises, open 01_cookie.ipynb. You should be able to run the notebooks in your browser and try out the examples.

However, be aware that the virtual machine you are running is temporary.
If you leave it idle for more than an hour or so, it will disappear along with any work you have done.

Special thanks to the people who run Binder, which makes it easy to share and reproduce computation.


bayesmadesimple's Issues

Can't calculate credible intervals nor quantiles

When I try to do it, I get the following error:

    NotImplementedError Traceback (most recent call last)
    in
    1 for i, b in enumerate(beliefs):
    ----> 2 print(b.mean(), b.credible_interval(0.9))

    c:\users...\appdata\local\programs\python\python36-32\lib\site-packages\empiricaldist\empiricaldist.py in credible_interval(self, p)
    716 tail = (1 - p) / 2
    717 ps = [tail, 1 - tail]
    --> 718 return self.quantile(ps)
    719
    720 @staticmethod

    c:\users...\appdata\local\programs\python\python36-32\lib\site-packages\empiricaldist\empiricaldist.py in quantile(self, ps, **kwargs)
    137 :return: float
    138 """
    --> 139 return self.make_cdf().quantile(ps, **kwargs)
    140
    141 def choice(self, *args, **kwargs):

    c:\users...\appdata\local\programs\python\python36-32\lib\site-packages\empiricaldist\empiricaldist.py in inverse(self, **kwargs)
    846 )
    847
    --> 848 interp = interp1d(self.ps, self.qs, **kwargs)
    849 return interp
    850

    c:\users...\appdata\local\programs\python\python36-32\lib\site-packages\scipy\interpolate\interpolate.py in __init__(self, x, y, kind, axis, copy, bounds_error, fill_value, assume_sorted)
    443 elif kind not in ('linear', 'nearest'):
    444 raise NotImplementedError("%s is unsupported: Use fitpack "
    --> 445 "routines for other types." % kind)
    446 x = array(x, copy=self.copy)
    447 y = array(y, copy=self.copy)

    NotImplementedError: next is unsupported: Use fitpack routines for other types.
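The last frame shows the installed SciPy rejecting the interpolation kind "next", which suggests that this SciPy version predates support for that option. As a possible workaround while the environment is sorted out, the same kind of lookup can be sketched without `interp1d`. The functions below are hypothetical stand-ins, not the library's implementation; they assume the quantities are sorted and that a quantile is the smallest quantity whose cumulative probability is at least `p`.

```python
from bisect import bisect_left
from itertools import accumulate


def quantile(qs, ps, p):
    """Smallest quantity whose cumulative probability is at least p.

    qs: sorted quantities; ps: their probabilities, summing to 1.
    """
    cum = list(accumulate(ps))
    # Clamp the index in case round-off leaves the last cumulative value below p.
    i = min(bisect_left(cum, p), len(qs) - 1)
    return qs[i]


def credible_interval(qs, ps, p):
    """Central interval that contains probability p."""
    tail = (1 - p) / 2
    return quantile(qs, ps, tail), quantile(qs, ps, 1 - tail)


# Example: a fair six-sided die.
qs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6
print(credible_interval(qs, ps, 0.9))
```

With a Pmf from the library, `qs` and `ps` would be its quantities and probabilities; how to extract them depends on the installed version.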

add_dist failed at statement "twice = d6.add_dist(d6)" in 01_cookie.ipynb

The error:
AttributeError: 'Pmf' object has no attribute 'add_dist'

It looks like add_dist has been removed from the Pmf class in the latest version of empyrical_dist, which breaks the code in the notebook.

My questions/requests to you are:

Did you rename the module (empyrical_dist versus empiricaldist)? Are they two different packages, or is one a later version of the other?
If you have changed the name or code of the module, could you please make the necessary changes in the Jupyter notebook files as well, so that it is easier for people who are following your lecture on YouTube?

Unable to import empyrical-dist module

I installed the empyrical-dist module in a pipenv virtual environment, but when I try to run 01_cookie.ipynb, the first cell fails at "from empiricaldist import Pmf" with "ModuleNotFoundError: No module named 'empiricaldist'". Could you please look into this and confirm that this is not an issue with the package itself?
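One way to narrow this down is to run a small diagnostic in the first notebook cell. The sketch below prints which interpreter the kernel is using and which of the two module names mentioned in these issues can be found. The helper `first_importable` is hypothetical, and `empyrical_dist` as an import name is an assumption based on the issue above.

```python
import importlib.util
import sys


def first_importable(candidates):
    """Return the first module name the import system can find, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# If this path is not inside the pipenv environment, the kernel is using
# a different Python than the one the package was installed into.
print("Interpreter:", sys.executable)
print("Importable module:", first_importable(["empiricaldist", "empyrical_dist"]))
```

If only the older name is found, the installed package and the notebook's import statement are out of step; if neither is found, the kernel is probably not running in the environment where the package was installed.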

Google Sites is broken - README

The link in the README is broken.

A good replacement could be the Wayback Machine version

Incorrect solution: Dungeons & Dragons Bonus

First of all, thank you for putting on a great tutorial!

I believe I found a bad solution in the cookie notebook.

Bonus exercise: In Dungeons and Dragons, the amount of damage a goblin can withstand is the sum of two six-sided dice. The amount of damage you inflict with a short sword is determined by rolling one six-sided die.

Suppose you are fighting a goblin and you have already inflicted 3 points of damage. What is your probability of defeating the goblin with your next successful attack?

The provided solution is:

    d6 = Pmf()
    for x in [1,2,3,4,5,6]:
        d6[x] = 1
    d6.normalize()

    twice = d6.add_dist(d6)
    twice[2] = 0
    twice[3] = 0
    twice.normalize()

    >>> d6.ge_dist(twice)
    0.11111111111111109

This accounts for the 3 points of damage you already dealt by creating a posterior over the goblin's health under the assumption that it does not have 1-3 health remaining. Clearly this is not correct. The blow means that the goblin's health must lie in the interval [1, 9], not [4, 12].

The correct solution, I believe, would be:

    d6 = Pmf()
    for x in [1,2,3,4,5,6]:
        d6[x] = 1
    d6.normalize()

    twice = d6.add_dist(d6)
    goblin_health = twice.copy()

    # 3 HP of damage already dealt:
    dmg3 = Pmf()
    dmg3[3] = 1.
    sword = d6.copy().add_dist(dmg3)

    >>> sword.ge_dist(goblin_health)
    0.5
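Since add_dist and ge_dist may not be available in every version of the library, both numbers quoted above can be reproduced with plain dictionaries. This is a sketch for checking the arithmetic only: the helpers `add_dist`, `normalize`, and `prob_ge` are hypothetical stand-ins written for this example, not the library's methods.

```python
from fractions import Fraction
from itertools import product


def add_dist(a, b):
    """Distribution of the sum of two independent quantities."""
    out = {}
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out


def normalize(d):
    """Rescale the probabilities so they sum to 1."""
    total = sum(d.values())
    return {x: p / total for x, p in d.items()}


def prob_ge(a, b):
    """Probability that a draw from a is >= an independent draw from b."""
    return sum(px * py
               for (x, px), (y, py) in product(a.items(), b.items())
               if x >= y)


d6 = {x: Fraction(1, 6) for x in range(1, 7)}
twice = add_dist(d6, d6)

# Provided solution: rule out totals of 2 and 3, renormalize,
# then compare a single die roll with the result.
conditioned = normalize({x: p for x, p in twice.items() if x > 3})
provided = prob_ge(d6, conditioned)

# Proposed solution: add the 3 points already dealt to the sword roll
# and compare with the unconditioned sum of two dice.
sword = {x + 3: p for x, p in d6.items()}
proposed = prob_ge(sword, twice)

print("provided:", provided, float(provided))
print("proposed:", proposed, float(proposed))
```

Using `Fraction` keeps the results exact, so the two solutions can be compared without floating-point round-off.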

No mention of requirements needed to have installed

The README and/or your website should probably mention that the following packages need to be installed, to make it easier to get going:

matplotlib
numpy
scipy
pandas

link in README is broken

As the title states.

Include a `requirements.txt` file

To make it easier for attendees to install the necessary packages, it would be nice to include a requirements.txt, e.g.

    # requirements.txt
    scipy
    numpy
    matplotlib
    pandas

Attendees can then run the following to get the required packages installed:

    pip install -r requirements.txt