rmcelreath / stat_rethinking_2023

Statistical Rethinking Course for Jan-Mar 2023

License: Creative Commons Zero v1.0 Universal

R 97.38% Stan 2.62%

Statistical Rethinking (2023 Edition)

For the 2024 version of the course see: https://github.com/rmcelreath/stat_rethinking_2024

Instructor: Richard McElreath

Lectures: Uploaded and pre-recorded, two per week

Discussion: Online (Zoom), Fridays 3pm-4pm Central European (Berlin) Time

Purpose

This course teaches data analysis, but it focuses on scientific models. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with high-dimensional, imperfect data of the kind that biologists and social scientists face.

Format

Online, flipped instruction. I will pre-record the lectures each week. We'll meet online once a week for an hour to discuss the material. The discussion time (3-4pm Berlin Time) should allow people in the Americas to join in their morning.

We'll use the 2nd edition of my book, <Statistical Rethinking>, and possibly some draft chapters for the 3rd edition. I'll provide a PDF of the book to enrolled students.

Registration: Closed.

Calendar & Topical Outline

There are 10 weeks of instruction. Links to lecture recordings will appear in this table. Weekly problem sets are assigned on Fridays and due the next Friday, when we discuss the solutions in the weekly online meeting.

Full lecture playlist: <Statistical Rethinking 2023 Playlist>

Note about slides: In some browsers, the slides don't show correctly. If points are missing from plots, download the slides PDF instead of viewing in browser.

Week ##	Meeting date	Reading	Lectures
Week 01	06 January	Chapters 1, 2 and 3	[1] <Science Before Statistics> <Slides> [2] <Garden of Forking Data> <Slides>
Week 02	13 January	Chapter 4	[3] <Geocentric Models> <Slides> [4] <Categories and Curves> <Slides>
Week 03	20 January	Chapters 5 and 6	[5] <Elemental Confounds> <Slides> [6] <Good and Bad Controls> <Slides>
Week 04	27 January	Chapters 7, 8 and 9	[7] <Overfitting> <Slides> [8] <MCMC> <Slides>
Week 05	03 February	Chapters 10 and 11	[9] <Modeling Events> <Slides> [10] <Counts and Confounds> <Slides>
Week 06	10 February	Chapters 11 and 12	[11] <Ordered Categories> <Slides> [12] <Multilevel Models> <Slides>
Week 07	17 February	Chapter 13	[13] <Multilevel Adventures> <Slides> [14] <Correlated Features> <Slides>
Week 08	24 February	Chapter 14	[15] <Social Networks> <Slides> [16] <Gaussian Processes> <Slides>
Week 09	03 March	Chapter 15	[17] <Measurement> <Slides> [18] <Missing Data> <Slides>
Week 10	10 March	Chapters 16 and 17	[19] <Generalized Linear Madness> <Slides> [20] <Horoscopes> <Slides>

Coding

This course involves a lot of scripting. Students can engage with the material using either the original R code examples or one of several conversions to other computing environments. The conversions are not always exact, but they are rather complete. Each option is listed below.

Original R Flavor

To use the original R code examples from the print book, install the rethinking R package. The code is all on GitHub at https://github.com/rmcelreath/rethinking/, along with additional details about the package, including how to use the more up-to-date cmdstanr instead of rstan as the underlying MCMC engine.

R + Tidyverse + ggplot2 + brms

The <Tidyverse/brms> conversion is very high quality and complete through Chapter 14.

Python and PyMC3

The <Python/PyMC3> conversion is quite complete.

Julia and Turing

The <Julia/Turing> conversion is not as complete, but is growing fast and presents the Rethinking examples in multiple Julia engines, including the great <TuringLang>.

Other

There are several other conversions. See the full list at https://xcelab.net/rm/statistical-rethinking/.

Homework and solutions

I will also post problem sets and solutions. Check the folders at the top of the repository.

stat_rethinking_2023's Issues

height and age as piecewise linear in 03_howell_new_weight_model.r not working

Let me start by expressing my gratitude for the great lecture and book. Thanks for making this available on YouTube and GitHub.

I did notice that the last few lines in https://github.com/rmcelreath/stat_rethinking_2023/blob/main/scripts/03_howell_new_weight_model.r
are unfinished. There is a missing closing parenthesis, and the prior specifications are missing as well.

#######
# height and age as piecewise linear

data(Howell1)
d <- Howell1

dat <- list(
    H = d$height,
    A = d$age )

m <- quap(
    alist(
        H ~ dnorm(mu,sigma),
        mu <- a1*(1-exp(-b1*A)) + a2*(1-exp(-b2*(A))
    )
)
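
As a minimal sketch of what a finished version would need, the mean function can be written with balanced parentheses and checked with a prior simulation in base R. The priors below are hypothetical placeholders picked only to give curves on a plausible height scale; they are not taken from the script or the lecture.

```r
# Mean function from the issue, with the closing parenthesis restored.
growth_mu <- function(A, a1, b1, a2, b2) {
  a1 * (1 - exp(-b1 * A)) + a2 * (1 - exp(-b2 * A))
}

set.seed(1)
n_sims <- 20
ages <- seq(0, 80, length.out = 50)

# HYPOTHETICAL priors (assumptions, not from the original script).
a1 <- rnorm(n_sims, 100, 10)  # size of the first growth component (cm)
a2 <- rnorm(n_sims, 50, 10)   # size of the second growth component (cm)
b1 <- rexp(n_sims, 2)         # rate of the first component
b2 <- rexp(n_sims, 10)        # rate of the second component

# One prior curve per column: rows are ages, columns are simulations.
prior_curves <- sapply(
  seq_len(n_sims),
  function(i) growth_mu(ages, a1[i], b1[i], a2[i], b2[i])
)
dim(prior_curves)
```

If the simulated curves look reasonable, the same priors could be added to the `alist()` as lines such as `a1 ~ dnorm(100, 10)`, together with a prior for `sigma`.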

Garden of forking paths not rendering

Hi, I was not a signed-up student, but I have been following along with your 2023 lectures, and I couldn't get the initial 02_garden_animation.r to render without this error:

Error in garden(arc = arc, possibilities = c(0, 0, 0, 1), data = dat,  : 
  argument "prog_dat" is missing, with no default

Since no one else has complained about this yet, it may just be user error on my part, as I'm neither an RStudio nor a dev regular. But I found that the error was resolved by adding a default value for prog_dat in line 93:

prog_dat=c(-1,-1,-1)
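
The mechanism behind this fix can be sketched in base R. The function `garden_stub` below is a hypothetical stand-in, not the real `garden()` from the script; it only shows that R raises this error when an argument without a default is used but was not supplied, and that a default in the function signature avoids it.

```r
# Hypothetical stand-in: prog_dat has no default value.
garden_stub <- function(possibilities, data, prog_dat) {
  length(possibilities) + length(data) + length(prog_dat)
}

# Calling without prog_dat fails once the argument is evaluated.
msg <- tryCatch(
  garden_stub(c(0, 0, 0, 1), data = c(1, 0, 1)),
  error = function(e) conditionMessage(e)
)

# Same stand-in with the default suggested in the issue.
garden_stub_fixed <- function(possibilities, data, prog_dat = c(-1, -1, -1)) {
  length(possibilities) + length(data) + length(prog_dat)
}
n_values <- garden_stub_fixed(c(0, 0, 0, 1), data = c(1, 0, 1))
```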

blank

The example code uses the function "blank".
However, it is not clear from the code which package I need to import.

blank(w=2,h=0.7)
Error in blank(w = 2, h = 0.7) : could not find function "blank"
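
Until the source of `blank` is confirmed, a workaround is to define a small stand-in with base graphics. The sketch below is an assumption about what such a helper does (open a plot window with a given relative width and height); the base size of 3.5 inches and the margin settings are my own choices and may not match the author's helper.

```r
# Hypothetical stand-in for blank(): opens a new graphics device whose
# size is scaled by w (width) and h (height), then tightens the margins.
blank <- function(w = 1, h = 1, base_size = 3.5) {
  dev.new(width = base_size * w, height = base_size * h)
  par(mgp = c(1.5, 0.5, 0), mar = c(2.5, 2.5, 2, 1) + 0.1)
  invisible(NULL)
}

# Usage, as in the example code:
# blank(w = 2, h = 0.7)
```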

lppd cv equation

The lppd CV equation in the Lecture 7 slides and in the text (2nd edition, p. 218) looks inconsistent with the lppd equation on p. 210 of the text, which puts "log" in front of "1/S".

I raised the same issue in 2022.
rmcelreath/stat_rethinking_2022#20
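
Where the "log" sits matters numerically. The sketch below uses made-up posterior draws for a single observation (all values are assumptions for illustration) and compares the log of the average likelihood with the average of the log likelihood. By Jensen's inequality the second is never larger than the first.

```r
set.seed(7)
S <- 1000

# Hypothetical posterior draws of a mean parameter, and one observation.
mu_draws <- rnorm(S, mean = 0, sd = 1)
y_i <- 0.5

# Likelihood of y_i under each posterior draw.
lik <- dnorm(y_i, mean = mu_draws, sd = 1)

log_of_mean <- log(mean(lik))  # "log" in front of 1/S
mean_of_log <- mean(log(lik))  # "log" inside the sum over samples

c(log_of_mean = log_of_mean, mean_of_log = mean_of_log)
```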

Another backdoor path in bonus example of Lecture 6?

Is the path X <- S <- A -> Y not a path that should be considered in the analysis of the DAG in the bonus content of Lecture 6?

`sapply()` is an avoidable complication in code block 2.1

I've just been watching lecture 2, and when R example 2.1 comes up (35:00), you explain that

"For those of you who don't use R, sapply is just a loop, it's just a function that loops over a list"

Because R vectorises functions by default, you don't need to use sapply here. This might make the code flow clearer, since you wouldn't need to explain the purpose of sapply.

I've written a small counter-example that uses vectorisation rather than sapply:

sample <- c("W", "L", "W", "W", "W", "L", "W", "L", "W")
W <- sum(sample == "W")
L <- sum(sample == "L")
p <- c(0, 0.25, 0.5, 0.75, 1)

get_ways <- function(q) (q*4)^W * ((1-q)*4)^L
ways <- get_ways(p)

prob <- ways/sum(ways)

cbind(p, ways, prob)
#>         p ways       prob
#> [1,] 0.00    0 0.00000000
#> [2,] 0.25   27 0.02129338
#> [3,] 0.50  512 0.40378549
#> [4,] 0.75  729 0.57492114
#> [5,] 1.00    0 0.00000000

Created on 2023-03-20 with reprex v2.0.2
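
A quick way to confirm that the two styles agree is to compute the counts both ways and compare them. This sketch reuses the counts from the sample above (6 W, 3 L); the variable names are my own.

```r
W <- 6
L <- 3
p <- c(0, 0.25, 0.5, 0.75, 1)

# Vectorised: arithmetic is applied to every element of p at once.
ways_vectorised <- (p * 4)^W * ((1 - p) * 4)^L

# Explicit loop over p with sapply, one value of q at a time.
ways_sapply <- sapply(p, function(q) (q * 4)^W * ((1 - q) * 4)^L)

all.equal(ways_vectorised, ways_sapply)
```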

Possible error in the variance-covariance matrix in the r chunks 3.17 and 3.19

I am following the code of the third chapter of the third draft (I am a little late), and I have found a discrepancy. In the m3.1 model, using quap, the estimators calculated by running the code match perfectly with those in the draft. However, when I try to display the variance-covariance matrix, the values do not match.

I think the problem is that the matrix that appears in the draft is not the correct one, because if I square the sd of the estimators, they match the variance that I get using vcov(m3.1).

I report it here in case it is worth checking!
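
The check described above can be reproduced on any fitted model. As a sketch, this uses `lm()` on the built-in `cars` data instead of the `quap` model m3.1 (a swapped-in example, so that it runs without the rethinking package): the squared standard errors should equal the diagonal of the variance-covariance matrix.

```r
# Stand-in model: ordinary linear regression on a built-in data set.
fit <- lm(dist ~ speed, data = cars)

# Standard errors reported in the coefficient table.
se <- summary(fit)$coefficients[, "Std. Error"]

# Variances are the diagonal of the variance-covariance matrix.
v <- diag(vcov(fit))

all.equal(unname(se^2), unname(v))
```

The same comparison applies to a `quap` fit: square the reported standard deviations and compare them with `diag(vcov(m3.1))`.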

Issues with installing the "rethinking" package in RStudio

A colleague and I are going through the 'Statistical Rethinking' series, but we are unable to install the "rethinking" package in RStudio. This is the warning message in R, "package ‘rethinking’ is not available for this version of R".

We are unsure why this is happening because we are using the latest version of R. We would appreciate your suggestions on how to go about the issue.

Thanks!
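
This message is what `install.packages("rethinking")` reports when a package is not on CRAN, and the README above points to GitHub as the place to get it. A minimal sketch of a GitHub install follows. The list of dependency packages is my recollection of the rethinking README and should be checked against it; the wrapper function name is hypothetical.

```r
# Hypothetical wrapper so that nothing is installed until it is called.
install_rethinking <- function() {
  # Dependencies (assumed list; verify against the rethinking README).
  install.packages(c("coda", "mvtnorm", "devtools", "loo", "dagitty", "shape"))
  # Install rethinking itself from GitHub rather than CRAN.
  devtools::install_github("rmcelreath/rethinking")
}

# Run once in the R console:
# install_rethinking()
```

The README linked in the Coding section also covers setting up cmdstanr, which is needed for the MCMC models later in the course.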

make_bar from compute_posterior function (Lecture 2) not found

Hello! I was wondering whether I need to install a separate library to be able to run make_bar in the compute_posterior function that was shown in Lecture #2. I tried typing out all the code and running it, but it produced the error: Error in make_bar(q): could not find function "make_bar".
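
If the helper cannot be found, a stand-in is easy to write. The `make_bar` below is a hypothetical replacement that draws a text bar whose length is proportional to a probability, and the body of `compute_posterior` is my reconstruction for illustration, not a copy of the lecture code.

```r
# Hypothetical stand-in: a bar of "#" characters proportional to q.
make_bar <- function(q, width = 20) {
  strrep("#", round(q * width))
}

# Reconstructed sketch of a posterior table that uses make_bar.
compute_posterior <- function(the_sample, poss = c(0, 0.25, 0.5, 0.75, 1)) {
  W <- sum(the_sample == "W")
  L <- sum(the_sample == "L")
  ways <- (poss * 4)^W * ((1 - poss) * 4)^L
  post <- ways / sum(ways)
  bars <- sapply(post, make_bar)
  data.frame(poss, ways, post = round(post, 3), bars)
}

out <- compute_posterior(c("W", "L", "W", "W", "W", "L", "W", "L", "W"))
out
```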

week07_solutions - sim( ) maybe broken or behaving unexpectedly

The code chunk below breaks when run and gives the error listed after it.

pU0 <- sapply( 1:61 ,
    function(dist)
        sim(m4,vars=c("Ks","C"),data=list(A=Asim,U=rep(1,n),D=rep(dist,n)))$C
)

Error in if (left == var) { : the condition has length > 1

When I tried to debug it, the error seemed to come from the `if (left == var) {` line in the sim() function, excerpted below. I might be wrong, and I was not able to fix it either. Or maybe I missed the whole point, actually.

sim_vars <- list()
for (var in vars) {
    f <- fit@formula[[1]]
    for (i in 1:length(fit@formula)) {
        f <- fit@formula[[i]]
        left <- as.character(f[[2]])
        if (left == var) {
            if (debug == TRUE)
                print(f)
            break
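
One possible cause, offered as an assumption since the formula list of m4 is not shown here: when the left-hand side of a formula line is a call such as `c(a, b)`, `as.character()` returns several strings, so `left == var` has length greater than one. Since R 4.2.0, `if()` treats such a condition as an error rather than a warning. The sketch below reproduces that situation in base R with a made-up formula line.

```r
# Made-up formula line with more than one symbol on the left-hand side.
f <- quote(c(a, b) ~ normal(0, 1))

# The left-hand side becomes a character vector: "c" "a" "b".
left <- as.character(f[[2]])
var <- "a"

# The comparison is now a logical vector of length 3, which if() rejects.
comparison <- left == var

# Collapsing it to a single TRUE/FALSE makes the test well defined.
matches <- any(left == var)
```

Whether `any()` is the right fix for `sim()` depends on how the function should treat such lines, so this is only a way to confirm the diagnosis.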

Book version 3

Hi,
I hope this is not the wrong place to ask, but I couldn't find information elsewhere:
Is there a rough date for when the 3rd version of the book gets published?

Circular Bayesian statistics

I was wondering, do you have some ideas/books/articles on circular statistics (using Bayesian methods as a core)? I have the book 'Circular statistics in R', but that is more non-Bayesian testing. Thanks for the help.
All the best,
Victor

Splines to compare treatments

Say I have some time-series data for 2 groups: 1 group received placebo and the other received treatment. Can the effect of treatment be modeled with splines, similar to a linear model? For example:

library(rethinking) # provides quap()
library(splines)    # provides bs()

# DATA
n_A <- 10 # Number of rats in Group A
n_B <- 10 # Number of rats in Group B

m_NREMS <- 4 # number of measurements for NREMS.  Here, we'll assume that the rhythms of NREMS express over 6-h blocks, giving us 4 blocks.

NREMS_A <- data.frame("Blk0" = rep(0, n_A),
                      "Blk1" = rgamma(n = n_A, shape=144, rate=6/5), # 2h NREMS
                      "Blk2" = rgamma(n=n_A, shape=324, rate=9/5), # 3h NREMS
                      "Blk3" = rgamma(n=n_A, shape=900, rate=3), # 5h NREMS
                      "Blk4" = rgamma(n=n_A, shape=576, rate=12/5)) # 4h NREMS

NREMS_B <- data.frame("Blk0" = rep(0, n_B),
                      "Blk1" = rgamma(n = n_B, shape=144, rate=6/5) + rnorm(n=n_B, mean=30, sd=6), # Similar to NREMS_A, but with the effect of S added
                      "Blk2" = rgamma(n=n_B, shape=324, rate=9/5) + rnorm(n=n_B, mean=20, sd=5),
                      "Blk3" = rgamma(n=n_B, shape=900, rate=3) + rnorm(n=n_B, mean=10, sd=4),
                      "Blk4" = rgamma(n=n_B, shape=576, rate=12/5) + rnorm(n=n_B, mean=5, sd=3))

NREMS_A_cuml <- NREMS_A
for(i in 2:ncol(NREMS_A)) {
  NREMS_A_cuml[,i] <- NREMS_A_cuml[,i] + NREMS_A_cuml[,i-1]
}

NREMS_B_cuml <- NREMS_B
for(i in 2:ncol(NREMS_B)) {
  NREMS_B_cuml[,i] <- NREMS_B_cuml[,i] + NREMS_B_cuml[,i-1]
}

NREMS_All <- data.frame("NREMS" = c(as.numeric(unlist(NREMS_A_cuml)), as.numeric(unlist(NREMS_B_cuml))),
                        "Group" = c(rep("A", (m_NREMS+1)*n_A), c(rep("B", (m_NREMS+1)*n_B))),
                        "Block" = c(rep(0:4, each=n_A), c(rep(0:4, each=n_B))))
NREMS_All$treatment <- ifelse(test = NREMS_All$Group=="A",
                              yes = FALSE,
                              no = TRUE)
NREMS_All$minute <- NREMS_All$Block*360

# SPLINES
num_knots <- 4
knot_list <- quantile(NREMS_All$minute, probs=seq(from=0, to=1, length.out=num_knots))
knot_degree <- 3

B <- bs(NREMS_All$minute,
        knots=knot_list[-c(1, num_knots)],
        degree=knot_degree,
        intercept=TRUE)

# MODEL
NREMS_model_1 <- quap(
  alist(
    NREMS ~ dnorm(mu, sigma),
      mu <- a +
            B %*% w[treatment],
        a ~ dnorm(180, 10),
        w[treatment] ~ dnorm(1, 1),
      sigma ~ dexp(1)
  ), data=list(NREMS=NREMS_All$NREMS,
               treatment=as.integer(NREMS_All$treatment),
               B=B),
     start=list(w=rep(0, ncol(B)))
)

I think that the problem is that B is a matrix and w is a vector, such that w[treatment] is read as an index on the vector w, but I want it to work like it does in the random effects models.

Any help much appreciated.
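
The arithmetic being asked for can be sketched in base R, outside of `quap`. The idea is to give each group its own row of spline weights in a matrix `W`, and to compute each observation's mean from its own basis row and its group's weight row. Note also that `as.integer(treatment)` yields 0 and 1, while index variables need to start at 1. All names below (`tid`, `W`, `mu`) are illustrative assumptions, and the weights are random numbers rather than estimates.

```r
library(splines)
set.seed(11)

# Small made-up design: 5 time points, 4 observations each.
minute <- rep(seq(0, 1440, by = 360), each = 4)
treatment <- rep(c(FALSE, TRUE), times = 10)

# Index variable starting at 1: 1 = placebo, 2 = treatment.
tid <- as.integer(treatment) + 1L

# Cubic B-spline basis with two interior knots, as in the question.
B <- bs(minute, knots = c(480, 960), degree = 3, intercept = TRUE)
Bm <- matrix(as.numeric(B), nrow = nrow(B))  # plain numeric matrix
K <- ncol(Bm)

# One row of weights per group (random placeholders).
W <- matrix(rnorm(2 * K), nrow = 2, ncol = K)

# mu[i] = sum over k of B[i, k] * W[tid[i], k]
mu <- rowSums(Bm * W[tid, , drop = FALSE])

# Same computation written as an explicit loop, for comparison.
mu_loop <- sapply(seq_along(minute), function(i) sum(Bm[i, ] * W[tid[i], ]))
```

How best to express the weight matrix inside `quap` or `ulam` is not shown here; the sketch only pins down the quantity the model needs to compute.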

Enquiry for 2024 enrolment

Is there any info about a 2024 course and how to enrol in it?
For online learning, I am much more successful if I am part of a cohort of students learning the same stuff at the same time.

phylogenetic imputation of a binary predictor

Thanks for yet another round of awesome lectures!

I have a question about the imputation procedure in the missing data lecture.

So, in the following model, the primate phylogeny is used to impute missing data in predictor G (group size).

mBMG_OU3 <- ulam(
    alist(
        B ~ multi_normal( mu , K ),
        mu <- a + bM*M + bG*G,
        G ~ multi_normal( nu , KG ),
        nu <- aG + bMG*M,
        M ~ normal(0,1),
        matrix[N_spp,N_spp]:K <- cov_GPL1(Dmat,etasq,rho,0.01),
        matrix[N_spp,N_spp]:KG <- cov_GPL1(Dmat,etasqG,rhoG,0.01),
        c(a,aG) ~ normal( 0 , 1 ),
        c(bM,bG,bMG) ~ normal( 0 , 0.5 ),
        c(etasq,etasqG) ~ half_normal(1,0.25),
        c(rho,rhoG) ~ half_normal(3,0.25)
    ), data=dat_all , chains=4 , cores=4 , sample=TRUE )

My question is, what if G was a binary predictor? For instance, we might code a species either as solitary (S=0) or social (S=1) and use that to predict brain size B. We then want to use phylogenetic information in the imputation of S.

My guess is that the likelihood for S would not be multivariate normal, but what would the code look like then?

Thanks in advance!
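
One common way to think about this, offered as a sketch and not as the author's answer, is to put the phylogenetic covariance on a latent continuous scale and connect it to the binary trait through a logit link. The base R simulation below only generates data under that assumption; it does not fit a model. The distances, the kernel function `ou_kernel`, and all parameter values are hypothetical.

```r
set.seed(3)
N_spp <- 8

# Hypothetical pairwise distances between species, scaled to [0, 1].
pos <- sort(runif(N_spp))
Dmat <- as.matrix(dist(pos))

# Assumed exponential (OU-style) covariance with a small diagonal jitter.
ou_kernel <- function(D, etasq, rho, delta = 0.01) {
  etasq * exp(-rho * D) + diag(delta, nrow(D))
}
K_S <- ou_kernel(Dmat, etasq = 1, rho = 3)

# Latent phylogenetically correlated values: z ~ MVNormal(0, K_S).
z <- as.vector(t(chol(K_S)) %*% rnorm(N_spp))

# Logit link turns the latent values into probabilities of being social.
p_social <- plogis(-0.5 + z)

# Binary trait: S = 1 (social) or S = 0 (solitary).
S <- rbinom(N_spp, size = 1, prob = p_social)
```

Fitting such a model with missing values of S is harder than the continuous case, because Stan cannot sample discrete parameters, so the missing values would have to be marginalized out.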