dgrtwo / empirical-bayes-book

Introduction to Empirical Bayes: Examples from Baseball Statistics

License: Other

empirical-bayes-book's Issues

Typo on page 32

On page 32 of the PDF, in section 4.4 of the Credible intervals chapter:

But in cases where batters got 1 or 2 hits out of 10, the credible interval is much narrower than the credible interval.

I think it should read:

But in cases where batters got 1 or 2 hits out of 10, the credible interval is much narrower than the confidence interval.

error in compiling the book

Dear all, I hope this is not annoying (if it is, just remove this issue). When I try to compile your book from source with RStudio, I get this error:

Quitting from lines 185-194 (dirichlet-multinomial.Rmd)
Error: object "simulation" not found
Backtrace:
x

+-base::local(...)
| -base::eval.parent(substitute(eval(quote(expr), envir)))
| -base::eval(expr, p)
| -base::eval(expr, p)
+-base::eval(...)
| -base::eval(...)
| +-base::do.call(...)
| -(function (input, output_format = NULL, output_file = NULL, output_dir = NULL, ...
| -knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
| -knitr:::process_file(text, output)
| +-base::withCallingHandlers(...)
| +-knitr:::process_group(group)
| -knitr:::process_group.block(group)
| -knitr:::call_block(x)
| -knitr:::block_exec(params)
| +-knitr:::in_dir(...)
| -knitr:::evaluate(...)
| -evaluate::evaluate(...)
| -evaluate:::evaluate_call(...)
| +-evaluate:::timing_fn(...)

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8
[4] LC_COLLATE=it_IT.UTF-8 LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8
[7] LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] purrr_0.3.4 scales_1.1.0 ggplot2_3.3.0 knitr_1.28 tidyr_1.0.2 dplyr_0.8.5
[7] Lahman_7.0-1

loaded via a namespace (and not attached):
[1] rstan_2.19.3 tidyselect_1.0.0 xfun_0.13 reshape2_1.4.4 splines_4.0.0
[6] colorspace_1.4-1 vctrs_0.2.4 htmltools_0.4.0 stats4_4.0.0 loo_2.2.0
[11] yaml_2.2.1 utf8_1.1.4 rlang_0.4.6 pkgbuild_1.0.7 pillar_1.4.4
[16] glue_1.4.0 withr_2.2.0 matrixStats_0.56.0 lifecycle_0.2.0 plyr_1.8.6
[21] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 VGAM_1.1-3 evaluate_0.14
[26] labeling_0.3 inline_0.3.15 callr_3.4.3 ps_1.3.2 parallel_4.0.0
[31] fansi_0.4.1 highr_0.8 Rcpp_1.0.4.6 StanHeaders_2.19.2 farver_2.0.3
[36] gridExtra_2.3 packrat_0.5.0 digest_0.6.25 stringi_1.4.6 bookdown_0.18
[41] processx_3.4.2 grid_4.0.0 cli_2.0.2 tools_4.0.0 magrittr_1.5
[46] tibble_3.0.1 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0 prettyunits_1.1.1
[51] assertthat_0.2.1 rmarkdown_2.1 rstudioapi_0.11 R6_2.4.1 compiler_4.0.0

Bayesian A/B Testing Intro

There is an artifact from the blog-post-to-book conversion in the introduction to the A/B Testing chapter:

"In this series of posts about an empirical Bayesian approach to
batting statistics, we’ve been estimating batting averages by modeling
them as a binomial distribution with a beta prior."

The book is brilliant!

Variable "yankee_1998_career" not defined in the rendered book - Ch: Credible Intervals

The variable "yankee_1998_career" is first used inside the pipe on page 30. Since it is also used in the subsequent code, please consider including the filtering step that creates it in the book as well.

Thanks for the great book!

7.2 Text refers to earlier blog post instead of book chapter

At the beginning of section 7.2 (page 59), we have

We made up this model in one of the first posts in this series and have been 
using it since.

It would be more coherent to refer to the earlier chapters of the book here.

The link in the readme is dead

5.3 - cumulative/mean PEP calculation vs text

In the last part of this section, the calculated cumulative/mean PEP is at odds with the text (5.46% vs. 4.43%, respectively).

Error in max likelihood est program in Sec 3.2?

I'm running into an error message when I run the log-likelihood and max likelihood programs in section 3.2. It reads: "Error in is.nan(x) : default method not implemented for type 'list'." When I rerun it with Debug, it says: "Called from: VGAM::dbetabinom.ab(x, total, alpha, beta, log = TRUE)".

It may have to do with compatibility with different versions of R. I am running 3.3.3.

Any idea how to solve it?
Thanks
Chris Green
Toronto
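
One possible cause, offered as a guess rather than a confirmed diagnosis: the `x` or `total` argument reaches `VGAM::dbetabinom.ab` as a list (for example, a one-column data frame or tibble) instead of a numeric vector. The base R sketch below uses a made-up `career_filtered` data frame, not the book's data, to show that `is.nan()` fails on a list with a message of this form and that extracting the column with `[[` or `$` avoids it.

```r
# Toy stand-in for the book's filtered career table (values are made up).
career_filtered <- data.frame(H = c(30, 45, 12), AB = c(100, 150, 60))

x_list <- career_filtered["H"]    # one-column data frame: still a list
x_vec  <- career_filtered[["H"]]  # plain numeric vector, same as career_filtered$H

# is.nan() has no method for lists, so this call raises an error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(is.nan(x_list), error = function(e) conditionMessage(e))
print(err)

# The vector form works and returns one logical value per element.
print(is.nan(x_vec))
```

If this is the cause, checking `is.numeric(x)` and `is.numeric(total)` just before the likelihood call would confirm it.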

Chapter 3 – Page 21

I was unable to run the code on p.21 (cleaning for the 'career' tibble). It seems the Lahman data has been renamed from 'Master' to 'People'; using 'People' instead worked.
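
A minimal sketch of a version-tolerant lookup, assuming only what the report above states (newer Lahman releases ship `People`, older ones `Master`). The helper name `pick_player_table` is hypothetical and not part of the book or of the Lahman package.

```r
# Hypothetical helper: choose the player table name from the dataset names on offer.
pick_player_table <- function(available) {
  candidates <- c("People", "Master")  # prefer the newer name
  found <- candidates[candidates %in% available]
  if (length(found) == 0) stop("neither 'People' nor 'Master' is available")
  found[1]
}

# Only touch Lahman if it is installed.
if (requireNamespace("Lahman", quietly = TRUE)) {
  # Names of all datasets shipped with the installed Lahman version.
  items <- utils::data(package = "Lahman")$results[, "Item"]
  table_name <- pick_player_table(items)
  # Equivalent to Lahman::People or Lahman::Master, chosen at run time.
  players <- getExportedValue("Lahman", table_name)
  print(table_name)
}
```

The resulting `players` data frame can then stand in wherever the book's code refers to `Master`.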

6.2 - printing table Aaron vs Piazza

When the table 'two_players' is printed, the subsequent text refers to the column 'eb_estimate', but that column is not shown in the output. You could remove the 'playerID' column from the data or use a select to print only the relevant variables.

Unable to build after downloading

I've never used R before so this is likely my issue. First, I installed all the required packages. But when I attempt to build the book from scratch using rmarkdown::render_site(encoding = 'UTF-8') it fails after a few minutes with the following.

Quitting from lines 77-78 (simulation-parameters.Rmd)
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: local ... withCallingHandlers -> withVisible -> eval -> eval -> load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'intermediate-datasets/sim_replication_models.rda', probable reason 'No such file or directory'

Execution halted
Error in Rscript_render(f, render_args, render_meta) :
Failed to compile simulation-parameters.Rmd
In addition: Warning message:
running command '"C:/PROGRA~1/R/R-34~1.0/bin/x64/Rscript" "C:/Program Files/R/R-3.4.0/library/bookdown/scripts/render_one.R" "simulation-parameters.Rmd" ".\render7020642845f2.rds" "_main.rds"' had status 1

In the code for simulation-parameters.Rmd, I see the load("intermediate-datasets/sim_replication_models.rda") call, but I don't see that directory anywhere in the download or created earlier in the process.
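
This does not supply the missing file, but a guard like the sketch below would at least make the failure explicit instead of surfacing as a `readChar` connection error. The helper name `load_if_present` is hypothetical; the path is the one from the log above.

```r
# Hypothetical helper: load an .rda file only if it exists, and say so if it does not.
load_if_present <- function(path, envir = parent.frame()) {
  if (!file.exists(path)) {
    message("Missing cached file: ", path,
            " (it has to be generated or obtained separately)")
    return(invisible(FALSE))
  }
  load(path, envir = envir)  # restores the saved objects into 'envir'
  invisible(TRUE)
}

# TRUE if the cached models were loaded, FALSE if the file is absent.
ok <- load_if_present("intermediate-datasets/sim_replication_models.rda")
print(ok)
```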

h function on page 51

Thank you so much for such a great book!

I was trying out the closed-form solution on page 51 and found that 1-sum(exp(log_vals)) wasn't implemented in Evan Miller's implementation section. Thus, this would be giving Pr(p_A>p_B) instead of Pr(p_B>p_A). I am not sure whether what I am pointing out is correct, but I would greatly appreciate your confirmation.
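
To make the direction of the probability easy to check, here is a small base R sketch. It assumes the closed form as published by Evan Miller, which sums to Pr(p_B > p_A) for independent Beta posteriors with an integer alpha_B; taking one minus that sum then gives Pr(p_A > p_B). The function names `prob_b_beats_a` and `prob_a_beats_b` are hypothetical and are not the book's `h`.

```r
# Closed-form sum for Pr(p_B > p_A), with p_A ~ Beta(alpha_a, beta_a) and
# p_B ~ Beta(alpha_b, beta_b). Assumes alpha_b is (close to) an integer.
prob_b_beats_a <- function(alpha_a, beta_a, alpha_b, beta_b) {
  i <- seq.int(0, round(alpha_b) - 1)
  # Work on the log scale to avoid overflow in the beta functions.
  log_terms <- lbeta(alpha_a + i, beta_a + beta_b) - log(beta_b + i) -
    lbeta(1 + i, beta_b) - lbeta(alpha_a, beta_a)
  sum(exp(log_terms))
}

# The complement gives the opposite direction.
prob_a_beats_b <- function(alpha_a, beta_a, alpha_b, beta_b) {
  1 - prob_b_beats_a(alpha_a, beta_a, alpha_b, beta_b)
}

# Toy check: p_A ~ Beta(1, 1) and p_B ~ Beta(2, 1), where Pr(p_B > p_A) = E[p_B] = 2/3.
closed <- prob_b_beats_a(1, 1, 2, 1)

# Monte Carlo estimate of the same probability for comparison.
set.seed(1)
n <- 1e5
sim <- mean(rbeta(n, 2, 1) > rbeta(n, 1, 1))

print(c(closed_form = closed, simulation = sim))
```

Running the same comparison with the parameters from page 51 would show which of the two directions the book's function returns.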

Spelling

We can see that the shape of the posteriors are be similar (following
a beta distribution)

page 18. Change to "posteriors are similar"?

error running code from chapter 11: The ebbr package

Hi David,

Last month I asked you a question about fuzzyjoin. From there I discovered you have written a book. I have printed your book and it is very interesting to read!

I am currently trying to run the code from chapter 11. When I run this code snippet, I get an error:

## Hierarchical modeling

library(splines)
eb_career_prior <- career_full %>%
  ebb_fit_prior(H, AB, method = "gamlss",
                mu_predictors = ~ 0 + ns(year, df = 5) * bats + log(AB))

The error is:

Error: Argument 2 must be length 1, not 20

> lifecycle::last_warning()
<deprecated>
message: `data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
backtrace:
 1. ebbr::ebb_fit_mixture(career_w_pitchers, H, AB, clusters = 2)
 2. ebbr::ebb_fit_mixture_(...)
 3. dplyr::data_frame(...)

Do you know what the cause might be?

Thanks a lot!

Kind regards,
Marcel

wrong word in section 4.3 re confidence vs. credible intervals

"But in cases where batters got 1 or 2 hits out of 10, the credible interval is much narrower than the credible interval."

It should say that the credible interval is much narrower than the confidence interval.