dgrtwo / empirical-bayes-book
Introduction to Empirical Bayes: Examples from Baseball Statistics
License: Other
On page 32 of the PDF, section 4.4 of Credible intervals
But in cases where batters got 1 or 2 hits out of 10, the credible interval is much narrower than the credible interval.
Think it should read:
But in cases where batters got 1 or 2 hits out of 10, the credible interval is much narrower than the confidence interval.
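A quick way to see why the sentence must contrast the credible interval with the confidence interval is to compute both for a batter with 1 hit in 10 at-bats. The sketch below is only illustrative: the prior hyperparameters `alpha0` and `beta0` are made-up values (not the ones fitted in the book), and `binom.test()` is used for an exact Clopper-Pearson confidence interval, which is one common choice among several.

```r
# Hypothetical beta prior, chosen only for illustration (not the book's fit).
alpha0 <- 100
beta0 <- 290

hits <- 1
at_bats <- 10

# 95% credible interval from the beta posterior.
credible <- qbeta(c(0.025, 0.975), alpha0 + hits, beta0 + at_bats - hits)

# 95% exact (Clopper-Pearson) confidence interval; it ignores the prior.
confidence <- as.numeric(binom.test(hits, at_bats)$conf.int)

credible_width <- diff(credible)
confidence_width <- diff(confidence)

print(rbind(credible = credible, confidence = confidence))
```

With so little data the prior dominates the posterior, so the credible interval comes out much narrower than the confidence interval.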
Dear all, I hope this is not a nuisance (if it is, just close this issue). When I try to compile the book from source with RStudio, I get this error:
Quitting from lines 185-194 (dirichlet-multinomial.Rmd)
Error: object "simulation" not found
Backtrace:
x
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8
[4] LC_COLLATE=it_IT.UTF-8 LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=it_IT.UTF-8
[7] LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_0.3.4 scales_1.1.0 ggplot2_3.3.0 knitr_1.28 tidyr_1.0.2 dplyr_0.8.5
[7] Lahman_7.0-1
loaded via a namespace (and not attached):
[1] rstan_2.19.3 tidyselect_1.0.0 xfun_0.13 reshape2_1.4.4 splines_4.0.0
[6] colorspace_1.4-1 vctrs_0.2.4 htmltools_0.4.0 stats4_4.0.0 loo_2.2.0
[11] yaml_2.2.1 utf8_1.1.4 rlang_0.4.6 pkgbuild_1.0.7 pillar_1.4.4
[16] glue_1.4.0 withr_2.2.0 matrixStats_0.56.0 lifecycle_0.2.0 plyr_1.8.6
[21] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 VGAM_1.1-3 evaluate_0.14
[26] labeling_0.3 inline_0.3.15 callr_3.4.3 ps_1.3.2 parallel_4.0.0
[31] fansi_0.4.1 highr_0.8 Rcpp_1.0.4.6 StanHeaders_2.19.2 farver_2.0.3
[36] gridExtra_2.3 packrat_0.5.0 digest_0.6.25 stringi_1.4.6 bookdown_0.18
[41] processx_3.4.2 grid_4.0.0 cli_2.0.2 tools_4.0.0 magrittr_1.5
[46] tibble_3.0.1 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0 prettyunits_1.1.1
[51] assertthat_0.2.1 rmarkdown_2.1 rstudioapi_0.11 R6_2.4.1 compiler_4.0.0
Just an artifact from the blog post to book conversion in the introduction to the A/B Testing chapter:
"In this series of posts about an empirical Bayesian approach to
batting statistics, we’ve been estimating batting averages by modeling
them as a binomial distribution with a beta prior."
The book is brilliant!
The variable "yankee_1998_career" is first used inside the pipe on page 30. Since it is also used in the subsequent code, please consider including the filtering step that creates it in the book as well.
Thanks for the great book!
At the beginning of section 7.2 (page 59), we have
We made up this model in one of the first posts in this series and have been
using it since.
It would be more coherent to refer to the earlier chapters of the book here.
In the last part of this section, the calculated cumulative/mean PEP is at odds with the text (5.46% vs 4.43% respectively).
I'm running into an error message when I run the log-likelihood and maximum likelihood code in section 3.2. It reads: "Error in is.nan(x) : default method not implemented for type 'list'." When I rerun it with Debug, it says: "Called from: VGAM::dbetabinom.ab(x, total, alpha, beta, log = TRUE)".
It may have to do with compatibility with different versions of R. I am running 3.3.3.
Any idea how to solve it?
Thanks
Chris Green
Toronto
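One plausible cause of this message (not confirmed for this report) is that `x` or `total` reaches the density function as a data frame or list rather than a numeric vector, for example via `df["H"]` instead of `df$H`. The sketch below reproduces the message on simulated data and fits the beta-binomial prior with base R only. `log_dbetabinom()` is a hypothetical stand-in written for this example, not VGAM's function, and the simulated `sim` data frame is made up.

```r
# Beta-binomial log density using base R only (stand-in for VGAM's version).
log_dbetabinom <- function(x, size, alpha, beta) {
  lchoose(size, x) + lbeta(x + alpha, size - x + beta) - lbeta(alpha, beta)
}

# Simulated careers: at-bats, then hits drawn with a beta-distributed average.
set.seed(1)
sim <- data.frame(AB = sample(500:5000, 300, replace = TRUE))
sim$H <- rbinom(nrow(sim), sim$AB, rbeta(nrow(sim), 80, 220))

# Single-bracket indexing keeps a data frame (a list), which is.nan() rejects.
list_error <- tryCatch(is.nan(sim["H"]), error = function(e) conditionMessage(e))

neg_ll <- function(par) {
  x <- sim$H        # `$` extracts a plain numeric vector
  total <- sim$AB
  stopifnot(is.numeric(x), is.numeric(total))
  -sum(log_dbetabinom(x, total, par[1], par[2]))
}

fit <- optim(c(1, 10), neg_ll, method = "L-BFGS-B", lower = c(1e-4, 0.1))
prior_mean <- fit$par[1] / sum(fit$par)
```

If `list_error` matches the reported message, checking `is.numeric()` on the inputs in the original code is a reasonable first debugging step.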
I was unable to run the code on p.21 (cleaning for the 'career' tibble). It seems the Lahman table has been renamed from 'Master' to 'People'; using 'People' instead worked.
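A minimal sketch of the workaround described above, assuming the dplyr, tidyr, and Lahman packages are installed. It approximates the cleaning step rather than copying the book's code; `players` and `career_totals` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(Lahman)

# Use `People` when the installed Lahman release has it, else fall back to `Master`.
players <- if (exists("People")) People else Master

# Career hits and at-bats for non-pitchers.
career_totals <- Batting %>%
  filter(AB > 0) %>%
  anti_join(Pitching, by = "playerID") %>%
  group_by(playerID) %>%
  summarize(H = sum(H), AB = sum(AB)) %>%
  mutate(average = H / AB)

# Attach readable names and drop the ID column.
career <- players %>%
  as_tibble() %>%
  select(playerID, nameFirst, nameLast) %>%
  unite(name, nameFirst, nameLast, sep = " ") %>%
  inner_join(career_totals, by = "playerID") %>%
  select(-playerID)
```

Because `if` only evaluates the branch it takes, the reference to `Master` is harmless on releases that no longer ship that table.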
The table 'two_players' is printed without the 'eb_estimate' column, even though the text that follows refers to it. You could remove the 'playerID' column from the data, or add a select to print only the relevant variables.
I've never used R before so this is likely my issue. First, I installed all the required packages. But when I attempt to build the book from scratch using rmarkdown::render_site(encoding = 'UTF-8')
it fails after a few minutes with the following.
Quitting from lines 77-78 (simulation-parameters.Rmd)
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: local ... withCallingHandlers -> withVisible -> eval -> eval -> load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'intermediate-datasets/sim_replication_models.rda', probable reason 'No such file or directory'
Execution halted
Error in Rscript_render(f, render_args, render_meta) :
Failed to compile simulation-parameters.Rmd
In addition: Warning message:
running command '"C:/PROGRA~1/R/R-34~1.0/bin/x64/Rscript" "C:/Program Files/R/R-3.4.0/library/bookdown/scripts/render_one.R" "simulation-parameters.Rmd" ".\render7020642845f2.rds" "_main.rds"' had status 1
In the code for simulation-parameters.Rmd, I see the load("intermediate-datasets/sim_replication_models.rda")
call, but I don't see that directory anywhere in the download or created earlier in the process.
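Until the intermediate file is available, a chapter that depends on a cached `.rda` can guard the `load()` call. The sketch below shows one generic pattern: load the cache if it exists, otherwise rebuild and save it. `load_or_build()` and the `demo_models` object are hypothetical names for this example; how the real `sim_replication_models.rda` is produced is not shown in this thread, so the build step here is a placeholder.

```r
# Load cached objects from `path` if present; otherwise call `build()`,
# which must return a named list, then save that list to `path`.
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    env <- new.env()
    load(path, envir = env)
    return(as.list(env))
  }
  objects <- build()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  save(list = names(objects), file = path, envir = list2env(objects))
  objects
}

# Demonstration in a temporary directory with a placeholder object.
path <- file.path(tempdir(), "intermediate-datasets", "demo_models.rda")
unlink(path)
first <- load_or_build(path, function() list(demo_models = 1:3))
second <- load_or_build(path, function() stop("cache should have been used"))
```

The second call never runs its build function because the file written by the first call is found and loaded.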
Thank you so much for such a great book!
I was trying out the closed-form solution in page 51 and found that 1-sum(exp(log_vals))
wasn't implemented in Evan Miller's implementation section. Thus, this would be giving Pr(p_A>p_B) instead of Pr(p_B>p_A). I am not sure if what I am pointing out is correct or not but would greatly appreciate your confirmation.
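One way to check which probability the sum represents is to compare it with a simulation. The sketch below evaluates the closed-form sum with no "1 -" applied and compares it with a Monte Carlo estimate of Pr(p_B > p_A). The posterior parameters are made up for illustration, `closed_form_sum()` is a name introduced here, and the formula is written as I understand Evan Miller's expression, so treat it as a check and not as the book's code.

```r
# Closed-form sum over j = 0, ..., alpha_b - 1 (requires integer alpha_b).
closed_form_sum <- function(alpha_a, beta_a, alpha_b, beta_b) {
  j <- seq.int(0, round(alpha_b) - 1)
  log_vals <- lbeta(alpha_a + j, beta_a + beta_b) - log(beta_b + j) -
    lbeta(1 + j, beta_b) - lbeta(alpha_a, beta_a)
  sum(exp(log_vals))
}

# Made-up posteriors in which B is clearly the better arm.
alpha_a <- 20; beta_a <- 80
alpha_b <- 30; beta_b <- 70

set.seed(2024)
p_a <- rbeta(1e6, alpha_a, beta_a)
p_b <- rbeta(1e6, alpha_b, beta_b)

closed <- closed_form_sum(alpha_a, beta_a, alpha_b, beta_b)
simulated_b_beats_a <- mean(p_b > p_a)

print(c(closed = closed, simulated = simulated_b_beats_a))
```

In this setup the bare sum agrees with the simulated Pr(p_B > p_A), so wrapping it in `1 - ...` would give Pr(p_A > p_B).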
We can see that the shape of the posteriors are be similar (following
a beta distribution)
page 18. Change to "posteriors are similar"?
Hi David,
Last month I asked you a question about fuzzyjoin. From there I discovered you have written a book. I have printed your book and it is very interesting to read!
I am currently trying to run the code from chapter 11. When I run this snippet, I get an error:
## Hierarchical modeling
library(splines)
eb_career_prior <- career_full %>%
  ebb_fit_prior(H, AB, method = "gamlss",
                mu_predictors = ~ 0 + ns(year, df = 5) * bats + log(AB))
The error is:
Error: Argument 2 must be length 1, not 20
> lifecycle::last_warning()
<deprecated>
message: `data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
backtrace:
1. ebbr::ebb_fit_mixture(career_w_pitchers, H, AB, clusters = 2)
2. ebbr::ebb_fit_mixture_(...)
3. dplyr::data_frame(...)
Do you know what could be causing this?
Thanks a lot!
Kind regards,
Marcel
"But in cases where batters got 1 or 2 hits out of 10, the credible interval is much narrower than the credible interval."
should say credible interval is much narrower than the confidence interval