This project has moved to https://github.com/statsthinking21
poldrack / psych10-book
Source files for Statistical Thinking For the 21st Century
raised by Jack van Horn: your example of website visits over a week might not be the best illustration of the discrete categories and sampling requirements of contingency tables and chi-squared testing.
dbinom() function in R"
raised by @tansey: glanced at the hypothesis testing chapter and do have one gripe. You really should probably mention that randomization in the presence of covariates is generally not valid. Biologists seem to love the idea of simply being able to randomize a column in their covariate matrix and report back a p-value, but that permutation test assumes that all covariates are independent.
Hi there,
Awesome book and thanks so much for making it open!
A small typo in section 11.3.2: nPositives <- 9 should instead be 6 (unless the typo is in the written section).
11.3.4's written section says 9, 11.3.5 says 6.
from Felipe Ortega @jfelipe on twitter - Many examples are great, but I find fig. 5.5 confusing. Data in the last panel cannot be fitted to a straight line, but it can be fitted to a 2nd-order polynomial, which is a linear model as well.
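To illustrate the point about polynomial fits, here is a minimal sketch in base R. The data are simulated for illustration only (they are not the data behind Figure 5.5), and all variable names are made up.

```r
# Hypothetical data with a curved (quadratic) relationship; not from the book
set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- 1 + 0.5 * x + 2 * x^2 + rnorm(50, sd = 0.5)

# Straight-line fit: linear in x
fit_line <- lm(y ~ x)

# Second-order polynomial: curved in x, but still linear in the coefficients,
# so lm() can fit it
fit_quad <- lm(y ~ x + I(x^2))

r2_line <- summary(fit_line)$r.squared
r2_quad <- summary(fit_quad)$r.squared
```

Comparing r2_line with r2_quad shows how much better the quadratic term captures the curvature, even though both models are fitted by the same linear-model machinery.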
The labels of the right panel of Figure 5.8 overlap each other.
At the bottom of p. 97, it says "concepts in statsitics", instead of "statistics".
When I run the following command to create an epub format ebook, I get errors. Has anyone tried to create an epub before? Thanks!
bookdown::render_book("index.Rmd", "bookdown::epub_book")
Console output with errors:
label: sleepHist (with options)
List of 5
$ echo : logi FALSE
$ fig.cap : chr "Left: Histogram showing the number (left) and proportion (right) of people reporting each possible value of the"| __truncated__
$ fig.width : num 8
$ fig.height: num 4
$ out.height: chr "33%"
|........ | 12%
ordinary text without R code
label: unnamed-chunk-18
ordinary text without R code
label: sleepAbsCumulRelFreq (with options)
List of 5
$ echo : logi FALSE
$ fig.cap : chr "A plot of the relative (red) and cumulative relative (blue) values for frequency (left) and proportion (right) "| __truncated__
$ fig.width : num 8
$ fig.height: num 4
$ out.height: chr "33%"
ordinary text without R code
label: ageHist (with options)
List of 5
$ echo : logi FALSE
$ fig.cap : chr "A histogram of the Age (left) and Height (right) variables in NHANES."
$ fig.width : num 8
$ fig.height: num 4
$ out.height: chr "33%"
Quitting from lines 1265-1280 (StatsThinking21.Rmd)
Error in select(., Height) : unused argument (Height)
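One common cause of this error message is that another package (for example MASS) was attached after dplyr, so that its select() masks dplyr's. Whether that is what happened here is an assumption, since the log does not show which packages were loaded. A minimal sketch of the usual workaround is to call the function with an explicit namespace. The data frame below is a hypothetical stand-in, not the NHANES data.

```r
library(dplyr)

# Hypothetical stand-in for the NHANES data frame
nhanes_demo <- data.frame(Height = c(170.2, 165.4, 181.0),
                          Age = c(30, 41, 25))

# dplyr::select() names the intended function explicitly, so it keeps
# working even if another package's select() is attached later
heights <- nhanes_demo %>% dplyr::select(Height)
```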
The book looks excellent; in particular the tight coupling of text and code! I have dreamed about the perfect stats book/course for a while, and your book seems close.
I would suggest adding a condensed overview of the popular statistical tests as linear models and mentioning non-parametric tests. I've made my first attempt here: https://rpubs.com/lindeloev/tests_as_linear (still WIP, but close to finished).
If the infographic or other parts of this could be useful, you are welcome to steal it :-)
Disposition-wise, I like starting with a simple regression (y = a*x+b) because that is what they've learned in high school. And only then show how dummy coding of x can be exploited to make this work for categorical differences (t-tests).
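A minimal sketch of that idea in base R, using simulated data (the group sizes, means, and variable names are all hypothetical): with x dummy-coded as 0/1, the slope of the regression is the difference in group means, and its p-value matches the equal-variance t-test.

```r
set.seed(42)
group <- rep(c(0, 1), each = 20)   # dummy-coded predictor (0 = control, 1 = treatment)
y <- 10 + 2 * group + rnorm(40)    # hypothetical outcome

# Student's t-test assuming equal variances
tt <- t.test(y[group == 1], y[group == 0], var.equal = TRUE)

# The same comparison written as a simple regression y = a*x + b
fit <- lm(y ~ group)
slope <- unname(coef(fit)["group"])
slope_p <- summary(fit)$coefficients["group", "Pr(>|t|)"]

# The slope is the difference between the two group means
mean_diff <- mean(y[group == 1]) - mean(y[group == 0])
```

Note that the equivalence holds for the equal-variance (Student) t-test; R's default Welch test would give a slightly different p-value.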
Chapter 5.2: In the line of
The mean (ofted denoted by a bar over the ...
I guess "ofted" is a typo here.
Chapter 8.1: Line 6
In in a casino game, numbers ...
Here I think the repeated "in" is a typo.
There is a typo on p. 21 (Chapter 2). The caption of Figure 2.1 says "valdity" instead of "validity".
section 10.1.5
raised by Jack van Horn: In your chapter on contingency tables, given that you mention Sir Ronald Fisher in an earlier chapter, you might also wish to discuss Fisher's Exact Test on 2x2 tables.
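For reference, a minimal sketch of Fisher's exact test on a 2x2 table in base R. The counts and labels below are hypothetical and are not taken from the book.

```r
# Hypothetical 2x2 table of counts (rows: group, columns: outcome)
tab <- matrix(c(8, 2,
                1, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

# Exact test of independence; does not rely on a large-sample approximation
ft <- fisher.test(tab)
ft$p.value
```

With cell counts this small, the chi-squared approximation is usually considered unreliable, which is the typical motivation for the exact test.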
The LaTeX commands \begin{figure}, \end{figure}, and \caption appear on p. 97 as part of the text (just below the figure).
In the P(cancer|test) equation at the end of 3.7, both "cancer" and "disease" are used. I'm assuming these refer to the same thing (B), so it may be clearer to use only one of the words.
In line 320 you have, "for example, the now discredited claims by wake:1999..." I think perhaps this is an issue with a citation manager?
Chapter 3.8 in the second line of the last paragraph (page 37):
while the prior on the right side (P(B)) tells us how likely
I believe "prior" should be "part".
Chapter 3.10 in the second line of the second paragraph (page 38):
If were to ask you “How likely is it that the US will return to the moon by 2026”
I think a pronoun is missing between "If" and "were".
PS: I'm loving the book so far.
from twitter via @enoriverbend
would be useful to separate sentences to individual lines for purposes of diffing
In section 3.2.3, it says that the result of P(Roll1throw1 U Roll1throw2) is 1/6. I think it should be 11/36. (In fact, 11/36 is the result given on the slides of the lecture on probability).
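The 11/36 figure can be checked with a short sketch in base R, either by enumerating all 36 outcomes of two throws or with the addition rule. The variable names are made up for illustration, and a fair die is assumed.

```r
# Enumerate all 36 equally likely outcomes of two throws of a fair die
outcomes <- expand.grid(throw1 = 1:6, throw2 = 1:6)

# Probability of a 1 on the first throw OR on the second throw
p_union <- mean(outcomes$throw1 == 1 | outcomes$throw2 == 1)

# Same result from the addition rule: P(A) + P(B) - P(A and B)
p_rule <- 1/6 + 1/6 - 1/36
```

Both p_union and p_rule come out to 11/36; the value 1/6 is the probability for a single throw only.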
Great chapter!
Adding a license file would help clarify just what people are allowed to do with this material and under what terms they are contributing when opening pull requests.
GitHub has a great resource for choosing an open source license here: https://choosealicense.com/ as well as instructions on how to easily add one of those licenses to the repository: https://help.github.com/articles/adding-a-license-to-a-repository/
There is a typo in the first line of p. 59. It says: "which is evident in the fact that the all of the points are very close to the line." The "the" before "all" should not be there.
section 14.1.2
section 14.3.1
In section 3.9, just before the formula for prior odds, there is a typo in the word "positively". It says "positvely" instead of "positively".
When finding the conditional probability of having diabetes if inactive, this line states:
the probability of someone having diabetes given that they are physically active is 0.141
I believe active should be inactive.
"You may not be familiar with the
In the Histogram Bins section, you begin by explaining height is measured to the first decimal place. Could be helpful to add a code snippet where you show a slice of 5-10 height rows from the dataset. e.g.,:
NHANES_adult %>%
select(Height) %>%
slice(50:55)
I'm not sure the section on the "Freedman-Diaconis" rule is necessary...ends up making your code a bit more complicated too.
In the code chunk under "Skewness," I'm not sure why this is there:
names(waittimes) <- c("waittime")
There's a closing parenthesis missing at the end of p. 73. It says: "(see right panel Figure 5.16.".
In the second paragraph below section 6.9.1., "pie chart" is missing a blank space between the two words: "The piechart in Figure 6.14 [...]."
There's a typo on p. 91. In the paragraph just below "7.1 How do we sample", it says "indivdual" instead of "individual".
I think there are typos in this sentence that are making it hard to understand: "Similarly, based on the fact that he reasoned that the since the probability of a double-six in throws of dice is 1/36, then the probability of at least one double-six on 24 rolls of two dice would be
I think there is a typo here: "He then used the fact that the complement of no sixes in four rolls is the complement of at least one six in four rolls". Should this instead read "He then used the fact that the complement of no sixes in four rolls is the probability of at least one six in four rolls"?
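For readers following these two passages, the underlying arithmetic can be sketched in base R. This only applies the complement rule to fair dice; it is not a quotation of the book's code, and the variable names are made up.

```r
# P(at least one six in 4 rolls of one die) = 1 - P(no six in 4 rolls)
p_six_in_4 <- 1 - (5/6)^4

# P(at least one double-six in 24 rolls of two dice)
#   = 1 - P(no double-six in 24 rolls)
p_dsix_in_24 <- 1 - (35/36)^24
```

The first probability is a little above one half (about 0.518) and the second a little below (about 0.491).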
In the code chunk under "cumulative probability distributions," I'm not sure why this is there: pFreeThrows=dbinom(seq(0,4),4,0.91)
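One possible reason for that line (an assumption, since the rest of the chunk is not quoted here) is that the cumulative distribution is built from the individual binomial probabilities. A minimal sketch of that relationship in base R:

```r
# P(exactly k successes) for k = 0..4, with 4 trials and success probability 0.91
p_free_throws <- dbinom(seq(0, 4), 4, 0.91)

# Running sum gives the cumulative distribution: P(k or fewer successes)
cum_free_throws <- cumsum(p_free_throws)

# pbinom() returns the same cumulative probabilities directly
cum_direct <- pbinom(seq(0, 4), 4, 0.91)
```

If the chunk only needs the cumulative values, pbinom() alone would be enough and the dbinom() line could be dropped.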
When you begin discussing conditional probability, some students may confuse this with the conjoint events you described above. You write "So far we have limited ourselves to simple probabilities - that is, the probability of a single event." Perhaps you can add a sentence clarifying this distinction?
On p. 95, before "7.4 The Central Limit Theorem", it says: "In Section 9.3.6statistical-power) [...]".
Original:
In particular, they could have show a figure like that shown in Figure 6.2,...
Suggested Change:
In particular, they could have shown a figure like that shown in Figure 6.2,...
-The first "show" should be "shown", but the sentence also seems a bit redundant.
I was flipping through a few of the chapters (really enjoying it so far!) and noticed that there are cases where you use English words in math environments, for example $P(Jefferson)=0.014$ on this line. This is of course totally fine, but the way LaTeX typesets characters in math is very different than the way it typesets text, which can lead to kind of weird rendering, e.g., the "ff" in the text uses a ligature whereas the f's are typeset separately in math mode.
There are a variety of ways to remedy this.
Personally, I just use the \text{} command from within math mode, e.g., $P(\text{Jefferson})=0.014$.
This is a pretty minor issue and likely not worth the time to retrospectively correct (similar to #6), but perhaps the sort of thing you'd want to consider as you add or revise equations.
I'm excited to continue reading through the book and think it's fantastic that you've released it into the wild!
There's an extra "in" at the end of p. 61: "Let's say five people are in in a bar".
Hi Prof Poldrack,
Thank you so much for making this awesome textbook open source. Do you have any plan to make this textbook available in file formats for e-readers, like epub? It would be great if we could read this book in a more small-screen-friendly way.
There is a typo in the last paragraph of p. 53. It says: "unless we are looking at the same number of of observations". There is one extra "of".
For example, in chapter 3, instead of the formulas I can see the source code for those formulas (e.g., (\bar{A})). I used Lithium epub reader on Android.
For the first plot in the Introduction, perhaps add geom_point() to help clarify the sentence, "This plot is based on ten numbers."
Is code visible to students whenever there is no "echo=FALSE" (e.g., the recoding in the third chunk)? If so, I think it would be better to have everything using tidyverse and to make sure style is consistent throughout the book. I'm happy to work on this if that makes sense to you.
When defining nominal scale, could be helpful to loop back to your qualitative data coding example so they understand you are talking about the same thing.
I wasn't totally clear on this "A nominal variable can only be compared for equality; that is, do two observations on that variable have the same value?" -- perhaps add an example with the fruit or the political parties?
I found this somewhat confusing, "I could create a highly reliable measurement by simply giving the same answer each time regardless of the data." I wasn't sure what "regardless of the data" meant.
In the text you write that
In this case, let’s say that we know that the specifity of the test is 0.9, such that the likelihood of a positive result when there is no explosive is 0.1.
The formula of marginal_likelihood is:
marginal_likelihood <- dbinom(nPositives,nTests,0.99)*prior + dbinom(nPositives,nTests,.1)*(1-prior)
Maybe I missed something, but I think that the correct formula should be:
marginal_likelihood <- dbinom(nPositives,nTests,0.9)*prior + dbinom(nPositives,nTests,.1)*(1-prior)
Or with respect to previous paragraph where you say:
Let’s say that we know that the sensitivity of the test is 0.99 – that is, when a device is present, it will detect it 99% of the time.
Then the marginal_likelihood formula should be:
marginal_likelihood <- dbinom(nPositives,nTests,0.99)*prior + dbinom(nPositives,nTests,.01)*(1-prior)
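To make the roles of the two numbers explicit, here is a minimal sketch that treats sensitivity and specificity as separate parameters, as the two quoted sentences describe them. The function signature, the argument names, and the example values for the number of tests, number of positives, and prior are all hypothetical; they are not taken from the book.

```r
# Marginal likelihood of n_positives positive results in n_tests tests:
#   P(data | device present) * prior + P(data | device absent) * (1 - prior)
# A positive result occurs with probability `sensitivity` when a device is
# present and with probability `1 - specificity` when it is absent.
marginal_likelihood <- function(n_positives, n_tests, prior,
                                sensitivity, specificity) {
  dbinom(n_positives, n_tests, sensitivity) * prior +
    dbinom(n_positives, n_tests, 1 - specificity) * (1 - prior)
}

# Hypothetical inputs: 2 positives in 3 tests, prior of 0.5
ml <- marginal_likelihood(2, 3, prior = 0.5,
                          sensitivity = 0.99, specificity = 0.9)
```

Under this reading, a sensitivity of 0.99 and a specificity of 0.9 give the pair 0.99 and 0.1 in the two dbinom() calls; the pairs 0.9 and 0.1, or 0.99 and 0.01, would correspond to different assumed test characteristics.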
PS: Great book, thank you!
There is a capital letter where there should be a lowercase letter in the caption of Figure 6.3 (p. 80). It says "PAnel D shows a box plot".