
freq_cogsci's Introduction

Freq_CogSci

Linear mixed models in Linguistics and Psychology: A Comprehensive Introduction

freq_cogsci's People

Contributors

anthesevenants, audreyburki, vasishth


freq_cogsci's Issues

Compiling book fails: data/powerbeta1mean.Rda missing

Compiling the book fails because the file data/powerbeta1mean.Rda is missing, see here.

There is a line that creates this file on disk but it's commented out, see here.

It is not clear why the file is created and then immediately loaded; the detour via disk may not be necessary.

Same problem here.

ch 3 REML vs ML

There is absolutely nothing at the moment about REML vs. ML. A broader issue is that we need some explanation of how the parameters are estimated. This is explained for the simple linear model in ch 5, but not for the LMM, even though it is not so hard if one knows matrix algebra at an elementary level.
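
A minimal sketch of the kind of demonstration such a section could build on (using lme4's built-in sleepstudy data as a stand-in, not one of the book's datasets):

library(lme4)
## same model fit by REML and by ML; sleepstudy ships with lme4
m_reml <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = TRUE)
m_ml   <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
## the fixed-effect estimates are essentially the same, but the variance
## components differ: REML adjusts for the degrees of freedom used up by
## the fixed effects, ML does not
fixef(m_reml); fixef(m_ml)
VarCorr(m_reml)
VarCorr(m_ml)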

Add measurement error simulation code into book

Source: https://statmodeling.stat.columbia.edu/2024/04/14/simulation-to-understand-measurement-error-in-regression/#comment-2359176

library(tidyverse)
set.seed(123)
n <- 1000
a <- 0.2
b <- 0.3
sigma <- 0.5
## NOTE: the measurement-error SDs and the line creating `fake` were garbled
## in the comment; the values and the distribution of x below are assumptions
## reconstructed from the rest of the code.
sigma_x <- 0.5
sigma_y <- 0.5

fake <- tibble(x = runif(n, 0, 10),
               y = a + b * x + rnorm(n, 0, sigma)) %>%
  mutate(y_star = rnorm(n, y, sigma_y),
         x_star = rnorm(n, x, sigma_x))

bind_rows(
  tibble(x = fake$x, y = fake$y, name = "No measurement error"),
  tibble(x = fake$x, y = fake$y_star, name = "Measurement error on y"),
  tibble(x = fake$x_star, y = fake$y, name = "Measurement error on x"),
  tibble(x = fake$x_star, y = fake$y_star, name = "Measurement error on x and y")
) %>%
  mutate(name = fct_inorder(name)) %>%
  ggplot(aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", fullrange = TRUE) +
  facet_wrap(~name)

Nested contrasts chapter

Show a quick example illustrating this point:

"Note that in cases such as these, where $A_{B1}$ vs. $A_{B2}$ are nested within levels of $B$, it is necessary to include the effect of $B$ (part of speech) in the model, even if one is only interested in the effect of $A$ (word frequency) within levels of $B$ (part of speech). Leaving out factor $B$ in this case can lead to biases in parameter estimation in the case the data are not fully balanced."

ch 3 explain log normal

"The exponentiated values are medians, not means. We use the median here because the mean in the log-transformed data depends on the standard deviation."

This needs to be explained in detail in a box.
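
A small numerical illustration that such a box could build on (the log-scale parameter values below are arbitrary):

set.seed(42)
mu <- 6; sigma <- 0.5   # arbitrary values on the log (latent) scale
y <- rlnorm(1e6, mu, sigma)
exp(mu)                 # close to median(y): the back-transformed mean of log(y)
exp(mu + sigma^2 / 2)   # close to mean(y): depends on sigma as well
c(median = median(y), mean = mean(y))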

BLUEs and BLUPs

  • p. 126: Possibly add some content here: discuss the difference between BLUEs and BLUPs. Which estimate is "more correct"? What is the reason we want BLUPs? I.e., regression to the mean.
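
One possible minimal demonstration of the shrinkage / regression-to-the-mean point (again using lme4's built-in sleepstudy data as a stand-in for the book's example):

library(lme4)
m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## BLUPs: by-subject slope estimates, shrunk towards the fixed-effect slope
blup_slopes <- coef(m)$Subject$Days
## no-pooling comparison: a separate regression for each subject
nopool_slopes <- sapply(split(sleepstudy, sleepstudy$Subject),
                        function(d) coef(lm(Reaction ~ Days, d))["Days"])
## the BLUPs are pulled towards the population mean slope
round(cbind(blup = blup_slopes, nopool = nopool_slopes), 1)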

Improve figure

Figure 3.2: again, remove the data points; they don't contribute anything, do they? It's hard to see the different lines; this may be better once the points are removed. Also: is it possible to give the slopes as numbers? That might be easier to judge.


Daniel's objection to overfitting in the no-pooling model

  • p. 108, Figure: what do the points represent? The raw data? There seems to be serious overfitting.

SV: Those are the data points from the RC experiment. Sure, there is overfitting, but that's what the repeated measures regression model would require us to do. What is your objection here?

Simulation chapter: missing data

Show through simulation that the LMM's Type I error properties are hardly affected by missing data. This is a consequence of shrinkage.
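
A rough sketch of the kind of simulation this could be based on (the design, sample sizes, and all variance components below are assumptions):

library(lme4)
set.seed(1)
nsim <- 200; nsubj <- 30; nitem <- 16
sim_once <- function(prop_missing) {
  d <- expand.grid(subj = factor(1:nsubj), item = factor(1:nitem))
  d$x <- ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)  # between-item factor
  ## null effect of x; by-subject and by-item intercepts plus residual noise
  d$y <- 400 + rnorm(nsubj, 0, 30)[d$subj] + rnorm(nitem, 0, 20)[d$item] +
    rnorm(nrow(d), 0, 50)
  d <- d[runif(nrow(d)) > prop_missing, ]   # delete data completely at random
  m <- lmer(y ~ x + (1 | subj) + (1 | item), d)
  abs(coef(summary(m))["x", "t value"]) > 2 # approximate alpha = 0.05 criterion
}
mean(replicate(nsim, sim_once(0.3)))        # should remain close to 0.05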

F1 formant data not appropriate?

p. 72: why are female and male data points paired? I don't understand this. Isn't gender fixed for each person, and aren't the data from different people? Is this averaged across male versus female subjects? And is it paired because these are responses to the same vowel in the same language?
Yes, the last point you mention.
I.e., in an item-based analysis, gender is a dependent variable? That is not a very intuitive concept for psychologists, who may not even know about item-based statistics; many psych people do not need these. This needs to be explained, or an example with subject-based statistics should be used.
I'm not seeing the problem, but maybe we can talk about it later and change the example. I opened an issue.

[Typo?] Unnecessary 'was' in an object relative clause

In the following passage, which explains object relative clauses, the first *was* following the relative clause marker *who* is unnecessary.

Subject relative clauses are sentences like *The man who was standing near the doorway laughed*. Here, the phrase (called a relative clause) *who was standing near the doorway* modifies the noun phrase *man*; it is called a subject relative because the noun phrase *man* is the subject of the relative clause. By contrast, object relative clauses are sentences like *The man who was the woman was talking to near the doorway laughed*; here, the *man* is the grammatical object of the relative clause *who was the woman was talking to near the doorway*.

🆖 The man who (*was) the woman was talking to near the doorway laughed
🆗 The man who the woman was talking to near the doorway laughed

Fig caption missing

  • p. 107: add Figure number + caption (also missing for some other figures, e.g., p. 108)

Compiling book fails: missing image file

Error message:

label: lk13E1 (with options) 
List of 4
 $ fig.cap  : chr "(ref:lk13E1)"
 $ out.width: chr "99%"
 $ echo     : logi FALSE
 $ fig.align: chr "center"

Quitting from lines 2948-2949 (Freq_CogSci.Rmd) 
Error in knitr::include_graphics("figures/lk13E1.png", dpi = 1000) : 
  Cannot find the file(s): "figures/lk13E1.png"
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> <Anonymous>

The code block in question is:

knitr::include_graphics("figures/lk13E1.png", dpi = 1000)

However, the image figures/lk13E1.png does not exist and is not generated either (at least not under that name, as far as I can see).

Compiling book fails: 'x' must be an array of at least two dimensions

This error is generated here. Full error message:

label: unnamed-chunk-385
Quitting from lines 9514-9521 (Freq_CogSci.Rmd) 
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 
  'x' must be an array of at least two dimensions
Calls: <Anonymous> ... eval -> eval -> table -> rowSums -> rowSums -> <Anonymous>

ch 2 Audrey suggested changes

  • Audrey said:

Is there already a paragraph or two in the book about hypothesis testing, and about the fact that these statistical tests only make sense with a priori hypotheses? I assume there will be a discussion of inference vs. exploratory analyses later on, but it would do no harm to mention here already that these tests test one a priori defined hypothesis.

Also, I would add somewhere in this chapter an explanation of why this approach uses null hypothesis testing (i.e., the only hypothesis for which we have some information).

PS I don't understand the last sentence from Audrey.

  • Add a box on a one-sided t-test (a minimal sketch follows this list).
  • Relocate the funnel plot to the beginning of the discussion of Type M and S errors.
  • Explain the Levy and Keller design (2.7.1) in more detail; the word adjunct was not clear to Audrey.
  • When showing the formant data (Apache, etc.), show more than one vowel.
  • Explain what degrees of freedom are.
  • Add a section on why aggregation is bad.
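
For the one-sided t-test box, a minimal sketch (the sample and effect size below are made up for illustration):

set.seed(3)
x <- rnorm(20, mean = 0.3, sd = 1)
## one-sided test: is the population mean greater than zero?
t.test(x, mu = 0, alternative = "greater")
## default two-sided test, for comparison
t.test(x, mu = 0)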

Simulation chapter needs a section on the dangers of aggregation in LMMs

We need to show through simulation that aggregating data by items will hide a lot of potentially important variation, leading to possible Type I error inflation (of course, this depends on the particular situation being simulated; maybe also show a case where this does not happen).
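
A rough sketch of such a simulation, under one reading of the issue (each subject's data are averaged over items before a paired t-test; all parameter values are assumptions):

library(lme4)
set.seed(2)
nsim <- 200; nsubj <- 30; nitem <- 16
one_run <- function() {
  d <- expand.grid(subj = factor(1:nsubj), item = factor(1:nitem))
  d$x <- ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)   # between-item factor
  ## null fixed effect of x, but substantial by-item intercept variation
  d$y <- 400 + rnorm(nsubj, 0, 30)[d$subj] + rnorm(nitem, 0, 40)[d$item] +
    rnorm(nrow(d), 0, 50)
  m <- lmer(y ~ x + (1 | subj) + (1 | item), d)
  lmm_rej <- abs(coef(summary(m))["x", "t value"]) > 2
  ## aggregate each subject's data over items: the item variation is hidden
  cellmeans <- with(d, tapply(y, list(subj, x), mean))
  t_rej <- t.test(cellmeans[, 1], cellmeans[, 2], paired = TRUE)$p.value < 0.05
  c(lmm = lmm_rej, aggregated_t = t_rej)
}
## the aggregated analysis shows inflated Type I error; the LMM stays near 0.05
rowMeans(replicate(nsim, one_run()))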

Add an explanation of terms

Maybe write a brief intro to central math concepts at the beginning, such as: What is an expectation? What is i.i.d.? The symbol ∀, matrix inversion, … (or provide a footnote with an explanation when a concept is first encountered).
