rvlenth / emmeans Goto Github PK
View Code? Open in Web Editor NEWEstimated marginal means
Home Page: https://rvlenth.github.io/emmeans/
Estimated marginal means
Home Page: https://rvlenth.github.io/emmeans/
I've been trying to create my own method, but have found that arguments cannot be passed when the method is used in an interaction. Here is some sample mock code to demonstrate the issue:
# some fake data
df <- data.frame(Y = rnorm(100),
F1 = factor(rep(letters[1:4],each = 25)),
F2 = factor(rep(letters[5:8],25)))
fit <- aov(Y ~ F1*F2, data = df)
rg <- emmeans(fit, ~F1*F2)
# my method - won't run unless '.second' is set to FALSE
my_method.emmc <- function(x, .second = TRUE){
vex <- rep(0,length(x))
if(.second)
stop("")
vex[1:2] <- c(-1,1)
as.data.frame(vex)
}
# this works
contrast(rg, "my_method",.second = FALSE)
# this does not
contrast(rg, interaction = c("my_method","pairwise"),.second = FALSE)
I think this can be fixed by changing contrast.emmGrid
, when an interaction in called to include ...
. So from this:
object = contrast.emmGrid(object, interaction[i], by = vars[-i], name = nm)
to this:
object = contrast.emmGrid(object, interaction[i], by = vars[-i], name = nm,...)
Hi
while working with emmeans on a multinomial model I've noticed that you get an error when the dependent variable has more than 3 levels and the model features a covariate (see below for reproducible code). I did a bit of exploring and found this:
There is a weird behavior in emmeans when it calls nnet multinomial functions required to compute the standard errors of the means. When the standard errors are estimated, four matrices are needed, which are in the ref_grid object called by emmeans. They are :
grid<-ref_grid(model)
grid@grid
grid@V
grid@bhat
coef(model)
when the number of categories in the dep.variable are more than 3, and there is a covariate in the model, for some reasons the matrices are non-conformable and an error is issued. It does not happen for K=3.
Here is the code to reproduce the error
K=4
gre<-rnorm(1200,0,1)
admit<-factor(rep(c(1:2),600))
rank<-factor(rep(c(1:K),1200/K))
formula <- as.formula(rank ~ admit +gre)
model <- nnet::multinom(formula)
model$call$formula <- formula
emmeans(model, formula(~ rank|admit),mode = "prob")
The workaround I found is to estimate the reg_grid like this:
grid<-ref_grid(model,cov.reduce = function(x) c(mean(x)+.1*100,mean(x)-.1*100))
and then it works.
I hope this is useful
mc
I am fitting a bayesian mixed model as follows:
df <- data.frame(Participant = as.factor(rep(1:25, each = 4)),
RT = rnorm(100, 30, .2),
Stress = runif(100, 3, 5))
fit <- rstanarm::stan_lmer(RT ~ Stress + (1|Participant), data=df)
I'd like to extract the reference grid (to further predict the outcome):
ref_grid <- emmeans::ref_grid(fit, at = list(
Stress = seq(min(df$Stress),
max(df$Stress),
length.out = 10)))
However, when I check the reference grid (ref_grid@grid), no sign of the random terms. Is it normal?
Sorry if the question is stupid, I didn't find any elements of answer in the vignettes or documentation.
Thanks a lot!
I think it would be very useful if there was an option to have emmeans
return the RR (risk ratio) for binomial models.
RR = Pr(success | a) / Pr(success | b)
or:
RR = (1+exp(-ORb)) / (1+exp(-ORa))
For predicted values (no contrast) this should return the same value as type = 'response'
(prob
), and for contrasts the RR as stated above.
It should look something like this:
g <- sample(c(0,1),size = 1000, replace = T)
outcome <- round(1/(1+exp(-g + rnorm(1000))))
fit <- glm(outcome ~ factor(g), family = binomial())
emmeans(fit,~g, type = 'rr')
> g prob SE df asymp.LCL asymp.UCL
> 0 0.5049116 0.02216104 Inf 0.4615488 0.5482006
> 1 0.8289206 0.01699386 Inf 0.7930034 0.8597081
>
> Confidence level used: 0.95
> Intervals are back-transformed from the logit scale
contrast(emmeans(fit,~g, type = 'rr'),'pairwise')
> contrast estimate SE df z.ratio p.value
> 0 - 1 0.6091194 0.1490623 Inf -10.454 <.0001
>
> Results are given on the risk ratio (not the response) scale.
Thanks (again and again) for this amazing package!
I have difficulty to understand the emmeans
results for coxph
model.
Is the default setting of emmeans
calculate the linear combination of log hazards?
Would it possible to have emmeans
to report the estimated survival curve?
For example, in the code below, would it be possible to create the estimated survival probability based on a user-defined time (e.g. time = 1)?
library(survival)
library(emmeans)
# Create the simplest test data set
test1 <- list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1))
# Fit a stratified model
fit <- coxph(Surv(time, status) ~ x + factor(sex), test1)
emmeans(fit, specs = "sex")
sex emmean SE df asymp.LCL asymp.UCL
0 -0.4001928 0.6034757 NA -1.582983 0.7825979
1 0.5335904 0.8046343 NA -1.043464 2.1106447
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Hi,
in emmeans version 1.1.2 if a model has been fit using a variable that is coerced to a numeric variable with as.numeric(var) in the formula, specifying the variable in emtrends causes an error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
‘range’ not meaningful for factors
Example giving error:
data(iris)
model <- lm(Sepal.Length ~ as.numeric(Species), data = iris)
emtrends(model, ~ as.numeric(Species), var = "as.numeric(Species)")
However, if the variable is coerced to numeric before using it in the formula, then it works as expected:
data(iris)
iris$Species <- as.numeric(iris$Species)
model <- lm(Sepal.Length ~ Species, data = iris)
emtrends(model, ~ Species, var = "Species")
Does anyone know of a work around as I don't want to re-run the model?
Thank you for this nice package!
Hi Russ,
I noticed the following problems disallowing scale()
in the var
part of emtrends
.
library("emmeans")
data("Prestige", package = "carData")
mp_3 <- lm(income ~ scale(education)*scale(prestige), Prestige)
emtrends(mp_3, specs = "education", var = "prestige",
at = list(education = c(8.01, 10.74, 13.47)))
# education prestige.trend SE df lower.CL upper.CL
# 8.01 141.6524 40.24965 98 61.7783 221.5266
# 10.74 186.9203 32.50945 98 122.4064 251.4343
# 13.47 232.1882 34.95312 98 162.8249 301.5515
#
# Confidence level used: 0.95
emtrends(mp_3, specs = "education", var = "scale(prestige)",
at = list(education = c(8.01, 10.74, 13.47)))
# Error in if (any(diffl == 0)) warning("Some differentials are zero") :
# missing value where TRUE/FALSE needed
Best,
Henrik
Hi Russ,
I am working on extending my new afex_plot()
function with a default
method. In this method I rely on emmeans::recover_data()
to obtain the raw data used for fitting the model. This works in many cases, but not in all. For example:
require("ordinal")
wine.clm <- clm(rating ~ temp + contact, scale = ~ judge,
data = wine, link = "probit")
emmeans::recover_data(wine.clm, list(pairwise ~ temp, pairwise ~ contact))
# Error in UseMethod("recover_data") :
# no applicable method for 'recover_data' applied to an object of class "clm"
However
emmeans:::recover_data.clm(wine.clm)
# temp contact judge
# 1 cold no 1
# 2 cold no 1
# 3 cold yes 1
# 4 cold yes 1
# [...]
It would be great if you could export all recover_data
methods so they could be used in other packages, such as afex
. If it helps, I could submit a pull request.
Thanks,
Henrik
arguably adjust='none' should squelch and p.value adjustments, but does not, as evidenced:
stopifnot(! identical(coefficients(contrast(warp.emm, "eff", by = NULL)),coefficients(contrast(warp.emm, "eff", by = NULL))))
Error: !identical(coefficients(contrast(warp.emm, "eff", by = NULL)), .... is not TRUE
THANKS FOR EMMEANS!
The interaction plot using emmip()
seems to accept some of the customization from ggplot2
but not others. For example, theme_bw()
works, and the line color works, but the linetype and plotting symbol don't have an effect. I tried using the lty
, col
, and pch
arguments as well, and the lattice
engine, to no avail. Am I missing something or is this a bug?
I recognize this is a tiny feature. For the benefit of my students, I try to be as "compact" as possible, and your package is a phenomenal resource. But for those who might not have color printers, I tend to rely on linetype and plotting symbol more than colors.
Example:
df1 <- data.frame(sex = rep(c("Female", "Male"), each=9),
time=c("breakfeast", "Lunch", "Dinner"),
bill=c(10, 30, 15, 13, 40, 17) )
lm01 <- lm( bill ~ sex*time , data=df1 )
emmip( lm01 , sex ~ time ) + theme_bw() +
scale_color_manual( values=c("tomato", "skyblue") ) +
scale_shape_manual( values=c(21,24) ) +
scale_linetype_manual( values=c("solid", "dashed") )
I have version 1.1.3 (updated yesterday).
Hello,
I mainly use your functions of emmeans and emtrends to get summary statistics from my models. Your package is awesome!
I've been able to retrieve the means and standard errors of emmeans and plot them in bar graphs. Now, I am working with data where one of my predictors is a factor and the other is numeric. So I would be plotting trends.
Is there a similar way to plot trends with the emmGrid object that I get from emtrends?
Thanks!
Dear prof.Lenth,
When I run the following Shiny app on my Windows computer it uses t-test in the emmeans paired comparisons. The calculated P-values are in agreement with the results from the Mixed Model with Paired comparison in SAS/JMP Pro 13 EN.
However when I publish this Shiny app on shinyapps.io the paired comparisons use the z-test. See https://gmwebapplications.shinyapps.io/InvestigatePvalue/
Would it be possible to use the t-test on the shinyapps.io ?
Thanks for your attention, Gerben
`# Investigate difference in calculated P-values between Windows and shinyapps.io
suppressWarnings(suppressMessages(library(shiny)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(lme4)))
suppressWarnings(suppressMessages(library(emmeans)))
ui <- fluidPage(
tags$hr(),
sidebarLayout(
sidebarPanel(),
mainPanel(
verbatimTextOutput("lme4Results"),
tags$hr(),
verbatimTextOutput("emmeansResults")
)
)
)
server <- function(input, output) {
FI <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,1,0,0,0,0,0,0,1,2,0,0,0,0,-1,0,1,3,2,-1,-2,1,1,0,0,1,0,0,-1,3,1,0,0,0,0,1,1,0,1)
Assessor <- c("am","am","an","an","ca","ca","ch","ch","gr","gr","kl","kl","na","na","ro","ro","sa","sa","st","st","am","am","an","an","ca","ca","ch","ch","gr","gr","kl","kl","na","na","ro","ro","sa","sa","st","st","am","am","an","an","ca","ca","ch","ch","gr","gr","kl","kl","na","na","ro","ro","sa","sa","st","st")
Product <- rep(c("1=Ref","2=Alt","3=Alt2"), each = 20)
dt <- tibble(Product,Assessor,FI)
MixedModel <-
lmer(FI ~ Product + (1| Assessor) + (1| Product:Assessor), data = dt)
output$lme4Results <- renderPrint(summary(MixedModel))
MixedModel.emm <- emmeans(MixedModel,"Product")
ConfIntTbl <- as.data.frame(MixedModel.emm)
MixedModel.emm.Dunnett <- contrast(MixedModel.emm, "trt.vs.ctrl", ref = 1)
output$emmeansResults <- renderPrint(summary(MixedModel.emm.Dunnett))
}
shinyApp(ui, server)
`
Result in Windows:
`contrast estimate SE df t.ratio p.value
2=Alt - 1=Ref 0.6 0.3333333 18 1.8 0.1594
3=Alt2 - 1=Ref 0.3 0.3333333 18 0.9 0.5807
P value adjustment: dunnettx method for 2 tests `
Result in shinyapps.io:
`contrast estimate SE df z.ratio p.value
2=Alt - 1=Ref 0.6 0.3333333 Inf 1.8 0.1318
3=Alt2 - 1=Ref 0.3 0.3333333 Inf 0.9 0.5683
P value adjustment: dunnettx method for 2 tests `
Hi Russ,
I have run the same model (mod1) with both lsmeans and emmeans for pairwise comparisons. I noticed that the output from emmeans involved some inflated df values which were not found in the lsmeans output. The inflated dfs reduced the number of statistically significant group contrasts. The output from lsmeans and emmeans is listed below. Could you please let me know whether the inflated dfs are resulted from the incorrect models I built or the inaccurate calculations of emmeans? Thanks in advance!
mod1<-lmer(GJTscore
GJTpre+Time+Group+Time*Group+(1|Class/Subject),data=GJT, REML = F)Group|Time),adjust="bonferroni")
lsmeans(mod1,list(pairwise
$pairwise differences of contrast, Time | Time
Time = 2:
contrast estimate SE df t.ratio p.value
4 - 1 -3.7713780 1.211024 216.96 -3.114 0.0126
4 - 2 -3.0443478 1.183329 216.71 -2.573 0.0646
4 - 3 -2.6995979 1.262705 215.33 -2.138 0.2019
1 - 2 0.7270302 1.111591 217.16 0.654 1.0000
1 - 3 1.0717801 1.179184 217.87 0.909 1.0000
2 - 3 0.3447499 1.153255 217.89 0.299 1.0000
Time = 4:
contrast estimate SE df t.ratio p.value
4 - 1 -5.8488587 1.204532 212.95 -4.856 <.0001
4 - 2 -3.7107185 1.184252 217.30 -3.133 0.0118
4 - 3 -4.4727699 1.256006 211.36 -3.561 0.0027
1 - 2 2.1381402 1.113608 218.52 1.920 0.3370
1 - 3 1.3760888 1.173556 214.28 1.173 1.0000
2 - 3 -0.7620513 1.154011 218.37 -0.660 1.0000
Time = 6:
contrast estimate SE df t.ratio p.value
4 - 1 -5.4159758 1.204532 212.95 -4.496 0.0001
4 - 2 -3.2493890 1.179468 214.27 -2.755 0.0382
4 - 3 -3.8862978 1.252647 209.45 -3.102 0.0131
1 - 2 2.1665868 1.108840 215.30 1.954 0.3120
1 - 3 1.5296780 1.170054 212.13 1.307 1.0000
2 - 3 -0.6369087 1.146254 213.39 -0.556 1.0000
Time = 8:
contrast estimate SE df t.ratio p.value
4 - 1 -5.2912010 1.204532 212.95 -4.393 0.0001
4 - 2 -2.4249866 1.177488 213.02 -2.059 0.2440
4 - 3 -3.9278571 1.273459 221.66 -3.084 0.0138
1 - 2 2.8662144 1.106607 213.79 2.590 0.0615
1 - 3 1.3633440 1.192269 226.20 1.143 1.0000
2 - 3 -1.5028704 1.166641 226.61 -1.288 1.0000
P value adjustment: bonferroni method for 6 tests
emmeans(mod1,list(pairwise~Group|Time),adjust="bonferroni")
$emmeans of Group | Time
$pairwise differences of Group | Time
Time = 2:
contrast estimate SE df t.ratio p.value
4 - 1 -3.7713780 1.420087 6.14 -2.656 0.2213
4 - 2 -3.0443478 1.395626 6.13 -2.181 0.4258
4 - 3 -2.6995979 1.467241 100.85 -1.840 0.4123
1 - 2 0.7270302 1.351757 2.79 0.538 1.0000
1 - 3 1.0717801 1.394396 5.57 0.769 1.0000
2 - 3 0.3447499 1.371838 5.58 0.251 1.0000
Time = 4:
contrast estimate SE df t.ratio p.value
4 - 1 -5.8488587 1.415019 6.01 -4.133 0.0367
4 - 2 -3.7107185 1.397154 6.14 -2.656 0.2215
4 - 3 -4.4727699 1.466036 95.44 -3.051 0.0177
1 - 2 2.1381402 1.353467 2.81 1.580 1.0000
1 - 3 1.3760888 1.392144 5.44 0.988 1.0000
2 - 3 -0.7620513 1.375174 5.56 -0.554 1.0000
Time = 6:
contrast estimate SE df t.ratio p.value
4 - 1 -5.4159758 1.415019 6.01 -3.827 0.0520
4 - 2 -3.2493890 1.392929 6.04 -2.333 0.3488
4 - 3 -3.8862978 1.461934 95.30 -2.658 0.0553
1 - 2 2.1665868 1.349424 2.77 1.606 1.0000
1 - 3 1.5296780 1.388054 5.38 1.102 1.0000
2 - 3 -0.6369087 1.367408 5.43 -0.466 1.0000
Time = 8:
contrast estimate SE df t.ratio p.value
4 - 1 -5.2912010 1.415019 6.01 -3.739 0.0577
4 - 2 -2.4249866 1.391218 6.00 -1.743 0.7917
4 - 3 -3.9278571 1.482142 100.09 -2.650 0.0561
1 - 2 2.8662144 1.347532 2.74 2.127 0.7904
1 - 3 1.3633440 1.408233 5.79 0.968 1.0000
2 - 3 -1.5028704 1.385909 5.81 -1.084 1.0000
P value adjustment: bonferroni method for 6 tests
Best regards,
Mengxia
Hello, I was trying out an example for my class and found some surprising results using cld()
library(emmeans)
fert_long <- structure(list(Finca = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
.Label = c("1", "2", "3", "4",
"5"), class = "factor"),
Fertilizante = c("Fertilizante.1", "Fertilizante.1",
"Fertilizante.1", "Fertilizante.1", "Fertilizante.1", "Fertilizante.2",
"Fertilizante.2", "Fertilizante.2", "Fertilizante.2", "Fertilizante.2",
"Fertilizante.3", "Fertilizante.3", "Fertilizante.3", "Fertilizante.3",
"Fertilizante.3"),
Produccion = c(1, 2, 7, 6, 12, 5, 8, 9, 13,
14, 8, 14, 16, 18, 17)), class = "data.frame",
.Names = c("Finca",
"Fertilizante", "Produccion"), row.names = c(NA, -15L))
fert_aov <- aov(Produccion ~ Fertilizante + Finca, data = fert_long)
fert_lsm <- emmeans(fert_aov, pairwise ~ Fertilizante)
cld(fert_lsm)
This yields in:
$emmeans
Fertilizante emmean SE df lower.CL upper.CL .group
Fertilizante.1 5.6 0.772442 8 3.818746 7.381254 1
Fertilizante.2 9.8 0.772442 8 8.018746 11.581254 2
Fertilizante.3 14.6 0.772442 8 12.818746 16.381254 3
Results are averaged over the levels of: Finca
Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05
$contrasts
contrast estimate SE df t.ratio p.value .group
Fertilizante.1 - Fertilizante.3 -9.0 1.092398 8 -8.239 0.0001 1
Fertilizante.2 - Fertilizante.3 -4.8 1.092398 8 -4.394 0.0058 2
Fertilizante.1 - Fertilizante.2 -4.2 1.092398 8 -3.845 0.0121 2
Results are averaged over the levels of: Finca
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05
emmeans results is correct, but I find the constrast section kind of misleading. I think, the contrast between Fertilizante.1 - Fertilizante.2 should result in .group 3.
Is this a bug or a feature?
Hi Russ,
It appears that at the moment the brms
support with emmeans
fails with models that include a call to poly
in the formula. For example:
library("emmeans")
library("brms")
warpbreaks$tension_num <- as.numeric(warpbreaks$tension)
warp.mcmc <- brm(breaks ~ wool * poly(tension_num, 2), data = warpbreaks,
family = gaussian())
emmeans(warp.mcmc, "wool")
# Error in poly(tension_num, 2) :
# 'degree' must be less than number of unique points
That probably is due to the specific way in which brms
handles those terms:
summary(warp.mcmc)
# [...]
# Population-Level Effects:
# Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# Intercept 31.00 2.11 26.84 35.03 4000 1.00
# woolB -5.78 3.03 -11.56 0.20 4000 1.00
# polytension_num21 -60.59 15.91 -92.30 -29.10 3102 1.00
# polytension_num22 36.43 16.14 4.70 67.77 3032 1.00
# woolB:polytension_num21 32.52 22.72 -12.25 77.85 3252 1.00
# woolB:polytension_num22 -54.70 22.56 -99.32 -9.86 2997 1.00
# [...]
Compare with:
warp.lm <- lm(breaks ~ wool * poly(tension_num, 2), data = warpbreaks)
emmeans(warp.lm, "wool")
# NOTE: Results may be misleading due to involvement in interactions
# wool emmean SE df lower.CL upper.CL
# A 24.00000 3.646761 48 16.66769 31.33231
# B 28.77778 3.646761 48 21.44547 36.11008
#
# Confidence level used: 0.95
Cheers,
Henrik
Hi,
On line 28 of emmeans/R/gam-support.R you open up the possibility for future support for mgcv::gam. I am wondering if you would be able to add this support?
The reason for me using the mgcv package rather than the gam package is the ability of the former to handle factor covariates in the s(...) smoother function using a keyword argument named "by=".
As an example:
# load data and libraries
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-23. For overview type 'help("mgcv-package")'.
> library(emmeans)
# load mtcars
> data(mtcars)
# create a factor out of the cyl variable
> mtcars$cylfact <- factor(mtcars$cyl)
# fit a mgcv::gam model and print the summary
> mtcars.gam1 <- gam(mpg ~ 0 + cylfact + s(wt, by = cylfact), data = mtcars)
> summary(mtcars.gam1)
# attempt building an emmeans::ref_grid
> ref_grid(mtcars.gam1)
Error in emm_basis.gam_mgcv(object, trms, xlev, grid, ...) :
Can't handle mgcv::gam objects
I also run into trouble when I use gam::gam
# reset R session, reload mtcars and make a factor out of the cyl variable
> library(gam)
Loading required package: splines
Loading required package: foreach
Loaded gam 1.14-4
> library(emmeans)
> data(mtcars)
> mtcars$cylfact <- factor(mtcars$cyl)
# fit a gam::gam model and print the summary
> mtcars.gam2 <- gam(mpg ~ 0 + cylfact * s(wt), data = mtcars)
> summary(mtcars.gam2)
# attempt building an emmeans::ref_grid
> ref_grid(mtcars.gam2)
Error in apply(smooth.frame[[i]], 1, paste, collapse = ":") :
dim(X) must have a positive length
Since mgcv is packaged in the R standard library, my preference is to be able to use it together with emmeans.
Thank you for considering.
Hi,
I believe there may be an error in the way that confidence intervals are calculated using Scheffe's method of multiple comparisons. As an example, if I run the code below
data(fiber)
modFiber <- lm(strength ~ machine, data=fiber)
emFiber <- emmeans(modFiber, "machine")
contrast(emFiber, "pairwise", adjust="scheffe", options=list(infer=TRUE))
I get
Warning message:
In sqrt(n.contr + scheffe.adj * qf(level, n.contr + scheffe.adj, :
NaNs produced
It appears there are two issues in the calculation of the critical coefficient:
n.contr + scheffe.adj
.I have tried this on other datasets and get confidence intervals, but the numbers are incorrect. In particular, the intervals are shorter than Tukey's method for pairwise comparisons.
Thanks!
Chris
Note to self on future enhancement of Bayesian support:
bhat
and V
slots when there is a nontrivial post.beta
. we no longerregrid()
. Could be a tough row to hoe.emmean
, though. Re-label to med.emmean
? Change the default to the mean?type
argument to as,mcmc
, so that as.mcmc(emm, type = "response")
wouldConsider this example adapted from one of the vignettes:
cows <- data.frame (
route = factor(rep(c("injection", "oral"), c(5, 9))),
drug = factor(rep(c("Bovineumab", "Charloisazepam",
"Angustatin", "Herefordmycin", "Mollycoddle"), c(3,2, 4,2,3))),
resp = c(34, 35, 34, 44, 43, 36, 33, 36, 32, 26, 25, 25, 24, 24),
x1 = .2*sin(1:14),
x2 = (.03*(19:6)^2)
)
cows.lm <- lm(resp ~ route + drug + x1 + x2, data = cows)
emmeans(cows.lm, "route")
## NOTE: A nesting structure was detected in the fitted model:
## drug %in% route
## If this is incorrect, re-run or update with `nesting` specified
## route emmean SE df lower.CL upper.CL
## injection 36.49093 1.832164 7 32.15855 40.82331
## oral 29.61657 1.265626 7 26.62384 32.60930
##
## Results are averaged over the levels of: drug, x1, x2
## Confidence level used: 0.95
This is a nested model. The results are correct, but the annotation says that we averaged over x1
and x2
; but these are covariates, and each is set to only one value in the grid (its mean).
Anyway, we need to fix this so the annotations to show what we actually averaged over.
(Not so much an issue as it is a request)
Are there any plans to add effect sized to contrasts (either partial eta square, or r-square-alerting)?
This would be very useful, as some journals require an effects size to accompany each p-value (and doing this "by hand" is quite tedious).
Is emmeans recognizing link functions with glmmTMB? I am running emmeans on a glmmTMB object specified with a poisson model, but summary(emmGrid object) does not report the link. Including type = "response" returns the same output as the default command.
cohab1 <- glmmTMB(Cohabit ~ Sex*OthSex + offset(log(Fixes)) + (1 | ID), family = poisson, data = AllShelts)
emm.TMB<- emmeans(cohab1, ~OthSex | Sex)
summary(emm.TMB)
Sex = F:
OthSex emmean SE df asymp.LCL asymp.UCL
F 0.3131363 0.1931931 NA -0.06551531 0.6917879
M 0.4734789 0.1846699 NA 0.11153255 0.8354253
Sex = M:
OthSex emmean SE df asymp.LCL asymp.UCL
F 2.1803186 0.2194255 NA 1.75025259 2.6103846
M -0.5052588 0.5989604 NA -1.67919960 0.6686820
Confidence level used: 0.95
summary(emm.TMB, type = "response")
Sex = F:
OthSex emmean SE df asymp.LCL asymp.UCL
F 0.3131363 0.1931931 NA -0.06551531 0.6917879
M 0.4734789 0.1846699 NA 0.11153255 0.8354253
Sex = M:
OthSex emmean SE df asymp.LCL asymp.UCL
F 2.1803186 0.2194255 NA 1.75025259 2.6103846
M -0.5052588 0.5989604 NA -1.67919960 0.6686820
Confidence level used: 0.95
Hi,
I have an issue with emtrends. I have a model in which emmeans detects incorrectly nesting. I updated the ref_grid with nesting = NULL and that fixed the issue with emmeans. I tried to use the updated ref_grid with emtrends and I get the following error: Error in data[[var]] : subscript out of bounds
As I mentioned the same updated ref_grid works just fine with emmeans, thus I presume that it would be some problem in emtrends.
On a different topic, is there a way to stop emmeans/emtrends nesting recognition?
Thank you.
Paco
Hello,
The following issue was discovered by one of my undergrads working on a homework assignment. In unbalanced factorial designs (in which the weights= option should typically result in different estimates), the weights= argument in emmeans is ignored if one of the factors is a string (as opposed to a factor). The result is surprising because (obviously) both the lm statement (and also the emmeans statement) run this model without issues. In fact, as expected the lm statement using either a variable that is a factor or a character yield identical results.
Here is a minimally reproducible example:
t1 <- c(0,0,0,0,0,0,1,1,1,1,1,1)
t2 <- c("C","C","C","T","T","C","T","T","T","C","C","C")
y <- c(1,2,3,4,5,6,7,8,9,10,1,2)
df <- data.frame(t1,t2,y)
df$t1 <- as.factor(df$t1)
df$t2_chr <- as.character(df$t2)
str(df)
lm1 <- lm(y~t1*t2,data=df)
lm2 <- lm(y~t1*t2_chr,data=df)
all.equal(as.numeric(lm1$coefficients),as.numeric(lm2$coefficients))
library(emmeans)
e1 <- emmeans(lm1,"t1",weights="proportional")
e2 <- emmeans(lm1,"t1",weights="equal")
all.equal(e1,e2)
b1 <- emmeans(lm2,"t1",weights="proportional")
b2 <- emmeans(lm2,"t1",weights="equal")
all.equal(b1,b2)
I got this error while analysing data with 2 groups, for three groups work fine
NOTE: 'cld()' groupings are determined separately for each list member.
Error in [[<-.data.frame
(*tmp*
, nm, value = c(-1, 1)) :
replacement has 2 rows, data has 1
summary_emm
objects also have class data.frame. $
and [[
indexing work as expected, and result in a single column. It's not clear to me, however, what [
indexing does, since it usually just returns an error. I also can't find documentation of a [.summary_emm
method. Is this intentional?
library(emmeans)
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
warp.emmGrid <- emmeans(warp.lm, ~ tension | wool)
summary(warp.emmGrid)
# wool = A:
# tension emmean SE df lower.CL upper.CL
# L 44.55556 3.646761 48 37.22325 51.88786
# M 24.00000 3.646761 48 16.66769 31.33231
# H 24.55556 3.646761 48 17.22325 31.88786
#
# wool = B:
# tension emmean SE df lower.CL upper.CL
# L 28.22222 3.646761 48 20.88992 35.55453
# M 28.77778 3.646761 48 21.44547 36.11008
# H 18.77778 3.646761 48 11.44547 26.11008
# these both work
summary(warp.emmGrid)$emmean
summary(warp.emmGrid)[['emmean']]
# errors:
summary(warp.emmGrid)['emmean']
# Error in .subset2(x, i, exact = exact) :
# attempt to select less than one element in get1index
Would it be possible to have a "reverse" argument in trt.vs.ctrl.emmc?
Right now the control groups is subtracted from the treatments, and I think this is likely what folks most often want when comparing everything to a control group. However, I was recently in a situation where the control group was much larger than the treatment and the comparisons (which in this case happened to be odds ratios) felt more interpretable as control minus treatment.
Of course, since you've made it so easy to extract results into data.frames, I could easily invert the results myself and so "manually" switch the order of the comparisons.
User-reported issue:
In a model with 360 abundance measurement performed at six sampling dates, i want to know the pairwise tukey contrasts for the six dates.
mod <- glm(Abundance ~ Date, family="gaussian")
FWIW, Abundance is normally distributed in its entirety, the subpopulations per date are not
means <- emmeans(mod, ~Date)
pairs (means, simple="Date")
works well, as does
glht(mod, mcp(Date="Tukey")).
However, Abund
is spatially auto-correlated. I thus switch to lme
with a dummy random effect and update to the correlation structure yielding the best AIC.
best.mod <- lme(fixed=Abundance ~ Date, data = data,
random = ~1| dummy, method="ML",
correlation=corRat(1, form = ~ x + y))
Now, glht() still works:
> summary(glht(mods[[best.mod]], mcp(Date = "Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lme.formula(fixed = Abundance ~ Date, data = D, random = ~1 |
Date, correlation = getFunction(m)(1, form = ~x + y), method = "ML")
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
May - April == 0 -0.30829 0.15815 -1.949 0.3718
June - April == 0 0.41849 0.15749 2.657 0.0841 .
Aug. - April == 0 -0.25109 0.15741 -1.595 0.6018
Oct. - April == 0 -0.28579 0.15747 -1.815 0.4561
Nov. - April == 0 -0.09987 0.15709 -0.636 0.9883
June - May == 0 0.72677 0.15759 4.612 <0.001 ***
Aug. - May == 0 0.05719 0.15751 0.363 0.9992
Oct. - May == 0 0.02250 0.15758 0.143 1.0000
Nov. - May == 0 0.20842 0.15720 1.326 0.7707
Aug. - June == 0 -0.66958 0.15685 -4.269 <0.001 ***
Oct. - June == 0 -0.70428 0.15691 -4.488 <0.001 ***
Nov. - June == 0 -0.51835 0.15653 -3.311 0.0119 *
Oct. - Aug. == 0 -0.03470 0.15683 -0.221 0.9999
Nov. - Aug. == 0 0.15122 0.15645 0.967 0.9285
Nov. - Oct. == 0 0.18592 0.15652 1.188 0.8429
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
However, emmeans does not produce p-values with best.mod, possible because of zero degree of freedoms
> means <- emmeans(mods[[best.mod]], ~ Date)
> pairs(means, simple="Date")
contrast estimate SE df t.ratio p.value
April - May 0.30828775 0.1594914 0 1.933 NaN
April - June -0.41848582 0.1588249 0 -2.635 NaN
April - Aug. 0.25109352 0.1587444 0 1.582 NaN
April - Oct. 0.28579102 0.1588097 0 1.800 NaN
April - Nov. 0.09986911 0.1584278 0 0.630 NaN
May - June -0.72677357 0.1589291 0 -4.573 NaN
May - Aug. -0.05719423 0.1588486 0 -0.360 NaN
May - Oct. -0.02249673 0.1589139 0 -0.142 NaN
May - Nov. -0.20841864 0.1585322 0 -1.315 NaN
June - Aug. 0.66957934 0.1581794 0 4.233 NaN
June - Oct. 0.70427684 0.1582449 0 4.451 NaN
June - Nov. 0.51835493 0.1578616 0 3.284 NaN
Aug. - Oct. 0.03469750 0.1581641 0 0.219 NaN
Aug. - Nov. -0.15122440 0.1577806 0 -0.958 NaN
Oct. - Nov. -0.18592191 0.1578463 0 -1.178 NaN
P value adjustment: tukey method for comparing a family of 6 estimates
Warning messages:
1: In pt(abs(t), DF, lower.tail = FALSE) : NaNs wurden erzeugt
2: In ptukey(sqrt(2) * abst, fam.size, zapsmall(df), lower.tail = FALSE) :
NaNs wurden erzeugt
A user has reported that the "weights"
argument is ineffective in emmeans()
when the specs
argument is a list
. Here is an example:
require("emmeans")
nutr.lm <- lm(gain ~ age + group + race, data = nutrition)
sapply(c("equal", "prop", "outer", "cells", "flat"), function(w)
predict(emmeans(nutr.lm, "race", weights = w)))
## equal prop outer cells flat
## [1,] 0.6662975 1.724608 1.724608 0.3809524 0.7178124
## [2,] 1.4760579 2.534368 2.534368 1.6666667 2.6219735
## [3,] 1.3655468 2.423857 2.423857 2.7951807 1.8945106
sapply(c("equal", "prop", "outer", "cells", "flat"), function(w)
predict(emmeans(nutr.lm, list("race"), weights = w)[[1]]))
## equal prop outer cells flat
## [1,] 0.6662975 0.6662975 0.6662975 0.6662975 0.6662975
## [2,] 1.4760579 1.4760579 1.4760579 1.4760579 1.4760579
## [3,] 1.3655468 1.3655468 1.3655468 1.3655468 1.3655468
Note in the second case the weights make no difference, whereas they do in the first case.
I am fitting a parametric survival model with survreg and I would like to get the means by using the emmeans() function. The function used to work fine until emmeans vers. 1.2.3, but it no longer works with vers. 1.2.4. I am getting the following error:
Error in parse(text = x, keep.source = FALSE) : <text>:1:20: unexpected input 1: ~Surv(L, U, type = _
^
My code is as follows:
BBsurvey <- structure(data.frame(1:36, structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L), .Label = c("A",
"B", "C", "D", "E", "F", "G", "H", "I"), class = "factor"), c(0.1,
0.1, 5, 5, 0.1, 0.1, 5, 5, 0, 0.1, 5, 5, 0, 0.1, 5, 5, 0, 0,
0.1, 0.1, 5, 5, 25, 25, 0.1, 0.1, 5, 5, 0, 0.1, 5, 5, 75, 25,
50, 25), c(5, 5, 25, 25, 5, 5, 25, 25, 0.1, 5, 25, 25, 0.1, 5,
25, 25, 0.1, 0.1, 5, 5, 25, 25, 50, 50, 5, 5, 25, 25, 0.1, 5,
25, 25, 100, 50, 75, 50), c(2.55, 2.55, 15, 15, 2.55, 2.55, 15,
15, 0.05, 2.5, 15, 15, 0.05, 2.55, 15, 15, 0.05, 0.05, 2.55,
2.55, 15, 15, 37.5, 37.5, 2.5, 2.55, 15, 15, 0.1, 2.55, 15, 15,
87.5, 37.5, 62.5, 37.5)), .Names = c("X", "Herbicide", "L", "U",
"midPoint"))
mod.surv <- survreg(Surv(L, U, type="interval2") ~ Herbicide,
dist="gaussian", data = BBsurvey)
means.surv <- emmeans(mod.surv, Herbicide)
Any help with this is very much appreciated. Thank you very much in advance.
Andrea
I am using lsmeans
and just notice the new package emmeans
. Hopefully my issue for lsmeans
has been addressed in emmeans
.
In lsemans, while we have a model with offset term, the LS means will automatically add the offset term.
It would be nice to have some information in the help page, as this is inconsistent with SAS. (SAS did not add offset term)
It will be nice to have LS means for zero-inflated models that used in pscl::zeroinfl.
The current code can directly be used. my hack is as below
# negtive binomial model for event rate
fit <- MASS::glm.nb(cnt ~ TRT01P + ... + offset(logT), data = db_cnt)
.ls <- lsmeans(fit, "TRT")
.er <- exp(fit_ls[,2] - .ls@grid$.offset.)
# Zero inflated negtive binomial model for event rate
fit<- pscl::zeroinfl(cnt ~ TRT + ... + offset(logT) | TRT,
dist = "negbin", data = db_cnt)
.er <- as.numeric(exp(.ls@linfct %*% fit$coefficients$count ))
First of all, thanks for this great package.
I am currently using a model with one numerical and two categorical variables.
Here's an example of how it might look like:
Create data frame:
y <- rnorm(n = 100, mean = -2, sd = 0.5)
x1 <- rnorm(n = 100, mean = 3, sd = 1.5)
x2 <- sample(x = c("a1", "a2", "a3"), size = 100, replace = TRUE)
x3 <- sample(x = c("b1", "b2", "b3"), size = 100, replace = TRUE)
Run model and show coefficients
res <- lm(y ~ x1 + x2 + x3)
coef(res)
#> (Intercept) x1 x2a2 x2a3 x3b2 x3b3
#> -1.87891191 -0.00222856 -0.19620519 -0.16569943 0.00158045 -0.05318129
Then, I used emmeans to quickly calculate the predicted means and SEs for the different factor levels of x2
pairs.x2 <- emmeans::emmeans(res, pairwise ~ x2)
pairs.x2$emmeans
#> x2 emmean SE df lower.CL upper.CL
#> a1 -1.902427 0.10199471 94 -2.104940 -1.699914
#> a2 -2.098633 0.08209662 94 -2.261637 -1.935628
#> a3 -2.068127 0.09411921 94 -2.255003 -1.881251
#> Results are averaged over the levels of: x3
#> Confidence level used: 0.95
Given the message "averaged over the levels of: x3", I thought that results were NOT averaged over the numerical variable x1
In that case, I thought that the means should be the following:
# for x2 = a1
0 + coef(res)["(Intercept)"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> -1.896112
# For x2 = a2
coef(res)["x2a2"] + coef(res)["(Intercept)"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> -2.092317
# For x2 = a3
coef(res)["x2a3"] + coef(res)["(Intercept)"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> -2.061812
However, they are
0 + coef(res)["(Intercept)"] + mean(x1) * coef(res)["x1"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> (Intercept)
#> -1.902427
coef(res)["x2a2"] + coef(res)["(Intercept)"] + mean(x1) * coef(res)["x1"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> x2a2
#> -2.098633
coef(res)["x2a3"] + coef(res)["(Intercept)"] + mean(x1) * coef(res)["x1"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> x2a3
#> -2.068127
I was confused that the output did not indicate that results are averaged over x3 AND x1
and I was wondering if you might want to consider including this information into the output
Thanks a lot,
Urs
Reported by a user. The results obtained with cld(..., details = TRUE)
are not correct. For example:
> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
> warp.emm <- emmeans(warp.lm, ~ tension | wool)
> cld(warp.emm, details=TRUE, adjust="none", alpha = .2)
$`emmeans`
wool = A:
tension emmean SE df lower.CL upper.CL .group
M 24.00000 3.646761 48 16.66769 31.33231 1
H 24.55556 3.646761 48 17.22325 31.88786 1
L 44.55556 3.646761 48 37.22325 51.88786 2
wool = B:
tension emmean SE df lower.CL upper.CL .group
H 18.77778 3.646761 48 11.44547 26.11008 1
L 28.22222 3.646761 48 20.88992 35.55453 2
M 28.77778 3.646761 48 21.44547 36.11008 2
Confidence level used: 0.95
significance level used: alpha = 0.2
$comparisons
wool = A:
contrast estimate SE df t.ratio p.value
H - M 0.5555556 5.157299 48 0.108 0.9147
L - M 20.5555556 5.157299 48 3.986 0.0002
L - H 20.0000000 5.157299 48 3.878 0.0003
wool = B:
contrast estimate SE df t.ratio p.value
H - M 9.4444444 5.157299 48 1.831 0.0733
L - M 10.0000000 5.157299 48 1.939 0.0584
L - H 0.5555556 5.157299 48 0.108 0.9147
> pairs(warp.emm, adjust = "none")
wool = A:
contrast estimate SE df t.ratio p.value
L - M 20.5555556 5.157299 48 3.986 0.0002
L - H 20.0000000 5.157299 48 3.878 0.0003
M - H -0.5555556 5.157299 48 -0.108 0.9147
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.5555556 5.157299 48 -0.108 0.9147
L - H 9.4444444 5.157299 48 1.831 0.0733
M - H 10.0000000 5.157299 48 1.939 0.0584
Observe that the results from pairs()
are inconsistent with the $comparisons
portion from cld()
?lsmeans::summary provides an explicit list and discussion of p-value adjustment methods in the Details section.
?summary.emmGrid suggests in the Arguments section that a similar discussion p-value adjustment methods would be included in the Details section, but that discussion is not included there.
Hello,
Huge fan of lsmeans/emmeans! Noticed this issue today and thought I would pass along a reprex.
I have a function which creates lme models using an assembled formula. Emmeans works fine on such models when they are assembled in the global space, but appear to fail inside functions. Does not occur with lm functions, appears to be specific to lme, perhaps because of the 'fixed = ' portion of the formula?
Here is my attempt at a reprex using mtcars:
#outside of a function an lme model with a variable in the call works fine
lme_formula <- as.formula("mpg ~ as.factor(cyl)")
lme_formula_model <- lme(lme_formula, random =~1|disp, data = mtcars)
emmeans(lme_formula_model, ~ cyl)
lme_formula_model$call
#inside of a function the emmeans cannot seem to find the formula variable in the model call
lme_formula_in_fn <- function(formula_string){
formula_inside_fn <- as.formula(formula_string)
model_inside_fn <- lme(formula_inside_fn, random =~1|disp, data = mtcars)
try(emmeans(model_inside_fn, ~ cyl))
return(model_inside_fn)
}
#call the above function and see resulting error
model_outside_fn <- lme_formula_in_fn("mpg ~ as.factor(cyl)")
#model itself seems ok otherwise just has the variable in the call
model_outside_fn$call
summary(model_outside_fn)
The error text for the inside the function emmeans is:
Error in eval(x$call$fixed) : object 'formula_inside_fn' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed
I just found this morning, still toying around with it, will update if I find anything further/workarounds.
Cheers,
Ben
Hi,
I have been using emtrends to estimate slopes in a mixed effects model and it works nicely to get the trend for the different groupings. Is there a way to use emtrends to get not the group trend, but the individual subject trends?
Thank you.
Paco
Do you plan on adding support for gamm4 objects?
Thanks a lot for your awesome package!
First of all thanks for providing initial support for brmsfit objects! I have a few comments / points to discuss:
Regarding:
recover_data.brmsfit = function(object, data, ...) {
fcall = object$call
form = brms::parse_bf(formula(object))$dpars$mu$fe
recover_data.call(fcall, terms(form), "na.omit", data = object$data, ...)
}
brmsfit object have no $call
component. For what do you need to use this component internally?
Regarding:
emm_basis.brmsfit = function(object, trms, xlev, grid, vcov., ...) {
V = vcov(object)
m = model.frame(trms, grid, na.action = na.pass, xlev = xlev)
contr = lapply(object$data, function(.) attr(., "contrasts"))
contr = contr[!sapply(contr, is.null)]
X = model.matrix(trms, m, contrasts.arg = contr)
nm = dimnames(X)[[2]]
nbasis=estimability::all.estble
dfargs = list()
dffun = function(k, dfargs) Inf
misc = .std.link.labels(parse_bf(formula(object))$dpars$mu$family, list())
# Pick the 1st length(nm) columns -- probably not the best approach...
post.beta = as.matrix(posterior_samples(object)[, seq_along(nm), drop = FALSE])
bhat = apply(post.beta, 2, mean)
list(X=X, bhat=bhat, nbasis=nbasis, V=V, dffun=dffun, dfargs=dfargs,
misc=misc, post.beta=post.beta)
}
You can try to extract posterior samples via something like
b_pars <- paste0("b_", nm)
post.beta <- as.matrix(object, pars = b_pars, exact = TRUE)
The V
component could be restricted to the actually used coefficients via
V <- V[nm, nm, drop = FALSE]
Hi Russ,
I noticed some surprising behavior when comparing the marginal means between two equivalent models estimated with both lm
and aov
. When the models include all interactions, the emmeans are equal. However, when only including a main effect between two of the variables, the emmeans are not any more identical. The following example code shows it:
df_wide <- structure(list(id = structure(1:16, .Label = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16"), class = "factor"), treatment = structure(c(1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("control",
"A", "B"), class = "factor", contrasts = "contr.sum"), gender = structure(c(2L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("F",
"M"), class = "factor", contrasts = "contr.sum"), fup = c(3,
4, 7, 4, 4, 9, 9, 6, 5, 8, 6, 8, 8, 6, 7, 8), post = c(3, 3,
5, 3, 6, 9, 8, 5, 4, 7, 5, 9, 6, 5, 7, 7), pre = c(2, 4, 6, 5,
4, 8, 5, 3, 4, 4, 3, 6, 6, 2, 3, 5)), .Names = c("id", "treatment",
"gender", "fup", "post", "pre"), row.names = c(NA, -16L), class = "data.frame")
library(emmeans)
options(contrasts=c('contr.sum', 'contr.poly'))
lm_main <- lm(cbind(fup, post, pre) ~ treatment + gender, df_wide)
lm_inter <- lm(cbind(fup, post, pre) ~ treatment * gender, df_wide)
df_long <- reshape2::melt(df_wide, id.vars = c("id", "treatment", "gender"))
aov_main <- aov(value ~ (gender + treatment)*variable + Error(id/variable), df_long)
aov_inter <- aov(value ~ (gender * treatment)*variable + Error(id/variable), df_long)
emm_lm_inter <- emmeans(lm_inter, "treatment")
emm_aov_inter <- emmeans(aov_inter, "treatment")
summary(emm_lm_inter)$emmean
# [1] 4.222222 6.250000 6.027778
summary(emm_aov_inter)$emmean
# [1] 4.222222 6.250000 6.027778
emm_lm_main <- emmeans(lm_main, "treatment")
emm_aov_main <- emmeans(aov_main, "treatment")
summary(emm_lm_main)$emmean
# [1] 4.100365 6.250000 6.071168
summary(emm_aov_main)$emmean
# [1] 4.126521 6.276156 6.097324
As you can see, the example requires both independent sample factors (between which the critical distinction between an interaction or main effect is included) and the repeated-measures factors.
Interestingly, if you remove one of the independent samples factors, the emmeans also differ:
lm2 <- lm(cbind(fup, post, pre) ~ treatment, df_wide)
aov2 <- aov(value ~ treatment*variable + Error(id/variable), df_long)
summary(emmeans(lm2, "treatment"))$emmean
# [1] 4.20 6.25 6.00
summary(emmeans(aov2, "treatment"))$emmean
# [1] 4.216667 6.266667 6.016667
It would be great to know if this is indeed some sort of bug or just a consequence of the different models. The fact that for the full models with the interaction are in fact identical, somehow suggests there might be some problem here.
Cheers,
Henrik
I am a Professor of Statistics at Indira Gandhi Krishi Vishwavidyalaya, Raipur, India. While teaching in class about analysis of variance using R, I was doing a one-way analysis for the two data-sets given below in the R-class. I got a typical error in "emmeans" package, please help:
Medley and Clements (1998) investigated the impact of zinc contamination (and other heavy metals) on the diversity of diatom species in the USA Rocky Mountains. The diversity of diatoms (number of species) and degree of zinc contamination (categorized as either of high, medium, low or natural background level) were recorded from between four and six sampling stations within each of six streams known to be polluted, as given below:
stream=c("Eagle", "Eagle", "Eagle", "Eagle", "Blue", "Blue",
"Blue", "Blue", "Blue", "Blue", "Blue", "Snake", "Snake",
"Snake", "Snake", "Snake", "Arkan", "Arkan", "Arkan",
"Arkan", "Arkan", "Arkan", "Arkan", "Chalk", "Chalk",
"Chalk", "Chalk", "Chalk", "Splat", "Splat", "Splat",
"Splat", "Splat", "Splat")
zinc=c("BACK", "HIGH", "HIGH", "MED", "BACK", "HIGH", "BACK", "BACK",
"HIGH", "MED", "MED", "BACK", "MED", "HIGH", "HIGH", "HIGH",
"LOW", "LOW", "LOW", "LOW", "MED", "MED", "LOW", "LOW",
"HIGH", "HIGH", "MED", "LOW", "BACK", "BACK", "MED", "LOW",
"MED", "BACK")
diversity=c(2.27, 1.25, 1.15, 1.62, 1.7, 0.63, 2.05, 1.98, 1.04,
2.19, 2.1, 2.2, 2.06, 1.9, 1.88, 0.85, 1.4, 2.18, 1.83,
1.88, 2.02, 1.94, 2.1, 2.38, 1.43, 1.37, 1.75, 2.83,
1.53, 0.76, 0.8, 1.66, 0.98, 1.89)
medley.clementis=data.frame(stream,zinc,diversity)
medley.clementis.aov=with(medley.clementis, aov(diversity ~ zinc))
anova(medley.clementis)
emmeans::emmeans(medley.clementis.aov, "zinc")
Error in recover_data.call(fcall, delete.response(terms(object)), object$na.action, :
object 'possibly.random' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed
Keough and Raimondi (1995) examined the effects of four biofilm types (SL: sterile unfilmed substrate, NL: netted laboratory biofilms, UL: unnetted laboratory biofilms and F: netted field biofilms) on the recruitment of serpulid larvae. Substrates treated with one of the four biofilm types were left in shallow marine waters for one week after which the number of newly recruited serpulid worms were counted, as given below:
biofilm=c("SL", "SL", "SL", "SL", "SL", "SL", "SL", "UL", "UL", "UL",
"UL", "UL", "UL", "UL", "NL", "NL", "NL", "NL", "NL", "NL",
"NL", "F", "F", "F", "F", "F", "F", "F")
serpulid=c(61, 113, 123, 75, 75, 83, 95, 143, 81, 101, 155, 156, 193,
163, 203, 159, 139, 161, 179, 97, 157, 128.5, 204.5,
108.5, 116.5, 140.5, 160.5, 87.5)
keough.raimondi=data.frame(biofilm,serpulid)
keough.raimondi.ln=transform(keough.raimondi, serpulid.ln=log(serpulid))
contrasts(keough.raimondi.ln$biofilm) <- cbind(c(0, 1, 0, -1),
c(2, -1, 0, -1), c(-1, -1, 3, -1))
keough.raimondi.ln$biofilm
keough.contr.list <- list(biofilm = list('NL vs UL' = 1,
'F vs (NL & UL)' = 2, 'SL vs (F & NL & UL)' = 3))
keough.contr.list
keough.raimondi.ln.aov=with(keough.raimondi.ln, aov(serpulid.ln ~ biofilm))
summary(keough.raimondi.ln.aov,split=keough.contr.list)
emmeans(keough.raimondi.ln.aov, ~ biofilm)
Error in recover_data.call(fcall, delete.response(terms(object)), object$na.action, :
object 'possibly.random' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed
On many other data sets and data frame I successfully used "emmeans" package using the help available in R.
But, for the above two data-sets, I consistently got the same error as described above.
I do not know what is amiss. Where I am missing or whatever is wrong, I request the entire R-team to help me to solve above problem. Thanking in advance.
Dr. A.K. Singh
Professor and Head
Department of Agricultural Statistics
Indira Gandhi Krishi Vishwavidyalaya
Raipur-492012, Chhattisgarh, India
Mob: +918770625795
Email: [email protected]
WhatsApp: +919039202968
Dear maintainers of emmeans,
This is working:
dv = rnorm(100)
iv = rbinom(100, 1, 0.5)
d = data.frame(dv, iv)
m = lm(dv ~ iv, data=d)
emmeans(m, ~iv)
iv emmean SE df lower.CL upper.CL
0.56 -0.09024097 0.102466 98 -0.2935815 0.1130996
Confidence level used: 0.95
However, this fails:
dv = rnorm(100)
iv = rbinom(100, 1, 0.5)
m = lm(dv ~ iv)
emmeans(m, ~iv)
Error in recover_data.call(fcall, delete.response(terms(object)), object$na.action, :
object 'possibly.random' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed
It seems that emmeans wants all the data in a data.frame. I see that the data.frame variant covers
probably 99% of the use cases, though, but maybe it is only a small thing to fix...
Best regards,
Matthias
Hi,
I'm estimating confidence intervals with emmeans on a glm model and found discrepancies with confint()
function.
An example can be done using data from SAS user guide which I saved here.
The dep var (Pain) is binary and the factor is 3-level (Treatment). I code Treatment with contr.treatment
so the intercept of the model will be the expected logit for Treatment=A
contrasts(dat$Treatment)<-contr.treatment(3,base=1)
model<-glm(Pain~Treatment,data=dat,family = binomial())
summary(model)
The summary indicates that the intercept is -1.0986, with SE=0.51639. Asking for CI I get
confint(model)
(Intercept) -2.2218727 -0.1504401
If now I run the emmeans()
function I get (ci bounds are the last two numbers)
emmeans(model,~Treatment)
A -1.098612 0.5163973 NA -2.11073247 -0.08649211
The estimate is the same, the SE is the same, but the confidence interval is different. Is this ok?
I found this behavior to be general, as it applies to many different examples.
Thanks a lot for the great material
marcello
I am fitting a bayesian model as follows:
df <- data.frame(
RT = rnorm(100, 30, .2),
Stress = runif(100, 3, 5))
fit <- rstanarm::stan_glm(RT ~ poly(Stress, 2, raw=T), data=df)
I'd like to extract the reference grid (to further predict the outcome):
ref_grid <- emmeans::ref_grid(fit, at = list(
Stress = seq(min(df$Stress),
max(df$Stress),
length.out = 10)))
But it says the following:
Error in model.frame.default(formula = ~Stress + T, data = df, drop.unused.levels = TRUE, :
variables differ in length (found for 'T')
Error in emmeans::ref_grid(fit, at = list(Stress = seq(min(df$Stress), :
Perhaps a 'data' or 'params' argument is needed
What can I do?
Thanks for your help!
Here's the continuation of your mail with the error in the psycho
package.
It is rather strange:
For example, this chunk of code appear as broken if emmeans is not loaded (library(emmeans))
fit <- rstanarm::stan_glm(Sepal.Width ~ Sepal.Length * Species, data=iris)
formula <- "Sepal.Length * Species"
formula <- as.formula(paste0("~", formula))
means_posterior <- fit %>%
emmeans::emmeans(formula) %>%
coda::as.mcmc() %>%
as.matrix() %>%
coda::as.mcmc() %>%
as.data.frame()
Error in y[, var.cols] <- x :
number of items to replace is not a multiple of replacement length
It works if I load the package beforehand. In the same way, the error in the test-get_contrasts of the psycho
package also works after explicitly loading emmeans...
Is there an internal outside scope call that is made?
Note this curious result:
> warp.lm = lm(log(breaks) ~ wool * tension, data = warpbreaks, subset = 10:48)
> rg = ref_grid(warp.lm)
> emmeans(warp.lm, "tension")
NOTE: Results may be misleading due to involvement in interactions
tension emmean SE df lower.CL upper.CL
L nonEst NA NA NA NA
M 3.213038 0.08825321 34 3.033686 3.392390
H 3.095196 0.12480889 34 2.841554 3.348838
Results are averaged over the levels of: wool
Results are given on the log (not the response) scale.
Confidence level used: 0.95
> regrid(emmeans(warp.lm, "tension"))
NOTE: Results may be misleading due to involvement in interactions
tension response SE df asymp.LCL asymp.UCL
L nonEst NA NA NA NA
M 24.85448 2.193488 NA 20.55533 29.15364
H 22.09157 2.757224 NA 16.68751 27.49563
Results are averaged over the levels of: wool
Confidence level used: 0.95
Note that the first result shows statistics with 34 d.f., but after regrid()
is called, all the d.f. are NA
and the statistics are asymptotic. Why?
hi russell,
i notice the most recent emmeans
adds an import to brms
. this adds a lot of additional dependencies. emmeans
went from having 34 dependencies, to having 87!
is it possible for brms
to be changed to a suggests
?
i'm happy to explore this option if you like.
with thanks
jonathon
I don't know if this is intentional or not, but a student was totally stumped by this, so I'll pass it on.
The first cld works. The second does not. The object created with pairwise is mmGrid, not emmGrid.
_
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
warp.emmGrid <- emmeans(warp.lm, ~ tension | wool)
cld(warp.emmGrid)
warp2.emmGrid <- emmeans(warp.lm, pairwise ~ tension | wool)
cld(warp2.emmGrid)
_
Hi Russ,
It appears as if as.glht()
cannot deal with df
that are Inf
. See code below:
require(lme4)
require(emmeans)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
as.glht(emmeans(fm1, "Days"))
# Loading required namespace: pbkrtest
# Note: df set to 17
#
# General Linear Hypotheses
#
# Linear Hypotheses:
# Estimate
# 4.5 == 0 298.5
emm_options(lmer.df = "asymptotic")
as.glht(emmeans(fm1, "Days"))
# Error in if (any(args$df != df)) message("Note: df set to ", args$df) :
# missing value where TRUE/FALSE needed
# In addition: Warning message:
# In glht.emmGrid(, object, ...) :
# NAs introduced by coercion to integer rang
Thank you for an excellent package.
When I fit a multivariate intercepts-only model for a factorial within-subjects design, I get an error using the mult.levs
argument. Adapted from the ?ref_grid
help page:
MOats.lm = lm(yield ~ 1, data = MOats)
ref_grid(MOats.lm, mult.name = "nitro") # no problem with a single WS factor
ref_grid(MOats.lm, mult.levs = list(T=LETTERS[1:2], U=letters[1:2])) # error
The same error occurs in the latest lsmeans::ref.grid
. It does not occur when there is at least 1 between-subjects factor.
I installed archived versions, and it worked in version 2.26-3, but not 2.27-2
Thanks for your attention,
Terrence
Hello,
This may be an issue, or maybe it's a misunderstanding on my part, but I was surprised by the behavior of emmeans when a polynominal term of the form "I(x^2)" is in an lm statement.
Using the I(x^2) notation, or creating a new squared term by hand (xsq <- x^2), and then using one or the other in an lm statement, and then feeding that into emmeans, yields different results. As far as I can tell inputting the already squared term by hand, yields the correct result, whereas the I(x^2) notation does not.
Here is a reproducible example.
I generate a bunch of data, then I generate the squared term by hand, or insert it directly in the lm statement. I include all interactions in the lm model.
The models are called lm.a and lm.b. Feeding them into emmeans, hoping to get the average treatment effect yields two different results. As far as I can tell the results of lm.b correspond with what would be called the average causal effect in the causal inference lit.
set.seed(1111)
preacademic <- rnorm(100,100,20)
p <- (exp(-2.3+.01*preacademic)) / (1/(1+exp(-2.3+.01*preacademic)))
treat <- rbinom(100,1,p)
postacademic<- 100 + 5*treat + 2*preacademic + rnorm(100,0,20)
treat <- factor(treat)
levels(treat) <- c("control","treatment")
df1 <- data.frame(preacademic,treat,postacademic)
df1$preacademic2 <- df1$preacademic^2
lm.a <-lm(postacademic~treat+preacademic+I(preacademic^2)+treat:preacademic+treat:I(preacademic^2))
lm.b <- lm(postacademic~treat+preacademic+preacademic2+treat:preacademic+treat:preacademic2)
summary(emmeans(lm.a,"treat",contr="pairwise",weights="proportinal"),infer=TRUE)
summary(emmeans(lm.b,"treat",contr="pairwise",weights="proportinal"),infer=TRUE)
The last two commands produce the following output:
`$contrasts
contrast estimate SE df lower.CL upper.CL t.ratio p.value
control - treatment -11.54449 5.067913 94 -21.60695 -1.48203 -2.278 0.0250
$contrasts
contrast estimate SE df lower.CL upper.CL t.ratio p.value
control - treatment -10.63247 4.133922 94 -18.84047 -2.424475 -2.572 0.0117
`
To compare my results, I generate predicted values for all individuals setting either all to treatment or all to control, and then taking a mean difference of those. The results are always the same, and always identical to the emmeans model with lm.b.
`#treatment effect by hand using predictions
mean(predict(lm.a,newdata = data.frame(treat=factor(rep("control",100))))) - mean(predict(lm.a,newdata = data.frame(treat=factor(rep("treatment",100)))))
#treatment effect by hand using predictions
mean(predict(lm.b,newdata = data.frame(treat=factor(rep("control",100))))) - mean(predict(lm.b,newdata = data.frame(treat=factor(rep("treatment",100)))))
`
Thanks!
Hello,
I noticed that emmeans has retired the as.stanfit command, and is now exclusively using as.mcmc when using emmeans for Bayesian models, like the ones generated from the rstanarm package.
Just as a personal opinion, I liked the as.stanfit() command, because it allowed me to generate interest contrasts, etc. using emmeans, but getting output that looked very similar to the stan output, and not like the mcmc output. In particular, this meant that the credible intervals appear next to the estimates, and the Gelman-Rubin convergence criteria are printed alongside the estimates.
It's not a huge deal, and I still appreciate that emmeans is so flexible to utilize all these different fitted methods, but it was nice having as.stanfit().
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.