Giter Site home page Giter Site logo

emmeans's People

Contributors

banfai avatar batpigandme avatar bbolker avatar billdenney avatar femiguez avatar hriebl avatar iago-pssjd avatar jonathon-love avatar maarten-jung avatar nmjakobsen avatar pawelru avatar retodomax avatar rvlenth avatar singmann avatar strengejacke avatar wibeasley avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

emmeans's Issues

Additional arguments not passed to methods when used in "interaction"

I've been trying to create my own method, but have found that arguments cannot be passed when the method is used in an interaction. Here is some sample mock code to demonstrate the issue:

# some fake data
df <- data.frame(Y = rnorm(100),
                 F1 = factor(rep(letters[1:4],each = 25)),
                 F2 = factor(rep(letters[5:8],25)))

fit <- aov(Y ~ F1*F2, data = df)

rg <- emmeans(fit, ~F1*F2)

# my method - won't run unless '.second' is set to FALSE
my_method.emmc <- function(x, .second = TRUE){
  vex <- rep(0,length(x))
  if(.second)
    stop("")
  
  vex[1:2] <- c(-1,1)
  
  as.data.frame(vex)
}

# this works
contrast(rg, "my_method",.second = FALSE)

# this does not
contrast(rg, interaction = c("my_method","pairwise"),.second = FALSE)

I think this can be fixed by changing contrast.emmGrid, when an interaction in called to include .... So from this:

object = contrast.emmGrid(object, interaction[i], by = vars[-i], name = nm)

to this:

object = contrast.emmGrid(object, interaction[i], by = vars[-i], name = nm,...)

nnet:multinom

Hi
while working with emmeans on a multinomial model I've noticed that you get an error when the dependent variable has more than 3 levels and the model features a covariate (see below for reproducible code). I did a bit of exploring and found this:
There is a weird behavior in emmeans when it calls nnet multinomial functions required to compute the standard errors of the means. When the standard errors are estimated, four matrices are needed, which are in the ref_grid object called by emmeans. They are :
grid<-ref_grid(model)
grid@grid
grid@V
grid@bhat
coef(model)
when the number of categories in the dep.variable are more than 3, and there is a covariate in the model, for some reasons the matrices are non-conformable and an error is issued. It does not happen for K=3.

Here is the code to reproduce the error

K=4
gre<-rnorm(1200,0,1)
admit<-factor(rep(c(1:2),600))
rank<-factor(rep(c(1:K),1200/K))
formula <- as.formula(rank ~ admit +gre)
model <- nnet::multinom(formula)
model$call$formula <- formula
emmeans(model, formula(~ rank|admit),mode = "prob")

The workaround I found is to estimate the reg_grid like this:

grid<-ref_grid(model,cov.reduce = function(x) c(mean(x)+.1*100,mean(x)-.1*100))

and then it works.
I hope this is useful
mc

ref_grid with random effects

I am fitting a bayesian mixed model as follows:

df <- data.frame(Participant = as.factor(rep(1:25, each = 4)), 
                 RT = rnorm(100, 30, .2), 
                 Stress = runif(100, 3, 5))
fit <- rstanarm::stan_lmer(RT ~ Stress + (1|Participant), data=df)

I'd like to extract the reference grid (to further predict the outcome):

ref_grid <- emmeans::ref_grid(fit, at = list(
  Stress = seq(min(df$Stress),
            max(df$Stress),
            length.out = 10)))

However, when I check the reference grid (ref_grid@grid), no sign of the random terms. Is it normal?

Sorry if the question is stupid, I didn't find any elements of answer in the vignettes or documentation.

Thanks a lot!

Add RR `type` / `transform`

I think it would be very useful if there was an option to have emmeans return the RR (risk ratio) for binomial models.
RR = Pr(success | a) / Pr(success | b)
or:
RR = (1+exp(-ORb)) / (1+exp(-ORa))

For predicted values (no contrast) this should return the same value as type = 'response' (prob), and for contrasts the RR as stated above.
It should look something like this:

g <- sample(c(0,1),size = 1000, replace = T)
outcome <- round(1/(1+exp(-g + rnorm(1000))))
fit <- glm(outcome ~ factor(g), family = binomial())

emmeans(fit,~g, type = 'rr')
>  g      prob         SE  df asymp.LCL asymp.UCL
>  0 0.5049116 0.02216104 Inf 0.4615488 0.5482006
>  1 0.8289206 0.01699386 Inf 0.7930034 0.8597081
> 
> Confidence level used: 0.95 
> Intervals are back-transformed from the logit scale 

contrast(emmeans(fit,~g, type = 'rr'),'pairwise')
>  contrast  estimate        SE  df z.ratio p.value
>  0 - 1    0.6091194 0.1490623 Inf -10.454  <.0001
> 
> Results are given on the risk ratio (not the response) scale. 

Thanks (again and again) for this amazing package!

On Coxph or survival model

I have difficulty to understand the emmeans results for coxph model.

Is the default setting of emmeans calculate the linear combination of log hazards?

Would it possible to have emmeans to report the estimated survival curve?

For example, in the code below, would it be possible to create the estimated survival probability based on a user-defined time (e.g. time = 1)?

library(survival)
library(emmeans)
# Create the simplest test data set 
test1 <- list(time=c(4,3,1,1,2,2,3), 
              status=c(1,1,1,0,1,1,0), 
              x=c(0,2,1,1,1,0,0), 
              sex=c(0,0,0,0,1,1,1)) 
# Fit a stratified model 
fit <- coxph(Surv(time, status) ~ x + factor(sex), test1) 
emmeans(fit, specs = "sex")
 sex     emmean        SE df asymp.LCL asymp.UCL
   0 -0.4001928 0.6034757 NA -1.582983 0.7825979
   1  0.5335904 0.8046343 NA -1.043464 2.1106447

Results are given on the log (not the response) scale. 
Confidence level used: 0.95 

emtrends does not work with as.numeric variable

Hi,
in emmeans version 1.1.2 if a model has been fit using a variable that is coerced to a numeric variable with as.numeric(var) in the formula, specifying the variable in emtrends causes an error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
‘range’ not meaningful for factors

Example giving error:
data(iris)
model <- lm(Sepal.Length ~ as.numeric(Species), data = iris)
emtrends(model, ~ as.numeric(Species), var = "as.numeric(Species)")

However, if the variable is coerced to numeric before using it in the formula, then it works as expected:
data(iris)
iris$Species <- as.numeric(iris$Species)
model <- lm(Sepal.Length ~ Species, data = iris)
emtrends(model, ~ Species, var = "Species")

Does anyone know of a work around as I don't want to re-run the model?
Thank you for this nice package!

emtrends does not allow retransforming scaled variables

Hi Russ,

I noticed the following problems disallowing scale() in the var part of emtrends.

library("emmeans")
data("Prestige", package = "carData")
mp_3 <- lm(income ~ scale(education)*scale(prestige), Prestige)

emtrends(mp_3, specs = "education", var = "prestige", 
         at = list(education = c(8.01, 10.74, 13.47)))
#  education prestige.trend       SE df lower.CL upper.CL
#       8.01       141.6524 40.24965 98  61.7783 221.5266
#      10.74       186.9203 32.50945 98 122.4064 251.4343
#      13.47       232.1882 34.95312 98 162.8249 301.5515
# 
# Confidence level used: 0.95 

emtrends(mp_3, specs = "education", var = "scale(prestige)", 
         at = list(education = c(8.01, 10.74, 13.47)))
# Error in if (any(diffl == 0)) warning("Some differentials are zero") : 
#   missing value where TRUE/FALSE needed

Best,
Henrik

some recover_data() methods are not yet exported

Hi Russ,

I am working on extending my new afex_plot() function with a default method. In this method I rely on emmeans::recover_data() to obtain the raw data used for fitting the model. This works in many cases, but not in all. For example:

require("ordinal")
wine.clm <- clm(rating ~ temp + contact, scale = ~ judge,
                data = wine, link = "probit")
emmeans::recover_data(wine.clm, list(pairwise ~ temp, pairwise ~ contact))
# Error in UseMethod("recover_data") : 
#   no applicable method for 'recover_data' applied to an object of class "clm"

However

emmeans:::recover_data.clm(wine.clm)
#    temp contact judge
# 1  cold      no     1
# 2  cold      no     1
# 3  cold     yes     1
# 4  cold     yes     1
# [...]

It would be great if you could export all recover_data methods so they could be used in other packages, such as afex. If it helps, I could submit a pull request.

Thanks,
Henrik

adjust='none' is ignored by contrast

arguably adjust='none' should squelch and p.value adjustments, but does not, as evidenced:

stopifnot(! identical(coefficients(contrast(warp.emm, "eff", by = NULL)),coefficients(contrast(warp.emm, "eff", by = NULL))))
Error: !identical(coefficients(contrast(warp.emm, "eff", by = NULL)), .... is not TRUE

THANKS FOR EMMEANS!

emmip plot options

The interaction plot using emmip() seems to accept some of the customization from ggplot2 but not others. For example, theme_bw() works, and the line color works, but the linetype and plotting symbol don't have an effect. I tried using the lty, col, and pch arguments as well, and the lattice engine, to no avail. Am I missing something or is this a bug?

I recognize this is a tiny feature. For the benefit of my students, I try to be as "compact" as possible, and your package is a phenomenal resource. But for those who might not have color printers, I tend to rely on linetype and plotting symbol more than colors.

Example:

df1 <- data.frame(sex = rep(c("Female", "Male"), each=9),
                  time=c("breakfeast", "Lunch", "Dinner"),
                  bill=c(10, 30, 15, 13, 40, 17) )
lm01 <- lm( bill ~ sex*time , data=df1 )

emmip( lm01 , sex ~ time  ) + theme_bw() + 
  scale_color_manual(    values=c("tomato", "skyblue") ) + 
  scale_shape_manual(    values=c(21,24) ) + 
  scale_linetype_manual( values=c("solid", "dashed") )

I have version 1.1.3 (updated yesterday).

Plot trends of emtrends

Hello,

I mainly use your functions of emmeans and emtrends to get summary statistics from my models. Your package is awesome!

I've been able to retrieve the means and standard errors of emmeans and plot them in bar graphs. Now, I am working with data where one of my predictors is a factor and the other is numeric. So I would be plotting trends.

Is there a similar way to plot trends with the emmGrid object that I get from emtrends?

Thanks!

t tests become z tests on shinyapps.io

Dear prof.Lenth,

When I run the following Shiny app on my Windows computer it uses t-test in the emmeans paired comparisons. The calculated P-values are in agreement with the results from the Mixed Model with Paired comparison in SAS/JMP Pro 13 EN.
However when I publish this Shiny app on shinyapps.io the paired comparisons use the z-test. See https://gmwebapplications.shinyapps.io/InvestigatePvalue/
Would it be possible to use the t-test on the shinyapps.io ?
Thanks for your attention, Gerben

`# Investigate difference in calculated P-values between Windows and shinyapps.io

suppressWarnings(suppressMessages(library(shiny)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(lme4)))
suppressWarnings(suppressMessages(library(emmeans)))

ui <- fluidPage(

tags$hr(),

sidebarLayout(

sidebarPanel(),

mainPanel(
  verbatimTextOutput("lme4Results"),
  tags$hr(),
  verbatimTextOutput("emmeansResults")
)

)
)

server <- function(input, output) {

Build dataset -----------------------------------------------------------

FI <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,1,0,0,0,0,0,0,1,2,0,0,0,0,-1,0,1,3,2,-1,-2,1,1,0,0,1,0,0,-1,3,1,0,0,0,0,1,1,0,1)
Assessor <- c("am","am","an","an","ca","ca","ch","ch","gr","gr","kl","kl","na","na","ro","ro","sa","sa","st","st","am","am","an","an","ca","ca","ch","ch","gr","gr","kl","kl","na","na","ro","ro","sa","sa","st","st","am","am","an","an","ca","ca","ch","ch","gr","gr","kl","kl","na","na","ro","ro","sa","sa","st","st")
Product <- rep(c("1=Ref","2=Alt","3=Alt2"), each = 20)
dt <- tibble(Product,Assessor,FI)

Calculations ------------------------------------------------------------

Mixed model with Product and Assessor -----------------------------------

MixedModel <-
lmer(FI ~ Product + (1| Assessor) + (1| Product:Assessor), data = dt)
output$lme4Results <- renderPrint(summary(MixedModel))

Paired comparison with Dunnett correction using emmeans

MixedModel.emm <- emmeans(MixedModel,"Product")
ConfIntTbl <- as.data.frame(MixedModel.emm)
MixedModel.emm.Dunnett <- contrast(MixedModel.emm, "trt.vs.ctrl", ref = 1)
output$emmeansResults <- renderPrint(summary(MixedModel.emm.Dunnett))
}

shinyApp(ui, server)
`
Result in Windows:

`contrast estimate SE df t.ratio p.value
2=Alt - 1=Ref 0.6 0.3333333 18 1.8 0.1594
3=Alt2 - 1=Ref 0.3 0.3333333 18 0.9 0.5807

P value adjustment: dunnettx method for 2 tests `

Result in shinyapps.io:

`contrast estimate SE df z.ratio p.value
2=Alt - 1=Ref 0.6 0.3333333 Inf 1.8 0.1318
3=Alt2 - 1=Ref 0.3 0.3333333 Inf 0.9 0.5683

P value adjustment: dunnettx method for 2 tests `

inflated dfs in emmeans output

Hi Russ,

I have run the same model (mod1) with both lsmeans and emmeans for pairwise comparisons. I noticed that the output from emmeans involved some inflated df values which were not found in the lsmeans output. The inflated dfs reduced the number of statistically significant group contrasts. The output from lsmeans and emmeans is listed below. Could you please let me know whether the inflated dfs are resulted from the incorrect models I built or the inaccurate calculations of emmeans? Thanks in advance!

mod1<-lmer(GJTscoreGJTpre+Time+Group+Time*Group+(1|Class/Subject),data=GJT, REML = F)
lsmeans(mod1,list(pairwise
Group|Time),adjust="bonferroni")

$pairwise differences of contrast, Time | Time
Time = 2:
contrast estimate SE df t.ratio p.value
4 - 1 -3.7713780 1.211024 216.96 -3.114 0.0126
4 - 2 -3.0443478 1.183329 216.71 -2.573 0.0646
4 - 3 -2.6995979 1.262705 215.33 -2.138 0.2019
1 - 2 0.7270302 1.111591 217.16 0.654 1.0000
1 - 3 1.0717801 1.179184 217.87 0.909 1.0000
2 - 3 0.3447499 1.153255 217.89 0.299 1.0000

Time = 4:
contrast estimate SE df t.ratio p.value
4 - 1 -5.8488587 1.204532 212.95 -4.856 <.0001
4 - 2 -3.7107185 1.184252 217.30 -3.133 0.0118
4 - 3 -4.4727699 1.256006 211.36 -3.561 0.0027
1 - 2 2.1381402 1.113608 218.52 1.920 0.3370
1 - 3 1.3760888 1.173556 214.28 1.173 1.0000
2 - 3 -0.7620513 1.154011 218.37 -0.660 1.0000

Time = 6:
contrast estimate SE df t.ratio p.value
4 - 1 -5.4159758 1.204532 212.95 -4.496 0.0001
4 - 2 -3.2493890 1.179468 214.27 -2.755 0.0382
4 - 3 -3.8862978 1.252647 209.45 -3.102 0.0131
1 - 2 2.1665868 1.108840 215.30 1.954 0.3120
1 - 3 1.5296780 1.170054 212.13 1.307 1.0000
2 - 3 -0.6369087 1.146254 213.39 -0.556 1.0000

Time = 8:
contrast estimate SE df t.ratio p.value
4 - 1 -5.2912010 1.204532 212.95 -4.393 0.0001
4 - 2 -2.4249866 1.177488 213.02 -2.059 0.2440
4 - 3 -3.9278571 1.273459 221.66 -3.084 0.0138
1 - 2 2.8662144 1.106607 213.79 2.590 0.0615
1 - 3 1.3633440 1.192269 226.20 1.143 1.0000
2 - 3 -1.5028704 1.166641 226.61 -1.288 1.0000

P value adjustment: bonferroni method for 6 tests

emmeans(mod1,list(pairwise~Group|Time),adjust="bonferroni")
$emmeans of Group | Time

$pairwise differences of Group | Time
Time = 2:
contrast estimate SE df t.ratio p.value
4 - 1 -3.7713780 1.420087 6.14 -2.656 0.2213
4 - 2 -3.0443478 1.395626 6.13 -2.181 0.4258
4 - 3 -2.6995979 1.467241 100.85 -1.840 0.4123
1 - 2 0.7270302 1.351757 2.79 0.538 1.0000
1 - 3 1.0717801 1.394396 5.57 0.769 1.0000
2 - 3 0.3447499 1.371838 5.58 0.251 1.0000

Time = 4:
contrast estimate SE df t.ratio p.value
4 - 1 -5.8488587 1.415019 6.01 -4.133 0.0367
4 - 2 -3.7107185 1.397154 6.14 -2.656 0.2215
4 - 3 -4.4727699 1.466036 95.44 -3.051 0.0177
1 - 2 2.1381402 1.353467 2.81 1.580 1.0000
1 - 3 1.3760888 1.392144 5.44 0.988 1.0000
2 - 3 -0.7620513 1.375174 5.56 -0.554 1.0000

Time = 6:
contrast estimate SE df t.ratio p.value
4 - 1 -5.4159758 1.415019 6.01 -3.827 0.0520
4 - 2 -3.2493890 1.392929 6.04 -2.333 0.3488
4 - 3 -3.8862978 1.461934 95.30 -2.658 0.0553
1 - 2 2.1665868 1.349424 2.77 1.606 1.0000
1 - 3 1.5296780 1.388054 5.38 1.102 1.0000
2 - 3 -0.6369087 1.367408 5.43 -0.466 1.0000

Time = 8:
contrast estimate SE df t.ratio p.value
4 - 1 -5.2912010 1.415019 6.01 -3.739 0.0577
4 - 2 -2.4249866 1.391218 6.00 -1.743 0.7917
4 - 3 -3.9278571 1.482142 100.09 -2.650 0.0561
1 - 2 2.8662144 1.347532 2.74 2.127 0.7904
1 - 3 1.3633440 1.408233 5.79 0.968 1.0000
2 - 3 -1.5028704 1.385909 5.81 -1.084 1.0000

P value adjustment: bonferroni method for 6 tests

Best regards,

Mengxia

CLD yields incoherent results

Hello, I was trying out an example for my class and found some surprising results using cld()

library(emmeans)

fert_long <- structure(list(Finca = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 
                                                3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
                                              .Label = c("1", "2", "3", "4", 
                                                         "5"), class = "factor"), 
                            Fertilizante = c("Fertilizante.1", "Fertilizante.1", 
                                             "Fertilizante.1", "Fertilizante.1", "Fertilizante.1", "Fertilizante.2", 
                                             "Fertilizante.2", "Fertilizante.2", "Fertilizante.2", "Fertilizante.2", 
                                             "Fertilizante.3", "Fertilizante.3", "Fertilizante.3", "Fertilizante.3", 
                                             "Fertilizante.3"), 
                            Produccion = c(1, 2, 7, 6, 12, 5, 8, 9, 13, 
                                           14, 8, 14, 16, 18, 17)), class = "data.frame", 
                       .Names = c("Finca", 
                                  "Fertilizante", "Produccion"), row.names = c(NA, -15L))

fert_aov <- aov(Produccion ~ Fertilizante + Finca, data = fert_long)
fert_lsm <- emmeans(fert_aov, pairwise ~ Fertilizante)
cld(fert_lsm)

This yields in:

$emmeans
 Fertilizante   emmean       SE df  lower.CL  upper.CL .group
 Fertilizante.1    5.6 0.772442  8  3.818746  7.381254  1    
 Fertilizante.2    9.8 0.772442  8  8.018746 11.581254   2   
 Fertilizante.3   14.6 0.772442  8 12.818746 16.381254    3  

Results are averaged over the levels of: Finca 
Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 3 estimates 
significance level used: alpha = 0.05 

$contrasts
 contrast                        estimate       SE df t.ratio p.value .group
 Fertilizante.1 - Fertilizante.3     -9.0 1.092398  8  -8.239  0.0001  1    
 Fertilizante.2 - Fertilizante.3     -4.8 1.092398  8  -4.394  0.0058   2   
 Fertilizante.1 - Fertilizante.2     -4.2 1.092398  8  -3.845  0.0121   2   

Results are averaged over the levels of: Finca 
P value adjustment: tukey method for comparing a family of 3 estimates 
significance level used: alpha = 0.05 

emmeans results is correct, but I find the constrast section kind of misleading. I think, the contrast between Fertilizante.1 - Fertilizante.2 should result in .group 3.

Is this a bug or a feature?

brms support fails with poly() function in formula

Hi Russ,

It appears that at the moment the brms support with emmeans fails with models that include a call to poly in the formula. For example:

library("emmeans")
library("brms")
warpbreaks$tension_num <- as.numeric(warpbreaks$tension)

warp.mcmc <- brm(breaks ~ wool * poly(tension_num, 2), data = warpbreaks, 
                  family = gaussian())

emmeans(warp.mcmc, "wool")
# Error in poly(tension_num, 2) : 
#   'degree' must be less than number of unique points

That probably is due to the specific way in which brms handles those terms:

summary(warp.mcmc)
# [...]
# Population-Level Effects: 
#                         Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# Intercept                  31.00      2.11    26.84    35.03       4000 1.00
# woolB                      -5.78      3.03   -11.56     0.20       4000 1.00
# polytension_num21         -60.59     15.91   -92.30   -29.10       3102 1.00
# polytension_num22          36.43     16.14     4.70    67.77       3032 1.00
# woolB:polytension_num21    32.52     22.72   -12.25    77.85       3252 1.00
# woolB:polytension_num22   -54.70     22.56   -99.32    -9.86       2997 1.00
# [...]

Compare with:

warp.lm <- lm(breaks ~ wool * poly(tension_num, 2), data = warpbreaks)

emmeans(warp.lm, "wool")
# NOTE: Results may be misleading due to involvement in interactions
#  wool   emmean       SE df lower.CL upper.CL
#  A    24.00000 3.646761 48 16.66769 31.33231
#  B    28.77778 3.646761 48 21.44547 36.11008
# 
# Confidence level used: 0.95 

Cheers,
Henrik

Support for mgcv::gam requested

Hi,

On line 28 of emmeans/R/gam-support.R you open up the possibility for future support for mgcv::gam. I am wondering if you would be able to add this support?

The reason for me using the mgcv package rather than the gam package is the ability of the former to handle factor covariates in the s(...) smoother function using a keyword argument named "by=".

As an example:

# load data and libraries
> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-23. For overview type 'help("mgcv-package")'.

> library(emmeans)

# load mtcars
> data(mtcars)

# create a factor out of the cyl variable
> mtcars$cylfact <- factor(mtcars$cyl)

# fit a mgcv::gam model and print the summary
> mtcars.gam1 <- gam(mpg ~ 0 + cylfact + s(wt, by = cylfact), data = mtcars)
> summary(mtcars.gam1)

# attempt building an emmeans::ref_grid
> ref_grid(mtcars.gam1)
Error in emm_basis.gam_mgcv(object, trms, xlev, grid, ...) : 
  Can't handle mgcv::gam objects

I also run into trouble when I use gam::gam

# reset R session, reload mtcars and make a factor out of the cyl variable
> library(gam)
Loading required package: splines
Loading required package: foreach
Loaded gam 1.14-4

> library(emmeans)
> data(mtcars)
> mtcars$cylfact <- factor(mtcars$cyl)

# fit a gam::gam model and print the summary
> mtcars.gam2 <- gam(mpg ~ 0 + cylfact * s(wt), data = mtcars)
> summary(mtcars.gam2)

# attempt building an emmeans::ref_grid
> ref_grid(mtcars.gam2)
Error in apply(smooth.frame[[i]], 1, paste, collapse = ":") : 
  dim(X) must have a positive length

Since mgcv is packaged in the R standard library, my preference is to be able to use it together with emmeans.

Thank you for considering.

Possible error in Scheffe adjustment

Hi,
I believe there may be an error in the way that confidence intervals are calculated using Scheffe's method of multiple comparisons. As an example, if I run the code below

data(fiber)
modFiber <- lm(strength ~ machine, data=fiber)
emFiber <- emmeans(modFiber, "machine")
contrast(emFiber, "pairwise", adjust="scheffe", options=list(infer=TRUE))

I get

Warning message:
In sqrt(n.contr + scheffe.adj * qf(level, n.contr + scheffe.adj,  :
  NaNs produced

It appears there are two issues in the calculation of the critical coefficient:

  1. Parentheses are missing around n.contr + scheffe.adj.
  2. The calculation is using the number of contrasts instead of the number of treatments.

I have tried this on other datasets and get confidence intervals, but the numbers are incorrect. In particular, the intervals are shorter than Tukey's method for pairwise comparisons.

Thanks!
Chris

Bayesian enhancements desired

Note to self on future enhancement of Bayesian support:

  1. Consider not requiring the bhat and V slots when there is a nontrivial post.beta. we no longer
    need these when doing a summary. However, we need to work around having these absent in methods
    like regrid(). Could be a tough row to hoe.
    Skip this -- (See add'l comment below)
  2. Best posterior point estimate: currently the median. But this does create confusion in calling these estimates emmean, though. Re-label to med.emmean? Change the default to the mean?
  3. Consider adding type argument to as,mcmc, so that as.mcmc(emm, type = "response") would
    return the EMM sample already back-transformed.

"Averaged over" annotation out of control with nested models

Consider this example adapted from one of the vignettes:

cows <- data.frame (
    route = factor(rep(c("injection", "oral"), c(5, 9))),
    drug = factor(rep(c("Bovineumab", "Charloisazepam", 
              "Angustatin", "Herefordmycin", "Mollycoddle"), c(3,2,  4,2,3))),
    resp = c(34, 35, 34,   44, 43,      36, 33, 36, 32,   26, 25,   25, 24, 24),
    x1 = .2*sin(1:14),
    x2 = (.03*(19:6)^2)
)
cows.lm <- lm(resp ~ route + drug + x1 + x2, data = cows)

emmeans(cows.lm, "route")
## NOTE: A nesting structure was detected in the fitted model:
##     drug %in% route
## If this is incorrect, re-run or update with `nesting` specified
##  route       emmean       SE df lower.CL upper.CL
##  injection 36.49093 1.832164  7 32.15855 40.82331
##  oral      29.61657 1.265626  7 26.62384 32.60930
## 
## Results are averaged over the levels of: drug, x1, x2 
## Confidence level used: 0.95 

This is a nested model. The results are correct, but the annotation says that we averaged over x1 and x2; but these are covariates, and each is set to only one value in the grid (its mean).

Anyway, we need to fix this so the annotations to show what we actually averaged over.

Effect sizes for computed contrasts

(Not so much an issue as it is a request)
Are there any plans to add effect sized to contrasts (either partial eta square, or r-square-alerting)?
This would be very useful, as some journals require an effects size to accompany each p-value (and doing this "by hand" is quite tedious).

link functions in glmmTMB

Is emmeans recognizing link functions with glmmTMB? I am running emmeans on a glmmTMB object specified with a poisson model, but summary(emmGrid object) does not report the link. Including type = "response" returns the same output as the default command.

cohab1 <- glmmTMB(Cohabit ~ Sex*OthSex + offset(log(Fixes)) + (1 | ID), family = poisson, data = AllShelts)
emm.TMB<- emmeans(cohab1, ~OthSex | Sex)
summary(emm.TMB)

Sex = F:
 OthSex     emmean        SE df   asymp.LCL asymp.UCL
 F       0.3131363 0.1931931 NA -0.06551531 0.6917879
 M       0.4734789 0.1846699 NA  0.11153255 0.8354253

Sex = M:
 OthSex     emmean        SE df   asymp.LCL asymp.UCL
 F       2.1803186 0.2194255 NA  1.75025259 2.6103846
 M      -0.5052588 0.5989604 NA -1.67919960 0.6686820

Confidence level used: 0.95

summary(emm.TMB, type = "response")

Sex = F:
 OthSex     emmean        SE df   asymp.LCL asymp.UCL
 F       0.3131363 0.1931931 NA -0.06551531 0.6917879
 M       0.4734789 0.1846699 NA  0.11153255 0.8354253

Sex = M:
 OthSex     emmean        SE df   asymp.LCL asymp.UCL
 F       2.1803186 0.2194255 NA  1.75025259 2.6103846
 M      -0.5052588 0.5989604 NA -1.67919960 0.6686820

Confidence level used: 0.95

emtrends: Error in data[[var]] : subscript out of bounds

Hi,

I have an issue with emtrends. I have a model in which emmeans detects incorrectly nesting. I updated the ref_grid with nesting = NULL and that fixed the issue with emmeans. I tried to use the updated ref_grid with emtrends and I get the following error: Error in data[[var]] : subscript out of bounds

As I mentioned the same updated ref_grid works just fine with emmeans, thus I presume that it would be some problem in emtrends.

On a different topic, is there a way to stop emmeans/emtrends nesting recognition?

Thank you.

Paco

Factors coded as "factor" vs "character" - using character seems to ignore any weights= argument

Hello,
The following issue was discovered by one of my undergrads working on a homework assignment. In unbalanced factorial designs (in which the weights= option should typically result in different estimates), the weights= argument in emmeans is ignored if one of the factors is a string (as opposed to a factor). The result is surprising because (obviously) both the lm statement (and also the emmeans statement) run this model without issues. In fact, as expected the lm statement using either a variable that is a factor or a character yield identical results.

Here is a minimally reproducible example:

t1 <- c(0,0,0,0,0,0,1,1,1,1,1,1)
t2 <- c("C","C","C","T","T","C","T","T","T","C","C","C")
y <- c(1,2,3,4,5,6,7,8,9,10,1,2)

df <- data.frame(t1,t2,y)
df$t1 <- as.factor(df$t1)
df$t2_chr <- as.character(df$t2)
str(df)

lm1 <- lm(y~t1*t2,data=df)
lm2 <- lm(y~t1*t2_chr,data=df)

all.equal(as.numeric(lm1$coefficients),as.numeric(lm2$coefficients))

library(emmeans)

e1 <- emmeans(lm1,"t1",weights="proportional")
e2 <- emmeans(lm1,"t1",weights="equal")
all.equal(e1,e2)


b1 <- emmeans(lm2,"t1",weights="proportional")
b2 <- emmeans(lm2,"t1",weights="equal")
all.equal(b1,b2)

Error

I got this error while analysing data with 2 groups, for three groups work fine

NOTE: 'cld()' groupings are determined separately for each list member.
Error in [[<-.data.frame(*tmp*, nm, value = c(-1, 1)) :
replacement has 2 rows, data has 1

Unexpected indexing of `summary_emm` object.

summary_emm objects also have class data.frame. $ and [[ indexing work as expected, and result in a single column. It's not clear to me, however, what [ indexing does, since it usually just returns an error. I also can't find documentation of a [.summary_emm method. Is this intentional?

library(emmeans)
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
warp.emmGrid <- emmeans(warp.lm, ~ tension | wool)

summary(warp.emmGrid)
# wool = A:
#   tension   emmean       SE df lower.CL upper.CL
# L       44.55556 3.646761 48 37.22325 51.88786
# M       24.00000 3.646761 48 16.66769 31.33231
# H       24.55556 3.646761 48 17.22325 31.88786
# 
# wool = B:
#   tension   emmean       SE df lower.CL upper.CL
# L       28.22222 3.646761 48 20.88992 35.55453
# M       28.77778 3.646761 48 21.44547 36.11008
# H       18.77778 3.646761 48 11.44547 26.11008

# these both work
summary(warp.emmGrid)$emmean
summary(warp.emmGrid)[['emmean']]

# errors:
summary(warp.emmGrid)['emmean']

# Error in .subset2(x, i, exact = exact) : 
#   attempt to select less than one element in get1index

Feature request: "reverse" argument in trt.vs.ctrl.emmc

Would it be possible to have a "reverse" argument in trt.vs.ctrl.emmc?

Right now the control groups is subtracted from the treatments, and I think this is likely what folks most often want when comparing everything to a control group. However, I was recently in a situation where the control group was much larger than the treatment and the comparisons (which in this case happened to be odds ratios) felt more interpretable as control minus treatment.

Of course, since you've made it so easy to extract results into data.frames, I could easily invert the results myself and so "manually" switch the order of the comparisons.

df = 0 in certain lme models

User-reported issue:

In a model with 360 abundance measurement performed at six sampling dates, i want to know the pairwise tukey contrasts for the six dates.

mod <- glm(Abundance ~ Date, family="gaussian") 

FWIW, Abundance is normally distributed in its entirety, the subpopulations per date are not

means <- emmeans(mod, ~Date)
pairs (means, simple="Date")

works well, as does

glht(mod, mcp(Date="Tukey")).

However, Abund is spatially auto-correlated. I thus switch to lme with a dummy random effect and update to the correlation structure yielding the best AIC.

best.mod <- lme(fixed=Abundance ~ Date, data = data, 
    random = ~1| dummy, method="ML", 
    correlation=corRat(1, form = ~ x + y))

Now, glht() still works:

> summary(glht(mods[[best.mod]], mcp(Date = "Tukey")))

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lme.formula(fixed = Abundance ~ Date, data = D, random = ~1 | 
    Date, correlation = getFunction(m)(1, form = ~x + y), method = "ML")

Linear Hypotheses:
                  Estimate Std. Error z value Pr(>|z|)    
May - April == 0  -0.30829    0.15815  -1.949   0.3718    
June - April == 0  0.41849    0.15749   2.657   0.0841 .  
Aug. - April == 0 -0.25109    0.15741  -1.595   0.6018    
Oct. - April == 0 -0.28579    0.15747  -1.815   0.4561    
Nov. - April == 0 -0.09987    0.15709  -0.636   0.9883    
June - May == 0    0.72677    0.15759   4.612   <0.001 ***
Aug. - May == 0    0.05719    0.15751   0.363   0.9992    
Oct. - May == 0    0.02250    0.15758   0.143   1.0000    
Nov. - May == 0    0.20842    0.15720   1.326   0.7707    
Aug. - June == 0  -0.66958    0.15685  -4.269   <0.001 ***
Oct. - June == 0  -0.70428    0.15691  -4.488   <0.001 ***
Nov. - June == 0  -0.51835    0.15653  -3.311   0.0119 *  
Oct. - Aug. == 0  -0.03470    0.15683  -0.221   0.9999    
Nov. - Aug. == 0   0.15122    0.15645   0.967   0.9285    
Nov. - Oct. == 0   0.18592    0.15652   1.188   0.8429    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

However, emmeans does not produce p-values with best.mod, possible because of zero degree of freedoms

> means <- emmeans(mods[[best.mod]], ~ Date)
> pairs(means, simple="Date")
 contrast        estimate        SE df t.ratio p.value
 April - May   0.30828775 0.1594914  0   1.933     NaN
 April - June -0.41848582 0.1588249  0  -2.635     NaN
 April - Aug.  0.25109352 0.1587444  0   1.582     NaN
 April - Oct.  0.28579102 0.1588097  0   1.800     NaN
 April - Nov.  0.09986911 0.1584278  0   0.630     NaN
 May - June   -0.72677357 0.1589291  0  -4.573     NaN
 May - Aug.   -0.05719423 0.1588486  0  -0.360     NaN
 May - Oct.   -0.02249673 0.1589139  0  -0.142     NaN
 May - Nov.   -0.20841864 0.1585322  0  -1.315     NaN
 June - Aug.   0.66957934 0.1581794  0   4.233     NaN
 June - Oct.   0.70427684 0.1582449  0   4.451     NaN
 June - Nov.   0.51835493 0.1578616  0   3.284     NaN
 Aug. - Oct.   0.03469750 0.1581641  0   0.219     NaN
 Aug. - Nov.  -0.15122440 0.1577806  0  -0.958     NaN
 Oct. - Nov.  -0.18592191 0.1578463  0  -1.178     NaN

P value adjustment: tukey method for comparing a family of 6 estimates 
Warning messages:
1: In pt(abs(t), DF, lower.tail = FALSE) : NaNs wurden erzeugt
2: In ptukey(sqrt(2) * abst, fam.size, zapsmall(df), lower.tail = FALSE) :
  NaNs wurden erzeugt

"weights" ignored in `emmeans` when `specs` is a list

A user has reported that the "weights" argument is ineffective in emmeans() when the specs argument is a list. Here is an example:

require("emmeans")
nutr.lm <- lm(gain ~ age + group + race, data = nutrition) 

sapply(c("equal", "prop", "outer", "cells", "flat"), function(w)     
    predict(emmeans(nutr.lm, "race", weights = w)))
##          equal     prop    outer     cells      flat
## [1,] 0.6662975 1.724608 1.724608 0.3809524 0.7178124
## [2,] 1.4760579 2.534368 2.534368 1.6666667 2.6219735
## [3,] 1.3655468 2.423857 2.423857 2.7951807 1.8945106

sapply(c("equal", "prop", "outer", "cells", "flat"), function(w)
    predict(emmeans(nutr.lm, list("race"), weights = w)[[1]]))
##          equal      prop     outer     cells      flat
## [1,] 0.6662975 0.6662975 0.6662975 0.6662975 0.6662975
## [2,] 1.4760579 1.4760579 1.4760579 1.4760579 1.4760579
## [3,] 1.3655468 1.3655468 1.3655468 1.3655468 1.3655468

Note in the second case the weights make no difference, whereas they do in the first case.

Possible bug in emmeans and Surv object

I am fitting a parametric survival model with survreg and I would like to get the means by using the emmeans() function. The function used to work fine until emmeans vers. 1.2.3, but it no longer works with vers. 1.2.4. I am getting the following error:

Error in parse(text = x, keep.source = FALSE) :    <text>:1:20: unexpected input 1: ~Surv(L, U, type = _
                       ^

My code is as follows:

BBsurvey <- structure(data.frame(1:36, structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L), .Label = c("A", 
"B", "C", "D", "E", "F", "G", "H", "I"), class = "factor"), c(0.1, 
0.1, 5, 5, 0.1, 0.1, 5, 5, 0, 0.1, 5, 5, 0, 0.1, 5, 5, 0, 0, 
0.1, 0.1, 5, 5, 25, 25, 0.1, 0.1, 5, 5, 0, 0.1, 5, 5, 75, 25, 
50, 25), c(5, 5, 25, 25, 5, 5, 25, 25, 0.1, 5, 25, 25, 0.1, 5, 
25, 25, 0.1, 0.1, 5, 5, 25, 25, 50, 50, 5, 5, 25, 25, 0.1, 5, 
25, 25, 100, 50, 75, 50), c(2.55, 2.55, 15, 15, 2.55, 2.55, 15, 
15, 0.05, 2.5, 15, 15, 0.05, 2.55, 15, 15, 0.05, 0.05, 2.55, 
2.55, 15, 15, 37.5, 37.5, 2.5, 2.55, 15, 15, 0.1, 2.55, 15, 15, 
87.5, 37.5, 62.5, 37.5)), .Names = c("X", "Herbicide", "L", "U", 
"midPoint"))

mod.surv <- survreg(Surv(L, U, type="interval2") ~ Herbicide, 
  dist="gaussian", data = BBsurvey)
means.surv <- emmeans(mod.surv, Herbicide)

Any help with this is very much appreciated. Thank you very much in advance.

Andrea

LS means with offset option

I am using lsmeans and just notice the new package emmeans. Hopefully my issue for lsmeans has been addressed in emmeans.

In lsemans, while we have a model with offset term, the LS means will automatically add the offset term.

It would be nice to have some information in the help page, as this is inconsistent with SAS. (SAS did not add offset term)

It will be nice to have LS means for zero-inflated models that used in pscl::zeroinfl.

The current code can directly be used. my hack is as below

# negtive binomial model for event rate
  fit <- MASS::glm.nb(cnt ~ TRT01P + ... + offset(logT), data = db_cnt)
  .ls <- lsmeans(fit, "TRT")
  .er <- exp(fit_ls[,2] - .ls@grid$.offset.)

# Zero inflated negtive binomial model for event rate
  fit<- pscl::zeroinfl(cnt ~ TRT + ... + offset(logT) | TRT,
                 dist = "negbin", data = db_cnt)  
  .er <- as.numeric(exp(.ls@linfct %*% fit$coefficients$count ))

Message about "averaging over"

First of all, thanks for this great package.

I am currently using a model with one numerical and two categorical variables.
Here's an example of how it might look like:

Create data frame:

y <- rnorm(n = 100, mean = -2, sd = 0.5)
x1 <- rnorm(n = 100, mean = 3, sd = 1.5)
x2 <- sample(x = c("a1", "a2", "a3"), size = 100, replace = TRUE)
x3 <- sample(x = c("b1", "b2", "b3"), size = 100, replace = TRUE)

Run model and show coefficients

res <- lm(y ~ x1 + x2 + x3)
coef(res)
#> (Intercept)          x1        x2a2        x2a3        x3b2        x3b3 
#> -1.87891191 -0.00222856 -0.19620519 -0.16569943  0.00158045 -0.05318129

Then, I used emmeans to quickly calculate the predicted means and SEs for the different factor levels of x2

pairs.x2 <- emmeans::emmeans(res, pairwise ~ x2)
pairs.x2$emmeans
#>  x2    emmean         SE df  lower.CL  upper.CL
#>  a1 -1.902427 0.10199471 94 -2.104940 -1.699914
#>  a2 -2.098633 0.08209662 94 -2.261637 -1.935628
#>  a3 -2.068127 0.09411921 94 -2.255003 -1.881251
#> Results are averaged over the levels of: x3 
#> Confidence level used: 0.95

Given the message "averaged over the levels of: x3", I thought that results were NOT averaged over the numerical variable x1

In that case, I thought that the means should be the following:

# for x2 = a1
0 + coef(res)["(Intercept)"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#>   -1.896112
# For x2 = a2
coef(res)["x2a2"] + coef(res)["(Intercept)"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> -2.092317
# For x2 = a3
coef(res)["x2a3"] + coef(res)["(Intercept)"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> -2.061812

However, they are

0 + coef(res)["(Intercept)"] + mean(x1) * coef(res)["x1"] + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#> (Intercept) 
#>   -1.902427
coef(res)["x2a2"] + coef(res)["(Intercept)"] + mean(x1) * coef(res)["x1"]  + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#>      x2a2 
#> -2.098633
coef(res)["x2a3"] + coef(res)["(Intercept)"] + mean(x1) * coef(res)["x1"]  + ((coef(res)["x3b2"] + coef(res)["x3b3"])/3)
#>      x2a3 
#> -2.068127

I was confused that the output did not indicate that results are averaged over x3 AND x1
and I was wondering if you might want to consider including this information into the output

Thanks a lot,
Urs

cld details incorrect when there is a by variable

Reported by a user. The results obtained with cld(..., details = TRUE) are not correct. For example:

> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
> warp.emm <- emmeans(warp.lm, ~ tension | wool)
> cld(warp.emm, details=TRUE, adjust="none", alpha = .2)
$`emmeans`
wool = A:
 tension   emmean       SE df lower.CL upper.CL .group
 M       24.00000 3.646761 48 16.66769 31.33231  1    
 H       24.55556 3.646761 48 17.22325 31.88786  1    
 L       44.55556 3.646761 48 37.22325 51.88786   2   

wool = B:
 tension   emmean       SE df lower.CL upper.CL .group
 H       18.77778 3.646761 48 11.44547 26.11008  1    
 L       28.22222 3.646761 48 20.88992 35.55453   2   
 M       28.77778 3.646761 48 21.44547 36.11008   2   

Confidence level used: 0.95 
significance level used: alpha = 0.2 

$comparisons
wool = A:
 contrast   estimate       SE df t.ratio p.value
 H - M     0.5555556 5.157299 48   0.108  0.9147
 L - M    20.5555556 5.157299 48   3.986  0.0002
 L - H    20.0000000 5.157299 48   3.878  0.0003

wool = B:
 contrast   estimate       SE df t.ratio p.value
 H - M     9.4444444 5.157299 48   1.831  0.0733
 L - M    10.0000000 5.157299 48   1.939  0.0584
 L - H     0.5555556 5.157299 48   0.108  0.9147


> pairs(warp.emm, adjust = "none")
wool = A:
 contrast   estimate       SE df t.ratio p.value
 L - M    20.5555556 5.157299 48   3.986  0.0002
 L - H    20.0000000 5.157299 48   3.878  0.0003
 M - H    -0.5555556 5.157299 48  -0.108  0.9147

wool = B:
 contrast   estimate       SE df t.ratio p.value
 L - M    -0.5555556 5.157299 48  -0.108  0.9147
 L - H     9.4444444 5.157299 48   1.831  0.0733
 M - H    10.0000000 5.157299 48   1.939  0.0584

Observe that the results from pairs() are inconsistent with the $comparisons portion from cld()

emmeans on an lme model created from a formula variable inside a function fails

Hello,

Huge fan of lsmeans/emmeans! Noticed this issue today and thought I would pass along a reprex.

I have a function which creates lme models using an assembled formula. Emmeans works fine on such models when they are assembled in the global space, but appear to fail inside functions. Does not occur with lm functions, appears to be specific to lme, perhaps because of the 'fixed = ' portion of the formula?

Here is my attempt at a reprex using mtcars:

#outside of a function an lme model with a variable in the call works fine
lme_formula <- as.formula("mpg ~ as.factor(cyl)")
lme_formula_model <- lme(lme_formula, random =~1|disp, data = mtcars)
emmeans(lme_formula_model, ~ cyl)
lme_formula_model$call

#inside of a function the emmeans cannot seem to find the formula variable in the model call
lme_formula_in_fn <- function(formula_string){  
  
  formula_inside_fn <- as.formula(formula_string)
  model_inside_fn <- lme(formula_inside_fn, random =~1|disp, data = mtcars)
  try(emmeans(model_inside_fn, ~ cyl))
  return(model_inside_fn)
  
}

#call the above function and see resulting error
model_outside_fn <- lme_formula_in_fn("mpg ~ as.factor(cyl)")

#model itself seems ok otherwise just has the variable in the call
model_outside_fn$call
summary(model_outside_fn)

The error text for the inside the function emmeans is:
Error in eval(x$call$fixed) : object 'formula_inside_fn' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed

I just found this morning, still toying around with it, will update if I find anything further/workarounds.

Cheers,
Ben

emtrends for individual subjects

Hi,

I have been using emtrends to estimate slopes in a mixed effects model and it works nicely to get the trend for the different groupings. Is there a way to use emtrends to get not the group trend, but the individual subject trends?

Thank you.

Paco

emmeans for gamm4

Do you plan on adding support for gamm4 objects?

Thanks a lot for your awesome package!

Comments on brmsfit methods

First of all thanks for providing initial support for brmsfit objects! I have a few comments / points to discuss:

Regarding:

recover_data.brmsfit = function(object, data, ...) {
    fcall = object$call
    form = brms::parse_bf(formula(object))$dpars$mu$fe
    recover_data.call(fcall, terms(form), "na.omit", data = object$data, ...)
}

brmsfit object have no $call component. For what do you need to use this component internally?

Regarding:

emm_basis.brmsfit = function(object, trms, xlev, grid, vcov., ...) {
    V = vcov(object)
    m = model.frame(trms, grid, na.action = na.pass, xlev = xlev)
    contr = lapply(object$data, function(.) attr(., "contrasts"))
    contr = contr[!sapply(contr, is.null)]
    X = model.matrix(trms, m, contrasts.arg = contr)
    nm = dimnames(X)[[2]]
    nbasis=estimability::all.estble
    dfargs = list()
    dffun = function(k, dfargs) Inf
    misc = .std.link.labels(parse_bf(formula(object))$dpars$mu$family, list())
    # Pick the 1st length(nm) columns -- probably not the best approach...
    post.beta = as.matrix(posterior_samples(object)[, seq_along(nm), drop = FALSE])
    bhat = apply(post.beta, 2, mean)
   
    list(X=X, bhat=bhat, nbasis=nbasis, V=V, dffun=dffun, dfargs=dfargs, 
         misc=misc, post.beta=post.beta)
}

You can try to extract posterior samples via something like

b_pars <- paste0("b_", nm)
post.beta <- as.matrix(object, pars = b_pars, exact = TRUE)

The V component could be restricted to the actually used coefficients via

V <- V[nm, nm, drop = FALSE]

lm and aov model do not always produce identical emmeans

Hi Russ,

I noticed some surprising behavior when comparing the marginal means between two equivalent models estimated with both lm and aov. When the models include all interactions, the emmeans are equal. However, when only including a main effect between two of the variables, the emmeans are not any more identical. The following example code shows it:

df_wide <- structure(list(id = structure(1:16, .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16"), class = "factor"), treatment = structure(c(1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("control", 
"A", "B"), class = "factor", contrasts = "contr.sum"), gender = structure(c(2L, 
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("F", 
"M"), class = "factor", contrasts = "contr.sum"), fup = c(3, 
4, 7, 4, 4, 9, 9, 6, 5, 8, 6, 8, 8, 6, 7, 8), post = c(3, 3, 
5, 3, 6, 9, 8, 5, 4, 7, 5, 9, 6, 5, 7, 7), pre = c(2, 4, 6, 5, 
4, 8, 5, 3, 4, 4, 3, 6, 6, 2, 3, 5)), .Names = c("id", "treatment", 
"gender", "fup", "post", "pre"), row.names = c(NA, -16L), class = "data.frame")

library(emmeans)
options(contrasts=c('contr.sum', 'contr.poly'))

lm_main <- lm(cbind(fup, post, pre) ~ treatment + gender, df_wide)
lm_inter <- lm(cbind(fup, post, pre) ~ treatment * gender, df_wide)

df_long <- reshape2::melt(df_wide, id.vars = c("id", "treatment", "gender"))
aov_main <- aov(value ~ (gender + treatment)*variable + Error(id/variable), df_long)
aov_inter <- aov(value ~ (gender * treatment)*variable + Error(id/variable), df_long)

emm_lm_inter <- emmeans(lm_inter, "treatment")
emm_aov_inter <- emmeans(aov_inter, "treatment")
summary(emm_lm_inter)$emmean 
# [1] 4.222222 6.250000 6.027778
summary(emm_aov_inter)$emmean
# [1] 4.222222 6.250000 6.027778

emm_lm_main <- emmeans(lm_main, "treatment")
emm_aov_main <- emmeans(aov_main, "treatment")
summary(emm_lm_main)$emmean 
# [1] 4.100365 6.250000 6.071168
summary(emm_aov_main)$emmean
# [1] 4.126521 6.276156 6.097324

As you can see, the example requires both independent sample factors (between which the critical distinction between an interaction or main effect is included) and the repeated-measures factors.

Interestingly, if you remove one of the independent samples factors, the emmeans also differ:

lm2 <- lm(cbind(fup, post, pre) ~ treatment, df_wide)
aov2 <- aov(value ~ treatment*variable + Error(id/variable), df_long)
summary(emmeans(lm2, "treatment"))$emmean
# [1] 4.20 6.25 6.00
summary(emmeans(aov2, "treatment"))$emmean
# [1] 4.216667 6.266667 6.016667

It would be great to know if this is indeed some sort of bug or just a consequence of the different models. The fact that for the full models with the interaction are in fact identical, somehow suggests there might be some problem here.

Cheers,
Henrik

Error occurring in "emmeans" package for the two data sets I used. Please help.

I am a Professor of Statistics at Indira Gandhi Krishi Vishwavidyalaya, Raipur, India. While teaching in class about analysis of variance using R, I was doing a one-way analysis for the two data-sets given below in the R-class. I got a typical error in "emmeans" package, please help:

Data-set-1:

Medley and Clements (1998) investigated the impact of zinc contamination (and other heavy metals) on the diversity of diatom species in the USA Rocky Mountains. The diversity of diatoms (number of species) and degree of zinc contamination (categorized as either of high, medium, low or natural background level) were recorded from between four and six sampling stations within each of six streams known to be polluted, as given below:

stream=c("Eagle", "Eagle", "Eagle", "Eagle", "Blue", "Blue",
"Blue", "Blue", "Blue", "Blue", "Blue", "Snake", "Snake",
"Snake", "Snake", "Snake", "Arkan", "Arkan", "Arkan",
"Arkan", "Arkan", "Arkan", "Arkan", "Chalk", "Chalk",
"Chalk", "Chalk", "Chalk", "Splat", "Splat", "Splat",
"Splat", "Splat", "Splat")

zinc=c("BACK", "HIGH", "HIGH", "MED", "BACK", "HIGH", "BACK", "BACK",
"HIGH", "MED", "MED", "BACK", "MED", "HIGH", "HIGH", "HIGH",
"LOW", "LOW", "LOW", "LOW", "MED", "MED", "LOW", "LOW",
"HIGH", "HIGH", "MED", "LOW", "BACK", "BACK", "MED", "LOW",
"MED", "BACK")

diversity=c(2.27, 1.25, 1.15, 1.62, 1.7, 0.63, 2.05, 1.98, 1.04,
2.19, 2.1, 2.2, 2.06, 1.9, 1.88, 0.85, 1.4, 2.18, 1.83,
1.88, 2.02, 1.94, 2.1, 2.38, 1.43, 1.37, 1.75, 2.83,
1.53, 0.76, 0.8, 1.66, 0.98, 1.89)

medley.clementis=data.frame(stream,zinc,diversity)

I did the one-way anova:

medley.clementis.aov=with(medley.clementis, aov(diversity ~ zinc))

anova(medley.clementis)

Then, I tried to do post hoc analysis using "emmeans" package following command:

emmeans::emmeans(medley.clementis.aov, "zinc")

This gives following error:

Error in recover_data.call(fcall, delete.response(terms(object)), object$na.action, :
object 'possibly.random' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed

Data-set-2:

Keough and Raimondi (1995) examined the effects of four biofilm types (SL: sterile unfilmed substrate, NL: netted laboratory biofilms, UL: unnetted laboratory biofilms and F: netted field biofilms) on the recruitment of serpulid larvae. Substrates treated with one of the four biofilm types were left in shallow marine waters for one week after which the number of newly recruited serpulid worms were counted, as given below:

biofilm=c("SL", "SL", "SL", "SL", "SL", "SL", "SL", "UL", "UL", "UL",
"UL", "UL", "UL", "UL", "NL", "NL", "NL", "NL", "NL", "NL",
"NL", "F", "F", "F", "F", "F", "F", "F")

serpulid=c(61, 113, 123, 75, 75, 83, 95, 143, 81, 101, 155, 156, 193,
163, 203, 159, 139, 161, 179, 97, 157, 128.5, 204.5,
108.5, 116.5, 140.5, 160.5, 87.5)

keough.raimondi=data.frame(biofilm,serpulid)

Applied log-transformation:

keough.raimondi.ln=transform(keough.raimondi, serpulid.ln=log(serpulid))

I did the one-way anova, with contrasts defined below:

contrasts(keough.raimondi.ln$biofilm) <- cbind(c(0, 1, 0, -1),
c(2, -1, 0, -1), c(-1, -1, 3, -1))
keough.raimondi.ln$biofilm

keough.contr.list <- list(biofilm = list('NL vs UL' = 1,
'F vs (NL & UL)' = 2, 'SL vs (F & NL & UL)' = 3))
keough.contr.list

One-way anova:

keough.raimondi.ln.aov=with(keough.raimondi.ln, aov(serpulid.ln ~ biofilm))

summary(keough.raimondi.ln.aov,split=keough.contr.list)

Then, I tried to do post hoc analysis using "emmeans" package following command:

emmeans(keough.raimondi.ln.aov, ~ biofilm)

This gives following error:

Error in recover_data.call(fcall, delete.response(terms(object)), object$na.action, :
object 'possibly.random' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed

Help Needed:

On many other data sets and data frame I successfully used "emmeans" package using the help available in R.

But, for the above two data-sets, I consistently got the same error as described above.

I do not know what is amiss. Where I am missing or whatever is wrong, I request the entire R-team to help me to solve above problem. Thanking in advance.

Dr. A.K. Singh
Professor and Head
Department of Agricultural Statistics
Indira Gandhi Krishi Vishwavidyalaya
Raipur-492012, Chhattisgarh, India
Mob: +918770625795
Email: [email protected]
WhatsApp: +919039202968

emmeans with variables not in a data.frame

Dear maintainers of emmeans,

This is working:

dv = rnorm(100)
iv = rbinom(100, 1, 0.5)
d = data.frame(dv, iv)
m = lm(dv ~ iv, data=d)
emmeans(m, ~iv)
iv emmean SE df lower.CL upper.CL
0.56 -0.09024097 0.102466 98 -0.2935815 0.1130996
Confidence level used: 0.95

However, this fails:

dv = rnorm(100)
iv = rbinom(100, 1, 0.5)
m = lm(dv ~ iv)
emmeans(m, ~iv)
Error in recover_data.call(fcall, delete.response(terms(object)), object$na.action, :
object 'possibly.random' not found
Error in ref_grid(object, ...) :
Perhaps a 'data' or 'params' argument is needed

It seems that emmeans wants all the data in a data.frame. I see that the data.frame variant covers
probably 99% of the use cases, though, but maybe it is only a small thing to fix...

Best regards,

Matthias

Problems with glm confidence intervals

Hi,
I'm estimating confidence intervals with emmeans on a glm model and found discrepancies with confint() function.
An example can be done using data from SAS user guide which I saved here.

The dep var (Pain) is binary and the factor is 3-level (Treatment). I code Treatment with contr.treatment so the intercept of the model will be the expected logit for Treatment=A

contrasts(dat$Treatment)<-contr.treatment(3,base=1)
model<-glm(Pain~Treatment,data=dat,family = binomial())
summary(model)

The summary indicates that the intercept is -1.0986, with SE=0.51639. Asking for CI I get
confint(model)
(Intercept) -2.2218727 -0.1504401

If now I run the emmeans() function I get (ci bounds are the last two numbers)
emmeans(model,~Treatment)
A -1.098612 0.5163973 NA -2.11073247 -0.08649211

The estimate is the same, the SE is the same, but the confidence interval is different. Is this ok?
I found this behavior to be general, as it applies to many different examples.
Thanks a lot for the great material
marcello

reg_grid with polynomial transformation

I am fitting a bayesian model as follows:

df <- data.frame(
                 RT = rnorm(100, 30, .2), 
                 Stress = runif(100, 3, 5))

fit <- rstanarm::stan_glm(RT ~ poly(Stress, 2, raw=T), data=df)

I'd like to extract the reference grid (to further predict the outcome):

ref_grid <- emmeans::ref_grid(fit, at = list(
  Stress = seq(min(df$Stress),
            max(df$Stress),
            length.out = 10)))

But it says the following:

Error in model.frame.default(formula = ~Stress + T, data = df, drop.unused.levels = TRUE,  : 
  variables differ in length (found for 'T')
Error in emmeans::ref_grid(fit, at = list(Stress = seq(min(df$Stress),  : 
  Perhaps a 'data' or 'params' argument is needed

What can I do?

Thanks for your help!

stanreg error

Here's the continuation of your mail with the error in the psycho package.

It is rather strange:

For example, this chunk of code appear as broken if emmeans is not loaded (library(emmeans))

fit <- rstanarm::stan_glm(Sepal.Width ~ Sepal.Length * Species, data=iris)

formula <- "Sepal.Length * Species"
formula <- as.formula(paste0("~", formula))
means_posterior <- fit %>%
    emmeans::emmeans(formula) %>%
    coda::as.mcmc() %>%
    as.matrix() %>%
    coda::as.mcmc() %>%
    as.data.frame()
Error in y[, var.cols] <- x : 
  number of items to replace is not a multiple of replacement length

It works if I load the package beforehand. In the same way, the error in the test-get_contrasts of the psycho package also works after explicitly loading emmeans...

Is there an internal outside scope call that is made?

t tests become z tests after `regrid()` applied

Note this curious result:

> warp.lm = lm(log(breaks) ~ wool * tension, data = warpbreaks, subset = 10:48)
> rg = ref_grid(warp.lm)

> emmeans(warp.lm, "tension")
NOTE: Results may be misleading due to involvement in interactions
 tension   emmean         SE df lower.CL upper.CL
 L         nonEst         NA NA       NA       NA
 M       3.213038 0.08825321 34 3.033686 3.392390
 H       3.095196 0.12480889 34 2.841554 3.348838

Results are averaged over the levels of: wool 
Results are given on the log (not the response) scale. 
Confidence level used: 0.95 

> regrid(emmeans(warp.lm, "tension"))
NOTE: Results may be misleading due to involvement in interactions
 tension response       SE df asymp.LCL asymp.UCL
 L         nonEst       NA NA        NA        NA
 M       24.85448 2.193488 NA  20.55533  29.15364
 H       22.09157 2.757224 NA  16.68751  27.49563

Results are averaged over the levels of: wool 
Confidence level used: 0.95 

Note that the first result shows statistics with 34 d.f., but after regrid() is called, all the d.f. are NA and the statistics are asymptotic. Why?

new brms import adds many dependencies

hi russell,

i notice the most recent emmeans adds an import to brms. this adds a lot of additional dependencies. emmeans went from having 34 dependencies, to having 87!

is it possible for brms to be changed to a suggests?

i'm happy to explore this option if you like.

with thanks

jonathon

using cld with a list of emmGrids created with pairwise

I don't know if this is intentional or not, but a student was totally stumped by this, so I'll pass it on.

The first cld works. The second does not. The object created with pairwise is mmGrid, not emmGrid.

_
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

warp.emmGrid <- emmeans(warp.lm, ~ tension | wool)
cld(warp.emmGrid)

warp2.emmGrid <- emmeans(warp.lm, pairwise ~ tension | wool)
cld(warp2.emmGrid)
_

as.glht() fails with lme4 model with df = Inf

Hi Russ,

It appears as if as.glht() cannot deal with df that are Inf. See code below:

require(lme4)
require(emmeans)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

as.glht(emmeans(fm1, "Days"))
# Loading required namespace: pbkrtest
# Note: df set to 17
# 
# 	 General Linear Hypotheses
# 
# Linear Hypotheses:
#          Estimate
# 4.5 == 0    298.5

emm_options(lmer.df = "asymptotic")
as.glht(emmeans(fm1, "Days"))
# Error in if (any(args$df != df)) message("Note: df set to ", args$df) : 
#   missing value where TRUE/FALSE needed
# In addition: Warning message:
# In glht.emmGrid(, object, ...) :
#   NAs introduced by coercion to integer rang

error with 2 within-subjects factors and no between-subjects

Thank you for an excellent package.

When I fit a multivariate intercepts-only model for a factorial within-subjects design, I get an error using the mult.levs argument. Adapted from the ?ref_grid help page:

MOats.lm = lm(yield ~ 1, data = MOats)
ref_grid(MOats.lm, mult.name = "nitro") # no problem with a single WS factor
ref_grid(MOats.lm, mult.levs = list(T=LETTERS[1:2], U=letters[1:2])) # error

The same error occurs in the latest lsmeans::ref.grid. It does not occur when there is at least 1 between-subjects factor.

I installed archived versions, and it worked in version 2.26-3, but not 2.27-2

Thanks for your attention,
Terrence

Unexpected behavior for quadratic terms

Hello,
This may be an issue, or maybe it's a misunderstanding on my part, but I was surprised by the behavior of emmeans when a polynominal term of the form "I(x^2)" is in an lm statement.

Using the I(x^2) notation, or creating a new squared term by hand (xsq <- x^2), and then using one or the other in an lm statement, and then feeding that into emmeans, yields different results. As far as I can tell inputting the already squared term by hand, yields the correct result, whereas the I(x^2) notation does not.

Here is a reproducible example.
I generate a bunch of data, then I generate the squared term by hand, or insert it directly in the lm statement. I include all interactions in the lm model.

The models are called lm.a and lm.b. Feeding them into emmeans, hoping to get the average treatment effect yields two different results. As far as I can tell the results of lm.b correspond with what would be called the average causal effect in the causal inference lit.

set.seed(1111)
preacademic <- rnorm(100,100,20)
p <- (exp(-2.3+.01*preacademic)) / (1/(1+exp(-2.3+.01*preacademic)))
treat <- rbinom(100,1,p)
postacademic<- 100 + 5*treat + 2*preacademic + rnorm(100,0,20)
treat <- factor(treat)
levels(treat) <- c("control","treatment")
df1 <- data.frame(preacademic,treat,postacademic)
df1$preacademic2 <- df1$preacademic^2
lm.a <-lm(postacademic~treat+preacademic+I(preacademic^2)+treat:preacademic+treat:I(preacademic^2))
lm.b <- lm(postacademic~treat+preacademic+preacademic2+treat:preacademic+treat:preacademic2)
summary(emmeans(lm.a,"treat",contr="pairwise",weights="proportinal"),infer=TRUE)
summary(emmeans(lm.b,"treat",contr="pairwise",weights="proportinal"),infer=TRUE)

The last two commands produce the following output:

`$contrasts
contrast estimate SE df lower.CL upper.CL t.ratio p.value
control - treatment -11.54449 5.067913 94 -21.60695 -1.48203 -2.278 0.0250

$contrasts
contrast estimate SE df lower.CL upper.CL t.ratio p.value
control - treatment -10.63247 4.133922 94 -18.84047 -2.424475 -2.572 0.0117
`
To compare my results, I generate predicted values for all individuals setting either all to treatment or all to control, and then taking a mean difference of those. The results are always the same, and always identical to the emmeans model with lm.b.

`#treatment effect by hand using predictions
mean(predict(lm.a,newdata = data.frame(treat=factor(rep("control",100))))) - mean(predict(lm.a,newdata = data.frame(treat=factor(rep("treatment",100)))))

#treatment effect by hand using predictions
mean(predict(lm.b,newdata = data.frame(treat=factor(rep("control",100))))) - mean(predict(lm.b,newdata = data.frame(treat=factor(rep("treatment",100)))))
`

Thanks!

as.stanfit

Hello,
I noticed that emmeans has retired the as.stanfit command, and is now exclusively using as.mcmc when using emmeans for Bayesian models, like the ones generated from the rstanarm package.
Just as a personal opinion, I liked the as.stanfit() command, because it allowed me to generate interest contrasts, etc. using emmeans, but getting output that looked very similar to the stan output, and not like the mcmc output. In particular, this meant that the credible intervals appear next to the estimates, and the Gelman-Rubin convergence criteria are printed alongside the estimates.
It's not a huge deal, and I still appreciate that emmeans is so flexible to utilize all these different fitted methods, but it was nice having as.stanfit().

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.