easystats / modelbased

:chart_with_upwards_trend: Estimate effects, contrasts and means based on statistical models

Home Page: https://easystats.github.io/modelbased/

License: GNU General Public License v3.0


modelbased's Introduction

easystats: An R Framework for Easy Statistical Modeling, Visualization, and Reporting


What is easystats?

easystats is a collection of R packages that aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.

However, there is not (yet) a unique “easystats” way of doing data analysis. Instead, start with one package and, when you face a new challenge, check whether another easystats package has an answer for it. You will slowly discover how using them together makes your life easier. And, who knows, you might even end up using them all.

Installation


Type Source Command
Release CRAN install.packages("easystats")
Development r-universe install.packages("easystats", repos = "https://easystats.r-universe.dev")
Development GitHub remotes::install_github("easystats/easystats")

Finally, easystats sometimes relies on additional packages for specific functions; these are not installed by default. If you want to benefit from the full easystats experience without any hiccups, simply run the following:

easystats::install_suggested()

Citation

To cite the package, run the following command:

citation("easystats")
To cite easystats in publications use:

  Lüdecke, Patil, Ben-Shachar, Wiernik, Bacher, Thériault, & Makowski
  (2022). easystats: Framework for Easy Statistical Modeling,
  Visualization, and Reporting. CRAN.
  doi:10.32614/CRAN.package.easystats
  <https://doi.org/10.32614/CRAN.package.easystats>

A BibTeX entry for LaTeX users is

  @Article{,
    title = {easystats: Framework for Easy Statistical Modeling, Visualization, and Reporting},
    author = {Daniel Lüdecke and Mattan S. Ben-Shachar and Indrajeet Patil and Brenton M. Wiernik and Etienne Bacher and Rémi Thériault and Dominique Makowski},
    journal = {CRAN},
    doi = {https://doi.org/10.32614/CRAN.package.easystats},
    year = {2022},
    note = {R package},
    url = {https://easystats.github.io/easystats/},
  }

If you want to cite only certain packages in the ecosystem, have a look at this article on how to do so: https://easystats.github.io/easystats/articles/citation.html

Getting started

Each easystats package has a different scope and purpose, so the best way to start is to explore and pick the one(s) you feel might be useful to you. However, because they are built with a “bigger picture” in mind, you will find that using more of them creates a smooth workflow: these packages are meant to work together and, ideally, work in unison to cover all aspects of statistical analysis and data visualization.

  • report: 📜 🎉 Automated statistical reporting of objects in R
  • correlation: 🔗 Your all-in-one package to run correlations
  • modelbased: 📈 Estimate effects, group averages and contrasts between groups based on statistical models
  • bayestestR: 👻 Great for beginners or experts of Bayesian statistics
  • effectsize: 🐉 Compute, convert, interpret and work with indices of effect size and standardized parameters
  • see: 🎨 The plotting companion to create beautiful results visualizations
  • parameters: 📊 Obtain a table containing all information about the parameters of your models
  • performance: 💪 Models’ quality and performance metrics (R2, ICC, LOO, AIC, BF, …)
  • insight: 🔮 For developers, a package to help you work with different models and packages
  • datawizard: 🧙 Magic potions to clean and transform your data

Frequently Asked Questions

How is easystats different from the tidyverse?

You’ve probably already heard about the tidyverse, another very popular collection of packages (ggplot, dplyr, tidyr, …) that also makes using R easier. So, should you pick the tidyverse or easystats? Pick both!

Indeed, these two ecosystems have been designed with very different goals in mind. The tidyverse packages are primarily made to create a new R experience, where data manipulation and exploration is intuitive and consistent. On the other hand, easystats focuses more on the final stretch of the analysis: understanding and interpreting your results and reporting them in a manuscript or a report, while following best practices. You can definitely use the easystats functions in a tidyverse workflow!

easystats + tidyverse = ❤️
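As a small illustration of that combination (a sketch; the data step uses base R here, but dplyr verbs slot in the same way, and the `parameters` call assumes the package is installed):

```r
# prepare data, fit a model, then hand it to easystats for reporting
d <- subset(iris, Species != "setosa")  # data-wrangling step
model <- lm(Sepal.Width ~ Species * Petal.Width, data = d)

# with easystats installed, a tidy summary is one call away
if (requireNamespace("parameters", quietly = TRUE)) {
  print(parameters::model_parameters(model))
}
```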

Can easystats be useful to advanced users and/or developers?

Yes, definitely! easystats is built from modules that are general enough to be used inside other packages. For instance, the insight package makes it easy to implement support for post-processing of pretty much all regression model packages under the sun. We use it in all the easystats packages, but it is also used in other, non-easystats packages, such as ggstatsplot, modelsummary, ggeffects, and more.

So why not in yours?

Moreover, the easystats packages are very lightweight, with a minimal set of dependencies, which again makes it great if you want to rely on them.
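Since insight came up above, here is a quick taste of its developer-facing accessors (a sketch, guarded so it runs even without insight installed; `find_response()` and `get_data()` are part of insight's documented API):

```r
model <- lm(mpg ~ wt + cyl, data = mtcars)

if (requireNamespace("insight", quietly = TRUE)) {
  insight::find_response(model)  # name of the response variable
  insight::get_data(model)       # data used to fit the model
}
```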

Documentation

Websites

Each easystats package has a dedicated website.

For example, the website for parameters is https://easystats.github.io/parameters/.

Blog

In addition to the websites containing documentation for these packages, you can also read posts from the easystats blog: https://easystats.github.io/blog/posts/.

Other learning resources

In addition to these websites and blog posts, you can also check out the following presentations and talks to learn more about this ecosystem:

https://easystats.github.io/easystats/articles/resources.html

Dependencies

easystats packages are designed to be lightweight, i.e., they don’t have any third-party hard dependencies other than base-R packages or other easystats packages! If you develop R packages, this means you can safely use easystats packages as dependencies in your own packages, without the risk of entering dependency hell.

library(deepdep)

plot_dependencies("easystats", depth = 2, show_stamp = FALSE)

As we can see, the only exception is the {see} package, which is responsible for plotting and creating figures and therefore relies on {ggplot2}, which does have a substantial number of dependencies.

Usage

Total downloads

Package     Downloads
Total       22,663,701
insight     6,645,177
datawizard  4,004,503
parameters  2,765,087
performance 2,682,787
bayestestR  2,629,672
effectsize  2,087,821
correlation 690,405
see         572,721
modelbased  338,671
report      186,895
easystats   59,962


Contributing

We are happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. Pull Requests for contributions are encouraged.

Here are some simple ways in which you can contribute (in increasing order of commitment):

  • Read and correct any inconsistencies in the documentation
  • Raise issues about bugs or wanted features
  • Review code
  • Add new functionality

Code of Conduct

Please note that the ‘easystats’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

modelbased's People

Contributors

bwiernik, dominiquemakowski, etiennebacher, github-actions[bot], indrajeetpatil, jpasquier, mattansb, moritzkoerber, rempsyc, strengejacke, tam-pham, vincentarelbundock



modelbased's Issues

issue with estimate_contrasts

Hello,

firstly I have to say that I like the easystats packages very much, thank you.

Having said that, I would like to report an issue with estimate_contrasts: it no longer accepts a formula as levels. An example:

df <- iris
df$factor1 <- ifelse(df$Sepal.Width > 3, "A", "B")
df$factor2 <- ifelse(df$Petal.Length > 3.5, "C", "D")
model <- lm(Petal.Width ~ factor1 * factor2, data=df)
estimate_contrasts(model, levels = ~ "factor1 * factor2")
#> Error in `[.default`(insight::get_data(model), levels): invalid subscript type 'list'

This code used to work, but no longer does. The documentation of estimate_contrasts says:

levels - A character vector or formula specifying the names of the predictors over which to estimate means or contrasts.

How do I specify the formula now?

Thank you very much.

Best regards
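One possible workaround while the formula interface is broken (an untested sketch; it assumes the character-vector interface for levels still works as documented):

```r
df <- iris
df$factor1 <- ifelse(df$Sepal.Width > 3, "A", "B")
df$factor2 <- ifelse(df$Petal.Length > 3.5, "C", "D")
model <- lm(Petal.Width ~ factor1 * factor2, data = df)

# pass the predictor names as a character vector instead of a formula
if (requireNamespace("modelbased", quietly = TRUE)) {
  modelbased::estimate_contrasts(model, levels = c("factor1", "factor2"))
}
```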

download models only for local running tests

Since downloading models is increasingly causing trouble with CRAN checks, we should either fit the models directly in the tests, or run tests that require downloading models from GitHub only locally. We should prepare this in time (performance, insight, and parameters are already fixed), to be ready for a possible update.

'Bias adjusted' centrality and uncertainty measures for models with response transformations and/or random effects

Hello,

I came across bias adjustment for frequentist models involving response transformations and/or random effects while reading the detailed vignette in the emmeans package, and wondered if the same applied to Bayesian models as well. Indeed, emmeans has a small section on Bayesian models.

It would be great if these could be ported over to the relevant functions (estimate_means and estimate_contrasts) here. Going a step further, it would be even better if the sigma argument could be calculated automatically for common model objects from popular Bayesian R packages (such as rstanarm, brms, and BayesFactor).

Thank you
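For frequentist models, emmeans already exposes this adjustment through its documented bias.adjust and sigma arguments; a sketch for a log-transformed response (assuming emmeans is installed):

```r
model <- lm(log(Sepal.Length) ~ Species, data = iris)

if (requireNamespace("emmeans", quietly = TRUE)) {
  # back-transform to the response scale with bias adjustment
  emmeans::emmeans(model, ~ Species, type = "response",
                   bias.adjust = TRUE, sigma = sigma(model))
}
```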

Save more info as attributes (for visualisation)

We could improve or prepare functions for plotting. E.g., visualisation_matrix() could save the terms in question as an attribute, so that when we plot from estimate_link(), we know which variable is mapped to the x axis, which to the color scale, etc.

how to fix emmeans at one specific factor level

@mattansb I need your help again ☺️

data <- iris
data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B")
model <- lm(Petal.Length ~ Species * Petal.Length_factor, data = data)

Let's say I want to get the refgrid/means at the "A" level of Petal.Length_factor.

I'd have expected the following to work, yet:

at <- list("Species"=c("setosa", "versicolor", "virginica"), "Petal.Length_factor"=c("A"))
emmeans::ref_grid(model, at = at)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Any ideas?
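One thing worth trying (a sketch, assuming the error stems from the character column): declare Petal.Length_factor as a factor before fitting, then restrict the grid with at:

```r
data <- iris
data$Petal.Length_factor <- factor(ifelse(data$Petal.Length < 4.2, "A", "B"))
model <- lm(Petal.Length ~ Species * Petal.Length_factor, data = data)

if (requireNamespace("emmeans", quietly = TRUE)) {
  # means at the "A" level only
  emmeans::emmeans(model, ~ Species,
                   at = list(Petal.Length_factor = "A"))
}
```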

estimate_contrasts cannot transform Poisson glm

Hi- I'd like to use estimate_contrasts to get contrasts on the same scale as the response for a glm fit with Poisson family distribution. I thought the code below should do it but instead it fails:

library(modelbased)

counts <- c(18,17,15,20,10,20,25,13,12)
treatment <- gl(3,3)
fit <- glm(counts ~ treatment, family = poisson())

estimate_contrasts(fit, transform= 'response')
Error in names(levelcols) <- c("Level1", "Level2") : 
  'names' attribute [2] must be the same length as the vector [1]

Is this a bug or am I missing something?
Thanks!

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS/LAPACK: /home/dario/miniconda3/envs/tritume/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] modelbased_0.3.0

loaded via a namespace (and not attached):
 [1] compiler_3.6.3   emmeans_1.5.2-1  plyr_1.8.6       estimability_1.3 tools_3.6.3      parameters_0.9.0 Rcpp_1.0.5       mvtnorm_1.1-1   
 [9] bayestestR_0.7.5 insight_0.11.0   xtable_1.8-4    

Support for 'brms' model objects

Loving the different 'easystats' packages, thank you all!

I quite like the intuitive functions of modelbased and it would be nice to have it extended beyond rstanarm objects to brms model objects as well. This could very well be in the pipeline already...

`estimate_means` for BFBayesFactor model object or data frame from `weighted_posteriors`

Hello,

I'm trying to obtain centrality (means or medians) as well as uncertainty (credible intervals) measures for different levels of a categorical factor. I realised that emmeans doesn't accept BFBayesFactor model types and came across the estimate_means function, which would give me the marginal means I want. A couple of questions:

  1. Is there a plan to include BFBayesFactor model types for use with estimate_means?

  2. If not, I realised that I can obtain the weighted posterior estimates from a BFBayesFactor model object using the weighted_posteriors function in bayestestR. Is it possible to use the data frame that is output from weighted_posteriors to use with estimate_means?

Thank you

CRAN update

We see:
** running tests for arch 'i386' ... [373s] OK
** running tests for arch 'x64' ... [255s] OK

Can you please reduce (at least halve) the check time in the tests? For
example by running less important tests only conditionally if some
environment variable is set that you only define on your machine?
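One way to honour this (a sketch; the environment-variable name is illustrative) is to gate slow tests behind a variable that is only set on the maintainer's machine:

```r
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("slow test that downloads models", {
    # skipped everywhere except machines where the variable is defined
    testthat::skip_if(Sys.getenv("RUN_SLOW_TESTS") == "",
                      "set RUN_SLOW_TESTS locally to run this test")
    testthat::expect_true(TRUE)  # placeholder for the real expectations
  })
}
```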

error estimate_contrasts using Bayes factors on a stan_polr model

From email:

Sorry to bug you again, but I am trying to use estimate_contrasts using Bayes factors on a stan_polr model and keep getting an odd error message (this time I have installed the latest version of R!). I’ve included the session info in case it might be useful. Hope you can help!

Model:

options(mc.cores = parallel::detectCores())

WE1GCm <- stan_polr(choice1 ~ condition, data = WE1DataGC_long,

                    prior = R2(0.25), prior_counts = dirichlet(1),

                    seed = 12345)

 

> Pairwise_Bayes <- estimate_contrasts(WE1GCm, levels = "condition", test = "bf")

Error in emm_basis.stanreg(object, trms, xlev, grid, misc = attr(data,  :

  object 'bhat' not found

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ordinal_2019.12-10 sjPlot_2.8.5 boot_1.3-25 afex_0.28-0 lme4_1.1-23 Matrix_1.2-18 rstanarm_2.21.1 Rcpp_1.0.5
[9] loo_2.3.1 tidybayes_2.1.1 cowplot_1.1.0 lsmeans_2.30-0 doBy_4.6.7 modelbased_0.3.0 logspline_2.1.16 bayestestR_0.7.2
[17] gganimate_1.0.6 RColorBrewer_1.1-2 rstan_2.21.2 StanHeaders_2.21.0-6 ggridges_0.5.2 ggstance_0.3.4 modelr_0.1.8 purrr_0.3.4
[25] tidyr_1.1.2 forcats_0.5.0 dplyr_1.0.2 magrittr_1.5 plyr_1.8.6 car_3.0-10 carData_3.0-4 MASS_7.3-51.6
[33] Hmisc_4.4-1 ggplot2_3.3.2 Formula_1.2-3 survival_3.1-12 lattice_0.20-41 rstantools_2.1.1 sjmisc_2.8.5 gtools_3.8.2
[41] emmeans_1.5.1

loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.1.10 igraph_1.2.5 svUnit_1.0.3 splines_4.0.2 crosstalk_1.1.0.1 inline_0.3.16 digest_0.6.25
[9] htmltools_0.5.0 rsconnect_0.8.16 lmerTest_3.1-2 fansi_0.4.1 checkmate_2.0.0 cluster_2.1.0 openxlsx_4.2.2 RcppParallel_5.0.2
[17] matrixStats_0.57.0 xts_0.12.1 prettyunits_1.1.1 jpeg_0.1-8.1 colorspace_1.4-1 ggdist_2.2.0 haven_2.3.1 xfun_0.18
[25] callr_3.4.4 crayon_1.3.4 jsonlite_1.7.1 zoo_1.8-8 glue_1.4.2 gtable_0.3.0 sjstats_0.18.0 V8_3.2.0
[33] distributional_0.2.0 pkgbuild_1.1.0 abind_1.4-5 scales_1.1.1 mvtnorm_1.1-1 ggeffects_0.16.0 miniUI_0.1.1.1 performance_0.5.0
[41] xtable_1.8-4 progress_1.2.2 htmlTable_2.1.0 foreign_0.8-80 stats4_4.0.2 DT_0.15 htmlwidgets_1.5.1 threejs_0.3.3
[49] arrayhelpers_1.1-0 ellipsis_0.3.1 farver_2.0.3 pkgconfig_2.0.3 nnet_7.3-14 effectsize_0.3.3 tidyselect_1.1.0 rlang_0.4.7
[57] reshape2_1.4.4 later_1.1.0.1 munsell_0.5.0 cellranger_1.1.0 tools_4.0.2 cli_2.0.2 generics_0.0.2 sjlabelled_1.1.7
[65] broom_0.7.1 stringr_1.4.0 fastmap_1.0.1 processx_3.4.4 knitr_1.30 zip_2.1.1 nlme_3.1-148 mime_0.9
[73] compiler_4.0.2 bayesplot_1.7.2 shinythemes_1.1.2 rstudioapi_0.11 curl_4.3 png_0.1-7 tweenr_1.0.1 tibble_3.0.3
[81] statmod_1.4.34 stringi_1.5.3 parameters_0.8.6 ps_1.3.4 nloptr_1.2.2.2 markdown_1.1 shinyjs_2.0.0 vctrs_0.3.4
[89] pillar_1.4.6 lifecycle_0.2.0 ucminf_1.1-4 estimability_1.3 data.table_1.13.0 insight_0.9.6 httpuv_1.5.4 R6_2.4.1
[97] latticeExtra_0.6-29 promises_1.1.1 gridExtra_2.3 rio_0.5.16 codetools_0.2-16 colourpicker_1.1.0 assertthat_0.2.1 withr_2.3.0
[105] Deriv_4.1.0 shinystan_2.5.0 parallel_4.0.2 hms_0.5.3 grid_4.0.2 rpart_4.1-15 coda_0.19-4 minqa_1.2.4
[113] numDeriv_2016.8-1.1 shiny_1.5.0 base64enc_0.1-3 dygraphs_1.1.1.6

Row order not aligned with order of factor levels in estimate_means()

modelbased::estimate_means() seems to order rows by sorting factor levels alphabetically, not in the order in which the levels are defined in the factor:

library(modelbased)
library(glmmTMB)
library(emmeans)

model <- glmmTMB(
  count ~ mined + (1 | site),
  ziformula = ~mined,
  family = poisson,
  data = Salamanders
)

estimate_means(model)
#> mined | rate |   SE |       95% CI
#> ----------------------------------
#> no    | 3.42 | 0.31 | [2.86, 4.09]
#> yes   | 1.09 | 0.25 | [0.69, 1.72]

as.data.frame(emmeans(model, ~mined, type = "response"))
#>   mined     rate        SE  df  lower.CL upper.CL
#> 1   yes 1.091875 0.2542755 639 0.6911451 1.724951
#> 2    no 3.420613 0.3109316 639 2.8614317 4.089069

levels(Salamanders$mined)
#> [1] "yes" "no"

Created on 2020-06-22 by the reprex package (v0.3.0)
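Until that is fixed, the rows can be reordered manually (a base-R sketch using the point estimates printed above):

```r
means <- data.frame(mined = c("no", "yes"), rate = c(3.42, 1.09))
lev <- c("yes", "no")  # levels(Salamanders$mined)

# reorder the output rows to follow the factor's own level order
means <- means[order(match(means$mined, lev)), ]
```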

Contrasts between factor levels at different values of another numeric predictor

I would like to implement a simple way of testing how contrasts evolve along a numeric predictor, i.e., testing pairwise contrasts between levels of X at different values of Z, in a model like Y ~ factor(X) * Z.

I asked this to Russ here, but the vignette section he pointed to did not suffice for me to successfully implement it.

I tried things like:

model <- lm(Sepal.Width ~ Species * Petal.Width, data=iris)

model %>%
      emmeans::ref_grid(at = list(Petal.Width = c(0, 1, 2, 3))) %>%
      emmeans::contrast(method = "pairwise") 

But this gives me all the pairwise contrasts, instead of the contrasts within each value of Petal.Width...

@strengejacke Do you have by any chance any experience with such planned contrasts?
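A sketch of what I think should give the within-value contrasts (assuming emmeans is installed): condition on Petal.Width with `|` (or, equivalently, pass `by = "Petal.Width"` to contrast()):

```r
model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris)

if (requireNamespace("emmeans", quietly = TRUE)) {
  emm <- emmeans::emmeans(model, ~ Species | Petal.Width,
                          at = list(Petal.Width = c(0, 1, 2, 3)))
  emmeans::contrast(emm, method = "pairwise")  # pairwise within each value
}
```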

estimate_contrasts dealing with three factors

Hello,

I have an issue with estimate_contrasts() when dealing with three factors (when dealing with two factors, it works fine). Try out the following code (thanks to @DominiqueMakowski ):

library(modelbased)

df <- iris
df$factor1 <- ifelse(df$Sepal.Width > 3, "A", "B")
df$factor2 <- ifelse(df$Petal.Length > 3.5, "C", "D")
df$factor3 <- ifelse(df$Sepal.Length > 5, "E", "F")

model <- lm(Petal.Width ~ factor1 * factor2 * factor3, data=df)
estimate_contrasts(model, levels = "factor1")

{I choose as an example "factor1", but it does not matter if "factor1" or "factor2" or "factor3"}
"NOTE: Results may be misleading due to involvement in interactions"
{The NOTE is correct and helpful}
{The output looks like follows, so basically there is no output}

#> Level1 | Level2 | Difference | SE | 95% CI | z | df | p | Difference (std.)
#> ---------------------------------------------------------------------------
#> A      |      B |            |    |        |   |    |   |                  
                                       
1: In max(unlist(lapply(stats::na.omit(CI_low), function(.i) nchar(as.character(.i))))) :
  no non-missing arguments to max; returning -Inf
2: In max(unlist(lapply(stats::na.omit(CI_high), function(.i) nchar(as.character(.i))))) :
  no non-missing arguments to max; returning -Inf

Am I doing something wrong? Is this the desired behavior? How do I get the contrasts of only one factor?

Help would be very much appreciated.

Thank you very much and stay healthy!

Support for glmmTMB count component conditioned on the zi-component

One thing that is rather difficult is the "overall" estimated means from a model, i.e. the count component conditioned on the zi-component:

library(glmmTMB)
library(modelbased)
library(ggeffects)

model <- glmmTMB(
  count ~ mined + (1 | site),
  ziformula = ~mined,
  family = poisson,
  data = Salamanders
)

# count model
ggpredict(model, "mined")
#> 
#> # Predicted counts of count
#> # x = mined
#> 
#> x   | Predicted |   SE |       95% CI
#> -------------------------------------
#> yes |      1.09 | 0.23 | [0.69, 1.72]
#> no  |      3.42 | 0.09 | [2.86, 4.09]
#> 
#> Adjusted for:
#> * site = NA (population-level)
#> Standard errors are on link-scale (untransformed).
estimate_means(model)
#> mined | rate |   SE |       95% CI
#> ----------------------------------
#> no    | 3.42 | 0.31 | [2.86, 4.09]
#> yes   | 1.09 | 0.25 | [0.69, 1.72]

# zero-inflated model
ggpredict(model, "mined", type = "zi_prob")
#> 
#> # Predicted zero-inflation probabilities of count
#> # x = mined
#> 
#> x   | Predicted |   SE |       95% CI
#> -------------------------------------
#> yes |      0.76 | 0.24 | [0.66, 0.83]
#> no  |      0.36 | 0.12 | [0.30, 0.41]
#> 
#> Adjusted for:
#> * site = NA (population-level)
#> Standard errors are on link-scale (untransformed).
estimate_means(model, component = "zi")
#> mined | Mean |   SE |       95% CI
#> ----------------------------------
#> no    | 0.36 | 0.03 | [0.30, 0.41]
#> yes   | 0.76 | 0.04 | [0.66, 0.83]

# model conditioning both on count- and zero-inflated model
ggpredict(model, "mined", type = "zero_inflated")
#> 
#> # Predicted counts of count
#> # x = mined
#> 
#> x   | Predicted |   SE |       95% CI
#> -------------------------------------
#> yes |      0.26 | 0.08 | [0.11, 0.42]
#> no  |      2.21 | 0.22 | [1.75, 2.66]
#> 
#> Adjusted for:
#> * site = NA (population-level)
#> Standard errors are on link-scale (untransformed).

Created on 2020-06-22 by the reprex package (v0.3.0)

If model is of class glmmTMB, hurdle, zeroinfl, or zerotrunc, simulations from a multivariate normal distribution (see ?MASS::mvrnorm) are drawn to calculate mu*(1-p). Confidence intervals are then based on quantiles of these results. For type = "re.zi", prediction intervals also take the uncertainty in the random-effect parameters into account (see also Brooks et al. 2017, pp. 391-392 for details).

Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378-400.

Originally posted by @strengejacke in #66 (comment)
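For intuition, the point-estimate version of that conditioning is just arithmetic (a back-of-the-envelope sketch using the rounded values printed above; the mvrnorm simulation is what adds the uncertainty):

```r
mu <- c(yes = 1.09, no = 3.42)  # count-component predictions
p  <- c(yes = 0.76, no = 0.36)  # zero-inflation probabilities

# roughly 0.26 and 2.19 -- close to ggpredict's "zero_inflated" output
mu * (1 - p)
```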

Test failure with dev testthat

testthat::expect_warning(nrow(visualisation_matrix(iris, target = "Sepal.Length", length = 4, standardize = TRUE)), 5)

now fails with error "Error: pattern must be a single string, NULL, or NA" — I think this is just a typo in your test.

Error in predict.glmmTMB(model, newdata = newdata, re.form = re.form, : re.form not yet implemented

This popped up; I don't remember having it when we submitted to CRAN a few days ago 🤔

data <- glmmTMB::Salamanders

model2 <- glmmTMB::glmmTMB(
  count ~ cover + mined + (1 | site),
  ziformula = ~ cover + mined,
  family = glmmTMB::truncated_poisson,
  data = data
)


estim <- modelbased::estimate_response(model2)
#> Error in predict.glmmTMB(model, newdata = newdata, re.form = re.form, : re.form not yet implemented

Created on 2020-10-05 by the reprex package (v0.2.1)

feedback

@strengejacke @pdwaggoner @mattansb @humanfactors @IndrajeetPatil

I know it's summer and all, but if you wanna really relax at the beach, enjoying a nice Pina colada and a good read, I can only suggest the following vignettes, fresh out of the oven, showcasing some of the features of estimate 😁:

In fact, for me this package is pretty much ready for a first release (right after parameters is accepted). Although it mainly supports rstanarm models for now, once we have the API right it will be easy to extend (in fact, estimate_means and estimate_contrasts already support base/lme4 models 🤐).

Anyway, please do not hesitate to give your feedback on any aspects of the package!

standardize = TRUE does not return Cohen's d

Instead it returns something closer to a beta, which is standardized by sd(y), whereas Cohen's d is standardized by sd(errors) - which becomes more complicated with mixed models, or with interactions.

I suggest correcting the docs, or, if you're brave, playing with emmeans::eff_size().
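The distinction in base R terms (a sketch; sigma(model) is the residual SD that a d-like statistic should divide by):

```r
model <- lm(Sepal.Length ~ Species, data = iris)
diff_means <- coef(model)[["Speciesversicolor"]]

d_like    <- diff_means / sigma(model)           # standardized by sd(errors)
beta_like <- diff_means / sd(iris$Sepal.Length)  # standardized by sd(y)
```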

modelbased vs. ggeffects vs. emmeans - ROUND 1: Means

I'm kickstarting a super fun "game", if you get bored, that could be helpful 1) to push and test modelbased's API, 2) to outline its strengths and weaknesses, and 3) to illustrate similarities and differences with other approaches, which can in turn help users navigate to what's most appropriate for their use case. Later, if that's helpful, we might turn this into a vignette. It might also help @strengejacke to understand the API better (#59).

The first round is about estimating "means" under different scenarios (basically predictions at different levels). I'm challenging @strengejacke for ggeffects and @mattansb for emmeans to reproduce the same "estimation" in the simplest way 😁

Model 1: One continuous and one factor

library(modelbased)

model <- lm(Petal.Length ~ Species * Sepal.Width, data = iris)

# 1. Default behaviour: pick factors and average over continuous
estimate_means(model)  
#> NOTE: Results may be misleading due to involvement in interactions
#> Species    | Mean |   SE |       95% CI
#> ---------------------------------------
#> setosa     | 1.43 | 0.08 | [1.28, 1.58]
#> versicolor | 4.50 | 0.07 | [4.35, 4.65]
#> virginica  | 5.61 | 0.06 | [5.50, 5.72]

# 2. Fix continuous at one value (mean by default) - note that for this particular model it's the same as above, but for more complex models it might not be the same
estimate_means(model, fixed="Sepal.Width")  
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        3.06 | 1.43 | 0.08 | [1.28, 1.58]
#> versicolor |        3.06 | 4.50 | 0.07 | [4.35, 4.65]
#> virginica  |        3.06 | 5.61 | 0.06 | [5.50, 5.72]

# 3. Include all predictors
estimate_means(model, levels = c("Species", "Sepal.Width"), length=2)  
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        2.00 | 1.35 | 0.21 | [0.92, 1.77]
#> versicolor |        2.00 | 3.61 | 0.15 | [3.33, 3.90]
#> virginica  |        2.00 | 4.88 | 0.17 | [4.54, 5.23]
#> setosa     |        4.40 | 1.54 | 0.15 | [1.24, 1.84]
#> versicolor |        4.40 | 5.63 | 0.29 | [5.05, 6.20]
#> virginica  |        4.40 | 6.53 | 0.25 | [6.04, 7.02]


# 4. Specify levels
estimate_means(model, levels = "Species=c('versicolor', 'setosa')")  
#> NOTE: Results may be misleading due to involvement in interactions
#> Species    | Mean |   SE |       95% CI
#> ---------------------------------------
#> setosa     | 1.43 | 0.08 | [1.28, 1.58]
#> versicolor | 4.50 | 0.07 | [4.35, 4.65]


# 5. Specify levels for continuous
estimate_means(model, levels = "Sepal.Width=c(1, 3)")  
#> NOTE: Results may be misleading due to involvement in interactions
#> Sepal.Width | Mean |   SE |       95% CI
#> ----------------------------------------
#> 1.00        | 2.75 | 0.20 | [2.36, 3.13]
#> 3.00        | 3.82 | 0.04 | [3.74, 3.90]


# 6. Same
estimate_means(model, levels = c("Species", "Sepal.Width=0"))  
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        0.00 | 1.18 | 0.50 | [0.19, 2.17]
#> versicolor |        0.00 | 1.93 | 0.49 | [0.97, 2.90]
#> virginica  |        0.00 | 3.51 | 0.51 | [2.50, 4.52]


# 7. Estimate means along the values of a continuous
estimate_means(model, modulate = "Sepal.Width", length=4)  
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        2.00 | 1.35 | 0.21 | [0.92, 1.77]
#> versicolor |        2.00 | 3.61 | 0.15 | [3.33, 3.90]
#> virginica  |        2.00 | 4.88 | 0.17 | [4.54, 5.23]
#> setosa     |        2.80 | 1.41 | 0.11 | [1.20, 1.62]
#> versicolor |        2.80 | 4.29 | 0.05 | [4.18, 4.39]
#> virginica  |        2.80 | 5.43 | 0.06 | [5.31, 5.56]
#> setosa     |        3.60 | 1.48 | 0.06 | [1.36, 1.59]
#> versicolor |        3.60 | 4.96 | 0.16 | [4.65, 5.26]
#> virginica  |        3.60 | 5.98 | 0.12 | [5.74, 6.22]
#> setosa     |        4.40 | 1.54 | 0.15 | [1.24, 1.84]
#> versicolor |        4.40 | 5.63 | 0.29 | [5.05, 6.20]
#> virginica  |        4.40 | 6.53 | 0.25 | [6.04, 7.02]


# 8. Estimate means along specific values of a continuous
estimate_means(model, modulate = "Sepal.Width=c(2, 4)")  
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        2.00 | 1.35 | 0.21 | [0.92, 1.77]
#> versicolor |        2.00 | 3.61 | 0.15 | [3.33, 3.90]
#> virginica  |        2.00 | 4.88 | 0.17 | [4.54, 5.23]
#> setosa     |        4.00 | 1.51 | 0.10 | [1.31, 1.70]
#> versicolor |        4.00 | 5.29 | 0.22 | [4.85, 5.73]
#> virginica  |        4.00 | 6.26 | 0.18 | [5.89, 6.62]

Model 2: Two factors

data <- iris
data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B")
model <- lm(Petal.Length ~ Species * Petal.Length_factor, data = data)

# 9. Default: pick factors
estimate_means(model)  
#> Species    | Petal.Length_factor | Mean |   SE |       95% CI
#> -------------------------------------------------------------
#> setosa     |                   A | 1.46 | 0.05 | [1.36, 1.57]
#> versicolor |                   A | 3.77 | 0.08 | [3.61, 3.94]
#> virginica  |                   A |      |      |             
#> setosa     |                   B |      |      |             
#> versicolor |                   B | 4.56 | 0.07 | [4.43, 4.69]
#> virginica  |                   B | 5.55 | 0.05 | [5.45, 5.66]


# 10. fix at first level
estimate_means(model, fixed="Petal.Length_factor")  
#> Species    | Petal.Length_factor | Mean |   SE |       95% CI
#> -------------------------------------------------------------
#> setosa     |                   A | 1.46 | 0.05 | [1.36, 1.57]
#> versicolor |                   A | 3.77 | 0.08 | [3.61, 3.94]
#> virginica  |                   A |      |      |


# 11. Fix estimation at a specific level
estimate_means(model, fixed = "Petal.Length_factor='B'")
#> Species    | Petal.Length_factor | Mean |   SE |       95% CI
#> -------------------------------------------------------------
#> setosa     |                   B |      |      |             
#> versicolor |                   B | 4.56 | 0.07 | [4.43, 4.69]
#> virginica  |                   B | 5.55 | 0.05 | [5.45, 5.66]

Created on 2020-05-20 by the reprex package (v0.3.0)

Error when estimating contrasts for interaction variable that has only two factors

Hi!

Love your package, thanks for all the hard work!

I have a dataset with a factor variable that has two levels. When I try to estimate contrasts modulated by a numeric predictor, I get an error.

Here is a reproducible example:

## only leave in two values for Species
iris_2FA <- iris[which(iris$Species %in% c("versicolor", "setosa")), ]
## estimate model
model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris_2FA)
## estimate contrasts
estimate_contrasts(model, modulate = "Petal.Width", length = 4)

Leads to the following error:
Error in names(levelcols) <- c("Level1", "Level2") : 'names' attribute [2] must be the same length as the vector [0]

The error occurs here:

names(levelcols) <- c("Level1", "Level2")
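A defensive guard at that spot could surface the real problem earlier. This is a hypothetical sketch only: `levelcols` is internal to the package, and the actual fix may look different.

```r
# Hypothetical guard (sketch): fail with a clear message instead of a
# cryptic 'names' attribute error when no level columns were found.
if (length(levelcols) == 2) {
  names(levelcols) <- c("Level1", "Level2")
} else {
  stop("Expected 2 level columns, found ", length(levelcols), ".")
}
```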

Contrast between points of continuous predictor

library(ggplot2)
library(modelbased)

model <- lm(Sepal.Length ~ Petal.Length, data = iris)

data <- estimate_link(model)

ggplot(data, aes(x = Petal.Length, y = Predicted)) +
  geom_ribbon(aes(ymin = CI_low, ymax = CI_high), alpha = 0.3) +
  geom_line()

Created on 2020-05-19 by the reprex package (v0.3.0)

@mattansb Let's say I'm interested in the contrast between when Petal.Length == 4 vs. Petal.Length == 2. What would be the correct emmeans specification to get that contrast 🤔
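One possible specification (a sketch, not an answer recorded in the thread; it assumes emmeans' `at` argument and `pairs()` behave as documented) is to build a reference grid at exactly the two values of interest and take their pairwise contrast:

```r
library(emmeans)

model <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Reference grid at exactly the two Petal.Length values of interest
em <- emmeans(model, ~ Petal.Length, at = list(Petal.Length = c(2, 4)))

# Pairwise contrast: prediction at Petal.Length == 2 minus at == 4
pairs(em)
```

For a simple linear model, this contrast is just the slope of Petal.Length multiplied by the difference between the two values.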

cannot estimate contrast

Hi everybody

I get this error with estimate_contrasts()

estimate_contrasts(fit)
Error: 'summarise_posteriors' is not an exported object from 'namespace:parameters'

How can I get contrasts from a stan_lmer object?

Many thanks
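Until estimate_contrasts() handles this, one possible workaround (a sketch; it assumes emmeans' support for rstanarm models) is to go through emmeans directly:

```r
library(rstanarm)
library(emmeans)

# Example with a stan_glm fit; the same pattern should apply to stan_lmer
fit <- stan_glm(Sepal.Width ~ Species, data = iris, refresh = 0)

# Pairwise contrasts between Species levels, summarized from the posterior
pairs(emmeans(fit, ~ Species))
```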

estimate_slopes: issue in emmeans?

Using emmeans 1.4.2, I encountered this very bizarre issue with this:

library(estimate)
testthat::test_that("estimate_slopes", {
  library(insight)
  library(rstanarm)

  testthat::expect_error(estimate_slopes(insight::download_model("stanreg_lm_1")))

  model <- insight::download_model("stanreg_lm_4")
  estim <- estimate_slopes(model)
  testthat::expect_equal(c(nrow(estim), ncol(estim)), c(1, 7))

  model <- insight::download_model("stanreg_lm_6")
  estim <- estimate_slopes(model)
  testthat::expect_equal(c(nrow(estim), ncol(estim)), c(3, 7))
})

It works perfectly if I run it as is. However, it now fails (not sure since when) when run through the checks:

Error in as.vector(x) : object 'model' not found
-- 1. Error: estimate_slopes (@test-estimate_slopes.R#11)  ---------------------------------
Perhaps a 'data' or 'params' argument is needed
1: estimate_slopes(model) at testthat/test-estimate_slopes.R:11
2: estimate_slopes.stanreg(model)
3: .estimate_slopes(model, trend = trend, levels = levels, transform = transform, standardize = standardize, standardize_robust = standardize_robust, 
       centrality = centrality, ci = ci, ci_method = ci_method, test = test, rope_range = rope_range, rope_ci = rope_ci)
4: tryCatch(emmeans::emtrends(model, levels, var = trend, transform = transform, ...), error = function(e) emmeans::emtrends(model, 
       levels, var = trend, ...))
5: tryCatchList(expr, classes, parentenv, handlers)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
7: value[[3L]](cond)
8: emmeans::emtrends(model, levels, var = trend, ...)
9: eval(cl)
10: eval(cl)
11: ref_grid(object = model, options = list(just.data = TRUE))
12: stop("Perhaps a 'data' or 'params' argument is needed")

It seems to be related to emmeans... @mattansb, since you're the emmeans expert now, would you have any idea why this happens?
It seems to be fairly new behaviour (this test has been there forever and was never a problem), but I'm not sure whether it's related to a change in emmeans or in easystats (from what I recall, the code here in estimate_slopes hasn't really changed in a long time).

Change object names

Change object names to match function names (estimateContrasts -> estimate_contrasts).

Better explanation of functions in docs

Actually, I'm always confused about what the functions do. The help files are not yet helpful for (new) users. I think we should add more information to the docs (a details section?), explaining the functionality and showing the differences between the functions.

issue with estimate_contrasts

Hello,

I have the following problem with the function "estimate_contrasts()":

model <- lmer(observationVariable ~ factor1 * factor2 + (1 | subjectID), data = Data)
# no errors or warnings here
contrasts <- estimate_contrasts(model, levels = "factor1", fixed = "factor2")
# Error in names(levelcols) <- c("Level1", "Level2") :
#   'names' attribute [2] must be the same length as the vector [0]

No matter what I do, I always get this error when using estimate_contrasts() with the fixed option. What am I doing wrong? Why does the vector have a length of [0]?

Help would be very much appreciated.

Thank you very much and stay healthy!

Problem with as.factor in formula

This works

levels <- "gear"
data <- mtcars
data$gear <- as.factor(data$gear)
model <- lm(mpg ~ gear, data = data)

at <- insight::get_data(model)[levels]
at <- sapply(at, modelbased::visualisation_matrix, simplify = FALSE)

emmeans::ref_grid(model, at = at)
#> 'emmGrid' object with variables:
#>     gear = 3, 4, 5

This doesn't

levels <- "gear"
model <- lm(mpg ~ as.factor(gear), data = mtcars)

at <- insight::get_data(model)[levels]
at <- sapply(at, modelbased::visualisation_matrix, simplify = FALSE)

emmeans::ref_grid(model, at = at)
#> Error in Math.factor(structure(1:3, .Label = c("3", "4", "5"), class = "factor"), : 'round' not meaningful for factors

Created on 2020-05-19 by the reprex package (v0.3.0)

@mattansb @strengejacke any ideas on how best to address this?

robustlmm and estimate_slopes()

Is there any chance of support for rlmerMod models in estimate_slopes() in the near future? I'm trying to compare slopes of a robust model at different factor levels, but so far I have not found any solution.

preserve_range issue?

library(estimate)
library(dplyr)

data <- iris[c("Species", "Sepal.Length")]
data <- data_grid(data, length = 20, preserve_range = TRUE)
arrange(data, Species)

The range does not seem to be preserved?

Bayes factors for contrasts

@mattansb Now that estimate is updated to the most recent versions of bayestestR and parameters, it would be nice to fully support BFs for contrasts. It seems it should be quite straightforward to add; would you like to give it a look? ☺️

library(estimate)
library(rstanarm)

model <- stan_glm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model, test = "BF")
#> Warning in bayesfactor_parameters.data.frame(posterior = posterior, prior
#> = prior, : Prior not specified! Please specify priors (with column order
#> matching 'posterior') to get meaningful results.
#> Loading required namespace: logspline
#> Level1     |     Level2 | Median |         89% CI | BF | Median (std.)
#> ----------------------------------------------------------------------
#> setosa     | versicolor |   0.65 |   [0.55, 0.77] |  1 |          1.50
#> setosa     |  virginica |   0.45 |   [0.33, 0.55] |  1 |          1.04
#> versicolor |  virginica |  -0.20 | [-0.31, -0.10] |  1 |         -0.47

Created on 2019-07-20 by the reprex package (v0.3.0)

No probability in estimate_means for logistic estimates

Hi,

A few days ago I had no problem obtaining probability estimates from fitted logistic models using estimate_means. In order to solve another issue, I installed the development version of modelbased and later returned to the latest public release (0.1.2).

packageVersion('modelbased')
[1] ‘0.1.2’

However, this seems to have changed the behaviour of estimate_means unexpectedly. From this simple example:

df <- iris
df$factor1 <- ifelse(df$Sepal.Width > 3, "A", "B")
df$factor2 <- ifelse(df$Sepal.Length > 5, "C", "D")
df$y <- ifelse(df$Species=='setosa', 1, 0)

log_model <- y ~ factor1 + factor2
log_fit <- glm(log_model, family = 'binomial', data = df)

if I run any of the following:

modelbased::estimate_means(log_fit)
modelbased::estimate_means(log_fit, transform='response')
modelbased::estimate_means(log_fit, transform='none')

I get exactly the same output:

factor1 | factor2 |   Mean |      SE |              95% CI
----------------------------------------------------------
A       |       C |  -0.13 |    0.29 | [   -0.70,    0.45]
B       |       C | -20.46 | 1851.96 | [-3650.22, 3609.31]
A       |       D |  21.02 | 1851.96 | [-3608.74, 3650.79]
B       |       D |   0.69 |    0.61 | [   -0.51,    1.89]

Previously, the Mean column changed to Probability and included the expected values.

For comparison, emmeans behaves as follows:

emmeans::emmeans(log_fit, c('factor1', 'factor2'))
 factor1 factor2  emmean       SE  df asymp.LCL asymp.UCL
 A       C        -0.128    0.292 Inf    -0.701     0.445
 B       C       -20.459 1851.955 Inf -3650.224  3609.306
 A       D        21.024 1851.955 Inf -3608.741  3650.790
 B       D         0.693    0.612 Inf    -0.507     1.893

Results are given on the logit (not the response) scale. 
Confidence level used: 0.95 
emmeans::emmeans(log_fit, c('factor1', 'factor2'), 
    type='response')
 factor1 factor2      prob         SE  df asymp.LCL asymp.UCL
 A       C       0.4680851 0.07278377 Inf 0.3316389 0.6094772
 B       C       0.0000000 0.00000241 Inf 0.0000000 1.0000000
 A       D       1.0000000 0.00000137 Inf 0.0000000 1.0000000
 B       D       0.6666667 0.13608276 Inf 0.3758781 0.8691399

Confidence level used: 0.95 
Intervals are back-transformed from the logit scale 

Was this an intentional change? Do you have any clue how installing the development version temporarily might have changed the behaviour?

PS: I have just installed the development version again, and I get exactly the same output as above.
