dcousin3 / superb Goto Github PK

Summary plots with adjusted error bars

Home Page: https://dcousin3.github.io/superb

R 95.43% TeX 4.57%

plotting summary-statistics error-bars summary-plots r statistics visualization

superb's Introduction

superb: Summary plots with adjusted error bars

The library superb offers two main functionalities. First, it can be used to obtain plots with adjusted error bars. The main function is superbPlot(), but you can also use superbShiny() for a graphical user interface requiring neither programming nor scripting. See the nice tutorial by Walker (2021).

The purpose of superbPlot() is to provide a plot with summary statistics and correct error bars. With simple adjustments, the error bars are adjusted to the design (within or between), to the purpose (single or pair-wise differences), to the sampling method (simple randomized samples or cluster randomized samples) and to the population size (infinite or of a specific size). The superbData() function does not generate the plot but returns the summary statistics and the interval boundaries. These can afterwards be sent to another plotting environment.
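To make these adjustments concrete, here is a minimal base-R sketch of the textbook formulas behind three of them, for a wide-format matrix with one row per participant and one column per condition. The helper name ci_halfwidth() and its arguments are hypothetical (they only mirror the option names of superb's adjustments list); superb's own computations, including cluster sampling, are more complete and may differ in detail.

```r
# Hypothetical sketch, not superb's code: CI half-widths of the column means of a
# wide matrix (rows = participants, columns = repeated measures).
ci_halfwidth <- function(x, gamma = 0.95, purpose = "single",
                         popSize = Inf, decorrelation = "none") {
  x <- as.matrix(x)
  n <- nrow(x)   # number of participants
  C <- ncol(x)   # number of repeated measures
  if (decorrelation == "CM") {
    # Cousineau-Morey: subtract each participant's mean, add back the grand mean
    x <- x - rowMeans(x) + mean(x)
  }
  se <- apply(x, 2, sd) / sqrt(n)
  # Morey's correction for the variability removed by the normalization
  if (decorrelation == "CM") se <- se * sqrt(C / (C - 1))
  # finite-population correction when the population has popSize members
  if (is.finite(popSize)) se <- se * sqrt(1 - n / popSize)
  hw <- qt(1 - (1 - gamma) / 2, df = n - 1) * se
  # difference-adjusted intervals are wider by a factor of sqrt(2)
  if (purpose == "difference") hw <- hw * sqrt(2)
  hw
}
```

For example, ci_halfwidth(m, purpose = "difference", decorrelation = "CM") gives within-subject, difference-adjusted half-widths. Each adjustment is a separate multiplicative factor on the standard error, which is why they can be switched on independently.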

The second functionality is to generate random datasets. The function GRD() is used to easily generate random data from any design (within or between) using any population distribution with any parameters, and with various effect sizes. GRD() is useful for testing statistical procedures and plotting procedures such as superbPlot().

Installation

The official CRAN version can be installed with

install.packages("superb")
library(superb)

The development version 0.95.14 can be accessed through GitHub:

devtools::install_github("dcousin3/superb")
library(superb)

Examples

The easiest way is to use the graphical interface, which can be launched with

superbShiny()

The following examples use the script-based commands.

Here is a simple example illustrating the ToothGrowth dataset of guinea pigs (in which the dependent variable is len) as a function of the dose of vitamin C and the supplement type supp (orange juice, OJ, or ascorbic acid, VC):

superbPlot(ToothGrowth, 
    BSFactors = c("dose","supp"), 
    variables = "len" )

In the above, the default summary statistic, the mean, is used. The error bars are, by default, the 95% confidence intervals. These two choices can be changed with the statistic and the errorbar arguments.
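For reference, the quantities behind such a default plot can be approximated by hand. This base-R sketch computes, for each dose-by-supp cell of ToothGrowth, the mean and the usual unadjusted t-based 95% confidence bounds; it assumes that the default interval is this standard t-based one, and ci95() is a hypothetical helper, not part of superb.

```r
# Per-cell mean and unadjusted 95% t-based CI bounds for ToothGrowth
# (a dataset shipped with base R).
ci95 <- function(x) {
  hw <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(center = mean(x), lower = mean(x) - hw, upper = mean(x) + hw)
}
# aggregate() stores the three returned values as a matrix column named "len"
cells <- aggregate(len ~ dose + supp, data = ToothGrowth, FUN = ci95)
cells
```

Note that superbData() reports interval widths relative to the center (lowerwidth, upperwidth) rather than absolute bounds.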

This second example displays the median instead of the default mean summary statistic:

superbPlot(ToothGrowth, 
    BSFactors = c("dose","supp"), 
    variables = "len",
    statistic = "median")

As a third example, we illustrate the harmonic mean (hmean) along with 99.9% confidence intervals, using lines:

superbPlot(ToothGrowth, 
    BSFactors = c("dose","supp"), 
    variables = "len",
    statistic = "hmean", 
    errorbar = "CI", gamma = 0.999,
    plotStyle = "line")
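As a reminder of what the hmean statistic summarizes, the harmonic mean is the reciprocal of the mean of the reciprocals. The sketch below uses a hypothetical helper, hmean_sketch(), and is not superb's implementation.

```r
# Harmonic mean: reciprocal of the mean of reciprocals (positive data only)
hmean_sketch <- function(x) 1 / mean(1 / x)

hmean_sketch(c(1, 2, 4))   # 12/7, about 1.714
# harmonic mean of tooth length for each supplement type
tapply(ToothGrowth$len, ToothGrowth$supp, hmean_sketch)
```

For positive data the harmonic mean never exceeds the arithmetic mean, and it is pulled toward the smallest values.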

The second function, GRD(), can be used to generate random data from designs with various within- and between-subject factors. This example generates scores for 300 simulated participants in a 3 x 2 design with repeated measures on Day. Only the factor Day is modeled as affecting the scores (they decrease by 3 points on the second day):

testdata <- GRD(
    RenameDV   = "score", 
    SubjectsPerGroup = 100, 
    BSFactors  = "Difficulty(A,B,C)", 
    WSFactors  = "Day(2)",
    Population = list(mean = 75,stddev = 12,rho = 0.5),
    Effects    = list("Day" = slope(-3) )
)
head(testdata)

##   id Difficulty  score.1  score.2
## 1  1          A 81.29915 80.72620
## 2  2          A 67.80337 56.78848
## 3  3          A 82.15947 57.86741
## 4  4          A 74.54734 80.99448
## 5  5          A 97.20008 95.55848
## 6  6          A 87.51076 68.16463
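For intuition about the structure of testdata, here is a base-R sketch that produces data of the same shape: 3 groups of 100 participants, two repeated measures with mean 75, standard deviation 12 and correlation 0.5, and Day 2 lower by 3 points. This is not GRD()'s algorithm; in particular, how slope() centers the effect across levels is an assumption here, so only the 3-point difference should be compared.

```r
# Base-R sketch of a 3 (Difficulty, between) x 2 (Day, within) design
set.seed(42)
n_per_group <- 100
mu <- 75; sigma <- 12; rho <- 0.5
groups <- rep(c("A", "B", "C"), each = n_per_group)
N <- length(groups)
z1 <- rnorm(N)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(N)   # corr(z1, z2) = rho
# Assumption: Day 1 at the population mean, Day 2 three points lower
sim <- data.frame(
  id         = seq_len(N),
  Difficulty = factor(groups),
  score.1    = mu + sigma * z1,
  score.2    = mu - 3 + sigma * z2
)
head(sim)
```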

The simulated scores are illustrated using a more elaborate layout, pointjitterviolin, which, in addition to the mean and confidence interval, shows the raw data as jittered dots and the distribution as a violin plot:

superbPlot(testdata, 
    BSFactors  = "Difficulty", 
    WSFactors  = "Day(2)",
    variables = c("score.1","score.2"),
    plotStyle = "pointjitterviolin",
    errorbarParams = list(color = "purple"),
    pointParams = list( size = 3, color = "purple")
)

In the above example, the optional arguments errorbarParams and pointParams are used to inject specifications into the error bars and the points, respectively. When these arguments are used, they override the defaults from superbPlot().
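One common way to implement "user parameters override defaults" in R is base R's modifyList(), sketched below. The default values are hypothetical, not superb's actual defaults, and superb may instead replace its defaults wholesale rather than merging them.

```r
# Merge user-supplied graphical parameters over a set of defaults:
# entries present in the second list replace those in the first.
default_point_params <- list(size = 1.5, color = "black", shape = 16)  # hypothetical
user_point_params    <- list(size = 3, color = "purple")
merged <- modifyList(default_point_params, user_point_params)
str(merged)   # size and color come from the user, shape from the defaults
```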

For more

As seen, the library superb makes it easy to illustrate summary statistics along with the error bars. Some layouts can be used to visualize additional characteristics of the raw data. Finally, the resulting appearance can be customized in various ways.

The complete documentation is available on the home page (https://dcousin3.github.io/superb).

A general introduction to the superb framework underlying this library is published in Advances in Methods and Practices in Psychological Science (Cousineau, Goulet, & Harding, 2021).

References

Cousineau, D., Goulet, M., & Harding, B. (2021). Summary plots with adjusted error bars: The superb framework with an implementation in R. Advances in Methods and Practices in Psychological Science, 1–46. doi:

Walker, J. A. L. (2021). Summary plots with adjusted error bars (superb) [YouTube video]. Retrieved from https://www.youtube.com/watch?v=rw_6ll5nVus

superb's People

Forkers

humanfactors achetverikov

superb's Issues

factor levels are shown incorrectly in the outputs

Factor levels in the outputs are not sorted properly.

library(superb)
library(data.table)

set.seed(256)
data <- data.table(expand.grid(A = c('E', 'F'), B = c('E', 'F'), subj = 1:10, trial = 1:20))
data[,A:=factor(A)]
data[,B:=factor(B)]

data[,y:=rnorm(.N)]

wide_data <- dcast(data, subj~A+B, value.var = 'y', fun.aggregate = mean)

superbData(wide_data, WSFactors = c('A(2)','B(2)'), variables = colnames(wide_data[,2:5]))$summaryStatistics

colMeans(wide_data[,2:5])

Gives:

> superbData(wide_data, WSFactors = c('A(2)','B(2)'), variables = colnames(wide_data[,2:5]))$summaryStatistics
  A B      center lowerwidth upperwidth
1 1 1 -0.13316853 -0.1826856  0.1826856
2 1 2  0.01258029 -0.1065945  0.1065945
3 2 1  0.06947037 -0.1426781  0.1426781
4 2 2  0.08321091 -0.1844031  0.1844031
> 
> colMeans(wide_data[,2:5])
        E_E         E_F         F_E         F_F 
-0.13316853  0.06947037  0.01258029  0.08321091

So A == 1 & B == 2 now corresponds to F_E instead of E_F.

A related problem is that the factor labels are not preserved.
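The output above is consistent with superb reading the variables with the first within-subject factor (A) varying fastest, whereas dcast() produced the columns with B varying fastest (E_E, E_F, F_E, F_F). Under that reading, a possible workaround is to pass the columns in A-fastest order. This base-R sketch builds that order from the levels, assuming the dcast() naming scheme "<A level>_<B level>"; superb_order is a hypothetical name.

```r
# Column names ordered with the first factor (A) varying fastest,
# which is how expand.grid() enumerates combinations.
levels_A <- c("E", "F")
levels_B <- c("E", "F")
grid <- expand.grid(A = levels_A, B = levels_B)
superb_order <- paste(grid$A, grid$B, sep = "_")
superb_order   # "E_E" "F_E" "E_F" "F_F"
```

Passing variables = superb_order to superbData() with WSFactors = c('A(2)','B(2)') should then make A == 1 & B == 2 correspond to E_F, if the reading above is right; it does not address the lost level labels.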

Paired Proportions

Thank you so much for your work in the superb package.

In Vignette C you explain how it is possible to plot proportions with difference-adjusted confidence intervals.

Do you have any suggestions for plotting proportions from a within-subjects design?

Feature Request: Coerce tibbles to data frames

The superb package does not seem tidyverse-friendly at the moment. When superbPlot() or superbData() are given a tibble as input they throw an error, but if that tibble is coerced to a data frame then the functions work as expected. Here is a reprex showing this behaviour:

library(tibble)
library(superb)

# Motivation data for 15 participants over three weeks in wide format:
tib <- matrix( c(
  45, 50,  59,
  47, 58,  64,
  53, 63,  72,
  57, 64,  81,
  58, 67,  86,
  61, 70,  98,
  61, 75, 104,
  63, 79, 100,
  63, 79,  84,
  71, 81,  96,
  72, 83,  82,
  74, 84,  82,
  76, 86,  93,
  84, 90,  85,
  90, 96,  89
), ncol = 3, byrow = TRUE)

# put column names then convert to tibble:
colnames(tib) <- c("Week 1", "Week 2", "Week 3")
tib           <- as_tibble(tib)

# Superb throws an error when the data frame is a tibble
superbPlot(tib, 
           WSFactors = "Moment(3)",
           variables = c("Week 1", "Week 2", "Week 3"),
           adjustments = list(purpose = "difference"),
           plotStyle="line"
)
#> Error: Must subset rows with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> x Input has size 15 but subscript `!duplicated(x, fromLast = fromLast, ...)` has size 0.

# But if the tibble is coerced to a data frame the function works
superbPlot(as.data.frame(tib), 
           WSFactors = "Moment(3)",
           variables = c("Week 1", "Week 2", "Week 3"),
           adjustments = list(purpose = "difference"),
           plotStyle="line"
)

Created on 2021-09-04 by the reprex package (v2.0.0)

It would be nice if you supported tibbles in this package since that is a data frame format commonly used in R. This would further reduce the difficulty of obtaining the statistics calculated in this package. An easy fix would likely be to just do that coercion inside the superbPlot() and superbData() functions. I have not tested it but something like this should work:

superbData <- function(data, 
    BSFactors     = NULL,            # vector of the between-subject factor columns
    WSFactors     = NULL,            # vector of the names of the within-subject factors
    WSDesign      = "fullfactorial", # or ws levels of each variable if not a full factorial ws design
    factorOrder   = NULL,            # order of the factors for plots
    variables,                       # dependent variable name(s)
    statistic     = "mean",          # descriptive statistics
    errorbar      = "CI",            # content of the error bars
    gamma         = 0.95,            # coverage of confidence intervals
    adjustments   = list(
        purpose        = "single",   # is "single" or "difference"
        popSize        = Inf,        # is Inf or a specific positive integer
        decorrelation  = "none",     # is "CM", "LM", "CA" or "none"
        samplingDesign = "SRS"       # is "SRS" or "CRS" (in which case use clusterColumn)
    ),
    preprocessfct = NULL,            # run preprocessing on the matrix
    postprocessfct= NULL,            # run post-processing on the matrix
    clusterColumn = ""               # if samplingDesign = "CRS"
) {

    ##############################################################################
    # All DONE: just send this to the main function superbPlot with showPlot=FALSE
    ##############################################################################

    results <- superbPlot(data    = as.data.frame(data), 
        BSFactors      = BSFactors,
        WSFactors      = WSFactors,
        WSDesign       = WSDesign,
        variables      = variables,  
        statistic      = statistic,  
        errorbar       = errorbar, 
        gamma          = gamma, 
        factorOrder    = factorOrder,
        adjustments    = adjustments,
        clusterColumn  = clusterColumn,
        preprocessfct  = preprocessfct,
        postprocessfct = postprocessfct,
        showPlot       = FALSE
    )    
    summaryStatistics = results[[1]]
    rawData = results[[2]]

#    if(missing(factorOrder))  {factorOrder <- c(WSFactors, BSFactors)}
#    widthfct <- paste(errorbar, statistic, sep = ".")

    # do some renaming of the columns for clearer results
#    verbosecol <- c(
#        statistic,
#        if (errorbar == "SE") c("- 1 * SE", "+ 1 * SE") 
#        else if (errorbar == "CI") c(paste("-", gamma* 100, "% CI width"), paste("+", gamma* 100, "% CI width") ) 
#        else if (errorbar == "PI") c(paste("-", gamma* 100, "% PI width"), paste("+", gamma* 100, "% PI width") ) 
#        else c(paste("-", widthfct), paste("+", widthfct) )
#    )
#    colnames(summaryStatistics)[(length(factorOrder)+1):(length(factorOrder)+3)] <- verbosecol

    return(list(summaryStatistics = summaryStatistics, rawData = rawData) )

}
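A slightly more general variant of the suggested coercion is a small guard that turns any data.frame subclass (tibble, data.table, ...) into a plain data frame before the rest of the code runs. The helper name as_plain_df() is hypothetical; it is a sketch of the idea, not code from superb.

```r
# Return a plain data.frame when given a data.frame subclass such as a tibble;
# leave plain data frames (and anything else) untouched.
as_plain_df <- function(data) {
  if (is.data.frame(data) && !identical(class(data), "data.frame")) {
    data <- as.data.frame(data)
  }
  data
}
```

Calling data <- as_plain_df(data) at the top of superbPlot() would then cover superbData() as well, since superbData() forwards its data to superbPlot().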

Comments and suggestions

Hi, Denis,

I read your comment on

https://stats.stackexchange.com/questions/60767/how-to-display-error-bars-for-cross-over-paired-experiments

and immediately started reading the documentation. Congratulations! I will make some of your vignettes recommended reading for my (medical) colleagues.

Some comments:

Your GitHub link is difficult to find from the documentation. If you put it into DESCRIPTION, it will automatically be shown on the first page after the website is rebuilt. Conversely, on the GitHub page, adding the documentation website under About will make the documentation easier to find.
Since medical researchers are always VERY busy and normally not willing to read documentation, some kind of flowchart with links in the boxes (e.g., "Did you record data from one subject on multiple occasions?") would help with faster orientation.
Medical researchers are madly obsessed with what they call "normal ranges"; with some luck, they can be persuaded to use the recommended term "reference intervals", which avoids talking about "normal ranges of normals" with highly skewed distributions. I have seen and reviewed multiple publications where standard errors were used as "normal ranges", leading to a surprising 80% of outliers in the normals group. This sometimes works with laboratory data, but the obsession carries over to physiological data such as gastric emptying times, where the whole concept fails. It would therefore be all the more important if superb could offer some additional information on computing reference intervals (e.g., with the package of that name, referenceIntervals), possibly warning loudly when only 10 points are available to compute 95% reference intervals.
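The point about standard errors can be shown numerically. The sketch below uses simulated right-skewed (log-normal) data standing in for a measure such as emptying times; no real data are involved. ref_interval() is a hypothetical helper, not from superb or the referenceIntervals package; its 120-observation threshold follows the minimum commonly cited in the CLSI guideline for nonparametric 95% reference intervals and is exposed as an adjustable argument.

```r
# Nonparametric reference interval (central percentiles), with a warning
# when the sample is too small to estimate the tails.
ref_interval <- function(x, level = 0.95, min_n = 120) {
  if (length(x) < min_n)
    warning(sprintf("only %d observations; a nonparametric %g%% reference interval is unreliable",
                    length(x), 100 * level))
  alpha <- (1 - level) / 2
  unname(quantile(x, c(alpha, 1 - alpha)))
}

set.seed(7)
x  <- rlnorm(500, meanlog = 3, sdlog = 0.6)   # simulated, right-skewed
se <- sd(x) / sqrt(length(x))
# share of observations inside mean +/- 1 SE: small, so most look like "outliers"
inside_se <- mean(x >= mean(x) - se & x <= mean(x) + se)
# share inside the 2.5th-97.5th percentile interval: about 95% by construction
ri <- ref_interval(x)
inside_ri <- mean(x >= ri[1] & x <= ri[2])
c(inside_se = inside_se, inside_ri = inside_ri)
```

The standard error shrinks as the sample grows, so it describes the precision of the mean, not the spread of individuals; percentile intervals describe the spread without assuming normality, but need many observations to pin down the tails.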

Some minor typos I noted in the vignettes

Making Figure 3: recruted -> recruited

Four steps: high density interval (HDI). -> should be highest density interval

vignette 1: CM: this method, the two authors ->
adjusments -> adjustments

vignette 3: havlved

"DISCONNECTED FROM THE SERVER - RELOAD"

Dear maintainers,
dear community,

I filled in the steps required by the superb online graphical interface (https://dcousin3.shinyapps.io/superbshiny/); when I click to apply step 5, something goes wrong and the following message appears:

"DISCONNECTED FROM THE SERVER - RELOAD"

Once I reload, of course, the forms are all empty and I have to restart everything from the beginning. I repeated the operation over and over, but the problem is always there.

Can you help me?

Best regards
