gdemin / expss

expss: Tables and Labels in R

Home Page: https://cran.r-project.org/web/packages/expss/

expss

Introduction

expss computes and displays tables with support for 'SPSS'-style labels, multiple / nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', R notebooks, 'Shiny' and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package offers useful functions for data processing in marketing research / social surveys: popular data transformation functions from 'SPSS' Statistics and 'Excel' ('RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc.). The package is intended to help people move data processing from 'Excel'/'SPSS' to R. See examples below. You can get help on any function by typing ?function_name in the R console.

Installation

expss is on CRAN, so to install it you can type install.packages("expss") in the console.

Cross-tabulation examples

For demonstration we will use the well-known mtcars dataset. Let's start by adding labels to the dataset. Then we can continue with table creation.

library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

For quick cross-tabulation there are the fre and cross families of functions. For simplicity we demonstrate here only cross_cpct, which calculates column percent. Documentation for the other functions, such as cross_cases for counts, cross_rpct for row percent, cross_tpct for table percent and cross_fun for custom summary functions, can be seen by typing ?cross_cpct and ?cross_fun in the console.

# 'cross_*' examples
# just simple crosstabulation, similar to base R 'table' function
cross_cases(mtcars, am, vs)

# Table column % with multiple banners
cross_cpct(mtcars, cyl, list(total(), am, vs))

# magrittr pipe usage and nested banners
mtcars %>% 
    cross_cpct(cyl, list(total(), am %nest% vs))      
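
The other members of the cross_* family mentioned above follow the same calling pattern: data first, then row variables, then the banner. The following is only a sketch on the unlabelled mtcars data; the variable choices are illustrative, and cross_mean is assumed to be available as part of the cross_fun family in the installed expss version.

```r
library(expss)
data(mtcars)

# row percent: percentages are computed within each row of 'cyl'
row_pct = cross_rpct(mtcars, cyl, list(total(), am))

# table percent: each cell is a share of all cases in the table
tbl_pct = cross_tpct(mtcars, cyl, list(total(), am))

# means of numeric variables broken down by the same banner
means = cross_mean(mtcars, list(mpg, hp), list(total(), am))

row_pct
tbl_pct
means
```

Each result is an ordinary table object that prints in the console and can be passed on to the same output functions as the cross_cpct tables.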

There is a more sophisticated interface for table construction with magrittr piping. Table construction consists of at least three functions chained with the pipe operator %>%. First, we specify the variables for which statistics will be computed with tab_cells. Second, we calculate statistics with one of the tab_stat_* functions. Last, we finalize table creation with tab_pivot, e.g.: dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot(). After that we can optionally sort the table with tab_sort_asc, drop empty rows/columns with drop_rc and transpose with tab_transpose. The resulting table is just a data.frame, so we can use the usual R operations on it. Detailed documentation for table creation can be seen via ?tables. For significance testing see ?significance. Generally, tables are automatically translated to HTML for output in knitr or Jupyter notebooks. However, if we want HTML output in R notebooks or in the RStudio viewer, we need to set options for that: expss_output_rnotebook() or expss_output_viewer().

# simple example
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# table with caption
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(), am) %>% 
    tab_stat_mean_sd_n() %>%
    tab_last_sig_means(subtable_marks = "both") %>% 
    tab_pivot() %>% 
    set_caption("Table with summary statistics and significance marks.")

# Table with the same summary statistics. Statistics labels in columns.
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(label = "#Total| |"), am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()

# Different statistics for different variables.
mtcars %>%
    tab_cols(total(), vs) %>%
    tab_cells(mpg) %>% 
    tab_stat_mean() %>% 
    tab_stat_valid_n() %>% 
    tab_cells(am) %>%
    tab_stat_cpct(total_row_position = "none", label = "col %") %>%
    tab_stat_rpct(total_row_position = "none", label = "row %") %>%
    tab_stat_tpct(total_row_position = "none", label = "table %") %>%
    tab_pivot(stat_position = "inside_rows") 

# Table with split by rows and with custom totals.
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>% 
    tab_pivot()

# Linear regression by groups.
mtcars %>% 
    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>% 
    tab_cols(total(label = "#Total| |"), am) %>% 
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = as.name(names(x)[1]))
            model = lm(frm, data = x)
            sheet('Coef.' = coef(model), 
                  confint(model)
            )
        }    
    ) %>% 
    tab_pivot() 
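
The optional post-processing steps named earlier (sorting, dropping empty rows/columns, transposing) are not shown in the examples above. A minimal sketch, assuming the unlabelled mtcars data and an arbitrary choice of variables, might look like this:

```r
library(expss)
data(mtcars)

tbl = mtcars %>%
    tab_cells(carb) %>%
    tab_cols(total(), am) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# sort rows in descending order; total rows (marked with '#') stay in place
sorted = tab_sort_desc(tbl)

# drop rows and columns that contain no data
compact = drop_rc(sorted)

# swap rows and columns of the finished table
transposed = tab_transpose(compact)
transposed
```

These functions are applied after tab_pivot because they operate on the finished table rather than on the intermediate object passed along the tab_* chain.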

Example of data processing with multiple-response variables

Here we use a truncated dataset with data from a product test of two samples of chocolate sweets. 150 respondents tested two kinds of sweets (codenames: VSX123 and SDF546). The sample was divided into two groups (cells) of 75 respondents each. In cell 1 product VSX123 was presented first and then SDF546. In cell 2 the sweets were presented in reversed order. Questions about respondent impressions of the first tested product are in block A (and of the second tested product in block B). At the end of the questionnaire there was a question about preferences between the sweets.

List of variables:

  • id Respondent Id
  • cell First tested product (cell number)
  • s2a Age
  • a1_1-a1_6 What did you like in these sweets? Multiple response. First tested product
  • a22 Overall quality. First tested product
  • b1_1-b1_6 What did you like in these sweets? Multiple response. Second tested product
  • b22 Overall quality. Second tested product
  • c1 Preferences
data(product_test)

w = product_test # shorter name to save some keystrokes

# here we recode variables from first/second tested product to separate variables for each product according to their cells
# 'h' variables - VSX123 sample, 'p' variables - 'SDF456' sample
# also we recode preferences from first/second product to true names
# for first cell there are no changes, for second cell we should exchange 1 and 2.
w = w %>% 
    let_if(cell == 1, 
        h1_1 %to% h1_6 := recode(a1_1 %to% a1_6, other ~ copy),
        p1_1 %to% p1_6 := recode(b1_1 %to% b1_6, other ~ copy),
        h22 := recode(a22, other ~ copy), 
        p22 := recode(b22, other ~ copy),
        c1r = c1
    ) %>% 
    let_if(cell == 2, 
        p1_1 %to% p1_6 := recode(a1_1 %to% a1_6, other ~ copy), 
        h1_1 %to% h1_6 := recode(b1_1 %to% b1_6, other ~ copy),
        p22 := recode(a22, other ~ copy),
        h22 := recode(b22, other ~ copy), 
        c1r := recode(c1, 1 ~ 2, 2 ~ 1, other ~ copy) 
    ) %>% 
    let(
        # recode age by groups
        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
        # count number of likes
        # codes 2 and 99 are ignored.
        h_likes = count_row_if(1 | 3 %thru% 98, h1_1 %to% h1_6), 
        p_likes = count_row_if(1 | 3 %thru% 98, p1_1 %to% p1_6) 
    )

# here we prepare labels for future usage
codeframe_likes = num_lab("
    1 Liked everything
    2 Disliked everything
    3 Chocolate
    4 Appearance
    5 Taste
    6 Stuffing
    7 Nuts
    8 Consistency
    98 Other
    99 Hard to answer
")

overall_liking_scale = num_lab("
    1 Extremely poor 
    2 Very poor
    3 Quite poor
    4 Neither good, nor poor
    5 Quite good
    6 Very good
    7 Excellent
")

w = apply_labels(w, 
    c1r = "Preferences",
    c1r = num_lab("
        1 VSX123 
        2 SDF456
        3 Hard to say
    "),
    
    age_cat = "Age",
    age_cat = c("18 - 25" = 1, "26 - 35" = 2),
    
    h1_1 = "Likes. VSX123",
    p1_1 = "Likes. SDF456",
    h1_1 = codeframe_likes,
    p1_1 = codeframe_likes,
    
    h_likes = "Number of likes. VSX123",
    p_likes = "Number of likes. SDF456",
    
    h22 = "Overall quality. VSX123",
    p22 = "Overall quality. SDF456",
    h22 = overall_liking_scale,
    p22 = overall_liking_scale
)
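
The recode and count_row_if calls in the preparation code above can be hard to follow on a full dataset. The sketch below applies the same two patterns to small made-up vectors (the values and the names age, answers and likes are purely illustrative):

```r
library(expss)

# recode: rules are checked in order, so ages up to 25 get code 1
# and everything else falls through to the catch-all rule with code 2
age = c(18, 25, 26, 35)
age_cat = recode(age, lo %thru% 25 ~ 1, lo %thru% hi ~ 2)
age_cat

# one row per respondent, up to three answer codes each
answers = data.frame(
    q1 = c(1, 2, 3),
    q2 = c(5, NA, 99),
    q3 = c(NA, NA, 4)
)

# count codes equal to 1 or in the range 3..98 within each row;
# codes 2 and 99 and missing values are not counted
likes = count_row_if(1 | 3 %thru% 98, answers)
likes
```

For these inputs age_cat is 1, 1, 2, 2 and likes is 2, 0, 2: the second respondent only answered with code 2, which the criterion excludes.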

Are there any significant differences between preferences? Yes, the difference is significant.

# 'tab_mis_val(3)' removes 'hard to say' from the vector
w %>% tab_cols(total(), age_cat) %>% 
      tab_cells(c1r) %>% 
      tab_mis_val(3) %>% 
      tab_stat_cases() %>% 
      tab_last_sig_cases() %>% 
      tab_pivot()
    

Next, we calculate the distribution of answers to the survey questions.

# let's specify repeated parts of table creation chains
banner = w %>% tab_cols(total(), age_cat, c1r) 
# column percent with significance
tab_cpct_sig = . %>% tab_stat_cpct() %>% 
                    tab_last_sig_cpct(sig_labels = paste0("<b>",LETTERS, "</b>"))

# means with significance
tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("<b><u>Mean</u></b>", "sd", "N")) %>% 
                      tab_last_sig_means(
                          sig_labels = paste0("<b>",LETTERS, "</b>"),   
                          keep = "means")

# Preferences
banner %>% 
    tab_cells(c1r) %>% 
    tab_cpct_sig() %>% 
    tab_pivot() 

# Overall liking
banner %>%  
    tab_cells(h22) %>% 
    tab_means_sig() %>% 
    tab_cpct_sig() %>%  
    tab_cells(p22) %>% 
    tab_means_sig() %>% 
    tab_cpct_sig() %>%
    tab_pivot() 

# Likes
banner %>% 
    tab_cells(h_likes) %>% 
    tab_means_sig() %>% 
    tab_cells(mrset(h1_1 %to% h1_6)) %>% 
    tab_cpct_sig() %>% 
    tab_cells(p_likes) %>% 
    tab_means_sig() %>% 
    tab_cells(mrset(p1_1 %to% p1_6)) %>% 
    tab_cpct_sig() %>%
    tab_pivot() 

# below more complicated table where we compare likes side by side
# Likes - side by side comparison
w %>% 
    tab_cols(total(label = "#Total| |"), c1r) %>% 
    tab_cells(list(unvr(mrset(h1_1 %to% h1_6)))) %>% 
    tab_stat_cpct(label = var_lab(h1_1)) %>% 
    tab_cells(list(unvr(mrset(p1_1 %to% p1_6)))) %>% 
    tab_stat_cpct(label = var_lab(p1_1)) %>% 
    tab_pivot(stat_position = "inside_columns") 

We can save the labelled dataset as a *.csv file with accompanying R code for labelling.

write_labelled_csv(w, filename = "product_test.csv")

Or we can save the dataset as a *.csv file with SPSS syntax to read the data and apply labels.

write_labelled_spss(w, filename = "product_test.csv")
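
To check that labels survive the export, the file can be read back. This is only a sketch: it uses a small made-up data.frame and a temporary file instead of product_test.csv, and it assumes that read_labelled_csv from the installed expss version reads the format produced by write_labelled_csv with its default settings.

```r
library(expss)

# small illustrative dataset with a variable label and value labels
df = data.frame(score = c(1, 7, 4))
var_lab(df$score) = "Overall quality"
val_lab(df$score) = c("Extremely poor" = 1, "Excellent" = 7)

# temporary location chosen for this sketch only
path = tempfile(fileext = ".csv")
write_labelled_csv(df, filename = path)

# read the data together with its labels
restored = read_labelled_csv(path)
var_lab(restored$score)
val_lab(restored$score)
```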

Export to Microsoft Excel

To export expss tables to *.xlsx you need to install the excellent openxlsx package. To install it, just type install.packages("openxlsx") in the console.

Examples

First we apply labels to the mtcars dataset and build a simple table with a caption.

library(expss)
library(openxlsx)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (lb/1000)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

mtcars_table = mtcars %>% 
    cross_cpct(
        cell_vars = list(cyl, gear),
        col_vars = list(total(), am, vs)
    ) %>% 
    set_caption("Table 1")

mtcars_table

Then we create a workbook and add a worksheet to it.

wb = createWorkbook()
sh = addWorksheet(wb, "Tables")

Export - we should specify workbook and worksheet.

xl_write(mtcars_table, wb, sh)

And, finally, we save the workbook with the table to an xlsx file.

saveWorkbook(wb, "table1.xlsx", overwrite = TRUE)

Screenshot of the exported table: table1.xlsx

Automation of the report generation

First of all, we create a banner which we will use for all our tables.

banner = with(mtcars, list(total(), am, vs))

Then we generate a list with all tables. If a variable has a small number of discrete values, we create a column percent table. In other cases we calculate a table with means. For both types of tables we mark significant differences between groups.

list_of_tables = lapply(mtcars, function(variable) {
    if(length(unique(variable))<7){
        cro_cpct(variable, banner) %>% significance_cpct()
    } else {
        # with seven or more unique values we calculate the mean
        cro_mean_sd_n(variable, banner) %>% significance_means()
        
    }
    
})

Create workbook:

wb = createWorkbook()
sh = addWorksheet(wb, "Tables")

Here we export our list of tables with additional formatting. We remove the '#' sign from totals and mark the total column in bold. You can read about formatting options in the manual for xl_write (?xl_write in the console).

xl_write(list_of_tables, wb, sh, 
         # remove '#' sign from totals 
         col_symbols_to_remove = "#",
         row_symbols_to_remove = "#",
         # format total column as bold
         other_col_labels_formats = list("#" = createStyle(textDecoration = "bold")),
         other_cols_formats = list("#" = createStyle(textDecoration = "bold"))
)

Save workbook:

saveWorkbook(wb, "report.xlsx", overwrite = TRUE)

Screenshot of the generated report: report.xlsx

Labels support for base R

A variable label is a human-readable description of the variable. R supports rather long variable names, and these names can even contain spaces and punctuation, but short variable names make coding easier. A variable label can give a nice, long description of the variable. With this description it is easier to remember what those variable names refer to. Value labels are similar to variable labels, but value labels are descriptions of the values a variable can take. Labeling values means we don’t have to remember if 1=Extremely poor and 7=Excellent or vice-versa. We can easily get a dataset description and variable summary with the info function.

The usual way to connect numeric data to labels in R is factor variables. However, factors miss important features which value labels provide. Factors only allow integers to be mapped to a text label, these integers have to be a count starting at 1, and every value needs to be labelled. Also, we can’t calculate means or other numeric statistics on factors.

With labels we can manipulate short variable names and codes when we analyze our data but in the resulting tables and graphs we will see human-readable text.

It is easy to store labels as variable attributes in R, but most R functions cannot use them or even drop them. The expss package integrates value labels support into base R functions and into functions from other packages. Every function which internally converts a variable to a factor will utilize labels. Labels will be preserved during variable subsetting and concatenation. Additionally, there is a function (use_labels) which greatly simplifies variable labels usage. See examples below.
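
The contrast between factors and labelled variables described above can be seen on a tiny made-up vector. This is a sketch only; the name quality and its values are illustrative, and val_lab is the expss setter for value labels.

```r
library(expss)

# a 7-point scale where only the end points are labelled
quality = c(1, 7, 7, 4)
val_lab(quality) = c("Extremely poor" = 1, "Excellent" = 7)

# the variable is still numeric, so numeric statistics work
avg = mean(quality)
avg

# functions that convert to factor internally pick up the labels
table(quality)

# a factor built from the same data cannot be averaged
factor_avg = suppressWarnings(mean(factor(quality)))
factor_avg
```

Here avg is 4.75, while factor_avg is NA because mean is not defined for factors. Note that the value 4 needs no label, which a factor would not allow.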

Getting and setting variable and value labels

First, apply value and variable labels to the dataset:

library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

In addition to apply_labels we have SPSS-style var_lab and val_lab functions:

nps = c(-1, 0, 1, 1, 0, 1, 1, -1)
var_lab(nps) = "Net promoter score"
val_lab(nps) = num_lab("
            -1 Detractors
             0 Neutralists    
             1 Promoters    
")

We can read, add or remove existing labels:

var_lab(nps) # get variable label
val_lab(nps) # get value labels

# add new labels
add_val_lab(nps) = num_lab("
                           98 Other    
                           99 Hard to say
                           ")

# remove label by value
# %d% - diff, %n_d% - names diff 
val_lab(nps) = val_lab(nps) %d% 98
# or, remove value by name
val_lab(nps) = val_lab(nps) %n_d% "Other"

Additionally, there are some utility functions. They can be applied to a single variable as well as to an entire dataset.

drop_val_labs(nps)
drop_var_labs(nps)
unlab(nps)
drop_unused_labels(nps)
prepend_values(nps)

There is also a prepend_names function, but it can be applied only to a data.frame.
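
To make the effect of two of these utilities concrete, the sketch below applies them to a labelled vector that carries one label without matching data. The vector repeats the nps example with an extra, unused 'Hard to say' code; the names plain and trimmed are illustrative.

```r
library(expss)

nps = c(-1, 0, 1, 1, 0, 1, 1, -1)
var_lab(nps) = "Net promoter score"
val_lab(nps) = c("Detractors" = -1, "Neutralists" = 0,
                 "Promoters" = 1, "Hard to say" = 99)

# unlab removes both the variable label and the value labels
plain = unlab(nps)
is.null(var_lab(plain))
is.null(val_lab(plain))

# drop_unused_labels keeps only labels for values present in the data
trimmed = drop_unused_labels(nps)
val_lab(trimmed)
```

After drop_unused_labels the label for code 99 is gone, because no element of nps has that value; the variable label is left untouched.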

Labels with base R and ggplot2 functions

Base table and plotting with value labels:

with(mtcars, table(am, vs))
with(mtcars, 
     barplot(
         table(am, vs), 
         beside = TRUE, 
         legend = TRUE)
     )

There is a special function for variable labels support: use_labels. For now, variable labels support is available only for expressions which are evaluated inside a data.frame.

# table with dimension names
use_labels(mtcars, table(am, vs)) 

# linear regression
use_labels(mtcars, lm(mpg ~ wt + hp + qsec)) %>% summary

# boxplot with variable labels
use_labels(mtcars, boxplot(mpg ~ am))

And, finally, ggplot2 graphics with variable and value labels. Note that with ggplot2 version 3.2.0 and higher you need to explicitly convert labelled variables to factors in the facet_grid formula:

library(ggplot2, warn.conflicts = FALSE)

use_labels(mtcars, {
    # '..data' is shortcut for all 'mtcars' data.frame inside expression 
    ggplot(..data) +
        geom_point(aes(y = mpg, x = wt, color = qsec)) +
        facet_grid(factor(am) ~ factor(vs))
}) 

Extreme value labels support

We have an option for extreme value labels support: expss_enable_value_labels_support_extreme(). With this option factor/as.factor will take into account empty levels. However, unique will give a weird result for labelled variables: labels without values will be added to the unique values. That's why it is recommended to turn off this option immediately after usage. See examples.

We have label 'Hard to say' for which there are no values in nps:

nps = c(-1, 0, 1, 1, 0, 1, 1, -1)
var_lab(nps) = "Net promoter score"
val_lab(nps) = num_lab("
            -1 Detractors
             0 Neutralists    
             1 Promoters
             99 Hard to say
")

Here we disable labels support and get results without labels:

expss_disable_value_labels_support()
table(nps) # there are no labels in the result
unique(nps)

Results with default value labels support: three labels are here but "Hard to say" is absent.

expss_enable_value_labels_support()
# table with labels but there is no label "Hard to say"
table(nps)
unique(nps)

And now extreme value labels support: we see "Hard to say" with zero counts. Note the weird unique result.

expss_enable_value_labels_support_extreme()
# now we see "Hard to say" with zero counts
table(nps) 
# weird 'unique'! There is a value 99 which is absent in 'nps'
unique(nps) 

Return immediately to defaults to avoid issues:

expss_enable_value_labels_support()

Labels are preserved during common operations on the data

There are special methods for subsetting and concatenating labelled variables. These methods preserve labels during common operations. We don't need to restore labels on a subsetted or sorted data.frame.

mtcars with labels:

str(mtcars)

Make a subset of the data.frame:

mtcars_subset = mtcars[1:10, ]

Labels are here, nothing is lost:

str(mtcars_subset)
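
Instead of inspecting the str output by eye, the labels can also be checked directly. The following sketch labels only two mtcars columns to stay short; the object names are illustrative.

```r
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual" = 1)
)

# subsetting by a logical condition and sorting by a column
subset_rows = mtcars[mtcars$am == 1, ]
sorted_rows = mtcars[order(mtcars$mpg), ]

var_lab(subset_rows$mpg)   # variable label is still attached
val_lab(sorted_rows$am)    # value labels are still attached

# concatenation of labelled vectors also keeps the labels
combined = c(subset_rows$am, sorted_rows$am)
val_lab(combined)
```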

Interaction with 'haven'

To use expss with haven you need to load expss strictly after haven (or any other package that implements the 'labelled' class) to avoid conflicts. It is also better to call read_spss with explicit package specification: haven::read_spss. See the example below. The haven package doesn't set the 'labelled' class for variables which have a variable label but no value labels. This leads to labels being lost during subsetting and other operations. We have a special function to fix this: add_labelled_class. Apply it to a dataset loaded by haven.

# we need to load packages strictly in this order to avoid conflicts
library(haven)
library(expss)
spss_data = haven::read_spss("spss_file.sav")
# add missing 'labelled' class
spss_data = add_labelled_class(spss_data) 

expss's People

Contributors

danchaltiel, gdemin, johnfrombluff, michaelchirico, sjewo, tmelliott

expss's Issues

Chi Square Cell Test

Hi @gdemin

I wanted to ask if it is possible or if it is in the EXPSS roadmap to perform the Chi Square Cell Test for a contingency table. This test is very common that is requested by companies such as TNS, GFK and Nielsen. Let's say it's a basic specification next to the independence counter obtained in the table (possible with tab_last_sig_cases).

In my case, I worked with Quantum (basic) and with Barbwin and SPSS (test not available in user interface). I attach documentation that was delivered to me at the time by Kantar TNS, where formula is specified and it can be seen that it is a specification for a 2 * 2 table of the general Chi test. (SPSS doc)

The particular thing is that it is applied to each cell of the table i * j, constructing the table 2 * 2 derived from each cell, the p-value is obtained and the cell is marked if the p-value is significant.

Chi-Square Cell Test.pdf

Thanks in advance.

P.S. If you think it is necessary, I can send calculation examples in EXCEL.

tab_subgroup() with mrset()

Hi @gdemin ... This is a question, not an issue.

Can I set a criterion with tab_subgroup() like this?

df <- data.frame(id=seq(1:10),VAR_1=sample(1:5,10,replace=TRUE), VAR_2=sample(1:5,10,replace=TRUE))
df %>% 
    tab_subgroup(mrset(VAR_1 %to% VAR_2)==1) %>% 
    tab_cols(total()) %>% 
    tab_cells(id) %>% 
    tab_stat_cases() %>% 
    tab_pivot()

This table selects records with VAR_1==1 but not records with VAR_2==1.

Is VAR_1==1 | VAR_2==1 the only correct way to write this subgroup?

Thanks in advance.

TESTS: Errors with develop matrixStats 0.54.0-9000

Hi. While fixing HenrikBengtsson/matrixStats#140 that you posted, I've updated:

Version: 0.54.0-9000 [2019-08-13]

SIGNIFICANT CHANGES:

  • weightedVar(), weightedSd(), weightedMad(), and their row- and column-
    specific counter parts now return a missing value if there are missing
    values in any of the weights 'w' after possibly dropping (x, w) elements
    with missing values in 'x' (na.rm = TRUE). Previously, na.rm=TRUE would
    also drop (x, w) elements where 'w' was missing. With this change, we
    now have that for all functions in this package, na.rm=TRUE never applies
    to weights - only 'x' values.

This affects your expss tests on this, which now produce three errors:

 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  
  ── 3. Failure: (unknown) (@test_weighted_stats.R#169)  ───────────────
  w_mad(x, w, na.rm = FALSE) not equal to matrixStats::weightedMad(...).
  1/1 mismatches
  [1] 0.193 - NA == NA
  
  Set default dataset to 'd_iris'
  ══ testthat results  ═════════════════════════════════════════════════
  [ OK: 1991 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 3 ]
  1. Failure: (unknown) (@test_weighted_stats.R#77) 
  2. Failure: (unknown) (@test_weighted_stats.R#92) 
  3. Failure: (unknown) (@test_weighted_stats.R#169) 
  
  Error: testthat unit tests failed
  Execution halted

In all these cases, matrixStats::weighted{Var,Mad}(x, w), return NA because there's non-dropped (x,w) element for which w is a missing value.

Could you please fix/workaround this in your tests because it's currently a show stopper for submitting the next version of matrixStats to CRAN. Thxs.

Mean (standard deviation in parentheses)

Hello,

I am trying to create summary statistic table for my data using the following:

"tab_stat_fun("Mean" = w_mean, "SD" = w_sd , "Median"=w_median, "Min"=w_min, "Max"=w_max,"Valid N" = w_n )"

I was wondering if parentheses can be added to standard deviation.

Thank you!

Automated Crosstabs

This is a great tool, thank you for developing it. Is there a way to automate a function so that it would run a column crosstabs across all of the columns? For instance mtcars$wt, against all of them?

It would also generate a crosstab against all the columns. It would start with mtcars$mpg and then do crosstabs for every column until mtcars$carb.

Thanks!

Feature Request - Prefix/Prepend Variable Name & Data Value

Would it possible to add an argument to the various tab_* and cro_* functions that rather than replacing the variable name and raw data value, with their corresponding labels it pre-pends the variable name to the variable label and the data value to the value label akin to how it can be done in much / most output in SPSS?

Tables with frequency for groups of records

Hi @gdemin. Again, I ask for your help. It isn't an issue, it's a question.
With this dataframe (attached prueba.sav)

VAR1 VAR2 VAR3 VAR4
1    1    1 5203
2    2    1 7591
3    1    1 9704
4    1    1 6349
5    2    1 5560
6    2    1 7948
7    1    2 7033
8    2    2  617
9    1    2 2259
10   2    2 6743
11   2    2 4823
12   2    2 5901
13   1    2 3540
14   1    2 3225
15   2    2 5300
16   1    3 1928
17   2    1 8492

I need to list this table

|        |              | #Total |
| ------ | ------------ | ------ |
| Gender |         male |      3 |
|        |       female |      2 |
|        | #Total cases |      3 |

The table is the result of counting the number of groups (VAR3 = 1:3). A male frequency of 3 means there are three groups with at least one male, and a female frequency of 2 means there are two groups with at least one female. I asked for your help with a similar case in the past (with quantities and stats), but I can't replicate it for this case. A generalization of this case would have more variables in cols, rows or cells.

Code for standard table ...

data <- read_spss("prueba.sav")
data %>%
    tab_cols(total()) %>%
    tab_cells(VAR2) %>%
    tab_stat_cases() %>%
    tab_pivot()

prueba.zip

Thanks in advance.

Installing expss to use count_if

Hello team,
i'm trying to install expss but i'm facing this error
the code i'm using is as follows :

install.packages("stringr")
install.packages("knitr")
install.packages("checkmate")
install.packages("htmlwidgets")
install.packages("htmltools")
install.packages("htmlTable")
install.packages("expss")
library(stringr)
library(knitr)
library(checkmate)
library(htmlwidgets)
library(htmltools)
library(htmlTable)
library(expss)

while getting library(htmlTable)
the error is :
installation of package ‘htmlTable’ had non-zero exit status
Error in library(htmlTable) : there is no package called ‘htmlTable’
library(expss)

Error: package or namespace load failed for ‘expss’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
there is no package called ‘htmlTable’
i much appreciate your support please in this issue

Exporting tables into Excel

Hi Mr. Gregory,
I have R version 3.5.1, and I've successfully installed the openxlsx package. When I try to rerun the same codes in the guide (https://cran.r-project.org/web/packages/expss/vignettes/xlsx-export.html), I get this error;

xl_write(mtcars_table, wb, sh)
Error in openxlsx::createStyle(borderStyle = borders$borderStyle, borderColour = borders$borderColour, :
Unknown border argument

How can I solve this problem?
Thank you for your help.

All my best,
Zeliha

Add as_hux function for etables

Hi Gregory,

I like your package very much, but I miss a function to export the tables to Word or LaTeX.

I made a working, but not complete example, for a function to convert etables to huxtable objects (for the huxtable package by @hughjonesd): sjewo/huxtable@6079014

The function will merge cells automatically and returns a huxtable object for export to Word or LaTeX. @hughjonesd suggested (hughjonesd/huxtable#117) to include this function rather into expss than into huxtable.

What do you think?

Create Nets for Rows

First off, this is a fantastic package and meets a tremendous need for good open-source crosstabulation software that has existed for a long time. Very impressed by the amount of thought and work that has gone into it.

I'm struggling with something that I feel like should be straightforward, but can't figure out after reviewing the readme. I'm not sure if there's a way to create row nets and incorporate those into a crosstab report.

Example: In the github.io page, one of the examples is a variable with response options:

Extremely poor
Very poor
Quite poor
Neither good nor poor
Quite good
Very good
Excellent

If one wanted to display column percentages for each of those rows, but also create two summary rows in order to net the three 'good' options and three 'poor' options and generate a report as follows, how would one go about it:

TOTAL Poor (Sum of responses 1:3)
TOTAL Good (Sum of responses 5;7)

Extremely poor
Very poor
Quite poor
Neither good nor poor
Quite good
Very good
Excellent

Sort columns in table

Hi @gdemin ...

Is there any way to display the columns in the order in which the labels were defined instead of in the order of the values?

My example ...

a=c(1,2,3,4,5,1,2,3,4)
b=c(1,2,1,2,1,2,1,2,2)
data <- data.frame(a,b)
val_lab(data$a) <- c('PP'=2, 'PSOE'=1, 'UP'=4, 'CS'=3, 'Other'=5)
val_lab(data$b) <- c('Yes'=1, 'No'=2)
data %>%
    tab_cols(a) %>%
    tab_cells(b) %>%
    tab_stat_cases() %>%
    tab_pivot()

I need in my output this order in columns
PP, PSOE, UP, CS, Other

Thanks in advance

Replicate tables

Hi @gdemin,

I want to ask the easier way to replicate this case...

Imagine that I have one variable, BRAND, with codes 1:4 and I need to replicate a table like this, where LIKES is a variable related to BRAND. In the questionnaire, you answer for your first BRAND and the next question is LIKES for the BRAND you answered. How can I reproduce the table below for every level (value) of BRAND?

I tried "criteria", "by_groups" and an R "loop", but I can't find a solution simple enough for a dummy (me). I believe there should be an easier way to build it that I don't know.

BRAND <- c(1,2,3,2,3,2,1,2,3,4,3,2,1,2,3,4,3,4)
LIKES <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,1)
GENDER <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)

cis <- data.frame(GENDER,BRAND,LIKES)
cis %>%
    tab_cells(LIKES) %>%
    tab_cols(total(), GENDER) %>%
    tab_stat_cases() %>%
    tab_pivot()

The output for me should be ...

BRAND ==1
table GENDER x LIKES

BRAND ==2
table GENDER x LIKES

BRAND ==3
table GENDER x LIKES

BRAND ==4
table GENDER x LIKES

Thanks for your help...
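
The "one table per BRAND" idea can be sketched in base R by splitting the data on BRAND and tabulating each piece. This is only an illustration of the looping pattern, not the package's own mechanism; within expss, adding tab_rows(BRAND) to the pipeline may give a comparable stacked layout, but that is untested here.

```r
BRAND  <- c(1,2,3,2,3,2,1,2,3,4,3,2,1,2,3,4,3,4)
LIKES  <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,1)
GENDER <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
cis <- data.frame(GENDER, BRAND, LIKES)

# One LIKES x GENDER table per BRAND value.
# Fixed factor levels keep every table the same shape, even with empty cells.
tables <- lapply(split(cis, cis$BRAND), function(d) {
  table(LIKES = factor(d$LIKES, levels = 1:2),
        GENDER = factor(d$GENDER, levels = 1:2))
})

tables[["1"]]  # table for BRAND == 1
```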

tabulation fails

The following code fails?

df <- data.frame(gender = sample(c("m", "f"), 1000, replace = TRUE),
                 agegroup = sample(c("young", "old"), 1000, replace = TRUE))

library(expss)

df %>% tab_cells(gender) %>%
tab_cols(total(), agegroup) %>%
tab_stat_cpct() %>%
tab_pivot()

#> Error in data.table(cell_var, col_var, row_var): object '.R.listCopiesNamed' not found

Session Info:
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.0 (2018-04-23)
#> system x86_64, linux-gnu
#> ui X11
#> language en_US:en
#> collate en_US.UTF-8
#> tz Europe/Stockholm
#> date 2018-05-20
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
#> base * 3.5.0 2018-04-24 local
#> checkmate 1.8.5 2017-10-24 CRAN (R 3.5.0)
#> compiler 3.5.0 2018-04-24 local
#> data.table 1.11.2 2018-05-08 CRAN (R 3.5.0)
#> datasets * 3.5.0 2018-04-24 local
#> devtools 1.13.5 2018-02-18 CRAN (R 3.5.0)
#> digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.5.0)
#> expss * 0.8.6 2018-01-24 CRAN (R 3.5.0)
#> foreign 0.8-69 2017-06-21 CRAN (R 3.5.0)
#> graphics * 3.5.0 2018-04-24 local
#> grDevices * 3.5.0 2018-04-24 local
#> htmlTable 1.11.2 2018-01-20 CRAN (R 3.5.0)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
#> htmlwidgets 1.2 2018-04-19 CRAN (R 3.5.0)
#> knitr 1.20 2018-02-20 CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
#> matrixStats 0.53.1 2018-02-11 CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
#> methods * 3.5.0 2018-04-24 local
#> Rcpp 0.12.16 2018-03-13 CRAN (R 3.5.0)
#> rmarkdown 1.9 2018-03-01 CRAN (R 3.5.0)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
#> rstudioapi 0.7 2017-09-07 CRAN (R 3.5.0)
#> stats * 3.5.0 2018-04-24 local
#> stringi 1.2.2 2018-05-02 CRAN (R 3.5.0)
#> stringr 1.3.1 2018-05-10 CRAN (R 3.5.0)
#> tools 3.5.0 2018-04-24 local
#> utils * 3.5.0 2018-04-24 local
#> withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
#> yaml 2.1.19 2018-05-01 CRAN (R 3.5.0)

User defined column headings for fre

It would be great if you could add an argument to fre to change the column headings for the table output:

expss/R/fre.R

Lines 173 to 176 in 4266df1

colnames(res) = c(first_column_label,
"Count", "Valid percent",
"Percent", "Responses, %",
"Cumulative responses, %")

This would be handy for non-English-language (markdown) documents.
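
Until such an argument exists, a workaround might be to rename the columns of the result after the fact. The sketch below relies on the six headings shown in the excerpt above and assumes the object returned by fre() accepts colnames<- like an ordinary data frame. The Dutch headings are only an example.

```r
library(expss)

res <- fre(mtcars$cyl)

# Keep the first heading (the variable label) and replace the other five.
colnames(res) <- c(colnames(res)[1],
                   "Aantal", "Geldig percentage",
                   "Percentage", "Antwoorden, %",
                   "Cumulatieve antwoorden, %")
res
```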

Tabulation for scale variables

Hi Mr. Gregory,

I have data in SPSS, and the scale variables are of class factor in R. I would like to tabulate a variable that is measured on a 5-point Likert scale like this:

1 Extremely poor 
2 Very poor
3 Neither good, nor poor
4 Very good
5 Excellent
T2B(5+4)
B2B(1+2)
Mean
Total

I've learnt how to prepare T2B and B2B from you, thanks again for this information. I have to add the mean under them, and I also need significance tests for the mean. Since the variables are factors, I couldn't add the mean directly. I've come up with a solution, but not a clean one. Is there any way to tabulate factor variables like this?

Thanks for your help.
Zeliha
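
The mean of a factor-coded scale can be obtained by first converting the factor to its numeric codes. The base R sketch below uses made-up responses and assumes the factor levels are stored in scale order, so that as.integer() returns 1 to 5. If the levels are in a different order, the codes will not match the scale.

```r
likert_levels <- c("Extremely poor", "Very poor", "Neither good, nor poor",
                   "Very good", "Excellent")

# Hypothetical responses stored as a factor, as after an SPSS import.
q <- factor(c("Excellent", "Very good", "Very poor",
              "Excellent", "Neither good, nor poor", "Extremely poor"),
            levels = likert_levels)

q_num <- as.integer(q)          # level codes 1..5 in scale order

mean_score <- mean(q_num)       # mean of the scale
t2b <- mean(q_num >= 4) * 100   # top-2-box share, in percent
b2b <- mean(q_num <= 2) * 100   # bottom-2-box share, in percent
```

The numeric copy can then be passed to whatever function computes the mean row, while the original factor is kept for the percentage rows.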

html code fails

This is a really nice package. Thank you. I've run into an error that gets triggered (I think) when the table has a variable with categories that use a $.

The code works fine within an md file,

library(expss)
tempDF = apply_labels(workingDF,
                      date="Date",
                      ID="ID",
                      functions="Function"
)
cro_cases(tempDF$functions[tempDF$date==19],tempDF$ID[tempDF$date==19])

image

but when I knit it,

image

Set options htmlTable

Hi @gdemin

Can I set htmlTable options for all the expss crosstabs? I want to print all the tables (tab) in the output with these options (simple example):

htmlTable(tab,
          css.cell = c('min-width: 33%; border-top: 1px solid #f0f0f0;font-family: Roboto',
                       rep('min-width: 50px; border-top: 1px solid #f0f0f0;', ncol(tab) - 1)),
          align = paste(rep('c', ncol(tab) - 1), collapse = ''),
          align.header = paste(rep('c',ncol(tab)),collapse = ''),
          align.cgroup = paste(rep('c',ncol(tab)),collapse = ''),
          col.rgroup = c('#eeeeee', '#ffffff'),
          col.columns = c('#f8f8f8', '#f0f0f0'),
          css.rgroup = 'font-weight: 900;font-size: 1em',
          css.rgroup.sep = 'hr style="border-color: #333333;',
          css.cgroup = 'font-weight: 900;font-size: 1em',
          css.cgroup.sep = 'hr style="border-color: #333333;')

Thanks in advance.
Rober

Compare all columns

Hi Mr. Demin,

I have a nested table and I need to run significance tests across all columns.

The default mode for compare_type is "subtable". Is it possible to change that mode to compare all columns?

I used these functions:

For table columns: tab_cols(Quest1 %nest% list(total(), Quest2)) %>%

For significant tests: tab_last_sig_cpct(sig_level = 0.05, delta_cpct = 0,
min_base = 30, sig_labels = LETTERS,
keep = "percent",bonferroni = TRUE) %>%

Thanks for your help.

All my best,
Zeliha

DT 0.3

Hi, I'm preparing a new release of DT, and I found some tests in your package failed with the dev version of DT:
https://github.com/gdemin/expss/blob/master/tests/testthat/test_html_datatable.R

I'm not sure what exactly you were trying to do in these tests, so I'd appreciate it if you could install the dev version of DT and see if it is straightforward to fix these tests, or tell me what I need to do in DT to prevent the R CMD check breakage in your package.

devtools::install_github('rstudio/DT')
checking tests ... ERROR
  Running ‘testthat.R’ [75s/118s]
Running the tests in ‘tests/testthat.R’ failed.
Complete output:
  > library(testthat)
  > library(expss)
  > options(width = 1000)
  > options(covr = TRUE)
  > test_check("expss")
  Set default dataset to 'a'
... 205 lines ...
  3. Failure: (unknown) (@test_html_datatable.R#55) 
  4. Failure: (unknown) (@test_html_datatable.R#58) 
  5. Failure: (unknown) (@test_html_datatable.R#61) 
  6. Failure: (unknown) (@test_html_datatable.R#64) 
  7. Failure: (unknown) (@test_html_datatable.R#67) 
  8. Failure: (unknown) (@test_html_datatable.R#71) 
  9. Failure: (unknown) (@test_html_datatable.R#75) 
  1. ...
  
  Error: testthat unit tests failed
  Execution halted

Thanks!

Error on trying examples

I am new to R. I get the following error message when I try to run the example provided

Error in name_dots(...) : could not find function "name_dots"


library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

cro(mtcars$am, mtcars$vs)


drop_r() can't take effect when significant test maker appear

Dear Sir,

First, let me say a big thank you for the expss library. It lets me generate banner tables (with multiple answers) in R. I couldn't imagine that was possible before I found this library.

After many test runs, I found a bug in drop_r. If I use a significance test and significance markers appear, the drop_r() function has no effect. The empty row still appears.

Please see below for my test codes:

library(expss)
data(mtcars)

mtcars = apply_labels(mtcars,
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1,
                             "Test Row (blank)" = 2), # additional row without data
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1)
)

# Standard tables with empty row

mtcars %>%
    tab_cells(vs) %>%
    tab_cols(total(), am) %>%
    tab_stat_cpct(label = "%", total_statistic = c("u_cases")) %>%
    tab_last_sig_cpct() %>%
    tab_pivot()

# Standard tables without empty row (i.e. with drop_r)

mtcars %>%
    tab_cells(vs) %>%
    tab_cols(total(), am) %>%
    tab_stat_cpct(label = "%", total_statistic = c("u_cases")) %>%
    tab_last_sig_cpct() %>%
    tab_pivot() %>%
    drop_r()

# make sample size bigger to get significant test results.
# When sig test marker appears, drop_r() doesn't take any effect.
# The empty row appears.

mtcars_bigger = mtcars
for (i in 1:100) { mtcars_bigger = rbind(mtcars_bigger, mtcars) }

mtcars_bigger %>%
    tab_cells(vs) %>%
    tab_cols(total(), am) %>%
    tab_stat_cpct(label = "%", total_statistic = c("u_cases")) %>%
    tab_last_sig_cpct() %>%
    tab_pivot() %>%
    drop_r()

Total columns in a nested banner

Hi Mr. Gregory,
I've tried to put totals in a nested banner, using your code as an example.
Is it possible to add total columns for the vs variable in the code below, so we can see "V-engine Total" and "Straight engine Total" in the same table?

data(mtcars)
mtcars = apply_labels(mtcars,
                      cyl = "Number of cylinders",
                      vs = "Engine",
                      vs = num_lab("
                                    0 V-engine
                                    1 Straight engine
                                                   "),
                      am = "Transmission",
                      am = num_lab("
                                  0 Automatic
                                  1 Manual
                                                "),
                      carb = "Number of carburetors"
                                                 )
mtcars %>%
  tab_cells(cyl) %>%
  tab_cols(total(), vs %nest% am) %>%
  tab_stat_cpct() %>%
  tab_pivot()

Thanks in advance.
Zeliha
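
One pattern that may produce per-group totals is to nest a list containing total() under the outer variable, as in the tab_cols(Quest1 %nest% list(total(), Quest2)) call quoted in another issue on this page. The sketch below applies that pattern to the example above. The exact labels of the resulting total columns are not verified here.

```r
library(expss)
data(mtcars)

mtcars <- apply_labels(mtcars,
                       cyl = "Number of cylinders",
                       vs = "Engine",
                       vs = c("V-engine" = 0, "Straight engine" = 1),
                       am = "Transmission",
                       am = c("Automatic" = 0, "Manual" = 1))

# Nesting list(total(), am) under vs is expected to give, for each engine type,
# one total column followed by the transmission columns.
res <- mtcars %>%
  tab_cells(cyl) %>%
  tab_cols(total(), vs %nest% list(total(), am)) %>%
  tab_stat_cpct() %>%
  tab_pivot()
res
```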

Error in executing Rmd

Hi @gdemin,
I have a problem with a chunk in an *.rmd file. The set of instructions doesn't run properly in the complete *.rmd, but it runs properly in an isolated *.rmd. It also runs properly in a standard script. The piece of code is this ...

Note: loading the data and the package is repeated in the instruction set for teaching purposes

suppressMessages(library(expss, quietly = TRUE))
data <- read_spss("~/_datos/99.tmim/testapp.sav")
data$RLOADING = recode(data$LOADING, "Bajo" = 0.00 %thru% 25.00 ~ 1, "Medio" = 25.01 %thru% 46.00 ~ 2, "Alto" = 46.01 %thru% 100 ~ 3)
data$RSPEED = recode(data$SPEED, "Bajo" = 0.00 %thru% 29.80 ~ 1, "Medio" = 29.81 %thru% 50.00 ~ 2, "Alto" = 50.01 %thru% 100 ~ 3)
data$RSECURITY = recode(data$SECURITY, "Bajo" = 0.00 %thru% 25.00 ~ 1, "Medio" = 25.01 %thru% 34.20 ~ 2, "Alto" = 34.21 %thru% 100 ~ 3)
data$RPRIVACY = recode(data$PRIVACY, "Bajo" = 0.00 %thru% 39.00 ~ 1, "Medio" = 39.01 %thru% 53.20 ~ 2, "Alto" = 53.21 %thru% 100 ~ 3)
data$RDESIGN = recode(data$DESIGN, "Bajo" = 0.00 %thru% 41.00 ~ 1, "Medio" = 41.01 %thru% 54.00 ~ 2, "Alto" = 54.01 %thru% 100 ~ 3)
options(digits = 3, width = 9999)
data %>% 
     tab_cols(total(), AGENCY) %>% 
     tab_cells(RLOADING, RSPEED, RSECURITY, RPRIVACY, RDESIGN) %>% 
     tab_stat_cpct() %>% 
     tab_last_sig_cases() %>% 
     tab_pivot()
data %>% 
     tab_cols(total(), AGENCY) %>% 
     tab_cells(RLOADING, RSPEED, RSECURITY, RPRIVACY, RDESIGN) %>% 
     tab_stat_cpct() %>% 
     tab_last_sig_cell_chisq() %>%
     tab_pivot()

and the stop error is ...

Quitting from lines 163-183 (caso1.inf.solucion.Rmd)
Error in if (length(ans) == 0L || as.character(ans[[1L]])[1L] == "~") { :
missing value where TRUE/FALSE needed
Calls: ... eval -> eval -> recode -> recode.numeric -> [ -> [.formula
Execution halted

I am sending a zip file with the complete and isolated rmd files and the *.sav file. I would greatly appreciate your help, as this leaves me quite baffled.

Thanks in advance...

rstudio-export.zip

Error in running expss Error in data.table(cell_var, col_var) : object 'CcopyNamedInList' not found

mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

cro(mtcars$am, mtcars$vs)
Error in data.table(cell_var, col_var) :
object 'CcopyNamedInList' not found

I get this error both when running the above example and when running my own code (not shown here). Both used to run fine with expss until this bug appeared, which is similar to the one described in the following StackOverflow link. I would be grateful for any help you can provide to debug this and make it work. Thanks

https://stackoverflow.com/questions/58546705/r-shiny-app-works-locally-but-after-deployment-i-get-an-error-with-i-believe-d

Output from sessionInfo()
sessionInfo()

R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] expss_0.9.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 ggpubr_0.2.3 pillar_1.4.2 compiler_3.5.2 R.methodsS3_1.7.1 R.utils_2.9.0
[7] base64enc_0.1-3 iterators_1.0.12 tools_3.5.2 bit_1.1-14 digest_0.6.22 checkmate_1.9.4
[13] htmlTable_1.13.2 tibble_2.1.3 gtable_0.3.0 lattice_0.20-38 pkgconfig_2.0.3 rlang_0.4.1
[19] Matrix_1.2-17 foreach_1.4.7 rstudioapi_0.10 yaml_2.2.0 parallel_3.5.2 xfun_0.10
[25] stringr_1.4.0 dplyr_0.8.3 knitr_1.25 R.devices_2.16.1 htmlwidgets_1.5.1 bit64_0.9-7
[31] grid_3.5.2 tidyselect_0.2.5 glue_1.3.1 data.table_1.12.6 R6_2.4.0 foreign_0.8-72
[37] Formula_1.2-3 purrr_0.3.3 ggplot2_3.2.1 magrittr_1.5 matrixStats_0.55.0 backports_1.1.5
[43] scales_1.0.0 codetools_0.2-16 htmltools_0.4.0 itertools_0.1-3 assertthat_0.2.1 colorspace_1.4-1
[49] ggsignif_0.6.0 stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0 contextual_0.9.8.2 doParallel_1.0.15
[55] crayon_1.3.4 rjson_0.2.20 R.oo_1.22.0

Undesired behaviour with cro_cases function when value labels are defined

Congratulations on your efforts on your excellent package.

It seems that cro_cases does not function properly when labels are previously attached to the variables (using val_lab).

Code to reproduce:

gender = as.factor(c(1, 1, 1, 0, 0, 0, 1, 0, 0, 0,
                     1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1,
                     0, 0, 1, 0, 1, 1, 1, 1, 1, 0))
group = as.factor(c(rep(1, 10), rep(2, 11), rep(3, 10)))
sample.df = data.frame(group, gender)

library(expss)
sample.df = apply_labels(sample.df,
                     gender = "Gender",
                     group = "Group")

try without value labels: functioning ok:

cro_cases(sample.df$group, sample.df$gender)

screenshot - 23_8_2017 7_52_16

try with value labels: functioning not ok:

val_lab(sample.df$group) = num_lab("1 A
                               2 B
                               3 C")
val_lab(sample.df$gender) = num_lab("0 Women
                                1 Men")
cro_cases(sample.df$group, sample.df$gender)

screenshot - 23_8_2017 7_52_47

The problem appears when value labels are defined on the column variable (gender in the above example). Value labels on the row variable, as well as the presence or absence of variable labels, are irrelevant.

tab_sort_desc

Hi Mr. Demin,

I am happily using your package so far, thanks again.
I've tried to use the "tab_sort_desc()" function and it works very well with "tab_stat_cases" and "tab_stat_cpct", but not with significance tests.

I am using the below code for the test;

" tab_last_sig_cpct(sig_level = 0.05, delta_cpct = 0,
min_base = 30, sig_labels = LETTERS, bonferroni = TRUE) %>% "

Do you think there is an issue here?

Thanks for your help
Regards,
Zeliha

Q: custom table

It's a question, not an issue, I thought it would be the best place to ask the author.
I am trying to create a table from the two independent variables (questions), both variables have the same range of labelled values. Is it possible to create a table where each row shows a single labelled value and in a column header variable name/label? I tried with the mrset() but that's not what I am looking for. I know I can reshape the dataset to long format, so I have a variable indicating the question, but I drop all the labels, so I was hoping to avoid the reshaping.

Thanks!

Align with htmlTable

Hi @gdemin

I tried to align to center all columns in a table (except col 1), using your suggestion for htmlTable ...

htmlTable(tab.gen, css.cell = c("width: 200px", rep("width: 75px", ncol(table.gen) - 1)))

with

htmlTable(tab.gen, css.cell = c("width: 300px", rep("width: 75px", ncol(tab.gen) - 1)), align= paste(rep("c", ncol(tab.gen)-1), collapse = ""))

but I always have the same warning / error

Error in htmlTable.default(prConvertDfFactors(x), ...) : formal argument "align" matched by multiple actual arguments Calls: <Anonymous> ... htmlTable.etable -> htmlTable -> htmlTable.data.frame

What is the correct way to set alignment in tables?

Thanks in advance @gdemin

allow use.NA option for cro?

Can we have an argument for cro() similar to table(, useNA = )? It seems the current cro() drops NA values.
thanks
wei
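
For reference, this is the base R behaviour being asked for. With useNA = "ifany", table() adds an NA row or column whenever missing values are present. The vectors below are made up for illustration.

```r
x <- c(1, 2, NA, 1, 2, NA)
y <- c("a", "b", "a", NA, "b", "a")

# Default: observations with NA in either variable are dropped.
tab_default <- table(x, y)

# useNA = "ifany": NA gets its own row and column when present.
tab_na <- table(x, y, useNA = "ifany")
tab_na
```

A possible workaround in the meantime is to recode NA to an explicit, labelled code before tabulating, so that missing values show up as an ordinary category.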

Print Options

Do you have an example of code that you've used to successfully print the html formatted EXPSS crosstabs (the version that shows up in the RStudio Viewer) to an external file such as a PDF or Word Doc?

I've been struggling with this as I'd like to be able to generate a large number of crosstabs in the very clean/printable formatting that displays within RStudio, but the only export options for multiple tables are into Excel, which is just a text export that removes all the formatting. I've been experimenting with Knitr and RMarkdown but that will generate a document for all the code not just the Crosstab output.

add_labelled_class not available

Hi Gregory:

When I run the following:

library(haven)
library(expss)
mville <- haven::read_spss("/.../path/mville.sav")
mville=add_labelled_class(mville)

I get this error:
Error in add_labelled_class(mville) :
could not find function "add_labelled_class"

Any ideas why?
Thanks! Great package!

Installing problem

Hi Mr. Gregory,
Thanks for this useful package. I am trying to find another program instead of SPSS for tabulation. I hope this package would meet our needs.

I am using Ubuntu 16.04 and RStudio 3.5. I must say I am new to R.

When I type install.packages("expss") in the RStudio console, I get this error:
Warning in install.packages :
installation of package ‘expss’ had non-zero exit status

Should I install another version of R? Is there any other way to install this package?
Best regards,
Zeliha

Question about crosstabs with groups

Hi @gdemin

This is not an issue, it is a question. I have the following dataframe:

library (expss)
set.seed (311265)
hh <- sample (1:75, 100, replace = TRUE)
gender <- sample (1: 2, 100, replace = TRUE)
liters <- sample (1: 5, 100, replace = TRUE)
sc <- sample (1:10, 100, replace = TRUE)
data <- data.frame (hh, gender, liters, sc)

and I want to compute, in the expss table format, the average liters of water consumed per household (hh) by men, the average consumed per household by women, and the average consumed per household overall. For this I need to calculate the sum of liters in each household and then average those sums.

Is it possible in a single step, or should an intermediate dataframe be created? The structure of the table would be this, but the cells should contain the indicated measure.

data%>%
    tab_cols (total (), gender)%>%
    tab_cells (liters)%>% # but liters, should be the sum of the liters of each household
    tab_stat_mean ()%>%
    tab_pivot ()

In the dataframe there are 55 households and there are 38 households in which there is at least one man and 38 households in which there is at least one woman.

My result in EXCEL is

| Sum of liters | TOTAL_ | 1 | 2 |
| Mean | 5.33 | 3.61 | 4.11 |

An alternative calculation would count, in the men's column, the liters of all members of the household (men and women) if the household has at least one man, and, in the women's column, the liters of all members of the household if the household has at least one woman.

The result would be this:
| Sum of liters | TOTAL_ | 1 | 2 |
| Mean | 5.33 | 6 | 5.92 |

thanks in advance
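
The first calculation can be sketched with an intermediate data frame in base R: sum the liters within each household (and within each household and gender), then average those sums. This follows the "sum per household, then average" description above. It is not necessarily how expss would do it in one step, and the figures depend on the random number generator of the R version in use.

```r
set.seed(311265)
hh     <- sample(1:75, 100, replace = TRUE)
gender <- sample(1:2, 100, replace = TRUE)
liters <- sample(1:5, 100, replace = TRUE)
data <- data.frame(hh, gender, liters)

# Total column: liters summed per household, then averaged over households.
hh_total <- aggregate(liters ~ hh, data = data, FUN = sum)
mean_total <- mean(hh_total$liters)

# Gender columns: liters summed per household and gender, then averaged
# over the households that have at least one member of that gender.
hh_gender <- aggregate(liters ~ hh + gender, data = data, FUN = sum)
mean_by_gender <- tapply(hh_gender$liters, hh_gender$gender, mean)

mean_total
mean_by_gender
```

The aggregated data frames could also be fed back into a tab_cells() / tab_stat_mean() pipeline if the expss table format is required.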

Apply CSS to etable object in RMarkdown

Hi @gdemin

I'm a newbie in R. This is a question, not an issue. I have been searching (in the expss GitHub repository and on Stack Overflow) for how to modify the CSS for expss tables.

Is there any way? I tried an example CSS (the w3.css example provided by w3schools), but I don't know how to apply this CSS beyond the YAML header in R Markdown. I would like to apply a table class to expss tables (etable objects).

Thanks a lot.

Logical vectors don't recycle

The documentation for count_if says of the ... arguments

Data on which criterion will be applied. Vector, matrix, data.frame, list. Shorter arguments will be recycled.

However, it appears that if the shorter arguments are logical, they are not recycled

count_if(TRUE, c(TRUE, FALSE), TRUE)
# [1] 2
count_if(1, c(1, 0), 1)
# [1] 3
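
To make the expected result explicit, the base R sketch below recycles each argument to the length of the longest one and then counts matches. It only illustrates what "shorter arguments will be recycled" would imply for the logical call, on the assumption that recycling means rep_len() to the common length. It does not show how count_if is implemented.

```r
args <- list(c(TRUE, FALSE), TRUE)

# Recycle every argument to the length of the longest one.
n <- max(lengths(args))
recycled <- lapply(args, rep_len, length.out = n)

# Count the elements equal to the criterion TRUE.
expected <- sum(unlist(recycled) == TRUE)
expected
```

Under that reading the logical call would return 3, matching the numeric call, rather than 2.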

Problems with overlapping subgroups

I am testing on the overlapping categories:

data(product_test)

w = product_test # shorter name to save some keystrokes

# here we recode variables from first/second tested product to separate variables for each product according to their cells
# 'h' variables - VSX123 sample, 'p' variables - 'SDF456' sample
# also we recode preferences from first/second product to true names
# for first cell there are no changes, for second cell we should exchange 1 and 2.
w = w %>% 
  do_if(cell == 1, {
    recode(a1_1 %to% a1_6, other ~ copy) %into% (h1_1 %to% h1_6)
    recode(b1_1 %to% b1_6, other ~ copy) %into% (p1_1 %to% p1_6)
    recode(a22, other ~ copy) %into% h22
    recode(b22, other ~ copy) %into% p22
    c1r = c1
  }) %>% 
  do_if(cell == 2, {
    recode(a1_1 %to% a1_6, other ~ copy) %into% (p1_1 %to% p1_6)
    recode(b1_1 %to% b1_6, other ~ copy) %into% (h1_1 %to% h1_6)
    recode(a22, other ~ copy) %into% p22
    recode(b22, other ~ copy) %into% h22
    recode(c1, 1 ~ 2, 2 ~ 1, other ~ copy) %into% c1r
  }) %>% 
  compute({
    # recode age by groups
    age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2)
    # count number of likes
    # codes 2 and 99 are ignored.
    h_likes = count_row_if(1 | 3 %thru% 98, h1_1 %to% h1_6) 
    p_likes = count_row_if(1 | 3 %thru% 98, p1_1 %to% p1_6) 
  })

# here we prepare labels for future usage
codeframe_likes = num_lab("
                          1 Liked everything
                          2 Disliked everything
                          3 Chocolate
                          4 Appearance
                          5 Taste
                          6 Stuffing
                          7 Nuts
                          8 Consistency
                          98 Other
                          99 Hard to answer
                          ")

overall_liking_scale = num_lab("
                               1 Extremely poor 
                               2 Very poor
                               3 Quite poor
                               4 Neither good, nor poor
                               5 Quite good
                               6 Very good
                               7 Excellent
                               ")

w = apply_labels(w, 
                 c1r = "Preferences",
                 c1r = num_lab("
                               1 VSX123 
                               2 SDF456
                               3 Hard to say
                               "),
                 
                 age_cat = "Age",
                 age_cat = c("18 - 25" = 1, "26 - 35" = 2),
                 
                 h1_1 = "Likes. VSX123",
                 p1_1 = "Likes. SDF456",
                 h1_1 = codeframe_likes,
                 p1_1 = codeframe_likes,
                 
                 h_likes = "Number of likes. VSX123",
                 p_likes = "Number of likes. SDF456",
                 
                 h22 = "Overall quality. VSX123",
                 p22 = "Overall quality. SDF456",
                 h22 = overall_liking_scale,
                 p22 = overall_liking_scale
                 )

w %>%
  tab_cells(subtotal(h22,
                     "1 - 7" = 1:7,
                     "1 - 3" = 1:3,
                     "5 - 7" = 5:7,
                     position = "above")) %>%
  tab_cols(total()) %>%
  tab_stat_cases() %>%
  tab_pivot()

It outputs:

# |                         |                        | #Total |
  # | ----------------------- | ---------------------- | ------ |
  # | Overall quality. VSX123 |                  1 - 7 |    150 |
  # |                         |         Extremely poor |        |
  # |                         |              Very poor |        |
  # |                         |             Quite poor |      3 |
  # |                         | Neither good, nor poor |     16 |
  # |                         |             Quite good |     59 |
  # |                         |              Very good |     50 |
  # |                         |              Excellent |     22 |
  # |                         |                  1 - 3 |      3 |
  # |                         |         Extremely poor |        |
  # |                         |              Very poor |        |
  # |                         |             Quite poor |      3 |
  # |                         |                  5 - 7 |    131 |
  # |                         |             Quite good |     59 |
  # |                         |              Very good |     50 |
  # |                         |              Excellent |     22 |
  # |                         |           #Total cases |    150 |

but I would like the overlapping subgroup showing the subtotal only, as follows

# |                         |                        | #Total |
  # | ----------------------- | ---------------------- | ------ |
  # | Overall quality. VSX123 |                  1 - 7 |    150 |
  # |                         |                  1 - 3 |      3 |
  # |                         |         Extremely poor |        |
  # |                         |              Very poor |        |
  # |                         |             Quite poor |      3 |
  # |                         |                  5 - 7 |    131 |
  # |                         |             Quite good |     59 |
  # |                         |              Very good |     50 |
  # |                         |              Excellent |     22 |
  # |                         |           #Total cases |    150 |

Any suggestion? Thanks a lot.

Error in iconv(enc2utf8(x), from = "UTF8", to = "cp65001") when using expss_output_viewer()

Hi!

Thank you for this great package!

I am getting an error when using expss_output_viewer() or expss_output_rnotebook(). I am also able to reproduce this on a colleague's macOS machine, but have successfully run this code on a Windows machine.

Code:

library(expss)
data(mtcars)
expss_output_viewer()
cro_cpct(mtcars$cyl, list(total(), mtcars$am, mtcars$vs))

Error:

Error in iconv(enc2utf8(x), from = "UTF8", to = "cp65001") : 
  unsupported conversion from 'UTF-8' to 'cp65001'

I am using Rstudio Version 1.1.419 on macOS Version 10.12.6

Note, the expss_output_default() outputs correctly to the console.

Thank you!

Problems with subgroup when having total() as banner

I am having a problem with subgroups when I set total() as one of my banners. Please see the code below for more details:

mtcars %>%
  tab_cells(mpg) %>%
  tab_cols(total(), gear) %>%
  tab_subtotal_cols(1:2, 3:4, "5 and more" = greater(4)) %>%
  tab_stat_mean() %>%
  tab_pivot()

The results are a bit strange:

| | | #Total | 2 | TOTAL #Total/2 | 3 | 4 | TOTAL 3/4 | 5 and more | Number of forward gears | | | | |

One Two TOTAL One/Two Three Four
Miles/(US) gallon Mean 20.1 20.1 16.1 24.5
              |      |            |
TOTAL Three/Four Five 5 and more
         19.9 | 21.4 |       21.4 |

but when I remove total() as banner, it looks totally fine, as follows:

mtcars %>%
  tab_cells(mpg) %>%
  tab_cols(gear) %>%
  tab_subtotal_cols(1:2, 3:4, "5 and more" = greater(4)) %>%
  tab_stat_mean() %>%
  tab_pivot()

Thanks so much in advance.

Error for any command in the package

I continuously get this error when I try to use expss package in the way described in the Tables with Labels document. The error is as follows:

Error in setalloccol(ans) : could not find function "setalloccol"

This is happening after the "cro" command and others. Even re-installed the package and it still occurs.

Mix cases & stats in same table

Hello,

I'm trying to mix stat_cases and stat_mean / sd in the same table. The output is OK, but the cases are shown with the same number of decimals as the mean and sd.

Is it possible to get the same table, but with 0 decimals for stat_cases and one decimal for the mean and sd?

I tried "expss_options" and "expss.options", but I think I put the argument in the wrong place...

P28 <- sample(c(1:10,98,99), 625, replace=TRUE)
P32 <- sample(1:5, 625, replace=TRUE)
data <- data.frame(P28, P32)
data %>%
tab_cells(P28) %>%
tab_cols(total(), P32) %>%
tab_stat_cases() %>%
tab_cells(P28=na_if(P28, gt(10))) %>%
tab_stat_fun("Media" = w_mean, "Sd" = w_sd) %>%
tab_pivot(stat_position = "inside_rows")

Thanks in advance

cro_rpct reports column cases instead of row cases

Hi Gregory,

I wanted to create a crosstable with row percents. I expected to get a table with the number of cases for each row value, but only the cases for the column variables are reported. Thus there is no information about the number of observations in each row.

cro_rpct(mtcars$am, list(mtcars$gear, total()))

So I tried to create a custom table, but I struggle to calculate the row sums of the valid cases.
The total column gives the total number of cases including missing values:

mtcars$gear[c(1,3,7)] <- NA 
mtcars$am[c(4,5)] <- NA

mtcars %>%
  tab_cells(am) %>%
  tab_cols(gear, total()) %>%
  tab_stat_rpct() %>%
  tab_stat_cases() %>%
  tab_pivot(stat_position = "inside_columns")

I would expect to have valid row counts by default in cro_rpct
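
For comparison, the table being asked for can be sketched in base R: row percentages together with the number of valid cases per row, where a case is valid only if both variables are non-missing. This illustrates the expected numbers and is not a replacement for cro_rpct.

```r
data(mtcars)
mtcars$gear[c(1, 3, 7)] <- NA
mtcars$am[c(4, 5)] <- NA

# table() drops cases with NA in either variable by default.
counts <- table(am = mtcars$am, gear = mtcars$gear)

row_pct <- prop.table(counts, margin = 1) * 100  # row percentages
valid_n <- rowSums(counts)                       # valid cases per row

result <- cbind(round(row_pct, 1), valid_n = valid_n)
result
```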

Table rendering question

Hello Greg,

I am using R studio 1.1.456 & R 3.5.1.

I tried to repeat your example (cro(mtcars$am, mtcars$vs)) after installing htmlTable, data.table, expss and loading them. But the rendering output (as below) looks very different from yours. Could you please help? Thank you.

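 |              |              | Engine   |                 |
 |              |              | V-engine | Straight engine |
 | ------------ | ------------ | -------- | --------------- |
 | Transmission |    Automatic |       12 |               7 |
 |              |       Manual |        6 |               7 |
 |              | #Total cases |       18 |              14 |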

tables: Crosstable with stacked colpct and mean returns additional columns when labels are set

Hi,

first of all: thank you for your work on expss - I recently spent some time making lots of tables and the table functions are just amazing!

I noticed something when I tried to make a table that stacks column percents of some variables and the mean (or other summary statistics) of other variables. There seems to be a problem with the labels. In short: when the column variable has value labels, the mean occupies its own columns (see below). I made a reprex with two small datasets - can you reproduce this? (I do not think this is similar to #3, but I don't know exactly.)

Thanks a lot for your work!
Best, G

library(expss) # expss 0.9.1
library(magrittr)

### Some Data without Labels
df.nolab <- structure(list(gndr = c(1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L), age_r6 = c(3, 3, 2, 
1, 3, 2, 2, 1, 3, 2, 3, 5, 5, 2, 2, 4, 1, 3, 5, 4), age = c(40L, 
40L, 33L, 24L, 43L, 33L, 38L, 27L, 44L, 38L, 42L, 61L, 65L, 30L, 
39L, 51L, 24L, 47L, 61L, 59L), cluster = c(5L, 5L, 1L, 1L, 1L, 
1L, 4L, 5L, 4L, 1L, 1L, 1L, 5L, 3L, 3L, 1L, 4L, 4L, 4L, 1L)), row.names = c(NA, 
-20L), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"), na.action = structure(c(`1` = 1L, 
`2` = 2L, `24` = 24L, `32` = 32L, `45` = 45L, `47` = 47L, `64` = 64L, 
`71` = 71L, `72` = 72L, `79` = 79L, `99` = 99L, `102` = 102L, 
`116` = 116L, `126` = 126L, `127` = 127L, `132` = 132L, `138` = 138L, 
`149` = 149L, `150` = 150L, `151` = 151L, `159` = 159L, `161` = 161L, 
`165` = 165L, `174` = 174L, `176` = 176L, `188` = 188L, `192` = 192L, 
`196` = 196L, `203` = 203L, `217` = 217L, `226` = 226L, `249` = 249L, 
`272` = 272L, `275` = 275L, `276` = 276L, `287` = 287L, `288` = 288L, 
`293` = 293L, `301` = 301L, `305` = 305L, `309` = 309L, `322` = 322L, 
`334` = 334L, `336` = 336L, `352` = 352L, `356` = 356L, `361` = 361L, 
`365` = 365L, `369` = 369L, `374` = 374L, `378` = 378L, `389` = 389L, 
`398` = 398L, `401` = 401L, `414` = 414L, `415` = 415L, `418` = 418L, 
`420` = 420L, `432` = 432L, `452` = 452L, `456` = 456L, `471` = 471L, 
`478` = 478L, `480` = 480L, `494` = 494L, `505` = 505L, `521` = 521L, 
`552` = 552L, `556` = 556L, `567` = 567L, `571` = 571L), class = "omit"))

### Same Data with Variable and Value Labels
df.lab <- structure(list(gndr = c(1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L), age_r6 = c(3, 3, 2, 
1, 3, 2, 2, 1, 3, 2, 3, 5, 5, 2, 2, 4, 1, 3, 5, 4), age = c(40L, 
40L, 33L, 24L, 43L, 33L, 38L, 27L, 44L, 38L, 42L, 61L, 65L, 30L, 
39L, 51L, 24L, 47L, 61L, 59L), cluster = structure(c(5L, 5L, 
1L, 1L, 1L, 1L, 4L, 5L, 4L, 1L, 1L, 1L, 5L, 3L, 3L, 1L, 4L, 4L, 
4L, 1L), labels = c(Bla1 = 1, Bla2 = 2, Bla3 = 3, Bla4 = 4, Bla5 = 5
), label = "Type")), row.names = c(NA, -20L), na.action = structure(c(`1` = 1L, 
`2` = 2L, `24` = 24L, `32` = 32L, `45` = 45L, `47` = 47L, `64` = 64L, 
`71` = 71L, `72` = 72L, `79` = 79L, `99` = 99L, `102` = 102L, 
`116` = 116L, `126` = 126L, `127` = 127L, `132` = 132L, `138` = 138L, 
`149` = 149L, `150` = 150L, `151` = 151L, `159` = 159L, `161` = 161L, 
`165` = 165L, `174` = 174L, `176` = 176L, `188` = 188L, `192` = 192L, 
`196` = 196L, `203` = 203L, `217` = 217L, `226` = 226L, `249` = 249L, 
`272` = 272L, `275` = 275L, `276` = 276L, `287` = 287L, `288` = 288L, 
`293` = 293L, `301` = 301L, `305` = 305L, `309` = 309L, `322` = 322L, 
`334` = 334L, `336` = 336L, `352` = 352L, `356` = 356L, `361` = 361L, 
`365` = 365L, `369` = 369L, `374` = 374L, `378` = 378L, `389` = 389L, 
`398` = 398L, `401` = 401L, `414` = 414L, `415` = 415L, `418` = 418L, 
`420` = 420L, `432` = 432L, `452` = 452L, `456` = 456L, `471` = 471L, 
`478` = 478L, `480` = 480L, `494` = 494L, `505` = 505L, `521` = 521L, 
`552` = 552L, `556` = 556L, `567` = 567L, `571` = 571L), class = "omit"), class = c("rowwise_df", 
"tbl_df", "tbl", "data.frame"))


# Same code for a table with stacked column percent of 2 variables and the mean of another variable


# Without Labels this works - all numbers are within the columns
tab.nolab <- df.nolab %>% 
  tab_cells(gndr, age_r6) %>% 
  tab_cols(cluster, total(label = "Total")) %>% 
  tab_stat_cpct(total_label = "n") %>% 
  tab_cells(age) %>% 
  tab_stat_mean(label = "Ø") %>% 
  tab_stat_valid_n(label = "#n") %>% 
  tab_pivot()
tab.nolab

# Result:

# |        |    | cluster |       |      |    | Total |
# |        |    |       1 |     3 |    4 |  5 |       |
# | ------ | -- | ------- | ----- | ---- | -- | ----- |
# |   gndr |  1 |    55.6 |       | 40.0 | 50 |    45 |
# |        |  2 |    44.4 | 100.0 | 60.0 | 50 |    55 |
# |        | #n |     9.0 |   2.0 |  5.0 |  4 |    20 |
# | age_r6 |  1 |    11.1 |       | 20.0 | 25 |    15 |
# |        |  2 |    33.3 | 100.0 | 20.0 |    |    30 |
# |        |  3 |    22.2 |       | 40.0 | 50 |    30 |
# |        |  4 |    22.2 |       |      |    |    10 |
# |        |  5 |    11.1 |       | 20.0 | 25 |    15 |
# |        | #n |     9.0 |   2.0 |  5.0 |  4 |    20 |
# |    age |  Ø |    42.7 |  34.5 | 42.8 | 43 |    42 |
# |        | #n |     9.0 |   2.0 |  5.0 |  4 |    20 |



# With Labels I can't make it work - the mean gets its own columns (see below)

tab.lab <- df.lab %>% 
  tab_cells(gndr, age_r6) %>% 
  tab_cols(cluster, total(label = "Total")) %>% 
  tab_stat_cpct(total_label = "n") %>% 
  tab_cells(age) %>% 
  tab_stat_mean(label = "Ø") %>% 
  tab_stat_valid_n(label = "#n") %>% 
  tab_pivot()
tab.lab


## Result:

# |        |    | Type |      |      |      |      | Total | Type |      |      |    |
# |        |    | Bla1 | Bla2 | Bla3 | Bla4 | Bla5 |       |    1 |    3 |    4 |  5 |
# | ------ | -- | ---- | ---- | ---- | ---- | ---- | ----- | ---- | ---- | ---- | -- |
# |   gndr |  1 | 55.6 |      |      |   40 |   50 |    45 |      |      |      |    |
# |        |  2 | 44.4 |      |  100 |   60 |   50 |    55 |      |      |      |    |
# |        | #n |  9.0 |      |    2 |    5 |    4 |    20 |      |      |      |    |
# | age_r6 |  1 | 11.1 |      |      |   20 |   25 |    15 |      |      |      |    |
# |        |  2 | 33.3 |      |  100 |   20 |      |    30 |      |      |      |    |
# |        |  3 | 22.2 |      |      |   40 |   50 |    30 |      |      |      |    |
# |        |  4 | 22.2 |      |      |      |      |    10 |      |      |      |    |
# |        |  5 | 11.1 |      |      |   20 |   25 |    15 |      |      |      |    |
# |        | #n |  9.0 |      |    2 |    5 |    4 |    20 |      |      |      |    |
# |    age |  Ø |      |      |      |      |      |    42 | 42.7 | 34.5 | 42.8 | 43 |
# |        | #n |      |      |      |      |      |    20 |  9.0 |  2.0 |  5.0 |  4 |

table size issues

Hi, I am trying to use the expss package but I can't seem to adjust the table size. I tried your first example but my table looks like this:

(screenshot of the rendered table preview)

How can I make this table smaller? I'm using rmarkdown, and my setup is:

library(knitr)
knitr::opts_chunk$set(echo = TRUE)

Assign zero for blank cells & Total percent

Hi Mr. Gregory,

When we generate a crosstab, we may see some blank cells. Is it possible to assign zero to these blank cells?
Also, I couldn't find a function designed to put "Total percent" in the table instead of "#Total cases". I would like to show "Total percent" when the results are in percents. Is there any way to put this line in the table?

Here is your code, which I used as an example.

library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (lb/1000)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

mtcars %>%
  tab_prepend_values %>%
  tab_cols(total(), vs, am) %>%
  tab_cells(cyl, gear) %>%
  tab_stat_cpct() %>%
  tab_pivot()

Here is the output:

(screenshot of the output table with empty cells)

Thanks in advance,

Have a nice week.
Zeliha
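
One possible way to handle the blank cells, sketched in base R. It assumes that the table returned by tab_pivot() behaves like an ordinary data.frame in which blank cells are stored as NA. The small tbl below is a made-up stand-in with invented numbers, not real output.

```r
# Hypothetical stand-in for a pivot table: a label column plus numeric columns,
# where NA marks a cell that would be displayed as blank
tbl <- data.frame(
  row_labels = c("A", "B", "C"),
  col1 = c(50, NA, 50),
  col2 = c(NA, 100, NA),
  stringsAsFactors = FALSE
)

# Replace NA with 0 in the numeric columns only, leaving the labels untouched
numeric_cols <- vapply(tbl, is.numeric, logical(1))
tbl[numeric_cols] <- lapply(tbl[numeric_cols], function(x) {
  x[is.na(x)] <- 0
  x
})
print(tbl)
```

Restricting the replacement to numeric columns keeps the row labels as character data.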

Error to re-run code from examples

Getting the following error:

Error in setalloccol(ans): could not find function "setalloccol"
Traceback:

  1. mtcars %>% calc_cro_cpct(cyl, list(total(), am %nest% vs))
  2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3. eval(quote(`_fseq`(`_lhs`)), env, env)
  4. eval(quote(`_fseq`(`_lhs`)), env, env)
  5. `_fseq`(`_lhs`)
  6. freduce(value, `_function_list`)
  7. withVisible(function_list[[k]](value))
  8. function_list[[k]](value)
  9. calc_cro_cpct(., cyl, list(total(), am %nest% vs))
  10. calculate_internal(data, expr = expr, parent = parent.frame())
  11. eval(expr, envir = e, enclos = baseenv())
  12. eval(expr, envir = e, enclos = baseenv())
  13. cro_cpct(cell_vars = cyl, col_vars = list(total(), am %nest%
    . vs), row_vars = NULL, weight = NULL, subgroup = NULL, total_label = NULL,
    . total_statistic = "u_cases", total_row_position = c("below",
    . "above", "none"))
  14. multi_cro(cell_vars = cell_vars, col_vars = col_vars, row_vars = row_vars,
    . weight = weight, subgroup = subgroup, total_label = total_label,
    . total_statistic = total_statistic, total_row_position = total_row_position,
    . stat_type = "cpct")
  15. lapply(row_vars, function(each_row_var) {
    . res = lapply(cell_vars, function(each_cell_var) {
    . all_col_vars = lapply(col_vars, function(each_col_var) {
    . elementary_cro(cell_var = each_cell_var, col_var = each_col_var,
    . row_var = each_row_var, weight = weight, subgroup = subgroup,
    . total_label = total_label, total_statistic = total_statistic,
    . total_row_position = total_row_position, stat_type = stat_type)
    . })
    . Reduce(merge, all_col_vars)
    . })
    . res = do.call(add_rows, res)
    . })
  16. FUN(X[[i]], ...)
  17. lapply(cell_vars, function(each_cell_var) {
    . all_col_vars = lapply(col_vars, function(each_col_var) {
    . elementary_cro(cell_var = each_cell_var, col_var = each_col_var,
    . row_var = each_row_var, weight = weight, subgroup = subgroup,
    . total_label = total_label, total_statistic = total_statistic,
    . total_row_position = total_row_position, stat_type = stat_type)
    . })
    . Reduce(merge, all_col_vars)
    . })
  18. FUN(X[[i]], ...)
  19. lapply(col_vars, function(each_col_var) {
    . elementary_cro(cell_var = each_cell_var, col_var = each_col_var,
    . row_var = each_row_var, weight = weight, subgroup = subgroup,
    . total_label = total_label, total_statistic = total_statistic,
    . total_row_position = total_row_position, stat_type = stat_type)
    . })
  20. FUN(X[[i]], ...)
  21. elementary_cro(cell_var = each_cell_var, col_var = each_col_var,
    . row_var = each_row_var, weight = weight, subgroup = subgroup,
    . total_label = total_label, total_statistic = total_statistic,
    . total_row_position = total_row_position, stat_type = stat_type)
  22. make_datatable_for_cro(cell_var = cell_var, col_var = col_var,
    . row_var = row_var, weight = weight, subgroup = subgroup)
  23. data.table(cell_var, col_var)
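
The last frame of the traceback is a data.table() call, and the message says setalloccol could not be found. One thing worth checking (an assumption, not a confirmed diagnosis) is whether the installed data.table namespace actually contains that function. A minimal base R sketch follows; the helper name has_namespace_object is made up for illustration.

```r
# Hypothetical helper: TRUE if package `pkg` is installed and its namespace
# defines an object called `name`, FALSE otherwise
has_namespace_object <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  exists(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Check the function named in the error message
print(has_namespace_object("data.table", "setalloccol"))

# Show the installed version, if the package is available at all
if (requireNamespace("data.table", quietly = TRUE)) {
  print(packageVersion("data.table"))
}
```

If the check prints FALSE, comparing the installed data.table version with the one expss expects would be a reasonable next step.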

Bold letters for sig.test

Hi Mr. Gregory,

I need bold letters for the significance tests, so I copied your code below:

tab_cpct_sig = . %>% tab_stat_cpct() %>%
tab_last_sig_cpct(sig_labels = paste0("<-b->",LETTERS, "</-b->"))

P.S. Read the <-b-> and </-b-> tags above as if they were written without the "-" characters.

It works well, but when I export the result into an Excel file, the letters appear together with the <-b-> codes, as you can see in this file:

(screenshot of the exported Excel file)

How can I fix this problem?

All my best,
Zeliha
