Comments (10)
Hi @robertogilsaura,
The code below gives me the same result as in your example:
library(expss)
data <- read_spss("prueba.sav")
data %>%
    tab_cols(total()) %>%
    tab_cells(VAR2) %>%
    tab_stat_cases() %>%
    tab_pivot()
library(expss)
data <- read_spss("prueba.sav")
# first variable is gender, second is group
count_groups = function(var_group){
    # take unique records so a gender is not counted several times within the same group
    var_group = unique(var_group)
    # give the group variable the same label as the first variable so the total lines up in the table
    var_lab(var_group[[2]]) = var_lab(var_group[[1]])
    # the total is calculated separately because it has to count distinct groups
    rbind(
        # count cases
        cro(list(var_group[[1]]),
            total_row_position = "none"),
        # calculate total
        cro(
            total(
                unique(var_group[[2]]), label = "#Total"
            ),
            total_row_position = "none"
        )
    )
}
data %>%
    tab_cols(total()) %>%
    tab_cells(data.frame(VAR2, VAR3)) %>%
    tab_stat_fun_df(count_groups) %>%
    tab_pivot()
# | | | #Total |
# | ------ | ------ | ------ |
# | Gender | male | 3 |
# | | female | 2 |
# | | #Total | 3 |
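The counting logic in count_groups can be checked without expss. The sketch below uses base R only and a small made-up data frame (the gender and group columns and their values are hypothetical, they are not taken from prueba.sav): each gender is counted once per group, and the total is the number of distinct groups.

```r
# Hypothetical toy data, invented for illustration only
df <- data.frame(
    gender = c("male", "male", "female", "male", "female", "male"),
    group  = c(1, 1, 1, 2, 2, 3)
)

# keep one row per (gender, group) pair
pairs <- unique(df)

# number of groups in which each gender appears
by_gender <- table(pairs$gender)

# number of distinct groups, used as the total
n_groups <- length(unique(df$group))

print(by_gender)
print(n_groups)
```

With this toy data the total is smaller than the sum of the rows, because a group containing both genders is counted once in the total but once per gender in the rows.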
from expss.
Hi, @gdemin.
The code runs properly. I have tested it with my real data frame (10423 records in 2559 groups) and the output is the same as with my old software. I also tested it with other variables in the columns, and the output is correct there too.
Thanks for the excellent package, but above all for the excellent attention and personal support.
from expss.
Hi, @robertogilsaura,
Just out of curiosity: what is the name of your old software? Your task looks very specific to me. In the past I made all my tables with SPSS, and I don't know how to calculate such tables easily with it without additional aggregation.
from expss.
Hello again @gdemin
I need a variation on the code you gave me before, but I have not been able to get it to work.
I need a table where the maximum value of one variable is taken within each group, and the table then shows the sum of those maximums broken down by another variable.
In this case, the data are students who attend courses in different centers. There are two centers and each center has 2 classrooms. To know the number of students that have been trained, I need to take the maximum for each classroom, because the same classroom in the same center can have one session attended by 100% of the students and another attended by only 80%.
Here is my code to reproduce the problem. The output is not what I want.
library(expss)
centro <- c(1,1,1,1,2,2,2)
aula <- c("1a","1a","1b","1b","2a","2b","2a")
alumnos <- c(50,50,25,25,100,10,80)
data <- data.frame(centro,aula,alumnos)
bsum.imax = function(dfs) # between groups sum, intragroups max
{
    # dfs - data.frame
    # first column - group id (aula in our case)
    # second column - value (alumnos in our case)
    # columns are referenced by number because their names (labels) are not known until runtime
    setNames(sum(max(dfs[[2]]), na.rm = TRUE), colnames(dfs)[2]) # here we set name on the result
}
data %>%
    tab_cols(total(), centro) %>%
    tab_cells(data.frame(aula,alumnos)) %>% # note the data.frame with two variables
    tab_stat_fun_df(bsum.imax) %>%
    tab_pivot() %>%
    drop_rc() %>%
    t(.)
The output is:
| | | alumnos |
| ------ | -- | ------- |
| #Total | | 100 |
| centro | 1 | 50 |
| | 2 | 100 |
But my desired output is the following (max 1a (50) + max 1b (25) = 75, max 2a (100) + max 2b (10) = 110):
| | | alumnos |
| ------ | -- | ------- |
| #Total | | 185 |
| centro | 1 | 75 |
| | 2 | 110 |
I have read the documentation for the "tables" functions and I have not found a way to perform the calculation with tab_stat_fun_df() or with another function.
Thanks in advance.
Robert
from expss.
I found a solution, but I don't know whether it is the most appropriate one. I think it would cause problems with multiple variables in the table.
library(expss)
library(dplyr) # needed for group_by() and summarise()
centro <- c(1,1,1,1,2,2,2)
aula <- c("1a","1a","1b","1b","2a","2b","2a")
alumnos <- c(50,50,25,25,100,10,80)
data <- data.frame(centro,aula,alumnos)
# aggregate first: one row per classroom with its maximum attendance
dfs <- data %>% group_by(centro,aula)
dfs <- dfs %>% summarise(alumnos=max(alumnos))
dfs %>%
    tab_cols(total(), centro) %>%
    tab_cells(alumnos) %>%
    tab_stat_sum() %>%
    tab_pivot() %>%
    drop_rc() %>%
    t(.)
Output:
| | | alumnos |
| | | Sum |
| ------ | -- | ------- |
| #Total | | 185 |
| centro | 1 | 75 |
| | 2 | 110 |
Any suggestions for improvement?
Thanks in advance.
Robert.
from expss.
Hi, @robertogilsaura
In your original version you take a single maximum across all groups. We need to calculate the maximum of alumnos for each aula inside bsum.imax:
library(expss)
centro <- c(1,1,1,1,2,2,2)
aula <- c("1a","1a","1b","1b","2a","2b","2a")
alumnos <- c(50,50,25,25,100,10,80)
data <- data.frame(centro,aula,alumnos)
bsum.imax = function(dfs) # between groups sum, intragroups max
{
    # dfs - data.frame
    # first column - group id (aula in our case)
    # second column - value (alumnos in our case)
    # columns are referenced by number because their names (labels) are not known until runtime
    maxes = tapply(dfs[[2]], dfs[[1]], FUN = max, na.rm = TRUE) # get the maximum from each group
    setNames(sum(maxes, na.rm = TRUE), colnames(dfs)[2]) # here we set name on the result
}
data %>%
    tab_cols(total(), centro) %>%
    tab_cells(data.frame(aula,alumnos)) %>% # note the data.frame with two variables
    tab_stat_fun_df(bsum.imax) %>%
    tab_pivot() %>%
    drop_rc() %>%
    t(.)
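To see what the tapply() step contributes, the same arithmetic can be reproduced in base R without expss. This is only a sketch for checking the numbers; the variable names maxes, total and by_centro are introduced here for illustration.

```r
centro  <- c(1, 1, 1, 1, 2, 2, 2)
aula    <- c("1a", "1a", "1b", "1b", "2a", "2b", "2a")
alumnos <- c(50, 50, 25, 25, 100, 10, 80)

# maximum attendance per classroom: 1a = 50, 1b = 25, 2a = 100, 2b = 10
maxes <- tapply(alumnos, aula, FUN = max, na.rm = TRUE)

# sum of the per-classroom maximums over all centers
total <- sum(maxes)

# the same calculation restricted to the rows of each center
by_centro <- tapply(seq_along(alumnos), centro, function(i) {
    sum(tapply(alumnos[i], aula[i], FUN = max, na.rm = TRUE))
})

print(total)      # 185
print(by_centro)  # 75 for center 1, 110 for center 2
```

These values match the desired table: 185 in the total row, 75 for center 1 and 110 for center 2.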
from expss.
Hi @gdemin,
Thank you very much for your input. I think your solution is more appropriate, as it avoids loading dplyr. However, I have seen that the run time on large datasets (about 10,000 records) is considerable.
Anyway, it is true that these types of tables are not very common, so in normal datasets, I prefer not to load supplementary packages.
Thank you very much again.
from expss.
If performance is an issue we can use data.table magic:
bsum.imax = function(dfs) # between groups sum, intragroups max
{
    # dfs - first column is the group id (aula in our case), second column is the value (alumnos in our case)
    # the dfs[, ..., by = ...] call below relies on dfs being a data.table
    # columns are referenced by number because their names (labels) are not known until runtime
    varname = colnames(dfs)[1]
    label = colnames(dfs)[2]
    maxes = dfs[, lapply(.SD, max, na.rm = TRUE), by = eval(varname)][[2]] # get the vector with maximums from each group
    setNames(sum(maxes, na.rm = TRUE), label) # here we set name on the result
}
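The grouped maximum can also be tried on its own, outside expss. The sketch below assumes the data.table package is installed and builds a small data.table by hand (the object names dt, maxes and total are illustrative); it uses the same lapply(.SD, ...) idiom with by as the function above.

```r
library(data.table)

dt <- data.table(
    aula    = c("1a", "1a", "1b", "1b", "2a", "2b", "2a"),
    alumnos = c(50, 50, 25, 25, 100, 10, 80)
)

# .SD holds every column except the grouping column, so this returns
# one row per aula with the maximum of alumnos
maxes <- dt[, lapply(.SD, max, na.rm = TRUE), by = "aula"]

# the second column contains the per-classroom maximums
total <- sum(maxes[[2]], na.rm = TRUE)

print(maxes)
print(total)  # 185
```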
from expss.
Wow, magic, magic!!! The performance has improved a lot.
Run time dropped from 135 seconds with the tapply() version to 5 seconds with the data.table version (lapply() over .SD).
Thank you very much again. I have a lot to understand and still learn.
from expss.