Comments (10)

gdemin commented on May 26, 2024

Hi @robertogilsaura,
The code below gives me the same result as in your example:

library(expss)
data <- read_spss("prueba.sav")
data %>%
  tab_cols(total()) %>%
  tab_cells(VAR2) %>%
  tab_stat_cases() %>%
  tab_pivot()

  library(expss)
  data <- read_spss("prueba.sav")
  # first variable is gender, second is group
  count_groups = function(var_group){
    # take unique records so that a gender is not counted multiple times within the same group
    var_group = unique(var_group)
    # give the group the same label as the first variable to position the total in the table
    var_lab(var_group[[2]]) = var_lab(var_group[[1]])
    # calculate the total separately because we need to count distinct groups
    rbind(
      # count cases
      cro(list(var_group[[1]]), 
          total_row_position = "none"), 
      # calculate total
      cro(
        total(
          unique(var_group[[2]]), label = "#Total"
        ), 
        total_row_position = "none"
      )  
    )
  }
  
  data %>%
    tab_cols(total()) %>%
    tab_cells(data.frame(VAR2, VAR3)) %>%
    tab_stat_fun_df(count_groups) %>%
    tab_pivot()
  # |        |        | #Total |
  # | ------ | ------ | ------ |
  # | Gender |   male |      3 |
  # |        | female |      2 |
  # |        | #Total |      3 |

from expss.

robertogilsaura commented on May 26, 2024

Hi, @gdemin.

The code runs properly. I have tested it with my real data frame (10423 records with 2559 groups), and the output is the same as with my old software. I also tested it with other variables in the columns, and the output is OK too.

Thanks for the excellent package, but above all for the excellent attention and personal support.

gdemin commented on May 26, 2024

Hi, @robertogilsaura,
Just out of pure curiosity: what is the name of your old software? Your task looks very specific to me. In the past I made all my tables with SPSS, and I don't know how to easily calculate such tables with it without additional aggregation.

robertogilsaura commented on May 26, 2024

robertogilsaura commented on May 26, 2024

Hello again @gdemin

I need to make a variation on the code you gave me before, but I have not been able to get it working.

I need a table where the maximum value of one variable is taken within each group, and the table then shows the sum of those maximums broken down by another variable.

In this case, they are students who attend courses in different centers. There are two centers, and each center has two classrooms. To know the number of students that have been trained, I need to take the maximum for each classroom, because the same classroom in the same center can have one session attended by 100% of its students and another attended by only 80%.

My code to reproduce this is below. The output is not what I want.

library(expss)
centro <- c(1,1,1,1,2,2,2)
aula <-c ("1a","1a","1b","1b","2a","2b","2a")
alumnos <- c(50,50,25,25,100,10,80)
data <- data.frame(centro,aula,alumnos)

bsum.imax = function(dfs) #between groups sum, intragroups max
    {
    # dfs - data.frame
    # first column - value
    # all other columns - object idgroup, it will be centroaula in our case
    # we should reference data.frame column by number because at runtime it will be unknown labels of the variables
    setNames(sum(max(dfs[[2]]), na.rm = TRUE), colnames(dfs)[2]) # here we set name on the result
    }
data %>%
    tab_cols(total(), centro) %>%
    tab_cells(data.frame(aula,alumnos)) %>% # note the data.frame with two variables
    tab_stat_fun_df(bsum.imax) %>%
    tab_pivot() %>% 
    drop_rc() %>% 
    t(.)

The output is:

 |        |    | alumnos |
 | ------ | -- | ------- |
 | #Total |    |     100 |
 | centro |  1 |      50 |
 |        |  2 |     100 |

But my desired output is (max1a(50) + max1b(25) = 75, max2a(100) + max2b(10) = 110):

 |        |    | alumnos |
 | ------ | -- | ------- |
 | #Total |    |     185 |
 | centro |  1 |      75 |
 |        |  2 |     110 |

I have read the "tables" help page and have not been able to find a way to perform the calculation with tab_stat_fun_df() or with another function.

Thanks in advance.
Robert

robertogilsaura commented on May 26, 2024

I found a solution, but I don't know whether it is the most appropriate one. I think it would cause problems with multiple variables in the table.

library(expss)
centro <- c(1,1,1,1,2,2,2)
aula <-c ("1a","1a","1b","1b","2a","2b","2a")
alumnos <- c(50,50,25,25,100,10,80)
data <- data.frame(centro,aula,alumnos)

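# group_by() and summarise() come from dplyr; they are called with :: so nothing from expss gets masked
dfs <- data %>% dplyr::group_by(centro, aula)
dfs <- dfs %>% dplyr::summarise(alumnos = max(alumnos))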

dfs %>%
    tab_cols(total(), centro) %>%
    tab_cells(alumnos) %>%
    tab_stat_sum() %>%
    tab_pivot() %>% 
    drop_rc() %>% 
    t(.)

Output ...

 |        |    | alumnos |
 |        |    |     Sum |
 | ------ | -- | ------- |
 | #Total |    |     185 |
 | centro |  1 |      75 |
 |        |  2 |     110 |

Any suggestions for improvement?

Thanks in advance.
Robert.

gdemin commented on May 26, 2024

Hi, @robertogilsaura
In your original version you take only one maximum across all groups. We need to calculate the maximum of alumnos for each aula inside bsum.imax:

library(expss)
centro <- c(1,1,1,1,2,2,2)
aula <-c ("1a","1a","1b","1b","2a","2b","2a")
alumnos <- c(50,50,25,25,100,10,80)
data <- data.frame(centro,aula,alumnos)

bsum.imax = function(dfs) #between groups sum, intragroups max
{
    # dfs - data.frame
    # first column - group id (aula in our case)
    # second column - value (alumnos)
    # we reference columns by number because at runtime their names are not known in advance (they may be replaced by variable labels)
    maxes = tapply(dfs[[2]], dfs[[1]], FUN = max, na.rm = TRUE) # get the maximum from each group
    setNames(sum(maxes, na.rm = TRUE), colnames(dfs)[2]) # here we set name on the result
}
data %>%
    tab_cols(total(), centro) %>%
    tab_cells(data.frame(aula,alumnos)) %>% # note the data.frame with two variables
    tab_stat_fun_df(bsum.imax) %>%
    tab_pivot() %>% 
    drop_rc() %>% 
    t(.)
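
As a sanity check on the numbers, here is a minimal base R sketch of the same two-step calculation outside of expss: take the maximum of alumnos per classroom, then sum those maximums per center. It assumes each aula belongs to exactly one centro, as in the toy data; the names `toy`, `per_aula`, `per_centro` and `total` are only illustrative and are not part of expss.

```r
# Toy data from this thread: two centers with two classrooms each.
toy <- data.frame(
  centro  = c(1, 1, 1, 1, 2, 2, 2),
  aula    = c("1a", "1a", "1b", "1b", "2a", "2b", "2a"),
  alumnos = c(50, 50, 25, 25, 100, 10, 80)
)

# Step 1: maximum attendance within each classroom (one row per centro/aula pair).
per_aula <- aggregate(alumnos ~ centro + aula, data = toy, FUN = max)

# Step 2: sum the classroom maximums within each center.
per_centro <- aggregate(alumnos ~ centro, data = per_aula, FUN = sum)

# Step 3: overall total across all classrooms.
total <- sum(per_aula$alumnos)

print(per_centro)
print(total)
```

This gives 75 for center 1, 110 for center 2 and 185 overall, which are the figures in the desired table, so it can serve as a reference when checking what bsum.imax returns.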

robertogilsaura commented on May 26, 2024

Hi @gdemin,

Thank you very much for your input. I think your solution is more appropriate, as it avoids loading dplyr. However, I have seen that the running time on large datasets (about 10,000 records) is considerable.

Anyway, it is true that these types of tables are not very common, so in normal datasets, I prefer not to load supplementary packages.

Thank you very much again.

gdemin commented on May 26, 2024

If performance is an issue we can use data.table magic:

bsum.imax = function(dfs) #between groups sum, intragroups max
{
    # dfs - data.frame
    # first column - group id (aula in our case)
    # second column - value (alumnos)
    # we reference columns by number because at runtime their names are not known in advance (they may be replaced by variable labels)
    varname = colnames(dfs)[1]
    label = colnames(dfs)[2]
    maxes = dfs[, lapply(.SD, max, na.rm = TRUE), by = eval(varname)][[2]] # get the vector with maximums from each group
    setNames(sum(maxes, na.rm = TRUE), label) # here we set name on the result
}
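
To try the grouped-maximum step on its own, here is a small sketch outside of expss. It assumes the data.table package is installed and builds the table by hand, whereas inside tab_stat_fun_df the object is supplied by expss; the names `dt`, `group_col` and `maxes` are illustrative only.

```r
library(data.table)

# Toy data from this thread: classroom id first, attendance second.
dt <- data.table(
  aula    = c("1a", "1a", "1b", "1b", "2a", "2b", "2a"),
  alumnos = c(50, 50, 25, 25, 100, 10, 80)
)

# Name of the grouping column, taken by position as in bsum.imax.
group_col <- colnames(dt)[1]

# Grouped maximum of every non-grouping column: one row per classroom.
maxes <- dt[, lapply(.SD, max, na.rm = TRUE), by = group_col]

# Sum of the per-classroom maximums.
total <- sum(maxes[[2]], na.rm = TRUE)
print(maxes)
print(total)
```

On the toy data this yields four rows (one per aula) and a total of 185.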

robertogilsaura commented on May 26, 2024

Wow magic, magic!!! the performance has improved a lot.

The running time dropped from 135 seconds with tapply() to 5 seconds with the data.table lapply() version.

Thank you very much again. I have a lot to understand and still learn.
