Tabulation for scale variables (expss issue, closed)

gdemin commented on May 26, 2024
Tabulation for scale variables

from expss.

Comments (14)

robertogilsaura commented on May 26, 2024

Hi Zeliha...
You can test this code...

library(expss)
value <- sample(1:10, 625, replace=TRUE)
gender <- sample(1:2, 625, replace=TRUE)
data <- data.frame(value,gender)
var_lab(data$value) = 'Number of points'
var_lab(data$gender)= 'Gender'
val_lab(data$gender)= c ('male'=1, 'female'=2)
data %>%
    tab_cells(value) %>%
    tab_cols(total(), gender) %>%
    tab_stat_cpct() %>%
    tab_cells(value=na_if(value, gt(10))) %>%
    tab_stat_mean_sd_n() %>%
    tab_pivot()
The output is:
|                  |              | #Total | Gender |        |
|                  |              |        |   male | female |
| ---------------- | ------------ | ------ | ------ | ------ |
| Number of points |            1 |   10.2 |    9.1 |   11.5 |
|                  |            2 |   11.0 |   10.3 |   11.8 |
|                  |            3 |    9.3 |    9.1 |    9.5 |
|                  |            4 |    9.4 |   11.6 |    7.2 |
|                  |            5 |   10.9 |    9.4 |   12.5 |
|                  |            6 |    9.6 |   10.0 |    9.2 |
|                  |            7 |   10.1 |    9.7 |   10.5 |
|                  |            8 |   12.5 |   13.4 |   11.5 |
|                  |            9 |    8.2 |    8.4 |    7.9 |
|                  |           10 |    8.8 |    9.1 |    8.5 |
|                  | #Total cases |  625.0 |  320.0 |  305.0 |
|                  |         Mean |    5.4 |    5.5 |    5.3 |
|                  |    Std. dev. |    2.8 |    2.8 |    2.9 |
|                  | Unw. valid N |  625.0 |  320.0 |  305.0 |

If you test the significance of the means...

data %>%
    tab_cells(value) %>%
    tab_cols(total(), gender) %>%
    tab_stat_cpct() %>%
    tab_cells(value=na_if(value, gt(10))) %>%
    tab_stat_mean_sd_n() %>%
    tab_last_sig_means() %>% 
    tab_pivot()

The output is ...

|                  |              | #Total | Gender |        |       |        |
|                  |              |        |   male | female |  male | female |
|                  |              |        |        |        |     A |      B |
| ---------------- | ------------ | ------ | ------ | ------ | ----- | ------ |
| Number of points |            1 |  10.24 |    9.1 |   11.5 |       |        |
|                  |            2 |  11.04 |   10.3 |   11.8 |       |        |
|                  |            3 |   9.28 |    9.1 |    9.5 |       |        |
|                  |            4 |   9.44 |   11.6 |    7.2 |       |        |
|                  |            5 |  10.88 |    9.4 |   12.5 |       |        |
|                  |            6 |    9.6 |   10.0 |    9.2 |       |        |
|                  |            7 |  10.08 |    9.7 |   10.5 |       |        |
|                  |            8 |  12.48 |   13.4 |   11.5 |       |        |
|                  |            9 |   8.16 |    8.4 |    7.9 |       |        |
|                  |           10 |    8.8 |    9.1 |    8.5 |       |        |
|                  | #Total cases |    625 |  320.0 |  305.0 |       |        |
|                  |         Mean |    5.4 |        |        |   5.5 |    5.3 |
|                  |    Std. dev. |    2.8 |        |        |   2.8 |    2.9 |
|                  | Unw. valid N |  625.0 |        |        | 320.0 |  305.0 |

And this is my problem: if I add the t-test, I get two columns for male and two for female...

gdemin commented on May 26, 2024

Hi,
How did you load your data? Generally, it is better to load data without automatic conversion to factors.
In your case you can use as.numeric(factor_variable), and it gives the correct result. But it is not a safe method, because factor codes always start from one. So a variable with values -1 = Disagree, 0 = H/s, 1 = Agree in the SPSS file will become 1 = Disagree, 2 = H/s, 3 = Agree in R.
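A small base-R illustration of that shift, using hypothetical data and assuming the factor levels follow the code order (as they do when the factor is built with an explicit `levels =` argument, as here):

```r
# A 3-point item as coded in SPSS: -1 = Disagree, 0 = H/s, 1 = Agree
codes  <- c(-1, 0, 1, 1, -1)
labels <- c("Disagree", "H/s", "Agree")

# A factor keeps only the labels; levels are in code order here
f <- factor(codes, levels = c(-1, 0, 1), labels = labels)

as.numeric(f)        # 1 2 3 3 1: level positions, not the SPSS codes
mean(as.numeric(f))  # 2, shifted by +2 from the true mean

# Getting the original codes back needs an explicit label -> code lookup
code_of <- c("Disagree" = -1, "H/s" = 0, "Agree" = 1)
orig <- unname(code_of[as.character(f)])
mean(orig)           # 0
```

The lookup table is the piece of information a factor throws away, which is why reading the file without factor conversion is the safer route.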

gdemin commented on May 26, 2024

@robertogilsaura
There are two functions exactly for this case: tab_last_round and tab_last_add_sig_labels. It is not obvious, and for now I don't know how to make it better.

data %>%
    tab_cells(value) %>%
    tab_cols(total(), gender) %>%
    tab_stat_cpct() %>%
    tab_cells(value=na_if(value, gt(10))) %>%
    tab_last_round() %>% 
    tab_last_add_sig_labels() %>% 
    tab_stat_mean_sd_n() %>%
    tab_last_sig_means() %>% 
    tab_pivot()

robertogilsaura commented on May 26, 2024

Oops... it works!!!
Excellent... I think it is the solution for @zelihay and me.

Thanks @gdemin ,

zelihay commented on May 26, 2024

Hi to all,

Thanks for your reply.

I use the read.spss function to load the SPSS data: read.spss(file.choose(), to.data.frame=TRUE)

The scale variables already have value labels in the SPSS data, i.e., they are defined as nominal. I guess this is why R sees them as factors.
Yes, as you mentioned, I've changed their types and calculated the means. It definitely gives the correct mean values. Here is my code; it is almost the same as yours. Sorry, I couldn't send the SPSS file from here.

  • S7 is the scale variable.
library(foreign)
library(expss)

A = read.spss(file.choose(), to.data.frame=TRUE)

S7Z <-as.numeric (A$S7)

A = modify(A,{
  var_lab(S7Z) = "Do you like it?"
  var_lab(s2) = "Gender"
        })

set.seed(123)
T2S7ZZ = recode(S7Z, "1" ~ 6, "2" ~ 6, "4" ~ 7,"5" ~ 7)
var_lab(T2S7ZZ) = "T2B"
val_lab(T2S7ZZ) = c("B2B" = 6, "T2B" = 7)

tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("Mean", "sd", "N")) %>% 
                         tab_last_sig_means(sig_labels = LETTERS, keep = "means")

A %>%
  tab_prepend_all %>%
  tab_cols(total(label = "Total"),s2) %>%
  tab_cells(mrset(S7Z,T2S7ZZ)) %>% 
  tab_stat_cpct() %>%
  tab_last_sig_cpct(sig_labels = LETTERS) %>%
  tab_means_sig() %>%
  tab_pivot()%>%
  drop_rc()

Here is the output. My notes on it explain why I need to modify this code.

![scale](https://user-images.githubusercontent.com/35660557/51133649-a108da00-1846-11e9-911e-62107268a7a2.jpeg)



zelihay commented on May 26, 2024

[Screenshot: scale]

zelihay commented on May 26, 2024

Mr. Gregory,

Maybe this gives you an idea.
As you know, in SPSS Custom Tables we can calculate a mean by adding a computed category to a question.

If we calculate means like this, do you think the significance test still works? This is the main problem in SPSS.

Here is the syntax;

CTABLES
/VLABELS VARIABLES=S7 s2 DISPLAY=LABEL
/PCOMPUTE &cat1 = EXPR(([1]*1+[2]*2+[3]*3+[4]*4+[5]*5)/([1]+[2]+[3]+[4]+[5]))
/PPROPERTIES &cat1 LABEL = "Mean" FORMAT=COLPCT.COUNT F40.1 HIDESOURCECATS=NO
/TABLE S7 [C][COLPCT.COUNT F40.1] BY s2 [C]
/CATEGORIES VARIABLES=S7 [1, 2, 3, 4, 5, &cat1, OTHERNM] EMPTY=INCLUDE
/CATEGORIES VARIABLES=s2 ORDER=A KEY=VALUE EMPTY=INCLUDE.
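The PCOMPUTE expression is the count-weighted mean of the category codes, so it equals the ordinary mean of the raw responses in categories 1 to 5 (with column percentages instead of counts the common base cancels and the ratio is the same). A base-R check with hypothetical 5-point data:

```r
# Hypothetical responses on a 5-point scale (codes 1..5)
x <- c(1, 2, 2, 3, 4, 5, 5, 5, 3, 4)

# Category counts: the [1]..[5] terms in the PCOMPUTE expression
n_k <- table(factor(x, levels = 1:5))
k   <- as.numeric(names(n_k))

# ([1]*1 + [2]*2 + ... + [5]*5) / ([1] + ... + [5])
mean_from_counts <- sum(k * n_k) / sum(n_k)

mean_from_counts  # 3.4
mean(x)           # 3.4, the same value
```

In other words, computing the mean directly on the numeric codes gives the same number without the extra computed category.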

gdemin commented on May 26, 2024

It is better to read the SPSS file with read_spss: it doesn't convert variables to factors, and it preserves the original labels.
About the table on your screenshot:

  1. There are two means because you calculate means on 'mrset', which is a two-column structure. We need to specify that we compute the mean on the single variable S7. This can be done with a second tab_cells(S7).
  2. To remove the total in the middle of the table, we can specify that tab_last_sig_cpct keeps only the percents.
  3. But now we need to add the total at the bottom of the table. We can do it with the statistic tab_stat_unweighted_valid_n.
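To see what each part of the fixed table should contain, the same statistics can be computed by hand in base R. This is a sketch with hypothetical simulated data, assuming S7 is coded 1..5 and s2 is coded 1/2; it is not how expss computes them internally:

```r
set.seed(1)
# Hypothetical data: S7 is a 5-point scale coded 1..5, s2 is gender coded 1/2
S7 <- sample(1:5, 200, replace = TRUE)
s2 <- sample(1:2, 200, replace = TRUE)

# Column percentages of each scale point within each gender
counts <- table(factor(S7, levels = 1:5), s2)
cpct   <- 100 * prop.table(counts, margin = 2)

# Nets from the single variable: bottom-2 (codes 1-2) and top-2 (codes 4-5)
b2b <- 100 * tapply(S7 %in% 1:2, s2, mean)
t2b <- 100 * tapply(S7 %in% 4:5, s2, mean)

# One mean on the single variable S7, plus the unweighted valid N per column
means   <- tapply(S7, s2, mean)
valid_n <- tapply(!is.na(S7), s2, sum)

round(rbind(cpct, B2B = b2b, T2B = t2b, Mean = means, N = valid_n), 1)
```

Each net should equal the sum of its scale points' percentages, which is a quick check on any real table.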

Example:

library(expss)

A = read_spss(file.choose())

A = modify(A,{
    var_lab(S7) = "Do you like it?"
    var_lab(s2) = "Gender"
    T2S7ZZ = recode(S7, 1 ~ 6, 2 ~ 6, 4 ~ 7, 5  ~ 7)
    var_lab(T2S7ZZ) = "T2B"
    val_lab(T2S7ZZ) = c("B2B" = 6, "T2B" = 7)
})

tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("Mean", "sd", "N")) %>% 
    tab_last_sig_means(sig_labels = LETTERS, keep = "means")

A %>%
    tab_prepend_all %>%
    tab_cols(total(label = "Total"),s2) %>%
    tab_cells(mrset(S7,T2S7ZZ)) %>% 
    tab_stat_cpct() %>%
    tab_last_sig_cpct(sig_labels = LETTERS, keep = "percent") %>% # keep only percent, drop total row
    tab_cells(S7) %>% # leave only one variable for means
    tab_means_sig() %>%
    tab_stat_unweighted_valid_n(label = "#Total") %>% # add total row
    tab_pivot() %>%
    drop_rc() # Do you really need it? 'drop_rc' remove all empty rows and columns

I didn't know about /PCOMPUTE &cat1 = EXPR(([1]*1+[2]*2+[3]*3+[4]*4+[5]*5)/([1]+[2]+[3]+[4]+[5])). It is a really complex way to calculate a mean. I am sure there should be an easier way.

zelihay commented on May 26, 2024

Hi Mr. Gregory,
You've done quite impressive job :) Thank you so much and thanks for detailed explanation. I have two minor problems now (my questions are in bold, below).

  1. I've run your code and got this output:

[Screenshot: scale]

Is it possible to put row total without adding extra columns?

I want to share my code for percent-only tables. Maybe someone else also needs to generate the same output with percents.

A %>%
    tab_prepend_all %>%
    tab_cols(total(label = "Total"),s2) %>%
    tab_cells(mrset(S7Z,T2S7ZZ)) %>% 
    tab_stat_cpct(total_row_position = "none") %>%
    tab_cells(S7Z) %>% 
    tab_stat_fun("Mean" = w_mean, "Sd" = w_sd) %>%
    tab_stat_unweighted_valid_n(label = "Total") %>% 
    tab_pivot()%>%
    drop_rc() # Yes, for this occasion I don't need this line.

Here is the output for percents only:
[Screenshot: scale2]

  2. The other minor problem is, as I've said before, that I would like to show the scale texts, not numbers, for the scale variable in the table. In the output above, we see 2 for "Very poor", 3 for "Neither good, nor poor", etc. Since S7Z is numeric, I put the original question (S7) in the code, and I get what I want, but with unnecessary question texts.

Here is the code for percents only:

A %>%
    tab_prepend_all %>%
    tab_cols(total(label = "Total"),s2) %>%
    tab_cells(S7,T2S7ZZ) %>% 
    tab_stat_cpct(total_row_position = "none") %>%
    tab_cells(S7Z) %>% 
    tab_stat_fun("Mean" = w_mean, "Sd" = w_sd) %>%
    tab_stat_unweighted_valid_n(label = "Total") %>% # add total row
    tab_pivot()%>%
    drop_rc()

And the output:
[Screenshot: scale3]

I've also used "tab_cells(mrset(S7,T2S7ZZ)) %>% " instead of "tab_cells(S7,T2S7ZZ) %>% ", but then the order of the scale is not correct: it gives B2B and T2B first, and the rest is also in the wrong order.

That's why I mentioned the SPSS calculation of means. I thought it could help us compute the mean for categorical variables the way SPSS does. But it's just an idea, and I don't know how to add it :(

Do you think we can somehow put original question text in the table with T2B, T3B and mean?

I am sorry if I'm causing too much trouble, but I am so close to using your package, and I want to use it :)

All my best,
Zeliha

robertogilsaura commented on May 26, 2024

Hi @zelihay... the answer to minor problem (1) is earlier in this thread.

@gdemin said ...
There are two functions exactly for this case: tab_last_round and tab_last_add_sig_labels. It is not obvious, and for now I don't know how to make it better.

data %>%
    tab_cells(value) %>%
    tab_cols(total(), gender) %>%
    tab_stat_cpct() %>%
    tab_cells(value=na_if(value, gt(10))) %>%
    tab_last_round() %>%
    tab_last_add_sig_labels() %>%
    tab_stat_mean_sd_n() %>%
    tab_last_sig_means() %>%
    tab_pivot()

For the second minor problem, I don't understand why your variable labels aren't read from your SPSS file if you use read_spss.

However, you can use apply_labels() or val_lab() / var_lab() to assign value labels and variable labels. For example...
var_lab(data$value) = 'Number of points'
var_lab(data$gender)= 'Gender'
val_lab(data$gender)= c ('male'=1, 'female'=2)

Regards

gdemin commented on May 26, 2024

Hi, @zelihay
As @robertogilsaura wrote, you need to add after tab_stat_cpct:
tab_last_round() %>%
tab_last_add_sig_labels() %>%

As for the second issue, I don't quite understand what you need:

I get what I want but with unnecessary question texts.

or

we can somehow put original question text

In the first case, there is a function drop_var_labs, which should remove your variable labels:

tab_cells(drop_var_labs(mrset(S7Z,T2S7ZZ)))

In the second case, you need to set the variable labels, or they should already be in your SPSS file.
In any case, an example of your data would be very helpful.

zelihay commented on May 26, 2024

Hi to all,

Thanks for your help.

If I use tab_last_round() and tab_last_add_sig_labels() in this order, it works:

A %>%
    tab_prepend_all %>%
    tab_cols(total(label = "Total"),s2) %>%
    tab_cells(mrset(S7Z,T2S7ZZ)) %>%
    tab_stat_cpct() %>%
    tab_last_sig_cpct(sig_labels = LETTERS, keep = "percent") %>% # keep only percent, drop total row
    tab_cells(S7Z) %>% # leave only one variable for means
    tab_means_sig() %>%
    tab_stat_unweighted_valid_n(label = "#Total") %>% # add total row
    tab_last_round() %>%
    tab_last_add_sig_labels() %>%
    tab_pivot() %>%
    drop_rc()

For the second issue:
Sorry, it should be "scale texts" here: "we can somehow put original scale texts"

Let me explain again why I need the scale texts.

I have the original question (S7), which is a factor and has scale texts like very poor, poor, etc. Since I have to calculate the mean for S7, I've changed the type of S7 and created S7Z as numeric. I use S7Z in the tabulation code, and because of that I can't see the scale texts in the table.

I've tried to use drop_var_labs but it doesn't help with mrset. Is there any way to use drop_var_labs with tab_cells(S7,T2S7ZZ)?
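Keeping numeric codes for the mean and the scale texts for display does not strictly need two separate variables. A base-R sketch of the idea (roughly what a labelled variable stores: values plus a code-to-text lookup), assuming the factor levels are in scale order and the SPSS codes were 1..5; the data are hypothetical:

```r
# Hypothetical factor as a factor-converting reader would return it: labels only
f <- factor(c("Very poor", "Excellent", "Very good", "Very poor"),
            levels = c("Extremely poor", "Very poor", "Neither good, nor poor",
                       "Very good", "Excellent"))

# Assumption: levels are in scale order and the original codes were 1..5
codes       <- as.integer(f)                          # numeric codes for the mean
scale_texts <- setNames(seq_along(levels(f)), levels(f))  # text -> code lookup

mean(codes)  # 3.25

# Counts displayed with the scale texts, including empty scale points
table(factor(codes, levels = scale_texts, labels = names(scale_texts)))
```

If the original codes were not 1..5 (for example -1/0/1), the lookup has to be built from the SPSS value labels instead of `seq_along()`.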

gdemin commented on May 26, 2024

If you read your data with read_spss, you will get labelled variables with their labels, which don't require conversion to numeric. See the example below.
If you really want to read your SPSS file with factors, you can convert those factors to labelled with as.labelled (S7Z = as.labelled(S7)). This gives you a variable with the scale texts, which you can use in mean calculations.

library(expss)
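# A = read_spss(file.choose()) # to avoid factors
set.seed(123)
A = data.frame(
    # note: 1:7 also draws codes 6 and 7, which are the same codes as B2B = 6 and T2B = 7 in mrset(S7, T2S7) below
    S7 = sample(1:7, 100, replace = TRUE),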
    s2 = sample(1:2, 100, replace = TRUE)
) %>% 
    apply_labels(
        S7 = "Do you like it?",
        S7 = num_lab("
                        1 Extremely poor 
                        2 Very poor
                        3 Neither good, nor poor
                        4 Very good
                        5 Excellent
                      "),
        s2 = "Gender",
        s2 = c("Female" = 1, "Male" = 2)
        
    )



tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("Mean", "sd", "N")) %>% 
    tab_last_sig_means(sig_labels = LETTERS, keep = "means")

A %>%
    prepend_all() %>%
    compute({
        T2S7 = recode(S7, 1:2 ~ 6, 4:5 ~ 7)
        var_lab(T2S7) = "T2B"
        val_lab(T2S7) = c("B2B" = 6, "T2B" = 7)
    }) %>% 
    tab_cols(total(label = "Total"),s2) %>%
    tab_cells(drop_var_labs(mrset(S7,T2S7))) %>%
    tab_stat_cpct() %>%
    tab_last_sig_cpct(sig_labels = LETTERS, keep = "percent") %>% # keep only percent, drop total row
    tab_cells("|" = drop_var_labs(S7)) %>% # "|" to suppress any labels 
    tab_means_sig() %>%
    tab_stat_unweighted_valid_n(label = "#Total") %>% # add total row
    tab_last_round() %>%
    tab_last_add_sig_labels() %>%
    tab_pivot() %>%
    drop_rc()

# |                          | Total | s2 Gender |        |
# |                          |       |  1 Female | 2 Male |
# |                          |       |         A |      B |
# | ------------------------ | ----- | --------- | ------ |
# |         1 Extremely poor |  15.0 |      10.0 |   20.0 |
# |              2 Very poor |  13.0 |      12.0 |   14.0 |
# | 3 Neither good, nor poor |  15.0 |      14.0 |   16.0 |
# |              4 Very good |  15.0 |      20.0 |   10.0 |
# |              5 Excellent |  15.0 |      16.0 |   14.0 |
# |                      B2B |  40.0 |      38.0 |   42.0 |
# |                      T2B |  45.0 |      48.0 |   42.0 |
# |                     Mean |   4.0 |       4.2 |    3.8 |
# |                   #Total |   100 |        50 |     50 |
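In this output the B2B row (40.0) is larger than codes 1 and 2 combined (15.0 + 13.0 = 28.0), and T2B (45.0) is larger than codes 4 and 5 (15.0 + 15.0 = 30.0). The gaps (12 + 15) match the 27% of cases that fall outside rows 1 to 5. A likely cause is that `sample(1:7, ...)` also produces codes 6 and 7, the same codes used for B2B and T2B, and mrset pools both variables. The base-R sketch below mimics that pooled counting, assuming an mrset row counts a case when any variable in the set has the code; the data are hypothetical, so the numbers will not match the table above:

```r
set.seed(123)
# Hypothetical data mirroring the example: codes 1..7, only 1..5 are scale points
S7 <- sample(1:7, 100, replace = TRUE)
# Nets: 6 = B2B (codes 1-2), 7 = T2B (codes 4-5); everything else left as NA
T2S7 <- ifelse(S7 %in% 1:2, 6, ifelse(S7 %in% 4:5, 7, NA))

# Assumed mrset-style counting: a case is in a row if ANY variable has that code
pct_any <- function(code) 100 * mean(S7 %in% code | T2S7 %in% code)

pct_any(6)               # "B2B" row: also picks up cases with S7 == 6
100 * mean(S7 %in% 1:2)  # intended B2B: codes 1-2 only
pct_any(7)               # "T2B" row: also picks up cases with S7 == 7
100 * mean(S7 %in% 4:5)  # intended T2B: codes 4-5 only
```

Generating S7 with `sample(1:5, ...)` keeps the scale codes and the net codes disjoint, so each net row equals the sum of its two scale points.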

zelihay commented on May 26, 2024

That's great! Thank you so much :) I really appreciate your help.
