Hi Zeliha,
You can test this code:
library(expss)
value <- sample(1:10, 625, replace=TRUE)
gender <- sample(1:2, 625, replace=TRUE)
data <- data.frame(value,gender)
var_lab(data$value) = 'Number of points'
var_lab(data$gender)= 'Gender'
val_lab(data$gender)= c ('male'=1, 'female'=2)
data %>%
tab_cells(value) %>%
tab_cols(total(), gender) %>%
tab_stat_cpct() %>%
tab_cells(value=na_if(value, gt(10))) %>%
tab_stat_mean_sd_n() %>%
tab_pivot()
The output is:
| | | #Total | Gender | |
| | | | male | female |
| ---------------- | ------------ | ------ | ------ | ------ |
| Number of points | 1 | 10.2 | 9.1 | 11.5 |
| | 2 | 11.0 | 10.3 | 11.8 |
| | 3 | 9.3 | 9.1 | 9.5 |
| | 4 | 9.4 | 11.6 | 7.2 |
| | 5 | 10.9 | 9.4 | 12.5 |
| | 6 | 9.6 | 10.0 | 9.2 |
| | 7 | 10.1 | 9.7 | 10.5 |
| | 8 | 12.5 | 13.4 | 11.5 |
| | 9 | 8.2 | 8.4 | 7.9 |
| | 10 | 8.8 | 9.1 | 8.5 |
| | #Total cases | 625.0 | 320.0 | 305.0 |
| | Mean | 5.4 | 5.5 | 5.3 |
| | Std. dev. | 2.8 | 2.8 | 2.9 |
| | Unw. valid N | 625.0 | 320.0 | 305.0 |
If you test the significance of the means:
data %>%
tab_cells(value) %>%
tab_cols(total(), gender) %>%
tab_stat_cpct() %>%
tab_cells(value=na_if(value, gt(10))) %>%
tab_stat_mean_sd_n() %>%
tab_last_sig_means() %>%
tab_pivot()
The output is ...
| | | #Total | Gender | | | |
| | | | male | female | male | female |
| | | | | | A | B |
| ---------------- | ------------ | ------ | ------ | ------ | ----- | ------ |
| Number of points | 1 | 10.24 | 9.1 | 11.5 | | |
| | 2 | 11.04 | 10.3 | 11.8 | | |
| | 3 | 9.28 | 9.1 | 9.5 | | |
| | 4 | 9.44 | 11.6 | 7.2 | | |
| | 5 | 10.88 | 9.4 | 12.5 | | |
| | 6 | 9.6 | 10.0 | 9.2 | | |
| | 7 | 10.08 | 9.7 | 10.5 | | |
| | 8 | 12.48 | 13.4 | 11.5 | | |
| | 9 | 8.16 | 8.4 | 7.9 | | |
| | 10 | 8.8 | 9.1 | 8.5 | | |
| | #Total cases | 625 | 320.0 | 305.0 | | |
| | Mean | 5.4 | | | 5.5 | 5.3 |
| | Std. dev. | 2.8 | | | 2.8 | 2.9 |
| | Unw. valid N | 625.0 | | | 320.0 | 305.0 |
And this is my problem: if I add the t-test, I get two columns for male and two for female.
from expss.
Hi,
How did you load your data? It is generally better to load data without automatic conversion to factors.
In your case you can use as.numeric(factor_variable), and it gives the correct result. But it is not a safe method, because factor codes always start from one. So a variable with values -1 = Disagree, 0 = H/s, 1 = Agree in the SPSS file will become 1 = Disagree, 2 = H/s, 3 = Agree in R.
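The factor pitfall described above can be reproduced with a minimal base-R sketch. The vector `codes` and its labels are hypothetical and only mirror the -1/0/1 example. They are not taken from any real SPSS file.

```r
# Hypothetical codes as they might be stored in an SPSS file:
# -1 = Disagree, 0 = H/s, 1 = Agree
codes <- c(1, -1, 0, 1, 0)

# A factor keeps only the labels; the original codes are not stored
answers <- factor(codes, levels = c(-1, 0, 1),
                  labels = c("Disagree", "H/s", "Agree"))

# as.numeric() returns the level positions, which always start from 1
positions <- as.numeric(answers)
print(positions)  # 3 1 2 3 2

# If the original codes are known, they can be restored by level position
original <- c(-1, 0, 1)[as.integer(answers)]
print(original)   # 1 -1 0 1 0

# The two versions give different means
print(mean(positions))  # 2.2
print(mean(original))   # 0.2
```

The positions happen to match the original codes only when the SPSS codes are already 1, 2, 3, and so on. That is why the conversion "works" for many scales but is not safe in general.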
@robertogilsaura
There are two functions exactly for this case: tab_last_round and tab_last_add_sig_labels. It is not obvious, and for now I don't know how to make it better.
data %>%
tab_cells(value) %>%
tab_cols(total(), gender) %>%
tab_stat_cpct() %>%
tab_cells(value=na_if(value, gt(10))) %>%
tab_last_round() %>%
tab_last_add_sig_labels() %>%
tab_stat_mean_sd_n() %>%
tab_last_sig_means() %>%
tab_pivot()
Oops... it works well!!!
Excellent... I think this is the solution for @zelihay and me.
Thanks, @gdemin!
Hi to all,
Thanks for your reply.
I use the read.spss function to get the SPSS data: read.spss(file.choose(), to.data.frame=TRUE)
Scale variables already have value labels in the SPSS data, i.e., they are defined as nominal. I guess this is the reason why R sees them as factors.
Yes, as you mentioned, I've changed their types and calculated the means. It definitely gives the correct mean values. Here is my code; it is almost the same as yours. Sorry, I couldn't send the SPSS file from here.
- S7 is the scale variable.
library(foreign)
library(expss)
A = read.spss(file.choose(), to.data.frame=TRUE)
S7Z <-as.numeric (A$S7)
A = modify(A,{
var_lab(S7Z) = "Do you like it?"
var_lab(s2) = "Gender"
})
set.seed(123)
T2S7ZZ = recode(S7Z, "1" ~ 6, "2" ~ 6, "4" ~ 7,"5" ~ 7)
var_lab(T2S7ZZ) = "T2B"
val_lab(T2S7ZZ) = c("B2B" = 6, "T2B" = 7)
tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("Mean", "sd", "N")) %>%
tab_last_sig_means(sig_labels = LETTERS, keep = "means")
A %>%
tab_prepend_all %>%
tab_cols(total(label = "Total"),s2) %>%
tab_cells(mrset(S7Z,T2S7ZZ)) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct(sig_labels = LETTERS) %>%
tab_means_sig() %>%
tab_pivot()%>%
drop_rc()
Here is the output. My notes on it explain why I need to modify this code.
![scale](https://user-images.githubusercontent.com/35660557/51133649-a108da00-1846-11e9-911e-62107268a7a2.jpeg)
Mr. Gregory,
Maybe this gives you an idea.
As you know, in SPSS Custom Tables we can calculate means by adding a computed category to a question.
If we calculate means like this, do you think the significance test function still works? This is the main problem in SPSS.
Here is the syntax:
CTABLES
/VLABELS VARIABLES=S7 s2 DISPLAY=LABEL
/PCOMPUTE &cat1 = EXPR(([1]*1+[2]*2+[3]*3+[4]*4+[5]*5)/([1]+[2]+[3]+[4]+[5]))
/PPROPERTIES &cat1 LABEL = "Mean" FORMAT=COLPCT.COUNT F40.1 HIDESOURCECATS=NO
/TABLE S7 [C][COLPCT.COUNT F40.1] BY s2 [C]
/CATEGORIES VARIABLES=S7 [1, 2, 3, 4, 5, &cat1, OTHERNM] EMPTY=INCLUDE
/CATEGORIES VARIABLES=s2 ORDER=A KEY=VALUE EMPTY=INCLUDE.
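The PCOMPUTE expression is a weighted average of the category codes. The weights are the category counts, or the column percentages, which only rescale the counts within a column. A base-R sketch with hypothetical 1-5 answers shows that the result equals the ordinary mean of the numeric codes:

```r
# Hypothetical answers on a 1-5 scale
answers <- c(1, 2, 2, 3, 4, 5, 5, 4, 1, 3, 5)

# Counts per category, playing the role of [1] ... [5] in the CTABLES expression
counts <- as.vector(table(factor(answers, levels = 1:5)))
codes <- 1:5

# ([1]*1 + [2]*2 + ... + [5]*5) / ([1] + [2] + ... + [5])
mean_from_counts <- sum(codes * counts) / sum(counts)

# Column percentages give the same result, because the scaling cancels out
pct <- 100 * counts / sum(counts)
mean_from_pct <- sum(codes * pct) / sum(pct)

print(mean_from_counts)                        # 3.181818
print(mean(answers))                           # 3.181818
print(all.equal(mean_from_pct, mean(answers)))  # TRUE
```

So a mean statistic computed directly on the numeric variable gives the same number without any computed category.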
It is better to read the SPSS file with read_spss: it doesn't convert variables to factors, and it preserves the original labels.
About the table in your screenshot:
- There are two means because you calculate the means on 'mrset', which is a two-column structure. We need to specify that we compute the mean on the single variable S7. This can be done with a second tab_cells(S7).
- To remove the total in the middle of the table, we can tell tab_last_sig_cpct to keep only the percentages.
- But now we need to add the total at the bottom of the table. We can do that with the valid N statistics (tab_stat_valid_n, or tab_stat_unweighted_valid_n as in the example).
Example:
library(expss)
A = read_spss(file.choose())
A = modify(A,{
var_lab(S7) = "Do you like it?"
var_lab(s2) = "Gender"
T2S7ZZ = recode(S7, 1 ~ 6, 2 ~ 6, 4 ~ 7, 5 ~ 7)
var_lab(T2S7ZZ) = "T2B"
val_lab(T2S7ZZ) = c("B2B" = 6, "T2B" = 7)
})
tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("Mean", "sd", "N")) %>%
tab_last_sig_means(sig_labels = LETTERS, keep = "means")
A %>%
tab_prepend_all %>%
tab_cols(total(label = "Total"),s2) %>%
tab_cells(mrset(S7,T2S7ZZ)) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct(sig_labels = LETTERS, keep = "percent") %>% # keep only percent, drop total row
tab_cells(S7) %>% # leave only one variable for means
tab_means_sig() %>%
tab_stat_unweighted_valid_n(label = "#Total") %>% # add total row
tab_pivot() %>%
drop_rc() # Do you really need it? 'drop_rc' removes all empty rows and columns
I didn't know about /PCOMPUTE &cat1 = EXPR(([1]*1+[2]*2+[3]*3+[4]*4+[5]*5)/([1]+[2]+[3]+[4]+[5])). It is a really complex way to calculate a mean; I am sure there should be an easier way.
Hi Mr. Gregory,
You've done quite an impressive job :) Thank you so much, and thanks for the detailed explanation. I have two minor problems now (my questions are below).
- I've run your code and got output like this:
Is it possible to put the row total without adding extra columns?
I want to share my code for percent-only tables, in case someone else needs to generate the same output.
A %>%
tab_prepend_all %>%
tab_cols(total(label = "Total"),s2) %>%
tab_cells(mrset(S7Z,T2S7ZZ)) %>%
tab_stat_cpct(total_row_position = "none") %>%
tab_cells(S7Z) %>%
tab_stat_fun("Mean" = w_mean, "Sd" = w_sd) %>%
tab_stat_unweighted_valid_n(label = "Total") %>%
tab_pivot()%>%
drop_rc() # Yes, in this case I don't need this line.
Here is the output for the percent-only table:
- The other minor problem: as I said before, I would like to show the texts, not the numbers, for the scale variable in the table. In the output above, we see 2 for "Very poor", 3 for "Neither good, nor poor", etc. Since S7Z is numeric, I put the original question (S7) in the code, and I get what I want, but with unnecessary question texts.
Here is the code for the percent-only table:
A %>%
tab_prepend_all %>%
tab_cols(total(label = "Total"),s2) %>%
tab_cells(S7,T2S7ZZ) %>%
tab_stat_cpct(total_row_position = "none") %>%
tab_cells(S7Z) %>%
tab_stat_fun("Mean" = w_mean, "Sd" = w_sd) %>%
tab_stat_unweighted_valid_n(label = "Total") %>% # add total row
tab_pivot()%>%
drop_rc()
I've also tried "tab_cells(mrset(S7,T2S7ZZ)) %>% " instead of "tab_cells(S7,T2S7ZZ) %>% ", but then the order of the scale is not correct: it gives B2B and T2B first, and the rest is also in the wrong order.
This is why I mentioned the SPSS calculation of means. I thought it could help us compute the mean for categorical variables the way SPSS does. But it's just an idea, and I don't know how to add it :(
Do you think we can somehow put the original question text in the table with T2B, T3B and the mean?
I am sorry if I'm causing too much trouble, but I am so close to using your package, and I want to use it :)
All my best,
Zeliha
Hi @zelihay... the answer to minor problem (1) is earlier in this thread.
@gdemin said ...
There are two functions exactly for this case: tab_last_round and tab_last_add_sig_labels. It is not obvious and by now I don't know how to make it better.
data %>%
tab_cells(value) %>%
tab_cols(total(), gender) %>%
tab_stat_cpct() %>%
tab_cells(value=na_if(value, gt(10))) %>%
tab_last_round() %>%
tab_last_add_sig_labels() %>%
tab_stat_mean_sd_n() %>%
tab_last_sig_means() %>%
tab_pivot()
For the second minor problem, I don't understand why your variable labels aren't read from your SPSS file if you use read_spss.
However, you can use apply_labels() or val_lab() / var_lab() to assign value labels and variable labels. For example:
var_lab(data$value) = 'Number of points'
var_lab(data$gender)= 'Gender'
val_lab(data$gender)= c ('male'=1, 'female'=2)
Regards
Hi @zelihay,
As @robertogilsaura wrote, you need to add the following after tab_stat_cpct:
tab_last_round() %>%
tab_last_add_sig_labels() %>%
As for the second issue, I don't quite understand what you need:
I get what I want but with unnecessary question texts.
or
we can somehow put original question text
In the first case, there is a function drop_var_labs which should remove your labels:
tab_cells(drop_var_labs(mrset(S7Z,T2S7ZZ)))
In the second case you need to set variable labels or they should be in your SPSS file.
In any case, an example of your data would be very helpful.
Hi to all,
Thanks for your help.
If I use tab_last_round() and tab_last_add_sig_labels() in this order, it helps:
A %>%
tab_prepend_all %>%
tab_cols(total(label = "Total"),s2) %>%
tab_cells(mrset(S7Z,T2S7ZZ)) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct(sig_labels = LETTERS, keep = "percent") %>% # keep only percent, drop total row
tab_cells(S7Z) %>% # leave only one variable for means
tab_means_sig() %>%
tab_stat_unweighted_valid_n(label = "#Total") %>% # add total row
tab_last_round() %>%
tab_last_add_sig_labels() %>%
tab_pivot() %>%
drop_rc()
For the second issue:
Sorry, it should be "scale texts" here: "we can somehow put the original scale texts".
Let me explain again why I need the scale texts.
I have the original question (S7), which is a factor and has scale texts like very poor, poor, etc. Since I have to calculate the mean for S7, I've changed the type of S7 and created S7Z as a numeric variable. I use S7Z in the tabulation code, and because of that I can't see the scale texts in the table.
I've tried to use drop_var_labs, but it doesn't help with mrset. Is there any way to use drop_var_labs with tab_cells(S7,T2S7ZZ)?
If you read your data with read_spss, you will get labelled variables with their labels, which don't require conversion to numeric. See the example below.
If you really want to read your SPSS file with factors, you can convert those factors to labelled variables with as.labelled (S7Z = as.labelled(S7)). This gives you a variable with scale texts that you can use in mean calculations.
library(expss)
# A = read_spss(file.choose()) # avoids conversion to factors
set.seed(123)
A = data.frame(
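# note: 1:7 also draws unlabelled codes 6 and 7; in the output below they are
# counted in the B2B/T2B rows and in the mean (use 1:5 for a clean 5-point scale)
S7 = sample(1:7, 100, replace = TRUE),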
s2 = sample(1:2, 100, replace = TRUE)
) %>%
apply_labels(
S7 = "Do you like it?",
S7 = num_lab("
1 Extremely poor
2 Very poor
3 Neither good, nor poor
4 Very good
5 Excellent
"),
s2 = "Gender",
s2 = c("Female" = 1, "Male" = 2)
)
tab_means_sig = . %>% tab_stat_mean_sd_n(labels = c("Mean", "sd", "N")) %>%
tab_last_sig_means(sig_labels = LETTERS, keep = "means")
A %>%
prepend_all() %>%
compute({
T2S7 = recode(S7, 1:2 ~ 6, 4:5 ~ 7)
var_lab(T2S7) = "T2B"
val_lab(T2S7) = c("B2B" = 6, "T2B" = 7)
}) %>%
tab_cols(total(label = "Total"),s2) %>%
tab_cells(drop_var_labs(mrset(S7,T2S7))) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct(sig_labels = LETTERS, keep = "percent") %>% # keep only percent, drop total row
tab_cells("|" = drop_var_labs(S7)) %>% # "|" to suppress any labels
tab_means_sig() %>%
tab_stat_unweighted_valid_n(label = "#Total") %>% # add total row
tab_last_round() %>%
tab_last_add_sig_labels() %>%
tab_pivot() %>%
drop_rc()
# | | Total | s2 Gender | |
# | | | 1 Female | 2 Male |
# | | | A | B |
# | ------------------------ | ----- | --------- | ------ |
# | 1 Extremely poor | 15.0 | 10.0 | 20.0 |
# | 2 Very poor | 13.0 | 12.0 | 14.0 |
# | 3 Neither good, nor poor | 15.0 | 14.0 | 16.0 |
# | 4 Very good | 15.0 | 20.0 | 10.0 |
# | 5 Excellent | 15.0 | 16.0 | 14.0 |
# | B2B | 40.0 | 38.0 | 42.0 |
# | T2B | 45.0 | 48.0 | 42.0 |
# | Mean | 4.0 | 4.2 | 3.8 |
# | #Total | 100 | 50 | 50 |
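To check the net rows by hand, here is a base-R sketch of the bottom-2/top-2 box logic (codes 1:2 and 4:5, as in the recode above), together with the mean, split by gender. The `ratings` and `gender` vectors are hypothetical. The sketch does not use expss and does not reproduce the significance letters.

```r
# Hypothetical 5-point ratings (1 = Extremely poor ... 5 = Excellent)
ratings <- c(1, 2, 2, 3, 4, 5, 5, 4, 1, 3)
gender  <- c("Female", "Male", "Female", "Male", "Female",
             "Male", "Female", "Male", "Female", "Male")

# Percentage of respondents in the bottom two (1:2) and top two (4:5) boxes
b2b <- tapply(ratings %in% 1:2, gender, mean) * 100
t2b <- tapply(ratings %in% 4:5, gender, mean) * 100

# Mean of the numeric codes per group
avg <- tapply(ratings, gender, mean)

# Female: B2B 60, T2B 40, Mean 2.6; Male: B2B 20, T2B 40, Mean 3.4
print(rbind(B2B = b2b, T2B = t2b, Mean = avg))
```

Only codes inside the scale should enter these nets. Any extra unlabelled codes shift both the box percentages and the mean.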
That's great! Thank you so much :) I really appreciate your help.