melff / memisc
Tools for Managing Survey Data, Creating Tables of Estimates and Data Summaries
Home Page: https://melff.github.io/memisc
Hi,
I'm only experiencing this issue on my Linux PC, the same script runs without trouble on Windows.
I'm importing a survey dataset that contains a couple of string variables. The string variables mostly contain numbers, e.g. "1", "2", "-66", but some of them contain text-based answers as well, e.g. "Please don't send me more surveys".
I've now noticed that only the string variables with absolutely no text seem to be working properly in memisc on Linux. is.character(ds$var) returns TRUE, and it can be coerced into numeric without errors. The variables with values containing text on the other hand will give errors:
>ds$problemvar
Item 'blablabla variable label blablabla' (measurement: nominal, type: character, length = 1729)
Error in if (any(xw > width)) { : missing value where TRUE/FALSE needed
> str(ds$problemvar)
Nmnl. item chr [1:1729] "-66
"|
__truncated__ ...
It appears that some form of truncation is happening. Here is what it looks like when indexing the column:
> ds[1]
Data set with 1729 observations and 1 variables
...
1 ...
2 ...
3 ...
4 ...
5 ...
6 ...
7 ...
8 ...
9 ...
10 ...
While another variable, based on a near identical survey question, works fine:
> str(ds$noproblemvar)
Nmnl. item chr [1:1729] "-66" "-66" "-66" "-66" ...
I have been comparing the above variables every which way, both in SPSS and R; the only discernible difference is that one of them, while being exported from the survey software as a string variable because text input was allowed, only contains numbers.
I'm importing the data from .sav files like so:
in_file = suppressWarnings(
  spss.system.file(
    file.path(use_dir, sav_file, fsep = .Platform$file.sep)))
ds = as.data.set(in_file)
Anyway, thanks for making memisc. My script works on Windows, so I can still make use of it, but it would be nice to figure out a workaround so I can handle these datasets on Linux as well. I can send you a .sav dataset to troubleshoot with if that helps.
library(magrittr)
library(memisc)
array(as.character(1:12), dim = c(2,2,3), dimnames = list(X=1:2, Y=letters[1:2], Z=LETTERS[1:3])) %>%
ftable %>% toLatex(extrarowsep="1ex")
Output:
\begin{tabular}{lllD{.}{.}{0}D{.}{.}{0}D{.}{.}{0}}
\toprule
& && \multicolumn{3}{c}{Z}\\
\cmidrule{4-4}\cmidrule{5-5}\cmidrule{6-6}
X&Y && \multicolumn{1}{c}{A}&\multicolumn{1}{c}{B}&\multicolumn{1}{c}{C}\\
\midrule
1&a && 1 & 5 & 9\\
&b && 3 & 7 & 11\\
2&a && 2 & 6 & 10\\[1ex]
&b && 4 & 8 & 12\\
1&a && 1 & 5 & 9\\[1ex]
\bottomrule
\end{tabular}
The first row is repeated for some reason. The extrarowsep should appear one line above where it actually appears, and it should be omitted for the last row.
I have been using memisc for a while to open some SPSS files.
I have just updated the package to the current version (0.99.25.5), and the function spss.system.file is no longer able to open those files and returns an error.
The error states that the variable "encoding" is not defined when running the line of code message(sprintf("File character set is '%S'.", encoding)).
As a result, the function spss.system.file fails to execute and it is not possible to load the file.
Please note that I went back using the old version of memisc (0.99.22) and with this package version I am able to correctly open the SPSS files with spss.system.file. This suggests that the problem is not with the file, but there might be a bug in the new version of spss.system.file.
P.S.
Your package has been extremely helpful over the years, thank you for your work.
Having double quotes embedded in a quoted string does not work. In other words, this string
"Has used drugs - lifetime (incl marijuana ""just once"")"
2
does not work. I had to replace the double quotes with the ` character to read the file.
SPSS.zip are the examples.
I can provide a link to the data file if you wish to test it, but it is 18 MB zipped.
I want to use openxlsx::write.xlsx(my_data_set, file="mydata.xlsx")
It works fine but I get the labels. The missing codes are replaced with missing values. I would actually like to get the numeric codes or labels for the missing values. Is that feasible? I think I remember doing it once but that may be my imagination. Suggestions?
I am trying to convert a data set to data.frame using to.data.frame, after loading it using memisc::spss.system.file(filename) and subset(..). I get the following error:
While testing memisc I ran into a strange problem. It works well on platform x86_64-w64-mingw32/x64 (64-bit) but not on platform i386-w64-mingw32/i386 (32-bit).
Is it a bug, or did I miss something?
Thanks,
Manel Salamero
PS: I attached a file with the instructions and results.
memisc_test.txt
Hi,
I have been processing SPSS .sav files with great success with an older version of memisc and R on Windows.
I have recently upgraded to the 3.4 series of R and memisc version 0.99.14.2.
In code where I use slicing e.g.:
data_sp[,2]
or
data_sp[,names(data_sp)=="grp"]
I now get this error:
Error in .Call("read_sysfile_slice", x@ptr, what = x, j = cols, i = rows, : "read_sysfile_slice" not resolved from current namespace (memisc
I find the function "read_sysfile_slice" in pspp-system-for-R.c.
I believe that "read_sysfile_slice" should be exported in memisc.h
Best regards,
Jon Wickmann
I am trying to process cycle 30 of the General Social Survey of Statistics Canada.
The file has a very large record layout. To see if it was feasible, I split off just the record layout.
I got the following error message. I have included the columns file and the log from the run. Is there anything that can be done, or should I find someone who can load it for me in SPSS?
Error in rofseek(fptr, pos = 0) : not an rofile
gss30_processl_log.pdf
gss30_columns.txt
Hi,
Very helpful package! I am just running into one issue where there doesn't seem to be any support for the Date class, which our group finds pretty important for survey data. Are there any plans to add support in the near future? I installed the latest version of memisc from GitHub, and code that shows the problem is below.
Thanks!
~Andrew
> rccsData<-data.set(rccsData)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘as.item’ for signature ‘"Date"’
> colnames(rccsData)[3]<-"sample.date"
> rccsData<-within(rccsData, {
+ description(sample.date)<-"Date of interview and sample collection"
+ })
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] memisc_0.99.4.1 MASS_7.3-40 lattice_0.20-31
loaded via a namespace (and not attached):
[1] tools_3.2.0 grid_3.2.0
I want to import a SPSS file (*.sav) to R and I guess that memisc is the best available package for that. However, I'm facing two issues - probably due to my low R skills - and any help would be greatly appreciated.
Firstly, measure information "Ordinal", defined in the SPSS file, is skipped when imported in R and ordinal variables are declared as "Nominal". In addition the SPSS file has two extra Attributes (two custom columns added in Variable View) that are not imported when memisc importer is used.
I wonder if any of the above issues is treated successfully with memisc.
Thanks in advance.
Thanks for closing the issue with duplicate labels. It works, but unfortunately it is very slow on large data sets. The time spent on importing data is about 100 times longer with deduplicate_labels() than without. My guess is that the implementation could be improved.
Beware that the test file is big: 1.7 GB, and that it will take almost 4 hours to run deduplicate_labels() on it.
myurl <- "http://hansekbrand.se/temp/test_deduplicate.sav"
z <- tempfile()
download.file(myurl,z,mode="wb")
my.meta.data <- spss.system.file(z)
## File character set is 'UTF-8'.
## Converting character set to the local 'utf-8'.
## Warning message:
## 1 variables have duplicated labels:
## SHDISTRI
#### The next step takes almost 4 hours on my machine
fixed.meta.data <- deduplicate_labels(my.meta.data)
Importing a subset of the file without running deduplicate_labels() takes only a few minutes.
my.subset <- c("HHID", "HVIDX", "HV000", "HV001", "HV002", "HV005", "HV006",
"HV007", "HV009", "HV013", "HV014", "HV016", "HV024", "HV025",
"HV028", "HV201", "HV204", "HV205", "HV207", "HV208", "HV209",
"HV210", "HV211", "HV212", "HV213", "HV214", "HV215", "HV216",
"HV221", "HV225", "HV226", "HV227", "HV228", "HV230A", "HV236",
"HV237", "HV239", "HV241", "HV242", "HV243B", "HV243C", "HV243D",
"HV244", "HV245", "HV246", "HV247", "HV271", "SH36", "HV101",
"HV104", "HV105", "HV106", "HV108", "HV111", "HV112", "HV113",
"HV114", "HV140", "HC60")
names(my.subset) <- my.subset
my.ds <- subset(my.meta.data, select = my.subset)
my.df <- as.data.frame(within(my.ds, {
missing.values(HV112) <- c("Mother not in household")
}))
Is there a way to speed up deduplicate_labels()?
When trying to import the attached dataset with spss.system.file(), this error occurs:
Error in row(mrang_val[, 1:2]) :
ein matrixähnliches Objekt ist als Argument für 'row' nötig
(The message is in German.) In English it should be something like:
a matrix-like object is required as an argument for 'row'
AtestSet.sav.zip
This happens only if a discrete missing value AND a missing value range are set at the same time.
Dear Martin,
I have been given a SPSS system file that I would like to analyse using R. I am using the following magic for parsing the file into R.
library(memisc)
foo <- spss.system.file("foobar.sav")
bar <- subset(foo, select=c(var1,var2,var3))
When having a look at the parsed data, you get the following:
> bar
Data set with 379 observations and 3 variables
var1 var2 var3
1 gut weiblich Herbst
2 gut mnlich Sommer
3 gut mnlich Sommer
4 gut mnlich Winter
5 gut mnlich Fr�hling
6 gut mnlich Fr�hling
7 gut weiblich Fr�hling
.
.
.
25 gut weiblich Fr�hling
.. ........ ........... ...........
(27 of 379 observations shown)
I guess you get the idea. The collaborator has saved the sav-file in utf-8 by adding a line SET UNICODE = ON. to his/her syntax-file. My locales are set to utf-8, too.
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.04
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] graphics grDevices datasets utils stats methods base
other attached packages:
[1] foreign_0.8-63 memisc_0.97 MASS_7.3-40 lattice_0.20-29
[5] ggplot2_1.0.1 reshape2_1.4.1 plyr_1.8.2
I am using the uxterm terminal-emulator for running R. Thus, everything is in utf-8. I have the strong suspicion that memisc is using a latin1 encoding when parsing the SPSS sav-file by default. Is this correct? Is it possible to change this encoding when parsing?
Thank you very much!
PS. Why does it say 27 of 379 observations shown, when in fact only 25 of them are shown?
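The symptom described above (umlauts displayed as � in a UTF-8 terminal) is what Latin-1 bytes look like when they are shown as if they were UTF-8. As a minimal base-R sketch of that diagnosis, and not a memisc option, such strings could be re-encoded after import with iconv(). The sample value below is made up for illustration.

```r
# Illustrative value only: "Frühling" stored as Latin-1 bytes (0xFC is "ü").
x <- "Fr\xfchling"
Encoding(x) <- "latin1"   # declare how the raw bytes should be interpreted

# Convert to UTF-8 so that a UTF-8 terminal displays the umlaut correctly.
fixed <- iconv(x, from = "latin1", to = "UTF-8")
print(fixed)
```

Whether this applies to a given file depends on how the file was actually written, so checking a few values before converting whole columns seems prudent.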
Please see this base-R example with a data.frame.
d <- data.frame(a = sample(1:100))
d$a_strat <- cut(d$a, breaks=seq(1,100, by=10)) # stratify by 10
e <- d[,c('a_strat')]
> str(d$a_strat)
Factor w/ 9 levels "(1,11]","(11,21]",..: 2 6 1 8 6 9 5 3 NA 9 ...
> str(e)
Factor w/ 9 levels "(1,11]","(11,21]",..: 2 6 1 8 6 9 5 3 NA 9 ...
You see the labels for the levels are not lost. But when I do the same with a memisc::data.set they are lost.
d <- data.set(a = sample(1:100))
d$a_strat <- cut(d$a, breaks=seq(1,100, by=10))
e <- d[,c('a_strat')]
> str(d$a_strat)
Factor w/ 9 levels "(1,11]","(11,21]",..: 4 9 3 1 NA 9 5 4 9 9 ...
> str(e)
Data set with 100 obs. of 1 variable:
$ a_strat: Nmnl. item w/ 9 labels for 1,2,3,... int 4 9 3 1 NA 9 5 4 9 9 ...
What is behind that behaviour?
The help text for memisc::cases() states for the parameter check.xor: "checks, whether the case conditions are mutually exclusive and exhaustive".
In case the conditions are not exhaustive, check.xor="ignore" does not work. It still issues a warning:
> x <- c(1,2)
> memisc::cases(
+ "1"=x==1,
+ "2"=x==2,
+ "3"=x==3,
+ check.xor="ignore"
+ )
[1] 1 2
Levels: 1 2 3
Warning message:
In memisc::cases(`1` = x == 1, `2` = x == 2, `3` = x == 3, check.xor = "stop") :
condition x == 3 is never satisfied
In cases.R one should probably change this (similarly to the check for done):
if(any(never) && check.xor!="ignore"){
  msg <- switch(check.xor,warn=warning,stop=stop)
  neverlab <- deflabels[never]
  if(length(neverlab)==1)
    msg("condition ",neverlab," is never satisfied")
  else
    msg("conditions ",paste(neverlab,collapse=", ")," are never satisfied")
}
format_html currently does not output any information regarding the character encoding used in the output. In that case, ISO-8859-1 is assumed. But R strings are typically in UTF-8. This means any non-ASCII characters are not interpreted correctly by web browsers.
This can easily be fixed by adding the following output to the HTML:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
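Until the output carries its own encoding declaration, one possible workaround is to wrap the generated HTML in a small page that declares UTF-8. The sketch below uses only base R; wrap_utf8_html() is a hypothetical helper, not a memisc function, and the fragment passed to it is a made-up stand-in for real format_html output.

```r
# Hypothetical helper (not part of memisc): wrap an HTML fragment in a
# minimal page whose <head> declares UTF-8.
wrap_utf8_html <- function(fragment) {
  paste(
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">',
    "</head>",
    "<body>",
    enc2utf8(paste(fragment, collapse = "\n")),  # make sure the text is UTF-8
    "</body>",
    "</html>",
    sep = "\n"
  )
}

# Made-up fragment standing in for the output of format_html().
page <- wrap_utf8_html("<p>Fr\u00fchling</p>")
# writeLines(page, "table.html", useBytes = TRUE)  # write the bytes unchanged
```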
My use case is that I'm using the UK Labour Force Survey, and I want to pool a number of datasets.
I only want to load a small number of variables for analysis, but the variable names change between datasets depending on the naming convention. The weight variable changes from time to time (pwt16, pwt14, pwt11 etc).
I'd like to load a list of datasets, do a set of transformations to a common form, and then pool the datasets for analysis using the survey or srvyr packages.
When I try to use the subset specification as a pre-prepared character vector, I get
Error in max(sapply(args, length)) : invalid 'type' (list) of argument.
library(magrittr)
library(memisc)
array(as.character(1:12), dim = c(2,2,3), dimnames = list(1:2, letters[1:2], LETTERS[1:3])) %>%
ftable %>% toLatex
## Error in hleaders[n.col.vars, 1:n.row.vars] <- names(row.vars) :
## number of items to replace is not a multiple of replacement length
array(as.character(1:12), dim = c(2,2,3), dimnames = list(X=1:2, Y=letters[1:2], Z=LETTERS[1:3])) %>%
ftable %>% toLatex
## ...works
When importing with the following syntax, description works but I get an error generating the codebook.
ZA <- spss.system.file(Alt_in)
description(ZA)
codebook(ZA)
The error generated says:
"Error in if (ncol(descr) > 1) { : argument is of length zero"
If I ask for a codebook on a specific variable, it works.
I'm sure it's something simple I'm missing?
For some very simple tables (with only one variable, counting the number of appearances of that variable), I use xtabs to generate the table and then export it with ftable. However, doing so adds no less than seven decimal digits.
What could be the reason for this? How could the decimal digits be made to disappear in the final, LaTeX-ready table (without manually removing them in the .tex file)?
stops with an error, tested with 34e66cd. What am I doing wrong?
library(memisc)
#> Loading required package: lattice
#> Loading required package: MASS
#>
#> Attaching package: 'memisc'
#> The following objects are masked from 'package:stats':
#>
#> contr.sum, contr.treatment, contrasts
#> The following object is masked from 'package:base':
#>
#> as.array
format_html(codebook(iris[5]))
#> Error in tab[, 3]: subscript out of bounds
Created on 2018-01-12 by the reprex package (v0.1.1.9000).
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.3 (2017-11-30)
#> system x86_64, linux-gnu
#> ui X11
#> language en_US
#> collate en_US.UTF-8
#> tz Europe/Busingen
#> date 2018-01-12
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 cran (@1.1.2)
#> base * 3.4.3 2017-12-01 local
#> car 2.1-5 2017-07-04 CRAN (R 3.4.1)
#> compiler 3.4.3 2017-12-01 local
#> datasets * 3.4.3 2017-12-01 local
#> devtools 1.13.4 2017-11-09 CRAN (R 3.4.2)
#> digest 0.6.13 2017-12-14 CRAN (R 3.4.3)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
#> graphics * 3.4.3 2017-12-01 local
#> grDevices * 3.4.3 2017-12-01 local
#> grid 3.4.3 2017-12-01 local
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.1)
#> knitr 1.18 2017-12-27 CRAN (R 3.4.3)
#> lattice * 0.20-35 2017-03-25 CRAN (R 3.4.1)
#> lme4 1.1-13 2017-04-19 CRAN (R 3.4.0)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.3)
#> MASS * 7.3-47 2017-04-21 CRAN (R 3.4.1)
#> Matrix 1.2-10 2017-04-28 CRAN (R 3.4.1)
#> MatrixModels 0.4-1 2015-08-22 CRAN (R 3.4.0)
#> memisc * 0.99.15 2018-01-12 Github (melff/memisc@34e66cd)
#> memoise 1.1.0 2017-08-07 Github (hadley/memoise@d63ae9c)
#> methods * 3.4.3 2017-12-01 local
#> mgcv 1.8-17 2017-02-08 CRAN (R 3.4.0)
#> minqa 1.2.4 2014-10-09 CRAN (R 3.4.0)
#> nlme 3.1-131 2017-02-06 CRAN (R 3.4.0)
#> nloptr 1.0.4 2014-08-04 CRAN (R 3.4.0)
#> nnet 7.3-12 2016-02-02 CRAN (R 3.4.0)
#> parallel 3.4.3 2017-12-01 local
#> pbkrtest 0.4-7 2017-03-15 CRAN (R 3.4.0)
#> quantreg 5.33 2017-04-18 CRAN (R 3.4.0)
#> Rcpp 0.12.14.5 2018-01-11 local
#> repr 0.12.0 2017-04-07 CRAN (R 3.4.0)
#> rmarkdown 1.8 2017-11-17 CRAN (R 3.4.3)
#> rprojroot 1.3-2 2018-01-03 local (krlmlr/rprojroot@851d293)
#> SparseM 1.77 2017-04-23 CRAN (R 3.4.0)
#> splines 3.4.3 2017-12-01 local
#> stats * 3.4.3 2017-12-01 local
#> stringi 1.1.6 2017-11-17 CRAN (R 3.4.3)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.4.1)
#> tools 3.4.3 2017-12-01 local
#> utils * 3.4.3 2017-12-01 local
#> withr 2.1.1.9000 2017-12-30 Github (r-lib/withr@df18523)
#> yaml 2.1.16 2017-12-12 CRAN (R 3.4.3)
Hello,
A question related to the memisc package: is it possible to present the exponentiated coefficients of a GLM model? I used the mtable() function to display the results of models from the mclogit and lme4 packages, and tried to see if there was an option to report the estimates as odds ratios.
Thank you very much.
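For reference, here is what "exponentiated coefficients" means in plain base R, independent of mtable(). This is only a sketch on a built-in dataset with an ordinary glm() fit; it does not show an mtable() option, and models from mclogit or lme4 would need their own coefficient extractors.

```r
# Logistic regression on a built-in dataset (illustrative model only).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratios are the exponentiated coefficients; Wald confidence limits
# are exponentiated in the same way.
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(odds_ratios)
```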
Hi, I get an out of memory error
> dataset2003<- as.data.set(spss.portable.file("../rawData/census/f463/f463ind.por"))
Error in readStringPorStream(pstream) :
cannot allocate memory block of size 16777216 Tb
Is there any way around this? I want to import 15 files of this size and can't even do 1.
Extending mtable for a new model class "cls" currently requires specifying a corresponding "summary.stats.cls" option in addition to a getSummary.cls method, otherwise no summary statistics are displayed with the default summary.stats = TRUE argument of mtable(). From what help("mtable") says about summary.stats, I would have expected that all summary statistics from getSummary.cls() are reported when no such option has been defined.
A minimal reproducible example follows:
lm0 <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
class(lm0) <- "cls"
getSummary.cls <- function (obj, ...) {
class(obj) <- "lm"
getSummary.lm(obj, ...)
}
mtable(lm0, summary.stats = TRUE) # summary statistics are not shown
mtable(lm0, summary.stats = "N") # works
oopt <- options("summary.stats.cls" = getOption("summary.stats.default"))
mtable(lm0, summary.stats = TRUE) # now the defaults (logLik and N) are shown
options(oopt)
The reason is that "summary.stats.default" will not be selected in selectSummaryStats:
Lines 85 to 93 in 977c022
The condition length(sumstats) equals the length of the class vector class(x), which will contain at least one element for a new model class, so the option "summary.stats.default" won't come into play and "summary.stats.cls" needs to be specified to avoid a NULL result.
Should the condition be replaced by something like any(!vapply(sumstats, is.null, TRUE))?
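The difference between the two conditions can be seen in isolation with base R. The lookup below is a simplified stand-in for what selectSummaryStats does (the referenced lines are not reproduced here), and it assumes that no "summary.stats.cls" option has been set in the session.

```r
# Class vector of the new model object, as in the example above.
cls <- c("cls")

# Simplified stand-in for the option lookup: one list entry per class,
# NULL where no "summary.stats.<class>" option is defined.
sumstats <- lapply(cls, function(cl) getOption(paste0("summary.stats.", cl)))

# The list has one element per class, so its length is never zero ...
has_length <- length(sumstats) > 0

# ... whereas this asks whether any option was actually found.
has_option <- any(!vapply(sumstats, is.null, TRUE))

print(c(has_length = has_length, has_option = has_option))
```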
Statistics Canada's summer release of the Survey of Financial Security seems fine. There are two datasets. A large special weight dataset which processes fine and an economic family dataset which has issues. The spss.fixed process seems to stop midway through the first record. I tried it with only the record layout as well as with the full set of files. I have attached the efam set in the zip file. Suggestions as to how to move forward would be appreciated.
sfs2016_for_elff.zip
Since I am often working with surveys the memisc library looks very promising. I always work with data.table and there seems to be an issue however.
When I generate a codebook with memisc on a data.table, there is the following issue: when I type the name of the object (say, mtcars) an error is thrown, see code below.
library(memisc)
library(data.table)
library(magrittr)  # provides %>% used below
data(mtcars)
setDT(mtcars, keep.rownames = TRUE)
mtcars = within(mtcars, {
description(vs) = "whatever"
description(am) = "something unclear"
description(carb) = "something different"
wording(vs) = "this is going to be a long comment"
labels(vs) = c(
"many" = 1,
"not so many" = 0
)
labels(carb) = c(
"one" = 1,
"two" = 2,
"three" = 3
)
missing.values(carb) = c(4,6, 8)
})
codebook(mtcars) %>% show_html
# annotation(mtcars) = "my long story"
# if data.frame = error and annotation(mtcars) = "....text...." then message "Error in if (nzchar(nm.i)) { : argument is of length zero"
# annotation(mtcars)
# if data.table: throws error "Error in format.item(char.trunc(col), justify = justify, ...) :
# unused argument (justify = justify)
mtcars
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.04
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] Hmisc_3.17-0 ggplot2_2.0.0 Formula_1.2-1 survival_2.38-3 magrittr_1.5 DescTools_0.99.15 manipulate_1.0.1 memisc_0.99.3 MASS_7.3-44 lattice_0.20-33 data.table_1.9.7
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 nloptr_1.0.4 RColorBrewer_1.1-2 plyr_1.8.3 tools_3.2.3 rpart_4.1-10 boot_1.3-17 lme4_1.1-9 nlme_3.1-122 gtable_0.1.2 mgcv_1.8-7 Matrix_1.2-2
[13] parallel_3.2.3 mvtnorm_1.0-3 SparseM_1.7 proto_0.3-10 gridExtra_2.0.0 cluster_2.0.3 MatrixModels_0.4-1 nnet_7.3-11 foreign_0.8-66 latticeExtra_0.6-26 minqa_1.2.4 car_2.0-25
[25] scales_0.3.0 splines_3.2.3 rsconnect_0.4.1.11 pbkrtest_0.4-2 colorspace_1.2-6 quantreg_5.19 acepack_1.3-3.3 munsell_0.4.2 chron_2.3-47
Is there any way to work past such an error?
m2003 <- spss.portable.file("../rawData/census/f466/f466ind.por")
[1] "f basic metalS/21/Manufacture of metal products (excl. machinery and equipment)T"
[2] "/18/Manufacture of machinery and equipment10/20/Manufacture of office and accoun"
Hide Traceback
Error in parseHeaderPorStream(ptr) : unknown tag "T" found in line 65 offset 54
stop("unknown tag ", dQuote(tag.code), " found in line ", currline, " offset ", offset)
parseHeaderPorStream(ptr)
The function memisc::subset() does not allow us to use a string variable as the parameter for select. Currently, you can't do:
sav <- memisc::spss.system.file(filename)
vars = c("var1", "var2")
data = memisc::subset(sav, select=vars)
That simple limitation is a little counterproductive and makes the package less portable. I am proposing a modification in the file importer-methods.R below, in the function setMethod("subset","importer",... The modification preserves the current way subset works, but now also accepts a string vector in the select parameter of the function subset in the package. I posted it here instead of using a pull request because this little piece of code is sufficient to do the job.
CURRENT CODE:
setMethod("subset","importer",
function (x, subset, select, drop = FALSE,
...)
....
nl <- as.list(1:nvars)
names(nl) <- names
cols <- logical(nvars)
cols[eval(substitute(select), nl, parent.frame())] <- TRUE
select.vars <- sapply(substitute(select)[-1],as.character)
....
{
PROPOSED MODIFICATION:
setMethod("subset","importer",
function (x, subset, select, drop = FALSE,
...)
....
nl <- as.list(1:nvars)
names(nl) <- names
cols <- logical(nvars)
if (class(substitute(select)) == 'call') {
cols[eval(substitute(select), nl, parent.frame())] <- TRUE
select.vars <- sapply(substitute(select)[-1],as.character)
}else{
select.vars = select
cols[which(names(nl) %in% select.vars )] = TRUE
}
....
{
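The same idea can be tried outside the package. The sketch below is a standalone base-R illustration, not the memisc method: pick_columns() is a hypothetical helper that accepts either unquoted names or a character vector. It decides based on the type of the evaluated result rather than on the class of the unevaluated expression, which is a variation on the proposal above.

```r
# Hypothetical helper (not part of memisc): resolve `select` to variable names.
pick_columns <- function(var_names, select) {
  # Map each variable name to its position, as in the code quoted above.
  nl <- as.list(seq_along(var_names))
  names(nl) <- var_names

  # Unquoted names evaluate to positions; a character vector from the
  # calling environment evaluates to itself.
  sel <- eval(substitute(select), nl, parent.frame())
  if (is.character(sel)) sel <- match(sel, var_names)
  if (anyNA(sel)) stop("unknown variable in 'select'")
  var_names[sel]
}

all_vars <- c("var1", "var2", "var3")
vars <- c("var1", "var2")

by_name   <- pick_columns(all_vars, c(var1, var3))  # unquoted names
by_string <- pick_columns(all_vars, vars)           # character vector
```

One limitation of this sketch is that a variable in the file that happens to share its name with the character vector (here, a column called vars) would shadow it.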
Since an update in memisc (somewhere between 0.99.22 and 0.99.28), the following code returns an error:
foo <- as.item(c(0,1,1,-1), missing.values= 0, labels = structure(c(-1,0,1), names=c('Yes', 'PNR', 'No')))
bar <- as.factor(as.character(foo))
bar <- relevel(bar, 'PNR')
Before the update (I had 0.99.22), we had: foo = c('PNR', 'Yes', 'Yes', 'No') while now we have foo = c(NA, 'Yes', 'Yes', 'No').
This is a weird behavior because the point of memisc is to distinguish missing values from NA.
Can you revert to the previous behavior?
When one of the models that are being printed with mtable_format_latex() has a LaTeX special character, LaTeX won't compile the table. Of course, using those special characters in model names can easily be avoided by the user - but could mtable_format_latex() throw a warning when it detects such issues?
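As an illustration of what such a check might look like on the user side, the base-R sketch below flags names that contain characters LaTeX treats specially. Both functions are hypothetical helpers, not part of memisc, and the set of characters checked is an assumption about which ones matter here.

```r
# Hypothetical helper: TRUE for strings containing a LaTeX special character
# (# $ % & _ { } ~ ^ or a backslash).
has_latex_special <- function(x) {
  grepl("[#$%&_{}~^\\\\]", x, perl = TRUE)
}

# Hypothetical helper: warn about offending model names and return them invisibly.
check_model_names <- function(model_names) {
  bad <- model_names[has_latex_special(model_names)]
  if (length(bad) > 0) {
    warning("model names contain LaTeX special characters: ",
            paste(bad, collapse = ", "))
  }
  invisible(bad)
}

flags <- has_latex_special(c("Model 1", "model_2", "50%", "a\\b"))
```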
Hi,
It would appear that attaching both the memisc and the tibble R packages causes certain functions to print text about caches and NAMESPACES. For example, I've included a reproducible version below (tested with memisc CRAN version 0.99.25.6 and GitHub version 0.99.26.3):
# load packages
library(tibble)
library(memisc)
# create data
d <- tibble(value = seq(0, 100))
# subset data
subset_d <- subset(d, d$value < 50)
# Found more than one class "tbl_df" in cache; using the first, from namespace 'tibble'
# Also defined by ‘memisc’
# Found more than one class "tbl_df" in cache; using the first, from namespace 'tibble'
# Also defined by ‘memisc’
Although this text isn't a warning -- and nor is it an error message -- it could potentially cause issues for some users? One of the packages I contribute to had a similar issue a while ago, and @davidcanarte reported that they are currently experiencing this issue with the memisc R package and was looking for a fix. I believe this issue is due to defining tbl_df as an S4 class in the memisc R package (i.e. setOldClass("tbl_df")) when the tibble R package also defines tbl_df as an S4 class? Therefore, a potential fix could involve (1) removing the setOldClass("tbl_df") code and (2) updating the NAMESPACE file to import the tbl_df S4 class from the tibble R package. This would also require (3) listing the tibble R package under Imports and not Enhances in the DESCRIPTION file. What do you think?
I've verified that this approach fixes the issue locally (i.e. by running the example code above on an updated fork of the memisc GitHub repository). In case this is helpful, I've submitted a PR with the proposed fix. I've bumped the version to 0.99.26.4, but please let me know if you need any additional updates to merge the PR?
I've noticed a failure when trying to export a codebook to HTML. It turns out this can be reproduced simply by having a character variable in the data frame:
> write_html(codebook(data.frame(x="a")), file="codebook.html")
Error in is.finite(x) : default method not implemented for type 'list'
This happens on git master too.
Does the memisc package contain example datasets like the datasets built into R (e.g. mtcars, iris)?
If not, it would be nice to have them. Of course they should use/demonstrate the memisc-specific data-types/classes.
Thanks for the nice package. I just had an issue with a new improvement that you listed in the NEWS file for 0.99 under Improvements, namely:
toLatex() methods optionally escape dollar, subscript and superscript symbols.
This is great for some cases I am sure, but for my case it meant that my paper written a while ago (which has math symbols in table headers) could not be knitted without error anymore. It also seems that this is set via a global option, toLatex.escape.tex, but the toLatex() method (in this case for an ftable) does not expose this as an argument. I was able to fix the issue with
options(toLatex.escape.tex = FALSE)
after debugging a few times. However, it would be much nicer if this argument was directly accessible in the function toLatex() itself, and documented there.
So the proposal is to have an argument in the toLatex() methods for changing this behavior, together with documentation.
For backwards compatibility, would it be an option to have toLatex.escape.tex = FALSE by default?
Sorry for not submitting a patch, but I need to work on the referenced paper now....
Thanks!
Pieter
There should be a forum or a mailing list related to memisc.
Currently I see no good way to ask memisc-related questions. GitHub issues are for development, bug reports etc. and not for support questions.
StackOverflow doesn't offer a memisc tag currently.
This is the currently open question: http://stackoverflow.com/questions/41208734/how-to-drop-labels-from-a-memiscdata-set-in-r.
I'm having issues with relabeling items using argument gsub = TRUE. The regular expression produces incorrect matches. For example trying to relabel c to foo by regular expression also relabels a, which should not be matched by c at all:
f <- as.factor(rep(letters[1:4],5))
f <- as.item(f)
f2 <- relabel(f, c = 'foo', gsub = TRUE)
labels(f)
Values and labels:
1 'a'
2 'b'
3 'c'
4 'd'
labels(f2)
Values and labels:
1 'foo'
2 'b'
3 'foo'
4 'd'
I expected following result:
Values and labels:
1 'a'
2 'b'
3 'foo'
4 'd'
Is it too much to ask that memisc would be able to cope with duplicated labels?
myurl <- "http://hansekbrand.se/temp/BUG.SAV"
z <- tempfile()
download.file(myurl,z,mode="wb")
my.meta.data <- spss.system.file(z)
Warning message:
1 variables have duplicated labels:
HML23
> my.df <- as.data.frame(my.meta.data)
Error in as.factor(x) : Duplicate labels
file.remove(z)
I work a lot with data from the Demographic and Health Surveys (DHS), and some of those files are so big that importing them with read.spss() requires amounts of RAM not found in most computers. Many or even most of the hundreds of files from DHS have duplicated labels in them. To get memisc working with such files would really help my work. Currently I have a computer with 68 GB RAM so I manage, but I want others to be able to use my code.
Kind regards,
Hans Ekbrand, university of Gothenburg, Sweden.
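To illustrate the problem in isolation: as.factor() fails because two different codes carry the same label, and one generic way around that is to make the labels unique before converting. The base-R sketch below uses made-up codes and labels and is not a description of how memisc handles this internally.

```r
# Made-up value labels in which two codes share the label "North".
labs <- c("North" = 1, "South" = 2, "North" = 3)

# Flag the duplicates, then append a running number to the repeats.
dups <- duplicated(names(labs))
names(labs) <- make.unique(names(labs), sep = " #")

# With unique labels, the conversion to a factor goes through.
codes <- c(1, 3, 2, 1)
f <- factor(codes, levels = labs, labels = names(labs))
```

Because make.unique() is vectorized over the whole label vector, this step costs very little even for variables with many labels.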
The Statcan SPSS file that I am processing (community health survey) has two variables with duplicate labels. If these variables are in the dataset that is used as a source for xtabs, even if the variables are not in the xtab, there is an error message about factor problems and R hangs in a loop.
Hi Martin:
In my experience so far, It seems as though variable names exceeding 8 characters are trimmed to 8 characters when importing from SPSS .sav files to R. Any subsequent variable names sharing the same initial 8 characters are subsequently coerced into unique, but generic, strings (e.g., longername1 -> longerna, longername2 -> V2_a, longername3 -> V3_a).
I'm wondering if perhaps there is something I'm overlooking or doing incorrectly, as I've seen no mention of variable name truncation in the documentation or any of the discussions I've read. I did go as far as to read the pspp-system-for-R.c code where I saw the following call in line 412: trim(curr_var.name,8). I don't know if this bit of code is relevant to my query, but it did make me wonder if the truncation of variable names was by design and static (i.e., there is no option to turn it off).
If it is not an intentional feature, would you have any ideas about why my variable names are being trimmed? If this, however, is a feature, would altering it be a possibility in the future?
Thanks!
-Mike
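To illustrate why trimming to 8 characters forces some kind of renaming, here is a small base-R sketch. The variable names are hypothetical, and the code only demonstrates the collision; it does not reproduce the V2_a-style replacement names reported above.

```r
# Hypothetical long variable names; the first three share their first 8 characters
long_names <- c("longername1", "longername2", "longername3", "age")

# Trim every name to at most 8 characters
short_names <- substr(long_names, 1, 8)

# TRUE where a trimmed name repeats an earlier one
collides <- duplicated(short_names)

print(data.frame(long = long_names, short = short_names, collides = collides))
```

The second and third names collide with the first after trimming, so an importer that keeps only the short names has to invent replacements for them.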
The variable descriptions in an SPSS dataset contain embedded quotes.
Example:
abc 'The rain doesn''t fall on Sunday '
The result is a missing-variable error message.
I could not get your software to process it unless I edited the duplicated single quotes to a pattern such as *.
I tried using ' instead of * but that did not work.
That is not a particular problem. I can edit the descriptions, but I can't replace the original descriptions with the edited ones. I have forgotten how to do bulk edits of the descriptions. Do I have to use some kind of "for" loop?
I tried to do an edit with
descriptions(my.,ds)<-edited_descriptions_as_character_array but this does not work.
string_replacement_problem.pdf
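The string-editing half of this question can be sketched in base R. The named vector `raw` is hypothetical and stands in for descriptions that have already been extracted. Writing the cleaned strings back into the data set is package-specific and is not shown here.

```r
# Hypothetical descriptions keyed by variable name; "abc" mirrors the example above
raw <- c(abc = "The rain doesn''t fall on Sunday ",
         def = "No quotes here")

# Collapse each doubled single quote into one, then strip surrounding whitespace
edited <- trimws(gsub("''", "'", raw, fixed = TRUE))

print(edited)
```

Because `gsub()` is vectorised, no explicit loop is needed to clean all descriptions at once.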
Is there a simple way to omit specific covariates (i.e., control variables) from the table?
[memisc 0.99.14.12]
[R version 3.3.1]
[openSuSE leap]
Loading an SPSS .sav file with
library("memisc")
data <- as.data.set(spss.system.file("foo.sav"))
results in
"NewSysFile" not resolved from current namespace (memisc)
Note that memisc has been installed into a subdirectory of ~/R/x86_64-suse-linux-gnu-library/3.3, which .libPaths() properly shows.
Is this a mistake or problem on my side?
Hello. This is something I ran into while working with SPSS files.
The issue can be reproduced as follows:
example(TukeyHSD) # works fine
library(memisc)
example(TukeyHSD)
Error in FUN(X[[i]], ...) : subscript out of bounds
I tried install.packages("memsic") but got the message "Paket ‘memsic’ ist nicht verfügbar (for R version 3.3.1)", which means the package is not available for R 3.3.1.
I am using Siduction (Debian GNU/Linux unstable) with R version 3.3.1 (2016-06-21).
Hi, great package, love your work.
I'm wondering if there is any straight-forward way of manipulating content from codebook(). It's producing more output than I want in my documentation.
How do you suggest getting codebook() output without information on the following objects:
-storage mode & measurement
-description
-N and percent
Thanks <3
memisc appears to be lacking a dedicated codebookEntry method for the "datetime.item" class, instead defaulting to "atomic", which uses R's "summary" method. Because this function does not provide information on missing values, codebookEntry appears to be missing information on missings in date variables as well.
Here is some quick code I used to patch this after the package is loaded. It's not perfect, but it seems to provide the needed information.
# Set the method after loading the package
setGeneric("codebookEntry", getGeneric("codebookEntry", package="memisc"))
setMethod("codebookEntry", "datetime.item", function(x){
  spec <- c("Storage mode:" = storage.mode(x))
  isna <- is.na(x)
  stats <- summary(x)
  # Summary statistics plus a column holding the number of missing values
  stats <- list(descr = cbind(names(stats), paste(stats), NAs = sum(isna)))
  new("codebookEntry",
      spec = spec,
      stats = stats,
      annotation = annotation(x))
})
What do you think?
Hi,
I have been having trouble installing memisc from sources on a system where I am stuck with R version 3.0.2 (2013-09-25). It seems that the usage of anyNA, introduced with commit [https://github.com/melff/memisc/commit/dd3da28045220e5b3726cfc0f1b56cdaeda87b0c] in function [.mtable, breaks compatibility with R versions < 3.1.
I therefore suggest that the R dependency in DESCRIPTION is bumped up to 3.1.
Palmar
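As an alternative to raising the R dependency, a fallback definition could in principle be supplied where anyNA is missing. The following is only a sketch of that idea, not something the package is known to do. Defined at top level like this it would not affect code inside the package namespace, and the fallback ignores the recursive argument.

```r
# Define a simple stand-in only when the running R version has no anyNA()
if (!exists("anyNA", mode = "function")) {
  anyNA <- function(x, recursive = FALSE) any(is.na(x))
}

with_na <- anyNA(c(1, NA, 3))
without_na <- anyNA(c(1, 2, 3))
print(c(with_na, without_na))
```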