jacobkap / asciisetupreader

R package to read fixed-width ASCII files using SPSS or SAS setup files

Home Page: https://jacobkap.github.io/asciiSetupReader/

License: Other

R 96.24% HTML 3.63% Scheme 0.13%

ascii fixed-width data-reader r sas dat spss fixed-width-text fixed-width-parser fixed-width-tables

asciisetupreader's Introduction

asciiSetupReader

Overview

Some (usually older) data sets are only available in fixed-width ASCII files (.txt or .dat) that have an .sps (SPSS) or .sas (SAS) setup file explaining to the software how to read that file. These file combinations are sometimes referred to as .txt+.sps, .txt+.sas, .dat+.sps, .dat+.sas. This package allows you to read in the data if you have both the fixed-width file and its accompanying setup file.
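To make the idea concrete, here is a minimal base R sketch of what "fixed-width plus a column layout" means. The two records and the `layout` table are made up for illustration, and `parse_fixed_width()` is a hypothetical helper, not the package's implementation; in practice the layout comes from the .sps or .sas setup file.

```r
# Two records of a made-up fixed-width file: no delimiters, only positions.
lines <- c("AL00112Alabama   1976",
           "AZ00189Arizona   1980")

# Hypothetical column layout of the kind a setup file would describe.
layout <- data.frame(
  name  = c("ori_code", "state", "year"),
  begin = c(1, 8, 18),
  end   = c(7, 17, 21),
  stringsAsFactors = FALSE
)

# Cut every line at the listed positions and strip the padding spaces.
parse_fixed_width <- function(lines, layout) {
  columns <- lapply(seq_len(nrow(layout)), function(i) {
    trimws(substring(lines, layout$begin[i], layout$end[i]))
  })
  names(columns) <- layout$name
  as.data.frame(columns, stringsAsFactors = FALSE)
}

parsed <- parse_fixed_width(lines, layout)
parsed$year <- as.integer(parsed$year)
print(parsed)
```

Without the layout, the lines are just undifferentiated strings of characters, which is why both files are needed.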

Installation

To install this package, use the code
install.packages("asciiSetupReader")


# The development version is available on Github.
# install.packages("devtools")
devtools::install_github("jacobkap/asciiSetupReader")

Usage

The parameters data and setup_file are the only ones required to run the function, though three optional parameters allow you to customize the results.

data - A string containing the name of the data file

setup_file - A string containing the name of the setup file

Both files must be in your working directory or the string must contain the path to the file. Below is an example of reading in the example dataset - the original data and setup files can be found here.

Please note that I am only using system.file() here so that the vignette builds even on computers other than my own. You will not need it when calling the function. Instead you’d simply input data = "example_data.zip" and setup_file = "example_setup.sps". The data file does not have to be zipped; it is zipped here only to reduce the size of this package. In most cases it will be a .dat or a .txt file.

data <- system.file("extdata", "example_data.zip",
             package = "asciiSetupReader")
setup_file <- system.file("extdata", "example_setup.sps",
             package = "asciiSetupReader")

example <- asciiSetupReader::read_ascii_setup(data = data,
                                              setup_file = setup_file)
example[1:6, 1:4] # Look at first 6 rows and first 4 columns
#>   IDENTIFIER_CODE NUMERIC_STATE_CODE ORI_CODE             GROUP
#> 1 SHR master file            Alabama  AL00112 Cit 50,000-99,999
#> 2 SHR master file            Alabama  AL00112 Cit 50,000-99,999
#> 3 SHR master file            Alabama  AL00112 Cit 50,000-99,999
#> 4 SHR master file            Arizona  AZ00189       Cit < 2,500
#> 5 SHR master file            Arizona  AZ00189       Cit < 2,500
#> 6 SHR master file            Arizona  AZ00189       Cit < 2,500

asciisetupreader's Issues

Error in grep2

Hi! Thanks a lot for developing this package, this is exactly what I have been looking for!

I am trying to import a French survey on risk perceptions available here: https://barometre.irsn.fr/graphiques/
I have downloaded one wave for an example (two files: .dat and .sps) that you can find here: https://github.com/sophiecetre/Survey_ascii

I get the same error that others have already pointed out, although the fixes you made don't seem to work for me.

My code is the following:

df <- asciiSetupReader::read_ascii_setup(data = "Barometre-55.dat",
                                         setup_file = "Barometre-55.sps")

And I get the following error:

Error in grep2("DATA LIST|/VARIABLES =$", codebook):second_grep_value : 
  argument of length 0

Thanks a lot!
Sophie
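The `pattern:second_grep_value` shape of the call in this error message suggests that the colon operator received an empty operand, which is what base R's grep() returns when no line matches. Below is a minimal sketch of that failure mode, using base grep() on a made-up codebook as a stand-in for the package's internal grep2() (whose implementation is not shown here). `find_section_start()` is a hypothetical helper.

```r
# Made-up setup file lines that lack a "DATA LIST" line.
codebook <- c("TITLE 'made-up setup file'.",
              "VARIABLE LABELS",
              "V1 'Respondent id'",
              "EXECUTE.")

first  <- grep("^DATA LIST", codebook)        # integer(0): no such line
second <- grep("^VARIABLE LABELS$", codebook) # 2

# An empty operand to `:` raises the error; capture its message.
result <- tryCatch(codebook[first:second],
                   error = function(e) conditionMessage(e))
print(result)

# Hypothetical guard that names the missing pattern instead.
find_section_start <- function(pattern, codebook) {
  hit <- grep(pattern, codebook)
  if (length(hit) == 0) {
    stop("No line matching '", pattern, "' in the setup file", call. = FALSE)
  }
  hit[1]
}

friendly <- tryCatch(find_section_start("^DATA LIST", codebook),
                     error = function(e) conditionMessage(e))
print(friendly)
```

If this is indeed the mechanism, the likely cause is a setup file whose section keywords are written differently from what the pattern expects.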

Weird error when start column > end column

Hi. @deholliday and I ran into an unusual error using asciiSetupReader when the SPS file that supplies the column information itself contains an error. If any column has a start value > its end value (e.g. start 2000, end 1980), asciiSetupReader's read_data function will read in the data fine, but the data.table:::as.data.table line will error, producing the unhelpful error message "Error in copy(x) : R character strings are limited to 2^31-1 bytes", which doesn't really highlight the cause of the error.

The bug is actually downstream in the vroom package (see tidyverse/vroom#217) but I think generally it's good for every package in the supply chain to have appropriate error checking and friendly user-facing errors.

The actual data we are working with is private so I can't easily produce a reprex that goes through asciiSetupReader, but I think if you look at the vroom version of the issue you'll see a pretty clear reprex with toy data.
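A minimal sketch of the kind of up-front check suggested above, in base R. The `setup` table is toy data, its `column_name` column is an assumed name, and `check_column_positions()` is a hypothetical helper rather than anything in the package; only the `begin` and `end` column names are taken from an error message quoted elsewhere in these issues.

```r
# Toy column layout in which "year" has its start after its end.
setup <- data.frame(
  column_name = c("id", "year", "weight"),
  begin       = c(1, 2000, 10),
  end         = c(4, 1980, 15),
  stringsAsFactors = FALSE
)

# Stop early and name the offending columns instead of failing downstream.
check_column_positions <- function(setup) {
  bad <- which(setup$begin > setup$end)  # which() ignores NA comparisons
  if (length(bad) > 0) {
    stop("Setup file lists columns whose start position is after their end: ",
         paste0(setup$column_name[bad], " (", setup$begin[bad], "-",
                setup$end[bad], ")", collapse = ", "),
         call. = FALSE)
  }
  invisible(setup)
}

message_text <- tryCatch(check_column_positions(setup),
                         error = function(e) conditionMessage(e))
print(message_text)
```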

Error: `labels` must be unique

I'm trying to use spss_ascii_reader to read in the 1970-1973 Dutch Parliamentary Election Study (available here https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/7261/).

I use the following files from that webpage:
07261-0001-Setup.sps
07261-0001-Data.txt

and the following syntax:

dpes_7073 <- spss_ascii_reader(dataset_name = "07261-0001-Data.txt", sps_name = "07261-0001-Setup.sps")

and I get the following errors:

Error: `labels` must be unique
In addition: Warning messages:
1: In matrix(value_label_section, ncol = 2, byrow = TRUE) :
  data length [123] is not a sub-multiple or multiple of the number of rows [62]
2: In matrix(value_label_section, ncol = 2, byrow = TRUE) :
  data length [123] is not a sub-multiple or multiple of the number of rows [62]
3: In matrix(value_label_section, ncol = 2, byrow = TRUE) :
  data length [123] is not a sub-multiple or multiple of the number of rows [62]

I was able to read in this dataset in a previous version of either this package or the haven package (which I believe this package relies on). I've updated to the latest github versions of both packages and I'm still getting these errors.

Thanks!

Error in read_ascii_setup; missing True/False value

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

Brief description of the problem
Hey Jacob! I'm trying to use the read_ascii_setup function on some data from the JARP dataset (https://www.icpsr.umich.edu/web/NACDA/studies/8450/summary), specifically the data1.sps and data1.txt files (https://github.com/lucarmich/JARPData).

The basic code I am running is this:

Data <- read_ascii_setup(data="Data1.txt", setup_file = "Data1.sps")

And the error I am getting is:

Error in if (any(setup$begin > setup$end)) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In get_column_spaces(setup, variables, codebook) :
NAs introduced by coercion

I've also tried running the spss_ascii_reader command, but it returned the same error message.

Any help would be appreciated! Thanks!

Lucas
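The two messages above fit together in a way that can be reproduced in base R: a coercion that introduces NAs, followed by an `if()` on a comparison involving those NAs. The sketch below uses made-up position values and is not the package's code; it only illustrates the mechanism and one possible guard.

```r
# A non-numeric position ("A") becomes NA when coerced, with a warning.
begin <- suppressWarnings(as.numeric(c("1", "5", "A")))
end   <- c(4, 9, 12)

# With no TRUE values and one NA, any() itself returns NA ...
print(any(begin > end))

# ... and if (NA) raises "missing value where TRUE/FALSE needed".
unguarded <- tryCatch(
  if (any(begin > end)) "bad positions" else "ok",
  error = function(e) conditionMessage(e)
)
print(unguarded)

# Possible guard: report unparsed positions separately, then compare
# only the positions that were parsed successfully.
unparsed     <- which(is.na(begin) | is.na(end))
has_reversed <- any(begin > end, na.rm = TRUE)
print(unparsed)
print(has_reversed)
```

If this is what happens here, the rows flagged by `unparsed` would point to the lines of the setup file that the parser could not turn into numbers.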

Failure with dev haven

I see the following errors which all have the same root cause: haven now checks that labels are unique, and you're producing non-unique labels. Can you please take a look and see if you can get a fix to CRAN soon?

checking examples ... ERROR

...
+   package = "asciiSetupReader")
> 
> ## Not run: 
> ##D example <- read_ascii_setup(data = dataset_name,
> ##D   setup_file = sps_name)
> ##D 
> ##D 
> ##D # Does not fix value labels
> ##D example2 <- read_ascii_setup(data = dataset_name,
> ##D   setup_file = sps_name, use_value_labels = FALSE)
> ##D 
> ##D # Keeps original column names
> ##D example3 <- read_ascii_setup(data = dataset_name,
> ##D   setup_file = sps_name, use_clean_names = FALSE)
> ## End(Not run)
> 
> # Only returns the first 5 columns
> example4 <- read_ascii_setup(data = dataset_name,
+   setup_file = sps_name, select_columns = 1:5)
Error: `labels` must be unique
Execution halted

checking tests ... ERROR
Running the tests in ‘tests/testthat.R’ failed.
Complete output:
  > library(testthat)
  > library(asciiSetupReader)
  > 
  > test_check("asciiSetupReader")
  Error: `labels` must be unique
  Execution halted

checking re-building of vignette outputs ... WARNING

Error in re-building vignettes:
  ...
Quitting from lines 20-28 (Introduction_to_asciiSetupReader.Rmd) 
Error: processing vignette 'Introduction_to_asciiSetupReader.Rmd' failed with diagnostics:
`labels` must be unique
Execution halted
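A minimal base R sketch of how non-unique labels can be detected before they reach haven. It assumes that the check rejects a labels vector in which the same value appears under more than one label; the `value_labels` vector is made up, and dropping later duplicates is only one possible policy, not necessarily the fix adopted by the package.

```r
# Made-up value labels: the value 0 appears under two different labels.
value_labels <- c("Not applicable" = 0, "Yes" = 1, "No" = 2, "Inapplicable" = 0)

# Values that occur more than once, and the labels that share them.
dup_values <- unique(value_labels[duplicated(unname(value_labels))])
shared     <- names(value_labels)[value_labels %in% dup_values]
print(shared)

# One possible policy: keep only the first label seen for each value.
deduplicated <- value_labels[!duplicated(unname(value_labels))]
print(deduplicated)
```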

Error in `validate_labelled()`:

I am trying to read the NIBRS extract file from ICPSR. However, when I run the code I get the following error:

Error in validate_labelled():
! labels must be unique.

My code looks like this:

data11_1 = ("xaa.txt")
setup_file11 = ("34603-0004-Setup.sps")
dt11_1 = read_ascii_setup(data = data11_1, setup_file = setup_file11)

I am using the National Incident-Based Reporting System, 2011: Extract Files (ICPSR 34603) (DS0004) from ICPSR. The code was working perfectly before. I am not sure what the issue is. I had to split the large file into multiple txt files to help speed the process.

Error in parse_setup() to parse_missing_sps()

parse_setup() calls parse_missing_sps() if any(grepl2("MISSING VALUE", codebook)).
However, parse_missing_sps() sets start <- grep2("MISSING VALUES$", codebook).

My codebook has one line that contains "MISSING VALUE": * SPSS MISSING VALUE RECODE *.
However, it doesn't contain "MISSING VALUES$". As a result, parse_missing_sps() is called but fails because start has length 0.
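The mismatch can be reproduced with base grepl() and grep() standing in for the package's grepl2() and grep2() (an assumption, since their implementations are not shown here). The codebook below is a made-up three-line example containing the comment line quoted above.

```r
codebook <- c("* SPSS MISSING VALUE RECODE *",
              "VARIABLE LABELS",
              "V1 'Age'")

# The looser pattern used to decide whether to parse missing values matches ...
has_missing_section <- any(grepl("MISSING VALUE", codebook))
print(has_missing_section)

# ... but the stricter pattern used to locate the section does not.
start <- grep("MISSING VALUES$", codebook)
print(length(start))

# Using the same pattern for both the test and the lookup avoids the mismatch.
start_pattern <- "MISSING VALUES$"
consistent    <- any(grepl(start_pattern, codebook))
print(consistent)
```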

Grep error

Hi, I noticed that the grep error was discussed and that the issue has been closed. I am facing a similar problem now: I have an .asc file and an SPSS setup file, and I hope to read them into something I can analyse using R.

library(asciiSetupReader)
library(readr)
d <- read_file("CA_SEDD_2005_AHAL_trimmed.asc")
s <- read_file("CA_SEDD_2005_AHAL.sps")
output <- read_ascii_setup(data = d, setup_file = s)

However, when I run the program, the error says 'Error in grep2("^INPUT$", codebook):second_grep_value :
argument of length 0'. Is there anything wrong with my code? Thanks!
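One thing that may be worth checking: the Usage section describes data and setup_file as strings containing file names, whereas read_file() returns a file's contents. The base R sketch below shows the difference on a throwaway file; it is a guess at a possible cause, not a confirmed diagnosis, and the file name and contents are made up.

```r
# Write a tiny made-up setup file to a temporary location.
setup_path <- file.path(tempdir(), "toy_setup.sps")
writeLines(c("DATA LIST", "V1 1-4", "VARIABLE LABELS", "V1 'Id'"), setup_path)

# The file's contents as one string, similar to what readr::read_file() returns.
contents <- paste(readLines(setup_path), collapse = "\n")

# A path names an existing file; the contents string does not.
print(file.exists(setup_path))
print(file.exists(contents))
```

Under that reading, passing the file names themselves (for example setup_file = "CA_SEDD_2005_AHAL.sps") would be the thing to try.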

read_ascii_setup function no longer a part of the package

It looks like the read_ascii_setup function is no longer a part of the package, but the README.Rmd does not reflect this. This code returns an error that the function could not be found:

install.packages("asciiSetupReader")
library(asciiSetupReader)
read_ascii_setup()

grep error with spss_ascii_reader()

Hi Jacob,

I'm attempting to use the spss_ascii_reader function on some data from the PSID. Particularly, the 2016 well-being and daily life survey (https://simba.isr.umich.edu/VS/f.aspx). I've re-posted the data on my github for your ease here: https://github.com/dwiwad/well-being-and-daily-life/tree/master/WB2016

I've been using the WB2016.txt and WB2016.sps files (though I tried the SAS version as well). I've also tried various iterations of the code (i.e., just choosing certain columns), as well as both the CRAN and devtools versions of asciiSetupReader.

The basic code I am running is:

d <- spss_ascii_reader(dataset_name = 'WB2016.txt', sps_name = 'WB2016.sps')

and the error I am getting is:

Error in grep2("^variable labels$", codebook):grep2("^value labels$|missing values|^execute$", :
NA/NaN argument

Any help would be appreciated! Thank you!