alisonlanski / IPEDSuploadables
Producing uploadable txt files for IPEDS reporting, one submission at a time
Home Page: https://alisonlanski.github.io/IPEDSuploadables/
License: Other
To avoid processing issues from upload, we can use toupper() on colnames; the scripts will have to be adjusted to match.
Should we have an option to format already aggregated data? Would people use that? Ask at the AIR Forum.
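A minimal sketch of the column-name normalization mentioned above, in base R. The data frame and its column names are made up for illustration and are not taken from the package's specs.

```r
# Hypothetical input with mixed-case column names
students <- data.frame(
  Unitid = c(100001, 100001),
  StudentId = c("A1", "A2"),
  isFirstTime = c(1, 0)
)

# Normalize every column name to upper case so downstream scripts
# can rely on a single, predictable spelling
names(students) <- toupper(names(students))

names(students)
```

Any script that refers to these columns would then need to use the upper-case names as well.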
Reported by Calumet College of St. Joseph:
I have received errors running the Outcome Measures report with the sample data. Running produce_om_report() with part = "all" or part = "d" produces the following errors:
produce_om_report(om_students, part = "ALL", format = "uploadable")
Error in dplyr::mutate():
! Problem while computing `..1 = dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.
Caused by error in across():
! Problem while computing column UNITID.
Caused by error in stop_vctrs():
! Can't convert replace to match type of data.
Run rlang::last_error() to see where the error occurred.
Here is the output of rlang::last_error():
<error/dplyr:::mutate_error>
Error in dplyr::mutate():
! Problem while computing `..1 = dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.
Caused by error in across():
! Problem while computing column UNITID.
Caused by error in stop_vctrs():
! Can't convert replace to match type of data.
Backtrace:
IPEDSuploadables::produce_om_report(...)
tidyr:::replace_na.default(UNITID, 0)
vctrs::vec_assign(data, missing, replace, x_arg = "data", value_arg = "replace")
vctrs `<fn>`()
vctrs::vec_default_cast(...)
vctrs::stop_incompatible_cast(...)
vctrs::stop_incompatible_type(...)
vctrs:::stop_incompatible(...)
vctrs:::stop_vctrs(...)
Run rlang::last_trace() to see the full context.
The output of rlang::last_trace():
<error/dplyr:::mutate_error>
Error in dplyr::mutate():
! Problem while computing `..1 = dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.
Caused by error in across():
! Problem while computing column UNITID.
Caused by error in stop_vctrs():
! Can't convert replace to match type of data.
Backtrace:
▆
├─IPEDSuploadables::produce_om_report(...)
│ ├─IPEDSuploadables::write_report(...)
│ └─IPEDSuploadables::make_om_part_D(df = students)
│ └─... %>% ...
├─dplyr::transmute(...)
├─dplyr::mutate(...)
├─dplyr:::mutate.data.frame(...)
│ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
│ ├─base::withCallingHandlers(...)
│ ├─base::withCallingHandlers(...)
│ └─mask$eval_all_mutate(quo)
├─tidyr::replace_na(UNITID, 0)
└─tidyr:::replace_na.default(UNITID, 0)
└─vctrs::vec_assign(data, missing, replace, x_arg = "data", value_arg = "replace")
└─vctrs `<fn>`()
└─vctrs::vec_default_cast(...)
└─vctrs::stop_incompatible_cast(...)
└─vctrs::stop_incompatible_type(...)
└─vctrs:::stop_incompatible(...)
└─vctrs:::stop_vctrs(...)
└─rlang::abort(message, class = c(class, "vctrs_error"), ...)
I have a new install of R (4.1.3 "One Push-Up") and RStudio (2022.02.1+461 "Prairie Trillium") along with version 2.3.5 of your R package on my MacBook Pro. Could this mean a breaking change was introduced by a newer library? I am a relative novice in R, preferring to do most of my work in python/pandas instead.
Followup:
On the issue, I did a Google search earlier today and found this link: https://stackoverflow.com/questions/71227130/replace-na-of-numeric-columns-with-both-numeric-and-character-values-in-r
It looks like the problem could stem from a change to replace_na in tidyr version 1.2.0.
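The backtrace shows replace_na(UNITID, 0) failing, which fits a numeric replacement being applied to a non-numeric column. A minimal base R sketch of one possible workaround is to fill NA with 0 only in numeric columns. The data frame below is hypothetical, and this is not necessarily how the package resolved the issue.

```r
# Hypothetical slice of a report data frame: UNITID is character,
# the count columns are numeric and may contain NA
report <- data.frame(
  UNITID = c("123456", "123456"),
  COUNT1 = c(5, NA),
  COUNT2 = c(NA, 2),
  stringsAsFactors = FALSE
)

# Replace NA with 0 only in numeric columns, leaving character columns alone
is_num <- vapply(report, is.numeric, logical(1))
report[is_num] <- lapply(report[is_num], function(x) {
  x[is.na(x)] <- 0
  x
})
```

Inside a dplyr pipeline, the same idea would likely be expressed by restricting across() to numeric columns, for example dplyr::across(where(is.numeric), ~tidyr::replace_na(.x, 0)), instead of dplyr::everything().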
Look at HR for an example and make COM match.
filename issue
When running the produce_report scripts, the filenames do not show "PartX" but instead show "AllParts" because of how the list of arguments is passed. The filename pulls a default, which requires the user to change "full" to "part". That is silly: if they are asking for a part, they clearly want a part.
Expected behavior
Filenames show the part.
Pretty files produce incorrect output.
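One way to get the expected behavior is to derive the filename label directly from the requested part, so no second argument has to be changed. This is only a sketch: part_label(), build_filename(), and the filename pattern are hypothetical and not the package's actual code.

```r
# Hypothetical helper: derive the file-name label from the requested part.
# "ALL" (any case) maps to "AllParts"; anything else maps to "Part<X>".
part_label <- function(part) {
  part <- toupper(part)
  if (part == "ALL") "AllParts" else paste0("Part", part)
}

# Hypothetical file-name pattern; the package's real pattern may differ
build_filename <- function(survey, part, ext = "txt") {
  paste0(survey, "_", part_label(part), "_", format(Sys.Date(), "%Y-%m-%d"), ".", ext)
}

build_filename("OM", "d")
```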
for when the setup fails or errors are received
IPEDS will accept students at multiple CIPs
The current import requirements do NOT allow for a single student to be reported at two levels under one CIP code.
The current prep scripts also do NOT account for this.
Fall Enrollment is an unduplicated survey (1 row per person, so you have to pick a level to report at)
To Dos
Update the dummy data to have a case like this
Update the specs to be able to handle this situation
Update the prep scripts
Turn it into a function (see the HR dummy data file). It will need either 2 returns or 2 separate functions: one for regular CIPs and one for extra CIPs.
The FE1 Inclusion column is defined poorly in the vignette. I believe it should be the number of students STILL INCLUDED from the original cohort, instead of students who are ADDED, which the vignette seems to say.
After merge, check the vignettes against each other for colnames and update for consistent style/language in dummy data and scripts across all uploads
Don't save a local path; have the write_report function only run once per "produce" instead of after each part
Avoid saving a path locally but reduce annoyance by only calling "write report" once per "produce" function instead of one per part within the produce function. See the autoupload scripts for an example of how to implement this and/or call the autoupload to help with the final output instead of feeding directly into write_report.
Acceptance Criteria
One call to write_report/path per "produce" instead of per part.
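A rough sketch of the acceptance criterion: each part returns its formatted lines, and the produce function combines them and writes once. All function names below are hypothetical stand-ins, and the sample lines are invented.

```r
# Hypothetical stand-ins for the per-part builders; each returns
# the already formatted lines for one part
make_part_a <- function() c("UNITID=123456,SURVSECT=XX1,PART=A,COUNT=10")
make_part_b <- function() c("UNITID=123456,SURVSECT=XX1,PART=B,COUNT=4")

# Hypothetical writer: one call, one file
write_lines_once <- function(lines, path) {
  writeLines(lines, con = path)
  invisible(path)
}

# Hypothetical produce function: build every part first, write at the end
produce_sketch <- function(path) {
  parts <- list(make_part_a(), make_part_b())
  all_lines <- unlist(parts, use.names = FALSE)
  write_lines_once(all_lines, path)
}

out_file <- produce_sketch(tempfile(fileext = ".txt"))
written <- readLines(out_file)
```

Because the file is written in a single call, nothing depends on which part happens to be processed first.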
The situation
Because OM, GR, and GR_200 have related reporting concepts, the data files schools would use may be the same or similar. So it might be possible for our import requirements to use the same fields across these reports, which could reduce the prep burden.
Action Items
Investigate to see if this makes sense to pursue
if it does make sense, select the right setup
adjust GR google sheet import specs and saved data
adjust GR scripts
adjust GR_200 google sheet import specs and saved data
adjust GR_200 scripts
adjust OM google sheet import specs and saved data
adjust OM scripts
Replace IPEDS codes in prettyfiles with literals
Make the csv outputs easier to use for the reviewer by converting ipeds-required codes and column names and organization back to something closer to what you get on the actual IPEDS PDF submission screens.
Or just tell people to look at their reports after they upload it in IPEDS because then it's lovely
Why force people to install devtools?
It turns out that the GitHub install function in devtools is taken from the remotes package. We might as well list remotes::install_github() as an option too, since it may be more lightweight.
Would need to edit the readme, probably the "how to set up" vignette, and rerun pkgdown
HR SOUTLAYs including medical?
Found by UNLV: I believe we may have found a bug in your package. It looks like the salary outlays that populate Part G4 are including records where IsMedical = 1 (but should only include Non-Medical). The employee counts in G4 look fine; it's just the salary piece. We installed the package yesterday, so I believe we are using the current version.
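The behavior UNLV expects can be sketched as filtering to non-medical records before summing salaries. Only IsMedical comes from the report above. The other column names and all values are invented, and this is not the package's actual Part G4 code.

```r
# Hypothetical staff records; Rank and Salary are illustrative column names
staff <- data.frame(
  Rank = c(1, 1, 2, 2),
  IsMedical = c(0, 1, 0, 0),
  Salary = c(90000, 150000, 70000, 72000)
)

# Restrict to non-medical staff before computing salary outlays
non_medical <- staff[staff$IsMedical == 0, ]
outlays <- aggregate(Salary ~ Rank, data = non_medical, FUN = sum)

outlays
```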
Verify that all packages have a df-specs vignette and also verify/create general "how to use the package" vignette(s) that incorporate all scripts currently available.
df-specs (update these to say "produce" instead of "make" functions?)
how-to-use
IPEDS instructions state that the cip counts are NOT unduplicated
specs_FE requests 1 MajorCip per student for even-year CIP-based ipeds breakouts, but IPEDS actually allows students to be double-reported.
Will need to update intake files specs and code to account for this. Maybe use flags instead of a single column.
May also want to consider breaking specs into 3 groups:
-always required?
-for odd years?
-for even years?
Not vital to update for 2021 since CIPs are not reported this year, but should be updated overall.
Describe the bug
make_ef1_part_D() will fail when there are either no non-degree students or no new non-degree students. The following filter is the problem:
dplyr::filter(.data$ISDEGREECERTSEEKING == 0 & .data$STUDENTLEVEL == "Undergraduate" &
              (.data$ISFIRSTTIME == 1 | .data$ISTRANSFER == 1)) %>%
If the number of qualifying cases is zero, dplyr::summarize will throw the following error:
nms %in% c("i", "x", "") are not all TRUE
Additional context
Perhaps create a dichotomous variable using the same criterion as the filter and take the sum:
partD <- df %>%
  dplyr::select(.data$UNITID,
                .data$ISDEGREECERTSEEKING,
                .data$STUDENTLEVEL,
                .data$ISFIRSTTIME,
                .data$ISTRANSFER) %>%
  #flag first-time or transfer non-degree undergraduates (1 = qualifies, 0 = does not)
  dplyr::mutate(FTND = as.integer(.data$ISDEGREECERTSEEKING == 0 & .data$STUDENTLEVEL == "Undergraduate" &
                                    (.data$ISFIRSTTIME == 1 | .data$ISTRANSFER == 1))) %>%
  dplyr::group_by(.data$UNITID) %>%
  #summing the flag gives 0 when nobody qualifies, instead of dropping every row
  dplyr::summarise(COUNT = sum(.data$FTND, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  #format for upload
  dplyr::transmute(UNITID = paste0("UNITID=", .data$UNITID),
                   SURVSECT = "SURVSECT=EF1",
                   PART = "PART=D",
                   COUNT = paste0("COUNT=", .data$COUNT)
  )
Add a toupper or tolower call to each prep script (or add a shared prep script that does this), and then change all colnames in the functions to be entirely upper or lower case.
rename?
make sure it works with new versions of make_ functions (including produce functions?)
tie in with write_report? don't know if we need that or it can be totally separate
The code assumes all students start and complete in the same program-type. If a student changes from a 4-yr to a 1-yr program, for example, it does not handle their data correctly.
Students should be reported in the section related to their starting-cohort program-type.
Within the correct section, their 150% completion should be reported for their actual program-type instead of their starting-program-type.
For example:
This is not the behavior of the current package.
Issue reported by University of Nebraska - Lincoln by Jason Casey
Describe the bug
write_report() does not produce BOTH types when format=="BOTH". It only produces the uploadable format.
Expected behavior
It should produce both the Readable and Uploadable files.
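A minimal sketch of the expected behavior, where "both" expands to the two output types and the writer loops over them. The helper names, file names, and extensions are hypothetical. Only the format values "uploadable", "readable", and "both" come from the issue text.

```r
# Hypothetical dispatcher: decide which output types a format value requests
formats_to_write <- function(format) {
  format <- tolower(format)
  switch(format,
    uploadable = "uploadable",
    readable = "readable",
    both = c("uploadable", "readable"),
    stop("Unknown format: ", format)
  )
}

# Hypothetical writer that loops over every requested type
write_report_sketch <- function(lines, dir, format = "both") {
  vapply(formats_to_write(format), function(type) {
    ext <- if (type == "uploadable") ".txt" else ".csv"
    path <- file.path(dir, paste0("report_", type, ext))
    writeLines(lines, path)
    path
  }, character(1))
}

out_dir <- tempfile("reports_")
dir.create(out_dir)
paths <- write_report_sketch("UNITID=123456,PART=A", out_dir, format = "BOTH")
```

Lower-casing the argument first means "BOTH", "Both", and "both" are all treated the same way.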
Let's make R optional for the users
Outcome measures throws a fatal error if cells are empty
Outcome measures needs to have the complete value set in the uploadable file. Missing rows of data will end up as fatal errors in the upload.
Let's eliminate the errors
These should be included and set to 0.
We need to add dummy expand_grids to the OM parts or the award function to capture this situation in the final output file.
Also should revisit the sort at the end
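The idea of padding the output with a full grid can be sketched as follows. This uses base R's expand.grid() and merge() rather than tidyr::expand_grid(), and the column names and category values are invented, not the actual OM specification.

```r
# Hypothetical aggregated counts; only some combinations occur in the data
observed <- data.frame(
  COHORT = c(1L, 1L, 2L),
  AWARD = c(1L, 2L, 1L),
  COUNT = c(12, 3, 7)
)

# Every combination that the upload file is expected to contain
full_grid <- expand.grid(COHORT = 1:2, AWARD = 1:3)

# Left join onto the full grid, then turn unmatched combinations into 0
complete <- merge(full_grid, observed, by = c("COHORT", "AWARD"), all.x = TRUE)
complete$COUNT[is.na(complete$COUNT)] <- 0

# Explicit, stable sort for the final file
complete <- complete[order(complete$COHORT, complete$AWARD), ]
```

Combinations that never occur in the data now appear with a count of 0 instead of being absent from the file.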
Describe the bug
When processing OM, there is only a Part B and Part C, so running a produce_report with part = all repeatedly keeps appending instead of starting a file fresh.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Keep this behavior for reports with part A, but adjust the write script for reports without a part A and/or create a generic dummy part A that is blank to process first for the write script
Additional context
don't know what other reports might have this problem, but we could look at the upload instructions proactively to try to find them
are datatypes and colnames correct?
is there unexpected missing data?
are data values in the allowable ranges for particular columns?
do any of the prep script recodes fail?
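The first three questions above could be answered by a small checker along these lines. This is a sketch only: check_input() is hypothetical, and the expected columns, types, and allowed values passed to it are illustrative rather than the package's real specs. Recode failures are not covered here.

```r
# Hypothetical checker that collects problems instead of stopping at the first one
check_input <- function(df, expected_types, ranges = list()) {
  problems <- character(0)

  # Are the expected column names present?
  missing_cols <- setdiff(names(expected_types), names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("Missing column:", missing_cols))
  }

  for (col in intersect(names(expected_types), names(df))) {
    # Is the data type as expected?
    if (!inherits(df[[col]], expected_types[[col]])) {
      problems <- c(problems, paste("Wrong type in", col))
    }
    # Is there unexpected missing data?
    if (anyNA(df[[col]])) {
      problems <- c(problems, paste("Missing values in", col))
    }
    # Are the values inside the allowed set?
    if (!is.null(ranges[[col]])) {
      bad <- !is.na(df[[col]]) & !(df[[col]] %in% ranges[[col]])
      if (any(bad)) problems <- c(problems, paste("Out-of-range values in", col))
    }
  }
  problems
}

students <- data.frame(
  UNITID = c("123456", "123456"),
  ISFIRSTTIME = c(1, 3),
  stringsAsFactors = FALSE
)
issues <- check_input(
  students,
  expected_types = c(UNITID = "character", ISFIRSTTIME = "numeric", STUDENTLEVEL = "character"),
  ranges = list(ISFIRSTTIME = c(0, 1))
)
```

Returning a character vector of problems lets a user fix everything in one pass before running a produce function.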
IPEDS 2020 has changes for Completions (no changes for HR -- whew!)
Let's get this thing official!
get_unitid sets it as numeric but in com partD we have to then reset it back to character
make this consistent through the code generally