The ipedsuploadables from alisonlanski

Change colnames in IPEDS files to be all upper

To avoid processing issues after upload, we can apply "toupper" to the colnames; the scripts will have to be adjusted to match.
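A minimal sketch of the idea in base R. The data frame and its column names are made up for illustration and are not taken from the package's specs.

```r
# Hypothetical input with mixed-case column names (illustrative only)
students <- data.frame(
  Unitid = c(999999, 999999),
  studentId = c(1, 2),
  IsTransfer = c(0, 1)
)

# Normalize every column name to upper case before any prep script runs
colnames(students) <- toupper(colnames(students))

print(colnames(students))
```

Any script that refers to columns by name would then need to use the upper-case spelling as well.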

aggregated formatting

Should we have an option to format already-aggregated data? Would people use that? Ask at the AIR Forum.

tidyverse update breaks OM part D

Reported by Calumet College of St. Joseph:

I have received errors running the outcomes measure report with the sample data. Running produce_om_report() with part="all" or part="d" produces the following errors:

produce_om_report(om_students, part = "ALL", format = "uploadable")

Error in dplyr::mutate():
! Problem while computing `..1 = dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.
Caused by error in across():
! Problem while computing column UNITID.
Caused by error in stop_vctrs():
! Can't convert replace to match type of data .
Run rlang::last_error() to see where the error occurred.

Here is the output of rlang::last_error()

Error in dplyr::mutate():
! Problem while computing `..1 = dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.
Caused by error in across():
! Problem while computing column UNITID.
Caused by error in stop_vctrs():
! Can't convert replace to match type of data .
Backtrace:
IPEDSuploadables::produce_om_report(...)
tidyr:::replace_na.default(UNITID, 0)
vctrs::vec_assign(data, missing, replace, x_arg = "data", value_arg = "replace")
vctrs <fn>()
vctrs::vec_default_cast(...)
vctrs::stop_incompatible_cast(...)
vctrs::stop_incompatible_type(...)
vctrs:::stop_incompatible(...)
vctrs:::stop_vctrs(...)
Run rlang::last_trace() to see the full context.

The output of rlang::last_trace()

rlang::last_trace()

Error in dplyr::mutate():
! Problem while computing `..1 = dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.
Caused by error in across():
! Problem while computing column UNITID.
Caused by error in stop_vctrs():
! Can't convert replace to match type of data .
Backtrace:
▆
├─IPEDSuploadables::produce_om_report(...)
│ ├─IPEDSuploadables::write_report(...)
│ └─IPEDSuploadables::make_om_part_D(df = students)
│   └─... %>% ...
├─dplyr::transmute(...)
├─dplyr::mutate(...)
├─dplyr:::mutate.data.frame(...)
│ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
│   ├─base::withCallingHandlers(...)
│   ├─base::withCallingHandlers(...)
│   └─mask$eval_all_mutate(quo)
├─tidyr::replace_na(UNITID, 0)
└─tidyr:::replace_na.default(UNITID, 0)
  └─vctrs::vec_assign(data, missing, replace, x_arg = "data", value_arg = "replace")
    └─vctrs `<fn>`()
      └─vctrs::vec_default_cast(...)
        └─vctrs::stop_incompatible_cast(...)
          └─vctrs::stop_incompatible_type(...)
            └─vctrs:::stop_incompatible(...)
              └─vctrs:::stop_vctrs(...)
                └─rlang::abort(message, class = c(class, "vctrs_error"), ...)

I have a new install of R (4.1.3 “One Push-Up”) and RStudio (2022.02.1+461 "Prairie Trillium") along with 2.3.5 version of your R package on my MacBook Pro. It may mean there has been some breaking change introduced with a newer library? I am a relative novice in R preferring to do most of my work in python/pandas instead.

Followup:
On the issue, I did a google search earlier today and found this link https://stackoverflow.com/questions/71227130/replace-na-of-numeric-columns-with-both-numeric-and-character-values-in-r

It looks like the problem could stem from a change to replace_na in tidyr version 1.2.0.
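As of tidyr 1.2.0, replace_na() casts the replacement to the type of the column, so a numeric 0 cannot fill a character column. The sketch below shows one possible workaround, assuming the failure arises because UNITID is stored as a non-numeric type. The data are made up, and this is not necessarily the fix adopted in the package.

```r
library(dplyr)
library(tidyr)

# Made-up data: UNITID is character, COUNT is numeric with a missing value
students <- data.frame(
  UNITID = c("999999", "999999"),
  COUNT = c(3, NA),
  stringsAsFactors = FALSE
)

# Fill only numeric columns with 0; character columns such as UNITID are left alone
filled <- students %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

print(filled)
```

Restricting the selection with where(is.numeric) avoids the type clash that everything() runs into when a non-numeric column is present.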

Update Com files with new version of write_report

look at HR for example and make Com match

using produce report to get a part uses wrong filename

filename issue
When running the produce_report scripts to get a single part, the filenames show "AllParts" instead of "PartX". This happens because of how the list of arguments is passed: the filename pulls the default, so the user has to change "full" to "part" by hand. That is silly, because a user who asks for a part clearly wants a part.

Expected behavior
Filenames show the part.
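One way to get this behavior is to derive the filename label directly from the requested part. The helper below is hypothetical (its name, arguments, and filename pattern are assumptions, not the package's code) and only sketches the idea.

```r
# Hypothetical helper (not part of the package): derive the filename label from
# the requested part, so that asking for one part never yields "AllParts"
build_report_filename <- function(survey, part = "all", extension = "txt") {
  label <- if (toupper(part) == "ALL") "AllParts" else paste0("Part", toupper(part))
  paste0(survey, "_", label, ".", extension)
}

all_name <- build_report_filename("OutcomeMeasures", part = "all")
part_name <- build_report_filename("OutcomeMeasures", part = "d")
print(c(all_name, part_name))
```

With this approach there is no separate "full"/"part" switch for the user to keep in sync with the part argument.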

Revise 12 Month Enrollment to include 2 year institutions

Check IPEDS documentation to see where there are overlaps with 4 year degree granting institutions
Figure out the lift to include 2 years
Check that parts line up
Check that fields line up

pretty files errors

pretty files produces incorrect output.

Revise Grad Rates to include 2 year institutions

Check IPEDS documentation to see where there are overlaps with 4 year degree granting institutions
Figure out the lift to include 2 years
Check that parts line up
Check that fields line up

create troubleshooting vignette

for when the setup fails or errors are received

save dummy data files as part of the package

Fall Enrollment needs to be able to handle multiple cips/levels

IPEDS will accept students with multiple cips.
The current import requirements do NOT allow for a single student to be reported at two levels under one cip code.
The current prep scripts also do NOT account for this.

Fall Enrollment is an unduplicated survey (1 row per person, so you have to pick a level to report at)

To Dos
Update the dummy data to have a case like this
Update the specs to be able to handle this situation
Update the prep scripts

packageify the Completions Dummy Data file

Turn it into a function (see the HR dummy data file). It will need either 2 returns from one function or 2 separate functions: one for regular cip and one for extracips.

change Inclusion column in FE1

The FE1 Inclusion column is defined poorly in the vignette. I believe it should be the number of students STILL INCLUDED from the original cohort, instead of students who are ADDED, which the vignette seems to say.

Check df colnames across scripts for consistency

After merge, check the vignettes against each other for colnames and update for consistent style/language in dummy data and scripts across all uploads

add scripts for 12 month enrollment

Require path selection every time you run a report

Don't save a local path; have the write_report function only run once per "produce" instead of after each part

Avoid saving a path locally but reduce annoyance by only calling "write report" once per "produce" function instead of one per part within the produce function. See the autoupload scripts for an example of how to implement this and/or call the autoupload to help with the final output instead of feeding directly into write_report.

Acceptance Criteria
One call to write_report/path per "produce" instead of per part.
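A rough sketch of the "write once per produce" shape. All function names and the placeholder upload lines here are hypothetical stand-ins rather than the package's functions, and a temporary file takes the place of the user's path selection.

```r
# Hypothetical stand-ins for the package's make_*_part_* functions: each returns
# the upload lines for one part as a character vector (placeholder content)
make_part_a <- function() c("PART=A,COUNT=10")
make_part_b <- function() c("PART=B,COUNT=4")

# Hypothetical writer: the single place where an output path is used
write_once <- function(lines, path) {
  writeLines(lines, con = path)
  invisible(path)
}

produce_example_report <- function(path) {
  # Build every part in memory first...
  all_lines <- c(make_part_a(), make_part_b())
  # ...then touch the file system exactly once
  write_once(all_lines, path)
}

out_path <- produce_example_report(tempfile(fileext = ".txt"))
print(readLines(out_path))
```

Because the path is only needed at the final step, the user would be prompted once per produce call rather than once per part.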

Check if GR, GR_200, and OM data structures should be similar

The situation
Because OM, GR, and GR_200 have related reporting concepts, the data files schools would use may be the same or similar. So it might be possible to have our import requirements use the same fields across these reports, which could reduce the prep burden.

Action Items

Investigate to see if this makes sense to pursue
if it does make sense, select the right setup
adjust GR google sheet import specs and saved data
adjust GR scripts
adjust GR_200 google sheet import specs and saved data
adjust GR_200 scripts
adjust OM google sheet import specs and saved data
adjust OM scripts

Make pretty files prettier

Replace IPEDS codes in prettyfiles with literals
Make the csv outputs easier for the reviewer to use by converting IPEDS-required codes, column names, and organization back to something closer to what you get on the actual IPEDS PDF submission screens.

Or just tell people to look at their reports after they upload it in IPEDS because then it's lovely

write scripts for graduation rates 200

change/add install instructions to use remotes package

Why force people to install devtools?
Turns out the GitHub install in devtools comes from the remotes package. Might as well list that as an option too, since it may be more lightweight.

Would need to edit the readme, probably the "how to set up" vignette, and rerun pkgdown

Medical in HR salaries

HR SOUTLAYs including medical?
Found by UNLV: I believe we may have found a bug in your package. It looks like the salary outlays that populate Part G4 are including records where IsMedical = 1 (but should only include Non-Medical). The employee counts in G4 look fine; it's just the salary piece. We installed the package yesterday, so I believe we are using the current version.
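The expected behavior can be sketched in base R as follows. IsMedical comes from the report above; the other column names and all values are made up, and this is not the package's actual G4 code.

```r
# Made-up staff records; Rank and Salary are assumed column names
staff <- data.frame(
  Rank = c("Professor", "Professor", "Lecturer", "Lecturer"),
  IsMedical = c(0, 1, 0, 1),
  Salary = c(100000, 250000, 60000, 90000),
  stringsAsFactors = FALSE
)

# Keep only non-medical records before summing, so the salary outlays cover
# the same people as the non-medical employee counts
non_medical <- staff[staff$IsMedical == 0, ]
soutlay <- aggregate(Salary ~ Rank, data = non_medical, FUN = sum)

print(soutlay)
```

Applying the same filter to both the counts and the outlays keeps the two figures consistent.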

write scripts for institutional characteristics

Revise GR 200 to include < 4 year institutions

Check IPEDS documentation to see where there are overlaps with 4 year degree granting institutions
Figure out the lift to include 2 years
Check that parts line up
Check that fields line up

Check/Create Vignettes for full package

Verify that all packages have a df-specs vignette and also verify/create general "how to use the package" vignette(s) that incorporate all scripts currently available.

df-specs (update these to say "produce" instead of "make" functions?)

how-to-use

turn readmes into vignettes (how to set up your data)

add scripts for outcome measures

FE should accept multiple cips per person

IPEDS instructions state that the cip counts are NOT unduplicated
specs_FE requests 1 MajorCip per student for even-year CIP-based ipeds breakouts, but IPEDS actually allows students to be double-reported.

Will need to update intake files specs and code to account for this. Maybe use flags instead of a single column.

May also want to consider breaking specs into 3 groups:
-always required?
-for odd years?
-for even years?

Not vital to update for 2021 since CIPs are not reported this year, but should be updated overall.
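One possible intake shape is a long format with one row per student per CIP. The sketch below uses made-up data and assumed column names, and only illustrates how duplicated CIP counts can coexist with an unduplicated headcount.

```r
# Made-up long-format intake: one row per student per CIP (column names are assumptions)
majors <- data.frame(
  StudentId = c(1, 1, 2, 3),
  MajorCip = c("13.0000", "14.0000", "13.0000", "26.0000"),
  stringsAsFactors = FALSE
)

# CIP counts are duplicated: a double major is counted once under each CIP
cip_counts <- table(majors$MajorCip)

# The overall headcount stays unduplicated
headcount <- length(unique(majors$StudentId))

print(cip_counts)
print(headcount)
```

A flag-per-CIP layout would carry the same information in wide form; the long form is shown here only because it is shorter to write.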

set up issues template

make_ef1_part_D failure

Describe the bug
make_ef1_part_D() will fail when there are either no non-degree students or no new non-degree students. The following filter is the problem:

           dplyr::filter(.data$ISDEGREECERTSEEKING == 0 & .data$STUDENTLEVEL == "Undergraduate" &
                           (.data$ISFIRSTTIME == 1 | .data$ISTRANSFER == 1)) %>%

If the number of qualifying cases is zero, dplyr::summarize will throw the following error:

nms %in% c("i", "x", "") are not all TRUE

Additional context
Perhaps create a dichotomous variable using the same criterion as the filter and take the sum:

  partD <- df %>%
           dplyr::select(.data$UNITID,
                         .data$ISDEGREECERTSEEKING,
                         .data$STUDENTLEVEL,
                         .data$ISFIRSTTIME,
                         .data$ISTRANSFER) %>%
           #flag qualifying students (1/0) instead of filtering, so zero matches still yield a row
           dplyr::mutate(FTND = as.integer(.data$ISDEGREECERTSEEKING == 0 & .data$STUDENTLEVEL == "Undergraduate" &
                                            (.data$ISFIRSTTIME == 1 | .data$ISTRANSFER == 1))) %>%
           dplyr::group_by(.data$UNITID) %>%
           #sum the flags to get the count (0 when nobody qualifies)
           dplyr::summarise(COUNT = sum(.data$FTND, na.rm = TRUE)) %>%
           dplyr::ungroup() %>%
           #format for upload
           dplyr::transmute(UNITID = paste0("UNITID=", .data$UNITID),
                            SURVSECT = "SURVSECT=EF1",
                            PART = "PART=D",
                            COUNT = paste0("COUNT=", .data$COUNT)
                           )

add scripts for fall enrollment

add documentation about academic libraries prep

add scripts for student financial aid

Make all colnames in scripts UPPER or lower

Add a toupper or tolower call to each prep script (or add a prep script that does this) and then change all colnames in the functions to be entirely upper/lower

add scripts for graduation rates

add scripts for finance

get pretty files working

rename?
make sure it works with new versions of make_ functions (including produce functions?)
tie in with write_report? don't know if we need that or it can be totally separate

Revise Fall Enrollment to support 2 year degree granting institutions

Check IPEDS documentation to see where there are overlaps with 4 year degree granting institutions
Figure out the lift to include 2 years
Check that parts line up
Check that fields line up

Add output & format to "PRODUCE" scripts

GR doesn't account for students who switch program-types

The code assumes all students start and complete in the same program-type. If a student changes from a 4-yr to a 1-yr program, for example, it does not handle their data correctly.

Students should be reported in the section related to their starting-cohort program-type.
Within the correct section, their 150% completion should be reported for their actual program-type instead of their starting-program-type.

For example:

Of those who start in our 2-4 program, those who finish the 48-hour program within 150% (3 years, in their case) get counted in Completed150 and should show up in column/line 12 of section III. If they change majors and complete one of our bachelor’s degrees instead, they remain in the other degree-seeking cohort, but are reported in column 18 of section III.
Of those who start in the bachelor's degree-seeking cohort, those who finish their bacc degree in six years get reported in line/column 18, section II. Those who change majors to the pre-vet/pre-health programs and complete their 48 hours in 3 years get counted in line/column 12, section II.

This is not the behavior of the current package.

Issue reported by University of Nebraska - Lincoln by Jason Casey

write_report() does not produce BOTH types when format=="BOTH"


Describe the bug
write_report() does not produce BOTH types when format=="BOTH". It only produces the uploadable format.

Expected behavior
It should produce both the Readable and Uploadable files.
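A sketch of the expected branching, using two independent checks rather than an if/else so that "both" triggers each writer. The function, its arguments, the output filenames, and the sample data are all hypothetical; this is not the package's write_report().

```r
# Hypothetical writer (not the package's write_report): returns the paths it wrote
write_example <- function(lines, table, dir, format = "uploadable") {
  format <- tolower(format)
  written <- character(0)
  # Uploadable text file: written for "uploadable" and for "both"
  if (format %in% c("uploadable", "both")) {
    txt_path <- file.path(dir, "example_upload.txt")
    writeLines(lines, txt_path)
    written <- c(written, txt_path)
  }
  # Readable csv file: written for "readable" and for "both"
  if (format %in% c("readable", "both")) {
    csv_path <- file.path(dir, "example_readable.csv")
    write.csv(table, csv_path, row.names = FALSE)
    written <- c(written, csv_path)
  }
  written
}

out_dir <- tempfile("report_")
dir.create(out_dir)
paths <- write_example(
  lines = "PART=A,COUNT=10",
  table = data.frame(PART = "A", COUNT = 10),
  dir = out_dir,
  format = "BOTH"
)
print(basename(paths))
```

Lower-casing the format first also makes "BOTH", "Both", and "both" behave the same way.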

Embed package in shiny app

Let's make R optional for the users

OM needs all possible rows (with 0 fill)

Outcome measures throws a fatal error if cells are empty
Outcome measures needs to have the complete value set in the uploadable file. Missing rows of data will end up as fatal errors in the upload.

Let's eliminate the errors
These should be included and set to 0.

We need to add dummy expand_grids to the OM parts or the award function to capture this situation in the final output file.
Also should revisit the sort at the end
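The zero-fill idea can be sketched in base R with expand.grid() and merge(). The column names and values below are made up and do not reflect the actual OM upload layout.

```r
# Made-up aggregated counts with one combination missing (column names are assumptions)
observed <- data.frame(
  COHORT = c(1, 1, 2),
  AWARD = c("A", "B", "A"),
  COUNT = c(12, 5, 7),
  stringsAsFactors = FALSE
)

# Every cohort/award combination the upload is assumed to expect
all_rows <- expand.grid(
  COHORT = c(1, 2),
  AWARD = c("A", "B"),
  stringsAsFactors = FALSE
)

# Left-join the observed counts onto the full grid, then zero-fill the gaps
complete <- merge(all_rows, observed, by = c("COHORT", "AWARD"), all.x = TRUE)
complete$COUNT[is.na(complete$COUNT)] <- 0

# Sort explicitly so the row order does not depend on merge() defaults
complete <- complete[order(complete$COHORT, complete$AWARD), ]

print(complete)
```

The missing cohort 2 / award "B" combination ends up in the output with a count of 0 instead of being absent.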

fix "part = all" write report to start fresh for OM

Describe the bug
When processing OM, there is only a Part B and Part C, so running a produce_report with part = all repeatedly keeps appending instead of starting a file fresh.

To Reproduce
Steps to reproduce the behavior:

produce the OM report multiple times in a row (with part = "all")
open the txt file and look at it

Expected behavior
Keep this behavior for reports with part A, but adjust the write script for reports without a part A and/or create a generic dummy part A that is blank to process first for the write script

Additional context
don't know what other reports might have this problem, but we could look at the upload instructions proactively to try to find them
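One alternative to keying the "start fresh" logic on Part A is to overwrite on whichever part is written first and append afterwards. The writer and the placeholder lines below are hypothetical and only sketch that idea.

```r
# Hypothetical writer: overwrite on the first part written, append afterwards,
# regardless of whether that first part happens to be Part A
write_parts <- function(parts, path) {
  for (i in seq_along(parts)) {
    # "w" truncates any existing file; "a" appends to it
    con <- file(path, open = if (i == 1) "w" else "a")
    writeLines(parts[[i]], con)
    close(con)
  }
  invisible(path)
}

# Placeholder upload lines for a report that has no Part A
om_parts <- list(B = "PART=B,COUNT=1", C = "PART=C,COUNT=2")

out_file <- tempfile(fileext = ".txt")
write_parts(om_parts, out_file)
write_parts(om_parts, out_file)  # a second run replaces the file instead of growing it

print(readLines(out_file))
```

This would cover any other report without a Part A without needing a blank dummy part.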

new distance ed questions on the CIP data screen -- not sure if this will affect the upload
Subbac certificates < 1yr are split into two subcategories by duration [new levels: 1a, 1b, 2, 3, 4, 5, 6, 7, 8, 17, 18, 19] -- if the change is 1 > 1a/1b it also means a change in datatype :{
