ipedsuploadables's People

Contributors

alisonlanski, davisvaughan, shilohfling

ipedsuploadables's Issues

aggregated formatting

Should we have an option to format already-aggregated data? Would people use that? Ask at the AIR Forum.

tidyverse update breaks OM part D

Reported by Calumet College of St. Joseph:

I have received errors running the outcomes measure report with the sample data. Running produce_om_report() with part = "all" or part = "d" produces the following errors:

produce_om_report(om_students, part = "ALL", format = "uploadable")

Error in dplyr::mutate():

! Problem while computing `..1 =

dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.

Caused by error in across():

! Problem while computing column UNITID.

Caused by error in stop_vctrs():

! Can't convert replace to match type of data .

Run rlang::last_error() to see where the error occurred.

Here is the output of rlang::last_error()

<error/dplyr:::mutate_error>

Error in dplyr::mutate():

! Problem while computing `..1 =

dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.

Caused by error in across():

! Problem while computing column UNITID.

Caused by error in stop_vctrs():

! Can't convert replace to match type of data .


Backtrace:

  1. IPEDSuploadables::produce_om_report(...)

  2. tidyr:::replace_na.default(UNITID, 0)

  3. vctrs::vec_assign(data, missing, replace, x_arg = "data", value_arg = "replace")

  4. vctrs <fn>()

  5. vctrs::vec_default_cast(...)

  6. vctrs::stop_incompatible_cast(...)

  7. vctrs::stop_incompatible_type(...)

  8. vctrs:::stop_incompatible(...)

  9. vctrs:::stop_vctrs(...)

Run rlang::last_trace() to see the full context.

The output of rlang::last_trace()

rlang::last_trace()

<error/dplyr:::mutate_error>

Error in dplyr::mutate():

! Problem while computing `..1 =

dplyr::across(dplyr::everything(), ~tidyr::replace_na(.x, 0))`.

Caused by error in across():

! Problem while computing column UNITID.

Caused by error in stop_vctrs():

! Can't convert replace to match type of data .


Backtrace:

  1. ├─IPEDSuploadables::produce_om_report(...)

  2. │ ├─IPEDSuploadables::write_report(...)

  3. │ └─IPEDSuploadables::make_om_part_D(df = students)

  4. │ └─... %>% ...

  5. ├─dplyr::transmute(...)

  6. ├─dplyr::mutate(...)

  7. ├─dplyr:::mutate.data.frame(...)

  8. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())

  9. │ ├─base::withCallingHandlers(...)

  10. │ ├─base::withCallingHandlers(...)

  11. │ └─mask$eval_all_mutate(quo)

  12. ├─tidyr::replace_na(UNITID, 0)

  13. └─tidyr:::replace_na.default(UNITID, 0)

  14. └─vctrs::vec_assign(data, missing, replace, x_arg = "data", value_arg = "replace")

  15. └─vctrs `<fn>`()
    
  16.   └─vctrs::vec_default_cast(...)
    
  17.     └─vctrs::stop_incompatible_cast(...)
    
  18.       └─vctrs::stop_incompatible_type(...)
    
  19.         └─vctrs:::stop_incompatible(...)
    
  20.           └─vctrs:::stop_vctrs(...)
    
  21.             └─rlang::abort(message, class = c(class, "vctrs_error"), ...)
    

I have a new install of R (4.1.3 "One Push-Up") and RStudio (2022.02.1+461 "Prairie Trillium") along with version 2.3.5 of your R package on my MacBook Pro. It may mean that a newer library has introduced a breaking change? I am a relative novice in R, preferring to do most of my work in python/pandas instead.


Followup:
On the issue, I did a Google search earlier today and found this link: https://stackoverflow.com/questions/71227130/replace-na-of-numeric-columns-with-both-numeric-and-character-values-in-r

It looks like the problem could stem from a change to replace_na in tidyr version 1.2.0.
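
A minimal sketch of one possible fix, assuming the failure happens because UNITID is not numeric when the zero fill runs (the data frame below is a hypothetical stand-in, not the package's sample data). Since tidyr 1.2.0, replace_na() requires the replacement to be castable to the column type, so limiting the fill to numeric columns avoids the cast error:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the part D data: UNITID is character, COUNT is numeric
om_part <- data.frame(
  UNITID = c("999999", NA),
  COUNT  = c(5, NA)
)

# Fails under tidyr >= 1.2.0 because 0 (double) cannot be cast to character:
# om_part %>% mutate(across(everything(), ~ replace_na(.x, 0)))

# Restrict the zero fill to numeric columns so the replacement type always matches
fixed <- om_part %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
```

Character columns are left untouched here; whether they also need a fill value is a separate decision for the package.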

using produce report to get a part uses wrong filename

filename issue
When running the produce_report scripts to get a single part, the filenames show "AllParts" instead of "PartX" because of how the list of arguments is passed. The filename pulls the default, so the user has to change "full" to "part" by hand, which is silly: if they are asking for a part, they clearly want a part.

Expected behavior
Filenames show the part.
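
One way this could look, as a sketch only: derive the filename label directly from the requested part so no second argument is needed. The helper name build_report_filename and the exact filename pattern are assumptions for illustration, not the package's current behavior.

```r
# Hypothetical helper: derive the file label from the requested part
build_report_filename <- function(survey, part = "all") {
  part <- toupper(part)
  # "ALL" keeps the existing AllParts label; anything else names the part
  label <- if (part == "ALL") "AllParts" else paste0("Part", part)
  paste0(survey, "_", label, ".txt")
}

build_report_filename("OM", part = "d")
build_report_filename("OM")
```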

Fall Enrollment needs to be able to handle multiple cips/levels

IPEDS will accept students at multiple CIPs
The current import requirements do NOT allow for a single student to be reported at two levels under one CIP code.
The current prep scripts also do NOT account for this.

Fall Enrollment is an unduplicated survey (1 row per person, so you have to pick a level to report at)

To Dos
Update the dummy data to have a case like this
Update the specs to be able to handle this situation
Update the prep scripts

packageify the Completions Dummy Data file

Turn it into a function (see the HR dummy data file). It will need either two functions or one function with two returns: one for the regular CIPs and one for the extra CIPs.

change Inclusion column in FE1

The FE1 Inclusion column is defined poorly in the vignette. I believe it should be the number of students STILL INCLUDED from the original cohort, instead of students who are ADDED, which the vignette seems to say.

add scripts for 12 month enrollment

  • Write part A
  • Write part B
  • Write part C
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .dat to the package
  • Run devtools::check() and address errors

Require path selection every time you run a report

Don't save a local path; have the write_report function only run once per "produce" instead of after each part

Avoid saving a path locally but reduce annoyance by only calling "write report" once per "produce" function instead of one per part within the produce function. See the autoupload scripts for an example of how to implement this and/or call the autoupload to help with the final output instead of feeding directly into write_report.

Acceptance Criteria
One call to write_report/path per "produce" instead of per part.
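
A rough sketch of the acceptance criterion, assuming each part can be built in memory first. The make_part_a/make_part_b functions and produce_report_sketch are hypothetical stand-ins, not the package's functions; the point is that the path is used exactly once per "produce" call.

```r
# Hypothetical stand-ins for the package's make_*_part_X() functions
make_part_a <- function(df) paste0("PART=A,COUNT=", nrow(df))
make_part_b <- function(df) paste0("PART=B,COUNT=", sum(df$x))

produce_report_sketch <- function(df, path) {
  # Build every part first, without touching the file system
  lines <- c(make_part_a(df), make_part_b(df))
  # The only write (and the only use of the path) in the whole produce call
  writeLines(lines, path)
  invisible(lines)
}

out <- tempfile(fileext = ".txt")
produce_report_sketch(data.frame(x = 1:3), out)
```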

Check if GR, GR_200, and OM data structures should be similar

The situation
Because OM, GR, and GR_200 have related reporting concepts, schools may well use the same or similar data files for them. If so, our import requirements could use the same fields across these reports, which would reduce the prep burden.

Action Items

  • Investigate to see if this makes sense to pursue

  • if it does make sense, select the right setup

  • adjust GR google sheet import specs and saved data

  • adjust GR scripts

  • adjust GR_200 google sheet import specs and saved data

  • adjust GR_200 scripts

  • adjust OM google sheet import specs and saved data

  • adjust OM scripts

Make pretty files prettier

Replace IPEDS codes in prettyfiles with literals
Make the csv outputs easier to use for the reviewer by converting ipeds-required codes and column names and organization back to something closer to what you get on the actual IPEDS PDF submission screens.

Or just tell people to look at their reports after they upload it in IPEDS because then it's lovely

write scripts for graduation rates 200

  • Write part A
  • Write prep function
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .dat to the package
  • Run devtools::check() and address errors

change/add install instructions to use remotes package

Why force people to install devtools?
It turns out the GitHub install in devtools is taken from the remotes package. We might as well list that as an option too, since it may be more lightweight.

Would need to edit the readme, probably the "how to set up" vignette, and rerun pkgdown

Medical in HR salaries

HR SOUTLAYs including medical?
Found by UNLV: I believe we may have found a bug in your package. It looks like the salary outlays that populate Part G4 are including records where IsMedical = 1 (but should only include Non-Medical). The employee counts in G4 look fine; it's just the salary piece. We installed the package yesterday, so I believe we are using the current version.
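
A sketch of the intended behavior, assuming the fix is to apply the same non-medical filter to salaries that the employee counts already use. Only IsMedical comes from the report above; the Rank and Salary columns and all values are hypothetical.

```r
# Hypothetical staff records; only the IsMedical flag is taken from the report
staff <- data.frame(
  Rank      = c(1, 1, 2, 2),
  IsMedical = c(0, 1, 0, 0),
  Salary    = c(50000, 90000, 40000, 45000)
)

# Drop medical staff before summing so counts and outlays share one population
non_medical <- staff[staff$IsMedical == 0, ]
soutlay <- aggregate(Salary ~ Rank, data = non_medical, FUN = sum)
```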

write scripts for institutional characteristics

  • Write part A
  • Write part B
  • Write part C
  • Write part D
  • Write part E
  • Write part F
  • Write prep function
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .dat to the package
  • Run devtools::check() and address errors

Revise GR 200 to include < 4 year institutions

  • Check IPEDS documentation to see where there are overlaps with 4 year degree granting institutions
  • Figure out the lift to include 2 years
  • Check that parts line up
  • Check that fields line up

Check/Create Vignettes for full package

Verify that all packages have a df-specs vignette and also verify/create general "how to use the package" vignette(s) that incorporate all scripts currently available.

df-specs (update these to say "produce" instead of "make" functions?)

  • HR
  • Fall Enrollment
  • GR
  • GR 200
  • OM
  • 12 month enrollment
  • Completions

how-to-use

  • HR
  • Fall Enrollment
  • GR
  • GR 200
  • OM
  • 12 month enrollment
  • Completions
  • Global (all-surveys)

add scripts for outcome measures

  • Write part A
  • Write part B
  • Write part C
  • Write part D
  • Write prep function
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .dat to the package
  • Run devtools::check() and address errors

FE should accept multiple cips per person

IPEDS instructions state that the cip counts are NOT unduplicated
specs_FE requests 1 MajorCip per student for even-year CIP-based ipeds breakouts, but IPEDS actually allows students to be double-reported.

Will need to update intake files specs and code to account for this. Maybe use flags instead of a single column.

May also want to consider breaking specs into 3 groups:
-always required?
-for odd years?
-for even years?

Not vital to update for 2021 since CIPs are not reported this year, but should be updated overall.

make_ef1_part_D failure

Describe the bug
make_ef1_part_D() will fail when there are either no non-degree students or no new non-degree students. The following filter is the problem:

           dplyr::filter(.data$ISDEGREECERTSEEKING == 0 & .data$STUDENTLEVEL == "Undergraduate" &
                           (.data$ISFIRSTTIME == 1 | .data$ISTRANSFER == 1)) %>%

If the number of qualifying cases is zero, dplyr::summarize will throw the following error:

nms %in% c("i", "x", "") are not all TRUE

Additional context
Perhaps create a dichotomous variable using the same criterion as the filter and take the sum:

  partD <- df %>%
           dplyr::select(.data$UNITID,
                         .data$ISDEGREECERTSEEKING,
                         .data$STUDENTLEVEL,
                         .data$ISFIRSTTIME,
                         .data$ISTRANSFER) %>%
           dplyr::mutate(FTND = as.integer(.data$ISDEGREECERTSEEKING == 0 & .data$STUDENTLEVEL == "Undergraduate" &
                                            (.data$ISFIRSTTIME == 1 | .data$ISTRANSFER == 1))) %>%
           dplyr::group_by(.data$UNITID) %>%
           dplyr::summarise(COUNT = sum(.data$FTND, na.rm = TRUE)) %>%
           dplyr::ungroup() %>%
           #format for upload
           dplyr::transmute(UNITID = paste0("UNITID=", .data$UNITID),
                            SURVSECT = "SURVSECT=EF1",
                            PART = "PART=D",
                            COUNT = paste0("COUNT=", .data$COUNT)
                           )

add scripts for fall enrollment

  • Write part A
  • Write part B
  • Write part C
  • Write part D
  • Write part E
  • Write part F
  • Write part G
  • Write prep function
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .rda to the package
  • Run devtools::check() and address errors

add scripts for student financial aid

  • Write part A
  • Write part B
  • Write part C
  • Write part D
  • Write part E
  • Write part F
  • Write part G
  • Write prep function
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .dat to the package
  • Run devtools::check() and address errors

add scripts for graduation rates

  • Write part B
  • Write part C
  • Write prep function
  • Write wrapper function
  • Create google sheet for import specs
  • Create RMD vignette that reads in google sheet
  • Create dummy data
  • Test functions using dummy data
  • Test functions using real data
  • Save dummy data as a .rda to the package
  • Save google sheet as .rda to the package
  • Run devtools::check() and address errors
  • Rewrite google sheet and RMD for new df requirements

get pretty files working

rename?
make sure it works with new versions of make_ functions (including produce functions?)
tie in with write_report? don't know if we need that or it can be totally separate

GR doesn't account for students who switch program-types

The code assumes all students start and complete in the same program-type. If a student changes from a 4-yr to a 1-yr program, for example, it does not handle their data correctly.

Students should be reported in the section related to their starting-cohort program-type.
Within the correct section, their 150% completion should be reported for their actual program-type instead of their starting-program-type.

For example:

  1. Of those who start in our 2-4 program, those who finish the 48-hour program within 150% (3 years, in their case) get counted in Completed150 and should show up in column/line 12 of section III. If they change majors and complete one of our bachelor’s degrees instead, they remain in the other degree-seeking cohort, but are reported in column 18 of section III.
  2. Of those who start in the bachelor's degree-seeking cohort, those who finish their bacc degree in six years get reported in line/column 18, section II. Those who change majors to the pre-vet/pre-health programs and complete their 48 hours in 3 years get counted in line/column 12, section II.

This is not the behavior of the current package.

Issue reported by Jason Casey, University of Nebraska - Lincoln

write_report() does not produce BOTH types when format=="BOTH"


Describe the bug
write_report() does not produce BOTH types when format=="BOTH". It only produces the uploadable format.

Expected behavior
It should produce both the Readable and Uploadable files.
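
As an illustration only, one way to make "BOTH" unambiguous is to resolve the format argument into the list of outputs before any writing happens. The helper resolve_formats is hypothetical and is not part of the package.

```r
# Hypothetical helper: map the format argument to the set of outputs to write
resolve_formats <- function(format) {
  switch(tolower(format),
    both       = c("uploadable", "readable"),
    uploadable = "uploadable",
    readable   = "readable",
    stop("format must be 'uploadable', 'readable', or 'both'")
  )
}

resolve_formats("BOTH")
```

The writer could then loop over the returned vector, so "both" takes the same code path as a single format.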

OM needs all possible rows (with 0 fill)

Outcome measures throws a fatal error if cells are empty
Outcome measures needs to have the complete value set in the uploadable file. Missing rows of data will end up as fatal errors in the upload.

Let's eliminate the errors
These should be included and set to 0.

We need to add dummy expand_grids to the OM parts or the award function to capture this situation in the final output file.
Also should revisit the sort at the end
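
A minimal base R sketch of the expand-grid idea, with hypothetical column names and values: build every combination the upload expects, join the observed counts onto it, zero-fill the gaps, and sort at the end.

```r
# Hypothetical observed counts: some cohort/award combinations never occur
observed <- data.frame(
  COHORT = c(1L, 1L, 2L),
  AWARD  = c(1L, 2L, 1L),
  COUNT  = c(10, 4, 7)
)

# Every combination the upload expects (values here are illustrative)
full_grid <- expand.grid(COHORT = 1:2, AWARD = 1:3)

# Left-join the observed counts onto the grid, then zero-fill the gaps
complete <- merge(full_grid, observed, by = c("COHORT", "AWARD"), all.x = TRUE)
complete$COUNT[is.na(complete$COUNT)] <- 0

# Explicit final sort so the row order does not depend on the join
complete <- complete[order(complete$COHORT, complete$AWARD), ]
```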

fix "part = all" write report to start fresh for OM

Describe the bug
When processing OM, there is only a Part B and Part C, so running a produce_report with part = all repeatedly keeps appending instead of starting a file fresh.

To Reproduce
Steps to reproduce the behavior:

  1. Produce the OM report multiple times in a row (with part = "all")
  2. Open the txt file and look at it

Expected behavior
Keep this behavior for reports with part A, but adjust the write script for reports without a part A and/or create a generic dummy part A that is blank to process first for the write script

Additional context
don't know what other reports might have this problem, but we could look at the upload instructions proactively to try to find them
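
One possible approach, sketched under the assumption that the writer receives all parts at once: let position rather than the part letter decide whether to start fresh. The write_parts function and the sample lines are hypothetical.

```r
# Hypothetical writer: the first part supplied starts a fresh file and every
# later part appends, regardless of whether a part A exists
write_parts <- function(parts, path) {
  for (i in seq_along(parts)) {
    write(parts[[i]], file = path, append = i > 1)
  }
  invisible(path)
}

out <- tempfile(fileext = ".txt")
om_parts <- list(B = "PART=B,COUNT=1", C = "PART=C,COUNT=2")
write_parts(om_parts, out)
write_parts(om_parts, out)  # a second run overwrites instead of appending
```

This would also cover any other survey without a part A, without needing a blank dummy part.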

Test current code, branch for IPEDS 2019, then fix completions for 2020

IPEDS 2020 has changes for Completions (no changes for HR -- whew!)

  1. new distance ed questions on the CIP data screen -- not sure if this will affect the upload
  2. Subbac certificates < 1yr are split into two subcategories by duration [new levels: 1a, 1b, 2, 3, 4, 5, 6, 7, 8, 17, 18, 19] -- if the change is 1 > 1a/1b it also means a change in datatype :{

check datatype of ipedsunitid

get_unitid sets it as numeric, but in Completions part D we then have to reset it to character.
Make this consistent throughout the code.
