read_gtfs as implemented right now looks very similar to tidytransit's (with the useful files argument). It can't read files from a URL yet, but that is coming soon.
The behaviour is pretty straightforward. Here is sample code showing how to use it:
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/gtfs_supervia.zip")
gtfs
#> $stops
#> stop_id stop_name stop_desc stop_lat stop_lon zone_id stop_url location_type parent_station
#> 1: 9610 Prefeito Bento Ribeiro -22.86431 -43.36207 78 0
#> 2: 9594 Jardim Guapimirim -22.56264 -42.99505 77 0
#> 3: 9602 Central do Brasil -22.90339 -43.19171 78 0
#> 4: 9595 Surui -22.66299 -43.12897 77 0
#> 5: 9597 Maracanã -22.90934 -43.23327 78 0
#> ---
#> 105: 9598 Alemão -22.85826 -43.27097 79 0
#> 106: 9694 Bonsucesso -22.86771 -43.25518 79 0
#> 107: 9628 Adeus -22.86530 -43.26120 79 0
#> 108: 9627 Baiana -22.85874 -43.26629 79 0
#> 109: 9626 Itararé -22.86156 -43.27205 79 0
#>
#> ....... fairly long print
#>
#> attr(,"class")
#> [1] "gtfs"
#> attr(,"validation_result")
#> file file_spec file_provided_status field field_spec field_provided_status validation_status
#> 1: agency req TRUE agency_id opt TRUE ok
#> 2: agency req TRUE agency_name req TRUE ok
#> 3: agency req TRUE agency_url req TRUE ok
#> 4: agency req TRUE agency_timezone req TRUE ok
#> 5: agency req TRUE agency_lang opt TRUE ok
#> ---
#> 132: attributions opt FALSE is_operator opt FALSE info
#> 133: attributions opt FALSE is_authority opt FALSE info
#> 134: attributions opt FALSE attribution_url opt FALSE info
#> 135: attributions opt FALSE attribution_email opt FALSE info
#> 136: attributions opt FALSE attribution_phone opt FALSE info
#> validation_details
#> 1: <NA>
#> 2: <NA>
#> 3: <NA>
#> 4: <NA>
#> 5: <NA>
#> ---
#> 132: missing_opt_file
#> 133: missing_opt_file
#> 134: missing_opt_file
#> 135: missing_opt_file
#> 136: missing_opt_file
One thing that is bugging me right now is how to deal with parsing failures. tidytransit::read_gtfs appends parsing failures as attributes of the resulting GTFS object using readr::problems, which returns parsing failures as a tibble (at least that's what it is supposed to do, but apparently it just prints the failures). data.table doesn't have any function like that, as far as I could find, so I implemented a "warning catcher" using tryCatch, but this approach is not as flexible as readr's. Right now I look for warnings while reading the files, and if any are found they are returned as a string and no GTFS object is created. Take a look:
# tidytransit approach
gtfs <- tidytransit::read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> # A tibble: 3 x 5
#> row col expected actual file
#> <int> <chr> <chr> <chr> <chr>
#> 1 7 stop_desc delimiter or quote ; "'C:\\Users\\Usuario\\AppData\\Local\\Temp\\RtmpgzFQUs/stops.txt'"
#> 2 7 stop_desc delimiter or quote S "'C:\\Users\\Usuario\\AppData\\Local\\Temp\\RtmpgzFQUs/stops.txt'"
#> 3 7 NA 9 columns 10 columns "'C:\\Users\\Usuario\\AppData\\Local\\Temp\\RtmpgzFQUs/stops.txt'"
#> Warning messages:
#> 1: In parse_gtfs_file(prefix, full_file_path, quiet = quiet) :
#> Parsing failures while reading stops
#> 2: In gtfs_validate(gtfs_obj, quiet = quiet) :
#> Invalid feed. Missing required field(s): trip_id, agency_name, agency_url
# it doesn't actually append problems as attributes, but that's what it is supposed to do according to their code
attributes(gtfs)
#> $names
#> [1] "trips" "fare_attributes" "transfers" "frequencies" "fare_rules" "stop_times"
#> [7] "calendar_dates" "routes" "shapes" "agency" "stops"
#>
#> $class
#> [1] "gtfs"
#>
#> $validation_result
#> # A tibble: 131 x 8
#> file file_spec file_provided_status field field_spec field_provided_stat~ validation_stat~ validation_detai~
#> <chr> <chr> <lgl> <chr> <chr> <lgl> <chr> <chr>
#> 1 trips req TRUE route_id req TRUE ok NA
#> 2 trips req TRUE service_id req TRUE ok NA
#> 3 trips req TRUE trip_id req TRUE ok NA
#> 4 trips req TRUE trip_headsign opt TRUE ok NA
#> 5 trips req TRUE trip_short_name opt FALSE info missing_opt_field
#> 6 trips req TRUE direction_id opt TRUE ok NA
#> 7 trips req TRUE block_id opt TRUE ok NA
#> 8 trips req TRUE shape_id opt TRUE ok NA
#> 9 trips req TRUE wheelchair_access~ opt FALSE info missing_opt_field
#> 10 trips req TRUE bikes_allowed opt FALSE info missing_opt_field
#> # ... with 121 more rows
# the current approach being used in gtfstools
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Warning message:
#> In read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") :
#> Parsing failures while reading stops
gtfs
#> $stops
#> [1] "Stopped early on line 8. Expected 9 fields but found 8."
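For reference, the output above is what you would expect from a pattern roughly like the one below. This is only a minimal sketch, not the actual gtfstools code: read_or_message and toy_read are hypothetical names, and toy_read stands in for a real fread call so the example runs without a broken feed.

```r
# Hypothetical wrapper: returns the warning message instead of the data
read_or_message <- function(expr) {
  tryCatch(
    expr,
    # exiting handler: evaluation of `expr` is abandoned at the first warning,
    # so whatever was being read is lost and only the message survives
    warning = function(w) conditionMessage(w)
  )
}

# Toy reader standing in for a call to data.table::fread on a malformed file
toy_read <- function() {
  warning("Stopped early on line 8. Expected 9 fields but found 8.")
  data.frame(stop_id = c("9610", "9594"))  # never reached
}

stops <- read_or_message(toy_read())
is.character(stops)  # the element holds a string, not a table
```

Because tryCatch unwinds the call stack when the handler fires, only the first warning is seen and the partially read table cannot be recovered.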
The main disadvantage of this approach is that I have to figure out all the possible warning messages that may come from fread in order to parse them correctly and give the user a meaningful message. Do you know of any data.table equivalent of readr::problems?
Returning the GTFS object along with the warning messages (maybe using tidytransit's intended approach of appending them as attributes) is doable, but I'm not sure I like it. What do you think?
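To make the "return the object plus the warnings" option concrete, here is a minimal sketch using base R's withCallingHandlers, which records warnings without aborting the read. Everything named here is an assumption for illustration: collect_warnings and toy_read are hypothetical helpers, "parsing_warnings" is a made-up attribute name, and toy_read stands in for a real fread call.

```r
# Hypothetical helper (not part of gtfstools or data.table): evaluates `expr`,
# records every warning message, and still returns the value
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # resume evaluation instead of unwinding
    }
  )
  list(value = value, warnings = msgs)
}

# Toy reader standing in for a call to data.table::fread on a malformed file
toy_read <- function() {
  warning("Stopped early on line 8. Expected 9 fields but found 8.")
  data.frame(stop_id = c("9610", "9594"))
}

res <- collect_warnings(toy_read())

# Attach the raw messages to the object that was read
stops <- res$value
attr(stops, "parsing_warnings") <- res$warnings
attr(stops, "parsing_warnings")
```

With this pattern the messages are stored verbatim, so there is no need to anticipate every possible fread warning up front. The trade-off is that the user gets back an object that may be incomplete, so the function should probably still emit a warning of its own.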