ipeagit / gtfstools

General Transit Feed Specification (GTFS) Editing and Analysing Tools

Home Page: https://ipeagit.github.io/gtfstools/

License: Other

R 98.93% C++ 1.07%

r gtfs public-transport publictransport

gtfstools's Introduction

gtfstools

gtfstools offers a set of convenient tools for editing and analysing transit feeds in GTFS format. Feeds are read as a list of data.tables, allowing for easy and fast data manipulation. Many of this package’s features are based on functions from other packages, especially {tidytransit} and {gtfs2gps}.

Installation

Stable version:

install.packages("gtfstools")

Development version:

# either
install.packages("gtfstools", repos = "https://dhersz.r-universe.dev")

# or
# install.packages("remotes")
remotes::install_github("ipeaGIT/gtfstools")

This package requires a working installation of {sf}. Please check the {sf} installation instructions for more information on how to install it.

Usage

Please read the gtfstools vignettes for more on how to use the package:

Basic usage: reading, analysing, manipulating and writing feeds. Run vignette("gtfstools") or check it on the website (Introduction to gtfstools).
Filtering GTFS feeds. Run vignette("filtering", package = "gtfstools") or check it on the website (Filtering GTFS feeds).
Validating GTFS feeds. Run vignette("validating", package = "gtfstools") or check it on the website (Validating GTFS feeds).

Related packages

Acknowledgement

gtfstools is developed by a team at the Institute for Applied Economic Research (Ipea), Brazil.

gtfstools's People

Contributors

Stargazers

Watchers

Forkers

polettif rafapereirabr pedro-andrade-inpe joaobazzo mpadge sayrabemanan leonleprau joaocarabetta

gtfstools's Issues

Create filter by day of the week

Here we have a filter_week_days() function in the gtfs2gps package we could use as base.

Issue getting GTFS data into R to filter

Hi there,

So this is the first time I've used R to deal with GTFS data, and I've run into an issue pretty early on.
After figuring out all the necessary packages and tools, I'm trying to read my GTFS zip file as per the 'GTFS filter instructions'.

So I try to set up the path and then get R to read the data

data_path <- system.file("C:/Users/leond/OneDrive/Documents/itm_north_east_gtfs.zip", package = "gtfstools")
gtfs <- read_gtfs(data_path)

to which I'm faced with the error

Error 'path' must have '.zip' extension.

When I google this error I get no clear answers, from what I can understand anyway.

I've made sure that my GTFS file is in zip form (which is how it comes on download), and then I copied the location of the file and wrote out the command as it says to in the instructions for 'GTFS Filter'.

Any help with this would be appreciated. Thanks so much.

All the best,
Leon
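
A likely explanation, sketched here under the assumption that the feed is an ordinary file on disk and not part of the package: system.file() only resolves paths inside an installed package, so any other path comes back as an empty string, and an empty string has no '.zip' extension. The file location below is hypothetical.

```r
# system.file() searches inside an installed package's directory. A path that
# is not part of the package resolves to an empty string.
not_found <- system.file("C:/Users/someone/Documents/feed.zip", package = "stats")
print(not_found)

# An empty string has no ".zip" extension, which would explain the error.
print(tools::file_ext(not_found))

# Hypothetical feed location: pass the path to the file directly instead.
data_path <- file.path(tempdir(), "itm_north_east_gtfs.zip")
print(tools::file_ext(data_path))

# Once the file really exists at data_path:
# gtfs <- gtfstools::read_gtfs(data_path)
```

system.file() remains the right tool for the sample feeds shipped with the package, e.g. system.file("extdata/ber_gtfs.zip", package = "gtfstools").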

add `encoding` argument to `read_gtfs()`

This will require changes upstream in {gtfsio}.

Filtering fares by zone id

Currently the fare tables are filtered in two ways (please correct me if I am wrong!):

The fare_attributes table may have an agency_id column. All rows with non-relevant agencies can be removed, and the remaining fare_ids can be used to filter the fare_rules table.
The fare_rules table may have a route_id column. All rows with non-relevant routes can be removed, and the remaining fare_ids can be used to filter the fare_attributes table.

However, fare rules can also be based on fare zones rather than on routes. This happens quite often, especially in cities. In that case they have a zone index stored in the origin_id, destination_id or contains_id columns. These zone indices refer to values in the zone_id column in the stops table. Hence, when stops are removed (e.g. by filtering trips) some zones might not be present anymore, and the fare_rules table can be filtered to only contain relevant zones. The remaining fare_ids can then be used to filter the fare_attributes table.

See https://developers.google.com/transit/gtfs/reference#fare_rulestxt for reference

Again, please correct me if I made wrong assumptions!
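
To make the proposal concrete, here is a minimal base R sketch of the zone-based filter. The toy tables are invented for illustration, and the rule "keep a fare rule only if every zone it mentions still exists" is one possible reading of the idea, not the package's behaviour.

```r
# Toy tables (invented for illustration); empty strings mean "no zone given".
stops <- data.frame(
  stop_id = c("s1", "s2"),
  zone_id = c("z1", "z2")
)
fare_rules <- data.frame(
  fare_id = c("f1", "f2", "f3"),
  origin_id = c("z1", "z3", ""),
  destination_id = c("z2", "z1", ""),
  contains_id = c("", "", "z2")
)
fare_attributes <- data.frame(
  fare_id = c("f1", "f2", "f3"),
  price = c(2.5, 3.0, 4.0)
)

# Zones still referenced by the stops that survived earlier filtering.
remaining_zones <- unique(stops$zone_id)

# A zone field is acceptable if it is unset or points to a remaining zone.
zone_ok <- function(zone) is.na(zone) | zone == "" | zone %in% remaining_zones

keep <- zone_ok(fare_rules$origin_id) &
  zone_ok(fare_rules$destination_id) &
  zone_ok(fare_rules$contains_id)

# Drop rules that mention a vanished zone, then drop fares without any rule.
fare_rules <- fare_rules[keep, ]
fare_attributes <- fare_attributes[fare_attributes$fare_id %in% fare_rules$fare_id, ]
```

Here the rule for f2 is dropped because its origin zone z3 no longer exists in stops.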

Roadmap to v1.0.0

I finally started playing with this package again. Here is a checklist of what I consider important prior to submitting v1.0.0 to CRAN.

It'll be important to write a vignette about the filtering functions as well, but that could be left out to v0.2.1, perhaps.

I might add some checks to this issue as I see fit.

New function `remove_unused_ids()`

Lots of unused ids in the sample feeds.

New function to set route frequency

It would be great if we had a set_route_frequency() function. I just wanted to flag this for now, but I can propose a function to do this later.

Release gtfstools v1.0.0

Prepare for release:

Submit to CRAN:

Bump major version
Commit changes "v1.0.0 release"
devtools::submit_cran()
Approve email

Wait for CRAN...

build/find a simple gtfs with data in all tables to use in filters' tests

Create filter by `service_id`

New function get_route_frequency()

I suggested in #5 that we create a function that allows users to change the frequency of a GTFS feed. Before we do that, I thought we could start with something easier. So I've now created the first draft of a get_route_frequency() function. I've placed this function in a new branch named frequencies.

Hex sticker

Revisiting the hex stickers today, I have two suggestions for gtfstools:

Create internal function to raise warnings if one of the specified ids are missing from the gtfs

Many of the functions use the same pattern: they check whether the specified ids are in the required tables, and if they're not they raise a warning (and sometimes an error if none of the ids are present) to communicate it to the user. This pattern can probably be converted into an internal function, which will greatly improve code readability.
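
A rough sketch of what such a helper could look like. The function name, its arguments and the message wording are all assumptions made for illustration; none of this is existing package code.

```r
# Hypothetical internal helper (name and arguments are assumptions).
# Warns about ids missing from `available`; errors if none are present.
warn_missing_ids <- function(ids, available, id_name, table_name) {
  missing_ids <- setdiff(ids, available)

  if (length(ids) > 0 && length(missing_ids) == length(unique(ids))) {
    stop(
      "'", table_name, "' doesn't contain any of the specified ",
      id_name, "(s)",
      call. = FALSE
    )
  }

  if (length(missing_ids) > 0) {
    warning(
      "'", table_name, "' doesn't contain the following ", id_name, "(s): ",
      paste0("'", missing_ids, "'", collapse = ", "),
      call. = FALSE
    )
  }

  invisible(missing_ids)
}

trips <- data.frame(trip_id = c("t1", "t2"))

# One id exists, one does not: a warning is raised and "x" is returned.
missing_ids <- suppressWarnings(
  warn_missing_ids(c("t1", "x"), trips$trip_id, "trip_id", "trips")
)
```

Each exported function would then call the helper once per id argument instead of repeating the check inline.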

Converting bbox to polygon

Sorry for the issue overload. I just saw the internal function polygon_to_bbox. Just wanted to mention that you can directly convert a bbox object to an sfc object (with POLYGON geometry) using sf::st_as_sfc(bbox). If the bbox has a CRS attached this will be copied to the sfc object.

Filtering the translations table

Probably very low priority, since I rarely see GTFS feeds with this table present, but just for the sake of completeness: as far as I could find, the translations table is not filtered in any of the filter functions. This table contains translations of certain field values into other languages. If these field values are not present anymore after a filter operation (e.g. after filtering trips or routes), they are not needed anymore in the translations table.

write_gtfs is taking a bit too long

write_gtfs() right now takes a bit too long to write.

This is because converting hms objects to character takes long.

For reference, saving the sample PoA GTFS currently takes around 7 seconds. Almost all of this time is spent converting the times to strings. I tried writing a custom function for this instead of using as.character(). The time goes down to around 5 seconds, but that's still long.

Another reference: saving PoA's GTFS as read with tidytransit::read_gtfs after using tidytransit::set_hms_times takes around 4 seconds.

So gtfstools' write_gtfs() is not that much slower, but I think it can really improve. I'll try a similar approach to tidytransit: read times as strings (instead of hms as I'm doing right now) and create a set_hms_times to create hms columns.
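
For illustration, this is one way the conversion could be written if times were kept as plain seconds instead of hms objects. It is a sketch only: the function name is made up, and whether it is actually faster would need to be benchmarked on a real feed.

```r
# Convert seconds since midnight to "HH:MM:SS" in one vectorised call.
# GTFS allows times past 24:00:00, so hours are not wrapped.
seconds_to_string <- function(secs) {
  secs <- as.integer(secs)
  out <- sprintf(
    "%02d:%02d:%02d",
    secs %/% 3600L,
    (secs %% 3600L) %/% 60L,
    secs %% 60L
  )
  # Missing times are written as empty strings.
  out[is.na(secs)] <- ""
  out
}

times <- seconds_to_string(c(0, 3661, 90000, NA))
print(times)
```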

Spatial filter by stop geometries rather than by trip geometries

Relates to #43. Spatial filters with filter_by_sf() now utilize the geometry of trips, if I understand correctly. As is clearly shown in the examples, this often results in a GTFS that has stops (far) outside the area which was used as a filter. When doing local analysis (with possible origins and destinations only within that area) you don't care about these stops outside the area of interest, even if they are part of trips that intersect the area of interest. For example, when filtering the Dutch GTFS file for local analysis in Amsterdam, I do want the international train to Paris to be in there for travelling between Amsterdam central station and Amsterdam airport, but am not interested in keeping the full trip all the way to Paris. Having all these other stops still included makes the GTFS file unnecessarily large and also "harms" clear visualizations of the transport network inside the area of the interest.

Therefore I think it is useful to also allow spatial filters using the stop geometries. This would only keep the stops that are inside the area of interest. All the trips using these stops will still be included, but only for the part inside the area of interest. This is different than using a "within" predicate in the current implementation, since that will remove all trips that do intersect the area of interest, but are not fully contained in it.

There may be situations in which a trip "exits" the area of interest, passes some other stops, and then "enters" the area of interest again, but as mentioned in #43 this is not a problem since stop sequence values in the stop_times table don't have to be consecutive. The only issue to overcome would be how to handle the trip shapes if a shapes table is present.
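
A small base R sketch of the stop_times part of this idea, using invented data. The vector of stops inside the area is given directly here; in practice it would come from a spatial predicate on the stop geometries.

```r
# Toy stop_times table (invented for illustration).
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t1", "t1", "t2", "t2"),
  stop_id = c("a", "b", "c", "d", "c", "e"),
  stop_sequence = c(1L, 2L, 3L, 4L, 1L, 2L)
)

# Stops found inside the area of interest (assumed to be known already).
stops_in_area <- c("a", "c", "d")

# Keep only the visits to stops inside the area.
kept <- stop_times[stop_times$stop_id %in% stops_in_area, ]

# stop_sequence no longer has to be consecutive, only increasing per trip.
still_increasing <- tapply(
  kept$stop_sequence,
  kept$trip_id,
  function(s) all(diff(s) > 0)
)
print(still_increasing)
```

Trip t1 leaves the area at stop b and re-enters at stop c, yet its remaining sequence 1, 3, 4 is still increasing.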

Happy to contribute if needed and when you think the idea makes sense!

merge filter_route_type() from gtfs2gps

merge filter_agency_id() from gtfs2gps

new function `crop_gtfs()`

add new function crop_by_sf()

Create a function to convert frequencies to stop_times

Self-explanatory title.

By the way, can anyone think of a better name than convert_frequencies_to_stop_times()? I like the verb_object() structure, but it's obviously too verbose in this case.

Relationship between gtfs2gps and gtfstools

Hi all @dhersz @pedro-andrade-inpe @Joaobazzo , I think we can start having a conversation about integrating the gtfs2gps and gtfstools packages. Here is a quick summary of my take on this.

The gtfs2gps package has a very focused purpose, which is to convert GTFS to a GPS-like data format. However, we have already developed many functions that help the user conveniently and quickly edit a GTFS feed. Meanwhile, the core aim of the gtfstools package is to provide tools for the manipulation of GTFS data.

I think it would be good if we could migrate some of these functions from gtfs2gps to gtfstools. See below the functions that I initially thought could be migrated. The details of this migration would have to be discussed case by case. Naturally, this would imply making the authors of gtfs2gps co-authors of gtfstools.

Filter

filter_day_period()
filter_by_day()
filter_week_days()
filter_by_agency_id()
filter_single_trip()
filter_by_route_id()
filter_by_route_type()
filter_by_shape_id()
filter_valid_stop_times()

Spatial

gtfs_shapes_as_sf()
gtfs_stops_as_sf()

Others

merge_gtfs_feeds()
remove_invalid(), to remove invalid objects from GTFS data

merge filter_day_period() from gtfs2gps

alternatives:

Keep any trip that crosses a given day period
keep only the trip segments within a given day period
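
The difference between the two alternatives can be sketched in base R as follows. The table, the seconds-based time column and the 07:00 to 09:00 period are invented for illustration.

```r
# Toy stop_times with departure times in seconds since midnight.
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t1", "t2", "t2"),
  departure_secs = c(6.5, 7.25, 8, 9.5, 10) * 3600
)

# Day period of interest: 07:00 to 09:00.
period_start <- 7 * 3600
period_end <- 9 * 3600

in_period <- stop_times$departure_secs >= period_start &
  stop_times$departure_secs <= period_end

# Alternative 1: keep every stop time of any trip that crosses the period.
crossing_trips <- unique(stop_times$trip_id[in_period])
whole_trips <- stop_times[stop_times$trip_id %in% crossing_trips, ]

# Alternative 2: keep only the stop times that fall within the period.
segments_only <- stop_times[in_period, ]
```

With this data, alternative 1 keeps all three stop times of t1, while alternative 2 drops its 06:30 departure.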

Improve consistency with gtfsio

@dhersz I find the following behaviour somewhat confusing:

f <- "/<a>/<gtfs>/<feed>.zip"
gtfs0 <- gtfsio::import_gtfs (f)
f <- gtfstools::frequencies_to_stop_times (gtfs0)
#> Error in gtfstools::frequencies_to_stop_times(gtfs0): Assertion on 'gtfs' failed: Must inherit from class 'dt_gtfs', but has classes 'gtfs','list'.

gtfs1 <- gtfstools::read_gtfs (f)
f <- gtfstools::frequencies_to_stop_times (gtfs1)
#> Error: The following columns in the GTFS object 'frequencies' element do not inherit from the required classes:
#>   - 'headway_secs': requires integer, but inherits from character

Created on 2022-01-31 by the reprex package (v2.0.1.9000)

I would expect the first to be handled internally, and not to error like that. I suspect the best - and definitely least confusing - way to achieve that would be to entirely ditch the gtfstools::read_gtfs function, so all packages can always rely on gtfsio::import_gtfs(). You could then just have one additional wrapper function at the start of all gtfstools functions which calls the additional new_gtfs and convert_from_standard functions. (If that slowed things down, you could easily memoise the calls so conversion would be instantaneous.) Having read_gtfs here is really quite confusing for me, and I consider myself "internal" to this whole ecosystem - that must mean we should presume it is even more confusing for any general users.

The second example appears to be a small 🐛 here. Sorry for not submitting exactly reproducible code, but it's hard these days with transit.feeds in private s3 buckets. That's the feed referred to in r-transit/gtfsio#25 - an older version of the Madrid metro system.

Retaining only specified stops.

Hi there,

So I just wanted to say thanks for the last advice, I was able to get all my specified stops into a vector.

I ran the filter and, as someone has mentioned before, the tool keeps all my specified stops plus the stops associated with routes that pass through my specified stops. This means my resulting GTFS data has about 8000 stops instead of my original 4000.

I was wondering if there is a way to just retain the specified stop id's and their associated data?

Thanks so much for your help, this tool is really useful.

All the best,
Leon

add `prefix` argument to `merge_gtfs()`

merge gtfs_shapes_as_sf() from gtfs2gps

rename to convert_shapes_to_sf()

`{bit64}` required but not installed in recent builds

merge filter_route_id() from gtfs2gps

`get_trip_speed()` returns `NA` when the specified trip_id is not listed in stop_times

That's because get_trip_duration(), which is used inside get_trip_speed(), doesn't return anything for such trip. When the duration dataset is joined to the speed dataset, this entry results in an NA. Also, get_trip_duration() raises a warning saying that this trip is not listed in stop_times.

IMO the best to do here is not to send trips that are not listed in stop_times to get_trip_duration in first place. The warning can be copied to still inform about what is going on to the user (i.e. could not calculate speed because could not calculate duration, since it's not listed in stop_times).

Optimizing `get_trip_speed()`

get_trip_speed() can take some time if we're dealing with a big gtfs. The majority of the time is spent calculating the length of trips. An obvious optimization is to calculate the length of the shapes/patterns of stop times before joining them to trips. Otherwise we calculate the length of the same geometry multiple times, which is exactly what we're doing right now.

To solve this for the shapes case is pretty simple, we just need to use convert_shapes_to_sf(), calculate the length and then join to trips, instead of using get_trip_geometry(file = "shapes") and then calculating the length.

For the stop times case it's a bit less simple, because we have to identify identical patterns of stop times, and then generate only one geometry per pattern, to later join to trips. This optimization could also be implemented to get_trip_geometry() in the case of stop times.
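
The stop times case could be approached roughly as below. This is a base R sketch with invented data; expensive_length() is a placeholder for the real geometry construction and length calculation, and the call counter only exists to show how often it runs.

```r
# Toy stop_times: t1 and t2 share a stop pattern, t3 has its own.
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t1", "t2", "t2", "t2", "t3", "t3"),
  stop_id = c("a", "b", "c", "a", "b", "c", "a", "c"),
  stop_sequence = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L)
)
stop_times <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]

# Describe each trip by its ordered sequence of stops.
pattern_by_trip <- tapply(
  stop_times$stop_id,
  stop_times$trip_id,
  paste,
  collapse = ">"
)
unique_patterns <- unique(as.vector(pattern_by_trip))

# Placeholder for the costly step; it just counts segments in the pattern.
calls <- 0
expensive_length <- function(pattern) {
  calls <<- calls + 1
  length(strsplit(pattern, ">", fixed = TRUE)[[1]]) - 1
}

# Run the costly step once per pattern, then join the result back to trips.
pattern_length <- vapply(unique_patterns, expensive_length, numeric(1))
trip_length <- data.frame(
  trip_id = names(pattern_by_trip),
  length = unname(pattern_length[as.vector(pattern_by_trip)])
)
```

Three trips are covered with only two calls to the costly function.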

Keep parent_stop when in filter functions

help with Stop_id Filter

Hi again,

So I've used excel to put my stop_id data into the correct format e.g. ("18848", "940004157") and then set up the command

stop_ids <- C("x", "x" and so on) to continue with the filter process.

When I try to run it, R presents me with a (+), implying that more code needs to be written. I'm sure that it's all in the same format as the example. I have 4031 individual stop_ids, so I've just attached screenshots of the start and the end.

Thanks so much for your help with this, it's really appreciated.
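
Two things may be worth checking, sketched below without having seen the screenshots. A (+) prompt usually means R is still waiting for a closing quote or parenthesis, and R is case-sensitive, so the vector constructor is c(), not C(). With thousands of ids it may be easier to read them from a plain-text file with one id per line; the file name used here is hypothetical.

```r
# The vector constructor is lower-case c(); every quote and the final
# parenthesis must be closed, otherwise R shows a `+` continuation prompt.
stop_ids <- c("18848", "940004157")

# Alternative for long lists: one id per line in a text file.
ids_file <- file.path(tempdir(), "stop_ids.txt")
writeLines(c("18848", "940004157"), ids_file)  # stands in for the exported file

# readLines() returns a character vector, which is what stop ids should be.
stop_ids_from_file <- readLines(ids_file)
print(stop_ids_from_file)
```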

merge filter_weekday() from gtfs2gps

Two original functions from gtfs2gps

filter_by_day()
filter_week_days()

New `disambiguate_shapes()` and `trim_shapes()` functions

Oftentimes the same shape_id will be assigned to trips that follow the same "general" shape, but that are actually different.

Let's say for example that shape_id 1 goes from A, to B, to C and to D.

trip_id 1 describes a trip that goes from A to C, and trip_id 2 describes a trip that goes from B to D. In my opinion, these trips should be assigned to different shapes, even though they share the same "general" path.

In practice, GTFS from many different agencies will assign the same shape (shape_id 1) to the two different trips.

This issue can be fixed in two steps:

First identifying the shape_ids that are shared inadequately (in this case, let's say it would rename it to 1_1 and 1_2). disambiguate_shapes()
Then for each distinct shape, trim it to the first and the last stops associated to the trip it describes. trim_shapes()

trim_shapes() will behave weirdly if the shapes are not disambiguated first. In the aforementioned example, should it trim to A or B? So trim_shapes() MUST call disambiguate_shapes() first.
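
The first step could look something like this in base R. The trips table with precomputed first and last stops is invented for illustration, and the rule "same shape_id plus same first and last stop means same shape" is a simplifying assumption.

```r
# Toy trips table: three trips share shape_id "1" but not the same extent.
trips <- data.frame(
  trip_id = c("1", "2", "3"),
  shape_id = c("1", "1", "1"),
  first_stop = c("A", "B", "A"),
  last_stop = c("C", "D", "C")
)

# Trips with the same first and last stop are assumed to use the same shape.
extent <- paste(trips$first_stop, trips$last_stop)

# Number the distinct extents within each original shape_id.
suffix <- ave(
  extent,
  trips$shape_id,
  FUN = function(k) match(k, unique(k))
)

trips$new_shape_id <- paste(trips$shape_id, suffix, sep = "_")
print(trips[, c("trip_id", "shape_id", "new_shape_id")])
```

Trips 1 and 3 end up sharing shape 1_1, while trip 2 gets 1_2, which is the renaming described above. The shapes table would need to be duplicated accordingly before trimming.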

read_gtfs parsing failures detection

read_gtfs as implemented right now looks very similar to tidytransit's (with the useful files argument). It still doesn't read files from a URL, but that will be coming soon.

The behaviour is pretty straightforward. Here is sample code on how to use it:

gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/gtfs_supervia.zip")
gtfs
#> $stops
#>      stop_id              stop_name stop_desc  stop_lat  stop_lon zone_id stop_url location_type parent_station
#>   1:    9610 Prefeito Bento Ribeiro           -22.86431 -43.36207      78                      0               
#>   2:    9594      Jardim Guapimirim           -22.56264 -42.99505      77                      0               
#>   3:    9602      Central do Brasil           -22.90339 -43.19171      78                      0               
#>   4:    9595                  Surui           -22.66299 -43.12897      77                      0               
#>   5:    9597               Maracanã           -22.90934 -43.23327      78                      0               
#>  ---                                                                                                           
#> 105:    9598                 Alemão           -22.85826 -43.27097      79                      0               
#> 106:    9694             Bonsucesso           -22.86771 -43.25518      79                      0               
#> 107:    9628                  Adeus           -22.86530 -43.26120      79                      0               
#> 108:    9627                 Baiana           -22.85874 -43.26629      79                      0               
#> 109:    9626                Itararé           -22.86156 -43.27205      79                      0               
#> 
#> ....... fairly long print
#> 
#> attr(,"class")
#> [1] "gtfs"
#> attr(,"validation_result")
#>              file file_spec file_provided_status             field field_spec field_provided_status validation_status
#>   1:       agency       req                 TRUE         agency_id        opt                  TRUE                ok
#>   2:       agency       req                 TRUE       agency_name        req                  TRUE                ok
#>   3:       agency       req                 TRUE        agency_url        req                  TRUE                ok
#>   4:       agency       req                 TRUE   agency_timezone        req                  TRUE                ok
#>   5:       agency       req                 TRUE       agency_lang        opt                  TRUE                ok
#>  ---                                                                                                                 
#> 132: attributions       opt                FALSE       is_operator        opt                 FALSE              info
#> 133: attributions       opt                FALSE      is_authority        opt                 FALSE              info
#> 134: attributions       opt                FALSE   attribution_url        opt                 FALSE              info
#> 135: attributions       opt                FALSE attribution_email        opt                 FALSE              info
#> 136: attributions       opt                FALSE attribution_phone        opt                 FALSE              info
#>      validation_details
#>   1:               <NA>
#>   2:               <NA>
#>   3:               <NA>
#>   4:               <NA>
#>   5:               <NA>
#>  ---                   
#> 132:   missing_opt_file
#> 133:   missing_opt_file
#> 134:   missing_opt_file
#> 135:   missing_opt_file
#> 136:   missing_opt_file

One thing that is bugging me right now is on how to deal with parsing failures.

tidytransit::read_gtfs appends parsing failures as attributes of the resultant GTFS object using readr::problems, which returns parsing failures as a tibble (at least that's what it's supposed to do, but apparently it just prints the failures).

data.table doesn't have any function like that, as far as I could find, so I implemented a "warning catcher" using tryCatch, but this approach is not as flexible as readr's. Right now I look for warnings while reading the files and if those are found they are returned as string and no GTFS object is created. Take a look:

# tidytransit approach

gtfs <- tidytransit::read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> # A tibble: 3 x 5
#>     row col       expected           actual     file                                                              
#>   <int> <chr>     <chr>              <chr>      <chr>                                                             
#> 1     7 stop_desc delimiter or quote ;          "'C:\\Users\\Usuario\\AppData\\Local\\Temp\\RtmpgzFQUs/stops.txt'"
#> 2     7 stop_desc delimiter or quote S          "'C:\\Users\\Usuario\\AppData\\Local\\Temp\\RtmpgzFQUs/stops.txt'"
#> 3     7 NA        9 columns          10 columns "'C:\\Users\\Usuario\\AppData\\Local\\Temp\\RtmpgzFQUs/stops.txt'"
#> Warning messages:
#> 1: In parse_gtfs_file(prefix, full_file_path, quiet = quiet) :
#>   Parsing failures while reading stops
#> 2: In gtfs_validate(gtfs_obj, quiet = quiet) :
#>   Invalid feed. Missing required field(s): trip_id, agency_name, agency_url
 
# it doesn't actually append problems as attributes, but that's what it's supposed to do according to their code

attributes(gtfs)
#> $names
#>  [1] "trips"           "fare_attributes" "transfers"       "frequencies"     "fare_rules"      "stop_times"     
#>  [7] "calendar_dates"  "routes"          "shapes"          "agency"          "stops"          
#> 
#> $class
#> [1] "gtfs"
#> 
#> $validation_result
#> # A tibble: 131 x 8
#>    file  file_spec file_provided_status field              field_spec field_provided_stat~ validation_stat~ validation_detai~
#>    <chr> <chr>     <lgl>                <chr>              <chr>      <lgl>                <chr>            <chr>            
#>  1 trips req       TRUE                 route_id           req        TRUE                 ok               NA               
#>  2 trips req       TRUE                 service_id         req        TRUE                 ok               NA               
#>  3 trips req       TRUE                 trip_id            req        TRUE                 ok               NA               
#>  4 trips req       TRUE                 trip_headsign      opt        TRUE                 ok               NA               
#>  5 trips req       TRUE                 trip_short_name    opt        FALSE                info             missing_opt_field
#>  6 trips req       TRUE                 direction_id       opt        TRUE                 ok               NA               
#>  7 trips req       TRUE                 block_id           opt        TRUE                 ok               NA               
#>  8 trips req       TRUE                 shape_id           opt        TRUE                 ok               NA               
#>  9 trips req       TRUE                 wheelchair_access~ opt        FALSE                info             missing_opt_field
#> 10 trips req       TRUE                 bikes_allowed      opt        FALSE                info             missing_opt_field
#> # ... with 121 more rows

# the current approach being used in gtfstools

gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Warning message:
#> In read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") :
#>   Parsing failures while reading stops
gtfs
#> $stops
#> [1] "Stopped early on line 8. Expected 9 fields but found 8."

The main disadvantage of this approach is that I have to figure out all possible warning messages that may come from fread in order to parse them correctly and give the user a meaningful message. Do you know of any data.table equivalent of readr::problems?

Returning the GTFS along the warning messages (maybe using tidytransit's supposed approach of appending them as attributes) is doable, but I'm not sure if I like it. What do you think?
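
For the option of returning the result together with the warnings, one base R approach is withCallingHandlers(), which records each warning and lets evaluation continue (tryCatch() would abort at the first one). The sketch below is generic: collect_warnings() is a made-up name, and the expression passed to it only simulates a reader that warns and still returns something.

```r
# Evaluate `expr`, returning its value along with any warning messages.
collect_warnings <- function(expr) {
  messages <- character(0)

  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message, then resume evaluation without printing it.
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )

  list(value = value, warnings = messages)
}

# Simulated reader: it warns but still returns a (partial) result.
result <- collect_warnings({
  warning("Stopped early on line 8. Expected 9 fields but found 8.")
  "partially read table"
})
print(result$warnings)
```

The collected messages could then be attached to the GTFS object as an attribute, or inspected before deciding whether to return the object at all.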

`get_parent_station()` creating `<NA>` stop_ids

library(gtfstools)

ber_path <- system.file("extdata/ber_gtfs.zip", package = "gtfstools")
ber_gtfs <- read_gtfs(ber_path)
ber_shapes <- c("14", "2")

smaller_ber <- filter_shape_id(ber_gtfs, ber_shapes)

get_parent_station(ber_gtfs, smaller_ber$stop_times$stop_id)
#>           stop_id parent_station
#>   1: 100000471802   900000200000
#>   2: 100000471901   900000203376
#>   3: 100000471002   900000203365
#>   4: 100000471602   900000200102
#>   5: 100000471201   900000203369
#>  ---                            
#> 143: 900000210216           <NA>
#> 144: 900000210217           <NA>
#> 145: 900000210325           <NA>
#> 146: 900000210005           <NA>
#> 147:         <NA>           <NA>

Created on 2021-10-28 by the reprex package (v2.0.1.9000)

The NAs in the parent_station column are expected, the one in the stop_id is not.

`get_trip_geometry()` raises an error with default `file` argument if `shapes` or `stop_times` doesn't exist

Use NULL instead of c("shapes", "stop_times") as default, and if NULL do not raise an error if one of the files doesn't exist (but raise one if none of them exist).
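
A sketch of how the default could be resolved. The helper name is hypothetical, and the feed is represented as a plain named list for illustration.

```r
# Decide which tables to build geometries from (hypothetical helper).
resolve_geometry_files <- function(gtfs, file = NULL) {
  candidates <- c("shapes", "stop_times")

  if (is.null(file)) {
    # Default: use whichever candidate tables exist, error only if none do.
    file <- candidates[candidates %in% names(gtfs)]
    if (length(file) == 0) {
      stop("The GTFS object must contain 'shapes' or 'stop_times'.")
    }
    return(file)
  }

  # Explicit request: every requested table must exist.
  missing_files <- setdiff(file, names(gtfs))
  if (length(missing_files) > 0) {
    stop("Missing table(s): ", paste(missing_files, collapse = ", "))
  }
  file
}

feed_without_shapes <- list(trips = data.frame(), stop_times = data.frame())
resolved <- resolve_geometry_files(feed_without_shapes)
print(resolved)
```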

Filtering by stop_id does not really filter by stop_id

I know it is very clearly documented that when filtering by stop_id you are actually filtering trips that contain those stops 😉 I am just wondering why. This seems counter-intuitive to me, that you filter by stop_id but end up with many more stops in your filtered GTFS. Personally, when I filter stops I do that because I am really only interested in those stops. For the stop_times table it does not matter, since trips are still valid trips when the values in the stop_sequence column are not consecutive. Also, the first stop in a trip does not have to have a stop_sequence value of 1. The only requirement is that stop_sequence values increase along a trip, which will still be the case when stops are removed (see https://developers.google.com/transit/gtfs/reference#stop_timestxt). So I would argue the "integrity of trips as described in the stop_times table" is preserved also after removing stops from a trip.

Of course I can also think of use-cases where you actually want to filter trips through stop ids, but maybe this can instead be addressed through a boolean parameter like preserve_trips (which can then default to TRUE if it is assumed this is the most common use-case and to not make backward incompatible changes).

When really filtering by stop_id (i.e. not filtering trips) it is useful (I think) to have a boolean include_child_stops parameter which, when set to TRUE, makes sure that child stops are included if a given stop_id refers to a parent stop (as is already happening in the current implementation). Probably TRUE is also the best default value for this one. But the possibility is then there to set it to FALSE and really keep only the stop_ids that you provide (which is in the end what you'd expect a filter function to do).
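
To illustrate the include_child_stops idea, here is a base R sketch with an invented stops table. The helper name is made up, and only one level of the parent/child hierarchy is followed.

```r
# Toy stops table: c1 and c2 are children of station P1.
stops <- data.frame(
  stop_id = c("P1", "c1", "c2", "s3"),
  parent_station = c("", "P1", "P1", "")
)

# Expand the requested ids with their direct children, if asked to.
expand_stop_ids <- function(stops, stop_ids, include_child_stops = TRUE) {
  if (!include_child_stops) {
    return(stop_ids)
  }
  children <- stops$stop_id[stops$parent_station %in% stop_ids]
  unique(c(stop_ids, children))
}

with_children <- expand_stop_ids(stops, "P1")
without_children <- expand_stop_ids(stops, "P1", include_child_stops = FALSE)
print(with_children)
```

The expanded vector would then be used to subset both stops and stop_times, leaving all other stops out.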

The only issue to overcome would be how to update the shapes table if present. Not sure about that yet.

Happy to contribute if needed and when you think the idea makes sense!

Relates to #44.

merge gtfs_stops_as_sf() from gtfs2gps

rename to convert_stops_to_sf()

merge filter_shape_id() from gtfs2gps

Check whether `agency_id` field exists when filtering feeds

agency_id is conditionally required in agency, routes and fare_attributes, so we may have some problems when filtering feeds that don't list them.

It may also be a problem when it's listed in agency but not in e.g. routes, because when doing filter_by_route_id(gtfs, route_id = character(0)) I think the whole feed will be empty but the agency table won't.

A solution to the second problem would be to "clear" agency if all other tables are empty. But to fix that then we'd need to first fix #46.

`remove_duplicates()`

Many GTFS files have duplicated entries on them (for example, spo_gtfs has an agency.txt with 2 identical rows and many other duplications spread throughout the tables).

prune_gtfs() will basically remove duplicate entries from the file. It can be used in conjunction with the filter_*** functions as well, calling it either in the beginning or the end of the filtering process.
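
In its simplest form this amounts to calling unique() on every table. The sketch below uses a plain list of invented data frames in place of a real feed object.

```r
# Toy feed with duplicated rows (invented for illustration).
gtfs <- list(
  agency = data.frame(
    agency_id = c("1", "1"),
    agency_name = c("A", "A")
  ),
  routes = data.frame(
    route_id = c("r1", "r2", "r2"),
    agency_id = c("1", "1", "1")
  )
)

# unique() on a data frame drops rows that are exact duplicates.
# Note: lapply() returns a plain list, so the class of a real feed
# object would need to be restored afterwards.
deduplicated <- lapply(gtfs, unique)
print(vapply(deduplicated, nrow, integer(1)))
```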

transfer gtfs samples from gtfs2gps

`set_trip_speed()` raises a `max()`-related warning when none of the specified trips exist

library(gtfstools) 

path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools") 
gtfs <- read_gtfs(path) 

a <- set_trip_speed(gtfs, "a", 1) 
#> Warning in get_trip_geometry(gtfs, trip_id, file = "shapes"): 'trips' doesn't
#> contain the following trip_id(s): 'a'
#> Warning in max(stop_sequence): no non-missing arguments to max; returning -Inf

b <- set_trip_speed(gtfs, character(0), 1)
#> Warning in max(stop_sequence): no non-missing arguments to max; returning -Inf
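
Assuming the warning comes from calling max() on an empty stop_sequence vector, a guard like the one below would avoid it. The helper name is hypothetical; returning early from set_trip_speed() when no trip matches would work equally well.

```r
# max() on a zero-length vector warns and returns -Inf, so check first.
safe_max_sequence <- function(stop_sequence) {
  if (length(stop_sequence) == 0L) {
    return(NA_integer_)
  }
  max(stop_sequence)
}

empty_result <- safe_max_sequence(integer(0))
regular_result <- safe_max_sequence(c(1L, 5L, 3L))
```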

running goodpractice::gp()

This is the output of goodpractice::gp() atm:

It is good practice to

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R\get_trip_duration.R:112:1
    R\get_trip_speed.R:38:1
    R\merge_gtfs.R:103:1
    R\read_gtfs.R:10:1
    R\read_gtfs.R:13:1
    ... and 65 more lines

  ✖ fix this R CMD check NOTE: Namespaces in Imports field not imported from: 'lwgeom' 'methods' All declared Imports should
    be used.
  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. LaTeX errors
    found:

I have already fixed many long code lines. The large majority of what is left are: lines that are 1 or 2 characters longer than 80; test_that() headers. Should these be changed as well?
I've tried asking the folks from {sf} what to do with the {lwgeom} dependency. The {methods} note refers to using methods::setClass() and methods::setAs() as build-time dependency, instead of a run-time dependency. Apparently just adding @importFrom methods setClass setAs should fix, but I'll investigate if there's a more elegant solution.
I haven't yet looked at the third note.

get_trip_speed: Consider distance between first and last stop

In the get_trip_speed() function, the code currently considers the length of the entire trip (see below). However, it's quite common that the trip geometry stretches beyond the first and last stops. To get more accurate results, we would need to tweak the code to consider only the distance AND duration between the first and last stops.

  # generate desired geometries - checking for required files/fields is done
  # pretty early into get_trip_geometry code

  trips_geometries <- get_trip_geometry(gtfs, trip_id, file)

  # calculate the length of each geometry

  trips_length <- sf::st_length(trips_geometries)
  if (unit == "km/h") trips_length <- units::set_units(trips_length, "km")
  trips_length <- as.numeric(trips_length)
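
When the feed provides the optional shape_dist_traveled field in stop_times, the calculation could be restricted to the first and last stops roughly as follows. The data is invented and the distance unit (metres) is an assumption, since GTFS leaves that unit up to the feed; without this field the shape would have to be cut at the stops instead.

```r
# Toy stop_times for one trip; times in seconds since midnight,
# shape_dist_traveled assumed to be in metres.
stop_times <- data.frame(
  trip_id = "t1",
  stop_sequence = 1:3,
  arrival_secs = c(28800, 29400, 30600),
  departure_secs = c(28800, 29460, 30600),
  shape_dist_traveled = c(500, 4500, 12500)
)

first_stop <- stop_times[which.min(stop_times$stop_sequence), ]
last_stop <- stop_times[which.max(stop_times$stop_sequence), ]

# Distance and duration between the first departure and the last arrival.
distance_km <- (last_stop$shape_dist_traveled - first_stop$shape_dist_traveled) / 1000
duration_h <- (last_stop$arrival_secs - first_stop$departure_secs) / 3600

speed_kmh <- distance_km / duration_h
print(speed_kmh)
```

The 500 m of shape before the first stop is excluded from the distance, which is the point of the proposal.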

`filter_by_sf()` function

A function to "Develop a function to reduce a GTFS feed to within a defined bounding box, through removing all routes which do not pass within the bounding box at all.", as mentioned by @mpadge here.

I remember @dhersz had something similar to this, but it would be nice to hear from @mpadge what he has in mind.

`get_children_stops()` behaviour should be similar to other functions in the package

i.e. get_children_stops(gtfs) should return the children of all stops in the GTFS, and not error.
same for get_parent_station()

to do list

remove little usethis dependency when saving gtfs_metadata
implement filtering functions
implement gtfstools_to_tidytransit()
implement set_times_to_seconds()
implement prune_gtfs()
reduce poa_gtfs after implementing filtering functions
create vignette
improve README
change merge_gtfs() and validate_gtfs() examples

please add here or comment below any other features you'd like to see

things I really need to check that I know might be problematic:

need to check if first departure_time/last arrival_time are missing and handle it adequately in get_trip_duration()
Suggests != Depends - conditionally run tests and examples that depend on {lwgeom}