No worries @dhersz, feel free to ping anytime :smile: I agree with @mvpsaraiva that filter_by_time_of_day makes a lot more intuitive sense than filter_by_day_period - I definitely would have no real understanding of what filter_by_day_period is supposed to mean.
from gtfstools.
Ok, I have decided in favour of update_frequencies = TRUE. I agree with Marcus that I normally wouldn't expect the function to change the data, but if the frequencies table is not updated we won't have a "correct" GTFS after all.
Regarding the name, I'm not sure which one is best. I've "copied" the name from {gtfs2gps}, but filter_by_time_of_day() seems good too - although too underscore-y?
Perhaps @mpadge could help us on that (sorry for pinging you out of nowhere, but you're the only native English speaker that has contributed to the package to this date :P).
from gtfstools.
Cool, thank you very much Mark! I'll update the function/documentation/tests as soon as possible and will close this issue once that's done.
from gtfstools.
The last few commits introduced this function to the package. Since its behaviour can be quite complex, I think it's worth quickly showing how it works in this comment.
First, a quick look at the original frequencies and stop_times tables:
library(gtfstools)
path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
gtfs <- read_gtfs(path)
head(gtfs$frequencies)
#> trip_id start_time end_time headway_secs
#> 1: CPTM L07-0 04:00:00 04:59:00 720
#> 2: CPTM L07-0 05:00:00 05:59:00 360
#> 3: CPTM L07-0 06:00:00 06:59:00 360
#> 4: CPTM L07-0 07:00:00 07:59:00 360
#> 5: CPTM L07-0 08:00:00 08:59:00 360
#> 6: CPTM L07-0 09:00:00 09:59:00 480
head(gtfs$stop_times)
#> trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0 04:00:00 04:00:00 18940 1
#> 2: CPTM L07-0 04:08:00 04:08:00 18920 2
#> 3: CPTM L07-0 04:16:00 04:16:00 18919 3
#> 4: CPTM L07-0 04:24:00 04:24:00 18917 4
#> 5: CPTM L07-0 04:32:00 04:32:00 18916 5
#> 6: CPTM L07-0 04:40:00 04:40:00 18965 6
When filtering by time period, it's important to filter both the frequencies and the stop_times tables. The stop_times entries of trips described in frequencies, however, should not be filtered, because those are just templates that describe how long it takes from one stop to another (i.e. the departure and arrival times listed there should not be considered "as is"). So for example, filtering the gtfs object above from 5am to 6am doesn't change the stop_times of frequencies' trips:
filtered_gtfs <- filter_by_day_period(gtfs, "05:00:00", "06:00:00")
head(filtered_gtfs$frequencies)
#> trip_id start_time end_time headway_secs
#> 1: CPTM L07-0 05:00:00 05:59:00 360
#> 2: CPTM L07-1 05:00:00 05:59:00 360
#> 3: CPTM L08-0 05:00:00 05:59:00 480
#> 4: CPTM L08-1 05:00:00 05:59:00 480
#> 5: CPTM L09-0 05:00:00 05:59:00 480
#> 6: CPTM L09-1 05:00:00 05:59:00 480
head(filtered_gtfs$stop_times)
#> trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0 04:00:00 04:00:00 18940 1
#> 2: CPTM L07-0 04:08:00 04:08:00 18920 2
#> 3: CPTM L07-0 04:16:00 04:16:00 18919 3
#> 4: CPTM L07-0 04:24:00 04:24:00 18917 4
#> 5: CPTM L07-0 04:32:00 04:32:00 18916 5
#> 6: CPTM L07-0 04:40:00 04:40:00 18965 6
As usual, you can use the keep parameter:
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "05:00:00",
  "06:00:00",
  keep = FALSE
)
head(filtered_gtfs$frequencies)
#> trip_id start_time end_time headway_secs
#> 1: CPTM L07-0 04:00:00 04:59:00 720
#> 2: CPTM L07-0 06:00:00 06:59:00 360
#> 3: CPTM L07-0 07:00:00 07:59:00 360
#> 4: CPTM L07-0 08:00:00 08:59:00 360
#> 5: CPTM L07-0 09:00:00 09:59:00 480
#> 6: CPTM L07-0 10:00:00 10:59:00 480
But keep works kinda "strangely" with the frequencies table. Let's say we want to filter the feed to keep trips from 5:30am to 6am. We will have to keep the entire frequencies entry that describes the trip from 5am to 6am:
filtered_gtfs <- filter_by_day_period(gtfs, "05:30:00", "06:00:00")
head(filtered_gtfs$frequencies)
#> trip_id start_time end_time headway_secs
#> 1: CPTM L07-0 05:00:00 05:59:00 360
#> 2: CPTM L07-1 05:00:00 05:59:00 360
#> 3: CPTM L08-0 05:00:00 05:59:00 480
#> 4: CPTM L08-1 05:00:00 05:59:00 480
#> 5: CPTM L09-0 05:00:00 05:59:00 480
#> 6: CPTM L09-1 05:00:00 05:59:00 480
But we wanted to get rid of the trips from 5am to 5:30am. In this case, we can use the update_frequencies parameter, which solves this problem for us:
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "05:30:00",
  "06:00:00",
  update_frequencies = TRUE
)
head(filtered_gtfs$frequencies)
#> trip_id start_time end_time headway_secs
#> 1: CPTM L07-0 05:30:00 05:59:00 360
#> 2: CPTM L07-1 05:30:00 05:59:00 360
#> 3: CPTM L08-0 05:30:00 05:59:00 480
#> 4: CPTM L08-1 05:30:00 05:59:00 480
#> 5: CPTM L09-0 05:30:00 05:59:00 480
#> 6: CPTM L09-1 05:30:00 05:59:00 480
The function also adjusts the frequencies table according to the exact_times field. This field indicates whether the service follows a fixed schedule throughout the day or not. If it's 0 (or if it's not present), the service does not follow a fixed schedule. Instead, the operators try to maintain the listed headways. If exact_times is 1, however, operators try to strictly adhere to the start times and headway. As a result, when updating the start_time field we need to follow the listed headway. So for example, if we set exact_times to 1 in our feed and filter from 5:05am to 6am, we get some trips starting at 5:06am and 5:08am, because had we updated the start time to 5:05am the trips wouldn't respect the originally listed headway:
gtfs$frequencies[, exact_times := 1]
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "05:05:00",
  "06:00:00",
  update_frequencies = TRUE
)
head(filtered_gtfs$frequencies)
#> trip_id start_time end_time headway_secs exact_times
#> 1: CPTM L07-0 05:06:00 05:59:00 360 1
#> 2: CPTM L07-1 05:06:00 05:59:00 360 1
#> 3: CPTM L08-0 05:08:00 05:59:00 480 1
#> 4: CPTM L08-1 05:08:00 05:59:00 480 1
#> 5: CPTM L09-0 05:08:00 05:59:00 480 1
#> 6: CPTM L09-1 05:08:00 05:59:00 480 1
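The 05:06:00 and 05:08:00 start times above come from rounding the requested start up to the next departure on the original headway grid. A minimal base R sketch of that arithmetic follows; `to_secs()`, `to_hms()` and `align_start()` are hypothetical helpers written for this illustration, not functions from gtfstools, and the package's internals may differ:

```r
# Hypothetical helpers (not part of gtfstools): illustrate the arithmetic only.

# Convert a single "HH:MM:SS" string to seconds after midnight.
to_secs <- function(x) {
  parts <- as.integer(strsplit(x, ":", fixed = TRUE)[[1]])
  parts[1] * 3600L + parts[2] * 60L + parts[3]
}

# Convert seconds after midnight back to "HH:MM:SS".
to_hms <- function(secs) {
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

# First departure at or after `from` that is still a whole number of
# headways away from the original start_time.
align_start <- function(start_time, headway_secs, from) {
  start <- to_secs(start_time)
  target <- to_secs(from)
  if (target <= start) return(start_time)
  steps <- as.integer(ceiling((target - start) / headway_secs))
  to_hms(start + steps * as.integer(headway_secs))
}

align_start("05:00:00", 360, "05:05:00") # "05:06:00"
align_start("05:00:00", 480, "05:05:00") # "05:08:00"
```

With a 360-second headway the departures after 05:00:00 fall at 05:06:00, 05:12:00 and so on, so 05:06:00 is the first one inside the requested period.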
Now let's suppose this feed didn't have a frequencies table. When filtering the stop_times, we have two options. We either keep entire trips that cross the specified period, or we keep only the trip segments within this period. To control this behaviour you can use the full_trips parameter:
gtfs$frequencies <- NULL
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "05:00:00",
  "06:00:00"
)
head(filtered_gtfs$stop_times)
#> trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0 05:04:00 05:04:00 4114459 9
#> 2: CPTM L07-0 05:12:00 05:12:00 18921 10
#> 3: CPTM L07-0 05:20:00 05:20:00 18924 11
#> 4: CPTM L07-0 05:28:00 05:28:00 18925 12
#> 5: CPTM L07-0 05:36:00 05:36:00 18926 13
#> 6: CPTM L07-0 05:44:00 05:44:00 18971 14
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "05:00:00",
  "06:00:00",
  full_trips = TRUE
)
head(filtered_gtfs$stop_times)
#> trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0 04:00:00 04:00:00 18940 1
#> 2: CPTM L07-0 04:08:00 04:08:00 18920 2
#> 3: CPTM L07-0 04:16:00 04:16:00 18919 3
#> 4: CPTM L07-0 04:24:00 04:24:00 18917 4
#> 5: CPTM L07-0 04:32:00 04:32:00 18916 5
#> 6: CPTM L07-0 04:40:00 04:40:00 18965 6
And finally, it's important to understand how the keep parameter works with full_trips. If full_trips is FALSE and keep is FALSE, it will keep segments outside the specified period. If keep is FALSE and full_trips is TRUE, however, the function will drop any trips that cross the specified period (which is analogous to keeping entire trips that cross the period when keep is TRUE):
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "04:24:00",
  "06:00:00",
  keep = FALSE
)
head(filtered_gtfs$stop_times)
#> trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0 04:00:00 04:00:00 18940 1
#> 2: CPTM L07-0 04:08:00 04:08:00 18920 2
#> 3: CPTM L07-0 04:16:00 04:16:00 18919 3
#> 4: CPTM L07-0 06:08:00 06:08:00 18974 17
#> 5: CPTM L07-0 06:16:00 06:16:00 18975 18
#> 6: CPTM L07-1 04:00:00 04:00:00 18975 1
filtered_gtfs <- filter_by_day_period(
  gtfs,
  "04:24:00",
  "06:00:00",
  keep = FALSE,
  full_trips = TRUE
)
filtered_gtfs$stop_times[trip_id == "CPTM L07-0"]
#> Empty data.table (0 rows and 5 cols): trip_id,arrival_time,departure_time,stop_id,stop_sequence
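The interaction between keep and full_trips can be sketched in base R for a single trip. This is only an illustration under stated assumptions: `filter_stops()` is a hypothetical helper (not the gtfstools implementation), the period boundaries are treated as inclusive (which is what the 04:24:00 output above suggests), and the trip is a simplified stand-in with 18 stops spaced 8 minutes apart from 04:00:00:

```r
# Hypothetical sketch, not the gtfstools implementation.
# `secs` holds the stop times of ONE trip, in seconds after midnight.
# Returns the positions (stop_sequence) of the stops that survive the filter.
filter_stops <- function(secs, from, to, keep = TRUE, full_trips = FALSE) {
  inside <- secs >= from & secs <= to # assumed inclusive boundaries
  if (full_trips) {
    # The trip is treated as a unit: it either crosses the period or not.
    crosses <- any(inside)
    selected <- rep(if (keep) crosses else !crosses, length(secs))
  } else {
    # Each stop is judged on its own, so trips can be cut into segments.
    selected <- if (keep) inside else !inside
  }
  which(selected)
}

# Simplified stand-in for trip "CPTM L07-0": 18 stops, 8 minutes apart.
trip <- seq(4 * 3600, by = 480, length.out = 18)
from <- 4 * 3600 + 24 * 60 # 04:24:00
to <- 6 * 3600             # 06:00:00

filter_stops(trip, from, to, keep = FALSE)                    # 1 2 3 17 18
filter_stops(trip, from, to, keep = FALSE, full_trips = TRUE) # integer(0)
```

The first call leaves the segments outside the period, and the second drops the whole trip because at least one of its stops falls inside it.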
The function also covers some other hairy cases, but this is a good overview of basic functionality. I hope you like it. I tried my best documenting these pieces of behaviour using text and examples.
from gtfstools.
Before closing this issue: I'm not sure what the default values of full_trips and update_frequencies should be. I have set both to FALSE for now, but perhaps update_frequencies = TRUE is more sensible?
At the same time I didn't want the function to change the entries by default, because it goes beyond the goal of simply filtering the tables... So I'm not sure what to do here, and I wanted to hear any opinions you may have on this.
from gtfstools.
Hi @dhersz. This looks really great. I checked the documentation and it also reads very clearly to me. Two quick comments:
Regarding the default parameters
Intuitively, I believe these values here make more sense to me as a user.
full_trips = FALSE
update = TRUE
From what I understand, setting update = TRUE would not distort the number of trips, right?
Regarding the filtering of the stop_times when there is a frequencies table
The documentation says that filtering the stop_times when there is a frequencies table does not have any effect. I'm curious if you have tested this with r5r or OTP. My point is, even if it does not have any effect, I don't see a reason why not to filter the stop_times as well.
from gtfstools.
> Regarding the default parameters
> Intuitively, I believe these values here make more sense to me as a user.
> full_trips = FALSE
> update = TRUE
> From what I understand, setting update = TRUE would not distort the number of trips, right?
Yes, you're right. It only updates the existing entries.
> Regarding the filtering of the stop_times when there is a frequencies table
> The documentation says that filtering the stop_times when there is a frequencies table does not have any effect. I'm curious if you have tested this with r5r or OTP. My point is, even if it does not have any effect, I don't see a reason why not to filter the stop_times as well.
Perhaps I have to make this clearer, but it's not that it doesn't filter stop_times at all. It doesn't filter the stop_times entries of trips listed in the frequencies table. So taking the "CPTM L07-0" trip that appears a lot in the examples above, its stop_times entries should not be taken as literally leaving at 4am, then 4:08am and then 4:16am, but rather as a template that says that it takes 8 minutes from the first stop to the second, and another 8 from the second to the third.
According to the frequencies table this trip departs every 6 minutes from 5am to 6am. So the stop_times table says that if a trip departs at 5am from the first stop, it will arrive at the second at 5:08am and at the third at 5:16am.
So assuming we want to drop trips that fall within 4am to 4:30am, it doesn't make sense to filter out the stop_times entries, because they're used to describe the trips that start outside this period as well (even though the template says 4am, then 4:08, etc).
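To make the template idea concrete, here is a rough base R sketch that expands the first three template stops of "CPTM L07-0" with its 05:00:00 to 05:59:00 frequencies entry (headway of 360 seconds). `to_secs()` and `to_hms()` are hypothetical helpers written for this comment, and the sketch assumes one departure every headway_secs starting at start_time; it is not how gtfstools or any routing engine necessarily does it:

```r
# Hypothetical helpers (not part of gtfstools).
# Convert "HH:MM:SS" strings to seconds after midnight.
to_secs <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(p) {
    p <- as.integer(p)
    p[1] * 3600L + p[2] * 60L + p[3]
  }, integer(1))
}

# Convert seconds after midnight back to "HH:MM:SS".
to_hms <- function(secs) {
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

# Template stop_times: only the offsets from the first stop matter.
template <- c("04:00:00", "04:08:00", "04:16:00")
offsets <- to_secs(template) - to_secs(template[1]) # 0, 480, 960

# Departures implied by the frequencies entry 05:00:00-05:59:00, every 360 s.
departures <- seq(to_secs("05:00:00"), to_secs("05:59:00"), by = 360L)

to_hms(departures[1] + offsets) # "05:00:00" "05:08:00" "05:16:00"
to_hms(departures[2] + offsets) # "05:06:00" "05:14:00" "05:22:00"
```

None of the derived times equals the 4am values in the template, which is why dropping the template rows for a 4am to 4:30am filter would also remove the description of trips that actually run later.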
from gtfstools.
Ok, so in this case it makes sense to set update = TRUE.
Regarding the second point. If dropping the other entries does not have any effect, then I think it would be better to drop them simply to avoid confusion among users. It's just an opinion here. I'm glad to discuss this further and hear what the others think.
from gtfstools.
Here are my 2 cents. Feel free to ignore them if they don't make sense.
- I don't think the name filter_by_day_period() is clear enough. I think a better name would be something like filter_by_time_of_day(). Perhaps we could get the opinion of a native English speaker on this.
- I also don't like the idea of update_frequencies = TRUE by default. When we call a function that filters something, we don't expect that function to also change the data. In general, I think such side effects should be avoided. But you guys are heavier GTFS users than me, so perhaps updating the records makes sense in this case.
from gtfstools.
Done in 9355a46.
from gtfstools.