mpadge commented on June 14, 2024

No worries @dhersz, feel free to ping anytime :smile: I agree with @mvpsaraiva that filter_by_time_of_day makes a lot more intuitive sense than filter_by_day_period - I definitely would have no real understanding of what filter_by_day_period is supposed to mean.

from gtfstools.

dhersz commented on June 14, 2024

Ok, I have decided in favour of update_frequencies = TRUE. I agree with Marcus that I normally wouldn't expect the function to change the data, but if the frequencies table is not updated we won't have a "correct" GTFS after all.

Regarding the name, not sure which one is best. I've "copied" the name from {gtfs2gps}, but filter_by_time_of_day() seems good too - although too underscore-y?

Perhaps @mpadge could help us on that (sorry for pinging you out of nowhere, but you're the only native English speaker that has contributed to the package to this date :P).

dhersz commented on June 14, 2024

Cool, thank you very much Mark! I'll update the function/documentation/tests as soon as possible and will close this issue once that's done.

dhersz commented on June 14, 2024

The last few commits introduced this function to the package. Since its behaviour can be quite complex, I think it's worth quickly showing how it works in this comment.

First, a quick look at the original frequencies and stop_times tables:

library(gtfstools)

path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
gtfs <- read_gtfs(path)

head(gtfs$frequencies)
#>       trip_id start_time end_time headway_secs
#> 1: CPTM L07-0   04:00:00 04:59:00          720
#> 2: CPTM L07-0   05:00:00 05:59:00          360
#> 3: CPTM L07-0   06:00:00 06:59:00          360
#> 4: CPTM L07-0   07:00:00 07:59:00          360
#> 5: CPTM L07-0   08:00:00 08:59:00          360
#> 6: CPTM L07-0   09:00:00 09:59:00          480
head(gtfs$stop_times)
#>       trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0     04:00:00       04:00:00   18940             1
#> 2: CPTM L07-0     04:08:00       04:08:00   18920             2
#> 3: CPTM L07-0     04:16:00       04:16:00   18919             3
#> 4: CPTM L07-0     04:24:00       04:24:00   18917             4
#> 5: CPTM L07-0     04:32:00       04:32:00   18916             5
#> 6: CPTM L07-0     04:40:00       04:40:00   18965             6

When filtering by time period, it's important to filter both the frequencies and the stop_times tables. The stop_times entries of trips described in frequencies, however, should not be filtered, because those are just templates that describe how long it takes to get from one stop to another (i.e. the departure and arrival times listed there should not be considered "as is"). So, for example, filtering the gtfs object above from 5am to 6am doesn't change the stop_times of frequencies' trips:

filtered_gtfs <- filter_by_day_period(gtfs, "05:00:00", "06:00:00")

head(filtered_gtfs$frequencies)
#>       trip_id start_time end_time headway_secs
#> 1: CPTM L07-0   05:00:00 05:59:00          360
#> 2: CPTM L07-1   05:00:00 05:59:00          360
#> 3: CPTM L08-0   05:00:00 05:59:00          480
#> 4: CPTM L08-1   05:00:00 05:59:00          480
#> 5: CPTM L09-0   05:00:00 05:59:00          480
#> 6: CPTM L09-1   05:00:00 05:59:00          480
head(filtered_gtfs$stop_times)
#>       trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0     04:00:00       04:00:00   18940             1
#> 2: CPTM L07-0     04:08:00       04:08:00   18920             2
#> 3: CPTM L07-0     04:16:00       04:16:00   18919             3
#> 4: CPTM L07-0     04:24:00       04:24:00   18917             4
#> 5: CPTM L07-0     04:32:00       04:32:00   18916             5
#> 6: CPTM L07-0     04:40:00       04:40:00   18965             6

As usual, you can use the keep parameter:

filtered_gtfs <- filter_by_day_period(
    gtfs,
    "05:00:00",
    "06:00:00",
    keep = FALSE
)
head(filtered_gtfs$frequencies)
#>       trip_id start_time end_time headway_secs
#> 1: CPTM L07-0   04:00:00 04:59:00          720
#> 2: CPTM L07-0   06:00:00 06:59:00          360
#> 3: CPTM L07-0   07:00:00 07:59:00          360
#> 4: CPTM L07-0   08:00:00 08:59:00          360
#> 5: CPTM L07-0   09:00:00 09:59:00          480
#> 6: CPTM L07-0   10:00:00 10:59:00          480

But keep works kinda "strangely" with the frequencies table. Let's say we want to filter the feed to keep trips from 5:30am to 6am. We will have to keep the entire frequencies entry that describes the trip from 5am to 6am:

filtered_gtfs <- filter_by_day_period(gtfs, "05:30:00", "06:00:00")
head(filtered_gtfs$frequencies)
#>       trip_id start_time end_time headway_secs
#> 1: CPTM L07-0   05:00:00 05:59:00          360
#> 2: CPTM L07-1   05:00:00 05:59:00          360
#> 3: CPTM L08-0   05:00:00 05:59:00          480
#> 4: CPTM L08-1   05:00:00 05:59:00          480
#> 5: CPTM L09-0   05:00:00 05:59:00          480
#> 6: CPTM L09-1   05:00:00 05:59:00          480

But we wanted to get rid of the trips from 5am to 5:30am. In this case, we can use the update_frequencies parameter, which solves this problem for us:

filtered_gtfs <- filter_by_day_period(
    gtfs,
    "05:30:00",
    "06:00:00",
    update_frequencies = TRUE
)
head(filtered_gtfs$frequencies)
#>       trip_id start_time end_time headway_secs
#> 1: CPTM L07-0   05:30:00 05:59:00          360
#> 2: CPTM L07-1   05:30:00 05:59:00          360
#> 3: CPTM L08-0   05:30:00 05:59:00          480
#> 4: CPTM L08-1   05:30:00 05:59:00          480
#> 5: CPTM L09-0   05:30:00 05:59:00          480
#> 6: CPTM L09-1   05:30:00 05:59:00          480
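
For anyone who finds code easier to follow, here is a rough base R sketch of the idea behind update_frequencies when exact_times is not set. It is not the gtfstools implementation: to_secs(), to_hms() and clamp_frequencies() are hypothetical helpers, and treating an entry as overlapping when it starts before the window ends and ends after the window starts is an assumption inferred from the outputs above.

```r
# Hypothetical helpers, not part of gtfstools: "HH:MM:SS" <-> seconds.
to_secs <- function(x) {
  vapply(
    strsplit(x, ":", fixed = TRUE),
    function(p) sum(as.numeric(p) * c(3600, 60, 1)),
    numeric(1)
  )
}
to_hms <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Hypothetical sketch: keep the entries that overlap the window, then clamp
# their start_time and end_time to the window limits.
clamp_frequencies <- function(freq, from, to) {
  from_s <- to_secs(from)
  to_s <- to_secs(to)
  start_s <- to_secs(freq$start_time)
  end_s <- to_secs(freq$end_time)

  # Assumed boundary rule: an entry overlaps if it starts before `to`
  # and ends after `from`.
  overlaps <- start_s < to_s & end_s > from_s

  out <- freq[overlaps, , drop = FALSE]
  out$start_time <- to_hms(pmax(start_s[overlaps], from_s))
  out$end_time <- to_hms(pmin(end_s[overlaps], to_s))
  out
}

# Toy table mimicking the first rows of the frequencies table shown earlier.
freq <- data.frame(
  trip_id = "CPTM L07-0",
  start_time = c("04:00:00", "05:00:00", "06:00:00"),
  end_time = c("04:59:00", "05:59:00", "06:59:00"),
  headway_secs = c(720, 360, 360),
  stringsAsFactors = FALSE
)

clamped <- clamp_frequencies(freq, "05:30:00", "06:00:00")
```

With the 5:30am to 6am window, only the 05:00:00 entry survives in this toy table, and its start_time becomes 05:30:00 while its end_time stays at 05:59:00.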

The function also adjusts the frequencies table according to the exact_times field. This field indicates whether the service follows a fixed schedule throughout the day or not. If it's 0 (or if it's not present), the service does not follow a fixed schedule. Instead, the operators try to maintain the listed headways. If exact_times is 1, however, operators try to strictly adhere to the start times and headway. As a result, when updating the start_time field we need to follow the listed headway. So, for example, if we set exact_times to 1 in our feed and filter from 5:05am to 6am, we get some trips starting at 5:06am and 5:08am, because had we updated the start_time to 5:05am the trips wouldn't respect the headway originally listed:

gtfs$frequencies[, exact_times := 1]
filtered_gtfs <- filter_by_day_period(
    gtfs,
    "05:05:00",
    "06:00:00",
    update_frequencies = TRUE
)
head(filtered_gtfs$frequencies)
#>       trip_id start_time end_time headway_secs exact_times
#> 1: CPTM L07-0   05:06:00 05:59:00          360           1
#> 2: CPTM L07-1   05:06:00 05:59:00          360           1
#> 3: CPTM L08-0   05:08:00 05:59:00          480           1
#> 4: CPTM L08-1   05:08:00 05:59:00          480           1
#> 5: CPTM L09-0   05:08:00 05:59:00          480           1
#> 6: CPTM L09-1   05:08:00 05:59:00          480           1
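
The arithmetic behind those 05:06:00 and 05:08:00 values can be sketched in base R as below. This is only an illustration of the reasoning, not the code gtfstools runs: align_start() and to_hms() are hypothetical names, and times are handled as seconds since midnight.

```r
# Hypothetical helper, not part of gtfstools: seconds -> "HH:MM:SS".
to_hms <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Hypothetical helper: first departure at or after `from_s` that still falls
# on the grid defined by the original start time and headway.
align_start <- function(start_s, headway_s, from_s) {
  missed <- pmax(from_s - start_s, 0)
  start_s + ceiling(missed / headway_s) * headway_s
}

original_start <- 5 * 3600          # 05:00:00
window_start <- 5 * 3600 + 5 * 60   # 05:05:00
headways <- c(360, 480)             # the two headways seen in the output

new_starts <- to_hms(align_start(original_start, headways, window_start))
```

With a 360 second headway the departures run at 05:00:00, 05:06:00 and so on, so the first one inside the window is 05:06:00. With 480 seconds it is 05:08:00.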

Now let's suppose this feed didn't have a frequencies table. When filtering the stop_times, we have two options. We either keep entire trips that cross the specified period, or we keep only the trip segments within this period. To control this behaviour you can use the full_trips parameter:

gtfs$frequencies <- NULL
filtered_gtfs <- filter_by_day_period(
    gtfs,
    "05:00:00",
    "06:00:00"
)
head(filtered_gtfs$stop_times)
#>       trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0     05:04:00       05:04:00 4114459             9
#> 2: CPTM L07-0     05:12:00       05:12:00   18921            10
#> 3: CPTM L07-0     05:20:00       05:20:00   18924            11
#> 4: CPTM L07-0     05:28:00       05:28:00   18925            12
#> 5: CPTM L07-0     05:36:00       05:36:00   18926            13
#> 6: CPTM L07-0     05:44:00       05:44:00   18971            14

filtered_gtfs <- filter_by_day_period(
    gtfs,
    "05:00:00",
    "06:00:00",
    full_trips = TRUE
)
head(filtered_gtfs$stop_times)
#>       trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0     04:00:00       04:00:00   18940             1
#> 2: CPTM L07-0     04:08:00       04:08:00   18920             2
#> 3: CPTM L07-0     04:16:00       04:16:00   18919             3
#> 4: CPTM L07-0     04:24:00       04:24:00   18917             4
#> 5: CPTM L07-0     04:32:00       04:32:00   18916             5
#> 6: CPTM L07-0     04:40:00       04:40:00   18965             6

And finally, it's important to understand how the keep parameter works with full_trips. If full_trips is FALSE and keep is FALSE, it will keep segments outside the specified period. If keep is FALSE and full_trips is TRUE, however, the function will drop any trips that cross the specified period (which is analogous to keeping entire trips that cross the period):

filtered_gtfs <- filter_by_day_period(
    gtfs,
    "04:24:00",
    "06:00:00",
    keep = FALSE
)
head(filtered_gtfs$stop_times)
#>       trip_id arrival_time departure_time stop_id stop_sequence
#> 1: CPTM L07-0     04:00:00       04:00:00   18940             1
#> 2: CPTM L07-0     04:08:00       04:08:00   18920             2
#> 3: CPTM L07-0     04:16:00       04:16:00   18919             3
#> 4: CPTM L07-0     06:08:00       06:08:00   18974            17
#> 5: CPTM L07-0     06:16:00       06:16:00   18975            18
#> 6: CPTM L07-1     04:00:00       04:00:00   18975             1

filtered_gtfs <- filter_by_day_period(
    gtfs,
    "04:24:00",
    "06:00:00",
    keep = FALSE,
    full_trips = TRUE
)
filtered_gtfs$stop_times[trip_id == "CPTM L07-0"]
#> Empty data.table (0 rows and 5 cols): trip_id,arrival_time,departure_time,stop_id,stop_sequence
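
To make the interaction between keep and full_trips more concrete, here is a minimal base R sketch on toy data. It is a simplification under stated assumptions, not the actual implementation: filter_stop_times(), to_secs() and to_hms() are hypothetical helpers, only departure_time is checked, and both ends of the window are treated as inclusive (which is consistent with the outputs above, but still an assumption).

```r
# Hypothetical helpers, not part of gtfstools: "HH:MM:SS" <-> seconds.
to_secs <- function(x) {
  vapply(
    strsplit(x, ":", fixed = TRUE),
    function(p) sum(as.numeric(p) * c(3600, 60, 1)),
    numeric(1)
  )
}
to_hms <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Hypothetical sketch of the keep / full_trips logic on a stop_times table.
# Assumption: only departure_time is checked, with both window ends inclusive.
filter_stop_times <- function(st, from, to, keep = TRUE, full_trips = FALSE) {
  secs <- to_secs(st$departure_time)
  inside <- secs >= to_secs(from) & secs <= to_secs(to)
  if (full_trips) {
    # Select every row of any trip with at least one stop inside the window.
    selected <- st$trip_id %in% unique(st$trip_id[inside])
  } else {
    selected <- inside
  }
  if (!keep) selected <- !selected
  st[selected, , drop = FALSE]
}

# Toy data: trip "A" runs from 04:00:00 to 06:16:00, trip "B" starts at 07:00:00.
st <- rbind(
  data.frame(
    trip_id = "A",
    departure_time = to_hms(to_secs("04:00:00") + 480 * (0:17)),
    stop_sequence = 1:18,
    stringsAsFactors = FALSE
  ),
  data.frame(
    trip_id = "B",
    departure_time = c("07:00:00", "07:08:00", "07:16:00"),
    stop_sequence = 1:3,
    stringsAsFactors = FALSE
  )
)

segments_out <- filter_stop_times(st, "04:24:00", "06:00:00", keep = FALSE)
trips_out <- filter_stop_times(
  st, "04:24:00", "06:00:00", keep = FALSE, full_trips = TRUE
)
```

In this toy feed, segments_out keeps stops 1 to 3 and 17 to 18 of trip "A" plus all of trip "B", while trips_out drops trip "A" entirely and keeps only trip "B".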

The function also covers some other hairy cases, but this is a good overview of basic functionality. I hope you like it. I tried my best documenting these pieces of behaviour using text and examples.

dhersz commented on June 14, 2024

Before closing this issue: I'm not sure what the default values of full_trips and update_frequencies should be. I have set both to FALSE for now, but perhaps update_frequencies = TRUE is more sensible?

At the same time I didn't want the function to change the entries by default, because it goes beyond the goal of simply filtering the tables... So I'm not sure what to do here, and I wanted to hear any opinions you may have on this.

rafapereirabr commented on June 14, 2024

Hi @dhersz. This looks really great. I checked the documentation and it also reads very clearly to me. Two quick comments:

Regarding the default parameters

Intuitively, I believe these values here make more sense to me as a user.

  • full_trips = FALSE
  • update = TRUE

From what I understand, setting update = TRUE would not distort the number of trips, right?

Regarding filtering the stop_times when there is a frequencies table

The documentation says that filtering the stop_times when there is a frequencies table does not have any effect. I'm curious if you have tested this with r5r or OTP. My point is, even if it does not have any effect, I don't see a reason why not to filter the stop_times as well.

dhersz commented on June 14, 2024

> Regarding the default parameters
>
> Intuitively, I believe these values here make more sense to me as a user.
>
> * `full_trips = FALSE`
> * `update = TRUE`
>
> From what I understand, setting update = TRUE would not distort the number of trips, right?

Yes, you're right. It only updates the existing entries.

> Regarding filtering the stop_times when there is a frequencies table
>
> The documentation says that filtering the stop_times when there is a frequencies table does not have any effect. I'm curious if you have tested this with r5r or OTP. My point is, even if it does not have any effect, I don't see a reason why not to filter the stop_times as well.

Perhaps I have to make this clearer, but it's not that it doesn't filter stop_times at all. It doesn't filter the stop_times entries of trips listed in the frequencies table. So taking the "CPTM L07-0" trip that appears a lot in the examples above, its stop_times entries should not be taken as literally leaving at 4am, then 4:08am and then 4:16, but rather as a template that says that from the first stop to the second takes 8 minutes, and from the second to the third too.

According to the frequencies table, this trip departs every 6 minutes from 5am to 6am. So the stop_times table says that if a trip departs from the first stop at 5am, it will arrive at the second at 5:08am and at the third at 5:16am.

So assuming we want to drop trips that fall within 4am to 4:30am, it doesn't make sense to filter out the stop_times entries, because they're used to describe the trips that start outside this period as well (even though the template says 4am, then 4:08, etc).
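
To illustrate the template idea, here is a small base R sketch that expands the first three template stops of "CPTM L07-0" using the 05:00:00 to 05:59:00 frequencies entry (headway of 360 seconds). It only illustrates the reasoning: to_secs() and to_hms() are hypothetical helpers, and exactly how trips near end_time are handled is left aside.

```r
# Hypothetical helpers, not part of gtfstools: "HH:MM:SS" <-> seconds.
to_secs <- function(x) {
  vapply(
    strsplit(x, ":", fixed = TRUE),
    function(p) sum(as.numeric(p) * c(3600, 60, 1)),
    numeric(1)
  )
}
to_hms <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Template times taken from the stop_times table shown earlier.
template <- c("04:00:00", "04:08:00", "04:16:00")

# What matters in the template is the offset from the first stop.
offsets <- to_secs(template) - to_secs(template[1])

# Departures from the first stop implied by the frequencies entry.
departures <- seq(to_secs("05:00:00"), to_secs("05:59:00"), by = 360)

# Stop times of the first two actual trips described by the entry.
first_trip <- to_hms(departures[1] + offsets)
second_trip <- to_hms(departures[2] + offsets)
```

Here first_trip is 05:00:00, 05:08:00, 05:16:00 and second_trip is 05:06:00, 05:14:00, 05:22:00, even though the template itself lists 4am, 4:08am and 4:16am. That is why dropping the template rows would also break trips that start outside the dropped period.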

rafapereirabr commented on June 14, 2024

Ok, so in this case it makes sense to set update = TRUE.

Regarding the second point: if dropping the other entries does not have any effect, then I think it would be better to drop them, simply to avoid confusion among users. It's just an opinion. I'm glad to discuss this further and hear what the others think.

mvpsaraiva commented on June 14, 2024

Here are my 2 cents. Feel free to ignore them if they don't make sense.

  • I don't think the name filter_by_day_period() is clear enough. I think a better name would be something like filter_by_time_of_day(). Perhaps we could get the opinion of a native English speaker on this.
  • I also don't like the idea of update_frequencies = TRUE by default. When we call a function that filters something, we don't expect that function to also change the data. In general, I think such side effects should be avoided. But you guys are heavier GTFS users than me, so perhaps updating the records makes sense in this case.

dhersz commented on June 14, 2024

Done in 9355a46.
