urbananalyst / m4ra

many-to-many multi-modal routing aggregator

Home Page: https://urbananalyst.github.io/m4ra/
The multi-modal algorithm currently reads in full GTFS matrices and passes these straight through to the C++ routines. For cities like Paris with 30,000 stops, these matrices are huge and potentially too big to hold in memory. Sizes should be reduced to only the subset of GTFS stops that lie within the maximum travel duration of all origin points.
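A reduction along those lines might look like the following sketch, where `access_times`, `gtfs_times`, and `max_duration` are all hypothetical stand-ins for m4ra's internal objects, filled here with dummy data:

```r
set.seed (1)
n_origins <- 5L
n_stops <- 1000L
max_duration <- 3600 # seconds

# Hypothetical stand-ins: initial access times from each origin to each stop,
# and the full stop-to-stop GTFS travel time matrix:
access_times <- matrix (runif (n_origins * n_stops, 0, 7200), nrow = n_origins)
gtfs_times <- matrix (runif (n_stops ^ 2, 0, 7200), nrow = n_stops)

# Keep only stops reachable from at least one origin within max_duration:
reachable <- apply (access_times, 2, min) <= max_duration
gtfs_sub <- gtfs_times [reachable, reachable]
```

The reduced `gtfs_sub` could then be passed to the C++ routines in place of the full matrix; the actual retained fraction will depend on the spread of access times in real data.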
It's still necessary (21 Sep 2022) because CRAN hasn't yet built binaries of dodgr v0.2.16 for all operating systems. Once they're all there, the remotes dependency can be removed.
Shared electric scooters won't be analysed for now, because the areas in which they are (potentially) available are generally highly regulated, preventing the construction of any generally realistic models.
Because for cities like Paris, the matrices are huge, and barely fit into 32 GB of memory. Looping over subsets and saving intermediate results before a final re-assembly step will be more memory-efficient, and will also allow a progress indicator.
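A sketch of that batched pattern, with `process_batch` as a hypothetical stand-in for the actual per-subset routing calculation:

```r
# Hypothetical per-batch worker standing in for the real calculation:
process_batch <- function (b) data.frame (origin = b, value = sqrt (b))

origins <- seq_len (1000)
batches <- split (origins, ceiling (seq_along (origins) / 100))
files <- file.path (tempdir (), paste0 ("m4ra-batch-", seq_along (batches), ".Rds"))

for (i in seq_along (batches)) {
    if (file.exists (files [i])) next # resume after interruption
    saveRDS (process_batch (batches [[i]]), files [i]) # intermediate result
    message ("batch ", i, " / ", length (batches)) # progress indicator
}

# Final re-assembly step:
result <- do.call (rbind, lapply (files, readRDS))
```

Caching each batch to disk also makes interrupted runs resumable, since completed batches are skipped on restart.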
That's the final part of the multi-modal routing algorithm, and really does not scale well with larger networks. It can easily be parallelised.
The multi-modal function #8 now works, as do the parking penalties for automobile travel #9. These can now be combined, following https://github.com/ATFutures/m4ra/blob/main/R/times-bike-car.R, to write an equivalent function that generates ratios of multi-modal travel times to equivalent automobile times.
MapBox Traffic Data has usable data, but is a commercial (not free) service. The TomTom API, also discussed below, offers a Traffic Density endpoint which could be used. It is asynchronous, and processing requests is non-trivial, so it would be very difficult to use at the scales needed here. It could nevertheless be used to provide some calibration estimates.
Vehicle times need to include additional time penalties for parking. This paper from hpi.de (with pdf here and a slightly different version here) describes a probabilistic algorithm for estimating times. It's based on data obtained from a TomTom API which clearly no longer exists (it was called "on-street-parking").
Their results at the end of the second pdf linked to above show some plausible benchmark values of extra driving time of 100-200 seconds, extra walking time of marginally less than that, and overall additional times of around 300 seconds. There are strong variations in both space and time. The underlying algorithm assumes a Pareto II probability density of finding a parking spot per metre of driving, which is itself spatially and temporally variable. The TomTom data were used to calibrate these probability densities in time and space, considering factors including functional road class and average speed of road segments, although they concluded that the length of a road segment was the single determining factor.
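The Pareto II (Lomax) assumption can be sketched directly; the shape and scale values below are purely illustrative, not calibrated values from the paper:

```r
# Probability of NOT yet having found a parking spot after searching for
# d metres, under a Pareto II (Lomax) model with survival function
# S(d) = (1 + d / scale) ^ (-shape):
p_still_searching <- function (d, shape = 2, scale = 100) {
    (1 + d / scale) ^ (-shape)
}

p_still_searching (0)   # 1: searching has not yet started
p_still_searching (500) # (1 + 5) ^ -2 = 1/36
```

Calibration then amounts to letting `shape` and `scale` vary over space and time of day, as the TomTom-derived densities described above do.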
Trying to run m4ra_gtfs_traveltimes on Windows fails because num_cores is set to the number of cores on the computer, which is larger than 1. num_cores is fed into the mc.cores argument of mclapply, without any possibility of changing the parameter. Is m4ra not suitable for Windows?
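One possible fix (a sketch, not current m4ra code) would be to cap the core count on Windows, since `parallel::mclapply` only forks on Unix-alikes:

```r
num_cores <- parallel::detectCores ()
if (.Platform$OS.type == "windows") {
    num_cores <- 1L # mclapply only supports mc.cores = 1 on Windows
}
res <- parallel::mclapply (1:4, function (i) i ^ 2, mc.cores = num_cores)
unlist (res) # 1 4 9 16
```

An alternative would be to switch to `parallel::parLapply` with a PSOCK cluster, which works on all platforms at the cost of extra set-up overhead.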
It would be good to expose the main algorithms in inst/include
of dodgr and link them that way, but they'll need quite a bit of custom modification here anyway, notably including modifying to work with distance matrices too large to hold in memory, as well as stripping away many of the other options built in there. It'll be much easier here to bundle the algorithms internally, with the (potentially large) downside of having to cross-maintain these two packages.
In some suitably standardised form, like transfers per 10km or so. That will then also require either an actual spatial representation of the transport network, or maybe use of walking distances as a proxy? Network distances can and should be extracted from OSM, and not from GTFS files, because the latter will not necessarily have shape data needed to calculate inter-station distances. This will then require an additional pre-processing routine, to calculate a matrix of distances between all GTFS stops.
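A sketch of that pre-processing routine using dodgr's bundled `hampi` sample network; in practice the graph would come from OSM data for the city in question, and the coordinates from the GTFS `stops` table:

```r
library (dodgr)

graph <- weight_streetnet (hampi, wt_profile = "foot")

# Stand-in for GTFS stop coordinates (stop_lon / stop_lat):
stops <- dodgr_vertices (graph) [1:10, c ("x", "y")]

# Matrix of network distances between all stop pairs:
d <- dodgr_dists (graph, from = stops, to = stops)
dim (d) # 10 x 10
```

`dodgr_dists()` matches arbitrary coordinates to nearest network vertices, so the GTFS stops need not coincide with OSM nodes.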
The GTFS part currently calculates matrices from a subset of origins, but to all destinations. The latter could be restricted to a subset defined by the distance threshold. Without actually doing it, it's hard to know exactly how much that would improve efficiency, because most of the calculations are simply O(N) matrix manipulations, but it should have some effect.
Change github org
A useful additional measure to accompany results of `m4ra_times_multi_mode()` would be the quality of bicycle infrastructure connecting each origin point to the GTFS departure station. That could easily be done by aggregating initial bicycle distances alongside one additional measure of those distances multiplied by time-weighting factors. The ratio of the two would then directly reflect the quality of bicycle infrastructure connecting origins to public transport services.
Because #22 pre-generates a restricted index of all GTFS stops prior to loading them. The current code loads full matrices and then reduces their size straight away, but this would be a perfect use case for arrow. The loading of each batch of vertices is also quite slow, and arrow should considerably improve the efficiency of that.
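A sketch of what an arrow-based load might look like, with hypothetical file and column names; `index` stands for the restricted index pre-generated by #22:

```r
library (arrow)
library (dplyr)

# Hypothetical cached travel-time data in long form:
f <- file.path (tempdir (), "gtfs-times.parquet")
write_parquet (data.frame (stop_id = 1:1000, time = runif (1000)), f)

index <- 1:50 # restricted index of GTFS stops

# The filter is pushed down to arrow, so only the required rows are
# materialised in memory rather than the full matrix:
sub <- open_dataset (f) |>
    filter (stop_id %in% index) |>
    collect ()
nrow (sub) # 50
```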
They currently produce garbage when writing really large files (~10M rows), but this seems to be because of the default `compress = 50`. Setting that to `compress = 0` seems to resolve the issue.
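A minimal sketch of the workaround, with a small dummy data.frame standing in for the cached matrices:

```r
library (fst)

f <- file.path (tempdir (), "cache.fst")
df <- data.frame (x = runif (1e5), y = runif (1e5))

write_fst (df, f, compress = 0) # default is compress = 50
df2 <- read_fst (f)
all.equal (df, df2) # TRUE
```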
The travel times algorithms use `IntegerMatrix`:
https://github.com/ATFutures/m4ra/blob/91e503991a336bb654cc3d0be4ea094ca30fe0cd/src/travel-times.h#L44-L48
But the multi-modal versions use `NumericMatrix`:
https://github.com/ATFutures/m4ra/blob/91e503991a336bb654cc3d0be4ea094ca30fe0cd/src/multi-modal.h#L15-L17
This was done because the latter derive from the original distance algorithms, which use `NumericMatrix`. This has to be changed, however, as these matrices can be really huge, whereas equivalent `<int>` values yield much smaller matrices. The only problem is that Rcpp offers no option for `long int`, as explained in FAQ 3.9 on p8. On my local machine, `<int>` goes up safely to at least 2^30, but definitely misbehaves at 2^32, which is to be expected, as that exceeds `std::numeric_limits<int>::max()` (2^31 - 1).
This change should thus be pretty straightforward, but it might be good at some stage to implement a check on the behaviour of large int representations.
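On the R side, such a check might look like this sketch, using base R's integer limits:

```r
# R's integers are 32-bit, so .Machine$integer.max is 2^31 - 1:
stopifnot (.Machine$integer.max == 2 ^ 31 - 1)

ok <- as.integer (2 ^ 31 - 1) # largest representable value
bad <- suppressWarnings (as.integer (2 ^ 31)) # overflows to NA
is.na (bad) # TRUE
```

Any value that could exceed that limit would then need to be stored as double, or the matrices chunked so no single count overflows.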
A reprex to illustrate the kinds of analyses currently possible:
library (dplyr)
library (ggplot2)
library (m4ra)
packageVersion ("m4ra")
#> [1] '0.0.1.121'
flist <- gsub ("^m4ra\\-", "", list.files ("~/.cache/R/m4ra/"))
cities <- gsub ("\\-[a-z0-9]{6}\\-(bicycle|foot|motorcar)\\.Rds$", "", flist)
cities <- unique (cities)
cities <- cities [cities != "helsinki"]
bike_car_ratio_one_city <- function (city) {
graph_f <- m4ra_load_cached_network (city, mode = "foot")
graph_f <- graph_f [graph_f$component == 1, ]
v_f <- dodgr::dodgr_vertices (graph_f)
graph_b <- m4ra_load_cached_network (city, mode = "bicycle")
graph_b <- graph_b [graph_b$component == 1, ]
v_b <- dodgr::dodgr_vertices (graph_b)
graph_c <- m4ra_load_cached_network (city, mode = "motorcar")
graph_c <- graph_c [graph_c$component == 1, ]
v_c <- dodgr::dodgr_vertices (graph_c)
# Get vertices common to all networks:
vert_count <- table (c (
unique (graph_f$.vx0),
unique (graph_b$.vx0),
unique (graph_c$.vx0)
))
verts_all <- names (vert_count) [which (vert_count == 3)]
v <- v_f [which (v_f$id %in% verts_all), ]
# Get central vertex:
i <- which.min ((v$x - mean (v$x)) ^ 2 + (v$y - mean (v$y)) ^ 2)
from <- v$id [i]
dat <- m4ra_bike_car_times (city = city, from = from)
areas <- m4ra_bike_car_ratio_areas (dat, ratio_lims = 1:20 / 4)
areas$city <- city
return (areas)
}
result_file <- "bike-car-ratio-results.Rds"
if (!file.exists (result_file)) {
result <- bike_car_ratio_one_city (city = cities [1])
count <- 2L
system.time ({
for (ci in cities [-1]) {
message (ci, " [", count, " / ", length (cities), "]")
count <- count + 1L
result <- rbind (result, bike_car_ratio_one_city (city = ci))
}
})
saveRDS (result, result_file)
}
result <- readRDS ("bike-car-ratio-results.Rds") |>
group_by (city) |>
filter (ratio <= 2) |>
mutate (label = c (rep (NA_character_, length (city) - 1L), city [1]))
ggplot (result, aes (x = ratio, y = area, colour = city)) +
geom_line () +
geom_label (aes (label = label), nudge_x = 0.35, size = 4) +
theme (legend.position = "none")
#> Warning: Removed 279 rows containing missing values (geom_label).
# Then summarise values at unit ratio, and slopes 1 -> 2:
x <- readRDS ("bike-car-ratio-results.Rds") |>
filter (ratio >= 1 & ratio <= 2) |>
transform (ratio = ratio - 1) |>
group_by (city) |>
do (
mod = lm (area ~ ratio, data = .)
) |>
mutate (
a1 = mod$model$area [1],
a2 = tail (mod$model$area, 1L),
intercept = summary (mod)$coefficient [1],
slope = summary (mod)$coefficient [2]
) |>
arrange (by = desc (intercept))
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
print (x, n = 100)
#> # A tibble: 40 × 6
#> # Rowwise:
#> city mod a1 a2 intercept slope
#> <chr> <list> <dbl> <dbl> <dbl> <dbl>
#> 1 hamburg <lm> 185. 1267. 190. 1096.
#> 2 paris <lm> 188. 188. 188. 0
#> 3 muenchen <lm> 91.2 488. 140. 371.
#> 4 aachen <lm> 106. 380. 137. 263.
#> 5 brussels <lm> 119. 123. 121. 2.90
#> 6 duesseldorf <lm> 129. 518. 114. 396.
#> 7 san-francisco <lm> 98.6 253. 103. 155.
#> 8 stuttgart <lm> 63.6 392. 86.3 340.
#> 9 zurich <lm> 45.9 171. 69.1 119.
#> 10 frankfurt <lm> 45.8 391. 64.9 332.
#> 11 dresden <lm> 57.1 506. 55.8 459.
#> 12 luxembourg <lm> 43.2 122. 54.9 77.9
#> 13 copenhagen <lm> 54.3 240. 54.7 205.
#> 14 essen <lm> 48.2 356. 54.3 331.
#> 15 karlsruhe <lm> 37.1 281. 29.7 250.
#> 16 nuernberg <lm> 35.9 297. 28.6 271.
#> 17 liege <lm> 16.7 135. 20.2 125.
#> 18 leipzig <lm> 26.5 461. 15.4 458.
#> 19 bielefeld <lm> 31.2 388. 10.5 369.
#> 20 mannheim <lm> 18.1 194. 9.96 186.
#> 21 leiden <lm> 6.52 41.6 9.79 36.1
#> 22 muenster <lm> 25.4 376. 9.43 343.
#> 23 ghent <lm> 44.3 398. 7.94 346.
#> 24 lausanne <lm> 6.82 79.5 7.46 68.6
#> 25 hannover <lm> 24.6 398. 5.69 370.
#> 26 basel <lm> 28.7 436. 5.60 414.
#> 27 honfluer <lm> 0.0574 27.8 0.893 28.1
#> 28 leuven <lm> 11.4 246. -0.491 244.
#> 29 rastede <lm> 0.282 16.9 -2.55 18.5
#> 30 antwerpen <lm> 1.37 99.1 -5.73 98.2
#> 31 tallinn <lm> 15.4 391. -8.39 412.
#> 32 freiburg <lm> 1.66 109. -9.55 109.
#> 33 bern <lm> 38.0 476. -11.8 439.
#> 34 san-sebastian <lm> 0.231 96.7 -14.5 96.1
#> 35 brugge <lm> 12.3 290. -21.4 265.
#> 36 halle <lm> 3.16 216. -22.3 205.
#> 37 mainz <lm> 21.6 417. -22.8 383.
#> 38 groningen <lm> 15.6 364. -28.9 338.
#> 39 minsk <lm> 6.43 314. -30.7 293.
#> 40 bremen <lm> 0 171. -33.1 164.
Created on 2022-10-21 with reprex v2.0.2
Those results summarise the areas of each city over which cycling from a roughly central point of the city remains faster than driving a car. The points are nevertheless effectively random, and so the comparisons between cities don't really say anything in this case, but the general principle of the analyses remains valid.
The values in the results are:
`a1` = the actual observed area within which cycling takes the same time as driving (ratio = 1)
`a2` = the actual observed area within which cycling takes less than twice as long as driving (ratio = 2)
`intercept` = the intercept of a linear regression fitted to values between ratios of 1 and 2
`slope` = the slope of a linear regression fitted to values between ratios of 1 and 2

Use `fst` to write and read locally cached network files, and `saveRDS`/`readRDS` to write associated attributes and re-attach them on load; see dodgr/save_load_streetnet.R.
The current `m4ra::m4ra_times_multi_mode` function returns the single fastest time in a specified interval. This should be extended to run just the GTFS part, starting just after the fastest service, and returning the time interval between the two as an indicator of service frequency.
Automobile times are needed to convert multi-modal times to ratios relative to equivalent automobile times. Calibration of actual times has been done via https://github.com/UrbanAnalyst/ttcalib, including the potential to easily calibrate times both during and outside peak hours. The only remaining thing is then:
Need a routine to calculate full travel matrix for a given time window.