
m4ra's Issues

Reduce sizes of initial times, transfers, and interval matrices

The multi-modal algorithm currently reads in full GTFS matrices and passes these straight through to the C++ routines. For cities like Paris, with around 30,000 stops, these matrices are huge and potentially too big to hold in memory. Sizes should be reduced to only the subset of GTFS stops lying within the maximum travel duration of all origin points.
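
A minimal sketch of the kind of reduction intended here, assuming a matrix times_to_stops of times from origin points to GTFS stops, and a square gtfs_times matrix between all stops (both names are illustrative, not m4ra objects):

# Keep only GTFS stops reachable within 'max_duration' from at least
# one origin point; all object names here are hypothetical:
index <- which (apply (times_to_stops, 2, min, na.rm = TRUE) <= max_duration)
gtfs_times <- gtfs_times [index, index]
times_to_stops <- times_to_stops [, index]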

rm remotes dodgr from DESC

It's still necessary (21 Sep 2022) because CRAN hasn't yet built binaries of dodgr v0.2.16 for all operating systems. Once they're all there, the remotes dependency can be removed.

Travel modes

Single modes

  • walking
  • cycling
  • non-electric scooters
  • electric scooters
  • motor vehicle (relative travel time estimates still need to be generated; see #3)

Multiple modes in conjunction with public transport (PT)

  • walk -> PT -> walk
  • walk -> PT -> cycle
  • cycle -> PT -> walk
  • walk -> PT -> shared scooter
  • shared scooter -> PT -> walk
  • non-electric scooter -> PT -> non-electric scooter
  • electric scooter -> PT -> electric scooter

Shared electric scooters won't be analysed for now, because the areas in which they are (potentially) available are generally highly regulated, which prevents the construction of any generally realistic models.

Loop over subsets of stops in m4ra_gtfs_traveltimes

For cities like Paris, the matrices are huge, and barely fit into 32 GB of memory. Looping over subsets and saving intermediate results before a final re-assembly step will be more memory-efficient, and will also allow a progress indicator.
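
A schematic sketch of the intended batching, in which gtfs_traveltimes_one_batch() is a hypothetical stand-in for the per-batch calculation, not an actual m4ra function:

batches <- split (stops, ceiling (seq_along (stops) / 1000L))
for (i in seq_along (batches)) {
    res <- gtfs_traveltimes_one_batch (batches [[i]]) # hypothetical helper
    saveRDS (res, sprintf ("tt-batch-%03d.Rds", i))
    message ("batch ", i, " / ", length (batches)) # progress indicator
}
files <- list.files (pattern = "^tt\\-batch\\-", full.names = TRUE)
result <- do.call (rbind, lapply (files, readRDS))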

Parallelise add_net_to_gtfs

That's the final part of the multi-modal routing algorithm, and it really does not scale well to larger networks. It can easily be parallelised.
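
A generic sketch of one way to do that with the base parallel package, where process_one_stop() is a hypothetical stand-in for the per-stop work currently done in the loop:

library (parallel)
res <- mclapply (
    stop_ids, # hypothetical vector of GTFS stop IDs
    process_one_stop, # hypothetical per-stop worker
    mc.cores = max (1L, detectCores () - 1L)
)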

Main travel times algorithm

TODO:

  • Generate matrix of travel times within GTFS feed (#5)
  • Generate matrix of travel times from all network points to GTFS stops
  • Generate matrix of travel times from all GTFS stops to all network points
  • Algorithm to combine these three matrices (sketched below)
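
The combination step amounts to a min-plus ("tropical") matrix product over boarding and alighting stops. A conceptual sketch in pure R with assumed matrix names; the actual implementation would be in C++:

# minimum over k of a [i, k] + b [k, j]:
minplus <- function (a, b) {
    out <- matrix (Inf, nrow = nrow (a), ncol = ncol (b))
    for (k in seq_len (ncol (a))) {
        out <- pmin (out, outer (a [, k], b [k, ], "+"))
    }
    return (out)
}
# t_net_to_stops, t_gtfs, and t_stops_to_net are assumed names for the
# three matrices listed above:
total <- minplus (minplus (t_net_to_stops, t_gtfs), t_stops_to_net)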

Vehicle times

Time penalties for congestion

MapBox Traffic Data offers usable data, but is a commercial (not free) service. The TomTom API, also discussed below, offers a Traffic Density endpoint which could be used. It is asynchronous, and processing requests is non-trivial, so it would be very difficult to use at the scales needed here. Could it nevertheless be used to provide some calibration estimates?

Requirements

  • A way of scaling current estimates of journey times to actual values, ideally based on network centrality estimates.

Time penalties for parking

Vehicle times need to include additional time penalties for parking. This paper from hpi.de (with pdf here and a slightly different version here) describes a probabilistic algorithm for estimating those times. It's based on data obtained from a TomTom API which clearly no longer exists (it was called "on-street-parking").

Their results at the end of the second pdf linked above show some plausible benchmark values: extra driving times of 100-200 seconds, extra walking times of marginally less than that, and overall additional times of around 300 seconds. There are strong variations in both space and time. The underlying algorithm assumes a Pareto II probability density of finding a parking spot per metre of driving, which is itself spatially and temporally variable. The TomTom data were used to calibrate these probability densities in time and space, considering factors including functional road class and average speed of road segments, although they concluded that the length of a road segment was the single determining factor.
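
A minimal sketch of that Pareto II (Lomax) search model, with purely illustrative, uncalibrated shape and scale parameters:

# CDF of the Lomax (Pareto II) distribution: probability of having found
# a parking spot within 'd' metres of searching:
lomax_cdf <- function (d, shape, scale) {
    1 - (1 + d / scale) ^ (-shape)
}
lomax_cdf (500, shape = 2, scale = 300) # chance of a spot within 500 m
# Expected search distance is scale / (shape - 1), finite only for
# shape > 1; converted here to extra driving time at an assumed mean
# search speed of 5 m/s:
extra_time_s <- (300 / (2 - 1)) / 5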

Requirements

  • Estimates of local density of parking spots relative to local population and/or employment densities
  • A heuristic model of daily temporal variability of parking occupancy based on relative densities of population, employment, and other factors
  • A method to convert those into something akin to the probability densities described in the paper linked above, and to use that to aggregate, for each street segment and time period, single metrics of likely parking durations plus the additional durations needed to walk from car to destination points.

Bundle dodgr distance algorithms here

It would be good to expose the main algorithms in inst/include of dodgr and link to them that way, but they'll need quite a bit of custom modification here anyway, notably to work with distance matrices too large to hold in memory, as well as to strip away many of the other options built in there. It'll be much easier here to bundle the algorithms internally, with the (potentially large) downside of having to cross-maintain the two packages.

Return data on number of transfers

In some suitably standardised form, like transfers per 10 km or so. That will then also require either an actual spatial representation of the transport network, or maybe use of walking distances as a proxy? Network distances can and should be extracted from OSM, and not from GTFS files, because the latter will not necessarily have the shape data needed to calculate inter-station distances. This will then require an additional pre-processing routine to calculate a matrix of distances between all GTFS stops, as sketched below.
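
A sketch of that pre-processing routine using dodgr, under the assumption that stops holds GTFS stop coordinates in columns x and y:

graph <- m4ra_load_cached_network (city, mode = "foot")
xy <- stops [, c ("x", "y")]
d_stops <- dodgr::dodgr_dists (graph, from = xy, to = xy)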

Restrict main MM routines to distance threshold

The GTFS part currently calculates matrices from a subset of origins, but to all destinations. The latter could be restricted to a subset defined by the distance threshold. It's hard to know exactly how much that would improve efficiency without doing it, because most of the calculations are simply O(N) matrix manipulations, but it should have some effect.

Estimate quality of bicycle infrastructure

A useful additional measure to accompany results of `m4ra_times_multi_mode()` would be the quality of bicycle infrastructure connecting each origin point to its GTFS departure station. That could easily be done by aggregating initial bicycle distances alongside one additional measure of those distances multiplied by weighting factors. The ratio of the two would then directly reflect the quality of bicycle infrastructure connecting origins to public transport services.
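
A hedged sketch of one way that ratio could be formed with dodgr, assuming that replacing d with d_weighted yields infrastructure-weighted distances (all object names illustrative):

g <- m4ra_load_cached_network (city, mode = "bicycle")
d_plain <- dodgr::dodgr_dists (g, from = origins, to = stops)
g$d <- g$d_weighted # assumption: route on infrastructure-weighted distances
d_wt <- dodgr::dodgr_dists (g, from = origins, to = stops)
quality <- d_wt / d_plain # 1 = ideal infrastructure; higher = worse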

Use arrow to read/write main GTFS matrices

#22 pre-generates a restricted index of GTFS stops prior to loading them. The current code loads full matrices and then immediately reduces their size; this would be a perfect use case for arrow. Loading each batch of vertices is also quite slow, and arrow should considerably improve the efficiency of that.
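
A sketch of the intended workflow, assuming the matrices are stored as parquet files with stop IDs as column names (file and object names illustrative):

library (arrow)
# Write once, with stop IDs as column names:
write_parquet (as.data.frame (times_mat), "gtfs-times.parquet")
# Read back only the pre-generated restricted index of stops (#22):
times_sub <- read_parquet (
    "gtfs-times.parquet",
    col_select = dplyr::all_of (stops_subset)
)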

Fix fst write routines

They currently produce garbage when writing really large files (~10M rows), but this seems to be because of the default compress = 50. Setting that to compress = 0 seems to resolve the issue.
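
The interim workaround as a one-liner (object and file names illustrative):

# default 'compress = 50' corrupts writes of very large (~10M-row) tables:
fst::write_fst (big_df, "big-matrix.fst", compress = 0)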

Change storage mode of Rcpp matrices to <int>

The travel times algorithms use IntegerMatrix:
https://github.com/ATFutures/m4ra/blob/91e503991a336bb654cc3d0be4ea094ca30fe0cd/src/travel-times.h#L44-L48
But the multi-modal versions use NumericMatrix:
https://github.com/ATFutures/m4ra/blob/91e503991a336bb654cc3d0be4ea094ca30fe0cd/src/multi-modal.h#L15-L17

This was done because the latter derive from the original distance algorithms, which use NumericMatrix. This has to be changed, however, as these matrices can be really huge, whereas equivalent <int> values yield much smaller matrices. The only problem is that Rcpp offers no options for long int, as explained in FAQ 3.9 on p8. On my local machine, <int> goes up safely to at least 2 ^ 30, but definitely mis-behaves at 2 ^ 32, which is to be expected, since that exceeds std::numeric_limits <int>::max () (= 2 ^ 31 - 1).

This change should thus be pretty straightforward, but it might be good at some stage to implement a check for behaviour of large int representations?
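
A quick sketch of such a check from the R side, in base R only:

.Machine$integer.max # 2147483647 == 2^31 - 1
as.integer (2^31 - 1) # fine
as.integer (2^31) # NA, with a coercion-to-integer-range warning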

bike car ratios

A reprex to illustrate the kinds of analyses currently possible:

library (dplyr)
library (ggplot2)
library (m4ra)
packageVersion ("m4ra")
#> [1] '0.0.1.121'

flist <- gsub ("^m4ra\\-", "", list.files ("~/.cache/R/m4ra/"))
cities <- gsub ("\\-[a-z0-9]{6}\\-(bicycle|foot|motorcar)\\.Rds$", "", flist)
cities <- unique (cities)
cities <- cities [cities != "helsinki"]

bike_car_ratio_one_city <- function (city) {

    graph_f <- m4ra_load_cached_network (city, mode = "foot")
    graph_f <- graph_f [graph_f$component == 1, ]
    v_f <- dodgr::dodgr_vertices (graph_f)
    graph_b <- m4ra_load_cached_network (city, mode = "bicycle")
    graph_b <- graph_b [graph_b$component == 1, ]
    v_b <- dodgr::dodgr_vertices (graph_b)
    graph_c <- m4ra_load_cached_network (city, mode = "motorcar")
    graph_c <- graph_c [graph_c$component == 1, ]
    v_c <- dodgr::dodgr_vertices (graph_c)

    # Get vertices common to all networks:
    vert_count <- table (c (
        unique (graph_f$.vx0),
        unique (graph_b$.vx0),
        unique (graph_c$.vx0)
    ))
    verts_all <- names (vert_count) [which (vert_count == 3)]
    v <- v_f [which (v_f$id %in% verts_all), ]

    # Get central vertex:
    i <- which.min ((v$x - mean (v$x)) ^ 2 + (v$y - mean (v$y)) ^ 2)
    from <- v$id [i]

    dat <- m4ra_bike_car_times (city = city, from = from)
    areas <- m4ra_bike_car_ratio_areas (dat, ratio_lims = 1:20 / 4)
    areas$city <- city

    return (areas)
}

result_file <- "bike-car-ratio-results.Rds"
if (!file.exists (result_file)) {

    result <- bike_car_ratio_one_city (city = cities [1])
    count <- 2L
    system.time ({
    for (ci in cities [-1]) {
        message (ci, " [", count, " / ", length (cities), "]")
        count <- count + 1L

        result <- rbind (result, bike_car_ratio_one_city (city = ci))
    }
    })
    saveRDS (result, result_file)
}

result <- readRDS ("bike-car-ratio-results.Rds") |>
    group_by (city) |>
    filter (ratio <= 2) |>
    mutate (label = c (rep (NA_character_, length (city) - 1L), city [1]))

ggplot (result, aes (x = ratio, y = area, colour = city)) +
    geom_line () +
    geom_label (aes (label = label), nudge_x = 0.35, size = 4) +
    theme (legend.position = "none")
#> Warning: Removed 279 rows containing missing values (geom_label).

# Then summarise values at unit ratio, and slopes 1 -> 2:

x <- readRDS ("bike-car-ratio-results.Rds") |>
    filter (ratio >= 1 & ratio <= 2) |>
    transform (ratio = ratio - 1) |>
    group_by (city) |>
    do (
        mod = lm (area ~ ratio, data = .)
    ) |>
    mutate (
        a1 = mod$model$area [1],
        a2 = tail (mod$model$area, 1L),
        intercept = summary (mod)$coefficient [1],
        slope = summary (mod)$coefficient [2]
    ) |>
    arrange (by = desc (intercept))
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
print (x, n = 100)
#> # A tibble: 40 × 6
#> # Rowwise: 
#>    city          mod          a1     a2 intercept   slope
#>    <chr>         <list>    <dbl>  <dbl>     <dbl>   <dbl>
#>  1 hamburg       <lm>   185.     1267.    190.    1096.  
#>  2 paris         <lm>   188.      188.    188.       0   
#>  3 muenchen      <lm>    91.2     488.    140.     371.  
#>  4 aachen        <lm>   106.      380.    137.     263.  
#>  5 brussels      <lm>   119.      123.    121.       2.90
#>  6 duesseldorf   <lm>   129.      518.    114.     396.  
#>  7 san-francisco <lm>    98.6     253.    103.     155.  
#>  8 stuttgart     <lm>    63.6     392.     86.3    340.  
#>  9 zurich        <lm>    45.9     171.     69.1    119.  
#> 10 frankfurt     <lm>    45.8     391.     64.9    332.  
#> 11 dresden       <lm>    57.1     506.     55.8    459.  
#> 12 luxembourg    <lm>    43.2     122.     54.9     77.9 
#> 13 copenhagen    <lm>    54.3     240.     54.7    205.  
#> 14 essen         <lm>    48.2     356.     54.3    331.  
#> 15 karlsruhe     <lm>    37.1     281.     29.7    250.  
#> 16 nuernberg     <lm>    35.9     297.     28.6    271.  
#> 17 liege         <lm>    16.7     135.     20.2    125.  
#> 18 leipzig       <lm>    26.5     461.     15.4    458.  
#> 19 bielefeld     <lm>    31.2     388.     10.5    369.  
#> 20 mannheim      <lm>    18.1     194.      9.96   186.  
#> 21 leiden        <lm>     6.52     41.6     9.79    36.1 
#> 22 muenster      <lm>    25.4     376.      9.43   343.  
#> 23 ghent         <lm>    44.3     398.      7.94   346.  
#> 24 lausanne      <lm>     6.82     79.5     7.46    68.6 
#> 25 hannover      <lm>    24.6     398.      5.69   370.  
#> 26 basel         <lm>    28.7     436.      5.60   414.  
#> 27 honfluer      <lm>     0.0574   27.8     0.893   28.1 
#> 28 leuven        <lm>    11.4     246.     -0.491  244.  
#> 29 rastede       <lm>     0.282    16.9    -2.55    18.5 
#> 30 antwerpen     <lm>     1.37     99.1    -5.73    98.2 
#> 31 tallinn       <lm>    15.4     391.     -8.39   412.  
#> 32 freiburg      <lm>     1.66    109.     -9.55   109.  
#> 33 bern          <lm>    38.0     476.    -11.8    439.  
#> 34 san-sebastian <lm>     0.231    96.7   -14.5     96.1 
#> 35 brugge        <lm>    12.3     290.    -21.4    265.  
#> 36 halle         <lm>     3.16    216.    -22.3    205.  
#> 37 mainz         <lm>    21.6     417.    -22.8    383.  
#> 38 groningen     <lm>    15.6     364.    -28.9    338.  
#> 39 minsk         <lm>     6.43    314.    -30.7    293.  
#> 40 bremen        <lm>     0       171.    -33.1    164.

Created on 2022-10-21 with reprex v2.0.2

Those results summarise the areas of each city over which cycling from a roughly central point of the city remains faster than driving a car. The points are nevertheless effectively random, and so the comparisons between cities don't really say anything in this case, but the general principle of the analysis remains valid.

The values in the results are:

  • a1 = The actual observed area within which cycling takes the same time as driving (ratio = 1)
  • a2 = The actual observed area within which cycling takes less than twice as long as driving (ratio = 2)
  • intercept = The intercept of a linear regression fitted to values between ratios of 1 and 2
  • slope = The slope of a linear regression fitted to values between ratios of 1 and 2

Estimate frequency of fastest connections

The current `m4ra::m4ra_times_multi_mode` function returns the single fastest time within a specified interval. This should be extended to re-run just the GTFS part, starting immediately after the fastest service, and to return the time interval between the two as an indicator of service frequency.

Automobile times

Automobile times are needed to convert multi-modal times to ratios relative to equivalent automobile times. Calibration of actual times has been done via https://github.com/UrbanAnalyst/ttcalib, including the potential to easily calibrate times both during and outside peak hours. The only remaining task is then:

  • Implement time penalty for car parking, including metrics of local densities of car parks.

GTFS matrix

Need a routine to calculate the full travel time matrix between all stops for a given time window.
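
One possible building block is the gtfsrouter package: its gtfs_traveltimes() function returns fastest times from one stop to all others within a start-time window, so looping over all stops would yield the full matrix. A minimal sketch, with an illustrative file name and time window:

library (gtfsrouter)
gtfs <- extract_gtfs ("city-gtfs.zip")
gtfs <- gtfs_timetable (gtfs, day = "Monday")
tt <- lapply (gtfs$stops$stop_id, function (s) {
    gtfs_traveltimes (
        gtfs,
        from = s,
        from_is_id = TRUE,
        start_time_limits = c (8, 9) * 3600 # 08:00-09:00
    )
})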
