urbananalyst / m4ra

many-to-many multi-modal routing aggregator

Home Page: https://urbananalyst.github.io/m4ra/
The multi-modal algorithm currently reads in full GTFS matrices and passes these straight through to the C++ routines. For cities like Paris with 30,000 stops, these matrices are huge and potentially too big to hold in memory. Sizes should be reduced to only the subset of GTFS stops that lie within the maximum travel duration of all origin points.
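A reduction along those lines might look like the following sketch, where `access_times`, `gtfs_times`, and `max_duration` are all hypothetical stand-ins for m4ra's internal objects, filled here with dummy data:

```r
set.seed (1)
n_origins <- 5L
n_stops <- 1000L
max_duration <- 3600 # seconds

# Hypothetical stand-ins: initial access times from each origin to each stop,
# and the full stop-to-stop GTFS travel time matrix:
access_times <- matrix (runif (n_origins * n_stops, 0, 7200), nrow = n_origins)
gtfs_times <- matrix (runif (n_stops ^ 2, 0, 7200), nrow = n_stops)

# Keep only stops reachable from at least one origin within max_duration:
reachable <- apply (access_times, 2, min) <= max_duration
gtfs_sub <- gtfs_times [reachable, reachable]
```

The reduced `gtfs_sub` could then be passed to the C++ routines in place of the full matrix; the actual retained fraction will depend on the spread of access times in real data.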
It's still necessary (21 Sep 2022) because CRAN hasn't yet built binaries of dodgr v0.2.16 for all operating systems. Once they're all there, the remotes dependency can be removed.
Shared electric scooters won't be analysed for now, because the areas in which they are (potentially) available are generally highly regulated, preventing the construction of any generally realistic models.
Because for cities like Paris, the matrices are huge, and barely fit into 32 GB of memory. Looping over subsets and saving intermediate results before a final re-assembly step will be more memory-efficient, and will also allow a progress indicator.
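A sketch of that batched pattern, with `process_batch` as a hypothetical stand-in for the actual per-subset routing calculation:

```r
# Hypothetical per-batch worker standing in for the real calculation:
process_batch <- function (b) data.frame (origin = b, value = sqrt (b))

origins <- seq_len (1000)
batches <- split (origins, ceiling (seq_along (origins) / 100))
files <- file.path (tempdir (), paste0 ("m4ra-batch-", seq_along (batches), ".Rds"))

for (i in seq_along (batches)) {
    if (file.exists (files [i])) next # resume after interruption
    saveRDS (process_batch (batches [[i]]), files [i]) # intermediate result
    message ("batch ", i, " / ", length (batches)) # progress indicator
}

# Final re-assembly step:
result <- do.call (rbind, lapply (files, readRDS))
```

Caching each batch to disk also makes interrupted runs resumable, since completed batches are skipped on restart.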
That's the final part of the multi-modal routing algorithm, and really does not scale well with larger networks. It can easily be parallelised.
The multi-modal function #8 now works, as do the parking penalties for automobile travel #9. These can now be combined, following https://github.com/ATFutures/m4ra/blob/main/R/times-bike-car.R, to write an equivalent function that generates ratios of multi-modal travel times to equivalent automobile times.
MapBox Traffic Data has usable data, but is a commercial (not free) service. The TomTom API, also discussed below, offers a Traffic Density endpoint which could be used. It is asynchronous, and processing requests is non-trivial, so it would be very difficult to use at the scales needed here. It could nevertheless be used to provide some calibration estimates.
Vehicle times need to include additional time penalties for parking. This paper from hpi.de (with pdf here and a slightly different version here) describes a probabilistic algorithm for estimating times. It's based on data obtained from a TomTom API which clearly no longer exists (it was called "on-street-parking").
Their results at the end of the second pdf linked to above show some plausible benchmark values of extra driving time of 100-200 seconds, extra walking time of marginally less than that, and overall additional times of around 300 seconds. There are strong variations in both space and time. The underlying algorithm assumes a Pareto II probability density of finding a parking spot per metre of driving, which is itself spatially and temporally variable. The TomTom data were used to calibrate these probability densities in time and space, considering factors including functional road class and average speed of road segments, although they concluded that the length of a road segment was the single determining factor.
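The Pareto II (Lomax) assumption can be sketched directly; the shape and scale values below are purely illustrative, not calibrated values from the paper:

```r
# Probability of NOT yet having found a parking spot after searching for
# d metres, under a Pareto II (Lomax) model with survival function
# S(d) = (1 + d / scale) ^ (-shape):
p_still_searching <- function (d, shape = 2, scale = 100) {
    (1 + d / scale) ^ (-shape)
}

p_still_searching (0)   # 1: searching has not yet started
p_still_searching (500) # (1 + 5) ^ -2 = 1/36
```

Calibration then amounts to letting `shape` and `scale` vary over space and time of day, as the TomTom-derived densities described above do.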
Trying to run m4ra_gtfs_traveltimes on Windows fails because num_cores is set to the number of cores on the computer, which is larger than 1. num_cores is fed into the mc.cores argument of mclapply, without any possibility of changing the parameter. Is m4ra not suitable for Windows?
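One possible fix (a sketch, not current m4ra code) would be to cap the core count on Windows, since `parallel::mclapply` only forks on Unix-alikes:

```r
num_cores <- parallel::detectCores ()
if (.Platform$OS.type == "windows") {
    num_cores <- 1L # mclapply only supports mc.cores = 1 on Windows
}
res <- parallel::mclapply (1:4, function (i) i ^ 2, mc.cores = num_cores)
unlist (res) # 1 4 9 16
```

An alternative would be to switch to `parallel::parLapply` with a PSOCK cluster, which works on all platforms at the cost of extra set-up overhead.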
It would be good to expose the main algorithms in inst/include
of dodgr and link them that way, but they'll need quite a bit of custom modification here anyway, notably including modifying to work with distance matrices too large to hold in memory, as well as stripping away many of the other options built in there. It'll be much easier here to bundle the algorithms internally, with the (potentially large) downside of having to cross-maintain these two packages.
In some suitably standardised form, like transfers per 10km or so. That will then also require either an actual spatial representation of the transport network, or maybe use of walking distances as a proxy? Network distances can and should be extracted from OSM, and not from GTFS files, because the latter will not necessarily have shape data needed to calculate inter-station distances. This will then require an additional pre-processing routine, to calculate a matrix of distances between all GTFS stops.
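A sketch of that pre-processing routine using dodgr's bundled `hampi` sample network; in practice the graph would come from OSM data for the city in question, and the coordinates from the GTFS `stops` table:

```r
library (dodgr)

graph <- weight_streetnet (hampi, wt_profile = "foot")

# Stand-in for GTFS stop coordinates (stop_lon / stop_lat):
stops <- dodgr_vertices (graph) [1:10, c ("x", "y")]

# Matrix of network distances between all stop pairs:
d <- dodgr_dists (graph, from = stops, to = stops)
dim (d) # 10 x 10
```

`dodgr_dists()` matches arbitrary coordinates to nearest network vertices, so the GTFS stops need not coincide with OSM nodes.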
The GTFS part currently calculates matrices from a subset of origins, but to all destinations. The latter could be restricted to a subset defined by the distance threshold. Without actually doing it, it's hard to know exactly how much that would improve efficiency, because most of the calculations are simply O(N) matrix manipulations, but it should have some effect.
Change github org
A useful additional measure to accompany results of `m4ra_times_multi_mode()` would be the quality of bicycle infrastructure connecting each origin point to the GTFS departure station. That could easily be done by aggregating initial bicycle distances alongside one additional measure of those distances multiplied by time-weighting factors. The ratio of the two would then directly reflect the quality of bicycle infrastructure connecting origins to public transport services.
Because #22 pre-generates a restricted index of all GTFS stops prior to loading them. The current code loads full matrices and then reduces their size straight away, but this would be a perfect use case for arrow. The loading of each batch of vertices is also quite slow, and arrow should considerably improve the efficiency of that.
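A sketch of what an arrow-based load might look like, with hypothetical file and column names; `index` stands for the restricted index pre-generated by #22:

```r
library (arrow)
library (dplyr)

# Hypothetical cached travel-time data in long form:
f <- file.path (tempdir (), "gtfs-times.parquet")
write_parquet (data.frame (stop_id = 1:1000, time = runif (1000)), f)

index <- 1:50 # restricted index of GTFS stops

# The filter is pushed down to arrow, so only the required rows are
# materialised in memory rather than the full matrix:
sub <- open_dataset (f) |>
    filter (stop_id %in% index) |>
    collect ()
nrow (sub) # 50
```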
They currently produce garbage when writing really large files (~10M rows), but this seems to be because of the default `compress = 50`. Setting that to `compress = 0` seems to resolve the issue.
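A minimal sketch of the workaround, with a small dummy data.frame standing in for the cached matrices:

```r
library (fst)

f <- file.path (tempdir (), "cache.fst")
df <- data.frame (x = runif (1e5), y = runif (1e5))

write_fst (df, f, compress = 0) # default is compress = 50
df2 <- read_fst (f)
all.equal (df, df2) # TRUE
```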
The travel times algorithms use `IntegerMatrix`:
https://github.com/ATFutures/m4ra/blob/91e503991a336bb654cc3d0be4ea094ca30fe0cd/src/travel-times.h#L44-L48
But the multi-modal versions use `NumericMatrix`:
https://github.com/ATFutures/m4ra/blob/91e503991a336bb654cc3d0be4ea094ca30fe0cd/src/multi-modal.h#L15-L17
This was done because the latter derive from the original distance algorithms, which use `NumericMatrix`. This has to be changed, however, as these matrices can be really huge, whereas equivalent `<int>` values yield much smaller matrices. The only problem is that Rcpp offers no option for `long int`, as explained in FAQ 3.9 on p8. On my local machine, `<int>` goes up safely to at least 2^30, but definitely misbehaves at 2^32, which is to be expected, as that exceeds `std::numeric_limits<int>::max()` (2^31 - 1).
This change should thus be pretty straightforward, but it might be good at some stage to implement a check on the behaviour of large int representations.
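On the R side, such a check might look like this sketch, using base R's integer limits:

```r
# R's integers are 32-bit, so .Machine$integer.max is 2^31 - 1:
stopifnot (.Machine$integer.max == 2 ^ 31 - 1)

ok <- as.integer (2 ^ 31 - 1) # largest representable value
bad <- suppressWarnings (as.integer (2 ^ 31)) # overflows to NA
is.na (bad) # TRUE
```

Any value that could exceed that limit would then need to be stored as double, or the matrices chunked so no single count overflows.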
A reprex to illustrate the kinds of analyses currently possible:
library (dplyr)
library (ggplot2)
library (m4ra)
packageVersion ("m4ra")
#> [1] '0.0.1.121'
flist <- gsub ("^m4ra\\-", "", list.files ("~/.cache/R/m4ra/"))
cities <- gsub ("\\-[a-z0-9]{6}\\-(bicycle|foot|motorcar)\\.Rds$", "", flist)
cities <- unique (cities)
cities <- cities [cities != "helsinki"]
bike_car_ratio_one_city <- function (city) {
graph_f <- m4ra_load_cached_network (city, mode = "foot")
graph_f <- graph_f [graph_f$component == 1, ]
v_f <- dodgr::dodgr_vertices (graph_f)
graph_b <- m4ra_load_cached_network (city, mode = "bicycle")
graph_b <- graph_b [graph_b$component == 1, ]
v_b <- dodgr::dodgr_vertices (graph_b)
graph_c <- m4ra_load_cached_network (city, mode = "motorcar")
graph_c <- graph_c [graph_c$component == 1, ]
v_c <- dodgr::dodgr_vertices (graph_c)
# Get vertices common to all networks:
vert_count <- table (c (
unique (graph_f$.vx0),
unique (graph_b$.vx0),
unique (graph_c$.vx0)
))
verts_all <- names (vert_count) [which (vert_count == 3)]
v <- v_f [which (v_f$id %in% verts_all), ]
# Get central vertex:
i <- which.min ((v$x - mean (v$x)) ^ 2 + (v$y - mean (v$y)) ^ 2)
from <- v$id [i]
dat <- m4ra_bike_car_times (city = city, from = from)
areas <- m4ra_bike_car_ratio_areas (dat, ratio_lims = 1:20 / 4)
areas$city <- city
return (areas)
}
result_file <- "bike-car-ratio-results.Rds"
if (!file.exists (result_file)) {
result <- bike_car_ratio_one_city (city = cities [1])
count <- 2L
system.time ({
for (ci in cities [-1]) {
message (ci, " [", count, " / ", length (cities), "]")
count <- count + 1L
result <- rbind (result, bike_car_ratio_one_city (city = ci))
}
})
saveRDS (result, result_file)
}
result <- readRDS ("bike-car-ratio-results.Rds") |>
group_by (city) |>
filter (ratio <= 2) |>
mutate (label = c (rep (NA_character_, length (city) - 1L), city [1]))
ggplot (result, aes (x = ratio, y = area, colour = city)) +
geom_line () +
geom_label (aes (label = label), nudge_x = 0.35, size = 4) +
theme (legend.position = "none")
#> Warning: Removed 279 rows containing missing values (geom_label).
# Then summarise values at unit ratio, and slopes 1 -> 2:
x <- readRDS ("bike-car-ratio-results.Rds") |>
filter (ratio >= 1 & ratio <= 2) |>
transform (ratio = ratio - 1) |>
group_by (city) |>
do (
mod = lm (area ~ ratio, data = .)
) |>
mutate (
a1 = mod$model$area [1],
a2 = tail (mod$model$area, 1L),
intercept = summary (mod)$coefficient [1],
slope = summary (mod)$coefficient [2]
) |>
arrange (by = desc (intercept))
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
print (x, n = 100)
#> # A tibble: 40 × 6
#> # Rowwise:
#> city mod a1 a2 intercept slope
#> <chr> <list> <dbl> <dbl> <dbl> <dbl>
#> 1 hamburg <lm> 185. 1267. 190. 1096.
#> 2 paris <lm> 188. 188. 188. 0
#> 3 muenchen <lm> 91.2 488. 140. 371.
#> 4 aachen <lm> 106. 380. 137. 263.
#> 5 brussels <lm> 119. 123. 121. 2.90
#> 6 duesseldorf <lm> 129. 518. 114. 396.
#> 7 san-francisco <lm> 98.6 253. 103. 155.
#> 8 stuttgart <lm> 63.6 392. 86.3 340.
#> 9 zurich <lm> 45.9 171. 69.1 119.
#> 10 frankfurt <lm> 45.8 391. 64.9 332.
#> 11 dresden <lm> 57.1 506. 55.8 459.
#> 12 luxembourg <lm> 43.2 122. 54.9 77.9
#> 13 copenhagen <lm> 54.3 240. 54.7 205.
#> 14 essen <lm> 48.2 356. 54.3 331.
#> 15 karlsruhe <lm> 37.1 281. 29.7 250.
#> 16 nuernberg <lm> 35.9 297. 28.6 271.
#> 17 liege <lm> 16.7 135. 20.2 125.
#> 18 leipzig <lm> 26.5 461. 15.4 458.
#> 19 bielefeld <lm> 31.2 388. 10.5 369.
#> 20 mannheim <lm> 18.1 194. 9.96 186.
#> 21 leiden <lm> 6.52 41.6 9.79 36.1
#> 22 muenster <lm> 25.4 376. 9.43 343.
#> 23 ghent <lm> 44.3 398. 7.94 346.
#> 24 lausanne <lm> 6.82 79.5 7.46 68.6
#> 25 hannover <lm> 24.6 398. 5.69 370.
#> 26 basel <lm> 28.7 436. 5.60 414.
#> 27 honfluer <lm> 0.0574 27.8 0.893 28.1
#> 28 leuven <lm> 11.4 246. -0.491 244.
#> 29 rastede <lm> 0.282 16.9 -2.55 18.5
#> 30 antwerpen <lm> 1.37 99.1 -5.73 98.2
#> 31 tallinn <lm> 15.4 391. -8.39 412.
#> 32 freiburg <lm> 1.66 109. -9.55 109.
#> 33 bern <lm> 38.0 476. -11.8 439.
#> 34 san-sebastian <lm> 0.231 96.7 -14.5 96.1
#> 35 brugge <lm> 12.3 290. -21.4 265.
#> 36 halle <lm> 3.16 216. -22.3 205.
#> 37 mainz <lm> 21.6 417. -22.8 383.
#> 38 groningen <lm> 15.6 364. -28.9 338.
#> 39 minsk <lm> 6.43 314. -30.7 293.
#> 40 bremen <lm> 0 171. -33.1 164.
Created on 2022-10-21 with reprex v2.0.2
Those results summarise the areas of each city over which cycling from a roughly central point of the city remains faster than driving a car. The points are nevertheless effectively random, and so the comparisons between cities don't really say anything in this case, but the general principle of the analyses remains valid.
The values in the results are:
`a1` = the actual observed area within which cycling takes the same time as driving (ratio = 1)
`a2` = the actual observed area within which cycling takes less than twice as long as driving (ratio = 2)
`intercept` = the intercept of a linear regression fitted to values between ratios of 1 and 2
`slope` = the slope of a linear regression fitted to values between ratios of 1 and 2

Use `fst` to write and read locally cached network files, and `saveRDS`/`readRDS` to write associated attributes and re-attach them on load; see dodgr/save_load_streetnet.R.
The current `m4ra::m4ra_times_multi_mode` function returns the single fastest time in a specified interval. This should be extended to run just the GTFS part, starting just after the fastest service, and returning the time interval between the two as an indicator of service frequency.
Automobile times are needed to convert multi-modal times to ratios relative to equivalent automobile times. Calibration of actual times has been done via https://github.com/UrbanAnalyst/ttcalib, including the potential to easily calibrate times both during and outside peak hours. The only remaining thing is then:
Need a routine to calculate full travel matrix for a given time window.