easbar / fast_paths Goto Github PK

Fast shortest path calculations for Rust

License: Apache License 2.0

Rust 100.00%

shortest-paths routing-algorithm road-network traffic-simulation dijkstra-algorithm contraction-hierarchies

fast_paths's Introduction

Fast Paths

The most famous algorithms used to calculate shortest paths are probably Dijkstra's algorithm and A*. However, shortest path calculation can be done much faster by preprocessing the graph.

Fast Paths uses Contraction Hierarchies, one of the best known speed-up techniques for shortest path calculation. It is especially suited to calculate shortest paths in road networks, but can be used for any directed graph with positive, non-zero edge weights.

Installation

In Cargo.toml

[dependencies]
fast_paths = "0.2.0"

Basic usage

// begin with an empty graph
let mut input_graph = InputGraph::new();

// add an edge between nodes with ID 0 and 6, the weight of the edge is 12.
// Note that the node IDs should be consecutive, if your graph has N nodes use 0...N-1 as node IDs,
// otherwise performance will degrade.
input_graph.add_edge(0, 6, 12);
// ... add many more edges here

// freeze the graph before using it (you cannot add more edges afterwards, unless you call thaw() first)
input_graph.freeze();

// prepare the graph for fast shortest path calculations. note that you have to do this again if you want to change the
// graph topology or any of the edge weights
let fast_graph = fast_paths::prepare(&input_graph);

// calculate the shortest path between nodes with ID 8 and 6 
let shortest_path = fast_paths::calc_path(&fast_graph, 8, 6);

match shortest_path {
    Some(p) => {
        // the weight of the shortest path
        let weight = p.get_weight();
        
        // all nodes of the shortest path (including source and target)
        let nodes = p.get_nodes();
    },
    None => {
        // no path has been found (nodes are not connected in this graph)
    }
}

Batch-wise shortest path calculation

For batch-wise calculation of shortest paths the method described above is inefficient. You should keep the PathCalculator object to execute multiple queries instead:

// ... see above
// create a path calculator (note: not thread-safe, use a separate object per thread)
let mut path_calculator = fast_paths::create_calculator(&fast_graph);
let shortest_path = path_calculator.calc_path(&fast_graph, 8, 6);

Calculating paths between multiple sources and targets

We can also efficiently calculate the shortest path when we want to consider multiple sources or targets:

// ... see above
// we want to either start at node 2 or 3 both of which carry a different initial weight
let sources = vec![(3, 5), (2, 7)];
// ... and go to either node 6 or 8 which also both carry a cost upon arrival
let targets = vec![(6, 2), (8, 10)];
// calculate the path with minimum cost that connects any of the sources with any of the targets while taking into 
// account the initial weights of each source and node
let shortest_path = path_calculator.calc_path_multiple_sources_and_targets(&fast_graph, sources, targets);

Serializing the prepared graph

FastGraph implements standard Serde serialization.

To be able to use the graph in a 32bit WebAssembly environment, it needs to be transformed to a 32bit representation when preparing it on a 64bit system. This can be achieved with the following two methods, but it will only work for graphs that do not exceed the 32bit limit, i.e. the number of nodes and edges and all weights must be below 2^32.

use fast_paths::{deserialize_32, serialize_32, FastGraph};

#[derive(Serialize, Deserialize)]
struct YourData {
    #[serde(serialize_with = "serialize_32", deserialize_with = "deserialize_32")]
    graph: FastGraph,
    // the rest of your struct
}

Preparing the graph after changes

The graph preparation can be done much faster using a fixed node ordering, which is just a permutation of node ids. This can be done like this:

let fast_graph = fast_paths::prepare(&input_graph);
let node_ordering = fast_graph.get_node_ordering();

let another_fast_graph = fast_paths::prepare_with_order(&another_input_graph, &node_ordering);

For this to work another_input_graph must have the same number of nodes as input_graph, otherwise prepare_with_order will return an error. Also performance will only be acceptable if input_graph and another_input_graph are similar to each other, say you only changed a few edge weights.

Benchmarks

FastPaths was run on a single core on a consumer-grade laptop using the road networks provided for the DIMACS implementation challenge graphs. The following graphs were used for the benchmark:

area	number of nodes	number of edges
New York	264.347	730.100
California&Nevada	1.890.816	4.630.444
USA	23.947.348	57.708.624

graph	metric	preparation time	average query time	out edges	in edges
NY city	distance	9 s	55 μs	747.555	747.559
CAL&NV	distance	36 s	103 μs	4.147.109	4.147.183
USA	distance	10.6 min	630 μs	52.617.216	52.617.642
NY city	time	6 s	26 μs	706.053	706.084
CAL&NV	time	24 s	60 μs	3.975.276	3.975.627
USA	time	5.5 min	305 μs	49.277.058	49.283.162

The shortest path calculation time was averaged over 100k random routing queries. The benchmarks were run on a Macbook Pro M1 Max using Rust 1.74.1. The code for running these benchmarks can be found on the benchmarks branch.

There are also some benchmarks using smaller maps included in the test suite. You can run them like this:

export RUST_TEST_THREADS=1; cargo test --release -- --ignored --nocapture

Graph limitations

loop-edges (from node A to node A) will be ignored, because since we are only considering positive non-zero edge-weights they cannot be part of a shortest path
in case the graph has duplicate edges (multiple edges from node A to node B) only the edge with the lowest weight will be considered

Special Thanks

Thanks to Dustin Carlino from A/B Street!

License

This project is licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in fast_paths by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

fast_paths's People

Contributors

Stargazers

Watchers

fast_paths's Issues

Copy and Clone Traits

Just wondering why the structs like FastGraph don't derive Clone?
It's blocking structs that own one from deriving those traits.

Improve preparation time

Some of the ideas taken from this discussion: Stunkymonkey/osm_ch#1

cancel witness searches after a certain amount of nodes have been explored -> see #37
use contracted_neighbors heuristic
use flat array for preparation graph
parallelization Stunkymonkey/osm_ch#1 (comment)
contract only a certain fraction of nodes
find out why abstreet graph preparation is slow compared to e.g. plain OSM maps Stunkymonkey/osm_ch#1 (comment) -> see #37
clean up query code? Stunkymonkey/osm_ch#1 (comment)

format of datasets such as New York

In a table, it shows the consumed time, such as New York.
But where is the dataset? is the format shapefile?
could you pls provide more details?

Document how to run the benchmarks

Hello!
Could you please share how you ran the benchmarks listed in the README?
The only tests I can see are in the lib.rs, and running cargo test after removing the #[ignore] still compiles the code in unoptimized mode. 'grep'-ing for '#[bench]' also shows nothing.

Why does the preparation time vary so much for different maps?

The preparation time varies drastically depending on the graph we use. Take for example these graphs (included in this repository), that are similar in size:

name	nodes	edges	edges/nodes	preparation time (ms)	fast_graph edges	fast_graph edges/node	prep per node (μs)	query time (μs)
Bremen dist	40_461	85_090	2.10	429	66_520	1.64	10	18
Bremen time	40_461	85_090	2.10	6_868	62_518	1.54	169	13
Bremen uniform*	40_461	85_090	2.10	307	65_669	1.62	8	15
South Seattle	29_763	49_604	1.67	16_750	64_029	2.15	560	32

The measurements shown in the table are just the result of running the tests in lib.rs on my laptop.

*Update: For the Bremen uniform graph I used the Bremen time graph, but set all edge weights to a constant value of 10.

I'd really like to find out where this difference comes from and it would be especially useful to speed up the preparation of the abstreet graphs (South Seattle here) somehow. Also related: #33.

bidijkstra stall-on-demand

I am not sure if I understand stall-on-demand fully.

this one will improve query time for sure 😉
i have seen @easbar implemented it for graphhopper, so i guess you understood everything.

please check my understanding:
the blocking happens before the neighboring nodes are added to the heap. Because adding nodes to the heap is the expensive part.

so we iterate over incoming edges with its neighbors and check if a node with a higher level is already settled. if so we end here.?
if not we iterate as usual over outgoing edges with its neighbors and add them to the heap.

do we have to store something when a node is blocked?

Possible perf regression with Rust 1.45

I just switched to a new Rust and building the CH is suddenly stalling out. I'll dig in soon and see what's going on.

Add a way to get all nodes in a radius

My main use of this algorithm is in pandana ware the common need is to for each knode, calculate summary statistics about all the nodes that are within a thresholded.

CH improve amount of created shortcuts

as seen here, the amount of shortcuts is not optimal.
The current approach as described in the original publication is creating shortcuts, if the contracting node is not part of the shortest path from previous neighbor to the next neighbor. As seen here this can lead to non perfect amount of shortcuts.
A "newer" approach is, to have a distinct set of previous neighbors and another distinct set with following neighbors. Then use Dijkstra and calculate the travel-costs. If the cost is equal to the cost from previous to contracting node to the following neighbor then the path from neighbor to neighbor is the optimal path. Therefore a new shortcut has to be created. Otherwise another better path is available and will create a shortcut in the future.
The path itself does not have to be fully resolved. Only the costs are important. Additional Improvement

i already did an implementation in rust, but with a different data-structure: link

maybe I find some time and could try implement it for your model. Is this wanted? Or do your prefer doing this yourself?

Possibly wrong path?

Hey, i am using this lib for the current advent of code.
And I got the wrong result for my input. The test input is working and everything else also. Maybe you can verify that the code is working correctly from your point of view?

I got the result of 3019 but it should be 3012. I rerun the code multiple times and also verified, that the enlarged matrix is correct (it is :/). So I cannot find any issues in my code. My next thought is, that your lib maybe has an issue? :D

Maybe you want/can take a look :)

You can find the code here:
https://github.com/auryn31/aoc_2021/blob/main/day_15/src/main.rs

But there is a solid chance my code is wrong :D

Best

Huge performance difference in terms of running time

I use the USA-road-d.COL road network, which contains 435666 nodes and 1042400 edges.

Generally, it costs more than 61300 micros, and it is much slower than you reported benchmark.

Data

I download the dataset from dataset and then clean it (remove duplicated edges; remove loop...).

The cleaned data can be download from google drive

USA-road-d.COL 435666 1042400
0 1 14567
0 46 7059
1 0 14567
1 30 16448
...

Experiment hardware

I run the experiments on a Windows desktop with i7 8700 CPU, 16 GB memory.

Code of running queries of CH

The saved CH file is 80.1MB.

    let fast_graph = fast_paths::load_from_disk("fast_graph_col.fp").unwrap();
    let start = Instant::now();
    let shortest_path = fast_paths::calc_path(&fast_graph, s, t);
    let duration = start.elapsed();
    println!("Time elapsed in ch_run() is: {:?}", duration.as_micros());

The average query time to answer the shortest from 0 to 1001 is 61303 micros.

ShortestPath PartialEq ignores nodes

See discussion here:

20ba810#r53021469

usize does not work well for de/serialization.

WASM is 32bit always (afaik).

So if you serialize a FastGraph on a 64bit platform the resulting bytes will not be deserializable in a WASM binary.

Is there a real need for usize or would you be amenable to a more concrete type? Or do you know of some way to tell serde that usize should read 8 bytes (u64) when it's decoding into a u32?

Extremely high memory usage when generating fast graph

I am facing extremely high memory usage when generating a fast graph(>25GB)

I have 873k nodes and 900k edges. My code looks like this:

	let mut input_graph = InputGraph::new();
	for e in edges {
		input_graph.add_edge(e.start_node, e.end_node, e.weight);
	}
	println!("Freezing graph...");
	input_graph.freeze();
	println!("Calculating optimized graph...");
	let fast_graph = fast_paths::prepare(&input_graph);
	println!("Done.\r\nSaving graph...\t");
	fast_paths::save_to_disk(&fast_graph, format!("{}.ff", filename).as_ref());
	println!("Done.");

It starts swallowing memory at let fast_graph = fast_paths::prepare(&input_graph);. I tried creating a new params with a ratio of 0.01 and 10.0, which didn't seem to help. (I'm not entirely sure what this value does, which is why I tried a small and large value)

I looked through the benchmark code but it doesn't look like I'm doing anything fundamentally different.

Why not more object oriented?

I noticed a lot of methods are used like this:

fast_paths::a_method(&graph, a, b, ...)

Whereas a more OO method call would look like this:

graph.a_method(a, b, ...)

Is there any reason the first is used so often? I would like to send in a PR to move over to the second way, but want to make sure it would be accepted.

Get rid of compiler warnings due to unused code in floyd_warshall.rs

The code in floyd_warshall.rs is only used in tests at the moment.

The dijkstra's implementation should be bidirectional

There are still some perf gains to be had if we switch to a bidirectional dijkstra's implementation for the query phase.
(See https://arxiv.org/pdf/1504.05140.pdf , pages 22-23)

Rust 1.51.0 warning: panic message is not a string literal

Using Rust 1.51.0 I get:

warning: panic message is not a string literal
   --> src/preparation_graph.rs:150:13
    |
150 | /             format!(
151 | |                 "invalid node id {}, must be in [0, {}[",
152 | |                 node, self.num_nodes
153 | |             )
    | |_____________^
    |
    = note: `#[warn(non_fmt_panic)]` on by default
    = note: this is no longer accepted in Rust 2021
    = note: this warning originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

Dual licensing MIT/Apache 2.0

Hi there, could you dual license under MIT or Apache 2.0? This is the standard license used in the Rust community.

What would it take to support tiled routing graphs?

Hey! I'm the maintainer of Headway, a self-hostable maps stack. I'm thinking about taking the project in a direction that would allow instances to link together in a federated network that could grow to cover the planet. Doing so presents a number of challenges though. From the routing side of things, an end-user can't simply send a routing query including route endpoints to an untrusted server, because of the privacy implications. Nor can it download a routing graph for the entire planet. Valhalla handles this by dividing up the routing graph into tiles. Do you know if an approach like that could work with fast_paths? If so, what would need to change?

Compiling Valhalla to webassembly seems extremely difficult, and I'd like to evaluate building turn-by-turn on top of fast_paths instead. I'm not sure which would end up being more difficult since both of these ideas sound ridiculously complex, but this option involves less CMake at least and I want to explore it a bit.