pnevyk / gryf Goto Github PK

View Code? Open in Web Editor NEW

69.0 69.0 1.0 430 KB

Graph data structure library aspiring to be convenient, versatile, correct and performant.

License: MIT License

Rust 100.00%

algorithms data-structures graph rust

gryf's People

Contributors

Stargazers

Watchers

Forkers

linecode

gryf's Issues

Use BFS for shortest paths if unit weight is used

In shortest paths algorithm, detect if unit weight (i.e., all edges have weight equal to 1) is used and use simple BFS algorithm if so. Unit edge weights are represented by Unit type. For unit weights the overhead of Dijkstra's algorithm is unnecessary and thus using a straightforward BFS implementation will be more efficient.

Somewhat tricky part is how to deduce that the edge weight getter represents unit weight. We can't use std::any API, because we don't want to constrain the edge weight getter to be 'static. One possible way that I can think of now is to add a new static method to GetEdgeWeight trait called is_unit() -> bool with the default implementation returning false. The Unit type would then be the only type that would return true. This is not perfect as it can't recognize user functions that also represent a unit weight, but using such functions should be discouraged.

Do not expose BFS as a new algorithm option in Algo enum, as this should be solely an implementation detail, which covers only a niche use case.

Rename `ShortestPaths` to `SingleSourceShortestPaths`

This will follow the convention in literature and will make it easier to distinguish between single-source shortest paths and all-pairs shortest paths that we will implement in the future. The naming is present also in the Wikipedia article.

Graph operations

There should be a module which provides various standard graph operations like edge contraction, graph complement, intersection, union, etc. These don't require any special support from the underlying storage and thus can be generic. There should be probably also standard traits which would be implemented by these operations (similar to std::ops).

Things to consider:

unary vs binary
self-consuming vs in-place

Validate applicability of gryf on top reverse dependencies of petgraph

Pick first k reverse dependencies of petgraph crate and try to use gryf instead of petgraph in them. Find blockers or papercuts in gryf in this process. Report them here. This should validate applicability of gryf on real world projects. Obviously we lack many standard algorithms, but the core graph API should be applicable enough.

Consider changing `add_edge(src, dst, e)` to `add_edge(e: impl IntoEdge)`

Advantages:

Cleaner (?) API for unweighted graphs: graph.add_edge((src, dst)) instead of graph.add_edge(src, dst, ()).

Disadvantages:

Worse documentation? add_edge(&mut self, src: VertexIndex, dst: VertexIndex, edge: E) is much clearer than add_edge<T: IntoEdge<VertexIndex, E, Ty>>(&mut self, edge: T).
Potential problems with type inference. Especially in case of encapsulations where we don't use G::VertexIndex directly, but VI: Borrow<G::VertexIndex> instead.

In summary, I am not much in favor of this change, but just wanted to leave a note if I change the opinion and the disadvantages will turn out not that bad.

Stable storage does not check vertex presence in neighbors iterator

Currently, the Stable storage does not check whether the source index for getting neighbors is still in the graph (i.e., is not in removed_vertices). The behavior in such case should be identical to the other storages.

Make the library compile on stable

At the time of writing this issue, there are 4 nightly features enabled:

generic_associated_types - stabilized in rust 1.65.0
assert_matches - there are crates for this functionality
int_log - is going to be stabilized in rust 1.67.0
try_blocks - there is exactly one occurrence in the codebase and that can be refactored

There is no justification (at least now) to require nightly.

Remove the mention about unstable features from the README as well.

Provide default implementations for counts and bounds

Provide default implementations for counts and bounds in respective traits (VerticesBase, EdgesBase, VerticesBaseWeak, EdgesBaseWeak).

vertex_count and edge_count can be implemented by counting items in self.vertex_indices() and self.edge_indices() iterators, respectively.
vertex_bound and edge_bound can be implemented by finding maximum in self.vertex_indices() and self.edge_indices() iterators, respectively.
vertex_count_weak, vertex_bounds_weak, edge_count_weak and edge_bounds_weak can return None

In all cases, keep the current (efficient) implementation.

Refactor internals of adjacency matrix representation

The main two goals of the refactoring are:

Reduce the scope and complexity of the code that uses unsafe.
Make the utility functions public so they can be reused by other matrix-like representations.

Reducing scope of unsafe

The internals of AdjMatrix use unsafe code for two reasons:

The possibility to "detach" edge presence information from the edge data, so that implementation of Neighbors does not cause "E does not live long enough" error.
A space reduction, where Vec<Option<E>> would need extra byte(s) for every item in a lot of practical cases (where a memory layout optimization would not help). Although the real performance effect was not benchmarked!

To be more confident in the correctness as well as reduce stress while making changes in the future, the scope and complexity of the code using unsafe should be reduced.

There should be a new Vec-like type encapsulating the BitVec, Vec<MaybeUninit<T>> pair. The API surface should be as minimal as possible while still supporting the needs of the AdjMatrix implementation. And it should still be possible to get reference to the underlying BitVec which does not have any generics. Since the implementation will not need to care about anything matrix-related, the implementation should be more straightforward and much easier to reason about. Anyway, miri should be used to ensure correctness.

Exposing utility functions

There are some useful functions that help with storing matrix-like data in a linear vector, namely:

size_of -- calculating needed size of the vector given number of vertices and graph being directed or undirected
resize -- Resizing the vector to a new size, correctly moving existing edge data. Although this function needs to be made edge container agnostic.
index & coords -- Functions that can convert edge coordinates (source vertex and destination vertex) to the vector index and vice versa.

These should be publicly available so they can be used by external implementations.

Create property-based tests for topological sort algorithms

Create proptest tests for TopoSort using our graph generation strategies and available assert_valid function. Test both algorithms (DFS, Kahn). Use DebugGraph as the graph being generated because it provides very useful information in case of a failed assert.

It would be nice to implement a function for removing cycles from a graph by removing back edges from DfsEvents so that there could be also proptest tests which use acyclic graphs for testing. This should increase test coverage of the algorithm since more of valid graphs will be tested.

Return an edge that is part of a negative cycle in Bellman-Ford

If toposort algorithms encounter a cycle, they return an edge that is part of that cycle. We can do the same in Bellman-Ford when encountering the negative cycle. This has real applications (e.g., currency arbitrage detection).

We should not reuse the Cycle type that is used by toposort algorithm and cycle detection algorithm, because the collect method would not work as expected (we want to find the negative cycle, not any cycle). But we could introduce a new type that would also have the collect method to find all edges that are part of the negative cycle.

Toposort algorithms should support skipping cycles

The algorithms for topological sort implement Iterator<Item = Result<VertexIndex, Error>>, that is, they lazily produce a sequence of vertices in topological order or report a (cycle) error when encountered.

It should be possible to just skip cycle errors and continue and get the rest of the vertices. The result sequence should be a valid topological order as if the encountered cycles were not in the original graph. In code, this should work and give expected output:

let toposort = TopoSort::on(&graph).run();

let sorted = toposort
    // Ignore cycle errors.
    .filter_map(|result| result.ok())
    .collect::<Vec<_>>();

There may perhaps be an option to specify if recovering from cycles while still producing a valid topological order is required or not (as that may influence some implementation details, where false could be more efficient), e.g.

let toposort = TopoSort::on(&graph).recover_from_cycles(true).run();

It's not obvious to me what would be the default value for this option. Anyway, for the false case, the behavior for all algorithms consistently should be to always return None after the first cycle error is reported.

Check number of edges when testing if graph is a path

When checking if a graph is a valid path here, the code should check if it's even possible to be true given the number of vertices and edges, as a path must have n - 1 edges. This can early identify some non-paths.

Implement the number of edges in a path as another fact (zero vertices must be correctly handled, too).

Graph abstractions

Prepona was a huge inspiration in separating the graph abstraction from graph data storage. The question is how should we approach that to benefit from this separation to its fullest, but keeping the usage and API as convenient as possible.

Kind of obvious cases are having simple graph and multigraph. Somewhat orthogonally allowing and disallowing self-loops. Should we have four structs (LoopFreeSimpleGraph, SimpleGraph, LoopFreeMultiGraph, MultiGraph or is there a different, more flexible approach to tackle this (a marker type, constructor argument)?

Another level would be having graph types that would guarantee some certain property, like BipartiteGraph, TreeGraph, ConnectedGraph. This information may be exploited by using a simpler and faster algorithm that works only on certain classes of graph. There would be possibly some kind of indirection through traits (Bipartite, Tree, ...), which would be required by the algorithms, so that different implementations of the same property can be used.

There may be also tricks like starting with a sparse-friendly storage and switching to a dense one once the edge/vertex ratio exceeds a threshold.

Any other ideas?

Toposort cycle error should contain edge that "caused" it

The information about the edge and vertices in the cycle is a very useful information in applications as well as during debugging. To have the API as simple as possible, it is sufficient to just include the edge index, from which the rest can be retrieved from the graph.

pub enum Error<G: GraphBase> {
    Cycle(G::EdgeIndex),
}

Create property-based tests for shortest paths algorithms

Create proptest tests for ShortestPaths using our graph generation ~~strategies~~ strategies. Test both algorithms (Dijkstra, Bellman-Ford). ~~Use DebugGraph as the graph being generated because it provides very useful information in case of a failed assert.~~

What properties to test? ~~One possibility is to implement a function for removing isolated vertices from a graph so that the tested graphs are connected. Then,~~ For connected graphs after finishing an algorithm, it should be possible to reconstruct path and get distance from every vertex in the graph to the starting vertex. Also, the distances (not the paths, generally) should be equal as returned by Dijkstra and Bellman-Ford.

Parallel algorithms

It would be nice to provide implementations of parallel algorithms, especially if it would be possible to have them with the same API as their sequential counterparts. The goal is to try to implement a parallel algorithm in order to discover obstacles, rough edges, necessary API extensions/changes and helpful internals for implementing parallel algorithms in general.

A good candidate may be connected components. That will require implementing fundamentals such as parallel API on graph storages and parallel BFS.

Some resources I found online:

Study `graphene` crate for inspiration

I accidentally stumbled on graphene project with some of the goals and target use cases that are similar to gryf.

Algorithm configuration builders

Instead of using several run methods with many arguments, the algorithms can use the builder pattern with reasonable defaults instead. Example with ShortestPaths:

Current way

ShortestPaths::run(&graph, start, Some(goal), edge_weight);
ShortestPaths::run_algo(&graph, start, Some(goal), edge_weight, Some(shortest_paths::Algo::Dijkstra));
ShortestPaths::run_dijkstra(&graph, start, Some(goal), edge_weight);
ShortestPaths::run_bellman_ford(&graph, start, edge_weight);

Builder pattern

The API is just a quick, first draft and is subject to change.

ShortestPaths::run(&graph, start); // with goal = None, edge_weight = identity, algo = any
ShortestPaths::with(&graph, start).goal(goal).edge_weight(edge_weight).dijkstra().run();

An interesting challenge will be to allow to choose a specific algorithm which should also result in more specific constraints on the graph (for which there are now separate functions such as run_dijkstra). But this should be quite doable with type parameters and zero-sized new types.

The main advantages are:

The API should be more convenient to use, especially when the default values are appropriate.
More options can be added in a backwards-compatible way.
It enables to use optional arguments (Option) with a generic type without any hassle (for a function fn foo<T>(x: Option<T>) the compiler cannot infer the type for T in foo(None)).

`GraphIterator` trait

GraphIterator trait would be a variation on Iterator from the standard library, requiring to implement just "next" functionality (probably next_vertex and next_edge, but I haven't thought about the details at all yet) and providing iterator-like functionality that makes sense to do for graphs. A list of such functionality (not exhaustive):

map(v, e), map_vertices(v), map_edges(e)
filter(v, e), filter_vertices(v), filter_edges(e)
filter_map
count -> (v, e), vertex_count -> v, edge_count -> e
collect -> G with G generic type implementing a trait

When consuming, vertices are always processed first and edges second (existence of an edge is dependent on its endpoint(s) not being removed), which may impose some implementation challenges.

Implicit graphs

From my perspective, infinite graphs can be used to represent some dynamic graph-like behavior/model. I remember an exercise from the university: Find an optimal sequence of operations divide by 2, divide by 3, subtract 1 to gen from N to zero. This is possible to solve with BFS, but there is no need to materialize the entire graph. I believe there might be some practical problems of a similar feature.

This problem is more related to the interfaces and if they are capable to support it (in a convenient way). A nice thing is that the underlying model is basically a "storage" with the same interface (or a subset of it) like get all neighbors, etc. But many methods in the current interfaces somewhat assume finiteness -- to name a few: vertex_count, vertex_indices, vertices and its edge counterparts.

I think it's now worthy to complicate the basic interfaces to support infinite graphs, which are quite niche area, by for instance having vertex_count returning Option<usize>. Another problem is with index types (VertexIndex, EdgeIndex) as I can imagine that in many cases it would be problematic or fragile to encode the "dynamic" underlying model to these indices.

Having a dual set of traits for infinite graphs would rule out algorithms' implementations that use the finite traits, even when the algorithm itself fundamentally does not require finiteness (graph traversal, Dijkstra, etc.).

Split `traits.rs` into submodules

The traits.rs grew a lot and became quite messy. It should be split into multiple submodules, each one containing somewhat related traits (e.g., vertices, edges, neighbors, properties, ...).

Ensuring assumptions

Some implementations make assumptions about behavior of functions in core traits, but these are not generally guaranteed. For example, Stable assumes that removing vertices/edges does not invalidate indices in an unexpected way (changing indices with lower number than removed one), or Complement assumes that vertex indices are incremented by one when adding new vertices.

Although these assumptions hold on the storage implementations in gryf (at least at the moment) and are somewhat reasonable, it can lead to surprising behavior and bugs when using a custom implementation not following this behavior.

One way to avoid it is to create a set of marker traits (similar to Send or Eq) that can be implemented for graphs that follows expected behaviors and require these traits to be implemented for graphs used by code that relies on it. At this point, it is probably too early to create these traits as we may need additional expected behaviors and so it is better to wait until having a more complete picture.

Reconsider multi edges iteration

There are cases (1) when it would be useful to be able to iterate over all multi-edges between two vertices without requiring an additional trait bound on the input graph not to disqualify graphs that do not support that.

We might move multi_edge_index method from MultiEdges to Edges so that it can be used on all types implementing Edges. The trait MultiEdges would become a marker trait, if there is a need for constraining a graph on that. For graphs without multi-edge support, the MultiEdgeIndicesIter type would simply be Option<EdgeIndex> and the implementation would call edge_index.

We might even stop distinguishing between edge_index and multi_edge_index and have just edge_index that returns an iterator. With the current API, surprises may happen if a developer/user does not realize that edge_index does not tell the whole truth in multi-edge graphs. Therefore this change would reduce a chance for surprising behavior, which is one of our goals.

Study `networkx` Python package for inspiration

networkx is a popular (12k stars on GitHub) and mature (first documented release in 2005) graph library written in Python. Assuming it's friendly to newcomers but also provides necessary API for advanced use, try to find inspiration for our API there.

Override default implementations by more efficient ones where it makes sense

The core traits Vertices{Base,Mut,}, Edges{Base,Mut,} and Neighbors contain default implementations for some methods that can be expressed in the means of others. These implementations are often inefficient, however, and should be overridden by reasonable implementations in each graph storage.

This applies both to graph storages (AdjList, AdjMatrix, EdgeList) and their adapters (Stable, Frozen, operators, ...). The latter should always explicitly delegate the behavior to the underlying graph (i.e., implementing fn <function>(...) { self.graph.<function>(...) } or equivalent), but this is often guaranteed because of using derive macros.

A few known examples:

VerticesBase::contains_vertex -- after 4229680 the default implementation changed to going through all vertex indices and checking if there is the vertex of interest. This definitely should be overridden in all graph storages to self.vertex(index).is_some()
Neighbors::degree{,_directed} in AdjList -- the degree can be computed in constant time by getting the number of outgoing and/or incoming edges which is stored in the vertices.
Neighbors::degree{,_directed} in Complement -- the degree can be computed by subtracting the real degree from the maximum possible degree of a vertex

Index types

Currently, there is a single index type for vertices (VertexIndex) and a single index type for edges (EdgeIndex). Not being able to parametrize over the index type prevents some interesting applications (specific use cases where a special form of the index is convenient to have, implicit graphs #3, resource-constrained environments). On the other hand, I don't want to clutter the generics, making the majority of uses where default index type is fine more cumbersome to work with.

As a side note, VerticesBaseWeak and EdgesBaseWeak somehow support custom indices (their main original motivation was to add support for implicit graphs), but it feels quite suboptimal.

Here is a list that of traditional solutions to this problem that I can think of:

Just use a single, predetermined type. This is what gryf has now.
Add another generic type (usually with a default type). In our case, for instance the generics for Graph type would be Graph<V, E, Ty: EdgeType, G, VIdx: IndexType = VertexIndex, EIdx: IndexType = VertexIndex> and for Edges would be Edges<E, Ty: EdgeType, VIdx, EIdx>, which is quite scary.
A special type is created (Vertex<V, Idx = VertexIndex>, similarly for edges) and it is used where we now just use V and E respectively. I personally feel that it makes the interfaces uglier and less ergonomic.

There is one alternative idea that would use auto traits and negative impls unstable features. Trait Indexed will have two associated types VertexIndex: IndexType and EdgeIndex: IndexType. Trait DefaultIndex will be auto trait, thus it would be implemented for all types (unless negative impl is used) and there will be a catch-all

impl<T: DefaultIndex> Indexed for T {
    type VertexIndex = VertexIndex;
    type EdgeIndex = EdgeIndex;
}

Thus, the index type would be associated to the vertex and edge data types, but in an opaque way (contrary to option 2 mentioned above). This should cover the majority of cases where default index is fine without changing the code on the user side. Using a negative impl, one could opt-out from default indexing and implement Indexed trait for their type with a custom index. Two helper types will be provided for convenience: PhantomIndexed opting out from DefaultIndex auto trait (basically for the same purpose as PhantomPinned) and Custom<T, VIdx, EIdx> implementing Indexed trait while using VIdx and EIdx for VertexIndex and EdgeIndex, respectively.

I have zero experience with auto traits and negative impls and I would not be surprised if this approach causes "conflicting implementations" or any other issue. I am also not sure about the final ergonomics of this solution, especially on the user side.

A note about requiring both VertexIndex and EdgeIndex on a single type: This is necessary to be able to use a type for vertices and edges as one couldn't implement Indexed trait for both vertex index type and edge index type if there was just a single associated type. Another reason is that edge-related traits need both edge index type and vertex index type in order to support functions such as endpoints and edge_index.

Careful thinking will need to go into what should be the minimal requirements for an index type (IndexType trait). Currently it's Clone, Copy, PartialEq + Eq, PartialOrd + Ord, Hash, from and into usize, "null" value. I believe it is reasonable to require Eq + Hash (this is important for using HashSet for visit sets, among other things). Ord is required in CompactIndexMap and brings support for BTreeSet and other ordering-related data structures. Clone is likely a must-have, Copy is questionable, probably too restrictive. If Copy is not required, many function parameters types should change from owned type to ref type (e.g., contains_vertex(&self, index: &V::VertexIndex)).

Being able to get a usize representation is important for some algorithms, but having this as an additional, optional functionality that can be used in constraints, sounds as a good compromise. Similar for "null" index. Generic storage implementations are also dependent on being able to treat the indices as usize. There could be therefore two index traits, IndexType and NumIndexType: IndexType, where NumIndexType would provide these additional usize-oriented capabilities.

Implement `Index` and `IndexMut` std traits for Graph and Path

Implement Index and IndexMut trais for Graph and Path (and other graph wrappers if new are implemented before this is done).

These operations should be implemented both for vertex index and edge index types, so it is possible to get a vertex data by indexing with vertex index and edge data by edge index.

Fix `complete_graph_edge_count` for zero vertices

Function complete_graph_edge_count returns the number of edges in a complete graph. Currently, it causes integer overflow when the number of vertices is zero.

Benchmark automatic algorithm selection

Quoting this Reddit comment:

It would be good to have a benchmark suite where you run an NxM test, graph properties x algorithm (x random seed). Then you use that to calibrate your algorithm selection, and also make some pretty plots to stick in the readme and convince people that the algorithm selector makes the right choices in these cases.

Sometimes it will not be fair. For example, algorithm selection for shortest path for a graph with signed edges will always pick (supposedly slower) algorithm that supports negative edges, even if the graph contains only positive edges, as a safe choice. But otherwise this is a good way how to validate the selection criteria.

Try to simplify generics for storage on graph encapsulations

Currently, the full signature for a graph type with a specific storage looks like this

Graph<V, E, Ty, AdjList<V, E, Ty, DefaultIndexing>>

This is scary and also repeating the types for vertices, edges and direction feels redundant. Of course one could use _ placeholder, but still.

Ideally, there is a mechanism that would simplify the signature to something like this:

Graph<V, E, Ty, MagicAdjList<DefaultIndexing>>

MagicAdjList would be a separate type that would know how to create AdjList given V, E and Ty. Or something like that. This definitely needs more thought and experiments.

BuildHasher from the standard library came to mind, but maybe it's a useless reference.

Kahn algorithm does not recover from encountering a cycle

If continuing the iteration even when a cycle error is encountered, Kahn algorithm keeps reporting the cycle error forever. This makes the following code an infinite loop:

let toposort = TopoSort::on(&graph).kahn().run();

let sorted = toposort
    // Ignore cycle errors.
    .filter_map(|result| result.ok())
    .collect::<Vec<_>>();

The desired behavior is described in #26, but until then the fix can just be to set a flag and return None when this flag is set.

State reuse between multiple runs of the same algorithm

Some algorithms in petgraph (toposort, has_path_connecting) allow to pass optional state. This allows reusing allocations that were done in a previous run of the algorithm. If the algorithm is executed repeatedly on the same or just slightly changed graph, reusing the internal state or at least some information can avoid potential reallocs for buffers growing. In general, any information from the past runs should help prepare the internal state such that it is more efficient for the run.

We use the builder pattern for specifying parameters for algorithms. A tempting solution would be to introduce state builder method which would accept &mut State (e.g., ShortestPathsState) and use it instead of a default in the run. A disadvantage is that it introduces a lifetime to the builders' type signature. On the other hand, it is expected that builders have many generic parameters, so it might be worthy.

Another way would be to introduce flavors of run methods (e.g., run_with) that would have an additional parameter for the &mut State.

As for the contents of such state, we often cannot reuse some allocations because they are moved to the result (e.g., dist map in the shortest paths algorithm). But we could at least store the final size, so it can be used for initializing the collections using with_capacity constructor. Other allocations can be actually reused because they are indeed internal state.

We also need to have the state such that it is available for all algorithms for a problem. That is, the state must be as general as to support Dijkstra, Bellman-Ford and BFS (and later others) for shortest path problem, for instance. For Bellman-Ford, there seems to be nothing useful to share between runs. For Dijkstra and BFS, we could share size of maps for distances and predecessors as well as the queue (or at least its size). Using just "metadata" (size) instead of real data (queue) might simplify the type signature, which is probably worth an extra allocation, which is insignificant, because the main goal is to avoid reallocations on large graphs.

For ShortestPaths therefore, there could be this type representing the internal state:

#[derive(Default, Clone)]
pub struct ShortestPathsState {
    dist_len: usize,
    pred_len: usize,
    queue_len: usize,
}

And the usage would then be:

struct MyGraphWrapper {
    graph: Graph<String, u32, Directed>,
    state: ShortestPathsState,
}

impl MyGraphWrapper {
    pub fn new() -> Self {
        Self {
            graph: Graph::new(),
            state: ShortestPathsState::default(),
        }
    }

    // graph manipulating API

    pub fn shortest_path(&mut self, from: VertexIndex, to: VertexIndex) -> u32 {
        ShortestPaths::on(&self.graph)
            .goal(to)
            // Alternative: `&self` + a synchronization primitive over state + cloning and storing back
            .state(&mut self.state)
            .run(from)
            .unwrap()[to]
    }
}

For problems where the algorithms differ significantly and their internal state is non-trivial, it will be probably beneficial to wrap the algorithm-specific state into options:

pub struct State {
    algo1: Option<Algo1State>,
    algo2: Option<Algo2State>,
    shared: usize,
}

Desired behavior of shortest paths algorithms when the goal is not reached

The shortest paths API allows to specify a goal vertex:

let result = ShortestPaths::on(&graph).goal(target).run(start).unwrap();

What should be the behavior when the target vertex is not reachable from the start vertex?

Silently ignore that. Then getting distance to the target returns None.
Return an error from run.

Tweak API for using a custom storage in graphs

Currently, the way to specify a custom storage when initializing a graph is this:

let mut graph = Graph::with_storage(AdjMatrix::<_, _, Undirected, DefaultIndexing>::new());

The need for specifying the Undirected and DefaultIndexing generic parameters brings unnecessary clutter. Moreover, we should follow the convention of the Rust standard library, which uses new_in naming convention for specifying allocators.

To improve the user experience, we should make these changes:

Provide undirected and directed constructor methods, the same way as we do with default new_undirected and new_directed. This removes the need of specifying directionality generic parameter.
Use new_in naming convention. That is, there will be these three constructors: new_in (directionality unspecified), new_undirected_in and new_directed_in.
For the graph storages, change the implementation of Default to fix the indexing to DefaultIndexing. This removes the need of specifying DefaultIndexing (but a custom indexing would still be allowed via new (e.g., AdjList::new())).

After all, the above example will look like this:

let mut graph = Graph::new_undirected_in(AdjMatrix::default());

Implement shortest paths algorithm from "Negative-Weight Single-Source Shortest Paths in Near-linear Time" paper

source

I haven't studied the paper yet, but I would like to have this included in the library. The reasons are:

There would be an implementation of an algorithm for shortest paths with negative weights faster than Bellman-Ford (there have been other algorithms, but this was is claiming the best time complexity bound.
The abstract claims that the algorithm is "simple". Whereas it is definitely more complex than Bellman-Ford, it should be simpler than recent alternatives.
Implementing the algorithm will show if our API and infrastructure is sufficient for a modern algorithm, or show gaps that we need to fill.

Ensure minimal generic type constraints are used

The codebase is generics-heavy and a usual pattern is that an algorithm, operator, etc. constraints the generic types by core traits (e.g., VerticesBase, Edges<E, Ty>, etc, such that it can provide the desired functionality. Among related traits, there is usually some kind of relation such that one trait makes stronger demands on the type than the other. The relations are:

VerticesBase -- (adds data V) --> Vertices<V> -- (adds mutation) --> VerticesMut<V>, and similarly for edges
standard traits make stronger demands than their "weak" counterparts, e.g., VerticesBase::vertex_count returns usize while VerticesBaseWeak::vertex_count_hint`` returns Option`

The code has gone through a lot of changes and some of these traits were added on the way. There are probably places where the constraints are unnecessarily strong and relaxing them would make the bounds simpler or more widely applicable.

Semantics of neighbors iterator for undirected self-loops

It feels inefficient and counter-intuitive to yield a self-loop edge twice when iterating over neighbors of a vertex. However, a self-loop in undirected graphs contributes 2 to a vertex's degree, for the two ends of the edge. So yielding the edge only once breaks the current default implementation of degree calculation on Neighbors trait. I think it would be fine to remove the default implementations and require ones from the storage implementer.

On top of that, should this be a specified behavior that is required to be followed by any storage? Because if it is left to be implementation-specific, this would probably cause quite a hard time for correct implementation of degree for Stable graph adapter, where it would not be clear if a self-loop edge should be counted twice or not (depending on whether the underlying iterator yields the self-loop twice or not).

Is there a merit in requiring yielding a self-loop twice always? This would keep the default implementation of degree calculation trivial, but is it worthy? Are there any other arguments in favor?

Extend `Neighbors` trait with common aliases

Add new methods listed below to Neighbors trait. All of them are aliases to an already-provided functionality so they should have a default implementation. These methods should also be added to graph encapsulations in graph module.

successors(&self, src: &Self::VertexIndex) -> Self::NeighborsIter<'_> aliased to self.neighbors_directed(src, Outgoing)
predecessors(&self, src: &Self::VertexIndex) -> Self::NeighborsIter<'_> aliased to self.neighbors_directed(src, Incoming)
out_degree(&self, src: &Self::VertexIndex) -> usize aliased to self.degree_directed(src, Outgoing)
in_degree(&self, src: &Self::VertexIndex) -> usize aliased to self.degree_directed(src, Incoming)

Graph generalizations

This library could also provide graph types that are not so common, for example hypergraph or mixed graph. I haven't really looked into their specifics regarding storage and APIs.

Constant weights

For graphs without edge weights, shortest path algorithms use some constant value (e.g., 1), which is represented by Unit type. However, the implementation of Dijkstra still accesses the edge to the next neighbor to be able to get the weight, even when the weight getter does not actually need that edge (as in Unit case). This is unfortunate for two reasons:

It does unnecessary work (retrieving the edge data) which is never zero-cost in standard storages.
It requires implicit graphs to implement EdgesWeak::edge_weak that returns Some, even if otherwise unnecessary.

Instead, the GetWeight trait can be extended such that it can communicate an availability of a constant weight, i.e., allow to skip retrieving the edges.

pub trait GetWeight<E, W>
where
    W: Weight,
{
    fn get(&self, edge: &E) -> W;
    fn get_const(&self) -> Option<W> { None }
}

Having a default implementation means that it is opt-in, and can be really overridden only by our Unit type. Taking &self (i.e., not using fn get_const() -> Option<W> signature, is to allow types that return a constant value determined at runtime (e.g., passed to the constructor of a hypothetical Constant weight getter).

The implementation of an algorithm can then first ask for get_const and use the edge to compute the weight only if the constant value is not available. I strongly believe that the compiler should even be able to eliminate the check and choose the appropriate code path when the concrete types are known.

Fallible methods

Some operations on graph can fail. A nice example is add_edge(src, dst, data), where

src and/or dst are not present in the graph,
this is a duplicate edge, when the underlying storage does not support multiedges

For fallible operation, we may introduce fallible API such as try_add_edge that would return a Result. For trait implementations, only the fallible method would be required and the panicking variant would have a default implementation that calls try_ counterpart and unwraps the result.

Each fallible method should have a corresponding Error type that should be general enough to cover all practical cases. Making the error types as associative types would be an overkill and complicate the API.

Implicit graph helper struct

Inspired by pathfinding crate, it would be nice to have a possibility to specify a structure of an implicit graph just by providing a function to get the neighbors of a vertex. Currently, it is necessary to implement several traits (Neighbors and *Weak traits).

There could be a struct Implicit that would be a wrapper over a generic F that would represent a function implementing the Neighbors::neighbors_directed. The rest of Neighbors trait plus all *Weak traits would be implemented by the wrapper.

The implementation of Neighbors::neighbors will be probably a bit tricky, it needs to correctly handle undirected and directed graphs. And the type of Neighbors::NeighborsIter will need to be something like Box<dyn Iterator> so that the implementation of Neighbors::neighbors can chain outgoing and incoming edges.

Fix storages

#46 introduced fuzz targets for all currently implemented storages. All of them failed. Fix the storages such that fuzzing does not reveal a problem after running for some time.

Improve `visit` module

The idea of the visit module is to provide API for standard graph traversal algorithms (DFS, BFS) with certain flavors (plain, with events, ...). Apart from being used by the user, it should also provide sufficient API to be used in the algorithm implementations.

The problems are:

The API for BFS is not sufficient even for implementing a basic algorithm such as shortest paths on unweighted graph (source).
The raw submodule is not not exactly pretty. It might be that the ugliness is somewhat inherent due to its purpose, but I believe that some marginal improvements or simplifications can be made.

Subset adapter

A Subset adapter would be used to wrap a graph and overriding the implementations of the traits such that vertices and edges are filtered based on some criterion. Rough sketch of the API:

pub struct Subset<G, S = ()> {
    graph: G,
    state: S,
    filter_vertex: Box<dyn Fn(&G::VertexIndex, &G, &S) -> bool>,
    filter_edge: Box<dyn Fn(&G::EdgeIndex, &G, &S) -> bool>,
}

impl<G> Subset<G> {
    pub fn new(graph: G) -> Self {
        Self::with_state(graph, ())
    }
}

impl<G, S> Subset<G, S> {
    pub fn with_state(graph: G, state: S) -> Self {
        Self {
            graph,
            state,
            filter_vertex: Box::new(|_, _, _| true),
            filter_edge: Box::new(|_, _, _| true),
        }
    }

    pub fn into_inner(self) -> G {
        self.graph
    }

    pub fn filter_vertex<F>(self, predicate: F) -> Self
    where
        F: Fn(&G::VertexIndex, &G, &S) -> bool
    {
        Self {
            filter_vertex: Box::new(predicate),
            ..self
        }
    }

    pub fn filter_edge<F>(self, predicate: F) -> Self
    where
        F: Fn(&G::EdgeIndex, &G, &S) -> bool
    {
        Self {
            filter_edge: Box::new(predicate),
            ..self
        }
    }
}

// impl Vertices, Edges, Neighbors, ...

Shortest paths on undirected graphs with negative weights

Bellman-Ford algorithm currently works only on directed graphs (although this is not enforced by the type constraints nor algorithm selection at runtime). It can be adapted to undirected graphs by treating every edge as two directed edges going in the other directions, but then every edge with negative weight becomes a negative cycle. This may be a problem for any algorithm supporting negative weights, including #38 (but I did not investigate this proper).

If there isn't an algorithm that would handle undirected graphs with negative weights, I think the proper solution is to use Dijkstra. Because any algorithm would fail in some way if there is a negative edge, but Dijkstra will be faster in the happy cases.

Fix hygiene of procedural macros

The current implementation of derive procedural macros refers to types and traits from the library, assuming it is in scope. This leads to "cannot find trait ... in this scope" and alike when it is not the case.

Fix adding duplicate edge to AdjMatrix

AdjMatrix::add_edge now adds 1 to the internal edge counter unconditionally. However, adjacency matrix does not support multiple edges between two vertices and a new edge always replaces the previous one. In such case, the internal edge counter must remain unchanged.