Comments (27)
One little but significant advantage of the current implementation of edges
is that it has no constraint on the type of vertices a
, i.e. it's fully polymorphic edges :: [(a, a)] -> Graph a
. What you suggest will require Ord a
, ruling out some graphs, e.g. whose vertices are functions.
I think it's reasonable to add a new function, perhaps called edgesOrd
, that will rely on Ord a
to make the resulting expression smaller.
In general, this is an interesting research question, how to efficiently turn a list of edges into a compact algebraic representation. One of the most promising approaches is to use modular decomposition.
from alga.
Ah this way, it is ok !
I just need to replace overlays
by something like overlays1
, without the Empty
leaf at the end :)
from alga.
Of course graph-creation will be slightly longer, maybe allow the user to choose (between a more "alga" structure and a quick one) can be a good idea.
For comparison, the creation of a clique with 1000
vertices from a list of edges with Fgl.PatriciaTree
takes 4
seconds, meanwhile for the actual Algebraic.Graph
it takes 500 ms
from alga.
So concerning edgesOrd
, I made some experiments:
edgesOrd :: Ord a => [(a, a)] -> Graph a
edgesOrd = Map.foldrWithKey (\k v acc -> Overlay acc $ Connect (Vertex k) $ vertices1 v) Empty . Map.fromListWith (<>) . map (fmap return)
where
vertices1 (x :| xs) = foldr (Overlay . vertex) (vertex x) xs
It build an adjacency map and convert it to a Graph
:
edgesOrd
ensure a size x == edgeCount x + vertexCount x
while edges
give size x == 2 * edgeCount x + 1
Then benchmarks are rather good, except for some functions ^^:
vertexList: 0.67 (good)
isEmpty: 359.36 (bad)
vertexCount: 0.66 (good)
hasVertex: 84714.52 (bad)
edgeCount: 0.70 (good)
edgeList: 0.73 (good)
hasEdge: 0.58 (good)
hasSelfLoop: 0.58 (good)
addEdge: 1.15 (bad)
addVertex: 1.21 (bad)
removeVertex: 0.68 (good)
equality: 0.81 (good)
removeEdge: 0.48 (good)
transpose: 1.33 (bad)
dff: 0.95 (OK)
topSort: 0.96 (OK)
edges: 10.23 (bad)
So the main comment is that it is a good idea, but can lead to some problems (see hasVertex
). While still being faster than Fgl
(Fgl takes 3.2s
seconds to build a clique with 1000 vertices while using edgesOrd
alga takes only 500ms
).
hasVertex
can be optimized in hasVertexOrd
and I think others operations too, but this require to store somewhere the fact that the graph was created with edgesOrd
, because it give a predictable structure.
from alga.
@nobrakal Interesting :) I think your implementation is not lazy enough (you build the whole map before producing the graph) and also tries to do too much work by eliminating duplicated edges, which leads to significant slowdown on some graph algorithms. Does switching to Data.Map.Lazy
help?
What about trying an intermediate solution, where you build something like Map a [a]
, where the lists are allowed to have duplicates?
Perhaps, you could even use something simpler like group
possibly after sort
:
https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List.html#v:group
Keep in mind that Map
is an expensive data structure, which supports fast searching, but you do not need this in edgesOrd
, so you are currently doing more work than necessary.
from alga.
also tries to do too much work by eliminating duplicated edges ?
Am I trying to remove duplicated edges ? The type of the map is Map a (NonEmpty a)
, why this implicate removing duplicated edges ?
But you are right, I will explore others solutions :)
from alga.
Am I trying to remove duplicated edges ?
Oops, I didn't spot that you use Map.fromListWith (<>)
-- indeed, this does not eliminate duplicates. Still I have a feeling you are doing more work than necessary by storing intermediate results in a balanced tree.
from alga.
Yep, you were right, using groupBy
:
edgesEq :: Eq a => [(a, a)] -> Graph a
edgesEq = foldr (\(v,l) g -> Overlay g $ Connect (Vertex v) l) Empty . groupByWithVertices
groupByWithVertices :: Eq a => [(a,a)] -> [(a,Graph a)]
groupByWithVertices [] = []
groupByWithVertices (x:xs) = (fst x, vertices1 x ys) : groupByWithVertices zs
where
eq = on (==) fst
(ys,zs) = span (eq x) xs
vertices1 x = foldr (Overlay . vertex . snd) (vertex $ snd x)
This behave almost like the previous try, but require only an Eq
instance and is 2 times faster.
Because it is the same idea than my previous comment, resulting graphs are the same than in my previous benchmarks, and thus the benchmarks are equivalents for the version of this comment.
from alga.
@nobrakal Good! So, you are saying with the new version, hasVertex
will still be around 84714x slower? I've no idea where such a huge slowdown comes from! Can you explain?
from alga.
Here is a detailed benchmark of hasVertex
### Clique
#### 1000
##### 0
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 34.91 ns | 1.000 |
| Alga || 25.00 ms | 0.977 |
+---------++----------+-------+
##### 333
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 7.451 μs | 1.000 |
| Alga || 11.08 ms | 0.913 |
+---------++----------+-------+
##### 666
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 14.59 μs | 0.996 |
| Alga || 1.267 ms | 1.000 |
+---------++----------+-------+
##### 1000
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 16.09 ms | 0.974 |
| Alga || 22.70 ms | 1.000 |
+---------++----------+-------+
### Mesh
#### 1000
##### 0
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 34.83 ns | 0.999 |
| Alga || 40.70 μs | 1.000 |
+---------++----------+-------+
##### 333
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 13.81 μs | 0.999 |
| Alga || 39.08 μs | 0.999 |
+---------++----------+-------+
##### 666
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 20.62 μs | 1.000 |
| AlgaOld || 29.31 μs | 1.000 |
+---------++----------+-------+
##### 1000
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 39.67 μs | 0.998 |
| AlgaOld || 46.94 μs | 1.000 |
+---------++----------+-------+
I am a bit lost since the slowdown really come from "simple" foldg
. hasEdge
is faster with graphs built with edgesEq
for example.
The only explanation I see is that we are building huge Boolean thunks since the graph it a bit more "balanced" than one constructed with edges
. Such a huge drop is still very problematic, it seems very high for only a strictness problem...
Even stranger, for a Clique
the new representation is always worse than something created with edges
...
from alga.
The only explanation I see is that we are building huge Boolean thunks since the graph it a bit more "balanced" than one constructed with edges .
@nobrakal You can test this hypothesis very easily: just generate a balanced Graph
expression, for example (1 + 1) + (1 + 1)
scaled to a few layers and see how it compares to a linear chain 1 + (1 + (1 + 1))
.
P.S.: Comparing to ((1 + 1) + 1) + 1
is also interesting. I've always wondered how this works, but never tried! So, it's a useful experiment.
from alga.
Here are some benchs, using:
- balanced, a balanced graph expression with 32768 of type Int
- vertices, a graph obtained from
vertices $ replicate 32768 1
- transposedVertices, a graph obtained with
foldg Empty Vertex (flip Overlay) (flip Connect) $ vertices $ replicate 32768 1
benchmarking hasVertex 0/balanced
time 542.5 μs (537.6 μs .. 547.7 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 542.6 μs (537.2 μs .. 547.3 μs)
std dev 16.68 μs (13.36 μs .. 22.13 μs)
variance introduced by outliers: 23% (moderately inflated)
benchmarking hasVertex 0/vertices
time 436.1 μs (427.2 μs .. 448.1 μs)
0.997 R² (0.995 R² .. 0.999 R²)
mean 440.5 μs (435.4 μs .. 447.4 μs)
std dev 20.03 μs (15.94 μs .. 24.05 μs)
variance introduced by outliers: 40% (moderately inflated)
benchmarking hasVertex 0/transposedVertices
time 811.5 μs (801.9 μs .. 824.4 μs)
0.997 R² (0.994 R² .. 1.000 R²)
mean 814.9 μs (807.5 μs .. 827.8 μs)
std dev 31.26 μs (19.64 μs .. 53.27 μs)
variance introduced by outliers: 29% (moderately inflated)
benchmarking hasVertex 1/balanced
time 136.6 ns (135.5 ns .. 138.1 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 136.1 ns (135.8 ns .. 136.9 ns)
std dev 1.471 ns (919.7 ps .. 2.818 ns)
benchmarking hasVertex 1/vertices
time 27.42 ns (26.67 ns .. 28.39 ns)
0.995 R² (0.992 R² .. 0.997 R²)
mean 27.50 ns (27.01 ns .. 28.12 ns)
std dev 1.859 ns (1.468 ns .. 2.554 ns)
variance introduced by outliers: 83% (severely inflated)
benchmarking hasVertex 1/transposedVertices
time 604.8 μs (598.1 μs .. 612.0 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 597.2 μs (593.9 μs .. 602.6 μs)
std dev 13.72 μs (9.668 μs .. 19.87 μs)
variance introduced by outliers: 14% (moderately inflated)
So the balanced option is less efficient than the one obtained with vertices
(and a reversed structure is not very great ^^).
Also, I spotted something, with the benchs: I was not testing the last vertex of the domain, and this test is faster (the same problem that we had when improving NotEmptyGraph
: the last vertex comes first in the representation):
##### 999
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 13.18 μs | 1.000 |
| AlgaOld || 21.89 μs | 0.999 |
+---------++----------+-------+
But even for the "better positioned" vertex, the time is not so great (comparing to the ~30ns
for the better positioned vertex for actual edges
from alga.
Your comment about the difference between 1 + ( +1 ( 1 + ... ))
and ( ( ... + 1 ) + 1 ) + 1
leads me to re-considerate the function, and in fact I was producing a somewhere bad representation (like ( ( ... + 1 ) + 1 ) + 1
).
Inverting the two arguments of Overlay
leads to more reasonable results ^^:
### Clique
#### 1000
##### 0
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 33.36 ns | 1.000 |
| Alga || 36.01 ns | 0.999 |
+---------++----------+-------+
##### 333
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 3.837 μs | 1.000 |
| AlgaOld || 7.245 μs | 1.000 |
+---------++----------+-------+
##### 666
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 9.044 μs | 1.000 |
| AlgaOld || 17.36 μs | 1.000 |
+---------++----------+-------+
##### 1000
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 14.12 ms | 0.970 |
| Alga || 22.65 ms | 0.981 |
+---------++----------+-------+
### Mesh
#### 1000
##### 0
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| AlgaOld || 33.34 ns | 1.000 |
| Alga || 33.48 ns | 1.000 |
+---------++----------+-------+
##### 333
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 11.69 μs | 1.000 |
| AlgaOld || 15.84 μs | 1.000 |
+---------++----------+-------+
##### 666
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 22.40 μs | 1.000 |
| AlgaOld || 30.80 μs | 1.000 |
+---------++----------+-------+
##### 1000
+---------++----------+-------+
| || Time | R² |
+=========++==========+=======+
| Alga || 35.34 μs | 1.000 |
| AlgaOld || 47.06 μs | 0.999 |
+---------++----------+-------+
SUMMARY:
* Alga was the fastest 5 times
* AlgaOld was the fastest 1 times
There was 2 ex-aequo
The function used was:
edges :: [(a, a)] -> Graph a
edges = overlays . map (uncurry edge)
edgesEq :: Eq a => [(a, a)] -> Graph a
edgesEq = foldr (\(v,l) -> Overlay (Connect (Vertex v) l)) Empty . groupByWithVertices
groupByWithVertices :: Eq a => [(a,a)] -> [(a,Graph a)]
groupByWithVertices [] = []
groupByWithVertices (x:xs) = (fst x, vertices1 x ys) : groupByWithVertices zs
where
eq = on (==) fst
(ys,zs) = span (eq x) xs
vertices1 x = foldr (Overlay . vertex . snd) (vertex $ snd x)
I am currently running the whole suite to see if it is introducing other problems !
from alga.
vertexList: 0.71 (good)
isEmpty: 1.00 (OK)
vertexCount: 0.65 (good)
hasVertex: 0.83 (good)
edgeCount: 0.60 (good)
edgeList: 0.58 (good)
hasEdge: 1.64 (bad)
hasSelfLoop: 0.57 (good)
addEdge: 1.07 (OK)
addVertex: 1.15 (bad)
removeVertex: 0.69 (good)
equality: 0.56 (good)
removeEdge: 0.49 (good)
transpose: 0.83 (good)
dff: 1.08 (OK)
topSort: 0.99 (OK)
edges: 2.15 (bad)
Now hasEdge
is not so happy... I will try with #97
from alga.
Comparing the #97 modification plus edgesEq
and only the #97 modification, I get a:
hasEdge: 0.75 (good)
So I think all seems good here
from alga.
@nobrakal So, speaking very roughtly, the algorithms get a small (say 30%) speed up at the cost of 2x slower graph construction. I guess this is a good tradeoff, but I hope we can find further improvements for the edges
function, so that it achieves better compression.
from alga.
Yep, this is certainly possible, requiring an Ord
instance.
I played a bit, and came to an horrible function I won't show here (one can find it at: https://github.com/nobrakal/alga/blob/edgesOrd/src/Algebra/Graph.hs#L313 ).
Roughly the idea I had was to build an adjacency map, and this time, to use it: Each time I am trying to connect two vertices x and y, I search if the two are connected to some vertices, c d e, so I can create:
x * ( y * (c + d + e))
So this is an improvement from the first version because it reduce the duplication of vertices c d and e.
So it compresses greatly:
*Algebra.Graph> let x = edgeList $ clique [1..1000] :: [(Int,Int)]
*Algebra.Graph> length x
499500
*Algebra.Graph> size $ edges x
999001
*Algebra.Graph> size $ edgesEq x
500500
*Algebra.Graph> size $ edgesOrd x
251498
But, sadly, I cannot come to very descent performances:
edges: 35.71 (bad)
Using specialized IntMap
and IntSet
:
edges: 11.17 (bad)
The benchmarks:
vertexList: 0.50 (good)
isEmpty: 0.96 (OK)
vertexCount: 0.47 (good)
hasVertex: 48439.75 (bad)
edgeCount: 0.56 (good)
edgeList: 0.55 (good)
hasEdge: 0.48 (good)
hasSelfLoop: 0.40 (good)
addEdge: 0.75 (good)
addVertex: 0.78 (good)
removeVertex: 0.51 (good)
equality: 0.44 (good)
removeEdge: 0.43 (good)
edges: 12.28 (bad)
Again an issue with hasVertex
otherwise numbers are rather but not extraordinary better.
I don't know if it worth investigating and optimizing this solution
from alga.
Ok, I have managed to find a better version of edgesOrd
:
vertexList: 0.53 (good)
isEmpty: 1.00 (OK)
vertexCount: 0.53 (good)
hasVertex: 0.76 (good)
edgeCount: 0.53 (good)
edgeList: 0.52 (good)
hasEdge: 0.42 (good)
hasSelfLoop: 0.47 (good)
addEdge: 0.64 (good)
addVertex: 0.75 (good)
removeVertex: 0.48 (good)
equality: 0.43 (good)
removeEdge: 0.42 (good)
transpose: 0.60 (good)
reachable: 0.67 (good)
edges: 12.66 (bad)
It is 10 times slower than a regular edges
for every type that is an instance of Ord
, and corrected the previous problem with hasVertex
.
For Int
vertices, it is only 7 times slower.
Code is here: https://github.com/nobrakal/alga/blob/edgesOrd/src/Algebra/Graph.hs#L314
I will investigate further improvements :)
from alga.
I made some further improvements (https://github.com/nobrakal/alga/blob/edgesOrd/src/Algebra/Graph.hs#L312)
With a general Ord
constraint
edges: 6.94 (bad)
Specialized to Int
edges: 4.05 (bad)
size
is also smaller:
> let x = edgeList $ clique [1..1000] :: [(Int,Int)]
> size $ edgesOrd x
250501
from alga.
@nobrakal Many thanks for ongoing experiments!
The latest solution gives around 50% speed up across API, but it's quite complex and doesn't seem to provide any useful guarantees/properties for resulting expressions. We could, of course, include it into the API with a comment "This function is around 5x slower than edges
, but if you use it to construct graphs, all other functions are probably going to be 2x times faster", but this seems a bit vague and ad hoc.
If we are prepared to pay a lot in terms of performance and complexity, perhaps, it is worth looking into a more principled approach, for example, based on modular graph decomposition? Then we will at least be able to provide certain guarantees about the result, for example, that all graph modules have been collapsed into sub-expressions.
Of course, it's tempting to add edgesOrd
to improve Alga's standing in the benchmarks suite, but will Alga users actually want to construct graphs using edgesOrd
? Maybe they will prefer to construct graphs algebraically, which gives direct control over the resulting graph structure.
P.S.: By the way, I guess you could use the star
combinator in a few places in your implementation to get some simplification.
from alga.
Perhaps, edgesOrd
should actually be called stars
and take a list [(a, [a])]
as input. Then there may be a separate function somewhere in Algebra.Graph.Utilities
that turns a list of edges into a list of adjacency lists. This will make the API simpler to understand and explain.
(But your latest implementation may be doing more than just stars
underneath!).
from alga.
provide any useful guarantees/properties for resulting expressions
Yep, we only know that size (edgesOrd x) <= length x + nbVertices x
. So when possible, it is better than edgesEq
.
Note that the previous edgesEq
worked only with sorted list of edges (Since I did all my tests with edgesList
to produce a list of edges...). So we need an Ord
instance anyway, to sort things :)
but will Alga users actually want to construct graphs using edgesOrd?
When building a graph with Int
vertices, or something with an Ord
instance not too expensive, it can certainly be interesting. Since actual edges
is extremely fast, one can chose to spend some time to build a better graph for later use (and edgesOrd
is faster than FGL edges
equivalent anyway).
Maybe they will prefer to construct graphs algebraically, which gives direct control over the resulting graph structure.
Obviously this is the better thing to do :)
based on modular graph decomposition?
It certainly worth a try, but as you said this a complex thing, and the implementation has to be a bit optimized if don't want to take many time to build a graph !
edgesOrd should actually be called stars
I build a list of (a,Set a)
(NonEmpty Set to be more precise), so I can't use such a function, and having starsOrd
does not seems very useful...
from alga.
I build a list of
(a,Set a)
(NonEmpty Set
to be more precise), so I can't use such a function, and havingstarsOrd
does not seems very useful...
I don't understand. Let's assume we add the following function to the API:
stars :: [(a, [a])] -> Graph a
stars = overlays . map (uncurry star)
Then, given xs :: [(a, Set a)]
you can call call stars
as stars . map (\(x, s) -> (x, toList s))
.
from alga.
Sadly, using stars
drop a bit performances:
edges: 10.25 (bad)
(As a reminder, without stars
(so with graphs constructed in place:
edges: 8.65 (bad)
)
from alga.
@nobrakal Strange. Presumably, you should have exactly the same code in your implementation, since you are getting a bunch of stars in the end, isn't it? Could you explain the slowdown?
from alga.
You are right. I have reworked it to use non-empty lists when possible and thus avoiding a pattern match on the empty list, and I dropped the result to a:
edges: 9.48 (bad)
What left of the slowdown comes I think to an optimization that does not happen in these intricate mix of foldr/map
from alga.
What left of the slowdown comes I think to an optimization that does not happen in these intricate mix of foldr/map
I would expect everything to be the same if stars
is inlined. It would be nice to eliminate any overhead left due to having a separate function.
from alga.
Related Issues (20)
- How to handle goodness inherited from other libraries HOT 2
- Bump upper bound for base-compat for version 0.4 on Hackage HOT 4
- Unable to use ghcid HOT 6
- Algorithms should be in terms of typeclasses not data structures HOT 15
- Acyclic graphs with Int labels HOT 5
- Tests failed to build with QuickCheck 2.14.2
- Multitree HOT 3
- Monoid instance HOT 6
- support for GraphViz "HTML strings" for node or edge labels HOT 1
- Add benchmarks to CI HOT 2
- Supporting different node and edge types HOT 7
- v0.6 fails to build with GHC 8.0 and 8.2 HOT 3
- Doubt about decorating a Graph HOT 4
- Compatibility with mtl-2.3 HOT 2
- Smart constructor for Symbol rather than requiring -XOverloadedLists HOT 1
- Proposal : foldgM HOT 2
- Algebra.Graph.Labelled.AdjacencyMap.edges appears to be broken HOT 2
- Semigroup version of Algebra.Graph.Labelled.AdjacencyMap HOT 7
- Bump "deepseq" dependency bounds to 1.5.0.0 HOT 2
- Warnings when building with ghc-9.10
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from alga.