I investigated about poor results of alga in <a href=

Ah this way, it is ok ! I just need to replace <code class="notransl

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Yep, you were right, using groupBy : <div class="h

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Here is a detailed benchmark of hasVertex <div cl

Improve edges function of Algebraic.Graph,about snowleopard/alga

Comments (27)

snowleopard commented on September 24, 2024 1

One little but significant advantage of the current implementation of edges is that it has no constraint on the type of vertices a, i.e. it's fully polymorphic edges :: [(a, a)] -> Graph a. What you suggest will require Ord a, ruling out some graphs, e.g. whose vertices are functions.

I think it's reasonable to add a new function, perhaps called edgesOrd, that will rely on Ord a to make the resulting expression smaller.

In general, this is an interesting research question, how to efficiently turn a list of edges into a compact algebraic representation. One of the most promising approaches is to use modular decomposition.

from alga.

nobrakal commented on September 24, 2024 1

Ah this way, it is ok !

I just need to replace overlays by something like overlays1, without the Empty leaf at the end :)

from alga.

nobrakal commented on September 24, 2024

Of course graph-creation will be slightly longer, maybe allow the user to choose (between a more "alga" structure and a quick one) can be a good idea.
For comparison, the creation of a clique with 1000 vertices from a list of edges with Fgl.PatriciaTree takes 4 seconds, meanwhile for the actual Algebraic.Graph it takes 500 ms

from alga.

nobrakal commented on September 24, 2024

So concerning edgesOrd, I made some experiments:

edgesOrd :: Ord a => [(a, a)] -> Graph a
edgesOrd = Map.foldrWithKey (\k v acc -> Overlay acc $ Connect (Vertex k) $ vertices1 v) Empty . Map.fromListWith (<>) . map (fmap return)
  where
    vertices1 (x :| xs) = foldr (Overlay . vertex) (vertex x) xs

It build an adjacency map and convert it to a Graph:
edgesOrd ensure a size x == edgeCount x + vertexCount x while edges give size x == 2 * edgeCount x + 1

Then benchmarks are rather good, except for some functions ^^:

vertexList: 0.67 (good)
isEmpty: 359.36 (bad)
vertexCount: 0.66 (good)
hasVertex: 84714.52 (bad)
edgeCount: 0.70 (good)
edgeList: 0.73 (good)
hasEdge: 0.58 (good)
hasSelfLoop: 0.58 (good)
addEdge: 1.15 (bad)
addVertex: 1.21 (bad)
removeVertex: 0.68 (good)
equality: 0.81 (good)
removeEdge: 0.48 (good)
transpose: 1.33 (bad)
dff: 0.95 (OK)
topSort: 0.96 (OK)
edges: 10.23 (bad)

So the main comment is that it is a good idea, but can lead to some problems (see hasVertex). While still being faster than Fgl (Fgl takes 3.2s seconds to build a clique with 1000 vertices while using edgesOrd alga takes only 500ms).

hasVertex can be optimized in hasVertexOrd and I think others operations too, but this require to store somewhere the fact that the graph was created with edgesOrd, because it give a predictable structure.

from alga.

snowleopard commented on September 24, 2024

@nobrakal Interesting :) I think your implementation is not lazy enough (you build the whole map before producing the graph) and also tries to do too much work by eliminating duplicated edges, which leads to significant slowdown on some graph algorithms. Does switching to Data.Map.Lazy help?

What about trying an intermediate solution, where you build something like Map a [a], where the lists are allowed to have duplicates?

Perhaps, you could even use something simpler like group possibly after sort:

https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List.html#v:group

Keep in mind that Map is an expensive data structure, which supports fast searching, but you do not need this in edgesOrd, so you are currently doing more work than necessary.

from alga.

nobrakal commented on September 24, 2024

also tries to do too much work by eliminating duplicated edges ?

Am I trying to remove duplicated edges ? The type of the map is Map a (NonEmpty a), why this implicate removing duplicated edges ?

But you are right, I will explore others solutions :)

from alga.

snowleopard commented on September 24, 2024

Am I trying to remove duplicated edges ?

Oops, I didn't spot that you use Map.fromListWith (<>) -- indeed, this does not eliminate duplicates. Still I have a feeling you are doing more work than necessary by storing intermediate results in a balanced tree.

from alga.

nobrakal commented on September 24, 2024

Yep, you were right, using groupBy:

edgesEq :: Eq a => [(a, a)] -> Graph a
edgesEq = foldr (\(v,l) g -> Overlay g $ Connect (Vertex v) l) Empty . groupByWithVertices

groupByWithVertices :: Eq a => [(a,a)] -> [(a,Graph a)]
groupByWithVertices []     = []
groupByWithVertices (x:xs) = (fst x, vertices1 x ys) : groupByWithVertices zs
  where
    eq = on (==) fst
    (ys,zs) = span (eq x) xs
    vertices1 x = foldr (Overlay . vertex . snd) (vertex $ snd x)

This behave almost like the previous try, but require only an Eq instance and is 2 times faster.

Because it is the same idea than my previous comment, resulting graphs are the same than in my previous benchmarks, and thus the benchmarks are equivalents for the version of this comment.

from alga.

snowleopard commented on September 24, 2024

@nobrakal Good! So, you are saying with the new version, hasVertex will still be around 84714x slower? I've no idea where such a huge slowdown comes from! Can you explain?

from alga.

nobrakal commented on September 24, 2024

Here is a detailed benchmark of hasVertex

### Clique
#### 1000
##### 0

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 34.91 ns | 1.000 |
|    Alga || 25.00 ms | 0.977 |
+---------++----------+-------+

##### 333

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 7.451 μs | 1.000 |
|    Alga || 11.08 ms | 0.913 |
+---------++----------+-------+

##### 666

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 14.59 μs | 0.996 |
|    Alga || 1.267 ms | 1.000 |
+---------++----------+-------+

##### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 16.09 ms | 0.974 |
|    Alga || 22.70 ms | 1.000 |
+---------++----------+-------+

### Mesh
#### 1000
##### 0

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 34.83 ns | 0.999 |
|    Alga || 40.70 μs | 1.000 |
+---------++----------+-------+

##### 333

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 13.81 μs | 0.999 |
|    Alga || 39.08 μs | 0.999 |
+---------++----------+-------+

##### 666

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 20.62 μs | 1.000 |
| AlgaOld || 29.31 μs | 1.000 |
+---------++----------+-------+

##### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 39.67 μs | 0.998 |
| AlgaOld || 46.94 μs | 1.000 |
+---------++----------+-------+

I am a bit lost since the slowdown really come from "simple" foldg. hasEdge is faster with graphs built with edgesEq for example.
The only explanation I see is that we are building huge Boolean thunks since the graph it a bit more "balanced" than one constructed with edges . Such a huge drop is still very problematic, it seems very high for only a strictness problem...

Even stranger, for a Clique the new representation is always worse than something created with edges...

from alga.

snowleopard commented on September 24, 2024

The only explanation I see is that we are building huge Boolean thunks since the graph it a bit more "balanced" than one constructed with edges .

@nobrakal You can test this hypothesis very easily: just generate a balanced Graph expression, for example (1 + 1) + (1 + 1) scaled to a few layers and see how it compares to a linear chain 1 + (1 + (1 + 1)).

P.S.: Comparing to ((1 + 1) + 1) + 1 is also interesting. I've always wondered how this works, but never tried! So, it's a useful experiment.

from alga.

nobrakal commented on September 24, 2024

Here are some benchs, using:

balanced, a balanced graph expression with 32768 of type Int
vertices, a graph obtained from vertices $ replicate 32768 1
transposedVertices, a graph obtained with foldg Empty Vertex (flip Overlay) (flip Connect) $ vertices $ replicate 32768 1

benchmarking hasVertex 0/balanced
time                 542.5 μs   (537.6 μs .. 547.7 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 542.6 μs   (537.2 μs .. 547.3 μs)
std dev              16.68 μs   (13.36 μs .. 22.13 μs)
variance introduced by outliers: 23% (moderately inflated)

benchmarking hasVertex 0/vertices
time                 436.1 μs   (427.2 μs .. 448.1 μs)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 440.5 μs   (435.4 μs .. 447.4 μs)
std dev              20.03 μs   (15.94 μs .. 24.05 μs)
variance introduced by outliers: 40% (moderately inflated)

benchmarking hasVertex 0/transposedVertices
time                 811.5 μs   (801.9 μs .. 824.4 μs)
                     0.997 R²   (0.994 R² .. 1.000 R²)
mean                 814.9 μs   (807.5 μs .. 827.8 μs)
std dev              31.26 μs   (19.64 μs .. 53.27 μs)
variance introduced by outliers: 29% (moderately inflated)

benchmarking hasVertex 1/balanced
time                 136.6 ns   (135.5 ns .. 138.1 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 136.1 ns   (135.8 ns .. 136.9 ns)
std dev              1.471 ns   (919.7 ps .. 2.818 ns)

benchmarking hasVertex 1/vertices
time                 27.42 ns   (26.67 ns .. 28.39 ns)
                     0.995 R²   (0.992 R² .. 0.997 R²)
mean                 27.50 ns   (27.01 ns .. 28.12 ns)
std dev              1.859 ns   (1.468 ns .. 2.554 ns)
variance introduced by outliers: 83% (severely inflated)

benchmarking hasVertex 1/transposedVertices
time                 604.8 μs   (598.1 μs .. 612.0 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 597.2 μs   (593.9 μs .. 602.6 μs)
std dev              13.72 μs   (9.668 μs .. 19.87 μs)
variance introduced by outliers: 14% (moderately inflated)

So the balanced option is less efficient than the one obtained with vertices (and a reversed structure is not very great ^^).

Also, I spotted something, with the benchs: I was not testing the last vertex of the domain, and this test is faster (the same problem that we had when improving NotEmptyGraph: the last vertex comes first in the representation):

##### 999

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 13.18 μs | 1.000 |
| AlgaOld || 21.89 μs | 0.999 |
+---------++----------+-------+

But even for the "better positioned" vertex, the time is not so great (comparing to the ~30ns for the better positioned vertex for actual edges

from alga.

nobrakal commented on September 24, 2024

Your comment about the difference between 1 + ( +1 ( 1 + ... )) and ( ( ... + 1 ) + 1 ) + 1 leads me to re-considerate the function, and in fact I was producing a somewhere bad representation (like ( ( ... + 1 ) + 1 ) + 1 ).
Inverting the two arguments of Overlay leads to more reasonable results ^^:

### Clique
#### 1000
##### 0

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 33.36 ns | 1.000 |
|    Alga || 36.01 ns | 0.999 |
+---------++----------+-------+

##### 333

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 3.837 μs | 1.000 |
| AlgaOld || 7.245 μs | 1.000 |
+---------++----------+-------+

##### 666

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 9.044 μs | 1.000 |
| AlgaOld || 17.36 μs | 1.000 |
+---------++----------+-------+

##### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 14.12 ms | 0.970 |
|    Alga || 22.65 ms | 0.981 |
+---------++----------+-------+

### Mesh
#### 1000
##### 0

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 33.34 ns | 1.000 |
|    Alga || 33.48 ns | 1.000 |
+---------++----------+-------+

##### 333

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 11.69 μs | 1.000 |
| AlgaOld || 15.84 μs | 1.000 |
+---------++----------+-------+

##### 666

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 22.40 μs | 1.000 |
| AlgaOld || 30.80 μs | 1.000 |
+---------++----------+-------+

##### 1000

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 35.34 μs | 1.000 |
| AlgaOld || 47.06 μs | 0.999 |
+---------++----------+-------+


SUMMARY:

 * Alga was the fastest 5 times
 * AlgaOld was the fastest 1 times

 There was 2 ex-aequo

The function used was:

edges :: [(a, a)] -> Graph a
edges = overlays . map (uncurry edge)

edgesEq :: Eq a => [(a, a)] -> Graph a
edgesEq = foldr (\(v,l) -> Overlay (Connect (Vertex v) l)) Empty . groupByWithVertices

groupByWithVertices :: Eq a => [(a,a)] -> [(a,Graph a)]
groupByWithVertices []     = []
groupByWithVertices (x:xs) = (fst x, vertices1 x ys) : groupByWithVertices zs
  where
    eq = on (==) fst
    (ys,zs) = span (eq x) xs
    vertices1 x = foldr (Overlay . vertex . snd) (vertex $ snd x)

I am currently running the whole suite to see if it is introducing other problems !

from alga.

nobrakal commented on September 24, 2024

vertexList: 0.71 (good)
isEmpty: 1.00 (OK)
vertexCount: 0.65 (good)
hasVertex: 0.83 (good)
edgeCount: 0.60 (good)
edgeList: 0.58 (good)
hasEdge: 1.64 (bad)
hasSelfLoop: 0.57 (good)
addEdge: 1.07 (OK)
addVertex: 1.15 (bad)
removeVertex: 0.69 (good)
equality: 0.56 (good)
removeEdge: 0.49 (good)
transpose: 0.83 (good)
dff: 1.08 (OK)
topSort: 0.99 (OK)
edges: 2.15 (bad)

Now hasEdge is not so happy... I will try with #97

from alga.

nobrakal commented on September 24, 2024

Comparing the #97 modification plus edgesEq and only the #97 modification, I get a:

hasEdge: 0.75 (good)

So I think all seems good here

from alga.

snowleopard commented on September 24, 2024

@nobrakal So, speaking very roughtly, the algorithms get a small (say 30%) speed up at the cost of 2x slower graph construction. I guess this is a good tradeoff, but I hope we can find further improvements for the edges function, so that it achieves better compression.

from alga.

nobrakal commented on September 24, 2024

Yep, this is certainly possible, requiring an Ord instance.

I played a bit, and came to an horrible function I won't show here (one can find it at: https://github.com/nobrakal/alga/blob/edgesOrd/src/Algebra/Graph.hs#L313 ).

Roughly the idea I had was to build an adjacency map, and this time, to use it: Each time I am trying to connect two vertices x and y, I search if the two are connected to some vertices, c d e, so I can create:
x * ( y * (c + d + e))
So this is an improvement from the first version because it reduce the duplication of vertices c d and e.

So it compresses greatly:

*Algebra.Graph> let x = edgeList $ clique [1..1000] :: [(Int,Int)]
*Algebra.Graph> length x
499500
*Algebra.Graph> size $ edges x
999001
*Algebra.Graph> size $ edgesEq x
500500
*Algebra.Graph> size $ edgesOrd x
251498

But, sadly, I cannot come to very descent performances:

edges: 35.71 (bad)

Using specialized IntMap and IntSet:

edges: 11.17 (bad)

The benchmarks:

vertexList: 0.50 (good)
isEmpty: 0.96 (OK)
vertexCount: 0.47 (good)
hasVertex: 48439.75 (bad)
edgeCount: 0.56 (good)
edgeList: 0.55 (good)
hasEdge: 0.48 (good)
hasSelfLoop: 0.40 (good)
addEdge: 0.75 (good)
addVertex: 0.78 (good)
removeVertex: 0.51 (good)
equality: 0.44 (good)
removeEdge: 0.43 (good)
edges: 12.28 (bad)

Again an issue with hasVertex otherwise numbers are rather but not extraordinary better.

I don't know if it worth investigating and optimizing this solution

from alga.

nobrakal commented on September 24, 2024

Ok, I have managed to find a better version of edgesOrd:

vertexList: 0.53 (good)
isEmpty: 1.00 (OK)
vertexCount: 0.53 (good)
hasVertex: 0.76 (good)
edgeCount: 0.53 (good)
edgeList: 0.52 (good)
hasEdge: 0.42 (good)
hasSelfLoop: 0.47 (good)
addEdge: 0.64 (good)
addVertex: 0.75 (good)
removeVertex: 0.48 (good)
equality: 0.43 (good)
removeEdge: 0.42 (good)
transpose: 0.60 (good)
reachable: 0.67 (good)
edges: 12.66 (bad)

It is 10 times slower than a regular edges for every type that is an instance of Ord , and corrected the previous problem with hasVertex.
For Int vertices, it is only 7 times slower.

Code is here: https://github.com/nobrakal/alga/blob/edgesOrd/src/Algebra/Graph.hs#L314

I will investigate further improvements :)

from alga.

nobrakal commented on September 24, 2024

I made some further improvements (https://github.com/nobrakal/alga/blob/edgesOrd/src/Algebra/Graph.hs#L312)

With a general Ord constraint

edges: 6.94 (bad)

Specialized to Int

edges: 4.05 (bad)

size is also smaller:

> let x = edgeList $ clique [1..1000] :: [(Int,Int)]
> size $ edgesOrd x
250501

from alga.

snowleopard commented on September 24, 2024

@nobrakal Many thanks for ongoing experiments!

The latest solution gives around 50% speed up across API, but it's quite complex and doesn't seem to provide any useful guarantees/properties for resulting expressions. We could, of course, include it into the API with a comment "This function is around 5x slower than edges, but if you use it to construct graphs, all other functions are probably going to be 2x times faster", but this seems a bit vague and ad hoc.

If we are prepared to pay a lot in terms of performance and complexity, perhaps, it is worth looking into a more principled approach, for example, based on modular graph decomposition? Then we will at least be able to provide certain guarantees about the result, for example, that all graph modules have been collapsed into sub-expressions.

Of course, it's tempting to add edgesOrd to improve Alga's standing in the benchmarks suite, but will Alga users actually want to construct graphs using edgesOrd? Maybe they will prefer to construct graphs algebraically, which gives direct control over the resulting graph structure.

P.S.: By the way, I guess you could use the star combinator in a few places in your implementation to get some simplification.

from alga.

snowleopard commented on September 24, 2024

Perhaps, edgesOrd should actually be called stars and take a list [(a, [a])] as input. Then there may be a separate function somewhere in Algebra.Graph.Utilities that turns a list of edges into a list of adjacency lists. This will make the API simpler to understand and explain.

(But your latest implementation may be doing more than just stars underneath!).

from alga.

nobrakal commented on September 24, 2024

provide any useful guarantees/properties for resulting expressions

Yep, we only know that size (edgesOrd x) <= length x + nbVertices x. So when possible, it is better than edgesEq.

Note that the previous edgesEq worked only with sorted list of edges (Since I did all my tests with edgesList to produce a list of edges...). So we need an Ord instance anyway, to sort things :)

but will Alga users actually want to construct graphs using edgesOrd?

When building a graph with Int vertices, or something with an Ord instance not too expensive, it can certainly be interesting. Since actual edges is extremely fast, one can chose to spend some time to build a better graph for later use (and edgesOrd is faster than FGL edges equivalent anyway).

Maybe they will prefer to construct graphs algebraically, which gives direct control over the resulting graph structure.

Obviously this is the better thing to do :)

based on modular graph decomposition?

It certainly worth a try, but as you said this a complex thing, and the implementation has to be a bit optimized if don't want to take many time to build a graph !

edgesOrd should actually be called stars

I build a list of (a,Set a) (NonEmpty Set to be more precise), so I can't use such a function, and having starsOrd does not seems very useful...

from alga.

snowleopard commented on September 24, 2024

I build a list of (a,Set a) (NonEmpty Set to be more precise), so I can't use such a function, and having starsOrd does not seems very useful...

I don't understand. Let's assume we add the following function to the API:

stars :: [(a, [a])] -> Graph a
stars = overlays . map (uncurry star)

Then, given xs :: [(a, Set a)] you can call call stars as stars . map (\(x, s) -> (x, toList s)).

from alga.

nobrakal commented on September 24, 2024

Sadly, using stars drop a bit performances:

edges: 10.25 (bad)

(As a reminder, without stars (so with graphs constructed in place:

edges: 8.65 (bad)

)

from alga.

snowleopard commented on September 24, 2024

@nobrakal Strange. Presumably, you should have exactly the same code in your implementation, since you are getting a bunch of stars in the end, isn't it? Could you explain the slowdown?

from alga.

nobrakal commented on September 24, 2024

You are right. I have reworked it to use non-empty lists when possible and thus avoiding a pattern match on the empty list, and I dropped the result to a:

edges: 9.48 (bad)

What left of the slowdown comes I think to an optimization that does not happen in these intricate mix of foldr/map

from alga.

snowleopard commented on September 24, 2024

What left of the slowdown comes I think to an optimization that does not happen in these intricate mix of foldr/map

I would expect everything to be the same if stars is inlined. It would be nice to eliminate any overhead left due to having a separate function.

from alga.

Improve edges function of Algebraic.Graph about alga HOT 27 OPEN

Comments (27)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent