Hi, This is just for the record: I saw: <a href="h

One simple tweak could probably be as follows: <div class="highlight highlight-sou

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Benchmark hasEdge,about snowleopard/alga

Comments (23)

snowleopard commented on September 24, 2024

Interesting, thanks! I don't fully understand what you benchmarked, can you show the definition you used for the Alga (new)?

Could it be that we should specialise ToGraph code to Graph or simply copy the code?

from alga.

nobrakal commented on September 24, 2024

Sorry if I wasn't clear.

For the FIRST table, I used (copy/pasted from ToGraph)

hasEdge s t g = case foldg e v o c g of (_, _, r) -> r
      where
        e                             = (False   , False   , False                 )
        v x                           = (x == s  , x == t  , False                 )
        o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
        c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

For the SECOND, I used

hasEdge :: Ord a => a -> a -> Graph a -> Bool
hasEdge s t = (\g -> case foldg e v o c g of (_, _, r) -> r) . induce (`elem` [s, t])
   where
     e                             = (False   , False   , False                 )
     v x                           = (x == s  , x == t  , False                 )
     o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
     c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

from alga.

snowleopard commented on September 24, 2024

Wow, that's pretty surprising to me. So, essentially, on clique we get a 10x slowdown, and the only difference for clique is the term xs && yt.

What about making the third field (or maybe all of them) of the tuple strict? Perhaps, we are accumulating huge unevaluated trees of Boolean expressions in the process?

from alga.

snowleopard commented on September 24, 2024

One simple tweak could probably be as follows:

hasEdge s t g = case foldg e v o c g of (_, _, r) -> r
      where
        e                             = (False   , False   , False)
        v x                           = (x == s  , x == t  , False)
        o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xst || yst)
        c (_, _, True)  (_, _, _)     = (True, True, True)
        c (_, _, _)     (_, _, True)  = (True, True, True)
        c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt)

This essentially forces the third field for the connect case. But we could make it more elegant I'm sure.

from alga.

nobrakal commented on September 24, 2024

Sorry for taking so long to answer. Your implementation is faster but not enough:

Clique

	1000
Alga	692.3 ms
AlgaOld	94.69 ms

Note that this is certainly the fault of edges. I have forget it, but I am running benchmarks using edges, and here not clique. For example, with the clique from alga:

Clique

	1000
ToGraph	58.46 μs
ForcingValues	61.72 μs
Actual	48.52 μs

Anyway, one advantage of the ToGraph function is that it require only an Eq instance, and not an Ord one like the actual alga is requiring (because of the use of isSubgraphOf

from alga.

snowleopard commented on September 24, 2024

Note that this is certainly the fault of edges.

I don't understand what you mean: both algorithms are working on exactly the same input, right? It's not the fault of the input that one algorithm takes 7x longer than another :)

from alga.

nobrakal commented on September 24, 2024

Mmh, yes your are right, I wanted to say to we must not forget that we are only using a very big stack of overlayed edges from edges :)

from alga.

nobrakal commented on September 24, 2024

I spent some time trying to improve your implementation, and I think that it is maybe not worth spending time here, the actual implementation is faster everywhere (on real graphs made using edges and on using alga's clique).

The only improvement that can be made I think is something like:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge u v = hasEdge' . induce (`elem` [u, v])
  where
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y
    hasEdge' (Connect x y) = let !isInLeft = hasVertex u x
                                 !isInRight = hasVertex v y
                              in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (hasEdge' y && isInRight)

    hasEdge' _ = False

Because it drop the Ord Constraint.

It can also be great to define hasLoop, which will be faster than hasEdge x x:

hasLoop x = hasLoop' . induce (==x)
  where
    hasLoop' (Overlay x y) = hasLoop' x || hasloop' y
    hasLoop' Connect{} = True
    hasLoop' _ = False

Which I think is working because induce remove Empty leaves.

from alga.

snowleopard commented on September 24, 2024

@nobrakal I don't think your hasLoop is correct, it will return True for Connect Empty Empty :) Unless, of course, you rely on the feature of (some) implementations of induce that do constant folding to eliminate Empty whenever possible.

from alga.

nobrakal commented on September 24, 2024

Yep totally, I was thinking with Algebra.Graph.induce which throw away empty leaves

from alga.

snowleopard commented on September 24, 2024

@nobrakal Right, I see. I don't mind adding hasLoop into the API, so feel free to do that, but consistently across various modules.

So, what's the conclusion regarding hasEdge? I'm confused by so many different versions we discussed :) What is the most efficient implementation?

from alga.

nobrakal commented on September 24, 2024

I don't mind adding hasLoop into the API, so feel free to do that, but consistently across various modules.

You don't mind to do it yourself but will merge if I do it ? I prefer to ask ^^

So, what's the conclusion regarding hasEdge? I'm confused by so many different versions we discussed :)

I was also ^^.

The actual one is the fastest, with graphs constructed with edges or with clique.
But a somewhat better version is:

hasEdge :: Eq a => a -> a -> Graph a -> Bool hasEdge u v = hasEdge' . induce (`elem` [u, v])
  where 
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y 
    hasEdge' (Connect x y) = 
      let !isInLeft = hasVertex u x 
           !isInRight = hasVertex v y 
       in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (hasEdge' y && isInRight)
    hasEdge' _ = False

This improve a bit the time of the function (it is about 1.07 times faster), and mostly drop the Ord instance required on vertices because of isSubgraphOf in the current implementation. It is now only requiring Eq.
The "naive" implementation of hasEdge' works because the induced subgraph is very tiny thanks to the absence of Empty leaves

from alga.

snowleopard commented on September 24, 2024

You don't mind to do it yourself but will merge if I do it ? I prefer to ask ^^

I don't mind either, but I won't find time to do it myself in the next couple of weeks, so feel free to do it :-)

OK, let's switch to your improved implementation -- it's not immediately obvious it is correct but it actually looks quite nice. Please send a PR!

from alga.

nobrakal commented on September 24, 2024

I was going to send a PR, and while verifying my results, I benchmarked to hasEdge' function alone. It is quite better with graphs constructed with edges, but slower with clique: Here is the detailed benchmarks:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge u v = hasEdge'
  where
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y
    hasEdge' (Connect x y) =
      let !isInLeft = hasVertex u x
          !isInRight = hasVertex v y
       in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (isInRight && hasEdge' y)
    hasEdge' _ = False

So without the suffixed induce. It is really faster hand drop the Ord instance:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 62.98 ns | 1.000 |
| AlgaOld || 96.58 ms | 0.966 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 9.132 ms | 1.000 |
| AlgaOld || 96.28 ms | 0.984 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 20.61 ms | 1.000 |
| AlgaOld || 98.32 ms | 0.980 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 19.01 ms | 0.999 |
| AlgaOld || 92.89 ms | 0.982 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 26.18 ms | 0.999 |
| AlgaOld || 88.50 ms | 0.975 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 18.12 ms | 0.997 |
| AlgaOld || 91.48 ms | 0.983 |
+---------++----------+-------+

The first three tested edges are edges in the graph, the last three are not in the graph.
Basically, in addition to being faster to find edges not in the graph, it is really faster to find edges "in the front" of the graph.

Sadly this is "less" true for graph constructed using clique, since it is slower for edges in the graph but "at the end", but faster in other cases:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 62.20 ns | 1.000 |
| AlgaOld || 46.53 μs | 1.000 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.41 μs | 1.000 |
|    Alga || 2.486 ms | 1.000 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.42 μs | 1.000 |
|    Alga || 5.969 ms | 0.999 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.29 μs | 1.000 |
| AlgaOld || 46.69 μs | 1.000 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.20 μs | 1.000 |
| AlgaOld || 45.93 μs | 0.999 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.34 μs | 0.998 |
| AlgaOld || 47.13 μs | 1.000 |
+---------++----------+-------+

from alga.

snowleopard commented on September 24, 2024

Wow, what happens here?

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.42 μs | 1.000 |
|    Alga || 5.969 ms | 0.999 |
+---------++----------+-------+

It's 100 times slower here! Could you investigate why?

from alga.

nobrakal commented on September 24, 2024

The clique is in the form Connect 0 (Connect 1 (Connect 2)), so because I run hasVertex for every Connect node, this take very long time if we search an edge "at then end".

That is why it works great with #80 representation ^^
I suspect that it will be great fo "balanced" graphs, with similar of nodes on each part of a "Connect"...

from alga.

snowleopard commented on September 24, 2024

Ah, of course -- the version that calls hasVertex actually has quadratic complexity. That's why I used triples in my implementation -- to memoize the invocations of hasVertex.

from alga.

nobrakal commented on September 24, 2024

I tried to do some memoization, but nothing great :p

So do you think we can approach the time of the version using hasVertex but using a better memoization ? Or should we just stick to the previous version (here #84 (comment) ).

Or even propose the two versions ? The hasVertex solution is still faster with graph constructed with edges and by far

from alga.

snowleopard commented on September 24, 2024

So do you think we can approach the time of the version using hasVertex but using a better memoization ?

Well, you are comparing an O(n) implementation against an O(n^2) one. Of course the latter will always lose on large benchmarks.

Or even propose the two versions ?

I don't think there is any point in including a quadratic version into the library when there is a linear implementation. Think why the quadratic version manages to be faster on some examples -- perhaps that will help you figure out how to improve the linear version too!

from alga.

nobrakal commented on September 24, 2024

Think why the quadratic version manages to be faster on some examples

It runs hasVertex only whenConnect is encountered, and with a graph constructed with edges, Connect is always like Connect (Vertex a) (Vertex b). So hasVertex is fast :)

I spent again some time trying to find a better version, but didn't find anything, so maybe just dropping the Ord instance can be great for now (as discussed in #84 (comment) )

from alga.

snowleopard commented on September 24, 2024

@nobrakal Sorry, I'm not sure which implementation you refer to. Perhaps, just create a PR and we'll discuss it there :)

from alga.

nobrakal commented on September 24, 2024

I think this is closed with your super version of hasEdge ?

from alga.

snowleopard commented on September 24, 2024

@nobrakal Yes, thank you. I think we can close this.

from alga.

Benchmark hasEdge about alga HOT 23 CLOSED

Comments (23)

Clique

Clique

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent