Giter Site home page Giter Site logo

Benchmark hasEdge about alga HOT 23 CLOSED

snowleopard avatar snowleopard commented on September 24, 2024
Benchmark hasEdge

from alga.

Comments (23)

snowleopard avatar snowleopard commented on September 24, 2024

Interesting, thanks! I don't fully understand what you benchmarked, can you show the definition you used for the Alga (new)?

Could it be that we should specialise ToGraph code to Graph or simply copy the code?

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

Sorry if I wasn't clear.

For the FIRST table, I used (copy/pasted from ToGraph)

hasEdge s t g = case foldg e v o c g of (_, _, r) -> r
      where
        e                             = (False   , False   , False                 )
        v x                           = (x == s  , x == t  , False                 )
        o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
        c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

For the SECOND, I used

hasEdge :: Ord a => a -> a -> Graph a -> Bool
hasEdge s t = (\g -> case foldg e v o c g of (_, _, r) -> r) . induce (`elem` [s, t])
   where
     e                             = (False   , False   , False                 )
     v x                           = (x == s  , x == t  , False                 )
     o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt,             xst || yst)
     c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt || xst || yst)

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

Wow, that's pretty surprising to me. So, essentially, on clique we get a 10x slowdown, and the only difference for clique is the term xs && yt.

What about making the third field (or maybe all of them) of the tuple strict? Perhaps, we are accumulating huge unevaluated trees of Boolean expressions in the process?

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

One simple tweak could probably be as follows:

hasEdge s t g = case foldg e v o c g of (_, _, r) -> r
      where
        e                             = (False   , False   , False)
        v x                           = (x == s  , x == t  , False)
        o (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xst || yst)
        c (_, _, True)  (_, _, _)     = (True, True, True)
        c (_, _, _)     (_, _, True)  = (True, True, True)
        c (xs, xt, xst) (ys, yt, yst) = (xs || ys, xt || yt, xs && yt)

This essentially forces the third field for the connect case. But we could make it more elegant I'm sure.

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

Sorry for taking so long to answer. Your implementation is faster but not enough:

Clique

1000
Alga 692.3 ms
AlgaOld 94.69 ms

Note that this is certainly the fault of edges. I have forget it, but I am running benchmarks using edges, and here not clique. For example, with the clique from alga:

Clique

1000
ToGraph 58.46 μs
ForcingValues 61.72 μs
Actual 48.52 μs

Anyway, one advantage of the ToGraph function is that it require only an Eq instance, and not an Ord one like the actual alga is requiring (because of the use of isSubgraphOf

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

Note that this is certainly the fault of edges.

I don't understand what you mean: both algorithms are working on exactly the same input, right? It's not the fault of the input that one algorithm takes 7x longer than another :)

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

Mmh, yes your are right, I wanted to say to we must not forget that we are only using a very big stack of overlayed edges from edges :)

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

I spent some time trying to improve your implementation, and I think that it is maybe not worth spending time here, the actual implementation is faster everywhere (on real graphs made using edges and on using alga's clique).

The only improvement that can be made I think is something like:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge u v = hasEdge' . induce (`elem` [u, v])
  where
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y
    hasEdge' (Connect x y) = let !isInLeft = hasVertex u x
                                 !isInRight = hasVertex v y
                              in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (hasEdge' y && isInRight)

    hasEdge' _ = False

Because it drop the Ord Constraint.

It can also be great to define hasLoop, which will be faster than hasEdge x x:

hasLoop x = hasLoop' . induce (==x)
  where
    hasLoop' (Overlay x y) = hasLoop' x || hasloop' y
    hasLoop' Connect{} = True
    hasLoop' _ = False 

Which I think is working because induce remove Empty leaves.

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

@nobrakal I don't think your hasLoop is correct, it will return True for Connect Empty Empty :) Unless, of course, you rely on the feature of (some) implementations of induce that do constant folding to eliminate Empty whenever possible.

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

Yep totally, I was thinking with Algebra.Graph.induce which throw away empty leaves

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

@nobrakal Right, I see. I don't mind adding hasLoop into the API, so feel free to do that, but consistently across various modules.

So, what's the conclusion regarding hasEdge? I'm confused by so many different versions we discussed :) What is the most efficient implementation?

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

I don't mind adding hasLoop into the API, so feel free to do that, but consistently across various modules.

You don't mind to do it yourself but will merge if I do it ? I prefer to ask ^^

So, what's the conclusion regarding hasEdge? I'm confused by so many different versions we discussed :)

I was also ^^.

The actual one is the fastest, with graphs constructed with edges or with clique.
But a somewhat better version is:

hasEdge :: Eq a => a -> a -> Graph a -> Bool hasEdge u v = hasEdge' . induce (`elem` [u, v])
  where 
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y 
    hasEdge' (Connect x y) = 
      let !isInLeft = hasVertex u x 
           !isInRight = hasVertex v y 
       in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (hasEdge' y && isInRight)
    hasEdge' _ = False

This improve a bit the time of the function (it is about 1.07 times faster), and mostly drop the Ord instance required on vertices because of isSubgraphOf in the current implementation. It is now only requiring Eq.
The "naive" implementation of hasEdge' works because the induced subgraph is very tiny thanks to the absence of Empty leaves

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

You don't mind to do it yourself but will merge if I do it ? I prefer to ask ^^

I don't mind either, but I won't find time to do it myself in the next couple of weeks, so feel free to do it :-)

OK, let's switch to your improved implementation -- it's not immediately obvious it is correct but it actually looks quite nice. Please send a PR!

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

I was going to send a PR, and while verifying my results, I benchmarked to hasEdge' function alone. It is quite better with graphs constructed with edges, but slower with clique: Here is the detailed benchmarks:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge u v = hasEdge'
  where
    hasEdge' (Overlay x y) = hasEdge' x || hasEdge' y
    hasEdge' (Connect x y) =
      let !isInLeft = hasVertex u x
          !isInRight = hasVertex v y
       in (isInLeft && isInRight) || (isInLeft && hasEdge' x) || (isInRight && hasEdge' y)
    hasEdge' _ = False

So without the suffixed induce. It is really faster hand drop the Ord instance:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 62.98 ns | 1.000 |
| AlgaOld || 96.58 ms | 0.966 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 9.132 ms | 1.000 |
| AlgaOld || 96.28 ms | 0.984 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 20.61 ms | 1.000 |
| AlgaOld || 98.32 ms | 0.980 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 19.01 ms | 0.999 |
| AlgaOld || 92.89 ms | 0.982 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 26.18 ms | 0.999 |
| AlgaOld || 88.50 ms | 0.975 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 18.12 ms | 0.997 |
| AlgaOld || 91.48 ms | 0.983 |
+---------++----------+-------+

The first three tested edges are edges in the graph, the last three are not in the graph.
Basically, in addition to being faster to find edges not in the graph, it is really faster to find edges "in the front" of the graph.

Sadly this is "less" true for graph constructed using clique, since it is slower for edges in the graph but "at the end", but faster in other cases:

### Clique
#### 1000
##### (0,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 62.20 ns | 1.000 |
| AlgaOld || 46.53 μs | 1.000 |
+---------++----------+-------+

##### (292,820)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.41 μs | 1.000 |
|    Alga || 2.486 ms | 1.000 |
+---------++----------+-------+

##### (997,999)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.42 μs | 1.000 |
|    Alga || 5.969 ms | 0.999 |
+---------++----------+-------+

##### (0,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.29 μs | 1.000 |
| AlgaOld || 46.69 μs | 1.000 |
+---------++----------+-------+

##### (1,0)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.20 μs | 1.000 |
| AlgaOld || 45.93 μs | 0.999 |
+---------++----------+-------+

##### (1,1)

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
|    Alga || 12.34 μs | 0.998 |
| AlgaOld || 47.13 μs | 1.000 |
+---------++----------+-------+

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

Wow, what happens here?

+---------++----------+-------+
|         ||     Time |    R² |
+=========++==========+=======+
| AlgaOld || 52.42 μs | 1.000 |
|    Alga || 5.969 ms | 0.999 |
+---------++----------+-------+

It's 100 times slower here! Could you investigate why?

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

The clique is in the form Connect 0 (Connect 1 (Connect 2)), so because I run hasVertex for every Connect node, this take very long time if we search an edge "at then end".

That is why it works great with #80 representation ^^
I suspect that it will be great fo "balanced" graphs, with similar of nodes on each part of a "Connect"...

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

Ah, of course -- the version that calls hasVertex actually has quadratic complexity. That's why I used triples in my implementation -- to memoize the invocations of hasVertex.

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

I tried to do some memoization, but nothing great :p

So do you think we can approach the time of the version using hasVertex but using a better memoization ? Or should we just stick to the previous version (here #84 (comment) ).

Or even propose the two versions ? The hasVertex solution is still faster with graph constructed with edges and by far

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

So do you think we can approach the time of the version using hasVertex but using a better memoization ?

Well, you are comparing an O(n) implementation against an O(n^2) one. Of course the latter will always lose on large benchmarks.

Or even propose the two versions ?

I don't think there is any point in including a quadratic version into the library when there is a linear implementation. Think why the quadratic version manages to be faster on some examples -- perhaps that will help you figure out how to improve the linear version too!

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

Think why the quadratic version manages to be faster on some examples

It runs hasVertex only whenConnect is encountered, and with a graph constructed with edges, Connect is always like Connect (Vertex a) (Vertex b). So hasVertex is fast :)

I spent again some time trying to find a better version, but didn't find anything, so maybe just dropping the Ord instance can be great for now (as discussed in #84 (comment) )

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

@nobrakal Sorry, I'm not sure which implementation you refer to. Perhaps, just create a PR and we'll discuss it there :)

from alga.

nobrakal avatar nobrakal commented on September 24, 2024

I think this is closed with your super version of hasEdge ?

from alga.

snowleopard avatar snowleopard commented on September 24, 2024

@nobrakal Yes, thank you. I think we can close this.

from alga.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.