Deion I have compiled kahypar as a dynamic library l

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-ho

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Adding kahypar dynamic library in compilation command makes my program much slower about kahypar HOT 8 CLOSED

whwang1996 commented on June 2, 2024

Adding kahypar dynamic library in compilation command makes my program much slower

from kahypar.

Comments (8)

SebastianSchlag commented on June 2, 2024

Hey @whwang1996,

Thanks for reaching out. This does indeed sound interesting. I wonder whether there is something wrong with your timing code, though. 37.8s seems awfully slow and we haven't seen any user reports like this so far.

Could you repeat your experiments using std::chrono to measure times?

KaHyPar does a bit of work during startup by registering all of its core components/algorithms (see https://github.com/kahypar/kahypar/tree/master/kahypar/partition/registries), but that should be really fast.

@kittobi1992 @N-Maas Do you guys have any ideas?

from kahypar.

whwang1996 commented on June 2, 2024

chrono

Hey @whwang1996,

Thanks for reaching out. This does indeed sound interesting. I wonder whether there is something wrong with your timing code, though. 37.8s seems awfully slow and we haven't seen any user reports like this so far.

Could you repeat your experiments using std::chrono to measure times?

KaHyPar does a bit of work during startup by registering all of its core components/algorithms (see https://github.com/kahypar/kahypar/tree/master/kahypar/partition/registries), but that should be really fast.

@kittobi1992 @N-Maas Do you guys have any ideas?

I have switched to std::chrono for time measurement, but I get the same result.

#include <iostream>
#include <chrono>
#include <vector>


int main () {
  double total_time = 0.0;
  size_t len = 300000;

  for (int t = 0; t < 100; t++) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<double> vec(len, 5.0);

    double temp = 0.0;
    for (int i = 0; i < 100; i++) {
      for (int j = 0; j < 10; j++) {
        temp++;

        for (int k = 0; k < len; k++) {
          auto& a = vec[k];
          a = k + temp;
        }
      }
    }

    total_time += std::chrono::duration<double, std::milli>(std::chrono::steady_clock::now() - t0).count();
    std::cout << "Total time: " << total_time << "ms" << std::endl;
  }
}

And here's the first 10 outputs of with/out libkahypar.so. You can see that the time cost increase evenly in each iteration, i.e., the overhead isn't caused by initialization.

With libkahypar.so:
Total time: 391.092ms
Total time: 773.97ms
Total time: 1173.96ms
Total time: 1566.65ms
Total time: 1946.47ms
Total time: 2328.42ms
Total time: 2708.54ms
Total time: 3086.39ms
Total time: 3465.13ms
Total time: 3859.36ms

Without libkahypar.so:
Total time: 105.864ms
Total time: 197.483ms
Total time: 290.303ms
Total time: 381.36ms
Total time: 472.573ms
Total time: 563.52ms
Total time: 654.548ms
Total time: 745.393ms
Total time: 836.707ms
Total time: 927.717ms

from kahypar.

N-Maas commented on June 2, 2024

Hi @whwang1996,

I tried your code but couldn't reproduce the issue for now. In both cases my timings are about twice as fast as your version without libkahypar.so, and both commands produce literally (bit for bit) the same binary. However, note that my gcc version is way newer than yours - I'm not sure whether I'll find time to try it with an older version.

I would guess that this is more an issue with your gcc than with KaHyPar, it seems likely that using a newer gcc version would fix it.

You should also note that your code is rather inappropriate for benchmarking anything meaningful. This is because the calculation is side-effect free, which means that the compiler is allowed to just completely remove your inner loops from the code. In some sense, you are thus lucky to get any timings with this code (since gcc doesn't seem smart enough to remove the loop). Therefore, when doing a micro-benchmark you need to at least ensure that the compiler does not remove the things you want to benchmark. This can be be done e.g. by actually doing something non-trivial with the computed result. However, it is generally hard to guarantee this (the Rust standard library has a function exactly for this purpose, but it is still only a best effort: https://doc.rust-lang.org/std/hint/fn.black_box.html)

I would recommend that you instead measure the code you actually care about. See e.g. https://stackoverflow.com/questions/2842695/what-is-microbenchmarking/2842705#2842705 for a discussion of the problems of micro-benchmarks.

Hope this helps!

from kahypar.

whwang1996 commented on June 2, 2024

Hi @whwang1996,

I tried your code but couldn't reproduce the issue for now. In both cases my timings are about twice as fast as your version without libkahypar.so, and both commands produce literally (bit for bit) the same binary. However, note that my gcc version is way newer than yours - I'm not sure whether I'll find time to try it with an older version.

I would guess that this is more an issue with your gcc than with KaHyPar, it seems likely that using a newer gcc version would fix it.

You should also note that your code is rather inappropriate for benchmarking anything meaningful. This is because the calculation is side-effect free, which means that the compiler is allowed to just completely remove your inner loops from the code. In some sense, you are thus lucky to get any timings with this code (since gcc doesn't seem smart enough to remove the loop). Therefore, when doing a micro-benchmark you need to at least ensure that the compiler does not remove the things you want to benchmark. This can be be done e.g. by actually doing something non-trivial with the computed result. However, it is generally hard to guarantee this (the Rust standard library has a function exactly for this purpose, but it is still only a best effort: https://doc.rust-lang.org/std/hint/fn.black_box.html)

I would recommend that you instead measure the code you actually care about. See e.g. https://stackoverflow.com/questions/2842695/what-is-microbenchmarking/2842705#2842705 for a discussion of the problems of micro-benchmarks.

Hope this helps!

Thanks for your detailed reply!

I tried on another PC (Ubuntu 22.04.2 on WSL), and I didn't reproduce this issue, even if I tried to build with gcc 7.5. I think you are right, that's maybe a problem with my own environment rather than kahypar.

In fact, that's a real issue I find in my project, adding kahypar dynamic library does slow my project a lot. The given code is a minimal reproducible version on my computer.

I will continue debugging it, and will close this issue if I find the solution.

from kahypar.

N-Maas commented on June 2, 2024

Yeah, if you have similar issues in your project that's quite unfortunate! In this case you probably need to look into the generated assembly to see what is actually going on.

I could also take a look at it if you don't have experience reading assembly, though I probably won't have time to analyze it in detail.

from kahypar.

whwang1996 commented on June 2, 2024

Yeah, if you have similar issues in your project that's quite unfortunate! In this case you probably need to look into the generated assembly to see what is actually going on.

I could also take a look at it if you don't have experience reading assembly, though I probably won't have time to analyze it in detail.

Unluckily, I've tried my best, but I still didn't find the solution. The assembly is the same with/out libkahypar.so. And the same issue didn't occur when using PaToH. So switching to PaToH is my final solution. :)

Thank you anyway!

from kahypar.

kittobi1992 commented on June 2, 2024

Hi @whwang1996,

You can also try out the shared-memory parallelization of KaHyPar:
https://github.com/kahypar/mt-kahypar
In its default configuration, it is faster than PaToH when using all threads of a modern workstation and produces much better partitions. In its highest-quality configuration, its produces comparable partitions to KaHyPar but is an order of magnitude faster when using ten threads. Let me know if you have any further questions ;)

from kahypar.

SebastianSchlag commented on June 2, 2024

Hey @whwang1996,
I'm sorry that we couldn't help you resolving this. Thank you very mich though for following up.

from kahypar.

Adding kahypar dynamic library in compilation command makes my program much slower about kahypar HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent