Giter Site home page Giter Site logo

Comments (8)

SebastianSchlag avatar SebastianSchlag commented on June 2, 2024

Hey @whwang1996,

Thanks for reaching out. This does indeed sound interesting. I wonder whether there is something wrong with your timing code, though. 37.8s seems awfully slow and we haven't seen any user reports like this so far.

Could you repeat your experiments using std::chrono to measure times?

KaHyPar does a bit of work during startup by registering all of its core components/algorithms (see https://github.com/kahypar/kahypar/tree/master/kahypar/partition/registries), but that should be really fast.

@kittobi1992 @N-Maas Do you guys have any ideas?

from kahypar.

whwang1996 avatar whwang1996 commented on June 2, 2024

chrono

Hey @whwang1996,

Thanks for reaching out. This does indeed sound interesting. I wonder whether there is something wrong with your timing code, though. 37.8s seems awfully slow and we haven't seen any user reports like this so far.

Could you repeat your experiments using std::chrono to measure times?

KaHyPar does a bit of work during startup by registering all of its core components/algorithms (see https://github.com/kahypar/kahypar/tree/master/kahypar/partition/registries), but that should be really fast.

@kittobi1992 @N-Maas Do you guys have any ideas?

I have switched to std::chrono for time measurement, but I get the same result.

#include <iostream>
#include <chrono>
#include <vector>


int main () {
  double total_time = 0.0;
  size_t len = 300000;

  for (int t = 0; t < 100; t++) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<double> vec(len, 5.0);

    double temp = 0.0;
    for (int i = 0; i < 100; i++) {
      for (int j = 0; j < 10; j++) {
        temp++;

        for (int k = 0; k < len; k++) {
          auto& a = vec[k];
          a = k + temp;
        }
      }
    }

    total_time += std::chrono::duration<double, std::milli>(std::chrono::steady_clock::now() - t0).count();
    std::cout << "Total time: " << total_time << "ms" << std::endl;
  }
}

And here's the first 10 outputs of with/out libkahypar.so. You can see that the time cost increase evenly in each iteration, i.e., the overhead isn't caused by initialization.

With libkahypar.so:
Total time: 391.092ms
Total time: 773.97ms
Total time: 1173.96ms
Total time: 1566.65ms
Total time: 1946.47ms
Total time: 2328.42ms
Total time: 2708.54ms
Total time: 3086.39ms
Total time: 3465.13ms
Total time: 3859.36ms

Without libkahypar.so:
Total time: 105.864ms
Total time: 197.483ms
Total time: 290.303ms
Total time: 381.36ms
Total time: 472.573ms
Total time: 563.52ms
Total time: 654.548ms
Total time: 745.393ms
Total time: 836.707ms
Total time: 927.717ms

from kahypar.

N-Maas avatar N-Maas commented on June 2, 2024

Hi @whwang1996,

I tried your code but couldn't reproduce the issue for now. In both cases my timings are about twice as fast as your version without libkahypar.so, and both commands produce literally (bit for bit) the same binary. However, note that my gcc version is way newer than yours - I'm not sure whether I'll find time to try it with an older version.

I would guess that this is more an issue with your gcc than with KaHyPar, it seems likely that using a newer gcc version would fix it.

You should also note that your code is rather inappropriate for benchmarking anything meaningful. This is because the calculation is side-effect free, which means that the compiler is allowed to just completely remove your inner loops from the code. In some sense, you are thus lucky to get any timings with this code (since gcc doesn't seem smart enough to remove the loop). Therefore, when doing a micro-benchmark you need to at least ensure that the compiler does not remove the things you want to benchmark. This can be be done e.g. by actually doing something non-trivial with the computed result. However, it is generally hard to guarantee this (the Rust standard library has a function exactly for this purpose, but it is still only a best effort: https://doc.rust-lang.org/std/hint/fn.black_box.html)

I would recommend that you instead measure the code you actually care about. See e.g. https://stackoverflow.com/questions/2842695/what-is-microbenchmarking/2842705#2842705 for a discussion of the problems of micro-benchmarks.

Hope this helps!

from kahypar.

whwang1996 avatar whwang1996 commented on June 2, 2024

Hi @whwang1996,

I tried your code but couldn't reproduce the issue for now. In both cases my timings are about twice as fast as your version without libkahypar.so, and both commands produce literally (bit for bit) the same binary. However, note that my gcc version is way newer than yours - I'm not sure whether I'll find time to try it with an older version.

I would guess that this is more an issue with your gcc than with KaHyPar, it seems likely that using a newer gcc version would fix it.

You should also note that your code is rather inappropriate for benchmarking anything meaningful. This is because the calculation is side-effect free, which means that the compiler is allowed to just completely remove your inner loops from the code. In some sense, you are thus lucky to get any timings with this code (since gcc doesn't seem smart enough to remove the loop). Therefore, when doing a micro-benchmark you need to at least ensure that the compiler does not remove the things you want to benchmark. This can be be done e.g. by actually doing something non-trivial with the computed result. However, it is generally hard to guarantee this (the Rust standard library has a function exactly for this purpose, but it is still only a best effort: https://doc.rust-lang.org/std/hint/fn.black_box.html)

I would recommend that you instead measure the code you actually care about. See e.g. https://stackoverflow.com/questions/2842695/what-is-microbenchmarking/2842705#2842705 for a discussion of the problems of micro-benchmarks.

Hope this helps!

Thanks for your detailed reply!

I tried on another PC (Ubuntu 22.04.2 on WSL), and I didn't reproduce this issue, even if I tried to build with gcc 7.5. I think you are right, that's maybe a problem with my own environment rather than kahypar.

In fact, that's a real issue I find in my project, adding kahypar dynamic library does slow my project a lot. The given code is a minimal reproducible version on my computer.

I will continue debugging it, and will close this issue if I find the solution.

from kahypar.

N-Maas avatar N-Maas commented on June 2, 2024

Yeah, if you have similar issues in your project that's quite unfortunate! In this case you probably need to look into the generated assembly to see what is actually going on.

I could also take a look at it if you don't have experience reading assembly, though I probably won't have time to analyze it in detail.

from kahypar.

whwang1996 avatar whwang1996 commented on June 2, 2024

Yeah, if you have similar issues in your project that's quite unfortunate! In this case you probably need to look into the generated assembly to see what is actually going on.

I could also take a look at it if you don't have experience reading assembly, though I probably won't have time to analyze it in detail.

Unluckily, I've tried my best, but I still didn't find the solution. The assembly is the same with/out libkahypar.so. And the same issue didn't occur when using PaToH. So switching to PaToH is my final solution. :)

Thank you anyway!

from kahypar.

kittobi1992 avatar kittobi1992 commented on June 2, 2024

Hi @whwang1996,

You can also try out the shared-memory parallelization of KaHyPar:
https://github.com/kahypar/mt-kahypar
In its default configuration, it is faster than PaToH when using all threads of a modern workstation and produces much better partitions. In its highest-quality configuration, its produces comparable partitions to KaHyPar but is an order of magnitude faster when using ten threads. Let me know if you have any further questions ;)

from kahypar.

SebastianSchlag avatar SebastianSchlag commented on June 2, 2024

Hey @whwang1996,
I'm sorry that we couldn't help you resolving this. Thank you very mich though for following up.

from kahypar.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.