
Parallelism doesn't do much (cvise, closed)

fiesh commented on June 16, 2024
Parallelism doesn't do much


Comments (9)

marxin commented on June 16, 2024

Hey, thanks for the report. First, please paste a reduction log here so that I can see what happens. How big is your a.ii file? Note that most of the progress should be achieved by the clang_delta passes remove-unused-function and replace-function-def-with-decl, which run in parallel. C-Vise runs multiple interestingness tests at once; once one finishes successfully, it waits only for the tests that were started before the "winner". From that point, no new tasks are started and the first successful result is taken.

Another potential speed-up can be achieved by measuring the time of g++ -c a.ii (which should succeed) and then running timeout X g++ -c a.ii -fimplicit-constexpr, with X equal to 10 times the duration of the first command.

Generally speaking, there should not be any problem with your interestingness test.
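As a rough sketch of what such a test script could look like (the file name, budgets, and the use of `sleep` as a stand-in for g++ are assumptions for illustration, not the reporter's actual test.sh):

```shell
#!/bin/sh
# Sketch of the suggested interestingness test: the plain compile must
# succeed within its budget, and the -fimplicit-constexpr compile must
# time out (or fail) for the candidate to count as "interesting".
# `sleep` stands in for the g++ invocations so the sketch is runnable.

fast_compile="sleep 0"   # stand-in for: g++ -c a.ii -std=c++20 -w
slow_compile="sleep 10"  # stand-in for: g++ -c a.ii -fimplicit-constexpr -std=c++20

timeout 2 $fast_compile || exit 1   # baseline must still compile in time
if timeout 1 $slow_compile; then    # timeout(1) kills a hanging compile
    exit 1                          # finished fine -> not interesting
fi
exit 0                              # timed out or failed -> interesting
```

`timeout` exits with status 124 when it has to kill the command, so the `if` branch is taken only when the second compile finishes within its budget.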


fiesh commented on June 16, 2024

Ah thanks, I didn't know about timeout.

This is what the reduction log looks like:

% nice -n 20 cvise -n 64 --clang-delta-std c++20 test.sh a.ii
00:00:30 INFO ===< 8414 >===
00:00:30 INFO running 64 interestingness tests in parallel
00:00:30 INFO INITIAL PASSES
00:00:30 INFO ===< IncludesPass >===
00:00:30 INFO ===< UnIfDefPass >===
00:00:31 INFO ===< CommentsPass >===
00:00:31 INFO ===< IfPass >===
00:00:31 INFO ===< LineMarkersPass >===
00:00:31 INFO ===< BlankPass >===
00:00:31 INFO ===< ClangBinarySearchPass::replace-function-def-with-decl (30 T) >===
00:01:03 INFO (1.0%, 139823 bytes, 1789 lines)
00:01:34 INFO (1.1%, 139730 bytes, 1786 lines)
00:02:06 INFO (1.3%, 139433 bytes, 1786 lines)
00:02:37 INFO (1.6%, 138994 bytes, 1786 lines)
00:03:08 INFO (2.5%, 137627 bytes, 1786 lines)
00:03:39 INFO (2.6%, 137535 bytes, 1784 lines)
00:04:10 INFO (2.6%, 137529 bytes, 1784 lines)
00:04:41 INFO (2.7%, 137409 bytes, 1781 lines)
00:05:12 INFO (2.9%, 137173 bytes, 1777 lines)
00:05:43 INFO (2.9%, 137167 bytes, 1777 lines)
00:05:57 INFO Exiting now ...
% mv a.ii.orig a.ii
% nice -n 20 cvise -n 4 --clang-delta-std c++20 test.sh a.ii
00:00:30 INFO ===< 54808 >===
00:00:30 INFO running 4 interestingness tests in parallel
00:00:30 INFO INITIAL PASSES
00:00:30 INFO ===< IncludesPass >===
00:00:30 INFO ===< UnIfDefPass >===
00:00:30 INFO ===< CommentsPass >===
00:00:30 INFO ===< IfPass >===
00:00:30 INFO ===< LineMarkersPass >===
00:00:30 INFO ===< BlankPass >===
00:00:31 INFO ===< ClangBinarySearchPass::replace-function-def-with-decl (30 T) >===
00:01:04 INFO (1.0%, 139823 bytes, 1789 lines)
00:01:35 INFO (1.1%, 139730 bytes, 1786 lines)
00:02:07 INFO (1.3%, 139433 bytes, 1786 lines)
00:02:39 INFO (1.6%, 138994 bytes, 1786 lines)
00:03:10 INFO (2.5%, 137627 bytes, 1786 lines)
00:03:41 INFO (2.6%, 137535 bytes, 1784 lines)
00:04:11 INFO (2.6%, 137529 bytes, 1784 lines)
00:04:42 INFO (2.7%, 137409 bytes, 1781 lines)
00:05:13 INFO (2.9%, 137173 bytes, 1777 lines)
00:05:46 INFO (2.9%, 137167 bytes, 1777 lines)
00:06:04 INFO Exiting now ...

I should say that in the first case, it appears as if 32 threads run simultaneously, while in the second there appear to be 4 threads.

testcase.tar.gz


marxin commented on June 16, 2024

OK, so the speed of both -n X is pretty much the same, which is kind-of-expected. Please use htop or something similar to see the system utilization, or pass the --debug argument and you'll see all the BinaryState modifications that are tried with the interestingness test.

Looking at your test case, the g++ invocation without -fimplicit-constexpr finishes in 0.2s, thus I'm running:
timeout 0.4 g++ a.ii -c -std=c++20 -w && ! timeout 4 g++ a.ii -c -fimplicit-constexpr -std=c++20

and the reduction runs pretty smoothly with the following system utilization:
(screenshot: system utilization during the reduction)


fiesh commented on June 16, 2024

OK, so the speed of both -n X is pretty much the same, which is kind-of-expected.

I guess I have some fundamental misunderstanding here... why is this expected?


marxin commented on June 16, 2024

Because the fastest successful step (in your case) takes about 30s: the candidate tests (produced by C-Vise) either fail fast (due to a compilation error from the first g++ invocation) or run all the way to the 30s timeout. That's why you can't see a difference between 4 and 32 threads.

Does it help?


fiesh commented on June 16, 2024

Because the fastest successful step (in your case) takes about 30s: the candidate tests (produced by C-Vise) either fail fast (due to a compilation error from the first g++ invocation) or run all the way to the 30s timeout. That's why you can't see a difference between 4 and 32 threads.

Does it help?

So they effectively just race? That's my misunderstanding then, I suppose; I was assuming some more proper parallelization. If they all race, this behavior makes perfect sense, but using many threads will more often than not be entirely pointless. Do all passes just race? It might be worth explaining that in the documentation.


marxin commented on June 16, 2024

Well, partially, because the 30s timeout is what slows things down so much. What a BinaryPass in C-Vise does is attempt reductions of the intervals [0-n/2, n/2-n, 0-n/4, n/4-n/2, ...]; once a first interval finishes successfully, it cancels all parallel runs and waits only for the intervals scheduled before the one that succeeded. If the success rate is high (as in your test case), each step basically takes 30 seconds before the new test case is again split into intervals.
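The first-success-wins scheme described above can be sketched roughly like this (a simplified illustration, not C-Vise's actual code; the function names and the toy interestingness test are made up):

```python
import concurrent.futures

def race_intervals(intervals, is_interesting, workers=4):
    """Run the interestingness test on all candidate intervals in
    parallel and return the first one, in submission order, whose test
    succeeds. Later successes are discarded, which is why extra workers
    help little when most candidates only succeed by hitting a timeout."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(is_interesting, iv) for iv in intervals]
        for fut, iv in zip(futures, intervals):
            # Wait for candidates in order; the first success "wins"
            # and any still-pending runs are cancelled.
            if fut.result():
                for later in futures:
                    later.cancel()
                return iv
    return None

# Toy usage: only the interval starting at 4 is "interesting".
winner = race_intervals([(0, 2), (2, 4), (4, 8)], lambda iv: iv[0] == 4)
```

In this sketch, as in the description above, all the work spent on intervals after the winner is thrown away for that step.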

Note that in most cases, when an interestingness test finishes fast, most of the time is spent in clang_delta itself, so the racing does not play a role.

Generally speaking, a long timeout can't make the reduction much slower, as the nature of the C-Vise tool is to reduce a test case incrementally.


fiesh commented on June 16, 2024

I see, thanks for the explanation. I guess an alternative would be, instead of taking the first success and waiting for potentially better ones, to combine the successful intervals? Like if line a and line b can each be removed individually, there's a high chance they can also both be removed simultaneously?

Anyway, I suppose this is a misunderstanding on my side. I don't know whether the behavior can be improved as I suggested or the documentation made clearer, but basically there's a large racing component to the parallelism here that really reduces the usefulness of many threads in some scenarios. So I think this can be closed, since it's just my misunderstanding and not wrong behavior.

Thank you for your explanations!


marxin commented on June 16, 2024

I see, thanks for the explanation. I guess an alternative would be, instead of taking the first success and waiting for potentially better ones, to combine the successful intervals? Like if line a and line b can each be removed individually, there's a high chance they can also both be removed simultaneously?

Yes, there's a high chance that both intervals (or more) could be combined. However, it would make the entire reduction algorithm much more complicated.
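A rough sketch of the combining idea (again, purely illustrative and not C-Vise code; the greedy fallback re-check is an assumption about how conflicts could be resolved):

```python
def combine_removals(text_lines, removable, still_interesting):
    """Greedy sketch: given line indices that were each removable on
    their own, try removing them all together and re-check once. If the
    combined removal breaks the interestingness test, fall back to
    applying the removals one at a time, keeping only those that pass."""
    combined = [l for i, l in enumerate(text_lines) if i not in removable]
    if still_interesting(combined):
        return combined
    # Fall back: apply removals one by one, validating each step.
    # Iterate indices in reverse so earlier removals don't shift later ones.
    current = list(text_lines)
    for idx in sorted(removable, reverse=True):
        candidate = current[:idx] + current[idx + 1:]
        if still_interesting(candidate):
            current = candidate
    return current
```

The happy path costs one extra interestingness test per step; the complication marxin mentions comes from the fallback, where individually removable chunks can conflict once combined.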

If you want, feel free to make a pull request with either a documentation improvement or an attempt at more sophisticated merging of the intervals. Patches are welcome!

