
Parallelism doesn't do much (cvise, closed)

fiesh commented on June 16, 2024
Parallelism doesn't do much


Comments (9)

marxin commented on June 16, 2024

Hey, thanks for the report. First, please paste a reduction log here so that I can see what happens. How big is your a.ii file? Note that most of the progress should be achieved by the clang_delta passes remove-unused-function and replace-function-def-with-decl, which run in parallel. C-Vise runs multiple interestingness tests at once; once one finishes successfully, it waits only for the tests that were started before the "winner". From that point, no new tasks are started and the first successful result is taken.

Another potential speed-up can be achieved by measuring the time of g++ -c a.ii (which should succeed) and then running timeout X g++ -c a.ii -fimplicit-constexpr, with X equal to 10 times the duration of the first command.

Generally speaking, there should not be any problem with your interestingness test.
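As a rough sketch of what such a test script could look like (the file name, budgets, and the use of `sleep` as a stand-in for g++ are assumptions for illustration, not the reporter's actual test.sh):

```shell
#!/bin/sh
# Sketch of the suggested interestingness test: the plain compile must
# succeed within its budget, and the -fimplicit-constexpr compile must
# time out (or fail) for the candidate to count as "interesting".
# `sleep` stands in for the g++ invocations so the sketch is runnable.

fast_compile="sleep 0"   # stand-in for: g++ -c a.ii -std=c++20 -w
slow_compile="sleep 10"  # stand-in for: g++ -c a.ii -fimplicit-constexpr -std=c++20

timeout 2 $fast_compile || exit 1   # baseline must still compile in time
if timeout 1 $slow_compile; then    # timeout(1) kills a hanging compile
    exit 1                          # finished fine -> not interesting
fi
exit 0                              # timed out or failed -> interesting
```

`timeout` exits with status 124 when it has to kill the command, so the `if` branch is taken only when the second compile finishes within its budget.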


fiesh commented on June 16, 2024

Ah thanks, I didn't know about timeout.

This is what the reduction log looks like:

% nice -n 20 cvise -n 64 --clang-delta-std c++20 test.sh a.ii
00:00:30 INFO ===< 8414 >===
00:00:30 INFO running 64 interestingness tests in parallel
00:00:30 INFO INITIAL PASSES
00:00:30 INFO ===< IncludesPass >===
00:00:30 INFO ===< UnIfDefPass >===
00:00:31 INFO ===< CommentsPass >===
00:00:31 INFO ===< IfPass >===
00:00:31 INFO ===< LineMarkersPass >===
00:00:31 INFO ===< BlankPass >===
00:00:31 INFO ===< ClangBinarySearchPass::replace-function-def-with-decl (30 T) >===
00:01:03 INFO (1.0%, 139823 bytes, 1789 lines)
00:01:34 INFO (1.1%, 139730 bytes, 1786 lines)
00:02:06 INFO (1.3%, 139433 bytes, 1786 lines)
00:02:37 INFO (1.6%, 138994 bytes, 1786 lines)
00:03:08 INFO (2.5%, 137627 bytes, 1786 lines)
00:03:39 INFO (2.6%, 137535 bytes, 1784 lines)
00:04:10 INFO (2.6%, 137529 bytes, 1784 lines)
00:04:41 INFO (2.7%, 137409 bytes, 1781 lines)
00:05:12 INFO (2.9%, 137173 bytes, 1777 lines)
00:05:43 INFO (2.9%, 137167 bytes, 1777 lines)
00:05:57 INFO Exiting now ...
% mv a.ii.orig a.ii
% nice -n 20 cvise -n 4 --clang-delta-std c++20 test.sh a.ii
00:00:30 INFO ===< 54808 >===
00:00:30 INFO running 4 interestingness tests in parallel
00:00:30 INFO INITIAL PASSES
00:00:30 INFO ===< IncludesPass >===
00:00:30 INFO ===< UnIfDefPass >===
00:00:30 INFO ===< CommentsPass >===
00:00:30 INFO ===< IfPass >===
00:00:30 INFO ===< LineMarkersPass >===
00:00:30 INFO ===< BlankPass >===
00:00:31 INFO ===< ClangBinarySearchPass::replace-function-def-with-decl (30 T) >===
00:01:04 INFO (1.0%, 139823 bytes, 1789 lines)
00:01:35 INFO (1.1%, 139730 bytes, 1786 lines)
00:02:07 INFO (1.3%, 139433 bytes, 1786 lines)
00:02:39 INFO (1.6%, 138994 bytes, 1786 lines)
00:03:10 INFO (2.5%, 137627 bytes, 1786 lines)
00:03:41 INFO (2.6%, 137535 bytes, 1784 lines)
00:04:11 INFO (2.6%, 137529 bytes, 1784 lines)
00:04:42 INFO (2.7%, 137409 bytes, 1781 lines)
00:05:13 INFO (2.9%, 137173 bytes, 1777 lines)
00:05:46 INFO (2.9%, 137167 bytes, 1777 lines)
00:06:04 INFO Exiting now ...

I should say that in the first case, it appears as if 32 threads run simultaneously, while in the second there appear to be 4 threads.

testcase.tar.gz


marxin commented on June 16, 2024

OK, so the speed of both -n X is pretty much the same, which is kind-of-expected. Please use htop or something similar to see the system utilization, or pass the --debug argument and you'll see all the BinaryState modifications that are tried with the interestingness test.

Looking at your test case, the g++ invocation without -fimplicit-constexpr finishes in 0.2s, thus I'm running:
timeout 0.4 g++ a.ii -c -std=c++20 -w && ! timeout 4 g++ a.ii -c -fimplicit-constexpr -std=c++20

and the reduction runs pretty smoothly with the following system utilization:
(screenshot: system utilization during the reduction)


fiesh commented on June 16, 2024

OK, so the speed of both -n X is pretty much the same, which is kind-of-expected.

I guess I have some fundamental misunderstanding here... why is this expected?


marxin commented on June 16, 2024

Because the fastest successful step (in your case) takes about 30s: the candidate tests (produced by C-Vise) either fail fast (due to a compilation error from the first g++ invocation) or run all the way to the 30s timeout. That's why you can't see a difference between 4 and 32 threads.

Does it help?


fiesh commented on June 16, 2024

Because the fastest successful step (in your case) takes about 30s: the candidate tests (produced by C-Vise) either fail fast (due to a compilation error from the first g++ invocation) or run all the way to the 30s timeout. That's why you can't see a difference between 4 and 32 threads.

Does it help?

So they effectively just race? That's my misunderstanding then, I suppose; I was assuming some more proper parallelization. If they all race, this behavior makes perfect sense, but using many threads will more often than not be entirely pointless. Do all passes just race? It might be worth explaining that in the documentation.


marxin commented on June 16, 2024

Well, partially, because the 30s timeout is what slows things down so much. What a BinaryPass in C-Vise does is attempt reductions of the intervals [0-n/2, n/2-n, 0-n/4, n/4-n/2, ...]; once a first interval finishes successfully, it cancels all parallel runs and waits only for the intervals scheduled before the one that succeeded. If the success rate is high (as in your test case), each step basically takes 30 seconds before the new test case is again split into intervals.
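The first-success-wins scheme described above can be sketched roughly like this (a simplified illustration, not C-Vise's actual code; the function names and the toy interestingness test are made up):

```python
import concurrent.futures

def race_intervals(intervals, is_interesting, workers=4):
    """Run the interestingness test on all candidate intervals in
    parallel and return the first one, in submission order, whose test
    succeeds. Later successes are discarded, which is why extra workers
    help little when most candidates only succeed by hitting a timeout."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(is_interesting, iv) for iv in intervals]
        for fut, iv in zip(futures, intervals):
            # Wait for candidates in order; the first success "wins"
            # and any still-pending runs are cancelled.
            if fut.result():
                for later in futures:
                    later.cancel()
                return iv
    return None

# Toy usage: only the interval starting at 4 is "interesting".
winner = race_intervals([(0, 2), (2, 4), (4, 8)], lambda iv: iv[0] == 4)
```

In this sketch, as in the description above, all the work spent on intervals after the winner is thrown away for that step.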

Note that in most cases, when an interestingness test finishes fast, most of the time is spent in clang_delta itself, so the racing does not play a role.

Generally speaking, a long timeout can't make the reduction much slower, as the nature of the C-Vise tool is to reduce a test case incrementally.


fiesh commented on June 16, 2024

I see, thanks for the explanation. I guess an alternative would be, instead of taking the first success and waiting for potentially better ones, to combine the successful intervals? Like if line a and line b can each be removed individually, there's a high chance they can also both be removed simultaneously?

Anyway, I suppose this is a misunderstanding on my side. I don't know whether the behavior can be improved as I suggested or the documentation made clearer, but basically there's a large racing component to the parallelism here that really reduces the usefulness of many threads in some scenarios. So I think this can be closed, since it's just my misunderstanding and not wrong behavior.

Thank you for your explanations!


marxin commented on June 16, 2024

I see, thanks for the explanation. I guess an alternative would be, instead of taking the first success and waiting for potentially better ones, to combine the successful intervals? Like if line a and line b can each be removed individually, there's a high chance they can also both be removed simultaneously?

Yes, there's a high chance that both intervals (or more) could be combined. However, it would make the entire reduction algorithm much more complicated.
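A rough sketch of the combining idea (again, purely illustrative and not C-Vise code; the greedy fallback re-check is an assumption about how conflicts could be resolved):

```python
def combine_removals(text_lines, removable, still_interesting):
    """Greedy sketch: given line indices that were each removable on
    their own, try removing them all together and re-check once. If the
    combined removal breaks the interestingness test, fall back to
    applying the removals one at a time, keeping only those that pass."""
    combined = [l for i, l in enumerate(text_lines) if i not in removable]
    if still_interesting(combined):
        return combined
    # Fall back: apply removals one by one, validating each step.
    # Iterate indices in reverse so earlier removals don't shift later ones.
    current = list(text_lines)
    for idx in sorted(removable, reverse=True):
        candidate = current[:idx] + current[idx + 1:]
        if still_interesting(candidate):
            current = candidate
    return current
```

The happy path costs one extra interestingness test per step; the complication marxin mentions comes from the fallback, where individually removable chunks can conflict once combined.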

If you want, feel free to make a pull request with either a documentation improvement or an attempt at more sophisticated merging of the intervals. Patches are welcome!

