lcov -r is taking over an hour on a large project. Algorithm issue since memory usage

Performance seems non-linear, about linux-test-project/lcov

Comments (15)

oberpar commented on June 24, 2024

Thanks for the analysis. Not sure if this can be easily fixed. This will need some consideration.

from lcov.

creich commented on June 24, 2024

I found the same issue in the same place, so I can confirm this finding.
I already tried some refactoring (replaced the vectors with hashes), which looks promising. But there is still a lot of work to be done.
Hopefully I'll find some time to finish this within the next few weeks.
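
For readers following along, here is a minimal sketch of why swapping a scanned vector for a hash can matter. It is written in Python for illustration only (lcov itself is Perl), and `push_with_scan`, `push_with_hash`, and `count_scan_steps` are hypothetical names, not lcov functions. It assumes each push first checks whether the (block, branch) pair is already stored.

```python
def push_with_scan(vec, block, branch, taken):
    """Append an entry, first scanning the whole list for an existing one."""
    for entry in vec:
        if entry[0] == block and entry[1] == branch:
            entry[2] += taken  # already present: accumulate the count
            return
    vec.append([block, branch, taken])


def push_with_hash(table, block, branch, taken):
    """Same bookkeeping, but the existence check is a single dict lookup."""
    key = (block, branch)
    table[key] = table.get(key, 0) + taken


def count_scan_steps(n):
    """Entry comparisons needed to push n distinct branches with the scan."""
    # The i-th push compares against the i - 1 entries already stored.
    return n * (n - 1) // 2
```

With the scan, pushing 1,000 distinct branches costs `count_scan_steps(1000)` = 499,500 comparisons, while the dict version does one lookup per push. The trade-off is the memory overhead of the hash.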

oberpar commented on June 24, 2024

Thanks for looking into this. You might want to look out for memory usage when switching over to hashes - I remember vaguely that an initial prototype of branch coverage support used the same and the amount of memory used was excessive.

creich commented on June 24, 2024

I somehow expected that topic ;)
Anyway, I would like to ask whether anyone knows the reason for the current implementation. Is it because of the memory footprint, or are there other reasons too?

creich commented on June 24, 2024

I think I found a usable "hybrid" solution and started some testing. Hopefully I'll be able to provide some results within the next few days. For now it looks like memory usage is not much higher than before, but the speed improvement is huge.

I will push my current approach to my GitHub fork, so you can have a look at it. But before opening a pull request I would like to clean it up a bit and add some comments.

Is there any "default" test case available?

oberpar commented on June 24, 2024

That's great news! I'll be sure to have a look at the code. Please note that for the actual integration of the final change, I'll need the commits sent to the ltp-coverage mailing list (see https://github.com/linux-test-project/lcov/blob/master/CONTRIBUTING).

Regarding test case - this is somewhat of an open issue (i.e. there's no definite test suite available). What you can do is simply compile a large-ish project with gcc and collect branch coverage data for that. I'm typically doing that for the Linux kernel.

creich commented on June 24, 2024

Thanks for the reply.
For testing I am running both versions on our current company project (where the slowdown first occurred for me). I'll also run the test on the Linux kernel.

Meanwhile I found some small possible improvements to my current solution. The latest "finding" was that I might be able to get rid of two subroutines I introduced and reuse some code that is already there (which I didn't notice at first). So hopefully I'll be able to shrink the "patch" a bit more.

I think it is worth waiting a few more days before having a look at the code. So please stay tuned, I'll keep you updated.

oberpar commented on June 24, 2024

With the commit you referenced, I can see a run-time reduction to about 1/3.2 with about the same memory usage, which is nice! Looking at the code some more though, I'm starting to wonder if it's really worth further improving the vector-based data representation.

Inspired by your patch I played around a little with different in-memory data representations for branch coverage data. Testing with an array based representation, I can see a memory increase to about 4x. Using a purely textual data representation, I was able to achieve a run-time reduction to 1/30 (loading a single 20M data file containing branch coverage), with no increase in memory usage. I'll check to see if this representation can be applied in all places.

creich commented on June 24, 2024

Glad to hear that I might have given you some inspiration :)
I noticed that I had missed the part that combines two info files, which I have fixed now. I was also able to get rid of two of the subroutines I introduced by reusing some db-related subs.
Once I am done testing I'll push it to GitHub too.

The runtime reduction I get varies a lot depending on the content I am working on. The biggest improvement I got looks like this:

#### original:

Deleted 238 files
Writing data to coverage_test_2_striped.info
Summary coverage rate:
  lines......: 91.3% (48445 of 53061 lines)
  functions..: 66.1% (4194 of 6345 functions)
  branches...: 45.1% (101806 of 225818 branches)

real    257m50.339s
user    257m42.652s
sys     0m5.324s

####  new approach:

Deleted 238 files
Writing data to coverage_test_2_striped.info
Summary coverage rate:
  lines......: 91.3% (48445 of 53061 lines)
  functions..: 66.1% (4194 of 6345 functions)
  branches...: 45.1% (101806 of 225818 branches)

real    0m23.006s
user    0m21.936s
sys     0m1.048s

These timings are from an 'lcov --remove' command.

I'll keep testing :)
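
To put the two `real` timings above into a single number, here is a small sketch that converts them to seconds and takes the ratio. `parse_time` is a hypothetical helper written for this illustration; it only handles the `<minutes>m<seconds>s` form shown in the output above.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '257m50.339s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


original = parse_time("257m50.339s")  # "real" time of the original run
new = parse_time("0m23.006s")         # "real" time of the new approach
speedup = original / new
print(f"{original:.3f} s -> {new:.3f} s, about {speedup:.0f}x faster")
```

That works out to roughly 15,470 seconds versus 23 seconds, a speedup of about 670x for this particular data set.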

creich commented on June 24, 2024

I think the "hybrid" built on your original vector approach should be the one with the smallest memory footprint. I also ran some tests with hashes and arrays, but found your vector the best: the conversion is done only a few times, so it no longer has a real impact on performance, while saving a lot of RAM.
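
As a rough illustration of the "hybrid" idea (keep the compact vector for storage, convert to a hash only while doing lookups), here is a sketch in Python. It is not the actual patch: lcov is Perl, and `pack`, `unpack`, and `merge` are hypothetical names. The flat `array` stands in for the packed vector, on the assumption that entries are (block, branch, taken) triples.

```python
from array import array


def pack(table):
    """Flatten {(block, branch): taken} into one compact integer array."""
    vec = array("q")
    for (block, branch), taken in sorted(table.items()):
        vec.extend((block, branch, taken))
    return vec


def unpack(vec):
    """Rebuild the dict view; done once per operation, not once per push."""
    return {(vec[i], vec[i + 1]): vec[i + 2] for i in range(0, len(vec), 3)}


def merge(vec_a, vec_b):
    """Combine two packed vectors, summing the counts of matching branches."""
    table = unpack(vec_a)
    for key, taken in unpack(vec_b).items():
        table[key] = table.get(key, 0) + taken
    return pack(table)
```

The point of the design is that the hash exists only for the duration of one merge, so peak memory stays close to that of the packed form while each lookup is still constant time.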

creich commented on June 24, 2024

I recently pushed my latest changes to this branch: https://github.com/creich/lcov/tree/speedup_read_info_file . As mentioned earlier, I was able to remove two of my new subs, but I would need some testing of the "db" functionality to be sure nothing else got broken.

The br_ivec_push call within the db_to_brcount sub can also skip checking the contents of the vector when pushing, so I was able to remove the whole "check for existence" part from br_ivec_push(). That also makes the "-a" option faster, since no code path is left that spends extra time on checking.

I am still checking results, but for now it seems that the coverage results are stable and correct (compared to the results of 1.13). I would also like to mention that the same improvement could/should be made to genhtml later.

creich commented on June 24, 2024

I recently submitted a patch to the mailing list as stated above. How should we proceed? Should I open a pull request here in parallel?

v-lopez commented on June 24, 2024

As anecdotal evidence, I just tried @creich's https://github.com/creich/lcov/tree/speedup_read_info_file branch on a large number of repositories at the same time, and the execution time went from 30 min to 3 min, while the coverage results are almost the same.

Using the branch I have ~280 fewer lines covered out of a total of 202724 LOC. I had to rerun the tests and I don't know if our tests are deterministic, so I cannot say whether the difference in lines covered is due to the branch. I'll keep using it because for us the performance gains are 100% worth it.

jaylaal commented on June 24, 2024

@creich: until @oberpar finishes his testing framework, what's your method for verifying that your branch's code is result-neutral? I'm currently testing some performance changes I made to geninfo (where things are slow for me) and clearly don't want to break things. FWIW, I made my changes based on your perf branch (https://github.com/creich/lcov/tree/speedup_read_info_file) and not master.
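
One possible way to check for result-neutrality (a sketch only, not necessarily what anyone in this thread used) is to parse the tracefiles produced by the old and new code and diff the per-file line and branch records. The Python below assumes the usual `SF:`, `DA:`, `BRDA:`, and `end_of_record` records of the .info format; `load_info` and `diff_info` are hypothetical helper names.

```python
from collections import defaultdict


def load_info(text):
    """Parse tracefile text into {source_file: {record_key: value}}."""
    data = defaultdict(dict)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line == "end_of_record":
            current = None
        elif current is not None and line.startswith("DA:"):
            # DA:<line>,<count>[,<checksum>]
            fields = line[3:].split(",")
            data[current][("DA", fields[0])] = int(fields[1])
        elif current is not None and line.startswith("BRDA:"):
            # BRDA:<line>,<block>,<branch>,<taken>; taken may be "-"
            lineno, block, branch, taken = line[5:].split(",")
            data[current][("BRDA", lineno, block, branch)] = taken
    return dict(data)


def diff_info(old, new):
    """Return (file, key, old_value, new_value) for every differing record."""
    changes = []
    for path in sorted(set(old) | set(new)):
        a, b = old.get(path, {}), new.get(path, {})
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                changes.append((path, key, a.get(key), b.get(key)))
    return changes
```

Running both lcov versions on the same input and checking that `diff_info` returns an empty list would give some confidence, provided the input data itself is deterministic.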

oberpar commented on June 24, 2024

Should be fixed in git version of LCOV, therefore I'm closing this issue.
