Comments (15)
Thanks for the analysis. I'm not sure whether this can be fixed easily; it will need some consideration.
from lcov.
I found the same issue at the same place, so I can confirm this finding.
I already tried some refactoring (replaced the vectors with hashes), which looks promising, but there is still a lot of work to be done.
Hopefully I'll find some time to finish this within the next few weeks.
from lcov.
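To illustrate the vector-vs-hash idea from the comment above, here is a hypothetical sketch in Python (lcov itself is written in Perl, and these are not its actual data structures): the same branch-count store modeled both ways, where the vector form needs a linear scan per update and the hash form is a direct lookup.

```python
def vec_add(vec, line, block, branch, taken):
    """Vector form: a list of [line, block, branch, taken] entries.
    Every update scans the whole list for an existing entry -- O(n)."""
    for entry in vec:
        if entry[0] == line and entry[1] == block and entry[2] == branch:
            entry[3] += taken
            return
    vec.append([line, block, branch, taken])

def hash_add(h, line, block, branch, taken):
    """Hash form: keyed by (line, block, branch) for O(1) updates."""
    key = (line, block, branch)
    h[key] = h.get(key, 0) + taken

# Feed the same updates into both representations.
vec, h = [], {}
for args in [(10, 0, 0, 1), (10, 0, 1, 0), (10, 0, 0, 2)]:
    vec_add(vec, *args)
    hash_add(h, *args)
```

The hash trades some per-entry memory overhead for constant-time updates, which matches the memory concern raised in the next comment.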
Thanks for looking into this. You might want to watch memory usage when switching over to hashes. I vaguely remember that an initial prototype of branch coverage support used the same approach, and the amount of memory used was excessive.
I somehow expected that topic ;)
Anyway, does anyone know the reason for the current implementation? Is it because of the memory footprint, or are there other reasons too?
I think I found a usable "hybrid" solution and started some testing. Hopefully I'll be able to provide some results within the next few days. For now it looks like memory usage is not much higher than before, while the speed improvement is huge.
I will push my current approach to my GitHub fork so you can have a look at it. Before opening a pull request, though, I would like to clean it up a bit and add some comments.
Is there any "default" test case available?
That's great news! I'll be sure to have a look at the code. Please note that for the actual integration of the final change, I'll need the commits sent to the ltp-coverage mailing list (see https://github.com/linux-test-project/lcov/blob/master/CONTRIBUTING).
Regarding test case - this is somewhat of an open issue (i.e. there's no definite test suite available). What you can do is simply compile a large-ish project with gcc and collect branch coverage data for that. I'm typically doing that for the Linux kernel.
Thanks for the reply.
For testing, I am running both versions on our company's current project (where I first noticed the slowdown). I'll run the test on the Linux kernel as well.
Meanwhile I found some small possible improvements to my current solution. The latest finding was that I might be able to get rid of two subroutines I introduced and reuse some code that is already there (which I didn't notice at first). So hopefully I'll be able to shrink the patch a bit more.
I think it is worth waiting a few more days before looking at the code, so please stay tuned; I'll keep you updated.
With the commit you referenced, I can see a run-time reduction to about 1/3.2 with about the same memory usage, which is nice! Looking at the code some more, though, I'm starting to wonder whether it's really worth further improving the vector-based data representation.
Inspired by your patch, I played around a little with different in-memory data representations for branch coverage data. Testing with an array-based representation, I see a memory increase to about 4x. Using a purely textual data representation, I was able to achieve a run-time reduction to 1/30 (loading a single 20M data file containing branch coverage) with no increase in memory usage. I'll check whether this representation can be applied in all places.
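The "purely textual representation" mentioned above could look roughly like this (an assumed, simplified Python sketch; the real format and code may differ): branch data for a line is kept as one packed string instead of nested structures, and only parsed when a value is actually needed.

```python
def pack(branches):
    """branches: list of (block, branch, taken) tuples; taken=None means
    the branch expression was never executed (written as '-')."""
    return ":".join(
        f"{blk},{br},{'-' if taken is None else taken}"
        for blk, br, taken in branches
    )

def unpack(text):
    """Parse the packed string back into (block, branch, taken) tuples."""
    out = []
    for item in text.split(":"):
        blk, br, taken = item.split(",")
        out.append((int(blk), int(br), None if taken == "-" else int(taken)))
    return out

packed = pack([(0, 0, 3), (0, 1, None)])
```

One flat string per line avoids the per-element overhead of nested containers, which is presumably why memory usage stays flat while load time drops.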
Glad to hear that I might have given you some inspiration :)
I noticed that I had missed the part about combining two info files, which I have now fixed. I was also able to get rid of two of my introduced subroutines by reusing some db-related subs.
Once I am done testing, I'll push it to GitHub too.
The runtime reduction I get varies a lot depending on the content I am working on. The biggest improvement I saw looks like this:
#### original:
Deleted 238 files
Writing data to coverage_test_2_striped.info
Summary coverage rate:
lines......: 91.3% (48445 of 53061 lines)
functions..: 66.1% (4194 of 6345 functions)
branches...: 45.1% (101806 of 225818 branches)
real 257m50.339s
user 257m42.652s
sys 0m5.324s
#### new approach:
Deleted 238 files
Writing data to coverage_test_2_striped.info
Summary coverage rate:
lines......: 91.3% (48445 of 53061 lines)
functions..: 66.1% (4194 of 6345 functions)
branches...: 45.1% (101806 of 225818 branches)
real 0m23.006s
user 0m21.936s
sys 0m1.048s
This was for an 'lcov --remove' command.
I'll keep testing :)
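For context, the `lcov --remove` step that was timed above can be sketched logically like this (a hypothetical Python reimplementation, not lcov's code; the `remove_files` helper is invented for illustration): drop every per-file record block (`SF:` ... `end_of_record`) whose source path matches one of the given patterns.

```python
import fnmatch

def remove_files(info_text, patterns):
    """Return info_text with all SF: sections matching any pattern dropped."""
    kept, keep = [], True
    for line in info_text.splitlines():
        if line.startswith("SF:"):
            # A new per-file section starts; decide whether to keep it.
            path = line[3:]
            keep = not any(fnmatch.fnmatch(path, p) for p in patterns)
        if keep:
            kept.append(line)
    return "\n".join(kept)

sample = "\n".join([
    "SF:/src/main.c", "DA:1,5", "end_of_record",
    "SF:/src/vendor/lib.c", "DA:1,0", "end_of_record",
])
out = remove_files(sample, ["*/vendor/*"])
```

The work is dominated by how the per-file data is represented in memory, which is why the representation change shows up so dramatically in the timings above.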
I think the "hybrid" using your original vector approach should be the one with the smallest memory footprint. I also ran some tests with hashes and arrays, but found your vector the best: the conversion is made only a few times, so it no longer has a real impact on performance, while saving a lot of RAM.
I recently pushed my latest changes to this branch: https://github.com/creich/lcov/tree/speedup_read_info_file . As mentioned earlier, I was able to remove two of my new subs, but I would need some testing on the "db" functionality to be sure nothing else got broken.
The br_ivec_push call within the db_to_brcount sub can also skip checking the content of the vector when pushing, so I was able to remove the whole "check for existence" part from br_ivec_push(). That way the "-a" option is faster as well, since there is no branch left that takes extra time for the check.
I am still checking results, but so far the coverage results seem stable and correct (compared to the results of 1.13). I would also like to mention that the same improvement could/should be applied to genhtml later.
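The br_ivec_push simplification described above can be illustrated like this (names borrowed from the discussion; the real code is Perl, and these helper functions are hypothetical): if the caller already guarantees that each (block, branch) pair appears only once, the push can append unconditionally instead of scanning for an existing entry.

```python
def push_checked(vec, block, branch, taken):
    """Old style: scan for an existing entry before appending."""
    for e in vec:
        if e[0] == block and e[1] == branch:
            e[2] += taken
            return
    vec.append([block, branch, taken])

def push_unchecked(vec, block, branch, taken):
    """Fast path: caller guarantees uniqueness, so just append."""
    vec.append([block, branch, taken])

# When the input really is unique, both produce the same vector.
a, b = [], []
for blk, br, taken in [(0, 0, 1), (0, 1, 0), (1, 0, 2)]:
    push_checked(a, blk, br, taken)
    push_unchecked(b, blk, br, taken)
```

Dropping the scan turns each push from O(n) into O(1), which is why "-a" (aggregation) benefits too.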
I recently sent a patch to the mailing list, as stated above. How should I proceed? Should I open a pull request here in parallel?
As anecdotal evidence, I just tried @creich's https://github.com/creich/lcov/tree/speedup_read_info_file branch on a large number of repositories at the same time, and execution went from 30 min to 3 min, while the coverage results are almost the same.
Using the branch I get ~280 fewer lines covered out of a total of 202724 LOC. I had to rerun the tests and I don't know whether our tests are deterministic, so I cannot say whether the difference in lines covered is due to the changes. I'll keep using it because for us the performance gains are 100% worth it.
@creich: until @oberpar finishes his testing framework, what's your method for verifying that your branch's code is result-neutral? I'm currently testing some performance changes I made to geninfo (where things are slow for me) and obviously don't want to break things. FWIW, I based my changes on your perf branch (https://github.com/creich/lcov/tree/speedup_read_info_file) rather than master.
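One possible way to check result-neutrality while a proper test framework is pending (a sketch, not an official lcov tool; `parse_da` is an invented helper that only looks at `DA:` line records): parse the per-file hit counts of the old and new tracefiles and compare them directly.

```python
def parse_da(info_text):
    """Map each SF: source file to {line_number: hit_count} from DA: records."""
    cov, current = {}, None
    for line in info_text.splitlines():
        if line.startswith("SF:"):
            current = line[3:]
            cov[current] = {}
        elif line.startswith("DA:") and current is not None:
            lineno, hits = line[3:].split(",")[:2]
            cov[current][int(lineno)] = int(hits)
    return cov

# Identical parsed structures -> the change was result-neutral
# for line coverage (BRDA:/FN: records could be compared the same way).
old = "SF:/src/a.c\nDA:1,2\nDA:2,0\nend_of_record"
new = "SF:/src/a.c\nDA:1,2\nDA:2,0\nend_of_record"
neutral = parse_da(old) == parse_da(new)
```

Comparing the parsed data rather than raw text tolerates harmless differences in record ordering between the two versions.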
Should be fixed in git version of LCOV, therefore I'm closing this issue.
See also https://sourceforge.net/p/ltp/mailman/message/35772830/