To do
Blockers
Notes
Plan discussed here:
https://public.etherpad-mozilla.org/p/code_coverage_Q3_17
I'm pasting over here some conversations we've had around the topic.
Armen wrote:
We will only be able to show accurate code coverage data for the latest version of a file
and/or test. If there are multiple changesets on a m-c push touching the same file or
related tests we will not be able to give accurate code coverage information for the older
changesets. Did I understand correctly?
gmierz wrote:
A strategy for getting around multiple changesets touching a single file could be to mark
which changeset each change comes from. That way, if we just use the latest file
in the current m-c push and the latest in the previous m-c push, there should be no
issues. But you are right to say "only", because this doesn't necessarily give us coverage
data for exactly those patches. I think this is the best way to get around the
problem for now, though.
What we ideally hope to use to fix this is changeset algebra. Kyle and I have
been playing with this idea, and he's made a nice data structure that can take code
coverage backwards through changesets. I'll write down an example to illustrate this for you
tonight because it's not easy to explain in words. We'll have to wait before we can
use it, so that we can test the idea a bit more.
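One way to picture "marking which changeset each change comes from" is the minimal
sketch below. It assumes the full text of the file is available at every changeset in
the push; the function name annotate and the changeset labels are hypothetical and are
not taken from any existing tool discussed here.

```python
from difflib import SequenceMatcher


def annotate(revisions):
    """Label each line of the newest revision with the changeset that introduced it.

    `revisions` is a list of (changeset_id, lines) pairs, oldest first.
    This is an illustrative helper, not part of any existing coverage tool.
    """
    first_id, prev_lines = revisions[0]
    owners = [first_id] * len(prev_lines)
    for changeset_id, lines in revisions[1:]:
        # By default a line is attributed to the changeset being applied.
        new_owners = [changeset_id] * len(lines)
        matcher = SequenceMatcher(None, prev_lines, lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the owner they already had.
                new_owners[j1:j2] = owners[i1:i2]
        prev_lines, owners = lines, new_owners
    return owners


push = [
    ("base", ["a", "b"]),
    ("A", ["a", "x", "b"]),       # A inserts "x"
    ("B", ["a", "x", "b", "y"]),  # B appends "y"
]
print(annotate(push))
```

With labels like these, coverage collected only for the latest file can still be
grouped by the changeset that introduced each line.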
Marco wrote:
Yes, this is correct. In some cases, we might be able to also say something about the
intermediate commits. For example, if commit A and commit B both change the same
file but not the same lines, then we can try to find the lines modified by A in the last
revision by taking into account the additions/removals made by B.
In the case where they both modify exactly the same line, we only really care about
the end result. If the line is covered, then it doesn't matter if it wasn't covered in A,
as long as it is covered in the end. If the line is not covered, then we will blame both A
and B as introducing an uncovered line.
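The two cases above can be sketched as follows. This is only an illustration under
stated assumptions: the helper names (changed_lines, blame_uncovered) are hypothetical,
diffs are computed with Python's difflib rather than taken from the VCS, and coverage is
given as the set of covered line indices in the final revision.

```python
from difflib import SequenceMatcher


def changed_lines(old, new):
    """Indices in `new` that were inserted or rewritten relative to `old`."""
    changed = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(range(j1, j2))
    return changed


def blame_uncovered(base, after_a, after_b, covered):
    """Map each uncovered line of the final revision to the commits blamed for it."""
    touched_by_a = set(changed_lines(base, after_a))
    blame = {}
    matcher = SequenceMatcher(None, after_a, after_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # B left these lines alone: A's lines are just shifted by B's edits.
            for k in range(i2 - i1):
                if i1 + k in touched_by_a:
                    blame[j1 + k] = {"A"}
        elif tag in ("replace", "insert"):
            # B wrote these lines; A shares the blame if B rewrote A's lines.
            authors = {"B"}
            if any(i in touched_by_a for i in range(i1, i2)):
                authors.add("A")
            for j in range(j1, j2):
                blame[j] = set(authors)
    # Only the end result matters: covered lines are never blamed.
    return {j: who for j, who in blame.items() if j not in covered}


# A and B touch different lines: A's line "x" is found at its shifted position.
print(blame_uncovered(["a", "b", "c"], ["a", "b", "x", "c"],
                      ["new", "a", "b", "x", "c"], covered={1, 2, 4}))
# A and B rewrite the same line, and it ends up uncovered: both are blamed.
print(blame_uncovered(["a", "b"], ["a", "B1"], ["a", "B2"], covered={0}))
```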
Kyle wrote:
For our current implementation plan this understanding is correct, but it is not correct
in general. Code coverage can be mapped from one revision to another, even in the
case of multiple changes to the same file. The solution is to treat files as points in a
vector space and convert changesets to (orthogonal, binary) matrices, which allows us
to map coverage vectors from one revision to another, forward or backward.
We use that coverage to either make claims about total coverage, or claims about
coverage on net new lines, or combine multiple coverage runs for a more-stable
coverage statistic.
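As a rough illustration of the matrix idea, the sketch below builds a binary matrix for
each changeset and uses it to move a per-line coverage vector forward or backward. It is
not the diff-algebra API: every function name is hypothetical, the matrices are plain
Python lists derived from difflib, and a 0 in a mapped vector means "not covered or
unknown", which a real implementation would probably want to distinguish.

```python
from difflib import SequenceMatcher


def changeset_matrix(old, new):
    """Binary matrix M with len(new) rows and len(old) columns.

    M[j][i] == 1 when old line i survives unchanged as new line j.
    """
    m = [[0] * len(old) for _ in new]
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                m[j1 + k][i1 + k] = 1
    return m


def compose(m_b, m_a):
    """Matrix product m_b @ m_a: apply changeset A first, then B."""
    cols = list(zip(*m_a))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in m_b]


def forward(m, coverage_old):
    """Map a coverage vector from the old revision to the new one (M @ c)."""
    return [sum(x * c for x, c in zip(row, coverage_old)) for row in m]


def backward(m, coverage_new):
    """Map a coverage vector from the new revision to the old one (M^T @ c)."""
    n_old = len(m[0]) if m else 0
    return [sum(m[j][i] * coverage_new[j] for j in range(len(m))) for i in range(n_old)]


def net_new_lines(m):
    """Rows with no 1 are lines that did not exist in the old revision."""
    return [j for j, row in enumerate(m) if not any(row)]


rev0 = ["a", "b", "c"]
rev1 = ["a", "x", "b", "c"]  # changeset A inserts "x"
rev2 = ["a", "x", "c"]       # changeset B deletes "b"
total = compose(changeset_matrix(rev1, rev2), changeset_matrix(rev0, rev1))

print(backward(total, [1, 0, 1]))  # coverage of rev2 expressed on rev0's lines
print(forward(total, [1, 1, 0]))   # coverage of rev0 expressed on rev2's lines
print(net_new_lines(total))        # lines of rev2 that are net new since rev0
```

Because each row and each column holds at most one 1, composing changesets is just
matrix multiplication, and mapping backward is multiplication by the transpose.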
Kyle wrote:
The math is just a tool that provides the best conclusion given the available data. ... the math
I am proposing can handle multiple changes to the same file just fine. I added specific
examples [1] as tests so that Greg may see it operate.
... it is the best way to simulate per-changeset coverage without actually doing it. It also
provides us a way to compare coverage over multiple revisions. Again, neither of these
capabilities may be useful, which is why we are putting Marco's coverage mapping (and the
math that can replace it) behind an API [2] the frontend can use. We can decide later if we
want the features [that the] math provides.
[1] https://github.com/klahnakoski/diff-algebra/blob/master/tests/test.py