Page MenuHomePhabricator

Add support for Branch Coverage in LLVM Source-Based Code Coverage
ClosedPublic

Authored by alanphipps on Jul 23 2020, 2:31 PM.

Details

Summary

In January, I posted a basic design outline on llvm-dev for supporting Branch Coverage as part of LLVM Source-based Code Coverage. In general, I think this is a straightforward extension of source-based code coverage to measure True/False execution counts of branch-generating conditions in the source code.

Note that by "branch coverage", I am referring to a granular level of measurement for each leaf-level boolean expression (i.e. "condition") that may compose larger boolean expressions using logical operators. This is an equivalent level of granularity to what is reported by GCOV. This granular level of branch coverage I implemented is amenable to further extension for MC/DC analysis. I am not referring to higher-level "decision coverage".

I apologize for the lengthy review -- there are three main sections, but they're interconnected. Most of the other changes are for testing (which, I confirmed, is sufficient to provide coverage of everything that was added). This is my first phabricator review, so please give me suggestions or improvement!

Examples of output for Text and HTML:

Text: f{F12395372}
HTML (as PDF):

Three main chunks of work:

  • Adding and using "Branch Regions":

Extending the coverage mapping format with a notion of a "branch region" kind with an additional alternate "FalseCount" counter. Both counters can then be used to measure "True" and "False" branch counts. Also extend the Writer/Reader encoding routines to comprehend it.

clang/lib/CodeGen/CoverageMappingGen.cpp
clang/lib/CodeGen/CoverageMappingGen.h
llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
  • Instrumenting logical operators ("&&", "||"):

The existing counter instrumentation can be leveraged for most branch regions. Logical operators are a bit special and require an additional counter.

Consider: "X = C1 || C2;"

For logical operators, an execution counter is already emitted to track the execution count for the right-hand-side of the expression (C2), and because of logical operator short-circuit semantics, this counter can tell you whether they left-hand-side (C1) evaluated to True or False (depending on whether it is logical-AND or logical-OR). However, this doesn't present enough information to tell you how many times the right-hand-side (C2) evaluated to True or False. Therefore, we need to instrument an additional counter here that is dependent on the operator semantics.

Note that there is room for optimization here: when a boolean expression with logical operators is used with a control-flow statement (like if-stmt), we can leverage the "Then" block counter. And in cases where you have multiple levels of nested logical operators, we can re-use counters. For this review, I have chosen to keep the design straightforward and will defer these optimizations to a future patch.

Since the new counters extend the profile format, I bumped the profile version to allow backward compatibility with older profile versions.

clang/lib/CodeGen/CGExprScalar.cpp
clang/lib/CodeGen/CGStmt.cpp
clang/lib/CodeGen/CodeGenFunction.cpp
clang/lib/CodeGen/CodeGenFunction.h
clang/lib/CodeGen/CodeGenPGO.cpp
llvm/include/llvm/ProfileData/InstrProf.h
llvm/include/llvm/ProfileData/InstrProfData.inc
  • Visualization (llvm-cov):

Extend the llvm-cov built-in notion of "SubView" to visualize branches for whole files, individual functions, and macro expansions (which are recursive because macros can be based on other macros). SubViews keep the visualization regions visually distinct.

Extend summary reports to include branch coverage
Enable branch coverage data to be exportable into JSON and LCOV formats.

llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
llvm/tools/llvm-cov/CodeCoverage.cpp
llvm/tools/llvm-cov/CoverageReport.cpp
llvm/tools/llvm-cov/CoverageSummaryInfo.cpp
llvm/tools/llvm-cov/CoverageSummaryInfo.h
llvm/tools/llvm-cov/CoverageViewOptions.h
llvm/tools/llvm-cov/SourceCoverageView.cpp
llvm/tools/llvm-cov/SourceCoverageView.h
llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
llvm/tools/llvm-cov/SourceCoverageViewHTML.h
llvm/tools/llvm-cov/SourceCoverageViewText.cpp
llvm/tools/llvm-cov/SourceCoverageViewText.h
llvm/tools/llvm-cov/CoverageExporterJson.cpp
llvm/tools/llvm-cov/CoverageExporterLcov.cpp

Diff Detail

Event Timeline

alanphipps created this revision.Jul 23 2020, 2:31 PM
vsk added subscribers: ikudrin, arphaman.

@alanphipps thanks for the patch! I'm a bit swamped at the moment, but I hope to give a detailed review by this Wednesday.

clang/lib/CodeGen/CodeGenFunction.h
4399

It might be helpful to either require that C be the RHS of a logical binop (e.g. by passing the binop expr in), or to document that requirement.

4399

Given a logical binop like E = a && !(b || c), isLeafCondition(E->getRHS()) is true. That seems a bit counter-intuitive, because E->getRHS() contains another leaf condition. Would it make sense to rename the condition (perhaps to something like 'InstrumentedCondition')? Have I misunderstood what a leaf condition is?

alanphipps marked 2 inline comments as done.Jul 27 2020, 9:38 AM
alanphipps added inline comments.
clang/lib/CodeGen/CGStmt.cpp
1444

This fixes a defect that resulted in the wrong counter being instrumented for each Case.

clang/lib/CodeGen/CodeGenFunction.h
4399

Background: isLeafCondition() is an auxiliary routine that is used during 1.) counter allocation on binop RHS (CodeGenPGO.cpp), 2.) counter instrumentation (CodeGenFunction.cpp), and 3.) branch region generation (CoverageMappingGen.cpp). In the #3 case, it isn't always looking at the RHS of a logical binop but can be used to assess whether any condition is instrumented/evaluated.

Given your example condition:

E = a && !(b || c)

You are correct that isLeafCondition(E->getRHS()) will return true, but I think it should actually return false here (and bypass logical-NOT), so I will adapt this.

However, given a condition that includes an binary operator (like assign):

E = a && (x = !(b || c))

isLeafCondition(E->getRHS()) will also return true, and I think that is the correct behavior. Given that, then I agree that maybe isLeafCondition() should be renamed to isInstrumentedCondition() since it's not technically just leaves that are tracked in the presence of operators.

vsk added inline comments.Jul 28 2020, 4:05 PM
clang/lib/CodeGen/CodeGenFunction.h
4399

What is special about the assignment expression nested within "a && (x = !(b || c))" that necessitates an extra counter compared to "a && !(b || c)"?

clang/lib/CodeGen/CoverageMappingGen.cpp
95

nit: Please end comments with a '.'

596

It looks like this sets up a deferred region when a branch count (FalseCount) is available. Could you explain why, and when the deferred region is expected to be complete?

695

Could you explain why there's an exception for branch regions here?

803

nit: 'can be constant folded' may be slightly more idiomatic.

821

Is there some alternative to pushing and then immediately popping branch regions? Did you try adding the region directly to the function's SourceRegions, and find that there was some issue with the more direct approach?

828

Right, I don't think that alternative is practicable.

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659

Wdyt of displaying branch-taken counts using tooltips? This would match the way region execution counts are shown, which could be nice, because the information wouldn't interrupt the flow of source code.

vsk added a comment.Jul 28 2020, 4:07 PM

I haven't taken a close look at the tests yet, but plan on getting to it soon.

I'm not sure whether this is something you've already tried, but for this kind of change, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.

Thank you for your comments on the review! I will respond soon; also have been swamped this week.

In D84467#2180421, @vsk wrote:

I haven't taken a close look at the tests yet, but plan on getting to it soon.

I'm not sure whether this is something you've already tried, but for this kind of change, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.

I actually haven't tried that -- I've attempted to, but I've run into some cmake issues trying to generate the bootstrap build configuration, but I'll look into it further. But I have successfully been able to use a host clang compiler to build my target compiler with assertions enabled and coverage instrumentation enabled, and I was able to use that to confirm that the coverage tests (with my additional tests) sufficiently cover the code I added, and no assertions fired.

clang/lib/CodeGen/CodeGenFunction.h
4399

I'm exempting the logical NOT operator basically to match the functionality of other coverage vendors (Bullseye, GCOV). It's a simplistic operator in the sense that (as far as I can tell) it only affects the sense of the generated branch on a condition.

As for the assignment, we're effectively creating a new condition that is evaluatable (even if a branch may technically not be generated), and so creating a counter for it is more interesting (and matches other vendor behavior).

But I'm open to persuasion. We could just be more conservative and create counters for logical NOT, but I think there's value in matching some of the expected behavior of other vendors. This is also something we could continue to fine-tune in future patches.

clang/lib/CodeGen/CoverageMappingGen.cpp
95

nit: Please end comments with a '.'

Will do; I'll upload a diff soon with formatting cleanup.

596

Does adding a region to the RegionStack always imply setting up a deferred region that needs to be completed? My intention was to add the branch region to the RegionStack but not to setup or complete a deferred region for branch regions.

Branch regions are straightforward since we already know start location and end location when they are created, but I have probably misunderstood what a deferred region implies.

695

This was a design decision -- if a branch region is split across expansions, I still want to keep a single branch region, and I want the startloc and endloc to be consistently in the same file/expansion.

So don't create a separate region for branch regions, but do reset the StartLoc/EndLoc accordingly. I can add this as a comment, or perhaps there is a better way to do this.

Admittedly the cases that bring this about are strange, e.g.:

extern void func2();
void func1(bool a, bool b) {
  #define CONDEND b)
  if (a == CONDEND {
      func2();
  }
}

but they can happen.

803

I'll change.

821

The push/pop combo does look strange, and you're right -- I can just add it directly to SourceRegions, but the algorithm in popRegions() will do additional things like adjust the start/end location if the region spans a nested file depth. I think that's still useful to do. Another option is I could factor that algorithm out of popRegions() into a new function and also call that from createBranchRegions().

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659

I think using tooltips is a great idea, and I did look into it. Ultimately, I decided to defer that work to a future patch to allow for some time to research a good way to properly distinguish the branch-taken counts from the region counts.

vsk added inline comments.Aug 14 2020, 4:21 PM
clang/lib/CodeGen/CodeGenFunction.h
4399

I agree that there isn't a need to insert a fresh counter to accommodate a logical NOT operator in a condition. But I don't see how an assignment expression effectively creates a new, evaluatable condition. It seems like the count for the assignment expr can be derived by looking up the count for its parent region.

At this stage I'm more interested in understanding the high-level design. If the question of whether/not to add fresh counters for assignment exprs in a condition is effectively about optimization, then I'm fine with tabling it.

clang/lib/CodeGen/CoverageMappingGen.cpp
596

Oops, I see now that this selects a different SourceMappingRegion constructor than the one I'd thought of. In general, no, adding a region to RegionStack doesn't always entail setting up a deferred region. It looks like you're not doing that here. (An aside: deferred regions were introduced to fix line execution counts for lines following terminator statements, like a 'return'.)

821

If it's possible/straightforward to factor the region start/end adjustment code out of popRegions(), that would be really nice. If not, the current approach looks reasonable.

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659

Sounds reasonable. Can the branch coverage text/html output should be opt-in via a cl::opt until it's finalized? That should make it possible to iterate on the format without changing what users see.

alanphipps marked 13 inline comments as done.Aug 23 2020, 12:15 PM
In D84467#2180421, @vsk wrote:

I haven't taken a close look at the tests yet, but plan on getting to it soon.

I'm not sure whether this is something you've already tried, but for this kind of change, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.

Thank you for recommending this. I was able to do an instrumented stage2 build successfully, with assertions enabled, and it built fine, and no problems running LIT. I have to say that it was also cool to see branch coverage on the source code I added to support branch coverage!

clang/lib/CodeGen/CoverageMappingGen.cpp
821

I looked at it again, and I decided to table the refactor for now. The adjustment code is coupled to SourceRegions as is the rest of popRegions(), and it may be necessary to rework the whole of the function to get a nicer refactor.

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659

Yep, I added two options:

--show-branch-summary, which shows the total branch coverage in the summary report, and this is ON by default.
--show-branches={count,percent}, which shows source-level coverage on a given file, which is OFF by default. So it's opt-in.

alanphipps marked 2 inline comments as done.

Updating for formatting and comments (and some test adjustments after rebase). Bypassing logical-NOT operators in CodeGenFunction::isLeafCondition(), which checks the condition for the presence of logical-AND and logical-OR. (this can be further expanded, if necessary)

alanphipps marked an inline comment as done.Sep 7 2020, 9:03 AM
alanphipps added inline comments.
clang/lib/CodeGen/CodeGenFunction.h
4399

I just realized I didn't make this name change to isInstrumentedCondition(). I will do that.

alanphipps updated this revision to Diff 290301.Sep 7 2020, 9:05 AM
  • Rename isLeafCondition() to isInstrumentedCondition() and rephrase associated comments
  • Add branch regions on C++ range-based loops
vsk added a comment.Sep 9 2020, 12:46 PM

@alanphipps thanks for bearing with me. I think this is about ready to land. I do ask that you back out any punctuation/whitespace changes in code/tests that aren't directly modified in your diff (the clang-format-diff script can help with this). It looks like some libCoverage and test changes are no longer in the current version of this patch, as compared to the previous version (https://reviews.llvm.org/D84467?id=287269). Was that intentional?

In D84467#2264205, @vsk wrote:

@alanphipps thanks for bearing with me. I think this is about ready to land. I do ask that you back out any punctuation/whitespace changes in code/tests that aren't directly modified in your diff (the clang-format-diff script can help with this). It looks like some libCoverage and test changes are no longer in the current version of this patch, as compared to the previous version (https://reviews.llvm.org/D84467?id=287269). Was that intentional?

I apologize for the long delay myself -- I think I discovered a bug in the code that I'm trying to work out. In answer to your question about the patches, no it wasn't intentional; I believe I produced the patch incorrectly, only capturing the most recent changes. I'll update this soon.

alanphipps updated this revision to Diff 296017.EditedOct 3 2020, 6:14 PM

Updated diff:
1.) Fix a defect in the CGExprScalar.cpp code to instrument RHS condition for branch coverage that resulted in a RHS condition as being evaluated twice. This is incorrect behavior since a condition may have side-effects. I also added a test.
2.) Back out unintended whitespace/punctuation (please let me know if I overlooked anything).
3.) Rebase against updated upstream.

vsk added inline comments.Oct 6 2020, 4:15 PM
clang/lib/CodeGen/CoverageMappingGen.cpp
1169

If theWhileStmt->getCond() is-a BinaryOperator, it would not be considered an instrumented condition and no branch region would be created here. OTOH, if the condition is instrumented (e.g. as it would be in while (x);), a branch region would be created.

Is this the expected outcome? It seems a little bit inconsistent compared to either a) allowing the RecursiveASTVisitor to identify expressions that require branch regions, or b) guaranteeing that each while statement comes with a branch region for its condition.

I have the same question about the createBranchRegion calls in VisitDoStmt, VisitForStmt, etc. (But I'm not concerned about the calls to createBranchRegion in VisitBinL{And,Or} and VisitSwitch*.)

1277

Is the condition of a CXXForRangeStmt something that's visible to the user? It looks more like a clang implementation detail (but I could be mistaken).

alanphipps marked 2 inline comments as done.Oct 8 2020, 8:09 AM
alanphipps added inline comments.
clang/lib/CodeGen/CoverageMappingGen.cpp
1169

Right, that's the expected outcome, but I think I see what you're saying in that it could be confusing to call "createBranchRegion()" that may not actually create a branch region in some cases.

I could move the check for isInstrumentedCondition() out of createBranchRegion() to make it clear that a branch region is only created for when that is TRUE, but I wanted to encapsulate as much as I could in the function to avoid duplication. Perhaps it would be better to rename "createBranchRegion()" to something more like "createBranchRegionForInstrumentedCond()"?

The same behavior exists for VisitBinL{And,Or} since those conditions could also be nested BinaryOperators.

1277

Technically the condition isn't visible, per se, but the range-based construct implies a branchable condition that other vendors do track, so I still think it's useful to show it.

vsk added inline comments.
clang/lib/CodeGen/CoverageMappingGen.cpp
1169

I think it's fine for createBranchRegion to not guarantee that a region is created, since the logic is better-encapsulated this way.

I misunderstood how the feature is supposed to work. I assumed that all interesting branch regions would "automatically" be created during the recursive AST visit, by virtue of the logic you've added to VisitBinL{And,Or}. But it looks like the goal is to also show branch regions around loop constructs, regardless of whether or not the loop condition contains a binop.

1277

Seems reasonable -- I was just wondering whether a CXXForRangeStmt ever has a non-null condition attached. (I didn't see an example of this in loops.cpp, but perhaps I just missed the correct test case.)

llvm/include/llvm/ProfileData/InstrProfData.inc
650

Note that there's a copy of this file in llvm-project/compiler-rt (this allows the compiler-rt subtree to be checked out and built without any llvm dependency). Please copy over this change to compiler-rt/include/profile/InstrProfData.inc.

llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
258

Could you bump the coverage mapping format version, and document the extension to the format in docs/CoverageMappingFormat.rst?

This should prevent an older llvm-cov from reporting "malformed object" when encountering a branch region.

alanphipps marked 4 inline comments as done.

Bump format version and add preliminary documentation for CoverageMappingFormat, SourceBasedCodeCoverage, and llvm-cov tool

alanphipps marked 2 inline comments as done.Nov 23 2020, 11:54 AM
vsk accepted this revision.Nov 30 2020, 11:47 AM

Thank you, lgtm!

This revision is now accepted and ready to land.Nov 30 2020, 11:47 AM

Diff after rebase.

In D84467#2423636, @vsk wrote:

Thank you, lgtm!

Thank you! I also would like to ask if you could commit it for me as I don't have access for that. Note that a handful of tests have binary files which were not uploaded with the diff. What do you advise I do to get those up here so that I can commit it (unfortunately my system doesn't have arcanist setup and that would take some time). Thanks again for all of your help!

vsk added a comment.Dec 11 2020, 9:01 AM
In D84467#2423636, @vsk wrote:

Thank you, lgtm!

Thank you! I also would like to ask if you could commit it for me as I don't have access for that. Note that a handful of tests have binary files which were not uploaded with the diff. What do you advise I do to get those up here so that I can commit it (unfortunately my system doesn't have arcanist setup and that would take some time). Thanks again for all of your help!

Unfortunately I don't think I'll be able to commit this on your behalf. Would you be open to obtaining commit access from Chris and landing this directly? I ask because, for a patch of this size, there's typically a fair amount of post-commit feedback (primarily in the form of bot failure notifications, but occasionally design feedback as well). It's common for tests to pass locally, but not on some specific CI node, and it can take quite a bit of time to resolve those issues.

In D84467#2449045, @vsk wrote:

Unfortunately I don't think I'll be able to commit this on your behalf. Would you be open to obtaining commit access from Chris and landing this directly? I ask because, for a patch of this size, there's typically a fair amount of post-commit feedback (primarily in the form of bot failure notifications, but occasionally design feedback as well). It's common for tests to pass locally, but not on some specific CI node, and it can take quite a bit of time to resolve those issues.

Sure, that's reasonable, thanks. I'll give that a shot.

Herald added a project: Restricted Project. · View Herald TranscriptTue, Jan 5, 7:52 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dblaikie added inline comments.
llvm/test/tools/llvm-cov/branch-noShowBranch.test
3

This test depends on another test file as input which is generally not done in the LLVM test suite - inputs to tests are placed in the "Inputs" subdirectory, rather than in the test directory itself. I realize a few other tests already here (demangle, hideUnexpectedSubviews, style, and threads) already do this - but it'd be good to not add more cases (& fix those existing cases where possible)

Could you fix this so this test (& possibly others, if you have the opportunity/bandwidth) doesn't depend on other test files, but only on files in the Inputs subdirectory?

alanphipps marked an inline comment as done.Tue, Jan 5, 4:08 PM
alanphipps added inline comments.
llvm/test/tools/llvm-cov/branch-noShowBranch.test
3

Certainly I can do that -- I believe there are two other tests in which I did this other than this test: branch-export-json.txt and branch-export-lcov.test. I will adjust those tests that I added within the next days but I don't think I have the bandwidth to change other tests that also do this.

dblaikie added inline comments.Tue, Jan 5, 4:17 PM
llvm/test/tools/llvm-cov/branch-noShowBranch.test
3

Sure thing, thanks a bunch!

At least having fewer instances of the unfavorable idioms means they're less likely to be repeated in the future, even if there are some leftover instances/anachronisms.

alanphipps marked an inline comment as done.Thu, Jan 7, 2:51 PM
alanphipps added inline comments.
llvm/test/tools/llvm-cov/branch-noShowBranch.test
3

Changes have been committed here: https://reviews.llvm.org/rGebcc8dcb68aa
Thanks for pointing this out!