Page MenuHomePhabricator

Add support for Branch Coverage in LLVM Source-Based Code Coverage
Needs ReviewPublic

Authored by alanphipps on Jul 23 2020, 2:31 PM.

Details

Summary

In January, I posted a basic design outline on llvm-dev for supporting Branch Coverage as part of LLVM Source-based Code Coverage. In general, I think this is a straightforward extension of source-based code coverage to measure True/False execution counts of branch-generating conditions in the source code.

Note that by "branch coverage", I am referring to a granular level of measurement for each leaf-level boolean expression (i.e. "condition") that may compose larger boolean expressions using logical operators. This is an equivalent level of granularity to what is reported by GCOV. This granular level of branch coverage I implemented is amenable to further extension for MC/DC analysis. I am not referring to higher-level "decision coverage".

I apologize for the lengthy review -- there are three main sections, but they're interconnected. Most of the other changes are for testing (which, I confirmed, is sufficient to provide coverage of everything that was added). This is my first phabricator review, so please give me suggestions or improvement!

Examples of output for Text and HTML:

Text: f{F12395372}
HTML (as PDF):

Three main chunks of work:

  • Adding and using "Branch Regions":

Extending the coverage mapping format with a notion of a "branch region" kind with an additional alternate "FalseCount" counter. Both counters can then be used to measure "True" and "False" branch counts. Also extend the Writer/Reader encoding routines to comprehend it.

clang/lib/CodeGen/CoverageMappingGen.cpp
clang/lib/CodeGen/CoverageMappingGen.h
llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
  • Instrumenting logical operators ("&&", "||"):

The existing counter instrumentation can be leveraged for most branch regions. Logical operators are a bit special and require an additional counter.

Consider: "X = C1 || C2;"

For logical operators, an execution counter is already emitted to track the execution count for the right-hand-side of the expression (C2), and because of logical operator short-circuit semantics, this counter can tell you whether they left-hand-side (C1) evaluated to True or False (depending on whether it is logical-AND or logical-OR). However, this doesn't present enough information to tell you how many times the right-hand-side (C2) evaluated to True or False. Therefore, we need to instrument an additional counter here that is dependent on the operator semantics.

Note that there is room for optimization here: when a boolean expression with logical operators is used with a control-flow statement (like if-stmt), we can leverage the "Then" block counter. And in cases where you have multiple levels of nested logical operators, we can re-use counters. For this review, I have chosen to keep the design straightforward and will defer these optimizations to a future patch.

Since the new counters extend the profile format, I bumped the profile version to allow backward compatibility with older profile versions.

clang/lib/CodeGen/CGExprScalar.cpp
clang/lib/CodeGen/CGStmt.cpp
clang/lib/CodeGen/CodeGenFunction.cpp
clang/lib/CodeGen/CodeGenFunction.h
clang/lib/CodeGen/CodeGenPGO.cpp
llvm/include/llvm/ProfileData/InstrProf.h
llvm/include/llvm/ProfileData/InstrProfData.inc
  • Visualization (llvm-cov):

Extend the llvm-cov built-in notion of "SubView" to visualize branches for whole files, individual functions, and macro expansions (which are recursive because macros can be based on other macros). SubViews keep the visualization regions visually distinct.

Extend summary reports to include branch coverage
Enable branch coverage data to be exportable into JSON and LCOV formats.

llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
llvm/tools/llvm-cov/CodeCoverage.cpp
llvm/tools/llvm-cov/CoverageReport.cpp
llvm/tools/llvm-cov/CoverageSummaryInfo.cpp
llvm/tools/llvm-cov/CoverageSummaryInfo.h
llvm/tools/llvm-cov/CoverageViewOptions.h
llvm/tools/llvm-cov/SourceCoverageView.cpp
llvm/tools/llvm-cov/SourceCoverageView.h
llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
llvm/tools/llvm-cov/SourceCoverageViewHTML.h
llvm/tools/llvm-cov/SourceCoverageViewText.cpp
llvm/tools/llvm-cov/SourceCoverageViewText.h
llvm/tools/llvm-cov/CoverageExporterJson.cpp
llvm/tools/llvm-cov/CoverageExporterLcov.cpp

Diff Detail

Event Timeline

alanphipps created this revision.Jul 23 2020, 2:31 PM
vsk added subscribers: ikudrin, arphaman.

@alanphipps thanks for the patch! I'm a bit swamped at the moment, but I hope to give a detailed review by this Wednesday.

clang/lib/CodeGen/CodeGenFunction.h
4359

It might be helpful to either require that C be the RHS of a logical binop (e.g. by passing the binop expr in), or to document that requirement.

4359

Given a logical binop like E = a && !(b || c), isLeafCondition(E->getRHS()) is true. That seems a bit counter-intuitive, because E->getRHS() contains another leaf condition. Would it make sense to rename the condition (perhaps to something like 'InstrumentedCondition')? Have I misunderstood what a leaf condition is?

alanphipps marked 2 inline comments as done.Jul 27 2020, 9:38 AM
alanphipps added inline comments.
clang/lib/CodeGen/CGStmt.cpp
1368 ↗(On Diff #280221)

This fixes a defect that resulted in the wrong counter being instrumented for each Case.

clang/lib/CodeGen/CodeGenFunction.h
4359

Background: isLeafCondition() is an auxiliary routine that is used during 1.) counter allocation on binop RHS (CodeGenPGO.cpp), 2.) counter instrumentation (CodeGenFunction.cpp), and 3.) branch region generation (CoverageMappingGen.cpp). In the #3 case, it isn't always looking at the RHS of a logical binop but can be used to assess whether any condition is instrumented/evaluated.

Given your example condition:

E = a && !(b || c)

You are correct that isLeafCondition(E->getRHS()) will return true, but I think it should actually return false here (and bypass logical-NOT), so I will adapt this.

However, given a condition that includes an binary operator (like assign):

E = a && (x = !(b || c))

isLeafCondition(E->getRHS()) will also return true, and I think that is the correct behavior. Given that, then I agree that maybe isLeafCondition() should be renamed to isInstrumentedCondition() since it's not technically just leaves that are tracked in the presence of operators.

vsk added inline comments.Jul 28 2020, 4:05 PM
clang/lib/CodeGen/CodeGenFunction.h
4359

What is special about the assignment expression nested within "a && (x = !(b || c))" that necessitates an extra counter compared to "a && !(b || c)"?

clang/lib/CodeGen/CoverageMappingGen.cpp
74

nit: Please end comments with a '.'

582

It looks like this sets up a deferred region when a branch count (FalseCount) is available. Could you explain why, and when the deferred region is expected to be complete?

681

Could you explain why there's an exception for branch regions here?

789

nit: 'can be constant folded' may be slightly more idiomatic.

807

Is there some alternative to pushing and then immediately popping branch regions? Did you try adding the region directly to the function's SourceRegions, and find that there was some issue with the more direct approach?

814

Right, I don't think that alternative is practicable.

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659 ↗(On Diff #280221)

Wdyt of displaying branch-taken counts using tooltips? This would match the way region execution counts are shown, which could be nice, because the information wouldn't interrupt the flow of source code.

vsk added a comment.Jul 28 2020, 4:07 PM

I haven't taken a close look at the tests yet, but plan on getting to it soon.

I'm not sure whether this is something you've already tried, but for this kind of change, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.

Thank you for your comments on the review! I will respond soon; also have been swamped this week.

In D84467#2180421, @vsk wrote:

I haven't taken a close look at the tests yet, but plan on getting to it soon.

I'm not sure whether this is something you've already tried, but for this kind of change, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.

I actually haven't tried that -- I've attempted to, but I've run into some cmake issues trying to generate the bootstrap build configuration, but I'll look into it further. But I have successfully been able to use a host clang compiler to build my target compiler with assertions enabled and coverage instrumentation enabled, and I was able to use that to confirm that the coverage tests (with my additional tests) sufficiently cover the code I added, and no assertions fired.

clang/lib/CodeGen/CodeGenFunction.h
4359

I'm exempting the logical NOT operator basically to match the functionality of other coverage vendors (Bullseye, GCOV). It's a simplistic operator in the sense that (as far as I can tell) it only affects the sense of the generated branch on a condition.

As for the assignment, we're effectively creating a new condition that is evaluatable (even if a branch may technically not be generated), and so creating a counter for it is more interesting (and matches other vendor behavior).

But I'm open to persuasion. We could just be more conservative and create counters for logical NOT, but I think there's value in matching some of the expected behavior of other vendors. This is also something we could continue to fine-tune in future patches.

clang/lib/CodeGen/CoverageMappingGen.cpp
74

nit: Please end comments with a '.'

Will do; I'll upload a diff soon with formatting cleanup.

582

Does adding a region to the RegionStack always imply setting up a deferred region that needs to be completed? My intention was to add the branch region to the RegionStack but not to setup or complete a deferred region for branch regions.

Branch regions are straightforward since we already know start location and end location when they are created, but I have probably misunderstood what a deferred region implies.

681

This was a design decision -- if a branch region is split across expansions, I still want to keep a single branch region, and I want the startloc and endloc to be consistently in the same file/expansion.

So don't create a separate region for branch regions, but do reset the StartLoc/EndLoc accordingly. I can add this as a comment, or perhaps there is a better way to do this.

Admittedly the cases that bring this about are strange, e.g.:

extern void func2();
void func1(bool a, bool b) {
  #define CONDEND b)
  if (a == CONDEND {
      func2();
  }
}

but they can happen.

789

I'll change.

807

The push/pop combo does look strange, and you're right -- I can just add it directly to SourceRegions, but the algorithm in popRegions() will do additional things like adjust the start/end location if the region spans a nested file depth. I think that's still useful to do. Another option is I could factor that algorithm out of popRegions() into a new function and also call that from createBranchRegions().

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659 ↗(On Diff #280221)

I think using tooltips is a great idea, and I did look into it. Ultimately, I decided to defer that work to a future patch to allow for some time to research a good way to properly distinguish the branch-taken counts from the region counts.

vsk added inline comments.Aug 14 2020, 4:21 PM
clang/lib/CodeGen/CodeGenFunction.h
4359

I agree that there isn't a need to insert a fresh counter to accommodate a logical NOT operator in a condition. But I don't see how an assignment expression effectively creates a new, evaluatable condition. It seems like the count for the assignment expr can be derived by looking up the count for its parent region.

At this stage I'm more interested in understanding the high-level design. If the question of whether/not to add fresh counters for assignment exprs in a condition is effectively about optimization, then I'm fine with tabling it.

clang/lib/CodeGen/CoverageMappingGen.cpp
582

Oops, I see now that this selects a different SourceMappingRegion constructor than the one I'd thought of. In general, no, adding a region to RegionStack doesn't always entail setting up a deferred region. It looks like you're not doing that here. (An aside: deferred regions were introduced to fix line execution counts for lines following terminator statements, like a 'return'.)

807

If it's possible/straightforward to factor the region start/end adjustment code out of popRegions(), that would be really nice. If not, the current approach looks reasonable.

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659 ↗(On Diff #280221)

Sounds reasonable. Can the branch coverage text/html output should be opt-in via a cl::opt until it's finalized? That should make it possible to iterate on the format without changing what users see.

alanphipps marked 13 inline comments as done.Sun, Aug 23, 12:15 PM
In D84467#2180421, @vsk wrote:

I haven't taken a close look at the tests yet, but plan on getting to it soon.

I'm not sure whether this is something you've already tried, but for this kind of change, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.

Thank you for recommending this. I was able to do an instrumented stage2 build successfully, with assertions enabled, and it built fine, and no problems running LIT. I have to say that it was also cool to see branch coverage on the source code I added to support branch coverage!

clang/lib/CodeGen/CoverageMappingGen.cpp
807

I looked at it again, and I decided to table the refactor for now. The adjustment code is coupled to SourceRegions as is the rest of popRegions(), and it may be necessary to rework the whole of the function to get a nicer refactor.

llvm/tools/llvm-cov/SourceCoverageViewHTML.cpp
659 ↗(On Diff #280221)

Yep, I added two options:

--show-branch-summary, which shows the total branch coverage in the summary report, and this is ON by default.
--show-branches={count,percent}, which shows source-level coverage on a given file, which is OFF by default. So it's opt-in.

alanphipps marked 2 inline comments as done.

Updating for formatting and comments (and some test adjustments after rebase). Bypassing logical-NOT operators in CodeGenFunction::isLeafCondition(), which checks the condition for the presence of logical-AND and logical-OR. (this can be further expanded, if necessary)

alanphipps marked an inline comment as done.Mon, Sep 7, 9:03 AM
alanphipps added inline comments.
clang/lib/CodeGen/CodeGenFunction.h
4359

I just realized I didn't make this name change to isInstrumentedCondition(). I will do that.

alanphipps updated this revision to Diff 290301.Mon, Sep 7, 9:05 AM
  • Rename isLeafCondition() to isInstrumentedCondition() and rephrase associated comments
  • Add branch regions on C++ range-based loops
vsk added a comment.Wed, Sep 9, 12:46 PM

@alanphipps thanks for bearing with me. I think this is about ready to land. I do ask that you back out any punctuation/whitespace changes in code/tests that aren't directly modified in your diff (the clang-format-diff script can help with this). It looks like some libCoverage and test changes are no longer in the current version of this patch, as compared to the previous version (https://reviews.llvm.org/D84467?id=287269). Was that intentional?