This is an archive of the discontinued LLVM Phabricator instance.

[Coverage] Build sorted and unique segments
ClosedPublic

Authored by vsk on Aug 16 2017, 7:01 PM.

Details

Summary

A coverage segment contains a starting line and column, an execution
count, and some other metadata. Clients of the coverage library use
segments to prepare line-oriented reports.

Users of the coverage library depend on segments being unique and sorted
in source order. Currently this is not guaranteed (this is why the clang
change which introduced deferred regions was reverted).

This commit documents the "unique and sorted" condition and asserts that
it holds. It also fixes the SegmentBuilder so that it produces correct
output in some edge cases.

Testing: I've added unit tests for some edge cases. I've also checked
that the new SegmentBuilder implementation is fully covered. Apart from
running check-profile and the llvm-cov tests, I've successfully used a
stage1 llvm-cov to prepare a coverage report for an instrumented clang
binary.

Diff Detail

Event Timeline

vsk created this revision.Aug 16 2017, 7:01 PM
ikudrin added inline comments.Aug 17 2017, 12:25 AM
include/llvm/ProfileData/Coverage/CoverageMapping.h
201

I love this change! But maybe it deserves a separate patch?

unittests/ProfileData/CoverageMappingTest.cpp
566

Could you remind me where zero-length regions like (3:21) might come from?
Right now, it looks like the area from 3:21 to 4:4 has no counter associated with it, but I'd expect that a counter from the top-level region (1:5 - 4:4) was used here.

vsk updated this revision to Diff 111592.Aug 17 2017, 5:43 PM
vsk marked 2 inline comments as done.
  • Address two comments by @ikudrin (split out an NFC change, emit better segments when handling 0-length regions).
  • I tested the updated patch in the same way, and was able to prepare a coverage report for clang without issue.
include/llvm/ProfileData/Coverage/CoverageMapping.h
201

I have split this change out (NFC) and rebased this patch on top of it (the new diff will still use LineColPair).

unittests/ProfileData/CoverageMappingTest.cpp
566

Zero-length regions can be emitted when we enter a macro expansion. And you're right, creating a segment at 3:21 with the count from 1:5 -> 4:4 would be more appropriate here. I will fix this.

vsk added a comment.Aug 29 2017, 1:05 PM

Friendly ping.

ikudrin edited edge metadata.Aug 31 2017, 4:44 AM

The patch doesn't apply to the current tip clearly. And after applying it manually and fix compilation errors, I have two failing tests for llvm-cov: tools/llvm-cov/showLineExecutionCounts.cpp and tools/llvm-cov/threads.c, both of which fail with "Coverage segments not unique or sorted" assertion.

I'd also prefer a set of patches where each corner case is fixed in a separate patch and which gradually lead to the final code. Of course, if it's possible. Right now it looks like just changing an old implementation to a new one, and it could be hard to investigate later and understand each particular decision in the code. What do you think?

vsk added a comment.Aug 31 2017, 9:44 AM

The patch doesn't apply to the current tip clearly. And after applying it manually and fix compilation errors, I have two failing tests for llvm-cov: tools/llvm-cov/showLineExecutionCounts.cpp and tools/llvm-cov/threads.c, both of which fail with "Coverage segments not unique or sorted" assertion.

Both of these tests load the same covmapping file. That file was generated by an older, buggy clang. That version of clang produced coverage mapping regions where the start of the region actually ended after the end of the region. I have separate patches locally which (1) teach the coverage mapping reader and writer to report a recoverable Error when they find bogus mapping regions produced by the frontend, and (2) regenerate the buggy covmapping file. The clang bug was fixed long ago.

I had planned on landing the changes in one go but I can commit those fixes right away if you'd prefer.

I'd also prefer a set of patches where each corner case is fixed in a separate patch and which gradually lead to the final code. Of course, if it's possible. Right now it looks like just changing an old implementation to a new one, and it could be hard to investigate later and understand each particular decision in the code. What do you think?

I'm a bit hesitant to do that. I'm not sure that splitting up the logic would result in individual patches which are easier to understand. It would also mean that none of the patches could be committed, since the stage2 coverage bot would hit the 'sorted & unique' assertion.

If there's any new logic which is obscure or not well-commented, I'd be happy to explain it in more detail or improve the comments.

Yes, if it's possible, I'd like to see all the patches (one way or another), just to have the whole picture.

I don't insist on dividing the patch if you find it impractical. Do I understand it right that you've added all the found edge cases as unit tests in ProfileData/CoverageMappingTest.cpp?

vsk added a comment.Sep 1 2017, 10:47 AM

Yes, if it's possible, I'd like to see all the patches (one way or another), just to have the whole picture.

Understood, D37387 is the only extra patch needed.

I don't insist on dividing the patch if you find it impractical. Do I understand it right that you've added all the found edge cases as unit tests in ProfileData/CoverageMappingTest.cpp?

Thanks, I do find it a bit impractical. I've added all the edge cases I found as separate unit tests. I've also completed a stage2 coverage-enabled run with an asserts-enabled clang/llvm-cov, and checked that the segment builder changes are fully covered.

ikudrin added inline comments.Sep 6 2017, 4:46 AM
lib/ProfileData/Coverage/CoverageMapping.cpp
373

We can't get here with Loc == None because the only call where it is None sets FirstCompletedRegion to 0. So, the check for Loc in assert is a bit misleading. I'd add a separate asset to check that Loc is set.

382

The last of completed regions might not be processed completely, here is the test:

ProfileWriter.addRecord({"func1", 0x1234, {1, 2, 3, 4}}, Err);
startFunction("func1", 0x1234);

addCMR(Counter::getCounter(0), "file1", 1, 1, 8, 8);
addCMR(Counter::getCounter(1), "file1", 2, 2, 5, 5);
addCMR(Counter::getCounter(2), "file1", 3, 3, 4, 4);
addCMR(Counter::getCounter(3), "file1", 6, 6, 7, 7);

EXPECT_THAT_ERROR(loadCoverageMapping(), Succeeded());
const auto FunctionRecords = LoadedCoverage->getCoveredFunctions();
const auto &FunctionRecord = *FunctionRecords.begin();
CoverageData Data = LoadedCoverage->getCoverageForFunction(FunctionRecord);
std::vector<CoverageSegment> Segments(Data.begin(), Data.end());

ASSERT_EQ(8U, Segments.size());
EXPECT_EQ(CoverageSegment(1, 1, 1, true), Segments[0]);
EXPECT_EQ(CoverageSegment(2, 2, 2, true), Segments[1]);
EXPECT_EQ(CoverageSegment(3, 3, 3, true), Segments[2]);
EXPECT_EQ(CoverageSegment(4, 4, 2, false), Segments[3]);
EXPECT_EQ(CoverageSegment(5, 5, 1, false), Segments[4]);
EXPECT_EQ(CoverageSegment(6, 6, 4, true), Segments[5]);
EXPECT_EQ(CoverageSegment(7, 7, 1, false), Segments[6]);
EXPECT_EQ(CoverageSegment(8, 8, false), Segments[7]);
432

There can't be another region after a zero-length region with the same StartLoc, right?
So, if you check this condition first, it'll simplify the code because there will be no need for 'else' branches and an empty 'then' block.

436

I'd prefer seeing these expressions as separate statements, like

// If it's the last region, emit a skipped segment.
const bool Skipped = (CR.index() + 1) == Regions.size();
startSegment(..., Skipped);

but it's for you to decide.

vsk marked 3 inline comments as done.Sep 6 2017, 11:08 AM

Thanks for your comments, I'll update the patch shortly.

lib/ProfileData/Coverage/CoverageMapping.cpp
373

Right, I'll tighten the assertion and the logic here.

382

Thank you very much for the catch and the test case! The problem is that there may be a "gap" between the end of the last completed region (5:5), and the start of the new region (6:6). In this case, the count from the last active region (1:1 -> 8:8, count=1) should apply to 5:5 -> 6:6.

This scenario was handled correctly in the case where there's only one completed region.

432

I think there can. If I'm understanding you correctly, this should be handled by the "handle_consecutive_regions_with_zero_length" unit test.

vsk updated this revision to Diff 114034.Sep 6 2017, 11:08 AM
vsk marked 2 inline comments as done.

Address review feedback.

ikudrin added inline comments.Sep 7 2017, 3:26 AM
lib/ProfileData/Coverage/CoverageMapping.cpp
370

I believe that this specific case, as long as the special handling of the last segment in the loop and adding a skipped segment in the end, all of them might be joined together. The function would be divided into the following simple steps:

  1. std_stable_sort
  2. In the loop, process all completed regions, apart from the last.
  3. Remeber a pointer to the last region.
  4. Remove completed regions from the list.
  5. Decide if we need to add a segment started at endLoc of the last region (only if it differs from Loc)
  6. Add a skipped or regular segment depending on ActiveRegions (if it is empty or not).
432

At this point, Regions are already sorted and combined. Thus:

  • if B follows A and A.startLoc() == B.startLoc() then A.endLoc() < B.endLoc().
  • You fixed the malformed case in the other patch, so for any region startLoc() <= endLoc().
  • Consequently, if B follows A and A.startLoc() == A.endLoc(), then B can't have the same startLoc() as A because it violates one of previous statements.

As a result, the code might be simplified to something like this:

if (CurStartLoc == CR.value.endLoc()) {
  ...
  continue;
}
if (CR.index + 1 == Regions.size() || CurStartLoc != Regions[CR.index() + 1].startLoc()
  startSegment(CR.value(), CurStartLoc, true);

ActiveRegions.push_back(&CR.value());

What do you think?

vsk marked 4 inline comments as done.Sep 7 2017, 11:26 AM

Thanks for your comments, I'll update the patch shortly.

lib/ProfileData/Coverage/CoverageMapping.cpp
370

Good point, I think this makes the change substantially simpler. I tested the updated patch in the same way (prepare a coverage report for clang with an asserts-enabled binary + unit/lit tests).

432

That makes sense.

vsk updated this revision to Diff 114217.Sep 7 2017, 11:27 AM
vsk marked 2 inline comments as done.

Address review feedback.

ikudrin accepted this revision.Sep 7 2017, 7:57 PM

LGTM.

This revision is now accepted and ready to land.Sep 7 2017, 7:57 PM
This revision was automatically updated to reflect the committed changes.