This is an archive of the discontinued LLVM Phabricator instance.

[IR] Add a boolean field in DILocation to know if a line must covered or not
ClosedPublic

Authored by calixte on Jul 27 2018, 7:13 AM.

Details

Summary

Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916).

Diff Detail

Event Timeline

calixte created this revision.Jul 27 2018, 7:13 AM
calixte edited the summary of this revision. (Show Details)Jul 27 2018, 7:23 AM
vsk added a reviewer: vsk.Jul 30 2018, 10:12 AM
vsk added a project: debug-info.
vsk added a subscriber: vsk.

Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.

Could you share an example of this resulting in an incorrect coverage report?

The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.

This requires adding an operand to DILocation, which is a fairly expensive thing to do. Have you considered any alternatives?

A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916).

In D49915#1180453, @vsk wrote:

Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.

Could you share an example of this resulting in an incorrect coverage report?

There are some examples in the tests that are fixed here: https://reviews.llvm.org/D49917.

The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.

This requires adding an operand to DILocation, which is a fairly expensive thing to do. Have you considered any alternatives?

Both me and Calixte had reservations about adding more data to DILocation too, but we couldn't find an alternative. It's possible that there might be some alternative we can't figure out due to our lack of experience with LLVM.
One thing I tried is not assigning lines to instructions that are TerminatorInst and !ReturnInst, but I think this causes us to skip some lines that we don't want to skip (e.g. the break clause in a switch).

vsk added a comment.Jul 30 2018, 11:56 AM
In D49915#1180453, @vsk wrote:

Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.

Could you share an example of this resulting in an incorrect coverage report?

There are some examples in the tests that are fixed here: https://reviews.llvm.org/D49917.

I see, thanks.

The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.

This requires adding an operand to DILocation, which is a fairly expensive thing to do. Have you considered any alternatives?

Both me and Calixte had reservations about adding more data to DILocation too, but we couldn't find an alternative. It's possible that there might be some alternative we can't figure out due to our lack of experience with LLVM.
One thing I tried is not assigning lines to instructions that are TerminatorInst and !ReturnInst, but I think this causes us to skip some lines that we don't want to skip (e.g. the break clause in a switch).

Perhaps you could borrow the high bit from the discriminator operand?

This looks problematic - seems suspicious that the covered value is a bool member of DILocation, but nothing else is - everything else is stored in the metadata proper. Is there a good reason for that? (it provably doesn't need to be serialized? Could this be stored as side-data in a map in an analysis instead?)

I couldn't quite spot it - where's the code that sets or constructs a DILocation with covered = true?

I couldn't quite spot it - where's the code that sets or constructs a DILocation with covered = true?

https://reviews.llvm.org/D49916

BTW, I'd probably rename covered to coverable to make it more clear.

rnk added a subscriber: rnk.Jul 31 2018, 5:22 PM

I agree, this patch needs serialization through bitcode and IR before it can land.

DILocation isn't only used for coverage, I think this should probably be a more general purpose flag, like "isImplicitCode", to indicate that the code was generated as part of a destructor cleanup, an epilogue, or something else that requires the compiler to generate IR that isn't obviously attributed to user source code. Then GCOV can decide for itself how it wants to generate coverage.

aprantl added a subscriber: aprantl.Aug 1 2018, 7:48 AM

DILocation isn't only used for coverage, I think this should probably be a more general purpose flag, like "isImplicitCode", to indicate that the code was generated as part of a destructor cleanup, an epilogue, or something else that requires the compiler to generate IR that isn't obviously attributed to user source code. Then GCOV can decide for itself how it wants to generate coverage.

We have a marker for compiler-generated code that has no relation to any source code; it's the special line number 0. This sounds like this patch is trying to work around a frontend bug, and really the frontend shouldn't stick misleading line numbers on the code? There's an RAII object in clang to generate locations for compiler/auto-generated code.

vsk added a comment.Aug 1 2018, 9:33 AM

DILocation isn't only used for coverage, I think this should probably be a more general purpose flag, like "isImplicitCode", to indicate that the code was generated as part of a destructor cleanup, an epilogue, or something else that requires the compiler to generate IR that isn't obviously attributed to user source code. Then GCOV can decide for itself how it wants to generate coverage.

We have a marker for compiler-generated code that has no relation to any source code; it's the special line number 0. This sounds like this patch is trying to work around a frontend bug, and really the frontend shouldn't stick misleading line numbers on the code? There's an RAII object in clang to generate locations for compiler/auto-generated code.

A caveat to this: plenty of users (and plenty of tests) expect to be able to set breakpoints on scope-closing curly braces. Assigning line 0 to those braces would at the minimum break a lot of tests, and would likely be viewed as a serious regression.

rnk added a comment.Aug 1 2018, 10:56 AM
In D49915#1184440, @vsk wrote:

A caveat to this: plenty of users (and plenty of tests) expect to be able to set breakpoints on scope-closing curly braces. Assigning line 0 to those braces would at the minimum break a lot of tests, and would likely be viewed as a serious regression.

Right. The frontend should try to encode enough information that the various consumers of DILocations can decide for themselves what tables they should emit. Our line tables should probably stay as they are now, and the various code coverage and instr profiling implementations can do whatever they think is appropriate with this information. With that in mind, if we're going to grow DILocation, what we really want is probably a set of location flags, if we're going to grow DILocation.

In D49915#1184574, @rnk wrote:
In D49915#1184440, @vsk wrote:

A caveat to this: plenty of users (and plenty of tests) expect to be able to set breakpoints on scope-closing curly braces. Assigning line 0 to those braces would at the minimum break a lot of tests, and would likely be viewed as a serious regression.

Right. The frontend should try to encode enough information that the various consumers of DILocations can decide for themselves what tables they should emit. Our line tables should probably stay as they are now, and the various code coverage and instr profiling implementations can do whatever they think is appropriate with this information. With that in mind, if we're going to grow DILocation, what we really want is probably a set of location flags, if we're going to grow DILocation.

Yes, c.f the source-based coverage instrumentation which ended up needing two bits of information from the frontend per segment: http://llvm.org/doxygen/structllvm_1_1coverage_1_1CoverageSegment.html.

(Btw I'm not convinced we have to grow DILocation; I'm not aware of the use case of enabling SamplePGO along with GCOV instrumentation, so I don't see why it's not feasible to borrow 1-2 bits from the discriminator operand. + @xur & @danielcdh for any corrections here)

rnk added a comment.Aug 1 2018, 11:29 AM
In D49915#1184590, @vsk wrote:
In D49915#1184574, @rnk wrote:
In D49915#1184440, @vsk wrote:

A caveat to this: plenty of users (and plenty of tests) expect to be able to set breakpoints on scope-closing curly braces. Assigning line 0 to those braces would at the minimum break a lot of tests, and would likely be viewed as a serious regression.

Right. The frontend should try to encode enough information that the various consumers of DILocations can decide for themselves what tables they should emit. Our line tables should probably stay as they are now, and the various code coverage and instr profiling implementations can do whatever they think is appropriate with this information. With that in mind, if we're going to grow DILocation, what we really want is probably a set of location flags, if we're going to grow DILocation.

Yes, c.f the source-based coverage instrumentation which ended up needing two bits of information from the frontend per segment: http://llvm.org/doxygen/structllvm_1_1coverage_1_1CoverageSegment.html.

(Btw I'm not convinced we have to grow DILocation; I'm not aware of the use case of enabling SamplePGO along with GCOV instrumentation, so I don't see why it's not feasible to borrow 1-2 bits from the discriminator operand. + @xur & @danielcdh for any corrections here)

The discriminator seems to be stored in a separate parent scope, though. I'm not sure that's really the right place to store this bit.

DILocation is a subclass of MDNode which is itself a subclass of Metadata. And in Metadata there is "unsigned char Storage" which uses only 2 bits so 6 bits are available here.
Maybe I could add "unsigned char ImplicitCode : 1" and "unsigned char Storage: 7" in class Metadata and so don't increase the DILocation size.
What do you think ?

calixte updated this revision to Diff 163059.Aug 29 2018, 5:49 AM
  • Use an available bit in Metadata to not increase the memory use
  • Update Writer/Reader for IR and Bitcode to take into account the new flag
calixte updated this revision to Diff 164634.Sep 10 2018, 2:06 AM
  • Fix an error in MetadataLoader.cpp
  • Update tests to take into account the new added field
aprantl added inline comments.Sep 10 2018, 9:31 AM
include/llvm/IR/DebugInfoMetadata.h
1445

For the casual reader it will be entirely non-obvious what "implicit" code means. Can you add the explanation that you added at the top of the review to the doxygen comments (and perhaps to SourceLevelDebugging.rst ?

lib/IR/AsmWriter.cpp
1756 ↗(On Diff #164634)

It would be better to match the Parser and only print this if it deviates from the default value.

calixte updated this revision to Diff 164827.Sep 11 2018, 2:44 AM
  • AsmWriter.cpp: print isImplicitCode only when it's true
  • Improve documentation for the new added flag
rnk added a comment.Sep 17 2018, 7:24 PM

Needs tests, but I'm on board with this and the implementation.

docs/SourceLevelDebugging.rst
436 ↗(On Diff #164827)

"a real code" isn't very descriptive. Consider "doesn't correspond to source code written by the user".

446 ↗(On Diff #164827)

"explicitly"

include/llvm/IR/DebugInfoMetadata.h
1445

Code is a weird word, I believe it's more correct to say "... corresponds to implicit code."

include/llvm/IR/Metadata.h
69–72 ↗(On Diff #164827)

This will not achieve the layout you want with MSVC. I would recommend using uint8_t for both to ensure they are packed.

calixte updated this revision to Diff 166003.Sep 18 2018, 11:00 AM
  • Add a test: test/Bitcode/DILocation-implicit-code.ll
  • and update: test/Assembler/dilocation.ll
  • fix layout issue with MSVC
  • fix the doc
rnk accepted this revision.Sep 18 2018, 11:26 AM

Thanks, I think this looks good with a minor tweak to the test.

test/Bitcode/DILocation-implicit-code.ll
169–170 ↗(On Diff #166003)

I think it's best if you don't comment out the original DILocation metadata so that someone can easily regenerate the .bc file. Add a separate, new CHECK line. You should also ignore irrelevant !N markers if you can. Something like this is best, since llvm-dis could add some auto-upgrading that disrupts the metadata numbering:

; CHECK: !DILocation(line: 20, column: 1, scope: !{{[0-9]+}}, isImplicitCode: true)
; CHECK: !DILocation(line: 17, column: 5, scope: !{{[0-9]+}}, isImplicitCode: true)
This revision is now accepted and ready to land.Sep 18 2018, 11:26 AM
calixte updated this revision to Diff 166015.Sep 18 2018, 11:58 AM
  • Update test DILocation-implicit-code.ll
calixte updated this revision to Diff 166234.Sep 20 2018, 1:49 AM
  • Fix .bc for test
This revision was automatically updated to reflect the committed changes.
vsk added a comment.Sep 20 2018, 12:03 PM

This caused a backwards-compatibility issue which made llvm-dis crash when loading old bitcode (see https://reviews.llvm.org/rL342631#inline-2294). I've committed a fix in r342678 to get our bots back to a healthy state. I'm open to any post-commit review -- while I don't anticipate there being an issue, if there is one I'd rather iterate on a passing build.