Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916).
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 22415 Build 22415: arc lint + arc unit
Event Timeline
Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.
Could you share an example of this resulting in an incorrect coverage report?
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
This requires adding an operand to DILocation, which is a fairly expensive thing to do. Have you considered any alternatives?
A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916).
There are some examples in the tests that are fixed here: https://reviews.llvm.org/D49917.
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
This requires adding an operand to DILocation, which is a fairly expensive thing to do. Have you considered any alternatives?
Both me and Calixte had reservations about adding more data to DILocation too, but we couldn't find an alternative. It's possible that there might be some alternative we can't figure out due to our lack of experience with LLVM.
One thing I tried is not assigning lines to instructions that are TerminatorInst and !ReturnInst, but I think this causes us to skip some lines that we don't want to skip (e.g. the break clause in a switch).
I see, thanks.
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
This requires adding an operand to DILocation, which is a fairly expensive thing to do. Have you considered any alternatives?
Both me and Calixte had reservations about adding more data to DILocation too, but we couldn't find an alternative. It's possible that there might be some alternative we can't figure out due to our lack of experience with LLVM.
One thing I tried is not assigning lines to instructions that are TerminatorInst and !ReturnInst, but I think this causes us to skip some lines that we don't want to skip (e.g. the break clause in a switch).
Perhaps you could borrow the high bit from the discriminator operand?
This looks problematic - seems suspicious that the covered value is a bool member of DILocation, but nothing else is - everything else is stored in the metadata proper. Is there a good reason for that? (it provably doesn't need to be serialized? Could this be stored as side-data in a map in an analysis instead?)
I couldn't quite spot it - where's the code that sets or constructs a DILocation with covered = true?
https://reviews.llvm.org/D49916
BTW, I'd probably rename covered to coverable to make it more clear.
I agree, this patch needs serialization through bitcode and IR before it can land.
DILocation isn't only used for coverage, I think this should probably be a more general purpose flag, like "isImplicitCode", to indicate that the code was generated as part of a destructor cleanup, an epilogue, or something else that requires the compiler to generate IR that isn't obviously attributed to user source code. Then GCOV can decide for itself how it wants to generate coverage.
DILocation isn't only used for coverage, I think this should probably be a more general purpose flag, like "isImplicitCode", to indicate that the code was generated as part of a destructor cleanup, an epilogue, or something else that requires the compiler to generate IR that isn't obviously attributed to user source code. Then GCOV can decide for itself how it wants to generate coverage.
We have a marker for compiler-generated code that has no relation to any source code; it's the special line number 0. This sounds like this patch is trying to work around a frontend bug, and really the frontend shouldn't stick misleading line numbers on the code? There's an RAII object in clang to generate locations for compiler/auto-generated code.
A caveat to this: plenty of users (and plenty of tests) expect to be able to set breakpoints on scope-closing curly braces. Assigning line 0 to those braces would at the minimum break a lot of tests, and would likely be viewed as a serious regression.
Right. The frontend should try to encode enough information that the various consumers of DILocations can decide for themselves what tables they should emit. Our line tables should probably stay as they are now, and the various code coverage and instr profiling implementations can do whatever they think is appropriate with this information. With that in mind, if we're going to grow DILocation, what we really want is probably a set of location flags, if we're going to grow DILocation.
Yes, c.f the source-based coverage instrumentation which ended up needing two bits of information from the frontend per segment: http://llvm.org/doxygen/structllvm_1_1coverage_1_1CoverageSegment.html.
(Btw I'm not convinced we have to grow DILocation; I'm not aware of the use case of enabling SamplePGO along with GCOV instrumentation, so I don't see why it's not feasible to borrow 1-2 bits from the discriminator operand. + @xur & @danielcdh for any corrections here)
The discriminator seems to be stored in a separate parent scope, though. I'm not sure that's really the right place to store this bit.
DILocation is a subclass of MDNode which is itself a subclass of Metadata. And in Metadata there is "unsigned char Storage" which uses only 2 bits so 6 bits are available here.
Maybe I could add "unsigned char ImplicitCode : 1" and "unsigned char Storage: 7" in class Metadata and so don't increase the DILocation size.
What do you think ?
- Use an available bit in Metadata to not increase the memory use
- Update Writer/Reader for IR and Bitcode to take into account the new flag
- Fix an error in MetadataLoader.cpp
- Update tests to take into account the new added field
include/llvm/IR/DebugInfoMetadata.h | ||
---|---|---|
1469 | For the casual reader it will be entirely non-obvious what "implicit" code means. Can you add the explanation that you added at the top of the review to the doxygen comments (and perhaps to SourceLevelDebugging.rst ? | |
lib/IR/AsmWriter.cpp | ||
1756 | It would be better to match the Parser and only print this if it deviates from the default value. |
- AsmWriter.cpp: print isImplicitCode only when it's true
- Improve documentation for the new added flag
Needs tests, but I'm on board with this and the implementation.
docs/SourceLevelDebugging.rst | ||
---|---|---|
436 ↗ | (On Diff #164827) | "a real code" isn't very descriptive. Consider "doesn't correspond to source code written by the user". |
446 ↗ | (On Diff #164827) | "explicitly" |
include/llvm/IR/DebugInfoMetadata.h | ||
1469 | Code is a weird word, I believe it's more correct to say "... corresponds to implicit code." | |
include/llvm/IR/Metadata.h | ||
69–74 | This will not achieve the layout you want with MSVC. I would recommend using uint8_t for both to ensure they are packed. |
- Add a test: test/Bitcode/DILocation-implicit-code.ll
- and update: test/Assembler/dilocation.ll
- fix layout issue with MSVC
- fix the doc
Thanks, I think this looks good with a minor tweak to the test.
test/Bitcode/DILocation-implicit-code.ll | ||
---|---|---|
169–170 ↗ | (On Diff #166003) | I think it's best if you don't comment out the original DILocation metadata so that someone can easily regenerate the .bc file. Add a separate, new CHECK line. You should also ignore irrelevant !N markers if you can. Something like this is best, since llvm-dis could add some auto-upgrading that disrupts the metadata numbering: ; CHECK: !DILocation(line: 20, column: 1, scope: !{{[0-9]+}}, isImplicitCode: true) ; CHECK: !DILocation(line: 17, column: 5, scope: !{{[0-9]+}}, isImplicitCode: true) |
This caused a backwards-compatibility issue which made llvm-dis crash when loading old bitcode (see https://reviews.llvm.org/rL342631#inline-2294). I've committed a fix in r342678 to get our bots back to a healthy state. I'm open to any post-commit review -- while I don't anticipate there being an issue, if there is one I'd rather iterate on a passing build.
For the casual reader it will be entirely non-obvious what "implicit" code means. Can you add the explanation that you added at the top of the review to the doxygen comments (and perhaps to SourceLevelDebugging.rst ?