Bug filled here: https://bugs.llvm.org/show_bug.cgi?id=45757.
Add comment to skipped regions so we don't track execution count for lines containing only comments.
Details
- Reviewers
hans vsk - Commits
- rGb46176bbb094: Reland [Coverage] Add comment to skipped regions
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Not sure if this is good idea to untrack comments, it breaks many tests under CoverageMapping.
I didn't look, but I'm surprised there are many coverage tests that have comments in them. If the comments are not important for those tests, maybe they could be removed? As long as we keep one that tracks the intended behavior of coverage on comments.
clang/lib/Parse/Parser.cpp | ||
---|---|---|
37 ↗ | (On Diff #277157) | I don't think this is the right way to do it. It seems this callback is intended for #if macros that exclude part of the file from preprocessing. |
Classfiying comments as SkippedRegion, https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h#L217
This will result overlapping SkippedRegion for comments and macro. Maybe, we shuold merge the overlapped SkippedRegion at the end.
Example:
File 0, 1:12 -> 6:2 = #0 Skipped,File 0, 2:3 -> 4:9 = 0 Skipped,File 0, 2:15 -> 2:25 = 0 1| 1|int main() { 2| | #ifdef TEST // comment 3| | int x = 1; 4| | #endif 5| 1| return 0; 6| 1|}
clang/lib/Parse/Parser.cpp | ||
---|---|---|
37 ↗ | (On Diff #277157) | I believe we want to classify comments as SkippedRegion, https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h#L217. |
The comments in those coverage tests are used for FileCheck, like //CHECK:. So, we can't remove those ones.
@zequanwu thanks for working on this. Instead of threading CommentSkipped through PPCallbacks, wdyt of taking advantage of the existing CommentHandler interface? To simplify things, we can add a static method like 'CoverageMappingGen::setupPreprocessorCallbacks(Preprocessor &)' and call that from CodeGenAction::CreateASTConsumer.
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp | ||
---|---|---|
152 ↗ | (On Diff #277538) | This seems like a lot of complexity to add to handle a narrow case. Is it necessary to merge skipped regions early in the process? Note that llvm's SegmentBuilder takes care of merging regions at the very end. |
183 ↗ | (On Diff #277538) | This is stack use after free, right? This needs to release ownership of the 'MergedRegions' container. |
Sorry, I don't quite understand this. I am already using ActionCommentHandler to handle this one inside HandleComment.
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp | ||
---|---|---|
152 ↗ | (On Diff #277538) | Merging at here is to avoid duplicate output for option -dump-coverage-mapping. Like the following example to avoid Skipped,File 0, 2:3 -> 4:9 = 0 and Skipped,File 0, 2:15 -> 2:25 = 0 to be outputted at the same time. They should be merged into 1 region Skipped,File 0, 2:3 -> 4:9 = 0 File 0, 1:12 -> 6:2 = #0 Skipped,File 0, 2:3 -> 4:9 = 0 Skipped,File 0, 2:15 -> 2:25 = 0 1| 1|int main() { 2| | #ifdef TEST // comment 3| | int x = 1; 4| | #endif 5| 1| return 0; 6| 1|} So, we need to do it early. |
ActionCommentHandler seems a bit specific to Sema. This is what I had in mind: https://github.com/vedantk/llvm-project/commit/5101dd5dcd92742805069dba6b3d6d6846ef3755.
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp | ||
---|---|---|
152 ↗ | (On Diff #277538) | I see that there are duplicate 'Skipped' regions emitted in this case, but I'm not sure what problems this causes. For testing, we could check that both skipped regions are emitted, or perhaps only check that the outermost skipped region is emitted and ignore others. Is there some other kind of hard error (like an assert) caused by having nested skipped regions? If not, perhaps it's not worth it to merge them. |
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp | ||
---|---|---|
152 ↗ | (On Diff #277538) | No, it doesn't cause any error. |
Before updating any tests, maybe it's worth doing a quick experiment with comments placed in different contexts, to see whether adding these skipped regions is really sufficient. For example, given:
1| for (auto x : collection) { 2| // Explain the loop. 3| }
The loop region covers lines 1-3. If we skip the comment range on line 2, does an execution count from the loop region still get picked up? I'd expect it to. It's possible that we need more information from the preprocessor about whether the line is fully comment/whitespace-only.
Here is what I got:
$ clang -fcoverage-mapping -fprofile-instr-generate /tmp/a.c -Xclang -dump-coverage-mapping && ./a.out && llvm-profdata merge -sparse default.profraw -o a.profdata && llvm-cov show ./a.out -instr-profile=a.profdata main: File 0, 1:12 -> 6:2 = #0 File 0, 2:19 -> 2:25 = (#0 + #1) File 0, 2:27 -> 2:30 = #1 Gap,File 0, 2:31 -> 2:32 = #1 File 0, 2:32 -> 4:4 = #1 Skipped,File 0, 3:5 -> 3:15 = 0 1| 1|int main() { 2| 11| for (int i = 0; i < 10; i++) { 3| | // comment 4| 10| } 5| 1| return 0; 6| 1|}
For those failed test cases, they are caused by extra Skipped, File ... lines which invalidate some CHECK-NEXT.
I could either change CHECK-NEXT to CHECK or moving those checks inside function to above the function, since comments outside functions will not be tracked.
The comments in those coverage tests are used for FileCheck, like //CHECK:. So, we can't remove those ones.
Oh, I didn't think about that :-)
It's a bit unusual and annoying that the test expectations end up affecting the test output. For the comments that are not really part of the test, maybe we could do something to exclude them, like running 'sed' before compiling, or wrapping them in #if 0 blocks or something.
But figuring that out seems less important than figuring out how to implement the feature, and I think the current approach looks promising.
Could you add an end-to-end llvm-cov test (see e.g. compiler-rt/test/profile/Linux/coverage_ctors.cpp)? Here are some important cases I think we should check:
- /* comment at the start of a line */ expr;
- expr; /* comment at the end of a line */
- expr; // comment at the end of a line
- // comment spanning entire line
In all but the last case, llvm-cov should report an execution count for the line. Testing multi-line variations of those cases would also be helpful.
clang/test/CoverageMapping/break.c | ||
---|---|---|
1–2 | Could you introduce a lit substitution in clang/test/lit.cfg.py for this (say, '%strip_comments'?), so that there's just one copy of the sed command? |
llvm-cov doesn't report the execution count for the first 3 liens... I will try to fix that.
The problem is that simply marking comment regions as SkippedRegion doesn't give enough information for llvm-cov about if there is code area before or after comments in the same line.
$ clang++ -fcoverage-mapping -fprofile-instr-generate /tmp/a.cpp -Xclang -dump-coverage-mapping && ./a.out && llvm-profdata merge -sparse default.profraw -o a.profdata && llvm-cov show ./a.out -instr-profile=a.profdata -debug-only="coverage-mapping" main: File 0, 1:12 -> 7:2 = #0 Skipped,File 0, 2:3 -> 2:39 = 0 Skipped,File 0, 3:15 -> 3:49 = 0 Skipped,File 0, 4:15 -> 4:46 = 0 Skipped,File 0, 5:3 -> 5:34 = 0 Counter in file 0 1:12 -> 7:2, #0 Counter in file 0 2:3 -> 2:39, 0 Counter in file 0 3:15 -> 3:49, 0 Counter in file 0 4:15 -> 4:46, 0 Counter in file 0 5:3 -> 5:34, 0 Emitting segments for file: /tmp/a.cpp Combined regions: 1:12 -> 7:2 (count=1) 2:3 -> 2:39 (count=0) 3:15 -> 3:49 (count=0) 4:15 -> 4:46 (count=0) 5:3 -> 5:34 (count=0) Segment at 1:12 (count = 1), RegionEntry Segment at 2:3 (count = 0), RegionEntry, Skipped Segment at 2:39 (count = 1) Segment at 3:15 (count = 0), RegionEntry, Skipped Segment at 3:49 (count = 1) Segment at 4:15 (count = 0), RegionEntry, Skipped Segment at 4:46 (count = 1) Segment at 5:3 (count = 0), RegionEntry, Skipped Segment at 5:34 (count = 1) Segment at 7:2 (count = 0), Skipped 1| 1|int main() { 2| | /* comment at the start of a line */ int x = 0; 3| | int y = 0; /* comment at the end of a line */ 4| | int z = 0; // comment at the end of a line 5| | // comment spanning entire line 6| 1| return 0; 7| 1|} 8| |
One way I could think is probably when we visit decl in CounterCoverageMappingBuilder, check for if there is SkippedRegion in the same line and then mark the decl range as CodeRegion. For the example, we will have 3 more CodeRegion which covers the ranges of int x = 0, int y = 0 and int z = 0.
! In D83592#2156758, @zequanwu wrote:
One way I could think is probably when we visit decl in CounterCoverageMappingBuilder, check for if there is SkippedRegion in the same line and then mark the decl range as CodeRegion. For the example, we will have 3 more CodeRegion which covers the ranges of int x = 0, int y = 0 and int z = 0.
I don't think that will work well, because a decl can span multiple lines (which in turn can include comments).
The key issue seems to be that -- even having SourceRanges for all comments -- we don't know whether a line has non-whitespace, non-comment tokens. If it does, the line should have an execution count (and we don't even need to bother with emitting a skipped region covering the line). It sounds like we need a different hook in the preprocessor that reports SourceRanges for lines with no non-comment, non-whitespace tokens. Practically, I think this can be wired up just like a CommentHandler (maybe it could be called WhitespaceHandler?). One side-benefit of introducing a new hook like this is that coverage will handle whitespace-only lines the same way as comment-only lines.
There's probably more than one way to approach this. One thing to be aware of: llvm's LineCoverageStats class is the one responsible for determining whether a line is "mapped" (i.e. whether it has an execution count). It assumes that when a line begins with a skipped region, the subsequent portion of the line must also be skipped. If we only emit skipped regions for fully whitespace-like lines, that's fine. If not, depending on how C-style comments are handled, that may need updating.
Rebase and address comments.
- Keep track of token locations before and after comment regions by onToken in Preprocessor so when emitting SkippedRegion we can determine if the region should be emitted or not.
- Add an end-to-end llvm-cov test case.
@zequanwu at a high level, I think this is the right approach and it looks nice overall. I do have some feedback (inline). Thanks!
clang/lib/CodeGen/CoverageMappingGen.cpp | ||
---|---|---|
305 | Could you restructure this to avoid mutating a SpellingRegion? This can aid debugging (the unmodified struct remains available for inspection). One way to do this might be to call the function "adjustSkippedRange" and have it return an Optional<SpellingRegion>. | |
306 | What exactly is 'Range' in updateSkippedRange? It seems like it's the {prev,next}TokLoc pair from SkippedRegion, but it's not obvious based on a cursory read. It would help to rename 'Range' to something more specific ('InnermostNonCommentRange'? 'OverlappingExecutedTokenRange'?). | |
clang/lib/CodeGen/CoverageMappingGen.h | ||
50 | It'd be more self-descriptive to replace the pair with some struct SkippedRange { SourceRange SkippedRange; SourceLocation PrevTokLoc, NextTokLoc; }. | |
51 | Optional<unsigned> LastCommentIndex would be slightly more idiomatic. | |
54–60 | Please leave short inline comments explaining the respective roles of PrevTokLoc and BeforeCommentLoc. | |
clang/test/lit.cfg.py | ||
96 | Bit of a nitpick, but I don't think it'll necessarily end up being more convenient to have "%T/%basename_t" hardcoded in this substitution (strawman, but: you can imagine a test that needs to strip comments from two different files). It'd be nice to be more explicit and write the run lines like: RUN: %strip_comments %s > %t.stripped.c | |
compiler-rt/test/profile/Linux/coverage_comments.cpp | ||
1 ↗ | (On Diff #279331) | I don't think there's anything Linux or gold-specific about this test. Could you move the test up one directory, so we can run it on Darwin/Windows/etc.? |
The testing looks really good. Just a few more requests for documentation (inline).
clang/lib/CodeGen/CoverageMappingGen.cpp | ||
---|---|---|
304 | This could use some description of what adjustment is done, e.g. "This shrinks the skipped range if it spans a line that contains a non-comment token. If shrinking the skipped range would make it empty, this returns None." | |
clang/lib/CodeGen/CoverageMappingGen.h | ||
36 | It'd be helpful to leave inline documentation for these fields as well. It's clear from context that 'Range' is the skipped source range, but it's not as clear what {Prev,Next}TokLoc are. | |
49 | Please end sentences with a '.' | |
54–60 | How does "Location of the token parsed before HandleComment is called. This is updated every time Preprocessor::Lex lexes a new token." sound? |
New tests should use [[#@LINE]] instead of [[@LINE]]
https://llvm.org/docs/CommandGuide/FileCheck.html
To support legacy uses of @LINE as a special string variable, FileCheck also accepts the following uses of @LINE with string substitution block syntax: [[@LINE]], [[@LINE+<offset>]] and [[@LINE-<offset>]] without any spaces inside the brackets and where offset is an integer.
compiler-rt/test/profile/coverage_comments.cpp | ||
---|---|---|
10 | I think you can just use CHECK-NEXT: for each line and remove duplicated code on the right side. See some recent gcov-* tests I added. |
Landed here, https://reviews.llvm.org/rGabd45154bdb6b76c5b480455eacc8c75b08242aa
I put the wrong diff ID..
compiler-rt/test/profile/coverage_comments.cpp | ||
---|---|---|
10 | Thanks, I removed the duplicated code when fixed a coverage test failure. |
Looks like this breaks tests on Windows: http://45.33.8.238/win/20315/step_7.txt
Please take a look, and if it takes a while to investigate please revert while you look.
This caused asserts in Chromium's coverage builds: https://bugs.chromium.org/p/chromium/issues/detail?id=1108352
I've reverted it in the meantime (238bbd48c5a5f84deca36e5df980241578f7c1df).
It also looks like the commit landed with the wrong Phabricator review number.
(This was committed within 10 minutes after it was approved. As a large change with updates to some sensitive places like clang/Lex/Preprocessor.h, this probably should have been waited a bit longer.)
@zequanwu I'm not sure whether this is something you've already tried, but for frontend changes, it can be helpful to verify that a stage2 clang self-host with assertions enabled completes successfully. For coverage specifically, it's possible to do that by setting '-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On' during the stage2 cmake step.
I didn't know that before. I will try.
When including some standard libraries, the assertion assert(SM.isWrittenInSameFile(LocStart, LocEnd) && "region spans multiple files"); fails, like <stdlib.h> or <cmath> etc, still investigating.
I found the problem is that I was trying to merge two skipped regions when they are adjacent.
So, the following code will fail at assertion, because the skipped regions are in different files, but are merged.
$ cat a.h // comment $ cat a.c #include "a.h" // comment int main() {}
Fix the bug about merging skipped regions. Simple don't merge, because adjustSkippedRange will handle those cases if NextTokLoc is invalid, which will just skip the second if condition.
Just tried the cmake option -DLLVM_BUILD_INSTRUMENTED_COVERAGE=On, the result still shows count for comments... but it pass the coverage_comments.cpp test case, need more investigation.
In Preprocessor.cpp, don't increment TokenCount if LexLevel == 0.
I tested this with chromium crypto_unittests. It doesn't track comments regions.
I don't why when using the cmake option -DLLVM_BUILD_INSTRUMENTED_COVERAGE=On to test coverage for clang itself, it does tracking comments.
Also, with that option on, llvm-cov crashes at assertion:llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp:578: virtual Expected<const char *> (anonymous namespace)::VersionedCovMapFuncRecordReader<llvm::coverage::Version4, unsigned long, llvm::support::little>::readCoverageHeader(const char *, const char *, BinaryCoverageReader::DecompressedData &) [Version = llvm::coverage::Version4, IntPtrT = unsigned long, Endian = llvm::support::little]: Assertion (CovMapVersion)CovHeader->getVersion<Endian>() == Version' failed.`
clang/lib/Lex/Preprocessor.cpp | ||
---|---|---|
973 | Can you explain why things are changing here? The idea of TokenCount is to simply count the preprocessor tokens. At this point I think you have more knowledge here than I did when I added it :) |
clang/lib/Lex/Preprocessor.cpp | ||
---|---|---|
973 | In CodeGen::CoverageMappingModuleGen::setUpCoverageCallbacks, I set OnToken to a lambda to update PrevTokLoc after lexing a new token. But, the original condition if (LexLevel == 0 && !Result.getFlag(Token::IsReinjected)) will not call OnToken if it's lexing a directive, like #if, #define etc, because the LexLevel is greater than 0. In order to update PrevTokLoc even when lexing directive, I add PreprocessToken which will be set to true in CodeGen::CoverageMappingModuleGen::setUpCoverageCallbacks just to invoke OnToken without increment TokenCount when lexing directives. |
Change newSR.ColumnEnd = 1 to newSR.ColumnEnd = SR.ColumnStart + 1 to ensure ColumnEnd >= ColumnStart.
clang/lib/Lex/Preprocessor.cpp | ||
---|---|---|
973 | Ah, I see. That sounds okay to me. |
Just to double-check: are you using llvm-{cov,profdata} binaries from the stage1 llvm build to produce reports for stage2 (coverage-instrumented) binaries?
I didn't config correctly before. I just tried. It works correctly now, not tracking comments.
@zequanwu the change that dropped merging of adjacent comments looks good. If no further issues cropped up during stage2 testing, please feel free to re-land. Thanks!
It'd be helpful to leave inline documentation for these fields as well. It's clear from context that 'Range' is the skipped source range, but it's not as clear what {Prev,Next}TokLoc are.