Add a hook to track empty line regions when lexing.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
- Rebase.
- Merging adjcent skipped regions, if the regions are in the same file, could reduce binary size.
- The new empty lines regions break existing coverage tests, and don't know a good way to fix those. (probably add a null statement ; at each empty line?)
- For stage2 build of clang, I didn't notice a performance slowdown this time.
Hi @zequanwu, are you looking for review for this patch? I wasn't sure because of the WIP label.
clang/include/clang/Lex/Lexer.h | ||
---|---|---|
131 | Could you leave a comment describing what this is? | |
clang/include/clang/Lex/Preprocessor.h | ||
960 | const | |
clang/lib/CodeGen/CoverageMappingGen.cpp | ||
326 | It looks like this assumes there is some guarantee that the skipped range (as given by SR) is in the same file as {Prev,Next}TokLoc. If that's correct, can we go ahead and assert that? | |
clang/lib/Lex/Lexer.cpp | ||
3290 | Is NewLinePtr supposed to point to the start of the most recent newline sequence? For "\r\n", is it supposed to be "\r<NewLinePtr>\n" or "<NewLinePtr>\r\n"? | |
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
483–484 | Why is this deletion necessary? Do we need to introduce 0-length regions which alter the count? An example would help. | |
580 | I don't think this relaxation is correct, since it permits duplicate segments. This is confusing for reporting tools because it's not possible to work out which segment applies at a given source location. |
Address comments.
clang/lib/CodeGen/CoverageMappingGen.cpp | ||
---|---|---|
326 | Oh, it's a bug in https://reviews.llvm.org/D83592. There is no guarantee that they are in the same file. | |
clang/lib/Lex/Lexer.cpp | ||
3290 | I didn't consider this. Updated. | |
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
483–484 | Because a single empty line will be a 0 length region. I don't know why is this condition necessary before. Does zero-length region exists before this change? example: int main() { return 0; } Before, llvm-cov gives the following. Counter in file 0 1:12 -> 4:2, #0 Counter in file 0 2:1 -> 2:1, 0 Emitting segments for file: /tmp/a.c Combined regions: 1:12 -> 4:2 (count=1) 2:1 -> 2:1 (count=0) Segment at 1:12 (count = 1), RegionEntry Segment at 2:1 (count = 0), RegionEntry, Skipped Segment at 4:2 (count = 0), Skipped 1| 1|int main() { 2| | 3| | return 0; 4| |} After: Counter in file 0 1:12 -> 4:2, #0 Counter in file 0 2:1 -> 2:1, 0 Emitting segments for file: /tmp/a.c Combined regions: 1:12 -> 4:2 (count=1) 2:1 -> 2:1 (count=0) Segment at 1:12 (count = 1), RegionEntry Segment at 2:1 (count = 0), RegionEntry, Skipped Segment at 2:1 (count = 1) Segment at 4:2 (count = 0), Skipped 1| 1|int main() { 2| | 3| 1| return 0; 4| 1|} | |
580 | I don't remember why I made this change. Reverting it seems nothing changed. |
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
---|---|---|
580 | Oh, since single empty line will be a 0 length regions. L.Col could equal to R.Col. |
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
---|---|---|
483–484 | It looks like we do occasionally see 0-length regions, possibly due to bugs in the frontend (http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp.html#L485). I don't expect the reporting tools to be able to handle duplicate segments in a robust way. Having duplicate segments was the source of "messed up" coverage bugs in the past, due to the contradiction inherent in having two different segments begin at the same source location. Do you see some other way to represent empty lines? Perhaps, if we emit a skipped region for an empty line, we can emit a follow-up segment that restores the previously-active region starting on the next line? So in this case: Segment at 1:12 (count = 1), RegionEntry |
Add a wrapped segment at location of zero-length segment with last pushed region count.
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
---|---|---|
483–484 | I think we should have the following, because the wrapped segment's count will be used in next line (e.g. line 3). Segment at 1:12 (count = 1), RegionEntry Segment at 2:1 (count = 0), RegionEntry, Skipped Segment at 2:1 (count = 1) Segment at 4:2 (count = 0), Skipped |
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
---|---|---|
483–484 | I think one issue with that output is that it contains two segments that start at the same location (2:1). Historically, this sort of duplication was a source of bugs/inconsistencies (e.g two entry regions beginning at the same location with different counts), and I’m concerned that re-allowing multiple segments with the same start location could lead to regressions down the road. OTOH, your change is consistent with how non-zero length segments are handled, and it could be fragile to look for an alternative start location (like 3:1) that restores the count from before the skipped region. I’d be curious to know your thoughts on how to prevent regressions related to segments which share the same start location. Maybe they could only be allowed in this limited case? |
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
---|---|---|
483–484 | Yes, they are two segments with the same location, but only the first is a RegionEntry (might not cause the same bug you saw before). So the second one is only used as wrapped segment, and will not conflict with first segment RegionEntry. |
address comment.
emit second segment when first segment is skipped and activeregions is not empty.
I don't know who can be added as reviewers for the preprocessor part
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp | ||
---|---|---|
483–484 | Since this broke a unit test, which has 5 zero-length regions points to same location, I think it's reasonable only emitting the second segment when active regions is not empty. | |
580 | if (L.Line == R.Line && L.Col == R.Col && !L.HasCount) is more specific. |
I took a closer look at the lexer changes, and think that these look fine. Thanks again for working on this. LGTM with a minor change, described inline.
clang/lib/Lex/Lexer.cpp | ||
---|---|---|
2228 | It'd be more consistent to write this the way lastNewLine is initialized earlier in SkipWhitespace, e.g. as: if (*CurPtr == '\n') { lastNewLine = CurPtr; if (!NewLinePtr) NewLinePtr = CurPtr; } Maybe this could even be factored into a helper lambda, like setLastNewLine(char *Ptr) -- I'm not sure whether that'd be cleaner. |
Thanks for reviewing. One last thing I need to resolve is to handle the coverage test case as skipped regions are emitted for empty lines, and haven't thought a good way other than adding checks for those new skipped regions.
How does it sound to add a testing-only cl::opt that disables whitespace skipped regions? Tests that aren't focused on validating skipped regions can set this cl::opt.
- Address comment.
- Add cl::opt to disable emitting skipped regions for empty lines and comments (used on test case only), and update test cases under CoverageMapping with -mllvm -emptyline-comment-coverage=false.
- Remove the logic which only emits skipped regions for comments and empty lines parsing function body, because can't tell if it's inside class member function.
- Always set SR.ColumnStart to 1 in adjustSkippedRange.
clang/lib/CodeGen/CoverageMappingGen.cpp | ||
---|---|---|
38 | We might consider marking this as cl::Hidden. |
@zequanwu FTR, I don't have any outstanding concerns (I understand you might be asking for a second reviewer to chime in though).
Could you leave a comment describing what this is?