This is an archive of the discontinued LLVM Phabricator instance.

[Coverage] Handle invalid end location of an expression/statement.
ClosedPublic

Authored by zequanwu on Mar 28 2023, 11:44 AM.

Details

Summary

Fix a crash when an expression/statement can have valid start location but invalid end location in some situations. For example: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536

This confuses CounterCoverageMappingBuilder when popping a region from region
stack as if the end location is a macro or include location.

Diff Detail

Event Timeline

zequanwu created this revision.Mar 28 2023, 11:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 11:44 AM
zequanwu requested review of this revision.Mar 28 2023, 11:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 11:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
zequanwu updated this revision to Diff 509098.Mar 28 2023, 12:28 PM
  • Handle invalid start location.
  • Add a test case.

I'm trying to add the test case from: https://github.com/llvm/llvm-project/issues/45481#issuecomment-981028897, but lit test fails in <compare> not found.

I'm trying to add the test case from: https://github.com/llvm/llvm-project/issues/45481#issuecomment-981028897, but lit test fails in <compare> not found.

You can just define your own strong_ordering like this test clang/test/SemaCXX/warn-zero-nullptr-cxx20.cpp

hans added a comment.Apr 3 2023, 1:41 AM

I'm not familiar with this code. I suppose the question is whether it's reasonable for this code to expect that the source locations are always valid or not?

clang/lib/CodeGen/CoverageMappingGen.cpp
605

nit: the "the" is not needed

clang/test/CoverageMapping/invalid_location.cpp
31

s/NETX/NEXT/ here and below?

zequanwu marked an inline comment as done.Apr 3 2023, 8:43 AM

I'm not familiar with this code. I suppose the question is whether it's reasonable for this code to expect that the source locations are always valid or not?

Yes.

For 0 ? T<C>{} : T<C>{};, the both branches have valid start location but invalid end location. See comments at https://github.com/llvm/llvm-project/issues/45481#issuecomment-1487267814.

For the std::strong_ordering, I found that DeclRefExpr in the ConditionalOperator's true branch has invalid start and end locations. It might because it's inside a PseudoObjectExpr. Maybe we should simply just skip visiting PseudoObjectExpr, I see that its begin and end location are the same. I'm not familiar with that expression, so, I just handled it by adding checks for validating begin and end locations.

PseudoObjectExpr 0x5555629f3bf0 'const strong_ordering':'const class std::strong_ordering' lvalue
|-BinaryOperator 0x5555629f3bd0 'const strong_ordering':'const class std::strong_ordering' lvalue '<=>'
| |-MemberExpr 0x5555629f3848 'const S':'const struct S' lvalue .value 0x5555629f34d0
| | `-DeclRefExpr 0x5555629f3808 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f31b0 'lhs' 'const MyStruct &'
| `-MemberExpr 0x5555629f3878 'const S':'const struct S' lvalue .value 0x5555629f34d0
|   `-DeclRefExpr 0x5555629f3828 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f3238 'rhs' 'const MyStruct &'
|-OpaqueValueExpr 0x5555629f38a8 'const S':'const struct S' lvalue
| `-MemberExpr 0x5555629f3848 'const S':'const struct S' lvalue .value 0x5555629f34d0
|   `-DeclRefExpr 0x5555629f3808 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f31b0 'lhs' 'const MyStruct &'
|-OpaqueValueExpr 0x5555629f38c0 'const S':'const struct S' lvalue
| `-MemberExpr 0x5555629f3878 'const S':'const struct S' lvalue .value 0x5555629f34d0
|   `-DeclRefExpr 0x5555629f3828 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f3238 'rhs' 'const MyStruct &'
`-ConditionalOperator 0x5555629f3ba0 'const strong_ordering':'const class std::strong_ordering' lvalue
  |-CXXOperatorCallExpr 0x5555629f3970 '_Bool' '==' adl
  | |-ImplicitCastExpr 0x5555629f3958 '_Bool (*)(const S &, const S &)' <FunctionToPointerDecay>
  | | `-DeclRefExpr 0x5555629f38d8 '_Bool (const S &, const S &)' lvalue Function 0x5555629f2a40 'operator==' '_Bool (const S &, const S &)'
  | |-OpaqueValueExpr 0x5555629f38a8 'const S':'const struct S' lvalue
  | | `-MemberExpr 0x5555629f3848 'const S':'const struct S' lvalue .value 0x5555629f34d0
  | |   `-DeclRefExpr 0x5555629f3808 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f31b0 'lhs' 'const MyStruct &'
  | `-OpaqueValueExpr 0x5555629f38c0 'const S':'const struct S' lvalue
  |   `-MemberExpr 0x5555629f3878 'const S':'const struct S' lvalue .value 0x5555629f34d0
  |     `-DeclRefExpr 0x5555629f3828 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f3238 'rhs' 'const MyStruct &'
  |-DeclRefExpr 0x5555629f3b80 'const strong_ordering':'const class std::strong_ordering' lvalue Var 0x5555629a4ed8 'equal' 'const strong_ordering':'const class std::strong_ordering'
  `-ConditionalOperator 0x5555629f3b50 'const strong_ordering':'const class std::strong_ordering' lvalue
    |-CXXOperatorCallExpr 0x5555629f3ab0 '_Bool' '<' adl
    | |-ImplicitCastExpr 0x5555629f3a98 '_Bool (*)(const S &, const S &)' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x5555629f3a78 '_Bool (const S &, const S &)' lvalue Function 0x5555629f27a0 'operator<' '_Bool (const S &, const S &)'
    | |-OpaqueValueExpr 0x5555629f38a8 'const S':'const struct S' lvalue
    | | `-MemberExpr 0x5555629f3848 'const S':'const struct S' lvalue .value 0x5555629f34d0
    | |   `-DeclRefExpr 0x5555629f3808 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f31b0 'lhs' 'const MyStruct &'
    | `-OpaqueValueExpr 0x5555629f38c0 'const S':'const struct S' lvalue
    |   `-MemberExpr 0x5555629f3878 'const S':'const struct S' lvalue .value 0x5555629f34d0
    |     `-DeclRefExpr 0x5555629f3828 'const MyStruct':'const struct MyStruct' lvalue ParmVar 0x5555629f3238 'rhs' 'const MyStruct &'
    |-DeclRefExpr 0x5555629f3b30 'const strong_ordering':'const class std::strong_ordering' lvalue Var 0x5555629a4c50 'less' 'const strong_ordering':'const class std::strong_ordering'
    `-DeclRefExpr 0x5555629f3ae8 'const strong_ordering':'const class std::strong_ordering' lvalue Var 0x5555629a5388 'greater' 'const strong_ordering':'const class std::strong_ordering'
clang/test/CoverageMapping/invalid_location.cpp
31

These 2 "CHECK-NETX" are misspelled on purpose. Once the FIXME is fixed, we should correct the spelling here and below.

Updated the FIXME.

zequanwu updated this revision to Diff 510514.Apr 3 2023, 8:43 AM

Address comments.

hans added a comment.Apr 4 2023, 1:00 AM

I'm not familiar with this code. I suppose the question is whether it's reasonable for this code to expect that the source locations are always valid or not?

Yes.

For 0 ? T<C>{} : T<C>{};, the both branches have valid start location but invalid end location. See comments at https://github.com/llvm/llvm-project/issues/45481#issuecomment-1487267814.

For the std::strong_ordering, I found that DeclRefExpr in the ConditionalOperator's true branch has invalid start and end locations. It might because it's inside a PseudoObjectExpr. Maybe we should simply just skip visiting PseudoObjectExpr, I see that its begin and end location are the same. I'm not familiar with that expression, so, I just handled it by adding checks for validating begin and end locations.

Right, I'm just wondering if it's reasonable that this code should have to handle such situations. (It seems it can't handle it perfectly anyway, since the coverage won't be completely correct.) Maybe Aaron can comment on what the expectations are for these locations.

I'm not familiar with this code. I suppose the question is whether it's reasonable for this code to expect that the source locations are always valid or not?

Yes.

For 0 ? T<C>{} : T<C>{};, the both branches have valid start location but invalid end location. See comments at https://github.com/llvm/llvm-project/issues/45481#issuecomment-1487267814.

For the std::strong_ordering, I found that DeclRefExpr in the ConditionalOperator's true branch has invalid start and end locations. It might because it's inside a PseudoObjectExpr. Maybe we should simply just skip visiting PseudoObjectExpr, I see that its begin and end location are the same. I'm not familiar with that expression, so, I just handled it by adding checks for validating begin and end locations.

Right, I'm just wondering if it's reasonable that this code should have to handle such situations. (It seems it can't handle it perfectly anyway, since the coverage won't be completely correct.) Maybe Aaron can comment on what the expectations are for these locations.

PseudoObjectExpr is a strange little beast in that it's an abstract object in the AST. It gets used when there's is a particular syntax that is modelled directly by the language as a series of expressions (e.g., writing foo->x = 12; in the source, but due to some language feature like __declspec(property), the semantics are foo->setX(12); as a member function call instead). So it has multiple source locations that are of interest and you have to know what information you're after -- the syntactic location or one of the semantic locations. I'm not super familiar with the code coverage machinery, but it looks like it's walking over the semantic expressions rather than the syntactic one, and that seems a bit fishy to me because I'd assume that coverage is intended to describe what the user actually wrote in their source. So I don't think I'd skip the PseudoObjectExpr, but you might need to handle it differently in CounterCoverageMappingBuilder by adding a VisitPseudoObjectExpr() method and not walking over *all* of the children, but only the syntactic form of the expression.

zequanwu updated this revision to Diff 510938.Apr 4 2023, 3:05 PM

Split to another patch: D147569.

zequanwu edited the summary of this revision. (Show Details)Apr 4 2023, 3:05 PM
hans accepted this revision.Apr 11 2023, 9:01 AM

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

This revision is now accepted and ready to land.Apr 11 2023, 9:01 AM

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

I'm confused -- I thought D147569 resolved the issue and so this patch is no longer needed?

zequanwu edited the summary of this revision. (Show Details)EditedApr 11 2023, 9:11 AM

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

I'm confused -- I thought D147569 resolved the issue and so this patch is no longer needed?

D147569 fixes https://github.com/llvm/llvm-project/issues/45481, which is caused by not visiting PseudoObjectExpr. That makes the visitor visits the semantic expression of PseudoObjectExpr which doesn't have source location.

This one fixes another issue crbug.com/1427933, which is caused by some expressions don't have source location due to some work around [1]. Their stack traces look similar but not caused by the same issue.

Updated the link in summary to: [1]

[1]: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

I'm confused -- I thought D147569 resolved the issue and so this patch is no longer needed?

D147569 fixes https://github.com/llvm/llvm-project/issues/45481. This one fixes another issue crbug.com/1427933. Their stack trace look similar but not caused by the same issue.

Updated the link in summary to: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536

Thank you for clarifying, I was confused. :-)

I don't think the changes here are correct either -- it's glossing over an issue that we're not properly tracking the source location in the AST. Playing around with the reduced example is interesting though. If you remove the default argument in the T constructor, the issue goes away. If you stop using a forward declaration in the instantiation of T, the issue goes away. If S1 isn't a template, the issue goes away (but the issue will come back if you then make foo() a template instead of S1). So it seems that something about template instantiation is dropping the source location information (perhaps) and we should be trying to track down where that is to fix the root cause rather than work around it here for coverage mapping alone. (The end location being incorrect can impact other things that are harder to test because it'll be for things like fix-its that don't work properly, which are easy to miss.)

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

I'm confused -- I thought D147569 resolved the issue and so this patch is no longer needed?

D147569 fixes https://github.com/llvm/llvm-project/issues/45481. This one fixes another issue crbug.com/1427933. Their stack trace look similar but not caused by the same issue.

Updated the link in summary to: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536

Thank you for clarifying, I was confused. :-)

I don't think the changes here are correct either -- it's glossing over an issue that we're not properly tracking the source location in the AST. Playing around with the reduced example is interesting though. If you remove the default argument in the T constructor, the issue goes away. If you stop using a forward declaration in the instantiation of T, the issue goes away. If S1 isn't a template, the issue goes away (but the issue will come back if you then make foo() a template instead of S1). So it seems that something about template instantiation is dropping the source location information (perhaps) and we should be trying to track down where that is to fix the root cause rather than work around it here for coverage mapping alone. (The end location being incorrect can impact other things that are harder to test because it'll be for things like fix-its that don't work properly, which are easy to miss.)

I agree that the real fix is to fix the source location information. If I just change [1] to SourceRange Locs = SourceRange(LParenOrBraceLoc, RParenOrBraceLoc);, the crash is gone. However, as the "FIXME" comment in [1] says, it's an intended work-around to drop source locations here. So, I'm assuming this kind of source location work-around could happen in multiple places in clang, and could happen in the future as a temporary work-around. Instead of crashing clang coverage for those work-around, we can at least skip coverage info for those expressions/statements.

[1]: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1538

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

I'm confused -- I thought D147569 resolved the issue and so this patch is no longer needed?

D147569 fixes https://github.com/llvm/llvm-project/issues/45481. This one fixes another issue crbug.com/1427933. Their stack trace look similar but not caused by the same issue.

Updated the link in summary to: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536

Thank you for clarifying, I was confused. :-)

I don't think the changes here are correct either -- it's glossing over an issue that we're not properly tracking the source location in the AST. Playing around with the reduced example is interesting though. If you remove the default argument in the T constructor, the issue goes away. If you stop using a forward declaration in the instantiation of T, the issue goes away. If S1 isn't a template, the issue goes away (but the issue will come back if you then make foo() a template instead of S1). So it seems that something about template instantiation is dropping the source location information (perhaps) and we should be trying to track down where that is to fix the root cause rather than work around it here for coverage mapping alone. (The end location being incorrect can impact other things that are harder to test because it'll be for things like fix-its that don't work properly, which are easy to miss.)

I agree that the real fix is to fix the source location information. If I just change [1] to SourceRange Locs = SourceRange(LParenOrBraceLoc, RParenOrBraceLoc);, the crash is gone. However, as the "FIXME" comment in [1] says, it's an intended work-around to drop source locations here. So, I'm assuming this kind of source location work-around could happen in multiple places in clang, and could happen in the future as a temporary work-around. Instead of crashing clang coverage for those work-around, we can at least skip coverage info for those expressions/statements.

[1]: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1538

Ugh, that workaround is unfortunate.

You are correct that this same situation may come up in the future, but each time it happens, I think it's a bug in the FE that needs to be addressed. I worry that silencing the issues here will 1) lead to the actual bug staying in the code base forever, and 2) be used as justification for making the same workaround elsewhere in the code base, perhaps so often that it becomes "the way things are supposed to work".

Perhaps a way to split the middle would be to assert that the source locations are valid in coverage mapping, but then do the right thing in non-asserts builds instead of crashing. This way, we don't lose the benefit of knowing the issues happen in development builds, but we don't punish users of coverage mapping with the released product. WDYT?

Again not an expert here, but lgtm.

(Nit: the https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaExprCXX.cpp#L1528-L1530 link in the description seems to point to the wrong code now, since main changed. Here is a link for 16.0.1: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536)

I'm confused -- I thought D147569 resolved the issue and so this patch is no longer needed?

D147569 fixes https://github.com/llvm/llvm-project/issues/45481. This one fixes another issue crbug.com/1427933. Their stack trace look similar but not caused by the same issue.

Updated the link in summary to: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1536

Thank you for clarifying, I was confused. :-)

I don't think the changes here are correct either -- it's glossing over an issue that we're not properly tracking the source location in the AST. Playing around with the reduced example is interesting though. If you remove the default argument in the T constructor, the issue goes away. If you stop using a forward declaration in the instantiation of T, the issue goes away. If S1 isn't a template, the issue goes away (but the issue will come back if you then make foo() a template instead of S1). So it seems that something about template instantiation is dropping the source location information (perhaps) and we should be trying to track down where that is to fix the root cause rather than work around it here for coverage mapping alone. (The end location being incorrect can impact other things that are harder to test because it'll be for things like fix-its that don't work properly, which are easy to miss.)

I agree that the real fix is to fix the source location information. If I just change [1] to SourceRange Locs = SourceRange(LParenOrBraceLoc, RParenOrBraceLoc);, the crash is gone. However, as the "FIXME" comment in [1] says, it's an intended work-around to drop source locations here. So, I'm assuming this kind of source location work-around could happen in multiple places in clang, and could happen in the future as a temporary work-around. Instead of crashing clang coverage for those work-around, we can at least skip coverage info for those expressions/statements.

[1]: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.1/clang/lib/Sema/SemaExprCXX.cpp#L1538

Ugh, that workaround is unfortunate.

You are correct that this same situation may come up in the future, but each time it happens, I think it's a bug in the FE that needs to be addressed. I worry that silencing the issues here will 1) lead to the actual bug staying in the code base forever, and 2) be used as justification for making the same workaround elsewhere in the code base, perhaps so often that it becomes "the way things are supposed to work".

Perhaps a way to split the middle would be to assert that the source locations are valid in coverage mapping, but then do the right thing in non-asserts builds instead of crashing. This way, we don't lose the benefit of knowing the issues happen in development builds, but we don't punish users of coverage mapping with the released product. WDYT?

Won't this still cause assertion failure on assert builds? I don't quite understand "do the right thing in non-asserts builds instead of crashing".

Perhaps a way to split the middle would be to assert that the source locations are valid in coverage mapping, but then do the right thing in non-asserts builds instead of crashing. This way, we don't lose the benefit of knowing the issues happen in development builds, but we don't punish users of coverage mapping with the released product. WDYT?

Won't this still cause assertion failure on assert builds?

Yes, it will, but that's the goal in this case. I think we want the loud failure to tell us when we've

I don't quite understand "do the right thing in non-asserts builds instead of crashing".

For example:

// If either of these locations is invalid, something elsewhere in the compiler has broken...
assert(StartLoc && StartLoc->isInvalid() && "Start location is not valid");
assert(EndLoc && EndLoc->isInvalid() && "End location is not valid");

// ... however, we can still recover without crashing.
if (StartLoc && StartLoc->isInvalid())
  StartLoc = std::nullopt;
if (EndLoc && EndLoc->isInvalid())
  EndLoc = std::nullopt;
zequanwu updated this revision to Diff 512868.EditedApr 12 2023, 9:14 AM
  • Add assertion on source locations in pushRegion.
  • Removed test case because that would cause the new assertion failure in assert build.

Thank you! I think there's only a few small things left then:

  • Add a release note about the fact that we worked around this issue in non-asserts builds but that assert builds may see new assertions triggered from this and to file an issue with a reproducer if you hit the assertion.
  • File an issue with a simple reproducer for the case you know we're still not handling properly
  • Update the commit message to clarify that we're no longer fixing anything with these changes but are instead trying to catch the issues are more loudly than with a crash
zequanwu edited the summary of this revision. (Show Details)Apr 12 2023, 2:04 PM
zequanwu updated this revision to Diff 512971.EditedApr 12 2023, 2:04 PM
zequanwu edited the summary of this revision. (Show Details)
MaskRay added a subscriber: MaskRay.Nov 9 2023, 9:33 PM
MaskRay added inline comments.
clang/lib/CodeGen/CoverageMappingGen.cpp
607

Is this workaround still needed after b0e61de7075942ef5ac8af9ca1ec918317f62152 (with a test clang/test/Coverage/unresolved-ctor-expr.cpp)?

zequanwu added inline comments.Nov 10 2023, 3:28 PM
clang/lib/CodeGen/CoverageMappingGen.cpp
607

I think this workaround can be removed, verified it no longer crashes if reverting this change.