This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] fix modernize-use-nullptr false positive with spaceship operator comparisons
ClosedPublic

Authored by poelmanc on Jan 29 2021, 7:21 PM.

Details

Summary

clang-tidy -std=c++20 with modernize-use-nullptr mistakenly inserts nullptr in place of the comparison operator if the comparison internally expands in the AST to a rewritten spaceship operator. This can be reproduced by running the new modernize-use-nullptr-cxx20.cpp test without applying the supplied patch to UseNullptrCheck.cpp; the current clang-tidy will mistakenly replace:

result = (a1 < a2);

with

result = (a1 nullptr a2);

Diff Detail

Event Timeline

poelmanc created this revision.Jan 29 2021, 7:21 PM
poelmanc requested review of this revision.Jan 29 2021, 7:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 29 2021, 7:21 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
poelmanc updated this revision to Diff 320286.Jan 29 2021, 11:48 PM

Fix test failure in modernize-use-nullptr-cxx20.cpp by replacing #include <compare> with some minimal equivalent std code.

Eugene.Zelenko set the repository for this revision to rG LLVM Github Monorepo.

Thanks for looking at this, its been on my todo list for long time.
Regarding the hasGrandparent idea, that could be recreated with hasParent(expr(hasParent(cxxRewrittenBinaryOperator())))
Though looking at the (trimmed) AST I'm not sure that's enough. Appears to have 4 levels of parents before reaching the cxxRewrittenBinaryOperator

-CXXRewrittenBinaryOperator <col:13, col:18> 'bool'
  -CXXOperatorCallExpr <col:13, col:16> 'bool' '<' adl
   |-ImplicitCastExpr <col:16> 'bool (*)(std::strong_ordering, __cmp_cat::__unspec) noexcept' <FunctionToPointerDecay>
   |-CXXOperatorCallExpr <col:13, col:18> 'std::strong_ordering':'std::strong_ordering' '<=>'
    -ImplicitCastExpr <col:16> '__cmp_cat::__unspec':'std::__cmp_cat::__unspec' <ConstructorConversion>
      -CXXConstructExpr <col:16> '__cmp_cat::__unspec':'std::__cmp_cat::__unspec' 'void (std::__cmp_cat::__unspec *) noexcept'
        -ImplicitCastExpr <col:16> 'std::__cmp_cat::__unspec *' <NullToPointer> // Cast we care about
          -IntegerLiteral <col:16> 'int' 0
clang-tools-extra/test/clang-tidy/checkers/modernize-use-nullptr-cxx20.cpp
4–5

We don't compile against a std library for tests so this comment is redundant.

njames93 retitled this revision from clang-tidy: modernize-use-nullptr mistakenly fixes rewritten spaceship operator comparisons to [clang-tidy] fix modernize-use-nullptr false positive with spaceship operator comparisons.Jan 30 2021, 4:10 PM
poelmanc updated this revision to Diff 320330.Jan 30 2021, 4:27 PM

Thanks to the great feedback, changed unless(hasAncestor(cxxRewrittenBinaryOperator())) to unless(hasParent(expr(hasParent(cxxRewrittenBinaryOperator())))) and added a test to verify the improvement (and removed an extraneous comment.)

poelmanc marked an inline comment as done.Jan 30 2021, 4:28 PM
poelmanc added a comment.EditedJan 30 2021, 4:37 PM

@njames93 Thank you so much for the quick feedback, I made your suggested changes and added a test that it properly converts result = (a1 > (ptr == 0 ? a1 : a2)); to result = (a1 > (ptr == nullptr ? a1 : a2)); now.

In these examples so far, checking the grandparent works. Here's the AST:

$ clang -std=c++20 -Xclang -ast-dump modernize-use-nullptr-cxx20.cpp
...
    |-BinaryOperator 0x22d30bec4e0 <line:41:3, col:20> 'bool' lvalue '=' // result = (a1 < a2);
    | |-DeclRefExpr 0x22d30be8960 <col:3> 'bool' lvalue Var 0x22d30be88e0 'result' 'bool'
    | `-ParenExpr 0x22d30bec4c0 <col:12, col:20> 'bool'
    |   `-CXXRewrittenBinaryOperator 0x22d30bec4a8 <col:13, col:18> 'bool'
    |     `-CXXOperatorCallExpr 0x22d30bec470 <col:13, col:16> 'bool' '<' adl
    |       |-ImplicitCastExpr 0x22d30bec458 <col:16> 'bool (*)(const std::strong_ordering, std::strong_ordering::zero_type) noexcept' <FunctionToPointerDecay>
    |       | `-DeclRefExpr 0x22d30bec3d8 <col:16> 'bool (const std::strong_ordering, std::strong_ordering::zero_type) noexcept' lvalue Function 0x22d30ba9f28 'operator<' 'bool (const std::strong_ordering, std::strong_ordering::zero_type) noexcept'
    |       |-CXXOperatorCallExpr 0x22d30bec360 <col:13, col:18> 'std::strong_ordering':'std::strong_ordering' '<=>'
    |       | |-ImplicitCastExpr 0x22d30bec348 <col:16> 'std::strong_ordering (*)(const A &) const noexcept' <FunctionToPointerDecay>
    |       | | `-DeclRefExpr 0x22d30be8a08 <col:16> 'std::strong_ordering (const A &) const noexcept' lvalue CXXMethod 0x22d30bedcd8 'operator<=>' 'std::strong_ordering (const A &) const noexcept'
    |       | |-ImplicitCastExpr 0x22d30be89f0 <col:13> 'const A' lvalue <NoOp>
    |       | | `-DeclRefExpr 0x22d30be8998 <col:13> 'A' lvalue Var 0x22d30be8290 'a1' 'A'
    |       | `-ImplicitCastExpr 0x22d30be89d8 <col:18> 'const A' lvalue <NoOp>
    |       |   `-DeclRefExpr 0x22d30be89b8 <col:18> 'A' lvalue Var 0x22d30be87f0 'a2' 'A'
    |       `-ImplicitCastExpr 0x22d30bec3c0 <col:16> 'std::strong_ordering::zero_type':'nullptr_t' <NullToPointer> // Auto-generated cast we should ignore
    |         `-IntegerLiteral 0x22d30bec398 <col:16> 'int' 0
...
    `-BinaryOperator 0x22d30becb20 <line:48:3, col:38> 'bool' lvalue '='  // result = (a1 > (ptr == 0 ? a1 : a2));
      |-DeclRefExpr 0x22d30bec830 <col:3> 'bool' lvalue Var 0x22d30be88e0 'result' 'bool'
      `-ParenExpr 0x22d30becb00 <col:12, col:38> 'bool'
        `-CXXRewrittenBinaryOperator 0x22d30becae8 <col:13, col:37> 'bool'
          `-CXXOperatorCallExpr 0x22d30becab0 <col:13, col:16> 'bool' '>' adl
            |-ImplicitCastExpr 0x22d30beca98 <col:16> 'bool (*)(const std::strong_ordering, std::strong_ordering::zero_type) noexcept' <FunctionToPointerDecay>
            | `-DeclRefExpr 0x22d30beca78 <col:16> 'bool (const std::strong_ordering, std::strong_ordering::zero_type) noexcept' lvalue Function 0x22d30baa1a0 'operator>' 'bool (const std::strong_ordering, std::strong_ordering::zero_type) noexcept'
            |-CXXOperatorCallExpr 0x22d30beca00 <col:13, col:37> 'std::strong_ordering':'std::strong_ordering' '<=>'
            | |-ImplicitCastExpr 0x22d30bec9e8 <col:16> 'std::strong_ordering (*)(const A &) const noexcept' <FunctionToPointerDecay>
            | | `-DeclRefExpr 0x22d30bec9c8 <col:16> 'std::strong_ordering (const A &) const noexcept' lvalue CXXMethod 0x22d30bedcd8 'operator<=>' 'std::strong_ordering (const A &) const noexcept'
            | |-ImplicitCastExpr 0x22d30bec9b0 <col:13> 'const A' lvalue <NoOp>
            | | `-DeclRefExpr 0x22d30bec850 <col:13> 'A' lvalue Var 0x22d30be8290 'a1' 'A'
            | `-ImplicitCastExpr 0x22d30bec998 <col:18, col:37> 'const A' lvalue <NoOp>
            |   `-ParenExpr 0x22d30bec978 <col:18, col:37> 'A' lvalue
            |     `-ConditionalOperator 0x22d30bec948 <col:19, col:35> 'A' lvalue
            |       |-BinaryOperator 0x22d30bec8e8 <col:19, col:26> 'bool' '=='
            |       | |-ImplicitCastExpr 0x22d30bec8b8 <col:19> 'int *' <LValueToRValue>
            |       | | `-DeclRefExpr 0x22d30bec870 <col:19> 'int *' lvalue Var 0x22d30bec758 'ptr' 'int *'
            |       | `-ImplicitCastExpr 0x22d30bec8d0 <col:26> 'int *' <NullToPointer> // ptr == 0 cast that we want to convert to nullptr
            |       |   `-IntegerLiteral 0x22d30bec890 <col:26> 'int' 0
            |       |-DeclRefExpr 0x22d30bec908 <col:30> 'A' lvalue Var 0x22d30be8290 'a1' 'A'
            |       `-DeclRefExpr 0x22d30bec928 <col:35> 'A' lvalue Var 0x22d30be87f0 'a2' 'A'
            `-ImplicitCastExpr 0x22d30beca60 <col:16> 'std::strong_ordering::zero_type':'nullptr_t' <NullToPointer> // Auto-generated cast we should ignore
              `-IntegerLiteral 0x22d30beca38 <col:16> 'int' 0

This looks a little bit different from your AST but I'm not sure what code your AST was generated from. If you have a test case I'd be happy to take a look. Thanks!

@njames93 Thank you so much for the quick feedback, I made your suggested changes and added a test that it properly converts result = (a1 > (ptr == 0 ? a1 : a2)); to result = (a1 > (ptr == nullptr ? a1 : a2)); now.
This looks a little bit different from your AST but I'm not sure what code your AST was generated from. If you have a test case I'd be happy to take a look. Thanks!

I used the same test code as yours, but rather than creating my own std::strong_ordering I included the compare header. For reference I was using a trunk clang build with a trunk version of libstdc++ - https://godbolt.org/z/PrjhbK

This does highlight an issue where the mimicked std library stubs used for tests don't correspond exactly to what the stdlib actually looks like and can result in subtly different ASTs that have added/removed implicit nodes.

Going a little off point here but a few months back I pushed a fix for another check that passed its tests.
However the bug report was re-opened as the bug was still observable in the real word.
Turned out the implementation of std::string used for the test had a trivial destructor resulting in the AST not needed to emit CXXBindTemporaryExprs all over the place, which threw off the matching logic.

Unfortunately this kind of disparity is hard to detect in tests so it may be wise to test this locally using the compare header from a real standard library implementation (preferably all 3 main ones if you have the machines) and see if this behaviour is correct.

poelmanc added a comment.EditedJan 31 2021, 1:46 PM

This does highlight an issue where the mimicked std library stubs used for tests don't correspond exactly to what the stdlib actually looks like and can result in subtly different ASTs that have added/removed implicit nodes.

Going a little off point here but a few months back I pushed a fix for another check that passed its tests.
However the bug report was re-opened as the bug was still observable in the real word.
Turned out the implementation of std::string used for the test had a trivial destructor resulting in the AST not needed to emit CXXBindTemporaryExprs all over the place, which threw off the matching logic.

Unfortunately this kind of disparity is hard to detect in tests so it may be wise to test this locally using the compare header from a real standard library implementation (preferably all 3 main ones if you have the machines) and see if this behaviour is correct.

Indeed the mimicked std library approach has real advantages like running fast and giving consistent results across platforms. But it also has drawbacks like the replication of mimicked std code across tests, and possible differences between test and real world behaviour. We could give tests a CLANG_TIDY_TEST_NATIVE_STD macro to switch between mimicked std code and real headers, and set up CI to test both ways, to catch these issues at the expense of added complexity. But that's a broader discussion.

Back to modernize-use-nullptr, these days I'm working in MSVC and the tests pass with MSVC's <compare>, with my mimicked compare, and a shorter mimicked strong_ordering that I found in ASTTraverserTest.cpp. But I fetched gcc's <compare> from https://raw.githubusercontent.com/gcc-mirror/gcc/master/libstdc%2B%2B-v3/libsupc%2B%2B/compare and now see the same ASTs as you, and the proposed fix fails.

Thoughts on which option seems like the best path forward?

  1. I believe we really want to ignore the ZeroLiteral added to the AST in SemaOverload.cpp:13739-13754. Based on the CXXRewrittenBinaryOperator's isReversed() value, we should be able to detect which CXXOperatorCallExpr child that is (so far it's always the 3rd child, but if isReversed() it may be the 1st or 2nd?) That would tie us tightly to the SemaOverload.cpp implementation, but free us from std implementation details.
  2. Roll back to hasAncestor, fixing the immediate problem of creating invalid fixes (a1 nullptr 2a) and accepting that some rare but fixable code (a1 < ( ptr == 0 ? a2 : a3) will not be modernized to nullptr.
  3. Change the bugfix approach here from AST-based to text-based. In replaceWithNullptr in UseNullptrCheck.cpp:62, add code to skip the planned replacement if the text we're about to replace with nullptr consists of <, =, and >. As a quick test, this version seems to make all tests pass and doesn't depend on std header implementation:
static const std::string SuspiciousChars("<=>");
char FirstCharToReplace = *SM.getCharacterData(StartLoc);
char LastCharToReplace = *SM.getCharacterData(EndLoc);
if (SuspiciousChars.find(FirstCharToReplace) != std::string::npos &&
    SuspiciousChars.find(LastCharToReplace) != std::string::npos)
  return;
steveire added a subscriber: steveire.EditedJan 31 2021, 4:59 PM

Thoughts on which option seems like the best path forward?

I think you should be able to correctly match everything. Try a matcher like (can probably be cleaned up a bit):

auto isOrHasDescendant = [](auto InnerMatcher) {
  return anyOf(InnerMatcher, hasDescendant(InnerMatcher));
};

return traverse(
    TK_AsIs,
    anyOf(castExpr(anyOf(ImplicitCastToNull,
                         explicitCastExpr(hasDescendant(ImplicitCastToNull))),
                   unless(hasAncestor(explicitCastExpr())),
                   unless(hasAncestor(cxxRewrittenBinaryOperator())))
              .bind(CastSequence),
          cxxRewrittenBinaryOperator(
              // Match rewritten operators, but verify (in the check method)
              // that if an implicit cast is found, it is not from another
              // nested rewritten operator
              expr().bind("matchBinopOperands"),
              hasEitherOperand(isOrHasDescendant(
                  implicitCastExpr(
                      ImplicitCastToNull,
                      hasAncestor(cxxRewrittenBinaryOperator().bind(
                          "checkBinopOperands")))
                      .bind(CastSequence))))));

and

if (Result.Nodes.getNodeAs<CXXRewrittenBinaryOperator>(
        "matchBinopOperands") !=
    Result.Nodes.getNodeAs<CXXRewrittenBinaryOperator>("checkBinopOperands"))
  return;

in check.

ie, add an anyOf, ignore all descendants below cxxRewrittenBinaryOperator in the first part of the anyOf, and look only and cxxRewrittenBinaryOperator in the second part of it.

Here's another testcase for nested rewritten operators:

result = (a1 > ((a1 > (ptr == 0 ? a1 : a2)) ? a1 : a2));
// CHECK-FIXES: result = (a1 > ((a1 > (ptr == nullptr ? a1 : a2)) ? a1 : a2));
poelmanc updated this revision to Diff 320393.Jan 31 2021, 9:52 PM

Thanks @steveire, that suggestion worked perfectly! I added the additional test case and shortened the mimicked strong_ordering code to a version from clang/unittests.ASTMatchers/ASTMatchersTraversalTest.cpp, but also manually tested this using both MSVC's and libstdc++v3's <compare> header.

This does highlight an issue where the mimicked std library stubs used for tests don't correspond exactly to what the stdlib actually looks like and can result in subtly different ASTs that have added/removed implicit nodes.

Going a little off point here but a few months back I pushed a fix for another check that passed its tests.
However the bug report was re-opened as the bug was still observable in the real word.
Turned out the implementation of std::string used for the test had a trivial destructor resulting in the AST not needed to emit CXXBindTemporaryExprs all over the place, which threw off the matching logic.

Unfortunately this kind of disparity is hard to detect in tests so it may be wise to test this locally using the compare header from a real standard library implementation (preferably all 3 main ones if you have the machines) and see if this behaviour is correct.

Indeed the mimicked std library approach has real advantages like running fast and giving consistent results across platforms. But it also has drawbacks like the replication of mimicked std code across tests, and possible differences between test and real world behaviour. We could give tests a CLANG_TIDY_TEST_NATIVE_STD macro to switch between mimicked std code and real headers, and set up CI to test both ways, to catch these issues at the expense of added complexity. But that's a broader discussion.

I think the benefits from the mimicked std library outweigh the problems from trying to test against a moving target in multiple dimensions, and so I think we want to keep the mimicked std library approach as the default for tests. However, I also think the lack of testing against real world STL implementations is a problem that hurts us, for all the reasons pointed out already. Ultimately, the problem with testing against a real STL will boil down to who is responsible for changes when a test case starts failing after the check author has moved on. However, we might be able to solve this with policy decisions like changing a test to an expected failure + bug report while waiting for someone to step up to solve the issue. But this is orthogonal to your patch and would require a larger community discussion.

I think you should be able to correctly match everything. Try a matcher like (can probably be cleaned up a bit):

Thank you for that matcher suggestion; you are a powerful wizard. :-)

Generally, the changes LGTM.

clang-tools-extra/clang-tidy/modernize/UseNullptrCheck.cpp
59
poelmanc updated this revision to Diff 320637.Feb 1 2021, 5:13 PM

Add period to end of comment.

poelmanc marked an inline comment as done.Feb 1 2021, 5:13 PM

Thanks for all the great feedback I received here. To give credit where credit's due, this updated revision to UseNullptrCheck.cpp is now actually 100% @steveire's suggested code. Even one of the tests cases was his. Whenever it's ready to land I'd appreciate it if someone could push it as I lack llvm-project commit access.

Thanks for all the great feedback I received here. To give credit where credit's due, this updated revision to UseNullptrCheck.cpp is now actually 100% @steveire's suggested code. Even one of the tests cases was his. Whenever it's ready to land I'd appreciate it if someone could push it as I lack llvm-project commit access.

Can I ask if you could tidy the description of this, basically remove all the stuff about hasGrandparent etc, probably best just remove everything after result = (a1 nullptr a2); in the desc. It shows in the commit message and its not strictly relevant.

poelmanc edited the summary of this revision. (Show Details)Feb 3 2021, 4:43 PM

Can I ask if you could tidy the description of this, basically remove all the stuff about hasGrandparent etc, probably best just remove everything after result = (a1 nullptr a2); in the desc. It shows in the commit message and its not strictly relevant.

Thanks, done. I never thought about all that showing up in the commit message, I'll be more concise.

njames93 accepted this revision.Feb 8 2021, 8:14 AM

LGTM, Same as last time for the commit?

This revision is now accepted and ready to land.Feb 8 2021, 8:14 AM

LGTM, Same as last time for the commit?

That would be great, thanks!

I'm unable to cleanly apply this patch, looks like something was a bit off when you created the diff.

➜  llvm-project git:(main) ✗ arc patch D95714                           
 INFO  Base commit is not in local repository; trying to fetch.
Created and checked out branch arcpatch-D95714.
Checking patch dev/null => clang-tools-extra/test/clang-tidy/checkers/modernize-use-nullptr-cxx20.cpp...
error: dev/null: does not exist in index
Checking patch clang-tools-extra/clang-tidy/modernize/UseNullptrCheck.cpp...
Applied patch clang-tools-extra/clang-tidy/modernize/UseNullptrCheck.cpp cleanly.

 Patch Failed! 
Usage Exception: Unable to apply patch!
➜  llvm-project git:(arcpatch-D95714) ✗

Can you try to reupload the diff, making sure you use full context as well.

poelmanc updated this revision to Diff 322253.Feb 8 2021, 5:49 PM

Thanks for your patience, this one should work, as I used my normal git show HEAD -U999999 workflow.

(The requested change was just to add a period to a comment, so as a short-cut I tried "Raw Diff" -> copy -> "Update Diff" -> paste into "Raw Diff" field, -> add the period -> Continue -> Save. "Raw Diff" didn't show full context, but it appeared to work based on the web interface. I guess not. modernize-use-nullptr-cxx20.cpp is a whole new file, in case that matters.)

poelmanc updated this revision to Diff 322302.EditedFeb 8 2021, 11:55 PM

Capitalize IsOrHasDescendant, add } // namespace std per HarborMaster feedback.