This is an archive of the discontinued LLVM Phabricator instance.

Fix clang-tidy readability-redundant-string-init for c++17/c++2a
ClosedPublic

Authored by poelmanc on Oct 21 2019, 12:14 AM.

Details

Summary

readability-redundant-string-init was one of several clang-tidy checks documented as failing for C++17 by @gribozavr's extensive testing on 2019-05-20. (The failure mode in C++17 is that it changes std::string Name = ""; to std::string Name = Name;, which actually compiles but crashes at run-time.)

Analyzing the AST with clang -Xclang -ast-dump showed that the outer CXXConstructExprs that previously held the correct SourceRange were being elided in C++17/2a, but the containing VarDecl expressions still had all the relevant information. So this patch changes the fix to get its source ranges from VarDecl.

It adds one test std::string g = "u", h = "", i = "uuu", j = "", k; to confirm proper warnings and fixit replacements in a single DeclStmt where some strings require replacement and others don't. The readability-redundant-string-init.cpp and readability-redundant-string-init-msvc.cpp tests now pass for C++11/14/17/2a.

Diff Detail

Event Timeline

poelmanc created this revision.Oct 21 2019, 12:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2019, 12:14 AM
Eugene.Zelenko added inline comments.
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
78

Please do not use auto if type is not spelled in same statement or iterator

124–130

Please do not use auto if type is not spelled in same statement or iterator

Eugene.Zelenko added a project: Restricted Project.
Eugene.Zelenko removed a subscriber: gribozavr.
poelmanc updated this revision to Diff 225894.Oct 21 2019, 8:38 AM

Removed the two uses of auto where the type was not an iterator or clear from the right-hand-side.

poelmanc marked 2 inline comments as done.Oct 21 2019, 8:39 AM

Thanks for the quick feedback, fixed.

aaron.ballman added inline comments.Oct 28 2019, 8:48 AM
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
75

You can drop the clang:: everywhere -- the code is within the correct namespace already.

79–80

Please drop top-level const qualifiers unless it's a pointer or reference declaration (that's not a style we typically use).

128

Elide braces per coding style.

clang-tools-extra/test/clang-tidy/checkers/readability-redundant-string-init.cpp
106

Why does this need a regex?

poelmanc updated this revision to Diff 226695.Oct 28 2019, 10:11 AM

Remove extra const, braces for one-line if statements, and clang::. Added a comment explaining the need for a regex on a warning test line.

poelmanc marked 5 inline comments as done.Oct 28 2019, 10:14 AM

Thanks for the feedback, the new patch removes the extra const, clang::, and braces on one-line if statements, and now explains the regex in readability-redundant-string-init.cpp.

clang-tools-extra/test/clang-tidy/checkers/readability-redundant-string-init.cpp
106

Thanks, I added a comment explaining that the character position of the warning differs slightly between C++11/14 (reports at the 'e') and C++17/2x (reports at the '"'), since the underlying parsing code has changed.

MyDeveloperDay added inline comments.
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
50

driveby aside comment (feel free to ignore): wouldn't it be good if "basic_string" here was configurable, this would allow anyone who defines a string class (say like StringRef) which could have this kind of redundant string initialization be able to catch them.

Couldn't this be then used to find code like StringRef EnabledPrefixOut = "";

static void ParseMRecip(const Driver &D, const ArgList &Args,
                        ArgStringList &OutStrings) {
  StringRef DisabledPrefixIn = "!";
  StringRef DisabledPrefixOut = "!";
  StringRef EnabledPrefixOut = "";
  StringRef Out = "-mrecip=";
poelmanc marked 3 inline comments as done.Oct 28 2019, 10:29 PM
poelmanc added inline comments.
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
50

Great suggestion, done. See https://reviews.llvm.org/D69548. (I figured it should be a separate topic from this one, but holler if you prefer I add it to this patch.)

aaron.ballman added inline comments.Oct 29 2019, 1:41 PM
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
67–70

Are you sure that this behavioral difference isn't just a bug in Clang? I can't think of why the source range should differ based on language mode, so I wonder if the correct thing to do is fix how we calculate the source range in Clang?

clang-tools-extra/test/clang-tidy/checkers/readability-redundant-string-init.cpp
106

Thank you for the explanation!

poelmanc marked 2 inline comments as done.Oct 29 2019, 2:13 PM
poelmanc added inline comments.
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
67–70

Thanks and no, I'm not sure at all. At the end of my summary paragraph I wrote:

Or, is it possible that this change in SourceRange is an unintentional difference in the parsing code? If so fixing that might make more sense.

Before starting this patch, I built a Debug build of LLVM and stepped through LLVM parsing code while running clang-tidy on the readability-redundant-string-init.cpp file. I set breakpoints and inspected variable values under both -std c++14 and -std c++17. The call stacks were clearly different. I could sometimes inspect character positions in the file and see where an expression's SourceLoc was set. However, a SourceLoc is a single character position; I couldn't figure out where an expression's SourceRange was even set. So I gave up and went down the current approach.

Should we ping the LLVM mailing list and see if this change rings a bell with anyone there?

poelmanc marked an inline comment as done.Oct 29 2019, 4:25 PM
gribozavr2 added inline comments.Oct 31 2019, 10:32 AM
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
67–70

Yes, I believe that changing Clang to produce consistent source ranges in C++14 and C++17 is the best way to go.

Manual parsing below is not a very maintainable solution.

aaron.ballman added inline comments.Oct 31 2019, 11:26 AM
clang-tools-extra/clang-tidy/readability/RedundantStringInitCheck.cpp
67–70

Yes, I believe that changing Clang to produce consistent source ranges in C++14 and C++17 is the best way to go.

+1, if this is possible to do.

NoQ added a subscriber: NoQ.EditedNov 6 2019, 3:57 PM

I suspect that it's not just the source range, but the whole AST for the initializer is different, due to C++17 mandatory copy elision in the equals-initializer syntax. Like, before C++17 it was a temporary constructor, a temporary materialization (ironic!), and a copy constructor, but in C++17 and after it's a single direct constructor which looks exactly like the old temporary constructor (except not being a CXXTemporaryObjectExpr). You're most likely talking about different construct-expressions in different language modes.

That said, it should probably be possible to change the source range anyway somehow.

Also i don't see any tests for multiple declarations in a single DeclStmt, i.e.

string A = "a", B = "";

... - which source range do you expect to have if you warn on B?

Just documenting here that I sent the following email to cfe-dev@lists.llvm.org:

When parsing a named declaration with an equals sign with clang -std c++11/14, clang builds an initializer expression whose SourceRange covers from the variable name through the end of the initial value:

std::string foo = "bar";
            ^~~~~~~~~~~

When parsing the same code with clang -std c++17/2x, the initializer expression's SourceRange only includes the initial value itself, and not the variable name or the equals sign:

std::string foo = "bar";
                  ^~~~~

If the string is initialized using parentheses rather than an equals sign, in all of c++11/14/17/2x the initializer expression's SourceRange includes the variable name and both parentheses:

std::string foo("bar");
            ^~~~~~~~~~

This difference has broken clang-tidy's readability-remove-redundant-string for c++17 and c++2x, as described in at reviews.llvm.org/D69238. Is this SourceRange difference intentional, and if not does anyone have thoughts on how to make the SourceRange consistent across C++ standards? Thanks!

P.S. I've tried stepping through clang-tidy parsing the std::string a = ""; line in clang-tools-extra/test/clang-tidy/checkers/readability-redundant-string-init.cpp with -std=c++11 and again with -std=c++17. In both cases Sema::AddInitializerToDecl seems to create a correct InitializationKind with the Locations[0] properly pointing to the start of the variable name. I'm not sure how or where that gets translated to the expression's SourceRange later on in some way that presumably differs between c++11 and c++17. If this SourceRange difference is unintentional and needs to be tracked down and fixed, any tips on where to look would be appreciated.

I didn't receive any email responses, I believe it prompted @NoQ to look this over and comment - thanks!

In D69238#1736365, @NoQ wrote:

I suspect that it's not just the source range, but the whole AST for the initializer is different, due to C++17 mandatory copy elision in the equals-initializer syntax. Like, before C++17 it was a temporary constructor, a temporary materialization (ironic!), and a copy constructor, but in C++17 and after it's a single direct constructor which looks exactly like the old temporary constructor (except not being a CXXTemporaryObjectExpr). You're most likely talking about different construct-expressions in different language modes.

That said, it should probably be possible to change the source range anyway somehow.

Thanks to all the encouragement here, I spent a few more hours stepping through code and have found a one-line change to clang\lib\Sema\SemaInit.cpp:8053 that fixes this bug! Change:

SourceLocation Loc = CurInit.get()->getBeginLoc();

to

SourceLocation Loc = Kind.getLocation();

For SK_UserConversion, CurInit is set at SemaInit.cpp:7899 to Args[0], i.e. the first argument to the constructor, which is "" in this case. By changing Loc to Kind.getLocation(), the BuildCXXConstructExpr at SemaInit.cpp:8064 gets created with a SourceRange spanning a = "". With just that change, the SourceRange for an expression like std::string a = "" becomes consistent across C++11/14/17/2a and readability-redundant-string-init tests pass in all C++ modes (so we can throw away my 70 lines of manual parsing code.)

I have no experience with the clang parsing code so I don't know what other effects this change might have or who else we might want to check with before committing this. Should I change my "diff" here to just that?

Also i don't see any tests for multiple declarations in a single DeclStmt, i.e.

string A = "a", B = "";

... - which source range do you expect to have if you warn on B?

For that example I'd expect the warning source range to cover B = "". readability-redundant-string-init.cpp does include tests almost like that: std::string a = "", b = "", c = ""; warns on 3 separate source ranges, while std::string d = "u", e = "u", f = "u"; doesn't warn at all. I'm happy to add a line that has some of each. I just added std::string g = "u", h = "", i = "uuu", j = "", k; and confirmed that it warns and fixes correctly (in C++11/14 mode it should pass already; in C++17/2a mode it works correctly with the current patch.)

If it is indeed the extra AST node for the elidable constructor, see if you can use the ignoringElidableConstructorCall AST matcher to ignore it, therefore smoothing over AST differences between language modes.

poelmanc added a comment.EditedNov 8 2019, 5:14 PM

If it is indeed the extra AST node for the elidable constructor, see if you can use the ignoringElidableConstructorCall AST matcher to ignore it, therefore smoothing over AST differences between language modes.

Thanks for the tip. Adding ignoringElidableConstructorCall in front of cxxConstructExpr for the EmptyStringCtorExpr and EmptyStringCtorExprWithTemporaries in RedundantStringInitCheck.cpp resulted in the checker no longer matching any of std::string a = "" lines, i.e. basically disabling the check for those types of lines.

Is there a tool to print the AST? That would show whether the AST already has some expression with the right SourceRange.

NoQ added a comment.Nov 8 2019, 5:19 PM

Is there a tool to print the AST? That would show whether the AST already has some expression with the right SourceRange.

[[ https://clang.llvm.org/docs/IntroductionToTheClangAST.html | Clang's -ast-dump ]].

poelmanc updated this revision to Diff 228556.Nov 8 2019, 9:37 PM
poelmanc edited the summary of this revision. (Show Details)
In D69238#1739627, @NoQ wrote:

Wow. That makes this so much easier... Thank you so much!

Looking at the AST showed that the CXXConstructExprs that readability-redundant-string-init previously relied upon for SourceRange were being elided in C++17/2a, but that VarDecls consistently held all the right info across all C++ modes. So I completely rewrote this patch to get the SourceRange from the VarDecl. All tests pass.

In case anyone cares or for posterity, what follows are a bunch of probably unnecessary details.

I made a super minimal readability-redundant-string-init.cpp:

namespace std {
struct string {
  string();
  string(const char *);
};
}

void f() {
  std::string a = "", b = "foo", c, d(""), e("bar");
}

Running clang -Xclang -ast-dump -std=c++11 readability-redundant-string-init.cpp yields:

`-FunctionDecl 0x27a33bfd610 <line:8:1, line:10:1> line:8:6 f 'void ()'
  `-CompoundStmt 0x27a33bf6c78 <col:10, line:10:1>
    `-DeclStmt 0x27a33bf6c60 <line:9:3, col:52>
      |-VarDecl 0x27a33bfd788 <col:3, col:19> col:15 a 'std::string':'std::string' cinit
      | `-ExprWithCleanups 0x27a33bfddd0 <col:15, col:19> 'std::string':'std::string'
      |   `-CXXConstructExpr 0x27a33bfdda0 <col:15, col:19> 'std::string':'std::string' 'void (std::string &&) noexcept' elidable
      |     `-MaterializeTemporaryExpr 0x27a33bfdd48 <col:19> 'std::string':'std::string' xvalue
      |       `-ImplicitCastExpr 0x27a33bfdc20 <col:19> 'std::string':'std::string' <ConstructorConversion>
      |         `-CXXConstructExpr 0x27a33bfdbf0 <col:19> 'std::string':'std::string' 'void (const char *)'
      |           `-ImplicitCastExpr 0x27a33bfdbd8 <col:19> 'const char *' <ArrayToPointerDecay>
      |             `-StringLiteral 0x27a33bfd868 <col:19> 'const char [1]' lvalue ""
      |-VarDecl 0x27a33bfde08 <col:3, col:27> col:23 b 'std::string':'std::string' cinit
      | `-ExprWithCleanups 0x27a33bfdfb0 <col:23, col:27> 'std::string':'std::string'
      |   `-CXXConstructExpr 0x27a33bfdf80 <col:23, col:27> 'std::string':'std::string' 'void (std::string &&) noexcept' elidable
      |     `-MaterializeTemporaryExpr 0x27a33bfdf68 <col:27> 'std::string':'std::string' xvalue
      |       `-ImplicitCastExpr 0x27a33bfdf50 <col:27> 'std::string':'std::string' <ConstructorConversion>
      |         `-CXXConstructExpr 0x27a33bfdf20 <col:27> 'std::string':'std::string' 'void (const char *)'
      |           `-ImplicitCastExpr 0x27a33bfdf08 <col:27> 'const char *' <ArrayToPointerDecay>
      |             `-StringLiteral 0x27a33bfdee8 <col:27> 'const char [4]' lvalue "foo"
      |-VarDecl 0x27a33bfdfe8 <col:3, col:34> col:34 c 'std::string':'std::string' callinit
      | `-CXXConstructExpr 0x27a33bfe050 <col:34> 'std::string':'std::string' 'void ()'
      |-VarDecl 0x27a33bfe098 <col:3, col:41> col:37 d 'std::string':'std::string' callinit
      | `-CXXConstructExpr 0x27a33bf6af0 <col:37, col:41> 'std::string':'std::string' 'void (const char *)'
      |   `-ImplicitCastExpr 0x27a33bf6ad8 <col:39> 'const char *' <ArrayToPointerDecay>
      |     `-StringLiteral 0x27a33bf6aa0 <col:39> 'const char [1]' lvalue ""
      `-VarDecl 0x27a33bf6b40 <col:3, col:51> col:44 e 'std::string':'std::string' callinit
        `-CXXConstructExpr 0x27a33bf6c00 <col:44, col:51> 'std::string':'std::string' 'void (const char *)'
          `-ImplicitCastExpr 0x27a33bf6be8 <col:46> 'const char *' <ArrayToPointerDecay>
            `-StringLiteral 0x27a33bf6ba8 <col:46> 'const char [4]' lvalue "bar"

With -std=c++11, all of the outer CXXConstructExpr have the correct range to replace with just the variable names, which explains why all is fine with C++11.

Then running clang -Xclang -ast-dump -std=c++17 readability-redundant-string-init.cpp yields:

`-FunctionDecl 0x2b8e6ac96b0 <line:8:1, line:10:1> line:8:6 f 'void ()'
  `-CompoundStmt 0x2b8e6acc608 <col:10, line:10:1>
    `-DeclStmt 0x2b8e6acc5f0 <line:9:3, col:52>
      |-VarDecl 0x2b8e6ac9828 <col:3, col:19> col:15 a 'std::string':'std::string' cinit
      | `-ImplicitCastExpr 0x2b8e6ac9cc0 <col:19> 'std::string':'std::string' <ConstructorConversion>
      |   `-CXXConstructExpr 0x2b8e6ac9c90 <col:19> 'std::string':'std::string' 'void (const char *)'
      |     `-ImplicitCastExpr 0x2b8e6ac9c78 <col:19> 'const char *' <ArrayToPointerDecay>
      |       `-StringLiteral 0x2b8e6ac9908 <col:19> 'const char [1]' lvalue ""
      |-VarDecl 0x2b8e6ac9e08 <col:3, col:27> col:23 b 'std::string':'std::string' cinit
      | `-ImplicitCastExpr 0x2b8e6ac9f50 <col:27> 'std::string':'std::string' <ConstructorConversion>
      |   `-CXXConstructExpr 0x2b8e6ac9f20 <col:27> 'std::string':'std::string' 'void (const char *)'
      |     `-ImplicitCastExpr 0x2b8e6ac9f08 <col:27> 'const char *' <ArrayToPointerDecay>
      |       `-StringLiteral 0x2b8e6ac9ee8 <col:27> 'const char [4]' lvalue "foo"
      |-VarDecl 0x2b8e6ac9f88 <col:3, col:34> col:34 c 'std::string':'std::string' callinit
      | `-CXXConstructExpr 0x2b8e6ac9ff0 <col:34> 'std::string':'std::string' 'void ()'
      |-VarDecl 0x2b8e6aca038 <col:3, col:41> col:37 d 'std::string':'std::string' callinit
      | `-CXXConstructExpr 0x2b8e6aca0f0 <col:37, col:41> 'std::string':'std::string' 'void (const char *)'
      |   `-ImplicitCastExpr 0x2b8e6aca0d8 <col:39> 'const char *' <ArrayToPointerDecay>
      |     `-StringLiteral 0x2b8e6aca0a0 <col:39> 'const char [1]' lvalue ""
      `-VarDecl 0x2b8e6acc4d0 <col:3, col:51> col:44 e 'std::string':'std::string' callinit
        `-CXXConstructExpr 0x2b8e6acc590 <col:44, col:51> 'std::string':'std::string' 'void (const char *)'
          `-ImplicitCastExpr 0x2b8e6acc578 <col:46> 'const char *' <ArrayToPointerDecay>
            `-StringLiteral 0x2b8e6acc538 <col:46> 'const char [4]' lvalue "bar"

With -std=c++17 the two elidable CXXConstructExpr are gone, and the previously nested CXXConstructExpr that remain have too-short ranges, which explains the C++17 failure.

Fortunately this dump showed that VarDecl contained all the information needed to obtain the right ranges. From

VarDecl 0xfffffffffff <col:Begin, col:End> col:Loc var_name 'std::string':'std::string' cinit

we extract SourceRange(Loc,End) and everything works fine across C++ modes.

poelmanc marked 3 inline comments as done.Nov 8 2019, 9:49 PM
mgehre removed a subscriber: mgehre.Nov 9 2019, 12:25 AM
gribozavr2 accepted this revision.Nov 11 2019, 8:47 AM

Thanks! Do you have commit access or do you need me to commit for you?

This revision is now accepted and ready to land.Nov 11 2019, 8:47 AM

Thanks! Do you have commit access or do you need me to commit for you?

I don't have commit access, if you could commit it for me that would be great. Thanks!

This revision was automatically updated to reflect the committed changes.
mitchell-stellar reopened this revision.Nov 19 2019, 4:53 AM
mitchell-stellar added a subscriber: mitchell-stellar.

Reopening in order to correct the accidentally swapped commits between this ticket and https://reviews.llvm.org/D70165.

This revision is now accepted and ready to land.Nov 19 2019, 4:53 AM
This revision was automatically updated to reflect the committed changes.