This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Flag more buggy string constructor cases
Needs ReviewPublic

Authored by ccotter on Feb 13 2023, 6:26 PM.



Add more cases to the "swapped args" case of
bugprone-string-constructor check when the first argument is
a reference to a char variable or function that returns a char.

Calls to the fill constructor where both arguments are
implicitly cast is "confusing" since the implicit cast may not be
apparent to the reader. In many cases, this is a programmer error. In
cases where it's not an error, the code should use explicit casts to
clearly specify intent.

Note that the swapped args case is a subset of the "confusing args"
case. Not all confusing args are swapped args. Consider

char buffer[] = "abcdef";
std::string not_swapped(buffer[1], 2); // oops! should be '&buffer[1]'
assert(not_swapped == "bc");

The only case where the swapped args is assumed is when the
first argument to the fill constructor is one where it is not possible
the author forgot to prefix the expression with &. I.e., when the
first arg is

  • char literal
  • declRefExpr to variable of type char (but not char&).
  • function that returns char

With the declRefExpr case, it is possible to take the address
of a variable of char type, but only if the number of bytes
read is no more than 1:

char ch;
std::string fill(ch, 1);
// intended 'std::string fill(1, ch)' instead?
// or intended 'std::string fill(&ch, 1)'?
// both lead to the same end result anyway, but
// the former is cleaner. so swapping here is safe.

Diff Detail

Event Timeline

ccotter created this revision.Feb 13 2023, 6:26 PM
Herald added a project: Restricted Project. · View Herald Transcript
ccotter requested review of this revision.Feb 13 2023, 6:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 6:27 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ccotter retitled this revision from Flag code with both string constructor arguments implicitly casted to [clang-tidy] Flag code with both string constructor arguments implicitly casted.Feb 13 2023, 6:27 PM
ccotter added inline comments.Feb 13 2023, 6:29 PM

This is specifically a bug I wrote recently where I meant to write &kText[1]. This doesn't fall under the "arguments swapped" case since indeed I merely forgot a '&'. At most, the tool can flag this potentially buggy code and suggest casts if the author determines the code is correct as is. However, the tool does not actually provide replacements as such code is more likely to be incorrect and require the author to determine the fix and apply it.

ccotter updated this revision to Diff 497189.Feb 13 2023, 7:55 PM
  • Add more cases to swapped params
ccotter added inline comments.Feb 13 2023, 7:57 PM

swapped[1,2,5,7], wswapped were all supported by the original logic - i.e., the ones with a literal char as the first arg.

The other swapped* cases are newly supported with my changes.

ccotter retitled this revision from [clang-tidy] Flag code with both string constructor arguments implicitly casted to [clang-tidy] Flag more buggy string constructor cases.Feb 13 2023, 7:59 PM
ccotter edited the summary of this revision. (Show Details)
ccotter updated this revision to Diff 500645.Feb 26 2023, 6:28 PM
  • Add more cases to swapped params
PiotrZSL added inline comments.

add a testcase with:

char& chRef = ch;
std::string swapped2(chRef, i);

char* chPtr = &ch;
std::string swapped2(*chPtr, i);

using Character = char;
Character  c;
std::string swapped2(c, i);

I fear that it may not match those, because here you are trying to match specific use cases, when you should simply match a type of expression. In such case this should be just:

const auto CharExpr = expr(hasType(qualType(hasCanonicalType(anyOf(isAnyCharacter(), references(isAnyCharacter()))))));

Only issue would be implicit cast, you can try deal with them by using ignoringImplicit, or in better way just by using:

traverse(TK_IgnoreUnlessSpelledInSource,  hasArgument(0, CharExpr.bind("swapped-parameter"))),

TK_IgnoreUnlessSpelledInSource will cut of all implicit operations, so you could check if called constructor is actually the wrong one, and check argument types. Because now macher in line 125 may not always work if somehow we would have for example 2 implicit casts, for example from char reference to char, or some other crap.

It's up to you, but consider more generic approach instead of matching specific use cases like references to variables, or calls.


what about references to integer, or typedefs to integers ?


reuse here CharExpr


i thing that this case should be matched by "swapped-parameter", no need to add extra one, just first one should be fixed.

PiotrZSL added inline comments.Mar 7 2023, 12:51 AM

try with test like this:

struct Class
   operator char() const;

Class c;
std::string value(c, 5);

I fear that hasSourceExpression could match Class type, and therefore this case wouldn't be detected.

carlosgalvezp added inline comments.Mar 25 2023, 4:10 AM

Should we use C++ casts?


Should this be a char?


Shouldn't this fail, since the constructor std::string(char, char) does not exist?

ccotter updated this revision to Diff 508345.Mar 25 2023, 2:38 PM
ccotter marked 4 inline comments as done.
  • Add more cases to swapped params
  • Change message, add more tests
ccotter marked an inline comment as done.Mar 25 2023, 2:39 PM
ccotter added inline comments.

Thinking about this more, I think we have to keep the "args swapped" case very narrow to the existing patterns of the first arg being a char literal, and the second being an integer literal. Consider

char buffer[] = "abcdef";
char& chRef = ch;
std::string not_swapped(chRef, 4); // Forgot '&' in front of 'chRef'

std::string also_not_swapped(buffer[2], 2); // Forgot '&' in front of 'buffer[2]'

Neither of these are swapped arg bugs, so I don't think we can have the tool consider this a swapped arg bug and apply fixits. The also_not_swapped is the exact bug I wrote a few weeks ago.

I think there are two types of bugs the tool has to be aware of with respect to the string fill constructor

  1. swapped args
  2. "confusing" args - where the arguments are both implicitly cast from int to char and vice versa.

For swapped args, the existing logic will match when the first arg has isInteger() type, and the second arg is characterLiteral(). At most, I think we can only expand the logic to include constructor exprs when the second arg is a declRefExpr to a char, but not char ref (see the not_swapped example above), or when the second arg is the result of a function call that returns a char. In these cases, we can be sure that the the author did not mean to put a & in front of the second argument expression.

For "confusing" args, which is not implemented in the existing logic, but I am proposing in this change, this is the more general case where I think the logic should match any expression as long as both argument expressions are implicitly cast. This is "confusing" because int-to-char and char-to-int conversions are not obvious to the reader, and it should be rare for both conversions to take place when calling the fill constructor (confirmed anecdotally in the codebases I've checked). The tool could not offer fixits here since the fix might be to swap the args, or to add a & in front of the first arg (or even another change). At most, we can warn and alert the user to the problem. If this is what the author intended, they can use explicit casts to make the intent obvious to the reader.


I'll add a test for reference to integer, but I'm matching on the type of the source expression that is being cast, which will be of integer type, even if the expression is string str(x, y) and y is an int& (the source expression in the cast will be of type int).


See earlier explanation on why we can't treat these all as swapped parameters cases.


I'm not sure what you mean here - would you mind clarifying? Do you mean should this example cast the second arg '5' to char?


The new logic I wrote flags when both parameters are implicitly cast from int to char for the first arg, and char to int in the second argument. This specific situation is what I found to be most confusing and likely bug prone. I didn't include the case where only one argument is implicitly cast to be buggy, since I wanted to be more conservative in how many more expressions to flag as being buggy, and two implicit casts (which is a bug I wrote recently) is "more" bug prone than just one. We could expand it however - would you like to explore that option?

ccotter edited the summary of this revision. (Show Details)Mar 25 2023, 2:47 PM
ccotter marked an inline comment as done.

bump please?

bump please?

PiotrZSL requested changes to this revision.Jun 11 2023, 2:56 AM

I run into false positive with this example:

using UInt8 = unsigned char;
UInt8 size = 5U;
std::string str2(size, 'x');  //  warning: string constructor arguments are probably swapped; expecting string(count, character) [bugprone-string-constructor]

This is due to change in CharExpr, simply isAnyCharacter is too wide, as it also match signed/unsigned characters, and your tests does not cover those.
Probably for this we need to be more strict, and accept only non signed/unsigned characters.

Other issue is with this:

char c = '\n';
using Size = int;
Size size = 10U;
std::string str2(c, size);

Even that in AST we got double implicit casts, it's detected as warning: string constructor arguments are probably swapped; expecting string(count, character) [bugprone-string-constructor].
Instead of being detected as 'confusing string fill constructor arguments'. This probably could be fixed by matching those double casts first, instead of last.

|     `-VarDecl 0x56436ceb0520 <col:3, col:27> col:15 str2 'std::string':'std::basic_string<char>' callinit
|       `-CXXConstructExpr 0x56436ceb0650 <col:15, col:27> 'std::string':'std::basic_string<char>' 'void (unsigned int, char)'
|         |-ImplicitCastExpr 0x56436ceb0608 <col:20> 'unsigned int' <IntegralCast>
|         | `-ImplicitCastExpr 0x56436ceb05f0 <col:20> 'char' <LValueToRValue>
|         |   `-DeclRefExpr 0x56436ceb0588 <col:20> 'char' lvalue Var 0x56436ceb0280 'c' 'char'
|         `-ImplicitCastExpr 0x56436ceb0638 <col:23> 'char':'char' <IntegralCast>
|           `-ImplicitCastExpr 0x56436ceb0620 <col:23> 'Size':'int' <LValueToRValue>
|             `-DeclRefExpr 0x56436ceb05a8 <col:23> 'Size':'int' lvalue Var 0x56436ceb0420 'size' 'Size':'int'

Except above 2 change looks OK to me. My main concern is false-positive, simply because in project that I work for we use lot of std::uint8_t as size, to save bytes in struct.

This revision now requires changes to proceed.Jun 11 2023, 2:56 AM
ccotter updated this revision to Diff 530349.Jun 11 2023, 4:35 PM
  • Exclude false positive

Thanks, fixed the first false positive example you gave. Let me think about the second example in your most recent post:

char c = '\n';
using Size = int;
Size size = 10U;
std::string str2(c, size);

This is my newly added case swapped4, where one of my original intents was to flag this code as "swapped." I can see the the ambiguity. Worst case, "confusing string fill constructor arguments" is the fallback that I think is acceptable, and still achieves the goal of at least flagging the code as potentially buggy.

PiotrZSL added inline comments.Jun 11 2023, 10:37 PM

this may incorrectly work also for signed char, we want it work only for char and wchat_t, not signed char, unsigned char, signed wchar_t, unsigned wchar_t.
So this still need some work. I would say new "isAnyCharacter" need to be created.


Similar could be here, we consider std::uint8_t and std::sint8_t as integers, not as an characters, same with signed char, unsigned char