Page MenuHomePhabricator

[clang-tidy] Flag more buggy string constructor cases
Needs ReviewPublic

Authored by ccotter on Feb 13 2023, 6:26 PM.



Add more cases to the "swapped args" case of
bugprone-string-constructor check when the first argument is
a reference to a char variable or function that returns a char.

Calls to the fill constructor where both arguments are
implicitly cast is "confusing" since the implicit cast may not be
apparent to the reader. In many cases, this is a programmer error. In
cases where it's not an error, the code should use explicit casts to
clearly specify intent.

Note that the swapped args case is a subset of the "confusing args"
case. Not all confusing args are swapped args. Consider

char buffer[] = "abcdef";
std::string not_swapped(buffer[1], 2); // oops! should be '&buffer[1]'
assert(not_swapped == "bc");

The only case where the swapped args is assumed is when the
first argument to the fill constructor is one where it is not possible
the author forgot to prefix the expression with &. I.e., when the
first arg is

  • char literal
  • declRefExpr to variable of type char (but not char&).
  • function that returns char

With the declRefExpr case, it is possible to take the address
of a variable of char type, but only if the number of bytes
read is no more than 1:

char ch;
std::string fill(ch, 1);
// intended 'std::string fill(1, ch)' instead?
// or intended 'std::string fill(&ch, 1)'?
// both lead to the same end result anyway, but
// the former is cleaner. so swapping here is safe.

Diff Detail

Event Timeline

ccotter created this revision.Feb 13 2023, 6:26 PM
Herald added a project: Restricted Project. · View Herald Transcript
ccotter requested review of this revision.Feb 13 2023, 6:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 6:27 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ccotter retitled this revision from Flag code with both string constructor arguments implicitly casted to [clang-tidy] Flag code with both string constructor arguments implicitly casted.Feb 13 2023, 6:27 PM
ccotter added inline comments.Feb 13 2023, 6:29 PM

This is specifically a bug I wrote recently where I meant to write &kText[1]. This doesn't fall under the "arguments swapped" case since indeed I merely forgot a '&'. At most, the tool can flag this potentially buggy code and suggest casts if the author determines the code is correct as is. However, the tool does not actually provide replacements as such code is more likely to be incorrect and require the author to determine the fix and apply it.

ccotter updated this revision to Diff 497189.Feb 13 2023, 7:55 PM
  • Add more cases to swapped params
ccotter added inline comments.Feb 13 2023, 7:57 PM

swapped[1,2,5,7], wswapped were all supported by the original logic - i.e., the ones with a literal char as the first arg.

The other swapped* cases are newly supported with my changes.

ccotter retitled this revision from [clang-tidy] Flag code with both string constructor arguments implicitly casted to [clang-tidy] Flag more buggy string constructor cases.Feb 13 2023, 7:59 PM
ccotter edited the summary of this revision. (Show Details)
ccotter updated this revision to Diff 500645.Feb 26 2023, 6:28 PM
  • Add more cases to swapped params
PiotrZSL added inline comments.

add a testcase with:

char& chRef = ch;
std::string swapped2(chRef, i);

char* chPtr = &ch;
std::string swapped2(*chPtr, i);

using Character = char;
Character  c;
std::string swapped2(c, i);

I fear that it may not match those, because here you are trying to match specific use cases, when you should simply match a type of expression. In such case this should be just:

const auto CharExpr = expr(hasType(qualType(hasCanonicalType(anyOf(isAnyCharacter(), references(isAnyCharacter()))))));

Only issue would be implicit cast, you can try deal with them by using ignoringImplicit, or in better way just by using:

traverse(TK_IgnoreUnlessSpelledInSource,  hasArgument(0, CharExpr.bind("swapped-parameter"))),

TK_IgnoreUnlessSpelledInSource will cut of all implicit operations, so you could check if called constructor is actually the wrong one, and check argument types. Because now macher in line 125 may not always work if somehow we would have for example 2 implicit casts, for example from char reference to char, or some other crap.

It's up to you, but consider more generic approach instead of matching specific use cases like references to variables, or calls.


what about references to integer, or typedefs to integers ?


reuse here CharExpr


i thing that this case should be matched by "swapped-parameter", no need to add extra one, just first one should be fixed.

PiotrZSL added inline comments.Mar 7 2023, 12:51 AM

try with test like this:

struct Class
   operator char() const;

Class c;
std::string value(c, 5);

I fear that hasSourceExpression could match Class type, and therefore this case wouldn't be detected.

carlosgalvezp added inline comments.Mar 25 2023, 4:10 AM

Should we use C++ casts?


Should this be a char?


Shouldn't this fail, since the constructor std::string(char, char) does not exist?

ccotter updated this revision to Diff 508345.Mar 25 2023, 2:38 PM
ccotter marked 4 inline comments as done.
  • Add more cases to swapped params
  • Change message, add more tests
ccotter marked an inline comment as done.Mar 25 2023, 2:39 PM
ccotter added inline comments.

Thinking about this more, I think we have to keep the "args swapped" case very narrow to the existing patterns of the first arg being a char literal, and the second being an integer literal. Consider

char buffer[] = "abcdef";
char& chRef = ch;
std::string not_swapped(chRef, 4); // Forgot '&' in front of 'chRef'

std::string also_not_swapped(buffer[2], 2); // Forgot '&' in front of 'buffer[2]'

Neither of these are swapped arg bugs, so I don't think we can have the tool consider this a swapped arg bug and apply fixits. The also_not_swapped is the exact bug I wrote a few weeks ago.

I think there are two types of bugs the tool has to be aware of with respect to the string fill constructor

  1. swapped args
  2. "confusing" args - where the arguments are both implicitly cast from int to char and vice versa.

For swapped args, the existing logic will match when the first arg has isInteger() type, and the second arg is characterLiteral(). At most, I think we can only expand the logic to include constructor exprs when the second arg is a declRefExpr to a char, but not char ref (see the not_swapped example above), or when the second arg is the result of a function call that returns a char. In these cases, we can be sure that the the author did not mean to put a & in front of the second argument expression.

For "confusing" args, which is not implemented in the existing logic, but I am proposing in this change, this is the more general case where I think the logic should match any expression as long as both argument expressions are implicitly cast. This is "confusing" because int-to-char and char-to-int conversions are not obvious to the reader, and it should be rare for both conversions to take place when calling the fill constructor (confirmed anecdotally in the codebases I've checked). The tool could not offer fixits here since the fix might be to swap the args, or to add a & in front of the first arg (or even another change). At most, we can warn and alert the user to the problem. If this is what the author intended, they can use explicit casts to make the intent obvious to the reader.


I'll add a test for reference to integer, but I'm matching on the type of the source expression that is being cast, which will be of integer type, even if the expression is string str(x, y) and y is an int& (the source expression in the cast will be of type int).


See earlier explanation on why we can't treat these all as swapped parameters cases.


I'm not sure what you mean here - would you mind clarifying? Do you mean should this example cast the second arg '5' to char?


The new logic I wrote flags when both parameters are implicitly cast from int to char for the first arg, and char to int in the second argument. This specific situation is what I found to be most confusing and likely bug prone. I didn't include the case where only one argument is implicitly cast to be buggy, since I wanted to be more conservative in how many more expressions to flag as being buggy, and two implicit casts (which is a bug I wrote recently) is "more" bug prone than just one. We could expand it however - would you like to explore that option?

ccotter edited the summary of this revision. (Show Details)Mar 25 2023, 2:47 PM
ccotter marked an inline comment as done.

bump please?