This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Check for suspicious string assignments.
ClosedPublic

Authored by xazax.hun on Dec 10 2015, 2:52 AM.

Details

Summary

It is possible to assign arbitrary integer types to strings. Sometimes it is the result of missing to_string call or apostrophes.

This check aims to find such errors.

Diff Detail

Repository
rL LLVM

Event Timeline

xazax.hun updated this revision to Diff 42404.Dec 10 2015, 2:52 AM
xazax.hun retitled this revision from to [clang-tidy] Check for suspicious string assignments..
xazax.hun updated this object.
xazax.hun added a reviewer: alexfh.
xazax.hun added subscribers: dkrupp, cfe-commits.
xazax.hun updated this revision to Diff 42406.Dec 10 2015, 2:57 AM

Minor update to docs.

alexfh edited edge metadata.Dec 10 2015, 4:43 AM

That's quite a surprising behavior. Looks like a bug in the library implementation (or in the standard) to me.

But the check is awesome. Thank you!

clang-tidy/misc/StringAssignmentCheck.cpp
21 ↗(On Diff #42406)

Please only register the matcher for C++ code (see almost any other check for an example). It also seems that you don't need the Matcher variable.

33 ↗(On Diff #42406)

The isIntegerType() check can be moved to the matcher. I'd also add a matcher for isAnyCharacterType and move that check to the matcher as well. The latter is fine for a follow-up.

36 ↗(On Diff #42406)

Instead of storing the fixits in an array, I'd organize the code as follows:

auto Diag = diag(...);
if (!Loc.isMacroID()) {
  Diag << FixItHint::CreateInsertion(...) << FixItHint::CreateInsertion(...);
}
38 ↗(On Diff #42406)

Please check that we're in C++11 mode before suggesting this fix.

clang-tidy/misc/StringAssignmentCheck.h
1 ↗(On Diff #42406)

I suggest making the name more specific, e.g. StringIntegerAssignmentCheck. Same for the check name (misc-string-integer-assignment).

alexfh edited edge metadata.Dec 10 2015, 4:44 AM
alexfh added a subscriber: rsmith.

BTW, Richard, is the possibility of assignment of an integer to a std::string a bug in the standard?

xazax.hun updated this revision to Diff 42427.Dec 10 2015, 6:48 AM
xazax.hun marked 5 inline comments as done.
  • Modified the patch according to the review.

Thank you for the review. I have seen a few false positives in the llvm codebase. I agree that these kind of conversions are unfortunate but I am sure fixing this would cause a huge backward compatibility problem. Nontheless I also think that it might be something that is worth to consider.

alexfh added inline comments.Dec 10 2015, 8:52 AM
clang-tidy/misc/StringIntegerAssignmentCheck.cpp
45 ↗(On Diff #42427)

I'd expand the message with an explanation of what actually happens here. Also it's not clear what the "to_string operation" refers to. And one more thing I missed, is other character types.

So I'd go with something similar to: "an integer is interpreted as a character code when assigning it to a string; if this is intended, cast the integer to '%0'; if you want a string representation, use %select{std::to_string()|std::to_wstring()|appropriate conversion facility}1", where %0 should be substituted with char/wchar_t/whatever the specific character type is.

(and please add tests for other character types)

BTW, Richard, is the possibility of assignment of an integer to a std::string a bug in the standard?

This is the result of:

basic_string& operator=(charT c);
Returns: *this = basic_string(1,c).

So I believe it is by design.

BTW, Richard, is the possibility of assignment of an integer to a std::string a bug in the standard?

This is the result of:

basic_string& operator=(charT c);
Returns: *this = basic_string(1,c).

So I believe it is by design.

I see how the current behavior can be explained by the standard, however I'm not sure this was the intention of the standard to specifically allow this behavior.

And if it's unintended (and I'd say undesired), the standard could also be changed to enforce a mechanism to disallow assignment of std::string from other integer types (e.g. a templated deleted operator= enable_if'd for all integral types except for charT, or something like that).

Maybe it wouldn't be reasonable for some reason, but it's worth trying ;)

aaron.ballman added inline comments.Dec 10 2015, 11:59 AM
clang-tidy/misc/StringIntegerAssignmentCheck.cpp
19 ↗(On Diff #42427)

I think this would be reasonable to add to ASTMatchers.h, but with the name isAnyCharacterType.

44 ↗(On Diff #42427)

Would it make sense to allow = 0 or += 0 for null terminators? Requiring '\0' isn't the end of the world, but I think 0 is pretty common for that particular case.

46 ↗(On Diff #42427)

I don't think to_string() is required for all cases. For instance:

s += 12; // --> s += "12";
s = 32; // --> s = ' '; or s = "32";
s = 4; // --> s = '4'; or s = "4";
s += some_integer; // --> s += std::to_string(some_integer);

This is also currently doing the wrong thing for std::wstring objects, or std::basic_string<char32_t>, etc.

clang-tidy/misc/StringIntegerAssignmentCheck.h
18 ↗(On Diff #42427)

"where an integer" instead of "where integer".

docs/clang-tidy/checks/misc-string-integer-assignment.rst
9 ↗(On Diff #42427)

Can remove the comma.

10 ↗(On Diff #42427)

"It is possible to assign an integer"

test/clang-tidy/misc-string-integer-assignment.cpp
39 ↗(On Diff #42427)

I would also like to see some tests for std::wstring.

LegalizeAdulthood added inline comments.
test/clang-tidy/misc-string-integer-assignment.cpp
25 ↗(On Diff #42427)

Use {} initialization perhaps?

xazax.hun marked 7 inline comments as done.Dec 14 2015, 3:08 AM
xazax.hun added inline comments.
clang-tidy/misc/StringIntegerAssignmentCheck.cpp
19 ↗(On Diff #42427)

I choose the name isAnyCharacter to be consistent with isInteger which also lacks the Type suffix. Do you still prefer the isAnyCharacterType name?

44 ↗(On Diff #42427)

The = 0 or += 0 will not compile due to ambiguous overload (it could both be converted to const CharT * or CharT).

46 ↗(On Diff #42427)

I think it might be even better to not to suggest fixits in this case, since there are some false positives. But if you think it is ok to have fixits I will come up some heuristics like:
For one digit constants use 'const' or L'const'.
For more than one digit constants use "const" or L"const".
For unknown character types do not recommend fixits.
For other cases and C++11 use to_string or to_wstring.

xazax.hun updated this revision to Diff 42698.Dec 14 2015, 3:08 AM
xazax.hun marked an inline comment as done.
  • Addressed review comments.
  • Removed fixit suggestions.
alexfh added inline comments.Dec 14 2015, 5:15 AM
clang-tidy/misc/StringIntegerAssignmentCheck.cpp
20 ↗(On Diff #42698)

I prefer isAnyCharacter for consistency with isInteger. I'd also suggest to move the matcher to ASTMatchers.h eventually.

47 ↗(On Diff #42698)

I'd suggest to still add fixits. The heuristics you suggest seems reasonable. Fine for a follow-up.

docs/clang-tidy/checks/misc-string-integer-assignment.rst
2 ↗(On Diff #42698)

nit: Please add more =s

9 ↗(On Diff #42698)

"The source of the problem" seems slightly confusing here (we didn't yet define "the problem"). I'd say slightly differently:

The check finds assignments of an integer to ``std::basic_string<T>`` (``std::string``, ``std::wstring``, etc.). The source of the problem is the following assignment operator of ``std::basic_string``:

.. code:: c++
  basic_string& operator=( CharT ch );

and the fact that numeric types can be implicitly converted to most character types.
test/clang-tidy/misc-string-integer-assignment.cpp
7 ↗(On Diff #42698)

I don't think the definitions of the methods are helpful here.

26 ↗(On Diff #42698)

Why do you need to initialize the strings?

aaron.ballman added inline comments.
clang-tidy/misc/StringIntegerAssignmentCheck.cpp
20 ↗(On Diff #42698)

I prefer isAnyCharacter for consistency with isInteger. I'd also suggest to move the matcher to ASTMatchers.h eventually.

(CCing in Manuel for his opinion as well.)

I like the consistency idea, but it was my understanding that we want all AST matcher functions to match the C++ API they are exposing, which is why I still have a slight preference for isAnyCharacterType().

45 ↗(On Diff #42698)

The = 0 or += 0 will not compile due to ambiguous overload (it could both be converted to const CharT * or CharT).

Excellent point. :-)

47 ↗(On Diff #42698)

I'd suggest to still add fixits. The heuristics you suggest seems reasonable. Fine for a follow-up.

Agreed.

xazax.hun updated this revision to Diff 42710.Dec 14 2015, 6:26 AM
xazax.hun marked 4 inline comments as done.
  • Reimplemented fixits.
  • Addressed the review comments.
clang-tidy/misc/StringIntegerAssignmentCheck.cpp
20 ↗(On Diff #42698)

Adding this checker to ASTMatchers.h seems like a trivial modification. I think I will just add it and alter this patch once this check is accepted and you decided how to call it.

test/clang-tidy/misc-string-integer-assignment.cpp
26 ↗(On Diff #42698)

I do not need to initialize them. I made the tests more minimal.

test/clang-tidy/misc-string-integer-assignment.cpp
33 ↗(On Diff #42710)

What happens if the test code had used static_cast<char>?

46 ↗(On Diff #42710)

Use static_cast<> instead of C-style cast?

alexfh accepted this revision.Dec 14 2015, 5:04 PM
alexfh edited edge metadata.

Looks good with a few nits.

Thank you for the new awesome check!

clang-tidy/misc/StringIntegerAssignmentCheck.cpp
74 ↗(On Diff #42710)

Don't use else after a return.

test/clang-tidy/misc-string-integer-assignment.cpp
27 ↗(On Diff #42710)

I'd also remove the check name from all CHECK lines after the first one to fit the CHECK lines to 80 columns.

46 ↗(On Diff #42710)

I think, both casts should be tested at least once. And it doesn't matter what cast style is used in the rest of the test.

This revision is now accepted and ready to land.Dec 14 2015, 5:04 PM
This revision was automatically updated to reflect the committed changes.

Thank you for the new awesome check!

Thank you for the review. I committed the new matcher to clang as well. I went with isAnyCharacter name for now, but we can rename it anytime.