This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] extend bugprone-signed-char-misuse check.
ClosedPublic

Authored by ztamas on Mar 6 2020, 7:55 AM.

Details

Summary

Cover a new use case when using a 'signed char' as an integer
might lead to issue with non-ASCII characters. Comparing
a 'signed char' with an 'unsigned char' using equality / unequality
operator produces an unexpected result for non-ASCII characters.

Diff Detail

Event Timeline

ztamas created this revision.Mar 6 2020, 7:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2020, 7:55 AM
ztamas added a comment.EditedMar 6 2020, 7:59 AM

I run the check on LLVM code and found only one catch of suspicious comparison:

clang-tidy -p=/home/zolnai/clang/build /home/zolnai/clang/llvm-project/clang/lib/Lex/Lexer.cpp
/home/zolnai/clang/llvm-project/clang/lib/Lex/Lexer.cpp:1293:39: warning: comparison between signed and unsigned char [bugprone-signed-char-misuse]
      if ((C == '\n' || C == '\r') && C != PrevC)
ztamas added a comment.EditedMar 6 2020, 8:12 AM

In the LibreOffice codebase, I found 8 catches:
https://gist.github.com/tzolnai/2b22492c0cf79f5dba577f6a878657e3

All catches seem valid to me.

Eugene.Zelenko added a project: Restricted Project.
njames93 added inline comments.Mar 7 2020, 12:32 AM
clang-tools-extra/clang-tidy/bugprone/SignedCharMisuseCheck.cpp
74

All node matchers are implicitly allOf matchers so you can safely drop the allOf matcher.

97

binaryOperator(hasAnyOperatorName("==", "!=") ...
Also whats the reason for not checking <, >, <=, >= or <=>?

103

Any reason for this trailing ;?

125

This should probably be a condition variable and bring the init from the start of check inside.

141–143
} else if (const auto *IntegerType = Result.Nodes.getNodeAs<QualType>("integerType")) {
  ...
} else 
  llvm_unreachable("Unexpected match");
clang-tools-extra/test/clang-tidy/checkers/bugprone-signed-char-misuse.cpp
210

New line

ztamas updated this revision to Diff 248907.Mar 7 2020, 12:49 AM

Fix pre-merge check error

Plus remove unneeded "no warning" comments
and add a missing quote mark in docs.

ztamas updated this revision to Diff 248908.Mar 7 2020, 1:01 AM

Changed code based on reviewer comments.

  • Removed unneeded allOf()
  • Remove unneeded ';'
  • Added new line at the end of the file
  • Moved variable declarations inside if()
ztamas updated this revision to Diff 248909.Mar 7 2020, 1:07 AM

One more unneeded allOf and clang-format

ztamas marked 5 inline comments as done.Mar 7 2020, 1:08 AM
ztamas marked an inline comment as done.Mar 7 2020, 1:31 AM
ztamas added inline comments.
clang-tools-extra/clang-tidy/bugprone/SignedCharMisuseCheck.cpp
97

I think these are different cases when the programmer checks the equality of two characters or checking some kind of order of characters. In case of equality, there seems to be a clear assumption, that if the two character variables represent the same character (semantically the same), then the equality should return true. This assumption fails with mixed signed and unsigned char variables.
However, when the programmer uses a less or a greater operator it's a bit different. I guess a programmer in the case starts to wonder what is the actual order of the characters, checks the ASCII table if it's not obvious for a set of characters. Also, it's not clear what might be the false assumption here. Is an ASCII character smaller or greater than a non-ASCII character? So I don't see a probable false assumption here what we can point out. At least, in general.
Furthermore, I optimized the check for equality/inequality, with ignoring the ASCII characters for both the signed and the unsigned operand. This makes sense for equality/inequality but does not make sense for the other comparison operators.
All-in-all I'm now focusing on the equality / unequality operators. Usage of these operands clearly something we can catch. Checking other comparison operators is something that needs consideration, so I don't bother with that in this patch.

ztamas updated this revision to Diff 248912.Mar 7 2020, 1:37 AM

Use quotes in warning message.

njames93 added inline comments.Mar 7 2020, 1:42 AM
clang-tools-extra/clang-tidy/bugprone/SignedCharMisuseCheck.cpp
97

Fair enough, in that case can you specify in the docs that you only check for == and != comparison. Currently they only say it looks for comparison rather than maybe equality comparison.
Also point about hasAnyOperatorName still stands.

Harbormaster completed remote builds in B48442: Diff 248907.
Harbormaster completed remote builds in B48445: Diff 248912.
ztamas updated this revision to Diff 248917.Mar 7 2020, 2:18 AM

Update code based on reviewer comments.

  • Update docs: comparison -> equality/inequality comparison.
  • Use hasAnyOperatorName
  • Plus fix a typo
ztamas marked 2 inline comments as done.Mar 7 2020, 2:20 AM
njames93 accepted this revision.Mar 9 2020, 6:38 AM

LGTM

However, how does this handle cases when the type is written as char. They can be signed/unsigned based on what platform is being targeted. But on a platform where char is signed, comparison to an unsigned char is dangerous and the complimented case is the same. Surely any char comparison to signed char or unsigned char should be warned. But I'm not sure if that's in the scope of this check.

This revision is now accepted and ready to land.Mar 9 2020, 6:38 AM

LGTM

However, how does this handle cases when the type is written as char. They can be signed/unsigned based on what platform is being targeted. But on a platform where char is signed, comparison to an unsigned char is dangerous and the complimented case is the same. Surely any char comparison to signed char or unsigned char should be warned. But I'm not sure if that's in the scope of this check.

It uses the default signedness what clang specifies for char (which depends on the platform). For example, on my Linux x64 system, char is equivalent to signed char and so this clang-tidy check catches also the char <-> unsigned char comparisons. This default behavior can be overridden with -funsigned-char and -fsigned-char compilation options. So for a cross-platform C++ code, it's an option to run the clang-tidy twice, using these two options and so both char <-> unsigned char and char <-> signed char comparisons can be caught.

This revision was automatically updated to reflect the committed changes.