This is an archive of the discontinued LLVM Phabricator instance.

[Clang][Diagnostic] Add `-Wcomparison-op-parentheses` to warn on chained comparisons
Needs ReviewPublic

Authored by hazohelet on Jan 27 2023, 10:48 PM.

Details

Summary

This patch introduces a new warning flag -Wcomparison-op-parentheses in -Wparentheses to issue warnings with its fixit hint for comparison operators within another comparison operator.

This fixes https://github.com/llvm/llvm-project/issues/60256

Diff Detail

Event Timeline

hazohelet created this revision.Jan 27 2023, 10:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 10:48 PM
hazohelet requested review of this revision.Jan 27 2023, 10:48 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: sstefan1. · View Herald Transcript
hazohelet updated this revision to Diff 492986.Jan 28 2023, 1:09 AM

Fix the new warning issued against libcxx test code.

erichkeane added inline comments.
clang/lib/Sema/SemaExpr.cpp
15577

I'd repalce the comment with something more like:

Emit a diagnostic for a case where a comparison operation is a direct sub-expression of another comparison operation.  Additionally, emit a fixit hint to suggest the inner comparison expression be wrapped in parentheses.
15586

I find myself wondering if we could provide a better 'suggested fix' here. Aaron is better with the diagnostics than I am, but I would think that someone doing:

a < b < c PROBABLY doesn't mean that, they probably mean: a < b && b < c.

Also, is the mistake the 'same' when they do something like a > b != c ? It would seem to me that mixing operators might make it something either more intentional/meaningful. ADDITIONALLY, in the case where they are booleans, these end up being overly noisy. The pattern of the == c (where 'c' is bool, or convertible to bool) is probably intentional.

I think the logic here needs to be more complicated than just "Comparison within Comparison", however I don't have a fully formed idea of when to diagnose.

@tahonermann : Do you perhaps have a good idea?

clang/test/Sema/comparison-op-parentheses.c
2

don't use macros to control conditional diagnostics like that. You use -verify=something.

hazohelet added inline comments.Jan 31 2023, 7:01 AM
clang/lib/Sema/SemaExpr.cpp
15586

I find myself wondering if we could provide a better 'suggested fix' here. Aaron is better with the diagnostics than I am, but I would think that someone doing:

a < b < c PROBABLY doesn't mean that, they probably mean: a < b && b < c.

Yes. We could provide a better fix-it hint.
My idea:

  • In the case of chained relational operators (<, >, <=, >=), clang suggests adding &&.
  • In other cases, clang suggests parentheses.

How about doing it this way? It's similar to how Rust handles chained comparisons.

Also, is the mistake the 'same' when they do something like a > b != c ? It would seem to me that mixing operators might make it something either more intentional/meaningful. ADDITIONALLY, in the case where they are booleans, these end up being overly noisy. The pattern of the == c (where 'c' is bool, or convertible to bool) is probably intentional.

I think the logic here needs to be more complicated than just "Comparison within Comparison", however I don't have a fully formed idea of when to diagnose.

I have a differing perspective on suppressing the warning for boolean and boolean-convertible values.
There are two possible programmer mistakes in chained comparisons.

  1. a > b != c misunderstood as a > b && b != c
  2. a > b != c misunderstood as a > (b != c)

While the latter is considered rare in this scenario, the former could be likely to happen due to other languages, such as Python handling chained comparisons in the former manner.

erichkeane added inline comments.Jan 31 2023, 7:04 AM
clang/lib/Sema/SemaExpr.cpp
15586

I'd be interested to see the fixit-hints for the first bit, also to see how others feel about it here.

IMO, a > b != c to mean (a > b) != c is a reasonably common pattern I suspect we won't want to be noisy on.

I'd be interested to see the fixit-hints for the first bit, also to see how others feel about it here.

My 2c is that -Wparentheses is already a very stylistic warning (even correct code, without knowing about this esoteric/specific suppression style of adding parens, is likely to be caught by this general class of diagnostic) so I'm pretty apprehensive about opening it up further - but data would be the decider. Running the proposed slices (different operators in different orders, etc) of warning over some broad codebases to get a sense of the false positive rate would be really helpful/necessary to decide which things to include and exclude.

hazohelet updated this revision to Diff 494635.Feb 3 2023, 7:39 AM

Address comments from erichkeane:

  • Fix comment wording
  • Avoid using macro in test file

Implement my proposal for fix-it hint:

  • In the case of chained relational operators (<, >, <=, >=), suggest adding &&.
  • In other cases, suggest parentheses.
hazohelet marked 2 inline comments as done.Feb 3 2023, 7:41 AM

I'm reasonably happy here. I think the chained-comparisons getting the && suggestion is nice. However in each case, the diagnostic is REALLY poor. It isn't really clear to most folks (barely clear to me!) that '<' within '>' really means anything. I'd love if we could spend some time brainstorming some better diagnostic wording that makes it clear what the problem is.

aaron.ballman added inline comments.Feb 3 2023, 8:38 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
6654

Spit-balling wording ideas:

'%0' results in a boolean value which is then compared using '%1'; did you mean a three-way comparison instead?

comparing the boolean result of '%0' with '%1' does not perform a three-way comparison

Also, why are these diagnostics off by default? Do we have some idea as to the false positive rate?

clang/lib/Sema/SemaExpr.cpp
15586

Also, why are these diagnostics off by default? Do we have some idea as to the false positive rate?

As for the false positive rate, I have checked for instances of this warning in the codebases for 'oneapi-src/oneTBB', 'rui314/mold', and 'microsoft/lightgbm', but did not find any new cases.
I also ran a test on 'tensorflow/tensorflow' using gcc '-Wparentheses' and found six lines of code that trigger the new diagnostic. They all relate to checking whether x and y have the same sign using x > 0 == y > 0 and alike. I tried to build with tensorflow using clang, but I stumbled upon some errors (for my poor knowledge of bazel configuration), so here I am using gcc.

I set the diagnostic disabled by default for compatibility with gcc. Considering the test against tensorflow above, it would be too noisy if we turned on the suggest-parentheses diagnostic by default (From my best guess, it would generate at least 18 new instances of warning on the tensorflow build for the six lines).
However, in my opinion, it is reasonable enough to have the diagnostic on chained relational operators enabled by default. The following is an excerpt from the 'Existing Code in C++' section of the proposal document of the introduction of chaining comparison for C++17.

Overall, what we found was:

  • Zero instances of chained arithmetic comparisons that are correct today. That is, intentionally using the current standard behavior.
  • Four instances of currently-erroneous arithmetic chaining, of the assert(0 <= ratio <= 1.0); variety. These are bugs that compile today but don’t do what the programmer intended, but with this proposal would change in meaning to become correct.
  • Many instances of using successive comparison operators in DSLs that overloaded these operators to give meaning unrelated to comparisons.

Note that the 'chaining comparisons' in the document are

  • all ==, such as a == b == c == d;
  • all {<, <=}, such as a < b <= c < d; and
  • all {>, >=} (e.g., a >= b > c > d).

URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0893r0.html

Although this paper is five years old now, I think we can conclude chaining relational operators are bug-prone and should be diagnosed by default.
Also, reading the document above, I think it would be reasonable to suggest adding '&&' in a == b == c case, too.

tensorflow/tensorflow newly-warned lines:

Also, why are these diagnostics off by default? Do we have some idea as to the false positive rate?

As for the false positive rate, I have checked for instances of this warning in the codebases for 'oneapi-src/oneTBB', 'rui314/mold', and 'microsoft/lightgbm', but did not find any new cases.

Thank you for doing the extra testing! Sorry for the delayed response, this review fell off my radar for a bit.

I also ran a test on 'tensorflow/tensorflow' using gcc '-Wparentheses' and found six lines of code that trigger the new diagnostic. They all relate to checking whether x and y have the same sign using x > 0 == y > 0 and alike. I tried to build with tensorflow using clang, but I stumbled upon some errors (for my poor knowledge of bazel configuration), so here I am using gcc.

Thank you for reporting this, that's really good information.

I set the diagnostic disabled by default for compatibility with gcc. Considering the test against tensorflow above, it would be too noisy if we turned on the suggest-parentheses diagnostic by default (From my best guess, it would generate at least 18 new instances of warning on the tensorflow build for the six lines).
However, in my opinion, it is reasonable enough to have the diagnostic on chained relational operators enabled by default. The following is an excerpt from the 'Existing Code in C++' section of the proposal document of the introduction of chaining comparison for C++17.

Overall, what we found was:

  • Zero instances of chained arithmetic comparisons that are correct today. That is, intentionally using the current standard behavior.
  • Four instances of currently-erroneous arithmetic chaining, of the assert(0 <= ratio <= 1.0); variety. These are bugs that compile today but don’t do what the programmer intended, but with this proposal would change in meaning to become correct.
  • Many instances of using successive comparison operators in DSLs that overloaded these operators to give meaning unrelated to comparisons.

Note that the 'chaining comparisons' in the document are

  • all ==, such as a == b == c == d;
  • all {<, <=}, such as a < b <= c < d; and
  • all {>, >=} (e.g., a >= b > c > d).

URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0893r0.html

Although this paper is five years old now, I think we can conclude chaining relational operators are bug-prone and should be diagnosed by default.
Also, reading the document above, I think it would be reasonable to suggest adding '&&' in a == b == c case, too.

I tend to agree -- I was searching around sourcegraph (https://sourcegraph.com/search?q=context:global+lang:C+lang:C%2B%2B+%5BA-Za-z0-9_%5D%2B%5Cs*%28%3D%3D%7C%21%3D%7C%3C%3D%7C%3E%3D%29%5Cs*%5BA-Za-z0-9_%5D%2B%5Cs*%28%3D%3D%7C%21%3D%7C%3C%3D%7C%3E%3D%29&patternType=regexp&sm=1&groupBy=repo) and I can't find a whole lot of evidence for chained operators outside of comments.

hazohelet updated this revision to Diff 502166.Mar 3 2023, 9:46 AM

Update the differential

  • Revise warning messages based on ideas from @aaron.ballman.
  • Introduce a new warning flag -Wchaining-comparisons that is enabled by default, to warn about chaining relationals or equal operators.
  • Adjust existing test files to conform with the new warning settings.

Also, why are these diagnostics off by default? Do we have some idea as to the false positive rate?

As for the false positive rate, I have checked for instances of this warning in the codebases for 'oneapi-src/oneTBB', 'rui314/mold', and 'microsoft/lightgbm', but did not find any new cases.

Thank you for doing the extra testing! Sorry for the delayed response, this review fell off my radar for a bit.

I also ran a test on 'tensorflow/tensorflow' using gcc '-Wparentheses' and found six lines of code that trigger the new diagnostic. They all relate to checking whether x and y have the same sign using x > 0 == y > 0 and alike. I tried to build with tensorflow using clang, but I stumbled upon some errors (for my poor knowledge of bazel configuration), so here I am using gcc.

Thank you for reporting this, that's really good information.

I set the diagnostic disabled by default for compatibility with gcc. Considering the test against tensorflow above, it would be too noisy if we turned on the suggest-parentheses diagnostic by default (From my best guess, it would generate at least 18 new instances of warning on the tensorflow build for the six lines).
However, in my opinion, it is reasonable enough to have the diagnostic on chained relational operators enabled by default. The following is an excerpt from the 'Existing Code in C++' section of the proposal document of the introduction of chaining comparison for C++17.

Overall, what we found was:

  • Zero instances of chained arithmetic comparisons that are correct today. That is, intentionally using the current standard behavior.
  • Four instances of currently-erroneous arithmetic chaining, of the assert(0 <= ratio <= 1.0); variety. These are bugs that compile today but don’t do what the programmer intended, but with this proposal would change in meaning to become correct.
  • Many instances of using successive comparison operators in DSLs that overloaded these operators to give meaning unrelated to comparisons.

Note that the 'chaining comparisons' in the document are

  • all ==, such as a == b == c == d;
  • all {<, <=}, such as a < b <= c < d; and
  • all {>, >=} (e.g., a >= b > c > d).

URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0893r0.html

Although this paper is five years old now, I think we can conclude chaining relational operators are bug-prone and should be diagnosed by default.
Also, reading the document above, I think it would be reasonable to suggest adding '&&' in a == b == c case, too.

I tend to agree -- I was searching around sourcegraph (https://sourcegraph.com/search?q=context:global+lang:C+lang:C%2B%2B+%5BA-Za-z0-9_%5D%2B%5Cs*%28%3D%3D%7C%21%3D%7C%3C%3D%7C%3E%3D%29%5Cs*%5BA-Za-z0-9_%5D%2B%5Cs*%28%3D%3D%7C%21%3D%7C%3C%3D%7C%3E%3D%29&patternType=regexp&sm=1&groupBy=repo) and I can't find a whole lot of evidence for chained operators outside of comments.

Thanks for the review and the sourcegraph search!
I added a new warning flag -Wchaining-comparisons that is enabled by default. I added this flag to -Wparentheses for gcc compatibility.

hazohelet updated this revision to Diff 506887.Mar 21 2023, 2:58 AM

Rebase and Ping

xgupta added a subscriber: xgupta.EditedJul 22 2023, 7:46 PM

Thought to point out that there is a clang-tidy check in review doing the similar thing - https://reviews.llvm.org/D144429.