This is an archive of the discontinued LLVM Phabricator instance.

[Diagnostics] Improve -Wxor-used-as-pow
AbandonedPublic

Authored by xbolva00 on Aug 18 2019, 1:09 PM.

Details

Summary

Handle specific cases

(2 ^ 64) - 1
Digit separators

Diff Detail

Event Timeline

xbolva00 created this revision.Aug 18 2019, 1:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2019, 1:09 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

From post commit discussion, Nico Weber:

  1. Hi,
  2. this results in a false positive on webrtc, on this code:
  3. https://cs.chromium.org/chromium/src/third_party/libwebp/src/enc/picture_csp_enc.c?q=picture_csp_enc.c&sq=package:chromium&dr&l=1002
  4. The code does this:
  5. #ifdef WORDS_BIGENDIAN
  6. #define ALPHA_OFFSET 0 // uint32_t 0xff000000 is 0xff,00,00,00 in memory
  7. #else
  8. #define ALPHA_OFFSET 3 // uint32_t 0xff000000 is 0x00,00,00,ff in memory
  9. #endif
  10. // ...
  11. const uint8_t* const argb = (const uint8_t*)picture->argb;
  12. const uint8_t* const a = argb + (0 ^ ALPHA_OFFSET);
  13. const uint8_t* const r = argb + (1 ^ ALPHA_OFFSET);
  14. const uint8_t* const g = argb + (2 ^ ALPHA_OFFSET);
  15. const uint8_t* const b = argb + (3 ^ ALPHA_OFFSET);
  16. The idea is to get bytes 0, 1, 2, 3 as a, r, g, b if ALPHA_OFFSET is 0, or bytes 3, 2, 1, 0 if ALPHA_OFFSET is 3.
  17. Maybe this shouldn't fire if the rhs is a macro?
  18. (On all of Chromium, this fired 3 times; in the other 2 cases the rhs was a literal as well, and those two were true positives.)
xbolva00 added a subscriber: jfb.Aug 19 2019, 11:19 AM

I agree what @tkanis suggested and be silent if RHS is macro as real world code shows it. Opinions?

@jfb @aaron.ballman

xbolva00 updated this revision to Diff 215957.Aug 19 2019, 11:29 AM

Do not warn if RHS is macro.

I agree what @tkanis suggested and be silent if RHS is macro as real world code shows it. Opinions?

@jfb @aaron.ballman

I suspect we can come up with examples where the macro on either lhs or rhs is sensible and other examples where its senseless. The existing test cases already have:

res = TWO ^ 8; // expected-warning {{result of 'TWO ^ 8' is 10; did you mean '1 << 8' (256)?}}
// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:9-[[@LINE-1]]:16}:"1 << 8"
// expected-note@-2 {{replace expression with '0x2 ^ 8' to silence this warning}}

res = 2 ^ TEN; // expected-warning {{result of '2 ^ TEN' is 8; did you mean '1 << TEN' (1024)?}}
// CHECK: fix-it:"{{.*}}":{[[@LINE-1]]:9-[[@LINE-1]]:16}:"1 << TEN"
// expected-note@-2 {{replace expression with '0x2 ^ TEN' to silence this warning}}

showing that we expect to warn when macros are involved. If we decide that macros should silence the warning, I would expect any use of a macro in the expression to silence it, not just a RHS macro.

I think it is too soon to jump and disable all macros. We still catch all motivating cases, it found bugs in Chromium.

Chromium's case makes sense but on the other hand, LHS does not make much sense to be a macro. #define TWO 2 and res = TWO ^ 32? And even for this case, I think the user should rework it to #define TWO 0x2.

I think it is too soon to jump and disable all macros. We still catch all motivating cases, it found bugs in Chromium.

I think it's too soon to disable any macros. The workaround for Chromium is trivial and makes the code more clear.

Chromium's case makes sense but on the other hand, LHS does not make much sense to be a macro. #define TWO 2 and res = TWO ^ 32? And even for this case, I think the user should rework it to #define TWO 0x2.

I guess I don't see the distinction. If the user should rework TWO to be 0x2 when on the LHS, the same holds true for the RHS, to me.

xbolva00 added a subscriber: hans.Aug 19 2019, 12:50 PM
xbolva00 added a comment.EditedAug 19 2019, 12:50 PM

I think it's too soon to disable any macros.

It is hard to say (this was discussed a lot).

@thakis or @hans what do you think? Maybe you could do experiments for us. Comment the code which disables this warning in macros and try it on Chromium - let’s see how many false positives/negatives it could produce. This could really resolve “macro” question.

But please let’s not repeat same “macro yes/no” arguments from last review.

Anyway, I could make -Wxor-used-as-pow-in-macro so everybody would be happy. Is it solution?

Now, Chromium disabled this warning :( until we “fix” it.

Now, Chromium disabled this warning :( until we “fix” it.

If Chromium is willing to make build-system changes to add -Wno-xor-as-pow, then surely they should also be willing to make the suggested source-code fixit of #define ALPHA_OFFSET 0x3 where hexadecimal indicates an intentional bitwise operation. Let's give them a little time to react to that suggestion. :)
(A heavier workaround would be to use xor instead of ^.)

xbolva00 added a comment.EditedAug 19 2019, 1:37 PM

>> suggested source-code fixit of #define ALPHA_OFFSET 0x3

No, there is a note “replace with 0x2 ^ ALPHA_OFFSET” to silence it. Didnt you suggest 0x2 / 0xA? :)

While we can suggest as you wrote:
#define ALPHA_OFFSET 0x3

I really dont think it is worth to do.

My suggestion: comment macro bailout, run this on Chromium.

Any objections to this patch except “macro” question?

Is the test up-to-date ? I tried to apply the patch locally and I get a bunch of failures :

error: 'warning' diagnostics expected but not seen: 
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 73: result of '2 ^ 64' is 66; did you mean '-1LL'?
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 77: result of '2 ^ 64' is 66; did you mean 'ULLONG_MAX'?
error: 'warning' diagnostics seen but not expected: 
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 53: result of '2 ^ TEN' is 8; did you mean '1 << TEN' (1024)?
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 119: result of '2 ^ ALPHA_OFFSET' is 1; did you mean '1 << ALPHA_OFFSET' (8)?
error: 'note' diagnostics expected but not seen: 
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 73 (directive at /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp:75): replace expression with '0x2 ^ 64' to silence this warning
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 77 (directive at /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp:79): replace expression with '0x2 ^ 64' to silence this warning
error: 'note' diagnostics seen but not expected: 
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 53: replace expression with '0x2 ^ TEN' to silence this warning
  File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 119: replace expression with '0x2 ^ ALPHA_OFFSET' to silence this warning
8 errors generated.
lib/Sema/SemaExpr.cpp
9429

I think it would be nice to have a little comment above diagnoseXorMisusedAsPow to explain what the parameters are (for example. it is not immediately clear to me what is the difference between SubLHS and LHS).

9434

s/is macro/is a macro ?

File /home/bruno/software/llvm-project/clang/test/SemaCXX/warn-xor-as-pow.cpp Line 119: result of '2 ^ ALPHA_OFFSET' is 1; did you mean '1 << ALPHA_OFFSET' (8)?

This should not warn. Please verify that patch was applied correctly and you use newly built clang with this patch. (I use arc patch DXXXXX)

>> suggested source-code fixit of #define ALPHA_OFFSET 0x3

No, there is a note “replace with 0x2 ^ ALPHA_OFFSET” to silence it. Didnt you suggest 0x2 / 0xA? :)

While we can suggest as you wrote:
#define ALPHA_OFFSET 0x3

I really dont think it is worth to do.

I don't see this as a false positive; I think Chromium should change their macro rather than Clang disable this check for everyone using macros.

I agree. And I think they should just use const int instead of macros after all.

Maybe we should just improve silence hint if rhs is macro?
#define ALPHA_OFFSET 0x3 ?

I agree. And I think they should just use const int instead of macros after all.

Maybe we should just improve silence hint if rhs is macro?
#define ALPHA_OFFSET 0x3 ?

I don't think we should try to suggest a fixit to fix the macro definition because the replacement list can be arbitrarily complex. We might be able to do something for a simple object-like macro whose replacement list is a single integer literal, but I'm not certain it's worth the effort (we don't go to those lengths with fixits elsewhere, that I'm aware of).

xbolva00 updated this revision to Diff 216399.Aug 21 2019, 7:47 AM

Fixed review nits.

I fixed and enabled this checker to warn for macros too.

@aaron.ballman is right. Yes, as he said... If the user has a time to add -Wno-xor.... to build systems, he/she has also time to improve code readability and use hexadecimal literals.

But I am a bit worried that the user does not realize that he/she could fix it using hexadecimal literals... Should this be part of silence note? (any proposed wording?) or add to Clang docs?

xbolva00 marked 2 inline comments as done.Aug 21 2019, 7:54 AM

I fixed and enabled this checker to warn for macros too.

@aaron.ballman is right. Yes, as he said... If the user has a time to add -Wno-xor.... to build systems, he/she has also time to improve code readability and use hexadecimal literals.

But I am a bit worried that the user does not realize that he/she could fix it using hexadecimal literals... Should this be part of silence note? (any proposed wording?) or add to Clang docs?

That's a reasonable concern, but I'm not certain it's worth adding the text to the note for it. This seems more like a documentation issue. We have some automation for generating documentation for warning flags (see ClangDiagnosticsEmitter.cpp), and it looks like there may be a way to have the diagnostic itself provide the documentation so that it appears in https://clang.llvm.org/docs/DiagnosticsReference.html automatically. To me, that would be the best place for this documentation to live, but I have no idea if there's further work needed to support that.

But it seems everything is autogenerated and nothing is “custom message”. Any custom change is removed by next autogeneration.

Possibly @hans could tell us if there is a way

But it seems everything is autogenerated and nothing is “custom message”. Any custom change is removed by next autogeneration.

Possibly @hans could tell us if there is a way

The part I saw that looked interesting to me was: https://github.com/llvm-mirror/clang/blob/master/utils/TableGen/ClangDiagnosticsEmitter.cpp#L1786 but I see now that's only for diagnostic groups. You may need to extend this functionality to get the behavior we want, but that can always be done in a follow-up patch which also adds the documentation to the diagnostic flag.

I think it is too soon to jump and disable all macros. We still catch all motivating cases, it found bugs in Chromium.

I think it's too soon to disable any macros. The workaround for Chromium is trivial and makes the code more clear.

I think I missed the workaround. I only saw the suggestion to write 0x2 instead of 2 which doesn't seem more clear at all to me. Was there another fix suggestion?

I think it is too soon to jump and disable all macros. We still catch all motivating cases, it found bugs in Chromium.

I think it's too soon to disable any macros. The workaround for Chromium is trivial and makes the code more clear.

I think I missed the workaround. I only saw the suggestion to write 0x2 instead of 2 which doesn't seem more clear at all to me. Was there another fix suggestion?

Nope, you didn't miss it -- that was the suggestion. Using a hex notation is the way we let programmers signify that they want to do bit fiddling.

I also strongly -1 to disabling the warning for macros.
It's just a disservice to everyone, and the fact there's precedent doesn't mean it's the right solution.

That being said, does using C++14 binary literals also silence the diag? It should.

I think I missed the workaround. I only saw the suggestion to write 0x2 instead of 2 which doesn't seem more clear at all to me. Was there another fix suggestion?

Nope, you didn't miss it -- that was the suggestion. Using a hex notation is the way we let programmers signify that they want to do bit fiddling.

Sorry, I don't buy that :) If I wasn't aware it's there to suppress a warning, if I saw 0x2 I'd think the author of that was confused about what the 0x prefix does. Maybe a better fixit is to suggest xor instead of ^ which also suppresses the warning and which is imho a bit less weird. (xor is so rare that it's still a bit weird though.)

xbolva00 added a comment.EditedAug 21 2019, 11:51 AM

C++14 binary literals

Thanks, noted.

Suggest “xor”

We could. Aaron what do you think? But I still like “0x” more.

I think I missed the workaround. I only saw the suggestion to write 0x2 instead of 2 which doesn't seem more clear at all to me. Was there another fix suggestion?

Nope, you didn't miss it -- that was the suggestion. Using a hex notation is the way we let programmers signify that they want to do bit fiddling.

Sorry, I don't buy that :) If I wasn't aware it's there to suppress a warning, if I saw 0x2 I'd think the author of that was confused about what the 0x prefix does.

I don't understand how you draw that conclusion, but the reason behind why we went with that as a way to silence the diagnostic is because using the prefix acts as a signal that the developer wants to do bit manipulation more than just a decimal literal does. It's not ideal because it's largely a hidden way to silence diagnostics, but it's also not the only place where we do that sort of thing (like surrounding something in parens to silence a diagnostic, or casting something to void, etc).

Maybe a better fixit is to suggest xor instead of ^ which also suppresses the warning and which is imho a bit less weird. (xor is so rare that it's still a bit weird though.)

I don't think suggesting a fixit for xor is a good solution. For starters, you need a header included in order to use that for C. As a textual suggestion "; use the 'xor' alternative token to perform an exclusive OR" wouldn't bother me too much though.

Could the following text be acceptable?

replace expression with '0xA ^ 1' or ‘10 xor 1’ to silence this warning

(In C mode without xor macro “replace expression with '0xA ^ 1' to silence this warning”)

I am not fan of “the ‘xor’ alternative token”

rsmith added a subscriber: rsmith.Aug 21 2019, 12:19 PM

The ALPHA_OFFSET code seems to be unnecessarily "clever" to me. I think the warning can be suppressed by reversing the operands:

ALPHA_OFFSET ^ 2

But if I were maintaining that code I would get rid of the xor hack entirely and use

#define CHANNEL_OFFSET(i) (i)

or

#define CHANNEL_OFFSET(i) (3-(i))
xbolva00 added a comment.EditedAug 21 2019, 12:50 PM

Swap trick is cool, we can suggest it always(vs. ‘xor’)

Or maybe not :) 2 ^ 2 :)

Swap trick is cool, we can suggest it always(vs. ‘xor’)

Well, you should avoid suggesting that the user rewrite 2 ^ 10 as 10 ^ 2.
Anyway, I think some of the observers here might be confusing two aspects of the issue.

First there's "how can the programmer shut up this diagnostic?", to which there are dozens of answers that all work equally well. 0x2 ^ 3, 2 ^ 0x3, 2 xor 3, 3 ^ 2 (in this case), myxorfunction(2, 3), 02 ^ 03, 0b10 ^ 0b11, Richard's suggested de-cleverfication of 3 - 2, and so on. Basically anything that convinces the compiler you're not trying to take an integer power of 2.

Second there's "what advice and/or fixit should Clang produce to push the programmer toward one specific way of shutting up this diagnostic?". Right now, IIUC, Clang always gives an English note suggesting to change the left-hand side into hex (preserving the behavior of the code), and a technical fixit suggesting to use 1 << RHS or 1eRHS (changing the behavior of the code).

The main trouble we're having with #2 is that when Clang suggests one possible fix, the Chromium programmers seemed to interpret that as meaning that that was the only possible fix — they didn't look deeper to see if there might be other possible ways to improve the code.

Perhaps you could change the English note to say

to silence this warning, replace expression with '0x2 ^ ALPHA_OFFSET' or use the 'xor' keyword

This would at least give more of a jog to the programmer's brain, by saying "you have a choice to make here; don't apply the compiler's suggested fixit blindly."

xbolva00 updated this revision to Diff 216458.Aug 21 2019, 1:24 PM
xbolva00 edited the summary of this revision. (Show Details)

Detect digit separators.

xbolva00 updated this revision to Diff 216478.Aug 21 2019, 2:05 PM

Implemented @Quuxplusone's idea to improve silence note.

I think everybody should be fine with this now :)

I don't understand how you draw that conclusion, but the reason behind why we went with that as a way to silence the diagnostic is because using the prefix acts as a signal that the developer wants to do bit manipulation more than just a decimal literal does. It's not ideal because it's largely a hidden way to silence diagnostics, but it's also not the only place where we do that sort of thing (like surrounding something in parens to silence a diagnostic, or casting something to void, etc).

I think adding parens and casts are fairly well-understood to suppress warnings. Other things, like changing the spelling of literals, aren't. C++ is already a fairly baroque language, and every new thing that makes your code behave subtly different with a different compiler makes it more so. So I think we should be careful to not attach semantic meaning in surprising ways. Warnings have to carry their weight.

Maybe a better fixit is to suggest xor instead of ^ which also suppresses the warning and which is imho a bit less weird. (xor is so rare that it's still a bit weird though.)

I don't think suggesting a fixit for xor is a good solution. For starters, you need a header included in order to use that for C. As a textual suggestion "; use the 'xor' alternative token to perform an exclusive OR" wouldn't bother me too much though.

I don't like xor all that much either :)

I do like Richard's suggestion.

Quuxplusone: Saying "the chromium devs don't want to change their code" is missing the mark. I'm happy to change that file, but that doesn't really help. There are hundreds of people committing hundreds of changes to the code base every day. They have varying levels of C++ proficiency (like everywhere). It's not just about getting our code to compile today, but also about the long-term effects of the warnings we have enabled. We have to be careful about which warnings we enable, and this one is maybe not quite there yet on a usefulness/overhead tradeoff evaluation. I'd argue that Chromium is fairly typical here and that clang's default warning set should follow similar considerations.

I looked through D63423 and didn't find an evaluation of false / true positive rates when lhs and rhs are just literals vs when they're identifiers / macros. Glancing through https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=10+%5E&search=Search , I couldn't find a single true positive where lhs or rhs weren't a bare literal (but I didn't look super carefully). Did I just miss that in D63423? What was the motivation for firing on more than bare literals?

I think adding parens and casts are fairly well-understood to suppress warnings.

It should work here as well. #define ALPHA_OFFSET (3). Did anobody from Chromium try it?

They have varying levels of C++ proficiency

I consider things like hex decimals or parens to silence as a quite basic knowledge.

We have to be careful about which warnings we enable

I dont want off by default warning - useless. If there are still infinite macro concerns, I will revert just base patch.

This should not warn. Please verify that patch was applied correctly and you use newly built clang with this patch. (I use arc patch DXXXXX)

You were right, I messed up on my side. Sorry about the noise !

I don't have much to add to the macro yes/no discussion but I have added some inline comments.

lib/Sema/SemaExpr.cpp
9449

This will miss literals with a unary +, like 10 ^ +8. I am not finding any code sample on codesearch.isocpp.org with this pattern, but you could imagine someone writing code like :

#define EXPONENT 10

res1 = 10 ^ +EXPONENT;
res2 = 10 ^ -EXPONENT;

WDYT ?

9469

You could bailout early if the values are such that no diagnostics is ever going to be issued, before doing any source range/string manipulations (ie: do all the tests on the values first).

9506

Maybe put the slightly more expensive find last in the condition.

9670

I think that this will miss code like (2 ^ 64) - 1u since IgnoreParens() will not skip any implicit casts added by UsualArithmeticConversions. (Also IgnoreParens() skips a bunch of other things, but it does not seem relevant here).

Also, in CheckBitwiseOperands diagnoseXorMisusedAsPow is called before UsualArithmeticConversions and not after. I am wondering if it is possible for diagnoseXorMisusedAsPow to be called with invalid arguments in CheckBitwiseOperands ?

I think adding parens and casts are fairly well-understood to suppress warnings.

It should work here as well. #define ALPHA_OFFSET (3). Did anobody from Chromium try it?

They have varying levels of C++ proficiency

I consider things like hex decimals or parens to silence as a quite basic knowledge.

parens for sure, but I'm not aware of any other warning that behaves differently for ordinary and hex literals. What else is there?

other warning that behaves differently for ordinary and hex literals

Such literals should indicate that the user knows what he/she is doing (bit tricks, ..). I agree that there should be a additional docs (as we mentioned it and kindly pinged @hans) and not force the user to looking at Clang's code to see what are preconditions to warn.

I dont know really how to make both sides happy in terms of "macro" issue :(

The ALPHA_OFFSET code seems to be unnecessarily "clever" to me. I think the warning can be suppressed by reversing the operands:

ALPHA_OFFSET ^ 2

But if I were maintaining that code I would get rid of the xor hack entirely and use

#define CHANNEL_OFFSET(i) (i)

or

#define CHANNEL_OFFSET(i) (3-(i))

+1

I think adding parens and casts are fairly well-understood to suppress warnings.

It should work here as well. #define ALPHA_OFFSET (3). Did anobody from Chromium try it?

They have varying levels of C++ proficiency

I consider things like hex decimals or parens to silence as a quite basic knowledge.

parens for sure, but I'm not aware of any other warning that behaves differently for ordinary and hex literals. What else is there?

I thought we did in Clang proper, but now I am second-guessing because I cannot find any with a quick search. Regardless, I think the "how do we silence this diagnostic" is getting a bit out of hand. We seem to now silence it when there's a hex, oct, or binary literal, a digit separator, or the xor token. I think we do not want to list all of those options in a diagnostic message.

I looked through D63423 and didn't find an evaluation of false / true positive rates when lhs and rhs are just literals vs when they're identifiers / macros. Glancing through https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=10+%5E&search=Search , I couldn't find a single true positive where lhs or rhs weren't a bare literal (but I didn't look super carefully). Did I just miss that in D63423? What was the motivation for firing on more than bare literals?

This is an interesting question to me that may help inform how we want to proceed.

xbolva00 added a comment.EditedAug 22 2019, 1:49 PM

What was the motivation for firing on more than bare literals?

The first patch (or so) supported diagnosing macros. There is no exact motivation why, simply “I just did it” to present working prototype. Based on review, things were adjusted.

Some reviewers wanted to keep full macro support, some didnt like diagnose ^ in macros. That case was removed.

I think nobody complained about macro as LHS/RHS.

xbolva00 updated this revision to Diff 216720.Aug 22 2019, 2:31 PM

Addressed Bruno Ricci's tips.
Removed macro support.

Removed macro support (again...).

I agree with Nico, this warning is on by default so should be as exact as possible without many false positives. While we all know that the Chromium's false positive case could be rewritten for better code readability, on the other side Clang's default warnings shouldn't be annoying.

Nico, PTAL and possibly approve.

xbolva00 updated this revision to Diff 216723.Aug 22 2019, 2:50 PM
xbolva00 updated this revision to Diff 216724.Aug 22 2019, 3:01 PM

@thakis wrote:

What was the motivation for firing on more than bare literals?

Well, fundamentally, there is no difference in code quality between any of these snippets:

#define BIG 300
double bigpower1() { return 10 ^ BIG; }

static constexpr int BIG = 300; double bigpower2() { return 10 ^ BIG; }

double bigpower3(int BIG) { return 10 ^ BIG; }

double bigpower4(int BIG) { return 10 ^ (BIG+1); }

minval -= 10 ^ -precision;  // https://codesearch.isocpp.org/actcd19/main/q/qgis/qgis_2.18.28+dfsg-2/src/gui/editorwidgets/qgsrangewidgetwrapper.cpp
intermediate = (str[offset] - '0') / (10 ^ lpc);  // https://codesearch.isocpp.org/actcd19/main/p/pacemaker/pacemaker_1.1.18-2/lib/common/iso8601.c
real_loop += (((unsigned int) *argv[4]+k) - 48) * 10^(strlen(argv[4]) - (k+1));  // https://codesearch.isocpp.org/actcd19/main/liba/libaria/libaria_2.8.0+repack-1.2/tests/testCOM.cpp

If you seriously mean "xor", then you shouldn't be writing code that looks like you meant "exponentiate." I don't think it matters whether the LHS or RHS come from literals, macros, constexpr variables, const variables, function calls, additions, subtractions, or what-have-you. It seems reasonable, when you write 10 ^ x or 2 ^ x in C++, for the compiler to tell you about it so you can go fix it. (Admittedly it looks like a slippery slope to int square(int n) { return n ^ 2; } — but we can be data-driven here. There are no true-positive instances of ^ 2 in our corpus, whereas there are many true positives of 10 ^ and 2 ^.)

To the target audience of this warning, the news about ^'s semantics would be coming as a surprise. Not every user of C++ is a power user; and only power users would be like "I know 10 ^ alpha means alpha xor 0xA and in fact that's exactly what I meant — please don't warn me about my code!"

However, if this is the version that gets committed, it's far better than nothing, and it can always be further tightened up later. (For example, this version doesn't yet warn about bigpower3 et seq., right? It basically only hits the literal cases.)

@thakis wrote:

What was the motivation for firing on more than bare literals?

Well, fundamentally, there is no difference in code quality between any of these snippets:

#define BIG 300
double bigpower1() { return 10 ^ BIG; }

static constexpr int BIG = 300; double bigpower2() { return 10 ^ BIG; }

double bigpower3(int BIG) { return 10 ^ BIG; }

double bigpower4(int BIG) { return 10 ^ (BIG+1); }

minval -= 10 ^ -precision;  // https://codesearch.isocpp.org/actcd19/main/q/qgis/qgis_2.18.28+dfsg-2/src/gui/editorwidgets/qgsrangewidgetwrapper.cpp
intermediate = (str[offset] - '0') / (10 ^ lpc);  // https://codesearch.isocpp.org/actcd19/main/p/pacemaker/pacemaker_1.1.18-2/lib/common/iso8601.c
real_loop += (((unsigned int) *argv[4]+k) - 48) * 10^(strlen(argv[4]) - (k+1));  // https://codesearch.isocpp.org/actcd19/main/liba/libaria/libaria_2.8.0+repack-1.2/tests/testCOM.cpp

Thanks for the real-world examples, these are convincing.

If you seriously mean "xor", then you shouldn't be writing code that looks like you meant "exponentiate." I don't think it matters whether the LHS or RHS come from literals, macros, constexpr variables, const variables, function calls, additions, subtractions, or what-have-you. It seems reasonable, when you write 10 ^ x or 2 ^ x in C++, for the compiler to tell you about it so you can go fix it. (Admittedly it looks like a slippery slope to int square(int n) { return n ^ 2; } — but we can be data-driven here. There are no true-positive instances of ^ 2 in our corpus, whereas there are many true positives of 10 ^ and 2 ^.)

Right, I agree with the "data-driven" bit. Hence my point that if in practice all true positives are from literals, then we should only warn on literals. The examples you posted show that that's not the case.

I now agree that it makes sense to warn when the operands are macros or variables.

xbolva00 added a subscriber: regehr.EditedAug 23 2019, 7:06 AM

I now agree that it makes sense to warn when the operands are macros or variables.

I could re-add macro support and then @jfb or @regehr would blame this diagnostic because of macro support =] variables could open a new box of false positives.

Anyway, your motivating case:

minval -= 10 ^ -precision; // https://codesearch.isocpp.org/actcd19/main/q/qgis/qgis_2.18.28+dfsg-2/src/gui/editorwidgets/qgsrangewidgetwrapper.cpp

This should be diagnosted. Location of "-" is not a macro.
Edit: nope, precision is not a macro, just a variable.

real_loop += (((unsigned int) *argv[4]+k) - 48) * 10^(strlen(argv[4]) - (k+1));

Too complex, no chance to diagnose it here :) Not related to macros.

intermediate = (str[offset] - '0') / (10 ^ lpc); // https://codesearch.isocpp.org/actcd19/main/p/pacemaker/pacemaker_1.1.18-2/lib/common/iso8601.c

lpc is not a macro, it is just loop int variable. Not related to macros.

I am strongly in favour to just land this as is. :)

Adding @rsmith to see if we can make forward progress on this patch again.

Adding @rsmith to see if we can make forward progress on this patch again.

On the other side, I don't want to waste Richard's time since I dont want to extend (variables and macros are controversal topic anyway) it more now. I promised to @jfb to handle (2 ^ 64) - 1 as follow up patch and the promised patch is here..

Adding @rsmith to see if we can make forward progress on this patch again.

On the other side, I don't want to waste Richard's time since I dont want to extend (variables and macros are controversal topic anyway) it more now. I promised to @jfb to handle (2 ^ 64) - 1 as follow up patch and the promised patch is here..

It looks like your patch changed the behavior involving macros though, so this isn't just about handling (2 ^ 64) - 1. It's hard to tell due to the way the patch is formatted though. tbh, I would appreciate if you would leave the definition of diagnoseXorMisusedAsPow() where it is and add a forward declare of the function earlier in the file. It would make spotting the differences in the function much easier.

xbolva00 updated this revision to Diff 218115.Aug 30 2019, 9:11 AM

Fixed review comment.

tbh, I would appreciate if you would leave the definition of diagnoseXorMisusedAsPow() where it is and add a forward declare of the function earlier in the file.

I uploaded the patch almost 2 weeks ago and nobody complained so I thought it is okay. Uploaded, fixed.

aaron.ballman added inline comments.Aug 30 2019, 12:14 PM
lib/Sema/SemaExpr.cpp
54–55

I'd appreciate some comments on what these arguments should be when non-null.

11145–11146

I would appreciate it if this patch didn't also change the behavior for macros. That seems like a larger discussion that can happen in a separate patch.

11145–11146

Why is this type changing?

11145–11146

Same question here.

11146

!Negative (we already verified above that it's either the + or - operator, so if it's not one, it's the other.)?

11151

The two branches are not equivalent. What about ~0ULL instead?

11154

This comment explains what the code does, but not why it does it. Given that we're adding special cases, I think more comments here explaining why this is valuable would be appreciated.

tbh, I would appreciate if you would leave the definition of diagnoseXorMisusedAsPow() where it is and add a forward declare of the function earlier in the file.

I uploaded the patch almost 2 weeks ago and nobody complained so I thought it is okay. Uploaded, fixed.

I appreciate it -- this made it much easier to spot the actual changes in the code.

xbolva00 updated this revision to Diff 218258.Sep 1 2019, 5:36 AM

Fixed review comments

xbolva00 marked 11 inline comments as done.Sep 1 2019, 5:38 AM
xbolva00 marked an inline comment as done.
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11145–11146

+1, agree.

aaron.ballman added inline comments.Sep 4 2019, 2:10 PM
lib/Sema/SemaExpr.cpp
11154

Thank you, the comments helped! But they also raised another question for me. What's special about 2^64? Why not (2^16) - 1 or other common power-of-two values? I would have expected 8, 16, 32, and 64 to be handled similarly.

xbolva00 marked an inline comment as done.Sep 4 2019, 2:25 PM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11154

We generally suggest 1LL << C. But here we cant say 1LL << 64.

(2^16)-1 is diagnosed normally since we go here from “visit xor” code.
^^^^

This (2^64)-1 handling was suggested by jfb.

I see no motivation cases to diagnose 2^65, 2^100, ...

xbolva00 marked an inline comment as done.Sep 4 2019, 2:30 PM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11154

I think the suggestion for (2^32)-1:
(1LL<32)-1 is good, or should we make things more complicated and suggest Int max macro?

aaron.ballman added inline comments.Sep 4 2019, 2:50 PM
lib/Sema/SemaExpr.cpp
11154

We generally suggest 1LL << C. But here we cant say 1LL << 64.

Ah, good point.

I see no motivation cases to diagnose 2^65, 2^100, ...

Me neither, I was more wondering about the common powers of two.

I think the suggestion for (2^32)-1:
(1LL<32)-1 is good, or should we make things more complicated and suggest Int max macro?

I feel like this is somewhat clang-tidy territory more than the compiler properly, including the 2^64 - 1 case, because it is likely to be very uncommon due to how specific it is. However, given that this only triggers on xor and only with integer literals, it shouldn't cause too much of a compilation slow-down in general to do it here.

I tend to err on the side of consistency, so my feeling is that if we want the 64 case to suggest ULLONG, we'd want the other cases to behave similarly. Alternatively, rather than handling this specific issue in the compiler, handle it in a bugprone clang-tidy check where we can also give the user more control over how they want to correct their mistake (e.g., std::numeric_limits<long>::max() vs LONG_MAX vs ~0L).

xbolva00 marked an inline comment as done.Sep 4 2019, 2:57 PM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11154
xbolva00 marked an inline comment as done.Sep 4 2019, 3:00 PM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11154

(There were no concerns about slowdown in the “base” patch).

Maybe I should just revert it..

jfb added inline comments.Sep 4 2019, 3:02 PM
lib/Sema/SemaExpr.cpp
11154

I don't really care. My original point was that any suggested fix should be correct, and (1LL << 64) - 1 wasn't.

xbolva00 abandoned this revision.Sep 4 2019, 3:08 PM
aaron.ballman added inline comments.Sep 4 2019, 3:11 PM
lib/Sema/SemaExpr.cpp
11154

I don't really care. My original point was that any suggested fix should be correct, and (1LL << 64) - 1 wasn't.

Agreed with this. We can just leave the fix off in the circumstances it's not correct while still warning that the original code is suspicious.

Ok, I will fix it as you suggested.