Page MenuHomePhabricator

[Diagnostics] Diagnose misused xor as pow
Needs ReviewPublic

Authored by xbolva00 on Jun 17 2019, 5:39 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Fixed notes by @jfb. Thank you.

xbolva00 marked an inline comment as done.Jun 17 2019, 10:48 AM
jfb added a comment.Jun 17 2019, 10:57 AM

Why have .c and .cpp tests?

test/Sema/warn-xor-as-pow.c
43 ↗(On Diff #205121)

I think we want to suggest 1LL << 32 or something like that. Whatever we usually do with this type of suggestion.

45 ↗(On Diff #205121)

This one hits a ceiling, we can't really suggest anything for this value IMO. Maybe we need to see if the user is doing 2^64 - 1? In that case we can offer a suggestion.

xbolva00 updated this revision to Diff 205126.Jun 17 2019, 10:58 AM
xbolva00 marked an inline comment as done.Jun 17 2019, 11:04 AM
xbolva00 added inline comments.
test/Sema/warn-xor-as-pow.c
45 ↗(On Diff #205121)

Since this requires changes in other parts of code to catch 2^64 - 1, maybe we can leave it as a follow-up patch?

jfb added inline comments.Jun 17 2019, 11:06 AM
test/Sema/warn-xor-as-pow.c
45 ↗(On Diff #205121)

Sure.

Seems low-value at first glance, but I guess I can't argue with those Twitter and codesearch results.

Do you have codesearch results for ^2? https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=%5E+2&search=Search Seems like a lot of false positives (and a lot of code comments turning up in codesearch??), but would be perhaps even more valuable to the kinds of users who would write 10^x or x^2.

What is going on in this case, and would a warning here be a false positive or a true positive? https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=10+%5E+7&search=Search

lib/Sema/SemaExpr.cpp
10913

Why do you special-case 0b prefix (or suffix — 0x0b is also special-cased by this snippet), but you don't special-case octal or hexadecimal constants? I'm sure x ^ 0x1234 will be a more common spelling than x ^ 0b1001000110100.

10924

Do you have metrics indicating that this line is an improvement over if (true) {?

10959

I don't understand why parenthesizing one argument should silence the warning. Wouldn't it be more natural to suggest converting both arguments to hexadecimal or binary? That is, convert 10 ^ x to 0xA ^ x, or 2 ^ 16 to 0x2 ^ 0x10.

test/Sema/warn-xor-as-pow.c
38 ↗(On Diff #205121)

Please add test cases for 2 ^ 0x4 and 2 ^ 04 and 0x2 ^ 10 and 02 ^ 10.

39 ↗(On Diff #205121)

I don't understand why this line doesn't warn. Is it because the macro's name has the case-insensitive string xor in it? Is it because there is a macro involved at all? In either case, please add another test case for res = FOOBAR(2, 16).

Also res = EPSILON where #define EPSILON 10^-300. That seems to come up in the codesearch results a lot.

xbolva00 marked 2 inline comments as done.Jun 17 2019, 11:37 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
10924

This is left over of older code :) I will fix it.

test/Sema/warn-xor-as-pow.c
39 ↗(On Diff #205121)

EPSILON 10^-300

this is in macro :(

so maybe if negative RHS, we should check macro too? @jfb

xbolva00 marked 3 inline comments as done.Jun 17 2019, 11:42 AM
xbolva00 updated this revision to Diff 205140.Jun 17 2019, 11:48 AM

Removed useless else block with return.

xbolva00 marked an inline comment as done.Jun 17 2019, 11:58 AM
xbolva00 added inline comments.
test/SemaCXX/warn-xor-as-pow.cpp
86

What should we suggest here ? :) Nevative shift ? think we should diagnose it.

I am not sure about 10 ^ 0 case... warn or not ?

@jfb

Just wanted to say that I think I agree with the design choices here!

xbolva00 updated this revision to Diff 205187.Jun 17 2019, 2:19 PM

Handle 10 ^ -C case..

Quuxplusone added inline comments.Jun 17 2019, 3:03 PM
lib/Sema/SemaExpr.cpp
10931

Can you use starts_with (or the LLVM equivalent) in both of these cases? It'll be faster and also more correct.

Hex and binary are handled up here on line 10927, but octal is handled down on line 10955; why? Can't they be combined into one place in the code?

10982

Suppressing the warning specifically on 10 ^ 0 (which means 10) seems like an anti-feature.

10988

I suggest at least one unit test case that involves 10 ^ 100, and considering the user-friendliness of the resulting error message. How about suggesting "1e" + std::to_string(RightSideIntValue) as the fixit in both cases?

test/SemaCXX/warn-xor-as-pow.cpp
53

I still expect to see a warning either on this line, or on the line where the macro is defined. I don't see why we'd want to be silent in this case.

Thanks, I will address your notes soon!

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

xbolva00 marked an inline comment as done.Jun 17 2019, 3:20 PM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
10931

We cannot use starts_with here, case: 2 ^ 0b11.

Yes, I can combine it to one place.

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

IMO it's better to not warn about xors in macros-- I like the current tradeoffs.

xbolva00 updated this revision to Diff 205198.Jun 17 2019, 3:35 PM

Addressed review notes.

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

IMO it's better to not warn about xors in macros-- I like the current tradeoffs.

Thank you for opinion.

I think this patch is (almost) done. :)

xbolva00 updated this revision to Diff 205202.Jun 17 2019, 3:39 PM

Removed unused diagnostic message.

xbolva00 marked an inline comment as done.Jun 17 2019, 3:44 PM
xbolva00 added inline comments.
test/SemaCXX/warn-xor-as-pow.cpp
14

Small question..

Instead of 1<<0 for this case, I emit fixit "1", seems ok for you?

aaron.ballman added inline comments.Jun 19 2019, 9:05 AM
include/clang/Basic/DiagnosticSemaKinds.td
3306

We usually use "did you mean" in these kinds of diagnostics. Replace the comma with a semicolon.

Same comments below.

lib/Sema/SemaExpr.cpp
10895

Missing a full stop at the end of the comment. Same comment applies elsewhere.

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

IMO it's better to not warn about xors in macros-- I like the current tradeoffs.

Do you mind explaining why? I would have expected to diagnose an xor within a macro because the code is just as wrong there, but maybe I'm missing something.

We will be noisy in this case

#define IRQ_CHINT4_LEVEL        (11 ^ 7)
#define IRQ_CHINT3_LEVEL        (10 ^ 7)
#define IRQ_CHINT2_LEVEL        (9 ^ 7)
#define IRQ_CHINT1_LEVEL        (8 ^ 7)

We will be noisy in this case

#define IRQ_CHINT4_LEVEL        (11 ^ 7)
#define IRQ_CHINT3_LEVEL        (10 ^ 7)
#define IRQ_CHINT2_LEVEL        (9 ^ 7)
#define IRQ_CHINT1_LEVEL        (8 ^ 7)

Sure, but do we not want to be noisy in a case like: #define BAD_IDEA (2 ^ 16)?

... But I go thru codesearch and this is just one case I think.

I think we should warn in macros... In codesearch case ,they are many more true positives, then false negatives.

#define MAX_T_U32 ((2^32)-1)

Ouch.

... But I go thru codesearch and this is just one case I think.

I think we should warn in macros... In codesearch case ,they are many more true positives, then false negatives.

#define MAX_T_U32 ((2^32)-1)

Ouch.

That's my rationale as well. If it happens outside of macros and is bad, it's probably just as likely to happen within a macro and be equally as bad. I think we probably should warn within macros.

xbolva00 updated this revision to Diff 205636.Jun 19 2019, 10:27 AM

Added support to warn in macros.
Renamed to -Wxor-used-as-pow.

jfb added a comment.Jun 19 2019, 10:29 AM

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

So you would not want to see this code diagnosed?

#define POW(x, y) (x ^ y)

void f() {
  int two_to_the_derp = POW(2, 16);
}

I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?

xbolva00 added a comment.EditedJun 19 2019, 10:41 AM
In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

Is it worth to pattern match this case? I don't like how this diagnostic is getting more and more complicated.

User can use -Wno-xor-used-as-pow if auto-generated weird code.

Or I can move it under -Wextra..

jfb added a comment.Jun 19 2019, 10:43 AM
In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

So you would not want to see this code diagnosed?

#define POW(x, y) (x ^ y)

void f() {
  int two_to_the_derp = POW(2, 16);
}

I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?

If you wrote CONSTANT_2_OR_10 ^ CONSTANT I'm pretty sure you messed up. Macros are opaque, and I'm not sure you clearly messed up anymore. Let's take the clear win, see what falls out, and only expand our reach if we have reason to do so. Without trying on huge codebases I don't know whether we'll get a lot of false positive. I'd rather be conservative and not cause headache.

I am also in the "be a bit conservative at first and see how things shake out" camp, but it's y'all doing the hard work here, not me :)

xbolva00 updated this revision to Diff 205641.Jun 19 2019, 10:50 AM

Reverted macro support.

Okey, it should be fine now.

If you are agree with the current state, please approve it.

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

jfb added a comment.Jun 19 2019, 10:54 AM

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

This patch is not the place to change common clang behavior. I'm advocating for what we usually do: be conservative in macros because token-pasting tends to look "wrong" but not actually be wrong as much (i.e. false positives). If you want clang to do something different, an RFC to cfe-dev is the right place for such a discussion.

xbolva00 added a comment.EditedJun 19 2019, 10:58 AM

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

I cannot promise anything, but maybe I could do -Wxor-used-as-pow-in-macro (off by default) so people could try it and see how noisy it would be :)

Edit: Oh, probably no, I would have to duplicate items in DiagnosticSemaKinds :(

In D63423#1550728, @jfb wrote:

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

This patch is not the place to change common clang behavior. I'm advocating for what we usually do: be conservative in macros because token-pasting tends to look "wrong" but not actually be wrong as much (i.e. false positives). If you want clang to do something different, an RFC to cfe-dev is the right place for such a discussion.

I don't think i'm proposing to change the entirety of warnings to warn on macros, am i?
Likewise, is it a causality/practice to ignore macros, or a documented requirement [that would require RFC for changing]?
It's been awhile since i've read through the entirety of docs, but i don't recall seeing one.

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

I cannot promise nothing, but maybe I could do -Wxor-used-as-pow-in-macro (off by default) so people could try it and see how noisy it would be :)

I'm not sure -Wflag + -Wflag-in-macros will scale :/
i was thinking more along the lines of warn-for-system-headers flag.

In D63423#1550713, @jfb wrote:
In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

So you would not want to see this code diagnosed?

#define POW(x, y) (x ^ y)

void f() {
  int two_to_the_derp = POW(2, 16);
}

I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?

If you wrote CONSTANT_2_OR_10 ^ CONSTANT I'm pretty sure you messed up. Macros are opaque, and I'm not sure you clearly messed up anymore. Let's take the clear win, see what falls out, and only expand our reach if we have reason to do so. Without trying on huge codebases I don't know whether we'll get a lot of false positive. I'd rather be conservative and not cause headache.

IIRC, we do not usually go out of our way to stop diagnosing code within macro expansions unless there's a reason to do so. Given that we know the code is likely wrong (because we're only checking constants whether inside or outside of a macro expansion) and that we've seen the incorrect code happen inside of macros in the wild, I suspect that being conservative is not much of a win. That said, I definitely understand the desire to not issue burdensome amounts of false positives. If we see false positives coming from macros, we should address them (perhaps even by not diagnosing within a macro). Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)

xbolva00 updated this revision to Diff 205646.Jun 19 2019, 11:17 AM

Added dots in the comments.

xbolva00 added a comment.EditedJun 19 2019, 11:21 AM

Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)

Sorry, I have no time nor spare computers to check big codebases. As @jfb said, let's start with a conservative approach and possibly improve it in the future.

Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)

Sorry, I have no time nor spare computers to check big codebases. As @jfb said, let's start with a conservative approach and possibly improve it in the future.

I can live with that, but my concern is that we often don't do that "improve it in the future" bit and we already know about buggy cases in the wild where the diagnostic will not trigger. Given how often people use macros for simple, but wrong, constructs like my POW example, I think this reduces the utility of the check.

What are the GCC folks planning to do with macros?

David Malcon - GCC: “I think we'd want to *not* warn if either of the operands are from a macro expansion.“

But I see no reason to follow this and why we should restrict this even more..

xbolva00 updated this revision to Diff 205692.Jun 19 2019, 3:42 PM

Improved code, added const.

Any comments? If no, please approve..

jfb added inline comments.Jun 21 2019, 9:24 AM
include/clang/Basic/DiagnosticGroups.td
508

Does this match the naming that GCC ended up with?

include/clang/Basic/DiagnosticSemaKinds.td
3309

This is still a bad diagnostic, I'd rather see it fixed.

lib/Sema/SemaExpr.cpp
10929

Doesn't this match any expression that contains xor? Put another way, I don't see "xor" used anywhere else under clang, what's usually done?

10951

This ignores 0B.

10954

This ignores 0X.

10961

The above seems to mishandle user-defined literals. Unless the UDL contains 0x or 0b in them. I'd like to see this tested.

Quuxplusone added inline comments.Jun 21 2019, 9:29 AM
lib/Sema/SemaExpr.cpp
10950

I would rather see this whole section as a three-liner:

// Do not diagnose binary, octal, or hexadecimal literals.
if (StringRef(LHSStr).startswith("0") || StringRef(RHSStr).startswith("0"))
    return;
10963

Here and on line 10971, I suggest whitespace around the << (in the suggested expression).

10983

Do you currently warn on, let's say, 2uLL ^ 7? and if so, do we care that the suggested expression 0x2 ^ 7 has a different type? I imagine the answer to "do we care" is "no," but I thought I should ask.

xbolva00 marked 3 inline comments as done.Jun 21 2019, 9:31 AM
xbolva00 added inline comments.
include/clang/Basic/DiagnosticGroups.td
508

No, they have no progress yet. GCC should not block our progress here..

include/clang/Basic/DiagnosticSemaKinds.td
3309

Why so? Can you propose wording here?

We provide a fixit for it, e.g. 1LL<<32..

lib/Sema/SemaExpr.cpp
10929

Yes, but since xor is keyword in C++, I think this is safe.

xbolva00 marked an inline comment as done.Jun 21 2019, 9:32 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
10983

We check if bidwith is same.. Bail out if not.

Quuxplusone added inline comments.Jun 21 2019, 9:43 AM
lib/Sema/SemaExpr.cpp
10929

I believe JF is worried about expressions like constexpr int flexor_exponent = 3; return 10 ^ flexor_exponent;. That expression contains the substring "xor" but does not use the xor keyword. Which reminds me, can you add some test cases showing what behavior you expect for operands that are not integer literals but still compile-time const, or constexpr, or template arguments?

xbolva00 marked an inline comment as done.Jun 21 2019, 9:45 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
10929

Ah, I will change it to “ xor “.

jfb added inline comments.Jun 21 2019, 9:47 AM
lib/Sema/SemaExpr.cpp
10929

Spaces aren't the only valid whitespace character :)

2 xor
   31
xbolva00 marked an inline comment as done.Jun 21 2019, 10:09 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
10929

I will rework it to chech only XOR op, not the while expr.

xbolva00 marked an inline comment as done.Jun 21 2019, 10:27 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
10929

Well, 10 ^ flexor ..

We cannot reach xor check, flexor is not a integer literal. So I will rework nothing here.

aaron.ballman added inline comments.Jun 21 2019, 10:30 AM
lib/Sema/SemaExpr.cpp
10929
#define flexor 7

Now flexor is an integer literal. (A test case for this would be beneficial.)

jfb added inline comments.Jun 21 2019, 10:32 AM
lib/Sema/SemaExpr.cpp
10929

Also, as I've said above:

constexpr long long operator"" _xor(unsigned long long v)
{ return v; }
auto i = 10 ^ 5_xor;