Page MenuHomePhabricator

[Diagnostics] Diagnose misused xor as pow
ClosedPublic

Authored by xbolva00 on Jun 17 2019, 5:39 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks, I will address your notes soon!

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

xbolva00 marked an inline comment as done.Jun 17 2019, 3:20 PM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11046

We cannot use starts_with here, case: 2 ^ 0b11.

Yes, I can combine it to one place.

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

IMO it's better to not warn about xors in macros-- I like the current tradeoffs.

xbolva00 updated this revision to Diff 205198.Jun 17 2019, 3:35 PM

Addressed review notes.

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

IMO it's better to not warn about xors in macros-- I like the current tradeoffs.

Thank you for opinion.

I think this patch is (almost) done. :)

xbolva00 updated this revision to Diff 205202.Jun 17 2019, 3:39 PM

Removed unused diagnostic message.

xbolva00 marked an inline comment as done.Jun 17 2019, 3:44 PM
xbolva00 added inline comments.
test/SemaCXX/warn-xor-as-pow.cpp
14

Small question..

Instead of 1<<0 for this case, I emit fixit "1", seems ok for you?

aaron.ballman added inline comments.Jun 19 2019, 9:05 AM
include/clang/Basic/DiagnosticSemaKinds.td
3330

We usually use "did you mean" in these kinds of diagnostics. Replace the comma with a semicolon.

Same comments below.

lib/Sema/SemaExpr.cpp
11010

Missing a full stop at the end of the comment. Same comment applies elsewhere.

The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?

IMO it's better to not warn about xors in macros-- I like the current tradeoffs.

Do you mind explaining why? I would have expected to diagnose an xor within a macro because the code is just as wrong there, but maybe I'm missing something.

We will be noisy in this case

#define IRQ_CHINT4_LEVEL        (11 ^ 7)
#define IRQ_CHINT3_LEVEL        (10 ^ 7)
#define IRQ_CHINT2_LEVEL        (9 ^ 7)
#define IRQ_CHINT1_LEVEL        (8 ^ 7)

We will be noisy in this case

#define IRQ_CHINT4_LEVEL        (11 ^ 7)
#define IRQ_CHINT3_LEVEL        (10 ^ 7)
#define IRQ_CHINT2_LEVEL        (9 ^ 7)
#define IRQ_CHINT1_LEVEL        (8 ^ 7)

Sure, but do we not want to be noisy in a case like: #define BAD_IDEA (2 ^ 16)?

... But I go thru codesearch and this is just one case I think.

I think we should warn in macros... In codesearch case ,they are many more true positives, then false negatives.

#define MAX_T_U32 ((2^32)-1)

Ouch.

... But I go thru codesearch and this is just one case I think.

I think we should warn in macros... In codesearch case ,they are many more true positives, then false negatives.

#define MAX_T_U32 ((2^32)-1)

Ouch.

That's my rationale as well. If it happens outside of macros and is bad, it's probably just as likely to happen within a macro and be equally as bad. I think we probably should warn within macros.

xbolva00 updated this revision to Diff 205636.Jun 19 2019, 10:27 AM

Added support to warn in macros.
Renamed to -Wxor-used-as-pow.

jfb added a comment.Jun 19 2019, 10:29 AM

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

So you would not want to see this code diagnosed?

#define POW(x, y) (x ^ y)

void f() {
  int two_to_the_derp = POW(2, 16);
}

I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?

xbolva00 added a comment.EditedJun 19 2019, 10:41 AM
In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

Is it worth to pattern match this case? I don't like how this diagnostic is getting more and more complicated.

User can use -Wno-xor-used-as-pow if auto-generated weird code.

Or I can move it under -Wextra..

jfb added a comment.Jun 19 2019, 10:43 AM
In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

So you would not want to see this code diagnosed?

#define POW(x, y) (x ^ y)

void f() {
  int two_to_the_derp = POW(2, 16);
}

I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?

If you wrote CONSTANT_2_OR_10 ^ CONSTANT I'm pretty sure you messed up. Macros are opaque, and I'm not sure you clearly messed up anymore. Let's take the clear win, see what falls out, and only expand our reach if we have reason to do so. Without trying on huge codebases I don't know whether we'll get a lot of false positive. I'd rather be conservative and not cause headache.

I am also in the "be a bit conservative at first and see how things shake out" camp, but it's y'all doing the hard work here, not me :)

xbolva00 updated this revision to Diff 205641.Jun 19 2019, 10:50 AM

Reverted macro support.

Okey, it should be fine now.

If you are agree with the current state, please approve it.

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

jfb added a comment.Jun 19 2019, 10:54 AM

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

This patch is not the place to change common clang behavior. I'm advocating for what we usually do: be conservative in macros because token-pasting tends to look "wrong" but not actually be wrong as much (i.e. false positives). If you want clang to do something different, an RFC to cfe-dev is the right place for such a discussion.

xbolva00 added a comment.EditedJun 19 2019, 10:58 AM

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

I cannot promise anything, but maybe I could do -Wxor-used-as-pow-in-macro (off by default) so people could try it and see how noisy it would be :)

Edit: Oh, probably no, I would have to duplicate items in DiagnosticSemaKinds :(

In D63423#1550728, @jfb wrote:

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

This patch is not the place to change common clang behavior. I'm advocating for what we usually do: be conservative in macros because token-pasting tends to look "wrong" but not actually be wrong as much (i.e. false positives). If you want clang to do something different, an RFC to cfe-dev is the right place for such a discussion.

I don't think i'm proposing to change the entirety of warnings to warn on macros, am i?
Likewise, is it a causality/practice to ignore macros, or a documented requirement [that would require RFC for changing]?
It's been awhile since i've read through the entirety of docs, but i don't recall seeing one.

I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.

I cannot promise nothing, but maybe I could do -Wxor-used-as-pow-in-macro (off by default) so people could try it and see how noisy it would be :)

I'm not sure -Wflag + -Wflag-in-macros will scale :/
i was thinking more along the lines of warn-for-system-headers flag.

In D63423#1550713, @jfb wrote:
In D63423#1550697, @jfb wrote:

What I meant with macros was that I don't think we should warn on:

#define LEGIT(a, b) ({ work work work; a ^ b; work work work; })

LEGIT(10, 5);

If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.

So you would not want to see this code diagnosed?

#define POW(x, y) (x ^ y)

void f() {
  int two_to_the_derp = POW(2, 16);
}

I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?

If you wrote CONSTANT_2_OR_10 ^ CONSTANT I'm pretty sure you messed up. Macros are opaque, and I'm not sure you clearly messed up anymore. Let's take the clear win, see what falls out, and only expand our reach if we have reason to do so. Without trying on huge codebases I don't know whether we'll get a lot of false positive. I'd rather be conservative and not cause headache.

IIRC, we do not usually go out of our way to stop diagnosing code within macro expansions unless there's a reason to do so. Given that we know the code is likely wrong (because we're only checking constants whether inside or outside of a macro expansion) and that we've seen the incorrect code happen inside of macros in the wild, I suspect that being conservative is not much of a win. That said, I definitely understand the desire to not issue burdensome amounts of false positives. If we see false positives coming from macros, we should address them (perhaps even by not diagnosing within a macro). Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)

xbolva00 updated this revision to Diff 205646.Jun 19 2019, 11:17 AM

Added dots in the comments.

xbolva00 added a comment.EditedJun 19 2019, 11:21 AM

Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)

Sorry, I have no time nor spare computers to check big codebases. As @jfb said, let's start with a conservative approach and possibly improve it in the future.

Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)

Sorry, I have no time nor spare computers to check big codebases. As @jfb said, let's start with a conservative approach and possibly improve it in the future.

I can live with that, but my concern is that we often don't do that "improve it in the future" bit and we already know about buggy cases in the wild where the diagnostic will not trigger. Given how often people use macros for simple, but wrong, constructs like my POW example, I think this reduces the utility of the check.

What are the GCC folks planning to do with macros?

David Malcon - GCC: “I think we'd want to *not* warn if either of the operands are from a macro expansion.“

But I see no reason to follow this and why we should restrict this even more..

xbolva00 updated this revision to Diff 205692.Jun 19 2019, 3:42 PM

Improved code, added const.

Any comments? If no, please approve..

jfb added inline comments.Jun 21 2019, 9:24 AM
include/clang/Basic/DiagnosticGroups.td
515

Does this match the naming that GCC ended up with?

include/clang/Basic/DiagnosticSemaKinds.td
3333

This is still a bad diagnostic, I'd rather see it fixed.

lib/Sema/SemaExpr.cpp
11044

Doesn't this match any expression that contains xor? Put another way, I don't see "xor" used anywhere else under clang, what's usually done?

11066

This ignores 0B.

11069

This ignores 0X.

11076

The above seems to mishandle user-defined literals. Unless the UDL contains 0x or 0b in them. I'd like to see this tested.

Quuxplusone added inline comments.Jun 21 2019, 9:29 AM
lib/Sema/SemaExpr.cpp
11065

I would rather see this whole section as a three-liner:

// Do not diagnose binary, octal, or hexadecimal literals.
if (StringRef(LHSStr).startswith("0") || StringRef(RHSStr).startswith("0"))
    return;
11078

Here and on line 10971, I suggest whitespace around the << (in the suggested expression).

11098

Do you currently warn on, let's say, 2uLL ^ 7? and if so, do we care that the suggested expression 0x2 ^ 7 has a different type? I imagine the answer to "do we care" is "no," but I thought I should ask.

xbolva00 marked 3 inline comments as done.Jun 21 2019, 9:31 AM
xbolva00 added inline comments.
include/clang/Basic/DiagnosticGroups.td
515

No, they have no progress yet. GCC should not block our progress here..

include/clang/Basic/DiagnosticSemaKinds.td
3333

Why so? Can you propose wording here?

We provide a fixit for it, e.g. 1LL<<32..

lib/Sema/SemaExpr.cpp
11044

Yes, but since xor is keyword in C++, I think this is safe.

xbolva00 marked an inline comment as done.Jun 21 2019, 9:32 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11098

We check if bidwith is same.. Bail out if not.

Quuxplusone added inline comments.Jun 21 2019, 9:43 AM
lib/Sema/SemaExpr.cpp
11044

I believe JF is worried about expressions like constexpr int flexor_exponent = 3; return 10 ^ flexor_exponent;. That expression contains the substring "xor" but does not use the xor keyword. Which reminds me, can you add some test cases showing what behavior you expect for operands that are not integer literals but still compile-time const, or constexpr, or template arguments?

xbolva00 marked an inline comment as done.Jun 21 2019, 9:45 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11044

Ah, I will change it to “ xor “.

jfb added inline comments.Jun 21 2019, 9:47 AM
lib/Sema/SemaExpr.cpp
11044

Spaces aren't the only valid whitespace character :)

2 xor
   31
xbolva00 marked an inline comment as done.Jun 21 2019, 10:09 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11044

I will rework it to chech only XOR op, not the while expr.

xbolva00 marked an inline comment as done.Jun 21 2019, 10:27 AM
xbolva00 added inline comments.
lib/Sema/SemaExpr.cpp
11044

Well, 10 ^ flexor ..

We cannot reach xor check, flexor is not a integer literal. So I will rework nothing here.

aaron.ballman added inline comments.Jun 21 2019, 10:30 AM
lib/Sema/SemaExpr.cpp
11044
#define flexor 7

Now flexor is an integer literal. (A test case for this would be beneficial.)

jfb added inline comments.Jun 21 2019, 10:32 AM
lib/Sema/SemaExpr.cpp
11044

Also, as I've said above:

constexpr long long operator"" _xor(unsigned long long v)
{ return v; }
auto i = 10 ^ 5_xor;
xbolva00 updated this revision to Diff 214523.Aug 10 2019, 5:46 AM

Addressed review notes.

xbolva00 marked 7 inline comments as done.Aug 10 2019, 5:47 AM

Hello,

I fixed your notes + added test cases.

Only remaining question is what to do in "2 ^ 64" (65, 66, .. etc) case. I can emit "result of 2 ^ 64" is "XYZ", but this is rather poor diagnostic. What should we suggest for "2 ^ 64"? (@jfb

The "2 ^ 64 - 1" case to be solved as follow up patch..

And since there is no progress on GCC side, I will use -Wxor-used-as-pow

xbolva00 updated this revision to Diff 214524.Aug 10 2019, 5:53 AM

svn added forgotten test file..

xbolva00 updated this revision to Diff 214525.Aug 10 2019, 5:56 AM
xbolva00 updated this revision to Diff 214533.Aug 10 2019, 8:22 AM

Added UDL tests

aaron.ballman added inline comments.Aug 10 2019, 9:11 AM
include/clang/Basic/DiagnosticGroups.td
515

Agreed, but we should still coordinate to ensure the names we pick are the same since this is a user-facing option. If GCC isn't adding this yet, can you file a bug report (or comment on an existing one) with the name we've chosen?

lib/Sema/SemaExpr.cpp
11022–11025

if (const auto *UO = dyn_cast<UnaryOperator>(RHS.get())) { ... } rather than doing an isa<> followed by a cast<>.

11042

Why is this C++ only? C has an xor macro that comes from iso646.h. Should add a test case for this situation.

xbolva00 updated this revision to Diff 214534.Aug 10 2019, 9:23 AM

Addressed review notes

xbolva00 marked 2 inline comments as done.Aug 10 2019, 9:24 AM
xbolva00 marked an inline comment as done.
xbolva00 added inline comments.
include/clang/Basic/DiagnosticGroups.td
515

I will inform them after patch approval.

Any futher comments or is it OK now @jfb ? @aaron.ballman ?

aaron.ballman accepted this revision.Aug 16 2019, 6:39 AM

This LGTM aside from a few nits, but please give a chance for other reviewers to weigh in on their comments.

lib/Sema/SemaExpr.cpp
11070–11076

Might as well combine these.

test/SemaCXX/warn-xor-as-pow.cpp
101–103

I'd also like to see a case like res = 10_xor ^ 5;

This revision is now accepted and ready to land.Aug 16 2019, 6:39 AM
xbolva00 updated this revision to Diff 215592.Aug 16 2019, 7:04 AM

Fixed nits.

xbolva00 marked 2 inline comments as done.Aug 16 2019, 7:05 AM
xbolva00 updated this revision to Diff 215693.Aug 16 2019, 3:30 PM

Better comparison for "xor".

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2019, 12:13 PM