Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
508 ↗ | (On Diff #205049) | GCC folks will do this diagnostic soon too, so I will wait and use their name. |
Should we also emit a note how to silence it, eg. swap xor operands or add parentheses around 2/10 ?
I think you want to avoid warning inside macros as well, say:
#define AWESOME(x, y) ({ blah blah blah; x ^ y; blah }) AWESOME(2, 10); // probably not a bug
test/SemaCXX/warn-xor-as-pow.cpp | ||
---|---|---|
13 ↗ | (On Diff #205089) | 2 ^ 0 seems like it would be a bug too? |
Why have .c and .cpp tests?
test/Sema/warn-xor-as-pow.c | ||
---|---|---|
43 ↗ | (On Diff #205121) | I think we want to suggest 1LL << 32 or something like that. Whatever we usually do with this type of suggestion. |
45 ↗ | (On Diff #205121) | This one hits a ceiling, we can't really suggest anything for this value IMO. Maybe we need to see if the user is doing 2^64 - 1? In that case we can offer a suggestion. |
test/Sema/warn-xor-as-pow.c | ||
---|---|---|
45 ↗ | (On Diff #205121) | Since this requires changes in other parts of code to catch 2^64 - 1, maybe we can leave it as a follow-up patch? |
test/Sema/warn-xor-as-pow.c | ||
---|---|---|
45 ↗ | (On Diff #205121) | Sure. |
Seems low-value at first glance, but I guess I can't argue with those Twitter and codesearch results.
Do you have codesearch results for ^2? https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=%5E+2&search=Search Seems like a lot of false positives (and a lot of code comments turning up in codesearch??), but would be perhaps even more valuable to the kinds of users who would write 10^x or x^2.
What is going on in this case, and would a warning here be a false positive or a true positive? https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=10+%5E+7&search=Search
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10913 ↗ | (On Diff #205121) | Why do you special-case 0b prefix (or suffix — 0x0b is also special-cased by this snippet), but you don't special-case octal or hexadecimal constants? I'm sure x ^ 0x1234 will be a more common spelling than x ^ 0b1001000110100. |
10924 ↗ | (On Diff #205121) | Do you have metrics indicating that this line is an improvement over if (true) {? |
10959 ↗ | (On Diff #205121) | I don't understand why parenthesizing one argument should silence the warning. Wouldn't it be more natural to suggest converting both arguments to hexadecimal or binary? That is, convert 10 ^ x to 0xA ^ x, or 2 ^ 16 to 0x2 ^ 0x10. |
test/Sema/warn-xor-as-pow.c | ||
38 ↗ | (On Diff #205121) | Please add test cases for 2 ^ 0x4 and 2 ^ 04 and 0x2 ^ 10 and 02 ^ 10. |
39 ↗ | (On Diff #205121) | I don't understand why this line doesn't warn. Is it because the macro's name has the case-insensitive string xor in it? Is it because there is a macro involved at all? In either case, please add another test case for res = FOOBAR(2, 16). Also res = EPSILON where #define EPSILON 10^-300. That seems to come up in the codesearch results a lot. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10931 ↗ | (On Diff #205187) | Can you use starts_with (or the LLVM equivalent) in both of these cases? It'll be faster and also more correct. Hex and binary are handled up here on line 10927, but octal is handled down on line 10955; why? Can't they be combined into one place in the code? |
10982 ↗ | (On Diff #205187) | Suppressing the warning specifically on 10 ^ 0 (which means 10) seems like an anti-feature. |
10988 ↗ | (On Diff #205187) | I suggest at least one unit test case that involves 10 ^ 100, and considering the user-friendliness of the resulting error message. How about suggesting "1e" + std::to_string(RightSideIntValue) as the fixit in both cases? |
test/SemaCXX/warn-xor-as-pow.cpp | ||
52 ↗ | (On Diff #205187) | I still expect to see a warning either on this line, or on the line where the macro is defined. I don't see why we'd want to be silent in this case. |
Thanks, I will address your notes soon!
The only remaining question is, as your said, whether or not to diagnose xors in macros. @regerh @jfb ?
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10931 ↗ | (On Diff #205187) | We cannot use starts_with here, case: 2 ^ 0b11. Yes, I can combine it to one place. |
test/SemaCXX/warn-xor-as-pow.cpp | ||
---|---|---|
13 ↗ | (On Diff #205089) | Small question.. Instead of 1<<0 for this case, I emit fixit "1", seems ok for you? |
include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
3306 ↗ | (On Diff #205202) | We usually use "did you mean" in these kinds of diagnostics. Replace the comma with a semicolon. Same comments below. |
lib/Sema/SemaExpr.cpp | ||
10895 ↗ | (On Diff #205202) | Missing a full stop at the end of the comment. Same comment applies elsewhere. |
Do you mind explaining why? I would have expected to diagnose an xor within a macro because the code is just as wrong there, but maybe I'm missing something.
We will be noisy in this case
#define IRQ_CHINT4_LEVEL (11 ^ 7) #define IRQ_CHINT3_LEVEL (10 ^ 7) #define IRQ_CHINT2_LEVEL (9 ^ 7) #define IRQ_CHINT1_LEVEL (8 ^ 7)
... But I go thru codesearch and this is just one case I think.
I think we should warn in macros... In codesearch case ,they are many more true positives, then false negatives.
#define MAX_T_U32 ((2^32)-1)
Ouch.
That's my rationale as well. If it happens outside of macros and is bad, it's probably just as likely to happen within a macro and be equally as bad. I think we probably should warn within macros.
What I meant with macros was that I don't think we should warn on:
#define LEGIT(a, b) ({ work work work; a ^ b; work work work; }) LEGIT(10, 5);
If the constants are inline in the macros then sure, warn there. Basically: if you literally wrote CONSTANT ^ CONSTANT and it looks fishy, let's warn. If token-pasting wrote it for you and it looks fishy, let's not warn.
So you would not want to see this code diagnosed?
#define POW(x, y) (x ^ y) void f() { int two_to_the_derp = POW(2, 16); }
I'm not certain I understand the rationale for why we would not want to diagnose such a construct. Do we have reason to expect a lot of false positives?
Is it worth to pattern match this case? I don't like how this diagnostic is getting more and more complicated.
User can use -Wno-xor-used-as-pow if auto-generated weird code.
Or I can move it under -Wextra..
If you wrote CONSTANT_2_OR_10 ^ CONSTANT I'm pretty sure you messed up. Macros are opaque, and I'm not sure you clearly messed up anymore. Let's take the clear win, see what falls out, and only expand our reach if we have reason to do so. Without trying on huge codebases I don't know whether we'll get a lot of false positive. I'd rather be conservative and not cause headache.
I am also in the "be a bit conservative at first and see how things shake out" camp, but it's y'all doing the hard work here, not me :)
Okey, it should be fine now.
If you are agree with the current state, please approve it.
I've always been frustrated at how clang just gives up as soon as it sees a macro.
At best that should be controlled with some command-line flag.
This patch is not the place to change common clang behavior. I'm advocating for what we usually do: be conservative in macros because token-pasting tends to look "wrong" but not actually be wrong as much (i.e. false positives). If you want clang to do something different, an RFC to cfe-dev is the right place for such a discussion.
I cannot promise anything, but maybe I could do -Wxor-used-as-pow-in-macro (off by default) so people could try it and see how noisy it would be :)
Edit: Oh, probably no, I would have to duplicate items in DiagnosticSemaKinds :(
I don't think i'm proposing to change the entirety of warnings to warn on macros, am i?
Likewise, is it a causality/practice to ignore macros, or a documented requirement [that would require RFC for changing]?
It's been awhile since i've read through the entirety of docs, but i don't recall seeing one.
I'm not sure -Wflag + -Wflag-in-macros will scale :/
i was thinking more along the lines of warn-for-system-headers flag.
IIRC, we do not usually go out of our way to stop diagnosing code within macro expansions unless there's a reason to do so. Given that we know the code is likely wrong (because we're only checking constants whether inside or outside of a macro expansion) and that we've seen the incorrect code happen inside of macros in the wild, I suspect that being conservative is not much of a win. That said, I definitely understand the desire to not issue burdensome amounts of false positives. If we see false positives coming from macros, we should address them (perhaps even by not diagnosing within a macro). Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)
Perhaps the author can run the check over a large corpus of code to see whether fps come up in practice? (The presence of fps will suggest to not warn in macros, but the absence of fps won't tell us too much.)
Sorry, I have no time nor spare computers to check big codebases. As @jfb said, let's start with a conservative approach and possibly improve it in the future.
I can live with that, but my concern is that we often don't do that "improve it in the future" bit and we already know about buggy cases in the wild where the diagnostic will not trigger. Given how often people use macros for simple, but wrong, constructs like my POW example, I think this reduces the utility of the check.
What are the GCC folks planning to do with macros?
David Malcon - GCC: “I think we'd want to *not* warn if either of the operands are from a macro expansion.“
But I see no reason to follow this and why we should restrict this even more..
include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
508 ↗ | (On Diff #205692) | Does this match the naming that GCC ended up with? |
include/clang/Basic/DiagnosticSemaKinds.td | ||
3309 ↗ | (On Diff #205692) | This is still a bad diagnostic, I'd rather see it fixed. |
lib/Sema/SemaExpr.cpp | ||
10929 ↗ | (On Diff #205692) | Doesn't this match any expression that contains xor? Put another way, I don't see "xor" used anywhere else under clang, what's usually done? |
10951 ↗ | (On Diff #205692) | This ignores 0B. |
10954 ↗ | (On Diff #205692) | This ignores 0X. |
10961 ↗ | (On Diff #205692) | The above seems to mishandle user-defined literals. Unless the UDL contains 0x or 0b in them. I'd like to see this tested. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10950 ↗ | (On Diff #205692) | I would rather see this whole section as a three-liner: // Do not diagnose binary, octal, or hexadecimal literals. if (StringRef(LHSStr).startswith("0") || StringRef(RHSStr).startswith("0")) return; |
10963 ↗ | (On Diff #205692) | Here and on line 10971, I suggest whitespace around the << (in the suggested expression). |
10983 ↗ | (On Diff #205692) | Do you currently warn on, let's say, 2uLL ^ 7? and if so, do we care that the suggested expression 0x2 ^ 7 has a different type? I imagine the answer to "do we care" is "no," but I thought I should ask. |
include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
508 ↗ | (On Diff #205692) | No, they have no progress yet. GCC should not block our progress here.. |
include/clang/Basic/DiagnosticSemaKinds.td | ||
3309 ↗ | (On Diff #205692) | Why so? Can you propose wording here? We provide a fixit for it, e.g. 1LL<<32.. |
lib/Sema/SemaExpr.cpp | ||
10929 ↗ | (On Diff #205692) | Yes, but since xor is keyword in C++, I think this is safe. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10983 ↗ | (On Diff #205692) | We check if bidwith is same.. Bail out if not. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | I believe JF is worried about expressions like constexpr int flexor_exponent = 3; return 10 ^ flexor_exponent;. That expression contains the substring "xor" but does not use the xor keyword. Which reminds me, can you add some test cases showing what behavior you expect for operands that are not integer literals but still compile-time const, or constexpr, or template arguments? |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | Ah, I will change it to “ xor “. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | Spaces aren't the only valid whitespace character :) 2 xor 31 |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | I will rework it to chech only XOR op, not the while expr. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | Well, 10 ^ flexor .. We cannot reach xor check, flexor is not a integer literal. So I will rework nothing here. |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | #define flexor 7 Now flexor is an integer literal. (A test case for this would be beneficial.) |
lib/Sema/SemaExpr.cpp | ||
---|---|---|
10929 ↗ | (On Diff #205692) | Also, as I've said above: constexpr long long operator"" _xor(unsigned long long v) { return v; } auto i = 10 ^ 5_xor; |
Hello,
I fixed your notes + added test cases.
Only remaining question is what to do in "2 ^ 64" (65, 66, .. etc) case. I can emit "result of 2 ^ 64" is "XYZ", but this is rather poor diagnostic. What should we suggest for "2 ^ 64"? (@jfb
The "2 ^ 64 - 1" case to be solved as follow up patch..
And since there is no progress on GCC side, I will use -Wxor-used-as-pow
include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
508 ↗ | (On Diff #205692) | Agreed, but we should still coordinate to ensure the names we pick are the same since this is a user-facing option. If GCC isn't adding this yet, can you file a bug report (or comment on an existing one) with the name we've chosen? |
lib/Sema/SemaExpr.cpp | ||
10977–10980 ↗ | (On Diff #214533) | if (const auto *UO = dyn_cast<UnaryOperator>(RHS.get())) { ... } rather than doing an isa<> followed by a cast<>. |
10997 ↗ | (On Diff #214533) | Why is this C++ only? C has an xor macro that comes from iso646.h. Should add a test case for this situation. |
include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
508 ↗ | (On Diff #205692) | I will inform them after patch approval. |
This LGTM aside from a few nits, but please give a chance for other reviewers to weigh in on their comments.
lib/Sema/SemaExpr.cpp | ||
---|---|---|
11025–11031 ↗ | (On Diff #214534) | Might as well combine these. |
test/SemaCXX/warn-xor-as-pow.cpp | ||
100–102 ↗ | (On Diff #214534) | I'd also like to see a case like res = 10_xor ^ 5; |
It's bad enough that this warns for
#define A 2
int f() { return A ^ 1; }
where as far as the users of A are concerned, A is just some arbitrary value, it's just random chance it happens to be two and there is no chance of it being misinterpreted as exponentiation, but it's much worse that this cannot be worked around by changing the definition of A to 0x2, the only way to suppress the warning with a hexadecimal constant is by explicitly using that constant in the use, i.e. by changing the function to int f() { return 0x2 ^ 1; }, which is very much inappropriate when the users of A are not supposed to care about which specific value it has.
I'd like to see this changed so that at least the #define A 0x2 case no longer triggers the warning, but ideally I'd like to see no warning for this example in its current state either. I see in the test case that that warning is intentional, but I'm not seeing from the previous discussion whether this case was based on real world programmer errors or not. Do you recall? If it was, then fair enough, I'll propose to leave that as it is.
(Edit: D97445 changed it so that the concrete example I used before no longer warns; I've edited my comment to use a different example.)
I see how this warning is arguably overzealous in the very special case of "raising" to the constant 1 (since nobody would ever write that by accident). However, if the user of A wants to indicate that they understand this is bitwise-xor, they can simply change the body of their f to return A ^ 0x1; — hex notation suppresses the warning. (Changing the definition of A as well is perhaps a good idea but not technically required.) IMO this is good enough and we should leave it. (What do you think, having seen the A ^ 0x1 workaround? Does that sufficiently address your needs?)
[...] I'm not seeing from the previous discussion whether this case was based on real world programmer errors or not
The description links to a couple of tweets showing examples from the wild:
https://twitter.com/jfbastien/status/1139298419988549632
https://twitter.com/mikemx7f/status/1139335901790625793
I missed that hex notation for the RHS also suppresses the warning. Thanks, that's good to know. Combined with the fact that the case where LHS and RHS are both macros now no longer triggers the warning, that means one of LHS or RHS must be an integer literal, so there will always be a viable workaround once clang 13 is released. That may be good enough.
[...] I'm not seeing from the previous discussion whether this case was based on real world programmer errors or not
The description links to a couple of tweets showing examples from the wild:
https://twitter.com/jfbastien/status/1139298419988549632
https://twitter.com/mikemx7f/status/1139335901790625793
Those are different cases, they literally have 2 ^ in there. In the case I was asking about, the 2 comes from a macro expansion, but the ^ does not. That is much less likely to be a mistake, and one where I didn't see whether the warning for that was based on real-world concerns.
I remember that was interest to support macros too :) tbh I cant remember now such real world case where “macro ^ cst” was an issue but.. it was a long time ago ;)
If you are want, you can send patch to to avoid warning in this case, we can discuss it.
Will do; the change is trivial after D97445: it's just changing that to use || rather than &&. I'll do that and update the tests to match. I don't know whether that'll be what we want, but it may be easier to talk about once we have a good picture from the test case update what it affects.
Perhaps that should warn even if the RHS is in hex form, or is an enumerator constant, or is not even constant at all.
libpng: #if-ed out pseudocode but just waiting for someone to copy and paste: for (i=11;i>=0;--i){ print i, " ", (1 - e(-(2^i)/65536*l(2))) * 2^(32-i), "\n"}.
#define MPHOC_MAX_CHAR ((2 ^ (BITS_PER_CHAR - 1)) - 1) #define MPHOC_MAX_SHORT ((2 ^ (BITS_PER_SHORT - 1)) - 1) #define MPHOC_MAX_INT ((2 ^ (BITS_PER_INT - 1)) - 1) #define MPHOC_MAX_LONG ((2 ^ (BITS_PER_LONG - 1)) - 1) #define MPHOC_MAX_WORD ((2 ^ (BITS_PER_WORD - 1)) - 1)
(and many more in there)
Perhaps that should warn even if the RHS is in hex form
It would be kinda strange, since in one clang release we ask users to silence warning with hex form and newer release would warn anyway. Not a fan of this decision.
, or is an enumerator constant, or
This looks like a good idea. +1.
is not even constant at all.
Depends, needs to be carefully evaluated (true positives vs false positives ratio).
Fair point, but the clang suggestion is always to turn the LHS into hex form, never to turn the RHS in hex form, so if users followed clang's suggested silencing, it would continue to work. That's how I didn't even notice that I could change the RHS to hex form in the case I asked about.
And fully agreed that if we warn about any cases where we didn't warn before, we need to be very careful about the risk of false positives. When I create the new RFC patch, I'll try to get details on that and include it in the description.
When I create the new RFC patch, I'll try to get details on that and include it in the description.
Great! Thanks.