GetExprRange in lib/Sema/SemaChecking.cpp returns wrong width for mul operation: it should be 2N bits for N * N.
This patch will close PR35409.
Details
- Reviewers
rjmccall aaron.ballman rsmith
Diff Detail
Event Timeline
Why should the behavior be X86 specific?
There also seems to be something weirder going on with gcc. They also warn on these.
void foo(char c) {
c = c + c;
}
void bar(short c) {
c = c + c;
}
Just want to point the obvious, https://godbolt.org/g/1SQSda
So as per AST (and language rules), it's actually
char foo(char c) { int d = (int)c + (int)c; return (char)d; }
gcc also warns for this
short foo(char a) {
return a * a;
}
Despite the fact that the char would be promoted to int, the upper bits are just sign bits and the multiply result still fits in a short.
But it *should* warn if the underlying char type is unsigned, because 255 * 255 = 65025 which is too large to fit into a signed short.
Fair point, what is the default signedness of char?
FWIW, all these warn in gcc. So they seem to be just checking purely based on the int promotion without any concern for the original size?
unsigned short foo(unsigned char a) { return a * a; } signed short bar(signed char a) { return a * a; } signed short foo(signed char a) { return a + a; } unsigned short foo(unsigned char a) { return a + a; }
It's decided by target architecture. ARM and PPC often use unsigned, x86 often uses signed.
FWIW, all these warn in gcc. So they seem to be just checking purely based on the int promotion without any concern for the original size?
unsigned short foo(unsigned char a) { return a * a; } signed short bar(signed char a) { return a * a; } signed short foo(signed char a) { return a + a; } unsigned short foo(unsigned char a) { return a + a; }
Yeah, this does seem to be just warning about arithmetic involving promotions, which seems rather chatty but is aligned with MISRA rule 10.6.
The purpose of this analysis is not to compute the theoretical information content of the computation result. We are conservative about bounds precisely so that we do not warn in situations like these.
icc seems to match gcc for those last 4 cases i sent. And MSVC is throwing an odd signed/unsigned mismatch
I think we're correct not to warn here and that GCC/ICC are being noisy. The existence of a temporary promotion to a wider type doesn't justify warning on arithmetic between two operands that are the same size as the ultimate result. It is totally fair for users to think of this operation as being "closed" on the original type.
Could you please clarify, are you saying that PR35409 is not a bug, and clang should continue to not warn in those cases?
If we would have "conversion sanitizer", detection of such problems would be easy, but without it, right now it is rather hard to detect such issues...
Correct.
If we would have "conversion sanitizer", detection of such problems would be easy, but without it, right now it is rather hard to detect such issues...
What issue? That arithmetic can overflow?
But this isn't about overflow. Arithmetic overflow issues are nicely detectable with current UBSan (+-fsanitize=integer, for unsigned).
This is about the lack of detectable overflow - implicit cast to int, multiplication of int's (which means no overflow actually happened),
and then implicit integer demotion (which changes the "integer's value", for which there is no sanitizer, yet).
There is currently no way to detect this, at least in clang.
And it's not trivial to detect/debug such problems, which is exactly why a warning would be nice.
The situation is effectively overflow in the original type of the operands. That overflow does not have UB, and its formal moment is slightly different because it's tied to the conversion rather than the actual arithmetic, but, in the end, the observable effect is overflow. From a language-design perspective, C would not be able to get away with implicitly promoting operands to a wider type if the result of the arithmetic after truncation weren't consistent with just doing the arithmetic in the narrower type.
There is currently no way to detect this, at least in clang.
And it's not trivial to detect/debug such problems, which is exactly why a warning would be nice.
I think a "sanitizer" that tried to report when a integer value was used after undergoing an implicit conversion that changed its arithmetic value โ but not when the value first passed through an explicit cast โ would be a really interesting feature.
This patch, however, is contrary to the design of -Wconversion, which does attempt to avoid warning in cases where the user might reasonably see an operation as "really" being performed in its operand type. If you want to argue that we should change the design of -Wconversion, that is a broader conversation that you should take to cfe-dev.
Could you please clarify, are you saying that PR35409 is not a bug, and clang should continue to not warn in those cases?
Correct.
Does it mean we should abandon this revision? On the other hand it's a real bug, isn't it?
Do you see this code as having a bug when a is >= 182?
short foo(unsigned char a) { return a * a; }
(If you don't like seeing unsigned char you can imagine it was spelled as uint8_t.)
That's an interesting question. In general, these warnings do try to ignore the effects of implicit promotion. We would not want -Wsign-conversion to fire on unsigned short x = an_unsigned_short + 1; (or - 1, for that matter), even though formally this coerces a signed int to unsigned short. Similarly, -Wsign-conversion *should* warn on signed short x = an_unsigned_short + 1;, even though formally the promotion from unsigned short to signed int is not problematic and the final conversion from signed int to signed short is not a signedness change. (This second example should also generate a -Wconversion warning, but the questions are independent.) Applying that strictly here would say that the user is entitled to think of this as an operation on unsigned char that then gets losslessly promoted to signed short, even though arithmetically that's not what happens. On the other hand, I do think there's some room for -Wsign-conversion to be more aggressive than -Wconversion about this sort of thing; -Wsign-conversion should generally fire for any changes in signedness from the original operand types (with the usual complexities around constant values), and there's just an exception for computations whose value is known to fit within the expressible range of the result type, which is not true of this multiplication. So I think it would be acceptable to warn on this.
John.
That's an interesting question. In general, these warnings do try to ignore the effects of implicit promotion. We would not want -Wsign-conversion to fire on unsigned short x = an_unsigned_short + 1; (or - 1, for that matter), even though formally this coerces a signed int to unsigned short. Similarly, -Wsign-conversion *should* warn on signed short x = an_unsigned_short + 1;, even though formally the promotion from unsigned short to signed int is not problematic and the final conversion from signed int to signed short is not a signedness change. (This second example should also generate a -Wconversion warning, but the questions are independent.) Applying that strictly here would say that the user is entitled to think of this as an operation on unsigned char that then gets losslessly promoted to signed short, even though arithmetically that's not what happens. On the other hand, I do think there's some room for -Wsign-conversion to be more aggressive than -Wconversion about this sort of thing; -Wsign-conversion should generally fire for any changes in signedness from the original operand types (with the usual complexities around constant values), and there's just an exception for computations whose value is known to fit within the expressible range of the result type, which is not true of this multiplication. So I think it would be acceptable to warn on this.
John.
OK, is it LGTM?
OK, we have 2 possibilities here (fmpov):
- Forget about the issue and don't do anything now - it is not a bug
- Return the width based on real analyze of mul args:
- signed vs. unsigned
- chars, shorts, etc.
What do you suggest to do? Or maybe there are other variants?
I think we just close this. If you want to put time into trying to diagnose in Aaron's example of uint8 * uint -> signed short, I think the right approach is to enhance IntRange so that it can optionally carry a more aggressive range, and then make -Wsign-compare use that when available. That's a significantly different patch from this.
I'm having a very hard time following this thread. In fact, i can't follow the logic at all.
Can you at least point out where the current design goals are documented?
They probably aren't, other than in mailing-list discussions like this.
I remember basing the design off the vision of a white paper that someone working on GCC wrote up. Maybe it was just this wiki page:
https://gcc.gnu.org/wiki/NewWconversion
But our design was always more sophisticated than GCC's. Note the comment at the bottom of that page about how you should write INT_MIN as (-INT_MAX - 1), because otherwise GCC was warning about it! I don't know how that was ever considered acceptable.
Looking for that paper, I did find this GCC bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40752
where they had a very similar debate to the one we just had.
The basic tenet of Clang's -Wconversion design, as far as this problem goes, is that users generally ought to be able to ignore most of the technical details of what C calls for with integer promotions and the types of literals; instead, they should generally be able to think of arithmetic as taking place in the narrowest promoted type of the opaque operands used in the expression. If the actual computation type doesn't include the full range of that theoretical promoted type, that's worth a warning; this can happen when mixing the signedness of operands. But otherwise they can generally think of the operation as computing a value of that promoted type. If that operation overflows, so be it โ we're not going to warn about the potential for overflow every time the user adds two ints, and by the same token, it doesn't make any sense to warn about every time the user adds two shorts just because the language made this otherwise-unimportant technical decision to do the arithmetic in a wider type.
If that operation overflows, so be it โ we're not going to warn about the potential for overflow every time the user adds two ints, and by the same token, it doesn't make any sense to warn about every time the user adds two shorts just because the language made this otherwise-unimportant technical decision to do the arithmetic in a wider type.
There is one comment only: this patch changes the 'mul' operation only and initially the change was done for X86 only because it clearly says about "doubling" of mul result width. And another comment: currently we have warning here:
int c = a * b; T d1 = c; // expected-warning{{implicit conversion loses integer precision: 'int' to 'char'}}
but we don't have it here:
T d2 = a * b; // when T is char
It's a potential problem from my point of view.
But I don't mind to close the review - and the corresponding bug of course.
I disagree with closing the bug.
It's a real issue that is not detected/diagnosed by anything in clang (diagnostic, sanitizer).
It seems we are in a deadlock here :)
Yes, I understand that addition was not changed by the patch. I do not think that using different logic for multiplication is appropriate just because multiplication produces bigger values. It is reasonable to multiply two 8-bit numbers and store the result in an 8-bit variable without a warning because the net effect is just doing a "closed" multiplication on an 8-bit type, which is a reasonable operation to want to do. GCC is too noisy here.
There is nothing x86-specific about the supposed problem here, and using target-specific logic was always clearly inappropriate.
And another comment: currently we have warning here:
int c = a * b; T d1 = c; // expected-warning{{implicit conversion loses integer precision: 'int' to 'char'}}but we don't have it here:
T d2 = a * b; // when T is char
It's a potential problem from my point of view.
All of these warnings are single-line analyses, which is unlikely to change absent massive architectural improvements to clang. It has nothing to do with there being a multiplication; we would give you that exact same warning if you wrote int c = a;.
I don't have any special authority here besides having designed and implemented the current diagnostic. You are welcome to start a thread on cfe-dev to gather support for changing the design of -Wconversion to always warn about the potential for overflow in sub-int multiplication. I will argue against you there, like I have here, but if there's consensus that we should warn about this, I'll abide by the community's judgment.
What would the design for that warning be? C promotes all arithmetic on sub-int types to int, and if that's assigned back to a variable of the original type, there's a conversion. But you seem to only want to warn about truncating the result of multiplication and not, say, addition or negation. Is there a principle to this? Just the likelihood of escaping the range of the original type?
In fact I think about mul only because of X86 nature: it always put the mul result in wider place (than its args). (Don't know about other CPUs - maybe the same?) As result we could change the current design to reflect this feature:
8bit x 8bit -> 16bit
16bit x 16bit -> 32bit
...
While X86 does have multiplies that return double width results, it also has 16/32/64-bit forms of imul that only return the lower portion of the result. Those multiplies are typically faster and have fewer uops than the double width multiplies so we prefer to use that instruction when possible. We are currently using the double width multiply for 8*8->8 multiplies because there is no 8-bit single width multiply.
I agree with @RKSimon, i don't see why this would be arch-specific.
If this is correct, maybe add explanation as a code comment?