This is an archive of the discontinued LLVM Phabricator instance.

Consider unsigned long for non-u/U decimal literals (C90/C++03)
ClosedPublic

Authored by hubert.reinterpretcast on May 15 2015, 7:31 AM.

Details

Summary

This modifies Clang to reflect that under pre-C99 ISO C, decimal
constants may have type unsigned long even if they do not contain u
or U in their suffix (C90 subclause 6.1.3.2 paragraph 5). The same is
done for C++ without C++11 which--because of undefined behaviour--allows
for behaviour compatible with ISO C90 in the case of an unsuffixed
decimal literal and is otherwise identical to C90 in its treatment of
integer literals (C++03 subclause 2.13.1 [lex.icon] paragraph 2).

Messages are added to the c99-compat and c++11-compat groups to warn
on such literals, since they behave differently under the newer
standards.

Fixes PR 16678.

Diff Detail

Event Timeline

hubert.reinterpretcast retitled this revision from to Consider unsigned long for non-u/U decimal literals (C90/C++03).
hubert.reinterpretcast updated this object.
hubert.reinterpretcast edited the test plan for this revision. (Show Details)
hubert.reinterpretcast added a subscriber: Unknown Object (MLST).
rsmith edited edge metadata.May 28 2015, 4:23 PM

Sorry for the delay.

Can you combine the test files together into a single file with more RUN: lines (perhaps with -Ds for the things that change between the old and new modes, and -x to specify C versus C++ input)? Note that expected-* comments within #if'd out regions are ignored. That'll make it much more obvious what you're changing and what is expected to change between modes. (There are also some costs associated with the number of separate test files that can be reduced by doing this.)

include/clang/Basic/DiagnosticCommonKinds.td
119

In C++98, this should probably be an ExtWarn rather than a Warning, because this case has undefined behavior. Splitting this into two warnings would also allow you to put the warning into -Wc99-compat or -Wc++11-compat as appropriate, and to indicate that the C++98 case has undefined behavior.

120–121

It would be clearer and more useful to phrase this as:

"integer literal is too large to be represented in type 'long', interpreting as 'unsigned long'; this literal will have type '%select{long long|unsigned long long}0' in C99 onwards"

(with s/C99/C++11/ for the C++98 warning.)

You can easily compute the signedness of the long long in SemaExpr by checking if LongLongSize > LongSize.

include/clang/Basic/DiagnosticCommonKinds.td
119

In C++98, the undefined behaviour is only on the case of a decimal integer literal with none of u/U/l/L. If the l/L is present, the choice of unsigned long is required to be considered after long.

I agree that an ExtWarn is appropriate for the undefined behaviour case, but I do not think that the -Wc99-compat/-Wc++11-compat split still follows.

120–121

The use of unsigned long long in C99/C++11 modes would also be an extension though. Saying "C99 onwards" implies the Standard, not the mode; so perhaps "in C99-and-later modes"?

rsmith added inline comments.Jun 5 2015, 12:56 PM
include/clang/Basic/DiagnosticCommonKinds.td
119

OK, so:

  • For the cases where the code has undefined behavior, the warning should be an ExtWarn
  • For the cases where the code has defined behavior but changes meaning in a later standard, the warning should be in -Wc*-compat
  • For the cases where the code becomes invalid in a later standard, the warning should say so
  • We should split this warning up into enough separate diagnostics that we can cover all of the above cases with correct, clear, helpful diagnostics

I think the different cases are:

  1. C89, no suffix or L, LONG_MAX == LLONG_MAX (c99-compat), "will be ill-formed in C99 onwards"
  2. C89, no suffix or L, LONG_MAX < LLONG_MAX (c99-compat), "will have type 'long long' in C99 onwards"
  3. C++98, no suffix, LONG_MAX == LLONG_MAX (c++11-compat, ExtWarn), "will be ill-formed in C++11 onwards"
  4. C++98, L suffix, LONG_MAX == LLONG_MAX (c++11-compat), "will be ill-formed in C++11 onwards"
  5. C++98, no suffix, LONG_MAX < LLONG_MAX (c++11-compat, ExtWarn), "will have type 'long long' in C++11 onwards"
  6. C++98, L suffix, LONG_MAX < LLONG_MAX (c++11-compat), "will have type 'long long' in C++11 onwards"

I think you can cover the above with three different diagnostics, for 1+2, 3+5, and 4+6, with a %select in each one selecting on whether 'long long' is large enough for the literal.

Each of these should be in either -Wc99-compat or -Wc++11-compat because the meaning of the program would be different under -std=c99 or -std=c++11, respectively.

hubert.reinterpretcast edited edge metadata.

Split+regroup the new messages; merge testcases

rsmith accepted this revision.Jun 8 2015, 2:51 PM
rsmith edited edge metadata.

LGTM, thanks!

This revision is now accepted and ready to land.Jun 8 2015, 2:51 PM
hubert.reinterpretcast edited the test plan for this revision. (Show Details)
hubert.reinterpretcast edited edge metadata.