Page MenuHomePhabricator

[C++2b] Support size_t literals
ClosedPublic

Authored by AntonBikineev on Mar 27 2021, 9:49 AM.

Details

Summary

This adds support for C++2b z/uz suffixes for size_t literals (P0330).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
AntonBikineev requested review of this revision.Mar 27 2021, 9:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2021, 9:49 AM

Fix failing tests

Fix formatting

aaron.ballman requested changes to this revision.Mar 29 2021, 7:53 AM
aaron.ballman added a subscriber: aaron.ballman.

Thanks for working on this! I think the direction is good in general, but I think we should also add tests for use in the preprocessor (#if 1z == 1, etc) as well as tests for the behavior in C.

clang/include/clang/Basic/DiagnosticCommonKinds.td
191
clang/include/clang/Lex/LiteralSupport.h
66
clang/lib/Frontend/InitPreprocessor.cpp
593–595
clang/lib/Lex/LiteralSupport.cpp
685
clang/lib/Lex/PPExpressions.cpp
325–327
326

This feature is specific to C++ and needs some sort of diagnostic in C. There is not currently a WG14 proposal for this literal suffix, so I think this should be an error in that case rather than a warning.

clang/lib/Sema/SemaExpr.cpp
3871–3873
3872

Same diagnostic needs for C here as above.

3914

The assert message doesn't seem to match the predicate -- why does a null qualtype imply it's a microsoft literal?

clang/test/SemaCXX/size_t-literal.cpp
15–17

Rather than check __cplusplus like this, I think the RUN lines should specify a verify prefix. e.g., -verify=cxx2b and then use cxx2b-no-diagnostics and -verify=cxx20 and then use cxx20-warning {{}}.

This revision now requires changes to proceed.Mar 29 2021, 7:53 AM
AntonBikineev marked 9 inline comments as done.Mar 29 2021, 1:27 PM

Thanks Aaron for taking a look! Addressed the comments. I also hope it's okay to test preprocessor in the same test (test/Lexer/size_t-literal.cpp).

clang/lib/Sema/SemaExpr.cpp
3914

Hm, agree, that's probably misleading. I've tried to check that the previous branch couldn't be taken if this one has been. I think "assert(!Literal.MicrosoftInteger && ...);" would make more sense.

clang/test/SemaCXX/size_t-literal.cpp
15–17

Good idea, thanks! That has made the test much cleaner.

AntonBikineev marked an inline comment as done.

Address comments

rsmith added inline comments.Mar 29 2021, 2:34 PM
clang/include/clang/Basic/DiagnosticCommonKinds.td
190

Should this be an ExtWarn? I think we should warn on this by default.

clang/lib/Lex/LiteralSupport.cpp
595–596

General comment: I think the checks here have become too complex and error-prone. I suggest we add another flag, hasSize, that's set when we parse any size modifier (F, L, LL, H, F128, I64, and now Z), and when parsing any size modifier, replace the existing ad-hoc set of checks with a check for hasSize.

clang/lib/Lex/PPExpressions.cpp
325

We should issue a compat warning when these literals are used in C++2b or later. (That is, always issue a diagnostic for a size_t literal: in C, it's an error, in C++20 and before, it's an extension warning, and in C++2b and later, it's a backwards-compatibility warning.) There are a bunch of examples of this elsewhere in the codebase -- look for the emission of warn_cxx.*_compat.

clang/lib/Sema/SemaExpr.cpp
3871

Also need the C++20-and-before compat warning here for C++2b-onwards mode.

3912–3931

This should exclude the isSizeT case in addition to isLong and isLongLong. (It probably doesn't matter, because int is not larger than size_t on any platform we support, but nonetheless we should check.)

3947

Likewise here.

3978

Likewise here -- and this one actually does matter, because we have targets with a 32-bit size_t but a 64-bit long long.

clang/test/Lexer/size_t-literal.cpp
7–9

Please also check -1z < 0 and -1zu < 0 to ensure that the preprocessor gets the signedness correct.

AntonBikineev marked 7 inline comments as done.Mar 29 2021, 7:13 PM

Thanks Richard for taking a look!

clang/include/clang/Basic/DiagnosticCommonKinds.td
190

I tried to follow the strategy used for 'long long' (only warn with -Wlong-long), but I agree that warn-by-default would probably be better.

clang/lib/Lex/LiteralSupport.cpp
595–596

Good idea! It actually helped to flush out the issue when Q and H are mixed together: https://godbolt.org/z/8hj39P93a

AntonBikineev marked 2 inline comments as done.

Address comments

rsmith added inline comments.Mar 29 2021, 10:23 PM
clang/lib/Lex/LiteralSupport.cpp
640

Looks like this might fix incorrect acceptance of 1.0f16f and 1.0f16L in CUDA mode too :)

clang/lib/Sema/SemaExpr.cpp
3920–3927

I'd like to see some test coverage for this, particularly the testing of the high bit to determine if we're forming a signed or unsigned literal.

3978

Can we get a test for this case? (Eg, 5'000'000'000z with -triple i686-linux.)

AntonBikineev marked 3 inline comments as done.Mar 30 2021, 3:36 AM
AntonBikineev added inline comments.
clang/lib/Lex/LiteralSupport.cpp
640

Nice!

clang/lib/Sema/SemaExpr.cpp
3895

I now begin to think that we should probably also prohibit things like 0x1234z to be implicitly interpreted as unsigned. Wdyt?

3998

I think this branch should also be covered, since we probably don't want promotion from size_t to ULL but instead have a separate diagnostic that size_t is out-of-range. I've added another diagnostic and branch here.

AntonBikineev marked an inline comment as done.

Address comments. Also:

  • always issue a new diagnostic if size_t/ssize_t is out of range;
  • prohibit numbers with 'z' suffix and non-10-radix to be interpreted as unsigned.
rsmith added inline comments.Mar 30 2021, 11:53 AM
clang/include/clang/Basic/DiagnosticCommonKinds.td
198–200

I wonder if it'd be better to say 'ssize_t' instead of signed 'size_t' here? The latter sounds self-contradictory since size_t is an unsigned type.

clang/lib/Sema/SemaExpr.cpp
3895

Table 8 says that a z-suffixed literal has type ssize_t (if the value fits) or size_t (otherwise), so I think taking AllowUnsigned into account is correct (although perhaps surprising).

3998

Thanks, that makes sense to me. I agree that a z-suffixed literal should never have any type other than size_t or ssize_t; we don't want to fall back to unsigned long long as an extension.

clang/test/SemaCXX/size_t-literal.cpp
79

This case looks valid to me.

AntonBikineev marked an inline comment as done.Mar 30 2021, 1:40 PM
AntonBikineev added inline comments.
clang/include/clang/Basic/DiagnosticCommonKinds.td
198–200

I thought about it (and actually had it first as ssize_t). The problem with ssize_t is that it's not defined by C or C++ Standards, AFAIK, but by POSIX. The proposal calls it "the signed integer type corresponding to std::size_t", so I decided to shorten it to "signed 'size_t'". However, I don't have strong opinion on this.

clang/lib/Sema/SemaExpr.cpp
3895

Thanks for pointing this out! Changed it back and fixed the tests.

Address comments

rsmith accepted this revision.Mar 30 2021, 3:17 PM

Thanks! Looks good to me. Please wait a day or so in case Aaron has more comments before going ahead.

clang/include/clang/Basic/DiagnosticCommonKinds.td
198–200

That makes sense. OK, I don't have a strong opinion either, so let's go with what you have.

clang/lib/Sema/SemaExpr.cpp
3996–3997

(Mostly to make it clear that this can happen for size_t regardless of the u suffix.)

clang/www/cxx_status.html
1273
AntonBikineev marked 2 inline comments as done.

Generally LGTM but I did have a question about the feature testing macro.

clang/lib/Frontend/InitPreprocessor.cpp
593–594

Because we allow this as an extension in all C++ modes, should this be enabled always rather than gated on C++2b?

clang/lib/Lex/LiteralSupport.cpp
678
AntonBikineev marked an inline comment as done.Mar 31 2021, 6:03 AM
AntonBikineev added inline comments.
clang/lib/Frontend/InitPreprocessor.cpp
593–594

I was also wondering about this. I've checked that we also do the same for other feature macros, such as __cpp_binary_literals, which is defined for -std>=c++14 while at the same time is allowed as an extension before C++14. Therefore I decided to mimic the behaviour.

aaron.ballman accepted this revision.Mar 31 2021, 6:05 AM

LGTM, thank you for the patch!

clang/lib/Frontend/InitPreprocessor.cpp
593–594

Thanks for checking on that! I think that seems defensible enough. :-)

This revision is now accepted and ready to land.Mar 31 2021, 6:05 AM
AntonBikineev closed this revision.Mar 31 2021, 6:42 AM

Aaron, Richard, thanks a lot for reviewing this!

Sorry, forgot to add amend the commit with 'Differential revision', closing this manually..

Quuxplusone added inline comments.
clang/lib/Frontend/InitPreprocessor.cpp
593–594

Btw, thus far libc++ has tended to make the opposite choice: for example, libc++ defines __cpp_lib_variant == 202102 in all modes, because the programmer conceivably might be depending on that macro to make some decision, so we want to make sure it reflects the specific semantics that we implement. (For __cpp_binary_literals specifically, I agree it doesn't really matter because nobody's going to be making decisions based on the value of this macro.)

See https://reviews.llvm.org/D99290#inline-934563 (D96385, D97394) for previous discussions on the libc++ side.

AntonBikineev added inline comments.Mar 31 2021, 7:50 AM
clang/lib/Frontend/InitPreprocessor.cpp
593–594

Thanks for pointing this out, Arthur.

I wish there was some consistency, however, I'm not sure if this is easily feasible. I guess the strategy of defining __cpp_size_t_literals on all modes would be problematic, since if the user code depends on __cpp_size_t_literals, it could suddenly receive the extension warning (when compiled with -std<2++2b), which is enabled by default.

593–594

Btw, thus far libc++ has tended to make the opposite choice: for example, libc++ defines __cpp_lib_variant == 202102 in all modes, because the programmer conceivably might be depending on that macro to make some decision, so we want to make sure it reflects the specific semantics that we implement. (For __cpp_binary_literals specifically, I agree it doesn't really matter because nobody's going to be making decisions based on the value of this macro.)

See https://reviews.llvm.org/D99290#inline-934563 (D96385, D97394) for previous discussions on the libc++ side.

clang/lib/Frontend/InitPreprocessor.cpp
593–594

I guess the strategy of defining __cpp_size_t_literals in all modes would be problematic, since if the [pre-C++2b] user code depends on __cpp_size_t_literals, it could suddenly receive the extension warning...

Ah, yes. Orthogonally to everything I said above, I think it's fair to say that

  • in modes where 42uz produces a fatal error, it's definitely "not supported"
  • in modes where it's accepted without complaint, it's definitely "supported" (*)
  • in modes where it produces a non-fatal warning, you can plausibly argue it either way

(*) - with a bonus exception that if the user passes -Wno-blah or -w, that doesn't magically make things be supported
libc++'s situation seems more black-and-white; e.g. variant behaves one way or the other. There's no libc++ equivalent of "you get the new behavior but with a warning." :)

rsmith added inline comments.Apr 15 2021, 6:19 PM
clang/lib/Frontend/InitPreprocessor.cpp
593–594

We have some prior art to draw on here: our __has_extension(X) behavior is that under -pedantic-errors, we don't advertise any extensions beyond the __has_feature(X) set, and otherwise we advertise features even if they will lead to warnings (or errors via -Werror or -Werror=pedantic or similar warning flags).

I'm not sure that's necessarily the best thing, since it's only loosely connected to whether the construct in question would lead to (warnings or) errors, but it's consistent with the principle that warning flags shouldn't change behavior (beyond which warnings or errors are emitted) and probably more useful than never advertising extensions in earlier language modes.