This adds support for C++2b z/uz suffixes for size_t literals (P0330).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Thanks for working on this! I think the direction is good in general, but I think we should also add tests for use in the preprocessor (#if 1z == 1, etc) as well as tests for the behavior in C.
clang/include/clang/Basic/DiagnosticCommonKinds.td | ||
---|---|---|
191 | ||
clang/include/clang/Lex/LiteralSupport.h | ||
66 | ||
clang/lib/Frontend/InitPreprocessor.cpp | ||
593–595 | ||
clang/lib/Lex/LiteralSupport.cpp | ||
680 | ||
clang/lib/Lex/PPExpressions.cpp | ||
325–327 | ||
326 | This feature is specific to C++ and needs some sort of diagnostic in C. There is not currently a WG14 proposal for this literal suffix, so I think this should be an error in that case rather than a warning. | |
clang/lib/Sema/SemaExpr.cpp | ||
3871–3873 | ||
3872 | Same diagnostic needs for C here as above. | |
3911 | The assert message doesn't seem to match the predicate -- why does a null qualtype imply it's a microsoft literal? | |
clang/test/SemaCXX/size_t-literal.cpp | ||
15–17 | Rather than check __cplusplus like this, I think the RUN lines should specify a verify prefix. e.g., -verify=cxx2b and then use cxx2b-no-diagnostics and -verify=cxx20 and then use cxx20-warning {{}}. |
Thanks Aaron for taking a look! Addressed the comments. I also hope it's okay to test preprocessor in the same test (test/Lexer/size_t-literal.cpp).
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
3911 | Hm, agree, that's probably misleading. I've tried to check that the previous branch couldn't be taken if this one has been. I think "assert(!Literal.MicrosoftInteger && ...);" would make more sense. | |
clang/test/SemaCXX/size_t-literal.cpp | ||
15–17 | Good idea, thanks! That has made the test much cleaner. |
clang/include/clang/Basic/DiagnosticCommonKinds.td | ||
---|---|---|
190 | Should this be an ExtWarn? I think we should warn on this by default. | |
clang/lib/Lex/LiteralSupport.cpp | ||
594–595 | General comment: I think the checks here have become too complex and error-prone. I suggest we add another flag, hasSize, that's set when we parse any size modifier (F, L, LL, H, F128, I64, and now Z), and when parsing any size modifier, replace the existing ad-hoc set of checks with a check for hasSize. | |
clang/lib/Lex/PPExpressions.cpp | ||
325 | We should issue a compat warning when these literals are used in C++2b or later. (That is, always issue a diagnostic for a size_t literal: in C, it's an error, in C++20 and before, it's an extension warning, and in C++2b and later, it's a backwards-compatibility warning.) There are a bunch of examples of this elsewhere in the codebase -- look for the emission of warn_cxx.*_compat. | |
clang/lib/Sema/SemaExpr.cpp | ||
3871 | Also need the C++20-and-before compat warning here for C++2b-onwards mode. | |
3926 | This should exclude the isSizeT case in addition to isLong and isLongLong. (It probably doesn't matter, because int is not larger than size_t on any platform we support, but nonetheless we should check.) | |
3942 | Likewise here. | |
3973 | Likewise here -- and this one actually does matter, because we have targets with a 32-bit size_t but a 64-bit long long. | |
clang/test/Lexer/size_t-literal.cpp | ||
7–9 | Please also check -1z < 0 and -1zu < 0 to ensure that the preprocessor gets the signedness correct. |
Thanks Richard for taking a look!
clang/include/clang/Basic/DiagnosticCommonKinds.td | ||
---|---|---|
190 | I tried to follow the strategy used for 'long long' (only warn with -Wlong-long), but I agree that warn-by-default would probably be better. | |
clang/lib/Lex/LiteralSupport.cpp | ||
594–595 | Good idea! It actually helped to flush out the issue when Q and H are mixed together: https://godbolt.org/z/8hj39P93a |
clang/lib/Lex/LiteralSupport.cpp | ||
---|---|---|
636 | Looks like this might fix incorrect acceptance of 1.0f16f and 1.0f16L in CUDA mode too :) | |
clang/lib/Sema/SemaExpr.cpp | ||
3934–3941 | I'd like to see some test coverage for this, particularly the testing of the high bit to determine if we're forming a signed or unsigned literal. | |
3973 | Can we get a test for this case? (Eg, 5'000'000'000z with -triple i686-linux.) |
clang/lib/Lex/LiteralSupport.cpp | ||
---|---|---|
636 | Nice! | |
clang/lib/Sema/SemaExpr.cpp | ||
3892 | I now begin to think that we should probably also prohibit things like 0x1234z to be implicitly interpreted as unsigned. Wdyt? | |
3992 | I think this branch should also be covered, since we probably don't want promotion from size_t to ULL but instead have a separate diagnostic that size_t is out-of-range. I've added another diagnostic and branch here. |
Address comments. Also:
- always issue a new diagnostic if size_t/ssize_t is out of range;
- prohibit numbers with 'z' suffix and non-10-radix to be interpreted as unsigned.
clang/include/clang/Basic/DiagnosticCommonKinds.td | ||
---|---|---|
198–200 | I wonder if it'd be better to say 'ssize_t' instead of signed 'size_t' here? The latter sounds self-contradictory since size_t is an unsigned type. | |
clang/lib/Sema/SemaExpr.cpp | ||
3892 | Table 8 says that a z-suffixed literal has type ssize_t (if the value fits) or size_t (otherwise), so I think taking AllowUnsigned into account is correct (although perhaps surprising). | |
3992 | Thanks, that makes sense to me. I agree that a z-suffixed literal should never have any type other than size_t or ssize_t; we don't want to fall back to unsigned long long as an extension. | |
clang/test/SemaCXX/size_t-literal.cpp | ||
79 | This case looks valid to me. |
clang/include/clang/Basic/DiagnosticCommonKinds.td | ||
---|---|---|
198–200 | I thought about it (and actually had it first as ssize_t). The problem with ssize_t is that it's not defined by C or C++ Standards, AFAIK, but by POSIX. The proposal calls it "the signed integer type corresponding to std::size_t", so I decided to shorten it to "signed 'size_t'". However, I don't have strong opinion on this. | |
clang/lib/Sema/SemaExpr.cpp | ||
3892 | Thanks for pointing this out! Changed it back and fixed the tests. |
Thanks! Looks good to me. Please wait a day or so in case Aaron has more comments before going ahead.
clang/include/clang/Basic/DiagnosticCommonKinds.td | ||
---|---|---|
198–200 | That makes sense. OK, I don't have a strong opinion either, so let's go with what you have. | |
clang/lib/Sema/SemaExpr.cpp | ||
3992–3993 | (Mostly to make it clear that this can happen for size_t regardless of the u suffix.) | |
clang/www/cxx_status.html | ||
1273 ↗ | (On Diff #334263) |
clang/lib/Frontend/InitPreprocessor.cpp | ||
---|---|---|
593–594 | I was also wondering about this. I've checked that we also do the same for other feature macros, such as __cpp_binary_literals, which is defined for -std>=c++14 while at the same time is allowed as an extension before C++14. Therefore I decided to mimic the behaviour. |
LGTM, thank you for the patch!
clang/lib/Frontend/InitPreprocessor.cpp | ||
---|---|---|
593–594 | Thanks for checking on that! I think that seems defensible enough. :-) |
Aaron, Richard, thanks a lot for reviewing this!
Sorry, forgot to add amend the commit with 'Differential revision', closing this manually..
clang/lib/Frontend/InitPreprocessor.cpp | ||
---|---|---|
593–594 | Btw, thus far libc++ has tended to make the opposite choice: for example, libc++ defines __cpp_lib_variant == 202102 in all modes, because the programmer conceivably might be depending on that macro to make some decision, so we want to make sure it reflects the specific semantics that we implement. (For __cpp_binary_literals specifically, I agree it doesn't really matter because nobody's going to be making decisions based on the value of this macro.) See https://reviews.llvm.org/D99290#inline-934563 (D96385, D97394) for previous discussions on the libc++ side. |
clang/lib/Frontend/InitPreprocessor.cpp | ||
---|---|---|
593–594 | Thanks for pointing this out, Arthur. I wish there was some consistency, however, I'm not sure if this is easily feasible. I guess the strategy of defining __cpp_size_t_literals on all modes would be problematic, since if the user code depends on __cpp_size_t_literals, it could suddenly receive the extension warning (when compiled with -std<2++2b), which is enabled by default. | |
593–594 |
|
clang/lib/Frontend/InitPreprocessor.cpp | ||
---|---|---|
593–594 |
Ah, yes. Orthogonally to everything I said above, I think it's fair to say that
(*) - with a bonus exception that if the user passes -Wno-blah or -w, that doesn't magically make things be supported |
clang/lib/Frontend/InitPreprocessor.cpp | ||
---|---|---|
593–594 | We have some prior art to draw on here: our __has_extension(X) behavior is that under -pedantic-errors, we don't advertise any extensions beyond the __has_feature(X) set, and otherwise we advertise features even if they will lead to warnings (or errors via -Werror or -Werror=pedantic or similar warning flags). I'm not sure that's necessarily the best thing, since it's only loosely connected to whether the construct in question would lead to (warnings or) errors, but it's consistent with the principle that warning flags shouldn't change behavior (beyond which warnings or errors are emitted) and probably more useful than never advertising extensions in earlier language modes. |
Should this be an ExtWarn? I think we should warn on this by default.