[Clang] CWG1473: do not err on the lack of space after operator""
In addition:
- Fix tests for CWG2521 deprecation warning.
- Enable -Wdeprecated-literal-operator by default.
Side note: GCC stopped diagnosing this since GCC 4.9
Paths
| Differential D153156
[Clang] CWG1473: do not err on the lack of space after operator"" AcceptedPublic Authored by rZhBoYao on Jun 16 2023, 10:43 AM.
Details
Summary [Clang] CWG1473: do not err on the lack of space after operator"" In addition:
Side note: GCC stopped diagnosing this since GCC 4.9
Diff Detail
Event Timeline
rZhBoYao marked an inline comment as done. Comment Actions There seems to be no clear objection to this and https://reviews.llvm.org/D152632 and the CI are passing for both. Any chance that I merge these two before llvm 17 branch out (IIRC the next Monday)? Comment Actions LGTM aside from some minor concerns.
As this isn't fixing a regression, I think it'd be better to let it bake a while longer in Clang 18 (unless there's some extenuating circumstances where we need to land this sooner).
This revision is now accepted and ready to land.Aug 15 2023, 6:51 AM rZhBoYao marked 2 inline comments as done. Comment ActionsThank Aaron and Vlad for reviewing this! Just updating the diff to reflect the final version. This revision was landed with ongoing or failed builds.Aug 17 2023, 8:12 AM Closed by commit rGf2583f3acf59: [Clang] CWG1473: do not err on the lack of space after operator"" (authored by rZhBoYao). · Explain Why This revision was automatically updated to reflect the committed changes. Comment Actions Unfortunately the option -Wno-reserved-user-defined-literal fails after this: #define MYTHING "_something_" const char* f() { return "ONE"MYTHING"TWO"; } $ clang -Wno-reserved-user-defined-literal repro.cxx repro.cxx:4:15: error: no matching literal operator for call to 'operator""MYTHING' with arguments of types 'const char *' and 'unsigned long', and no matching literal operator template 4 | return "ONE"MYTHING"TWO"; | ^ 1 error generated. Comment Actions
This is conforming right? Correct me if I'm wrong. My reading of https://eel.is/c++draft/lex.pptoken#3.3 is that "ONE"MYTHING"TWO" is a single preprocessing-token during phase 3 (https://eel.is/c++draft/lex.phases#1.3). Can @aaron.ballman confirm this? Comment Actions
The diagnostic behavior is correct. MYTHING doesn't get expanded until phase 4 (http://eel.is/c++draft/lex.phases#1.4), so this appears as "ONE"MYTHING as a single preprocessor token: https://eel.is/c++draft/lex.ext#nt:user-defined-string-literal and that token is an invalid UDL. Comment Actions
IIUC, the question is not whether the diagnostic is correct, but rather why -Wno-reserved-user-defined-literal does not workaround the breakage. Is that right? An example of this in the wild is older versions of swig: https://github.com/swig/swig/blob/939dd5e1c8c17e5f8b38747bf18e9041ab5f377e/Source/Modules/php.cxx#L1724 Comment Actions
Correct per the specification, but the purpose of -Wno-reserved-user-defined-literal is to precisely to permit such pre-C++11 code like the example above to continue to compile. This change breaks that option. (Note that it is an error-by-default "warning".) If proper spec-conformance means we can no longer support the ability to allow such out-of-spec pre-c++11 code to work anymore, that's probably OK...but, in that case, we also need to eliminate the warning option, and mention the change in the release notes. However, it's unclear to me why we need to do so, since the spec says you're not allowed to define UDL without an underscore prefix Yet, what this change does, and why it breaks the above functionality, is that it now allows defining and calling user-defined literal operators which don't begin with an underscore. That seems like a potentially undesirable change? And, if that _was_ an intended change, then we have other diagnostics which need to be fixed up now: const char*operator ""foo(const char *, unsigned long); const char* f() { return "hi"foo; } emits <source>:1:12: warning: user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator [-Wuser-defined-literals] and yet, now it does invoke the operator... Comment Actions
Works in C++98 mode tho. TIL people exploit this to mix pre-C++11 code into modern C++. I agree that a reminder for those people in the release note is needed.
Agreed
We simply stop pretending a whitespace precedes an invalid ud-suffix as that affects the grammar production and therefore diagnosis in Sema. const char* operator""b(const char*, decltype(sizeof 0)); const char* f() { #define b "a" return "ONE"b; // NOW: IFNDR but calls operator""b // // BEFORE: string concat by exploiting the impl of // ext_reserved_user_defined_literal (controlled by // -Wreserved-user-defined-literal diag group) } Comment Actions
Because that code is ill-formed, Clang diagnosed it with an error by default. Isn't that preferable to accepting it by default? And disabling the error treats it as string-concat, because that enables users to remain compatible with code written assuming C++98 semantics for macros in string concatenation. (This support was a quite-intentionally added feature, not an accident of implementation!) FWIW, GCC does the same thing, but with an on-by-default-warning that doesn't even default to error severity. I do think it's OK to remove support for this extension if spec-compliance requires it, but AFAICT, that's not the case here, so I think it would be preferable to preserve the previous behavior. Comment Actions I’ll see what I can do regarding reviving the string concat behavior. It feels like that a more refined treatment than before can be achieved. Maybe adds an imaginary preceding whitespace only when we can find a macro with the same name. Comment Actions The other common breakage I'm seeing is code that hasn't yet migrated from the "PRI" format macros, of which there's an example in LLVM itself: https://github.com/llvm/llvm-project/blob/6a0e536ccfef1f7bd64ee4153b4efc0aeecf28d4/clang/test/SemaCXX/cxx11-compat.cpp#L38 e.g.: https://godbolt.org/z/v3boc9E6T
I mostly agree with this, at least the spirit of it, but do we usually remove the -Wno ability to suppress a common error because we don't like it? (Or justify it post-facto if we accidentally did that?) If removing this is important for some other aspect of clang development, it would be nice to have notice ahead of time that this option will go away. I guess +1 to what James said. Comment Actions
Sorry, I don't want to mislead: I just mean there's a handy example in clang's unit tests. I don't see any instances of this in LLVM non-test code. Comment Actions Given the concerns raised (the PRIuS one in https://godbolt.org/z/v3boc9E6T seems like a good example), and that the fix in D158372 doesn't seem straight-forward, could this be reverted to unbreak things until a revised version is ready? @aaron.ballman what do you think? Comment Actions
I agree this should be reverted from 17.x so we have more time to consider an appropriate path forward. Supporting evidence that this will likely be disruptive: https://sourcegraph.com/search?q=context:global+%5E%28%22%5B%5E%22%5D%2B%22PRI.*%29%24+lang:C%2B%2B+-file:.*test.*&patternType=regexp&case=yes&sm=0&groupBy=repo Comment Actions
I don't believe this is on 17.x. I'm suggesting reverting to green on trunk. Comment Actions
Heh, review fatigue caught me; I even mentioned on this review that it should not go into 17.x so we get more bake time with it! Sorry for that. I think it is beneficial for us to get more feedback on problems caused by the changes, in case they help us decide how we want to move forward solving them. However, if the changes on trunk are sufficiently disruptive that we probably won't get more (novel) feedback anyway, then I think a revert is reasonable. Comment Actions I went ahead and pushed a revert of this, since this change was pretty disruptive, at least in our experience. I could be convinced that it's worth sunsetting this extension for accepting C-style format string macro code, but that doesn't seem like the original intent of this change. Maybe that's the right thing to do, but it should be an intentionally considered change. Comment Actions Reopening the review since it ended up reverted, but I do think we DO want to have at least some of the changes proposed here (and/or from the pending D158372). At the very least, the existing diagnostics are currently quite misleading/wrong, and the work here was helping to improve that. Taking this example: float operator ""XXX(const char *); It results in three diagnostics: <source>:1:18: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Wreserved-user-defined-literal] 1 | float operator ""XXX(const char *); | ^ | <source>:1:18: warning: identifier 'XXX' preceded by whitespace in a literal operator declaration is deprecated [-Wdeprecated-literal-operator] 1 | float operator ""XXX(const char *); | ~~~~~~~~~~~^~~ | operator""XXX <source>:1:7: warning: user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator [-Wuser-defined-literals] 1 | float operator ""XXX(const char *); | ^ The first and third are enabled by default, the second is not. The first only describes what to do if you wanted it to be string-concat (which would be invalid here...), not if you wanted it to be an UDL-operator definition. And it's not clear that "identifier" really means "macro expanding to a string that I assume you meant" in the message. I think we should not emit that message ONLY if there is a macro named "XXX". The second is just incorrect -- there's already no space in the user’s source-code, the compiler just added one internally! The third seems OK, although if we actually do permit calling such UDLs, would want to remove the last phrase... Then, consider: float operator ""XXX(const char *); float x1 = ""XXX; One might reasonably expect to build when -Wno-reserved-user-defined-literal is specified. But, currently, it does not. I think a reasonable compromise behavior would be:
This revision is now accepted and ready to land.Sep 22 2023, 5:54 AM
Revision Contents
Diff 551141 clang/docs/ReleaseNotes.rst
clang/include/clang/Basic/DiagnosticLexKinds.td
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/lib/Lex/Lexer.cpp
clang/test/CXX/drs/dr14xx.cpp
clang/test/CXX/drs/dr17xx.cpp
clang/test/CXX/drs/dr25xx.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p1.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p10.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p11.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p3.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p4.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p5.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p6.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p7.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p8.cpp
clang/test/CXX/lex/lex.literal/lex.ext/p9.cpp
clang/test/CXX/over/over.oper/over.literal/p2.cpp
clang/test/CXX/over/over.oper/over.literal/p3.cpp
clang/test/CXX/over/over.oper/over.literal/p5.cpp
clang/test/CXX/over/over.oper/over.literal/p6.cpp
clang/test/CXX/over/over.oper/over.literal/p7.cpp
clang/test/CXX/over/over.oper/over.literal/p8.cpp
clang/test/CodeGenCXX/cxx11-user-defined-literal.cpp
clang/test/CodeGenCXX/mangle-ms-cxx11.cpp
clang/test/FixIt/fixit-c++11.cpp
clang/test/OpenMP/amdgcn_ldbl_check.cpp
clang/test/PCH/cxx11-user-defined-literals.cpp
clang/test/Parser/cxx0x-literal-operators.cpp
clang/test/Parser/cxx11-user-defined-literals.cpp
clang/test/SemaCXX/cxx11-ast-print.cpp
clang/test/SemaCXX/cxx11-user-defined-literals-unused.cpp
clang/test/SemaCXX/cxx11-user-defined-literals.cpp
clang/test/SemaCXX/cxx1y-user-defined-literals.cpp
clang/test/SemaCXX/cxx1z-user-defined-literals.cpp
clang/test/SemaCXX/cxx2a-consteval.cpp
clang/test/SemaCXX/cxx98-compat.cpp
clang/test/SemaCXX/literal-operators.cpp
clang/test/SemaCXX/no-warn-user-defined-literals-in-system-headers.cpp
clang/test/SemaCXX/reserved-identifier.cpp
clang/test/SemaCXX/warn-xor-as-pow.cpp
clang/www/cxx_dr_status.html
|
Can you move this down, so that offset is negative (after @)?