This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Treat invalid UDL as two tokens
Needs RevisionPublic

Authored by rZhBoYao on Aug 20 2023, 11:26 AM.

Details

Reviewers
aaron.ballman
jyknight
tahonermann
Group Reviewers
Restricted Project
Summary

In contrast to Clang-17, we treat an invalid ud-suffix as if whitespace preceded
it only if it can be seen as a macro and the preceding string literal is non-empty.

#define E "!"
const char
  *operator""E(const char*),
  // ""E is a single token as it should be
  *s = "not empty"E;
  // treated as if whitespace preceds E hence a string concat:
  // = "not empty!"

This addresses comments in D153156.

Diff Detail

Event Timeline

rZhBoYao created this revision.Aug 20 2023, 11:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2023, 11:26 AM
rZhBoYao requested review of this revision.Aug 20 2023, 11:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2023, 11:26 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cor3ntin added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
9322

Can you remove <Error> and adapt the calling code to adjust the index?

clang/lib/Lex/Lexer.cpp
1992–2006

I missed that in the previous review, is the FIX-IT here still relevant?

2017

This sounds brittle. I think we are better off looking ahead to the next non-identifier character rather than assuming a size here

2024

This is also sort of brittle, it assumes standard UDL don't have unicode... but that sounds more reasonable today.

aaron.ballman added inline comments.Aug 21 2023, 5:46 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
283–284
rZhBoYao updated this revision to Diff 552060.Aug 21 2023, 9:43 AM
rZhBoYao marked 5 inline comments as done.
rZhBoYao edited the summary of this revision. (Show Details)
rZhBoYao added inline comments.
clang/lib/Lex/Lexer.cpp
1992–2006

Yes, I also put back the fixit test.

rsmith added a subscriber: rsmith.Aug 21 2023, 12:06 PM
rsmith added inline comments.
clang/docs/ReleaseNotes.rst
97

I don't think this is accurate. Clang supported CWG1473 before these changes, as far as I can see: all valid code under CWG1473 was accepted, and invalid code was diagnosed (by default). Rather, what has changed is the behavior for invalid code: instead of treating an invalid ""blah as two tokens always, in order to accept as much old code as possible, we now treat it as two tokens only when blah is defined as a macro name.

This is still a breaking change in some cases, for users of -Wno-deprecated-literal-operator, eg:

#define FOO(xyz) ""xyz

... now will be lexed as a single invalid token rather than two tokens.

I'm not sure what the motivation for making changes here was, and D153156's description doesn't really help me understand it. Is the goal to improve the diagnostic quality for these kinds of errors on invalid code? Is there some example for which Clang's behavior with regard to CWG1473 was non-conforming? Something else?

clang/lib/Lex/Lexer.cpp
2020

Reverse the order of these checks to do the cheaper check first and to avoid the possibility of reading off the end of the input.

2027–2029

That's still a breaking change compared to what we designed -Wno-reserved-literal-operator to do. How often does it happen in practice that someone has both an ill-formed literal operator *and* a macro defined to the same name as the suffix?

2035

This doesn't check whether the identifier is currently defined as a macro, in the presence of modules. It also won't be correct if the lexer has got substantially ahead of the preprocessor, and the #define has been lexed but not yet preprocessed.

Overall, it's not really possible to tell from here whether an identifier is defined as a macro. To do this properly, you should instead produce a single token here and teach the preprocessor to consider splitting it into two tokens if the suffix is a reserved ud-suffix naming a defined macro. In principle you could also check from the preprocessor whether the previous produced token was operator and use that as part of the decision...

clang/test/CXX/drs/dr14xx.cpp
487

I don't think this is correct. As far as I can tell, Clang has correctly implemented CWG1473 since version 3.2.

tahonermann requested changes to this revision.Aug 22 2023, 8:52 AM
tahonermann added a subscriber: tahonermann.

Given Richard's comments, it seems that changes are needed.

This revision now requires changes to proceed.Aug 22 2023, 8:52 AM

OK, will do it by the end of this week.