This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Correctly handle $, @, and ` when represented as UCN
ClosedPublic

Authored by cor3ntin on Jun 23 2023, 4:10 AM.

Details

Summary

This covers

This patch

  • Disallow representing $ as a UCN in all language mode, which did not properly work (see GH62133), and which in made ill-formed in C++ and C by P2558 and N3124 respectively
  • Allow a UCN for any character in C2X, in string and character literals

Fixes #62133

Diff Detail

Event Timeline

cor3ntin created this revision.Jun 23 2023, 4:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 4:10 AM
cor3ntin requested review of this revision.Jun 23 2023, 4:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 4:10 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cor3ntin added a reviewer: Restricted Project.Jun 23 2023, 4:11 AM

I'm a bit confused as to whether we're planning to support this as an extension in older language modes or whether we expect this to be a new feature only enabled in newer language modes. Also, this should update the C and CXX status pages and have a release note.

clang/include/clang/Basic/DiagnosticLexKinds.td
200–207

Don't we need similar warnings for C++2c?

clang/lib/Lex/Lexer.cpp
3487–3489

You should update the comments here so they match the source.

3499–3500

Should this be looking for C2x?

clang/lib/Lex/LiteralSupport.cpp
630

Won't this give the C warning in C++98 mode? (We should add test coverage for C++98)

cor3ntin added inline comments.Jun 26 2023, 7:49 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
200–207

The change in C++ dates back to C++11, the warning you are looking for are just above (unchanged text)

clang/lib/Lex/Lexer.cpp
3499–3500

See https://github.com/llvm/llvm-project/issues/62133
This never worked. Making it work in older C mode is something I looked into but it didn't seem worth it.

clang/lib/Lex/LiteralSupport.cpp
630

I'll be adding test and replace by Features.CPlusPlus. But to your question, no, in C++98, IsError is true so that branch is not taken

aaron.ballman added inline comments.Jun 26 2023, 8:18 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
200–207

Ahhh, thanks!

clang/test/Preprocessor/ucn-allowed-chars.c
86

I'd like more context for this diagnostic message because I have no idea what it's trying to warn.

clang/test/Preprocessor/ucn-pp-identifier.c
117

Ugh, that's unfortunate. I wonder if we should change this to use a regex so the line number doesn't matter?

cor3ntin updated this revision to Diff 534576.Jun 26 2023, 8:39 AM

Address Aaron's comments

cor3ntin updated this revision to Diff 534589.Jun 26 2023, 8:54 AM

Add regex in test file

cor3ntin added inline comments.Jun 26 2023, 9:06 AM
clang/test/Preprocessor/ucn-allowed-chars.c
86

The warning is either

whitespace required after macro name 
ISO C99 requires whitespace after the macro name

in C++ and C respectively.

Given it's not salient, I did that. Let me know if you want me to add a different verify tag

cor3ntin updated this revision to Diff 534595.Jun 26 2023, 9:11 AM
cor3ntin marked an inline comment as done.

Update comment to match C2x

cor3ntin marked 4 inline comments as done.Jun 26 2023, 9:12 AM

Changes generally LGTM, but I'll leave it to @tahonermann to do the final sign-off given this is related to text.

clang/test/Preprocessor/ucn-allowed-chars.c
86

I'd say leave it as-is for now, but we really should unify these diagnostics (then we can at least use something more descriptive like whitespace required after macro name, leaving off the ISO C bits)

clang/test/Preprocessor/ucn-pp-identifier.c
160

This is another spot where we really should be spelling out the diagnostic.

cor3ntin updated this revision to Diff 534602.Jun 26 2023, 9:25 AM

Fix diagnostic test in ucn-pp-identifiers.c

cor3ntin marked an inline comment as done.Jun 26 2023, 9:25 AM
shafik added a subscriber: shafik.Jun 26 2023, 9:27 AM
shafik added inline comments.
clang/test/Lexer/utf8-char-literal.cpp
22

Why change this test line?

cor3ntin added inline comments.Jun 26 2023, 9:30 AM
clang/test/Lexer/utf8-char-literal.cpp
22

0x80 is too big to be represented in a char (and we have tests for that, I wanted a positive tests for control characters)

Changes generally LGTM, but I'll leave it to @tahonermann to do the final sign-off given this is related to text.

@tahonermann Ping :)

tahonermann requested changes to this revision.Jul 6 2023, 2:41 PM

Changes look good; I added a number of suggested edits for minor issues.

clang/docs/ReleaseNotes.rst
203–204
530–531
clang/include/clang/Basic/DiagnosticLexKinds.td
204–207
clang/lib/Lex/Lexer.cpp
3489–3490
clang/lib/Lex/LiteralSupport.cpp
617–618
630
clang/test/Preprocessor/ucn-allowed-chars.c
16

I assume this line was deleted to minimize the disruption to line numbers due to the additional RUN line?

This revision now requires changes to proceed.Jul 6 2023, 2:41 PM
cor3ntin marked 6 inline comments as done.Jul 6 2023, 9:52 PM
cor3ntin added inline comments.
clang/test/Preprocessor/ucn-allowed-chars.c
16

Indeed!

cor3ntin updated this revision to Diff 537996.Jul 6 2023, 11:01 PM

Address Tom's comments

@tahonermann I'll probably merge that EoW unless you scream!

tahonermann accepted this revision.Jul 11 2023, 2:04 PM

I'll probably merge that EoW unless you scream!

Why wait? :) Looks good!

This revision is now accepted and ready to land.Jul 11 2023, 2:04 PM
This revision was landed with ongoing or failed builds.Jul 11 2023, 11:03 PM
This revision was automatically updated to reflect the committed changes.