This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Add tests and mark as implemented WG14 N2728 (char16_t & char32_t string literals shall be UTF-16 & UTF-32)
ClosedPublic

Authored by tahonermann on Apr 24 2023, 1:31 PM.

Details

Summary

This change expands testing of UTF-8, UTF-16, and UTF-32 character and string literals as validation that WG14 N2728 (char16_t & char32_t string literals shall be UTF-16 & UTF-32) has been implemented.

WG14 N2728: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2728.htm

Event Timeline

tahonermann created this revision. Apr 24 2023, 1:31 PM
Herald added a project: Restricted Project. Apr 24 2023, 1:31 PM
tahonermann requested review of this revision. Apr 24 2023, 1:31 PM
Herald added a subscriber: cfe-commits.
tahonermann added inline comments. Apr 24 2023, 1:42 PM
clang/test/Lexer/char-literal.cpp
1–4

C++17 and C2x are added so that UTF-8 character literals are exercised. C++20 is added to exercise the change in type of UTF-8 literals due to char8_t.
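As a rough illustration of the type change mentioned above (a sketch, not an excerpt from the test file), the following checks the type of a UTF-8 character literal in both language modes. It assumes the translation unit is compiled as C++17 or later, since UTF-8 character literals are not available before that; the alias name Utf8CharType is made up for this example.

```cpp
#include <type_traits>

// Utf8CharType is a hypothetical alias used only in this sketch. It names the
// type a u8 character literal is expected to have in the current language mode.
#if defined(__cpp_char8_t)
using Utf8CharType = char8_t;  // C++20 and later (or when char8_t is enabled)
#else
using Utf8CharType = char;     // C++17
#endif

// The type of the literal depends on the language mode.
static_assert(std::is_same<decltype(u8'a'), Utf8CharType>::value,
              "u8 character literal has the expected type");

// The value is the same in both modes: the UTF-8 code unit for 'a'.
static_assert(u8'a' == 0x61, "u8'a' encodes U+0061 as a single code unit");
static_assert(sizeof(u8'a') == 1, "a UTF-8 character literal is one byte wide");
```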

48–50

C apparently prefers that programmers use actual control characters rather than naming them via UCNs, even in character and string literals. I know not why, but that is what N3096 6.4.3 (Universal character names) says.

73–99

The unsigned char casts are to work around conversion issues with (signed) char and the change of type to char8_t in C++20 vs C++17.
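The reason such a cast helps can be sketched as follows; this is an assumed reconstruction of the idea, not the code in the test. Where plain char is signed, a UTF-8 code unit such as 0xC3 stored in a char is negative and would not compare equal to 0xC3, whereas char8_t is unsigned. Converting through unsigned char gives the same value in both language modes. The helper name code_unit is hypothetical.

```cpp
// Hypothetical helper for this sketch: read one code unit of a UTF-8 string
// literal as an unsigned value. The element type is char in C++17 and
// char8_t in C++20, so the helper is a template over the character type.
template <typename CharT>
constexpr unsigned code_unit(const CharT *s, int i) {
  // The cast to unsigned char avoids a negative value when char is signed.
  return static_cast<unsigned char>(s[i]);
}

// U+00E9 is encoded in UTF-8 as the two code units 0xC3 0xA9.
static_assert(code_unit(u8"\u00E9", 0) == 0xC3, "lead byte");
static_assert(code_unit(u8"\u00E9", 1) == 0xA9, "continuation byte");
```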

clang/www/c_status.html
932

As far as I can tell, no changes are needed for Clang to implement N2728; UTF-16 and UTF-32 have been used for char16_t and char32_t literals since their introduction in C11 and C++11, so there is no specific Clang version to mark as a conformance point.
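The behavior described above can be checked with a small compile-time sketch along these lines (illustrative only; the helper name code_units is hypothetical and the actual tests in the revision may look different). It uses a code point outside the Basic Multilingual Plane, which distinguishes UTF-16 from UTF-32 by the number of code units.

```cpp
#include <cstddef>

// Hypothetical helper for this sketch: number of code units in a string
// literal, excluding the null terminator.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
  return N - 1;
}

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair while UTF-32 needs a single code unit.
static_assert(code_units(u"\U0001F600") == 2, "UTF-16 surrogate pair");
static_assert(u"\U0001F600"[0] == 0xD83D, "high surrogate");
static_assert(u"\U0001F600"[1] == 0xDE00, "low surrogate");

static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");
static_assert(U"\U0001F600"[0] == 0x1F600, "code unit equals the code point");
```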

cor3ntin accepted this revision. Apr 24 2023, 2:12 PM

LGTM (modulo nitpicking comment)

clang/test/Lexer/char-literal.cpp
49

I think these tests would be clearer with a different verify tag rather than an ifdef, but it's kinda preexisting so feel free to ignore.

This revision is now accepted and ready to land. Apr 24 2023, 2:12 PM
aaron.ballman accepted this revision. Apr 25 2023, 9:29 AM

LGTM!

clang/test/Lexer/char-literal.cpp
49

That'd be a nice cleanup for post-commit though.

Thank you @cor3ntin, that was an excellent suggestion; this is much more readable now! I updated the new code to use custom verify tags (I left the existing code alone).

This revision was landed with ongoing or failed builds. Apr 27 2023, 2:27 PM
This revision was automatically updated to reflect the committed changes.
tahonermann marked 2 inline comments as done.