This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Only expand UCNs for raw_identifiers
ClosedPublic

Authored by sammccall on May 5 2022, 3:15 PM.

Details

Summary

It turns out clang::expandUCNs only works on tokens that contain valid UCNs
and no other random escapes, and clang only uses it on raw_identifiers.

Currently we can hit an assertion by creating tokens with stray non-valid-UCN
backslashes in them.

Fortunately, expanding UCNs in raw_identifiers is actually all we need.
Most tokens (keywords, punctuation) can't have them. UCNs in literals can be
treated as escape sequences like \n, even though this isn't the standard's
interpretation. This more or less matches how clang works.
(See https://isocpp.org/files/papers/P2194R0.pdf, which points out that the
standard's description of how UCNs work is misaligned with real implementations.)
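
To illustrate the idea, here is a minimal self-contained sketch of "expand UCNs only for raw identifiers". It is not the code from this patch and does not call clang::expandUCNs; the names Kind, appendUTF8, hexValue and cookText are hypothetical stand-ins. Unlike the real function, the sketch copies a malformed escape through unchanged rather than asserting, and it does not reject surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical token kinds, standing in for clang's tok::TokenKind.
enum class Kind { RawIdentifier, StringLiteral, Other };

// Appends the UTF-8 encoding of code point CP to Out.
void appendUTF8(uint32_t CP, std::string &Out) {
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
}

// Returns the value of hex digit C, or -1 if C is not a hex digit.
int hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Returns the token text with \uXXXX and \UXXXXXXXX expanded to UTF-8,
// but only for raw identifiers. Every other token kind is returned as-is,
// so escapes inside literals are left for later literal processing.
std::string cookText(Kind K, std::string_view Text) {
  if (K != Kind::RawIdentifier)
    return std::string(Text);

  std::string Out;
  for (std::size_t I = 0; I < Text.size();) {
    // Decide how many hex digits a UCN starting here would need.
    std::size_t Digits = 0;
    if (Text[I] == '\\' && I + 1 < Text.size()) {
      if (Text[I + 1] == 'u') Digits = 4;
      else if (Text[I + 1] == 'U') Digits = 8;
    }

    // Try to parse the digits; any failure marks the escape as invalid.
    uint32_t CP = 0;
    bool Valid = Digits != 0 && I + 2 + Digits <= Text.size();
    for (std::size_t J = 0; Valid && J < Digits; ++J) {
      int V = hexValue(Text[I + 2 + J]);
      if (V < 0) Valid = false;
      else CP = (CP << 4) | static_cast<uint32_t>(V);
    }

    if (Valid && CP <= 0x10FFFF) {
      appendUTF8(CP, Out);
      I += 2 + Digits;
    } else {
      // Stray backslash or ordinary character: copy it through untouched.
      Out += Text[I];
      ++I;
    }
  }
  return Out;
}
```

With this shape, a stray backslash never reaches the expansion logic for a non-identifier token, and a raw identifier containing a non-UCN backslash is passed through rather than tripping an assertion.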

Event Timeline

sammccall created this revision. May 5 2022, 3:15 PM
Herald added a project: Restricted Project. May 5 2022, 3:15 PM
sammccall requested review of this revision. May 5 2022, 3:15 PM
sammccall updated this revision to Diff 427475. May 5 2022, 3:16 PM

rename testcase to be more descriptive, add comment

sammccall updated this revision to Diff 427490. May 5 2022, 4:39 PM

oops, forgot testcase

hokein accepted this revision. May 5 2022, 10:30 PM
hokein added inline comments.
clang-tools-extra/pseudo/test/crash/backslashes.c
2 ↗(On Diff #427490)

nit: I'd add --print-tokens to make the purpose of this test clearer.

This revision is now accepted and ready to land. May 5 2022, 10:30 PM
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done. May 5 2022, 11:54 PM