This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Correctly parse C99 digraphs: "<:", ":>", "<%", "%>", "%:", "%:%:".
Closed, Public

Authored by curdeius on Feb 1 2022, 8:23 AM.

Details

Summary

Fixes https://github.com/llvm/llvm-project/issues/31592.

This commit enables lexing of digraphs in C++11 and onwards.
Enabling them in C++03 is error-prone, as it would unconditionally treat sequences like "<:" as digraphs, even if they are followed by a single colon, e.g. "<::" would be treated as "[:" instead of "<" followed by "::". Lexing in C++11 doesn't have this problem, as it looks ahead at the characters that follow.
The relevant excerpt from Lexer::LexTokenInternal:

// C++0x [lex.pptoken]p3:
//  Otherwise, if the next three characters are <:: and the subsequent
//  character is neither : nor >, the < is treated as a preprocessor
//  token by itself and not as the first character of the alternative
//  token <:.
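The lookahead rule quoted above can be sketched in a few lines. This is only an illustration of the standard's wording, not the code from Lexer::LexTokenInternal; the helper name startsWithLSquareDigraph and its cxx11 flag are hypothetical, and the sketch assumes a C++17 compiler for std::string_view.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (not part of clang): returns true when the "<:" at the
// start of `s` should be lexed as the digraph for '['.
// `cxx11` selects the C++11 rule from [lex.pptoken]p3; when it is false the
// helper mimics the unconditional C++03 treatment described in the summary.
bool startsWithLSquareDigraph(std::string_view s, bool cxx11) {
  if (s.substr(0, 2) != "<:")
    return false;
  if (!cxx11)
    return true; // C++03: "<:" is always taken as a digraph.
  if (s.substr(0, 3) == "<::") {
    // Look one character past "<::"; '\0' stands in for end of input.
    char next = s.size() > 3 ? s[3] : '\0';
    if (next != ':' && next != '>')
      return false; // '<' stands alone, "::" follows as its own token.
  }
  return true;
}

// A few cases worked out by hand against the quoted rule.
void checkExamples() {
  assert(startsWithLSquareDigraph("<:", true));    // plain digraph
  assert(!startsWithLSquareDigraph("<::a", true)); // '<' then "::"
  assert(startsWithLSquareDigraph("<:::", true));  // "<:" then "::"
  assert(startsWithLSquareDigraph("<::>", true));  // "<:" then ":>"
  assert(startsWithLSquareDigraph("<::a", false)); // C++03: becomes "[:"
}
```

With the C++11 rule, input such as "<::a" keeps '<' and "::" as separate tokens, which is the case the summary describes as going wrong under unconditional C++03 handling.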

Also, note that both clang and gcc turn on digraphs by default (-fdigraphs), so clang-format should match this behaviour.

Event Timeline

curdeius requested review of this revision. Feb 1 2022, 8:23 AM
curdeius created this revision.
Herald added a project: Restricted Project. Feb 1 2022, 8:23 AM
Herald added a subscriber: cfe-commits.
curdeius added a project: Restricted Project. Feb 1 2022, 8:25 AM
owenpan accepted this revision. Feb 1 2022, 9:40 AM
owenpan added inline comments.
clang/lib/Format/Format.cpp
3246

“unconditionally”?

This revision is now accepted and ready to land. Feb 1 2022, 9:40 AM
MyDeveloperDay accepted this revision. Feb 1 2022, 9:42 AM

What the fuck. I know of trigraphs, and that we got rid of them, but these...

clang/lib/Format/Format.cpp
3246

yeah

curdeius edited the summary of this revision. Feb 2 2022, 1:15 AM
curdeius added inline comments.
clang/lib/Format/Format.cpp
3246

Good catch. Thanks!