This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Handle non-ASCII after line splicing
ClosedPublic

Authored by cor3ntin on Sep 1 2023, 2:05 AM.

Details

Summary

int a\
ス;

Failed to be parsed as a valid identifier.

Fixes #65156

Diff Detail

Event Timeline

cor3ntin created this revision.Sep 1 2023, 2:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2023, 2:05 AM
cor3ntin requested review of this revision.Sep 1 2023, 2:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2023, 2:05 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
shafik added inline comments.Sep 1 2023, 8:53 AM
clang/lib/Lex/Lexer.cpp
1757

I think that is what you meant to say

1761

Do we need to verify that FirstCodeUnitSize is not zero?

1798

So if we get here it mean we have the splice to skip?

cor3ntin updated this revision to Diff 555563.Sep 1 2023, 11:12 PM
cor3ntin marked an inline comment as done.

Add and fix comments

clang/lib/Lex/Lexer.cpp
1761

Nope, we always get a char of getCharAndSize so it's either 1 in the normal case, or N if there are escaped newlines or trigraphs

1798

Yes. And we need to emit a diag. It's a bit subtle so i added a comment.

shafik added a comment.Sep 5 2023, 4:20 PM

I think this looks good but I would like @tahonermann to review this as well.

tahonermann accepted this revision.Sep 6 2023, 12:48 PM

This looks good to me modulo a couple of nits.

clang/include/clang/Lex/Lexer.h
806–807

Tok should be documented here.

clang/lib/Lex/Lexer.cpp
1800
This revision is now accepted and ready to land.Sep 6 2023, 12:48 PM
cor3ntin updated this revision to Diff 556076.Sep 6 2023, 1:37 PM

Address Tom's feedback

tahonermann accepted this revision.Sep 6 2023, 1:47 PM

Even better than I asked for. I held back on suggesting the change of Tok to Result to match tryConsumeIdentifierUCN(), but you made that change anyway! You must have read my mind! :)

This revision was landed with ongoing or failed builds.Sep 6 2023, 2:20 PM
This revision was automatically updated to reflect the committed changes.

Even better than I asked for. I held back on suggesting the change of Tok to Result to match tryConsumeIdentifierUCN(), but you made that change anyway! You must have read my mind! :)

Haha. Once i added docs the lack of symmetry with the previous function became jarring!
Thanks!