This is an archive of the discontinued LLVM Phabricator instance.

[Lex] Fix a crash in updateConsecutiveMacroArgTokens.
ClosedPublic

Authored by hokein on Feb 14 2023, 3:41 PM.

Diff Detail

Unit TestsFailed

Event Timeline

hokein created this revision.Feb 14 2023, 3:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2023, 3:41 PM
hokein requested review of this revision.Feb 14 2023, 3:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2023, 3:41 PM
hokein added inline comments.Feb 14 2023, 3:48 PM
clang/test/Lexer/update_consecutive_macro_crash.cpp
9

More details about the issue :

  • due to the error recovery, the clang lexer inserts a pair of () around the macro argument int{,}, so we will see [(, int, {, ,, }, )] tokens
  • however, the size of file id for the macro argument only take account the written tokens which are [int, {, ,, }], and the extra inserted ) token is at the Limit source location which triggers an empty Partition.
shafik added a subscriber: shafik.Feb 14 2023, 4:21 PM
shafik added inline comments.
clang/lib/Lex/TokenLexer.cpp
1022–1031

You had refactored the previous version, was this the way it was handled before? Do we need to add more test to ensure that we don't have any other unintended issues?

1030

I don't believe we include links to issues in comments but you should definitely add the information to the commit message and when folks look at the commit they can get the that context there.

hokein added inline comments.Feb 15 2023, 1:14 AM
clang/lib/Lex/TokenLexer.cpp
1022–1031

was this the way it was handled before?

Yeah, the previous implementation literally checked whether every token is from the same file id by calling getFileID.

Do we need to add more test to ensure that we don't have any other unintended issues?

I'd like to add more, but it is hard to foresee all invalid cases (clang has different error-recovery strategies that might have different side effects).

1030

I'm happy to remove it. But is there any guideline discouraging this practise? I have already seen this pattern in LLVM project. I think this is based on the author's judgement (IMO, it seems better to keep it for this case).

sammccall accepted this revision.Feb 22 2023, 12:50 AM
sammccall added inline comments.
clang/lib/Lex/TokenLexer.cpp
1028

Naively, it seems like if recovery tokens could lead us to read one-past-the-end, it could also cause us to read two-past-the-end in which case this check would fail.

We're relying on this never happening, so I think the comment should refer more specifically to the tokens that get inserted.

Maybe "recovery only ever inserts a single token past the end of the FileID, specifically the ) when a macro-arg containing a comma should be guarded by parentheses"?

1030

I see 50+ refs to github issues, 70+ links to bugs.llvm.org, 1K+ references to PRNNNNN.
The reference seems useful when it's a strange workaround like this.

clang/test/Lexer/update_consecutive_macro_crash.cpp
9

can you add these as comments to the test?

This revision is now accepted and ready to land.Feb 22 2023, 12:50 AM
hokein updated this revision to Diff 499405.Feb 22 2023, 1:15 AM
hokein marked 2 inline comments as done.

address comments.

This revision was landed with ongoing or failed builds.Feb 22 2023, 1:16 AM
This revision was automatically updated to reflect the committed changes.