This is an archive of the discontinued LLVM Phabricator instance.

[clang][Lexer] Make raw and normal lexer behave the same for line comments
ClosedPublic

Authored by kadircet on Jan 28 2022, 7:24 AM.

Details

Summary

Normally there are heruistics in lexer to treat //* specially in
language modes that don't have line comments (to emit /). Unfortunately this
only applied to the first occurence of a line comment inside the file, as the
subsequent line comments were treated as if language had support for them.

This unfortunately only holds in normal lexing mode, as in raw mode all
occurences of line comments received this treatment, which created discrepancies
when comparing expanded and spelled tokens.

The proper fix would be to just make sure we treat all the line comments with a
subsequent * the same way, but it would imply breaking some code that's
accepted by clang today. So instead we introduce the same bug into raw lexing
mode.

Fixes https://github.com/clangd/clangd/issues/1003.

Diff Detail

Event Timeline

kadircet created this revision.Jan 28 2022, 7:24 AM
kadircet requested review of this revision.Jan 28 2022, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2022, 7:24 AM
sammccall added inline comments.Jan 31 2022, 6:25 AM
clang/lib/Lex/Lexer.cpp
2387

It's much less obvious what this does in raw lexing mode.

My rough understanding is raw lexing mode is ~stateless and we may choose to recreate the lexer, start lexing at any point etc. So having a stateful flag flip won't actually result in the same text being lexed the same way, because we may not run in the same sequence.

sammccall accepted this revision.Jan 31 2022, 6:47 AM
sammccall added inline comments.
clang/lib/Lex/Lexer.cpp
2387

OK, discussed offline:

  • if lexing the whole file in a loop (motivating case), this will DTRT
  • if lexing one token and thtrowing away the lexer, modifying LineComment is a no-op
  • if lexing some range, we're going to be stateful but with possibly-wrong initial state

As I understand, this fixes case 1, leaves 2 broken, and slightly changes the way in which 3 is broken, which seems OK.
Like you say, the right fix is to take the statefulness out of the language extension, which can happen after the branch.

This revision is now accepted and ready to land.Jan 31 2022, 6:47 AM
This revision was landed with ongoing or failed builds.Jan 31 2022, 7:15 AM
This revision was automatically updated to reflect the committed changes.
kadircet marked 2 inline comments as done.
probinson added inline comments.
clang/unittests/Lex/LexerTest.cpp
654

@kadircet @sammccall It turns out this while loop is zero-trip; the test assertions in the body are never executed. Replace it with assert(false); and the test doesn't crash.

That means ToksView is empty from the start. This is probably not what you wanted?

In other words, the test does not exercise the patch. This is pretty serious. It needs to be fixed or reverted.

kadircet added inline comments.Feb 7 2022, 4:15 AM
clang/unittests/Lex/LexerTest.cpp
654

thanks! the discrepancy was actually having tokens in one but not the other, hence tests were failing initially but after the fix the tests passed (by making both lexing modes return none) so it skipped my attention. sending out a fix now.