This is an archive of the discontinued LLVM Phabricator instance.

[Lex] Introduce Preprocessor::LexTokensUntilEOF()
ClosedPublic

Authored by Hahnfeld on Aug 21 2023, 5:08 AM.

Details

Summary

This new method repeatedly calls Lex() until end of file is reached and optionally fills a std::vector of Tokens. Use it in Clang's unit tests to avoid quite some code duplication.

Diff Detail

Event Timeline

Hahnfeld created this revision.Aug 21 2023, 5:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 5:08 AM
Hahnfeld requested review of this revision.Aug 21 2023, 5:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 5:08 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
v.g.vassilev added inline comments.
clang/lib/Lex/Preprocessor.cpp
998

Shouldn't we take the results as a LexAll(std::vector<Token> &result)?

Perhaps we should rename this interface to LexTokensUntilEOF?

aaron.ballman added inline comments.Sep 14 2023, 5:32 AM
clang/lib/Lex/Preprocessor.cpp
998

I think both suggestions make sense, especially given this doesn't return the EOF token (so it doesn't really lex *all*). Though, given how many callers do not want to store the tokens, perhaps it should take a std::optional or a pointer to the vector so we can skip the storage work if it's unnecessary?

1000

I'd prefer not to assume the token stream has an EOF token (perhaps the stream is one only being used to parse until the eod token instead), so if we can turn this into a non-infinite loop, that would make me more comfortable.

Hahnfeld updated this revision to Diff 556791.Sep 14 2023, 9:13 AM
Hahnfeld retitled this revision from [Lex] Introduce Preprocessor::LexAll() to [Lex] Introduce Preprocessor::LexTokensUntilEOF().
Hahnfeld edited the summary of this revision. (Show Details)

Rename to Preprocessor::LexTokensUntilEOF()

Hahnfeld marked 2 inline comments as done.Sep 14 2023, 9:19 AM
Hahnfeld added inline comments.
clang/lib/Lex/Preprocessor.cpp
998

The new LexTokensUntilEOF now takes an optional std::vector<Token> *Tokens = nullptr.

1000

I'm not sure I understand entirely. Do you want something like

tok.isOneOf(tok::unknown, tok::eof, tok::eod)

instead of tok.is(tok::eof)? Can this happen at the level of the Preprocessor?

aaron.ballman added inline comments.Sep 14 2023, 10:15 AM
clang/lib/Lex/Preprocessor.cpp
1000

I was thinking something more along the lines of:

if (Tokens) {
  for (Token Tok; !Tok.isOneOf(tok::eof, tok::eod); Lex(Tok))
    Tokens->push_back(Tok);
}

but I hadn't thought about tok::unknown; that might be a good one to also include given that clangd operates on partial sources.

v.g.vassilev added inline comments.Sep 14 2023, 11:01 AM
clang/lib/Lex/Preprocessor.cpp
1000

I was wondering if we could somehow merge this routine with Parser::SkipUntil since they seem to be doing a very similar tasks.

aaron.ballman added inline comments.Sep 14 2023, 12:14 PM
clang/lib/Lex/Preprocessor.cpp
1000

That could perhaps end up looking reasonable (they do similar tasks aside from collecting the tokens that are being skipped). Do you need the interface to be on Preprocessor or Parser though (or does it not really matter for you)?

Hahnfeld updated this revision to Diff 556834.Sep 15 2023, 1:13 AM
Hahnfeld marked an inline comment as done.

Handle all of tok::unknown, tok::eof, tok::eod.

clang/lib/Lex/Preprocessor.cpp
1000

tok.isOneOf(tok::unknown, tok::eof, tok::eod)

I implemented this check, let me know if this looks reasonable. The code you posted doesn't do what we need because we also want to lex if Tokens is nullptr, so the hierarchy must be an if inside the loop.
Given these additional token kinds, does UntilEOF still make sense or do we want another name? Note that I'll leave repl_input_end to https://reviews.llvm.org/D158415.

I was wondering if we could somehow merge this routine with Parser::SkipUntil since they seem to be doing a very similar tasks.

I'm not sure this makes sense, given that Parser::SkipUntil requires some knowledge about the structural input. At the very least, I'd prefer not to go into that direction for this change.

This looks reasonable to me, but I'd wait for @aaron.ballman greenlight as he seems to have had more comments/ideas.

clang/lib/Lex/Preprocessor.cpp
1000

I am not sure I understand the reasoning but I somewhat see that having Parser's SkipUntil be implemented with our new Preprocessor::LexUntil... would require a lot more work. How about adding a fixme note capturing this as a future possible refactoring?

aaron.ballman accepted this revision.Oct 4 2023, 7:02 AM

LGTM aside from a minor naming nit. I think finding a way to unify with SkipUntil would be good follow-up work, though.

clang/lib/Lex/Preprocessor.cpp
1000

Minor naming convention nit.

This revision is now accepted and ready to land.Oct 4 2023, 7:02 AM
Hahnfeld updated this revision to Diff 557603.Oct 5 2023, 12:12 AM
Hahnfeld marked an inline comment as done.

Address minor naming convention nit.

clang/lib/Lex/Preprocessor.cpp
1000

@v.g.vassilev Parser::SkipUntil has a long switch statement looking at the token and taking special actions depending on their kind and the context it appears in. I don't see how to generalize this in the way LexTokensUntilEOF works.

This revision was landed with ongoing or failed builds.Oct 5 2023, 2:04 AM
This revision was automatically updated to reflect the committed changes.