This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add a new annotation token: annot_repl_input_end
ClosedPublic

Authored by junaire on Apr 22 2023, 9:06 AM.

Details

Summary

This patch is the first part of the below RFC:
https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493

It adds an annotation token that will replace the original EOF token
when we are in incremental C++ mode. In addition, when we're
parsing an ExprStmt and there's a missing semicolon after the
expression, we set a marker in the annotation token and continue
parsing.

Eventually, we propagate this info in ParseTopLevelStmtDecl and are able
to mark this Decl as something we want to do value printing. Below is a
example:

clang-repl> int x = 42;
clang-repl> x
x is a TopLevelStmtDecl and without a semicolon, we should set
it's IsSemiMissing bit so we can do something interesting in
ASTConsumer::HandleTopLevelDecl.

The idea of the annotation token is proposed by Richard Smith, thanks!

Signed-off-by: Jun Zhang <jun@junz.org>

Diff Detail

Event Timeline

junaire created this revision.Apr 22 2023, 9:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2023, 9:06 AM
junaire requested review of this revision.Apr 22 2023, 9:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2023, 9:06 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
junaire edited the summary of this revision. (Show Details)Apr 22 2023, 9:26 AM
junaire edited the summary of this revision. (Show Details)

This patch is part of D141215. I split it as some jit people requested. I don't know if there's a good way to test the patch, please leave a comment if you have an idea. Thanks.

clang/lib/CodeGen/CodeGenModule.cpp
7219 ↗(On Diff #516079)

I'll removed the change after https://reviews.llvm.org/D148435 landed.

rsmith added a subscriber: rsmith.May 1 2023, 12:07 PM

This looks like a nice improvement to me.

clang/lib/Interpreter/IncrementalParser.cpp
162
254–270

Not really related to this patch, but in DelayedTemplateParsing mode this code appears to expect two annot_repl_input_end tokens in a row.

clang/lib/Parse/ParseStmt.cpp
547–563

I don't think we need to check whether incremental processing is enabled here -- if not, we shouldn't see the "end of REPL input" token at all.

clang/lib/Parse/Parser.cpp
617–618

Do we need to do this here? IncrementalParser already seems to take care of this, and the logic here would be easier to reason about if Parser never steps past an annot_repl_input_end token, and such tokens instead are only ever consumed by the REPL.

Are there other users of incremental processing mode, other than the REPL / IncrementalParser?

junaire updated this revision to Diff 518738.May 2 2023, 8:06 AM
junaire marked 3 inline comments as done.

Address comments, thanks!

clang/lib/Interpreter/IncrementalParser.cpp
162

Sorry but that's a private member function so I can't do that. Should I make it public?

clang/lib/Parse/Parser.cpp
617–618

Make sense to me, I'll remove it.

v.g.vassilev added inline comments.May 8 2023, 2:18 AM
clang/include/clang/AST/Decl.h
4345

We should change the accessors to the new field name accordingly.

@rsmith, any ideas how to support value printing of constructs such as int i = 12, eg, that's not a top-level statement declaration but worth printing its value? It is the shorthand of int i = 12; i.

clang/lib/CodeGen/CodeGenModule.cpp
7219 ↗(On Diff #516079)

That can probably go away now.

junaire updated this revision to Diff 520313.May 8 2023, 3:11 AM

Update + Rebase

junaire marked 2 inline comments as done.May 8 2023, 3:12 AM
v.g.vassilev accepted this revision.May 15 2023, 1:25 PM

LGTM! Thanks for putting so much efforts into this!

This revision is now accepted and ready to land.May 15 2023, 1:25 PM
This revision was automatically updated to reflect the committed changes.

Are there other users of incremental processing mode, other than the REPL / IncrementalParser?

It seems Swift's clang importer also uses incremental processing mode, I'm assuming to keep the TUScope and CurLexer alive after EOF. We also end up using the same context with various actions, which leads to a few hangs as there's various checks for eof only, eg. ReadPCHAndPreprocessAction, PreprocessOnlyAction, and RewriteIncludesAction. There's also quite a few places in the parser that only check for eof as well (I just grepped for Tok.isNot(tok::eof)).

Should these all be updated to handle annot_repl_input_end or should we instead have a different flag that we set for this purpose instead of co-opting isIncrementalProcessingEnabled?

Are there other users of incremental processing mode, other than the REPL / IncrementalParser?

It seems Swift's clang importer also uses incremental processing mode, I'm assuming to keep the TUScope and CurLexer alive after EOF. We also end up using the same context with various actions, which leads to a few hangs as there's various checks for eof only, eg. ReadPCHAndPreprocessAction, PreprocessOnlyAction, and RewriteIncludesAction. There's also quite a few places in the parser that only check for eof as well (I just grepped for Tok.isNot(tok::eof)).

Should these all be updated to handle annot_repl_input_end or should we instead have a different flag that we set for this purpose instead of co-opting isIncrementalProcessingEnabled?

I'd prefer to avoid adding a new flag. Is there a way to see how does the diff looks like? Maybe it would make more sense to use the annot_repl_input_end token? If the token name does not capture well the generic use-case I am happy to change it to something better.

bnbarham added a comment.EditedAug 4 2023, 9:11 AM

I'd prefer to avoid adding a new flag. Is there a way to see how does the diff looks like?

You mean for a new flag? I don't have one prepared, but it would basically just be adding an extra check where isIncrementalProcessingEnabled is currently used to skip resetting CurLexer and TUScope. I don't believe we'd want any difference in parsing in Swift's clang importer use case.

Maybe it would make more sense to use the annot_repl_input_end token? If the token name does not capture well the generic use-case I am happy to change it to something better.

The issue is that all these actions (and the parser checks) can run with and without isIncrementalProcessingEnabled, so they would need to check both eof and annot_repl_input_end. For some concrete locations (but no where near complete):
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L82
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L955
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/Rewrite/InclusionRewriter.cpp#L542
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Parse/ParseExprCXX.cpp#L4070

I'd prefer to avoid adding a new flag. Is there a way to see how does the diff looks like?

You mean for a new flag? I don't have one prepared, but it would basically just be adding an extra check where isIncrementalProcessingEnabled is currently used to skip resetting CurLexer and TUScope. I don't believe we'd want any difference in parsing in Swift's clang importer use case.

I meant that I'd like to figure out if we could use the annot_repl_input_end before considering a new flag.

Maybe it would make more sense to use the annot_repl_input_end token? If the token name does not capture well the generic use-case I am happy to change it to something better.

The issue is that all these actions (and the parser checks) can run with and without isIncrementalProcessingEnabled, so they would need to check both eof and annot_repl_input_end. For some concrete locations (but no where near complete):
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L82
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L955
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/Rewrite/InclusionRewriter.cpp#L542

These three seem to be useful for clang-repl too, so we might want to extend it with like !(eof || annot_repl_input_end)

https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Parse/ParseExprCXX.cpp#L4070

That looks a bit obscure to me. Looks like we are trying to reach some error recovery anchor but do you happen to have some use case at hand? In principle we could do the same as for the other 3.

I meant that I'd like to figure out if we could use the annot_repl_input_end before considering a new flag.

Oh sure, that's why I'm here :). Just trying to figure out what we think the best way forward is.

The issue is that all these actions (and the parser checks) can run with and without isIncrementalProcessingEnabled, so they would need to check both eof and annot_repl_input_end. For some concrete locations (but no where near complete):
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L82
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L955
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/Rewrite/InclusionRewriter.cpp#L542

These three seem to be useful for clang-repl too, so we might want to extend it with like !(eof || annot_repl_input_end)

Sure, though to be clear this isn't all of them, there's also the dump actions too (though not sure that makes sense for clang-repl) and a few in other files (VerifyDiagnosticConsumer.cpp, PrintPreprocessedOutput.cpp).

https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Parse/ParseExprCXX.cpp#L4070

That looks a bit obscure to me. Looks like we are trying to reach some error recovery anchor but do you happen to have some use case at hand? In principle we could do the same as for the other 3.

This was just one I picked at random from my grep over the parser. It's unclear how often this would be hit, but I have to assume it (and the other references) can, even if they're obscure/unlikely. My main point is that !eof is being used somewhat frequently to end loops, but with this change they will now all never end.

I meant that I'd like to figure out if we could use the annot_repl_input_end before considering a new flag.

Oh sure, that's why I'm here :). Just trying to figure out what we think the best way forward is.

The issue is that all these actions (and the parser checks) can run with and without isIncrementalProcessingEnabled, so they would need to check both eof and annot_repl_input_end. For some concrete locations (but no where near complete):
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L82
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/FrontendActions.cpp#L955
https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Frontend/Rewrite/InclusionRewriter.cpp#L542

These three seem to be useful for clang-repl too, so we might want to extend it with like !(eof || annot_repl_input_end)

Sure, though to be clear this isn't all of them, there's also the dump actions too (though not sure that makes sense for clang-repl)

The dump actions are also somewhat interesting because in that case we will dump the delta that was accumulated and repeat.

and a few in other files (VerifyDiagnosticConsumer.cpp, PrintPreprocessedOutput.cpp).

These ones are also interesting.

https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Parse/ParseExprCXX.cpp#L4070

That looks a bit obscure to me. Looks like we are trying to reach some error recovery anchor but do you happen to have some use case at hand? In principle we could do the same as for the other 3.

This was just one I picked at random from my grep over the parser. It's unclear how often this would be hit, but I have to assume it (and the other references) can, even if they're obscure/unlikely. My main point is that !eof is being used somewhat frequently to end loops, but with this change they will now all never end.

Since eof there were several eof-like tokens that were added: annot_module_begin, annot_module_end, tok::annot_module_include which are commonly handled in isEofOrEom. We could update these uses with isEofOrEom but I am pretty sure that @rsmith will point out cases where that's not a good very good idea :) We could either experiment with using isEofOrEom or have a similar utility function only for the two tokens (say isEoI -- end of input whatever that means).

https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Parse/ParseExprCXX.cpp#L4070

That looks a bit obscure to me. Looks like we are trying to reach some error recovery anchor but do you happen to have some use case at hand? In principle we could do the same as for the other 3.

This was just one I picked at random from my grep over the parser. It's unclear how often this would be hit, but I have to assume it (and the other references) can, even if they're obscure/unlikely. My main point is that !eof is being used somewhat frequently to end loops, but with this change they will now all never end.

Since eof there were several eof-like tokens that were added: annot_module_begin, annot_module_end, tok::annot_module_include which are commonly handled in isEofOrEom. We could update these uses with isEofOrEom but I am pretty sure that @rsmith will point out cases where that's not a good very good idea :) We could either experiment with using isEofOrEom or have a similar utility function only for the two tokens (say isEoI -- end of input whatever that means).

My other concern here is that it seems our use cases are somewhat different, eg. we wouldn't want any parsing differences and while I don't know why this is yet, the removal of:

// Skip over the EOF token, flagging end of previous input for incremental
// processing
if (PP.isIncrementalProcessingEnabled() && Tok.is(tok::eof))
  ConsumeToken();

is causing issues for us as well.

https://github.com/llvm/llvm-project/blob/df6b35e329ebecad6dc3bfb83183e482eb7a0020/clang/lib/Parse/ParseExprCXX.cpp#L4070

That looks a bit obscure to me. Looks like we are trying to reach some error recovery anchor but do you happen to have some use case at hand? In principle we could do the same as for the other 3.

This was just one I picked at random from my grep over the parser. It's unclear how often this would be hit, but I have to assume it (and the other references) can, even if they're obscure/unlikely. My main point is that !eof is being used somewhat frequently to end loops, but with this change they will now all never end.

Since eof there were several eof-like tokens that were added: annot_module_begin, annot_module_end, tok::annot_module_include which are commonly handled in isEofOrEom. We could update these uses with isEofOrEom but I am pretty sure that @rsmith will point out cases where that's not a good very good idea :) We could either experiment with using isEofOrEom or have a similar utility function only for the two tokens (say isEoI -- end of input whatever that means).

My other concern here is that it seems our use cases are somewhat different, eg. we wouldn't want any parsing differences and while I don't know why this is yet, the removal of:

I think I understand now. We basically want a mode which keeps some of the clang objects alive and ready to process more input. And on top, for clang-repl we want to enable some special parsing mode for the top-level statement declarations.

// Skip over the EOF token, flagging end of previous input for incremental
// processing
if (PP.isIncrementalProcessingEnabled() && Tok.is(tok::eof))
  ConsumeToken();

is causing issues for us as well.

Yes, that now seems to be moved on the user side as we thought it would be dead code. We can bring that one back if that's the only issue you see with that approach.

My other concern here is that it seems our use cases are somewhat different, eg. we wouldn't want any parsing differences and while I don't know why this is yet, the removal of:

I think I understand now. We basically want a mode which keeps some of the clang objects alive and ready to process more input. And on top, for clang-repl we want to enable some special parsing mode for the top-level statement declarations.

Yeah, I'd say that sums it up pretty well.

// Skip over the EOF token, flagging end of previous input for incremental
// processing
if (PP.isIncrementalProcessingEnabled() && Tok.is(tok::eof))
  ConsumeToken();

is causing issues for us as well.

Yes, that now seems to be moved on the user side as we thought it would be dead code. We can bring that one back if that's the only issue you see with that approach.

I think it would make sense to bring it back under the mode you spoke about above, yeah 👍

My other concern here is that it seems our use cases are somewhat different, eg. we wouldn't want any parsing differences and while I don't know why this is yet, the removal of:

I think I understand now. We basically want a mode which keeps some of the clang objects alive and ready to process more input. And on top, for clang-repl we want to enable some special parsing mode for the top-level statement declarations.

Yeah, I'd say that sums it up pretty well.

// Skip over the EOF token, flagging end of previous input for incremental
// processing
if (PP.isIncrementalProcessingEnabled() && Tok.is(tok::eof))
  ConsumeToken();

is causing issues for us as well.

Yes, that now seems to be moved on the user side as we thought it would be dead code. We can bring that one back if that's the only issue you see with that approach.

I think it would make sense to bring it back under the mode you spoke about above, yeah 👍

So, in that case we should bring back the boolean flag for incremental processing and keep the IncrementalExtensions LanguageOption separate. In that case IncrementalExtensions would mean that we must turn on incremental processing for managing lifetime and only use the language option when extending the parsing logic. However, I think the problem would be what to do with the tok::eof and tok::annot_repl_input_end? I'd probably need @aaron.ballman or @rsmith here...

So, in that case we should bring back the boolean flag for incremental processing and keep the IncrementalExtensions LanguageOption separate. In that case IncrementalExtensions would mean that we must turn on incremental processing for managing lifetime and only use the language option when extending the parsing logic. However, I think the problem would be what to do with the tok::eof and tok::annot_repl_input_end? I'd probably need @aaron.ballman or @rsmith here...

Would you be happy to make that change, or should I put it up? Separating the options and what to do about the token in general could be separate PRs.

So, in that case we should bring back the boolean flag for incremental processing and keep the IncrementalExtensions LanguageOption separate. In that case IncrementalExtensions would mean that we must turn on incremental processing for managing lifetime and only use the language option when extending the parsing logic. However, I think the problem would be what to do with the tok::eof and tok::annot_repl_input_end? I'd probably need @aaron.ballman or @rsmith here...

Would you be happy to make that change, or should I put it up? Separating the options and what to do about the token in general could be separate PRs.

If you could do it would be better as it would capture better your concrete use case.

My other concern here is that it seems our use cases are somewhat different, eg. we wouldn't want any parsing differences and while I don't know why this is yet, the removal of:

I think I understand now. We basically want a mode which keeps some of the clang objects alive and ready to process more input. And on top, for clang-repl we want to enable some special parsing mode for the top-level statement declarations.

Yeah, I'd say that sums it up pretty well.

// Skip over the EOF token, flagging end of previous input for incremental
// processing
if (PP.isIncrementalProcessingEnabled() && Tok.is(tok::eof))
  ConsumeToken();

is causing issues for us as well.

Yes, that now seems to be moved on the user side as we thought it would be dead code. We can bring that one back if that's the only issue you see with that approach.

I think it would make sense to bring it back under the mode you spoke about above, yeah 👍

So, in that case we should bring back the boolean flag for incremental processing and keep the IncrementalExtensions LanguageOption separate. In that case IncrementalExtensions would mean that we must turn on incremental processing for managing lifetime and only use the language option when extending the parsing logic. However, I think the problem would be what to do with the tok::eof and tok::annot_repl_input_end? I'd probably need @aaron.ballman or @rsmith here...

I missed this ping earlier in the month, sorry for that! My intuition is that this is handled by isEofOrEom() as done in this patch -- if incremental extensions is not enabled, then we wouldn't see an annot_repl_input_end token in the first place, right?