This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Start rules are `_ := start-symbol EOF`, improve recovery.
ClosedPublic

Authored by sammccall on Jul 25 2022, 11:46 PM.

Details

Summary

Previously we were calling glrRecover() ad-hoc at the end of input.
Two main problems with this:

  • glrRecover() on two separate code paths is inelegant
  • We may have to recover several times in succession (e.g. to exit from nested scopes), so we need a loop at end-of-file

Having an actual shift action for an EOF terminal allows us to handle
both concerns in the main shift/recover/reduce loop.

This revealed a recovery design bug where recovery could enter a loop by
repeatedly choosing the same parent to identically recover from.
Addressed this by allowing each node to be used as a recovery base once.

Diff Detail

Event Timeline

sammccall created this revision.Jul 25 2022, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 11:46 PM
sammccall requested review of this revision.Jul 25 2022, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 11:46 PM

+1 on this change, it would make the expose-lookahead-index-to-guard change easier.

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74

haven't look at it deeply -- is this bug related to this eof change? This looks like a different bug in recovery.

clang-tools-extra/pseudo/lib/Forest.cpp
191

nit: in the underlying TokenStream implementation, tokens() has a trailing eof token, I think we can fold this into the above loop (if we expose a token_eof() method in TokenStream). Not sure we should do this.

sammccall added inline comments.Jul 26 2022, 7:21 AM
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74

They're "related" in that they both fix repeated-recovery scenarios.
This change fixes that we can hit an infinite loop when applying recovery repeatedly.
The eof change fixes that recovery is (erroneously) only applied once at eof.

I hoped to cover them with the same testcase, which tests repeated recovery at EOF. I can extract this change with a separate test if you like, though it will be very similar to the one I have here.

clang-tools-extra/pseudo/lib/Forest.cpp
191

I think this doesn't generalize well... at the moment we're parsing the whole stream, but in future we likely want to parse a subrange (pp-disabled regions?). In such a case we would still want the terminating EOF terminal as a device for parsing, even though there's no corresponding token.

hokein added inline comments.Jul 29 2022, 1:27 AM
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74

This change fixes that we can hit an infinite loop when applying recovery repeatedly.

I'm more worried about this bug, I think this is an important bug, worth a separate patch to fix it, right now it looks like a join-effort in the eof change.

The eof change fixes that recovery is (erroneously) only applied once at eof.

Not sure I follow this. I think the eof change is basically to remove a technical debt (avoid the special case and repeated code after main parsing loop). Am I missing something?

clang-tools-extra/pseudo/lib/Forest.cpp
191

oh, ok, that's fair enough.

clang-tools-extra/pseudo/test/lr-build-basic.test
16–18

there should be a State 4 (with a _ := expr EOF • item)

clang-tools-extra/pseudo/unittests/GLRTest.cpp
622

remove the debugging statement.

sammccall added inline comments.Jul 29 2022, 1:50 AM
clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
74

Not sure I follow this. I think the eof change is basically to remove a technical debt (avoid the special case and repeated code after main parsing loop). Am I missing something?

There's a bug lurking in that tech debt: if recovery does not advance the cursor then we should go around the loop again, but if it happens at eof then (in the old code) there's no loop to go around at all.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 19 2022, 7:50 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
sammccall marked 4 inline comments as done.