This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] wip/prototype: use LR0 instead of SLR1 table
Needs ReviewPublic

Authored by sammccall on Jun 8 2022, 3:31 PM.

Details

Reviewers
hokein
Summary

This ignores the lookahead token when deciding what to reduce: we always reduce
unconditionally. The main effects are:

  • all potential partial parses are visible, e.g. to error recovery. (With SLR1, the intended reduction may not occur when the error is at the lookahead token.)
  • the size of the LR table is much smaller (In this patch we go from 340kB => 120kB, and the encoding isn't efficient)
  • we explore more dead-ends due to the lack of lookahead, and the parser runs slower
  • with this patch, the LR0 table parses ~1/3 slower than the SLR1 table. (LR0 allows code simplifications, so we may win some of this back)

This patch uses eod as a dummy lookahead token for reduce actions:
i.e. reduces are Actions[State, eod]. The structure of glrParse is preserved:
we still track heads as "pending actions". To allow either LR0 or SLR1 to be
used, we perform lookups for both the actual lookahead token and eod.
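The dual lookup described above can be sketched as follows. This is a minimal illustration, not clang-pseudo's real API: the `StateID`/`Token`/`Action` names and the `std::map`-backed table are assumptions made for the example; the real LRTable uses a compact encoding.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using StateID = uint16_t;
enum Token : uint8_t { Identifier, Plus, Eod };

struct Action {
  enum Kind { Shift, Reduce } K;
  unsigned Target; // next state for Shift, rule for Reduce
};

// Hypothetical action table keyed by (state, token).
using LRTable = std::map<std::pair<StateID, Token>, std::vector<Action>>;

// Gather actions for the actual lookahead (shifts, and any SLR1-style
// reduces) plus the LR0-style reduces filed under the dummy eod token.
std::vector<Action> lookup(const LRTable &Table, StateID S, Token Lookahead) {
  std::vector<Action> Result;
  for (Token T : {Lookahead, Eod}) {
    auto It = Table.find({S, T});
    if (It != Table.end())
      Result.insert(Result.end(), It->second.begin(), It->second.end());
  }
  return Result;
}
```

Because reduces always hit the eod slot, the same lookup works whether the table was built LR0-style (reduces only under eod) or SLR1-style (reduces under each follow token).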

In a real implementation we'd probably want to:

  • switch glrParse to track head nodes, instead of pending actions on them. This should simplify the code there somewhat.
  • use a separate storage in LRTable for reduce actions, like a parallel vector<RuleID> (sorted by stateid) and vector<size_t> (indexed by stateid)
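The second bullet amounts to a CSR-style layout: all reduce rules packed into one vector grouped by state, with a per-state offset index. A sketch, under the assumption that `RuleID`/`StateID` are 16-bit (the real widths may differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using RuleID = uint16_t;
using StateID = uint16_t;

struct ReduceTable {
  std::vector<RuleID> Rules;   // all reduce rules, grouped by state
  std::vector<size_t> Offsets; // Offsets[S]..Offsets[S+1] index into Rules

  // Rules to reduce in state S, regardless of lookahead (LR0).
  std::pair<const RuleID *, const RuleID *> reduces(StateID S) const {
    return {Rules.data() + Offsets[S], Rules.data() + Offsets[S + 1]};
  }
};
```

Since reduce actions no longer depend on the lookahead, a state's reduces are a contiguous slice and need no per-token keys at all.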

Diff Detail

Event Timeline

sammccall created this revision.Jun 8 2022, 3:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 3:31 PM
sammccall requested review of this revision.Jun 8 2022, 3:31 PM
hokein added a comment.Jun 9 2022, 2:01 AM

Thanks for experimenting with this!

the size of the LR table is much smaller (In this patch we go from 340kB => 120kB, and the encoding isn't efficient)

This is interesting. I'd expect us to add more reduce actions to the LRTable, so the table size would grow (I guess the saving is because we don't need to store different token kinds, just a single eod token)?

we explore more dead-ends due to lack of lookahead, and parser runs slower
with this patch, the LR0 table parses ~1/3 slower than the SLR1. (LR0 allows code simplifications so we may win some of this back)

(I also did some hack experiment yesterday, got similar results)
A 1/3 performance regression seems like a lot... It is OK for error recovery, but for correct code it is an expensive and unnecessary price to pay.

Since this is a large change, before moving forward, maybe we should consider some alternatives:

  1. use a hybrid solution (this also depends on how many interesting states we have)
    • for interesting states (where it has the sequence-element rule like stmt := expression <dot>), we build LR(0) for these states (like this patch);
    • for other non-interesting states, we keep building the SLR;
  2. as we discussed yesterday, we find all available reduce actions during the error recovery by iterating all tokens. This is slower (but only on the error-recovery path), but I think there are ways to speed that up, e.g. using separate storage in the LRTable
sammccall updated this revision to Diff 438693.Jun 21 2022, 7:06 AM

Deeper version of LR0, still prototype-quality

  • use hashtables instead of binary search where appropriate
  • simplify GLR accordingly
  • add fast-path reduce to reclaim some of the performance (though "reclaim" is maybe the wrong word: with hindsight, this optimization could apply to SLR1 as well)

Numbers to follow.

Thanks for experimenting with this!

the size of the LR table is much smaller (In this patch we go from 340kB => 120kB, and the encoding isn't efficient)

This is interesting. I'd expect us to add more reduce actions to the LRTable, so the table size would grow (I guess the saving is because we don't need to store different token kinds, just a single eod token)?

Yes, before we store (State, Tok) => Reduce(L := R) for every Tok in Follow(L).
Now we only store (State, eod) => Reduce(L := R).
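The saving can be illustrated with a toy count. The follow-set sizes below are made up purely for illustration; the point is only that SLR1 pays one entry per token in Follow(L) while LR0 pays one entry total per reduce:

```cpp
#include <map>
#include <string>
#include <utility>

// Count reduce entries for a set of rules L := R, given hypothetical
// |Follow(L)| sizes: SLR1 stores one (State, Tok) => Reduce entry per
// follow token, LR0 a single (State, eod) => Reduce entry per rule.
std::pair<int, int>
countReduceEntries(const std::map<std::string, int> &FollowSize) {
  int SLR1 = 0, LR0 = 0;
  for (const auto &KV : FollowSize) {
    SLR1 += KV.second; // one entry for every Tok in Follow(L)
    LR0 += 1;          // one eod-keyed entry
  }
  return {SLR1, LR0};
}
```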

sammccall updated this revision to Diff 438701.Jun 21 2022, 7:51 AM

update tests, trim dead code

sammccall updated this revision to Diff 438703.Jun 21 2022, 7:53 AM

remove first/follow, also dead

sammccall updated this revision to Diff 438727.Jun 21 2022, 9:02 AM

update test/binaries, tests now all pass

sammccall updated this revision to Diff 438732.Jun 21 2022, 9:10 AM

tweak & document fast-path

Current numbers from my machine:

Parse time (ClangPseudoBenchmark SemaCodeComplete.cpp --benchmark_filter=glrParse --benchmark_min_time=10)

old: glrParse    167033209 ns    167026892 ns           84 bytes_per_second=2.24031M/s
new: glrParse    192320371 ns    192305530 ns           72 bytes_per_second=1.94582M/s

13% slowdown

LR Table (clang-pseudo -source SemaCodeComplete.cpp -strip-directives -print-statistics):

old:
Statistics of the LR parsing table:
    number of states: 1480
    number of actions: 84136
    size of the table (bytes): 342564
new: 
Statistics of the LR parsing table:
    number of actions: shift=13741 reduce=1115 goto=14789
    size of the table (bytes): 401538

20% increase, presumably due to sparseness of the shift/goto hashtables; I'll dig into it

Forest (clang-pseudo -source SemaCodeComplete.cpp -strip-directives -print-statistics):

old:
Forest bytes: 11055328 nodes: 651678
GSS bytes: 20640 nodes: 574449
new:
Forest bytes: 16747864 nodes: 989322
GSS bytes: 28832 nodes: 882916

52% increase due to spurious reductions that lookahead would have pruned

My current thinking is this is too much overhead to accept.

I'm going to try to take some of these ideas and apply them to our SLR(1) implementation, in particular storing heads as nodes rather than pending actions.
I think it's worth accepting some slowdown for this, which the fast-path might pay for.