This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Check follow-sets instead of tying reduce actions to lookahead tokens.
ClosedPublic

Authored by sammccall on Jun 23 2022, 3:18 PM.

Details

Summary

Previously, the action table stores a reduce action for each lookahead
token it should allow. These tokens are the followSet(action.rule.target).

In practice, the follow sets are large, so we spend a bunch of time binary
searching around all these essentially-duplicates to check whether our lookahead
token is there.
However the number of reduces for a given state is very small, so we're
much better off linear scanning over them and performing a fast check for each.

D128318 was an attempt at this, storing a bitmap for each reduce.
However it's even more compact just to use the follow sets directly, as
there are fewer nonterminals than (state, rule) pairs. It's also faster.

This specialized approach means unbundling Reduce from other actions in
LRTable, so it's no longer useful to support it in Action. I suspect
Action will soon go away, as we store each kind of action separately.

This improves glrParse speed by 42% (3.30 -> 4.69 MB/s).
It also reduces LR table size by 59% (343 -> 142kB).

Diff Detail

Event Timeline

sammccall created this revision.Jun 23 2022, 3:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 3:18 PM
Herald added a subscriber: mgrang. · View Herald Transcript
sammccall requested review of this revision.Jun 23 2022, 3:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 3:18 PM
hokein accepted this revision.Jun 27 2022, 6:10 AM

This looks a great improvement. A few nits, it looks good to me overall.

clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
136

I think the API is motivated by the LR(0) experimentation, we're now decoupling the reduction look up from the lookahead token, this seems nice (LR(0) only needs getReduceRules, other LR(1) variant need getReduceRules + canFollow).

The API usage for SLR(1) is not quite straight forward at the first glance, would be better to add a small snippet in the comment.

143

nit: use isNonterminal(Nonterminal)? IMO it is clearer.

215

nit: I think rephasing it as: a half-open range of Reduces [ReduceOffset[S], ReduceOffset[S+1]). is clearer -- the current syntax Reduces[ [ReduceOffset[S], ReduceOffset[S+1]) ] is a bit weird (My first attempt was to read it as an unmatch-brackets code snippet)

218

The NT is not quite obvious, what is NT? I think it is the symbol ID, then I don't understand the NT * NUM_TOKENS + tok::Kind(T) part.

219

Oh, I understand it now after reading the build code...

NT here is the non-terminal (symbol to be reduced), T is the lookahead. And FollowSets is conceptually a flat array of a 2d bool array [Nonterminal, Tokens].

Probably add some more comments.

clang-tools-extra/pseudo/lib/GLR.cpp
371

nit: add a comment about what does return value indicate.

clang-tools-extra/pseudo/lib/grammar/LRTableBuild.cpp
115

make these members as private -- having two public members seems to violate the pattern. (FollowSets can be initialized in the constructor, we might need a separate method for Reduces. or either change the Builder to a struct).

clang-tools-extra/pseudo/unittests/LRTableTest.cpp
28

nit: we use the hard-coded ruleID in the following test, I'd suggest add a trailing comment for the rule ID for each rule here, for better code readability.

38

It would be better to change it to an IDENTIFIER to reflect the grammar above.

39

nit: I'd move this on Line67, as this token serves as an invalid token according to the grammar.

42

the row is usual for terminals, and the reduce is a separate table, which seems a bit confusing -- I would probably split the reduce table out (adding an empty col between term and reduce)

45

nit: acc => R2 (acc) (sorry, I didn't update the comment in the removing-accept patch)

52

nit: add a /*RuleID=*/ comment annotation for the second argument.

This revision is now accepted and ready to land.Jun 27 2022, 6:10 AM
sammccall marked 13 inline comments as done.Jun 27 2022, 3:34 PM
sammccall added inline comments.
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
136

Added.

In fact it wasn't motivated by the LR(0) stuff, at least not consciously/directly.
Rather I just wanted to avoid having to allocate a data structure to copy a filtered list into!

clang-tools-extra/pseudo/lib/grammar/LRTableBuild.cpp
115

This whole class is private and exists purely for the convenience of buildSLR()/buildForTests, I don't think adding wrapping methods really helps them out.

Changed Builder to a struct and made everything public.

clang-tools-extra/pseudo/unittests/LRTableTest.cpp
42

I moved it to a separate (1D) table above ReduceEntries, and at that point it looked exactly like the definition of ReduceEntries.

So I don't think it adds anything that's not obvious from the code, I dropped the comment.

sammccall updated this revision to Diff 440415.Jun 27 2022, 3:34 PM
sammccall marked 3 inline comments as done.

address comments

This revision was landed with ongoing or failed builds.Jun 27 2022, 3:36 PM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Jul 1 2022, 9:25 AM

Our downstream bots have failed when using LLVM_ENABLE_EXPENSIVE_CHECKS=ON , and I think (unless my bisecting got screwed up) that it started to happen after this patch.

We typically get

Failed Tests (10):
  ClangPseudo :: lr-build-basic.test
  ClangPseudo :: lr-build-conflicts.test
  clangPseudo Unit Tests :: ./ClangPseudoTests/10/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/11/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/12/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/13/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/14/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/15/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/16/31
  clangPseudo Unit Tests :: ./ClangPseudoTests/9/31

And the failing stack traces look like this:

/lib/gcc/x86_64-pc-linux-gnu/9.3.0/../../../../include/c++/9.3.0/bits/stl_vector.h:1060: std::vector::const_reference std::vector<unsigned short, std::allocator<unsigned short> >::operator[](std::vector::size_type) const [_Tp = unsigned short, _Alloc = std::allocator<unsigned short>]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
 #0 0x000000000047b6b3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x47b6b3)
 #1 0x00000000004795cc llvm::sys::RunSignalHandlers() (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x4795cc)
 #2 0x000000000047bcc6 SignalHandler(int) (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x47bcc6)
 #3 0x00007f9ace5af630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00007f9acd6ce387 __GI_raise (/lib64/libc.so.6+0x36387)
 #5 0x00007f9acd6cfa78 __GI_abort (/lib64/libc.so.6+0x37a78)
 #6 0x000000000045b988 (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x45b988)
 #7 0x00000000004e42a6 clang::pseudo::(anonymous namespace)::GLRReduce::popPending() (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x4e42a6)
 #8 0x00000000004e15f2 clang::pseudo::(anonymous namespace)::GLRReduce::operator()(std::vector<clang::pseudo::GSS::Node const*, std::allocator<clang::pseudo::GSS::Node const*> >&, unsigned short) (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x4e15f2)
 #9 0x00000000004e2e15 clang::pseudo::glrReduce(std::vector<clang::pseudo::GSS::Node const*, std::allocator<clang::pseudo::GSS::Node const*> >&, unsigned short, clang::pseudo::ParseParams const&) (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x4e2e15)
#10 0x0000000000432379 clang::pseudo::(anonymous namespace)::GLRTest_ReduceConflictsSplitting_Test::TestBody() (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x432379)
#11 0x00000000004b8b6c testing::Test::Run() (/repo/llvm/build-all-expensive/tools/clang/tools/extra/pseudo/unittests/./ClangPseudoTests+0x4b8b6c)

Thanks for the report. There is an out-of-bound issue in LRTable::getReduceRules, fixed in c99827349927a44334f2b04139168efd0bc87cd3.

bjope added a comment.Jul 2 2022, 3:11 PM

Thanks for the report. There is an out-of-bound issue in LRTable::getReduceRules, fixed in c99827349927a44334f2b04139168efd0bc87cd3.

Great, thanks for the fix!