This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Store shift and goto actions in a compact structure with faster lookup.
ClosedPublic

Authored by sammccall on Jun 23 2022, 6:30 PM.

Details

Summary

The actions table is very compact but the binary search to find the
correct action is relatively expensive.
A hashtable is faster but pretty large (64 bits per value, plus empty
slots, and lookup is constant time but not trivial due to collisions).

The structure in this patch uses 1.25 bits per entry (whether present or absent)
plus the size of the values, and lookup is trivial.

The Shift table is 119KB = 27KB values + 92KB keys.
The Goto table is 86KB = 30KB values + 57KB keys.
(Goto has a smaller keyspace as #nonterminals < #terminals, and more entries).

This patch improves glrParse speed by 28%: 4.69 => 5.99 MB/s
Overall the table grows by 60%: 142 => 228KB.

By comparison, DenseMap<unsigned, StateID> is "only" 16% faster (5.43 MB/s),
and results in a 285% larger table (547 KB) vs the baseline.

Diff Detail

Event Timeline

sammccall created this revision.Jun 23 2022, 6:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 6:30 PM
Herald added a subscriber: mgrang. · View Herald Transcript
sammccall requested review of this revision.Jun 23 2022, 6:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 6:30 PM
sammccall updated this revision to Diff 439589.Jun 23 2022, 6:32 PM

remove debugging code

hokein accepted this revision.Jul 4 2022, 5:48 AM
hokein added inline comments.
clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h
72

The Action can be removed now (I guess your plan is to remove it in a follow-up patch).

127–128

If we're going to change the return type to llvm::Optional.

I think we can remove this comment , as the the invariant is not guaranteed by the API now, but rather than moving to the caller.

130

add nonterminal assertion?

220

As discussed offline, we could be more specific for the name (something like StateIDTable).

We might also want to put the key calculation logic (numStates() * Nonterminal + State, and numStates() * symbolToToken(Terminal) + State) to an inline function, we have a few places using them duplicately.

230

nit: V => KV
it feels more natural to make it a static build method, but up to you.

This revision is now accepted and ready to land.Jul 4 2022, 5:48 AM
This revision was landed with ongoing or failed builds.Jul 4 2022, 10:40 AM
This revision was automatically updated to reflect the committed changes.
sammccall marked 3 inline comments as done.