This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Sort nonterminals based on their reduction order.
ClosedPublic

Authored by hokein on Mar 23 2022, 3:57 AM.

Details

Summary

Reductions need to be performed in a careful order in GLR parser, to
make sure we gather all alternatives before creating an ambigous forest
node.

This patch encodes the nonterminal order into the rule id, so that we
can efficiently to determinal ordering of reductions in GLR parser.

This patch also abstracts to a TestGrammar, which is shared among tests.

This is a part of the GLR parser, https://reviews.llvm.org/D121368,
https://reviews.llvm.org/D121150

Diff Detail

Event Timeline

hokein created this revision.Mar 23 2022, 3:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 3:57 AM
Herald added subscribers: mgrang, mgorny. · View Herald Transcript
hokein requested review of this revision.Mar 23 2022, 3:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 3:57 AM
hokein updated this revision to Diff 417567.Mar 23 2022, 4:50 AM

some tweaks.

I think my main suggestion is to sort the symbols, not just the rules.
Having the rules in blocks grouped by the symbol they produce, but not in symbol order, seems confusing to reason about.

The extraction of TestGrammar seems unrelated to the rest of the patch and it *isn't* actually shared between tests yet, which makes it a bit hard to review.

clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h
167–168

I think it's easier to understand if we sort the *symbols* and leave this comment as-is.
It feels a bit strange to sort the symbols alphabetically but sort the rules by some other property of the symbols.

It's also important to describe what relation we're sorting by!

I'd say "The symbols are topologically sorted: if S := T then S has a higher SymbolID than T."

clang-tools-extra/pseudo/lib/GrammarBNF.cpp
56

You could identify dependencies here, and then use them to sort the uniquenonterminals before allocating SymbolIDs

122

The map seems unneccesary - just a vector<pair<SymbolID, SymbolID>> Dependencies, and sort them?

135

(is a vector of symbol id, or of states?)

clang-tools-extra/pseudo/unittests/TestGrammar.h
18 ↗(On Diff #417567)

doc

26 ↗(On Diff #417567)

I think this puts too much to much emphasis on the "single" and breaks the symmetry with "symbol" - i'd prefer just "ruleFor".

(The failure mode here is just that the test crashes 100% of the time, so the risk seems low)

28 ↗(On Diff #417567)

shouldn't these be private, or have better names, and docs when they're set etc?

I think my main suggestion is to sort the symbols, not just the rules.
Having the rules in blocks grouped by the symbol they produce, but not in symbol order, seems confusing to reason about.

We discussed this offline - having alphabetical symbol IDs is certainly nice, and there's no hard technical reason to give rules and symbols the same order.

I've come around to the (surprising!) approach in this patch, and just think we should document it so it's less surprising.

clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h
167–168

Revised suggestion:

RuleID is an index into this array of rule definitions.

Rules with the same target symbol (LHS) are grouped into a single range.
The relative order of different target symbols is *not* by SymbolID, but
rather a topological sort: if S := T then the rules producing T have lower
RuleIDs than rules producing S.
(This strange order simplifies the LR parser: for a given token range, if
we reduce in increasing RuleID order then we need never backtrack -
prerequisite reductions are reached before dependent ones).
hokein updated this revision to Diff 417731.Mar 23 2022, 1:40 PM
hokein marked an inline comment as done.

address comments, remove the extraction of TestGrammar.

The extraction of TestGrammar seems unrelated to the rest of the patch and it *isn't* actually shared between tests yet, which makes it a bit hard to review.

It is used in our previous Forest patch (not landed yet), but you're probably right. I have removed the extraction code in this patch, will send out a followup patch for the extraction afterwards.

clang-tools-extra/pseudo/lib/GrammarBNF.cpp
56

not sure I get the your point -- we could move the topological sort here, but it doesn't seem to be significantly better compared with the current solution.

135

oops, it should be the enum State

sammccall accepted this revision.Mar 23 2022, 3:41 PM

Nice!

clang-tools-extra/pseudo/lib/GrammarBNF.cpp
56

(This related to the suggestion of sorting *symbols*, in which case you want to do it based on the specs, before assigning symbol IDs. It's obsolete now)

92–97

I'd call the variable SymbolOrder - once we've computed the order and decided to use it for symbols, it's no longer locally relevant that it's topological

105

You don't need TopologicalOrder[] wrapping both, just compare symbol IDs?

119

This explanation is a bit confusing, because it refers to a second array doesn't exist and never will.

Maybe "This function returns the sort key for each symbol, the array is indexed by SymbolID."

(I think we're actually returning the inverse of the chosen permutation, but that's not a helpful explanation :-)

142

The grammar contains a cycle involving symbol {0}?

147

this is more tersely llvm::lower_bound(Dependencies, {SID, 0}), making use of the fact that std::less will do what we want

And with the inline lambda is gone, it seems more idiomatic as a for loop:

for (auto It = llvm::lower_bound(...); It != Dependencies.end() && It->first == SID; ++It)
  DFS(It->second);
156

Instead of building up the permutation array and then inverting it afterwards, maybe directly Result[SID] = NextKey++; here?

(Having pre-sized the vector)

clang-tools-extra/pseudo/unittests/GrammarTest.cpp
51

nit: maybe "ruleFor() expected a single rule..." to make it clear this isn't a real grammar requirement

53

nit: rule -> rules

This revision is now accepted and ready to land.Mar 23 2022, 3:41 PM
hokein updated this revision to Diff 417902.Mar 24 2022, 6:12 AM
hokein marked 5 inline comments as done.

address comments.

clang-tools-extra/pseudo/lib/GrammarBNF.cpp
147

thanks, this is a nice suggestion.

156

this is a smart suggestion, but it seems too smart, I find the current way is easier to understand (we first build the top order, and revert it secondly).

This revision was landed with ongoing or failed builds.Mar 24 2022, 6:31 AM
This revision was automatically updated to reflect the committed changes.