Forest node by design is unmutable. To create an ambiguous node, we have
to know all alternatives in advance.
In order to achieve that, we must perform all reductions in a careful
order (see code comments for details), so that we can gather completed
alternatives as a batch, and process them in a single pass.
E.g. considering the following grammar:
bnf TU := stmt TU := expr stmt := expr expr := ID stmt := ID // Ambiguous stmt forest node: // stmt (ambiguous) // / \ // / stmt // | | // stmt expr // \ / // ID
The ambiguous Stmt node is built in a single section where we perform three reductions:
- expr := ID
- stmt := ID
- stmt := expr (enabled after 1 is performed)
We expect to perform them in a batch way with the order {1}, {2, 3}.
When processing the batch {2, 3} where ambiguity happens, we build an
ambiguous node for stmt.
Based on https://reviews.llvm.org/D121150
As discussed offline, the algorithm is simplest if the RuleIDs themselves reflect the topological order in which we want to reduce them.
We require the RuleIDs to be grouped by the SymbolID of their LHS, but I think don't have much of a requirement on SymbolIDs themselves. (Currently they're alphabetical and we binary-search for _ in the grammar constructor, but not much else).
So I think we can just sort the symbols with this topo order when constructing the grammar.
(This is fairly self-contained, probably affects dumps from tests, could be a separate patch if you like)