In general we split a reduce into separate pop and push steps, so that reductions
available at the same time can run in the correct order. The data structures for this are expensive.
When only one reduction is possible at a time, we need not do this: we can pop
and immediately push instead.
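To make the contrast concrete, here is a minimal C++ sketch. It assumes the ordering data structure is a `std::priority_queue` of pending pushes, and it uses a made-up ordering (bases that start later are pushed first). `Node`, `PushSpec`'s fields, `LaterStartFirst`, `push`, `reduceQueued` and `reduceImmediate` are all hypothetical names for illustration, not the parser's real API. The point is only that a single PushSpec has nothing to be ordered against, so buffering it buys nothing.

```cpp
#include <deque>
#include <queue>
#include <vector>

// Hypothetical, simplified GSS node; not the real parser's data structure.
struct Node {
  int State;
  int StartTok;                      // first token covered by this node
  std::vector<const Node*> Parents;  // predecessor links
};

// A pending push: the base reached by popping, plus the state to push.
struct PushSpec {
  const Node* Base;
  int GotoState;
};

// Assumed ordering, for illustration only: bases that start later come first.
struct LaterStartFirst {
  bool operator()(const PushSpec& A, const PushSpec& B) const {
    return A.Base->StartTok < B.Base->StartTok;  // max-heap on StartTok
  }
};

// Push: create a new head on top of Base. std::deque keeps references stable.
const Node* push(const PushSpec& P, std::deque<Node>& Arena) {
  Arena.push_back(Node{P.GotoState, P.Base->StartTok, {P.Base}});
  return &Arena.back();
}

// General path: buffer every PushSpec, then push them in priority order.
std::vector<const Node*> reduceQueued(const std::vector<PushSpec>& Specs,
                                      std::deque<Node>& Arena) {
  std::priority_queue<PushSpec, std::vector<PushSpec>, LaterStartFirst> Queue(
      LaterStartFirst{}, Specs);
  std::vector<const Node*> NewHeads;
  while (!Queue.empty()) {
    NewHeads.push_back(push(Queue.top(), Arena));
    Queue.pop();
  }
  return NewHeads;
}

// Fast path: a single PushSpec has nothing to be ordered against,
// so pop-then-push can happen immediately without touching the queue.
const Node* reduceImmediate(const PushSpec& Spec, std::deque<Node>& Arena) {
  return push(Spec, Arena);
}
```

With one PushSpec, both functions build the same new head; only the queued version pays for the heap.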
Strictly speaking, this is correct whenever we yield only one concurrent PushSpec.
This patch recognizes a trivial but common subset of these cases:
- there must be no pending pushes and only one head available to pop
- the head must have only one reduction rule
- the reduction path must be a straight line (no multiple parents)
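The three conditions can be checked cheaply before falling back to the general machinery. The following C++ sketch is one way to express that check, assuming a simplified GSS where each node stores its parents and each rule stores how many symbols it pops. `tryTrivialReduce`, `Rule`, and the separate `RulesForHead` argument are hypothetical; the real patch may structure this differently.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified GSS node; not the real parser's data structure.
struct Node {
  int State;
  std::vector<const Node*> Parents;
};

// Hypothetical grammar rule: Length is the number of symbols it pops.
struct Rule {
  std::size_t Length;
  int Nonterminal;
};

// The single push produced by the trivial case.
struct PushSpec {
  const Node* Base;
  const Rule* R;
};

// Returns the one PushSpec when the trivial case applies; std::nullopt tells
// the caller to use the general pop/push path instead.
std::optional<PushSpec> tryTrivialReduce(
    const std::vector<PushSpec>& PendingPushes,
    const std::vector<const Node*>& HeadsToPop,
    const std::vector<const Rule*>& RulesForHead) {
  // 1. No pending pushes, and exactly one head available to pop.
  if (!PendingPushes.empty() || HeadsToPop.size() != 1) return std::nullopt;
  // 2. That head has exactly one reduction rule.
  if (RulesForHead.size() != 1) return std::nullopt;
  const Rule* R = RulesForHead.front();
  // 3. The popped path is a straight line: every node we step back through
  //    has exactly one parent, so exactly one base is reached.
  const Node* N = HeadsToPop.front();
  for (std::size_t I = 0; I < R->Length; ++I) {
    if (N->Parents.size() != 1) return std::nullopt;
    N = N->Parents.front();
  }
  return PushSpec{N, R};
}
```

A fork only disqualifies the fast path if it lies within the `Length` nodes actually popped; a multi-parent node at or beyond the base does not matter.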
On my machine this improves throughput from 2.12 to 2.30 MB/s, roughly 8%.
Thinking more about this, this trivial case seems likely to trigger more often with a more powerful LR parsing algorithm: a more powerful LR parser means fewer dead heads, and more linear cases.