PR31851 (although I hit the same issue on some other, internal testcases) shows LCSSA formation taking a long time.
This patch is an idea discussed on IRC and proposed by Danny to speed up the computation.
It seems a large chunk of time is spent walking the Dominator tree while walking over the basic blocks forming a loop. We do it to skip blocks not dominating any of the loop exits, as if this condition is true, no SSA name defined in the block can be used outside of it.
Turns out, and that's at the same time funny and sad, performing the check itself is more expensive than performing the work.
This patch speeds up the checking doing the following:
- Walk the domtree in breadth-first order, keeping tracks of levels and numbering nodes according to the BF order.
- Walk loops in domtree order, performing the dominance checks only on exits above the current BB.
(Danny suggests we could do better just looking at the subtrees, but that's probably an improvement for later).
The patch needs cleanup, and needs to be slightly reworked as the domtree computation is performed in runOnFunction but there are many loop passes which call formLCSSARecursively() as an utility and could benefit from this speedup (although they currently don't). So, consider it a WIP, but, goes without saying, comments welcome.
Figures: opt -O2 on the test attached to PR31851
Baseline:
real 1m45.731s user 1m45.533s sys 0m0.198s
Patch:
real 1m35.519s user 1m35.318s sys 0m0.201s
Unless I measured this wrong (which is possible, given it's 2AM), it's a net 10% reduction in compile time.