Previously this pass used up to 5% of compile time in some cases, which
is a bit much for what it does. The pass featured a full-blown
dataflow analysis which, in the default configuration, was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. It now works in a single pass, maintaining just a small state
machine per tracked physreg. This makes the pass 5-10x faster.
Given how the code is structured, i.e., this definition and all the helper functions come before the main algorithm, I would add a comment saying that this structure is populated via a bottom-up traversal of the basic blocks.
A LOH is then recorded when we reach an ADRP with a candidate chain (or chains, when the ADRP_ADRP case applies on top of the other one).
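To make the suggested comment concrete, here is a minimal, self-contained sketch of the idea: walk one block bottom-up, keep one small state per register, and record a hint when an ADRP is reached whose register has exactly one candidate user below it. This is not the actual pass. The `Op`, `Instr`, `RegInfo`, `Hint`, and `collectHints` names are hypothetical, the instruction model is a toy one (at most one def and one use per instruction), and the ADRP_ADRP case and live-out registers are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy opcodes, not the real AArch64 opcode set.
enum class Op { ADRP, ADD, LDR, Other };

// Toy instruction: at most one defined and one used register (-1 = none).
struct Instr {
  Op Opcode;
  int Def;
  int Use;
};

// Per-register state, describing the uses seen *below* the current position.
enum class State { Unknown, OneAdd, OneLdr, Invalid };

struct RegInfo {
  State S = State::Unknown;
  int Last = -1; // index of the single candidate user, if any
};

struct Hint {
  std::string Kind; // "AdrpAdd" or "AdrpLdr" in this sketch
  int AdrpIdx;
  int UserIdx;
};

constexpr int NumRegs = 8; // assumption: tiny register file for the sketch

std::vector<Hint> collectHints(const std::vector<Instr> &Block) {
  std::array<RegInfo, NumRegs> Regs{};
  std::vector<Hint> Hints;

  // Single bottom-up walk over the block.
  for (int I = static_cast<int>(Block.size()) - 1; I >= 0; --I) {
    const Instr &MI = Block[I];
    assert(MI.Def < NumRegs && MI.Use < NumRegs);

    if (MI.Opcode == Op::ADRP && MI.Def >= 0) {
      // Reaching the ADRP: record a hint if a candidate chain is pending.
      RegInfo &RI = Regs[MI.Def];
      if (RI.S == State::OneAdd)
        Hints.push_back({"AdrpAdd", I, RI.Last});
      else if (RI.S == State::OneLdr)
        Hints.push_back({"AdrpLdr", I, RI.Last});
      RI = RegInfo{};
      continue;
    }

    // A def ends the tracking of uses below it: those uses read this def,
    // not whatever defined the register further up.
    if (MI.Def >= 0)
      Regs[MI.Def] = RegInfo{};

    // A use either starts a candidate chain or invalidates an existing one.
    if (MI.Use >= 0) {
      RegInfo &RI = Regs[MI.Use];
      if (RI.S != State::Unknown) {
        RI.S = State::Invalid; // more than one user below the def
      } else if (MI.Opcode == Op::ADD) {
        RI = {State::OneAdd, I};
      } else if (MI.Opcode == Op::LDR) {
        RI = {State::OneLdr, I};
      } else {
        RI.S = State::Invalid; // user is not a candidate instruction
      }
    }
  }
  return Hints;
}
```

Because the walk is bottom-up, the state of a register at any point summarizes only what happens to it later in the block, which is why no fixed-point dataflow iteration is needed in the single-block case.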