Previously this pass was using up to 5% of compile time in some cases,
which is a bit much for what it is doing. The pass featured a full-blown
dataflow analysis which, in the default configuration, was restricted to
a single block.
This rewrites the pass under the assumption that we only ever work on a
single block. It now works in a single pass, just maintaining a small
state machine per tracked physreg. This makes the pass 5-10x faster.
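
For illustration only, the general shape of a block-local, single-walk
analysis with one small state machine per tracked register could look
like the sketch below. This is not the code from this change: the toy
Instr type, the Producer/Consumer opcodes, NumTrackedRegs and findPairs
are all hypothetical names, and the walk direction and the pattern being
matched are assumptions made to keep the example small.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy IR; none of these names come from the actual pass.
enum class Opcode { Producer, Consumer, Other };

struct Instr {
  Opcode Op;
  int Def; // register written, or -1 for none
  int Use; // register read, or -1 for none
};

// Assumption: only a small, fixed set of registers is tracked.
constexpr int NumTrackedRegs = 4;

// Per-register state machine: either nothing interesting has been seen,
// or the register currently holds the result of a Producer.
enum class State { Idle, SawProducer };

struct RegInfo {
  State S = State::Idle;
  std::size_t ProducerIdx = 0;
};

// Walks the block exactly once and returns (producer, consumer) index
// pairs where a Consumer reads a register last written by a Producer.
std::vector<std::pair<std::size_t, std::size_t>>
findPairs(const std::vector<Instr> &Block) {
  std::array<RegInfo, NumTrackedRegs> Regs{};
  std::vector<std::pair<std::size_t, std::size_t>> Pairs;

  for (std::size_t I = 0; I < Block.size(); ++I) {
    const Instr &MI = Block[I];

    // Process the use before the def: operands are read before the
    // result is written.
    if (MI.Use >= 0 && MI.Use < NumTrackedRegs) {
      RegInfo &R = Regs[MI.Use];
      if (R.S == State::SawProducer && MI.Op == Opcode::Consumer)
        Pairs.emplace_back(R.ProducerIdx, I);
      // Any use ends the pattern for this register in this sketch.
      R.S = State::Idle;
    }

    // A def either starts a new pattern or clobbers a pending one.
    if (MI.Def >= 0 && MI.Def < NumTrackedRegs) {
      RegInfo &R = Regs[MI.Def];
      if (MI.Op == Opcode::Producer) {
        R.S = State::SawProducer;
        R.ProducerIdx = I;
      } else {
        R.S = State::Idle;
      }
    }
  }
  return Pairs;
}
```

The point of the sketch is that the per-register state lives in a small
fixed-size array and is updated in place while walking the block, so no
per-block in/out sets or fixpoint iteration are needed.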