In D110173 we start using the existing LLVM IDF calculator to place PHIs as we reconstruct an SSA form of the machine-code program. Sadly that's slower than the old (but broken) way; this patch attempts to recover some of that performance.
The key observation: every time we def a register, we also have to def its register units. A refresher on units is here [0]. If we def'd $rax, in the current implementation we independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and they will all have the same PHI positions. Instead of doing that, we can calculate the PHI positions for {al, ah} and place PHIs for any aliasing registers in the same positions. Any def of a super-register has to def the unit, and vice versa, so this is sound. It significantly cuts down the amount of PHI placement we need to do.
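To make the idea concrete, here is a minimal standalone sketch, not the code in this patch. It assumes registers and units are plain strings, that the expensive IDF step is an opaque callback, and that a register takes the union of the PHI blocks of the units it covers. `placePhisViaUnits`, `diamondPhiPlacer`, `UnitMap` and `NumPlacements` are all hypothetical names made up for illustration; none of them exist in LLVM.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

using BlockSet = std::set<int>;
// Stand-in for the expensive IDF-based placement: def blocks in, PHI blocks out.
using PhiPlacer = std::function<BlockSet(const BlockSet &)>;
// Hypothetical table: register name -> names of the register units it covers.
using UnitMap = std::map<std::string, std::vector<std::string>>;

// Toy placer for a diamond CFG (0 -> {1, 2} -> 3): a def in either arm
// needs a PHI at the join block 3; defs only in blocks 0 or 3 need none.
BlockSet diamondPhiPlacer(const BlockSet &DefBlocks) {
  BlockSet Phis;
  if (DefBlocks.count(1) || DefBlocks.count(2))
    Phis.insert(3);
  return Phis;
}

std::map<std::string, BlockSet>
placePhisViaUnits(const UnitMap &RegToUnits,
                  const std::map<std::string, BlockSet> &UnitDefs,
                  const PhiPlacer &PlacePhis, unsigned &NumPlacements) {
  // Run the expensive placement once per register unit only.
  std::map<std::string, BlockSet> UnitPhis;
  for (const auto &UnitAndDefs : UnitDefs) {
    UnitPhis[UnitAndDefs.first] = PlacePhis(UnitAndDefs.second);
    ++NumPlacements;
  }

  // Each register then reuses the PHI blocks of the units it covers
  // (assumption: union over its units), with no further placement runs.
  std::map<std::string, BlockSet> RegPhis;
  for (const auto &RegAndUnits : RegToUnits) {
    BlockSet &Phis = RegPhis[RegAndUnits.first];
    for (const std::string &Unit : RegAndUnits.second) {
      auto It = UnitPhis.find(Unit);
      if (It != UnitPhis.end())
        Phis.insert(It->second.begin(), It->second.end());
    }
  }
  return RegPhis;
}
```

With a table mapping al, ah, ax, eax and rax onto the units {al, ah}, the callback runs twice rather than once per register, which is where the saving is expected to come from.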
There are a few escape hatches: stack slots don't have subregisters (yet), so they have to be handled independently. There are also scenarios where we only ever read from registers (such as arguments), and so never track their sub-registers or register units. In both of these cases, this patch just falls back to placing PHIs "normally". There's also a small amount of juggling to do with the stack pointer: we ignore defs of the stack pointer at calls, so register masks and SP defs get ignored. Now that we're placing PHIs for register units on top of the super-registers they alias, we need to be careful to not def the stack pointer register units either. Thus, there's now a collection of "aliases of the stack pointer" in MLocTracker that shouldn't be def'd by calls either.
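As a rough illustration of the stack pointer handling, the sketch below filters the locations a call would clobber against a set of stack pointer aliases. It is a simplified model and not the MLocTracker interface: `locationsDefinedByCall` and `SPAliases` are hypothetical names, and locations are plain strings rather than register numbers or unit indexes.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: given everything a call's register mask would clobber,
// return only the locations that should actually receive a new def.
// SPAliases is assumed to hold the stack pointer plus every register or
// register unit that aliases it.
std::vector<std::string>
locationsDefinedByCall(const std::vector<std::string> &ClobberedByCall,
                       const std::set<std::string> &SPAliases) {
  std::vector<std::string> Defined;
  for (const std::string &Loc : ClobberedByCall) {
    // Skip the stack pointer and anything aliasing it: calls do not def them.
    if (SPAliases.count(Loc))
      continue;
    Defined.push_back(Loc);
  }
  return Defined;
}
```

The point of keeping the aliases in one set is that the check at each call stays a single lookup per location, whether that location is a full register or one of its units.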
Testing: because this is performance related, there are no new tests, but clang-3.4 builds with identical object files with this patch applied (save for the object file with the date/time in it).
[0] https://lists.llvm.org/pipermail/llvm-dev/2012-May/050177.html
NB: this seems to be the canonical way of enumerating register unit... registers? in the rest of LLVM. What MCRegUnitIterator produces doesn't seem to necessarily correspond to a single register, therefore there's an additional process of enumerating the roots too.
Either way, it works for x86, and this pattern is used elsewhere in LLVM. So It Must Be Right (TM).