It was using a redundant iteration over super regs to build
SmallerAliasMap. Removing this results in exactly the same alias maps
and a noticeable performance gain on targets with a large number of
registers.
Just anecdotally: on my machine, processing a small AArch64 binary went
from 2.7s down to 80ms.
X86 has registers with subregs, which in turn have their own subregs, e.g. RAX -> EAX -> AX -> AH, AL.
I'm curious whether the subreg iterator here returns the transitive list of all subregs of a register, e.g. would it return AL as a subreg of RAX? That's what the old code was computing.
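
To make the question concrete, here is a minimal standalone sketch of the difference between direct and transitive subregs. It does not use any LLVM API: `DirectSubRegs`, `transitiveSubRegs`, and `x86Example` are hypothetical names, and the table only models the RAX chain mentioned above. It says nothing about what the real iterator returns; it only shows the result the old code would need to match.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Reg = std::string;

// Hypothetical toy table: each register maps to its *direct* subregs only.
using DirectSubRegs = std::map<Reg, std::vector<Reg>>;

// Collect every subreg reachable from Root by following direct-subreg
// edges. Root itself is not included in the result.
std::set<Reg> transitiveSubRegs(const DirectSubRegs &Direct, const Reg &Root) {
  std::set<Reg> Seen;
  std::vector<Reg> Worklist{Root};
  while (!Worklist.empty()) {
    Reg R = Worklist.back();
    Worklist.pop_back();
    auto It = Direct.find(R);
    if (It == Direct.end())
      continue; // Leaf register: no subregs of its own.
    for (const Reg &Sub : It->second)
      if (Seen.insert(Sub).second) // Only enqueue newly discovered subregs.
        Worklist.push_back(Sub);
  }
  return Seen;
}

// The chain from the comment above: RAX -> EAX -> AX -> AH, AL.
DirectSubRegs x86Example() {
  return {{"RAX", {"EAX"}}, {"EAX", {"AX"}}, {"AX", {"AH", "AL"}}};
}
```

In this toy model, a direct-only lookup for RAX yields just EAX, while `transitiveSubRegs(x86Example(), "RAX")` also contains AX, AH, and AL. If the iterator used in the patch is direct-only, the alias maps would differ from the old ones.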