Only do this for 16- and 32-register tuples, although we might want to
extend it to 8-register tuples.
It's incredibly expensive to spill these, and doing so significantly
interferes with the ability to allocate anything else in the function.
The lit tests mostly show sizeable improvements, with a handful of tiny
regressions with large vectors.
Would it be cleaner to set this inside SRegClass based on
!ge(numRegs, 16)? Or do it in a common base class of
SRegClass/VRegClass/ARegClass?
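
For discussion, here is a rough TableGen sketch of the common-base-class
option. Every name in it (TupleRegClassBase, IsLargeTuple, the *_Sketch
classes and defs) is hypothetical and only stands in for the real
classes and for whatever field this change sets; only the !ge operator
and the 16-register threshold come from the question above.

```tablegen
// Hypothetical common base: derives a flag from the tuple size so that
// individual register class definitions do not need to set it by hand.
class TupleRegClassBase<int numRegs> {
  int NumRegs = numRegs;
  // Placeholder for the real setting; true for tuples of 16+ registers.
  bit IsLargeTuple = !ge(numRegs, 16);
}

// Stand-ins for SRegClass/VRegClass/ARegClass sharing the base.
class SRegClassSketch<int numRegs> : TupleRegClassBase<numRegs>;
class VRegClassSketch<int numRegs> : TupleRegClassBase<numRegs>;
class ARegClassSketch<int numRegs> : TupleRegClassBase<numRegs>;

def SReg_8Regs_Sketch  : SRegClassSketch<8>;   // IsLargeTuple = 0
def SReg_16Regs_Sketch : SRegClassSketch<16>;  // IsLargeTuple = 1
def VReg_32Regs_Sketch : VRegClassSketch<32>;  // IsLargeTuple = 1
```

With this shape, extending to 8-register tuples would be a one-line
change to the threshold in the base class.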