Mainly this stops weird things like v3f64 vectors getting split
into 3 pieces, using XMM0/XMM1 for the first two, and then
finding the f64 rule in 32-bit mode and using FP0 for the third.
Now we'll fail CanLowerReturn and fall back to sret lowering
instead.
I had to copy a few lines for MMX since we were dependent on
those being inherited, but I'm not sure they make sense.
Clang doesn't generate the x86_mmx type as a function argument
or return value, so it probably doesn't matter in practice.
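
For illustration only, the v3f64 case can be sketched with a toy
model. This is not LLVM code: Rule, assignReturnRegs and the two
rule-list helpers are hypothetical names, and the two-register XMM
pool is an assumption chosen only to reproduce the outcome
described above. An empty result stands in for CanLowerReturn
failing.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model, not LLVM's API: a "rule" is just a pool of
// register names that is handed out in order.
struct Rule {
  std::vector<std::string> regs;
  std::size_t next = 0;
};

// Assign each of numPieces return pieces to the first rule that still has a
// free register. An empty optional stands in for CanLowerReturn failing, in
// which case the caller would fall back to sret lowering.
std::optional<std::vector<std::string>>
assignReturnRegs(std::size_t numPieces, std::vector<Rule> rules) {
  std::vector<std::string> assigned;
  for (std::size_t i = 0; i < numPieces; ++i) {
    bool found = false;
    for (Rule &r : rules) {
      if (r.next < r.regs.size()) {
        assigned.push_back(r.regs[r.next++]);
        found = true;
        break;
      }
    }
    if (!found)
      return std::nullopt; // no rule can take this piece
  }
  return assigned;
}

// Old behavior in this model: an inherited x87 rule follows the XMM rule.
std::vector<Rule> rulesWithInheritedX87() {
  return {Rule{{"XMM0", "XMM1"}}, Rule{{"FP0"}}};
}

// New behavior in this model: only the XMM rule is present.
std::vector<Rule> rulesWithoutInheritance() {
  return {Rule{{"XMM0", "XMM1"}}};
}
```

In this model, three pieces with the inherited rule end up in
XMM0, XMM1 and FP0, while the same three pieces without it get no
assignment at all, which corresponds to the sret fallback.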