Previously, any copy from a register bigger than the destination was done in two steps:
- Copy to a same-sized register in the destination register bank.
- Subregister copy of that to the destination.
This fails for copies from 128-bit FPRs to GPRs because the GPR register bank can't accommodate 128-bit values.
Instead of special-casing such copies to perform the truncation beforehand in the source register bank, generalize this:
a) Perform a subregister copy straight from the source register whenever possible. This results in shorter MIR and fixes the above problem.
b) Perform a full copy to the target bank and then do a subregister copy only if the source bank can't support the target's size. E.g. GPR to 8-bit FPR copy.
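The choice between (a) and (b) can be sketched roughly as below. This is a standalone model, not the selector code from this patch: `Bank`, `CopyPlan`, `bankSupportsSize` and `planCopy` are hypothetical names, and the sizes each bank is assumed to hold (32/64 bits for GPR, 8 to 128 bits for FPR) are an illustrative assumption.

```cpp
// Hypothetical model, not LLVM's API: two register banks.
enum class Bank { GPR, FPR };

enum class CopyPlan {
  PlainCopy,           // source is not bigger than the destination
  SubregFromSource,    // (a) subregister copy straight from the source
  CrossBankThenSubreg, // (b) full copy to the target bank, then subregister copy
};

// Assumed register sizes per bank, for illustration only.
constexpr bool bankSupportsSize(Bank B, unsigned Bits) {
  return B == Bank::GPR
             ? (Bits == 32 || Bits == 64)
             : (Bits == 8 || Bits == 16 || Bits == 32 || Bits == 64 ||
                Bits == 128);
}

// Prefer (a); fall back to (b) only when the source bank has no register
// of the destination's size to extract into.
constexpr CopyPlan planCopy(Bank SrcBank, unsigned SrcBits, unsigned DstBits) {
  return SrcBits <= DstBits
             ? CopyPlan::PlainCopy
             : bankSupportsSize(SrcBank, DstBits)
                   ? CopyPlan::SubregFromSource
                   : CopyPlan::CrossBankThenSubreg;
}
```

Under these assumptions, `planCopy(Bank::FPR, 128, 64)` picks (a), so no 128-bit GPR is ever needed, while `planCopy(Bank::GPR, 32, 8)` falls back to (b) because the GPR bank has no 8-bit register.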
This should be an llvm_unreachable, I think. We never want to return anything here.