This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lower for non-power-of-2,
non-extending vector loads.
Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.