In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.
Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
Would this ever happen? Having the operation legal on a smaller vector width, without it being legal in the higher size?