If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP for the common elements and
INSERT_VECTOR_ELT for the different elements.
Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.
With D90176, patterns originating from code like
float32x4_t y = {a,a,a,0}; (common in 3D apps) are lowered even
better (an unnecessary fmov is removed).
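
To make the DUP plus INSERT_VECTOR_ELT idea concrete, here is a minimal, portable C++ sketch that models it on plain float lanes. It is not the code from this patch and does not use LLVM's SelectionDAG API. The names DupInsertPlan, planDupAndInsert and applyPlan are hypothetical, and the sketch assumes no NaN lanes.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model of the lowering plan; not an LLVM data structure.
struct DupInsertPlan {
    float dupValue;                    // value broadcast to every lane (models DUP)
    std::vector<std::size_t> inserts;  // lanes overwritten afterwards (models INSERT_VECTOR_ELT)
};

// Pick the most frequent lane value as the DUP operand and record every lane
// that differs from it. Assumes the lanes contain no NaN values.
DupInsertPlan planDupAndInsert(const std::vector<float>& lanes) {
    std::map<float, std::size_t> counts;
    for (float v : lanes) ++counts[v];

    DupInsertPlan plan{0.0f, {}};
    std::size_t best = 0;
    for (const auto& kv : counts) {
        if (kv.second > best) {
            best = kv.second;
            plan.dupValue = kv.first;
        }
    }
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        if (lanes[i] != plan.dupValue) plan.inserts.push_back(i);
    }
    return plan;
}

// Rebuild the vector from the plan: broadcast first, then patch the odd lanes.
std::vector<float> applyPlan(const DupInsertPlan& plan,
                             const std::vector<float>& lanes) {
    std::vector<float> out(lanes.size(), plan.dupValue);
    for (std::size_t lane : plan.inserts) out[lane] = lanes[lane];
    return out;
}
```

For lanes {a,a,a,0} with a nonzero a, the plan is one broadcast of a plus a single insert into lane 3, which is the shape of the float32x4_t example above.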
I think the NumElts >= 4 check is redundant? If NumElts == 2, there is no value of NumDifferentLanes that makes this condition pass.
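
A quick way to sanity-check this is to enumerate the lane counts. The sketch below assumes the guard has the form "at least one different lane, and fewer than half the lanes differ". That form is an assumption made for illustration, not the condition quoted from the patch, and passesAssumedGuard and anyCountPasses are hypothetical names.

```cpp
// Hypothetical guard, assumed for illustration only; the patch's real
// condition may differ.
bool passesAssumedGuard(unsigned numElts, unsigned numDifferentLanes) {
    return numDifferentLanes >= 1 && numDifferentLanes < numElts / 2;
}

// Returns true if some count of different lanes satisfies the assumed guard.
bool anyCountPasses(unsigned numElts) {
    for (unsigned d = 0; d <= numElts; ++d) {
        if (passesAssumedGuard(numElts, d)) return true;
    }
    return false;
}
```

Under that assumed form, NumElts == 2 would need 1 <= NumDifferentLanes < 1, which no value satisfies, so an explicit NumElts >= 4 check would add nothing.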