Today, we can generate a vector_shuffle from an IR shuffle when the size of the result is exactly the sum of the sizes of the input vectors. If the output vector is smaller (e.g. a <12 x i8> formed by a shuffle of two <8 x i8> inputs), we emit a sequence of extracts and inserts.
Instead, we can form a larger vector_shuffle and then extract a subvector of the right size, e.g. shuffle the two <8 x i8> inputs into a <16 x i8> and extract a <12 x i8>.
This solves PR29025.
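
To illustrate the idea, here is a minimal standalone C++ model of the transformation. It is not the SelectionDAG code from this patch: the helper names (widenMask, shuffle, extractSubvector, shuffleViaWideVector) are hypothetical, vectors are modeled as std::vector, and it assumes the extra mask lanes are filled with undef (-1) and that an undef lane reads as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for an undefined mask lane (assumption: modeled as -1).
constexpr int Undef = -1;

// Pad a shuffle mask with undef lanes up to wideLen elements.
std::vector<int> widenMask(const std::vector<int> &mask, std::size_t wideLen) {
  assert(mask.size() <= wideLen && "mask is already wider than the target");
  std::vector<int> wide(mask);
  wide.resize(wideLen, Undef);
  return wide;
}

// Model of a two-input shuffle: indices [0, a.size()) select from a,
// larger indices select from b. Undef lanes produce 0 in this model.
std::vector<std::uint8_t> shuffle(const std::vector<std::uint8_t> &a,
                                  const std::vector<std::uint8_t> &b,
                                  const std::vector<int> &mask) {
  std::vector<std::uint8_t> out;
  out.reserve(mask.size());
  for (int idx : mask) {
    if (idx < 0)
      out.push_back(0);
    else if (static_cast<std::size_t>(idx) < a.size())
      out.push_back(a[static_cast<std::size_t>(idx)]);
    else
      out.push_back(b[static_cast<std::size_t>(idx) - a.size()]);
  }
  return out;
}

// Keep the first len elements of v.
std::vector<std::uint8_t> extractSubvector(const std::vector<std::uint8_t> &v,
                                           std::size_t len) {
  assert(len <= v.size() && "subvector is longer than the source");
  return std::vector<std::uint8_t>(
      v.begin(), v.begin() + static_cast<std::ptrdiff_t>(len));
}

// Form one wide shuffle whose length is the sum of the input lengths,
// then extract the leading subvector of the requested result length.
std::vector<std::uint8_t>
shuffleViaWideVector(const std::vector<std::uint8_t> &a,
                     const std::vector<std::uint8_t> &b,
                     const std::vector<int> &mask) {
  const std::size_t wideLen = a.size() + b.size();
  std::vector<std::uint8_t> wide = shuffle(a, b, widenMask(mask, wideLen));
  return extractSubvector(wide, mask.size());
}
```

With two 8-element inputs and a 12-element mask, shuffleViaWideVector builds a 16-element intermediate and returns its first 12 elements, which match what applying the 12-element mask directly would produce.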