AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-mask/bit-blend patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.