This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188
This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion. The widening case probably needs addressing separately as in most cases this should have a zero cost. The self-insertion case is trickier, but we currently always match this with the existing SK_PermuteSingleSrc logic.
Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider subvectors.
I