This is an enhancement to D81766 to allow loading the minimum target vector type into an IR vector with a different number of elements.
In one of the motivating tests from PR16739, SLP creates <2 x float> load ops mixed with <4 x float> insert ops, so we want to handle that pattern in addition to potential oversized vectors created by the vectorizers.
I'm not sure whether we should model the cost of the identity shuffle as an insert/extract subvector, given that we are shuffling with undef.
Are we in danger of creating out-of-bounds shuffle mask indices if the dst vector type is more than 2x the original size (v2f32 -> v16f32, etc.)? I think they canonicalize to undef, but I'm not sure (and I have no access to the source tree atm).
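
To make the out-of-bounds concern concrete: a shufflevector mask can only index into the concatenation of its two source operands, so with 2-element sources the valid indices are 0..3. The following is a minimal standalone sketch, not code from the patch. `makeWideningMask`, `maskInBounds`, and `kUndefLane` are hypothetical names used only for illustration, and the use of -1 for an undefined lane is assumed to mirror LLVM's mask convention.

```cpp
#include <cassert>
#include <vector>

// Sentinel for an undefined mask lane (assumption: mirrors LLVM's use of -1).
constexpr int kUndefLane = -1;

// Hypothetical helper: build a mask that widens a srcElts-wide vector to
// dstElts lanes. Lanes [0, srcElts) select the source elements in order;
// every remaining lane is left undefined instead of continuing the sequence.
std::vector<int> makeWideningMask(unsigned srcElts, unsigned dstElts) {
  assert(srcElts <= dstElts && "widening only");
  std::vector<int> mask(dstElts, kUndefLane);
  for (unsigned i = 0; i < srcElts; ++i)
    mask[i] = static_cast<int>(i);
  return mask;
}

// Hypothetical check: each defined index must fall inside the concatenation
// of the two shuffle operands, i.e. in the range [0, 2 * srcElts).
bool maskInBounds(const std::vector<int> &mask, unsigned srcElts) {
  for (int idx : mask) {
    if (idx == kUndefLane)
      continue;
    if (idx < 0 || static_cast<unsigned>(idx) >= 2 * srcElts)
      return false;
  }
  return true;
}
```

For v2f32 -> v16f32, `makeWideningMask(2, 16)` yields lanes 0 and 1 followed by fourteen undefined lanes, which passes `maskInBounds`. A naive identity mask 0..15 over the same 2-element source would fail it, because every index from 4 upward points past both operands.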