As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly.
Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm:
  // If the vector is wider than 128 bits, extract the 128-bit subvector, insert
  // into that, and then insert the subvector back into the result.
...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering.
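
To make the comparison concrete, here is a minimal scalar model of the two strategies. It is a sketch, not the actual lowering code: the types V8/V4 and the functions insertViaSubvector and insertViaShuffle are hypothetical names. The shuffle form shown (a two-input shuffle of the constant vector with a vector holding the scalar in lane 0) is an assumption about what "using shuffle lowering" could look like.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 256-bit vector (eight 32-bit lanes) and its
// 128-bit half. These are illustration types, not LLVM types.
using V8 = std::array<std::int32_t, 8>;
using V4 = std::array<std::int32_t, 4>;

// Generic path described in the quoted comment: extract the 128-bit
// subvector that contains lane Idx, insert the scalar into it, then insert
// the subvector back into the wide vector.
V8 insertViaSubvector(V8 Vec, std::int32_t Elt, std::size_t Idx) {
  const std::size_t Half = Idx / 4;      // which 128-bit subvector
  V4 Sub{};
  for (std::size_t I = 0; I < 4; ++I)    // extract subvector
    Sub[I] = Vec[Half * 4 + I];
  Sub[Idx % 4] = Elt;                    // element insert within 128 bits
  for (std::size_t I = 0; I < 4; ++I)    // insert subvector back
    Vec[Half * 4 + I] = Sub[I];
  return Vec;
}

// Assumed shuffle path: place the scalar in lane 0 of a second vector and
// apply a two-input shuffle. Mask values 0..7 select from the constant
// vector, 8..15 select from the scalar vector.
V8 insertViaShuffle(const V8 &ConstVec, std::int32_t Elt, std::size_t Idx) {
  V8 Scalar{};                           // remaining lanes modeled as zero
  Scalar[0] = Elt;
  std::array<std::size_t, 8> Mask{};
  for (std::size_t I = 0; I < 8; ++I)
    Mask[I] = (I == Idx) ? 8 : I;        // identity except the inserted lane
  V8 Result{};
  for (std::size_t I = 0; I < 8; ++I)
    Result[I] = Mask[I] < 8 ? ConstVec[Mask[I]] : Scalar[Mask[I] - 8];
  return Result;
}
```

Both functions produce the same vector for any in-range index. The difference is only in the shape of the operation: the first is three dependent steps, while the second is a single shuffle that the backend is free to match to one blend-style instruction when the other operand is a constant.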