As noted on D66004, scalarizing a load with a constant mask into a chain of irregular loads+inserts is difficult to optimize before lowering, which in turn makes it hard to merge loads etc.
This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2,....) style pattern and then performs a blend shuffle with the pass through vector. This allows us to more easily make use of all the build_vector combines, merging of consecutive loads etc.
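To make the shape of the expansion concrete, here is a lane-level model in Python. It is only a sketch of the semantics described above, not the SelectionDAG code from the patch: the helper names (`build_vector_from_mask`, `blend_shuffle_mask`, `shuffle`, `expand_masked_load`) are hypothetical, `None` stands in for an undef lane, and the shuffle is assumed to index into the concatenation of its two operands.

```python
from typing import List, Optional, Sequence

UNDEF = None  # stand-in for an undef lane (illustrative only)


def build_vector_from_mask(memory: Sequence[int], base: int,
                           mask: Sequence[bool]) -> List[Optional[int]]:
    """One scalar load per enabled lane; disabled lanes are left undef."""
    return [memory[base + i] if enabled else UNDEF
            for i, enabled in enumerate(mask)]


def blend_shuffle_mask(mask: Sequence[bool]) -> List[int]:
    """Shuffle indices over concat(loaded, passthru).

    Lane i selects loaded[i] (index i) when enabled,
    otherwise passthru[i] (index n + i).
    """
    n = len(mask)
    return [i if enabled else n + i for i, enabled in enumerate(mask)]


def shuffle(a: Sequence, b: Sequence, indices: Sequence[int]) -> List:
    """Two-operand shuffle: each index picks from the concatenation a ++ b."""
    src = list(a) + list(b)
    return [src[i] for i in indices]


def expand_masked_load(memory: Sequence[int], base: int,
                       mask: Sequence[bool],
                       passthru: Sequence[int]) -> List[int]:
    """build_vector of the scalar loads, then a blend with the pass through."""
    loaded = build_vector_from_mask(memory, base, mask)
    return shuffle(loaded, passthru, blend_shuffle_mask(mask))


memory = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [True, True, False, True]
passthru = [-1, -2, -3, -4]

print(build_vector_from_mask(memory, 0, mask))        # [10, 11, None, 13]
print(blend_shuffle_mask(mask))                       # [0, 1, 6, 3]
print(expand_masked_load(memory, 0, mask, passthru))  # [10, 11, -3, 13]
```

In this model the undef lane never reaches the result, because the blend always takes that lane from the pass through vector. All the scalar loads also sit side by side in a single node, which is what should make consecutive loads (lanes 0 and 1 here) easier to spot than in a chain of inserts.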
There are a couple of regressions that I'm still looking at where we could better combine an element insertion with the final blend, and also a few places where shuffle combining forgets which elements are already zero.
Followup to D85416
Is explicitly inserting undefs required for optimal build_vector creation? I figured just inserting all the elements into an initial undef vector would have been enough.