As noted on D66004, scalarizing an expandload with a constant mask as a chain of irregular loads+inserts is difficult to optimize before lowering, which in turn makes it hard to merge loads etc.
This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2,....) style pattern and then performs a blend shuffle with the pass through vector. This allows us to more easily make use of all the build_vector combines, merging of consecutive loads etc.
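To make the intended shape concrete, here is a minimal standalone sketch that models the two steps on plain arrays rather than on SelectionDAG nodes. It is not the patch's code: `scalarizeExpandLoad`, `Expanded`, the fixed width `N = 4`, and the use of `std::optional` to stand in for undef lanes are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical fixed vector width, chosen only for this illustration.
constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<bool, N>;

struct Expanded {
  // Models build_vector(load0, load1, undef, load2, ...);
  // std::nullopt stands in for an undef lane.
  std::array<std::optional<int>, N> buildVector;
  // Models the blend shuffle of buildVector with the pass-through vector.
  Vec result;
};

// Hypothetical helper: expand consecutive memory elements into the lanes
// enabled by a constant mask, then blend with the pass-through vector.
Expanded scalarizeExpandLoad(const int *mem, const Mask &mask,
                             const Vec &passThru) {
  Expanded e{};
  std::size_t memIdx = 0;

  // Step 1: one scalar load per enabled lane, reading consecutive
  // addresses. Disabled lanes stay undef and consume no memory element.
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      e.buildVector[i] = mem[memIdx++];

  // Step 2: blend. Enabled lanes come from the build vector, disabled
  // lanes from the pass-through vector.
  for (std::size_t i = 0; i < N; ++i)
    e.result[i] = mask[i] ? *e.buildVector[i] : passThru[i];

  return e;
}
```

With the mask `{1, 1, 0, 1}`, the loads read `mem[0]`, `mem[1]` and `mem[2]`, which are consecutive, so in this model the first two lanes are natural candidates for a merged load, while lane 2 is filled purely by the blend.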
Would it make sense to do the same thing here, i.e. make a shared helper function?
It might be that later matching always catches this case, but wouldn't it still be an efficiency win to produce the blend shuffle here too?