This is a preliminary step for a preliminary step for D50992. I noticed that x86 often misses chances to load a scalar directly into a vector register.
So this patch just allows more of those cases to match a broadcast op in lowerBuildVectorAsBroadcast(). The old code comment said it doesn't make sense to use a broadcast when we're loading a single element and everything else is undef, but I think that's the best case in the improved tests in insert-loaded-scalar.ll. We avoid a scalar-to-vector-register move and/or less efficient shuffling.
Note that there are some existing types that were already producing a broadcast, but that happens semi-accidentally. I.e., it's not happening as part of lowerBuildVectorAsBroadcast(). The build vector gets expanded into load + shuffle, and then shuffle lowering produces the broadcast.
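To make the "single element, everything else undef" case concrete, here is a toy C++ model of the splat check. It is only a sketch and not the code in lowerBuildVectorAsBroadcast(): the Lane type and the splatOperand() helper are hypothetical names invented for illustration, and operand ids stand in for scalar values. The point is that undef lanes place no constraint on the result, so a build vector with one loaded lane and the rest undef still counts as a splat of that load.

```cpp
#include <optional>
#include <vector>

// Hypothetical model: a lane is either undef (nullopt) or the id of the
// scalar operand placed in that lane.
using Lane = std::optional<int>;

// Returns the operand id if every defined lane uses the same operand.
// Returns nullopt if two defined lanes differ or if all lanes are undef.
std::optional<int> splatOperand(const std::vector<Lane>& lanes) {
    std::optional<int> splat;
    for (const Lane& lane : lanes) {
        if (!lane)
            continue;              // undef lane: free to take any value
        if (!splat)
            splat = lane;          // first defined lane picks the candidate
        else if (*splat != *lane)
            return std::nullopt;   // two different operands: not a splat
    }
    return splat;
}
```

In this model, a vector such as {7, undef, undef, undef} reports operand 7 as its splat value, which corresponds to the case this patch now treats as a broadcast candidate.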
Description of the other test diffs:
- avx-basic.ll - replacing load+shuffle is a win.
- sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral?
- sse41.ll - don't care? we convert that intrinsic to generic IR now, so this test is deprecated?
- vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - do we consider the pshufb alternatives with an extra instruction a regression or a win?