This is an archive of the discontinued LLVM Phabricator instance.

[x86] use more broadcasts to load a scalar into vector reg
Closed (Public)

Authored by spatel on Aug 22 2018, 2:03 PM.

Details

Summary

This is a preliminary step for another preliminary step toward D50992. I noticed that x86 often misses chances to load a scalar directly into a vector register.

So this patch just allows more of those cases to match a broadcast op in lowerBuildVectorAsBroadcast(). The old code comment said it doesn't make sense to use a broadcast when we're loading a single element and everything else is undef, but I think that's the best case for a broadcast, as the improved tests in insert-loaded-scalar.ll show: we avoid a scalar-to-vector-register move and/or less efficient shuffling.
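The reason the single-element case is legal to broadcast can be sketched with a toy model. This is not LLVM code: the lane representation, the `loaded` set, and the function names `can_lower_as_broadcast` and `lower_as_broadcast` are hypothetical, and the real lowering checks many more conditions (type legality, subtarget features, other uses of the load). The point it illustrates is that undef lanes may hold any value, so filling them with copies of the loaded scalar is a valid lowering.

```python
from typing import AbstractSet, List, Optional, Sequence

# Hypothetical, simplified model of a build vector: each lane is a named
# scalar value or None for an undef lane. This is not LLVM's API.
Lane = Optional[str]


def can_lower_as_broadcast(lanes: Sequence[Lane], loaded: AbstractSet[str]) -> bool:
    """True if every defined lane holds the same scalar and that scalar is a load.

    Undef lanes may hold any value, so filling them with copies of the scalar
    is legal. That includes the single-element case (one defined lane, the
    rest undef). Requiring a load is a simplification of this model.
    """
    defined = {lane for lane in lanes if lane is not None}
    if len(defined) != 1:
        return False
    (value,) = defined
    return value in loaded


def lower_as_broadcast(lanes: Sequence[Lane], loaded: AbstractSet[str]) -> List[str]:
    """Model the broadcast load: every result lane receives the loaded scalar."""
    if not can_lower_as_broadcast(lanes, loaded):
        raise ValueError("not a broadcast candidate in this model")
    (value,) = {lane for lane in lanes if lane is not None}
    return [value] * len(lanes)


loads = {"x"}
print(can_lower_as_broadcast(["x", None, None, None], loads))  # True
print(can_lower_as_broadcast(["x", "y", None, None], loads))   # False
print(lower_as_broadcast(["x", None, None, None], loads))      # ['x', 'x', 'x', 'x']
```

In this model, one broadcast from memory replaces the scalar load plus the move into a vector register (and any shuffle), which is the saving described above.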

Note that some existing types were already producing a broadcast, but only semi-accidentally; i.e., it doesn't happen as part of lowerBuildVectorAsBroadcast(). Instead, the build vector gets expanded into load + shuffle, and then shuffle lowering produces the broadcast.
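The indirect path can be illustrated with a second toy model, again hypothetical and much simpler than the real X86 shuffle lowering. Here `expand_to_load_and_shuffle` places the single scalar in lane 0 and builds a shuffle mask that reads lane 0 wherever the build vector had that scalar; `is_broadcast_mask` then recognizes the mask as a splat because undef entries (written as -1, following the LLVM shuffle-mask convention) match anything.

```python
from typing import List, Optional, Sequence, Tuple

UNDEF = -1  # LLVM shuffle masks use -1 for an undef element


def expand_to_load_and_shuffle(lanes: Sequence[Optional[str]]) -> Tuple[str, List[int]]:
    """Hypothetical expansion of a build vector with one distinct scalar:
    the scalar goes into lane 0, and the mask reads lane 0 wherever the
    build vector held it (undef lanes stay undef)."""
    defined = {lane for lane in lanes if lane is not None}
    if len(defined) != 1:
        raise ValueError("this model only handles a single distinct scalar")
    (scalar,) = defined
    mask = [0 if lane is not None else UNDEF for lane in lanes]
    return scalar, mask


def is_broadcast_mask(mask: Sequence[int]) -> bool:
    """A mask broadcasts lane 0 if every defined entry reads lane 0;
    undef entries are allowed to match anything."""
    return any(m == 0 for m in mask) and all(m in (0, UNDEF) for m in mask)


scalar, mask = expand_to_load_and_shuffle(["x", None, "x", None])
print(scalar, mask)                     # x [0, -1, 0, -1]
print(is_broadcast_mask(mask))          # True
print(is_broadcast_mask([0, 1, -1, -1]))  # False
```

Because the splat is only discovered after expansion, whether it happens depends on how the shuffle for a given type gets lowered, which is why the existing broadcasts were semi-accidental.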

Description of the other test diffs:

  1. avx-basic.ll - replacing load+shuffle is a win.
  2. sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral?
  3. sse41.ll - don't care? we convert that intrinsic to generic IR now, so this test is deprecated?
  4. vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - do we consider the pshufb alternatives with an extra instruction a regression or a win?

Diff Detail

Repository
rL LLVM