[x86] use a single shufps when it can save instructions

Description

This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

My motivating case looks like this:

  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
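
To see why the single vshufps can stand in for the three removed instructions, here is a minimal lane-level sketch in Python. It is not LLVM code and not part of the patch. The helper names (pshufd, pblendw, shufps, old_sequence, new_sequence) are hypothetical and model only the element selection written in the FileCheck lines above. Each register is treated as a list of four 32-bit lanes.

```python
def pshufd(src, idx):
    """Model pshufd: select four 32-bit lanes of src by index."""
    return [src[i] for i in idx]


def pblendw(a, b, words_from_b):
    """Model pblendw: blend two registers viewed as eight 16-bit words.

    words_from_b is the set of word positions taken from b; all other
    word positions come from a.
    """
    def to_words(v):
        out = []
        for lane in v:
            out += [lane & 0xFFFF, (lane >> 16) & 0xFFFF]  # low word first
        return out

    wa, wb = to_words(a), to_words(b)
    w = [wb[i] if i in words_from_b else wa[i] for i in range(8)]
    # Reassemble pairs of 16-bit words into 32-bit lanes.
    return [w[2 * i] | (w[2 * i + 1] << 16) for i in range(4)]


def shufps(a, b, idx_a, idx_b):
    """Model shufps: low two lanes come from a, high two lanes from b."""
    return [a[idx_a[0]], a[idx_a[1]], b[idx_b[0]], b[idx_b[1]]]


def old_sequence(xmm0, xmm1):
    """The three-instruction sequence removed in the motivating case."""
    t1 = pshufd(xmm1, [0, 1, 0, 2])
    t0 = pshufd(xmm0, [0, 2, 2, 3])
    return pblendw(t0, t1, {4, 5, 6, 7})


def new_sequence(xmm0, xmm1):
    """The single shufps that replaces it."""
    return shufps(xmm0, xmm1, [0, 2], [0, 2])
```

Both functions select lanes 0 and 2 of xmm0 followed by lanes 0 and 2 of xmm1. The sketch checks only that the two sequences move the same bits. It does not model execution domains or latency.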

This pattern occurs several times in the test diffs. For chips with domain-crossing
penalties, the reduction in instruction count and code size should usually outweigh
the cost of using an FP op in a sequence of int ops. For chips such as recent Intel
big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps
is a pure win.

The test case diffs all appear to be improvements, with two exceptions: one test in
vector-shuffle-combining.ll, where we miss an opportunity to use a shift to generate
zero elements, and one test in combine-sra.ll, where multiple uses prevent the
expected shuffle combining.

Differential Revision: https://reviews.llvm.org/D27692

Details

Committed
spatel, Dec 15 2016, 10:03 AM
Differential Revision
D27692: [x86] use a single shufps when it can save instructions
Parents
rL289836: Fix typo in comment. NFC.