Differential D125238
[X86] Prefer MOVHLPS for shuffle(x,1,-1) extraction patterns (PR26515)
Changes Planned
Public
Authored by RKSimon on May 9 2022, 8:23 AM.
Details
Summary
We currently lower these patterns to UNPCKH, but that requires the source vector to be the same register as the destination vector, which costs an additional move. MOVHLPS can avoid that move.
Fixes #26889
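To illustrate the summary, here is a minimal sketch that models the two instructions in plain C++. It is not code from this patch. The names Vec4, unpckhpd, movhlps and low_half are hypothetical helpers introduced only for this example, and the models assume the non-VEX two-operand SSE forms, where the destination register is also an input.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a 128-bit XMM register as four 32-bit lanes.
using Vec4 = std::array<float, 4>;

// Model of two-operand UNPCKHPD dst, src:
//   result = { dst.hi64, src.hi64 }
// The low half of the result comes from dst, so extracting the high half
// of x into the low half requires dst to already hold x.
Vec4 unpckhpd(Vec4 dst, const Vec4 &src) {
  return {dst[2], dst[3], src[2], src[3]};
}

// Model of two-operand MOVHLPS dst, src:
//   result = { src.hi64, dst.hi64 }
// The low half of the result comes from src, so dst can be any register.
// The upper half of dst is preserved, which means dst is still read.
Vec4 movhlps(Vec4 dst, const Vec4 &src) {
  return {src[2], src[3], dst[2], dst[3]};
}

// Low 64 bits of a register, i.e. the lanes a shuffle(x,1,-1) on two
// 64-bit elements actually cares about.
std::array<float, 2> low_half(const Vec4 &v) {
  return {v[0], v[1]};
}
```

In this model, unpckhpd(x, x) and movhlps(other, x) produce the same low half for any value of other, whereas unpckhpd(other, x) does not. That is why the UNPCKH lowering needs x copied into the destination first. The preserved upper half in movhlps is also where the false dependency on the old destination value comes from.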
Diff Detail
Unit Tests: Failed
Event Timeline
Are you planning to do anything about the false dependency? I don't think we'll pick a "long dead" register as the original bug report suggested.
rebase - isTargetShuffleEquivalent doesn't allow undef target shuffle elements anymore
Revision Contents
Diff 448869
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/lib/Target/X86/X86InstrSSE.td
llvm/test/CodeGen/X86/cast-vsel.ll
llvm/test/CodeGen/X86/combine-fcopysign.ll
llvm/test/CodeGen/X86/complex-fastmath.ll
llvm/test/CodeGen/X86/extractelement-load.ll
llvm/test/CodeGen/X86/fma.ll
llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
llvm/test/CodeGen/X86/fp-round.ll
llvm/test/CodeGen/X86/fp-roundeven.ll
llvm/test/CodeGen/X86/fp128-extract.ll
llvm/test/CodeGen/X86/fpclamptosat_vec.ll
llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
llvm/test/CodeGen/X86/frem.ll
llvm/test/CodeGen/X86/ftrunc.ll
llvm/test/CodeGen/X86/haddsub-2.ll
llvm/test/CodeGen/X86/haddsub-3.ll
llvm/test/CodeGen/X86/haddsub-shuf.ll
llvm/test/CodeGen/X86/haddsub-undef.ll
llvm/test/CodeGen/X86/haddsub.ll
llvm/test/CodeGen/X86/half.ll
llvm/test/CodeGen/X86/horizontal-reduce-fadd.ll
llvm/test/CodeGen/X86/horizontal-sum.ll
llvm/test/CodeGen/X86/inline-asm-x-i128.ll
llvm/test/CodeGen/X86/load-partial-dot-product.ll
llvm/test/CodeGen/X86/masked_compressstore.ll
llvm/test/CodeGen/X86/masked_store.ll
llvm/test/CodeGen/X86/pr11334.ll
llvm/test/CodeGen/X86/scalar-int-to-fp.ll
llvm/test/CodeGen/X86/split-vector-rem.ll
llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll
llvm/test/CodeGen/X86/sse3-avx-addsub-2.ll
llvm/test/CodeGen/X86/vec-strict-128.ll
llvm/test/CodeGen/X86/vec-strict-cmp-128.ll
llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
llvm/test/CodeGen/X86/vec_fp_to_int.ll
llvm/test/CodeGen/X86/vec_fpext.ll
llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
llvm/test/CodeGen/X86/vector-intrinsics.ll
llvm/test/CodeGen/X86/vector-narrow-binop.ll
llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll
llvm/test/CodeGen/X86/vector-reduce-fadd.ll
llvm/test/CodeGen/X86/vector-reduce-fmax-fmin-fast.ll
llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
llvm/test/CodeGen/X86/vector-reduce-fmax.ll
llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
llvm/test/CodeGen/X86/vector-reduce-fmin.ll
llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll
llvm/test/CodeGen/X86/vector-reduce-fmul.ll
llvm/test/CodeGen/X86/vector-rem.ll
llvm/test/CodeGen/X86/vector-shuffle-128-v2.ll
llvm/test/CodeGen/X86/widen_conv-3.ll
llvm/test/CodeGen/X86/widen_conv-4.ll
TBH I'm not convinced any of these need dependency breaks.