This is an archive of the discontinued LLVM Phabricator instance.

[x86] eliminate more sign-bit tests with vector select
Abandoned, Public

Authored by spatel on Jun 11 2018, 11:58 AM.

Details

Summary

vselect (pcmpgt 0, X), Y, Z --> shrunkblend X, Y, Z

This shortcoming was noted in D47330, and the test diffs show that we already had other examples where we failed to fold to a SHRUNKBLEND. For reference, that node is documented as:

/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.
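
The fold in the summary can be checked lane by lane with a small scalar model. This is only a sketch under stated assumptions: the helper names `pcmpgt`, `vselect`, and `sign_bit_blend` are hypothetical Python stand-ins, not LLVM APIs, and lanes are assumed to be 32-bit signed integers. It shows that selecting on `0 > X` picks the same lanes as a blend that looks only at the sign bit of `X`.

```python
LANE_BITS = 32  # assumed lane width for this model


def pcmpgt(a, b):
    """Per-lane signed greater-than: all-ones (-1) where a > b, else 0."""
    return [-1 if x > y else 0 for x, y in zip(a, b)]


def vselect(mask, y, z):
    """Generic select: take y where the mask lane is non-zero, else z."""
    return [yi if m != 0 else zi for m, yi, zi in zip(mask, y, z)]


def sign_bit_blend(x, y, z):
    """Blend that inspects only the sign bit of each lane of x."""
    sign = 1 << (LANE_BITS - 1)
    return [yi if (xi & sign) else zi for xi, yi, zi in zip(x, y, z)]


x = [-1, 0, 5, -2**31, 2**31 - 1, -7, 1, -100]
y = list(range(10, 18))
z = list(range(20, 28))

# Both forms choose y exactly in the lanes where x is negative.
same = vselect(pcmpgt([0] * 8, x), y, z) == sign_bit_blend(x, y, z)
```

Because the blend reads nothing but the sign bit, the explicit compare against zero adds no information, which is why it can be dropped.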

Event Timeline

spatel created this revision. Jun 11 2018, 11:58 AM
RKSimon added inline comments. Jun 11 2018, 1:39 PM
test/CodeGen/X86/vsel-cmp-load.ll
251–252

This can go

258

How come this folds but the AVX1 case in slt_zero above doesn't?

spatel updated this revision to Diff 150837. Jun 11 2018, 2:02 PM

Patch updated:
Remove a stale FIXME comment from a test.

spatel marked an inline comment as done. Jun 11 2018, 2:04 PM
spatel added inline comments.
test/CodeGen/X86/vsel-cmp-load.ll
258

AVX1 is more complicated due to ISA limitations, so I was planning to catch that one next. There, we've split the PCMPGT into halves, so I'll need to match a pattern with a concat:

    t41: v4i32 = X86ISD::PCMPGT t37, t32
        t31: v8i16 = vector_shuffle<4,5,6,7,u,u,u,u> t28, undef:v8i16
      t33: v4i32 = sign_extend_vector_inreg t31
    t42: v4i32 = X86ISD::PCMPGT t37, t33
  t40: v8i32 = concat_vectors t41, t42
  t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
  t6: v8i32,ch = CopyFromReg t0, Register:v8i32 %2
t23: v8i32 = vselect t40, t4, t6
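
The split shape in the dump above can be modeled with plain lists. This is a rough sketch, not LLVM code: `compare_whole` and `compare_split` are hypothetical names, and the model assumes the first PCMPGT operand is an all-zeros vector. It shows only that comparing two 4-lane halves and concatenating gives the same mask as one 8-lane compare, so a matcher has to look through the concat to find the sign-bit test.

```python
def pcmpgt(a, b):
    """Per-lane signed greater-than: all-ones (-1) where a > b, else 0."""
    return [-1 if x > y else 0 for x, y in zip(a, b)]


def compare_whole(x):
    """One full-width compare of zero against every lane of x."""
    return pcmpgt([0] * len(x), x)


def compare_split(x):
    """Two half-width compares joined back together (models concat_vectors)."""
    half = len(x) // 2
    lo = pcmpgt([0] * half, x[:half])
    hi = pcmpgt([0] * (len(x) - half), x[half:])
    return lo + hi


x = [-3, 4, 0, -1, 7, -8, -9, 2]
masks_agree = compare_split(x) == compare_whole(x)
```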

spatel added inline comments. Jun 11 2018, 2:10 PM
test/CodeGen/X86/vsel-cmp-load.ll
258

Or probably easier - we match the pattern after type legalization, but before vector op legalization:

    t21: v8i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
  t22: v8i32 = setcc t24, t21, setlt:ch
  t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
  t6: v8i32,ch = CopyFromReg t0, Register:v8i32 %2
t23: v8i32 = vselect t22, t4, t6
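
Matching at this stage amounts to recognizing a `vselect` whose condition is a `setlt` compare against an all-zeros BUILD_VECTOR. The toy matcher below illustrates that shape only. The `Node` class, `is_all_zeros`, and `fold_sign_bit_select` are hypothetical and do not mirror the SelectionDAG API or the actual patch in D48078. As an assumption, the rewrite target is the `shrunkblend X, Y, Z` form from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Minimal stand-in for a DAG node (hypothetical, not LLVM's SDNode)."""
    op: str
    operands: tuple = ()
    value: object = None  # used by Constant nodes
    cc: str = ""          # condition code, used by setcc nodes


def is_all_zeros(n):
    """True for a BUILD_VECTOR whose elements are all Constant 0."""
    return (n.op == "BUILD_VECTOR" and len(n.operands) > 0
            and all(o.op == "Constant" and o.value == 0 for o in n.operands))


def fold_sign_bit_select(n):
    """vselect (setcc X, 0, setlt), Y, Z --> shrunkblend X, Y, Z (toy model)."""
    if n.op != "vselect":
        return n
    cond, y, z = n.operands
    if cond.op == "setcc" and cond.cc == "setlt" and is_all_zeros(cond.operands[1]):
        return Node("shrunkblend", (cond.operands[0], y, z))
    return n


zeros = Node("BUILD_VECTOR", tuple(Node("Constant", value=0) for _ in range(8)))
x, y, z = Node("X"), Node("CopyFromReg1"), Node("CopyFromReg2")
select = Node("vselect", (Node("setcc", (x, zeros), cc="setlt"), y, z))
folded = fold_sign_bit_select(select)
```

Here the compare is still a single node, so there is no concat to look through, which is what makes this point in the pipeline the easier place to match.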

spatel added inline comments. Jun 12 2018, 7:55 AM
test/CodeGen/X86/vsel-cmp-load.ll
258

See D48078 for an implementation of that suggestion.

spatel abandoned this revision. Jun 13 2018, 5:37 AM

Abandoning - we do better by matching setcc+vselect earlier.