
[DAGCombiner] try to move splat after binop with splat constant
ClosedPublic

Authored by spatel on Wed, May 13, 10:23 AM.

Details

Summary

binop (splat X), (splat C) --> splat (binop X, C)
binop (splat C), (splat X) --> splat (binop C, X)

We do this in IR, and there's a similar fold for the case with 2 non-constant operands just above the code diff in this patch.
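The fold is an ordinary algebraic identity on per-lane semantics. A minimal stand-alone sketch (hypothetical helpers, not the DAGCombiner code) that checks binop (splat X), (splat C) == splat (binop X, C) for a 4-lane add:

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-lane vector helpers mimicking splat/binop node semantics.
using Vec4 = std::array<uint32_t, 4>;

inline Vec4 splat(uint32_t x) { return {x, x, x, x}; }

inline Vec4 add(const Vec4 &a, const Vec4 &b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        r[i] = a[i] + b[i];  // lane-wise binop (wrapping, like ISD::ADD)
    return r;
}

// binop (splat X), (splat C) --> splat (binop X, C)
inline bool foldHolds(uint32_t x, uint32_t c) {
    return add(splat(x), splat(c)) == splat(x + c);
}
```

The same reasoning applies lane-wise to any pure binop; the patch restricts which nodes qualify (e.g. the INSERT_VECTOR_ELT restriction mentioned below) for profitability, not correctness.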

This was discussed in D79718, and the extra shuffle in the test (llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it was noticed disappears because demanded elements analysis is no longer blocked. The large majority of the test diffs seem to be benign code scheduling changes, but I do see another type of win: moving the splat later allows binop narrowing in some cases. I don't see any obvious regressions; those were avoided on x86 and ARM with the INSERT_VECTOR_ELT restriction.

Diff Detail

Event Timeline

spatel created this revision.Wed, May 13, 10:23 AM
craig.topper added inline comments.Wed, May 13, 6:38 PM
llvm/test/CodeGen/X86/vector-rotate-128.ll
853–854

Something weird happened here I think. We appear to be loading a constant as an immediate and moving it into a vector, then ANDing it with something that was just ANDed with a constant pool value. Could the two ANDs use a single constant pool load?

spatel marked an inline comment as done.Thu, May 14, 7:15 AM
spatel added inline comments.
llvm/test/CodeGen/X86/vector-rotate-128.ll
853–854

I think the root cause is that we don't have a combine for:
ZERO_EXTEND_VECTOR_INREG --> BITCAST
...if we know all of the high bits are already zero.

That's visible in existing tests - for example in the SSE41 or AVX1 output for this test, we shouldn't need both the pand and the pmovzxwq.

I have a draft of that patch, and it causes a massive amount of test diffs...taking a look now.
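To illustrate why that combine is sound: when only lane 0 of a v8i16 can be nonzero (the usual state of a masked shift amount), ZERO_EXTEND_VECTOR_INREG to v2i64 writes the exact same bytes as a plain BITCAST on a little-endian target. A minimal sketch (hypothetical helpers; assumes a little-endian host, as on x86):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using V8i16 = std::array<uint16_t, 8>;
using V2i64 = std::array<uint64_t, 2>;

// ZERO_EXTEND_VECTOR_INREG v8i16 -> v2i64: zero-extend the low two lanes.
inline V2i64 zextInreg(const V8i16 &v) { return {v[0], v[1]}; }

// BITCAST v8i16 -> v2i64: reinterpret the same 16 bytes (little-endian).
inline V2i64 bitcastTo64(const V8i16 &v) {
    V2i64 r;
    std::memcpy(r.data(), v.data(), sizeof(r));
    return r;
}
```

If any lane other than lane 0 can be nonzero, its bits land in the high half of a 64-bit lane under the bitcast, so the two nodes diverge and the extend must stay.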

spatel marked an inline comment as done.Thu, May 14, 10:56 AM
spatel added inline comments.
llvm/test/CodeGen/X86/vector-rotate-128.ll
853–854

Oh wait...I got my vector elements mixed up. This is just a bizarre build vector legalization:

Legalizing: t83: v8i16 = BUILD_VECTOR Constant:i16<-1>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
Trying custom legalization
Creating constant: t84: i32 = Constant<65535>
Creating new node: t85: v4i32 = scalar_to_vector Constant:i32<65535>
Creating constant: t86: i32 = Constant<0>
Creating new node: t87: v4i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>
Creating new node: t88: v4i32 = vector_shuffle<4,1,2,3> t87, t85
Creating new node: t89: v8i16 = bitcast t88
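The final bitcast in that trace relies on the byte layout lining up: on a little-endian target, v4i32 <65535, 0, 0, 0> reinterpreted as v8i16 is exactly the requested <-1, 0, 0, 0, ...>. A small sketch of that reinterpretation (hypothetical helper; assumes a little-endian host):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// BITCAST v4i32 -> v8i16: reinterpret the same 16 bytes (little-endian).
inline std::array<uint16_t, 8> bitcastToV8i16(const std::array<uint32_t, 4> &v) {
    std::array<uint16_t, 8> r;
    std::memcpy(r.data(), v.data(), sizeof(r));
    return r;
}
```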
spatel updated this revision to Diff 266010.Mon, May 25, 5:26 AM

Rebased - x86 tests diffs are altered after D80131 / rGfa038e03504c.

spatel marked an inline comment as done.Mon, May 25, 12:54 PM
spatel added inline comments.
llvm/test/CodeGen/X86/vector-rotate-128.ll
853–854

So we still end up with and-of-and or some variant of that here even with the updated build vector lowering.
I might've been staring at this too long, but I'm not sure how to solve it (and not sure if it's worth the effort).
Here's what I see happening:

  1. turn the raw IR into a rotl
  2. expand the rotl including masking of the shift amount
  3. convert generic vector shifts to x86 shifts by splatted scalar amount
  4. end up with 2 'and' with target constants of different element widths:
    t63: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>> 0
  t61: v8i16,ch = load<(load 16 from constant-pool)> t0, t63, undef:i64
t48: v8i16 = and t4, t61
          t95: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<4 x i32> <i32 65535, i32 0, i32 0, i32 0>> 0
        t96: v8i16,ch = load<(load 16 from constant-pool)> t0, t95, undef:i64
      t80: v8i16 = and t48, t96
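Viewed as raw 16-byte lanes, the two ANDs are associative, so in principle the two constant-pool masks could be merged into one: <15, 15, ...> combined with the bitcast of <65535, 0, 0, 0> (i.e. <-1, 0, 0, 0, 0, 0, 0, 0> as v8i16) is just <15, 0, 0, 0, 0, 0, 0, 0>. A minimal sketch of that equivalence (hypothetical helpers, not the ISel code):

```cpp
#include <array>
#include <cstdint>

using V8i16 = std::array<uint16_t, 8>;

// Lane-wise AND, mimicking ISD::AND on v8i16.
inline V8i16 vand(const V8i16 &a, const V8i16 &b) {
    V8i16 r{};
    for (int i = 0; i < 8; ++i)
        r[i] = a[i] & b[i];
    return r;
}
```

The missed fold is that nothing currently reconciles the two target constants because they were created at different element widths, even though (x & M1) & M2 == x & (M1 & M2) holds bit-for-bit.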
craig.topper accepted this revision.Mon, May 25, 2:29 PM

Ok, let's go with this. LGTM

This revision is now accepted and ready to land.Mon, May 25, 2:29 PM
This revision was automatically updated to reflect the committed changes.