This is an archive of the discontinued LLVM Phabricator instance.

DAGCombine: Combine BUILD_VECTOR to TRUNCATE
ClosedPublic

Authored by zvi on Jun 10 2017, 12:59 AM.

Details

Summary

Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.

Example:
v8i32 V = ...

v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)

Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.

Diff Detail

Repository
rL LLVM

Event Timeline

zvi created this revision.Jun 10 2017, 12:59 AM
zvi added a comment.Jun 10 2017, 1:06 AM

Right now this patch is not good for ARM, so here are the options:

  1. Try to improve the ARM backend
  2. Add a TLI hook for allowing the target to bail out
  3. Do this only in the X86 lowering

I will start with checking the feasibility of option 1.

test/CodeGen/ARM/vpadd.ll
221 ↗(On Diff #102108)

This may be a similar issue to https://reviews.llvm.org/D32993#inline-286079.
Will look into this.

RKSimon edited edge metadata.Jun 10 2017, 6:10 AM

A few minors, but would like to see ARM fixed first.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14262 ↗(On Diff #102108)

Would it be useful to support BUILD_VECTORs containing UNDEF elements?

14272 ↗(On Diff #102108)

Check for ISD::TRUNCATE legality, this might even help ARM?

test/CodeGen/ARM/vext.ll
205 ↗(On Diff #102108)

whitespace only change - if this is actually relevant commit it separately now as a NFC, else discard it.

test/CodeGen/X86/shuffle-vs-trunc-256.ll
2 ↗(On Diff #102108)

Ideally it'd be good to see the codegen for AVX1 targets as well

zvi added inline comments.Jun 10 2017, 9:37 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14262 ↗(On Diff #102108)

I put a TODO in line 14245 above as a reminder. I thought to start with a conservative approach in this patch and possibly add support for undefs later. The concern is that by replacing with a 'truncate' node we will be losing the 'undef' element information.

14272 ↗(On Diff #102108)

Thanks for suggesting that. Will give it a try.

test/CodeGen/ARM/vext.ll
205 ↗(On Diff #102108)

ok

test/CodeGen/X86/shuffle-vs-trunc-256.ll
2 ↗(On Diff #102108)

ok

efriedma added inline comments.Jun 12 2017, 12:27 PM
test/CodeGen/ARM/vpadd.ll
221 ↗(On Diff #102108)

The problem here is pretty straightforward: on trunk, we have two shuffles (which each get lowered to one of the two outputs of ARMISD::VUZP). With your patch, we have one ARMISD::VUZP, and one ISD::TRUNCATE. The ARM backend doesn't try to reason about this equivalence at all.

(This also eventually blocks the VUZP+VADD->VPADD combine, but that isn't really the important part.)

zvi added a comment.Jun 26 2017, 12:36 AM
test/CodeGen/ARM/vpadd.ll
221 ↗(On Diff #102108)

After looking into this some more (I apologize for the delayed response - got held up with other projects):
This specific case is the simplest and can be fixed by generalizing the combine.
But, unfortunately, cases such as addCombineToVPADDLq_s8 below seem much more difficult. By the time VZUP is matched the DAG explodes to:

t0: ch = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t55: v2f64,ch = load<LD16[%cbcr](align=8)> t0, t2, undef:i32
t56: v16i8 = bitcast t55
            t26: v8i16 = bitcast t56
          t51: v8i16 = ARMISD::VSHL t26, Constant:i32<8>
              t46: v4i32 = ARMISD::VMOVIMM TargetConstant:i32<0>
            t47: v8i16 = bitcast t46
            t58: v8i16 = ARMISD::VMOVIMM TargetConstant:i32<2056>
          t48: v8i16 = sub t47, t58
        t50: v8i16 = llvm.arm.neon.vshifts Constant:i32<693>, t51, t48
            t37: v8i8 = extract_subvector t56, Constant:i32<0>
            t36: v8i8 = extract_subvector t56, Constant:i32<8>
          t54: v8i8,v8i8 = ARMISD::VUZP t37, t36
        t29: v8i16 = sign_extend t54:1
      t30: v8i16 = add t50, t29
    t52: v2f64 = bitcast t30
    t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
  t53: ch = store<ST16[%X](align=8)> t55:1, t52, t4, undef:i32
t32: ch = ARMISD::RET_FLAG t53

So i don't see any prospect for enabling this combine for ARM without some major work on the ARM backend.

zvi added inline comments.Jun 26 2017, 12:40 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14272 ↗(On Diff #102108)

The X86 backend does not set any truncates as legal; all truncates are lowered using custom lowering, so i don't see how we can use this as a criteria for applying the combine.

zvi updated this revision to Diff 103926.Jun 26 2017, 5:08 AM

This revision adds a TLI hook for disabling the combine. The combine will be only enabled for X86. We can enable more targets in the future if we see benenfit.

zvi updated this revision to Diff 103927.Jun 26 2017, 5:14 AM

Minor fix for broken line in comment

delena added inline comments.Jun 26 2017, 5:30 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14192 ↗(On Diff #103927)

Code simplification:
any_of(N->operands() ..

14289 ↗(On Diff #103927)

I'd put isDesirable..() here, before call to reduceBuildVecToTrunc(), with N node as parameter.

zvi added inline comments.Jun 26 2017, 5:50 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14192 ↗(On Diff #103927)

There is no SDNode::operands(). I was thinking of adding one as a follow-up commit if it makes sense.

14289 ↗(On Diff #103927)

But we don't know in advance what will be the truncate's operand type and result type. Need to some analysis first.

delena added inline comments.Jun 26 2017, 5:56 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14289 ↗(On Diff #103927)

isDisarable..() will not check types, strides etc. You can check this in reduceBuildVecToTrunc for targets where such transformation is profitable.

zvi updated this revision to Diff 104100.Jun 27 2017, 12:38 AM

Following Elena's suggestion, pulling the profitability check out from reduceBuildVecToTrunc().

zvi marked 3 inline comments as done.Jun 27 2017, 12:40 AM
zvi marked 2 inline comments as done.
zvi updated this revision to Diff 104185.Jun 27 2017, 8:59 AM

Typo fix: element zero was not checked against other elements for having a common ancestor vector. Added a negative test.
Thanks to Elena for pointing it out.

RKSimon added inline comments.Jun 27 2017, 9:31 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14231 ↗(On Diff #104185)

Shouldn't this be EVT::getIntegerVT ?

zvi added inline comments.Jun 27 2017, 10:53 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14231 ↗(On Diff #104185)

Good catch

zvi updated this revision to Diff 104220.Jun 27 2017, 11:12 AM

Fix MVT -> EVT

zvi updated this revision to Diff 104604.Jun 29 2017, 1:09 AM

Rebase on ToT and update test/CodeGen/X86/shuffle-vs-trunc-512.ll which was recently added and was affected by this change.
To continue the discussion with @wolfgangp in D34069, IMO we should consider removing this test if it does not cover the defect it originally intended to, but i don't have any strong objection to keep it with the updated checks.

zvi updated this revision to Diff 104833.Jun 30 2017, 2:07 AM

Looks like Elena was right about the redundant make_range(). After looking a bit harder found the ops() method.

delena added inline comments.Jul 1 2017, 11:53 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14272 ↗(On Diff #104833)

I'd check (Stride < 2), I don't know what isPowerOf2_32() thinks about 0.

14296 ↗(On Diff #104833)

you can just call for getBitcast(VT, Res), the "getBitcast() has the "if" inside".

zvi added inline comments.Jul 3 2017, 4:07 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14272 ↗(On Diff #104833)

isPowerOf2_32(0) should return false. From a quick look at the implementation this is the case. I will add a case in unittests/Support/MathExtrasTest.cpp to make sure this is covered.

14296 ↗(On Diff #104833)

ok.

zvi updated this revision to Diff 105047.Jul 3 2017, 4:09 AM

As Elena suggested, remove the redundant predicate on the getBitcast

delena accepted this revision.Jul 3 2017, 4:15 AM

Please change comments in vector-truncate-combine.ll.

This revision is now accepted and ready to land.Jul 3 2017, 4:15 AM
zvi updated this revision to Diff 105081.Jul 3 2017, 8:16 AM

Update test with comment explaining it no longer tests what it originally intended to.
I am considering to propose the removal of this test in a follow-up patch.

zvi marked 8 inline comments as done.Jul 3 2017, 8:21 AM
This revision was automatically updated to reflect the committed changes.
wmi added a subscriber: wmi.Oct 11 2017, 5:27 PM

Revert r307036 at r315540 because of PR34919

arsenm added a subscriber: arsenm.Oct 12 2017, 9:06 PM
arsenm added inline comments.
llvm/trunk/include/llvm/Target/TargetLowering.h
2737

This probably needs some context parameters for types it's applying to