Page MenuHomePhabricator

[X86] Stop promoting and/or/xor/andn to vXi64.
ClosedPublic

Authored by craig.topper on Oct 14 2018, 11:45 PM.

Details

Summary

This is a work in progress patch to remove the promotion and make isel deal with the increased matching permutations.

There are still more patterns needed. I think we may also need to consider removing load promotion as well so that bitcasted loads won't exist. We may need to do some custom matching in X86ISelDAGToDAG.cpp to more easily peek through bitcasts without having to spell out every type combination.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Oct 14 2018, 11:45 PM

I think this is the right way to go. It's a blob of patterns, but they're stamped out in a regular form.
When the logic ops are promoted, we might have to deal with them separately as noted in D51553.

Can we do something similar to the scalar logic ops to reduce the duplication?

def NAME#16rr : BinOpRR_F<BaseOpc, mnemonic, Xi16, opnode>;
def NAME#32rr : BinOpRR_F<BaseOpc, mnemonic, Xi32, opnode>;
def NAME#64rr : BinOpRR_F<BaseOpc, mnemonic, Xi64, opnode>;

Rebase on top of the load promotion patch which helps reduce some of the needed patterns.

nhaehnle added inline comments.Oct 17 2018, 3:32 AM
lib/Target/X86/X86InstrAVX512.td
11437 ↗(On Diff #169949)

This and many other cases of duplication in the .td files seem amenable to using foreach. Something like:

foreach bc_ty = [bc_v16i8, bc_v8i16, bc_v4i32, bc_v2i64]<PatFrag> in
def : Pat<(xor VR128X:$src, (bc_ty (v4i32 immAllOnesV))),
          (EXTRACT_SUBREG
           (VPTERNLOGQZrri
            (INSERT_SUBREG (v8i64 (IMPLICIT_DEF)), VR128X:$src, sub_xmm),
            (INSERT_SUBREG (v8i64 (IMPLICIT_DEF)), VR128X:$src, sub_xmm),
            (INSERT_SUBREG (v8i64 (IMPLICIT_DEF)), VR128X:$src, sub_xmm),
            (i8 15)), sub_xmm)>;

Teach ANDN combine to look through bitcasts between the AND and the XOR.

craig.topper added inline comments.Oct 18 2018, 9:37 PM
lib/Target/X86/X86InstrAVX512.td
11437 ↗(On Diff #169949)

I agree we need to reduce the repetitiveness. My immediate focus is on fixing the test regressions. Then I want to see if should do anything to do custom selection in X86ISelDAGToDAG.cpp to avoid growing the static isel table.

Rebase. Fix a bad VPTERNLOG pattern. Add a combine to combineANDNP to fix a regression.

Add more patterns to handle masked logic ops. Starting to add VPTEST patterns.

craig.topper retitled this revision from [X86] WIP: Stop promoting and/or/xor/andn to vXi64. to [X86] Stop promoting and/or/xor/andn to vXi64..

I believe this now has all the patterns we need. I haven't fully collapsed the repetition of patterns into multiclasses, but I'd like to take that as follow up.

There's a few regressions related to broadcasts, but I think we already have PR for that.

With the correct patch and context this time

craig.topper added inline comments.Oct 25 2018, 12:03 PM
test/CodeGen/X86/broadcast-elm-cross-splat-vec.ll
435 ↗(On Diff #171158)

Here we have a 128-bit and a 256-bit broadcast going into isel. They both use the same load. But we don't have a broadcast f64->v4f64 instruction in AVX1 so isel emits a VMOVDDUP and an insert. This CSEs with the VMOVDDUP emitted for the f64->v2f64 broadcast. But it was too late to satisfy the one use check for the load folding. We probably need to split f64->v4f64 broadcasts during lowering/DAG combine instead of isel so it will CSE earlier.

test/CodeGen/X86/sat-add.ll
756 ↗(On Diff #171158)

This looks like we have an and with 2 not operands and we're now folding a different one to andn. I expect instcombine would have pulled the nots through the and so this is probably not a realistic case.

RKSimon accepted this revision.Oct 26 2018, 2:59 AM

LGTM - thanks

test/CodeGen/X86/broadcast-elm-cross-splat-vec.ll
435 ↗(On Diff #171158)

I can't find the PR covering this - please raise a bug if it doesn't already exist and add a TODO comment here referencing the PR.

This revision is now accepted and ready to land.Oct 26 2018, 2:59 AM
This revision was automatically updated to reflect the committed changes.