Page MenuHomePhabricator

[X86][SSE] Allow SplitOpsAndApply to split to lowest common vector size
Changes PlannedPublic

Authored by RKSimon on Wed, Jan 9, 11:05 AM.

Details

Summary

This patch relaxes the SplitOpsAndApply requirements for 128/256/512-bit sized vectors for SSE2/AVX2/AVX512BW so that it splits illegal types by the lowest common vector size and I've updated the AVG combine to demonstrate this with its existing v48i8 test case.

I'm not sure how often this really happens, but if its any use I can pursue splitting this into different sized vectors (e.g. 1*ymm + 1*xmm subvectors), but that will require a larger refactor.

@craig.topper D56306 suggests that you might want to move away from SplitOpsAndApply?

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Wed, Jan 9, 11:05 AM

PMULDQ/PMULUDQ is interacting poorly with the fact that we convert zext/sext to zext_vector_inreg/sext_vector_inreg before type legalization. So we split the PMULDQ/PMULUDQ when we create them. Then SimplfiyDemandedbits can't optimize the zext/sext to aext because the splitting messed up the use count. Then the zext/sext becomes a split zext_invec/sext_invec, but SimplifyDemandedBit won't turn those into aext_invec. So its really a gross ordering problem that probably goes away with -x86-experimental-vector-widening-legalization since we won't eagerly create zext_invec/sext_invec ops.

For this AVG case, I've considered trying to see if we could emit a v48i8 pavg and let type legalization custom widen it to v64i8 using undef and then split it. I think that requires us to use a (v64i8 (insert_subvector undef, (v48i8 X))) to widen the inputs in custom legalization. Then generic legalization would need support for legalizing the v64i8 insert_subvector with v48i8 input. Once its widened and split we should have one v64i8 pavg, or two v32i8 pavg, or four v16i8 pavg depending on the target.

Another idea is that we could teach custom type legalization to split the v48i8 avg as it widens into v16i8 undef, v16i8 avg, v32i8 avg for avx2.

craig.topper added inline comments.Wed, Jan 9, 7:50 PM
lib/Target/X86/X86ISelLowering.cpp
37004

This crashes on v24i8. Probably need InVT.getSizeInBits() % 128 == 0 instead of NumElems % 8 == 0

RKSimon planned changes to this revision.Thu, Jan 10, 2:34 AM