This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
ClosedPublic

Authored by RKSimon on Feb 28 2016, 4:17 AM.

Details

Summary

Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

This won't solve the issues with PR25718 (load+zext 8xi8 to 8xi32) but is one of several things that needs to be done at the same time.

Igor/Elena - can you advise why the masks aren't being folded into the VPMOVZX instructions on skylake targets any more please?

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 49314.Feb 28 2016, 4:17 AM
RKSimon retitled this revision from to [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG.
RKSimon updated this object.
RKSimon added reviewers: igorb, delena, congh, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
delena added inline comments.Feb 28 2016, 6:10 AM
test/CodeGen/X86/avx512-ext.ll
116

Hi Simon,

Why do we need an additional instruction here?
vpmovzxbw %xmm0, %ymm0 {%k1} {z} does the work

RKSimon added inline comments.Feb 28 2016, 6:32 AM
test/CodeGen/X86/avx512-ext.ll
116

Hi - that was what I was asking yourself + Igor. I don't know much about how the masking lowering work in AVX512, but for some reason these VZEXT (which in this case has come via VECTOR_SHUFFLE lowering) don't correctly combine with the masks.

Now, I'm tempted to avoid this issue by just not combining cases where ZERO_EXTEND is legal and extends the whole register, but it looks like its just hiding a bigger problem. I'll see if I can create a simplified repro if you wish?

delena edited edge metadata.Feb 28 2016, 6:38 AM

I looked at the code. We combine "zext" with "select" in td file. and receive
vpmovzxbw %xmm0, %ymm0 {%k1} {z}

Then I'm looking at lowering of ZERO_EXTEND_VECTOR_INREG : VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG()

It goes there, right?
At the end I see:

return DAG.getNode(ISD::BITCAST, DL, VT,
                   DAG.getVectorShuffle(SrcVT, DL, Zero, Src, ShuffleMask));

I assume that BITCAST does not allow combining "zext" with "select" .

But SEXT code woks perfect on SKX, right?

define <16 x i16> @zext_16x8_to_16x16_mask(<16 x i8> %a ,<16 x i1> %mask) nounwind readnone {

%x   = sext <16 x i8> %a to <16 x i16>
%ret = select <16 x i1> %mask, <16 x i16> %x, <16 x i16> zeroinitializer
ret <16 x i16> %ret

}

I assume that BITCAST does not allow combining "zext" with "select" .

Yes, I've just done a repro using vector shuffle - the bitcast seems to be the problem (and is why we don't see it with the VSEXT case). I can raise a bugzilla describing this if you want me to.

It looks the best option for now is only use this combine for certain non-legal ZERO_EXTEND cases - I'll update it shortly.

I wonder whether X86ISD::VZEXT / X86ISD::VSEXT are that useful or whether we should try to implement the PMOVZX/PMOVSX instructions with a mixture of ZERO/SIGN_EXTEND and ZERO/SIGN_EXTEND_VECTOR_INREG - I've looked at this in the past but haven't pursued it recently, IIRC there were problems with memory folding.

RKSimon updated this revision to Diff 49317.Feb 28 2016, 7:25 AM
RKSimon edited edge metadata.

Tightened the restrictions on the *_EXTEND to *_EXTEND_VECTOR_INREG so that already legal transforms (AVX2+) don't get unnecssarily combined.

Created PR26762 to investigate the issue with bitcasts preventing masked instruction combining

Looks better. I suggest to run llc with --debug for zext and sext for SKX and compare between them.
But in general, the patch looks good.

Looks better. I suggest to run llc with --debug for zext and sext for SKX and compare between them.
But in general, the patch looks good.

No problem - I'll put my findings on PR26762.

OK to commit?

delena accepted this revision.Mar 2 2016, 10:42 AM
delena edited edge metadata.

LGTM

This revision is now accepted and ready to land.Mar 2 2016, 10:42 AM
This revision was automatically updated to reflect the committed changes.