
[SDAG] allow more cast folding for vector sext-of-setcc

Authored by spatel on Tue, Jun 1, 7:26 AM.



This is a follow-up to D103280 that eases the use restrictions, so we can handle the motivating case from:

The loop code is adapted from similar use checks in ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an easier way to filter out non-chain uses of load values.

Diff Detail

Event Timeline

spatel created this revision.Tue, Jun 1, 7:26 AM
spatel requested review of this revision.Tue, Jun 1, 7:26 AM
Herald added a project: Restricted Project.Tue, Jun 1, 7:26 AM
lebedev.ri added inline comments.Tue, Jun 1, 10:10 AM

What if it is extending to an even wider type than VT?

spatel added inline comments.Tue, Jun 1, 1:36 PM

It seems neutral whether the other type is wider or narrower: we trade a sext of the compare for a zext of the input. That might be because the transform that creates zext-loads is conservative about uses.

Let me know if this matches what you have in mind (I can add a test either way):

define <8 x i32> @multi_use_wider_size(<8 x i8>* %src, <8 x i16>* %dst) nounwind {
  %load = load <8 x i8>, <8 x i8>* %src
  %zext = zext <8 x i8> %load to <8 x i32>
  %icmp = icmp eq <8 x i8> %load, zeroinitializer
  %sext = sext <8 x i1> %icmp to <8 x i16>
  store <8 x i16> %sext, <8 x i16>* %dst
  ret <8 x i32> %zext
}
Currently, we get:

vmovq	(%rdi), %xmm1
vpmovzxbd	%xmm1, %ymm0 ; 256-bit return value
vpxor	%xmm2, %xmm2, %xmm2
vpcmpeqb %xmm2, %xmm1, %xmm1
vpmovsxbw %xmm1, %xmm1
vmovdqa	%xmm1, (%rsi) ; 128-bit compare value

If we allow type mismatch:

vmovq	(%rdi), %xmm1
vpmovzxbd	%xmm1, %ymm0 
vpmovzxbw	%xmm1, %xmm1 
vpxor	%xmm2, %xmm2, %xmm2
vpcmpeqw	%xmm2, %xmm1, %xmm1
vmovdqa	%xmm1, (%rsi)
lebedev.ri added inline comments.Wed, Jun 2, 7:02 AM

My point being: iff the other ZEXT is to a wider type, then I would expect that the ZEXT we create would be absorbed by the load, and that the other ZEXT would become a ZEXT of a ZEXT_LOAD, but it seems like we end up with a plain load + 2 ZEXTs?

spatel added inline comments.Wed, Jun 2, 8:49 AM

Right - stepping through the debug spew for this example, if we allow the transform here, we have:

t7: v8i8,ch = load<(load 8 from %ir.src)> t0, t2, undef:i64
t21: v8i16 = zero_extend t7 -- feeds into the setcc
t8: v8i32 = zero_extend t7 -- return value

But the code in ExtendUsesToFormExtLoad() won't convert that load because it doesn't handle the extra use. It already has another FIXME comment about setcc patterns, so we could add one more there.

As a hack, I eased the restriction to see what we would end up with on this x86 example, and it's going to need more work. We truncate the zext load and then zext it back out, so we get this:

vpmovzxbw	(%rdi), %xmm1   
vpackuswb	%xmm1, %xmm1, %xmm0
vpmovzxbd	%xmm0, %ymm0

Ideally, we'd have something like:

vpmovzxbw	(%rdi), %xmm1   
vpmovzxwd	%xmm1, %ymm0

So it's going to be a multi-patch process if we want to allow more patterns here, and it's not clear to me that we actually gain from that transform (we got rid of a sext, but added a zext?).

lebedev.ri accepted this revision.Wed, Jun 2, 8:55 AM

Seems fine to me unless others have other comments.


// The only extra use that we are okay with is the exact same cast we are about to create.


Thanks for checking.

This revision is now accepted and ready to land.Wed, Jun 2, 8:55 AM
RKSimon accepted this revision.Wed, Jun 2, 9:46 AM

LGTM - cheers!

This revision was landed with ongoing or failed builds.Wed, Jun 2, 10:15 AM
This revision was automatically updated to reflect the committed changes.