Page MenuHomePhabricator

[DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657)
ClosedPublic

Authored by spatel on Apr 27 2017, 3:53 PM.

Details

Summary

We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check legality of the select that we want to produce.

A few things to note:

  1. We can't wait until after legalization and do this generically because (at least in the x86 tests from PR14657), we'll have PACKSS and bitcasts in the pattern.
  2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but I think that requires a closer look to make sure we don't end up worse.
  3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases. I thought I'd better just post this as-is to make sure I'm not off the rails, but I could fix that first.
  4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple legal vector sizes, but I can certainly add tests for other targets to make sure this isn't doing harm.
  5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests): despite IR that is terrible for the target, this patch allows us to generate the optimal loop code because something post-ISEL is hoisting the splat extends above the vector loops.

Diff Detail

Event Timeline

spatel created this revision.Apr 27 2017, 3:53 PM
nadav added inline comments.Apr 27 2017, 4:05 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6937

I think that it can be a good idea to check which DAGCombine phase we are in and make sure that we are in the pre-legalization phase.

6947

I think that we should check that a and b have one use.

spatel updated this revision to Diff 97110.Apr 28 2017, 9:20 AM
spatel marked an inline comment as done.

Patch updated:

  1. Bail out if we require LegalOperations (unlikely that this could have benefits if we are past legalization).
  2. That change tilted the scale to make this a proper DAGCombiner member rather than a static, so some cosmetic diffs from that.
  3. Improved code comments, structure, variable names, and assert message (no functional change from any of these either).
spatel added inline comments.Apr 28 2017, 9:38 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6947

I could not find a reason for this. Apologies for moving the goal posts, but I renamed the variables in the updated version of the patch in an attempt to make the code clearer.

Your reference to 'a and b' was to the setcc operands. These are not in play with this transform. Ie, the setcc remains as-is regardless of what happens to the vselect.

'c and d' were the vselect operands (these are now 'A and B'), but I don't see a benefit to checking the number of uses on those either.

Here are a couple of hacked up tests to show these cases. I can add these to the patch if that would help.

define <4 x double> @fpext2(<4 x double> %x, <4 x double> %y, <4 x float> %a, <4 x float> %b) {
%cmp = fcmp olt <4 x double> %x, %y
%sel = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%ext = fpext <4 x float> %sel to <4 x double>
%add = fadd <4 x double> %x, %y   <--- extra uses of fcmp operands
%div = fdiv <4 x double> %ext, %add
ret <4 x double> %div
}

With this patch (no one-use check on the compare operands):

vcmpltpd   %ymm1, %ymm0, %ymm4
vcvtps2pd  %xmm2, %ymm2
vcvtps2pd  %xmm3, %ymm3
vblendvpd  %ymm4, %ymm2, %ymm3, %ymm2
vaddpd     %ymm1, %ymm0, %ymm0
vdivpd     %ymm0, %ymm2, %ymm0

If we bail out on hasOneUse():

vcmpltpd      %ymm1, %ymm0, %ymm4
vextractf128  $1, %ymm4, %xmm5
vpacksswb     %xmm5, %xmm4, %xmm4
vblendvps     %xmm4, %xmm2, %xmm3, %xmm2
vcvtps2pd     %xmm2, %ymm2
vaddpd        %ymm1, %ymm0, %ymm0
vdivpd        %ymm0, %ymm2, %ymm0

And the case where the select ops have >1 use:

define <4 x double> @fpext3(<4 x double> %x, <4 x double> %y, <4 x float> %a, <4 x float> %b) {
%cmp = fcmp olt <4 x double> %x, %y
%sel = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%ext = fpext <4 x float> %sel to <4 x double>
%add = fadd <4 x float> %a, %b   <--- extra uses of select ops
%ext2 = fpext <4 x float> %add to <4 x double>
%div = fdiv <4 x double> %ext, %ext2
ret <4 x double> %div
}

Transformed:

vcmpltpd   %ymm1, %ymm0, %ymm0
vcvtps2pd  %xmm2, %ymm1
vcvtps2pd  %xmm3, %ymm4
vblendvpd  %ymm0, %ymm1, %ymm4, %ymm0
vaddps     %xmm3, %xmm2, %xmm1
vcvtps2pd  %xmm1, %ymm1
vdivpd     %ymm1, %ymm0, %ymm0

Or bail out:

vcmpltpd      %ymm1, %ymm0, %ymm0
vextractf128  $1, %ymm0, %xmm1
vpacksswb     %xmm1, %xmm0, %xmm0
vblendvps     %xmm0, %xmm2, %xmm3, %xmm0
vcvtps2pd     %xmm0, %ymm0
vaddps	      %xmm3, %xmm2, %xmm1
vcvtps2pd     %xmm1, %ymm1
vdivpd        %ymm1, %ymm0, %ymm0
nadav accepted this revision.Apr 29 2017, 1:08 AM

LGTM!

This revision is now accepted and ready to land.Apr 29 2017, 1:08 AM
This revision was automatically updated to reflect the committed changes.