This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] narrow vector binop with 2 insert subvector operands
ClosedPublic

Authored by spatel on Jan 17 2019, 12:19 PM.

Details

Summary
bo (ins undef, X, Z), (ins undef, Y, Z) --> ins undef, (bo X, Y), Z

This is another step in generic vector narrowing. It's also a step towards more horizontal op formation, specifically for x86 (although we still fail to match those ops in the affected tests).

The cases that ought to be fully scalarized are not optimal either, but it's still an improvement to use a narrower vector op when we know part of the result must be undef because both inputs are undef in some vector lanes.

I think a similar match but checking for a constant operand might help some of the cases in D51553.
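
A rough sketch of the idea in DAGCombiner style (the function name and exact structure here are assumptions for illustration, not the committed code; as the discussion below shows, the undef base of the result later had to become a computed constant):

  static SDValue narrowInsertedVectorBinOp(SDNode *BO, SelectionDAG &DAG) {
    SDValue Ins0 = BO->getOperand(0);
    SDValue Ins1 = BO->getOperand(1);
    if (Ins0.getOpcode() != ISD::INSERT_SUBVECTOR ||
        Ins1.getOpcode() != ISD::INSERT_SUBVECTOR)
      return SDValue();

    // Both operands must insert into an undef base at the same index,
    // and the inserted subvectors must have the same narrow type.
    SDValue X = Ins0.getOperand(1);
    SDValue Y = Ins1.getOperand(1);
    SDValue Index = Ins0.getOperand(2);
    if (!Ins0.getOperand(0).isUndef() || !Ins1.getOperand(0).isUndef() ||
        Index != Ins1.getOperand(2) ||
        X.getValueType() != Y.getValueType())
      return SDValue();

    // bo (ins undef, X, Z), (ins undef, Y, Z) --> ins undef, (bo X, Y), Z
    SDLoc DL(BO);
    EVT VT = BO->getValueType(0);
    SDValue NarrowBO = DAG.getNode(BO->getOpcode(), DL, X.getValueType(), X, Y);
    return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, DAG.getUNDEF(VT),
                       NarrowBO, Index);
  }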

Diff Detail

Event Timeline

spatel created this revision. Jan 17 2019, 12:19 PM
RKSimon accepted this revision. Jan 21 2019, 9:40 AM

LGTM with one minor

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18214

(very minor) Move this comment to the outer if() and use the same terms as we used in the shuffle fold above.

Also, explain that this is likely to occur in reduction patterns.

This revision is now accepted and ready to land. Jan 21 2019, 9:40 AM
spatel planned changes to this revision. Jan 21 2019, 11:14 AM

Actually, this patch isn't correct as-is. We can't insert into an undef base vector because at least 'xor undef, undef --> 0' (not undef).
I know I avoided or fixed a similar problem in IR somewhere along the way.
We need to actually compute the constant vector for the specified binop's opcode.

'xor undef, undef --> 0' (not undef).

Strictly speaking, xor undef, undef --> undef is correct. It's just that as a practical matter, we try to fold obvious cases to zero so we don't have to argue with people who write silly constructs like __m128i a = _mm_xor_si128(a, a);. The fold here seems unlikely to cause problems in practice.
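
Spelled out as code (illustrative only), the idiom being accommodated:

  #include <emmintrin.h>

  // Zeroing a register by xor-ing it with itself. With 'a' effectively
  // undef, this is xor undef, undef; LLVM folds it to all-zeros rather
  // than undef so the idiom behaves the way its authors expect.
  __m128i zero_by_self_xor() {
    __m128i a = _mm_undefined_si128(); // explicitly undefined value
    a = _mm_xor_si128(a, a);           // folds to zero, not undef
    return a;
  }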

Ah, interesting. Either way, I think we want a test to document the behavior, so I added one here:
rL351763

Now, the funny thing about this particular case is that we'll generate the optimal code with x86 AVX either way because 'vxorps' with 128-bit operands zeros the upper half...oops! So I added another test here:
rL351764
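
In intrinsics form (illustrative, not one of the added tests), the hardware detail is that a VEX-encoded 128-bit op zeros bits 255:128 of the destination YMM register:

  #include <immintrin.h>

  __m256 xor_low_half(__m128 x, __m128 y) {
    // Compiles to a 128-bit vxorps under AVX; the hardware zeros the
    // upper 128 bits of the full YMM register as a side effect.
    __m128 lo = _mm_xor_ps(x, y);
    // The cast leaves the upper half formally undefined, but on AVX it
    // comes out as zero in practice; that is why the test looked
    // optimal either way.
    return _mm256_castps128_ps256(lo);
  }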

I think I should check in the more conservative version of the patch first since I already wrote it. Then, if we decide it's worth loosening it to the form currently shown here, I can make that a follow-up.

After looking a bit more closely at other patterns that I was hoping to fix, I'm actually not sure if we will end up keeping this code. I think we might be better off trying more general vector-demanded-elements enhancements for binops which would make this explicit pattern-matching unnecessary.

spatel updated this revision to Diff 182830. Jan 21 2019, 2:40 PM

Patch updated:

  1. Computed the constant vector that we're inserting into rather than assuming it is undef (see the sketch below).
  2. Moved the code comment as suggested.
  3. Rebased with diffs for the 'xor' tests.
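
A sketch of what item 1 amounts to, reusing the names from the summary sketch (illustrative, not the exact diff): build the wide binop with undef operands and let SelectionDAG's constant folding decide the value of the untouched lanes; for xor that yields zero.

  // Compute the base vector: getNode() folds some all-undef binops to
  // constants (xor undef, undef becomes 0), so this base is correct for
  // any opcode; if no fold applies, it simply stays undef.
  SDValue WideUndef = DAG.getUNDEF(VT);
  SDValue BaseC = DAG.getNode(BO->getOpcode(), DL, VT, WideUndef, WideUndef);
  return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, BaseC, NarrowBO, Index);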
This revision is now accepted and ready to land. Jan 21 2019, 2:40 PM
RKSimon accepted this revision. Jan 21 2019, 2:46 PM

LGTM!

This revision was automatically updated to reflect the committed changes.