This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] try to form vector binop to eliminate an extract element
ClosedPublic

Authored by spatel on Feb 12 2020, 9:23 AM.

Details

Summary

binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C

This is a transform that has been considered for canonicalization (instcombine) in the past because it reduces instruction count. But as shown in the x86 tests, it's impossible to know if it's profitable without a cost model. There are many potential target constraints to consider.

We have implemented similar transforms in the backend (DAGCombiner and target-specific), but I don't think we have this exact fold there either (and if we did it in SDAG, it wouldn't work across blocks).

Note: this patch was intended to handle the more general case where the extract indexes do not match, but it got too big, so I scaled it back to this pattern for now.

Diff Detail

Event Timeline

spatel created this revision.Feb 12 2020, 9:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 9:23 AM
lebedev.ri added inline comments.Feb 12 2020, 9:40 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
135

Thinking out loud: what about the case where X == Y?

  • If the extract is of the same element, could consider doing extract from 2*X or X+X
  • If elements are different, could consider forming hadd
lebedev.ri added inline comments.Feb 12 2020, 10:05 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
105

Don't you want to ensure that both X and Y have the same type?
(we didn't forget that in foldExtractCmp())

spatel marked 4 inline comments as done.Feb 12 2020, 10:52 AM
spatel added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
105

Yes - good catch. Accidentally dropped that check.

135

For now (with same element extraction), it should work as expected. I'll add a test. I don't think we want to attempt canonicalization like X+X --> X<<1 if that's what you're thinking of. Let's leave that to instcombine.

With different elements, we'll want to be careful that we are not obscuring a pattern that the backend recognizes. I think we're ok on that either way for x86 horizontal math, but I'll take a closer look before the planned enhancement.

spatel updated this revision to Diff 244223.Feb 12 2020, 10:54 AM
spatel marked 2 inline comments as done.

Patch updated:

  1. Added check and test for matching vector types.
  2. Added test with extracts from common vector operand.
spatel updated this revision to Diff 244225.Feb 12 2020, 11:09 AM

Patch updated:

  1. Loosened use check to deal with extract of same operand (I misunderstood the comment/example about X+X in the previous reply).
  2. Added tests with extra uses.
lebedev.ri added inline comments.Feb 12 2020, 12:57 PM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
103

But even if it's the same extract, do we not care whether it will go away or not,
unlike the case with two different extracts?
I.e. for now i'd expect

if (!(Ext0 == Ext1 && Ext0->hasNUses(2)) &&
    !(Ext0->hasOneUse() && Ext1->hasOneUse()))
  return false;
103

(it might be better to handle extract cost from the getgo?)

spatel marked 3 inline comments as done.Feb 13 2020, 8:05 AM
spatel added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
103

Yes, identical operands creates a loophole that might allow vectorization where it wasn't intended, so we should include it in this patch rather than making a follow-on.

I've added a pile of tests that hopefully check all of the possibilities now. I don't see any x86 combos where the vector op is cheaper than the sibling scalar op, so those will all be negative tests.

spatel updated this revision to Diff 244437.Feb 13 2020, 8:09 AM
spatel marked an inline comment as done.

Patch updated:
Account for extra uses and equivalent extract operands.

lebedev.ri accepted this revision.Feb 13 2020, 10:54 AM
lebedev.ri marked an inline comment as done.

LG with nits
Thanks

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
89

Hm, and if there is only one extract, and the other operand is constant,
we consider that vectorizing the constant in some way will always be costlier?
Just thinking out loud, not for this patch.

118–119

If we are extracting the same lane, the cost should be identical right?
I wonder if it would be more readable to assert that,
do int ExtractCost = Extract0Cost; and operate on it here.

132

I'd think it would be more obvious to do !Ext0->hasNUses(2) instead.
It can't have less than two uses - there are two uses in our entry binop.
And if there are more than two uses then those are The extra uses.

Otherwise, why do we check hasOneUse() instead of hasNUsesOrMore(2)?

This revision is now accepted and ready to land.Feb 13 2020, 10:54 AM
spatel marked 5 inline comments as done.Feb 13 2020, 12:42 PM
spatel added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
89

This would imply that the vector binop is cheaper than the scalar binop since we would have the same extract either way. As I mentioned in an earlier comment, I haven't found that case yet on x86, but it's possible it exists for some binop on some subtarget.

118–119

Yes, this is a leftover of starting the patch on the more general case. I'll change it.

132

I agree that your suggestion is better for readability. Will change. Thanks for the detailed review!

lebedev.ri added inline comments.Feb 13 2020, 12:45 PM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
132

Otherwise, why do we check hasOneUse() instead of hasNUsesOrMore(2)?

Err, that of course should have read

"Otherwise, why do we check hasOneUse() instead of hasNUsesOrMore(1)"

lebedev.ri marked an inline comment as done.Feb 13 2020, 12:53 PM
lebedev.ri added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
89

I'm mainly just trying to think of plausible edge cases that we might care about, but just didn't think of.

This revision was automatically updated to reflect the committed changes.
spatel marked 2 inline comments as done.