This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] try to form vector compare and binop to eliminate scalar ops
ClosedPublic

Authored by spatel on Jun 24 2020, 9:07 AM.

Details

Summary

binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0

This is a larger pattern than the existing extractelement folds because we can't reasonably vectorize the sub-patterns with constants based on cost model calcs (it doesn't usually make sense to replace a single extracted scalar op with constant operand with a vector op). I salvaged as much of the existing logic as I could, but there might be better ways to share and reduce code.

The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that particular pattern in D59710, but that caused crashing and regressions. This patch is more general, but hopefully safer.

The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5

There's no code comment explanation for the more expensive vector fcmp in the cost model table, but I assume that's based on some ancient SSE2 implementation where it wasn't the cheapest:

static const CostTblEntry SSE2CostTbl[] = {
  { ISD::SETCC,   MVT::v2f64,   2 },
  { ISD::SETCC,   MVT::f64,     1 },

Diff Detail

Event Timeline

spatel created this revision.Jun 24 2020, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2020, 9:07 AM

Yes, old x86 targets really struggle with some v2f64 ops - cmppd included.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
563

cast<FixedVectorType> (maybe a dyn_cast<>) ?

spatel marked an inline comment as done.Jun 28 2020, 9:26 AM
spatel added a subscriber: ctetreau.
spatel added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
563

Yes, I think dyn_cast with early return here - D82056 is trying to clean this up in similar existing code. cc @ctetreau

spatel updated this revision to Diff 273950.Jun 28 2020, 9:51 AM

Patch updated:
Added bail out and regression test for scalable vectors.

RKSimon accepted this revision.Jun 28 2020, 12:56 PM

LGTM - cheers

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
90

Maybe pull this out as a precommit NFC?

525

Maybe add an explanation that SLP can't handle this?

This revision is now accepted and ready to land.Jun 28 2020, 12:56 PM
lebedev.ri added inline comments.Jun 28 2020, 1:45 PM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
200–201

Looks like some of this should be a preparatory NFC cleanup

601

++SomeNewStatistic;

spatel marked an inline comment as done.Jun 29 2020, 6:04 AM
spatel added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
90

Yes, I'll do that. I did not do it in advance of this review because it introduces some redundancy for the existing code. So it is of questionable value without the new caller.

spatel marked 4 inline comments as done.Jun 29 2020, 7:30 AM
This revision was automatically updated to reflect the committed changes.