Page MenuHomePhabricator

[SLP] Consider alternatives for cost of select instructions.
ClosedPublic

Authored by fhahn on Oct 22 2020, 9:27 AM.

Details

Summary

Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program base slp diff
test-suite...ications/JM/ldecod/ldecod.test 502.00 697.00 38.8%
test-suite...ications/JM/lencod/lencod.test 1023.00 1414.00 38.2%
test-suite...-typeset/consumer-typeset.test 56.00 65.00 16.1%
test-suite...6/464.h264ref/464.h264ref.test 804.00 822.00 2.2%
test-suite...006/453.povray/453.povray.test 3335.00 3357.00 0.7%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2121.00 0.5%
test-suite...:: External/Povray/povray.test 2378.00 2382.00 0.2%

Diff Detail

Event Timeline

fhahn created this revision.Oct 22 2020, 9:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2020, 9:27 AM
fhahn requested review of this revision.Oct 22 2020, 9:27 AM
xbolva00 added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3553

Maybe add TODO to remove/adjust this code when LLVM starts using u/smin/max fully?

3568

intrinsic::not_intrinsic

ABataev added inline comments.Oct 22 2020, 9:36 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3559–3595

Use match() functions to match the pattern. Also, what's the criterion for generating intrinsic or vector instruction? Can we rely on it unconditionally and just choose the minimum cost?

dmgreen added inline comments.Oct 22 2020, 9:42 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3559–3595

There's something called matchSelectPattern that might help quite a bit too.

fhahn updated this revision to Diff 300035.Oct 22 2020, 10:15 AM

Use matchSelectPattern instead of manually looking for min/max patterns.

fhahn added inline comments.Oct 22 2020, 10:22 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3559–3595

. Also, what's the criterion for generating intrinsic or vector instruction? Can we rely on it unconditionally and just choose the minimum cost?

Currently for costing, it just picks whatever has the cheaper cost (select lowering or min/max version). The code generated by the SLP vectorizer is currently unchanged and it relies on the backend to pick whichever version is more profitable.

This works well on AArch64, but there could be a case where for example SLP vectorizer decides it is only profitable with min/max, but the backend is missing a corresponding fold and won't be able to generate the right instructions.

We could add a general fold to VectorCombine as follow up, to behave more predictable, if that is desired, or adjust the SLP codegen.

I am also planing a follow-up that supports additional select patterns that can be lowered more efficiently than currently on AArch64 that are not min/max.

There's something called matchSelectPattern that might help quite a bit too.

That's useful, thanks! Updated the code to use that. Unfortunately it seems like we still need some code to turn the pattern into an intrinsic, but maybe the code should be moved to SelectPatternResult?

fhahn marked an inline comment as done.Oct 22 2020, 10:22 AM
ABataev added inline comments.Oct 22 2020, 10:48 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3550–3567

Maybe outline all this as a separate function, something like static int getVectorMinMaxCost(...)?

3590

unexpected

fhahn updated this revision to Diff 300096.Oct 22 2020, 1:47 PM

Outlined cost computation to getVectorMinMaxCost helper function, fix typo. Thanks!

fhahn marked an inline comment as done.Oct 22 2020, 1:49 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3550–3567

Done, it was getting quite big.

samparker added inline comments.Oct 23 2020, 12:15 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3378

Maybe this should be part of TTI? There's already static methods for reduction matching and could this logic not also be useful for the loop vectorizer?

Why the reluctance to actually generate the min/max intrinsics?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3378

Yes - it'd be great to move most of this into TTI or Utils/Local.cpp

fhahn updated this revision to Diff 300295.Oct 23 2020, 8:00 AM

Updated to move matching code to ValueTracking where the other related match code lives.

Why the reluctance to actually generate the min/max intrinsics?

Nothing other than trying to keep the patch as small as possible. If people think it's better to generate the intrinsics directly, I can adjust SLP codegen.

fhahn added a comment.Oct 23 2020, 8:02 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3378

Sounds good. I moved the matching part to ValueTracking. It does not seem like the ideal location, but it is where the related SelectPatternResult helpers live.

could this logic not also be useful for the loop vectorizer?

yep, this is something I am planning on looking into shortly as a follow up.

fhahn updated this revision to Diff 300297.Oct 23 2020, 8:04 AM

Remove remnants of getVectorMinMaxCost

Harbormaster completed remote builds in B76200: Diff 300297.

I haven't looked at the details here, but I just want to chime in that I'm cautiously optimistic that we'll have the basic cost model for intrinsics straightened out soon ( example: c963bde0152a ).
That should then allow us to canonicalize to min/max/abs intrinsics in instcombine. I'm not sure if this patch makes that goal harder, easier, or neither. We will eventually want the vectorizers to recognize and produce those intrinsics directly.

I haven't looked at the details here, but I just want to chime in that I'm cautiously optimistic that we'll have the basic cost model for intrinsics straightened out soon ( example: c963bde0152a ).
That should then allow us to canonicalize to min/max/abs intrinsics in instcombine. I'm not sure if this patch makes that goal harder, easier, or neither. We will eventually want the vectorizers to recognize and produce those intrinsics directly.

I don't think this patch would make things harder. The change itself may become less important, if some of those scalar selects are turned into min/max before SLPVectorizer sees them. We can remove it, as @xbolva00 suggested, if it turns out it is not needed any more.

samparker accepted this revision.Mon, Oct 26, 2:03 AM

Probably best to wait a little while for other's feedback, but this LGTM.

This revision is now accepted and ready to land.Mon, Oct 26, 2:03 AM
RKSimon accepted this revision.Mon, Oct 26, 10:21 AM

LGTM with one minor

llvm/lib/Analysis/ValueTracking.cpp
5999

Update the comment?

This revision was landed with ongoing or failed builds.Thu, Oct 29, 1:42 PM
This revision was automatically updated to reflect the committed changes.
dmajor added a subscriber: dmajor.Fri, Oct 30, 12:48 PM

This commit triggered an assertion failure on our 32-bit bots:

reduced.c:

a, b, c;
l() {
  int e = a, f = l, g, h, i, j;
  float *d = c, *k = b;
  for (;;)
    for (; g < f; g++) {
      k[h] = d[i];
      k[h - 1] = d[j];
      h += e << 1;
      i += e;
    }
}
clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
fhahn added a comment.Fri, Oct 30, 2:27 PM

Interesting, thanks for sharing! I reverted the patch while I investigate.

llvm/lib/Analysis/ValueTracking.cpp
5999

Thanks, I updated this in the committed version.