This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
ClosedPublic

Authored by david-arm on Aug 16 2021, 8:25 AM.

Details

Summary

This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

int r = a;
for (int i = 0; i < n; i++) {
  if (src[i] > 3) {
    r = b;
  }
  src[i] += 2;
}

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

%phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
...
%pred = icmp ugt i32 %val, i32 3
%phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.

Tests have been added here:

Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
Transforms/LoopVectorize/select-cmp.ll

Diff Detail

Event Timeline

david-arm created this revision.Aug 16 2021, 8:25 AM
david-arm requested review of this revision.Aug 16 2021, 8:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 8:25 AM
Matt added a subscriber: Matt.Aug 17 2021, 9:22 AM
xbolva00 added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9243

RecurrenceDescriptor::isSelectCmpRecurrenceKind(Kind)) {

+ any test for this code path?

david-arm updated this revision to Diff 369671.Aug 31 2021, 3:30 AM
  • Removed some redundant checks for select/cmp patterns.
david-arm marked an inline comment as done.Aug 31 2021, 3:31 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9243

Good spot @xbolva00! The extra checks I added were never being hit because we never create reduction recipes for this new recurrence kind. I've added some asserts instead.

david-arm marked an inline comment as done.Sep 7 2021, 7:50 AM

Gentle ping!

I have been looking at a similar area of the code recently for something unrelated. My first thought was that these are funny reductions, only reducing 2 values into 1. As in https://godbolt.org/z/531E7cPxY?

Do you know how common this comes up, and if the loop in question will usually have a large enough iteration count to warrant the overheads of vectorization? I guess that's hard to say in general.

Can you upload with full context

I have been looking at a similar area of the code recently for something unrelated. My first thought was that these are funny reductions, only reducing 2 values into 1. As in https://godbolt.org/z/531E7cPxY?

Do you know how common this comes up, and if the loop in question will usually have a large enough iteration count to warrant the overheads of vectorization? I guess that's hard to say in general.

Can you upload with full context

Hi @dmgreen, so I have only seen one example so far, but I do know that gcc vectorises this loop (although gcc's loop seems a bit messy). I tested some simple loops with large trip counts and we see significant performance improvements from vectorisation. I imagine that in practice real-world loops would do more than just perform this type of reduction, and would likely be in combination with something else.

Currently investigating a crash I discovered whilst building a benchmark. Will post another patch once I've found the cause!

david-arm updated this revision to Diff 373585.Sep 20 2021, 7:41 AM
  • Fixed a crash I found whilst building some benchmarks related to cases where the select/cmp are in a predicated block. Added tests - Transforms/LoopVectorize/select-cmp-predicated.ll
  • Disabled interleaving for small loops when using a select/cmp reduction pattern because this can give worse performance in some cases with small trip counts.

I ran SPEC2006 and did not find any performance regressions, although I know that some loops are now vectorising because of this patch. Unfortunately, this did not translate to any improvements either.

david-arm updated this revision to Diff 374824.Sep 24 2021, 6:16 AM
  • Rebase.
  • Fixed a build issue in IVDescriptors.cpp
kmclaughlin added inline comments.Sep 29 2021, 9:31 AM
llvm/lib/Transforms/Utils/LoopUtils.cpp
859

Could you use dyn_cast instead of isa & cast here?

981

Can you add a message to this assert, maybe something like "Unexpected reduction kind"?

1066

Is it possible to use isSelectCmpRecurrenceKind here? e.g.

if (isSelectCmpRecurrenceKind(Desc.getRecurrenceKind())
  return createSelectCmpTargetReduction(B, TTI, Src, Desc, OrigPhi);

return createSimpleTargetReduction(B, TTI, Src, RK);
llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
19

I could be wrong, but I think it's necessary to check for the correct label before the CHECK lines for each prefix? (e.g. CHECK-VF4IC4-LABEL above here)

llvm/test/Transforms/LoopVectorize/select-cmp.ll
215

Could you please add a small comment above each of the negative tests to explain why they shouldn't vectorise?

david-arm added inline comments.Sep 30 2021, 1:47 AM
llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
19

I think this probably works because I also add "--check-prefix=CHECK" to the FileCheck command, however maybe it looks better to do it the way you suggest. It probably looks more obvious!

david-arm updated this revision to Diff 376150.Sep 30 2021, 3:55 AM
  • Addressed review comments.
  • Rebase.
david-arm marked 5 inline comments as done.Sep 30 2021, 3:56 AM

Thanks for the review comments @kmclaughlin!

kmclaughlin accepted this revision.Sep 30 2021, 7:08 AM

Thanks for addressing my comments @david-arm, the changes look good to me.

llvm/lib/Analysis/IVDescriptors.cpp
941

Can this break be removed since we're returning above it?

This revision is now accepted and ready to land.Sep 30 2021, 7:08 AM
This revision was landed with ongoing or failed builds.Oct 1 2021, 12:41 AM
This revision was automatically updated to reflect the committed changes.
david-arm reopened this revision.Oct 4 2021, 6:03 AM

Reopening after patch was reverted, since I think I have a fix now.

This revision is now accepted and ready to land.Oct 4 2021, 6:03 AM
david-arm updated this revision to Diff 377109.Oct 5 2021, 1:04 AM
  • Fixed a buildbot sanitiser issue that occurred after merging the patch. The issue was that the input to the select instruction came from the induction PHI, and the output fed into the reduction PHI. Therefore, there wasn't a complete circular dependency and we shouldn't have vectorised.
  • I have added the original reduction PHI as a parameter to isSelectCmpPattern so that we can ensure the input matches the reduction PHI node. This is now defended by the negative test select_i32_from_icmp_non_redux_phi
  • I also added a new test called select_i32_from_icmp_same_inputs to ensure we don't vectorise when both the icmp and the select use the same PHI node.
kmclaughlin added inline comments.Oct 7 2021, 8:59 AM
llvm/test/Transforms/LoopVectorize/select-cmp.ll
291

Hi @david-arm, I'm not sure why we wouldn't support the icmp & select using the same PHI?

david-arm updated this revision to Diff 378129.Oct 8 2021, 1:53 AM
  • Re-added support for cases where the PHI is used by both the icmp and select because there is no logical reason for not supporting this.
david-arm marked an inline comment as done.Oct 8 2021, 1:53 AM
david-arm added inline comments.
llvm/test/Transforms/LoopVectorize/select-cmp.ll
291

Well spotted @kmclaughlin! We can indeed support this.

This revision was landed with ongoing or failed builds.Oct 11 2021, 1:41 AM
This revision was automatically updated to reflect the committed changes.
david-arm marked an inline comment as done.