This is an archive of the discontinued LLVM Phabricator instance.

[SVE][LoopVectorize] Add option to disable tail-folding for reverse loops
ClosedPublic

Authored by david-arm on Mar 15 2023, 4:45 AM.

Details

Summary

If we use tail-folding for reverse loops that contain loads
and stores then we will need to reverse the loop predicate.
This patch adds a new 'reverse' sve-tail-folding option and
ensures they are not considered 'simple'.

I did this by adding a function called
containsDecreasingPointers to AArch64TargetTransformInfo.cpp
that searches all instructions in the loop for loads or
stores with negative strides.

Diff Detail

Event Timeline

david-arm created this revision.Mar 15 2023, 4:45 AM
Herald added a project: Restricted Project. · View Herald Transcript
david-arm requested review of this revision.Mar 15 2023, 4:45 AM
paulwalker-arm added inline comments.Mar 15 2023, 9:57 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
544

If the number of factors needed to determine the predication strategy is going to increase, perhaps it's worth creating a descriptor class, much like IntrinsicCostAttributes, to keep the interface churn down.

What do people think?

LoopVectorizationLegality::containsDecreasingPointers seems to loop over all the instructions and call isConsecutivePtr, which just calls getPtrStride. Could that logic just be placed in AArch64TTIImpl::preferPredicateOverEpilogue? That is how it has worked in ARMTTIImpl::preferPredicateOverEpilogue via canTailPredicateLoop. Otherwise the code in containsDecreasingPointers is ran for any architecture, but only used by AArch64.

LoopVectorizationLegality::containsDecreasingPointers seems to loop over all the instructions and call isConsecutivePtr, which just calls getPtrStride. Could that logic just be placed in AArch64TTIImpl::preferPredicateOverEpilogue? That is how it has worked in ARMTTIImpl::preferPredicateOverEpilogue via canTailPredicateLoop. Otherwise the code in containsDecreasingPointers is ran for any architecture, but only used by AArch64.

That's a good point @dmgreen! In fact, that's what I originally tried doing until I realised that isConsecutivePtr is a non-inline member function of LoopVectorizationLegality. Calling that function leads to build errors because that introduces a dependency on libLLVMVectorize.so. So then I had a few choices:

  1. Make the AArch64 backend library depend upon libLLVMVectorize.so (which felt ugly),
  2. Duplicate all of the code from isConsecutivePtr into AArch64TargetTransformInfo.cpp (also feels ugly and potentially misses bug fixes), or
  3. Let the vectoriser do the discovery work and pass in a flag.

Unless you can think of a better way I've missed?

Hello - yeah I was thinking of 2, as isConsecutivePtr is fairly simple. The Arm backend uses getPtrStride and is more careful about the strides it allows. LD2/LD4 are ruled out, but other strides are allowed to use the gather/scatters for example. It will miss the SymbolicStrides and "Assume" checks though.

I have no strong opinion - it was just a suggestion about how it has worked in other architectures. I'll leave it to you what you think is best.

Matt added a subscriber: Matt.Mar 17 2023, 11:09 AM
david-arm updated this revision to Diff 506933.Mar 21 2023, 6:04 AM
david-arm edited the summary of this revision. (Show Details)
  • Moved containsDecreasingPointers to AArch64TargetTransformInfo.cpp
david-arm marked an inline comment as done.Mar 21 2023, 6:05 AM

LoopVectorizationLegality::containsDecreasingPointers seems to loop over all the instructions and call isConsecutivePtr, which just calls getPtrStride. Could that logic just be placed in AArch64TTIImpl::preferPredicateOverEpilogue? That is how it has worked in ARMTTIImpl::preferPredicateOverEpilogue via canTailPredicateLoop. Otherwise the code in containsDecreasingPointers is ran for any architecture, but only used by AArch64.

Done!

sdesmalen added inline comments.Mar 24 2023, 5:26 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
3399–3402

nit: you could do:

if (getPtrStride(...).value_or(0) == -1)

directly?

3402

Does it also need to return true for Strides < -1?

david-arm updated this revision to Diff 508070.Mar 24 2023, 6:26 AM
  • Addressed review comments
david-arm marked 2 inline comments as done.Mar 24 2023, 6:27 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
3402

In practice if abs(Stride) > 1 we will always version the loop and only enter the vectorised loop if abs(Stride)==1, but checking for all values < 0 doesn't do any harm.

sdesmalen accepted this revision.Mar 24 2023, 6:33 AM
sdesmalen added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
3402

In practice if abs(Stride) > 1 we will always version the loop and only enter the vectorised loop if abs(Stride)==1

I thought that's only the case when the stride of the induction variable is unknown. It's still possible to use LD2/3/4 for known strides > 1 or vectorize a loop using gathers.

This revision is now accepted and ready to land.Mar 24 2023, 6:33 AM

LoopVectorizationLegality::containsDecreasingPointers seems to loop over all the instructions and call isConsecutivePtr, which just calls getPtrStride. Could that logic just be placed in AArch64TTIImpl::preferPredicateOverEpilogue? That is how it has worked in ARMTTIImpl::preferPredicateOverEpilogue via canTailPredicateLoop. Otherwise the code in containsDecreasingPointers is ran for any architecture, but only used by AArch64.

Done!

Thanks

dmgreen added inline comments.Mar 24 2023, 11:42 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
3402

I think it will use gathers for <= -2. At least that is what it does for MVE, it may depend on the costs.

david-arm marked 3 inline comments as done.Mar 27 2023, 1:07 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
3402

Yeah, you're both right @dmgreen and @sdesmalen, for some reason I was getting mixed up with unknown strides. :confused.

david-arm updated this revision to Diff 508611.Mar 27 2023, 5:52 AM
david-arm marked an inline comment as done.
  • Changed patch to remove dependence on D146127
This revision was landed with ongoing or failed builds.Mar 27 2023, 7:10 AM
This revision was automatically updated to reflect the committed changes.