This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Add support for tail folding using scalable vectors
ClosedPublic

Authored by david-arm on Nov 2 2021, 4:24 AM.

Details

Summary

This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount
whereby we weren't correctly generating the runtime trip count
for scalable vectors when tail-folding.

It also removes some asserts in the tail-folding path for cases when
the VF is not scalable.

In this patch I have only permitted tail-folding to be enabled
explicitly for scalable vectors when the user has specified one
of the following flags:

-prefer-predicate-over-epilogue=predicate-dont-vectorize
-prefer-predicate-over-epilogue=predicate-else-scalar-epilogue

For now it's best not to enable tail-folding with scalable vectors for
low trip counts or when optimising for code size, since there has been
no analysis on whether this is worth it.

Various tests have been added here:

Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll

The tests cannot be target independent because they require masked
load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore
need to return true.

Diff Detail

Event Timeline

david-arm created this revision.Nov 2 2021, 4:24 AM
david-arm requested review of this revision.Nov 2 2021, 4:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2021, 4:24 AM

Since we are (also) talking about profitability of the transform:

For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it.

I was expecting we would need to implement target hook preferPredicateOverEpilogue for AArch64/SVE that controls this, and add some of this logic there?

Since we are (also) talking about profitability of the transform:

For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it.

I was expecting we would need to implement target hook preferPredicateOverEpilogue for AArch64/SVE that controls this, and add some of this logic there?

Hi @SjoerdMeijer, at the moment we're not really worried about profitability since we don't intend to enable tail-folding by default anytime soon. I'm happy to look into the TTI hook (preferPredicateOverEpilogue), but I'm not sure what benefit it gives us at the moment? The main reason for supporting tail-folding for scalable vectors for AArch64 is to ensure we have the technical capability and nothing crashes with scalable vectors. In future we'd like to use predication for vector epilogues and avoid the scalar tail, but maintain an unpredicated main vector loop.

david-arm retitled this revision from [LoopVectorize] Enable tail folding for scalable vectors to [LoopVectorize] Add support for tail folding using scalable vectors.Nov 2 2021, 5:48 AM

Hi @SjoerdMeijer, I realised the title on the patch was misleading and I've changed it now to indicate we're not enabling this for AArch64, but just adding theoretical support for scalable vectors!

SjoerdMeijer added inline comments.Nov 2 2021, 6:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5761

Hi @david-arm , thanks for clarifying so far, that has helped. Moving the question about profitability to here, the place where it is probably relevant. Also note that I am asking a bunch of question just to reload to memory how things work here (and I am new to the SVE/scalable angle here).

Before we discuss the actual change, I don't think I understand this original comment + code here to be honest. The first sentence:

For scalable vectors, don't use tail folding as this is currently not yet supported.

suggest to me that we want to return CM_ScalarEpilogueAllowed in getScalarEpilogueLowering. And I don't know what the affect is of setting MaxFactors.ScalableVF to 0 below, but I guess that has somehow the affect?

To me it feels like that we do make a profitability call here "in the middle of something", which should be moved to a different place. I can also imagine that this is highly target dependent/specific, further supporting this.

I do remember though that there is a bit of an ordering problem: the decision to tail-fold or not is taken quite early, while not all information that we ideally would like to have are not yet available. Not sure that is the case here. But I guess that for now the decision "if scalable vector, then don't tail-fold" could be taken quite early. But yeah, I might be missing something here, so please enlighten me. :)

david-arm added inline comments.Nov 2 2021, 6:58 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5761

So when this code was originally added tail-folding just crashed for scalable vectors because a lot of the code related to tail-folding assumed fixed-width vectors. We've been trying to fix a lot of these issues and now we've got to the point where it is at least possible to fold the tail for most loops. However, it is still experimental for scalable vectors - scalable vectorisation has not yet been enabled by default for the vectoriser. Even after this patch lands there is still at least one known issue that we intend to fix related to tail-folding + scalable vectors. This is the reason why tail-folding is still hidden behind a flag for scalable vectors. If we start to allow tail-folding to be automatically enabled for low trip counts or when optimising for code size then we will be knowingly exposing users to these issues.

I do take your point about profitability and realise the original comment also suggested something related to profitability, but the aim of this patch is simply to add functionality that is not enabled by default. Perhaps I can change the comment to explain that we want to make this feature stable first for scalable vectors, then move on to questions of profitability?

To answer your other question about MaxFactors.ScalableVF it basically disables scalable vectorisation entirely if computeMaxVF has decided that we should use tail-folding for low trip counts or reducing code size. However, we are still be able to use fixed-width vectorisation.

SjoerdMeijer added inline comments.Nov 2 2021, 7:53 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5761

I do take your point about profitability and realise the original comment also suggested something related to profitability, but the aim of this patch is simply to add functionality that is not enabled by default. Perhaps I can change the comment to explain that we want to make this feature stable first for scalable vectors, then move on to questions of profitability?

Sound like a good approach to me.

Matt added a subscriber: Matt.Nov 2 2021, 11:05 AM
reames resigned from this revision.Nov 30 2021, 9:54 AM
david-arm updated this revision to Diff 392092.Dec 6 2021, 8:57 AM
  • Rebase, which required fixing up some tests.
david-arm updated this revision to Diff 397317.Jan 4 2022, 9:00 AM
  • Rebase, which required tweaking the tests slightly due to cost model changes.
sdesmalen accepted this revision.Jan 7 2022, 6:57 AM

Seems like quite the straightforward change, LGTM.

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
2

nit: this can be removed now that scalable vectorization is enabled by default for aarch64 when SVE is available.

This revision is now accepted and ready to land.Jan 7 2022, 6:57 AM
This revision was landed with ongoing or failed builds.Jan 10 2022, 2:55 AM
This revision was automatically updated to reflect the committed changes.