This patch fixes an issue in InnerLoopVectorizer::getOrCreateVectorTripCount
where we weren't correctly generating the runtime trip count
for scalable vectors when tail-folding.
It also removes some asserts in the tail-folding path that required
the VF to be non-scalable.
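As a rough illustration of the arithmetic involved, the sketch below rounds the scalar trip count up to a multiple of the per-iteration step. This is how a tail-folded loop covers the remainder with masked lanes instead of a scalar epilogue. The function names are hypothetical, and vscale is passed in as a plain integer. In the real vectorizer the step for a scalable VF is a run-time value (vscale * VF * UF) emitted as IR, not a C++ argument, so this is a sketch of the idea rather than the patch's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elements processed per vector loop iteration.
// For a scalable VF this depends on vscale, which is only known at run time.
uint64_t runtimeStep(uint64_t knownMinVF, uint64_t uf, bool scalable,
                     uint64_t vscale) {
  return scalable ? knownMinVF * vscale * uf : knownMinVF * uf;
}

// Hypothetical sketch of the vector trip count when tail-folding:
// round TC up to a multiple of Step so the masked vector loop covers
// every scalar iteration and no scalar epilogue is needed.
uint64_t vectorTripCountTailFolded(uint64_t tc, uint64_t step) {
  assert(step > 0 && "step must be non-zero");
  uint64_t roundedUp = tc + (step - 1);  // TC + (Step - 1)
  return roundedUp - roundedUp % step;   // largest multiple of Step <= roundedUp
}
```

For example, with TC = 10, VF = vscale x 4, UF = 1 and vscale = 2, the step is 8 and the vector trip count is 16; the second vector iteration masks off the last 6 lanes. Using the known-minimum step (4) instead of the run-time step would give a different, wrong result on such hardware, which is why the step has to be computed at run time for scalable VFs.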
In this patch, tail-folding with scalable vectors is only permitted
when the user explicitly requests it with one of the following flags:
  -prefer-predicate-over-epilogue=predicate-dont-vectorize
  -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue
For now, it's best not to enable tail-folding with scalable vectors for
loops with low trip counts or when optimising for code size, since there
has been no analysis yet of whether it is profitable.
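The policy described above can be summarised as a small decision function. This is a hypothetical sketch: the enum and function names are invented for illustration and do not correspond to LoopVectorize's actual types (the vectorizer models this through its own scalar-epilogue-lowering state). It only encodes the rule that explicit user flags enable tail-folding for scalable VFs, while the implicit reasons (optimising for size, low trip count) do not.

```cpp
// Hypothetical enum: why tail-folding is being considered for a loop.
enum class TailFoldReason {
  None,                             // no tail-folding requested
  FlagPredicateElseScalarEpilogue,  // user flag: predicate-else-scalar-epilogue
  FlagPredicateDontVectorize,       // user flag: predicate-dont-vectorize
  OptimizeForSize,                  // implicit: optimising for code size
  LowTripCount,                     // implicit: low trip count loop
};

// Hypothetical sketch: tail-folding with a scalable VF is allowed only
// when the user requested it explicitly; the implicit cases are kept
// for fixed-width VFs because profitability for scalable vectors has
// not been analysed yet.
bool allowTailFolding(TailFoldReason reason, bool scalableVF) {
  switch (reason) {
  case TailFoldReason::None:
    return false;
  case TailFoldReason::FlagPredicateElseScalarEpilogue:
  case TailFoldReason::FlagPredicateDontVectorize:
    return true;  // explicit request: allowed for fixed and scalable VFs
  case TailFoldReason::OptimizeForSize:
  case TailFoldReason::LowTripCount:
    return !scalableVF;  // not yet enabled for scalable VFs
  }
  return false;
}
```

Keeping the explicit and implicit reasons as distinct cases makes it easy to relax the scalable restriction later, once a profitability analysis exists, without touching the user-flag path.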
New tests have been added in:
  Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
  Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
The tests cannot be target-independent because they require masked
load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore
must return true.
Hi @david-arm, thanks for clarifying so far; that has helped. I'm moving the question about profitability here, since this is probably where it is relevant. Also note that I am asking a bunch of questions just to refresh my memory of how things work here (and I am new to the SVE/scalable side of things).
To be honest, before we discuss the actual change, I don't think I understand the original comment and code here. The first sentence:
suggests to me that we want to return CM_ScalarEpilogueAllowed in getScalarEpilogueLowering. I don't know what the effect of setting MaxFactors.ScalableVF to 0 below is, but I guess that somehow achieves this?
To me it feels like we are making a profitability call here "in the middle of something", and that it should be moved elsewhere. I can also imagine that this decision is highly target-specific, which further supports moving it.
I do remember, though, that there is a bit of an ordering problem: the decision whether to tail-fold is taken quite early, when some of the information we would ideally like to have is not yet available. I'm not sure that is the case here. But I guess that, for now, the decision "if scalable vector, then don't tail-fold" could be taken quite early. I might be missing something here, though, so please enlighten me. :)