When the loop vectoriser encounters a loop with a known low trip
count, it tries to create a single predicated (tail-folded) loop so
that it gets the benefit of vectorisation without needing a scalar
tail. Until now, however, the vectoriser refused to use scalable
vectors in this case because of past concerns about stability. I
believe that tail-folded loops using scalable vectors are now
sufficiently well tested that we can enable this. For the same
reason I've also enabled it when optimising for code size.
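
To make the idea concrete, here is a minimal scalar model of what a tail-folded loop does. It is only a sketch and not the code the vectoriser emits: the function name addTailFolded and the parameter vf (standing in for the runtime vector length, for example vscale times some minimum lane count) are hypothetical, and the inner lane loop stands in for a single predicated vector operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a tail-folded loop computing dst[i] = a[i] + b[i].
// `vf` stands in for the runtime vector length of a scalable vector.
// Returns the number of "vector" iterations that were executed.
std::size_t addTailFolded(const std::vector<int> &a, const std::vector<int> &b,
                          std::vector<int> &dst, std::size_t vf) {
  assert(vf > 0 && a.size() == b.size() && a.size() == dst.size());
  const std::size_t n = a.size();
  std::size_t iterations = 0;
  for (std::size_t base = 0; base < n; base += vf, ++iterations) {
    // Each pass over the lanes models one predicated vector operation.
    for (std::size_t lane = 0; lane < vf; ++lane) {
      // Lane mask: lanes at or beyond the trip count are inactive, so
      // no separate scalar tail loop is needed.
      const bool active = base + lane < n;
      if (active)
        dst[base + lane] = a[base + lane] + b[base + lane];
    }
  }
  return iterations;
}
```

With a trip count of 5, the model needs two iterations when vf is 4 and a single iteration when vf is 8. In both cases the mask disables the lanes past the end, so every element is handled inside the vector loop.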
Tests added here:
  Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll
  Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
  Transforms/LoopVectorize/RISCV/low-trip-count.ll
I think we can remove the condition entirely, so that we also consider using scalable vectors with tail folding when optimising for code size.
I know that for SVE we'll want to improve code quality to avoid the redundant compare, but when optimising for code size the user has already decided that code size matters more than performance. The cost model will still have a say in which option is most beneficial (scalar, fixed-width or scalable), and it may still choose a fixed-width VF if the ScalableVF is not legal for the loop.