If we use tail-folding for reverse loops that contain loads
or stores, we need to reverse the loop predicate.
This patch adds a new 'reverse' sve-tail-folding option and
ensures that such loops are not considered 'simple'.
I did this by adding a function called
containsDecreasingPointers to AArch64TargetTransformInfo.cpp
that searches all instructions in the loop for loads or
stores with negative strides.
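
To illustrate why the predicate has to be reversed, here is a minimal
standalone sketch that models a tail-folded reverse loop in scalar C++.
It is not code from the patch: the fixed vector length VL, the Mask type
and the helpers makeLoopPredicate, reversePredicate and reverseAddOne
are all hypothetical names invented for this example. The loop predicate
activates the first lanes in iteration order, but a contiguous access
for a negative stride starts at the lowest address, so in memory order
the active lanes are the last ones.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical fixed vector length, chosen only for this illustration.
constexpr std::size_t VL = 4;
using Mask = std::array<bool, VL>;

// Models a whilelo-style loop predicate: lane L is active if L < Remaining.
Mask makeLoopPredicate(std::size_t Remaining) {
  Mask M{};
  for (std::size_t L = 0; L < VL; ++L)
    M[L] = L < Remaining;
  return M;
}

// Models reversing the predicate so it matches memory order.
Mask reversePredicate(const Mask &M) {
  Mask R{};
  for (std::size_t L = 0; L < VL; ++L)
    R[L] = M[VL - 1 - L];
  return R;
}

// Computes Dst[i] = Src[i] + 1 for i = N-1 down to 0, VL elements per
// iteration, with the tail folded into the loop via a predicate.
std::vector<int> reverseAddOne(const std::vector<int> &Src) {
  const std::size_t N = Src.size();
  std::vector<int> Dst(N, 0);
  for (std::size_t Done = 0; Done < N; Done += VL) {
    Mask LoopPred = makeLoopPredicate(N - Done);
    // The contiguous access begins at the lowest address of the block,
    // so the memory predicate is the loop predicate reversed.
    Mask MemPred = reversePredicate(LoopPred);
    // Base may be negative in the final iteration; inactive lanes cover
    // exactly those out-of-bounds positions.
    std::ptrdiff_t Base = static_cast<std::ptrdiff_t>(N - 1 - Done) -
                          static_cast<std::ptrdiff_t>(VL - 1);
    for (std::size_t L = 0; L < VL; ++L) {
      if (!MemPred[L])
        continue;
      std::size_t Idx =
          static_cast<std::size_t>(Base + static_cast<std::ptrdiff_t>(L));
      Dst[Idx] = Src[Idx] + 1;
    }
  }
  return Dst;
}
```

With six elements and VL = 4, the second iteration has two elements left
and a base index of -2. Using the unreversed predicate there would
enable lanes 0 and 1, which fall outside the array.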
If the number of factors needed to determine the predication strategy is going to increase, perhaps it's worth creating a descriptor class, much like IntrinsicCostAttributes, to keep the interface churn down.
What do people think?
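
As a rough sketch of what such a descriptor could look like, something
along the following lines might work. This is a standalone illustration,
not a proposal for the exact interface: LoopPredicationInfo, FoldingOpts,
requiredFolding and preferPredicatedLoop are hypothetical names, and the
only factor modelled is the decreasing-pointer check from this patch.

```cpp
#include <cstdint>

// Hypothetical bitmask of enabled tail-folding options.
enum FoldingOpts : std::uint8_t {
  FoldDisabled = 0,
  FoldSimple = 1 << 0,
  FoldReverse = 1 << 1,
};

// Hypothetical descriptor bundling the factors that feed the predication
// decision. New factors become new fields, so the signature of the hook
// that consumes the descriptor does not need to change.
struct LoopPredicationInfo {
  bool HasDecreasingPointers = false;
};

// Returns the option a loop needs: loops with decreasing pointers are
// not 'simple' and need the 'reverse' option instead.
inline std::uint8_t requiredFolding(const LoopPredicationInfo &Info) {
  return Info.HasDecreasingPointers ? FoldReverse : FoldSimple;
}

// Tail-fold only if every required option is enabled.
inline bool preferPredicatedLoop(const LoopPredicationInfo &Info,
                                 std::uint8_t Enabled) {
  const std::uint8_t Required = requiredFolding(Info);
  return (Enabled & Required) == Required;
}
```

Callers would fill in the descriptor once and pass it by const
reference, so adding a factor later touches the struct and the
target-specific logic but not every call site.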