We currently have two ways to steer the vectoriser towards creating a predicated vector body instead of a scalar epilogue: 1) a command line option and 2) a pragma, both of which force the decision. This patch adds a third: a target hook in TargetTransformInfo that can be queried to ask whether predication is preferred, which allows the vectoriser to make the decision itself (without it being forced).
I did the initial TTI plumbing for this, added uses of the new hook in the vectoriser where it should be queried, and added the beginning of an ARM MVE implementation. The MVE part isn't complete yet, so this patch currently behaves as a non-functional change, but it demonstrates the required function interfaces. For example, for MVE we would like the vectoriser to do tail-folding when we know we will be generating a hardware loop, and those checks are implemented in the ARM-specific hook. I will follow up on this soon, but that will be an entirely ARM-specific patch.
What about MVE? We also need to have masked loads/stores enabled.
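To make the intended decision flow concrete, here is a minimal, self-contained sketch. It does not use the real LLVM classes: every name in it (`Hint`, `LoopFacts`, `TargetHooks`, `MVELikeHooks`, `chooseLowering`) is hypothetical, and the priority order (option, then pragma, then target hook) is an assumption made for illustration. It only shows the idea that the option and pragma force the outcome, while the hook is consulted when neither is set.

```cpp
// Hypothetical model of the decision; none of these names are LLVM APIs.
enum class EpilogueLowering { ScalarEpilogue, PredicatedBody };

// Tri-state for the two "forcing" mechanisms (command line option, pragma).
enum class Hint { Unset, Off, On };

// Facts a target might inspect; the fields are assumptions for illustration.
struct LoopFacts {
  bool WillBeHardwareLoop;
  bool MaskedLoadStoreEnabled;
};

// Stand-in for the target hook: the default answer is "no preference
// for predication", so existing targets keep their current behaviour.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool prefersPredication(const LoopFacts &) const { return false; }
};

// Stand-in for an MVE-like target: prefer tail-folding only when a
// hardware loop will be generated and masked loads/stores are enabled.
struct MVELikeHooks : TargetHooks {
  bool prefersPredication(const LoopFacts &L) const override {
    return L.WillBeHardwareLoop && L.MaskedLoadStoreEnabled;
  }
};

static EpilogueLowering fromBool(bool Predicate) {
  return Predicate ? EpilogueLowering::PredicatedBody
                   : EpilogueLowering::ScalarEpilogue;
}

// Assumed priority: option, then pragma, then the target hook.
EpilogueLowering chooseLowering(Hint Option, Hint Pragma,
                                const TargetHooks &Target,
                                const LoopFacts &L) {
  if (Option != Hint::Unset)
    return fromBool(Option == Hint::On);
  if (Pragma != Hint::Unset)
    return fromBool(Pragma == Hint::On);
  // Nothing forced: let the target express its preference.
  return fromBool(Target.prefersPredication(L));
}
```

In this sketch a target that does not override the hook always ends up with a scalar epilogue unless the option or pragma says otherwise, which matches the description of the patch as a non-functional change for now.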