Changes isEpilogueVectorizationProfitable to multiply the VF of the original loop by the
interleave count and compare the product to EpilogueVectorizationMinVF.
Interleaving reduces the likelihood of doing any vector work, because it raises the
minimum trip count required to enter the vector loop, in the same way as choosing a higher VF.
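
To make the trip-count argument concrete, here is a minimal sketch of the arithmetic. The helper names are hypothetical and not part of LLVM, and the model is simplified: it assumes a fixed-width VF and ignores cases where a scalar epilogue is mandatory.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of scalar iterations consumed
// by one pass through the interleaved vector body.
constexpr std::uint64_t itersPerVectorStep(unsigned VF, unsigned IC) {
  return static_cast<std::uint64_t>(VF) * IC;
}

// Hypothetical helper: iterations left over after the main vector loop, i.e.
// the work that falls to an epilogue (vector or scalar). If the trip count is
// below VF * IC, the main vector loop is skipped and everything is left over.
constexpr std::uint64_t remainderIters(std::uint64_t TripCount, unsigned VF,
                                       unsigned IC) {
  return TripCount % itersPerVectorStep(VF, IC);
}
```

Under these assumptions, a trip count of 15 leaves 7 iterations for the epilogue with VF=8, IC=1, but all 15 with VF=8, IC=2, because the main vector loop is never entered.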
This change lets epilogue vectorization be considered on equal terms for interleaved loops:
for example, VF=8, IC=2 is now treated the same way as VF=16, IC=1.
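
A rough sketch of the comparison, as I read the description above. This is not the code in the patch: the function names are made up for illustration, the threshold value of 16 is an assumption chosen to match the example, and any other conditions in the real isEpilogueVectorizationProfitable (target hooks, scalable VFs) are left out.

```cpp
// Assumed threshold for illustration only; in LLVM this role is played by
// EpilogueVectorizationMinVF.
constexpr unsigned kAssumedMinVF = 16;

// Hypothetical model of the previous behaviour implied by the description:
// only the VF of the main loop is compared against the threshold.
constexpr bool profitableVFOnly(unsigned VF, unsigned MinVF) {
  return VF >= MinVF;
}

// Hypothetical model of the proposed behaviour: the VF is multiplied by the
// interleave count before the comparison.
constexpr bool profitableVFTimesIC(unsigned VF, unsigned IC, unsigned MinVF) {
  return VF * IC >= MinVF;
}
```

With this model, VF=8, IC=2 fails the VF-only check but passes the VF * IC check, while VF=16, IC=1 passes both.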
I ran SPEC2017 on neoverse-v1 with this change and saw no overall difference in the geometric
mean, and no differences in the performance of individual benchmarks outside of noise.
This change does, however, result in about a 6% performance improvement for x264 on publicly
available SVE2 hardware.
I think it would be better to use VF=4 for the epilogue loop here, so that it uses the full vector width. As far as I can tell, the existing logic for picking the epilogue vectorization factor picks the next lowest VF, which probably needs adjusting as well.
In addition, it would be good to verify the impact with some microbenchmarks (these could be added here: https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization).