Ticket : https://bugs.llvm.org/show_bug.cgi?id=46929
Loop Vectorize currently doesn’t support epilog loop vectorization. The idea is to vectorize the remainder scalar loop after the initial vectorization with Vector Factor (VF) less than the original loop. Once this loop gets executed, the remaining iterations (if any) will go to the original scalar loop.
- Iteration checks are performed for both the vectorized and epilog vectorized loops
- Runtime checks (alias and SCEV checks) are done (only once) for either vectorized or epilog vector loops. If it fails, the original scalar loop is executed.
This helps in executing the vector code either for loops with smaller trip count (less than the original VF) or for loops with considerable remainder iterations after the original vectorization. Currently it is enabled for VF > 16.
This change gains in one of the SPEC CPU 2017 benchmark in both AArch64 and X86 Targets. It gains around 5% in x264 in Ryzen 2700x ref run.
It's extremely hard to "draw" this diagram in text. It's even harder to read it. I think we should create a documentation section under https://llvm.org/docs/Vectorizers.html#loop-vectorizer and upload an image. The link can then be put into the comment for people to view and understand what is being generated.