This is yet another attempt at providing support for epilogue vectorization, following the discussions in the RFC at http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and in reviews D30247 and D88819.
Similar to D88819, this patch achieves epilogue vectorization by executing a single VPlan twice: once on the main loop and a second time on the epilogue loop (using a different VF). This implementation differs from D88819 in at least the following ways:
- It can generate the optimal control flow discussed in the above-mentioned RFC, shortening the path length for small trip counts (those that cause all or most of the vector code to be skipped). It also avoids redundantly generating the runtime memory and SCEV checks that guard against pointer aliasing. Please refer to the attached image illustrating the generated CFG.
- It takes a more modular approach, using the strategy design pattern and extending the InnerLoopVectorizer class.
- It can handle loops with multiple induction variables.
- It adds more debug traces.
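
For readers unfamiliar with the transform, the overall loop structure can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the code the patch generates: the VF values, the names `addArrays`, `vectorBody` and `LoopStats`, and the exact placement of the trip-count checks are all hypothetical, and the "vector" body is emulated with a scalar inner loop. The attached CFG image remains the authoritative description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vectorization factors, chosen only for illustration.
constexpr std::size_t MainVF = 8;
constexpr std::size_t EpilogueVF = 4;

// Counts how many iterations each loop executed, to make the path visible.
struct LoopStats {
  std::size_t MainIters = 0;
  std::size_t EpilogueIters = 0;
  std::size_t ScalarIters = 0;
};

// Emulates one vector iteration of width VF with a plain inner loop.
template <std::size_t VF>
void vectorBody(const int *A, const int *B, int *C, std::size_t I) {
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    C[I + Lane] = A[I + Lane] + B[I + Lane];
}

LoopStats addArrays(const int *A, const int *B, int *C, std::size_t N) {
  LoopStats Stats;
  std::size_t I = 0;

  // Assumed check order: a trip count too small even for the epilogue VF
  // bypasses all vector code and goes straight to the scalar loop.
  if (N >= EpilogueVF) {
    // Runtime memory and SCEV checks would be performed once here and
    // shared by both vector loops (omitted in this sketch).

    // Main vector loop, entered only if at least one full iteration fits.
    if (N >= MainVF) {
      for (; I + MainVF <= N; I += MainVF) {
        vectorBody<MainVF>(A, B, C, I);
        ++Stats.MainIters;
      }
    }

    // Epilogue vector loop: same body, smaller VF, continuing from I.
    for (; I + EpilogueVF <= N; I += EpilogueVF) {
      vectorBody<EpilogueVF>(A, B, C, I);
      ++Stats.EpilogueIters;
    }
  }

  // Scalar remainder loop handles whatever is left.
  for (; I < N; ++I) {
    C[I] = A[I] + B[I];
    ++Stats.ScalarIters;
  }
  return Stats;
}
```

With these assumed factors, a trip count of 23 runs two main iterations, one epilogue iteration and three scalar iterations, while a trip count of 3 executes only the scalar loop.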
The heuristic for determining when to perform the transform is overly simplistic and needs to be improved in the future. That work is not in the scope of this patch.