This patch follows our RFC[1] and presentation at the Dev Meeting[2]. Namely, it starts to address the proposal stated there:
Proposal: introduce the Vectorization Plan as an explicit model of a vectorization candidate and update the overall flow
It does so according to the first step expressed there:
The first patches we're working on are designed to have the innermost Loop Vectorizer explicitly model the control flow of its vectorized loop.
This implementation is designed to show key aspects of the VPlan model, demonstrating how it can capture precisely *all* vectorization decisions taken by the current Loop Vectorizer inside a to-be-vectorized loop, and carry them out. It is therefore practically an NFC patch, with the slight deviations listed below. The VPlan model implemented strives to be compact, addressing compile-time concerns. More technical details are documented in the attached .rst file. The patch can be broken down into several hunks for incremental landing; a tentative break-down is provided below.
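For illustration only, here is a minimal standalone sketch of the idea (the names Plan, PlanBlock, Recipe and WidenRecipe are hypothetical and simplified; they are not the classes or interfaces introduced by this patch): a plan models the control flow of the to-be-vectorized loop as a hierarchy of blocks, each block holds an ordered list of recipes recording the vectorization decisions, and executing the plan walks the blocks and has each recipe emit its widened instructions for the chosen vectorization factor.

  #include <iostream>
  #include <memory>
  #include <string>
  #include <vector>

  // Hypothetical sketch: a recipe describes how to widen one or more
  // scalar instructions when the plan is executed.
  struct Recipe {
    virtual ~Recipe() = default;
    virtual void execute(unsigned VF) const = 0; // emit IR for width VF
  };

  struct WidenRecipe : Recipe {
    std::string Opcode;
    explicit WidenRecipe(std::string Op) : Opcode(std::move(Op)) {}
    void execute(unsigned VF) const override {
      std::cout << "  emit <" << VF << " x " << Opcode << ">\n";
    }
  };

  // A plan block owns an ordered list of recipes; the plan models the
  // control flow of the vectorized loop as a sequence of such blocks.
  struct PlanBlock {
    std::string Name;
    std::vector<std::unique_ptr<Recipe>> Recipes;
  };

  struct Plan {
    std::vector<PlanBlock> Blocks;
    void execute(unsigned VF) const {
      for (const PlanBlock &B : Blocks) {
        std::cout << B.Name << ":\n";
        for (const auto &R : B.Recipes)
          R->execute(VF);
      }
    }
  };

  int main() {
    Plan P;
    PlanBlock Body{"vector.body", {}};
    Body.Recipes.push_back(std::make_unique<WidenRecipe>("load"));
    Body.Recipes.push_back(std::make_unique<WidenRecipe>("add"));
    Body.Recipes.push_back(std::make_unique<WidenRecipe>("store"));
    P.Blocks.push_back(std::move(Body));
    P.execute(/*VF=*/4); // execute the plan for vectorization factor 4
  }

The sketch only captures the structure of the model (decide first, record the decisions, then emit); the real plan of course records the actual decisions of the Loop Vectorizer and emits LLVM IR.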
Thanks to the Intel vectorization team for this joint effort,
Gil and Ayal.
Deviations from current functionality:
- Debug printout of “LV: Scalarizing [and predicating]: <inst>” – VPlan carries out these decisions before Cost-Model’s printouts, unlike current behavior.
- Extracts are now placed at the start of their users' basic-block rather than immediately before the first user; the difference in ordering should be insignificant, subject to scheduling.
- Redundant basic-blocks/phi's may be generated; these are insignificant, subject to subsequent clean-up.
Tentative break-down; some tasks refactor or fix the current LV, others introduce parts of VPlan:
- refactor Cost-Model to provide MaxVF and early-exit methods.
- refactor ILV to provide vectorizeInstruction, getScalarValue, getVectorValue, widenIntInduction, buildScalarSteps, PHIsToFix/fixCrossIterationPHIs, and possibly additional methods (see the value-reuse sketch after this list).
- fix the Unroller's getScalarValue() to reuse ILV's refactored getScalarValue(Part, Lane), which also sets metadata. Will simplify this patch.
- unify the GEP reuse behavior between a vectorized wide load/store and the wide load/store of an interleave group. Will simplify this patch.
- have LV avoid creating redundant basic-blocks. Will help this patch be fully NFC.
- have LV cache basic-block masks and reuse them. Will help this patch be fully NFC.
- build initial VPlans and print them for debugging.
- convert ILV.vectorize() to use LVP.executeBestPlan(), keeping sinkScalarOperands() as a non-VPlan post-processing method.
- optimize VPlans by introducing sinkScalarOperands() and print them for debugging.
- use VPlan's sinkScalarOperands() instead of the non-VPlan version.
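As a rough illustration of the per-part/per-lane value reuse mentioned above for getVectorValue/getScalarValue (again a standalone sketch; ValueMap and its string-based Value stand-in are hypothetical and not the refactored ILV interfaces), the idea is to cache the widened value of each original scalar per unroll part, and to extract individual lanes from that cached value on demand instead of recomputing it:

  #include <iostream>
  #include <map>
  #include <string>
  #include <utility>

  // Stand-in for an IR value: here just a printable name.
  using Value = std::string;

  class ValueMap {
    // Cache of widened values, keyed by (original value, unroll part).
    std::map<std::pair<Value, unsigned>, Value> VectorParts;

  public:
    // Return the widened value of V for unroll part Part, creating
    // (e.g. broadcasting) and caching it on first use.
    Value getVectorValue(const Value &V, unsigned Part) {
      auto Key = std::make_pair(V, Part);
      auto It = VectorParts.find(Key);
      if (It != VectorParts.end())
        return It->second; // reuse the cached widened value
      Value Wide = "broadcast(" + V + ", part=" + std::to_string(Part) + ")";
      VectorParts[Key] = Wide;
      return Wide;
    }

    // Return lane Lane of the widened value for (V, Part); the extract is
    // taken from the cached vector value rather than a recomputed one.
    Value getScalarValue(const Value &V, unsigned Part, unsigned Lane) {
      return "extractelement(" + getVectorValue(V, Part) + ", " +
             std::to_string(Lane) + ")";
    }
  };

  int main() {
    ValueMap VM;
    std::cout << VM.getVectorValue("%x", 0) << "\n";
    std::cout << VM.getScalarValue("%x", 0, 3) << "\n"; // reuses cached part 0
  }

In the actual vectorizer the cached entries are LLVM IR Values and the broadcasts/extracts are real instructions; the sketch only captures the caching structure.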
[1] RFC
[2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop auto-vectorization, 2016 LLVM Developers' Meeting
As @chandlerc pointed out to me, this isn't actually NFC. It's expected not to change the output of the vectorizer, but it's a huge change in how the vectorizer *functions*.