This adds the initial skeleton and cost modelling needed to cost VPlans. This replaces the current method of summing the cost of each instruction in the loop body.
It currently attempts to mimic the existing cost model fairly precisely in order not to introduce too many regressions at once. As a result, some of the decisions it makes are not optimal, notably in how predication is handled.
The basic scheme is to call cost() on VPlans, which recurses into VPBasicBlocks and into VPRecipes. Most cost() methods for individual recipes currently call CostModel->getInstructionCost; these will be refactored to call TTI hooks directly in future patches. In order to mimic the existing model, a ReciprocalPredBlockProb is added to VPBasicBlock to model the old method of reducing the scalar cost for predicated blocks. This is known to be rather inaccurate, but removing it can lead to regressions. I will hopefully improve this bit somehow.
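As a rough illustration of the recursion described above, here is a minimal standalone sketch. It does not use the real LLVM classes: Plan, BasicBlock, Recipe, FixedCostRecipe and makeExamplePlan are hypothetical stand-ins, and the rule that only the scalar (VF == 1) cost of a predicated block is divided by ReciprocalPredBlockProb is an assumption about how the old behaviour is mimicked, not a statement of what the patch does.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for VPlan, VPBasicBlock and VPRecipe; not LLVM types.
using Cost = unsigned;

struct Recipe {
  virtual ~Recipe() = default;
  // Cost of this recipe at vectorization factor VF.
  virtual Cost cost(unsigned VF) const = 0;
};

// Toy recipe with one fixed scalar cost and one fixed vector cost.
struct FixedCostRecipe : Recipe {
  Cost ScalarCost, VectorCost;
  FixedCostRecipe(Cost S, Cost V) : ScalarCost(S), VectorCost(V) {}
  Cost cost(unsigned VF) const override {
    return VF == 1 ? ScalarCost : VectorCost;
  }
};

struct BasicBlock {
  std::vector<std::unique_ptr<Recipe>> Recipes;
  // 1 for unpredicated blocks; N means "assumed to execute 1/N of the time".
  unsigned ReciprocalPredBlockProb = 1;

  Cost cost(unsigned VF) const {
    assert(ReciprocalPredBlockProb > 0 && "divisor must be non-zero");
    Cost Total = 0;
    for (const auto &R : Recipes)
      Total += R->cost(VF);
    // Assumption: only the scalar cost of a predicated block is discounted.
    if (VF == 1)
      Total /= ReciprocalPredBlockProb;
    return Total;
  }
};

struct Plan {
  std::vector<BasicBlock> Blocks;

  // The plan cost is the sum of its block costs.
  Cost cost(unsigned VF) const {
    Cost Total = 0;
    for (const BasicBlock &BB : Blocks)
      Total += BB.cost(VF);
    return Total;
  }
};

// Builds a two-block plan: an unpredicated body and a predicated block.
Plan makeExamplePlan() {
  BasicBlock Body;
  Body.Recipes.push_back(std::make_unique<FixedCostRecipe>(1, 2));
  Body.Recipes.push_back(std::make_unique<FixedCostRecipe>(4, 6));

  BasicBlock Pred;
  Pred.ReciprocalPredBlockProb = 2;
  Pred.Recipes.push_back(std::make_unique<FixedCostRecipe>(4, 10));

  Plan P;
  P.Blocks.push_back(std::move(Body));
  P.Blocks.push_back(std::move(Pred));
  return P;
}
```

With these made-up numbers, makeExamplePlan().cost(1) sums 5 for the body plus 4 / 2 for the predicated block, while cost(4) adds the vector costs without any discount.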
It passes all the LLVM tests but can still cause differences for some code, especially around loops whose scores were already close between vector factors. One common cause I've seen is that the backedge cost was often over-estimated in the past. It will now correctly cost VPReduction recipes, which is nice but should only affect MVE. VPInstructions will follow in a subsequent patch, but may need to start including type information.
The patch adds an option, "cost-using-vplan", that can be used to pick between the old method and the new. The idea is to switch to the new method and remove the old code path once any regressions are addressed.
This API seems a bit weird to me. I'd expect code generation decisions to be an entity of their own, not some pair (which could very well use a std::pair typedef).
Today it is the VF we're looking at; perhaps one day we'll want to look at UF costs (fewer branches), particular options of the plans themselves (splitting/reordering outer loops in different ways), etc.
So, for now, if it's just a return value wrapper, we can do with std::pair or std::tuple and use auto [vplan, VF] = ... to extract on call.
For later, if we want to carry more info without passing it all as arguments, we should have an actual VPlanResult struct or something, with a back pointer to the VPlan and the parameter selection.
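To make the two options concrete, here is a small sketch of what I mean. The VPlan struct below is a placeholder rather than the real class, and selectAsPair, makeResult and the UF field are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <utility>

// Placeholder for the real VPlan class; only here to make the sketch compile.
struct VPlan {
  int Id;
};

// Option 1: a plain return-value wrapper, unpacked with structured bindings.
std::pair<VPlan *, unsigned> selectAsPair(VPlan &Plan, unsigned VF) {
  assert(VF >= 1 && "VF must be at least 1");
  return {&Plan, VF};
}

// Option 2: a dedicated result type that can grow more fields later.
struct VPlanResult {
  VPlan *Plan = nullptr; // back pointer to the chosen plan
  unsigned VF = 1;       // selected vectorization factor
  unsigned UF = 1;       // hypothetical: unroll factor, if it is costed one day
};

VPlanResult makeResult(VPlan &Plan, unsigned VF, unsigned UF = 1) {
  assert(VF >= 1 && UF >= 1 && "factors must be at least 1");
  VPlanResult R;
  R.Plan = &Plan;
  R.VF = VF;
  R.UF = UF;
  return R;
}
```

The pair version is used as auto [Plan, VF] = selectAsPair(P, 4); and needs C++17. The struct version lets callers keep compiling unchanged when a new field is added, which is the main reason I'd prefer it once more than the VF is carried around.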