This is an archive of the discontinued LLVM Phabricator instance.

[LV] Initial VPlan cost modelling
AbandonedPublic

Authored by dmgreen on Oct 13 2020, 8:44 AM.

Details

Summary

This adds the initial skeleton and cost modelling needed to cost vplans. This replaces the current method of summing the cost of each instruction in the loop body.

It currently attempts to fairly precisely mimic the existing code model in order to not introduce too many regressions at once. As a result some of the decisions it makes are not optimal, notable in how predication is handled.

The basic scheme is to call cost() on VPlans, which recurses into VPBasicBlocks and into VPRecipes. Most cost() methods for individual recipes currently call CostModel->getInstructionCost, which will be refactored to call TTI hooks directly in future patches. In order to mimic the existing model a ReciprocalPredBlockProb is added to VPBasicBlock to model the old method of reducing the scalar cost for predicated blocks. This is known to be rather inaccurate, but if removed can lead to regressions. I will hopefully improve this bit somehow..

It passes all the llvm tests but can still causes differences for some code, especially around loops which were already close to the same score between vector factors. One common place I've seen is that the backedge cost was often over-estimated in the past. It will now correctly cost VPReduction recipes, which is nice but should only effect MVE. VPInstructions will follow in a subsequent patch, but may need to start including type information.

The patch adds an option, "cost-using-vplan", that can be used pick between the old method and the new. The idea is to switch to the new method and remove the old code path once any regressions are addressed.

Diff Detail

Event Timeline

dmgreen created this revision.Oct 13 2020, 8:44 AM
Herald added a project: Restricted Project. · View Herald Transcript
dmgreen requested review of this revision.Oct 13 2020, 8:44 AM
bmahjour added inline comments.Oct 13 2020, 10:32 AM
llvm/lib/Transforms/Vectorize/VPlan.h
372

The cost-model is conceptually the structure that knows all about calculating costs. Instead of packaging the cost-model and legality and send it to the VPlan, wouldn't it make more sense to send the VPlan to the cost-model as it already has access to legality?

dmgreen added inline comments.Oct 13 2020, 1:56 PM
llvm/lib/Transforms/Vectorize/VPlan.h
372

The idea, as far as I understand, is that just as the execution of a vplan is separated into the recipes that make it up, so should the cost model. All the CM.getInstructionCost(..) methods will be replaced by the code that make them up - VPInterleaveRecipe will know how to cost interleaving groups, VPReductions will know how to cost reduction, VPWidenRecipes will be able to handle widened instructions etc

The LoopVectorizationCostModel is the old monolithic way of doing things, which vplan is trying to move away from. This is only the first step of trying to clean that up..

As for this struct specifically, yeah it doesn't feel like the best. I was trying to follow VPTransformState but it doesn't contain much in it at the moment. Plus I apparently misspelt analysis.

dmgreen updated this revision to Diff 297952.Oct 13 2020, 2:01 PM

Update some spelling.

bmahjour added inline comments.Oct 15 2020, 12:21 PM
llvm/lib/Transforms/Vectorize/VPlan.h
372

Ok, if the idea is to eventually remove the LoopVectorizationCostModel in the future, this change makes sense as a transitional step.
By the way, I find it strange that we rely on LoopVectorizationLegality to get information about the cost! Looks like we only need getWidestInductionType from it, so can we put that in the context instead of the whole class?

dmgreen added inline comments.Oct 17 2020, 9:18 AM
llvm/lib/Transforms/Vectorize/VPlan.h
372

Ah Good point. Some of this code was added and removed along the way.

You are right that the Legality isn't really needed much at the moment. We may need it for some things in the future, but if we do can address those as they come up.

dmgreen updated this revision to Diff 298832.EditedOct 17 2020, 9:19 AM

Adjust VPCostContext.

fhahn added a comment.Nov 3 2020, 1:47 PM

Thank you very much for putting up the patch to get things started!

I think it would be good to make sure the first steps with the cost model won't make it harder to separate the 'decision' and the 'assign cost' steps in the current cost model, in particular moving decisions out of the legacy cost model.

I tried to get started with this by moving the initial VPlan generation earlier (D90711) and then applying/updating this patch on top of that. I still like to experiment a bit, but I think this patch is a good first step, that would allow us to make progress in 2 directions:

  1. move decisions out of the cost model
  2. add VPlan transformations that require some costing.
nadav added a comment.Nov 3 2020, 2:38 PM

Hi Dave. I have a quick question. Have you considered tuning the current cost model? What are the compile time implications of using VPlans?

simoll added a subscriber: simoll.Nov 4 2020, 2:46 AM
Kazhuu added a subscriber: Kazhuu.Mar 3 2021, 8:52 PM
a.elovikov added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.h
372

Instead of packaging the cost-model and legality and send it to the VPlan, wouldn't it make more sense to send the VPlan to the cost-model...

My +1 for putting CM code outside the VPlan itself. I prefer to see VPlan as a kind of IR in itself and hence it would make sense to keep it similar to LLVM IR CostModeling via separate class. In fact, I'd rather move ::execute methods out of VPlan as well.

Moving the cost modeling code out of VPlan classes might also make it easier to prototype/implement some advanced heuristics that would work on the basicblock/whole vplan level (i.e. some kinds of register pressure/spills/fills estimates) by just putting all the code locally and grouped by different level of details we want to account for. We might also consider cost modeling for either throughput or latency depending on the trip counts. IMO, that would also be easier to code if we'd have a separate class.

Matt added a subscriber: Matt.May 20 2021, 8:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 10:02 AM
rengolin added inline comments.Mar 8 2023, 10:46 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
195

This API seems a bit weird to me. I'd expect code generation decisions to be an entity on its own, not some pair (which would very well used std::pair typedef).

Today is the VF we're looking at, perhaps one day we'll want to look at UF costs (less branches), particular options of the plans themselves (split/reorder outer-loops in different ways), etc.

So, for now, if it's just a return value wrapper, we can do with std::pair or std::tuple and use auto [vplan, VF] = ... to extract on call.

For later, if we want to carry more info without passing them all as arguments, we should have an actual VPlanResult struct or something, with a back pointer to the VPlan and the parameter selection.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7127

So, whatever is the first VPlan with a VF we return? Is it guaranteed to be 1?

llvm/lib/Transforms/Vectorize/VPlan.h
97

Funny enough, here the use of std::pair is worse. In the code that uses it, there's a lot of:

Cost.first += C.first;
Cost.second |= C.second;

and the typedef offers no help.

Here an actual struct would be more suitable:

struct VectoriztionCostTy {
  unsigned cost;
  bool active;
}
dmgreen abandoned this revision.Mar 13 2023, 5:35 AM

This too old to be useful now and I don't have any plans to work on it in the near term. (It would be good to see improvements though, where the vplan is costed more directly as opposed continuing to go through the IR instructions).

This too old to be useful now and I don't have any plans to work on it in the near term. (It would be good to see improvements though, where the vplan is costed more directly as opposed continuing to go through the IR instructions).

I was worried the necro-bump would lead to this, but at least we have some valid reasoning here and a good seed for a future effort to resurrect this.