This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.
Depends on D111555
Differential D111630
[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic RosieSumpter on Oct 12 2021, 4:25 AM. Authored by
Details This patch updates the cost model for ordered reductions so that a call Depends on D111555
Diff Detail Event Timeline
Comment Actions Created getFMulAddReductionCost. This means the code which calculates the total cost of the fmuladd is moved out of the vectorizer, but avoids changing the interface for getArithmeticReductionCost.
Comment Actions
Comment Actions Thanks for the extra info @RosieSumpter. As I see it @fhahn was just asking a question of which I personally would have answered no, presenting the same rational as my previous comment. I guess we'll have to wait for others to respond.
Comment Actions
Comment Actions Thanks for the comments @paulwalker-arm and @david-arm. I've moved the fmul cost calculation back to the vectorizer since this seems like the more favourable option.
Comment Actions
Comment Actions This cost calculation seems correct to me. For in-loop reductions it calculates the cost as a single fmul + fadd *reduction*. If this is not an in-loop reduction, or if fmuladd is not used in a reduction, it will follow the regular code-path to get the cost of this intrinsic (either as an FMA or separate fmul+fadd). So, LGTM! (Please address the minor nits before you commit)
|
Do we need a new TTI interface for this? To my mind the costing side of TTI exists to cost real entities and in this instance the IR has no discrete concept for an FMA reduction.
Instead what we have is LoopVectorize pretending such a concept exists that is knows ahead of time it will simulate with separate fmul and ordered_fadd_reduce operations. For this reason I think it would be better to explicitly cost that exact idiom within the same code that is using it.
So I guess I'm saying that if you move these three lines into LoopVectorize.cpp the patch can be much smaller and you're not creating a new interface for something that doesn't really exist.