This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.
Depends on D111555
Paths
| Differential D111630
[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic ClosedPublic Authored by RosieSumpter on Oct 12 2021, 4:25 AM.
Details Summary This patch updates the cost model for ordered reductions so that a call Depends on D111555
Diff Detail Event Timeline
Comment Actions Created getFMulAddReductionCost. This means the code which calculates the total cost of the fmuladd is moved out of the vectorizer, but avoids changing the interface for getArithmeticReductionCost.
Comment Actions
paulwalker-arm added inline comments.
Comment Actions Thanks for the extra info @RosieSumpter. As I see it @fhahn was just asking a question of which I personally would have answered no, presenting the same rational as my previous comment. I guess we'll have to wait for others to respond.
Comment Actions
Comment Actions Thanks for the comments @paulwalker-arm and @david-arm. I've moved the fmul cost calculation back to the vectorizer since this seems like the more favourable option.
RosieSumpter marked an inline comment as done. Comment Actions
RosieSumpter added a parent revision: D111555: [LoopVectorize] Add vector reduction support for fmuladd intrinsic. Comment Actions This cost calculation seems correct to me. For in-loop reductions it calculates the cost as a single fmul + fadd *reduction*. If this is not an in-loop reduction, or if fmuladd is not used in a reduction, it will follow the regular code-path to get the cost of this intrinsic (either as an FMA or separate fmul+fadd). So, LGTM! (Please address the minor nits before you commit)
This revision is now accepted and ready to land.Oct 29 2021, 7:34 AM This revision was landed with ongoing or failed builds.Nov 24 2021, 1:00 AM Closed by commit rGdf32a39dd0f6: [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic (authored by RosieSumpter). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 379740 llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/lib/Analysis/TargetTransformInfo.cpp
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
|
Do we need a new TTI interface for this? To my mind the costing side of TTI exists to cost real entities and in this instance the IR has no discrete concept for an FMA reduction.
Instead what we have is LoopVectorize pretending such a concept exists that is knows ahead of time it will simulate with separate fmul and ordered_fadd_reduce operations. For this reason I think it would be better to explicitly cost that exact idiom within the same code that is using it.
So I guess I'm saying that if you move these three lines into LoopVectorize.cpp the patch can be much smaller and you're not creating a new interface for something that doesn't really exist.