This patch adds handling for the llvm.powi.* intrinsics in BasicTTIImplBase::getIntrinsicInstrCost() and improves vectorization.
Closes #53887.
Differential D128172
[SLP] Add cost model for `llvm.powi.*` intrinsics
Closed, Public. Authored by n-omer on Jun 20 2022, 2:52 AM.
Event Timeline
Comment Actions diff with context?
n-omer retitled this revision from "[SLP] Add cost models for llvm.powi.* intrinsics" to "[SLP] Add cost model for `llvm.powi.*` intrinsics". Jun 20 2022, 3:00 AM
This revision is now accepted and ready to land. Jun 21 2022, 6:55 AM
This revision was landed with ongoing or failed builds. Jun 21 2022, 7:41 AM
Closed by commit rGe6ccb57bb3f6: [SLP] Add cost model for `llvm.powi.*` intrinsics (authored by n-omer).
This revision was automatically updated to reflect the committed changes.
Herald added subscribers: pcwang-thead, frasercrmck, luismarques and 20 others. Jun 22 2022, 2:54 AM
Comment Actions @craig.topper @dmgreen Are the aarch64/riscv cost changes OK? They are in 'unsupported' sections, so I am not sure whether they have extra meaning.
Comment Actions I think it is fine so long as the backend can handle it. Can you change the test lines to this, which should still return an invalid cost:
    %powi = call <vscale x 4 x float> @llvm.powi.nxv4f32.i32(<vscale x 4 x float> %vec, i32 %extraarg)
Comment Actions Maybe copy the immediate tests below so we have coverage for both constant and variable exponent values?
Comment Actions Thanks, LGTM. Although I was wondering why the cost came out as 12, not 14, from the constant 42. It may be possible to get slightly better costs that more accurately match the code from ExpandPowI.
Comment Actions Thanks. I guess that changes all the negative constant values too. If you want a cost for negative numbers, I think it would use the abs of the value in the ActiveBits + PopCount - 2 computation, plus the div like you already have it.
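The suggestion for negative exponents can be sketched as a small standalone function. This is only an illustration of the arithmetic described in the comment, not the code from the patch: the helper names (`popCount`, `activeBits`, `estimateSignedPowiCost`), the unit costs, and the treatment of a zero exponent as free are all assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical helper: number of set bits in v.
static unsigned popCount(uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v >>= 1)
    n += v & 1u;
  return n;
}

// Hypothetical helper: index of the highest set bit plus one (0 for v == 0).
static unsigned activeBits(uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v >>= 1)
    ++n;
  return n;
}

// Illustrative unit costs; a real cost model would query the target instead.
constexpr unsigned MulCost = 1;
constexpr unsigned DivCost = 1;

// Estimated cost of expanding powi(x, exponent) for a constant exponent.
unsigned estimateSignedPowiCost(int32_t exponent) {
  if (exponent == 0)
    return 0; // assumption: x^0 folds to the constant 1.0

  // Take the absolute value in unsigned arithmetic so INT32_MIN is safe.
  uint32_t absExp = exponent < 0 ? 0u - static_cast<uint32_t>(exponent)
                                 : static_cast<uint32_t>(exponent);

  // Multiplies for |exponent|, using ActiveBits + PopCount - 2.
  unsigned cost = (activeBits(absExp) + popCount(absExp) - 2) * MulCost;

  // A negative exponent needs one extra division: 1.0 / x^|exponent|.
  if (exponent < 0)
    cost += DivCost;
  return cost;
}
```

With these unit costs, an exponent of -42 is estimated at one more than an exponent of 42, the extra unit being the division.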
Comment Actions Thanks. The ticket is still "closed", so I can't properly accept it again. But it LGTM.
This revision was landed with ongoing or failed builds. Jun 24 2022, 3:24 AM
Revision Contents
Diff 438954
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/include/llvm/CodeGen/TargetLowering.h
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
llvm/test/Analysis/CostModel/X86/powi.ll
llvm/test/Transforms/SLPVectorizer/X86/powi-regression.ll
llvm/test/Transforms/SLPVectorizer/X86/powi.ll
I think a more accurate count might be ActiveBits + PopCount - 2, at least from playing around with a few examples.
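
One way to see where ActiveBits + PopCount - 2 comes from is to count the multiplies in a plain square-and-multiply expansion and compare the result with the closed form. The sketch below does that for a positive constant exponent. It is an assumption that this matches the shape of the expansion being costed; the function names (`estimatePowiMuls`, `countPowiMuls`) are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Closed form from the comment: ActiveBits + PopCount - 2.
// ActiveBits is the index of the highest set bit plus one;
// PopCount is the number of set bits.
unsigned estimatePowiMuls(uint32_t exponent) {
  assert(exponent != 0 && "the formula is only meaningful for exponent >= 1");
  unsigned activeBits = 0, popCount = 0;
  for (uint32_t v = exponent; v != 0; v >>= 1) {
    ++activeBits;
    popCount += v & 1u;
  }
  return activeBits + popCount - 2;
}

// Counts the multiplies performed by square-and-multiply, scanning the
// exponent from the least significant bit upward.
unsigned countPowiMuls(uint32_t exponent) {
  assert(exponent != 0 && "exponent must be >= 1");
  unsigned muls = 0;
  bool haveResult = false;
  while (exponent != 0) {
    if (exponent & 1u) {
      if (haveResult)
        ++muls;          // result *= base
      haveResult = true; // first set bit: result = base, no multiply needed
    }
    exponent >>= 1;
    if (exponent != 0)
      ++muls;            // base *= base, only if higher bits remain
  }
  return muls;
}
```

The squarings contribute ActiveBits - 1 multiplies and combining the set bits contributes PopCount - 1, which sums to the closed form. For the exponent 42 (binary 101010) both functions give 6 + 3 - 2 = 7 scalar multiplies.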