With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size oriented getIntrinsicCost. This involved sinking the cost logic from the TTIImpl into the base implementation, as it performs no target checks. The opcodes remaining were memcpy, cttz and ctlz, which now have special handling in the BasicTTI implementation. getInstructionThroughput can now directly return the result of getUserCost.
This has required a change in the AMDGPU backend for fabs, as the tests suggest that it should always be 'free'. I've also changed the X86 backend to return '1' for any intrinsic when the CostKind isn't RecipThroughput.
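The layering described in the summary can be sketched as follows. This is a simplified, self-contained model, not the real TargetTransformInfo API: the enum members, the function names (`baseIntrinsicCost`, `x86LikeCost`, `amdgpuLikeCost`) and the numeric costs are all hypothetical placeholders chosen for illustration.

```cpp
// Hypothetical, simplified model of layered intrinsic costing.
// None of these names or numbers are taken from LLVM itself.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };
enum class Intrinsic { DbgValue, LifetimeStart, Assume, FAbs, Ctlz, Memcpy, Sqrt };

constexpr int FreeCost = 0;      // placeholder for a 'free' operation
constexpr int BasicCost = 1;     // placeholder for a cheap operation
constexpr int ExpensiveCost = 4; // placeholder for a costly operation

// Bottom-most layer: performs no target checks, so it can only
// recognise intrinsics that never produce any code.
constexpr bool isTriviallyFree(Intrinsic ID) {
  switch (ID) {
  case Intrinsic::DbgValue:
  case Intrinsic::LifetimeStart:
  case Intrinsic::Assume:
    return true;
  default:
    return false;
  }
}

// Shared layer: free intrinsics first, then special cases, then a default.
// The cost kind is unused here; targets refine on it below.
constexpr int baseIntrinsicCost(Intrinsic ID, CostKind) {
  if (isTriviallyFree(ID))
    return FreeCost;
  if (ID == Intrinsic::Memcpy || ID == Intrinsic::Ctlz)
    return ExpensiveCost; // assumed stand-in for the special handling
  return BasicCost;
}

// X86-like target: a flat cost of 1 unless throughput is requested.
constexpr int x86LikeCost(Intrinsic ID, CostKind Kind) {
  if (Kind != CostKind::RecipThroughput)
    return BasicCost;
  return baseIntrinsicCost(ID, Kind);
}

// AMDGPU-like target: fabs is always free, everything else defers to base.
constexpr int amdgpuLikeCost(Intrinsic ID, CostKind Kind) {
  if (ID == Intrinsic::FAbs)
    return FreeCost;
  return baseIntrinsicCost(ID, Kind);
}
```

In this model a target only overrides the cases it has an opinion on and defers everything else to the shared layer, which is the shape the summary describes.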
Details
- Reviewers: SjoerdMeijer, RKSimon, dmgreen, dfukalov, rampitec
- Commits:
  - rGbd9dce8f9acd: [CostModel] getUserCost for intrinsic throughput
  - rG871556a49455: [CostModel] Unify Intrinsic Costs.
  - rG1f72d5880e33: [CostModel] Check for free intrinsics in BasicTTI
  - rGb263fee4d2c9: [CostModel] Sink intrinsic costs to base TTI.
  - rGde71def3f59d: [CostModel] Unify Intrinsic Costs.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
Inline comments on llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:
- Line 563: Previously this would have been reported from TLI.isFAbsFree, but I don't see that check getting dropped here?
- Line 563: I don't recall seeing a check like that... but it makes sense. Having the base implementation call it should work.
- Line 563: This estimate is good on average. I'm going to add tests and improve this place after your commit. LGTM
- Line 563: Fabs is always free. Eventually vectors break down into scalars that have free fabs uses.
This change has caused some large text size changes: http://llvm-compile-time-tracker.com/compare.php?from=7606a54363d3d90802977c9f5fb9046d4d780835&to=de71def3f59dc9f12f67141b5040d8e15c84d08a&stat=size-text There's a 5% increase on tramp3d-v4 and some large decreases (up to 8%) on debuginfo builds.
As the commit message tagged this as no functional change intended and there are no test changes, I'm assuming this impact wasn't intended?
Er, wow, some of those are huge. Are you able to characterise those benchmarks and what optimisations they are affected by? I generally come across inlining, vectorization and unrolling changes and those all sound plausible candidates! Is a reproducer possible?
I suspect that the debug changes may be caused by the base implementation treating the debug intrinsics as free. I'll revert this and break it down into three separate patches:
- Sink all the trivially free intrinsics into the bottom-most implementation.
- Combine getIntrinsicCost and getIntrinsicInstrCost.
- Have getInstructionThroughput use getUserCost.
First part has gone in as: rGb263fee4d2c9: [CostModel] Sink intrinsic costs to base TTI.
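The three-patch plan above can be sketched roughly as below. This is a minimal model under stated assumptions, not LLVM code: the `Inst` struct, its fields, and the functions `isFreeAtBase`, `intrinsicCost`, `userCost` and `instructionThroughput` are hypothetical stand-ins for the real hooks, and the per-instruction costs are placeholder numbers.

```cpp
// Hypothetical model of the three-step split; not the LLVM API.
enum class CostKind { RecipThroughput, CodeSize };

struct Inst {
  bool IsIntrinsic;
  bool IsTriviallyFree; // e.g. a debug or lifetime marker
  int ThroughputCost;   // placeholder number
  int SizeCost;         // placeholder number
};

// Step 1: the bottom-most layer decides what is trivially free,
// without consulting any target.
constexpr bool isFreeAtBase(const Inst &I) {
  return I.IsIntrinsic && I.IsTriviallyFree;
}

// Step 2: a single intrinsic query, parameterised by cost kind,
// replaces the separate size-oriented and throughput-oriented hooks.
constexpr int intrinsicCost(const Inst &I, CostKind Kind) {
  if (isFreeAtBase(I))
    return 0;
  return Kind == CostKind::RecipThroughput ? I.ThroughputCost : I.SizeCost;
}

// General entry point for any instruction and any cost kind.
constexpr int userCost(const Inst &I, CostKind Kind) {
  if (I.IsIntrinsic)
    return intrinsicCost(I, Kind);
  return Kind == CostKind::RecipThroughput ? I.ThroughputCost : I.SizeCost;
}

// Step 3: the throughput query becomes a thin wrapper over userCost.
constexpr int instructionThroughput(const Inst &I) {
  return userCost(I, CostKind::RecipThroughput);
}
```

One consequence visible even in this toy model: once the free check lives in the shared path, a trivially free intrinsic reports zero for every cost kind. That is the kind of effect suspected above for the debuginfo builds.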