AMDGPU uses an inlining threshold multiplier of 9. It is taken into account
everywhere except for the inline hint threshold. As a result, we penalize
functions with the inline hint, making them less likely to be inlined
than those without the hint. The defaults are 225 for a normal function and
325 for a function with an inline hint. Currently the effective
threshold is 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. This patch fixes that.
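The arithmetic above can be sketched as follows. This is only an illustration, not the code from the patch: the constant and function names are hypothetical, and the "after" variant assumes the fix simply applies the same multiplier to the hint threshold, which the summary implies but does not spell out.

```cpp
// Values quoted in the summary.
constexpr int DefaultThreshold = 225; // normal function
constexpr int HintThreshold = 325;    // function with an inline hint
constexpr int Multiplier = 9;         // AMDGPU inlining threshold multiplier

// Hypothetical helper: effective threshold before the patch.
// The multiplier is skipped on the inline-hint path.
constexpr int effectiveThresholdBefore(bool HasInlineHint) {
  return HasInlineHint ? HintThreshold : DefaultThreshold * Multiplier;
}

// Hypothetical helper: effective threshold after the patch, assuming the
// multiplier is applied uniformly to whichever base threshold is selected.
constexpr int effectiveThresholdAfter(bool HasInlineHint) {
  return (HasInlineHint ? HintThreshold : DefaultThreshold) * Multiplier;
}

// Before: a hinted function gets a lower budget than an unhinted one.
static_assert(effectiveThresholdBefore(false) == 2025, "225 * 9");
static_assert(effectiveThresholdBefore(true) == 325, "multiplier skipped");

// After (assumed): the hint raises the budget again.
static_assert(effectiveThresholdAfter(true) == 2925, "325 * 9");
```

Under that assumption a hinted function would end up with a larger budget than an unhinted one (2925 vs 2025), which is the ordering the hint is meant to express.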
Repository: rL LLVM
Event Timeline
Needs a test. I'm also not sure I follow how getInliningThresholdMultiplier is used. It looks like it's applied after any of the thresholds are computed, in CallAnalyzer::updateThreshold?
Also, why do we actually override getInlineThreshold? The alloca object check seems like it shouldn't really be fundamentally different from the SROA checks the default InlineCost does.
I can add a test, but it will essentially be quite a large function.
Yes, the multiplier is applied in CallAnalyzer::updateThreshold(), which handles the normal threshold. But here we compute the threshold ourselves, so we need to apply the multiplier ourselves as well.
The primary reason to override getInlineCost() is to handle alloca arguments, but it also handles wrapper calls and used to limit inlining based on the number of blocks.
test/CodeGen/AMDGPU/inline-hint.ll
Line 1 (On Diff #202330): Can you remove the -O1, and also move the test to test/Transforms/Inline/AMDGPU?
test/Transforms/Inline/AMDGPU/inline-hint.ll
Line 1 (On Diff #202337): You need to replace -O1 with -inline, not just remove it.