Details

Reviewers

scott.linder
arsenm

Commits

rGfbbe5230f434: [AMDGPU] Use InliningThresholdMultiplier for inline hint
rL362239: [AMDGPU] Use InliningThresholdMultiplier for inline hint

Summary

AMDGPU uses a multiplier of 9 for the inline cost threshold. It is taken
into account everywhere except for the inline hint threshold. As a result,
we penalize functions with the inline hint, making them less likely to be
inlined than those without it. The defaults are 225 for a normal function
and 325 for a function with an inline hint, so the effective threshold is
currently 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. This patch fixes that.
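
The arithmetic in the summary can be sketched as a small compile-time model. This is only an illustration, not the patch itself: the constant and function names below are hypothetical and are not LLVM's API, and the "after" value assumes the fix simply applies the same multiplier of 9 to the hint threshold, as the patch title suggests.

```cpp
// Illustrative model of the thresholds described in the summary.
// All names here are hypothetical; none of them are LLVM identifiers.
constexpr int kDefaultThreshold = 225;    // default for a normal function
constexpr int kInlineHintThreshold = 325; // default for a function with an inline hint
constexpr int kThresholdMultiplier = 9;   // AMDGPU multiplier quoted in the summary

// Before the patch: the multiplier is applied only to the normal threshold.
constexpr int thresholdBeforePatch(bool HasInlineHint) {
  return HasInlineHint ? kInlineHintThreshold
                       : kDefaultThreshold * kThresholdMultiplier;
}

// After the patch (assumed behavior): the multiplier is applied in both cases.
constexpr int thresholdAfterPatch(bool HasInlineHint) {
  return (HasInlineHint ? kInlineHintThreshold : kDefaultThreshold) *
         kThresholdMultiplier;
}

// Before: a hinted function gets a lower threshold than an unhinted one.
static_assert(thresholdBeforePatch(false) == 2025, "225 * 9");
static_assert(thresholdBeforePatch(true) == 325, "hint threshold left unscaled");

// After: the hint raises the threshold again, as intended.
static_assert(thresholdAfterPatch(false) == 2025, "unchanged for normal functions");
static_assert(thresholdAfterPatch(true) == 2925, "325 * 9");
```

Under these assumptions the hinted threshold moves from 325 to 2925, which puts it back above the 2025 used for functions without the hint.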

Diff Detail

Repository: rL LLVM

Event Timeline

rampitec created this revision. · May 30 2019, 3:27 PM

Herald added subscribers: eraman, t-tye, tpr and 7 others. · May 30 2019, 3:27 PM

Needs test. I'm also not sure I follow how getInliningThresholdMultiplier is used. It looks like it's applied after any of the thresholds are computed in CallAnalyzer::updateThreshold?

Also why do we actually override getInlineThreshold? The alloca object check seems like it shouldn't really be fundamentally different than the SROA checks the default InlineCost does

In D62707#1524219, @arsenm wrote:

Needs test. I'm also not sure I follow how getInliningThresholdMultiplier is used. It looks like it's applied after any of the thresholds are computed in CallAnalyzer::updateThreshold?

Also why do we actually override getInlineThreshold? The alloca object check seems like it shouldn't really be fundamentally different than the SROA checks the default InlineCost does

I can add a test, but it will essentially be quite a large function.
Yes, the multiplier is applied in CallAnalyzer::updateThreshold(), which applies to the normal threshold. But here we handle the threshold ourselves, so we need to update it as well.
The primary reason to override getInlineCost() is to handle alloca arguments, but the override also handles wrapper calls and used to limit inlining based on the number of blocks.

In D62707#1524236, @rampitec wrote:

I can add a test, but it will essentially be quite a large function.