This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use InliningThresholdMultiplier for inline hint
ClosedPublic

Authored by rampitec on May 30 2019, 3:27 PM.

Details

Summary

AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.May 30 2019, 3:27 PM

Needs test. I'm also not sure I follow how getInliningThresholdMultiplier is used. It looks like it's applied after any of the thresholds are computed in CallAnalyzer::updateThreshold?

Also why do we actually override getInlineThreshold? The alloca object check seems like it shouldn't really be fundamentally different than the SROA checks the default InlineCost does

Needs test. I'm also not sure I follow how getInliningThresholdMultiplier is used. It looks like it's applied after any of the thresholds are computed in CallAnalyzer::updateThreshold?

Also why do we actually override getInlineThreshold? The alloca object check seems like it shouldn't really be fundamentally different than the SROA checks the default InlineCost does

I can add test, but it essentially will be a quite large function.
Yes, multiplier is applied in the CallAnalyzer::updateThreshold(), which applies to a normal threshold. But here we handle it ourselves, so need to update.
The primary reason to override getInlineCost() is to handle alloca arguments, but it also handles wrapper calls and used to limit inlining based on the number of blocks.

Needs test. I'm also not sure I follow how getInliningThresholdMultiplier is used. It looks like it's applied after any of the thresholds are computed in CallAnalyzer::updateThreshold?

Also why do we actually override getInlineThreshold? The alloca object check seems like it shouldn't really be fundamentally different than the SROA checks the default InlineCost does

I can add test, but it essentially will be a quite large function.

You can use the flags to make it a small function

rampitec updated this revision to Diff 202330.May 30 2019, 4:24 PM

Added test.

arsenm added inline comments.May 30 2019, 4:33 PM
test/CodeGen/AMDGPU/inline-hint.ll
1 ↗(On Diff #202330)

Can you remove the -O1, and also move to test/Transforms/Inline/AMDGPU

rampitec updated this revision to Diff 202332.May 30 2019, 4:42 PM
rampitec marked an inline comment as done.

Removed -O1 and moved test.

rampitec updated this revision to Diff 202337.May 30 2019, 5:21 PM

Restored -O1. opt does not run inliner w/o it.

arsenm added inline comments.May 31 2019, 7:53 AM
test/Transforms/Inline/AMDGPU/inline-hint.ll
1 ↗(On Diff #202337)

You need to replace -O1 with -inline, not just remove it

rampitec updated this revision to Diff 202443.May 31 2019, 8:37 AM
rampitec marked an inline comment as done.

Replaced -O1 with -amdgpu-inline.

arsenm accepted this revision.May 31 2019, 8:54 AM

LGTM

This revision is now accepted and ready to land.May 31 2019, 8:54 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2019, 9:17 AM