This is an archive of the discontinued LLVM Phabricator instance.


Differential D98362

[AMDGPU] Fix -amdgpu-inline-arg-alloca-cost
Closed, Public

Authored by rampitec on Mar 10 2021, 10:21 AM.

Details

Reviewers

arsenm
dfukalov

Commits

rGb7b99b0799fa: [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost

Summary

Before D94153 this threshold was in pre-scaled units.
After D94153 the inlining threshold multiplier is no longer
applied to this portion of the threshold. Restore the
threshold by applying the multiplier.
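To make the summary concrete, here is a minimal sketch of the arithmetic, assuming the generic inliner scales the base threshold by the target multiplier and then adds the target's adjustment. `TargetModel`, `finalThreshold`, and the numbers 225, 11 and 100 are hypothetical and chosen only for illustration; they are not LLVM's actual API or defaults.

```cpp
// Illustrative model only: every name and number here is hypothetical.
struct TargetModel {
  int Multiplier;  // stands in for the target's inlining threshold multiplier
  int AllocaBonus; // stands in for the -amdgpu-inline-arg-alloca-cost value
};

// Assumed generic step: scale the base threshold, then add the
// target-specific adjustment without scaling it.
constexpr int finalThreshold(int Base, int Multiplier, int Adjustment) {
  return Base * Multiplier + Adjustment;
}

// Adjustment as the summary describes the regression: the bonus is unscaled.
constexpr int unscaledAdjustment(const TargetModel &T) {
  return T.AllocaBonus;
}

// Adjustment as the summary describes the fix: the target applies the
// multiplier to its own bonus.
constexpr int scaledAdjustment(const TargetModel &T) {
  return T.AllocaBonus * T.Multiplier;
}

constexpr TargetModel Example{11, 100};

// 225 * 11 + 100: the bonus is a small fraction of the scaled base.
static_assert(finalThreshold(225, Example.Multiplier,
                             unscaledAdjustment(Example)) == 2575,
              "bonus left unscaled");
// 225 * 11 + 100 * 11: the bonus is back in the same units as the base.
static_assert(finalThreshold(225, Example.Multiplier,
                             scaledAdjustment(Example)) == 3575,
              "bonus scaled by the target");
```

In this model the unscaled bonus loses most of its weight relative to the scaled base threshold, which is the effect the summary sets out to undo.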

Event Timeline

rampitec created this revision. Mar 10 2021, 10:21 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others. Mar 10 2021, 10:21 AM

rampitec requested review of this revision. Mar 10 2021, 10:21 AM

Herald added a project: Restricted Project. Mar 10 2021, 10:21 AM

Herald added a subscriber: wdng.

arsenm added inline comments. Mar 10 2021, 11:55 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192	Should this just adjust for the scale then?

rampitec added inline comments. Mar 10 2021, 11:57 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192	I thought about this, but the next time we adjust the scale we would have to revisit it.

Harbormaster completed remote builds in B93126: Diff 329708. Mar 10 2021, 9:58 PM

Do you have any test for the fix?

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192	Nit: instead of this modification, it seems you could just swap two lines in InlineCost.cpp, line 1582 `Threshold *= TTI.getInliningThresholdMultiplier();` and line 1583 `Threshold += TTI.adjustInliningThreshold(&Call);`, so that `getInliningThresholdMultiplier()` is still used in only one place.

In D98362#2619375, @dfukalov wrote:

Do you have any test for the fix?

These tests tend to be either unreliable or huge. We cannot measure performance with lit tests.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192	That would change behavior for all targets.

arsenm added inline comments. Mar 11 2021, 6:07 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192	I thought the point of the multiplier was just to amplify the expense of calls. I don't understand scaling up the cost here.

rampitec added inline comments. Mar 11 2021, 6:10 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192	It's more like a uniform target cost multiplier that should be applied to everything. But Daniil is probably correct, and I should swap the two lines in the inliner instead, while no one else uses it as is.

Swapped the order of operations in InlineCost itself instead. The adjustment is still unused by any other target, so it is possible to fix it there, which seems to be the more correct way.
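The effect of swapping the two statements can be sketched as follows. This is a simplified model under stated assumptions, not the code from InlineCost.cpp: `ToyTTI` is a hypothetical stand-in for the target hooks named in the review, its `adjustInliningThreshold()` takes no call argument, and the values are arbitrary. It needs C++14 or later because the `constexpr` functions modify a local.

```cpp
// Hypothetical stand-in for the two target hooks discussed in the review.
struct ToyTTI {
  int Multiplier;
  int Adjustment;
  constexpr int getInliningThresholdMultiplier() const { return Multiplier; }
  constexpr int adjustInliningThreshold() const { return Adjustment; }
};

// Order before the swap: the adjustment is added after scaling,
// so it is never multiplied.
constexpr int multiplyThenAdjust(int Threshold, const ToyTTI &TTI) {
  Threshold *= TTI.getInliningThresholdMultiplier();
  Threshold += TTI.adjustInliningThreshold();
  return Threshold;
}

// Order after the swap: the adjustment is added first,
// so the single multiplication covers it as well.
constexpr int adjustThenMultiply(int Threshold, const ToyTTI &TTI) {
  Threshold += TTI.adjustInliningThreshold();
  Threshold *= TTI.getInliningThresholdMultiplier();
  return Threshold;
}

static_assert(multiplyThenAdjust(225, ToyTTI{11, 100}) == 225 * 11 + 100,
              "adjustment stays unscaled");
static_assert(adjustThenMultiply(225, ToyTTI{11, 100}) == (225 + 100) * 11,
              "adjustment is scaled with the base");
```

A target whose adjustment is zero gets the same result from either order, which is consistent with the remark that the swap is safe while no other target uses the hook.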

Herald added subscribers: haicheng, eraman. Mar 12 2021, 9:22 AM

Added test.

Thanks!

This revision is now accepted and ready to land. Mar 12 2021, 10:01 AM

Harbormaster completed remote builds in B93520: Diff 330270. Mar 12 2021, 10:18 AM

This revision was landed with ongoing or failed builds. Mar 12 2021, 10:20 AM

Closed by commit rGb7b99b0799fa: [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rGb7b99b0799fa: [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost.

Harbormaster completed remote builds in B93531: Diff 330283. Mar 12 2021, 11:13 AM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	2 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp