This is an archive of the discontinued LLVM Phabricator instance.

Differential D98362

[AMDGPU] Fix -amdgpu-inline-arg-alloca-cost
ClosedPublic

Authored by rampitec on Mar 10 2021, 10:21 AM.

Details

Reviewers

arsenm
dfukalov

Commits

rGb7b99b0799fa: [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost

Summary

Before D94153 this threshold was in pre-scaled units.
After D94153 the inlining threshold multiplier is no longer
applied to this portion of the threshold. Restore the
threshold by applying the multiplier.
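
To make the difference concrete, here is a minimal sketch of the two orders of operations. The struct and function names are hypothetical stand-ins for the TTI hooks named in this review, and the numbers in the usage note are illustrative only.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two TTI hooks discussed in this review.
// Any values passed in are illustrative and are not taken from the patch.
struct TargetKnobs {
  int64_t Multiplier; // plays the role of getInliningThresholdMultiplier()
  int64_t Adjustment; // plays the role of adjustInliningThreshold(&Call)
};

// Scale the base threshold first, then add the adjustment unscaled.
constexpr int64_t thresholdMultiplyThenAdjust(int64_t Base, TargetKnobs K) {
  return Base * K.Multiplier + K.Adjustment;
}

// Add the adjustment first, then scale the whole sum by the multiplier.
constexpr int64_t thresholdAdjustThenMultiply(int64_t Base, TargetKnobs K) {
  return (Base + K.Adjustment) * K.Multiplier;
}
```

With a base of 225, a multiplier of 11 and an adjustment of 4000 (all made-up inputs), the first function returns 6475 and the second returns 46475. Only the second order applies the multiplier to the adjustment.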

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Mar 10 2021, 10:21 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others. Mar 10 2021, 10:21 AM

rampitec requested review of this revision. Mar 10 2021, 10:21 AM

Herald added a project: Restricted Project. Mar 10 2021, 10:21 AM

Herald added a subscriber: wdng.

arsenm added inline comments. Mar 10 2021, 11:55 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192 ↗	(On Diff #329708)	Should this just adjust for the scale then?

rampitec added inline comments. Mar 10 2021, 11:57 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192 ↗	(On Diff #329708)	I thought about this, but the next time we adjust the scale we would have to revisit it.

Harbormaster completed remote builds in B93126: Diff 329708. Mar 10 2021, 9:58 PM

Do you have any test for the fix?

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192 ↗	(On Diff #329708)	Nit: It seems that instead of this modification you can just swap two lines in InlineCost.cpp:
1582: Threshold *= TTI.getInliningThresholdMultiplier();
1583: Threshold += TTI.adjustInliningThreshold(&Call);
so we'll stay with just one place that uses `getInliningThresholdMultiplier()`.

In D98362#2619375, @dfukalov wrote:

Do you have any test for the fix?

These tests tend to be either unreliable or huge. We cannot measure performance with lit tests.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192 ↗	(On Diff #329708)	That would change behavior for all targets.

arsenm added inline comments. Mar 11 2021, 6:07 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192 ↗	(On Diff #329708)	I thought the point of the multiplier was to just amplify the expense of calls. I don't understand scaling up the cost here

rampitec added inline comments. Mar 11 2021, 6:10 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1192 ↗	(On Diff #329708)	It's more like a uniform target cost multiplier that should be applied to everything. But Daniil is probably correct, and I should swap the two lines in the inliner instead, while no one else uses it as is.

Swapped order of operations in the InlineCost itself instead. This is still unused by any other target so it is possible to fix it there, which seems to be a more correct way.

Herald added subscribers: haicheng, eraman. Mar 12 2021, 9:22 AM

Added test.

Thanks!

This revision is now accepted and ready to land. Mar 12 2021, 10:01 AM

Harbormaster completed remote builds in B93520: Diff 330270. Mar 12 2021, 10:18 AM

This revision was landed with ongoing or failed builds. Mar 12 2021, 10:20 AM

Closed by commit rGb7b99b0799fa: [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rGb7b99b0799fa: [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost.

Harbormaster completed remote builds in B93531: Diff 330283. Mar 12 2021, 11:13 AM

Revision Contents

Path	Size
llvm/lib/Analysis/InlineCost.cpp	3 lines
llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-alloca-argument-cost.ll	22 lines

Diff 330292

llvm/lib/Analysis/InlineCost.cpp

(1,571 lines not shown; enclosing context: if (!Caller->hasOptSize() && HotCallSiteThreshold) {)
         // reduction, it can cause the size of a non-cold caller to increase
         // preventing it from being inlined.
         DisallowAllBonuses();
         Threshold = MinIfValid(Threshold, Params.ColdThreshold);
       }
     }
   }
 
+  Threshold += TTI.adjustInliningThreshold(&Call);
+
   // Finally, take the target-specific inlining threshold multiplier into
   // account.
   Threshold *= TTI.getInliningThresholdMultiplier();
-  Threshold += TTI.adjustInliningThreshold(&Call);
 
   SingleBBBonus = Threshold * SingleBBBonusPercent / 100;
   VectorBonus = Threshold * VectorBonusPercent / 100;
 
   bool OnlyOneCallAndLocalLinkage =
       F.hasLocalLinkage() && F.hasOneUse() && &F == Call.getCalledFunction();
   // If there is only one call of the function, and it has internal linkage,
   // the cost of inlining it drops dramatically. It may seem odd to update
(1,204 lines not shown)

llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-alloca-argument-cost.ll

This file was added.

; RUN: opt -mtriple=amdgcn--amdhsa -S -passes=inline -inline-threshold=0 -debug-only=inline-cost < %s 2>&1 | FileCheck %s

; REQUIRES: asserts

target datalayout = "A5"

; Verify we are properly adding cost of the -amdgpu-inline-arg-alloca-cost to the threshold.

; CHECK: NumAllocaArgs: 1
; CHECK: Threshold: 66000

define void @use_private_ptr_arg(float addrspace(5)* nocapture %p) {
  ret void
}

define amdgpu_kernel void @test_inliner_pvt_ptr(float addrspace(1)* nocapture %a, i32 %n) {
entry:
  %pvt_arr = alloca [64 x float], align 4, addrspace(5)
  %to.ptr = getelementptr inbounds [64 x float], [64 x float] addrspace(5)* %pvt_arr, i32 0, i32 0
  call void @use_private_ptr_arg(float addrspace(5)* %to.ptr)
  ret void
}
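
Because the test runs with -inline-threshold=0, the whole expected threshold has to come from the target adjustment. The sketch below shows one way the CHECK value of 66000 could decompose. All three constants are assumptions chosen to reproduce that number (an alloca-argument cost of 4000, a multiplier of 11 and a 50% single-basic-block bonus); none of them is stated in this review, and the function names are hypothetical.

```cpp
#include <cstdint>

// All constants are assumptions picked to reproduce the CHECK value;
// none of them is stated in this review.
constexpr int64_t AssumedArgAllocaCost = 4000;      // adjustment for one alloca argument
constexpr int64_t AssumedMultiplier = 11;           // target threshold multiplier
constexpr int64_t AssumedSingleBBBonusPercent = 50; // bonus added on top of the threshold

// Adjust-then-multiply, followed by the single-basic-block bonus.
constexpr int64_t adjustThenMultiplyWithBonus(int64_t BaseThreshold) {
  return (BaseThreshold + AssumedArgAllocaCost) * AssumedMultiplier *
         (100 + AssumedSingleBBBonusPercent) / 100;
}

// Multiply-then-adjust, followed by the same bonus, for comparison.
constexpr int64_t multiplyThenAdjustWithBonus(int64_t BaseThreshold) {
  return (BaseThreshold * AssumedMultiplier + AssumedArgAllocaCost) *
         (100 + AssumedSingleBBBonusPercent) / 100;
}
```

Under these assumptions a base threshold of 0 gives 66000 with the adjust-then-multiply order and 6000 with the multiply-then-adjust order, so a check on the printed threshold can tell the two orders apart.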