This is an archive of the discontinued LLVM Phabricator instance.


Differential D150609

[AMDGPU] Do not assume stack size for PAL code object indirect calls
Closed, Public

Authored by bsaleil on May 15 2023, 1:37 PM.


Details

Reviewers

arsenm
foad
sebastian-ne

Commits

rG3604fdf18d35: [AMDGPU] Do not assume stack size for PAL code object indirect calls

Summary

There is no need to set a big default stack size for PAL code object indirect calls. The driver knows the max recursion depth, so it can compute a more accurate value from the minimum scratch size.
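As a rough illustration of the driver-side computation the summary alludes to, the sketch below bounds the scratch allocation from per-function minimum scratch sizes and an application-supplied recursion depth. The names `ShaderScratchInfo` and `computeScratchBytes` are hypothetical, and the formula (entry usage plus depth times the largest callee usage) is an assumption about one conservative policy, not the actual PAL driver algorithm.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for what a driver might read from per-function
// metadata; none of these names come from LLVM or PAL.
struct ShaderScratchInfo {
  uint64_t EntryScratchBytes;               // scratch used by the entry shader itself
  std::vector<uint64_t> CalleeScratchBytes; // scratch used by each callable function itself
};

// Assumed conservative bound: the entry frame plus MaxRecursionDepth frames
// of the largest callable function.
uint64_t computeScratchBytes(const ShaderScratchInfo &Info,
                             uint32_t MaxRecursionDepth) {
  uint64_t LargestCallee = 0;
  for (uint64_t Bytes : Info.CalleeScratchBytes)
    LargestCallee = std::max(LargestCallee, Bytes);
  return Info.EntryScratchBytes + uint64_t(MaxRecursionDepth) * LargestCallee;
}
```

With this policy the compiler only has to report each function's own usage; a fixed assumed size for indirect calls would just inflate `EntryScratchBytes` without improving the bound.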

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bsaleil created this revision. May 15 2023, 1:37 PM

Herald added a project: Restricted Project. May 15 2023, 1:37 PM

Herald added subscribers: StephenFan, kerbowa, hiraditya and 5 others.

bsaleil requested review of this revision. May 15 2023, 1:37 PM

Herald added subscribers: llvm-commits, wdng. May 15 2023, 1:37 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124	Dynamic objects should be treated identically
llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
2	Can you just add a run line to an existing test?

Harbormaster completed remote builds in B232101: Diff 522315. May 15 2023, 3:34 PM

sebastian-ne added inline comments. May 16 2023, 2:32 AM

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
121–123	Should use explicit brackets around the && clause.
122–124	I think the logic here is that we treat PAL code objects the same as AMDHSA code objects >= 5 (PAL code objects have no amdgpu_code_object_version). If there is an indirect call, something outside the compiler needs to compute the stack size; in the case of PAL/graphics, that is the driver. So probably we should use `if (AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5 || STI.getTargetTriple().getOS() == Triple::AMDPAL)` for both `AssumedStackSizeForDynamicSizeObjects` and `AssumedStackSizeForExternalCall`
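The condition suggested in this comment can be modelled in isolation. The sketch below is a minimal stand-alone version, assuming hypothetical stand-ins (`OSKind`, `StackSizeOption`, `applyMinimumScratchDefaults`, and the constant `CodeObjectV5`) in place of `llvm::Triple`, `llvm::cl::opt`, and `AMDGPU::getCodeObjectVersion`; the `SetOnCommandLine` flag plays the role of `getNumOccurrences()`.

```cpp
// Hypothetical stand-ins; the real pass uses llvm::Triple, llvm::cl::opt and
// AMDGPU::getCodeObjectVersion. The numeric value of CodeObjectV5 is
// illustrative only.
enum class OSKind { AMDHSA, AMDPAL };
constexpr unsigned CodeObjectV5 = 5;

struct StackSizeOption {
  unsigned Value;        // current value of the option
  bool SetOnCommandLine; // true if the user passed the flag explicitly
};

// For code object v5 and later, or for the AMDPAL OS, track only the minimum
// scratch size: zero both assumed sizes unless the user overrode them.
void applyMinimumScratchDefaults(unsigned CodeObjectVersion, OSKind OS,
                                 StackSizeOption &DynamicSizeObjects,
                                 StackSizeOption &ExternalCall) {
  if (CodeObjectVersion >= CodeObjectV5 || OS == OSKind::AMDPAL) {
    if (!DynamicSizeObjects.SetOnCommandLine)
      DynamicSizeObjects.Value = 0;
    if (!ExternalCall.SetOnCommandLine)
      ExternalCall.Value = 0;
  }
}
```

Treating both options under the same condition is what the reviewers ask for here: an explicit command-line value still wins, and only the implicit defaults are dropped.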

arsenm added inline comments. May 16 2023, 7:22 AM

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124	Is some equivalent of the dynamic bit emitted in the metadata for PAL?

sebastian-ne added inline comments. May 16 2023, 7:31 AM

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124	We emit per-function metadata (https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdpal-code-object-shader-function-map-table). An application needs to specify the recursion depth it uses and the driver uses this information to compute the needed scratch allocation.

arsenm added inline comments. May 16 2023, 7:34 AM

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124	That metadata should be checked in the test

Addressed review comments

bsaleil marked 6 inline comments as done. May 24 2023, 1:43 PM

arsenm added inline comments. May 24 2023, 2:19 PM

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6	Doesn't check a dynamic-is-present metadata field?

Harbormaster completed remote builds in B234314: Diff 525325. May 24 2023, 3:08 PM

arsenm requested changes to this revision. Jun 4 2023, 9:12 AM

This revision now requires changes to proceed. Jun 4 2023, 9:12 AM

Add comment to the test case

bsaleil added inline comments.

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6	We don't have such a flag in the PAL ABI.

arsenm added inline comments. Jun 5 2023, 8:43 AM

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6	Then that is a problem? You would need one to know you need to add some extra?

sebastian-ne added inline comments. Jun 5 2023, 8:59 AM

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6	The amount of scratch that needs to be allocated is computed outside of LLVM in the graphics driver. The compute equivalent would be the linker/loader that sees all the functions that are linked together and also gets additional data like the maximum recursion depth. So, the only information needed in PAL metadata is the scratch usage of a function itself, without any callees.
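To make the linker/loader analogy concrete, here is a minimal sketch of how a consumer that sees every linked function could combine per-function scratch values. `FunctionInfo` and `worstCaseScratch` are hypothetical names, and the code assumes an acyclic call graph; recursion would additionally need the externally supplied maximum depth mentioned in the comment.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-function record: only the function's own scratch usage is
// stored, which is all the comment above says the metadata needs to carry.
struct FunctionInfo {
  uint64_t SelfScratchBytes;        // scratch of the function itself, no callees
  std::vector<std::size_t> Callees; // indices into the function table
};

// Worst-case scratch along any call chain starting at Root.
// Assumes the call graph has no cycles.
uint64_t worstCaseScratch(const std::vector<FunctionInfo> &Functions,
                          std::size_t Root) {
  uint64_t DeepestCallee = 0;
  for (std::size_t Callee : Functions[Root].Callees)
    DeepestCallee =
        std::max(DeepestCallee, worstCaseScratch(Functions, Callee));
  return Functions[Root].SelfScratchBytes + DeepestCallee;
}
```

Because the sum is taken by whoever sees the whole program, a compiler-side guess for unknown callees is unnecessary in this model.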

Harbormaster completed remote builds in B236647: Diff 528459. Jun 5 2023, 10:03 AM

arsenm accepted this revision. Jun 8 2023, 4:31 PM

This revision is now accepted and ready to land. Jun 8 2023, 4:31 PM

This revision was landed with ongoing or failed builds. Jun 12 2023, 7:15 AM

Closed by commit rG3604fdf18d35: [AMDGPU] Do not assume stack size for PAL code object indirect calls (authored by bsaleil).

This revision was automatically updated to reflect the committed changes.

bsaleil added a commit: rG3604fdf18d35: [AMDGPU] Do not assume stack size for PAL code object indirect calls.

This patch is causing buildbot failures because the RUN line of the test case is invalid (missing colon symbol). This is now fixed by 9eea63bc9ccf5bd18a040cc028238ac4b49b77ea

sebastian-ne mentioned this in D153280: [AMDGPU] Forbid dynamic alloca on PAL ABI. Jun 19 2023, 8:04 AM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp	4 lines
llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll	17 lines

Diff 530499

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp

[98 lines not shown]

     bool AMDGPUResourceUsageAnalysis::runOnModule(Module &M) {
       auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
       if (!TPC)
         return false;

       MachineModuleInfo &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
       const TargetMachine &TM = TPC->getTM<TargetMachine>();
    +  const MCSubtargetInfo &STI = *TM.getMCSubtargetInfo();
       bool HasIndirectCall = false;

       CallGraph CG = CallGraph(M);
       auto End = po_end(&CG);

       // By default, for code object v5 and later, track only the minimum scratch
       // size
    -  if (AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5) {
    +  if (AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5 ||
    +      STI.getTargetTriple().getOS() == Triple::AMDPAL) {
         if (!AssumedStackSizeForDynamicSizeObjects.getNumOccurrences())
           AssumedStackSizeForDynamicSizeObjects = 0;
         if (!AssumedStackSizeForExternalCall.getNumOccurrences())
           AssumedStackSizeForExternalCall = 0;
       }

       for (auto IT = po_begin(&CG); IT != End; ++IT) {
         Function *F = IT->getFunction();
         if (!F || F->isDeclaration())
           continue;

         MachineFunction *MF = MMI.getMachineFunction(*F);
         assert(MF && "function must have been generated already");

         auto CI =
             CallGraphResourceInfo.insert(std::pair(F, SIFunctionResourceInfo()));

[482 lines not shown]

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll

This file was added.

    ; RUN llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s

    ; Check that we do not assume any default stack size for PAL code object
    ; indirect calls. The driver knows the max recursion depth, so it can compute
    ; a more accurate value.

    ; CHECK: ScratchSize: 0
    ; CHECK: scratch_memory_size: 0
    define amdgpu_vs void @test() {
    .entry:
      %0 = call i64 @llvm.amdgcn.s.getpc()
      %1 = inttoptr i64 %0 to ptr
      call amdgpu_gfx void %1()
      ret void
    }

    declare i64 @llvm.amdgcn.s.getpc()