This is an archive of the discontinued LLVM Phabricator instance.


Differential D117364

AMDGPU: Use module level register maximums for unknown callees
ClosedPublic

Authored by arsenm on Jan 14 2022, 3:11 PM.

Details

Reviewers

rampitec
sebastian-ne

Group Reviewers

Restricted Project

Summary

Compute the theoretical register budget based on the IR function
signature/attributes, and use the global maximum register budgets for
unknown callees.

This should fix the kernel reported register usage in the presence of
indirect calls. The previous fix in
2b08f6af62afbf32e89a6a392dbafa92c62f7bdf was incorrect because it was
only taking the maximum in the known call graph, and missing something
that was either outside of it or codegened later.

This fixes a second case I discovered where calls to aliases also did
not work as expected. CallGraphAnalysis misses these, so functions
called through aliases were not codegened ahead of callers as
expected. CallGraphAnalysis should probably be fixed to understand
this case, and there's likely a bug with IPRA here. This fixes
numerous failures in the conformance test at -O0.
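
As a rough illustration of the approach described in this summary, the following self-contained sketch computes a module-wide worst case by taking the maximum theoretical budget over every function that could be a call target, and caches the result on first use. It is not the patch itself: `FunctionBudget`, `WorstCase`, and `ModuleWorstCase` are hypothetical stand-ins for the IR function, `SIFunctionResourceInfo`, and the analysis pass, and the budgets are plain integers rather than values queried from the subtarget.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR function; only the fields this sketch needs.
struct FunctionBudget {
  bool IsEntryPoint; // entry points (kernels) are never targets of a call
  int MaxVGPRs;      // theoretical VGPR budget implied by signature/attributes
  int MaxSGPRs;      // theoretical SGPR budget implied by signature/attributes
};

// Hypothetical stand-in for the per-function resource record.
struct WorstCase {
  int NumVGPR = 0;
  int NumSGPR = 0;
};

class ModuleWorstCase {
public:
  explicit ModuleWorstCase(std::vector<FunctionBudget> Fns)
      : Functions(std::move(Fns)) {}

  // Compute the worst case lazily and reuse it for every unknown callee.
  const WorstCase &get() {
    if (!Cached)
      Cached = compute();
    return *Cached;
  }

private:
  WorstCase compute() const {
    WorstCase Result;
    for (const FunctionBudget &F : Functions) {
      if (F.IsEntryPoint)
        continue; // not callable, so it cannot be the unknown callee
      Result.NumVGPR = std::max(Result.NumVGPR, F.MaxVGPRs);
      Result.NumSGPR = std::max(Result.NumSGPR, F.MaxSGPRs);
    }
    return Result;
  }

  std::vector<FunctionBudget> Functions;
  std::optional<WorstCase> Cached;
};
```

With one entry point budgeted at 512 VGPRs and two callable functions budgeted at 64 and 128, `get()` would report 128 VGPRs, because the entry point is skipped.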

Diff Detail

Unit Tests: Failed

	Time	Test
	120 ms	x64 debian > Clang.CodeGenCXX::cxx1z-initializer-aggregate.cpp
	770 ms	x64 debian > SanitizerCommon-tsan-x86_64-Linux.Linux::decorate_proc_maps.cpp

Event Timeline

arsenm created this revision. Jan 14 2022, 3:11 PM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Jan 14 2022, 3:11 PM

arsenm requested review of this revision. Jan 14 2022, 3:11 PM

Herald added a project: Restricted Project. Jan 14 2022, 3:11 PM

Herald added a subscriber: wdng.

arsenm added a parent revision: D117358: AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch. Jan 14 2022, 3:11 PM

arsenm mentioned this in D117358: AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch.

rampitec accepted this revision. Jan 14 2022, 3:23 PM

This revision is now accepted and ready to land. Jan 14 2022, 3:23 PM

Harbormaster completed remote builds in B143513: Diff 400164. Jan 14 2022, 3:58 PM

sebastian-ne requested changes to this revision. Jan 17 2022, 2:31 AM

sebastian-ne added inline comments.

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	This is over-approximating the vgpr_count whenever an indirect call is involved, which is quite a performance hit. Can we switch AMDGPUResourceUsageAnalysis to a ModulePass and run `propagateIndirectCallRegisterUsage` at the end, so that all functions with indirect calls will get the maximum VGPR count of all functions in the module? (As opposed to max VGPR count of the SCC that is used currently, which I did not intend.)

This revision now requires changes to proceed. Jan 17 2022, 2:31 AM

arsenm added inline comments. Jan 17 2022, 8:33 AM

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	I'd like to move switching to a module pass into a follow-up patch. I'm a bit afraid of unintended side effects from switching to a module pass. We're already paying a compile-time cost by using SCC codegen, and module passes will be worse.

sebastian-ne added inline comments. Jan 17 2022, 9:17 AM

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	If the over-approximation goes in, we’ll have to revert it in our graphics branch, otherwise any benchmarks or optimization work that is going on would be meaningless (we had to locally revert D103636 for that reason). So, I’d prefer it if we could fix this in one push, if you think that’s possible.

arsenm added inline comments. Jan 17 2022, 9:18 AM

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	This is just completely broken as is; any benchmarks are working by accident.

sebastian-ne added inline comments. Jan 17 2022, 9:24 AM

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	The current graphics use-case is not affected by this bug. We’re only compiling single functions per LLVM module and finding the maximum register usage and linking is done by the loader (PAL in this case).

arsenm added inline comments. Jan 17 2022, 9:25 AM

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	So then it also wouldn't be impacted by this change?

sebastian-ne added inline comments. Jan 17 2022, 9:38 AM

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	The compiled modules contain only a single function, but the function contains indirect function calls. So it hits the code path here that uses `getMaxNumVGPRs(F)` to approximate the register usage. The VGPR usage for the compiled function should just stay as it is, even if it contains an indirect call, because there is no other function (in the module) that could be called. I tested this patch on a pipeline and the reported VGPR usage goes from 140 VGPRs to 256 VGPRs, noticeably reducing the occupancy. As said, I’d prefer if this does not go in without an accompanying patch that makes the VGPR usage accurate again. If you think that’s unreasonable, I’ll unblock this patch and revert it in our graphics branch until the follow-up patch.

arsenm added a child revision: D117504: AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass. Jan 17 2022, 10:21 AM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	I will push them both at the same time, but I'd like to defensively keep this and D117504 as separate commits in case something goes wrong

sebastian-ne accepted this revision as: sebastian-ne. Jan 17 2022, 10:50 AM

sebastian-ne added inline comments.

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll
184	Thank you, I’ll have a look tomorrow.

Forgot to accept as amdgpu last time.

This revision is now accepted and ready to land. Feb 2 2022, 1:12 AM

935abab65cafb509f60e76bd7255dfe03befde85

Revision Contents

	Path	Size
	llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h	11 lines
	llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp	88 lines
	llvm/lib/Target/AMDGPU/GCNSubtarget.h	4 lines
	llvm/test/CodeGen/AMDGPU/agpr-register-count.ll	29 lines
	llvm/test/CodeGen/AMDGPU/amdpal-callable.ll	40 lines
	llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll	6 lines
	llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll	31 lines
	llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll	26 lines
	llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll	29 lines
	llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll	26 lines
	llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll	26 lines
	llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll	18 lines
	llvm/test/CodeGen/AMDGPU/indirect-call.ll	32 lines

Diff 400164

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.h

[14 lines not shown]
#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPURESOURCEUSAGEANALYSIS_H		#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPURESOURCEUSAGEANALYSIS_H
#define LLVM_LIB_TARGET_AMDGPU_AMDGPURESOURCEUSAGEANALYSIS_H		#define LLVM_LIB_TARGET_AMDGPU_AMDGPURESOURCEUSAGEANALYSIS_H

#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"

namespace llvm {		namespace llvm {

		class GCNTargetMachine;
class GCNSubtarget;		class GCNSubtarget;
class MachineFunction;		class MachineFunction;
class TargetMachine;		class TargetMachine;

struct AMDGPUResourceUsageAnalysis : public CallGraphSCCPass {		struct AMDGPUResourceUsageAnalysis : public CallGraphSCCPass {
static char ID;		static char ID;

public:		public:
[35 lines not shown; enclosing context: public:]

const SIFunctionResourceInfo &getResourceInfo(const Function *F) const {		const SIFunctionResourceInfo &getResourceInfo(const Function *F) const {
auto Info = CallGraphResourceInfo.find(F);		auto Info = CallGraphResourceInfo.find(F);
assert(Info != CallGraphResourceInfo.end() &&		assert(Info != CallGraphResourceInfo.end() &&
"Failed to find resource info for function");		"Failed to find resource info for function");
return Info->getSecond();		return Info->getSecond();
}		}

		const SIFunctionResourceInfo &getWorstCaseResourceInfo(const Module &M);

private:		private:
SIFunctionResourceInfo analyzeResourceUsage(const MachineFunction &MF,		void computeWorstCaseModuleRegisterUsage(const Module &M);
const TargetMachine &TM) const;
void propagateIndirectCallRegisterUsage();		SIFunctionResourceInfo analyzeResourceUsage(const MachineFunction &MF);

		const GCNTargetMachine *TM = nullptr;
DenseMap<const Function *, SIFunctionResourceInfo> CallGraphResourceInfo;		DenseMap<const Function *, SIFunctionResourceInfo> CallGraphResourceInfo;
		Optional<SIFunctionResourceInfo> ModuleWorstCaseInfo;
};		};
} // namespace llvm		} // namespace llvm
#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPURESOURCEUSAGEANALYSIS_H		#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPURESOURCEUSAGEANALYSIS_H

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp

[19 lines not shown]
/// hardware-entrypoints. Therefore the register usage of functions with		/// hardware-entrypoints. Therefore the register usage of functions with
/// indirect calls is estimated as the maximum of all non-entrypoint functions		/// indirect calls is estimated as the maximum of all non-entrypoint functions
/// in the module.		/// in the module.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPUResourceUsageAnalysis.h"		#include "AMDGPUResourceUsageAnalysis.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
		#include "AMDGPUTargetMachine.h"
#include "GCNSubtarget.h"		#include "GCNSubtarget.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/GlobalAlias.h"		#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/GlobalValue.h"		#include "llvm/IR/GlobalValue.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"

[61 lines not shown; enclosing context: int32_t AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo::getTotalNumVGPRs(]
return getTotalNumVGPRs(ST, NumAGPR, NumVGPR);		return getTotalNumVGPRs(ST, NumAGPR, NumVGPR);
}		}

bool AMDGPUResourceUsageAnalysis::runOnSCC(CallGraphSCC &SCC) {		bool AMDGPUResourceUsageAnalysis::runOnSCC(CallGraphSCC &SCC) {
auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();		auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)		if (!TPC)
return false;		return false;

const TargetMachine &TM = TPC->getTM<TargetMachine>();		TM = static_cast<const GCNTargetMachine *>(&TPC->getTM<TargetMachine>());
bool HasIndirectCall = false;

for (CallGraphNode *I : SCC) {		for (CallGraphNode *I : SCC) {
Function *F = I->getFunction();		Function *F = I->getFunction();
if (!F || F->isDeclaration())		if (!F || F->isDeclaration())
continue;		continue;

MachineModuleInfo &MMI =		MachineModuleInfo &MMI =
getAnalysis<MachineModuleInfoWrapperPass>().getMMI();		getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
MachineFunction &MF = MMI.getOrCreateMachineFunction(*F);		MachineFunction &MF = MMI.getOrCreateMachineFunction(*F);

auto CI = CallGraphResourceInfo.insert(		auto CI = CallGraphResourceInfo.insert(
std::make_pair(&MF.getFunction(), SIFunctionResourceInfo()));		std::make_pair(&MF.getFunction(), SIFunctionResourceInfo()));
SIFunctionResourceInfo &Info = CI.first->second;		SIFunctionResourceInfo &Info = CI.first->second;
assert(CI.second && "should only be called once per function");		assert(CI.second && "should only be called once per function");
Info = analyzeResourceUsage(MF, TM);		Info = analyzeResourceUsage(MF);
HasIndirectCall |= Info.HasIndirectCall;
}		}

if (HasIndirectCall)
propagateIndirectCallRegisterUsage();

return false;		return false;
}		}

AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo		AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo
AMDGPUResourceUsageAnalysis::analyzeResourceUsage(		AMDGPUResourceUsageAnalysis::analyzeResourceUsage(const MachineFunction &MF) {
const MachineFunction &MF, const TargetMachine &TM) const {
SIFunctionResourceInfo Info;		SIFunctionResourceInfo Info;

const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const MachineFrameInfo &FrameInfo = MF.getFrameInfo();		const MachineFrameInfo &FrameInfo = MF.getFrameInfo();
const MachineRegisterInfo &MRI = MF.getRegInfo();		const MachineRegisterInfo &MRI = MF.getRegInfo();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo &TRI = TII->getRegisterInfo();		const SIRegisterInfo &TRI = TII->getRegisterInfo();
[329 lines not shown; enclosing context: for (const MachineInstr &MI : MBB) {]
}		}
}		}

if (IsIndirect || I == CallGraphResourceInfo.end()) {		if (IsIndirect || I == CallGraphResourceInfo.end()) {
CalleeFrameSize =		CalleeFrameSize =
std::max(CalleeFrameSize,		std::max(CalleeFrameSize,
static_cast<uint64_t>(AssumedStackSizeForExternalCall));		static_cast<uint64_t>(AssumedStackSizeForExternalCall));

		const SIFunctionResourceInfo &WorstCase =
		getWorstCaseResourceInfo(*MF.getFunction().getParent());
		MaxSGPR = std::max(WorstCase.NumExplicitSGPR - 1, MaxSGPR);
		MaxVGPR = std::max(WorstCase.NumVGPR - 1, MaxVGPR);
		MaxAGPR = std::max(WorstCase.NumAGPR - 1, MaxAGPR);

// Register usage of indirect calls gets handled later		// Register usage of indirect calls gets handled later
Info.UsesVCC = true;		Info.UsesVCC = true;
Info.UsesFlatScratch = ST.hasFlatAddressSpace();		Info.UsesFlatScratch |=
		WorstCase.UsesFlatScratch && ST.hasFlatAddressSpace();
Info.HasDynamicallySizedStack = true;		Info.HasDynamicallySizedStack = true;
Info.HasIndirectCall = true;		Info.HasIndirectCall = true;
} else {		} else {
// We force CodeGen to run in SCC order, so the callee's register		// We force CodeGen to run in SCC order, so the callee's register
// usage etc. should be the cumulative usage of all callees.		// usage etc. should be the cumulative usage of all callees.
MaxSGPR = std::max(I->second.NumExplicitSGPR - 1, MaxSGPR);		MaxSGPR = std::max(I->second.NumExplicitSGPR - 1, MaxSGPR);
MaxVGPR = std::max(I->second.NumVGPR - 1, MaxVGPR);		MaxVGPR = std::max(I->second.NumVGPR - 1, MaxVGPR);
MaxAGPR = std::max(I->second.NumAGPR - 1, MaxAGPR);		MaxAGPR = std::max(I->second.NumAGPR - 1, MaxAGPR);
[12 lines not shown; enclosing context: AMDGPUResourceUsageAnalysis::analyzeResourceUsage(const MachineFunction &MF) {]
Info.NumExplicitSGPR = MaxSGPR + 1;		Info.NumExplicitSGPR = MaxSGPR + 1;
Info.NumVGPR = MaxVGPR + 1;		Info.NumVGPR = MaxVGPR + 1;
Info.NumAGPR = MaxAGPR + 1;		Info.NumAGPR = MaxAGPR + 1;
Info.PrivateSegmentSize += CalleeFrameSize;		Info.PrivateSegmentSize += CalleeFrameSize;

return Info;		return Info;
}		}

void AMDGPUResourceUsageAnalysis::propagateIndirectCallRegisterUsage() {		const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &
// Collect the maximum number of registers from non-hardware-entrypoints.		AMDGPUResourceUsageAnalysis::getWorstCaseResourceInfo(const Module &M) {
// All these functions are potential targets for indirect calls.		if (ModuleWorstCaseInfo)
int32_t NonKernelMaxSGPRs = 0;		return *ModuleWorstCaseInfo;
int32_t NonKernelMaxVGPRs = 0;
int32_t NonKernelMaxAGPRs = 0;		computeWorstCaseModuleRegisterUsage(M);
		return *ModuleWorstCaseInfo;
for (const auto &I : CallGraphResourceInfo) {		}
if (!AMDGPU::isEntryFunctionCC(I.getFirst()->getCallingConv())) {
auto &Info = I.getSecond();		/// Find the worst case register usage for all callable functions in the module,
NonKernelMaxSGPRs = std::max(NonKernelMaxSGPRs, Info.NumExplicitSGPR);		/// assuming all reachable functions are defined in the current module.
NonKernelMaxVGPRs = std::max(NonKernelMaxVGPRs, Info.NumVGPR);		void AMDGPUResourceUsageAnalysis::computeWorstCaseModuleRegisterUsage(
NonKernelMaxAGPRs = std::max(NonKernelMaxAGPRs, Info.NumAGPR);		const Module &M) {
}		assert(!ModuleWorstCaseInfo);
}		ModuleWorstCaseInfo = SIFunctionResourceInfo();
		ModuleWorstCaseInfo->UsesVCC = true;
		ModuleWorstCaseInfo->HasDynamicallySizedStack = true;
		ModuleWorstCaseInfo->HasRecursion = true;
		ModuleWorstCaseInfo->HasIndirectCall = true;

// Add register usage for functions with indirect calls.		for (const Function &F : M) {
// For calls to unknown functions, we assume the maximum register usage of		if (F.isIntrinsic())
// all non-hardware-entrypoints in the current module.		continue;
for (auto &I : CallGraphResourceInfo) {
auto &Info = I.getSecond();		if (AMDGPU::isEntryFunctionCC(F.getCallingConv()))
if (Info.HasIndirectCall) {		continue;
Info.NumExplicitSGPR = std::max(Info.NumExplicitSGPR, NonKernelMaxSGPRs);
Info.NumVGPR = std::max(Info.NumVGPR, NonKernelMaxVGPRs);		const GCNSubtarget &ST = TM->getSubtarget<GCNSubtarget>(F);
Info.NumAGPR = std::max(Info.NumAGPR, NonKernelMaxAGPRs);		const int32_t MaxVGPR = ST.getMaxNumVGPRs(F);
		const int32_t MaxSGPR = ST.getMaxNumSGPRs(F);

		ModuleWorstCaseInfo->NumVGPR =
		std::max(ModuleWorstCaseInfo->NumVGPR, MaxVGPR);

		if (ST.hasMAIInsts()) {
		const int32_t MaxAGPR = ST.getMaxNumAGPRs(F);
		ModuleWorstCaseInfo->NumAGPR =
		std::max(ModuleWorstCaseInfo->NumAGPR, MaxAGPR);
}		}

		ModuleWorstCaseInfo->NumExplicitSGPR =
		std::max(ModuleWorstCaseInfo->NumExplicitSGPR, MaxSGPR);

		ModuleWorstCaseInfo->UsesFlatScratch |= ST.hasFlatAddressSpace();
}		}
}		}
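
The call-site handling in `analyzeResourceUsage` above can be paraphrased with the following sketch, under simplifying assumptions. It works on register counts directly rather than on highest-register indices (the patch subtracts one before taking the maximum and adds one afterwards), and `ResourceInfo`, `mergeCallee`, and the string-keyed map are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical, reduced version of the per-function resource record.
struct ResourceInfo {
  int NumVGPR = 0;
  int NumSGPR = 0;
  bool HasIndirectCall = false;
};

// Merge one call site into the caller's usage.
// An empty callee name models an indirect call. A callee that is missing from
// Known models a function that has not been analyzed yet, for example one
// reached through an alias or defined outside the module.
ResourceInfo mergeCallee(ResourceInfo Caller, const std::string &Callee,
                         const std::map<std::string, ResourceInfo> &Known,
                         const ResourceInfo &WorstCase) {
  auto It = Callee.empty() ? Known.end() : Known.find(Callee);
  if (It == Known.end()) {
    // Unknown callee: fall back to the module-level worst case.
    Caller.NumVGPR = std::max(Caller.NumVGPR, WorstCase.NumVGPR);
    Caller.NumSGPR = std::max(Caller.NumSGPR, WorstCase.NumSGPR);
    Caller.HasIndirectCall = true;
  } else {
    // Known callee: its record already includes the usage of its own callees.
    Caller.NumVGPR = std::max(Caller.NumVGPR, It->second.NumVGPR);
    Caller.NumSGPR = std::max(Caller.NumSGPR, It->second.NumSGPR);
  }
  return Caller;
}
```

The point of the sketch is that the fallback no longer depends on which functions happen to have been analyzed already, which is what made the earlier maximum over the known call graph unreliable.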

llvm/lib/Target/AMDGPU/GCNSubtarget.h

[1,104 lines not shown; enclosing context: public:]
/// requested using "amdgpu-num-vgpr" attribute attached to function \p F.		/// requested using "amdgpu-num-vgpr" attribute attached to function \p F.
///		///
/// \returns Value that meets number of waves per execution unit requirement		/// \returns Value that meets number of waves per execution unit requirement
/// if explicitly requested value cannot be converted to integer, violates		/// if explicitly requested value cannot be converted to integer, violates
/// subtarget's specifications, or does not meet number of waves per execution		/// subtarget's specifications, or does not meet number of waves per execution
/// unit requirement.		/// unit requirement.
unsigned getMaxNumVGPRs(const Function &F) const;		unsigned getMaxNumVGPRs(const Function &F) const;

		unsigned getMaxNumAGPRs(const Function &F) const {
		[Lint: pre-merge checks, clang-format: please reformat the code. Suggested form: unsigned getMaxNumAGPRs(const Function &F) const { return getMaxNumVGPRs(F); }]
		return getMaxNumVGPRs(F);
		}

/// \returns Maximum number of VGPRs that meets number of waves per execution		/// \returns Maximum number of VGPRs that meets number of waves per execution
/// unit requirement for function \p MF, or number of VGPRs explicitly		/// unit requirement for function \p MF, or number of VGPRs explicitly
/// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.		/// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.
///		///
/// \returns Value that meets number of waves per execution unit requirement		/// \returns Value that meets number of waves per execution unit requirement
/// if explicitly requested value cannot be converted to integer, violates		/// if explicitly requested value cannot be converted to integer, violates
/// subtarget's specifications, or does not meet number of waves per execution		/// subtarget's specifications, or does not meet number of waves per execution
/// unit requirement.		/// unit requirement.
[57 lines not shown]

llvm/test/CodeGen/AMDGPU/agpr-register-count.ll

	[148 lines not shown]
	bb:			bb:
	call void @func_32_agprs() #0			call void @func_32_agprs() #0
	ret void			ret void
	}			}

	declare void @undef_func()			declare void @undef_func()

	; GCN-LABEL: {{^}}kernel_call_undef_func:			; GCN-LABEL: {{^}}kernel_call_undef_func:
	; GFX908: .amdhsa_next_free_vgpr 32			; GFX908: .amdhsa_next_free_vgpr 128
	; GFX90A: .amdhsa_next_free_vgpr 64			; GFX90A: .amdhsa_next_free_vgpr 512
	; GFX90A: .amdhsa_accum_offset 32			; GFX90A: .amdhsa_accum_offset 256
	; GCN908: NumVgprs: 128			; GCN908: NumVgprs: 128
				; GCN908: NumAgprs: 128
	; GCN90A: NumVgprs: 256			; GCN90A: NumVgprs: 256
	; GCN: NumAgprs: 32			; GCN90A: NumAgprs: 256
	; GFX908: TotalNumVgprs: 32			; GFX908: TotalNumVgprs: 128
	; GFX90A: TotalNumVgprs: 64			; GFX90A: TotalNumVgprs: 512
	; GFX908: VGPRBlocks: 7			; GFX908: VGPRBlocks: 31
	; GFX90A: VGPRBlocks: 7			; GFX90A: VGPRBlocks: 63
	; GFX908: NumVGPRsForWavesPerEU: 32			; GFX908: NumVGPRsForWavesPerEU: 128
	; GFX90A: NumVGPRsForWavesPerEU: 64			; GFX90A: NumVGPRsForWavesPerEU: 512
	; GFX90A: AccumOffset: 32			; GFX90A: AccumOffset: 256
	; GFX908: Occupancy: 8			; GFX908: Occupancy: 2
	; GFX90A: Occupancy: 8			; GFX90A: Occupancy: 1
	; GFX90A: COMPUTE_PGM_RSRC3_GFX90A:ACCUM_OFFSET: 7			; GFX90A: COMPUTE_PGM_RSRC3_GFX90A:ACCUM_OFFSET: 63
	define amdgpu_kernel void @kernel_call_undef_func() #0 {			define amdgpu_kernel void @kernel_call_undef_func() #0 {
	bb:			bb:
	call void @undef_func()			call void @undef_func()
	ret void			ret void
	}			}

	attributes #0 = { nounwind noinline "amdgpu-flat-work-group-size"="1,512" }			attributes #0 = { nounwind noinline "amdgpu-flat-work-group-size"="1,512" }
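
The updated check lines above are consistent with a simple model in which register counts are rounded up to an allocation granule, encoded as "number of granules minus one", and occupancy is the register file size divided by the rounded count. The sketch below reproduces that arithmetic. The granule sizes (4 and 8), register file sizes (256 and 512), and wave caps (10 and 8) used in the usage note are assumptions chosen to match the values in this test, not values taken from the subtarget code, and the model ignores SGPR and LDS limits.

```cpp
#include <algorithm>
#include <cassert>

// Round V up to the next multiple of Granule.
constexpr unsigned alignTo(unsigned V, unsigned Granule) {
  return (V + Granule - 1) / Granule * Granule;
}

// Encode a register count as "allocated granules minus one".
constexpr unsigned encodeBlocks(unsigned NumRegs, unsigned Granule) {
  return alignTo(std::max(NumRegs, 1u), Granule) / Granule - 1;
}

// Simplified occupancy estimate: how many waves fit in the register file,
// capped at MaxWaves. Only the vector register limit is modeled.
constexpr unsigned estimateOccupancy(unsigned NumRegs, unsigned Granule,
                                     unsigned RegFileSize, unsigned MaxWaves) {
  return std::min(MaxWaves,
                  RegFileSize / alignTo(std::max(NumRegs, 1u), Granule));
}
```

Under these assumptions, `encodeBlocks(128, 4)` yields 31 and `estimateOccupancy(128, 4, 256, 10)` yields 2, matching the new GFX908 lines, while `encodeBlocks(512, 8)` yields 63 and `estimateOccupancy(512, 8, 512, 8)` yields 1, matching the new GFX90A lines.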

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll

[138 lines not shown; enclosing context: define amdgpu_gfx float @simple_lds_recurse(float %arg0) #0 {]
%res = call amdgpu_gfx float @simple_lds_recurse(float %val)		%res = call amdgpu_gfx float @simple_lds_recurse(float %val)
ret float %res		ret float %res
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }

; GCN: amdpal.pipelines:		; GCN: amdpal.pipelines:
; GCN-NEXT: - .registers:		; GCN-NEXT: - .registers:
; SDAG-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf01ca{{$}}		; SDAG-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf03cf{{$}}
; GISEL-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf01ce{{$}}		; GISEL-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf03cf{{$}}
; GCN-NEXT: 0x2e13 (COMPUTE_PGM_RSRC2): 0x8001{{$}}		; GCN-NEXT: 0x2e13 (COMPUTE_PGM_RSRC2): 0x8001{{$}}
; GCN-NEXT: .shader_functions:		; GCN-NEXT: .shader_functions:
; GCN-NEXT: dynamic_stack:		; GCN-NEXT: dynamic_stack:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GCN-NEXT: .sgpr_count: 0x28{{$}}		; GCN-NEXT: .sgpr_count: 0x28{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
; SDAG-NEXT: .vgpr_count: 0x2{{$}}		; SDAG-NEXT: .vgpr_count: 0x2{{$}}
; GISEL-NEXT: .vgpr_count: 0x3{{$}}		; GISEL-NEXT: .vgpr_count: 0x3{{$}}
[16 lines not shown]
; GCN-NEXT: .vgpr_count: 0x1{{$}}		; GCN-NEXT: .vgpr_count: 0x1{{$}}
; GCN-NEXT: no_stack_call:		; GCN-NEXT: no_stack_call:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GCN-NEXT: .sgpr_count: 0x26{{$}}		; GCN-NEXT: .sgpr_count: 0x26{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
; GCN-NEXT: .vgpr_count: 0x2{{$}}		; GCN-NEXT: .vgpr_count: 0x2{{$}}
; GCN-NEXT: no_stack_extern_call:		; GCN-NEXT: no_stack_extern_call:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GFX8-NEXT: .sgpr_count: 0x28{{$}}		; GFX8-NEXT: .sgpr_count: 0x68{{$}}
; GFX9-NEXT: .sgpr_count: 0x2c{{$}}		; GFX9-NEXT: .sgpr_count: 0x6c{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
; GCN-NEXT: .vgpr_count: 0x29{{$}}		; GCN-NEXT: .vgpr_count: 0x40{{$}}
; GCN-NEXT: no_stack_extern_call_many_args:		; GCN-NEXT: no_stack_extern_call_many_args:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GFX8-NEXT: .sgpr_count: 0x28{{$}}		; GFX8-NEXT: .sgpr_count: 0x68{{$}}
; GFX9-NEXT: .sgpr_count: 0x2c{{$}}		; GFX9-NEXT: .sgpr_count: 0x6c{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}
; SDAG-NEXT: .vgpr_count: 0x2a{{$}}		; SDAG-NEXT: .vgpr_count: 0x40{{$}}
; GISEL-NEXT: .vgpr_count: 0x34{{$}}		; GISEL-NEXT: .vgpr_count: 0x40{{$}}
; GCN-NEXT: no_stack_indirect_call:		; GCN-NEXT: no_stack_indirect_call:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GFX8-NEXT: .sgpr_count: 0x28{{$}}		; GFX8-NEXT: .sgpr_count: 0x68{{$}}
; GFX9-NEXT: .sgpr_count: 0x2c{{$}}		; GFX9-NEXT: .sgpr_count: 0x6c{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
; SDAG-NEXT: .vgpr_count: 0x2a{{$}}		; SDAG-NEXT: .vgpr_count: 0x40{{$}}
; GISEL-NEXT: .vgpr_count: 0x34{{$}}		; GISEL-NEXT: .vgpr_count: 0x40{{$}}
; GCN-NEXT: simple_lds:		; GCN-NEXT: simple_lds:
; GCN-NEXT: .lds_size: 0x100{{$}}		; GCN-NEXT: .lds_size: 0x100{{$}}
; GCN-NEXT: .sgpr_count: 0x20{{$}}		; GCN-NEXT: .sgpr_count: 0x20{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
; GCN-NEXT: .vgpr_count: 0x1{{$}}		; GCN-NEXT: .vgpr_count: 0x1{{$}}
; GCN-NEXT: simple_lds_recurse:		; GCN-NEXT: simple_lds_recurse:
; GCN-NEXT: .lds_size: 0x100{{$}}		; GCN-NEXT: .lds_size: 0x100{{$}}
; GCN-NEXT: .sgpr_count: 0x26{{$}}		; GCN-NEXT: .sgpr_count: 0x26{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
; GCN-NEXT: .vgpr_count: 0x29{{$}}		; GCN-NEXT: .vgpr_count: 0x29{{$}}
; GCN-NEXT: simple_stack:		; GCN-NEXT: simple_stack:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GCN-NEXT: .sgpr_count: 0x21{{$}}		; GCN-NEXT: .sgpr_count: 0x21{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}
; GCN-NEXT: .vgpr_count: 0x2{{$}}		; GCN-NEXT: .vgpr_count: 0x2{{$}}
; GCN-NEXT: simple_stack_call:		; GCN-NEXT: simple_stack_call:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GCN-NEXT: .sgpr_count: 0x26{{$}}		; GCN-NEXT: .sgpr_count: 0x26{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
; GCN-NEXT: .vgpr_count: 0x3{{$}}		; GCN-NEXT: .vgpr_count: 0x3{{$}}
; GCN-NEXT: simple_stack_extern_call:		; GCN-NEXT: simple_stack_extern_call:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GFX8-NEXT: .sgpr_count: 0x28{{$}}		; GFX8-NEXT: .sgpr_count: 0x68{{$}}
; GFX9-NEXT: .sgpr_count: 0x2c{{$}}		; GFX9-NEXT: .sgpr_count: 0x6c{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
; GCN-NEXT: .vgpr_count: 0x2a{{$}}		; GCN-NEXT: .vgpr_count: 0x40{{$}}
; GCN-NEXT: simple_stack_indirect_call:		; GCN-NEXT: simple_stack_indirect_call:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GFX8-NEXT: .sgpr_count: 0x28{{$}}		; GFX8-NEXT: .sgpr_count: 0x68{{$}}
; GFX9-NEXT: .sgpr_count: 0x2c{{$}}		; GFX9-NEXT: .sgpr_count: 0x6c{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
; SDAG-NEXT: .vgpr_count: 0x2b{{$}}		; SDAG-NEXT: .vgpr_count: 0x40{{$}}
; GISEL-NEXT: .vgpr_count: 0x34{{$}}		; GISEL-NEXT: .vgpr_count: 0x40{{$}}
; GCN-NEXT: simple_stack_recurse:		; GCN-NEXT: simple_stack_recurse:
; GCN-NEXT: .lds_size: 0{{$}}		; GCN-NEXT: .lds_size: 0{{$}}
; GCN-NEXT: .sgpr_count: 0x26{{$}}		; GCN-NEXT: .sgpr_count: 0x26{{$}}
; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}		; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
; GCN-NEXT: .vgpr_count: 0x2a{{$}}		; GCN-NEXT: .vgpr_count: 0x2a{{$}}
; GCN-NEXT: ...		; GCN-NEXT: ...

llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll

[550 lines not shown; enclosing context: define amdgpu_kernel void @f512() #512 {]
call void @foo()		call void @foo()
call void @use256vgprs()		call void @use256vgprs()
ret void		ret void
}		}
attributes #512 = { nounwind "amdgpu-flat-work-group-size"="512,512" }		attributes #512 = { nounwind "amdgpu-flat-work-group-size"="512,512" }

; GCN-LABEL: {{^}}f1024:		; GCN-LABEL: {{^}}f1024:
; GFX9: NumVgprs: 64		; GFX9: NumVgprs: 64
; GFX90A: NumVgprs: 64		; GFX90A: NumVgprs: 128
; GFX90A: NumAgprs: 64		; GFX90A: NumAgprs: 128
; GFX90A: TotalNumVgprs: 128		; GFX90A: TotalNumVgprs: 256
; GFX10WGP-WAVE32: NumVgprs: 128		; GFX10WGP-WAVE32: NumVgprs: 128
; GFX10WGP-WAVE64: NumVgprs: 128		; GFX10WGP-WAVE64: NumVgprs: 128
; GFX10CU-WAVE32: NumVgprs: 64		; GFX10CU-WAVE32: NumVgprs: 64
; GFX10CU-WAVE64: NumVgprs: 64		; GFX10CU-WAVE64: NumVgprs: 64
define amdgpu_kernel void @f1024() #1024 {		define amdgpu_kernel void @f1024() #1024 {
call void @foo()		call void @foo()
call void @use256vgprs()		call void @use256vgprs()
ret void		ret void
}		}

attributes #1024 = { nounwind "amdgpu-flat-work-group-size"="1024,1024" }		attributes #1024 = { nounwind "amdgpu-flat-work-group-size"="1024,1024" }

declare void @foo()		declare void @foo()

llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll

This file was added.

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 < %s | FileCheck -check-prefixes=ALL,GFX908 %s
				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=ALL,GFX90A %s

				; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
				; through aliases. If GlobalOpt is never run, we do not see direct
				; calls,

				@alias = hidden alias void (), void ()* @aliasee_default

				; ALL-LABEL: {{^}}kernel:
				; GFX908: .amdhsa_next_free_vgpr 64
				; GFX908-NEXT: .amdhsa_next_free_sgpr 102

				; GFX90A: .amdhsa_next_free_vgpr 256
				; GFX90A-NEXT: .amdhsa_next_free_sgpr 102
				; GFX90A-NEXT: .amdhsa_accum_offset 128
				define amdgpu_kernel void @kernel() #0 {
				bb:
				call void @alias() #2
				ret void
				}

				define internal void @aliasee_default() #1 {
				bb:
				call void asm sideeffect "; clobber a26 ", "~{a26}"()
				ret void
				}

				attributes #0 = { noinline norecurse nounwind optnone }
				attributes #1 = { noinline norecurse nounwind readnone willreturn }
				attributes #2 = { nounwind readnone willreturn }

llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll

This file was added.

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s

				; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
				; through aliases. If GlobalOpt is never run, we do not see direct
				; calls,

				@alias0 = hidden alias void (), void ()* @aliasee_default_vgpr64_sgpr102

				; CHECK-LABEL: {{^}}kernel0:
				; CHECK: .amdhsa_next_free_vgpr 64
				; CHECK-NEXT: .amdhsa_next_free_sgpr 102
				define amdgpu_kernel void @kernel0() #0 {
				bb:
				call void @alias0() #2
				ret void
				}

				define internal void @aliasee_default_vgpr64_sgpr102() #1 {
				bb:
				call void asm sideeffect "; clobber v52 ", "~{v52}"()
				ret void
				}

				attributes #0 = { noinline norecurse nounwind optnone }
				attributes #1 = { noinline norecurse nounwind readnone willreturn }
				attributes #2 = { nounwind readnone willreturn }

llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll

This file was added.

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s

				; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
				; through aliases. If GlobalOpt is never run, we do not see direct
				; calls,

				@alias1 = hidden alias void (), void ()* @aliasee_vgpr32_sgpr76

				; The parent kernel has a higher VGPR usage than the possible callees.

				; CHECK-LABEL: {{^}}kernel1:
				; CHECK: .amdhsa_next_free_vgpr 42
				; CHECK-NEXT: .amdhsa_next_free_sgpr 74
				define amdgpu_kernel void @kernel1() #0 {
				bb:
				call void asm sideeffect "; clobber v40 ", "~{v40}"()
				call void @alias1() #2
				ret void
				}

				define internal void @aliasee_vgpr32_sgpr76() #1 {
				bb:
				call void asm sideeffect "; clobber v26 ", "~{v26}"()
				ret void
				}

				attributes #0 = { noinline norecurse nounwind optnone }
				attributes #1 = { noinline norecurse nounwind readnone willreturn "amdgpu-waves-per-eu"="8,10" }
				attributes #2 = { nounwind readnone willreturn }

llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll

This file was added.

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s

				; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
				; through aliases. If GlobalOpt is never run, we do not see direct
				; calls,

				@alias2 = hidden alias void (), void()* @aliasee_vgpr64_sgpr102

				; CHECK-LABEL: {{^}}kernel2:
				; CHECK: .amdhsa_next_free_vgpr 64
				; CHECK-NEXT: .amdhsa_next_free_sgpr 102
				define amdgpu_kernel void @kernel2() #0 {
				bb:
				call void @alias2() #2
				ret void
				}

				define internal void @aliasee_vgpr64_sgpr102() #1 {
				bb:
				call void asm sideeffect "; clobber v52 ", "~{v52}"()
				ret void
				}

				attributes #0 = { noinline norecurse nounwind optnone }
				attributes #1 = { noinline norecurse nounwind readnone willreturn "amdgpu-waves-per-eu"="4,10" }
				attributes #2 = { nounwind readnone willreturn }

llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll

This file was added.

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s

				; CallGraphAnalysis, which CodeGenSCC order depends on, does not look
				; through aliases. If GlobalOpt is never run, we do not see direct
				; calls,

				@alias3 = hidden alias void (), void ()* @aliasee_vgpr256_sgpr102

				; CHECK-LABEL: {{^}}kernel3:
				; CHECK: .amdhsa_next_free_vgpr 256
				; CHECK-NEXT: .amdhsa_next_free_sgpr 102
				define amdgpu_kernel void @kernel3() #0 {
				bb:
				call void @alias3() #2
				ret void
				}

				define internal void @aliasee_vgpr256_sgpr102() #1 {
				bb:
				call void asm sideeffect "; clobber v252 ", "~{v252}"()
				ret void
				}

				attributes #0 = { noinline norecurse nounwind optnone }
				attributes #1 = { noinline norecurse nounwind readnone willreturn "amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-per-eu"="1,1" }
				attributes #2 = { nounwind readnone willreturn }

llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll

	(221 lines hidden)
	define amdgpu_kernel void @usage_direct_recursion(i32 %n) #0 {			define amdgpu_kernel void @usage_direct_recursion(i32 %n) #0 {
	call void @direct_recursion_use_stack(i32 %n)			call void @direct_recursion_use_stack(i32 %n)
	ret void			ret void
	}			}

	; Make sure there's no assert when a sgpr96 is used.			; Make sure there's no assert when a sgpr96 is used.
	; GCN-LABEL: {{^}}count_use_sgpr96_external_call			; GCN-LABEL: {{^}}count_use_sgpr96_external_call
	; GCN: ; sgpr96 s[{{[0-9]+}}:{{[0-9]+}}]			; GCN: ; sgpr96 s[{{[0-9]+}}:{{[0-9]+}}]
	; CI: NumSgprs: 84			; CI: NumSgprs: 104
	; VI-NOBUG: NumSgprs: 86			; VI-NOBUG: NumSgprs: 108
	; VI-BUG: NumSgprs: 96			; VI-BUG: NumSgprs: 96
	; GCN: NumVgprs: 50			; GCN: NumVgprs: 64
	define amdgpu_kernel void @count_use_sgpr96_external_call() {			define amdgpu_kernel void @count_use_sgpr96_external_call() {
	entry:			entry:
	tail call void asm sideeffect "; sgpr96 $0", "s"(<3 x i32> <i32 10, i32 11, i32 12>) #1			tail call void asm sideeffect "; sgpr96 $0", "s"(<3 x i32> <i32 10, i32 11, i32 12>) #1
	call void @external()			call void @external()
	ret void			ret void
	}			}

	; Make sure there's no assert when a sgpr160 is used.			; Make sure there's no assert when a sgpr160 is used.
	; GCN-LABEL: {{^}}count_use_sgpr160_external_call			; GCN-LABEL: {{^}}count_use_sgpr160_external_call
	; GCN: ; sgpr160 s[{{[0-9]+}}:{{[0-9]+}}]			; GCN: ; sgpr160 s[{{[0-9]+}}:{{[0-9]+}}]
	; CI: NumSgprs: 84			; CI: NumSgprs: 104
	; VI-NOBUG: NumSgprs: 86			; VI-NOBUG: NumSgprs: 108
	; VI-BUG: NumSgprs: 96			; VI-BUG: NumSgprs: 96
	; GCN: NumVgprs: 50			; GCN: NumVgprs: 64
	define amdgpu_kernel void @count_use_sgpr160_external_call() {			define amdgpu_kernel void @count_use_sgpr160_external_call() {
	entry:			entry:
	tail call void asm sideeffect "; sgpr160 $0", "s"(<5 x i32> <i32 10, i32 11, i32 12, i32 13, i32 14>) #1			tail call void asm sideeffect "; sgpr160 $0", "s"(<5 x i32> <i32 10, i32 11, i32 12, i32 13, i32 14>) #1
	call void @external()			call void @external()
	ret void			ret void
	}			}

	; Make sure there's no assert when a vgpr160 is used.			; Make sure there's no assert when a vgpr160 is used.
	; GCN-LABEL: {{^}}count_use_vgpr160_external_call			; GCN-LABEL: {{^}}count_use_vgpr160_external_call
	; GCN: ; vgpr160 v[{{[0-9]+}}:{{[0-9]+}}]			; GCN: ; vgpr160 v[{{[0-9]+}}:{{[0-9]+}}]
	; CI: NumSgprs: 84			; CI: NumSgprs: 104
	; VI-NOBUG: NumSgprs: 86			; VI-NOBUG: NumSgprs: 108
	; VI-BUG: NumSgprs: 96			; VI-BUG: NumSgprs: 96
	; GCN: NumVgprs: 50			; GCN: NumVgprs: 64
	define amdgpu_kernel void @count_use_vgpr160_external_call() {			define amdgpu_kernel void @count_use_vgpr160_external_call() {
	entry:			entry:
	tail call void asm sideeffect "; vgpr160 $0", "v"(<5 x i32> <i32 10, i32 11, i32 12, i32 13, i32 14>) #1			tail call void asm sideeffect "; vgpr160 $0", "v"(<5 x i32> <i32 10, i32 11, i32 12, i32 13, i32 14>) #1
	call void @external()			call void @external()
	ret void			ret void
	}			}

	attributes #0 = { nounwind noinline norecurse }			attributes #0 = { nounwind noinline norecurse }
	attributes #1 = { nounwind noinline norecurse }			attributes #1 = { nounwind noinline norecurse }
	attributes #2 = { nounwind noinline }			attributes #2 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/indirect-call.ll

	(10 lines hidden)
	; GCN-NEXT: amd_code_version_major = 1			; GCN-NEXT: amd_code_version_major = 1
	; GCN-NEXT: amd_code_version_minor = 2			; GCN-NEXT: amd_code_version_minor = 2
	; GCN-NEXT: amd_machine_kind = 1			; GCN-NEXT: amd_machine_kind = 1
	; GCN-NEXT: amd_machine_version_major = 7			; GCN-NEXT: amd_machine_version_major = 7
	; GCN-NEXT: amd_machine_version_minor = 0			; GCN-NEXT: amd_machine_version_minor = 0
	; GCN-NEXT: amd_machine_version_stepping = 0			; GCN-NEXT: amd_machine_version_stepping = 0
	; GCN-NEXT: kernel_code_entry_byte_offset = 256			; GCN-NEXT: kernel_code_entry_byte_offset = 256
	; GCN-NEXT: kernel_code_prefetch_byte_size = 0			; GCN-NEXT: kernel_code_prefetch_byte_size = 0
	; GCN-NEXT: granulated_workitem_vgpr_count = 7			; GCN-NEXT: granulated_workitem_vgpr_count = 15
	; GCN-NEXT: granulated_wavefront_sgpr_count = 4			; GCN-NEXT: granulated_wavefront_sgpr_count = 12
	; GCN-NEXT: priority = 0			; GCN-NEXT: priority = 0
	; GCN-NEXT: float_mode = 240			; GCN-NEXT: float_mode = 240
	; GCN-NEXT: priv = 0			; GCN-NEXT: priv = 0
	; GCN-NEXT: enable_dx10_clamp = 1			; GCN-NEXT: enable_dx10_clamp = 1
	; GCN-NEXT: debug_mode = 0			; GCN-NEXT: debug_mode = 0
	; GCN-NEXT: enable_ieee_mode = 1			; GCN-NEXT: enable_ieee_mode = 1
	; GCN-NEXT: enable_wgp_mode = 0			; GCN-NEXT: enable_wgp_mode = 0
	; GCN-NEXT: enable_mem_ordered = 0			; GCN-NEXT: enable_mem_ordered = 0
	(26 lines hidden)
	; GCN-NEXT: is_dynamic_callstack = 1			; GCN-NEXT: is_dynamic_callstack = 1
	; GCN-NEXT: is_debug_enabled = 0			; GCN-NEXT: is_debug_enabled = 0
	; GCN-NEXT: is_xnack_enabled = 0			; GCN-NEXT: is_xnack_enabled = 0
	; GCN-NEXT: workitem_private_segment_byte_size = 16384			; GCN-NEXT: workitem_private_segment_byte_size = 16384
	; GCN-NEXT: workgroup_group_segment_byte_size = 0			; GCN-NEXT: workgroup_group_segment_byte_size = 0
	; GCN-NEXT: gds_segment_byte_size = 0			; GCN-NEXT: gds_segment_byte_size = 0
	; GCN-NEXT: kernarg_segment_byte_size = 64			; GCN-NEXT: kernarg_segment_byte_size = 64
	; GCN-NEXT: workgroup_fbarrier_count = 0			; GCN-NEXT: workgroup_fbarrier_count = 0
	; GCN-NEXT: wavefront_sgpr_count = 37			; GCN-NEXT: wavefront_sgpr_count = 104
	; GCN-NEXT: workitem_vgpr_count = 32			; GCN-NEXT: workitem_vgpr_count = 64
	; GCN-NEXT: reserved_vgpr_first = 0			; GCN-NEXT: reserved_vgpr_first = 0
	; GCN-NEXT: reserved_vgpr_count = 0			; GCN-NEXT: reserved_vgpr_count = 0
	; GCN-NEXT: reserved_sgpr_first = 0			; GCN-NEXT: reserved_sgpr_first = 0
	; GCN-NEXT: reserved_sgpr_count = 0			; GCN-NEXT: reserved_sgpr_count = 0
	; GCN-NEXT: debug_wavefront_private_segment_offset_sgpr = 0			; GCN-NEXT: debug_wavefront_private_segment_offset_sgpr = 0
	; GCN-NEXT: debug_private_segment_buffer_sgpr = 0			; GCN-NEXT: debug_private_segment_buffer_sgpr = 0
	; GCN-NEXT: kernarg_segment_alignment = 4			; GCN-NEXT: kernarg_segment_alignment = 4
	; GCN-NEXT: group_segment_alignment = 4			; GCN-NEXT: group_segment_alignment = 4
	(31 lines hidden)
	; GISEL-NEXT: amd_code_version_major = 1			; GISEL-NEXT: amd_code_version_major = 1
	; GISEL-NEXT: amd_code_version_minor = 2			; GISEL-NEXT: amd_code_version_minor = 2
	; GISEL-NEXT: amd_machine_kind = 1			; GISEL-NEXT: amd_machine_kind = 1
	; GISEL-NEXT: amd_machine_version_major = 7			; GISEL-NEXT: amd_machine_version_major = 7
	; GISEL-NEXT: amd_machine_version_minor = 0			; GISEL-NEXT: amd_machine_version_minor = 0
	; GISEL-NEXT: amd_machine_version_stepping = 0			; GISEL-NEXT: amd_machine_version_stepping = 0
	; GISEL-NEXT: kernel_code_entry_byte_offset = 256			; GISEL-NEXT: kernel_code_entry_byte_offset = 256
	; GISEL-NEXT: kernel_code_prefetch_byte_size = 0			; GISEL-NEXT: kernel_code_prefetch_byte_size = 0
	; GISEL-NEXT: granulated_workitem_vgpr_count = 7			; GISEL-NEXT: granulated_workitem_vgpr_count = 15
	; GISEL-NEXT: granulated_wavefront_sgpr_count = 4			; GISEL-NEXT: granulated_wavefront_sgpr_count = 12
	; GISEL-NEXT: priority = 0			; GISEL-NEXT: priority = 0
	; GISEL-NEXT: float_mode = 240			; GISEL-NEXT: float_mode = 240
	; GISEL-NEXT: priv = 0			; GISEL-NEXT: priv = 0
	; GISEL-NEXT: enable_dx10_clamp = 1			; GISEL-NEXT: enable_dx10_clamp = 1
	; GISEL-NEXT: debug_mode = 0			; GISEL-NEXT: debug_mode = 0
	; GISEL-NEXT: enable_ieee_mode = 1			; GISEL-NEXT: enable_ieee_mode = 1
	; GISEL-NEXT: enable_wgp_mode = 0			; GISEL-NEXT: enable_wgp_mode = 0
	; GISEL-NEXT: enable_mem_ordered = 0			; GISEL-NEXT: enable_mem_ordered = 0
	(26 lines hidden)
	; GISEL-NEXT: is_dynamic_callstack = 1			; GISEL-NEXT: is_dynamic_callstack = 1
	; GISEL-NEXT: is_debug_enabled = 0			; GISEL-NEXT: is_debug_enabled = 0
	; GISEL-NEXT: is_xnack_enabled = 0			; GISEL-NEXT: is_xnack_enabled = 0
	; GISEL-NEXT: workitem_private_segment_byte_size = 16384			; GISEL-NEXT: workitem_private_segment_byte_size = 16384
	; GISEL-NEXT: workgroup_group_segment_byte_size = 0			; GISEL-NEXT: workgroup_group_segment_byte_size = 0
	; GISEL-NEXT: gds_segment_byte_size = 0			; GISEL-NEXT: gds_segment_byte_size = 0
	; GISEL-NEXT: kernarg_segment_byte_size = 64			; GISEL-NEXT: kernarg_segment_byte_size = 64
	; GISEL-NEXT: workgroup_fbarrier_count = 0			; GISEL-NEXT: workgroup_fbarrier_count = 0
	; GISEL-NEXT: wavefront_sgpr_count = 37			; GISEL-NEXT: wavefront_sgpr_count = 104
	; GISEL-NEXT: workitem_vgpr_count = 32			; GISEL-NEXT: workitem_vgpr_count = 64
	; GISEL-NEXT: reserved_vgpr_first = 0			; GISEL-NEXT: reserved_vgpr_first = 0
	; GISEL-NEXT: reserved_vgpr_count = 0			; GISEL-NEXT: reserved_vgpr_count = 0
	; GISEL-NEXT: reserved_sgpr_first = 0			; GISEL-NEXT: reserved_sgpr_first = 0
	; GISEL-NEXT: reserved_sgpr_count = 0			; GISEL-NEXT: reserved_sgpr_count = 0
	; GISEL-NEXT: debug_wavefront_private_segment_offset_sgpr = 0			; GISEL-NEXT: debug_wavefront_private_segment_offset_sgpr = 0
	; GISEL-NEXT: debug_private_segment_buffer_sgpr = 0			; GISEL-NEXT: debug_private_segment_buffer_sgpr = 0
	; GISEL-NEXT: kernarg_segment_alignment = 4			; GISEL-NEXT: kernarg_segment_alignment = 4
	; GISEL-NEXT: group_segment_alignment = 4			; GISEL-NEXT: group_segment_alignment = 4
	(36 lines hidden)
	; GCN-NEXT: amd_code_version_major = 1			; GCN-NEXT: amd_code_version_major = 1
	; GCN-NEXT: amd_code_version_minor = 2			; GCN-NEXT: amd_code_version_minor = 2
	; GCN-NEXT: amd_machine_kind = 1			; GCN-NEXT: amd_machine_kind = 1
	; GCN-NEXT: amd_machine_version_major = 7			; GCN-NEXT: amd_machine_version_major = 7
	; GCN-NEXT: amd_machine_version_minor = 0			; GCN-NEXT: amd_machine_version_minor = 0
	; GCN-NEXT: amd_machine_version_stepping = 0			; GCN-NEXT: amd_machine_version_stepping = 0
	; GCN-NEXT: kernel_code_entry_byte_offset = 256			; GCN-NEXT: kernel_code_entry_byte_offset = 256
	; GCN-NEXT: kernel_code_prefetch_byte_size = 0			; GCN-NEXT: kernel_code_prefetch_byte_size = 0
	; GCN-NEXT: granulated_workitem_vgpr_count = 7			; GCN-NEXT: granulated_workitem_vgpr_count = 15
	; GCN-NEXT: granulated_wavefront_sgpr_count = 4			; GCN-NEXT: granulated_wavefront_sgpr_count = 12
	; GCN-NEXT: priority = 0			; GCN-NEXT: priority = 0
	; GCN-NEXT: float_mode = 240			; GCN-NEXT: float_mode = 240
	; GCN-NEXT: priv = 0			; GCN-NEXT: priv = 0
	; GCN-NEXT: enable_dx10_clamp = 1			; GCN-NEXT: enable_dx10_clamp = 1
	; GCN-NEXT: debug_mode = 0			; GCN-NEXT: debug_mode = 0
	; GCN-NEXT: enable_ieee_mode = 1			; GCN-NEXT: enable_ieee_mode = 1
	; GCN-NEXT: enable_wgp_mode = 0			; GCN-NEXT: enable_wgp_mode = 0
	; GCN-NEXT: enable_mem_ordered = 0			; GCN-NEXT: enable_mem_ordered = 0
	(26 lines hidden)
	; GCN-NEXT: is_dynamic_callstack = 1			; GCN-NEXT: is_dynamic_callstack = 1
	; GCN-NEXT: is_debug_enabled = 0			; GCN-NEXT: is_debug_enabled = 0
	; GCN-NEXT: is_xnack_enabled = 0			; GCN-NEXT: is_xnack_enabled = 0
	; GCN-NEXT: workitem_private_segment_byte_size = 16384			; GCN-NEXT: workitem_private_segment_byte_size = 16384
	; GCN-NEXT: workgroup_group_segment_byte_size = 0			; GCN-NEXT: workgroup_group_segment_byte_size = 0
	; GCN-NEXT: gds_segment_byte_size = 0			; GCN-NEXT: gds_segment_byte_size = 0
	; GCN-NEXT: kernarg_segment_byte_size = 64			; GCN-NEXT: kernarg_segment_byte_size = 64
	; GCN-NEXT: workgroup_fbarrier_count = 0			; GCN-NEXT: workgroup_fbarrier_count = 0
	; GCN-NEXT: wavefront_sgpr_count = 37			; GCN-NEXT: wavefront_sgpr_count = 104
	; GCN-NEXT: workitem_vgpr_count = 32			; GCN-NEXT: workitem_vgpr_count = 64
	; GCN-NEXT: reserved_vgpr_first = 0			; GCN-NEXT: reserved_vgpr_first = 0
	; GCN-NEXT: reserved_vgpr_count = 0			; GCN-NEXT: reserved_vgpr_count = 0
	; GCN-NEXT: reserved_sgpr_first = 0			; GCN-NEXT: reserved_sgpr_first = 0
	; GCN-NEXT: reserved_sgpr_count = 0			; GCN-NEXT: reserved_sgpr_count = 0
	; GCN-NEXT: debug_wavefront_private_segment_offset_sgpr = 0			; GCN-NEXT: debug_wavefront_private_segment_offset_sgpr = 0
	; GCN-NEXT: debug_private_segment_buffer_sgpr = 0			; GCN-NEXT: debug_private_segment_buffer_sgpr = 0
	; GCN-NEXT: kernarg_segment_alignment = 4			; GCN-NEXT: kernarg_segment_alignment = 4
	; GCN-NEXT: group_segment_alignment = 4			; GCN-NEXT: group_segment_alignment = 4
	(32 lines hidden)
	; GISEL-NEXT: amd_code_version_major = 1			; GISEL-NEXT: amd_code_version_major = 1
	; GISEL-NEXT: amd_code_version_minor = 2			; GISEL-NEXT: amd_code_version_minor = 2
	; GISEL-NEXT: amd_machine_kind = 1			; GISEL-NEXT: amd_machine_kind = 1
	; GISEL-NEXT: amd_machine_version_major = 7			; GISEL-NEXT: amd_machine_version_major = 7
	; GISEL-NEXT: amd_machine_version_minor = 0			; GISEL-NEXT: amd_machine_version_minor = 0
	; GISEL-NEXT: amd_machine_version_stepping = 0			; GISEL-NEXT: amd_machine_version_stepping = 0
	; GISEL-NEXT: kernel_code_entry_byte_offset = 256			; GISEL-NEXT: kernel_code_entry_byte_offset = 256
	; GISEL-NEXT: kernel_code_prefetch_byte_size = 0			; GISEL-NEXT: kernel_code_prefetch_byte_size = 0
	; GISEL-NEXT: granulated_workitem_vgpr_count = 7			; GISEL-NEXT: granulated_workitem_vgpr_count = 15
	; GISEL-NEXT: granulated_wavefront_sgpr_count = 4			; GISEL-NEXT: granulated_wavefront_sgpr_count = 12
	; GISEL-NEXT: priority = 0			; GISEL-NEXT: priority = 0
	; GISEL-NEXT: float_mode = 240			; GISEL-NEXT: float_mode = 240
	; GISEL-NEXT: priv = 0			; GISEL-NEXT: priv = 0
	; GISEL-NEXT: enable_dx10_clamp = 1			; GISEL-NEXT: enable_dx10_clamp = 1
	; GISEL-NEXT: debug_mode = 0			; GISEL-NEXT: debug_mode = 0
	; GISEL-NEXT: enable_ieee_mode = 1			; GISEL-NEXT: enable_ieee_mode = 1
	; GISEL-NEXT: enable_wgp_mode = 0			; GISEL-NEXT: enable_wgp_mode = 0
	; GISEL-NEXT: enable_mem_ordered = 0			; GISEL-NEXT: enable_mem_ordered = 0
	(26 lines hidden)
	; GISEL-NEXT: is_dynamic_callstack = 1			; GISEL-NEXT: is_dynamic_callstack = 1
	; GISEL-NEXT: is_debug_enabled = 0			; GISEL-NEXT: is_debug_enabled = 0
	; GISEL-NEXT: is_xnack_enabled = 0			; GISEL-NEXT: is_xnack_enabled = 0
	; GISEL-NEXT: workitem_private_segment_byte_size = 16384			; GISEL-NEXT: workitem_private_segment_byte_size = 16384
	; GISEL-NEXT: workgroup_group_segment_byte_size = 0			; GISEL-NEXT: workgroup_group_segment_byte_size = 0
	; GISEL-NEXT: gds_segment_byte_size = 0			; GISEL-NEXT: gds_segment_byte_size = 0
	; GISEL-NEXT: kernarg_segment_byte_size = 64			; GISEL-NEXT: kernarg_segment_byte_size = 64
	; GISEL-NEXT: workgroup_fbarrier_count = 0			; GISEL-NEXT: workgroup_fbarrier_count = 0
	; GISEL-NEXT: wavefront_sgpr_count = 37			; GISEL-NEXT: wavefront_sgpr_count = 104
	; GISEL-NEXT: workitem_vgpr_count = 32			; GISEL-NEXT: workitem_vgpr_count = 64
	; GISEL-NEXT: reserved_vgpr_first = 0			; GISEL-NEXT: reserved_vgpr_first = 0
	; GISEL-NEXT: reserved_vgpr_count = 0			; GISEL-NEXT: reserved_vgpr_count = 0
	; GISEL-NEXT: reserved_sgpr_first = 0			; GISEL-NEXT: reserved_sgpr_first = 0
	; GISEL-NEXT: reserved_sgpr_count = 0			; GISEL-NEXT: reserved_sgpr_count = 0
	; GISEL-NEXT: debug_wavefront_private_segment_offset_sgpr = 0			; GISEL-NEXT: debug_wavefront_private_segment_offset_sgpr = 0
	; GISEL-NEXT: debug_private_segment_buffer_sgpr = 0			; GISEL-NEXT: debug_private_segment_buffer_sgpr = 0
	; GISEL-NEXT: kernarg_segment_alignment = 4			; GISEL-NEXT: kernarg_segment_alignment = 4
	; GISEL-NEXT: group_segment_alignment = 4			; GISEL-NEXT: group_segment_alignment = 4
	(1,470 lines hidden)
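
Note on the indirect-call.ll values: the granulated_* fields move together with the raw counts. The sketch below is not taken from the patch. It assumes the usual "number of granules minus one" encoding with a VGPR granule of 4 and an SGPR granule of 8; both granule sizes and the helper name granulated_count are assumptions chosen for illustration, and they happen to reproduce the numbers in the checks above (32 and 37 before, 64 and 104 after).

```python
# Hypothetical helper, not part of the patch: encode a register count as
# (number of allocation granules - 1), clamped at zero.
def granulated_count(registers_used: int, granule: int) -> int:
    granules = (registers_used + granule - 1) // granule  # round up
    return max(0, granules - 1)


# Assumed granule sizes for the target exercised by indirect-call.ll.
VGPR_GRANULE = 4
SGPR_GRANULE = 8

# (workitem_vgpr_count, wavefront_sgpr_count) on each side of the diff.
for label, vgprs, sgprs in (("before", 32, 37), ("after", 64, 104)):
    print(
        label,
        "granulated_workitem_vgpr_count =", granulated_count(vgprs, VGPR_GRANULE),
        "granulated_wavefront_sgpr_count =", granulated_count(sgprs, SGPR_GRANULE),
    )
```

Under these assumptions the "before" row gives 7 and 4 and the "after" row gives 15 and 12, matching the old and new CHECK lines.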

Revision Contents

Diff 400164
