This is an archive of the discontinued LLVM Phabricator instance.

Differential D117057

[AMDGPU] Annotate functions with inline asm using agprs
Abandoned · Public

Authored by rampitec on Jan 11 2022, 2:38 PM.

Details

Reviewers

arsenm

Summary

This is needed for a future patch. It is possible to allocate all
VGPRs and use MFMA with VGPRs if we need fewer than 257 registers
and have no calls, but we need to make sure AGPRs are not used
by inline asm. It is done as an annotation because the way to
use MFMA with VGPRs is to select the proper opcode, so it must be
known by the time of selection.
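
The condition described in the summary can be sketched as a small standalone predicate. This is only an illustration, not code from the patch: the struct `KernelRegInfo`, its fields, and the function `canSelectVGPRFormMFMA` are hypothetical names, and how the register demand would actually be known at selection time is an assumption left open here.

```cpp
// Hypothetical summary of the facts the selector would need about one kernel.
// None of these names come from the patch; they exist only for illustration.
struct KernelRegInfo {
  unsigned NumRegsNeeded;  // register demand (assumed to be known or bounded)
  bool HasCalls;           // kernel contains a real call
  bool HasAGPRInInlineAsm; // what the proposed annotation would record
};

// Budget taken from the summary: "fewer than 257 registers".
constexpr unsigned VGPRBudget = 256;

// True if MFMA could be selected in its VGPR form under the stated conditions.
bool canSelectVGPRFormMFMA(const KernelRegInfo &K) {
  return K.NumRegsNeeded <= VGPRBudget && !K.HasCalls &&
         !K.HasAGPRInInlineAsm;
}
```

The third field is the only input this revision is concerned with; the other two are assumed to be available from elsewhere.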

Event Timeline

rampitec created this revision. Jan 11 2022, 2:38 PM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Jan 11 2022, 2:38 PM

rampitec requested review of this revision. Jan 11 2022, 2:38 PM

Herald added a project: Restricted Project. Jan 11 2022, 2:38 PM

Herald added a subscriber: wdng.

rampitec added a parent revision: D117055: [AMDGPU] Fixed physreg asm constraint parsing. Jan 11 2022, 2:38 PM

Why does this specifically need to check for inline asm? Is this only checked in the kernel or functions too?

I'm trying to delete this pass and don't want to add more stuff to it. If it's just for kernels can't you check for AGPR physreg uses later?

This revision now requires changes to proceed. Jan 11 2022, 2:45 PM

In D117057#3235597, @arsenm wrote:

Why does this specifically need to check for inline asm? Is this only checked in the kernel or functions too?

I'm trying to delete this pass and don't want to add more stuff to it. If it's just for kernels can't you check for AGPR physreg uses later?

There are two ways of getting AGPRs: using an MFMA and inline asm. For MFMA, if we fit into the register budget of 256 registers we can select the _vgprcd versions and have no AGPRs, but that is not possible if there are other AGPR uses.
It is too late to check for AGPR uses after selection, because the information is needed for the selection.

I guess I can run this code in the AMDGPUDAGToDAGISel::runOnMachineFunction or even SIMachineFunctionInfo constructor and skip the attribution.

In D117057#3235626, @rampitec wrote:

I guess I can run this code in the AMDGPUDAGToDAGISel::runOnMachineFunction or even SIMachineFunctionInfo constructor and skip the attribution.

The downside is that I have to scan all instructions for that, and AMDGPUAnnotateKernelFeatures already does that. Where are you planning to move its code?

Harbormaster completed remote builds in B142749: Diff 399084. Jan 11 2022, 3:18 PM

rampitec abandoned this revision. Jan 11 2022, 4:37 PM

In D117057#3235661, @rampitec wrote:

The downside is that I have to scan all instructions for that, and AMDGPUAnnotateKernelFeatures already does that. Where are you planning to move its code?

AMDGPUAttributor. If this were to be an attribute, which is pretty ugly, it should be the inverse. Why can't you just select to AGPRs, and later we can adjust the register classes if necessary?

In D117057#3236020, @arsenm wrote:

AMDGPUAttributor. If this were to be an attribute, which is pretty ugly, it should be the inverse. Why can't you just select to AGPRs, and later we can adjust the register classes if necessary?

Because it means changing instructions, which is nontrivial in some cases. I'd rather select it right.

Anyhow, I have moved the code. It takes yet another scan over the instructions, which has to happen somewhere anyway. If we have a better place for the scan in the future it will be easy to move, attribute or not.

An attribute may have the additional benefit of allowing the allocation shift to happen in functions as well, but I am not sure how practically interesting it is to use a wave-wide instruction in a function.

That said, I will still need the parent of this change to work properly, because otherwise we still have a misrepresented RC for inline asm.

Revision Contents

Path                                                      Size
llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp   30 lines
llvm/test/CodeGen/AMDGPU/inline-asm-agpr.ll               40 lines

Diff 399084

llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp

[26 lines not shown]

 namespace {
 class AMDGPUAnnotateKernelFeatures : public CallGraphSCCPass {
 private:
   const TargetMachine *TM = nullptr;

   bool addFeatureAttributes(Function &F);

+  bool handleInlineAsm(const CallBase *CB) const;
+
 public:
   static char ID;

   AMDGPUAnnotateKernelFeatures() : CallGraphSCCPass(ID) {}

   bool doInitialization(CallGraph &CG) override;
   bool runOnSCC(CallGraphSCC &SCC) override;

[11 lines not shown]

 char AMDGPUAnnotateKernelFeatures::ID = 0;

 char &llvm::AMDGPUAnnotateKernelFeaturesID = AMDGPUAnnotateKernelFeatures::ID;

 INITIALIZE_PASS(AMDGPUAnnotateKernelFeatures, DEBUG_TYPE,
                 "Add AMDGPU function attributes", false, false)

+bool AMDGPUAnnotateKernelFeatures::handleInlineAsm(const CallBase *CB) const {
+  const TargetSubtargetInfo *ST = TM->getSubtargetImpl(*CB->getFunction());
+  const TargetRegisterInfo *TRI = ST->getRegisterInfo();
+  const TargetLowering *TLI = ST->getTargetLowering();
+  const InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
+
+  for (const auto &CI: IA->ParseConstraints()) {
   [Lint: Pre-merge checks: clang-format: please reformat the code: for (const auto &CI : IA->ParseConstraints()) {]
+    for (const StringRef Code : CI.Codes) {
+      const TargetRegisterClass *RC =
+          TLI->getRegForInlineAsmConstraint(TRI, Code, MVT::i32).second;
+      if (RC && SIRegisterInfo::isAGPRClass(RC))
+        return true;
+    }
+  }
+
+  return false;
+}
+
 bool AMDGPUAnnotateKernelFeatures::addFeatureAttributes(Function &F) {
   bool HaveStackObjects = false;
   bool Changed = false;
   bool HaveCall = false;
+  bool HaveAGPR = false;
   bool IsFunc = !AMDGPU::isEntryFunctionCC(F.getCallingConv());

   for (BasicBlock &BB : F) {
     for (Instruction &I : BB) {
       if (isa<AllocaInst>(I)) {
         HaveStackObjects = true;
         continue;
       }

       if (auto *CB = dyn_cast<CallBase>(&I)) {
         const Function *Callee =
             dyn_cast<Function>(CB->getCalledOperand()->stripPointerCasts());

         // Note the occurrence of indirect call.
         if (!Callee) {
-          if (!CB->isInlineAsm())
+          if (CB->isInlineAsm())
+            HaveAGPR |= handleInlineAsm(CB);
+          else
             HaveCall = true;

           continue;
         }

         Intrinsic::ID IID = Callee->getIntrinsicID();
         if (IID == Intrinsic::not_intrinsic) {
           HaveCall = true;
[11 lines not shown]
   if (!IsFunc && HaveCall) {
     Changed = true;
   }

   if (HaveStackObjects) {
     F.addFnAttr("amdgpu-stack-objects");
     Changed = true;
   }

+  if (HaveAGPR) {
+    F.addFnAttr("amdgpu-has-agpr-in-asm");
+    Changed = true;
+  }
+
   return Changed;
 }

 bool AMDGPUAnnotateKernelFeatures::runOnSCC(CallGraphSCC &SCC) {
   bool Changed = false;

   for (CallGraphNode *I : SCC) {
     Function *F = I->getFunction();
[23 lines not shown]

llvm/test/CodeGen/AMDGPU/inline-asm-agpr.ll

This file was added.

; RUN: opt -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx90a -S -amdgpu-annotate-kernel-features < %s | FileCheck %s

; CHECK-LABEL: define amdgpu_kernel void @def_a0() #0 {
define amdgpu_kernel void @def_a0() {
  %acc = call i32 asm sideeffect "; def $0", "={a0}"()
  ret void
}

; CHECK-LABEL: define amdgpu_kernel void @clobber_a128() #0 {
define amdgpu_kernel void @clobber_a128() {
  call void asm sideeffect "; clobber $1", "~{v0},~{a128},~{v14}"()
  ret void
}

; CHECK-LABEL: define amdgpu_kernel void @use_a4() #0 {
define amdgpu_kernel void @use_a4() {
  call void asm sideeffect "; use $0", "{a4}"(i32 undef)
  ret void
}

; CHECK-LABEL: define amdgpu_kernel void @def_a() #0 {
define amdgpu_kernel void @def_a() {
  %acc = call i32 asm sideeffect "; def $0", "=a"()
  ret void
}

; CHECK-LABEL: define amdgpu_kernel void @use_a() #0 {
define amdgpu_kernel void @use_a() {
  call void asm sideeffect "; use $0", "a"(i32 undef)
  ret void
}

; CHECK-LABEL: define amdgpu_kernel void @asm_no_agprs() #1 {
define amdgpu_kernel void @asm_no_agprs() {
  %v = call i32 asm sideeffect "; ref $0, $1, $2", "={v0},v,~{s0}"(i32 undef)
  ret void
}

; CHECK: attributes #0 = { "amdgpu-has-agpr-in-asm" "target-cpu"="gfx90a" }
; CHECK: attributes #1 = { "target-cpu"="gfx90a" }
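
The constraint strings exercised by the test can be mirrored with a standalone sketch that scans a constraint string for AGPR references. This is a simplified stand-in, not what the patch does: the patch asks `getRegForInlineAsmConstraint` for the register class of each code, while the sketch below only pattern-matches text. The function names `codeNamesAGPR` and `constraintsUseAGPR` are hypothetical, and the sketch assumes each comma-separated item carries a single constraint code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if a single constraint code names an AGPR: the register class "a",
// or a physical register written like "{a4}" or "{a[0:3]}".
// Text-only approximation; the patch queries the target lowering instead.
bool codeNamesAGPR(const std::string &Code) {
  if (Code == "a")
    return true;
  if (Code.size() >= 4 && Code[0] == '{' && Code[1] == 'a' &&
      Code.back() == '}')
    return std::isdigit(static_cast<unsigned char>(Code[2])) || Code[2] == '[';
  return false;
}

// Splits the constraint string on commas, strips leading modifier characters
// ('=', '+', '&', '~', '%'), and checks what remains of each item.
bool constraintsUseAGPR(const std::string &Constraints) {
  std::size_t Pos = 0;
  while (Pos <= Constraints.size()) {
    std::size_t End = Constraints.find(',', Pos);
    if (End == std::string::npos)
      End = Constraints.size();
    const std::string Item = Constraints.substr(Pos, End - Pos);
    const std::size_t Start = Item.find_first_not_of("=+&~%");
    if (Start != std::string::npos && codeNamesAGPR(Item.substr(Start)))
      return true;
    Pos = End + 1;
  }
  return false;
}
```

Under these assumptions, the five constraint strings from the kernels expected to get attribute group #0 are all flagged, and the string from @asm_no_agprs is not. Clobbers such as "~{a128}" count as AGPR uses here, matching the @clobber_a128 expectation.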