This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add second emergency slot for SGPR to vmem for large frames
Closed, Public

Authored by arsenm on Dec 8 2021, 3:09 PM.

Details

Reviewers

rampitec
sebastian-ne
foad
critson
scott.linder

Summary

In a future change, we will sometimes use a VGPR offset for doing
spills to memory, in which case we need 2 free VGPRs to do the SGPR
spill. In most cases we could spill the VGPR along with the SGPR being
spilled, but we don't have any free lanes for SGPR_1024 in wave32 so
we could still potentially need a second scavenging slot.
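
The lane arithmetic behind the wave32 remark can be sketched as follows. This is only an illustrative model, not code from the patch: freeLanesAfterSpill is a hypothetical helper, and it assumes each 32-bit SGPR of the tuple occupies one lane of a single temporary VGPR.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of lanes left unused in one
// VGPR after writing every 32-bit SGPR of a tuple into its own lane.
// WaveSize is the number of lanes per VGPR (32 or 64).
constexpr int freeLanesAfterSpill(unsigned WaveSize, unsigned SGPRTupleBits) {
  return static_cast<int>(WaveSize) - static_cast<int>(SGPRTupleBits / 32);
}

// SGPR_1024 is 32 dwords, so in wave32 it fills every lane of the VGPR.
static_assert(freeLanesAfterSpill(32, 1024) == 0,
              "no spare lane for an extra value in wave32");
// In wave64 the same tuple leaves half of the lanes unused.
static_assert(freeLanesAfterSpill(64, 1024) == 32,
              "wave64 still has spare lanes");
```

Under this model, any narrower tuple leaves at least one spare lane, which is why only the SGPR_1024 wave32 case forces a second scavenging slot.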

Event Timeline

arsenm created this revision. Dec 8 2021, 3:09 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Dec 8 2021, 3:09 PM

arsenm requested review of this revision. Dec 8 2021, 3:09 PM

Herald added a project: Restricted Project. Dec 8 2021, 3:09 PM

Herald added a subscriber: wdng.

arsenm added a child revision: D115401: AMDGPU: Fix clobbering SCC when expanding large offset spill pseudos. Dec 8 2021, 3:09 PM

Harbormaster completed remote builds in B138297: Diff 392946. Dec 8 2021, 6:59 PM

This seems fine to me, but I am slightly confused: the description mentions Wave32, but the test is for GFX9 Wave64.
I can clearly see that the test exercises the new code, but do we also need to test/implement the Wave32 part?

In D115402#3182222, @critson wrote:

This seems fine to me, but I am slightly confused: the description mentions Wave32, but the test is for GFX9 Wave64.
I can clearly see that the test exercises the new code, but do we also need to test/implement the Wave32 part?

The description mentions an alternative strategy which won't always work for wave32, so that's why it's done this way.

Change looks good to me

This revision is now accepted and ready to land. Dec 13 2021, 10:25 AM

d6fdbbcace0b51c0096c5dbab6afb6449da21524

Revision Contents

Path                                                        Size
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp                  13 lines
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp                2 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h              6 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp            26 lines
llvm/test/CodeGen/AMDGPU/sgpr-spill-vmem-large-frame.mir    54 lines

Diff 392946

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

(1,212 preceding lines not shown; context: for (MachineBasicBlock &MBB : MF) {)
                    SpillFIs[MI.getOperand(0).getIndex()]) {
             MI.getOperand(0).ChangeToRegister(Register(), false /*isDef*/);
           }
         }
       }
     }
   }

-  FuncInfo->removeDeadFrameIndices(MFI);
+  // At this point we've already allocated all spilled SGPRs to VGPRs if we
+  // can. Any remaining SGPR spills will go to memory, so move them back to the
+  // default stack.
+  bool HaveSGPRToVMemSpill =
+      FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ true);
   assert(allSGPRSpillsAreDead(MF) &&
          "SGPR spill should have been removed in SILowerSGPRSpills");

   // FIXME: The other checks should be redundant with allStackObjectsAreDead,
   // but currently hasNonSpillStackObjects is set only from source
   // allocas. Stack temps produced from legalization are not counted currently.
   if (!allStackObjectsAreDead(MFI)) {
     assert(RS && "RegScavenger required if spilling");

     // Add an emergency spill slot
     RS->addScavengingFrameIndex(FuncInfo->getScavengeFI(MFI, *TRI));
+
+    // If we are spilling SGPRs to memory with a large frame, we may need a
+    // second VGPR emergency frame index.
+    if (HaveSGPRToVMemSpill &&
+        allocateScavengingFrameIndexesNearIncomingSP(MF)) {
+      RS->addScavengingFrameIndex(MFI.CreateStackObject(4, Align(4), false));
+    }
   }
 }

 // Only report VGPRs to generic code.
 void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
                                            BitVector &SavedVGPRs,
                                            RegScavenger *RS) const {
   TargetFrameLowering::determineCalleeSaves(MF, SavedVGPRs, RS);
(247 following lines not shown)
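
The slot-count decision made in this hunk can be modelled in isolation. The sketch below is a simplified stand-in, not the LLVM code: FrameState and numEmergencySlots are hypothetical names, and the three booleans merely mirror the conditions tested in the diff (allStackObjectsAreDead, the value returned by removeDeadFrameIndices, and allocateScavengingFrameIndexesNearIncomingSP).

```cpp
#include <cassert>

// Hypothetical summary of the conditions the hunk above checks.
struct FrameState {
  bool AllStackObjectsDead;         // models allStackObjectsAreDead(MFI)
  bool HaveSGPRToVMemSpill;         // models the new bool return value
  bool ScavengeSlotsNearIncomingSP; // models the large-frame predicate
};

// Number of emergency scavenging slots that would be registered.
inline unsigned numEmergencySlots(const FrameState &S) {
  if (S.AllStackObjectsDead)
    return 0; // nothing on the stack, so no scavenging slot is needed
  unsigned Slots = 1; // the pre-existing emergency slot
  if (S.HaveSGPRToVMemSpill && S.ScavengeSlotsNearIncomingSP)
    ++Slots; // the second slot added by this revision
  return Slots;
}
```

For example, numEmergencySlots({false, true, true}) yields 2, while clearing either of the last two flags yields 1.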

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

(366 preceding lines not shown; context: for (MachineBasicBlock &MBB : MF) {)
       }
     }

     // All those frame indices which are dead by now should be removed from the
     // function frame. Otherwise, there is a side effect such as re-mapping of
     // free frame index ids by the later pass(es) like "stack slot coloring"
     // which in turn could mess-up with the book keeping of "frame index to VGPR
     // lane".
-    FuncInfo->removeDeadFrameIndices(MFI);
+    FuncInfo->removeDeadFrameIndices(MFI, /*ResetSGPRSpillStackIDs*/ false);

     MadeChange = true;
   } else if (FuncInfo->VGPRReservedForSGPRSpill) {
     FuncInfo->removeVGPRForSGPRSpill(FuncInfo->VGPRReservedForSGPRSpill, MF);
   }

   SaveBlocks.clear();
   RestoreBlocks.clear();

   // Updated the reserved registers with any VGPRs added for SGPR spills.
   if (NewReservedRegs)
     MRI.freezeReservedRegs(MF);

   return MadeChange;
 }

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

(545 preceding lines not shown; context: return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister)
                                          : I->second.Lanes[Lane];
   }

   bool haveFreeLanesForSGPRSpill(const MachineFunction &MF,
                                  unsigned NumLane) const;
   bool allocateSGPRSpillToVGPR(MachineFunction &MF, int FI);
   bool reserveVGPRforSGPRSpills(MachineFunction &MF);
   bool allocateVGPRSpillToAGPR(MachineFunction &MF, int FI, bool isAGPRtoVGPR);
-  void removeDeadFrameIndices(MachineFrameInfo &MFI);
+
+  /// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill
+  /// to the default stack.
+  bool removeDeadFrameIndices(MachineFrameInfo &MFI,
+                              bool ResetSGPRSpillStackIDs);

   int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);
   Optional<int> getOptionalScavengeFI() const { return ScavengeFI; }

   bool hasCalculatedTID() const { return TIDReg != 0; };
   Register getTIDReg() const { return TIDReg; };
   void setTIDReg(Register Reg) { TIDReg = Reg; }

(396 following lines not shown)

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

(422 preceding lines not shown; context: for (int I = NumLanes - 1; I >= 0; --I) {)
     OtherUsedRegs.set(*NextSpillReg);
     SpillRegs.push_back(*NextSpillReg);
     Spill.Lanes[I] = *NextSpillReg++;
   }

   return Spill.FullyAllocated;
 }

-void SIMachineFunctionInfo::removeDeadFrameIndices(MachineFrameInfo &MFI) {
+bool SIMachineFunctionInfo::removeDeadFrameIndices(
+    MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {
   // Remove dead frame indices from function frame, however keep FP & BP since
   // spills for them haven't been inserted yet. And also make sure to remove the
   // frame indices from `SGPRToVGPRSpills` data structure, otherwise, it could
   // result in an unexpected side effect and bug, in case of any re-mapping of
   // freed frame indices by later pass(es) like "stack slot coloring".
   for (auto &R : make_early_inc_range(SGPRToVGPRSpills)) {
     if (R.first != FramePointerSaveIndex && R.first != BasePointerSaveIndex) {
       MFI.RemoveStackObject(R.first);
       SGPRToVGPRSpills.erase(R.first);
     }
   }

-  // All other SPGRs must be allocated on the default stack, so reset the stack
-  // ID.
-  for (int i = MFI.getObjectIndexBegin(), e = MFI.getObjectIndexEnd(); i != e;
-       ++i)
-    if (i != FramePointerSaveIndex && i != BasePointerSaveIndex)
-      MFI.setStackID(i, TargetStackID::Default);
+  bool HaveSGPRToMemory = false;
+
+  if (ResetSGPRSpillStackIDs) {
+    // All other SPGRs must be allocated on the default stack, so reset the
+    // stack ID.
+    for (int i = MFI.getObjectIndexBegin(), e = MFI.getObjectIndexEnd(); i != e;
+         ++i) {
+      if (i != FramePointerSaveIndex && i != BasePointerSaveIndex) {
+        if (MFI.getStackID(i) == TargetStackID::SGPRSpill) {
+          MFI.setStackID(i, TargetStackID::Default);
+          HaveSGPRToMemory = true;
+        }
+      }
+    }
+  }

   for (auto &R : VGPRToAGPRSpills) {
     if (R.second.FullyAllocated)
       MFI.RemoveStackObject(R.first);
   }
+
+  return HaveSGPRToMemory;
 }

 int SIMachineFunctionInfo::getScavengeFI(MachineFrameInfo &MFI,
                                          const SIRegisterInfo &TRI) {
   if (ScavengeFI)
     return *ScavengeFI;
   if (isEntryFunction()) {
     ScavengeFI = MFI.CreateFixedObject(
(213 following lines not shown)
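
The new return value of removeDeadFrameIndices can be illustrated with a small standalone model. This is a sketch under simplifying assumptions, not the LLVM implementation: StackID and resetSGPRSpillStackIDs are hypothetical names, the frame is reduced to a vector of stack IDs, and the frame-pointer and base-pointer exclusions from the real loop are omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two stack IDs the loop distinguishes.
enum class StackID { Default, SGPRSpill };

// Moves every slot still tagged SGPRSpill back to the default stack when
// Reset is true. Returns true if at least one slot was moved, meaning some
// SGPR will be spilled to memory rather than to a VGPR lane.
inline bool resetSGPRSpillStackIDs(std::vector<StackID> &Slots, bool Reset) {
  bool HaveSGPRToMemory = false;
  if (Reset) {
    for (StackID &ID : Slots) {
      if (ID == StackID::SGPRSpill) {
        ID = StackID::Default;
        HaveSGPRToMemory = true;
      }
    }
  }
  return HaveSGPRToMemory;
}
```

With Reset set to false, as in the SILowerSGPRSpills.cpp call site, the model leaves the IDs untouched and reports false; with Reset set to true, as in the SIFrameLowering.cpp call site, the result tells the caller whether a second emergency slot may be worth allocating.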

llvm/test/CodeGen/AMDGPU/sgpr-spill-vmem-large-frame.mir

This file was added.

# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-spill-sgpr-to-vgpr=false -verify-machineinstrs -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck %s

# Check that we allocate 2 emergency stack slots if we're spilling
# SGPRs to memory and potentially have an offset larger than fits in
# the addressing mode of the memory instructions.

# CHECK-LABEL: name: test
# CHECK: stack:
# CHECK-NEXT: - { id: 0, name: '', type: spill-slot, offset: 8, size: 4, alignment: 4,
# CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
# CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
# CHECK-NEXT: - { id: 1, name: '', type: default, offset: 12, size: 4096, alignment: 4,
# CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
# CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
# CHECK-NEXT: - { id: 2, name: '', type: default, offset: 0, size: 4, alignment: 4,
# CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
# CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
# CHECK-NEXT: - { id: 3, name: '', type: default, offset: 4, size: 4, alignment: 4,
# CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
# CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }


# CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
# CHECK-NEXT: $vgpr0 = V_WRITELANE_B32 killed $sgpr10, 0, undef $vgpr0
# CHECK-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, 0, implicit $exec :: (store (s32) into %stack.0, addrspace 5)
# CHECK-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)


# CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.2, addrspace 5)
# CHECK-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 8, 0, 0, 0, implicit $exec :: (load (s32) from %stack.0, addrspace 5)
# CHECK-NEXT: $sgpr10 = V_READLANE_B32 killed $vgpr0, 0
# CHECK-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
---
name: test
tracksRegLiveness: true
frameInfo:
  maxAlignment: 4
stack:
  - { id: 0, type: spill-slot, size: 4, alignment: 4, stack-id: sgpr-spill }
  - { id: 1, size: 4096, alignment: 4 }
machineFunctionInfo:
  isEntryFunction: false
  scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
  stackPtrOffsetReg: '$sgpr32'
  frameOffsetReg: '$sgpr33'
  hasSpilledSGPRs: true
body: |
  bb.0:
    liveins: $sgpr30_sgpr31, $sgpr10, $sgpr11
    S_CMP_EQ_U32 0, 0, implicit-def $scc
    SI_SPILL_S32_SAVE killed $sgpr10, %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
    renamable $sgpr10 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32
    S_SETPC_B64 $sgpr30_sgpr31, implicit $scc
...
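
The test's comment about an offset "larger than fits in the addressing mode" can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a 12-bit unsigned immediate offset field for the buffer instructions on this target, and MaxImmOffset, fitsImmOffset and lastByte are hypothetical names rather than LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the immediate offset field is 12 bits wide and unsigned,
// so offsets 0..4095 can be encoded directly.
constexpr uint64_t MaxImmOffset = (1u << 12) - 1;

constexpr bool fitsImmOffset(uint64_t Offset) { return Offset <= MaxImmOffset; }

// Highest byte offset touched by an object of Size bytes placed at Offset.
constexpr uint64_t lastByte(uint64_t Offset, uint64_t Size) {
  return Offset + Size - 1;
}

// The three 4-byte slots at offsets 0, 4 and 8 are directly addressable.
static_assert(fitsImmOffset(lastByte(8, 4)), "small slots are reachable");
// The 4096-byte object at offset 12 ends at byte 4107, past the limit.
static_assert(!fitsImmOffset(lastByte(12, 4096)), "frame exceeds the field");
```

Under that assumption, the 4096-byte object is what makes the frame "large" in this test, while the emergency slots checked at offsets 0 and 4 stay within directly encodable range.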