This is an archive of the discontinued LLVM Phabricator instance.


Differential D99269

[AMDGPU] Unify spill code
Closed, Public

Authored by sebastian-ne on Mar 24 2021, 8:02 AM.

Details

Reviewers

arsenm
RamNalamothu
scott.linder

Commits

rG32bc9a9bc314: [AMDGPU] Unify spill code

Summary

Instead of reimplementing spilling in prolog and epilog, reuse
loadRegFromStackSlot and storeRegToStackSlot.
Mark the generated instruction by setting an added flag argument to
1, so the lowering can use the stack pointer instead of the frame
pointer, which is not set up at this point.

Also fixes a bug where subregisters got overwritten in
pei-scavenge-vgpr-spill.mir.
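
As a rough illustration of the flag described in the summary, the sketch below models a spill pseudo that carries a prolog/epilog marker and picks the base register from it. `SpillPseudo`, `BaseReg`, and `selectBaseReg` are hypothetical stand-ins invented for this example; they are not LLVM types and not the code from the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not LLVM types.
enum class BaseReg { StackPtr, FramePtr };

struct SpillPseudo {
  int FrameIndex;
  bool InPrologEpilog; // models the "added flag argument" from the summary
};

// Pick the register that the spill address is computed from.
// In the prolog/epilog the frame pointer is not set up, so use SP there.
BaseReg selectBaseReg(const SpillPseudo &MI, bool FunctionHasFP) {
  if (MI.InPrologEpilog)
    return BaseReg::StackPtr;
  return FunctionHasFP ? BaseReg::FramePtr : BaseReg::StackPtr;
}
```

Later updates in this review dropped the flag in favor of calling `buildSpillLoadStore` directly, so this only mirrors the initial summary.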

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sebastian-ne created this revision. Mar 24 2021, 8:02 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Mar 24 2021, 8:02 AM

sebastian-ne requested review of this revision. Mar 24 2021, 8:02 AM

Herald added a project: Restricted Project. Mar 24 2021, 8:02 AM

Herald added subscribers: llvm-commits, wdng.

arsenm added inline comments. Mar 24 2021, 8:07 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
738–739	I don't think you're supposed to rely on the frame setup flags this way

Add a flags argument to SI_SPILL_Vxx instructions instead of using frame-setup flags.

Herald added a subscriber: MatzeB. Mar 24 2021, 10:46 AM

sebastian-ne edited the summary of this revision. Mar 24 2021, 10:47 AM

Harbormaster completed remote builds in B95494: Diff 332995. Mar 24 2021, 2:29 PM

Harbormaster completed remote builds in B95529: Diff 333044. Mar 24 2021, 7:17 PM

sebastian-ne added a child revision: D99429: [AMDGPU] Save WWM registers in functions. Mar 29 2021, 12:58 AM

sebastian-ne added a child revision: D96869: [AMDGPU] Fix saving fp and bp. Mar 30 2021, 2:11 AM

arsenm requested changes to this revision. Mar 30 2021, 3:28 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1460–1463	I don't like having this logic separated and overriding the frame register logic above. Can you infer this should be SP relative from the frame index itself instead of adding a new operand?

This revision now requires changes to proceed. Mar 30 2021, 3:28 PM

sebastian-ne added inline comments. Mar 31 2021, 6:25 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1460–1463	How should that work? Should I add a new `TargetStackID::PrologEpilogSave`? It is not really a property of the frame index that SP should be used, it is because the instruction is in the prolog/epilog where FP is not set up. If we wanted to access the same frame index in the function body, we would need to use FP instead of SP. I could move this code out of the switch and merge it with the frame register logic above, but that doesn’t change much.

arsenm added inline comments. Mar 31 2021, 4:12 PM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1460–1463	But we don't need to access these from an arbitrary point in the function. We know these indexes should only be accessed in the prolog/epilog. Fundamentally I do think this is a special case for PEI to handle. The problem here is just that what we need to do to emit spills is a lot more annoying than on other targets. What this probably should do is just use a common implementation function that eliminateFrameIndex and emitProlog/emitEpilog can use. buildSpillLoadStore is approximately that already. emitProlog/emitEpilog don't really need to route through the pseudos and eliminateFrameIndex
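
The "common implementation function" idea in the comment above can be sketched with a toy model. Everything here is an assumption made for illustration: `EmittedOp` and the three `...Sketch` functions are hypothetical names, and the real `buildSpillLoadStore` takes different arguments and emits machine instructions rather than records.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One emitted memory operation in this toy model (not a MachineInstr).
struct EmittedOp {
  bool IsStore;
  std::string BaseReg;
  int Offset;
};

// Shared implementation: every spill or reload is built here.
void buildSpillLoadStoreSketch(std::vector<EmittedOp> &Out, bool IsStore,
                               const std::string &BaseReg, int Offset) {
  Out.push_back({IsStore, BaseReg, Offset});
}

// Function-body path: the base register follows the usual frame register
// choice (FP if the function has one, otherwise SP).
void eliminateFrameIndexSketch(std::vector<EmittedOp> &Out, bool IsStore,
                               bool HasFP, int Offset) {
  buildSpillLoadStoreSketch(Out, IsStore, HasFP ? "fp" : "sp", Offset);
}

// Prolog/epilog path: FP is not set up yet, so always address relative to SP
// and skip the pseudo instruction entirely.
void emitPrologEpilogSpillSketch(std::vector<EmittedOp> &Out, bool IsStore,
                                 int Offset) {
  buildSpillLoadStoreSketch(Out, IsStore, "sp", Offset);
}
```

The point of the shape is that the SP-versus-FP decision lives in the caller, so the shared helper needs no extra operand or flag.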

Rewrite to use buildSpillLoadStore.

This makes buildSpillLoadStore public and adds a new optional LivePhysRegs parameter, to find an unused register if we have no RegScavenger.
Now this patch merges buildPrologSpill and buildEpilogReload into buildSpillLoadStore and uses that function from emitProlog/emitEpilog.
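
A minimal sketch of the optional-parameter idea described above, assuming a simplified register model: `ScavengerStub`, `pickScratchReg`, and the `NoReg` sentinel are hypothetical names for this example and do not reflect the actual `RegScavenger` or `LivePhysRegs` interfaces.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Reg = unsigned;
constexpr Reg NoReg = 0; // plays the role of an empty Register()

// Stand-in for a register scavenger: reports one register it considers free,
// or NoReg if it has none.
struct ScavengerStub {
  Reg FreeReg;
  Reg findUnused() const { return FreeReg; }
};

// Pick a scratch register from whichever source the caller provided.
// Returns NoReg when neither source yields a register.
Reg pickScratchReg(const std::vector<Reg> &Candidates,
                   const ScavengerStub *RS = nullptr,
                   const std::set<Reg> *LiveRegs = nullptr) {
  if (RS)
    return RS->findUnused();
  if (LiveRegs) {
    // No scavenger: take the first candidate that is not currently live.
    for (Reg R : Candidates)
      if (!LiveRegs->count(R))
        return R;
  }
  return NoReg;
}
```

Both pointer parameters default to null, so existing callers that pass only a scavenger would not need to change in this model.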

Harbormaster completed remote builds in B96685: Diff 334652. Apr 1 2021, 6:46 AM

scott.linder added inline comments. Apr 2 2021, 10:37 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118	This seems to regress in terms of readability below, where instead of a call to `buildPrologSpill` we have a call to `buildPrologEpilogSpill(..., true)`, and the same for `buildEpilogReload`. Can we name the new function `buildPrologSpillEpilogReloadImpl` and leave the old methods interface the same:
	static void buildPrologSpill(...) { return buildPrologSpillEpilogReloadImpl(..., true); }
	static void buildEpilogReload(...) { return buildPrologSpillEpilogReloadImpl(..., false); }
	An argument against doing this may be that we already have `buildScratchExecCopy(..., true)` instead of `buildPrologScratchExecCopy` and `buildScratchExecCopy(..., false)` instead of `buildEpilogScratchExecCopy`, but I'd vote to do the same for those too.
831–832	This change seems unrelated; is this the bug fix? Can it be in a separate patch after the cleanup? I'd prefer the cleanup be as close to NFC as is reasonable, although I assume there is some change in behavior because we no longer call `findScratchNonCalleeSaveRegister`, we do whatever `buildSpillLoadStore` does. I also might need some help understanding what is happening here to be able to review it. I'm confused about why `LivePhysRegs::init` gets called with different arguments for the prolog case vs the epilog case, and why we `stepBackward` on an (ostensibly arbitrary?) instruction after adding LiveOuts in the epilog case.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1472	Nit: all the changes in this file can be removed from the patch
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
853–862	Does `RS` already know about `LiveRegs`, or something equivalent to it? Or maybe `RS` cannot fail and return `Register()`? If not, it seems like we should continue to try alternative ways to pick an `SOffset` until we either find one or run out of ways to try.
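
The wrapper suggestion made for `SIFrameLowering.cpp` line 118 above can be sketched as follows. The function names come from the review comment, but the parameter list and the string return value are assumptions made so the example stays self-contained; the real functions emit instructions and return nothing.

```cpp
#include <cassert>
#include <string>

// Shared implementation; the bool selects spill (prolog) vs reload (epilog).
// Returning a description string is a simplification for this sketch.
std::string buildPrologSpillEpilogReloadImpl(int FI, bool IsProlog) {
  return std::string(IsProlog ? "store fi#" : "load fi#") + std::to_string(FI);
}

// Thin wrappers keep call sites readable: no bare true/false arguments.
std::string buildPrologSpill(int FI) {
  return buildPrologSpillEpilogReloadImpl(FI, /*IsProlog=*/true);
}

std::string buildEpilogReload(int FI) {
  return buildPrologSpillEpilogReloadImpl(FI, /*IsProlog=*/false);
}
```

A call such as `buildPrologSpill(3)` then reads the same as before the refactoring, while the boolean stays confined to the two wrapper bodies.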

Split buildPrologEpilogSpill again, split out bug fix, remove some formatting changes and add some comments.

Thanks for your comments Scott, I tried to address all of them.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118	There are some other functions (`SIRegisterInfo::buildSpillLoadStore` and `SIRegisterInfo::buildSGPRSpillLoadStore`) that also take a boolean argument to switch between load and store. Now that I look at buildPrologEpilogSpill again, it’s quite short and the load/spill parts are different enough, so it makes sense to split them again. I added comments to `buildScratchExecCopy`, does that look ok or should I add wrappers for prolog and epilog?
831–832	I split out the fix into D100098. I think the difference is, in the prolog we start in the beginning and track register liveness to the current insertion point (MBBI). In the epilog we start at the end and track liveness to the current insertion point. The result should be the same, but the epilog is probably at the end of the basic block while the prolog is at the beginning, so it’s cheaper to start from the end for the epilog and from the start for the prolog.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
853–862	Usually, `buildSpillLoadStore` is called to lower spill pseudos. At that point, we get a RegScavenger. During frame lowering (which did not call this function before), we do not have access to a RegScavenger as far as I know, so we use LiveRegs to find an unused register. So, when calling this function, either `RS` should be set (during frame index elimination), or `LiveRegs` (during frame lowering), but never both, as they do the same thing. `RS` can fail if all SGPRs are live. Usually it would then try to spill a register to the emergency spill slot, but that doesn’t make sense if we are currently lowering a spill, so we forbid that with the `false` last argument.
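
The prolog/epilog liveness explanation given for lines 831–832 above can be modeled on a toy block. This sketch assumes a simplified instruction record with explicit kill lists; `Instr`, `liveBeforeFromEnd`, and `liveBeforeFromStart` are hypothetical names for this illustration and are not how `LivePhysRegs` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified instruction: registers are plain ints.
struct Instr {
  std::vector<int> Defs;  // registers written
  std::vector<int> Uses;  // registers read
  std::vector<int> Kills; // subset of Uses whose value dies here
};

using RegSet = std::set<int>;

// Epilog-style walk: start from the block's live-outs and step backward
// until just before Block[Idx].
RegSet liveBeforeFromEnd(const std::vector<Instr> &Block,
                         const RegSet &LiveOuts, std::size_t Idx) {
  RegSet Live = LiveOuts;
  for (std::size_t I = Block.size(); I-- > Idx;) {
    for (int R : Block[I].Defs)
      Live.erase(R);
    for (int R : Block[I].Uses)
      Live.insert(R);
  }
  return Live;
}

// Prolog-style walk: start from the block's live-ins and step forward
// until just before Block[Idx], relying on kill information.
RegSet liveBeforeFromStart(const std::vector<Instr> &Block,
                           const RegSet &LiveIns, std::size_t Idx) {
  RegSet Live = LiveIns;
  for (std::size_t I = 0; I < Idx; ++I) {
    for (int R : Block[I].Kills)
      Live.erase(R);
    for (int R : Block[I].Defs)
      Live.insert(R);
  }
  return Live;
}
```

In this model both walks agree at every insertion point as long as each defined register is used later and the kill lists are accurate; the cost of each walk is proportional to its distance from the starting end of the block, which is the reason given above for choosing the nearer end.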

Harbormaster completed remote builds in B97713: Diff 336087. Apr 8 2021, 7:37 AM

Thank you, I have one remaining nit concerning RS and LiveRegs, but otherwise everything LGTM.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118	> There are some other functions (SIRegisterInfo::buildSpillLoadStore and SIRegisterInfo::buildSGPRSpillLoadStore) that also take a boolean argument to switch between load and store.
	I think some amount of this is pragmatic and reasonable, I just start to find it harder to read when it gets used broadly enough. Also in the case of `buildSpillLoadStore` the boolean is capturing whether to `kill` the operand, although it seems like that would always be equivalent to whether `Opcode` is a load or a store?
	> Now that I look at buildPrologEpilogSpill again, it’s quite short and the load/spill parts are different enough, so it makes sense to split them again.
	This LGTM, thank you! The differences between the two are more obvious to me now than in the conditional, combined version.
	> I added comments to buildScratchExecCopy, does that look ok or should I add wrappers for prolog and epilog?
	Sorry, I only meant to justify my comments on your code, I didn't mean to ask for a change as part of this patch. The comment LGTM.
831–832	LGTM, thank you! I think I understand now, and removing some of the redundant code helps.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
853–862	Makes sense, but in that case could you add an `assert(!RS != !LiveRegs && "expect either RS or LiveRegs but not both")`?

Add assert to buildSpillLoadStore.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118	> > There are some other functions (SIRegisterInfo::buildSpillLoadStore and SIRegisterInfo::buildSGPRSpillLoadStore) that also take a boolean argument to switch between load and store.
	> I think some amount of this is pragmatic and reasonable, I just start to find it harder to read when it gets used broadly enough. Also in the case of `buildSpillLoadStore` the boolean is capturing whether to `kill` the operand, although it seems like that would always be equivalent to whether `Opcode` is a load or a store?
	Whoops, you’re right, `buildSpillLoadStore` directly takes the opcode instead of a boolean. I think `kill` can be false on a store if the register is still used afterwards.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
853–862	I think it’s supported to call `buildSpillLoadStore` without a register scavenger (and without LiveRegs), that’s why the check for `RS` was here before, so I added an assert for `!RS || !LiveRegs`.
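
The two assert conditions discussed for lines 853–862 differ only in the case where neither pointer is set. The sketch below spells that out; the stub types and the helper names `exactlyOne` and `atMostOne` are hypothetical and exist only for this comparison.

```cpp
#include <cassert>

// Empty stand-ins; only the pointer being null or non-null matters here.
struct RegScavengerStub {};
struct LivePhysRegsStub {};

// Condition first proposed in review: exactly one of the two is provided.
bool exactlyOne(const RegScavengerStub *RS, const LivePhysRegsStub *LiveRegs) {
  return !RS != !LiveRegs;
}

// Condition the author reports adding: at most one of the two is provided,
// so calling with neither remains allowed.
bool atMostOne(const RegScavengerStub *RS, const LivePhysRegsStub *LiveRegs) {
  return !RS || !LiveRegs;
}
```

With both pointers null, `exactlyOne` is false while `atMostOne` is true; in the other three combinations the two conditions agree.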

Harbormaster completed remote builds in B97915: Diff 336365. Apr 9 2021, 3:23 AM

LGTM, thank you for bearing with me!

If I understand Matt's comments earlier correctly, I think you've addressed them. If not, or Matt has more concerns, I assume we can clean it up post-commit.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 12 2021, 2:24 AM

This revision was landed with ongoing or failed builds.

Closed by commit rG32bc9a9bc314: [AMDGPU] Unify spill code (authored by sebastian-ne).

This revision was automatically updated to reflect the committed changes.

sebastian-ne added a commit: rG32bc9a9bc314: [AMDGPU] Unify spill code.

RamNalamothu mentioned this in D113100: [AMDGPU] Do not add debug locations to the code inside prologue. Nov 3 2021, 6:11 AM

RamNalamothu mentioned this in rG539f500e78ad: [AMDGPU] Do not add debug locations to the code inside prologue. Nov 3 2021, 7:32 PM

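Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp	247 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp	18 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.h	16 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	25 lines
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll	18 lines
llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir	26 lines
llvm/test/CodeGen/AMDGPU/stack-realign.ll	14 lines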

Diff 334652

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

(109 lines not shown)	if (!TempSGPR) {
LLVM_DEBUG(dbgs() << "Saving " << (IsFP ? "FP" : "BP") << " with copy to "		LLVM_DEBUG(dbgs() << "Saving " << (IsFP ? "FP" : "BP") << " with copy to "
<< printReg(TempSGPR, TRI) << '\n');		<< printReg(TempSGPR, TRI) << '\n');
}		}
}		}

// We need to specially emit stack operations here because a different frame		// We need to specially emit stack operations here because a different frame
// register is used than in the rest of the function, as getFrameRegister would		// register is used than in the rest of the function, as getFrameRegister would
// use.		// use.
static void buildPrologSpill(const GCNSubtarget &ST, LivePhysRegs &LiveRegs,		static void buildPrologEpilogSpill(const GCNSubtarget &ST,
MachineBasicBlock &MBB,		const SIRegisterInfo &TRI,
		const SIMachineFunctionInfo &FuncInfo,
		LivePhysRegs &LiveRegs, MachineFunction &MF,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
const SIInstrInfo *TII, Register SpillReg,		Register SpillReg, int FI, bool IsSpill) {
Register ScratchRsrcReg, Register SPReg, int FI) {		unsigned Opc;
MachineFunction *MF = MBB.getParent();		if (IsSpill)
MachineFrameInfo &MFI = MF->getFrameInfo();		Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR
		: AMDGPU::BUFFER_STORE_DWORD_OFFSET;
int64_t Offset = MFI.getObjectOffset(FI);		else
		Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_LOAD_DWORD_SADDR
		: AMDGPU::BUFFER_LOAD_DWORD_OFFSET;

MachineMemOperand *MMO = MF->getMachineMemOperand(		MachineFrameInfo &FrameInfo = MF.getFrameInfo();
MachinePointerInfo::getFixedStack(*MF, FI), MachineMemOperand::MOStore, 4,		MachinePointerInfo PtrInfo = MachinePointerInfo::getFixedStack(MF, FI);
MFI.getObjectAlign(FI));		MachineMemOperand *MMO = MF.getMachineMemOperand(
		PtrInfo, IsSpill ? MachineMemOperand::MOStore : MachineMemOperand::MOLoad,
if (ST.enableFlatScratch()) {		FrameInfo.getObjectSize(FI), FrameInfo.getObjectAlign(FI));
if (TII->isLegalFLATOffset(Offset, AMDGPUAS::PRIVATE_ADDRESS, true)) {		if (IsSpill)
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::SCRATCH_STORE_DWORD_SADDR))
.addReg(SpillReg, RegState::Kill)
.addReg(SPReg)
.addImm(Offset)
.addImm(0) // cpol
.addMemOperand(MMO);
return;
}
} else if (SIInstrInfo::isLegalMUBUFImmOffset(Offset)) {
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFSET))
.addReg(SpillReg, RegState::Kill)
.addReg(ScratchRsrcReg)
.addReg(SPReg)
.addImm(Offset)
.addImm(0) // cpol
.addImm(0) // tfe
.addImm(0) // swz
.addMemOperand(MMO);
return;
}

// Don't clobber the TmpVGPR if we also need a scratch reg for the stack
// offset in the spill.
LiveRegs.addReg(SpillReg);		LiveRegs.addReg(SpillReg);
		TRI.buildSpillLoadStore(I, Opc, FI, SpillReg, IsSpill,
if (ST.enableFlatScratch()) {		FuncInfo.getStackPtrOffsetReg(), 0, MMO, nullptr,
MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(		&LiveRegs);
MF->getRegInfo(), LiveRegs, AMDGPU::SReg_32_XM0RegClass);		if (IsSpill)

bool HasOffsetReg = OffsetReg;
if (!HasOffsetReg) {
// No free register, use stack pointer and restore afterwards.
OffsetReg = SPReg;
}

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_ADD_U32), OffsetReg)
.addReg(SPReg)
.addImm(Offset);

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::SCRATCH_STORE_DWORD_SADDR))
.addReg(SpillReg, RegState::Kill)
.addReg(OffsetReg, HasOffsetReg ? RegState::Kill : 0)
.addImm(0) // offset
.addImm(0) // cpol
.addMemOperand(MMO);

if (!HasOffsetReg) {
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_SUB_U32), OffsetReg)
.addReg(SPReg)
.addImm(Offset);
}
} else {
MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(
MF->getRegInfo(), LiveRegs, AMDGPU::VGPR_32RegClass);

if (OffsetReg) {
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::V_MOV_B32_e32), OffsetReg)
.addImm(Offset);

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFEN))
.addReg(SpillReg, RegState::Kill)
.addReg(OffsetReg, RegState::Kill)
.addReg(ScratchRsrcReg)
.addReg(SPReg)
.addImm(0) // offset
.addImm(0) // cpol
.addImm(0) // tfe
.addImm(0) // swz
.addMemOperand(MMO);
} else {
// No free register, use stack pointer and restore afterwards.
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_ADD_U32), SPReg)
.addReg(SPReg)
.addImm(Offset);

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFSET))
.addReg(SpillReg, RegState::Kill)
.addReg(ScratchRsrcReg)
.addReg(SPReg)
.addImm(0) // offset
.addImm(0) // cpol
.addImm(0) // tfe
.addImm(0) // swz
.addMemOperand(MMO);

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_SUB_U32), SPReg)
.addReg(SPReg)
.addImm(Offset);
}
}

LiveRegs.removeReg(SpillReg);		LiveRegs.removeReg(SpillReg);
}		}

static void buildEpilogReload(const GCNSubtarget &ST, LivePhysRegs &LiveRegs,
MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,
const SIInstrInfo *TII, Register SpillReg,
Register ScratchRsrcReg, Register SPReg, int FI) {
MachineFunction *MF = MBB.getParent();
MachineFrameInfo &MFI = MF->getFrameInfo();
int64_t Offset = MFI.getObjectOffset(FI);

MachineMemOperand *MMO = MF->getMachineMemOperand(
MachinePointerInfo::getFixedStack(*MF, FI), MachineMemOperand::MOLoad, 4,
MFI.getObjectAlign(FI));

if (ST.enableFlatScratch()) {
if (TII->isLegalFLATOffset(Offset, AMDGPUAS::PRIVATE_ADDRESS, true)) {
BuildMI(MBB, I, DebugLoc(),
TII->get(AMDGPU::SCRATCH_LOAD_DWORD_SADDR), SpillReg)
.addReg(SPReg)
.addImm(Offset)
.addImm(0) // cpol
.addMemOperand(MMO);
return;
}
MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(
MF->getRegInfo(), LiveRegs, AMDGPU::SReg_32_XM0RegClass);
if (!OffsetReg)
report_fatal_error("failed to find free scratch register");

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_ADD_U32), OffsetReg)
.addReg(SPReg)
.addImm(Offset);
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::SCRATCH_LOAD_DWORD_SADDR),
SpillReg)
.addReg(OffsetReg, RegState::Kill)
.addImm(0)
.addImm(0) // cpol
.addMemOperand(MMO);
return;
}

if (SIInstrInfo::isLegalMUBUFImmOffset(Offset)) {
BuildMI(MBB, I, DebugLoc(),
TII->get(AMDGPU::BUFFER_LOAD_DWORD_OFFSET), SpillReg)
.addReg(ScratchRsrcReg)
.addReg(SPReg)
.addImm(Offset)
.addImm(0) // cpol
.addImm(0) // tfe
.addImm(0) // swz
.addMemOperand(MMO);
return;
}

MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(
MF->getRegInfo(), LiveRegs, AMDGPU::VGPR_32RegClass);
if (!OffsetReg)
report_fatal_error("failed to find free scratch register");

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::V_MOV_B32_e32), OffsetReg)
.addImm(Offset);

BuildMI(MBB, I, DebugLoc(),
TII->get(AMDGPU::BUFFER_LOAD_DWORD_OFFEN), SpillReg)
.addReg(OffsetReg, RegState::Kill)
.addReg(ScratchRsrcReg)
.addReg(SPReg)
.addImm(0)
.addImm(0) // cpol
.addImm(0) // tfe
.addImm(0) // swz
.addMemOperand(MMO);
}

static void buildGitPtr(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,		static void buildGitPtr(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
const DebugLoc &DL, const SIInstrInfo *TII,		const DebugLoc &DL, const SIInstrInfo *TII,
Register TargetReg) {		Register TargetReg) {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();		const SIRegisterInfo *TRI = &TII->getRegisterInfo();
const MCInstrDesc &SMovB32 = TII->get(AMDGPU::S_MOV_B32);		const MCInstrDesc &SMovB32 = TII->get(AMDGPU::S_MOV_B32);
Register TargetLo = TRI->getSubReg(TargetReg, AMDGPU::sub0);		Register TargetLo = TRI->getSubReg(TargetReg, AMDGPU::sub0);
(514 lines not shown)	if (LiveRegs.empty()) {
}		}
}		}

ScratchExecCopy = findScratchNonCalleeSaveRegister(		ScratchExecCopy = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, *TRI.getWaveMaskRegClass());		MRI, LiveRegs, *TRI.getWaveMaskRegClass());
if (!ScratchExecCopy)		if (!ScratchExecCopy)
report_fatal_error("failed to find free scratch register");		report_fatal_error("failed to find free scratch register");

if (!IsProlog)		LiveRegs.addReg(ScratchExecCopy);
LiveRegs.removeReg(ScratchExecCopy);

const unsigned OrSaveExec =		const unsigned OrSaveExec =
ST.isWave32() ? AMDGPU::S_OR_SAVEEXEC_B32 : AMDGPU::S_OR_SAVEEXEC_B64;		ST.isWave32() ? AMDGPU::S_OR_SAVEEXEC_B32 : AMDGPU::S_OR_SAVEEXEC_B64;
BuildMI(MBB, MBBI, DL, TII->get(OrSaveExec), ScratchExecCopy).addImm(-1);		BuildMI(MBB, MBBI, DL, TII->get(OrSaveExec), ScratchExecCopy).addImm(-1);

return ScratchExecCopy;		return ScratchExecCopy;
}		}

(41 lines not shown)	void SIFrameLowering::emitPrologue(MachineFunction &MF,
for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg		for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg
: FuncInfo->getSGPRSpillVGPRs()) {		: FuncInfo->getSGPRSpillVGPRs()) {
if (!Reg.FI.hasValue())		if (!Reg.FI.hasValue())
continue;		continue;

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);

buildPrologSpill(ST, LiveRegs, MBB, MBBI, TII, Reg.VGPR,		buildPrologEpilogSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBBI, Reg.VGPR,
FuncInfo->getScratchRSrcReg(),		*Reg.FI, true);
StackPtrReg,
Reg.FI.getValue());
}		}

if (FPSaveIndex && spilledToMemory(MF, *FPSaveIndex)) {		if (FPSaveIndex && spilledToMemory(MF, *FPSaveIndex)) {
const int FramePtrFI = *FPSaveIndex;		const int FramePtrFI = *FPSaveIndex;
assert(!MFI.isDeadObjectIndex(FramePtrFI));		assert(!MFI.isDeadObjectIndex(FramePtrFI));

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);

MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);
if (!TmpVGPR)		if (!TmpVGPR)
report_fatal_error("failed to find free scratch register");		report_fatal_error("failed to find free scratch register");

BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)
.addReg(FramePtrReg);		.addReg(FramePtrReg);

buildPrologSpill(ST, LiveRegs, MBB, MBBI, TII, TmpVGPR,		buildPrologEpilogSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBBI, TmpVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg, FramePtrFI);		FramePtrFI, true);
}		}

if (BPSaveIndex && spilledToMemory(MF, *BPSaveIndex)) {		if (BPSaveIndex && spilledToMemory(MF, *BPSaveIndex)) {
const int BasePtrFI = *BPSaveIndex;		const int BasePtrFI = *BPSaveIndex;
assert(!MFI.isDeadObjectIndex(BasePtrFI));		assert(!MFI.isDeadObjectIndex(BasePtrFI));

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);

MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);
if (!TmpVGPR)		if (!TmpVGPR)
report_fatal_error("failed to find free scratch register");		report_fatal_error("failed to find free scratch register");

BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)
.addReg(BasePtrReg);		.addReg(BasePtrReg);

buildPrologSpill(ST, LiveRegs, MBB, MBBI, TII, TmpVGPR,		buildPrologEpilogSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBBI, TmpVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg, BasePtrFI);		BasePtrFI, true);
}		}

if (ScratchExecCopy) {		if (ScratchExecCopy) {
// FIXME: Split block and make terminator.		// FIXME: Split block and make terminator.
unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;		unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec)		BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec)
.addReg(ScratchExecCopy, RegState::Kill);		.addReg(ScratchExecCopy, RegState::Kill);
(184 lines not shown)	void SIFrameLowering::emitEpilogue(MachineFunction &MF,
Register ScratchExecCopy;		Register ScratchExecCopy;
if (FPSaveIndex) {		if (FPSaveIndex) {
const int FramePtrFI = *FPSaveIndex;		const int FramePtrFI = *FPSaveIndex;
assert(!MFI.isDeadObjectIndex(FramePtrFI));		assert(!MFI.isDeadObjectIndex(FramePtrFI));
if (spilledToMemory(MF, FramePtrFI)) {		if (spilledToMemory(MF, FramePtrFI)) {
if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);

MCPhysReg TempVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);
if (!TempVGPR)		if (!TmpVGPR)
report_fatal_error("failed to find free scratch register");		report_fatal_error("failed to find free scratch register");
buildEpilogReload(ST, LiveRegs, MBB, MBBI, TII, TempVGPR,		buildPrologEpilogSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBBI, TmpVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg, FramePtrFI);		FramePtrFI, false);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), FramePtrReg)
.addReg(TempVGPR, RegState::Kill);		.addReg(TmpVGPR, RegState::Kill);
} else {		} else {
// Reload from VGPR spill.		// Reload from VGPR spill.
assert(MFI.getStackID(FramePtrFI) == TargetStackID::SGPRSpill);		assert(MFI.getStackID(FramePtrFI) == TargetStackID::SGPRSpill);
ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =		ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =
FuncInfo->getSGPRToVGPRSpills(FramePtrFI);		FuncInfo->getSGPRToVGPRSpills(FramePtrFI);
assert(Spill.size() == 1);		assert(Spill.size() == 1);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READLANE_B32), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READLANE_B32), FramePtrReg)
.addReg(Spill[0].VGPR)		.addReg(Spill[0].VGPR)
.addImm(Spill[0].Lane);		.addImm(Spill[0].Lane);
}		}
}		}

if (BPSaveIndex) {		if (BPSaveIndex) {
const int BasePtrFI = *BPSaveIndex;		const int BasePtrFI = *BPSaveIndex;
assert(!MFI.isDeadObjectIndex(BasePtrFI));		assert(!MFI.isDeadObjectIndex(BasePtrFI));
if (spilledToMemory(MF, BasePtrFI)) {		if (spilledToMemory(MF, BasePtrFI)) {
if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);

MCPhysReg TempVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);
if (!TempVGPR)		if (!TmpVGPR)
report_fatal_error("failed to find free scratch register");		report_fatal_error("failed to find free scratch register");
buildEpilogReload(ST, LiveRegs, MBB, MBBI, TII, TempVGPR,		buildPrologEpilogSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBBI, TmpVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg, BasePtrFI);		BasePtrFI, false);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), BasePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), BasePtrReg)
.addReg(TempVGPR, RegState::Kill);		.addReg(TmpVGPR, RegState::Kill);
} else {		} else {
// Reload from VGPR spill.		// Reload from VGPR spill.
assert(MFI.getStackID(BasePtrFI) == TargetStackID::SGPRSpill);		assert(MFI.getStackID(BasePtrFI) == TargetStackID::SGPRSpill);
ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =		ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =
FuncInfo->getSGPRToVGPRSpills(BasePtrFI);		FuncInfo->getSGPRToVGPRSpills(BasePtrFI);
assert(Spill.size() == 1);		assert(Spill.size() == 1);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READLANE_B32), BasePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READLANE_B32), BasePtrReg)
.addReg(Spill[0].VGPR)		.addReg(Spill[0].VGPR)
.addImm(Spill[0].Lane);		.addImm(Spill[0].Lane);
}		}
}		}

for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg :		for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg :
FuncInfo->getSGPRSpillVGPRs()) {		FuncInfo->getSGPRSpillVGPRs()) {
if (!Reg.FI.hasValue())		if (!Reg.FI.hasValue())
continue;		continue;

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);

buildEpilogReload(ST, LiveRegs, MBB, MBBI, TII, Reg.VGPR,		buildPrologEpilogSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBBI, Reg.VGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg,		*Reg.FI, false);
Reg.FI.getValue());
}		}

if (ScratchExecCopy) {		if (ScratchExecCopy) {
// FIXME: Split block and make terminator.		// FIXME: Split block and make terminator.
unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;		unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec)		BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec)
.addReg(ScratchExecCopy, RegState::Kill);		.addReg(ScratchExecCopy, RegState::Kill);

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 1,463 Lines • ▼ Show 20 Lines	if (RI.isSGPRClass(RC)) {
return;		return;
}		}

unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillSaveOpcode(SpillSize)		unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillSaveOpcode(SpillSize)
: getVGPRSpillSaveOpcode(SpillSize);		: getVGPRSpillSaveOpcode(SpillSize);
MFI->setHasSpilledVGPRs();		MFI->setHasSpilledVGPRs();

BuildMI(MBB, MI, DL, get(Opcode))		BuildMI(MBB, MI, DL, get(Opcode))
.addReg(SrcReg, getKillRegState(isKill)) // data		.addReg(SrcReg, getKillRegState(isKill)) // data
scott.linder (Done): Nit: all the changes in this file can be removed from the patch
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset
.addImm(0) // offset		.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}

static unsigned getSGPRSpillRestoreOpcode(unsigned Size) {		static unsigned getSGPRSpillRestoreOpcode(unsigned Size) {
switch (Size) {		switch (Size) {
case 4:		case 4:
return AMDGPU::SI_SPILL_S32_RESTORE;		return AMDGPU::SI_SPILL_S32_RESTORE;
case 8:		case 8:
return AMDGPU::SI_SPILL_S64_RESTORE;		return AMDGPU::SI_SPILL_S64_RESTORE;
▲ Show 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	BuildMI(MBB, MI, DL, OpDesc, DestReg)
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);		.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);

return;		return;
}		}

unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillRestoreOpcode(SpillSize)		unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillRestoreOpcode(SpillSize)
: getVGPRSpillRestoreOpcode(SpillSize);		: getVGPRSpillRestoreOpcode(SpillSize);
BuildMI(MBB, MI, DL, get(Opcode), DestReg)		BuildMI(MBB, MI, DL, get(Opcode), DestReg)
.addFrameIndex(FrameIndex) // vaddr		.addFrameIndex(FrameIndex) // vaddr
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset
.addImm(0) // offset		.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}

void SIInstrInfo::insertNoop(MachineBasicBlock &MBB,		void SIInstrInfo::insertNoop(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI) const {		MachineBasicBlock::iterator MI) const {
insertNoops(MBB, MI, 1);		insertNoops(MBB, MI, 1);
}		}

void SIInstrInfo::insertNoops(MachineBasicBlock &MBB,		void SIInstrInfo::insertNoops(MachineBasicBlock &MBB,

llvm/lib/Target/AMDGPU/SIRegisterInfo.h


#define GET_REGINFO_HEADER		#define GET_REGINFO_HEADER
#include "AMDGPUGenRegisterInfo.inc"		#include "AMDGPUGenRegisterInfo.inc"

namespace llvm {		namespace llvm {

class GCNSubtarget;		class GCNSubtarget;
class LiveIntervals;		class LiveIntervals;
		class LivePhysRegs;
class RegisterBank;		class RegisterBank;
class SIMachineFunctionInfo;		class SIMachineFunctionInfo;

class SIRegisterInfo final : public AMDGPUGenRegisterInfo {		class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
private:		private:
const GCNSubtarget &ST;		const GCNSubtarget &ST;
bool SpillSGPRToVGPR;		bool SpillSGPRToVGPR;
bool isWave32;		bool isWave32;
▲ Show 20 Lines • Show All 305 Lines • ▼ Show 20 Lines	public:
/// Return all SGPR64 which satisfy the waves per execution unit requirement		/// Return all SGPR64 which satisfy the waves per execution unit requirement
/// of the subtarget.		/// of the subtarget.
ArrayRef<MCPhysReg> getAllSGPR64(const MachineFunction &MF) const;		ArrayRef<MCPhysReg> getAllSGPR64(const MachineFunction &MF) const;

/// Return all SGPR32 which satisfy the waves per execution unit requirement		/// Return all SGPR32 which satisfy the waves per execution unit requirement
/// of the subtarget.		/// of the subtarget.
ArrayRef<MCPhysReg> getAllSGPR32(const MachineFunction &MF) const;		ArrayRef<MCPhysReg> getAllSGPR32(const MachineFunction &MF) const;

private:		void buildSpillLoadStore(MachineBasicBlock::iterator MI, unsigned LoadStoreOp,
void buildSpillLoadStore(MachineBasicBlock::iterator MI,		int Index, Register ValueReg, bool ValueIsKill,
unsigned LoadStoreOp,		MCRegister ScratchOffsetReg, int64_t InstrOffset,
int Index,		MachineMemOperand *MMO, RegScavenger *RS,
Register ValueReg,		LivePhysRegs *LiveRegs = nullptr) const;
bool ValueIsKill,
MCRegister ScratchOffsetReg,
int64_t InstrOffset,
MachineMemOperand *MMO,
RegScavenger *RS) const;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Show First 20 Lines • Show All 784 Lines • ▼ Show 20 Lines static unsigned getFlatScratchSpillOpcode(const SIInstrInfo *TII,

} }

if (UseST) if (UseST)

LoadStoreOp = AMDGPU::getFlatScratchInstSTfromSS(LoadStoreOp); LoadStoreOp = AMDGPU::getFlatScratchInstSTfromSS(LoadStoreOp);

return LoadStoreOp; return LoadStoreOp;

} }

void SIRegisterInfo::buildSpillLoadStore(MachineBasicBlock::iterator MI, void SIRegisterInfo::buildSpillLoadStore(

unsigned LoadStoreOp, MachineBasicBlock::iterator MI, unsigned LoadStoreOp, int Index,

int Index, Register ValueReg, bool IsKill, MCRegister ScratchOffsetReg,

Register ValueReg, int64_t InstOffset, MachineMemOperand *MMO, RegScavenger *RS,

bool IsKill, LivePhysRegs *LiveRegs) const {

MCRegister ScratchOffsetReg,

int64_t InstOffset,

MachineMemOperand *MMO,

RegScavenger *RS) const {

MachineBasicBlock *MBB = MI->getParent(); MachineBasicBlock *MBB = MI->getParent();

MachineFunction *MF = MI->getParent()->getParent(); MachineFunction *MF = MI->getParent()->getParent();

const SIInstrInfo *TII = ST.getInstrInfo(); const SIInstrInfo *TII = ST.getInstrInfo();

const MachineFrameInfo &MFI = MF->getFrameInfo(); const MachineFrameInfo &MFI = MF->getFrameInfo();

const SIMachineFunctionInfo *FuncInfo = MF->getInfo<SIMachineFunctionInfo>(); const SIMachineFunctionInfo *FuncInfo = MF->getInfo<SIMachineFunctionInfo>();

const MCInstrDesc *Desc = &TII->get(LoadStoreOp); const MCInstrDesc *Desc = &TII->get(LoadStoreOp);

const DebugLoc &DL = MI->getDebugLoc(); const DebugLoc &DL = MI->getDebugLoc();

Show All 39 Lines if (!IsOffsetLegal || (IsFlat && !SOffset && !ST.hasFlatScratchSTMode())) {

// We currently only support spilling VGPRs to EltSize boundaries, meaning // We currently only support spilling VGPRs to EltSize boundaries, meaning

// we can simplify the adjustment of Offset here to just scale with // we can simplify the adjustment of Offset here to just scale with

// WavefrontSize. // WavefrontSize.

if (!IsFlat) if (!IsFlat)

Offset *= ST.getWavefrontSize(); Offset *= ST.getWavefrontSize();

// We don't have access to the register scavenger if this function is called // We don't have access to the register scavenger if this function is called

// during PEI::scavengeFrameVirtualRegs(). // during PEI::scavengeFrameVirtualRegs().

if (RS) if (RS) {

SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false); SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);

} else if (LiveRegs) {

for (MCRegister Reg : AMDGPU::SGPR_32RegClass) {

if (LiveRegs->available(MF->getRegInfo(), Reg)) {

SOffset = Reg;

break;

}

scott.linder (Not Done):

// during PEI::scavengeFrameVirtualRegs().

- if (RS) {

+ if (RS)

SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);

- } else if (LiveRegs) {

+ if (!SOffset && LiveRegs) {

for (MCRegister Reg : AMDGPU::SGPR_32RegClass) {

if (LiveRegs->available(MF->getRegInfo(), Reg)) {

SOffset = Reg;

break;

}

if (!SOffset) {

Does `RS` already know about `LiveRegs`, or something equivalent to it? Or maybe `RS` cannot fail and return `Register()`?

If not, it seems like we should continue to try alternative ways to pick an SOffset until we either find one or run out of ways to try.

sebastian-ne (Author, Done):

Usually, buildSpillLoadStore is called to lower spill pseudos. At that point, we get a RegScavenger. During frame lowering (which did not call this function before), we do not have access to a RegScavenger as far as I know, so we use LiveRegs to find an unused register.
So, when calling this function, either RS should be set (during frame index elimination), or LiveRegs (during frame lowering), but never both, as they do the same thing.

RS can fail if all SGPRs are live. Usually it would then try to spill a register to the emergency spill slot, but that doesn’t make sense if we are currently lowering a spill, so we forbid that with the false last argument.
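
The two sources of a free SGPR described in this comment can be sketched outside of LLVM. This is a minimal model, not the patch's code: `ToyScavenger`, `ToyLiveRegs`, `NoReg` and `pickSOffset` are hypothetical stand-ins for `RegScavenger`, `LivePhysRegs`, a default-constructed `Register` and the selection logic in `buildSpillLoadStore`. Registers are plain integers, and the scavenger is assumed to return `NoReg` rather than spill when nothing is free.

```cpp
#include <cassert>
#include <set>

using Reg = unsigned;
constexpr Reg NoReg = 0; // plays the role of a default-constructed Register

// Hypothetical stand-in for RegScavenger with emergency spilling disabled.
struct ToyScavenger {
  std::set<Reg> Free;
  Reg scavengeRegister() const { return Free.empty() ? NoReg : *Free.begin(); }
};

// Hypothetical stand-in for LivePhysRegs.
struct ToyLiveRegs {
  std::set<Reg> Live;
  bool available(Reg R) const { return Live.count(R) == 0; }
};

// Pick an offset register from whichever source the caller provided.
// SGPRs are modelled as the integer range [FirstSGPR, LastSGPR].
Reg pickSOffset(const ToyScavenger *RS, const ToyLiveRegs *LiveRegs,
                Reg FirstSGPR, Reg LastSGPR) {
  assert((!RS || !LiveRegs) && "expected at most one of RS and LiveRegs");
  if (RS)
    return RS->scavengeRegister(); // frame index elimination path
  if (LiveRegs)                    // frame lowering path
    for (Reg R = FirstSGPR; R <= LastSGPR; ++R)
      if (LiveRegs->available(R))
        return R;
  return NoReg; // caller must fall back to adjusting the stack pointer itself
}
```

When `pickSOffset` returns `NoReg`, the caller is in the situation the existing `if (!SOffset)` fallback handles: no free SGPR, so the offset is added to and later subtracted from the offset register.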

scott.linder (Not Done):

Makes sense, but in that case could you add an `assert(!RS != !LiveRegs && "expect either RS or LiveRegs but not both")`?

sebastian-ne (Author, Done):

I think it’s supported to call `buildSpillLoadStore` without a register scavenger (and without `LiveRegs`); that’s why the check for `RS` was here before. So I added an assert for `!RS || !LiveRegs`.
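
The difference between the two asserts discussed here is only the case where both pointers are null. A small sketch makes this explicit; the helper names `exactlyOne` and `atMostOne` are illustrative and do not appear in the patch.

```cpp
// The condition first suggested in review: exactly one pointer is set.
inline bool exactlyOne(const void *RS, const void *LiveRegs) {
  return !RS != !LiveRegs;
}

// The condition the author reports adding: at most one pointer is set,
// so calling with neither RS nor LiveRegs stays legal.
inline bool atMostOne(const void *RS, const void *LiveRegs) {
  return !RS || !LiveRegs;
}
```

Both predicates reject the call where `RS` and `LiveRegs` are both set; only `atMostOne` accepts the call where neither is set.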

if (!SOffset) { if (!SOffset) {

// There are no free SGPRs, and since we are in the process of spilling // There are no free SGPRs, and since we are in the process of spilling

// VGPRs too. Since we need a VGPR in order to spill SGPRs (this is true // VGPRs too. Since we need a VGPR in order to spill SGPRs (this is true

// on SI/CI and on VI it is true until we implement spilling using scalar // on SI/CI and on VI it is true until we implement spilling using scalar

// stores), we have no way to free up an SGPR. Our solution here is to // stores), we have no way to free up an SGPR. Our solution here is to

// add the offset directly to the ScratchOffset or StackPtrOffset // add the offset directly to the ScratchOffset or StackPtrOffset

// register, and then subtract the offset after the spill to return the // register, and then subtract the offset after the spill to return the

▲ Show 20 Lines • Show All 581 Lines • ▼ Show 20 Lines switch (MI->getOpcode()) {

case AMDGPU::SI_SPILL_A64_SAVE: case AMDGPU::SI_SPILL_A64_SAVE:

case AMDGPU::SI_SPILL_A32_SAVE: { case AMDGPU::SI_SPILL_A32_SAVE: {

const MachineOperand *VData = TII->getNamedOperand(*MI, const MachineOperand *VData = TII->getNamedOperand(*MI,

AMDGPU::OpName::vdata); AMDGPU::OpName::vdata);

assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() == assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==

MFI->getStackPtrOffsetReg()); MFI->getStackPtrOffsetReg());

unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR

: AMDGPU::BUFFER_STORE_DWORD_OFFSET; : AMDGPU::BUFFER_STORE_DWORD_OFFSET;

buildSpillLoadStore(MI, Opc, buildSpillLoadStore(MI, Opc,

Index, Index,

VData->getReg(), VData->isKill(), VData->getReg(), VData->isKill(),

arsenm (Not Done):

I don't like having this logic separated and overriding the frame register logic above. Can you infer this should be SP relative from the frame index itself instead of adding a new operand?

sebastian-ne (Author, Done):

How should that work? Should I add a new `TargetStackID::PrologEpilogSave`?

It is not really a property of the frame index that SP should be used; it is because the instruction is in the prolog/epilog, where FP is not set up. If we wanted to access the same frame index in the function body, we would need to use FP instead of SP.

I could move this code out of the switch and merge it with the frame register logic above, but that doesn’t change much.

arsenm (Done):

But we don't need to access these from an arbitrary point in the function. We know these indexes should only be accessed in the prolog/epilog. Fundamentally I do think this is a special case for PEI to handle. The problem here is just that what we need to do to emit spills is a lot more annoying than on other targets.

What this probably should do is just use a common implementation function that eliminateFrameIndex and emitProlog/emitEpilog can use. buildSpillLoadStore is approximately that already. emitProlog/emitEpilog don't really need to route through the pseudos and eliminateFrameIndex.
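
The shape of this suggestion, one shared emitter called directly from both places, can be sketched as a toy model. Everything here is hypothetical: `ToyInst`, `buildSpill`, `emitPrologEpilogSpill` and `lowerSpillPseudo` only mimic the roles of `buildSpillLoadStore`, the prolog/epilog code and `eliminateFrameIndex`, and the emitted "instructions" are plain records rather than `MachineInstr`s.

```cpp
#include <vector>

enum class BaseReg { StackPtr, FramePtr };

// A recorded memory access; stands in for a real load/store instruction.
struct ToyInst {
  bool IsStore;
  BaseReg Base;
  int FrameIndex;
};

// Shared routine: the only place that knows how to emit the access.
void buildSpill(std::vector<ToyInst> &Out, bool IsStore, BaseReg Base, int FI) {
  Out.push_back({IsStore, Base, FI});
}

// Prolog/epilog path: FP is not set up yet, so address relative to SP
// and call the shared routine directly, without a spill pseudo.
void emitPrologEpilogSpill(std::vector<ToyInst> &Out, bool IsStore, int FI) {
  buildSpill(Out, IsStore, BaseReg::StackPtr, FI);
}

// Pseudo lowering path: the caller has already chosen the frame register.
void lowerSpillPseudo(std::vector<ToyInst> &Out, bool IsStore,
                      BaseReg FrameReg, int FI) {
  buildSpill(Out, IsStore, FrameReg, FI);
}
```

With this split, the choice of base register lives in the caller, so the shared routine needs no extra operand to tell prolog/epilog spills apart from ordinary ones.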

FrameReg, FrameReg,

TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(), TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),

*MI->memoperands_begin(), *MI->memoperands_begin(),

RS); RS);

MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode())); MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));

MI->eraseFromParent(); MI->eraseFromParent();

break; break;

} }

Show All 29 Lines case AMDGPU::SI_SPILL_A1024_RESTORE: {

TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(), TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),

*MI->memoperands_begin(), *MI->memoperands_begin(),

RS); RS);

MI->eraseFromParent(); MI->eraseFromParent();

break; break;

} }

default: { default: {

// Other access to frame index

const DebugLoc &DL = MI->getDebugLoc(); const DebugLoc &DL = MI->getDebugLoc();

int64_t Offset = FrameInfo.getObjectOffset(Index); int64_t Offset = FrameInfo.getObjectOffset(Index);

if (ST.enableFlatScratch()) { if (ST.enableFlatScratch()) {

if (TII->isFLATScratch(*MI)) { if (TII->isFLATScratch(*MI)) {

assert((int16_t)FIOperandNum == assert((int16_t)FIOperandNum ==

AMDGPU::getNamedOperandIdx(MI->getOpcode(), AMDGPU::getNamedOperandIdx(MI->getOpcode(),

AMDGPU::OpName::saddr)); AMDGPU::OpName::saddr));


llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

Show First 20 Lines • Show All 477 Lines • ▼ Show 20 Lines	define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
ret void		ret void
}		}

; The byval argument exceeds the MUBUF constant offset, so a scratch		; The byval argument exceeds the MUBUF constant offset, so a scratch
; register is needed to access the CSR VGPR slot.		; register is needed to access the CSR VGPR slot.
; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:		; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008		; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008		; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; MUBUF-DAG: s_add_u32 s32, s32, 0x40300{{$}}		; MUBUF-DAG: s_add_u32 s32, s32, 0x40300{{$}}
; FLATSCR-DAG: s_add_u32 s32, s32, 0x100c{{$}}		; FLATSCR-DAG: s_add_u32 s32, s32, 0x100c{{$}}
; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; MUBUF: v_readlane_b32 s4, [[CSR_VGPR]], 0		; MUBUF: v_readlane_b32 s4, [[CSR_VGPR]], 0
; FLATSCR: v_readlane_b32 s0, [[CSR_VGPR]], 0		; FLATSCR: v_readlane_b32 s0, [[CSR_VGPR]], 0
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: v_readlane_b32 s5, [[CSR_VGPR]], 1		; MUBUF: v_readlane_b32 s5, [[CSR_VGPR]], 1
; FLATSCR: v_readlane_b32 s1, [[CSR_VGPR]], 1		; FLATSCR: v_readlane_b32 s1, [[CSR_VGPR]], 1
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x40300{{$}}		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x40300{{$}}
; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x100c{{$}}		; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x100c{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008		; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008		; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #1 {		define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	call void asm sideeffect "; clobber all VGPRs except CSR v40",
,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"()		,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"()
ret void		ret void
}		}

; If the size of the offset exceeds the MUBUF offset field we need another		; If the size of the offset exceeds the MUBUF offset field we need another
; scratch VGPR to hold the offset.		; scratch VGPR to hold the offset.
; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset		; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset
; MUBUF: s_or_saveexec_b64 s[4:5], -1		; MUBUF: s_or_saveexec_b64 s[4:5], -1
; MUBUF: v_mov_b32_e32 v0, 0x1008		; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200
; MUBUF-NEXT: buffer_store_dword v39, v0, s[0:3], s32 offen ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v39, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; MUBUF: v_mov_b32_e32 v0, s33		; MUBUF-NEXT: v_mov_b32_e32 v0, s33
; GCN-NOT: v_mov_b32_e32 v0, 0x100c		; GCN-NOT: v_mov_b32_e32 v0, 0x100c
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x100c		; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40300
; MUBUF-NEXT: buffer_store_dword v0, v1, s[0:3], s32 offen ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR: s_add_u32 [[SOFF:s[0-9]+]], s33, 0x1004		; FLATSCR: s_add_u32 [[SOFF:s[0-9]+]], s33, 0x1004
; FLATSCR: v_mov_b32_e32 v0, 0		; FLATSCR: v_mov_b32_e32 v0, 0
; FLATSCR: scratch_store_dword off, v0, [[SOFF]]		; FLATSCR: scratch_store_dword off, v0, [[SOFF]]
define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #3 {		define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #3 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",		call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
Show All 23 Lines
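
The MUBUF check lines in this file change from a VGPR holding `0x1008` (with `offen`) to an SGPR add of `0x40200`. The two constants appear to be related by the wavefront-size scaling noted in `buildSpillLoadStore` (`Offset *= ST.getWavefrontSize()` for the non-flat case). The sketch below checks that arithmetic, assuming a wavefront size of 64, which matches the `s_or_saveexec_b64` in the checks; `scaledOffset` is an illustrative name, not an LLVM function.

```cpp
#include <cstdint>

// Scale a per-lane byte offset by the wavefront size, as done for
// non-flat (MUBUF) scratch accesses when the offset goes into an SGPR.
constexpr std::uint64_t scaledOffset(std::uint64_t PerLaneOffset,
                                     unsigned WavefrontSize) {
  return PerLaneOffset * WavefrontSize;
}

// Values taken from the updated MUBUF check lines (wave64 assumed).
static_assert(scaledOffset(0x1008, 64) == 0x40200, "CSR VGPR slot");
static_assert(scaledOffset(0x100c, 64) == 0x40300, "FP spill slot");
```

The FLATSCR lines keep the unscaled `0x1008`, consistent with the scaling being applied only when `IsFlat` is false.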

llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir


	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255			liveins: 
$vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255

	; GFX8-LABEL: name: pei_scavenge_vgpr_spill			; GFX8-LABEL: name: pei_scavenge_vgpr_spill
	; GFX8: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2		
	; GFX8: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX8: $sgpr32 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; GFX8: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc
	; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)			; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)
	; GFX8: $sgpr32 = S_SUB_U32 $sgpr32, 8196, implicit-def $scc
	; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX8: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; GFX8: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; GFX8: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; GFX8: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; GFX8: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; GFX8: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; GFX8: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; GFX8: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; GFX8: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX8: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)			; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
	; GFX8: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX8: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX8: $vcc_lo = S_MOV_B32 8192			; GFX8: $vcc_lo = S_MOV_B32 8192
	; GFX8: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec			; GFX8: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec
	; GFX8: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec			; GFX8: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
	; GFX8: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; GFX8: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; GFX8: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; GFX8: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX8: $vgpr0 = V_MOV_B32_e32 8196, implicit $exec			; GFX8: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc
	; GFX8: $vgpr2 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)			; GFX8: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
	; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX8: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)			; GFX8: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
	; GFX8: S_ENDPGM 0, csr_amdgpu_allvgprs			; GFX8: S_ENDPGM 0, csr_amdgpu_allvgprs
	; GFX9-LABEL: name: pei_scavenge_vgpr_spill			; GFX9-LABEL: name: pei_scavenge_vgpr_spill
	; GFX9: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2		
	; GFX9: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9: $sgpr32 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; GFX9: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc
	; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)
	; GFX9: $sgpr32 = S_SUB_U32 $sgpr32, 8196, implicit-def $scc
	; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; GFX9: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; GFX9: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; GFX9: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; GFX9: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; GFX9: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; GFX9: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; GFX9: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; GFX9: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX9: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
	; GFX9: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX9: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX9: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec			; GFX9: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec
	; GFX9: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec			; GFX9: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
	; GFX9: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; GFX9: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; GFX9: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; GFX9: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9: $vgpr0 = V_MOV_B32_e32 8196, implicit $exec			; GFX9: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc
	; GFX9: $vgpr2 = BUFFER_LOAD_DWORD_OFFEN killed $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)			; GFX9: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
	; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)			; GFX9: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 20, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
	; GFX9: S_ENDPGM 0, csr_amdgpu_allvgprs			; GFX9: S_ENDPGM 0, csr_amdgpu_allvgprs
	; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill			; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill
	; GFX9-FLATSCR: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX9-FLATSCR: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9-FLATSCR: $sgpr4 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; GFX9-FLATSCR: $sgpr6 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc
	; GFX9-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr2, killed $sgpr4, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.3, addrspace 5)			; GFX9-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.3, addrspace 5)
	; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9-FLATSCR: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; GFX9-FLATSCR: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; GFX9-FLATSCR: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 8191, implicit-def $scc			; GFX9-FLATSCR: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 8191, implicit-def $scc
	; GFX9-FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def $scc			; GFX9-FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def $scc
	; GFX9-FLATSCR: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 24576, implicit-def $scc			; GFX9-FLATSCR: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 24576, implicit-def $scc
	; GFX9-FLATSCR: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec			; GFX9-FLATSCR: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec
	; GFX9-FLATSCR: $vcc_hi = S_ADD_U32 $sgpr33, 8192, implicit-def $scc			; GFX9-FLATSCR: $vcc_hi = S_ADD_U32 $sgpr33, 8192, implicit-def $scc
	; GFX9-FLATSCR: $vgpr0 = V_OR_B32_e32 killed $vcc_hi, $vgpr1, implicit $exec			; GFX9-FLATSCR: $vgpr0 = V_OR_B32_e32 killed $vcc_hi, $vgpr1, implicit $exec
	; GFX9-FLATSCR: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 24576, implicit-def $scc			; GFX9-FLATSCR: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 24576, implicit-def $scc
	; GFX9-FLATSCR: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; GFX9-FLATSCR: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9-FLATSCR: $sgpr4 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; GFX9-FLATSCR: $sgpr6 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc
	; GFX9-FLATSCR: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr4, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.3, addrspace 5)			; GFX9-FLATSCR: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.3, addrspace 5)
	; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9-FLATSCR: S_ENDPGM 0, csr_amdgpu_allvgprs			; GFX9-FLATSCR: S_ENDPGM 0, csr_amdgpu_allvgprs
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec
	S_ENDPGM 0, csr_amdgpu_allvgprs			S_ENDPGM 0, csr_amdgpu_allvgprs
	...			...
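
Note on the constants in the checks above: on the MUBUF paths (GFX8, GFX9) the new SGPR offsets appear to be the old per-lane byte offsets multiplied by 64, e.g. 8196 * 64 = 524544, while the GFX9-FLATSCR checks keep the unscaled 8196. The same factor relates the frame-setup constants of the two modes (8191 vs. 524224, 24576 vs. 1572864). The sketch below only reproduces that arithmetic. The names `kWaveSize` and `scaleToWaveOffset` are hypothetical and are not taken from SIRegisterInfo.cpp; the wave size of 64 is an assumption inferred from the numbers in the test.

```cpp
#include <cstdint>

// Assumption: wave64, so per-lane scratch offsets are scaled by 64 when
// they are folded into the stack pointer SGPR on the MUBUF path.
constexpr std::uint32_t kWaveSize = 64;

// Hypothetical helper: convert a per-lane byte offset into the value that
// is added to the (unswizzled) stack pointer SGPR.
constexpr std::uint32_t scaleToWaveOffset(std::uint32_t perLaneBytes,
                                          std::uint32_t waveSize = kWaveSize) {
  return perLaneBytes * waveSize;
}

// Constants taken from the GFX8/GFX9 vs. GFX9-FLATSCR checks above.
static_assert(scaleToWaveOffset(8196) == 524544, "spill slot offset");
static_assert(scaleToWaveOffset(8191) == 524224, "alignment rounding term");
static_assert(scaleToWaveOffset(24576) == 1572864, "frame size");
```

If this reading is right, the old left-hand checks (S_ADD_U32 $sgpr32, 8196) added an unscaled offset to the stack pointer, whereas the new ones add the scaled value into a scavenged SGPR and leave $sgpr32 untouched.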

llvm/test/CodeGen/AMDGPU/stack-realign.ll

	Show First 20 Lines • Show All 286 Lines • ▼ Show 20 Lines
	}			}

	define void @spill_bp_to_memory_scratch_reg_needed_mubuf_offset(<32 x i32> %a, i32 %b, [4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #5 {			define void @spill_bp_to_memory_scratch_reg_needed_mubuf_offset(<32 x i32> %a, i32 %b, [4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #5 {
	; If the size of the offset exceeds the MUBUF offset field we need another			; If the size of the offset exceeds the MUBUF offset field we need another
	; scratch VGPR to hold the offset.			; scratch VGPR to hold the offset.

	; GCN-LABEL: spill_bp_to_memory_scratch_reg_needed_mubuf_offset			; GCN-LABEL: spill_bp_to_memory_scratch_reg_needed_mubuf_offset
	; GCN: s_or_saveexec_b64 s[4:5], -1			; GCN: s_or_saveexec_b64 s[4:5], -1
	; GCN: v_mov_b32_e32 v0, s33			; GCN-NEXT: s_add_u32 s6, s32, 0x42100
				; GCN-NEXT: buffer_store_dword v39, off, s[0:3], s6 ; 4-byte Folded Spill
				; GCN-NEXT: v_mov_b32_e32 v0, s33
	; GCN-NOT: v_mov_b32_e32 v0, 0x1088			; GCN-NOT: v_mov_b32_e32 v0, 0x1088
	; GCN-NEXT: v_mov_b32_e32 v1, 0x1088			; GCN-NEXT: s_add_u32 s6, s32, 0x42200
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s32 offen			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
	; GCN: v_mov_b32_e32 v0, s34			; GCN-NEXT: v_mov_b32_e32 v0, s34
	; GCN-NOT: v_mov_b32_e32 v0, 0x108c			; GCN-NOT: v_mov_b32_e32 v0, 0x108c
	; GCN-NEXT: v_mov_b32_e32 v1, 0x108c			; GCN-NEXT: s_add_u32 s6, s32, 0x42300
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s32 offen			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
	%local_val = alloca i32, align 128, addrspace(5)			%local_val = alloca i32, align 128, addrspace(5)
	store volatile i32 %b, i32 addrspace(5)* %local_val, align 128			store volatile i32 %b, i32 addrspace(5)* %local_val, align 128

	call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",			call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
	"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}			"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
	,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}			,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
	,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}			,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}
	,~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}			,~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}
	Show All 21 Lines
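
Note on the stack-realign.ll checks above: the test comment refers to offsets that exceed the MUBUF offset field. Before this change the checks materialized the offset in a second VGPR (v1 = 0x1088, used with offen); after it, they expect an SGPR add (s6 = s32 + 0x42200) and a store with the VGPR offset turned off. The sketch below is a minimal illustration of that decision under two assumptions that are not stated on this page: a 12-bit unsigned MUBUF immediate (maximum 4095) and a wave size of 64. The function names are hypothetical and do not come from the patch.

```cpp
#include <cstdint>

// Assumption: MUBUF immediate offset is a 12-bit unsigned field.
constexpr std::uint32_t kMaxMubufImmOffset = 4095;
// Assumption: wave64 scaling for offsets folded into an SGPR.
constexpr std::uint32_t kWaveSize = 64;

// Hypothetical check: can the per-lane offset be encoded directly in the
// instruction's immediate offset field?
constexpr bool fitsInMubufImmediate(std::uint32_t perLaneBytes) {
  return perLaneBytes <= kMaxMubufImmOffset;
}

// Hypothetical helper: the value added to the stack pointer SGPR when the
// offset does not fit in the immediate field.
constexpr std::uint32_t sgprOffsetFor(std::uint32_t perLaneBytes) {
  return perLaneBytes * kWaveSize;
}

// 0x1088 and 0x108c are the old VGPR offsets; 0x42200 and 0x42300 are the
// new SGPR add operands in the checks above.
static_assert(!fitsInMubufImmediate(0x1088), "needs a register offset");
static_assert(sgprOffsetFor(0x1088) == 0x42200, "FP spill slot");
static_assert(sgprOffsetFor(0x108c) == 0x42300, "BP spill slot");
// An offset like the 20 used for %stack.4 in the MIR test fits directly.
static_assert(fitsInMubufImmediate(20), "fits in the immediate field");
```

Using an SGPR for the large offset avoids clobbering an extra VGPR in the prolog, which is consistent with the removed `v_mov_b32_e32 v1, ...` lines in the left column.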
