One VGPR register is allocated to handle future SGPR spills if the "--amdgpu-reserve-vgpr-for-sgpr-spill" option is used.
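As a rough illustration of how such a backend option is typically registered (a minimal sketch using the standard cl::opt mechanism; the variable name, description string, and flag attributes are assumptions, not taken from the patch):

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Hypothetical registration of the flag named in the summary; the actual
// patch may differ in wording and defaults.
static cl::opt<bool> ReserveVGPRForSGPRSpill(
    "amdgpu-reserve-vgpr-for-sgpr-spill",
    cl::desc("Reserve one VGPR to hold future SGPR spills"),
    cl::init(false), cl::Hidden);
```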
Diff Detail
Repository: rG LLVM Github Monorepo
Unit Tests

Time | Test
---|---
590 ms | LLVM.CodeGen/AMDGPU::Unknown Unit Message ("")
Event Timeline
Removed the reserved VGPR from the RA list. Added a test case which ensures spilling of an SGPR into the reserved VGPR.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Line | Comment
---|---
100 | This should be the default, but the test disruption is probably significant without forcing the VGPR to be the high register and adjusting it down after RA.

llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll

Line | Comment
---|---
13 | I don't see how this test really stresses the need for SGPR spilling with no free VGPRs. I would expect this to look more like the tests in spill-wide-sgpr.ll or spill-scavenge-offset.ll to force high pressure.
21 | This is the removed form of the attribute. This should be something like frame-pointer=none, although I don't think it matters here.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Line | Comment
---|---
100 | Yes, you are right. In the current state, 285 test cases in AMDGPU codegen fail when this is made the default.

llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll

Line | Comment
---|---
13 | The function's arguments take up all 256 VGPRs, so the FP (s34) and the SGPRs (s30 and s31) are spilled into the reserved VGPR (v32) and restored later.
Now the highest available VGPR is reserved at the beginning, and it is swapped with the lowest available VGPR after RA.
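A minimal sketch of what the first half of this scheme could look like (the helper name and its placement are assumptions, not the actual patch): walk the VGPR register class from the top before RA and take the first register that is not already reserved, so regular allocation is disturbed as little as possible.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

// Hypothetical helper: pick the highest VGPR in the class that is not
// otherwise reserved; after RA it would be shifted down to the lowest
// available VGPR.
static MCPhysReg pickHighestFreeVGPR(const MachineFunction &MF,
                                     const TargetRegisterClass &VGPRClass) {
  const MachineRegisterInfo &MRI = MF.getRegInfo();
  for (MCPhysReg Reg : reverse(VGPRClass))
    if (!MRI.isReserved(Reg))
      return Reg; // highest-numbered usable VGPR
  return 0;       // 0 is the "no register" encoding
}
```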
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

Line | Comment
---|---
319–326 | Should split into another function. It also isn't ensuring it's a CSR?
358 | Commented out code.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

Line | Comment
---|---
488–489 | This shouldn't be separate from the existing SGPR spill infrastructure. This is only pre-allocating one register.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Line | Comment
---|---
10867 | Should add a note that this is a hack; if we split SGPR allocation from VGPR we shouldn't need to do this.

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

Line | Comment
---|---
245 | Don't need = NoRegister.
254 | Don't need != NoRegister. Also can invert and return early.
258 | Doesn't need to be a reference.
261–263 | You can just unconditionally removeLiveIn. The isLiveIn is just a redundant map lookup. (See the sketch after this comment block.)

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
573–574 | Should get a comment about what this is for. A better name might be removeVGPRForSGPRSpill?
575 | ++i
580–581 | Can unconditionally removeLiveIn.
582 | You're going to end up calling sortUniqueLiveIns for every block, for every spill VGPR. The common case is only 1 spill VGPR, but you could collect all of the registers and do one walk through the function.

llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll

Line | Comment
---|---
13 | This won't use all 256 VGPRs. The arguments go to the stack after > 32. An external call happens to look like it uses a lot of registers, but this is likely to change in the future. A more reliable way would be to use an ugly blob of asm, like the call below. You can also use amdgpu_max_waves_per_eu to artificially limit the number of available VGPRs and reduce the number of registers you have to list.

```llvm
call void asm sideeffect "",
    "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
    ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
    ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29}
    ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}
    ,~{v40},~{v41},~{v42},~{v43},~{v44},~{v45},~{v46},~{v47},~{v48},~{v49}
    ,~{v50},~{v51},~{v52},~{v53},~{v54},~{v55},~{v56},~{v57},~{v58},~{v59}
    ,~{v60},~{v61},~{v62},~{v63},~{v64},~{v65},~{v66},~{v67},~{v68},~{v69}
    ,~{v70},~{v71},~{v72},~{v73},~{v74},~{v75},~{v76},~{v77},~{v78},~{v79}
    ,~{v80},~{v81},~{v82},~{v83},~{v84},~{v85},~{v86},~{v87},~{v88},~{v89}
    ,~{v90},~{v91},~{v92},~{v93},~{v94},~{v95},~{v96},~{v97},~{v98},~{v99}
    ,~{v100},~{v101},~{v102},~{v103},~{v104},~{v105},~{v106},~{v107},~{v108},~{v109}
    ,~{v110},~{v111},~{v112},~{v113},~{v114},~{v115},~{v116},~{v117},~{v118},~{v119}
    ,~{v120},~{v121},~{v122},~{v123},~{v124},~{v125},~{v126},~{v127},~{v128},~{v129}
    ,~{v130},~{v131},~{v132},~{v133},~{v134},~{v135},~{v136},~{v137},~{v138},~{v139}
    ,~{v140},~{v141},~{v142},~{v143},~{v144},~{v145},~{v146},~{v147},~{v148},~{v149}
    ,~{v150},~{v151},~{v152},~{v153},~{v154},~{v155},~{v156},~{v157},~{v158},~{v159}
    ,~{v160},~{v161},~{v162},~{v163},~{v164},~{v165},~{v166},~{v167},~{v168},~{v169}
    ,~{v170},~{v171},~{v172},~{v173},~{v174},~{v175},~{v176},~{v177},~{v178},~{v179}
    ,~{v180},~{v181},~{v182},~{v183},~{v184},~{v185},~{v186},~{v187},~{v188},~{v189}
    ,~{v190},~{v191},~{v192},~{v193},~{v194},~{v195},~{v196},~{v197},~{v198},~{v199}
    ,~{v200},~{v201},~{v202},~{v203},~{v204},~{v205},~{v206},~{v207},~{v208},~{v209}
    ,~{v210},~{v211},~{v212},~{v213},~{v214},~{v215},~{v216},~{v217},~{v218},~{v219}
    ,~{v220},~{v221},~{v222},~{v223},~{v224},~{v225},~{v226},~{v227},~{v228},~{v229}
    ,~{v230},~{v231},~{v232},~{v233},~{v234},~{v235},~{v236},~{v237},~{v238},~{v239}
    ,~{v240},~{v241},~{v242},~{v243},~{v244},~{v245},~{v246},~{v247},~{v248},~{v249}
    ,~{v250},~{v251},~{v252},~{v253},~{v254},~{v255}"() #1
```
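A minimal sketch of the cleanup suggested for SILowerSGPRSpills.cpp above (the function and variable names are hypothetical, not from the patch): invert the check to return early, drop the explicit NoRegister comparison, and call removeLiveIn unconditionally, since it is already a no-op when the register is not live-in.

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"

using namespace llvm;

// Hypothetical helper illustrating the review suggestions.
static void removeReservedVGPRLiveIns(MachineFunction &MF,
                                      MCRegister Reserved) {
  if (!Reserved.isValid())
    return; // inverted condition, early return

  for (MachineBasicBlock &MBB : MF) {
    // removeLiveIn does nothing when Reserved is not live-in, so a
    // preceding isLiveIn check would be a redundant lookup.
    MBB.removeLiveIn(Reserved);
  }
}
```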
Fixed failing test cases. VGPR reservation now happens in an independent function and doesn't overload allocateSGPRSpillToVGPR. Updated the test case to hard-code usage of all but one VGPR and use that one for the SGPR spill.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
315 | Doesn't it limit the total allowable SGPR spills to 64?
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
315 | Yes, this patch is limited to 64 SGPR spills only.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
315 | It seems like there is a fallback to spilling to VMEM if there are no free lanes. Doesn't this also limit the number of SGPR-to-VGPR spills to 32 on Navi?
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
315 | That fallback is 90% broken. This is ultimately still a workaround for the fact that SGPRs and VGPRs are allocated at the same time, so we can run out of VGPRs before we see all the SGPR spills that need to be handled.
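For context on the numbers in this exchange: each spilled SGPR occupies one 32-bit lane of the reserved VGPR, so the capacity of a single spill VGPR equals the wave size. The constants below are hardware facts; the snippet itself is only illustrative.

```cpp
// One v_writelane per spilled SGPR: a VGPR provides one 32-bit slot per
// lane in the wavefront, so capacity follows the wave size.
constexpr unsigned SpillSlotsWave64 = 64; // wave64 (pre-Navi default)
constexpr unsigned SpillSlotsWave32 = 32; // wave32 (available on Navi/gfx10)
```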
llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Line | Comment
---|---
10871–10872 | Don't need the dummy stack object anymore.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
349 | The FI argument is unused.
Removed the unused frame index argument from the VGPR-reserving function. Added a condition to reserve the VGPR only if the machine function has stack objects.
Switched the nested loops in lowerShiftReservedVGPR() to move sorting of unique live-ins to the outer loop. Improved cleanup of the reserved VGPR after shifting it down to a lower register, and when it needs to be removed.
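A minimal sketch of the restructured loops this update describes (the container and function names are assumptions): sortUniqueLiveIns now runs once per block rather than once per (block, register) pair, which addresses the earlier review comment.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"

using namespace llvm;

// Hypothetical shape of the loop after the change.
static void clearSpillVGPRLiveIns(
    MachineFunction &MF, const SmallVectorImpl<MCRegister> &SpillVGPRs) {
  for (MachineBasicBlock &MBB : MF) {
    for (MCRegister Reg : SpillVGPRs)
      MBB.removeLiveIn(Reg); // unconditional; no-op if not live-in
    MBB.sortUniqueLiveIns(); // once per block, not per (block, register)
  }
}
```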
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Line | Comment
---|---
584 | Register()