Download Raw Diff

Details

Reviewers

arsenm
rampitec
msearles
cdevadas

Commits

rG117e5609e98b: [AMDGPU] Reserving VGPR for future SGPR Spill

Summary

One VGPR register is allocated to handle a future spill of SGPR if "--amdgpu-reserve-vgpr-for-sgpr-spill" option is used

Diff Detail

Event Timeline

saiislam created this revision.Nov 18 2019, 2:00 AM

Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2019, 2:00 AM

Herald added subscribers: llvm-commits, hiraditya, t-tye and 7 others. · View Herald Transcript

Corrected formatting

saiislam edited the summary of this revision. (Show Details)Nov 18 2019, 2:55 AM

Can you add a test for this

Herald added a subscriber: kerbowa. · View Herald TranscriptApr 2 2020, 3:00 PM

Removed reserved VGPR from RA list. Added a test case which
ensures spilling of SGPR into reserved VGPR.

Herald added a subscriber: qcolombet. · View Herald TranscriptApr 10 2020, 3:17 AM

Harbormaster failed remote builds in B52654: Diff 256537!Apr 10 2020, 4:17 AM

arsenm added inline comments.Apr 10 2020, 7:33 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
105	This should be the default, but the test disruption is probably significant without forcing the VGPR to be the high register, and adjust it down after RA
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
12 ↗	(On Diff #256537)	I don't see how this test really stresses the need for SGPR spilling with no free VGPRs. I would expect this to look more like the tests in spill-wide-sgpr.ll, or spill-scavenge-offset.ll to force a high pressure
20 ↗	(On Diff #256537)	This is the removed form of the attribute. This should be something like frame-pointer=none, although I don't think it matters here

saiislam marked 2 inline comments as done.Apr 10 2020, 9:58 PM

saiislam added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
105	Yes, you are right. 285 test cases in AMDGPU codegen are failing by making it default, in the current state.
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
12 ↗	(On Diff #256537)	The arguments of the function are taking up all the 256 VGPRs, so FP (s34) and SGPRs (s30 and s31) are getting spilled into reserved VGPR (v32), and restored later. Tests in spill-wide-sgpr.ll are also enforcing 2,4,8 way spilling of SGPR, which I can do here by modifying the body of parent_func while keeping the function signature of both functions. Will it be ok then?

Now highest available VGPR is reserved in the beginning and it is
switched with lowest available VGPR after RA.

arsenm added inline comments.Apr 22 2020, 3:26 PM

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
271–278 ↗	(On Diff #259408)	Should split into another function. It also isn't ensuring it's a CSR?
318 ↗	(On Diff #259408)	Commented out code
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
488–489 ↗	(On Diff #259408)	This shouldn't be separate from the existing SGPR spill infrastructure. This is only pre-allocating one register

Harbormaster failed remote builds in B54317: Diff 259408!Apr 22 2020, 3:48 PM

Rebased and added code for proper lower shifting of reserved VGPR.

Removed commented code.

Harbormaster failed remote builds in B55593: Diff 261735!May 3 2020, 9:12 PM

Harbormaster failed remote builds in B55594: Diff 261736!May 3 2020, 9:44 PM

Handled few seg faults in the lit-tests.

Harbormaster failed remote builds in B55768: Diff 262067!May 5 2020, 5:53 AM

arsenm added inline comments.May 5 2020, 7:19 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10799	Should add a note that this is a hack; if we split SGPR allocation from VGPR we shouldn't need to do this
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
245 ↗	(On Diff #262067)	Don't need = NoRegister
254 ↗	(On Diff #262067)	Don't need != NoRegister. Also can invert and return early
258 ↗	(On Diff #262067)	doesn't need to be a reference
261–263 ↗	(On Diff #262067)	You can just unconditionally removeLiveIn. The isLiveIn is just a redundant map lookup
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
543–544 ↗	(On Diff #262067)	Should get a comment about what this is for. A better name might be removeVGPRForSGPRSpill?
545 ↗	(On Diff #262067)	++i
550–551 ↗	(On Diff #262067)	Can unconditionally removeLiveIn
552 ↗	(On Diff #262067)	You're going to end up calling sortUniqueLiveIns for every block, for every spill VGPR. The common case is only 1 spill VGPR, but you could collect all of the registers and do one walk through the function
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
12 ↗	(On Diff #256537)	This won't use all 256 VGPRs. The arguments go to the stack after > 32. It happens to look like an external call looks like it uses a lot of registers, but this is likely to change in the future. A more reliable way would be to use an ugly blob of asm, like call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9} ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19} ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29} ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39} ,~{v40},~{v41},~{v42},~{v43},~{v44},~{v45},~{v46},~{v47},~{v48},~{v49} ,~{v50},~{v51},~{v52},~{v53},~{v54},~{v55},~{v56},~{v57},~{v58},~{v59} ,~{v60},~{v61},~{v62},~{v63},~{v64},~{v65},~{v66},~{v67},~{v68},~{v69} ,~{v70},~{v71},~{v72},~{v73},~{v74},~{v75},~{v76},~{v77},~{v78},~{v79} ,~{v80},~{v81},~{v82},~{v83},~{v84},~{v85},~{v86},~{v87},~{v88},~{v89} ,~{v90},~{v91},~{v92},~{v93},~{v94},~{v95},~{v96},~{v97},~{v98},~{v99} ,~{v100},~{v101},~{v102},~{v103},~{v104},~{v105},~{v106},~{v107},~{v108},~{v109} ,~{v110},~{v111},~{v112},~{v113},~{v114},~{v115},~{v116},~{v117},~{v118},~{v119} ,~{v120},~{v121},~{v122},~{v123},~{v124},~{v125},~{v126},~{v127},~{v128},~{v129} ,~{v130},~{v131},~{v132},~{v133},~{v134},~{v135},~{v136},~{v137},~{v138},~{v139} ,~{v140},~{v141},~{v142},~{v143},~{v144},~{v145},~{v146},~{v147},~{v148},~{v149} ,~{v150},~{v151},~{v152},~{v153},~{v154},~{v155},~{v156},~{v157},~{v158},~{v159} ,~{v160},~{v161},~{v162},~{v163},~{v164},~{v165},~{v166},~{v167},~{v168},~{v169} ,~{v170},~{v171},~{v172},~{v173},~{v174},~{v175},~{v176},~{v177},~{v178},~{v179} ,~{v180},~{v181},~{v182},~{v183},~{v184},~{v185},~{v186},~{v187},~{v188},~{v189} ,~{v190},~{v191},~{v192},~{v193},~{v194},~{v195},~{v196},~{v197},~{v198},~{v199} ,~{v200},~{v201},~{v202},~{v203},~{v204},~{v205},~{v206},~{v207},~{v208},~{v209} ,~{v210},~{v211},~{v212},~{v213},~{v214},~{v215},~{v216},~{v217},~{v218},~{v219} ,~{v220},~{v221},~{v222},~{v223},~{v224},~{v225},~{v226},~{v227},~{v228},~{v229} ,~{v230},~{v231},~{v232},~{v233},~{v234},~{v235},~{v236},~{v237},~{v238},~{v239} ,~{v240},~{v241},~{v242},~{v243},~{v244},~{v245},~{v246},~{v247},~{v248},~{v249} ,~{v250},~{v251},~{v252},~{v253},~{v254},~{v255}"() #1 You can also use amdgpu_max_waves_per_eu to artifically limit the number of available VGPRs to reduce number of registers you have to list

Fixed failing test cases. Now VGPR reservation happens in an
independent function and doesn't overload allocateSGPRSpillToVGPR.
Updated test case to hard code usage of all but one VGPRs and use
it for SGPR spill.

Harbormaster failed remote builds in B55986: Diff 262485!May 6 2020, 3:55 PM

arsenm added inline comments.May 6 2020, 4:35 PM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
347–348 ↗	(On Diff #262485)	Better name might be reserveVGPRforSGPRSpills?
355 ↗	(On Diff #262485)	Define and initialize LaneVGPR at the same time
358 ↗	(On Diff #262485)	Don't need this? It was never added?
361 ↗	(On Diff #262485)	Don't need DummyFI, can just pass None

Cleaned code for reserving vgpr.

Harbormaster failed remote builds in B55995: Diff 262508!May 6 2020, 6:46 PM

saiislam added a reviewer: cdevadas.May 7 2020, 2:48 AM

cdevadas added inline comments.May 7 2020, 9:26 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315 ↗	(On Diff #262508)	Doesn't it limit the total allowable SGPR spills to 64? What happens if more than 64 callee-saved SGPRs used in the functions?

saiislam marked an inline comment as done.May 7 2020, 10:45 AM

saiislam added inline comments.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315 ↗	(On Diff #262508)	Yes, this patch is limited to 64 SGPR spills only. Does it make sense to handle that case in a separate patch?

madhur13490 added a subscriber: madhur13490.May 7 2020, 10:52 AM

kerbowa added inline comments.May 7 2020, 11:44 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315 ↗	(On Diff #262508)	It seems like there is a fallback to spilling to VMEM if there are no free lanes. Doesn't this also limit the number of SGPR to VGPR spills to 32 on Navi?

arsenm added inline comments.May 8 2020, 8:21 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315 ↗	(On Diff #262508)	That fallback is 90% broken. This is ultimately still a workaround for the fact that SGPRs and VGPRs are allocated at the same time, so we can run out of VGPRs before we see all the SGPR spills that need to be handled

arsenm added inline comments.May 8 2020, 8:23 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10803–10804	Don't need the dummy stack object anymore
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
349 ↗	(On Diff #262508)	The FI argument is unused

Removed unsued frame index argument from vgpr reserving function. Added condition to reserve vgpr only if machine function has stack objects.

Harbormaster failed remote builds in B56190: Diff 262936!May 8 2020, 1:58 PM

Handled a test case which was failing in debug only.

Harbormaster failed remote builds in B56205: Diff 262968!May 8 2020, 4:08 PM

Switched the nested loops in lowerShiftReservedVGPR() to move
sorting of unique live-ins to the outer loop. Improved cleanup of
reserved vgpr after lowering it to a lower value, and when it needs
to be removed.