One VGPR register is allocated to handle a future spill of SGPR if "--amdgpu-reserve-vgpr-for-sgpr-spill" option is used
Details
Diff Detail
Event Timeline
Removed reserved VGPR from RA list. Added a test case which
ensures spilling of SGPR into reserved VGPR.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
105 | This should be the default, but the test disruption is probably significant without forcing the VGPR to be the high register, and adjust it down after RA | |
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll | ||
12 ↗ | (On Diff #256537) | I don't see how this test really stresses the need for SGPR spilling with no free VGPRs. I would expect this to look more like the tests in spill-wide-sgpr.ll, or spill-scavenge-offset.ll to force a high pressure |
20 ↗ | (On Diff #256537) | This is the removed form of the attribute. This should be something like frame-pointer=none, although I don't think it matters here |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
105 | Yes, you are right. 285 test cases in AMDGPU codegen are failing by making it default, in the current state. | |
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll | ||
12 ↗ | (On Diff #256537) | The arguments of the function are taking up all the 256 VGPRs, so FP (s34) and SGPRs (s30 and s31) are getting spilled into reserved VGPR (v32), and restored later. |
Now highest available VGPR is reserved in the beginning and it is
switched with lowest available VGPR after RA.
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp | ||
---|---|---|
271–278 ↗ | (On Diff #259408) | Should split into another function. It also isn't ensuring it's a CSR? |
318 ↗ | (On Diff #259408) | Commented out code |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
488–489 ↗ | (On Diff #259408) | This shouldn't be separate from the existing SGPR spill infrastructure. This is only pre-allocating one register |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
10799 | Should add a note that this is a hack; if we split SGPR allocation from VGPR we shouldn't need to do this | |
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp | ||
245 ↗ | (On Diff #262067) | Don't need = NoRegister |
254 ↗ | (On Diff #262067) | Don't need != NoRegister. Also can invert and return early |
258 ↗ | (On Diff #262067) | doesn't need to be a reference |
261–263 ↗ | (On Diff #262067) | You can just unconditionally removeLiveIn. The isLiveIn is just a redundant map lookup |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
543–544 ↗ | (On Diff #262067) | Should get a comment about what this is for. A better name might be removeVGPRForSGPRSpill? |
545 ↗ | (On Diff #262067) | ++i |
550–551 ↗ | (On Diff #262067) | Can unconditionally removeLiveIn |
552 ↗ | (On Diff #262067) | You're going to end up calling sortUniqueLiveIns for every block, for every spill VGPR. The common case is only 1 spill VGPR, but you could collect all of the registers and do one walk through the function |
llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll | ||
12 ↗ | (On Diff #256537) | This won't use all 256 VGPRs. The arguments go to the stack after > 32. It happens to look like an external call looks like it uses a lot of registers, but this is likely to change in the future. A more reliable way would be to use an ugly blob of asm, like call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9} ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19} ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29} ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39} ,~{v40},~{v41},~{v42},~{v43},~{v44},~{v45},~{v46},~{v47},~{v48},~{v49} ,~{v50},~{v51},~{v52},~{v53},~{v54},~{v55},~{v56},~{v57},~{v58},~{v59} ,~{v60},~{v61},~{v62},~{v63},~{v64},~{v65},~{v66},~{v67},~{v68},~{v69} ,~{v70},~{v71},~{v72},~{v73},~{v74},~{v75},~{v76},~{v77},~{v78},~{v79} ,~{v80},~{v81},~{v82},~{v83},~{v84},~{v85},~{v86},~{v87},~{v88},~{v89} ,~{v90},~{v91},~{v92},~{v93},~{v94},~{v95},~{v96},~{v97},~{v98},~{v99} ,~{v100},~{v101},~{v102},~{v103},~{v104},~{v105},~{v106},~{v107},~{v108},~{v109} ,~{v110},~{v111},~{v112},~{v113},~{v114},~{v115},~{v116},~{v117},~{v118},~{v119} ,~{v120},~{v121},~{v122},~{v123},~{v124},~{v125},~{v126},~{v127},~{v128},~{v129} ,~{v130},~{v131},~{v132},~{v133},~{v134},~{v135},~{v136},~{v137},~{v138},~{v139} ,~{v140},~{v141},~{v142},~{v143},~{v144},~{v145},~{v146},~{v147},~{v148},~{v149} ,~{v150},~{v151},~{v152},~{v153},~{v154},~{v155},~{v156},~{v157},~{v158},~{v159} ,~{v160},~{v161},~{v162},~{v163},~{v164},~{v165},~{v166},~{v167},~{v168},~{v169} ,~{v170},~{v171},~{v172},~{v173},~{v174},~{v175},~{v176},~{v177},~{v178},~{v179} ,~{v180},~{v181},~{v182},~{v183},~{v184},~{v185},~{v186},~{v187},~{v188},~{v189} ,~{v190},~{v191},~{v192},~{v193},~{v194},~{v195},~{v196},~{v197},~{v198},~{v199} ,~{v200},~{v201},~{v202},~{v203},~{v204},~{v205},~{v206},~{v207},~{v208},~{v209} ,~{v210},~{v211},~{v212},~{v213},~{v214},~{v215},~{v216},~{v217},~{v218},~{v219} ,~{v220},~{v221},~{v222},~{v223},~{v224},~{v225},~{v226},~{v227},~{v228},~{v229} ,~{v230},~{v231},~{v232},~{v233},~{v234},~{v235},~{v236},~{v237},~{v238},~{v239} ,~{v240},~{v241},~{v242},~{v243},~{v244},~{v245},~{v246},~{v247},~{v248},~{v249} ,~{v250},~{v251},~{v252},~{v253},~{v254},~{v255}"() #1 You can also use amdgpu_max_waves_per_eu to artifically limit the number of available VGPRs to reduce number of registers you have to list |
Fixed failing test cases. Now VGPR reservation happens in an
independent function and doesn't overload allocateSGPRSpillToVGPR.
Updated test case to hard code usage of all but one VGPRs and use
it for SGPR spill.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
347–348 ↗ | (On Diff #262485) | Better name might be reserveVGPRforSGPRSpills? |
355 ↗ | (On Diff #262485) | Define and initialize LaneVGPR at the same time |
358 ↗ | (On Diff #262485) | Don't need this? It was never added? |
361 ↗ | (On Diff #262485) | Don't need DummyFI, can just pass None |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
315 ↗ | (On Diff #262508) | Doesn't it limit the total allowable SGPR spills to 64? |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
315 ↗ | (On Diff #262508) | Yes, this patch is limited to 64 SGPR spills only. |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
315 ↗ | (On Diff #262508) | It seems like there is a fallback to spilling to VMEM if there are no free lanes. Doesn't this also limit the number of SGPR to VGPR spills to 32 on Navi? |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
315 ↗ | (On Diff #262508) | That fallback is 90% broken. This is ultimately still a workaround for the fact that SGPRs and VGPRs are allocated at the same time, so we can run out of VGPRs before we see all the SGPR spills that need to be handled |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
10803–10804 | Don't need the dummy stack object anymore | |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
349 ↗ | (On Diff #262508) | The FI argument is unused |
Removed unsued frame index argument from vgpr reserving function. Added condition to reserve vgpr only if machine function has stack objects.
Switched the nested loops in lowerShiftReservedVGPR() to move
sorting of unique live-ins to the outer loop. Improved cleanup of
reserved vgpr after lowering it to a lower value, and when it needs
to be removed.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
584 ↗ | (On Diff #263228) | Register() |
This should be the default, but the test disruption is probably significant without forcing the VGPR to be the high register, and adjust it down after RA