
[AMDGPU] Reserving VGPR for future SGPR Spill
ClosedPublic

Authored by saiislam on Nov 18 2019, 2:00 AM.

Details

Summary

One VGPR is reserved to handle a future SGPR spill if the "--amdgpu-reserve-vgpr-for-sgpr-spill" option is used.

Diff Detail

Event Timeline

saiislam created this revision.Nov 18 2019, 2:00 AM
Herald added a project: Restricted Project.Nov 18 2019, 2:00 AM
saiislam updated this revision to Diff 229782.Nov 18 2019, 2:53 AM

Corrected formatting

saiislam edited the summary of this revision. (Show Details)Nov 18 2019, 2:55 AM
arsenm added a comment.Apr 2 2020, 3:00 PM

Can you add a test for this?

saiislam updated this revision to Diff 256537.Apr 10 2020, 3:17 AM

Removed the reserved VGPR from the RA list. Added a test case which
ensures spilling of an SGPR into the reserved VGPR.

arsenm added inline comments.Apr 10 2020, 7:33 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
100

This should be the default, but the test disruption is probably significant without forcing the VGPR to be the high register and adjusting it down after RA
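The scheme this comment suggests can be sketched outside LLVM like this (a minimal standalone model, not the LLVM API; the register count and function names are assumptions for illustration):

```cpp
#include <set>

// VGPRs modeled as indices 0..255 (a gfx9-style count, assumed).
constexpr int NumVGPRs = 256;

// Before RA: hand out the highest free VGPR, so the allocator's
// preference for low registers (and existing test output) is disturbed
// as little as possible.
int reserveHighestFreeVGPR(const std::set<int> &Unavailable) {
  for (int R = NumVGPRs - 1; R >= 0; --R)
    if (!Unavailable.count(R))
      return R;
  return -1; // no free VGPR
}

// After RA: move the reserved VGPR down to the lowest register the
// allocator left unused, keeping the function's reported VGPR count small.
int shiftReservedVGPRDown(const std::set<int> &UsedAfterRA) {
  for (int R = 0; R < NumVGPRs; ++R)
    if (!UsedAfterRA.count(R))
      return R;
  return -1; // every VGPR is in use
}
```

Reserving high first and shifting down later is what keeps the pre-RA reservation from perturbing the allocator's low-register choices.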

llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
13

I don't see how this test really stresses the need for SGPR spilling with no free VGPRs. I would expect this to look more like the tests in spill-wide-sgpr.ll or spill-scavenge-offset.ll to force high register pressure

21

This is the removed form of the attribute. This should be something like frame-pointer=none, although I don't think it matters here
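For reference, the modern spelling of the attribute mentioned here would be along these lines (a sketch; as the comment notes, the test may not need it at all):

```llvm
define void @parent_func() #0 {
  ret void
}

attributes #0 = { "frame-pointer"="none" }
```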

saiislam marked 2 inline comments as done.Apr 10 2020, 9:58 PM
saiislam added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
100

Yes, you are right. In the current state, 285 AMDGPU codegen test cases fail when it is made the default.

llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
13

The arguments of the function take up all 256 VGPRs, so the FP (s34) and SGPRs (s30 and s31) are spilled into the reserved VGPR (v32) and restored later.
Tests in spill-wide-sgpr.ll also enforce 2-, 4-, and 8-way spilling of SGPRs, which I can do here by modifying the body of parent_func while keeping the function signatures of both functions. Will that be OK?

saiislam updated this revision to Diff 259408.Apr 22 2020, 2:45 PM

Now the highest available VGPR is reserved at the beginning, and it is
swapped with the lowest available VGPR after RA.

arsenm added inline comments.Apr 22 2020, 3:26 PM
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
320–327

Should split into another function. It also isn't ensuring it's a CSR?

359

Commented out code

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
488–489

This shouldn't be separate from the existing SGPR spill infrastructure. This is only pre-allocating one register

saiislam updated this revision to Diff 261735.May 3 2020, 8:45 PM

Rebased and added code to properly shift the reserved VGPR down.

saiislam updated this revision to Diff 261736.May 3 2020, 8:50 PM

Removed commented code.

saiislam updated this revision to Diff 262067.May 5 2020, 4:51 AM

Handled a few segfaults in the lit tests.

arsenm added inline comments.May 5 2020, 7:19 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10867

Should add a note that this is a hack; if we split SGPR allocation from VGPR we shouldn't need to do this

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
245

Don't need = NoRegister

254

Don't need != NoRegister. Also can invert and return early

258

doesn't need to be a reference

261–263

You can just unconditionally removeLiveIn. The isLiveIn is just a redundant map lookup

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
573–574

Should get a comment about what this is for. A better name might be removeVGPRForSGPRSpill?

575

++i

580–581

Can unconditionally removeLiveIn

582

You're going to end up calling sortUniqueLiveIns for every block, for every spill VGPR. The common case is only 1 spill VGPR, but you could collect all of the registers and do one walk through the function
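The batching this comment describes could look like the following (a standalone sketch with invented types, not the MachineBasicBlock live-in API):

```cpp
#include <algorithm>
#include <vector>

// Stand-in for a basic block's live-in register list (invented type).
struct Block {
  std::vector<int> LiveIns;
};

// Add all spill VGPRs to every block's live-ins, sorting and
// de-duplicating once per block instead of once per (block, register)
// pair.
void addSpillVGPRLiveIns(std::vector<Block> &Blocks,
                         const std::vector<int> &SpillVGPRs) {
  for (Block &B : Blocks) {
    B.LiveIns.insert(B.LiveIns.end(), SpillVGPRs.begin(), SpillVGPRs.end());
    std::sort(B.LiveIns.begin(), B.LiveIns.end());
    B.LiveIns.erase(std::unique(B.LiveIns.begin(), B.LiveIns.end()),
                    B.LiveIns.end());
  }
}
```

Collecting the registers first keeps the sort/unique cost at one pass per block, regardless of how many spill VGPRs there are.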

llvm/test/CodeGen/AMDGPU/reserve-vgpr-for-sgpr-spill.ll
13

This won't use all 256 VGPRs. The arguments go to the stack after > 32. It happens that an external call looks like it uses a lot of registers, but this is likely to change in the future. A more reliable way would be to use an ugly blob of asm, like

call void asm sideeffect "",
  "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
  ,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
  ,~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29}
  ,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}
  ,~{v40},~{v41},~{v42},~{v43},~{v44},~{v45},~{v46},~{v47},~{v48},~{v49}
  ,~{v50},~{v51},~{v52},~{v53},~{v54},~{v55},~{v56},~{v57},~{v58},~{v59}
  ,~{v60},~{v61},~{v62},~{v63},~{v64},~{v65},~{v66},~{v67},~{v68},~{v69}
  ,~{v70},~{v71},~{v72},~{v73},~{v74},~{v75},~{v76},~{v77},~{v78},~{v79}
  ,~{v80},~{v81},~{v82},~{v83},~{v84},~{v85},~{v86},~{v87},~{v88},~{v89}
  ,~{v90},~{v91},~{v92},~{v93},~{v94},~{v95},~{v96},~{v97},~{v98},~{v99}
  ,~{v100},~{v101},~{v102},~{v103},~{v104},~{v105},~{v106},~{v107},~{v108},~{v109}
  ,~{v110},~{v111},~{v112},~{v113},~{v114},~{v115},~{v116},~{v117},~{v118},~{v119}
  ,~{v120},~{v121},~{v122},~{v123},~{v124},~{v125},~{v126},~{v127},~{v128},~{v129}
  ,~{v130},~{v131},~{v132},~{v133},~{v134},~{v135},~{v136},~{v137},~{v138},~{v139}
  ,~{v140},~{v141},~{v142},~{v143},~{v144},~{v145},~{v146},~{v147},~{v148},~{v149}
  ,~{v150},~{v151},~{v152},~{v153},~{v154},~{v155},~{v156},~{v157},~{v158},~{v159}
  ,~{v160},~{v161},~{v162},~{v163},~{v164},~{v165},~{v166},~{v167},~{v168},~{v169}
  ,~{v170},~{v171},~{v172},~{v173},~{v174},~{v175},~{v176},~{v177},~{v178},~{v179}
  ,~{v180},~{v181},~{v182},~{v183},~{v184},~{v185},~{v186},~{v187},~{v188},~{v189}
  ,~{v190},~{v191},~{v192},~{v193},~{v194},~{v195},~{v196},~{v197},~{v198},~{v199}
  ,~{v200},~{v201},~{v202},~{v203},~{v204},~{v205},~{v206},~{v207},~{v208},~{v209}
  ,~{v210},~{v211},~{v212},~{v213},~{v214},~{v215},~{v216},~{v217},~{v218},~{v219}
  ,~{v220},~{v221},~{v222},~{v223},~{v224},~{v225},~{v226},~{v227},~{v228},~{v229}
  ,~{v230},~{v231},~{v232},~{v233},~{v234},~{v235},~{v236},~{v237},~{v238},~{v239}
  ,~{v240},~{v241},~{v242},~{v243},~{v244},~{v245},~{v246},~{v247},~{v248},~{v249}
  ,~{v250},~{v251},~{v252},~{v253},~{v254},~{v255}"() #1

You can also use amdgpu_max_waves_per_eu to artificially limit the number of available VGPRs and reduce the number of registers you have to list
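The attribute-based variant of that suggestion might look like the following IR sketch (the "amdgpu-waves-per-eu" attribute name and values are assumptions here; the exact VGPR budget per occupancy level depends on the subtarget):

```llvm
define void @high_pressure_func() #0 {
  ; With occupancy forced high, only a short clobber list is needed to
  ; exhaust the per-wave VGPR budget.
  call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"()
  ret void
}

attributes #0 = { "amdgpu-waves-per-eu"="10,10" }
```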

saiislam updated this revision to Diff 262485.May 6 2020, 3:07 PM

Fixed failing test cases. VGPR reservation now happens in an
independent function and doesn't overload allocateSGPRSpillToVGPR.
Updated the test case to hard-code usage of all but one VGPR and use
it for the SGPR spill.

arsenm added inline comments.May 6 2020, 4:35 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
347–348

Better name might be reserveVGPRforSGPRSpills?

355

Define and initialize LaneVGPR at the same time

358

Don't need this? It was never added?

361

Don't need DummyFI, can just pass None

saiislam updated this revision to Diff 262508.May 6 2020, 5:23 PM

Cleaned up the code for reserving the VGPR.

cdevadas added inline comments.May 7 2020, 9:26 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315

Doesn't it limit the total allowable SGPR spills to 64?
What happens if more than 64 callee-saved SGPRs are used in a function?

saiislam marked an inline comment as done.May 7 2020, 10:45 AM
saiislam added inline comments.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315

Yes, this patch is limited to 64 SGPR spills only.
Does it make sense to handle that case in a separate patch?

kerbowa added inline comments.May 7 2020, 11:44 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315

It seems like there is a fallback to spilling to VMEM if there are no free lanes. Doesn't this also limit the number of SGPR to VGPR spills to 32 on Navi?

arsenm added inline comments.May 8 2020, 8:21 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
315

That fallback is 90% broken. This is ultimately still a workaround for the fact that SGPRs and VGPRs are allocated at the same time, so we can run out of VGPRs before we see all the SGPR spills that need to be handled
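The lane arithmetic behind the limits discussed in this thread can be spelled out (a sketch: each spilled 32-bit SGPR occupies one lane of the reserved VGPR, so capacity equals the wavefront size):

```cpp
// One SGPR spill slot per VGPR lane: 64 slots in wave64, 32 in wave32
// (the Navi default under discussion).
int sgprSpillSlotsPerVGPR(int WavefrontSize) {
  return WavefrontSize;
}

// Number of whole VGPRs needed to hold a given number of SGPR spills.
int vgprsNeededForSGPRSpills(int NumSGPRs, int WavefrontSize) {
  return (NumSGPRs + WavefrontSize - 1) / WavefrontSize;
}
```

With only one VGPR pre-reserved, anything beyond one VGPR's worth of slots has to fall back to the (mostly broken) VMEM path, which is the concern raised above.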

arsenm added inline comments.May 8 2020, 8:23 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10871–10872

Don't need the dummy stack object anymore

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
349

The FI argument is unused

saiislam updated this revision to Diff 262936.May 8 2020, 1:08 PM
saiislam marked an inline comment as not done.

Removed the unused frame index argument from the VGPR-reserving function. Added a condition to reserve the VGPR only if the machine function has stack objects.

saiislam updated this revision to Diff 262968.May 8 2020, 3:21 PM

Handled a test case that was failing only in debug builds.

saiislam updated this revision to Diff 263220.May 11 2020, 11:13 AM

Switched the nested loops in lowerShiftReservedVGPR() to move
sorting of unique live-ins to the outer loop. Improved cleanup of
the reserved VGPR after shifting it to a lower register, and when it
needs to be removed.

saiislam updated this revision to Diff 263228.May 11 2020, 11:33 AM

Removed a break

arsenm accepted this revision.May 11 2020, 12:31 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
584

Register()

This revision is now accepted and ready to land.May 11 2020, 12:31 PM
saiislam closed this revision.May 11 2020, 5:36 PM