This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The physical registers for the return address used by s_setpc need to go into a caller-save register and that seems to be wired to CCR_SGPR_64 defined in SIRegisterInfo.td as s0...s15.
That should be changed to s35..s63 for amdgpu_gfx functions.
I think that’s the cause for some of the spilling here and also for no_stack_call using stack (which it should not do).
Updates the call-clobbered SGPRs used for the return addresses of function calls to match the call-clobbered registers from the AMDGPU_Gfx calling convention. It does so by adding a new pseudo s_setpc_b64_return_gfx and changing the implementations so the new register class gets used.
The assembly changes look a lot better now.
Some of the mir tests have a lot of changes caused by the update scripts. Can you pre-commit these changes? (i.e. regenerate the tests on main, make sure they still pass, commit it and then rebase this review) That should make the diff more readable.
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | ||
---|---|---|
695–696 | If I understand this correctly, Gfx_CCR_SGPR_64 contains s[30:31], s[32:33], …, s[94:95] (33 pairs, starting with the 16th). |
We're working on a proposal for a new ABI register layout which somewhat conflicts with this. We want to ,move all special SGPR arguments to a compact range in a callee saved range. Some of these special registers will conflict with the range you are expecting to remain untouched
I think part of that new ABI layout is implemented in D102177.
There should be no conflict other than requiring the register ranges to be adjusted. The intent behind this change is to make the user-defined inreg arguments callee-save, so if some future ABI change moves the SGPR arguments to s[0:23], CSR_AMDGPU_SI_Gfx_SGPRs_4_29 should be changed to CSR_AMDGPU_SI_Gfx_SGPRs_0_23 accordingly and Gfx_CCR_SGPR_64 to s[28:29] + s[42:43], s[44:45], ….
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | ||
---|---|---|
697 | I think the 15 should be a 14. |
Rebased the patch on a commit with tests generated by a recent version of the Python test script to make some diffs easier to read.
Thanks for pre-regenerating the tests
llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll | ||
---|---|---|
20–41 | This diff does not look that good. My guess is this is because s33 is made caller-save but was callee-save before. Can you try if adding s[32:35] to the callee-save registers fixes that? |
llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll | ||
---|---|---|
20–41 | Is this correct? I thought, s33 is one of the reserved registers. |
llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll | ||
---|---|---|
20–41 | Yes, s33 is the frame pointer.
No, -NEXT means the match *has* to be the next line relative to the check before, so there can be no lines missing in-between. |
llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll | ||
---|---|---|
20–41 | Thanks. As discussed, this seems to be intended behavior because of the inreg argument. Before making s[32:35] callee-save, I am going to do a few experiments to see if this patch adds new unnecessary save-and-restore ISA for these registers in real life cases, if this is the case, I am going to mark them as callee-save . |
@sebastian-ne As far as I can tell from my tests, there is no additional spilling code generated for the SGPRs s[32:35]. I think we should be good to go with this revision and without additionally marking these registers as callee-save. A whole lot of superfluous s_mov_b32 instructions are removed by the patch, making the waterfall loops more clean, just as intended.
Thanks, for doing the analysis.
I get one warning with this patch, otherwise LGTM.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp:4356:11: warning: enumeration value 'RET_GFX_FLAG' not handled in switch [-Wswitch] switch ((AMDGPUISD::NodeType)Opcode) { ^ 1 warning generated.
If I understand this correctly, Gfx_CCR_SGPR_64 contains s[30:31], s[32:33], …, s[94:95] (33 pairs, starting with the 16th).
But it should not contain reserved registers like the stack pointer (s32, 33, 34) and also not contain callee-save registers (s64, s65, …).