Page MenuHomePhabricator

[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.

Authored by tsymalla on Oct 12 2021, 6:10 AM.



This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Diff Detail

Event Timeline

tsymalla created this revision.Oct 12 2021, 6:10 AM
tsymalla requested review of this revision.Oct 12 2021, 6:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 6:10 AM
tsymalla updated this revision to Diff 379025.Oct 12 2021, 7:08 AM

Re-formatted SIRegisterInfo.cpp.

The physical registers for the return address used by s_setpc need to go into a caller-save register and that seems to be wired to CCR_SGPR_64 defined in as s0...s15.
That should be changed to s35..s63 for amdgpu_gfx functions.
I think that’s the cause for some of the spilling here and also for no_stack_call using stack (which it should not do).

tsymalla planned changes to this revision.Oct 15 2021, 7:53 AM

Need to do a little bit more work on this.

tsymalla updated this revision to Diff 380632.Oct 19 2021, 3:25 AM

Updates the call-clobbered SGPRs used for the return addresses of function calls to match the call-clobbered registers from the AMDGPU_Gfx calling convention. It does so by adding a new pseudo s_setpc_b64_return_gfx and changing the implementations so the new register class gets used.

The assembly changes look a lot better now.
Some of the mir tests have a lot of changes caused by the update scripts. Can you pre-commit these changes? (i.e. regenerate the tests on main, make sure they still pass, commit it and then rebase this review) That should make the diff more readable.


If I understand this correctly, Gfx_CCR_SGPR_64 contains s[30:31], s[32:33], …, s[94:95] (33 pairs, starting with the 16th).
But it should not contain reserved registers like the stack pointer (s32, 33, 34) and also not contain callee-save registers (s64, s65, …).

We're working on a proposal for a new ABI register layout which somewhat conflicts with this. We want to ,move all special SGPR arguments to a compact range in a callee saved range. Some of these special registers will conflict with the range you are expecting to remain untouched

I think part of that new ABI layout is implemented in D102177.
There should be no conflict other than requiring the register ranges to be adjusted. The intent behind this change is to make the user-defined inreg arguments callee-save, so if some future ABI change moves the SGPR arguments to s[0:23], CSR_AMDGPU_SI_Gfx_SGPRs_4_29 should be changed to CSR_AMDGPU_SI_Gfx_SGPRs_0_23 accordingly and Gfx_CCR_SGPR_64 to s[28:29] + s[42:43], s[44:45], ….

tsymalla updated this revision to Diff 381618.Oct 22 2021, 11:55 AM

Changed the register range, and updated the tests.

sebastian-ne added inline comments.Oct 25 2021, 1:19 AM

I think the 15 should be a 14.

tsymalla updated this revision to Diff 381889.Oct 25 2021, 2:09 AM

Rebased the patch on a commit with tests generated by a recent version of the Python test script to make some diffs easier to read.

tsymalla marked 2 inline comments as done.Oct 25 2021, 2:11 AM

Thanks for pre-regenerating the tests


This diff does not look that good. My guess is this is because s33 is made caller-save but was callee-save before. Can you try if adding s[32:35] to the callee-save registers fixes that?

tsymalla added inline comments.Oct 25 2021, 3:00 AM

Is this correct? I thought, s33 is one of the reserved registers.
Could this output be related to the changed behavior of the script (e. g. the old revision of this test had some -NEXT lines missing)?

sebastian-ne added inline comments.Oct 25 2021, 3:08 AM

Yes, s33 is the frame pointer.

Could this output be related to the changed behavior of the script (e. g. the old revision of this test had some -NEXT lines missing)?

No, -NEXT means the match *has* to be the next line relative to the check before, so there can be no lines missing in-between.
The changed behavior was the update_mir script that did not generate -NEXT lines for us before but does now. This test uses the update_llc script, which already emitted -NEXT checks.

tsymalla added inline comments.Oct 25 2021, 5:50 AM

Thanks. As discussed, this seems to be intended behavior because of the inreg argument. Before making s[32:35] callee-save, I am going to do a few experiments to see if this patch adds new unnecessary save-and-restore ISA for these registers in real life cases, if this is the case, I am going to mark them as callee-save .

@sebastian-ne As far as I can tell from my tests, there is no additional spilling code generated for the SGPRs s[32:35]. I think we should be good to go with this revision and without additionally marking these registers as callee-save. A whole lot of superfluous s_mov_b32 instructions are removed by the patch, making the waterfall loops more clean, just as intended.

sebastian-ne accepted this revision.Nov 4 2021, 8:10 AM

Thanks, for doing the analysis.
I get one warning with this patch, otherwise LGTM.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp:4356:11: warning: enumeration value 'RET_GFX_FLAG' not handled in switch [-Wswitch]
  switch ((AMDGPUISD::NodeType)Opcode) {
1 warning generated.
This revision is now accepted and ready to land.Nov 4 2021, 8:10 AM
tsymalla updated this revision to Diff 384767.Nov 4 2021, 8:43 AM

Fix warning by registering the RET_GFX_FLAG as DAG node.

Added the enum value to the DAG node->string switch.

tsymalla updated this revision to Diff 384838.Nov 4 2021, 12:04 PM

Updated the tests.