This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Define sub-class of SGPR_64 for tail call return
ClosedPublic

Authored by cfang on Apr 20 2023, 11:20 AM.

Details

Summary
Registers for tail call return should not be clobbered by callee.

So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold
the tail call return address.

Because GFX and C calling conventions have different CSR, we need to define
the sub-class separately. This work is an extension of D147096 with the
consideration of GFX calling convention.

Based on the calling conventions, different instructions will be selected with
different sub-class of SGPR_64 as the input.

Diff Detail

Event Timeline

cfang created this revision.Apr 20 2023, 11:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 11:20 AM
cfang requested review of this revision.Apr 20 2023, 11:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 11:20 AM
Herald added a subscriber: wdng. · View Herald Transcript
cfang updated this revision to Diff 515406.Apr 20 2023, 11:26 AM

remove an unintentional blank line at the top of SIRegisterInfo.td

cdevadas added inline comments.Apr 25 2023, 10:24 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
830

@sebastian-ne can you review this part, the gfx scratch SGPRs?

The rest of the patch LGTM.

arsenm added inline comments.Apr 25 2023, 6:53 PM
llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
88–98

Could define a common SDTypeProfile and avoid duplicating it between the two

llvm/lib/Target/AMDGPU/SIInstructions.td
618–643

Can use common class to avoid duplication here as well

sebastian-ne added inline comments.Apr 26 2023, 1:31 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
830

Thanks for the fix!

The s[36:37]-s[s62:63] range looks good. Why is s[30:31] part of Gfx_CCR_SGPR_64 when it is not part of CCR_SGPR_64?

Stepping back for a moment, why does amdgpu_gfx use a different set of callee saved registers? It shouldn't really have different needs

Stepping back for a moment, why does amdgpu_gfx use a different set of callee saved registers? It shouldn't really have different needs

Right, it does not have different needs.

Both, compute and graphics, pass user data in SGPR and want to preserve them through function calls.
In compute, all of this is handled through reserved registers in the AMDGPU backend. In graphics, everything is more variable, so we handle user-data in the frontend and add it as inreg arguments to all functions.

This is why for graphics, we need the calling convention to preserve SGPR arguments.

When it was proposed to do this for all calling conventions – compute and graphics – there were objections, so we ended up marking SGPR arguments callee save only for amdgpu_gfx.

Stepping back for a moment, why does amdgpu_gfx use a different set of callee saved registers? It shouldn't really have different needs

Right, it does not have different needs.

Both, compute and graphics, pass user data in SGPR and want to preserve them through function calls.
In compute, all of this is handled through reserved registers in the AMDGPU backend. In graphics, everything is more variable, so we handle user-data in the frontend and add it as inreg arguments to all functions.

We should start respecting inreg for regular functions. It's just ignored right now. We just wouldn't make use of it as-is.

cfang added inline comments.Apr 26 2023, 11:08 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
830

Actually I just try to recover what were defined before. My understanding is that s[30:31] is still CSR and we want to avoid CSR completely. I am not sure about Gfx_CCR_SGPR_64.

Matt and CD, can you confirm that s[30:31] should not be used as the return address of tail call?

cdevadas added inline comments.Apr 26 2023, 11:58 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
830

I think for the compute side we should not use S[30:31] for tail call target addresses as they are now CSRs, and their spills/restores at prolog/epilog are needed for the debug unwinding info.

def CCR_SGPR_64 : SIRegisterClass<"AMDGPU", SGPR_64.RegTypes, 32, (add (trunc SGPR_64, 16))

Here S[30:31] pair is currently part of CCR_SGPR_64. The immediate operand in trunc should be 15 instead of 16 to exclude it.

cfang added inline comments.Apr 26 2023, 2:49 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
830

Should we should also exclude s[30:31] for Gfx_CCR_SGPR_64. What do you think, @sebastian-ne ?

cfang updated this revision to Diff 517360.Apr 26 2023, 3:41 PM
  1. Exclude s[30:31] from CCR_SGPR_64 and Gfx_CCR_SGPR_64
  2. Define a SDTypeProfile for SDNodes AMDGPUtc_return and AMDGPUtc_return_gfx
  3. Define a common class for instructions SI_TCRETURN and SI_TCRETURN_GFX.
arsenm accepted this revision.Apr 26 2023, 4:55 PM
This revision is now accepted and ready to land.Apr 26 2023, 4:55 PM
sebastian-ne accepted this revision.Apr 27 2023, 1:07 AM

s[30:31] is reserved for the return address anyway, so the compiler should never use it for tail calls (independent of if it is part of the CCR list or not). Excluding it from that list makes sense to me :)

This revision was landed with ongoing or failed builds.Apr 27 2023, 10:46 AM
This revision was automatically updated to reflect the committed changes.