This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
ClosedPublic

Authored by rovka on Jun 22 2023, 3:28 AM.

Details

Reviewers
arsenm
Group Reviewers
Restricted Project
Commits
rG26dc28449842: [AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
Summary

Lower formal arguments and returns for functions with the
amdgpu_cs_chain and amdgpu_cs_chain_preserve calling conventions:

  • Put inreg arguments into SGPRs, starting at s0, and other arguments

into VGPRs, starting at v8. No arguments should end up on the stack, if
we don't have enough registers we should error out.

  • Lower the return (which is always void) as an S_ENDPGM.
  • Set the ScratchRSrc register to s48:51, as described in the docs.
  • Set the SP to s32.

Diff Detail

Event Timeline

rovka created this revision.Jun 22 2023, 3:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2023, 3:28 AM
rovka requested review of this revision.Jun 22 2023, 3:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2023, 3:28 AM
rovka added a reviewer: Restricted Project.Jun 22 2023, 3:29 AM
arsenm added inline comments.Jun 22 2023, 3:30 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
91

This is an odd choice of value and it doesn't exist on all subtargets

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-cc.ll
11

Add gfx6 or 7 run line from before s105 existed?

arsenm added inline comments.Jun 22 2023, 3:32 AM
llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-cc.ll
206

Don't think these tests cover different address space pointers

513

Test bfloat too?

rovka added inline comments.Jun 22 2023, 3:35 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
91

I tried setting the SP in finalizeLowering using basically this snippet, but it blew up for Global ISel (I think because of some issue with the liveins). I figured I should ask here before going too far down that rabbit hole - does that sound like the right approach, or is there a better place to handle the SP? I'm guessing there are other things I could prioritize before fixing this, but do let me know if you think otherwise.

rovka added inline comments.Jun 22 2023, 3:59 AM
llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-cc.ll
11

I don't think we plan to use this on targets older than gfx10, and adding more run lines here would make it even harder to follow changes to the checks (which are honestly already a bit of an eyesore). Should I just use a lower register number for now? How does s52 sound?

rovka updated this revision to Diff 534494.Jun 26 2023, 4:12 AM

Add more tests (bfloat, pointers to a few address spaces), move SP to s52.

nhaehnle added inline comments.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
91

Realistically, we're only ever going to use this on gfx10+, where the register always exists. If you're too worried, we could have a corresponding assertion.

That said, it's not clear to me what this value can end up being used for, since chain functions don't have a stack pointer input or output. A stack pointer should be set up if we call an amdgpu_gfx function; does it matter whether the chosen register is aligned with that calling convention?

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll
16

These functions probably need way more scratch save/restore. Actually, I'm starting to suspect that we can't reliably code-generate amdgpu_cs_chain_preserve functions that call amdgpu_gfx functions, because the amdgpu_gfx calling convention has some caller-saves registers, which just doesn't mesh well.

This probably needs a bit more thought, though it shouldn't be a big problem in practice: the purpose / intended use of the _preserve calling convention is very narrow.

rovka added inline comments.Jun 27 2023, 2:17 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
91

Realistically, we're only ever going to use this on gfx10+, where the register always exists. If you're too worried, we could have a corresponding assertion.

Ok, I could add that.

That said, it's not clear to me what this value can end up being used for, since chain functions don't have a stack pointer input or output. A stack pointer should be set up if we call an amdgpu_gfx function; does it matter whether the chosen register is aligned with that calling convention?

Honestly I hadn't thought much about that, I was hellbent on putting it right after the arguments because there seems to be some precedent for that (and code already written; it's nice when code is already written). I'm going to ruminate some more on whether we can just use s32.

llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll
16

I'm not sure I understand your comment, I assume you mean it needs more scratch save/restore so it actually preserves things, like the name says? :) If that's what you had in mind, isn't that PEI's job? (I've only just started looking into that, since I thought it's not ISel's job - but I could be wrong)

nhaehnle added inline comments.Jun 27 2023, 4:21 AM
llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll
16

Okay, you may be right about that, which suggests it would be good to have some tests that actually look at final assembly as well.

But there's a bigger problem: let's say v192 is a caller-saved register in the amdgpu_gfx calling convention. What do we do if there's an amdgpu_gfx call in a _preserve function?

  • Save and restore v192 around the call inside the _preserve function unconditionally. But does this now mean that we always have to run this code with an allocation of 192 VGPRs? That's certainly bad for occupancy....
  • Do nothing. But then if the callee does use that many VGPR and clobbers v192, the outcome is incorrect.
  • Something else...?
rovka updated this revision to Diff 535316.Jun 28 2023, 4:07 AM
  • assert for GFX < 10
  • switch SP to s32
  • add tests all the way to assembly (with FIXMEs for the stuff that's not ISel related)
rovka added inline comments.Jun 28 2023, 4:12 AM
llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll
16

Okay, you may be right about that, which suggests it would be good to have some tests that actually look at final assembly as well.

But there's a bigger problem: let's say v192 is a caller-saved register in the amdgpu_gfx calling convention. What do we do if there's an amdgpu_gfx call in a _preserve function?

  • Save and restore v192 around the call inside the _preserve function unconditionally. But does this now mean that we always have to run this code with an allocation of 192 VGPRs? That's certainly bad for occupancy....
  • Do nothing. But then if the callee does use that many VGPR and clobbers v192, the outcome is incorrect.
  • Something else...?

Hm, I see your point. I'm probably going to implement the first option initially (and people can avoid calls to amdgpu_gfx functions in sensitive code...). I'll try to keep this in mind though, and maybe we can come up with something later on.

nhaehnle added inline comments.Jul 11 2023, 8:50 PM
llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll
16

Coming back to this after a while, I'm leaning towards saying that amdgpu_gfx calls from amdgpu_cs_chain_preserve functions simply shouldn't be supported.

In terms of use case scenarios, amdgpu_cs_chain_preserve is only meant for relatively small, internal (runtime library-style) "glue" functions anyway, so this shouldn't be a problem.

122–128

The code here assumes that s32 contains a stack pointer going into the function, but that's not correct -- there's no incoming stack pointer, and we should just use 0 as the stack base.

rovka updated this revision to Diff 544675.Jul 27 2023, 3:11 AM

Address comments. We no longer allow calls to amdgpu_gfx from amdgpu_cs_chain_preserve functions, so remove them from the tests. An update to the docs is coming in a separate patch.

rovka edited the summary of this revision. (Show Details)Jul 27 2023, 3:20 AM
rovka updated this revision to Diff 549302.Aug 11 2023, 2:21 AM
rovka marked an inline comment as not done.

Rebase and ping.

arsenm accepted this revision.Aug 17 2023, 6:06 AM

LGTM with assert dropped

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
87–88

Just remove the assert, I don't see anything actually preventing usage right now pre-gfx9 and this would need to be a proper error

This revision is now accepted and ready to land.Aug 17 2023, 6:06 AM
This revision was landed with ongoing or failed builds.Aug 21 2023, 2:16 AM
This revision was automatically updated to reflect the committed changes.