This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][RFC] Improve sgpr function arguments
Needs RevisionPublic

Authored by sebastian-ne on May 10 2021, 9:18 AM.

Details

Summary

The SGPR layout on functions calls currently looks like this:

s[0:3] SRDarguments...s[30:31] return addresss32 stack pointers33 frame pointers34 base pointer

The return address and stack pointer occupy multiple 4-aligned blocks
of SGPRs.
Large scalar memory reads require a 4-aligned block of SGPRs, so if less
of them are available, register allocation becomes more difficult.

The stack resource descriptor occupies SGPR0-3. If we want to pass
user-data SGPRS to a function, the SGPRs need to be moved from s[0:...]
to s[4:...] before the call.
This is also the case when flat scratch is used instead of the SRD, even
if s[0:4] is unused then, because the same call convention is used.

To improve this, I propose the following layout:

arguments...s[24:27] SRDs[28:29] return address...s35 stack pointer...s40 frame pointers41 base pointer

To free s[0:3] for arguments, the SRD is moved to s[24:27]. This has
the effect, that all of s[0:23] can be used for arguments.

The base pointer is not used in the general case, so it is moved to
s41.

Diff Detail

Event Timeline

sebastian-ne created this revision.May 10 2021, 9:18 AM
sebastian-ne requested review of this revision.May 10 2021, 9:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2021, 9:18 AM

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

I would like to keep the same calling convention in compute and graphics. At least regarding the stack pointer and others, because I don’t see a compelling reason to diverge even more. Actually, I’d like it if they were more common than they are now, because we implement some things twice at the moment.
The compute proposal should work just fine; if we move the stack and frame pointer, we end up with the same benefits as in this patch. I commented on the internal proposal for this (I hope I found the right one?).

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

I would like to keep the same calling convention in compute and graphics. At least regarding the stack pointer and others, because I don’t see a compelling reason to diverge even more. Actually, I’d like it if they were more common than they are now, because we implement some things twice at the moment.
The compute proposal should work just fine; if we move the stack and frame pointer, we end up with the same benefits as in this patch. I commented on the internal proposal for this (I hope I found the right one?).

Well, then you're saying unification of both ABIs and it is not discussed thoroughly internally. The layout needs to be documented and get reviewed internally before we can proceed with this patch.

t-tye added a comment.May 11 2021, 8:38 AM

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

I would like to keep the same calling convention in compute and graphics. At least regarding the stack pointer and others, because I don’t see a compelling reason to diverge even more. Actually, I’d like it if they were more common than they are now, because we implement some things twice at the moment.
The compute proposal should work just fine; if we move the stack and frame pointer, we end up with the same benefits as in this patch. I commented on the internal proposal for this (I hope I found the right one?).

Well, then you're saying unification of both ABIs and it is not discussed thoroughly internally. The layout needs to be documented and get reviewed internally before we can proceed with this patch.

I think the reason that compute was considering putting FP and BP at the high registers is because they can often be optimized away. So putting them there allows them to be contiguous with other callee user registers rather then leaving an unused hole that is harder for the register allocator to use. Is that reasonable? How does that work for gfx?

I am all for trying to unify the call convention for compute and gfx if possible:-)

Move registers to be more in line with other plans.

sebastian-ne edited the summary of this revision. (Show Details)Oct 12 2021, 4:30 AM
arsenm requested changes to this revision.Nov 16 2022, 4:11 PM

Current proposal is different values

This revision now requires changes to proceed.Nov 16 2022, 4:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 4:11 PM
Herald added a subscriber: kosarev. · View Herald Transcript