The SGPR layout on function calls currently looks like this:
s[0:3] SRD | arguments... | s[30:31] return address | s32 stack pointer | s33 frame pointer | s34 base pointer |
The return address and stack pointer occupy multiple 4-aligned blocks
of SGPRs.
Large scalar memory reads require a 4-aligned block of SGPRs, so if fewer
of them are available, register allocation becomes more difficult.
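
As a rough illustration of the alignment point, the following Python
sketch lists the 4-aligned SGPR blocks that contain at least one
reserved register in the current layout. The register numbers are taken
from the layout line above; the names CURRENT_RESERVED, touched_blocks
and block_range are made up for this example and do not exist in the
backend.

```python
# Reserved SGPRs in the current layout, as inclusive (first, last) ranges.
# Numbers come from the layout line above; the dict itself is illustrative.
CURRENT_RESERVED = {
    "SRD": (0, 3),
    "return address": (30, 31),
    "stack pointer": (32, 32),
    "frame pointer": (33, 33),
    "base pointer": (34, 34),
}


def touched_blocks(reserved):
    """Return sorted indices of 4-aligned blocks holding a reserved SGPR."""
    blocks = set()
    for first, last in reserved.values():
        for reg in range(first, last + 1):
            blocks.add(reg // 4)  # block k covers s[4k : 4k+3]
    return sorted(blocks)


def block_range(block):
    """Format a block index as an inclusive SGPR range."""
    return f"s[{4 * block}:{4 * block + 3}]"


if __name__ == "__main__":
    for block in touched_blocks(CURRENT_RESERVED):
        print(block_range(block))
```

Any block printed here cannot be handed out as a whole 4-aligned block
by the register allocator, even if only one of its registers is
reserved.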
The stack resource descriptor occupies SGPR0-3. If we want to pass
user-data SGPRs to a function, the SGPRs need to be moved from s[0:...]
to s[4:...] before the call.
This is also the case when flat scratch is used instead of the SRD, even
though s[0:3] is unused then, because the same calling convention is used.
To improve this, I propose the following layout:
arguments... | s[24:27] SRD | s[28:29] return address | ... | s35 stack pointer | ... | s40 frame pointer | s41 base pointer |
To free s[0:3] for arguments, the SRD is moved to s[24:27]. As a
result, all of s[0:23] can be used for arguments.
The base pointer is not used in the general case, so it is moved to
s41.
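
The effect on argument passing can be checked with a small Python
sketch that compares how many SGPRs are available for arguments
starting at s0 in each layout. The register numbers are copied from the
two layout lines; the "..." gaps in the proposed layout are left
unmodeled, which does not affect the result because they all lie above
s29. The names CURRENT, PROPOSED and leading_free_sgprs are hypothetical
and only serve this example.

```python
# Reserved SGPRs as inclusive (first, last) ranges, copied from the two
# layout lines. Gaps marked "..." in the proposal are not modeled here.
CURRENT = {
    "SRD": (0, 3),
    "return address": (30, 31),
    "stack pointer": (32, 32),
    "frame pointer": (33, 33),
    "base pointer": (34, 34),
}
PROPOSED = {
    "SRD": (24, 27),
    "return address": (28, 29),
    "stack pointer": (35, 35),
    "frame pointer": (40, 40),
    "base pointer": (41, 41),
}


def leading_free_sgprs(reserved):
    """Count SGPRs free from s0 up to the first reserved register."""
    return min(first for first, _last in reserved.values())


if __name__ == "__main__":
    for name, layout in (("current", CURRENT), ("proposed", PROPOSED)):
        print(f"{name}: {leading_free_sgprs(layout)} SGPRs free from s0")
```

With the current layout the count is zero, which is why user-data SGPRs
have to be moved up before a call; with the proposed layout they can
stay in place.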