This is an archive of the discontinued LLVM Phabricator instance.

[wip] AMDGPU: Try to restore SP correctly in presence of dynamic stack adjustments
Needs ReviewPublic

Authored by arsenm on Aug 17 2023, 8:40 AM.

Details

Reviewers
scott.linder
cdevadas
yassingh
sebastian-ne
Group Reviewers
Restricted Project
Summary

Currently have no idea if this really works. We could probably try
restoring the SP from the base pointer if it's available. For some
reason we're currently inserting CSR spills before the stack is
realigned.

Diff Detail

Event Timeline

arsenm created this revision.Aug 17 2023, 8:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2023, 8:40 AM
arsenm requested review of this revision.Aug 17 2023, 8:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2023, 8:40 AM
Herald added a subscriber: wdng. · View Herald Transcript

Just to make sure I understand our current scheme correctly (before this patch).
If we re-align the stack, we do (ignoring the scale factor)

fp = sp + (alignment - 1)
fp &= -alignment
sp += frameSize + alignment

And in the epilogue:

sp -= frameSize + alignment

Due to the alignment of fp (but not sp), the allocated stack size sp - fp may be larger than needed, but it is restored correctly.
However, sp is not aligned, so maybe this causes problems when calling another function that expects the stack to be already aligned?

In case the stack is already aligned and does not need re-alignment, using a sp = fp in the epilogue sounds ok to me.
For the re-alignment case, sp = fp - (alignment - 1) looks incorrect to me.

If we do not want the over-commitment, we need to enforce the presence of a base pointer and in the epilogue, restore the stack pointer from the base pointer (sp = bp).

Just to make sure I understand our current scheme correctly (before this patch).
If we re-align the stack, we do (ignoring the scale factor)

fp = sp + (alignment - 1)
fp &= -alignment
sp += frameSize + alignment

And in the epilogue:

sp -= frameSize + alignment

Yes, this is essentially what was happening

Due to the alignment of fp (but not sp), the allocated stack size sp - fp may be larger than needed, but it is restored correctly.
However, sp is not aligned, so maybe this causes problems when calling another function that expects the stack to be already aligned?

We currently assume 16 byte alignment on stack entry. Realignment is triggered by a stack object with a larger alignment requirement, or forced with the "stackrealign" attribute.

I don't think there was any pre-existing issue with stack realignment. We restored the realigned size in the epilog. The problem is if there were any dynamic stack adjustments, the restore using a fixed offset just assumed none happened.

In case the stack is already aligned and does not need re-alignment, using a sp = fp in the epilogue sounds ok to me.
For the re-alignment case, sp = fp - (alignment - 1) looks incorrect to me.

If we do not want the over-commitment, we need to enforce the presence of a base pointer and in the epilogue, restore the stack pointer from the base pointer (sp = bp).

The FP is set after the stack is realigned, so the original SP has the additional alignment padding offset

The problem is if there were any dynamic stack adjustments, the restore using a fixed offset just assumed none happened.

I see, that is problematic.

An example, where I think the new code fails:

sp = 0x10
alignment = 0x20
frameSize = 0x10

// prologue
fp = sp + (alignment - 1) = 0x2f
fp &= -alignment = 0x20
sp += frameSize + alignment = 0x40
// some dynamic allocation happens
sp += 0x18 = 0x58
// everything ok so far

// epilogue
sp = fp - (alignment - 1) = 0x20 - 0x1f = 0x01
// but sp was 0x10 at the start