Page MenuHomePhabricator

[AMDGPU] Fixup use of StackPtrOffsetReg when not initialized
Needs ReviewPublic

Authored by dstuttard on Jun 3 2020, 2:15 AM.

Details

Summary

In some cases, StackPtrOffsetReg will not have been initialized. (In the case
where MFI.hasCalls() evaluates to false).

In this situation we can use the register, but don't need to add/subtract the
offset (it's not used in the rest of the entry function).

Change-Id: If317b29ff1e5c75a3ed3662920e6e960bca3b782

Diff Detail

Event Timeline

dstuttard created this revision.Jun 3 2020, 2:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2020, 2:15 AM
dstuttard updated this revision to Diff 268107.Jun 3 2020, 2:33 AM

Improved the test to include one that has a call provoking s32 to be initialized

dstuttard updated this revision to Diff 268143.Jun 3 2020, 4:47 AM

Rebase to get tests to pass

arsenm added a comment.Jun 4 2020, 2:02 PM

A small MIR test would also be helpful

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
460

hasCalls seems like the wrong check here. Should there be a stackPointerIsTracked() or something to go along with hasFP?

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
741

This shouldn't need to be an explicit property?

785

Check the theoretical stackPointerIsTracked instead?

I'm thinking that this fix may be taking the wrong approach now.

I don't think that using StackPtrOffset (s32) is going to work in all cases. I'm still investigating alternatives.

I think the fundamental problem here is that we wouldn't have any register to increment/decrement in the kernel, and yet the offset is still too far to fit in the immediate field of the instruction and it also can't scavenge a register.

I think this is implicitly relying on trashing the reserved SP. If we're going to rely on this, then there's no reason to increment/decrement it since you can just assign the value directly to the register. We do still try to avoid reserving an SP, although at this point I think it's far more trouble than it's worth to try to save that one register.

The other option is to treat any frame that can possibly have an offset that won't fit in the immediate field as needing an SP

arsenm added a comment.Aug 4 2020, 2:26 PM

I tried looking at this again and I'm somewhat confused. I'm unable to break this. Was this intended only as an optimization?

The other option is to treat any frame that can possibly have an offset that won't fit in the immediate field as needing an SP

I think this would simplify things a lot, and doesn't seem like it should have a huge effect on performance? If you are already indexing into your stack that far does it really matter that you have one fewer SGPR free?

I've run into another case where SP isn't initialized with a more clear explanation which may be the same problem. We initially select buffer instructions directly referencing SP, and fix it up to the FP later in eliminateFrameIndex. However, if LocalStackSlotAllocation fires on a particular store, it no longer directly references a frame index. Therefore, eliminateFrameIndex never fixes up the scratch offset register in the downstream memory instructions so they still incorrectly reference SP. We can hack around this by forcing SP initialization if getLocalFrameObjectCount, but I think the real fix is to use stack access pseudos before we know the offset register to use

Is this still needed?

I'm revisiting this - (sorry for the lengthy hiatus)

Matt - did you resolve the issues you were seeing by going with your suggestion of always initialising the SP if getLocalFrameObjectCount?