This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Correct prolog SP initialization logic
ClosedPublic

Authored by arsenm on Jun 8 2020, 1:50 AM.

Details

Summary

Having callees that will read SP is not the only reason we need to
reference the stack pointer.

Diff Detail

Event Timeline

dstuttard created this revision.Jun 8 2020, 1:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2020, 1:50 AM
dstuttard marked an inline comment as done.
dstuttard added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
453

Is there a way to finesse this so it's only added when it is likely to be required?

foad added a subscriber: foad.Jun 8 2020, 2:20 AM
arsenm added inline comments.Jun 15 2020, 6:07 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
453

This isn't really right either. hasNonSpillStackObjects isn't accurate, since we can access with an soffset of 0 in the entry point. This also will fail to catch cases with variable allocas. I think the logic here should probably be hasFP()?

arsenm added inline comments.Jun 19 2020, 6:24 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
453

I think this really is hasFP based on the implementation of getFrameRegister

dstuttard marked an inline comment as done.Jun 24 2020, 8:42 AM
dstuttard added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
453

I tried hasFP and it didn't resolve the original issue I was fixing. I'm not 100% sure what's going on here now. The backend definitely generates an access using stackPtrOffsetReg when nonSpillStackObjects are present. Should the test include both?

Sorry for the delay in reviewing, I've tried to wrap my head around this and I'm still lost.

I think someone needs to define what all of these things mean. Some fundamental questions I don't think we have really adequately answered are:

  • What is the SP, when is it defined, when can it be changed?
  • What is the FP, when is it defined, when can it be changed?
  • Can we decide ahead-of-time (ISel?) whether we will need to have an SGPR available for the immediate-overflow case in SIRegisterInfo::buildSpillLoadStore?
    • Should we be using the SP/FP to handle that case in the first place (temporarily munging the SP/FP has implications for e.g. trap handlers, signals, CFI, etc.)?
    • If not, should we add an additional SGPR argument to the spill or something similar to ensure there is an SGPR available so we don't just ICE in the case where one isn't? Is this possible with the way RA works?
    • Can we in some cases prove that the spare SGPR is not needed because we will be able to fit the offset into the immediate field of the buffer instruction? Can we actually leverage this to free up that SGPR to be used by RA, or is this a circular problem/impossible because of potential need for the SGPR much later? Is this part of the purpose of the emergency spill slot? Is this an issue because we can't actually spill SGPRs directly to VMEM, and the emergency slot is in VMEM?

It seems like both of these patches, while maybe fixing some specific cases, aren't really framing the solution in terms of what is fundamentally broken. I think I broke this further with my patch to add the scratch wave offset to the scratch SRD, but I think all of these issues existed before. I don't know what the right answer is, but I can't really review anything without at least that amount of context.

arsenm added inline comments.Jul 28 2020, 2:05 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
453

Do you have the testcase for the original issue?

arsenm commandeered this revision.Jul 28 2020, 4:18 PM
arsenm updated this revision to Diff 281408.
arsenm edited reviewers, added: dstuttard; removed: arsenm.

Correct the logic.

hasFP || hasCalls should be slightly too conservative. I have no idea what other problem you could have been seeing from using hasFP without a testcase

arsenm retitled this revision from AMDGPU Initializes StackPtrOffset when NonSpillStackObjects present to AMDGPU: Correct prolog SP initialization logic.Jul 30 2020, 3:15 PM
arsenm edited the summary of this revision. (Show Details)

I don't know if this has anything to do with David's issue, but I think separately this makes sense. I'm just confused what the semantic difference between the two "do we need an SP" predicates is?

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1299

How does this differ from frameRequiresSP?

arsenm added inline comments.Aug 4 2020, 2:50 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1299

This is split mostly because of the distinctive handling of functions and kernels in hasFP. hasFP and requiresStackPointerReferences are almost the same, except the kernel case is a bit more specific since there's never a real reason to setup FP since it's known 0 (e.g. we never realign the kernel stack). requiresStackPointerReference is essentially hasFP for kernels, and frameRequiresSP is common to kernels and functions.

scott.linder added inline comments.Aug 5 2020, 8:36 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1265

So this is the function for "the conditions which imply the need for setting up an SP which are shared for entry functions and non-entry functions; just checking this for a frame can give false-negatives"?

I think at first I missed that this is just a static function used to implement the other bits and was confused. I don't know if it actually would help, but maybe you could hint at this in the name with something like frameTriviallyRequiresSP, and note in the comment that this is just the conditions shared between all frames?

1299

Can you put a (possibly condensed) version of this in a comment on the function? I mistakenly thought the frameRequiresSP was public, so the comment doesn't need to actually reference it, but just stating what the function does would help with the ambiguity between all of the needs{S,F}P type functions.

1301

Can this just be an assert that it is called for an entry function, and then could the name of this function just include EntryFunction like e.g. emitEntryFunctionPrologue does?

arsenm updated this revision to Diff 283264.Aug 5 2020, 8:52 AM

Add comments, name changes, revert to assert

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1301

Yes, this is what I did originally but I decided a more wholistic function may be better for future uses

scott.linder accepted this revision.Aug 5 2020, 12:41 PM

LGTM, thank you!

This revision is now accepted and ready to land.Aug 5 2020, 12:41 PM