Having callees that will read SP is not the only reason we need to
reference the stack pointer.
Details
- Reviewers
scott.linder dstuttard
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
453 | Is there a way to finesse this so it's only added when it is likely to be required? |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
453 | This isn't really right either. hasNonSpillStackObjects isn't accurate, since we can access with an soffset of 0 in the entry point. This also will fail to catch cases with variable allocas. I think the logic here should probably be hasFP()? |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
453 | I think this really is hasFP based on the implementation of getFrameRegister |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
453 | I tried hasFP and it didn't resolve the original issue I was fixing. I'm not 100% sure what's going on here now. The backend definitely generates an access using stackPtrOffsetReg when nonSpillStackObjects are present. Should the test include both? |
Sorry for the delay in reviewing, I've tried to wrap my head around this and I'm still lost.
I think someone needs to define what all of these things mean. Some fundamental questions I don't think we have really adequately answered are:
- What is the SP, when is it defined, when can it be changed?
- What is the FP, when is it defined, when can it be changed?
- Can we decide ahead-of-time (ISel?) whether we will need to have an SGPR available for the immediate-overflow case in SIRegisterInfo::buildSpillLoadStore?
- Should we be using the SP/FP to handle that case in the first place (temporarily munging the SP/FP has implications for e.g. trap handlers, signals, CFI, etc.)?
- If not, should we add an additional SGPR argument to the spill or something similar to ensure there is an SGPR available so we don't just ICE in the case where one isn't? Is this possible with the way RA works?
- Can we in some cases prove that the spare SGPR is not needed because we will be able to fit the offset into the immediate field of the buffer instruction? Can we actually leverage this to free up that SGPR to be used by RA, or is this a circular problem/impossible because of potential need for the SGPR much later? Is this part of the purpose of the emergency spill slot? Is this an issue because we can't actually spill SGPRs directly to VMEM, and the emergency slot is in VMEM?
It seems like both of these patches, while maybe fixing some specific cases, aren't really framing the solution in terms of what is fundamentally broken. I think I broke this further with my patch to add the scratch wave offset to the scratch SRD, but I think all of these issues existed before. I don't know what the right answer is, but I can't really review anything without at least that amount of context.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
453 | Do you have the testcase for the original issue? |
Correct the logic.
hasFP || hasCalls should be slightly too conservative. I have no idea what other problem you could have been seeing from using hasFP without a testcase
I don't know if this has anything to do with David's issue, but I think separately this makes sense. I'm just confused what the semantic difference between the two "do we need an SP" predicates is?
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1299 | How does this differ from frameRequiresSP? |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1299 | This is split mostly because of the distinctive handling of functions and kernels in hasFP. hasFP and requiresStackPointerReferences are almost the same, except the kernel case is a bit more specific since there's never a real reason to setup FP since it's known 0 (e.g. we never realign the kernel stack). requiresStackPointerReference is essentially hasFP for kernels, and frameRequiresSP is common to kernels and functions. |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1265 | So this is the function for "the conditions which imply the need for setting up an SP which are shared for entry functions and non-entry functions; just checking this for a frame can give false-negatives"? I think at first I missed that this is just a static function used to implement the other bits and was confused. I don't know if it actually would help, but maybe you could hint at this in the name with something like frameTriviallyRequiresSP, and note in the comment that this is just the conditions shared between all frames? | |
1299 | Can you put a (possibly condensed) version of this in a comment on the function? I mistakenly thought the frameRequiresSP was public, so the comment doesn't need to actually reference it, but just stating what the function does would help with the ambiguity between all of the needs{S,F}P type functions. | |
1301 | Can this just be an assert that it is called for an entry function, and then could the name of this function just include EntryFunction like e.g. emitEntryFunctionPrologue does? |
Add comments, name changes, revert to assert
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
1301 | Yes, this is what I did originally but I decided a more wholistic function may be better for future uses |
Is there a way to finesse this so it's only added when it is likely to be required?