This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix redundant FP spilling/assert in some functions
ClosedPublic

Authored by arsenm on Jan 22 2021, 8:21 AM.

Details

Summary

If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.

Try to predict that we will have an FP later, although this is ugly.

Diff Detail

Event Timeline

arsenm created this revision.Jan 22 2021, 8:21 AM
arsenm requested review of this revision.Jan 22 2021, 8:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2021, 8:21 AM
Herald added a subscriber: wdng. · View Herald Transcript
scott.linder added inline comments.Jan 22 2021, 10:18 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1349

Is there a reason to have hasFP as distinct from WillHaveFP? Is it not possible to have a single predicate which encompasses both? Is it an issue of needing to consider different context for each callsite?

It seems like the only question any part of the code should be asking is: "Can we prove we do not and will not need a frame pointer?" I would expect it to be a "monotone" predicate, in that it can only ever "increase" in strength from "we may need an FP, so you must assume we do/will" to "we absolutely do not need an FP, and will never accidentally end up in a situation later in compilation where we actually do need an FP", and for debug builds this should be enforced via an assertion. It would be safe to implement the predicate to always return "we may need it", but we would want to have it detect as many cases where we could promise no need for an FP starting as early in compilation as possible.

If I understand the exact issue being addressed here, this patch makes sense, but it feels like we will be hitting other similar issues until we do something more fundamental.

arsenm added inline comments.Jan 22 2021, 12:20 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1349

Ultimately this is because our hasFP condition is weird. Unlike every other target, the stack addressing and stack growth are in the same direction so we end up requiring an FP to access stack objects if there's a call. We then don't know the set of stack objects because the callee saved spills are inserted later.

Once we start making decisions that assume hasFP, we're stuck with the FP. We already do the same prediction in determineCalleeSaves, this just repeats it in determineCalleeSavesSGPR. This doesn't have to do with the callsites, it's a function of the function with a call. I think I tried before to make hasFP conservative, but I don't remember what the problem was

scott.linder accepted this revision.Jan 25 2021, 12:18 PM

LGTM, this materially improves things, and we can always try to unify all of this in the future if we have the time

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1349

I still think it would make it would be easier to understand if we could find a way to move this all into hasFP and make it conservative. It seems like other targets get this for free by virtue of being able to address negative offsets.

Once we start making decisions that assume hasFP, we're stuck with the FP.

If we actually ensure this, I'm happy, but I just have a hard time reasoning about it with what we currently have. Ideally I'd like an #ifndef NDEBUG bool that gets set when we decide we can elide the FP in hasFP. Then we can assert(!ElidedFP) if we later determine we need an FP. It sounds like the case you are fixing here already hit an assert in debug builds, though, so maybe this isn't necessary.

This revision is now accepted and ready to land.Jan 25 2021, 12:18 PM