Page MenuHomePhabricator

[AMDGPU] Unify spill code
ClosedPublic

Authored by sebastian-ne on Mar 24 2021, 8:02 AM.

Details

Summary

Instead of reimplementing spilling in prolog and epilog, reuse
loadRegFromStackSlot and storeRegToStackSlot.
Mark the generated instruction as by setting an added flag argument to
1, so the lowering can use the stack pointer instead of the frame
pointer, which is not set up at this point.

Also fixes a bug where subregisters got overwritten in
pei-scavenge-vgpr-spill.mir.

Diff Detail

Event Timeline

sebastian-ne created this revision.Mar 24 2021, 8:02 AM
sebastian-ne requested review of this revision.Mar 24 2021, 8:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2021, 8:02 AM
arsenm added inline comments.Mar 24 2021, 8:07 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
744–745

I don't think you're supposed to rely on the frame setup flags this way

sebastian-ne marked an inline comment as done.

Add a flags argument to SI_SPILL_Vxx instructions instead of using frame-setup flags.

sebastian-ne edited the summary of this revision. (Show Details)Mar 24 2021, 10:47 AM
arsenm requested changes to this revision.Mar 30 2021, 3:28 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1573–1576

I don't like having this logic separated and overriding the frame register logic above. Can you infer this should be SP relative from the frame index itself instead of adding a new operand?

This revision now requires changes to proceed.Mar 30 2021, 3:28 PM
sebastian-ne added inline comments.Mar 31 2021, 6:25 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1573–1576

How should that work? Should I add a new TargetStackID::PrologEpilogSave?

It is not really a property of the frame index that SP should be used, it is because the instruction is in the prolog/epilog where FP is not setup. If we wanted to access the same frame index in the function body, we would need to use FP instead of SP.

I could move this code out of the switch and merge it with the frame register logic above, but that doesn’t change much.

arsenm added inline comments.Mar 31 2021, 4:12 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1573–1576

But we don't need to access these from an arbitrary point in the function. We know these indexes should only be accessed in the prolog/epilog. Fundamentally I do think this is a special case for PEI to handle. The problem here is just that what we need to do to emit spills is a lot more annoying than on other targets.

What this probably should do is just use a common implementation function that eliminateFrameIndex and emitProlog/emitEpilog can use. buildSpillLoadStore is approximately that already. emitProlog/emitEpilog don't really need to route through the pseudos and eliminateFrameIndex

sebastian-ne marked an inline comment as done.

Rewrite to use buildSpillLoadStore.

This makes buildSpillLoadStore public and adds a new optional LivePhysRegs parameter, to find an unused register if we have no RegScavenger.
Now this patch merges buildPrologSpill and buildEpilogReload into buildSpillLoadStore and uses that function from emitProlog/emitEpilog.

scott.linder added inline comments.Apr 2 2021, 10:37 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118

This seems to regress in terms of readability below, where instead of a call to buildPrologSpill we have a call to buildPrologEpilogSpill(..., true), and the same for buildEpilogReload.

Can we name the new function buildPrologSpillEpilogReloadImpl and leave the old methods interface the same:

static void buildPrologSpill(...) { return buildPrologSpillEpilogReloadImpl(..., true); }
static void buildEpilogReload(...) { return buildPrologSpillEpilogReloadImpl(..., false); }

An argument against doing this may be that we already have buildScratchExecCopy(..., true) instead of buildPrologScratchExecCopy and buildScratchExecCopy(..., false) instead of buildEpilogScratchExecCopy, but I'd vote to do the same for those too.

826–827

This change seems unrelated; is this the bug fix? Can it be in a separate patch after the cleanup? I'd prefer the cleanup be as close to NFC as is reasonable, although I assume there is some change in behavior because we no longer call findScratchNonCalleeSaveRegister, we do whatever buildSpillLoadRestore does.

I also might need some help understanding what is happening here to be able to review it. I'm confused about why LivePhysRegs::init gets called with different arguments for the prolog case vs the epilog case, and why we stepBackward on an (ostensibly arbitrary?) instruction after adding LiveOuts in the epilog case.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1472 ↗(On Diff #334652)

Nit: all the changes in this file can be removed from the patch

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1083–1094

Does RS already know about LiveRegs, or something equivalent to it? Or maybe RS cannot fail and return Register()?

If not, it seems like we should continue to try alternative ways to pick an SOffset until we either find one or run out of ways to try.

sebastian-ne marked an inline comment as done.

Split buildPrologEpilogSpill again, split out bug fix, remove some formatting changes and add some comments.

Thanks for your comments Scott, I tried to address all of them.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118

There are some other functions (SIRegisterInfo::buildSpillLoadStore and SIRegisterInfo::buildSGPRSpillLoadStore) that also take a boolean argument to switch between load and store. Now that I look at buildPrologEpilogSpill again, it’s quite short and the load/spill parts are different enough, so it makes sense to split them again.

I added comments to buildScratchExecCopy, does that look ok or should I add wrappers for prolog and epilog?

826–827

I split out the fix into D100098.
I think the difference is, in the prolog we start in the beginning and track register liveness to the current insertion point (MBBI). In the epilog we start at the end and track liveness to the current insertion point. The result should be the same, but the epilog is probably at the end of the basic block while the prolog is at the beginning, so it’s cheaper to start from the end/start.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1083–1094

Usually, buildSpillLoadStore is called to lower spill pseudos. At that point, we get a RegScavenger. During frame lowering (which did not call this function before), we do not have access to a RegScavenger as far as I know, so we use LiveRegs to find an unused register.
So, when calling this function, either RS should be set (during frame index elimination), or LiveRegs (during frame lowering), but never both, as they do the same thing.

RS can fail if all SGPRs are live. Usually it would then try to spill a register to the emergency spill slot, but that doesn’t make sense if we are currently lowering a spill, so we forbid that with the false last argument.

Thank you, I have one remaining nit concerning RS and LiveRegs, but otherwise everything LGTM.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118

There are some other functions (SIRegisterInfo::buildSpillLoadStore and SIRegisterInfo::buildSGPRSpillLoadStore) that also take a boolean argument to switch between load and store.

I think some amount of this is pragmatic and reasonable, I just start to find it harder to read when it gets used broadly enough. Also in the case of buildSpillLoadStore the boolean is capturing whether to kill the operand, although it seems like that would always be equivalent to whether Opcode is a load or a store?

Now that I look at buildPrologEpilogSpill again, it’s quite short and the load/spill parts are different enough, so it makes sense to split them again.

This LGTM, thank you! The differences between the two are more obvious to me now than in the conditional, combined version.

I added comments to buildScratchExecCopy, does that look ok or should I add wrappers for prolog and epilog?

Sorry, I only meant to justify my comments on your code, I didn't mean to ask for a change as part of this patch. The comment LGTM.

826–827

LGTM, thank you!

I think I understand now, and removing some of the redundant code helps.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1083–1094

Makes sense, but in that case could you add an assert(!RS != !LiveRegs && "expect either RS or LiveRegs but not both")?

Add assert to buildSpillLoadStore.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
118

There are some other functions (SIRegisterInfo::buildSpillLoadStore and SIRegisterInfo::buildSGPRSpillLoadStore) that also take a boolean argument to switch between load and store.

I think some amount of this is pragmatic and reasonable, I just start to find it harder to read when it gets used broadly enough. Also in the case of buildSpillLoadStore the boolean is capturing whether to kill the operand, although it seems like that would always be equivalent to whether Opcode is a load or a store?

Whoops, you’re right, buildSpillLoadStore directly takes the opcode instead of a boolean. I think kill can be false on a store if the register is still used afterwards.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1083–1094

I think it’s supported to call buildSpillLoadStore without a register scavenger (and without LiveRegs), that’s why the check for RS was here before, so I added an assert for !RS || !LiveRegs.

scott.linder accepted this revision.Apr 9 2021, 9:45 AM

LGTM, thank you for bearing with me!

If I understand Matt's comments earlier correctly, I think you've addressed them. If not, or Matt has more concerns, I assume we can clean it up post-commit.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 12 2021, 2:24 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.