[Power9] Allow gpr callee saved spills in prologue to vector registers rather than stack
Needs ReviewPublic

Authored by syzaara on Oct 27 2017, 1:18 PM.

Details

Summary

Currently in llvm, CalleeSavedInfo can only assign a callee saved register to stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills.

Diff Detail

syzaara created this revision.Oct 27 2017, 1:18 PM

This is pretty neat.

Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?

Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?

This is pretty neat.

Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?

Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?

The CFI limitation is temporary. I did notice the createRegister, but I don't think there is support in the debuggers/unwinders for moving from a vector register to a GPR. Once we know how to handle this, we can remove the CFI limitation. The same infrastructure can be used for other available GPRs and I think createRegister should be able to handle that.

This is pretty neat.

Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?

Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?

The CFI limitation is temporary. I did notice the createRegister, but I don't think there is support in the debuggers/unwinders for moving from a vector register to a GPR.

Okay. I'd hope that this would "just work" (I don't see any special cases in how DW_CFA_register, etc. is handled in GCC's unwind-dw2.c that would lead me to believe that it wouldn't). I suppose we'll find out.

Once we know how to handle this, we can remove the CFI limitation. The same infrastructure can be used for other available GPRs and I think createRegister should be able to handle that.

syzaara updated this revision to Diff 123645.Nov 20 2017, 1:47 PM

Use createRegister to emit .cfi_register for debug info.

@hfinkel Can you please finish reviewing this.

thegameg added inline comments.
include/llvm/CodeGen/MachineFrameInfo.h
36 ↗(On Diff #123645)

How about

union {
  int FrameIdx;
  unsigned DstReg;
};

to make it more obvious that you can't have both?

263 ↗(On Diff #123645)

This should also be updated to warn users that not all CSRs are saved in the frame (and that something like CSInfo.size() * CSRSize is not totally accurate anymore, even though I'm not sure if any part of the code relies on this to allocate / estimate stack size).

lib/Target/PowerPC/PPCFrameLowering.cpp
1990 ↗(On Diff #123645)

I may be missing something, but why is this needed? Won't it be def'd anyway?

lib/Target/PowerPC/PPCFrameLowering.h
102 ↗(On Diff #123645)

grps -> gprs

test/CodeGen/PowerPC/prolog_vec_spills.ll
19 ↗(On Diff #123645)

You can also write MIR tests with stuff like:

%R14 = IMPLICIT_DEF
%R15 = IMPLICIT_DEF
%R16 = IMPLICIT_DEF

to have more clear tests (with -run-pass prologepilog).

syzaara updated this revision to Diff 131642.Jan 26 2018, 12:52 PM
syzaara added inline comments.Jan 26 2018, 12:58 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
1990 ↗(On Diff #123645)

Without this, we will get machine verify errors in the epilogue that we are using an undefined physical register. This can be fixed by adding the register as live into the block. However, adding it to only the epilogue isn't enough since other blocks may clobber the register in between the prologue and epilogue.

syzaara edited the summary of this revision. (Show Details)Jan 26 2018, 12:58 PM
thegameg added inline comments.Jan 27 2018, 6:12 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
1990 ↗(On Diff #123645)

Ah, right, I see. There is a function called updateLiveness in PrologEpilogInserter, which propagates the liveness of CSRs. You might want to look into that.

syzaara updated this revision to Diff 132037.Jan 30 2018, 1:19 PM

Add handling in updateLiveness to add spilled destination register as live into all MachineBasicBlocks of the MachineFunction.

thegameg added inline comments.Jan 30 2018, 1:54 PM
lib/CodeGen/PrologEpilogInserter.cpp
464 ↗(On Diff #132037)

I agree that in the general case all the basic blocks need to have this register as live in, but if the prologue/epilogue of the function is shrink-wrapped, only the blocks between the prologue and the epilogue need to have this as live-in.
I think at this point in the function the Visited set contains all the blocks *outside* the prologue/epilogue region. Correct me if I'm wrong but I think you're missing a:

if (Visited.count(&MBB))
  continue;
syzaara updated this revision to Diff 132419.Feb 1 2018, 9:48 AM

@hfinkel Hi Hal, are you okay with approving this now?

hfinkel added inline comments.Mar 8 2018, 7:07 PM
lib/CodeGen/PrologEpilogInserter.cpp
464 ↗(On Diff #132037)

Please add a comment here explaining that the visited set contains the blocks outside the prologue/epilogue region. That's likely not obvious.

syzaara updated this revision to Diff 138234.Mar 13 2018, 11:08 AM
syzaara marked an inline comment as done.
nemanjai added inline comments.Apr 3 2018, 4:31 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
42 ↗(On Diff #138234)

I am really not a fan of defining this here. This should be available from the .td files. Can we not get it from there?

1982 ↗(On Diff #138234)

No magic numbers please. If you are using the static array you defined above, you can insert a marker at the end and loop until you get to it (perhaps PPC::NoRegister)

1993 ↗(On Diff #138234)

I'm just curious about this entire section. I imagine this is just a copy of the generic implementation. If that is the case, can we just return false from this function and modify the generic implementation to just ignore any that are isSpilledToReg() (assuming the target has already done that)?
Perhaps if we do that, the generic code can also assert that the DestReg is set for any such CSI objects.

@syzaara Do you plan to update this to address the comments?

syzaara updated this revision to Diff 148042.Tue, May 22, 10:13 AM