[Power9] Allow gpr callee saved spills in prologue to vector registers rather than stack
AcceptedPublic

Authored by syzaara on Oct 27 2017, 1:18 PM.

Details

Summary

Currently in llvm, CalleeSavedInfo can only assign a callee saved register to stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills.

Diff Detail

syzaara created this revision.Oct 27 2017, 1:18 PM

This is pretty neat.

Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?

Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?

This is pretty neat.

Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?

Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?

The CFI limitation is temporary. I did notice the createRegister, but I don't think there is support in the debuggers/unwinders for moving from a vector register to a GPR. Once we know how to handle this, we can remove the CFI limitation. The same infrastructure can be used for other available GPRs and I think createRegister should be able to handle that.

This is pretty neat.

Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?

Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?

The CFI limitation is temporary. I did notice the createRegister, but I don't think there is support in the debuggers/unwinders for moving from a vector register to a GPR.

Okay. I'd hope that this would "just work" (I don't see any special cases in how DW_CFA_register, etc. is handled in GCC's unwind-dw2.c that would lead me to believe that it wouldn't). I suppose we'll find out.

Once we know how to handle this, we can remove the CFI limitation. The same infrastructure can be used for other available GPRs and I think createRegister should be able to handle that.

syzaara updated this revision to Diff 123645.Nov 20 2017, 1:47 PM

Use createRegister to emit .cfi_register for debug info.

@hfinkel Can you please finish reviewing this.

thegameg added inline comments.
include/llvm/CodeGen/MachineFrameInfo.h
36 ↗(On Diff #123645)

How about

union {
  int FrameIdx;
  unsigned DstReg;
};

to make it more obvious that you can't have both?

263 ↗(On Diff #123645)

This should also be updated to warn users that not all CSRs are saved in the frame (and that something like CSInfo.size() * CSRSize is not totally accurate anymore, even though I'm not sure if any part of the code relies on this to allocate / estimate stack size).

lib/Target/PowerPC/PPCFrameLowering.cpp
1990 ↗(On Diff #123645)

I may be missing something, but why is this needed? Won't it be def'd anyway?

lib/Target/PowerPC/PPCFrameLowering.h
102 ↗(On Diff #123645)

grps -> gprs

test/CodeGen/PowerPC/prolog_vec_spills.ll
19 ↗(On Diff #123645)

You can also write MIR tests with stuff like:

%R14 = IMPLICIT_DEF
%R15 = IMPLICIT_DEF
%R16 = IMPLICIT_DEF

to have more clear tests (with -run-pass prologepilog).

syzaara updated this revision to Diff 131642.Jan 26 2018, 12:52 PM
syzaara added inline comments.Jan 26 2018, 12:58 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
1990 ↗(On Diff #123645)

Without this, we will get machine verify errors in the epilogue that we are using an undefined physical register. This can be fixed by adding the register as live into the block. However, adding it to only the epilogue isn't enough since other blocks may clobber the register in between the prologue and epilogue.

syzaara edited the summary of this revision. (Show Details)Jan 26 2018, 12:58 PM
thegameg added inline comments.Jan 27 2018, 6:12 AM
lib/Target/PowerPC/PPCFrameLowering.cpp
1990 ↗(On Diff #123645)

Ah, right, I see. There is a function called updateLiveness in PrologEpilogInserter, which propagates the liveness of CSRs. You might want to look into that.

syzaara updated this revision to Diff 132037.Jan 30 2018, 1:19 PM

Add handling in updateLiveness to add spilled destination register as live into all MachineBasicBlocks of the MachineFunction.

thegameg added inline comments.Jan 30 2018, 1:54 PM
lib/CodeGen/PrologEpilogInserter.cpp
464 ↗(On Diff #132037)

I agree that in the general case all the basic blocks need to have this register as live in, but if the prologue/epilogue of the function is shrink-wrapped, only the blocks between the prologue and the epilogue need to have this as live-in.
I think at this point in the function the Visited set contains all the blocks *outside* the prologue/epilogue region. Correct me if I'm wrong but I think you're missing a:

if (Visited.count(&MBB))
  continue;
syzaara updated this revision to Diff 132419.Feb 1 2018, 9:48 AM

@hfinkel Hi Hal, are you okay with approving this now?

hfinkel added inline comments.Mar 8 2018, 7:07 PM
lib/CodeGen/PrologEpilogInserter.cpp
464 ↗(On Diff #132037)

Please add a comment here explaining that the visited set contains the blocks outside the prologue/epilogue region. That's likely not obvious.

syzaara updated this revision to Diff 138234.Mar 13 2018, 11:08 AM
syzaara marked an inline comment as done.
nemanjai added inline comments.Apr 3 2018, 4:31 PM
lib/Target/PowerPC/PPCFrameLowering.cpp
42 ↗(On Diff #138234)

I am really not a fan of defining this here. This should be available from the .td files. Can we not get it from there?

1982 ↗(On Diff #138234)

No magic numbers please. If you are using the static array you defined above, you can insert a marker at the end and loop until you get to it (perhaps PPC::NoRegister)

1993 ↗(On Diff #138234)

I'm just curious about this entire section. I imagine this is just a copy of the generic implementation. If that is the case, can we just return false from this function and modify the generic implementation to just ignore any that are isSpilledToReg() (assuming the target has already done that)?
Perhaps if we do that, the generic code can also assert that the DestReg is set for any such CSI objects.

@syzaara Do you plan to update this to address the comments?

syzaara updated this revision to Diff 148042.May 22 2018, 10:13 AM

Thank you for updating the code to get the CSR list from RegInfo. It looks much cleaner than having a static list of registers. I think the code can be cleaned up a bit further still, but overall this looks great.

llvm/include/llvm/CodeGen/MachineFrameInfo.h
63–71

I think setFrameIdx() and setDstReg() should probably update SpilledToReg.

289

... this data *is* used...

llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
1989

A comment here such as:
// Build a BitVector of VSRs that can be used for spilling GPRs.

1997

Why the 3 register classes? Don't we only want to keep VF/F8 registers in this BitVector?

2002

Can you also clear any of the registers that satisfy MF.getRegInfo().isPhysRegUsed(VolatileVFReg) so you:

  • Provide early exit if there are no unused caller-saved VSRs
  • Simplify the spilling code below (so you don't need to loop and check this there)
2006

If you do the above, you can sink this into the loop so we exit as soon as we have used up all the usable VSRs.

2016

If you do the above, this whole thing just becomes a call to find_first().

syzaara updated this revision to Diff 154275.Jul 5 2018, 11:20 AM
syzaara updated this revision to Diff 154276.
nemanjai accepted this revision.Aug 13 2018, 11:58 AM

LGTM.
Perhaps @thegameg or @MatzeB want to look at it as well (particularly the target independent parts). And perhaps @hfinkel and @echristo are interested in having another look. Otherwise, I suppose this is ready to go.

include/llvm/CodeGen/MachineFrameInfo.h
54 ↗(On Diff #154276)

You initialize this in the only constructor. I don't think there's any point in initializing it here.

lib/CodeGen/PrologEpilogInserter.cpp
474 ↗(On Diff #154276)

... outside of the region delimited by the prologue/epilogue.
I think that's what this comment is meant to say, isn't it?

This revision is now accepted and ready to land.Aug 13 2018, 11:58 AM
thegameg accepted this revision.Aug 14 2018, 2:46 AM

Please add a MIR test for this, it would make the test much clearer.

Other than that and the comment below, it LGTM!

lib/Target/PowerPC/PPCFrameLowering.cpp
1972 ↗(On Diff #154276)

for (unsigned Reg : BVAllocatable.set_bits()) should be equivalent here.

syzaara updated this revision to Diff 162899.Tue, Aug 28, 10:43 AM

Thank you! A few comments on the MIR test though. Thanks for putting up the test.

llvm/test/CodeGen/MIR/PowerPC/prolog_vec_spills.mir
4

You should be able to remove the LLVM IR and keep the MIR only.

Usually the LLVM IR is kept around if there are any references to IR values from MI.

50

Is this needed?
I think you can either use BLR8 implicit undef $lr8, implicit undef $rm and get rid of the $r3 and $x3 defs, or:

$lr8 = IMPLICIT_DEF
$rm = IMPLICIT_DEF
BLR8 implicit $lr8, implicit $rm
60

Same as the previous function.

72

$cr0 = IMPLICIT_DEF

84

I would use IMPLICIT_DEFs here as well to express the intent of the test better.

100

CHECK-LABEL: name:

103

If these are expected to be generated next to each other, I would use CHECK-NEXT: here.

109

CHECK-LABEL: name:

syzaara updated this revision to Diff 163344.Thu, Aug 30, 8:26 AM
thegameg added inline comments.Thu, Aug 30, 8:36 AM
llvm/test/CodeGen/MIR/PowerPC/prolog_vec_spills.mir
18

This line is probably not needed.

55

Same.

MatzeB accepted this revision.Thu, Aug 30, 9:08 AM

LGTM too; some nitpicks below.

llvm/include/llvm/CodeGen/MachineFrameInfo.h
37

If DstReg must be a physical register then you could use MCPhysReg instead of unsigned.

54

You could use bool SpilledToReg = false; here instead of setting it in the constructor.

llvm/lib/CodeGen/PrologEpilogInserter.cpp
78–79

I assume adding the statistics is independent of the main change. It's fine for review like this, but would be nice if you could separate it into a separate commit when committing.

syzaara added inline comments.Fri, Aug 31, 8:30 AM
llvm/include/llvm/CodeGen/MachineFrameInfo.h
37

I'm preferring to keep it as unsigned so that it is consistent with unsigned Reg (the other register variable in the class).

If you'd like me to change it, I can go ahead with that. Would you also want me to change the function definitions for unsigned getDstReg() and void setDstReg(unsigned SpillReg).

And what about the users of this function, like:

unsigned SpilledReg = CSI[I].getDstReg();
unsigned CFIRegister = MF.addFrameInst(MCCFIInstruction::createRegister(
    nullptr, MRI->getDwarfRegNum(Reg, true),
    MRI->getDwarfRegNum(SpilledReg, true)));

SpilledReg is passed to getDwarfRegNum as unsigned.

54

Also preferring to keep this the same way so its consistent with the rest of the variables in the class being set in the constructor.