Currently in llvm, CalleeSavedInfo can only assign a callee saved register to stack frame index to be spilled in the prologue. We would like to enable spilling gprs to vector registers. This patch adds the capability to spill to other registers aside from just the stack. It also adds the changes for power9 to spill gprs to volatile vector registers when they are available. This happens only for leaf functions when using the option -ppc-enable-pe-vector-spills.
Details
Diff Detail
Event Timeline
This is pretty neat.
Is the CFI limitation temporary? I imagine that you can use MCCFIInstruction::createRegister to indicate that the value that used to be in the GPR is now in the vector register?
Will be also be able to use this same infrastructure in order to save callee-saved condition registers to otherwise available GPRs (instead of to the stack)?
The CFI limitation is temporary. I did notice the createRegister, but I don't think there is support in the debuggers/unwinders for moving from a vector register to a GPR. Once we know how to handle this, we can remove the CFI limitation. The same infrastructure can be used for other available GPRs and I think createRegister should be able to handle that.
Okay. I'd hope that this would "just work" (I don't see any special cases in how DW_CFA_register, etc. is handled in GCC's unwind-dw2.c that would lead me to believe that it wouldn't). I suppose we'll find out.
Once we know how to handle this, we can remove the CFI limitation. The same infrastructure can be used for other available GPRs and I think createRegister should be able to handle that.
include/llvm/CodeGen/MachineFrameInfo.h | ||
---|---|---|
36 | How about union { int FrameIdx; unsigned DstReg; }; to make it more obvious that you can't have both? | |
263–265 | This should also be updated to warn users that not all CSRs are saved in the frame (and that something like CSInfo.size() * CSRSize is not totally accurate anymore, even though I'm not sure if any part of the code relies on this to allocate / estimate stack size). | |
lib/Target/PowerPC/PPCFrameLowering.cpp | ||
1990 | I may be missing something, but why is this needed? Won't it be def'd anyway? | |
lib/Target/PowerPC/PPCFrameLowering.h | ||
102 | grps -> gprs | |
test/CodeGen/PowerPC/prolog_vec_spills.ll | ||
20 | You can also write MIR tests with stuff like: %R14 = IMPLICIT_DEF %R15 = IMPLICIT_DEF %R16 = IMPLICIT_DEF to have more clear tests (with -run-pass prologepilog). |
lib/Target/PowerPC/PPCFrameLowering.cpp | ||
---|---|---|
1990 | Without this, we will get machine verify errors in the epilogue that we are using an undefined physical register. This can be fixed by adding the register as live into the block. However, adding it to only the epilogue isn't enough since other blocks may clobber the register in between the prologue and epilogue. |
lib/Target/PowerPC/PPCFrameLowering.cpp | ||
---|---|---|
1990 | Ah, right, I see. There is a function called updateLiveness in PrologEpilogInserter, which propagates the liveness of CSRs. You might want to look into that. |
Add handling in updateLiveness to add spilled destination register as live into all MachineBasicBlocks of the MachineFunction.
lib/CodeGen/PrologEpilogInserter.cpp | ||
---|---|---|
464 | I agree that in the general case all the basic blocks need to have this register as live in, but if the prologue/epilogue of the function is shrink-wrapped, only the blocks between the prologue and the epilogue need to have this as live-in. if (Visited.count(&MBB)) continue; |
lib/CodeGen/PrologEpilogInserter.cpp | ||
---|---|---|
464 | Please add a comment here explaining that the visited set contains the blocks outside the prologue/epilogue region. That's likely not obvious. |
lib/Target/PowerPC/PPCFrameLowering.cpp | ||
---|---|---|
42 | I am really not a fan of defining this here. This should be available from the .td files. Can we not get it from there? | |
1982 | No magic numbers please. If you are using the static array you defined above, you can insert a marker at the end and loop until you get to it (perhaps PPC::NoRegister) | |
1993 | I'm just curious about this entire section. I imagine this is just a copy of the generic implementation. If that is the case, can we just return false from this function and modify the generic implementation to just ignore any that are isSpilledToReg() (assuming the target has already done that)? |
Thank you for updating the code to get the CSR list from RegInfo. It looks much cleaner than having a static list of registers. I think the code can be cleaned up a bit further still, but overall this looks great.
llvm/include/llvm/CodeGen/MachineFrameInfo.h | ||
---|---|---|
64 ↗ | (On Diff #148042) | I think setFrameIdx() and setDstReg() should probably update SpilledToReg. |
265 ↗ | (On Diff #148042) | ... this data *is* used... |
llvm/lib/Target/PowerPC/PPCFrameLowering.cpp | ||
1963 ↗ | (On Diff #148042) | A comment here such as: |
1971 ↗ | (On Diff #148042) | Why the 3 register classes? Don't we only want to keep VF/F8 registers in this BitVector? |
1976 ↗ | (On Diff #148042) | Can you also clear any of the registers that satisfy MF.getRegInfo().isPhysRegUsed(VolatileVFReg) so you:
|
1980 ↗ | (On Diff #148042) | If you do the above, you can sink this into the loop so we exit as soon as we have used up all the usable VSRs. |
1990 ↗ | (On Diff #148042) | If you do the above, this whole thing just becomes a call to find_first(). |
include/llvm/CodeGen/MachineFrameInfo.h | ||
---|---|---|
54 | You initialize this in the only constructor. I don't think there's any point in initializing it here. | |
lib/CodeGen/PrologEpilogInserter.cpp | ||
463 | ... outside of the region delimited by the prologue/epilogue. |
Please add a MIR test for this, it would make the test much clearer.
Other than that and the comment below, it LGTM!
lib/Target/PowerPC/PPCFrameLowering.cpp | ||
---|---|---|
1978 | for (unsigned Reg : BVAllocatable.set_bits()) should be equivalent here. |
Thank you! A few comments on the MIR test though. Thanks for putting up the test.
llvm/test/CodeGen/MIR/PowerPC/prolog_vec_spills.mir | ||
---|---|---|
3 ↗ | (On Diff #162899) | You should be able to remove the LLVM IR and keep the MIR only. Usually the LLVM IR is kept around if there are any references to IR values from MI. |
49 ↗ | (On Diff #162899) | Is this needed? $lr8 = IMPLICIT_DEF $rm = IMPLICIT_DEF BLR8 implicit $lr8, implicit $rm |
59 ↗ | (On Diff #162899) | Same as the previous function. |
71 ↗ | (On Diff #162899) | $cr0 = IMPLICIT_DEF |
83 ↗ | (On Diff #162899) | I would use IMPLICIT_DEFs here as well to express the intent of the test better. |
99 ↗ | (On Diff #162899) | CHECK-LABEL: name: |
102 ↗ | (On Diff #162899) | If these are expected to be generated next to each other, I would use CHECK-NEXT: here. |
108 ↗ | (On Diff #162899) | CHECK-LABEL: name: |
LGTM too; some nitpicks below.
llvm/include/llvm/CodeGen/MachineFrameInfo.h | ||
---|---|---|
37 ↗ | (On Diff #163344) | If DstReg must be a physical register then you could use MCPhysReg instead of unsigned. |
54 ↗ | (On Diff #163344) | You could use bool SpilledToReg = false; here instead of setting it in the constructor. |
llvm/lib/CodeGen/PrologEpilogInserter.cpp | ||
78–79 ↗ | (On Diff #163344) | I assume adding the statistics is independent of the main change. It's fine for review like this, but would be nice if you could separate it into a separate commit when committing. |
llvm/include/llvm/CodeGen/MachineFrameInfo.h | ||
---|---|---|
37 ↗ | (On Diff #163344) | I'm preferring to keep it as unsigned so that it is consistent with unsigned Reg (the other register variable in the class). If you'd like me to change it, I can go ahead with that. Would you also want me to change the function definitions for unsigned getDstReg() and void setDstReg(unsigned SpillReg). And what about the users of this function, like: unsigned SpilledReg = CSI[I].getDstReg(); unsigned CFIRegister = MF.addFrameInst(MCCFIInstruction::createRegister( nullptr, MRI->getDwarfRegNum(Reg, true), MRI->getDwarfRegNum(SpilledReg, true))); SpilledReg is passed to getDwarfRegNum as unsigned. |
54 ↗ | (On Diff #163344) | Also preferring to keep this the same way so its consistent with the rest of the variables in the class being set in the constructor. |
How about
to make it more obvious that you can't have both?