In order to build the Linux kernel, the back chain must be supported together with packed-stack. The back chain is then stored topmost in the register save area.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
allow packed-stack in most of the cases where it's currently forbidden (with varargs, if the frame address is taken, and with backchain) -- since packed-stack is then not just an optimization, but an ABI change
Packed-stack stack is now also allowed with the back chain and with the frame address taken. I am however unsure of what to do with the varargs since this is hard coded all the way from the front end. It seems that GCC is keeping the standard offsets into the reg-save area. What would varargs with packed-stack in llvm mean?
lowerFRAMEADDR(): According to the documentation, zero can be returned if it cannot be identified, which this patch does when the back chain is not present. GCC seems to return just the address of where it would have been even if it is not present (then containing the saved %r15 reg instead). Doing it this way simplifies lowerFRAMEADDR(), I would think.
Allow packed-stack with varargs, placing the vararg arguments on the default slots in the reg save area.
Maybe move the checks in usePackedStack() to SystemZDAGToDAGISel::runOnMachineFunction()?
It seems that compiling a vararg function with gcc -mpacked-stack and -msoft-float places the stored GPRs in the default slots. However, by adding also -backchain (in addition to the previous two args) the GPRs are saved right below the backchain, which is at offset 152 (topmost). This patch does currently not move up the GPRs in the case of saving the backchain, but. I am guessing it should (in order to have "constant and known distance from the backchain to the return address")? If so, I suppose the the front-end should emit a different offset for vararg reg arguments in case of packed-stack + soft-float. Could it be the same regardless of backchain, or?
Looks like it does, but this isn't really required. You might as well store the GPRs right at the top of the save area.
However, by adding also -backchain (in addition to the previous two args) the GPRs are saved right below the backchain, which is at offset 152 (topmost). This patch does currently not move up the GPRs in the case of saving the backchain, but. I am guessing it should (in order to have "constant and known distance from the backchain to the return address")?
Yes, this is definitely required. Otherwise backtracking using the kernel backchain will not work.
If so, I suppose the the front-end should emit a different offset for vararg reg arguments in case of packed-stack + soft-float. Could it be the same regardless of backchain, or?
No, the offsets must remain the same, or else the ABI of the va_list type will change. (It must be possible to create a va_list in a function that is built with -mpacked-stack, and pass that list to another function built with -mno-packed-stack to use va_arg on that va_list.)
Instead, you should store the GPRs whereever they need to be stored (at or near the top of the save area), and the compute from that the appropriate value to be stored into the reg_save_area slot in the va_list struct. So for example, if you store r15 right at the top of the save area (sp + 152), then you should set reg_save_area to sp + 32; if you store r15 right below the back chain (sp + 144), then you should set reg_save_area to sp + 24 etc. That way, all existing va_arg code accessing this va_list will find the saved GPRs at the right position. (It won't find saved FPRs -- but it won't look for those if it is built with -msoft-float, and -msoft-float *is* an ABI changing option, so it's OK to require everything to be built with the same setting here.)
To implement this, you'll need to change the place in SystemZTargetLowering::LowerFormalArguments where the value to be used for reg_save_area is set:
int64_t RegSaveOffset = -SystemZMC::CallFrameSize; unsigned RegSaveIndex = MFI.CreateFixedObject(1, RegSaveOffset, true); FuncInfo->setRegSaveFrameIndex(RegSaveIndex);
This should take into account instead the specific offset where you've decided to save GPRs to.
Thank you for review. Patch updated accordingly.
Also, I refactored assignCalleeSavedSpillSlots() to not duplicate code in the case of packed-stack. That refactoring alone is this file:
Yes, this now looks correct to me. However, we now have duplicated computation between LowerFormalArguments and assignCalleeSavedSpillSlots again, which I don't really like.
I'm wondering whether the best solution wouldn't be to simply make the SystemZFrameLowering::getRegSpillOffset routine take packed stack into account itself. E.g. something like this:
unsigned getRegSpillOffset(const MachineFunction &MF, unsigned Reg) const { unsigned Offset = RegSpillOffsets[Reg]; if (usePackedStack(MF)) { if (SystemZ::GR64BitRegClass.contains(Reg)) { if (!(IsVarArg && !SoftFloat)) { // Put all GPRs at the top of the Register save area with packed // stack. Make room for the backchain if needed. Offset += BackChain ? 24 : 32; } else Offset = 0; } return Offset; }
and then use that function throughout LowerFormalArguments and assignCalleeSavedSpillSlots ...
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
2017 | It's a bit weird to hard-code only "-mhard-float" here. I guess this is because we may not actually have any "getLastArg" since hard-float is the default? In that case I guess we might as well hard-code the whole string. | |
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp | ||
477 | SystemZMC::CallFrameSize - 8 would be nicer than the hard-coded 152. | |
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
1469 | See main comment about this duplicated logic here. | |
3260 | Hmm, GCC will always return the address where the back-chain would have been, even if it isn't emitted. Not sure whether that is very useful, but maybe we should emulate it simply for compatibility reasons ... |
However, we now have duplicated computation between LowerFormalArguments and assignCalleeSavedSpillSlots again, which I don't really like.
I'm wondering whether the best solution wouldn't be to simply make the SystemZFrameLowering::getRegSpillOffset routine take packed stack into account itself.
Yes, that seems even better :-) getRegSpillOffset() MF argument non-const due to usePackedStack().
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
2017 | yes | |
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
3260 | I thought it might be bad to create two stack slots to the same location (slot of abscent backchain / R15), but it appears to be fine. However, if creating that slot in case of no saved regs but with a local area, as in frameaddr-02.ll:fp1f, this ends up actually creating extra stack space for the backchain without using it. Not sure what would be best to do in that case if not returning 0... |
OK, just a few more small comments, otherwise looks really good now. Thanks!
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp | ||
---|---|---|
114 | Maybe easier to understand: int CurrOffset = -SystemZMC::CallFrameSize; if (usePackedStack(MF)) CurrOffset += StartSPOffset; | |
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
1477–1479 | I believe this now needs to be -SystemZMC::CallFrameSize + Offset instead of RegSaveOffset + Offset. | |
3257 | We probably shoud use usePackedStack here for consistency. | |
3260 | I see. I guess it's fine to return 0 in that case then. (This is anyway not a mode that's used by real-world use cases.) |
Thanks for review - patch updated.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
1477–1479 | I think that should be equivalent due to the '!(IsVarArg && !SoftFloat)' check in getRegSpillOffset(). Would it be clearer to use -SystemZMC::CallFrameSize here still? |
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
1477–1479 | Yes, since if there were any offset, it would now be included in getRegSpillOffset ... Right now, it seems like it would be added twice (since you're adding the output of two getRegSpillOffset calls). |
It's a bit weird to hard-code only "-mhard-float" here. I guess this is because we may not actually have any "getLastArg" since hard-float is the default? In that case I guess we might as well hard-code the whole string.