This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames).
As discussed when this function was added initially, this routine should not modify it's CSI argument. If we want to add the SP to the general set of spilled and restored registers, this should be done in determineCalleeSaves instead.
Why do we need to override the default implementation here? That shouldn't be necessary if we've set everything else up correctly.
In particular, this looks quite wrong. It's true with most ABIs that arguments are in the parent's frame, but that should have been handled elsewhere, it shouldn't require special code in this routine.
Pulling out the getFrameIndexReference discussion into the main comments.
First of all, let me review the overall stack layout with the z/OS XPLINK calling convention. To my knowledge, the stack looks basically like this (from high addresses to low addresses):
======================================================== Caller's local variables + spill area -------------------------------------------------------- Caller's argument area (passed to callee) SP1 + 2176 -------------------------------------------------------- Caller's 128-byte GPR save area SP1 + 2048 ======================================================== Callee's local variables + spill area -------------------------------------------------------- (unless leaf) Callee's argument area -------------------------------------------------------- (unless leaf) Callee's 128-byte GPR save area SP2 + 2048 ======================================================== SP1: Stack pointer at function entry to callee (beginning of prologue) SP2: Stack pointer during callee (end of prologue) SP1 - SP2 == Callee's frame size
Is this correct? If not, please correct. (In any case, it would be good to have a diagram along those lines as comments in this file as well.)
The first question is, what is the base to be used for frame references? There is some flexibility, but it seems most useful to use the incoming SP, modulo the stack pointer bias, for that (i.e. the value "SP1 + 2048" in the above diagram). This would mean that you access the incoming parameters using positive offsets relative to that base, and local variables using negative offsets relative to that base.
The getFrameIndexReference routine needs to compute that base value (and then add the offset) starting from current register values, i.e. SP2 in the above diagram. The default computation (done by the default implementation of the routine) does:
return StackOffset::getFixed(MFI.getObjectOffset(FI) + MFI.getStackSize() - getOffsetOfLocalArea() + MFI.getOffsetAdjustment());
Your proposed platform-specific implementation does instead:
return StackOffset::getFixed(MFI.getObjectOffset(FI) + getAllocatedStackSize(MF) + getOffsetOfLocalArea() + MFI.getOffsetAdjustment());
There's two differences:
1.) you use getAllocatedStackSize instead of MFI.getStackSize
2.) you add instead of subtract the getOffsetOfLocalArea value
Both of these look suspicious to me.
As to the first difference, the difference between getAllocatedStackSize and MFI.getStackSize is that the former adds the 128 bytes that are used for the (callee's) GPR save area, which wasn't part of the stack frame size before. I think it would be preferable to actually add that size to the stack size, so it will be included in MFI.getStackSize to begin with. This would also help other users of that routine.
To do so, you should simply call MFFrame.setStackSize in the emitPrologue routine, similar to what is already done in the ELF ABI case.
As to the second difference, I do not really understand how the local area offset is currently being used by the XPLINK ABI. The constructor currently sets this up as:
SystemZXPLINKFrameLowering::SystemZXPLINKFrameLowering() : SystemZFrameLowering(TargetFrameLowering::StackGrowsUp, Align(32), 128, Align(32), /* StackRealignable */ false),
So we have an upwards growing stack with a local area offset of 128 bytes. This seems wrong. First of all, the stack is actually downwards growing - note how emitPrologue subtracts from the stack pointer (for an upwards growing stack it would add to the stack pointer).
So one change might be to use StackGrowsDown instead, and then use -128 instead of 128 for the local area offset. (With StackGrowsDown, the offset must be negative.) That would already fix the sign problem your code works around.
However, I think even that is really incorrect. I don't believe in the above diagram we have anything that would match the "local area offset" as used by common code here -- that offset would refer to some fixed members placed between the base of the frame (i.e. SP1 + 2048 as discussed above) and the local variable/spill area -- but there are no fixed members in that place. So I think this really should be 0 anyway.
Maybe the intent was to use the top of the caller's GPR save area as the frame base instead? Then the 128 byte would account for that incoming space. But I'm not sure this has any particular benefit. (@Kai, any comments on that?)
Note that the parameter set up code in LowerFormalParameters et. al. currently also assumes that the frame offsets for incoming parameters are relative to the top of the caller's GPR save area, and not the incoming SP. So depending on the decision what point to use as frame base, we may need to update how those offsets are being computed as well.
Reading the long comment, I guess that there is a systematic error in the calculation of the stack offsets.
First, the diagram looks correct to me. And obviously, the stack grows down.
The local area offset is used to "jump" over the 128 byte GPR save area. According to your description, the semantic of the local area is different, which indicates to me that this field should not be used for this purpose. I guess the different sign of the LocalAreaOffset is also the cause for the strange calculation in getAllocatedStackSize().
For the frame reference, the idea seems to be to reach out into the caller's stack frame, as the formula ends up being: `stack bias + stack size of function + jump over GPR save area + object offset". While the offset of incoming arguments is relative to the GPR save area, the fixed spill offsets are relative to stack pointer =
Well, thinking about all this I guess the best way forward is:
- not using the LocalAreaOffset, because it has a different semantic
- get rid of getAllocatedStackSize()
- use "incoming SP + stackbias" as base for all stack offset calculations, because that is more obvious
Agreed. For the last part, the only thing that needs to change is the fixed stack offsets for the parameter slots, which now need to include the 128 byte area. This could be done locally in LowerFormalParameters when doing the CreateFixedObject call. But I suspect it might be more straightforward to change the offset computation in SystemZCallingConvention.td to have the correct offset in VA.getLocMemOffset() to begin with.
I'm wondering if it wouldn't be more straightforward to handle this in determineCalleeSaves, in the place where the frame pointer register is handled?
This is no longer called anywhere and should be deleted.
This function is now actually identical to the default implementation and should also be deleted.
|1504 ↗||(On Diff #392454)|
It's a bit ugly, but I'd be OK with it for now. It would be good to add a FIXME saying that ideally the call frame size should have already been included in the offset.
That's what this line is for. Adding a register to SavedRegs in determineCalleeSaves doesn't do anything unless that register is marked as a callee-saved register here.
|1504 ↗||(On Diff #392454)|
So originally I actually included it in the offset, but I found that it would force similar ugliness in the LowerCall routine here.
As I understand it, if the call frame size is included in the offset just for XPLINK, then in the location I linked above we would have to only add the call frame size for ELF targets, since the call frame size is already included. ELF callees instead rely on SystemZELFFrameLowering::getFrameIndexReference to jump over the call frame size.
So I think if we were to add each target's call frame size to the offset, we could actually eliminate the override of getFrameIndexReference for ELF as well. Please correct me if I'm mistaken about the ELF code though.
But with that said, I'd prefer not to touch the ELF code in this patch so if you're OK with it for now I'll just leave it as a FIXME
Ah, I see. This is OK then.
|1504 ↗||(On Diff #392454)|
Indeed, this is another place where the current abstractions break down. I think the proper fix will be to change that line to simply:
unsigned Offset = Regs->getStackPointerBias() + VA.getLocMemOffset();
and then set StackPointerBias to 160 for ELF. Those 160 bytes are really a stack bias on ELF - they're part of the callee's stack frame, but always allocated by the caller.
But all this should be done in a follow-up patch. For now, this patch LGTM as is.
This generates STG instruction used to save the frame pointer correctly. It was previously generated as follows:
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG), SystemZ::R0D) .addReg(SystemZ::R4D) .addImm(Offset) .addReg(0);
This was incorrect as the above invocation of BuildMI would cause the above instruction to give R0D the RegState::Define flag (see here), which does not make sense for a store instruction. The correct way is to generate the instruction as follows:
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG)) .addReg(SystemZ::R0D) .addReg(SystemZ::R4D) .addImm(Offset) .addReg(0);
Unfortunately it seems that this did not trigger an error unless EXPENSIVE_CHECKS was used, hence the buildbot failure.