This is an archive of the discontinued LLVM Phabricator instance.

[z/OS] Implement prologue and epilogue generation for z/OS target.
ClosedPublic

Authored by Everybody0523 on Nov 23 2021, 10:07 AM.

Details

Summary

This patch adds support for prologue and epilogue generation for the z/OS target under the XPLINK64 ABI for functions with a stack size of less than 1048576 bytes (huge stack frames).

Diff Detail

Unit TestsFailed

Event Timeline

Everybody0523 created this revision.Nov 23 2021, 10:07 AM
Everybody0523 requested review of this revision.Nov 23 2021, 10:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2021, 10:07 AM
Everybody0523 edited the summary of this revision. (Show Details)Nov 23 2021, 10:26 AM

Formatting changes

Whitespace change

Add more testcases

uweigand added inline comments.Nov 28 2021, 6:44 AM
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
884

As discussed when this function was added initially, this routine should not modify it's CSI argument. If we want to add the SP to the general set of spilled and restored registers, this should be done in determineCalleeSaves instead.

1169

Why do we need to override the default implementation here? That shouldn't be necessary if we've set everything else up correctly.

1181

In particular, this looks quite wrong. It's true with most ABIs that arguments are in the parent's frame, but that should have been handled elsewhere, it shouldn't require special code in this routine.

Everybody0523 added inline comments.Nov 29 2021, 12:10 PM
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
1181

Should this be done in LowerFormalArguments instead? I'm sorry but I'm not quite sure where else to do it.

Everybody0523 marked an inline comment as not done.

Pulling out the getFrameIndexReference discussion into the main comments.

First of all, let me review the overall stack layout with the z/OS XPLINK calling convention. To my knowledge, the stack looks basically like this (from high addresses to low addresses):

             ========================================================
             Caller's local variables + spill area
             --------------------------------------------------------
             Caller's argument area (passed to callee)
SP1 + 2176   --------------------------------------------------------
             Caller's 128-byte GPR save area
SP1 + 2048   ========================================================
             Callee's local variables + spill area
             --------------------------------------------------------
             (unless leaf) Callee's argument area 
             --------------------------------------------------------
             (unless leaf) Callee's 128-byte GPR save area
SP2 + 2048   ========================================================


SP1:  Stack pointer at function entry to callee (beginning of prologue)
SP2:  Stack pointer during callee (end of prologue)

SP1 - SP2 == Callee's frame size

Is this correct? If not, please correct. (In any case, it would be good to have a diagram along those lines as comments in this file as well.)

The first question is, what is the base to be used for frame references? There is some flexibility, but it seems most useful to use the incoming SP, modulo the stack pointer bias, for that (i.e. the value "SP1 + 2048" in the above diagram). This would mean that you access the incoming parameters using positive offsets relative to that base, and local variables using negative offsets relative to that base.

The getFrameIndexReference routine needs to compute that base value (and then add the offset) starting from current register values, i.e. SP2 in the above diagram. The default computation (done by the default implementation of the routine) does:

return StackOffset::getFixed(MFI.getObjectOffset(FI) + MFI.getStackSize() -
                             getOffsetOfLocalArea() +
                             MFI.getOffsetAdjustment());

Your proposed platform-specific implementation does instead:

return StackOffset::getFixed(MFI.getObjectOffset(FI) + getAllocatedStackSize(MF) +
                             getOffsetOfLocalArea() +
                             MFI.getOffsetAdjustment());

There's two differences:
1.) you use getAllocatedStackSize instead of MFI.getStackSize
2.) you add instead of subtract the getOffsetOfLocalArea value

Both of these look suspicious to me.

As to the first difference, the difference between getAllocatedStackSize and MFI.getStackSize is that the former adds the 128 bytes that are used for the (callee's) GPR save area, which wasn't part of the stack frame size before. I think it would be preferable to actually add that size to the stack size, so it will be included in MFI.getStackSize to begin with. This would also help other users of that routine.

To do so, you should simply call MFFrame.setStackSize in the emitPrologue routine, similar to what is already done in the ELF ABI case.

As to the second difference, I do not really understand how the local area offset is currently being used by the XPLINK ABI. The constructor currently sets this up as:

SystemZXPLINKFrameLowering::SystemZXPLINKFrameLowering()
    : SystemZFrameLowering(TargetFrameLowering::StackGrowsUp, Align(32), 128,
                           Align(32), /* StackRealignable */ false),

So we have an upwards growing stack with a local area offset of 128 bytes. This seems wrong. First of all, the stack is actually downwards growing - note how emitPrologue subtracts from the stack pointer (for an upwards growing stack it would add to the stack pointer).

So one change might be to use StackGrowsDown instead, and then use -128 instead of 128 for the local area offset. (With StackGrowsDown, the offset must be negative.) That would already fix the sign problem your code works around.

However, I think even that is really incorrect. I don't believe in the above diagram we have anything that would match the "local area offset" as used by common code here -- that offset would refer to some fixed members placed between the base of the frame (i.e. SP1 + 2048 as discussed above) and the local variable/spill area -- but there are no fixed members in that place. So I think this really should be 0 anyway.

Maybe the intent was to use the top of the caller's GPR save area as the frame base instead? Then the 128 byte would account for that incoming space. But I'm not sure this has any particular benefit. (@Kai, any comments on that?)

Note that the parameter set up code in LowerFormalParameters et. al. currently also assumes that the frame offsets for incoming parameters are relative to the top of the caller's GPR save area, and not the incoming SP. So depending on the decision what point to use as frame base, we may need to update how those offsets are being computed as well.

Kai added a comment.Dec 6 2021, 1:27 PM

Reading the long comment, I guess that there is a systematic error in the calculation of the stack offsets.

First, the diagram looks correct to me. And obviously, the stack grows down.

The local area offset is used to "jump" over the 128 byte GPR save area. According to your description, the semantic of the local area is different, which indicates to me that this field should not be used for this purpose. I guess the different sign of the LocalAreaOffset is also the cause for the strange calculation in getAllocatedStackSize().

For the frame reference, the idea seems to be to reach out into the caller's stack frame, as the formula ends up being: `stack bias + stack size of function + jump over GPR save area + object offset". While the offset of incoming arguments is relative to the GPR save area, the fixed spill offsets are relative to stack pointer =

Well, thinking about all this I guess the best way forward is:

  • not using the LocalAreaOffset, because it has a different semantic
  • get rid of getAllocatedStackSize()
  • use "incoming SP + stackbias" as base for all stack offset calculations, because that is more obvious

Well, thinking about all this I guess the best way forward is:

  • not using the LocalAreaOffset, because it has a different semantic
  • get rid of getAllocatedStackSize()
  • use "incoming SP + stackbias" as base for all stack offset calculations, because that is more obvious

Agreed. For the last part, the only thing that needs to change is the fixed stack offsets for the parameter slots, which now need to include the 128 byte area. This could be done locally in LowerFormalParameters when doing the CreateFixedObject call. But I suspect it might be more straightforward to change the offset computation in SystemZCallingConvention.td to have the correct offset in VA.getLocMemOffset() to begin with.

uweigand added inline comments.Dec 7 2021, 10:48 AM
llvm/lib/Target/SystemZ/SystemZCallingConv.td
169

I'm wondering if it wouldn't be more straightforward to handle this in determineCalleeSaves, in the place where the frame pointer register is handled?

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
1055

This is no longer called anywhere and should be deleted.

1165

This function is now actually identical to the default implementation and should also be deleted.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1504

It's a bit ugly, but I'd be OK with it for now. It would be good to add a FIXME saying that ideally the call frame size should have already been included in the offset.

Everybody0523 added inline comments.Dec 7 2021, 8:43 PM
llvm/lib/Target/SystemZ/SystemZCallingConv.td
169

That's what this line is for. Adding a register to SavedRegs in determineCalleeSaves doesn't do anything unless that register is marked as a callee-saved register here.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1504

So originally I actually included it in the offset, but I found that it would force similar ugliness in the LowerCall routine here.

As I understand it, if the call frame size is included in the offset just for XPLINK, then in the location I linked above we would have to only add the call frame size for ELF targets, since the call frame size is already included. ELF callees instead rely on SystemZELFFrameLowering::getFrameIndexReference to jump over the call frame size.

So I think if we were to add each target's call frame size to the offset, we could actually eliminate the override of getFrameIndexReference for ELF as well. Please correct me if I'm mistaken about the ELF code though.

But with that said, I'd prefer not to touch the ELF code in this patch so if you're OK with it for now I'll just leave it as a FIXME

uweigand accepted this revision.Dec 8 2021, 3:00 AM

LGTM.

llvm/lib/Target/SystemZ/SystemZCallingConv.td
169

Ah, I see. This is OK then.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1504

Indeed, this is another place where the current abstractions break down. I think the proper fix will be to change that line to simply:

unsigned Offset = Regs->getStackPointerBias() + VA.getLocMemOffset();

and then set StackPointerBias to 160 for ELF. Those 160 bytes are really a stack bias on ELF - they're part of the callee's stack frame, but always allocated by the caller.

But all this should be done in a follow-up patch. For now, this patch LGTM as is.

This revision is now accepted and ready to land.Dec 8 2021, 3:00 AM
This revision was automatically updated to reflect the committed changes.

Hey folks, it looks like this commit has caused the buildbot here to run into failures. Can someone please revert this change, fix it and then commit it again?

muiez added a subscriber: muiez.Dec 14 2021, 11:25 AM

Hey folks, it looks like this commit has caused the buildbot here to run into failures. Can someone please revert this change, fix it and then commit it again?

Yup, sure. Reverted here: https://github.com/llvm/llvm-project/commit/ebf5497b269f5769b53320dd81290714642e4306

Everybody0523 updated this revision to Diff 394366.EditedDec 14 2021, 1:12 PM

This generates STG instruction used to save the frame pointer correctly. It was previously generated as follows:

BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG), SystemZ::R0D)
    .addReg(SystemZ::R4D)
    .addImm(Offset)
    .addReg(0);

This was incorrect as the above invocation of BuildMI would cause the above instruction to give R0D the RegState::Define flag (see here), which does not make sense for a store instruction. The correct way is to generate the instruction as follows:

BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG))
    .addReg(SystemZ::R0D)
    .addReg(SystemZ::R4D)
    .addImm(Offset)
    .addReg(0);

Unfortunately it seems that this did not trigger an error unless EXPENSIVE_CHECKS was used, hence the buildbot failure.

uweigand added inline comments.Dec 15 2021, 12:08 AM
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
1115

I think you also need to add RegState::Kill for R0D here.

Add RegState::Kill to MIBuilder when generating STG to store R0D into stack slot.

LGTM, please commit again.