This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] The Local Area Offset in SystemZFrameLowering.
ClosedPublic

Authored by jonpa on Nov 19 2019, 2:56 AM.

Details

Reviewers
uweigand
Summary

I began looking at how to implement the packed stack layout and got kind of stuck on understanding the current code. The use of the LocalAreaOffset with a value of -160 does not seem to have any real benefit, although it certainly is confusing to me... Just the notion of "the local area" per

/// getOffsetOfLocalArea - This method returns the offset of the local area
/// from the stack pointer on entrance to a function.
///
int getOffsetOfLocalArea() const { return LocalAreaOffset; }

is problematic since we don't really have a single constant offset to a single area for locals. We have the register save area between 0 and +160 from EntrySP, and then we have another local area for e.g.spills, which would begin at SP+160 *after the prologue*. Incoming stack arguments would be at +160 and above from EntrySP. The use of this offset in different context is very confusing to me, and it seems much more readable to instead just allocate the stack objects with fixed offsets explicitly relative to EntrySP.

This offset is used when PEI::calculateFrameObjectOffsets() computes the offsets. This becomes -168 for the first spilled reg. TargetFrameLowering::getFrameIndexReference() then subtracts the LAO, which is negative, so we get -8 back again (plus stack size). I don't see any other uses, and it doesn't seem like we gain anything from this.

The debug dumps are currently broken, which may not be a real problem, but still:

fi#-3: size=8, align=8, fixed, at location [SP+280]
fi#-2: size=8, align=8, fixed, at location [SP+272]
fi#-1: size=8, align=8, fixed, at location [SP+160]

This is actually one incoming stack argument at +0, and then %r14 and %r15 saved at +112 and +120 while the MachineFrameInfo::print() has subtracted the LocalAreaOffset of -160 to them all.

This patch seems to work (knock-knock), by simply setting this offset to 0 and instead add a SystemZMC::RegisterSaveAreaSize constant (160) in places as needed instead.

  • Removed SystemZMC::CallFrameSize - slightly confusing to me to have this as a constant since a "call frame" could also include stack arguments in my world. Use instead SystemZMC::RegisterSaveAreaSize to represent the 160-byte base area.
  • All SystemZ calls to CreateFixedObject() needs to be adjusted:

LowerFormalArguments():

  • incoming stack arguments needs RegisterSaveAreaSize added to be +160.
  • varargs:
    • add RegisterSaveAreaSize for setVarArgsFrameIndex() to get +160 from incoming SP for first stack arg.
    • RegSaveIndex is +0 from incoming SP...
    • No need to add offset for storing of FPR to the fixed slots since the table has the correct offset from incoming SP.

lowerFRAMEADDR(): The frame address should be the incoming SP, which is also the address of the back chain, so just give 0 as offset.

  • Users of getObjectOffset():

PEI::calculateFrameObjectOffsets() / MachineFrameInfo::estimateStackSize(): Should return the same value as before, since even though we now give an offset of 160 for a stack arg, this is negated and therefore not counted. This makes sense I think since it's a check into the local area to be allocated.

TargetFrameLowering::getFrameIndexReference(): As before except the offset is now included in the SP Offsets for each object while LAO is 0.

SystemZFrameLowering::processFunctionBeforeFrameFinalized(): The +160 offset is already part of the stackarg ObjectOffset so don't add it again. In fact, the current code is conservatively wrong, because it adds +160 also to the register saves, which already have the final offsets from the table into the Register save area.

SelectionDAGAddressAnalysis.cpp, in BaseIndexOffset::equalBaseIndex(): This is just a relative difference so it shouldn't matter if the LAO is included or not.

  • I don't quite understand

SystemZMC::CFAOffsetFromInitialSP: A little confused as to why this has an offset from the incoming SP, as the Canonical Frame Address is supposed to be the incoming SP, per definition, or? It seems that we are pretending that the 160-byte base area is part of the call, so the CFA is above it. If I make this 0, then two DebugInfo/SystemZ/ tests fail, so it seems this is needed, unless I just forgot to update some other place as well.

the asserts in TargetInstrInfo::foldMemoryOperand() / TargetLoweringBase::emitPatchPoint():

assert(MFI.getObjectOffset(FI) != -1)

All tests pass, and SPEC is NFC except for one file where it seems that the emergency spill slots are no longer allocated (see above under processFunctionBeforeFrameFinalized()).

Diff Detail

Event Timeline

jonpa created this revision.Nov 19 2019, 2:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 19 2019, 2:56 AM

So here is how this is supposed to work, as I understand it.

First of all, on SystemZ, unlike all other platforms, the DWARF CFA (Canonical Frame Address) is not equal to the incoming stack pointer, but to incoming stack pointer plus 160. This is simply a fact of the SystemZ ABI that we cannot change at this point.

This also affects the definition of the "getOffsetOfLocalArea" value. While the comment calls this "the offset of the local area from the stack pointer on entrance to a function", we interpret this as "the offset of the local area from the CFA" -- on all other platforms this is equivalent, but for us the difference matters. Note that since the stack grows downward, this offset describes the number of bytes we have to go downward from the CFA in order to reach the topmost byte of the local area (which is therefore simply the size of the register save area).

Before register allocation, (fixed) frame objects can be allocated relative to the CFA plus some offset. This allows reaching incoming arguments (CFA + 0 .. CFA + N) as well as the register save area (CFA ... CFA - 160). During register allocation, non-fixed frame objects are then allocated to the "local area", i.e. not within the first 160 bytes counting downwards from the CFA. In a second step, in order to establish addressability during function execution, the address is converted from "CFA - offset" into "SP (or frame pointer in case of variable stack size) + offset". Note that in that computation, the size of the other register save area (which the current function must allocate for its callees) must be included.

Given all that, I believe the SystemZ backend handles this correctly, so I don't think these changes are really necessary.

jonpa updated this revision to Diff 230216.Nov 20 2019, 2:15 AM
jonpa retitled this revision from [SystemZ] Stop using the Local Area Offset in SystemZFrameLowering. to [SystemZ] The Local Area Offset in SystemZFrameLowering..

Thank you very much for the explanation - it does make more sense now :-)

I think it would be nice to have this mentioned in a comment, so I tried to reuse part of your answer here in the SystemZFrameLowering() constructor.

The one part that stands out from this then to me, is the fact that the SpillOffsetTable is relative to the incoming SP, which is why the offsets are incorrect in the created stack objects. To have a fully correct IR, I guess we should return offsets relative to the CFA, in other words subtract 160 from each entry. But there is no real point in this since those offsets are never used (we compute the offset for the STMG instead), so I instead figured that at least having a comment mentioning that might be helpful.

Unfortunately I was still left with a slight confusion, as I am not quite sure how this works for the FPRs part of that table: Normally those FPRs are not callee-saved, so it wouldn't matter. But how about the CSR_SystemZ_AllRegs calling convention? If F0D would be callee-saved, would it end up in the right place?

The one part that stands out from this then to me, is the fact that the SpillOffsetTable is relative to the incoming SP, which is why the offsets are incorrect in the created stack objects. To have a fully correct IR, I guess we should return offsets relative to the CFA, in other words subtract 160 from each entry. But there is no real point in this since those offsets are never used (we compute the offset for the STMG instead), so I instead figured that at least having a comment mentioning that might be helpful.

Hmm, this may actually be simple to fix, have the SpillOffsetTable (which is used by common code) use offsets relative to the CFA, and convert it to offset relative to incoming SP when filling RegSpillOffsets (which is used by the SystemZ code). If this reduces confusion when reading MI dumps, this may be worthwhile to do.

Unfortunately I was still left with a slight confusion, as I am not quite sure how this works for the FPRs part of that table: Normally those FPRs are not callee-saved, so it wouldn't matter. But how about the CSR_SystemZ_AllRegs calling convention? If F0D would be callee-saved, would it end up in the right place?

F0D, F2D, F4D, F6D are used for passing floating-point arguments and must be saved into those specific slots in a function that uses variable argument lists, since the va_list holds a pointer to the start of the register save area, and va_arg assumes the first four floating-point arguments are stored at those particular offsets relative to that pointer. If you use AllRegs, they'll also be saved there, which isn't really required, but doesn't hurt either. Any other register, whether the normal call-saved register, or the other call-clobbered ones when using AllRegs, do not have assigned slots and will be saved in some arbitrary place on the stack allocated during register allocation.

jonpa added a comment.Nov 21 2019, 2:03 AM

F0D, F2D, F4D, F6D are used for passing floating-point arguments and must be saved into those specific slots in a function that uses variable argument lists, since the va_list holds a pointer to the start of the register save area, and va_arg assumes the first four floating-point arguments are stored at those particular offsets relative to that pointer. If you use AllRegs, they'll also be saved there, which isn't really required, but doesn't hurt either. Any other register, whether the normal call-saved register, or the other call-clobbered ones when using AllRegs, do not have assigned slots and will be saved in some arbitrary place on the stack allocated during register allocation.

For variable arguments, I see that this works since we are indeed adding -160 to the offsets from the table in LowerFormalArguments().

If I compile

define anyregcc void @anyregcc1() {
entry:
  call void asm sideeffect "", "~{f0}"() nounwind
  ret void
}

I get

anyregcc1:                              # @anyregcc1
        .cfi_startproc
# %bb.0:                                # %entry
        std     %f0, 288(%r15)          # 8-byte Folded Spill
        .cfi_offset %f0, 128
        #APP
        #NO_APP
        ld      %f0, 288(%r15)          # 8-byte Folded Reload
        br      %r14

This seems broken to me since the correct offset for %f0 from the incoming SP is 128, or?

Right. It seems we really do need to implement the change I described in my last comment then. I've just quickly tried this patch:


and this seems to fix the problem for your test case:

anyregcc1:                              # @anyregcc1
        .cfi_startproc
# %bb.0:                                # %entry
        std     %f0, 128(%r15)          # 8-byte Folded Spill
        .cfi_offset %f0, -32
        #APP
        #NO_APP
        ld      %f0, 128(%r15)          # 8-byte Folded Reload
        br      %r14

Can you check whether this doesn't introduce any regressions anywhere, and integrate your comment and test case? Thanks!

jonpa updated this revision to Diff 230918.Nov 25 2019, 9:32 AM

Applied your patch, which is NFC on SPEC. Fixed your comment above SpillOffsetTable slightly to "..., relative to the CFA (i.e. incoming stack pointer + SystemZMC::CallFrameSize)".

I extended llvm/test/CodeGen/SystemZ/anyregcc-novec.ll to also check for the actual offsets for the FP argument registers. The cfi_offset:s for these registers have now also changed, and I added them as well to the test.

uweigand accepted this revision.Nov 25 2019, 9:37 AM

Minor suggestion on the test case, otherwise this LGTM. Thanks!

llvm/test/CodeGen/SystemZ/anyregcc-novec.ll
7 ↗(On Diff #230918)

If we check absolute stack offsets below, we should probably add a check for the instruction that decrements the stack pointer here as well (the aghi %r15, -...).

This revision is now accepted and ready to land.Nov 25 2019, 9:37 AM
jonpa closed this revision.Nov 25 2019, 10:09 AM
jonpa marked an inline comment as done.

Committed as a7d3f69.