This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Support the kernel backchain
ClosedPublic

Authored by jonpa on Feb 12 2020, 1:37 PM.

Details

Summary

In order to build the Linux kernel, the back chain must be supported together with packed-stack. The back chain is then stored topmost in the register save area.

Diff Detail

Event Timeline

jonpa created this revision.Feb 12 2020, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 1:37 PM
jonpa added a comment.Feb 12 2020, 1:41 PM

allow packed-stack in most of the cases where it's currently forbidden (with varargs, if the frame address is taken, and with backchain) -- since packed-stack is then not just an optimization, but an ABI change

Packed-stack stack is now also allowed with the back chain and with the frame address taken. I am however unsure of what to do with the varargs since this is hard coded all the way from the front end. It seems that GCC is keeping the standard offsets into the reg-save area. What would varargs with packed-stack in llvm mean?

lowerFRAMEADDR(): According to the documentation, zero can be returned if it cannot be identified, which this patch does when the back chain is not present. GCC seems to return just the address of where it would have been even if it is not present (then containing the saved %r15 reg instead). Doing it this way simplifies lowerFRAMEADDR(), I would think.

jonpa updated this revision to Diff 244451.Feb 13 2020, 8:50 AM

Allow packed-stack with varargs, placing the vararg arguments on the default slots in the reg save area.

Maybe move the checks in usePackedStack() to SystemZDAGToDAGISel::runOnMachineFunction()?

jonpa added a comment.Feb 15 2020, 3:38 PM

It seems that compiling a vararg function with gcc -mpacked-stack and -msoft-float places the stored GPRs in the default slots. However, by adding also -backchain (in addition to the previous two args) the GPRs are saved right below the backchain, which is at offset 152 (topmost). This patch does currently not move up the GPRs in the case of saving the backchain, but. I am guessing it should (in order to have "constant and known distance from the backchain to the return address")? If so, I suppose the the front-end should emit a different offset for vararg reg arguments in case of packed-stack + soft-float. Could it be the same regardless of backchain, or?

It seems that compiling a vararg function with gcc -mpacked-stack and -msoft-float places the stored GPRs in the default slots.

Looks like it does, but this isn't really required. You might as well store the GPRs right at the top of the save area.

However, by adding also -backchain (in addition to the previous two args) the GPRs are saved right below the backchain, which is at offset 152 (topmost). This patch does currently not move up the GPRs in the case of saving the backchain, but. I am guessing it should (in order to have "constant and known distance from the backchain to the return address")?

Yes, this is definitely required. Otherwise backtracking using the kernel backchain will not work.

If so, I suppose the the front-end should emit a different offset for vararg reg arguments in case of packed-stack + soft-float. Could it be the same regardless of backchain, or?

No, the offsets must remain the same, or else the ABI of the va_list type will change. (It must be possible to create a va_list in a function that is built with -mpacked-stack, and pass that list to another function built with -mno-packed-stack to use va_arg on that va_list.)

Instead, you should store the GPRs whereever they need to be stored (at or near the top of the save area), and the compute from that the appropriate value to be stored into the reg_save_area slot in the va_list struct. So for example, if you store r15 right at the top of the save area (sp + 152), then you should set reg_save_area to sp + 32; if you store r15 right below the back chain (sp + 144), then you should set reg_save_area to sp + 24 etc. That way, all existing va_arg code accessing this va_list will find the saved GPRs at the right position. (It won't find saved FPRs -- but it won't look for those if it is built with -msoft-float, and -msoft-float *is* an ABI changing option, so it's OK to require everything to be built with the same setting here.)

To implement this, you'll need to change the place in SystemZTargetLowering::LowerFormalArguments where the value to be used for reg_save_area is set:

int64_t RegSaveOffset = -SystemZMC::CallFrameSize;
unsigned RegSaveIndex = MFI.CreateFixedObject(1, RegSaveOffset, true);
FuncInfo->setRegSaveFrameIndex(RegSaveIndex);

This should take into account instead the specific offset where you've decided to save GPRs to.

jonpa updated this revision to Diff 245545.Feb 19 2020, 3:44 PM

Thank you for review. Patch updated accordingly.

Also, I refactored assignCalleeSavedSpillSlots() to not duplicate code in the case of packed-stack. That refactoring alone is this file:

Yes, this now looks correct to me. However, we now have duplicated computation between LowerFormalArguments and assignCalleeSavedSpillSlots again, which I don't really like.

I'm wondering whether the best solution wouldn't be to simply make the SystemZFrameLowering::getRegSpillOffset routine take packed stack into account itself. E.g. something like this:

unsigned getRegSpillOffset(const MachineFunction &MF, unsigned Reg) const {
    unsigned Offset = RegSpillOffsets[Reg];
    if (usePackedStack(MF)) {
      if (SystemZ::GR64BitRegClass.contains(Reg)) {
        if (!(IsVarArg && !SoftFloat)) {
	​    // Put all GPRs at the top of the Register save area with packed
	​    // stack. Make room for the backchain if needed.
	​    Offset += BackChain ? 24 : 32;
        } else
  ​        Offset = 0;
    }
    return Offset;
}

and then use that function throughout LowerFormalArguments and assignCalleeSavedSpillSlots ...

clang/lib/Driver/ToolChains/Clang.cpp
2017

It's a bit weird to hard-code only "-mhard-float" here. I guess this is because we may not actually have any "getLastArg" since hard-float is the default? In that case I guess we might as well hard-code the whole string.

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
477

SystemZMC::CallFrameSize - 8 would be nicer than the hard-coded 152.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1469

See main comment about this duplicated logic here.

3260

Hmm, GCC will always return the address where the back-chain would have been, even if it isn't emitted. Not sure whether that is very useful, but maybe we should emulate it simply for compatibility reasons ...

jonpa updated this revision to Diff 245684.Feb 20 2020, 10:06 AM
jonpa marked 5 inline comments as done.

However, we now have duplicated computation between LowerFormalArguments and assignCalleeSavedSpillSlots again, which I don't really like.

I'm wondering whether the best solution wouldn't be to simply make the SystemZFrameLowering::getRegSpillOffset routine take packed stack into account itself.

Yes, that seems even better :-) getRegSpillOffset() MF argument non-const due to usePackedStack().

clang/lib/Driver/ToolChains/Clang.cpp
2017

yes

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
3260

I thought it might be bad to create two stack slots to the same location (slot of abscent backchain / R15), but it appears to be fine.

However, if creating that slot in case of no saved regs but with a local area, as in frameaddr-02.ll:fp1f, this ends up actually creating extra stack space for the backchain without using it.

Not sure what would be best to do in that case if not returning 0...

OK, just a few more small comments, otherwise looks really good now. Thanks!

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
114

Maybe easier to understand:

int CurrOffset = -SystemZMC::CallFrameSize;
if (usePackedStack(MF))
  CurrOffset += StartSPOffset;
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1477–1479

I believe this now needs to be -SystemZMC::CallFrameSize + Offset instead of RegSaveOffset + Offset.

3257

We probably shoud use usePackedStack here for consistency.

3260

I see. I guess it's fine to return 0 in that case then. (This is anyway not a mode that's used by real-world use cases.)

jonpa updated this revision to Diff 245887.Feb 21 2020, 9:05 AM
jonpa marked 5 inline comments as done.

Thanks for review - patch updated.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1477–1479

I think that should be equivalent due to the '!(IsVarArg && !SoftFloat)' check in getRegSpillOffset().

Would it be clearer to use -SystemZMC::CallFrameSize here still?

uweigand added inline comments.Feb 22 2020, 5:49 AM
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1477–1479

Yes, since if there were any offset, it would now be included in getRegSpillOffset ... Right now, it seems like it would be added twice (since you're adding the output of two getRegSpillOffset calls).

jonpa updated this revision to Diff 246076.Feb 22 2020, 8:21 AM
jonpa marked an inline comment as done.

Patch updated per review to use -SystemZMC::CallFrameSize in LowerFormalArguments().

uweigand accepted this revision.Feb 23 2020, 9:04 AM

LGTM, thanks!

This revision is now accepted and ready to land.Feb 23 2020, 9:04 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2020, 1:46 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript