Page MenuHomePhabricator

[SystemZ, IPRA] determineCalleeSaves must always add return register and DP.
Needs ReviewPublic

Authored by jonpa on Apr 29 2018, 3:18 AM.

Details

Summary

When enabling IPRA on SystemZ, a benchmark crashed and I found out that a function had been optimized to not save the Callee Saved Registers, which unfortunately included the return register (%r14) on SystemZ, so when the function tried to return UB resulted.

To fix this, I beleive SystemZFrameLowering::determineCalleeSaves() needs to add R14D to SavedRegs. I also think the DP should be added (R11), in case the calling function uses DP, right?

In SystemZFrameLowering::determineCalleeSaves(), the first call to TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS) will not add any registers to SavedRegs if

// When interprocedural register allocation is enabled caller saved registers
// are preferred over callee saved registers.
if (MF.getTarget().Options.EnableIPRA && isSafeForNoCSROpt(MF.getFunction()))
  return;

So, the question then is what registers we must save in this case, and which we in fact do not have to save, to get better performance with IPRA. So far, I have come to the conclusion that all the specially treated registers that SystemZFrameLowering::determineCalleeSaves is already adding needs to be saved with IPRA.

I made two tests - one each for R11 and R14. For some reason I could not process the debug output without doing a separate run for it.

Diff Detail

Event Timeline

jonpa created this revision.Apr 29 2018, 3:18 AM

When enabling IPRA on SystemZ, a benchmark crashed and I found out that a function had been optimized to not save the Callee Saved Registers, which unfortunately included the return register (%r14) on SystemZ, so when the function tried to return UB resulted.

Why the caller function did not save/restore return register value around when for a callee IPRA decided to have noCSR opt?
Also I think there should be other architecture for which also these tests will break due to same reason. If this is bug applicable for other architectures too then I think it is better to update TargetFrameLowering::determineCalleeSaves() so that it can add such specially treated registers to saved list or other architecture must update their determineCalleeSaves() as implemented in this change.

To fix this, I beleive SystemZFrameLowering::determineCalleeSaves() needs to add R14D to SavedRegs. I also think the DP should be added (R11), in case the calling function uses DP, right?

I see determineCalleeSaves() implementation for few other architectures and none of them checking if IPRA is enable so are they handling such case in different way or they are unaware of this kind of failure?

jonpa added a comment.Apr 29 2018, 7:17 AM

When enabling IPRA on SystemZ, a benchmark crashed and I found out that a function had been optimized to not save the Callee Saved Registers, which unfortunately included the return register (%r14) on SystemZ, so when the function tried to return UB resulted.

Why the caller function did not save/restore return register value around when for a callee IPRA decided to have noCSR opt?

The caller makes a call with 'brasl %r14, fun', which sets %r14 to the return address. When the callee returns, it does 'br %r14'. So it is not around the call to 'fun' the return register needs to be saved/restored, but *inside* of it.

Also I think there should be other architecture for which also these tests will break due to same reason. If this is bug applicable for other architectures too then I think it is better to update TargetFrameLowering::determineCalleeSaves() so that it can add such specially treated registers to saved list or other architecture must update their determineCalleeSaves() as implemented in this change.

To fix this, I beleive SystemZFrameLowering::determineCalleeSaves() needs to add R14D to SavedRegs. I also think the DP should be added (R11), in case the calling function uses DP, right?

I see determineCalleeSaves() implementation for few other architectures and none of them checking if IPRA is enable so are they handling such case in different way or they are unaware of this kind of failure?

That's a good point. I suspect the other targets have to do the same thing, since determineCalleeSaves is there to specify this. I mean it is not just the return register, but also for instance on SystemZ some landing pad registers.

I do not believe we need to do anything special with R11. This is only special if it is used as frame pointer, in which case the code before already adds it as caller-saved:

if (HasFP)
  SavedRegs.set(SystemZ::R11D);

I agree we need to handle R14, however. This is for the scenario where we have a leaf function (so the MFFrame.hasCalls() check will return false), but R14 is still clobbered by some *other* instruction in the current function. In this case we always need to save and restore it, since the return instruction will implicitly rely on the incoming R14 value.

vivekvpandya added inline comments.Apr 30 2018, 3:29 AM
lib/Target/SystemZ/SystemZFrameLowering.cpp
97

From last comment by @uweigand it seems that the actual bug is here shouldn't hasCall() for systemZ detect instructions which modifies R14 except load instruction?

Well, usually this is handled correctly by the isPhysRegModified call in SystemZFrameLowering::determineCalleeSaves. (The only reason why we even need the extra hasCalls check is that the call instruction is currently not at all stages modeled correctly to show that R14 is clobbered.)

The only problem is that in the IPRA special case, the SystemZFrameLowering::determineCalleeSaves handling is skipped. I think we simply should do an unconditional check in SystemZ code, like so:

if (MFFrame.hasCalls() || MRI.isPhysRegModified(SystemZ::R14D))
  SavedRegs.set(SystemZ::R14D);
jonpa added a comment.Apr 30 2018, 7:56 AM

I do not believe we need to do anything special with r11. This is only special if it is used as frame pointer, in which case the code before already adds it as caller-saved:

if (HasFP)
SavedRegs.set(SystemZ::R11D);

I reran ipra-03.ll with saving r11 commented out, and got this debug output:

++++++++++++++++++++ Register Usage Information Propagation ++++++++++++++++++++
MachineFunction : fun0
 -------------------- Register Usage Information Collector Pass --------------------
Function Name : fun0
Clobbered Registers: fun0 function optimized for not having CSR.
$cc $r0d $r1d $r2d $r3d $r11d $r12d $r13d $r14d $r15d $r1h $r2h $r13h $r14h $r15h $r0l $r1l $r2l $r3l $r11l $r12l $r13l $r14l $r15l $r0q $r2q $r10q $r12q $r14q
----------------------------------------
 ++++++++++++++++++++ Register Usage Information Propagation ++++++++++++++++++++
MachineFunction : BZ2_blockSort
Call Instruction Before Register Usage Info Propagation :
CallBRASL @fun0, $r2d, $r3d, $r4d, $r5d, $r6d, <regmask $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d $f8q $f9q $f12q $f13q $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $r6d $r7d $r8d $r9d \
$r10d $r11d $r12d $r13d $r14d $r15d $r6h $r7h $r8h $r9h $r10h $r11h $r12h $r13h $r14h $r15h $r6l $r7l $r8l $r9l $r10l $r11l $r12l $r13l $r14l $r15l $r6q $r8q $r10q $r12q $r14q>, implicit-def\
 dead $r14d, implicit-def dead $cc, implicit-def $r2d

Call Instruction After Register Usage Info Propagation : CallBRASL @fun0, $r2d, $r3d, $r4d, $r5d, $r6d, <regmask $noreg $a0 $a1 $a2 $a3 $a4 $a5 $a6 $a7 $a8 $a9 $a10 $a11 $a12 $a13 $a14 $a15 \
$c0 $c1 $c2 $c3 $c4 $c5 $c6 $c7 $c8 $c9 $c10 $c11 $c12 $c13 $c14 $c15 $v0 $v1 $v2 $v3 $v4 $v5 $v6 $v7 $v8 $v9 $v10 $v11 $v12 $v13 $v14 $v15 $v16 $v17 $v18 $v19 $v20 $v21 $v22 $v23 $v24 $v25 \
$v26 $v27 $v28 $v29 $v30 $v31 $f0d $f1d $f2d $f3d $f4d $f5d $f6d $f7d $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d $f16d $f17d $f18d $f19d $f20d $f21d $f22d $f23d $f24d $f25d $f26d $f27d $f\
28d $f29d $f30d $f31d $f0q $f1q $f4q $f5q $f8q $f9q $f12q $f13q $f0s $f1s $f2s $f3s $f4s $f5s $f6s $f7s $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $f16s $f17s $f18s $f19s $f20s $f21s $f22\
s $f23s $f24s $f25s $f26s $f27s $f28s $f29s $f30s $f31s $r4d $r5d $r6d $r7d $r8d $r9d $r10d $r0h $r3h $r4h $r5h $r6h $r7h $r8h $r9h $r10h $r11h $r12h $r4l $r5l $r6l $r7l $r8l $r9l $r10l $r4q\
 $r6q $r8q>, implicit-def dead $r14d, implicit-def dead $cc, implicit-def $r2d

fun0 uses %r11l, so Collector adds %r11l and %r11d. Propagation then updates the regmask on the call to fun0 to clobber those registers.

I am a bit unsure about what would happen with %r11 in the calling function if hasFP() returns true. In that case %r11 is a reserved register in that function, so regalloc will not try to use it. In the top of Propagate, I see

/// This updated RegMask will be used by the register allocator while allocating
/// the current MachineFunction.

It doesn't say here that it will save a reserved register around a call that clobbers it. Generally, I would fear that a reserved register is under the responsibility of the Target, but maybe I am missing something here? (I assume that's why we have to add it in determineCalleeSaves ourselves if HasFP is true?)

This looks like a generic problem to me, many platforms have registers that are reserved only in some functions but not others. If function A where a register is reserved calls function B where the register is not reserved, something must ensure that the register is preserved across the call to B. Usually, this is the case because such registers need to be callee-saved. But if function B is optimized via IPRA to not save the register, we have a problem ...

Can you check how this is solved on other platforms? I guess X86 likewise should use a conditionally-reserved register as frame pointer ...

I run ipra-02.ll with --enable-ipra option with lldb the contorl reaches to SystemZFrameLowering::determineCalleeSaves() through https://github.com/llvm-mirror/llvm/blob/5b508199fa870516e29b847b8af2f85887e05d56/lib/CodeGen/PrologEpilogInserter.cpp#L525
for both functions HasFP is set to false that is why determineCalleeSaves() does not add R11. Can you please verify this?

vivekvpandya added inline comments.Apr 30 2018, 10:26 AM
test/CodeGen/SystemZ/ipra-03.ll
88

here changing "no-frame-pointer-elim"="true" fixes the problem. It adds spill/restore for R11

But we do not want to force use of a frame pointer, in general this just reduces performance for no reason on Z. Some functions require a frame pointer (those that do dynamic allocas), but most do not.

But we do not want to force use of a frame pointer, in general this just reduces performance for no reason on Z. Some functions require a frame pointer (those that do dynamic allocas), but most do not.

Then I think code should be modified in following manner:

if (HasFP || MRI.isPhysRegModified(SystemZ::R11D) )
  SavedRegs.set(SystemZ::R11D);

So, in effect, never do the IPRA callee-saved optimization for R11, because the caller *could* use it as frame pointer? That's certainly the conservative solution.

In fact, we're then bascially back to Jonas' original patch anyway.