This is an archive of the discontinued LLVM Phabricator instance.

Add "Restored" flag to CalleeSavedInfo
ClosedPublic

Authored by kparzysz on Aug 1 2017, 10:08 AM.

Details

Summary

This version should now be restricted to the fix for the liveness of LR.

This is a follow-up to this thread:
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115909.html

This patch has a few things packed together and is not really intended to be in the final form, but to demonstrate the approach for now.

The original issue was with the tail merging leaving inconsistent liveness information around. That is addressed in this patch by adding IMPLICIT_DEFs where needed. LivePhysRegs is used to determine whether something is a live-in to a block, and here's where the remaining 90% of this patch comes from:
On ARM, if MBB is a return block, LivePhysRegs will add LR to its live-ins, if LR is in CalleeSavedInfo. It does that for each register in CSI because the assumption is that if something is saved and restored, then it is live outside of the function. The problem with LR is that it is not restored (in many cases) and so it doesn't need to be treated as a live-out from the function, and subsequently as a live-in to the return block. In tail merging, having it as a live-in to a block caused IMPLICIT_DEFs defining LR to appear in some predecessors, which then prevented certain macro-fusion-like optimizations (merging cmp with a branch) from happening later on.

The approach to deal with it taken here is to add a field in CalleeSavedInfo indicating that the saved register is actually restored. It is set to "true" by default, and target's implementation of restoreCalleeSavedRegisters should only reset it to "false" when the register is not restored. Currently, this only applies to LR on ARM. This has the consequence that the CSI parameter is no longer "const", which is not something I'm a fan of. LivePhysRegs is the user of that information when it adds block live-outs.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz created this revision.Aug 1 2017, 10:08 AM

Fundamentally, the problem with LR on ARM is that it isn't really a callee-save register. We just pretend it's callee-save to avoid explicitly tracking its lifetime throughout the function. Actually, ARM isn't the only architecture that does this; the weird part about ARM is that it has the instruction "pop {pc}" (which loads the return address directly from the stack, without loading it back into LR).

If we are going to add a marking to CalleeSavedInfo, it would probably be better to add it in assignCalleeSavedSpillSlots, where we construct the CalleeSavedInfo, rather than waiting until restoreCalleeSavedRegisters.

If we are going to add a marking to CalleeSavedInfo, it would probably be better to add it in assignCalleeSavedSpillSlots, where we construct the CalleeSavedInfo, rather than waiting until restoreCalleeSavedRegisters.

The problem is that restoreCalleeSavedRegisters decides whether to use a pop instruction or not, which determines whether LR is restored, or loaded into PC directly. Are you suggesting moving some of that logic into assignCalleeSavedSpillSlots?

Oh, I see what you mean... we don't actually decide whether to "pop {pc}" until ARMFrameLowering::emitPopInst. Actually, there's another potential problem there: if shrink-wrapping ever allows emitting multiple epilogues (D36109 etc.), we might not make the same decision for each epilogue.

Long-term, it might be more straightforward to add implicit uses to return instructions, similar to the way we add implicit defs to calls, so we don't have to make return blocks a special case in LivePhysRegs. Not sure how hard that would be.

Long-term, it might be more straightforward to add implicit uses to return instructions, similar to the way we add implicit defs to calls, so we don't have to make return blocks a special case in LivePhysRegs. Not sure how hard that would be.

There is a review for that as well (D33003), and it ran into the same problem. It adds all registers that are live-out from the function to the return instructions (except those which are defined by the return instruction itself). The problem is that it again uses CSI to determine for which registers such implicit uses need to be created, and again LR causes trouble: since the return instruction (pop) doesn't restore it, the code will assume that it must have been restored prior to the return and add an implicit use of LR to the return. That, in turn, is a problem since LR remains undefined (clobbered).

qcolombet edited edge metadata.Aug 1 2017, 12:30 PM

Long-term, it might be more straightforward to add implicit uses to return instructions, similar to the way we add implicit defs to calls, so we don't have to make return blocks a special case in LivePhysRegs. Not sure how hard that would be.

Like Eli said if we could model LR properly that would be the best. That being said, I don't know what it will take to do that.

My guts say we should make the frame lowering logic more complex than it is already.

Oh, I see what you mean... we don't actually decide whether to "pop {pc}" until ARMFrameLowering::emitPopInst.

Wait, that's not right; we don't need to include lr in the list of live-outs either way. If we generate "pop {pc}", lr isn't used; if we generate "bx lr", the explicit use of lr should allow us to generate correct liveness.

My guts say we should make the frame lowering logic more complex than it is already.

*shouldn't

Wait, that's not right; we don't need to include lr in the list of live-outs either way. If we generate "pop {pc}", lr isn't used; if we generate "bx lr", the explicit use of lr should allow us to generate correct liveness.

The problem is that currently the return instructions don't carry any live-out information that is associated with callee-saved registers. LivePhysRegs needs to know that information, because CS registers need to be treated as live between the restore point and the exit from the function. It uses CalleeSavedInfo to get that data, which leads it to believe that LR is live at the return. If the return instruction uses LR explicitly (bx lr), this will have been already expressed in liveness, but if it's a "pop", it won't be. Still, LivePhysRegs will think that LR is live at the return and we end up with a problem.

The point I was trying to make is that we should always exclude LR from the set of registers added by addLiveOutsNoPristines: whether or not we actually convert the "bx lr" into a "pop", it isn't live out of the function, and we don't need to model it that way. So assignCalleeSavedSpillSlots wouldn't need any special logic to figure that out.

The point I was trying to make is that we should always exclude LR from the set of registers added by addLiveOutsNoPristines: whether or not we actually convert the "bx lr" into a "pop", it isn't live out of the function, and we don't need to model it that way. So assignCalleeSavedSpillSlots wouldn't need any special logic to figure that out.

Ah, I see. However, if we want to support multiple save/restore points with different sets of registers, this patch may not help. If we kept track of what registers were saved/restored in which block, it would make things a lot easier.

I think the problem is that LR is not really a callee-saved register in the sense that it doesn't need to (and isn't) preserved across function calls. It is still included in the save lists returned by getCalleeSavedRegs. Would it make sense to change the ARM backend to remove it from these lists?

The point I was trying to make is that we should always exclude LR from the set of registers added by addLiveOutsNoPristines: whether or not we actually convert the "bx lr" into a "pop", it isn't live out of the function, and we don't need to model it that way. So assignCalleeSavedSpillSlots wouldn't need any special logic to figure that out.

If we remove LR from CSI in assignCalleeSavedSpillSlots, it won't be saved, and the updateLiveness function won't add it to block live-ins. On the other hand, if LR remains in CSI, addLiveOutsNoPristines will add it to the set it returns. At the moment BX_RET is not marked as using LR, so that is not keeping LR live, when I add implicit use or LR to BX_RET, the verifier fails even before register allocation, since LR is not reserved and it expects the live-ins for LR to be correct from the start.

What happens if we keep the setRestored() hack, but move the ARM-specific code to call setRestored() on LR into assignCalleeSavedSpillSlots?

Long-term, the solution is probably to remove LR from CSI, make it an explicit operand of BX_RET/tail calls, and fix up prologue/epilogue lowering to do the right thing... but it's a much larger/trickier patch.

What happens if we keep the setRestored() hack, but move the ARM-specific code to call setRestored() on LR into assignCalleeSavedSpillSlots?

Oh, nevermind, that'll blow up because BX_RET doesn't actually use LR.


Ultimately, yes, we should stop modeling LR as callee-save, but it's probably a pretty big patch to deal with all the implications (all the various return and tail-call instructions would need to be changed, and we probably need some special handling to make sure LR lands in the right place in the stack frame for stack walking code). I guess this patch isn't too evil in the meantime.

MatzeB edited edge metadata.Aug 4 2017, 4:02 PM

Thanks for working on this!

  1. The BranchFolding changes: I've got some issue with the current approach, see below.
  1. Adding the Restored bit to CSI:
  • Very interesting observations!
  • Can we split this change into a separate patch?
  • I think the proposed change is fine as it fixes a problem with the system we have today (under the condition that you add more comments).

My long/in an ideal world:
I find the code in LivePhysRegs::addLiveOuts() and LiveRegUntis::addLiveOuts() that queries CSI on return blocks is accidental complexity that should not be there. The CSI list should be an implementation detail of the prolog epilogue inserter and disappear after that pass is done. We should be able to model things perfectly well by adding implicit uses to the return instructions; It should be equally natural to not add the implicit use of LR to pop {pc} then.

include/llvm/CodeGen/MachineFrameInfo.h
31–45 ↗(On Diff #109140)

The discussion here clearly shows that Restored is an unintuitive concept, so we need comments to explain it in detail!

671 ↗(On Diff #109140)

You could use /// \copydoc getCalleeSavedInfo() here.

lib/CodeGen/BranchFolding.cpp
241–253 ↗(On Diff #109140)
  • Why do you need this, given there is already LivePhysRegs::addLiveOuts()/LivePhysRegs::addLiveOutsNoPristines(). If there is a reason, then this needs a comment explaining it.
  • We are working on deprecating kill-flags. This means introducing new uses of stepForward() is bad as the results will be better/worse depending on kill-flag accuracy.
  • (Unrelated to this patch: I didn't realize you have to manually remove dead defs from the LiveOut set like this, I really think stepForward() should already do this for you and should be fixed accordingly)

Created D36415 for the branch folding issue. It needs the LR issue addressed, or otherwise two ARM testcases will fail.

kparzysz updated this revision to Diff 110051.Aug 7 2017, 12:18 PM
kparzysz retitled this revision from Liveness issues in tail merging, and the ARM::LR saved-but-not-restored to Add "Restored" flag to CalleeSavedInfo.
kparzysz edited the summary of this revision. (Show Details)

Clean up the patch, move the branch folding part into a separate review.

This revision was automatically updated to reflect the committed changes.

Added comments as requested by Matthias.