This is an archive of the discontinued LLVM Phabricator instance.

[Thumb] Save/restore high registers in Thumb1 pro/epilogues
ClosedPublic

Authored by olista01 on Sep 5 2016, 3:41 AM.

Details

Summary

The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.

This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.

In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.

We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.

There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.

In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.

Diff Detail

Repository
rL LLVM

Event Timeline

olista01 updated this revision to Diff 70314.Sep 5 2016, 3:41 AM
olista01 retitled this revision from to [Thumb] Save/restore high registers in Thumb1 pro/epilogues.
olista01 updated this object.
olista01 added reviewers: rengolin, t.p.northover.
olista01 set the repository for this revision to rL LLVM.
olista01 added a subscriber: llvm-commits.
rovka added a subscriber: rovka.Sep 14 2016, 8:33 AM

This looks good in general (especially the tests). I think it would be nice to factor out the duplicated code into some helpers though.

lib/Target/ARM/Thumb1FrameLowering.cpp
314 ↗(On Diff #70314)

Can't you update this check rather than delete it?

test/CodeGen/Thumb/callee_save.ll
49 ↗(On Diff #70314)

Isn't r4 pushed because it's clobbered by the inline asm? And r7 to keep the stack aligned?

131 ↗(On Diff #70314)

You mean an extra 4 low registers? (same below)

olista01 marked 2 inline comments as done.Sep 15 2016, 2:23 AM

I think it would be nice to factor out the duplicated code into some helpers though.

I don't think there is enough common code to actually factor any of it out - there are plenty of small differences between saving and restoring, such as different treatment of lr, and checking argument or return registers.

lib/Target/ARM/Thumb1FrameLowering.cpp
314 ↗(On Diff #70314)

We can now use r0-r3 as part of restoring the high registers (when they are not used as arguments), so any valid register for the tPOP instruction could appear here.

olista01 updated this revision to Diff 71488.Sep 15 2016, 2:45 AM
olista01 removed rL LLVM as the repository for this revision.
  • Correct comments in test
  • Update a few tests added by r279506, now that we correctly spill an extra register to maintain stack alignment
t.p.northover added inline comments.Sep 26 2016, 3:11 AM
lib/Target/ARM/Thumb1FrameLowering.cpp
641–643 ↗(On Diff #71488)

Stale comment? Certainly a scary one.

688–690 ↗(On Diff #71488)

This 3-line fragment seems to be an extremely common idiom in this stretch of code. Might be worth factoring into a findNextOrderedReg, or even using sorted vectors of registers to sidestep the issue entirely.

olista01 marked an inline comment as done.Sep 26 2016, 5:34 AM
olista01 added inline comments.
lib/Target/ARM/Thumb1FrameLowering.cpp
641–643 ↗(On Diff #71488)

The comment is accurate, these unreachables are hit by existing regression tests. The code before this patch doesn't check this at all, and emits nonsense instructions like this:

push    {d8, d9, d10, d11, d12, d13, d14, d15, r8, r10, r11, r4, r5, r6, r7, lr}

for test/CodeGen/ARM/eh-dispcont.ll and test/CodeGen/ARM/global-merge.ll when targeting a thumb1-only iOS target.

This patch (with the unreachables commented out) sort-of fixes this by not adding the D registers to the GPR push/pop instructions.

I know next-to nothing about SjLj handling, Is it the case that this intrinsic shouldn't clobber the FP regs, or does it actually change them (meaning it can't be supported on Thumb1-only targets without switching to ARM mode if available)?

olista01 updated this revision to Diff 72469.Sep 26 2016, 5:35 AM
olista01 set the repository for this revision to rL LLVM.

Factor out ordered register set iteration logic into findNextOrderedReg.

t.p.northover added inline comments.Sep 28 2016, 2:47 AM
lib/Target/ARM/Thumb1FrameLowering.cpp
654–656 ↗(On Diff #72469)

Ouch, that's pretty evil. As far as I can tell the intrinsic is recording the fact that the exception could have come from virtually anywhere, including an ARM function that's clobbered everything.

But clearly that hasn't worked in the past either so I'd prefer making that an explicit ABI contract: we can change the regmask set in ARMISelLowering.cpp to reflect that d8-d15 must be preserved by whatever means rather than letting the frame lowering continue to do something that's clearly wrong.

Realistically, no-one is going to come along and ask us to support SjLj exceptions for Thumb1-only code that uses VFP regs any more anyway (Xcode hasn't been able to target it since 2012 as far as I can tell); a FrameLowering sanity-check is more important.

olista01 added inline comments.Oct 3 2016, 2:31 AM
lib/Target/ARM/Thumb1FrameLowering.cpp
654–656 ↗(On Diff #72469)

I've put this up for review as a separate patch at D25180.

olista01 updated this revision to Diff 73253.Oct 3 2016, 2:45 AM

Uncomment the two llvm_unreachable calls checking for un-savable register classes (requires D25180).

t.p.northover accepted this revision.Oct 10 2016, 11:42 AM
t.p.northover edited edge metadata.

Sorry about the delay. Struck with the dreaded lurgy last week. I think this looks fine (obviously after the exception handling thing is fixed).

This revision is now accepted and ready to land.Oct 10 2016, 11:42 AM
This revision was automatically updated to reflect the committed changes.