This is a follow-up to D32202.
While the previous change (D32202) did fix the stack alignment issue, we
were still at a weird state in terms of the CFI/CFA directives (as the
offsets were wrong). This change cleans up the SAVE/RESTORE macros for
the trampoline, accounting the stack pointer adjustments with less
instructions and with some clearer math. We note that the offsets will
be different on the exit trampolines, because we don't typically 'call'
into this trampoline and we only ever jump into them (i.e. treated as a
tail call that's patched in at runtime).
If the exit handler is jumped into, does the exit sled explicitly push the return address onto the stack (accounting for the extra 8 bytes compared to the subq insn above)?