This is a follow-up to D32202.
While the previous change (D32202) did fix the stack alignment issue, we
were still at a weird state in terms of the CFI/CFA directives (as the
offsets were wrong). This change cleans up the SAVE/RESTORE macros for
the trampoline, accounting the stack pointer adjustments with less
instructions and with some clearer math. We note that the offsets will
be different on the exit trampolines, because we don't typically 'call'
into this trampoline and we only ever jump into them (i.e. treated as a
tail call that's patched in at runtime).