Compiling the following program:
int g(int), h(int, int, int, int, int);
int f(int a, int b, int c, int d, int e) {
  a = g(a);
  if (a == -1)
    return -1;
  return h(a, b, c, d, e);
}

with clang -target arm-arm-none-eabi -mcpu=cortex-m23 -O2 produced this assembly:
f:
        .fnstart
        .save   {r4, r5, r6, r7, lr}
        push    {r4, r5, r6, r7, lr}
        .setfp  r7, sp, #12
        add     r7, sp, #12
        .pad    #4
        sub     sp, #4
        mov     r4, r3
        mov     r5, r2
        mov     r6, r1
        bl      g
        adds    r1, r0, #1
        beq     .LBB0_2
        mov     r1, r6
        mov     r2, r5
        mov     r3, r4
        bl      h
        add     sp, #4
        pop     {r4, r5, r6, r7, pc}
.LBB0_2:
        movs    r0, #0
        mvns    r0, r0
        add     sp, #4
        pop     {r4, r5, r6, r7, pc}

Here, the function h is called with an incorrect stack argument: at the point of the bl, [sp] holds the 4-byte padding slot rather than e. The reason is that the compiler originally created a tail call to h, but then converted it to an ordinary call because LR was saved by the function, and restoring LR is a bit more involved for Armv6m/Armv8m.base (a.k.a. "16-bit Thumb"), which negates the benefits of the tail call. Unfortunately, this conversion is incorrect when the callee takes stack arguments, as nothing has been done to pass the stack arguments to the callee.
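A minimal sketch of a runtime check for this problem, assuming hypothetical stand-in definitions of g and h (the report only declares them). The noinline attribute is a GCC/Clang extension used here so that the call in f stays a real call; whether a particular build then emits a tail call depends on the target and optimization level.

```c
/* Hypothetical stand-ins for the external g and h from the report.
   noinline (a GCC/Clang extension) keeps the calls out of line. */
__attribute__((noinline)) int g(int a) { return a; }

__attribute__((noinline)) int h(int a, int b, int c, int d, int e) {
  (void)a; (void)b; (void)c; (void)d;
  /* Under the AAPCS the fifth integer argument is passed on the stack,
     so returning it exposes a wrongly passed stack argument. */
  return e;
}

/* Same shape as the function from the report. */
int f(int a, int b, int c, int d, int e) {
  a = g(a);
  if (a == -1)
    return -1;
  return h(a, b, c, d, e);
}
```

With these definitions, f(1, 2, 3, 4, e) should return e for any e, and f(-1, 2, 3, 4, e) should return -1. A build affected by the incorrect conversion would be expected to return whatever happens to be in the padding slot instead of e.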
Not doing that conversion and leaving the task of properly restoring LR to emitPopSpecialFixUp solves the correctness problem.
Unfortunately, for functions that save LR and tail-call a function without stack arguments, we now generate slightly worse code.
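For illustration, a sketch of the kind of function that falls into this second category: it saves LR (because it calls another function first) and ends in a call whose arguments all fit in r0-r3. The names f4, g4 and h4 and their bodies are hypothetical and are not part of the patch or its tests.

```c
/* Hypothetical callees; noinline (a GCC/Clang extension) keeps the
   calls out of line so f4 really has to save and restore LR. */
__attribute__((noinline)) int g4(int a) { return a + 1; }

__attribute__((noinline)) int h4(int a, int b, int c, int d) {
  return a + b + c + d;
}

/* All four arguments of h4 are passed in registers, so converting the
   tail call into an ordinary call was functionally correct here. */
int f4(int a, int b, int c, int d) {
  a = g4(a);
  if (a == -1)
    return -1;
  return h4(a, b, c, d);
}
```

For such functions the behavior is the same before and after the change; only the epilogue sequence differs.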
Now, moving to emitPopSpecialFixUp: in the case where we cannot immediately find a "pop-friendly" register but we do have a pop instruction, we can use one of the callee-saved low registers as a temporary and restore LR before popping the other callee-saves.
After the patch, the code generated for the tail call looks like:
ldr     r4, [sp, #20]
mov     lr, r4
pop     {r4, r5, r6, r7}
add     sp, #4
b       h