Compiling the following program:
int g(int), h(int, int, int, int, int); int f(int a, int b, int c, int d, int e) { a = g(a); if (a == -1) return -1; return h(a, b, c, d, e); }
with clang -target arm-arm-none-eabi -mcpu=cortex-m23 -O2 produced this assembly:
f: .fnstart .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 mov r4, r3 mov r5, r2 mov r6, r1 bl g adds r1, r0, #1 beq .LBB0_2 mov r1, r6 mov r2, r5 mov r3, r4 bl h add sp, #4 pop {r4, r5, r6, r7, pc} .LBB0_2: movs r0, #0 mvns r0, r0 add sp, #4 pop {r4, r5, r6, r7, pc}
Here, the function h is called with an incorrect stack argument. The reason is that the compiler originally created a tail call to h , but then
converted it to an ordinary call because LR was saved by the function and restoring LR is a bit more involved for Armv6m/Armv8m.base (a.k.a. "16-bit Thumb") and negates the benefits of the tail call. Unfortunately, this conversion is incorrect for functions, which have stack arguments as nothing has been done to pass the stack arguments to the callee.
Not doing that conversion and leaving the task of properly restoring LR to emitPopSpecialFixUp solves the correctness problem.
Unfortunately, for functions, which do save LR and tail-call a function without stack arguments we generate a slightly worse code.
Now, moving to emitPopSpecialFixUp, in the case we couldn't immediately find a "pop-friendly" register, but we have a pop instructions, we
can use as a temporary one of the callee-saved low registers and restore LR before popping other calle-saves.
After the patch, the code, generated for the tail-call looks like:
ldr r4, [sp, #20] mov lr, r4 pop {r4, r5, r6, r7} add sp, #4 b h
Perhaps you meant const BitVector & here?