This patch adds dynamic stack alignment for Thumb1.
The motivating issue is miscompilation of the following code when targeting a
CPU that implements only Thumb-1, such as cortex-m0:
    struct foo {
      alignas(16) char buf[12];
      int i;
      int *ip;
      foo() : ip(&i) {}
    };

    extern void g(foo &);

    void f() {
      foo myFoo;
      g(myFoo);
    }
When initialising the ip member, the address of i is calculated using bitwise OR:
    push    {r7, lr}
    .pad    #40
    sub     sp, #40
    movs    r1, #12
    mov     r0, sp
    orrs    r1, r0
    str     r1, [sp, #16]
which is incorrect when the starting address of that object (that is, the
stack pointer) happens to be only 8- or 4-byte aligned rather than 16-byte
aligned, because the OR then no longer behaves like an addition.
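To make the failure concrete, here is a minimal sketch of the arithmetic, not taken from the patch. The helper names and the example addresses are hypothetical; the offset 12 is the one used in the generated code above. OR can stand in for ADD only when the base and the offset share no set bits, which for an offset of 12 means bits 2 and 3 of the stack pointer must be zero.

```cpp
#include <cassert>
#include <cstdint>

// Address of a member at `offset`, computed with OR as in the miscompiled code.
constexpr std::uint32_t addr_with_or(std::uint32_t base, std::uint32_t offset) {
    return base | offset;
}

// Address of the same member, computed with a real addition.
constexpr std::uint32_t addr_with_add(std::uint32_t base, std::uint32_t offset) {
    return base + offset;
}

// OR equals ADD exactly when no bit is set in both operands (no carries occur).
constexpr bool or_matches_add(std::uint32_t base, std::uint32_t offset) {
    return (base & offset) == 0;
}

// Example addresses are made up; only their low bits matter here.
void check_examples() {
    // 16-byte aligned base: low four bits are zero, both forms agree.
    assert(or_matches_add(0x20001000u, 12));
    assert(addr_with_or(0x20001000u, 12) == addr_with_add(0x20001000u, 12));

    // 8-byte aligned base: bit 3 is set in both operands, OR yields the wrong address.
    assert(!or_matches_add(0x20001008u, 12));
    assert(addr_with_or(0x20001008u, 12) != addr_with_add(0x20001008u, 12));

    // 4-byte aligned base: bit 2 is set in both operands, same problem.
    assert(!or_matches_add(0x20001004u, 12));
    assert(addr_with_or(0x20001004u, 12) != addr_with_add(0x20001004u, 12));
}
```

With an 8-byte aligned stack pointer, the stored pointer would end up 8 bytes short of the real address of i.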
When compiling for ARM or Thumb2, the stack is realigned and the problem does
not occur.
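For reference, realigning the stack amounts to rounding the stack pointer down to the required boundary after the frame is allocated. The sketch below shows two equivalent ways to express that rounding. It is an illustration under assumptions, not the code in this patch: the function names are hypothetical, and the shift-pair form is shown only because it needs no mask constant, which may matter on a target with few immediate-operand instructions.

```cpp
#include <cassert>
#include <cstdint>

// Round `sp` down to a multiple of 2^log2_align by masking off the low bits.
constexpr std::uint32_t align_down_mask(std::uint32_t sp, unsigned log2_align) {
    return sp & ~((std::uint32_t{1} << log2_align) - 1u);
}

// Same result using a right shift followed by a left shift.
constexpr std::uint32_t align_down_shift(std::uint32_t sp, unsigned log2_align) {
    return (sp >> log2_align) << log2_align;
}

// Example stack pointer value is made up; it is 8-byte but not 16-byte aligned.
void check_realignment() {
    const std::uint32_t sp = 0x20000FF8u;
    const std::uint32_t aligned = align_down_mask(sp, 4);

    assert(aligned == 0x20000FF0u);
    assert(align_down_shift(sp, 4) == aligned);

    // Once the base is 16-byte aligned, OR-ing in an offset below 16 is safe again.
    assert((aligned | 12u) == aligned + 12u);
}
```

Rounding down is the right direction here because the stack grows towards lower addresses, so the realigned frame never overlaps the caller's.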
Hard-coding r4 here is bound to create problems. You need to make sure it's saved earlier (and popped later).
You also add four instructions to the prologue, which in Thumb1 is not great. It's better than bad codegen, of course, but you need to check how often the realignment will hit (given that the failure is fatal, I'm guessing not often), or whether there's another way to do this (I can't think of anything).
Welcoming comments from people with more Thumb1 experience than myself.