This is an archive of the discontinued LLVM Phabricator instance.

Dynamic stack alignment for Thumb1
ClosedPublic

Authored by chill on Sep 21 2017, 10:31 AM.

Details

Summary

This patch adds dynamic stack alignment for Thumb1.

The motivating issue is micompilation of the following code, when targeting a
CPU, which implements only Thumb-1, like cortex-m0.

struct foo {
    alignas(16) char buf[12];
    int i;
    int *ip;
    foo() : ip(&i) {}
};

extern void g(foo &);
void f() {
    foo myFoo;
    g(myFoo);
}

When initialising the ip member, the address of i is calculated using bitwise OR:

push    {r7, lr}
.pad    #40
sub     sp, #40
movs    r1, #12
mov     r0, sp
orrs    r1, r0
str     r1, [sp, #16]

which is obviously incorrect when the starting address of that object (resp. the
stack pointer) happens to be aligned at 8 or 4 byte boundary.

When compiling for ARM or Thumb2, the stack is realigned and the problem does
not occur.

Diff Detail

Event Timeline

chill created this revision.Sep 21 2017, 10:31 AM
rengolin added inline comments.Sep 22 2017, 8:01 AM
lib/Target/ARM/Thumb1FrameLowering.cpp
357

hard-coding r4 here is bound to create problems. You need to make sure it's saved earlier (and popped later).

You also add four instructions to the prologue, which in Thumb1 is not great. It' better than bad codegen, of course, but you need to make sure how often the realignment will hit (from being fatal, I'm guessing not often), or if there's another way to do this (I can't think of anything).

Welcoming comments from people with more Thumb1 experience than myself.

asl added inline comments.Sep 22 2017, 8:03 AM
lib/Target/ARM/Thumb1FrameLowering.cpp
357

Do we have register scavenger available here, so we could scavenge a spare register? Or it's too early to to this - before the frame is set up ?

chill added inline comments.Sep 24 2017, 1:23 PM
lib/Target/ARM/Thumb1FrameLowering.cpp
357

Thanks for pointing that out. I've fixed the patch to add R4 to the set of callee saved register.

Unfortunately, I don't know of a better way to do the alignment in Thumb1. The only other alternative can think of (mov + bic) would need one more scratch register and would have more limiting range of possible alignments.

357

I've experimented with using a virtual scratch register and having the register scavenger later find a suitable hard register for it. I works ... kinda. The register R4 is used as a scratch anyway in function epilogue to restore the SP from the frame pointer, so using some other available register does not buy as anything. There's the option of using a virtual scratch register in epilogue too - but (besides triggering some bugs) the llvm::emitThumbRegPlusImmediate does not emit great code if the destination register is not a low register, e.g. we may get:

movs    r1, #-8
rsbs    r1, r1, #0
add     r1, r7
mov     sp, r1

Also, this approach of using R4 as scratch is also employed by ARM/Thumb2 stack align code and I think this should be addressed by a separate patch.

chill marked an inline comment as done.Sep 24 2017, 1:24 PM
chill updated this revision to Diff 116491.Sep 24 2017, 1:27 PM

Thanks for the comments, much appreciated.
The next revision of the patch makes sure R4 is added to the set of callee saved registers if the function requires stack realignment, too.

asl edited edge metadata.Sep 25 2017, 5:17 AM

Does this work correctly with "inreg" arguments? Just checking.

chill added a comment.Sep 27 2017, 9:55 AM
In D38143#879967, @asl wrote:

Does this work correctly with "inreg" arguments? Just checking.

Not sure. I couldn't produce a testcase, where presence or absence of "inreg" would make any difference.

rengolin added inline comments.
lib/Target/ARM/Thumb1FrameLowering.cpp
357

Right, the fix works for me (rather, I can't think of anything better). @asl @compnerd @joerg?

asl accepted this revision.Oct 18 2017, 7:58 AM

LGTM

This revision is now accepted and ready to land.Oct 18 2017, 7:58 AM
chill added a comment.Oct 18 2017, 8:04 AM

Thanks for the reviews and the approval.

This revision was automatically updated to reflect the committed changes.