
[ARM] Expand long shifts for Thumb1 to __aeabi_ calls

Authored by weimingz on Jan 22 2018, 5:11 PM.



For long (i64) shifts, the inlined version takes about 20 instructions on Thumb1. To avoid this code bloat, expand them to __aeabi_ calls when the target is Thumb1.

Event Timeline

weimingz created this revision. Jan 22 2018, 5:11 PM

On Thumb1, an i64 shift is lowered to about 20 instructions. For example, for code like
unsigned long long foo(unsigned long long x, unsigned y) { return x << y;}

clang -mcpu=cortex-m0 -Os -S generates:

@ %bb.0: @ %entry
.save {r4, r6, r7, lr}
push {r4, r6, r7, lr}
.setfp r7, sp, #8
add r7, sp, #8
lsls r1, r2
movs r3, #32
subs r3, r3, r2
mov r4, r0
lsrs r4, r3
orrs r4, r1
mov r3, r2
subs r3, #32
mov r1, r0
lsls r1, r3
cmp r3, #0
bge .LBB0_2
@ %bb.1: @ %entry
mov r1, r4
.LBB0_2: @ %entry
lsls r0, r2
movs r2, #0
cmp r3, #0
bge .LBB0_4
@ %bb.3: @ %entry
mov r2, r0
.LBB0_4: @ %entry
mov r0, r2
pop {r4, r6, r7, pc}

If i64 shifts are used frequently in the source code, the generated code bloats quickly, because this sequence is repeated at every shift.
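For readers following the listing above, here is a minimal C sketch of the computation that the inlined sequence performs on the two 32-bit halves, and that a single call to a run-time helper such as the ARM run-time ABI's __aeabi_llsl would perform instead. The function name shl64_model is hypothetical, and this is an illustrative model rather than the compiler-rt or libgcc implementation. It assumes a shift amount in the range 0..63.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit logical shift left built from 32-bit
 * operations. Assumes 0 <= n <= 63; larger amounts are not handled. */
static uint64_t shl64_model(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x;          /* low word  (r0 in the listing) */
    uint32_t hi = (uint32_t)(x >> 32);  /* high word (r1 in the listing) */

    if (n == 0)
        return x;                       /* avoids a 32-bit shift by 32 below */

    if (n >= 32) {
        /* Every bit of the low word moves into the high word. */
        hi = lo << (n - 32);
        lo = 0;
    } else {
        /* High word keeps its shifted bits plus the bits carried out of lo. */
        hi = (hi << n) | (lo >> (32 - n));
        lo <<= n;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

The two branches correspond to the two conditional selects in the assembly: one picks the high word depending on whether the shift amount is at least 32, the other zeroes the low word in that case. With the library call, this logic exists once in the runtime instead of at every shift site.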

samparker added inline comments.

Can you add a couple of extra checks to test that only the call is used please?

weimingz updated this revision to Diff 131097. Jan 23 2018, 9:39 AM
weimingz marked an inline comment as done.

As Sam suggested, added more checks to the lit tests to make sure only "bl" is used.

samparker accepted this revision. Jan 24 2018, 12:38 AM

Great, LGTM. Thanks!

This revision is now accepted and ready to land. Jan 24 2018, 12:38 AM
This revision was automatically updated to reflect the committed changes.