For 64-bit shifts, the inlined version takes about 20 instructions on Thumb1. To avoid this code bloat, expand them to __aeabi_ calls when the target is Thumb1.
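For reference, the AEABI run-time helpers such shifts would expand to are specified in the ARM Run-time ABI (the prototypes below follow that document; the comments are illustrative):

long long __aeabi_llsl(long long x, int n); /* logical shift left: x << n */
long long __aeabi_llsr(long long x, int n); /* logical shift right */
long long __aeabi_lasr(long long x, int n); /* arithmetic shift right */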
Diff Detail
Repository: rL LLVM

Event Timeline
On Thumb1, an i64 shift is lowered to about 20 instructions. For example, for the code

unsigned long long foo(unsigned long long x, unsigned y) { return x << y; }

clang -mcpu=cortex-m0 -Os -S generates:
foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r6, r7, lr}
push {r4, r6, r7, lr}
.setfp r7, sp, #8
add r7, sp, #8
lsls r1, r2
movs r3, #32
subs r3, r3, r2
mov r4, r0
lsrs r4, r3
orrs r4, r1
mov r3, r2
subs r3, #32
mov r1, r0
lsls r1, r3
cmp r3, #0
bge .LBB0_2
@ %bb.1: @ %entry
mov r1, r4
.LBB0_2: @ %entry
lsls r0, r2
movs r2, #0
cmp r3, #0
bge .LBB0_4
@ %bb.3: @ %entry
mov r2, r0
.LBB0_4: @ %entry
mov r0, r2
pop {r4, r6, r7, pc}
.Lfunc_end0:
If i64 shifts are used frequently in the source code, the generated code bloats quickly.
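With the libcall expansion, each such shift site instead becomes a single bl to the helper. For concreteness, here is a C model of what __aeabi_llsl computes (a sketch only; the shipped routine in compiler-rt/libgcc is hand-written assembly, and the name llsl_model is mine). It mirrors the word-wise shuffle that the inlined sequence above performs:

/* Reference model of __aeabi_llsl: 64-bit logical shift left,
   computed 32 bits at a time, as on a 32-bit core. Assumes 0 <= n < 64. */
unsigned long long llsl_model(unsigned long long x, unsigned n) {
    unsigned lo = (unsigned)x;
    unsigned hi = (unsigned)(x >> 32);
    if (n == 0)
        return x;
    if (n >= 32)                        /* low word shifts entirely into high */
        return (unsigned long long)(lo << (n - 32)) << 32;
    /* 0 < n < 32: high word gets bits from both input words */
    hi = (hi << n) | (lo >> (32 - n));
    lo = lo << n;
    return ((unsigned long long)hi << 32) | lo;
}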
test/CodeGen/ARM/shift-i64.ll
27 ↗ (On Diff #130989): Can you add a couple of extra checks to test that only the call is used, please?
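Something along these lines, for example (a hypothetical sketch; the actual function name and check prefixes depend on the test's RUN lines):

; CHECK-LABEL: test_shl:
; CHECK-NOT: lsls
; CHECK: bl __aeabi_llsl
; CHECK-NOT: lsls

The trailing CHECK-NOT guards against any inlined shift instructions remaining after the call within the same function.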