For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid this code bloat, expand to __aeabi_ calls when the target is Thumb1.
Diff Detail
Repository: rL LLVM

Event Timeline
On Thumb1, an i64 shift is lowered to about 20 instructions. For example, for code like

unsigned long long foo(unsigned long long x, unsigned y) { return x << y; }

clang -mcpu=cortex-m0 -Os -S generates:
foo:
	.fnstart
@ %bb.0:                                @ %entry
	.save	{r4, r6, r7, lr}
	push	{r4, r6, r7, lr}
	.setfp	r7, sp, #8
	add	r7, sp, #8
	lsls	r1, r2
	movs	r3, #32
	subs	r3, r3, r2
	mov	r4, r0
	lsrs	r4, r3
	orrs	r4, r1
	mov	r3, r2
	subs	r3, #32
	mov	r1, r0
	lsls	r1, r3
	cmp	r3, #0
	bge	.LBB0_2
@ %bb.1:                                @ %entry
	mov	r1, r4
.LBB0_2:                                @ %entry
	lsls	r0, r2
	movs	r2, #0
	cmp	r3, #0
	bge	.LBB0_4
@ %bb.3:                                @ %entry
	mov	r2, r0
.LBB0_4:                                @ %entry
	mov	r0, r2
	pop	{r4, r6, r7, pc}
.Lfunc_end0:
If i64 shifts are used frequently in the source code, the generated code bloats quickly.
| test/CodeGen/ARM/shift-i64.ll | | |
|---|---|---|
| 27 | Can you add a couple of extra checks to test that only the call is used please? | |
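Such checks could take the form of FileCheck CHECK-NOT lines rejecting the inline shift instructions around the call. A hypothetical sketch (the exact libcall name and surrounding checks depend on the final patch; __aeabi_llsl is assumed here for the shift-left case):

```llvm
define i64 @test_shl(i64 %x, i64 %y) {
  %r = shl i64 %x, %y
  ret i64 %r
}
; CHECK-LABEL: test_shl:
; CHECK-NOT: lsls
; CHECK-NOT: lsrs
; CHECK: bl __aeabi_llsl
; CHECK-NOT: lsls
; CHECK-NOT: lsrs
```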