This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Expand long shifts for Thumb1 to __aeabi_ calls
ClosedPublic

Authored by weimingz on Jan 22 2018, 5:11 PM.

Details

Reviewers

Commits

rG665784f17082: [ARM] Expand long shifts for Thumb1 to __aeabi_ calls
rL323354: [ARM] Expand long shifts for Thumb1 to __aeabi_ calls

Summary

For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.
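For readers unfamiliar with the helpers, the ARM run-time ABI declares the three long-shift routines as `long long __aeabi_llsl(long long, int)`, `long long __aeabi_llsr(long long, int)` and `long long __aeabi_lasr(long long, int)`. The sketch below is only a host-side model of their semantics, not the library code: the names `refLlsl`, `refLlsr` and `refLasr` are hypothetical stand-ins, and it assumes a 64-bit `long long` with two's-complement conversion and shift amounts in [0, 63].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins that mirror the RTABI signatures; they model the
// semantics only and are not the routines shipped in any runtime library.
// Assumption: long long is 64 bits wide and converts modulo 2^64.

// Logical shift left (models __aeabi_llsl).
long long refLlsl(long long Val, int Amt) {
  assert(Amt >= 0 && Amt < 64 && "shift amount must be in [0, 63]");
  return static_cast<long long>(static_cast<std::uint64_t>(Val) << Amt);
}

// Logical shift right, zero-filling the top bits (models __aeabi_llsr).
long long refLlsr(long long Val, int Amt) {
  assert(Amt >= 0 && Amt < 64 && "shift amount must be in [0, 63]");
  return static_cast<long long>(static_cast<std::uint64_t>(Val) >> Amt);
}

// Arithmetic shift right, replicating the sign bit (models __aeabi_lasr).
long long refLasr(long long Val, int Amt) {
  assert(Amt >= 0 && Amt < 64 && "shift amount must be in [0, 63]");
  std::uint64_t Bits = static_cast<std::uint64_t>(Val) >> Amt;
  if (Val < 0 && Amt > 0)
    Bits |= ~std::uint64_t(0) << (64 - Amt); // fill vacated bits with ones
  return static_cast<long long>(Bits);
}
```

With this patch, a source-level `x << y` on a 64-bit value compiled for a Thumb1-only target becomes a single `bl` to the matching helper instead of the inline sequence.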

Diff Detail

Repository: rL LLVM

Event Timeline

weimingz created this revision. Jan 22 2018, 5:11 PM

Herald added subscribers: llvm-commits, kristof.beyls, javed.absar, aemerson. Jan 22 2018, 5:11 PM

On Thumb1, an i64 shift is lowered to about 20 instructions. For example, for code like
unsigned long long foo(unsigned long long x, unsigned y) { return x << y;}

clang -mcpu=cortex-m0 -Os -S generates:

foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r6, r7, lr}
push {r4, r6, r7, lr}
.setfp r7, sp, #8
add r7, sp, #8
lsls r1, r2
movs r3, #32
subs r3, r3, r2
mov r4, r0
lsrs r4, r3
orrs r4, r1
mov r3, r2
subs r3, #32
mov r1, r0
lsls r1, r3
cmp r3, #0
bge .LBB0_2
@ %bb.1: @ %entry
mov r1, r4
.LBB0_2: @ %entry
lsls r0, r2
movs r2, #0
cmp r3, #0
bge .LBB0_4
@ %bb.3: @ %entry
mov r2, r0
.LBB0_4: @ %entry
mov r0, r2
pop {r4, r6, r7, pc}
.Lfunc_end0:

If i64 shifts are used frequently in the source code, the generated code bloats quickly.
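To see why the inline form is long, it helps to write out what the Thumb1 listing computes: a 64-bit left shift assembled from two 32-bit halves, with one path for amounts below 32 and another for amounts of 32 or more. The following is a rough sketch under stated assumptions, not the compiler's lowering code; `shl64ViaHalves` is a hypothetical name, and the shift amount is assumed to be in [0, 63].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: 64-bit left shift built from
// 32-bit operations, mirroring the structure of the Thumb1 listing.
std::uint64_t shl64ViaHalves(std::uint32_t Lo, std::uint32_t Hi, unsigned Amt) {
  assert(Amt < 64 && "shift amount must be in [0, 63]");
  std::uint32_t OutLo, OutHi;
  if (Amt == 0) {
    // Handled separately because shifting a 32-bit value by 32 is undefined
    // in C++; ARM register-controlled shifts simply yield 0 in that case.
    OutLo = Lo;
    OutHi = Hi;
  } else if (Amt < 32) {
    // Small shift: the high half takes bits from both input halves.
    OutHi = (Hi << Amt) | (Lo >> (32 - Amt));
    OutLo = Lo << Amt;
  } else {
    // Large shift: the high half is the low half shifted by the excess,
    // and the low half becomes zero.
    OutHi = Lo << (Amt - 32);
    OutLo = 0;
  }
  return (static_cast<std::uint64_t>(OutHi) << 32) | OutLo;
}
```

Every i64 shift by a variable amount needs all of these cases, and without conditional execution Thumb1 must branch around each select, which is where the instruction count comes from.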

samparker added a subscriber: samparker. Jan 23 2018, 12:39 AM

samparker added inline comments.

test/CodeGen/ARM/shift-i64.ll
27	Can you add a couple of extra checks to test that only the call is used please?

As Sam suggests, add more checks in lit tests to make sure only "bl" is used.

Great, LGTM. Thanks!

This revision is now accepted and ready to land. Jan 24 2018, 12:38 AM

Closed by commit rL323354: [ARM] Expand long shifts for Thumb1 to __aeabi_ calls (authored by weimingz). Jan 24 2018, 10:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                  Size
lib/Target/ARM/ARMISelLowering.cpp    7 lines
test/CodeGen/ARM/shift-i64.ll         15 lines

Diff 131097

lib/Target/ARM/ARMISelLowering.cpp

 ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,

   setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
   setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
   setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
   setOperationAction(ISD::SRL, MVT::i64, Custom);
   setOperationAction(ISD::SRA, MVT::i64, Custom);
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i64, Custom);

+  // Expand to __aeabi_l{lsl,lsr,asr} calls for Thumb1.
+  if (Subtarget->isThumb1Only()) {
+    setOperationAction(ISD::SHL_PARTS, MVT::i32, Expand);
+    setOperationAction(ISD::SRA_PARTS, MVT::i32, Expand);
+    setOperationAction(ISD::SRL_PARTS, MVT::i32, Expand);
+  }
+
   setOperationAction(ISD::ADDC, MVT::i32, Custom);
   setOperationAction(ISD::ADDE, MVT::i32, Custom);
   setOperationAction(ISD::SUBC, MVT::i32, Custom);
   setOperationAction(ISD::SUBE, MVT::i32, Custom);

   if (!Subtarget->isThumb1Only() && Subtarget->hasV6T2Ops())
     setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);


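The ARMISelLowering.cpp change works because a later setOperationAction call for the same operation and type replaces the earlier setting, so the Thumb1-only block overrides the Custom actions registered just before it. The toy model below illustrates only that ordering. It is not LLVM's TargetLowering class: `ToyLowering`, `configure`, and both enums are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>

// Toy model only; every name here is hypothetical and none is LLVM API.
enum class Action { Legal, Custom, Expand };
enum class Op { SHL_PARTS, SRA_PARTS, SRL_PARTS };

struct ToyLowering {
  std::map<Op, Action> Actions;

  // The most recent call for a given operation wins.
  void setOperationAction(Op O, Action A) { Actions[O] = A; }

  // Operations that were never configured are treated as Legal.
  Action getOperationAction(Op O) const {
    auto It = Actions.find(O);
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

// Mirrors the order of calls in the patch: Custom first, then an
// override to Expand when the subtarget is Thumb1-only.
ToyLowering configure(bool IsThumb1Only) {
  ToyLowering TL;
  for (Op O : {Op::SHL_PARTS, Op::SRA_PARTS, Op::SRL_PARTS})
    TL.setOperationAction(O, Action::Custom);
  if (IsThumb1Only)
    for (Op O : {Op::SHL_PARTS, Op::SRA_PARTS, Op::SRL_PARTS})
      TL.setOperationAction(O, Action::Expand);
  return TL;
}
```

In this model, `configure(true)` reports Expand for all three operations and `configure(false)` keeps Custom, which is the split the two RUN lines in the test exercise.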
test/CodeGen/ARM/shift-i64.ll

 ; RUN: llc -mtriple=arm-eabi %s -o - | FileCheck %s
+; RUN: llc -mtriple=armv6m-eabi %s -o - | FileCheck %s --check-prefix=EXPAND

 define i64 @test_shl(i64 %val, i64 %amt) {
 ; CHECK-LABEL: test_shl:
+; EXPAND-LABEL: test_shl:
 ; First calculate the hi part when the shift amount is small enough that it
 ; contains components from both halves. It'll be returned in r1 so that's a
 ; reasonable place for it to end up.
 ; CHECK: rsb [[REVERSE_SHIFT:.*]], r2, #32
 ; CHECK: lsr [[TMP:.*]], r0, [[REVERSE_SHIFT]]
 ; CHECK: orr r1, [[TMP]], r1, lsl r2

 ; Check whether the shift was in fact small (< 32 bits).
 ; CHECK: sub [[EXTRA_SHIFT:.*]], r2, #32
 ; CHECK: cmp [[EXTRA_SHIFT]], #0

 ; If not, the high part of the answer is just the low part shifted by the
 ; excess.
 ; CHECK: lslge r1, r0, [[EXTRA_SHIFT]]

 ; The low part is either a direct shift (1st inst) or 0. We can reuse the same
 ; NZCV.
 ; CHECK: lsl r0, r0, r2
 ; CHECK: movge r0, #0

+; EXPAND: push {[[REG:r[0-9]+]], lr}
+; EXPAND-NEXT: bl __aeabi_llsl
+; EXPAND-NEXT: pop {[[REG]], pc}
   %res = shl i64 %val, %amt
   ret i64 %res
 }

 ; Explanation for lshr is pretty much the reverse of shl.
 define i64 @test_lshr(i64 %val, i64 %amt) {
 ; CHECK-LABEL: test_lshr:
+; EXPAND-LABEL: test_lshr:
 ; CHECK: rsb [[REVERSE_SHIFT:.*]], r2, #32
 ; CHECK: lsr r0, r0, r2
 ; CHECK: orr r0, r0, r1, lsl [[REVERSE_SHIFT]]
 ; CHECK: sub [[EXTRA_SHIFT:.*]], r2, #32
 ; CHECK: cmp [[EXTRA_SHIFT]], #0
 ; CHECK: lsrge r0, r1, [[EXTRA_SHIFT]]
 ; CHECK: lsr r1, r1, r2
 ; CHECK: movge r1, #0

+; EXPAND: push {[[REG:r[0-9]+]], lr}
+; EXPAND-NEXT: bl __aeabi_llsr
+; EXPAND-NEXT: pop {[[REG]], pc}
   %res = lshr i64 %val, %amt
   ret i64 %res
 }

 ; One minor difference for ashr: the high bits must be "hi >> 31" if the shift
 ; amount is large to get the right sign bit.
 define i64 @test_ashr(i64 %val, i64 %amt) {
 ; CHECK-LABEL: test_ashr:
+; EXPAND-LABEL: test_ashr:
 ; CHECK: sub [[EXTRA_SHIFT:.*]], r2, #32
 ; CHECK: asr [[HI_TMP:.*]], r1, r2
 ; CHECK: lsr r0, r0, r2
 ; CHECK: rsb [[REVERSE_SHIFT:.*]], r2, #32
 ; CHECK: cmp [[EXTRA_SHIFT]], #0
 ; CHECK: orr r0, r0, r1, lsl [[REVERSE_SHIFT]]
 ; CHECK: asrge [[HI_TMP]], r1, #31
 ; CHECK: asrge r0, r1, [[EXTRA_SHIFT]]
 ; CHECK: mov r1, [[HI_TMP]]

+; EXPAND: push {[[REG:r[0-9]+]], lr}
+; EXPAND-NEXT: bl __aeabi_lasr
+; EXPAND-NEXT: pop {[[REG]], pc}
   %res = ashr i64 %val, %amt
   ret i64 %res
 }