This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Alter the register allocation order for optsize on Thumb2
Closed · Public

Authored by dmgreen on Dec 21 2018, 9:00 AM.

Details

Reviewers

efriedma
t.p.northover
john.brawn

Commits

rG6a858a94250e: [ARM] Alter the register allocation order for minsize on Thumb2
rL351938: [ARM] Alter the register allocation order for minsize on Thumb2

Summary

Currently in Arm code, we allocate LR first, under the assumption that
it needs to be saved anyway. Unfortunately, this means that any
instruction using it must use one of the longer (32-bit) Thumb2
encodings rather than the shorter (16-bit) Thumb1 ones.
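As a rough illustration of this trade-off, the sketch below uses a deliberately simplified size model. The model is an assumption for illustration, not the real Thumb2 encoding rules: an instruction counts as 2 bytes only if all of its register operands are r0-r7, and 4 bytes otherwise. Real Thumb2 has exceptions that the model ignores (for example, 16-bit MOV, CMP, and two-operand ADD (register) can name r8-r14). The function name `encodedSizeBytes` is hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

// Simplified, hypothetical size model for illustration only: assume an
// instruction can use a 16-bit encoding only when every register operand is
// a low register (r0-r7), and needs a 32-bit Thumb2 encoding otherwise.
// Real Thumb2 encodings have exceptions that this model ignores.
int encodedSizeBytes(std::initializer_list<int> regOperands) {
  for (int reg : regOperands) {
    assert(reg >= 0 && reg <= 15 && "expected a core register r0-r15");
    if (reg > 7)
      return 4; // any high register (r8-r12, sp, lr, pc) forces 32 bits here
  }
  return 2;
}
```

Under this model, a three-register add into r4, `encodedSizeBytes({4, 3, 0})`, costs 2 bytes. The same add into lr (r14), `encodedSizeBytes({14, 3, 0})`, costs 4 bytes, which is the effect the summary describes.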

When we are optimising for minsize on Thumb2, this patch switches back
to the default order so that more low registers (r0-r7) can be used. This
can require more registers to be pushed, but on average it produces
smaller code.
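The selection logic that the patch adds to each class's AltOrderSelect can be sketched as a plain C++ function. The `TargetFlags` struct and the `selectAllocationOrder` name are hypothetical stand-ins for the `MF.getSubtarget<ARMSubtarget>()` and `MF.getFunction()` queries used in the diff. The return values follow TableGen's AltOrderSelect convention: 0 selects the class's default member order, and N > 0 selects `AltOrders[N - 1]`.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget and function attributes queried
// by the patch: isThumb1Only(), isThumb2() and optForMinSize().
struct TargetFlags {
  bool isThumb1Only;
  bool isThumb2;
  bool optForMinSize;
};

// Returns the allocation-order index using TableGen's AltOrderSelect
// convention: 0 = default member order, 1 = AltOrders[0] (LR moved to the
// front), 2 = AltOrders[1] (truncated to r0-r7).
unsigned selectAllocationOrder(const TargetFlags &flags) {
  // Thumb1-only and Thumb2 subtargets are mutually exclusive.
  assert(!(flags.isThumb1Only && flags.isThumb2));
  if (flags.isThumb1Only)
    return 2; // Thumb1: low registers only
  if (flags.isThumb2 && flags.optForMinSize)
    return 0; // Thumb2 at minsize: default order, so LR is not preferred
  return 1;   // everything else (including ARM mode): LR first
}
```

For example, a Thumb2 function marked minsize selects order 0. The same function without minsize, or an ARM-mode function at minsize, still selects order 1, because the patch only checks `isThumb2()`.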

Diff Detail

Repository: rL LLVM

Event Timeline

dmgreen created this revision. Dec 21 2018, 9:00 AM

Herald added subscribers: kristof.beyls, javed.absar. Dec 21 2018, 9:00 AM

This isn't the first time this has come up; see https://reviews.llvm.org/D30324. I guess changing the allocation order for lr, but not ip (r12), makes this version simpler?

Hello

I had not seen that, thanks for pointing it out. Yes, this is a bit simpler, as it does not try to deal with r12. Just using the default allocation order only gets us so far, but it seems like a simple enough change. I tried a few different orderings, for example using r4/r7 before lr, as they are often spilled in pairs, but this one seemed to give the best results on the codebases I tried.

I would say that this patch saves a little codesize in many places, adding up to a good overall gain. Compared to the other codefolding/libcall-style changes we might try to make (which would apply in fewer places), this seems like an easier win for a simple change.

I guess this is fine as an incremental change. LGTM

lib/Target/ARM/ARMRegisterInfo.td, line 274 (On Diff #179295): 80 cols.

This revision is now accepted and ready to land. Jan 21 2019, 12:38 PM

Closed by commit rL351938: [ARM] Alter the register allocation order for minsize on Thumb2 (authored by dmgreen). Jan 23 2019, 2:19 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
llvm/trunk/lib/Target/ARM/ARMRegisterInfo.td    31 lines
llvm/trunk/test/CodeGen/Thumb2/reg-order.ll     106 lines

Diff 183066

llvm/trunk/lib/Target/ARM/ARMRegisterInfo.td

[198 unchanged lines not shown]
 // r7 == Frame Pointer (thumb-style backtraces)
 // r9 == May be reserved as Thread Register
 // r11 == Frame Pointer (arm-style backtraces)
 // r10 == Stack Limit
 //
 def GPR : RegisterClass<"ARM", [i32], 32, (add (sequence "R%u", 0, 12),
                                                SP, LR, PC)> {
   // Allocate LR as the first CSR since it is always saved anyway.
+  // For Thumb2, using LR would force 32bit Thumb2 instructions, not the smaller
+  // Thumb1 ones. It is a little better for codesize on average to use the
+  // default order.
   // For Thumb1 mode, we don't want to allocate hi regs at all, as we don't
   // know how to spill them. If we make our prologue/epilogue code smarter at
   // some point, we can go back to using the above allocation orders for the
   // Thumb1 instructions that know how to use hi regs.
   let AltOrders = [(add LR, GPR), (trunc GPR, 8)];
   let AltOrderSelect = [{
-      return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+      if (MF.getSubtarget<ARMSubtarget>().isThumb1Only())
+        return 2;
+      if (MF.getSubtarget<ARMSubtarget>().isThumb2() &&
+          MF.getFunction().optForMinSize())
+        return 0;
+      return 1;
   }];
   let DiagnosticString = "operand must be a register in range [r0, r15]";
 }

 // GPRs without the PC. Some ARM instructions do not allow the PC in
 // certain operand slots, particularly as the destination. Primarily
 // useful for disassembly.
 def GPRnopc : RegisterClass<"ARM", [i32], 32, (sub GPR, PC)> {
   let AltOrders = [(add LR, GPRnopc), (trunc GPRnopc, 8)];
   let AltOrderSelect = [{
-      return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+      if (MF.getSubtarget<ARMSubtarget>().isThumb1Only())
+        return 2;
+      if (MF.getSubtarget<ARMSubtarget>().isThumb2() &&
+          MF.getFunction().optForMinSize())
+        return 0;
+      return 1;
   }];
   let DiagnosticString = "operand must be a register in range [r0, r14]";
 }

 // GPRs without the PC but with APSR. Some instructions allow accessing the
 // APSR, while actually encoding PC in the register field. This is useful
 // for assembly and disassembly only.
 def GPRwithAPSR : RegisterClass<"ARM", [i32], 32, (add (sub GPR, PC), APSR_NZCV)> {
   let AltOrders = [(add LR, GPRnopc), (trunc GPRnopc, 8)];
   let AltOrderSelect = [{
-      return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+      if (MF.getSubtarget<ARMSubtarget>().isThumb1Only())
+        return 2;
+      if (MF.getSubtarget<ARMSubtarget>().isThumb2() &&
+          MF.getFunction().optForMinSize())
+        return 0;
+      return 1;
   }];
   let DiagnosticString = "operand must be a register in range [r0, r14] or apsr_nzcv";
 }

 // GPRsp - Only the SP is legal. Used by Thumb1 instructions that want the
 // implied SP argument list.
 // FIXME: It would be better to not use this at all and refactor the
 // instructions to not have SP an an explicit argument. That makes
 // frame index resolution a bit trickier, though.
 def GPRsp : RegisterClass<"ARM", [i32], 32, (add SP)> {
   let DiagnosticString = "operand must be a register sp";
 }

 // restricted GPR register class. Many Thumb2 instructions allow the full
 // register range for operands, but have undefined behaviours when PC
 // or SP (R13 or R15) are used. The ARM ISA refers to these operands
 // via the BadReg() pseudo-code description.
 def rGPR : RegisterClass<"ARM", [i32], 32, (sub GPR, SP, PC)> {
   let AltOrders = [(add LR, rGPR), (trunc rGPR, 8)];
   let AltOrderSelect = [{
-      return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+      if (MF.getSubtarget<ARMSubtarget>().isThumb1Only())
+        return 2;
+      if (MF.getSubtarget<ARMSubtarget>().isThumb2() &&
+          MF.getFunction().optForMinSize())
+        return 0;
+      return 1;
   }];
   let DiagnosticType = "rGPR";
 }

 // Thumb registers are R0-R7 normally. Some instructions can still use
 // the general GPR register class above (MOV, e.g.)
 def tGPR : RegisterClass<"ARM", [i32], 32, (trunc GPR, 8)> {
   let DiagnosticString = "operand must be a register in range [r0, r7]";
[216 unchanged lines not shown]
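To see what these orders look like, the following sketch spells out the raw GPR lists. It assumes TableGen's `add` acts as an order-preserving set union (so `(add LR, GPR)` moves LR to the front) and `trunc` keeps the first N members. The `effectiveOrder` helper is a hypothetical approximation of what, as far as I understand, LLVM's RegisterClassInfo does with a raw order: it drops reserved registers and defers callee-saved registers to the end. The reserved and callee-saved sets are assumptions (AAPCS, with only sp and pc reserved); on a real target r7, r9 or r11 may also be reserved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Order = std::vector<std::string>;

// GPR member list from the diff: (add (sequence "R%u", 0, 12), SP, LR, PC).
Order gprDefaultOrder() {
  Order order;
  for (int i = 0; i <= 12; ++i)
    order.push_back("r" + std::to_string(i));
  order.push_back("sp");
  order.push_back("lr");
  order.push_back("pc");
  return order;
}

// (add LR, GPR): assumed order-preserving union, so LR comes first and
// its later duplicate is skipped.
Order lrFirstOrder(const Order &base) {
  Order order{"lr"};
  for (const std::string &reg : base)
    if (reg != "lr")
      order.push_back(reg);
  return order;
}

// (trunc GPR, 8): the first n members of the class.
Order truncOrder(const Order &base, std::size_t n) {
  assert(n <= base.size());
  return Order(base.begin(), base.begin() + static_cast<std::ptrdiff_t>(n));
}

// Hypothetical approximation of the allocator's view of a raw order:
// reserved registers are removed and callee-saved registers are moved to
// the end, keeping the relative order inside each group.
Order effectiveOrder(const Order &raw) {
  const Order reserved{"sp", "pc"}; // assumption
  const Order calleeSaved{"r4", "r5", "r6",  "r7", "r8",
                          "r9", "r10", "r11", "lr"}; // assumption (AAPCS)
  auto contains = [](const Order &set, const std::string &reg) {
    return std::find(set.begin(), set.end(), reg) != set.end();
  };
  Order callerSaved, deferred;
  for (const std::string &reg : raw) {
    if (contains(reserved, reg))
      continue;
    if (contains(calleeSaved, reg))
      deferred.push_back(reg);
    else
      callerSaved.push_back(reg);
  }
  callerSaved.insert(callerSaved.end(), deferred.begin(), deferred.end());
  return callerSaved;
}
```

With these assumptions, the default order gives r0-r3 and r12, then r4-r11, with lr last. The LR-first order places lr ahead of r4. That is consistent with the reg-order.ll checks below, where the minsize functions use r4 and r12 for values and lr is only saved as the return address.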

llvm/trunk/test/CodeGen/Thumb2/reg-order.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=thumbv7m-none-eabi | FileCheck %s


				define i32 @test(i32 %a, i32 %b, i32 %c, i32 %d) #0 {
				; CHECK-LABEL: test:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .save {r4, lr}
				; CHECK-NEXT: push {r4, lr}
				; CHECK-NEXT: adds r4, r3, r0
				; CHECK-NEXT: add.w r12, r2, r1
				; CHECK-NEXT: add r0, r1
				; CHECK-NEXT: adds r1, r3, r2
				; CHECK-NEXT: mul r4, r4, r12
				; CHECK-NEXT: mla r0, r1, r0, r4
				; CHECK-NEXT: pop {r4, pc}
				entry:
				%add = add nsw i32 %b, %a
				%add1 = add nsw i32 %d, %c
				%mul = mul nsw i32 %add1, %add
				%add2 = add nsw i32 %d, %a
				%add3 = add nsw i32 %c, %b
				%mul4 = mul nsw i32 %add2, %add3
				%add5 = add nsw i32 %mul, %mul4
				ret i32 %add5
				}

				define void @loop(i32 %I, i8* %A, i8* %B) #0 {
				; CHECK-LABEL: loop:
				; CHECK: @ %bb.0: @ %entry
				; CHECK-NEXT: .save {r4, r5, r6, r7, lr}
				; CHECK-NEXT: push {r4, r5, r6, r7, lr}
				; CHECK-NEXT: mov.w r12, #0
				; CHECK-NEXT: b .LBB1_2
				; CHECK-NEXT: .LBB1_1: @ %for.body
				; CHECK-NEXT: @ in Loop: Header=BB1_2 Depth=1
				; CHECK-NEXT: add.w r4, r12, r12, lsl #1
				; CHECK-NEXT: add.w r3, r2, r12, lsl #2
				; CHECK-NEXT: add r4, r1
				; CHECK-NEXT: add.w r12, r12, #1
				; CHECK-NEXT: ldrsb.w r6, [r4, #2]
				; CHECK-NEXT: ldrsb.w r5, [r4]
				; CHECK-NEXT: mov r7, r6
				; CHECK-NEXT: cmp r5, r6
				; CHECK-NEXT: it gt
				; CHECK-NEXT: movgt r7, r5
				; CHECK-NEXT: ldrsb.w r4, [r4, #1]
				; CHECK-NEXT: cmp r7, r4
				; CHECK-NEXT: it le
				; CHECK-NEXT: movle r7, r4
				; CHECK-NEXT: subs r4, r7, r4
				; CHECK-NEXT: subs r6, r7, r6
				; CHECK-NEXT: strb r6, [r3, #3]
				; CHECK-NEXT: strb r4, [r3, #2]
				; CHECK-NEXT: subs r4, r7, r5
				; CHECK-NEXT: strb r4, [r3, #1]
				; CHECK-NEXT: mvns r4, r7
				; CHECK-NEXT: strb r4, [r3]
				; CHECK-NEXT: .LBB1_2: @ %for.cond
				; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: cmp r12, r0
				; CHECK-NEXT: blt .LBB1_1
				; CHECK-NEXT: @ %bb.3: @ %for.cond.cleanup
				; CHECK-NEXT: pop {r4, r5, r6, r7, pc}
				entry:
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%A.addr.0 = phi i8* [ %A, %entry ], [ %incdec.ptr2, %for.body ]
				%B.addr.0 = phi i8* [ %B, %entry ], [ %incdec.ptr47, %for.body ]
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%cmp = icmp slt i32 %i.0, %I
				br i1 %cmp, label %for.body, label %for.cond.cleanup

				for.body: ; preds = %for.cond
				%incdec.ptr = getelementptr inbounds i8, i8* %A.addr.0, i32 1
				%0 = load i8, i8* %A.addr.0, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %A.addr.0, i32 2
				%1 = load i8, i8* %incdec.ptr, align 1
				%incdec.ptr2 = getelementptr inbounds i8, i8* %A.addr.0, i32 3
				%2 = load i8, i8* %incdec.ptr1, align 1
				%3 = icmp sgt i8 %0, %2
				%4 = select i1 %3, i8 %0, i8 %2
				%5 = icmp sgt i8 %4, %1
				%6 = select i1 %5, i8 %4, i8 %1
				%7 = xor i8 %6, -1
				%sub34 = sub i8 %6, %0
				%sub38 = sub i8 %6, %1
				%sub42 = sub i8 %6, %2
				%incdec.ptr44 = getelementptr inbounds i8, i8* %B.addr.0, i32 1
				store i8 %7, i8* %B.addr.0, align 1
				%incdec.ptr45 = getelementptr inbounds i8, i8* %B.addr.0, i32 2
				store i8 %sub34, i8* %incdec.ptr44, align 1
				%incdec.ptr46 = getelementptr inbounds i8, i8* %B.addr.0, i32 3
				store i8 %sub38, i8* %incdec.ptr45, align 1
				%incdec.ptr47 = getelementptr inbounds i8, i8* %B.addr.0, i32 4
				store i8 %sub42, i8* %incdec.ptr46, align 1
				%inc = add nuw nsw i32 %i.0, 1
				br label %for.cond

				for.cond.cleanup: ; preds = %for.cond
				ret void
				}


				attributes #0 = { minsize optsize }