
Details

Reviewers

qcolombet
rengolin
SjoerdMeijer
weimingz
dmgreen
jmolloy

Summary

For Thumb2, we prefer low registers (costPerUse = 0) because they allow narrow (16-bit) encodings. However, the current allocation order is:

R0-R3, R12, LR, R4-R11

As a result, many instructions that use R12 or LR end up as wide (32-bit) instructions.

This patch changes the allocation order to:

R0-R7, R12, LR, R8-R11

for Thumb2 when optimizing for minimum size (-Oz).

In most cases, no extra push/pop instructions are needed, because saving the additional callee-saved registers is folded into the existing push/pop. There may be a slight performance impact due to increased stack usage, so we only enable this when optimizing for minimum size.

For an embedded application with 83K of code, this patch saves 430 bytes (0.5%).
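To illustrate why the extra saves usually come for free, here is a small C++ sketch (hypothetical helpers, not LLVM code) of the Thumb PUSH/POP size rule: the 16-bit encodings can list only r0-r7 plus LR (PUSH) or PC (POP), so adding r4-r7 to an existing `push {..., lr}` / `pop {..., pc}` pair does not change its size, while any of r8-r11 forces the 32-bit .W forms. The register numbering is an assumption, and other encoding restrictions (such as registers PUSH.W may not list) are ignored.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register numbering: r0-r12 = 0-12, sp = 13, lr = 14, pc = 15.
constexpr int LR = 14;
constexpr int PC = 15;

// Size in bytes of a Thumb PUSH of RegList. The 16-bit encoding can name only
// r0-r7 plus lr; any other register forces the 32-bit PUSH.W form.
int pushSizeBytes(const std::vector<int> &RegList) {
  for (int R : RegList)
    if (R > 7 && R != LR)
      return 4;
  return 2;
}

// Size in bytes of a Thumb POP of RegList. The 16-bit encoding can name only
// r0-r7 plus pc; any other register (including lr) forces POP.W.
int popSizeBytes(const std::vector<int> &RegList) {
  for (int R : RegList)
    if (R > 7 && R != PC)
      return 4;
  return 2;
}
```

For example, growing `push {r4, lr}` to `push {r4-r7, lr}` keeps it at 2 bytes, whereas `push {r4, r8, lr}` needs 4.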

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

weimingz created this revision. Feb 23 2017, 11:39 PM

Herald added a subscriber: aemerson. Feb 23 2017, 11:39 PM

eastig added a subscriber: eastig. Feb 24 2017, 2:14 AM

I'm not sure about this change. I think it makes sense, but I don't think it's done in a way that integrates well with the existing code.

I'm adding Quentin (the code owner for the register allocator) to have a more informed opinion.

I'd also expect some numbers (speed and size) on a few benchmarks, ideally on a few Thumb2 architectures. A 0.5% code size improvement on "an embedded application" isn't enough, I think.

cheers,
--renato

lib/CodeGen/RegisterClassInfo.cpp, line 129 (on Diff #89612): I'm not very knowledgeable in this part of the code, but it seems you're destroying everything the code above was trying to do with the RCI order. It looks to me as though you should try to add the logic into the loop above, rather than splitting and discarding.
lib/Target/ARM/ARMBaseRegisterInfo.cpp, line 58 (on Diff #89612): This option seems unnecessary. It's good to have it in order to track performance, but this change looks beneficial to all Thumb targets. So, unless it causes performance or code size issues for some but not all Thumb targets, I think the option should be removed and the cost calculated solely based on `isThumb2`.

Please repost with llvm-commits on the CC list, so the patch gets sent to the mailing list.

Sure, I will get more numbers for other benchmarks.

lib/CodeGen/RegisterClassInfo.cpp, line 129 (on Diff #89612): The code uses stable_sort, so it keeps the original order when two registers have the same priority. A register with lower CostPerUse has higher priority; with equal cost, a caller-saved register has higher priority; if everything is equal, the original order is kept.

qcolombet added inline comments. Mar 7 2017, 12:47 PM

lib/CodeGen/RegisterClassInfo.cpp, line 129 (on Diff #89612): Renato is right, you're doing more than changing the order of the CSRs. Although this is probably not used in-tree, nothing prevents a target from setting a different cost for each register, so a stable sort on the cost per use may give a very different order from the raw order. For instance, the raw order could well be, in terms of cost per use, {10, 8, 3, 12, etc.}. Instead, what I would suggest is:
1. Define an alternative order for MF.
2. Add a callback just to disable the special case for CSRs, i.e., the callback you have works; just use it inside the loop to avoid populating CSRAlias.
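A minimal C++ sketch of this concern (the Reg struct and both functions are hypothetical simplifications, not LLVM code): a stable sort keyed on cost per use, with caller-saved registers winning ties as described in the earlier reply, reorders a raw order whose costs are {10, 8, 3, 12}, whereas the existing loop keeps the raw order and only moves callee-saved registers to the end.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified register description (not an LLVM type).
struct Reg {
  int Id;              // position in the raw allocation order
  unsigned CostPerUse; // target-defined cost of using this register
  bool IsCSR;          // true if the register is callee-saved
};

// Approach from the first revision: stable-sort by cost per use, then put
// caller-saved registers before callee-saved ones; full ties keep raw order.
std::vector<int> stableSortedOrder(std::vector<Reg> Raw) {
  std::stable_sort(Raw.begin(), Raw.end(), [](const Reg &A, const Reg &B) {
    if (A.CostPerUse != B.CostPerUse)
      return A.CostPerUse < B.CostPerUse;
    return !A.IsCSR && B.IsCSR;
  });
  std::vector<int> Ids;
  for (const Reg &R : Raw)
    Ids.push_back(R.Id);
  return Ids;
}

// Existing behaviour: keep the raw order, only move CSRs to the end.
std::vector<int> csrLastOrder(const std::vector<Reg> &Raw) {
  std::vector<int> Ids, CSRs;
  for (const Reg &R : Raw)
    (R.IsCSR ? CSRs : Ids).push_back(R.Id);
  Ids.insert(Ids.end(), CSRs.begin(), CSRs.end());
  return Ids;
}
```

With those costs and no CSRs, the sorted order becomes {2, 1, 0, 3}, while the CSR-last order keeps {0, 1, 2, 3}.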

Below are the text sizes (in bytes) of the benchmarks.
Build flags: "-Oz -mthumb -mcpu=cortex-m3 -fomit-frame-pointer"
The baseline adds the flag: " -mllvm -arm-favor-r4-r7=false"

benchmark	baseline (bytes)	favor low regs (bytes)	reduction (bytes)	reduction (%)
spec2000/ammp	73304	72984	320	0.436538
spec2000/art	8459	8459	0	0
spec2000/bzip2	18771	18739	32	0.170476
spec2000/crafty	133506	133242	264	0.197744
spec2000/eon	217561	217251	310	0.142489
spec2000/equake	11409	11373	36	0.31554
spec2000/gap	267736	267312	424	0.158365
spec2000/gcc	800902	800050	852	0.10638
spec2000/gzip	21563	21515	48	0.222604
spec2000/mcf	5366	5308	58	1.08088
spec2000/mesa	305721	304279	1442	0.471672
spec2000/parser	58571	58459	112	0.191221
spec2000/perlbmk	311020	310724	296	0.0951707
spec2000/twolf	119785	119905	-120	-0.100179
spec2000/vortex	322133	321581	552	0.171358
spec2000/vpr	90682	90570	112	0.123509
coremark	6281	6188	93	1.48066
spec2006/astar	19282	19284	-2	-0.0103724
spec2006/bzip2	33991	33897	94	0.276544
spec2006/dealII	1864927	1861707	3220	0.172661
spec2006/gcc	1835601	1833531	2070	0.11277
spec2006/gobmk	1151212	1149542	1670	0.145065
spec2006/h264ref	335652	335218	434	0.129301
spec2006/hmmer	153025	152693	332	0.216958
spec2006/lbm	7282	7270	12	0.16479
spec2006/libquantum	19926	19888	38	0.190706
spec2006/mcf	5484	5446	38	0.692925
spec2006/milc	64512	64236	276	0.427827
spec2006/namd	154110	153830	280	0.181688
spec2006/omnetpp	429750	429576	174	0.0404887
spec2006/perlbench	603275	602947	328	0.0543699
spec2006/povray	555524	554886	638	0.114847
spec2006/sjeng	86646	86302	344	0.397018
spec2006/soplex	210691	210169	522	0.247756
spec2006/sphinx3	100839	100641	198	0.196353
spec2006/xalancbmk	2819035	2817887	1148	0.0407232
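The two reduction columns follow directly from the size columns. A trivial C++ helper (hypothetical, not part of the patch) reproduces them; a negative result means the patched build is larger.

```cpp
#include <cassert>

// Bytes saved by the patch (baseline minus patched text size).
long reductionBytes(long Baseline, long FavorLowReg) {
  return Baseline - FavorLowReg;
}

// Bytes saved as a percentage of the baseline text size.
double reductionPercent(long Baseline, long FavorLowReg) {
  return 100.0 * static_cast<double>(Baseline - FavorLowReg) /
         static_cast<double>(Baseline);
}
```

For coremark (6281 → 6188 bytes) this gives 93 bytes, about 1.48066%; for spec2000/twolf it gives -120 bytes, about -0.100179%.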

ping?

Hi Weiming,

Sorry for the delay. I have two major points with this approach:

1. We're basically selecting different patterns based on different flags by crossing too many wires (TableGen register definitions, subtarget features, command-line options, optimisation flags). Changing behaviour dynamically through command-line options makes it harder to test and to reproduce user bugs.
2. The benchmark numbers for code size reduction still aren't really *that* encouraging.

Ultimately, having a simple ARM/Thumb1/Thumb2 set would be as far as I'd go.

@qcolombet is there some prior art for that kind of fiddling? Looking at the other register files, the most that happens is things like "is64bit", in the same way we have "isThumb1".

cheers,
--renato

ostannard commandeered this revision. Jul 2 2019, 5:12 AM

ostannard added a reviewer: weimingz.

Herald added subscribers: jsji, kristof.beyls, javed.absar. Jul 2 2019, 5:12 AM

Rebase onto trunk.
Move all controlling conditions into ARMSubtarget
Remove command-line option (the only reason to disable this is benchmarking it)
Simplify test with inline asm, should be more robust

This is giving me ~0.1% code size reduction on SPEC2006 (comparing -Oz without this patch to -Oz with it), and ~0.25% reduction on 8 mbed-os examples.

Herald added a project: Restricted Project. Jul 2 2019, 5:34 AM

Herald added a subscriber: hiraditya.

jmolloy added a subscriber: jmolloy. Jul 2 2019, 6:37 AM

jmolloy added inline comments.

llvm/include/llvm/CodeGen/TargetSubtargetInfo.h, line 295: All virtual functions should have a docstring.
llvm/lib/CodeGen/RegisterClassInfo.cpp, lines 118–119: Although you're not using it here, it seems to me very natural for such a callback to take the PhysReg too.

ostannard marked an inline comment as done. Jul 3 2019, 2:12 AM

ostannard added inline comments.

llvm/lib/CodeGen/RegisterClassInfo.cpp, lines 118–119: I'd rather not add extra code which we don't currently have a use for. That can easily be done later if it does turn out to be useful.

Added doc string
Remembered to git-add the test this time

jmolloy added inline comments. Jul 3 2019, 2:21 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp, lines 118–119: I agree with the principle and would usually advocate for it. But in this case the SubtargetInfo API is a public API that is used by out-of-tree targets. Having a sensible API that isn't overfit to the current in-tree targets when it's ~zero cost is something we should aim for, IMO.

Add a PhysReg parameter to ignoreCSRForAllocationOrder
Check that the register is a GPR in the ARM implementation. The other register classes have the callee-saved regs last, so this doesn't make any difference to the generated code, but might avoid surprising behaviour in the future.

LGTM, thanks for the change Oliver!

This revision is now accepted and ready to land. Jul 3 2019, 2:42 AM

Committed https://reviews.llvm.org/rL365014

Diff 207731

llvm/include/llvm/CodeGen/TargetSubtargetInfo.h

@@ … @@ public:
   /// Enable tracking of subregister liveness in register allocator.
   /// Please use MachineRegisterInfo::subRegLivenessEnabled() instead where
   /// possible.
   virtual bool enableSubRegLiveness() const { return false; }

   /// This is called after a .mir file was loaded.
   virtual void mirFileLoaded(MachineFunction &MF) const;

+  /// True if the register allocator should use the allocation orders exactly as
+  /// written in the tablegen descriptions, false if it should allocate
+  /// the specified physical register later if is it callee-saved.
+  virtual bool ignoreCSRForAllocationOrder(const MachineFunction &MF,
+                                           unsigned PhysReg) const {
+    return false;
+  }
 };

 } // end namespace llvm

 #endif // LLVM_CODEGEN_TARGETSUBTARGETINFO_H

llvm/lib/CodeGen/RegisterClassInfo.cpp

@@ … @@
 }

 /// compute - Compute the preferred allocation order for RC with reserved
 /// registers filtered out. Volatile registers come first followed by CSR
 /// aliases ordered according to the CSR order specified by the target.
 void RegisterClassInfo::compute(const TargetRegisterClass *RC) const {
   assert(RC && "no register class given");
   RCInfo &RCI = RegClass[RC->getID()];
+  auto &STI = MF->getSubtarget();

   // Raw register count, including all reserved regs.
   unsigned NumRegs = RC->getNumRegs();

   if (!RCI.Order)
     RCI.Order.reset(new MCPhysReg[NumRegs]);

   unsigned N = 0;
   SmallVector<MCPhysReg, 16> CSRAlias;
   unsigned MinCost = 0xff;
   unsigned LastCost = ~0u;
   unsigned LastCostChange = 0;

   // FIXME: Once targets reserve registers instead of removing them from the
   // allocation order, we can simply use begin/end here.
   ArrayRef<MCPhysReg> RawOrder = RC->getRawAllocationOrder(*MF);
   for (unsigned i = 0; i != RawOrder.size(); ++i) {
     unsigned PhysReg = RawOrder[i];
     // Remove reserved registers from the allocation order.
     if (Reserved.test(PhysReg))
       continue;
     unsigned Cost = TRI->getCostPerUse(PhysReg);
     MinCost = std::min(MinCost, Cost);

-    if (CalleeSavedAliases[PhysReg])
+    if (CalleeSavedAliases[PhysReg] &&
+        !STI.ignoreCSRForAllocationOrder(*MF, PhysReg))
       // PhysReg aliases a CSR, save it for later.
       CSRAlias.push_back(PhysReg);
     else {
       if (Cost != LastCost)
         LastCostChange = N;
       RCI.Order[N++] = PhysReg;
       LastCost = Cost;
     }
@@ … @@
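The effect of the new hook on this loop can be modelled in a few lines of C++. This is a simplified sketch, not the LLVM code: registers are plain strings, cost-per-use bookkeeping is dropped, `computeOrder` is a hypothetical name, and the reserved set {sp, pc} and callee-saved set {r4-r11, lr} are assumptions chosen to match the AllocationOrder(GPR) lines checked by the new test.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

using RegList = std::vector<std::string>;

// Simplified model of the loop in RegisterClassInfo::compute(): drop reserved
// registers, keep everything else in raw order, except that CSR aliases are
// deferred to the end unless the IgnoreCSR hook (standing in for
// TargetSubtargetInfo::ignoreCSRForAllocationOrder) returns true.
// Cost-per-use bookkeeping is omitted.
RegList computeOrder(const RegList &RawOrder,
                     const std::set<std::string> &Reserved,
                     const std::set<std::string> &CalleeSaved,
                     const std::function<bool(const std::string &)> &IgnoreCSR) {
  RegList Order, CSRAlias;
  for (const std::string &PhysReg : RawOrder) {
    if (Reserved.count(PhysReg))
      continue; // reserved registers are never allocated
    if (CalleeSaved.count(PhysReg) && !IgnoreCSR(PhysReg))
      CSRAlias.push_back(PhysReg); // aliases a CSR: save it for later
    else
      Order.push_back(PhysReg);
  }
  // CSR aliases go last, in raw order.
  Order.insert(Order.end(), CSRAlias.begin(), CSRAlias.end());
  return Order;
}
```

Feeding it alternative order 1 with the hook returning false reproduces the test_optsize line, and order 3 with the hook returning true reproduces the test_minsize line. Order 3 with the hook returning false would give r0-r3, r12, r4-r7, lr, r8-r11, which is why the patch needs the hook in addition to the new alternative order.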

llvm/lib/Target/ARM/ARMRegisterInfo.td

@@ … @@
 //
 def GPR : RegisterClass<"ARM", [i32], 32, (add (sequence "R%u", 0, 12),
                                                SP, LR, PC)> {
   // Allocate LR as the first CSR since it is always saved anyway.
   // For Thumb1 mode, we don't want to allocate hi regs at all, as we don't
   // know how to spill them. If we make our prologue/epilogue code smarter at
   // some point, we can go back to using the above allocation orders for the
   // Thumb1 instructions that know how to use hi regs.
-  let AltOrders = [(add LR, GPR), (trunc GPR, 8)];
+  let AltOrders = [(add LR, GPR), (trunc GPR, 8),
+                   (add (trunc GPR, 8), R12, LR, (shl GPR, 8))];
   let AltOrderSelect = [{
-    return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+    return MF.getSubtarget<ARMSubtarget>().getGPRAllocationOrder(MF);
   }];
   let DiagnosticString = "operand must be a register in range [r0, r15]";
 }

 // GPRs without the PC. Some ARM instructions do not allow the PC in
 // certain operand slots, particularly as the destination. Primarily
 // useful for disassembly.
 def GPRnopc : RegisterClass<"ARM", [i32], 32, (sub GPR, PC)> {
-  let AltOrders = [(add LR, GPRnopc), (trunc GPRnopc, 8)];
+  let AltOrders = [(add LR, GPRnopc), (trunc GPRnopc, 8),
+                   (add (trunc GPRnopc, 8), R12, LR, (shl GPRnopc, 8))];
   let AltOrderSelect = [{
-    return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+    return MF.getSubtarget<ARMSubtarget>().getGPRAllocationOrder(MF);
   }];
   let DiagnosticString = "operand must be a register in range [r0, r14]";
 }

 // GPRs without the PC but with APSR. Some instructions allow accessing the
 // APSR, while actually encoding PC in the register field. This is useful
 // for assembly and disassembly only.
 def GPRwithAPSR : RegisterClass<"ARM", [i32], 32, (add (sub GPR, PC), APSR_NZCV)> {
@@ … @@
 // where LR is the only legal loop counter register.
 def GPRlr : RegisterClass<"ARM", [i32], 32, (add LR)>;

 // restricted GPR register class. Many Thumb2 instructions allow the full
 // register range for operands, but have undefined behaviours when PC
 // or SP (R13 or R15) are used. The ARM ISA refers to these operands
 // via the BadReg() pseudo-code description.
 def rGPR : RegisterClass<"ARM", [i32], 32, (sub GPR, SP, PC)> {
-  let AltOrders = [(add LR, rGPR), (trunc rGPR, 8)];
+  let AltOrders = [(add LR, rGPR), (trunc rGPR, 8),
+                   (add (trunc rGPR, 8), R12, LR, (shl rGPR, 8))];
   let AltOrderSelect = [{
-    return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
+    return MF.getSubtarget<ARMSubtarget>().getGPRAllocationOrder(MF);
   }];
   let DiagnosticType = "rGPR";
 }

 // Thumb registers are R0-R7 normally. Some instructions can still use
 // the general GPR register class above (MOV, e.g.)
 def tGPR : RegisterClass<"ARM", [i32], 32, (trunc GPR, 8)> {
   let DiagnosticString = "operand must be a register in range [r0, r7]";
@@ … @@
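The new third alternative order relies on TableGen's set operators: `add` is an ordered union that keeps the first occurrence of each register, `trunc S, N` keeps the first N registers and `shl S, N` drops them. A rough C++ model of those operators (the tgAdd/tgTrunc/tgShl names are hypothetical and this is not TableGen's implementation) shows what `(add (trunc GPR, 8), R12, LR, (shl GPR, 8))` expands to:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using RegList = std::vector<std::string>;

// (add A, B, ...): ordered union; a register already present is not repeated.
RegList tgAdd(const std::vector<RegList> &Parts) {
  RegList Out;
  for (const RegList &Part : Parts)
    for (const std::string &R : Part)
      if (std::find(Out.begin(), Out.end(), R) == Out.end())
        Out.push_back(R);
  return Out;
}

// (trunc S, N): the first N registers of S.
RegList tgTrunc(const RegList &S, std::size_t N) {
  auto Len = static_cast<RegList::difference_type>(std::min(N, S.size()));
  return RegList(S.begin(), S.begin() + Len);
}

// (shl S, N): S without its first N registers.
RegList tgShl(const RegList &S, std::size_t N) {
  auto Skip = static_cast<RegList::difference_type>(std::min(N, S.size()));
  return RegList(S.begin() + Skip, S.end());
}
```

With GPR = r0-r12, sp, lr, pc, the expression yields r0-r7, r12, lr, r8-r11, sp, pc. SP and PC are reserved and filtered out when the final order is computed, which leaves the AllocationOrder(GPR) line that the new test checks for minsize.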

llvm/lib/Target/ARM/ARMSubtarget.h

@@ … @@ public:
   /// ROPI does not use GOT.
   bool allowPositionIndependentMovt() const {
     return isROPI() || !isTargetELF();
   }

   unsigned getPrefLoopAlignment() const {
     return PrefLoopAlignment;
   }

+  bool ignoreCSRForAllocationOrder(const MachineFunction &MF,
+                                   unsigned PhysReg) const override;
+  unsigned getGPRAllocationOrder(const MachineFunction &MF) const;
 };

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_ARM_ARMSUBTARGET_H

llvm/lib/Target/ARM/ARMSubtarget.cpp

@@ … @@ bool ARMSubtarget::useFastISel() const {
   if (!hasV6Ops())
     return false;

   // Thumb2 support on iOS; ARM support on iOS, Linux and NaCl.
   return TM.Options.EnableFastISel &&
          ((isTargetMachO() && !isThumb1Only()) ||
           (isTargetLinux() && !isThumb()) || (isTargetNaCl() && !isThumb()));
 }

+unsigned ARMSubtarget::getGPRAllocationOrder(const MachineFunction &MF) const {
+  // The GPR register class has multiple possible allocation orders, with
+  // tradeoffs preferred by different sub-architectures and optimisation goals.
+  // The allocation orders are:
+  // 0: (the default tablegen order, not used)
+  // 1: r14, r0-r13
+  // 2: r0-r7
+  // 3: r0-r7, r12, lr, r8-r11
+  // Note that the register allocator will change this order so that
+  // callee-saved registers are used later, as they require extra work in the
+  // prologue/epilogue (though we sometimes override that).
+
+  // For thumb1-only targets, only the low registers are allocatable.
+  if (isThumb1Only())
+    return 2;
+
+  // Allocate low registers first, so we can select more 16-bit instructions.
+  // We also (in ignoreCSRForAllocationOrder) override the default behaviour
+  // with regards to callee-saved registers, because pushing extra registers is
+  // much cheaper (in terms of code size) than using high registers. After
+  // that, we allocate r12 (doesn't need to be saved), lr (saving it means we
+  // can return with the pop, don't need an extra "bx lr") and then the rest of
+  // the high registers.
+  if (isThumb2() && MF.getFunction().hasMinSize())
+    return 3;
+
+  // Otherwise, allocate in the default order, using LR first because saving it
+  // allows a shorter epilogue sequence.
+  return 1;
+}
+
+bool ARMSubtarget::ignoreCSRForAllocationOrder(const MachineFunction &MF,
+                                               unsigned PhysReg) const {
+  // To minimize code size in Thumb2, we prefer the usage of low regs (lower
+  // cost per use) so we can use narrow encoding. By default, caller-saved
+  // registers (e.g. lr, r12) are always allocated first, regardless of
+  // their cost per use. When optForMinSize, we prefer the low regs even if
+  // they are CSR because usually push/pop can be folded into existing ones.
+  return isThumb2() && MF.getFunction().hasMinSize() &&
+         ARM::GPRRegClass.contains(PhysReg);
+}
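The selection logic above reduces to a small decision table. The sketch below restates it with a hypothetical Config struct standing in for the ARMSubtarget and MachineFunction queries (isThumb1Only(), isThumb2(), hasMinSize()); it is an illustration, not the in-tree code.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget and function queries used above.
struct Config {
  bool Thumb1Only; // isThumb1Only()
  bool Thumb2;     // isThumb2()
  bool MinSize;    // MF.getFunction().hasMinSize(), e.g. set by -Oz
};

// Restates ARMSubtarget::getGPRAllocationOrder().
unsigned gprAllocationOrder(const Config &C) {
  if (C.Thumb1Only)
    return 2; // r0-r7 only
  if (C.Thumb2 && C.MinSize)
    return 3; // r0-r7, r12, lr, r8-r11
  return 1;   // lr first, then r0 upwards (CSRs still moved to the end)
}

// Restates ARMSubtarget::ignoreCSRForAllocationOrder() for a register that
// is (IsGPR == true) or is not a member of the GPR class.
bool ignoreCSR(const Config &C, bool IsGPR) {
  return C.Thumb2 && C.MinSize && IsGPR;
}
```

Only Thumb2 functions with minsize get order 3 together with the CSR override; every other configuration keeps the order the old `1 + isThumb1Only()` expression selected.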

llvm/test/CodeGen/ARM/avoid-cpsr-rmw.ll

@@ … @@
 ; rdar://12878928
 define void @t3(i32* nocapture %ptr1, i32* %ptr2, i32 %c) nounwind minsize {
 entry:
 ; CHECK-LABEL: t3:
   br label %while.body

 while.body:
 ; CHECK: while.body
-; CHECK: mul r{{[0-9]+}}
+; CHECK: muls r{{[0-9]+}}
 ; CHECK: muls
   %ptr1.addr.09 = phi i32* [ %add.ptr, %while.body ], [ %ptr1, %entry ]
   %ptr2.addr.08 = phi i32* [ %incdec.ptr, %while.body ], [ %ptr2, %entry ]
   %0 = load i32, i32* %ptr1.addr.09, align 4
   %arrayidx1 = getelementptr inbounds i32, i32* %ptr1.addr.09, i32 1
   %1 = load i32, i32* %arrayidx1, align 4
   %arrayidx3 = getelementptr inbounds i32, i32* %ptr1.addr.09, i32 2
   %2 = load i32, i32* %arrayidx3, align 4
@@ … @@

llvm/test/CodeGen/ARM/favor-low-reg-for-Osize.ll

This file was added.

; REQUIRES: asserts
; RUN: llc -debug-only=regalloc < %s 2>%t | FileCheck %s --check-prefix=CHECK
; RUN: FileCheck %s < %t --check-prefix=DEBUG

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n8:16:32-S64"
target triple = "thumbv7m--linux-gnueabi"


; DEBUG: AllocationOrder(GPR) = [ $r0 $r1 $r2 $r3 $r4 $r5 $r6 $r7 $r12 $lr $r8 $r9 $r10 $r11 ]

define i32 @test_minsize(i32 %x) optsize minsize {
; CHECK-LABEL: test_minsize:
entry:
; CHECK: mov r4, r0
  tail call void asm sideeffect "", "~{r0},~{r1},~{r2},~{r3}"()
; CHECK: mov r0, r4
  ret i32 %x
}

; DEBUG: AllocationOrder(GPR) = [ $r0 $r1 $r2 $r3 $r12 $lr $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 ]

define i32 @test_optsize(i32 %x) optsize {
; CHECK-LABEL: test_optsize:
entry:
; CHECK: mov r12, r0
  tail call void asm sideeffect "", "~{r0},~{r1},~{r2},~{r3}"()
; CHECK: mov r0, r12
  ret i32 %x
}

This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Thumb2: favor R4-R7 over R12/LR in allocation order when opt for minsize
Closed, Public
