This is an archive of the discontinued LLVM Phabricator instance.

Differential D49364

[ARM] Add support for spilling high registers in Thumb1
Needs Review · Public

Authored by petpav01 on Jul 16 2018, 1:40 AM.

Details

Reviewers

olista01
t.p.northover
javed.absar
eli.friedman

Summary

LLVM normally uses only low registers in Thumb1, and Thumb1InstrInfo::storeRegToStackSlot()/loadRegFromStackSlot() can currently store and restore only those. However, in rare cases a register allocator might need to spill a high register in the middle of a function as well.

Example:

$ cat test.c
void constraint_h(void) {
  int i;
  asm volatile("@ %0" : : "h" (i) : "r12");
}
$ clang -target arm-none-eabi -march=armv6-m -c test.c
clang-7: [...]/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp:85: virtual void llvm::Thumb1InstrInfo::storeRegToStackSlot(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::iterator, unsigned int, bool, int, const llvm::TargetRegisterClass*, const llvm::TargetRegisterInfo*) const: Assertion `(RC == &ARM::tGPRRegClass || (TargetRegisterInfo::isPhysicalRegister(SrcReg) && isARMLowRegister(SrcReg))) && "Unknown regclass!"' failed.
[...]

The program was compiled at -O0 and so Fast Register Allocator is used. The following happens in this case:

Prior to register allocation, MIR looks as follows:

Frame Objects:
  fi#0: size=4, align=4, at location [SP]

bb.0.entry:
  %1:tgpr = tLDRspi %stack.0.i, 0, 14, $noreg :: (dereferenceable load 4 from %ir.i)
  %0:hgpr = COPY %1:tgpr
  INLINEASM &"@ $0" [sideeffect] [attdialect], $0:[reguse:hGPR], %0:hgpr, $1:[clobber], implicit-def early-clobber $r12, !3
  tBX_RET 14, $noreg

Fast Register Allocator first satisfies %0:hgpr by selecting r12.
When the scan reaches the INLINEASM instruction, the allocator however notices that r12 is clobbered and so it needs to be spilled.
The allocator calls Thumb1InstrInfo::storeRegToStackSlot() to store the register in a stack slot but the method does not know how to do it and aborts. This can also result in a miscompilation if LLVM is built without assertions enabled.
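
The spill decision in the second step can be illustrated with a toy model. This is only a sketch of the idea, not RegAllocFast code: the function name vregsToSpill and the map/set representation are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy model (not LLVM code): liveAssignment maps a virtual register number
// to the physical register it currently lives in. The function returns the
// virtual registers whose physical register is clobbered by the instruction
// being scanned; these would have to be spilled before that instruction.
std::vector<unsigned>
vregsToSpill(const std::map<unsigned, unsigned> &liveAssignment,
             const std::set<unsigned> &clobberedPhysRegs) {
  std::vector<unsigned> spills;
  for (const auto &entry : liveAssignment)
    if (clobberedPhysRegs.count(entry.second) != 0)
      spills.push_back(entry.first);
  return spills;
}
```

With %0 living in r12 and an instruction that clobbers r12, the function reports %0 as needing a spill, which corresponds to the situation described above.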

Both the store and the load of a high register in Thumb1 need an additional low register. For instance, the store is implemented as:

mov %lowReg, %spilledHighReg
str %lowReg, ...
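
The same two-instruction sequence, and its mirror image for the reload, can be modelled in a few lines of C++. This is a toy machine written for illustration only; ToyThumb1, spillHigh and reloadHigh are hypothetical names, and the model just encodes the assumption that an SP-relative str/ldr accepts low registers (r0-r7) only while mov accepts any register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy machine state (hypothetical): 16 core registers and 8 stack words.
struct ToyThumb1 {
  std::array<std::uint32_t, 16> reg{};
  std::vector<std::uint32_t> stack = std::vector<std::uint32_t>(8, 0);

  static bool isLow(unsigned r) { return r < 8; }

  // Models "mov Rd, Rm", which may name high registers.
  void mov(unsigned rd, unsigned rm) { reg.at(rd) = reg.at(rm); }

  // Models "str Rt, [sp, #slot]"; rejects high registers.
  void strSp(unsigned rt, unsigned slot) {
    if (!isLow(rt))
      throw std::invalid_argument("str needs a low register");
    stack.at(slot) = reg.at(rt);
  }

  // Models "ldr Rt, [sp, #slot]"; rejects high registers.
  void ldrSp(unsigned rt, unsigned slot) {
    if (!isLow(rt))
      throw std::invalid_argument("ldr needs a low register");
    reg.at(rt) = stack.at(slot);
  }
};

// Spill: mov lowTmp, highReg ; str lowTmp, [sp, #slot]
void spillHigh(ToyThumb1 &m, unsigned highReg, unsigned lowTmp, unsigned slot) {
  m.mov(lowTmp, highReg);
  m.strSp(lowTmp, slot);
}

// Reload: ldr lowTmp, [sp, #slot] ; mov highReg, lowTmp
void reloadHigh(ToyThumb1 &m, unsigned highReg, unsigned lowTmp, unsigned slot) {
  m.ldrSp(lowTmp, slot);
  m.mov(highReg, lowTmp);
}
```

In this model, spilling r12 through r0 and reloading into r8 preserves the value, while a direct strSp(12, ...) throws.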

An initial patch in this review extended Thumb1InstrInfo::storeRegToStackSlot() and loadRegFromStackSlot() to allow storing and restoring high registers by inserting a pseudo-instruction that was later lowered, after register allocation, in ThumbRegisterInfo::eliminateFrameIndex(). This relied on the register scavenger to secure a low register for the sequence, which is possibly problematic when register pressure is high because ThumbRegisterInfo::saveScavengerRegister() at the moment also tries to make use of high register r12.

The current patch extends RegAllocFast and InlineSpiller to handle a spill with an intermediary directly.

Event Timeline

petpav01 created this revision. Jul 16 2018, 1:40 AM

Herald added a reviewer: javed.absar. Jul 16 2018, 1:40 AM

Herald added subscribers: llvm-commits, chrib, kristof.beyls, qcolombet.

petpav01 added a reviewer: eli.friedman. Jul 16 2018, 2:56 AM

thopre added a subscriber: thopre. Jul 16 2018, 9:46 AM

thopre added inline comments.

lib/Target/ARM/Thumb1InstrInfo.cpp
138–147	Can you put a similar comment to the store to stack slot?

This is possibly problematic when the register pressure is high because ThumbRegisterInfo::saveScavengerRegister() currently also tries to make use of high register r12.

Also, the constant islands pass can clobber lr. But given the only way to end up with an "hGPR" register class is inline asm, we could probably work around this issue by excluding ip/lr from allocation order for hGPR.
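
As a rough illustration of what excluding ip/lr from an allocation order means, the sketch below filters a list of register names. It is not TableGen or LLVM code; the function name allocationOrderWithout and the simplified register list are assumptions for the example (ip is the conventional alias for r12).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy helper (hypothetical): returns the allocation order with every
// register named in `excluded` dropped, preserving the original order.
std::vector<std::string>
allocationOrderWithout(const std::vector<std::string> &order,
                       const std::vector<std::string> &excluded) {
  std::vector<std::string> result;
  for (const std::string &reg : order)
    if (std::find(excluded.begin(), excluded.end(), reg) == excluded.end())
      result.push_back(reg);
  return result;
}
```

Applied to a simplified high-register list r8, r9, r10, r11, r12, lr with r12 and lr excluded, this leaves r8-r11 available for allocation.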

That said, if we ever want to make the high registers generally allocatable in Thumb1 mode, this patch probably isn't the right solution; instead, we should make the register allocator insert the copy, so we aren't forced to scavenge a register later.

lib/Target/ARM/ARMRegisterInfo.td
209	This comment isn't really right. It's still worth using the high registers, with appropriate cost constraints: there are a few important instructions which can take high registers as inputs (cmp, add, bx/blx), and even if we're just effectively using them as spill slots, it's cheaper than spilling to the stack.

petpav01 updated this revision to Diff 156217. Jul 19 2018, 12:41 AM

In D49364#1164005, @efriedma wrote:

This is possibly problematic when the register pressure is high because ThumbRegisterInfo::saveScavengerRegister() currently also tries to make use of high register r12.

Also, the constant islands pass can clobber lr. But given the only way to end up with an "hGPR" register class is inline asm, we could probably work around this issue by excluding ip/lr from allocation order for hGPR.

That said, if we ever want to make the high registers generally allocatable in Thumb1 mode, this patch probably isn't the right solution; instead, we should make the register allocator insert the copy, so we aren't forced to scavenge a register later.

Sorry, the mentioned idea with the copy is not quite clear to me. Could you please explain it a bit more for me?

lib/Target/ARM/ARMRegisterInfo.td
209	Updated, hopefully it now makes sense.
lib/Target/ARM/Thumb1InstrInfo.cpp
138–147	Added.

Sorry, the mentioned idea with the copy is not quite clear to me. Could you please explain it a bit more for me?

Say the target has a new hook, call it "getRegClassForStackSaveRestore()" or something, which takes a register class, and returns a register class appropriate for stack save/restore operations. Then when a register allocator wants to spill a vreg, it first calls getRegClassForStackSaveRestore(); if that returns a new register class, instead of spilling using storeRegToStackSlot, it makes a new vreg with the returned class, and inserts a COPY to that vreg.

This avoids having to scavenge a register later; the register allocator has more ways to make a register available, so the resulting code is likely more efficient, and it avoids the potential problem of needing to scavenge multiple registers in ThumbRegisterInfo::eliminateFrameIndex.
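
The proposed flow can be sketched with toy types. Only the hook name getRegClassForStackSaveRestore comes from the comment above; RegClass, VReg, ToySpiller and the textual "instructions" are hypothetical stand-ins and do not correspond to real LLVM classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class RegClass { tGPR, hGPR };

// Hypothetical target hook: returns the class that can be stored to or
// loaded from the stack directly. In this toy model high registers must go
// through a low register; every other class is returned unchanged.
RegClass getRegClassForStackSaveRestore(RegClass rc) {
  return rc == RegClass::hGPR ? RegClass::tGPR : rc;
}

struct VReg {
  unsigned id;
  RegClass rc;
};

// Toy spiller that records the instructions it would insert as strings.
struct ToySpiller {
  unsigned nextId = 0;
  std::vector<std::string> emitted;

  VReg createVReg(RegClass rc) { return VReg{nextId++, rc}; }

  void spill(const VReg &v, int slot) {
    RegClass spillRC = getRegClassForStackSaveRestore(v.rc);
    if (spillRC == v.rc) {
      // The class can be stored directly.
      emitted.push_back("STORE %" + std::to_string(v.id) + ", slot" +
                        std::to_string(slot));
      return;
    }
    // Otherwise copy into a fresh vreg of the returned class and store that;
    // the register allocator then assigns the temporary like any other vreg.
    VReg tmp = createVReg(spillRC);
    emitted.push_back("%" + std::to_string(tmp.id) + " = COPY %" +
                      std::to_string(v.id));
    emitted.push_back("STORE %" + std::to_string(tmp.id) + ", slot" +
                      std::to_string(slot));
  }
};
```

In this sketch a tGPR vreg spills with a single store, while an hGPR vreg produces a COPY to a new tGPR vreg followed by the store, so no register has to be scavenged afterwards.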

thegameg added a subscriber: thegameg. Jul 30 2018, 4:04 AM

Thanks for the explanation of this idea. Updated patch goes in that direction and provides a prototype of this approach. The implementation is done in Fast Register Allocator (which has its own spiller code) and in InlineSpiller (used by the other LLVM allocators: Basic, Greedy, PBQP).

The implemented approach is to always make a complete spill of a high register to the stack, instead of initially moving it only to a low register and then spilling the low register if actually needed. This keeps things a bit simpler to implement and reason about. InlineSpiller could be improved to implement only the mentioned "half-spills" but it does not appear necessary for now. With this problem currently being limited to inline assembler, I think the Greedy production allocator should not get into a state where it would need to spill high registers.

The patch is not complete but I thought I would ask for feedback on it early, before I dive into solving remaining issues.

Known problems:

Reloads of registers that require a COPY instruction should be done by RegAllocFast before other register uses try to get satisfied to provide better assignment possibilities for the temporary register.
Helper registers used in high-register reloads should get properly removed from UsedInInstr in RegAllocFast so they can get used by the actual instruction.
RegAllocFast uses one temporary virtual register for all COPY instructions that it needs to insert for high-register spills. This is a workaround for LiveRegMap (SparseSet) not being resizable when it is not empty.
Operands of COPY instructions inserted by InlineSpiller can get inflated to GPR. This is visible in test hgpr-spill-basic.mir and would cause a problem if the inflated GPR register needed to get subsequently spilled.
Interaction with snippets and hoisting in InlineSpiller is likely not really correct.

Herald added subscribers: eraman, MatzeB. Aug 14 2018, 2:38 AM

petpav01 mentioned this in D51927: [ARM] Enable spilling of the hGPR class in Thumb2. Sep 11 2018, 6:03 AM

Updated patch improves the RegAllocFast part and adds more testing for it. InlineSpiller has no new changes.

Description of the changes:

Code to allocate an intermediary register for the spill is moved to RegAllocFast::handleIntermediarySpill().
RegAllocFast::allocVirtReg() is split into allocVirtReg() and assignVirtReg(). The former method still does most of the allocation work but leaves final update of PhysRegState + LRI->PhysReg and error reporting to assignVirtReg(). This allows handleIntermediarySpill() to call allocVirtReg() to get a free register without updating other state.
Spilling all registers prior to a call instruction in RegAllocFast::allocateBasicBlock() is moved before clearing of UsedInInstr so an intermediary does not get allocated to a register used by the instruction.
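
The split between choosing a register and committing the choice can be pictured with a toy allocator. The method names allocVirtReg and assignVirtReg mirror the description above, but ToyAllocator, its fields and its signatures are assumptions for illustration and do not reflect the actual RegAllocFast interfaces.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy allocator (hypothetical): physical registers are plain integers.
struct ToyAllocator {
  std::set<int> freeRegs;
  std::map<unsigned, int> assignment; // virtual register -> physical register

  // Query only: picks a free physical register without recording anything.
  // Returns -1 when no register is free.
  int allocVirtReg() const {
    return freeRegs.empty() ? -1 : *freeRegs.begin();
  }

  // Commit: performs the query and then updates the allocator state.
  bool assignVirtReg(unsigned vreg) {
    int phys = allocVirtReg();
    if (phys < 0)
      return false; // error reporting would happen here
    freeRegs.erase(phys);
    assignment[vreg] = phys;
    return true;
  }
};
```

A caller that only needs a temporary, as handleIntermediarySpill() is described to, can use the query form repeatedly without disturbing the recorded assignments.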

This is still not a complete patch. Known problems are:

handleIntermediarySpill() does not always correctly update debug information (DBG_VALUEs).
InlineSpiller still has the same problems as mentioned previously and needs more work.
Changes implemented in SparseSet are currently without testing.

Any feedback on this is very welcome, especially on whether the overall approach looks sensible or whether a different idea would be preferable.

Note: There is an ongoing rewrite of RegAllocFast in D52010, which means this patch will need to be somewhat reworked after the rewrite lands, but I do not think it should affect the basic idea that is implemented here.

Herald added a subscriber: dexonsmith. Nov 7 2018, 5:59 AM

pratlucas mentioned this in D80999: [ARM][CodeGen] Enabling spilling of high registers in RegAllocFast for Thumb1. Jun 2 2020, 6:27 AM

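Revision Contents

Path                                           Size
lib/Target/ARM/ARMInstrThumb.td                10 lines
lib/Target/ARM/ARMRegisterInfo.td              7 lines
lib/Target/ARM/Thumb1InstrInfo.cpp             68 lines
lib/Target/ARM/ThumbRegisterInfo.h             8 lines
lib/Target/ARM/ThumbRegisterInfo.cpp           71 lines
test/CodeGen/Thumb/high-reg-spill-expand.mir   64 lines
test/CodeGen/Thumb/high-reg-spill.mir          50 lines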

Diff 156217

lib/Target/ARM/ARMInstrThumb.td

@@ [...] @@ defm tSTRB : thumb_st_rr_ri_enc<0b010, 0b0111, t_addrmode_rr,
                                 truncstorei8>;
 
 // A8.6.207 & A8.6.205
 defm tSTRH : thumb_st_rr_ri_enc<0b001, 0b1000, t_addrmode_rr,
                                 t_addrmode_is2, AddrModeT1_2,
                                 IIC_iStore_bh_r, IIC_iStore_bh_i, "strh",
                                 truncstorei16>;
 
+// Pseudo instructions for Thumb1 high-register spills.
+let mayStore = 1 in
+def tSPILL_HREG_SAVE :
+  tPseudoInst<(outs), (ins hGPR:$Rt, t_addrmode_sp:$addr), 0, IIC_iStore_i, []>,
+  Requires<[IsThumb1Only]>, Sched<[WriteST]>;
+let mayLoad = 1 in
+def tSPILL_HREG_RESTORE :
+  tPseudoInst<(outs hGPR:$Rt), (ins t_addrmode_sp:$addr), 0, IIC_iLoad_i, []>,
+  Requires<[IsThumb1Only]>, Sched<[WriteLd]>;
+
 
 //===----------------------------------------------------------------------===//
 // Load / store multiple Instructions.
 //
 
 // These require base address to be written back or one of the loaded regs.
 let hasSideEffects = 0 in {
 
[...]

lib/Target/ARM/ARMRegisterInfo.td

@@ [...] @@
 // r7 == Frame Pointer (thumb-style backtraces)
 // r9 == May be reserved as Thread Register
 // r11 == Frame Pointer (arm-style backtraces)
 // r10 == Stack Limit
 //
 def GPR : RegisterClass<"ARM", [i32], 32, (add (sequence "R%u", 0, 12),
                                                SP, LR, PC)> {
   // Allocate LR as the first CSR since it is always saved anyway.
-  // For Thumb1 mode, we don't want to allocate hi regs at all, as we don't
-  // know how to spill them. If we make our prologue/epilogue code smarter at
-  // some point, we can go back to using the above allocation orders for the
-  // Thumb1 instructions that know how to use hi regs.
+  // For Thumb1 mode, don't make hi regs generally allocatable as we aren't
+  // currently great at working with them as such, e.g. spilling support for
+  // them is limited.
   let AltOrders = [(add LR, GPR), (trunc GPR, 8)];
   let AltOrderSelect = [{
       return 1 + MF.getSubtarget<ARMSubtarget>().isThumb1Only();
   }];
   let DiagnosticString = "operand must be a register in range [r0, r15]";
 }
 
 // GPRs without the PC. Some ARM instructions do not allow the PC in
[...]

Inline comments on the changed comment (line 209):
efriedma: This comment isn't really right. It's still worth using the high registers, with appropriate cost constraints: there are a few important instructions which can take high registers as inputs (cmp, add, bx/blx), and even if we're just effectively using them as spill slots, it's cheaper than spilling to the stack.
petpav01 (author): Updated, hopefully it now makes sense.

lib/Target/ARM/Thumb1InstrInfo.cpp

@@ [...] @@ void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
   }
 }
 
 void Thumb1InstrInfo::
 storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                     unsigned SrcReg, bool isKill, int FI,
                     const TargetRegisterClass *RC,
                     const TargetRegisterInfo *TRI) const {
-  assert((RC == &ARM::tGPRRegClass ||
-          (TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
-           isARMLowRegister(SrcReg))) && "Unknown regclass!");
-
-  if (RC == &ARM::tGPRRegClass ||
-      (TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
-       isARMLowRegister(SrcReg))) {
   DebugLoc DL;
-    if (I != MBB.end()) DL = I->getDebugLoc();
+  if (I != MBB.end())
+    DL = I->getDebugLoc();
 
   MachineFunction &MF = *MBB.getParent();
   MachineFrameInfo &MFI = MF.getFrameInfo();
   MachineMemOperand *MMO = MF.getMachineMemOperand(
       MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOStore,
       MFI.getObjectSize(FI), MFI.getObjectAlignment(FI));
 
+  if (RC->hasSuperClassEq(&ARM::tGPRRegClass) ||
+      (TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
+       isARMLowRegister(SrcReg)))
     BuildMI(MBB, I, DL, get(ARM::tSTRspi))
         .addReg(SrcReg, getKillRegState(isKill))
         .addFrameIndex(FI)
         .addImm(0)
         .addMemOperand(MMO)
         .add(predOps(ARMCC::AL));
-  }
+  else if (RC->hasSuperClassEq(&ARM::hGPRRegClass))
+    // Callers of storeRegToStackSlot() may expect only a single instruction to
+    // be added but Thumb1 does not have an instruction that directly stores a
+    // high register. Insert therefore a pseudo instruction that gets lowered
+    // after register allocation in eliminateFrameIndex().
+    BuildMI(MBB, I, DL, get(ARM::tSPILL_HREG_SAVE))
+        .addReg(SrcReg, getKillRegState(isKill))
+        .addFrameIndex(FI)
+        .addImm(0)
+        .addMemOperand(MMO);
+  else
+    llvm_unreachable("Unknown reg class!");
 }
 
 void Thumb1InstrInfo::
 loadRegFromStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                      unsigned DestReg, int FI,
                      const TargetRegisterClass *RC,
                      const TargetRegisterInfo *TRI) const {
-  assert((RC->hasSuperClassEq(&ARM::tGPRRegClass) ||
-          (TargetRegisterInfo::isPhysicalRegister(DestReg) &&
-           isARMLowRegister(DestReg))) && "Unknown regclass!");
-
-  if (RC->hasSuperClassEq(&ARM::tGPRRegClass) ||
-      (TargetRegisterInfo::isPhysicalRegister(DestReg) &&
-       isARMLowRegister(DestReg))) {
   DebugLoc DL;
-    if (I != MBB.end()) DL = I->getDebugLoc();
+  if (I != MBB.end())
+    DL = I->getDebugLoc();
   MachineFunction &MF = *MBB.getParent();
   MachineFrameInfo &MFI = MF.getFrameInfo();
   MachineMemOperand *MMO = MF.getMachineMemOperand(
       MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOLoad,
       MFI.getObjectSize(FI), MFI.getObjectAlignment(FI));
 
+  if (RC->hasSuperClassEq(&ARM::tGPRRegClass) ||
+      (TargetRegisterInfo::isPhysicalRegister(DestReg) &&
+       isARMLowRegister(DestReg)))
     BuildMI(MBB, I, DL, get(ARM::tLDRspi), DestReg)
         .addFrameIndex(FI)
         .addImm(0)
         .addMemOperand(MMO)
         .add(predOps(ARMCC::AL));
-  }
+  else if (RC->hasSuperClassEq(&ARM::hGPRRegClass))
+    // Insert a pseudo instruction to perform the load, similarly as in
+    // storeRegToStackSlot().
+    BuildMI(MBB, I, DL, get(ARM::tSPILL_HREG_RESTORE), DestReg)
+        .addFrameIndex(FI)
+        .addImm(0)
+        .addMemOperand(MMO);
+  else
+    llvm_unreachable("Unknown reg class!");
 }
 
 void Thumb1InstrInfo::expandLoadStackGuard(
     MachineBasicBlock::iterator MI) const {
   MachineFunction &MF = *MI->getParent()->getParent();
   const TargetMachine &TM = MF.getTarget();
   if (TM.isPositionIndependent())
     expandLoadStackGuardBase(MI, ARM::tLDRLIT_ga_pcrel, ARM::tLDRi);
   else
[...]

Inline comments on the high-register branch of loadRegFromStackSlot() (lines 138–147):
thopre: Can you put a similar comment to the store to stack slot?
petpav01 (author): Added.

lib/Target/ARM/ThumbRegisterInfo.h

@@ [...] @@ public:
   bool saveScavengerRegister(MachineBasicBlock &MBB,
                              MachineBasicBlock::iterator I,
                              MachineBasicBlock::iterator &UseMI,
                              const TargetRegisterClass *RC,
                              unsigned Reg) const override;
   void eliminateFrameIndex(MachineBasicBlock::iterator II,
                            int SPAdj, unsigned FIOperandNum,
                            RegScavenger *RS = nullptr) const override;
+
+private:
+  void eliminateThumb1FrameIndex(MachineBasicBlock::iterator II, int SPAdj,
+                                 unsigned FIOperandNum, RegScavenger *RS) const;
+  void eliminateThumb1FrameIndexFromHighRegSpill(MachineBasicBlock::iterator II,
+                                                 int SPAdj,
+                                                 unsigned FIOperandNum,
+                                                 RegScavenger *RS) const;
 };
 }
 
 #endif

lib/Target/ARM/ThumbRegisterInfo.cpp

@@ [...] @@ void ThumbRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
   MachineInstr &MI = *II;
   MachineBasicBlock &MBB = *MI.getParent();
   MachineFunction &MF = *MBB.getParent();
   const ARMSubtarget &STI = MF.getSubtarget<ARMSubtarget>();
   if (!STI.isThumb1Only())
     return ARMBaseRegisterInfo::eliminateFrameIndex(II, SPAdj, FIOperandNum,
                                                     RS);
 
+  // Eliminate frame index from Thumb1 high-register spills.
+  if (MI.getOpcode() == ARM::tSPILL_HREG_SAVE ||
+      MI.getOpcode() == ARM::tSPILL_HREG_RESTORE) {
+    eliminateThumb1FrameIndexFromHighRegSpill(II, SPAdj, FIOperandNum, RS);
+    return;
+  }
+
+  eliminateThumb1FrameIndex(II, SPAdj, FIOperandNum, RS);
+}
+
+void ThumbRegisterInfo::eliminateThumb1FrameIndex(
+    MachineBasicBlock::iterator II, int SPAdj, unsigned FIOperandNum,
+    RegScavenger *RS) const {
+  MachineInstr &MI = *II;
+  MachineBasicBlock &MBB = *MI.getParent();
+  MachineFunction &MF = *MBB.getParent();
+  const ARMSubtarget &STI = MF.getSubtarget<ARMSubtarget>();
+  assert(STI.isThumb1Only() &&
+         "This eliminateFrameIndex only supports Thumb1!");
+
   unsigned VReg = 0;
   const ARMBaseInstrInfo &TII = *STI.getInstrInfo();
   DebugLoc dl = MI.getDebugLoc();
   MachineInstrBuilder MIB(*MBB.getParent(), &MI);
 
   unsigned FrameReg;
   int FrameIndex = MI.getOperand(FIOperandNum).getIndex();
   const ARMFrameLowering *TFI = getFrameLowering(MF);
@@ [...] @@ #endif // NDEBUG
   // Special handling of dbg_value instructions.
   if (MI.isDebugValue()) {
     MI.getOperand(FIOperandNum). ChangeToRegister(FrameReg, false /*isDef*/);
     MI.getOperand(FIOperandNum+1).ChangeToImmediate(Offset);
     return;
   }
 
   // Modify MI as necessary to handle as much of 'Offset' as possible
-  assert(MF.getInfo<ARMFunctionInfo>()->isThumbFunction() &&
-         "This eliminateFrameIndex only supports Thumb1!");
   if (rewriteFrameIndex(MI, FIOperandNum, FrameReg, Offset, TII))
     return;
 
   // If we get here, the immediate doesn't fit into the instruction. We folded
   // as much as possible above, handle the rest, providing a register that is
   // SP+LargeImm.
   assert(Offset && "This code isn't needed if offset already handled!");
 
@@ [...] @@ #endif // NDEBUG
   } else {
     llvm_unreachable("Unexpected opcode!");
   }
 
   // Add predicate back if it's needed.
   if (MI.isPredicable())
     MIB.add(predOps(ARMCC::AL));
 }
+
+void ThumbRegisterInfo::eliminateThumb1FrameIndexFromHighRegSpill(
+    MachineBasicBlock::iterator II, int SPAdj, unsigned FIOperandNum,
+    RegScavenger *RS) const {
+  MachineInstr &MI = *II;
+  MachineBasicBlock &MBB = *MI.getParent();
+  MachineFunction &MF = *MBB.getParent();
+  const ARMSubtarget &STI = MF.getSubtarget<ARMSubtarget>();
+  assert(STI.isThumb1Only() &&
+         "This eliminateFrameIndex only supports Thumb1!");
+  const ARMBaseInstrInfo &TII = *STI.getInstrInfo();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+
+  // Elimination of a frame index from Thumb1 high-register spills is done in
+  // two steps. The pseudo instructions get first expanded into low-register
+  // stores/loads and then the frame index is eliminated from these new
+  // instructions.
+  unsigned LowReg = MRI.createVirtualRegister(&ARM::tGPRRegClass);
+  unsigned HiReg = MI.getOperand(0).getReg();
+  MachineInstr *UpdateMI;
+  unsigned Opcode = MI.getOpcode();
+  if (Opcode == ARM::tSPILL_HREG_SAVE) {
+    // Emit a MOV from the high reg to the low reg.
+    BuildMI(MBB, II, MI.getDebugLoc(), TII.get(ARM::tMOVr), LowReg)
+        .addReg(HiReg, RegState::Kill)
+        .add(predOps(ARMCC::AL));
+    // Store the low register.
+    UpdateMI = BuildMI(MBB, II, MI.getDebugLoc(), TII.get(ARM::tSTRspi))
+                   .addReg(LowReg, RegState::Kill)
+                   .add(MI.getOperand(1))
+                   .add(MI.getOperand(2))
+                   .setMemRefs(MI.memoperands_begin(), MI.memoperands_end())
+                   .add(predOps(ARMCC::AL));
+  } else if (Opcode == ARM::tSPILL_HREG_RESTORE) {
+    // Load the saved value in the low register.
+    UpdateMI = BuildMI(MBB, II, MI.getDebugLoc(), TII.get(ARM::tLDRspi), LowReg)
+                   .add(MI.getOperand(1))
+                   .add(MI.getOperand(2))
+                   .setMemRefs(MI.memoperands_begin(), MI.memoperands_end())
+                   .add(predOps(ARMCC::AL));
+    // Emit a MOV from the low reg to the high reg.
+    BuildMI(MBB, II, MI.getDebugLoc(), TII.get(ARM::tMOVr), HiReg)
+        .addReg(LowReg, RegState::Kill)
+        .add(predOps(ARMCC::AL));
+  } else
+    llvm_unreachable("Unexpected opcode!");
+  MBB.erase(II);
+  eliminateThumb1FrameIndex(UpdateMI->getIterator(), SPAdj, FIOperandNum, RS);
+}

test/CodeGen/Thumb/high-reg-spill-expand.mir

This file was added.

# RUN: llc -run-pass prologepilog %s -o - | FileCheck %s

# Check that the tSPILL_HREG_SAVE/RESTORE pseudo instructions get properly
# expanded and have their frame index eliminated when the Prologue/Epilogue
# Insertion pass is run.

--- |
  ; ModuleID = 'test.ll'
  source_filename = "test.c"
  target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "thumbv6m-none--eabi"

  define void @constraint_h() #0 {
  entry:
    %i = alloca i32, align 4
    %0 = load i32, i32* %i, align 4
    call void asm sideeffect "@ $0", "h,~{r12}"(i32 %0)
    ret void
  }

  attributes #0 = { "no-frame-pointer-elim"="true" }

...
---
name: constraint_h
tracksRegLiveness: true
stack:
  - { id: 0, name: i, size: 4, alignment: 4, stack-id: 0, local-offset: -4 }
  - { id: 1, type: spill-slot, size: 4, alignment: 4, stack-id: 0 }
body: |
  bb.0.entry:
    renamable $r0 = tLDRspi %stack.0.i, 0, 14, $noreg :: (dereferenceable load 4 from %ir.i)
    renamable $r12 = COPY killed renamable $r0
    tSPILL_HREG_SAVE killed $r12, %stack.1, 0 :: (store 4 into %stack.1)
    $r8 = tSPILL_HREG_RESTORE %stack.1, 0 :: (load 4 from %stack.1)
    INLINEASM &"@ $0", 1, 589833, killed renamable $r8, 12, implicit-def early-clobber $r12
    tBX_RET 14, $noreg

...
# CHECK: bb.0.entry:
# CHECK-NEXT: liveins: $r6, $lr, $r8
# CHECK-NEXT: {{ }}
# CHECK-NEXT: frame-setup tPUSH 14, $noreg, killed $r6, killed $r7, killed $lr, implicit-def $sp, implicit $sp
# CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 12
# CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $lr, -4
# CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $r7, -8
# CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $r6, -12
# CHECK-NEXT: $r7 = frame-setup tADDrSPi $sp, 1, 14, $noreg
# CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa $r7, 8
# CHECK-NEXT: $lr = tMOVr killed $r8, 14, $noreg
# CHECK-NEXT: tPUSH 14, $noreg, killed $lr, implicit-def $sp, implicit $sp
# CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $r8, -16
# CHECK-NEXT: $sp = frame-setup tSUBspi $sp, 2, 14, $noreg
# CHECK-NEXT: renamable $r0 = tLDRspi $sp, 1, 14, $noreg :: (dereferenceable load 4 from %ir.i)
# CHECK-NEXT: renamable $r12 = COPY killed renamable $r0
# CHECK-NEXT: $r0 = tMOVr killed $r12, 14, $noreg
# CHECK-NEXT: tSTRspi killed $r0, $sp, 0, 14, $noreg :: (store 4 into %stack.1)
# CHECK-NEXT: $r0 = tLDRspi $sp, 0, 14, $noreg :: (load 4 from %stack.1)
# CHECK-NEXT: $r8 = tMOVr killed $r0, 14, $noreg
# CHECK-NEXT: INLINEASM &"@ $0", 1, 589833, killed renamable $r8, 12, implicit-def early-clobber $r12
# CHECK-NEXT: $sp = tADDspi $sp, 2, 14, $noreg
# CHECK-NEXT: tPOP 14, $noreg, def $r0, implicit-def $sp, implicit $sp
# CHECK-NEXT: $r8 = tMOVr killed $r0, 14, $noreg
# CHECK-NEXT: tPOP_RET 14, $noreg, def $r6, def $r7, def $pc, implicit-def $sp, implicit $sp

test/CodeGen/Thumb/high-reg-spill.mir

This file was added.

# RUN: llc -run-pass regallocfast %s -o - | FileCheck %s

# This test examines register allocation and spilling with Fast Register
# Allocator. The test uses inline assembler that requests an input variable to
# be loaded in a high register but at the same time has r12 marked as clobbered.
# The allocator initially satisfies the load request by selecting r12 but then
# needs to spill this register when it reaches the INLINEASM instruction and
# notices its clobber definition.
#
# The test checks that the compiler is able to spill a high register in Thumb1
# by inserting the tSPILL_HREG_SAVE/RESTORE pseudo instructions.

--- |
  ; ModuleID = 'test.ll'
  source_filename = "test.c"
  target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "thumbv6m-none--eabi"

  define dso_local void @constraint_h() {
  entry:
    %i = alloca i32, align 4
    %0 = load i32, i32* %i, align 4
    call void asm sideeffect "@ $0", "h,~{r12}"(i32 %0)
    ret void
  }

...
---
name: constraint_h
tracksRegLiveness: true
registers:
  - { id: 0, class: hgpr }
  - { id: 1, class: tgpr }
stack:
  - { id: 0, name: i, size: 4, alignment: 4, stack-id: 0, local-offset: -4 }
body: |
  bb.0.entry:
    %1:tgpr = tLDRspi %stack.0.i, 0, 14, $noreg :: (dereferenceable load 4 from %ir.i)
    %0:hgpr = COPY %1
    INLINEASM &"@ $0", 1, 589833, %0, 12, implicit-def early-clobber $r12
    tBX_RET 14, $noreg

...
# CHECK: bb.0.entry:
# CHECK-NEXT: renamable $r0 = tLDRspi %stack.0.i, 0, 14, $noreg :: (dereferenceable load 4 from %ir.i)
# CHECK-NEXT: renamable $r12 = COPY killed renamable $r0
# CHECK-NEXT: tSPILL_HREG_SAVE killed $r12, %stack.1, 0 :: (store 4 into %stack.1)
# CHECK-NEXT: $r8 = tSPILL_HREG_RESTORE %stack.1, 0 :: (load 4 from %stack.1)
# CHECK-NEXT: INLINEASM &"@ $0", 1, 589833, killed renamable $r8, 12, implicit-def early-clobber $r12
# CHECK-NEXT: tBX_RET 14, $noreg