This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Improve handling of stack accesses in Thumb-1.
Closed, Public

Authored by john.brawn on Feb 12 2015, 8:53 AM.


Details

Reviewers

t.p.northover
jmolloy

Summary

Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-based LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to make better use of these instructions:

Use word loads for anyext byte and halfword loads from the stack.
Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid.
Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index.
Correct how many bits of offset we think AddrModeT1_s has.
Make the load/store optimizer able to cope with AddrModeT1_s.
Fiddle with a bunch of tests to cope with the code generation changes that the above all cause, typically where the use of SP-based addressing causes fewer callee-saved registers to be used and thus saved.
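
As a rough illustration of why the summary forces 4-byte alignment on these objects, the sketch below lays out two byte-sized stack objects and checks whether the second one's offset could be used directly by an SP-relative access. All helper names here (`alignUp`, `ensureWordAlignment`, `isEncodableSPOffset`) are hypothetical and are not LLVM APIs; the 8-bit, scaled-by-4 immediate is the assumption being modelled.

```cpp
#include <cassert>

// Hypothetical helper: round `offset` up to the next multiple of `align`.
constexpr unsigned alignUp(unsigned offset, unsigned align) {
  return (offset + align - 1) / align * align;
}

// Hypothetical helper: bump an object's alignment to at least 4 bytes.
constexpr unsigned ensureWordAlignment(unsigned align) {
  return align < 4 ? 4 : align;
}

// Assumption: the SP-relative immediate is 8 bits wide and scaled by 4,
// so only multiples of 4 in the range [0, 1020] can be encoded.
constexpr bool isEncodableSPOffset(long offset) {
  return offset >= 0 && offset % 4 == 0 && offset / 4 < 256;
}

// Two i8 objects: the first sits at offset 0 and occupies 1 byte.
// With byte alignment the second lands at offset 1, which cannot be encoded.
static_assert(!isEncodableSPOffset(alignUp(1, 1)), "offset 1 is not a multiple of 4");
// With the alignment raised to 4 it lands at offset 4, which can.
static_assert(isEncodableSPOffset(alignUp(1, ensureWordAlignment(1))), "offset 4 is encodable");
```

The cost of this choice is a few bytes of padding per small object, traded against not needing a separate register to hold the address.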

Diff Detail

Repository: rL LLVM

Event Timeline

john.brawn updated this revision to Diff 19837. Feb 12 2015, 8:53 AM

john.brawn retitled this revision to "[ARM] Improve handling of stack accesses in Thumb-1."

john.brawn updated this object.

john.brawn edited the test plan for this revision.

john.brawn set the repository for this revision to rL LLVM.

john.brawn added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Feb 12 2015, 8:53 AM

jmolloy added a subscriber: jmolloy. Feb 12 2015, 10:21 AM

Hi John,

This looks reasonable to me. I'd wait for someone else like Tim to give it the go-ahead though before committing.

Before you commit, please remove braces around one-liner if-statements.

Cheers,

James

This revision is now accepted and ready to land. Feb 12 2015, 10:24 AM

While working on improving the spill estimation I realised that it wasn't that which was causing frame offsets to be considered invalid, but instead that ARMII::AddrModeT1_s was treated as having 5 bits of offset but it actually has 8 bits of offset (5 bits is true for FP-based loads, but FP-based loads aren't used in Thumb-1). I've attached a new patch that fixes that also.
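
To put numbers on the 5-bit versus 8-bit difference, here is a minimal sketch of the reachable offset range for an unsigned immediate scaled by 4. The function names are hypothetical; only the bit widths and the scale of 4 come from the discussion and the diff.

```cpp
#include <cassert>
#include <cstdint>

// Largest offset reachable with an unsigned `numBits`-wide immediate
// that is multiplied by `scale` (hypothetical helper, not an LLVM API).
constexpr int64_t maxScaledOffset(unsigned numBits, unsigned scale) {
  return ((int64_t{1} << numBits) - 1) * scale;
}

// An offset is legal if it is non-negative, a multiple of the scale,
// and no larger than the maximum above.
constexpr bool isLegalScaledOffset(int64_t offset, unsigned numBits, unsigned scale) {
  return offset >= 0 && offset % scale == 0 &&
         offset <= maxScaledOffset(numBits, scale);
}

static_assert(maxScaledOffset(5, 4) == 124, "5 bits reach at most 124 bytes");
static_assert(maxScaledOffset(8, 4) == 1020, "8 bits reach at most 1020 bytes");
// An offset of 128 bytes, for instance, is rejected with 5 bits but accepted with 8.
static_assert(!isLegalScaledOffset(128, 5, 4), "128 does not fit in 5 bits");
static_assert(isLegalScaledOffset(128, 8, 4), "128 fits in 8 bits");
```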

Hi John,

LGTM.

cheers,
--renato

Thanks. Could you commit this for me please?

John

Hi John,

I'm seeing many failures on Thumb tests, are you seeing that, too?

Failing Tests (7):

LLVM :: CodeGen/ARM/2015-01-21-thumbv4t-ldstr-opt.ll
LLVM :: CodeGen/ARM/debug-frame-vararg.ll
LLVM :: CodeGen/ARM/frame-register.ll
LLVM :: CodeGen/ARM/thumb1-varalloc.ll
LLVM :: CodeGen/ARM/thumb1_return_sequence.ll
LLVM :: CodeGen/Thumb/stm-merge.ll
LLVM :: CodeGen/Thumb/vargs.ll

e.g.

llvm/test/CodeGen/Thumb/stm-merge.ll:13:10: error: expected string not found in input
; CHECK: stm r[[BASE:[0-9]]]!, {{.*}}
         ^
<stdin>:23:2: note: scanning from here
 .fnstart
 ^
<stdin>:31:2: note: possible intended match here
 str r0, [sp]
 ^

llvm/test/CodeGen/Thumb/vargs.ll:39:10: error: expected string not found in input
; CHECK: pop
         ^
<stdin>:42:6: note: scanning from here
 pop {r3}
     ^
<stdin>:63:50: note: possible intended match here
 .section __DATA,__nl_symbol_ptr,non_lazy_symbol_pointers
                                             ^

Yes I see them. Also, I've realised that the patch breaks things on big-endian.

John

New patch with test failures fixed. Some test failures were due to the load/store optimizer not understanding the AddrModeT1_s loads/stores, so it now does understand them, but an annoyingly large number were due to tests needing adjustment due to different codegen. The widening of loads also now only happens on little-endian targets, and not in Thumb-2, where we have other instructions with more flexible addressing.
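
The little-endian restriction can be seen with a small model of a stack slot. The sketch below assembles the value a 32-bit load would produce under each byte order and compares its low byte with the byte stored at the slot's address. The type and function names are made up for illustration, and the slot contents are arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four bytes of a stack slot, lowest address first.
using Word = std::array<uint8_t, 4>;

// Register value produced by a 32-bit load on a little-endian target.
constexpr uint32_t loadWordLE(const Word &m) {
  return static_cast<uint32_t>(m[0]) | static_cast<uint32_t>(m[1]) << 8 |
         static_cast<uint32_t>(m[2]) << 16 | static_cast<uint32_t>(m[3]) << 24;
}

// Register value produced by a 32-bit load on a big-endian target.
constexpr uint32_t loadWordBE(const Word &m) {
  return static_cast<uint32_t>(m[0]) << 24 | static_cast<uint32_t>(m[1]) << 16 |
         static_cast<uint32_t>(m[2]) << 8 | static_cast<uint32_t>(m[3]);
}

// The byte of interest lives at the slot's own address, i.e. m[0].
constexpr Word kSlot = {0xAB, 0x11, 0x22, 0x33};

// Little-endian: the low 8 bits of the word are the byte a byte load would give.
static_assert((loadWordLE(kSlot) & 0xFF) == 0xAB, "word load matches byte load");
// Big-endian: the low 8 bits come from address + 3 instead, so widening is wrong.
static_assert((loadWordBE(kSlot) & 0xFF) == 0x33, "word load picks up the wrong byte");
```

Because an anyext load leaves the upper bits unspecified, only the low byte (or halfword) has to match, which is what makes the widening safe in the little-endian case.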

In D7594#125493, @john.brawn wrote:

New patch with test failures fixed. Some test failures were due to the load/store optimizer not understanding the AddrModeT1_s loads/stores, so it now does understand them, but an annoyingly large number were due to tests needing adjustment due to different codegen. The widening of loads also now only happens on little-endian targets, and not in Thumb-2, where we have other instructions with more flexible addressing.

Most of the tests were not generic enough to begin with. But due to their specific nature, I'm not sure we can improve them more than what you did.

James, are you happy with the LoadStoreOptimizer changes?

Otherwise, LGTM. Thanks!

Yeah this looks fine.

Thanks. Could you or Renato commit this for me please?

John

I'm running the tests now, will commit as soon as it finishes.

cheers,
--renato

r230496

Revision Contents

Path                                                Size
lib/Target/ARM/ARMBaseRegisterInfo.cpp              7 lines
lib/Target/ARM/ARMISelDAGToDAG.cpp                  15 lines
lib/Target/ARM/ARMInstrThumb.td                     11 lines
lib/Target/ARM/ARMLoadStoreOptimizer.cpp            36 lines
test/CodeGen/ARM/2015-01-21-thumbv4t-ldstr-opt.ll   91 lines
test/CodeGen/ARM/atomic-ops-v8.ll                   2 lines
test/CodeGen/ARM/debug-frame-vararg.ll              14 lines
test/CodeGen/ARM/frame-register.ll                  6 lines
test/CodeGen/ARM/thumb1-varalloc.ll                 32 lines
test/CodeGen/ARM/thumb1_return_sequence.ll          48 lines
test/CodeGen/Thumb/stack-access.ll                  74 lines
test/CodeGen/Thumb/stm-merge.ll                     9 lines
test/CodeGen/Thumb/vargs.ll                         16 lines

Diff 20177

lib/Target/ARM/ARMBaseRegisterInfo.cpp

	[...]
	// allocation, so adjust our SP-relative offset by that allocation size.			// allocation, so adjust our SP-relative offset by that allocation size.
	Offset = -Offset;			Offset = -Offset;
	Offset += MFI->getLocalFrameSize();			Offset += MFI->getLocalFrameSize();
	// Assume that we'll have at least some spill slots allocated.			// Assume that we'll have at least some spill slots allocated.
	// FIXME: This is a total SWAG number. We should run some statistics			// FIXME: This is a total SWAG number. We should run some statistics
	// and pick a real one.			// and pick a real one.
	Offset += 128; // 128 bytes of spill slots			Offset += 128; // 128 bytes of spill slots

	// If there is a frame pointer, try using it.			// If there's a frame pointer and the addressing mode allows it, try using it.
	// The FP is only available if there is no dynamic realignment. We			// The FP is only available if there is no dynamic realignment. We
	// don't know for sure yet whether we'll need that, so we guess based			// don't know for sure yet whether we'll need that, so we guess based
	// on whether there are any local variables that would trigger it.			// on whether there are any local variables that would trigger it.
	unsigned StackAlign = TFI->getStackAlignment();			unsigned StackAlign = TFI->getStackAlignment();
	if (TFI->hasFP(MF) &&			if (TFI->hasFP(MF) &&
				(MI->getDesc().TSFlags & ARMII::AddrModeMask) != ARMII::AddrModeT1_s &&
	!((MFI->getLocalFrameMaxAlign() > StackAlign) && canRealignStack(MF))) {			!((MFI->getLocalFrameMaxAlign() > StackAlign) && canRealignStack(MF))) {
	if (isFrameOffsetLegal(MI, FPOffset))			if (isFrameOffsetLegal(MI, FPOffset))
	return false;			return false;
	}			}
	// If we can reference via the stack pointer, try that.			// If we can reference via the stack pointer, try that.
	// FIXME: This (and the code that resolves the references) can be improved			// FIXME: This (and the code that resolves the references) can be improved
	// to only disallow SP relative references in the live range of			// to only disallow SP relative references in the live range of
	// the VLA(s). In practice, it's unclear how much difference that			// the VLA(s). In practice, it's unclear how much difference that
	[...]
	case ARMII::AddrMode_i12:			case ARMII::AddrMode_i12:
	case ARMII::AddrMode2:			case ARMII::AddrMode2:
	NumBits = 12;			NumBits = 12;
	break;			break;
	case ARMII::AddrMode3:			case ARMII::AddrMode3:
	NumBits = 8;			NumBits = 8;
	break;			break;
	case ARMII::AddrModeT1_s:			case ARMII::AddrModeT1_s:
	NumBits = 5;			NumBits = 8;
	Scale = 4;			Scale = 4;
	isSigned = false;			isSigned = false;
	break;			break;
	default:			default:
	llvm_unreachable("Unsupported addressing mode!");			llvm_unreachable("Unsupported addressing mode!");
	}			}

	Offset += getFrameIndexInstrOffset(MI, i);			Offset += getFrameIndexInstrOffset(MI, i);
	[...]

lib/Target/ARM/ARMISelDAGToDAG.cpp

	[...]
	SDValue &OffImm) {			SDValue &OffImm) {
	return SelectThumbAddrModeImm5S(N, 1, Base, OffImm);			return SelectThumbAddrModeImm5S(N, 1, Base, OffImm);
	}			}

	bool ARMDAGToDAGISel::SelectThumbAddrModeSP(SDValue N,			bool ARMDAGToDAGISel::SelectThumbAddrModeSP(SDValue N,
	SDValue &Base, SDValue &OffImm) {			SDValue &Base, SDValue &OffImm) {
	if (N.getOpcode() == ISD::FrameIndex) {			if (N.getOpcode() == ISD::FrameIndex) {
	int FI = cast<FrameIndexSDNode>(N)->getIndex();			int FI = cast<FrameIndexSDNode>(N)->getIndex();
				// Only multiples of 4 are allowed for the offset, so the frame object
				// alignment must be at least 4.
				MachineFrameInfo *MFI = MF->getFrameInfo();
				if (MFI->getObjectAlignment(FI) < 4)
				MFI->setObjectAlignment(FI, 4);
	Base = CurDAG->getTargetFrameIndex(FI, TLI->getPointerTy());			Base = CurDAG->getTargetFrameIndex(FI, TLI->getPointerTy());
	OffImm = CurDAG->getTargetConstant(0, MVT::i32);			OffImm = CurDAG->getTargetConstant(0, MVT::i32);
	return true;			return true;
	}			}

	if (!CurDAG->isBaseWithConstantOffset(N))			if (!CurDAG->isBaseWithConstantOffset(N))
	return false;			return false;

	RegisterSDNode *LHSR = dyn_cast<RegisterSDNode>(N.getOperand(0));			RegisterSDNode *LHSR = dyn_cast<RegisterSDNode>(N.getOperand(0));
	if (N.getOperand(0).getOpcode() == ISD::FrameIndex ||			if (N.getOperand(0).getOpcode() == ISD::FrameIndex ||
	(LHSR && LHSR->getReg() == ARM::SP)) {			(LHSR && LHSR->getReg() == ARM::SP)) {
	// If the RHS is + imm8 * scale, fold into addr mode.			// If the RHS is + imm8 * scale, fold into addr mode.
	int RHSC;			int RHSC;
	if (isScaledConstantInRange(N.getOperand(1), /*Scale=*/4, 0, 256, RHSC)) {			if (isScaledConstantInRange(N.getOperand(1), /*Scale=*/4, 0, 256, RHSC)) {
	Base = N.getOperand(0);			Base = N.getOperand(0);
	if (Base.getOpcode() == ISD::FrameIndex) {			if (Base.getOpcode() == ISD::FrameIndex) {
	int FI = cast<FrameIndexSDNode>(Base)->getIndex();			int FI = cast<FrameIndexSDNode>(Base)->getIndex();
				// For LHS+RHS to result in an offset that's a multiple of 4 the object
				// indexed by the LHS must be 4-byte aligned.
				MachineFrameInfo *MFI = MF->getFrameInfo();
				if (MFI->getObjectAlignment(FI) < 4)
				MFI->setObjectAlignment(FI, 4);
	Base = CurDAG->getTargetFrameIndex(FI, TLI->getPointerTy());			Base = CurDAG->getTargetFrameIndex(FI, TLI->getPointerTy());
	}			}
	OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);			OffImm = CurDAG->getTargetConstant(RHSC, MVT::i32);
	return true;			return true;
	}			}
	}			}

	return false;			return false;
	[...]
	// Other cases are autogenerated.			// Other cases are autogenerated.
	break;			break;
	}			}
	case ISD::FrameIndex: {			case ISD::FrameIndex: {
	// Selects to ADDri FI, 0 which in turn will become ADDri SP, imm.			// Selects to ADDri FI, 0 which in turn will become ADDri SP, imm.
	int FI = cast<FrameIndexSDNode>(N)->getIndex();			int FI = cast<FrameIndexSDNode>(N)->getIndex();
	SDValue TFI = CurDAG->getTargetFrameIndex(FI, TLI->getPointerTy());			SDValue TFI = CurDAG->getTargetFrameIndex(FI, TLI->getPointerTy());
	if (Subtarget->isThumb1Only()) {			if (Subtarget->isThumb1Only()) {
				// Set the alignment of the frame object to 4, to avoid having to generate
				// more than one ADD
				MachineFrameInfo *MFI = MF->getFrameInfo();
				if (MFI->getObjectAlignment(FI) < 4)
				MFI->setObjectAlignment(FI, 4);
	return CurDAG->SelectNodeTo(N, ARM::tADDframe, MVT::i32, TFI,			return CurDAG->SelectNodeTo(N, ARM::tADDframe, MVT::i32, TFI,
	CurDAG->getTargetConstant(0, MVT::i32));			CurDAG->getTargetConstant(0, MVT::i32));
	} else {			} else {
	unsigned Opc = ((Subtarget->isThumb() && Subtarget->hasThumb2()) ?			unsigned Opc = ((Subtarget->isThumb() && Subtarget->hasThumb2()) ?
	ARM::t2ADDri : ARM::ADDri);			ARM::t2ADDri : ARM::ADDri);
	SDValue Ops[] = { TFI, CurDAG->getTargetConstant(0, MVT::i32),			SDValue Ops[] = { TFI, CurDAG->getTargetConstant(0, MVT::i32),
	getAL(CurDAG), CurDAG->getRegister(0, MVT::i32),			getAL(CurDAG), CurDAG->getRegister(0, MVT::i32),
	CurDAG->getRegister(0, MVT::i32) };			CurDAG->getRegister(0, MVT::i32) };
	[...]

lib/Target/ARM/ARMInstrThumb.td

	[...]
	Requires<[IsThumb, HasV5T]>;			Requires<[IsThumb, HasV5T]>;

	// zextload i1 -> zextload i8			// zextload i1 -> zextload i8
	def : T1Pat<(zextloadi1 t_addrmode_rrs1:$addr),			def : T1Pat<(zextloadi1 t_addrmode_rrs1:$addr),
	(tLDRBr t_addrmode_rrs1:$addr)>;			(tLDRBr t_addrmode_rrs1:$addr)>;
	def : T1Pat<(zextloadi1 t_addrmode_is1:$addr),			def : T1Pat<(zextloadi1 t_addrmode_is1:$addr),
	(tLDRBi t_addrmode_is1:$addr)>;			(tLDRBi t_addrmode_is1:$addr)>;

				// extload from the stack -> word load from the stack, as it avoids having to
				// materialize the base in a separate register. This only works when a word
				// load puts the byte/halfword value in the same place in the register that the
				// byte/halfword load would, i.e. when little-endian.
				def : T1Pat<(extloadi1 t_addrmode_sp:$addr), (tLDRspi t_addrmode_sp:$addr)>,
				Requires<[IsThumb, IsThumb1Only, IsLE]>;
				def : T1Pat<(extloadi8 t_addrmode_sp:$addr), (tLDRspi t_addrmode_sp:$addr)>,
				Requires<[IsThumb, IsThumb1Only, IsLE]>;
				def : T1Pat<(extloadi16 t_addrmode_sp:$addr), (tLDRspi t_addrmode_sp:$addr)>,
				Requires<[IsThumb, IsThumb1Only, IsLE]>;

	// extload -> zextload			// extload -> zextload
	def : T1Pat<(extloadi1 t_addrmode_rrs1:$addr), (tLDRBr t_addrmode_rrs1:$addr)>;			def : T1Pat<(extloadi1 t_addrmode_rrs1:$addr), (tLDRBr t_addrmode_rrs1:$addr)>;
	def : T1Pat<(extloadi1 t_addrmode_is1:$addr), (tLDRBi t_addrmode_is1:$addr)>;			def : T1Pat<(extloadi1 t_addrmode_is1:$addr), (tLDRBi t_addrmode_is1:$addr)>;
	def : T1Pat<(extloadi8 t_addrmode_rrs1:$addr), (tLDRBr t_addrmode_rrs1:$addr)>;			def : T1Pat<(extloadi8 t_addrmode_rrs1:$addr), (tLDRBr t_addrmode_rrs1:$addr)>;
	def : T1Pat<(extloadi8 t_addrmode_is1:$addr), (tLDRBi t_addrmode_is1:$addr)>;			def : T1Pat<(extloadi8 t_addrmode_is1:$addr), (tLDRBi t_addrmode_is1:$addr)>;
	def : T1Pat<(extloadi16 t_addrmode_rrs2:$addr), (tLDRHr t_addrmode_rrs2:$addr)>;			def : T1Pat<(extloadi16 t_addrmode_rrs2:$addr), (tLDRHr t_addrmode_rrs2:$addr)>;
	def : T1Pat<(extloadi16 t_addrmode_is2:$addr), (tLDRHi t_addrmode_is2:$addr)>;			def : T1Pat<(extloadi16 t_addrmode_is2:$addr), (tLDRHi t_addrmode_is2:$addr)>;

	[...]

lib/Target/ARM/ARMLoadStoreOptimizer.cpp

	[...]

	if (Opcode == ARM::t2LDRi12 || Opcode == ARM::t2LDRi8 ||			if (Opcode == ARM::t2LDRi12 || Opcode == ARM::t2LDRi8 ||
	Opcode == ARM::t2STRi12 || Opcode == ARM::t2STRi8 ||			Opcode == ARM::t2STRi12 || Opcode == ARM::t2STRi8 ||
	Opcode == ARM::t2LDRDi8 || Opcode == ARM::t2STRDi8 ||			Opcode == ARM::t2LDRDi8 || Opcode == ARM::t2STRDi8 ||
	Opcode == ARM::LDRi12 || Opcode == ARM::STRi12)			Opcode == ARM::LDRi12 || Opcode == ARM::STRi12)
	return OffField;			return OffField;

	// Thumb1 immediate offsets are scaled by 4			// Thumb1 immediate offsets are scaled by 4
	if (Opcode == ARM::tLDRi || Opcode == ARM::tSTRi)			if (Opcode == ARM::tLDRi || Opcode == ARM::tSTRi ||
				Opcode == ARM::tLDRspi || Opcode == ARM::tSTRspi)
	return OffField * 4;			return OffField * 4;

	int Offset = isAM3 ? ARM_AM::getAM3Offset(OffField)			int Offset = isAM3 ? ARM_AM::getAM3Offset(OffField)
	: ARM_AM::getAM5Offset(OffField) * 4;			: ARM_AM::getAM5Offset(OffField) * 4;
	ARM_AM::AddrOpc Op = isAM3 ? ARM_AM::getAM3Op(OffField)			ARM_AM::AddrOpc Op = isAM3 ? ARM_AM::getAM3Op(OffField)
	: ARM_AM::getAM5Op(OffField);			: ARM_AM::getAM5Op(OffField);

	if (Op == ARM_AM::sub)			if (Op == ARM_AM::sub)
	[...]
	switch (Mode) {			switch (Mode) {
	default: llvm_unreachable("Unhandled submode!");			default: llvm_unreachable("Unhandled submode!");
	case ARM_AM::ia: return ARM::STMIA;			case ARM_AM::ia: return ARM::STMIA;
	case ARM_AM::da: return ARM::STMDA;			case ARM_AM::da: return ARM::STMDA;
	case ARM_AM::db: return ARM::STMDB;			case ARM_AM::db: return ARM::STMDB;
	case ARM_AM::ib: return ARM::STMIB;			case ARM_AM::ib: return ARM::STMIB;
	}			}
	case ARM::tLDRi:			case ARM::tLDRi:
				case ARM::tLDRspi:
	// tLDMIA is writeback-only - unless the base register is in the input			// tLDMIA is writeback-only - unless the base register is in the input
	// reglist.			// reglist.
	++NumLDMGened;			++NumLDMGened;
	switch (Mode) {			switch (Mode) {
	default: llvm_unreachable("Unhandled submode!");			default: llvm_unreachable("Unhandled submode!");
	case ARM_AM::ia: return ARM::tLDMIA;			case ARM_AM::ia: return ARM::tLDMIA;
	}			}
	case ARM::tSTRi:			case ARM::tSTRi:
				case ARM::tSTRspi:
	// There is no non-writeback tSTMIA either.			// There is no non-writeback tSTMIA either.
	++NumSTMGened;			++NumSTMGened;
	switch (Mode) {			switch (Mode) {
	default: llvm_unreachable("Unhandled submode!");			default: llvm_unreachable("Unhandled submode!");
	case ARM_AM::ia: return ARM::tSTMIA_UPD;			case ARM_AM::ia: return ARM::tSTMIA_UPD;
	}			}
	case ARM::t2LDRi8:			case ARM::t2LDRi8:
	case ARM::t2LDRi12:			case ARM::t2LDRi12:
	[...]
	return ARM_AM::ib;			return ARM_AM::ib;
	}			}
	}			}

	} // end namespace ARM_AM			} // end namespace ARM_AM
	} // end namespace llvm			} // end namespace llvm

	static bool isT1i32Load(unsigned Opc) {			static bool isT1i32Load(unsigned Opc) {
	return Opc == ARM::tLDRi;			return Opc == ARM::tLDRi || Opc == ARM::tLDRspi;
	}			}

	static bool isT2i32Load(unsigned Opc) {			static bool isT2i32Load(unsigned Opc) {
	return Opc == ARM::t2LDRi12 || Opc == ARM::t2LDRi8;			return Opc == ARM::t2LDRi12 || Opc == ARM::t2LDRi8;
	}			}

	static bool isi32Load(unsigned Opc) {			static bool isi32Load(unsigned Opc) {
	return Opc == ARM::LDRi12 || isT1i32Load(Opc) || isT2i32Load(Opc) ;			return Opc == ARM::LDRi12 || isT1i32Load(Opc) || isT2i32Load(Opc) ;
	}			}

	static bool isT1i32Store(unsigned Opc) {			static bool isT1i32Store(unsigned Opc) {
	return Opc == ARM::tSTRi;			return Opc == ARM::tSTRi || Opc == ARM::tSTRspi;
	}			}

	static bool isT2i32Store(unsigned Opc) {			static bool isT2i32Store(unsigned Opc) {
	return Opc == ARM::t2STRi12 || Opc == ARM::t2STRi8;			return Opc == ARM::t2STRi12 || Opc == ARM::t2STRi8;
	}			}

	static bool isi32Store(unsigned Opc) {			static bool isi32Store(unsigned Opc) {
	return Opc == ARM::STRi12 || isT1i32Store(Opc) || isT2i32Store(Opc);			return Opc == ARM::STRi12 || isT1i32Store(Opc) || isT2i32Store(Opc);
	}			}

	static unsigned getImmScale(unsigned Opc) {			static unsigned getImmScale(unsigned Opc) {
	switch (Opc) {			switch (Opc) {
	default: llvm_unreachable("Unhandled opcode!");			default: llvm_unreachable("Unhandled opcode!");
	case ARM::tLDRi:			case ARM::tLDRi:
	case ARM::tSTRi:			case ARM::tSTRi:
				case ARM::tLDRspi:
				case ARM::tSTRspi:
	return 1;			return 1;
	case ARM::tLDRHi:			case ARM::tLDRHi:
	case ARM::tSTRHi:			case ARM::tSTRHi:
	return 2;			return 2;
	case ARM::tLDRBi:			case ARM::tLDRBi:
	case ARM::tSTRBi:			case ARM::tSTRBi:
	return 4;			return 4;
	}			}
	[...]
	bool Writeback = isThumb1; // Thumb1 LDM/STM have base reg writeback.			bool Writeback = isThumb1; // Thumb1 LDM/STM have base reg writeback.

	// Exception: If the base register is in the input reglist, Thumb1 LDM is			// Exception: If the base register is in the input reglist, Thumb1 LDM is
	// non-writeback.			// non-writeback.
	// It's also not possible to merge an STR of the base register in Thumb1.			// It's also not possible to merge an STR of the base register in Thumb1.
	if (isThumb1)			if (isThumb1)
	for (unsigned I = 0; I < NumRegs; ++I)			for (unsigned I = 0; I < NumRegs; ++I)
	if (Base == Regs[I].first) {			if (Base == Regs[I].first) {
				assert(Base != ARM::SP && "Thumb1 does not allow SP in register list");
	if (Opcode == ARM::tLDRi) {			if (Opcode == ARM::tLDRi) {
	Writeback = false;			Writeback = false;
	break;			break;
	} else if (Opcode == ARM::tSTRi) {			} else if (Opcode == ARM::tSTRi) {
	return false;			return false;
	}			}
	}			}

	ARM_AM::AMSubMode Mode = ARM_AM::ia;			ARM_AM::AMSubMode Mode = ARM_AM::ia;
	// VFP and Thumb2 do not support IB or DA modes. Thumb1 only supports IA.			// VFP and Thumb2 do not support IB or DA modes. Thumb1 only supports IA.
	bool isNotVFP = isi32Load(Opcode) || isi32Store(Opcode);			bool isNotVFP = isi32Load(Opcode) || isi32Store(Opcode);
	bool haveIBAndDA = isNotVFP && !isThumb2 && !isThumb1;			bool haveIBAndDA = isNotVFP && !isThumb2 && !isThumb1;

	if (Offset == 4 && haveIBAndDA) {			if (Offset == 4 && haveIBAndDA) {
	Mode = ARM_AM::ib;			Mode = ARM_AM::ib;
	} else if (Offset == -4 * (int)NumRegs + 4 && haveIBAndDA) {			} else if (Offset == -4 * (int)NumRegs + 4 && haveIBAndDA) {
	Mode = ARM_AM::da;			Mode = ARM_AM::da;
	} else if (Offset == -4 * (int)NumRegs && isNotVFP && !isThumb1) {			} else if (Offset == -4 * (int)NumRegs && isNotVFP && !isThumb1) {
	// VLDM/VSTM do not support DB mode without also updating the base reg.			// VLDM/VSTM do not support DB mode without also updating the base reg.
	Mode = ARM_AM::db;			Mode = ARM_AM::db;
	} else if (Offset != 0) {			} else if (Offset != 0 || Opcode == ARM::tLDRspi || Opcode == ARM::tSTRspi) {
	// Check if this is a supported opcode before inserting instructions to			// Check if this is a supported opcode before inserting instructions to
	// calculate a new base register.			// calculate a new base register.
	if (!getLoadStoreMultipleOpcode(Opcode, Mode)) return false;			if (!getLoadStoreMultipleOpcode(Opcode, Mode)) return false;

	// If starting offset isn't zero, insert a MI to materialize a new base.			// If starting offset isn't zero, insert a MI to materialize a new base.
	// But only do so if it is cost effective, i.e. merging more than two			// But only do so if it is cost effective, i.e. merging more than two
	// loads / stores.			// loads / stores.
	if (NumRegs <= 2)			if (NumRegs <= 2)
	[...]
	// Use the scratch register to use as a new base.			// Use the scratch register to use as a new base.
	NewBase = Scratch;			NewBase = Scratch;
	if (NewBase == 0)			if (NewBase == 0)
	return false;			return false;
	}			}

	int BaseOpc =			int BaseOpc =
	isThumb2 ? ARM::t2ADDri :			isThumb2 ? ARM::t2ADDri :
				(isThumb1 && Base == ARM::SP) ? ARM::tADDrSPi :
	(isThumb1 && Offset < 8) ? ARM::tADDi3 :			(isThumb1 && Offset < 8) ? ARM::tADDi3 :
	isThumb1 ? ARM::tADDi8 : ARM::ADDri;			isThumb1 ? ARM::tADDi8 : ARM::ADDri;

	if (Offset < 0) {			if (Offset < 0) {
	Offset = - Offset;			Offset = - Offset;
	BaseOpc =			BaseOpc =
	isThumb2 ? ARM::t2SUBri :			isThumb2 ? ARM::t2SUBri :
	(isThumb1 && Offset < 8) ? ARM::tSUBi3 :			(isThumb1 && Offset < 8 && Base != ARM::SP) ? ARM::tSUBi3 :
	isThumb1 ? ARM::tSUBi8 : ARM::SUBri;			isThumb1 ? ARM::tSUBi8 : ARM::SUBri;
	}			}

	if (!TL->isLegalAddImmediate(Offset))			if (!TL->isLegalAddImmediate(Offset))
	// FIXME: Try add with register operand?			// FIXME: Try add with register operand?
	return false; // Probably not worth it then.			return false; // Probably not worth it then.

	if (isThumb1) {			if (isThumb1) {
	// Thumb1: depending on immediate size, use either			// Thumb1: depending on immediate size, use either
	// ADDS NewBase, Base, #imm3			// ADDS NewBase, Base, #imm3
	// or			// or
	// MOV NewBase, Base			// MOV NewBase, Base
	// ADDS NewBase, #imm8.			// ADDS NewBase, #imm8.
	if (Base != NewBase && Offset >= 8) {			if (Base != NewBase &&
				(BaseOpc == ARM::tADDi8 || BaseOpc == ARM::tSUBi8)) {
	// Need to insert a MOV to the new base first.			// Need to insert a MOV to the new base first.
	if (isARMLowRegister(NewBase) && isARMLowRegister(Base) &&			if (isARMLowRegister(NewBase) && isARMLowRegister(Base) &&
	!STI->hasV6Ops()) {			!STI->hasV6Ops()) {
	// thumbv4t doesn't have lo->lo copies, and we can't predicate tMOVSr			// thumbv4t doesn't have lo->lo copies, and we can't predicate tMOVSr
	if (Pred != ARMCC::AL)			if (Pred != ARMCC::AL)
	return false;			return false;
	BuildMI(MBB, MBBI, dl, TII->get(ARM::tMOVSr), NewBase)			BuildMI(MBB, MBBI, dl, TII->get(ARM::tMOVSr), NewBase)
	.addReg(Base, getKillRegState(BaseKill));			.addReg(Base, getKillRegState(BaseKill));
	} else			} else
	BuildMI(MBB, MBBI, dl, TII->get(ARM::tMOVr), NewBase)			BuildMI(MBB, MBBI, dl, TII->get(ARM::tMOVr), NewBase)
	.addReg(Base, getKillRegState(BaseKill))			.addReg(Base, getKillRegState(BaseKill))
	.addImm(Pred).addReg(PredReg);			.addImm(Pred).addReg(PredReg);

	// Set up BaseKill and Base correctly to insert the ADDS/SUBS below.			// Set up BaseKill and Base correctly to insert the ADDS/SUBS below.
	Base = NewBase;			Base = NewBase;
	BaseKill = false;			BaseKill = false;
	}			}
	AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII->get(BaseOpc), NewBase), true)			if (BaseOpc == ARM::tADDrSPi) {
	.addReg(Base, getKillRegState(BaseKill)).addImm(Offset)			assert(Offset % 4 == 0 && "tADDrSPi offset is scaled by 4");
	.addImm(Pred).addReg(PredReg);			BuildMI(MBB, MBBI, dl, TII->get(BaseOpc), NewBase)
				.addReg(Base, getKillRegState(BaseKill)).addImm(Offset/4)
				.addImm(Pred).addReg(PredReg);
				} else
				AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII->get(BaseOpc), NewBase), true)
				.addReg(Base, getKillRegState(BaseKill)).addImm(Offset)
				.addImm(Pred).addReg(PredReg);
	} else {			} else {
	BuildMI(MBB, MBBI, dl, TII->get(BaseOpc), NewBase)			BuildMI(MBB, MBBI, dl, TII->get(BaseOpc), NewBase)
	.addReg(Base, getKillRegState(BaseKill)).addImm(Offset)			.addReg(Base, getKillRegState(BaseKill)).addImm(Offset)
	.addImm(Pred).addReg(PredReg).addReg(0);			.addImm(Pred).addReg(PredReg).addReg(0);
	}			}
	Base = NewBase;			Base = NewBase;
	BaseKill = true; // New base is always killed straight away.			BaseKill = true; // New base is always killed straight away.
	}			}
	[...]

	static inline unsigned getLSMultipleTransferSize(MachineInstr *MI) {			static inline unsigned getLSMultipleTransferSize(MachineInstr *MI) {
	switch (MI->getOpcode()) {			switch (MI->getOpcode()) {
	default: return 0;			default: return 0;
	case ARM::LDRi12:			case ARM::LDRi12:
	case ARM::STRi12:			case ARM::STRi12:
	case ARM::tLDRi:			case ARM::tLDRi:
	case ARM::tSTRi:			case ARM::tSTRi:
				case ARM::tLDRspi:
				case ARM::tSTRspi:
	case ARM::t2LDRi8:			case ARM::t2LDRi8:
	case ARM::t2LDRi12:			case ARM::t2LDRi12:
	case ARM::t2STRi8:			case ARM::t2STRi8:
	case ARM::t2STRi12:			case ARM::t2STRi12:
	case ARM::VLDRS:			case ARM::VLDRS:
	case ARM::VSTRS:			case ARM::VSTRS:
	return 4;			return 4;
	case ARM::VLDRD:			case ARM::VLDRD:
	[...]
	return MI->getOperand(1).isReg();			return MI->getOperand(1).isReg();
	case ARM::VLDRD:			case ARM::VLDRD:
	case ARM::VSTRD:			case ARM::VSTRD:
	return MI->getOperand(1).isReg();			return MI->getOperand(1).isReg();
	case ARM::LDRi12:			case ARM::LDRi12:
	case ARM::STRi12:			case ARM::STRi12:
	case ARM::tLDRi:			case ARM::tLDRi:
	case ARM::tSTRi:			case ARM::tSTRi:
				case ARM::tLDRspi:
				case ARM::tSTRspi:
	case ARM::t2LDRi8:			case ARM::t2LDRi8:
	case ARM::t2LDRi12:			case ARM::t2LDRi12:
	case ARM::t2STRi8:			case ARM::t2STRi8:
	case ARM::t2STRi12:			case ARM::t2STRi12:
	return MI->getOperand(1).isReg();			return MI->getOperand(1).isReg();
	}			}
	return false;			return false;
	}			}
	[...]

test/CodeGen/ARM/2015-01-21-thumbv4t-ldstr-opt.ll

	; RUN: llc -mtriple=thumbv4t-none--eabi < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V4T			; RUN: llc -mtriple=thumbv4t-none--eabi < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V4T
	; RUN: llc -mtriple=thumbv6m-none--eabi < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V6M			; RUN: llc -mtriple=thumbv6m-none--eabi < %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-V6M

	; CHECK-LABEL: foo			; CHECK-LABEL: test1
	define i32 @foo(i32 %z, ...) #0 {			define i32 @test1(i32* %p) {
	entry:
	%a = alloca i32, align 4
	%b = alloca i32, align 4
	%c = alloca i32, align 4
	%d = alloca i32, align 4
	%e = alloca i32, align 4
	%f = alloca i32, align 4
	%g = alloca i32, align 4
	%h = alloca i32, align 4

	store i32 1, i32* %a, align 4
	store i32 2, i32* %b, align 4
	store i32 3, i32* %c, align 4
	store i32 4, i32* %d, align 4
	store i32 5, i32* %e, align 4
	store i32 6, i32* %f, align 4
	store i32 7, i32* %g, align 4
	store i32 8, i32* %h, align 4

	%0 = load i32* %a, align 4
	%1 = load i32* %b, align 4
	%2 = load i32* %c, align 4
	%3 = load i32* %d, align 4
	%4 = load i32* %e, align 4
	%5 = load i32* %f, align 4
	%6 = load i32* %g, align 4
	%7 = load i32* %h, align 4

	%add = add nsw i32 %0, %1
	%add4 = add nsw i32 %add, %2
	%add5 = add nsw i32 %add4, %3
	%add6 = add nsw i32 %add5, %4
	%add7 = add nsw i32 %add6, %5
	%add8 = add nsw i32 %add7, %6
	%add9 = add nsw i32 %add8, %7

	%addz = add nsw i32 %add9, %z
	call void @llvm.va_start(i8* null)
	ret i32 %addz

	; CHECK: sub sp, #40			; Offsets less than 8 can be generated in a single add
	; CHECK-NEXT: add [[BASE:r[0-9]]], sp, #8			; CHECK: adds [[NEWBASE:r[0-9]]], r0, #4
				%1 = getelementptr inbounds i32* %p, i32 1
				%2 = getelementptr inbounds i32* %p, i32 2
				%3 = getelementptr inbounds i32* %p, i32 3
				%4 = getelementptr inbounds i32* %p, i32 4

	; CHECK-V4T: movs [[NEWBASE:r[0-9]]], [[BASE]]
	; CHECK-V6M: mov [[NEWBASE:r[0-9]]], [[BASE]]
	; CHECK-NEXT: adds [[NEWBASE]], #8
	; CHECK-NEXT: ldm [[NEWBASE]],			; CHECK-NEXT: ldm [[NEWBASE]],
				%5 = load i32* %1, align 4
				%6 = load i32* %2, align 4
				%7 = load i32* %3, align 4
				%8 = load i32* %4, align 4

				%9 = add nsw i32 %5, %6
				%10 = add nsw i32 %9, %7
				%11 = add nsw i32 %10, %8
				ret i32 %11
	}			}

	declare void @llvm.va_start(i8*) nounwind			; CHECK-LABEL: test2
				define i32 @test2(i32* %p) {

				; Offsets >=8 require a mov and an add
				; CHECK-V4T: movs [[NEWBASE:r[0-9]]], r0
				; CHECK-V6M: mov [[NEWBASE:r[0-9]]], r0
				; CHECK-NEXT: adds [[NEWBASE]], #8
				%1 = getelementptr inbounds i32* %p, i32 2
				%2 = getelementptr inbounds i32* %p, i32 3
				%3 = getelementptr inbounds i32* %p, i32 4
				%4 = getelementptr inbounds i32* %p, i32 5

				; CHECK-NEXT: ldm [[NEWBASE]],
				%5 = load i32* %1, align 4
				%6 = load i32* %2, align 4
				%7 = load i32* %3, align 4
				%8 = load i32* %4, align 4

				%9 = add nsw i32 %5, %6
				%10 = add nsw i32 %9, %7
				%11 = add nsw i32 %10, %8
				ret i32 %11
				}
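
The two tests above exercise how a new base register is materialized in Thumb-1 when the first merged access is at a non-zero offset. A simplified sketch of that decision for positive offsets is given below; the enum and function are hypothetical names, and the real code in ARMLoadStoreOptimizer.cpp also handles negative offsets, Thumb-2, ARM mode, and the case where no MOV is needed because the base and new base are the same register.

```cpp
#include <cassert>

// Hypothetical labels for the three Thumb-1 strategies visible in the diff.
enum class BaseAdd {
  AddImm3,        // adds NewBase, Base, #imm3   (offsets below 8)
  MovThenAddImm8, // mov NewBase, Base; adds NewBase, #imm8
  AddFromSP       // add NewBase, sp, #imm       (immediate scaled by 4)
};

// Simplified model of the choice for a positive offset in Thumb-1.
constexpr BaseAdd chooseBaseAdd(bool baseIsSP, int offset) {
  return baseIsSP ? BaseAdd::AddFromSP
                  : (offset < 8 ? BaseAdd::AddImm3 : BaseAdd::MovThenAddImm8);
}

// test1 starts at byte offset 4 from r0: a single adds is enough.
static_assert(chooseBaseAdd(false, 4) == BaseAdd::AddImm3, "offset 4 fits imm3");
// test2 starts at byte offset 8 from r0: a mov followed by adds is needed.
static_assert(chooseBaseAdd(false, 8) == BaseAdd::MovThenAddImm8, "offset 8 needs imm8");
// SP-based accesses use the SP-relative add regardless of the offset size.
static_assert(chooseBaseAdd(true, 8) == BaseAdd::AddFromSP, "SP base uses its own add");
```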

test/CodeGen/ARM/atomic-ops-v8.ll

	[...]

	define void @test_atomic_store_monotonic_regoff_i8(i64 %base, i64 %off, i8 %val) nounwind {			define void @test_atomic_store_monotonic_regoff_i8(i64 %base, i64 %off, i8 %val) nounwind {
	; CHECK-LABEL: test_atomic_store_monotonic_regoff_i8:			; CHECK-LABEL: test_atomic_store_monotonic_regoff_i8:

	%addr_int = add i64 %base, %off			%addr_int = add i64 %base, %off
	%addr = inttoptr i64 %addr_int to i8*			%addr = inttoptr i64 %addr_int to i8*

	store atomic i8 %val, i8* %addr monotonic, align 1			store atomic i8 %val, i8* %addr monotonic, align 1
	; CHECK-LE: ldrb{{(\.w)?}} [[VAL:r[0-9]+]], [sp]			; CHECK-LE: ldr{{b?(\.w)?}} [[VAL:r[0-9]+]], [sp]
	; CHECK-LE: strb [[VAL]], [r0, r2]			; CHECK-LE: strb [[VAL]], [r0, r2]
	; CHECK-BE: ldrb{{(\.w)?}} [[VAL:r[0-9]+]], [sp, #3]			; CHECK-BE: ldrb{{(\.w)?}} [[VAL:r[0-9]+]], [sp, #3]
	; CHECK-BE: strb [[VAL]], [r1, r3]			; CHECK-BE: strb [[VAL]], [r1, r3]

	ret void			ret void
	}			}

	define void @test_atomic_store_release_i8(i8 %val) nounwind {			define void @test_atomic_store_release_i8(i8 %val) nounwind {

test/CodeGen/ARM/debug-frame-vararg.ll

	; CHECK-FP-ELIM: .cfi_offset r4, -32			; CHECK-FP-ELIM: .cfi_offset r4, -32
	; CHECK-FP-ELIM: add r11, sp, #8			; CHECK-FP-ELIM: add r11, sp, #8
	; CHECK-FP-ELIM: .cfi_def_cfa r11, 24			; CHECK-FP-ELIM: .cfi_def_cfa r11, 24

	; CHECK-THUMB-FP-LABEL: sum			; CHECK-THUMB-FP-LABEL: sum
	; CHECK-THUMB-FP: .cfi_startproc			; CHECK-THUMB-FP: .cfi_startproc
	; CHECK-THUMB-FP: sub sp, #16			; CHECK-THUMB-FP: sub sp, #16
	; CHECK-THUMB-FP: .cfi_def_cfa_offset 16			; CHECK-THUMB-FP: .cfi_def_cfa_offset 16
	; CHECK-THUMB-FP: push {r4, r5, r7, lr}			; CHECK-THUMB-FP: push {r4, lr}
	; CHECK-THUMB-FP: .cfi_def_cfa_offset 32			; CHECK-THUMB-FP: .cfi_def_cfa_offset 24
	; CHECK-THUMB-FP: .cfi_offset lr, -20			; CHECK-THUMB-FP: .cfi_offset lr, -20
	; CHECK-THUMB-FP: .cfi_offset r7, -24			; CHECK-THUMB-FP: .cfi_offset r4, -24
	; CHECK-THUMB-FP: .cfi_offset r5, -28
	; CHECK-THUMB-FP: .cfi_offset r4, -32
	; CHECK-THUMB-FP: sub sp, #8			; CHECK-THUMB-FP: sub sp, #8
	; CHECK-THUMB-FP: .cfi_def_cfa_offset 40			; CHECK-THUMB-FP: .cfi_def_cfa_offset 32

	; CHECK-THUMB-FP-ELIM-LABEL: sum			; CHECK-THUMB-FP-ELIM-LABEL: sum
	; CHECK-THUMB-FP-ELIM: .cfi_startproc			; CHECK-THUMB-FP-ELIM: .cfi_startproc
	; CHECK-THUMB-FP-ELIM: sub sp, #16			; CHECK-THUMB-FP-ELIM: sub sp, #16
	; CHECK-THUMB-FP-ELIM: .cfi_def_cfa_offset 16			; CHECK-THUMB-FP-ELIM: .cfi_def_cfa_offset 16
	; CHECK-THUMB-FP-ELIM: push {r4, r5, r7, lr}			; CHECK-THUMB-FP-ELIM: push {r4, r6, r7, lr}
	; CHECK-THUMB-FP-ELIM: .cfi_def_cfa_offset 32			; CHECK-THUMB-FP-ELIM: .cfi_def_cfa_offset 32
	; CHECK-THUMB-FP-ELIM: .cfi_offset lr, -20			; CHECK-THUMB-FP-ELIM: .cfi_offset lr, -20
	; CHECK-THUMB-FP-ELIM: .cfi_offset r7, -24			; CHECK-THUMB-FP-ELIM: .cfi_offset r7, -24
	; CHECK-THUMB-FP-ELIM: .cfi_offset r5, -28			; CHECK-THUMB-FP-ELIM: .cfi_offset r6, -28
	; CHECK-THUMB-FP-ELIM: .cfi_offset r4, -32			; CHECK-THUMB-FP-ELIM: .cfi_offset r4, -32
	; CHECK-THUMB-FP-ELIM: add r7, sp, #8			; CHECK-THUMB-FP-ELIM: add r7, sp, #8
	; CHECK-THUMB-FP-ELIM: .cfi_def_cfa r7, 24			; CHECK-THUMB-FP-ELIM: .cfi_def_cfa r7, 24

	define i32 @sum(i32 %count, ...) {			define i32 @sum(i32 %count, ...) {
	entry:			entry:
	%vl = alloca i8*, align 4			%vl = alloca i8*, align 4
	%vl1 = bitcast i8** %vl to i8*			%vl1 = bitcast i8** %vl to i8*
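
The changed `.cfi_def_cfa_offset` values in the CHECK-THUMB-FP block are plain arithmetic on the prologue: each pushed register adds 4 bytes, and each `sub sp` adds its immediate. A small sketch of that bookkeeping, using hypothetical helper names that do not appear in the patch:

```cpp
#include <cassert>

// Hypothetical helpers (illustration only): track the CFA offset through a
// Thumb-1 prologue made of "sub sp, #N" and "push {...}" instructions.
constexpr unsigned afterSubSp(unsigned Cfa, unsigned Bytes) {
  return Cfa + Bytes;
}
constexpr unsigned afterPush(unsigned Cfa, unsigned NumRegs) {
  return Cfa + 4 * NumRegs;  // every pushed register is one 32-bit word
}

// Before the patch: sub sp, #16 ; push {r4, r5, r7, lr} ; sub sp, #8
static_assert(afterSubSp(afterPush(afterSubSp(0, 16), 4), 8) == 40, "");
// After the patch:  sub sp, #16 ; push {r4, lr} ; sub sp, #8
static_assert(afterSubSp(afterPush(afterSubSp(0, 16), 2), 8) == 32, "");
```

Pushing two fewer callee-saved registers is what moves the intermediate offset from 32 to 24 and the final one from 40 to 32.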

test/CodeGen/ARM/frame-register.ll

Show All 24 Lines	entry:
%2 = load i32* %j, align 4		%2 = load i32* %j, align 4
%add1 = add nsw i32 %2, 1		%add1 = add nsw i32 %2, 1
ret i32 %add1		ret i32 %add1
}		}

; CHECK-ARM: push {r11, lr}		; CHECK-ARM: push {r11, lr}
; CHECK-ARM: mov r11, sp		; CHECK-ARM: mov r11, sp

; CHECK-THUMB: push {r4, r6, r7, lr}		; CHECK-THUMB: push {r7, lr}
; CHECK-THUMB: add r7, sp, #8		; CHECK-THUMB: add r7, sp, #0

; CHECK-DARWIN-ARM: push {r7, lr}		; CHECK-DARWIN-ARM: push {r7, lr}
; CHECK-DARWIN-THUMB: push {r4, r7, lr}		; CHECK-DARWIN-THUMB: push {r7, lr}

test/CodeGen/ARM/thumb1-varalloc.ll

	; CHECK-NEXT: mov sp, r4			; CHECK-NEXT: mov sp, r4
	; CHECK-NEXT: pop {r4, r5, r6, r7, pc}			; CHECK-NEXT: pop {r4, r5, r6, r7, pc}
	ret i8* %.0			ret i8* %.0
	}			}

	declare noalias i8* @strdup(i8* nocapture) nounwind			declare noalias i8* @strdup(i8* nocapture) nounwind
	declare i32 @_called_func(i8, i32) nounwind			declare i32 @_called_func(i8, i32) nounwind

	; Variable ending up at unaligned offset from sp (i.e. not a multiple of 4)
	define void @test_local_var_addr() {
	; CHECK-LABEL: test_local_var_addr:

	%addr1 = alloca i8
	%addr2 = alloca i8

	; CHECK: mov r0, sp
	; CHECK: adds r0, #{{[0-9]+}}
	; CHECK: blx
	call void @take_ptr(i8* %addr1)

	; CHECK: mov r0, sp
	; CHECK: adds r0, #{{[0-9]+}}
	; CHECK: blx
	call void @take_ptr(i8* %addr2)

	ret void
	}

	; Simple variable ending up at sp.			; Simple variable ending up at sp.
	define void @test_simple_var() {			define void @test_simple_var() {
	; CHECK-LABEL: test_simple_var:			; CHECK-LABEL: test_simple_var:

	%addr32 = alloca i32			%addr32 = alloca i32
	%addr8 = bitcast i32* %addr32 to i8*			%addr8 = bitcast i32* %addr32 to i8*

	; CHECK: mov r0, sp			; CHECK: mov r0, sp

	; CHECK: add r0, sp, #1020			; CHECK: add r0, sp, #1020
	; CHECK-NEXT: blx			; CHECK-NEXT: blx
	call void @take_ptr(i8* %addr1)			call void @take_ptr(i8* %addr1)

	ret void			ret void
	}			}

	; Max range addressable with tADDrSPi + tADDi8			; Max range addressable with tADDrSPi + tADDi8 is 1275, however the automatic
	define void @test_local_var_offset_1275() {			; 4-byte aligning of objects on the stack combined with 8-byte stack alignment
	; CHECK-LABEL: test_local_var_offset_1275			; means that 1268 is the max offset we can use.
				define void @test_local_var_offset_1268() {
				; CHECK-LABEL: test_local_var_offset_1268
	%addr1 = alloca i8, i32 1			%addr1 = alloca i8, i32 1
	%addr2 = alloca i8, i32 1275			%addr2 = alloca i8, i32 1268

	; CHECK: add r0, sp, #1020			; CHECK: add r0, sp, #1020
	; CHECK: adds r0, #255			; CHECK: adds r0, #248
	; CHECK-NEXT: blx			; CHECK-NEXT: blx
	call void @take_ptr(i8* %addr1)			call void @take_ptr(i8* %addr1)

	ret void			ret void
	}			}

	declare void @take_ptr(i8*)			declare void @take_ptr(i8*)
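
The offsets in the last test come from combining two immediates: `add rd, sp, #imm` takes a multiple of 4 up to 1020, and the follow-up `adds rd, #imm8` takes up to 255, giving 1275 in total. The sketch below shows one way to split an SP offset across the two instructions. `splitSpOffset` and `SpAddSplit` are hypothetical names for illustration, and the compiler is not guaranteed to pick the same split:

```cpp
#include <cassert>

// Hypothetical result type: the two immediates of
//   add  rd, sp, #SpImm   (multiple of 4, at most 1020)
//   adds rd, #Imm8        (at most 255)
struct SpAddSplit {
  bool Valid;
  unsigned SpImm;
  unsigned Imm8;
};

// Hypothetical helper (not from the patch): split Offset across the two
// instructions, or report that two instructions are not enough.
inline SpAddSplit splitSpOffset(unsigned Offset) {
  const unsigned MaxSpImm = 255 * 4;  // 1020
  const unsigned MaxImm8 = 255;
  if (Offset > MaxSpImm + MaxImm8)
    return {false, 0, 0};
  // Put as much as possible in the SP-relative add, rounded down to 4.
  unsigned SpImm = Offset > MaxSpImm ? MaxSpImm : (Offset & ~3u);
  return {true, SpImm, Offset - SpImm};
}
```

For the updated test, `splitSpOffset(1268)` yields 1020 and 248, matching the `add r0, sp, #1020` and `adds r0, #248` CHECK lines; the old value 1275 yields 1020 and 255.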

test/CodeGen/ARM/thumb1_return_sequence.ll

	; RUN: llc -mtriple=thumbv4t-none--eabi < %s | FileCheck %s --check-prefix=CHECK-V4T			; RUN: llc -mtriple=thumbv4t-none--eabi < %s | FileCheck %s --check-prefix=CHECK-V4T
	; RUN: llc -mtriple=thumbv5t-none--eabi < %s | FileCheck %s --check-prefix=CHECK-V5T			; RUN: llc -mtriple=thumbv5t-none--eabi < %s | FileCheck %s --check-prefix=CHECK-V5T

	; CHECK-V4T-LABEL: clobberframe			; CHECK-V4T-LABEL: clobberframe
	; CHECK-V5T-LABEL: clobberframe			; CHECK-V5T-LABEL: clobberframe
	define <4 x i32> @clobberframe() #0 {			define <4 x i32> @clobberframe(<6 x i32>* %p) #0 {
	entry:			entry:
	; Prologue			; Prologue
	; --------			; --------
	; CHECK-V4T: push {[[SAVED:(r[4567](, )?)+]], lr}			; CHECK-V4T: push {[[SAVED:(r[4567](, )?)+]], lr}
	; CHECK-V4T: sub sp,			; CHECK-V4T: sub sp,
	; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}			; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}

	%b = alloca <4 x i32>, align 16			%b = alloca <6 x i32>, align 16
	%a = alloca <4 x i32>, align 16			%a = alloca <4 x i32>, align 16
	store <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32>* %b, align 16			%stuff = load <6 x i32>* %p, align 16
				store <6 x i32> %stuff, <6 x i32>* %b, align 16
	store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %a, align 16			store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %a, align 16
	%0 = load <4 x i32>* %a, align 16			%0 = load <4 x i32>* %a, align 16
	ret <4 x i32> %0			ret <4 x i32> %0

	; Epilogue			; Epilogue
	; --------			; --------
	; CHECK-V4T: add sp,			; CHECK-V4T: add sp,
	; CHECK-V4T-NEXT: pop {[[SAVED]]}			; CHECK-V4T-NEXT: pop {[[SAVED]]}
	; CHECK-V5T-NEXT: add sp,			; CHECK-V5T-NEXT: add sp,
	; CHECK-V5T-NEXT: mov lr, r3			; CHECK-V5T-NEXT: mov lr, r3
	; CHECK-V5T-NEXT: mov r3, r12			; CHECK-V5T-NEXT: mov r3, r12
	; CHECK-V5T-NEXT: bx lr			; CHECK-V5T-NEXT: bx lr
	}			}

	; CHECK-V4T-LABEL: simpleframe			; CHECK-V4T-LABEL: simpleframe
	; CHECK-V5T-LABEL: simpleframe			; CHECK-V5T-LABEL: simpleframe
	define i32 @simpleframe() #0 {			define i32 @simpleframe(<6 x i32>* %p) #0 {
	entry:			entry:
	; Prologue			; Prologue
	; --------			; --------
	; CHECK-V4T: push {[[SAVED:(r[4567](, )?)+]], lr}			; CHECK-V4T: push {[[SAVED:(r[4567](, )?)+]], lr}
	; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}			; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}

	%a = alloca i32, align 4			%0 = load <6 x i32>* %p, align 16
	%b = alloca i32, align 4			%1 = extractelement <6 x i32> %0, i32 0
	%c = alloca i32, align 4			%2 = extractelement <6 x i32> %0, i32 1
	%d = alloca i32, align 4			%3 = extractelement <6 x i32> %0, i32 2
	store i32 1, i32* %a, align 4			%4 = extractelement <6 x i32> %0, i32 3
	store i32 2, i32* %b, align 4			%5 = extractelement <6 x i32> %0, i32 4
	store i32 3, i32* %c, align 4			%6 = extractelement <6 x i32> %0, i32 5
	store i32 4, i32* %d, align 4			%add1 = add nsw i32 %1, %2
	%0 = load i32* %a, align 4			%add2 = add nsw i32 %add1, %3
	%inc = add nsw i32 %0, 1			%add3 = add nsw i32 %add2, %4
	store i32 %inc, i32* %a, align 4			%add4 = add nsw i32 %add3, %5
	%1 = load i32* %b, align 4			%add5 = add nsw i32 %add4, %6
	%inc1 = add nsw i32 %1, 1
	store i32 %inc1, i32* %b, align 4
	%2 = load i32* %c, align 4
	%inc2 = add nsw i32 %2, 1
	store i32 %inc2, i32* %c, align 4
	%3 = load i32* %d, align 4
	%inc3 = add nsw i32 %3, 1
	store i32 %inc3, i32* %d, align 4
	%4 = load i32* %a, align 4
	%5 = load i32* %b, align 4
	%add = add nsw i32 %4, %5
	%6 = load i32* %c, align 4
	%add4 = add nsw i32 %add, %6
	%7 = load i32* %d, align 4
	%add5 = add nsw i32 %add4, %7
	ret i32 %add5			ret i32 %add5

	; Epilogue			; Epilogue
	; --------			; --------
	; CHECK-V4T: pop {[[SAVED]]}			; CHECK-V4T: pop {[[SAVED]]}
	; CHECK-V4T: pop {r3}			; CHECK-V4T: pop {r3}
	; CHECK-V4T: bx r3			; CHECK-V4T: bx r3
	; CHECK-V5T: pop {[[SAVED]], pc}			; CHECK-V5T: pop {[[SAVED]], pc}

test/CodeGen/Thumb/stack-access.ll

This file was added.

				; RUN: llc -mtriple=thumb-eabi < %s -o - | FileCheck %s

				; Check that stack addresses are generated using a single ADD
				define void @test1(i8** %p) {
				%x = alloca i8, align 1
				%y = alloca i8, align 1
				%z = alloca i8, align 1
				; CHECK: add r1, sp, #8
				; CHECK: str r1, [r0]
				store i8* %x, i8** %p, align 4
				; CHECK: add r1, sp, #4
				; CHECK: str r1, [r0]
				store i8* %y, i8** %p, align 4
				; CHECK: mov r1, sp
				; CHECK: str r1, [r0]
				store i8* %z, i8** %p, align 4
				ret void
				}

				; Stack offsets larger than 1020 still need two ADDs
				define void @test2([1024 x i8]** %p) {
				%arr1 = alloca [1024 x i8], align 1
				%arr2 = alloca [1024 x i8], align 1
				; CHECK: add r1, sp, #1020
				; CHECK: adds r1, #4
				; CHECK: str r1, [r0]
				store [1024 x i8]* %arr1, [1024 x i8]** %p, align 4
				; CHECK: mov r1, sp
				; CHECK: str r1, [r0]
				store [1024 x i8]* %arr2, [1024 x i8]** %p, align 4
				ret void
				}

				; If possible stack-based ldrb/ldrh are widened to use SP-based addressing
				define i32 @test3() #0 {
				%x = alloca i8, align 1
				%y = alloca i8, align 1
				; CHECK: ldr r0, [sp]
				%1 = load i8* %x, align 1
				; CHECK: ldr r1, [sp, #4]
				%2 = load i8* %y, align 1
				%3 = add nsw i8 %1, %2
				%4 = zext i8 %3 to i32
				ret i32 %4
				}

				define i32 @test4() #0 {
				%x = alloca i16, align 2
				%y = alloca i16, align 2
				; CHECK: ldr r0, [sp]
				%1 = load i16* %x, align 2
				; CHECK: ldr r1, [sp, #4]
				%2 = load i16* %y, align 2
				%3 = add nsw i16 %1, %2
				%4 = zext i16 %3 to i32
				ret i32 %4
				}

				; Don't widen if the value needs to be zero-extended
				define zeroext i8 @test5() {
				%x = alloca i8, align 1
				; CHECK: mov r0, sp
				; CHECK: ldrb r0, [r0]
				%1 = load i8* %x, align 1
				ret i8 %1
				}

				define zeroext i16 @test6() {
				%x = alloca i16, align 2
				; CHECK: mov r0, sp
				; CHECK: ldrh r0, [r0]
				%1 = load i16* %x, align 2
				ret i16 %1
				}
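
Tests 3 to 6 above exercise one rule: a byte or halfword stack load may become an SP-relative word `ldr` only when the upper bits of the result do not matter (an any-extending load), and the slot offset is a multiple of 4 within the 8-bit scaled range. A minimal sketch of that predicate follows. `ExtKind` and `canUseSpWordLoad` are hypothetical names, and the real selection logic in the patch is not reproduced here:

```cpp
#include <cassert>

// Hypothetical classification of how a narrow load's result is extended.
enum class ExtKind { Any, Zero, Sign };

// Hypothetical predicate (illustration only): can a byte/halfword load from
// a stack slot at SpOffset be selected as "ldr rd, [sp, #SpOffset]"?
constexpr bool canUseSpWordLoad(ExtKind Ext, unsigned SpOffset) {
  return Ext == ExtKind::Any   // upper bits are don't-care
         && SpOffset % 4 == 0  // SP-relative offsets are scaled by 4
         && SpOffset <= 1020;  // 8 bits of offset, times 4
}

// test3/test4: anyext loads at [sp] and [sp, #4] are widened.
static_assert(canUseSpWordLoad(ExtKind::Any, 0), "");
static_assert(canUseSpWordLoad(ExtKind::Any, 4), "");
// test5/test6: zero-extended results keep ldrb/ldrh.
static_assert(!canUseSpWordLoad(ExtKind::Zero, 0), "");
```

The alignment condition is why the patch also forces 4-byte alignment on stack objects accessed this way.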

test/CodeGen/Thumb/stm-merge.ll

	; RUN: llc -mtriple=thumbv6m-eabi -verify-machineinstrs %s -o - | FileCheck %s			; RUN: llc -mtriple=thumbv6m-eabi -verify-machineinstrs %s -o - | FileCheck %s
	target datalayout = "e-m:e-p:32:32-i1:8:32-i8:8:32-i16:16:32-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-i1:8:32-i8:8:32-i16:16:32-i64:64-v128:64:128-a:0:32-n32-S64"
	target triple = "thumbv6m--linux-gnueabi"			target triple = "thumbv6m--linux-gnueabi"

	@d = internal unnamed_addr global i32 0, align 4			@d = internal unnamed_addr global i32 0, align 4
	@c = internal global i32* null, align 4			@c = internal global i32* null, align 4
	@e = internal unnamed_addr global i32* null, align 4			@e = internal unnamed_addr global i32* null, align 4

	; Function Attrs: nounwind optsize			; Function Attrs: nounwind optsize
	define void @fn1() #0 {			define void @fn1(i32 %x, i32 %y, i32 %z) #0 {
	entry:			entry:
	; CHECK-LABEL: fn1:			; CHECK-LABEL: fn1:
	; CHECK: stm r[[BASE:[0-9]]]!, {{.*}}			; CHECK: stm r[[BASE:[0-9]]]!, {{.*}}
	; CHECK-NOT: {{.*}} r[[BASE]]			; CHECK-NOT: {{.*}} r[[BASE]]
	; CHECK: ldr r[[BASE]], {{.*}}
	%g = alloca i32, align 4			%g = alloca i32, align 4
	%h = alloca i32, align 4			%h = alloca i32, align 4
	store i32 1, i32* %g, align 4			%i = alloca i32, align 4
	store i32 0, i32* %h, align 4			store i32 %x, i32* %i, align 4
				store i32 %y, i32* %h, align 4
				store i32 %z, i32* %g, align 4
	%.pr = load i32* @d, align 4			%.pr = load i32* @d, align 4
	%cmp11 = icmp slt i32 %.pr, 1			%cmp11 = icmp slt i32 %.pr, 1
	br i1 %cmp11, label %for.inc.lr.ph, label %for.body5			br i1 %cmp11, label %for.inc.lr.ph, label %for.body5

	for.inc.lr.ph: ; preds = %entry			for.inc.lr.ph: ; preds = %entry
	store i32 1, i32* @d, align 4			store i32 1, i32* @d, align 4
	br label %for.body5			br label %for.body5


test/CodeGen/Thumb/vargs.ll

	; RUN: llc -mtriple=thumb-eabi %s -o /dev/null			; RUN: llc -mtriple=thumb-eabi %s -o /dev/null
	; RUN: llc -mtriple=thumb-linux %s -o - | FileCheck %s			; RUN: llc -mtriple=thumb-linux %s -o - | FileCheck %s
	; RUN: llc -mtriple=thumb-darwin %s -o - | FileCheck %s			; RUN: llc -mtriple=thumb-darwin %s -o - | FileCheck %s

	@str = internal constant [4 x i8] c"%d\0A\00" ; <[4 x i8]*> [#uses=1]			@str = internal constant [4 x i8] c"%d\0A\00" ; <[4 x i8]*> [#uses=1]

	define void @f(i32 %a, ...) {			define void @f(i32 %a, ...) {
	entry:			entry:
				; Check that space is reserved above the pushed lr for variadic argument
				; registers to be stored in.
				; CHECK: sub sp, #[[IMM:[0-9]+]]
				; CHECK: push
	%va = alloca i8*, align 4 ; <i8**> [#uses=4]			%va = alloca i8*, align 4 ; <i8**> [#uses=4]
	%va.upgrd.1 = bitcast i8** %va to i8* ; <i8*> [#uses=1]			%va.upgrd.1 = bitcast i8** %va to i8* ; <i8*> [#uses=1]
	call void @llvm.va_start( i8* %va.upgrd.1 )			call void @llvm.va_start( i8* %va.upgrd.1 )
	br label %bb			br label %bb

	bb: ; preds = %bb, %entry			bb: ; preds = %bb, %entry
	%a_addr.0 = phi i32 [ %a, %entry ], [ %tmp5, %bb ] ; <i32> [#uses=2]			%a_addr.0 = phi i32 [ %a, %entry ], [ %tmp5, %bb ] ; <i32> [#uses=2]
	%tmp = load volatile i8** %va ; <i8*> [#uses=2]			%tmp = load volatile i8** %va ; <i8*> [#uses=2]
	%tmp2 = getelementptr i8* %tmp, i32 4 ; <i8*> [#uses=1]			%tmp2 = getelementptr i8* %tmp, i32 4 ; <i8*> [#uses=1]
	store volatile i8* %tmp2, i8** %va			store volatile i8* %tmp2, i8** %va
	%tmp5 = add i32 %a_addr.0, -1 ; <i32> [#uses=1]			%tmp5 = add i32 %a_addr.0, -1 ; <i32> [#uses=1]
	%tmp.upgrd.2 = icmp eq i32 %a_addr.0, 1 ; <i1> [#uses=1]			%tmp.upgrd.2 = icmp eq i32 %a_addr.0, 1 ; <i1> [#uses=1]
	br i1 %tmp.upgrd.2, label %bb7, label %bb			br i1 %tmp.upgrd.2, label %bb7, label %bb

	bb7: ; preds = %bb			bb7: ; preds = %bb
	%tmp3 = bitcast i8* %tmp to i32* ; <i32*> [#uses=1]			%tmp3 = bitcast i8* %tmp to i32* ; <i32*> [#uses=1]
	%tmp.upgrd.3 = load i32* %tmp3 ; <i32> [#uses=1]			%tmp.upgrd.3 = load i32* %tmp3 ; <i32> [#uses=1]
	%tmp10 = call i32 (i8*, ...)* @printf( i8* getelementptr ([4 x i8]* @str, i32 0, i64 0), i32 %tmp.upgrd.3 ) ; <i32> [#uses=0]			%tmp10 = call i32 (i8*, ...)* @printf( i8* getelementptr ([4 x i8]* @str, i32 0, i64 0), i32 %tmp.upgrd.3 ) ; <i32> [#uses=0]
	%va.upgrd.4 = bitcast i8** %va to i8* ; <i8*> [#uses=1]			%va.upgrd.4 = bitcast i8** %va to i8* ; <i8*> [#uses=1]
	call void @llvm.va_end( i8* %va.upgrd.4 )			call void @llvm.va_end( i8* %va.upgrd.4 )
	ret void			ret void

				; The return sequence should pop the lr to r3, recover the stack space used to
				; store variadic argument registers, then return via r3. Possibly there is a pop
				; before this, but only if the function happened to use callee-saved registers.
				; CHECK: pop {r3}
				; CHECK: add sp, #[[IMM]]
				; CHECK: bx r3
	}			}

	declare void @llvm.va_start(i8*)			declare void @llvm.va_start(i8*)

	declare i32 @printf(i8*, ...)			declare i32 @printf(i8*, ...)

	declare void @llvm.va_end(i8*)			declare void @llvm.va_end(i8*)

	; CHECK: pop
	; CHECK: pop
	; CHECK-NOT: pop


Revision Contents

Diff 20177

lib/Target/ARM/ARMBaseRegisterInfo.cpp

lib/Target/ARM/ARMISelDAGToDAG.cpp

lib/Target/ARM/ARMInstrThumb.td

lib/Target/ARM/ARMLoadStoreOptimizer.cpp

test/CodeGen/ARM/2015-01-21-thumbv4t-ldstr-opt.ll

test/CodeGen/ARM/atomic-ops-v8.ll

test/CodeGen/ARM/debug-frame-vararg.ll

test/CodeGen/ARM/frame-register.ll

test/CodeGen/ARM/thumb1-varalloc.ll

test/CodeGen/ARM/thumb1_return_sequence.ll

test/CodeGen/Thumb/stack-access.ll

test/CodeGen/Thumb/stm-merge.ll

test/CodeGen/Thumb/vargs.ll
