This is an archive of the discontinued LLVM Phabricator instance.

Differential D6211

[Thumb1] Re-write emitThumbRegPlusImmediate
ClosedPublic

Authored by olista01 on Nov 11 2014, 2:02 AM.

Details

Reviewers

t.p.northover

Summary

This was motivated by a bug which caused code like this to be
miscompiled:

declare void @take_ptr(i8*)
define void @test() {
  %addr1 = alloca i8
  %addr2.32 = alloca i32, i32 1028
  call void @take_ptr(i8* %addr1)
  ret void
}

This was emitting the following assembly to get the value of %addr1:

add r0, sp, #1020
add r0, r0, #8

However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather than SP+1028:

add r0, sp, #1020
add r0, sp, #8

This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.
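To make the arithmetic concrete, here is a minimal C++ sketch (not the patch itself) of how an SP-relative offset could be split across the two Thumb1 encodings involved. It assumes the immediate ranges used in the rewritten function: `add rd, sp, #imm` takes an 8-bit immediate scaled by 4 (at most 1020), and the in-place `adds rd, #imm` takes an unscaled 8-bit immediate (at most 255). The names `SplitOffset` and `splitSpOffset` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result type: the byte offset folded into "add rd, sp, #imm",
// followed by the immediates of zero or more in-place "adds rd, #imm".
struct SplitOffset {
  unsigned SpAddBytes;           // multiple of 4, at most 1020
  std::vector<unsigned> InPlace; // each at most 255
};

// Split Bytes greedily: as much as possible goes into the SP-relative add,
// and the remainder goes into two-address adds on the destination register.
inline SplitOffset splitSpOffset(unsigned Bytes) {
  const unsigned CopyScale = 4;
  const unsigned CopyRange = 255 * CopyScale; // 1020
  const unsigned ExtraRange = 255;

  SplitOffset Result;
  Result.SpAddBytes = std::min(Bytes, CopyRange) / CopyScale * CopyScale;
  assert(Result.SpAddBytes % CopyScale == 0 && "SP add must be 4-byte aligned");
  Bytes -= Result.SpAddBytes;

  while (Bytes) {
    unsigned Imm = std::min(Bytes, ExtraRange);
    Result.InPlace.push_back(Imm);
    Bytes -= Imm;
  }
  return Result;
}
```

For the offset in the example above, `splitSpOffset(1028)` gives 1020 for the SP-relative add and a single in-place add of 8, which corresponds to `add r0, sp, #1020` followed by `adds r0, #8`.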

Diff Detail

Event Timeline

olista01 updated this revision to Diff 16031. Nov 11 2014, 2:02 AM

olista01 retitled this revision from to [Thumb1] Re-write emitThumbRegPlusImmediate.

olista01 updated this object.

olista01 edited the test plan for this revision.

olista01 added a reviewer: t.p.northover.

olista01 set the repository for this revision to rL LLVM.

olista01 added a subscriber: Unknown Object (MLST).

Hi Oliver,

I'm not sure how many comments I've got really, I suspect I've picked up on the same issue repeatedly in many cases. But still...

lib/Target/ARM/Thumb1RegisterInfo.cpp
146–147	Why not just fix the implementation of RoundUpToAlignment? Clang seems quite capable of optimising "/Align * Align" when Align is a power of 2, and it looks like the function gets inlined at all of its callsites.
228–259	isMul4 will be unused in a release build, I think, causing a warning.
242–245	This doesn't seem to line up with what actually gets implemented. In particular, I think we're in deep trouble if the RangeAfterCopy actually does get rounded up.
248	I think this overflows if RequiredExtraInstrs = UINT_MAX and there's a copy instruction.
272–273	I don't think this works with an unaligned copy but an aligned extra. That case doesn't seem to exist anyway, but perhaps it should be more clear if that's intentional (comments, in the assert, ...).
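For reference, the two forms of rounding under discussion in the first comment can be sketched side by side. This is an illustrative C++ fragment, not the code in MathExtras.h; the names `roundUpGeneral` and `roundUpPow2` are hypothetical. The point is that the mask form is only correct for power-of-two alignments, whereas the patch wants to round to a multiple of an instruction's immediate range (for example 255), which is not a power of two.

```cpp
#include <cassert>
#include <cstdint>

// General form: valid for any non-zero Align (arithmetic is mod 2**64).
inline uint64_t roundUpGeneral(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "Align must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Mask form: only valid when Align is a power of two.
inline uint64_t roundUpPow2(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of 2");
  return (Value + Align - 1) & ~(Align - 1);
}
```

Both forms agree on power-of-two inputs such as `(17, 8)`, which rounds to 24. Only the general form handles `(321, 255)`, which rounds to 510, the example added to the doc comment in this patch.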

olista01 added inline comments. Nov 13 2014, 5:40 AM

lib/Target/ARM/Thumb1RegisterInfo.cpp
242–245	I'm not sure I follow. This should round RequiredExtraInstrs up to the lowest integer number of instructions that could be used, and the loop then handles as much of the immediate as possible with each instruction, so these should match up. Can you give an example where this doesn't work?

Use existing RoundUpToAlignment
Remove isMul4, as it is unused in release builds
Assert on a case we could handle, but currently don't (and don't need to)
Don't fall back to the const pool when we don't have an extra instruction but also don't need one (I think this case is currently unused)
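The last point, together with the earlier overflow concern, can be illustrated with a small C++ sketch of the instruction-count heuristic. This is a simplified model under stated assumptions, not the patch: `requiredExtraInstrs` and `useConstPool` are hypothetical names, and plain ceiling division stands in for the `RoundUpToAlignment(...) / ExtraRange` expression. In the diff, an `ExtraRange` of 0 appears to correspond to a high destination register, for which no in-place add/sub is selected.

```cpp
#include <cassert>
#include <limits>

// Adding 1 to the maximum unsigned value wraps to 0, so a UINT_MAX sentinel
// plus one copy instruction would look like "zero instructions needed".
static_assert(std::numeric_limits<unsigned>::max() + 1u == 0u,
              "unsigned addition wraps around");

// Hypothetical helper modelling the three cases of the heuristic.
inline unsigned requiredExtraInstrs(unsigned RangeAfterCopy,
                                    unsigned ExtraRange) {
  if (ExtraRange) // An in-place add/sub exists: ceiling division.
    return (RangeAfterCopy + ExtraRange - 1) / ExtraRange;
  if (RangeAfterCopy > 0) // Needed but unavailable: large, finite sentinel.
    return 1000000;
  return 0; // Not available, but not needed either.
}

// True if the add/sub sequence would exceed Threshold instructions.
inline bool useConstPool(unsigned RequiredCopyInstrs, unsigned RangeAfterCopy,
                         unsigned ExtraRange, unsigned Threshold) {
  unsigned RequiredInstrs =
      RequiredCopyInstrs + requiredExtraInstrs(RangeAfterCopy, ExtraRange);
  return RequiredInstrs > Threshold;
}
```

With a threshold of 2, `useConstPool(1, 0, 0, 2)` is false (a lone copy suffices), while `useConstPool(1, 4, 0, 2)` is true because the sentinel survives the addition of the copy instruction.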

Hi Oliver,

Thanks for updating the patch. To explain my previous comment a bit more:

lib/Target/ARM/Thumb1RegisterInfo.cpp
242–245	I think it falls into the limited alignment handling comments. If RangeAfterCopy has been rounded up here, then that means "(Bytes - CopyRange) % ExtraRange != 0". But "Bytes - CopyRange" is precisely what we'll have to emit with the extra instructions (Copy greedily takes as many bytes as possible). This is impossible. I don't believe the case actually happens, because when ExtraRange != 1, we only ever emit a MOV so CopyRange == 0. But that means the clause is untested and untestable, of course: another reason to remove it.

olista01 added inline comments. Nov 14 2014, 9:18 AM

lib/Target/ARM/Thumb1RegisterInfo.cpp
242–245	I think that the assertion on line 230 (second version of the patch) should catch this case, but I'll add one closer to here to be sure.

Add an extra assertion to catch the case where RangeAfterCopy is not aligned, but CopyOpc requires an aligned immediate.
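One way to see why an alignment assertion is useful here is to model the in-place add/sub loop from the patch, which divides the remaining byte count by the instruction's scale. The C++ sketch below is a hypothetical model (the name `countExtraInstrs` and the `-1` return value are not in the patch); it reports the case where the remaining bytes are smaller than one scaled unit, so an iteration would make no progress.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the in-place add/sub loop. Returns the number of
// instructions emitted, or -1 if an iteration would make no progress.
inline int countExtraInstrs(unsigned Bytes, unsigned ExtraRange,
                            unsigned ExtraScale) {
  assert(ExtraScale != 0 && "scale must be non-zero");
  int Count = 0;
  while (Bytes) {
    unsigned ExtraImm = std::min(Bytes, ExtraRange) / ExtraScale;
    if (ExtraImm == 0)
      return -1; // Remaining bytes cannot be encoded with this scale.
    Bytes -= ExtraImm * ExtraScale;
    ++Count;
  }
  return Count;
}
```

With the SP-to-SP parameters (range 508, scale 4), `countExtraInstrs(1524, 508, 4)` is 3, matching the three `sub sp, #508` instructions in the large-stack test. A remainder of 6 with the same parameters gets stuck after one instruction, which is the kind of input the assertion is meant to reject up front.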

Ah, sorry, turns out I was misreading just what alignment we were rounding up to. No wonder you were confused.

I think this looks reasonable. Fingers crossed we haven't missed anything; frame lowering is a nightmare.

Tim.

This revision is now accepted and ready to land. Nov 14 2014, 3:27 PM

Thanks, committed revision 222125.

Revision Contents

Path

Size

include/

llvm/

Support/

MathExtras.h

4 lines

lib/

Target/

ARM/

Thumb1RegisterInfo.cpp

270 lines

test/

CodeGen/

ARM/

thumb1-varalloc.ll

47 lines

Thumb/

large-stack.ll

50 lines

Diff 16220

include/llvm/Support/MathExtras.h

	Show First 20 Lines • Show All 589 Lines • ▼ Show 20 Lines
	/// Returns the next integer (mod 2**64) that is greater than or equal to			/// Returns the next integer (mod 2**64) that is greater than or equal to
	/// \p Value and is a multiple of \p Align. \p Align must be non-zero.			/// \p Value and is a multiple of \p Align. \p Align must be non-zero.
	///			///
	/// Examples:			/// Examples:
	/// \code			/// \code
	/// RoundUpToAlignment(5, 8) = 8			/// RoundUpToAlignment(5, 8) = 8
	/// RoundUpToAlignment(17, 8) = 24			/// RoundUpToAlignment(17, 8) = 24
	/// RoundUpToAlignment(~0LL, 8) = 0			/// RoundUpToAlignment(~0LL, 8) = 0
				/// RoundUpToAlignment(321, 255) = 510
	/// \endcode			/// \endcode
	inline uint64_t RoundUpToAlignment(uint64_t Value, uint64_t Align) {			inline uint64_t RoundUpToAlignment(uint64_t Value, uint64_t Align) {
	assert(isPowerOf2_64(Align) && "Alignment must be power of 2!");			return (Value + Align - 1) / Align * Align;
	return (Value + Align - 1) & ~uint64_t(Align - 1);
	}			}

	/// Returns the offset to the next integer (mod 2**64) that is greater than			/// Returns the offset to the next integer (mod 2**64) that is greater than
	/// or equal to \p Value and is a multiple of \p Align. \p Align must be			/// or equal to \p Value and is a multiple of \p Align. \p Align must be
	/// non-zero.			/// non-zero.
	inline uint64_t OffsetToAlignment(uint64_t Value, uint64_t Align) {			inline uint64_t OffsetToAlignment(uint64_t Value, uint64_t Align) {
	return RoundUpToAlignment(Value, Align) - Value;			return RoundUpToAlignment(Value, Align) - Value;
	}			}
	Show All 36 Lines

lib/Target/ARM/Thumb1RegisterInfo.cpp

Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines
void		void
Thumb1RegisterInfo::emitLoadConstPool(MachineBasicBlock &MBB,		Thumb1RegisterInfo::emitLoadConstPool(MachineBasicBlock &MBB,
MachineBasicBlock::iterator &MBBI,		MachineBasicBlock::iterator &MBBI,
DebugLoc dl,		DebugLoc dl,
unsigned DestReg, unsigned SubIdx,		unsigned DestReg, unsigned SubIdx,
int Val,		int Val,
ARMCC::CondCodes Pred, unsigned PredReg,		ARMCC::CondCodes Pred, unsigned PredReg,
unsigned MIFlags) const {		unsigned MIFlags) const {
		assert((isARMLowRegister(DestReg) ||
		isVirtualRegister(DestReg)) &&
		"Thumb1 does not have ldr to high register");

MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
MachineConstantPool *ConstantPool = MF.getConstantPool();		MachineConstantPool *ConstantPool = MF.getConstantPool();
const Constant *C = ConstantInt::get(		const Constant *C = ConstantInt::get(
Type::getInt32Ty(MBB.getParent()->getFunction()->getContext()), Val);		Type::getInt32Ty(MBB.getParent()->getFunction()->getContext()), Val);
unsigned Idx = ConstantPool->getConstantPoolIndex(C, 4);		unsigned Idx = ConstantPool->getConstantPoolIndex(C, 4);

BuildMI(MBB, MBBI, dl, TII.get(ARM::tLDRpci))		BuildMI(MBB, MBBI, dl, TII.get(ARM::tLDRpci))
Show All 24 Lines	void emitThumbRegPlusImmInReg(MachineBasicBlock &MBB,
// if either base or dest register is a high register. Also, if do not		// if either base or dest register is a high register. Also, if do not
// issue sub as part of the sequence if condition register is to be		// issue sub as part of the sequence if condition register is to be
// preserved.		// preserved.
if (NumBytes < 0 && !isHigh && CanChangeCC) {		if (NumBytes < 0 && !isHigh && CanChangeCC) {
isSub = true;		isSub = true;
NumBytes = -NumBytes;		NumBytes = -NumBytes;
}		}
unsigned LdReg = DestReg;		unsigned LdReg = DestReg;
if (DestReg == ARM::SP) {		if (DestReg == ARM::SP)
assert(BaseReg == ARM::SP && "Unexpected!");		assert(BaseReg == ARM::SP && "Unexpected!");
		if (!isARMLowRegister(DestReg) && !MRI.isVirtualRegister(DestReg))
LdReg = MF.getRegInfo().createVirtualRegister(&ARM::tGPRRegClass);		LdReg = MF.getRegInfo().createVirtualRegister(&ARM::tGPRRegClass);
}

if (NumBytes <= 255 && NumBytes >= 0)		if (NumBytes <= 255 && NumBytes >= 0 && CanChangeCC) {
AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVi8), LdReg))		AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVi8), LdReg))
.addImm(NumBytes).setMIFlags(MIFlags);		.addImm(NumBytes).setMIFlags(MIFlags);
else if (NumBytes < 0 && NumBytes >= -255) {		} else if (NumBytes < 0 && NumBytes >= -255 && CanChangeCC) {
AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVi8), LdReg))		AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVi8), LdReg))
.addImm(NumBytes).setMIFlags(MIFlags);		.addImm(NumBytes).setMIFlags(MIFlags);
AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII.get(ARM::tRSB), LdReg))		AddDefaultT1CC(BuildMI(MBB, MBBI, dl, TII.get(ARM::tRSB), LdReg))
.addReg(LdReg, RegState::Kill).setMIFlags(MIFlags);		.addReg(LdReg, RegState::Kill).setMIFlags(MIFlags);
} else		} else
MRI.emitLoadConstPool(MBB, MBBI, dl, LdReg, 0, NumBytes,		MRI.emitLoadConstPool(MBB, MBBI, dl, LdReg, 0, NumBytes,
ARMCC::AL, 0, MIFlags);		ARMCC::AL, 0, MIFlags);

// Emit add / sub.		// Emit add / sub.
int Opc = (isSub) ? ARM::tSUBrr : (isHigh ? ARM::tADDhirr : ARM::tADDrr);		int Opc = (isSub) ? ARM::tSUBrr : ((isHigh || !CanChangeCC) ? ARM::tADDhirr
		: ARM::tADDrr);
MachineInstrBuilder MIB =		MachineInstrBuilder MIB =
BuildMI(MBB, MBBI, dl, TII.get(Opc), DestReg);		BuildMI(MBB, MBBI, dl, TII.get(Opc), DestReg);
if (Opc != ARM::tADDhirr)		if (Opc != ARM::tADDhirr)
MIB = AddDefaultT1CC(MIB);		MIB = AddDefaultT1CC(MIB);
if (DestReg == ARM::SP || isSub)		if (DestReg == ARM::SP || isSub)
MIB.addReg(BaseReg).addReg(LdReg, RegState::Kill);		MIB.addReg(BaseReg).addReg(LdReg, RegState::Kill);
else		else
MIB.addReg(LdReg).addReg(BaseReg, RegState::Kill);		MIB.addReg(LdReg).addReg(BaseReg, RegState::Kill);
AddDefaultPred(MIB);		AddDefaultPred(MIB);
}		}

/// calcNumMI - Returns the number of instructions required to materialize
/// the specific add / sub r, c instruction.
static unsigned calcNumMI(int Opc, int ExtraOpc, unsigned Bytes,
unsigned NumBits, unsigned Scale) {
unsigned NumMIs = 0;
unsigned Chunk = ((1 << NumBits) - 1) * Scale;

if (Opc == ARM::tADDrSPi) {
unsigned ThisVal = (Bytes > Chunk) ? Chunk : Bytes;
Bytes -= ThisVal;
NumMIs++;
NumBits = 8;
Scale = 1; // Followed by a number of tADDi8.
Chunk = ((1 << NumBits) - 1) * Scale;
}

NumMIs += Bytes / Chunk;
if ((Bytes % Chunk) != 0)
NumMIs++;
if (ExtraOpc)
NumMIs++;
return NumMIs;
}

/// emitThumbRegPlusImmediate - Emits a series of instructions to materialize		/// emitThumbRegPlusImmediate - Emits a series of instructions to materialize
/// a destreg = basereg + immediate in Thumb code.		/// a destreg = basereg + immediate in Thumb code. Tries a series of ADDs or
		/// SUBs first, and uses a constant pool value if the instruction sequence would
		/// be too long. This is allowed to modify the condition flags.
void llvm::emitThumbRegPlusImmediate(MachineBasicBlock &MBB,		void llvm::emitThumbRegPlusImmediate(MachineBasicBlock &MBB,
MachineBasicBlock::iterator &MBBI,		MachineBasicBlock::iterator &MBBI,
DebugLoc dl,		DebugLoc dl,
unsigned DestReg, unsigned BaseReg,		unsigned DestReg, unsigned BaseReg,
int NumBytes, const TargetInstrInfo &TII,		int NumBytes, const TargetInstrInfo &TII,
const ARMBaseRegisterInfo& MRI,		const ARMBaseRegisterInfo& MRI,
unsigned MIFlags) {		unsigned MIFlags) {
bool isSub = NumBytes < 0;		bool isSub = NumBytes < 0;
unsigned Bytes = (unsigned)NumBytes;		unsigned Bytes = (unsigned)NumBytes;
if (isSub) Bytes = -NumBytes;		if (isSub) Bytes = -NumBytes;
bool isMul4 = (Bytes & 3) == 0;
bool isTwoAddr = false;
bool DstNotEqBase = false;
unsigned NumBits = 1;
unsigned Scale = 1;
int Opc = 0;
int ExtraOpc = 0;
bool NeedCC = false;

if (DestReg == BaseReg && BaseReg == ARM::SP) {		int CopyOpc = 0;
assert(isMul4 && "Thumb sp inc / dec size must be multiple of 4!");		unsigned CopyBits = 0;
NumBits = 7;		unsigned CopyScale = 1;
Scale = 4;		bool CopyNeedsCC = false;
Opc = isSub ? ARM::tSUBspi : ARM::tADDspi;		int ExtraOpc = 0;
isTwoAddr = true;		unsigned ExtraBits = 0;
} else if (!isSub && BaseReg == ARM::SP) {		unsigned ExtraScale = 1;
// r1 = add sp, 403		bool ExtraNeedsCC = false;
// =>
// r1 = add sp, 100 * 4		// Strategy:
// r1 = add r1, 3		// We need to select two types of instruction, maximizing the available
if (!isMul4) {		// immediate range of each. The instructions we use will depend on whether
Bytes &= ~3;		// DestReg and BaseReg are low, high or the stack pointer.
ExtraOpc = ARM::tADDi3;		// * CopyOpc - DestReg = BaseReg + imm
}		// This will be emitted once if DestReg != BaseReg, and never if
DstNotEqBase = true;		// DestReg == BaseReg.
NumBits = 8;		// * ExtraOpc - DestReg = DestReg + imm
Scale = 4;		// This will be emitted as many times as necessary to add the
Opc = ARM::tADDrSPi;		// full immediate.
} else {		// If the immediate ranges of these instructions are not large enough to cover
// sp = sub sp, c		// NumBytes with a reasonable number of instructions, we fall back to using a
// r1 = sub sp, c		// value loaded from a constant pool.
// r8 = sub sp, c
if (DestReg != BaseReg)
DstNotEqBase = true;
if (DestReg == ARM::SP) {		if (DestReg == ARM::SP) {
Opc = isSub ? ARM::tSUBspi : ARM::tADDspi;		if (BaseReg == ARM::SP) {
assert(isMul4 && "Thumb sp inc / dec size must be multiple of 4!");		// sp -> sp
NumBits = 7;		// Already in right reg, no copy needed
Scale = 4;
} else {		} else {
Opc = isSub ? ARM::tSUBi8 : ARM::tADDi8;		// low -> sp or high -> sp
NumBits = 8;		CopyOpc = ARM::tMOVr;
NeedCC = true;		CopyBits = 0;
}		}
isTwoAddr = true;		ExtraOpc = isSub ? ARM::tSUBspi : ARM::tADDspi;
}		ExtraBits = 7;
		ExtraScale = 4;
unsigned NumMIs = calcNumMI(Opc, ExtraOpc, Bytes, NumBits, Scale);		} else if (isARMLowRegister(DestReg)) {
		if (BaseReg == ARM::SP) {
		// sp -> low
		assert(!isSub && "Thumb1 does not have tSUBrSPi");
		CopyOpc = ARM::tADDrSPi;
		CopyBits = 8;
		CopyScale = 4;
		} else if (DestReg == BaseReg) {
		// low -> same low
		// Already in right reg, no copy needed
		} else if (isARMLowRegister(BaseReg)) {
		// low -> different low
		CopyOpc = isSub ? ARM::tSUBi3 : ARM::tADDi3;
		CopyBits = 3;
		CopyNeedsCC = true;
		} else {
		// high -> low
		CopyOpc = ARM::tMOVr;
		CopyBits = 0;
		}
		ExtraOpc = isSub ? ARM::tSUBi8 : ARM::tADDi8;
		ExtraBits = 8;
		ExtraNeedsCC = true;
		} else /* DestReg is high */ {
		if (DestReg == BaseReg) {
		// high -> same high
		// Already in right reg, no copy needed
		} else {
		// {low,high,sp} -> high
		CopyOpc = ARM::tMOVr;
		CopyBits = 0;
		}
		ExtraOpc = 0;
		}

		// We could handle an unaligned immediate with an unaligned copy instruction
		// and an aligned extra instruction, but this case is not currently needed.
		assert(((Bytes & 3) == 0 || ExtraScale == 1) &&
		"Unaligned offset, but all instructions require alignment");

		unsigned CopyRange = ((1 << CopyBits) - 1) * CopyScale;
		// If we would emit the copy with an immediate of 0, just use tMOVr.
		if (CopyOpc && Bytes < CopyScale) {
		CopyOpc = ARM::tMOVr;
		CopyBits = 0;
		CopyScale = 1;
		CopyNeedsCC = false;
		CopyRange = 0;
		}
		unsigned ExtraRange = ((1 << ExtraBits) - 1) * ExtraScale; // per instruction
		unsigned RequiredCopyInstrs = CopyOpc ? 1 : 0;
		unsigned RangeAfterCopy = (CopyRange > Bytes) ? 0 : (Bytes - CopyRange);

		// We could handle this case when the copy instruction does not require an
		// aligned immediate, but we do not currently do this.
		assert(RangeAfterCopy % ExtraScale == 0 &&
		"Extra instruction requires immediate to be aligned");

		unsigned RequiredExtraInstrs;
		if (ExtraRange)
		RequiredExtraInstrs = RoundUpToAlignment(RangeAfterCopy, ExtraRange) / ExtraRange;
		else if (RangeAfterCopy > 0)
		// We need an extra instruction but none is available
		RequiredExtraInstrs = 1000000;
		else
		RequiredExtraInstrs = 0;
		unsigned RequiredInstrs = RequiredCopyInstrs + RequiredExtraInstrs;
unsigned Threshold = (DestReg == ARM::SP) ? 3 : 2;		unsigned Threshold = (DestReg == ARM::SP) ? 3 : 2;
if (NumMIs > Threshold) {
// This will expand into too many instructions. Load the immediate from a		// Use a constant pool, if the sequence of ADDs/SUBs is too expensive.
// constpool entry.		if (RequiredInstrs > Threshold) {
emitThumbRegPlusImmInReg(MBB, MBBI, dl,		emitThumbRegPlusImmInReg(MBB, MBBI, dl,
DestReg, BaseReg, NumBytes, true,		DestReg, BaseReg, NumBytes, true,
TII, MRI, MIFlags);		TII, MRI, MIFlags);
return;		return;
}		}

if (DstNotEqBase) {		// Emit zero or one copy instructions
if (isARMLowRegister(DestReg) && isARMLowRegister(BaseReg)) {		if (CopyOpc) {
// If both are low registers, emit DestReg = add BaseReg, max(Imm, 7)		unsigned CopyImm = std::min(Bytes, CopyRange) / CopyScale;
unsigned Chunk = (1 << 3) - 1;		Bytes -= CopyImm * CopyScale;
unsigned ThisVal = (Bytes > Chunk) ? Chunk : Bytes;
Bytes -= ThisVal;		MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(CopyOpc), DestReg);
const MCInstrDesc &MCID = TII.get(isSub ? ARM::tSUBi3 : ARM::tADDi3);		if (CopyNeedsCC)
const MachineInstrBuilder MIB =		MIB = AddDefaultT1CC(MIB);
AddDefaultT1CC(BuildMI(MBB, MBBI, dl, MCID, DestReg)		MIB.addReg(BaseReg, RegState::Kill);
.setMIFlags(MIFlags));		if (CopyOpc != ARM::tMOVr) {
AddDefaultPred(MIB.addReg(BaseReg, RegState::Kill).addImm(ThisVal));		MIB.addImm(CopyImm);
} else if (isARMLowRegister(DestReg) && BaseReg == ARM::SP && Bytes > 0) {
unsigned ThisVal = std::min(1020U, Bytes / 4 * 4);
Bytes -= ThisVal;
AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tADDrSPi), DestReg)
.addReg(BaseReg, RegState::Kill).addImm(ThisVal / 4))
.setMIFlags(MIFlags);
} else {
AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), DestReg)
.addReg(BaseReg, RegState::Kill))
.setMIFlags(MIFlags);
}		}
		AddDefaultPred(MIB.setMIFlags(MIFlags));

BaseReg = DestReg;		BaseReg = DestReg;
}		}

unsigned Chunk = ((1 << NumBits) - 1) * Scale;		// Emit zero or more in-place add/sub instructions
while (Bytes) {		while (Bytes) {
unsigned ThisVal = (Bytes > Chunk) ? Chunk : Bytes;		unsigned ExtraImm = std::min(Bytes, ExtraRange) / ExtraScale;
Bytes -= ThisVal;		Bytes -= ExtraImm * ExtraScale;
ThisVal /= Scale;
// Build the new tADD / tSUB.		MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(ExtraOpc), DestReg);
if (isTwoAddr) {		if (ExtraNeedsCC)
MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(Opc), DestReg);
if (NeedCC)
MIB = AddDefaultT1CC(MIB);
MIB.addReg(DestReg).addImm(ThisVal);
MIB = AddDefaultPred(MIB);
MIB.setMIFlags(MIFlags);
} else {
bool isKill = BaseReg != ARM::SP;
MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(Opc), DestReg);
if (NeedCC)
MIB = AddDefaultT1CC(MIB);		MIB = AddDefaultT1CC(MIB);
MIB.addReg(BaseReg, getKillRegState(isKill)).addImm(ThisVal);		MIB.addReg(BaseReg).addImm(ExtraImm);
MIB = AddDefaultPred(MIB);		MIB = AddDefaultPred(MIB);
MIB.setMIFlags(MIFlags);		MIB.setMIFlags(MIFlags);

BaseReg = DestReg;
if (Opc == ARM::tADDrSPi) {
// r4 = add sp, imm
// r4 = add r4, imm
// ...
NumBits = 8;
Scale = 1;
Chunk = ((1 << NumBits) - 1) * Scale;
Opc = isSub ? ARM::tSUBi8 : ARM::tADDi8;
NeedCC = isTwoAddr = true;
}
}
}

if (ExtraOpc) {
const MCInstrDesc &MCID = TII.get(ExtraOpc);
AddDefaultPred(AddDefaultT1CC(BuildMI(MBB, MBBI, dl, MCID, DestReg))
.addReg(DestReg, RegState::Kill)
.addImm(((unsigned)NumBytes) & 3)
.setMIFlags(MIFlags));
}		}
}		}

static void removeOperands(MachineInstr &MI, unsigned i) {		static void removeOperands(MachineInstr &MI, unsigned i) {
unsigned Op = i;		unsigned Op = i;
for (unsigned e = MI.getNumOperands(); i != e; ++i)		for (unsigned e = MI.getNumOperands(); i != e; ++i)
MI.RemoveOperand(Op);		MI.RemoveOperand(Op);
}		}
▲ Show 20 Lines • Show All 283 Lines • Show Last 20 Lines

test/CodeGen/ARM/thumb1-varalloc.ll

	; RUN: llc < %s -mtriple=thumbv6-apple-darwin | FileCheck %s			; RUN: llc < %s -mtriple=thumbv6-apple-darwin | FileCheck %s
	; RUN: llc < %s -mtriple=thumbv6-apple-darwin -regalloc=basic | FileCheck %s			; RUN: llc < %s -mtriple=thumbv6-apple-darwin -regalloc=basic | FileCheck %s
				; RUN: llc < %s -o %t -filetype=obj -mtriple=thumbv6-apple-darwin
				; RUN: llvm-objdump -triple=thumbv6-apple-darwin -d %t | FileCheck %s

	@__bar = external hidden global i8*			@__bar = external hidden global i8*
	@__baz = external hidden global i8*			@__baz = external hidden global i8*

	; rdar://8819685			; rdar://8819685
	define i8* @_foo() {			define i8* @_foo() {
	entry:			entry:
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	Show All 33 Lines
	; Variable ending up at unaligned offset from sp (i.e. not a multiple of 4)			; Variable ending up at unaligned offset from sp (i.e. not a multiple of 4)
	define void @test_local_var_addr() {			define void @test_local_var_addr() {
	; CHECK-LABEL: test_local_var_addr:			; CHECK-LABEL: test_local_var_addr:

	%addr1 = alloca i8			%addr1 = alloca i8
	%addr2 = alloca i8			%addr2 = alloca i8

	; CHECK: mov r0, sp			; CHECK: mov r0, sp
	; CHECK: adds r0, r0, #{{[0-9]+}}			; CHECK: adds r0, #{{[0-9]+}}
	; CHECK: blx _take_ptr			; CHECK: blx
	call void @take_ptr(i8* %addr1)			call void @take_ptr(i8* %addr1)

	; CHECK: mov r0, sp			; CHECK: mov r0, sp
	; CHECK: adds r0, r0, #{{[0-9]+}}			; CHECK: adds r0, #{{[0-9]+}}
	; CHECK: blx _take_ptr			; CHECK: blx
	call void @take_ptr(i8* %addr2)			call void @take_ptr(i8* %addr2)

	ret void			ret void
	}			}

	; Simple variable ending up at sp.			; Simple variable ending up at sp.
	define void @test_simple_var() {			define void @test_simple_var() {
	; CHECK-LABEL: test_simple_var:			; CHECK-LABEL: test_simple_var:

	%addr32 = alloca i32			%addr32 = alloca i32
	%addr8 = bitcast i32* %addr32 to i8*			%addr8 = bitcast i32* %addr32 to i8*

	; CHECK: mov r0, sp			; CHECK: mov r0, sp
	; CHECK-NOT: adds r0			; CHECK-NOT: adds r0
	; CHECK: blx _take_ptr			; CHECK: blx
	call void @take_ptr(i8* %addr8)			call void @take_ptr(i8* %addr8)
	ret void			ret void
	}			}

	; Simple variable ending up at aligned offset from sp.			; Simple variable ending up at aligned offset from sp.
	define void @test_local_var_addr_aligned() {			define void @test_local_var_addr_aligned() {
	; CHECK-LABEL: test_local_var_addr_aligned:			; CHECK-LABEL: test_local_var_addr_aligned:

	%addr1.32 = alloca i32			%addr1.32 = alloca i32
	%addr1 = bitcast i32* %addr1.32 to i8*			%addr1 = bitcast i32* %addr1.32 to i8*
	%addr2.32 = alloca i32			%addr2.32 = alloca i32
	%addr2 = bitcast i32* %addr2.32 to i8*			%addr2 = bitcast i32* %addr2.32 to i8*

	; CHECK: add r0, sp, #{{[0-9]+}}			; CHECK: add r0, sp, #{{[0-9]+}}
	; CHECK: blx _take_ptr			; CHECK: blx
	call void @take_ptr(i8* %addr1)			call void @take_ptr(i8* %addr1)

	; CHECK: mov r0, sp			; CHECK: mov r0, sp
	; CHECK-NOT: add r0			; CHECK-NOT: add r0
	; CHECK: blx _take_ptr			; CHECK: blx
	call void @take_ptr(i8* %addr2)			call void @take_ptr(i8* %addr2)

	ret void			ret void
	}			}

	; Simple variable ending up at aligned offset from sp.			; Simple variable ending up at aligned offset from sp.
	define void @test_local_var_big_offset() {			define void @test_local_var_big_offset() {
	; CHECK-LABEL: test_local_var_big_offset:			; CHECK-LABEL: test_local_var_big_offset:
	%addr1.32 = alloca i32, i32 257			%addr1.32 = alloca i32, i32 257
	%addr1 = bitcast i32* %addr1.32 to i8*			%addr1 = bitcast i32* %addr1.32 to i8*
	%addr2.32 = alloca i32, i32 257			%addr2.32 = alloca i32, i32 257

	; CHECK: add [[RTMP:r[0-9]+]], sp, #1020			; CHECK: add [[RTMP:r[0-9]+]], sp, #1020
	; CHECL: add r0, [[RTMP]], #8			; CHECK: adds [[RTMP]], #8
	; CHECK: blx _take_ptr			; CHECK: blx
				call void @take_ptr(i8* %addr1)

				ret void
				}

				; Max range addressable with tADDrSPi
				define void @test_local_var_offset_1020() {
				; CHECK-LABEL: test_local_var_offset_1020
				%addr1 = alloca i8, i32 4
				%addr2 = alloca i8, i32 1020

				; CHECK: add r0, sp, #1020
				; CHECK-NEXT: blx
				call void @take_ptr(i8* %addr1)

				ret void
				}

				; Max range addressable with tADDrSPi + tADDi8
				define void @test_local_var_offset_1275() {
				; CHECK-LABEL: test_local_var_offset_1275
				%addr1 = alloca i8, i32 1
				%addr2 = alloca i8, i32 1275

				; CHECK: add r0, sp, #1020
				; CHECK: adds r0, #255
				; CHECK-NEXT: blx
	call void @take_ptr(i8* %addr1)			call void @take_ptr(i8* %addr1)

	ret void			ret void
	}			}

	declare void @take_ptr(i8*)			declare void @take_ptr(i8*)

test/CodeGen/Thumb/large-stack.ll

	; RUN: llc < %s -mtriple=thumb-apple-ios | FileCheck %s			; RUN: llc < %s -mtriple=thumb-apple-ios | FileCheck %s --check-prefix=CHECK --check-prefix=IOS
				; RUN: llc < %s -mtriple=thumb-none-eabi | FileCheck %s --check-prefix=CHECK --check-prefix=EABI
				; RUN: llc < %s -o %t -filetype=obj -mtriple=thumbv6-apple-ios
				; RUN: llvm-objdump -triple=thumbv6-apple-ios -d %t | FileCheck %s --check-prefix=CHECK --check-prefix=IOS
				; RUN: llc < %s -o %t -filetype=obj -mtriple=thumbv6-none-eabi
				; RUN: llvm-objdump -triple=thumbv6-none-eabi -d %t | FileCheck %s --check-prefix=CHECK --check-prefix=EABI

				; Largest stack for which a single tADDspi/tSUBspi is enough
	define void @test1() {			define void @test1() {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: sub sp, #256			; CHECK: sub sp, #508
	; CHECK: add sp, #256			; CHECK: add sp, #508
	%tmp = alloca [ 64 x i32 ] , align 4			%tmp = alloca [ 508 x i8 ] , align 4
	ret void			ret void
	}			}

				; Largest stack for which three tADDspi/tSUBspis are enough
				define void @test100() {
				; CHECK-LABEL: test100:
				; CHECK: sub sp, #508
				; CHECK: sub sp, #508
				; CHECK: sub sp, #508
				; EABI: add sp, #508
				; EABI: add sp, #508
				; EABI: add sp, #508
				; IOS: subs r4, r7, #4
				; IOS: mov sp, r4
				%tmp = alloca [ 1524 x i8 ] , align 4
				ret void
				}

				; Smallest stack for which we use a constant pool
	define void @test2() {			define void @test2() {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: ldr r0, LCPI			; CHECK: ldr r0,
	; CHECK: add sp, r0			; CHECK: add sp, r0
	; CHECK: subs r4, r7, #4			; EABI: ldr r0,
	; CHECK: mov sp, r4			; EABI: add sp, r0
	%tmp = alloca [ 4168 x i8 ] , align 4			; IOS: subs r4, r7, #4
				; IOS: mov sp, r4
				%tmp = alloca [ 1528 x i8 ] , align 4
	ret void			ret void
	}			}

	define i32 @test3() {			define i32 @test3() {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: ldr r1, LCPI			; CHECK: ldr r1,
	; CHECK: add sp, r1			; CHECK: add sp, r1
	; CHECK: ldr r1, LCPI			; CHECK: ldr r1,
	; CHECK: add r1, sp			; CHECK: add r1, sp
	; CHECK: subs r4, r7, #4			; EABI: ldr r1,
	; CHECK: mov sp, r4			; EABI: add sp, r1
				; IOS: subs r4, r7, #4
				; IOS: mov sp, r4
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%tmp = alloca i32, align 4			%tmp = alloca i32, align 4
	%a = alloca [805306369 x i8], align 16			%a = alloca [805306369 x i8], align 16
	store i32 0, i32* %tmp			store i32 0, i32* %tmp
	%tmp1 = load i32* %tmp			%tmp1 = load i32* %tmp
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	Show All 18 Lines