This is an archive of the discontinued LLVM Phabricator instance.

Differential D6844

[ARM] Fix large stack alignment codegen bug for ARM and Thumb2 targets
Closed, Public

Authored by kristof.beyls on Jan 5 2015, 7:52 AM.

Details

Reviewers

t.p.northover
jmolloy

Summary

This patch partially fixes PR13007 (ARM CodeGen fails with large
stack alignment). It fixes the bug for ARM and Thumb2 targets, but
not for Thumb1, since stack realignment does not appear to be
supported for Thumb1 targets at all.

Producing an aligned stack pointer is done by zeroing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction can only encode
a mask that zeroes out at most the 8 lowest bits. When a larger
alignment is requested, a BIC instruction cannot be used; LLVM was
silently producing incorrect code in this case.
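
To illustrate the encoding limit described above, here is a minimal standalone C++ sketch (not part of the patch). It models only the A32 modified-immediate rule, an 8-bit value rotated right by an even amount; the helper names isA32ModifiedImm and bicCanAlign are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by Amt bits (Amt in [0, 31]).
static uint32_t rotl32(uint32_t V, unsigned Amt) {
  return Amt == 0 ? V : (V << Amt) | (V >> (32 - Amt));
}

// Hypothetical helper: true if Imm can be written as an 8-bit value rotated
// right by an even amount, which is the A32 modified-immediate form.
bool isA32ModifiedImm(uint32_t Imm) {
  for (unsigned Rot = 0; Rot < 32; Rot += 2)
    if (rotl32(Imm, Rot) <= 0xFFu) // Undo the rotation; must fit in 8 bits.
      return true;
  return false;
}

// Hypothetical helper: can "bic Reg, Reg, #(Alignment - 1)" be encoded?
bool bicCanAlign(uint32_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return isA32ModifiedImm(Alignment - 1);
}
```

Under this model, bicCanAlign(256) holds (the mask is 0xFF) while bicCanAlign(512) does not (the mask 0x1FF has nine contiguous set bits), which matches the 8-bit limit described above.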

This patch fixes code generation for large stack alignments by using
the BFC instruction instead, when it is available. When it is not,
the patch uses two instructions: a right shift followed by a left
shift, which together zero out the lower bits.
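
The three ways of clearing the low bits can be compared with a small standalone model. This is a sketch, not LLVM code: registers are modelled as plain 32-bit integers and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of low bits to clear for a power-of-two alignment, i.e. its log2.
unsigned nrBitsToZero(uint32_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  unsigned N = 0;
  while ((Alignment >> N) != 1)
    ++N;
  return N;
}

// Models "bic Reg, Reg, #(Alignment - 1)".
uint32_t alignWithBIC(uint32_t Reg, uint32_t Alignment) {
  return Reg & ~(Alignment - 1);
}

// Models "bfc Reg, #0, #log2(Alignment)": clear a bit field starting at bit 0.
uint32_t alignWithBFC(uint32_t Reg, uint32_t Alignment) {
  unsigned Width = nrBitsToZero(Alignment);
  uint32_t FieldMask = Width == 0 ? 0u : (0xFFFFFFFFu >> (32 - Width));
  return Reg & ~FieldMask;
}

// Models "lsr Reg, Reg, #N" followed by "lsl Reg, Reg, #N".
uint32_t alignWithShifts(uint32_t Reg, uint32_t Alignment) {
  unsigned N = nrBitsToZero(Alignment);
  return (Reg >> N) << N;
}
```

All three functions compute the same value for any power-of-two alignment; they differ only in which alignments the corresponding instruction can encode and in how many instructions are needed.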

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I haven't been able to
produce a test case for that. Does anyone understand sjlj exception
handling well enough to produce a test case triggering the bug on
the FIXME I've added in ARMExpandPseudoInsts.cpp?

Please review!

Thanks,

Kristof

Event Timeline

kristof.beyls updated this revision to Diff 17802. Jan 5 2015, 7:52 AM

kristof.beyls retitled this revision to [ARM] Fix large stack alignment codegen bug for ARM and Thumb2 targets.

kristof.beyls updated this object.

kristof.beyls edited the test plan for this revision.

kristof.beyls added a reviewer: t.p.northover.

kristof.beyls set the repository for this revision to rL LLVM.

kristof.beyls added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Jan 5 2015, 7:52 AM

Hi Kristof,

Thanks for working on this. Comments inline.

Cheers,

James

lib/Target/ARM/ARMExpandPseudoInsts.cpp
890	I'd turn this FIXME into an assert. Then, you may end up with some buildbot providing you with a test case! Either way, it's a silent fault so let's fail hard.
lib/Target/ARM/ARMFrameLowering.cpp
214	Doc comments should use three slashes (///)
225	Might be worth mentioning in the doc comment whether Alignment is expected to be M or N in: M = 1 << N.
226	Coding style: MustBeSingleInstruction.
228	CanUseBFC
231	... && "Thumb1 alignment not supported!") or something
252	Where's the guarantee that we can't trigger this assertion in normal operation? If we can't use BFC, and AlignMask is > 255? Is this expected to happen? If not, please add a "&& "Reason!"" to the assert.
1149–1154	Please write comments in full sentences: "We must set the parameter..."
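
The question about line 252 can be made concrete with a standalone sketch of the ARM-mode selection logic. It assumes the if/else chain shown in the diff for emitAligningInstructions; the enum and function names are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the three sequences the patch can emit in ARM mode.
enum class AlignSeq { BFC, BIC, ShiftPair };

// Number of instructions each sequence expands to.
unsigned instructionCount(AlignSeq Seq) {
  return Seq == AlignSeq::ShiftPair ? 2u : 1u;
}

// Mirrors the order of checks in the patch: prefer BFC, fall back to BIC when
// the mask fits in 8 bits, otherwise use the two-instruction shift pair.
AlignSeq chooseArmAlignSeq(bool CanUseBFC, uint32_t Alignment,
                           bool MustBeSingleInstruction) {
  const uint32_t AlignMask = Alignment - 1;
  if (CanUseBFC)
    return AlignSeq::BFC;
  if (AlignMask <= 255)
    return AlignSeq::BIC;
  // Only this path needs two instructions, so only here can the
  // single-instruction requirement be violated.
  assert(!MustBeSingleInstruction &&
         "large alignment without BFC needs two instructions");
  return AlignSeq::ShiftPair;
}
```

In this model the assertion can only fire when BFC is unavailable, the mask exceeds 255, and the caller demands a single instruction. The patch argues that this combination does not occur, because the only caller passing true is the NEON spill path and all ARM architecture versions that support Neon also support BFC.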

This revision now requires changes to proceed. Jan 7 2015, 2:50 AM

Thanks James, I've updated the patch to take your review comments into account.
The only comment I didn't act on is the one about documenting whether Alignment is a log2 value. I feel that documentation would be overkill: it isn't a log2 value, and that should be clear from the context.

One other coding-style issue.

lib/Target/ARM/ARMFrameLowering.cpp
231	Coding style: should be `NrBitsToZero`

Thanks for noticing, Charlie. This is now fixed in the updated patch.

Hi Kristof,

LGTM now.

Cheers,

James

This revision is now accepted and ready to land. Jan 8 2015, 2:09 AM

This got committed back in January as http://llvm.org/viewvc/llvm-project?rev=225446&view=rev

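Revision Contents

Path	Size
lib/Target/ARM/ARMExpandPseudoInsts.cpp	3 lines
lib/Target/ARM/ARMFrameLowering.cpp	103 lines
test/CodeGen/ARM/alloc-no-stack-realign.ll	2 lines
test/CodeGen/ARM/fold-stack-adjust.ll	2 lines
test/CodeGen/ARM/interrupt-attr.ll	14 lines
test/CodeGen/ARM/spill-q.ll	2 lines
test/CodeGen/ARM/stack-alignment.ll	164 lines
test/CodeGen/Thumb2/aligned-spill.ll	8 lines
test/CodeGen/Thumb2/thumb2-spill-q.ll	2 lines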

Diff 17855

lib/Target/ARM/ARMExpandPseudoInsts.cpp

[...]	case ARM::Int_eh_sjlj_dispatchsetup: {
*TII);		*TII);
}		}
// If there's dynamic realignment, adjust for it.		// If there's dynamic realignment, adjust for it.
if (RI.needsStackRealignment(MF)) {		if (RI.needsStackRealignment(MF)) {
MachineFrameInfo *MFI = MF.getFrameInfo();		MachineFrameInfo *MFI = MF.getFrameInfo();
unsigned MaxAlign = MFI->getMaxAlignment();		unsigned MaxAlign = MFI->getMaxAlignment();
assert (!AFI->isThumb1OnlyFunction());		assert (!AFI->isThumb1OnlyFunction());
// Emit bic r6, r6, MaxAlign		// Emit bic r6, r6, MaxAlign
		assert(MaxAlign <= 256 && "The BIC instruction cannot encode "
		"immediates larger than 256 with all lower "
		"bits set.");
unsigned bicOpc = AFI->isThumbFunction() ?		unsigned bicOpc = AFI->isThumbFunction() ?
ARM::t2BICri : ARM::BICri;		ARM::t2BICri : ARM::BICri;
AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, MI.getDebugLoc(),		AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, MI.getDebugLoc(),
TII->get(bicOpc), ARM::R6)		TII->get(bicOpc), ARM::R6)
.addReg(ARM::R6, RegState::Kill)		.addReg(ARM::R6, RegState::Kill)
.addImm(MaxAlign-1)));		.addImm(MaxAlign-1)));
}		}

lib/Target/ARM/ARMFrameLowering.cpp

[...]	for (auto &Info : Insts) {
TII.get(TargetOpcode::CFI_INSTRUCTION))		TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex)		.addCFIIndex(CFIIndex)
.setMIFlags(MachineInstr::FrameSetup);		.setMIFlags(MachineInstr::FrameSetup);
}		}
}		}
};		};
}		}

		/// Emit an instruction sequence that will align the address in
		/// register Reg by zero-ing out the lower bits. For versions of the
		/// architecture that support Neon, this must be done in a single
		/// instruction, since skipAlignedDPRCS2Spills assumes it is done in a
		/// single instruction. That function only gets called when optimizing
		/// spilling of D registers on a core with the Neon instruction set
		/// present.
		static void emitAligningInstructions(MachineFunction &MF, ARMFunctionInfo *AFI,
		const TargetInstrInfo &TII,
		MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MBBI,
		DebugLoc DL, const unsigned Reg,
		const unsigned Alignment,
		const bool MustBeSingleInstruction) {
		const ARMSubtarget &AST = MF.getTarget().getSubtarget<ARMSubtarget>();
		const bool CanUseBFC = AST.hasV6T2Ops() || AST.hasV7Ops();
		const unsigned AlignMask = Alignment - 1;
		const unsigned NrBitsToZero = countTrailingZeros(Alignment);
		assert(!AFI->isThumb1OnlyFunction() && "Thumb1 not supported");
		if (!AFI->isThumbFunction()) {
		// if the BFC instruction is available, use that to zero the lower
		// bits:
		// bfc Reg, #0, log2(Alignment)
		// otherwise use BIC, if the mask to zero the required number of bits
		// can be encoded in the bic immediate field
		// bic Reg, Reg, Alignment-1
		// otherwise, emit
		// lsr Reg, Reg, log2(Alignment)
		// lsl Reg, Reg, log2(Alignment)
		if (CanUseBFC) {
		AddDefaultPred(BuildMI(MBB, MBBI, DL, TII.get(ARM::BFC), Reg)
		.addReg(Reg, RegState::Kill)
		.addImm(~AlignMask));
		} else if (AlignMask <= 255) {
		AddDefaultCC(
		AddDefaultPred(BuildMI(MBB, MBBI, DL, TII.get(ARM::BICri), Reg)
		.addReg(Reg, RegState::Kill)
		.addImm(AlignMask)));
		} else {
		assert(!MustBeSingleInstruction &&
		"Shouldn't call emitAligningInstructions demanding a single "
		"instruction to be emitted for large stack alignment for a target "
		"without BFC.");
		AddDefaultCC(AddDefaultPred(
		BuildMI(MBB, MBBI, DL, TII.get(ARM::MOVsi), Reg)
		.addReg(Reg, RegState::Kill)
		.addImm(ARM_AM::getSORegOpc(ARM_AM::lsr, NrBitsToZero))));
		AddDefaultCC(AddDefaultPred(
		BuildMI(MBB, MBBI, DL, TII.get(ARM::MOVsi), Reg)
		.addReg(Reg, RegState::Kill)
		.addImm(ARM_AM::getSORegOpc(ARM_AM::lsl, NrBitsToZero))));
		}
		} else {
		// Since this is only reached for Thumb-2 targets, the BFC instruction
		// should always be available.
		assert(CanUseBFC);
		AddDefaultPred(BuildMI(MBB, MBBI, DL, TII.get(ARM::t2BFC), Reg)
		.addReg(Reg, RegState::Kill)
		.addImm(~AlignMask));
		}
		}

void ARMFrameLowering::emitPrologue(MachineFunction &MF) const {		void ARMFrameLowering::emitPrologue(MachineFunction &MF) const {
MachineBasicBlock &MBB = MF.front();		MachineBasicBlock &MBB = MF.front();
MachineBasicBlock::iterator MBBI = MBB.begin();		MachineBasicBlock::iterator MBBI = MBB.begin();
MachineFrameInfo *MFI = MF.getFrameInfo();		MachineFrameInfo *MFI = MF.getFrameInfo();
ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();		ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
MCContext &Context = MMI.getContext();		MCContext &Context = MMI.getContext();
const TargetMachine &TM = MF.getTarget();		const TargetMachine &TM = MF.getTarget();
[...]
AFI->setDPRCalleeSavedAreaSize(DPRCSSize);		AFI->setDPRCalleeSavedAreaSize(DPRCSSize);

// If we need dynamic stack realignment, do it here. Be paranoid and make		// If we need dynamic stack realignment, do it here. Be paranoid and make
// sure if we also have VLAs, we have a base pointer for frame access.		// sure if we also have VLAs, we have a base pointer for frame access.
// If aligned NEON registers were spilled, the stack has already been		// If aligned NEON registers were spilled, the stack has already been
// realigned.		// realigned.
if (!AFI->getNumAlignedDPRCS2Regs() && RegInfo->needsStackRealignment(MF)) {		if (!AFI->getNumAlignedDPRCS2Regs() && RegInfo->needsStackRealignment(MF)) {
unsigned MaxAlign = MFI->getMaxAlignment();		unsigned MaxAlign = MFI->getMaxAlignment();
assert (!AFI->isThumb1OnlyFunction());		assert(!AFI->isThumb1OnlyFunction());
if (!AFI->isThumbFunction()) {		if (!AFI->isThumbFunction()) {
// Emit bic sp, sp, MaxAlign		emitAligningInstructions(MF, AFI, TII, MBB, MBBI, dl, ARM::SP, MaxAlign,
AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, dl,		false);
TII.get(ARM::BICri), ARM::SP)
.addReg(ARM::SP, RegState::Kill)
.addImm(MaxAlign-1)));
} else {		} else {
// We cannot use sp as source/dest register here, thus we're emitting the		// We cannot use sp as source/dest register here, thus we're using r4 to
// following sequence:		// perform the calculations. We're emitting the following sequence:
// mov r4, sp		// mov r4, sp
// bic r4, r4, MaxAlign		// -- use emitAligningInstructions to produce best sequence to zero
		// -- out lower bits in r4
// mov sp, r4		// mov sp, r4
// FIXME: It will be better just to find spare register here.		// FIXME: It will be better just to find spare register here.
AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4)		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4)
.addReg(ARM::SP, RegState::Kill));		.addReg(ARM::SP, RegState::Kill));
AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, dl,		emitAligningInstructions(MF, AFI, TII, MBB, MBBI, dl, ARM::R4, MaxAlign,
TII.get(ARM::t2BICri), ARM::R4)		false);
.addReg(ARM::R4, RegState::Kill)
.addImm(MaxAlign-1)));
AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP)		AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP)
.addReg(ARM::R4, RegState::Kill));		.addReg(ARM::R4, RegState::Kill));
}		}

AFI->setShouldRestoreSPFromFP(true);		AFI->setShouldRestoreSPFromFP(true);
}		}

// If we need a base pointer, set it up here. It's whatever the value		// If we need a base pointer, set it up here. It's whatever the value
// of the stack pointer is at this point. Any variable size objects		// of the stack pointer is at this point. Any variable size objects
// will be allocated after this, so we can still use the base pointer		// will be allocated after this, so we can still use the base pointer
[...]	static void emitAlignedDPRCS2Spills(MachineBasicBlock &MBB,
bool isThumb = AFI->isThumbFunction();		bool isThumb = AFI->isThumbFunction();
assert(!AFI->isThumb1OnlyFunction() && "Can't realign stack for thumb1");		assert(!AFI->isThumb1OnlyFunction() && "Can't realign stack for thumb1");
AFI->setShouldRestoreSPFromFP(true);		AFI->setShouldRestoreSPFromFP(true);

// sub r4, sp, #numregs * 8		// sub r4, sp, #numregs * 8
// The immediate is <= 64, so it doesn't need any special encoding.		// The immediate is <= 64, so it doesn't need any special encoding.
unsigned Opc = isThumb ? ARM::t2SUBri : ARM::SUBri;		unsigned Opc = isThumb ? ARM::t2SUBri : ARM::SUBri;
AddDefaultCC(AddDefaultPred(BuildMI(MBB, MI, DL, TII.get(Opc), ARM::R4)		AddDefaultCC(AddDefaultPred(BuildMI(MBB, MI, DL, TII.get(Opc), ARM::R4)
.addReg(ARM::SP)		.addReg(ARM::SP)
.addImm(8 * NumAlignedDPRCS2Regs)));		.addImm(8 * NumAlignedDPRCS2Regs)));

// bic r4, r4, #align-1
Opc = isThumb ? ARM::t2BICri : ARM::BICri;
unsigned MaxAlign = MF.getFrameInfo()->getMaxAlignment();		unsigned MaxAlign = MF.getFrameInfo()->getMaxAlignment();
AddDefaultCC(AddDefaultPred(BuildMI(MBB, MI, DL, TII.get(Opc), ARM::R4)		// We must set parameter MustBeSingleInstruction to true, since
.addReg(ARM::R4, RegState::Kill)		// skipAlignedDPRCS2Spills expects exactly 3 instructions to perform
.addImm(MaxAlign - 1)));		// stack alignment. Luckily, this can always be done since all ARM
		// architecture versions that support Neon also support the BFC
		// instruction.
		emitAligningInstructions(MF, AFI, TII, MBB, MI, DL, ARM::R4, MaxAlign, true);

// mov sp, r4		// mov sp, r4
// The stack pointer must be adjusted before spilling anything, otherwise		// The stack pointer must be adjusted before spilling anything, otherwise
// the stack slots could be clobbered by an interrupt handler.		// the stack slots could be clobbered by an interrupt handler.
// Leave r4 live, it is used below.		// Leave r4 live, it is used below.
Opc = isThumb ? ARM::tMOVr : ARM::MOVr;		Opc = isThumb ? ARM::tMOVr : ARM::MOVr;
MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(Opc), ARM::SP)		MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(Opc), ARM::SP)
.addReg(ARM::R4);		.addReg(ARM::R4);

test/CodeGen/ARM/alloc-no-stack-realign.ll

	%1 = load <16 x float>* %retval			%1 = load <16 x float>* %retval
	store <16 x float> %1, <16 x float>* %agg.result, align 16			store <16 x float> %1, <16 x float>* %agg.result, align 16
	ret void			ret void
	}			}

	define void @test2(<16 x float>* noalias sret %agg.result) nounwind ssp {			define void @test2(<16 x float>* noalias sret %agg.result) nounwind ssp {
	entry:			entry:
	; REALIGN-LABEL: test2			; REALIGN-LABEL: test2
	; REALIGN: bic sp, sp, #63			; REALIGN: bfc sp, #0, #6
	; REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]]			; REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]]
	; REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!			; REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
	; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]			; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
	; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32			; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
	; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]			; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
	; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48			; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48
	; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]			; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]

test/CodeGen/ARM/fold-stack-adjust.ll

	; CHECK: vpop {d6, d7, d8, d9}			; CHECK: vpop {d6, d7, d8, d9}
	; CHECKL pop {r[[GLOBREG]], pc}			; CHECKL pop {r[[GLOBREG]], pc}

	; iOS uses aligned NEON stores here, which is convenient since we			; iOS uses aligned NEON stores here, which is convenient since we
	; want to make sure that works too.			; want to make sure that works too.
	; CHECK-IOS-LABEL: check_vfp_fold:			; CHECK-IOS-LABEL: check_vfp_fold:
	; CHECK-IOS: push {r0, r1, r2, r3, r4, r7, lr}			; CHECK-IOS: push {r0, r1, r2, r3, r4, r7, lr}
	; CHECK-IOS: sub.w r4, sp, #16			; CHECK-IOS: sub.w r4, sp, #16
	; CHECK-IOS: bic r4, r4, #15			; CHECK-IOS: bfc r4, #0, #4
	; CHECK-IOS: mov sp, r4			; CHECK-IOS: mov sp, r4
	; CHECK-IOS: vst1.64 {d8, d9}, [r4:128]			; CHECK-IOS: vst1.64 {d8, d9}, [r4:128]
	; ...			; ...
	; CHECK-IOS: add r4, sp, #16			; CHECK-IOS: add r4, sp, #16
	; CHECK-IOS: vld1.64 {d8, d9}, [r4:128]			; CHECK-IOS: vld1.64 {d8, d9}, [r4:128]
	; CHECK-IOS: mov sp, r4			; CHECK-IOS: mov sp, r4
	; CHECK-IOS: pop {r4, r7, pc}			; CHECK-IOS: pop {r4, r7, pc}

test/CodeGen/ARM/interrupt-attr.ll

define arm_aapcscc void @irq_fn() alignstack(8) "interrupt"="IRQ" {
; Must save all registers except banked sp and lr (we save lr anyway because		; Must save all registers except banked sp and lr (we save lr anyway because
; we actually need it at the end to execute the return ourselves).		; we actually need it at the end to execute the return ourselves).

; Also need special function return setting pc and CPSR simultaneously.		; Also need special function return setting pc and CPSR simultaneously.
; CHECK-A-LABEL: irq_fn:		; CHECK-A-LABEL: irq_fn:
; CHECK-A: push {r0, r1, r2, r3, r10, r11, r12, lr}		; CHECK-A: push {r0, r1, r2, r3, r10, r11, r12, lr}
; CHECK-A: add r11, sp, #20		; CHECK-A: add r11, sp, #20
; CHECK-A-NOT: sub sp, sp, #{{[0-9]+}}		; CHECK-A-NOT: sub sp, sp, #{{[0-9]+}}
; CHECK-A: bic sp, sp, #7		; CHECK-A: bfc sp, #0, #3
; CHECK-A: bl bar		; CHECK-A: bl bar
; CHECK-A: sub sp, r11, #20		; CHECK-A: sub sp, r11, #20
; CHECK-A: pop {r0, r1, r2, r3, r10, r11, r12, lr}		; CHECK-A: pop {r0, r1, r2, r3, r10, r11, r12, lr}
; CHECK-A: subs pc, lr, #4		; CHECK-A: subs pc, lr, #4

; CHECK-A-THUMB-LABEL: irq_fn:		; CHECK-A-THUMB-LABEL: irq_fn:
; CHECK-A-THUMB: push.w {r0, r1, r2, r3, r4, r7, r12, lr}		; CHECK-A-THUMB: push.w {r0, r1, r2, r3, r4, r7, r12, lr}
; CHECK-A-THUMB: add r7, sp, #20		; CHECK-A-THUMB: add r7, sp, #20
; CHECK-A-THUMB: mov r4, sp		; CHECK-A-THUMB: mov r4, sp
; CHECK-A-THUMB: bic r4, r4, #7		; CHECK-A-THUMB: bfc r4, #0, #3
; CHECK-A-THUMB: bl bar		; CHECK-A-THUMB: bl bar
; CHECK-A-THUMB: sub.w r4, r7, #20		; CHECK-A-THUMB: sub.w r4, r7, #20
; CHECK-A-THUMB: mov sp, r4		; CHECK-A-THUMB: mov sp, r4
; CHECK-A-THUMB: pop.w {r0, r1, r2, r3, r4, r7, r12, lr}		; CHECK-A-THUMB: pop.w {r0, r1, r2, r3, r4, r7, r12, lr}
; CHECK-A-THUMB: subs pc, lr, #4		; CHECK-A-THUMB: subs pc, lr, #4

; Normal AAPCS function (r0-r3 pushed onto stack by hardware, lr set to		; Normal AAPCS function (r0-r3 pushed onto stack by hardware, lr set to
; appropriate sentinel so no special return needed).		; appropriate sentinel so no special return needed).
; CHECK-M-LABEL: irq_fn:		; CHECK-M-LABEL: irq_fn:
; CHECK-M: push.w {r4, r10, r11, lr}		; CHECK-M: push.w {r4, r10, r11, lr}
; CHECK-M: add.w r11, sp, #8		; CHECK-M: add.w r11, sp, #8
; CHECK-M: mov r4, sp		; CHECK-M: mov r4, sp
; CHECK-M: bic r4, r4, #7		; CHECK-M: bfc r4, #0, #3
; CHECK-M: mov sp, r4		; CHECK-M: mov sp, r4
; CHECK-M: bl _bar		; CHECK-M: bl _bar
; CHECK-M: sub.w r4, r11, #8		; CHECK-M: sub.w r4, r11, #8
; CHECK-M: mov sp, r4		; CHECK-M: mov sp, r4
; CHECK-M: pop.w {r4, r10, r11, pc}		; CHECK-M: pop.w {r4, r10, r11, pc}

call arm_aapcscc void @bar()		call arm_aapcscc void @bar()
ret void		ret void
}		}

; We don't push/pop r12, as it is banked for FIQ		; We don't push/pop r12, as it is banked for FIQ
define arm_aapcscc void @fiq_fn() alignstack(8) "interrupt"="FIQ" {		define arm_aapcscc void @fiq_fn() alignstack(8) "interrupt"="FIQ" {
; CHECK-A-LABEL: fiq_fn:		; CHECK-A-LABEL: fiq_fn:
; CHECK-A: push {r0, r1, r2, r3, r4, r5, r6, r7, r11, lr}		; CHECK-A: push {r0, r1, r2, r3, r4, r5, r6, r7, r11, lr}
; 32 to get past r0, r1, ..., r7		; 32 to get past r0, r1, ..., r7
; CHECK-A: add r11, sp, #32		; CHECK-A: add r11, sp, #32
; CHECK-A: sub sp, sp, #{{[0-9]+}}		; CHECK-A: sub sp, sp, #{{[0-9]+}}
; CHECK-A: bic sp, sp, #7		; CHECK-A: bfc sp, #0, #3
; [...]		; [...]
; 32 must match above		; 32 must match above
; CHECK-A: sub sp, r11, #32		; CHECK-A: sub sp, r11, #32
; CHECK-A: pop {r0, r1, r2, r3, r4, r5, r6, r7, r11, lr}		; CHECK-A: pop {r0, r1, r2, r3, r4, r5, r6, r7, r11, lr}
; CHECK-A: subs pc, lr, #4		; CHECK-A: subs pc, lr, #4

; CHECK-A-THUMB-LABEL: fiq_fn:		; CHECK-A-THUMB-LABEL: fiq_fn:
; CHECK-M-LABEL: fiq_fn:		; CHECK-M-LABEL: fiq_fn:
%val = load volatile [16 x i32]* @bigvar		%val = load volatile [16 x i32]* @bigvar
store volatile [16 x i32] %val, [16 x i32]* @bigvar		store volatile [16 x i32] %val, [16 x i32]* @bigvar
ret void		ret void
}		}

define arm_aapcscc void @swi_fn() alignstack(8) "interrupt"="SWI" {		define arm_aapcscc void @swi_fn() alignstack(8) "interrupt"="SWI" {
; CHECK-A-LABEL: swi_fn:		; CHECK-A-LABEL: swi_fn:
; CHECK-A: push {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}		; CHECK-A: push {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
; CHECK-A: add r11, sp, #44		; CHECK-A: add r11, sp, #44
; CHECK-A: sub sp, sp, #{{[0-9]+}}		; CHECK-A: sub sp, sp, #{{[0-9]+}}
; CHECK-A: bic sp, sp, #7		; CHECK-A: bfc sp, #0, #3
; [...]		; [...]
; CHECK-A: sub sp, r11, #44		; CHECK-A: sub sp, r11, #44
; CHECK-A: pop {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}		; CHECK-A: pop {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
; CHECK-A: subs pc, lr, #0		; CHECK-A: subs pc, lr, #0

%val = load volatile [16 x i32]* @bigvar		%val = load volatile [16 x i32]* @bigvar
store volatile [16 x i32] %val, [16 x i32]* @bigvar		store volatile [16 x i32] %val, [16 x i32]* @bigvar
ret void		ret void
}		}

define arm_aapcscc void @undef_fn() alignstack(8) "interrupt"="UNDEF" {		define arm_aapcscc void @undef_fn() alignstack(8) "interrupt"="UNDEF" {
; CHECK-A-LABEL: undef_fn:		; CHECK-A-LABEL: undef_fn:
; CHECK-A: push {r0, r1, r2, r3, r10, r11, r12, lr}		; CHECK-A: push {r0, r1, r2, r3, r10, r11, r12, lr}
; CHECK-A: add r11, sp, #20		; CHECK-A: add r11, sp, #20
; CHECK-A-NOT: sub sp, sp, #{{[0-9]+}}		; CHECK-A-NOT: sub sp, sp, #{{[0-9]+}}
; CHECK-A: bic sp, sp, #7		; CHECK-A: bfc sp, #0, #3
; [...]		; [...]
; CHECK-A: sub sp, r11, #20		; CHECK-A: sub sp, r11, #20
; CHECK-A: pop {r0, r1, r2, r3, r10, r11, r12, lr}		; CHECK-A: pop {r0, r1, r2, r3, r10, r11, r12, lr}
; CHECK-A: subs pc, lr, #0		; CHECK-A: subs pc, lr, #0

call void @bar()		call void @bar()
ret void		ret void
}		}

define arm_aapcscc void @abort_fn() alignstack(8) "interrupt"="ABORT" {		define arm_aapcscc void @abort_fn() alignstack(8) "interrupt"="ABORT" {
; CHECK-A-LABEL: abort_fn:		; CHECK-A-LABEL: abort_fn:
; CHECK-A: push {r0, r1, r2, r3, r10, r11, r12, lr}		; CHECK-A: push {r0, r1, r2, r3, r10, r11, r12, lr}
; CHECK-A: add r11, sp, #20		; CHECK-A: add r11, sp, #20
; CHECK-A-NOT: sub sp, sp, #{{[0-9]+}}		; CHECK-A-NOT: sub sp, sp, #{{[0-9]+}}
; CHECK-A: bic sp, sp, #7		; CHECK-A: bfc sp, #0, #3
; [...]		; [...]
; CHECK-A: sub sp, r11, #20		; CHECK-A: sub sp, r11, #20
; CHECK-A: pop {r0, r1, r2, r3, r10, r11, r12, lr}		; CHECK-A: pop {r0, r1, r2, r3, r10, r11, r12, lr}
; CHECK-A: subs pc, lr, #4		; CHECK-A: subs pc, lr, #4

call void @bar()		call void @bar()
ret void		ret void
}		}

test/CodeGen/ARM/spill-q.ll

	; RUN: llc < %s -mtriple=armv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s			; RUN: llc < %s -mtriple=armv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s
	; PR4789			; PR4789

	%bar = type { float, float, float }			%bar = type { float, float, float }
	%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }			%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }
	%foo = type { <4 x float> }			%foo = type { <4 x float> }
	%quux = type { i32 (...)*, %baz, i32 }			%quux = type { i32 (...)*, %baz, i32 }
	%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }			%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }

	declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly			declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly

	define void @aaa(%quuz* %this, i8* %block) {			define void @aaa(%quuz* %this, i8* %block) {
	; CHECK-LABEL: aaa:			; CHECK-LABEL: aaa:
	; CHECK: bic {{.*}}, #15			; CHECK: bfc {{.*}}, #0, #4
	; CHECK: vst1.64 {{.*}}sp:128			; CHECK: vst1.64 {{.*}}sp:128
	; CHECK: vld1.64 {{.*}}sp:128			; CHECK: vld1.64 {{.*}}sp:128
	entry:			entry:
	%aligned_vec = alloca <4 x float>, align 16			%aligned_vec = alloca <4 x float>, align 16
	%"alloca point" = bitcast i32 0 to i32			%"alloca point" = bitcast i32 0 to i32
	%vecptr = bitcast <4 x float>* %aligned_vec to i8*			%vecptr = bitcast <4 x float>* %aligned_vec to i8*
	%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %vecptr, i32 1) nounwind ; <<4 x float>> [#uses=1]			%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %vecptr, i32 1) nounwind ; <<4 x float>> [#uses=1]
	store float 6.300000e+01, float* undef, align 4			store float 6.300000e+01, float* undef, align 4

test/CodeGen/ARM/stack-alignment.ll

This file was added.

				; RUN: llc -verify-machineinstrs < %s -mtriple=armv4t | FileCheck %s -check-prefix=CHECK-v4A32
				; RUN: llc -verify-machineinstrs < %s -mtriple=armv7a | FileCheck %s -check-prefix=CHECK-v7A32
				; RUN: llc -verify-machineinstrs < %s -mtriple=thumbv7a | FileCheck %s -check-prefix=CHECK-THUMB2
				; FIXME: There are no tests for Thumb1 since dynamic stack alignment is not supported for
				; Thumb1.

				define i32 @f_bic_can_be_used_align() nounwind {
				entry:
				; CHECK-LABEL: f_bic_can_be_used_align:
				; CHECK-v7A32: bfc sp, #0, #8
				; CHECK-v4A32: bic sp, sp, #255
				; CHECK-THUMB2: mov r4, sp
				; CHECK-THUMB2-NEXT: bfc r4, #0, #8
				; CHECK-THUMB2-NEXT: mov sp, r4
				%x = alloca i32, align 256
				store volatile i32 0, i32* %x, align 256
				ret i32 0
				}

				define i32 @f_too_large_for_bic_align() nounwind {
				entry:
				; CHECK-LABEL: f_too_large_for_bic_align:
				; CHECK-v7A32: bfc sp, #0, #9
				; CHECK-v4A32: lsr sp, sp, #9
				; CHECK-v4A32: lsl sp, sp, #9
				; CHECK-THUMB2: mov r4, sp
				; CHECK-THUMB2-NEXT: bfc r4, #0, #9
				; CHECK-THUMB2-NEXT: mov sp, r4
				%x = alloca i32, align 512
				store volatile i32 0, i32* %x, align 512
				ret i32 0
				}

				define i8* @f_alignedDPRCS2Spills(double* %d) #0 {
				entry:
				; CHECK-LABEL: f_too_large_for_bic_align:
				; CHECK-v7A32: bfc sp, #0, #12
				; CHECK-v4A32: lsr sp, sp, #12
				; CHECK-v4A32: lsl sp, sp, #12
				; CHECK-THUMB2: bfc r4, #0, #12
				; CHECK-THUMB2-NEXT: mov sp, r4
				%a = alloca i8, align 4096
				%0 = load double* %d, align 4
				%arrayidx1 = getelementptr inbounds double* %d, i32 1
				%1 = load double* %arrayidx1, align 4
				%arrayidx2 = getelementptr inbounds double* %d, i32 2
				%2 = load double* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds double* %d, i32 3
				%3 = load double* %arrayidx3, align 4
				%arrayidx4 = getelementptr inbounds double* %d, i32 4
				%4 = load double* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds double* %d, i32 5
				%5 = load double* %arrayidx5, align 4
				%arrayidx6 = getelementptr inbounds double* %d, i32 6
				%6 = load double* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds double* %d, i32 7
				%7 = load double* %arrayidx7, align 4
				%arrayidx8 = getelementptr inbounds double* %d, i32 8
				%8 = load double* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds double* %d, i32 9
				%9 = load double* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds double* %d, i32 10
				%10 = load double* %arrayidx10, align 4
				%arrayidx11 = getelementptr inbounds double* %d, i32 11
				%11 = load double* %arrayidx11, align 4
				%arrayidx12 = getelementptr inbounds double* %d, i32 12
				%12 = load double* %arrayidx12, align 4
				%arrayidx13 = getelementptr inbounds double* %d, i32 13
				%13 = load double* %arrayidx13, align 4
				%arrayidx14 = getelementptr inbounds double* %d, i32 14
				%14 = load double* %arrayidx14, align 4
				%arrayidx15 = getelementptr inbounds double* %d, i32 15
				%15 = load double* %arrayidx15, align 4
				%arrayidx16 = getelementptr inbounds double* %d, i32 16
				%16 = load double* %arrayidx16, align 4
				%arrayidx17 = getelementptr inbounds double* %d, i32 17
				%17 = load double* %arrayidx17, align 4
				%arrayidx18 = getelementptr inbounds double* %d, i32 18
				%18 = load double* %arrayidx18, align 4
				%arrayidx19 = getelementptr inbounds double* %d, i32 19
				%19 = load double* %arrayidx19, align 4
				%arrayidx20 = getelementptr inbounds double* %d, i32 20
				%20 = load double* %arrayidx20, align 4
				%arrayidx21 = getelementptr inbounds double* %d, i32 21
				%21 = load double* %arrayidx21, align 4
				%arrayidx22 = getelementptr inbounds double* %d, i32 22
				%22 = load double* %arrayidx22, align 4
				%arrayidx23 = getelementptr inbounds double* %d, i32 23
				%23 = load double* %arrayidx23, align 4
				%arrayidx24 = getelementptr inbounds double* %d, i32 24
				%24 = load double* %arrayidx24, align 4
				%arrayidx25 = getelementptr inbounds double* %d, i32 25
				%25 = load double* %arrayidx25, align 4
				%arrayidx26 = getelementptr inbounds double* %d, i32 26
				%26 = load double* %arrayidx26, align 4
				%arrayidx27 = getelementptr inbounds double* %d, i32 27
				%27 = load double* %arrayidx27, align 4
				%arrayidx28 = getelementptr inbounds double* %d, i32 28
				%28 = load double* %arrayidx28, align 4
				%arrayidx29 = getelementptr inbounds double* %d, i32 29
				%29 = load double* %arrayidx29, align 4
				%div = fdiv double %29, %28
				%div30 = fdiv double %div, %27
				%div31 = fdiv double %div30, %26
				%div32 = fdiv double %div31, %25
				%div33 = fdiv double %div32, %24
				%div34 = fdiv double %div33, %23
				%div35 = fdiv double %div34, %22
				%div36 = fdiv double %div35, %21
				%div37 = fdiv double %div36, %20
				%div38 = fdiv double %div37, %19
				%div39 = fdiv double %div38, %18
				%div40 = fdiv double %div39, %17
				%div41 = fdiv double %div40, %16
				%div42 = fdiv double %div41, %15
				%div43 = fdiv double %div42, %14
				%div44 = fdiv double %div43, %13
				%div45 = fdiv double %div44, %12
				%div46 = fdiv double %div45, %11
				%div47 = fdiv double %div46, %10
				%div48 = fdiv double %div47, %9
				%div49 = fdiv double %div48, %8
				%div50 = fdiv double %div49, %7
				%div51 = fdiv double %div50, %6
				%div52 = fdiv double %div51, %5
				%div53 = fdiv double %div52, %4
				%div54 = fdiv double %div53, %3
				%div55 = fdiv double %div54, %2
				%div56 = fdiv double %div55, %1
				%div57 = fdiv double %div56, %0
				%div58 = fdiv double %0, %1
				%div59 = fdiv double %div58, %2
				%div60 = fdiv double %div59, %3
				%div61 = fdiv double %div60, %4
				%div62 = fdiv double %div61, %5
				%div63 = fdiv double %div62, %6
				%div64 = fdiv double %div63, %7
				%div65 = fdiv double %div64, %8
				%div66 = fdiv double %div65, %9
				%div67 = fdiv double %div66, %10
				%div68 = fdiv double %div67, %11
				%div69 = fdiv double %div68, %12
				%div70 = fdiv double %div69, %13
				%div71 = fdiv double %div70, %14
				%div72 = fdiv double %div71, %15
				%div73 = fdiv double %div72, %16
				%div74 = fdiv double %div73, %17
				%div75 = fdiv double %div74, %18
				%div76 = fdiv double %div75, %19
				%div77 = fdiv double %div76, %20
				%div78 = fdiv double %div77, %21
				%div79 = fdiv double %div78, %22
				%div80 = fdiv double %div79, %23
				%div81 = fdiv double %div80, %24
				%div82 = fdiv double %div81, %25
				%div83 = fdiv double %div82, %26
				%div84 = fdiv double %div83, %27
				%div85 = fdiv double %div84, %28
				%div86 = fdiv double %div85, %29
				%mul = fmul double %div57, %div86
				%conv = fptosi double %mul to i32
				%add.ptr = getelementptr inbounds i8* %a, i32 %conv
				ret i8* %add.ptr
				}

test/CodeGen/Thumb2/aligned-spill.ll

	; RUN: llc < %s -mcpu=cortex-a8 -align-neon-spills=0 | FileCheck %s			; RUN: llc < %s -mcpu=cortex-a8 -align-neon-spills=0 | FileCheck %s
	; RUN: llc < %s -mcpu=cortex-a8 -align-neon-spills=1 | FileCheck %s --check-prefix=NEON			; RUN: llc < %s -mcpu=cortex-a8 -align-neon-spills=1 | FileCheck %s --check-prefix=NEON
	target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"			target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
	target triple = "thumbv7-apple-ios"			target triple = "thumbv7-apple-ios"

	; CHECK: f			; CHECK: f
	; This function is forced to spill a double.			; This function is forced to spill a double.
	; Verify that the spill slot is properly aligned.			; Verify that the spill slot is properly aligned.
	;			;
	; The caller-saved r4 is used as a scratch register for stack realignment.			; The caller-saved r4 is used as a scratch register for stack realignment.
	; CHECK: push {r4, r7, lr}			; CHECK: push {r4, r7, lr}
	; CHECK: bic r4, r4, #7			; CHECK: bfc r4, #0, #3
	; CHECK: mov sp, r4			; CHECK: mov sp, r4
	define void @f(double* nocapture %p) nounwind ssp {			define void @f(double* nocapture %p) nounwind ssp {
	entry:			entry:
	%0 = load double* %p, align 4			%0 = load double* %p, align 4
	tail call void asm sideeffect "", "~{d8},~{d9},~{d10},~{d11},~{d12},~{d13},~{d14},~{d15}"() nounwind			tail call void asm sideeffect "", "~{d8},~{d9},~{d10},~{d11},~{d12},~{d13},~{d14},~{d15}"() nounwind
	tail call void @g() nounwind			tail call void @g() nounwind
	store double %0, double* %p, align 4			store double %0, double* %p, align 4
	ret void			ret void
	}			}

	; NEON: f			; NEON: f
	; NEON: push {r4, r7, lr}			; NEON: push {r4, r7, lr}
	; NEON: sub.w r4, sp, #64			; NEON: sub.w r4, sp, #64
	; NEON: bic r4, r4, #15			; NEON: bfc r4, #0, #4
	; Stack pointer must be updated before the spills.			; Stack pointer must be updated before the spills.
	; NEON: mov sp, r4			; NEON: mov sp, r4
	; NEON: vst1.64 {d8, d9, d10, d11}, [r4:128]!			; NEON: vst1.64 {d8, d9, d10, d11}, [r4:128]!
	; NEON: vst1.64 {d12, d13, d14, d15}, [r4:128]			; NEON: vst1.64 {d12, d13, d14, d15}, [r4:128]
	; Stack pointer adjustment for the stack frame contents.			; Stack pointer adjustment for the stack frame contents.
	; This could legally happen before the spills.			; This could legally happen before the spills.
	; Since the spill slot is only 8 bytes, technically it would be fine to only			; Since the spill slot is only 8 bytes, technically it would be fine to only
	; subtract #8 here. That would leave sp less aligned than some stack slots,			; subtract #8 here. That would leave sp less aligned than some stack slots,
	(14 lines not shown)
	entry:			entry:
	tail call void asm sideeffect "", "~{d8},~{d9},~{d10},~{d11},~{d12},~{d13},~{d14}"() nounwind			tail call void asm sideeffect "", "~{d8},~{d9},~{d10},~{d11},~{d12},~{d13},~{d14}"() nounwind
	ret void			ret void
	}			}

	; NEON: f7			; NEON: f7
	; NEON: push {r4, r7, lr}			; NEON: push {r4, r7, lr}
	; NEON: sub.w r4, sp, #56			; NEON: sub.w r4, sp, #56
	; NEON: bic r4, r4, #15			; NEON: bfc r4, #0, #4
	; Stack pointer must be updated before the spills.			; Stack pointer must be updated before the spills.
	; NEON: mov sp, r4			; NEON: mov sp, r4
	; NEON: vst1.64 {d8, d9, d10, d11}, [r4:128]!			; NEON: vst1.64 {d8, d9, d10, d11}, [r4:128]!
	; NEON: vst1.64 {d12, d13}, [r4:128]			; NEON: vst1.64 {d12, d13}, [r4:128]
	; NEON: vstr d14, [r4, #16]			; NEON: vstr d14, [r4, #16]
	; Epilog			; Epilog
	; NEON: vld1.64 {d8, d9, d10, d11},			; NEON: vld1.64 {d8, d9, d10, d11},
	; NEON: vld1.64 {d12, d13},			; NEON: vld1.64 {d12, d13},
	(10 lines not shown)
	}			}

	; Aligned spilling only works for contiguous ranges starting from d8.			; Aligned spilling only works for contiguous ranges starting from d8.
	; The rest goes to the standard vpush instructions.			; The rest goes to the standard vpush instructions.
	; NEON: f3plus4			; NEON: f3plus4
	; NEON: push {r4, r7, lr}			; NEON: push {r4, r7, lr}
	; NEON: vpush {d12, d13, d14, d15}			; NEON: vpush {d12, d13, d14, d15}
	; NEON: sub.w r4, sp, #24			; NEON: sub.w r4, sp, #24
	; NEON: bic r4, r4, #15			; NEON: bfc r4, #0, #4
	; Stack pointer must be updated before the spills.			; Stack pointer must be updated before the spills.
	; NEON: mov sp, r4			; NEON: mov sp, r4
	; NEON: vst1.64 {d8, d9}, [r4:128]			; NEON: vst1.64 {d8, d9}, [r4:128]
	; NEON: vstr d10, [r4, #16]			; NEON: vstr d10, [r4, #16]
	; Epilog			; Epilog
	; NEON: vld1.64 {d8, d9},			; NEON: vld1.64 {d8, d9},
	; NEON: vldr d10, [{{.*}}, #16]			; NEON: vldr d10, [{{.*}}, #16]
	; The stack pointer restore must happen after the reloads.			; The stack pointer restore must happen after the reloads.
	; NEON: mov sp,			; NEON: mov sp,
	; NEON: vpop {d12, d13, d14, d15}			; NEON: vpop {d12, d13, d14, d15}
	; NEON: pop			; NEON: pop
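
The CHECK updates above swap `bic r4, r4, #7` for `bfc r4, #0, #3` and `bic r4, r4, #15` for `bfc r4, #0, #4`. For these small alignments both forms clear the same low bits, which is why only the expected mnemonic changes in the tests. The sketch below models that arithmetic in C++, along with the right-shift/left-shift fallback the summary describes for targets without BFC. The helper names (`alignWithBic`, `alignWithBfc`, `alignWithShifts`) and the sample stack pointer value are hypothetical and purely illustrative; they are not LLVM APIs and do not reflect how the patch is implemented.

```cpp
#include <cstdint>

// Hypothetical model of `bic rd, rd, #mask`: clear every bit set in mask.
constexpr std::uint32_t alignWithBic(std::uint32_t sp, std::uint32_t mask) {
  return sp & ~mask;
}

// Hypothetical model of `bfc rd, #lsb, #width`: clear `width` bits starting
// at bit `lsb`. The width == 32 case is special-cased to avoid shifting a
// 32-bit value by 32, which is undefined behaviour in C++.
constexpr std::uint32_t alignWithBfc(std::uint32_t sp, unsigned lsb,
                                     unsigned width) {
  return sp & ~((width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u)) << lsb);
}

// Hypothetical model of the two-instruction fallback: a logical right shift
// followed by a left shift by the same amount zeroes the low `n` bits.
constexpr std::uint32_t alignWithShifts(std::uint32_t sp, unsigned n) {
  return (sp >> n) << n;
}

// 8-byte and 16-byte alignment: the BIC and BFC forms agree.
static_assert(alignWithBic(0x2000FFF7u, 7u) == alignWithBfc(0x2000FFF7u, 0, 3),
              "bic #7 matches bfc #0, #3");
static_assert(alignWithBic(0x2000FFF7u, 15u) == alignWithBfc(0x2000FFF7u, 0, 4),
              "bic #15 matches bfc #0, #4");

// 4096-byte alignment: BFC and the shift pair still agree.
static_assert(alignWithBfc(0x2000FFF7u, 0, 12) == alignWithShifts(0x2000FFF7u, 12),
              "bfc #0, #12 matches lsr/lsl by 12");
```

The 4096-byte case is the kind of alignment the summary says could not be handled with a single BIC immediate, so the sketch only compares the BFC and shift forms there.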

test/CodeGen/Thumb2/thumb2-spill-q.ll

	; RUN: llc < %s -mtriple=thumbv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s
	; PR4789			; PR4789

	%bar = type { float, float, float }			%bar = type { float, float, float }
	%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }			%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }
	%foo = type { <4 x float> }			%foo = type { <4 x float> }
	%quux = type { i32 (...)*, %baz, i32 }			%quux = type { i32 (...)*, %baz, i32 }
	%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }			%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }

	declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly			declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly

	define void @aaa(%quuz* %this, i8* %block) {			define void @aaa(%quuz* %this, i8* %block) {
	; CHECK-LABEL: aaa:			; CHECK-LABEL: aaa:
	; CHECK: bic r4, r4, #15			; CHECK: bfc r4, #0, #4
	; CHECK: vst1.64 {{.}}[{{.}}:128]			; CHECK: vst1.64 {{.}}[{{.}}:128]
	; CHECK: vld1.64 {{.}}[{{.}}:128]			; CHECK: vld1.64 {{.}}[{{.}}:128]
	entry:			entry:
	%aligned_vec = alloca <4 x float>, align 16			%aligned_vec = alloca <4 x float>, align 16
	%"alloca point" = bitcast i32 0 to i32			%"alloca point" = bitcast i32 0 to i32
	%vecptr = bitcast <4 x float>* %aligned_vec to i8*			%vecptr = bitcast <4 x float>* %aligned_vec to i8*
	%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %vecptr, i32 1) nounwind			%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %vecptr, i32 1) nounwind
	store float 6.300000e+01, float* undef, align 4			store float 6.300000e+01, float* undef, align 4
	(69 lines not shown)

Revision Contents

Diff 17855
