This is an archive of the discontinued LLVM Phabricator instance.

Differential D39599

[ARM] Fix incorrect conversion of a tail call to an ordinary call
Closed · Public

Authored by chill on Nov 3 2017, 7:28 AM.

Details

Reviewers

rengolin
asl
olista01
john.brawn
peter.smith
efriedma

Commits

rGdc86e1444d42: [ARM] Fix incorrect conversion of a tail call to an ordinary call
rL318143: [ARM] Fix incorrect conversion of a tail call to an ordinary call

Summary

Compiling the following program:

int g(int), h(int, int, int, int, int);

int f(int a, int b, int c, int d, int e) {
  a = g(a);
  if (a == -1)
    return -1;
  return h(a, b, c, d, e);
}

with clang -target arm-arm-none-eabi -mcpu=cortex-m23 -O2 produced this assembly:

f:
        .fnstart
        .save   {r4, r5, r6, r7, lr}
        push    {r4, r5, r6, r7, lr}
        .setfp  r7, sp, #12
        add     r7, sp, #12
        .pad    #4
        sub     sp, #4
        mov     r4, r3
        mov     r5, r2
        mov     r6, r1
        bl      g
        adds    r1, r0, #1
        beq     .LBB0_2
        mov     r1, r6
        mov     r2, r5
        mov     r3, r4
        bl      h
        add     sp, #4
        pop     {r4, r5, r6, r7, pc}
.LBB0_2:
        movs    r0, #0
        mvns    r0, r0
        add     sp, #4
        pop     {r4, r5, r6, r7, pc}

Here, the function h is called with an incorrect stack argument. The compiler originally created a tail call to h, but then converted it to an ordinary call because the function saves LR, and restoring LR is more involved for Armv6m/Armv8m.base (a.k.a. "16-bit Thumb"), which negates the benefit of the tail call. Unfortunately, this conversion is incorrect for functions that have stack arguments, because nothing is done to pass those stack arguments to the callee.
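The stack-argument problem can be illustrated with a toy model. This is only a sketch, assuming a full-descending stack of 32-bit words and the frame shown in the assembly above (five pushed registers plus one padding word); `ToyStack`, `kGarbage` and `fifthArgSeenByH` are hypothetical names, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for stack words that were never written (purely illustrative).
constexpr std::int32_t kGarbage = 0x0BADBEEF;

// Toy full-descending stack: `sp` indexes the word at the top of the stack.
struct ToyStack {
  std::vector<std::int32_t> words;
  std::size_t sp;

  explicit ToyStack(std::size_t size) : words(size, kGarbage), sp(size) {}
  void push(std::int32_t value) { words[--sp] = value; }
  void allocate(std::size_t n) { sp -= n; }  // like `sub sp, #4*n`
  void release(std::size_t n) { sp += n; }   // like `add sp, #4*n` or a pop
  std::int32_t top() const { return words[sp]; }
};

// Returns the word that h would read as its fifth argument, i.e. the word
// found at [sp] on entry to h.
std::int32_t fifthArgSeenByH(bool tailCall, std::int32_t e) {
  ToyStack stack(16);
  stack.push(e);               // f's caller stores e at [sp]

  for (int i = 0; i < 5; ++i)  // push {r4, r5, r6, r7, lr}
    stack.push(0);
  stack.allocate(1);           // sub sp, #4 (padding, never written)

  if (tailCall) {
    stack.release(1);          // add sp, #4
    stack.release(5);          // restore r4-r7 and LR, however that is done
  }
  // With an ordinary `bl h`, f's frame is still live and nothing copied e
  // below it, so h finds the padding word instead.
  return stack.top();
}
```

In this model `fifthArgSeenByH(true, 42)` yields 42, because the frame is fully unwound before the branch, while `fifthArgSeenByH(false, 42)` yields the never-written padding word.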

Not doing that conversion and leaving the task of properly restoring LR to emitPopSpecialFixUp solves the correctness problem. The cost is slightly worse code for functions that save LR and tail-call a function without stack arguments.

Now, moving to emitPopSpecialFixUp: when we cannot immediately find a "pop-friendly" register but the epilogue does have a pop instruction, we can use one of the callee-saved low registers as a temporary and restore LR before popping the other callee-saved registers.
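The register search described above can be sketched outside LLVM as follows. This is a simplified model, assuming registers are numbered r0..r15 and represented as bits in a `std::bitset`; `RegSet`, `Temporaries`, `regRange` and `findTemporaries` are hypothetical names, and -1 stands for "no register" (the patch itself uses 0 for that).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumRegs = 16;  // r0..r15 in this toy model
using RegSet = std::bitset<kNumRegs>;

struct Temporaries {
  int popReg = -1;  // free register usable directly in a Thumb1 pop, or -1
  int tmpReg = -1;  // free register that could stash a pop-friendly one, or -1
};

// Builds the set {first, ..., last}.
RegSet regRange(std::size_t first, std::size_t last) {
  RegSet set;
  for (std::size_t reg = first; reg <= last; ++reg)
    set.set(reg);
  return set;
}

// Scans the candidates in ascending order. The first free pop-friendly
// register wins; otherwise the last free non-pop-friendly register is
// remembered as a possible temporary.
Temporaries findTemporaries(const RegSet &candidates,
                            const RegSet &popFriendly,
                            const RegSet &used) {
  Temporaries result;
  for (std::size_t reg = 0; reg < kNumRegs; ++reg) {
    if (!candidates.test(reg) || used.test(reg))
      continue;
    if (popFriendly.test(reg)) {
      result.popReg = static_cast<int>(reg);
      result.tmpReg = -1;
      break;
    }
    result.tmpReg = static_cast<int>(reg);
  }
  return result;
}
```

For example, with r0-r7 as the pop-friendly set and all of them live, no pop register is found. If the search is repeated at a point where only r0-r3 are live (as would be the case just before a pop of r4-r7, assuming r0-r3 carry the callee's arguments), r4 is selected.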

After the patch, the code generated for the tail call looks like:

ldr     r4, [sp, #20]
mov     lr, r4
pop     {r4, r5, r6, r7}
add     sp, #4
b       h

Event Timeline

chill created this revision. Nov 3 2017, 7:28 AM

Herald added subscribers: kristof.beyls, javed.absar, aemerson. Nov 3 2017, 7:28 AM

rogfer01 added a subscriber: rogfer01. Nov 3 2017, 8:22 AM

rogfer01 added inline comments.

lib/Target/ARM/Thumb1FrameLowering.cpp
516	Perhaps you meant `const BitVector &` here?

chill marked an inline comment as done. Nov 3 2017, 9:28 AM

chill added inline comments.

lib/Target/ARM/Thumb1FrameLowering.cpp
516	Yeah, absolutely. Fixed, will upload with other (eventual) changes.

Fix accidental pass-by-value (parameter should have reference type).

Ping?

rengolin added a reviewer: peter.smith. Nov 10 2017, 6:10 AM

Please fix the comment in ARMSubtarget::initSubtargetFeatures to match what you're doing.

Thanks for the comments. Comment in ARMSubtarget::initSubtargetFeatures updated.

Please make it clear in the commit message that this reverts D29020/r294000; otherwise, LGTM.

This revision is now accepted and ready to land. Nov 13 2017, 12:13 PM

Closed by commit rL318143: [ARM] Fix incorrect conversion of a tail call to an ordinary call (authored by chill). Nov 14 2017, 2:37 AM

This revision was automatically updated to reflect the committed changes.

Many thanks for the review.

Revision Contents

Path                                           Size
lib/Target/ARM/ARMSubtarget.cpp                4 lines
lib/Target/ARM/Thumb1FrameLowering.cpp         106 lines
test/CodeGen/ARM/thumb1_return_sequence.ll     41 lines
test/CodeGen/ARM/v8m-tail-call.ll              58 lines
test/CodeGen/Thumb/thumb-shrink-wrapping.ll    9 lines

Diff 122626

lib/Target/ARM/ARMSubtarget.cpp

void ARMSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
[...]
// the Thumb1 16-bit unconditional branch doesn't have sufficient relocation		// the Thumb1 16-bit unconditional branch doesn't have sufficient relocation
// support in the assembler and linker to be used. This would need to be		// support in the assembler and linker to be used. This would need to be
// fixed to fully support tail calls in Thumb1.		// fixed to fully support tail calls in Thumb1.
//		//
// For ARMv8-M, we /do/ implement tail calls. Doing this is tricky for v8-M		// For ARMv8-M, we /do/ implement tail calls. Doing this is tricky for v8-M
// baseline, since the LDM/POP instruction on Thumb doesn't take LR. This		// baseline, since the LDM/POP instruction on Thumb doesn't take LR. This
// means if we need to reload LR, it takes extra instructions, which outweighs		// means if we need to reload LR, it takes extra instructions, which outweighs
// the value of the tail call; but here we don't know yet whether LR is going		// the value of the tail call; but here we don't know yet whether LR is going
// to be used. We generate the tail call here and turn it back into CALL/RET		// to be used. We take the optimistic approach of generating the tail call and
// in emitEpilogue if LR is used.		// perhaps taking a hit if we need to restore the LR.

// Thumb1 PIC calls to external symbols use BX, so they can be tail calls,		// Thumb1 PIC calls to external symbols use BX, so they can be tail calls,
// but we need to make sure there are enough registers; the only valid		// but we need to make sure there are enough registers; the only valid
// registers are the 4 used for parameters. We don't currently do this		// registers are the 4 used for parameters. We don't currently do this
// case.		// case.

SupportsTailCall = !isThumb() || hasV8MBaselineOps();		SupportsTailCall = !isThumb() || hasV8MBaselineOps();

lib/Target/ARM/Thumb1FrameLowering.cpp

bool Thumb1FrameLowering::needPopSpecialFixUp(const MachineFunction &MF) const {
// LR cannot be encoded with Thumb1, i.e., it requires a special fix-up.		// LR cannot be encoded with Thumb1, i.e., it requires a special fix-up.
for (const CalleeSavedInfo &CSI : MF.getFrameInfo().getCalleeSavedInfo())		for (const CalleeSavedInfo &CSI : MF.getFrameInfo().getCalleeSavedInfo())
if (CSI.getReg() == ARM::LR)		if (CSI.getReg() == ARM::LR)
return true;		return true;

return false;		return false;
}		}

		static void findTemporariesForLR(const BitVector &GPRsNoLRSP,
		const BitVector &PopFriendly,
		const LivePhysRegs &UsedRegs, unsigned &PopReg,
		unsigned &TmpReg) {
		PopReg = TmpReg = 0;
		for (auto Reg : GPRsNoLRSP.set_bits()) {
		if (!UsedRegs.contains(Reg)) {
		// Remember the first pop-friendly register and exit.
		if (PopFriendly.test(Reg)) {
		PopReg = Reg;
		TmpReg = 0;
		break;
		}
		// Otherwise, remember that the register will be available to
		// save a pop-friendly register.
		TmpReg = Reg;
		}
		}
		}

bool Thumb1FrameLowering::emitPopSpecialFixUp(MachineBasicBlock &MBB,		bool Thumb1FrameLowering::emitPopSpecialFixUp(MachineBasicBlock &MBB,
bool DoIt) const {		bool DoIt) const {
MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();		ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
unsigned ArgRegsSaveSize = AFI->getArgRegsSaveSize();		unsigned ArgRegsSaveSize = AFI->getArgRegsSaveSize();
const TargetInstrInfo &TII = *STI.getInstrInfo();		const TargetInstrInfo &TII = *STI.getInstrInfo();
const ThumbRegisterInfo *RegInfo =		const ThumbRegisterInfo *RegInfo =
static_cast<const ThumbRegisterInfo *>(STI.getRegisterInfo());		static_cast<const ThumbRegisterInfo *>(STI.getRegisterInfo());
[...]
// Rebuild the GPRs from the high registers because they are removed		// Rebuild the GPRs from the high registers because they are removed
// form the GPR reg class for thumb1.		// form the GPR reg class for thumb1.
BitVector GPRsNoLRSP =		BitVector GPRsNoLRSP =
TRI.getAllocatableSet(MF, TRI.getRegClass(ARM::hGPRRegClassID));		TRI.getAllocatableSet(MF, TRI.getRegClass(ARM::hGPRRegClassID));
GPRsNoLRSP |= PopFriendly;		GPRsNoLRSP |= PopFriendly;
GPRsNoLRSP.reset(ARM::LR);		GPRsNoLRSP.reset(ARM::LR);
GPRsNoLRSP.reset(ARM::SP);		GPRsNoLRSP.reset(ARM::SP);
GPRsNoLRSP.reset(ARM::PC);		GPRsNoLRSP.reset(ARM::PC);
for (unsigned Register : GPRsNoLRSP.set_bits()) {		findTemporariesForLR(GPRsNoLRSP, PopFriendly, UsedRegs, PopReg, TemporaryReg);
if (!UsedRegs.contains(Register)) {
// Remember the first pop-friendly register and exit.		// If we couldn't find a pop-friendly register, restore LR before popping the
if (PopFriendly.test(Register)) {		// other callee-saved registers, so we can use one of them as a temporary.
PopReg = Register;		bool UseLDRSP = false;
TemporaryReg = 0;		if (!PopReg && MBBI != MBB.begin()) {
break;		auto PrevMBBI = MBBI;
}		PrevMBBI--;
// Otherwise, remember that the register will be available to		if (PrevMBBI->getOpcode() == ARM::tPOP) {
// save a pop-friendly register.		MBBI = PrevMBBI;
TemporaryReg = Register;		UsedRegs.stepBackward(*MBBI);
		findTemporariesForLR(GPRsNoLRSP, PopFriendly, UsedRegs, PopReg, TemporaryReg);
		UseLDRSP = true;
}		}
}		}

if (!DoIt && !PopReg && !TemporaryReg)		if (!DoIt && !PopReg && !TemporaryReg)
return false;		return false;

assert((PopReg || TemporaryReg) && "Cannot get LR");		assert((PopReg || TemporaryReg) && "Cannot get LR");

		if (UseLDRSP) {
		assert(PopReg && "Do not know how to get LR");
		// Load the LR via LDR tmp, [SP, #off]
		BuildMI(MBB, MBBI, dl, TII.get(ARM::tLDRspi))
		.addReg(PopReg, RegState::Define)
		.addReg(ARM::SP)
		.addImm(MBBI->getNumOperands() - 3)
		.add(predOps(ARMCC::AL));
		// Move from the temporary register to the LR.
		BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
		.addReg(ARM::LR, RegState::Define)
		.addReg(PopReg, RegState::Kill)
		.add(predOps(ARMCC::AL));
		// Advance past the pop instruction.
		MBBI++;
		// Increment the SP.
		emitSPUpdate(MBB, MBBI, TII, dl, *RegInfo, ArgRegsSaveSize + 4);
		return true;
		}

if (TemporaryReg) {		if (TemporaryReg) {
assert(!PopReg && "Unnecessary MOV is about to be inserted");		assert(!PopReg && "Unnecessary MOV is about to be inserted");
PopReg = PopFriendly.find_first();		PopReg = PopFriendly.find_first();
BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))		BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr))
.addReg(TemporaryReg, RegState::Define)		.addReg(TemporaryReg, RegState::Define)
.addReg(PopReg, RegState::Kill)		.addReg(PopReg, RegState::Kill)
.add(predOps(ARMCC::AL));		.add(predOps(ARMCC::AL));
}		}
[...]
for (unsigned i = CSI.size(); i != 0; --i) {
unsigned Reg = Info.getReg();		unsigned Reg = Info.getReg();

// High registers (excluding lr) have already been dealt with		// High registers (excluding lr) have already been dealt with
if (!(ARM::tGPRRegClass.contains(Reg) || Reg == ARM::LR))		if (!(ARM::tGPRRegClass.contains(Reg) || Reg == ARM::LR))
continue;		continue;

if (Reg == ARM::LR) {		if (Reg == ARM::LR) {
Info.setRestored(false);		Info.setRestored(false);
if (MBB.succ_empty()) {		if (!MBB.succ_empty() ||
		MI->getOpcode() == ARM::TCRETURNdi ||
		MI->getOpcode() == ARM::TCRETURNri)
		// LR may only be popped into PC, as part of return sequence.
		// If this isn't the return sequence, we'll need emitPopSpecialFixUp
		// to restore LR the hard way.
		// FIXME: if we don't pass any stack arguments it would be actually
		// advantageous and correct to do the conversion to an ordinary call
		// instruction here.
		continue;
// Special epilogue for vararg functions. See emitEpilogue		// Special epilogue for vararg functions. See emitEpilogue
if (isVarArg)		if (isVarArg)
continue;		continue;
// ARMv4T requires BX, see emitEpilogue		// ARMv4T requires BX, see emitEpilogue
if (!STI.hasV5TOps())		if (!STI.hasV5TOps())
continue;		continue;
// Tailcall optimization failed; change TCRETURN to a tBL
if (MI->getOpcode() == ARM::TCRETURNdi ||		// Pop LR into PC.
MI->getOpcode() == ARM::TCRETURNri) {
unsigned Opcode = MI->getOpcode() == ARM::TCRETURNdi
? ARM::tBL : ARM::tBLXr;
MachineInstrBuilder BL = BuildMI(MF, DL, TII.get(Opcode));
BL.add(predOps(ARMCC::AL));
BL.add(MI->getOperand(0));
MBB.insert(MI, &*BL);
}
Reg = ARM::PC;		Reg = ARM::PC;
(*MIB).setDesc(TII.get(ARM::tPOP_RET));		(*MIB).setDesc(TII.get(ARM::tPOP_RET));
if (MI != MBB.end())		if (MI != MBB.end())
MIB.copyImplicitOps(*MI);		MIB.copyImplicitOps(*MI);
MI = MBB.erase(MI);		MI = MBB.erase(MI);
} else
// LR may only be popped into PC, as part of return sequence.
// If this isn't the return sequence, we'll need emitPopSpecialFixUp
// to restore LR the hard way.
continue;
}		}
MIB.addReg(Reg, getDefRegState(true));		MIB.addReg(Reg, getDefRegState(true));
NeedsPop = true;		NeedsPop = true;
}		}

// It's illegal to emit pop instruction without operands.		// It's illegal to emit pop instruction without operands.
if (NeedsPop)		if (NeedsPop)
MBB.insert(MI, &*MIB);		MBB.insert(MI, &*MIB);
else		else
MF.DeleteMachineInstr(MIB);		MF.DeleteMachineInstr(MIB);

return true;		return true;
}		}
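The restructured LR handling in the hunk above amounts to a single condition for when the saved LR may be popped straight into PC. A minimal sketch of that condition, assuming a plain struct of booleans in place of the real LLVM queries (`EpilogueContext` and `canPopLRIntoPC` are hypothetical names):

```cpp
#include <cassert>

// Stand-ins for the queries made in the hunk above (assumed names).
struct EpilogueContext {
  bool blockHasSuccessors;  // models !MBB.succ_empty()
  bool endsInTailCall;      // models the TCRETURNdi / TCRETURNri check
  bool isVarArg;            // vararg functions use a special epilogue
  bool hasV5TOps;           // ARMv4T must return with BX instead
};

// True when the saved LR can be popped directly into PC as the return.
// In every other case LR is left for emitPopSpecialFixUp to restore.
constexpr bool canPopLRIntoPC(const EpilogueContext &ctx) {
  return !ctx.blockHasSuccessors && !ctx.endsInTailCall && !ctx.isVarArg &&
         ctx.hasV5TOps;
}
```

Under this model a plain return block on a v5T-or-later target gives true, and the same block ending in a tail call gives false, which is the case the patch stops converting into an ordinary call.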

test/CodeGen/ARM/thumb1_return_sequence.ll

[...]
; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}
store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %a, align 16		store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %a, align 16
%0 = load <4 x i32>, <4 x i32>* %a, align 16		%0 = load <4 x i32>, <4 x i32>* %a, align 16
ret <4 x i32> %0		ret <4 x i32> %0

; Epilogue		; Epilogue
; --------		; --------
; Stack realignment means sp is restored from frame pointer		; Stack realignment means sp is restored from frame pointer
; CHECK-V4T: mov sp		; CHECK-V4T: mov sp
		; CHECK-V4T-NEXT: ldr [[POP:r[4567]]], [sp, #{{.*}}]
		; CHECK-V4T-NEXT: mov lr, [[POP]]
; CHECK-V4T-NEXT: pop {[[SAVED]]}		; CHECK-V4T-NEXT: pop {[[SAVED]]}
		; CHECK-V4T-NEXT add sp, sp, #4
; The ISA for v4 does not support pop pc, so make sure we do not emit		; The ISA for v4 does not support pop pc, so make sure we do not emit
; one even when we do not need to update SP.		; one even when we do not need to update SP.
; CHECK-V4T-NOT: pop {pc}		; CHECK-V4T-NOT: pop {pc}
; We may only use lo register to pop, but in that case, all the scratch		; CHECK-V4T: bx lr
; ones are used.
; r12 is the only register we are allowed to clobber for AAPCS.
; Use it to save a lo register.
; CHECK-V4T-NEXT: mov [[TEMP_REG:r12]], [[POP_REG:r[0-7]]]
; Pop the value of LR.
; CHECK-V4T-NEXT: pop {[[POP_REG]]}
; Copy the value of LR in the right register.
; CHECK-V4T-NEXT: mov lr, [[POP_REG]]
; Restore the value that was in the register we used to pop the value of LR.
; CHECK-V4T-NEXT: mov [[POP_REG]], [[TEMP_REG]]
; Return.
; CHECK-V4T-NEXT: bx lr
; CHECK-V5T: pop {[[SAVED]], pc}		; CHECK-V5T: pop {[[SAVED]], pc}
}		}

; CHECK-V4T-LABEL: clobbervariadicframe		; CHECK-V4T-LABEL: clobbervariadicframe
; CHECK-V5T-LABEL: clobbervariadicframe		; CHECK-V5T-LABEL: clobbervariadicframe
define <4 x i32> @clobbervariadicframe(i32 %i, ...) #0 {		define <4 x i32> @clobbervariadicframe(i32 %i, ...) #0 {
entry:		entry:
; Prologue		; Prologue
; --------		; --------
; CHECK-V4T: sub sp,		; CHECK-V4T: sub sp,
; CHECK-V4T: push {[[SAVED:(r[4567](, )?)+]], lr}		; CHECK-V4T: push {[[SAVED:(r[4567](, )?)+]], lr}
; CHECK-V5T: sub sp,		; CHECK-V5T: sub sp,
; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}		; CHECK-V5T: push {[[SAVED:(r[4567](, )?)+]], lr}

%b = alloca <4 x i32>, align 16		%b = alloca <4 x i32>, align 16
%a = alloca <4 x i32>, align 16		%a = alloca <4 x i32>, align 16
store <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32>* %b, align 16		store <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32>* %b, align 16
store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %a, align 16		store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %a, align 16
%0 = load <4 x i32>, <4 x i32>* %a, align 16		%0 = load <4 x i32>, <4 x i32>* %a, align 16
call void @llvm.va_start(i8* null)		call void @llvm.va_start(i8* null)
ret <4 x i32> %0		ret <4 x i32> %0

; Epilogue		; Epilogue
; --------		; --------
; CHECK-V4T: pop {[[SAVED]]}		; CHECK-V4T: ldr [[POP:r[4567]]], [sp, #{{.*}}]
; CHECK-V4T-NEXT: mov r12, [[POP_REG:r[0-7]]]		; CHECK-V4T-NEXT: mov lr, [[POP]]
; CHECK-V4T-NEXT: pop {[[POP_REG]]}		; CHECK-V4T-NEXT: pop {[[SAVED]]}
; CHECK-V4T-NEXT: add sp,		; CHECK-V4T-NEXT: add sp, #16
; CHECK-V4T-NEXT: mov lr, [[POP_REG]]		; CHECK-V4T-NEXT: bx lr
; CHECK-V4T-NEXT: mov [[POP_REG]], r12
; CHECK-V4T: bx lr
; CHECK-V5T: lsls r4		; CHECK-V5T: lsls r4
; CHECK-V5T-NEXT: mov sp, r4		; CHECK-V5T-NEXT: mov sp, r4
; CHECK-V5T: pop {[[SAVED]]}		; CHECK-V5T: ldr [[POP:r[4567]]], [sp, #{{.*}}]
; CHECK-V5T-NEXT: mov r12, [[POP_REG:r[0-7]]]		; CHECK-V5T-NEXT: mov lr, [[POP]]
; CHECK-V5T-NEXT: pop {[[POP_REG]]}		; CHECK-V5T-NEXT: pop {[[SAVED]]}
; CHECK-V5T-NEXT: add sp,		; CHECK-V5T-NEXT: add sp, #16
; CHECK-V5T-NEXT: mov lr, [[POP_REG]]
; CHECK-V5T-NEXT: mov [[POP_REG]], r12
; CHECK-V5T-NEXT: bx lr		; CHECK-V5T-NEXT: bx lr
}		}

; CHECK-V4T-LABEL: simpleframe		; CHECK-V4T-LABEL: simpleframe
; CHECK-V5T-LABEL: simpleframe		; CHECK-V5T-LABEL: simpleframe
define i32 @simpleframe(<6 x i32>* %p) #0 {		define i32 @simpleframe(<6 x i32>* %p) #0 {
entry:		entry:
; Prologue		; Prologue
; --------		; --------

test/CodeGen/ARM/v8m-tail-call.ll

	; RUN: llc %s -o - -mtriple=thumbv8m.base | FileCheck %s			; RUN: llc %s -o - -mtriple=thumbv8m.base | FileCheck %s

	define void @test() {			declare i32 @g(...)
	; CHECK-LABEL: test:
	entry:			declare i32 @h0(i32, i32, i32, i32)
	%call = tail call i32 @foo()			define hidden i32 @f0() {
	%tail = tail call i32 @foo()			%1 = tail call i32 bitcast (i32 (...)* @g to i32 ()*)()
	ret void			%2 = tail call i32 @h0(i32 %1, i32 1, i32 2, i32 3)
	; CHECK: bl foo			ret i32 %2
	; CHECK: bl foo			; CHECK-LABEL: f0
	; CHECK-NOT: b foo			; CHECK: ldr [[POP:r[4567]]], [sp
				; CHECK-NEXT: mov lr, [[POP]]
				; CHECK-NEXT: pop {{.*}}[[POP]]
				; CHECK-NEXT: add sp, #4
				; CHECK-NEXT: b h0
	}			}

	define void @test2() {			declare i32 @h1(i32)
	; CHECK-LABEL: test2:			define hidden i32 @f1() {
	entry:			%1 = tail call i32 bitcast (i32 (...)* @g to i32 ()*)()
	%tail = tail call i32 @foo()			%2 = tail call i32 @h1(i32 %1)
	ret void			ret i32 %2
	; CHECK: b foo			; CHECK-LABEL: f1
	; CHECK-NOT: bl foo			; CHECK: pop {r7}
				; CHECK: pop {r1}
				; CHECK: mov lr, r1
				; CHECK: b h1
	}			}

	declare i32 @foo()			declare i32 @h2(i32, i32, i32, i32, i32)
				define hidden i32 @f2(i32, i32, i32, i32, i32) {
				%6 = tail call i32 bitcast (i32 (...)* @g to i32 ()*)()
				%7 = icmp eq i32 %6, 0
				br i1 %7, label %10, label %8

				%9 = tail call i32 @h2(i32 %6, i32 %1, i32 %2, i32 %3, i32 %4)
				br label %10

				%11 = phi i32 [ %9, %8 ], [ -1, %5 ]
				ret i32 %11
				; CHECK-LABEL: f2
				; CHECK: ldr [[POP:r[4567]]], [sp
				; CHECK-NEXT: mov lr, [[POP]]
				; CHECK-NEXT: pop {{.*}}[[POP]]
				; CHECK-NEXT: add sp, #4
				; CHECK-NEXT: b h2
				}

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

	define i1 @beq_to_bx(i32* %y, i32 %head) {			define i1 @beq_to_bx(i32* %y, i32 %head) {
	; CHECK-LABEL: beq_to_bx:			; CHECK-LABEL: beq_to_bx:
	; DISABLE: push {r4, lr}			; DISABLE: push {r4, lr}
	; CHECK: cmp r2, #0			; CHECK: cmp r2, #0
	; CHECK-NEXT: beq [[EXIT_LABEL:LBB[0-9_]+]]			; CHECK-NEXT: beq [[EXIT_LABEL:LBB[0-9_]+]]
	; ENABLE: push {r4, lr}			; ENABLE: push {r4, lr}

	; CHECK: tst r3, r4			; CHECK: tst r3, r4
	; ENABLE-NEXT: pop {r4}			; ENABLE-NEXT: ldr [[POP:r[4567]]], [sp, #8]
	; ENABLE-NEXT: mov r12, r{{.*}}			; ENABLE-NEXT: mov lr, [[POP]]
	; ENABLE-NEXT: pop {r0}			; ENABLE-NEXT: pop {[[POP]]}
	; ENABLE-NEXT: mov lr, r0			; ENABLE-NEXT: add sp, #4
	; ENABLE-NEXT: mov r0, r12
	; CHECK-NEXT: beq [[EXIT_LABEL]]			; CHECK-NEXT: beq [[EXIT_LABEL]]

	; CHECK: str r1, [r2]			; CHECK: str r1, [r2]
	; CHECK: str r3, [r2]			; CHECK: str r3, [r2]
	; CHECK-NEXT: movs r0, #0			; CHECK-NEXT: movs r0, #0
	; CHECK-NEXT: [[EXIT_LABEL]]: @ %cleanup			; CHECK-NEXT: [[EXIT_LABEL]]: @ %cleanup
	; ENABLE-NEXT: bx lr			; ENABLE-NEXT: bx lr
	; DISABLE-V5-NEXT: pop {r4, pc}			; DISABLE-V5-NEXT: pop {r4, pc}