This is an archive of the discontinued LLVM Phabricator instance.

Implement X86 code generation for musttail
Closed · Public

Authored by rnk on Apr 24 2014, 3:29 PM.

Details

Reviewers: None
Commits: rGfb6930856838: Implement X86 code generation for musttail
rL207598: Implement X86 code generation for musttail

Summary

Currently, musttail codegen relies on sibcall optimization and reports a
fatal error if that fails. Sibcall optimization fails whenever stack
arguments need to be modified, so relying on it is insufficient for musttail.

The logic for moving arguments in memory safely is already implemented
for GuaranteedTailCallOpt. This change merely arranges for musttail
calls to use it.
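Why does argument movement need special care? A tail call writes its outgoing stack arguments into the same slots that hold the caller's incoming arguments. The following is a minimal plain C++ model of that hazard, not LLVM code:

- A two-element `std::array` stands in for the argument area.
- The function names are hypothetical.
- It compares storing each argument eagerly with loading every incoming argument first. The second approach is the ordering that `DAG.getStackArgumentTokenFactor` enforces in the diff.

```cpp
#include <array>
#include <cassert>

// Hypothetical model: the caller's incoming stack-argument area, which a tail
// call reuses as the callee's outgoing argument area.
using ArgArea = std::array<int, 2>;

// Naive lowering of a tail call that forwards f(a, b) as g(b, a): each
// outgoing argument is stored as soon as it is produced, so the second store
// reads slot 0 after the first store has already overwritten it.
ArgArea forwardSwappedNaive(ArgArea area) {
  area[0] = area[1];  // outgoing arg 0 <- incoming arg 1
  area[1] = area[0];  // outgoing arg 1 <- "incoming arg 0", already clobbered
  return area;
}

// Safe lowering: read every incoming stack argument before writing any
// outgoing one (the ordering the token factor over the incoming stack
// arguments imposes on the SelectionDAG).
ArgArea forwardSwappedSafe(ArgArea area) {
  const int in0 = area[0];
  const int in1 = area[1];
  area[0] = in1;
  area[1] = in0;
  return area;
}
```

The naive version turns (1, 2) into (2, 2) instead of (2, 1). As the comment in LowerCall notes, the real lowering is conservative: every store waits on every incoming load, not only on the loads it would clobber.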

No functional change for GuaranteedTailCallOpt.
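The classification this change introduces can be modelled in a few lines of standalone C++. This is a sketch, not LLVM code:

- `CallFlags`, `Decision` and `classifyCall` are hypothetical names.
- `Eligible` stands in for the result of `IsEligibleForTailCallOptimization`.
- The control flow follows the LowerCall hunks in Diff 9546.

```cpp
#include <cassert>

// Hypothetical bundle of the inputs LowerCall consults when classifying a call.
struct CallFlags {
  bool IsMustTail;            // call site is marked musttail
  bool IsTailCall;            // call site is marked tail
  bool DisableTailCalls;      // TargetOptions::DisableTailCalls
  bool GuaranteedTailCallOpt; // TargetOptions::GuaranteedTailCallOpt (-tailcallopt)
  bool Eligible;              // what IsEligibleForTailCallOptimization would return
};

enum class Lowering {
  NormalCall,            // ordinary call followed by a return
  Sibcall,               // jump; outgoing stack arguments are left untouched
  TailCallWithArgStores  // jump after storing arguments to the real stack slots
};

struct Decision {
  Lowering Kind;
  bool MayMoveReturnAddress;  // only non-musttail tail calls compute FPDiff
};

Decision classifyCall(const CallFlags &F) {
  bool isTailCall = F.IsTailCall && !F.DisableTailCalls;
  bool isSibcall = false;
  if (F.IsMustTail) {
    // musttail is a correctness requirement: force the tail call, skip the
    // eligibility check, and never take the sibcall path.
    isTailCall = true;
  } else if (isTailCall) {
    isTailCall = F.Eligible;
    // Without -tailcallopt, an eligible tail call becomes a sibcall.
    if (!F.GuaranteedTailCallOpt && isTailCall)
      isSibcall = true;
  }
  if (!isTailCall)
    return {Lowering::NormalCall, false};
  if (isSibcall)
    return {Lowering::Sibcall, false};
  return {Lowering::TailCallWithArgStores, !F.IsMustTail};
}
```

For example, a musttail call compiled with -disable-tail-calls still classifies as a tail call with argument stores. That is what the new -disable-tail-calls RUN line in musttail.ll checks.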

Diff Detail

Repository: rL LLVM

Event Timeline

rnk updated this revision to Diff 8819. (Apr 24 2014, 3:29 PM)

rnk retitled this revision to Implement X86 code generation for musttail.

rnk updated this object.

rnk added a subscriber: Unknown Object (MLST).

Would it be possible to instead change IsEligibleForTailCallOptimization to allow sib calls in cases where arguments have to be updated? For example, we should be able to consider t4 in musttail.ll to have a sibcall, no?

lib/Target/X86/X86ISelLowering.cpp
Line 2538 (on Diff #8819): This is redundant with the following if. Maybe just leaving this if as is and moving the IsMustTail definition down would be easier to read?
Line 2546 (on Diff #8819): IsSibcall is already false in here.

> Would it be possible to instead change IsEligibleForTailCallOptimization to allow sib calls in cases where arguments have to be updated?

Maybe. For musttail, we always have to be able to eliminate the tail call, regardless of what IsEligibleForTailCallOptimization says.

I started this change by making IsEligibleForTailCallOptimization return true in more cases, but ultimately I gave up when I encountered MatchingStackOffset(). Evan's vision for sibcall optimization seems to be that it won't form any load and store DAG nodes. If you go back to the old llvmdev discussion about it, he felt that -tailcallopt, or at least the DAG that it created, wasn't profitable.

IMO we could revise that to: don't do sibcall optimization if we have to move the return address, and don't generate loads and stores to move parameters if we can safely detect that we don't have to (i.e. what MatchingStackOffset does).
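The second half of that rule can be made concrete with a small model. This is a simplification under stated assumptions, not how LLVM's MatchingStackOffset() is implemented; that function inspects the SelectionDAG nodes producing each argument. `StackArg`, `alreadyInPlace` and `storesNeeded` are hypothetical names. In the model, an outgoing stack argument needs a store unless it is the caller's own incoming argument, unmodified, at the same offset and with the same size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one outgoing argument passed in memory.
struct StackArg {
  int OutOffset;       // offset of the outgoing slot in the argument area
  int Size;            // size of the outgoing argument in bytes
  bool IsIncomingArg;  // value is one of the caller's own stack arguments, unmodified
  int InOffset;        // that incoming argument's offset (meaningful if IsIncomingArg)
  int InSize;          // that incoming argument's size (meaningful if IsIncomingArg)
};

// Simplified version of the idea behind MatchingStackOffset(): the store can
// be skipped only when the value already sits in exactly the slot it needs.
bool alreadyInPlace(const StackArg &A) {
  return A.IsIncomingArg && A.InOffset == A.OutOffset && A.InSize == A.Size;
}

// Number of stores needed to rewrite the argument area before the jump.
// Zero means the call could stay a store-free sibcall.
int storesNeeded(const std::vector<StackArg> &Args) {
  int N = 0;
  for (const StackArg &A : Args)
    if (!alreadyInPlace(A))
      ++N;
  return N;
}
```

Consider t4 from musttail.ll, which forwards %fn unchanged but passes %n - 1 and %r + 1. Under this model it needs two stores, matching the two `movl ..., N(%esp)` lines its CHECKs expect.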

> For example, we should be able to consider t4 in musttail.ll to have a sibcall, no?

That would be a large, separate change for sibcall optimization, so I skipped it.

Do you think it's important to finish off this sibcall improvement before continuing with musttail, or can it wait?

lib/Target/X86/X86ISelLowering.cpp
Line 2538 (on Diff #8819): I agree

No, you have a good point. Changing sibcall is an optimization, so
there are tradeoffs to be discussed. With musttail it is a
correctness problem, so we can fix that first.

Comment at: lib/Target/X86/X86ISelLowering.cpp:2538
@@ -2536,3 +2537,3 @@
 
-  if (MF.getTarget().Options.DisableTailCalls)
+  if (!IsMustTail && MF.getTarget().Options.DisableTailCalls)
     isTailCall = false;
Rafael Ávila de Espíndola wrote:

> This is redundant with the following if. Maybe just leaving this if as is and moving the IsMustTail definition down would be easier to read?

I agree

LGTM with the nits fixed.

Closed by commit rL207598 (authored by @rnk).

Revision Contents

Path                                                Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp       95 lines
llvm/trunk/test/CodeGen/X86/musttail-indirect.ll    124 lines
llvm/trunk/test/CodeGen/X86/musttail-thiscall.ll    31 lines
llvm/trunk/test/CodeGen/X86/musttail.ll             75 lines

Diff 9546

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

@@ 2,492 unchanged lines hidden @@ X86TargetLowering::EmitTailCallLoadRetAddr(SelectionDAG &DAG,
   // Load the "old" Return address.
   OutRetAddr = DAG.getLoad(VT, dl, Chain, OutRetAddr, MachinePointerInfo(),
                            false, false, false, 0);
   return SDValue(OutRetAddr.getNode(), 1);
 }
 
 /// EmitTailCallStoreRetAddr - Emit a store of the return address if tail call
 /// optimization is performed and it is required (FPDiff!=0).
-static SDValue
-EmitTailCallStoreRetAddr(SelectionDAG & DAG, MachineFunction &MF,
-                         SDValue Chain, SDValue RetAddrFrIdx, EVT PtrVT,
-                         unsigned SlotSize, int FPDiff, SDLoc dl) {
+static SDValue EmitTailCallStoreRetAddr(SelectionDAG &DAG, MachineFunction &MF,
+                                        SDValue Chain, SDValue RetAddrFrIdx,
+                                        EVT PtrVT, unsigned SlotSize,
+                                        int FPDiff, SDLoc dl) {
   // Store the return address to the appropriate stack slot.
   if (!FPDiff) return Chain;
   // Calculate the new stack slot for the return address.
   int NewReturnAddrFI =
     MF.getFrameInfo()->CreateFixedObject(SlotSize, (int64_t)FPDiff - SlotSize,
                                          false);
   SDValue NewRetAddrFrIdx = DAG.getFrameIndex(NewReturnAddrFI, PtrVT);
   Chain = DAG.getStore(Chain, dl, RetAddrFrIdx, NewRetAddrFrIdx,
@@ 20 unchanged lines hidden @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   bool Is64Bit = Subtarget->is64Bit();
   bool IsWin64 = Subtarget->isCallingConvWin64(CallConv);
   StructReturnType SR = callIsStructReturn(Outs);
   bool IsSibcall = false;
 
   if (MF.getTarget().Options.DisableTailCalls)
     isTailCall = false;
 
-  if (isTailCall) {
+  bool IsMustTail = CLI.CS && CLI.CS->isMustTailCall();
+  if (IsMustTail) {
+    // Force this to be a tail call. The verifier rules are enough to ensure
+    // that we can lower this successfully without moving the return address
+    // around.
+    isTailCall = true;
+  } else if (isTailCall) {
     // Check if it's really possible to do a tail call.
     isTailCall = IsEligibleForTailCallOptimization(Callee, CallConv,
                     isVarArg, SR != NotStructReturn,
                     MF.getFunction()->hasStructRetAttr(), CLI.RetTy,
                     Outs, OutVals, Ins, DAG);
 
-    if (!isTailCall && CLI.CS && CLI.CS->isMustTailCall())
-      report_fatal_error("failed to perform tail call elimination on a call "
-                         "site marked musttail");
-
     // Sibcalls are automatically detected tailcalls which do not require
     // ABI changes.
     if (!MF.getTarget().Options.GuaranteedTailCallOpt && isTailCall)
       IsSibcall = true;
 
     if (isTailCall)
       ++NumTailCalls;
   }
@@ 18 unchanged lines hidden @@ if (IsSibcall)
     // This is a sibcall. The memory operands are available in caller's
     // own caller's stack.
     NumBytes = 0;
   else if (getTargetMachine().Options.GuaranteedTailCallOpt &&
            IsTailCallConvention(CallConv))
     NumBytes = GetAlignedArgumentStackSize(NumBytes, DAG);
 
   int FPDiff = 0;
-  if (isTailCall && !IsSibcall) {
+  if (isTailCall && !IsSibcall && !IsMustTail) {
     // Lower arguments at fp - stackoffset + fpdiff.
     X86MachineFunctionInfo *X86Info = MF.getInfo<X86MachineFunctionInfo>();
     unsigned NumBytesCallerPushed = X86Info->getBytesToPopOnReturn();
 
     FPDiff = NumBytesCallerPushed - NumBytes;
 
     // Set the delta of movement of the returnaddr stackslot.
     // But only set if delta is greater than previous delta.
@@ 146 unchanged lines hidden @@ if (Is64Bit && isVarArg && !IsWin64) {
     unsigned NumXMMRegs = CCInfo.getFirstUnallocated(XMMArgRegs, 8);
     assert((Subtarget->hasSSE1() || !NumXMMRegs)
            && "SSE registers cannot be used when SSE is disabled");
 
     RegsToPass.push_back(std::make_pair(unsigned(X86::AL),
                                         DAG.getConstant(NumXMMRegs, MVT::i8)));
   }
 
-  // For tail calls lower the arguments to the 'real' stack slot.
-  if (isTailCall) {
+  // For tail calls lower the arguments to the 'real' stack slots. Sibcalls
+  // don't need this because the eligibility check rejects calls that require
+  // shuffling arguments passed in memory.
+  if (!IsSibcall && isTailCall) {
     // Force all the incoming stack arguments to be loaded from the stack
     // before any new outgoing arguments are stored to the stack, because the
     // outgoing stack slots may alias the incoming argument stack slots, and
     // the alias isn't otherwise explicit. This is slightly more conservative
     // than necessary, because it means that each store effectively depends
     // on every argument instead of just those arguments it would clobber.
     SDValue ArgChain = DAG.getStackArgumentTokenFactor(Chain);
 
     SmallVector<SDValue, 8> MemOpChains2;
     SDValue FIN;
     int FI = 0;
-    if (getTargetMachine().Options.GuaranteedTailCallOpt) {
     for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
       CCValAssign &VA = ArgLocs[i];
       if (VA.isRegLoc())
         continue;
       assert(VA.isMemLoc());
       SDValue Arg = OutVals[i];
       ISD::ArgFlagsTy Flags = Outs[i].Flags;
+      // Skip inalloca arguments. They don't require any work.
+      if (Flags.isInAlloca())
+        continue;
       // Create frame index.
       int32_t Offset = VA.getLocMemOffset()+FPDiff;
       uint32_t OpSize = (VA.getLocVT().getSizeInBits()+7)/8;
       FI = MF.getFrameInfo()->CreateFixedObject(OpSize, Offset, true);
       FIN = DAG.getFrameIndex(FI, getPointerTy());
 
       if (Flags.isByVal()) {
         // Copy relative to framepointer.
         SDValue Source = DAG.getIntPtrConstant(VA.getLocMemOffset());
         if (!StackPtr.getNode())
           StackPtr = DAG.getCopyFromReg(Chain, dl,
                                         RegInfo->getStackRegister(),
                                         getPointerTy());
         Source = DAG.getNode(ISD::ADD, dl, getPointerTy(), StackPtr, Source);
 
         MemOpChains2.push_back(CreateCopyOfByValArgument(Source, FIN,
                                                          ArgChain,
                                                          Flags, DAG, dl));
       } else {
         // Store relative to framepointer.
         MemOpChains2.push_back(
           DAG.getStore(ArgChain, dl, Arg, FIN,
                        MachinePointerInfo::getFixedStack(FI),
                        false, false, 0));
       }
     }
-    }
 
     if (!MemOpChains2.empty())
       Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, MemOpChains2);
 
     // Store the return address to the appropriate stack slot.
     Chain = EmitTailCallStoreRetAddr(DAG, MF, Chain, RetAddrFrIdx,
                                      getPointerTy(), RegInfo->getSlotSize(),
                                      FPDiff, dl);
@@ 18,081 unchanged lines hidden @@
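The offset arithmetic in these hunks is easy to check by hand. The following is a standalone model, not LLVM code. It makes these assumptions:

- `TailCallFrame` and the helper names are hypothetical.
- It ignores the `GetAlignedArgumentStackSize` rounding.
- As the `CreateFixedObject` calls suggest, fixed-object offset 0 is the first incoming stack argument, and the return address sits just below it at `-SlotSize`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs for the GuaranteedTailCallOpt path of LowerCall.
struct TailCallFrame {
  unsigned NumBytesCallerPushed;  // bytes this function pops on return
                                  // (X86Info->getBytesToPopOnReturn())
  unsigned NumBytes;              // bytes of stack arguments the callee needs
};

// FPDiff = NumBytesCallerPushed - NumBytes. After this change, musttail calls
// skip that computation, so FPDiff stays 0 for them.
int computeFPDiff(const TailCallFrame &F, bool IsMustTail) {
  if (IsMustTail)
    return 0;
  return static_cast<int>(F.NumBytesCallerPushed) - static_cast<int>(F.NumBytes);
}

// EmitTailCallStoreRetAddr only stores the return address when FPDiff != 0.
bool needsReturnAddressStore(int FPDiff) { return FPDiff != 0; }

// Fixed-object offset of the relocated return-address slot, mirroring
// CreateFixedObject(SlotSize, (int64_t)FPDiff - SlotSize, false).
int64_t newReturnAddressOffset(int FPDiff, unsigned SlotSize) {
  return static_cast<int64_t>(FPDiff) - SlotSize;
}

// Fixed-object offset of an outgoing stack argument:
// VA.getLocMemOffset() + FPDiff.
int32_t outgoingArgOffset(unsigned LocMemOffset, int FPDiff) {
  return static_cast<int32_t>(LocMemOffset) + FPDiff;
}
```

Take a caller that pops 8 bytes and a callee that needs 16, with 4-byte slots. FPDiff is -8, so the return address moves from -4 to -12, and each argument lands 8 bytes below its LocMemOffset.

A musttail call keeps FPDiff at 0, so the return address is never stored and arguments are written at their own offsets. This is what the new LowerCall comment relies on when it says the verifier rules allow lowering without moving the return address.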

llvm/trunk/test/CodeGen/X86/musttail-indirect.ll

				; RUN: llc < %s -mtriple=i686-win32 | FileCheck %s
				; RUN: llc < %s -mtriple=i686-win32 -O0 | FileCheck %s

				; IR simplified from the following C++ snippet compiled for i686-windows-msvc:

				; struct A { A(); ~A(); int a; };
				;
				; struct B {
				; virtual int f(int);
				; virtual int g(A, int, A);
				; virtual void h(A, int, A);
				; virtual A i(A, int, A);
				; virtual A j(int);
				; };
				;
				; int (B::*mp_f)(int) = &B::f;
				; int (B::*mp_g)(A, int, A) = &B::g;
				; void (B::*mp_h)(A, int, A) = &B::h;
				; A (B::*mp_i)(A, int, A) = &B::i;
				; A (B::*mp_j)(int) = &B::j;

				; Each member pointer creates a thunk. The ones with inalloca are required to
				; be tail calls by the ABI, even at O0.

				%struct.B = type { i32 (...)** }
				%struct.A = type { i32 }

				; CHECK-LABEL: f_thunk:
				; CHECK: jmpl
				; CHECK-NOT: ret
				define x86_thiscallcc i32 @f_thunk(%struct.B* %this, i32) {
				entry:
				%1 = bitcast %struct.B* %this to i32 (%struct.B*, i32)***
				%vtable = load i32 (%struct.B*, i32)*** %1
				%2 = load i32 (%struct.B*, i32)** %vtable
				%3 = musttail call x86_thiscallcc i32 %2(%struct.B* %this, i32 %0)
				ret i32 %3
				}

				; Inalloca thunks shouldn't require any stores to the stack.
				; CHECK-LABEL: g_thunk:
				; CHECK-NOT: mov %{{.*}}, {{.*\(.*esp.*\)}}
				; CHECK: jmpl
				; CHECK-NOT: ret
				define x86_thiscallcc i32 @g_thunk(%struct.B* %this, <{ %struct.A, i32, %struct.A }>* inalloca) {
				entry:
				%1 = bitcast %struct.B* %this to i32 (%struct.B*, <{ %struct.A, i32, %struct.A }>*)***
				%vtable = load i32 (%struct.B*, <{ %struct.A, i32, %struct.A }>*)*** %1
				%vfn = getelementptr inbounds i32 (%struct.B*, <{ %struct.A, i32, %struct.A }>*)** %vtable, i32 1
				%2 = load i32 (%struct.B*, <{ %struct.A, i32, %struct.A }>*)** %vfn
				%3 = musttail call x86_thiscallcc i32 %2(%struct.B* %this, <{ %struct.A, i32, %struct.A }>* inalloca %0)
				ret i32 %3
				}

				; CHECK-LABEL: h_thunk:
				; CHECK: jmpl
				; CHECK-NOT: mov %{{.*}}, {{.*\(.*esp.*\)}}
				; CHECK-NOT: ret
				define x86_thiscallcc void @h_thunk(%struct.B* %this, <{ %struct.A, i32, %struct.A }>* inalloca) {
				entry:
				%1 = bitcast %struct.B* %this to void (%struct.B*, <{ %struct.A, i32, %struct.A }>*)***
				%vtable = load void (%struct.B*, <{ %struct.A, i32, %struct.A }>*)*** %1
				%vfn = getelementptr inbounds void (%struct.B*, <{ %struct.A, i32, %struct.A }>*)** %vtable, i32 2
				%2 = load void (%struct.B*, <{ %struct.A, i32, %struct.A }>*)** %vfn
				musttail call x86_thiscallcc void %2(%struct.B* %this, <{ %struct.A, i32, %struct.A }>* inalloca %0)
				ret void
				}

				; CHECK-LABEL: i_thunk:
				; CHECK-NOT: mov %{{.*}}, {{.*\(.*esp.*\)}}
				; CHECK: jmpl
				; CHECK-NOT: ret
				define x86_thiscallcc %struct.A* @i_thunk(%struct.B* %this, <{ %struct.A*, %struct.A, i32, %struct.A }>* inalloca) {
				entry:
				%1 = bitcast %struct.B* %this to %struct.A* (%struct.B*, <{ %struct.A*, %struct.A, i32, %struct.A }>*)***
				%vtable = load %struct.A* (%struct.B*, <{ %struct.A*, %struct.A, i32, %struct.A }>*)*** %1
				%vfn = getelementptr inbounds %struct.A* (%struct.B*, <{ %struct.A*, %struct.A, i32, %struct.A }>*)** %vtable, i32 3
				%2 = load %struct.A* (%struct.B*, <{ %struct.A*, %struct.A, i32, %struct.A }>*)** %vfn
				%3 = musttail call x86_thiscallcc %struct.A* %2(%struct.B* %this, <{ %struct.A*, %struct.A, i32, %struct.A }>* inalloca %0)
				ret %struct.A* %3
				}

				; CHECK-LABEL: j_thunk:
				; CHECK: jmpl
				; CHECK-NOT: ret
				define x86_thiscallcc void @j_thunk(%struct.A* noalias sret %agg.result, %struct.B* %this, i32) {
				entry:
				%1 = bitcast %struct.B* %this to void (%struct.A*, %struct.B*, i32)***
				%vtable = load void (%struct.A*, %struct.B*, i32)*** %1
				%vfn = getelementptr inbounds void (%struct.A*, %struct.B*, i32)** %vtable, i32 4
				%2 = load void (%struct.A*, %struct.B*, i32)** %vfn
				musttail call x86_thiscallcc void %2(%struct.A* sret %agg.result, %struct.B* %this, i32 %0)
				ret void
				}

				; CHECK-LABEL: _stdcall_thunk@8:
				; CHECK-NOT: mov %{{.*}}, {{.*\(.*esp.*\)}}
				; CHECK: jmpl
				; CHECK-NOT: ret
				define x86_stdcallcc i32 @stdcall_thunk(<{ %struct.B*, %struct.A }>* inalloca) {
				entry:
				%this_ptr = getelementptr inbounds <{ %struct.B*, %struct.A }>* %0, i32 0, i32 0
				%this = load %struct.B** %this_ptr
				%1 = bitcast %struct.B* %this to i32 (<{ %struct.B*, %struct.A }>*)***
				%vtable = load i32 (<{ %struct.B*, %struct.A }>*)*** %1
				%vfn = getelementptr inbounds i32 (<{ %struct.B*, %struct.A }>*)** %vtable, i32 1
				%2 = load i32 (<{ %struct.B*, %struct.A }>*)** %vfn
				%3 = musttail call x86_stdcallcc i32 %2(<{ %struct.B*, %struct.A }>* inalloca %0)
				ret i32 %3
				}

				; CHECK-LABEL: @fastcall_thunk@8:
				; CHECK-NOT: mov %{{.*}}, {{.*\(.*esp.*\)}}
				; CHECK: jmpl
				; CHECK-NOT: ret
				define x86_fastcallcc i32 @fastcall_thunk(%struct.B* inreg %this, <{ %struct.A }>* inalloca) {
				entry:
				%1 = bitcast %struct.B* %this to i32 (%struct.B*, <{ %struct.A }>*)***
				%vtable = load i32 (%struct.B*, <{ %struct.A }>*)*** %1
				%vfn = getelementptr inbounds i32 (%struct.B*, <{ %struct.A }>*)** %vtable, i32 1
				%2 = load i32 (%struct.B*, <{ %struct.A }>*)** %vfn
				%3 = musttail call x86_fastcallcc i32 %2(%struct.B* inreg %this, <{ %struct.A }>* inalloca %0)
				ret i32 %3
				}

llvm/trunk/test/CodeGen/X86/musttail-thiscall.ll

				; RUN: llc -march=x86 < %s | FileCheck %s
				; RUN: llc -march=x86 -O0 < %s | FileCheck %s

				; CHECK-LABEL: t1:
				; CHECK: jmp {{_?}}t1_callee
				define x86_thiscallcc void @t1(i8* %this) {
				%adj = getelementptr i8* %this, i32 4
				musttail call x86_thiscallcc void @t1_callee(i8* %adj)
				ret void
				}
				declare x86_thiscallcc void @t1_callee(i8* %this)

				; CHECK-LABEL: t2:
				; CHECK: jmp {{_?}}t2_callee
				define x86_thiscallcc i32 @t2(i8* %this, i32 %a) {
				%adj = getelementptr i8* %this, i32 4
				%rv = musttail call x86_thiscallcc i32 @t2_callee(i8* %adj, i32 %a)
				ret i32 %rv
				}
				declare x86_thiscallcc i32 @t2_callee(i8* %this, i32 %a)

				; CHECK-LABEL: t3:
				; CHECK: jmp {{_?}}t3_callee
				define x86_thiscallcc i8* @t3(i8* %this, <{ i8*, i32 }>* inalloca %args) {
				%adj = getelementptr i8* %this, i32 4
				%a_ptr = getelementptr <{ i8*, i32 }>* %args, i32 0, i32 1
				store i32 0, i32* %a_ptr
				%rv = musttail call x86_thiscallcc i8* @t3_callee(i8* %adj, <{ i8*, i32 }>* inalloca %args)
				ret i8* %rv
				}
				declare x86_thiscallcc i8* @t3_callee(i8* %this, <{ i8*, i32 }>* inalloca %args);

llvm/trunk/test/CodeGen/X86/musttail.ll

 ; RUN: llc -march=x86 < %s | FileCheck %s
-; FIXME: Eliminate this tail call at -O0, since musttail is a correctness
-; requirement.
-; RUN: not llc -march=x86 -O0 < %s
+; RUN: llc -march=x86 -O0 < %s | FileCheck %s
+; RUN: llc -march=x86 -disable-tail-calls < %s | FileCheck %s
 
 declare void @t1_callee(i8*)
 define void @t1(i32* %a) {
 ; CHECK-LABEL: t1:
 ; CHECK: jmp {{_?}}t1_callee
   %b = bitcast i32* %a to i8*
   musttail call void @t1_callee(i8* %b)
   ret void
 }
 
 declare i8* @t2_callee()
 define i32* @t2() {
 ; CHECK-LABEL: t2:
 ; CHECK: jmp {{_?}}t2_callee
   %v = musttail call i8* @t2_callee()
   %w = bitcast i8* %v to i32*
   ret i32* %w
 }
+
+; Complex frame layout: stack realignment with dynamic alloca.
+define void @t3(i32 %n) alignstack(32) nounwind {
+entry:
+; CHECK: t3:
+; CHECK: pushl %ebp
+; CHECK: pushl %esi
+; CHECK: andl $-32, %esp
+; CHECK: movl %esp, %esi
+; CHECK: popl %esi
+; CHECK: popl %ebp
+; CHECK-NEXT: jmp {{_?}}t3_callee
+  %a = alloca i8, i32 %n
+  call void @capture(i8* %a)
+  musttail call void @t3_callee(i32 %n) nounwind
+  ret void
+}
+
+declare void @capture(i8*)
+declare void @t3_callee(i32)
+
+; Test that we actually copy in and out stack arguments that aren't forwarded
+; without modification.
+define i32 @t4({}* %fn, i32 %n, i32 %r) {
+; CHECK-LABEL: t4:
+; CHECK: incl %[[r:.*]]
+; CHECK: decl %[[n:.*]]
+; CHECK: movl %[[r]], {{[0-9]+}}(%esp)
+; CHECK: movl %[[n]], {{[0-9]+}}(%esp)
+; CHECK: jmpl *%{{.*}}
+
+entry:
+  %r1 = add i32 %r, 1
+  %n1 = sub i32 %n, 1
+  %fn_cast = bitcast {}* %fn to i32 ({}*, i32, i32)*
+  %r2 = musttail call i32 %fn_cast({}* %fn, i32 %n1, i32 %r1)
+  ret i32 %r2
+}
+
+; Combine the complex stack frame with the parameter modification.
+define i32 @t5({}* %fn, i32 %n, i32 %r) alignstack(32) {
+; CHECK-LABEL: t5:
+; CHECK: pushl %ebp
+; CHECK: movl %esp, %ebp
+; CHECK: pushl %esi
+; Align the stack.
+; CHECK: andl $-32, %esp
+; CHECK: movl %esp, %esi
+; Modify the args.
+; CHECK: incl %[[r:.*]]
+; CHECK: decl %[[n:.*]]
+; Store them through ebp, since that's the only stable arg pointer.
+; CHECK: movl %[[r]], {{[0-9]+}}(%ebp)
+; CHECK: movl %[[n]], {{[0-9]+}}(%ebp)
+; Epilogue.
+; CHECK: leal {{[-0-9]+}}(%ebp), %esp
+; CHECK: popl %esi
+; CHECK: popl %ebp
+; CHECK: jmpl *%{{.*}}
+
+entry:
+  %a = alloca i8, i32 %n
+  call void @capture(i8* %a)
+  %r1 = add i32 %r, 1
+  %n1 = sub i32 %n, 1
+  %fn_cast = bitcast {}* %fn to i32 ({}*, i32, i32)*
+  %r2 = musttail call i32 %fn_cast({}* %fn, i32 %n1, i32 %r1)
+  ret i32 %r2
+}