This is an archive of the discontinued LLVM Phabricator instance.

Differential D42006

AArch64: Omit callframe setup/destroy when not necessary
ClosedPublic

Authored by MatzeB on Jan 12 2018, 12:05 PM.

Details

Reviewers

t.p.northover
aemerson
eli.friedman
qcolombet
gberry
efriedma

Commits

rGdc4b3e87f466: AArch64: Omit callframe setup/destroy when not necessary
rL322917: AArch64: Omit callframe setup/destroy when not necessary

Summary

Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
set up (the callframe size is 0).

Fixes an invalid callframe nesting for byval arguments. This would fail the machine verifier as it looked like this before this patch (as in big-byval.ll):

...
ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
...
ADJCALLSTACKDOWN 0, 0, ...  # setup for memcpy
...
BL &memcpy ...
ADJCALLSTACKUP 0, 0, ...    # destroy for memcpy
...
BL &extfunc
ADJCALLSTACKUP 32768, 0, ...   # destroy for extfunc

Saves us two machine instructions in the common case of zero-sized callframes.
Also removes an unnecessary scheduling barrier (hence the small unittest changes).
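
To make the nesting problem concrete, here is a minimal, self-contained C++ sketch. It is a toy model, not LLVM code: the `Inst` type and the functions `omitCallSeq`, `lowerByvalCall`, `isProperlyNested`, and `runNestingDemo` are hypothetical names invented for illustration. The nesting rule is an assumed simplification of what the machine verifier checks, namely that a callframe setup may not appear while another callframe is still open.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for the pseudo instructions in the listing above.
// Hypothetical types for illustration only; this is not LLVM's MachineInstr.
enum class Kind { FrameSetup, FrameDestroy, Call };

struct Inst {
  Kind kind;
  std::uint64_t bytes; // callframe size for setup/destroy, 0 for calls
  std::string callee;  // callee name for calls, empty otherwise
};

// Mirrors the condition the patch introduces: no callframe to set up,
// and not a patchpoint.
bool omitCallSeq(std::uint64_t numBytes, bool isPatchPoint) {
  return numBytes == 0 && !isPatchPoint;
}

// Models lowering a call to extfunc with a byval argument of `outerBytes`
// bytes, where the copy is done by a nested memcpy call that needs no stack.
// `allowOmit` selects the behavior with (true) or without (false) the patch.
std::vector<Inst> lowerByvalCall(std::uint64_t outerBytes, bool allowOmit) {
  const bool omitOuter = allowOmit && omitCallSeq(outerBytes, false);
  const bool omitInner = allowOmit && omitCallSeq(0, false);

  std::vector<Inst> seq;
  if (!omitOuter) seq.push_back({Kind::FrameSetup, outerBytes, ""});
  if (!omitInner) seq.push_back({Kind::FrameSetup, 0, ""});
  seq.push_back({Kind::Call, 0, "memcpy"});
  if (!omitInner) seq.push_back({Kind::FrameDestroy, 0, ""});
  seq.push_back({Kind::Call, 0, "extfunc"});
  if (!omitOuter) seq.push_back({Kind::FrameDestroy, outerBytes, ""});
  return seq;
}

// Simplified nesting rule (an assumption of this sketch): at most one
// callframe may be open at a time, and every setup needs a matching destroy.
bool isProperlyNested(const std::vector<Inst> &seq) {
  bool open = false;
  for (const Inst &inst : seq) {
    if (inst.kind == Kind::FrameSetup) {
      if (open) return false; // setup inside an open callframe
      open = true;
    } else if (inst.kind == Kind::FrameDestroy) {
      if (!open) return false; // destroy without a matching setup
      open = false;
    }
  }
  return !open;
}

// Small self-check that can be called from main().
void runNestingDemo() {
  assert(!isProperlyNested(lowerByvalCall(32768, false))); // before the patch
  assert(isProperlyNested(lowerByvalCall(32768, true)));   // after the patch
}
```

Without the omission the sketch produces six entries, with the memcpy setup nested inside the open extfunc callframe. With the omission only the 32768-byte pair around both calls remains, four entries in total, which matches the two saved instructions mentioned above.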

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB created this revision. Jan 12 2018, 12:05 PM

Herald added subscribers: kristof.beyls, javed.absar, mcrosier, rengolin. Jan 12 2018, 12:05 PM

MatzeB edited the summary of this revision. Jan 16 2018, 6:38 PM

Various backends handle this situation in different ways... ARM and x86 emit an inline loop rather than a call. PowerPC has a utility, createMemcpyOutsideCallSeq, which avoids nesting. And this would introduce, essentially, a third way to handle it. We should probably settle on one solution, if possible.

Some target-independent code has special handling for callframe operations... but I guess most of the handling isn't necessary? Does a call without call frame setup instructions correctly disable the red zone?

In D42006#979433, @efriedma wrote:

Various backends handle this situation in different ways... ARM and x86 emit an inline loop rather than a call. PowerPC has a utility, createMemcpyOutsideCallSeq, which avoids nesting. And this would introduce, essentially, a third way to handle it. We should probably settle on one solution, if possible.

True, I could look into whether this simple solution would avoid the special case on PowerPC as well; but let's do that in a separate patch.
I'm not sure I want to change ARM/X86 from an inline loop to memcpy as it changes the performance characteristics (slightly slower for small sizes, faster for big sizes).

In any case I found this change desirable independently of it fixing the nesting issue.

Some target-independent code has special handling for callframe operations... but I guess most of the handling isn't necessary? Does a call without call frame setup instructions correctly disable the red zone?

AArch64FrameLowering::canUseRedZone() checks MachineFrameInfo::hasCalls(), which is set based on MCInstructions having the Call flag set, not based on the stackframe setup (you can grep for setHasCalls(true) if you are interested).
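
As a rough illustration of that point, the following C++ sketch models the decision with hypothetical types. `ToyInst`, `computeHasCalls`, `canUseRedZoneSketch`, and `runRedZoneDemo` are invented names, not LLVM APIs. The sketch assumes, per the explanation above, that the has-calls property is derived from call-flagged instructions alone; the real canUseRedZone() may check further conditions that are left out here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct ToyInst {
  bool isCall;            // models the Call flag on the instruction
  bool isCallFramePseudo; // models ADJCALLSTACKDOWN/ADJCALLSTACKUP
};

// Models the idea behind setHasCalls(true): only the Call flag matters,
// callframe pseudos are never consulted.
bool computeHasCalls(const std::vector<ToyInst> &body) {
  for (const ToyInst &inst : body)
    if (inst.isCall)
      return true;
  return false;
}

// Simplified red zone decision for this sketch: any call rules it out.
bool canUseRedZoneSketch(const std::vector<ToyInst> &body) {
  return !computeHasCalls(body);
}

// Small self-check that can be called from main().
void runRedZoneDemo() {
  const ToyInst setup{false, true}, call{true, false}, other{false, false};
  const std::vector<ToyInst> withPseudos{setup, call, setup};
  const std::vector<ToyInst> withoutPseudos{call};
  const std::vector<ToyInst> leaf{other};
  assert(!canUseRedZoneSketch(withPseudos));    // call wrapped in pseudos
  assert(!canUseRedZoneSketch(withoutPseudos)); // same call, pseudos omitted
  assert(canUseRedZoneSketch(leaf));            // no call at all
}
```

In this model a call disables the red zone whether or not callframe pseudos surround it, which is the property the question was about.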

LGTM

In general the idea makes sense: adjusting the stack by zero bytes is a no-op, so we don't need to mark it with a MachineInstr. I'm a little worried we're going to trip over some edge case where we depend on the call frame opcodes for some other reason, but I can't think of anything specific, so I guess we'll see. Please make sure you've run some basic correctness tests before you merge.

This revision is now accepted and ready to land. Jan 17 2018, 4:07 PM

Please make sure you've run some basic correctness tests before you merge.

I just tested this with the llvm test-suite with an asserts enabled compiler and the benchmarks compiled and ran fine.

Closed by commit rL322917: AArch64: Omit callframe setup/destroy when not necessary (authored by matze). Jan 18 2018, 6:48 PM

This revision was automatically updated to reflect the committed changes.

Hi Matthias,

With this specific change, I found several performance regressions in SPEC benchmarks on AArch64.

In -O3:

Spec2006/astar -3.25%
Spec2006/povray -5.28%
Spec2017/povray -6.08%

In LTO:

Spec2006/astar -4.20%
Spec2006/h264ref -2.15%

It appears to me that a different value was picked for spilling, resulting in different spills/reloads in different blocks. Have you run any performance tests and observed any reproducible gain or regression?

Thanks,
Jun

In D42006#985616, @junbuml wrote:

Hi Matthias,

With this specific change, I found several performance regressions in SPEC benchmarks on AArch64.

In -O3:

Spec2006/astar -3.25%
Spec2006/povray -5.28%
Spec2017/povray -6.08%

In LTO:

Spec2006/astar -4.20%
Spec2006/h264ref -2.15%

It appears to me that a different value was picked for spilling, resulting in different spills/reloads in different blocks. Have you run any performance tests and observed any reproducible gain or regression?

Thanks,
Jun

I didn't do in-depth performance tests. In principle this just removes a few "nop" instructions, so I didn't expect big changes. The change gives the scheduler a bit more freedom, though, as there is no instruction redefining SP anymore... Did you only get regressions and no improvements from this?

If you have some concrete differences in assembly, that would be interesting to look at. I should have time for detailed analysis at the beginning of next week.

I didn't see any gains in our weekly performance run, but unfortunately I did see regressions specifically after this change. I looked at one of the hot functions (_ZN7way2obj12releasepointEii) in spec2006/astar, where this change altered spilling widely. I believe it should be reproducible with -O3. Please let me know if you cannot reproduce the regression; I will be happy to help fix the regressions.

Would it be possible to revert r322917 while we investigate the regressions? We also identified a 3.61% regression in SPEC2006/bzip2, so here's the complete list of regressions we are currently seeing due to this change:

With -O3 -fno-math-errno -ffp-contract=fast -fomit-frame-pointer -mcpu=falkor:

Spec2006/astar -3.25%
Spec2006/bzip2 -3.61%
Spec2006/povray -5.28%
Spec2017/povray -6.08%

With -O3 -flto -fuse-ld=gold -fno-math-errno -ffp-contract=fast -fwhole-program-vtables -fvisibility=hidden -fomit-frame-pointer -mcpu=falkor:

Spec2006/astar -4.20%
Spec2006/h264ref -2.15%

All tests were run on Falkor, but hopefully these issues can be reproduced on other targets. Please let us know if you need any assistance reproducing, Matthias.

Chad

In D42006#990011, @mcrosier wrote:

Would it be possible to revert r322917 while we investigate the regressions? We also identified a 3.61% regression in SPEC2006/bzip2, so here's the complete list of regressions we are currently seeing due to this change:

With -O3 -fno-math-errno -ffp-contract=fast -fomit-frame-pointer -mcpu=falkor:

Spec2006/astar -3.25%
Spec2006/bzip2 -3.61%
Spec2006/povray -5.28%
Spec2017/povray -6.08%

With -O3 -flto -fuse-ld=gold -fno-math-errno -ffp-contract=fast -fwhole-program-vtables -fvisibility=hidden -fomit-frame-pointer -mcpu=falkor:

Spec2006/astar -4.20%
Spec2006/h264ref -2.15%

All tests were run on Falkor, but hopefully these issues can be reproduced on other targets. Please let us know if you need any assistance reproducing, Matthias.

Chad

Of course reverting is fine. This whole commit was mostly motivated by llvm/test/CodeGen/AArch64/big-callframe.ll, but you could simply XFAIL that test together with the revert.

I will revert later if you don't beat me to it.

Okay, then I will revert this for now.

Thanks,
Jun

Reverted in r323683, with big-callframe.ll marked XFAIL.

Thanks, Matthias/Jun!

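Revision Contents

Path                                                             Size
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp            18 lines
llvm/trunk/test/CodeGen/AArch64/arm64-hello.ll                   4 lines
llvm/trunk/test/CodeGen/AArch64/arm64-shrink-wrapping.ll         6 lines
llvm/trunk/test/CodeGen/AArch64/big-byval.ll                     13 lines
llvm/trunk/test/CodeGen/AArch64/func-calls.ll                    2 lines
llvm/trunk/test/CodeGen/AArch64/nontemporal.ll                   8 lines
llvm/trunk/test/CodeGen/AArch64/swifterror.ll                    93 lines
llvm/trunk/test/CodeGen/AArch64/tailcall-explicit-sret.ll        10 lines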

Diff 130537

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@ if (IsTailCall && !IsSibCall) {
     // The stack pointer must be 16-byte aligned at all times it's used for a
     // memory operation, which in practice means at all times and in
     // particular across call boundaries. Therefore our own arguments started at
     // a 16-byte aligned SP and the delta applied for the tail call should
     // satisfy the same constraint.
     assert(FPDiff % 16 == 0 && "unaligned stack on tail call");
   }

+  // We can omit callseq_start/callseq_end if there is no callframe to setup.
+  // Do not omit for patchpoints as SelectionDAGBuilder::visitPatchpoint()
+  // currently expects it.
+  bool OmitCallSeq = NumBytes == 0 && !CLI.IsPatchPoint;
+  assert((!IsSibCall || OmitCallSeq) && "Should not get callseq for sibcalls");
+
   // Adjust the stack pointer for the new arguments...
   // These operations are automatically eliminated by the prolog/epilog pass
-  if (!IsSibCall)
+  if (!OmitCallSeq)
     Chain = DAG.getCALLSEQ_START(Chain, NumBytes, 0, DL);

   SDValue StackPtr = DAG.getCopyFromReg(Chain, DL, AArch64::SP,
                                         getPointerTy(DAG.getDataLayout()));

   SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPass;
   SmallVector<SDValue, 8> MemOpChains;
   auto PtrVT = getPointerTy(DAG.getDataLayout());
@@ ... @@ if (getTargetMachine().getCodeModel() == CodeModel::Large &&
       Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, 0);
     }
   }

   // We don't usually want to end the call-sequence here because we would tidy
   // the frame up after the call, however in the ABI-changing tail-call case
   // we've carefully laid out the parameters so that when sp is reset they'll be
   // in the correct location.
-  if (IsTailCall && !IsSibCall) {
+  if (IsTailCall && !OmitCallSeq) {
     Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(NumBytes, DL, true),
                                DAG.getIntPtrConstant(0, DL, true), InFlag, DL);
     InFlag = Chain.getValue(1);
   }

   std::vector<SDValue> Ops;
   Ops.push_back(Chain);
   Ops.push_back(Callee);
@@ ... @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,

   // Returns a chain and a flag for retval copy to use.
   Chain = DAG.getNode(AArch64ISD::CALL, DL, NodeTys, Ops);
   InFlag = Chain.getValue(1);

   uint64_t CalleePopBytes =
       DoesCalleeRestoreStack(CallConv, TailCallOpt) ? alignTo(NumBytes, 16) : 0;

+  if (!OmitCallSeq)
     Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(NumBytes, DL, true),
                                DAG.getIntPtrConstant(CalleePopBytes, DL, true),
                                InFlag, DL);

   if (!Ins.empty())
     InFlag = Chain.getValue(1);

   // Handle result values, copying them out of physregs into vregs that we
   // return.
   return LowerCallResult(Chain, InFlag, CallConv, IsVarArg, Ins, DL, DAG,
                          InVals, IsThisReturn,
                          IsThisReturn ? OutVals[0] : SDValue());
@@ ... @@

llvm/trunk/test/CodeGen/AArch64/arm64-hello.ll

 ; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -disable-post-ra -disable-fp-elim | FileCheck %s
 ; RUN: llc < %s -mtriple=arm64-linux-gnu -disable-post-ra | FileCheck %s --check-prefix=CHECK-LINUX

 ; CHECK-LABEL: main:
 ; CHECK: sub sp, sp, #32
 ; CHECK-NEXT: stp x29, x30, [sp, #16]
 ; CHECK-NEXT: add x29, sp, #16
-; CHECK-NEXT: stur wzr, [x29, #-4]
 ; CHECK: adrp x0, l_.str@PAGE
 ; CHECK: add x0, x0, l_.str@PAGEOFF
+; CHECK-NEXT: stur wzr, [x29, #-4]
 ; CHECK-NEXT: bl _puts
 ; CHECK-NEXT: ldp x29, x30, [sp, #16]
 ; CHECK-NEXT: add sp, sp, #32
 ; CHECK-NEXT: ret

 ; CHECK-LINUX-LABEL: main:
 ; CHECK-LINUX: str x30, [sp, #-16]!
-; CHECK-LINUX-NEXT: str wzr, [sp, #12]
 ; CHECK-LINUX: adrp x0, .L.str
 ; CHECK-LINUX: add x0, x0, :lo12:.L.str
+; CHECK-LINUX-NEXT: str wzr, [sp, #12]
 ; CHECK-LINUX-NEXT: bl puts
 ; CHECK-LINUX-NEXT: ldr x30, [sp], #16
 ; CHECK-LINUX-NEXT: ret

 @.str = private unnamed_addr constant [7 x i8] c"hello\0A\00"

 define i32 @main() nounwind ssp {
 entry:
   %retval = alloca i32, align 4
   store i32 0, i32* %retval
   %call = call i32 @puts(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0))
   ret i32 %call
 }

 declare i32 @puts(i8*)

llvm/trunk/test/CodeGen/AArch64/arm64-shrink-wrapping.ll

@@ ... @@
 ; CHECK-NEXT: stp [[SAVE_SP:x[0-9]+]], [[CSR:x[0-9]+]], [sp, #16]
 ; CHECK-NEXT: add [[SAVE_SP]], sp, #16
 ;
 ; Compare the arguments and jump to exit.
 ; After the prologue is set.
 ; DISABLE: cmp w0, w1
 ; DISABLE-NEXT: b.ge [[EXIT_LABEL:LBB[0-9_]+]]
 ;
-; Store %a in the alloca.
-; CHECK: stur w0, {{\[}}[[SAVE_SP]], #-4]
 ; Set the alloca address in the second argument.
-; CHECK-NEXT: sub x1, [[SAVE_SP]], #4
+; CHECK: sub x1, [[SAVE_SP]], #4
+; Store %a in the alloca.
+; CHECK-NEXT: stur w0, {{\[}}[[SAVE_SP]], #-4]
 ; Set the first argument to zero.
 ; CHECK-NEXT: mov w0, wzr
 ; CHECK-NEXT: bl _doSomething
 ;
 ; Without shrink-wrapping, epilogue is in the exit block.
 ; DISABLE: [[EXIT_LABEL]]:
 ; Epilogue code.
 ; CHECK-NEXT: ldp x{{[0-9]+}}, [[CSR]], [sp, #16]
@@ ... @@

llvm/trunk/test/CodeGen/AArch64/big-byval.ll

+; RUN: llc -o - %s -verify-machineinstrs | FileCheck %s
+target triple = "aarch64--"
+
+; Make sure we don't fail machine verification because the memcpy callframe
+; setup is nested inside the extfunc callframe setup.
+; CHECK-LABEL: func:
+; CHECK: bl memcpy
+; CHECK: bl extfunc
+declare void @extfunc([4096 x i64]* byval %p)
+define void @func([4096 x i64]* %z) {
+  call void @extfunc([4096 x i64]* byval %z)
+  ret void
+}

llvm/trunk/test/CodeGen/AArch64/func-calls.ll

@@ ... @@
 ; CHECK: bl return_double
 ; CHECK: str d0, [{{x[0-9]+}}, {{#?}}:lo12:vardouble]
 ; CHECK-NOFP-NOT: str d0,

   %arr = call [2 x i64] @return_smallstruct()
   store [2 x i64] %arr, [2 x i64]* @varsmallstruct
 ; CHECK: bl return_smallstruct
 ; CHECK: add x[[VARSMALLSTRUCT:[0-9]+]], {{x[0-9]+}}, :lo12:varsmallstruct
+; CHECK: add x8, {{x[0-9]+}}, {{#?}}:lo12:varstruct
 ; CHECK: stp x0, x1, [x[[VARSMALLSTRUCT]]]

   call void @return_large_struct(%myStruct* sret @varstruct)
-; CHECK: add x8, {{x[0-9]+}}, {{#?}}:lo12:varstruct
 ; CHECK: bl return_large_struct

   ret void
 }


 declare i32 @struct_on_stack(i8 %var0, i16 %var1, i32 %var2, i64 %var3, i128 %var45,
                              i32* %var6, %myStruct* byval %struct, i32 %stacked,
@@ ... @@

llvm/trunk/test/CodeGen/AArch64/nontemporal.ll

@@ ... @@ ; CHECK-NEXT: ret
   store <2 x float> %v, <2 x float>* %tmp1, align 1, !nontemporal !0
   ret void
 }

 declare void @dummy(<4 x float>*)

 define void @test_stnp_v4f32_offset_alloca(<4 x float> %v) #0 {
 ; CHECK-LABEL: test_stnp_v4f32_offset_alloca:
-; CHECK: stnp d0, d{{.*}}, [sp]
-; CHECK-NEXT: mov x0, sp
+; CHECK: mov x0, sp
+; CHECK-NEXT: stnp d0, d{{.*}}, [sp]
 ; CHECK-NEXT: bl _dummy
   %tmp0 = alloca <4 x float>
   store <4 x float> %v, <4 x float>* %tmp0, align 1, !nontemporal !0
   call void @dummy(<4 x float>* %tmp0)
   ret void
 }

 define void @test_stnp_v4f32_offset_alloca_2(<4 x float> %v) #0 {
 ; CHECK-LABEL: test_stnp_v4f32_offset_alloca_2:
-; CHECK: stnp d0, d{{.*}}, [sp, #16]
-; CHECK-NEXT: mov x0, sp
+; CHECK: mov x0, sp
+; CHECK-NEXT: stnp d0, d{{.*}}, [sp, #16]
 ; CHECK-NEXT: bl _dummy
   %tmp0 = alloca <4 x float>, i32 2
   %tmp1 = getelementptr <4 x float>, <4 x float>* %tmp0, i32 1
   store <4 x float> %v, <4 x float>* %tmp1, align 1, !nontemporal !0
   call void @dummy(<4 x float>* %tmp0)
   ret void
 }

 !0 = !{ i32 1 }

 attributes #0 = { nounwind }

llvm/trunk/test/CodeGen/AArch64/swifterror.ll

@@ ... @@
 }

 %struct.S = type { i32, i32, i32, i32, i32, i32 }

 ; "foo_sret" is a function that takes a swifterror parameter, it also has a sret
 ; parameter.
 define void @foo_sret(%struct.S* sret %agg.result, i32 %val1, %swift_error** swifterror %error_ptr_ref) {
 ; CHECK-APPLE-LABEL: foo_sret:
-; CHECK-APPLE: mov [[SRET:x[0-9]+]], x8
 ; CHECK-APPLE: orr w0, wzr, #0x10
+; CHECK-APPLE: mov [[SRET:x[0-9]+]], x8
 ; CHECK-APPLE: malloc
 ; CHECK-APPLE: orr [[ID:w[0-9]+]], wzr, #0x1
 ; CHECK-APPLE: strb [[ID]], [x0, #8]
 ; CHECK-APPLE: str w{{.}}, [{{.}}[[SRET]], #4]
 ; CHECK-APPLE: mov x21, x0
 ; CHECK-APPLE-NOT: x21

 ; CHECK-O0-LABEL: foo_sret:
@@ ... @@ entry:
   ret float %0
 }
 define swiftcc float @tailcallswifterror_swiftcc(%swift_error** swifterror %error_ptr_ref) {
 entry:
   %0 = tail call swiftcc float @tailcallswifterror_swiftcc(%swift_error** swifterror %error_ptr_ref)
   ret float %0
 }

-; CHECK-APPLE-LABEL: swifterror_clobber
+; CHECK-APPLE-LABEL: swifterror_clobber:
 ; CHECK-APPLE: mov [[REG:x[0-9]+]], x21
 ; CHECK-APPLE: nop
 ; CHECK-APPLE: mov x21, [[REG]]
 define swiftcc void @swifterror_clobber(%swift_error** nocapture swifterror %err) {
   call void asm sideeffect "nop", "~{x21}"()
   ret void
 }

-; CHECK-APPLE-LABEL: swifterror_reg_clobber
+; CHECK-APPLE-LABEL: swifterror_reg_clobber:
 ; CHECK-APPLE: stp {{.*}}x21
 ; CHECK-APPLE: nop
 ; CHECK-APPLE: ldp {{.*}}x21
 define swiftcc void @swifterror_reg_clobber(%swift_error** nocapture %err) {
   call void asm sideeffect "nop", "~{x21}"()
   ret void
 }
-; CHECK-APPLE-LABEL: params_in_reg
+; CHECK-APPLE-LABEL: params_in_reg:
 ; Save callee saved registers and swifterror since it will be clobbered by the first call to params_in_reg2.
 ; CHECK-APPLE: stp x21, x28, [sp
 ; CHECK-APPLE: stp x27, x26, [sp
 ; CHECK-APPLE: stp x25, x24, [sp
 ; CHECK-APPLE: stp x23, x22, [sp
 ; CHECK-APPLE: stp x20, x19, [sp
 ; CHECK-APPLE: stp x29, x30, [sp
-; CHECK-APPLE: str x20, [sp
+; CHECK-APPLE: str x7, [sp
 ; Store argument registers.
-; CHECK-APPLE: mov x23, x7
-; CHECK-APPLE: mov x24, x6
-; CHECK-APPLE: mov x25, x5
-; CHECK-APPLE: mov x26, x4
-; CHECK-APPLE: mov x27, x3
-; CHECK-APPLE: mov x28, x2
-; CHECK-APPLE: mov x19, x1
-; CHECK-APPLE: mov x22, x0
+; CHECK-APPLE: mov x23, x6
+; CHECK-APPLE: mov x24, x5
+; CHECK-APPLE: mov x25, x4
+; CHECK-APPLE: mov x26, x3
+; CHECK-APPLE: mov x27, x2
+; CHECK-APPLE: mov x28, x1
+; CHECK-APPLE: mov x19, x0
 ; Setup call.
 ; CHECK-APPLE: orr w0, wzr, #0x1
 ; CHECK-APPLE: orr w1, wzr, #0x2
 ; CHECK-APPLE: orr w2, wzr, #0x3
 ; CHECK-APPLE: orr w3, wzr, #0x4
 ; CHECK-APPLE: mov w4, #5
 ; CHECK-APPLE: orr w5, wzr, #0x6
 ; CHECK-APPLE: orr w6, wzr, #0x7
 ; CHECK-APPLE: orr w7, wzr, #0x8
+; CHECK-APPLE: mov x22, x20
 ; CHECK-APPLE: mov x20, xzr
 ; CHECK-APPLE: mov x21, xzr
 ; CHECK-APPLE: bl _params_in_reg2
 ; Restore original arguments for next call.
-; CHECK-APPLE: mov x0, x22
-; CHECK-APPLE: mov x1, x19
-; CHECK-APPLE: mov x2, x28
-; CHECK-APPLE: mov x3, x27
-; CHECK-APPLE: mov x4, x26
-; CHECK-APPLE: mov x5, x25
-; CHECK-APPLE: mov x6, x24
-; CHECK-APPLE: mov x7, x23
+; CHECK-APPLE: mov x0, x19
+; CHECK-APPLE: mov x1, x28
+; CHECK-APPLE: mov x2, x27
+; CHECK-APPLE: mov x3, x26
+; CHECK-APPLE: mov x4, x25
+; CHECK-APPLE: mov x5, x24
 ; Restore original swiftself argument and swifterror %err.
-; CHECK-APPLE: ldp x20, x21, [sp
+; CHECK-APPLE: ldp x7, x21, [sp
+; CHECK-APPLE: mov x20, x22
 ; CHECK-APPLE: bl _params_in_reg2
 ; Restore calle save registers but don't clober swifterror x21.
 ; CHECK-APPLE-NOT: x21
 ; CHECK-APPLE: ldp x29, x30, [sp
 ; CHECK-APPLE-NOT: x21
 ; CHECK-APPLE: ldp x20, x19, [sp
 ; CHECK-APPLE-NOT: x21
 ; CHECK-APPLE: ldp x23, x22, [sp
@@ ... @@ define swiftcc void @params_in_reg(i64, i64, i64, i64, i64, i64, i64, i64, i8* swiftself, %swift_error** nocapture swifterror %err) {
   %error_ptr_ref = alloca swifterror %swift_error*, align 8
   store %swift_error* null, %swift_error** %error_ptr_ref
   call swiftcc void @params_in_reg2(i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i8* swiftself null, %swift_error** nocapture swifterror %error_ptr_ref)
   call swiftcc void @params_in_reg2(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i8* swiftself %8, %swift_error** nocapture swifterror %err)
   ret void
 }
 declare swiftcc void @params_in_reg2(i64, i64, i64, i64, i64, i64, i64, i64, i8* swiftself, %swift_error** nocapture swifterror %err)

-; CHECK-APPLE-LABEL: params_and_return_in_reg
+; CHECK-APPLE-LABEL: params_and_return_in_reg:
 ; Store callee saved registers.
-; CHECK-APPLE: stp x20, x28, [sp, #24
+; CHECK-APPLE: stp x7, x28, [sp, #24
 ; CHECK-APPLE: stp x27, x26, [sp
 ; CHECK-APPLE: stp x25, x24, [sp
 ; CHECK-APPLE: stp x23, x22, [sp
 ; CHECK-APPLE: stp x20, x19, [sp
 ; CHECK-APPLE: stp x29, x30, [sp
 ; Save original arguments.
 ; CHECK-APPLE: mov x23, x21
-; CHECK-APPLE: str x7, [sp, #16]
-; CHECK-APPLE: mov x24, x6
-; CHECK-APPLE: mov x25, x5
-; CHECK-APPLE: mov x26, x4
-; CHECK-APPLE: mov x27, x3
-; CHECK-APPLE: mov x28, x2
-; CHECK-APPLE: mov x19, x1
-; CHECK-APPLE: mov x22, x0
+; CHECK-APPLE: str x6, [sp, #16]
+; CHECK-APPLE: mov x24, x5
+; CHECK-APPLE: mov x25, x4
+; CHECK-APPLE: mov x26, x3
+; CHECK-APPLE: mov x27, x2
+; CHECK-APPLE: mov x28, x1
+; CHECK-APPLE: mov x19, x0
 ; Setup call arguments.
 ; CHECK-APPLE: orr w0, wzr, #0x1
 ; CHECK-APPLE: orr w1, wzr, #0x2
 ; CHECK-APPLE: orr w2, wzr, #0x3
 ; CHECK-APPLE: orr w3, wzr, #0x4
 ; CHECK-APPLE: mov w4, #5
 ; CHECK-APPLE: orr w5, wzr, #0x6
 ; CHECK-APPLE: orr w6, wzr, #0x7
 ; CHECK-APPLE: orr w7, wzr, #0x8
+; CHECK-APPLE: mov x22, x20
 ; CHECK-APPLE: mov x20, xzr
 ; CHECK-APPLE: mov x21, xzr
 ; CHECK-APPLE: bl _params_in_reg2
 ; Store swifterror %error_ptr_ref.
 ; CHECK-APPLE: str x21, [sp, #8]
 ; Setup call arguments from original arguments.
-; CHECK-APPLE: mov x0, x22
-; CHECK-APPLE: mov x1, x19
-; CHECK-APPLE: mov x2, x28
-; CHECK-APPLE: mov x3, x27
-; CHECK-APPLE: mov x4, x26
-; CHECK-APPLE: mov x5, x25
-; CHECK-APPLE: mov x6, x24
-; CHECK-APPLE: ldp x7, x20, [sp, #16]
+; CHECK-APPLE: mov x0, x19
+; CHECK-APPLE: mov x1, x28
+; CHECK-APPLE: mov x2, x27
+; CHECK-APPLE: mov x3, x26
+; CHECK-APPLE: mov x4, x25
+; CHECK-APPLE: mov x5, x24
+; CHECK-APPLE: ldp x6, x7, [sp, #16]
+; CHECK-APPLE: mov x20, x22
 ; CHECK-APPLE: mov x21, x23
 ; CHECK-APPLE: bl _params_and_return_in_reg2
+; Save swifterror %err.
+; CHECK-APPLE: str x0, [sp, #24]
 ; Store return values.
-; CHECK-APPLE: mov x19, x0
 ; CHECK-APPLE: mov x22, x1
 ; CHECK-APPLE: mov x24, x2
 ; CHECK-APPLE: mov x25, x3
 ; CHECK-APPLE: mov x26, x4
 ; CHECK-APPLE: mov x27, x5
 ; CHECK-APPLE: mov x28, x6
 ; CHECK-APPLE: mov x23, x7
-; Save swifterror %err.
-; CHECK-APPLE: str x21, [sp, #24]
 ; Setup call.
 ; CHECK-APPLE: orr w0, wzr, #0x1
 ; CHECK-APPLE: orr w1, wzr, #0x2
 ; CHECK-APPLE: orr w2, wzr, #0x3
 ; CHECK-APPLE: orr w3, wzr, #0x4
 ; CHECK-APPLE: mov w4, #5
 ; CHECK-APPLE: orr w5, wzr, #0x6
 ; CHECK-APPLE: orr w6, wzr, #0x7
 ; CHECK-APPLE: orr w7, wzr, #0x8
+; CHECK-APPLE: mov x19, x21
 ; CHECK-APPLE: mov x20, xzr
 ; ... setup call with swiferror %error_ptr_ref.
 ; CHECK-APPLE: ldr x21, [sp, #8]
 ; CHECK-APPLE: bl _params_in_reg2
 ; Restore return values for return from this function.
-; CHECK-APPLE: mov x0, x19
 ; CHECK-APPLE: mov x1, x22
 ; CHECK-APPLE: mov x2, x24
 ; CHECK-APPLE: mov x3, x25
 ; CHECK-APPLE: mov x4, x26
 ; CHECK-APPLE: mov x5, x27
 ; CHECK-APPLE: mov x6, x28
 ; CHECK-APPLE: mov x7, x23
+; CHECK-APPLE: mov x21, x19
 ; Restore swifterror %err and callee save registers.
; CHECK-APPLE: ldp x21, x28, [sp, #24
; CHECK-APPLE: ldp x29, x30, [sp		; CHECK-APPLE: ldp x29, x30, [sp
; CHECK-APPLE: ldp x20, x19, [sp		; CHECK-APPLE: ldp x20, x19, [sp
; CHECK-APPLE: ldp x23, x22, [sp		; CHECK-APPLE: ldp x23, x22, [sp
; CHECK-APPLE: ldp x25, x24, [sp		; CHECK-APPLE: ldp x25, x24, [sp
; CHECK-APPLE: ldp x27, x26, [sp		; CHECK-APPLE: ldp x27, x26, [sp
		; CHECK-APPLE: ldp x0, x28, [sp, #24
; CHECK-APPLE: ret		; CHECK-APPLE: ret
define swiftcc { i64, i64, i64, i64, i64, i64, i64, i64 } @params_and_return_in_reg(i64, i64, i64, i64, i64, i64, i64, i64, i8* swiftself, %swift_error** nocapture swifterror %err) {		define swiftcc { i64, i64, i64, i64, i64, i64, i64, i64 } @params_and_return_in_reg(i64, i64, i64, i64, i64, i64, i64, i64, i8* swiftself, %swift_error** nocapture swifterror %err) {
%error_ptr_ref = alloca swifterror %swift_error*, align 8		%error_ptr_ref = alloca swifterror %swift_error*, align 8
store %swift_error* null, %swift_error** %error_ptr_ref		store %swift_error* null, %swift_error** %error_ptr_ref
call swiftcc void @params_in_reg2(i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i8* swiftself null, %swift_error** nocapture swifterror %error_ptr_ref)		call swiftcc void @params_in_reg2(i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i8* swiftself null, %swift_error** nocapture swifterror %error_ptr_ref)
%val = call swiftcc { i64, i64, i64, i64, i64, i64, i64, i64 } @params_and_return_in_reg2(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i8* swiftself %8, %swift_error** nocapture swifterror %err)		%val = call swiftcc { i64, i64, i64, i64, i64, i64, i64, i64 } @params_and_return_in_reg2(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i8* swiftself %8, %swift_error** nocapture swifterror %err)
call swiftcc void @params_in_reg2(i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i8* swiftself null, %swift_error** nocapture swifterror %error_ptr_ref)		call swiftcc void @params_in_reg2(i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i8* swiftself null, %swift_error** nocapture swifterror %error_ptr_ref)
ret { i64, i64, i64, i64, i64, i64, i64, i64 } %val		ret { i64, i64, i64, i64, i64, i64, i64, i64 } %val
... (13 unchanged lines not shown)
entry:		entry:
tail call void @acallee(i8* null)		tail call void @acallee(i8* null)
ret void		ret void
}		}

declare swiftcc void @foo2(%swift_error** swifterror)		declare swiftcc void @foo2(%swift_error** swifterror)

; Make sure we properly assign registers during fast-isel.		; Make sure we properly assign registers during fast-isel.
; CHECK-O0-LABEL: testAssign		; CHECK-O0-LABEL: testAssign:
; CHECK-O0: mov [[TMP:x.*]], xzr		; CHECK-O0: mov [[TMP:x.*]], xzr
; CHECK-O0: mov x21, [[TMP]]		; CHECK-O0: mov x21, [[TMP]]
; CHECK-O0: bl _foo2		; CHECK-O0: bl _foo2
; CHECK-O0: str x21, [s[[STK:.*]]]		; CHECK-O0: str x21, [s[[STK:.*]]]
; CHECK-O0: ldr x0, [s[[STK]]]		; CHECK-O0: ldr x0, [s[[STK]]]

; CHECK-APPLE-LABEL: testAssign		; CHECK-APPLE-LABEL: testAssign:
; CHECK-APPLE: mov x21, xzr		; CHECK-APPLE: mov x21, xzr
; CHECK-APPLE: bl _foo2		; CHECK-APPLE: bl _foo2
; CHECK-APPLE: mov x0, x21		; CHECK-APPLE: mov x0, x21

define swiftcc %swift_error* @testAssign(i8* %error_ref) {		define swiftcc %swift_error* @testAssign(i8* %error_ref) {
entry:		entry:
%error_ptr = alloca swifterror %swift_error*		%error_ptr = alloca swifterror %swift_error*
store %swift_error* null, %swift_error** %error_ptr		store %swift_error* null, %swift_error** %error_ptr
call swiftcc void @foo2(%swift_error** swifterror %error_ptr)		call swiftcc void @foo2(%swift_error** swifterror %error_ptr)
br label %a		br label %a

a:		a:
%error = load %swift_error, %swift_error* %error_ptr		%error = load %swift_error, %swift_error* %error_ptr
ret %swift_error* %error		ret %swift_error* %error
}		}

llvm/trunk/test/CodeGen/AArch64/tailcall-explicit-sret.ll

	... (30 unchanged lines not shown)
	define void @test_tailcall_explicit_sret_alloca_unused() #0 {			define void @test_tailcall_explicit_sret_alloca_unused() #0 {
	%l = alloca i1024, align 8			%l = alloca i1024, align 8
	tail call void @test_explicit_sret(i1024* %l)			tail call void @test_explicit_sret(i1024* %l)
	ret void			ret void
	}			}

	; CHECK-LABEL: _test_tailcall_explicit_sret_alloca_dummyusers:			; CHECK-LABEL: _test_tailcall_explicit_sret_alloca_dummyusers:
	; CHECK: ldr [[PTRLOAD1:q[0-9]+]], [x0]			; CHECK: ldr [[PTRLOAD1:q[0-9]+]], [x0]
	; CHECK: str [[PTRLOAD1]], [sp]
	; CHECK: mov x8, sp			; CHECK: mov x8, sp
	; CHECK-NEXT: bl _test_explicit_sret			; CHECK: str [[PTRLOAD1]], [sp]
				; CHECK: bl _test_explicit_sret
	; CHECK: ret			; CHECK: ret
	define void @test_tailcall_explicit_sret_alloca_dummyusers(i1024* %ptr) #0 {			define void @test_tailcall_explicit_sret_alloca_dummyusers(i1024* %ptr) #0 {
	%l = alloca i1024, align 8			%l = alloca i1024, align 8
	%r = load i1024, i1024* %ptr, align 8			%r = load i1024, i1024* %ptr, align 8
	store i1024 %r, i1024* %l, align 8			store i1024 %r, i1024* %l, align 8
	tail call void @test_explicit_sret(i1024* %l)			tail call void @test_explicit_sret(i1024* %l)
	ret void			ret void
	}			}
	... (20 unchanged lines not shown)
	define i1024 @test_tailcall_explicit_sret_alloca_returned() #0 {			define i1024 @test_tailcall_explicit_sret_alloca_returned() #0 {
	%l = alloca i1024, align 8			%l = alloca i1024, align 8
	tail call void @test_explicit_sret(i1024* %l)			tail call void @test_explicit_sret(i1024* %l)
	%r = load i1024, i1024* %l, align 8			%r = load i1024, i1024* %l, align 8
	ret i1024 %r			ret i1024 %r
	}			}

	; CHECK-LABEL: _test_indirect_tailcall_explicit_sret_nosret_arg:			; CHECK-LABEL: _test_indirect_tailcall_explicit_sret_nosret_arg:
	; CHECK-DAG: mov x[[CALLERX8NUM:[0-9]+]], x8			; CHECK: mov [[FPTR:x[0-9]+]], x0
	; CHECK-DAG: mov [[FPTR:x[0-9]+]], x0
	; CHECK: mov x0, sp			; CHECK: mov x0, sp
	; CHECK-NEXT: blr [[FPTR]]			; CHECK: mov x[[CALLERX8NUM:[0-9]+]], x8
				; CHECK: blr [[FPTR]]
	; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]			; CHECK: ldr [[CALLERSRET1:q[0-9]+]], [sp]
	; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]			; CHECK: str [[CALLERSRET1:q[0-9]+]], [x[[CALLERX8NUM]]]
	; CHECK: ret			; CHECK: ret
	define void @test_indirect_tailcall_explicit_sret_nosret_arg(i1024* sret %arg, void (i1024) %f) #0 {			define void @test_indirect_tailcall_explicit_sret_nosret_arg(i1024* sret %arg, void (i1024) %f) #0 {
	%l = alloca i1024, align 8			%l = alloca i1024, align 8
	tail call void %f(i1024* %l)			tail call void %f(i1024* %l)
	%r = load i1024, i1024* %l, align 8			%r = load i1024, i1024* %l, align 8
	store i1024 %r, i1024* %arg, align 8			store i1024 %r, i1024* %arg, align 8
	... (17 unchanged lines not shown)