This is an archive of the discontinued LLVM Phabricator instance.

Differential D127209

[SVE][AArch64] Refine hasSVEArgsOrReturn
ClosedPublic

Authored by MattDevereau on Jun 7 2022, 5:28 AM.

Details

Reviewers

bsmith
paulwalker-arm
c-rhodes
david-arm
efriedma
peterwaller-arm

Commits

rG5166345f5041: [SVE][AArch64] Refine hasSVEArgsOrReturn

Summary

As described in aapcs64 (https://github.com/ARM-software/abi-aa/blob/2022Q1/aapcs64/aapcs64.rst#scalable-vector-registers), AAVPCS is used only when registers z0-z7 take an SVE argument. This fixes the case where floats occupy the lower bits of registers z0-z7, but SVE arguments that fall beyond z7 cause a function to use AAVPCS where it should use AAPCS.

Moving the SVE function deduction from AArch64RegisterInfo::hasSVEArgsOrReturn to AArch64TargetLowering::LowerFormalArguments, where physical register lowering is more accurate, fixes this.
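
The rule in the summary can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not the code from the patch: `ArgKind`, `passesSveArgInRegister` and the two register-count constants are hypothetical names, the model tracks only floats, scalable vectors and scalable predicates, and it assumes that an SVE argument that finds no free register is passed indirectly.

```cpp
#include <cstddef>

// Hypothetical, simplified argument kinds for illustration; not LLVM types.
enum class ArgKind { Float, SveVector, SvePredicate };

// Assumed register budgets: v0-v7 share storage with the low bits of z0-z7,
// and p0-p3 carry predicate arguments.
constexpr unsigned NumVectorArgRegs = 8;
constexpr unsigned NumPredicateArgRegs = 4;

// Returns true if at least one SVE argument is assigned to z0-z7 or p0-p3.
// SVE arguments that find no free register are treated as passed indirectly.
template <std::size_t N>
constexpr bool passesSveArgInRegister(const ArgKind (&Args)[N]) {
  unsigned NextVectorReg = 0;
  unsigned NextPredicateReg = 0;
  bool SveInRegister = false;
  for (std::size_t I = 0; I != N; ++I) {
    switch (Args[I]) {
    case ArgKind::Float:
      // A float consumes one of v0-v7, and therefore one of z0-z7.
      if (NextVectorReg < NumVectorArgRegs)
        ++NextVectorReg;
      break;
    case ArgKind::SveVector:
      if (NextVectorReg < NumVectorArgRegs) {
        ++NextVectorReg;
        SveInRegister = true;
      }
      break;
    case ArgKind::SvePredicate:
      if (NextPredicateReg < NumPredicateArgRegs) {
        ++NextPredicateReg;
        SveInRegister = true;
      }
      break;
    }
  }
  return SveInRegister;
}
```

In this model, eight floats followed by one scalable vector yields false, because the floats have already taken z0-z7. That is the case the summary says should stay on AAPCS. A check based only on argument types would report true for the same signature.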

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,070 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
60,110 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp

Event Timeline

MattDevereau created this revision. Jun 7 2022, 5:28 AM

Herald added a reviewer: efriedma. Jun 7 2022, 5:28 AM

Herald added a project: Restricted Project.

Herald added subscribers: ctetreau, psnobl, hiraditya and 2 others.

MattDevereau requested review of this revision. Jun 7 2022, 5:28 AM

Herald added a project: Restricted Project. Jun 7 2022, 5:28 AM

Herald added subscribers: llvm-commits, alextsao1999.

Harbormaster completed remote builds in B168281: Diff 434788. Jun 7 2022, 6:55 AM

efriedma added inline comments. Jun 7 2022, 1:06 PM

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
76	Using isFloatTy() like this isn't going to be accurate. If you want to correctly compute the registers used for a call, you really should just use the result of analyzeCallOperands() from isel. Maybe compute it in isel, then save the result in AArch64FunctionInfo.

efriedma added inline comments. Jun 7 2022, 1:08 PM

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
76	Ignore my reference to analyzeCallOperands(). The actual computation of the registers used happens in AArch64TargetLowering::LowerFormalArguments.

Moved SVE function deduction from AArch64RegisterInfo::hasSVEArgsOrReturn to AArch64TargetLowering::LowerFormalArguments
Added IsSVE bool to AArch64FunctionInfo which describes if an AArch64Function has SVE args or return type in physical registers

Harbormaster completed remote builds in B169037: Diff 435858. Jun 10 2022, 3:47 AM

paulwalker-arm added inline comments. Jun 10 2022, 4:13 AM

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
68–69	Does `AArch64FunctionInfo` contain all the information we need? It looks like it contains a `MachineFunction *`. I ask because I'm wondering if we can just get rid of `AArch64RegisterInfo::hasSVEArgsOrReturn` entirely.

MattDevereau added inline comments. Jun 10 2022, 4:25 AM

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
68–69	`AArch64TargetLowering::isEligibleForTailCallOptimization` and `AArch64AsmPrinter::emitFunctionEntryLabel()` also use `AArch64RegisterInfo::hasSVEArgsOrReturn`. As `AArch64FunctionInfo` has a reference to a machine function, it should be possible to move `isa<ScalableVectorType>(MF->getFunction().getReturnType())` into the new `AArch64FunctionInfo::IsSVE` method. So yes, I do think `AArch64FunctionInfo` has all the information we need

Removed hasSVEArgsOrReturn

Harbormaster completed remote builds in B169048: Diff 435880. Jun 10 2022, 5:45 AM

MattDevereau edited the summary of this revision. Jun 10 2022, 6:06 AM

LGTM

This revision is now accepted and ready to land. Jun 10 2022, 11:53 AM

efriedma requested changes to this revision. Jun 10 2022, 12:05 PM

efriedma added inline comments.

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	Hmm... actually, looking at this again, I'm a little concerned about checking the IR return type directly, instead of using AnalyzeReturn. I'm not sure if this actually comes up in practice, but calling convention lowering does various transforms on the return type. Maybe if you write something silly like "<vscale x 1000 x double>". Maybe we can add a bit of code to LowerFormalArguments to also check the return type?

This revision now requires changes to proceed. Jun 10 2022, 12:05 PM

Matt added a subscriber: Matt. Jun 10 2022, 3:14 PM

MattDevereau added inline comments. Jun 15 2022, 5:51 AM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	Modifying test `@sve_caller_sve_callee()` in `sve-tailcall.ll` to use return type `<vscale x 1000 x double>` caused the output "LLVM ERROR: Invalid size request on a scalable vector.". In similar fashion to the changes in LowerFormalArguments, I checked the types of the return values in `AArch64TargetLowering::LowerReturn` instead of checking the IR return type directly. This ends up passing on the rest of the tests except `sve_caller_sve_callee` and `sve_caller_sve_callee_fastcc`, where the return type is not eligible for tail optimization and the frame-pointer is incorrectly unpreserved. Is directly checking the IR return type in `AArch64TargetLowering::LowerReturn` after `AnalyzeReturn` still not sufficient?

efriedma added inline comments. Jun 15 2022, 2:59 PM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	I don't think we call AArch64TargetLowering::LowerReturn for all functions, in general, only functions that return. We need to do the computation somewhere that's called for every function. Besides my `<vscale x 1000 x double>` example, I don't have any specific cases where just checking `isa<ScalableVectorType>` doesn't work, but I'd like to avoid falling out of sync if we do ever add support for, for example, returning fixed-width vectors in SVE registers.
	> Modifying test @sve_caller_sve_callee() in sve-tailcall.ll to use return type <vscale x 1000 x double> caused the output "LLVM ERROR: Invalid size request on a scalable vector.".
	That sounds like a bug we should fix. Although maybe not a high priority one.

sdesmalen added a subscriber: sdesmalen. Jun 16 2022, 3:36 AM

sdesmalen added inline comments.

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	> I don't think we call AArch64TargetLowering::LowerReturn for all functions, in general, only functions that return. We need to do the computation somewhere that's called for every function.
	I'm not sure I understand this point. For arguments this check is done in LowerFormalArguments, so that case is all covered. For the return value, I'd expect that LowerReturn is sufficient, because if the function doesn't return (and thus doesn't call LowerReturn), it also can't return a value in an SVE register.
202	nit: Should this be something like `UsesSVERegsForArgsOrReturnValue` ?

MattDevereau updated this revision to Diff 437524. Jun 16 2022, 6:28 AM

Harbormaster completed remote builds in B170249: Diff 437524. Jun 16 2022, 7:07 AM

efriedma added inline comments. Jun 16 2022, 10:12 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6203	I don't understand what you're doing here; FuncInfo describes the caller, not the callee.
llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	A function that doesn't return can still throw an exception.

MattDevereau added inline comments. Jun 16 2022, 10:36 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6203	Not doing this causes `sve_caller_sve_callee` and `sve_caller_sve_callee_fastcc` in `sve-tailcall.ll` to fail. From `ISD::InputArg`:
	/// InputArg - This struct carries flags and type information about a
	/// single incoming (formal) argument or incoming (from the perspective
	/// of the caller) return value virtual register.
	///

efriedma added inline comments. Jun 16 2022, 10:53 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6203	In @sve_caller_sve_callee, the return type is an SVE type, so we should be calling `setIsSVECC(true)` elsewhere. I think this may be another case of the issue that LowerReturn doesn't run/runs too late?

sdesmalen added inline comments. Jun 16 2022, 3:25 PM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	If a function never returns because the only 'return' is an exceptional return, then it doesn't have to preserve the entire contents of z8-z23, which I think is what this patch tries to ascertain. Perhaps I was just reading too much into your statement, which confused me. I would have expected the function that analyses/lowers the return to be the right place to handle this, if it is called at the right time of course.

efriedma added inline comments. Jun 16 2022, 4:15 PM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	If we throw an exception, the unwinder has to restore z8-z23, I think? Or is there some carveout for unwinding that lets us get away without preserving those registers somehow? Is that documented somewhere? There's also the issue that tail calls don't go through LowerReturn. In any case, given that we're calling isSVECC() inside isel, we should ensure that all the setIsSVECC() happen before we start calling isSVECC(); anything else is confusing at best...

sdesmalen added inline comments. Jun 17 2022, 1:37 AM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	> If we throw an exception, the unwinder has to restore z8-z23, I think? Or is there some carveout for unwinding that lets us get away without preserving those registers somehow? Is that documented somewhere?
	I'm not entirely sure where this is officially documented (or whether this is just common practice of existing unwinders), but D84737 states that unwinders may only preserve the lower 64bits.
	> There's also the issue that tail calls don't go through LowerReturn. In any case, given that we're calling isSVECC() inside isel, we should ensure that all the setIsSVECC() happen before we start calling isSVECC(); anything else is confusing at best...
	Yes, I agree.

MattDevereau added inline comments. Jun 20 2022, 5:26 AM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	Is there any particular case where isSVECC() is likely to be called before setIsSVECC()? I've not observed this so far while working on this

efriedma added inline comments. Jun 20 2022, 2:08 PM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
200	isEligibleForTailCallOptimization() calls isSVECC(); I expect that runs before LowerReturn().

MattDevereau added a reviewer: peterwaller-arm. Jun 22 2022, 1:52 AM

peterwaller-arm added inline comments. Jun 22 2022, 2:38 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6193–6199	Is this logic not the logic that this patch is trying to fix? This needs to be changed to make use of the isSVECC flag. I see here that the calling convention of the Callee is based on this, but doesn't it need to account for the rules as described in the summary of the patch? Thinking aloud, my interpretation of what's going on is that the underlying CallLoweringInfo::CallConv is C/Fast, because that's what the 'user' specified, but this is 'magically' implied to be a AArch64_SVE_VectorCall only when it matters during the lowering of call/return. Architecturally this is a little confusing since it means we have these bits of code which 'patch up' the calling convention at the last moment. Is it possible we can fix the CallingConv earlier? I take it not easily, because the details of which registers are used is decided by e.g. AnalyzeReturn, and this only runs during the lowering.

MattDevereau planned changes to this revision. Jun 28 2022, 3:39 AM

Added logic to LowerFormalArguments to check the return type for scalable vector types. This is more reliable than checking the return types in LowerReturn as LowerReturn does not always run when checking calls, i.e. in the tail call case
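
The ordering concern behind this update can be sketched with a toy model. This is an illustration under stated assumptions rather than the patch itself: `FunctionInfo`, `Signature`, `lowerFormalArguments`, `callerCCSeenByTailCallCheck` and `lowerTailCallingFunction` are hypothetical stand-ins, and the model assumes a function whose only exit is a tail call, so a return-lowering step that could set the flag is never reached.

```cpp
// Hypothetical stand-ins for illustration; these are not LLVM's types.
enum class CallConv { C, SveVectorCall };

struct FunctionInfo {
  bool IsSVECC = false;
};

struct Signature {
  bool SveArgInRegister;      // some argument landed in an SVE register
  bool ReturnsScalableVector; // the return value is a scalable vector
};

// Runs for every function definition, before any call in its body is lowered.
constexpr void lowerFormalArguments(FunctionInfo &FI, const Signature &Sig,
                                    bool AlsoCheckReturn) {
  if (Sig.SveArgInRegister || (AlsoCheckReturn && Sig.ReturnsScalableVector))
    FI.IsSVECC = true;
}

// Mirrors the shape of the caller-side adjustment in the tail-call check.
constexpr CallConv callerCCSeenByTailCallCheck(const FunctionInfo &FI) {
  return FI.IsSVECC ? CallConv::SveVectorCall : CallConv::C;
}

// Models a function whose only exit is a tail call, so a return-lowering
// step (where the flag could otherwise be set) is never reached.
constexpr CallConv lowerTailCallingFunction(const Signature &Sig,
                                            bool CheckReturnInFormalArgs) {
  FunctionInfo FI{};
  lowerFormalArguments(FI, Sig, CheckReturnInFormalArgs);
  return callerCCSeenByTailCallCheck(FI);
}
```

For a signature that only returns a scalable vector, the model reports `CallConv::C` when the return type is not inspected during formal-argument lowering, and `CallConv::SveVectorCall` when it is. That difference is why the check was placed in the step that runs before any `isSVECC()` query.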

Harbormaster completed remote builds in B172771: Diff 441023. Jun 29 2022, 9:47 AM

LGTM

This revision is now accepted and ready to land. Jun 29 2022, 1:05 PM

peterwaller-arm added inline comments. Jun 30 2022, 1:59 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6202	Note: "Ins" contains the return values of the call, and "Outs" contains the parameters to the call. I think these are confusingly named, at best. This update of CallConv is a reference to CLI.CallConv, so it's setting the CallConv in the `CLI`, which gets passed around for various decision making. I note that the logic for setting `CalleeOutSVE` is incorrect, because it does not account for the position of the arguments. If I put a scalable vector in any parameter it will update the calling convention, even though we know that's not the correct logic. I didn't yet find if/where this matters. If it does not matter I think it may be best removed to save future confusion, since it's incorrect.

peterwaller-arm accepted this revision. Jun 30 2022, 3:30 AM

peterwaller-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6202	With reflection, I'm changing my position on this for this patch, so please go ahead.

This revision was landed with ongoing or failed builds. Jul 1 2022, 6:26 AM

Closed by commit rG5166345f5041: [SVE][AArch64] Refine hasSVEArgsOrReturn (authored by MattDevereau).

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rG5166345f5041: [SVE][AArch64] Refine hasSVEArgsOrReturn.

peterwaller-arm mentioned this in D129135: [doc][ReleaseNotes] Document AArch64 SVE ABI fix from D127209. Jul 5 2022, 2:27 AM

peterwaller-arm mentioned this in rG2db2a4e11240: [doc][ReleaseNotes] Document AArch64 SVE ABI fix from D127209. Jul 7 2022, 3:56 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp	2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	25 lines
llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h	7 lines
llvm/lib/Target/AArch64/AArch64RegisterInfo.h	2 lines
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp	10 lines
llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll	187 lines

Diff 441023

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp

[859 lines not shown; context: for (unsigned JTI = 0, e = JT.size(); JTI != e; ++JTI) {]
     }
   }
 }

 void AArch64AsmPrinter::emitFunctionEntryLabel() {
   if (MF->getFunction().getCallingConv() == CallingConv::AArch64_VectorCall ||
       MF->getFunction().getCallingConv() ==
           CallingConv::AArch64_SVE_VectorCall ||
-      STI->getRegisterInfo()->hasSVEArgsOrReturn(MF)) {
+      MF->getInfo<AArch64FunctionInfo>()->isSVECC()) {
     auto *TS =
         static_cast<AArch64TargetStreamer *>(OutStreamer->getTargetStreamer());
     TS->emitDirectiveVariantPCS(CurrentFnSym);
   }

   return AsmPrinter::emitFunctionEntryLabel();
 }

[692 lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[5,428 lines not shown; context: return CC == CallingConv::WebKit_JS ? RetCC_AArch64_WebKit_JS]
                                         : RetCC_AArch64_AAPCS;
 }

 SDValue AArch64TargetLowering::LowerFormalArguments(
     SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
     const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &DL,
     SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals) const {
   MachineFunction &MF = DAG.getMachineFunction();
+  const Function &F = MF.getFunction();
   MachineFrameInfo &MFI = MF.getFrameInfo();
-  bool IsWin64 = Subtarget->isCallingConvWin64(MF.getFunction().getCallingConv());
+  bool IsWin64 = Subtarget->isCallingConvWin64(F.getCallingConv());
+  AArch64FunctionInfo *FuncInfo = MF.getInfo<AArch64FunctionInfo>();
+
+  SmallVector<ISD::OutputArg, 4> Outs;
+  GetReturnInfo(CallConv, F.getReturnType(), F.getAttributes(), Outs,
+                DAG.getTargetLoweringInfo(), MF.getDataLayout());
+  if (any_of(Outs, [](ISD::OutputArg &Out){ return Out.VT.isScalableVector(); }))
+    FuncInfo->setIsSVECC(true);

   // Assign locations to all of the incoming arguments.
   SmallVector<CCValAssign, 16> ArgLocs;
   DenseMap<unsigned, SDValue> CopiedRegs;
   CCState CCInfo(CallConv, isVarArg, MF, ArgLocs, *DAG.getContext());

   // At this point, Ins[].VT may already be promoted to i32. To correctly
   // handle passing i8 as i8 instead of i32 on stack, we pass in both i32 and
   // i8 to CC_AArch64_AAPCS with i32 being ValVT and i8 being LocVT.
   // Since AnalyzeFormalArguments uses Ins[].VT for both ValVT and LocVT, here
   // we use a special version of AnalyzeFormalArguments to pass in ValVT and
   // LocVT.
   unsigned NumArgs = Ins.size();
-  Function::const_arg_iterator CurOrigArg = MF.getFunction().arg_begin();
+  Function::const_arg_iterator CurOrigArg = F.arg_begin();
   unsigned CurArgIdx = 0;
   for (unsigned i = 0; i != NumArgs; ++i) {
     MVT ValVT = Ins[i].VT;
     if (Ins[i].isOrigArg()) {
       std::advance(CurOrigArg, Ins[i].getOrigArgIndex() - CurArgIdx);
       CurArgIdx = Ins[i].getOrigArgIndex();

       // Get type of the original argument.
[54 lines not shown; context: if (VA.isRegLoc()) {]
         RC = &AArch64::FPR16RegClass;
       else if (RegVT == MVT::f32)
         RC = &AArch64::FPR32RegClass;
       else if (RegVT == MVT::f64 || RegVT.is64BitVector())
         RC = &AArch64::FPR64RegClass;
       else if (RegVT == MVT::f128 || RegVT.is128BitVector())
         RC = &AArch64::FPR128RegClass;
       else if (RegVT.isScalableVector() &&
-               RegVT.getVectorElementType() == MVT::i1)
+               RegVT.getVectorElementType() == MVT::i1) {
+        FuncInfo->setIsSVECC(true);
         RC = &AArch64::PPRRegClass;
-      else if (RegVT.isScalableVector())
+      } else if (RegVT.isScalableVector()) {
+        FuncInfo->setIsSVECC(true);
         RC = &AArch64::ZPRRegClass;
-      else
+      } else
         llvm_unreachable("RegVT not supported by FORMAL_ARGUMENTS Lowering");

       // Transform the arguments in physical registers into virtual ones.
       Register Reg = MF.addLiveIn(VA.getLocReg(), RC);
       ArgValue = DAG.getCopyFromReg(Chain, DL, Reg, RegVT);

       // If this is an 8, 16 or 32-bit value, it is really passed promoted
       // to 64 bits. Insert an assert[sz]ext to capture this, then
[105 lines not shown; context: for (unsigned i = 0, e = Ins.size(); i != e; ++i) {]
     } else {
       if (Subtarget->isTargetILP32() && Ins[i].Flags.isPointer())
         ArgValue = DAG.getNode(ISD::AssertZext, DL, ArgValue.getValueType(),
                                ArgValue, DAG.getValueType(MVT::i32));

       // i1 arguments are zero-extended to i8 by the caller. Emit a
       // hint to reflect this.
       if (Ins[i].isOrigArg()) {
-        Argument *OrigArg = MF.getFunction().getArg(Ins[i].getOrigArgIndex());
+        Argument *OrigArg = F.getArg(Ins[i].getOrigArgIndex());
         if (OrigArg->getType()->isIntegerTy(1)) {
           if (!Ins[i].Flags.isZExt()) {
             ArgValue = DAG.getNode(AArch64ISD::ASSERT_ZEXT_BOOL, DL,
                                    ArgValue.getValueType(), ArgValue);
           }
         }
       }

       InVals.push_back(ArgValue);
     }
   }
   assert((ArgLocs.size() + ExtraArgLocs) == Ins.size());

   // varargs
-  AArch64FunctionInfo *FuncInfo = MF.getInfo<AArch64FunctionInfo>();
   if (isVarArg) {
     if (!Subtarget->isTargetDarwin() || IsWin64) {
       // The AAPCS variadic function ABI is identical to the non-variadic
       // one. As a result there may be more arguments in registers and we should
       // save them for future reference.
       // Win64 variadic functions also pass arguments in registers, but all float
       // arguments are passed in integer registers.
       saveVarArgRegisters(CCInfo, DAG, DL, Chain);
[296 lines not shown; context: bool AArch64TargetLowering::isEligibleForTailCallOptimization(]
   const Function &CallerF = MF.getFunction();
   CallingConv::ID CallerCC = CallerF.getCallingConv();

   // Functions using the C or Fast calling convention that have an SVE signature
   // preserve more registers and should assume the SVE_VectorCall CC.
   // The check for matching callee-saved regs will determine whether it is
   // eligible for TCO.
   if ((CallerCC == CallingConv::C || CallerCC == CallingConv::Fast) &&
-      AArch64RegisterInfo::hasSVEArgsOrReturn(&MF))
+      MF.getInfo<AArch64FunctionInfo>()->isSVECC())
     CallerCC = CallingConv::AArch64_SVE_VectorCall;

   bool CCMatch = CallerCC == CalleeCC;

   // When using the Windows calling convention on a non-windows OS, we want
   // to back up and restore X18 in such functions; we can't do a tail call
   // from those functions.
   if (CallerCC == CallingConv::Win64 && !Subtarget->isTargetWindows() &&
[190 lines not shown; context: AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,]

   if (CLI.CB && CLI.CB->getAttributes().hasFnAttr(Attribute::ReturnsTwice) &&
       !Subtarget->noBTIAtReturnTwice()) {
     GuardWithBTI = FuncInfo->branchTargetEnforcement();
   }

   // Check callee args/returns for SVE registers and set calling convention
   // accordingly.
   if (CallConv == CallingConv::C || CallConv == CallingConv::Fast) {
     bool CalleeOutSVE = any_of(Outs, [](ISD::OutputArg &Out){
       return Out.VT.isScalableVector();
     });
     bool CalleeInSVE = any_of(Ins, [](ISD::InputArg &In){
       return In.VT.isScalableVector();
     });

     if (CalleeInSVE || CalleeOutSVE)
       CallConv = CallingConv::AArch64_SVE_VectorCall;
   }

   if (IsTailCall) {
     // Check if it's really possible to do a tail call.
     IsTailCall = isEligibleForTailCallOptimization(CLI);

     // A sibling call is one where we're under the usual C ABI and not planning
     // to change that but can still do a tail call:
     if (!TailCallOpt && IsTailCall && CallConv != CallingConv::Tail &&
[15,035 lines not shown]

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h

[171 lines not shown; context: class AArch64FunctionInfo final : public MachineFunctionInfo {]
   /// extended record.
   bool HasSwiftAsyncContext = false;

   /// The stack slot where the Swift asynchronous context is stored.
   int SwiftAsyncContextFrameIdx = std::numeric_limits<int>::max();

   bool IsMTETagged = false;

+  /// The function has Scalable Vector or Scalable Predicate register argument
+  /// or return type
+  bool IsSVECC = false;
+
   /// True if the function need unwind information.
   mutable Optional<bool> NeedsDwarfUnwindInfo;

   /// True if the function need asynchronous unwind information.
   mutable Optional<bool> NeedsAsyncDwarfUnwindInfo;

 public:
   explicit AArch64FunctionInfo(MachineFunction &MF);

   MachineFunctionInfo *
   clone(BumpPtrAllocator &Allocator, MachineFunction &DestMF,
         const DenseMap<MachineBasicBlock *, MachineBasicBlock *> &Src2DstMBB)
       const override;

+  bool isSVECC() const { return IsSVECC; };
+  void setIsSVECC(bool s) { IsSVECC = s; };
+
efriedma: Hmm... actually, looking at this again, I'm a little concerned about checking the IR return type directly, instead of using AnalyzeReturn. I'm not sure if this actually comes up in practice, but calling convention lowering does various transforms on the return type. Maybe if you write something silly like "<vscale x 1000 x double>". Maybe we can add a bit of code to LowerFormalArguments to also check the return type?

MattDevereau: Modifying test `@sve_caller_sve_callee()` in `sve-tailcall.ll` to use return type `<vscale x 1000 x double>` caused the output "LLVM ERROR: Invalid size request on a scalable vector.". In similar fashion to the changes in LowerFormalArguments, I checked the types of the return values in `AArch64TargetLowering::LowerReturn` instead of checking the IR return type directly. This ends up passing on the rest of the tests except `sve_caller_sve_callee` and `sve_caller_sve_callee_fastcc`, where the return type is not eligible for tail optimization and the frame-pointer is incorrectly unpreserved. Is directly checking the IR return type in `AArch64TargetLowering::LowerReturn` after `AnalyzeReturn` still not sufficient?

efriedma: I don't think we call AArch64TargetLowering::LowerReturn for all functions, in general, only functions that return. We need to do the computation somewhere that's called for every function. Besides my `<vscale x 1000 x double>` example, I don't have any specific cases where just checking `isa<ScalableVectorType>` doesn't work, but I'd like to avoid falling out of sync if we do ever add support for, for example, returning fixed-width vectors in SVE registers.
> Modifying test @sve_caller_sve_callee() in sve-tailcall.ll to use return type <vscale x 1000 x double> caused the output "LLVM ERROR: Invalid size request on a scalable vector.".
That sounds like a bug we should fix. Although maybe not a high priority one.

sdesmalen:
> I don't think we call AArch64TargetLowering::LowerReturn for all functions, in general, only functions that return. We need to do the computation somewhere that's called for every function.
I'm not sure I understand this point. For arguments this check is done in LowerFormalArguments, so that case is all covered. For the return value, I'd expect that LowerReturn is sufficient, because if the function doesn't return (and thus doesn't call LowerReturn), it also can't return a value in an SVE register.

efriedma: A function that doesn't return can still throw an exception.

sdesmalen: If a function never returns because the only 'return' is an exceptional return, then it doesn't have to preserve the entire contents of z8-z23, which I think is what this patch tries to ascertain. Perhaps I was just reading too much into your statement, which confused me. I would have expected the function that analyses/lowers the return to be the right place to handle this, if it is called at the right time of course.

efriedma: If we throw an exception, the unwinder has to restore z8-z23, I think? Or is there some carveout for unwinding that lets us get away without preserving those registers somehow? Is that documented somewhere? There's also the issue that tail calls don't go through LowerReturn. In any case, given that we're calling isSVECC() inside isel, we should ensure that all the setIsSVECC() happen before we start calling isSVECC(); anything else is confusing at best...

sdesmalen:
> If we throw an exception, the unwinder has to restore z8-z23, I think? Or is there some carveout for unwinding that lets us get away without preserving those registers somehow? Is that documented somewhere?
I'm not entirely sure where this is officially documented (or whether this is just common practice of existing unwinders), but D84737 states that unwinders may only preserve the lower 64bits.
> There's also the issue that tail calls don't go through LowerReturn. In any case, given that we're calling isSVECC() inside isel, we should ensure that all the setIsSVECC() happen before we start calling isSVECC(); anything else is confusing at best...
Yes, I agree.

MattDevereau: Is there any particular case where isSVECC() is likely to be called before setIsSVECC()? I've not observed this so far while working on this.

efriedma: isEligibleForTailCallOptimization() calls isSVECC(); I expect that runs before LowerReturn().

void initializeBaseYamlFields(const yaml::AArch64FunctionInfo &YamlMFI);		void initializeBaseYamlFields(const yaml::AArch64FunctionInfo &YamlMFI);

sdesmalen: nit: Should this be something like `UsesSVERegsForArgsOrReturnValue` ?
unsigned getBytesInStackArgArea() const { return BytesInStackArgArea; }		unsigned getBytesInStackArgArea() const { return BytesInStackArgArea; }
void setBytesInStackArgArea(unsigned bytes) { BytesInStackArgArea = bytes; }		void setBytesInStackArgArea(unsigned bytes) { BytesInStackArgArea = bytes; }

unsigned getArgumentStackToRestore() const { return ArgumentStackToRestore; }		unsigned getArgumentStackToRestore() const { return ArgumentStackToRestore; }
void setArgumentStackToRestore(unsigned bytes) {		void setArgumentStackToRestore(unsigned bytes) {
ArgumentStackToRestore = bytes;		ArgumentStackToRestore = bytes;
}		}


llvm/lib/Target/AArch64/AArch64RegisterInfo.h

public:
bool isReservedReg(const MachineFunction &MF, MCRegister Reg) const;		bool isReservedReg(const MachineFunction &MF, MCRegister Reg) const;
bool isAnyArgRegReserved(const MachineFunction &MF) const;		bool isAnyArgRegReserved(const MachineFunction &MF) const;
void emitReservedArgRegCallError(const MachineFunction &MF) const;		void emitReservedArgRegCallError(const MachineFunction &MF) const;

void UpdateCustomCalleeSavedRegs(MachineFunction &MF) const;		void UpdateCustomCalleeSavedRegs(MachineFunction &MF) const;
void UpdateCustomCallPreservedMask(MachineFunction &MF,		void UpdateCustomCallPreservedMask(MachineFunction &MF,
const uint32_t **Mask) const;		const uint32_t **Mask) const;

static bool hasSVEArgsOrReturn(const MachineFunction *MF);

/// Code Generation virtual methods...		/// Code Generation virtual methods...
const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;		const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
const MCPhysReg *getDarwinCalleeSavedRegs(const MachineFunction *MF) const;		const MCPhysReg *getDarwinCalleeSavedRegs(const MachineFunction *MF) const;
const MCPhysReg *		const MCPhysReg *
getCalleeSavedRegsViaCopy(const MachineFunction *MF) const;		getCalleeSavedRegsViaCopy(const MachineFunction *MF) const;
const uint32_t *getCallPreservedMask(const MachineFunction &MF,		const uint32_t *getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID) const override;		CallingConv::ID) const override;
const uint32_t *getDarwinCallPreservedMask(const MachineFunction &MF,		const uint32_t *getDarwinCallPreservedMask(const MachineFunction &MF,

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp

for (int I = 0; CSR_AArch64_AAPCS_SaveList[I]; ++I) {
return true;		return true;
}		}
return false;		return false;
}		}

RegToUseForCFI = Reg;		RegToUseForCFI = Reg;
return true;		return true;
}		}

bool AArch64RegisterInfo::hasSVEArgsOrReturn(const MachineFunction *MF) {
const Function &F = MF->getFunction();
return isa<ScalableVectorType>(F.getReturnType()) ||
any_of(F.args(), [](const Argument &Arg) {
return isa<ScalableVectorType>(Arg.getType());
});
}

const MCPhysReg *		const MCPhysReg *
paulwalker-arm: Does `AArch64FunctionInfo` contain all the information we need? It looks like it contains a `MachineFunction *`. I ask because I'm wondering if we can just get rid of `AArch64RegisterInfo::hasSVEArgsOrReturn` entirely.

MattDevereau: `AArch64TargetLowering::isEligibleForTailCallOptimization` and `AArch64AsmPrinter::emitFunctionEntryLabel()` also use `AArch64RegisterInfo::hasSVEArgsOrReturn`. As `AArch64FunctionInfo` has a reference to a machine function, it should be possible to move `isa<ScalableVectorType>(MF->getFunction().getReturnType())` into the new `AArch64FunctionInfo::IsSVE` method. So yes, I do think `AArch64FunctionInfo` has all the information we need.

AArch64RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {		AArch64RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
assert(MF && "Invalid MachineFunction pointer.");		assert(MF && "Invalid MachineFunction pointer.");

if (MF->getFunction().getCallingConv() == CallingConv::GHC)		if (MF->getFunction().getCallingConv() == CallingConv::GHC)
// GHC set of callee saved regs is empty as all those regs are		// GHC set of callee saved regs is empty as all those regs are
// used for passing STG regs around		// used for passing STG regs around
return CSR_AArch64_NoRegs_SaveList;		return CSR_AArch64_NoRegs_SaveList;
efriedma: Using isFloatTy() like this isn't going to be accurate. If you want to correctly compute the registers used for a call, you really should just use the result of analyzeCallOperands() from isel. Maybe compute it in isel, then save the result in AArch64FunctionInfo.

efriedma: Ignore my reference to analyzeCallOperands(). The actual computation of the registers used happens in AArch64TargetLowering::LowerFormalArguments.

if (MF->getFunction().getCallingConv() == CallingConv::AnyReg)		if (MF->getFunction().getCallingConv() == CallingConv::AnyReg)
return CSR_AArch64_AllRegs_SaveList;		return CSR_AArch64_AllRegs_SaveList;

// Darwin has its own CSR_AArch64_AAPCS_SaveList, which means most CSR save		// Darwin has its own CSR_AArch64_AAPCS_SaveList, which means most CSR save
// lists depending on that will need to have their Darwin variant as well.		// lists depending on that will need to have their Darwin variant as well.
if (MF->getSubtarget<AArch64Subtarget>().isTargetDarwin())		if (MF->getSubtarget<AArch64Subtarget>().isTargetDarwin())
return getDarwinCalleeSavedRegs(MF);		return getDarwinCalleeSavedRegs(MF);

if (MF->getFunction().getCallingConv() == CallingConv::SwiftTail)		if (MF->getFunction().getCallingConv() == CallingConv::SwiftTail)
return CSR_AArch64_AAPCS_SwiftTail_SaveList;		return CSR_AArch64_AAPCS_SwiftTail_SaveList;
if (MF->getFunction().getCallingConv() == CallingConv::PreserveMost)		if (MF->getFunction().getCallingConv() == CallingConv::PreserveMost)
return CSR_AArch64_RT_MostRegs_SaveList;		return CSR_AArch64_RT_MostRegs_SaveList;
if (MF->getFunction().getCallingConv() == CallingConv::Win64)		if (MF->getFunction().getCallingConv() == CallingConv::Win64)
// This is for OSes other than Windows; Windows is a separate case further		// This is for OSes other than Windows; Windows is a separate case further
// above.		// above.
return CSR_AArch64_AAPCS_X18_SaveList;		return CSR_AArch64_AAPCS_X18_SaveList;
if (hasSVEArgsOrReturn(MF))		if (MF->getInfo<AArch64FunctionInfo>()->isSVECC())
return CSR_AArch64_SVE_AAPCS_SaveList;		return CSR_AArch64_SVE_AAPCS_SaveList;
return CSR_AArch64_AAPCS_SaveList;		return CSR_AArch64_AAPCS_SaveList;
}		}
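
The ordering concern raised in the review (every setIsSVECC() call should happen before the first isSVECC() query) can be illustrated with a small guard. This is only a sketch under stated assumptions: `ToyFunctionInfo` and `pickSaveList` are hypothetical stand-ins, not the LLVM classes, and the patch itself simply defaults `IsSVECC` to false rather than detecting early reads. The save-list names are returned as strings purely for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for AArch64FunctionInfo; not the LLVM class.
class ToyFunctionInfo {
  bool IsSVECC = false;
  bool IsSVECCKnown = false; // assumption: tracks whether lowering has run

public:
  void setIsSVECC(bool S) {
    IsSVECC = S;
    IsSVECCKnown = true;
  }

  // Throws if queried before argument lowering has recorded a value,
  // which makes an out-of-order query visible instead of silently false.
  bool isSVECC() const {
    if (!IsSVECCKnown)
      throw std::logic_error("isSVECC() queried before setIsSVECC()");
    return IsSVECC;
  }
};

// Mirrors the last two returns of getCalleeSavedRegs in the diff above,
// using the save-list names as plain strings.
std::string pickSaveList(const ToyFunctionInfo &FI) {
  return FI.isSVECC() ? "CSR_AArch64_SVE_AAPCS_SaveList"
                      : "CSR_AArch64_AAPCS_SaveList";
}
```

With a guard like this, a query from something like isEligibleForTailCallOptimization() that ran too early would fail loudly rather than pick the non-SVE save list by default.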

const MCPhysReg *		const MCPhysReg *
AArch64RegisterInfo::getDarwinCalleeSavedRegs(const MachineFunction *MF) const {		AArch64RegisterInfo::getDarwinCalleeSavedRegs(const MachineFunction *MF) const {
assert(MF && "Invalid MachineFunction pointer.");		assert(MF && "Invalid MachineFunction pointer.");
assert(MF->getSubtarget<AArch64Subtarget>().isTargetDarwin() &&		assert(MF->getSubtarget<AArch64Subtarget>().isTargetDarwin() &&

llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll

	entry:			entry:
	%ptr1.bc = bitcast double * %ptr1 to <vscale x 8 x double> *			%ptr1.bc = bitcast double * %ptr1 to <vscale x 8 x double> *
	store volatile <vscale x 8 x double> %x2, <vscale x 8 x double>* %ptr1.bc			store volatile <vscale x 8 x double> %x2, <vscale x 8 x double>* %ptr1.bc
	%ptr2.bc = bitcast double * %ptr2 to <vscale x 6 x double> *			%ptr2.bc = bitcast double * %ptr2 to <vscale x 6 x double> *
	store volatile <vscale x 6 x double> %x3, <vscale x 6 x double>* %ptr2.bc			store volatile <vscale x 6 x double> %x3, <vscale x 6 x double>* %ptr2.bc
	ret double %x0			ret double %x0
	}			}

				; Use AAVPCS, SVE register in z0-z7 used

				define void @aavpcs1(i32 %s0, i32 %s1, i32 %s2, i32 %s3, i32 %s4, i32 %s5, i32 %s6, <vscale x 4 x i32> %s7, <vscale x 4 x i32> %s8, <vscale x 4 x i32> %s9, <vscale x 4 x i32> %s10, <vscale x 4 x i32> %s11, <vscale x 4 x i32> %s12, <vscale x 4 x i32> %s13, <vscale x 4 x i32> %s14, <vscale x 4 x i32> %s15, <vscale x 4 x i32> %s16, i32 * %ptr) nounwind {
				; CHECK-LABEL: aavpcs1:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldp x8, x9, [sp]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z3.s }, p0/z, [x8]
				; CHECK-NEXT: ld1w { z24.s }, p0/z, [x7]
				; CHECK-NEXT: st1w { z0.s }, p0, [x9]
				; CHECK-NEXT: st1w { z1.s }, p0, [x9]
				; CHECK-NEXT: st1w { z2.s }, p0, [x9]
				; CHECK-NEXT: st1w { z4.s }, p0, [x9]
				; CHECK-NEXT: st1w { z5.s }, p0, [x9]
				; CHECK-NEXT: st1w { z6.s }, p0, [x9]
				; CHECK-NEXT: st1w { z7.s }, p0, [x9]
				; CHECK-NEXT: st1w { z24.s }, p0, [x9]
				; CHECK-NEXT: st1w { z3.s }, p0, [x9]
				; CHECK-NEXT: ret
				entry:
				%ptr1.bc = bitcast i32 * %ptr to <vscale x 4 x i32> *
				store volatile <vscale x 4 x i32> %s7, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s8, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s9, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s11, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s12, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s13, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s14, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s15, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s16, <vscale x 4 x i32>* %ptr1.bc
				ret void
				}

				; Use AAVPCS, SVE register in z0-z7 used

				define void @aavpcs2(float %s0, float %s1, float %s2, float %s3, float %s4, float %s5, float %s6, <vscale x 4 x float> %s7, <vscale x 4 x float> %s8, <vscale x 4 x float> %s9, <vscale x 4 x float> %s10, <vscale x 4 x float> %s11, <vscale x 4 x float> %s12,<vscale x 4 x float> %s13,<vscale x 4 x float> %s14,<vscale x 4 x float> %s15,<vscale x 4 x float> %s16,float * %ptr) nounwind {
				; CHECK-LABEL: aavpcs2:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldp x8, x9, [sp]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x7]
				; CHECK-NEXT: ld1w { z2.s }, p0/z, [x6]
				; CHECK-NEXT: ld1w { z3.s }, p0/z, [x5]
				; CHECK-NEXT: ld1w { z4.s }, p0/z, [x4]
				; CHECK-NEXT: ld1w { z5.s }, p0/z, [x3]
				; CHECK-NEXT: ld1w { z6.s }, p0/z, [x1]
				; CHECK-NEXT: ld1w { z24.s }, p0/z, [x0]
				; CHECK-NEXT: st1w { z7.s }, p0, [x9]
				; CHECK-NEXT: st1w { z24.s }, p0, [x9]
				; CHECK-NEXT: st1w { z6.s }, p0, [x9]
				; CHECK-NEXT: st1w { z5.s }, p0, [x9]
				; CHECK-NEXT: st1w { z4.s }, p0, [x9]
				; CHECK-NEXT: st1w { z3.s }, p0, [x9]
				; CHECK-NEXT: st1w { z2.s }, p0, [x9]
				; CHECK-NEXT: st1w { z1.s }, p0, [x9]
				; CHECK-NEXT: st1w { z0.s }, p0, [x9]
				; CHECK-NEXT: ret
				entry:
				%ptr1.bc = bitcast float * %ptr to <vscale x 4 x float> *
				store volatile <vscale x 4 x float> %s7, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s8, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s9, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s11, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s12, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s13, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s14, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s15, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s16, <vscale x 4 x float>* %ptr1.bc
				ret void
				}

				; Use AAVPCS, no SVE register in z0-z7 used (floats occupy z0-z7) but predicate arg is used

				define void @aavpcs3(float %s0, float %s1, float %s2, float %s3, float %s4, float %s5, float %s6, float %s7, <vscale x 4 x float> %s8, <vscale x 4 x float> %s9, <vscale x 4 x float> %s10, <vscale x 4 x float> %s11, <vscale x 4 x float> %s12, <vscale x 4 x float> %s13, <vscale x 4 x float> %s14, <vscale x 4 x float> %s15, <vscale x 4 x float> %s16, <vscale x 4 x float> %s17, <vscale x 16 x i1> %p0, float * %ptr) nounwind {
				; CHECK-LABEL: aavpcs3:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr x8, [sp]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x7]
				; CHECK-NEXT: ld1w { z2.s }, p0/z, [x6]
				; CHECK-NEXT: ld1w { z3.s }, p0/z, [x5]
				; CHECK-NEXT: ld1w { z4.s }, p0/z, [x4]
				; CHECK-NEXT: ld1w { z5.s }, p0/z, [x3]
				; CHECK-NEXT: ld1w { z6.s }, p0/z, [x2]
				; CHECK-NEXT: ld1w { z7.s }, p0/z, [x1]
				; CHECK-NEXT: ld1w { z24.s }, p0/z, [x0]
				; CHECK-NEXT: ldr x8, [sp, #16]
				; CHECK-NEXT: st1w { z24.s }, p0, [x8]
				; CHECK-NEXT: st1w { z7.s }, p0, [x8]
				; CHECK-NEXT: st1w { z6.s }, p0, [x8]
				; CHECK-NEXT: st1w { z5.s }, p0, [x8]
				; CHECK-NEXT: st1w { z4.s }, p0, [x8]
				; CHECK-NEXT: st1w { z3.s }, p0, [x8]
				; CHECK-NEXT: st1w { z2.s }, p0, [x8]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8]
				; CHECK-NEXT: st1w { z0.s }, p0, [x8]
				; CHECK-NEXT: ret
				entry:
				%ptr1.bc = bitcast float * %ptr to <vscale x 4 x float> *
				store volatile <vscale x 4 x float> %s8, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s9, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s10, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s11, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s12, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s13, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s14, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s15, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s16, <vscale x 4 x float>* %ptr1.bc
				ret void
				}

				; use AAVPCS, SVE register in z0-z7 used (i32s dont occupy z0-z7)

				define void @aavpcs4(i32 %s0, i32 %s1, i32 %s2, i32 %s3, i32 %s4, i32 %s5, i32 %s6, i32 %s7, <vscale x 4 x i32> %s8, <vscale x 4 x i32> %s9, <vscale x 4 x i32> %s10, <vscale x 4 x i32> %s11, <vscale x 4 x i32> %s12, <vscale x 4 x i32> %s13, <vscale x 4 x i32> %s14, <vscale x 4 x i32> %s15, <vscale x 4 x i32> %s16, <vscale x 4 x i32> %s17, i32 * %ptr) nounwind {
				; CHECK-LABEL: aavpcs4:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr x8, [sp]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ldr x9, [sp, #16]
				; CHECK-NEXT: ld1w { z24.s }, p0/z, [x8]
				; CHECK-NEXT: st1w { z0.s }, p0, [x9]
				; CHECK-NEXT: st1w { z1.s }, p0, [x9]
				; CHECK-NEXT: st1w { z2.s }, p0, [x9]
				; CHECK-NEXT: st1w { z3.s }, p0, [x9]
				; CHECK-NEXT: st1w { z4.s }, p0, [x9]
				; CHECK-NEXT: st1w { z5.s }, p0, [x9]
				; CHECK-NEXT: st1w { z6.s }, p0, [x9]
				; CHECK-NEXT: st1w { z7.s }, p0, [x9]
				; CHECK-NEXT: st1w { z24.s }, p0, [x9]
				; CHECK-NEXT: ret
				entry:
				%ptr1.bc = bitcast i32 * %ptr to <vscale x 4 x i32> *
				store volatile <vscale x 4 x i32> %s8, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s9, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s10, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s11, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s12, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s13, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s14, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s15, <vscale x 4 x i32>* %ptr1.bc
				store volatile <vscale x 4 x i32> %s16, <vscale x 4 x i32>* %ptr1.bc
				ret void
				}

				; Use AAPCS, no SVE register in z0-7 used (floats occupy z0-z7)

				define void @aapcs1(float %s0, float %s1, float %s2, float %s3, float %s4, float %s5, float %s6, float %s7, <vscale x 4 x float> %s8, <vscale x 4 x float> %s9, <vscale x 4 x float> %s10, <vscale x 4 x float> %s11, <vscale x 4 x float> %s12, <vscale x 4 x float> %s13, <vscale x 4 x float> %s14, <vscale x 4 x float> %s15, <vscale x 4 x float> %s16, <vscale x 4 x float> %s17, float * %ptr) nounwind {
				; CHECK-LABEL: aapcs1:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr x8, [sp]
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x7]
				; CHECK-NEXT: ld1w { z2.s }, p0/z, [x6]
				; CHECK-NEXT: ld1w { z3.s }, p0/z, [x5]
				; CHECK-NEXT: ld1w { z4.s }, p0/z, [x4]
				; CHECK-NEXT: ld1w { z5.s }, p0/z, [x3]
				; CHECK-NEXT: ld1w { z6.s }, p0/z, [x2]
				; CHECK-NEXT: ld1w { z7.s }, p0/z, [x1]
				; CHECK-NEXT: ld1w { z16.s }, p0/z, [x0]
				; CHECK-NEXT: ldr x8, [sp, #16]
				; CHECK-NEXT: st1w { z16.s }, p0, [x8]
				; CHECK-NEXT: st1w { z7.s }, p0, [x8]
				; CHECK-NEXT: st1w { z6.s }, p0, [x8]
				; CHECK-NEXT: st1w { z5.s }, p0, [x8]
				; CHECK-NEXT: st1w { z4.s }, p0, [x8]
				; CHECK-NEXT: st1w { z3.s }, p0, [x8]
				; CHECK-NEXT: st1w { z2.s }, p0, [x8]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8]
				; CHECK-NEXT: st1w { z0.s }, p0, [x8]
				; CHECK-NEXT: ret
				entry:
				%ptr1.bc = bitcast float * %ptr to <vscale x 4 x float> *
				store volatile <vscale x 4 x float> %s8, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s9, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s10, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s11, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s12, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s13, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s14, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s15, <vscale x 4 x float>* %ptr1.bc
				store volatile <vscale x 4 x float> %s16, <vscale x 4 x float>* %ptr1.bc
				ret void
				}
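
The expectations in the aavpcs/aapcs tests above follow from how arguments are assigned to registers. The toy allocator below is a rough sketch of that reasoning, not the LLVM implementation: `ArgKind`, `LocKind`, `allocateArgs`, `usesSVECallingConvention` and `signature` are all hypothetical names, and the model assumes single-register SVE vectors, eight GPRs, eight FP/SVE vector registers (v0-v7 aliasing z0-z7), four predicate registers, and pass-by-pointer for scalable values that miss a register. It ignores tuples, aggregates, varargs and return values.

```cpp
#include <cassert>
#include <vector>

// Toy model only; none of these types exist in LLVM.
enum class ArgKind { Int, Float, SVEVector, SVEPredicate };
enum class LocKind { GPR, FPR, ZReg, PReg, Stack, IndirectGPR, IndirectStack };

struct ArgLoc {
  ArgKind Kind;
  LocKind Loc;
  unsigned Index; // register number; 0 and unused for stack locations
};

std::vector<ArgLoc> allocateArgs(const std::vector<ArgKind> &Args) {
  unsigned NextGPR = 0, NextVec = 0, NextPred = 0;
  std::vector<ArgLoc> Locs;
  // Scalable values that miss a register are passed by pointer.
  auto passIndirect = [&](ArgKind K) {
    if (NextGPR < 8)
      Locs.push_back({K, LocKind::IndirectGPR, NextGPR++});
    else
      Locs.push_back({K, LocKind::IndirectStack, 0});
  };
  for (ArgKind K : Args) {
    switch (K) {
    case ArgKind::Int:
      if (NextGPR < 8)
        Locs.push_back({K, LocKind::GPR, NextGPR++});
      else
        Locs.push_back({K, LocKind::Stack, 0});
      break;
    case ArgKind::Float: // floats consume the same v/z register numbers
      if (NextVec < 8)
        Locs.push_back({K, LocKind::FPR, NextVec++});
      else
        Locs.push_back({K, LocKind::Stack, 0});
      break;
    case ArgKind::SVEVector:
      if (NextVec < 8)
        Locs.push_back({K, LocKind::ZReg, NextVec++});
      else
        passIndirect(K);
      break;
    case ArgKind::SVEPredicate:
      if (NextPred < 4)
        Locs.push_back({K, LocKind::PReg, NextPred++});
      else
        passIndirect(K);
      break;
    }
  }
  return Locs;
}

// The SVE convention applies only if a scalable value landed in a z or p register.
bool usesSVECallingConvention(const std::vector<ArgLoc> &Locs) {
  for (const ArgLoc &L : Locs)
    if (L.Loc == LocKind::ZReg || L.Loc == LocKind::PReg)
      return true;
  return false;
}

// Builds "N scalars, then vectors, then predicates, then a pointer",
// which is the shape of the test signatures above.
std::vector<ArgKind> signature(ArgKind Scalar, unsigned NumScalars,
                               unsigned NumVectors, unsigned NumPreds) {
  std::vector<ArgKind> Args(NumScalars, Scalar);
  Args.insert(Args.end(), NumVectors, ArgKind::SVEVector);
  Args.insert(Args.end(), NumPreds, ArgKind::SVEPredicate);
  Args.push_back(ArgKind::Int); // trailing pointer argument
  return Args;
}
```

In this model, `signature(ArgKind::Float, 8, 10, 0)` corresponds to `@aapcs1`: the eight floats use up v0-v7, every SVE vector is passed by pointer, and the function stays on the base convention. Dropping to seven floats (`@aavpcs2`) or appending a predicate (`@aavpcs3`) puts a scalable value in a z or p register and flips the result.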

	declare float @callee1(float, <vscale x 8 x double>, <vscale x 8 x double>, <vscale x 2 x double>)			declare float @callee1(float, <vscale x 8 x double>, <vscale x 8 x double>, <vscale x 2 x double>)
	declare float @callee2(i32, i32, i32, i32, i32, i32, i32, i32, float, <vscale x 8 x double>, <vscale x 8 x double>)			declare float @callee2(i32, i32, i32, i32, i32, i32, i32, i32, float, <vscale x 8 x double>, <vscale x 8 x double>)
	declare float @callee3(float, float, <vscale x 8 x double>, <vscale x 6 x double>, <vscale x 2 x double>)			declare float @callee3(float, float, <vscale x 8 x double>, <vscale x 6 x double>, <vscale x 2 x double>)

	declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 immarg)			declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 immarg)
	declare <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1>)			declare <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1>)
	declare <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64.nxv2i1(<vscale x 2 x i1>, double*)			declare <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64.nxv2i1(<vscale x 2 x i1>, double*)
	declare <vscale x 6 x double> @llvm.aarch64.sve.ld3.nxv6f64.nxv2i1(<vscale x 2 x i1>, double*)			declare <vscale x 6 x double> @llvm.aarch64.sve.ld3.nxv6f64.nxv2i1(<vscale x 2 x i1>, double*)
	declare <vscale x 2 x double> @llvm.aarch64.sve.ld1.nxv2f64(<vscale x 2 x i1>, double*)			declare <vscale x 2 x double> @llvm.aarch64.sve.ld1.nxv2f64(<vscale x 2 x i1>, double*)
	declare double @llvm.aarch64.sve.faddv.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>)			declare double @llvm.aarch64.sve.faddv.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>)
	declare <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv8f64(<vscale x 8 x double>, i32 immarg)			declare <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv8f64(<vscale x 8 x double>, i32 immarg)
	declare <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv6f64(<vscale x 6 x double>, i32 immarg)			declare <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv6f64(<vscale x 6 x double>, i32 immarg)