This is an archive of the discontinued LLVM Phabricator instance.

Differential D129727

[ARM64EC 11/?] Add support for lowering variadic indirect calls.
Needs Review · Public

Authored by bcl5980 on Jul 13 2022, 8:38 PM.

Details

Reviewers

efriedma
dpaoliello
DavidSpickett

Summary

Part of the initial Arm64EC patchset.
A variadic function's exit thunk differs from a normal function's. It needs to make a dynamic stack allocation at the bottom of the stack, then copy the original stack arguments into the new allocation.
When a varargs function returns its value in a register on AArch64 but requires an “sret” return on x64, we need to shuffle the argument registers and store x3 to the stack with 8-byte alignment.
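
As a rough illustration of the size computation described in the summary, the sketch below mirrors the patch's `15 + (RetStack ? 8 : 0)` addend and `-16` mask in plain C++. The helper name `thunkAllocSize` is hypothetical and does not appear in the patch; it only models the arithmetic, not the IR that the thunk builder emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): number of bytes the exit thunk
// would allocate for the copied stack arguments.
//   stackArgBytes: size of the caller's stack arguments (the value in x5).
//   retStack:      true when the x64 side returns via sret, so x3 must be
//                  spilled to an extra 8-byte slot in front of the copy.
constexpr std::uint64_t thunkAllocSize(std::uint64_t stackArgBytes,
                                       bool retStack) {
  const std::uint64_t extra = retStack ? 8 : 0;
  // Round up to a multiple of 16: add 15 and clear the low four bits.
  return (stackArgBytes + extra + 15) & ~std::uint64_t{15};
}

// Compile-time spot checks of the rounding behaviour.
static_assert(thunkAllocSize(24, false) == 32, "24 bytes round up to 32");
static_assert(thunkAllocSize(24, true) == 32, "24 + 8 is already 32");

// Runtime variant of the same checks, for use outside constant evaluation.
inline void checkThunkAllocSize() {
  assert(thunkAllocSize(0, false) == 0);
  assert(thunkAllocSize(9, true) == 32);
}
```

With `retStack` set, an argument area of 8 bytes still fits in a single 16-byte unit, while 9 bytes spills into a second one.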

Diff Detail

Event Timeline

bcl5980 created this revision. Jul 13 2022, 8:38 PM

Herald added a project: Restricted Project. Jul 13 2022, 8:38 PM

Herald added subscribers: zzheng, hiraditya, kristof.beyls.

bcl5980 requested review of this revision. Jul 13 2022, 8:38 PM

Herald added a project: Restricted Project. Jul 13 2022, 8:38 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B175298: Diff 444505. Jul 13 2022, 8:39 PM

bcl5980 added inline comments.Jul 13 2022, 9:26 PM

llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
363–365	It looks like Microsoft generates this code: `sub sp,sp,x15,lsl #4`. Does anyone know why we can't accept SP as input/output for the instruction? `def : Pat<(sub GPR64:$Rn, arith_shifted_reg64:$Rm), (SUBSXrs GPR64:$Rn, arith_shifted_reg64:$Rm)>;`

bcl5980 added a reviewer: DavidSpickett.Jul 13 2022, 11:48 PM

bcl5980 updated this revision to Diff 444546.Jul 14 2022, 12:56 AM

Harbormaster completed remote builds in B175321: Diff 444546.Jul 14 2022, 12:57 AM

fix build error

Harbormaster completed remote builds in B175322: Diff 444548.Jul 14 2022, 1:00 AM

efriedma added inline comments.Jul 14 2022, 10:04 AM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
483 ↗	(On Diff #444548)	For the whole "x3 is stored to the stack" thing, if we're going to continue to use an alloca, I'd probably recommend just increasing the size of the alloca by 8 bytes, then explicitly emit a store instruction in the IR. Messing with the alignment like this seems likely to cause issues.
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
363–365	The instruction used by the Microsoft compiler is SUBXrx64. SUBSXrs can't refer to sp that way; the "lsl" is actually an alternate spelling of "uxtx". We could add a specialized pattern specifically for subtraction operations where the first operand of the subtraction is a copy from sp.
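
To make the arithmetic of `sub sp, sp, x15, lsl #4` concrete, here is a small C++ model of the extended-register subtraction with a UXTX extend on a 64-bit register. It is only a sketch: the function name is hypothetical, and it assumes x15 holds the allocation size in 16-byte units, which the `lsl #4` suggests but this thread does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of SUB (extended register) with UXTX on a 64-bit source:
// result = rn - (rm << shift). The instruction only allows shift amounts 0..4.
inline std::uint64_t subExtendedUxtx(std::uint64_t rn, std::uint64_t rm,
                                     unsigned shift) {
  assert(shift <= 4 && "extended-register shift amount must be 0..4");
  return rn - (rm << shift);
}

// Models `sub sp, sp, x15, lsl #4`, assuming x15 counts 16-byte units.
inline std::uint64_t allocateUnits(std::uint64_t sp, std::uint64_t units) {
  return subExtendedUxtx(sp, units, 4);
}
```

For example, `allocateUnits(0x1000, 3)` lowers a stack pointer of 0x1000 by 48 bytes.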

bcl5980 added inline comments.Jul 14 2022, 11:47 PM

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
483 ↗	(On Diff #444548)	I'm trying to move this code into AArch64TargetLowering::LowerCall but still have some problems with the stack layout. I'm also trying to make the IR version correct. Thanks for the idea of allocating an extra 8 bytes.
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
363–365	"We could add a specialized pattern specifically for subtraction operations where the first operand of the subtraction is a copy from sp." Do you mean adding a pattern in the td file? `def : Pat<(sub GPR64sp:$SP, arith_extended_reg32to64_i64:$Rm), (SUBSXrx GPR64sp:$SP, arith_extended_reg32to64_i64:$Rm)>;` If yes, we may need to add a new select function similar to arith_extended_reg32to64_i64 but without the extend check. Or can we do this in DAGCombine? I think we can do this in another patch.

update based on efriedma's suggestion.

Harbormaster completed remote builds in B175580: Diff 444899.Jul 15 2022, 12:30 AM

bcl5980 marked an inline comment as not done.Jul 15 2022, 1:25 AM

bcl5980 added inline comments.

llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
489–490	This can also be a TODO. `sub x0, x29, #48` is rematerialized from the copy of x10, but `fmov d0, x10` can't be rematerialized, so `sub x10, x29, #48` remains. How could we improve reMaterializeTrivialDef to generate better code here?

bcl5980 marked an inline comment as not done.Jul 15 2022, 1:26 AM

efriedma added inline comments.Jul 15 2022, 11:26 AM

llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
182	Maybe make the contribution from RetStack a bit more clear.
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
363–365	I think you'd want a PatLeaf to specifically look for a CopyFromReg from sp, as opposed to changing the way we lower all subtraction operations, but yes, that's the idea. (You could, alternatively, introduce a target-specific node in AArch64ISelLowering.h specifically to represent "sub sp, sp, xN, lsl #4", use it from LowerDYNAMIC_STACKALLOC, and write a pattern to lower it to SUBXrx64.) And yes, please leave it for another patch.
489–490	Not sure how we're ending up with two separate operations in the first place; I'd normally expect SelectionDAG CSE to kick in.

bcl5980 added inline comments.Jul 16 2022, 5:41 AM

llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
489–490	It happens in the RegisterCoalescer pass, after DAG CSE and even after machine CSE. The machine IR is: `%15:gpr64sp = ADDXri %stack.0, 0, 0`; `$x0 = COPY %15:gpr64sp`; `$d0 = COPY %15:gpr64sp`. x0 can be rematerialized as `ADDXri %stack.0, 0, 0`, but d0 cannot.

bcl5980 updated this revision to Diff 445235.Jul 16 2022, 8:05 AM

Harbormaster completed remote builds in B175827: Diff 445235.Jul 16 2022, 8:06 AM

bcl5980 updated this revision to Diff 445236.Jul 16 2022, 8:11 AM

Harbormaster completed remote builds in B175828: Diff 445236.Jul 16 2022, 8:12 AM

When I try to move the memcpy into SelectionDAG (AArch64TargetLowering::LowerCall), there are two issues I can't fix:

The load of the address for __os_arm64x_dispatch_call_no_redirect comes after the memory copy in the IR version, but if we copy memory in LowerCall the load comes before the memory copy. That costs one more register.
The 32 bytes for the register store should be at the real bottom of the stack, but when I move the memory copy into LowerCall, the dynamic allocation is always at the real bottom of the stack.

@efriedma Do you have any idea how to fix them?

The load of the address for __os_arm64x_dispatch_call_no_redirect comes after the memory copy in the IR version, but if we copy memory in LowerCall the load comes before the memory copy. That costs one more register.

That's a consequence of doing the load in IR, I think. The load is before the memcpy in the initial SelectionDAG, and nothing tries to rearrange them. By default, scheduling happens in source order, and we don't reorder across calls after isel. I don't see any obvious fix; maybe the call lowering code could try to find the load and mess with its chains? But I wouldn't worry about it; if we actually care about the performance of varargs thunks, there are probably more significant improvements we could make, like trying to inline small memcpys.

The 32 bytes for the register store should be at the real bottom of the stack, but when I move the memory copy into LowerCall, the dynamic allocation is always at the real bottom of the stack.

Maybe AArch64FrameLowering::hasReservedCallFrame is returning the wrong thing? Normally, the stack allocation for call arguments is allocated in the prologue; the stack frame layout code needs to know if that's illegal because there's a dynamic/large/etc. allocation.

Alternatively, you could just make the extra 32 bytes part of the dynamic allocation, instead of trying to do it "properly".
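
One possible reading of this alternative, sketched in C++: treat the 32-byte register store, the optional 8-byte x3 slot, and the copied stack arguments as a single dynamic allocation and compute offsets within it. The struct, constant, and function names are hypothetical, and the ordering (register store at the bottom, then the x3 slot, then the copied arguments) is an assumption based on the discussion above rather than something the patch implements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of one combined dynamic allocation, measured from the
// new stack pointer (offset 0 is the real bottom of the stack).
struct ThunkFrame {
  std::uint64_t homeOffset;   // start of the 32-byte register store
  std::uint64_t spillOffset;  // x3 slot; meaningful only when hasSpill is true
  std::uint64_t argsOffset;   // destination of the copied stack arguments
  std::uint64_t totalBytes;   // whole allocation, rounded up to 16 bytes
  bool hasSpill;
};

constexpr std::uint64_t kHomeBytes = 32;  // register store size from the thread

inline ThunkFrame layoutThunkFrame(std::uint64_t stackArgBytes, bool retStack) {
  ThunkFrame f{};
  f.homeOffset = 0;
  f.hasSpill = retStack;
  f.spillOffset = kHomeBytes;
  // With an sret return, x3 takes the first 8-byte slot after the register
  // store and the copied arguments follow it.
  f.argsOffset = kHomeBytes + (retStack ? 8 : 0);
  f.totalBytes = (f.argsOffset + stackArgBytes + 15) & ~std::uint64_t{15};
  return f;
}
```

The trade-off in this reading is that the register store is no longer handled by the normal call-frame logic, so the frame lowering code would not need to know about it.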

SelectionDag version.

Harbormaster completed remote builds in B176461: Diff 446092.Jul 20 2022, 2:50 AM

bcl5980 updated this revision to Diff 446094.Jul 20 2022, 2:53 AM

Harbormaster completed remote builds in B176462: Diff 446094.Jul 20 2022, 2:54 AM

Diff 6 445236 is the Final IR version
Diff 8 446094 is SelectionDag version

bcl5980 updated this revision to Diff 446396.Jul 21 2022, 2:17 AM

Harbormaster completed remote builds in B176688: Diff 446396.Jul 21 2022, 2:18 AM

fix struct return with larger than 16bytes crash

Harbormaster completed remote builds in B180873: Diff 452119.Aug 12 2022, 2:48 AM

Rebase

Harbormaster completed remote builds in B184759: Diff 457502.Sep 1 2022, 10:24 PM

bcl5980 updated this revision to Diff 457503.Sep 1 2022, 10:28 PM

Harbormaster completed remote builds in B184760: Diff 457503.Sep 1 2022, 10:29 PM

bcl5980 added a parent revision: D126811: [ARM64EC 10/?] Add support for lowering indirect calls..Sep 1 2022, 10:29 PM

bcl5980 added a child revision: D132926: [ARM64EC 12/?] Add param/ret attr for struct to help generate correct thunk for arm64ec.Sep 1 2022, 10:48 PM

fix crash in non-opaque pointer mode

Harbormaster completed remote builds in B185168: Diff 458083.Sep 5 2022, 7:59 PM

efriedma added inline comments.Sep 13 2022, 4:31 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1438	Is this actually necessary? The linker should resolve plain "memcpy" to something reasonable, I think.
6596	In theory, you call CreateVariableSizedObject so that you can use the returned FrameIndex to refer to the object. If you don't actually use the FrameIndex for anything, the call looks sort of silly, sure.

bcl5980 added inline comments.Sep 26 2022, 2:57 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1438	Based on my local test, if I don't add `#`, memcpy is linked against the x86 version of memcpy and crashes at runtime.

efriedma added inline comments.Sep 26 2022, 12:34 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1438	Does the linking process for arm64ec actually guarantee that we have an arm64ec msvcrt that includes "#memcpy" etc.? Even if it does, I don't think we can make the same assumption for all the other functions SelectionDAG needs to call. Given that, we're going to need some mechanism to allow calls generated by SelectionDAG to participate in thunking. In any case, if this change unblocks testing for you, we can leave it in with a FIXME to address the above.

bcl5980 added inline comments.Sep 26 2022, 7:19 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1438	I used dumpbin to dump the symbols of vcruntime.lib and found these symbols, which may be related: 40 #memchr, 41 #memcmp, 42 #memcpy, 43 #memmove, 44 #memset. I think we only need to consider the memory intrinsic functions that MSVC can also export. We can also add exit thunks for these intrinsics in AArch64TargetLowering::LowerCall.

efriedma added inline comments.Sep 27 2022, 10:01 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1438	The full set of stuff SelectionDAG can generate is basically the stuff from RuntimeLibcalls.def, plus a few target-specific bits. (Off the top of my head, not sure if there are any target-specific calls on arm64 windows besides "`__chkstk`" and "`__security_check_cookie`".) If we expect that arm64ec code normally links against an arm64ec C runtime, I guess most of the routines we'd want should be available in "#"-prefixed versions, but I'm not sure about all of them...
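
The naming question in this thread can be sketched as a small lookup: prefix a libcall name with "#" only when a "#"-prefixed variant is known to exist, and leave every other name untouched. The function below is hypothetical and is not an LLVM API; the set of names is taken from the dumpbin listing quoted above, and whether other runtime routines have "#" variants is exactly the open question raised here.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper (not an LLVM API): choose the symbol name to call from
// Arm64EC code for a given runtime routine.
inline std::string arm64ecLibcallName(const std::string &name) {
  // Names seen with a "#" prefix in the vcruntime.lib dump quoted above.
  static const std::set<std::string> kHasPrefixedVariant = {
      "memchr", "memcmp", "memcpy", "memmove", "memset"};
  if (kHasPrefixedVariant.count(name) != 0)
    return "#" + name;
  // Unknown routines keep their plain name; they would still need some other
  // mechanism to participate in thunking.
  return name;
}
```

Under these assumptions `arm64ecLibcallName("memcpy")` yields `#memcpy`, while a name such as `__chkstk` is returned unchanged.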

address comments

Harbormaster completed remote builds in B191083: Diff 466268.Oct 8 2022, 1:15 AM

bcl5980 marked 5 inline comments as done.Oct 8 2022, 1:16 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp	92 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	31 lines
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll	248 lines

Diff 445235

llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp

[... 53 lines not shown ...]
 private:
   Constant *GuardFnCFGlobal = nullptr;
   Constant *GuardFnGlobal = nullptr;
   Module *M = nullptr;
 };

 } // end anonymous namespace

 Function *AArch64Arm64ECCallLowering::buildExitThunk(CallBase *CB) {
-  Type *RetTy = CB->getFunctionType()->getReturnType();
+  auto &DL = M->getDataLayout();
+  FunctionType *FT = CB->getFunctionType();
+  Type *RetTy = FT->getReturnType();
+  bool IsVarArg = FT->isVarArg();
+  Type *I8PtrTy = Type::getInt8PtrTy(M->getContext());
+  Type *I64Ty = Type::getInt64Ty(M->getContext());
   SmallVector<Type *> DefArgTypes;
   // The first argument to a thunk is the called function, stored in x9.
   // (Normally, we won't explicitly refer to this in the assembly; it just
   // gets passed on by the call.)
-  DefArgTypes.push_back(Type::getInt8PtrTy(M->getContext()));
+  DefArgTypes.push_back(I8PtrTy);
+  if (IsVarArg) {
+    // We treat the variadic function's exit thunk as a normal function
+    // with type:
+    // rettype exitthunk(
+    // i8* x9, i64 x0, i64 x1, i64 x2, i64 x3, i8* x4, i64 x5)
+    // that can coverage all types of variadic function.
+    // x9 is similar to normal exit thunk, store the called function.
+    // x0-x3 is the arguments be stored in registers.
+    // x4 is the address of the arguments on the stack.
+    // x5 is the size of the arguments on the stack.
+    for (int i = 0; i < 4; i++)
+      DefArgTypes.push_back(I64Ty);
+    DefArgTypes.push_back(I8PtrTy);
+    DefArgTypes.push_back(I64Ty);
+  } else {
     for (unsigned i = 0; i < CB->arg_size(); ++i) {
       DefArgTypes.push_back(CB->getArgOperand(i)->getType());
     }
+  }
   FunctionType *Ty = FunctionType::get(RetTy, DefArgTypes, false);
   Function *F =
-      Function::Create(Ty, GlobalValue::InternalLinkage, 0, "thunk", M);
+      Function::Create(Ty, GlobalValue::InternalLinkage, 0, "exit_thunk", M);
   F->setCallingConv(CallingConv::ARM64EC_Thunk_Native);
   // Copy MSVC, and always set up a frame pointer. (Maybe this isn't necessary.)
   F->addFnAttr("frame-pointer", "all");
   // Only copy sret from the first argument. For C++ instance methods, clang can
   // stick an sret marking on a later argument, but it doesn't actually affect
   // the ABI, so we can omit it. This avoids triggering a verifier assertion.
   if (CB->arg_size() > 0) {
     auto Attr = CB->getParamAttr(0, Attribute::StructRet);
     if (Attr.isValid())
       F->addParamAttr(1, Attr);
   }
   // FIXME: Copy anything other than sret? Shouldn't be necessary for normal
   // C ABI, but might show up in other cases.
   BasicBlock *BB = BasicBlock::Create(M->getContext(), "", F);
   IRBuilder<> IRB(BB);
-  PointerType *DispatchPtrTy = FunctionType::get(IRB.getVoidTy(), false)->getPointerTo(0);
-  Value *CalleePtr = M->getOrInsertGlobal(
-      "__os_arm64x_dispatch_call_no_redirect", DispatchPtrTy);
-  Value *Callee = IRB.CreateLoad(DispatchPtrTy, CalleePtr);
-  auto &DL = M->getDataLayout();
   SmallVector<Value *> Args;
   SmallVector<Type *> ArgTypes;
   // Pass the called function in x9.
   Args.push_back(F->arg_begin());
   ArgTypes.push_back(Args.back()->getType());
+  bool RetStack = false;
   Type *X64RetType = RetTy;
   if (RetTy->isArrayTy() || RetTy->isStructTy()) {
     // If the return type is an array or struct, translate it. Values of size
     // 8 or less go into RAX; bigger values go into memory, and we pass a
     // pointer.
     if (DL.getTypeStoreSize(RetTy) > 8) {
       Args.push_back(IRB.CreateAlloca(RetTy));
       ArgTypes.push_back(Args.back()->getType());
       X64RetType = IRB.getVoidTy();
+      RetStack = true;
     } else {
       X64RetType = IRB.getIntNTy(DL.getTypeStoreSizeInBits(RetTy));
     }
   }
-  for (auto &Arg : make_range(F->arg_begin() + 1, F->arg_end())) {
+  // The called function is variadic function, we can't pass x4(stack pointer)
+  // and x5(stack size) into the function type. There are part of calling conv.
+  auto ArgRange =
+      make_range(F->arg_begin() + 1,
+                 IsVarArg ? F->arg_end() - (RetStack ? 3 : 2) : F->arg_end());
+  for (auto &Arg : ArgRange) {
     // Translate arguments from AArch64 calling convention to x86 calling
     // convention.
     // For simple types, we don't need to do any translation: they're
     // represented the same way. (Implicit sign extension is not part of
     // either convention.)
     // The big thing we have to worry about is struct types... but
[... 9 lines not shown ...]
     if (Arg.getType()->isArrayTy() || Arg.getType()->isStructTy()) {
       if (DL.getTypeStoreSize(Arg.getType()) <= 8)
         Args.push_back(IRB.CreateLoad(
             IRB.getIntNTy(DL.getTypeStoreSizeInBits(Arg.getType())), Mem));
       else
         Args.push_back(Mem);
     } else {
       Args.push_back(&Arg);
     }
+    if (!IsVarArg)
       ArgTypes.push_back(Args.back()->getType());
   }
+  if (IsVarArg) {
+    // Memory addresss of the arguments on the stack.
+    Value *Src = F->arg_begin() + 5;
+    // Size of the arguments on the stack.
+    Value *SrcLength = F->arg_begin() + 6;
+    // Align the size to 16btyes.
+    // It looks Microsoft not only align the size to 16bytes,
+    // but also align (-1,-15) to -16. We don't know why so for
+    // now we don't add this part.
+    Constant *AddC = ConstantInt::get(I64Ty, 15 + (RetStack ? 8 : 0));
[Inline comment on the line above (efriedma, marked Done): Maybe make the contribution from RetStack a bit more clear.
 Change made in response:
 - Constant *AddC = ConstantInt::get(I64Ty, RetStack ? 31 : 15);
 + Constant *AddC = ConstantInt::get(I64Ty, 15 + (RetStack ? 8 : 0));]
+    Constant *NegC = ConstantInt::get(I64Ty, -16ll);
+    Value *Add = IRB.CreateAdd(SrcLength, AddC);
+    Value *Length = IRB.CreateAnd(Add, NegC);
+    // FIXME: the allocation should be on the stack's bottom.
+    // For now the code here should assume we have no other dynamic
+    // allocation after the alloca inst. The assumption is fragile
+    // so we need to use another way to allocate it to make sure it
+    // is on the stack's bottom.
+    Type *I8Ty = Type::getInt8Ty(M->getContext());
+    AllocaInst *AI = IRB.CreateAlloca(I8Ty, Length);
+    AI->setAlignment(DL.getPrefTypeAlign(I64Ty));
+    Value *Dst = AI;
+    if (RetStack) {
+      IRB.CreateStore(F->arg_begin() + 4, AI);
+      Dst = IRB.CreateGEP(I8Ty, AI, ConstantInt::get(I64Ty, 8));
+    }
+    IRB.CreateMemCpy(Dst, Dst->getPointerAlignment(DL), Src,
+                     Src->getPointerAlignment(DL), SrcLength);
+  }
   // FIXME: Transfer necessary attributes? sret? anything else?
   // FIXME: Try to share thunks. This probably involves simplifying the
   // argument types (translating all integers/pointers to i64, etc.)
-  auto *CallTy = FunctionType::get(X64RetType, ArgTypes, false);
+  auto *CallTy = FunctionType::get(X64RetType, ArgTypes, IsVarArg);
+  PointerType *DispatchPtrTy =
+      FunctionType::get(IRB.getVoidTy(), false)->getPointerTo(0);
+  Value *CalleePtr = M->getOrInsertGlobal(
+      "__os_arm64x_dispatch_call_no_redirect", DispatchPtrTy);
+  Value *Callee = IRB.CreateLoad(DispatchPtrTy, CalleePtr);
   Callee = IRB.CreateBitCast(Callee, CallTy->getPointerTo(0));
   CallInst *Call = IRB.CreateCall(CallTy, Callee, Args);
   Call->setCallingConv(CallingConv::ARM64EC_Thunk_X64);
   Value *RetVal = Call;
   if (RetTy->isArrayTy() || RetTy->isStructTy()) {
     // If we rewrote the return type earlier, convert the return value to
     // the proper type.
[... 125 lines not shown ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 1,424 lines not shown; hunk context: #undef LCALLNAME5 ...]
   if (Subtarget->hasMOPS() && Subtarget->hasMTE()) {
     // Only required for llvm.aarch64.mops.memset.tag
     setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i8, Custom);
   }

   PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();

   IsStrictFPEnabled = true;
+
+  if (Subtarget->isWindowsArm64EC()) {
+    // FIXME: are there other intrinsics we need to add here?
+    setLibcallName(RTLIB::MEMCPY, "#memcpy");
+    setLibcallName(RTLIB::MEMSET, "#memset");
+    setLibcallName(RTLIB::MEMMOVE, "#memmove");
[Inline comment thread on line 1438 (efriedma, bcl5980): quoted in full in the event timeline above.]
+  }
 }

 void AArch64TargetLowering::addTypeForNEON(MVT VT) {
   assert(VT.isVector() && "VT should be a vector type");

   if (VT.isFloatingPoint()) {
     MVT PromoteTo = EVT(VT).changeVectorElementTypeToInteger().getSimpleVT();
     setOperationPromotedToType(ISD::LOAD, VT, PromoteTo);
[... 5,107 lines not shown; hunk context: AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI, ...]
   SmallVector<ISD::OutputArg, 32> &Outs = CLI.Outs;
   SmallVector<SDValue, 32> &OutVals = CLI.OutVals;
   SmallVector<ISD::InputArg, 32> &Ins = CLI.Ins;
   SDValue Chain = CLI.Chain;
   SDValue Callee = CLI.Callee;
   bool &IsTailCall = CLI.IsTailCall;
   CallingConv::ID &CallConv = CLI.CallConv;
   bool IsVarArg = CLI.IsVarArg;
+  bool IsArm64EcThunk = CallConv == CallingConv::ARM64EC_Thunk_X64;

   MachineFunction &MF = DAG.getMachineFunction();
   MachineFunction::CallSiteInfo CSInfo;
   bool IsThisReturn = false;

   AArch64FunctionInfo *FuncInfo = MF.getInfo<AArch64FunctionInfo>();
   bool TailCallOpt = MF.getTarget().Options.GuaranteedTailCallOpt;
   bool IsSibCall = false;
[... 16 lines not shown; hunk context: if (CallConv == CallingConv::C || CallConv == CallingConv::Fast) { ...]

     if (CalleeInSVE || CalleeOutSVE)
       CallConv = CallingConv::AArch64_SVE_VectorCall;
   }

   if (IsTailCall) {
     // Check if it's really possible to do a tail call.
     IsTailCall = isEligibleForTailCallOptimization(CLI);

[Inline comment on line 6596 (efriedma, marked Done): quoted in full in the event timeline above.]
     // A sibling call is one where we're under the usual C ABI and not planning
     // to change that but can still do a tail call:
     if (!TailCallOpt && IsTailCall && CallConv != CallingConv::Tail &&
         CallConv != CallingConv::SwiftTail)
       IsSibCall = true;

     if (IsTailCall)
       ++NumTailCalls;
[... 224 lines not shown; hunk context: if (VA.isRegLoc()) { ...]
         });
       } else {
         RegsToPass.emplace_back(VA.getLocReg(), Arg);
         RegsUsed.insert(VA.getLocReg());
         const TargetOptions &Options = DAG.getTarget().Options;
         if (Options.EmitCallSiteInfo)
           CSInfo.emplace_back(VA.getLocReg(), i);
       }
+
+      if (IsVarArg && IsArm64EcThunk) {
+        // Float parameters are passed in both int and float register
+        Register ShadowReg;
+        switch (VA.getLocReg()) {
+        case AArch64::X0:
+          ShadowReg = AArch64::D0;
+          break;
+        case AArch64::X1:
+          ShadowReg = AArch64::D1;
+          break;
+        case AArch64::X2:
+          ShadowReg = AArch64::D2;
+          break;
+        case AArch64::X3:
+          ShadowReg = AArch64::D3;
+          break;
+        }
+        if (ShadowReg)
+          RegsToPass.push_back(std::make_pair(ShadowReg, Arg));
+      }
     } else {
       assert(VA.isMemLoc());

       SDValue DstAddr;
       MachinePointerInfo DstInfo;

       // FIXME: This works on big-endian for composite byvals, which are the
       // common case. It should also work for fundamental types too.
[... 53 lines not shown; hunk context: if (VA.isRegLoc()) { ...]
         Arg = DAG.getNode(ISD::TRUNCATE, DL, VA.getValVT(), Arg);

       SDValue Store = DAG.getStore(Chain, DL, Arg, DstAddr, DstInfo);
       MemOpChains.push_back(Store);
     }
   }
   }

-  if (IsVarArg && Subtarget->isWindowsArm64EC()) {
+  if (IsVarArg && Subtarget->isWindowsArm64EC() && !IsArm64EcThunk) {
     // For vararg calls, the Arm64EC ABI requires values in x4 and x5
     // describing the argument list. x4 contains the address of the
     // first stack parameter. x5 contains the size in bytes of all parameters
     // passed on the stack.
     RegsToPass.emplace_back(AArch64::X4, StackPtr);
     RegsToPass.emplace_back(AArch64::X5,
                             DAG.getConstant(NumBytes, DL, MVT::i64));
   }
[... 14,902 lines not shown ...]

llvm/test/CodeGen/AArch64/arm64ec-cfg.ll

[... 44 lines not shown ...]

 ; CHECK-LABEL: f:
 ; CHECK: .seh_proc f
 ; CHECK-NEXT: // %bb.0: // %entry
 ; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endprologue
 ; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
-; CHECK-NEXT: adrp x10, thunk
-; CHECK-NEXT: add x10, x10, :lo12:thunk
+; CHECK-NEXT: adrp x10, exit_thunk
+; CHECK-NEXT: add x10, x10, :lo12:exit_thunk
 ; CHECK-NEXT: mov x11, x0
 ; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
 ; CHECK-NEXT: blr x8
 ; CHECK-NEXT: blr x11
 ; CHECK-NEXT: .seh_startepilogue
 ; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endepilogue
 ; CHECK-NEXT: ret
 ; CHECK-NEXT: .seh_endfunclet
 ; CHECK-NEXT: .seh_endproc
 ;
 ; CHECK-LABEL: f2:
 ; CHECK: .seh_proc f2
 ; CHECK-NEXT: // %bb.0: // %entry
 ; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endprologue
 ; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
-; CHECK-NEXT: adrp x10, thunk.1
-; CHECK-NEXT: add x10, x10, :lo12:thunk.1
+; CHECK-NEXT: adrp x10, exit_thunk.1
+; CHECK-NEXT: add x10, x10, :lo12:exit_thunk.1
 ; CHECK-NEXT: mov x11, x0
 ; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
 ; CHECK-NEXT: blr x8
 ; CHECK-NEXT: mov w0, #1
 ; CHECK-NEXT: mov w1, #2
 ; CHECK-NEXT: mov w2, #3
 ; CHECK-NEXT: mov w3, #4
 ; CHECK-NEXT: mov w4, #5
 ; CHECK-NEXT: blr x11
 ; CHECK-NEXT: .seh_startepilogue
 ; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endepilogue
 ; CHECK-NEXT: ret
 ; CHECK-NEXT: .seh_endfunclet
 ; CHECK-NEXT: .seh_endproc
 ;
 ; CHECK-LABEL: f3:
 ; CHECK: .seh_proc f3
 ; CHECK-NEXT: // %bb.0: // %entry
 ; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endprologue
 ; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
-; CHECK-NEXT: adrp x10, thunk.2
-; CHECK-NEXT: add x10, x10, :lo12:thunk.2
+; CHECK-NEXT: adrp x10, exit_thunk.2
+; CHECK-NEXT: add x10, x10, :lo12:exit_thunk.2
 ; CHECK-NEXT: mov x11, x0
 ; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
 ; CHECK-NEXT: blr x8
 ; CHECK-NEXT: movi d0, #0000000000000000
 ; CHECK-NEXT: movi d1, #0000000000000000
 ; CHECK-NEXT: movi d2, #0000000000000000
 ; CHECK-NEXT: movi d3, #0000000000000000
 ; CHECK-NEXT: blr x11
 ; CHECK-NEXT: .seh_startepilogue
 ; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endepilogue
 ; CHECK-NEXT: ret
 ; CHECK-NEXT: .seh_endfunclet
 ; CHECK-NEXT: .seh_endproc
 ;
 ; CHECK-LABEL: f4:
 ; CHECK: .seh_proc f4
 ; CHECK-NEXT: // %bb.0: // %entry
 ; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT: .seh_save_reg_x x30, 16
 ; CHECK-NEXT: .seh_endprologue
 ; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
-; CHECK-NEXT: adrp x10, thunk.3
-; CHECK-NEXT: add x10, x10, :lo12:thunk.3
+; CHECK-NEXT: adrp x10, exit_thunk.3
+; CHECK-NEXT: add x10, x10, :lo12:exit_thunk.3
 ; CHECK-NEXT: mov x11, x0
 ; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
 ; CHECK-NEXT: blr x8
 ; CHECK-NEXT: movi d0, #0000000000000000
 ; CHECK-NEXT: movi d1, #0000000000000000
	; CHECK-NEXT: movi d2, #0000000000000000			; CHECK-NEXT: movi d2, #0000000000000000
	; CHECK-NEXT: movi d3, #0000000000000000			; CHECK-NEXT: movi d3, #0000000000000000
	; CHECK-NEXT: blr x11			; CHECK-NEXT: blr x11
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: .seh_save_reg_x x30, 16			; CHECK-NEXT: .seh_save_reg_x x30, 16
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: fvar:			; CHECK-LABEL: fvar:
	; CHECK: .seh_proc fvar			; CHECK: .seh_proc fvar
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: .seh_save_reg_x x30, 16			; CHECK-NEXT: .seh_save_reg_x x30, 16
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_check_icall			; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
	; CHECK-NEXT: adrp x10, thunk.4			; CHECK-NEXT: adrp x10, exit_thunk.4
	; CHECK-NEXT: add x10, x10, :lo12:thunk.4			; CHECK-NEXT: add x10, x10, :lo12:exit_thunk.4
	; CHECK-NEXT: mov x11, x0			; CHECK-NEXT: mov x11, x0
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: mov w0, #4			; CHECK-NEXT: mov w0, #4
	; CHECK-NEXT: mov w1, #5			; CHECK-NEXT: mov w1, #5
	; CHECK-NEXT: mov w2, #6			; CHECK-NEXT: mov w2, #6
	; CHECK-NEXT: mov w3, #8			; CHECK-NEXT: mov w3, #8
	; CHECK-NEXT: mov x4, sp			; CHECK-NEXT: mov x4, sp
	[11 unchanged lines collapsed in the original view]
	; CHECK: .seh_proc fvar2			; CHECK: .seh_proc fvar2
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: sub sp, sp, #32			; CHECK-NEXT: sub sp, sp, #32
	; CHECK-NEXT: .seh_stackalloc 32			; CHECK-NEXT: .seh_stackalloc 32
	; CHECK-NEXT: str x30, [sp, #16] // 8-byte Folded Spill			; CHECK-NEXT: str x30, [sp, #16] // 8-byte Folded Spill
	; CHECK-NEXT: .seh_save_reg x30, 16			; CHECK-NEXT: .seh_save_reg x30, 16
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_check_icall			; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
	; CHECK-NEXT: adrp x10, thunk.5			; CHECK-NEXT: adrp x10, exit_thunk.5
	; CHECK-NEXT: add x10, x10, :lo12:thunk.5			; CHECK-NEXT: add x10, x10, :lo12:exit_thunk.5
	; CHECK-NEXT: mov x11, x0			; CHECK-NEXT: mov x11, x0
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: mov x4, sp			; CHECK-NEXT: mov x4, sp
	; CHECK-NEXT: mov w8, #9			; CHECK-NEXT: mov w8, #9
	; CHECK-NEXT: mov w9, #7			; CHECK-NEXT: mov w9, #7
	; CHECK-NEXT: mov w0, #4			; CHECK-NEXT: mov w0, #4
	; CHECK-NEXT: mov w1, #5			; CHECK-NEXT: mov w1, #5
	[17 unchanged lines collapsed in the original view]
	; CHECK: .seh_proc fvar3			; CHECK: .seh_proc fvar3
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: sub sp, sp, #32			; CHECK-NEXT: sub sp, sp, #32
	; CHECK-NEXT: .seh_stackalloc 32			; CHECK-NEXT: .seh_stackalloc 32
	; CHECK-NEXT: str x30, [sp, #16] // 8-byte Folded Spill			; CHECK-NEXT: str x30, [sp, #16] // 8-byte Folded Spill
	; CHECK-NEXT: .seh_save_reg x30, 16			; CHECK-NEXT: .seh_save_reg x30, 16
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_check_icall			; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
	; CHECK-NEXT: adrp x10, thunk.6			; CHECK-NEXT: adrp x10, exit_thunk.6
	; CHECK-NEXT: add x10, x10, :lo12:thunk.6			; CHECK-NEXT: add x10, x10, :lo12:exit_thunk.6
	; CHECK-NEXT: mov x11, x0			; CHECK-NEXT: mov x11, x0
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: mov x4, sp			; CHECK-NEXT: mov x4, sp
	; CHECK-NEXT: mov w8, #9			; CHECK-NEXT: mov w8, #9
	; CHECK-NEXT: mov w9, #7			; CHECK-NEXT: mov w9, #7
	; CHECK-NEXT: mov w0, #4			; CHECK-NEXT: mov w0, #4
	; CHECK-NEXT: mov w1, #5			; CHECK-NEXT: mov w1, #5
	; CHECK-NEXT: mov w2, #6			; CHECK-NEXT: mov w2, #6
	; CHECK-NEXT: mov w3, #8			; CHECK-NEXT: mov w3, #8
	; CHECK-NEXT: mov w5, #16			; CHECK-NEXT: mov w5, #16
	; CHECK-NEXT: str w8, [sp, #8]			; CHECK-NEXT: str w8, [sp, #8]
	; CHECK-NEXT: str w9, [sp]			; CHECK-NEXT: str w9, [sp]
	; CHECK-NEXT: blr x11			; CHECK-NEXT: blr x11
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldr x30, [sp, #16] // 8-byte Folded Reload			; CHECK-NEXT: ldr x30, [sp, #16] // 8-byte Folded Reload
	; CHECK-NEXT: .seh_save_reg x30, 16			; CHECK-NEXT: .seh_save_reg x30, 16
	; CHECK-NEXT: add sp, sp, #32			; CHECK-NEXT: add sp, sp, #32
	; CHECK-NEXT: .seh_stackalloc 32			; CHECK-NEXT: .seh_stackalloc 32
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk:			; CHECK-LABEL: exit_thunk:
	; CHECK: .seh_proc thunk			; CHECK: .seh_proc exit_thunk
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #48			; CHECK-NEXT: sub sp, sp, #48
	; CHECK-NEXT: .seh_stackalloc 48			; CHECK-NEXT: .seh_stackalloc 48
	; CHECK-NEXT: stp x29, x30, [sp, #32] // 16-byte Folded Spill			; CHECK-NEXT: stp x29, x30, [sp, #32] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 32			; CHECK-NEXT: .seh_save_fplr 32
	; CHECK-NEXT: add x29, sp, #32			; CHECK-NEXT: add x29, sp, #32
	; CHECK-NEXT: .seh_add_fp 32			; CHECK-NEXT: .seh_add_fp 32
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #32] // 16-byte Folded Reload			; CHECK-NEXT: ldp x29, x30, [sp, #32] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_save_fplr 32			; CHECK-NEXT: .seh_save_fplr 32
	; CHECK-NEXT: add sp, sp, #48			; CHECK-NEXT: add sp, sp, #48
	; CHECK-NEXT: .seh_stackalloc 48			; CHECK-NEXT: .seh_stackalloc 48
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk.1:			; CHECK-LABEL: exit_thunk.1:
	; CHECK: .seh_proc thunk.1			; CHECK: .seh_proc exit_thunk.1
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #64			; CHECK-NEXT: sub sp, sp, #64
	; CHECK-NEXT: .seh_stackalloc 64			; CHECK-NEXT: .seh_stackalloc 64
	; CHECK-NEXT: stp x29, x30, [sp, #48] // 16-byte Folded Spill			; CHECK-NEXT: stp x29, x30, [sp, #48] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 48			; CHECK-NEXT: .seh_save_fplr 48
	; CHECK-NEXT: add x29, sp, #48			; CHECK-NEXT: add x29, sp, #48
	; CHECK-NEXT: .seh_add_fp 48			; CHECK-NEXT: .seh_add_fp 48
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
	; CHECK-NEXT: str w4, [sp, #32]			; CHECK-NEXT: str w4, [sp, #32]
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #48] // 16-byte Folded Reload			; CHECK-NEXT: ldp x29, x30, [sp, #48] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_save_fplr 48			; CHECK-NEXT: .seh_save_fplr 48
	; CHECK-NEXT: add sp, sp, #64			; CHECK-NEXT: add sp, sp, #64
	; CHECK-NEXT: .seh_stackalloc 64			; CHECK-NEXT: .seh_stackalloc 64
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk.2:			; CHECK-LABEL: exit_thunk.2:
	; CHECK: .seh_proc thunk.2			; CHECK: .seh_proc exit_thunk.2
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #64			; CHECK-NEXT: sub sp, sp, #64
	; CHECK-NEXT: .seh_stackalloc 64			; CHECK-NEXT: .seh_stackalloc 64
	; CHECK-NEXT: stp x29, x30, [sp, #48] // 16-byte Folded Spill			; CHECK-NEXT: stp x29, x30, [sp, #48] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 48			; CHECK-NEXT: .seh_save_fplr 48
	; CHECK-NEXT: add x29, sp, #48			; CHECK-NEXT: add x29, sp, #48
	; CHECK-NEXT: .seh_add_fp 48			; CHECK-NEXT: .seh_add_fp 48
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
	; CHECK-NEXT: sub x0, x29, #16			; CHECK-NEXT: sub x0, x29, #16
	; CHECK-NEXT: stp s1, s2, [x29, #-12]			; CHECK-NEXT: stp s2, s3, [x29, #-8]
	; CHECK-NEXT: stur s0, [x29, #-16]			; CHECK-NEXT: stp s0, s1, [x29, #-16]
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
	; CHECK-NEXT: stur s3, [x29, #-4]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #48] // 16-byte Folded Reload			; CHECK-NEXT: ldp x29, x30, [sp, #48] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_save_fplr 48			; CHECK-NEXT: .seh_save_fplr 48
	; CHECK-NEXT: add sp, sp, #64			; CHECK-NEXT: add sp, sp, #64
	; CHECK-NEXT: .seh_stackalloc 64			; CHECK-NEXT: .seh_stackalloc 64
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk.3:			; CHECK-LABEL: exit_thunk.3:
	; CHECK: .seh_proc thunk.3			; CHECK: .seh_proc exit_thunk.3
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #80			; CHECK-NEXT: sub sp, sp, #80
	; CHECK-NEXT: .seh_stackalloc 80			; CHECK-NEXT: .seh_stackalloc 80
	; CHECK-NEXT: stp x29, x30, [sp, #64] // 16-byte Folded Spill			; CHECK-NEXT: stp x29, x30, [sp, #64] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 64			; CHECK-NEXT: .seh_save_fplr 64
	; CHECK-NEXT: add x29, sp, #64			; CHECK-NEXT: add x29, sp, #64
	; CHECK-NEXT: .seh_add_fp 64			; CHECK-NEXT: .seh_add_fp 64
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
	; CHECK-NEXT: sub x0, x29, #16			; CHECK-NEXT: sub x0, x29, #16
	; CHECK-NEXT: add x1, sp, #32			; CHECK-NEXT: add x1, sp, #32
	; CHECK-NEXT: stp s1, s2, [sp, #36]			; CHECK-NEXT: stp s2, s3, [sp, #40]
	; CHECK-NEXT: str s0, [sp, #32]			; CHECK-NEXT: stp s0, s1, [sp, #32]
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
	; CHECK-NEXT: str s3, [sp, #44]
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
	; CHECK-NEXT: ldp x0, x1, [x29, #-16]			; CHECK-NEXT: ldp x0, x1, [x29, #-16]
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #64] // 16-byte Folded Reload			; CHECK-NEXT: ldp x29, x30, [sp, #64] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_save_fplr 64			; CHECK-NEXT: .seh_save_fplr 64
	; CHECK-NEXT: add sp, sp, #80			; CHECK-NEXT: add sp, sp, #80
	; CHECK-NEXT: .seh_stackalloc 80			; CHECK-NEXT: .seh_stackalloc 80
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk.4:			; CHECK-LABEL: exit_thunk.4:
	; CHECK: .seh_proc thunk.4			; CHECK: .seh_proc exit_thunk.4
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #48			; CHECK-NEXT: stp x19, x20, [sp, #-64]! // 16-byte Folded Spill
	; CHECK-NEXT: .seh_stackalloc 48			; CHECK-NEXT: .seh_save_regp_x x19, 64
	; CHECK-NEXT: stp x29, x30, [sp, #32] // 16-byte Folded Spill			; CHECK-NEXT: stp x21, x22, [sp, #16] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 32			; CHECK-NEXT: .seh_save_regp x21, 16
	; CHECK-NEXT: add x29, sp, #32			; CHECK-NEXT: str x25, [sp, #32] // 8-byte Folded Spill
	; CHECK-NEXT: .seh_add_fp 32			; CHECK-NEXT: .seh_save_reg x25, 32
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: stp x29, x30, [sp, #40] // 16-byte Folded Spill
				; CHECK-NEXT: .seh_save_fplr 40
				; CHECK-NEXT: add x29, sp, #40
				; CHECK-NEXT: .seh_add_fp 40
				; CHECK-NEXT: .seh_endprologue
				; CHECK-NEXT: add x8, x5, #15
				; CHECK-NEXT: mov x19, x3
				; CHECK-NEXT: lsr x15, x8, #4
				; CHECK-NEXT: mov x20, x2
				; CHECK-NEXT: mov x21, x1
				; CHECK-NEXT: mov x22, x0
				; CHECK-NEXT: mov x25, x9
				; CHECK-NEXT: bl __chkstk_arm64ec
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: sub x0, x8, x15, lsl #4
				; CHECK-NEXT: mov sp, x0
				bcl5980 (author): It looks like Microsoft's compiler generates `sub sp,sp,x15,lsl #4` here. Does anyone know why we can't accept SP as the input/output of this instruction? `def : Pat<(sub GPR64:$Rn, arith_shifted_reg64:$Rm), (SUBSXrs GPR64:$Rn, arith_shifted_reg64:$Rm)>;`
				efriedma: The instruction used by the Microsoft compiler is SUBXrx64. SUBSXrs can't refer to sp that way; the "lsl" is actually an alternate spelling of "uxtx". We could add a specialized pattern specifically for subtraction operations where the first operand of the subtraction is a copy from sp.
				bcl5980 (author): "We could add a specialized pattern specifically for subtraction operations where the first operand of the subtraction is a copy from sp." Do you mean adding a pattern in the .td file? `def : Pat<(sub GPR64sp:$SP, arith_extended_reg32to64_i64:$Rm), (SUBSXrx GPR64sp:$SP, arith_extended_reg32to64_i64:$Rm)>;` If so, we may need to add a new select function similar to arith_extended_reg32to64_i64 but without the check for an extend. Or can we do this in DAGCombine? I also think this can be done in a separate patch.
				efriedma: I think you'd want a PatLeaf to specifically look for a CopyFromReg from sp, as opposed to changing the way we lower all subtraction operations, but yes, that's the idea. (You could, alternatively, introduce a target-specific node in AArch64ISelLowering.h specifically to represent "sub sp, sp, xN, lsl #4", use it from LowerDYNAMIC_STACKALLOC, and write a pattern to lower it to SUBXrx64.) And yes, please leave it for another patch.
				; CHECK-NEXT: mov x1, x4
				; CHECK-NEXT: mov x2, x5
				; CHECK-NEXT: bl "#memcpy"
	; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
				; CHECK-NEXT: sub sp, sp, #32
				; CHECK-NEXT: mov x9, x25
				; CHECK-NEXT: mov x0, x22
				; CHECK-NEXT: fmov d0, x22
				; CHECK-NEXT: mov x1, x21
				; CHECK-NEXT: fmov d1, x21
				; CHECK-NEXT: mov x2, x20
				; CHECK-NEXT: fmov d2, x20
				; CHECK-NEXT: mov x3, x19
				; CHECK-NEXT: fmov d3, x19
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
				; CHECK-NEXT: add sp, sp, #32
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #32] // 16-byte Folded Reload			; CHECK-NEXT: sub sp, x29, #40
	; CHECK-NEXT: .seh_save_fplr 32			; CHECK-NEXT: .seh_add_fp 40
	; CHECK-NEXT: add sp, sp, #48			; CHECK-NEXT: ldp x29, x30, [sp, #40] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_stackalloc 48			; CHECK-NEXT: .seh_save_fplr 40
				; CHECK-NEXT: ldr x25, [sp, #32] // 8-byte Folded Reload
				; CHECK-NEXT: .seh_save_reg x25, 32
				; CHECK-NEXT: ldp x21, x22, [sp, #16] // 16-byte Folded Reload
				; CHECK-NEXT: .seh_save_regp x21, 16
				; CHECK-NEXT: ldp x19, x20, [sp], #64 // 16-byte Folded Reload
				; CHECK-NEXT: .seh_save_regp_x x19, 64
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk.5:			; CHECK-LABEL: exit_thunk.5:
	; CHECK: .seh_proc thunk.5			; CHECK: .seh_proc exit_thunk.5
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #64			; CHECK-NEXT: stp x19, x20, [sp, #-64]! // 16-byte Folded Spill
	; CHECK-NEXT: .seh_stackalloc 64			; CHECK-NEXT: .seh_save_regp_x x19, 64
	; CHECK-NEXT: stp x29, x30, [sp, #48] // 16-byte Folded Spill			; CHECK-NEXT: stp x21, x22, [sp, #16] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 48			; CHECK-NEXT: .seh_save_regp x21, 16
	; CHECK-NEXT: add x29, sp, #48			; CHECK-NEXT: str x25, [sp, #32] // 8-byte Folded Spill
	; CHECK-NEXT: .seh_add_fp 48			; CHECK-NEXT: .seh_save_reg x25, 32
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: stp x29, x30, [sp, #40] // 16-byte Folded Spill
				; CHECK-NEXT: .seh_save_fplr 40
				; CHECK-NEXT: add x29, sp, #40
				; CHECK-NEXT: .seh_add_fp 40
				; CHECK-NEXT: .seh_endprologue
				; CHECK-NEXT: add x8, x5, #15
				; CHECK-NEXT: mov x19, x3
				; CHECK-NEXT: lsr x15, x8, #4
				; CHECK-NEXT: mov x20, x2
				; CHECK-NEXT: mov x21, x1
				; CHECK-NEXT: mov x22, x0
				; CHECK-NEXT: mov x25, x9
				; CHECK-NEXT: bl __chkstk_arm64ec
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: sub x0, x8, x15, lsl #4
				; CHECK-NEXT: mov sp, x0
				; CHECK-NEXT: mov x1, x4
				; CHECK-NEXT: mov x2, x5
				; CHECK-NEXT: bl "#memcpy"
	; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
	; CHECK-NEXT: str w5, [sp, #40]
	; CHECK-NEXT: str w4, [sp, #32]
	; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
				; CHECK-NEXT: sub sp, sp, #32
				; CHECK-NEXT: mov x9, x25
				; CHECK-NEXT: mov x0, x22
				; CHECK-NEXT: fmov d0, x22
				; CHECK-NEXT: mov x1, x21
				; CHECK-NEXT: fmov d1, x21
				; CHECK-NEXT: mov x2, x20
				; CHECK-NEXT: fmov d2, x20
				; CHECK-NEXT: mov x3, x19
				; CHECK-NEXT: fmov d3, x19
	; CHECK-NEXT: blr x8			; CHECK-NEXT: blr x8
				; CHECK-NEXT: add sp, sp, #32
	; CHECK-NEXT: mov w0, w8			; CHECK-NEXT: mov w0, w8
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #48] // 16-byte Folded Reload			; CHECK-NEXT: sub sp, x29, #40
	; CHECK-NEXT: .seh_save_fplr 48			; CHECK-NEXT: .seh_add_fp 40
	; CHECK-NEXT: add sp, sp, #64			; CHECK-NEXT: ldp x29, x30, [sp, #40] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_stackalloc 64			; CHECK-NEXT: .seh_save_fplr 40
				; CHECK-NEXT: ldr x25, [sp, #32] // 8-byte Folded Reload
				; CHECK-NEXT: .seh_save_reg x25, 32
				; CHECK-NEXT: ldp x21, x22, [sp, #16] // 16-byte Folded Reload
				; CHECK-NEXT: .seh_save_regp x21, 16
				; CHECK-NEXT: ldp x19, x20, [sp], #64 // 16-byte Folded Reload
				; CHECK-NEXT: .seh_save_regp_x x19, 64
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
	;			;
	; CHECK-LABEL: thunk.6:			; CHECK-LABEL: exit_thunk.6:
	; CHECK: .seh_proc thunk.6			; CHECK: .seh_proc exit_thunk.6
	; CHECK-NEXT: // %bb.0:			; CHECK-NEXT: // %bb.0:
	; CHECK-NEXT: sub sp, sp, #96			; CHECK-NEXT: stp x19, x20, [sp, #-48]! // 16-byte Folded Spill
	; CHECK-NEXT: .seh_stackalloc 96			; CHECK-NEXT: .seh_save_regp_x x19, 48
	; CHECK-NEXT: stp x29, x30, [sp, #80] // 16-byte Folded Spill			; CHECK-NEXT: stp x21, x22, [sp, #16] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_save_fplr 80			; CHECK-NEXT: .seh_save_regp x21, 16
	; CHECK-NEXT: add x29, sp, #80			; CHECK-NEXT: stp x29, x30, [sp, #32] // 16-byte Folded Spill
	; CHECK-NEXT: .seh_add_fp 80			; CHECK-NEXT: .seh_save_fplr 32
	; CHECK-NEXT: .seh_endprologue			; CHECK-NEXT: add x29, sp, #32
	; CHECK-NEXT: adrp x10, __os_arm64x_dispatch_call_no_redirect			; CHECK-NEXT: .seh_add_fp 32
	; CHECK-NEXT: mov w8, w3			; CHECK-NEXT: sub sp, sp, #16
	; CHECK-NEXT: mov w3, w2			; CHECK-NEXT: .seh_stackalloc 16
	; CHECK-NEXT: mov w2, w1			; CHECK-NEXT: .seh_endprologue
	; CHECK-NEXT: mov w1, w0			; CHECK-NEXT: add x8, x5, #31
	; CHECK-NEXT: sub x0, x29, #16			; CHECK-NEXT: mov x19, x2
	; CHECK-NEXT: ldr x10, [x10, :lo12:__os_arm64x_dispatch_call_no_redirect]			; CHECK-NEXT: lsr x15, x8, #4
	; CHECK-NEXT: str w5, [sp, #48]			; CHECK-NEXT: mov x20, x1
	; CHECK-NEXT: str w4, [sp, #40]			; CHECK-NEXT: mov x21, x0
	; CHECK-NEXT: str w8, [sp, #32]			; CHECK-NEXT: mov x22, x9
	; CHECK-NEXT: blr x10			; CHECK-NEXT: bl __chkstk_arm64ec
	; CHECK-NEXT: ldp x0, x1, [x29, #-16]			; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: sub x0, x8, x15, lsl #4
				; CHECK-NEXT: mov sp, x0
				; CHECK-NEXT: mov x1, x4
				; CHECK-NEXT: mov x2, x5
				; CHECK-NEXT: str x3, [x0], #8
				; CHECK-NEXT: bl "#memcpy"
				; CHECK-NEXT: adrp x8, __os_arm64x_dispatch_call_no_redirect
				; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
				; CHECK-NEXT: sub sp, sp, #32
				; CHECK-NEXT: sub x10, x29, #48
				; CHECK-NEXT: sub x0, x29, #48
				bcl5980 (author): This can be a TODO as well. `sub x0, x29, #48` is rematerialized from the copy of x10, but `fmov d0, x10` can't be rematerialized, so `sub x10, x29, #48` remains. How could we improve reMaterializeTrivialDef to generate better code here?
				efriedma: Not sure how we're ending up with two separate operations in the first place; I'd normally expect SelectionDAG CSE to kick in.
				bcl5980 (author): It happens in the RegisterCoalescer pass, after DAG CSE and even after machine CSE. The machine IR is: `%15:gpr64sp = ADDXri %stack.0, 0, 0`, `$x0 = COPY %15:gpr64sp`, `$d0 = COPY %15:gpr64sp`. x0 can be rematerialized as `ADDXri %stack.0, 0, 0`, but d0 cannot.
				; CHECK-NEXT: mov x9, x22
				; CHECK-NEXT: mov x1, x21
				; CHECK-NEXT: fmov d1, x21
				; CHECK-NEXT: mov x2, x20
				; CHECK-NEXT: fmov d2, x20
				; CHECK-NEXT: fmov d0, x10
				; CHECK-NEXT: mov x3, x19
				; CHECK-NEXT: fmov d3, x19
				; CHECK-NEXT: blr x8
				; CHECK-NEXT: add sp, sp, #32
				; CHECK-NEXT: ldp x0, x1, [x29, #-48]
	; CHECK-NEXT: .seh_startepilogue			; CHECK-NEXT: .seh_startepilogue
	; CHECK-NEXT: ldp x29, x30, [sp, #80] // 16-byte Folded Reload			; CHECK-NEXT: sub sp, x29, #32
	; CHECK-NEXT: .seh_save_fplr 80			; CHECK-NEXT: .seh_add_fp 32
	; CHECK-NEXT: add sp, sp, #96			; CHECK-NEXT: ldp x29, x30, [sp, #32] // 16-byte Folded Reload
	; CHECK-NEXT: .seh_stackalloc 96			; CHECK-NEXT: .seh_save_fplr 32
				; CHECK-NEXT: ldp x21, x22, [sp, #16] // 16-byte Folded Reload
				; CHECK-NEXT: .seh_save_regp x21, 16
				; CHECK-NEXT: ldp x19, x20, [sp], #48 // 16-byte Folded Reload
				; CHECK-NEXT: .seh_save_regp_x x19, 48
	; CHECK-NEXT: .seh_endepilogue			; CHECK-NEXT: .seh_endepilogue
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: .seh_endfunclet			; CHECK-NEXT: .seh_endfunclet
	; CHECK-NEXT: .seh_endproc			; CHECK-NEXT: .seh_endproc
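
The stack-size arithmetic that the variadic exit thunks perform around the call to __chkstk_arm64ec (`add x8, x5, #15`, `lsr x15, x8, #4`, `sub x0, x8, x15, lsl #4` in exit_thunk.4 and exit_thunk.5, with `#31` in place of `#15` in exit_thunk.6) can be sketched in C++ as follows. This is only an illustration of the rounding, not code from the patch. The helper names `sizeIn16ByteUnits` and `allocationBase` are made up for this sketch, and reading x5 as the byte count of the stack arguments is an assumption based on it being passed as the `#memcpy` length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mirrors `add x8, x5, #bias` followed by
// `lsr x15, x8, #4`. With bias == 15 the result is the number of
// 16-byte units needed to hold `bytes` bytes.
constexpr std::uint64_t sizeIn16ByteUnits(std::uint64_t bytes,
                                          std::uint64_t bias = 15) {
  return (bytes + bias) >> 4;
}

// Hypothetical helper: mirrors `sub x0, x8, x15, lsl #4`, where x8 holds
// the current sp. The result is the base of the new dynamic allocation.
constexpr std::uint64_t allocationBase(std::uint64_t sp, std::uint64_t units) {
  return sp - (units << 4);
}

// Compile-time spot checks of the rounding behaviour.
static_assert(sizeIn16ByteUnits(0) == 0, "empty copy needs no space");
static_assert(sizeIn16ByteUnits(1) == 1, "1 byte rounds up to one unit");
static_assert(sizeIn16ByteUnits(16) == 1, "exact multiple is not padded");
static_assert(sizeIn16ByteUnits(17) == 2, "17 bytes round up to two units");
static_assert(sizeIn16ByteUnits(24, 31) == sizeIn16ByteUnits(24) + 1,
              "a bias of 31 adds exactly one extra 16-byte unit");
```

Because the subtraction is always a multiple of 16, a 16-byte-aligned sp stays aligned. A bias of 31 always yields exactly one unit more than a bias of 15, which appears to be the room exit_thunk.6 uses for `str x3, [x0], #8` ahead of the copied arguments.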

Revision Contents

Diff 445235

llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/arm64ec-cfg.ll