Part of initial Arm64EC patchset.
A variadic function's exit thunk is different from a normal function's: it needs to make a dynamic stack allocation at the bottom of the stack and then copy the original on-stack arguments into the new allocation.
When we have a varargs function that returns its value in a register on AArch64 but requires an “sret” return on x64, we need to shuffle the argument registers around and store x3 to the stack with 8-byte alignment.
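For illustration, the copy part of such a thunk could be emitted with IRBuilder roughly as follows. This is a minimal sketch, not the patch's literal code; the helper name and the StackArgs/StackArgsSize pair (the pointer to, and size of, the caller's on-stack argument area) are assumptions, and the alignments are placeholders.

    #include "llvm/IR/IRBuilder.h"
    using namespace llvm;

    // Copy the caller's on-stack varargs into a dynamically sized alloca so
    // they land at the bottom of the thunk's frame, where the x64 callee
    // expects to find its stack arguments.
    static Value *copyStackArgs(IRBuilder<> &IRB, Value *StackArgs,
                                Value *StackArgsSize) {
      Value *NewAlloc =
          IRB.CreateAlloca(IRB.getInt8Ty(), StackArgsSize, "stackargs.copy");
      IRB.CreateMemCpy(NewAlloc, Align(16), StackArgs, Align(16), StackArgsSize);
      return NewAlloc;
    }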
Details
Reviewers: efriedma, dpaoliello, DavidSpickett
Event Timeline
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
321–323: It looks like Microsoft generates this code:

    sub sp, sp, x15, lsl #4

Does anyone know why we can't accept SP as input/output for the instruction?

    def : Pat<(sub GPR64:$Rn, arith_shifted_reg64:$Rm),
              (SUBSXrs GPR64:$Rn, arith_shifted_reg64:$Rm)>;
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
483 ↗ (On Diff #444548): For the whole "x3 is stored to the stack" thing, if we're going to continue to use an alloca, I'd probably recommend just increasing the size of the alloca by 8 bytes and then explicitly emitting a store instruction in the IR. Messing with the alignment like this seems likely to cause issues.
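Continuing the earlier IRBuilder sketch, that suggestion might look roughly like this (StackArgsSize and X3Val stand for the stack-argument size and the incoming x3 value; both names are illustrative, not from the patch):

    // Grow the dynamic allocation by 8 bytes and store x3 explicitly,
    // instead of encoding the extra slot through the alloca's alignment.
    Value *PaddedSize = IRB.CreateAdd(StackArgsSize, IRB.getInt64(8));
    Value *Alloc =
        IRB.CreateAlloca(IRB.getInt8Ty(), PaddedSize, "stackargs.copy");
    // Put x3's value in the extra 8-byte slot past the copied arguments.
    Value *X3Slot = IRB.CreateGEP(IRB.getInt8Ty(), Alloc, StackArgsSize);
    IRB.CreateStore(X3Val, X3Slot);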
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
321–323: The instruction used by the Microsoft compiler is SUBXrx64. SUBSXrs can't refer to sp that way; the "lsl" is actually an alternate spelling of "uxtx". We could add a specialized pattern specifically for subtraction operations where the first operand of the subtraction is a copy from sp.
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
483 ↗ (On Diff #444548): I'm trying to move this code into AArch64TargetLowering::LowerCall, but I still have some problems with the stack layout, and I'm also trying to make the IR version correct. Thanks for the idea of allocating an extra 8 bytes.
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
321–323: > We could add a specialized pattern specifically for subtraction operations where the first operand of the subtraction is a copy from sp.

Do you mean something like this?

    def : Pat<(sub GPR64sp:$SP, arith_extended_reg32to64_i64:$Rm),
              (SUBSXrx GPR64sp:$SP, arith_extended_reg32to64_i64:$Rm)>;

If yes, we may need to add a new select function similar to arith_extended_reg32to64_i64 but without the extend check. Or can we do this in DAGCombine? And I think we can do this in another patch.
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
329–330: This can be a TODO as well.
llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
180: Maybe make the contribution from RetStack a bit more clear.
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
321–323: I think you'd want a PatLeaf to specifically look for a CopyFromReg from sp, as opposed to changing the way we lower all subtraction operations, but yes, that's the idea. (You could, alternatively, introduce a target-specific node in AArch64ISelLowering.h specifically to represent "sub sp, sp, xN, lsl #4", use it from LowerDYNAMIC_STACKALLOC, and write a pattern to lower it to SUBXrx64.) And yes, please leave it for another patch.
329–330: Not sure how we're ending up with two separate operations in the first place; I'd normally expect SelectionDAG CSE to kick in.
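For the 321–323 thread, the target-specific-node alternative might look roughly like the sketch below, emitted from LowerDYNAMIC_STACKALLOC and later matched by a pattern to SUBXrx64. AArch64ISD::SUB_SP_LSL4 is an invented name for illustration (no such node exists in the tree), and DAG, DL, Size, and SP are the usual locals of that lowering function.

    // Express the allocation size in 16-byte units (what x15 carries), then
    // emit a dedicated node carrying "sub sp, sp, xN, lsl #4" semantics.
    SDValue SizeIn16B = DAG.getNode(ISD::SRL, DL, MVT::i64, Size,
                                    DAG.getConstant(4, DL, MVT::i64));
    SDValue NewSP = DAG.getNode(AArch64ISD::SUB_SP_LSL4, DL, MVT::i64,
                                SP, SizeIn16B);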
llvm/test/CodeGen/AArch64/arm64ec-cfg.ll
329–330: It happens in the RegisterCoalescer pass, after DAG CSE and even after machine CSE:

    %15:gpr64sp = ADDXri %stack.0, 0, 0
    $x0 = COPY %15:gpr64sp
    $d0 = COPY %15:gpr64sp

x0 can be rematerialized as ADDXri %stack.0, 0, 0, but d0 cannot.
When I try to move the memcpy into SelectionDAG (AArch64TargetLowering::LowerCall), there are two issues I can't fix:
- In the IR version, the load of the __os_arm64x_dispatch_call_no_redirect address comes after the memory copy, but if we copy the memory in LowerCall the load comes before the copy, which costs one extra register.
- The 32 bytes for the register store should be at the very bottom of the stack, but when I move the memory copy into LowerCall, the dynamic allocation always ends up at the very bottom instead.
@efriedma Do you have any ideas for fixing these?
> In the IR version, the load of the __os_arm64x_dispatch_call_no_redirect address comes after the memory copy, but if we copy the memory in LowerCall the load comes before the copy, which costs one extra register.
That's a consequence of doing the load in IR, I think. The load is before the memcpy in the initial SelectionDAG, and nothing tries to rearrange them. By default, scheduling happens in source order, and we don't reorder across calls after isel. I don't see any obvious fix; maybe the call lowering code could try to find the load and mess with its chains? But I wouldn't worry about it; if we actually care about the performance of varargs thunks, there are probably more significant improvements we could make, like trying to inline small memcpys.
> The 32 bytes for the register store should be at the very bottom of the stack, but when I move the memory copy into LowerCall, the dynamic allocation always ends up at the very bottom instead.
Maybe AArch64FrameLowering::hasReservedCallFrame is returning the wrong thing? Normally, the stack allocation for call arguments is allocated in the prologue; the stack frame layout code needs to know if that's illegal because there's a dynamic/large/etc. allocation.
Alternatively, you could just make the extra 32 bytes part of the dynamic allocation, instead of trying to do it "properly".
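For context, the hook in question has roughly this shape (simplified; the in-tree version may check additional conditions):

    // The prologue may pre-reserve the call-argument area only when the
    // frame contains no variable-sized objects.
    bool AArch64FrameLowering::hasReservedCallFrame(
        const MachineFunction &MF) const {
      return !MF.getFrameInfo().hasVarSizedObjects();
    }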
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1477: Is this actually necessary? The linker should resolve plain "memcpy" to something reasonable, I think.
6864: In theory, you call CreateVariableSizedObject so that you can use the returned FrameIndex to refer to the object. If you don't actually use the FrameIndex for anything, the call looks sort of silly, sure.
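For reference, the intended usage is roughly the following fragment (assumed to run inside AArch64TargetLowering::LowerCall; the alignment and the null AllocaInst are placeholders):

    MachineFunction &MF = DAG.getMachineFunction();
    int FI = MF.getFrameInfo().CreateVariableSizedObject(Align(16),
                                                         /*Alloca=*/nullptr);
    // Without a user of FI, the CreateVariableSizedObject call buys nothing.
    SDValue FIN = DAG.getFrameIndex(FI, getPointerTy(DAG.getDataLayout()));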
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1477: Based on my local testing, if I don't add the "#" prefix, the memcpy gets linked to the x86 version of memcpy and crashes at runtime.
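Presumably the line under discussion amounts to something like the following sketch (the exact guard and placement are assumptions, though setLibcallName is the standard mechanism for renaming a runtime libcall):

    // Redirect the SelectionDAG-generated memcpy libcall to the Arm64EC-mangled
    // symbol so the linker resolves it to the AArch64 implementation rather
    // than the x64 one.
    if (Subtarget->getTargetTriple().isWindowsArm64EC())
      setLibcallName(RTLIB::MEMCPY, "#memcpy");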
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1477: Does the linking process for arm64ec actually guarantee that we have an arm64ec msvcrt that includes "#memcpy" etc.? Even if it does, I don't think we can make the same assumption for all the other functions SelectionDAG needs to call. Given that, we're going to need some mechanism to allow calls generated by SelectionDAG to participate in thunking. In any case, if this change unblocks testing for you, we can leave it in with a FIXME to address the above.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1477: I used dumpbin to dump the symbols of vcruntime.lib, and these symbols look related:

    40 #memchr
    41 #memcmp
    42 #memcpy
    43 #memmove
    44 #memset

I think we only need to consider the memory intrinsic functions that MSVC can also export.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1477: The full set of calls SelectionDAG can generate is basically the stuff from RuntimeLibcalls.def, plus a few target-specific bits. (Off the top of my head, I'm not sure if there are any target-specific calls on arm64 windows besides "__chkstk" and "__security_check_cookie".) If we expect that arm64ec code normally links against an arm64ec C runtime, I guess most of the routines we'd want should be available in "#"-prefixed versions, but I'm not sure about all of them...