This is an archive of the discontinued LLVM Phabricator instance.

Differential D109966

[X86][NFC] structure-return simplificiation
ClosedPublic

Authored by urnathan on Sep 17 2021, 8:01 AM.

Details

Reviewers

rnk

Commits

rGc11e7b59d2e9: [X86][NFC] structure-return simplificiation

Summary

The X86 backend only needs to know whether structure return is via an
sret pointer.  This removes the categorization enumeration and adjusts
and renames the related functions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

urnathan created this revision. Sep 17 2021, 8:01 AM

Herald added subscribers: pengfei, hiraditya. Sep 17 2021, 8:01 AM

urnathan requested review of this revision. Sep 17 2021, 8:01 AM

Herald added a project: Restricted Project. Sep 17 2021, 8:01 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B124395: Diff 373221. Sep 17 2021, 8:45 AM

All three conditions that use this value are trying to determine if the callee is expected to pop the sret pointer off the stack. They have some duplicate subtarget checks that can probably be simplified. They all check these conditions:

!IsMCU: MCU never seems to pop the sret value
!Is64Bit: In the TCO eligibility check, this is expressed as Is32bit, but is functionally identical
!IsMSVC: MSVC doesn't do this callee-pop sret thing
!IsInReg: sret parameters marked with inreg are not popped (reasonable)

So, can I suggest this simplification:

Replace the IsMCU parameter with the subtarget.
Move all three subtarget checks into these functions. Consider sharing them, maybe via a helper like isSRetCalleePop; it's basically Is32bit && !IsMSVC && !IsMCU.
Make the helper names more precise: returnHasCalleePopSRet / argsHaveCalleePopSRet.
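A minimal sketch of the suggested subtarget helper, written as standalone C++ so it compiles without LLVM. MockSubtarget and its three fields are hypothetical stand-ins for the X86Subtarget queries mentioned above; only the helper name isSRetCalleePop and the boolean expression come from the comment.

```cpp
// Hypothetical stand-in for the X86Subtarget queries named in the review
// (32-bit target, MSVC environment, MCU target). Not the real LLVM class.
struct MockSubtarget {
  bool Is32Bit;
  bool IsMSVC;
  bool IsMCU;
};

// The subtarget half of the check: a stack sret pointer is callee-popped
// only on 32-bit targets that are neither MSVC nor MCU.
static bool isSRetCalleePop(const MockSubtarget &ST) {
  return ST.Is32Bit && !ST.IsMSVC && !ST.IsMCU;
}
```

For instance, isSRetCalleePop(MockSubtarget{true, false, false}) is true, while setting IsMSVC or IsMCU, or clearing Is32Bit, makes it false. The per-argument !IsInReg check is separate and not modeled here.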

llvm/lib/Target/X86/X86ISelLowering.cpp
3352–3353	There's a subtlety here: LLVM doesn't require that sret arguments come first. However, it would be impossible for the callee to pop off the second argument and leave behind the first, so this convention is only ever used when the sret argument appears first. Therefore only the first argument needs to be checked.
4665–4668	This comment should live with the target checks if those get moved.
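The first inline comment can be illustrated with a small standalone sketch. MockArgFlags, MockArg, makeArg and firstArgIsStackSRet are hypothetical names modeling ISD::ArgFlagsTy and the argument lists; the point is only that the predicate inspects element 0 and ignores an sret flag on any later argument.

```cpp
#include <vector>

// Hypothetical stand-ins for ISD::ArgFlagsTy and ISD::InputArg/OutputArg.
struct MockArgFlags {
  bool IsSRet;
  bool IsInReg;
};
struct MockArg {
  MockArgFlags Flags;
};

// Convenience constructor for the examples below (illustrative only).
static MockArg makeArg(bool IsSRet, bool IsInReg) {
  return MockArg{MockArgFlags{IsSRet, IsInReg}};
}

// Only the first argument is inspected: a callee cannot pop a later stack
// slot while leaving the first one behind, so an sret flag on any other
// argument never implies callee-pop.
static bool firstArgIsStackSRet(const std::vector<MockArg> &Args) {
  if (Args.empty())
    return false;
  const MockArgFlags &Flags = Args[0].Flags;
  return Flags.IsSRet && !Flags.IsInReg;
}
```

With these definitions, a list whose first element is makeArg(true, false) satisfies the predicate, whereas a list that carries the sret flag only on its second element does not.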

Thanks for your comments. Moving the check-which-ABI bits into the predicate does make things much simpler. I chose to templatize it rather than write the same thing twice. That does mean taking a SmallVector<T> reference rather than an ArrayRef, because template argument deduction does not look through the implicit conversion to ArrayRef<T>. But that ends up with less work to do anyway. I reordered the checks to get the best short-circuiting (heuristic guesswork, though).

would you prefer anonymous namespace, or stick with the static fn?
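The template-deduction point can be reproduced outside LLVM. In this sketch std::vector plays the role of the SmallVector reference and MockArrayRef is a hypothetical view type with a converting constructor, standing in for ArrayRef; all type and function names here are illustrative assumptions.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for ISD::OutputArg and ISD::InputArg.
struct MockOutputArg { bool IsSRet; };
struct MockInputArg { bool IsSRet; };

// Hypothetical minimal view with an implicit converting constructor,
// playing the role of ArrayRef<T>.
template <typename T> struct MockArrayRef {
  const T *Data;
  std::size_t Size;
  MockArrayRef(const std::vector<T> &V) : Data(V.data()), Size(V.size()) {}
};

// Taking the container by reference lets T be deduced from the argument.
template <typename T> static bool firstIsSRet(const std::vector<T> &Args) {
  static_assert(std::is_same<T, MockOutputArg>::value ||
                    std::is_same<T, MockInputArg>::value,
                "requires MockOutputArg or MockInputArg");
  return !Args.empty() && Args[0].IsSRet;
}

// Taking the view by value blocks deduction from a vector argument:
// deduction does not consider user-defined conversions, so callers must
// spell out T, as in firstIsSRetViaView<MockOutputArg>(Outs).
template <typename T> static bool firstIsSRetViaView(MockArrayRef<T> Args) {
  return Args.Size != 0 && Args.Data[0].IsSRet;
}
```

Given a std::vector<MockOutputArg> named Outs, firstIsSRet(Outs) compiles as written, while firstIsSRetViaView(Outs) is rejected until the template argument is given explicitly.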

Herald added a subscriber: mstorsjo. Oct 5 2021, 7:58 AM

Harbormaster completed remote builds in B127078: Diff 377244. Oct 5 2021, 8:40 AM

lgtm, thanks!

In D109966#3042923, @urnathan wrote:

would you prefer anonymous namespace, or stick with the static fn?

LLVM has a guideline preferring static:
https://llvm.org/docs/CodingStandards.html#anonymous-namespaces
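As a rough illustration of that guideline: file-local functions are marked static, and an anonymous namespace is kept small and used for type declarations, which cannot be marked static. PopInfo and popForSRet are hypothetical names; the 4-byte figure mirrors the sret pointer size used in the diff for 32-bit targets.

```cpp
namespace {
// A type needs the anonymous namespace to get internal linkage.
struct PopInfo {
  unsigned Bytes;
};
} // end anonymous namespace

// A function gets internal linkage from 'static', which keeps its
// file-local status visible at the definition itself.
static PopInfo popForSRet(bool CalleePopSRet) {
  return PopInfo{CalleePopSRet ? 4u : 0u};
}
```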

This revision is now accepted and ready to land. Oct 5 2021, 10:58 AM

Closed by commit rGc11e7b59d2e9: [X86][NFC] structure-return simplificiation (authored by urnathan). Oct 6 2021, 3:13 AM

This revision was automatically updated to reflect the committed changes.

urnathan added a commit: rGc11e7b59d2e9: [X86][NFC] structure-return simplificiation.

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	93 lines

Diff 377488

llvm/lib/Target/X86/X86ISelLowering.cpp

... 3,333 lines hidden ...
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// StdCall calling convention seems to be standard for many Windows' API		// StdCall calling convention seems to be standard for many Windows' API
// routines and around. It differs from C calling convention just a little:		// routines and around. It differs from C calling convention just a little:
// callee should clean up the stack, not caller. Symbols should be also		// callee should clean up the stack, not caller. Symbols should be also
// decorated in some fancy way :) It doesn't support any vector arguments.		// decorated in some fancy way :) It doesn't support any vector arguments.
// For info on fast calling convention see Fast Calling Convention (tail call)		// For info on fast calling convention see Fast Calling Convention (tail call)
// implementation LowerX86_32FastCCCallTo.		// implementation LowerX86_32FastCCCallTo.

/// CallIsStructReturn - Determines whether a call uses struct return		/// Determines whether Args, either a set of outgoing arguments to a call, or a
/// semantics.		/// set of incoming args of a call, contains an sret pointer that the callee
enum StructReturnType {		/// pops
NotStructReturn,		template <typename T>
RegStructReturn,		static bool hasCalleePopSRet(const SmallVectorImpl<T> &Args,
StackStructReturn		const X86Subtarget &Subtarget) {
};		// Not C++20 (yet), so no concepts available.
static StructReturnType		static_assert(std::is_same<T, ISD::OutputArg>::value ||
callIsStructReturn(ArrayRef<ISD::OutputArg> Outs, bool IsMCU) {		std::is_same<T, ISD::InputArg>::value,
if (Outs.empty())		"requires ISD::OutputArg or ISD::InputArg");
return NotStructReturn;
		// Only 32-bit pops the sret. It's a 64-bit world these days, so early-out
const ISD::ArgFlagsTy &Flags = Outs[0].Flags;		// for most compilations.
if (!Flags.isSRet())		if (!Subtarget.is32Bit())
return NotStructReturn;		return false;
if (Flags.isInReg() || IsMCU)
return RegStructReturn;		if (Args.empty())
return StackStructReturn;		return false;
}
		// Most calls do not have an sret argument, check the arg next.
/// Determines whether a function uses struct return semantics.		const ISD::ArgFlagsTy &Flags = Args[0].Flags;
static StructReturnType		if (!Flags.isSRet() || Flags.isInReg())
argsAreStructReturn(ArrayRef<ISD::InputArg> Ins, bool IsMCU) {		return false;
if (Ins.empty())
return NotStructReturn;		// The MSVCabi does not pop the sret.
		if (Subtarget.getTargetTriple().isOSMSVCRT())
const ISD::ArgFlagsTy &Flags = Ins[0].Flags;		return false;
if (!Flags.isSRet())
return NotStructReturn;		// MCUs don't pop the sret
if (Flags.isInReg() || IsMCU)		if (Subtarget.isTargetMCU())
return RegStructReturn;		return false;
return StackStructReturn;
		// Callee pops argument
		return true;
}		}

/// Make a copy of an aggregate at address specified by "Src" to address		/// Make a copy of an aggregate at address specified by "Src" to address
/// "Dst" with size and alignment information specified by the specific		/// "Dst" with size and alignment information specified by the specific
/// parameter attribute. The copy will be passed as a byval function parameter.		/// parameter attribute. The copy will be passed as a byval function parameter.
static SDValue CreateCopyOfByValArgument(SDValue Src, SDValue Dst,		static SDValue CreateCopyOfByValArgument(SDValue Src, SDValue Dst,
SDValue Chain, ISD::ArgFlagsTy Flags,		SDValue Chain, ISD::ArgFlagsTy Flags,
SelectionDAG &DAG, const SDLoc &dl) {		SelectionDAG &DAG, const SDLoc &dl) {
... 617 lines hidden ...	if (X86::isCalleePop(CallConv, Is64Bit, IsVarArg,
FuncInfo->setBytesToPopOnReturn(StackSize); // Callee pops everything.		FuncInfo->setBytesToPopOnReturn(StackSize); // Callee pops everything.
} else if (CallConv == CallingConv::X86_INTR && Ins.size() == 2) {		} else if (CallConv == CallingConv::X86_INTR && Ins.size() == 2) {
// X86 interrupts must pop the error code (and the alignment padding) if		// X86 interrupts must pop the error code (and the alignment padding) if
// present.		// present.
FuncInfo->setBytesToPopOnReturn(Is64Bit ? 16 : 4);		FuncInfo->setBytesToPopOnReturn(Is64Bit ? 16 : 4);
} else {		} else {
FuncInfo->setBytesToPopOnReturn(0); // Callee pops nothing.		FuncInfo->setBytesToPopOnReturn(0); // Callee pops nothing.
// If this is an sret function, the return should pop the hidden pointer.		// If this is an sret function, the return should pop the hidden pointer.
if (!Is64Bit && !canGuaranteeTCO(CallConv) &&		if (!canGuaranteeTCO(CallConv) && hasCalleePopSRet(Ins, Subtarget))
!Subtarget.getTargetTriple().isOSMSVCRT() &&
argsAreStructReturn(Ins, Subtarget.isTargetMCU()) == StackStructReturn)
FuncInfo->setBytesToPopOnReturn(4);		FuncInfo->setBytesToPopOnReturn(4);
}		}

if (!Is64Bit) {		if (!Is64Bit) {
// RegSaveFrameIndex is X86-64 only.		// RegSaveFrameIndex is X86-64 only.
FuncInfo->setRegSaveFrameIndex(0xAAAAAAA);		FuncInfo->setRegSaveFrameIndex(0xAAAAAAA);
}		}

... 102 lines hidden ...	X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
CallingConv::ID CallConv = CLI.CallConv;		CallingConv::ID CallConv = CLI.CallConv;
bool &isTailCall = CLI.IsTailCall;		bool &isTailCall = CLI.IsTailCall;
bool isVarArg = CLI.IsVarArg;		bool isVarArg = CLI.IsVarArg;
const auto *CB = CLI.CB;		const auto *CB = CLI.CB;

MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
bool Is64Bit = Subtarget.is64Bit();		bool Is64Bit = Subtarget.is64Bit();
bool IsWin64 = Subtarget.isCallingConvWin64(CallConv);		bool IsWin64 = Subtarget.isCallingConvWin64(CallConv);
StructReturnType SR = callIsStructReturn(Outs, Subtarget.isTargetMCU());
bool IsSibcall = false;		bool IsSibcall = false;
bool IsGuaranteeTCO = MF.getTarget().Options.GuaranteedTailCallOpt ||		bool IsGuaranteeTCO = MF.getTarget().Options.GuaranteedTailCallOpt ||
CallConv == CallingConv::Tail \|\| CallConv == CallingConv::SwiftTail;		CallConv == CallingConv::Tail \|\| CallConv == CallingConv::SwiftTail;
		bool IsCalleePopSRet = !IsGuaranteeTCO && hasCalleePopSRet(Outs, Subtarget);
X86MachineFunctionInfo *X86Info = MF.getInfo<X86MachineFunctionInfo>();		X86MachineFunctionInfo *X86Info = MF.getInfo<X86MachineFunctionInfo>();
bool HasNCSR = (CB && isa<CallInst>(CB) &&		bool HasNCSR = (CB && isa<CallInst>(CB) &&
CB->hasFnAttr("no_caller_saved_registers"));		CB->hasFnAttr("no_caller_saved_registers"));
bool HasNoCfCheck = (CB && CB->doesNoCfCheck());		bool HasNoCfCheck = (CB && CB->doesNoCfCheck());
bool IsIndirectCall = (CB && isa<CallInst>(CB) && CB->isIndirectCall());		bool IsIndirectCall = (CB && isa<CallInst>(CB) && CB->isIndirectCall());
const Module *M = MF.getMMI().getModule();		const Module *M = MF.getMMI().getModule();
Metadata *IsCFProtectionSupported = M->getModuleFlag("cf-protection-branch");		Metadata *IsCFProtectionSupported = M->getModuleFlag("cf-protection-branch");

... 9 lines hidden ...	if (Subtarget.isPICStyleGOT() && !IsGuaranteeTCO && !IsMustTail) {
// that require lazy function symbol resolution. Using musttail or		// that require lazy function symbol resolution. Using musttail or
// GuaranteedTailCallOpt will override this.		// GuaranteedTailCallOpt will override this.
GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee);		GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee);
if (!G || (!G->getGlobal()->hasLocalLinkage() &&		if (!G || (!G->getGlobal()->hasLocalLinkage() &&
G->getGlobal()->hasDefaultVisibility()))		G->getGlobal()->hasDefaultVisibility()))
isTailCall = false;		isTailCall = false;
}		}


if (isTailCall && !IsMustTail) {		if (isTailCall && !IsMustTail) {
// Check if it's really possible to do a tail call.		// Check if it's really possible to do a tail call.
isTailCall = IsEligibleForTailCallOptimization(		isTailCall = IsEligibleForTailCallOptimization(
Callee, CallConv, SR == StackStructReturn, isVarArg, CLI.RetTy, Outs,		Callee, CallConv, IsCalleePopSRet, isVarArg, CLI.RetTy, Outs, OutVals,
OutVals, Ins, DAG);		Ins, DAG);

// Sibcalls are automatically detected tailcalls which do not require		// Sibcalls are automatically detected tailcalls which do not require
// ABI changes.		// ABI changes.
if (!IsGuaranteeTCO && isTailCall)		if (!IsGuaranteeTCO && isTailCall)
IsSibcall = true;		IsSibcall = true;

if (isTailCall)		if (isTailCall)
++NumTailCalls;		++NumTailCalls;
... 484 lines hidden ...	X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
DAG.addCallSiteInfo(Chain.getNode(), std::move(CSInfo));		DAG.addCallSiteInfo(Chain.getNode(), std::move(CSInfo));

// Save heapallocsite metadata.		// Save heapallocsite metadata.
if (CLI.CB)		if (CLI.CB)
if (MDNode *HeapAlloc = CLI.CB->getMetadata("heapallocsite"))		if (MDNode *HeapAlloc = CLI.CB->getMetadata("heapallocsite"))
DAG.addHeapAllocSite(Chain.getNode(), HeapAlloc);		DAG.addHeapAllocSite(Chain.getNode(), HeapAlloc);

// Create the CALLSEQ_END node.		// Create the CALLSEQ_END node.
unsigned NumBytesForCalleeToPop;		unsigned NumBytesForCalleeToPop = 0; // Callee pops nothing.
if (X86::isCalleePop(CallConv, Is64Bit, isVarArg,		if (X86::isCalleePop(CallConv, Is64Bit, isVarArg,
DAG.getTarget().Options.GuaranteedTailCallOpt))		DAG.getTarget().Options.GuaranteedTailCallOpt))
NumBytesForCalleeToPop = NumBytes; // Callee pops everything		NumBytesForCalleeToPop = NumBytes; // Callee pops everything
else if (!Is64Bit && !canGuaranteeTCO(CallConv) &&		else if (!canGuaranteeTCO(CallConv) && IsCalleePopSRet)
!Subtarget.getTargetTriple().isOSMSVCRT() &&		// If this call passes a struct-return pointer, the callee
SR == StackStructReturn)		// pops that struct pointer.
// If this is a call to a struct-return function, the callee
// pops the hidden struct pointer, so we have to push it back.
// This is common for Darwin/X86, Linux & Mingw32 targets.
// For MSVC Win32 targets, the caller pops the hidden struct pointer.
NumBytesForCalleeToPop = 4;		NumBytesForCalleeToPop = 4;
else
NumBytesForCalleeToPop = 0; // Callee pops nothing.

// Returns a flag for retval copy to use.		// Returns a flag for retval copy to use.
if (!IsSibcall) {		if (!IsSibcall) {
Chain = DAG.getCALLSEQ_END(Chain,		Chain = DAG.getCALLSEQ_END(Chain,
DAG.getIntPtrConstant(NumBytesToPop, dl, true),		DAG.getIntPtrConstant(NumBytesToPop, dl, true),
DAG.getIntPtrConstant(NumBytesForCalleeToPop, dl,		DAG.getIntPtrConstant(NumBytesForCalleeToPop, dl,
true),		true),
InFlag, dl);		InFlag, dl);
... 142 lines hidden ...	bool MatchingStackOffset(SDValue Arg, unsigned Offset, ISD::ArgFlagsTy Flags,
}		}

return Bytes == MFI.getObjectSize(FI);		return Bytes == MFI.getObjectSize(FI);
}		}

/// Check whether the call is eligible for tail call optimization. Targets		/// Check whether the call is eligible for tail call optimization. Targets
/// that want to do tail call optimization should implement this function.		/// that want to do tail call optimization should implement this function.
bool X86TargetLowering::IsEligibleForTailCallOptimization(		bool X86TargetLowering::IsEligibleForTailCallOptimization(
SDValue Callee, CallingConv::ID CalleeCC, bool IsCalleeStackStructRet,		SDValue Callee, CallingConv::ID CalleeCC, bool IsCalleePopSRet,
bool isVarArg, Type *RetTy, const SmallVectorImpl<ISD::OutputArg> &Outs,		bool isVarArg, Type *RetTy, const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,		const SmallVectorImpl<SDValue> &OutVals,
const SmallVectorImpl<ISD::InputArg> &Ins, SelectionDAG &DAG) const {		const SmallVectorImpl<ISD::InputArg> &Ins, SelectionDAG &DAG) const {
if (!mayTailCallThisCC(CalleeCC))		if (!mayTailCallThisCC(CalleeCC))
return false;		return false;

// If -tailcallopt is specified, make fastcc functions tail-callable.		// If -tailcallopt is specified, make fastcc functions tail-callable.
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
... 36 lines hidden ...	bool X86TargetLowering::IsEligibleForTailCallOptimization(
// Also avoid sibcall optimization if we're an sret return fn and the callee		// Also avoid sibcall optimization if we're an sret return fn and the callee
// is incompatible. See comment in LowerReturn about why hasStructRetAttr is		// is incompatible. See comment in LowerReturn about why hasStructRetAttr is
// insufficient.		// insufficient.
if (MF.getInfo<X86MachineFunctionInfo>()->getSRetReturnReg()) {		if (MF.getInfo<X86MachineFunctionInfo>()->getSRetReturnReg()) {
// For a compatible tail call the callee must return our sret pointer. So it		// For a compatible tail call the callee must return our sret pointer. So it
// needs to be (a) an sret function itself and (b) we pass our sret as its		// needs to be (a) an sret function itself and (b) we pass our sret as its
// sret. Condition #b is harder to determine.		// sret. Condition #b is harder to determine.
return false;		return false;
} else if (Subtarget.is32Bit() && IsCalleeStackStructRet)		} else if (IsCalleePopSRet)
// In the i686 ABI, the sret pointer is callee-pop, so we cannot tail-call,		// The callee pops an sret, so we cannot tail-call, as our caller doesn't
// as our caller doesn't expect that.		// expect that.
return false;		return false;

// Do not sibcall optimize vararg calls unless all arguments are passed via		// Do not sibcall optimize vararg calls unless all arguments are passed via
// registers.		// registers.
LLVMContext &C = *DAG.getContext();		LLVMContext &C = *DAG.getContext();
if (isVarArg && !Outs.empty()) {		if (isVarArg && !Outs.empty()) {
// Optimizing for varargs on Win64 is unlikely to be safe without		// Optimizing for varargs on Win64 is unlikely to be safe without
// additional testing.		// additional testing.
... 49,126 lines hidden ...