This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Eliminate stack frame in non-leaf function based on shrink wrapping
Abandoned · Public

Authored by inouehrs on Mar 13 2017, 9:41 AM.


Details

Reviewers

qcolombet
lei
syzaara
kbarton
sfertile
jtony
hfinkel
nemanjai

Summary

We can expand the scope of shrink wrapping by not creating a stack frame in non-leaf functions.
For example, in the call sequence A->B->C, we can help shrink wrapping while compiling B by allowing C to return control directly to A when C exits. I call this direct return.
In this case, we do not need to create a stack frame in B, which increases the opportunities for shrink wrapping.
To apply direct return, we need to confirm several conditions, including:

invocation B->C must be a tail call (i.e. no instruction between the call and the return in B).
the callee C must have internal linkage, since direct return does not comply with the ABI.
other ABI-specific checks.
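
The candidate test described above can be sketched as a standalone C++ model. This is not the patch's code: Kind, Instr, Block and findDirectReturnCall are hypothetical simplifications of MachineInstr and MachineBasicBlock, assuming a block qualifies when it has exactly one successor that starts with the return, its first call targets an internal-linkage callee, and only call-frame pseudos or an unconditional branch follow that call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins for MachineInstr and
// MachineBasicBlock; only the properties the candidate test looks at.
enum class Kind { Call, FrameSetup, FrameDestroy, UncondBranch, Return, Other };

struct Instr {
  Kind kind;
  bool internalCallee = false;  // Kind::Call only: callee has internal linkage
  bool touchesCSROrFI = false;  // uses/defines a callee-saved reg or frame index
};

struct Block {
  std::vector<Instr> instrs;
  std::vector<const Block *> succs;
};

// The successor must consist of the return itself (nothing before it).
static bool isBareReturnBlock(const Block &bb) {
  return !bb.instrs.empty() && bb.instrs.front().kind == Kind::Return;
}

// Returns the index of the call that could use direct return, or -1.
int findDirectReturnCall(const Block &bb) {
  if (bb.succs.size() != 1 || !isBareReturnBlock(*bb.succs.front()))
    return -1;
  int call = -1;
  for (int i = 0; i < static_cast<int>(bb.instrs.size()); ++i) {
    const Instr &mi = bb.instrs[i];
    if (mi.kind == Kind::Call && call < 0) {
      if (!mi.internalCallee)
        return -1;  // callee visible outside the module: the ABI must hold
      call = i;
      continue;
    }
    if (mi.kind == Kind::FrameSetup || mi.kind == Kind::FrameDestroy)
      continue;  // call-frame pseudos would be erased by the rewrite
    // After the call only an unconditional branch may follow, and no
    // instruction in the block may touch a CSR or a frame index.
    if ((call >= 0 && mi.kind != Kind::UncondBranch) || mi.touchesCSROrFI)
      return -1;
  }
  return call;
}
```

With a block of [frame setup, call, frame destroy, branch] whose single successor is just the return, the sketch selects the call at index 1; an external callee, a second successor, or any non-branch instruction after the call disqualifies the block.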

In this patch, the optimization is enabled only for PowerPC64 with the ELFv2 ABI, but it could be applied to other platforms by implementing the platform-specific part.

The original motivation for this patch is to optimize the hot method of tcmalloc, to which GCC applies shrink wrapping with direct return but LLVM cannot. With this patch and basic block deduplication ( https://reviews.llvm.org/D30774 ), LLVM can do shrink wrapping for this hot method.

Diff Detail

Event Timeline

inouehrs created this revision. Mar 13 2017, 9:41 AM

inouehrs added a reviewer: qcolombet. Mar 13 2017, 10:07 AM

Hi,

I haven't looked at the patch proper yet.
I inlined a couple of stylistic comments.

One high level comment: I am not super comfortable changing the "status" of the shrink-wrapping pass from a pure analysis to an optimization, i.e., with this patch we actually change the code in the machine function. That might be the right thing to do, but I have to have a closer look to give an informed opinion.

Cheers,
-Quentin

include/llvm/Target/TargetInstrInfo.h
162	Use /// (doxygen style comment)
162	supported
lib/CodeGen/ShrinkWrap.cpp
87	applied
137	can directly return
146	Move the comment of what this method is doing here for consistency.
146	Lower case for the first letter of the method name.
450	Iterate over the basic block etc.
450	What do you mean by target? Candidate?
452	to directly
452	You mean the current MF as callee, right?
472	Add a message in the assert, eg. By construction we must have one successor here
502	Period
625	Proper sentence please (Capital letter at the beginning and period at the end). See http://llvm.org/docs/CodingStandards.html#commenting

I updated the comments and the assert based on the suggestions.
I rebased onto the latest code base.

Quentin,

Thank you so much for your suggestions.
I am thinking about how I can avoid modifying the code in this 'analysis' pass. I would appreciate any further advice.
Best regards.

@inouehrs Should this patch be abandoned and we focus on when we can refine the tail-call checks during lowering?

In D30900#762409, @sfertile wrote:

@inouehrs Should this patch be abandoned and we focus on when we can refine the tail-call checks during lowering?

Yes, I think we should try to show whether this approach provides any benefit over simply using the tail call optimization and abandon it if it doesn't.

Thanks for the suggestions. Let me think about the differences between TCO and this approach.
(@sfertile Sorry, I seem to have missed your comment above.)

The intention of this patch is to avoid creating a stack frame if all calls in a function are optimized with tailcallopt.
But I think I should do this optimization without touching the shrink-wrapping analysis; the x86 backend seems to already do such an optimization.
So, I will abandon this patch. Thank you.
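
Stated as a simplified rule: a function can skip its own frame when it has no stack objects, saves no callee-saved registers, and every call it makes is a tail call, so the link register never has to be preserved. The sketch below models only that rule; CallSite and canOmitStackFrame are hypothetical names, and real targets have further conditions (for example the PowerPC parameter save area checked in this patch).

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-call summary; not an LLVM type.
struct CallSite {
  bool isTailCall;  // lowered as a plain branch that never returns here
};

// Simplified model: no stack objects, no CSR saves, and only tail calls
// means the function never needs to preserve LR, so no frame is required.
bool canOmitStackFrame(const std::vector<CallSite> &calls,
                       bool hasStackObjects, bool savesCSRs) {
  if (hasStackObjects || savesCSRs)
    return false;
  for (const CallSite &c : calls)
    if (!c.isTailCall)
      return false;  // a normal call returns here, so LR must be saved
  return true;       // also covers leaf functions (no calls at all)
}
```

A leaf function and a function whose calls are all tail calls both qualify; a single ordinary call is enough to require a frame.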

To seek another approach as discussed above.

inouehrs mentioned this in D30774: [SimplifyCFG] Merging duplicated basic blocks. Jul 21 2017, 6:16 AM

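Revision Contents

Path                                                 Size
include/llvm/Target/TargetInstrInfo.h                9 lines
lib/CodeGen/ShrinkWrap.cpp                           148 lines
lib/Target/PowerPC/PPCISelLowering.cpp               8 lines
lib/Target/PowerPC/PPCInstr64Bit.td                  5 lines
lib/Target/PowerPC/PPCInstrInfo.h                    3 lines
lib/Target/PowerPC/PPCInstrInfo.cpp                  30 lines
lib/Target/PowerPC/PPCMachineFunctionInfo.h          9 lines
test/CodeGen/PowerPC/shrinkwrap_direct_return.ll     121 lines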

Diff 92321

include/llvm/Target/TargetInstrInfo.h

[149 unchanged lines omitted; enclosing scope: public:]
   /// pointer and operating without, through the use of these two instructions.
   ///
   unsigned getCallFrameSetupOpcode() const { return CallFrameSetupOpcode; }
   unsigned getCallFrameDestroyOpcode() const { return CallFrameDestroyOpcode; }

   unsigned getCatchReturnOpcode() const { return CatchRetOpcode; }
   unsigned getReturnOpcode() const { return ReturnOpcode; }

+  /// In the call sequence of A->B->C for example, we can help Shrink Wrapping
+  /// by allowing directly return from C to A if possible (e.g. tail call in B).
+  /// This method returns the opcode to call C from B if it is supported.
+
+  /// If direct return is not supported on the target, it returns -1.
+  virtual unsigned getCallOpcodeForDirectReturn() const { return (unsigned)-1; }
+  /// Return true if this call instruction is eligible for direct return.
+  virtual bool isEligibleForDirectReturn(const MachineInstr &CI) const { return false; }

   /// Returns the actual stack pointer adjustment made by an instruction
   /// as part of a call sequence. By default, only call frame setup/destroy
   /// instructions adjust the stack, but targets may want to override this
   /// to enable more fine-grained adjustment, or adjust by a different value.
   virtual int getSPAdjust(const MachineInstr &MI) const;

   /// Return true if the instruction is a "coalescable" extension instruction.
   /// That is, it's like a copy where it's legal for the source to overlap the
[1,449 unchanged lines omitted]

lib/CodeGen/ShrinkWrap.cpp

[77 unchanged lines omitted]
 #define DEBUG_TYPE "shrink-wrap"

 using namespace llvm;

 STATISTIC(NumFunc, "Number of functions");
 STATISTIC(NumCandidates, "Number of shrink-wrapping candidates");
 STATISTIC(NumCandidatesDropped,
           "Number of shrink-wrapping candidates dropped because of frequency");
+STATISTIC(NumDirectReturn,
+          "Number of method calls applied direct return optimization");

 static cl::opt<cl::boolOrDefault>
     EnableShrinkWrapOpt("enable-shrink-wrap", cl::Hidden,
                         cl::desc("enable the shrink-wrapping pass"));

+static cl::opt<bool>
+    EnableDirectReturnOpt("shrink-wrap-direct-return-opt", cl::Hidden, cl::init(true),
+                          cl::desc("enable the direct return in shrink wrapping"));

 namespace {
 /// \brief Class to determine where the safe point to insert the
 /// prologue and epilogue are.
 /// Unlike the paper from Fred C. Chow, PLDI'88, that introduces the
 /// shrink-wrapping term for prologue/epilogue placement, this pass
 /// does not rely on expensive data-flow analysis. Instead we use the
 /// dominance properties and loop information to decide which point
 /// are safe for such insertion.
[24 unchanged lines omitted; enclosing scope: class ShrinkWrap : public MachineFunctionPass {]
   unsigned FrameDestroyOpcode;
   /// Entry block.
   const MachineBasicBlock *Entry;
   typedef SmallSetVector<unsigned, 16> SetOfRegs;
   /// Registers that need to be saved for the current function.
   mutable SetOfRegs CurrentCSRs;
   /// Current MachineFunction.
   MachineFunction *MachineFunc;
+  /// Call instructions that can directly return from the callee of
+  /// this method to the caller of this method (by skipping this method).
+  SmallVector<MachineInstr*,4> DirectReturnCandidates;

   /// \brief Check if \p MI uses or defines a callee-saved register or
   /// a frame index. If this is the case, this means \p MI must happen
   /// after Save and before Restore.
   bool useOrDefCSROrFI(const MachineInstr &MI, RegScavenger *RS) const;

+  /// This function iterates over the basic blocks to find the candidates of
+  /// the direct return optimization.
+  /// For example, in the call sequence of A->B->C (B is the current method),
+  /// the direct return allows C to directly pass the control to A
+  /// when C returns.
+  /// To apply direct return, the returned value from C directly is passed
+  /// to A as the return value of B (i.e. no inst. between call and return).
+  /// Also, call of C must be with the internal linkage since direct return
+  /// does not comply with ABI.
+  void findDirectReturnCandidates(const TargetInstrInfo &TII, RegScavenger *RS);

   const SetOfRegs &getCurrentCSRs(RegScavenger *RS) const {
     if (CurrentCSRs.empty()) {
       BitVector SavedRegs;
       const TargetFrameLowering *TFI =
           MachineFunc->getSubtarget().getFrameLowering();

       TFI->determineCalleeSaves(*MachineFunc, SavedRegs, RS);

[21 unchanged lines omitted; enclosing scope: void init(MachineFunction &MF) {]
     MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
     MLI = &getAnalysis<MachineLoopInfo>();
     EntryFreq = MBFI->getEntryFreq();
     const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
     FrameSetupOpcode = TII.getCallFrameSetupOpcode();
     FrameDestroyOpcode = TII.getCallFrameDestroyOpcode();
     Entry = &MF.front();
     CurrentCSRs.clear();
+    DirectReturnCandidates.clear();
     MachineFunc = &MF;

     ++NumFunc;
   }

   /// Check whether or not Save and Restore points are still interesting for
   /// shrink-wrapping.
   bool ArePointsInteresting() const { return Save != Entry && Save && Restore; }
[33 unchanged lines omitted]
 INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
 INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
 INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
 INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
 INITIALIZE_PASS_END(ShrinkWrap, "shrink-wrap", "Shrink Wrap Pass", false, false)

 bool ShrinkWrap::useOrDefCSROrFI(const MachineInstr &MI,
                                  RegScavenger *RS) const {
+
+  // If DirectReturnCandidates is in the BB,
+  // Call frame creation can be skipped.
+  // Use or def of CSR has been already checked.
+  for (MachineInstr *CI : DirectReturnCandidates)
+    if (MI.getParent() == CI->getParent())
+      return false;
+
   if (MI.getOpcode() == FrameSetupOpcode ||
       MI.getOpcode() == FrameDestroyOpcode) {
     DEBUG(dbgs() << "Frame instruction: " << MI << '\n');
     return true;
   }
   for (const MachineOperand &MO : MI.operands()) {
     bool UseOrDefCSR = false;
     if (MO.isReg()) {
[182 unchanged lines omitted; enclosing scope: for (const MachineBasicBlock *SuccBB : MBB->successors()) {]
       // Otherwise, we have an irreducible graph.
       if (!isProperBackedge(MLI, MBB, SuccBB))
         return true;
     }
   }
   return false;
 }

+void
+ShrinkWrap::findDirectReturnCandidates(const TargetInstrInfo &TII, RegScavenger *RS) {
+  for (MachineBasicBlock &MBB : *MachineFunc) {
+    DEBUG(dbgs() << "findDirectReturnCandidates: " << MBB.getNumber() << ' '
+                 << MBB.getName() << '\n');
+
+    // This BB is not a target if it has multiple or no successors
+    if (MBB.succ_size() != 1)
+      continue;
+
+    // This BB is not a target if the successor is not a return block
+    // or if the successor includes instructions before the return
+    MachineBasicBlock *NextMBB = *(MBB.succ_begin());
+    assert(NextMBB != NULL &&
+           "By construction we must have one successor here");
+    if (!NextMBB->isReturnBlock())
+      continue;
+
+    if (!NextMBB->getFirstNonDebugInstr()->getDesc().isReturn())
+      continue;
+
+    // We now investigate instructions in this BB
+    MachineInstr *CI = NULL;
+    for (MachineInstr &MI : MBB) {
+      if (MI.getDesc().isCall() && CI == NULL) {
+        // we can optimize only internal linkage
+        // since external linkers cannot handle direct return
+        if (!(MI.getOperand(0).isGlobal() &&
+              MI.getOperand(0).getGlobal()->hasInternalLinkage()))
+          break;
+
+        // we need to do platform-specific checks
+        if (!TII.isEligibleForDirectReturn(MI))
+          break;
+
+        CI = &MI;
+        continue;
+      }
+
+      if (MI.getOpcode() == FrameSetupOpcode ||
+          MI.getOpcode() == FrameDestroyOpcode)
+        continue;
+
+      // If this BB accesses a CSR, or there is another instruction after call
+      // we cannot optimize this BB.
+      if ((CI != NULL && !MI.getDesc().isUnconditionalBranch()) ||
+          useOrDefCSROrFI(MI, RS)) {
+        CI = NULL;
+        break;
+      }
+    }
+    if (CI != NULL) {
+      DEBUG(dbgs() << "Call instruction in this BB is selected "
+                      "as a target of direct return\n");
+      DirectReturnCandidates.push_back(CI);
+    }
+  }
+}

 bool ShrinkWrap::runOnMachineFunction(MachineFunction &MF) {
   if (MF.empty() || !isShrinkWrapEnabled(MF))
     return false;

   DEBUG(dbgs() << "**** Analysing " << MF.getName() << '\n');

   init(MF);

   if (isIrreducibleCFG(MF, *MLI)) {
     // If MF is irreducible, a block may be in a loop without
     // MachineLoopInfo reporting it. I.e., we may use the
     // post-dominance property in loops, which lead to incorrect
     // results. Moreover, we may miss that the prologue and
     // epilogue are not in the same loop, leading to unbalanced
     // construction/deconstruction of the stack frame.
     DEBUG(dbgs() << "Irreducible CFGs are not supported yet\n");
     return false;
   }

   const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
   std::unique_ptr<RegScavenger> RS(
       TRI->requiresRegisterScavenging(MF) ? new RegScavenger() : nullptr);

+  // We look for return targets before finding save/restore points.
+  // The direct return is supported, if DirectReturnCallOpcode != -1.
+  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
+  unsigned DirectReturnCallOpcode = TII.getCallOpcodeForDirectReturn();
+  if (EnableDirectReturnOpt && DirectReturnCallOpcode != (unsigned)-1)
+    findDirectReturnCandidates(TII, RS.get());

   for (MachineBasicBlock &MBB : MF) {
     DEBUG(dbgs() << "Look into: " << MBB.getNumber() << ' ' << MBB.getName()
                  << '\n');

     if (MBB.isEHFuncletEntry()) {
       DEBUG(dbgs() << "EH Funclets are not supported yet.\n");
       return false;
     }
[10 unchanged lines omitted; enclosing scope: for (const MachineInstr &MI : MBB) {]
         DEBUG(dbgs() << "No Shrink wrap candidate found\n");
         return false;
       }
       // No need to look for other instructions, this basic block
       // will already be part of the handled region.
       break;
     }
   }

+  if (!ArePointsInteresting() && !DirectReturnCandidates.empty()) {
+    // Shrink wrapping needs to select save and restore points,
+    // but there is no more frame/CSR code due to direct return support.
+    // We create an unreachable BB and select it for save and restore points.
+    // This unreachable BB will be eliminated by control flow optimizer.
+    MachineBasicBlock *ReturnBlock = *(DirectReturnCandidates[0]->getParent()->succ_begin());
+    MachineBasicBlock *NewMBB = MF.CreateMachineBasicBlock();
+    MF.push_back(NewMBB);
+    TII.insertUnconditionalBranch(*NewMBB, ReturnBlock, ReturnBlock->findBranchDebugLoc());
+    NewMBB->addSuccessor(ReturnBlock);
+    updateSaveRestorePoints(*NewMBB, RS.get());
+    DEBUG(dbgs() << "Created an unreachable BB#" << NewMBB->getNumber()
+                 << " to select for creating stack frame\n");
+  }

   if (!ArePointsInteresting()) {
     // If the points are not interesting at this point, then they must be null
     // because it means we did not encounter any frame/CSR related code.
     // Otherwise, we would have returned from the previous loop.
     assert(!Save && !Restore && "We miss a shrink-wrap opportunity?!");
     DEBUG(dbgs() << "Nothing to shrink-wrap\n");
     return false;
   }
[27 unchanged lines omitted; enclosing scope: if (!IsSaveCheap || !TargetCanUseSaveAsPrologue) {]
       Restore = FindIDom<>(*Restore, Restore->successors(), *MPDT);
       if (!Restore)
         break;
       NewBB = Restore;
     }
     updateSaveRestorePoints(*NewBB, RS.get());
   } while (Save && Restore);

+  if (!DirectReturnCandidates.empty()) {
+    const MCInstrDesc &MCID = TII.get(DirectReturnCallOpcode);
+    for (MachineInstr *CandidateCI : DirectReturnCandidates) {
+      MachineBasicBlock *MBB = CandidateCI->getParent();
+      // If the candidate is after the stack save, we don't modify it
+      if (MDT->dominates(Save, MBB)) {
+        assert(MPDT->dominates(Restore, MBB) &&
+               "inconsistent dominance information");
+        continue;
+      }
+
+      DEBUG(dbgs() << "The call in " << MBB->getName()
+                   << " is modified for direct return\n");
+      // We replace the opcode of the call by the special one,
+      // then eliminate unnecessary instructions.
+      // The control never goes back from the call.
+      CandidateCI->setDesc(MCID);
+      if (MBB->back().getDesc().isBranch())
+        MBB->pop_back();
+      MBB->removeSuccessor(MBB->succ_begin());
+      assert(MBB->succ_empty() &&
+             "The candidate MBB has only one successor and it must be removed.");
+      MachineInstr *FSetupMI = NULL, *FDestroyMI = NULL;
+      for (MachineInstr &MI : *MBB) {
+        if (MI.getOpcode() == FrameSetupOpcode)
+          FSetupMI = &MI;
+        if (MI.getOpcode() == FrameDestroyOpcode)
+          FDestroyMI = &MI;
+      }
+      assert((FSetupMI && FDestroyMI) && "We cannot find frame instructions");
+      MBB->erase(FSetupMI);
+      MBB->erase(FDestroyMI);
+
+      NumDirectReturn++;
+    }
+  }

   if (!ArePointsInteresting()) {
     ++NumCandidatesDropped;
     return false;
   }

   DEBUG(dbgs() << "Final shrink wrap candidates:\nSave: " << Save->getNumber()
                << ' ' << Save->getName() << "\nRestore: "
                << Restore->getNumber() << ' ' << Restore->getName() << '\n');
[34 unchanged lines omitted]
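
The rewrite loop above (swap the call opcode, drop the branch to the return block, remove the successor, erase the call-frame pseudos) can be modeled on a toy instruction list. ToyInstr, ToyBlock and applyDirectReturn are hypothetical; the opcode names ADJCALLSTACKDOWN/ADJCALLSTACKUP (PowerPC's call-frame setup/destroy pseudos), BL8_NOP, B and B_CALL serve only as labels here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical textual stand-ins for MachineInstr / MachineBasicBlock.
struct ToyInstr {
  std::string op;   // opcode name, e.g. "BL8_NOP"
  std::string arg;  // operand text, e.g. the callee symbol
};

struct ToyBlock {
  std::vector<ToyInstr> instrs;
  std::vector<int> succs;  // successor block numbers
};

// Mirrors the shape of the patch's rewrite on one candidate block.
void applyDirectReturn(ToyBlock &bb, std::size_t callIdx) {
  assert(callIdx < bb.instrs.size() && bb.succs.size() == 1);
  // 1. Swap the call opcode for the branch-without-link variant.
  bb.instrs[callIdx].op = "B_CALL";
  // 2. Drop the trailing unconditional branch to the return block.
  if (bb.instrs.back().op == "B")
    bb.instrs.pop_back();
  // 3. Control never comes back, so the block has no successor.
  bb.succs.clear();
  // 4. Erase the call-frame setup/destroy pseudos.
  auto isFramePseudo = [](const ToyInstr &mi) {
    return mi.op == "ADJCALLSTACKDOWN" || mi.op == "ADJCALLSTACKUP";
  };
  bb.instrs.erase(
      std::remove_if(bb.instrs.begin(), bb.instrs.end(), isFramePseudo),
      bb.instrs.end());
}
```

After the rewrite, a block [ADJCALLSTACKDOWN, BL8_NOP callee, ADJCALLSTACKUP, B ret] reduces to the single instruction "B_CALL callee" with no successors, which is why B no longer needs a frame on that path.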

lib/Target/PowerPC/PPCISelLowering.cpp

[5,229 unchanged lines omitted; enclosing scope: for (unsigned i = 0; i != NumOps; ++i) {]
     unsigned Align =
         CalculateStackSlotAlignment(ArgVT, OrigVT, Flags, PtrByteSize);
     NumBytes = ((NumBytes + Align - 1) / Align) * Align;

     NumBytes += CalculateStackSlotSize(ArgVT, Flags, PtrByteSize);
     if (Flags.isInConsecutiveRegsLast())
       NumBytes = ((NumBytes + PtrByteSize - 1)/PtrByteSize) * PtrByteSize;
   }
+  PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
+  if (CallConv != CallingConv::Fast) {
+    FuncInfo->setHasParameterArea(HasParameterArea);
+  } else {
+    bool FitInRegs = (NumGPRsUsed <= NumGPRs && NumVRsUsed <= NumVRs &&
+                      NumFPRsUsed <= NumFPRs);
+    FuncInfo->setHasParameterArea(!FitInRegs);
+  }

   unsigned NumBytesActuallyUsed = NumBytes;

   // In the old ELFv1 ABI,
   // the prolog code of the callee may store up to 8 GPR argument registers to
   // the stack, allowing va_start to index over them in memory if its varargs.
   // Because we cannot tell if this is needed on the caller side, we have to
   // conservatively assume that it is needed. As such, make sure we have at
[7,694 unchanged lines omitted]
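
For fastcc calls, the hunk above derives the flag from whether the arguments fit in registers. A sketch of that decision, assuming the ELFv2 argument-register budget of 8 GPRs (r3-r10), 13 FPRs (f1-f13) and 12 VRs (v2-v13); CallConv, ArgRegCounts and callNeedsParameterArea are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real code uses llvm::CallingConv::ID and
// counters computed while lowering one call in PPCISelLowering.cpp.
enum class CallConv { C, Fast };

struct ArgRegCounts {
  unsigned gprs = 0, fprs = 0, vrs = 0;
};

// Assumed ELFv2 argument-register budget: r3-r10, f1-f13, v2-v13.
constexpr ArgRegCounts kELFv2ArgRegs{8, 13, 12};

// Non-fast calls keep the ABI-derived answer; for fastcc a parameter area
// is needed only when some class of argument registers overflows.
bool callNeedsParameterArea(CallConv cc, bool abiHasParameterArea,
                            const ArgRegCounts &used,
                            const ArgRegCounts &avail = kELFv2ArgRegs) {
  if (cc != CallConv::Fast)
    return abiHasParameterArea;
  bool fitInRegs = used.gprs <= avail.gprs && used.vrs <= avail.vrs &&
                   used.fprs <= avail.fprs;
  return !fitInRegs;
}
```

A fastcc call passing nine integer arguments overflows the eight GPRs and therefore still needs a parameter area, while one using exactly the register budget does not.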

lib/Target/PowerPC/PPCInstr64Bit.td

[134 unchanged lines omitted; enclosing scope: let Uses = [RM], isCodeGenOnly = 1 in {]
   def BL8_NOP_TLS : IForm_and_DForm_4_zero<18, 0, 1, 24,
                                            (outs), (ins tlscall:$func),
                                            "bl $func\n\tnop", IIC_BrB, []>;

   def BLA8_NOP : IForm_and_DForm_4_zero<18, 1, 1, 24,
                                         (outs), (ins abscalltarget:$func),
                                         "bla $func\n\tnop", IIC_BrB,
                                         [(PPCcall_nop (i64 imm:$func))]>;

+  // This opcode is for method call without setting link register
+  // used for direct return optimization in shrink wrapping
+  def B_CALL : IForm<18, 0, 0, (outs), (ins calltarget:$func),
+                     "b $func", IIC_BrB, []>;
 }
 let Uses = [CTR8, RM] in {
   def BCTRL8 : XLForm_2_ext<19, 528, 20, 0, 1, (outs), (ins),
                             "bctrl", IIC_BrB, [(PPCbctrl)]>,
                Requires<[In64BitMode]>;

   let isCodeGenOnly = 1 in {
     def BCCCTRL8 : XLForm_2_br<19, 528, 1, (outs), (ins pred:$cond),
[1,158 unchanged lines omitted]
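
B_CALL is a plain "b" (its LK bit is 0), so unlike "bl" it does not write the link register. A toy simulation of why that lets C return straight to A; this is plain C++, not a PowerPC emulator, and Machine, runA, runB and runC are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of PowerPC call/return: "bl f" writes the return point into
// the link register (LR) and jumps; "b f" only jumps; "blr" resumes
// wherever LR points.
struct Machine {
  std::string lr;                  // current link register contents
  std::vector<std::string> trace;  // each return, in order
};

// C ends with blr, so it returns to whatever LR holds.
void runC(Machine &m) { m.trace.push_back("C returns to " + m.lr); }

// B tail-calls C either with "bl" (the normal call) or with "b" (B_CALL).
void runB(Machine &m, bool useBCall) {
  if (useBCall) {
    runC(m);  // LR still holds A's return point, so C returns to A
    return;
  }
  std::string saved = m.lr;  // "bl" clobbers LR, so B must save it (frame)
  m.lr = "B";
  runC(m);
  m.lr = saved;  // restore LR, then blr back to A
  m.trace.push_back("B returns to " + m.lr);
}

// A calls B with "bl".
void runA(Machine &m, bool useBCall) {
  m.lr = "A";
  runB(m, useBCall);
}
```

With "bl" the trace is "C returns to B", then "B returns to A", and B needs somewhere to keep A's return address; with "b" the single entry is "C returns to A", and B never touches LR.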

lib/Target/PowerPC/PPCInstrInfo.h

[101 unchanged lines omitted; enclosing scope: protected:]
   /// rotate amt is zero. We also have to munge the immediates a bit.
   MachineInstr *commuteInstructionImpl(MachineInstr &MI, bool NewMI,
                                        unsigned OpIdx1,
                                        unsigned OpIdx2) const override;

 public:
   explicit PPCInstrInfo(PPCSubtarget &STI);

+  unsigned getCallOpcodeForDirectReturn() const override;
+  bool isEligibleForDirectReturn (const MachineInstr &CI) const override;

   /// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As
   /// such, whenever a client has an instance of instruction info, it should
   /// always be able to get register info as well (through this method).
   ///
   const PPCRegisterInfo &getRegisterInfo() const { return RI; }

   ScheduleHazardRecognizer *
   CreateTargetHazardRecognizer(const TargetSubtargetInfo *STI,
[180 unchanged lines omitted]

lib/Target/PowerPC/PPCInstrInfo.cpp

[1,925 unchanged lines omitted]
 }

 const TargetRegisterClass *
 PPCInstrInfo::updatedRC(const TargetRegisterClass *RC) const {
   if (Subtarget.hasVSX() && RC == &PPC::VRRCRegClass)
     return &PPC::VSRCRegClass;
   return RC;
 }

+unsigned PPCInstrInfo::getCallOpcodeForDirectReturn() const {
+  if (Subtarget.isPPC64() && Subtarget.isELFv2ABI()) return PPC::B_CALL;
+  return -1;
+}
+
+bool PPCInstrInfo::isEligibleForDirectReturn (const MachineInstr &CI) const {
+  assert(CI.getDesc().isCall() &&
+         isa<Function>(CI.getOperand(0).getGlobal()) &&
+         "isEligibleForDirectReturn requires call instruction");
+  assert(Subtarget.isPPC64() && Subtarget.isELFv2ABI() &&
+         "We only support direct return on ELFv2 for PPC64");
+
+  const Function *CalleeFunc = dyn_cast<Function>(CI.getOperand(0).getGlobal());
+  const MachineFunction *CallerMF = CI.getParent()->getParent();
+
+  // If this method returns another datatype, we cannot optimize.
+  if (!CallerMF->getFunction()->getReturnType()->isVoidTy() &&
+      CallerMF->getFunction()->getReturnType() != CalleeFunc->getReturnType())
+    return false;
+
+  // A function call that requires parameter area exists in this method,
+  // so we do not eliminate stack creation.
+  // FIXME: It is better to check the use of parameter area of this callee
+  if (CallerMF->getInfo<PPCFunctionInfo>()->hasParameterArea())
+    return false;
+  assert(!CalleeFunc->isVarArg() && "Vararg function uses parameter area");
+
+  return true;
+}
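
The two PowerPC-specific conditions in isEligibleForDirectReturn reduce to a small predicate. In this sketch, RetTy is a hypothetical stand-in for llvm::Type, and the boolean parameter corresponds to PPCFunctionInfo::hasParameterArea() for the caller B.

```cpp
#include <cassert>

// Hypothetical, reduced type model; the real check compares llvm::Type*.
enum class RetTy { Void, I32, I64, F64 };

// 1. B returns void, or exactly C's return type, so the registers C
//    returns in already hold B's result when control reaches A.
// 2. No call in B needs a parameter save area (which would need a frame).
bool eligibleForDirectReturn(RetTy callerRet, RetTy calleeRet,
                             bool callerHasParameterArea) {
  if (callerRet != RetTy::Void && callerRet != calleeRet)
    return false;
  if (callerHasParameterArea)
    return false;
  return true;
}
```

A void caller may forward any callee, since A ignores the returned registers; a caller returning i64 may only forward a callee that also returns i64.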

lib/Target/PowerPC/PPCMachineFunctionInfo.h

class PPCFunctionInfo : public MachineFunctionInfo {
  ...
  /// Whether this uses the PIC Base register or not.
  bool UsesPICBase = false;

  /// True if this function has a subset of CSRs that is handled explicitly via
  /// copies
  bool IsSplitCSR = false;

+ /// True if this function has a parameter save area in its stack frame.
+ bool HasParameterArea = false;

public:
  explicit PPCFunctionInfo(MachineFunction &MF) : MF(MF) {}

  int getFramePointerSaveIndex() const { return FramePointerSaveIndex; }
  void setFramePointerSaveIndex(int Idx) { FramePointerSaveIndex = Idx; }

  int getReturnAddrSaveIndex() const { return ReturnAddrSaveIndex; }
  void setReturnAddrSaveIndex(int idx) { ReturnAddrSaveIndex = idx; }
  ...
  void addMustSaveCR(unsigned Reg) { MustSaveCRs.push_back(Reg); }

  void setUsesPICBase(bool uses) { UsesPICBase = uses; }
  bool usesPICBase() const { return UsesPICBase; }

  bool isSplitCSR() const { return IsSplitCSR; }
  void setIsSplitCSR(bool s) { IsSplitCSR = s; }

+ void setHasParameterArea(bool s) {
+   // This function needs a parameter save area if any of its callees does.
+   HasParameterArea |= s;
+ }
+ bool hasParameterArea() const { return HasParameterArea; }

  MCSymbol *getPICOffsetSymbol() const;

  MCSymbol *getGlobalEPSymbol() const;
  MCSymbol *getLocalEPSymbol() const;
  MCSymbol *getTOCOffsetSymbol() const;
};

} // end namespace llvm

#endif // LLVM_LIB_TARGET_POWERPC_PPCMACHINEFUNCTIONINFO_H
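`setHasParameterArea` ORs its argument into the flag rather than overwriting it, so `hasParameterArea()` reports true if any call in the function required a parameter save area, regardless of the order in which the calls were seen. The sketch below models that accumulation in isolation; `ParamAreaTracker` and `anyCallNeedsParamArea` are hypothetical names, and the sketch assumes the setter is invoked once per call site, which this excerpt of the diff does not show.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the HasParameterArea bookkeeping in PPCFunctionInfo.
class ParamAreaTracker {
  bool HasParameterArea = false;

public:
  // Sticky update: once any callee requires the area, it stays required.
  void setHasParameterArea(bool S) { HasParameterArea |= S; }
  bool hasParameterArea() const { return HasParameterArea; }
};

// Hypothetical driver: feed one flag per call site, in any order.
bool anyCallNeedsParamArea(const std::vector<bool> &PerCallSite) {
  ParamAreaTracker T;
  for (bool Needs : PerCallSite)
    T.setHasParameterArea(Needs);
  return T.hasParameterArea();
}
```

A plain assignment would let a later call that does not need the area clear the flag set by an earlier one, which is why the OR matters.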

test/CodeGen/PowerPC/shrinkwrap_direct_return.ll

This file was added.

				; RUN: llc -verify-machineinstrs -O1 -mcpu=ppc64 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s

				define i64 @func1(i64 %a) {
				; Since both function calls are internal, func1 does not need to create or destroy a stack frame at all.
				; CHECK-LABEL: func1:
				; CHECK-NOT: mflr
				; CHECK-DAG: b internal_func1
				; CHECK-DAG: b internal_func2
				; CHECK-NOT: mtlr

				check1:
				%and = and i64 %a, 1
				%tobool = icmp eq i64 %and, 0
				br i1 %tobool, label %check2, label %if.then

				check2:
				%and1 = and i64 %a, 2
				%tobool2 = icmp eq i64 %and1, 0
				br i1 %tobool2, label %return, label %if.then2

				return:
				%retval = phi i64 [ %call, %if.then ], [ %call2, %if.then2 ], [ %a, %check2 ]
				ret i64 %retval

				if.then:
				%call = tail call fastcc i64 @internal_func1(i64 %a)
				br label %return

				if.then2:
				%call2 = tail call fastcc i64 @internal_func2(i64 %a)
				br label %return
				}



				define i64 @func2(i64 %a) {
				; Only calls to internal functions can use direct return, so a stack frame is still created and destroyed around the call to external_func1.
				; CHECK-LABEL: func2:
				; CHECK-DAG: mflr
				; CHECK-DAG: bl external_func1
				; CHECK-DAG: b internal_func1
				; CHECK-DAG: mtlr

				check1:
				%and = and i64 %a, 1
				%tobool = icmp eq i64 %and, 0
				br i1 %tobool, label %check2, label %if.then

				check2:
				%and1 = and i64 %a, 2
				%tobool2 = icmp eq i64 %and1, 0
				br i1 %tobool2, label %return, label %if.then2

				return:
				%retval = phi i64 [ %call, %if.then ], [ %call2, %if.then2 ], [ %a, %check2 ]
				ret i64 %retval

				if.then:
				%call = tail call fastcc i64 @internal_func1(i64 %a)
				br label %return

				if.then2:
				%call2 = tail call i64 @external_func1(i64 %a)
				br label %return
				}



				define i64 @func3(i64 %a) {
				; func3 calls a vararg function, which needs the parameter save area, so direct return is not applied.
				; The stack frame is therefore created in the prologue and destroyed in the epilogue.
				; CHECK-LABEL: func3:
				; CHECK: mflr
				; CHECK-DAG: bl internal_func1
				; CHECK-DAG: bl internal_vararg_func
				; CHECK: mtlr

				check1:
				%and = and i64 %a, 1
				%tobool = icmp eq i64 %and, 0
				br i1 %tobool, label %check2, label %if.then

				check2:
				%and1 = and i64 %a, 2
				%tobool2 = icmp eq i64 %and1, 0
				br i1 %tobool2, label %return, label %if.then2

				return:
				%retval = phi i64 [ %call, %if.then ], [ %call2, %if.then2 ], [ %a, %check2 ]
				ret i64 %retval

				if.then:
				%call = tail call fastcc i64 @internal_func1(i64 %a)
				br label %return

				if.then2:
				%call2 = tail call i64 (i64, ...) @internal_vararg_func(i64 %a)
				br label %return
				}




				define internal fastcc i64 @internal_func1(i64 %a) unnamed_addr #1 {
				check1:
				ret i64 %a
				}

				define internal fastcc i64 @internal_func2(i64 %a) unnamed_addr #1 {
				check1:
				ret i64 %a
				}

				define internal i64 @internal_vararg_func(i64 %a, ...) unnamed_addr #1 {
				check1:
				ret i64 %a
				}

				declare i64 @external_func1(i64) #1

				attributes #1 = { noinline }
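For readers more at home in source code, the control flow of `func1` corresponds roughly to the C++ below. This is an illustrative sketch only: the test IR is hand-written, and compiling this source will not necessarily reproduce it (an optimizer may, for instance, notice that the callees simply return their argument). The `static` helpers stand in for the `internal` callees. With direct return, the CHECK lines expect both calls to become plain branches (`b`), so each callee returns straight to `func1`'s caller.

```cpp
#include <cstdint>

// Stand-ins for @internal_func1/@internal_func2: internal linkage, never
// inlined, and returning their argument unchanged, as in the test.
static __attribute__((noinline)) int64_t internal_func1(int64_t a) { return a; }
static __attribute__((noinline)) int64_t internal_func2(int64_t a) { return a; }

// Same branch structure as @func1: test bit 0, then bit 1, and otherwise
// return the argument without calling anything.
int64_t func1(int64_t a) {
  if (a & 1)
    return internal_func1(a); // tail call in block %if.then
  if (a & 2)
    return internal_func2(a); // tail call in block %if.then2
  return a;                   // path through %check2 with no call
}
```

Only the paths through `%if.then` and `%if.then2` contain calls, which is why shrink wrapping could confine any frame setup to those blocks; direct return removes the need for it there as well.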