This is an archive of the discontinued LLVM Phabricator instance.

Inliner Enhancement - Delayed Inlining
Needs Revision · Public

Authored by yinma on Mar 19 2015, 4:29 PM.

Details

Reviewers

chandlerc
eraman
Jiangning
hfinkel
javed.absar
apazos

Summary

This strategy can be enabled with -mllvm -enable-delayed-inline. It is disabled by default.

In order to solve the extern A, B, C problem, we implemented a subset of the deferred inlining concept, named delayed inlining. The algorithm is described below.

Given A->B->C as a simple example:

Starting with the processing of SCC_B: if C can be inlined into B, B is only marked with the delayed flag, and we return without doing any actual inlining.
In SCC_A, if the delayed B can be inlined into A, B is inlined into A and C is inlined into A, as in the normal inlining process. B is then checked: if it has the delayed flag, the inlining of C into B is pushed onto the queue to make sure C is also correctly inlined into B.
If the delayed B cannot be inlined into A, C is inlined into B, and the updated B is tested again for the final decision.
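
The three steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: the cost model (a body size compared against one threshold), the `ToyCosts` struct, and the `bottomUp` and `delayed` helpers are hypothetical and are not part of the patch or of LLVM's real inline cost analysis.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical toy cost model, not LLVM's InlineCost analysis.
struct ToyCosts {
  int bodyB;     // cost of B's body, excluding its call to C
  int bodyC;     // cost of C's body
  int threshold; // a callee is inlined when its cost is below this
};

constexpr int kCallPenalty = 25; // the cap quoted in the summary

// Plain bottom-up order: C is folded into B before A ever looks at B.
std::vector<std::string> bottomUp(const ToyCosts &t) {
  std::vector<std::string> log;
  int costB = t.bodyB + kCallPenalty;
  if (t.bodyC < t.threshold) {
    log.push_back("C->B");
    costB = t.bodyB + t.bodyC;
  }
  if (costB < t.threshold)
    log.push_back("B->A");
  return log;
}

// Delayed order, following the three steps of the A->B->C example.
std::vector<std::string> delayed(const ToyCosts &t) {
  std::vector<std::string> log;
  // Step 1: at SCC_B, only flag B as delayed; nothing is inlined yet.
  bool bDelayed = t.bodyC < t.threshold;
  if (!bDelayed) {
    if (t.bodyB + kCallPenalty < t.threshold)
      log.push_back("B->A");
    return log;
  }
  // B is judged with a capped estimate of C's contribution.
  int estimateB = t.bodyB + std::min(t.bodyC, kCallPenalty);
  if (estimateB < t.threshold) {
    // Step 2: inline B into A, then C into A, then make up C->B.
    log.push_back("B->A");
    log.push_back("C->A");
    log.push_back("C->B");
  } else {
    // Step 3: inline C into B and re-test the updated B. In this toy
    // model the re-test never succeeds, since the real cost is never
    // below the capped estimate.
    log.push_back("C->B");
    if (t.bodyB + t.bodyC < t.threshold)
      log.push_back("B->A");
  }
  return log;
}
```

With `ToyCosts{150, 100, 225}`, `bottomUp` inlines only C into B (B then grows to 250 and misses the threshold), while `delayed` inlines B into A first because B is judged at 150 + 25 = 175.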

The general algorithm description:

If all call sites in an SCC can be inlined, we check whether any function in the SCC will be visited again in later inlining steps. If so, all functions in the SCC are marked delayed, and we return without doing any actual inlining work.
In later SCC processing, if B can be inlined into A and B is a delayed function, we recursively make sure that all call sites inside B are pushed onto the queue and correctly inlined into B, and that B is correctly inlined into A.
If a delayed B cannot be inlined into A, we recursively inline all call sites inside B into B. The updated B is then tested again for the final decision.
To improve speed, we cache the inline cost computed for the body of a function F. For a call instruction to F, the cost is currently set to min(cached cost, call penalty (25)).
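
The cost caching in the last step can be sketched as follows. This is a minimal illustration, not the patch itself: the patch stores the value in a "CICost" string attribute on the function, whereas the `BodyCostCache` class and its map are hypothetical stand-ins. The value `kInstrCost = 5` is an assumption about the per-argument instruction cost.

```cpp
#include <algorithm>
#include <map>
#include <string>

constexpr int kCallPenalty = 25; // cap quoted in the summary
constexpr int kInstrCost = 5;    // assumed cost of one argument setup

// Hypothetical stand-in for the "CICost" function attribute.
class BodyCostCache {
  std::map<std::string, int> cached_;

public:
  // Record the body cost of an inlinable callee, capped at the call penalty.
  void record(const std::string &fn, int bodyCost) {
    cached_[fn] = std::min(bodyCost, kCallPenalty);
  }

  // Cost charged for a call instruction to fn inside some caller.
  int callCost(const std::string &fn, int numArgs) const {
    auto it = cached_.find(fn);
    if (it != cached_.end())
      return it->second; // cached, capped body cost
    // No cached cost: argument setup plus the call penalty.
    return numArgs * kInstrCost + kCallPenalty;
  }
};
```

After `record("C", 100)`, a call to C is charged 25 regardless of its argument count, and a callee with no cached cost and two arguments is charged 2 * 5 + 25 = 35.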

The current inliner works at the call-site level and defers inlining when the caller can be inlined into all of the caller's callers. This new algorithm works at the function level and delays inlining when all callees can be inlined into their callers. This makes sure all delayed functions can be correctly re-inlined later. Setting flags as function attributes assists the re-inlining process. The new solution also has makeup steps.

For performance, this strategy has only been tested with SPEC workloads on AArch64; for correctness, it passes many other benchmarks.
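
The function-level delay condition can be sketched as a small predicate. This is an illustrative sketch only: `ToyCallSite`, `ToyFunction`, and `shouldDelaySCC` are hypothetical names, and the two boolean inputs stand in for the patch's `shouldInline` and `toBeVisitedAgain` checks.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical summary of one call site inside the SCC.
struct ToyCallSite {
  bool inlinable; // would the inline policy accept this call site?
};

// Hypothetical summary of one function in the SCC.
struct ToyFunction {
  int pendingCallers; // direct calls to it that are still to be processed
};

// Delay the whole SCC only when every call site could be inlined and at
// least one function of the SCC will be visited again as a callee later.
bool shouldDelaySCC(const std::vector<ToyCallSite> &sites,
                    const std::vector<ToyFunction> &fns) {
  bool allInlinable =
      std::all_of(sites.begin(), sites.end(),
                  [](const ToyCallSite &cs) { return cs.inlinable; });
  if (!allInlinable)
    return false;
  return std::any_of(fns.begin(), fns.end(), [](const ToyFunction &f) {
    return f.pendingCallers > 0;
  });
}
```

When the predicate is false, the SCC is handled by the normal inlining loop, so a function with no remaining callers is never left in the delayed state.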

Diff Detail

Event Timeline

yinma updated this revision to Diff 22322. Mar 19 2015, 4:29 PM

yinma retitled this revision from to Inliner Enhancement - Delayed Inlining.

yinma updated this object.

yinma edited the test plan for this revision.

yinma added reviewers: Jiangning, rengolin, chandlerc, apazos, hfinkel, eraman.

Herald added a subscriber: aemerson. Mar 19 2015, 4:29 PM

I don't think using attributes is the right approach here. I think this should just be a change to the core inlining algorithm, and how it processes candidates.

There are also no tests or any experimental results?

Also, please upload patches with full context.

This revision now requires changes to proceed. Apr 14 2015, 6:28 AM

Hi Chandler, I agree that using attributes to save the cost is not the right approach, and I can work on fixing it. We tried this algorithm on SPEC for ARM/AArch64 and saw improved results. However, when we tried it with LTO, we saw massive regressions, largely due to overly aggressive inlining: this algorithm causes some functions that could not be inlined before to be inlined now, and some of the resulting functions can be huge.

In general, delayed inlining is a way of boosting the inlining threshold for certain types of functions, and it always boosts the threshold. Sometimes the boost may be too big. Do you have any ideas on how we could limit the boost in the delayed process?

rengolin removed a reviewer: rengolin. Jul 28 2015, 4:16 AM

apazos resigned from this revision. Sep 18 2018, 12:18 PM

Herald added a reviewer: javed.absar. Sep 18 2018, 12:18 PM

Herald added subscribers: haicheng, kristof.beyls.

Revision Contents

Path                                         Size
include/llvm/Analysis/InlineCost.h           6 lines
include/llvm/Bitcode/LLVMBitCodes.h          3 lines
include/llvm/IR/Attributes.h                 9 lines
include/llvm/IR/Function.h                   6 lines
include/llvm/Transforms/IPO/InlinerPass.h    4 lines
lib/Analysis/IPA/InlineCost.cpp              56 lines
lib/Bitcode/Reader/BitcodeReader.cpp         2 lines
lib/Bitcode/Writer/BitcodeWriter.cpp         2 lines
lib/IR/Attributes.cpp                        12 lines
lib/IR/Verifier.cpp                          3 lines
lib/Transforms/IPO/InlineAlways.cpp          1 line
lib/Transforms/IPO/InlineSimple.cpp          1 line
lib/Transforms/IPO/Inliner.cpp               191 lines

Diff 22322

include/llvm/Analysis/InlineCost.h

Context not available.
	class InlineCostAnalysis : public CallGraphSCCPass {	class InlineCostAnalysis : public CallGraphSCCPass {
	TargetTransformInfoWrapperPass *TTIWP;	TargetTransformInfoWrapperPass *TTIWP;
	AssumptionCacheTracker *ACT;	AssumptionCacheTracker *ACT;
		bool EnableDelayedInline;

	public:	public:
	static char ID;	static char ID;
Context not available.

	/// \brief Minimal filter to detect invalid constructs for inlining.	/// \brief Minimal filter to detect invalid constructs for inlining.
	bool isInlineViable(Function &Callee);	bool isInlineViable(Function &Callee);

		/// \brief Sets delayed inline mode which affects inline cost computation.
		void enableDelayedInline(bool Value) {
		EnableDelayedInline = Value;
		}
	};	};

	}	}
Context not available.

include/llvm/Bitcode/LLVMBitCodes.h

Context not available.
	ATTR_KIND_IN_ALLOCA = 38,	ATTR_KIND_IN_ALLOCA = 38,
	ATTR_KIND_NON_NULL = 39,	ATTR_KIND_NON_NULL = 39,
	ATTR_KIND_JUMP_TABLE = 40,	ATTR_KIND_JUMP_TABLE = 40,
	ATTR_KIND_DEREFERENCEABLE = 41	ATTR_KIND_DEREFERENCEABLE = 41,
		ATTR_KIND_DELAYED_INLINE = 42
	};	};

	enum ComdatSelectionKindCodes {	enum ComdatSelectionKindCodes {
Context not available.

include/llvm/IR/Attributes.h

Context not available.
	SanitizeMemory, ///< MemorySanitizer is on.	SanitizeMemory, ///< MemorySanitizer is on.
	UWTable, ///< Function must be in a unwind table	UWTable, ///< Function must be in a unwind table
	ZExt, ///< Zero extended before/after call	ZExt, ///< Zero extended before/after call
		// FIXME: how to create a hidden attribute for compiler use only.
		DelayedInline, ///< Function is in delayed inlining mode

	EndAttrKinds ///< Sentinal value useful for loops	EndAttrKinds ///< Sentinal value useful for loops
	};	};
Context not available.
	/// attribute list. Since attribute lists are immutable, this returns the new	/// attribute list. Since attribute lists are immutable, this returns the new
	/// list.	/// list.
	AttributeSet removeAttribute(LLVMContext &C, unsigned Index,	AttributeSet removeAttribute(LLVMContext &C, unsigned Index,
		StringRef Kind) const;

		/// \brief Remove the specified attribute at the specified index from this
		/// attribute list. Since attribute lists are immutable, this returns the new
		/// list.
		AttributeSet removeAttribute(LLVMContext &C, unsigned Index,
	Attribute::AttrKind Attr) const;	Attribute::AttrKind Attr) const;


	/// \brief Remove the specified attributes at the specified index from this	/// \brief Remove the specified attributes at the specified index from this
	/// attribute list. Since attribute lists are immutable, this returns the new	/// attribute list. Since attribute lists are immutable, this returns the new
	/// list.	/// list.
Context not available.

include/llvm/IR/Function.h

Context not available.
	getContext(), AttributeSet::FunctionIndex, N));	getContext(), AttributeSet::FunctionIndex, N));
	}	}

		/// @brief Remove function attributes from this function.
		void removeFnAttr(StringRef Kind) {
		setAttributes(AttributeSets.removeAttribute(
		getContext(), AttributeSet::FunctionIndex, Kind));
		}

	/// @brief Add function attributes to this function.	/// @brief Add function attributes to this function.
	void addFnAttr(StringRef Kind) {	void addFnAttr(StringRef Kind) {
	setAttributes(	setAttributes(
Context not available.

include/llvm/Transforms/IPO/InlinerPass.h

Context not available.
	/// deal with that subset of the functions.	/// deal with that subset of the functions.
	bool removeDeadFunctions(CallGraph &CG, bool AlwaysInlineOnly = false);	bool removeDeadFunctions(CallGraph &CG, bool AlwaysInlineOnly = false);

		protected:
		// Delayed inline mode affects how inline cost is computed.
		bool EnableDelayedInline;

	private:	private:
	// InlineThreshold - Cache the value here for easy access.	// InlineThreshold - Cache the value here for easy access.
	unsigned InlineThreshold;	unsigned InlineThreshold;
Context not available.

lib/Analysis/IPA/InlineCost.cpp

Context not available.
	#include "llvm/ADT/SmallPtrSet.h"	#include "llvm/ADT/SmallPtrSet.h"
	#include "llvm/ADT/SmallVector.h"	#include "llvm/ADT/SmallVector.h"
	#include "llvm/ADT/Statistic.h"	#include "llvm/ADT/Statistic.h"
		#include "llvm/ADT/StringExtras.h"
	#include "llvm/Analysis/AssumptionCache.h"	#include "llvm/Analysis/AssumptionCache.h"
	#include "llvm/Analysis/CodeMetrics.h"	#include "llvm/Analysis/CodeMetrics.h"
	#include "llvm/Analysis/ConstantFolding.h"	#include "llvm/Analysis/ConstantFolding.h"
Context not available.

	int Threshold;	int Threshold;
	int Cost;	int Cost;
		int BodyCost;

	bool IsCallerRecursive;	bool IsCallerRecursive;
	bool IsRecursiveCall;	bool IsRecursiveCall;
Context not available.
	CallAnalyzer(const TargetTransformInfo &TTI, AssumptionCacheTracker *ACT,	CallAnalyzer(const TargetTransformInfo &TTI, AssumptionCacheTracker *ACT,
	Function &Callee, int Threshold)	Function &Callee, int Threshold)
	: TTI(TTI), ACT(ACT), F(Callee), Threshold(Threshold), Cost(0),	: TTI(TTI), ACT(ACT), F(Callee), Threshold(Threshold), Cost(0),
	IsCallerRecursive(false), IsRecursiveCall(false),	BodyCost(0), IsCallerRecursive(false), IsRecursiveCall(false),
	ExposesReturnsTwice(false), HasDynamicAlloca(false),	ExposesReturnsTwice(false), HasDynamicAlloca(false),
	ContainsNoDuplicateCall(false), HasReturn(false), HasIndirectBr(false),	ContainsNoDuplicateCall(false), HasReturn(false), HasIndirectBr(false),
	AllocatedSize(0), NumInstructions(0), NumVectorInstructions(0),	AllocatedSize(0), NumInstructions(0), NumVectorInstructions(0),
Context not available.

	int getThreshold() { return Threshold; }	int getThreshold() { return Threshold; }
	int getCost() { return Cost; }	int getCost() { return Cost; }
		int getBodyCost() { return BodyCost; }

	// Keep a bunch of stats about the cost savings found so we can print them	// Keep a bunch of stats about the cost savings found so we can print them
	// out when debugging.	// out when debugging.
Context not available.
	}	}

	if (TTI.isLoweredToCall(F)) {	if (TTI.isLoweredToCall(F)) {
	// We account for the average 1 instruction per call argument setup
	// here.
	Cost += CS.arg_size() * InlineConstants::InstrCost;

	// Everything other than inline ASM will also have a significant cost
	// merely from making the call.
	if (!isa<InlineAsm>(CS.getCalledValue()))
	Cost += InlineConstants::CallPenalty;
		// Check if there is cached body cost.
		if (F->hasFnAttribute("CICost")) {
		Attribute Attr = F->getFnAttribute("CICost");
		int BodyCost = 0;
		if (!Attr.getValueAsString().getAsInteger(10, BodyCost))
		Cost += BodyCost;
		else
		Cost += InlineConstants::CallPenalty;

		} else {
		// We account for the average 1 instruction per call argument setup
		// here.
		Cost += CS.arg_size() * InlineConstants::InstrCost;

		// Everything other than inline ASM will also have a significant cost
		// merely from making the call.
		if (!isa<InlineAsm>(CS.getCalledValue()))
		Cost += InlineConstants::CallPenalty;
		}
	}	}

	return Base::visitCallSite(CS);	return Base::visitCallSite(CS);
Context not available.
	// the callee, so this is purely duplicate work).	// the callee, so this is purely duplicate work).
	SmallPtrSet<const Value *, 32> EphValues;	SmallPtrSet<const Value *, 32> EphValues;
	CodeMetrics::collectEphemeralValues(&F, &ACT->getAssumptionCache(F), EphValues);	CodeMetrics::collectEphemeralValues(&F, &ACT->getAssumptionCache(F), EphValues);
		BodyCost = Cost;

	// The worklist of live basic blocks in the callee after inlining. We avoid	// The worklist of live basic blocks in the callee after inlining. We avoid
	// adding basic blocks of the callee which can be proven to be dead for this	// adding basic blocks of the callee which can be proven to be dead for this
Context not available.
	return false;	return false;

	Threshold += VectorBonus;	Threshold += VectorBonus;
		BodyCost = Cost - BodyCost;

	return Cost < Threshold;	return Cost < Threshold;
	}	}
Context not available.

	char InlineCostAnalysis::ID = 0;	char InlineCostAnalysis::ID = 0;

	InlineCostAnalysis::InlineCostAnalysis() : CallGraphSCCPass(ID) {}	InlineCostAnalysis::InlineCostAnalysis() : CallGraphSCCPass(ID), EnableDelayedInline(false) {}

	InlineCostAnalysis::~InlineCostAnalysis() {}	InlineCostAnalysis::~InlineCostAnalysis() {}

Context not available.
	// Check if there was a reason to force inlining or no inlining.	// Check if there was a reason to force inlining or no inlining.
	if (!ShouldInline && CA.getCost() < CA.getThreshold())	if (!ShouldInline && CA.getCost() < CA.getThreshold())
	return InlineCost::getNever();	return InlineCost::getNever();

		if (ShouldInline && EnableDelayedInline) {
		// For every Cx, we record its body cost to make a better estimate on
		// the real cost penalty of Cx when inlining B to A in delayed inlining
		// mode. For normal process.
		Function *F = CS.getCalledFunction();
		int BodyCost = CA.getBodyCost();

		// If we set body cost to the real cost, it will make delayed inlining
		// useless for A->B->C problem (body cost can make B too big).
		// Because no matter what order, the cost of B will be one
		// after C inlined to B. B will not be inlined.
		// So we must decrease the cost to be a lower one to boost the local
		// threshold larger than the one set.
		if (BodyCost > InlineConstants::CallPenalty)
		BodyCost = InlineConstants::CallPenalty;
		std::string CostStr = itostr(BodyCost); // own the string; a StringRef to the temporary would dangle
		F->addFnAttr("CICost", CostStr);
		}

	if (ShouldInline && CA.getCost() >= CA.getThreshold())	if (ShouldInline && CA.getCost() >= CA.getThreshold())
	return InlineCost::getAlways();	return InlineCost::getAlways();

Context not available.

lib/Bitcode/Reader/BitcodeReader.cpp

Context not available.
	return Attribute::UWTable;	return Attribute::UWTable;
	case bitc::ATTR_KIND_Z_EXT:	case bitc::ATTR_KIND_Z_EXT:
	return Attribute::ZExt;	return Attribute::ZExt;
		case bitc::ATTR_KIND_DELAYED_INLINE:
		return Attribute::DelayedInline;
	}	}
	}	}

Context not available.

lib/Bitcode/Writer/BitcodeWriter.cpp

Context not available.
	return bitc::ATTR_KIND_UW_TABLE;	return bitc::ATTR_KIND_UW_TABLE;
	case Attribute::ZExt:	case Attribute::ZExt:
	return bitc::ATTR_KIND_Z_EXT;	return bitc::ATTR_KIND_Z_EXT;
		case Attribute::DelayedInline:
		return bitc::ATTR_KIND_DELAYED_INLINE;
	case Attribute::EndAttrKinds:	case Attribute::EndAttrKinds:
	llvm_unreachable("Can not encode end-attribute kinds marker.");	llvm_unreachable("Can not encode end-attribute kinds marker.");
	case Attribute::None:	case Attribute::None:
Context not available.

lib/IR/Attributes.cpp

Context not available.
	return "zeroext";	return "zeroext";
	if (hasAttribute(Attribute::Cold))	if (hasAttribute(Attribute::Cold))
	return "cold";	return "cold";
		if (hasAttribute(DelayedInline))
		return "delayed_inline";

	// FIXME: These should be output like this:	// FIXME: These should be output like this:
	//	//
Context not available.
	case Attribute::InAlloca: return 1ULL << 43;	case Attribute::InAlloca: return 1ULL << 43;
	case Attribute::NonNull: return 1ULL << 44;	case Attribute::NonNull: return 1ULL << 44;
	case Attribute::JumpTable: return 1ULL << 45;	case Attribute::JumpTable: return 1ULL << 45;
		case Attribute::DelayedInline: return 1ULL << 46;
	case Attribute::Dereferenceable:	case Attribute::Dereferenceable:
	llvm_unreachable("dereferenceable attribute not supported in raw format");	llvm_unreachable("dereferenceable attribute not supported in raw format");
	}	}
Context not available.
	return removeAttributes(C, Index, AttributeSet::get(C, Index, Attr));	return removeAttributes(C, Index, AttributeSet::get(C, Index, Attr));
	}	}

		AttributeSet AttributeSet::removeAttribute(LLVMContext &C, unsigned Index,
		StringRef Kind) const {
		if (!hasAttribute(Index, Kind))
		return *this;
		llvm::AttrBuilder B;
		B.addAttribute(Kind);
		return removeAttributes(C, Index, AttributeSet::get(C, Index, B));
		}

	AttributeSet AttributeSet::removeAttributes(LLVMContext &C, unsigned Index,	AttributeSet AttributeSet::removeAttributes(LLVMContext &C, unsigned Index,
	AttributeSet Attrs) const {	AttributeSet Attrs) const {
	if (!pImpl) return AttributeSet();	if (!pImpl) return AttributeSet();
Context not available.

lib/IR/Verifier.cpp

Context not available.
	I->getKindAsEnum() == Attribute::NoBuiltin ||	I->getKindAsEnum() == Attribute::NoBuiltin ||
	I->getKindAsEnum() == Attribute::Cold ||	I->getKindAsEnum() == Attribute::Cold ||
	I->getKindAsEnum() == Attribute::OptimizeNone ||	I->getKindAsEnum() == Attribute::OptimizeNone ||
	I->getKindAsEnum() == Attribute::JumpTable) {	I->getKindAsEnum() == Attribute::JumpTable ||
		I->getKindAsEnum() == Attribute::DelayedInline) {
	if (!isFunction) {	if (!isFunction) {
	CheckFailed("Attribute '" + I->getAsString() +	CheckFailed("Attribute '" + I->getAsString() +
	"' only applies to functions!", V);	"' only applies to functions!", V);
Context not available.

lib/Transforms/IPO/InlineAlways.cpp

Context not available.

	bool AlwaysInliner::runOnSCC(CallGraphSCC &SCC) {	bool AlwaysInliner::runOnSCC(CallGraphSCC &SCC) {
	ICA = &getAnalysis<InlineCostAnalysis>();	ICA = &getAnalysis<InlineCostAnalysis>();
		ICA->enableDelayedInline(EnableDelayedInline);
	return Inliner::runOnSCC(SCC);	return Inliner::runOnSCC(SCC);
	}	}

Context not available.

lib/Transforms/IPO/InlineSimple.cpp

Context not available.

	bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {	bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {
	ICA = &getAnalysis<InlineCostAnalysis>();	ICA = &getAnalysis<InlineCostAnalysis>();
		ICA->enableDelayedInline(EnableDelayedInline);
	return Inliner::runOnSCC(SCC);	return Inliner::runOnSCC(SCC);
	}	}

Context not available.

lib/Transforms/IPO/Inliner.cpp

Context not available.
	ColdThreshold("inlinecold-threshold", cl::Hidden, cl::init(225),	ColdThreshold("inlinecold-threshold", cl::Hidden, cl::init(225),
	cl::desc("Threshold for inlining functions with cold attribute"));	cl::desc("Threshold for inlining functions with cold attribute"));

		cl::opt<bool>
		EnableDelayedInlineOpt("enable-delayed-inline", cl::Hidden,
		cl::init(false), cl::ZeroOrMore);

	// Threshold to use when optsize is specified (and there is no -inline-limit).	// Threshold to use when optsize is specified (and there is no -inline-limit).
	const int OptSizeThreshold = 75;	const int OptSizeThreshold = 75;

Context not available.
	return false;	return false;
	}	}

		static bool toBeVisitedAgain(Function *F) {
		if (F->use_empty())
		return false;

		// Check if any use of F is a call instruction of specific type.
		for (User *U : F->users()) {
		if (CallInst *I = dyn_cast<CallInst>(U)) {
		CallSite CS(I);
		Function *Callee = CS.getCalledFunction();
		// We must ensure this call will be processed again.
		if (Callee && !Callee->isDeclaration() && Callee == F)
		return true;
		}
		}

		return false;
		}

		static unsigned
		insertFunctionCallSites(Function *F,
		SmallVectorImpl<std::pair<CallSite, int>>& CallSites,
		unsigned CSi) {
		unsigned Count = 0;

		for (auto &BB : *F)
		for (auto &Inst : BB) {
		auto *I = &Inst;
		// If this isn't a call, or it is a call to an intrinsic, it can
		// never be inlined.
		if (!isa<CallInst>(I) || isa<IntrinsicInst>(I))
		continue;

		CallSite CS(I);
		// If this is a direct call to an external function, we can never inline
		// it. If it is an indirect call, inlining may resolve it to be a
		// direct call, so we keep it.
		if (!CS.getCalledFunction() || CS.getCalledFunction()->isDeclaration())
		continue;

		if (CS.getCalledFunction() == F)
		continue;

		CallSites.insert(CallSites.begin() + CSi, std::make_pair(CS, -1));
		DEBUG(dbgs() << " -> Add delayed call site: "
		<< *CS.getInstruction() << "\n");
		++Count;
		}

		return Count;
		}

	bool Inliner::runOnSCC(CallGraphSCC &SCC) {	bool Inliner::runOnSCC(CallGraphSCC &SCC) {
	CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();	CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
	AssumptionCacheTracker *ACT = &getAnalysis<AssumptionCacheTracker>();	AssumptionCacheTracker *ACT = &getAnalysis<AssumptionCacheTracker>();
Context not available.
	const TargetLibraryInfo *TLI = TLIP ? &TLIP->getTLI() : nullptr;	const TargetLibraryInfo *TLI = TLIP ? &TLIP->getTLI() : nullptr;
	AliasAnalysis *AA = &getAnalysis<AliasAnalysis>();	AliasAnalysis *AA = &getAnalysis<AliasAnalysis>();

		EnableDelayedInline = EnableDelayedInlineOpt;

	SmallPtrSet<Function*, 8> SCCFunctions;	SmallPtrSet<Function*, 8> SCCFunctions;
	DEBUG(dbgs() << "Inliner visiting SCC:");	DEBUG(dbgs() << "Inliner visiting SCC:");
	for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) {	for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) {
Context not available.
	InlinedArrayAllocasTy InlinedArrayAllocas;	InlinedArrayAllocasTy InlinedArrayAllocas;
	InlineFunctionInfo InlineInfo(&CG, AA, ACT);	InlineFunctionInfo InlineInfo(&CG, AA, ACT);

		// For solving A->B->Cx problem, we check if all calls (Cx) in function B
		// will be inlined. If yes, we don't inline any of them, we wait for the time
		// when testing B->A by putting B into delayed mode. The estimated cost for
		// inlined Cx will be recorded.
		//
		// If B is inlined into A with estimated cost of inlined Cx, all Cx will be
		// revisited and inlined, no extra action to do. If B is not inlined into A
		// with estimated cost of inlined Cx, all Cx will be inlined into B and
		// re-test if can inline B->A again.
		if (EnableDelayedInline) {
		bool ShouldInlineAll = true;
		DEBUG(dbgs() << " BEGIN - Evaluating Delay inlining action: \n");
		for (unsigned CSi = 0, E = CallSites.size(); CSi != E; ++CSi) {
		CallSite CS = CallSites[CSi].first;

		Function *Callee = CS.getCalledFunction();

		if (!isInstructionTriviallyDead(CS.getInstruction(), TLI)) {
		// We can only inline direct calls to non-declarations.
		if (!Callee || Callee->isDeclaration())
		continue;

		// Check if the policy determines that we should inline this function.
		DEBUG(dbgs() << " EVAL: ");
		if (!shouldInline(CS)) {
		ShouldInlineAll = false;
		break;
		}
		}
		}

		if (ShouldInlineAll) {
		// Set delayed mode for all functions in this SCC only if there is
		// function A that calls B, in the A->B->C case.
		bool ToBeVisitedAgain = false;
		for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) {
		Function *F = (*I)->getFunction();
		if (!F)
		continue;

		if (toBeVisitedAgain(F)) {
		ToBeVisitedAgain = true;
		break;
		}
		}

		if (ToBeVisitedAgain) {
		for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E;
		++I) {
		Function *F = (*I)->getFunction();
		if (!F)
		continue;

		// Marking B as delayed.
		F->addFnAttr(Attribute::DelayedInline);
		DEBUG(dbgs() << " Delay inlining flag set for all calls in: "
		<< F->getName() << "\n");
		}
		// Do not try inlining anymore at this point.
		return false;
		}
		}
		DEBUG(dbgs() << " END - Evaluating Delay inlining action. \n");
		}

	// Now that we have all of the call sites, loop over them and inline them if	// Now that we have all of the call sites, loop over them and inline them if
	// it looks profitable to do so.	// it looks profitable to do so.
	bool Changed = false;	bool Changed = false;
	bool LocalChange;	bool LocalChange;
		unsigned DelayedCount = 0;
	do {	do {
	LocalChange = false;	LocalChange = false;
	// Iterate over the outer loop because inlining functions can cause indirect	// Iterate over the outer loop because inlining functions can cause indirect
Context not available.
	DEBUG(dbgs() << " -> Deleting dead call: "	DEBUG(dbgs() << " -> Deleting dead call: "
	<< *CS.getInstruction() << "\n");	<< *CS.getInstruction() << "\n");
	// Update the call graph by deleting the edge from Callee to Caller.	// Update the call graph by deleting the edge from Callee to Caller.
		if (Callee->hasFnAttribute(Attribute::DelayedInline))
		Callee->removeFnAttr(Attribute::DelayedInline);
	CG[Caller]->removeCallEdgeFor(CS);	CG[Caller]->removeCallEdgeFor(CS);
	CS.getInstruction()->eraseFromParent();	CS.getInstruction()->eraseFromParent();
	++NumCallsDeleted;	++NumCallsDeleted;
Context not available.
	// Get DebugLoc to report. CS will be invalid after Inliner.	// Get DebugLoc to report. CS will be invalid after Inliner.
	DebugLoc DLoc = CS.getInstruction()->getDebugLoc();	DebugLoc DLoc = CS.getInstruction()->getDebugLoc();

		bool InDelayedMakeupProcess = false;
		bool IsCalleeDelayedFunc = false;
		if (DelayedCount > 0) {
		DEBUG(dbgs() << "D#" << DelayedCount << " ");
		--DelayedCount;
		InDelayedMakeupProcess = true;
		if (Callee->hasFnAttribute(Attribute::DelayedInline))
		IsCalleeDelayedFunc = true;
		}

	// If the policy determines that we should inline this function,	// If the policy determines that we should inline this function,
	// try to do so.	// try to do so.
	if (!shouldInline(CS)) {	if (!shouldInline(CS)) {
		if (Callee->hasFnAttribute(Attribute::DelayedInline)) {
		// We need to add delayed call sites back to make sure they
		// are inlined correctly.
		// FIXME: Note in the A->B->C->D case,
		// we are visiting callee B of A. We insert C call sites
		// before B. This will cause inline B<-C first, while
		// the bottom up mode inlines C<-D and then B<-C.
		DelayedCount += insertFunctionCallSites(Callee, CallSites, CSi);

		// And re-test after all delayed call sites are inlined.
		Callee->removeFnAttr(Attribute::DelayedInline);
		++DelayedCount;
		--CSi;
		continue;
		}
	emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,	emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
	Twine(Callee->getName() +	Twine(Callee->getName() +
	" will not be inlined into " +	" will not be inlined into " +
	Caller->getName()));	Caller->getName()));
		if (InDelayedMakeupProcess) {
		CallSites.erase(CallSites.begin() + CSi);
		--CSi;
		}
	continue;	continue;
	}	}

Context not available.
	}	}
	++NumInlined;	++NumInlined;

		if (Callee->hasFnAttribute(Attribute::DelayedInline) &&
		!toBeVisitedAgain(Callee) &&
		!(Callee->use_empty() && Callee->hasLocalLinkage() &&
		!SCCFunctions.count(Callee) &&
		CG[Callee]->getNumReferences() == 0)) {
		// B was inlined into A above. If B is marked as delayed and it is
		// an extern function and it is the last time visiting B, so we need
		// to make sure B has C inlined.
		DelayedCount += insertFunctionCallSites(Callee, CallSites, CSi + 1);
		// This callee will be inlined. It is safe to
		// remove delayedinline flag now.
		Callee->removeFnAttr(Attribute::DelayedInline);
		}

	// Report the inline decision.	// Report the inline decision.
	emitOptimizationRemark(	emitOptimizationRemark(
	CallerCtx, DEBUG_TYPE, *Caller, DLoc,	CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Context not available.
	for (unsigned i = 0, e = InlineInfo.InlinedCalls.size();	for (unsigned i = 0, e = InlineInfo.InlinedCalls.size();
	i != e; ++i) {	i != e; ++i) {
	Value *Ptr = InlineInfo.InlinedCalls[i];	Value *Ptr = InlineInfo.InlinedCalls[i];
	CallSites.push_back(std::make_pair(CallSite(Ptr), NewHistoryID));
		CallSite NewCS(Ptr);
		if (InDelayedMakeupProcess) {
		// Only consider inlined funcs if the callee is delayed function.
		// We have to put them before the next candidate because they
		// originally should be inlined already.
		if (IsCalleeDelayedFunc && NewCS.getCalledFunction() &&
		!NewCS.getCalledFunction()->isDeclaration()) {
		CallSites.insert(CallSites.begin() + CSi + 1,
		std::make_pair(NewCS, NewHistoryID));
		++DelayedCount;
		}
		} else
		CallSites.push_back(std::make_pair(NewCS, NewHistoryID));
	}	}
	}	}
	}	}
Context not available.
	// swap/pop_back for efficiency, but do not use it if doing so would	// swap/pop_back for efficiency, but do not use it if doing so would
	// move a call site to a function in this SCC before the	// move a call site to a function in this SCC before the
	// 'FirstCallInSCC' barrier.	// 'FirstCallInSCC' barrier.
	if (SCC.isSingular()) {	if (SCC.isSingular() && DelayedCount == 0) {
	CallSites[CSi] = CallSites.back();	CallSites[CSi] = CallSites.back();
	CallSites.pop_back();	CallSites.pop_back();
	} else {	} else {
Context not available.
	if (!F || F->isDeclaration())	if (!F || F->isDeclaration())
	continue;	continue;

		// Remove inline internal attributes
		if (F->hasFnAttribute(Attribute::DelayedInline))
		F->removeFnAttr(Attribute::DelayedInline);
		if (F->hasFnAttribute("CICost"))
		F->removeFnAttr("CICost");

	// Handle the case when this function is called and we only want to care	// Handle the case when this function is called and we only want to care
	// about always-inline functions. This is a bit of a hack to share code	// about always-inline functions. This is a bit of a hack to share code
	// between here and the InlineAlways pass.	// between here and the InlineAlways pass.
Context not available.