This is an archive of the discontinued LLVM Phabricator instance.

[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on splitting an OR condition.
AbandonedPublic

Authored by junbuml on Oct 6 2017, 12:52 PM.

Details

Reviewers

chandlerc
eraman
mcrosier

Summary

If a call site is dominated by an OR condition and if any of its arguments are predicated on this OR condition, see if splitting the condition (and thereby further constraining the arguments) increases our opportunities to inline the call.

For example, in the code below, if callee() is not inlinable, we try to split the call site since we can predicate the argument (ptr) based on the OR condition. Inline if any of the new call sites is inlinable.

Split the OR condition from :

char *ptr = foo();
bool cond = bar();
if (!ptr || cond)
  callee(ptr);

to:

if (!ptr)
  callee(null)  // pass null because ptr is known constant null
else if (cond)
  callee(nonnull ptr)   // set the nonnull attribute on the ptr argument

The split is performed only if the inline cost of either callee(null) or callee(nonnull ptr) is below the inline threshold.
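The split can be sanity-checked at the source level. What follows is a minimal standalone C++ sketch, not LLVM code and not part of the patch; `callee`, `original`, `split`, `trace`, and `checkEquivalent` are hypothetical names introduced here. It only illustrates that both forms reach the callee under the same conditions, while the split form gives each call site a more constrained argument.

```cpp
#include <cassert>
#include <vector>

// Hypothetical callee: records whether it was reached with a null pointer.
std::vector<bool> calls;  // true = called with null
void callee(const char *ptr) { calls.push_back(ptr == nullptr); }

// Original form: one call site guarded by an OR condition.
void original(const char *ptr, bool cond) {
  if (!ptr || cond)
    callee(ptr);
}

// Split form: two call sites, each with a more constrained argument.
void split(const char *ptr, bool cond) {
  if (!ptr)
    callee(nullptr);  // ptr is known to be null on this path
  else if (cond)
    callee(ptr);      // ptr is known to be non-null on this path
}

typedef void (*Form)(const char *, bool);

// Runs one form and returns the sequence of calls it made.
std::vector<bool> trace(Form f, const char *ptr, bool cond) {
  calls.clear();
  f(ptr, cond);
  return calls;
}

// Compares both forms on all four input combinations.
void checkEquivalent() {
  const char *ptrs[] = {nullptr, "x"};
  const bool conds[] = {false, true};
  for (const char *ptr : ptrs)
    for (bool cond : conds)
      assert(trace(original, ptr, cond) == trace(split, ptr, cond));
}
```

In the split form the first call site passes a literal null and the second is only reached with a non-null pointer, which is the information the patch hands to the inline cost analysis.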

This is WIP and needs more test cases, but I'm submitting it to get early high-level feedback on the approach.

I measured a 20% performance improvement in spec2017/gcc, with no regressions in my spec2000/2006/2017 tests on aarch64.

Diff Detail

Event Timeline

junbuml created this revision. Oct 6 2017, 12:52 PM

Herald added subscribers: kristof.beyls, aemerson. Oct 6 2017, 12:52 PM

mcrosier retitled this revision from "[Inline][WIP] Try to inline if predicated on OR condition" to "[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on an splitting an OR condition." Oct 6 2017, 2:48 PM

mcrosier edited the summary of this revision.

mcrosier retitled this revision from "[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on an splitting an OR condition." to "[Inline][WIP] Expose more inlining opportunities by further constraining call site arguments based on splitting an OR condition." Oct 6 2017, 2:50 PM

Simplified the code a little bit and updated the comments. Please let me know if you have any comments.

This is good stuff, but I don't feel the inliner is the right place for such a transformation.

Narrowly speaking, it is a call-site splitting transformation, but in general it could be enhanced to handle more general block cloning/splitting to enable more constant/predicate propagation (similar to jump threading). IPA-CP-based function cloning could also benefit from this.

Yes, this change itself is call-site splitting, whose profitability is tightly tied to the inline cost. That's why I placed it in the inliner. Making this more general for blocks might be good, but I'm not sure about the profitability check in general or the range of blocks we would have to cover. When isolating this to just call-site splitting, I wasn't able to find a better place than the inliner. I will be happy to hear any suggestions.

If we limit this to call-site splitting only, do you still think the inliner is not a good place for it?
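To make the tie between splitting and inline cost concrete, here is a small standalone C++ sketch of the selection step as it appears in tryToInlineIfPredicatedOnOrCondition in this diff. It is a simplified model, not the patch itself: `Pick`, `kNotInlinable`, and `pickSplitCandidate` are hypothetical names, and the sketch ignores the patch's extra requirement that the call be the first instruction of its block.

```cpp
#include <cassert>
#include <climits>

// Which of the two split call sites should be handed to the inliner.
enum class Pick { None, Taken, Untaken };

// Mirrors the patch's use of INT_MAX as "no inlinable cost was computed".
const int kNotInlinable = INT_MAX;

// Assumed model of the decision: split only if at least one candidate is
// inlinable, then prefer the cheaper one (ties go to the taken path, as in
// the patch's `CostOfTaken > CostOfUntaken` comparison).
Pick pickSplitCandidate(int costOfTaken, int costOfUntaken) {
  if (costOfTaken == kNotInlinable && costOfUntaken == kNotInlinable)
    return Pick::None;
  return costOfTaken > costOfUntaken ? Pick::Untaken : Pick::Taken;
}
```

Because the decision depends only on the two inline costs, a standalone pass would need some other profitability signal, which is the concern raised in this reply.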

I see a lot of potential to make this more general. As I mentioned, this is similar to constant-propagation-based function cloning: exposing specialization opportunities does not seem limited to the inliner, though inlining could be the biggest customer.

Consider this:

define void @foo(i32) local_unnamed_addr #0 {
  %2 = icmp eq i32 %0, 10
  %3 = select i1 %2, i32 1, i32 2
  tail call void @bar(i32 %3) #2
  ret void
}

Converting the select into control flow and exposing the constant-propagation opportunity should be done in the same pass.
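A source-level analogue of this select example, written as a standalone C++ sketch rather than as LLVM code. The names `fooSelect` and `fooBranch` are hypothetical, and `bar` returns a value here (unlike the void @bar in the IR) only so that the two forms can be compared.

```cpp
#include <cassert>

// Hypothetical stand-in for @bar; returns a value only for comparison.
int bar(int v) { return v * 100; }

// Select form: a single call site whose argument is not a constant.
int fooSelect(int x) {
  int arg = (x == 10) ? 1 : 2;  // corresponds to the select instruction
  return bar(arg);
}

// Control-flow form: two call sites, each with a constant argument that
// constant propagation, cloning, or inlining can specialize on.
int fooBranch(int x) {
  if (x == 10)
    return bar(1);
  return bar(2);
}
```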

Consider another example:

define void @foo(i32) local_unnamed_addr #0 {
  %2 = icmp eq i32 %0, 10
  br i1 %2, label %3, label %4

; <label>:3:                                      ; preds = %1
  tail call void @bar(i32 1) #2
  br label %4

; <label>:4:                                      ; preds = %1, %3
  %5 = phi i32 [ 1, %3 ], [ 2, %1 ]
  tail call void @bar(i32 %5) #2
  ret void
}

Hoisting the 'bar' call into the incoming blocks of the phi can also expose an opportunity.
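The phi example can be mirrored at the source level in the same way. This is a standalone C++ sketch under the assumption that `bar` simply records its argument; `fooMerged`, `fooHoisted`, `run`, and `seen` are hypothetical names, not part of LLVM or the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for @bar: records each argument it receives.
std::vector<int> seen;
void bar(int v) { seen.push_back(v); }

// Merged form: the second call sits at the merge point and takes a value
// that depends on which path was followed (the phi in the IR).
void fooMerged(int x) {
  if (x == 10)
    bar(1);
  int v = (x == 10) ? 1 : 2;  // plays the role of the phi
  bar(v);
}

// Hoisted form: the merge-point call is duplicated into each incoming
// path, so every call site now has a constant argument.
void fooHoisted(int x) {
  if (x == 10) {
    bar(1);
    bar(1);  // hoisted copy, argument known to be 1
  } else {
    bar(2);  // hoisted copy, argument known to be 2
  }
}

// Runs one form and returns the arguments bar received, in order.
std::vector<int> run(void (*f)(int), int x) {
  seen.clear();
  f(x);
  return seen;
}
```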

Note that the SimplifyCFG pass in LLVM currently aggressively sinks common code into the merge point, which may lead to missed opportunities here. Chandler has a patch to undo that and reduce the damage done by the sinking, but that pass runs pretty late in the pipeline and won't help for inlining/cloning purposes.

Submitted https://reviews.llvm.org/D39137 to add a new pass for call-site splitting.

Revision Contents

Path	Size
lib/Transforms/IPO/Inliner.cpp	343 lines
test/Transforms/Inline/inline-predicated-or.ll	67 lines

Diff 118704

lib/Transforms/IPO/Inliner.cpp

... (26 lines not shown) ...
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"		#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;		using namespace llvm;
		using namespace PatternMatch;

#define DEBUG_TYPE "inline"		#define DEBUG_TYPE "inline"

STATISTIC(NumInlined, "Number of functions inlined");		STATISTIC(NumInlined, "Number of functions inlined");
STATISTIC(NumCallsDeleted, "Number of call sites deleted, not inlined");		STATISTIC(NumCallsDeleted, "Number of call sites deleted, not inlined");
STATISTIC(NumDeleted, "Number of functions deleted because all callers found");		STATISTIC(NumDeleted, "Number of functions deleted because all callers found");
STATISTIC(NumMergedAllocas, "Number of allocas merged together");		STATISTIC(NumMergedAllocas, "Number of allocas merged together");

... (281 lines not shown) ...	if (callerWillBeRemoved && !Caller->hasOneUse())
TotalSecondaryCost -= InlineConstants::LastCallToStaticBonus;		TotalSecondaryCost -= InlineConstants::LastCallToStaticBonus;

if (inliningPreventsSomeOuterInline && TotalSecondaryCost < IC.getCost())		if (inliningPreventsSomeOuterInline && TotalSecondaryCost < IC.getCost())
return true;		return true;

return false;		return false;
}		}

		static void addNonNullAttribute(Instruction *CallI, Instruction *&NewCallI,
		Value *Op, Constant *ConstValue) {
		if (!NewCallI) {
		NewCallI = CallI->clone();
		NewCallI->insertAfter(CallI);
		}
		CallSite CS(NewCallI);
		unsigned ArgNo = 0;
		for (CallSite::arg_iterator I = CS.arg_begin(), E = CS.arg_end(); I != E;
		++I, ++ArgNo)
		if (*I == Op)
		CS.addParamAttr(ArgNo, Attribute::NonNull);
		}

		static void setConstantInArgument(Instruction *CallI, Instruction *&NewCallI,
		Value *Op, Constant *ConstValue) {
		if (!NewCallI) {
		NewCallI = CallI->clone();
		NewCallI->insertAfter(CallI);
		}
		CallSite CS(NewCallI);
		unsigned ArgNo = 0;
		for (CallSite::arg_iterator I = CS.arg_begin(), E = CS.arg_end(); I != E;
		++I, ++ArgNo)
		if (*I == Op)
		CS.setArgument(ArgNo, ConstValue);
		}

		static bool createCallSitesWithConstrainedArgument(
		Instruction *Instr, Instruction *&TopTakenCI, Instruction *&TopUntakenCI,
		SmallVectorImpl<BranchInst *> &BranchInsts, BasicBlock *TopBB) {
		assert(BranchInsts.size() <= 2 &&
		"Unexpected number of blocks in the OR predicated condition");
		BasicBlock *CallSiteBB = Instr->getParent();
		TerminatorInst *TopTI = TopBB->getTerminator();
		bool IsCSInTakenPath = CallSiteBB == TopTI->getSuccessor(0);

		for (unsigned I = 0, E = BranchInsts.size(); I != E; ++I) {
		BranchInst *PBI = BranchInsts[I];
		assert(PBI->isConditional());
		ICmpInst *Cmp = cast<ICmpInst>(PBI->getCondition());
		Value *Op0 = Cmp->getOperand(0);
		Constant *Op1 = cast<Constant>(Cmp->getOperand(1));
		CmpInst::Predicate Pred = Cmp->getPredicate();

		if (PBI->getParent() == TopBB) {
		Instruction *&CallTakenFromTop = IsCSInTakenPath ? TopTakenCI : TopUntakenCI;
		Instruction *&CallUntakenFromTop = IsCSInTakenPath ? TopUntakenCI : TopTakenCI;

		assert((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
		"Unexpected predicate in an OR condition");

		// Set the constant value for the call in the taken path from the top
		// block.
		Instruction *&CallTaken = Pred == ICmpInst::ICMP_EQ ? CallTakenFromTop : CallUntakenFromTop;
		setConstantInArgument(Instr, CallTaken, Op0, Op1);

		// Add the NonNull attribute if compared with the null pointer for the
		// call in the untaken path from the top block.
		if (Op1->getType()->isPointerTy() && Op1->isNullValue()) {
		Instruction *&CallUntaken = Pred == ICmpInst::ICMP_EQ ? CallUntakenFromTop : CallTakenFromTop;
		addNonNullAttribute(Instr, CallUntaken, Op0, Op1);
		}

		} else {
		Instruction *&CallUntaken = TopUntakenCI;
		if (Pred == ICmpInst::ICMP_EQ) {
		if (PBI->getSuccessor(0) == Instr->getParent()) {
		// Set the constant value for the call in the untaken path from the
		// top block.
		setConstantInArgument(Instr, CallUntaken, Op0, Op1);
		} else {
		// Add the NonNull attribute if compared with the null pointer for the
		// call in the untaken path from the top block.
		if (Op1->getType()->isPointerTy() && Op1->isNullValue())
		addNonNullAttribute(Instr, CallUntaken, Op0, Op1);
		}

		} else {
		if (PBI->getSuccessor(0) == Instr->getParent()) {
		// Add the NonNull attribute if compared with the null pointer for the
		// call in the untaken path from the top block.
		if (Op1->getType()->isPointerTy() && Op1->isNullValue())
		addNonNullAttribute(Instr, CallUntaken, Op0, Op1);
		} else if (Pred == ICmpInst::ICMP_NE) {
		// Set the constant value for the call in the untaken path from the
		// top block.
		setConstantInArgument(Instr, CallUntaken, Op0, Op1);
		} else
		llvm_unreachable("Unexpected condition");
		}
		}
		}
		return TopTakenCI || TopUntakenCI;
		}

		static bool splitOrConds(CallGraph &CG, CallSite CS, BasicBlock *TopBB,
		Instruction *CallTaken, Instruction *CallUntaken) {
		assert((CallTaken || CallUntaken) && "Expect at least one new call site");
		Instruction *Instr = CS.getInstruction();
		Function *Caller = CS.getCaller();
		Function *Callee = CS.getCalledFunction();

		BasicBlock *CallSiteBB = Instr->getParent();
		pred_iterator PII = pred_begin(CallSiteBB);
		BasicBlock *Pred1 = *PII++;
		BasicBlock *Pred2 = *PII;

		BasicBlock *NextCond;
		if (TopBB == Pred1)
		NextCond = Pred2;
		else if (TopBB == Pred2)
		NextCond = Pred1;
		else
		llvm_unreachable("Unexpected OR condition");

		BasicBlock *TakenBlock =
		SplitBlockPredecessors(CallSiteBB, TopBB, ".taken.split");
		BasicBlock *UntakenBlock =
		SplitBlockPredecessors(CallSiteBB, NextCond, ".untaken.split");
		if (!TakenBlock || !UntakenBlock)
		return false;

		if (!CallTaken) {
		CallTaken = Instr->clone();
		CallTaken->insertBefore(&*TakenBlock->getFirstInsertionPt());
		} else
		CallTaken->moveBefore(&*TakenBlock->getFirstInsertionPt());

		if (!CallUntaken) {
		CallUntaken = Instr->clone();
		CallUntaken->insertBefore(&*UntakenBlock->getFirstInsertionPt());
		} else
		CallUntaken->moveBefore(&*UntakenBlock->getFirstInsertionPt());

		CallSite CSTaken(CallTaken);
		CallSite CSUntaken(CallUntaken);

		CG[Caller]->addCalledFunction(CSTaken, CG[Callee]);
		CG[Caller]->addCalledFunction(CSUntaken, CG[Callee]);
		CG[Caller]->removeCallEdgeFor(CS);

		// Replace users of the original call with a PHI merging the split call sites.
		if (Instr->getNumUses()) {
		PHINode *PN = PHINode::Create(Instr->getType(), 2, "call.phi", Instr);
		PN->addIncoming(CallTaken, TakenBlock);
		PN->addIncoming(CallUntaken, UntakenBlock);
		Instr->replaceAllUsesWith(PN);
		}
		Instr->eraseFromParent();
		return true;
		}

		static bool isCondRelevantToAnyCallArgument(ICmpInst *Cmp, CallSite CS) {
		assert(isa<Constant>(Cmp->getOperand(1)) && "Expected a constant operand.");
		Value *Op0 = Cmp->getOperand(0);
		unsigned ArgNo = 0;
		for (CallSite::arg_iterator I = CS.arg_begin(), E = CS.arg_end(); I != E;
		++I, ++ArgNo) {
		// Don't consider arguments that are already known non-null.
		if (CS.paramHasAttr(ArgNo, Attribute::NonNull))
		continue;

		if (*I == Op0)
		return true;
		}
		return false;
		}

		static void findOrCondRelevantToCallArgument(
		CallSite CS, BasicBlock *PredBB, BasicBlock *OtherPredBB,
		SmallVectorImpl<BranchInst *> &BranchInsts, BasicBlock *&TopBB) {
		auto *PBI = dyn_cast<BranchInst>(PredBB->getTerminator());
		if (!PBI || !PBI->isConditional())
		return;

		if (OtherPredBB)
		if (PBI->getSuccessor(0) == OtherPredBB ||
		PBI->getSuccessor(1) == OtherPredBB)
		if (PredBB == OtherPredBB->getSinglePredecessor()) {
		assert(TopBB == nullptr && "Expect to find only a single top block");
		TopBB = PredBB;
		}

		CmpInst::Predicate Pred;
		Value *Cond = PBI->getCondition();
		if (match(Cond, m_ICmp(Pred, m_Value(), m_Constant()))) {
		ICmpInst *Cmp = cast<ICmpInst>(Cond);
		if (isCondRelevantToAnyCallArgument(Cmp, CS))
		if (Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE)
		BranchInsts.push_back(PBI);
		}
		}

		// Return true if an argument in CS is predicated on an 'or' condition.
		static bool
		isPredicatedOnOrCondition(CallSite CS,
		SmallVectorImpl<BranchInst *> &BranchInsts,
		BasicBlock *&TopBB) {
		BasicBlock *ParentBB = CS.getInstruction()->getParent();

		// Multiple predecessors that amount to an 'or' condition.
		pred_iterator PII = pred_begin(ParentBB);
		pred_iterator PIE = pred_end(ParentBB);
		unsigned NumPreds = std::distance(PII, PIE);
		if (NumPreds != 2)
		return false;

		BasicBlock *Preds[2] = {*PII++, *PII};
		findOrCondRelevantToCallArgument(CS, Preds[0], Preds[1], BranchInsts, TopBB);
		findOrCondRelevantToCallArgument(CS, Preds[1], Preds[0], BranchInsts, TopBB);
		return !BranchInsts.empty() && TopBB != nullptr;
		}

/// Return the cost only if the inliner should attempt to inline at the given		/// Return the cost only if the inliner should attempt to inline at the given
/// CallSite. If we return the cost, we will emit an optimisation remark later		/// CallSite. If we return the cost, we will emit an optimisation remark later
/// using that cost, so we won't do so from this function.		/// using that cost, so we won't do so from this function.
static Optional<InlineCost>		static Optional<InlineCost>
shouldInline(CallSite CS, function_ref<InlineCost(CallSite CS)> GetInlineCost,		shouldInline(CallSite CS, function_ref<InlineCost(CallSite CS)> GetInlineCost,
OptimizationRemarkEmitter &ORE) {		OptimizationRemarkEmitter &ORE) {
using namespace ore;		using namespace ore;
InlineCost IC = GetInlineCost(CS);		InlineCost IC = GetInlineCost(CS);
... (46 lines not shown) ...	shouldInline(CallSite CS, function_ref<InlineCost(CallSite CS)> GetInlineCost,
}		}

DEBUG(dbgs() << " Inlining: cost=" << IC.getCost()		DEBUG(dbgs() << " Inlining: cost=" << IC.getCost()
<< ", thres=" << IC.getThreshold()		<< ", thres=" << IC.getThreshold()
<< ", Call: " << *CS.getInstruction() << '\n');		<< ", Call: " << *CS.getInstruction() << '\n');
return IC;		return IC;
}		}

		// If a call site is dominated by an OR condition and if any of its arguments
		// are predicated on this OR condition, see if splitting the condition (and
		// thereby further constraining the arguments) increases our opportunities to
		// inline the call.
		//
		// For example, in the code below, if callee() is not inlinable, we try to
		// split the call site since we can predicate the argument (ptr) based on the OR
		// condition. Inline if any of the new call sites is inlinable.
		//
		// Split from :
		// if (!ptr || c)
		// callee(ptr);
		// to :
		// if (!ptr)
		// callee(null) // set the known constant value
		// else if (c)
		// callee(nonnull ptr) // set non-null attribute in the argument
		//
		// This is done only if the inline cost for either callee(null) or
		// callee(nonnull %ptr) is less than the threshold.
		static Instruction *tryToInlineIfPredicatedOnOrCondition(
		CallGraph &CG, CallSite CS, int &Cost, int &Threshold,
		function_ref<InlineCost(CallSite CS)> GetInlineCost,
		OptimizationRemarkEmitter &ORE) {
		if (!CS.arg_size())
		return nullptr;

		SmallVector<BranchInst *, 4> BranchInsts;
		BasicBlock *TopBB = nullptr;
		if (!isPredicatedOnOrCondition(CS, BranchInsts, TopBB))
		return nullptr;

		Instruction *Instr = CS.getInstruction();
		Instruction *CallTaken = nullptr;
		Instruction *CallUntaken = nullptr;

		// Based on the OR predicated condition, temporarily create call sites with
		// the NonNull attribute or constant value in arguments.
		if (!createCallSitesWithConstrainedArgument(Instr, CallTaken, CallUntaken,
		BranchInsts, TopBB))
		return nullptr;

		int CostOfTaken = INT_MAX;
		int CostOfUntaken = INT_MAX;
		int ThresholdOfTaken = INT_MIN;
		int ThresholdOfUntaken = INT_MIN;

		if (CallTaken) {
		CallSite CSTaken(CallTaken);
		Optional<InlineCost> OICTaken = shouldInline(CSTaken, GetInlineCost, ORE);
		if (OICTaken) {
		CostOfTaken = OICTaken->getCost();
		ThresholdOfTaken = OICTaken->getThreshold();
		}
		}

		if (CallUntaken) {
		CallSite CSUntaken(CallUntaken);
		Optional<InlineCost> OICUntaken =
		shouldInline(CSUntaken, GetInlineCost, ORE);
		if (OICUntaken) {
		CostOfUntaken = OICUntaken->getCost();
		ThresholdOfUntaken = OICUntaken->getThreshold();
		}
		}

		// See if any new call site created above has become inlinable.
		if (CostOfTaken != INT_MAX || CostOfUntaken != INT_MAX) {
		// Allow splitting the OR condition only when the call instruction is the
		// first instruction of its block. Based on this constraint, we clone
		// only the call instruction, and also we do not add any extra conditional
		// branches.
		if (Instr != (&*Instr->getParent()->begin()) ||
		!splitOrConds(CG, CS, TopBB, CallTaken, CallUntaken)) {
		if (CallTaken)
		CallTaken->eraseFromParent();
		if (CallUntaken)
		CallUntaken->eraseFromParent();
		return nullptr;
		}

		if (CostOfTaken > CostOfUntaken) {
		Cost = CostOfUntaken;
		Threshold = ThresholdOfUntaken;
		return CallUntaken;
		} else {
		Cost = CostOfTaken;
		Threshold = ThresholdOfTaken;
		return CallTaken;
		}
		} else {
		if (CallTaken)
		CallTaken->eraseFromParent();
		if (CallUntaken)
		CallUntaken->eraseFromParent();
		}
		return nullptr;
		}

/// Return true if the specified inline history ID		/// Return true if the specified inline history ID
/// indicates an inline history that includes the specified function.		/// indicates an inline history that includes the specified function.
static bool InlineHistoryIncludes(		static bool InlineHistoryIncludes(
Function *F, int InlineHistoryID,		Function *F, int InlineHistoryID,
const SmallVectorImpl<std::pair<Function *, int>> &InlineHistory) {		const SmallVectorImpl<std::pair<Function *, int>> &InlineHistory) {
while (InlineHistoryID != -1) {		while (InlineHistoryID != -1) {
assert(unsigned(InlineHistoryID) < InlineHistory.size() &&		assert(unsigned(InlineHistoryID) < InlineHistory.size() &&
"Invalid inline history ID");		"Invalid inline history ID");
... (130 lines not shown) ...	for (unsigned CSi = 0; CSi != CallSites.size(); ++CSi) {
}		}

// FIXME for new PM: because of the old PM we currently generate ORE and		// FIXME for new PM: because of the old PM we currently generate ORE and
// in turn BFI on demand. With the new PM, the ORE dependency should		// in turn BFI on demand. With the new PM, the ORE dependency should
// just become a regular analysis dependency.		// just become a regular analysis dependency.
OptimizationRemarkEmitter ORE(Caller);		OptimizationRemarkEmitter ORE(Caller);

Optional<InlineCost> OIC = shouldInline(CS, GetInlineCost, ORE);		Optional<InlineCost> OIC = shouldInline(CS, GetInlineCost, ORE);
		bool isAlways = false;
		int Cost, Threshold;

// If the policy determines that we should inline this function,		// If the policy determines that we should inline this function,
// delete the call instead.		// delete the call instead.
if (!OIC)		if (!OIC) {
		if (Instruction *InlinableInst = tryToInlineIfPredicatedOnOrCondition(
		CG, CS, Cost, Threshold, GetInlineCost, ORE)) {
		CallSite InlinableCS(InlinableInst);
		CS = InlinableCS;
		} else
continue;		continue;
		} else {
		isAlways = OIC->isAlways();
		if (OIC->isVariable()) {
		Cost = OIC->getCost();
		Threshold = OIC->getThreshold();
		}
		}

// If this call site is dead and it is to a readonly function, we should		// If this call site is dead and it is to a readonly function, we should
// just delete the call instead of trying to inline it, regardless of		// just delete the call instead of trying to inline it, regardless of
// size. This happens because IPSCCP propagates the result out of the		// size. This happens because IPSCCP propagates the result out of the
// call and then we're left with the dead call.		// call and then we're left with the dead call.
if (IsTriviallyDead) {		if (IsTriviallyDead) {
DEBUG(dbgs() << " -> Deleting dead call: " << *Instr << "\n");		DEBUG(dbgs() << " -> Deleting dead call: " << *Instr << "\n");
// Update the call graph by deleting the edge from Callee to Caller.		// Update the call graph by deleting the edge from Callee to Caller.
... (13 lines not shown) ...	for (unsigned CSi = 0; CSi != CallSites.size(); ++CSi) {
ORE.emit(		ORE.emit(
OptimizationRemarkMissed(DEBUG_TYPE, "NotInlined", DLoc, Block)		OptimizationRemarkMissed(DEBUG_TYPE, "NotInlined", DLoc, Block)
<< NV("Callee", Callee) << " will not be inlined into "		<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", Caller));		<< NV("Caller", Caller));
continue;		continue;
}		}
++NumInlined;		++NumInlined;

if (OIC->isAlways())		if (isAlways)
ORE.emit(OptimizationRemark(DEBUG_TYPE, "AlwaysInline", DLoc, Block)		ORE.emit(OptimizationRemark(DEBUG_TYPE, "AlwaysInline", DLoc, Block)
<< NV("Callee", Callee) << " inlined into "		<< NV("Callee", Callee) << " inlined into "
<< NV("Caller", Caller) << " with cost=always");		<< NV("Caller", Caller) << " with cost=always");
else		else
ORE.emit(OptimizationRemark(DEBUG_TYPE, "Inlined", DLoc, Block)		ORE.emit(OptimizationRemark(DEBUG_TYPE, "Inlined", DLoc, Block)
<< NV("Callee", Callee) << " inlined into "		<< NV("Callee", Callee) << " inlined into "
<< NV("Caller", Caller)		<< NV("Caller", Caller) << " with cost=" << NV("Cost", Cost)
<< " with cost=" << NV("Cost", OIC->getCost())		<< " (threshold=" << NV("Threshold", Threshold) << ")");
<< " (threshold=" << NV("Threshold", OIC->getThreshold())
<< ")");

// If inlining this function gave us any new call sites, throw them		// If inlining this function gave us any new call sites, throw them
// onto our worklist to process. They are useful inline candidates.		// onto our worklist to process. They are useful inline candidates.
if (!InlineInfo.InlinedCalls.empty()) {		if (!InlineInfo.InlinedCalls.empty()) {
// Create a new inline history entry for this, so that we remember		// Create a new inline history entry for this, so that we remember
// that these new callsites came about due to inlining Callee.		// that these new callsites came about due to inlining Callee.
int NewHistoryID = InlineHistory.size();		int NewHistoryID = InlineHistory.size();
InlineHistory.push_back(std::make_pair(Callee, InlineHistoryID));		InlineHistory.push_back(std::make_pair(Callee, InlineHistoryID));
... (466 lines not shown) ...

test/Transforms/Inline/inline-predicated-or.ll

This file was added.

				; RUN: opt < %s -inline -instcombine -jump-threading -S | FileCheck %s

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-linaro-linux-gnueabi"

				%struct.bitmap = type { i32, %struct.bitmap* }

				;CHECK-LABEL: @caller
				;CHECK-LABEL: NextCond:
				;CHECK: br {{.*}} label %callee.exit
				;CHECK-LABEL: CallSiteBB.taken.split:
				;CHECK: call void @callee(%struct.bitmap* null, %struct.bitmap* null, %struct.bitmap* %b_elt)
				;CHECK-LABEL: callee.exit:
				;CHECK: call void @dummy2(%struct.bitmap* %a_elt)

				define void @caller(i1 %c, %struct.bitmap* %a_elt, %struct.bitmap* %b_elt) {
				entry:
				br label %Top

				Top:
				%tobool1 = icmp eq %struct.bitmap* %a_elt, null
				br i1 %tobool1, label %CallSiteBB, label %NextCond

				NextCond:
				%cmp = icmp ne %struct.bitmap* %b_elt, null
				br i1 %cmp, label %CallSiteBB, label %End

				CallSiteBB:
				call void @callee(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %b_elt)
				br label %End

				End:
				ret void
				}

				define void @callee(%struct.bitmap* %dst_elt, %struct.bitmap* %a_elt, %struct.bitmap* %b_elt) {
				entry:
				%tobool = icmp ne %struct.bitmap* %a_elt, null
				%tobool1 = icmp ne %struct.bitmap* %b_elt, null
				%or.cond = and i1 %tobool, %tobool1
				br i1 %or.cond, label %Cond, label %Big

				Cond:
				%cmp = icmp eq %struct.bitmap* %dst_elt, %a_elt
				br i1 %cmp, label %Small, label %Big

				Small:
				call void @dummy2(%struct.bitmap* %a_elt)
				br label %End

				Big:
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				call void @dummy1(%struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt, %struct.bitmap* %a_elt)
				br label %End

				End:
				ret void
				}

				declare void @dummy2(%struct.bitmap*)
				declare void @dummy1(%struct.bitmap*, %struct.bitmap*, %struct.bitmap*, %struct.bitmap*, %struct.bitmap*, %struct.bitmap*)