This is an archive of the discontinued LLVM Phabricator instance.

Differential D79138

[Inlining] Teach shouldBeDeferred to take the total cost into account
Closed · Public

Authored by kazu on Apr 29 2020, 3:45 PM.


Details

Reviewers

davidxl

Commits

rGe8984fe65b94: [Inlining] Teach shouldBeDeferred to take the total cost into account

Summary

This patch teaches shouldBeDeferred to take into account the total
cost of inlining.

Suppose we have a call hierarchy {A1,A2,A3,...}->B->C. (Each of A1,
A2, A3, ... calls B, which in turn calls C.)

Without this patch, shouldBeDeferred essentially returns true if

TotalSecondaryCost < IC.getCost()

where TotalSecondaryCost is the total cost of inlining B into the As.
This means that if B is a small wrapper function, for example, it
gets inlined into all of the As, and C in turn gets inlined into all
of the As. In other words, shouldBeDeferred ignores the cost of
inlining C into each of the As.
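To make the old rule concrete, here is a minimal C++ sketch that reduces it to its two inputs. The function name `oldShouldBeDeferred` and the vector of per-A costs are hypothetical, and the sketch leaves out the per-call-site filtering and the LastCallToStaticBonus adjustment that the real code applies. Assuming IC.getCost() is 100 and B is a tiny wrapper whose inlining costs 5 per caller, the inline of C into B is deferred for up to 19 As, even though C then ends up copied into every one of them.

```cpp
#include <numeric>
#include <vector>

// Hypothetical reduction of the pre-patch rule. PrimaryCost stands for
// IC.getCost() (inlining C into B); each element of CostsOfBIntoEachA is the
// cost of inlining B into one caller A.
bool oldShouldBeDeferred(const std::vector<int> &CostsOfBIntoEachA,
                         int PrimaryCost) {
  // TotalSecondaryCost: the summed cost of inlining B into all of the As.
  int TotalSecondaryCost =
      std::accumulate(CostsOfBIntoEachA.begin(), CostsOfBIntoEachA.end(), 0);
  // PrimaryCost is counted once, no matter how many As would later receive C.
  return TotalSecondaryCost < PrimaryCost;
}
```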

This patch replaces the expression above with:

TotalCost < Allowance

where

TotalCost is TotalSecondaryCost + IC.getCost() * (the number of As), and
Allowance is IC.getCost() * Scale

For now, Scale defaults to 2, which essentially means that
shouldBeDeferred can return true only when there is a single A.
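The new comparison can be sketched the same way. Again `newShouldBeDeferred` and its parameters are hypothetical, and the number of As is taken to be the number of vector entries (in the patch it is NumCallerUsers, the count of call sites whose outer inline would be prevented). Rearranging TotalSecondaryCost + IC.getCost() * N < IC.getCost() * 2 gives N < 2 - TotalSecondaryCost / IC.getCost(), so with a positive IC.getCost() and a non-negative TotalSecondaryCost only N = 1 qualifies, which is the limit described above.

```cpp
#include <numeric>
#include <vector>

// Hypothetical sketch of the post-patch rule: the cost of inlining C is paid
// once per A, so it is scaled by the number of As and compared against an
// allowance of PrimaryCost * Scale.
bool newShouldBeDeferred(const std::vector<int> &CostsOfBIntoEachA,
                         int PrimaryCost, int Scale) {
  int TotalSecondaryCost =
      std::accumulate(CostsOfBIntoEachA.begin(), CostsOfBIntoEachA.end(), 0);
  int NumAs = static_cast<int>(CostsOfBIntoEachA.size());
  int TotalCost = TotalSecondaryCost + PrimaryCost * NumAs;
  int Allowance = PrimaryCost * Scale;
  return TotalCost < Allowance;
}
```

With IC.getCost() = 100 and Scale = 2, a single A with a B-into-A cost of 5 gives 105 < 200 (deferred), while two such As give 210, which is not below the allowance.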

With this patch, a Clang PGO bootstrap produces .text* sections that
are 0.33% smaller. Compiling the 10 largest preprocessed files of
Clang with the PGO-bootstrapped clang takes:

69.677 seconds on average over five runs without the patch, and
68.939 seconds on average over five runs with the patch (about 1.06% less time).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kazu created this revision. Apr 29 2020, 3:45 PM

Herald added a project: Restricted Project. Apr 29 2020, 3:45 PM

Herald added subscribers: hiraditya, eraman.

Harbormaster completed remote builds in B55215: Diff 261073. Apr 29 2020, 4:46 PM

davidxl added inline comments. Apr 30 2020, 2:26 PM

llvm/lib/Transforms/IPO/Inliner.cpp, line 348:
NumCallerUsers may be more explicit.

I've renamed SecondaryUsers to NumCallerUsers.

kazu marked an inline comment as done. Apr 30 2020, 4:25 PM

Harbormaster completed remote builds in B55386: Diff 261386. Apr 30 2020, 5:11 PM

This is conceptually a very good change. I think it should be split into two stages. The first is to make the changes but with default settings that keep it NFC. The second stage is to retune the parameter based on benchmark numbers (e.g., SPEC performance should not regress, and the code size impact should be checked).

One way to do the first stage is to skip the NumCaller adjustment if the deferral scale is a special value (such as -1).

I've updated the patch to turn off the new cost calculation by default.
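The sentinel approach can be illustrated with a small sketch: a negative scale skips the per-caller term and falls back to the original comparison, so a default of -1 leaves every decision unchanged. The names below are hypothetical, and the grid check only spot-tests the NFC property on a handful of values; it says nothing about the real inliner beyond this reduced formula.

```cpp
// Hypothetical sketch: a negative Scale is a sentinel that restores the
// pre-patch comparison; a non-negative Scale enables the total-cost rule.
bool shouldBeDeferredWithSentinel(int TotalSecondaryCost, int PrimaryCost,
                                  int NumCallerUsers, int Scale) {
  if (Scale < 0) // Sentinel such as -1: original behavior.
    return TotalSecondaryCost < PrimaryCost;
  int TotalCost = TotalSecondaryCost + PrimaryCost * NumCallerUsers;
  return TotalCost < PrimaryCost * Scale;
}

// Spot-checks that Scale = -1 agrees with the pre-patch comparison for a
// small grid of inputs, independent of NumCallerUsers.
bool sentinelIsNFC() {
  for (int Secondary = -50; Secondary <= 200; Secondary += 25)
    for (int Primary = 0; Primary <= 150; Primary += 30)
      for (int Users = 1; Users <= 4; ++Users)
        if (shouldBeDeferredWithSentinel(Secondary, Primary, Users, -1) !=
            (Secondary < Primary))
          return false;
  return true;
}
```

Tuning then becomes a matter of choosing a non-negative scale from benchmark data, without touching the code path that the default exercises.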

Harbormaster completed remote builds in B55699: Diff 261927. May 4 2020, 3:05 PM

lgtm

This revision is now accepted and ready to land. May 5 2020, 10:42 AM

Closed by commit rGe8984fe65b94: [Inlining] Teach shouldBeDeferred to take the total cost into account (authored by kazu). May 5 2020, 11:20 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                   Size
llvm/lib/Transforms/IPO/Inliner.cpp    27 lines

Diff 261927

llvm/lib/Transforms/IPO/Inliner.cpp

@@ … 87 lines not shown … @@
 /// prior to LLVM's code generator having support for stack coloring based on
 /// lifetime markers. It is now in the process of being removed. To experiment
 /// with disabling it and relying fully on lifetime marker based stack
 /// coloring, you can pass this flag to LLVM.
 static cl::opt<bool>
     DisableInlinedAllocaMerging("disable-inlined-alloca-merging",
                                 cl::init(false), cl::Hidden);
 
+// An integer used to limit the cost of inline deferral. The default negative
+// number tells shouldBeDeferred to only take the secondary cost into account.
+static cl::opt<int>
+    InlineDeferralScale("inline-deferral-scale",
+                        cl::desc("Scale to limit the cost of inline deferral"),
+                        cl::init(-1), cl::Hidden);
+
 namespace {
 
 enum class InlinerFunctionImportStatsOpts {
   No = 0,
   Basic = 1,
   Verbose = 2,
 };
 
@@ … 229 lines not shown … @@ shouldBeDeferred(Function *Caller, InlineCost IC, int &TotalSecondaryCost,
   TotalSecondaryCost = 0;
   // The candidate cost to be imposed upon the current function.
   int CandidateCost = IC.getCost() - 1;
   // If the caller has local linkage and can be inlined to all its callers, we
   // can apply a huge negative bonus to TotalSecondaryCost.
   bool ApplyLastCallBonus = Caller->hasLocalLinkage() && !Caller->hasOneUse();
   // This bool tracks what happens if we DO inline C into B.
   bool InliningPreventsSomeOuterInline = false;
+  unsigned NumCallerUsers = 0;
   for (User *U : Caller->users()) {
-    // If the caller will not be removed (either because it does not have a
-    // local linkage or because the LastCallToStaticBonus has been already
-    // applied), then we can exit the loop early.
-    if (!ApplyLastCallBonus && TotalSecondaryCost >= IC.getCost())
-      return false;
     CallBase *CS2 = dyn_cast<CallBase>(U);
 
     // If this isn't a call to Caller (it could be some other sort
     // of reference) skip it. Such references will prevent the caller
     // from being removed.
     if (!CS2 || CS2->getCalledFunction() != Caller) {
       ApplyLastCallBonus = false;
       continue;
@@ … 9 lines not shown … @@ if (IC2.isAlways())
       continue;
 
     // See if inlining of the original callsite would erase the cost delta of
     // this callsite. We subtract off the penalty for the call instruction,
     // which we would be deleting.
     if (IC2.getCostDelta() <= CandidateCost) {
       InliningPreventsSomeOuterInline = true;
       TotalSecondaryCost += IC2.getCost();
+      NumCallerUsers++;
     }
   }
 
+  if (!InliningPreventsSomeOuterInline)
+    return false;
+
   // If all outer calls to Caller would get inlined, the cost for the last
   // one is set very low by getInlineCost, in anticipation that Caller will
   // be removed entirely. We did not account for this above unless there
   // is only one caller of Caller.
   if (ApplyLastCallBonus)
     TotalSecondaryCost -= InlineConstants::LastCallToStaticBonus;
 
-  return InliningPreventsSomeOuterInline && TotalSecondaryCost < IC.getCost();
+  // If InlineDeferralScale is negative, then ignore the cost of primary
+  // inlining -- IC.getCost() multiplied by the number of callers to Caller.
+  if (InlineDeferralScale < 0)
+    return TotalSecondaryCost < IC.getCost();
+
+  int TotalCost = TotalSecondaryCost + IC.getCost() * NumCallerUsers;
+  int Allowance = IC.getCost() * InlineDeferralScale;
+  return TotalCost < Allowance;
 }
 
 static std::basic_ostream<char> &operator<<(std::basic_ostream<char> &R,
                                             const ore::NV &Arg) {
   return R << Arg.Val;
 }
 
 template <class RemarkT>
@@ … 861 lines not shown … @@
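To experiment with the patched decision outside LLVM, the following sketch replays the new loop over mock call sites. `MockUse` and `shouldBeDeferredSketch` are hypothetical names, not LLVM APIs. The nine lines the diff view hides (where the real code obtains IC2 for each call site) are not reproduced; instead, the fields of `MockUse` carry precomputed stand-ins for IC2.isAlways(), IC2.getCost() and IC2.getCostDelta(). The last-call bonus is passed in as a parameter rather than read from InlineConstants::LastCallToStaticBonus. As in the patch, the loop has no early exit, so every user of Caller is visited and NumCallerUsers is complete.

```cpp
#include <vector>

// Hypothetical stand-in for one user of Caller; not an LLVM type.
struct MockUse {
  bool IsCallToCaller; // false models a non-call reference to Caller
  bool IsAlwaysInline; // stands in for IC2.isAlways()
  int Cost;            // stands in for IC2.getCost()
  int CostDelta;       // stands in for IC2.getCostDelta()
};

// Sketch of the patched shouldBeDeferred logic over mock call sites.
bool shouldBeDeferredSketch(bool CallerHasLocalLinkage, int PrimaryCost,
                            const std::vector<MockUse> &Uses, int Scale,
                            int LastCallBonus) {
  int TotalSecondaryCost = 0;
  int CandidateCost = PrimaryCost - 1;
  // Mirrors Caller->hasLocalLinkage() && !Caller->hasOneUse().
  bool ApplyLastCallBonus = CallerHasLocalLinkage && Uses.size() != 1;
  bool InliningPreventsSomeOuterInline = false;
  int NumCallerUsers = 0;
  for (const MockUse &U : Uses) {
    if (!U.IsCallToCaller) {
      ApplyLastCallBonus = false; // Caller cannot be removed entirely.
      continue;
    }
    if (U.IsAlwaysInline)
      continue;
    // Inlining C into B would use up this call site's remaining budget.
    if (U.CostDelta <= CandidateCost) {
      InliningPreventsSomeOuterInline = true;
      TotalSecondaryCost += U.Cost;
      ++NumCallerUsers;
    }
  }
  if (!InliningPreventsSomeOuterInline)
    return false;
  if (ApplyLastCallBonus)
    TotalSecondaryCost -= LastCallBonus;
  if (Scale < 0) // Default: pre-patch comparison.
    return TotalSecondaryCost < PrimaryCost;
  int TotalCost = TotalSecondaryCost + PrimaryCost * NumCallerUsers;
  return TotalCost < PrimaryCost * Scale;
}
```

With IC.getCost() = 100 and three call sites of cost 5 and cost delta 0, a scale of -1 defers (15 < 100) while a scale of 2 does not (315 is not below 200). If Caller has local linkage and no other references, a large assumed bonus (15000 in this sketch's checks) drives the total negative and deferral wins again; a single non-call reference cancels that bonus.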