This is an archive of the discontinued LLVM Phabricator instance.

Differential D134376

[ModuleInliner] Add a cost-benefit-based priority
Closed, Public

Authored by kazu on Sep 21 2022, 10:50 AM.

Details

Reviewers

taolq

Commits

rG4e9dd21015f2: [ModuleInliner] Add a cost-benefit-based priority

Summary

This patch teaches the module inliner a traversal order designed for
the instrumentation FDO (+ThinLTO) scenario.

The new traversal order prioritizes call sites in the following order:

Those call sites that are expected to reduce the caller size

Those call sites that have gone through the cost-benefit analysis

The remaining call sites
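
As a rough illustration of the three tiers listed above, here is a minimal, self-contained sketch. It is not the code in this patch: `CallSiteSummary`, `Tier`, `classify`, and `inEarlierTier` are hypothetical names, and the real implementation reads its inputs from `llvm::InlineCost`. The sketch assumes that `Cost` already has the static bonus subtracted, so the bonus is added back before testing whether the caller is expected to shrink, and that the threshold defaults to 0.

```cpp
#include <cassert>

// Hypothetical, simplified view of one call site. The field names mirror the
// patch, but this struct is not part of LLVM.
struct CallSiteSummary {
  int Cost;               // inline cost with static bonuses already subtracted
  int StaticBonusApplied; // total bonus that was subtracted from Cost
  bool HasCostBenefit;    // whether cost-benefit analysis produced a result
};

// Lower enumerator value means the call site is visited earlier.
enum class Tier { ReducesCallerSize = 0, CostBenefitAnalyzed = 1, Remaining = 2 };

// Assign a call site to one of the three tiers from the summary.
Tier classify(const CallSiteSummary &S, int TopPriorityThreshold = 0) {
  assert(S.StaticBonusApplied >= 0 && "sketch assumes non-negative bonuses");
  // Add the bonus back to estimate whether the caller itself shrinks.
  if (S.Cost + S.StaticBonusApplied < TopPriorityThreshold)
    return Tier::ReducesCallerSize;
  if (S.HasCostBenefit)
    return Tier::CostBenefitAnalyzed;
  return Tier::Remaining;
}

// True if A belongs to a strictly earlier tier than B.
bool inEarlierTier(const CallSiteSummary &A, const CallSiteSummary &B,
                   int TopPriorityThreshold = 0) {
  return classify(A, TopPriorityThreshold) < classify(B, TopPriorityThreshold);
}
```

Ties within a tier are not handled here; the patch breaks them by cost or by benefit-to-cost ratio, depending on the tier.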

With this fairly simple traversal order, a large internal benchmark
yields performance comparable to the bottom-up inliner -- both in
terms of the execution performance and .text* sizes.

Big thanks go to Liqiang Tao for the module inliner infrastructure.

I still have hacks outside this patch to prevent excessively long
compilation or .text* size explosion. I'm trying to come up with
acceptable solutions in the near future.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	90 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::lljit-initialize-deinitialize.ll
	200 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::priority-static-initializer.S
	210 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S

Event Timeline

kazu created this revision. Sep 21 2022, 10:50 AM

Herald added a project: Restricted Project. Sep 21 2022, 10:50 AM

Herald added a subscriber: hiraditya.

kazu requested review of this revision. Sep 21 2022, 10:50 AM

Herald added a project: Restricted Project. Sep 21 2022, 10:50 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B188006: Diff 461947. Sep 21 2022, 10:51 AM

kazu mentioned this in D134373: [Analysis] Introduce getStaticBonusApplied (NFC). Sep 21 2022, 10:56 AM

Use getStaticBonusApplied.

kazu retitled this revision from [ModuleInliner] Add a cost-benefit-based priority (WORK-IN-PROGRESS) to [ModuleInliner] Add a cost-benefit-based priority. Sep 25 2022, 11:54 PM

kazu edited the summary of this revision.

Herald added a subscriber: wenlei. Sep 25 2022, 11:54 PM

kazu added a reviewer: taolq. Sep 25 2022, 11:54 PM

Harbormaster completed remote builds in B188645: Diff 462822. Sep 26 2022, 1:54 AM

Please take a look. Thanks!

a large internal benchmark yields performance comparable to the bottom-up inliner -- both in terms of the execution performance and .text* sizes.

This looks promising. The comparison is between enable-module-inliner on vs off, right? Do you plan to tune and open up module inliner for sample PGO and non-PGO cases where cost-benefit analysis isn't available yet?

llvm/lib/Analysis/InlineOrder.cpp
34	perhaps just name it "cost-benefit"? "ratio" can be confusing.

In D134376#3818545, @wenlei wrote:

a large internal benchmark yields performance comparable to the bottom-up inliner -- both in terms of the execution performance and .text* sizes.

This looks promising. The comparison is between enable-module-inliner on vs off, right?

Thanks. Yes, -mllvm -enable-module-inliner is the only difference. Both the baseline and the experiment use FDO, ThinLTO, and -fsplit-machine-functions.

One thing I might point out here is that I am not doing any cleanup beyond whatever basic cleanups InlineFunction performs. With the bottom-up inliner, we diligently clean up after each SCC (see PassBuilder::addPGOInstrPasses), but that doesn't seem to matter with the module inliner. My hypothesis is that once we inline those call sites that reduce the caller size, followed by ones with high benefit-to-cost ratios, then we've captured the vast majority of the benefit from inlining. That is, we don't really need the exact "tightest" instruction count after DCE, CSE, etc. Even if the module inliner didn't give us additional performance, it could still simplify our life -- no CGSCC maintenance or successive cleanups.
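
To make the idea of a single global priority order more concrete, here is a minimal sketch of draining candidates from one queue, cheapest first. This is only an illustration under stated assumptions, not how ModuleInliner.cpp is implemented: `Candidate`, `LowerPriority`, and `drainInPriorityOrder` are hypothetical names, the priority is reduced to a bare cost, and nothing is actually inlined or cleaned up.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical candidate: a label for the call site plus its inline cost.
struct Candidate {
  std::string Name;
  int Cost;
};

// std::priority_queue keeps the "largest" element on top, so the comparator
// reports that L has lower priority than R when L is more expensive.
struct LowerPriority {
  bool operator()(const Candidate &L, const Candidate &R) const {
    return L.Cost > R.Cost;
  }
};

// Visit all candidates in one global order, cheapest first, and return the
// names in the order visited.
std::vector<std::string>
drainInPriorityOrder(const std::vector<Candidate> &Initial) {
  std::priority_queue<Candidate, std::vector<Candidate>, LowerPriority> Queue(
      Initial.begin(), Initial.end());
  std::vector<std::string> Order;
  while (!Queue.empty()) {
    Order.push_back(Queue.top().Name);
    Queue.pop();
  }
  assert(Order.size() == Initial.size() && "every candidate is visited once");
  return Order;
}
```

For example, candidates with costs 30, -10, and 5 would be visited in the order -10, 5, 30, regardless of where they sit in the call graph.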

Do you plan to tune and open up module inliner for sample PGO and non-PGO cases where cost-benefit analysis isn't available yet?

Yes, I'd like to do something for the sample PGO case. I'd like to enable the cost-benefit analysis for the sample PGO, but that's been my hardest project. IIUC, the sample profile loader keeps inlining functions top down until the weight of the inlining subtree goes below the threshold. As a result, by the time we get to the profile-driven inliner (Inliner.cpp), we don't have a lot of interesting decisions left to make. For the sample PGO case, I think I have to depart from the top-down inlining and start inlining callees from where it matters according to some combinations of context sensitivity, profile counts, and the usual metrics from trial inlining (InlineCost.cpp), etc.

The non-PGO case isn't our primary interest. That said, if I come up with reasonable heuristics, I might contribute that. There may be a long-term benefit in steering the community toward the module inliner as the sole inliner as opposed to directing them to the two different inliners (Inliner.cpp and ModuleInliner.cpp) depending on whether they are using instrumentation FDO or not.

LGTM.
Thanks for this work. I'm glad to see this infrastructure works well.
Looking forward to discovering reasonable heuristics :-D
Don't forget to resolve the comment.

This revision is now accepted and ready to land. Sep 29 2022, 12:59 AM

Renamed OptRatio to CostBenefit.

Renamed optratio to cost-benefit.

kazu marked an inline comment as done. Sep 29 2022, 8:59 AM

This revision was landed with ongoing or failed builds. Sep 29 2022, 9:00 AM

Closed by commit rG4e9dd21015f2: [ModuleInliner] Add a cost-benefit-based priority (authored by kazu).

This revision was automatically updated to reflect the committed changes.

kazu added a commit: rG4e9dd21015f2: [ModuleInliner] Add a cost-benefit-based priority.

In D134376#3822942, @taolq wrote:

LGTM.
Thanks for this work. I'm glad to see this infrastructure works well.
Looking forward to discovering reasonable heuristics :-D

Thank you for the review! We just started the module inliner journey. I'm also looking forward to improving both the priority and threshold functions.

Harbormaster completed remote builds in B189438: Diff 463925. Sep 29 2022, 10:16 AM

Revision Contents

Path: llvm/lib/Analysis/InlineOrder.cpp
Size: 84 lines

Diff 462822

llvm/lib/Analysis/InlineOrder.cpp

(24 lines not shown)
enum class InlinePriorityMode : int { Size, Cost, OptRatio };		enum class InlinePriorityMode : int { Size, Cost, OptRatio };

static cl::opt<InlinePriorityMode> UseInlinePriority(		static cl::opt<InlinePriorityMode> UseInlinePriority(
"inline-priority-mode", cl::init(InlinePriorityMode::Size), cl::Hidden,		"inline-priority-mode", cl::init(InlinePriorityMode::Size), cl::Hidden,
cl::desc("Choose the priority mode to use in module inline"),		cl::desc("Choose the priority mode to use in module inline"),
cl::values(clEnumValN(InlinePriorityMode::Size, "size",		cl::values(clEnumValN(InlinePriorityMode::Size, "size",
"Use callee size priority."),		"Use callee size priority."),
clEnumValN(InlinePriorityMode::Cost, "cost",		clEnumValN(InlinePriorityMode::Cost, "cost",
"Use inline cost priority.")));		"Use inline cost priority."),
		clEnumValN(InlinePriorityMode::OptRatio, "optratio",
		[Inline comment from wenlei: perhaps just name it "cost-benefit"? "ratio" can be confusing.]
		"Use cost-benefit ratio.")));

		static cl::opt<int> ModuleInlinerTopPriorityThreshold(
		"moudle-inliner-top-priority-threshold", cl::Hidden, cl::init(0),
		cl::desc("The cost threshold for call sites that get inlined without the "
		"cost-benefit analysis"));

namespace {		namespace {

llvm::InlineCost getInlineCostWrapper(CallBase &CB,		llvm::InlineCost getInlineCostWrapper(CallBase &CB,
FunctionAnalysisManager &FAM,		FunctionAnalysisManager &FAM,
const InlineParams &Params) {		const InlineParams &Params) {
Function &Caller = *CB.getCaller();		Function &Caller = *CB.getCaller();
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
(53 lines not shown; context: public:)
static bool isMoreDesirable(const CostPriority &P1, const CostPriority &P2) {		static bool isMoreDesirable(const CostPriority &P1, const CostPriority &P2) {
return P1.Cost < P2.Cost;		return P1.Cost < P2.Cost;
}		}

private:		private:
int Cost;		int Cost;
};		};

		class CostBenefitPriority {
		public:
		CostBenefitPriority() = default;
		CostBenefitPriority(const CallBase *CB, FunctionAnalysisManager &FAM,
		const InlineParams &Params) {
		auto IC = getInlineCostWrapper(const_cast<CallBase &>(*CB), FAM, Params);
		Cost = IC.getCost();
		StaticBonusApplied = IC.getStaticBonusApplied();
		CostBenefit = IC.getCostBenefit();
		}

		static bool isMoreDesirable(const CostBenefitPriority &P1,
		const CostBenefitPriority &P2) {
		// We prioritize call sites in the dictionary order of the following
		// priorities:
		//
		// 1. Those call sites that are expected to reduce the caller size when
		// inlined. Within them, we prioritize those call sites with bigger
		// reduction.
		//
		// 2. Those call sites that have gone through the cost-benefit analysis.
		// Currently, they are limited to hot call sites. Within them, we
		// prioritize those call sites with higher benefit-to-cost ratios.
		//
		// 3. Remaining call sites are prioritized according to their costs.

		// We add back StaticBonusApplied to determine whether we expect the caller
		// to shrink (even if we don't delete the callee).
		bool P1ReducesCallerSize =
		P1.Cost + P1.StaticBonusApplied < ModuleInlinerTopPriorityThreshold;
		bool P2ReducesCallerSize =
		P2.Cost + P2.StaticBonusApplied < ModuleInlinerTopPriorityThreshold;
		if (P1ReducesCallerSize || P2ReducesCallerSize) {
		// If one reduces the caller size while the other doesn't, then return
		// true iff P1 reduces the caller size.
		if (P1ReducesCallerSize != P2ReducesCallerSize)
		return P1ReducesCallerSize;

		// If they both reduce the caller size, pick the one with the smaller
		// cost.
		return P1.Cost < P2.Cost;
		}

		bool P1HasCB = P1.CostBenefit.has_value();
		bool P2HasCB = P2.CostBenefit.has_value();
		if (P1HasCB || P2HasCB) {
		// If one has undergone the cost-benefit analysis while the other hasn't,
		// then return true iff P1 has.
		if (P1HasCB != P2HasCB)
		return P1HasCB;

		// If they have undergone the cost-benefit analysis, then pick the one
		// with a higher benefit-to-cost ratio.
		APInt LHS = P1.CostBenefit->getBenefit() * P2.CostBenefit->getCost();
		APInt RHS = P2.CostBenefit->getBenefit() * P1.CostBenefit->getCost();
		return LHS.ugt(RHS);
		}

		// Remaining call sites are ordered according to their costs.
		return P1.Cost < P2.Cost;
		}

		private:
		int Cost;
		int StaticBonusApplied;
		Optional<CostBenefitPair> CostBenefit;
		};

template <typename PriorityT>		template <typename PriorityT>
class PriorityInlineOrder : public InlineOrder<std::pair<CallBase *, int>> {		class PriorityInlineOrder : public InlineOrder<std::pair<CallBase *, int>> {
using T = std::pair<CallBase *, int>;		using T = std::pair<CallBase *, int>;

bool hasLowerPriority(const CallBase *L, const CallBase *R) const {		bool hasLowerPriority(const CallBase *L, const CallBase *R) const {
const auto I1 = Priorities.find(L);		const auto I1 = Priorities.find(L);
const auto I2 = Priorities.find(R);		const auto I2 = Priorities.find(R);
assert(I1 != Priorities.end() && I2 != Priorities.end());		assert(I1 != Priorities.end() && I2 != Priorities.end());
(79 lines not shown; context: llvm::getInlineOrder(FunctionAnalysisManager &FAM, const InlineParams &Params) {)
case InlinePriorityMode::Size:		case InlinePriorityMode::Size:
LLVM_DEBUG(dbgs() << " Current used priority: Size priority ---- \n");		LLVM_DEBUG(dbgs() << " Current used priority: Size priority ---- \n");
return std::make_unique<PriorityInlineOrder<SizePriority>>(FAM, Params);		return std::make_unique<PriorityInlineOrder<SizePriority>>(FAM, Params);

case InlinePriorityMode::Cost:		case InlinePriorityMode::Cost:
LLVM_DEBUG(dbgs() << " Current used priority: Cost priority ---- \n");		LLVM_DEBUG(dbgs() << " Current used priority: Cost priority ---- \n");
return std::make_unique<PriorityInlineOrder<CostPriority>>(FAM, Params);		return std::make_unique<PriorityInlineOrder<CostPriority>>(FAM, Params);

default:		case InlinePriorityMode::OptRatio:
llvm_unreachable("Unsupported Inline Priority Mode");		LLVM_DEBUG(
break;		dbgs() << " Current used priority: cost-benefit priority ---- \n");
		return std::make_unique<PriorityInlineOrder<CostBenefitPriority>>(FAM, Params);
}		}
return nullptr;		return nullptr;
}		}
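
A note on the ratio comparison in CostBenefitPriority::isMoreDesirable: the diff compares benefit-to-cost ratios by cross-multiplying rather than dividing. For positive costs, Benefit1/Cost1 > Benefit2/Cost2 holds exactly when Benefit1*Cost2 > Benefit2*Cost1, which avoids integer-division rounding. The sketch below shows the same idea with plain integers. It is an illustration only: `CostBenefit` and `hasHigherRatio` are hypothetical names, the patch uses `APInt` values from `CostBenefitPair`, and the 32-bit inputs are an assumption made here so that the 64-bit products cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a (cost, benefit) pair. The patch itself uses
// llvm::CostBenefitPair, whose members are APInt.
struct CostBenefit {
  std::uint32_t Cost;
  std::uint32_t Benefit;
};

// True if A has a strictly higher benefit-to-cost ratio than B. The two
// ratios are compared by cross-multiplication, so no division is performed.
bool hasHigherRatio(const CostBenefit &A, const CostBenefit &B) {
  assert(A.Cost > 0 && B.Cost > 0 && "sketch assumes positive costs");
  // Widen to 64 bits first: a product of two 32-bit values always fits.
  std::uint64_t LHS = std::uint64_t{A.Benefit} * B.Cost;
  std::uint64_t RHS = std::uint64_t{B.Benefit} * A.Cost;
  return LHS > RHS;
}
```

With a cost of 10 and a benefit of 50 (ratio 5) against a cost of 20 and a benefit of 60 (ratio 3), the products are 1000 and 600, so the first call site ranks higher. Equal ratios compare as not higher in either direction.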