This is an archive of the discontinued LLVM Phabricator instance.

Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter
Closed, Public

Authored by danielcdh on Jul 14 2016, 10:02 AM.


Details

Reviewers

davidxl
eraman

Commits

rGde39cb938436: Replace hot-callsite based heuristic to use its own threshold parameter instead…
rL277860: Replace hot-callsite based heuristic to use its own threshold parameter…

Summary

Hot callsites should have a higher threshold than inline hints. This patch uses a separate threshold parameter for hot callsites.
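Here is a minimal standalone sketch of the raise-only rule described above. It is not LLVM's `CallAnalyzer::updateThreshold`. `CallSiteInfo` and `raiseThreshold` are hypothetical names, and the values come from this review: 225 as the base, 325 as the original inline-hint threshold, and 3000 for hot callsites. The sketch leaves out the cold-threshold step and the target multiplier.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the facts the inliner consults at one callsite.
struct CallSiteInfo {
  bool CalleeHasInlineHint; // callee carries the inlinehint attribute
  bool CalleeIsHot;         // profile says the callee function is hot
  bool CallsiteIsHot;       // profile says this particular call is hot
  bool CallerOptForMinSize; // caller is optimized for minimum size
};

// Plain constants standing in for the cl::opt defaults discussed in the
// review; in LLVM these are command-line options, not constants.
constexpr int DefaultThreshold = 225;
constexpr int HintThreshold = 325;
constexpr int HotCallSiteThreshold = 3000;

// Applies the two raise-only adjustments: the hint threshold for hinted or
// hot callees, then the separate hot-callsite threshold.
int raiseThreshold(const CallSiteInfo &CS, int Threshold) {
  // Neither adjustment applies when the caller must minimize its size.
  if (CS.CallerOptForMinSize)
    return Threshold;
  if (CS.CalleeHasInlineHint || CS.CalleeIsHot)
    Threshold = std::max(Threshold, HintThreshold);
  if (CS.CallsiteIsHot)
    Threshold = std::max(Threshold, HotCallSiteThreshold);
  return Threshold;
}
```

The two knobs are independent. A callsite that is both hinted and hot gets the larger of the two values. A caller optimized for minimum size keeps its base threshold.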

Diff Detail

Event Timeline

danielcdh updated this revision to Diff 63998 (Jul 14 2016, 10:02 AM).

danielcdh retitled this revision to "Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter".

danielcdh updated this object.

danielcdh added reviewers: eraman, davidxl.

danielcdh added a subscriber: llvm-commits.

It looks good to me, but I am not really competent to accept it yet, so I guess we wait for the other reviewers.

Should we have a fixed cutoff threshold, or make it adaptive to the hotness? Or, more generally, make it part of the global speedup analysis (the larger the global speedup, the larger the threshold), which we will soon have?

Another thing: PGO and AutoFDO should reduce overall text size compared with an O2 build, so we should also do the opposite: if the caller has a zero count (no samples with AutoFDO), the threshold should be reduced.

add cold callsite heuristic

In D22368#488605, @davidxl wrote:

Should we have a fixed cutoff threshold, or make it adaptive to the hotness? Or, more generally, make it part of the global speedup analysis (the larger the global speedup, the larger the threshold), which we will soon have?

I agree. But for now, the PSI interface only gives us a boolean value for hot/cold queries. How about we land this fixed cutoff first, and later, when the global speedup analysis is ready, use the speedup as a parameter to set the threshold?

Another thing: PGO and AutoFDO should reduce overall text size compared with an O2 build, so we should also do the opposite: if the caller has a zero count (no samples with AutoFDO), the threshold should be reduced.

Added cold callsite heuristic.
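Below is a sketch of the cold-callsite heuristic as it appears in Diff 64552. The call's profile weight is classified as hot, cold, or neither, and a cold callee or a cold callsite may only lower the threshold. `CountClassifier` is a hypothetical stand-in for `ProfileSummaryInfo::isHotCount`/`isColdCount` with made-up fixed cutoffs, because the real cutoffs come from the profile summary. The two boolean flags model whether `-inline-threshold` and `-inlinecold-threshold` were passed explicitly. The sketch needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical fixed cutoffs standing in for ProfileSummaryInfo's
// isHotCount/isColdCount; the real cutoffs come from the profile summary.
struct CountClassifier {
  std::uint64_t HotCount;  // counts at or above this are hot (assumption)
  std::uint64_t ColdCount; // counts at or below this are cold (assumption)
  bool isHotCount(std::uint64_t C) const { return C >= HotCount; }
  bool isColdCount(std::uint64_t C) const { return C <= ColdCount; }
};

enum class CallsiteTemp { Hot, Cold, Neither };

// As in the patch: a call without a profile weight is neither hot nor cold,
// and the hot check is made before the cold check.
CallsiteTemp classify(std::optional<std::uint64_t> TotalWeight,
                      const CountClassifier &PSI) {
  if (!TotalWeight)
    return CallsiteTemp::Neither;
  if (PSI.isHotCount(*TotalWeight))
    return CallsiteTemp::Hot;
  if (PSI.isColdCount(*TotalWeight))
    return CallsiteTemp::Cold;
  return CallsiteTemp::Neither;
}

// Lower-only step: a cold callee or a cold callsite can only shrink the
// threshold, and an explicit -inline-threshold without an explicit
// -inlinecold-threshold disables the step entirely.
int lowerForCold(int Threshold, int ColdThreshold, bool ColdCallee,
                 CallsiteTemp Temp, bool ExplicitInlineThreshold,
                 bool ExplicitColdThreshold) {
  bool ColdCallsite = Temp == CallsiteTemp::Cold;
  if ((!ExplicitInlineThreshold || ExplicitColdThreshold) &&
      (ColdCallee || ColdCallsite) && ColdThreshold < Threshold)
    Threshold = ColdThreshold;
  return Threshold;
}
```

For example, a hinted callee raised to 325 but called from a cold site drops back to a cold threshold of 225. The same call keeps 325 if `-inline-threshold` was given without `-inlinecold-threshold`.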

eraman added inline comments (Jul 19 2016, 5:26 PM).

lib/Analysis/InlineCost.cpp
Line 69: Adding a threshold different from inlinehint-threshold makes sense. But if you want to set the default this high, then it is important to have numbers to justify it. Running SPEC with different thresholds, and adding a comment here saying you chose this default because it gives the best performance/size tradeoff, is important.

LGTM.

[Copying Dehao's comments from the review thread for reference]

I experimented with different thresholds from 325 (the original) to 4000. The code size change is within 2% for all SPEC CPU2006 int benchmarks. In terms of performance, up to 4% speedup is observed for perlbench when changing the threshold from 325 to 3000; above 3000, the performance curve remains flat. For all other SPEC CPU2006 int benchmarks, the performance change stays within the noise range when changing the threshold from 325 to 4000. Similar performance/size results are observed for internal benchmarks. So I think 3000 seems to be a sweet spot for the threshold.

This patch still needs an approval?

In D22368#501206, @danielcdh wrote:

This patch still needs an approval

LGTM from me. Don't know if David has any other comments.

This revision is now accepted and ready to land (Aug 5 2016, 10:24 AM).

danielcdh closed this revision (Aug 5 2016, 1:36 PM).

Revision Contents

lib/Analysis/InlineCost.cpp (23 lines)
test/Transforms/Inline/inline-hot-callsite.ll (2 lines)

Diff 64552

lib/Analysis/InlineCost.cpp

[60 unchanged lines not shown]

 // We introduce this threshold to help performance of instrumentation based
 // PGO before we actually hook up inliner with analysis passes such as BPI and
 // BFI.
 static cl::opt<int> ColdThreshold(
     "inlinecold-threshold", cl::Hidden, cl::init(225),
     cl::desc("Threshold for inlining functions with cold attribute"));

+static cl::opt<int>
+    HotCallSiteThreshold("hot-callsite-threshold", cl::Hidden, cl::init(3000),
+                         cl::ZeroOrMore,
+                         cl::desc("Threshold for hot callsites "));
+
 namespace {

 class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   typedef InstVisitor<CallAnalyzer, bool> Base;
   friend class InstVisitor<CallAnalyzer, bool>;

   /// The TargetTransformInfo available for this compilation.
   const TargetTransformInfo &TTI;
[552 unchanged lines not shown; enclosing block: if (DefaultInlineThreshold.getNumOccurrences() > 0) {]
     // attributes when they would decrease the threshold.
     if (Caller->optForMinSize() && OptMinSizeThreshold < Threshold)
       Threshold = OptMinSizeThreshold;
     else if (Caller->optForSize() && OptSizeThreshold < Threshold)
       Threshold = OptSizeThreshold;
   }

   bool HotCallsite = false;
+  bool ColdCallsite = false;
   uint64_t TotalWeight;
-  if (CS.getInstruction()->extractProfTotalWeight(TotalWeight) &&
-      PSI->isHotCount(TotalWeight))
+  if (CS.getInstruction()->extractProfTotalWeight(TotalWeight))
+    if (PSI->isHotCount(TotalWeight))
       HotCallsite = true;
+    else if (PSI->isColdCount(TotalWeight))
+      ColdCallsite = true;

   // Listen to the inlinehint attribute or profile based hotness information
   // when it would increase the threshold and the caller does not need to
   // minimize its size.
   bool InlineHint = Callee.hasFnAttribute(Attribute::InlineHint) ||
-                    PSI->isHotFunction(&Callee) ||
-                    HotCallsite;
+                    PSI->isHotFunction(&Callee);
   if (InlineHint && HintThreshold > Threshold && !Caller->optForMinSize())
     Threshold = HintThreshold;

+  if (HotCallsite && HotCallSiteThreshold > Threshold &&
+      !Caller->optForMinSize())
+    Threshold = HotCallSiteThreshold;
+
   bool ColdCallee = PSI->isColdFunction(&Callee);
   // Command line argument for DefaultInlineThreshold will override the default
   // ColdThreshold. If we have -inline-threshold but no -inlinecold-threshold,
   // do not use the default cold threshold even if it is smaller.
   if ((DefaultInlineThreshold.getNumOccurrences() == 0 ||
        ColdThreshold.getNumOccurrences() > 0) &&
-      ColdCallee && ColdThreshold < Threshold)
+      (ColdCallee || ColdCallsite) && ColdThreshold < Threshold)
     Threshold = ColdThreshold;

   // Finally, take the target-specific inlining threshold multiplier into
   // account.
   Threshold *= TTI.getInliningThresholdMultiplier();
 }

 bool CallAnalyzer::visitCmpInst(CmpInst &I) {
[888 unchanged lines not shown]

test/Transforms/Inline/inline-hot-callsite.ll

-; RUN: opt < %s -inline -inline-threshold=0 -inlinehint-threshold=100 -S | FileCheck %s
+; RUN: opt < %s -inline -inline-threshold=0 -hot-callsite-threshold=100 -S | FileCheck %s

 ; This tests that a hot callsite gets the (higher) inlinehint-threshold even without
 ; without inline hints and gets inlined because the cost is less than
 ; inlinehint-threshold. A cold callee with identical body does not get inlined because
 ; cost exceeds the inline-threshold

 define i32 @callee1(i32 %x) {
   %x1 = add i32 %x, 1
[43 unchanged lines not shown]
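The following is a hypothetical model of how the updated RUN flags play out for the two callees in this test. Because hot callsites no longer share the hint knob, the test now sets `-hot-callsite-threshold=100` instead of `-inlinehint-threshold=100`.

The model rests on several assumptions:
- Neither callee has `inlinehint` or is hot by entry count.
- The caller is not optimized for minimum size.
- The target multiplier is ignored.
- The cost of 20 is made up.
- The decision rule is simplified to "inline when cost is below the threshold", following the test's own comment.

`RunConfig`, `thresholdFor`, and `shouldInline` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the test's RUN line; names are illustrative only.
struct RunConfig {
  int InlineThreshold = 0;        // -inline-threshold=0 (explicitly given)
  int HotCallSiteThreshold = 100; // -hot-callsite-threshold=100
};

// Assumes neither callee has inlinehint or is hot by entry count, and that
// the caller is not optimized for minimum size.
int thresholdFor(const RunConfig &Cfg, bool HotCallsite) {
  int Threshold = Cfg.InlineThreshold;
  if (HotCallsite)
    Threshold = std::max(Threshold, Cfg.HotCallSiteThreshold);
  // No cold step: -inline-threshold is given without -inlinecold-threshold,
  // so the default cold threshold is not applied; a non-hot call stays at 0.
  return Threshold;
}

// Simplified decision rule matching the test comment's wording:
// inline when the estimated cost is below the threshold.
bool shouldInline(int Cost, int Threshold) { return Cost < Threshold; }
```

With the same hypothetical cost, the hot call gets a threshold of 100 and is inlined. The call to the cold callee stays at 0 and is not inlined.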