This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] Reduce inline thresholds to compensate for cost changes
Closed, Public

Authored by jmolloy on Nov 18 2016, 4:49 AM.

Details

Reviewers

mzolotukhin
davidxl
jmolloy
Gerolf
eraman

Summary

In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place, which is especially apparent
in optsize and minsize modes.

As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls) it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.
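
As a rough illustration of the arithmetic behind that figure, the sketch below converts each threshold reduction in this revision into a number of IR instructions. It assumes a deliberately simplified model in which every instruction in a call-free callee costs exactly `InstrCost` and nothing else (bonuses, penalties, call-site savings) contributes; the real cost analysis is more involved. The constant values are those shown in the diff for this revision. `ThresholdChange`, `Changes` and `instructionsRemoved` are hypothetical names used only for illustration, not LLVM APIs.

```cpp
#include <cassert>

// Value copied from the InlineConstants shown in this revision's diff.
constexpr int InstrCost = 5;

// Hypothetical record of one threshold before and after this revision.
struct ThresholdChange {
  const char *Name;
  int Old;
  int New;
};

// Old and new values as they appear in the InlineCost.h diff.
constexpr ThresholdChange Changes[] = {
    {"OptSizeThreshold (-Os)", 75, 50},
    {"OptMinSizeThreshold (-Oz)", 25, 5},
    {"OptAggressiveThreshold (-O3)", 275, 250},
};

// Hypothetical helper, not an LLVM API. Under the simplified model above,
// returns how many instructions' worth of cost the reduction takes away.
inline int instructionsRemoved(const ThresholdChange &C) {
  assert(C.Old >= C.New && "this revision only lowers thresholds");
  return (C.Old - C.New) / InstrCost;
}
```

Under this model the -Os and -O3 reductions of 25 each correspond to 5 instructions, in line with the "around 5-6" estimate above, and the -Oz reduction of 20 corresponds to 4.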

The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.

For -Os, this change removes the outlier results shown up by green dragon
(http://104.154.54.203/db_default/v4/nts/13248).

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 78501. Nov 18 2016, 4:49 AM

jmolloy retitled this revision to [PATCH] Reduce inline thresholds to compensate for cost changes.

jmolloy updated this object.

jmolloy added reviewers: Gerolf, mzolotukhin, davidxl, eraman.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

davidxl added inline comments. Nov 18 2016, 9:20 AM

include/llvm/Analysis/InlineCost.h
39	Any reason to reduce O3 threshold?

The Oz case looks interesting. Can you share more details/insights about the "inaccuracies" w/ specific examples? I'm wondering if that can be fixed in general or be more triggered towards some trunk characteristics. But this is just something we can think about and discuss while moving on and celebrate the ct/cs recoveries :-). So LGTM!

Hi James,

Thanks for following up, a couple of questions inline.

Best regards,
Michael

test/Transforms/Inline/ephemeral.ll
24	Why is this change needed?
test/Transforms/Inline/inline-fp.ll
135–136	Is this one of the cases where we can not preserve the original behavior, or we intentionally want another one here (no context in the patch makes it harder to comprehend from here)?

Hi Michael,

Sorry for the lack of context - see inline comments.

Cheers,

James

test/Transforms/Inline/ephemeral.ll
24	Simply, the inner function was doing too much work. It was inlined in minsize mode before, but isn't now (and the new behaviour is correct - the inlining would previously have increased codesize). Reducing the amount of work the inner function does allows it to be inlined.
test/Transforms/Inline/inline-fp.ll
135–136	Apologies for the lack of context - error on my side. The testcase is attempting to ensure that soft-float calls are more expensive to the inliner than hardware floating point ops. The test was obviously reduced from a real program, and expects an inner function to be inlined. This inner function doesn't contain any function calls, and is the maximum size we would have inlined previously. We've reduced the minsize threshold, so these should no longer get inlined. Upping the test to requiring optsize instead of minsize changes the threshold, and the original behaviour is maintained.
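
The effect of switching the test from minsize to optsize can be sketched as a threshold lookup. This is only an assumption-laden illustration: `SizeAttrs` and `pickThreshold` are hypothetical names, the thread does not say which function's attributes the inliner consults, and the default threshold is passed in as a parameter because its value is not part of this revision. The two constants are the new values from this revision's diff.

```cpp
#include <cassert>

// New values from the InlineCost.h diff in this revision.
constexpr int OptSizeThreshold = 50;
constexpr int OptMinSizeThreshold = 5;

// Hypothetical stand-in for the size-related function attributes.
struct SizeAttrs {
  bool MinSize; // function carries the minsize attribute
  bool OptSize; // function carries the optsize attribute
};

// Hypothetical helper, not an LLVM API. Assumes minsize takes precedence
// over optsize when both attributes are present.
inline int pickThreshold(SizeAttrs A, int DefaultThreshold) {
  assert(DefaultThreshold >= 0 && "threshold is expected to be non-negative");
  if (A.MinSize)
    return OptMinSizeThreshold;
  if (A.OptSize)
    return OptSizeThreshold;
  return DefaultThreshold;
}
```

With the attribute set `{ minsize optsize }` this sketch selects 5, while `{ optsize }` alone selects 50, which is why dropping minsize from the test gives the inner function more room.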

Hi James,

Thanks for the comments, all clear now. I didn't mean to hold the patch by them.

Michael

Diffusion mentioned this in rL288024: [InlineCost] Reduce inline thresholds to compensate for cost changes. Nov 28 2016, 3:17 AM

Committed in r288024.

This revision is now accepted and ready to land. Nov 28 2016, 6:45 AM

jmolloy closed this revision. Nov 28 2016, 6:45 AM

Revision Contents

Path	Size
include/llvm/Analysis/InlineCost.h	6 lines
test/Transforms/Inline/ephemeral.ll	14 lines
test/Transforms/Inline/inline-fp.ll	4 lines

Diff 78692

include/llvm/Analysis/InlineCost.h

	class DataLayout;			class DataLayout;
	class Function;			class Function;
	class ProfileSummaryInfo;			class ProfileSummaryInfo;
	class TargetTransformInfo;			class TargetTransformInfo;

	namespace InlineConstants {			namespace InlineConstants {
	// Various thresholds used by inline cost analysis.			// Various thresholds used by inline cost analysis.
	/// Use when optsize (-Os) is specified.			/// Use when optsize (-Os) is specified.
	const int OptSizeThreshold = 75;			const int OptSizeThreshold = 50;

	/// Use when minsize (-Oz) is specified.			/// Use when minsize (-Oz) is specified.
	const int OptMinSizeThreshold = 25;			const int OptMinSizeThreshold = 5;

	/// Use when -O3 is specified.			/// Use when -O3 is specified.
	const int OptAggressiveThreshold = 275;			const int OptAggressiveThreshold = 250;

	// Various magic constants used to adjust heuristics.			// Various magic constants used to adjust heuristics.
	const int InstrCost = 5;			const int InstrCost = 5;
	const int IndirectCallThreshold = 100;			const int IndirectCallThreshold = 100;
	const int CallPenalty = 25;			const int CallPenalty = 25;
	const int LastCallToStaticBonus = 15000;			const int LastCallToStaticBonus = 15000;
	const int ColdccPenalty = 2000;			const int ColdccPenalty = 2000;
	const int NoreturnPenalty = 10000;			const int NoreturnPenalty = 10000;

test/Transforms/Inline/ephemeral.ll

	; RUN: opt -S -Oz %s | FileCheck %s			; RUN: opt -S -Oz %s | FileCheck %s

	@a = global i32 4			@a = global i32 4

	define i1 @inner() {			define i32 @inner() {
	%a1 = load volatile i32, i32* @a			%a1 = load volatile i32, i32* @a
	%x1 = add i32 %a1, %a1
	%c = icmp eq i32 %x1, 0

	; Here are enough instructions to prevent inlining, but because they are used			; Here are enough instructions to prevent inlining, but because they are used
	; only by the @llvm.assume intrinsic, they're free (and, thus, inlining will			; only by the @llvm.assume intrinsic, they're free (and, thus, inlining will
	; still happen).			; still happen).
	%a2 = mul i32 %a1, %a1			%a2 = mul i32 %a1, %a1
	%a3 = sub i32 %a1, 5			%a3 = sub i32 %a1, 5
	%a4 = udiv i32 %a3, -13			%a4 = udiv i32 %a3, -13
	%a5 = mul i32 %a4, %a4			%a5 = mul i32 %a4, %a4
	%a6 = add i32 %a5, %x1			%a6 = add i32 %a5, %a5
	%ca = icmp sgt i32 %a6, -7			%ca = icmp sgt i32 %a6, -7
	tail call void @llvm.assume(i1 %ca)			tail call void @llvm.assume(i1 %ca)

	ret i1 %c			ret i32 %a1
	}			}

	; @inner() should be inlined for -Oz.			; @inner() should be inlined for -Oz.
	; CHECK-NOT: call i1 @inner			; CHECK-NOT: call i1 @inner
	define i1 @outer() optsize {			define i32 @outer() optsize {
	%r = call i1 @inner()			%r = call i32 @inner()
	ret i1 %r			ret i32 %r
	}			}

	declare void @llvm.assume(i1) nounwind			declare void @llvm.assume(i1) nounwind

test/Transforms/Inline/inline-fp.ll

entry:		entry:
%div = fdiv float %sub3, %mul		%div = fdiv float %sub3, %mul
ret float %div		ret float %div
}		}

declare float @fabsf(float) optsize minsize		declare float @fabsf(float) optsize minsize

declare float @llvm.pow.f32(float, float) optsize minsize		declare float @llvm.pow.f32(float, float) optsize minsize

attributes #0 = { minsize optsize }		attributes #0 = { optsize }
attributes #1 = { minsize optsize "use-soft-float"="true" }		attributes #1 = { optsize "use-soft-float"="true" }