This is an archive of the discontinued LLVM Phabricator instance.

Differential D82919

[SampleFDO] Enable sample-profile-top-down-load by default.
ClosedPublic

Authored by wmi on Jun 30 2020, 3:32 PM.

Details

Reviewers

wenlei
davidxl

Commits

rGe32469a14037: [SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge…

Summary

sample-profile-top-down-load is an internal option that enables top-down ordering of inlining and profile annotation in the sample profile loader pass. It was found to be beneficial for profile annotation.

Recently we found it could also solve a build time issue. Suppose function A has many callsites in function B. In the last release binary, where the sample profile was collected, the outline copy of A is large because many other functions were inlined into A. However, although all the callsites of A in B were inlined, every inlined body was small (A was inlined into B before other functions were inlined into A), so there was no build time issue in the last release.

In an optimized build using the sample profile collected from the last release, without top-down inlining, we saw a case where A got very large because of inlining, and then multiple callsites of A got inlined into B. That led to a huge B, which caused a significant build time issue in addition to the profile annotation issue.

To solve that problem, the patch proposes to enable the flag sample-profile-top-down-load by default.
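
The size effect described above can be illustrated with a toy model. This is only a sketch: the `Sizes` struct, its fields, the function names, and the numbers are hypothetical and do not come from the patch or from LLVM's inliner. It just does the arithmetic for the A/B scenario, assuming (as in the summary) that the inline instances of A in B do not receive further inlining when B is processed first.

```cpp
#include <cassert>

// Toy size model. All names and numbers are hypothetical.
struct Sizes {
  unsigned ABody;        // size of A before anything is inlined into it
  unsigned InlinedIntoA; // total size inlined into the outline copy of A
  unsigned BBody;        // size of B before inlining
  unsigned CallsToA;     // number of callsites of A in B
};

// Without top-down order, A is processed first and is already large
// when its callsites in B are inlined.
constexpr unsigned sizeOfBWithoutTopDown(const Sizes &S) {
  return S.BBody + S.CallsToA * (S.ABody + S.InlinedIntoA);
}

// With top-down order, B is processed first, so every inlined copy of A
// is still the small, not-yet-grown body.
constexpr unsigned sizeOfBWithTopDown(const Sizes &S) {
  return S.BBody + S.CallsToA * S.ABody;
}

constexpr Sizes Example{20, 500, 100, 50};
static_assert(sizeOfBWithoutTopDown(Example) == 26100, "A grown before inlining");
static_assert(sizeOfBWithTopDown(Example) == 1100, "A inlined while small");
```

With these example values, B ends up at 26100 units when A is grown first, versus 1100 units when A is inlined into B while it is still small.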

I reevaluated the performance on two server benchmarks. I ran one benchmark 6 times: it showed no performance change in 4 runs and a 0.2% improvement in 2 runs. I ran the other benchmark 6 times and it showed no performance change.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wmi created this revision. Jun 30 2020, 3:32 PM

Herald added a project: Restricted Project. Jun 30 2020, 3:32 PM

Herald added a subscriber: hiraditya.

Great. We use this switch together with -sample-profile-merge-inlinee, so the profile to drive top-down inlining is more accurate and that often gives best performance. May want to turn both on together?

Sounds good to me. I will redo the performance test for it.

Enable -sample-profile-merge-inlinee by default together with -sample-profile-top-down-load.

I tested the performance with sample-profile-top-down-load and sample-profile-merge-inlinee both enabled. Different compiler versions gave different results. In one version about three weeks older, I got a steady 0.4% improvement for one benchmark across multiple runs and a neutral result for the other. In the head llvm version, I saw neutral results for both benchmarks.

Disable sample-profile-merge-inlinee when sample-profile-top-down-load is not effective (currently sample-profile-top-down-load is only effective for the new pass manager).

Thanks for measurement. LGTM.

This revision is now accepted and ready to land. Jul 6 2020, 8:03 PM

lgtm

Closed by commit rGe32469a14037: [SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge… (authored by wmi). Jul 8 2020, 9:24 AM

This revision was automatically updated to reflect the committed changes.

ychen added a subscriber: ychen. Nov 30 2020, 3:35 PM

ychen added inline comments.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1791	I don't fully understand the patch TBH. `ProfileMergeInlinee` is also set to false when `CG == nullptr` which holds true when the legacy pass manager is used. Is this intended?

hoy added a subscriber: hoy. Nov 30 2020, 6:47 PM

hoy added inline comments.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1791	`ProfileMergeInlinee` is set to false when a top-down inlining is not available. This happens when the top-down inlining is explicitly disabled, or when a call graph is not available which means a top-down order cannot be computed.

ychen added inline comments. Nov 30 2020, 7:54 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
1791	@hoy, thank you. I just realized that I should've asked this in D70655 where the call graph is not computed hence not available for the legacy pass manager.
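
The condition discussed in this thread can be summarized in a small sketch. `LoaderConfig` and both function names are hypothetical stand-ins introduced for illustration. In the patch itself the check lives in `buildFunctionOrder` and acts on the `ProfileTopDownLoad` and `ProfileMergeInlinee` options and the `CG` pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins for the state the loader consults.
struct LoaderConfig {
  bool TopDownLoadRequested;  // models -sample-profile-top-down-load
  bool MergeInlineeRequested; // models -sample-profile-merge-inlinee
  bool HasCallGraph;          // models CG != nullptr
};

// Top-down loading only takes effect when it is requested and a call
// graph is available to compute the order from.
constexpr bool topDownEffective(const LoaderConfig &C) {
  return C.TopDownLoadRequested && C.HasCallGraph;
}

// Profile merging of non-inlined inlinees is kept only when top-down
// loading is actually in effect; otherwise it is turned off.
constexpr bool mergeInlineeEffective(const LoaderConfig &C) {
  return C.MergeInlineeRequested && topDownEffective(C);
}
```

Under this model, a run without a call graph (the legacy pass manager case mentioned above) yields `mergeInlineeEffective(...) == false` even when both options are left at their new defaults of true.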

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/SampleProfile.cpp	20 lines
llvm/test/Transforms/SampleProfile/inline-mergeprof.ll	8 lines
llvm/test/Transforms/SampleProfile/inline-topdown.ll	4 lines

Diff 276463

llvm/lib/Transforms/IPO/SampleProfile.cpp

(143 lines not shown)

static cl::opt<bool> ProfileAccurateForSymsInList(		static cl::opt<bool> ProfileAccurateForSymsInList(
"profile-accurate-for-symsinlist", cl::Hidden, cl::ZeroOrMore,		"profile-accurate-for-symsinlist", cl::Hidden, cl::ZeroOrMore,
cl::init(true),		cl::init(true),
cl::desc("For symbols in profile symbol list, regard their profiles to "		cl::desc("For symbols in profile symbol list, regard their profiles to "
"be accurate. It may be overriden by profile-sample-accurate. "));		"be accurate. It may be overriden by profile-sample-accurate. "));

static cl::opt<bool> ProfileMergeInlinee(		static cl::opt<bool> ProfileMergeInlinee(
"sample-profile-merge-inlinee", cl::Hidden, cl::init(false),		"sample-profile-merge-inlinee", cl::Hidden, cl::init(true),
cl::desc("Merge past inlinee's profile to outline version if sample "		cl::desc("Merge past inlinee's profile to outline version if sample "
"profile loader decided not to inline a call site."));		"profile loader decided not to inline a call site. It will "
		"only be enabled when top-down order of profile loading is "
		"enabled. "));

static cl::opt<bool> ProfileTopDownLoad(		static cl::opt<bool> ProfileTopDownLoad(
"sample-profile-top-down-load", cl::Hidden, cl::init(false),		"sample-profile-top-down-load", cl::Hidden, cl::init(true),
cl::desc("Do profile annotation and inlining for functions in top-down "		cl::desc("Do profile annotation and inlining for functions in top-down "
"order of call graph during sample profile loading."));		"order of call graph during sample profile loading. It only "
		"works for new pass manager. "));

static cl::opt<bool> ProfileSizeInline(		static cl::opt<bool> ProfileSizeInline(
"sample-profile-inline-size", cl::Hidden, cl::init(false),		"sample-profile-inline-size", cl::Hidden, cl::init(false),
cl::desc("Inline cold call sites in profile loader if it's beneficial "		cl::desc("Inline cold call sites in profile loader if it's beneficial "
"for code size."));		"for code size."));

static cl::opt<int> SampleColdCallSiteThreshold(		static cl::opt<int> SampleColdCallSiteThreshold(
"sample-profile-cold-inline-threshold", cl::Hidden, cl::init(45),		"sample-profile-cold-inline-threshold", cl::Hidden, cl::init(45),
(1,612 lines not shown)
INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)

std::vector<Function *>		std::vector<Function *>
SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {		SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {
std::vector<Function *> FunctionOrderList;		std::vector<Function *> FunctionOrderList;
FunctionOrderList.reserve(M.size());		FunctionOrderList.reserve(M.size());

if (!ProfileTopDownLoad || CG == nullptr) {		if (!ProfileTopDownLoad || CG == nullptr) {
		if (ProfileMergeInlinee) {
		// Disable ProfileMergeInlinee if profile is not loaded in top down order,
		// because the profile for a function may be used for the profile
		// annotation of its outline copy before the profile merging of its
		// non-inlined inline instances, and that is not the way how
		// ProfileMergeInlinee is supposed to work.
		ProfileMergeInlinee = false;
		}

for (Function &F : M)		for (Function &F : M)
if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile"))		if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile"))
FunctionOrderList.push_back(&F);		FunctionOrderList.push_back(&F);
return FunctionOrderList;		return FunctionOrderList;
}		}

assert(&CG->getModule() == &M);		assert(&CG->getModule() == &M);
scc_iterator<CallGraph *> CGI = scc_begin(CG);		scc_iterator<CallGraph *> CGI = scc_begin(CG);
(200 lines not shown)
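
As a rough illustration of the ordering `buildFunctionOrder` is after, the sketch below computes a top-down order on a toy call graph by taking a post-order traversal (callees before callers) and reversing it. The rest of the function is not shown in this diff, so this is an assumption about the general technique rather than a copy of LLVM's code. `ToyCallGraph`, `postOrder`, and `topDownOrder` are hypothetical names, and the toy version only guards against revisiting a function, whereas the real code iterates over SCCs with `scc_iterator`.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy call graph: caller name -> callee names.
using ToyCallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order: every callee is emitted before its caller.
static void postOrder(const ToyCallGraph &G, const std::string &F,
                      std::set<std::string> &Visited,
                      std::vector<std::string> &Order) {
  if (!Visited.insert(F).second)
    return; // already visited
  auto It = G.find(F);
  if (It != G.end())
    for (const std::string &Callee : It->second)
      postOrder(G, Callee, Visited, Order);
  Order.push_back(F);
}

// Reversing the bottom-up (post-order) list puts callers before callees.
std::vector<std::string> topDownOrder(const ToyCallGraph &G,
                                      const std::string &Root) {
  std::set<std::string> Visited;
  std::vector<std::string> Order;
  postOrder(G, Root, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For a chain where `main` calls `B`, `B` calls `A`, and `A` calls `C`, `topDownOrder(G, "main")` returns `main, B, A, C`, so B is processed before A as in the scenario from the summary.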

llvm/test/Transforms/SampleProfile/inline-mergeprof.ll

	; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee'			; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee'
	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -S | FileCheck -check-prefix=SCALE %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=SCALE %s
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s

	; Test we properly merge not inlined profile properly with '-sample-profile-merge-inlinee'			; Test we properly merge not inlined profile properly with '-sample-profile-merge-inlinee'
	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee -S | FileCheck -check-prefix=MERGE %s

	@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1			@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1

	define i32 @main() #0 !dbg !6 {			define i32 @main() #0 !dbg !6 {
	entry:			entry:
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%s = alloca i32, align 4			%s = alloca i32, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4
	(84 lines not shown)

llvm/test/Transforms/SampleProfile/inline-topdown.ll

	; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op.			; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op.

	; Test we aren't doing specialization for inlining with default source order			; Test we aren't doing specialization for inlining with default source order
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -S | FileCheck -check-prefix=DEFAULT %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-top-down-load=false -S | FileCheck -check-prefix=DEFAULT %s

	; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load'			; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load'
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -S | FileCheck -check-prefix=TOPDOWN %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load=true -S | FileCheck -check-prefix=TOPDOWN %s


	@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1			@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1

	define i32 @_Z3sumii(i32 %x, i32 %y) #0 !dbg !6 {			define i32 @_Z3sumii(i32 %x, i32 %y) #0 !dbg !6 {
	entry:			entry:
	%x.addr = alloca i32, align 4			%x.addr = alloca i32, align 4
	%y.addr = alloca i32, align 4			%y.addr = alloca i32, align 4
	(110 lines not shown)