This is an archive of the discontinued LLVM Phabricator instance.

[SamplePGO] Skip inlinee profile scaling for sample loader inlining
Closed, Public

Authored by wenlei on Mar 8 2021, 8:33 AM.

Details

Reviewers

wmi
hoy
davidxl

Commits

rG051f2c144e1e: [SamplePGO] Skip inlinee profile scaling for sample loader inlining

Summary

For CGSCC inlining, we need to scale down a function's branch weights and entry count when it is inlined at a call site. This is done through updateCallProfile. Additionally, we scale the weights of the inlined clone based on the call site count in updateCallerBFI. Neither is needed for inlining during the sample profile loader, because it uses a context profile that is separate from the inlinee's own profile. This change skips the inlinee profile scaling for sample loader inlining.
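The scaling described in the summary can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyProfile`, `scaleProfile`, `InlineResult`, and `inlineWithProfile` are hypothetical names invented for illustration and do not mirror the actual implementation of updateCallProfile or updateCallerBFI. It shows the inlined clone receiving a share of the callee's counts proportional to the call site count, the callee keeping the remainder, and both adjustments being skipped when the flag is off.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a function's profile: an entry count
// plus per-block counts. Illustrative only; not an LLVM data structure.
struct ToyProfile {
  uint64_t EntryCount;
  std::vector<uint64_t> BlockCounts;
};

// Scale every count in P by Num/Den, rounding down. A zero denominator
// yields zero counts. (A toy: no overflow protection for huge counts.)
ToyProfile scaleProfile(const ToyProfile &P, uint64_t Num, uint64_t Den) {
  ToyProfile R;
  R.EntryCount = Den ? P.EntryCount * Num / Den : 0;
  for (uint64_t C : P.BlockCounts)
    R.BlockCounts.push_back(Den ? C * Num / Den : 0);
  return R;
}

struct InlineResult {
  ToyProfile InlinedClone;    // counts attached to the copy in the caller
  ToyProfile RemainingCallee; // counts left on the out-of-line callee
};

// Toy model of the behavior the summary describes.
// UpdateProfile == true : regular (CGSCC-style) inlining. The clone gets
//   CallSiteCount / EntryCount of each count; the callee keeps the rest.
// UpdateProfile == false: sample-loader-style inlining. Nothing is scaled or
//   subtracted here, on the assumption that a separate context profile is
//   applied to the clone afterwards.
InlineResult inlineWithProfile(const ToyProfile &Callee,
                               uint64_t CallSiteCount, bool UpdateProfile) {
  if (!UpdateProfile)
    return {Callee, Callee};
  uint64_t Entry = Callee.EntryCount;
  uint64_t Taken = std::min(CallSiteCount, Entry);
  return {scaleProfile(Callee, Taken, Entry),
          scaleProfile(Callee, Entry - Taken, Entry)};
}
```

For a callee with entry count 100 and block counts {100, 60, 40} inlined at a call site with count 25, the toy model gives the clone {25, 15, 10} and leaves {75, 45, 30} on the callee when the flag is on, and leaves every count untouched when it is off.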

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wenlei created this revision. Mar 8 2021, 8:33 AM

Herald added subscribers: modimo, lxfind, hiraditya. Mar 8 2021, 8:33 AM

wenlei requested review of this revision. Mar 8 2021, 8:33 AM

Herald added a project: Restricted Project. Mar 8 2021, 8:33 AM

Herald added a subscriber: llvm-commits.

We have this change internally, and it shows a 0.2-0.3% geomean win on SPEC2017 with CSSPGO (driven by small wins on larger benchmarks such as xalancbmk). I think it should help baseline AutoFDO too.
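For readers unfamiliar with the metric, a geomean win is the geometric mean of per-benchmark speedup ratios. A minimal sketch follows; the `geomean` helper is a hypothetical name, and no actual benchmark data from this review is reproduced.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark speedup ratios (baseline time / new time).
// Computed in log space so long lists of ratios do not overflow or underflow.
// Returns 1.0 (no change) for an empty list.
double geomean(const std::vector<double> &Ratios) {
  if (Ratios.empty())
    return 1.0;
  double LogSum = 0.0;
  for (double R : Ratios)
    LogSum += std::log(R);
  return std::exp(LogSum / static_cast<double>(Ratios.size()));
}
```

A result of 1.002 to 1.003 from such a computation would correspond to the 0.2-0.3% win quoted above.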

hoy accepted this revision. Mar 8 2021, 11:41 AM

This revision is now accepted and ready to land. Mar 8 2021, 11:41 AM

Harbormaster completed remote builds in B92677: Diff 329032. Mar 8 2021, 12:44 PM

@wmi any concern with landing this change? I don't expect any issues, but just to be prudent do you want to experiment before this is landed?

In D98187#2618605, @wenlei wrote:

@wmi any concern with landing this change? I don't expect any issues, but just to be prudent do you want to experiment before this is landed?

That is a nice catch! I will definitely experiment with it and expect some improvement from it. I will have the data tomorrow and get back to you.

In D98187#2618612, @wmi wrote:

That is a nice catch! I will definitely experiment with it and expect some improvement from it. I will have the data tomorrow and get back to you.

I got ~0.2% improvement on our search benchmark. That is a nice improvement. Thanks!

In D98187#2619936, @wmi wrote:

I got ~0.2% improvement on our search benchmark. That is a nice improvement. Thanks!

Thanks for the measurement, great to know it helps.

Closed by commit rG051f2c144e1e: [SamplePGO] Skip inlinee profile scaling for sample loader inlining (authored by wenlei). Mar 11 2021, 10:18 AM

This revision was automatically updated to reflect the committed changes.

wenlei added a commit: rG051f2c144e1e: [SamplePGO] Skip inlinee profile scaling for sample loader inlining.

wmi mentioned this in D99394: [SampleFDO] Do not scale the magic number NOMORE_ICP_MAGICNUM in value profile during profile update. Mar 25 2021, 5:30 PM

wmi mentioned this in rG3cbf44190b59: [SampleFDO] Do not scale the magic number NOMORE_ICP_MAGICNUM in value profile. Mar 29 2021, 9:34 AM

Revision Contents

Path                                            Size
llvm/include/llvm/Transforms/Utils/Cloning.h    9 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp       1 line
llvm/lib/Transforms/Utils/InlineFunction.cpp    20 lines

Diff 330011

llvm/include/llvm/Transforms/Utils/Cloning.h

 /// the auxiliary results produced by it.
 class InlineFunctionInfo {
 public:
   explicit InlineFunctionInfo(
       CallGraph *cg = nullptr,
       function_ref<AssumptionCache &(Function &)> GetAssumptionCache = nullptr,
       ProfileSummaryInfo *PSI = nullptr,
       BlockFrequencyInfo *CallerBFI = nullptr,
-      BlockFrequencyInfo *CalleeBFI = nullptr)
+      BlockFrequencyInfo *CalleeBFI = nullptr, bool UpdateProfile = true)
       : CG(cg), GetAssumptionCache(GetAssumptionCache), PSI(PSI),
-        CallerBFI(CallerBFI), CalleeBFI(CalleeBFI) {}
+        CallerBFI(CallerBFI), CalleeBFI(CalleeBFI),
+        UpdateProfile(UpdateProfile) {}

   /// If non-null, InlineFunction will update the callgraph to reflect the
   /// changes it makes.
   CallGraph *CG;
   function_ref<AssumptionCache &(Function &)> GetAssumptionCache;
   ProfileSummaryInfo *PSI;
   BlockFrequencyInfo *CallerBFI, *CalleeBFI;

   /// InlineFunction fills this in with all static allocas that get copied into
   /// the caller.
   SmallVector<AllocaInst *, 4> StaticAllocas;

   /// InlineFunction fills this in with callsites that were inlined from the
   /// callee. This is only filled in if CG is non-null.
   SmallVector<WeakTrackingVH, 8> InlinedCalls;

   /// All of the new call sites inlined into the caller.
   ///
   /// 'InlineFunction' fills this in by scanning the inlined instructions, and
   /// only if CG is null. If CG is non-null, instead the value handle
   /// `InlinedCalls` above is used.
   SmallVector<CallBase *, 8> InlinedCallSites;

+  /// Update profile for callee as well as cloned version. We need to do this
+  /// for regular inlining, but not for inlining from sample profile loader.
+  bool UpdateProfile;
+
   void reset() {
     StaticAllocas.clear();
     InlinedCalls.clear();
     InlinedCallSites.clear();
   }
 };

 /// This function inlines the called function into the basic

llvm/lib/Transforms/IPO/SampleProfile.cpp

       ORE->emit(OptimizationRemarkAnalysis(CSINLINE_DEBUG, "InlineFail", DLoc, BB)
                 << "incompatible inlining");
       return false;
     }

     if (!Cost)
       return false;

     InlineFunctionInfo IFI(nullptr, GetAC);
+    IFI.UpdateProfile = false;
     if (InlineFunction(CB, IFI).isSuccess()) {
       // The call to InlineFunction erases I, so we can't pass it here.
       emitInlinedInto(*ORE, DLoc, BB, *CalledFunction, *BB->getParent(), Cost,
                       true, CSINLINE_DEBUG);

       // Now populate the list of newly exposed call sites.
       if (InlinedCallSites) {
         InlinedCallSites->clear();

llvm/lib/Transforms/Utils/InlineFunction.cpp

     CloneAndPruneFunctionInto(Caller, CalledFunc, VMap,
                               &InlinedFunctionInfo, &CB);
     // Remember the first block that is newly cloned over.
     FirstNewBlock = LastBlock; ++FirstNewBlock;

     // Insert retainRV/clainRV runtime calls.
     if (objcarc::hasAttachedCallOpBundle(&CB))
       inlineRetainOrClaimRVCalls(CB, Returns);

-    if (IFI.CallerBFI != nullptr && IFI.CalleeBFI != nullptr)
-      // Update the BFI of blocks cloned into the caller.
-      updateCallerBFI(OrigBB, VMap, IFI.CallerBFI, IFI.CalleeBFI,
-                      CalledFunc->front());
-
-    updateCallProfile(CalledFunc, VMap, CalledFunc->getEntryCount(), CB,
-                      IFI.PSI, IFI.CallerBFI);
+    // Updated caller/callee profiles only when requested. For sample loader
+    // inlining, the context-sensitive inlinee profile doesn't need to be
+    // subtracted from callee profile, and the inlined clone also doesn't need
+    // to be scaled based on call site count.
+    if (IFI.UpdateProfile) {
+      if (IFI.CallerBFI != nullptr && IFI.CalleeBFI != nullptr)
+        // Update the BFI of blocks cloned into the caller.
+        updateCallerBFI(OrigBB, VMap, IFI.CallerBFI, IFI.CalleeBFI,
+                        CalledFunc->front());
+
+      updateCallProfile(CalledFunc, VMap, CalledFunc->getEntryCount(), CB,
+                        IFI.PSI, IFI.CallerBFI);
+    }

     // Inject byval arguments initialization.
     for (std::pair<Value *, Value *> &Init : ByValInit)
       HandleByValArgumentInit(Init.first, Init.second, Caller->getParent(),
                               &*FirstNewBlock, IFI);

     Optional<OperandBundleUse> ParentDeopt =
         CB.getOperandBundle(LLVMContext::OB_deopt);