This is an archive of the discontinued LLVM Phabricator instance.


Differential D70655

[AutoFDO] Top-down Inlining for specialization with context-sensitive profile
ClosedPublic

Authored by wenlei on Nov 25 2019, 12:06 AM.


Details

Reviewers

wmi
davidxl

Commits

rG532196d811ad: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile

Summary

AutoFDO's sample profile loader processes functions in arbitrary source code order, so changing the order of two functions in the source code can change the inline decisions. This also prevents the use of a context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for the sample profile loader. With this change, we can now do specialization, as illustrated by the added test case:

Say we have the call paths A->B->C and D->B->C. We want to inline C into B when the root inliner is B, but not when the root inliner is A or D. This is not possible without enforcing top-down order: once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, whereas what we want is to inline only B into A and D, not its callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit does.

This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wenlei created this revision. Nov 25 2019, 12:06 AM

Herald added a project: Restricted Project. Nov 25 2019, 12:06 AM

Herald added subscribers: llvm-commits, hiraditya.

wenlei mentioned this in D52845: Update entry count for cold calls. Nov 25 2019, 12:41 AM

This looks good. Can this be handled for cross module (thinLTO) case somehow too?

Can this be handled for cross module (thinLTO) case somehow too?

That'd be nice, but the parallelization and isolation of thin-backends make two things difficult: 1) enforcing a global top-down order; 2) adjusting the profile based on inline decisions across thin-backends, e.g. what D70653 was trying to do (the current adjustment, scale or merge, only happens on the imported clone, which is sometimes not very useful).

What I'm thinking about is to have ThinLink make global inline decisions without reading IR (e.g. based on profile and summary) so it's still reasonably thin, and also adjust the profile globally based on those inline decisions. In addition, ThinLink also needs to pass the adjusted profile and inline decisions to thin-backends for execution. (Not doing the mechanics of inlining during ThinLink, as that can slow it down a lot. The challenge is that without seeing the IR, the inline decision is going to be an approximation, though we may be able to make it very close, as the inline heuristics used by the sample loader are relatively simple too.)

I think conceptually this shouldn't be too disruptive to the current implementation, as the sample loader is very early in the LTO-time passes, which is close to ThinLink. But implementation-wise, this is going to be a lot of changes, and a bit intrusive. I chatted with @tejohnson about this during the LLVM dev meeting, and I'd love to hear alternative ideas too.

One way to handle it is 1) delay early inlining of sites into a hot function if a big percentage of calls to the function are from other modules. This still allows intra-module top-down inlining of it; or 2) keep a clone of the uninlined body of the original function and use that one during cross-module inlining.

In D70655#1759165, @davidxl wrote:

One way to handle it is 1) delay early inlining of sites into a hot function if a big percentage of calls to the function are from other modules. This still allows intra-module top-down inlining of it; or 2) keep a clone of the uninlined body of the original function and use that one during cross-module inlining.

Both would work for leveraging a context-sensitive profile to drive better inlining with specialization, if the profile at the time of processing a specific function is accurate. However, the problem of not having cross-module profile adjustment for inlining still exists. With top-down inlining, profile adjustment and PGO inlining form an iterative process: in the call graph, a lower function's inlining relies on the higher function's inlining (and its profile adjustment). So not being able to adjust the profile across thin-backends not only affects post-inline profile fidelity, it also limits PGO inlining.

In short, I think the two ideas will give us some mechanism to do specialization during inlining, but it still misses good (adjusted) profile that drives that kind of inlining.

What I mentioned should be complementary to the top-down method in this patch -- it just allows the full top-down to be doable for the cross-module scenario as well.

That is a good catch. From my understanding it could potentially reduce inlining in some cold places and potentially increase regular inlining. Do you see how much impact on performance and code size by changing it? I will do some evaluation on my side.

llvm/lib/Transforms/IPO/SampleProfile.cpp
434–435	If the buildFunctionOrder call is moved to runOnModule, we don't need the field and buildFunctionOrder can return a function order vector.
1749	We can move the check about whether the Function is a declaration to here, so FunctionOrderList can be a little bit smaller.
1807	The check can be moved above.
llvm/test/Transforms/SampleProfile/inline-topdown.ll
19	rename the variable using opt -instnamer.

wmi mentioned this in D70653: [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function. Nov 25 2019, 12:15 PM

Thanks for the discussions and code review. I'll address the comments later, but wanted to answer this one first.

Do you see how much impact on performance and code size by changing it? I will do some evaluation on my side.

I ran it with MySQL linkbench (https://github.com/facebookarchive/linkbench), and it showed a ~0.5% performance win and a slight code size increase (<1%) with D70653, this patch, and another one combined. (The other one lets inline replay also inline small functions if the CGSCC inliner will inline them eventually. I'll upstream that one too.)

I haven't tried to measure the breakdown yet, but I think the profile merge and top-down inline changes should be the most important ones. I plan to experiment more with each individual change, and I'm also interested in what you see from your workload. (I'm putting each of these changes under a switch so it's easier for all of us to measure and tune.)

address review feedback

Harbormaster completed remote builds in B41474: Diff 230985. Nov 25 2019, 4:10 PM

wenlei marked 2 inline comments as done. Nov 25 2019, 4:12 PM

wenlei added inline comments.

llvm/lib/Transforms/IPO/SampleProfile.cpp
434–435	good point, thanks for the suggestion. I moved `buildFunctionOrder` into `runOnModule`, and also moved `!isDeclaration` check into `buildFunctionOrder`.
llvm/test/Transforms/SampleProfile/inline-topdown.ll
19	done.

I ran a performance test for this change and the result is neutral.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1808	Add some message for the assertion.

This revision is now accepted and ready to land. Nov 27 2019, 9:18 AM

Closed by commit rG532196d811ad: [AutoFDO] Top-down Inlining for specialization with context-sensitive profile (authored by wenlei). Dec 5 2019, 4:23 PM

This revision was automatically updated to reflect the committed changes.

wenlei mentioned this in D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining. Oct 25 2020, 1:02 PM

ychen mentioned this in D82919: [SampleFDO] Enable sample-profile-top-down-load by default. Nov 30 2020, 7:54 PM

ychen added a subscriber: ychen. Nov 30 2020, 7:56 PM

ychen added inline comments.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1826	The new pass manager computes the call graph but here it is skipped. Should we also compute the call graph here?

Herald added subscribers: hoy, modimo, lxfind. Nov 30 2020, 7:56 PM

hoy added inline comments. Dec 1 2020, 9:36 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
1826	Passing a call graph into the sample loader sounds helpful to me. I'm not sure if there is an available `FunctionAnalysisManager` object so that the call graph can be reused. Computing a call graph on the spot might hurt the compiler throughput. @wenlei can you shed light on this?

wenlei mentioned this in rG6b989a171073: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining. Dec 6 2020, 12:12 PM

wenlei mentioned this in D94001: [CSSPGO] Call site prioritized inlining for sample PGO. Jan 3 2021, 4:45 PM

wenlei mentioned this in rG6bae5973c476: [CSSPGO] Call site prioritized inlining for sample PGO. Feb 1 2021, 11:47 PM

wmi mentioned this in D95988: [CSSPGO] Process functions in a top-down order on a dynamic call graph. Feb 11 2021, 10:48 AM

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/SampleProfile.cpp	55 lines
llvm/test/Transforms/SampleProfile/Inputs/inline-topdown.prof	10 lines
llvm/test/Transforms/SampleProfile/inline-topdown.ll	123 lines

Diff 232469

llvm/lib/Transforms/IPO/SampleProfile.cpp

[20 lines not shown]
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO/SampleProfile.h"		#include "llvm/Transforms/IPO/SampleProfile.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
		#include "llvm/ADT/SCCIterator.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
		#include "llvm/Analysis/CallGraph.h"
		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Analysis/InlineCost.h"		#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
[93 lines not shown]	static cl::opt<bool> ProfileAccurateForSymsInList(
cl::desc("For symbols in profile symbol list, regard their profiles to "		cl::desc("For symbols in profile symbol list, regard their profiles to "
"be accurate. It may be overriden by profile-sample-accurate. "));		"be accurate. It may be overriden by profile-sample-accurate. "));

static cl::opt<bool> ProfileMergeInlinee(		static cl::opt<bool> ProfileMergeInlinee(
"sample-profile-merge-inlinee", cl::Hidden, cl::init(false),		"sample-profile-merge-inlinee", cl::Hidden, cl::init(false),
cl::desc("Merge past inlinee's profile to outline version if sample "		cl::desc("Merge past inlinee's profile to outline version if sample "
"profile loader decided not to inline a call site."));		"profile loader decided not to inline a call site."));

		static cl::opt<bool> ProfileTopDownLoad(
		"sample-profile-top-down-load", cl::Hidden, cl::init(false),
		cl::desc("Do profile annotation and inlining for functions in top-down "
		"order of call graph during sample profile loading."));

namespace {		namespace {

using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;		using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;
using EquivalenceClassMap = DenseMap<const BasicBlock *, const BasicBlock *>;		using EquivalenceClassMap = DenseMap<const BasicBlock *, const BasicBlock *>;
using Edge = std::pair<const BasicBlock *, const BasicBlock *>;		using Edge = std::pair<const BasicBlock *, const BasicBlock *>;
using EdgeWeightMap = DenseMap<Edge, uint64_t>;		using EdgeWeightMap = DenseMap<Edge, uint64_t>;
using BlockEdgeMap =		using BlockEdgeMap =
DenseMap<const BasicBlock *, SmallVector<const BasicBlock *, 8>>;		DenseMap<const BasicBlock *, SmallVector<const BasicBlock *, 8>>;
[133 lines not shown]	SampleProfileLoader(
std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo)		std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo)
: GetAC(std::move(GetAssumptionCache)),		: GetAC(std::move(GetAssumptionCache)),
GetTTI(std::move(GetTargetTransformInfo)), CoverageTracker(*this),		GetTTI(std::move(GetTargetTransformInfo)), CoverageTracker(*this),
Filename(Name), RemappingFilename(RemapName),		Filename(Name), RemappingFilename(RemapName),
IsThinLTOPreLink(IsThinLTOPreLink) {}		IsThinLTOPreLink(IsThinLTOPreLink) {}

bool doInitialization(Module &M);		bool doInitialization(Module &M);
bool runOnModule(Module &M, ModuleAnalysisManager *AM,		bool runOnModule(Module &M, ModuleAnalysisManager *AM,
ProfileSummaryInfo *_PSI);		ProfileSummaryInfo *_PSI, CallGraph *CG);

void dump() { Reader->dump(); }		void dump() { Reader->dump(); }

protected:		protected:
friend class SampleCoverageTracker;		friend class SampleCoverageTracker;

bool runOnFunction(Function &F, ModuleAnalysisManager *AM);		bool runOnFunction(Function &F, ModuleAnalysisManager *AM);
unsigned getFunctionLoc(Function &F);		unsigned getFunctionLoc(Function &F);
[15 lines not shown]	protected:
void findEquivalenceClasses(Function &F);		void findEquivalenceClasses(Function &F);
template <bool IsPostDom>		template <bool IsPostDom>
void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,		void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
DominatorTreeBase<BasicBlock, IsPostDom> *DomTree);		DominatorTreeBase<BasicBlock, IsPostDom> *DomTree);

void propagateWeights(Function &F);		void propagateWeights(Function &F);
uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);		uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);
void buildEdges(Function &F);		void buildEdges(Function &F);
		std::vector<Function *> buildFunctionOrder(Module &M, CallGraph *CG);
bool propagateThroughEdges(Function &F, bool UpdateBlockCount);		bool propagateThroughEdges(Function &F, bool UpdateBlockCount);
void computeDominanceAndLoopInfo(Function &F);		void computeDominanceAndLoopInfo(Function &F);
void clearFunctionData();		void clearFunctionData();
bool callsiteIsHot(const FunctionSamples *CallsiteFS,		bool callsiteIsHot(const FunctionSamples *CallsiteFS,
ProfileSummaryInfo *PSI);		ProfileSummaryInfo *PSI);

/// Map basic blocks to their computed weights.		/// Map basic blocks to their computed weights.
///		///
[83 lines not shown]	protected:
// Information recorded when we declined to inline a call site		// Information recorded when we declined to inline a call site
// because we have determined it is too cold is accumulated for		// because we have determined it is too cold is accumulated for
// each callee function. Initially this is just the entry count.		// each callee function. Initially this is just the entry count.
struct NotInlinedProfileInfo {		struct NotInlinedProfileInfo {
uint64_t entryCount;		uint64_t entryCount;
};		};
DenseMap<Function *, NotInlinedProfileInfo> notInlinedCallInfo;		DenseMap<Function *, NotInlinedProfileInfo> notInlinedCallInfo;

// GUIDToFuncNameMap saves the mapping from GUID to the symbol name, for		// GUIDToFuncNameMap saves the mapping from GUID to the symbol name, for
// all the function symbols defined or declared in current module.		// all the function symbols defined or declared in current module.
		[Inline comment, wmi] If the buildFunctionOrder call is moved to runOnModule, we don't need the field and buildFunctionOrder can return a function order vector.
		[Inline comment, wenlei (author), done] good point, thanks for the suggestion. I moved `buildFunctionOrder` into `runOnModule`, and also moved `!isDeclaration` check into `buildFunctionOrder`.
DenseMap<uint64_t, StringRef> GUIDToFuncNameMap;		DenseMap<uint64_t, StringRef> GUIDToFuncNameMap;

// All the Names used in FunctionSamples including outline function		// All the Names used in FunctionSamples including outline function
// names, inline instance names and call target names.		// names, inline instance names and call target names.
StringSet<> NamesInProfile;		StringSet<> NamesInProfile;

// For symbol in profile symbol list, whether to regard their profiles		// For symbol in profile symbol list, whether to regard their profiles
// to be accurate. It is mainly decided by existance of profile symbol		// to be accurate. It is mainly decided by existance of profile symbol
[1,256 lines not shown]
INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)

		std::vector<Function *>
		SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {
		  std::vector<Function *> FunctionOrderList;
		  FunctionOrderList.reserve(M.size());

		  if (!ProfileTopDownLoad || CG == nullptr) {
		    for (Function &F : M)
		      if (!F.isDeclaration())
		        FunctionOrderList.push_back(&F);
		    return FunctionOrderList;
		  }

		  assert(&CG->getModule() == &M);
		  scc_iterator<CallGraph *> CGI = scc_begin(CG);
		  while (!CGI.isAtEnd()) {
		    for (CallGraphNode *node : *CGI) {
		      auto F = node->getFunction();
		      if (F && !F->isDeclaration())
		        FunctionOrderList.push_back(F);
		    }
		    ++CGI;
		  }

		  std::reverse(FunctionOrderList.begin(), FunctionOrderList.end());
		  return FunctionOrderList;
		}

bool SampleProfileLoader::doInitialization(Module &M) {		bool SampleProfileLoader::doInitialization(Module &M) {
auto &Ctx = M.getContext();		auto &Ctx = M.getContext();

std::unique_ptr<SampleProfileReaderItaniumRemapper> RemapReader;		std::unique_ptr<SampleProfileReaderItaniumRemapper> RemapReader;
auto ReaderOrErr =		auto ReaderOrErr =
SampleProfileReader::create(Filename, Ctx, RemappingFilename);		SampleProfileReader::create(Filename, Ctx, RemappingFilename);
if (std::error_code EC = ReaderOrErr.getError()) {		if (std::error_code EC = ReaderOrErr.getError()) {
std::string Msg = "Could not open profile: " + EC.message();		std::string Msg = "Could not open profile: " + EC.message();
Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));		Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));
return false;		return false;
}		}
Reader = std::move(ReaderOrErr.get());		Reader = std::move(ReaderOrErr.get());
Reader->collectFuncsFrom(M);		Reader->collectFuncsFrom(M);
ProfileIsValid = (Reader->read() == sampleprof_error::success);		ProfileIsValid = (Reader->read() == sampleprof_error::success);
PSL = Reader->getProfileSymbolList();		PSL = Reader->getProfileSymbolList();
		[Inline comment, wmi] We can move the check about whether the Function is a declaration to here, so FunctionOrderList can be a little bit smaller.

// While profile-sample-accurate is on, ignore symbol list.		// While profile-sample-accurate is on, ignore symbol list.
ProfAccForSymsInList =		ProfAccForSymsInList =
ProfileAccurateForSymsInList && PSL && !ProfileSampleAccurate;		ProfileAccurateForSymsInList && PSL && !ProfileSampleAccurate;
if (ProfAccForSymsInList) {		if (ProfAccForSymsInList) {
NamesInProfile.clear();		NamesInProfile.clear();
if (auto NameTable = Reader->getNameTable())		if (auto NameTable = Reader->getNameTable())
NamesInProfile.insert(NameTable->begin(), NameTable->end());		NamesInProfile.insert(NameTable->begin(), NameTable->end());
}		}

return true;		return true;
}		}

ModulePass *llvm::createSampleProfileLoaderPass() {		ModulePass *llvm::createSampleProfileLoaderPass() {
return new SampleProfileLoaderLegacyPass();		return new SampleProfileLoaderLegacyPass();
}		}

ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {		ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {
return new SampleProfileLoaderLegacyPass(Name);		return new SampleProfileLoaderLegacyPass(Name);
}		}

bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,		bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
ProfileSummaryInfo *_PSI) {		ProfileSummaryInfo *_PSI, CallGraph *CG) {
GUIDToFuncNameMapper Mapper(M, *Reader, GUIDToFuncNameMap);		GUIDToFuncNameMapper Mapper(M, *Reader, GUIDToFuncNameMap);
if (!ProfileIsValid)		if (!ProfileIsValid)
return false;		return false;

PSI = _PSI;		PSI = _PSI;
if (M.getProfileSummary(/* IsCS */ false) == nullptr)		if (M.getProfileSummary(/* IsCS */ false) == nullptr)
M.setProfileSummary(Reader->getSummary().getMD(M.getContext()),		M.setProfileSummary(Reader->getSummary().getMD(M.getContext()),
ProfileSummary::PSK_Sample);		ProfileSummary::PSK_Sample);
[18 lines not shown]	if (pos != StringRef::npos) {
// stripped name. In this case of name conflicting, set the value		// stripped name. In this case of name conflicting, set the value
// to nullptr to avoid confusion.		// to nullptr to avoid confusion.
if (!r.second)		if (!r.second)
r.first->second = nullptr;		r.first->second = nullptr;
}		}
}		}

bool retval = false;		bool retval = false;
for (auto &F : M)		for (auto F : buildFunctionOrder(M, CG)) {
		[Inline comment, wmi] The check can be moved above.
if (!F.isDeclaration()) {		assert(!F->isDeclaration());
		[Inline comment, wmi] Add some message for the assertion.
clearFunctionData();		clearFunctionData();
retval |= runOnFunction(F, AM);		retval |= runOnFunction(*F, AM);
}		}

// Account for cold calls not inlined....		// Account for cold calls not inlined....
for (const std::pair<Function *, NotInlinedProfileInfo> &pair :		for (const std::pair<Function *, NotInlinedProfileInfo> &pair :
notInlinedCallInfo)		notInlinedCallInfo)
updateProfileCallee(pair.first, pair.second.entryCount);		updateProfileCallee(pair.first, pair.second.entryCount);

return retval;		return retval;
}		}

bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {		bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {
ACT = &getAnalysis<AssumptionCacheTracker>();		ACT = &getAnalysis<AssumptionCacheTracker>();
TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();		TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
return SampleLoader.runOnModule(M, nullptr, PSI);		return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);
		[Inline comment, ychen] The new pass manager computes the call graph but here it is skipped. Should we also compute the call graph here?
		[Inline comment, hoy] Passing a call graph into the sample loader sounds helpful to me. I'm not sure if there is an available `FunctionAnalysisManager` object so that the call graph can be reused. Computing a call graph on the spot might hurt the compiler throughput. @wenlei can you shed light on this?
}		}

bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {		bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {

DILocation2SampleMap.clear();		DILocation2SampleMap.clear();
// By default the entry count is initialized to -1, which will be treated		// By default the entry count is initialized to -1, which will be treated
// conservatively by getEntryCount as the same as unknown (None). This is		// conservatively by getEntryCount as the same as unknown (None). This is
// to avoid newly added code to be treated as cold. If we have samples		// to avoid newly added code to be treated as cold. If we have samples
[67 lines not shown]	SampleProfileLoader SampleLoader(
ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,		ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,
ProfileRemappingFileName.empty() ? SampleProfileRemappingFile		ProfileRemappingFileName.empty() ? SampleProfileRemappingFile
: ProfileRemappingFileName,		: ProfileRemappingFileName,
IsThinLTOPreLink, GetAssumptionCache, GetTTI);		IsThinLTOPreLink, GetAssumptionCache, GetTTI);

SampleLoader.doInitialization(M);		SampleLoader.doInitialization(M);

ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);		ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);
if (!SampleLoader.runOnModule(M, &AM, PSI))		CallGraph &CG = AM.getResult<CallGraphAnalysis>(M);
		if (!SampleLoader.runOnModule(M, &AM, PSI, &CG))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

return PreservedAnalyses::none();		return PreservedAnalyses::none();
}		}

llvm/test/Transforms/SampleProfile/Inputs/inline-topdown.prof

This file was added.

				main:225715:0
				2.1: 5553
				3: 5391
				3.1: _Z3sumii:50000
				1: _Z3subii:0
				1: 0

				_Z3sumii:6010:50000
				1: _Z3subii:60000
				1: 9
				No newline at end of file

llvm/test/Transforms/SampleProfile/inline-topdown.ll

This file was added.

				; Note that this needs new pass manager for now. Passing `-sample-profile-top-down-load` to legacy pass manager is a no-op.

				; Test we aren't doing specialization for inlining with default source order
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -S | FileCheck -check-prefix=DEFAULT %s

				; Test we specialize based on call path with context-sensitive profile while inlining with '-sample-profile-top-down-load'
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -S | FileCheck -check-prefix=TOPDOWN %s


				@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1

				define i32 @_Z3sumii(i32 %x, i32 %y) !dbg !6 {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %y, i32* %y.addr, align 4
				%tmp = load i32, i32* %x.addr, align 4, !dbg !8
				%tmp1 = load i32, i32* %y.addr, align 4, !dbg !8
				[Inline comment, wmi] rename the variable using opt -instnamer.
				[Inline comment, wenlei (author), done] done.
				%add = add nsw i32 %tmp, %tmp1, !dbg !8
				%tmp2 = load i32, i32* %x.addr, align 4, !dbg !8
				%tmp3 = load i32, i32* %y.addr, align 4, !dbg !8
				%call = call i32 @_Z3subii(i32 %tmp2, i32 %tmp3), !dbg !8
				ret i32 %add, !dbg !8
				}

				define i32 @_Z3subii(i32 %x, i32 %y) !dbg !9 {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %y, i32* %y.addr, align 4
				%tmp = load i32, i32* %x.addr, align 4, !dbg !10
				%tmp1 = load i32, i32* %y.addr, align 4, !dbg !10
				%add = sub nsw i32 %tmp, %tmp1, !dbg !10
				ret i32 %add, !dbg !11
				}

				define i32 @main() !dbg !12 {
				entry:
				%retval = alloca i32, align 4
				%s = alloca i32, align 4
				%i = alloca i32, align 4
				store i32 0, i32* %retval
				store i32 0, i32* %i, align 4, !dbg !13
				br label %while.cond, !dbg !14

				while.cond: ; preds = %if.end, %entry
				%tmp = load i32, i32* %i, align 4, !dbg !15
				%inc = add nsw i32 %tmp, 1, !dbg !15
				store i32 %inc, i32* %i, align 4, !dbg !15
				%cmp = icmp slt i32 %tmp, 400000000, !dbg !15
				br i1 %cmp, label %while.body, label %while.end, !dbg !15

				while.body: ; preds = %while.cond
				%tmp1 = load i32, i32* %i, align 4, !dbg !17
				%cmp1 = icmp ne i32 %tmp1, 100, !dbg !17
				br i1 %cmp1, label %if.then, label %if.else, !dbg !17

				if.then: ; preds = %while.body
				%tmp2 = load i32, i32* %i, align 4, !dbg !19
				%tmp3 = load i32, i32* %s, align 4, !dbg !19
				%call = call i32 @_Z3sumii(i32 %tmp2, i32 %tmp3), !dbg !19
				store i32 %call, i32* %s, align 4, !dbg !19
				br label %if.end, !dbg !19

				if.else: ; preds = %while.body
				store i32 30, i32* %s, align 4, !dbg !21
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				br label %while.cond, !dbg !23

				while.end: ; preds = %while.cond
				%tmp4 = load i32, i32* %s, align 4, !dbg !25
				%call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i32 0, i32 0), i32 %tmp4), !dbg !25
				ret i32 0, !dbg !26
				}

				declare i32 @printf(i8*, ...)

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4}
				!llvm.ident = !{!5}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 3.5 ", isOptimized: false, runtimeVersion: 0, emissionKind: NoDebug, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "calls.cc", directory: ".")
				!2 = !{}
				!3 = !{i32 2, !"Dwarf Version", i32 4}
				!4 = !{i32 1, !"Debug Info Version", i32 3}
				!5 = !{!"clang version 3.5 "}
				!6 = distinct !DISubprogram(name: "sum", scope: !1, file: !1, line: 3, type: !7, scopeLine: 3, virtualIndex: 6, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!7 = !DISubroutineType(types: !2)
				!8 = !DILocation(line: 4, scope: !6)
				!9 = distinct !DISubprogram(name: "sub", scope: !1, file: !1, line: 20, type: !7, scopeLine: 20, virtualIndex: 6, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!10 = !DILocation(line: 20, scope: !9)
				!11 = !DILocation(line: 21, scope: !9)
				!12 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !7, scopeLine: 7, virtualIndex: 6, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!13 = !DILocation(line: 8, scope: !12)
				!14 = !DILocation(line: 9, scope: !12)
				!15 = !DILocation(line: 9, scope: !16)
				!16 = !DILexicalBlockFile(scope: !12, file: !1, discriminator: 2)
				!17 = !DILocation(line: 10, scope: !18)
				!18 = distinct !DILexicalBlock(scope: !12, file: !1, line: 10)
				!19 = !DILocation(line: 10, scope: !20)
				!20 = !DILexicalBlockFile(scope: !18, file: !1, discriminator: 2)
				!21 = !DILocation(line: 10, scope: !22)
				!22 = !DILexicalBlockFile(scope: !18, file: !1, discriminator: 4)
				!23 = !DILocation(line: 10, scope: !24)
				!24 = !DILexicalBlockFile(scope: !18, file: !1, discriminator: 6)
				!25 = !DILocation(line: 11, scope: !12)
				!26 = !DILocation(line: 12, scope: !12)


				; DEFAULT: @_Z3sumii
				; DEFAULT-NOT: call i32 @_Z3subii
				; DEFAULT: @main()
				; DEFAULT-NOT: call i32 @_Z3subii

				; TOPDOWN: @_Z3sumii
				; TOPDOWN-NOT: call i32 @_Z3subii
				; TOPDOWN: @main()
				; TOPDOWN: call i32 @_Z3subii
				No newline at end of file