This is an archive of the discontinued LLVM Phabricator instance.


Differential D99351

[CSSPGO] Top-down processing order based on full profile.
Closed · Public

Authored by hoy on Mar 25 2021, 9:09 AM.


Details

Reviewers

davidxl
wmi
wenlei

Commits

rG3e3fc431dfe4: [CSSPGO] Top-down processing order based on full profile.

Summary

Use profiled call edges to augment the top-down order. There are cases where the top-down order computed from the static call graph doesn't reflect the real execution order. For example:

1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller to be processed before them.

2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, and thus may cause potential inlining to be overlooked. The function order within an SCC is adjusted to a top-down order based on the profile to favor more inlining.

3. Transitive indirect call edges due to inlining. When a callee function is inlined into a caller function in LTO prelink, every call edge originating from the callee is transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink, since the inlined callee is gone from the static call graph.

4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times, but only one definition survives due to the ODR. Therefore, the LTO prelink inlining done on the dropped definitions can be useless from a local file scope. More importantly, the inlinee, once fully inlined into a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function, which may be called from external modules. While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the surviving copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.

Considering those cases, a profiled call graph completely independent of the static call graph is constructed from profile data. Since function objects are not needed at all, this also handles cases #3 and #4, where the relevant functions may no longer exist in the IR.
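To make the ordering idea concrete, here is a minimal standalone C++ sketch, not the patch's code: nodes are function names taken from profile data, SCCs are found with Tarjan's algorithm, and the bottom-up SCC sequence Tarjan produces is reversed to get a top-down order. `ToyProfiledCallGraph` and its methods are hypothetical; the real `ProfiledCallGraph` in the diff exposes a `GraphTraits` specialization instead, and starting a DFS from every node here stands in for its synthetic root.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy version of a profiled call graph. Nodes are plain function
// names taken from profile data, so no IR Function objects are required.
class ToyProfiledCallGraph {
public:
  void addFunction(const std::string &Name) { Edges[Name]; } // no-op if present

  void addCall(const std::string &Caller, const std::string &Callee) {
    addFunction(Caller);
    addFunction(Callee);
    Edges[Caller].insert(Callee);
  }

  // Returns SCCs in top-down order: a caller's SCC precedes its callees' SCCs.
  std::vector<std::vector<std::string>> topDownSCCs() {
    Index.clear();
    LowLink.clear();
    OnStack.clear();
    Stack.clear();
    SCCs.clear();
    Counter = 0;
    // Starting a DFS from every unvisited node has the same effect as a
    // synthetic root linked to all functions: every node is reached, and the
    // root itself never participates in a cycle.
    for (const auto &Entry : Edges)
      if (!Index.count(Entry.first))
        strongConnect(Entry.first);
    assert(Stack.empty() && "every node must be assigned to an SCC");
    // Tarjan's algorithm emits SCCs bottom-up (callees first); reverse them.
    std::reverse(SCCs.begin(), SCCs.end());
    return SCCs;
  }

private:
  void strongConnect(const std::string &V) {
    Index[V] = LowLink[V] = Counter++;
    Stack.push_back(V);
    OnStack.insert(V);
    for (const std::string &W : Edges[V]) {
      if (!Index.count(W)) {
        strongConnect(W);
        LowLink[V] = std::min(LowLink[V], LowLink[W]);
      } else if (OnStack.count(W)) {
        LowLink[V] = std::min(LowLink[V], Index[W]);
      }
    }
    if (LowLink[V] != Index[V])
      return;
    // V is the root of an SCC: pop all of its members off the stack.
    std::vector<std::string> SCC;
    std::string W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack.erase(W);
      SCC.push_back(W);
    } while (W != V);
    SCCs.push_back(SCC);
  }

  std::map<std::string, std::set<std::string>> Edges;
  std::map<std::string, int> Index, LowLink;
  std::set<std::string> OnStack;
  std::vector<std::string> Stack;
  std::vector<std::vector<std::string>> SCCs;
  int Counter = 0;
};
```

For example, with profiled edges main->foo, foo->bar, bar->foo and bar->baz, the sketch yields {main}, then the SCC {foo, bar}, then {baz}: callers come before callees, and the mutually recursive pair is processed as one unit.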

I'm seeing an average 0.4% perf win on SPEC2017. For certain benchmarks such as Xalancbmk and GCC, the win is bigger, above 2%.

The change is an enhancement to https://reviews.llvm.org/D95988.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. Mar 25 2021, 9:09 AM

Herald added subscribers: wenlei, hiraditya. Mar 25 2021, 9:09 AM

hoy requested review of this revision. Mar 25 2021, 9:09 AM

Herald added a project: Restricted Project. Mar 25 2021, 9:09 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B95707: Diff 333321. Mar 25 2021, 9:10 AM

hoy added a parent revision: D99146: [CSSPGO][llvm-profgen] Context-sensitive global pre-inliner. Mar 25 2021, 9:11 AM

hoy edited the summary of this revision. Mar 25 2021, 9:14 AM

hoy added reviewers: davidxl, wmi, wenlei.

hoy edited the summary of this revision.

wlei added a subscriber: wlei. Mar 25 2021, 1:08 PM

wenlei added inline comments. Mar 25 2021, 11:00 PM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
570–571	This is very similar to CSPreInliner::buildTopDownOrder. We had to let the context tracker do things like addCallGraphEdges in the past. But now it probably makes more sense to let ProfiledCallGraph take care of the graph building. I can refactor part of CSPreInliner::buildTopDownOrder into the ctor of ProfiledCallGraph so it can be reused here.
570–571	Is it intentional that we only look at trie, but not call targets from body samples for call edges?
llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Nit: the description implies that with `-use-profiled-call-graph=1`, we would do top-down order even if `-sample-profile-top-down-load=0` is used. But the implementation doesn't do that. Would be good to have a cohesive connection between the two switches, and description to reflect that. How about `use-profiled-top-down-order` with description like "Use the top-down order defined by profiled call graph when `-sample-profile-top-down-load` is on"?
1579	Assert that we don't take this path for CSSPGO?
1631	What happens if a function is not in input profile, looks like it will be skipped in sample loader after this change? Before the change, we would still set entry count for a function if it has no profile.
1642–1643	Such a case has to involve an indirect call, because with a direct call, say A->B->C, if B is gone in post-link, we will still honor the A->C order since the call to C is visible in A, correct? It may be clearer to use an example in the description. Same for #4. It'd be good to call out that some of this only applies to CSSPGO (e.g. #2 shouldn't be a problem for AutoFDO).
1663–1665	Using ProfiledCallGraph allows us to order without needing function objects, but the profile could be stale (e.g. missing a new edge after source drift). Can we build the ProfiledCallGraph with static call graph nodes and edges included as well?
1667–1668	How about moving the dispatch for `ProfileIsCS` into `SampleProfileLoader::buildProfiledCallGraph`?

wenlei mentioned this in D99146: [CSSPGO][llvm-profgen] Context-sensitive global pre-inliner. Mar 26 2021, 10:57 AM

hoy marked an inline comment as done. Mar 26 2021, 12:16 PM

hoy added inline comments.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
570–571	Sounds good. Moving the logic into ProfileCallGraph makes more sense.
570–571	It is intentional. A call target that doesn't come with a profile or is not on a call path to its child profile can be ignored since processing it before its caller (if this is the only context) shouldn't lose anything.
llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Good point. Description changed.
1579	Done.
1631	Good catch! It's an oversight.
1642–1643	Exactly. A->C can be used to recover A->B->C with C's `!dbg` information. Examples added.
1663–1665	Actually, adding static edges leads to worse performance for some benchmarks because of SCCs. In that case, static edges in an SCC should be completely removed so that only profile edges are honored. On the other hand, yes, the profile could be stale, but that's the information FDO relies on. I think without the profile, top-down order isn't important. In other words, static call edges seem unimportant when they don't correspond to a context in the profile.
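The trie-only edge collection discussed in these replies (edges come from parent/child pairs in the context trie, while call targets in body samples are skipped) can be sketched as a breadth-first walk. `ToyTrieNode` and `collectContextEdges` are hypothetical simplifications of `ContextTrieNode` and the constructor logic in the diff, assuming each trie node carries only a function name and its children.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <queue>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified context trie node: each child is a callee reached
// from this node's function under the calling context spelled by the path
// from the root.
struct ToyTrieNode {
  std::string FuncName;
  std::map<std::string, std::unique_ptr<ToyTrieNode>> Children;

  ToyTrieNode *getOrAddChild(const std::string &Callee) {
    std::unique_ptr<ToyTrieNode> &Slot = Children[Callee];
    if (!Slot) {
      Slot = std::make_unique<ToyTrieNode>();
      Slot->FuncName = Callee;
    }
    return Slot.get();
  }
};

using EdgeSet = std::set<std::pair<std::string, std::string>>;

// Breadth-first walk that records one caller->callee edge per parent/child
// pair in the trie. Call targets from body samples are intentionally not
// consulted, following the choice discussed in the review.
EdgeSet collectContextEdges(const ToyTrieNode &Root) {
  EdgeSet Edges;
  std::queue<const ToyTrieNode *> Queue;
  // Children of the nameless root are top-level contexts, not call edges.
  for (const auto &Child : Root.Children)
    Queue.push(Child.second.get());
  while (!Queue.empty()) {
    const ToyTrieNode *Caller = Queue.front();
    Queue.pop();
    for (const auto &Child : Caller->Children) {
      const ToyTrieNode *Callee = Child.second.get();
      assert(Callee && "trie children are never null");
      Edges.insert({Caller->FuncName, Callee->FuncName});
      Queue.push(Callee);
    }
  }
  return Edges;
}
```

Skipping call-target samples keeps the edge set identical to what the contexts encode, so the resulting SCCs cannot contain extra cycles that contradict the context order.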

Addressing Wenlei's comments.

Harbormaster completed remote builds in B95917: Diff 333607. Mar 26 2021, 1:06 PM

The change is an enhancement to https://reviews.llvm.org/D95988, so it is better to mention D95988 in the description/commit log.

Before this patch, UseProfileIndirectCallEdges is true for non-CS sampleFDO; now UseProfiledCallGraph is false by default, so it changes the behavior of non-CS sampleFDO. I will test UseProfiledCallGraph==true for non-CS sampleFDO, and if performance is OK, we can enable UseProfiledCallGraph by default. How does that sound?

hoy edited the summary of this revision. Mar 26 2021, 1:57 PM

Replying to D99351#2653712 (@wmi):

Sounds good, thanks for testing it for non-CS!

BTW, UseProfileIndirectCallEdges was true by default but only worked with CS FDO, i.e. `if (UseProfileIndirectCallEdges && ProfileIsCS)`, because support for adding indirect call edges for non-CS was not done.

wenlei added inline comments. Mar 26 2021, 4:39 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
1663–1665	OK, this makes sense. Static edges can conflict with profile edges, in which case we may end up with an SCC order that is not compatible with the context trie. Using a strictly profile-based order makes sure we get maximum inlining along the context trie. I think that (intentionally not adding static call graph edges) is worth a comment of its own.

Just found that ProfiledCallGraph.h is not included in the patch, so the build fails after applying the patch.

Replying to D99351#2654048 (@wmi):

Thanks for trying it. You’ll also need to apply Wenlei’s patch D99146.

Replying to D99351#2654236 (@hoy):

Then I ran into a cyclic include issue:
In this patch, llvm/Transforms/IPO/SampleContextTracker.h includes llvm/Transforms/IPO/ProfiledCallGraph.h, while in D99146 llvm/Transforms/IPO/ProfiledCallGraph.h includes llvm/Transforms/IPO/SampleContextTracker.h.

Replying to D99351#2654704 (@wmi):

I see. Sounds like the inclusion of ProfiledCallGraph.h should be moved into SampleContextTracker.cpp, and SampleContextTracker.h will need to have a forward declaration of class ProfiledCallGraph.
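A minimal sketch of the suggested fix, relying on the standard C++ rule that a header using only pointers or references to a type needs just a forward declaration: the tracker's "header" declares the graph class without including its header, and the out-of-line code that needs the full definition lives in the ".cpp" part. All names are hypothetical stand-ins, and the three "files" are collapsed into one translation unit so the example compiles on its own.

```cpp
#include <cassert>

// --- "ToyContextTracker.h": only needs to know the graph type exists. ---
class ToyCallGraph; // Forward declaration instead of #include "ToyCallGraph.h".

class ToyContextTracker {
public:
  // Pointers and references to an incomplete type are fine in a header.
  void attachGraph(ToyCallGraph &G) { Graph = &G; }
  bool hasGraph() const { return Graph != nullptr; }
  int graphSize() const; // Defined out of line, where ToyCallGraph is complete.

private:
  ToyCallGraph *Graph = nullptr;
};

// --- "ToyCallGraph.h": may include "ToyContextTracker.h" freely. ---
class ToyCallGraph {
public:
  explicit ToyCallGraph(ToyContextTracker &Tracker) { Tracker.attachGraph(*this); }
  void addNode() { ++NumNodes; }
  int size() const { return NumNodes; }

private:
  int NumNodes = 0;
};

// --- "ToyContextTracker.cpp": includes both headers, so the full type is visible. ---
int ToyContextTracker::graphSize() const {
  assert(Graph && "graph must be attached first");
  return Graph->size();
}
```

Only the .cpp file sees both complete definitions, so neither header has to include the other and the cycle disappears.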

Addressing Wenlei's and Wei's comments.

Harbormaster completed remote builds in B96023: Diff 333747. Mar 28 2021, 4:34 PM

Seems there is still a build error:

lib/Transforms/IPO/SampleProfile.cpp:1692:23: error: no matching constructor for initialization of 'llvm::sampleprof::ProfiledCallGraph'

ProfiledCallGraph ProfiledCG;

include/llvm/Transforms/IPO/ProfiledCallGraph.h:43:3: note: candidate constructor not viable: requires 2 arguments, but 0 were provided

ProfiledCallGraph(StringMap<FunctionSamples> &ProfileMap,

Replying to D99351#2654965 (@wmi):

Oh I see. I didn't sync to Wenlei's latest patch, where the profiled call graph construction was moved into the constructor. Since there is a subtle difference in whether to limit the call graph build to nodes with samples only, I'm adding back the default constructor for now. We can discuss how to share the code well.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1663–1665	Sounds good, comment added.

Updating D99351: [CSSPGO] Top-down processing order based on full profile.

Harbormaster completed remote builds in B96046: Diff 333772. Mar 29 2021, 12:07 AM

Replying to D99351#2654984 (@hoy):

Let me commit my change so this can also move forward (sorry for the delay). I was thinking about adding another ctor for the profiled call graph for the AutoFDO case (which would require changes in this patch). But we could also move the graph building into a helper function.

Replying to D99351#2656191 (@wenlei):

Sounds good. We can do the refactoring in the current change. Thanks.

The ProfiledCallGraph change D99146 is in, now some adjustment is needed here.

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
44	I think we can let ProfiledCallGraph take over the responsibility of graph building, for both autofdo and csspgo. We could do this through two separate ctor, or a common ctor plus two "build graph" helpers.
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
33	With graph building all moving into ProfiledCallGraph, we can avoid referencing ProfiledCallGraph in the context tracker. Basically, SampleContextTracker::addProfiledCallEdges can be removed?

Moving call graph building into ProfiledCallGraph.h.

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
44	Sounds good.

Harbormaster completed remote builds in B96163: Diff 333942. Mar 29 2021, 12:17 PM

wenlei added inline comments. Mar 29 2021, 4:31 PM

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
43–44	The purpose of this is to have all nodes populated before adding call edges and enable sanity check when adding edges. But I can see that's not possible for afdo profiles. If we end up adding nodes on the fly for afdo, we could do the same for csspgo, in which case we can remove ProfileMap from parameter.
46	Can we just pass in `StringMap<FunctionSamples> &ProfileMap` instead of the `Reader`? ProfiledCallGraph doesn't need to interact with Reader except for getting the profiles. For checking CS profile, we could use FunctionSamples::ProfileIsCS too.
54	Can we merge the two ctors for CS profile? Some refactoring to merge them, with parameter to control whether we add edges for call targets and comments explaining why call target edges are skipped would be good. It also makes sense for llvm-profgen path to use exactly what the compiler uses for top-down ordering. So if adding call target edges are problematic for compiler, maybe we should skip that for llvm-profgen too.
104	If we name the one above as `addProfiledCall`, this would be `addProfiledCalls` to be consistent? (And this is indeed adding both nodes and edges)
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
20–21	This can be removed too.
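For the AutoFDO (non-CS) path, where nodes are added on the fly rather than pre-populated, the recursive `addProfiledCalls` in the diff can be approximated as below. `ToyFunctionSamples` is a hypothetical simplification that drops call-site locations and keeps only call-target counts and inlined callee profiles; `ToyGraph` mirrors `addProfiledCall` by dropping edges whose callee node does not exist.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplification of a non-CS (AutoFDO) function profile.
// Call-site locations are dropped; only what matters for graph building stays.
struct ToyFunctionSamples {
  std::string Name;
  // Targets recorded at call sites in the body (e.g. indirect calls) -> count.
  std::map<std::string, unsigned> CallTargets;
  // Profiles of callees inlined into this function; these may nest further.
  std::vector<ToyFunctionSamples> Inlinees;
};

class ToyGraph {
public:
  void addFunction(const std::string &Name) { Callees[Name]; } // no-op if present

  // Like addProfiledCall in the diff: the caller must exist, and an edge to
  // an unknown callee is silently dropped.
  void addCall(const std::string &Caller, const std::string &Callee) {
    assert(Callees.count(Caller) && "caller node must be added first");
    if (!Callees.count(Callee))
      return;
    Callees[Caller].insert(Callee);
  }

  // Adds nodes on the fly, then edges, recursing into inlined callees so the
  // caller->inlinee->target chain is preserved even after inlining.
  void addCalls(const ToyFunctionSamples &S) {
    addFunction(S.Name);
    for (const auto &Target : S.CallTargets) {
      addFunction(Target.first);
      addCall(S.Name, Target.first);
    }
    for (const ToyFunctionSamples &Inlinee : S.Inlinees) {
      addFunction(Inlinee.Name);
      addCall(S.Name, Inlinee.Name);
      addCalls(Inlinee);
    }
  }

  bool hasEdge(const std::string &From, const std::string &To) const {
    auto It = Callees.find(From);
    return It != Callees.end() && It->second.count(To) != 0;
  }

  std::size_t numFunctions() const { return Callees.size(); }

private:
  std::map<std::string, std::set<std::string>> Callees;
};
```

Because an inlined callee B keeps its own node even if it no longer exists in the IR, a chain A->B->C recorded in A's profile still forces A to be processed before C.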

The performance test result is neutral, so I think we can enable UseProfiledCallGraph by default.

llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Emit an error to prevent misuse if ProfileTopDownLoad is false and UseProfiledCallGraph is true?

Replying to D99351#2657349 (@wmi):

Thanks for measuring the performance. Sounds good to turn it on by default.

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
46	Good point, `Reader` is not really needed here.
54	Yes, they can be merged. The existing constructor will need to get rid of ProfileMap as you mentioned in the other comment.
104	Sounds good.
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
20–21	Good catch.
llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Actually when ProfileTopDownLoad is false, UseProfiledCallGraph doesn't do anything since it'll return early in `buildProfiledCallGraph`. Do you think an error is needed when it returns early while UseProfiledCallGraph is true?

Updating D99351: [CSSPGO] Top-down processing order based on full profile.

Herald added a subscriber: eraman. Mar 29 2021, 6:30 PM

Harbormaster completed remote builds in B96233: Diff 334031. Mar 29 2021, 7:17 PM

wmi added inline comments. Mar 29 2021, 8:35 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Silently ignoring this flag may cause confusion. A warning may be enough.

wenlei added inline comments. Mar 29 2021, 9:22 PM

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
67–74	Remove AddNodeWithSamplesOnly parameter and always add everything even for llvm-profgen? CSPreInliner::processFunction skips names without profile, so it should just work. llvm-profgen should follow what compiler does.
69–72	Actually adding extra edges from call target samples has the risk of forming top-down order that is not compliant with context order due to SCC, right? That is similar to how adding static edges can hurt. So instead of saying call target edge doesn't help, it would be helpful if we make it clear in the comment explaining how adding extra edges from call targets may hurt.
llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Looks like this is not enforced though. A common pattern is tuning knobs for an optimization, and when the optimization is turned off, we don't emit a warning when tuning flags are still used. It looks to me that silently ignoring a tuning flag when an optimization is off is more mainstream than emitting a warning. Don't have a strong opinion though.

hoy added inline comments. Mar 29 2021, 10:38 PM

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
67–74	Sounds good. They are now unified.
69–72	Good point. Comment updated.
llvm/lib/Transforms/IPO/SampleProfile.cpp
165	Yeah, looks that silently ignoring tuning flags is common. Added a warning though, which should be clear and helpful to users.
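The warning behavior settled on here can be sketched as follows. The two booleans stand in for `-sample-profile-top-down-load` and `-use-profiled-call-graph` (which are `cl::opt<bool>` flags in LLVM), the message text is hypothetical, and printing to stderr loosely matches the `errs()` approach discussed in the following comments rather than LLVM's diagnostic infrastructure.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-ins for -sample-profile-top-down-load and
// -use-profiled-call-graph (cl::opt<bool> flags in LLVM).
struct ToyOptions {
  bool TopDownLoad;
  bool UseProfiledCallGraph;
};

// Returns a warning for an ineffective flag combination, or "" if none.
// The message wording is illustrative only.
std::string checkOrderingFlags(const ToyOptions &Opts) {
  if (!Opts.TopDownLoad && Opts.UseProfiledCallGraph)
    return "warning: -use-profiled-call-graph has no effect unless "
           "-sample-profile-top-down-load is on";
  return "";
}

// Decides whether the profiled call graph drives the processing order. Rather
// than silently ignoring the tuning flag, any warning is printed to stderr.
bool shouldUseProfiledOrder(const ToyOptions &Opts) {
  const std::string Warning = checkOrderingFlags(Opts);
  if (!Warning.empty())
    std::cerr << Warning << '\n';
  return Opts.TopDownLoad && Opts.UseProfiledCallGraph;
}
```

Separating the check from the printing keeps the decision testable while still surfacing the ignored flag to users.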

Addressing comments from Wei and Wenlei.

wenlei added inline comments. Mar 29 2021, 11:18 PM

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
73	typo: which
llvm/lib/Transforms/IPO/SampleProfile.cpp
1605	Is this the canonical way of emit warning? Or something through diagnostics like `LLVMContext.diagnose(DiagnosticInfoSampleProfile(..., DS_Warning))`?

Harbormaster completed remote builds in B96244: Diff 334049. Mar 29 2021, 11:18 PM

hoy added inline comments. Mar 30 2021, 9:07 AM

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
73	Fixed.
llvm/lib/Transforms/IPO/SampleProfile.cpp
1605	The usage of `errs()` to display text messages is quite common; I also used it in lld, though it is not a formal way to emit warnings that users can track in documentation.

Updating D99351: [CSSPGO] Top-down processing order based on full profile.

lgtm, thanks.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1605	Yeah, saw inconsistent messages all over the place: "WARNING", "warning" and "Warning". I guess we're not making it worse. :)

This revision is now accepted and ready to land. Mar 30 2021, 9:19 AM

LGTM.

Harbormaster completed remote builds in B96339: Diff 334182. Mar 30 2021, 10:08 AM

Closed by commit rG3e3fc431dfe4: [CSSPGO] Top-down processing order based on full profile. (authored by hoy). Mar 30 2021, 10:43 AM

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG3e3fc431dfe4: [CSSPGO] Top-down processing order based on full profile.

jsji mentioned this in D99815: [CSSPGO][Test] XFAIL profile-context-tracker-debug.ll on AIX. Apr 2 2021, 2:09 PM

jsji mentioned this in rG1d54aa2e0d72: [CSSPGO][Test] XFAIL profile-context-tracker-debug.ll on AIX. Apr 2 2021, 3:16 PM

Revision Contents (path: size)

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h: 72 lines
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h: 2 lines
llvm/lib/Transforms/IPO/SampleContextTracker.cpp: 22 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp: 211 lines
llvm/test/Transforms/SampleProfile/ctxsplit.ll: 6 lines
llvm/test/Transforms/SampleProfile/inline-mergeprof.ll: 10 lines
llvm/test/Transforms/SampleProfile/profile-context-order.ll: 14 lines
llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll: 24 lines
llvm/test/Transforms/SampleProfile/profile-topdown-order.ll: 8 lines
llvm/tools/llvm-profgen/CSPreInliner.cpp: 2 lines

Diff 334223

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h

//===-- ProfiledCallGraph.h - Profiled Call Graph ----------------- C++ -*-===//		//===-- ProfiledCallGraph.h - Profiled Call Graph ----------------- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_PROFGEN_PROFILEDCALLGRAPH_H		#ifndef LLVM_TOOLS_LLVM_PROFGEN_PROFILEDCALLGRAPH_H
#define LLVM_TOOLS_LLVM_PROFGEN_PROFILEDCALLGRAPH_H		#define LLVM_TOOLS_LLVM_PROFGEN_PROFILEDCALLGRAPH_H

#include "llvm/ADT/GraphTraits.h"		#include "llvm/ADT/GraphTraits.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
		#include "llvm/ProfileData/SampleProfReader.h"
#include "llvm/Transforms/IPO/SampleContextTracker.h"		#include "llvm/Transforms/IPO/SampleContextTracker.h"
#include <queue>		#include <queue>
#include <set>		#include <set>
#include <string>		#include <string>

using namespace llvm;		using namespace llvm;
using namespace sampleprof;		using namespace sampleprof;

[10 lines not shown]
		bool operator()(const ProfiledCallGraphNode *L,
return L->Name < R->Name;		return L->Name < R->Name;
}		}
};		};
std::set<ProfiledCallGraphNode *, ProfiledCallGraphNodeComparer> Callees;		std::set<ProfiledCallGraphNode *, ProfiledCallGraphNodeComparer> Callees;
};		};

class ProfiledCallGraph {		class ProfiledCallGraph {
public:		public:
using iterator = std::set<ProfiledCallGraphNode *>::iterator;		using iterator = std::set<ProfiledCallGraphNode *>::iterator;
ProfiledCallGraph(StringMap<FunctionSamples> &ProfileMap,
SampleContextTracker &ContextTracker) {		// Constructor for non-CS profile.
// Add all profiled functions into profiled call graph.		ProfiledCallGraph(StringMap<FunctionSamples> &ProfileMap) {
// We only add function with actual context profile		assert(!FunctionSamples::ProfileIsCS && "CS profile is not handled here");
for (auto &FuncSample : ProfileMap) {		for (const auto &Samples : ProfileMap) {
FunctionSamples *FSamples = &FuncSample.second;		addProfiledCalls(Samples.second);
addProfiledFunction(FSamples->getName());		}
}		}

// BFS traverse the context profile trie to add call edges for		// Constructor for CS profile.
// both samples calls as well as calls shown in context.		ProfiledCallGraph(SampleContextTracker &ContextTracker) {
		// BFS traverse the context profile trie to add call edges for calls shown
		// in context.
std::queue<ContextTrieNode *> Queue;		std::queue<ContextTrieNode *> Queue;
Queue.push(&ContextTracker.getRootContext());		for (auto &Child : ContextTracker.getRootContext().getAllChildContext()) {
		ContextTrieNode *Callee = &Child.second;
		addProfiledFunction(Callee->getFuncName());
		Queue.push(Callee);
		}

while (!Queue.empty()) {		while (!Queue.empty()) {
ContextTrieNode *Caller = Queue.front();		ContextTrieNode *Caller = Queue.front();
Queue.pop();		Queue.pop();
FunctionSamples *CallerSamples = Caller->getFunctionSamples();		// Add calls for context. When AddNodeWithSamplesOnly is true, both caller
		// and callee need to have context profile.
// Add calls for context, if both caller and callee has context profile.		// Note that callsite target samples are completely ignored since they can
		// conflict with the context edges, which are formed by context
		// compression during profile generation, for cyclic SCCs. This may
		// further result in an SCC order incompatible with the purely
		// context-based one, which may in turn block context-based inlining.
for (auto &Child : Caller->getAllChildContext()) {		for (auto &Child : Caller->getAllChildContext()) {
ContextTrieNode *Callee = &Child.second;		ContextTrieNode *Callee = &Child.second;
		addProfiledFunction(Callee->getFuncName());
Queue.push(Callee);		Queue.push(Callee);
if (CallerSamples && Callee->getFunctionSamples()) {
addProfiledCall(Caller->getFuncName(), Callee->getFuncName());		addProfiledCall(Caller->getFuncName(), Callee->getFuncName());
}		}
}		}

// Add calls from call site samples
if (CallerSamples) {
for (auto &LocCallSite : CallerSamples->getBodySamples()) {
for (auto &NameCallSite : LocCallSite.second.getCallTargets()) {
addProfiledCall(Caller->getFuncName(), NameCallSite.first());
}
}
}
}
}		}

iterator begin() { return Root.Callees.begin(); }		iterator begin() { return Root.Callees.begin(); }
iterator end() { return Root.Callees.end(); }		iterator end() { return Root.Callees.end(); }
ProfiledCallGraphNode *getEntryNode() { return &Root; }		ProfiledCallGraphNode *getEntryNode() { return &Root; }
void addProfiledFunction(StringRef Name) {		void addProfiledFunction(StringRef Name) {
if (!ProfiledFunctions.count(Name)) {		if (!ProfiledFunctions.count(Name)) {
// Link to synthetic root to make sure every node is reachable		// Link to synthetic root to make sure every node is reachable
// from root. This does not affect SCC order.		// from root. This does not affect SCC order.
Root.Callees.insert(&ProfiledFunctions[Name]);		Root.Callees.insert(&ProfiledFunctions[Name]);
ProfiledFunctions[Name] = ProfiledCallGraphNode(Name);		ProfiledFunctions[Name] = ProfiledCallGraphNode(Name);
}		}
}		}

void addProfiledCall(StringRef CallerName, StringRef CalleeName) {		void addProfiledCall(StringRef CallerName, StringRef CalleeName) {
assert(ProfiledFunctions.count(CallerName));		assert(ProfiledFunctions.count(CallerName));
auto CalleeIt = ProfiledFunctions.find(CalleeName);		auto CalleeIt = ProfiledFunctions.find(CalleeName);
if (CalleeIt == ProfiledFunctions.end()) {		if (CalleeIt == ProfiledFunctions.end()) {
return;		return;
}		}
ProfiledFunctions[CallerName].Callees.insert(&CalleeIt->second);		ProfiledFunctions[CallerName].Callees.insert(&CalleeIt->second);
}		}

		void addProfiledCalls(const FunctionSamples &Samples) {
		addProfiledFunction(Samples.getFuncName());

		for (const auto &Sample : Samples.getBodySamples()) {
		for (const auto &Target : Sample.second.getCallTargets()) {
		addProfiledFunction(Target.first());
		addProfiledCall(Samples.getFuncName(), Target.first());
		}
		}

		for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {
		for (const auto &InlinedSamples : CallsiteSamples.second) {
		addProfiledFunction(InlinedSamples.first);
		addProfiledCall(Samples.getFuncName(), InlinedSamples.first);
		addProfiledCalls(InlinedSamples.second);
		}
		}
		}

private:		private:
ProfiledCallGraphNode Root;		ProfiledCallGraphNode Root;
StringMap<ProfiledCallGraphNode> ProfiledFunctions;		StringMap<ProfiledCallGraphNode> ProfiledFunctions;
};		};

} // end namespace sampleprof		} // end namespace sampleprof

template <> struct GraphTraits<ProfiledCallGraphNode *> {		template <> struct GraphTraits<ProfiledCallGraphNode *> {
Show All 27 Lines
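To make the graph construction in `ProfiledCallGraph.h` concrete, here is a minimal, LLVM-free sketch of how `addProfiledFunction`, `addProfiledCall` and `addProfiledCalls` fit together. `MiniSamples` and `MiniProfiledCallGraph` are hypothetical stand-ins for `FunctionSamples` and `ProfiledCallGraph`, reduced to what the builder reads (body-sample call targets and nested inlinee profiles), and nodes are plain strings rather than `ProfiledCallGraphNode` pointers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for sampleprof::FunctionSamples.
struct MiniSamples {
  std::string Name;
  std::vector<std::string> BodyCallTargets; // like getBodySamples() targets
  std::vector<MiniSamples> InlinedCallees;  // like getCallsiteSamples()
};

// Name-keyed call graph with a synthetic root (illustrative only).
class MiniProfiledCallGraph {
public:
  void addProfiledFunction(const std::string &Name) {
    if (Callees.count(Name))
      return;
    Callees[Name];            // create the node
    RootCallees.insert(Name); // keep every node reachable from the root
  }

  void addProfiledCall(const std::string &Caller, const std::string &Callee) {
    assert(Callees.count(Caller) && "caller must be added first");
    if (!Callees.count(Callee))
      return; // unknown callee: the edge is dropped
    Callees[Caller].insert(Callee);
  }

  // Adds the function, its body call targets and, recursively, every
  // inlinee profile nested under it.
  void addProfiledCalls(const MiniSamples &S) {
    addProfiledFunction(S.Name);
    for (const auto &T : S.BodyCallTargets) {
      addProfiledFunction(T);
      addProfiledCall(S.Name, T);
    }
    for (const auto &Inlinee : S.InlinedCallees) {
      addProfiledFunction(Inlinee.Name);
      addProfiledCall(S.Name, Inlinee.Name);
      addProfiledCalls(Inlinee);
    }
  }

  std::map<std::string, std::set<std::string>> Callees;
  std::set<std::string> RootCallees;
};
```

Linking every node to the synthetic root only guarantees that a traversal from the root sees all nodes. Since nothing points back to the root, those extra edges cannot form cycles and so do not change the SCC structure, which matches the comment in `addProfiledFunction`.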

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h

Show All 11 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H		#ifndef LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H
#define LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H		#define LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H

#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
		wenlei: This can be removed too.
		hoy (author): Good catch.
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
#include <list>		#include <list>
#include <map>		#include <map>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
using namespace sampleprof;		using namespace sampleprof;

namespace llvm {		namespace llvm {

// Internal trie tree representation used for tracking context tree and sample		// Internal trie tree representation used for tracking context tree and sample
		wenlei: With graph building all moving into ProfiledCallGraph, we can avoid referencing ProfiledCallGraph in the context tracker. Basically, SampleContextTracker::addProfiledCallEdges can be removed?
// profiles. The path from root node to a given node represents the context of		// profiles. The path from root node to a given node represents the context of
// that node's profile.		// that node's profile.
class ContextTrieNode {		class ContextTrieNode {
public:		public:
ContextTrieNode(ContextTrieNode *Parent = nullptr,		ContextTrieNode(ContextTrieNode *Parent = nullptr,
StringRef FName = StringRef(),		StringRef FName = StringRef(),
FunctionSamples *FSamples = nullptr,		FunctionSamples *FSamples = nullptr,
LineLocation CallLoc = {0, 0})		LineLocation CallLoc = {0, 0})
▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	public:
ContextTrieNode *getContextFor(const SampleContext &Context);		ContextTrieNode *getContextFor(const SampleContext &Context);
// Mark a context profile as inlined when function is inlined.		// Mark a context profile as inlined when function is inlined.
// This makes sure that inlined context profile will be excluded in		// This makes sure that inlined context profile will be excluded in
// function's base profile.		// function's base profile.
void markContextSamplesInlined(const FunctionSamples *InlinedSamples);		void markContextSamplesInlined(const FunctionSamples *InlinedSamples);
ContextTrieNode &getRootContext();		ContextTrieNode &getRootContext();
void promoteMergeContextSamplesTree(const Instruction &Inst,		void promoteMergeContextSamplesTree(const Instruction &Inst,
StringRef CalleeName);		StringRef CalleeName);
void addCallGraphEdges(CallGraph &CG, StringMap<Function *> &SymbolMap);
// Dump the internal context profile trie.		// Dump the internal context profile trie.
void dump();		void dump();

private:		private:
ContextTrieNode *getContextFor(const DILocation *DIL);		ContextTrieNode *getContextFor(const DILocation *DIL);
ContextTrieNode *getCalleeContextFor(const DILocation *DIL,		ContextTrieNode *getCalleeContextFor(const DILocation *DIL,
StringRef CalleeName);		StringRef CalleeName);
ContextTrieNode *getOrCreateContextPath(const SampleContext &Context,		ContextTrieNode *getOrCreateContextPath(const SampleContext &Context,
Show All 19 Lines
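The trie comment in `SampleContextTracker.h` says the path from the root to a node represents that node's context. A hypothetical, LLVM-free sketch of that idea follows. `TrieNode` and `Frame` are illustrative names; children are keyed by (callsite line, callee name) as a simplification of `LineLocation` plus callee name, and the `caller:line @ callee` rendering is only meant to resemble the textual context format, not reproduce `SampleContext` exactly.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One frame of a calling context (simplified: no discriminator).
struct Frame {
  unsigned CallSiteLine;
  std::string Callee;
};

struct TrieNode {
  std::string FuncName;      // empty for the synthetic root
  TrieNode *Parent = nullptr;
  unsigned CallSiteLine = 0; // line in Parent that calls this node
  std::map<std::pair<unsigned, std::string>, std::unique_ptr<TrieNode>>
      Children;

  // Walk the path for a context, creating missing nodes along the way.
  TrieNode *getOrCreatePath(const std::vector<Frame> &Path) {
    TrieNode *Node = this;
    for (const Frame &F : Path) {
      auto &Slot = Node->Children[{F.CallSiteLine, F.Callee}];
      if (!Slot) {
        Slot = std::make_unique<TrieNode>();
        Slot->FuncName = F.Callee;
        Slot->Parent = Node;
        Slot->CallSiteLine = F.CallSiteLine;
      }
      Node = Slot.get();
    }
    return Node;
  }

  // Rebuild the context by walking back toward the root; the root itself
  // is not part of the context.
  std::string contextString() const {
    std::string S = FuncName;
    for (const TrieNode *N = this; N->Parent && !N->Parent->FuncName.empty();
         N = N->Parent)
      S = N->Parent->FuncName + ":" + std::to_string(N->CallSiteLine) +
          " @ " + S;
    return S;
  }
};
```

Base-level functions hang directly off the root (here with callsite line 0), so the same context always maps to the same node, and distinct contexts of one function get distinct nodes.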

llvm/lib/Transforms/IPO/SampleContextTracker.cpp

Show First 20 Lines • Show All 561 Lines • ▼ Show 20 Lines	if (!ToNode) {
FromNode.getAllChildContext().clear();		FromNode.getAllChildContext().clear();
}		}

// For root of subtree, remove itself from old parent too		// For root of subtree, remove itself from old parent too
if (MoveToRoot)		if (MoveToRoot)
FromNodeParent.removeChildContext(OldCallSiteLoc, ToNode->getFuncName());		FromNodeParent.removeChildContext(OldCallSiteLoc, ToNode->getFuncName());

return *ToNode;		return *ToNode;
}		}

// Replace call graph edges with dynamic call edges from the profile.
void SampleContextTracker::addCallGraphEdges(CallGraph &CG,
StringMap<Function *> &SymbolMap) {
// Add profile call edges to the call graph.
std::queue<ContextTrieNode *> NodeQueue;
NodeQueue.push(&RootContext);
while (!NodeQueue.empty()) {
ContextTrieNode *Node = NodeQueue.front();
NodeQueue.pop();
Function *F = SymbolMap.lookup(Node->getFuncName());
for (auto &I : Node->getAllChildContext()) {
ContextTrieNode *ChildNode = &I.second;
NodeQueue.push(ChildNode);
if (F && !F->isDeclaration()) {
Function *Callee = SymbolMap.lookup(ChildNode->getFuncName());
if (Callee && !Callee->isDeclaration())
CG[F]->addCalledFunction(nullptr, CG[Callee]);
}
}
}
}
} // namespace llvm		} // namespace llvm
		wenlei: This is very similar to CSPreInliner::buildTopDownOrder. We had to let the context tracker do things like addCallGraphEdges in the past, but now it probably makes more sense to let ProfiledCallGraph take care of the graph building. I can refactor part of CSPreInliner::buildTopDownOrder into the ctor of ProfiledCallGraph so it can be reused here.
		hoy (author): Sounds good. Moving the logic into ProfiledCallGraph makes more sense.
		wenlei: Is it intentional that we only look at the trie, but not at call targets from body samples, for call edges?
		hoy (author): It is intentional. A call target that doesn't come with a profile, or is not on a call path to its child profile, can be ignored, since processing it before its caller (if this is the only context) shouldn't lose anything.
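The removed `SampleContextTracker::addCallGraphEdges` walks the context trie breadth-first and adds an edge for each parent/child pair whose functions are both defined. The sketch below reproduces that walk on plain strings, assuming a set of defined names can stand in for the `SymbolMap` lookup plus the `isDeclaration()` check; `CtxNode` and `collectTrieEdges` are hypothetical names. As in the author's reply above, only trie edges are collected and body-sample call targets are ignored.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct CtxNode {
  std::string FuncName; // empty for the synthetic root
  std::vector<std::unique_ptr<CtxNode>> Children;

  CtxNode *addChild(const std::string &Name) {
    Children.push_back(std::make_unique<CtxNode>());
    Children.back()->FuncName = Name;
    return Children.back().get();
  }
};

// Breadth-first walk of the trie, emitting caller->callee edges whose
// endpoints are both defined in the module.
std::set<std::pair<std::string, std::string>>
collectTrieEdges(const CtxNode &Root, const std::set<std::string> &Defined) {
  std::set<std::pair<std::string, std::string>> Edges;
  std::queue<const CtxNode *> Work;
  Work.push(&Root);
  while (!Work.empty()) {
    const CtxNode *N = Work.front();
    Work.pop();
    bool CallerDefined = Defined.count(N->FuncName) != 0;
    for (const auto &Child : N->Children) {
      Work.push(Child.get()); // always descend, even past undefined nodes
      if (CallerDefined && Defined.count(Child->FuncName))
        Edges.insert({N->FuncName, Child->FuncName});
    }
  }
  return Edges;
}
```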

llvm/lib/Transforms/IPO/SampleProfile.cpp

Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/GenericDomTree.h"		#include "llvm/Support/GenericDomTree.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
		#include "llvm/Transforms/IPO/ProfiledCallGraph.h"
#include "llvm/Transforms/IPO/SampleContextTracker.h"		#include "llvm/Transforms/IPO/SampleContextTracker.h"
#include "llvm/Transforms/IPO/SampleProfileProbe.h"		#include "llvm/Transforms/IPO/SampleProfileProbe.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Utils/CallPromotionUtils.h"		#include "llvm/Transforms/Utils/CallPromotionUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"		#include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"
#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"		#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"
#include <algorithm>		#include <algorithm>
▲ Show 20 Lines • Show All 67 Lines • ▼ Show 20 Lines	cl::desc("Merge past inlinee's profile to outline version if sample "
"enabled. "));		"enabled. "));

static cl::opt<bool> ProfileTopDownLoad(		static cl::opt<bool> ProfileTopDownLoad(
"sample-profile-top-down-load", cl::Hidden, cl::init(true),		"sample-profile-top-down-load", cl::Hidden, cl::init(true),
cl::desc("Do profile annotation and inlining for functions in top-down "		cl::desc("Do profile annotation and inlining for functions in top-down "
"order of call graph during sample profile loading. It only "		"order of call graph during sample profile loading. It only "
"works for new pass manager. "));		"works for new pass manager. "));

static cl::opt<bool> UseProfileIndirectCallEdges(		static cl::opt<bool>
"use-profile-indirect-call-edges", cl::init(true), cl::Hidden,		UseProfiledCallGraph("use-profiled-call-graph", cl::init(true), cl::Hidden,
		wenlei: Nit: the description implies that with `-use-profiled-call-graph=1`, we would do top-down order even if `-sample-profile-top-down-load=0` is used. But the implementation doesn't do that. It would be good to have a cohesive connection between the two switches, and a description that reflects it. How about `use-profiled-top-down-order` with a description like "Use the top-down order defined by profiled call graph when `-sample-profile-top-down-load` is on"?
		hoy (author): Good point. Description changed.
		wmi: Emit an error to prevent misuse if ProfileTopDownLoad is false and UseProfiledCallGraph is true?
		hoy (author): Actually, when ProfileTopDownLoad is false, UseProfiledCallGraph doesn't do anything since it'll return early in `buildProfiledCallGraph`. Do you think an error is needed when it returns early while UseProfiledCallGraph is true?
		wmi: Silently ignoring this flag may cause confusion. A warning may be enough.
		wenlei: Looks like this is not enforced, though. A common pattern is tuning knobs for an optimization: when the optimization is turned off, we don't emit a warning if tuning flags are still used. It looks to me that silently ignoring a tuning flag when an optimization is off is more mainstream than emitting a warning. I don't have a strong opinion, though.
		hoy (author): Yeah, it looks like silently ignoring tuning flags is common. I added a warning anyway, which should be clear and helpful to users.
cl::desc("Considering indirect call samples from profile when top-down "		cl::desc("Process functions in a top-down order "
"processing functions. Only CSSPGO is supported."));		"defined by the profiled call graph when "
		"-sample-profile-top-down-load is on."));
static cl::opt<bool> UseProfileTopDownOrder(
"use-profile-top-down-order", cl::init(false), cl::Hidden,
cl::desc("Process functions in one SCC in a top-down order "
"based on the input profile."));
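How the two switches interact in `buildFunctionOrder` can be summarized as a small decision function. This is a model rather than LLVM code: `OrderFlags`, `Order` and `chooseOrder` are hypothetical names, and `UseProfiledCGExplicit` stands in for `UseProfiledCallGraph.getNumOccurrences() != 0`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cl::opt state read by buildFunctionOrder.
struct OrderFlags {
  bool TopDownLoad = true;            // -sample-profile-top-down-load
  bool UseProfiledCG = true;          // -use-profiled-call-graph
  bool UseProfiledCGExplicit = false; // flag given on the command line
};

enum class Order { ModuleOrder, StaticSCC, ProfiledSCC };

// Mirrors the branch structure of the new buildFunctionOrder; Warn reports
// whether the "-use-profiled-call-graph ignored" message would be printed.
Order chooseOrder(const OrderFlags &F, bool ProfileIsCS, bool HaveCallGraph,
                  bool &Warn) {
  Warn = !F.TopDownLoad && F.UseProfiledCG;
  if (!F.TopDownLoad || !HaveCallGraph)
    return Order::ModuleOrder; // plain module order, no SCC walk
  if (F.UseProfiledCG || (ProfileIsCS && !F.UseProfiledCGExplicit))
    return Order::ProfiledSCC;
  return Order::StaticSCC;
}
```

Because `-use-profiled-call-graph` defaults to true in this diff, the `ProfileIsCS` clause only decides the outcome if that default is changed; an explicit `=0` always selects the static order. As written, the warning also fires when only `-sample-profile-top-down-load=0` is passed, since the other flag is on by default.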

static cl::opt<bool> ProfileSizeInline(		static cl::opt<bool> ProfileSizeInline(
"sample-profile-inline-size", cl::Hidden, cl::init(false),		"sample-profile-inline-size", cl::Hidden, cl::init(false),
cl::desc("Inline cold call sites in profile loader if it's beneficial "		cl::desc("Inline cold call sites in profile loader if it's beneficial "
"for code size."));		"for code size."));

cl::opt<int> ProfileInlineGrowthLimit(		cl::opt<int> ProfileInlineGrowthLimit(
"sample-profile-inline-growth-limit", cl::Hidden, cl::init(12),		"sample-profile-inline-growth-limit", cl::Hidden, cl::init(12),
▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	protected:
inlineHotFunctionsWithPriority(Function &F,		inlineHotFunctionsWithPriority(Function &F,
DenseSet<GlobalValue::GUID> &InlinedGUIDs);		DenseSet<GlobalValue::GUID> &InlinedGUIDs);
// Inline cold/small functions in addition to hot ones		// Inline cold/small functions in addition to hot ones
bool shouldInlineColdCallee(CallBase &CallInst);		bool shouldInlineColdCallee(CallBase &CallInst);
void emitOptimizationRemarksForInlineCandidates(		void emitOptimizationRemarksForInlineCandidates(
const SmallVectorImpl<CallBase *> &Candidates, const Function &F,		const SmallVectorImpl<CallBase *> &Candidates, const Function &F,
bool Hot);		bool Hot);
std::vector<Function *> buildFunctionOrder(Module &M, CallGraph *CG);		std::vector<Function *> buildFunctionOrder(Module &M, CallGraph *CG);
void addCallGraphEdges(CallGraph &CG, const FunctionSamples &Samples);		std::unique_ptr<ProfiledCallGraph> buildProfiledCallGraph(CallGraph &CG);
void replaceCallGraphEdges(CallGraph &CG, StringMap<Function *> &SymbolMap);
void generateMDProfMetadata(Function &F);		void generateMDProfMetadata(Function &F);

/// Map from function name to Function *. Used to find the function from		/// Map from function name to Function *. Used to find the function from
/// the function name. If the function name contains suffix, additional		/// the function name. If the function name contains suffix, additional
/// entry is added to map from the stripped name to the function if there		/// entry is added to map from the stripped name to the function if there
/// is one-to-one mapping.		/// is one-to-one mapping.
StringMap<Function *> SymbolMap;		StringMap<Function *> SymbolMap;

▲ Show 20 Lines • Show All 1,166 Lines • ▼ Show 20 Lines	INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)

// Add inlined profile call edges to the call graph.		std::unique_ptr<ProfiledCallGraph>
void SampleProfileLoader::addCallGraphEdges(CallGraph &CG,		SampleProfileLoader::buildProfiledCallGraph(CallGraph &CG) {
		wenlei: Assert that we don't go down this path for CSSPGO?
		hoy (author): Done.
const FunctionSamples &Samples) {		std::unique_ptr<ProfiledCallGraph> ProfiledCG;
Function *Caller = SymbolMap.lookup(Samples.getFuncName());		if (ProfileIsCS)
if (!Caller || Caller->isDeclaration())		ProfiledCG = std::make_unique<ProfiledCallGraph>(*ContextTracker);
return;		else
		ProfiledCG = std::make_unique<ProfiledCallGraph>(Reader->getProfiles());

// Skip non-inlined call edges which are not important since top down inlining		// Add all functions into the profiled call graph even if they are not in
// for non-CS profile is to get more precise profile matching, not to enable		// the profile. This makes sure functions missing from the profile still
// more inlining.		// get a chance to be processed.
		for (auto &Node : CG) {
for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {		const auto *F = Node.first;
for (const auto &InlinedSamples : CallsiteSamples.second) {		if (!F || F->isDeclaration() || !F->hasFnAttribute("use-sample-profile"))
Function *Callee = SymbolMap.lookup(InlinedSamples.first);		continue;
if (Callee && !Callee->isDeclaration())		ProfiledCG->addProfiledFunction(FunctionSamples::getCanonicalFnName(*F));
CG[Caller]->addCalledFunction(nullptr, CG[Callee]);
addCallGraphEdges(CG, InlinedSamples.second);
}
}
}		}

// Replace call graph edges with dynamic call edges from the profile.		return ProfiledCG;
void SampleProfileLoader::replaceCallGraphEdges(
CallGraph &CG, StringMap<Function *> &SymbolMap) {
// Remove static call edges from the call graph except for the ones from the
// root which make the call graph connected.
for (const auto &Node : CG)
if (Node.second.get() != CG.getExternalCallingNode())
Node.second->removeAllCalledFunctions();

// Add profile call edges to the call graph.
if (ProfileIsCS) {
ContextTracker->addCallGraphEdges(CG, SymbolMap);
} else {
for (const auto &Samples : Reader->getProfiles())
addCallGraphEdges(CG, Samples.second);
}
}		}
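`buildProfiledCallGraph` seeds the graph from the profile and then adds every defined function that carries the `use-sample-profile` attribute, which addresses the review concern that functions absent from the profile would otherwise be skipped. A minimal sketch, assuming names are already canonical (the real code calls `FunctionSamples::getCanonicalFnName`) and using a hypothetical `ModuleFn` record in place of `llvm::Function`:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of a module function: only what the filter reads.
struct ModuleFn {
  std::string CanonicalName;
  bool IsDeclaration;
  bool UseSampleProfile; // has the "use-sample-profile" attribute
};

using NameGraph = std::map<std::string, std::set<std::string>>;

// Start from the profile-derived graph, then make sure every eligible
// module function has a node, even if the profile never mentions it.
NameGraph buildNodes(const NameGraph &FromProfile,
                     const std::vector<ModuleFn> &Module) {
  NameGraph G = FromProfile;
  for (const ModuleFn &F : Module) {
    if (F.IsDeclaration || !F.UseSampleProfile)
      continue;
    G[F.CanonicalName]; // no-op if the node already exists
  }
  return G;
}
```

Nodes added this way have no profiled edges, so they form singleton SCCs and are still visited by the SCC walk.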

std::vector<Function *>		std::vector<Function *>
SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {		SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {
std::vector<Function *> FunctionOrderList;		std::vector<Function *> FunctionOrderList;
FunctionOrderList.reserve(M.size());		FunctionOrderList.reserve(M.size());

		if (!ProfileTopDownLoad && UseProfiledCallGraph)
		errs() << "WARNING: -use-profiled-call-graph ignored, should be used "
		wenlei: Is this the canonical way of emitting a warning? Or should it go through diagnostics, like `LLVMContext.diagnose(DiagnosticInfoSampleProfile(..., DS_Warning))`?
		hoy (author): Using `errs()` to display text messages is quite common; I also used it in lld, though it is not a formal way to emit warnings that users can track in documentation.
		wenlei: Yeah, I saw inconsistent messages all over the place: "WARNING", "warning" and "Warning". I guess we're not making it worse. :)
		"together with -sample-profile-top-down-load.\n";

if (!ProfileTopDownLoad || CG == nullptr) {		if (!ProfileTopDownLoad || CG == nullptr) {
if (ProfileMergeInlinee) {		if (ProfileMergeInlinee) {
// Disable ProfileMergeInlinee if profile is not loaded in top down order,		// Disable ProfileMergeInlinee if profile is not loaded in top down order,
// because the profile for a function may be used for the profile		// because the profile for a function may be used for the profile
// annotation of its outline copy before the profile merging of its		// annotation of its outline copy before the profile merging of its
// non-inlined inline instances, and that is not the way how		// non-inlined inline instances, and that is not the way how
// ProfileMergeInlinee is supposed to work.		// ProfileMergeInlinee is supposed to work.
ProfileMergeInlinee = false;		ProfileMergeInlinee = false;
}		}

for (Function &F : M)		for (Function &F : M)
if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile"))		if (!F.isDeclaration() && F.hasFnAttribute("use-sample-profile"))
FunctionOrderList.push_back(&F);		FunctionOrderList.push_back(&F);
return FunctionOrderList;		return FunctionOrderList;
}		}

assert(&CG->getModule() == &M);		assert(&CG->getModule() == &M);

// Add indirect call edges from profile to augment the static call graph.		if (UseProfiledCallGraph ||
// Functions will be processed in a top-down order defined by the static call		(ProfileIsCS && !UseProfiledCallGraph.getNumOccurrences())) {
// graph. Adjusting the order by considering indirect call edges from the		// Use profiled call edges to augment the top-down order. There are cases
// profile (which don't exist in the static call graph) can enable the		// that the top-down order computed based on the static call graph doesn't
// inlining of indirect call targets by processing the caller before them.		// reflect real execution order. For example:
// TODO: enable this for non-CS profile and fix the counts returning logic to		//
		wenlei: What happens if a function is not in the input profile? It looks like it will be skipped by the sample loader after this change. Before the change, we would still set the entry count for a function that has no profile.
		hoy (author): Good catch! It's an oversight.
// have a full support for indirect calls.		// 1. Incomplete static call graph due to unknown indirect call targets.
if (UseProfileIndirectCallEdges && ProfileIsCS) {		// Adjusting the order by considering indirect call edges from the
for (auto &Entry : *CG) {		// profile can enable the inlining of indirect call targets by allowing
const auto *F = Entry.first;		// the caller to be processed before them.
if (!F || F->isDeclaration() || !F->hasFnAttribute("use-sample-profile"))		// 2. Mutual call edges in an SCC. The static processing order computed for
continue;		// an SCC may not reflect the call contexts in the context-sensitive
auto &AllContexts = ContextTracker->getAllContextSamplesFor(F->getName());		// profile, thus may cause potential inlining to be overlooked. The
if (AllContexts.empty())		// function order in one SCC is being adjusted to a top-down order based
continue;		// on the profile to favor more inlining. This is only a problem with CS
		// profile.
for (const auto &BB : *F) {		// 3. Transitive indirect call edges due to inlining. When a callee function
for (const auto &I : BB.getInstList()) {		// (say B) is inlined into a caller function (say A) in LTO prelink,
		wenlei: Such a case has to involve an indirect call, because with a direct call, say A->B->C, if B is gone in postlink we will still honor the A->C order since the call to C is visible in A, correct? It may be clearer to use an example in the description. Same for #4. It'd be good to call out that some of this only applies to CSSPGO (e.g. #2 shouldn't be a problem for AutoFDO).
		hoy (author): Exactly. A->C can be used to recover A->B->C with C's `!dbg` information. Examples added.
const auto *CB = dyn_cast<CallBase>(&I);		// every call edge originated from the callee B will be transferred to
if (!CB || !CB->isIndirectCall())		// the caller A. If any transferred edge (say A->C) is indirect, the
continue;		// original profiled indirect edge B->C, even if considered, would not
const DebugLoc &DLoc = I.getDebugLoc();		// enforce a top-down order from the caller A to the potential indirect
if (!DLoc)		// call target C in LTO postlink since the inlined callee B is gone from
continue;		// the static call graph.
auto CallSite = FunctionSamples::getCallSiteIdentifier(DLoc);		// 4. #3 can happen even for direct call targets, due to functions defined
for (FunctionSamples *Samples : AllContexts) {		// in header files. A header function (say A), when included into source
if (auto CallTargets = Samples->findCallTargetMapAt(CallSite)) {		// files, is defined multiple times but only one definition survives due
for (const auto &Target : CallTargets.get()) {		// to ODR. Therefore, the LTO prelink inlining done on those dropped
Function *Callee = SymbolMap.lookup(Target.first());		// definitions can be useless based on a local file scope. More
if (Callee && !Callee->isDeclaration())		// importantly, the inlinee (say B), once fully inlined to a
Entry.second->addCalledFunction(nullptr, (*CG)[Callee]);		// to-be-dropped A, will have no profile to consume when its outlined
}		// version is compiled. This can lead to a profile-less prelink
}		// compilation for the outlined version of B which may be called from
}		// external modules. While this isn't easy to fix, we rely on the
}		// postlink AutoFDO pipeline to optimize B. Since the surviving copy of
}		// A can be inlined in its local scope in prelink, it may not exist
}		// in the merged IR in postlink, and we'll need the profiled call edges
}		// to enforce a top-down order for the rest of the functions.
		//
		// Considering those cases, a profiled call graph completely independent of
		wenlei: Using ProfiledCallGraph allows us to order without needing function objects, but the profile could be stale (e.g. missing a new edge after source drift). Can we build ProfiledCallGraph with static call graph nodes and edges included as well?
		hoy (author): Actually, adding static edges leads to worse performance for some benchmarks because of SCCs. In that case, static edges in an SCC should be completely removed so that only profile edges are honored. On the other hand, yes, the profile could be stale, but that's the information FDO relies on. I think without the profile, top-down order isn't important. In other words, static call edges seem unimportant when they don't correspond to a context in the profile.
		wenlei: OK, this makes sense. Static edges can conflict, and then we may end up with an SCC order that is not compatible with the context trie. Using strictly the profile order makes sure we get maximum inlining along the context trie. I think that (intentionally not adding call graph edges) is worth a comment of its own.
		hoy (author): Sounds good, comment added.
		// the static call graph is constructed based on profile data, where
		// function objects are not even needed to handle cases #3 and #4.
		//
		wenlei: How about moving the dispatch for `ProfileIsCS` into `SampleProfileLoader::buildProfiledCallGraph`?
		// Note that static callgraph edges are completely ignored since they
		// can be conflicting with profiled edges for cyclic SCCs and may result in
		// an SCC order incompatible with profile-defined one. Using strictly
		// profile order ensures a maximum inlining experience. On the other hand,
		// static call edges are not so important when they don't correspond to a
		// context in the profile.

// Compute a top-down order from the profile which is used to sort functions in		std::unique_ptr<ProfiledCallGraph> ProfiledCG = buildProfiledCallGraph(*CG);
// one SCC later. The static processing order computed for an SCC may not		scc_iterator<ProfiledCallGraph *> CGI = scc_begin(ProfiledCG.get());
// reflect the call contexts in the context-sensitive profile, thus may cause
// potential inlining to be overlooked. The function order in one SCC is being
// adjusted to a top-down order based on the profile to favor more inlining.
DenseMap<Function *, uint64_t> ProfileOrderMap;
if (UseProfileTopDownOrder ||
(ProfileIsCS && !UseProfileTopDownOrder.getNumOccurrences())) {
// Create a static call graph. The call edges are not important since they
// will be replaced by dynamic edges from the profile.
CallGraph ProfileCG(M);
replaceCallGraphEdges(ProfileCG, SymbolMap);
scc_iterator<CallGraph *> CGI = scc_begin(&ProfileCG);
uint64_t I = 0;
while (!CGI.isAtEnd()) {		while (!CGI.isAtEnd()) {
for (CallGraphNode *Node : *CGI) {		for (ProfiledCallGraphNode *Node : *CGI) {
if (auto *F = Node->getFunction())		Function *F = SymbolMap.lookup(Node->Name);
ProfileOrderMap[F] = ++I;		if (F && !F->isDeclaration() && F->hasFnAttribute("use-sample-profile"))
		FunctionOrderList.push_back(F);
}		}
++CGI;		++CGI;
}		}
}		} else {

scc_iterator<CallGraph *> CGI = scc_begin(CG);		scc_iterator<CallGraph *> CGI = scc_begin(CG);
while (!CGI.isAtEnd()) {		while (!CGI.isAtEnd()) {
uint64_t Start = FunctionOrderList.size();
for (CallGraphNode *Node : *CGI) {		for (CallGraphNode *Node : *CGI) {
auto *F = Node->getFunction();		auto *F = Node->getFunction();
if (F && !F->isDeclaration() && F->hasFnAttribute("use-sample-profile"))		if (F && !F->isDeclaration() && F->hasFnAttribute("use-sample-profile"))
FunctionOrderList.push_back(F);		FunctionOrderList.push_back(F);
}		}

// Sort nodes in SCC based on the profile top-down order.
if (!ProfileOrderMap.empty()) {
std::stable_sort(FunctionOrderList.begin() + Start,
FunctionOrderList.end(),
[&ProfileOrderMap](Function *Left, Function *Right) {
return ProfileOrderMap[Left] < ProfileOrderMap[Right];
});
}

++CGI;		++CGI;
}		}
		}

LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "Function processing order:\n";		dbgs() << "Function processing order:\n";
for (auto F : reverse(FunctionOrderList)) {		for (auto F : reverse(FunctionOrderList)) {
dbgs() << F->getName() << "\n";		dbgs() << F->getName() << "\n";
}		}
});		});

▲ Show 20 Lines • Show All 253 Lines • Show Last 20 Lines
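The new ordering path relies on `scc_iterator`, which yields SCCs callees-first (reverse topological order); this is why the debug dump walks `FunctionOrderList` in reverse to print the processing order. The sketch below reproduces that shape with Tarjan's algorithm on a name-keyed graph, using a set of defined names in place of the `SymbolMap` lookup and attribute checks. It is an illustration of the technique, not the LLVM implementation; `sccsBottomUp` and `topDownOrder` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

using NameGraph = std::map<std::string, std::set<std::string>>;

// Tarjan's SCC algorithm; SCCs are emitted callees-first.
std::vector<std::vector<std::string>> sccsBottomUp(const NameGraph &G) {
  std::map<std::string, int> Index, Low;
  std::set<std::string> OnStack;
  std::vector<std::string> Stack;
  std::vector<std::vector<std::string>> Result;
  int Next = 0;

  std::function<void(const std::string &)> Visit = [&](const std::string &V) {
    Index[V] = Low[V] = Next++;
    Stack.push_back(V);
    OnStack.insert(V);
    auto It = G.find(V);
    if (It != G.end())
      for (const std::string &W : It->second) {
        if (!Index.count(W)) {
          Visit(W);
          Low[V] = std::min(Low[V], Low[W]);
        } else if (OnStack.count(W)) {
          Low[V] = std::min(Low[V], Index[W]);
        }
      }
    if (Low[V] == Index[V]) { // V is the root of an SCC: pop it
      std::vector<std::string> SCC;
      std::string W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack.erase(W);
        SCC.push_back(W);
      } while (W != V);
      Result.push_back(SCC);
    }
  };

  for (const auto &Entry : G)
    if (!Index.count(Entry.first))
      Visit(Entry.first);
  return Result;
}

// Flatten SCCs bottom-up, keep defined names only, then reverse to get a
// top-down processing order (callers before callees).
std::vector<std::string> topDownOrder(const NameGraph &G,
                                      const std::set<std::string> &Defined) {
  std::vector<std::string> Order;
  for (const auto &SCC : sccsBottomUp(G))
    for (const std::string &Name : SCC)
      if (Defined.count(Name))
        Order.push_back(Name);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

With profiled edges `main->A->B->C`, the order comes out as `main, A, B, C`. Adding a static edge `B->A` on top of the profiled `A->B` merges A and B into one SCC, and the order inside that SCC no longer follows the profile, which is the conflict the review discussion cites for ignoring static edges.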

llvm/test/Transforms/SampleProfile/ctxsplit.ll

	; Check the nonflattened part of the ctxsplit profile will be read in thinlto			; Check the nonflattened part of the ctxsplit profile will be read in thinlto
	; postlink phase while flattened part of the ctxsplit profile will not be read.			; postlink phase while flattened part of the ctxsplit profile will not be read.
	; RUN: opt < %s -passes='thinlto<O2>' -pgo-kind=pgo-sample-use-pipeline -profile-file=%S/Inputs/ctxsplit.extbinary.afdo -S | FileCheck %s --check-prefix=POSTLINK			; RUN: opt < %s -passes='thinlto<O2>' -pgo-kind=pgo-sample-use-pipeline -use-profiled-call-graph=0 -profile-file=%S/Inputs/ctxsplit.extbinary.afdo -S | FileCheck %s --check-prefix=POSTLINK
	;			;
	; Check both the flattened and nonflattened parts of the ctxsplit profile will			; Check both the flattened and nonflattened parts of the ctxsplit profile will
	; be read in thinlto prelink phase.			; be read in thinlto prelink phase.
	; RUN: opt < %s -passes='thinlto-pre-link<O2>' -pgo-kind=pgo-sample-use-pipeline -profile-file=%S/Inputs/ctxsplit.extbinary.afdo -S | FileCheck %s --check-prefix=PRELINK			; RUN: opt < %s -passes='thinlto-pre-link<O2>' -pgo-kind=pgo-sample-use-pipeline -use-profiled-call-graph=0 -profile-file=%S/Inputs/ctxsplit.extbinary.afdo -S | FileCheck %s --check-prefix=PRELINK
	;			;
	; Check both the flattened and nonflattened parts of the ctxsplit profile will			; Check both the flattened and nonflattened parts of the ctxsplit profile will
	; be read in non-thinlto mode.			; be read in non-thinlto mode.
	; RUN: opt < %s -passes='default<O2>' -pgo-kind=pgo-sample-use-pipeline -profile-file=%S/Inputs/ctxsplit.extbinary.afdo -S | FileCheck %s --check-prefix=NOTHINLTO			; RUN: opt < %s -passes='default<O2>' -pgo-kind=pgo-sample-use-pipeline -use-profiled-call-graph=0 -profile-file=%S/Inputs/ctxsplit.extbinary.afdo -S | FileCheck %s --check-prefix=NOTHINLTO

	; POSTLINK: define dso_local i32 @goo() {{.*}} !prof ![[ENTRY1:[0-9]+]] {			; POSTLINK: define dso_local i32 @goo() {{.*}} !prof ![[ENTRY1:[0-9]+]] {
	; POSTLINK: define dso_local i32 @foo() {{.*}} !prof ![[ENTRY2:[0-9]+]] {			; POSTLINK: define dso_local i32 @foo() {{.*}} !prof ![[ENTRY2:[0-9]+]] {
	; POSTLINK: ![[ENTRY1]] = !{!"function_entry_count", i64 1001}			; POSTLINK: ![[ENTRY1]] = !{!"function_entry_count", i64 1001}
	; POSTLINK: ![[ENTRY2]] = !{!"function_entry_count", i64 -1}			; POSTLINK: ![[ENTRY2]] = !{!"function_entry_count", i64 -1}
	; PRELINK: define dso_local i32 @goo() {{.*}} !prof ![[ENTRY1:[0-9]+]] {			; PRELINK: define dso_local i32 @goo() {{.*}} !prof ![[ENTRY1:[0-9]+]] {
	; PRELINK: define dso_local i32 @foo() {{.*}} !prof ![[ENTRY2:[0-9]+]] {			; PRELINK: define dso_local i32 @foo() {{.*}} !prof ![[ENTRY2:[0-9]+]] {
	; PRELINK: ![[ENTRY1]] = !{!"function_entry_count", i64 1001}			; PRELINK: ![[ENTRY1]] = !{!"function_entry_count", i64 1001}

llvm/test/Transforms/SampleProfile/inline-mergeprof.ll

	; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee'			; Test we lose details of not inlined profile without '-sample-profile-merge-inlinee'
	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -enable-new-pm=0 -S | FileCheck -check-prefix=SCALE %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -use-profiled-call-graph=0 -enable-new-pm=0 -S | FileCheck -check-prefix=SCALE %s
	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -enable-new-pm=0 -S | FileCheck -check-prefix=SCALE %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -use-profiled-call-graph=0 -enable-new-pm=0 -S | FileCheck -check-prefix=SCALE %s
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -S | FileCheck -check-prefix=SCALE %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=false -use-profiled-call-graph=0 -S | FileCheck -check-prefix=SCALE %s

	; Test we properly merge not inlined profile with '-sample-profile-merge-inlinee'			; Test we properly merge not inlined profile with '-sample-profile-merge-inlinee'
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.prof -sample-profile-merge-inlinee=true -use-profiled-call-graph=0 -S | FileCheck -check-prefix=MERGE %s

	; Test we properly merge not inlined profile with '-sample-profile-merge-inlinee'			; Test we properly merge not inlined profile with '-sample-profile-merge-inlinee'
	; when the profile uses md5.			; when the profile uses md5.
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.md5.prof -sample-profile-merge-inlinee=true -S | FileCheck -check-prefix=MERGE %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-mergeprof.md5.prof -sample-profile-merge-inlinee=true -use-profiled-call-graph=0 -S | FileCheck -check-prefix=MERGE %s

	@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1			@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1

	define i32 @main() #0 !dbg !6 {			define i32 @main() #0 !dbg !6 {
	entry:			entry:
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%s = alloca i32, align 4			%s = alloca i32, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4

llvm/test/Transforms/SampleProfile/profile-context-order.ll

	;; Test for different function processing orders affecting inlining in sample profile loader.			;; Test for different function processing orders affecting inlining in sample profile loader.

	;; There is an SCC _Z5funcAi -> _Z8funcLeafi -> _Z5funcAi in the program.			;; There is an SCC _Z5funcAi -> _Z8funcLeafi -> _Z5funcAi in the program.
	;; With -use-profile-top-down-order=0, the top-down processing order of			;; With -use-profiled-call-graph=0, the top-down processing order of
	;; that SCC is (_Z8funcLeafi, _Z5funcAi), which is determined based on			;; that SCC is (_Z8funcLeafi, _Z5funcAi), which is determined based on
	;; the static call graph. With -use-profile-top-down-order=1, call edges			;; the static call graph. With -use-profiled-call-graph=1, call edges
	;; from profile are considered, thus the order becomes (_Z5funcAi, _Z8funcLeafi)			;; from profile are considered, thus the order becomes (_Z5funcAi, _Z8funcLeafi)
	;; which leads to _Z8funcLeafi inlined into _Z5funcAi.			;; which leads to _Z8funcLeafi inlined into _Z5funcAi.
	; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=1 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=INLINE			; RUN: opt < %s -passes=sample-profile -use-profiled-call-graph=1 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=INLINE
	; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=0 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=NOINLINE			; RUN: opt < %s -passes=sample-profile -use-profiled-call-graph=0 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=NOINLINE

	;; There is an indirect call _Z5funcAi -> _Z3fibi in the program.			;; There is an indirect call _Z5funcAi -> _Z3fibi in the program.
	;; With -use-profile-indirect-call-edges=0, the processing order computed			;; With -use-profiled-call-graph=0, the processing order computed
	;; based on the static call graph is (_Z3fibi, _Z5funcAi). With			;; based on the static call graph is (_Z3fibi, _Z5funcAi). With
	;; -use-profile-top-down-order=1, the indirect call edge from profile is			;; -use-profiled-call-graph=1, the indirect call edge from profile is
	;; considered, thus the order becomes (_Z5funcAi, _Z3fibi) which leads to			;; considered, thus the order becomes (_Z5funcAi, _Z3fibi) which leads to
	;; _Z3fibi inlined into _Z5funcAi.			;; _Z3fibi inlined into _Z5funcAi.
	; RUN: opt < %s -passes=sample-profile -use-profile-indirect-call-edges=1 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=ICALL-INLINE			; RUN: opt < %s -passes=sample-profile -use-profiled-call-graph=1 -sample-profile-file=%S/Inputs/profile-context-order.prof -S | FileCheck %s -check-prefix=ICALL-INLINE

	@factor = dso_local global i32 3, align 4, !dbg !0			@factor = dso_local global i32 3, align 4, !dbg !0
	@fp = dso_local global i32 (i32)* null, align 8			@fp = dso_local global i32 (i32)* null, align 8

	define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {			define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {
	entry:			entry:
	store i32 (i32)* @_Z3fibi, i32 (i32)** @fp, align 8, !dbg !25			store i32 (i32)* @_Z3fibi, i32 (i32)** @fp, align 8, !dbg !25
	br label %for.body, !dbg !25			br label %for.body, !dbg !25
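To illustrate why the profiled indirect edge `_Z5funcAi -> _Z3fibi` changes the processing order described in the test comments above, here is a toy sketch. It is not how the patch computes the order (the patch combines LLVM's static call graph with the profiled call graph and SCC iteration); it simply takes a DFS reverse post-order over the union of static and profiled edges. The `Edge` alias and the helpers `topDownOrder` and `indexOf` are hypothetical, and the sketch assumes the combined graph is acyclic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // caller -> callee

// Hypothetical helper (not an LLVM API): returns a top-down order of Nodes
// via DFS reverse post-order over static plus profiled call edges.
// Assumes the combined graph is acyclic; cycles would need SCC handling.
std::vector<std::string> topDownOrder(const std::vector<std::string> &Nodes,
                                      const std::vector<Edge> &StaticEdges,
                                      const std::vector<Edge> &ProfiledEdges) {
  std::map<std::string, std::vector<std::string>> Succs;
  for (const auto &E : StaticEdges)
    Succs[E.first].push_back(E.second);
  for (const auto &E : ProfiledEdges) // e.g. indirect-call targets from profile
    Succs[E.first].push_back(E.second);

  std::set<std::string> Visited;
  std::vector<std::string> PostOrder;
  std::function<void(const std::string &)> Visit = [&](const std::string &N) {
    if (!Visited.insert(N).second)
      return; // already visited
    for (const auto &S : Succs[N])
      Visit(S);
    PostOrder.push_back(N); // emitted only after all of its callees
  };
  for (const auto &N : Nodes)
    Visit(N);

  // Reversing the post-order puts callers before callees.
  std::reverse(PostOrder.begin(), PostOrder.end());
  return PostOrder;
}

// Position of F in Order; asserts that F is present.
std::size_t indexOf(const std::vector<std::string> &Order, const std::string &F) {
  auto It = std::find(Order.begin(), Order.end(), F);
  assert(It != Order.end() && "function must be in the order");
  return static_cast<std::size_t>(It - Order.begin());
}
```

With only the static edge `main -> _Z5funcAi` and nodes visited in the order `main, _Z5funcAi, _Z3fibi`, the result is `(_Z3fibi, main, _Z5funcAi)`, so the indirect-call target is processed before its caller. Adding the profiled edge `_Z5funcAi -> _Z3fibi` yields `(main, _Z5funcAi, _Z3fibi)`, letting the caller be processed first.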

llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll

	; INLINE-ALL-NEXT: Getting callee context for instr: %call1 = tail call i32 @_Z5funcAi			; INLINE-ALL-NEXT: Getting callee context for instr: %call1 = tail call i32 @_Z5funcAi
	; INLINE-ALL-NEXT: Callee context found: main:3 @ _Z5funcAi			; INLINE-ALL-NEXT: Callee context found: main:3 @ _Z5funcAi
	; INLINE-ALL-NEXT: Marking context profile as inlined: main:3 @ _Z5funcAi			; INLINE-ALL-NEXT: Marking context profile as inlined: main:3 @ _Z5funcAi
	; INLINE-ALL-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z8funcLeafi			; INLINE-ALL-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z8funcLeafi
	; INLINE-ALL-NEXT: Callee context found: main:3 @ _Z5funcAi:1 @ _Z8funcLeafi			; INLINE-ALL-NEXT: Callee context found: main:3 @ _Z5funcAi:1 @ _Z8funcLeafi
	; INLINE-ALL-NEXT: Marking context profile as inlined: main:3 @ _Z5funcAi:1 @ _Z8funcLeafi			; INLINE-ALL-NEXT: Marking context profile as inlined: main:3 @ _Z5funcAi:1 @ _Z8funcLeafi
	; INLINE-ALL-NEXT: Getting callee context for instr: %call.i1 = tail call i32 @_Z3fibi			; INLINE-ALL-NEXT: Getting callee context for instr: %call.i1 = tail call i32 @_Z3fibi
	; INLINE-ALL-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi			; INLINE-ALL-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi
	; INLINE-ALL-NEXT: Getting base profile for function: _Z5funcAi
	; INLINE-ALL-NEXT: Merging context profile into base profile: _Z5funcAi
	; INLINE-ALL-NEXT: Getting base profile for function: _Z5funcBi			; INLINE-ALL-NEXT: Getting base profile for function: _Z5funcBi
	; INLINE-ALL-NEXT: Merging context profile into base profile: _Z5funcBi			; INLINE-ALL-NEXT: Merging context profile into base profile: _Z5funcBi
	; INLINE-ALL-NEXT: Found context tree root to promote: external:10 @ _Z5funcBi			; INLINE-ALL-NEXT: Found context tree root to promote: external:10 @ _Z5funcBi
	; INLINE-ALL-NEXT: Context promoted to: _Z5funcBi			; INLINE-ALL-NEXT: Context promoted to: _Z5funcBi
	; INLINE-ALL-NEXT: Found context tree root to promote: main:3.1 @ _Z5funcBi			; INLINE-ALL-NEXT: Found context tree root to promote: main:3.1 @ _Z5funcBi
	; INLINE-ALL-NEXT: Context promoted and merged to: _Z5funcBi			; INLINE-ALL-NEXT: Context promoted and merged to: _Z5funcBi
	; INLINE-ALL-NEXT: Context promoted to: _Z5funcBi:1 @ _Z8funcLeafi			; INLINE-ALL-NEXT: Context promoted to: _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-ALL-NEXT: Found context tree root to promote: externalA:17 @ _Z5funcBi			; INLINE-ALL-NEXT: Found context tree root to promote: externalA:17 @ _Z5funcBi
	; INLINE-ALL-NEXT: Context promoted and merged to: _Z5funcBi			; INLINE-ALL-NEXT: Context promoted and merged to: _Z5funcBi
	; INLINE-ALL-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi			; INLINE-ALL-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi
	; INLINE-ALL-NEXT: Callee context found: _Z5funcBi:1 @ _Z8funcLeafi			; INLINE-ALL-NEXT: Callee context found: _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-ALL-NEXT: Marking context profile as inlined: _Z5funcBi:1 @ _Z8funcLeafi			; INLINE-ALL-NEXT: Marking context profile as inlined: _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-ALL-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi			; INLINE-ALL-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi
	; INLINE-ALL-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi			; INLINE-ALL-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi
				; INLINE-ALL-NEXT: Getting base profile for function: _Z5funcAi
				; INLINE-ALL-NEXT: Merging context profile into base profile: _Z5funcAi
	; INLINE-ALL-NEXT: Getting base profile for function: _Z8funcLeafi			; INLINE-ALL-NEXT: Getting base profile for function: _Z8funcLeafi
	; INLINE-ALL-NEXT: Merging context profile into base profile: _Z8funcLeafi			; INLINE-ALL-NEXT: Merging context profile into base profile: _Z8funcLeafi

	; Test that we inline the following in top-down order and promote the remaining non-inlined context profiles into base profiles			; Test that we inline the following in top-down order and promote the remaining non-inlined context profiles into base profiles
	; _Z5funcAi:1 @ _Z8funcLeafi			; _Z5funcAi:1 @ _Z8funcLeafi
	; _Z5funcBi:1 @ _Z8funcLeafi			; _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-HOT: Getting base profile for function: main			; INLINE-HOT: Getting base profile for function: main
	; INLINE-HOT-NEXT: Merging context profile into base profile: main			; INLINE-HOT-NEXT: Merging context profile into base profile: main
	; INLINE-HOT-NEXT: Found context tree root to promote: external:12 @ main			; INLINE-HOT-NEXT: Found context tree root to promote: external:12 @ main
	; INLINE-HOT-NEXT: Context promoted and merged to: main			; INLINE-HOT-NEXT: Context promoted and merged to: main
	; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z5funcBi			; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z5funcBi
	; INLINE-HOT-NEXT: Callee context found: main:3.1 @ _Z5funcBi			; INLINE-HOT-NEXT: Callee context found: main:3.1 @ _Z5funcBi
	; INLINE-HOT-NEXT: Getting callee context for instr: %call1 = tail call i32 @_Z5funcAi			; INLINE-HOT-NEXT: Getting callee context for instr: %call1 = tail call i32 @_Z5funcAi
	; INLINE-HOT-NEXT: Callee context found: main:3 @ _Z5funcAi			; INLINE-HOT-NEXT: Callee context found: main:3 @ _Z5funcAi
	; INLINE-HOT-NEXT: Getting base profile for function: _Z5funcAi
	; INLINE-HOT-NEXT: Merging context profile into base profile: _Z5funcAi
	; INLINE-HOT-NEXT: Found context tree root to promote: main:3 @ _Z5funcAi
	; INLINE-HOT-NEXT: Context promoted to: _Z5funcAi
	; INLINE-HOT-NEXT: Context promoted to: _Z5funcAi:1 @ _Z8funcLeafi
	; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !50
	; INLINE-HOT-NEXT: Callee context found: _Z5funcAi:1 @ _Z8funcLeafi
	; INLINE-HOT-NEXT: Marking context profile as inlined: _Z5funcAi:1 @ _Z8funcLeafi
	; INLINE-HOT-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi(i32 %tmp.i) #2, !dbg !62
	; INLINE-HOT-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi(i32 %tmp1.i) #2, !dbg !69
	; INLINE-HOT-NEXT: Getting base profile for function: _Z5funcBi			; INLINE-HOT-NEXT: Getting base profile for function: _Z5funcBi
	; INLINE-HOT-NEXT: Merging context profile into base profile: _Z5funcBi			; INLINE-HOT-NEXT: Merging context profile into base profile: _Z5funcBi
	; INLINE-HOT-NEXT: Found context tree root to promote: external:10 @ _Z5funcBi			; INLINE-HOT-NEXT: Found context tree root to promote: external:10 @ _Z5funcBi
	; INLINE-HOT-NEXT: Context promoted to: _Z5funcBi			; INLINE-HOT-NEXT: Context promoted to: _Z5funcBi
	; INLINE-HOT-NEXT: Found context tree root to promote: main:3.1 @ _Z5funcBi			; INLINE-HOT-NEXT: Found context tree root to promote: main:3.1 @ _Z5funcBi
	; INLINE-HOT-NEXT: Context promoted and merged to: _Z5funcBi			; INLINE-HOT-NEXT: Context promoted and merged to: _Z5funcBi
	; INLINE-HOT-NEXT: Context promoted to: _Z5funcBi:1 @ _Z8funcLeafi			; INLINE-HOT-NEXT: Context promoted to: _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-HOT-NEXT: Found context tree root to promote: externalA:17 @ _Z5funcBi			; INLINE-HOT-NEXT: Found context tree root to promote: externalA:17 @ _Z5funcBi
	; INLINE-HOT-NEXT: Context promoted and merged to: _Z5funcBi			; INLINE-HOT-NEXT: Context promoted and merged to: _Z5funcBi
	; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi			; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi
	; INLINE-HOT-NEXT: Callee context found: _Z5funcBi:1 @ _Z8funcLeafi			; INLINE-HOT-NEXT: Callee context found: _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-HOT-NEXT: Marking context profile as inlined: _Z5funcBi:1 @ _Z8funcLeafi			; INLINE-HOT-NEXT: Marking context profile as inlined: _Z5funcBi:1 @ _Z8funcLeafi
	; INLINE-HOT-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi			; INLINE-HOT-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi
	; INLINE-HOT-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi			; INLINE-HOT-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi
				; INLINE-HOT-NEXT: Getting base profile for function: _Z5funcAi
				; INLINE-HOT-NEXT: Merging context profile into base profile: _Z5funcAi
				; INLINE-HOT-NEXT: Found context tree root to promote: main:3 @ _Z5funcAi
				; INLINE-HOT-NEXT: Context promoted to: _Z5funcAi
				; INLINE-HOT-NEXT: Context promoted to: _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !50
				; INLINE-HOT-NEXT: Callee context found: _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Marking context profile as inlined: _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi(i32 %tmp.i) #2, !dbg !62
				; INLINE-HOT-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi(i32 %tmp1.i) #2, !dbg !69
	; INLINE-HOT-NEXT: Getting base profile for function: _Z8funcLeafi			; INLINE-HOT-NEXT: Getting base profile for function: _Z8funcLeafi
	; INLINE-HOT-NEXT: Merging context profile into base profile: _Z8funcLeafi			; INLINE-HOT-NEXT: Merging context profile into base profile: _Z8funcLeafi


	@factor = dso_local global i32 3, align 4, !dbg !0			@factor = dso_local global i32 3, align 4, !dbg !0

	define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {			define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {
	entry:			entry:

llvm/test/Transforms/SampleProfile/profile-topdown-order.ll

	;; Test for different function processing orders affecting inlining in sample profile loader.			;; Test for different function processing orders affecting inlining in sample profile loader.

	;; There is an SCC _Z5funcAi -> _Z8funcLeafi -> _Z5funcAi in the program.			;; There is an SCC _Z5funcAi -> _Z8funcLeafi -> _Z5funcAi in the program.
	;; With -use-profile-top-down-order=0, the top-down processing order of			;; With -use-profiled-call-graph=0, the top-down processing order of
	;; that SCC is (_Z8funcLeafi, _Z5funcAi), which is determined based on			;; that SCC is (_Z8funcLeafi, _Z5funcAi), which is determined based on
	;; the static call graph. With -use-profile-top-down-order=1, call edges			;; the static call graph. With -use-profiled-call-graph=1, call edges
	;; from profile are considered, thus the order becomes (_Z5funcAi, _Z8funcLeafi).			;; from profile are considered, thus the order becomes (_Z5funcAi, _Z8funcLeafi).
	;; While _Z8funcLeafi is not supposed to be inlined, the outlined entry counts			;; While _Z8funcLeafi is not supposed to be inlined, the outlined entry counts
	;; are affected.			;; are affected.
	; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=0 -sample-profile-file=%S/Inputs/profile-topdown-order.prof -S | FileCheck %s -check-prefix=STATIC			; RUN: opt < %s -passes=sample-profile -use-profiled-call-graph=0 -sample-profile-file=%S/Inputs/profile-topdown-order.prof -S | FileCheck %s -check-prefix=STATIC
	; RUN: opt < %s -passes=sample-profile -use-profile-top-down-order=1 -sample-profile-file=%S/Inputs/profile-topdown-order.prof -S | FileCheck %s -check-prefix=DYNAMIC			; RUN: opt < %s -passes=sample-profile -use-profiled-call-graph=1 -sample-profile-file=%S/Inputs/profile-topdown-order.prof -S | FileCheck %s -check-prefix=DYNAMIC


	; STATIC: define dso_local i32 @_Z8funcLeafi{{.*}} !prof ![[#PROF:]]			; STATIC: define dso_local i32 @_Z8funcLeafi{{.*}} !prof ![[#PROF:]]
	; STATIC: ![[#PROF]] = !{!"function_entry_count", i64 21}			; STATIC: ![[#PROF]] = !{!"function_entry_count", i64 21}
	; DYNAMIC: define dso_local i32 @_Z8funcLeafi{{.*}} !prof ![[#PROF:]]			; DYNAMIC: define dso_local i32 @_Z8funcLeafi{{.*}} !prof ![[#PROF:]]
	; DYNAMIC: ![[#PROF]] = !{!"function_entry_count", i64 27}			; DYNAMIC: ![[#PROF]] = !{!"function_entry_count", i64 27}

	@factor = dso_local global i32 3, align 4, !dbg !0			@factor = dso_local global i32 3, align 4, !dbg !0

llvm/tools/llvm-profgen/CSPreInliner.cpp


	CSPreInliner::CSPreInliner(StringMap<FunctionSamples> &Profiles,			CSPreInliner::CSPreInliner(StringMap<FunctionSamples> &Profiles,
	uint64_t HotThreshold, uint64_t ColdThreshold)			uint64_t HotThreshold, uint64_t ColdThreshold)
	: ContextTracker(Profiles), ProfileMap(Profiles),			: ContextTracker(Profiles), ProfileMap(Profiles),
	HotCountThreshold(HotThreshold), ColdCountThreshold(ColdThreshold) {}			HotCountThreshold(HotThreshold), ColdCountThreshold(ColdThreshold) {}

	std::vector<StringRef> CSPreInliner::buildTopDownOrder() {			std::vector<StringRef> CSPreInliner::buildTopDownOrder() {
	std::vector<StringRef> Order;			std::vector<StringRef> Order;
	ProfiledCallGraph ProfiledCG(ProfileMap, ContextTracker);			ProfiledCallGraph ProfiledCG(ContextTracker);

	// Now that we have a profiled call graph, construct top-down order			// Now that we have a profiled call graph, construct top-down order
	// by building up SCC and reversing SCC order.			// by building up SCC and reversing SCC order.
	scc_iterator<ProfiledCallGraph *> I = scc_begin(&ProfiledCG);			scc_iterator<ProfiledCallGraph *> I = scc_begin(&ProfiledCG);
	while (!I.isAtEnd()) {			while (!I.isAtEnd()) {
	for (ProfiledCallGraphNode *Node : *I) {			for (ProfiledCallGraphNode *Node : *I) {
	if (Node != ProfiledCG.getEntryNode())			if (Node != ProfiledCG.getEntryNode())
	Order.push_back(Node->Name);			Order.push_back(Node->Name);
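The SCC-then-reverse construction in `CSPreInliner::buildTopDownOrder` can be approximated with Tarjan's algorithm, which, like `llvm::scc_iterator`, emits SCCs callees-first. The sketch below is self-contained and hypothetical: the graph is a `std::map` from function name to callees rather than `ProfiledCallGraph`, `SCCFinder` is an illustrative class, and `"<root>"` stands in for the synthetic entry node that the real code skips via `getEntryNode()`. The final `std::reverse` is assumed to correspond to the SCC-order reversal that the hunk's comment mentions (the rest of the function is elided here).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using CallGraphMap = std::map<std::string, std::vector<std::string>>;

// Tarjan's SCC algorithm over a name-keyed call graph (hypothetical, not
// LLVM's ProfiledCallGraph). SCCs come out callees-first, like scc_iterator.
class SCCFinder {
public:
  explicit SCCFinder(const CallGraphMap &G) : Graph(G) {}

  std::vector<std::vector<std::string>> run() {
    for (const auto &Entry : Graph)
      if (!Index.count(Entry.first))
        visit(Entry.first);
    return SCCs;
  }

private:
  const CallGraphMap &Graph;
  std::map<std::string, std::size_t> Index, LowLink;
  std::map<std::string, bool> OnStack;
  std::vector<std::string> Stack;
  std::vector<std::vector<std::string>> SCCs;
  std::size_t NextIndex = 0;

  void visit(const std::string &N) {
    Index[N] = LowLink[N] = NextIndex++;
    Stack.push_back(N);
    OnStack[N] = true;
    auto It = Graph.find(N);
    if (It != Graph.end()) {
      for (const auto &S : It->second) {
        if (!Index.count(S)) { // unvisited callee: recurse
          visit(S);
          LowLink[N] = std::min(LowLink[N], LowLink[S]);
        } else if (OnStack[S]) { // back edge into the current SCC
          LowLink[N] = std::min(LowLink[N], Index[S]);
        }
      }
    }
    if (LowLink[N] == Index[N]) { // N is the root of an SCC: pop it
      std::vector<std::string> SCC;
      std::string M;
      do {
        assert(!Stack.empty() && "SCC root must still be on the stack");
        M = Stack.back();
        Stack.pop_back();
        OnStack[M] = false;
        SCC.push_back(M);
      } while (M != N);
      SCCs.push_back(SCC);
    }
  }
};

// Concatenate SCCs bottom-up, skipping the synthetic entry node, then reverse
// so that callers precede callees across SCCs.
std::vector<std::string> buildTopDownOrder(const CallGraphMap &G,
                                           const std::string &EntryNode) {
  std::vector<std::string> Order;
  for (const auto &SCC : SCCFinder(G).run())
    for (const auto &N : SCC)
      if (N != EntryNode)
        Order.push_back(N);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

On the graph `<root> -> main`, `main -> {_Z5funcAi, _Z5funcBi}`, `_Z5funcAi <-> _Z8funcLeafi`, `_Z5funcBi -> _Z8funcLeafi`, the result is `(main, _Z5funcBi, _Z5funcAi, _Z8funcLeafi)`. Across SCCs the order is top-down; inside the `{_Z5funcAi, _Z8funcLeafi}` cycle the relative order is simply the order in which the traversal popped the nodes.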

Diff 334223
