This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Call site prioritized inlining for sample PGO
ClosedPublic

Authored by wenlei on Jan 3 2021, 4:45 PM.

Details

Summary

This change implements call site prioritized BFS profile-guided inlining for the sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profiles, as mentioned in the follow-up discussion of the CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (-sample-profile-prioritized-inline=0).

Motivation

With baseline AutoFDO, the inliner in the sample profile loader only replays previous inlining, and the profile is used only to prune previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple, with hotness being the only decision factor. It has the following limitations that we're now improving for CSSPGO.

  • It doesn't take inline candidate size (and savings) into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With a context-sensitive profile, the FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
  • The way it looks at hotness is not accurate. It uses total samples in an inlinee as a proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fallback because call site counts and callee entry counts are not reliable due to DWARF-based correlation, especially for inlinees. Now, paired with pseudo-probes, we have accurate call site counts and callee entry counts, so we can use those to gauge hotness more accurately.
  • It treats all call sites from a block as hot as long as one call site in the block is considered hot. This is normally true, but since total samples are used as the hotness proxy, this transitivity within a block magnifies the inaccurate hotness heuristic. With pseudo-probes and the change above, this is no longer an issue for CSSPGO.

New FDO Inliner

Putting all the requirements for CSSPGO together, we need a top-down, call site prioritized BFS inliner. Here are the reasons why each component is needed.

  • Top-down: We need a top-down inliner to better leverage the context-sensitive profile, so inlining is driven by an accurate context profile, and the post-inline profile is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
  • Size/growth cap: For a top-down inliner, taking function size into account for each inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth, because with top-down inlining we can grow a function's size significantly with a large number of smaller inlinees, even if each one individually passes the cost/size check.
  • Prioritized call sites: With a size cap, inlining order also becomes important, because if we stop inlining due to the size budget limit, we'd want to spend the budget on the most beneficial call sites.
  • BFS inlining: As with call site prioritization, if we stop inlining due to the size budget limit, we want a balanced inline tree rather than going deep on one call path.

Note that the new inliner also avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priorities to avoid wasted reevaluation of the same call sites (TODO).

Speculative indirect call promotion and inlining are also supported now with CSSPGO, just like baseline AutoFDO.

Tunings and knobs

I created tuning knobs for size growth/cap control, and for a hot threshold separate from the CGSCC inliner's. The default values were selected based on initial tuning with CSSPGO.

  • sample-profile-inline-growth-limit: Limits the size growth ratio; defaults to 12, i.e. the inliner can't grow a function to more than 12x its pre-inline size.
  • sample-profile-inline-limit-min: The lower bound of the size growth limit; defaults to 100.
  • sample-profile-inline-limit-max: The upper bound of the size growth limit; defaults to 10000.
  • sample-profile-hot-inline-threshold: Inline cost threshold for hot call sites; defaults to 3000, same as the CGSCC inliner.

Results

Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust the hot-threshold cutoff for context profiles (to be sent after this one), the new inliner shows a ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using a train-train setup, MonoLTO with the new pass manager and pseudo-probes. Note that this is just a starting point - we hope the new inliner will open up more opportunities with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and mature for bigger workloads (we're working on it).

Diff Detail

Event Timeline

wenlei created this revision.Jan 3 2021, 4:45 PM
wenlei requested review of this revision.Jan 3 2021, 4:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 3 2021, 4:45 PM
wenlei edited the summary of this revision. (Show Details)Jan 3 2021, 4:54 PM
wenlei updated this revision to Diff 314319.Jan 3 2021, 11:50 PM

Fix test and linter.

wenlei added a subscriber: wlei.

A few high level questions:

  1. Can the size bloat problem be handled at llvm-profgen time? Basically, using hotness information to prune/merge profiles properly?
  2. What is the intuition behind BFS order?
  3. How often does the size limit get triggered with the new inliner?
  4. What is the largest improvement in spec06? Any internal benchmark data?

A few high level questions:

  1. Can the size bloat problem be handled at llvm-profgen time? Basically, using hotness information to prune/merge profiles properly?

Yes, to some degree. llvm-profgen can do two things: 1) Prune and merge profiles using hotness, without knowledge of inlining. We are doing this already for the baseline without this change, but it's not enough to limit inlining, since this has to be conservative as the pruning is not selective enough. 2) In the best-case scenario, if we could predict all inline decisions, llvm-profgen could prepare (promote and merge) the profile such that only those context profiles needed for inlining are kept, in which case inlining would be bounded by the profile output from llvm-profgen. However, llvm-profgen doesn't have inline cost information, so it's hard for it to prepare context profiles perfectly. It's best to leave it up to the sample profile inliner to decide which context profiles are useful, and which should be promoted and merged.

We do plan to implement some top-down global inlining estimation in llvm-profgen and adjust profiles accordingly, though, for ThinLTO. This is because the LTO-time sample profile inliner is not global for ThinLTO, and letting llvm-profgen do some preparation would help with cross-module context profile adjustment. (We could also do something in ThinLink, but that adds more complexity and may also hurt compile time.)

  2. What is the intuition behind BFS order?

This is to get more balanced inlining. BFS with a priority queue will always pick the most beneficial call site to inline within several levels of the call graph from the current function, while DFS may go deep on a particular call path without knowing whether we have a more beneficial candidate on other paths.

  3. How often does the size limit get triggered with the new inliner?

Most functions in SPEC didn't hit the limit. There's one outlier, gobmk: without the cap, it's disastrous for perf and code size - its call graph is very dense with small functions, and an unbounded profile guided inliner would go wild. I can also double check on other workloads.

  4. What is the largest improvement in spec06? Any internal benchmark data?

The largest improvements came from povray (11%) and gobmk (10%), followed by a few others in the low single digits.

When we made this change, we hadn't tried internal workloads, and when we later tried them, this change was grouped together with other changes. So unfortunately, I don't have data for internal workloads for this change alone. We can give it a try on some internal benchmarks, though we will have to use our internal fork, which has other stuff not yet upstreamed.

wmi added a comment.Jan 12 2021, 3:38 PM

Sorry for the delay in review. It makes a lot of sense to have a priority based inliner for CSSPGO, since its profile annotation doesn't rely on replaying the inlining. But I don't understand why we rely on a BFS/DFS strategy to expose the hottest call site for the priority based inliner. In my mind, CSSPGO can know the hotness of each call site with full context information. Can we just sort all the call sites based on the CSSPGO profile, then try to inline the hottest call site under a specific context first, in a top-down manner? We may actually need to inline some parent call sites before we can inline the hottest call site, but it is all driven by the goal of inlining the hottest call site first. If we worry that some call site is too deep and we need to inline through a deep path before we can inline that specific call site, we may add the depth into the priority computation. What do you think?

llvm/lib/Transforms/IPO/SampleProfile.cpp
431

Since there is another struct named InlineCandidate, can we rename this to doInline or executeInline?

In D94001#2494594, @wmi wrote:

Sorry for the delay in review. It makes a lot of sense to have a priority based inliner for CSSPGO, since its profile annotation doesn't rely on replaying the inlining. But I don't understand why we rely on a BFS/DFS strategy to expose the hottest call site for the priority based inliner. In my mind, CSSPGO can know the hotness of each call site with full context information. Can we just sort all the call sites based on the CSSPGO profile, then try to inline the hottest call site under a specific context first, in a top-down manner? We may actually need to inline some parent call sites before we can inline the hottest call site, but it is all driven by the goal of inlining the hottest call site first. If we worry that some call site is too deep and we need to inline through a deep path before we can inline that specific call site, we may add the depth into the priority computation. What do you think?

Yeah, that's one step further than what I have in this patch. The key here is the prioritized work list, and BFS is just a natural byproduct of using the priority queue. I called out BFS to emphasize the more balanced inlining, but it's not super accurate, because it is only a true BFS when priorities are all identical. What you suggested is essentially increasing the scope of the call sites to prioritize - currently it's the immediate callees. I agree that increasing the scope to multiple levels, or the entire sub call tree, may be beneficial. But I don't think it conflicts with the approach in this patch; rather, it can be added on top of this implementation.

With context profiles organized in a trie, it's relatively easy to get the hottest call sites in a sub call tree (assuming no stale profile). I think we can assign a total order/priority for these hot call sites, as well as for the call sites in the middle leading to the hot ones (call sites that lead to the hottest call site need to have their priority bumped up); then the priority queue will lead us to the hottest call sites first, even if they're several levels away. I can give it a try to increase the scope of call sites for prioritization as a follow-up.

We'd need to be able to prioritize a call site before its IR is available, though. Originally I was thinking about also taking cost/size into account in the priority, in addition to call site hotness (I don't have that implemented yet); to do that, we need the actual call site IR in addition to the profile. I was also hoping to eventually unify the early inliner for AFDO and CSSPGO. This implementation doesn't diverge from AFDO too far, and we should be able to merge the two in the future with just a different priority assignment (equal priority for AFDO). But I don't think these two points are critical.

llvm/lib/Transforms/IPO/SampleProfile.cpp
431

Good point; renamed to tryInlineCandidate to be consistent with tryPromoteIndirectCall and shouldInlineCandidate. This function may decide not to inline.

wenlei updated this revision to Diff 316328.Jan 12 2021, 11:01 PM

retitle, rename inlineCandidate.

wenlei retitled this revision from [CSSPGO] Call site prioritized BFS inlining for sample PGO to [CSSPGO] Call site prioritized inlining for sample PGO.Jan 12 2021, 11:01 PM
wmi added a comment.Jan 13 2021, 9:57 PM
In D94001#2494594, @wmi wrote:

Sorry for the delay in review. It makes a lot of sense to have a priority based inliner for CSSPGO, since its profile annotation doesn't rely on replaying the inlining. But I don't understand why we rely on a BFS/DFS strategy to expose the hottest call site for the priority based inliner. In my mind, CSSPGO can know the hotness of each call site with full context information. Can we just sort all the call sites based on the CSSPGO profile, then try to inline the hottest call site under a specific context first, in a top-down manner? We may actually need to inline some parent call sites before we can inline the hottest call site, but it is all driven by the goal of inlining the hottest call site first. If we worry that some call site is too deep and we need to inline through a deep path before we can inline that specific call site, we may add the depth into the priority computation. What do you think?

Yeah, that's one step further than what I have in this patch. The key here is the prioritized work list, and BFS is just a natural byproduct of using the priority queue. I called out BFS to emphasize the more balanced inlining, but it's not super accurate, because it is only a true BFS when priorities are all identical. What you suggested is essentially increasing the scope of the call sites to prioritize - currently it's the immediate callees. I agree that increasing the scope to multiple levels, or the entire sub call tree, may be beneficial. But I don't think it conflicts with the approach in this patch; rather, it can be added on top of this implementation.

With context profiles organized in a trie, it's relatively easy to get the hottest call sites in a sub call tree (assuming no stale profile). I think we can assign a total order/priority for these hot call sites, as well as for the call sites in the middle leading to the hot ones (call sites that lead to the hottest call site need to have their priority bumped up); then the priority queue will lead us to the hottest call sites first, even if they're several levels away. I can give it a try to increase the scope of call sites for prioritization as a follow-up.

Ok, thanks. About the prioritized work list: I was also expecting it to consider all possible candidates in the module, not just in a function, so it would be disruptive in terms of the order of profile annotation and inlining. I agree it can be a follow-up, since it needs more experiments.

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
100

Do we expect DIL to be the debug location of the indirect call?

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
43

How about getMaxCountChildContext?

223–225

Is it possible for getContextFor to return nullptr for DIL, and do you need to handle such case?

llvm/lib/Transforms/IPO/SampleProfile.cpp
186

It will only be applied to priority based sample profile loader inlining, right? Same for the description of sample-profile-inline-limit-min/sample-profile-inline-limit-max

368

Considering the case where CallsiteCounts are equal: does every InlineCandidate have a non-null CalleeSamples? If so, FunctionSamples::GUID can be used to make the comparison more stable.

1001

Add assertions to check that L and R are not nullptr?

1404

CallsiteCount is definitely 0 before this line, so there is no need to use std::max.
CallsiteCount = CalleeSamples->getEntrySamples();

1502

Several parts of this loop look quite similar to their counterparts in inlineHotFunctions. Is it possible to share them?

wenlei marked an inline comment as done.Jan 19 2021, 5:59 PM
wenlei added inline comments.
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
100

Right - the discriminator for line-based profiles and the call site probe id for CSSPGO profiles should differentiate call sites, so DIL uniquely identifies a call site. Updated the comment to be explicit.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
43

Good point, changed to getHottestChildContext.

223–225

That shouldn't happen, because everything we inlined during the sample loader must have a context profile, hence getContextFor for any such location must find a containing context profile.

llvm/lib/Transforms/IPO/SampleProfile.cpp
186

Yeah, updated the descriptions. Thanks for pointing that out.

368

Yeah, every candidate should have a non-null CalleeSamples. Added a tie breaker using GUID. Thanks for the suggestion.

1404

Good catch - this was a mistake made when extracting the changes for upstream. These are meant to be separate ifs. We found that taking the max works better. Fixed.

1502

Yeah, I thought about that too. I hoisted part of the ICP out into tryPromoteIndirectCall to be shared. Looking more at this, I think if we let inlineHotFunctions use InlineCandidate (with dummy hotness), we should be able to reuse tryInlineCandidate for inlineHotFunctions, and that may enable more shared code for the two loops. How about I try that in an NFC patch on top of this one, so this patch doesn't change the existing inlining code too much?

wenlei updated this revision to Diff 317737.Jan 19 2021, 6:01 PM

Address Wei's feedback. Clean up ICPCallee from InlineCandidate now that we don't check InlineCost before ICP.

wenlei updated this revision to Diff 317784.Jan 19 2021, 11:46 PM

default CallsitePrioritizedInline to true only for CSSPGO

wenlei added inline comments.Jan 20 2021, 12:03 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1502

Sent D95024. The two loops are shorter now. inlineCallInstruction (AFDO), similar to tryInlineCandidate (CSSPGO), is now merged and removed. The ICP+inline code is all lifted into a new common helper, tryPromoteAndInlineCandidate.

wenlei updated this revision to Diff 320490.Feb 1 2021, 8:58 AM

rebase, update test case after merge.

wenlei added a comment.Feb 1 2021, 9:33 AM

Some more data points:

  • Inline size/growth limit hits:
    • Among spec2017, gcc has ~100 hits where inlining stopped due to the limit. parest has ~40. Other benchmarks mostly have around 0-20 hits.
    • For server-side Hermes (internal version of https://github.com/facebook/hermes), the inliner hit the limit ~500 times.
    • For cpython (internal version of https://github.com/python/cpython), the inliner hit the limit ~100 times.
  • Perf and code size:
    • I don't have numbers comparing just with and without the new inliner for bigger workloads. But here's what I got on spec2017 comparing -sample-profile-prioritized-inline=0 vs -sample-profile-prioritized-inline=1: a ~1.5% geomean perf boost, and it also controls the size growth (30% reduction) as intended. This is using pseudo-probes, and some small tweaks that we will upstream after this change are also included in the test run.

SPEC2017            run time   .text size
508.namd_r           -0.62%      -10.73%
510.parest_r          0.23%       -1.68%
511.povray_r         -3.79%      -50.93%
526.blender_r         1.06%       -7.78%
600.perlbench_s      -0.62%      -41.87%
602.gcc_s            -1.58%      -60.28%
605.mcf_s            -0.64%      -39.95%
620.omnetpp_s         0.34%       -4.08%
623.xalancbmk_s      -5.24%       -8.20%
625.x264_s          -10.36%      -75.58%
631.deepsjeng_s      -1.87%      -55.70%
638.imagick_s         0.14%        0.00%
641.leela_s          -0.87%      -13.85%
644.nab_s             0.79%       -6.19%
657.xz_s              0.21%      -12.84%
geomean              -1.57%      -31.16%

wenlei added a comment.Feb 1 2021, 9:38 AM

@wmi, @davidxl We collected some data, hoping it gives some insight into the questions asked earlier. I think I've addressed all comments here (with some in D95024); let me know if I missed anything or if you have any other comments.

wmi added a comment.Feb 1 2021, 1:37 PM

Thanks for the data. It shows the priority based inliner does significantly better than the current early inliner in the sample loader for CSSPGO, both on performance and code size! It would be interesting to know how the performance contribution is now distributed between the priority based early inliner and the CGSCC inliner. I imagine the CGSCC inliner now plays a very minor role in CSSPGO?

llvm/lib/Transforms/IPO/SampleProfile.cpp
1412

Add an assert message. There are some other places missing messages too.

1535

According to https://bugs.llvm.org/show_bug.cgi?id=18962, PR18962 was fixed in 2014. Is it the right bug to refer to?

1544–1546

Should the comment block be hoisted?

wenlei marked 3 inline comments as done.Feb 1 2021, 6:04 PM
wenlei added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
1412

Message added for all instances.

1535

The bug was fixed, but we generate different types for the same definition in the case from that bug, and the call analyzer can't handle that now. Updated the comment to include more details.

For the code below, whether fn1 is present affects the type name of A. So we could see two different types for the same definition, which makes CallAnalyzer choke, as it expects matching parameter types on both the caller and callee side.

class A {
  // append has to have the same prototype as fn1 to tickle the bug.
  void (*append)(A *);
};
void fn1(A *p1) {
}
1544–1546

Updated the first part of the comment, and removed the second part, which was outdated.

wenlei updated this revision to Diff 320655.Feb 1 2021, 6:06 PM
wenlei marked 3 inline comments as done.

Address Wei's comments.

wenlei added a comment.Feb 1 2021, 6:14 PM
In D94001#2534785, @wmi wrote:

Thanks for the data. It shows the priority based inliner does significantly better than the current early inliner in the sample loader for CSSPGO, both on performance and code size! It would be interesting to know how the performance contribution is now distributed between the priority based early inliner and the CGSCC inliner. I imagine the CGSCC inliner now plays a very minor role in CSSPGO?

The CGSCC inliner is still quite important for CSSPGO for now. We have tried skipping pre-LTO inlining, and that seems to be fine; however, if we skip LTO-time CGSCC inlining, there's a noticeable regression. This is something we want to dig into more, as we gradually shift more inlining from CGSCC to the sample loader's top-down inlining.

wmi accepted this revision.Feb 1 2021, 10:37 PM

LGTM.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1535

I see, thanks for clarifying. It would be better to add a TODO here.

This revision is now accepted and ready to land.Feb 1 2021, 10:37 PM
wenlei marked an inline comment as done.Feb 1 2021, 11:45 PM
wenlei added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
1535

Updated comment with TODO.

wenlei updated this revision to Diff 320690.Feb 1 2021, 11:45 PM
wenlei marked an inline comment as done.

Update comment.

This revision was landed with ongoing or failed builds.Feb 1 2021, 11:47 PM
This revision was automatically updated to reflect the committed changes.