This is an archive of the discontinued LLVM Phabricator instance.

Differential D90125

[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
ClosedPublic

Authored by wenlei on Oct 25 2020, 1:02 PM.

Details

Reviewers

wmi
hoy
davidxl

Commits

rG6b989a171073: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining

Summary

This change adds the context-sensitive sample PGO infrastructure described in the CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduces an abstraction between the input profile and the profile loader that queries the input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with the top-down profile-guided inliner in the profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to the CGSCC inliner so that it can take advantage of context-sensitive profiles. This change is the consumption part of the context-sensitive profile (the generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with pseudo-probe (https://reviews.llvm.org/D86193). Patches for integration with pseudo-probe are coming up soon.

Currently the new infrastructure kicks in when the input profile contains the new context-sensitive profile; otherwise it's a no-op and does not affect existing AutoFDO.

Interface

There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.

Query base profile (getBaseSamplesFor)

Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.

Query context profile (getContextSamplesFor)

Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.

Track inlined context profile (markContextSamplesInlined)

When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.

Track not-inlined context profile (promoteMergeContextSamplesTree)

When a function is not inlined for a given calling context, we need to promote the context profile tree so the not-inlined context becomes a top-level context. This preserves the sub-context under that function so that later inline decisions for that not-inlined function will still have context profiles for its call tree. Note that profiles will be merged if needed when promoting a context profile tree, if any of the nodes already exists at its promoted destination.
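
The query and tracking rules above can be illustrated with a small standalone sketch. It is not the SampleContextTracker implementation: `ContextProfile`, `leafOf`, `markInlined`, and `baseSamplesFor` are hypothetical names, a single sample count stands in for FunctionSamples, and contexts are assumed to use the `caller:line @ callee` string form seen in this patch's test profile. It only shows that a base profile is synthesized from the context profiles that have not been marked inlined.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified context profile: one total count per context.
struct ContextProfile {
  std::string Context;   // e.g. "main:3 @ funcA:1 @ funcLeaf"
  uint64_t TotalSamples; // stand-in for a full FunctionSamples
  bool Inlined;          // set once the context has been inlined
};

// The leaf function is the last frame of the context string.
std::string_view leafOf(std::string_view Context) {
  std::size_t Pos = Context.rfind(" @ ");
  return Pos == std::string_view::npos ? Context : Context.substr(Pos + 3);
}

// Tracking: record that the function was inlined for this exact context.
void markInlined(std::vector<ContextProfile> &Profiles,
                 std::string_view Context) {
  for (ContextProfile &P : Profiles)
    if (std::string_view(P.Context) == Context)
      P.Inlined = true;
}

// Query: merge every outstanding (not inlined) context profile of Func.
uint64_t baseSamplesFor(const std::vector<ContextProfile> &Profiles,
                        std::string_view Func) {
  uint64_t Total = 0;
  for (const ContextProfile &P : Profiles)
    if (!P.Inlined && leafOf(P.Context) == Func)
      Total += P.TotalSamples;
  return Total;
}
```

With contexts `main:3 @ funcA` (10 samples) and `funcB:1 @ funcA` (5 samples), `baseSamplesFor(Profiles, "funcA")` gives 15. After `markInlined(Profiles, "main:3 @ funcA")` it gives 5, because the inlined context no longer contributes to the base profile.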

Implementation

Implementation-wise, SampleContext is created as abstraction for context - currently it uses string of call path as internal representation. Each SampleContext also has a ContextState indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each FunctionSamples now has a SampleContext that tells whether it's base profile or context profile, and for context profile what is the context and state.

On top of the above context representation, a custom trie is implemented to track and manage context profiles. Specifically, SampleContextTracker encapsulates a trie with ContextTrieNode as its node type. Each node of the trie represents a frame in a calling context, thus the path from the root to a node represents a valid calling context. We also track FunctionSamples for each node, so this trie can serve efficient queries for context profiles. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of the entire tree, and merging nodes of the subtree if this move encounters existing nodes.
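
The promotion step described above (move a subtree under the root, merging where nodes already exist) can be sketched with a toy trie. This is a simplified illustration under stated assumptions, not the code in SampleContextTracker.cpp: `TrieNode`, `FrameKey`, `getOrCreatePath`, `mergeSubtree`, and `promoteToTopLevel` are hypothetical names, children are keyed by callee name plus call-site line, and a single counter stands in for FunctionSamples.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One frame of a calling context: callee name plus the call-site line in its
// caller. Top-level nodes (directly under the root) use line 0.
using FrameKey = std::pair<std::string, uint32_t>;

struct TrieNode {
  uint64_t Samples = 0; // stand-in for the FunctionSamples kept per node
  std::map<FrameKey, std::unique_ptr<TrieNode>> Children;
};

// Walk the path from the root, creating missing nodes along the way.
TrieNode &getOrCreatePath(TrieNode &Root, const std::vector<FrameKey> &Path) {
  TrieNode *Node = &Root;
  for (const FrameKey &Key : Path) {
    std::unique_ptr<TrieNode> &Child = Node->Children[Key];
    if (!Child)
      Child = std::make_unique<TrieNode>();
    Node = Child.get();
  }
  return *Node;
}

// Merge Src into Dst: add counts, then merge or adopt each child subtree.
void mergeSubtree(TrieNode &Dst, std::unique_ptr<TrieNode> Src) {
  Dst.Samples += Src->Samples;
  for (auto &Entry : Src->Children) {
    std::unique_ptr<TrieNode> &Slot = Dst.Children[Entry.first];
    if (Slot)
      mergeSubtree(*Slot, std::move(Entry.second));
    else
      Slot = std::move(Entry.second);
  }
}

// Promote the child of Parent identified by Key to be a top-level node.
void promoteToTopLevel(TrieNode &Root, TrieNode &Parent, const FrameKey &Key) {
  if (&Parent == &Root)
    return; // already top-level
  auto It = Parent.Children.find(Key);
  if (It == Parent.Children.end())
    return; // nothing to promote
  std::unique_ptr<TrieNode> Subtree = std::move(It->second);
  Parent.Children.erase(It);
  std::unique_ptr<TrieNode> &Slot = Root.Children[FrameKey(Key.first, 0)];
  if (Slot)
    mergeSubtree(*Slot, std::move(Subtree));
  else
    Slot = std::move(Subtree);
}
```

In this sketch the subtree is moved rather than copied, which matches the top-down assumption: once a call site is known not to be inlined, its context no longer needs to live under the caller.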

Integration

SampleContextTracker is now also integrated with AutoFDO, SampleProfileReader and SampleProfileLoader. When we detect that the input profile contains context-sensitive profile, SampleContextTracker will be used to track profiles, and all profile queries will go to SampleContextTracker instead of SampleProfileReader automatically. Tracking APIs are called automatically for each inline decision from SampleProfileLoader.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
2,530 ms	x64 debian > libarcher.races::lock-unrelated.c

Event Timeline

wenlei created this revision. Oct 25 2020, 1:02 PM

Herald added a project: Restricted Project. Oct 25 2020, 1:02 PM

Herald added subscribers: llvm-commits, modimo, lxfind and 2 others.

wenlei requested review of this revision. Oct 25 2020, 1:02 PM

Herald added a subscriber: ormris. Oct 25 2020, 1:02 PM

wenlei edited the summary of this revision. Oct 25 2020, 1:07 PM

wenlei added reviewers: wmi, hoy, davidxl.

wenlei added a subscriber: wlei.

format, separate change in ControlHeightReduction, remove remaining internal markers.

Harbormaster completed remote builds in B76333: Diff 300553. Oct 25 2020, 1:41 PM

Harbormaster completed remote builds in B76334: Diff 300554. Oct 25 2020, 2:02 PM

davidxl added inline comments. Oct 29 2020, 2:33 PM

llvm/include/llvm/ProfileData/SampleProf.h
347	Since these states are not mutually exclusive, perhaps name it ContextStateMask?
364	document this method.
377	Document the context string format here.
409	what is the input format? document it here.
527	CSFDO ==> CSSPGO
llvm/lib/ProfileData/SampleProfReader.cpp
225–233	so FName is also the context String?

davidxl added inline comments. Oct 29 2020, 2:33 PM

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
95	Document the public APIs.

Thanks for the very helpful description!

llvm/include/llvm/ProfileData/SampleProf.h
527–529	It means CSSPGO will treat all the new lines as cold, even if some of them may be inferred from other parts of the profile. How much extra size is needed if zero is emitted?
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
247	I don't understand what the top level means here. Better document it. Do we cache the base profile somewhere or do we merge it every time?
369	Do we need to call getCanonicalFnName here to make the name in inline stack canonical so we can match the name in inline stack with the name in context?
llvm/lib/Transforms/IPO/SampleProfile.cpp
1907–1911	Does this mean no profile loading at all, or just no CS profile? The ThinLTO thinlink phase needs to know which functions are hot so it can import them, so profile information is needed in ThinLTO prelink.
llvm/test/Transforms/SampleProfile/Inputs/profile-context-tracker.prof
28	Here "main" doesn't show up in the context. Is it a problem of unwinding or debug info?
llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll
2–4	Is there anything which cannot be tested in profile-context-tracker.ll? The debug message is usually used as last resort if something cannot be fully tested by just checking IR.

Another question. Have you ever evaluated the performance by comparing this patch "CSSPGO profile working with existing AFDO pipeline" with the default SPGO? I understand CSSPGO's benefit has not been used without the change of CGSCC inliner. The intention of the comparison is to understand how well existing AFDO pipeline work with CSSPGO profile. It may expose problem in existing SPGO profile or CSSPGO.

address feedback from Wei and David

Harbormaster completed remote builds in B77269: Diff 302294. Nov 2 2020, 9:24 AM

Thanks for the quick review! I've updated the patch, also see replies inline.

llvm/include/llvm/ProfileData/SampleProf.h
347	Renamed.
364	Done.
377	Added header comment to `SampleContext`.
409	Added comment.
527–529	Knowing that CS profile will be much bigger, we started with trimming zero counts trying to save size as much as we can. But I don't actually have the data at hand. Let me see if I can get some data on this. New lines will be less of a problem for pseudo-probe if they don't change CFG.
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
95	Done.
llvm/lib/ProfileData/SampleProfReader.cpp
225–233	Yes, `SampleContext` can take both full context string (wrapped with `[]`) as well as context-less function names, and it will set internal state accordingly. I've updated header comment for `SampleContext` with details.
	// Example of full context string (note the wrapping `[]`):
	//    `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`
	// Example of context-less function name (same as AutoFDO):
	//    `_Z8funcLeafi`
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
247	Top-level means it's under root node directly - path from root to node is empty, hence no context. I added comment here as well as in the header comment of `SampleContextTracker`. If `getBaseSamplesFor` is called for the same function again, we'll retrieve the existing top-level node from last call (it must exist, with context profile all merged into it already), then iterate over `FuncToCtxtProfileSet[Name]` but won't do anything since all context profiles have been merged (check on line 265). So it's somewhat like caching. Currently `getBaseSamplesFor` is only called once for each function from sample profile loader.
369	Good catch, I think we need to canonicalize `CalleeName` which is the leaf. (The names of middle inline frames should be fine as they're from debug metadata which are not modified when suffixes are appended for symbol promotion, etc.) I guess we need to add `getCanonicalFnName` for `SampleProfileLoader::findCalleeFunctionSamples`. IIUC, we need it there for today's FDO too?
llvm/lib/Transforms/IPO/SampleProfile.cpp
1907–1911	Oops, we don't have this change now, I forgot to remove it when upstreaming. And you're right, we need to load profile for ThinLTO so thinlink importing can be profile guided. Thanks for catching this.
llvm/test/Transforms/SampleProfile/Inputs/profile-context-tracker.prof
28	This is an artificial context to simulate the case where funcB is also called from external functions to current module (compile time profile loader's case), and we merge context involving external caller correctly. Real profile for this case doesn't have problem in capturing the correct context.
llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll
2–4	I added this hoping to make it easier to reason about the internal operations/state of the context tracker, and also to capture any unintended subtle change in context tracking. But if we look at the end result of IR, the non-debug test should be able to cover it just as well. I can remove this one if you think that's better.
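
The distinction made in the reply on SampleProfReader.cpp lines 225–233 above (a bracketed full context string versus a context-less function name) can be sketched as follows. This is a hedged illustration, not the `SampleContext` code from the patch: `ParsedName` and `parseProfileName` are hypothetical names, and `std::string_view` is used as a stand-in for `llvm::StringRef` so the snippet compiles without LLVM headers.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical summary of what a profile header name tells us.
struct ParsedName {
  std::string_view Context; // full context without brackets; empty if none
  std::string_view Leaf;    // leaf function name
  bool HasContext;          // true when the name was wrapped in `[]`
};

// Classify a header name. Both views alias the input; no string is copied.
ParsedName parseProfileName(std::string_view Name) {
  if (Name.size() >= 2 && Name.front() == '[' && Name.back() == ']') {
    // Strip the wrapping brackets, then take the last frame as the leaf.
    std::string_view Ctx = Name.substr(1, Name.size() - 2);
    std::size_t Pos = Ctx.rfind(" @ ");
    std::string_view Leaf =
        Pos == std::string_view::npos ? Ctx : Ctx.substr(Pos + 3);
    return {Ctx, Leaf, true};
  }
  // Context-less name, same form as AutoFDO.
  return {std::string_view(), Name, false};
}
```

For `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]` the sketch reports a context with leaf `_Z8funcLeafi`; for plain `_Z8funcLeafi` it reports no context.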

Fix typo and linter, add getCanonicalFnName.

In D90125#2365948, @wmi wrote:

Another question. Have you ever evaluated the performance by comparing this patch "CSSPGO profile working with existing AFDO pipeline" with the default SPGO? I understand CSSPGO's benefit has not been used without the change of CGSCC inliner. The intention of the comparison is to understand how well existing AFDO pipeline work with CSSPGO profile. It may expose problem in existing SPGO profile or CSSPGO.

Yeah, I've tried that initially on SPEC. It showed some perf win, however the problem with inliner (actually the sample loader inliner, not the CGSCC inliner) is very visible on a few cases. It's mostly because today's sample loader inliner is a replay inliner, so it won't be more aggressive than CGSCC inline from previous build, hence simple hotness heuristic works. But with CSSPGO profile, it's unbounded on hot path (as long as it's hot, there's no check on inlinee's size or inline cost) and can lead to size bloat and perf regression in some cases.

I have an upcoming change to make sample loader inliner a priority based inliner with size and cost checks. And it should work for today's AFDO as well - if we treat all call sites to inline with equal priority, and set inline limit to infinite, it should be a no-op change for AFDO, but perhaps it can be tuned to benefit AFDO later too.

Harbormaster completed remote builds in B77282: Diff 302327. Nov 2 2020, 10:48 AM
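
As a rough illustration of what "priority based with size checks" could mean, here is a minimal sketch. It is an assumption-laden toy, not the upcoming change referred to above: `InlineCandidate`, `ColderThan`, and `selectInlines` are hypothetical names, priority is simply the call-site sample count, and the cost model is a single size budget.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical inline candidate: a call site with profile hotness and a size
// estimate for the callee body.
struct InlineCandidate {
  std::string Callee;
  uint64_t CallsiteSamples; // hotness taken from the (context) profile
  uint64_t Size;            // estimated growth if inlined
};

// Orders the priority queue so the hottest candidate is on top.
struct ColderThan {
  bool operator()(const InlineCandidate &A, const InlineCandidate &B) const {
    return A.CallsiteSamples < B.CallsiteSamples;
  }
};

// Pick candidates hottest-first while they still fit in the size budget.
std::vector<std::string>
selectInlines(const std::vector<InlineCandidate> &Candidates,
              uint64_t SizeLimit) {
  std::priority_queue<InlineCandidate, std::vector<InlineCandidate>,
                      ColderThan>
      Queue(ColderThan(), Candidates);
  std::vector<std::string> Inlined;
  uint64_t Used = 0;
  while (!Queue.empty()) {
    InlineCandidate C = Queue.top();
    Queue.pop();
    if (C.Size > SizeLimit - Used)
      continue; // would exceed the budget; skip this candidate
    Used += C.Size;
    Inlined.push_back(C.Callee);
  }
  return Inlined;
}
```

With an effectively infinite `SizeLimit` every candidate is accepted, which corresponds to the unbounded hotness-only behavior described above; a finite limit is what bounds size bloat on hot paths.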

In D90125#2368991, @wenlei wrote:

In D90125#2365948, @wmi wrote:

Another question. Have you ever evaluated the performance by comparing this patch "CSSPGO profile working with existing AFDO pipeline" with the default SPGO? I understand CSSPGO's benefit has not been used without the change of CGSCC inliner. The intention of the comparison is to understand how well existing AFDO pipeline work with CSSPGO profile. It may expose problem in existing SPGO profile or CSSPGO.

Yeah, I've tried that initially on SPEC. It showed some perf win, however the problem with inliner (actually the sample loader inliner, not the CGSCC inliner) is very visible on a few cases. It's mostly because today's sample loader inliner is a replay inliner, so it won't be more aggressive than CGSCC inline from previous build, hence simple hotness heuristic works. But with CSSPGO profile, it's unbounded on hot path (as long as it's hot, there's no check on inlinee's size or inline cost) and can lead to size bloat and perf regression in some cases.

I see. Thanks. CSSPGO profile is currently oblivious to inline/outline, unlike current SPGO profile which has the concept of inline/outline instance. So CSSPGO profile cannot replace SPGO to drive the current early inliner (oblivious to inline size, used mainly for maximize profile matching).

I have an upcoming change to make sample loader inliner a priority based inliner with size and cost checks. And it should work for today's AFDO as well - if we treat all call sites to inline with equal priority, and set inline limit to infinite, it should be a no-op change for AFDO, but perhaps it can be tuned to benefit AFDO later too.

If CSSPGO has the priority based inliner, does it mean it will do most inliner work and CGSCC inliner will mostly be used as an iterative clean up pass?

If CSSPGO has the priority based inliner, does it mean it will do most inliner work and CGSCC inliner will mostly be used as an iterative clean up pass?

Yeah, we want early inliner to take over more inlining, for the full benefit of top-down specialization as well as accurate post-inline profile. CSSPGO + priority-based TD inline now leads to more inlining shifted from CGSCC inline to early inline, but currently most of the inlining is still done by CGSCC inline (by number of inline sites). This is subject to tuning and something we're looking into. We want to make sure at least all hot inlining is covered by the early TD inliner.

wmi added inline comments. Nov 3 2020, 10:13 AM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369	In the context, there are also levels contributed by stack unwinding. Those frames should have the same names as elf symbols. To be consistent, do we want to apply getCanonicalFnName for all the context levels?
	> I guess we need to add getCanonicalFnName for SampleProfileLoader::findCalleeFunctionSamples. IIUC, we need it there for today's FDO too?
	Agree. Today, it may not need it because most suffixes are appended after inline so like you said the names of the inline frames from debug metadata don't contain the suffixes. But there are now suffixes being added before inline (https://reviews.llvm.org/D89617) and there may be others in the future. It is good to always apply the function.
llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll
2–4	If there could be unintended subtle change which cannot be caught by the non-debug test, we can keep it. Just make sure the debug messages used in CHECK are all necessary in terms of ensuring the result we are expecting to see.

wenlei added inline comments. Nov 4 2020, 11:47 PM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369	> In the context, there are also levels contributed by stack unwinding. Those frames should have the same names as elf symbols. To be consistent, do we want to apply getCanonicalFnName for all the context levels?
	Good point. In this case, I think it's better to canonicalize all names during profile generation though. IIRC AutoFDO gets names from dwarf hence it does not have the suffixes (as if it's canonicalized). So doing canonicalization during profile generation would make it consistent with AutoFDO.

davidxl added inline comments. Nov 10 2020, 12:56 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	The syntax of the context string can probably be made more consistent like:
	SampleContext:  LeafContext
	             :  ParentContext LeafContext
	LeafContext:    function_name
	ParentContext:  [ParentFrames]
	ParentFrames:   OneParentFrame
	            :   ParentFrames OneParentFrame
	OneParentFrame: function_name:line @
	So in your example, the full context string should look like:
	[main:3 @_Z5funcAi:1 @] _Z8funcLeafi

wenlei added inline comments. Nov 10 2020, 1:10 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	Agreed, that's indeed more consistent and I like that better. Thanks for the suggestion. Will make the change (llvm-profgen change will follow too).

wenlei added inline comments. Nov 10 2020, 6:17 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	Actually, there are other implications if we were to change context string to be `[context] leaf`. With the current syntax, when we promote context, new context string is a substring of the original one. So we just create StringRef wrapper for context promotion without creating new strings. E.g. when `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi` is promoted, it becomes `_Z5funcAi:1 @ _Z8funcLeafi` which is a substring of the original one (StringRef that reuses the underlying string). If we use `[main:3 @_Z5funcAi:1 @] _Z8funcLeafi`, promotion will lead to something like `[_Z5funcAi:1 @] _Z8funcLeafi`, which is no longer a substring and a new string needs to be created. We use two StringRef to represent context part and leaf part today, but we also need a consistent string representation for the full context `getNameWithContext`. So practically, current syntax is more efficient for context promotion. Additionally, with the proposed syntax, top level context would look like this `[] main` to differentiate from context-less header. In this case, `[main]` is probably better. What do you think?
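
The substring argument above can be made concrete with a short sketch. It assumes `std::string_view` as a stand-in for `llvm::StringRef`, and `promoteContext` is a hypothetical helper, not a function from the patch. With the root-to-leaf ordering, dropping the outermost caller frame yields a view into the same underlying buffer.

```cpp
#include <cstddef>
#include <string_view>

// Drop the outermost caller frame of a root-to-leaf context string.
// The result aliases the input's storage; no new string is created.
std::string_view promoteContext(std::string_view Context) {
  constexpr std::string_view Separator = " @ ";
  std::size_t Pos = Context.find(Separator);
  if (Pos == std::string_view::npos)
    return Context; // single frame: already a top-level context
  return Context.substr(Pos + Separator.size());
}
```

Promoting `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi` this way gives `_Z5funcAi:1 @ _Z8funcLeafi`, and the returned view ends at the same address as the original, which is the property the `[context] leaf` form would lose.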

davidxl added inline comments. Nov 10 2020, 6:57 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	> Actually, there are other implications if we were to change context string to be `[context] leaf`. With the current syntax, when we promote context, new context string is a substring of the original one. So we just create StringRef wrapper for context promotion without creating new strings. E.g. when `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi` is promoted, it becomes `_Z5funcAi:1 @ _Z8funcLeafi` which is a substring of the original one (StringRef that reuses the underlying string).
	In this case, where does the bracket go?
	> If we use `[main:3 @_Z5funcAi:1 @] _Z8funcLeafi`, promotion will lead to something like `[_Z5funcAi:1 @] _Z8funcLeafi`, which is no longer a substring and a new string needs to be created. We use two StringRef to represent context part and leaf part today, but we also need a consistent string representation for the full context `getNameWithContext`. So practically, current syntax is more efficient for context promotion. Additionally, with the proposed syntax, top level context would look like this `[] main` to differentiate from context-less header. In this case, `[main]` is probably better.
	What is the difference between top level context vs context less header?
	> What do you think?

wenlei added inline comments. Nov 11 2020, 9:08 AM

llvm/include/llvm/ProfileData/SampleProf.h
361	> In this case, where does the bracket go?
	Internally, we don't use the bracket, and since they're the first and last character, we just create a StringRef with bracket removed for `SampleContext`, without creating new string. So in profile file, we have `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`, while in `SampleContext` used by LLVM and tools, we have `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi`. The form used by `SampleContext` is easier for context promotion, and the conversion from `[]` form to `SampleContext` is just a StringRef wrapper.
	> what is the difference between top level context vs context less header?
	They have different meanings, consider a dso, context-less profile `foo` means we just don't know the calling context, while context profile `[foo]` means this is called directly from external function. In practice, we don't have context-less profile and context profile in a single profile file now, so it's also about consistency in context profile and enough differentiation between context profile and context-less profile (i.e. reserve the form `main` only for context-less profile for today's AFDO).

davidxl added inline comments. Nov 11 2020, 12:17 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	>> In this case, where does the bracket go?
	> Internally, we don't use the bracket, and since they're the first and last character, we just create a StringRef with bracket removed for `SampleContext`, without creating new string. So in profile file, we have `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`, while in `SampleContext` used by LLVM and tools, we have `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi`. The form used by `SampleContext` is easier for context promotion, and the conversion from `[]` form to `SampleContext` is just a StringRef wrapper.
	That is what I was thinking -- for external format, we need to make it well defined and as consistent as possible, while for internal representation, any format is fine. Here is the question: is there a need to share external and internal rep? Once the external strings are parsed they can be discarded. For internal format, is there a more compact form? Is there a need to use string?
	>> what is the difference between top level context vs context less header?
	> They have different meanings, consider a dso, context-less profile `foo` means we just don't know the calling context, while context profile `[foo]` means this is called directly from external function. In practice, we don't have context-less profile and context profile in a single profile file now, so it's also about consistency in context profile and enough differentiation between context profile and context-less profile (i.e. reserve the form `main` only for context-less profile for today's AFDO).

wenlei added inline comments. Nov 11 2020, 12:58 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	> for internal format, is there more compact form ? Is there a need to use string ?
	This is something we thought about too. We were thinking about something along the lines of a rolling hash (integer encoding that is friendly to context promotion operation) to eliminate StringRef. But actually with current implementation, it's quite compact already because it's always StringRef and we never need to create any new string (the order of context syntax, root to leaf, was also intentional to make promoted context substring of original context). So we're happy with it.
	> is there a need to share external and internal rep? Once the external strings are parsed they can be discarded
	Sharing between internal and external format isn't must have, but I think it's nice to have if cost is minimal. External strings can be discarded but that would require non-trivial framework changes that affects AFDO too. Currently, for AFDO (and CSSPGO) we keep profile file in a memory buffer, and all external strings are StringRef, which is used throughout SampleProfileReader and hence SampleProfileLoader, assuming the underlying strings are always available. E.g. FunctionSamples::Name is StringRef wrapped around external string from the memory buffer of input profile. CSSPGO uses the same framework, and all of that would need to be changed if we want to discard external strings (essentially freeing the memory buffer after loading profile). This (freeing the memory buffer after loading profile for both AFDO and CSSPGO) feels like more of an optimization rather than part of the CSSPGO framework, and if we do that optimization later, we could change internal representation accordingly. What do you think?

davidxl added inline comments. Nov 11 2020, 2:15 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	>> for internal format, is there more compact form ? Is there a need to use string ?
	> This is something we thought about too. We were thinking about something along the lines of a rolling hash (integer encoding that is friendly to context promotion operation) to eliminate StringRef. But actually with current implementation, it's quite compact already because it's always StringRef and we never need to create any new string (the order of context syntax, root to leaf, was also intentional to make promoted context substring of original context). So we're happy with it.
	A related question is how large is memory consumption when the raw string is used throughout ? By changing the internal format, can it bring down memory usage further for a large profile?
	>> is there a need to share external and internal rep? Once the external strings are parsed they can be discarded
	> Sharing between internal and external format isn't must have, but I think it's nice to have if cost is minimal. External strings can be discarded but that would require non-trivial framework changes that affects AFDO too. Currently, for AFDO (and CSSPGO) we keep profile file in a memory buffer, and all external strings are StringRef, which is used throughout SampleProfileReader and hence SampleProfileLoader, assuming the underlying strings are always available. E.g. FunctionSamples::Name is StringRef wrapped around external string from the memory buffer of input profile. CSSPGO uses the same framework, and all of that would need to be changed if we want to discard external strings (essentially freeing the memory buffer after loading profile). This (freeing the memory buffer after loading profile for both AFDO and CSSPGO) feels like more of an optimization rather than part of the CSSPGO framework, and if we do that optimization later, we could change internal representation accordingly. What do you think?
	Wei can probably chime in -- he has plans to reduce cross binary FDO string table size.
	If we tie our implementation too much on string sharing, it makes it less flexible to change in the future. If we can decouple external/internal representation, it allows us to get it right for the external format from the beginning.

wenlei added inline comments. Nov 11 2020, 3:30 PM

llvm/include/llvm/ProfileData/SampleProf.h
361	> A related question is how large is memory consumption when the raw string is used throughout ? By changing the internal format, can it bring down memory usage further for a large profile?
	The extra memory consumption now is just fixed 12 bytes (StringRef) for each context, not a string for each context. There's of course memory consumption for the memory buffer holding input profile (and that is the size of input profile).
	> If we tie our implementation too much on string sharing, it makes it less flexible to change in the future. If we can decouple external/internal representation, it allows us to get it right for the external format from the beginning.
	I see your point. But I think since this is only about internal representation, we always have the flexibility in changing it in the future (no compatibility issue, the coupling isn't invasive neither is the change). With current framework, we thought string representation with StringRef fits nicely. If in the future, we want to free memory buffer after profile loading, I'd be happy to take care of this part. What we're doing here is consistent with AFDO today which also shares strings from memory buffer for internal FunctionSamples. In that sense, I don't think this needs to be treated differently from AFDO. It's also not easy to decouple cleanly now with AFDO and common things like FunctionSample still using strings from memory buffer.

Given Wei's feedback, I am ok with the current design now, including the external sample format.

wenlei added inline comments. Nov 19 2020, 2:59 PM

llvm/include/llvm/ProfileData/SampleProf.h
527–529	I took a look at the current profile generation tool. It requires some extra work to fill in zeros for unsampled lines for CSSPGO. I collected AutoFDO for mysql w/ and w/o zeros filled, here's the size difference. CSSPGO is likely to see similar or bigger relative size difference (profile for context can be more sparse).
	w/ zero filled for unsampled lines: 3.9M
	w/o zero filled for unsampled lines: 1.4M

Thanks for reviewing and for the feedback. I think I've addressed all of the current comments. Please take a look, thanks!

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
369	Name canonicalization is now done in llvm-profgen (https://reviews.llvm.org/D89723).
llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll
2–4	Ok, I'll keep it then. We want to make sure context tracker is doing exactly what it has to do (and checking on inlining alone may not be strong enough).

wmi added inline comments. Nov 30 2020, 11:25 PM

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
71	Rename it to ChildContexts or AllChildContext?
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
62–66	Make param AllowCreate default to true so we don't need this wrapper?
72	Add an assert message.
270	Add an assertion message.
284	Add an assertion message.
358	Add an assertion message.
374	A question, the context in SampleContextTracker includes not only inline stack but also call stack. S vector below only contains the inline stack at the DIL location. How can it match with the full stack starting from RootContext?
438	Add an assertion message.
447	Add an assertion message.
454	Add assertion message.
463	Use OldCallSiteLoc instead?
474–491	It will be slightly easier to read if the block can be extracted to a function.

wenlei marked 11 inline comments as done. Dec 1 2020, 11:34 PM

wenlei added inline comments.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
62–66	Good catch, changed.
374	When we decide not to inline a call site, the context profile will be promoted to root, so what remains in the context tracker should reflect the accurate remainder context profile. E.g. if we start with A->(call) B->(inline) C in context tracker, and at some point we're looking at B->C from DIL, there are two scenarios:
	- If A inlined B, we wouldn't be able to match B->C from DIL to anything in context tracker. But this is intentional and desired, because the remainder/base profile for B, or the context profile B->C, shouldn't have anything if A->B inline happened.
	- If A did not inline B, B->C should be moved/promoted from child of A to be under root. Then we would be able to match B->C from DIL to B->C (under root) in context tracker.

Address Wei's feedback.

Harbormaster completed remote builds in B80778: Diff 308881.Dec 2 2020, 12:30 AM

rebase

Harbormaster completed remote builds in B80823: Diff 308980.Dec 2 2020, 9:06 AM

wmi added inline comments.Dec 2 2020, 12:34 PM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
374	I see, thanks. After the compiler decides it won't inline at some callsite, the profile for the callsite will be promoted and some context information will be lost. This seems to assume that inlining happens in top-down order and happens only once. I remember the CSSPGO profile will be used to drive the CGSCC inliner in the future. The CGSCC inliner will need to do the inlining iteratively, so how is it supposed to work with profile promotion?

wenlei added inline comments. Dec 2 2020, 1:38 PM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
374	You're right that it currently assumes top-down order; that is the best way to leverage context-sensitive profile. If we try to use CSSPGO profile to drive SCC inline, bottom-up order and iterative nature are the two key differences.

Bottom-up inlining means we can't promote context profiles by moving them under root; instead, we need to copy (and merge) the context profile into the base profile under root. For the same example A->B->C, with SCC inline we could end up processing B before A. When processing B, we promote the not-inlined context profile of B to be under root (B->C), and merge it into a base profile of B. However, we still need to keep the original context profile tree (A->B->C), so that later, when we process A, we still see B and C under A. Actually, the promotion happens when we try to access a function's base profile (getBaseSamplesFor calls promoteMergeContextSamplesTree for each not-inlined context profile), so the difference between top-down and bottom-up inline is more about accuracy: with bottom-up inline, when getting the base profile for B, we'd assume none of B's call sites is inlined even if A later inlines B.

For iterative inlining, we can call getBaseSamplesFor every time we process a function again, to redo the promotion and merge based on the up-to-date inline decisions. E.g. if we process B, then A (which inlines B), then B again, the second time we process B we would not merge the B under A into B's current base profile, which makes the profile more accurate than in the first pass over B. (But it's still not as good as top-down inline, because even if we can unmerge a context profile, we can't undo inlining.)
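The copy-and-merge behaviour for bottom-up, iterative inlining can be illustrated as follows. This is a rough sketch under stated assumptions, not what getBaseSamplesFor actually does: `ContextProfile` and `baseSamplesFor` are hypothetical, a profile is reduced to a single sample count, and the base profile is recomputed from the untouched context profiles on every query.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flattened view of one context profile.
struct ContextProfile {
  std::vector<std::string> Context; // root-to-leaf, e.g. {"A", "B"}
  uint64_t Samples = 0;
  bool Inlined = false; // true once the leaf was inlined in this context
};

// Base profile of Func: the sum over every context profile whose leaf is
// Func and which has not been inlined. The input is only read, never
// modified, so the full context tree stays available for later callers.
uint64_t baseSamplesFor(const std::vector<ContextProfile> &Profiles,
                        const std::string &Func) {
  uint64_t Total = 0;
  for (const ContextProfile &P : Profiles) {
    if (P.Context.empty() || P.Context.back() != Func)
      continue; // profile belongs to a different function
    if (P.Inlined)
      continue; // already accounted for inside the caller
    Total += P.Samples;
  }
  return Total;
}
```

Querying B before A is processed merges the A->B context into B's base profile; once the A->B profile is marked as inlined, querying B again leaves it out, which mirrors the "process B, then A, then B again" example.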

wmi added inline comments. Dec 2 2020, 3:09 PM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
374	Thanks for the detailed explanation. That makes sense to me. Talking about profile promotion and merging: if a function is still hot after the inlining of all its callsites has been decided, and it still has different profiles under different contexts, it may be interesting to clone the function so we can still apply the context-sensitive profile in groups. It would be interesting to have some support for comparing profiles under different contexts and splitting them into groups. I feel the full context-sensitive profile opens up new opportunities that we can explore by maximizing its usage in the future.

LGTM.

This revision is now accepted and ready to land. Dec 2 2020, 3:09 PM

wenlei added inline comments. Dec 2 2020, 6:53 PM

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
374	Yeah, that's a good point. We're also thinking about cloning, as it's an area where clang is still behind gcc. I think it will take some time before we fully leverage the new opportunities. I will send another patch for priority-based top-down inlining with CSSPGO; with that, more inlining will be done during the early top-down inline, but it will take more effort to rebalance inlining between the sample loader and CGSCC.

rebase

This revision was landed with ongoing or failed builds. Dec 6 2020, 12:12 PM

Closed by commit rG6b989a171073: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining (authored by wenlei).

This revision was automatically updated to reflect the committed changes.

wenlei added a commit: rG6b989a171073: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

Harbormaster completed remote builds in B81236: Diff 309785. Dec 6 2020, 12:43 PM

Revision Contents

Path	Size
llvm/include/llvm/ProfileData/SampleProf.h	146 lines
llvm/include/llvm/ProfileData/SampleProfReader.h	5 lines
llvm/include/llvm/Transforms/IPO/SampleContextTracker.h	141 lines
llvm/lib/ProfileData/SampleProf.cpp	1 line
llvm/lib/ProfileData/SampleProfReader.cpp	21 lines
llvm/lib/Transforms/IPO/CMakeLists.txt	1 line
llvm/lib/Transforms/IPO/SampleContextTracker.cpp	512 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp	66 lines
llvm/test/Transforms/SampleProfile/Inputs/profile-context-tracker.prof	36 lines
llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll	234 lines
llvm/test/Transforms/SampleProfile/profile-context-tracker.ll	197 lines

Diff 308980

llvm/include/llvm/ProfileData/SampleProf.h

(236 lines not shown)	struct LineLocation {
void print(raw_ostream &OS) const;		void print(raw_ostream &OS) const;
void dump() const;		void dump() const;

bool operator<(const LineLocation &O) const {		bool operator<(const LineLocation &O) const {
return LineOffset < O.LineOffset ||		return LineOffset < O.LineOffset ||
(LineOffset == O.LineOffset && Discriminator < O.Discriminator);		(LineOffset == O.LineOffset && Discriminator < O.Discriminator);
}		}

		bool operator==(const LineLocation &O) const {
		return LineOffset == O.LineOffset && Discriminator == O.Discriminator;
		}

uint32_t LineOffset;		uint32_t LineOffset;
uint32_t Discriminator;		uint32_t Discriminator;
};		};

raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);		raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);

/// Representation of a single sample record.		/// Representation of a single sample record.
///		///
(81 lines not shown)

private:		private:
uint64_t NumSamples = 0;		uint64_t NumSamples = 0;
CallTargetMap CallTargets;		CallTargetMap CallTargets;
};		};

raw_ostream &operator<<(raw_ostream &OS, const SampleRecord &Sample);		raw_ostream &operator<<(raw_ostream &OS, const SampleRecord &Sample);

		// State of context associated with FunctionSamples
		enum ContextStateMask {
		davidxl: Since these states are not mutually exclusive, perhaps name it ContextStateMask?
		wenlei: Renamed.
		UnknownContext = 0x0, // Profile without context
		RawContext = 0x1, // Full context profile from input profile
		SyntheticContext = 0x2, // Synthetic context created for context promotion
		InlinedContext = 0x4, // Profile for context that is inlined into caller
		MergedContext = 0x8 // Profile for context merged into base profile
		};

		// Sample context for FunctionSamples. It consists of the calling context,
		// the function name and context state. Internally sample context is represented
		// using StringRef, which is also the input for constructing a `SampleContext`.
		// It can accept and represent both full context string as well as context-less
		// function name.
		// Example of full context string (note the wrapping `[]`):
		// `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`
		davidxl: The syntax of the context string can probably be made more consistent like:
		SampleContext : LeafContext
		              : ParentContext LeafContext
		LeafContext: function_name
		ParentContext: [ParentFrames]
		ParentFrames: OneParentFrame
		            : ParentFrames OneParentFrame
		OneParentFrame: function_name:line @
		So in your example, the full context string should look like: [main:3 @_Z5funcAi:1 @] _Z8funcLeafi
		wenlei: Agreed, that's indeed more consistent and I like that better. Thanks for the suggestion. Will make the change (llvm-profgen change will follow too).
		wenlei: Actually, there are other implications if we were to change the context string to be `[context] leaf`. With the current syntax, when we promote a context, the new context string is a substring of the original one, so we just create a StringRef wrapper for context promotion without creating new strings. E.g. when `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi` is promoted, it becomes `_Z5funcAi:1 @ _Z8funcLeafi`, which is a substring of the original one (a StringRef that reuses the underlying string).
		If we use `[main:3 @_Z5funcAi:1 @] _Z8funcLeafi`, promotion will lead to something like `[_Z5funcAi:1 @] _Z8funcLeafi`, which is no longer a substring, so a new string needs to be created. We use two StringRefs to represent the context part and the leaf part today, but we also need a consistent string representation for the full context (`getNameWithContext`). So practically, the current syntax is more efficient for context promotion.
		Additionally, with the proposed syntax, a top-level context would look like `[] main` to differentiate it from a context-less header. In this case, `[main]` is probably better.
		What do you think?
		davidxl:
		> Actually, there're other implication if we were to change context string to be `[context] leaf`. With the currently syntax, when we promote context, new context string is a substring of the original one. So we just create StringRef wrapper for context promotion without creating new strings. E.g. when `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi` is promoted, it becomes `_Z5funcAi:1 @ _Z8funcLeafi` which is sub string of the original one (StringRef that reuses the underlying string).
		In this case, where does the bracket go?
		> If we use `[main:3 @_Z5funcAi:1 @] _Z8funcLeafi`, promotion will lead to something like `[_Z5funcAi:1 @] _Z8funcLeafi`, which is no longer a substring and a new string need to be created. We use two StringRef to represent context part and leaf part today, but we also need a consistent string representation for the full context `getNameWithContext`. So practically, current syntax is more efficient for context promotion.
		> Additionally, with the proposed syntax, top level context would look like this `[] main` to differentiate from context-less header. In this case, `[main]` is probably better.
		What is the difference between a top-level context and a context-less header?
		> What do you think?
		wenlei:
		> In this case, where does the bracket go?
		Internally, we don't use the brackets, and since they're the first and last characters, we just create a StringRef with the brackets removed for `SampleContext`, without creating a new string. So in the profile file we have `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`, while in `SampleContext`, used by LLVM and tools, we have `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi`. The form used by `SampleContext` is easier for context promotion, and the conversion from the `[]` form to `SampleContext` is just a StringRef wrapper.
		> what is the difference between top level context vs context less header?
		They have different meanings. Consider a DSO: the context-less profile `foo` means we just don't know the calling context, while the context profile `[foo]` means it is called directly from an external function. In practice, we don't have context-less profiles and context profiles in a single profile file now, so it's also about consistency within context profiles and enough differentiation between context profiles and context-less profiles (i.e. reserve the form `main` only for the context-less profiles of today's AFDO).
		davidxl:
		>> In this case, where does the bracket go?
		> Internally, we don't use the bracket, and since they're the first and last character, we just create a StringRef with bracket removed for `SampleContext`, without creating new string. So in profile file, we have `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`, while in `SampleContext` used by LLVM and tools, we have `main:3 @ _Z5funcAi:1 @ _Z8funcLeafi`. The form used by `SampleContext` is easier for context promotion, and the conversion from `[]` form to `SampleContext` is just a StringRef wrapper.
		That is what I was thinking -- for the external format, we need to make it well defined and as consistent as possible, while for the internal representation, any format is fine. Here is the question: is there a need to share the external and internal representations? Once the external strings are parsed, they can be discarded in favor of the internal format. Is there a more compact form? Is there a need to use strings?
		>> what is the difference between top level context vs context less header?
		> They have different meanings, consider a dso, context-less profile `foo` means we just don't know the calling context, while context profile `[foo]` means this is called directly from external function. In practice, we don't have context-less profile and context profile in a single profile file now, so it's also about consistency in context profile and enough differentiation between context profile and context-less profile (i.e reserve the form `main` only for context-less profile for today's AFDO).
		wenlei:
		> for internal format, is there more compact form ? Is there a need to use string ?
		This is something we thought about too. We were thinking about something along the lines of a rolling hash (an integer encoding that is friendly to the context promotion operation) to eliminate StringRef. But with the current implementation it's quite compact already, because it's always a StringRef and we never need to create any new string (the order of the context syntax, root to leaf, was also intentional, to make a promoted context a substring of the original context). So we're happy with it.
		> is there a need to share external and internal rep? Once the external strings are parsed they can be discarded
		Sharing between the internal and external formats isn't a must-have, but I think it's nice to have if the cost is minimal. External strings can be discarded, but that would require non-trivial framework changes that affect AFDO too. Currently, for AFDO (and CSSPGO) we keep the profile file in a memory buffer, and all external strings are StringRefs, which are used throughout SampleProfileReader and hence SampleProfileLoader, assuming the underlying strings are always available. E.g. FunctionSamples::Name is a StringRef wrapped around an external string from the memory buffer of the input profile. CSSPGO uses the same framework, and all of that would need to be changed if we want to discard external strings (essentially freeing the memory buffer after loading the profile). This (freeing the memory buffer after loading the profile for both AFDO and CSSPGO) feels more like an optimization than part of the CSSPGO framework, and if we do that optimization later, we could change the internal representation accordingly. What do you think?
		davidxl:
		>> for internal format, is there more compact form ? Is there a need to use string ?
		> This is something we thought about too. We were thinking about something along the lines of a rolling hash (integer encoding that is friendly to context promotion operation) to eliminate StringRef. But actually with current implementation, it's quite compact already because it's always StringRef and we never need to create any new string (the order of context syntax, root to leaf, was also intentional to make promoted context substring of original context). So we're happy with it.
		A related question is how large the memory consumption is when the raw string is used throughout. By changing the internal format, can memory usage be brought down further for a large profile?
		>> is there a need to share external and internal rep? Once the external strings are parsed they can be discarded
		> Sharing between internal and external format isn't must have, but I think it's nice to have if cost is minimal. External strings can be discarded but that would require non-trivial framework changes that affects AFDO too. Currently, for AFDO (and CSSPGO) we keep profile file in a memory buffer, and all external strings are StringRef, which is used throughout SampleProfileReader and hence SampleProfileLoader, assuming the underlying strings are always available. E.g. FunctionSamples::Name is StringRef wrapped around external string from the memory buffer of input profile. CSSPGO uses the same framework, and all of that would need to be changed if we want to discard external strings (essentially freeing the memory buffer after loading profile). This (freeing the memory buffer after loading profile for both AFDO and CSSPGO) feels like more of an optimization rather than part of the CSSPGO framework, and if we do that optimization later, we could change internal representation accordingly. What do you think?
		Wei can probably chime in -- he has plans to reduce the cross-binary FDO string table size. If we tie our implementation too closely to string sharing, it becomes less flexible to change in the future. If we can decouple the external and internal representations, it allows us to get the external format right from the beginning.
		wenlei:
		> A related question is how large is memory consumption when the raw string is used throughout ? By changing the internal format, can it bring down memory usage further for a large profile?
		The extra memory consumption now is just a fixed 12 bytes (StringRef) for each context, not a string for each context. There is of course the memory consumption of the memory buffer holding the input profile (which is the size of the input profile).
		> If we tie our implementation too much on string sharing, it makes it less flexible to change in the future. If we can decouple external/internal representation, it allows us to get it right for the external format from the beginning.
		I see your point. But since this is only about the internal representation, we always have the flexibility to change it in the future (there is no compatibility issue, and neither the coupling nor the change is invasive). With the current framework, we thought a string representation with StringRef fits nicely. If in the future we want to free the memory buffer after profile loading, I'd be happy to take care of this part. What we're doing here is consistent with AFDO today, which also shares strings from the memory buffer for the internal FunctionSamples. In that sense, I don't think this needs to be treated differently from AFDO. It's also not easy to decouple cleanly now, with AFDO and common things like FunctionSamples still using strings from the memory buffer.
		// Example of context-less function name (same as AutoFDO):
		// `_Z8funcLeafi`
		class SampleContext {
		davidxl: document this method.
		wenlei: Done.
		public:
		SampleContext() : State(UnknownContext) {}
		SampleContext(StringRef ContextStr,
		ContextStateMask CState = UnknownContext) {
		setContext(ContextStr, CState);
		}

		// Promote context by removing top frames (represented by `ContextStrToRemove`).
		Lint: Pre-merge checks: clang-format: please reformat the code
		```
		- // Promote context by removing top frames (represented by `ContextStrToRemove`).
		- // Note that with string representation of context, the promotion is effectively
		- // a substr operation with `ContextStrToRemove` removed from left.
		+ // Promote context by removing top frames (represented by
		+ // `ContextStrToRemove`). Note that with string representation of context, the
		+ // promotion is effectively a substr operation with `ContextStrToRemove`
		+ // removed from left.
		```
		// Note that with string representation of context, the promotion is effectively
		// a substr operation with `ContextStrToRemove` removed from left.
		void promoteOnPath(StringRef ContextStrToRemove) {
		assert(FullContext.startswith(ContextStrToRemove));

		davidxl: Document the context string format here.
		wenlei: Added header comment to `SampleContext`.
		// Remove leading context and frame separator " @ ".
		FullContext = FullContext.substr(ContextStrToRemove.size() + 3);
		CallingContext = CallingContext.substr(ContextStrToRemove.size() + 3);
		}

		// Split the top context frame (left-most substr) from context.
		static std::pair<StringRef, StringRef>
		splitContextString(StringRef ContextStr) {
		return ContextStr.split(" @ ");
		}

		// Decode context string for a frame to get function name and location.
		// `ContextStr` is in the form of `FuncName:StartLine.Discriminator`.
		static void decodeContextString(StringRef ContextStr, StringRef &FName,
		LineLocation &LineLoc) {
		// Get function name
		auto EntrySplit = ContextStr.split(':');
		FName = EntrySplit.first;

		LineLoc = {0, 0};
		if (!EntrySplit.second.empty()) {
		// Get line offset, use signed int for getAsInteger so string will
		// be parsed as signed.
		int LineOffset = 0;
		auto LocSplit = EntrySplit.second.split('.');
		LocSplit.first.getAsInteger(10, LineOffset);
		LineLoc.LineOffset = LineOffset;

		// Get discriminator
		if (!LocSplit.second.empty())
		LocSplit.second.getAsInteger(10, LineLoc.Discriminator);
		}
		davidxl: what is the input format? document it here.
		wenlei: Added comment.
		}

		operator StringRef() const { return FullContext; }
		bool hasState(ContextStateMask S) { return State & (uint32_t)S; }
		void setState(ContextStateMask S) { State |= (uint32_t)S; }
		void clearState(ContextStateMask S) { State &= (uint32_t)~S; }
		bool hasContext() const { return State != UnknownContext; }
		bool isBaseContext() const { return CallingContext.empty(); }
		StringRef getName() const { return Name; }
		StringRef getCallingContext() const { return CallingContext; }
		StringRef getNameWithContext() const { return FullContext; }

		private:
		// Give a context string, decode and populate internal states like
		// Function name, Calling context and context state. Example of input
		// `ContextStr`: `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`
		void setContext(StringRef ContextStr, ContextStateMask CState) {
		assert(!ContextStr.empty());
		// Note that `[]` wrapped input indicates a full context string, otherwise
		// it's treated as context-less function name only.
		bool HasContext = ContextStr.startswith("[");
		if (!HasContext && CState == UnknownContext) {
		State = UnknownContext;
		Name = FullContext = ContextStr;
		} else {
		// Assume raw context profile if unspecified
		if (CState == UnknownContext)
		State = RawContext;
		else
		State = CState;

		// Remove encapsulating '[' and ']' if any
		if (HasContext)
		FullContext = ContextStr.substr(1, ContextStr.size() - 2);
		else
		FullContext = ContextStr;

		// Caller is to the left of callee in context string
		auto NameContext = FullContext.rsplit(" @ ");
		if (NameContext.second.empty()) {
		Name = NameContext.first;
		CallingContext = NameContext.second;
		} else {
		Name = NameContext.second;
		CallingContext = NameContext.first;
		}
		}
		}

		// Full context string including calling context and leaf function name
		StringRef FullContext;
		// Function name for the associated sample profile
		StringRef Name;
		// Calling context (leaf function excluded) for the associated sample profile
		StringRef CallingContext;
		// State of the associated sample profile
		uint32_t State;
		};
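The string handling in `SampleContext` can be tried outside LLVM with `std::string_view`, which, like `StringRef`, is a non-owning view. The sketch below is an approximation under stated assumptions, not the patch's code: `stripBrackets`, `splitLeaf`, `promote`, `decodeFrame` and `Frame` are hypothetical helpers, C++17 is assumed, and line offsets are parsed as unsigned only (the patch parses them as signed).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>

constexpr std::string_view FrameSep = " @ ";

// Drop the wrapping '[' and ']' used in the profile file, if present.
std::string_view stripBrackets(std::string_view S) {
  if (S.size() >= 2 && S.front() == '[' && S.back() == ']')
    return S.substr(1, S.size() - 2);
  return S;
}

// Split into (calling context, leaf function); callers are to the left.
std::pair<std::string_view, std::string_view>
splitLeaf(std::string_view Full) {
  std::size_t Pos = Full.rfind(FrameSep);
  if (Pos == std::string_view::npos)
    return {std::string_view(), Full};
  return {Full.substr(0, Pos), Full.substr(Pos + FrameSep.size())};
}

// Promote by cutting the leading frames `Prefix` and one separator. The
// result is a view into the same buffer, so no new string is created.
std::string_view promote(std::string_view Full, std::string_view Prefix) {
  assert(Full.size() >= Prefix.size() + FrameSep.size() &&
         Full.substr(0, Prefix.size()) == Prefix &&
         Full.substr(Prefix.size(), FrameSep.size()) == FrameSep &&
         "Prefix must be leading frames followed by a separator");
  return Full.substr(Prefix.size() + FrameSep.size());
}

struct Frame {
  std::string_view Func;
  uint32_t Line = 0;
  uint32_t Discriminator = 0;
};

// Parse leading decimal digits; stops at the first non-digit.
uint32_t parseUInt(std::string_view S) {
  uint32_t V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      break;
    V = V * 10 + static_cast<uint32_t>(C - '0');
  }
  return V;
}

// Decode one frame of the form `FuncName:Line.Discriminator`; the location
// and the discriminator are both optional and default to 0.
Frame decodeFrame(std::string_view S) {
  Frame F;
  std::size_t Colon = S.find(':');
  F.Func = S.substr(0, Colon);
  if (Colon == std::string_view::npos)
    return F;
  std::string_view Loc = S.substr(Colon + 1);
  std::size_t Dot = Loc.find('.');
  F.Line = parseUInt(Loc.substr(0, Dot));
  if (Dot != std::string_view::npos)
    F.Discriminator = parseUInt(Loc.substr(Dot + 1));
  return F;
}
```

For `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`, promoting past `main:3` yields `_Z5funcAi:1 @ _Z8funcLeafi` as a view whose data pointer lies inside the original buffer, which is the property the root-to-leaf ordering was chosen for in the discussion above.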

class FunctionSamples;		class FunctionSamples;
class SampleProfileReaderItaniumRemapper;		class SampleProfileReaderItaniumRemapper;

using BodySampleMap = std::map<LineLocation, SampleRecord>;		using BodySampleMap = std::map<LineLocation, SampleRecord>;
// NOTE: Using a StringMap here makes parsed profiles consume around 17% more		// NOTE: Using a StringMap here makes parsed profiles consume around 17% more
// memory, which is very significant for large profiles.		// memory, which is very significant for large profiles.
using FunctionSamplesMap = std::map<std::string, FunctionSamples, std::less<>>;		using FunctionSamplesMap = std::map<std::string, FunctionSamples, std::less<>>;
using CallsiteSampleMap = std::map<LineLocation, FunctionSamplesMap>;		using CallsiteSampleMap = std::map<LineLocation, FunctionSamplesMap>;
(41 lines not shown)	public:
}		}

/// Return the number of samples collected at the given location.		/// Return the number of samples collected at the given location.
/// Each location is specified by \p LineOffset and \p Discriminator.		/// Each location is specified by \p LineOffset and \p Discriminator.
/// If the location is not found in profile, return error.		/// If the location is not found in profile, return error.
ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,		ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,
uint32_t Discriminator) const {		uint32_t Discriminator) const {
const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));		const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
if (ret == BodySamples.end())		if (ret == BodySamples.end()) {
		// For CSSPGO, in order to conserve profile size, we no longer write out
		davidxl: CSFDO ==> CSSPGO
		// locations profile for those not hit during training, so we need to
		// treat them as zero instead of error here.
		wmi: It means CSSPGO will treat all the new lines as cold, even if some of them may be inferred from other parts of the profile. How much extra size is needed if zero is emitted?
		wenlei: Knowing that the CS profile will be much bigger, we started with trimming zero counts, trying to save as much size as we can. But I don't actually have the data at hand. Let me see if I can get some data on this. New lines will be less of a problem for pseudo-probe if they don't change the CFG.
		wenlei: I took a look at the current profile generation tool. It requires some extra work to fill in zeros for unsampled lines for CSSPGO. I collected AutoFDO profiles for mysql with and without zeros filled; here's the size difference. CSSPGO is likely to see a similar or bigger relative size difference (the profile for a context can be more sparse).
		w/ zero filled for unsampled lines: 3.9M
		w/o zero filled for unsampled lines: 1.4M
		if (ProfileIsCS)
		return 0;
return std::error_code();		return std::error_code();
else		} else {
return ret->second.getSamples();		return ret->second.getSamples();
}		}
		}
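The lookup policy in the new `findSamplesAt` can be isolated as below. This is a simplified sketch, not the LLVM function: it uses `std::optional` in place of `ErrorOr`, a plain `std::map` in place of `BodySampleMap`, and passes the CS flag as a parameter instead of reading the static `ProfileIsCS`; the `Location` alias is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// (LineOffset, Discriminator); stands in for LineLocation.
using Location = std::pair<uint32_t, uint32_t>;

// Missing locations are an error (nullopt) for regular profiles, but count
// as zero samples for context-sensitive profiles, which omit unsampled lines.
std::optional<uint64_t> findSamplesAt(const std::map<Location, uint64_t> &Body,
                                      Location Loc, bool ProfileIsCS) {
  auto It = Body.find(Loc);
  if (It != Body.end())
    return It->second;
  if (ProfileIsCS)
    return uint64_t{0};
  return std::nullopt;
}
```

The trade-off raised in the thread is visible here: with the CS flag set, a location that was simply never written out is indistinguishable from one that was measured as cold.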

/// Returns the call target map collected at a given location.		/// Returns the call target map collected at a given location.
/// Each location is specified by \p LineOffset and \p Discriminator.		/// Each location is specified by \p LineOffset and \p Discriminator.
/// If the location is not found in profile, return error.		/// If the location is not found in profile, return error.
ErrorOr<SampleRecord::CallTargetMap>		ErrorOr<SampleRecord::CallTargetMap>
findCallTargetMapAt(uint32_t LineOffset, uint32_t Discriminator) const {		findCallTargetMapAt(uint32_t LineOffset, uint32_t Discriminator) const {
const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));		const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
if (ret == BodySamples.end())		if (ret == BodySamples.end())
(198 lines not shown)	public:
///		///
/// \returns the FunctionSamples pointer to the inlined instance.		/// \returns the FunctionSamples pointer to the inlined instance.
/// If \p Remapper is not nullptr, it will be used to find matching		/// If \p Remapper is not nullptr, it will be used to find matching
/// FunctionSamples with not exactly the same but equivalent name.		/// FunctionSamples with not exactly the same but equivalent name.
const FunctionSamples *findFunctionSamples(		const FunctionSamples *findFunctionSamples(
const DILocation *DIL,		const DILocation *DIL,
SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;		SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;

		static bool ProfileIsCS;

		SampleContext &getContext() const { return Context; }

		void setContext(const SampleContext &FContext) { Context = FContext; }

static SampleProfileFormat Format;		static SampleProfileFormat Format;

/// Whether the profile uses MD5 to represent string.		/// Whether the profile uses MD5 to represent string.
static bool UseMD5;		static bool UseMD5;

/// GUIDToFuncNameMap saves the mapping from GUID to the symbol name, for		/// GUIDToFuncNameMap saves the mapping from GUID to the symbol name, for
/// all the function symbols defined or declared in current module.		/// all the function symbols defined or declared in current module.
DenseMap<uint64_t, StringRef> *GUIDToFuncNameMap = nullptr;		DenseMap<uint64_t, StringRef> *GUIDToFuncNameMap = nullptr;

// Assume the input \p Name is a name coming from FunctionSamples itself.		// Assume the input \p Name is a name coming from FunctionSamples itself.
// If UseMD5 is true, the name is already a GUID and we		// If UseMD5 is true, the name is already a GUID and we
// don't want to return the GUID of GUID.		// don't want to return the GUID of GUID.
static uint64_t getGUID(StringRef Name) {		static uint64_t getGUID(StringRef Name) {
return UseMD5 ? std::stoull(Name.data()) : Function::getGUID(Name);		return UseMD5 ? std::stoull(Name.data()) : Function::getGUID(Name);
}		}

// Find all the names in the current FunctionSamples including names in		// Find all the names in the current FunctionSamples including names in
// all the inline instances and names of call targets.		// all the inline instances and names of call targets.
void findAllNames(DenseSet<StringRef> &NameSet) const;		void findAllNames(DenseSet<StringRef> &NameSet) const;

private:		private:
/// Mangled name of the function.		/// Mangled name of the function.
StringRef Name;		StringRef Name;

		/// Calling context for function profile
		mutable SampleContext Context;

/// Total number of samples collected inside this function.		/// Total number of samples collected inside this function.
///		///
/// Samples are cumulative, they include all the samples collected		/// Samples are cumulative, they include all the samples collected
/// inside this function and all its inlined callees.		/// inside this function and all its inlined callees.
uint64_t TotalSamples = 0;		uint64_t TotalSamples = 0;

/// Total number of samples collected at the head of the function.		/// Total number of samples collected at the head of the function.
/// This is an approximation of the number of calls made to this function		/// This is an approximation of the number of calls made to this function

llvm/include/llvm/ProfileData/SampleProfReader.h

Show First 20 Lines • Show All 413 Lines • ▼ Show 20 Lines	public:
/// Return the profile summary.		/// Return the profile summary.
ProfileSummary &getSummary() const { return *(Summary.get()); }		ProfileSummary &getSummary() const { return *(Summary.get()); }

MemoryBuffer *getBuffer() const { return Buffer.get(); }		MemoryBuffer *getBuffer() const { return Buffer.get(); }

/// \brief Return the profile format.		/// \brief Return the profile format.
SampleProfileFormat getFormat() const { return Format; }		SampleProfileFormat getFormat() const { return Format; }

		/// Whether input profile is fully context-sensitive
		bool profileIsCS() const { return ProfileIsCS; }

virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {		virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {
return nullptr;		return nullptr;
};		};

/// It includes all the names that have samples either in outline instance		/// It includes all the names that have samples either in outline instance
/// or inline instance.		/// or inline instance.
virtual std::vector<StringRef> *getNameTable() { return nullptr; }		virtual std::vector<StringRef> *getNameTable() { return nullptr; }
virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) { return false; };		virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) { return false; };
Show All 26 Lines	takeSummary(SampleProfileReader &Reader) {
return std::move(Reader.Summary);		return std::move(Reader.Summary);
}		}

/// Compute summary for this profile.		/// Compute summary for this profile.
void computeSummary();		void computeSummary();

std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;		std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;

		bool ProfileIsCS = false;

/// \brief The format of sample.		/// \brief The format of sample.
SampleProfileFormat Format = SPF_None;		SampleProfileFormat Format = SPF_None;
};		};

class SampleProfileReaderText : public SampleProfileReader {		class SampleProfileReaderText : public SampleProfileReader {
public:		public:
SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)		SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
: SampleProfileReader(std::move(B), C, SPF_Text) {}		: SampleProfileReader(std::move(B), C, SPF_Text) {}

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h

This file was added.

				//===- Transforms/IPO/SampleContextTracker.h --------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This file provides the interface for context-sensitive profile tracker used
				/// by CSSPGO.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H
				#define LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H

				#include "llvm/ADT/SmallSet.h"
				#include "llvm/ADT/StringMap.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/IR/DebugInfoMetadata.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/ProfileData/SampleProf.h"
				#include <list>
				#include <map>

				using namespace llvm;
				using namespace sampleprof;

				namespace llvm {

				// Internal trie tree representation used for tracking context tree and sample
				// profiles. The path from root node to a given node represents the context of
				// that node's profile.
				class ContextTrieNode {
				public:
				ContextTrieNode(ContextTrieNode *Parent = nullptr,
				StringRef FName = StringRef(),
				FunctionSamples *FSamples = nullptr,
				LineLocation CallLoc = {0, 0})
				: ParentContext(Parent), FuncName(FName), FuncSamples(FSamples),
				CallSiteLoc(CallLoc){};
				ContextTrieNode *getChildContext(const LineLocation &CallSite,
				StringRef CalleeName);
				ContextTrieNode *getChildContext(const LineLocation &CallSite);
				ContextTrieNode *getOrCreateChildContext(const LineLocation &CallSite,
				StringRef CalleeName,
				bool AllowCreate = true);

				ContextTrieNode &moveToChildContext(const LineLocation &CallSite,
				ContextTrieNode &&NodeToMove,
				StringRef ContextStrToRemove,
				bool DeleteNode = true);
				void removeChildContext(const LineLocation &CallSite, StringRef CalleeName);
				std::map<uint32_t, ContextTrieNode> &getAllChildContext();
				const StringRef getFuncName() const;
				FunctionSamples *getFunctionSamples() const;
				void setFunctionSamples(FunctionSamples *FSamples);
				LineLocation getCallSiteLoc() const;
				ContextTrieNode *getParentContext() const;
				void setParentContext(ContextTrieNode *Parent);
				void dump();

				private:
				static uint32_t nodeHash(StringRef ChildName, const LineLocation &Callsite);

				// Map line+discriminator location to child context
				std::map<uint32_t, ContextTrieNode> AllChildContext;

				// Link to parent context node
				ContextTrieNode *ParentContext;
				wmi: Rename it to ChildContexts or AllChildContext?

				// Function name for current context
				StringRef FuncName;

				// Function Samples for current context
				FunctionSamples *FuncSamples;

				// Callsite location in parent context
				LineLocation CallSiteLoc;
				};

				// Profile tracker that manages profiles and its associated context. It
				// provides interfaces used by sample profile loader to query context profile or
				// base profile for given function or location; it also manages context tree
				// manipulation that is needed to accommodate inline decisions so we have
				// accurate post-inline profile for functions. Internally context profiles
				// are organized in a trie, with each node representing profile for specific
				// calling context and the context is identified by path from root to the node.
				class SampleContextTracker {
				public:
				SampleContextTracker(StringMap<FunctionSamples> &Profiles);
				// Query context profile for a specific callee with given name at a given
				// call-site. The full context is identified by location of call instruction.
				FunctionSamples *getCalleeContextSamplesFor(const CallBase &Inst,
				davidxl: Document the public APIs.
				wenlei: Done.
				StringRef CalleeName);
				// Query context profile for a given location. The full context
				// is identified by input DILocation.
				FunctionSamples *getContextSamplesFor(const DILocation *DIL);
				// Query context profile for a given sample context of a function.
				FunctionSamples *getContextSamplesFor(const SampleContext &Context);
				// Query base profile for a given function. A base profile is a merged view
				// of all context profiles for contexts that are not inlined.
				FunctionSamples *getBaseSamplesFor(const Function &Func,
				bool MergeContext = true);
				// Query base profile for a given function by name.
				FunctionSamples *getBaseSamplesFor(StringRef Name, bool MergeContext);
				// Mark a context profile as inlined when function is inlined.
				// This makes sure that inlined context profile will be excluded in
				// function's base profile.
				void markContextSamplesInlined(const FunctionSamples *InlinedSamples);
				// Dump the internal context profile trie.
				void dump();

				private:
				ContextTrieNode *getContextFor(const DILocation *DIL);
				ContextTrieNode *getContextFor(const SampleContext &Context);
				ContextTrieNode *getCalleeContextFor(const DILocation *DIL,
				StringRef CalleeName);
				ContextTrieNode *getOrCreateContextPath(const SampleContext &Context,
				bool AllowCreate);
				ContextTrieNode *getTopLevelContextNode(StringRef FName);
				ContextTrieNode &addTopLevelContextNode(StringRef FName);
				ContextTrieNode &promoteMergeContextSamplesTree(ContextTrieNode &NodeToPromo);
				void promoteMergeContextSamplesTree(const Instruction &Inst,
				StringRef CalleeName);
				void mergeContextNode(ContextTrieNode &FromNode, ContextTrieNode &ToNode,
				StringRef ContextStrToRemove);
				ContextTrieNode &promoteMergeContextSamplesTree(ContextTrieNode &FromNode,
				ContextTrieNode &ToNodeParent,
				StringRef ContextStrToRemove);

				// Map from function name to context profiles (excluding base profile)
				StringMap<SmallSet<FunctionSamples *, 16>> FuncToCtxtProfileSet;

				// Root node for context trie tree
				ContextTrieNode RootContext;
				};

				} // end namespace llvm
				#endif // LLVM_TRANSFORMS_IPO_SAMPLECONTEXTTRACKER_H
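The header above describes a trie in which the path from the root to a node is the calling context of that node's profile. The following is a minimal standalone sketch of that idea, not the patch's implementation: `CallSite`, `TrieNode`, `Frame`, and `walk` are hypothetical names, the sketch uses only the C++ standard library rather than LLVM types, and children are keyed by the (call site, callee name) pair directly instead of the 32-bit `nodeHash` value used in the patch.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for LineLocation: line offset plus discriminator.
struct CallSite {
  uint32_t LineOffset;
  uint32_t Discriminator;
  bool operator<(const CallSite &O) const {
    return std::tie(LineOffset, Discriminator) <
           std::tie(O.LineOffset, O.Discriminator);
  }
};

// Hypothetical simplified trie node. Children are keyed by
// (call site in this node, callee name), so no hashing is involved.
struct TrieNode {
  std::string FuncName;
  std::map<std::pair<CallSite, std::string>, std::unique_ptr<TrieNode>>
      Children;

  // Look up a child; create it only when AllowCreate is true.
  TrieNode *getOrCreateChild(const CallSite &CS, const std::string &Callee,
                             bool AllowCreate = true) {
    auto Key = std::make_pair(CS, Callee);
    auto It = Children.find(Key);
    if (It != Children.end())
      return It->second.get();
    if (!AllowCreate)
      return nullptr;
    auto Node = std::make_unique<TrieNode>();
    Node->FuncName = Callee;
    TrieNode *Raw = Node.get();
    Children.emplace(Key, std::move(Node));
    return Raw;
  }
};

// One frame of a context: a function and the call site in its parent frame.
// Top-level frames (children of the root) use call site {0, 0}.
struct Frame {
  std::string Func;
  CallSite FromParent;
};

// Walk (and optionally build) the path for a context, outermost frame first.
// Returns nullptr when the path does not exist and creation is not allowed.
TrieNode *walk(TrieNode &Root, const std::vector<Frame> &Context,
               bool AllowCreate) {
  TrieNode *Node = &Root;
  for (const Frame &F : Context) {
    Node = Node->getOrCreateChild(F.FromParent, F.Func, AllowCreate);
    if (!Node)
      return nullptr;
  }
  return Node;
}
```

In this sketch, the context `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]` becomes the frames `main` at `{0, 0}`, `_Z5funcAi` at `{3, 0}`, and `_Z8funcLeafi` at `{1, 0}`. Calling `walk` with `AllowCreate` set to false plays the role of a pure lookup, similar in spirit to how `getContextFor` forwards to `getOrCreateContextPath(Context, false)`.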

llvm/lib/ProfileData/SampleProf.cpp

	#include <system_error>			#include <system_error>

	using namespace llvm;			using namespace llvm;
	using namespace sampleprof;			using namespace sampleprof;

	namespace llvm {			namespace llvm {
	namespace sampleprof {			namespace sampleprof {
	SampleProfileFormat FunctionSamples::Format;			SampleProfileFormat FunctionSamples::Format;
				bool FunctionSamples::ProfileIsCS = false;
	bool FunctionSamples::UseMD5;			bool FunctionSamples::UseMD5;
	} // namespace sampleprof			} // namespace sampleprof
	} // namespace llvm			} // namespace llvm

	namespace {			namespace {

	// FIXME: This class is only here to support the transition to llvm::Error. It			// FIXME: This class is only here to support the transition to llvm::Error. It
	// will be removed once this transition is complete. Clients should prefer to			// will be removed once this transition is complete. Clients should prefer to

llvm/lib/ProfileData/SampleProfReader.cpp

/// the expected format.		/// the expected format.
///		///
/// \returns true if the file was loaded successfully, false otherwise.		/// \returns true if the file was loaded successfully, false otherwise.
std::error_code SampleProfileReaderText::readImpl() {		std::error_code SampleProfileReaderText::readImpl() {
line_iterator LineIt(*Buffer, /*SkipBlanks=*/true, '#');		line_iterator LineIt(*Buffer, /*SkipBlanks=*/true, '#');
sampleprof_error Result = sampleprof_error::success;		sampleprof_error Result = sampleprof_error::success;

InlineCallStack InlineStack;		InlineCallStack InlineStack;
		int CSProfileCount = 0;
		int RegularProfileCount = 0;

for (; !LineIt.is_at_eof(); ++LineIt) {		for (; !LineIt.is_at_eof(); ++LineIt) {
if ((*LineIt)[(*LineIt).find_first_not_of(' ')] == '#')		if ((*LineIt)[(*LineIt).find_first_not_of(' ')] == '#')
continue;		continue;
// Read the header of each function.		// Read the header of each function.
//		//
// Note that for function identifiers we are actually expecting		// Note that for function identifiers we are actually expecting
// mangled names, but we may not always get them. This happens when		// mangled names, but we may not always get them. This happens when
// the compiler decides not to emit the function (e.g., it was inlined		// the compiler decides not to emit the function (e.g., it was inlined
// and removed). In this case, the binary will not have the linkage		// and removed). In this case, the binary will not have the linkage
// name for the function, so the profiler will emit the function's		// name for the function, so the profiler will emit the function's
// unmangled name, which may contain characters like ':' and '>' in its		// unmangled name, which may contain characters like ':' and '>' in its
// name (member functions, templates, etc).		// name (member functions, templates, etc).
//		//
// The only requirement we place on the identifier, then, is that it		// The only requirement we place on the identifier, then, is that it
// should not begin with a number.		// should not begin with a number.
if ((*LineIt)[0] != ' ') {		if ((*LineIt)[0] != ' ') {
uint64_t NumSamples, NumHeadSamples;		uint64_t NumSamples, NumHeadSamples;
StringRef FName;		StringRef FName;
if (!ParseHead(*LineIt, FName, NumSamples, NumHeadSamples)) {		if (!ParseHead(*LineIt, FName, NumSamples, NumHeadSamples)) {
reportError(LineIt.line_number(),		reportError(LineIt.line_number(),
"Expected 'mangled_name:NUM:NUM', found " + *LineIt);		"Expected 'mangled_name:NUM:NUM', found " + *LineIt);
return sampleprof_error::malformed;		return sampleprof_error::malformed;
}		}
Profiles[FName] = FunctionSamples();		SampleContext FContext(FName);
FunctionSamples &FProfile = Profiles[FName];		if (FContext.hasContext())
FProfile.setName(FName);		++CSProfileCount;
		else
		++RegularProfileCount;
		Profiles[FContext] = FunctionSamples();
		FunctionSamples &FProfile = Profiles[FContext];
		FProfile.setName(FContext.getName());
		FProfile.setContext(FContext);
		davidxl: so FName is also the context String?
		wenlei: Yes, `SampleContext` can take both full context string (wrapped with `[]`) as well as context-less function names, and it will set internal state accordingly. I've updated header comment for `SampleContext` with details.
		// Example of full context string (note the wrapping `[]`):
		// `[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]`
		// Example of context-less function name (same as AutoFDO):
		// `_Z8funcLeafi`
MergeResult(Result, FProfile.addTotalSamples(NumSamples));		MergeResult(Result, FProfile.addTotalSamples(NumSamples));
MergeResult(Result, FProfile.addHeadSamples(NumHeadSamples));		MergeResult(Result, FProfile.addHeadSamples(NumHeadSamples));
InlineStack.clear();		InlineStack.clear();
InlineStack.push_back(&FProfile);		InlineStack.push_back(&FProfile);
} else {		} else {
uint64_t NumSamples;		uint64_t NumSamples;
StringRef FName;		StringRef FName;
DenseMap<StringRef, uint64_t> TargetCountMap;		DenseMap<StringRef, uint64_t> TargetCountMap;
Show All 25 Lines	if ((*LineIt)[0] != ' ') {
LineOffset, Discriminator, name_count.first,		LineOffset, Discriminator, name_count.first,
name_count.second));		name_count.second));
}		}
MergeResult(Result, FProfile.addBodySamples(LineOffset, Discriminator,		MergeResult(Result, FProfile.addBodySamples(LineOffset, Discriminator,
NumSamples));		NumSamples));
}		}
}		}
}		}

		assert((RegularProfileCount == 0 || CSProfileCount == 0) &&
		"Cannot have both context-sensitive and regular profile");
		ProfileIsCS = (CSProfileCount > 0);

if (Result == sampleprof_error::success)		if (Result == sampleprof_error::success)
computeSummary();		computeSummary();

return Result;		return Result;
}		}
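The reader change above wraps each header name in a `SampleContext`, counts context-sensitive and regular profiles, and asserts that the two are not mixed. A minimal standalone sketch of that classification, assuming only the two name forms quoted in the review thread (a bracketed full context string, or a bare function name), could look as follows. `ParsedContext`, `parseContext`, `ProfileKind`, and `classify` are hypothetical names for illustration and are not the `SampleContext` API; the input is assumed to be the name portion of a header line with the trailing sample counts already removed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of splitting a profile header name.
struct ParsedContext {
  bool HasContext = false;    // true for the bracketed form
  std::string CallingContext; // frames before the leaf, empty if none
  std::string Name;           // leaf function name
};

// Split "[caller:line @ ... @ leaf]" into calling context and leaf name.
// A name without the wrapping brackets is treated as context-less.
ParsedContext parseContext(const std::string &Header) {
  ParsedContext P;
  if (Header.size() >= 2 && Header.front() == '[' && Header.back() == ']') {
    P.HasContext = true;
    const std::string Body = Header.substr(1, Header.size() - 2);
    const std::string Sep = " @ ";
    const std::size_t Pos = Body.rfind(Sep);
    if (Pos == std::string::npos) {
      P.Name = Body;
    } else {
      P.CallingContext = Body.substr(0, Pos);
      P.Name = Body.substr(Pos + Sep.size());
    }
  } else {
    P.Name = Header;
  }
  return P;
}

enum class ProfileKind { Empty, Regular, ContextSensitive, Mixed };

// Count both kinds of header, mirroring the two counters in the reader.
// The patch asserts on a mix; this sketch reports it as Mixed instead.
ProfileKind classify(const std::vector<std::string> &Headers) {
  int CSCount = 0, RegularCount = 0;
  for (const std::string &H : Headers) {
    if (parseContext(H).HasContext)
      ++CSCount;
    else
      ++RegularCount;
  }
  if (CSCount > 0 && RegularCount > 0)
    return ProfileKind::Mixed;
  if (CSCount > 0)
    return ProfileKind::ContextSensitive;
  return RegularCount > 0 ? ProfileKind::Regular : ProfileKind::Empty;
}
```

Returning a `Mixed` value rather than asserting is a choice made for the sketch so that the mixed case can be observed by a caller; the patch itself treats a mixed profile as a programming error.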

bool SampleProfileReaderText::hasFormat(const MemoryBuffer &Buffer) {		bool SampleProfileReaderText::hasFormat(const MemoryBuffer &Buffer) {
bool result = false;		bool result = false;
▲ Show 20 Lines • Show All 1,012 Lines • ▼ Show 20 Lines	if (Reader.useMD5()) {
Ctx.diagnose(DiagnosticInfoSampleProfile(		Ctx.diagnose(DiagnosticInfoSampleProfile(
Reader.getBuffer()->getBufferIdentifier(),		Reader.getBuffer()->getBufferIdentifier(),
"Profile data remapping cannot be applied to profile data "		"Profile data remapping cannot be applied to profile data "
"in compact format (original mangled names are not available).",		"in compact format (original mangled names are not available).",
DS_Warning));		DS_Warning));
return;		return;
}		}

		// CSSPGO-TODO: Remapper is not yet supported.
		// We will need to remap the entire context string.
assert(Remappings && "should be initialized while creating remapper");		assert(Remappings && "should be initialized while creating remapper");
for (auto &Sample : Reader.getProfiles()) {		for (auto &Sample : Reader.getProfiles()) {
DenseSet<StringRef> NamesInSample;		DenseSet<StringRef> NamesInSample;
Sample.second.findAllNames(NamesInSample);		Sample.second.findAllNames(NamesInSample);
for (auto &Name : NamesInSample)		for (auto &Name : NamesInSample)
if (auto Key = Remappings->insert(Name))		if (auto Key = Remappings->insert(Name))
NameMap.insert({Key, Name});		NameMap.insert({Key, Name});
}		}

llvm/lib/Transforms/IPO/CMakeLists.txt

Show All 25 Lines	add_llvm_component_library(LLVMipo
Internalize.cpp		Internalize.cpp
LoopExtractor.cpp		LoopExtractor.cpp
LowerTypeTests.cpp		LowerTypeTests.cpp
MergeFunctions.cpp		MergeFunctions.cpp
OpenMPOpt.cpp		OpenMPOpt.cpp
PartialInlining.cpp		PartialInlining.cpp
PassManagerBuilder.cpp		PassManagerBuilder.cpp
PruneEH.cpp		PruneEH.cpp
		SampleContextTracker.cpp
SampleProfile.cpp		SampleProfile.cpp
SampleProfileProbe.cpp		SampleProfileProbe.cpp
SCCP.cpp		SCCP.cpp
StripDeadPrototypes.cpp		StripDeadPrototypes.cpp
StripSymbols.cpp		StripSymbols.cpp
SyntheticCountsPropagation.cpp		SyntheticCountsPropagation.cpp
ThinLTOBitcodeWriter.cpp		ThinLTOBitcodeWriter.cpp
WholeProgramDevirt.cpp		WholeProgramDevirt.cpp

llvm/lib/Transforms/IPO/SampleContextTracker.cpp

This file was added.

				//===- SampleContextTracker.cpp - Context-sensitive Profile Tracker -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the SampleContextTracker used by CSSPGO.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/IPO/SampleContextTracker.h"
				#include "llvm/ADT/StringMap.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/IR/DebugInfoMetadata.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/ProfileData/SampleProf.h"
				#include <map>
				#include <queue>
				#include <vector>

				using namespace llvm;
				using namespace sampleprof;

				#define DEBUG_TYPE "sample-context-tracker"

				namespace llvm {

				ContextTrieNode *ContextTrieNode::getChildContext(const LineLocation &CallSite,
				StringRef CalleeName) {
				if (CalleeName.empty())
				return getChildContext(CallSite);

				uint32_t Hash = nodeHash(CalleeName, CallSite);
				auto It = AllChildContext.find(Hash);
				if (It != AllChildContext.end())
				return &It->second;
				return nullptr;
				}

				ContextTrieNode *
				ContextTrieNode::getChildContext(const LineLocation &CallSite) {
				// CSFDO-TODO: This could be slow, change AllChildContext so we can
				// do point look up for child node by call site alone.
				// CSFDO-TODO: Return the child with max count for indirect call
				ContextTrieNode *ChildNodeRet = nullptr;
				for (auto &It : AllChildContext) {
				ContextTrieNode &ChildNode = It.second;
				if (ChildNode.CallSiteLoc == CallSite) {
				if (ChildNodeRet)
				return nullptr;
				else
				ChildNodeRet = &ChildNode;
				}
				}

				return ChildNodeRet;
				}

				ContextTrieNode &ContextTrieNode::moveToChildContext(
				const LineLocation &CallSite, ContextTrieNode &&NodeToMove,
				StringRef ContextStrToRemove, bool DeleteNode) {
				uint32_t Hash = nodeHash(NodeToMove.getFuncName(), CallSite);
				assert(!AllChildContext.count(Hash) && "Node to remove must exist");
				LineLocation OldCallSite = NodeToMove.CallSiteLoc;
				wmi: Make param AllowCreate default to true so we don't need this wrapper?
				wenlei: Good catch, changed.
				ContextTrieNode &OldParentContext = *NodeToMove.getParentContext();
				AllChildContext[Hash] = NodeToMove;
				ContextTrieNode &NewNode = AllChildContext[Hash];
				NewNode.CallSiteLoc = CallSite;

				// Walk through nodes in the moved subtree, and update
				wmi: Add an assert message.
				// FunctionSamples' context as for the context promotion.
				// We also need to set new parent link for all children.
				std::queue<ContextTrieNode *> NodeToUpdate;
				NewNode.setParentContext(this);
				NodeToUpdate.push(&NewNode);

				while (!NodeToUpdate.empty()) {
				ContextTrieNode *Node = NodeToUpdate.front();
				NodeToUpdate.pop();
				FunctionSamples *FSamples = Node->getFunctionSamples();

				if (FSamples) {
				FSamples->getContext().promoteOnPath(ContextStrToRemove);
				FSamples->getContext().setState(SyntheticContext);
				LLVM_DEBUG(dbgs() << " Context promoted to: " << FSamples->getContext()
				<< "\n");
				}

				for (auto &It : Node->getAllChildContext()) {
				ContextTrieNode *ChildNode = &It.second;
				ChildNode->setParentContext(Node);
				NodeToUpdate.push(ChildNode);
				}
				}

				// Original context no longer needed, destroy if requested.
				if (DeleteNode)
				OldParentContext.removeChildContext(OldCallSite, NewNode.getFuncName());

				return NewNode;
				}

				void ContextTrieNode::removeChildContext(const LineLocation &CallSite,
				StringRef CalleeName) {
				uint32_t Hash = nodeHash(CalleeName, CallSite);
				// Note this essentially calls dtor and destroys that child context
				AllChildContext.erase(Hash);
				}

				std::map<uint32_t, ContextTrieNode> &ContextTrieNode::getAllChildContext() {
				return AllChildContext;
				}

				const StringRef ContextTrieNode::getFuncName() const { return FuncName; }

				FunctionSamples *ContextTrieNode::getFunctionSamples() const {
				return FuncSamples;
				}

				void ContextTrieNode::setFunctionSamples(FunctionSamples *FSamples) {
				FuncSamples = FSamples;
				}

				LineLocation ContextTrieNode::getCallSiteLoc() const { return CallSiteLoc; }

				ContextTrieNode *ContextTrieNode::getParentContext() const {
				return ParentContext;
				}

				void ContextTrieNode::setParentContext(ContextTrieNode *Parent) {
				ParentContext = Parent;
				}

				void ContextTrieNode::dump() {
				dbgs() << "Node: " << FuncName << "\n"
				<< " Callsite: " << CallSiteLoc << "\n"
				<< " Children:\n";

				for (auto &It : AllChildContext) {
				dbgs() << " Node: " << It.second.getFuncName() << "\n";
				}
				}

				uint32_t ContextTrieNode::nodeHash(StringRef ChildName,
				const LineLocation &Callsite) {
				// We still use child's name for child hash, this is
				// because for children of root node, we don't have
				// different line/discriminator, and we'll rely on name
				// to differentiate children.
				uint32_t NameHash = std::hash<std::string>{}(ChildName.str());
				uint32_t LocId = (Callsite.LineOffset << 16) | Callsite.Discriminator;
				return NameHash + (LocId << 5) + LocId;
				}

				ContextTrieNode *ContextTrieNode::getOrCreateChildContext(
				const LineLocation &CallSite, StringRef CalleeName, bool AllowCreate) {
				uint32_t Hash = nodeHash(CalleeName, CallSite);
				auto It = AllChildContext.find(Hash);
				if (It != AllChildContext.end()) {
				assert(It->second.getFuncName() == CalleeName &&
				"Hash collision for child context node");
				return &It->second;
				}

				if (!AllowCreate)
				return nullptr;

				AllChildContext[Hash] = ContextTrieNode(this, CalleeName, nullptr, CallSite);
				return &AllChildContext[Hash];
				}

				// Profile tracker that manages profiles and their associated contexts
				SampleContextTracker::SampleContextTracker(
				StringMap<FunctionSamples> &Profiles) {
				for (auto &FuncSample : Profiles) {
				FunctionSamples *FSamples = &FuncSample.second;
				SampleContext Context(FuncSample.first(), RawContext);
				LLVM_DEBUG(dbgs() << "Tracking Context for function: " << Context << "\n");
				if (!Context.isBaseContext())
				FuncToCtxtProfileSet[Context.getName()].insert(FSamples);
				ContextTrieNode *NewNode = getOrCreateContextPath(Context, true);
				assert(!NewNode->getFunctionSamples() &&
				"New node can't have sample profile");
				NewNode->setFunctionSamples(FSamples);
				}
				}

				FunctionSamples *
				SampleContextTracker::getCalleeContextSamplesFor(const CallBase &Inst,
				StringRef CalleeName) {
				LLVM_DEBUG(dbgs() << "Getting callee context for instr: " << Inst << "\n");
				// CSFDO-TODO: We use CalleeName to differentiate indirect call
				// We need to get sample for indirect callee too.
				DILocation *DIL = Inst.getDebugLoc();
				if (!DIL)
				return nullptr;

				ContextTrieNode *CalleeContext = getCalleeContextFor(DIL, CalleeName);
				if (CalleeContext) {
				FunctionSamples *FSamples = CalleeContext->getFunctionSamples();
				LLVM_DEBUG(if (FSamples) {
				dbgs() << " Callee context found: " << FSamples->getContext() << "\n";
				});
				return FSamples;
				}

				return nullptr;
				}

				FunctionSamples *
				SampleContextTracker::getContextSamplesFor(const DILocation *DIL) {
				assert(DIL && "Expect non-null location");

				ContextTrieNode *ContextNode = getContextFor(DIL);
				if (ContextNode) {
				return ContextNode->getFunctionSamples();
				}

				return nullptr;
				}

				FunctionSamples *
				SampleContextTracker::getContextSamplesFor(const SampleContext &Context) {
				ContextTrieNode *Node = getContextFor(Context);
				if (!Node)
				return nullptr;

				return Node->getFunctionSamples();
				}

				FunctionSamples *SampleContextTracker::getBaseSamplesFor(const Function &Func,
				bool MergeContext) {
				StringRef CanonName = FunctionSamples::getCanonicalFnName(Func);
				return getBaseSamplesFor(CanonName, MergeContext);
				}

				FunctionSamples *SampleContextTracker::getBaseSamplesFor(StringRef Name,
				bool MergeContext) {
				LLVM_DEBUG(dbgs() << "Getting base profile for function: " << Name << "\n");
				// Base profile is top-level node (child of root node), so try to retrieve
				// existing top-level node for given function first. If it exists, it could be
				// that we've merged base profile before, or there's actually context-less
				// profile from the input (e.g. due to unreliable stack walking).
				ContextTrieNode *Node = getTopLevelContextNode(Name);
				if (MergeContext) {
				wmi: I don't understand what the top level means here. Better document it. Do we cache the base profile somewhere or we merge it everytime?
				wenlei: Top-level means it's under root node directly - path from root to node is empty, hence no context. I added comment here as well as in the header comment of `SampleContextTracker`. If `getBaseSamplesFor` is called for the same function again, we'll retrieve the existing top-level node from last call (it must exist, with context profile all merged into it already), then iterating over `FuncToCtxtProfileSet[Name]` but won't do anything since all context profiles have been merged (check on line 265). So it's somewhat like caching. Currently `getBaseSamplesFor` is only called once for each function from sample profile loader.
				LLVM_DEBUG(dbgs() << " Merging context profile into base profile: " << Name
				<< "\n");

				// We have profile for function under different contexts,
				// create synthetic base profile and merge context profiles
				// into base profile.
				for (auto *CSamples : FuncToCtxtProfileSet[Name]) {
				SampleContext &Context = CSamples->getContext();
				ContextTrieNode *FromNode = getContextFor(Context);
				if (FromNode == Node)
				continue;

				// Skip inlined context profile and also don't re-merge any context
				if (Context.hasState(InlinedContext) || Context.hasState(MergedContext))
				continue;

				ContextTrieNode &ToNode = promoteMergeContextSamplesTree(*FromNode);
				assert(!Node || Node == &ToNode && "Expect only one base profile");
				Node = &ToNode;
				}
				}

				// Still no profile even after merge/promotion (if allowed)
				wmi: Add an assertion message.
				if (!Node)
				return nullptr;

				return Node->getFunctionSamples();
				}
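The base-profile merge above skips contexts that are already inlined or merged, which is why a second call for the same function finds nothing left to merge. A minimal standalone sketch of that skip-and-mark pattern follows. It is an assumption-laden simplification: `ContextProfile`, `BaseProfile`, `mergeIntoBase`, and the state flags are hypothetical names, and only total sample counts are summed, whereas the patch promotes whole subtrees in the trie and merges full `FunctionSamples`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical state flags, loosely modelled on InlinedContext/MergedContext.
enum ContextState : uint32_t { Raw = 0, Inlined = 1, Merged = 2 };

// Hypothetical context profile: a context string, its sample total, and state.
struct ContextProfile {
  std::string Context;
  uint64_t TotalSamples;
  uint32_t State;
};

// Hypothetical base (context-less) profile for one function.
struct BaseProfile {
  uint64_t TotalSamples = 0;
};

// Fold every context profile that is neither inlined nor already merged into
// the base profile, then mark it merged so a repeated call is a no-op.
void mergeIntoBase(BaseProfile &Base, std::vector<ContextProfile> &Profiles) {
  for (ContextProfile &P : Profiles) {
    if (P.State & (Inlined | Merged))
      continue;
    Base.TotalSamples += P.TotalSamples;
    P.State |= Merged;
  }
}
```

With three context profiles for the same function holding 100, 40, and 10 samples, where the 40-sample context has been marked inlined, the base total becomes 110 after the first call and stays at 110 after a second call.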

				void SampleContextTracker::markContextSamplesInlined(
				const FunctionSamples *InlinedSamples) {
				assert(InlinedSamples && "Expect non-null inlined samples");
				LLVM_DEBUG(dbgs() << "Marking context profile as inlined: "
				<< InlinedSamples->getContext() << "\n");
				InlinedSamples->getContext().setState(InlinedContext);
				}

				wmi: Add an assertion message.
				void SampleContextTracker::promoteMergeContextSamplesTree(
				const Instruction &Inst, StringRef CalleeName) {
				LLVM_DEBUG(dbgs() << "Promoting and merging context tree for instr: \n"
				<< Inst << "\n");
				// CSFDO-TODO: We also need to promote context profile from indirect
				// calls. We won't have callee names from those from call instr.
				if (CalleeName.empty())
				return;

				// Get the caller context for the call instruction, we don't use callee
				// name from call because there can be context from indirect calls too.
				DILocation *DIL = Inst.getDebugLoc();
				ContextTrieNode *CallerNode = getContextFor(DIL);
				if (!CallerNode)
				return;

				// Get the context that needs to be promoted
				LineLocation CallSite(FunctionSamples::getOffset(DIL),
				DIL->getBaseDiscriminator());
				ContextTrieNode *NodeToPromo =
				CallerNode->getChildContext(CallSite, CalleeName);
				if (!NodeToPromo)
				return;

				promoteMergeContextSamplesTree(*NodeToPromo);
				}

				ContextTrieNode &SampleContextTracker::promoteMergeContextSamplesTree(
				ContextTrieNode &NodeToPromo) {
				// Promote the input node to be directly under root. This can happen
				// when we decided to not inline a function under context represented
				// by the input node. The promote and merge is then needed to reflect
				// the context profile in the base (context-less) profile.
				FunctionSamples *FromSamples = NodeToPromo.getFunctionSamples();
				assert(FromSamples && "Shouldn't promote a context without profile");
				LLVM_DEBUG(dbgs() << " Found context tree root to promote: "
				<< FromSamples->getContext() << "\n");

				StringRef ContextStrToRemove = FromSamples->getContext().getCallingContext();
				return promoteMergeContextSamplesTree(NodeToPromo, RootContext,
				ContextStrToRemove);
				}

				void SampleContextTracker::dump() {
				dbgs() << "Context Profile Tree:\n";
				std::queue<ContextTrieNode *> NodeQueue;
				NodeQueue.push(&RootContext);

				while (!NodeQueue.empty()) {
				ContextTrieNode *Node = NodeQueue.front();
				NodeQueue.pop();
				Node->dump();

				for (auto &It : Node->getAllChildContext()) {
				ContextTrieNode *ChildNode = &It.second;
				NodeQueue.push(ChildNode);
				}
				}
				}

				ContextTrieNode *
				SampleContextTracker::getContextFor(const SampleContext &Context) {
				return getOrCreateContextPath(Context, false);
				}

				ContextTrieNode *
				SampleContextTracker::getCalleeContextFor(const DILocation *DIL,
				StringRef CalleeName) {
				assert(DIL && "Expect non-null location");

				// CSSPGO-TODO: need to support indirect callee
				if (CalleeName.empty())
				return nullptr;

				wmi: Add an assertion message.
				ContextTrieNode *CallContext = getContextFor(DIL);
				if (!CallContext)
				return nullptr;

				return CallContext->getChildContext(
				LineLocation(FunctionSamples::getOffset(DIL),
				DIL->getBaseDiscriminator()),
				CalleeName);
				}

				ContextTrieNode *SampleContextTracker::getContextFor(const DILocation *DIL) {
				wmi (Done): Do we need to call getCanonicalFnName here to make the names in the inline stack canonical, so that we can match them against the names in the context?
				wenlei (author, Done): Good catch, I think we need to canonicalize `CalleeName`, which is the leaf. (The names of middle inline frames should be fine, as they come from debug metadata, which is not modified when suffixes are appended for symbol promotion, etc.) I guess we need to add `getCanonicalFnName` for `SampleProfileLoader::findCalleeFunctionSamples`. IIUC, we need it there for today's FDO too?
				wmi (Done): In the context, there are also levels contributed by stack unwinding. Those frames should have the same names as ELF symbols. To be consistent, do we want to apply getCanonicalFnName to all the context levels?
				> I guess we need to add getCanonicalFnName for SampleProfileLoader::findCalleeFunctionSamples. IIUC, we need it there for today's FDO too?
				Agree. Today it may not be needed, because most suffixes are appended after inlining, so, as you said, the names of the inline frames from debug metadata don't contain the suffixes. But suffixes are now being added before inlining (https://reviews.llvm.org/D89617), and there may be others in the future. It is good to always apply the function.
				wenlei (author, Done):
				> In the context, there are also levels contributed by stack unwinding. Those frames should have the same names as elf symbols. To be consistent, do we want to apply getCanonicalFnName for all the context levels?
				Good point. In this case, I think it's better to canonicalize all names during profile generation, though. IIRC AutoFDO gets names from DWARF, hence it does not have the suffixes (as if they were canonicalized). So doing canonicalization during profile generation would make it consistent with AutoFDO.
				wenlei (author, Done): Name canonicalization is now done in llvm-profgen (https://reviews.llvm.org/D89723).
				assert(DIL && "Expect non-null location");
				SmallVector<std::pair<LineLocation, StringRef>, 10> S;

				// Use C++ linkage name if possible.
				const DILocation *PrevDIL = DIL;
				wmi (Not Done): A question: the context in SampleContextTracker includes not only the inline stack but also the call stack. The S vector below only contains the inline stack at the DIL location. How can it match the full stack starting from RootContext?
				wenlei (author, Done): When we decide not to inline a call site, its context profile is promoted to root, so what remains in the context tracker should accurately reflect the remaining context profile. E.g. suppose we start with A->(call) B->(inline) C in the context tracker. If at some point we're looking at B->C from a DIL, there are two scenarios:
				- If A inlined B, we wouldn't be able to match B->C from the DIL to anything in the context tracker. This is intentional and desired, because the remainder/base profile for B, or the context profile B->C, shouldn't have anything if the A->B inlining happened.
				- If A did not inline B, B->C should be moved/promoted from being a child of A to being under root. Then we would be able to match B->C from the DIL to B->C (under root) in the context tracker.
				wmi (Not Done): I see, thanks. After the compiler decides it won't inline at some call site, the profile for that call site will be promoted and some context information will be lost. This seems to assume that inlining happens in top-down order and happens only once. I remember the CSSPGO profile will be used to drive the CGSCC inliner in the future. The CGSCC inliner needs to do the inlining iteratively, so how is it supposed to work with profile promotion?
				wenlei (author, Done): You're right that it currently assumes top-down order; that is the best way to leverage a context-sensitive profile. If we try to use the CSSPGO profile to drive SCC inlining, bottom-up order and the iterative nature are the two key differences.
				Bottom-up inlining means we can't promote context profiles by moving them under root; instead, we need to copy (and merge) the context profile into the base profile under root. For the same example A->B->C, with SCC inlining we could end up processing B before A. When processing B, we promote the not-inlined context profiles of B to be under root (B->C) and merge them together into a base profile of B. However, we still need to keep the original context profile tree (A->B->C), so that later, when we process A, we still see B and C under A. The promotion actually happens when we try to access a function's base profile (getBaseSamplesFor calls promoteMergeContextSamplesTree for each not-inlined context profile), so the difference between top-down and bottom-up inlining is more about accuracy: with bottom-up inlining, when getting the base profile for B, we'd assume none of B's call sites is inlined, even if A later inlines B.
				For iterative inlining, we can call getBaseSamplesFor every time we process a function again, to redo the promotion and merge based on the up-to-date inline decisions. E.g. if we process B, then A (which inlines B), then B again, the second time we process B we would not merge the B under A into B's current base profile, which makes the profile more accurate than in the first pass over B. (But it's still not as good as top-down inlining, because even if we can unmerge a context profile, we can't undo inlining.)
				wmi (Done): Thanks for the detailed explanation. That makes sense to me. Talking about profile promotion and merging: if a function is still hot after the inlining of all its call sites has been decided, and it still has different profiles under different contexts, it may be interesting to clone the function so we can still apply the context-sensitive profile per group. It would be interesting to have some support for comparing profiles under different contexts and splitting them into groups. I feel the full context-sensitive profile opens up new opportunities that we can explore by maximizing its usage in the future.
				wenlei (author, Done): Yeah, that's a good point. We're also thinking about cloning, as it's an area where clang is still behind gcc. I think it will take some time before we fully leverage the new opportunities. I will send another patch for priority-based top-down inlining with CSSPGO; with that, more inlining will be done during the early top-down inline, but it will take more effort to rebalance inlining between the sample loader and CGSCC.
				for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
				StringRef Name = PrevDIL->getScope()->getSubprogram()->getLinkageName();
				if (Name.empty())
				Name = PrevDIL->getScope()->getSubprogram()->getName();
				S.push_back(
				std::make_pair(LineLocation(FunctionSamples::getOffset(DIL),
				DIL->getBaseDiscriminator()), Name));
				PrevDIL = DIL;
				}

				// Push root node. Note that a root node like main may only have
				// a name, but not a linkage name.
				StringRef RootName = PrevDIL->getScope()->getSubprogram()->getLinkageName();
				if (RootName.empty())
				RootName = PrevDIL->getScope()->getSubprogram()->getName();
				S.push_back(std::make_pair(LineLocation(0, 0), RootName));

				ContextTrieNode *ContextNode = &RootContext;
				int I = S.size();
				while (--I >= 0 && ContextNode) {
				LineLocation &CallSite = S[I].first;
				StringRef &CalleeName = S[I].second;
				ContextNode = ContextNode->getChildContext(CallSite, CalleeName);
				}

				if (I < 0)
				return ContextNode;

				return nullptr;
				}

				ContextTrieNode *
				SampleContextTracker::getOrCreateContextPath(const SampleContext &Context,
				bool AllowCreate) {
				ContextTrieNode *ContextNode = &RootContext;
				StringRef ContextRemain = Context;
				StringRef ChildContext;
				StringRef CalleeName;
				LineLocation CallSiteLoc(0, 0);

				while (ContextNode && !ContextRemain.empty()) {
				auto ContextSplit = SampleContext::splitContextString(ContextRemain);
				ChildContext = ContextSplit.first;
				ContextRemain = ContextSplit.second;
				LineLocation NextCallSiteLoc(0, 0);
				SampleContext::decodeContextString(ChildContext, CalleeName,
				NextCallSiteLoc);

				// Create child node at parent line/disc location
				if (AllowCreate) {
				ContextNode =
				ContextNode->getOrCreateChildContext(CallSiteLoc, CalleeName);
				} else {
				ContextNode = ContextNode->getChildContext(CallSiteLoc, CalleeName);
				}
				CallSiteLoc = NextCallSiteLoc;
				}

				assert((!AllowCreate || ContextNode) &&
				"Node must exist if creation is allowed");
				return ContextNode;
				}

				ContextTrieNode *SampleContextTracker::getTopLevelContextNode(StringRef FName) {
				wmi (Done): Add an assertion message.
				return RootContext.getChildContext(LineLocation(0, 0), FName);
				}

				ContextTrieNode &SampleContextTracker::addTopLevelContextNode(StringRef FName) {
				assert(!getTopLevelContextNode(FName) && "Node to add must not exist");
				return *RootContext.getOrCreateChildContext(LineLocation(0, 0), FName);
				}

				void SampleContextTracker::mergeContextNode(ContextTrieNode &FromNode,
				wmi (Done): Add an assertion message.
				ContextTrieNode &ToNode,
				StringRef ContextStrToRemove) {
				FunctionSamples *FromSamples = FromNode.getFunctionSamples();
				FunctionSamples *ToSamples = ToNode.getFunctionSamples();
				if (FromSamples && ToSamples) {
				// Merge/duplicate FromSamples into ToSamples
				ToSamples->merge(*FromSamples);
				wmi (Done): Add assertion message.
				ToSamples->getContext().setState(SyntheticContext);
				FromSamples->getContext().setState(MergedContext);
				} else if (FromSamples) {
				// Transfer FromSamples from FromNode to ToNode
				ToNode.setFunctionSamples(FromSamples);
				FromSamples->getContext().setState(SyntheticContext);
				FromSamples->getContext().promoteOnPath(ContextStrToRemove);
				FromNode.setFunctionSamples(nullptr);
				}
				wmi (Done): Use OldCallSiteLoc instead?
				}

				ContextTrieNode &SampleContextTracker::promoteMergeContextSamplesTree(
				ContextTrieNode &FromNode, ContextTrieNode &ToNodeParent,
				StringRef ContextStrToRemove) {
				assert(!ContextStrToRemove.empty() && "Context to remove can't be empty");

				// Ignore call site location if destination is top level under root
				LineLocation NewCallSiteLoc = LineLocation(0, 0);
				LineLocation OldCallSiteLoc = FromNode.getCallSiteLoc();
				ContextTrieNode &FromNodeParent = *FromNode.getParentContext();
				ContextTrieNode *ToNode = nullptr;
				bool MoveToRoot = (&ToNodeParent == &RootContext);
				if (!MoveToRoot) {
				NewCallSiteLoc = OldCallSiteLoc;
				}

				// Locate destination node, create/move if not existing
				ToNode = ToNodeParent.getChildContext(NewCallSiteLoc, FromNode.getFuncName());
				if (!ToNode) {
				// Do not delete node to move from its parent here because
				// caller is iterating over children of that parent node.
				ToNode = &ToNodeParent.moveToChildContext(
				NewCallSiteLoc, std::move(FromNode), ContextStrToRemove, false);
				} else {
				// Destination node exists, merge samples for the context tree
				mergeContextNode(FromNode, *ToNode, ContextStrToRemove);
				LLVM_DEBUG(dbgs() << " Context promoted and merged to: "
				wmi (Done): It will be slightly easier to read if the block can be extracted to a function.
				<< ToNode->getFunctionSamples()->getContext() << "\n");

				// Recursively promote and merge children
				for (auto &It : FromNode.getAllChildContext()) {
				ContextTrieNode &FromChildNode = It.second;
				promoteMergeContextSamplesTree(FromChildNode, *ToNode,
				ContextStrToRemove);
				}

				// Remove children once they're all merged
				FromNode.getAllChildContext().clear();
				}

				// For root of subtree, remove itself from old parent too
				if (MoveToRoot)
				FromNodeParent.removeChildContext(OldCallSiteLoc, ToNode->getFuncName());

				return *ToNode;
				}

				} // namespace llvm
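
The promote-and-merge behaviour discussed in the inline comments above (the A->B->C example) can be illustrated with a small standalone sketch. This is not the patch's implementation: `Node`, `Frame`, `lookup`, `getOrCreate`, `mergeTree`, `promoteToRoot` and `demoPromoteBUnderRoot` are hypothetical names, the sample profile is reduced to a single counter, and the caller X at line 7 is an invented second context added only to show merging.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// (call-site line offset in the parent, callee name); top-level nodes use 0.
using Frame = std::pair<uint32_t, std::string>;

// Hypothetical, simplified stand-in for a context trie node.
struct Node {
  std::string Name;
  uint64_t Samples = 0; // stands in for a full FunctionSamples profile
  std::map<Frame, std::unique_ptr<Node>> Children;
};

// Walk Path from Root; returns nullptr if any frame is missing.
Node *lookup(Node &Root, const std::vector<Frame> &Path) {
  Node *N = &Root;
  for (const Frame &F : Path) {
    auto It = N->Children.find(F);
    if (It == N->Children.end())
      return nullptr;
    N = It->second.get();
  }
  return N;
}

// Like lookup, but creates any missing node along Path.
Node &getOrCreate(Node &Root, const std::vector<Frame> &Path) {
  Node *N = &Root;
  for (const Frame &F : Path) {
    std::unique_ptr<Node> &Slot = N->Children[F];
    if (!Slot) {
      Slot = std::make_unique<Node>();
      Slot->Name = F.second;
    }
    N = Slot.get();
  }
  return *N;
}

// Merge the subtree rooted at From into To, summing sample counts.
void mergeTree(const Node &From, Node &To) {
  To.Samples += From.Samples;
  for (const auto &KV : From.Children) {
    std::unique_ptr<Node> &Dst = To.Children[KV.first];
    if (!Dst) {
      Dst = std::make_unique<Node>();
      Dst->Name = KV.second->Name;
    }
    mergeTree(*KV.second, *Dst);
  }
}

// Detach Parent's child at Key and re-attach it directly under Root at call
// site 0, merging with an existing top-level node of the same name if any.
bool promoteToRoot(Node &Root, Node &Parent, const Frame &Key) {
  auto It = Parent.Children.find(Key);
  if (It == Parent.Children.end())
    return false;
  std::unique_ptr<Node> Moved = std::move(It->second);
  Parent.Children.erase(It);
  std::unique_ptr<Node> &Dst = Root.Children[Frame(0u, Moved->Name)];
  if (!Dst) {
    Dst = std::make_unique<Node>();
    Dst->Name = Moved->Name;
  }
  mergeTree(*Moved, *Dst);
  return true;
}

// A and X both call B, B calls C, and neither caller inlines B.
uint64_t demoPromoteBUnderRoot() {
  Node Root;
  getOrCreate(Root, {{0, "A"}, {3, "B"}, {1, "C"}}).Samples = 40;
  getOrCreate(Root, {{0, "X"}, {7, "B"}, {1, "C"}}).Samples = 2;

  // The inline stack seen from a DILocation inside B only says "B -> C".
  const std::vector<Frame> InlineStack = {{0, "B"}, {1, "C"}};
  assert(!lookup(Root, InlineStack) && "B->C is not under root yet");

  const std::vector<Frame> PathA = {Frame(0u, "A")};
  const std::vector<Frame> PathX = {Frame(0u, "X")};
  promoteToRoot(Root, *lookup(Root, PathA), Frame(3u, "B"));
  promoteToRoot(Root, *lookup(Root, PathX), Frame(7u, "B"));

  // After promotion the same inline stack resolves to the merged profile.
  return lookup(Root, InlineStack)->Samples;
}
```

In this sketch promotion moves the subtree, which matches the top-down assumption described in the review. A bottom-up variant would, as discussed there, copy and merge instead of detaching the node from its original parent.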

llvm/lib/Transforms/IPO/SampleProfile.cpp

... 70 lines hidden ...
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/GenericDomTree.h"		#include "llvm/Support/GenericDomTree.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
		#include "llvm/Transforms/IPO/SampleContextTracker.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Utils/CallPromotionUtils.h"		#include "llvm/Transforms/Utils/CallPromotionUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <functional>		#include <functional>
#include <limits>		#include <limits>
... 332 lines hidden ...	protected:
/// Successors for each basic block in the CFG.		/// Successors for each basic block in the CFG.
BlockEdgeMap Successors;		BlockEdgeMap Successors;

SampleCoverageTracker CoverageTracker;		SampleCoverageTracker CoverageTracker;

/// Profile reader object.		/// Profile reader object.
std::unique_ptr<SampleProfileReader> Reader;		std::unique_ptr<SampleProfileReader> Reader;

		/// Profile tracker for different context.
		std::unique_ptr<SampleContextTracker> ContextTracker;

/// Samples collected for the body of this function.		/// Samples collected for the body of this function.
FunctionSamples *Samples = nullptr;		FunctionSamples *Samples = nullptr;

/// Name of the profile file to load.		/// Name of the profile file to load.
std::string Filename;		std::string Filename;

/// Name of the profile remapping file to load.		/// Name of the profile remapping file to load.
std::string RemappingFilename;		std::string RemappingFilename;

/// Flag indicating whether the profile input loaded successfully.		/// Flag indicating whether the profile input loaded successfully.
bool ProfileIsValid = false;		bool ProfileIsValid = false;

		/// Flag indicating whether input profile is context-sensitive
		bool ProfileIsCS = false;

/// Flag indicating if the pass is invoked in ThinLTO compile phase.		/// Flag indicating if the pass is invoked in ThinLTO compile phase.
///		///
/// In this phase, in annotation, we should not promote indirect calls.		/// In this phase, in annotation, we should not promote indirect calls.
/// Instead, we will mark GUIDs that needs to be annotated to the function.		/// Instead, we will mark GUIDs that needs to be annotated to the function.
bool IsThinLTOPreLink;		bool IsThinLTOPreLink;

/// Profile Summary Info computed from sample profile.		/// Profile Summary Info computed from sample profile.
ProfileSummaryInfo *PSI = nullptr;		ProfileSummaryInfo *PSI = nullptr;
... 281 lines hidden ...	ErrorOr<uint64_t> SampleProfileLoader::getInstWeight(const Instruction &Inst) {
// the residing basic block, thus we ignore them during annotation.		// the residing basic block, thus we ignore them during annotation.
if (isa<BranchInst>(Inst) || isa<IntrinsicInst>(Inst) || isa<PHINode>(Inst))		if (isa<BranchInst>(Inst) || isa<IntrinsicInst>(Inst) || isa<PHINode>(Inst))
return std::error_code();		return std::error_code();

// If a direct call/invoke instruction is inlined in profile		// If a direct call/invoke instruction is inlined in profile
// (findCalleeFunctionSamples returns non-empty result), but not inlined here,		// (findCalleeFunctionSamples returns non-empty result), but not inlined here,
// it means that the inlined callsite has no sample, thus the call		// it means that the inlined callsite has no sample, thus the call
// instruction should have 0 count.		// instruction should have 0 count.
if (auto *CB = dyn_cast<CallBase>(&Inst))		if (!ProfileIsCS)
		if (const auto *CB = dyn_cast<CallBase>(&Inst))
if (!CB->isIndirectCall() && findCalleeFunctionSamples(*CB))		if (!CB->isIndirectCall() && findCalleeFunctionSamples(*CB))
return 0;		return 0;

const DILocation *DIL = DLoc;		const DILocation *DIL = DLoc;
uint32_t LineOffset = FunctionSamples::getOffset(DIL);		uint32_t LineOffset = FunctionSamples::getOffset(DIL);
uint32_t Discriminator = DIL->getBaseDiscriminator();		uint32_t Discriminator = DIL->getBaseDiscriminator();
ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);		ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);
if (R) {		if (R) {
bool FirstMark =		bool FirstMark =
CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());		CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());
... 79 lines hidden ...
SampleProfileLoader::findCalleeFunctionSamples(const CallBase &Inst) const {		SampleProfileLoader::findCalleeFunctionSamples(const CallBase &Inst) const {
const DILocation *DIL = Inst.getDebugLoc();		const DILocation *DIL = Inst.getDebugLoc();
if (!DIL) {		if (!DIL) {
return nullptr;		return nullptr;
}		}

StringRef CalleeName;		StringRef CalleeName;
if (Function *Callee = Inst.getCalledFunction())		if (Function *Callee = Inst.getCalledFunction())
CalleeName = Callee->getName();		CalleeName = FunctionSamples::getCanonicalFnName(*Callee);

		if (ProfileIsCS)
		return ContextTracker->getCalleeContextSamplesFor(Inst, CalleeName);

const FunctionSamples *FS = findFunctionSamples(Inst);		const FunctionSamples *FS = findFunctionSamples(Inst);
if (FS == nullptr)		if (FS == nullptr)
return nullptr;		return nullptr;

return FS->findFunctionSamplesAt(LineLocation(FunctionSamples::getOffset(DIL),		return FS->findFunctionSamplesAt(LineLocation(FunctionSamples::getOffset(DIL),
DIL->getBaseDiscriminator()),		DIL->getBaseDiscriminator()),
CalleeName, Reader->getRemapper());		CalleeName, Reader->getRemapper());
... 53 lines hidden ...
/// \returns the FunctionSamples pointer to the inlined instance.		/// \returns the FunctionSamples pointer to the inlined instance.
const FunctionSamples *		const FunctionSamples *
SampleProfileLoader::findFunctionSamples(const Instruction &Inst) const {		SampleProfileLoader::findFunctionSamples(const Instruction &Inst) const {
const DILocation *DIL = Inst.getDebugLoc();		const DILocation *DIL = Inst.getDebugLoc();
if (!DIL)		if (!DIL)
return Samples;		return Samples;

auto it = DILocation2SampleMap.try_emplace(DIL,nullptr);		auto it = DILocation2SampleMap.try_emplace(DIL,nullptr);
if (it.second)		if (it.second) {
it.first->second = Samples->findFunctionSamples(DIL, Reader->getRemapper());		if (ProfileIsCS)
		it.first->second = ContextTracker->getContextSamplesFor(DIL);
		else
		it.first->second =
		Samples->findFunctionSamples(DIL, Reader->getRemapper());
		}
return it.first->second;		return it.first->second;
}		}

bool SampleProfileLoader::inlineCallInstruction(CallBase &CB) {		bool SampleProfileLoader::inlineCallInstruction(CallBase &CB) {
if (ExternalInlineAdvisor) {		if (ExternalInlineAdvisor) {
auto Advice = ExternalInlineAdvisor->getAdvice(CB);		auto Advice = ExternalInlineAdvisor->getAdvice(CB);
if (!Advice->isInliningRecommended()) {		if (!Advice->isInliningRecommended()) {
Advice->recordUnattemptedInlining();		Advice->recordUnattemptedInlining();
... 38 lines hidden ...	bool SampleProfileLoader::shouldInlineColdCallee(CallBase &CallInst) {

Function *Callee = CallInst.getCalledFunction();		Function *Callee = CallInst.getCalledFunction();
if (Callee == nullptr)		if (Callee == nullptr)
return false;		return false;

InlineCost Cost = getInlineCost(CallInst, getInlineParams(), GetTTI(*Callee),		InlineCost Cost = getInlineCost(CallInst, getInlineParams(), GetTTI(*Callee),
GetAC, GetTLI);		GetAC, GetTLI);

		if (Cost.isNever())
		return false;

		if (Cost.isAlways())
		return true;

return Cost.getCost() <= SampleColdCallSiteThreshold;		return Cost.getCost() <= SampleColdCallSiteThreshold;
}		}

void SampleProfileLoader::emitOptimizationRemarksForInlineCandidates(		void SampleProfileLoader::emitOptimizationRemarksForInlineCandidates(
const SmallVectorImpl<CallBase *> &Candidates, const Function &F,		const SmallVectorImpl<CallBase *> &Candidates, const Function &F,
bool Hot) {		bool Hot) {
for (auto I : Candidates) {		for (auto I : Candidates) {
Function *CalledFunction = I->getCalledFunction();		Function *CalledFunction = I->getCalledFunction();
... 44 lines hidden ...	for (auto &BB : F) {
SmallVector<CallBase *, 10> ColdCandidates;		SmallVector<CallBase *, 10> ColdCandidates;
for (auto &I : BB.getInstList()) {		for (auto &I : BB.getInstList()) {
const FunctionSamples *FS = nullptr;		const FunctionSamples *FS = nullptr;
if (auto *CB = dyn_cast<CallBase>(&I)) {		if (auto *CB = dyn_cast<CallBase>(&I)) {
if (!isa<IntrinsicInst>(I) && (FS = findCalleeFunctionSamples(*CB))) {		if (!isa<IntrinsicInst>(I) && (FS = findCalleeFunctionSamples(*CB))) {
assert((!FunctionSamples::UseMD5 || FS->GUIDToFuncNameMap) &&		assert((!FunctionSamples::UseMD5 || FS->GUIDToFuncNameMap) &&
"GUIDToFuncNameMap has to be populated");		"GUIDToFuncNameMap has to be populated");
AllCandidates.push_back(CB);		AllCandidates.push_back(CB);
if (FS->getEntrySamples() > 0)		if (FS->getEntrySamples() > 0 || ProfileIsCS)
localNotInlinedCallSites.try_emplace(CB, FS);		localNotInlinedCallSites.try_emplace(CB, FS);
if (callsiteIsHot(FS, PSI))		if (callsiteIsHot(FS, PSI))
Hot = true;		Hot = true;
else if (shouldInlineColdCallee(*CB))		else if (shouldInlineColdCallee(*CB))
ColdCandidates.push_back(CB);		ColdCandidates.push_back(CB);
}		}
}		}
}		}
... 41 lines hidden ...	for (CallBase *I : CIS) {
uint64_t C = FS->getEntrySamples();		uint64_t C = FS->getEntrySamples();
auto &DI =		auto &DI =
pgo::promoteIndirectCall(*I, R->getValue(), C, Sum, false, ORE);		pgo::promoteIndirectCall(*I, R->getValue(), C, Sum, false, ORE);
Sum -= C;		Sum -= C;
PromotedInsns.insert(I);		PromotedInsns.insert(I);
// If profile mismatches, we should not attempt to inline DI.		// If profile mismatches, we should not attempt to inline DI.
if ((isa<CallInst>(DI) || isa<InvokeInst>(DI)) &&		if ((isa<CallInst>(DI) || isa<InvokeInst>(DI)) &&
inlineCallInstruction(cast<CallBase>(DI))) {		inlineCallInstruction(cast<CallBase>(DI))) {
		if (ProfileIsCS)
		ContextTracker->markContextSamplesInlined(FS);
localNotInlinedCallSites.erase(I);		localNotInlinedCallSites.erase(I);
LocalChanged = true;		LocalChanged = true;
++NumCSInlined;		++NumCSInlined;
}		}
} else {		} else {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "\nFailed to promote indirect call to "		<< "\nFailed to promote indirect call to "
<< CalleeFunctionName << " because " << Reason << "\n");		<< CalleeFunctionName << " because " << Reason << "\n");
}		}
}		}
} else if (CalledFunction && CalledFunction->getSubprogram() &&		} else if (CalledFunction && CalledFunction->getSubprogram() &&
!CalledFunction->isDeclaration()) {		!CalledFunction->isDeclaration()) {
if (inlineCallInstruction(*I)) {		if (inlineCallInstruction(*I)) {
		if (ProfileIsCS)
		ContextTracker->markContextSamplesInlined(
		localNotInlinedCallSites[I]);
localNotInlinedCallSites.erase(I);		localNotInlinedCallSites.erase(I);
LocalChanged = true;		LocalChanged = true;
++NumCSInlined;		++NumCSInlined;
}		}
} else if (IsThinLTOPreLink) {		} else if (IsThinLTOPreLink) {
findCalleeFunctionSamples(*I)->findInlinedFunctions(		findCalleeFunctionSamples(*I)->findInlinedFunctions(
InlinedGUIDs, F.getParent(), PSI->getOrCompHotCountThreshold());		InlinedGUIDs, F.getParent(), PSI->getOrCompHotCountThreshold());
}		}
... 771 lines hidden ...	bool SampleProfileLoader::doInitialization(Module &M,

if (FAM && !ProfileInlineReplayFile.empty()) {		if (FAM && !ProfileInlineReplayFile.empty()) {
ExternalInlineAdvisor = std::make_unique<ReplayInlineAdvisor>(		ExternalInlineAdvisor = std::make_unique<ReplayInlineAdvisor>(
*FAM, Ctx, ProfileInlineReplayFile);		*FAM, Ctx, ProfileInlineReplayFile);
if (!ExternalInlineAdvisor->areReplayRemarksLoaded())		if (!ExternalInlineAdvisor->areReplayRemarksLoaded())
ExternalInlineAdvisor.reset();		ExternalInlineAdvisor.reset();
}		}

		// Apply tweaks if context-sensitive profile is available.
		if (Reader->profileIsCS()) {
		ProfileIsCS = true;
		FunctionSamples::ProfileIsCS = true;

		// Tracker for profiles under different context
		ContextTracker =
		wmi (Done): Does this mean no profile loading at all, or just no CS profile? The ThinLTO thin-link phase needs to know which functions are hot so that it can import them, so profile information is needed in ThinLTO prelink.
		wenlei (author, Done): Oops, we don't have this change now; I forgot to remove it when upstreaming. And you're right, we need to load the profile for ThinLTO so thin-link importing can be profile guided. Thanks for catching this.
		std::make_unique<SampleContextTracker>(Reader->getProfiles());
		}

return true;		return true;
}		}

ModulePass *llvm::createSampleProfileLoaderPass() {		ModulePass *llvm::createSampleProfileLoaderPass() {
return new SampleProfileLoaderLegacyPass();		return new SampleProfileLoaderLegacyPass();
}		}

ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {		ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {
... 49 lines hidden ...	bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
bool retval = false;		bool retval = false;
for (auto F : buildFunctionOrder(M, CG)) {		for (auto F : buildFunctionOrder(M, CG)) {
assert(!F->isDeclaration());		assert(!F->isDeclaration());
clearFunctionData();		clearFunctionData();
retval |= runOnFunction(*F, AM);		retval |= runOnFunction(*F, AM);
}		}

// Account for cold calls not inlined....		// Account for cold calls not inlined....
		if (!ProfileIsCS)
for (const std::pair<Function *, NotInlinedProfileInfo> &pair :		for (const std::pair<Function *, NotInlinedProfileInfo> &pair :
notInlinedCallInfo)		notInlinedCallInfo)
updateProfileCallee(pair.first, pair.second.entryCount);		updateProfileCallee(pair.first, pair.second.entryCount);

return retval;		return retval;
}		}

bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {		bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {
ACT = &getAnalysis<AssumptionCacheTracker>();		ACT = &getAnalysis<AssumptionCacheTracker>();
TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();		TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();
TLIWP = &getAnalysis<TargetLibraryInfoWrapperPass>();		TLIWP = &getAnalysis<TargetLibraryInfoWrapperPass>();
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);		return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);
}		}

bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {		bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {

DILocation2SampleMap.clear();		DILocation2SampleMap.clear();
// By default the entry count is initialized to -1, which will be treated		// By default the entry count is initialized to -1, which will be treated
// conservatively by getEntryCount as the same as unknown (None). This is		// conservatively by getEntryCount as the same as unknown (None). This is
// to avoid newly added code to be treated as cold. If we have samples		// to avoid newly added code to be treated as cold. If we have samples
// this will be overwritten in emitAnnotations.		// this will be overwritten in emitAnnotations.
uint64_t initialEntryCount = -1;		uint64_t initialEntryCount = -1;

ProfAccForSymsInList = ProfileAccurateForSymsInList && PSL;		ProfAccForSymsInList = ProfileAccurateForSymsInList && PSL;
... 36 lines hidden ...	if (AM) {
auto &FAM =		auto &FAM =
AM->getResult<FunctionAnalysisManagerModuleProxy>(*F.getParent())		AM->getResult<FunctionAnalysisManagerModuleProxy>(*F.getParent())
.getManager();		.getManager();
ORE = &FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);		ORE = &FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
} else {		} else {
OwnedORE = std::make_unique<OptimizationRemarkEmitter>(&F);		OwnedORE = std::make_unique<OptimizationRemarkEmitter>(&F);
ORE = OwnedORE.get();		ORE = OwnedORE.get();
}		}

		if (ProfileIsCS)
		Samples = ContextTracker->getBaseSamplesFor(F);
		else
Samples = Reader->getSamplesFor(F);		Samples = Reader->getSamplesFor(F);

if (Samples && !Samples->empty())		if (Samples && !Samples->empty())
return emitAnnotations(F);		return emitAnnotations(F);
return false;		return false;
}		}

PreservedAnalyses SampleProfileLoaderPass::run(Module &M,		PreservedAnalyses SampleProfileLoaderPass::run(Module &M,
ModuleAnalysisManager &AM) {		ModuleAnalysisManager &AM) {
FunctionAnalysisManager &FAM =		FunctionAnalysisManager &FAM =
... 28 lines hidden ...

llvm/test/Transforms/SampleProfile/Inputs/profile-context-tracker.prof

This file was added.

				[main:3 @ _Z5funcAi:1 @ _Z8funcLeafi]:1467299:11
				0: 6
				1: 6
				3: 287884
				4: 287864 _Z3fibi:315608
				15: 23
				[main:3.1 @ _Z5funcBi:1 @ _Z8funcLeafi]:500853:20
				0: 15
				1: 15
				3: 74946
				4: 74941 _Z3fibi:82359
				10: 23324
				11: 23327 _Z3fibi:25228
				15: 11
				[main]:154:0
				2: 12
				3: 18 _Z5funcAi:11
				3.1: 18 _Z5funcBi:19
				[external:12 @ main]:154:12
				2: 12
				3: 10 _Z5funcAi:7
				3.1: 10 _Z5funcBi:11
				[main:3.1 @ _Z5funcBi]:120:19
				0: 19
				1: 19 _Z8funcLeafi:20
				3: 12
				[externalA:17 @ _Z5funcBi]:120:3
				0: 3
				wmi: Here "main" doesn't show up in the context. Is it a problem of unwinding or debug info?
				wenlei (author): This is an artificial context that simulates the case where funcB is also called from functions external to the current module (the compile-time profile loader's case), to check that we merge contexts involving an external caller correctly. A real profile for this case has no problem capturing the correct context.
				1: 3
				[external:10 @ _Z5funcBi]:120:10
				0: 10
				1: 10
				[main:3 @ _Z5funcAi]:99:11
				0: 10
				1: 10 _Z8funcLeafi:11
				3: 24
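
The header lines in this input file appear to follow the shape `[caller:callsite @ ... @ leaf]:total_samples:head_samples`, extending the existing text sample profile header (`name:total:head`) with a bracketed calling context. The following is a minimal sketch of splitting such a header into frames and counts. It is not the parser added to SampleProfReader.cpp by this patch; `ContextFrame`, `ContextHeader`, `trimSpaces`, and `parseContextHeader` are hypothetical names used only for illustration, and the field meanings are an assumption based on the data above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One frame of a calling context, e.g. "main:3.1" -> {"main", "3.1"}.
// The leaf frame carries no call-site suffix, so CallSite is empty for it.
struct ContextFrame {
  std::string Function;
  std::string CallSite;
};

// Parsed form of a header such as "[main:3 @ _Z5funcAi]:99:11".
struct ContextHeader {
  std::vector<ContextFrame> Frames; // outermost caller first, leaf last
  uint64_t TotalSamples = 0;
  uint64_t HeadSamples = 0;
};

// Hypothetical helper: strips leading and trailing spaces.
static std::string trimSpaces(const std::string &S) {
  size_t B = S.find_first_not_of(' ');
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(' ');
  return S.substr(B, E - B + 1);
}

// Parses "[f:site @ g:site @ leaf]:total:head".
// Returns false if the line does not have that shape.
bool parseContextHeader(const std::string &Line, ContextHeader &Out) {
  size_t Close = Line.find(']');
  if (Line.empty() || Line[0] != '[' || Close == std::string::npos)
    return false;
  // The two counts follow the closing bracket as ":total:head".
  size_t C1 = Close + 1;
  if (C1 >= Line.size() || Line[C1] != ':')
    return false;
  size_t C2 = Line.find(':', C1 + 1);
  if (C2 == std::string::npos)
    return false;
  try {
    Out.TotalSamples = std::stoull(Line.substr(C1 + 1, C2 - C1 - 1));
    Out.HeadSamples = std::stoull(Line.substr(C2 + 1));
  } catch (...) {
    return false; // a count was missing or not a number
  }
  // Split the bracketed context on '@' into individual frames.
  Out.Frames.clear();
  const std::string Ctx = Line.substr(1, Close - 1);
  size_t Pos = 0;
  for (;;) {
    size_t At = Ctx.find('@', Pos);
    size_t Len = (At == std::string::npos) ? At : At - Pos;
    std::string Frame = trimSpaces(Ctx.substr(Pos, Len));
    if (Frame.empty())
      return false;
    size_t Colon = Frame.find(':');
    if (Colon == std::string::npos)
      Out.Frames.push_back({Frame, ""});
    else
      Out.Frames.push_back({Frame.substr(0, Colon), Frame.substr(Colon + 1)});
    if (At == std::string::npos)
      break;
    Pos = At + 1;
  }
  return true;
}
```

Under these assumptions, parsing `[main:3.1 @ _Z5funcBi:1 @ _Z8funcLeafi]:500853:20` would yield three frames (`main` at call site `3.1`, `_Z5funcBi` at call site `1`, and the leaf `_Z8funcLeafi`), with 500853 total samples and 20 head samples.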

llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll

This file was added.

				; REQUIRES: asserts
				; Test for CSSPGO's SampleContextTracker to make sure context profile tree is promoted and merged properly
				; based on inline decision, so post inline counts are accurate.

				wmi: Is there anything which cannot be tested in profile-context-tracker.ll? Debug messages are usually used as a last resort, when something cannot be fully tested by just checking IR.
				wenlei (author): I added this hoping to make it easier to reason about the internal operations and state of the context tracker, and to catch any unintended subtle change in context tracking. But if we look at the end result in IR, the non-debug test should cover it just as well. I can remove this one if you think that's better.
				wmi: If there could be an unintended subtle change that the non-debug test cannot catch, we can keep it. Just make sure the debug messages used in CHECK lines are all necessary for ensuring the result we expect to see.
				wenlei (author): Ok, I'll keep it then. We want to make sure the context tracker is doing exactly what it has to do (and checking inlining alone may not be strong enough).
				; Note that we need new pass manager to enable top-down processing for sample profile loader
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/profile-context-tracker.prof -sample-profile-inline-size -debug-only=sample-context-tracker -o /dev/null 2>&1 | FileCheck %s --check-prefix=INLINE-ALL
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/profile-context-tracker.prof -debug-only=sample-context-tracker -o /dev/null 2>&1 | FileCheck %s --check-prefix=INLINE-HOT


				; Test that we inlined the following in top-down order and promoted the remaining non-inlined context profiles into the base profile
				; main:3 @ _Z5funcAi
				; main:3 @ _Z5funcAi:1 @ _Z8funcLeafi
				; _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-ALL: Getting base profile for function: main
				; INLINE-ALL-NEXT: Merging context profile into base profile: main
				; INLINE-ALL-NEXT: Found context tree root to promote: external:12 @ main
				; INLINE-ALL-NEXT: Context promoted and merged to: main
				; INLINE-ALL-NEXT: Getting callee context for instr: %call = tail call i32 @_Z5funcBi
				; INLINE-ALL-NEXT: Callee context found: main:3.1 @ _Z5funcBi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call1 = tail call i32 @_Z5funcAi
				; INLINE-ALL-NEXT: Callee context found: main:3 @ _Z5funcAi
				; INLINE-ALL-NEXT: Marking context profile as inlined: main:3 @ _Z5funcAi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call = tail call i32 @_Z5funcBi(
				; INLINE-ALL-NEXT: Callee context found: main:3.1 @ _Z5funcBi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z8funcLeafi
				; INLINE-ALL-NEXT: Callee context found: main:3 @ _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-ALL-NEXT: Marking context profile as inlined: main:3 @ _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call = tail call i32 @_Z5funcBi
				; INLINE-ALL-NEXT: Callee context found: main:3.1 @ _Z5funcBi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call.i1 = tail call i32 @_Z3fibi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi
				; INLINE-ALL-NEXT: Getting base profile for function: _Z5funcAi
				; INLINE-ALL-NEXT: Merging context profile into base profile: _Z5funcAi
				; INLINE-ALL-NEXT: Getting base profile for function: _Z5funcBi
				; INLINE-ALL-NEXT: Merging context profile into base profile: _Z5funcBi
				; INLINE-ALL-NEXT: Found context tree root to promote: external:10 @ _Z5funcBi
				; INLINE-ALL-NEXT: Context promoted to: _Z5funcBi
				; INLINE-ALL-NEXT: Found context tree root to promote: main:3.1 @ _Z5funcBi
				; INLINE-ALL-NEXT: Context promoted and merged to: _Z5funcBi
				; INLINE-ALL-NEXT: Context promoted to: _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-ALL-NEXT: Found context tree root to promote: externalA:17 @ _Z5funcBi
				; INLINE-ALL-NEXT: Context promoted and merged to: _Z5funcBi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi
				; INLINE-ALL-NEXT: Callee context found: _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-ALL-NEXT: Marking context profile as inlined: _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi
				; INLINE-ALL-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi
				; INLINE-ALL-NEXT: Getting base profile for function: _Z8funcLeafi
				; INLINE-ALL-NEXT: Merging context profile into base profile: _Z8funcLeafi

				; Test that we inlined the following in top-down order and promoted the remaining non-inlined context profiles into the base profile
				; main:3 @ _Z5funcAi
				; _Z5funcAi:1 @ _Z8funcLeafi
				; _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-HOT: Getting base profile for function: main
				; INLINE-HOT-NEXT: Merging context profile into base profile: main
				; INLINE-HOT-NEXT: Found context tree root to promote: external:12 @ main
				; INLINE-HOT-NEXT: Context promoted and merged to: main
				; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z5funcBi(i32 %x.011), !dbg !58
				; INLINE-HOT-NEXT: Callee context found: main:3.1 @ _Z5funcBi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call1 = tail call i32 @_Z5funcAi(i32 %add), !dbg !63
				; INLINE-HOT-NEXT: Callee context found: main:3 @ _Z5funcAi
				; INLINE-HOT-NEXT: Getting base profile for function: _Z5funcAi
				; INLINE-HOT-NEXT: Merging context profile into base profile: _Z5funcAi
				; INLINE-HOT-NEXT: Found context tree root to promote: main:3 @ _Z5funcAi
				; INLINE-HOT-NEXT: Context promoted to: _Z5funcAi
				; INLINE-HOT-NEXT: Context promoted to: _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !50
				; INLINE-HOT-NEXT: Callee context found: _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Marking context profile as inlined: _Z5funcAi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi(i32 %tmp.i) #2, !dbg !62
				; INLINE-HOT-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi(i32 %tmp1.i) #2, !dbg !69
				; INLINE-HOT-NEXT: Getting base profile for function: _Z5funcBi
				; INLINE-HOT-NEXT: Merging context profile into base profile: _Z5funcBi
				; INLINE-HOT-NEXT: Found context tree root to promote: external:10 @ _Z5funcBi
				; INLINE-HOT-NEXT: Context promoted to: _Z5funcBi
				; INLINE-HOT-NEXT: Found context tree root to promote: main:3.1 @ _Z5funcBi
				; INLINE-HOT-NEXT: Context promoted and merged to: _Z5funcBi
				; INLINE-HOT-NEXT: Context promoted to: _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Found context tree root to promote: externalA:17 @ _Z5funcBi
				; INLINE-HOT-NEXT: Context promoted and merged to: _Z5funcBi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call = tail call i32 @_Z8funcLeafi(i32 %sub), !dbg !50
				; INLINE-HOT-NEXT: Callee context found: _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Marking context profile as inlined: _Z5funcBi:1 @ _Z8funcLeafi
				; INLINE-HOT-NEXT: Getting callee context for instr: %call.i = tail call i32 @_Z3fibi(i32 %tmp.i) #2, !dbg !62
				; INLINE-HOT-NEXT: Getting callee context for instr: %call5.i = tail call i32 @_Z3fibi(i32 %tmp1.i) #2, !dbg !69
				; INLINE-HOT-NEXT: Getting base profile for function: _Z8funcLeafi
				; INLINE-HOT-NEXT: Merging context profile into base profile: _Z8funcLeafi


				@factor = dso_local global i32 3, align 4, !dbg !0

				define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {
				entry:
				br label %for.body, !dbg !25

				for.cond.cleanup: ; preds = %for.body
				ret i32 %add3, !dbg !27

				for.body: ; preds = %for.body, %entry
				%x.011 = phi i32 [ 300000, %entry ], [ %dec, %for.body ]
				%r.010 = phi i32 [ 0, %entry ], [ %add3, %for.body ]
				%call = tail call i32 @_Z5funcBi(i32 %x.011), !dbg !32
				%add = add nuw nsw i32 %x.011, 1, !dbg !31
				%call1 = tail call i32 @_Z5funcAi(i32 %add), !dbg !28
				%add2 = add i32 %call, %r.010, !dbg !34
				%add3 = add i32 %add2, %call1, !dbg !35
				%dec = add nsw i32 %x.011, -1, !dbg !36
				%cmp = icmp eq i32 %x.011, 0, !dbg !38
				br i1 %cmp, label %for.cond.cleanup, label %for.body, !dbg !25
				}

				define dso_local i32 @_Z5funcAi(i32 %x) local_unnamed_addr #1 !dbg !40 {
				entry:
				%add = add nsw i32 %x, 100000, !dbg !44
				%call = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !45
				ret i32 %call, !dbg !46
				}

				define dso_local i32 @_Z8funcLeafi(i32 %x) local_unnamed_addr #1 !dbg !54 {
				entry:
				%cmp = icmp sgt i32 %x, 0, !dbg !57
				br i1 %cmp, label %while.body, label %while.cond2.preheader, !dbg !59

				while.cond2.preheader: ; preds = %entry
				%cmp313 = icmp slt i32 %x, 0, !dbg !60
				br i1 %cmp313, label %while.body4, label %if.end, !dbg !63

				while.body: ; preds = %while.body, %entry
				%x.addr.016 = phi i32 [ %sub, %while.body ], [ %x, %entry ]
				%tmp = load volatile i32, i32* @factor, align 4, !dbg !64
				%call = tail call i32 @_Z3fibi(i32 %tmp), !dbg !67
				%sub = sub nsw i32 %x.addr.016, %call, !dbg !68
				%cmp1 = icmp sgt i32 %sub, 0, !dbg !69
				br i1 %cmp1, label %while.body, label %if.end, !dbg !71

				while.body4: ; preds = %while.body4, %while.cond2.preheader
				%x.addr.114 = phi i32 [ %add, %while.body4 ], [ %x, %while.cond2.preheader ]
				%tmp1 = load volatile i32, i32* @factor, align 4, !dbg !72
				%call5 = tail call i32 @_Z3fibi(i32 %tmp1), !dbg !74
				%add = add nsw i32 %call5, %x.addr.114, !dbg !75
				%cmp3 = icmp slt i32 %add, 0, !dbg !60
				br i1 %cmp3, label %while.body4, label %if.end, !dbg !63

				if.end: ; preds = %while.body4, %while.body, %while.cond2.preheader
				%x.addr.2 = phi i32 [ 0, %while.cond2.preheader ], [ %sub, %while.body ], [ %add, %while.body4 ]
				ret i32 %x.addr.2, !dbg !76
				}

				define dso_local i32 @_Z5funcBi(i32 %x) local_unnamed_addr #0 !dbg !47 {
				entry:
				%sub = add nsw i32 %x, -100000, !dbg !51
				%call = tail call i32 @_Z8funcLeafi(i32 %sub), !dbg !52
				ret i32 %call, !dbg !53
				}

				declare i32 @_Z3fibi(i32)

				attributes #0 = { nofree noinline norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }
				attributes #1 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!14, !15, !16}
				!llvm.ident = !{!17}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "factor", scope: !2, file: !3, line: 21, type: !13, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, producer: "clang version 11.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, retainedTypes: !5, globals: !12, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!3 = !DIFile(filename: "merged.cpp", directory: "/local/autofdo")
				!4 = !{}
				!5 = !{!6, !10, !11}
				!6 = !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 6, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!7 = !DISubroutineType(types: !8)
				!8 = !{!9, !9}
				!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!10 = !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 7, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!11 = !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 22, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!12 = !{!0}
				!13 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !9)
				!14 = !{i32 7, !"Dwarf Version", i32 4}
				!15 = !{i32 2, !"Debug Info Version", i32 3}
				!16 = !{i32 1, !"wchar_size", i32 4}
				!17 = !{!"clang version 11.0.0"}
				!18 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 11, type: !19, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !21)
				!19 = !DISubroutineType(types: !20)
				!20 = !{!9}
				!21 = !{!22, !23}
				!22 = !DILocalVariable(name: "r", scope: !18, file: !3, line: 12, type: !9)
				!23 = !DILocalVariable(name: "x", scope: !24, file: !3, line: 13, type: !9)
				!24 = distinct !DILexicalBlock(scope: !18, file: !3, line: 13, column: 3)
				!25 = !DILocation(line: 13, column: 3, scope: !26)
				!26 = !DILexicalBlockFile(scope: !24, file: !3, discriminator: 2)
				!27 = !DILocation(line: 17, column: 3, scope: !18)
				!28 = !DILocation(line: 14, column: 10, scope: !29)
				!29 = distinct !DILexicalBlock(scope: !30, file: !3, line: 13, column: 37)
				!30 = distinct !DILexicalBlock(scope: !24, file: !3, line: 13, column: 3)
				!31 = !DILocation(line: 14, column: 29, scope: !29)
				!32 = !DILocation(line: 14, column: 21, scope: !33)
				!33 = !DILexicalBlockFile(scope: !29, file: !3, discriminator: 2)
				!34 = !DILocation(line: 14, column: 19, scope: !29)
				!35 = !DILocation(line: 14, column: 7, scope: !29)
				!36 = !DILocation(line: 13, column: 33, scope: !37)
				!37 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 6)
				!38 = !DILocation(line: 13, column: 26, scope: !39)
				!39 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 2)
				!40 = distinct !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 26, type: !7, scopeLine: 26, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!44 = !DILocation(line: 27, column: 22, scope: !40)
				!45 = !DILocation(line: 27, column: 11, scope: !40)
				!46 = !DILocation(line: 29, column: 3, scope: !40)
				!47 = distinct !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 32, type: !7, scopeLine: 32, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!51 = !DILocation(line: 33, column: 22, scope: !47)
				!52 = !DILocation(line: 33, column: 11, scope: !47)
				!53 = !DILocation(line: 35, column: 3, scope: !47)
				!54 = distinct !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 48, type: !7, scopeLine: 48, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!57 = !DILocation(line: 49, column: 9, scope: !58)
				!58 = distinct !DILexicalBlock(scope: !54, file: !3, line: 49, column: 7)
				!59 = !DILocation(line: 49, column: 7, scope: !54)
				!60 = !DILocation(line: 58, column: 14, scope: !61)
				!61 = !DILexicalBlockFile(scope: !62, file: !3, discriminator: 2)
				!62 = distinct !DILexicalBlock(scope: !58, file: !3, line: 56, column: 8)
				!63 = !DILocation(line: 58, column: 5, scope: !61)
				!64 = !DILocation(line: 52, column: 16, scope: !65)
				!65 = distinct !DILexicalBlock(scope: !66, file: !3, line: 51, column: 19)
				!66 = distinct !DILexicalBlock(scope: !58, file: !3, line: 49, column: 14)
				!67 = !DILocation(line: 52, column: 12, scope: !65)
				!68 = !DILocation(line: 52, column: 9, scope: !65)
				!69 = !DILocation(line: 51, column: 14, scope: !70)
				!70 = !DILexicalBlockFile(scope: !66, file: !3, discriminator: 2)
				!71 = !DILocation(line: 51, column: 5, scope: !70)
				!72 = !DILocation(line: 59, column: 16, scope: !73)
				!73 = distinct !DILexicalBlock(scope: !62, file: !3, line: 58, column: 19)
				!74 = !DILocation(line: 59, column: 12, scope: !73)
				!75 = !DILocation(line: 59, column: 9, scope: !73)
				!76 = !DILocation(line: 63, column: 3, scope: !54)
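
The debug output checked in this test traces how a context that was not inlined is promoted: its caller prefix is dropped so that the function becomes the root (`main:3.1 @ _Z5funcBi` becomes `_Z5funcBi`), contexts that collapse onto the same key are merged, and child contexts are carried along (`_Z5funcBi:1 @ _Z8funcLeafi`). Below is a simplified sketch of that idea using plain context strings mapped to head sample counts. It is an illustration only: the patch's SampleContextTracker works on a context tree and merges whole function profiles, and the names `ContextMap`, `splitFrames`, `frameFunction`, and `promoteToBase` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Context string -> head samples (a stand-in for a full function profile).
using ContextMap = std::map<std::string, uint64_t>;

// Splits "a:1 @ b:2 @ c" into {"a:1", "b:2", "c"}.
static std::vector<std::string> splitFrames(const std::string &Ctx) {
  std::vector<std::string> Frames;
  const std::string Sep = " @ ";
  size_t Pos = 0;
  for (;;) {
    size_t At = Ctx.find(Sep, Pos);
    size_t Len = (At == std::string::npos) ? At : At - Pos;
    Frames.push_back(Ctx.substr(Pos, Len));
    if (At == std::string::npos)
      break;
    Pos = At + Sep.size();
  }
  return Frames;
}

// Returns the function name of a frame, i.e. the part before ':'.
static std::string frameFunction(const std::string &Frame) {
  return Frame.substr(0, Frame.find(':'));
}

// Promotes every context that reaches Func through some caller so that Func
// becomes the root of the context. Contexts that end up with the same key
// are merged by summing their counts. Contexts that do not contain Func are
// copied unchanged.
ContextMap promoteToBase(const ContextMap &Profiles, const std::string &Func) {
  ContextMap Result;
  for (const auto &Entry : Profiles) {
    std::vector<std::string> Frames = splitFrames(Entry.first);
    size_t Root = 0; // default: keep the whole context
    for (size_t I = 0; I < Frames.size(); ++I) {
      if (frameFunction(Frames[I]) == Func) {
        Root = I; // drop all callers above the first occurrence of Func
        break;
      }
    }
    std::string Key;
    for (size_t I = Root; I < Frames.size(); ++I) {
      if (!Key.empty())
        Key += " @ ";
      Key += Frames[I];
    }
    Result[Key] += Entry.second;
  }
  return Result;
}
```

Applied to the three `_Z5funcBi` contexts in the input profile (head samples 19, 3, and 10), this sketch merges them into a single `_Z5funcBi` entry with 32 head samples and rewrites `main:3.1 @ _Z5funcBi:1 @ _Z8funcLeafi` to `_Z5funcBi:1 @ _Z8funcLeafi`, which mirrors the "Context promoted" lines in the checks.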

llvm/test/Transforms/SampleProfile/profile-context-tracker.ll

This file was added.

				; Test for CSSPGO's SampleContextTracker to make sure context profile tree is promoted and merged properly
				; based on inline decision, so post inline counts are accurate.

				; Note that we need new pass manager to enable top-down processing for sample profile loader
				; Test that we inlined the following in top-down order and that entry counts accurately reflect the post-inline base profile
				; main:3 @ _Z5funcAi
				; main:3 @ _Z5funcAi:1 @ _Z8funcLeafi
				; _Z5funcBi:1 @ _Z8funcLeafi
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/profile-context-tracker.prof -sample-profile-inline-size -profile-sample-accurate -S | FileCheck %s --check-prefix=INLINE-ALL

				; Test that we inlined the following in top-down order and that entry counts accurately reflect the post-inline base profile
				; main:3 @ _Z5funcAi
				; _Z5funcAi:1 @ _Z8funcLeafi
				; _Z5funcBi:1 @ _Z8funcLeafi
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/profile-context-tracker.prof -profile-sample-accurate -S | FileCheck %s --check-prefix=INLINE-HOT


				@factor = dso_local global i32 3, align 4, !dbg !0

				define dso_local i32 @main() local_unnamed_addr #0 !dbg !18 {
				; INLINE-ALL: @main{{.*}}!prof ![[MAIN_PROF:[0-9]+]]
				; INLINE-HOT: @main{{.*}}!prof ![[MAIN_PROF:[0-9]+]]
				entry:
				br label %for.body, !dbg !25

				for.cond.cleanup: ; preds = %for.body
				ret i32 %add3, !dbg !27

				for.body: ; preds = %for.body, %entry
				%x.011 = phi i32 [ 300000, %entry ], [ %dec, %for.body ]
				%r.010 = phi i32 [ 0, %entry ], [ %add3, %for.body ]
				%call = tail call i32 @_Z5funcBi(i32 %x.011), !dbg !32
				; _Z5funcBi is marked noinline
				; INLINE-ALL: call i32 @_Z5funcBi
				; INLINE-HOT: call i32 @_Z5funcBi
				%add = add nuw nsw i32 %x.011, 1, !dbg !31
				%call1 = tail call i32 @_Z5funcAi(i32 %add), !dbg !28
				; INLINE-ALL-NOT: call i32 @_Z5funcAi
				; INLINE-HOT: call i32 @_Z5funcAi
				%add2 = add i32 %call, %r.010, !dbg !34
				%add3 = add i32 %add2, %call1, !dbg !35
				%dec = add nsw i32 %x.011, -1, !dbg !36
				%cmp = icmp eq i32 %x.011, 0, !dbg !38
				br i1 %cmp, label %for.cond.cleanup, label %for.body, !dbg !25
				}

				define dso_local i32 @_Z5funcAi(i32 %x) local_unnamed_addr #1 !dbg !40 {
				; _Z5funcAi is inlined, so outline remainder should have zero counts
				; INLINE-ALL: @_Z5funcAi{{.*}}!prof ![[FUNCA_PROF:[0-9]+]]
				; INLINE-HOT: @_Z5funcAi{{.*}}!prof ![[FUNCA_PROF:[0-9]+]]
				entry:
				%add = add nsw i32 %x, 100000, !dbg !44
				; _Z8funcLeafi is already inlined on main->_Z5funcAi->_Z8funcLeafi,
				; so it should not be inlined on _Z5funcAi->_Z8funcLeafi based on updated
				; (merged and promoted) context profile
				; INLINE-ALL: call i32 @_Z8funcLeafi
				; INLINE-HOT-NOT: call i32 @_Z8funcLeafi
				%call = tail call i32 @_Z8funcLeafi(i32 %add), !dbg !45
				ret i32 %call, !dbg !46
				}

				define dso_local i32 @_Z8funcLeafi(i32 %x) local_unnamed_addr #1 !dbg !54 {
				; main->_Z5funcAi->_Z8funcLeafi is inlined, and _Z5funcBi->_Z8funcLeafi is also
				; inlined, so outline remainder should have empty profile
				; INLINE-ALL: @_Z8funcLeafi{{.*}}!prof ![[LEAF_PROF:[0-9]+]]
				; INLINE-HOT: @_Z8funcLeafi{{.*}}!prof ![[LEAF_PROF:[0-9]+]]
				entry:
				%cmp = icmp sgt i32 %x, 0, !dbg !57
				br i1 %cmp, label %while.body, label %while.cond2.preheader, !dbg !59

				while.cond2.preheader: ; preds = %entry
				%cmp313 = icmp slt i32 %x, 0, !dbg !60
				br i1 %cmp313, label %while.body4, label %if.end, !dbg !63

				while.body: ; preds = %while.body, %entry
				%x.addr.016 = phi i32 [ %sub, %while.body ], [ %x, %entry ]
				%tmp = load volatile i32, i32* @factor, align 4, !dbg !64
				%call = tail call i32 @_Z3fibi(i32 %tmp), !dbg !67
				%sub = sub nsw i32 %x.addr.016, %call, !dbg !68
				%cmp1 = icmp sgt i32 %sub, 0, !dbg !69
				br i1 %cmp1, label %while.body, label %if.end, !dbg !71

				while.body4: ; preds = %while.body4, %while.cond2.preheader
				%x.addr.114 = phi i32 [ %add, %while.body4 ], [ %x, %while.cond2.preheader ]
				%tmp1 = load volatile i32, i32* @factor, align 4, !dbg !72
				%call5 = tail call i32 @_Z3fibi(i32 %tmp1), !dbg !74
				%add = add nsw i32 %call5, %x.addr.114, !dbg !75
				%cmp3 = icmp slt i32 %add, 0, !dbg !60
				br i1 %cmp3, label %while.body4, label %if.end, !dbg !63

				if.end: ; preds = %while.body4, %while.body, %while.cond2.preheader
				%x.addr.2 = phi i32 [ 0, %while.cond2.preheader ], [ %sub, %while.body ], [ %add, %while.body4 ]
				ret i32 %x.addr.2, !dbg !76
				}

				define dso_local i32 @_Z5funcBi(i32 %x) local_unnamed_addr #0 !dbg !47 {
				; _Z5funcBi is marked noinline, so outline remainder has promoted context profile
				; INLINE-ALL: @_Z5funcBi{{.*}}!prof ![[FUNCB_PROF:[0-9]+]]
				; INLINE-HOT: @_Z5funcBi{{.*}}!prof ![[FUNCB_PROF:[0-9]+]]
				entry:
				%sub = add nsw i32 %x, -100000, !dbg !51
				%call = tail call i32 @_Z8funcLeafi(i32 %sub), !dbg !52
				; _Z5funcBi is not inlined into main, so main->_Z5funcBi->_Z8funcLeafi
				; should be inlined based on the promoted context profile
				; INLINE-ALL-NOT: call i32 @_Z8funcLeafi
				; INLINE-HOT-NOT: call i32 @_Z8funcLeafi
				ret i32 %call, !dbg !53
				}

				; INLINE-ALL-DAG: [[MAIN_PROF]] = !{!"function_entry_count", i64 13}
				; INLINE-ALL-DAG: [[FUNCA_PROF]] = !{!"function_entry_count", i64 0}
				; INLINE-ALL-DAG-SAME: [[LEAF_PROF]] = !{!"function_entry_count", i64 0}
				; INLINE-ALL-DAG: [[FUNCB_PROF]] = !{!"function_entry_count", i64 33}

				; INLINE-HOT-DAG: [[MAIN_PROF]] = !{!"function_entry_count", i64 13}
				; INLINE-HOT-DAG: [[FUNCA_PROF]] = !{!"function_entry_count", i64 12}
				; INLINE-HOT-DAG-SAME: [[LEAF_PROF]] = !{!"function_entry_count", i64 0}
				; INLINE-HOT-DAG: [[FUNCB_PROF]] = !{!"function_entry_count", i64 33}

				declare i32 @_Z3fibi(i32)

				attributes #0 = { nofree noinline norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }
				attributes #1 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "use-sample-profile" }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!14, !15, !16}
				!llvm.ident = !{!17}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "factor", scope: !2, file: !3, line: 21, type: !13, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, producer: "clang version 11.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, retainedTypes: !5, globals: !12, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
				!3 = !DIFile(filename: "merged.cpp", directory: "/local/autofdo")
				!4 = !{}
				!5 = !{!6, !10, !11}
				!6 = !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 6, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!7 = !DISubroutineType(types: !8)
				!8 = !{!9, !9}
				!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!10 = !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 7, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!11 = !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 22, type: !7, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !4)
				!12 = !{!0}
				!13 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !9)
				!14 = !{i32 7, !"Dwarf Version", i32 4}
				!15 = !{i32 2, !"Debug Info Version", i32 3}
				!16 = !{i32 1, !"wchar_size", i32 4}
				!17 = !{!"clang version 11.0.0"}
				!18 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 11, type: !19, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !21)
				!19 = !DISubroutineType(types: !20)
				!20 = !{!9}
				!21 = !{!22, !23}
				!22 = !DILocalVariable(name: "r", scope: !18, file: !3, line: 12, type: !9)
				!23 = !DILocalVariable(name: "x", scope: !24, file: !3, line: 13, type: !9)
				!24 = distinct !DILexicalBlock(scope: !18, file: !3, line: 13, column: 3)
				!25 = !DILocation(line: 13, column: 3, scope: !26)
				!26 = !DILexicalBlockFile(scope: !24, file: !3, discriminator: 2)
				!27 = !DILocation(line: 17, column: 3, scope: !18)
				!28 = !DILocation(line: 14, column: 10, scope: !29)
				!29 = distinct !DILexicalBlock(scope: !30, file: !3, line: 13, column: 37)
				!30 = distinct !DILexicalBlock(scope: !24, file: !3, line: 13, column: 3)
				!31 = !DILocation(line: 14, column: 29, scope: !29)
				!32 = !DILocation(line: 14, column: 21, scope: !33)
				!33 = !DILexicalBlockFile(scope: !29, file: !3, discriminator: 2)
				!34 = !DILocation(line: 14, column: 19, scope: !29)
				!35 = !DILocation(line: 14, column: 7, scope: !29)
				!36 = !DILocation(line: 13, column: 33, scope: !37)
				!37 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 6)
				!38 = !DILocation(line: 13, column: 26, scope: !39)
				!39 = !DILexicalBlockFile(scope: !30, file: !3, discriminator: 2)
				!40 = distinct !DISubprogram(name: "funcA", linkageName: "_Z5funcAi", scope: !3, file: !3, line: 26, type: !7, scopeLine: 26, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!44 = !DILocation(line: 27, column: 22, scope: !40)
				!45 = !DILocation(line: 27, column: 11, scope: !40)
				!46 = !DILocation(line: 29, column: 3, scope: !40)
				!47 = distinct !DISubprogram(name: "funcB", linkageName: "_Z5funcBi", scope: !3, file: !3, line: 32, type: !7, scopeLine: 32, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!51 = !DILocation(line: 33, column: 22, scope: !47)
				!52 = !DILocation(line: 33, column: 11, scope: !47)
				!53 = !DILocation(line: 35, column: 3, scope: !47)
				!54 = distinct !DISubprogram(name: "funcLeaf", linkageName: "_Z8funcLeafi", scope: !3, file: !3, line: 48, type: !7, scopeLine: 48, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2)
				!57 = !DILocation(line: 49, column: 9, scope: !58)
				!58 = distinct !DILexicalBlock(scope: !54, file: !3, line: 49, column: 7)
				!59 = !DILocation(line: 49, column: 7, scope: !54)
				!60 = !DILocation(line: 58, column: 14, scope: !61)
				!61 = !DILexicalBlockFile(scope: !62, file: !3, discriminator: 2)
				!62 = distinct !DILexicalBlock(scope: !58, file: !3, line: 56, column: 8)
				!63 = !DILocation(line: 58, column: 5, scope: !61)
				!64 = !DILocation(line: 52, column: 16, scope: !65)
				!65 = distinct !DILexicalBlock(scope: !66, file: !3, line: 51, column: 19)
				!66 = distinct !DILexicalBlock(scope: !58, file: !3, line: 49, column: 14)
				!67 = !DILocation(line: 52, column: 12, scope: !65)
				!68 = !DILocation(line: 52, column: 9, scope: !65)
				!69 = !DILocation(line: 51, column: 14, scope: !70)
				!70 = !DILexicalBlockFile(scope: !66, file: !3, discriminator: 2)
				!71 = !DILocation(line: 51, column: 5, scope: !70)
				!72 = !DILocation(line: 59, column: 16, scope: !73)
				!73 = distinct !DILexicalBlock(scope: !62, file: !3, line: 58, column: 19)
				!74 = !DILocation(line: 59, column: 12, scope: !73)
				!75 = !DILocation(line: 59, column: 9, scope: !73)
				!76 = !DILocation(line: 63, column: 3, scope: !54)
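
The `function_entry_count` values checked in this test can be cross-checked against the head samples in profile-context-tracker.prof. The sketch below assumes, based only on the numbers in this test, that the annotated entry count is the sum of head samples of all contexts merged into a function's base profile plus one, and that a function whose contexts were all inlined away gets zero under `-profile-sample-accurate`. The helper `expectedEntryCount` is hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumption inferred from the test data: entry count = merged head samples
// + 1. An empty list models a function left with no base profile (all of its
// contexts were inlined), which is annotated with 0 in this test.
uint64_t expectedEntryCount(const std::vector<uint64_t> &MergedHeadSamples) {
  if (MergedHeadSamples.empty())
    return 0;
  return std::accumulate(MergedHeadSamples.begin(), MergedHeadSamples.end(),
                         uint64_t{0}) +
         1;
}
```

With the profile above: `main` merges `[main]` (0) and `[external:12 @ main]` (12), giving 13. `_Z5funcBi` merges head samples 19, 3, and 10, giving 33. In the INLINE-HOT run `_Z5funcAi` keeps its promoted context with 11 head samples, giving 12, while in the INLINE-ALL run it is inlined into `main` and gets 0. `_Z8funcLeafi` is inlined in every remaining context in both runs and gets 0.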

Unit Tests: Failed

Diff 308980

llvm/include/llvm/ProfileData/SampleProf.h

llvm/include/llvm/ProfileData/SampleProfReader.h

llvm/include/llvm/Transforms/IPO/SampleContextTracker.h

llvm/lib/ProfileData/SampleProf.cpp

llvm/lib/ProfileData/SampleProfReader.cpp

llvm/lib/Transforms/IPO/CMakeLists.txt

llvm/lib/Transforms/IPO/SampleContextTracker.cpp

llvm/lib/Transforms/IPO/SampleProfile.cpp

llvm/test/Transforms/SampleProfile/Inputs/profile-context-tracker.prof

llvm/test/Transforms/SampleProfile/profile-context-tracker-debug.ll

llvm/test/Transforms/SampleProfile/profile-context-tracker.ll
