This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Missing frame inference.
ClosedPublic

Authored by hoy on Dec 5 2022, 1:56 PM.

Details

Summary

This change introduces a missing frame inferrer that aims to recover missing frames. It currently only handles frames missing due to compiler tail call elimination (TCE), but it could be extended to support other scenarios such as frame pointer omission. When a tail-called function is sampled, the caller frame is missing from the call chain because it has been reused for the callee frame. While TCE benefits both performance and stack usage, the workaround in this change tries to recover as many of the missing frames as possible.

The idea behind this work is to build a dynamic call graph consisting only of tail call edges constructed from LBR samples, and to DFS-search the graph for a unique path between a given source frame and target frame. The unique path is then used to fill in the missing frames between the source and the target. Note that only a unique path counts: multiple paths are treated as unreachable, since we don't want to overcount any particular possible path.
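For illustration, a minimal sketch of such a unique-path search (hypothetical names and simplified data structures; not the actual llvm-profgen implementation):

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Dynamic call graph built from LBR samples: each edge is an observed
// tail call, keyed by the source address with a set of target addresses.
using TailCallGraph =
    std::unordered_map<uint64_t, std::unordered_set<uint64_t>>;

// DFS from `From` towards `To`, counting distinct paths but bailing out as
// soon as more than one is found. `Path` accumulates the frames of the
// unique path (ending with `To`) when exactly one exists.
static int countPaths(const TailCallGraph &G, uint64_t From, uint64_t To,
                      std::unordered_set<uint64_t> &Visiting,
                      std::vector<uint64_t> &Path) {
  if (From == To)
    return 1;
  if (!Visiting.insert(From).second)
    return 0; // Cycle on the current search stack; treat as no path.
  int NumPaths = 0;
  if (auto It = G.find(From); It != G.end()) {
    for (uint64_t Next : It->second) {
      Path.push_back(Next);
      int Sub = countPaths(G, Next, To, Visiting, Path);
      NumPaths += Sub;
      if (NumPaths > 1)
        break; // Ambiguous; the caller discards Path anyway.
      if (Sub == 0)
        Path.pop_back(); // Dead end; undo.
    }
  }
  Visiting.erase(From);
  return NumPaths;
}

// Returns true and fills `Out` with the inferred frames only when exactly
// one tail-call path connects Caller to Callee.
static bool inferMissingFrames(const TailCallGraph &G, uint64_t Caller,
                               uint64_t Callee, std::vector<uint64_t> &Out) {
  std::unordered_set<uint64_t> Visiting;
  std::vector<uint64_t> Path;
  if (countPaths(G, Caller, Callee, Visiting, Path) != 1)
    return false;
  if (!Path.empty())
    Path.pop_back(); // Drop the callee itself; keep only the missing frames.
  Out = std::move(Path);
  return true;
}
```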

A switch --infer-missing-frame is introduced and defaults to on.
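Such a switch is typically declared through LLVM's command-line library roughly as below (an illustrative sketch; the exact description string and placement in the patch may differ):

```cpp
#include "llvm/Support/CommandLine.h"

// Hypothetical declaration of the new switch; it defaults to on.
static llvm::cl::opt<bool> InferMissingFrames(
    "infer-missing-frame", llvm::cl::init(true),
    llvm::cl::desc(
        "Infer missing call frames removed by tail call elimination."));
```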

Some testing results:

  • 0.4% perf win according to three internal benchmarks.
  • About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark.
  • 10% more profile generation time.

Diff Detail

Event Timeline

hoy created this revision. Dec 5 2022, 1:56 PM
Herald added a project: Restricted Project. Dec 5 2022, 1:56 PM
Herald added subscribers: modimo, wenlei.
hoy requested review of this revision. Dec 5 2022, 1:56 PM
Herald added a project: Restricted Project. Dec 5 2022, 1:56 PM
hoy edited the summary of this revision. Dec 5 2022, 1:57 PM
hoy added reviewers: wenlei, wlei.
wenlei added inline comments. Dec 7 2022, 12:56 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
17–18

I didn't see new instances of these containers. Also didn't see these includes being removed from other headers.

781

why is this only done for the probe case?

819

The way this is done not only turns unknown indirect call targets into known ones, but also removes/filters some unsampled indirect calls. I'm wondering whether doing such filtering on direct calls would also help narrow down the search space?

828

either a known a zero

typo? I also don't understand this message.

848

Maybe just my pet peeve, but I always feel a plain, simple description is much less confusing than those ambiguous terms. Through static analysis, indirect call targets are unknown, but with samples, these targets become known. Can we just simply say known/unknown, and avoid materialized/unmaterialized?

871

NDEBUG

llvm/tools/llvm-profgen/ProfiledBinary.cpp
552

Can we assert that Target is always available from evaluateBranch for direct branch?

565

Is MCDesc.isBarrier() there to filter for *unconditional* branches? Make that explicit in the comments below.

llvm/tools/llvm-profgen/ProfiledBinary.h
251–255
  1. Did you choose unordered_multimap<uint64_t, uint64_t> over unordered_map<uint64_t, unordered_set<uint64_t>> to optimize for mostly one target per call? A multimap has multiple copies of the keys too, but a map doesn't. The latter feels a bit cleaner (see the sketch after this list).
  2. The name is a bit confusing because this contains both call sources and targets.
  3. I haven't thought this through carefully, but these data structures seem quite ad-hoc in that they somewhat overlap with both CallAddressSet above and the data in the new tracker. I know that each serves a slightly different purpose, but it just feels a bit unorganized and bolted on the way it is now.
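For reference, the two layouts being compared in point 1 look roughly like this (illustrative declarations only; names are placeholders):

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Option A: one entry per observed (source, target) edge. The key is
// duplicated when a call site has several targets, but the common
// single-target case avoids allocating an inner container.
std::unordered_multimap<uint64_t, uint64_t> TailCallEdgesA;

// Option B: one entry per call site with a set of targets. No duplicate
// keys or edges, at the cost of an inner set even for single targets.
std::unordered_map<uint64_t, std::unordered_set<uint64_t>> TailCallEdgesB;
```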
480

nit: type& get.. -> type &get..

llvm/tools/llvm-profgen/TailCallTracker.cpp
42–43 ↗(On Diff #480232)

I found the names a bit confusing, how about making these tweaks:
TailCallTargets -> TailCallToTargetMap
TailCallReachableFuncs -> TailCallTargetFuncs
FuncTailCalls -> FuncToTailCallMap

50 ↗(On Diff #480232)

Why use unordered_multimap to allow duplication in the first place? With unordered_map<key_t, unordered_set<value_t>> for both TailCallTargets and FuncTailCalls, there won't be duplicates.

61–63 ↗(On Diff #480232)

FuncTailCalls contains all functions and their associated tail calls, and it's not a per-function map. Why does size 1 mean a single tail call function?

86 ↗(On Diff #480232)

The no-debug macro used should be NDEBUG

115 ↗(On Diff #480232)

The return value can be simplified to use a boolean instead of an int, because we only care whether there's a unique path?

We may need to check for cycles at the caller side, so we avoid the need to return 0.

115 ↗(On Diff #480232)

Do we want to limit the level of recursive search? Practically, it's probably very rare to have 3+ back to back tail calls, and three consecutive missing frames due to that. I'm not sure how much this can help though.

If we open this up to general missing frame inference beyond tail call, such pruning might be useful.

154 ↗(On Diff #480232)

Reachable via multiple paths is considered unreachable

This is just a non-unique-path situation. Why do we need to introduce a notion of "unreachable" when it is actually reachable? This is probably a pure terminology issue, but I don't think I understand the rationale behind "unreachable".

186 ↗(On Diff #480232)

It doesn't seem like we need to return the number of paths here. We bail early when NumPaths > 1, so the number isn't accurate anyway. At most, a boolean indicating whether we found a unique path should be enough.

229 ↗(On Diff #480232)

typo: broken

232 ↗(On Diff #480232)

NDEBUG

llvm/tools/llvm-profgen/TailCallTracker.h
23 ↗(On Diff #480232)

The core improvement is to find a unique path between a caller frame and a callee frame, so we can infer missing frames. You identified cases of missing frames caused by tail calls, so you centered the implementation around tail calls. But recovering missing frames for FPO is actually not that different either.

So what I'm thinking is to have a way to not limit this to just handling tail calls: instead of TailCallContextTracker, make it a generic MissingFrameInferrer. The trade-off is that the search is going to be more expensive and more likely to yield multiple paths. But implementation-wise it should be fairly similar and could be turned on/off via a switch.

We don't have to cover everything now, but I think MissingFrameInferrer could be a better name than TailCallContextTracker, even if we decide to limit it to tail calls for now. This really is an inferrer rather than a tracker. WDYT?

Correspondingly, computeUniqueTailCallPath would become computeUniqueCallPath even if it only deals with tail calls now.

83–85 ↗(On Diff #480232)
  1. Add comments for the two data structures - what the vector and the uint64_t value are.
  2. I'd avoid introducing terms when a more intuitive and plain expression is available. How about just using CallerCalleePair instead of FrameTransition? This also goes better with PairHash.
  3. For NonUniquePaths, do we really care how many paths there are? Would an unordered_set be good enough?
  4. Can we merge UniquePaths and NonUniquePaths, so we don't need two separate lookups? We could use an empty vector to represent a non-unique path (see the sketch after this list).
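A possible consolidated layout for point 4 could look like this (a rough sketch with placeholder names; PairHash here is just one simple way to hash the pair):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Key: (caller address, callee address) of a sampled transition that may
// have frames missing in between.
using CallerCalleePair = std::pair<uint64_t, uint64_t>;

struct PairHash {
  std::size_t operator()(const CallerCalleePair &P) const {
    return std::hash<uint64_t>()(P.first) ^
           (std::hash<uint64_t>()(P.second) << 1);
  }
};

// Single cache for both outcomes: a non-empty vector holds the unique
// sequence of inferred frame addresses; an empty vector records that the
// pair was already found to have no unique path.
std::unordered_map<CallerCalleePair, std::vector<uint64_t>, PairHash>
    InferredPaths;
```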
89 ↗(On Diff #480232)

NDEBUG

92 ↗(On Diff #480232)

Would be nice to unify terms; you used unique vs. non-unique elsewhere.

hoy marked 5 inline comments as done. Dec 7 2022, 11:00 AM
hoy added inline comments.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
17–18

Good catch, removed.

781

Because the current tracker implementation only applies to the pseudo probe case, where AddrBasedCtxKey is used. The line-number-based case uses StringBasedCtxKey, so there's no need to do refineTailCallTargets for the line number case for now.

819

Good point. I think it should be helpful. Will give it a try.

828

should be "a known or a zero"

848

Yeah, known/unknown sounds good.

871

fixed.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
552

Done.

565

It's to filter out conditional branches. A tail call is an unconditional branch. Comment updated.
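In other words, the check keeps control transfers that cannot fall through. A hedged sketch of how such a classification might look at the MC layer (the actual checks in the patch may differ):

```cpp
#include "llvm/MC/MCInstrDesc.h"

// A candidate tail call: an unconditional branch (a barrier, so no
// fall-through) that is neither a call nor a return, and whose resolved
// target is the entry of another function.
static bool looksLikeTailCall(const llvm::MCInstrDesc &MCDesc,
                              bool TargetIsFuncEntry) {
  return MCDesc.isBranch() && MCDesc.isBarrier() && !MCDesc.isCall() &&
         !MCDesc.isReturn() && TargetIsFuncEntry;
}
```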

llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

Did you choose unordered_multimap<uint64_t, uint64_t> over unordered_map<uint64_t, unordered_set<uint64_t>> to optimize for mostly one target per call? A multimap has multiple copies of the keys too, but a map doesn't. The latter feels a bit cleaner.

Yeah, that's the point. Single targets account for 95%, and with a multimap we don't have to construct a container for each of those targets. Duplications are handled in refineTailCallTargets.

The name is a bit confusing because this contains both call sources and targets.

How about CallPairs or just CallEdges?

I haven't thought this through carefully, but these data structures seem quite ad-hoc in that they somewhat overlap with both CallAddressSet above and the data in the new tracker. I know that each serves a slightly different purpose, but it just feels a bit unorganized and bolted on the way it is now.

Yeah, they are separated mostly because of efficiency. An unordered_map<uint64_t, unordered_set<uint64_t>> would unify them, but at the expense of efficiency.

llvm/tools/llvm-profgen/TailCallTracker.cpp
42–43 ↗(On Diff #480232)

Sounds good.

50 ↗(On Diff #480232)

It's about efficiency. 95% of tail calls have a single target, so using an unordered_set for them seems unnecessary. WDYT?

61–63 ↗(On Diff #480232)

Size 1 means there is only one tail call in the program. That in turn means there's only one function with a single tail call.

115 ↗(On Diff #480232)

The return value can be simplified to use a boolean instead of an int, because we only care whether there's a unique path?

Hmm, it currently returns three kinds of values: 0, 1 and multiple. I'm not seeing how a boolean could make the logic simpler.

Do we want to limit the level of recursive search? Practically, it's probably very rare to have 3+ back to back tail calls, and three consecutive missing frames due to that. I'm not sure how much this can help though.

Having a limit on the search level makes sense, but I'm not sure what a good value there is. It's not uncommon to have a tail call chain with more than 3 frames, for example a chain consisting of one-liner getter helpers. I can get some numbers and see.

154 ↗(On Diff #480232)

The comment is confusing. "Unreachable" has its literal meaning in the algorithm; being reachable via multiple paths has a different meaning. I just simplified the comment to "Stop analyzing the remaining if we are already seeing more than one reachable paths."

186 ↗(On Diff #480232)

Not sure a boolean is enough. E.g., it wouldn't differentiate the following two cases when inferring missing frames between A -> F:

  • Case 1 with actual paths:
A -> B -> C -> F

A -> B -> D -> F

A -> E -> F

The integer return value can tell us to stop analyzing the path A -> E -> F because there are already multiple available paths. Returning 0 or false would not tell this apart from the case below.

  • Case 2 with actual paths:
A -> B -> C 

A -> E -> F
llvm/tools/llvm-profgen/TailCallTracker.h
23 ↗(On Diff #480232)

Yeah, renaming TailCallContextTracker as MissingFrameInferrer sounds good.

Correspondingly, computeUniqueTailCallPath would become computeUniqueCallPath even if it only deals with tail calls now.

I still like computeUniqueTailCallPath since what it really does is compute a unique tail call path. Moving forward, we could extend inferMissingFrames to make an extra call to another function, say computeUniqueFPOPath or something?

83–85 ↗(On Diff #480232)

CallerCalleePair sounds good.

For NonUniquePaths, do we really care how many paths there are? Would an unordered_set be good enough?

We don't really care. unordered_set should be good enough.

Can we merge UniquePaths and NonUniquePaths, so we don't need two separate lookups? We could use an empty vector to represent a non-unique path?

Constructing an empty vector also takes extra memory and time, but is probably better than an extra hash lookup?

hoy updated this revision to Diff 480985. Dec 7 2022, 11:00 AM
hoy edited the summary of this revision.

Addressing feedback.

Thanks for the changes and addressing the feedback. Can you also rename all the instances of "tail call tracker" accordingly, in the change description as well as the title? And to be consistent, use simple terms instead of "materialized" in the change description too?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
828

maybe "a known or a zero (unknown)"?

llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

There's also duplication between ProfiledBinary members and MissingFrameInferrer members.

llvm/tools/llvm-profgen/TailCallTracker.cpp
50 ↗(On Diff #480232)

How much does it help performance? The dedup also has a cost. I think the use of multimap has a readability cost in quite a few places (makes things more complicated); one example is the sliding window you used in order to count tail calls. If the perf cost isn't too much, I'd favor simplicity and readability.

60 ↗(On Diff #480232)

typo: mutiple -> multiple

61–63 ↗(On Diff #480232)

Ok, I think the confusion comes from the multimap, where one element means not only one function, but also one tail call in the program. And this special case is needed for the sliding window below to work.

How much does using a multimap help performance? I think readability will improve if you use a map of sets instead.

115 ↗(On Diff #480232)

Ok, I thought that returning 0 was only for cycles. But now I see that 0 is also possible for unreachable cases. Yeah, we need a tri-state here. Maybe use a proper tri-state return value? The thing is, the returned value also isn't really NumPaths because of the early bail on NumPaths > 1.

I'm fine with what it is now too.
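If a named tri-state were preferred, it could be as simple as the following (illustrative only; not part of the patch):

```cpp
// Hypothetical tri-state result for the path search between two frames.
enum class PathSearchResult {
  Unreachable, // No sampled tail-call path connects the two frames.
  Unique,      // Exactly one path; safe to fill in the missing frames.
  Ambiguous    // More than one path; do not infer anything.
};
```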

154 ↗(On Diff #480232)

The updated comment looks good to me. And I think I understand what you're doing here, but I'm curious - what is the literal meaning of unreachable in the algorithm you mentioned? I understand that we want to stop processing upon multiple paths, but stopping processing is different from unreachable, even though unreachable will lead to stopped processing for sure.

186 ↗(On Diff #480232)

I see the need for a tri-state, but this one is a bit different from the return value of computeUniqueTailCallPath. Here, what I really meant is that you didn't use the return value of inferMissingFrames anywhere, or did I miss anything? I think that having a boolean to indicate whether we successfully inferred something could be useful even if we aren't using it now.

llvm/tools/llvm-profgen/TailCallTracker.h
23 ↗(On Diff #480232)

If we extend it in the future, I think it won't be a separate function, because the tweak is going to be about what is considered a viable edge -- currently it's tail calls only, but later it can be more. I don't think we need a completely separate DFS to accommodate other edges in the future. That's why I think having one general DFS function is reasonable, and the name can reflect that. But this isn't critical.

83–85 ↗(On Diff #480232)

Constructing an empty vector also takes extra memory and time, but is probably better than an extra hash lookup?

I'm not sure if this is perf critical. But I was thinking more about better structure, consolidation and readability.

hoy marked 2 inline comments as done. Dec 7 2022, 1:18 PM
hoy added inline comments.
llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

The MissingFrameInferrer members are mostly BinaryFunction-based while the ProfiledBinary members are address-based. The separation is to reduce disassembling time by avoiding the address-to-BinaryFunction lookup there.

llvm/tools/llvm-profgen/TailCallTracker.cpp
50 ↗(On Diff #480232)

It's hard to measure without implementing the non-multimap version. But I can imagine that besides the extra constructor call and the extra memory that the container takes, there's also an extra cost in accessing a single element of the container.

one example is the sliding window you used in order to count tail calls. If the perf cost isn't too much,

This is debug code. I can rewrite it by converting the multimap to non-multimap, if it helps readability.

115 ↗(On Diff #480232)

Having a limit on the search level makes sense, but I'm not sure what a good value there is. It's not uncommon to have a tail call chain with more than 3 frames, for example a chain consisting of one-liner getter helpers. I can get some numbers and see.

An update on this: the longest tail call path for a medium-sized benchmark is 16 frames, and 5% of paths are longer than 3 frames.

154 ↗(On Diff #480232)

By the literal meaning of unreachable, I mean there isn't a path available on the dynamic call graph, probably because the call edges were not LBR-sampled.

186 ↗(On Diff #480232)

Oh yeah, the return value of inferMissingFrames is not used anywhere. Changed it to boolean type.

hoy updated this revision to Diff 481035. Dec 7 2022, 1:18 PM

Addressing feedback.

wenlei added a comment. Dec 7 2022, 3:31 PM

Thanks for the changes and addressing the feedback. Can you also rename all the instances of "tail call tracker" accordingly, in the change description as well as the title? And to be consistent, use simple terms instead of "materialized" in the change description too?

I will take another look later, but I think the title and description etc. still need updating per the comment above.

wlei added inline comments. Dec 7 2022, 3:37 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
471–472

Nit: findFuncRange is also called inside setIsFuncEntry; perhaps we could hoist and reuse that, which should save some running time.

hoy retitled this revision from [CSSPGO][llvm-profgen] A tail call tracker to infer missing tail call frames. to [CSSPGO][llvm-profgen] Missing frame inference.. Dec 7 2022, 4:22 PM
hoy edited the summary of this revision.
hoy added inline comments.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
471–472

Good point, fixed.

hoy updated this revision to Diff 481103. Dec 7 2022, 4:24 PM

Bounding the DFS search with a specific size (defaulting to INT32_MAX). Will run more experiments to settle on a good limit.

hoy updated this revision to Diff 481107. Dec 7 2022, 4:36 PM

Updating D139367: [CSSPGO][llvm-profgen] Missing frame inference.

wenlei added inline comments. Dec 15 2022, 1:08 PM
llvm/tools/llvm-profgen/MissingFrameInferrer.cpp
78

As mentioned earlier, this can be very straightforward code if we're not using a multimap.

Also, the dedup above won't be needed.

Correct me if I'm wrong, but I doubt setup/initialization time is dominant compared to the actual query/infer time.

llvm/tools/llvm-profgen/MissingFrameInferrer.h
2–3

nit: fix the comment to be on a single line.

97

nit: ReachableViaMultiPaths

llvm/tools/llvm-profgen/ProfileGenerator.cpp
1228

Update "tail call tracker"

llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

Yeah, that's the point. Single targets account for 95%, and with a multimap we don't have to construct a container for each of those targets. Duplications are handled in refineTailCallTargets.

You mentioned efficiency a few times here. I agree that unifying may have a (slight) cost. But I feel that the actual cost of inferring missing frames is in the "query" or "infer" time, not the "setup" time here.

These things are hard to estimate exactly without measuring, but I would guess that changing the data structure to favor readability, simplicity and structure would not impact performance visibly as long as it doesn't hurt the query path. For that reason, I feel that the optimization here could be premature at the expense of readability, and I'm still not convinced it is needed.

How about CallPairs or just CallEdges?

CallEdges sounds good.

The MissingFrameInferrer members are mostly BinaryFunction-based while the ProfiledBinary members are address-based. The separation is to reduce disassembling time by avoiding the address-to-BinaryFunction lookup there.

The lookup should be quick, right? And shifting the lookup cost earlier (as opposed to later in the inferrer) doesn't mean we're doing more work?

One other thing to consider is to move the data structures only needed for tail calls into the inferrer, as these are technically owned by the inferrer. Ideally, we want to reduce duplication, but if that's not possible, we can move the address-based map into the inferrer as well and have it populated at disassembly time. This way this seemingly similar data is at least centralized together.

hoy updated this revision to Diff 483409. Dec 15 2022, 6:26 PM
hoy marked 3 inline comments as done.

Switching to using single maps.

llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

OK, switched to using single maps, which appears to be 3% slower than using multimaps. This pushes the existing 8% overhead up to 11%. Do we think it's a worthwhile trade-off for readability?

wenlei added inline comments. Dec 15 2022, 7:16 PM
llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

Thanks for experimenting. 3% extra overhead is a bit surprising. Could you do one more thing to confirm where this is coming from -- whether it's initialization or query/infer time overhead?

I thought this change would mostly affect the initialization path, which I don't expect to be dominant in cost. Is that not the case? What's the cost split between initialization and query/infer?

If there's indeed a solid, repeatable 3% perf hit, I don't think it's worth the overhead for readability.

hoy added inline comments. Dec 16 2022, 8:22 AM
llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

Looks like the slowdown is mostly from the initialization path, where all functions (sampled or not) are disassembled.

Interestingly, I remeasured three times and the average slowdown is about 1.3%. The actual wall time is about 28min50sec vs 29min12sec. I guess it isn't a big deal? The single-map implementation is indeed simpler.

wenlei accepted this revision. Dec 16 2022, 8:31 AM

Lgtm, thanks.

llvm/tools/llvm-profgen/ProfiledBinary.h
251–255

Sounds good. Thanks for working through this patiently!

This revision is now accepted and ready to land. Dec 16 2022, 8:31 AM
This revision was landed with ongoing or failed builds. Dec 16 2022, 8:45 AM
This revision was automatically updated to reflect the committed changes.

Fix (09e79659bf2aeb0a5bd8ad6a9a40734b42caaf8a) for a minor build break this patch caused: stats should be gated by LLVM_ENABLE_STATS, not by NDEBUG.