This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Transforms/IPO/SampleProfile.cpp
- llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof
- llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll

Differential D147545

[SamplePGO] Stale profile matching(part 2)
ClosedPublic

Authored by wlei on Apr 4 2023, 9:57 AM.


Details

Reviewers

hoy
wenlei
xur
davidxl
mingmingl
mtrofin

Commits

rG892daede7201: [SamplePGO] Stale profile matching(part 2)

Summary

Part 2 of https://reviews.llvm.org/D147456

Call target name anchor based profile fuzzy matching

Use the callee name in the IR as an anchor to match the call target/inlinee name in the profile. The advantages of this approach:

Unlike the traditional approach of encoding hash signatures into every block, which affects binary/profile size and build speed, it doesn't require any additional information: all the data is already in the IR and the profiles.
It is effective for the current nested profile layout, in which, once a callsite is mismatched, the profiles of all its inlinees are dropped.

The input of the algorithm:

IR locations: the anchor is the callee name of direct callsite.
Profile locations: the anchor is the call target name for BodySamples or inlinee's profile name for CallsiteSamples.

The two lists are populated by parsing the IR and the profile, and both can be generalized as a sequence of locations with an optional anchor.
For example, say location 1.2(foo) refers to a callsite at 1.2 with callee name foo, and 1.3 refers to a non-direct-call location 1.3.

// The current build source code:
   int main() {
1.     ...
2.     foo();
3.     ...
4      ...
5.     ...
6.     bar();
7.     ...
   }

IR locations are populated and simplified as: [1, 2(foo), 3, 5, 6(bar), 7].

; The "stale" profile:
main:350:1
 1: 1
 2: 3
 3: 100 foo:100
 4: 2
 7: 2
 8: 200 bar:200
 9: 30

Profile locations are populated and simplified as: [1, 2, 3(foo), 4, 7, 8(bar), 9].
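To make the "populated and simplified" step concrete, here is a minimal C++ sketch that turns the text profile above into a location list, taking the first call target on a line as its anchor. It is a toy parser written only for this example (`Loc` and `parseProfileLocations` are hypothetical, not LLVM's `SampleProfReader`), and it assumes plain integer locations without discriminators.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical location type: an id plus an optional anchor
// (the call target name; empty when the line has no call target).
struct Loc {
  int Line;
  std::string Anchor;
};

// Parses body-sample lines of the form " <loc>: <count> [<target>:<count>]..."
// and keeps the first call target as the anchor. Lines that do not start with
// a space (e.g. the "main:350:1" header) are skipped.
std::vector<Loc> parseProfileLocations(const std::string &Text) {
  std::vector<Loc> Locs;
  std::istringstream In(Text);
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.empty() || Line[0] != ' ')
      continue; // function header or blank line
    std::istringstream LS(Line);
    int LineNo = 0;
    char Colon = 0;
    long Count = 0;
    if (!(LS >> LineNo >> Colon >> Count) || Colon != ':')
      continue; // not a body-sample line
    std::string Target, Tok;
    if (LS >> Tok)
      Target = Tok.substr(0, Tok.find(':')); // "foo:100" -> "foo"
    Locs.push_back({LineNo, Target});
  }
  return Locs;
}
```

Feeding it the stale profile above yields the seven locations listed, with anchors only at 3 (foo) and 8 (bar).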
Matching heuristic:

Match all the anchors in lexical order first.
Match non-anchors evenly between two anchors: split the non-anchor range so that the first half is matched based on the start anchor and the second half based on the end anchor (i.e., each location is shifted by the same offset as the anchor it is matched against).

So the example above is matched like:

[1,    2(foo), 3,  5,  6(bar), 7]
 |     |       |   |     |     |
[1, 2, 3(foo), 4,  7,  8(bar), 9]

3 -> 4 matching is based on anchor foo, 5 -> 7 matching is based on anchor bar.
The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].
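A minimal C++ sketch of the heuristic, assuming integer location ids and one optional anchor per location. The names (`Loc`, `matchLocations`) are hypothetical and this is not the code in SampleProfile.cpp. It treats the function entry as an implicit anchor at 0, matches anchors first come, first match (an IR callsite whose callee is not found is treated as a non-anchor), and splits each non-anchor run between two matched anchors, giving the extra location of an odd-length run to the first half (an assumption).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified location: an integer id plus an optional anchor
// (callee / call target name; empty when the location is not a direct call).
struct Loc {
  int Id;
  std::string Anchor;
};

// Returns IR id -> profile id for every IR location whose mapped id differs
// from its own id (identity mappings are left implicit).
std::map<int, int> matchLocations(const std::vector<Loc> &IR,
                                  const std::vector<Loc> &Profile) {
  // Callee name -> profile ids in lexical order, consumed first come, first match.
  std::map<std::string, std::vector<int>> ProfileAnchors;
  for (const Loc &L : Profile)
    if (!L.Anchor.empty())
      ProfileAnchors[L.Anchor].push_back(L.Id);
  std::map<std::string, std::size_t> NextUse;

  std::map<int, int> Result;
  auto Record = [&Result](int IRId, int ProfId) {
    if (IRId != ProfId)
      Result[IRId] = ProfId;
  };

  int LastDelta = 0;        // implicit anchor at the function entry: 0 -> 0
  std::vector<int> Pending; // non-anchor IR ids since the last matched anchor
  for (const Loc &L : IR) {
    auto It = ProfileAnchors.find(L.Anchor);
    if (L.Anchor.empty() || It == ProfileAnchors.end() ||
        NextUse[L.Anchor] >= It->second.size()) {
      Pending.push_back(L.Id); // no anchor, or callee not found in the profile
      continue;
    }
    int ProfId = It->second[NextUse[L.Anchor]++];
    int Delta = ProfId - L.Id;
    // First half of the pending run follows the previous anchor's offset,
    // the second half follows this anchor's offset.
    std::size_t Half = (Pending.size() + 1) / 2;
    for (std::size_t I = 0; I < Pending.size(); ++I)
      Record(Pending[I], Pending[I] + (I < Half ? LastDelta : Delta));
    Pending.clear();
    Record(L.Id, ProfId);
    LastDelta = Delta;
  }
  // Locations after the last matched anchor follow that anchor's offset.
  for (int Id : Pending)
    Record(Id, Id + LastDelta);
  return Result;
}
```

On the IR list [1, 2(foo), 3, 5, 6(bar), 7] and the profile list [1, 2, 3(foo), 4, 7, 8(bar), 9], this returns {2->3, 3->4, 5->7, 6->8, 7->9}; location 1 maps to itself and is therefore not recorded.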

For the implementation, the anchors are saved in a map for fast lookup. The resulting mapping is saved into IRToProfileLocationMap (see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples (distributeIRToProfileLocationMap).
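The distribution step can be sketched as a recursive walk over the nested profile, assuming the mapping is keyed by function name: every profile of a function, including copies inlined into other functions, receives a pointer to that function's mapping. The types below are simplified, hypothetical stand-ins loosely modeled on `FunctionSamples`; only the name distributeIRToProfileLocationMap comes from the summary, and the sketch's own function is called `distributeMap`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for an IR location -> profile location map.
using LocToLocMap = std::map<int, int>;

// Simplified stand-in for FunctionSamples (hypothetical).
struct FunctionSamplesSketch {
  std::string Name;
  const LocToLocMap *IRToProfileLocationMap = nullptr;
  // Callsite location -> profiles of the callees inlined there.
  std::map<int, std::vector<FunctionSamplesSketch>> CallsiteSamples;
};

// Attach each function's mapping to every profile of that function,
// recursing into inlinee profiles nested under callsites.
void distributeMap(FunctionSamplesSketch &FS,
                   const std::map<std::string, LocToLocMap> &FuncMappings) {
  auto It = FuncMappings.find(FS.Name);
  if (It != FuncMappings.end())
    FS.IRToProfileLocationMap = &It->second;
  for (auto &Callsite : FS.CallsiteSamples)
    for (FunctionSamplesSketch &Inlinee : Callsite.second)
      distributeMap(Inlinee, FuncMappings);
}

// Lookup used during annotation: fall back to the IR location itself.
int mapIRLocToProfileLoc(const FunctionSamplesSketch &FS, int IRLoc) {
  if (!FS.IRToProfileLocationMap)
    return IRLoc;
  auto It = FS.IRToProfileLocationMap->find(IRLoc);
  return It == FS.IRToProfileLocationMap->end() ? IRLoc : It->second;
}
```

Storing a pointer rather than a copy keeps one mapping per function no matter how many inlined copies of it appear in the profile.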

Performance evaluation:

Clang self-build benchmark:
Current build version: clang-10
Profiled version: clang-9
Results are compared to a refreshed profile (collected on clang-10). To be fair, we invalidated the profiles of new functions (both the refreshed and the stale profile use the same profile list).

Regression relative to the refreshed profile with this off: -3.93%
Regression relative to the refreshed profile with this on: -1.1%

So this algorithm can recover ~72% of the regression.
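As a check on the ~72% figure, the recovered fraction is the reduction in regression divided by the original regression:

```latex
\frac{3.93\% - 1.1\%}{3.93\%} = \frac{2.83}{3.93} \approx 0.72
```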
Internal (Meta) large-scale services:
We saw one real instance of a 3-week-stale profile, where this delivered a ~1.8% win.

Notes or future work:

Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO, since pseudo-probe and line-number profiles share the LineLocation structure.
Fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of the regression (with some pseudo-probe order changes, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in the future.
Profile call target names are only available when the call is hit by samples, so missing anchors might mislead the matching. This can be mitigated by having llvm-profgen generate call targets for zero-sample calls.
This doesn't handle function name mismatches; we plan to solve that in the future.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wlei created this revision. Apr 4 2023, 9:57 AM

Herald added a project: Restricted Project. Apr 4 2023, 9:57 AM

Herald added subscribers: ormris, hoy, wenlei, hiraditya.

wlei requested review of this revision. Apr 4 2023, 9:57 AM

Herald added a project: Restricted Project. Apr 4 2023, 9:57 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B223609: Diff 510846. Apr 4 2023, 9:59 AM

wlei edited the summary of this revision. Apr 5 2023, 2:11 PM

wlei edited the summary of this revision. Apr 5 2023, 2:26 PM

wlei mentioned this in D147456: [SamplePGO] Stale profile matching(part 1). Apr 5 2023, 2:38 PM

Updating D147545: [CSSPGO] Stale profile matching(part 2)

Harbormaster completed remote builds in B223903: Diff 511242. Apr 5 2023, 5:27 PM

Updating D147545: [CSSPGO] Stale profile matching(part 2)

Harbormaster completed remote builds in B223904: Diff 511243. Apr 5 2023, 5:41 PM

wlei edited the summary of this revision. Apr 5 2023, 6:05 PM

wlei added reviewers: hoy, wenlei, xur, davidxl, mingmingl, mtrofin.

wenlei added inline comments. Apr 5 2023, 6:12 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
1842	Unrelated to this change, but it may be useful to have a way for users to print out functions with stale profiles without needing a debug compiler. Users often ask: is this change going to invalidate the profile for this function? Having a switch may help them self-diagnose and get an answer.
1848	As mentioned below, I think we should still throw away profiles that can't be matched due to missing call site anchors.
2235–2241	This doesn't seem to do what you intended, or at least what the comment says: non-anchor location matching is based on the offset to the last matched anchor. If no last matched anchor exists, should we still match anything? Consider a case where a function is completely changed and there's no matched callsite at all. In that case, shouldn't we still throw away its profile instead of matching everything with LocationDelta == 0?
2282–2284	Wondering if we can simplify by merging the two containers, i.e., use `std::map<LineLocation, StringRef>` with an empty StringRef value for non-direct-callsites. Then we avoid the 2nd map, and also avoid the map lookup in `runStaleProfileMatching`. It may use slightly more memory, but feels a bit cleaner. Or even something like `std::map<LineLocation, std::pair<StringRef, bool>>` to have `MatchedCallsiteLocs` merged in as well. These containers feel a bit duplicated.
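A sketch of the merged container suggested above, assuming a simplified `LineLoc` stand-in for `LineLocation` and `std::string_view` in place of `StringRef` (all names here are hypothetical). One ordered map holds every IR location, with an empty name for locations that are not direct callsites, so a single lookup answers both "is this a known location" and "is it an anchor".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string_view>
#include <tuple>

// Simplified stand-in for llvm::sampleprof::LineLocation.
struct LineLoc {
  uint32_t LineOffset;
  uint32_t Discriminator;
  bool operator<(const LineLoc &O) const {
    return std::tie(LineOffset, Discriminator) <
           std::tie(O.LineOffset, O.Discriminator);
  }
};

// All IR locations in lexical order; the value is the callee name for a
// direct callsite and an empty view for any other location.
using LocationAnchors = std::map<LineLoc, std::string_view>;

// True if Loc is a direct callsite with a known callee (an anchor).
bool isAnchor(const LocationAnchors &Locs, const LineLoc &Loc) {
  auto It = Locs.find(Loc);
  return It != Locs.end() && !It->second.empty();
}
```

Because the map is ordered by location, iterating it also gives the lexical order the matching walks in, which is part of why a single container is enough.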

rebase on https://reviews.llvm.org/D147456

Harbormaster completed remote builds in B223920: Diff 511276. Apr 5 2023, 10:23 PM

wlei added a parent revision: D147456: [SamplePGO] Stale profile matching(part 1). Apr 5 2023, 10:24 PM

wlei added inline comments. Apr 5 2023, 11:01 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2235–2241	Sorry if my description is unclear. My implementation assumes a special non-callsite anchor, `0`, i.e. the beginning of the function. Without fuzzy matching, this is the only matched anchor, and all the other (non-anchor) locations are matched with `LocationDelta = 0`, which means the match source is equal to the target. Regarding "Consider a case where a function is completely changed, and there's no matched callsite at all, in which case we should probably still throw away its profile instead of matching everything with LocationDelta == 0?": say IR locations are [1...100 101(foo)] and profile locations are [1...100 101(bar)]. My intuition is that even without any callsite anchor, matching the leading locations would be better than dropping them (recall that in the current pseudo-probe implementation the block ids all come first, so this is not uncommon).

wenlei added inline comments. Apr 6 2023, 8:30 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2235–2241	Hmm.. I thought if there's no matching call site, that would mean the function has completely changed, so we can't match probes/blocks in a reasonable way. I can see what you have can help matching leaf functions without call sites. But there needs to be something to make sure when the change is too big, we bail out instead of doing low confidence matching. Maybe we should bail out on zero matched call site + different number of probes/blocks?
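One way to encode the suggested bail-out is sketched below. It is hypothetical (the function and parameter names are made up, and the author later defers such a check to a separate diff): keep matching if at least one callsite anchor matched; otherwise match only when the IR and the profile have the same number of probes, which covers the leaf-function case.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical guard: decide whether a stale function profile is still worth
// fuzzy-matching, or should be dropped as a low-confidence match.
bool shouldRunFuzzyMatching(std::size_t NumMatchedCallsiteAnchors,
                            std::size_t NumIRProbes,
                            std::size_t NumProfileProbes) {
  if (NumMatchedCallsiteAnchors > 0)
    return true; // at least one call anchor lines the two versions up
  // No anchors: only trust the identity-style match if the shape is unchanged.
  return NumIRProbes == NumProfileProbes;
}
```

A threshold on the fraction of matched anchors would be a natural refinement, but choosing it needs the kind of perf tuning discussed in the replies.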

hoy added inline comments. Apr 6 2023, 10:46 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2166	Please add a comment for this. Basically we are filtering out possible indirect calls.
2191	Maybe worth mentioning that the matching is based on sequential lexical order: "first come" means the anchors with lexically earlier locations.
2287	Should this be conditioned under `FunctionSamples::ProfileIsProbeBased` if the same is going to be used for non-CS profile? Also if profile does match, there's no need to populate `AllLocations`?
2288	nit: use emplace
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof
8	Add a case for multiple call targets at same location?

davidxl added inline comments. Apr 6 2023, 12:06 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
449	nit: location -> source location
1845–1846	nit: probably rename the option to be 'resync-stale-profile' to make the meaning more accurate.
2160	The mapping from callee to location is 1 to many. Should that be handled with sequential id (per callee)?
2166	indirect call sites are good anchors too. Why not use sequential id based anchoring?

wenlei added inline comments. Apr 6 2023, 2:57 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2166	The call site matching is based on callee names, indirect call from IR doesn't have callee name, and we opt to be conservative here (using name, instead of sequential id).

wlei added inline comments. Apr 6 2023, 3:41 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
1845–1846	Thanks for the name suggestion. So "resync the profile according to the current build source code"; it sounds good to me. @wenlei, see if you are good with this name, since you suggested salvage-stale-profile.
2160	Regarding "The mapping from callee to location is 1 to many. Should that be handled with sequential id (per callee)?": this is also a 1-to-many map, see `StringMap<std::set<LineLocation>> &CalleeToCallsitesMap`, which maps a callee name string to a set of locations. So with no callsite changes, this works the same as using sequential ids, but with callsite changes it should be more resilient. Say we add or delete the first callsite: with sequential-id matching, all following callsites would be mismatched, but with callee-name matching, callsites with different names won't be affected.
2235–2241	Good point, I will add some checks (like a threshold) so that very bad matches are bailed out and their profiles dropped.
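The resilience argument in the 2160 reply can be illustrated with a toy comparison, reducing callsites to their callee names in lexical order (both helpers are hypothetical). After the first callsite is deleted, pairing callsites by sequential id loses every match, while pairing by callee name still finds all the remaining ones.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Callsites in lexical order, identified only by callee name.
using Callsites = std::vector<std::string>;

// Sequential-id matching: the i-th IR callsite pairs with the i-th profile
// callsite; the pair counts only if the callee names agree.
std::size_t matchBySequentialId(const Callsites &IR, const Callsites &Prof) {
  std::size_t Matched = 0;
  for (std::size_t I = 0; I < IR.size() && I < Prof.size(); ++I)
    Matched += IR[I] == Prof[I];
  return Matched;
}

// Name-based matching: each IR callsite takes the next unused profile
// callsite with the same callee name (first come, first match).
std::size_t matchByName(const Callsites &IR, const Callsites &Prof) {
  std::map<std::string, std::size_t> Available;
  for (const std::string &Name : Prof)
    ++Available[Name];
  std::size_t Matched = 0;
  for (const std::string &Name : IR) {
    auto It = Available.find(Name);
    if (It != Available.end() && It->second > 0) {
      --It->second;
      ++Matched;
    }
  }
  return Matched;
}
```

With a profile of {a, b, c, d} and an IR of {b, c, d}, sequential ids match nothing, while name-based matching matches all three.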

consider this scenario:

{

S1
foo();
S2

}

{

S3
bar();
S2

}

The first block, including the foo() call, is shifted in the source code, but the bar() block is not.

With this remapping algorithm, A won't find samples, and bar() and S2 are fine; but C's location will be wrongly remapped.

I wonder if there is a need to do two way anchoring to handle this.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2219	What does RA stand for?

In D147545#4252021, @davidxl wrote:
With this remapping algorithm, A won't find samples, and bar() and S2 are fine; but C's location will be wrongly remapped.

I wonder if there is a need to do two way anchoring to handle this.

Can you please point out what A and C are? Do they refer to S1 and S3, respectively?

I think the intuition of the current matching algorithm focuses on getting the callsites right as the first step, as we found that bringing back missing inlining (lost due to profile mismatch) is quite helpful. Getting the block weights right is also important, and there will be upcoming improvements on a better numbering of callsite ids and probe ids (mostly for CSSPGO). That said, the current heuristic is never perfect, as it does not try to reason about user semantic changes. Our observation is that in many cases users don't change hot function calls often, so we currently rely on the calls as anchors.

In D147545#4255847, @hoy wrote:
Can you please point out what A and C are? Do they refer to S1 and S3, respectively?

Right -- sorry about the confusion.

I think the intuition of the current matching algorithm focuses on getting the callsites right as the first step, as we found that bringing back missing inlining (lost due to profile mismatch) is quite helpful. Getting the block weights right is also important, and there will be upcoming improvements on a better numbering of callsite ids and probe ids (mostly for CSSPGO). That said, the current heuristic is never perfect, as it does not try to reason about user semantic changes. Our observation is that in many cases users don't change hot function calls often, so we currently rely on the calls as anchors.

Understood. I also believe this patch works for most cases. What remains concerning is whether there are cases where applying the source remapping leads to worse performance. For instance, statements in S3 may get wrong profile data after remapping.

In D147545#4255880, @davidxl wrote:
Understood. I also believe this patch works for most cases. What remains concerning is whether there are cases where applying the source remapping leads to worse performance. For instance, statements in S3 may get wrong profile data after remapping.

That's a valid concern. I think yes, there will be blocks getting wrong profile data if no call anchors are matched, or if the user changes the semantics in a way that makes our assumption completely invalid. For non-CS AutoFDO, I think something similar already happens, as the current line-offset based profile loading has no knowledge of how the source semantics changed. For CSSPGO, the wrong profile data may not be completely bad, since at least the function would not be considered as cold as it was before, when the mismatch caused its profile to be dropped completely.

I wonder if there is a need to do two way anchoring to handle this.

@davidxl can you elaborate on two way anchoring? Does it also consider non-callsite locations as anchors? It's trickier to use a non-callsite as an anchor, since we would need a "signature" for the block that is unique enough.

Also, since this is all in the territory of heuristics and tuning, maybe we can get the initial version in while we keep improving it. We got to this version after a fair amount of internal evaluation, and it already showed good results. We still plan to keep tuning it, though that will take some time to experiment.

In D147545#4255928, @wenlei wrote:

@davidxl can you elaborate on two way anchoring? Does it also consider non-callsite locations as anchors? It's trickier to use a non-callsite as an anchor, since we would need a "signature" for the block that is unique enough.

Also, since this is all in the territory of heuristics and tuning, maybe we can get the initial version in while we keep improving it. We got to this version after a fair amount of internal evaluation, and it already showed good results. We still plan to keep tuning it, though that will take some time to experiment.

I guess what @davidxl meant by two way matching is to split the non-anchor range: some locations are matched based on the start anchor, some based on the end anchor.

foo()
inst1
inst2
inst3
inst4
bar()

If foo is mismatched and bar is matched, then with the current version inst1-4 will all be mismatched,
and if matching in two ways, say by splitting evenly, inst3 and inst4 will be matched based on bar().
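The difference between the two options can be sketched as the offsets assigned to a run of non-anchor locations sitting between a previous matched anchor (offset `PrevDelta`) and the next matched anchor (offset `NextDelta`). Both function names are hypothetical, and the extra element of an odd-length run goes to the first half by assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward-only: every location in the run reuses the previous anchor's offset.
std::vector<int> forwardOnlyDeltas(std::size_t RunLength, int PrevDelta) {
  return std::vector<int>(RunLength, PrevDelta);
}

// Split evenly: first half follows the previous anchor, second half the next.
std::vector<int> splitEvenlyDeltas(std::size_t RunLength, int PrevDelta,
                                   int NextDelta) {
  std::vector<int> Deltas(RunLength, PrevDelta);
  for (std::size_t I = (RunLength + 1) / 2; I < RunLength; ++I)
    Deltas[I] = NextDelta;
  return Deltas;
}
```

For inst1-4, if the previous matched anchor has offset 0 and bar() moved by 2, forward-only assigns {0, 0, 0, 0}, while splitting assigns {0, 0, 2, 2}, so inst3 and inst4 follow bar().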

In D147545#4255941, @wlei wrote:
I guess what @davidxl meant by two way matching is to split the non-anchor range: some locations are matched based on the start anchor, some based on the end anchor.
foo()
inst1
inst2
inst3
inst4
bar()
If foo is mismatched and bar is matched, then with the current version inst1-4 will all be mismatched,
and if matching in two ways, say by splitting evenly, inst3 and inst4 will be matched based on bar().

This is one way. Another way is to do backward scanning of the anchors and create another loc mapping. The heuristic can be used to merge the two location mappings (i.e., prefer the one that matches the current IR loc; if both differ, perhaps keep both, which may complicate the profile annotation).

In D147545#4255941, @wlei wrote:
I guess what @davidxl meant by two way matching is to split the non-anchor range: some locations are matched based on the start anchor, some based on the end anchor.
foo()
inst1
inst2
inst3
inst4
bar()
If foo is mismatched and bar is matched, then with the current version inst1-4 will all be mismatched,
and if matching in two ways, say by splitting evenly, inst3 and inst4 will be matched based on bar().

Ok, then for both heuristics, we can always have cases that yield correct matching with one heuristic, but wrong result with the other.

In D147545#4255958, @davidxl wrote:
This is one way. Another way is to do backward scanning of the anchors and create another loc mapping.

Ok, that sounds reasonable. We also discussed backward matching in the past. I'm fine if we want to quickly test backward matching (or a mix of forward and backward matching and compare result), or just get the current version in and iterate on that later.

The heuristic can be used to merge two location mappings (i.e., prefer the one that matches the current IR loc. If both differs, perhaps keep both - which may complicate the profile annotation).

Keeping both will add complexity; we will need to see meaningfully better results to justify it. Inclined to keep it simple.

Addressing reviewers' feedback.
Added backward matching (splitting evenly) for non-anchor locations.

wlei added inline comments. Apr 17 2023, 10:54 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
449	Done.
1842	Sounds good, I will add a switch in a separate diff.
2166	Comments added.
2191	Thanks for the suggestion, refined the comments.
2219	I meant "Result of the Anchor"; changed to a more meaningful name, `ProfileAnchors`.
2235–2241	I found that to do this we need some metrics to define how big the change is, and we also need to test the perf numbers and tune the threshold. I will handle this in a separate diff.
2282–2284	Good point! I'm going to remove `MatchedCallsiteLocs`, so I changed this to use `std::map<LineLocation, StringRef>`.
2287	Regarding "Should this be conditioned under FunctionSamples::ProfileIsProbeBased if the same is going to be used for non-CS profile?": yes, added the `FunctionSamples::ProfileIsProbeBased` check. Regarding "Also if profile does match, there's no need to populate AllLocations?": yes, I'm going to remove `MatchedCallsiteLocs`, so only one structure, IRLocations, is left here.
2288	Done.
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof
8	Sounds good, added a line including multiple call targets

In D147545#4255958, @davidxl wrote:
This is one way. Another way is to do backward scanning of the anchors and create another loc mapping. The heuristic can be used to merge the two location mappings (i.e., prefer the one that matches the current IR loc; if both differ, perhaps keep both, which may complicate the profile annotation).

@davidxl For the non-anchor locations, I added support for splitting the range evenly and matching the second half backwards. In my tests it showed a slight win, from a ~1.1% regression to a ~1.0% regression. That may also be noise, but I checked several instances and some backward matches do help.

For the anchor locations, yes, keeping both candidates will complicate the profile annotation. "Prefer the one that matches the current IR loc" or searching for the closest loc are also good heuristics; we plan to explore more heuristics later.

Moreover, I did a statistical analysis: about 86% of the anchors in the profile are unique anchors for the clang self-build, which might explain why the current approach covers most of the cases. So I'm also leaning toward keeping it simple for now.

In D147545#4274843, @wlei wrote:
@davidxl For the non-anchor locations, I added support for splitting the range evenly and matching the second half backwards. In my tests it showed a slight win, from a ~1.1% regression to a ~1.0% regression. That may also be noise, but I checked several instances and some backward matches do help.

For the anchor locations, yes, keeping both candidates will complicate the profile annotation. "Prefer the one that matches the current IR loc" or searching for the closest loc are also good heuristics; we plan to explore more heuristics later.

Moreover, I did a statistical analysis: about 86% of the anchors in the profile are unique anchors for the clang self-build, which might explain why the current approach covers most of the cases. So I'm also leaning toward keeping it simple for now.

I am ok with keeping it simple for now -- assuming this feature is off by default.

Harbormaster completed remote builds in B226153: Diff 514315. Apr 17 2023, 1:06 PM

wenlei added inline comments. Apr 17 2023, 8:09 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
1845–1846	not a big deal but maybe `recover-stale-profile` is a middle ground? the intention was to use a name that tells users what the switch does on a high level (salvage/recover), rather than how it's implemented (match/sync).
2235–2241	sounds good.
2282–2284	Looks like MatchedCallsiteLocs is still there?

wlei added inline comments. Apr 18 2023, 10:13 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2282–2284	Sorry for the confusion; it will also be in a separate diff, the same diff as the check for big profile changes. I plan to use IRLocations to match profile locations directly, so there will be no `MatchedCallsiteLocs`, and I'll use the result to decide whether a mismatch is big enough to drop the profile.

hoy added inline comments. Apr 18 2023, 10:27 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2289	I guess `IRLocations` should also be populated for non-CS. Maybe add a TODO?
2304	typo: wrote -> written BTW, have you seen such collision? With pseudo probes this shouldn't happen. It should not happen with dwarf discriminators either.
2334	BTW, wondering if you've ever seen mismatched callsites when function hash matches. The hash counts number of callsites but not their orders.

Addressing Hongtao's feedback.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2289	TODO added. Yes, currently only pseudo probes are supported.
2304	Before I added the `isa<IntrinsicInst>(&I)` condition to `(std::optional<PseudoProbe> Probe = extractProbe(I))`, I saw some callee names become empty because they were overwritten by the emplace above. Since this records all the anchors, and an anchor should always take priority over a non-anchor, I changed it here to force the write. This also covers any changes when we support AutoFDO or post-link time matching. Alternatively, we can use an assertion here.
2334	Good question. Yes, there are many mismatched callsites even when the hash matches; the current work only kicks in when a checksum mismatch is detected. There is a general question of whether we can turn it on for all functions, i.e., whether the matching algorithm handles a non-stale profile perfectly. The current heuristic is "first come, first match", but not all functions are in the profile (e.g., functions that don't hit any samples), so it could produce inconsistent anchors for a non-stale profile and then cause a mismatch. To solve this, I think we can try: (1) use a stricter checksum, e.g., one that also counts the order; (2) find a threshold from the mismatch metrics to control it; (3) use a different heuristic, like searching for the closest location, which handles a non-stale profile well but needs more measurement on mismatched functions. This is also an issue blocking AutoFDO, since AutoFDO doesn't have the checksum.
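To illustrate option (1), here is a hypothetical comparison between a signature that only counts callsites and one that also depends on their order. FNV-1a over callee names is used only for brevity; it is not the pseudo-probe checksum, and both function names are made up. Reordering two calls leaves the count unchanged but changes the ordered hash.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Count-only signature: changes when callsites are added or removed,
// but not when they are reordered.
uint64_t countOnlyHash(const std::vector<std::string> &Callees) {
  return Callees.size();
}

// Order-sensitive signature: 64-bit FNV-1a over callee names in lexical order.
uint64_t orderedHash(const std::vector<std::string> &Callees) {
  uint64_t H = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  for (const std::string &Name : Callees) {
    for (unsigned char C : Name) {
      H ^= C;
      H *= 0x100000001b3ULL; // FNV-1a 64-bit prime
    }
    H ^= 0xffu; // separator byte so name boundaries affect the hash
    H *= 0x100000001b3ULL;
  }
  return H;
}
```

An order-aware checksum would flag such a reordered function as mismatched, so it would go through matching instead of being treated as up to date.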

wenlei added inline comments. Apr 18 2023, 5:07 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2334	it might be useful to have a debug print, or `STATISTIC` so we know when and how often this happens. It's essentially hash collision.

hoy added inline comments. Apr 19 2023, 11:06 AM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2304	I see. Yeah, `extractProbe` on call instruction gets the call probe. If you want block probes only, you can change the check `isa<IntrinsicInst>` to `isa<PseudoProbeInst>`.
2334	Good point. Counting how many direct callsites having mismatched targets in the profile and on the IR would be helpful.

wenlei added inline comments. Apr 19 2023, 2:42 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2282–2284	Sounds good to deal with it in a separate patch. For `IRLocations`, would be good to have a comment mentioning the use of empty StringRef non-direct-call site.
2289	Looks like we're not populating non-call locations for AutoFDO? In that case, should we assert on CSSPGO to make sure we don't accidentally run this for AutoFDO before its support is complete?
2304	For non-call location, the name should be empty. Maybe assert that we never overwrite non-empty name?

hoy added inline comments. Apr 19 2023, 3:55 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2304	An assert sounds good for pseudo probes. The overwrite is possible for autofdo without enabling dwarf discriminator.
2334	Also, as discussed offline, since IR callsites are matched with profile callsites sequentially, it may be interesting to see how many same-named callsites are not able to get a match. If it's non-trivial, adding zero-count callsites to the profile may allow for a more precise match. Of course that's going to increase profile size, but I'm curious whether it is worth it.
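A minimal sketch of the insertion rule discussed for 2304, assuming a `std::map` from location id to callee name (`IRLocations` and `recordLocation` are hypothetical names): an anchor may replace an empty entry, a non-anchor never replaces an anchor, and replacing one non-empty name with a different one trips an assert, as suggested for pseudo probes.

```cpp
#include <cassert>
#include <map>
#include <string>

// IR location id -> callee name ("" for a non-direct-call location).
using IRLocations = std::map<int, std::string>;

void recordLocation(IRLocations &Locs, int Id, const std::string &Callee) {
  auto [It, Inserted] = Locs.emplace(Id, Callee);
  if (Inserted || Callee.empty())
    return; // new entry, or a non-anchor that must not replace anything
  // With pseudo probes, a location should never carry two different callees.
  assert((It->second.empty() || It->second == Callee) &&
         "overwriting a different non-empty callee name");
  It->second = Callee; // an anchor takes priority over a plain location
}
```

As hoy notes, the assert only makes sense for pseudo probes; AutoFDO without discriminators can legitimately put several calls on one line.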

addressing feedback.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2282–2284	comment added
2289	Yes, now it's not supported for AutoFDO. It's already under `FunctionSamples::ProfileIsProbeBased`, so this code won't be run in AutoFDO.
2304	Sounds good, changed to use `isa<PseudoProbeInst>` and added the assert for pseudo probes.
2334	> it might be useful to have a debug print, or STATISTIC so we know when and how often this happens. It's essentially hash collision.
We already have a global statistic report for mismatched callsites (`TotalProfiledCallsites` under `ReportProfileStaleness`); I added a debug print for function-level debugging.
> Also as discussed offline, since IR callsites are matched with profile callsites sequentially, it may be interesting to see how many same-named callsites are not able to get a match. If it's non-trivial, adding zero-count callsites in the profile may allow for a more precise match. Of course that's going to increase profile size, but just curious if it is worth.
Yeah, we could do an offline analysis of how many callsites whose name can be found in the profile still remain mismatched after matching. We could add the zero-count callsites in llvm-profgen; doing it only for the top level is enough, and I guess the size should be OK with extbinary. Alternatively, we could use a "search closest location" heuristic so that for a fresh profile, where the profile and the IR should have the same locations, each IR location can always be matched to the same location in the profile.
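The "search closest location" alternative could pick candidates roughly as below. This is a sketch under assumptions, not the patch's code: candidates are plain line offsets in a `std::set<int>`, and `takeClosestCandidate` is a hypothetical replacement for taking the lexically first candidate. Note that it only changes which candidate is picked; on its own it does not guarantee identical matching for a fresh profile, because an earlier same-named IR callsite with no profile entry can still consume the candidate first (matching exact locations first would be one way to address that).

```cpp
#include <cassert>
#include <iterator>
#include <optional>
#include <set>

// Remove and return the unused profile location closest to IRLoc, preferring
// the lower one on ties; std::nullopt if no candidate is left.
std::optional<int> takeClosestCandidate(std::set<int> &Candidates, int IRLoc) {
  if (Candidates.empty())
    return std::nullopt;
  auto Hi = Candidates.lower_bound(IRLoc); // first candidate >= IRLoc
  auto Best = Hi;
  if (Hi == Candidates.end()) {
    Best = std::prev(Hi); // all candidates are below IRLoc
  } else if (Hi != Candidates.begin()) {
    auto Lo = std::prev(Hi);
    if (IRLoc - *Lo <= *Hi - IRLoc)
      Best = Lo;
  }
  int Result = *Best;
  Candidates.erase(Best); // erase by iterator, no second lookup
  return Result;
}
```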

Updating D147545: [CSSPGO] Stale profile matching(part 2)

wenlei added inline comments. Apr 20 2023, 12:52 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2332	A bit confused by the use of `_TotalProfiledCallsites` and `_NumMismatchedCallsites`, what are you trying to do with the two additional variables?

wlei added inline comments. Apr 20 2023, 12:55 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
2332	Because `TotalProfiledCallsites` and `NumMismatchedCallsites` are sums over all functions in the module rather than for one function, I use a local variable here for the function-level debug print.

Harbormaster completed remote builds in B226938: Diff 515422. Apr 20 2023, 2:24 PM

How about retitle it as "[AutoFDO] Stale profile matching (part 2)" to align with the first patch? It's mentioned in the summary that only supports CSSPGO for now, so it should be good.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2304	Still seeing `if (FunctionSamples::ProfileIsProbeBased && isa<IntrinsicInst>(&I))` at line 2290. LGTM otherwise.

wlei retitled this revision from [CSSPGO] Stale profile matching(part 2) to [AutoFDO] Stale profile matching(part 2). Apr 21 2023, 1:18 PM

wlei edited the summary of this revision.

In D147545#4287782, @hoy wrote:

How about retitle it as "[AutoFDO] Stale profile matching (part 2)" to align with the first patch? It's mentioned in the summary that only supports CSSPGO for now, so it should be good.

Sounds good.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2304	Oops, I missed that.

Updating D147545: [AutoFDO] Stale profile matching(part 2)

hoy accepted this revision. Apr 21 2023, 1:39 PM

This revision is now accepted and ready to land. Apr 21 2023, 1:39 PM

wenlei added inline comments. Apr 25 2023, 4:09 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
451	nit: FuncToMatchingsMap->FuncMappings?
2208	Can/should we assert `IRToProfileLocationMap` is empty to begin with? i.e. we shouldn't be populating the same map multiple times?
2229	If you provide `erase` with an iterator instead of an element, it would save a find..
2256	It looks to me that `IsAnchorMatched` can be replaced with `!IRToProfileLocationMap.empty()`?
2332	Ok, now I see what's going on. The code as is can be confusing (both the naming of the variables and the subtraction involved to get per-function stats). I suggest we make `countProfileMismatches` take two `int&` output parameters, so it doesn't change global state as a way to output, which is inconsistent with the way it takes the function as an input parameter. Then pass `FuncMismatchedCallsites` and `FuncProfiledCallsites` into `countProfileMismatches` to get its output. On the caller side, we accumulate `FuncMismatchedCallsites` and `FuncProfiledCallsites` into `NumMismatchedCallsites` and `TotalProfiledCallsites`.
2353	I'm wondering if we should just create a map here, and change the `getOrCreateIRToProfileLocationMap` API into a simple getter? There seems to be very clear separation as to when we need to create and when we need to get, since there's no case for on-demand creation. Hence, simple/separate API might be cleaner, and less error-prone.
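The out-parameter refactoring suggested for `countProfileMismatches` could look like the following minimal sketch. The types are simplified and hypothetical (`FunctionProfile` holds only profiled callsite locations; `MismatchStats` stands in for the matcher's module-wide counters), and the counters are `std::uint64_t` rather than `int` to match the existing statistics fields:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a function's profile: profiled callsite locations.
struct FunctionProfile {
  std::vector<int> ProfiledCallsites;
};

// Per-function counts are returned through out-parameters instead of being
// added to module-level totals inside the function.
void countProfileMismatches(const FunctionProfile &FS,
                            const std::set<int> &MatchedCallsiteLocs,
                            std::uint64_t &FuncMismatchedCallsites,
                            std::uint64_t &FuncProfiledCallsites) {
  for (int Loc : FS.ProfiledCallsites) {
    ++FuncProfiledCallsites;
    if (!MatchedCallsiteLocs.count(Loc))
      ++FuncMismatchedCallsites;
  }
}

struct MismatchStats {
  std::uint64_t NumMismatchedCallsites = 0; // module-wide totals
  std::uint64_t TotalProfiledCallsites = 0;

  void runOnFunction(const FunctionProfile &FS,
                     const std::set<int> &MatchedCallsiteLocs) {
    std::uint64_t FuncMismatchedCallsites = 0;
    std::uint64_t FuncProfiledCallsites = 0;
    countProfileMismatches(FS, MatchedCallsiteLocs, FuncMismatchedCallsites,
                           FuncProfiledCallsites);
    // Per-function values are available here for a debug print without
    // subtracting module totals; then they are accumulated.
    NumMismatchedCallsites += FuncMismatchedCallsites;
    TotalProfiledCallsites += FuncProfiledCallsites;
  }
};
```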

wlei added inline comments. Apr 25 2023, 8:20 PM

llvm/lib/Transforms/IPO/SampleProfile.cpp
451	done.
2208	Good point, done.
2229	done.
2256	This is inside the loop `for (auto &IR : IRLocations) { bool IsAnchorMatched = false; ... }`. For each location, it needs to check whether it's an anchor and then whether it's matched. Maybe the name is confusing; changed to `IsMatchedAnchor`.
2332	yeah, it's clearer this way, done.
2353	Yes, here the matching should run only once per function, makes sense to just create a map.
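Splitting `getOrCreateIRToProfileLocationMap` into a one-time creator and a plain getter, as agreed above, might look like this minimal sketch. `MatcherState` and both method names are hypothetical, and `LocToLocMap` here is a simplified stand-in (line offset to line offset) for the patch's type; the assert covers the related suggestion that a function's map should never be populated twice.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in: IR line offset -> profile line offset.
using LocToLocMap = std::map<int, int>;

class MatcherState {
  // Keyed by function name (the patch uses the canonical function name).
  std::map<std::string, LocToLocMap> FuncMappings;

public:
  // Called exactly once per function, where the matching runs.
  LocToLocMap &createIRToProfileLocationMap(const std::string &FuncName) {
    auto Ret = FuncMappings.try_emplace(FuncName);
    assert(Ret.second && "matching should run only once per function");
    return Ret.first->second;
  }

  // Later consumers only read; nullptr means no mapping was created.
  const LocToLocMap *
  getIRToProfileLocationMap(const std::string &FuncName) const {
    auto It = FuncMappings.find(FuncName);
    return It == FuncMappings.end() ? nullptr : &It->second;
  }
};
```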

addressing Wenlei's feedback

lgtm, thanks!

As we evolve SampleProfileMatcher, it might be worth considering moving the matcher into its own file similar to SampleProfileProbe.cpp, to avoid bloating SampleProfile.cpp indefinitely. But this can be handled later in separate patch.

In D147545#4287782, @hoy wrote:

How about retitle it as "[AutoFDO] Stale profile matching (part 2)" to align with the first patch? It's mentioned in the summary that only supports CSSPGO for now, so it should be good.

I found this confusing, and the patch really isn't about AutoFDO, but CSSPGO. Maybe rename the first patch as [CSSPGO] instead for consistency. Or just say SamplePGO to avoid differentiating the two.

Harbormaster completed remote builds in B228199: Diff 517034. Apr 25 2023, 10:10 PM

In D147545#4297736, @wenlei wrote:

lgtm, thanks!

As we evolve SampleProfileMatcher, it might be worth considering moving the matcher into its own file similar to SampleProfileProbe.cpp, to avoid bloating SampleProfile.cpp indefinitely. But this can be handled later in separate patch.

Good point, will do in a separate patch.

In D147545#4297742, @wenlei wrote:

In D147545#4287782, @hoy wrote:

How about retitle it as "[AutoFDO] Stale profile matching (part 2)" to align with the first patch? It's mentioned in the summary that only supports CSSPGO for now, so it should be good.

I found this confusing, and the patch really isn't about AutoFDO, but CSSPGO. Maybe rename the first patch as [CSSPGO] instead for consistency. Or just say SamplePGO to avoid differentiating the two.

The infra might be useful for AutoFDO, changed to SamplePGO.

wlei retitled this revision from [AutoFDO] Stale profile matching(part 2) to [SamplePGO] Stale profile matching(part 2). Apr 26 2023, 10:05 AM

wlei edited the summary of this revision. Apr 28 2023, 12:37 PM

This revision was landed with ongoing or failed builds. Apr 28 2023, 1:13 PM

Closed by commit rG892daede7201: [SamplePGO] Stale profile matching(part 2) (authored by wlei).

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rG892daede7201: [SamplePGO] Stale profile matching(part 2).

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/SampleProfile.cpp	188 lines
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof	28 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll	342 lines

Diff 514315

llvm/lib/Transforms/IPO/SampleProfile.cpp

@@ 439 lines not shown @@ PriorityQueue<InlineCandidate, std::vector<InlineCandidate>,
                   CandidateComparer>;
 
 // Sample profile matching - fuzzy match.
 class SampleProfileMatcher {
   Module &M;
   SampleProfileReader &Reader;
   const PseudoProbeManager *ProbeManager;
   SampleProfileMap FlattenedProfiles;
+  // For each function, the matcher generates a map, of which each entry is a
+  // mapping from the source location of current build to the source location in
davidxl: nit: location -> source location
wlei: Done.
+  // the profile.
+  StringMap<LocToLocMap> FuncToMatchingsMap;
wenlei: nit: FuncToMatchingsMap->FuncMappings?
wlei: done.
 
   // Profile mismatching statstics.
   uint64_t TotalProfiledCallsites = 0;
   uint64_t NumMismatchedCallsites = 0;
   uint64_t MismatchedCallsiteSamples = 0;
   uint64_t TotalCallsiteSamples = 0;
   uint64_t TotalProfiledFunc = 0;
   uint64_t NumMismatchedFuncHash = 0;
@@ 19 lines not shown @@ if (It != FlattenedProfiles.end())
       return &It->second;
     return nullptr;
   }
   void runOnFunction(const Function &F, const FunctionSamples &FS);
   void countProfileMismatches(
       const FunctionSamples &FS,
       const std::unordered_set<LineLocation, LineLocationHash>
           &MatchedCallsiteLocs);
 
+  LocToLocMap &getOrCreateIRToProfileLocationMap(const Function &F) {
+    auto Ret = FuncToMatchingsMap.try_emplace(
+        FunctionSamples::getCanonicalFnName(F.getName()), LocToLocMap());
+    return Ret.first->second;
+  }
+  void distributeIRToProfileLocationMap();
+  void distributeIRToProfileLocationMap(FunctionSamples &FS);
+  void populateProfileCallsites(
+      const FunctionSamples &FS,
+      StringMap<std::set<LineLocation>> &CalleeToCallsitesMap);
+  void runStaleProfileMatching(
+      const std::map<LineLocation, StringRef> &IRLocations,
+      StringMap<std::set<LineLocation>> &CalleeToCallsitesMap,
+      LocToLocMap &IRToProfileLocationMap);
 };
 
 /// Sample profile pass.
 ///
 /// This pass reads profile data from the file specified by
 /// -sample-profile-file and annotates every affected function with the
 /// profile information found in that file.
 class SampleProfileLoader final
@@ 1,324 lines not shown @@
 ///
 /// \returns true if \p F was modified. Returns false, otherwise.
 bool SampleProfileLoader::emitAnnotations(Function &F) {
   bool Changed = false;
 
   if (FunctionSamples::ProfileIsProbeBased) {
     if (!ProbeManager->profileIsValid(F, *Samples)) {
       LLVM_DEBUG(
           dbgs() << "Profile is invalid due to CFG mismatch for Function "
wenlei: Unrelated to this change, but it may be useful to have a way for users to print out functions with stale profiles without needing a debug compiler. Users often ask: is this change going to invalidate the profile for this function? Having a switch may help them self-diagnose to get an answer.
wlei: Sounds good, I will add a switch in a separate diff.
-                 << F.getName());
+                 << F.getName() << "\n");
       ++NumMismatchedProfile;
+      if (!SalvageStaleProfile)
         return false;
davidxl: nit: probably rename the option to 'resync-stale-profile' to make the meaning more accurate.
wlei: Thanks for the name suggestion. So "resync the profile according to the current build's source code"; it sounds good to me. @wenlei, see if you are OK with this name, since you suggested salvage-stale-profile.
wenlei: Not a big deal, but maybe `recover-stale-profile` is a middle ground? The intention was to use a name that tells users what the switch does at a high level (salvage/recover), rather than how it's implemented (match/sync).
     }
     ++NumMatchedProfile;
wenlei: As mentioned below, I think we should still throw away profiles that can't be matched due to missing callsite anchors.
   } else {
     if (getFunctionLoc(F) == 0)
       return false;
 
     LLVM_DEBUG(dbgs() << "Line number for the first instruction in "
                       << F.getName() << ": " << getFunctionLoc(F) << "\n");
   }
 
@@ 294 lines not shown @@ for (auto &I : FS.getCallsiteSamples()) {
     TotalProfiledCallsites++;
     if (!MatchedCallsiteLocs.count(Loc)) {
       MismatchedCallsiteSamples += Count;
       NumMismatchedCallsites++;
     }
   }
 }
 
+// Populate the anchors(direct callee name) from profile.
+void SampleProfileMatcher::populateProfileCallsites(
davidxl: The mapping from callee to location is 1 to many. Should that be handled with sequential id (per callee)?
wlei: This is also a 1-to-many map: `StringMap<std::set<LineLocation>> &CalleeToCallsitesMap` maps a callee name to a set of locations. With no callsite changes, this works the same as sequential-id matching, but with callsite changes it should be more resilient. For example, if we add or delete the first callsite, all following callsites would be mismatched with sequential-id matching, whereas with callee-name matching, callsites with a different name are not affected.
+    const FunctionSamples &FS,
+    StringMap<std::set<LineLocation>> &CalleeToCallsitesMap) {
+  for (const auto &I : FS.getBodySamples()) {
+    const auto &Loc = I.first;
+    const auto &CTM = I.second.getCallTargets();
+    // Filter out possible indirect calls, use direct callee name as anchor.
hoy: Please add a comment for this. Basically we are filtering out possible indirect calls.
davidxl: Indirect call sites are good anchors too. Why not use sequential-id-based anchoring?
wenlei: The callsite matching is based on callee names; an indirect call in the IR doesn't have a callee name, and we opt to be conservative here (using names instead of sequential ids).
wlei: Comments added.
+    if (CTM.size() == 1) {
+      StringRef CalleeName = CTM.begin()->first();
+      const auto &Candidates = CalleeToCallsitesMap.try_emplace(
+          CalleeName, std::set<LineLocation>());
+      Candidates.first->second.insert(Loc);
+    }
+  }
+
+  for (const auto &I : FS.getCallsiteSamples()) {
+    const LineLocation &Loc = I.first;
+    const auto &CalleeMap = I.second;
+    // Filter out possible indirect calls, use direct callee name as anchor.
+    if (CalleeMap.size() == 1) {
+      StringRef CalleeName = CalleeMap.begin()->first;
+      const auto &Candidates = CalleeToCallsitesMap.try_emplace(
+          CalleeName, std::set<LineLocation>());
+      Candidates.first->second.insert(Loc);
+    }
+  }
+}
+
+// Call target name anchor based profile fuzzy matching.
+// Input:
+// For IR locations, the anchor is the callee name of direct callsite; For
+// profile locations, it's the call target name for BodySamples or inlinee's
hoy: May be worth mentioning that the matching is based on sequential lexical order. "First come" means the anchors with lexically early locations.
wlei: Thanks for the suggestion, refined the comments.
+// profile name for CallsiteSamples.
+// Matching heuristic:
+// First match all the anchors in lexical order, then split the non-anchor
+// locations between the two anchors evenly, first half are matched based on the
+// start anchor, second half are matched based on the end anchor.
+// For example, given:
+// IR locations: [1, 2(foo), 3, 5, 6(bar), 7]
+// Profile locations: [1, 2, 3(foo), 4, 7, 8(bar), 9]
+// The matching gives:
+//   [1,    2(foo), 3, 5, 6(bar), 7]
+//    |     |       |  |  |       |
+//   [1, 2, 3(foo), 4, 7, 8(bar), 9]
+// The output mapping: [2->3, 3->4, 5->7, 6->8, 7->9].
+void SampleProfileMatcher::runStaleProfileMatching(
+    const std::map<LineLocation, StringRef> &IRLocations,
+    StringMap<std::set<LineLocation>> &CalleeToCallsitesMap,
+    LocToLocMap &IRToProfileLocationMap) {
wenlei: Can/should we assert `IRToProfileLocationMap` is empty to begin with? i.e. we shouldn't be populating the same map multiple times?
wlei: Good point, done.
+  auto InsertMatching = [&](const LineLocation &From, const LineLocation &To) {
+    // Skip the unchanged location mapping to save memory.
+    if (From != To)
+      IRToProfileLocationMap.insert({From, To});
+  };
+
+  // Use function's beginning location as the initial anchor.
+  int32_t LocationDelta = 0;
+  SmallVector<LineLocation> LastMatchedNonAnchors;
+
+  for (auto &IR : IRLocations) {
davidxl: What does RA stand for?
wlei: I wanted to say "Result of the Anchor"; changed to a more meaningful name, `ProfileAnchors`.
+    const auto &Loc = IR.first;
+    StringRef CalleeName = IR.second;
+    bool IsAnchorMatched = false;
+    // Match the anchor location in lexical order.
+    if (!CalleeName.empty()) {
+      auto ProfileAnchors = CalleeToCallsitesMap.find(CalleeName);
+      if (ProfileAnchors != CalleeToCallsitesMap.end() &&
+          !ProfileAnchors->second.empty()) {
+        const auto &Candidate = *ProfileAnchors->second.begin();
+        ProfileAnchors->second.erase(Candidate);
wenlei: If you provide `erase` with an iterator instead of an element, it would save a find.
wlei: done.
+        InsertMatching(Loc, Candidate);
+        LLVM_DEBUG(dbgs() << "Callsite with callee:" << CalleeName
+                          << " is matched from " << Loc << " to " << Candidate
+                          << "\n");
+        LocationDelta = Candidate.LineOffset - Loc.LineOffset;
+
+        // Match backwards for non-anchor locations.
+        // The locations in LastMatchedNonAnchors have been matched forwards
+        // based on the previous anchor, spilt it evenly and overwrite the
+        // second half based on the current anchor.
+        for (size_t I = (LastMatchedNonAnchors.size() + 1) / 2;
+             I < LastMatchedNonAnchors.size(); I++) {
wenlei: This doesn't seem to do what you intended, or at least what the comment says: "non-anchor location match is based on the offset to the last matched anchor". If the last matched anchor doesn't exist, should we still match anything? Consider a case where a function is completely changed and there's no matched callsite at all; in that case we should probably still throw away its profile instead of matching everything with LocationDelta == 0?
wlei: Sorry if my description is unclear. My implementation assumes a special non-callsite anchor, `0`, i.e. the beginning of the function. Without fuzzy matching, this is the only matched anchor, and all the other (non-anchor) locations are matched using `LocationDelta = 0`, which means each location is matched to itself. For the completely changed function case, say: IR locations: [1...100 101(foo)]; profile locations: [1...100 101(bar)]. My intuition is that even without any callsite anchor, matching the leading locations is better than dropping them. (In the current pseudo-probe implementation, the block IDs all come first, so this is not uncommon.)
wenlei: Hmm. I thought no matching callsite would mean the function has completely changed, so we can't match probes/blocks in a reasonable way. I can see how what you have helps match leaf functions without callsites. But there needs to be something to make sure that when the change is too big, we bail out instead of doing low-confidence matching. Maybe we should bail out on zero matched callsites plus a different number of probes/blocks?
wlei: Good point, I will add a check (like a threshold) to make sure a very bad match is bailed out and its profile dropped.
wlei: I found that to do this we need metrics to define how big the change is, and we also need to test the performance numbers and tune the threshold, so I will handle this in a separate diff.
wenlei: Sounds good.
+          const auto &L = LastMatchedNonAnchors[I];
+          uint32_t CandidateLineOffset = L.LineOffset + LocationDelta;
+          LineLocation Candidate(CandidateLineOffset, L.Discriminator);
+          InsertMatching(L, Candidate);
+          LLVM_DEBUG(dbgs() << "Location is rematched backwards from " << L
+                            << " to " << Candidate << "\n");
+        }
+
+        IsAnchorMatched = true;
+        LastMatchedNonAnchors.clear();
+      }
+    }
+
+    // Match forwards for non-anchor locations.
+    if (!IsAnchorMatched) {
wenlei: It looks to me that `IsAnchorMatched` can be replaced with `!IRToProfileLocationMap.empty()`?
wlei: This is inside the loop `for (auto &IR : IRLocations) { bool IsAnchorMatched = false; ... }`. For each location, it needs to check whether it's an anchor and then whether it's matched. Maybe the name is confusing; changed to `IsMatchedAnchor`.
+      uint32_t CandidateLineOffset = Loc.LineOffset + LocationDelta;
+      LineLocation Candidate(CandidateLineOffset, Loc.Discriminator);
+      InsertMatching(Loc, Candidate);
+      LLVM_DEBUG(dbgs() << "Location is matched from " << Loc << " to "
+                        << Candidate << "\n");
+      LastMatchedNonAnchors.emplace_back(Loc);
+    }
+  }
+}

void SampleProfileMatcher::runOnFunction(const Function &F,		void SampleProfileMatcher::runOnFunction(const Function &F,
const FunctionSamples &FS) {		const FunctionSamples &FS) {
		bool IsFuncHashMismatch = false;
if (FunctionSamples::ProfileIsProbeBased) {		if (FunctionSamples::ProfileIsProbeBased) {
uint64_t Count = FS.getTotalSamples();		uint64_t Count = FS.getTotalSamples();
TotalFuncHashSamples += Count;		TotalFuncHashSamples += Count;
TotalProfiledFunc++;		TotalProfiledFunc++;
if (!ProbeManager->profileIsValid(F, FS)) {		if (!ProbeManager->profileIsValid(F, FS)) {
MismatchedFuncHashSamples += Count;		MismatchedFuncHashSamples += Count;
NumMismatchedFuncHash++;		NumMismatchedFuncHash++;
return;		IsFuncHashMismatch = true;
}		}
}		}

std::unordered_set<LineLocation, LineLocationHash> MatchedCallsiteLocs;		std::unordered_set<LineLocation, LineLocationHash> MatchedCallsiteLocs;
		std::map<LineLocation, StringRef> IRLocations;

// Go through all the callsites on the IR and flag the callsite if the target		// Extract profile matching anchors and profile mismatch metrics in the IR.
		wenleiUnsubmitted Not Done Reply Inline Actions Wondering if we can simplify by merging the two containers, i.e use `std::map<LineLocation, StringRef>`, and empty StringRef value for non-direct-callsite, then we avoid the 2nd map, and also avoid the map look up in `runStaleProfileMatching`. It may use slightly more memory, but feels a bit cleaner. Or even something like `std::map<LineLocation, std::pair<StringRef, bool>>` to have `MatchedCallsiteLocs` merged in as well. These containers feel a bit duplicated. wenlei: Wondering if we can simplify by merging the two containers, i.e use `std::map<LineLocation…
		wleiAuthorUnsubmitted Done Reply Inline Actions Good point! going to remove the `MatchedCallsiteLocs` , so changed to use `std::map<LineLocation, StringRef>` wlei: Good point! going to remove the `MatchedCallsiteLocs `, so changed to use `std…
		wenleiUnsubmitted Not Done Reply Inline Actions Looks like MatchedCallsiteLocs is still there? wenlei: Looks like MatchedCallsiteLocs is still there?
		wleiAuthorUnsubmitted Done Reply Inline Actions Sorry for confusion, it will also be in a separate diff, same diff with checking big profile change. I plan to use the IRLocations to directly match profile locations, there will be no this `MatchedCallsiteLocs` and use it to guid if the mismatch is big to drop or not. wlei: Sorry for confusion, it will also be in a separate diff, same diff with checking big profile…
		wenleiUnsubmitted Not Done Reply Inline Actions Sounds good to deal with it in a separate patch. For `IRLocations`, would be good to have a comment mentioning the use of empty StringRef non-direct-call site. wenlei: Sounds good to deal with it in a separate patch. For `IRLocations`, would be good to have a…
		wleiAuthorUnsubmitted Done Reply Inline Actions comment added wlei: comment added
// name is the same as the one in the profile.
for (auto &BB : F) {		for (auto &BB : F) {
for (auto &I : BB) {		for (auto &I : BB) {
		if (FunctionSamples::ProfileIsProbeBased && isa<IntrinsicInst>(&I)) {
		hoyUnsubmitted Not Done Reply Inline Actions Should this be conditioned under `FunctionSamples::ProfileIsProbeBased` if the same is going to be used for non-CS profile? Also if profile does match, there's no need to populate `AllLocations`? hoy: Should this be conditioned under `FunctionSamples::ProfileIsProbeBased` if the same is going to…
		wleiAuthorUnsubmitted Done Reply Inline Actions Should this be conditioned under FunctionSamples::ProfileIsProbeBased if the same is going to be used for non-CS profile? Yes, added the `FunctionSamples::ProfileIsProbeBased` check. Also if profile does match, there's no need to populate AllLocations? Yes, going the remove the `MatchedCallsiteLocs` , so here only leave one structure IRLocations. wlei: > Should this be conditioned under FunctionSamples::ProfileIsProbeBased if the same is going to…
		if (std::optional<PseudoProbe> Probe = extractProbe(I))
		hoyUnsubmitted Not Done Reply Inline Actions nit: use emplace hoy: nit: use emplace
		wleiAuthorUnsubmitted Done Reply Inline Actions Done. wlei: Done.
		IRLocations.emplace(LineLocation(Probe->Id, 0), StringRef());
		hoyUnsubmitted Not Done Reply Inline Actions I guess `IRLocations` should also be populated for non-CS. Maybe add a TODO? hoy: I guess `IRLocations` should also be populated for non-CS. Maybe add a TODO?
		wleiAuthorUnsubmitted Done Reply Inline Actions TODO added, Yes, currently only support pseudo probe. wlei: TODO added, Yes, currently only support pseudo probe.
		wenleiUnsubmitted Not Done Reply Inline Actions Looks like we're not populating non-call locations for AutoFDO? In that case, should we assert on CSSPGO to make sure we don't accidentally run this for AutoFDO before its support is complete? wenlei: Looks like we're not populating non-call locations for AutoFDO? In that case, should we assert…
		wleiAuthorUnsubmitted Done Reply Inline Actions Yes, now it's not supported for AutoFDO. It's already under `FunctionSamples::ProfileIsProbeBased`, so this code won't be run in AutoFDO. wlei: Yes, now it's not supported for AutoFDO. It's already under `FunctionSamples…
		}

if (!isa<CallBase>(&I) \|\| isa<IntrinsicInst>(&I))		if (!isa<CallBase>(&I) \|\| isa<IntrinsicInst>(&I))
continue;		continue;

const auto *CB = dyn_cast<CallBase>(&I);		const auto *CB = dyn_cast<CallBase>(&I);
if (auto &DLoc = I.getDebugLoc()) {		if (auto &DLoc = I.getDebugLoc()) {
LineLocation IRCallsite = FunctionSamples::getCallSiteIdentifier(DLoc);		LineLocation IRCallsite = FunctionSamples::getCallSiteIdentifier(DLoc);

StringRef CalleeName;		StringRef CalleeName;
if (Function *Callee = CB->getCalledFunction())		if (Function *Callee = CB->getCalledFunction())
CalleeName = FunctionSamples::getCanonicalFnName(Callee->getName());		CalleeName = FunctionSamples::getCanonicalFnName(Callee->getName());

		// Force to overwrite the callee name in case any non-call location was
		// wrote before.
hoy: Typo: wrote -> written. BTW, have you seen such a collision? With pseudo probes this shouldn't happen. It should not happen with DWARF discriminators either.
wlei (author): Before I added the `isa<IntrinsicInst>(&I)` condition to `(std::optional<PseudoProbe> Probe = extractProbe(I))`, I saw some callee names end up empty because of the emplace above. Since this map records all the anchors, and an anchor should always take priority over a non-anchor, I changed the code here to force the write. It also guards against changes when we support AutoFDO or post-link-time matching. Alternatively, we could use an assertion here.
hoy: I see. Yeah, `extractProbe` on a call instruction gets the call probe. If you want block probes only, you can change the `isa<IntrinsicInst>` check to `isa<PseudoProbeInst>`.
wenlei: For a non-call location, the name should be empty. Maybe assert that we never overwrite a non-empty name?
hoy: An assert sounds good for pseudo probes. The overwrite is possible for AutoFDO without DWARF discriminators enabled.
wlei (author): Sounds good. Changed to use `isa<PseudoProbeInst>` and added the assert for pseudo probes.
hoy: Still seeing `if (FunctionSamples::ProfileIsProbeBased && isa<IntrinsicInst>(&I))` at line 2290. LGTM otherwise.
wlei (author): Oops, I missed that.
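The anchor-priority rule settled in this thread can be sketched in isolation. This is a minimal standalone sketch, not the pass's code: it assumes a simplified location type (a pair of line offset and discriminator) in place of `LineLocation`, and `recordNonAnchor`/`recordAnchor` are hypothetical helper names. Block-probe locations are inserted with `emplace`, which never overwrites, while callsite anchors overwrite an empty entry and assert that no existing non-empty callee name is replaced, as suggested for pseudo probes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for LineLocation: (line offset, discriminator).
using Loc = std::pair<unsigned, unsigned>;

// Block probe: a non-anchor location with an empty name. emplace() leaves an
// existing entry untouched, so a callee name recorded earlier survives.
void recordNonAnchor(std::map<Loc, std::string> &IRLocations, Loc L) {
  IRLocations.emplace(L, std::string());
}

// Call probe: an anchor. Anchors take priority over non-anchors, so the entry
// is overwritten; with pseudo probes a non-empty name should never be replaced.
void recordAnchor(std::map<Loc, std::string> &IRLocations, Loc L,
                  const std::string &CalleeName) {
  auto It = IRLocations.find(L);
  assert((It == IRLocations.end() || It->second.empty()) &&
         "Overwriting an existing anchor");
  IRLocations[L] = CalleeName;
}
```

Either insertion order leaves the callee name in place: a later block probe at the same location cannot erase it, and an earlier one is simply overwritten.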
        IRLocations[IRCallsite] = CalleeName;

        // Go through all the callsites on the IR and flag the callsite if the
        // target name is the same as the one in the profile.
        const auto CTM = FS.findCallTargetMapAt(IRCallsite);
        const auto CallsiteFS = FS.findFunctionSamplesMapAt(IRCallsite);

        // Indirect call case.
        if (CalleeName.empty()) {
          // Since an indirect call has no CalleeName, conservatively check
          // whether the profile has any call target or inlinee at this
          // location. This avoids a large number of false positives, since
          // otherwise all indirect call samples would be reported as
          // mismatched.
          if ((CTM && !CTM->empty()) || (CallsiteFS && !CallsiteFS->empty()))
            MatchedCallsiteLocs.insert(IRCallsite);
        } else {
          // Check if the call target name is matched for direct call case.
          if ((CTM && CTM->count(CalleeName)) ||
              (CallsiteFS && CallsiteFS->count(CalleeName)))
            MatchedCallsiteLocs.insert(IRCallsite);
        }
      }
    }
  }

  // Detect profile mismatch for profile staleness metrics report.
  if (ReportProfileStaleness || PersistProfileStaleness)
    countProfileMismatches(FS, MatchedCallsiteLocs);
wenlei: A bit confused by the use of `_TotalProfiledCallsites` and `_NumMismatchedCallsites`. What are you trying to do with the two additional variables?
wlei (author): `TotalProfiledCallsites` and `NumMismatchedCallsites` are sums over all functions in the module, not over a single function, so for the function-level debug print here I use a local variable.
wenlei: OK, now I see what's going on. The code as is can be confusing (both the variable naming and the subtraction needed to get per-function stats). I suggest making `countProfileMismatches` return its results through two `int&` parameters, so it doesn't change global state as a way of producing output, which is inconsistent with how it takes the function as input through parameters. Then pass `FuncMismatchedCallsites` and `FuncProfiledCallsites` into `countProfileMismatches` to get its output. On the caller side, accumulate `FuncMismatchedCallsites` and `FuncProfiledCallsites` into `NumMismatchedCallsites` and `TotalProfiledCallsites`.
wlei (author): Yeah, it's clearer this way. Done.
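The out-parameter refactoring wenlei suggests can be sketched as follows. This is a hypothetical standalone version, not the pass's code: per-function input is reduced to a vector of per-callsite "matched" flags, `StalenessTotals` and `addFunction` are illustrative names, and the counters use `uint64_t&` rather than `int&`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-function counting: results go out through reference parameters instead
// of being derived by subtracting module-wide totals before and after.
void countProfileMismatches(const std::vector<bool> &ProfiledCallsiteMatched,
                            uint64_t &FuncMismatchedCallsites,
                            uint64_t &FuncProfiledCallsites) {
  FuncMismatchedCallsites = 0;
  FuncProfiledCallsites = ProfiledCallsiteMatched.size();
  for (bool Matched : ProfiledCallsiteMatched)
    if (!Matched)
      ++FuncMismatchedCallsites;
}

// Caller side: accumulate the per-function counts into module-wide totals.
struct StalenessTotals {
  uint64_t NumMismatchedCallsites = 0;
  uint64_t TotalProfiledCallsites = 0;

  void addFunction(const std::vector<bool> &ProfiledCallsiteMatched) {
    uint64_t FuncMismatchedCallsites = 0, FuncProfiledCallsites = 0;
    countProfileMismatches(ProfiledCallsiteMatched, FuncMismatchedCallsites,
                           FuncProfiledCallsites);
    // A function-level debug print can use the Func* values directly here.
    NumMismatchedCallsites += FuncMismatchedCallsites;
    TotalProfiledCallsites += FuncProfiledCallsites;
  }
};
```

The counting function no longer touches global state, so the per-function numbers needed for the debug print come straight from its outputs.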

  if (IsFuncHashMismatch && SalvageStaleProfile) {
hoy: BTW, I'm wondering if you've ever seen mismatched callsites when the function hash matches. The hash counts the number of callsites but not their order.
wlei (author): Good question. Yes, there are many mismatched callsites even when the hash matches; the current work only handles the case where a checksum mismatch is detected. There is a general question of whether we can turn this on for all functions, that is, whether the matching algorithm handles a non-stale profile perfectly. The current heuristic is "first come, first matched", but not all functions are in the profile (some functions may not hit any samples), so it could produce inconsistent anchors for a non-stale profile and then cause a mismatch. To solve this, I think we can try: (1) use a stricter checksum, e.g. one that also accounts for callsite order; (2) find a threshold from the mismatch metrics to control it; (3) use a different heuristic, such as searching for the closest location, which handles a non-stale profile well but needs more measurement on mismatched functions. This is also an issue blocking AutoFDO, since AutoFDO doesn't have the checksum.
wenlei: It might be useful to have a debug print or `STATISTIC` so we know when and how often this happens. It's essentially a hash collision.
hoy: Good point. Counting how many direct callsites have mismatched targets between the profile and the IR would be helpful.
hoy: Also, as discussed offline, since IR callsites are matched with profile callsites sequentially, it may be interesting to see how many same-named callsites fail to get a match. If that's non-trivial, adding zero-count callsites to the profile may allow for a more precise match. Of course that would increase profile size, but I'm curious whether it's worth it.
wlei (author):
> it might be useful to have a debug print, or STATISTIC so we know when and how often this happens. It's essentially hash collision.

We already have the global statistic report for mismatched callsites (`TotalProfiledCallsites` under `ReportProfileStaleness`), and I added a debug print for function-level debugging.

> Also as discussed offline, since IR callsites are matched with profile callsites sequentially, it may be interesting to see how many same-named callsites are not able to get a match. If it's non-trivial, adding zero-count callsites in the profile may allow for a more precise match. Of course that's going to increase profile size, but just curious if it is worth.

Yeah, we could do an offline analysis of how many callsites whose names can be found in the profile still remain mismatched after matching. We could add the zero-count callsites in llvm-profgen; doing it only for the top level should be enough, and I guess the size should be OK with extbinary. Alternatively, we could use a "search closest location" heuristic, so that for a fresh profile, where the profile and the IR have the same locations, each IR location is always matched to the same location in the profile.
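The "search closest location" heuristic is only proposed in this thread, not implemented in this revision. A hypothetical sketch of it, assuming candidates are the line offsets of profile callsites that share the IR callsite's callee name (discriminators ignored), and that ties go to the smaller offset; `findClosestCandidate` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical "closest location" heuristic: among profile callsites with the
// same callee name, pick the one whose line offset is nearest to the IR line
// offset. std::set iterates in ascending order and the comparison is strict,
// so ties resolve to the smaller offset.
std::optional<uint32_t>
findClosestCandidate(uint32_t IRLineOffset,
                     const std::set<uint32_t> &Candidates) {
  std::optional<uint32_t> Best;
  uint32_t BestDist = 0;
  for (uint32_t C : Candidates) {
    uint32_t Dist = C > IRLineOffset ? C - IRLineOffset : IRLineOffset - C;
    if (!Best || Dist < BestDist) {
      Best = C;
      BestDist = Dist;
    }
  }
  return Best;
}
```

For a fresh profile the IR offset is itself a candidate, so every callsite maps to itself, which is the property wlei points out. For a shifted profile the result can differ from sequential matching: with `bar` candidates {7, 9} (as in this revision's test profile), IR offset 14 would pick 9, whereas the test expects 14 to match 7. This illustrates why the heuristic would need more measurement on mismatched functions.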
    LLVM_DEBUG(dbgs() << "Run stale profile matching for " << F.getName()
                      << "\n");

    StringMap<std::set<LineLocation>> CalleeToCallsitesMap;
    populateProfileCallsites(FS, CalleeToCallsitesMap);

    // The matching result will be saved to IRToProfileLocationMap.
    auto &IRToProfileLocationMap = getOrCreateIRToProfileLocationMap(F);

    runStaleProfileMatching(IRLocations, CalleeToCallsitesMap,
                            IRToProfileLocationMap);
  }
}
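The direct/indirect callsite check in `runOnFunction` can be isolated as a small predicate. This is a standalone sketch, not the pass's code: `isCallsiteMatched` is a hypothetical name, the call targets at a location are modeled as a `std::map` from target name to count, and the inlinee profiles at a callsite as a set of callee names (the real code queries `findCallTargetMapAt` and `findFunctionSamplesMapAt`).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Simplified stand-ins for the profile data at one location.
using CallTargetMap = std::map<std::string, unsigned>; // target -> count
using InlineeNames = std::set<std::string>;            // inlined callee names

// Indirect call (empty callee name): conservatively matched if the profile has
// any call target or inlinee here. Direct call: matched only if its callee
// name appears among the call targets or inlinees.
bool isCallsiteMatched(const std::string &CalleeName, const CallTargetMap *CTM,
                       const InlineeNames *Inlinees) {
  if (CalleeName.empty())
    return (CTM && !CTM->empty()) || (Inlinees && !Inlinees->empty());
  return (CTM && CTM->count(CalleeName)) ||
         (Inlinees && Inlinees->count(CalleeName));
}
```

For example, at a location with targets `bar:60 dummy_calltarget:50`, a direct call to `bar` and an indirect call both count as matched, while a direct call to any other function does not.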

void SampleProfileMatcher::runOnModule() {
  for (auto &F : M) {
    if (F.isDeclaration() || !F.hasFnAttribute("use-sample-profile"))
      continue;
    FunctionSamples *FS = nullptr;
wenlei: I'm wondering if we should just create the map here and turn the `getOrCreateIRToProfileLocationMap` API into a simple getter. There seems to be a very clear separation between when we need to create and when we need to get, since there's no case for on-demand creation. Hence, a simple, separate API might be cleaner and less error-prone.
wlei (author): Yes, the matching should run only once per function here, so it makes sense to just create the map.
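One way to express the create-once/get split as a hypothetical sketch: `MatchingResults`, `create` and `get` are illustrative names, not the pass's API, and `LocToLocMap` is modeled with `std::map` over (line offset, discriminator) pairs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified (line offset, discriminator) location and IR-to-profile map.
using Loc = std::pair<uint32_t, uint32_t>;
using LocToLocMap = std::map<Loc, Loc>;

// Hypothetical holder separating creation (once per function, right before
// matching) from lookup (a plain getter with no side effects).
class MatchingResults {
  std::map<std::string, LocToLocMap> FuncMappings;

public:
  // Creates the map for FuncName; asserts that matching has not already run.
  LocToLocMap &create(const std::string &FuncName) {
    auto Inserted = FuncMappings.try_emplace(FuncName);
    assert(Inserted.second && "Matching should run only once per function");
    return Inserted.first->second;
  }

  // Returns the map for FuncName, or nullptr if the function was not matched.
  const LocToLocMap *get(const std::string &FuncName) const {
    auto It = FuncMappings.find(FuncName);
    return It == FuncMappings.end() ? nullptr : &It->second;
  }
};
```

A second `create` for the same function trips the assertion, which turns an accidental double run of the matcher into an immediate failure instead of a silently reused map.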
    if (FlattenProfileForMatching)
      FS = getFlattenedSamplesFor(F);
    else
      FS = Reader.getSamplesFor(F);
    if (!FS)
      continue;
    runOnFunction(F, *FS);
  }
  if (SalvageStaleProfile)
    distributeIRToProfileLocationMap();

  if (ReportProfileStaleness) {
    if (FunctionSamples::ProfileIsProbeBased) {
      errs() << "(" << NumMismatchedFuncHash << "/" << TotalProfiledFunc << ")"
             << " of functions' profile are invalid and "
             << " (" << MismatchedFuncHashSamples << "/" << TotalFuncHashSamples
             << ")"
             << " of samples are discarded due to function hash mismatch.\n";
... (26 lines not shown) ...
  if (PersistProfileStaleness) {
    ProfStatsVec.emplace_back("TotalCallsiteSamples", TotalCallsiteSamples);

    auto *MD = MDB.createLLVMStats(ProfStatsVec);
    auto *NMD = M.getOrInsertNamedMetadata("llvm.stats");
    NMD->addOperand(MD);
  }
}

void SampleProfileMatcher::distributeIRToProfileLocationMap(
    FunctionSamples &FS) {
  const auto ProfileMappings = FuncToMatchingsMap.find(FS.getName());
  if (ProfileMappings != FuncToMatchingsMap.end()) {
    FS.setIRToProfileLocationMap(&(ProfileMappings->second));
  }

  for (auto &Inlinees : FS.getCallsiteSamples()) {
    for (auto FS : Inlinees.second) {
      distributeIRToProfileLocationMap(FS.second);
    }
  }
}

// Use a central place to distribute the matching results. Outlined and inlined
// profiles with the same function name are set to the same pointer.
void SampleProfileMatcher::distributeIRToProfileLocationMap() {
  for (auto &I : Reader.getProfiles()) {
    distributeIRToProfileLocationMap(I.second);
  }
}
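The distribution step can be illustrated on a toy profile tree. This is a hypothetical standalone sketch: `ProfileNode` stands in for `FunctionSamples` (inlinees are flattened into one vector instead of being keyed by callsite), and `distribute` for `distributeIRToProfileLocationMap`. It shows how every node with a matched function name, outlined or inlined, ends up pointing at the same per-function map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified (line offset, discriminator) location and location map.
using Loc = std::pair<unsigned, unsigned>;
using LocToLocMap = std::map<Loc, Loc>;

// Hypothetical stand-in for a nested profile: a function name, a pointer to
// shared matching results, and the inlinee profiles under this node.
struct ProfileNode {
  std::string Name;
  const LocToLocMap *IRToProfileLocationMap = nullptr;
  std::vector<ProfileNode> Inlinees;
};

// Point every node whose function has matching results at the single map
// owned by FuncMappings. Children are visited by reference so the pointer is
// set on the tree itself rather than on a temporary copy.
void distribute(ProfileNode &Node,
                const std::map<std::string, LocToLocMap> &FuncMappings) {
  auto It = FuncMappings.find(Node.Name);
  if (It != FuncMappings.end())
    Node.IRToProfileLocationMap = &It->second;
  for (ProfileNode &Inlinee : Node.Inlinees)
    distribute(Inlinee, FuncMappings);
}
```

Sharing one map per function name keeps memory proportional to the number of matched functions rather than to the number of inlined copies in the profile.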

bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
                                      ProfileSummaryInfo *_PSI,
                                      LazyCallGraph &CG) {
  GUIDToFuncNameMapper Mapper(M, *Reader, GUIDToFuncNameMap);

  PSI = _PSI;
  if (M.getProfileSummary(/* IsCS */ false) == nullptr) {
    M.setProfileSummary(Reader->getSummary().getMD(M.getContext()),
... (166 lines not shown) ...

llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof

This file was added.

main:1497:0
 1: 0
 2: 112
 3: 112 bar:60 dummy_calltarget:50
 4: 116
 5: 0
 7: 124 bar:124
 9: 126 bar:126
hoy: Add a case for multiple call targets at the same location?
wlei (author): Sounds good. Added a line with multiple call targets.
 6: foo:452
  1: 112
  2: 101
  3: 13
  4: 112
  5: 101 bar:109
  6: 13 bar:14
  !CFGChecksum: 563022570642068
 8: foo:472
  1: 117
  2: 104
  3: 13
  4: 121
  5: 104 bar:104
  6: 14 bar:14
  !CFGChecksum: 563022570642068
 !CFGChecksum: 1125988587804525
bar:491:491
 1: 491
 !CFGChecksum: 4294967295
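Each body line in the profile above has the shape `offset[.discriminator]: count [target:count ...]`, while lines like `6: foo:452` open an inlinee profile. A simplified, hypothetical parser for a single body line (indentation already stripped; `BodyLine`, `parseBodyLine` and `anchorName` are illustrative names, not the LLVM reader's API). It assumes, consistent with the expected matching in this test where the `bar` at line 3 is never used as an anchor, that a location with more than one call target is treated as a possible indirect call and contributes no anchor.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

struct BodyLine {
  unsigned LineOffset = 0;
  unsigned Discriminator = 0;
  unsigned long long Count = 0;
  std::map<std::string, unsigned long long> CallTargets;
};

// Parses e.g. "3: 112 bar:60 dummy_calltarget:50". Returns nullopt for inlinee
// headers ("6: foo:452"), metadata ("!CFGChecksum: ...") and malformed input.
std::optional<BodyLine> parseBodyLine(const std::string &Text) {
  std::istringstream In(Text);
  std::string LocTok;
  if (!(In >> LocTok) || LocTok.size() < 2 || LocTok.back() != ':')
    return std::nullopt;
  LocTok.pop_back(); // drop the trailing ':'
  BodyLine L;
  try {
    size_t Dot = LocTok.find('.');
    L.LineOffset = std::stoul(LocTok.substr(0, Dot));
    if (Dot != std::string::npos)
      L.Discriminator = std::stoul(LocTok.substr(Dot + 1));
    if (!(In >> L.Count)) // inlinee headers have "name:count" here instead
      return std::nullopt;
    std::string Target;
    while (In >> Target) {
      size_t Colon = Target.rfind(':');
      if (Colon == std::string::npos)
        return std::nullopt;
      L.CallTargets[Target.substr(0, Colon)] =
          std::stoull(Target.substr(Colon + 1));
    }
  } catch (const std::exception &) { // std::stoul / std::stoull failures
    return std::nullopt;
  }
  return L;
}

// Only a location with exactly one call target yields an anchor name.
std::optional<std::string> anchorName(const BodyLine &L) {
  if (L.CallTargets.size() == 1)
    return L.CallTargets.begin()->first;
  return std::nullopt;
}
```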

llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll

This file was added.

				; REQUIRES: x86_64-linux
				; REQUIRES: asserts
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-stale-profile-matching.prof --salvage-stale-profile -S --debug-only=sample-profile 2>&1 | FileCheck %s

				; The profiled source code:

				; volatile int x = 1;
				; __attribute__((noinline)) int bar(int p) {
				; return p;
				; }

				; __attribute__((always_inline)) int foo(int i, int p) {
				; if (i % 10) return bar(p);
				; else return bar(p + 1);
				; }

				; int main() {
				; for (int i = 0; i < 1000 * 1000; i++) {
				; x += foo(i, x);
				; x += bar(x);
				; x += foo(i, x);
				; x += bar(x);
				; }
				; }

				; The source code for the current build:

				; volatile int x = 1;
				; __attribute__((noinline)) int bar(int p) {
				; return p;
				; }

				; __attribute__((always_inline)) int foo(int i, int p) {
				; if (i % 10) return bar(p);
				; else return bar(p + 1);
				; }

				; int main() {
				; if (x == 0) // code change
				; return 0; // code change
				; for (int i = 0; i < 1000 * 1000; i++) {
				; x += foo(i, x);
				; x += bar(x);
				; if (i < 0) // code change
				; return 0; // code change
				; x += foo(i, x);
				; x += bar(x);
				; }
				; }


				; CHECK: Run stale profile matching for main

				; CHECK: Location is matched from 1 to 1
				; CHECK: Location is matched from 2 to 2
				; CHECK: Location is matched from 3 to 3
				; CHECK: Location is matched from 4 to 4
				; CHECK: Location is matched from 5 to 5
				; CHECK: Location is matched from 6 to 6
				; CHECK: Location is matched from 7 to 7
				; CHECK: Location is matched from 8 to 8
				; CHECK: Location is matched from 9 to 9
				; CHECK: Location is matched from 10 to 10
				; CHECK: Location is matched from 11 to 11

				; CHECK: Callsite with callee:foo is matched from 13 to 6
				; CHECK: Location is rematched backwards from 7 to 0
				; CHECK: Location is rematched backwards from 8 to 1
				; CHECK: Location is rematched backwards from 9 to 2
				; CHECK: Location is rematched backwards from 10 to 3
				; CHECK: Location is rematched backwards from 11 to 4
				; CHECK: Callsite with callee:bar is matched from 14 to 7
				; CHECK: Callsite with callee:foo is matched from 15 to 8
				; CHECK: Callsite with callee:bar is matched from 16 to 9


				; CHECK: 2: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 2, i32 0, i64 -1), !dbg !60 - weight: 112 - factor: 1.00)
				; CHECK: 3: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 3, i32 0, i64 -1), !dbg !61 - weight: 112 - factor: 1.00)
				; CHECK: 4: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 4, i32 0, i64 -1), !dbg !65 - weight: 116 - factor: 1.00)
				; CHECK: 5: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 5, i32 0, i64 -1), !dbg !68 - weight: 0 - factor: 1.00)
				; CHECK: 1: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 1, i32 0, i64 -1), !dbg !81 - weight: 112 - factor: 1.00)
				; CHECK: 2: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 2, i32 0, i64 -1), !dbg !85 - weight: 101 - factor: 1.00)
				; CHECK: 5: %call.i3 = call i32 @bar(i32 noundef %1), !dbg !86 - weight: 101 - factor: 1.00)
				; CHECK: 3: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 3, i32 0, i64 -1), !dbg !89 - weight: 13 - factor: 1.00)
				; CHECK: 6: %call1.i6 = call i32 @bar(i32 noundef %add.i5), !dbg !91 - weight: 13 - factor: 1.00)
				; CHECK: 4: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 4, i32 0, i64 -1), !dbg !95 - weight: 112 - factor: 1.00)
				; CHECK: 14: %call2 = call i32 @bar(i32 noundef %3), !dbg !98 - weight: 124 - factor: 1.00)
				; CHECK: 8: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 8, i32 0, i64 -1), !dbg !104 - weight: 0 - factor: 1.00)
				; CHECK: 1: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 1, i32 0, i64 -1), !dbg !109 - weight: 117 - factor: 1.00)
				; CHECK: 2: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 2, i32 0, i64 -1), !dbg !112 - weight: 104 - factor: 1.00)
				; CHECK: 5: %call.i = call i32 @bar(i32 noundef %5), !dbg !113 - weight: 104 - factor: 1.00)
				; CHECK: 3: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 3, i32 0, i64 -1), !dbg !115 - weight: 13 - factor: 1.00)
				; CHECK: 6: %call1.i = call i32 @bar(i32 noundef %add.i), !dbg !117 - weight: 14 - factor: 1.00)
				; CHECK: 4: call void @llvm.pseudoprobe(i64 6699318081062747564, i64 4, i32 0, i64 -1), !dbg !120 - weight: 121 - factor: 1.00)
				; CHECK: 16: %call9 = call i32 @bar(i32 noundef %7), !dbg !123 - weight: 126 - factor: 1.00)
				; CHECK: 9: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 9, i32 0, i64 -1), !dbg !126 - weight: 112 - factor: 1.00)
				; CHECK: 10: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 10, i32 0, i64 -1), !dbg !131 - weight: 112 - factor: 1.00)
				; CHECK: 11: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 11, i32 0, i64 -1), !dbg !132 - weight: 116 - factor: 1.00)
				; CHECK: 1: call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 1, i32 0, i64 -1), !dbg !52 - weight: 0 - factor: 1.00)


				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@x = dso_local global i32 1, align 4, !dbg !0

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32 @bar(i32 noundef %p) #0 !dbg !16 {
				entry:
				call void @llvm.dbg.value(metadata i32 %p, metadata !20, metadata !DIExpression()), !dbg !21
				call void @llvm.pseudoprobe(i64 -2012135647395072713, i64 1, i32 0, i64 -1), !dbg !22
				ret i32 %p, !dbg !23
				}

				; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
				declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

				; Function Attrs: alwaysinline nounwind uwtable
				define dso_local i32 @foo(i32 noundef %i, i32 noundef %p) #2 !dbg !24 {
				entry:
				call void @llvm.dbg.value(metadata i32 %i, metadata !28, metadata !DIExpression()), !dbg !30
				call void @llvm.dbg.value(metadata i32 %p, metadata !29, metadata !DIExpression()), !dbg !30
				call void @llvm.pseudoprobe(i64 6699318081062747564, i64 1, i32 0, i64 -1), !dbg !31
				%rem = srem i32 %i, 10, !dbg !33
				%tobool = icmp ne i32 %rem, 0, !dbg !33
				br i1 %tobool, label %if.then, label %if.else, !dbg !34

				if.then: ; preds = %entry
				call void @llvm.pseudoprobe(i64 6699318081062747564, i64 2, i32 0, i64 -1), !dbg !35
				%call = call i32 @bar(i32 noundef %p), !dbg !36
				br label %return, !dbg !38

				if.else: ; preds = %entry
				call void @llvm.pseudoprobe(i64 6699318081062747564, i64 3, i32 0, i64 -1), !dbg !39
				%add = add nsw i32 %p, 1, !dbg !40
				%call1 = call i32 @bar(i32 noundef %add), !dbg !41
				br label %return, !dbg !43

				return: ; preds = %if.else, %if.then
				%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ], !dbg !44
				call void @llvm.pseudoprobe(i64 6699318081062747564, i64 4, i32 0, i64 -1), !dbg !45
				ret i32 %retval.0, !dbg !45
				}

				; Function Attrs: nounwind uwtable
				define dso_local i32 @main() #3 !dbg !46 {
				entry:
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 1, i32 0, i64 -1), !dbg !52
				%0 = load volatile i32, ptr @x, align 4, !dbg !52, !tbaa !54
				%cmp = icmp eq i32 %0, 0, !dbg !58
				br i1 %cmp, label %if.then, label %if.end, !dbg !59

				if.then: ; preds = %entry
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 2, i32 0, i64 -1), !dbg !60
				br label %for.end, !dbg !60

				if.end: ; preds = %entry
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 3, i32 0, i64 -1), !dbg !61
				call void @llvm.dbg.value(metadata i32 0, metadata !50, metadata !DIExpression()), !dbg !62
				br label %for.cond, !dbg !63

				for.cond: ; preds = %if.end6, %if.end
				%i.0 = phi i32 [ 0, %if.end ], [ %inc, %if.end6 ], !dbg !64
				call void @llvm.dbg.value(metadata i32 %i.0, metadata !50, metadata !DIExpression()), !dbg !62
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 4, i32 0, i64 -1), !dbg !65
				%cmp1 = icmp slt i32 %i.0, 1000000, !dbg !67
				br i1 %cmp1, label %for.body, label %for.cond.cleanup, !dbg !68

				for.cond.cleanup: ; preds = %for.cond
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 5, i32 0, i64 -1), !dbg !68
				br label %cleanup, !dbg !68

				for.body: ; preds = %for.cond
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 6, i32 0, i64 -1), !dbg !69
				%1 = load volatile i32, ptr @x, align 4, !dbg !71, !tbaa !54
				%call = call i32 @foo(i32 noundef %i.0, i32 noundef %1), !dbg !72
				%2 = load volatile i32, ptr @x, align 4, !dbg !74, !tbaa !54
				%add = add nsw i32 %2, %call, !dbg !74
				store volatile i32 %add, ptr @x, align 4, !dbg !74, !tbaa !54
				%3 = load volatile i32, ptr @x, align 4, !dbg !75, !tbaa !54
				%call2 = call i32 @bar(i32 noundef %3), !dbg !76
				%4 = load volatile i32, ptr @x, align 4, !dbg !78, !tbaa !54
				%add3 = add nsw i32 %4, %call2, !dbg !78
				store volatile i32 %add3, ptr @x, align 4, !dbg !78, !tbaa !54
				br i1 false, label %if.then5, label %if.end6, !dbg !79

				if.then5: ; preds = %for.body
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 7, i32 0, i64 -1), !dbg !80
				br label %cleanup, !dbg !80

				if.end6: ; preds = %for.body
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 8, i32 0, i64 -1), !dbg !82
				%5 = load volatile i32, ptr @x, align 4, !dbg !83, !tbaa !54
				%call7 = call i32 @foo(i32 noundef %i.0, i32 noundef %5), !dbg !84
				%6 = load volatile i32, ptr @x, align 4, !dbg !86, !tbaa !54
				%add8 = add nsw i32 %6, %call7, !dbg !86
				store volatile i32 %add8, ptr @x, align 4, !dbg !86, !tbaa !54
				%7 = load volatile i32, ptr @x, align 4, !dbg !87, !tbaa !54
				%call9 = call i32 @bar(i32 noundef %7), !dbg !88
				%8 = load volatile i32, ptr @x, align 4, !dbg !90, !tbaa !54
				%add10 = add nsw i32 %8, %call9, !dbg !90
				store volatile i32 %add10, ptr @x, align 4, !dbg !90, !tbaa !54
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 9, i32 0, i64 -1), !dbg !91
				%inc = add nsw i32 %i.0, 1, !dbg !91
				call void @llvm.dbg.value(metadata i32 %inc, metadata !50, metadata !DIExpression()), !dbg !62
				br label %for.cond, !dbg !92, !llvm.loop !93

				cleanup: ; preds = %if.then5, %for.cond.cleanup
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 10, i32 0, i64 -1), !dbg !96
				br label %for.end

				for.end: ; preds = %cleanup, %if.then
				call void @llvm.pseudoprobe(i64 -2624081020897602054, i64 11, i32 0, i64 -1), !dbg !97
				ret i32 0, !dbg !97
				}

				; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
				declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #4

				; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
				declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #4

				; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
				declare void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata) #1

				; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite)
				declare void @llvm.pseudoprobe(i64, i64, i32, i64) #5

				; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
				declare void @llvm.dbg.value(metadata, metadata, metadata) #6

				attributes #0 = { noinline nounwind uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #1 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }
				attributes #2 = { alwaysinline nounwind uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #3 = { nounwind uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "use-sample-profile" }
				attributes #4 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
				attributes #5 = { mustprogress nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite) }
				attributes #6 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!7, !8, !9, !10, !11}
				!llvm.ident = !{!12}
				!llvm.pseudo_probe_desc = !{!13, !14, !15}

				!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
				!1 = distinct !DIGlobalVariable(name: "x", scope: !2, file: !3, line: 1, type: !5, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C11, file: !3, producer: "clang version 17.0.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, nameTableKind: None)
				!3 = !DIFile(filename: "test.c", directory: "path")
				!4 = !{!0}
				!5 = !DIDerivedType(tag: DW_TAG_volatile_type, baseType: !6)
				!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!7 = !{i32 7, !"Dwarf Version", i32 5}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"wchar_size", i32 4}
				!10 = !{i32 7, !"uwtable", i32 2}
				!11 = !{i32 7, !"debug-info-assignment-tracking", i1 true}
				!12 = !{!"clang version 17.0.0"}
				!13 = !{i64 -2012135647395072713, i64 4294967295, !"bar"}
				!14 = !{i64 6699318081062747564, i64 563022570642068, !"foo"}
				!15 = !{i64 -2624081020897602054, i64 1126158552146340, !"main"}
				!16 = distinct !DISubprogram(name: "bar", scope: !3, file: !3, line: 2, type: !17, scopeLine: 2, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !19)
				!17 = !DISubroutineType(types: !18)
				!18 = !{!6, !6}
				!19 = !{!20}
				!20 = !DILocalVariable(name: "p", arg: 1, scope: !16, file: !3, line: 2, type: !6)
				!21 = !DILocation(line: 0, scope: !16)
				!22 = !DILocation(line: 3, column: 10, scope: !16)
				!23 = !DILocation(line: 3, column: 3, scope: !16)
				!24 = distinct !DISubprogram(name: "foo", scope: !3, file: !3, line: 6, type: !25, scopeLine: 6, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !27)
				!25 = !DISubroutineType(types: !26)
				!26 = !{!6, !6, !6}
				!27 = !{!28, !29}
				!28 = !DILocalVariable(name: "i", arg: 1, scope: !24, file: !3, line: 6, type: !6)
				!29 = !DILocalVariable(name: "p", arg: 2, scope: !24, file: !3, line: 6, type: !6)
				!30 = !DILocation(line: 0, scope: !24)
				!31 = !DILocation(line: 7, column: 6, scope: !32)
				!32 = distinct !DILexicalBlock(scope: !24, file: !3, line: 7, column: 6)
				!33 = !DILocation(line: 7, column: 8, scope: !32)
				!34 = !DILocation(line: 7, column: 6, scope: !24)
				!35 = !DILocation(line: 7, column: 26, scope: !32)
				!36 = !DILocation(line: 7, column: 22, scope: !37)
				!37 = !DILexicalBlockFile(scope: !32, file: !3, discriminator: 186646575)
				!38 = !DILocation(line: 7, column: 14, scope: !32)
				!39 = !DILocation(line: 8, column: 19, scope: !32)
				!40 = !DILocation(line: 8, column: 21, scope: !32)
				!41 = !DILocation(line: 8, column: 15, scope: !42)
				!42 = !DILexicalBlockFile(scope: !32, file: !3, discriminator: 186646583)
				!43 = !DILocation(line: 8, column: 8, scope: !32)
				!44 = !DILocation(line: 0, scope: !32)
				!45 = !DILocation(line: 9, column: 1, scope: !24)
				!46 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 11, type: !47, scopeLine: 11, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !49)
				!47 = !DISubroutineType(types: !48)
				!48 = !{!6}
				!49 = !{!50}
				!50 = !DILocalVariable(name: "i", scope: !51, file: !3, line: 14, type: !6)
				!51 = distinct !DILexicalBlock(scope: !46, file: !3, line: 14, column: 3)
				!52 = !DILocation(line: 12, column: 6, scope: !53)
				!53 = distinct !DILexicalBlock(scope: !46, file: !3, line: 12, column: 6)
				!54 = !{!55, !55, i64 0}
				!55 = !{!"int", !56, i64 0}
				!56 = !{!"omnipotent char", !57, i64 0}
				!57 = !{!"Simple C/C++ TBAA"}
				!58 = !DILocation(line: 12, column: 8, scope: !53)
				!59 = !DILocation(line: 12, column: 6, scope: !46)
				!60 = !DILocation(line: 13, column: 5, scope: !53)
				!61 = !DILocation(line: 14, column: 11, scope: !51)
				!62 = !DILocation(line: 0, scope: !51)
				!63 = !DILocation(line: 14, column: 7, scope: !51)
				!64 = !DILocation(line: 14, scope: !51)
				!65 = !DILocation(line: 14, column: 18, scope: !66)
				!66 = distinct !DILexicalBlock(scope: !51, file: !3, line: 14, column: 3)
				!67 = !DILocation(line: 14, column: 20, scope: !66)
				!68 = !DILocation(line: 14, column: 3, scope: !51)
				!69 = !DILocation(line: 15, column: 15, scope: !70)
				!70 = distinct !DILexicalBlock(scope: !66, file: !3, line: 14, column: 40)
				!71 = !DILocation(line: 15, column: 18, scope: !70)
				!72 = !DILocation(line: 15, column: 11, scope: !73)
				!73 = !DILexicalBlockFile(scope: !70, file: !3, discriminator: 186646639)
				!74 = !DILocation(line: 15, column: 8, scope: !70)
				!75 = !DILocation(line: 16, column: 15, scope: !70)
				!76 = !DILocation(line: 16, column: 11, scope: !77)
				!77 = !DILexicalBlockFile(scope: !70, file: !3, discriminator: 186646647)
				!78 = !DILocation(line: 16, column: 8, scope: !70)
				!79 = !DILocation(line: 17, column: 9, scope: !70)
				!80 = !DILocation(line: 18, column: 8, scope: !81)
				!81 = distinct !DILexicalBlock(scope: !70, file: !3, line: 17, column: 9)
				!82 = !DILocation(line: 19, column: 15, scope: !70)
				!83 = !DILocation(line: 19, column: 18, scope: !70)
				!84 = !DILocation(line: 19, column: 11, scope: !85)
				!85 = !DILexicalBlockFile(scope: !70, file: !3, discriminator: 186646655)
				!86 = !DILocation(line: 19, column: 8, scope: !70)
				!87 = !DILocation(line: 20, column: 15, scope: !70)
				!88 = !DILocation(line: 20, column: 11, scope: !89)
				!89 = !DILexicalBlockFile(scope: !70, file: !3, discriminator: 186646663)
				!90 = !DILocation(line: 20, column: 8, scope: !70)
				!91 = !DILocation(line: 14, column: 36, scope: !66)
				!92 = !DILocation(line: 14, column: 3, scope: !66)
				!93 = distinct !{!93, !68, !94, !95}
				!94 = !DILocation(line: 21, column: 3, scope: !51)
				!95 = !{!"llvm.loop.mustprogress"}
				!96 = !DILocation(line: 0, scope: !46)
				!97 = !DILocation(line: 22, column: 1, scope: !46)
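The CHECK lines in this test pin down the expected matching for `main`. The standalone C++17 sketch below reproduces them. It is a simplified reconstruction from that expected debug output, not the pass's actual code: it uses plain std containers instead of `LineLocation`/`StringMap`, ignores discriminators (all zero here), and assumes profile locations with more than one call target (line 3 in the profile) are excluded from the anchors. `matchLocations` and the type aliases are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using IRLocationList = std::map<uint32_t, std::string>; // offset -> callee ("" = non-anchor)
using CalleeToCallsites = std::map<std::string, std::set<uint32_t>>;
using LocMap = std::map<uint32_t, uint32_t>;            // IR offset -> profile offset

// - An IR anchor takes the first unused profile callsite with the same callee.
// - Non-anchors are matched forwards using the delta of the last matched anchor.
// - When a new anchor matches, the second half of the non-anchors seen since
//   the previous anchor is re-matched backwards from the new anchor.
// (Offsets are not range-checked; a negative result would wrap around.)
LocMap matchLocations(const IRLocationList &IR, CalleeToCallsites Profile) {
  LocMap Result;
  int64_t Delta = 0;
  std::vector<uint32_t> PendingNonAnchors;
  for (const auto &[Loc, Callee] : IR) {
    auto It = Callee.empty() ? Profile.end() : Profile.find(Callee);
    if (It != Profile.end() && !It->second.empty()) {
      uint32_t Target = *It->second.begin();
      It->second.erase(It->second.begin()); // each profile callsite used once
      Result[Loc] = Target;
      Delta = int64_t(Target) - int64_t(Loc);
      for (size_t I = (PendingNonAnchors.size() + 1) / 2;
           I < PendingNonAnchors.size(); ++I)
        Result[PendingNonAnchors[I]] =
            uint32_t(int64_t(PendingNonAnchors[I]) + Delta);
      PendingNonAnchors.clear();
    } else {
      Result[Loc] = uint32_t(int64_t(Loc) + Delta);
      PendingNonAnchors.push_back(Loc);
    }
  }
  return Result;
}
```

With IR block probes 1 to 11, calls `foo`/`bar`/`foo`/`bar` at 13 to 16, and profile anchors `foo` at {6, 8} and `bar` at {7, 9}, the first `foo` anchor gives a delta of -7, so locations 7 to 11 are rematched backwards to 0 to 4, and the four calls map to 6, 7, 8 and 9, as the CHECK lines expect.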



Diff 514315

llvm/lib/Transforms/IPO/SampleProfile.cpp

llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof

llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll
