This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Retire FlattenProfileForMatching
ClosedPublic

Authored by wlei on Aug 25 2023, 2:13 PM.

Details

Summary
  • Always use flattened profile to find the profile anchors. Since profile under different contexts may have different inlined callsites, to get more profile anchors, we use a merged profile from all the contexts(the flattened profile) to find callsite anchors.
  • Compute the staleness metrics based on the original nested profile, as currently once a callsite is mismatched, all its children profile are dropped.(TODO: in future, we can improve to reuse the children valid profile)

Diff Detail

Event Timeline

wlei created this revision.Aug 25 2023, 2:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2023, 2:13 PM
wlei requested review of this revision.Aug 25 2023, 2:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2023, 2:13 PM
wlei edited the summary of this revision. (Show Details)Aug 25 2023, 2:19 PM
wlei added reviewers: hoy, wenlei.
wenlei accepted this revision.Aug 25 2023, 2:24 PM

lgtm except nits, thanks.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2379

nit: move this right before the use on line 2398?

2380–2382

nit: it may be useful to emphasize the benefit of use flattened profile for matching, i.e. it's not an arbitrary choice, but we actually see benefit from doing that.

This revision is now accepted and ready to land.Aug 25 2023, 2:24 PM
wlei updated this revision to Diff 553678.Aug 25 2023, 5:38 PM

Updating D158891: [CSSPGO] Retire FlattenProfileForMatching

llvm/lib/Transforms/IPO/SampleProfile.cpp
2379

done!

2380–2382

Refined the comments.

hoy added inline comments.Aug 29 2023, 4:44 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

nit: profile -> profiles

once a callsite is mismatched, all its children profile are drooped

Why children profiles can be dropped? Counting happens before inlining right?

wlei added inline comments.Aug 29 2023, 5:03 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

yeah, In the current findCalleeFunctionSamples implementation, if the mismatch happens, it would return null, but there is no extra process to reuse those samples, so it's "dropped".

I think as Wenlei has proposed a solution in other patch:

Maybe what we should actually do is to reuse the nested profile from those mismatched parent functions, something like run promoteMergeNotInlinedContextSamples on top level functions samples not used? We do reuse nested profile for not inlined callees from parent function.

That sounds reasonable, but as we talked offline, we could leave it in future.

wlei updated this revision to Diff 554531.Aug 29 2023, 5:06 PM

addressing feedback, fix typo.

hoy added inline comments.Aug 29 2023, 5:27 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

I'm a bit confused. FlattenedProfiles is immutable once initialized right? What do we lose when using FSFlattened for countProfileMismatches?

If we are not to count for external functions, the nested callees there will be skipped too? But using a flattened profile can keep them, if they are not external.

wlei added inline comments.Aug 29 2023, 7:55 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

Let me show you with an example, say we have a profile like

A:102:1
 1:1
 2:B: 101:1
   1:1
   2:C:100:1
     1:1
      ...
     100:1

Say callsite B in A is changed to other location and the callsite is mismatched. Here like in this patch, it use the non-flattened profile.
the mismached Samples is 101, i,e, it not only count B's profile but also all the children profiles(C)

But if we use flattened profile:

A:2:1
 1:1
 1: B:1
B:2:1
 1:1
 1: C:1
C:100:1
 1:1
  ...
  100:1

The callsite of B in A is mismatched, but now the callsite count is 1, the mismatched Samples is 1.

I know it feels counter-intuitive, there is no mismatch in B and C, but that is the current implementation, we don't reuse the children profile. Once the parent callsite is mismatched, all the children profiles are dropped.

hoy added inline comments.Aug 29 2023, 9:14 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

Thanks for the explanation. I have a better understanding now.

But I still have a question about how to count mismatched samples. In your examples, the 101 samples for B & C may still be used if B & C have no source changes. Yes, I get that the inlining won't happen, but even if it happens it only saves the cost of one call instruction and perhaps corresponding parameter passing cost? In other words, whether the 101 samples can be applied to inlined B&C depends on how well the matching works for B & C?

BTW, how much difference in terms of the staleness number did you see with switching to using nested profile?

wlei added inline comments.Aug 29 2023, 10:52 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

Sorry I may not fully understand your question. Here the metrics is computed before the fuzzy matching, i,e. if it's a mismatch in the source code, even later the fuzzy matching recover to the right location, we still count that location is a mismatch. We haven't done the post fuzzy matching metrics.

In your examples, the 101 samples for B & C may still be used if B & C have no source changes.

This can't happen in the current implementation, if the mismatch happens, the 101 samples is never used no matter whether B&C is inlined/matched or not, because In A, when findCalleeFunctionSamples return null, it won't visit A's children profiles, the children B &C profile won't be merged into the base B & C profile.

BTW, how much difference in terms of the staleness number did you see with switching to using nested profile?

It should matter, I saw a case about 2 times differences, it depends on how hot the dropped children profile is.

hoy accepted this revision.Aug 29 2023, 11:09 PM
hoy added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

This can't happen in the current implementation, if the mismatch happens, the 101 samples is never used no matter whether B&C is inlined/matched or not, because In A, when findCalleeFunctionSamples return null, it won't visit A's children profiles, the children B &C profile won't be merged into the base B & C profile.

I see. It makes sense now. A callee profile can be returned only if it is discovered at some callsite in the caller. Thanks for the explanation!

hoy added inline comments.Aug 30 2023, 12:02 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

Left a question in the other thread https://reviews.llvm.org/D156725, but I think it may also be relavant here.

The question is, when and where will the nested callee samples be counted if they are not inlined but its checksum is mismatched. In this case, the callee samples can be returned during inlining.

It sounds like we may need to selectively flatten a profile for more accurate counting, e.g, only flattening matched callsites. Maybe what Wenlei proposed in the other patch is something similar.

I feel that to be more accurate, the mismatch counter needs to more closely mimic what the sample inliner does, or the other way around, i.e, to fix the inliner to return samples for callsites that don't exist on the current IR.

Not sure how important the problem is though. But if we really want to be as accurate as enough, maybe combining the matcher and the inliner can get us there.

wenlei added inline comments.Aug 30 2023, 10:28 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

It sounds like we may need to selectively flatten a profile for more accurate counting, e.g, only flattening matched callsites. Maybe what Wenlei proposed in the other patch is something similar.

Making mismatch counting and inliner's use of profile consistent will make the mismatch/stale metric more meaningful. However, I think there are still things to be improved in terms of how inliner uses profile - i.e. even if top level function mismatches, nested profile may still be reused.

My suggestion is to prioritize better use of profile first, before do everything we can to align mismatch counting.

wlei added inline comments.Aug 30 2023, 2:17 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2402

Yeah, I agree there are two remaining issues:

  1. The inliner. Currently we don't reused the mismatched callsite's children profile.
  1. The mismatch metric. Yes, even with current fix, we only count the top-level profile's mismatched callsite, we don't count the nested inlined profile's mismatched callsite.

IMO, the solution for #1 is we can do the similar thing as promoteMergeNotInlinedContextSamples.

the solution for #2 is we can do the similar things as Compute checksum mismatch recursively on nested profile , also count the nested profile. another solution is to extend SampleCoverageTracker (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h#L35)

SampleCoverageTracker is the thing that count the coverage(mismatch) along the sample loader(inlining), however, it only support BodySamples, we can extend the to callsiteSamples.

llvm/lib/Transforms/IPO/SampleProfile.cpp