This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Skip reporting staleness metrics for imported functions
ClosedPublic

Authored by wlei on Jul 31 2023, 11:29 AM.

Details

Summary

Accumulating the staleness metrics from per-link is less accurate than doing it from post-link time(assuming we use the offline profile mismatch as baseline), the reason is that there are some duplicated reports for the same functions, for example, one template function could be included in multiple TUs, but in post thin link time, only one function are kept(linkonce_odr) and others are marked as available-externally function. Hence, this change skips reporting the metrics for imported functions(available-externally).

I saw the post-link number is now very close to the offline number(dump the mismatched functions and count the metrics offline based on the entire profile), sightly smaller than offline number due to some missing inlined functions.

Diff Detail

Event Timeline

wlei created this revision.Jul 31 2023, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 11:29 AM
Herald added subscribers: ormris, hoy, wenlei and 2 others. · View Herald Transcript
wlei requested review of this revision.Jul 31 2023, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 11:29 AM
wlei retitled this revision from [CSSPGO] Improve computing profile staleness metrics in post-link time to [CSSPGO] Improve profile staleness report for post-link time.Jul 31 2023, 12:28 PM
wlei edited the summary of this revision. (Show Details)
wlei added reviewers: hoy, wenlei.
wlei updated this revision to Diff 550203.Aug 15 2023, 12:06 AM

retire flatten-profile-for-matching switch, always use flattened profile for matching.

wlei updated this revision to Diff 550205.Aug 15 2023, 12:17 AM

Updating D156725: [CSSPGO] Improve profile staleness report for post-link time

For the current sample loader, if the root or parent function profile are lost or mismatched, all its profile and its children profile are dropped, we should use the non-flattened profile(the original nested profile) to count the total mismatched samples.

How significantly does this affect staleness metric?

Maybe what we should actually do is to reuse the nested profile from those mismatched parent functions, something like run promoteMergeNotInlinedContextSamples on top level functions samples not used? We do reuse nested profile for not inlined callees from parent function.

llvm/lib/Transforms/IPO/SampleProfile.cpp
2120

Maybe this function needs a refactoring - break it into two functions: 1) findAnchors that only uses IR, 2) matchCallsites that also uses profile in addition to anchors. We call always call #1, but only call #2 when profile is available and reporting is enabled.

Also if MatchedCallsiteLocs is only used for reporting, but not matching, can we move #2 into countProfileMismatches? Currently the different parts seem a bit inter-wined.. would be good have some separation.

Maybe #2 can also be a standalone functions that covers both call site matching for report and for actual matching (like populateProfileCallsites on flattened profile)

2411–2412

nit: Skip to report -> Skip reporting, the imported functions -> imported functions

2411–2412

This function is growing bigger. Can we move line 2384-2399 and line 2405-2414 into countProfileMismatches? These are all related to mismatch statistics.

2418

remove?

2437

nit: instead of FSForReporting vs FSForMatching, just say FS and FSFlattened instead? it applies to other instances too.

wlei updated this revision to Diff 553384.Aug 24 2023, 11:49 PM

Updating D156725: [CSSPGO] Improve profile staleness report for post-link time

wlei retitled this revision from [CSSPGO] Improve profile staleness report for post-link time to [CSSPGO] Skip reporting staleness metrics for imported functions.Aug 25 2023, 3:13 PM
wlei edited the summary of this revision. (Show Details)
wlei added a comment.Aug 25 2023, 3:16 PM

Split the original patch into four patches, https://reviews.llvm.org/D156725, https://reviews.llvm.org/D158891, https://reviews.llvm.org/D158900, https://reviews.llvm.org/D158817
Or you can simply find them in the patch stack.

wenlei accepted this revision.Aug 25 2023, 3:39 PM

The stack with refactoring first makes the changes easier to review. Thanks!

LGTM

This revision is now accepted and ready to land.Aug 25 2023, 3:39 PM
hoy added inline comments.Aug 29 2023, 5:28 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2411–2412

As discussed in the other patch https://reviews.llvm.org/D158891, skipping external functions may cause its nested non-external callee samples to be skipped too. This is not what we want?

2453

nit: remove the colon

hoy accepted this revision.Aug 29 2023, 11:39 PM
hoy added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
2411–2412

As using nested profile to count mismatches is by-design, the problem here is not quite relevant to the other patch and the external function check here.

A better way to frame the question is, when and where will the nested callee samples be counted if they are not inlined but its checksum is mismatched? In this case, the callee samples can be returned during inlining, but it should be counted when processing the callee.

The problem seems not easy to solve as the matcher doesn't know if a callsite will be inlined or not, and due to a hack in our system inlined samples always skip the checksum check.

This revision was landed with ongoing or failed builds.Aug 30 2023, 6:02 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-profile-mismatch-nested.prof