This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Support stale profile matching for LTO
ClosedPublic

Authored by wlei on Jul 31 2023, 11:01 AM.

Details

Summary

As in per-link time, callsites could be optimized out by inlining, we don't have those original call targets in the IR in LTO time. Additionally, the inlined code doesn't actually belong to the original function, the IR locations or pseudo probe parsed from it are incorrect and could mislead the matching later.

This change adds the support to extract the original IR location info from the inlined code, specifically, it make sure to skip all the inlined code that doesn't belong the original function, but before that, it processes the inline frames of the debug info to extract the base frame and recover its callsite and callee target(name).

Measured on some stale profile instances, all showed some perf improvements.

Diff Detail

Event Timeline

wlei created this revision.Jul 31 2023, 11:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 11:01 AM
Herald added subscribers: ormris, hoy, wenlei and 3 others. · View Herald Transcript
wlei requested review of this revision.Jul 31 2023, 11:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 11:01 AM
wlei edited the summary of this revision. (Show Details)Jul 31 2023, 11:19 AM
wlei added reviewers: hoy, wenlei.
hoy added inline comments.Aug 3 2023, 1:24 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2143–2146

nit: use PrevDIL->getSubprogramLinkageName() instead.

2150

Is the continue needed since instructions in the same BB may not from a single callsite.

wlei updated this revision to Diff 547276.Aug 4 2023, 10:50 AM

addressing feedback

llvm/lib/Transforms/IPO/SampleProfile.cpp
2143–2146

Done, thanks!

2150

We need the continue, this is needed for all the inlined code, because it's not part of the original function.
otherwise the below code will extract the probe and callsite from a different function.

Note that this is under this condition, so it won't affect the original code.

if (DIL->getInlinedAt()) { 
   ....
   continue;
  }
hoy added inline comments.Aug 10 2023, 10:42 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2127

Might be good to mention IR flattening in the comment somewhere which is needed as the flattened profile is always used for matching.

2134

Do we also want to flatten for callsite probes?

hoy added inline comments.Aug 10 2023, 10:47 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2134

Also should this be done under FlattenProfileForMatching?

wlei added inline comments.Aug 10 2023, 11:57 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2134

Do we also want to flatten for callsite probes?

Block probe should be sufficient. for every callsite, there should be a block probe in the same BB, so their debug info inline stack is the same.

Also should this be done under FlattenProfileForMatching?

Not sure if this will works with FlattenProfileForMatching off. Maybe we even don't need this `FlattenProfileForMatching switch, I'm thinking FlattenIR or FlattenProfile should be always beneficial to the matching. Or maybe we need another switch FlattenIRForMatching

wlei updated this revision to Diff 549130.Aug 10 2023, 12:09 PM

update the commment.

hoy added inline comments.Aug 10 2023, 1:12 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2134

Block probe should be sufficient. for every callsite, there should be a block probe in the same BB, so their debug info inline stack is the same.

Sounds good, as long as the inlinee has one block probe left, the original callsite should be recoverd.

Not sure if this will works with FlattenProfileForMatching off. Maybe we even don't need this `FlattenProfileForMatching switch, I'm thinking FlattenIR or FlattenProfile should be always beneficial to the matching. Or maybe we need another switch FlattenIRForMatching

I mean as long as the IR flattening and profile flattening are synchronized it should be fine. Maybe we should just retire the FlattenProfileForMatching switch as we always use flattened profile.

wlei added inline comments.Aug 11 2023, 9:41 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2134

Sounds good to retire FlattenProfileForMatching , I will do it in https://reviews.llvm.org/D156725, which already made some changes to FlattenProfileForMatching.

wenlei added inline comments.Aug 24 2023, 10:44 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2121

nit on function naming: it not only populates IRLocations, but also MatchedCallsiteLocs.

2138

Given we tested isa<PseudoProbeInst>(&I) above, is extractProbe ever going to fail? Remove the if check or replace it with an assertion?

2139

nit: "base inline" is not a canonical term and can be confusing. You mean top-level inliner frame?

2156

nit: instead of spraying TODOs for line-number-based AutoFDO, we can have a top-level top for AutoFDO, and also explain 1) the current behavior for AutoFDO, 2) what actually needs to be done for AutoFDO.

2157–2160

For readability, would it make sense to tweak the code a bit so it's explicit that we're handling not-inlined probes. Initially I got confused since it didn't pay attention to the continue above..

if (isa<PseudoProbeInst>(&I)) {
  get DIL to check for inlining..
  if (Inlined case)... 

  else { // not inlined case

  }
}
2162

I know this is old code, but what intrinsics are we skipping here, is that just for skipping probe intrinsics or something else? If it's for skipping probes, maybe we can consider this?

if (probe inst) {
  record empty string in IRLocations
else if (call inst) {
  record actual call site in IRLocations
}

or

if (probe inst) {
  record empty string in IRLocations
else if (not call inst) {
  continue
}
record actual call site in IRLocations
2389–2390

As I read the code I noticed that "non-direct-call site" can be confusing as it can mean call site that is not direct call. But we really mean to say is 1) not a call site, 2) indirect call site?

wenlei added inline comments.Aug 24 2023, 10:59 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2121

More like findAnchorLocations?

wlei updated this revision to Diff 553373.Aug 24 2023, 11:30 PM

addressing feedback

llvm/lib/Transforms/IPO/SampleProfile.cpp
2121

Sounds good to move MatchedCallsiteLocs out of populateIRLocations, I also saw you mentioned to break the functions in other patch, I will do the refactoring in a new diff.

Regarding the name findAnchorLocations, we use the term "Anchor" for both IR and Profile, see in the runStaleProfileMatching, we use the ProfileAnchors. So maybe still keep populateIRlocations or findIRLocations? after the refactoring, it should only be related with IR.

2138

Sounds good!

2139

changed to top-level frame.

2156

Thanks for the suggestion, refine the comments.

2157–2160

Sounds good!

2162

IntrinsicInst may not only be used for probe intrinsics. I can't remember clearly, but I vaguely remembered there were a lot of dbg intrinsics that messed up the matching, and the fix was to add this isa<IntrinsicInst>(&I) check.

wenlei added inline comments.Aug 25 2023, 9:35 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2121

Assuming the part that deals with profile will be moved out of this function ,how about let's name it as findIRAnchors?

In runStaleProfileMatching, we find ProfileAnchors and match them with IRAnchors established here.

2389–2390

Can you update the comments to avoid ambiguity of "non-direct-call site"?

wenlei accepted this revision.Aug 25 2023, 3:36 PM

lgtm, thanks.

This revision is now accepted and ready to land.Aug 25 2023, 3:36 PM
hoy accepted this revision.Aug 29 2023, 5:02 PM
This revision was landed with ongoing or failed builds.Aug 30 2023, 6:01 PM
This revision was automatically updated to reflect the committed changes.