This is an archive of the discontinued LLVM Phabricator instance.

[IndirectCallPromotion][SimpifyCFG] Preserve indirect callsites in hoist/sink when merging !prof is unprofitable and merge these callsites after ICP completes in a build pipeline
Needs ReviewPublic

Authored by mingmingl on Apr 20 2023, 11:38 PM.

Details

Reviewers
xur
tejohnson
Summary

The problem to solve:

  • When simplify-cfg could hoist / sink indirect callsites with inherently different target values, there isn't a simple lossless way to merge !prof value profile metadata.

This patch:

  1. Before indirect-call-promotion happens, preserve the indirect callsites with target value profiles in simplify-cfg when there isn't a good way to merge.
    • Basically, when two indirect callsites each has value profiles, preserve them; or else it's fine to hoist or sink before ICP.
  2. After indirect-call-promotion happens, always sink or hoist (when possible) and merge value profiles.

Implementation:

  1. Set a module flag after the last ICP pass completes in a binary build pipeline.
    • Pass in ThinOrFullLTOPhase to ICP pass constructor. IndirectCallProm transformations all happen after ICP pass completes in a {thin, regular} LTO postlink pipeline or in a non-LTO pipeline.
  2. Add helper functions to decide whether it's profitable to merge and get merged results.
  3. SimplifyCFG calls helper functions above to preserve callsites for indirect-call-prom to happen, or hoist/sink indirect calls && merge value profiles.
  4. Clearing value profiles after ICP makes use of them won't work since CGProfile pass still uses <target-value, counter> pairs to compute function hotness.

Diff Detail

Event Timeline

mingmingl created this revision.Apr 20 2023, 11:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 11:38 PM
mingmingl requested review of this revision.Apr 20 2023, 11:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 11:38 PM
mingmingl added a subscriber: tejohnson.

Would be good to explain motivation (maybe reference other patch?)

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

llvm/include/llvm/IR/Instructions.h
4018

Nothing in this patch uses the IndirectCallMD value set here - is it needed?

llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
309

This is unnecessary if !MDProf

313

Can the CallBase be anything other than a CallInst or InvokeInst?

408

Comment describing why

wenlei added a subscriber: hoy.Apr 21 2023, 12:58 PM

cc @hoy to make sure this works for csspgo.

mingmingl planned changes to this revision.Apr 21 2023, 1:33 PM

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

hoy added a comment.Apr 21 2023, 2:11 PM

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

Value profiles for indirect call sites can still be used by lld call graph sorting after ICP. Please see https://github.com/llvm/llvm-project/blob/26202a57e5e78b5473ef657737d181f0a68ed56d/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L81

mingmingl added a subscriber: davidxl.EditedApr 21 2023, 2:35 PM

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

Value profiles for indirect call sites can still be used by lld call graph sorting after ICP. Please see https://github.com/llvm/llvm-project/blob/26202a57e5e78b5473ef657737d181f0a68ed56d/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L81

oh thanks for this pointer! Clearing is not an option then.

CGProfile.cpp uses <counter, target-val> pairs but not 'total-count', I'm suggesting this option. Let me know your thoughts!

In this patch

  1. extend its format from !1 = !{!"VP", IndirectCallKind , Sum, counter1, val1, counter2, val2} to !1 = !{!"VP", Kind , Sum, counter1, val1, counter2, val2, boolean/int 'indirect-call-promotion-all-done'}
  2. indirect-call-promotion-all-done is initially 0, and set to 1 after the last ICP in a compiler pipeline

In D147954

  1. simplify-cfg bails out from merge indirect-calls when seeing indirect-call-promotion-all-done as 0, could merge target value profiles after seeing indirect-call-promotion-all-done being set to 1, so that CGProfile.cpp could get the <counter, target-val> pairs.

Meanwhile, the current ICP should probably still preserve counts as opposed to clearing when all targets are used.

cc @tejohnson @davidxl

hoy added a comment.Apr 21 2023, 2:38 PM

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

Value profiles for indirect call sites can still be used by lld call graph sorting after ICP. Please see https://github.com/llvm/llvm-project/blob/26202a57e5e78b5473ef657737d181f0a68ed56d/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L81

oh thanks for this pointer! Clearing is not an option then.

CGProfile.cpp uses <counter, target-val> pairs but not 'total-count', I'm suggesting this option. Let me know your thoughts!

In this patch

  1. extend its format from !1 = !{!"VP", IndirectCallKind , Sum, counter1, val1, counter2, val2} to !1 = !{!"VP", Kind , Sum, counter1, val1, counter2, val2, boolean/int 'indirect-call-promotion-all-done'}
  2. indirect-call-promotion-all-done is initially 0, and set to 1 after the last ICP in a compiler pipeline

In D147954

  1. simplify-cfg bails out from merge indirect-calls when seeing indirect-call-promotion-all-done as 0, could merge target value profiles after seeing indirect-call-promotion-all-done being set to 1, so that CGProfile.cpp could get the <counter, target-val> pairs.

Meanwhile, the current ICP should probably still preserve counts as opposed to clearing when all targets are used.

cc @tejohnson @davidxl

I guess I don't get the motivation of the original change. Can you please remind me why you'd like to cleanup VP metadata after ICP?

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

Value profiles for indirect call sites can still be used by lld call graph sorting after ICP. Please see https://github.com/llvm/llvm-project/blob/26202a57e5e78b5473ef657737d181f0a68ed56d/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L81

oh thanks for this pointer! Clearing is not an option then.

CGProfile.cpp uses <counter, target-val> pairs but not 'total-count', I'm suggesting this option. Let me know your thoughts!

In this patch

  1. extend its format from !1 = !{!"VP", IndirectCallKind , Sum, counter1, val1, counter2, val2} to !1 = !{!"VP", Kind , Sum, counter1, val1, counter2, val2, boolean/int 'indirect-call-promotion-all-done'}
  2. indirect-call-promotion-all-done is initially 0, and set to 1 after the last ICP in a compiler pipeline

In D147954

  1. simplify-cfg bails out from merge indirect-calls when seeing indirect-call-promotion-all-done as 0, could merge target value profiles after seeing indirect-call-promotion-all-done being set to 1, so that CGProfile.cpp could get the <counter, target-val> pairs.

Meanwhile, the current ICP should probably still preserve counts as opposed to clearing when all targets are used.

cc @tejohnson @davidxl

I guess I don't get the motivation of the original change. Can you please remind me why you'd like to cleanup VP metadata after ICP?

(D147954 is about the problem to solve and has more discussions) In brief, https://gcc.godbolt.org/z/Pxoh8PrGE shows a case where simplify-cfg aggressively drops profile metadata while it shouldn't. This would happen when prelink simplify-cfg enables instruction sink and postlink ICP couldn't see the value profiles.
The motivation of this patch and D147954 is to preserve value profiles attached on indirect calls for indirect-call-promotion to use.

Turning off simplify-cfg sinking/hoisting completely in prelink pipeline would be more invasive -> for 'llvm/test/Transforms/SimplifyCFG/merge-direct-call-branch-weights-in-hoist.ll' in D147954, it's desired to hoist address calculation instructions (from if.then and if.else in https://gcc.godbolt.org/z/b4Y9Mqrq1) in prelink stage.

hoy added a comment.Apr 21 2023, 3:19 PM

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

Value profiles for indirect call sites can still be used by lld call graph sorting after ICP. Please see https://github.com/llvm/llvm-project/blob/26202a57e5e78b5473ef657737d181f0a68ed56d/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L81

oh thanks for this pointer! Clearing is not an option then.

CGProfile.cpp uses <counter, target-val> pairs but not 'total-count', I'm suggesting this option. Let me know your thoughts!

In this patch

  1. extend its format from !1 = !{!"VP", IndirectCallKind , Sum, counter1, val1, counter2, val2} to !1 = !{!"VP", Kind , Sum, counter1, val1, counter2, val2, boolean/int 'indirect-call-promotion-all-done'}
  2. indirect-call-promotion-all-done is initially 0, and set to 1 after the last ICP in a compiler pipeline

In D147954

  1. simplify-cfg bails out from merge indirect-calls when seeing indirect-call-promotion-all-done as 0, could merge target value profiles after seeing indirect-call-promotion-all-done being set to 1, so that CGProfile.cpp could get the <counter, target-val> pairs.

Meanwhile, the current ICP should probably still preserve counts as opposed to clearing when all targets are used.

cc @tejohnson @davidxl

I guess I don't get the motivation of the original change. Can you please remind me why you'd like to cleanup VP metadata after ICP?

(D147954 is about the problem to solve and has more discussions) In brief, https://gcc.godbolt.org/z/Pxoh8PrGE shows a case where simplify-cfg aggressively drops profile metadata while it shouldn't. This would happen when prelink simplify-cfg enables instruction sink and postlink ICP couldn't see the value profiles.
The motivation of this patch and D147954 is to preserve value profiles attached on indirect calls for indirect-call-promotion to use.

Turning off simplify-cfg sinking/hoisting completely in prelink pipeline would be more invasive -> for 'llvm/test/Transforms/SimplifyCFG/merge-direct-call-branch-weights-in-hoist.ll' in D147954, it's desired to hoist address calculation instructions (from if.then and if.else in https://gcc.godbolt.org/z/b4Y9Mqrq1) in prelink stage.

Thanks for the explanation. IIUC, the fix for SimplyCFG dropping !VP is to either block the corresponding optimization or to merge the !VP data when merge callsites. What is the problem of always merging callsite !VP?

mingmingl added a comment.EditedApr 21 2023, 3:28 PM

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Also, the new tests only test the behavior with the internal option. It would be good to have tests with various pipelines to check the logic on when it gets cleared in those different cases simply based on the new pipeline logic.

I see your point. I'm going to write a test like https://github.com/llvm/llvm-project/blob/main/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll to test pipeline logic.

cc @hoy to make sure this works for csspgo.

thanks! The assumption here is that ICP runs once per pipeline and there are no other passes that uses value-profiles. If the assumption doesn't hold, I'm thinking about an explicit boolean in ICP pass constructor (which makes pipeline logic a little bit more complex)

Value profiles for indirect call sites can still be used by lld call graph sorting after ICP. Please see https://github.com/llvm/llvm-project/blob/26202a57e5e78b5473ef657737d181f0a68ed56d/llvm/lib/Transforms/Instrumentation/CGProfile.cpp#L81

oh thanks for this pointer! Clearing is not an option then.

CGProfile.cpp uses <counter, target-val> pairs but not 'total-count', I'm suggesting this option. Let me know your thoughts!

In this patch

  1. extend its format from !1 = !{!"VP", IndirectCallKind , Sum, counter1, val1, counter2, val2} to !1 = !{!"VP", Kind , Sum, counter1, val1, counter2, val2, boolean/int 'indirect-call-promotion-all-done'}
  2. indirect-call-promotion-all-done is initially 0, and set to 1 after the last ICP in a compiler pipeline

In D147954

  1. simplify-cfg bails out from merge indirect-calls when seeing indirect-call-promotion-all-done as 0, could merge target value profiles after seeing indirect-call-promotion-all-done being set to 1, so that CGProfile.cpp could get the <counter, target-val> pairs.

Meanwhile, the current ICP should probably still preserve counts as opposed to clearing when all targets are used.

cc @tejohnson @davidxl

I guess I don't get the motivation of the original change. Can you please remind me why you'd like to cleanup VP metadata after ICP?

(D147954 is about the problem to solve and has more discussions) In brief, https://gcc.godbolt.org/z/Pxoh8PrGE shows a case where simplify-cfg aggressively drops profile metadata while it shouldn't. This would happen when prelink simplify-cfg enables instruction sink and postlink ICP couldn't see the value profiles.
The motivation of this patch and D147954 is to preserve value profiles attached on indirect calls for indirect-call-promotion to use.

Turning off simplify-cfg sinking/hoisting completely in prelink pipeline would be more invasive -> for 'llvm/test/Transforms/SimplifyCFG/merge-direct-call-branch-weights-in-hoist.ll' in D147954, it's desired to hoist address calculation instructions (from if.then and if.else in https://gcc.godbolt.org/z/b4Y9Mqrq1) in prelink stage.

Thanks for the explanation. IIUC, the fix for SimplyCFG dropping !VP is to either block the corresponding optimization or to merge the !VP data when merge callsites. What is the problem of always merging callsite !VP?

Take https://gcc.godbolt.org/z/o6G68rn3v as example, func1 and func2 are two different virtual methods so inherently has different target values.

Say each virtual callsite has a sum and a list of <counter, target-value> pairs, merging <counter, target-value> pairs is okay, either merging sum, or use max(sum1, sum2) changed the ratio, and could make a significant target value no longer profitable according to isPromotionProfitable

Thanks for the explanation. IIUC, the fix for SimplyCFG dropping !VP is to either block the corresponding optimization or to merge the !VP data when merge callsites. What is the problem of always merging callsite !VP?

@hoy the problem is you lose flow-sensitivity for the VP after merge. And since ICP is important, it perhaps warrant special handling.

hoy added a comment.Apr 21 2023, 4:13 PM

I see. Merging could cause the loss of flow-sensitivity for PGO, but not merging doesn't give optimal code quality.

Here are my thoughts about what could happen with @mingmingl's approach that allows merging only after ICP.

  1. For the training build that does not use a profile, my intuition is that we will always merge the indirect callsites, which will lead to a merged !VP profile.
  2. Then for the next build that uses the profile, the callsites will first be promoted in separate, and then the leftover unpromoted callsites will be merged. That merged callsite will lead to a merged !VP profile. The already promoted callsites, if not merged, will lead to separate !VP profiles.
  3. For all subsequent builds, the separate !VP profiles will be dominant, the leftover merged profile shouldn't be important as they represent non-typical paths.

The questions is, in #2, whether promoted callsites can be merged or not. I'm not quite sure but since their promotion is based on the same profile from #1, the resulting direct calls will all look the same which might be good candidates to merge. If merged, the generated profile based on this build will still lose flow-sensitivity.

There's also another possibility which is no callsites can be merged after ICP, because of the conditional code introduced during ICP. If this is common, then it sounds like just letting indirect callsites be merged (#1) as is isn't a deal breaker.

Thoughts?

davidxl added a comment.EditedApr 21 2023, 4:46 PM

I see. Merging could cause the loss of flow-sensitivity for PGO, but not merging doesn't give optimal code quality.

Here are my thoughts about what could happen with @mingmingl's approach that allows merging only after ICP.

For instrumentation PGO, the situation is much simpler -- just need to make sure the callsite merge does not happen before instrumentation. Your concern is on AutoFDO with iteration.

  1. For the training build that does not use a profile, my intuition is that we will always merge the indirect callsites, which will lead to a merged !VP profile.

For initial build of the profiling binary, yes this is true.

  1. Then for the next build that uses the profile, the callsites will first be promoted in separate, and then the leftover unpromoted callsites will be merged.

Depending the phase ordering of sample loading. The unpromoted callsites is not merged before sample loading, then I assume one of the callsites will be annotated with VP data (assuming that in the profile binary the merged callsite inherits one source's debug info). If that is the case, this patch will prevent those sites from being merged. And starting from this iteration, all future iterations should be fine (remain unmerged).

The promoted callsite should behave similarly as unpromoted callsite.

For instance,

// original code:

if (cond) {
    fp1->foo();   //line 3; targeting bar
} else  {
  fp2-> foo(); // line 5; targeting blah

}

// After merging in profile binary we get

fp = select cond, fp1, fp2;
fp->foo(); // line 3 targeting bar, blah

// After sampling, line 3 gets VP data 'bar', 'blah'

// In optimized build, assuming merge does not happen before sample loading, the following will happen:

 

-   fp1 gets VP data 'bar', and 'blah' and get promoted, 
-   fp2 at line 5 has no VP data
-   merging will be prevented
-   line 3 will be promoted (less precisedly)
-   line5 won't be promoted (no VP data)
-

// in the next round of profiling, both line 3 and line 5 have independent VP data, and things will start get stable from now on.

  1. For all subsequent builds, the separate !VP profiles will be dominant, the leftover merged profile shouldn't be important as they represent non-typical paths.

The VP data from promoted branches will be mapped to the same line including the left over one.

For instance, the line 3 is promoted:

 if (fp1 == 'bar') 
     bar();
 else if (fp1 == 'blah')
     blah();            // no samples in this case
else
  fp1();

The resulting profile for line 3 will have 'bar' as dominating target.

The questions is, in #2, whether promoted callsites can be merged or not. I'm not quite sure but since their promotion is based on the same profile from #1, the resulting direct calls will all look the same which might be good candidates to merge. If merged, the generated profile based on this build will still lose flow-sensitivity.

Icall promotion itself will likely block any future merging.

There's also another possibility which is no callsites can be merged after ICP, because of the conditional code introduced during ICP. If this is common, then it sounds like just letting indirect callsites be merged (#1) as is isn't a deal breaker.

See above. This patch will help stablize promotion behavior after 2 iterations which is good for AFDO.

Thoughts?

Note the format of my last reply is messed up a little.

mingmingl added a comment.EditedApr 21 2023, 4:50 PM

I see. Merging could cause the loss of flow-sensitivity for PGO, but not merging doesn't give optimal code quality.

Here are my thoughts about what could happen with @mingmingl's approach that allows merging only after ICP.

  1. For the training build that does not use a profile, my intuition is that we will always merge the indirect callsites, which will lead to a merged !VP profile.

For an instrumentation + thinlto build, value profiles of two different indirect callsites won't be merged

  • instrumentation of target values happens before the simplify-cfg pass that enables hoist and sink.
  • https://gcc.godbolt.org/z/MM976vn55 is an example -> xur@ pointed out the counter index is from %. = select i1 %cmp, i64 2, i64 1 , so each callsite is correctly instrumented (each index for one entry in the linked list>. Note simplify-cfg pass runs multiple times in a pipeline, configured with different options. SimplifyCFG pass before PGOInstrumentationGen doesn't sink or hoist as shown in the example.

For a SamplePGO + thinlto build,

  1. if a function foo gets two indirect callsites c1 and c2 merged in the training build, hypothetically one debug location is kept in the merged callsite, or maybe both debug locations are dropped.
    • If both debug locations are dropped, the merged one callsite won't be annotated with value profiles.
    • If one debug location is kept, target values of both c1 and c2 are attributed to this debug location, and indirect callsite for this debug location is annotated.
  2. When collected sample-pgo profiles have two value profiles in the first place, sample-pgo-use is more likely to see two indirect callsites, each annotated with different value profiles.
  1. Then for the next build that uses the profile, the callsites will first be promoted in separate, and then the leftover unpromoted callsites will be merged. That merged callsite will lead to a merged !VP profile. The already promoted callsites, if not merged, will lead to separate !VP profiles.

#2 discusses what happens when value profiles are merged on two different callsites.

For Sample-PGO, when two indirect callsites are merged in the first place in the training build (where sample-pgo profiles are collected), the value profile could be associated with at most one debug location, so at most one of the two indirect callsites could be annotated.

  1. For all subsequent builds, the separate !VP profiles will be dominant, the leftover merged profile shouldn't be important as they represent non-typical paths.

The questions is, in #2, whether promoted callsites can be merged or not. I'm not quite sure but since their promotion is based on the same profile from #1, the resulting direct calls will all look the same which might be good candidates to merge. If merged, the generated profile based on this build will still lose flow-sensitivity.

There's also another possibility which is no callsites can be merged after ICP, because of the conditional code introduced during ICP. If this is common, then it sounds like just letting indirect callsites be merged (#1) as is isn't a deal breaker.

It's likely that indirect callsites cannot be merged because of the introduced if-else branches; there is a trade-off to be made here (and already made when indirect calls are promoted today) -> for the significant indirect calls that show up enough number of times (per isPromotionProfitable), speculatively devirtualize it (possibly inline it in the later inliner pass) is overall good.

If extending value profiles to make it one-bit more stateful is ok (in terms of code quality, profile size, etc), it preserves the original per-callsite distribution compared with merging two different callsites.

hoy added a comment.EditedApr 21 2023, 8:48 PM

Thanks for the illustration. For AutoFDO, I took a look at the debug info merger (https://github.com/llvm/llvm-project/blob/a17b71d17f853350dcd6c72ab141b196d7caec2a/llvm/lib/IR/DebugInfoMetadata.cpp#L114). Here is my understanding:

  1. For two callsites that are originally from different lines of a function, the merged location will have a zero line number. This basically means the value profile for the merged callsite will be dropped by the sample loader.
  1. For callsites originally from same location but ended up with different inline contexts in the caller function, the merger tries to reconcile their inline contexts by finding a common suffix of the contexts. If successful, the common suffix will be the debug location for the merged callsites. This may allow for the value profile to be shared by the two callsites (under a different inlining situation). E.g, for a callsite defined in function D at line 4, if somehow it gets inlined twice into function A:
A:1 @ B:2 @ C:3 @ D:4

A:2 @ B:3 @ C:3 @ D:4

Then my understanding is that the merged callsite will have C:3 @ D:4 as its debug string. If somehow in the next compilation A never inlines B but B inlines C twice, then the two callsites could end up sharing the same value profile.

#2 may sound uncommon, but #1 should be common. If that's the case, then we basically lose track of value profiles for merged callsites.

Please let me know if I miss anything. Despite AutoFDO, I think Mingming's solution that allows merging only after ICP should be good for Instrument PGO.

Thanks for the illustration. For AutoFDO, I took a look at the debug info merger (https://github.com/llvm/llvm-project/blob/a17b71d17f853350dcd6c72ab141b196d7caec2a/llvm/lib/IR/DebugInfoMetadata.cpp#L114). Here is my understanding:

  1. For two callsites that are originally from different lines of a function, the merged location will have a zero line number. This basically means the value profile for the merged callsite will be dropped by the sample loader.

Thanks for checking. I suspect bugs like this may exist, but it is orthogonal to the problem at hand.

  1. For callsites originally from same location but ended up with different inline contexts in the caller function, the merger tries to reconcile their inline contexts by finding a common suffix of the contexts. If successful, the common suffix will be the debug location for the merged callsites. This may allow for the value profile to be shared by the two callsites (under a different inlining situation). E.g, for a callsite defined in function D at line 4, if somehow it gets inlined twice into function A:
A:1 @ B:2 @ C:3 @ D:4

A:2 @ B:3 @ C:3 @ D:4

Then my understanding is that the merged callsite will have C:3 @ D:4 as its debug string. If somehow in the next compilation A never inlines B but B inlines C twice, then the two callsites could end up sharing the same value profile.

True, but this happens today as well -- it is very unlikely such two callsites from inline instances can be merged in the first place.

#2 may sound uncommon,

yes. Besides sharing profile may still be better than dropping profile.

> but #1 should be common. If that's the case, then we basically lose track of value profiles for merged callsites.

Please let me know if I miss anything.

See my reply above.

Despite AutoFDO, I think Mingming's solution that allows merging only after ICP should be good for Instrument PGO.

xur added a comment.Apr 24 2023, 12:33 PM

Thanks for the illustration. For AutoFDO, I took a look at the debug info merger (https://github.com/llvm/llvm-project/blob/a17b71d17f853350dcd6c72ab141b196d7caec2a/llvm/lib/IR/DebugInfoMetadata.cpp#L114). Here is my understanding:

  1. For two callsites that are originally from different lines of a function, the merged location will have a zero line number. This basically means the value profile for the merged callsite will be dropped by the sample loader.

Thanks for checking. I suspect bugs like this may exist, but it is orthogonal to the problem at hand.

I don't think this is a bug. This is the intended behavior -- if there is conflict debug info, just set to zero line number of remove the debug info.
This does create some problems for Sample FDO.

This is not a problem for pseudo-probe, right?

  1. For callsites originally from same location but ended up with different inline contexts in the caller function, the merger tries to reconcile their inline contexts by finding a common suffix of the contexts. If successful, the common suffix will be the debug location for the merged callsites. This may allow for the value profile to be shared by the two callsites (under a different inlining situation). E.g, for a callsite defined in function D at line 4, if somehow it gets inlined twice into function A:
A:1 @ B:2 @ C:3 @ D:4

A:2 @ B:3 @ C:3 @ D:4

Then my understanding is that the merged callsite will have C:3 @ D:4 as its debug string. If somehow in the next compilation A never inlines B but B inlines C twice, then the two callsites could end up sharing the same value profile.

True, but this happens today as well -- it is very unlikely such two callsites from inline instances can be merged in the first place.

#2 may sound uncommon,

yes. Besides sharing profile may still be better than dropping profile.

> but #1 should be common. If that's the case, then we basically lose track of value profiles for merged callsites.

Please let me know if I miss anything.

See my reply above.

Despite AutoFDO, I think Mingming's solution that allows merging only after ICP should be good for Instrument PGO.

This does not make AutoFDO worse, right?
This seems also to benefit pseudo-probe based AutoFDO.

hoy added a comment.Apr 24 2023, 1:26 PM

Thanks for the illustration. For AutoFDO, I took a look at the debug info merger (https://github.com/llvm/llvm-project/blob/a17b71d17f853350dcd6c72ab141b196d7caec2a/llvm/lib/IR/DebugInfoMetadata.cpp#L114). Here is my understanding:

  1. For two callsites that are originally from different lines of a function, the merged location will have a zero line number. This basically means the value profile for the merged callsite will be dropped by the sample loader.

Thanks for checking. I suspect bugs like this may exist, but it is orthogonal to the problem at hand.

I don't think this is a bug. This is the intended behavior -- if there is conflict debug info, just set to zero line number of remove the debug info.
This does create some problems for Sample FDO.

This is not a problem for pseudo-probe, right?

Unfortunately it somehow still affects pseudo probe which uses !dbg info for callsites to store their probe information.

  1. For callsites originally from same location but ended up with different inline contexts in the caller function, the merger tries to reconcile their inline contexts by finding a common suffix of the contexts. If successful, the common suffix will be the debug location for the merged callsites. This may allow for the value profile to be shared by the two callsites (under a different inlining situation). E.g, for a callsite defined in function D at line 4, if somehow it gets inlined twice into function A:
A:1 @ B:2 @ C:3 @ D:4

A:2 @ B:3 @ C:3 @ D:4

Then my understanding is that the merged callsite will have C:3 @ D:4 as its debug string. If somehow in the next compilation A never inlines B but B inlines C twice, then the two callsites could end up sharing the same value profile.

True, but this happens today as well -- it is very unlikely such two callsites from inline instances can be merged in the first place.

#2 may sound uncommon,

yes. Besides sharing profile may still be better than dropping profile.

> but #1 should be common. If that's the case, then we basically lose track of value profiles for merged callsites.

Please let me know if I miss anything.

See my reply above.

Despite AutoFDO, I think Mingming's solution that allows merging only after ICP should be good for Instrument PGO.

This does not make AutoFDO worse, right?
This seems also to benefit pseudo-probe based AutoFDO.

Right, it doesn't make AutoFDO worse. It should help pseudo probe in some cases where merging biases one callsite.

cc @hoy to make sure this works for csspgo.

Sorry for being vague. What I meant is to double check the pass ordering to make sure csspgo doesn't have extra use of VP after sample loader and ICP, so clearing is ok.

CGProfile is a good catch. Now if the final solution is to allow merging of VP after ICP is done, instead clearing all VP, I don't see any problem then.

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Can I piggyback on this to suggest a general feedback :) Reviewers would benefit from more explanation on why, in addition to what, and it applies to other patches as well. Without asking explicitly, I failed to grasp your intention of D147954 just by looking at the patch without asking questions. It somewhat applies to the profile reader changes too.

cc @hoy to make sure this works for csspgo.

Sorry for being vague. What I meant is to double check the pass ordering to make sure csspgo doesn't have extra use of VP after sample loader and ICP, so clearing is ok.

CGProfile is a good catch. Now if the final solution is to allow merging of VP after ICP is done, instead clearing all VP, I don't see any problem then.

Yep without the catch the clearing option would be bad. So indeed good catch!

As an update, I'll proceed to implement the idea of allow merging after ICP transformation, with slightly better implementations than extending "vp" format;

  1. given allow-merging is either true or false for all value-profile instances depending on the execution stage of pipeline, use per module flag rather than per !prof boolean
  2. If possible, introduce helper functions for individual transformation passes (simplifycfg or cse or whatever pass) to call, the helper function either returns 'wait, don't merge for now' (maybe fleshes out as nullptr) or the result of the merged instructions (with merged value profiles).

This way, 1) reduces repeated states to a per-module level, and 2) makes it easier to preserve metadata for other passes (not only simplify-cfg, might be early-cse pass, etc).

Would be good to explain motivation (maybe reference other patch?)

Makes sense, will do.

Can I piggyback on this to suggest a general feedback :) Reviewers would benefit from more explanation on why, in addition to what, and it applies to other patches as well. Without asking explicitly, I failed to grasp your intention of D147954 just by looking at the patch without asking questions. It somewhat applies to the profile reader changes too.

Sure, I'll try to do better. My take is that natural language could be ambiguous compared with reduced godbolt examples (or diagrams to describe cfg, etc). Even with how-and-why explained, I could imagine questions are welcome rather than being frowned upon, like how existing questions inspires me to learn and find problems early.

mingmingl updated this revision to Diff 518973.May 2 2023, 10:49 PM
mingmingl retitled this revision from [IndirectCallPromotion] Clear value profile metadata after the last run of indirect-call-promotion in a pipeline to [IndirectCallPromotion][SimpifyCFG] Preserve indirect callsites in hoist/sink when merging !prof is unprofitable and merge these callsites after ICP completes in a build pipeline.
mingmingl edited the summary of this revision. (Show Details)
mingmingl added a reviewer: tejohnson.

resolve comments and absorbs D147954 to illustrate how module flag and helper functions are used.. Could split out to smaller patches as well.

  • described the problem to solve in summary.
  • modified existing tests for pass pipeline changes.
  • (implementation) consider the possibility that an invoke-instruction could have both branch_weights'and value profiles.
mingmingl marked 4 inline comments as done.May 2 2023, 10:58 PM
mingmingl added inline comments.
llvm/include/llvm/IR/Instructions.h
4018

In the previous version this function was not called so should be deleted back then. Only getBranchWeightProfData was used. Now the change becomes obsolete..

llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
309

Done in the sense that the code is obsolete

313

Good question. Only CallInst or InvokeInst should be relevant for function calls, since CallBrInst is [[ I see many isa<CallInst>(I) || isa<Invokeinst> in the codebase. | only used for inline assembly goto ]].

Added comment around helper function getFunctionCallInstr in llvm/lib/IR/Metadata.cpp for this.