This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Merge branch_weights for direct callsites and introduce options to preserve value profile for indirect callsites
Changes PlannedPublic

Authored by mingmingl on Apr 10 2023, 11:11 AM.

Details

Summary
  • Introduce option to preserve indirect callsites with value profiles, with default value false to keep existing behavior.
    • The planned usage is to turn on this option to true in LTO prelink pipelines in a PGO build.
    • [Why] Before MD_prof metadata is used, two different callsites with MD_prof shouldn't be simplified to one.
  • Merge branch_weights on direct callsites.

IR test cases are generated based on two C++ code with manually-annotated !prof metadata

Diff Detail

Event Timeline

mingmingl created this revision.Apr 10 2023, 11:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 11:11 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
mingmingl requested review of this revision.Apr 10 2023, 11:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 11:11 AM
mingmingl edited the summary of this revision. (Show Details)Apr 10 2023, 11:12 AM
mingmingl added a subscriber: xur.
mingmingl edited the summary of this revision. (Show Details)
mingmingl added reviewers: davidxl, xur.

flag-gate the change and add test cases

davidxl added inline comments.Apr 17 2023, 4:08 PM
llvm/test/Transforms/SimplifyCFG/preserve-call-metadata-in-hoist.ll
9

Perhaps add "merging meta data will make the profile data less precise and not desirable either".

18

->createptr to match the name used in IR.

21

createptr

llvm/test/Transforms/SimplifyCFG/preserve-call-metadata-in-sink.ll
17

createptr

mingmingl updated this revision to Diff 514446.Apr 17 2023, 4:16 PM
mingmingl marked 4 inline comments as done.

resolve comments.

thanks for reviews!

+Teresa for pipeline changes.

I guess you do this only for calls and limit it for pre-link because you want to minimize the restriction on optimization, while making post-link inlining operate on accurate call profile?

Code motion is generally destructive for profile quality. Instead of blocking all sinking and hoisting, have you considered trying to update profile for sinking/hoisting when there's no merging, but only block those that involves merging (i.e. head/tail merge) with incompatible profile? I imagine in some cases, allow sinking/hoisting but updating profile properly may help better inlining later.

llvm/test/Transforms/SimplifyCFG/preserve-call-metadata-in-hoist.ll
9

would be good to mention it in code comment and change description as well.

I guess you do this only for calls and limit it for pre-link because you want to minimize the restriction on optimization, while making post-link inlining operate on accurate call profile?

Code motion is generally destructive for profile quality. Instead of blocking all sinking and hoisting, have you considered trying to update profile for sinking/hoisting when there's no merging, but only block those that involves merging (i.e. head/tail merge) with incompatible profile? I imagine in some cases, allow sinking/hoisting but updating profile properly may help better inlining later.

This one is mainly for profile based devirtualization. Related to inlining, there is a pre-pass to do the unmerging:

arg = select cond, 1, val
call foo(arg)

>

if (cond) {

call foo(1)

} else

call foo(val)
mingmingl marked an inline comment as done.
mingmingl retitled this revision from [SimplifyCFG]Prevent premature simplification of callsites with profile data to [SimplifyCFG]Prevent premature simplification of indirect callsites with profile data.
mingmingl edited the summary of this revision. (Show Details)

follow up on comments

mingmingl added a comment.EditedApr 17 2023, 11:26 PM

I guess you do this only for calls and limit it for pre-link because you want to minimize the restriction on optimization, while making post-link inlining operate on accurate call profile?

Yep this is the plan, but for post-link indirect-call-promotion (which could be subsequently inlined after ICP)

Code motion is generally destructive for profile quality. Instead of blocking all sinking and hoisting, have you considered trying to update profile for sinking/hoisting when there's no merging, but only block those that involves merging (i.e. head/tail merge) with incompatible profile? I imagine in some cases, allow sinking/hoisting but updating profile properly may help better inlining later.

For my understanding, is https://gcc.godbolt.org/z/cMWY9Tqzv an example (based on c++ source with manual !prof) where updating profile would be useful? I think sinking/hoisting would merge instructions but I might miss something.

  • In this example, two direct callsites are merged but both branch_weights metadata (!prof 0, !prof 1) are dropped which is sub-optimal. It's likely better in majority of cases by merging two !prof into one and accumulate counters.

Nevertheless, (after thinking about direct callsites above), I modified the implementation (and option names) to look at indirect calls only primarily because 1) the desired behavior upon icall (indicating !prof is value profile as opposed to branch_weights for direct call) is probably clearer and 2) the pattern shows up (as C++ code shows) with polymorphism, and meanwhile added a 'FIXME' to mention the current 'hoisting' is still suboptimal since it dropped branch_weights on two identical direct calls.

I concur it's better to decide on option name once (preserve-call vs preserve-indirect-call) so I'm fine with implementing the sum-of-branch-weight counters or a better alternative in this patch (and name the option back to more general preserve-call... Let me know your thoughts!

This one is mainly for profile based devirtualization. Related to inlining, there is a pre-pass to do the unmerging:

For my information, in which pass does the unmerging happens?

arg = select cond, 1, val
call foo(arg)

>

if (cond) {

call foo(1)

} else

call foo(val)
llvm/test/Transforms/SimplifyCFG/preserve-call-metadata-in-hoist.ll
9

sure, added comments why preserving !prof metadata (and not hoisting/sinking) in cpp files (indent existing comments accordingly).

I guess you do this only for calls and limit it for pre-link because you want to minimize the restriction on optimization, while making post-link inlining operate on accurate call profile?

Yep this is the plan, but for post-link indirect-call-promotion (which could be subsequently inlined after ICP)

Code motion is generally destructive for profile quality. Instead of blocking all sinking and hoisting, have you considered trying to update profile for sinking/hoisting when there's no merging, but only block those that involves merging (i.e. head/tail merge) with incompatible profile? I imagine in some cases, allow sinking/hoisting but updating profile properly may help better inlining later.

For my understanding, is https://gcc.godbolt.org/z/cMWY9Tqzv an example (based on c++ source with manual !prof) where updating profile would be useful? I think sinking/hoisting would merge instructions but I might miss something.

  • In this example, two direct callsites are merged but both branch_weights metadata (!prof 0, !prof 1) are dropped which is sub-optimal. It's likely better in majority of cases by merging two !prof into one and accumulate counters.

Nevertheless, (after thinking about direct callsites above), I modified the implementation (and option names) to look at indirect calls only primarily because 1) the desired behavior upon icall (indicating !prof is value profile as opposed to branch_weights for direct call) is probably clearer and 2) the pattern shows up (as C++ code shows) with polymorphism, and meanwhile added a 'FIXME' to mention the current 'hoisting' is still suboptimal since it dropped branch_weights on two identical direct calls.

I concur it's better to decide on option name once (preserve-call vs preserve-indirect-call) so I'm fine with implementing the sum-of-branch-weight counters or a better alternative in this patch (and name the option back to more general preserve-call... Let me know your thoughts!

My original thought is,

  • given that SimplifyCFG runs multiple times, one simpler way is to conservatively prevent premature hoisting/sinking for both direct calls and indirect-calls in early stage of the pipeline and preserve the !prof (no need to sum counters for branch_weight of direct callsites), let passes that use !prof sees the original IR and make transformations (e.g., inline hot calls or whatever), and do aggressive simplifycfg, in other words, do not preserve !prof in {direct, indirect} calls in later stage of the pipeline.

Hoist direct calls and sum counters is probably better since
(use two callsites as example)

  1. when both direct callsites are hot, they are hoisted once and both inlined -> faster (in terms of compile time) and more reliable than inline twice and do common code sequence elimination.
  2. if one callsite is cold and another is hot, sum counters gets both callsites inlined (since there is only one hot callsites after hoist), compared with leave cold call outlined && inline hot calls.

This one is mainly for profile based devirtualization. Related to inlining, there is a pre-pass to do the unmerging:

For my information, in which pass does the unmerging happens?

arg = select cond, 1, val
call foo(arg)

>

if (cond) {

call foo(1)

} else

call foo(val)
nikic added a subscriber: nikic.Apr 18 2023, 1:18 AM

Could you please start by implementing metadata merging, and then return to this patch only if strictly necessary? I'd rather not have another SimplifyCFG option for such a niche use case, if it's possible to get most of the benefit via standard metadata merging.

Instead of making this false by default and requiring it to be set from the command line for pre-LTO compile steps, it would be better to add a "setPreserveIndirectCallInstWithProfInHoistAndSink" interface and set this to false when appropriate from the pass pipeline builder (see other examples where the SimplifyCFGOptions are set from there). That way it can be controlled more exactly. I'm a little confused why we would disable this only in the pre-LTO compile step right now: I suppose this is specific to SamplePGO because ICP is not performed during the pre-LTO compile step? In other cases (non-LTO and instrumentation PGO) ICP appears to be performed right after profile annotation.

However, we also perform SimplifyCFG at least once before profile annotation (both sample and instrumentation PGO). Wouldn't we still end up merging indirect calls in some cases too aggressively, because they don't have any profile metadata so we won't know any better? I'm currently testing a change to avoid speculation in SimplifyCFG before profile annotation to avoid doing incorrectly aggressive transformations. We might want to do something similar here too: setPreserveIndirectCallInstWithProfInHoistAndSink(false) for all pre-PGO annotation invocations of SimplifyCFG in PGO compiles.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1818

Suggest undoing this change which just moves the comment a little but unnecessarily adds a diff.

mingmingl planned changes to this revision.Apr 18 2023, 8:13 AM

Could you please start by implementing metadata merging, and then return to this patch only if strictly necessary? I'd rather not have another SimplifyCFG option for such a niche use case, if it's possible to get most of the benefit via standard metadata merging.

(quick reply to describe the plan) okay, let me implement something as described and see how it works for this use case

  1. do metadata merging for direct callsites 2) conservatively not hoisting/sinking indirect callsites for indirect callsites if !prof presents -> indirect-call-promotion pass will clear !prof for indirect-callsite (since there are no other uses of this type of value profiles) so the SimplifyCFG pass after indirect-call-promotion should be able to do a full cleanup if the indirect-call is still in the IR but not speculatively devirtualized.

I guess you do this only for calls and limit it for pre-link because you want to minimize the restriction on optimization, while making post-link inlining operate on accurate call profile?

Yep this is the plan, but for post-link indirect-call-promotion (which could be subsequently inlined after ICP)

Code motion is generally destructive for profile quality. Instead of blocking all sinking and hoisting, have you considered trying to update profile for sinking/hoisting when there's no merging, but only block those that involves merging (i.e. head/tail merge) with incompatible profile? I imagine in some cases, allow sinking/hoisting but updating profile properly may help better inlining later.

For my understanding, is https://gcc.godbolt.org/z/cMWY9Tqzv an example (based on c++ source with manual !prof) where updating profile would be useful? I think sinking/hoisting would merge instructions but I might miss something.

  • In this example, two direct callsites are merged but both branch_weights metadata (!prof 0, !prof 1) are dropped which is sub-optimal. It's likely better in majority of cases by merging two !prof into one and accumulate counters.

Nevertheless, (after thinking about direct callsites above), I modified the implementation (and option names) to look at indirect calls only primarily because 1) the desired behavior upon icall (indicating !prof is value profile as opposed to branch_weights for direct call) is probably clearer and 2) the pattern shows up (as C++ code shows) with polymorphism, and meanwhile added a 'FIXME' to mention the current 'hoisting' is still suboptimal since it dropped branch_weights on two identical direct calls.

I concur it's better to decide on option name once (preserve-call vs preserve-indirect-call) so I'm fine with implementing the sum-of-branch-weight counters or a better alternative in this patch (and name the option back to more general preserve-call... Let me know your thoughts!

This one is mainly for profile based devirtualization. Related to inlining, there is a pre-pass to do the unmerging:

For my information, in which pass does the unmerging happens?

There is a CallSiteSplitting Pass that does it.

arg = select cond, 1, val
call foo(arg)

>

if (cond) {

call foo(1)

} else

call foo(val)

Could you please start by implementing metadata merging, and then return to this patch only if strictly necessary? I'd rather not have another SimplifyCFG option for such a niche use case, if it's possible to get most of the benefit via standard metadata merging.

Merging value profile data will likely render the data not quite useful anymore -- the same effect of dropping the data completely.

mingmingl updated this revision to Diff 514766.Apr 18 2023, 3:42 PM
mingmingl retitled this revision from [SimplifyCFG]Prevent premature simplification of indirect callsites with profile data to [SimplifyCFG] Merge branch_weights for direct callsites and introduce options to preserve value profile for indirect callsites.
mingmingl edited the summary of this revision. (Show Details)

implemented branch_weight merge for direct callsites in hoist and sink, and get rid of the command line option to override indirect-call specific option -> SimplifyCFG option parser still parses it for testing purposes.

mingmingl added a comment.EditedApr 18 2023, 3:42 PM

Instead of making this false by default and requiring it to be set from the command line for pre-LTO compile steps, it would be better to add a "setPreserveIndirectCallInstWithProfInHoistAndSink" interface and set this to false when appropriate from the pass pipeline builder (see other examples where the SimplifyCFGOptions are set from there). That way it can be controlled more exactly.

Added setPreserveIndirectCallInstWithProfInHoistAndSink method for pass builder pipeline usage.

I'm a little confused why we would disable this only in the pre-LTO compile step right now: I suppose this is specific to SamplePGO because ICP is not performed during the pre-LTO compile step?

(I was confused when seeing it as well) It happens in IFDO since in this particular case, prelink pipeline doesn't run ICP pass; the !prof metadata is dropped in prelink build, so postlink ICP didn't see it.

In other cases (non-LTO and instrumentation PGO) ICP appears to be performed right after profile annotation.

However, we also perform SimplifyCFG at least once before profile annotation (both sample and instrumentation PGO). Wouldn't we still end up merging indirect calls in some cases too aggressively, because they don't have any profile metadata so we won't know any better? I'm currently testing a change to avoid speculation in SimplifyCFG before profile annotation to avoid doing incorrectly aggressive transformations. We might want to do something similar here too: setPreserveIndirectCallInstWithProfInHoistAndSink(false) for all pre-PGO annotation invocations of SimplifyCFG in PGO compiles.

As we briefly discussed, when profile-gen and profile-use in IFDO see the same cfg (i.e., merge happens in profile-gen) only one !prof is generated initially.

For SamplePGO, I think when two indirect calls show up in sampled binary and gives two !prof metadata at two different line offsets in SamplePGO profiles, two indirect calls are likely to be seen in sample-pgo pipeline.

Could you please start by implementing metadata merging, and then return to this patch only if strictly necessary? I'd rather not have another SimplifyCFG option for such a niche use case, if it's possible to get most of the benefit via standard metadata merging.

(quick reply to describe the plan) okay, let me implement something as described and see how it works for this use case

  1. do metadata merging for direct callsites 2) conservatively not hoisting/sinking indirect callsites for indirect callsites if !prof presents -> indirect-call-promotion pass will clear !prof for indirect-callsite (since there are no other uses of this type of value profiles) so the SimplifyCFG pass after indirect-call-promotion should be able to do a full cleanup if the indirect-call is still in the IR but not speculatively devirtualized.

Actually merging two different virtual calls with the same counters could be counterproductive, and ICP pass preserves !prof when not all information is used so I kept the option.

I guess you do this only for calls and limit it for pre-link because you want to minimize the restriction on optimization, while making post-link inlining operate on accurate call profile?

Yep this is the plan, but for post-link indirect-call-promotion (which could be subsequently inlined after ICP)

Code motion is generally destructive for profile quality. Instead of blocking all sinking and hoisting, have you considered trying to update profile for sinking/hoisting when there's no merging, but only block those that involves merging (i.e. head/tail merge) with incompatible profile? I imagine in some cases, allow sinking/hoisting but updating profile properly may help better inlining later.

For my understanding, is https://gcc.godbolt.org/z/cMWY9Tqzv an example (based on c++ source with manual !prof) where updating profile would be useful? I think sinking/hoisting would merge instructions but I might miss something.

  • In this example, two direct callsites are merged but both branch_weights metadata (!prof 0, !prof 1) are dropped which is sub-optimal. It's likely better in majority of cases by merging two !prof into one and accumulate counters.

Nevertheless, (after thinking about direct callsites above), I modified the implementation (and option names) to look at indirect calls only primarily because 1) the desired behavior upon icall (indicating !prof is value profile as opposed to branch_weights for direct call) is probably clearer and 2) the pattern shows up (as C++ code shows) with polymorphism, and meanwhile added a 'FIXME' to mention the current 'hoisting' is still suboptimal since it dropped branch_weights on two identical direct calls.

I concur it's better to decide on option name once (preserve-call vs preserve-indirect-call) so I'm fine with implementing the sum-of-branch-weight counters or a better alternative in this patch (and name the option back to more general preserve-call... Let me know your thoughts!

This one is mainly for profile based devirtualization. Related to inlining, there is a pre-pass to do the unmerging:

For my information, in which pass does the unmerging happens?

There is a CallSiteSplitting Pass that does it.

Ack, thanks!

arg = select cond, 1, val
call foo(arg)

>

if (cond) {

call foo(1)

} else

call foo(val)
mingmingl updated this revision to Diff 514775.Apr 18 2023, 4:02 PM
mingmingl marked an inline comment as done.

undo unrelated change around comment as suggested.

Could you please start by implementing metadata merging, and then return to this patch only if strictly necessary? I'd rather not have another SimplifyCFG option for such a niche use case, if it's possible to get most of the benefit via standard metadata merging.

(quick reply to describe the plan) okay, let me implement something as described and see how it works for this use case

  1. do metadata merging for direct callsites 2) conservatively not hoisting/sinking indirect callsites for indirect callsites if !prof presents -> indirect-call-promotion pass will clear !prof for indirect-callsite (since there are no other uses of this type of value profiles) so the SimplifyCFG pass after indirect-call-promotion should be able to do a full cleanup if the indirect-call is still in the IR but not speculatively devirtualized.

Actually merging two different virtual calls with the same counters could be counterproductive, and ICP pass preserves !prof when not all information is used so I kept the option.

I thought per our offline chat that you were going to try merging the !prof VP metadata. When is it counterproductive?

Should the direct call prof data merge be in a separate patch?

mingmingl planned changes to this revision.Apr 19 2023, 10:00 AM

Could you please start by implementing metadata merging, and then return to this patch only if strictly necessary? I'd rather not have another SimplifyCFG option for such a niche use case, if it's possible to get most of the benefit via standard metadata merging.

(quick reply to describe the plan) okay, let me implement something as described and see how it works for this use case

  1. do metadata merging for direct callsites 2) conservatively not hoisting/sinking indirect callsites for indirect callsites if !prof presents -> indirect-call-promotion pass will clear !prof for indirect-callsite (since there are no other uses of this type of value profiles) so the SimplifyCFG pass after indirect-call-promotion should be able to do a full cleanup if the indirect-call is still in the IR but not speculatively devirtualized.

Actually merging two different virtual calls with the same counters could be counterproductive, and ICP pass preserves !prof when not all information is used so I kept the option.

I thought per our offline chat that you were going to try merging the !prof VP metadata. When is it counterproductive?

sorry for the confusion. merging !prof of indirect call targets may make target values not significant (e.g., in https://gcc.godbolt.org/z/o6G68rn3v, 'func1' and func2' are virtual calls with different address offset so the target values are different; when call-count is updated as a sum, the values become not significant).

As discussed (with xur@ together, I'll change the implementation as
(Possible in a another patch before this)

  1. introduce an option in ICP pass -to clear 'direct-call-targets' after the last ICP-> it could be a partial clear when one !prof has more than indirect-call target value profiles
  2. Let pass builder set the option above (e.g. post link in a LTO pipeline, or the ICP in a non-LTO pipeline)

In this D147954

  1. Get rid of the option in SimplifyCFG pass
  2. Change SimplifyCFG to bail out when it sees an indirect call with target value profiles

And move 'branch_weight' merge to a separate patch as well (per comment)

In this way, value-profile metadata is preserved for ICP to make use of it, and simplifycfg could proceed to sink/hoist indirect calls if ICP didn't happen.

Should the direct call prof data merge be in a separate patch?

Yep, a separate patch is reasonable. Will do that.

Just to make sure I understand it correctly. The solution to the issue Teresa raised,

In other cases (non-LTO and instrumentation PGO) ICP appears to be performed right after profile annotation.

However, we also perform SimplifyCFG at least once before profile annotation (both sample and instrumentation PGO). Wouldn't we still end up merging indirect calls in some cases too aggressively, because they don't have any profile metadata so we won't know any better? I'm currently testing a change to avoid speculation in SimplifyCFG before profile annotation to avoid doing incorrectly aggressive transformations. We might want to do something similar here too: setPreserveIndirectCallInstWithProfInHoistAndSink(false) for all pre-PGO annotation invocations of SimplifyCFG in PGO compiles.

is to not rely on pre-link vs post-link to decide when to enable/disable hoist/sink for calls, but to check whether calls have target value profile remaining (i.e. ICP yet to happen) so we can always avoid destroying profile for ICP regardless of where simpilifyCFG is run. Is that correct?

Just to make sure I understand it correctly. The solution to the issue Teresa raised,

In other cases (non-LTO and instrumentation PGO) ICP appears to be performed right after profile annotation.

However, we also perform SimplifyCFG at least once before profile annotation (both sample and instrumentation PGO). Wouldn't we still end up merging indirect calls in some cases too aggressively, because they don't have any profile metadata so we won't know any better? I'm currently testing a change to avoid speculation in SimplifyCFG before profile annotation to avoid doing incorrectly aggressive transformations. We might want to do something similar here too: setPreserveIndirectCallInstWithProfInHoistAndSink(false) for all pre-PGO annotation invocations of SimplifyCFG in PGO compiles.

is to not rely on pre-link vs post-link to decide when to enable/disable hoist/sink for calls, but to check whether calls have target value profile remaining (i.e. ICP yet to happen) so we can always avoid destroying profile for ICP regardless of where simpilifyCFG is run. Is that correct?

Here is a summary of how dropped metadata happens in an instrumented FDO + thinlto build

  1. SimplifyCFG pass run multiple times in prelink pipeline and postlink pipeline, and each run could be configured with different SimplifyCFGOptions (sink or no sink, hoist or no hoist, etc)
  2. The dropped metadata is observed in a instrumented fdo + lto build, when prelink pipeline runs simplifycfg with hoist && sink, so postlink ICP couldn't see the !prof metadata.

For the questions, here are my thoughts

  • could SimplifyCFG undesirably prematurely sink/hoist callsites before we got a chance to instrument them,
    • the answer is that it shouldn't. Instead SimplifyCFG pass should preserve the callsites for instrumentation (I'm not very familiar with callsite splitting that's mentioned above, but guessing from the name it wants to split callsites and expose them)
  • could SimplifyCFG sink/hoist callsites in a binary foo over which SamplePGO profiles are collected
    • if foo gets two indirect callsites c1 and c2 merged, hypothetically one debug location is kept in the merged callsite, and target values of both c1 and c2 are attributed to this debug location, so it looks like a) hoist && sink could affect the collected profiles -> we want two value profiles, one for c1 and another for c2, but we only get one profile with merged target values (since profiled binary only has one callsite) b) as long as collected sample-pgo profiles have two value profiles, in an AutoFDO build, we should see both callsites exposed and annotated.

About why we get two value profiles in the first place in an instrumentation + LTO build

  • I took a look at pass builder just now, looks like earlier simplify-cfg passes in one pipeline don't turn on 'sink' or 'hoist' [1], and later simplify-cfg passes in a pipeline do turn on sink and host. Similarly, for the small test case (https://gcc.godbolt.org/z/WxE8zshqE), hoist or sink doesn't happen until the 6th SimplifyCFG run (out of 8 SimplifyCFG runs in total). That probably explains why we are able to get two different indirect callsites exposed and instrumented in the first place.

[1] https://gcc.godbolt.org/z/8E39cTosE prints the passes with options in opt since printing in clang is difficult

Just to make sure I understand it correctly. The solution to the issue Teresa raised,

In other cases (non-LTO and instrumentation PGO) ICP appears to be performed right after profile annotation.

However, we also perform SimplifyCFG at least once before profile annotation (both sample and instrumentation PGO). Wouldn't we still end up merging indirect calls in some cases too aggressively, because they don't have any profile metadata so we won't know any better? I'm currently testing a change to avoid speculation in SimplifyCFG before profile annotation to avoid doing incorrectly aggressive transformations. We might want to do something similar here too: setPreserveIndirectCallInstWithProfInHoistAndSink(false) for all pre-PGO annotation invocations of SimplifyCFG in PGO compiles.

is to not rely on pre-link vs post-link to decide when to enable/disable hoist/sink for calls, but to check whether calls have target value profile remaining (i.e. ICP yet to happen) so we can always avoid destroying profile for ICP regardless of where simpilifyCFG is run. Is that correct?

Here is a summary of how dropped metadata happens in an instrumented FDO + thinlto build

  1. SimplifyCFG pass run multiple times in prelink pipeline and postlink pipeline, and each run could be configured with different SimplifyCFGOptions (sink or no sink, hoist or no hoist, etc)
  2. The dropped metadata is observed in a instrumented fdo + lto build, when prelink pipeline runs simplifycfg with hoist && sink, so postlink ICP couldn't see the !prof metadata.

For the questions, here are my thoughts

  • could SimplifyCFG undesirably prematurely sink/hoist callsites before we got a chance to instrument them,
    • the answer is that it shouldn't. Instead SimplifyCFG pass should preserve the callsites for instrumentation (I'm not very familiar with callsite splitting that's mentioned above, but guessing from the name it wants to split callsites and expose them)

Actually https://gcc.godbolt.org/z/MM976vn55 shows two indirect callsites are merged into one in the instrumented build, meaning instrumented build could be hoist two different indirect callsites.

  • could SimplifyCFG sink/hoist callsites in a binary foo over which SamplePGO profiles are collected
    • if foo gets two indirect callsites c1 and c2 merged, hypothetically one debug location is kept in the merged callsite, and target values of both c1 and c2 are attributed to this debug location, so it looks like a) hoist && sink could affect the collected profiles -> we want two value profiles, one for c1 and another for c2, but we only get one profile with merged target values (since profiled binary only has one callsite) b) as long as collected sample-pgo profiles have two value profiles, in an AutoFDO build, we should see both callsites exposed and annotated.

About why we get two value profiles in the first place in an instrumentation + LTO build

  • I took a look at pass builder just now, looks like earlier simplify-cfg passes in one pipeline don't turn on 'sink' or 'hoist' [1], and later simplify-cfg passes in a pipeline do turn on sink and host. Similarly, for the small test case (https://gcc.godbolt.org/z/WxE8zshqE), hoist or sink doesn't happen until the 6th SimplifyCFG run (out of 8 SimplifyCFG runs in total). That probably explains why we are able to get two different indirect callsites exposed and instrumented in the first place.

[1] https://gcc.godbolt.org/z/8E39cTosE prints the passes with options in opt since printing in clang is difficult

the planned change of 'llvm/lib/Transforms/Utils/SimplifyCFG.cpp' for the problem to solve is in D148876 (to illustrate how module flag and added helper functions are used in one patch).

Probably will come back to update this patch for simplify-cfg as a split of D148876.