This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Stop repeated indirect call promotion for the same target
ClosedPublic

Authored by wmi on Feb 16 2021, 11:22 AM.

Details

Summary

Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundent which can harm performance and can cause excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Diff Detail

Event Timeline

wmi created this revision.Feb 16 2021, 11:22 AM
wmi requested review of this revision.Feb 16 2021, 11:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2021, 11:22 AM
wmi retitled this revision from [SampleFDO] Stop indirect call promotion for the same target repeatedly to [SampleFDO] Stop repeated indirect call promotion for the same target.Feb 16 2021, 11:22 AM
wmi added a comment.Feb 16 2021, 12:09 PM

Currently to prevent recursive indirect call promotion, SampleLoaderPass use a DenseSet PromotedInsns to memorize which indirect call has been promoted inside of current function. However, PromotedInsns is cleared everytime emitAnnotation has done for a function, so it cannot prevent the same indirect call being promoted for the same target again in another iteration of emitAnnotation.

On the other side, because of PromotedInsns, an indirect call is limited to be promoted only for one target in every iteration of emitAnnotation. That limits inline instance profile annotation related with indirect call targets. I think PromotedInsns can be deleted after this patch. I will address it in a followup patch after performance testing.

davidxl added inline comments.Feb 16 2021, 2:57 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1072

Why do we need to keep the zero count data in the first place?

1108

Why updating here? Is the meta data more up to date?

hoy added a comment.Feb 16 2021, 3:14 PM
In D96806#2566458, @wmi wrote:

Currently to prevent recursive indirect call promotion, SampleLoaderPass use a DenseSet PromotedInsns to memorize which indirect call has been promoted inside of current function. However, PromotedInsns is cleared everytime emitAnnotation has done for a function, so it cannot prevent the same indirect call being promoted for the same target again in another iteration of emitAnnotation.

On the other side, because of PromotedInsns, an indirect call is limited to be promoted only for one target in every iteration of emitAnnotation. That limits inline instance profile annotation related with indirect call targets. I think PromotedInsns can be deleted after this patch. I will address it in a followup patch after performance testing.

Did you see the recursive indirect call promotion between functions that do not follow top-down order, like functions in an SCC?

I remember this change https://reviews.llvm.org/D38763 prevents re-promotion done by subsequent PGO ICP. Could the existing heuristic in inlineHotFunctions be tweaked like PGO ICP? BTW, multiple indirect call targets can be promoted by inlineHotFunctionsWithPriority.

wmi added inline comments.Feb 16 2021, 4:18 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1072

I added the zero count data in updateIDTMetaData to indicate that a target has been promoted so it won't be promoted again. Once an indirect call is promoted for a target, we hope it won't be promoted for the same target again in any case, no matter where it is inlined into, so I think metadata is the best place to save the information. Currently I use zero count to represent that a target cannot be promoted anymore, this is consistent with how regular ICP handles zero count profile.

1108

Yes, SampleFDO is different from PGO. PGO initialize the value profiling data once and only substract count of a target when the target is promoted. For SampleFDO, it needs to update metadata every time the indirect call is reannotated with more precise profile after inlining.

wmi added a comment.Feb 16 2021, 4:43 PM
In D96806#2566946, @hoy wrote:
In D96806#2566458, @wmi wrote:

Currently to prevent recursive indirect call promotion, SampleLoaderPass use a DenseSet PromotedInsns to memorize which indirect call has been promoted inside of current function. However, PromotedInsns is cleared everytime emitAnnotation has done for a function, so it cannot prevent the same indirect call being promoted for the same target again in another iteration of emitAnnotation.

On the other side, because of PromotedInsns, an indirect call is limited to be promoted only for one target in every iteration of emitAnnotation. That limits inline instance profile annotation related with indirect call targets. I think PromotedInsns can be deleted after this patch. I will address it in a followup patch after performance testing.

Did you see the recursive indirect call promotion between functions that do not follow top-down order, like functions in an SCC?

Right. If everything follow top-down order, the recursive indirect call issue will be mostly hidden. But currently even the top-down order flag is enabled by default, SCC and missing call graph edges from indirect call can lead to bottom-up inlining.

I remember this change https://reviews.llvm.org/D38763 prevents re-promotion done by subsequent PGO ICP. Could the existing heuristic in inlineHotFunctions be tweaked like PGO ICP?

I think there is a major difference between SampleFDO ICP and regular ICP.

Regular ICP uses the value profile metadata to decide which targets should be promoted. Once metadata is populated, the promotion candidates are mostly decided.

SampleFDO ICP is different. The metadata is refreshed every time the indirect call is inlined into a new function and new inline instance profile is annotated to it. Even if the indirect call is promoted before, the metadata updated after the promotion will be dropped when the new profile is annotated. That is why I am trying to keep "0" counts for those targets which have already been promoted, and need to merge those "0" counts when we update the metadata with new profile -- if there is "0" count for a target, we will keep it as "0" during metadata merge. That will ensure the target won't be promoted again. I think we need this process no matter how we tweak existing heuristic like regular ICP.

BTW, multiple indirect call targets can be promoted by inlineHotFunctionsWithPriority.

I was wrong that PromotedInsns prevent an indirect call from being promoted for multiple targets in every iteration of emitAnnotation. It could. I misread the code. PromotedInsns can still be useful as a cache so it doesn't need to access metadata every loop iteration in inlineHotFunctions, so we may want to keep it in the end. Sorry about it.

wmi updated this revision to Diff 324145.Feb 16 2021, 6:06 PM

Rebase to resolve some conflicts.

PromotedInsns can still be useful as a cache so it doesn't need to access metadata every loop iteration in inlineHotFunctions, so we may want to keep it in the end. Sorry about it.

I would still remove PromotedInsns to keep it simple. If we worry about build time in inlineHotFunctions loop, we can change it to a work list to avoid revisiting all call sites on each iteration, similar to InlineHotFunctionWithPriority does this. That will save more repeated work.

The fix brings 0.1~0.2% performance on our search benchmark.

This is great - thanks for the fix. Without top-down order, the performance improvement would probably be larger.

llvm/lib/Transforms/IPO/SampleProfile.cpp
983

A comment here would be helpful too.

996

Do we expect the same function to appear multiple times in ValueData? If not, we could exit as long as the candidate is found?

1032

When we update a non-zero count to zero to represent promoted call, do we need to reduce the total as well to reflect the remaining count for the actual indirect call?

hoy added a comment.Feb 17 2021, 11:25 AM
In D96806#2567120, @wmi wrote:
In D96806#2566946, @hoy wrote:
In D96806#2566458, @wmi wrote:

Currently to prevent recursive indirect call promotion, SampleLoaderPass use a DenseSet PromotedInsns to memorize which indirect call has been promoted inside of current function. However, PromotedInsns is cleared everytime emitAnnotation has done for a function, so it cannot prevent the same indirect call being promoted for the same target again in another iteration of emitAnnotation.

On the other side, because of PromotedInsns, an indirect call is limited to be promoted only for one target in every iteration of emitAnnotation. That limits inline instance profile annotation related with indirect call targets. I think PromotedInsns can be deleted after this patch. I will address it in a followup patch after performance testing.

Did you see the recursive indirect call promotion between functions that do not follow top-down order, like functions in an SCC?

Right. If everything follow top-down order, the recursive indirect call issue will be mostly hidden. But currently even the top-down order flag is enabled by default, SCC and missing call graph edges from indirect call can lead to bottom-up inlining.

I remember this change https://reviews.llvm.org/D38763 prevents re-promotion done by subsequent PGO ICP. Could the existing heuristic in inlineHotFunctions be tweaked like PGO ICP?

I think there is a major difference between SampleFDO ICP and regular ICP.

Regular ICP uses the value profile metadata to decide which targets should be promoted. Once metadata is populated, the promotion candidates are mostly decided.

SampleFDO ICP is different. The metadata is refreshed every time the indirect call is inlined into a new function and new inline instance profile is annotated to it. Even if the indirect call is promoted before, the metadata updated after the promotion will be dropped when the new profile is annotated. That is why I am trying to keep "0" counts for those targets which have already been promoted, and need to merge those "0" counts when we update the metadata with new profile -- if there is "0" count for a target, we will keep it as "0" during metadata merge. That will ensure the target won't be promoted again. I think we need this process no matter how we tweak existing heuristic like regular ICP.

I see. I remember that I had to use probe distribution factor to scale down the unpromoted indirect callsites.

BTW, multiple indirect call targets can be promoted by inlineHotFunctionsWithPriority.

I was wrong that PromotedInsns prevent an indirect call from being promoted for multiple targets in every iteration of emitAnnotation. It could. I misread the code. PromotedInsns can still be useful as a cache so it doesn't need to access metadata every loop iteration in inlineHotFunctions, so we may want to keep it in the end. Sorry about it.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1027

Nit: group this if-statement and the one below?

1031

Is a zero count automatically set when a target is promoted in a callee?

wmi updated this revision to Diff 324530.Feb 17 2021, 10:36 PM

Address Wenlei and Hongtao's comments.

wenlei added inline comments.Feb 17 2021, 10:43 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
997–999

nit: return (ValueData[I].Count == 0);?

1035

In what case do we have Total < Data.Count? And Total == Data.Count should be fine for subtraction too?

wmi marked 4 inline comments as done.Feb 17 2021, 10:44 PM

I would still remove PromotedInsns to keep it simple.

Yes, it is simpler. I removed them.

llvm/lib/Transforms/IPO/SampleProfile.cpp
983

Comment added.

996

Good point. Changed.

1027

I added some other statement after the statement below when I addressed Wenlei's comment so now it is clearer not to group them.

1031

It is set by updateIDTMetaData with CallTargets containing the promoted target and count being set to 0.

1032

Another good point. Changed.

wmi marked 5 inline comments as done.Feb 17 2021, 10:58 PM
wmi added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
997–999

That is better. Will fix.

1035

Currently we won't have such case. It is just some extra protect in case of future change. I think assertion will be better. Will change it.

Right, Total == Data.Count should be added. will change it.

wmi marked an inline comment as done.Feb 18 2021, 9:47 AM
wmi added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
1035

The assertion catches a case where we have Total < Data.Count when running the test test/Transforms/SampleProfile/csspgo-inline-icall.ll.

For CS profile, the sum computed in findIndirectCallFunctionSamples can be smaller than the sum of the target counts returned by findCallTargetMapAt. Is it expected?

hoy added inline comments.Feb 18 2021, 9:58 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1035

For CS profile, we use callsite count (including both inlined and non-inlined targets) instead of using entry counts, since there's yet a way to tell if an indirect call target is inlined or not from a CS profile.

We have a fix for this to review. Haven't sent it out yet. It basically replaces this code

uint64_t Sum = 0;
 findIndirectCallFunctionSamples(I, Sum);

with

if (FunctionSamples::ProfileIsCS) {
            // With CSSPGO all indirect call targets are counted towards the
            // original indirect call site in the profile, including both
            // inlined and non-inlined targets.
            Sum = 0;
            for (const auto &T_C : T.get())
              Sum += T_C.second;
          } else {
            findIndirectCallFunctionSamples(I, Sum);
          }
wmi added inline comments.Feb 18 2021, 10:04 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1035

findIndirectCallFunctionSamples

1035

Thanks. That fix makes sense to me. This change will depend on your fix if I keep the assumption Total >= Data.Count and the assertion.

hoy added inline comments.Feb 18 2021, 12:17 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1035

Sure, let me send out the fix.

wmi updated this revision to Diff 324814.Feb 18 2021, 4:52 PM

Address Wenlei's comment. Rebase on Hongtao's fix https://reviews.llvm.org/D96990

wenlei accepted this revision.Feb 18 2021, 4:54 PM

lgtm, thanks!

This revision is now accepted and ready to land.Feb 18 2021, 4:54 PM
This revision was landed with ongoing or failed builds.Feb 18 2021, 5:05 PM
This revision was automatically updated to reflect the committed changes.
wmi added a comment.Feb 19 2021, 10:11 AM

I added PromotedInsns back in https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf because I found there was some problem in current implementation and it may cause repeated ICP without the protection of PromotedInsns. Basically, when we read value profile using getValueProfDataFromInst, we specify MaxNumPromotions as the maximum value to read. It is possible that the size of valuedata is larger than MaxNumPromotions so getValueProfDataFromInst ignore some of the 0 count values. I added PromotedInsns back temporarily while I am figuring out what is the best way to fix this.