This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Overwrite branch weight annotated in previous pass.
ClosedPublic

Authored by hoy on May 14 2021, 2:43 PM.

Details

Summary

Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annoation in postilnk doesn't fully overwrite what's done in prelink. I'm adding a switch (-overwrite-existing-weights=1) to enable a full overwrite, which includes:

  1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.
  1. Clear indirect call metadata if somehow all the rest targets have a sum of zero count.
  1. Overwrite branch weight for basic blocks.

With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obsolete metadata, which come from a partial global inlining in prelink. It's not expected to work well for non-CS case with a less-accurate post-inline count quality.

It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to lack of exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the rest iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. A even better postlink counts quality can be archived by turning off prelink CGSCC inlining which is not context-sensitive.

Diff Detail

Event Timeline

hoy created this revision.May 14 2021, 2:43 PM
hoy requested review of this revision.May 14 2021, 2:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2021, 2:43 PM

Thanks for bringing this up - it makes sure we give full context profile precedence even when we had to use context-less profile in prelink.

When counts are accurate, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obselete metedata.

I assume the above is from CSSPGO, right?

It'd be good to call out in the change description as to why postlink overwrite could be beneficial.

If we target AutoFDO as well, would be good to mention the impact on AutoFDO too.

llvm/lib/Transforms/IPO/SampleProfile.cpp
226

nit: keep the blank line before namespace?

1458–1459

Btw, this comment is no longer accurate, right? Based on the discussion in D100993, the factor is strictly for handling duplication now, not for scaling counts after ICP.

1482–1486

So we have special case for Sum == 0 because updateIDTMetaData with Sum 0 is reserved for marking promoted targets, right?

We may also reach Sum == 0 inside updateIDTMetaData after promoted target is removed, however it looks like annotateValueSite would simply bailout without removing existing metadata when new target list is empty.. so we may miss some overwrite still? Though for such cases, it needs to be updated even when OverwriteExistingWeights is not used.

hoy added a comment.May 16 2021, 10:24 PM

Thanks for bringing this up - it makes sure we give full context profile precedence even when we had to use context-less profile in prelink.

When counts are accurate, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obselete metedata.

I assume the above is from CSSPGO, right?

It'd be good to call out in the change description as to why postlink overwrite could be beneficial.

Yes, they are for CSSPGO.

If we target AutoFDO as well, would be good to mention the impact on AutoFDO too.

I haven't tested for non-CS case. In theory it should help there but with a less-than-ideal counts quality, the change may not help in practice.

llvm/lib/Transforms/IPO/SampleProfile.cpp
226

done.

1458–1459

Good point. The proration is done based on a pre-ICP distribution factor.

1482–1486

So we have special case for Sum == 0 because updateIDTMetaData with Sum 0 is reserved for marking promoted targets, right?

Right, Sum == 0 for updateIDTMetaData is used to mark a certain target to be promoted already.

We may also reach Sum == 0 inside updateIDTMetaData after promoted target is removed, however it looks like annotateValueSite would simply bailout without removing existing metadata when new target list is empty.. so we may miss some overwrite still? Though for such cases, it needs to be updated even when OverwriteExistingWeights is not used.

This is true. When Sum reaches zero when promoted target counts are withdrawn, instead of removing the existing metadata, the current implementation just assigns the call instruction a new zero metadata. That should give equivalent information to subsequent passes.

hoy edited the summary of this revision. (Show Details)May 16 2021, 11:16 PM
hoy updated this revision to Diff 345753.May 16 2021, 11:17 PM

Addressing Wenlei's comment.

Thanks for bringing this up - it makes sure we give full context profile precedence even when we had to use context-less profile in prelink.

When counts are accurate, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obselete metedata.

I assume the above is from CSSPGO, right?

It'd be good to call out in the change description as to why postlink overwrite could be beneficial.

Yes, they are for CSSPGO.

If we target AutoFDO as well, would be good to mention the impact on AutoFDO too.

I haven't tested for non-CS case. In theory it should help there but with a less-than-ideal counts quality, the change may not help in practice.

For future reference, can you make it explicit in the change description: 1) what made this beneficial for CSSPGO (give post-link full context profile precedence over pre-link partial context profile, i.e. be more specific about obsolete metadata); 2) what is the expected impact on AutoFDO.

I think there's another caveat worth calling out, by overwriting all pre-link weights, we would lose the prof metadata adjustment done by prelink opt passes (e.g. prelink loop passes).

llvm/lib/Transforms/IPO/SampleProfile.cpp
1482–1486

Ok took another look, i was looking at the wrong overload of annotateValueSite. There's one that would bail out without setting metadata if value data is empty, but that's not the overload we call from updateIDTMetaData. So we should be good.

hoy edited the summary of this revision. (Show Details)May 17 2021, 10:23 PM

Thanks for bringing this up - it makes sure we give full context profile precedence even when we had to use context-less profile in prelink.

When counts are accurate, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obselete metedata.

I assume the above is from CSSPGO, right?

It'd be good to call out in the change description as to why postlink overwrite could be beneficial.

Yes, they are for CSSPGO.

If we target AutoFDO as well, would be good to mention the impact on AutoFDO too.

I haven't tested for non-CS case. In theory it should help there but with a less-than-ideal counts quality, the change may not help in practice.

For future reference, can you make it explicit in the change description: 1) what made this beneficial for CSSPGO (give post-link full context profile precedence over pre-link partial context profile, i.e. be more specific about obsolete metadata); 2) what is the expected impact on AutoFDO.

I think there's another caveat worth calling out, by overwriting all pre-link weights, we would lose the prof metadata adjustment done by prelink opt passes (e.g. prelink loop passes).

Good point, summary updated.

wmi added a comment.May 17 2021, 11:33 PM

If we target AutoFDO as well, would be good to mention the impact on AutoFDO too.

I haven't tested for non-CS case. In theory it should help there but with a less-than-ideal counts quality, the change may not help in practice.

AutoFDO doesn't have the distribution factor which CSSPGO+pseudo hook have so OverwriteExistingWeights may not help for AutoFDO.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1483–1486

The second updateIDTMetaData call should be removed?

1496–1498

Remove MD_prof means compiler may use static heuristic to infer the callsite hotness, which can still be hot. This is different from marking it as cold.

If the profile is accurate, why not set the count to 0?

1545–1547

Update the comment to mention OverwriteExistingWeights.

hoy added inline comments.May 18 2021, 12:24 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1483–1486

Good catch, kind of rebasing issue.

1496–1498

I see. Thanks for pointing it out. We want it to be treated as cold here. Should be using zero count for direct callsites. For indirect callsites, I'm inclined to removing the metadata which shouldn't trigger PGO ICP to do anything. What do you think?

1545–1547

Done.

hoy updated this revision to Diff 346241.May 18 2021, 12:25 PM
hoy edited the summary of this revision. (Show Details)

Addressing Wei's comment.

hoy retitled this revision from [SampleFDO] Overwrite branch weight annotated in previous pass. to [CSSPGO] Overwrite branch weight annotated in previous pass..May 18 2021, 12:25 PM
wmi accepted this revision.May 18 2021, 1:27 PM

LGTM.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1496–1498

Removing metadata for indirect callsites makes sense to me.

This revision is now accepted and ready to land.May 18 2021, 1:27 PM
wenlei accepted this revision.May 18 2021, 10:31 PM

lgtm, thanks.

This revision was automatically updated to reflect the committed changes.