Restrict call metadata based hotness detection to Sample PGO mode
ClosedPublic

Authored by tejohnson on May 4 2017, 11:57 AM.

Details

Summary

Don't use the metadata on call instructions for determining hotness
unless we are in sample PGO mode, where it is needed because profile
counts are not accurate. In instrumentation mode this is not necessary
and does more harm than good when calls have VP metadata that hasn't
been properly scaled after transformations or dropped after constant
prop based devirtualization (both should be fixed, but we don't need
to do this in the first place for instrumentation PGO).

This required adjusting a number of tests to distinguish between sample
and instrumentation PGO handling.

tejohnson created this revision.May 4 2017, 11:57 AM
eraman added a subscriber: eraman.May 5 2017, 2:42 PM

I feel the underlying problem here is the use of VP metadata to get counts in sample PGO mode. Dehao, even when VP metadata is available, can we add branch_weights to the call? Then, only sample PGO will annotate branch weights to calls and that won't affect instrumented PGO.

I feel the underlying problem here is the use of VP metadata to get counts in sample PGO mode. Dehao, even when VP metadata is available, can we add branch_weights to the call? Then, only sample PGO will annotate branch weights to calls and that won't affect instrumented PGO.

That sounds like potentially a good longer term change. But I'd still like for this to go in as an immediate fix for the issue, which is causing a noticeable impact on instrumentation PGO performance, and because there is no reason AFAIK for anything other than Sample PGO to need to look at call metadata for determining hotness.

I feel the underlying problem here is the use of VP metadata to get counts in sample PGO mode. Dehao, even when VP metadata is available, can we add branch_weights to the call? Then, only sample PGO will annotate branch weights to calls and that won't affect instrumented PGO.

That sounds like potentially a good longer term change. But I'd still like for this to go in as an immediate fix for the issue, which is causing a noticeable impact on instrumentation PGO performance, and because there is no reason AFAIK for anything other than Sample PGO to need to look at call metadata for determining hotness.

Just to confirm, with https://reviews.llvm.org/D32773, we should have recovered all the instrumented PGO performance, isn't it?

lib/Analysis/ProfileSummaryInfo.cpp
82

Looks like you can always get the Kind from Inst, why would you want to pass in the Summary?

I feel the underlying problem here is the use of VP metadata to get counts in sample PGO mode. Dehao, even when VP metadata is available, can we add branch_weights to the call? Then, only sample PGO will annotate branch weights to calls and that won't affect instrumented PGO.

That sounds like potentially a good longer term change. But I'd still like for this to go in as an immediate fix for the issue, which is causing a noticeable impact on instrumentation PGO performance, and because there is no reason AFAIK for anything other than Sample PGO to need to look at call metadata for determining hotness.

Just to confirm, with https://reviews.llvm.org/D32773, we should have recovered all the instrumented PGO performance, isn't it?

You're right, I suspect that it may (will be checking after it is fully integrated here), but my concern is that there may still be other transformations that don't update the VP metadata properly (I mentioned earlier this week that e.g. it appears the metadata isn't being dropped when calls are devirtualized after constant prop, although I don't think this causes the performance issue). These things should be fixed for other reasons including Sample PGO accuracy, but it is not needed for instrumentation PGO in the first place - is there any reason for anything other than Sample PGO to look at this instead of the branch weights?

lib/Analysis/ProfileSummaryInfo.cpp
82

It involves doing some work that is already done when the summary is available, so this was added just for the case where it is invoked without a summary.

eraman added inline comments.May 5 2017, 4:10 PM
include/llvm/Analysis/ProfileSummaryInfo.h
62 ↗(On Diff #97862)

A major rationale for adding ProfileSummaryInfo as a separate analysis is to prevent Profilesummary from being directly manipulated, so I believe we shouldn't add an interface that takes ProfileSummary as a parameter. As Dehao mentions below, you anyway get the module from the instruction and get the kind from there, so this is not necessary. My concern there is this becomes unnecessarily expensive as we get an invariant value (kind) many times.

tejohnson added inline comments.May 5 2017, 4:21 PM
include/llvm/Analysis/ProfileSummaryInfo.h
62 ↗(On Diff #97862)

A major rationale for adding ProfileSummaryInfo as a separate analysis is to prevent Profilesummary from being directly manipulated, so I believe we shouldn't add an interface that takes ProfileSummary as a parameter.

Note that this is only passed in when getProfileCount() is called from a ProfileSummaryInfo method (it passes in the Summary object it owns), so I am not sure what the concern is about directly manipulating it?

As Dehao mentions below, you anyway get the module from the instruction and get the kind from there, so this is not necessary. My concern there is this becomes unnecessarily expensive as we get an invariant value (kind) many times.

Exactly, that's why I don't think we should keep re-finding the profile metadata every time when it is called from the ProfileSummaryInfo.

eraman added inline comments.May 5 2017, 4:28 PM
include/llvm/Analysis/ProfileSummaryInfo.h
62 ↗(On Diff #97862)

The concern is you now have a public method that takes ProfileSummaryInfo * (even though no one outside the class passes it). What you can do is have a private helper method that takes a ProfileSummaryInfo *. Within the class, call this directly. Note that
the class already owns the Summary. External users will still call the public method, where you can get the summary from the instruction and call the private method (with a comment about the overhead).

tejohnson added inline comments.May 8 2017, 1:53 PM
include/llvm/Analysis/ProfileSummaryInfo.h
62 ↗(On Diff #97862)

Done

tejohnson updated this revision to Diff 98202.May 8 2017, 1:53 PM

Address review comments

tejohnson updated this revision to Diff 98560.May 10 2017, 4:33 PM

Update patch now that getProfileCount is not static. This greatly simplified
the code changes, but required adjusting a few more tests which needed
profile summary metadata.

eraman added inline comments.May 10 2017, 5:51 PM
lib/Analysis/ProfileSummaryInfo.cpp
78–85

I wonder if we should check if Summary is non-null and then the summary kind is PSK_Sample. There is one test case down below (inliner count update) where you had to attach the summary to the test case. Is there any reason the summary has to be present to get the count based on entry count and block frequency?

tejohnson added inline comments.May 10 2017, 6:58 PM
lib/Analysis/ProfileSummaryInfo.cpp
78–85

Do you mean only do the metadata-based hotness when computeSummary() returns true and the kind is PSK_Sample? I.e. if !computeSummary(), then assume instrumentation based? I could do that, it would mean a few less test changes.

eraman added inline comments.May 11 2017, 10:53 AM
lib/Analysis/ProfileSummaryInfo.cpp
78–85

Yes, that's what I should've written. As long as we have function entry counts, we should return the profile count.

tejohnson updated this revision to Diff 98695.May 11 2017, 4:00 PM

Treat missing summary as instrumentation PGO, and remove unneeded test changes

eraman accepted this revision.May 11 2017, 4:29 PM

LGTM

This revision is now accepted and ready to land.May 11 2017, 4:29 PM
This revision was automatically updated to reflect the committed changes.