This is an archive of the discontinued LLVM Phabricator instance.

Use max(BFI_Count, TotalProfCount) to get block's profile count.
AbandonedPublic

Authored by danielcdh on Nov 7 2016, 1:30 PM.

Details

Reviewers
davidxl
eraman
Summary

For Sample PGO, BFI count may be inaccurate due to errors during propagation. In this patch, another mechanism is added to make sure the block's profile count is no less than the total prof count extracted from the metadata. A similar mechanism was introduced to inliner in r275073.

Event Timeline

danielcdh updated this revision to Diff 77091.Nov 7 2016, 1:30 PM
danielcdh retitled this revision from to Use max(BFI_Count, TotalProfCount) to get block's profile count..
danielcdh updated this object.
danielcdh added reviewers: davidxl, eraman.
danielcdh added a subscriber: llvm-commits.
davidxl added inline comments.Nov 8 2016, 3:56 PM
lib/Analysis/BlockFrequencyInfo.cpp
155

This does not look like the right place for the adjustment. Maybe provide a wrapper method in ProfileSummaryInfo::getBlockProfileCount ..

Besides, the adjustment does not work well for Instrumentation based PGO -- the branch weight may be scaled (though since it scales down, it does not matter in reality).

danielcdh added inline comments.Nov 8 2016, 4:08 PM
lib/Analysis/BlockFrequencyInfo.cpp
155

My concern is that if we add ProfileSummaryInfo::getBlockProfileCount, it will get confused with BlockFrequencyInfo::getBlockProfileCount.

Do you mean that we hide this function in BlockFrequencyInfo, and only expose it in ProfileSummaryInfo, and change all callsites to PSI? It will introduce dependency to PSI (though there are not many callsites to this function)

davidxl added inline comments.Nov 9 2016, 10:37 AM
lib/Analysis/BlockFrequencyInfo.cpp
155

Another reason this is probably the wrong way to handle this is because inlining/cloning operation does not actually update the weight meta data (scaling) -- using the original total weight is not correct.

Is it possible for SampleProfiler pass to identify those BBs that need special hanlding and introduce a new MD_prof data (with name "profile_count") -- the instruction cloner needs to be taught to update it properly

danielcdh abandoned this revision.Nov 9 2016, 1:53 PM

As discussed with David offline:

  • Adding this to the BFI interface is too hacky
  • The perfect solution would be improve samplepgo branch_probability propagation algorithm:
    • The algorithm takes BB->count map (for a subset of all BBs in the function) as input, and output the Edge->probability map
    • The current algorithm uses iterative solution to infer branch probability from BB counts. It is guaranteed to converge in a few iterations (fast), but it's adhoc, and does not guarantee flow-consistency (i.e. sum(in-edge-weights) = sum(out-edge-weights)). As a result, Applying BFI on the computed branch probability can lead to incorrect BB count (comparing with the input BB->count map).
    • To ensure flow consistency, MCF needs to be used, which is super expensive (We initially experimented with it in GCC, and finally gave up).
    • A new adhoc algorithm needs to be designed to derive reasonably accurate branch probability while ensuring hot BB counts computed by BFI agrees with the original BB count map. It seems very difficult to design such an algorithm as there are a lot to consider in the propagation, e.g.
      1. missing BB count in the original input
      2. 0 sample count for function/loop entry
      3. loop early-exit and irregular control flow
  • As we do not have a clear picture towards how to design a perfect propagation algorithm, we can use the extracProfTotalWeights to get the original counts from metadata for now. But we need to make sure that the branch probability metadata gets updated by optimizations that would clone code, especially function inlining. And also, we need to move the logic to a helper function in PSI instead of a public API in BFI.

So I will close this patch for now, and move the changes to https://reviews.llvm.org/D26353