This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Improve the working set size heuristics under the partial sample PGO.
ClosedPublic

Authored by hjyamauchi on May 12 2020, 6:02 PM.

Details

Summary

The working set size heuristics (ProfileSummaryInfo::hasHugeWorkingSetSize)
under the partial sample PGO may not be accurate because the profile is partial
and the number of hot profile counters in the ProfileSummary may not reflect the
actual working set size of the program being compiled.

To improve this, the (approximated) ratio of the the number of profile counters
of the program being compiled to the number of profile counters in the partial
sample profile is computed (which is called the partial profile ratio) and the
working set size of the profile is scaled by this ratio to reflect the working
set size of the program being compiled and used for the working set size
heuristics.

The partial profile ratio is approximated based on the number of the basic
blocks in the program and the NumCounts field in the ProfileSummary and computed
through the thin LTO indexing. This means that there is the limitation that the
scaled working set size is available to the thin LTO post link passes only.

Diff Detail

Event Timeline

hjyamauchi created this revision.May 12 2020, 6:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2020, 6:02 PM

clang-format.

I'm going to split this patch into parts:

  • Use Module::setModuleFlag rather than addModuleFlag for the ProfileSummary metadata.
  • Add the PartialProfileRatio field in ProfileSummary metadata.
  • Compute the block count through thin LTO.
  • The ProfileSummaryInfo change.
davidxl added a reviewer: wmi.May 13 2020, 9:57 AM

If the profile parser part were in a separate patch, would the size of the resulting patches be more balanced out? I'm wondering if splitting this into smaller patches would be meaningful.

llvm/include/llvm/IR/ProfileSummary.h
61

Can this be initialized, for maintainability (even if it's currently initialized in the ctor - code evolution may omit that later. Initializing at declaration reduces surprise bugs later)

llvm/lib/Analysis/ProfileSummaryInfo.cpp
78

How did we come by the 0.008 value?

llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
987

unrelated change (the last line)

llvm/lib/IR/ProfileSummary.cpp
119

Most of the code looks similar to the getVal above. Could some factoring reduce that duplication?

195

While at it - could we rename lower-case i to upper case something?

If the profile parser part were in a separate patch, would the size of the resulting patches be more balanced out? I'm wondering if splitting this into smaller patches would be meaningful.

Yup, splitting is the plan (see the earlier comment.)

hjyamauchi marked 8 inline comments as done.
hjyamauchi added inline comments.
llvm/include/llvm/IR/ProfileSummary.h
61

Done in D79951.

llvm/lib/Analysis/ProfileSummaryInfo.cpp
78

It came from tuning of large app / benchmarks so large working set size won't trigger for benchmarks.

llvm/lib/IR/ProfileSummary.cpp
119

Done in D79951.

195

Done in D79951.

hjyamauchi marked 4 inline comments as done.May 18 2020, 10:27 AM

Please review the split patches (child revisions) D79902 and D79951 first.

hjyamauchi marked 2 inline comments as done.May 21 2020, 1:40 PM

D79902 and D79951 have been checked in.
Split the part that computes the block count into D80403.

llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
987

Done in D80403.

wmi added inline comments.May 21 2020, 4:33 PM
llvm/include/llvm/IR/ProfileSummary.h
61

PartialProfileRatio is only available in ThinLTO mode. Better make it explicit in the comment.

PartialProfileRatio is set to "BlockCount / NumCounts". It doesn't match the ratio of "profile counters of the program being built" / "the number of profile counters in the partial sample profile", because for AFDO every block may have multiple counters inside of it. I think that is why you put "approximiately" in front of the comment.

Can we redefine PartialProfileRatio to "BlockCount / NumCounts * PartialSampleProfileWorkingSetSizeScaleFactor", so PartialProfileRatio can use the definition in the comment exactly instead of approximately? We may rename PartialSampleProfileWorkingSetSizeScaleFactor using something like NumofCounterInBlockScaleFactor. That will also give the flag PartialSampleProfileWorkingSetSizeScaleFactor more concrete meaning.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
396

Can we use Index.addBlockCount(F.size())?

hjyamauchi marked 3 inline comments as done.May 26 2020, 11:34 AM
hjyamauchi added inline comments.
llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
396

Done in D80403.

hjyamauchi marked 2 inline comments as done.May 26 2020, 12:57 PM
hjyamauchi added inline comments.
llvm/include/llvm/IR/ProfileSummary.h
61

Will add a comment about "only available in thin LTO mode."

It'd be nice to give precise meaning, but I wonder if those counts are that precise.

If NumofCounterInBlockScaleFactor == the number of counters per block, it'd have the value of 0.008, which is set by tuning, would the value/name make sense?

wmi added inline comments.May 27 2020, 8:58 PM
llvm/include/llvm/IR/ProfileSummary.h
61

Ok, the scale factor is not just the "factor of counters per block", but "factors of counters per block * (number of block s in working set / number of all the blocks of current program)". How about keep using PartialSampleProfileWorkingSetSizeScaleFactor but explain its meaning more clearly?

hjyamauchi marked an inline comment as done.

Rebase past D79902, D79951 and D80403.
Address comments.
Some cleanup.

hjyamauchi added inline comments.May 29 2020, 1:57 PM
llvm/include/llvm/IR/ProfileSummary.h
61

I thought about this more. I think that by and large, the small scale factor value is due to the fact that the hot working set sizes (NumCounts at the 99 percentile) in the PGO/PartialSamplePGO profiles are looking like ~1:100, respectively and the ProfileSummaryHugeWorkingSetSizeThreshold value was originally tuned for PGO. I guess the 0.008 value resulted from tuning to scale it down to the PGO level because the threshold values are shared.

I clarified the description of PartialSampleProfileWorkingSetSizeScaleFactor to include this and the "factor of counters per block".

Thoughts?

Disable the flag to be enabled later.

wmi accepted this revision.May 29 2020, 4:44 PM

LGTM.

llvm/include/llvm/IR/ProfileSummary.h
61

It sounds good to me.

This revision is now accepted and ready to land.May 29 2020, 4:44 PM
This revision was automatically updated to reflect the committed changes.