This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Fix return value of getProbeWeight
ClosedPublic

Authored by wlei on May 6 2021, 10:16 AM.

Details

Summary

Currently we didn't support multiple return type, we work around to use error_code to represent:

  1. The dangling probe.
  2. Ignore the weight of non-probe instruction

While merging the instructions' weight for the whole BB, it will filter out the error code. But If all instructions of the BB give error_code, the outside logic will mark it as a BB requiring the inference algorithm to infer its weight. This is different from the zero value which will be treated as a cold block.

Fix one place that if we can't find the FunctionSamples in the profile data which indicates the BB is cold, we choose to return zero.

Also refine the comments.

Diff Detail

Event Timeline

wlei created this revision.May 6 2021, 10:16 AM
wlei requested review of this revision.May 6 2021, 10:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2021, 10:16 AM
wlei updated this revision to Diff 343452.May 6 2021, 10:22 AM

update comments

wlei edited the summary of this revision. (Show Details)May 6 2021, 10:35 AM
wlei added reviewers: hoy, wenlei, wmi, davidxl.
hoy accepted this revision.May 7 2021, 2:13 PM

LGTM, thanks for the fix.

This revision is now accepted and ready to land.May 7 2021, 2:13 PM
wenlei added inline comments.May 7 2021, 4:15 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
572

Given it more thought, perhaps whether this is 0 or error_code is subject to something like profile-sample-accurate? I.e. with that flag, it means cold, otherwise it means unknown (error_code)?

The default behavior for getInstWeight the AutoFDO counterpart is returning error_code here. We don't have to do it now, but may be both AutoFDO and CSSPGO can check profile-sample-accurate here to decide between 0 or error_code? @wmi what do you think about AutoFDO?

hoy added inline comments.May 7 2021, 4:30 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
572

Good point. Was thinking about this. When it comes here, Inst should be mostly from a function that has a profile, like

  • If it is from a top-level inliner, the inliner should have a profile, otherwise it wouldn't be compiled.
  • If it is from an inlinee, the inlinee should have a profile, otherwise it wouldn't be inlined (see SampleProfileLoader::getInlineCandidate).
wmi added inline comments.May 7 2021, 8:56 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
572

Currently profile-sample-accurate is only used on function profile, i.e., if a function profile is not present, and if profile-sample-accurate is true, compiler will set the function entry count to be zero. It is a balance between best performance/smaller binary size and performance stableness. In google currently some targets caring about size more than performance uses it since it will generate smaller binary.

If we do the same for instruction level profile, it will make the target using the flag to be more subject to source drift issue. I am not sure whether it will decrease the usability of the flag -- some targets currently using it can get their expected balance between binary size and performance stableness, and the balance may be broken if we introduce more performance unstableness to it.

wenlei added inline comments.May 7 2021, 9:33 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
572

I was hoping that by being more stringent on profile and trust zero counts to be zero, we would be able to save size and at the same time improve performance when profile is accurate.

From your experience, does profile-sample-accurate often lead to worse performance along with smaller size? If so, that is mostly because of source drift, right? If profile is refreshed very often, or used in instr. PGO like two step builds, it should not lead to worse performance..

I think we can leave it as is, or perhaps experiment with an additional switch for profile-sample-block-accurate for block level counts.. in order to avoid affecting existing usage.

hoy added inline comments.May 7 2021, 11:52 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
572

Yeah, profile-sample-block-accurate may work well for non-probe based profile. Currently for probe-based profile, newly added blocks may cause the entire function profile obsolete due to mismatched cfg checksum.

wlei updated this revision to Diff 344622.May 11 2021, 5:54 PM
wlei edited the summary of this revision. (Show Details)

Update the comments according reviewers' discussion on profile-accurate. Currently, this version only work for probe-based code, it won't be affected by the source drift(see the comments)

wenlei accepted this revision.May 14 2021, 1:47 PM

lgtm, thanks.

llvm/lib/Transforms/IPO/SampleProfile.cpp
572

Fair point on checksum - if we get here for csspgo, we don't expect source drift.

This revision was landed with ongoing or failed builds.May 14 2021, 2:06 PM
This revision was automatically updated to reflect the committed changes.