This is an archive of the discontinued LLVM Phabricator instance.

Do not want to use BFI to get profile count for sample pgo
ClosedPublic

Authored by danielcdh on Jul 28 2017, 3:25 PM.

Details

Summary

For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate.

Event Timeline

danielcdh created this revision.Jul 28 2017, 3:25 PM
eraman edited edge metadata.Jul 28 2017, 4:09 PM

If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.

If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.

We only want to do this on callsites, as we still rely on BFI to get other block frequencies.

If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.

We only want to do this on callsites, as we still rely on BFI to get other block frequencies.

So, either the BFI based count is usable in sample PGO or not. If it is the latter, I think it should not be used at all. If it is usable (but not very accurate), then there is no harm in using in call.

Do we use the profile count of a non-call instruction anywhere for sample PGO? If we don't, may be assert in BFI->getProfileCount when we use sample PGO? Or alternatively, annotate all call instructions so that you can assert that every call has a metadata (which could be 0)?

If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.

We only want to do this on callsites, as we still rely on BFI to get other block frequencies.

So, either the BFI based count is usable in sample PGO or not. If it is the latter, I think it should not be used at all. If it is usable (but not very accurate), then there is no harm in using in call.

It's usable. But inaccurate.

Do we use the profile count of a non-call instruction anywhere for sample PGO? If we don't, may be assert in BFI->getProfileCount when we use sample PGO?

Yes, there is already an assertion in ProfileSummaryInfo::getProfileCount to ensure Inst is a call/invoke instruction :)

Or alternatively, annotate all call instructions so that you can assert that every call has a metadata (which could be 0)?

In sample PGO, if the call count is 0, we don't set prof metadata to save space, as most of the call instructions will have 0 count.

I still feel this behavior is confusing, but I'll let David and Teresa to weigh in. Do you have any cases in real code where a block containing a call with no metadata has a high profile count? Is this masking a bug in some other code (BFI or sample profile loader)?

eraman accepted this revision.Aug 2 2017, 2:08 PM

LGTM based on offline discussion with Dehao on why this could happen. Please add comments to the effect that if any callsite has a non-zero count, it gets a metadata and conversely any callsite without metadata has no count.

This revision is now accepted and ready to land.Aug 2 2017, 2:08 PM
danielcdh updated this revision to Diff 109432.Aug 2 2017, 2:20 PM

add comment

danielcdh updated this revision to Diff 109471.Aug 2 2017, 6:07 PM

updated the implementation to include a flag to control whether to return 0 or None if the profile is missing for the entire function. PTAL.

danielcdh updated this revision to Diff 109474.Aug 2 2017, 6:36 PM

Refactor the code after talking to Easwaran (Thanks!): Move the logic to IsCallsiteCold.

danielcdh updated this revision to Diff 109475.Aug 2 2017, 7:04 PM

update test.

danielcdh closed this revision.Aug 3 2017, 10:12 AM