For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate.
Details
Diff Detail
- Build Status
Buildable 8709 Build 8709: arc lint + arc unit
Event Timeline
If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.
We only want to do this on callsites, as we still rely on BFI to get other block frequencies.
So, either the BFI based count is usable in sample PGO or not. If it is the latter, I think it should not be used at all. If it is usable (but not very accurate), then there is no harm in using in call.
Do we use the profile count of a non-call instruction anywhere for sample PGO? If we don't, may be assert in BFI->getProfileCount when we use sample PGO? Or alternatively, annotate all call instructions so that you can assert that every call has a metadata (which could be 0)?
It's usable. But inaccurate.
Do we use the profile count of a non-call instruction anywhere for sample PGO? If we don't, may be assert in BFI->getProfileCount when we use sample PGO?
Yes, there is already an assertion in ProfileSummaryInfo::getProfileCount to ensure Inst is a call/invoke instruction :)
Or alternatively, annotate all call instructions so that you can assert that every call has a metadata (which could be 0)?
In sample PGO, if the call count is 0, we don't set prof metadata to save space, as most of the call instructions will have 0 count.
I still feel this behavior is confusing, but I'll let David and Teresa to weigh in. Do you have any cases in real code where a block containing a call with no metadata has a high profile count? Is this masking a bug in some other code (BFI or sample profile loader)?
LGTM based on offline discussion with Dehao on why this could happen. Please add comments to the effect that if any callsite has a non-zero count, it gets a metadata and conversely any callsite without metadata has no count.
updated the implementation to include a flag to control whether to return 0 or None if the profile is missing for the entire function. PTAL.
Refactor the code after talking to Easwaran (Thanks!): Move the logic to IsCallsiteCold.