Page MenuHomePhabricator

[SampleFDO] Never set profile entry count to 0
Needs ReviewPublic

Authored by wmi on Apr 19 2019, 9:10 AM.

Details

Reviewers
davidxl
eraman
Summary

When profile entry count is 0, it is prone to trigger "div 0" error during profile update in inlining. We have seen a tricky case that expose the error (the testcase is a simplified example to expose it). In the patch, when we need to set the profile entry count to 0, change it to 1. That is aligned with how we handle such case before.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Apr 19 2019, 9:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2019, 9:10 AM

It is possible to for an entry with zero count -- as for instance in instrumentation PGO. Should it be fixed in place where the div by zero happens?

wmi added a comment.Apr 19 2019, 9:46 AM

It is possible to for an entry with zero count -- as for instance in instrumentation PGO. Should it be fixed in place where the div by zero happens?

Oh, that is good to know. I think the problem only exists for SamplePGO because SamplePGO has prof meta data annotated on callsite instruction, and profile update in inlining needs to update that meta data proportionally according to entry count. Instrumentation PGO only use BFI and don't have to do that update.

I think it is possible to be fixed the error I am looking at in place where the div by zero happens. It is in Instruction::updateProfWeight when the function parameter T is 0. We can either change T to 1, or simply return. I slightly lean towards current approach because we may want to do it in a more consistent way -- currently adjusting 0 to 1 happens in multiple places in SamplePGO, even in profile generation tools). Another reason is Instruction::updateProfWeight is used by other components other than SamplePGO. Doing it in Instruction::updateProfWeight may silently suppress some floating point exception which may expose some other serious error, like the prof meta data is reset to 0 by memory overflow.

Can you clarify? IIRC, we never update the per-call site profile meta data with inline transformations.

Looked at the Instruction::updateProfWeight() --- the part that update branch_weights seems bogus -- there is need need to scale branch weight at all.

VP data needs to be scaled of course, but if 'S' or 'T' is zero, just zero them out.

wmi added a comment.Apr 19 2019, 10:19 AM

Seems we did that all the time even before https://reviews.llvm.org/rL352001 (Since rL352001, we did the update even in sample profile loader pass, when we found cold inline instance was not inlined by sample profile loader). Here is the code snippet before the refactoring involved in rL352001.

298270      dehao /// Update the branch metadata for cloned call instructions.
298270      dehao static void updateCallProfile(Function *Callee, const ValueToValueMapTy &VMap,
322771     eraman                               const ProfileCount &CalleeEntryCount,
302597     eraman                               const Instruction *TheCall,
303574  tejohnson                               ProfileSummaryInfo *PSI,
303574  tejohnson                               BlockFrequencyInfo *CallerBFI) {
322771     eraman   if (!CalleeEntryCount.hasValue() || CalleeEntryCount.isSynthetic() ||
322771     eraman       CalleeEntryCount.getCount() < 1)
298270      dehao     return;
322771     eraman   auto CallSiteCount = PSI ? PSI->getProfileCount(TheCall, CallerBFI) : None;
298270      dehao   uint64_t CallCount =
298270      dehao       std::min(CallSiteCount.hasValue() ? CallSiteCount.getValue() : 0,
322771     eraman                CalleeEntryCount.getCount());
298270      dehao 
298270      dehao   for (auto const &Entry : VMap)
298279   dblaikie     if (isa<CallInst>(Entry.first))
298279   dblaikie       if (auto *CI = dyn_cast_or_null<CallInst>(Entry.second))
322771     eraman         CI->updateProfWeight(CallCount, CalleeEntryCount.getCount());
298270      dehao   for (BasicBlock &BB : *Callee)
298270      dehao     // No need to update the callsite if it is pruned during inlining.
298270      dehao     if (VMap.count(&BB))
298270      dehao       for (Instruction &I : BB)
298270      dehao         if (CallInst *CI = dyn_cast<CallInst>(&I))
322771     eraman           CI->updateProfWeight(CalleeEntryCount.getCount() - CallCount,
322771     eraman                                CalleeEntryCount.getCount());
wmi added a comment.Apr 19 2019, 10:35 AM

Looked at the Instruction::updateProfWeight() --- the part that update branch_weights seems bogus -- there is need need to scale branch weight at all.

VP data needs to be scaled of course, but if 'S' or 'T' is zero, just zero them out.

Discussed with David offline: SamplePGO borrow the branch_weight meta data to represent call count, so the update for it is needed. But scaling branch weights for branch instructions is unnecessary and will introduce more inaccuracy. We need some refactoring in this part. I will do that in a follow up patch.