When profile entry count is 0, it is prone to trigger "div 0" error during profile update in inlining. We have seen a tricky case that expose the error (the testcase is a simplified example to expose it). In the patch, when we need to set the profile entry count to 0, change it to 1. That is aligned with how we handle such case before.
Diff Detail
- Repository
- rL LLVM
Event Timeline
It is possible to for an entry with zero count -- as for instance in instrumentation PGO. Should it be fixed in place where the div by zero happens?
Oh, that is good to know. I think the problem only exists for SamplePGO because SamplePGO has prof meta data annotated on callsite instruction, and profile update in inlining needs to update that meta data proportionally according to entry count. Instrumentation PGO only use BFI and don't have to do that update.
I think it is possible to be fixed the error I am looking at in place where the div by zero happens. It is in Instruction::updateProfWeight when the function parameter T is 0. We can either change T to 1, or simply return. I slightly lean towards current approach because we may want to do it in a more consistent way -- currently adjusting 0 to 1 happens in multiple places in SamplePGO, even in profile generation tools). Another reason is Instruction::updateProfWeight is used by other components other than SamplePGO. Doing it in Instruction::updateProfWeight may silently suppress some floating point exception which may expose some other serious error, like the prof meta data is reset to 0 by memory overflow.
Can you clarify? IIRC, we never update the per-call site profile meta data with inline transformations.
Looked at the Instruction::updateProfWeight() --- the part that update branch_weights seems bogus -- there is need need to scale branch weight at all.
VP data needs to be scaled of course, but if 'S' or 'T' is zero, just zero them out.
Seems we did that all the time even before https://reviews.llvm.org/rL352001 (Since rL352001, we did the update even in sample profile loader pass, when we found cold inline instance was not inlined by sample profile loader). Here is the code snippet before the refactoring involved in rL352001.
298270 dehao /// Update the branch metadata for cloned call instructions. 298270 dehao static void updateCallProfile(Function *Callee, const ValueToValueMapTy &VMap, 322771 eraman const ProfileCount &CalleeEntryCount, 302597 eraman const Instruction *TheCall, 303574 tejohnson ProfileSummaryInfo *PSI, 303574 tejohnson BlockFrequencyInfo *CallerBFI) { 322771 eraman if (!CalleeEntryCount.hasValue() || CalleeEntryCount.isSynthetic() || 322771 eraman CalleeEntryCount.getCount() < 1) 298270 dehao return; 322771 eraman auto CallSiteCount = PSI ? PSI->getProfileCount(TheCall, CallerBFI) : None; 298270 dehao uint64_t CallCount = 298270 dehao std::min(CallSiteCount.hasValue() ? CallSiteCount.getValue() : 0, 322771 eraman CalleeEntryCount.getCount()); 298270 dehao 298270 dehao for (auto const &Entry : VMap) 298279 dblaikie if (isa<CallInst>(Entry.first)) 298279 dblaikie if (auto *CI = dyn_cast_or_null<CallInst>(Entry.second)) 322771 eraman CI->updateProfWeight(CallCount, CalleeEntryCount.getCount()); 298270 dehao for (BasicBlock &BB : *Callee) 298270 dehao // No need to update the callsite if it is pruned during inlining. 298270 dehao if (VMap.count(&BB)) 298270 dehao for (Instruction &I : BB) 298270 dehao if (CallInst *CI = dyn_cast<CallInst>(&I)) 322771 eraman CI->updateProfWeight(CalleeEntryCount.getCount() - CallCount, 322771 eraman CalleeEntryCount.getCount());
Discussed with David offline: SamplePGO borrow the branch_weight meta data to represent call count, so the update for it is needed. But scaling branch weights for branch instructions is unnecessary and will introduce more inaccuracy. We need some refactoring in this part. I will do that in a follow up patch.