Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 taken, i32 1 exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
732 | Would be better to separate the functional (+1) change from the variable name changes and divideNearest(). The latter is an NFC which is independent of and hides the former. If the functional change goes first, it can e.g. simply do + uint64_t EstimatedBackedgeTakenCount; if (LatchBR->getSuccessor(0) == L->getHeader()) - return (TrueVal + (FalseVal / 2)) / FalseVal; + EstimatedBackedgeTakenCount = (TrueVal + (FalseVal / 2)) / FalseVal; else - return (FalseVal + (TrueVal / 2)) / TrueVal; + EstimatedBackedgeTakenCount = (FalseVal + (TrueVal / 2)) / TrueVal; + return EstimatedBackedgeTakenCount + 1; Can alternatively renamed original function to getEstimatedBackedgeTakenCount(), and introduced a new getLoopEstimatedTripCount() which returns the former plus one, if it's useful(?) | |
llvm/unittests/Transforms/Utils/CMakeLists.txt | ||
18 ↗ | (On Diff #235561) | already there? |
llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp | ||
1 ↗ | (On Diff #235561) | "This file was added." - this file already exists, right? |
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
726–727 | If !LatchCycleWeight (if !BackedgeTakenWeight) it should return 1 instead of 0. I.e., should always be equal to "getEstimateBackedgeTakenCount()" plus 1. If !LatchExitWeight, perhaps better return None instead of 0? This implies the loop was never entered when gathering the profile, or loop was entered once and never exited(?); in either case estimating the trip count to be zero seems wrong. |
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
726–727 | Completely agree. |
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
731 | Maybe clearer to emphasize the +1 effect, noting that Trip count is one plus the estimated backedge taken count, and doing uint64_t BackedgeTakenCount = llvm::divideNearest(BackedgeTakenWeight, LatchExitWeight); return BackedgeTakenCount + 1; (cf. InnerLoopVectorizer::getOrCreateTripCount(Loop *L)) | |
llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll | ||
14 | It is indeed confusing at first sight, to see that the total number of estimated trip count is 2+5, given branch_weights of 6 and 1 (and thus estimated backedge taken count of 6/1). Perhaps worth clarifying in the above comment about "remained iterations". Should -unroll-peel-max-count=7 be updated? Note that loop peeling already updates the estimated trip count (from 7 to 5 above), by calling Transforms/Utils/LoopUnrollPeel.cpp's fixupBranchWeights(); potential reuse with D67905's setLoopEstimatedTripCount(). | |
llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp | ||
18 ↗ | (On Diff #236990) | These should all be in lex order. |
1 ↗ | (On Diff #235561) |
This change seems mostly unrelated to the +1 accuracy fix, right? Suggest to commit it first, testing the original (backedge taken) value, and then modify it along with the +1 patch, similar to above fixes of lit tests. |
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
731 | Sure, works for me. In fact, this +1 approach was used in original patch :-) | |
llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll | ||
14 |
Not sure I understand why it should be updated. Loop has 7 estimated iterations and we aim at peeling 2 iterations. Thus -unroll-peel-max-count=7 seems reasonable to me. | |
14 |
Even if I clarify here it won't help a lot since there are bunch of other similar places. Probably it's better to add explanation to getLoopEstimatedTripCount? | |
llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp | ||
18 ↗ | (On Diff #236990) | Will fix |
1 ↗ | (On Diff #235561) |
I believe you requested to add such test at some time. Anyway it's better to have such test than not :-). Will extract this change to a separate review. |
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
729–734 | "Trip count" >> "Estimated backedge taken count" | |
734 | Suffice to say that Estimated trip count is one plus estimated backedge taken count. | |
llvm/test/Transforms/LoopUnroll/peel-loop-conditions-pgo-1.ll | ||
14 | OK; just noting that original test estimated the trip count to be 6, and used -unroll-peel-max-count=7, so seemed reasonable to bump this threshold (to 8) given that the estimation bumped (to 7). | |
14 | OK; just suggesting to extend the existing "remained iterations" comment above, clarifying how many they are. | |
llvm/unittests/Transforms/Utils/LoopUtilsTest.cpp | ||
1 ↗ | (On Diff #235561) | Sorry, my original request was just to make sure the +1 change is covered by tests, as opposed to NFC patches; the above lit tests seem fine. |
This look good to me, with a last minor nit, thanks!
llvm/lib/Transforms/Utils/LoopUtils.cpp | ||
---|---|---|
730 | "the the edge exiting weight" >> "the weight of the edge exiting the loop" |
If !LatchCycleWeight (if !BackedgeTakenWeight) it should return 1 instead of 0. I.e., should always be equal to "getEstimateBackedgeTakenCount()" plus 1.
If !LatchExitWeight, perhaps better return None instead of 0? This implies the loop was never entered when gathering the profile, or loop was entered once and never exited(?); in either case estimating the trip count to be zero seems wrong.