Raw profile count values for each BB are not kept after profile annotation.
We record function entry count and branch weights and use them to compute the count when needed.
This mechanism works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in BFI) .
This patch add functionality to compare BFI counts with real profile counts right after reading the profile.
It also fixes function entry count to make the BFI count close to real profile counts.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
260 | It is probably more useful to have an option controlling checking hot BB's only -- check with either RawCount or BFI count is hot (can catch the case when a cold block becomes 'hot' with BFI info). | |
1649 | what is the rationale and use of this fixup? | |
1702 | Add comment of make variable name NonC0BBNum more obvious for its meaning. | |
1726 | Variable not used. | |
llvm/test/Transforms/PGOProfile/fix_bfi.ll | ||
1 | Add an comment on the what the test case is testing (especially the fix check part). |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
260 | Yes. Checking hot BFI counts will catch some abnormal cases. I will probably keep this option and add another option that checking hot BB as you suggested. | |
1649 | There is no perfect way to fix this: when the information is lost, we could not recover. This is just one way to fix it so that the total count close. | |
1702 | will do. | |
1726 | will put this variable under Debug macro. | |
llvm/test/Transforms/PGOProfile/fix_bfi.ll | ||
1 | will do. |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
1654 | assert here that function entry count is already set. | |
1658 | In what situation can this happen? | |
1661 | how can this be possible? | |
1669 | maybe early return? | |
1672 | negate the logic and early return? | |
1673 | assert SumBFCount is not zero | |
1675 | early return? | |
1678 | Is this handled by the next statements? | |
1692 | Is this needed? |
I'm reviving this patch because this fixes one of our internal bugs.
Before the fix, The BFI counters are way off the raw profile counter:
` BB : Count=96615607 BFI Count=16
BB : Count=96615607 BFI Count=16 BB : Count=9874441 BFI Count=2 BB : Count=9849389 BFI Count=2 BB : Count=9928400 BFI Count=2 BB : Count=96694618 BFI Count=16 BB : Count=30 BFI Count=0 BB : Count=96694588 BFI Count=16 BB : Count=93506121 BFI Count=16 BB : Count=99833531 BFI Count=16 BB : Count=99833531 BFI Count=16 BB : Count=6933665 BFI Count=1 BB : Count=6933665 BFI Count=1 BB : Count=4641130 BFI Count=1 BB : Count=4634507 BFI Count=1 BB : Count=4634507 BFI Count=1 BB : Count=3439260 BFI Count=1 BB : Count=4634090 BFI Count=1 BB : Count=4624009 BFI Count=1 BB : Count=4634090 BFI Count=1 BB : Count=2299158 BFI Count=0 BB : Count=2230547 BFI Count=0 BB : Count=2234047 BFI Count=0 BB : Count=3500 BFI Count=0 BB : Count=2230547 BFI Count=0 BB : Count=2225955 BFI Count=0 BB : Count=18824 BFI Count=0 BB : Count=2230547 BFI Count=0 BB : Count=2230547 BFI Count=0 BB : Count=2230585 BFI Count=0 BB : Count=2229242 BFI Count=0 BB : Count=38 BFI Count=0 BB : Count=2230547 BFI Count=0 BB : Count=68611 BFI Count=0 BB : Count=103021581 BFI Count=16 BB : Count=103021611 BFI Count=16 BB : Count=6406005 BFI Count=1
`
With this patch, we can get:
`BB : Count=1 BFI Count=6145756 BB : Count=96615607 BFI Count=98836277 BB : Count=96615607 BFI Count=98836277 BB : Count=9874441 BFI Count=10101401 BB : Count=9849389 BFI Count=10075773 BB : Count=9928400 BFI Count=10075773 BB : Count=96694618 BFI Count=98836277 BB : Count=30 BFI Count=31 BB : Count=96694588 BFI Count=98836246 BB : Count=93506121 BFI Count=95577159 BB : Count=99833531 BFI Count=95577159 BB : Count=99833531 BFI Count=95577159 BB : Count=6933665 BFI Count=6844610 BB : Count=6933665 BFI Count=6844610 BB : Count=4641130 BFI Count=4581520 BB : Count=4634507 BFI Count=4574982 BB : Count=4634507 BFI Count=4574982 BB : Count=3439260 BFI Count=3395087 BB : Count=4634090 BFI Count=4574982 BB : Count=4624009 BFI Count=4565030 BB : Count=4634090 BFI Count=4574982 BB : Count=2299158 BFI Count=2269628 BB : Count=2230547 BFI Count=2201898 BB : Count=2234047 BFI Count=2205353 BB : Count=3500 BFI Count=3455 BB : Count=2230547 BFI Count=2201898 BB : Count=2225955 BFI Count=2197365 BB : Count=18824 BFI Count=18582 BB : Count=2230547 BFI Count=2201898 BB : Count=2230547 BFI Count=2201898 BB : Count=2230585 BFI Count=2201936 BB : Count=2229242 BFI Count=2200610 BB : Count=2230547 BFI Count=2201898 BB : Count=68611 BFI Count=67730 BB : Count=103021581 BFI Count=98836246 BB : Count=103021611 BFI Count=98836277 BB : Count=6406005 BFI Count=6145756
`
I will post the updated patch shortly.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
1658 | I changed to findBBInfo. | |
1661 | If the entryCount is 0, we don't have Profile Count for BB. But now, 0 entryCount is not possible after we populatedCounters (if the maxcount is nonzero). This is not needed. I will remove. | |
1675 | Done | |
1678 | Changed the code. | |
1692 | Probably not. I removed it. |
Updated the patch that addressed David's review comments.
One noticeable change was to .add a new option "pgo-vefify-hot-bfi" which will
output a message when one of the following happens:
(1) a hot rawCount becomes non-hot in BFI
(2) a cold rawCount becomes hot in BFI
-Rong
We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs
We're battling this exact problem from sample PGO side. And we have internal patches for comparing block counts from BFI with ground truth. Perhaps these all can be refactored to be shared between Instr PGO and sample PGO. +@hoy
How does fixing up entry count alone help mitigate this problem?
Interesting to see the BFI-computed execution count diverse so much from real counts, even for PGO. I'm wondering if this is due to optimizations in the training build. For the example above, the BFI-computed block frequencies (ratio, not execution count) seem not diverging too much for each block, compared to the real profile counts.
Split the verification to another patch.
verification patch:
https://reviews.llvm.org/D91813
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
1681 | Perhaps take the max of the original func entry count and the new entry count? |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
1681 | Not sure if want a max here. Here a value of 1 makes the sum of raw counters and sum of BFI counters close to each other, better than original func entry count. Using max will never reduce the entry count -- I don't think that is what we want. |
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp | ||
---|---|---|
1681 | I thought the intention of the patch is to correct the 'guessed' entry count which is usually '1'. If the original entry count is not 1, it is usually a value we can trust. Sometimes BFI can create insanely large counts which makes the scale really small. It ends up leading to new entry entry count to become 1 (from non-1 value). Is that the intended behavior? |
It is probably more useful to have an option controlling checking hot BB's only -- check with either RawCount or BFI count is hot (can catch the case when a cold block becomes 'hot' with BFI info).