Page MenuHomePhabricator

[PGO] Use sum of count values to fix func entry count and add a check to verify BFI counts
AcceptedPublic

Authored by xur on May 3 2019, 2:30 PM.

Details

Reviewers
davidxl
Summary

Raw profile count values for each BB are not kept after profile annotation.
We record function entry count and branch weights and use them to compute the count when needed.
This mechanism works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in BFI) .
This patch add functionality to compare BFI counts with real profile counts right after reading the profile.
It also fixes function entry count to make the BFI count close to real profile counts.

Diff Detail

Event Timeline

xur created this revision.May 3 2019, 2:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2019, 2:30 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
davidxl added inline comments.May 20 2019, 1:47 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
260

It is probably more useful to have an option controlling checking hot BB's only -- check with either RawCount or BFI count is hot (can catch the case when a cold block becomes 'hot' with BFI info).

1625

what is the rationale and use of this fixup?

1678

Add comment of make variable name NonC0BBNum more obvious for its meaning.

1702

Variable not used.

llvm/test/Transforms/PGOProfile/fix_bfi.ll
2

Add an comment on the what the test case is testing (especially the fix check part).

xur marked 5 inline comments as done.May 20 2019, 3:59 PM
xur added inline comments.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
260

Yes. Checking hot BFI counts will catch some abnormal cases.
The reason I used a cutoff instead of default hot threshold is that there could be too many hot BB to check for some programs.
Using a high cutoff could make the output much cleaner. And the user can specify the hot threshold from detailed summary.

I will probably keep this option and add another option that checking hot BB as you suggested.

1625

There is no perfect way to fix this: when the information is lost, we could not recover. This is just one way to fix it so that the total count close.

1678

will do.

1702

will put this variable under Debug macro.

llvm/test/Transforms/PGOProfile/fix_bfi.ll
2

will do.

davidxl added inline comments.May 21 2019, 4:34 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1630

assert here that function entry count is already set.

1634

In what situation can this happen?

1637

how can this be possible?

1645

maybe early return?

1648

negate the logic and early return?

1649

assert SumBFCount is not zero

1651

early return?

1654

Is this handled by the next statements?

1668

Is this needed?

xur marked 10 inline comments as done.Wed, Nov 18, 5:35 PM

I'm reviving this patch because this fixes one of our internal bugs.

Before the fix, The BFI counters are way off the raw profile counter:

` BB : Count=96615607 BFI Count=16

BB : Count=96615607  BFI Count=16
BB : Count=9874441  BFI Count=2
BB : Count=9849389  BFI Count=2
BB : Count=9928400  BFI Count=2
BB : Count=96694618  BFI Count=16
BB : Count=30  BFI Count=0
BB : Count=96694588  BFI Count=16
BB : Count=93506121  BFI Count=16
BB : Count=99833531  BFI Count=16
BB : Count=99833531  BFI Count=16
BB : Count=6933665  BFI Count=1
BB : Count=6933665  BFI Count=1
BB : Count=4641130  BFI Count=1
BB : Count=4634507  BFI Count=1
BB : Count=4634507  BFI Count=1
BB : Count=3439260  BFI Count=1
BB : Count=4634090  BFI Count=1
BB : Count=4624009  BFI Count=1
BB : Count=4634090  BFI Count=1
BB : Count=2299158  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2234047  BFI Count=0
BB : Count=3500  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2225955  BFI Count=0
BB : Count=18824  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2230585  BFI Count=0
BB : Count=2229242  BFI Count=0
BB : Count=38  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=68611  BFI Count=0
BB : Count=103021581  BFI Count=16
BB : Count=103021611  BFI Count=16
BB : Count=6406005  BFI Count=1

`

With this patch, we can get:

`BB : Count=1  BFI Count=6145756
BB : Count=96615607  BFI Count=98836277
BB : Count=96615607  BFI Count=98836277
BB : Count=9874441  BFI Count=10101401
BB : Count=9849389  BFI Count=10075773
BB : Count=9928400  BFI Count=10075773
BB : Count=96694618  BFI Count=98836277
BB : Count=30  BFI Count=31
BB : Count=96694588  BFI Count=98836246
BB : Count=93506121  BFI Count=95577159
BB : Count=99833531  BFI Count=95577159
BB : Count=99833531  BFI Count=95577159
BB : Count=6933665  BFI Count=6844610
BB : Count=6933665  BFI Count=6844610
BB : Count=4641130  BFI Count=4581520
BB : Count=4634507  BFI Count=4574982
BB : Count=4634507  BFI Count=4574982
BB : Count=3439260  BFI Count=3395087
BB : Count=4634090  BFI Count=4574982
BB : Count=4624009  BFI Count=4565030
BB : Count=4634090  BFI Count=4574982
BB : Count=2299158  BFI Count=2269628
BB : Count=2230547  BFI Count=2201898
BB : Count=2234047  BFI Count=2205353
BB : Count=3500  BFI Count=3455
BB : Count=2230547  BFI Count=2201898
BB : Count=2225955  BFI Count=2197365
BB : Count=18824  BFI Count=18582
BB : Count=2230547  BFI Count=2201898
BB : Count=2230547  BFI Count=2201898
BB : Count=2230585  BFI Count=2201936
BB : Count=2229242  BFI Count=2200610
BB : Count=2230547  BFI Count=2201898
BB : Count=68611  BFI Count=67730
BB : Count=103021581  BFI Count=98836246
BB : Count=103021611  BFI Count=98836277
BB : Count=6406005  BFI Count=6145756

`

I will post the updated patch shortly.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1634

I changed to findBBInfo.
Usually this won't happen. But this can happen if there is unreachable BB (like when build at -O0).

1637

If the entryCount is 0, we don't have Profile Count for BB.
This could happen if we did not fix the entryCount.

But now, 0 entryCount is not possible after we populatedCounters (if the maxcount is nonzero). This is not needed. I will remove.

1651

Done

1654

Changed the code.

1668

Probably not. I removed it.

xur updated this revision to Diff 306271.Wed, Nov 18, 5:42 PM
xur marked 4 inline comments as done.

Updated the patch that addressed David's review comments.
One noticeable change was to .add a new option "pgo-vefify-hot-bfi" which will
output a message when one of the following happens:
(1) a hot rawCount becomes non-hot in BFI
(2) a cold rawCount becomes hot in BFI

-Rong

Can you split this patch into two? one for verification, the other for fixing?

wenlei added a subscriber: hoy.Wed, Nov 18, 5:48 PM

We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs

We're battling this exact problem from sample PGO side. And we have internal patches for comparing block counts from BFI with ground truth. Perhaps these all can be refactored to be shared between Instr PGO and sample PGO. +@hoy

How does fixing up entry count alone help mitigate this problem?

hoy added a comment.Wed, Nov 18, 6:13 PM
In D61540#2404348, @xur wrote:

I'm reviving this patch because this fixes one of our internal bugs.

Before the fix, The BFI counters are way off the raw profile counter:

` BB : Count=96615607 BFI Count=16

BB : Count=96615607  BFI Count=16
BB : Count=9874441  BFI Count=2
BB : Count=9849389  BFI Count=2
BB : Count=9928400  BFI Count=2
BB : Count=96694618  BFI Count=16
BB : Count=30  BFI Count=0
BB : Count=96694588  BFI Count=16
BB : Count=93506121  BFI Count=16
BB : Count=99833531  BFI Count=16
BB : Count=99833531  BFI Count=16
BB : Count=6933665  BFI Count=1
BB : Count=6933665  BFI Count=1
BB : Count=4641130  BFI Count=1
BB : Count=4634507  BFI Count=1
BB : Count=4634507  BFI Count=1
BB : Count=3439260  BFI Count=1
BB : Count=4634090  BFI Count=1
BB : Count=4624009  BFI Count=1
BB : Count=4634090  BFI Count=1
BB : Count=2299158  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2234047  BFI Count=0
BB : Count=3500  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2225955  BFI Count=0
BB : Count=18824  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2230585  BFI Count=0
BB : Count=2229242  BFI Count=0
BB : Count=38  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=68611  BFI Count=0
BB : Count=103021581  BFI Count=16
BB : Count=103021611  BFI Count=16
BB : Count=6406005  BFI Count=1

`

With this patch, we can get:

`BB : Count=1  BFI Count=6145756
BB : Count=96615607  BFI Count=98836277
BB : Count=96615607  BFI Count=98836277
BB : Count=9874441  BFI Count=10101401
BB : Count=9849389  BFI Count=10075773
BB : Count=9928400  BFI Count=10075773
BB : Count=96694618  BFI Count=98836277
BB : Count=30  BFI Count=31
BB : Count=96694588  BFI Count=98836246
BB : Count=93506121  BFI Count=95577159
BB : Count=99833531  BFI Count=95577159
BB : Count=99833531  BFI Count=95577159
BB : Count=6933665  BFI Count=6844610
BB : Count=6933665  BFI Count=6844610
BB : Count=4641130  BFI Count=4581520
BB : Count=4634507  BFI Count=4574982
BB : Count=4634507  BFI Count=4574982
BB : Count=3439260  BFI Count=3395087
BB : Count=4634090  BFI Count=4574982
BB : Count=4624009  BFI Count=4565030
BB : Count=4634090  BFI Count=4574982
BB : Count=2299158  BFI Count=2269628
BB : Count=2230547  BFI Count=2201898
BB : Count=2234047  BFI Count=2205353
BB : Count=3500  BFI Count=3455
BB : Count=2230547  BFI Count=2201898
BB : Count=2225955  BFI Count=2197365
BB : Count=18824  BFI Count=18582
BB : Count=2230547  BFI Count=2201898
BB : Count=2230547  BFI Count=2201898
BB : Count=2230585  BFI Count=2201936
BB : Count=2229242  BFI Count=2200610
BB : Count=2230547  BFI Count=2201898
BB : Count=68611  BFI Count=67730
BB : Count=103021581  BFI Count=98836246
BB : Count=103021611  BFI Count=98836277
BB : Count=6406005  BFI Count=6145756

`

I will post the updated patch shortly.

Interesting to see the BFI-computed execution count diverse so much from real counts, even for PGO. I'm wondering if this is due to optimizations in the training build. For the example above, the BFI-computed block frequencies (ratio, not execution count) seem not diverging too much for each block, compared to the real profile counts.

xur updated this revision to Diff 306496.EditedThu, Nov 19, 11:55 AM

Split the verification to another patch.

verification patch:
https://reviews.llvm.org/D91813

davidxl added inline comments.Thu, Nov 19, 1:48 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1657

Perhaps take the max of the original func entry count and the new entry count?

xur added inline comments.Thu, Nov 19, 3:03 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1657

Not sure if want a max here. Here a value of 1 makes the sum of raw counters and sum of BFI counters close to each other, better than original func entry count.

Using max will never reduce the entry count -- I don't think that is what we want.

davidxl added inline comments.Thu, Nov 19, 4:24 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1657

I thought the intention of the patch is to correct the 'guessed' entry count which is usually '1'. If the original entry count is not 1, it is usually a value we can trust. Sometimes BFI can create insanely large counts which makes the scale really small. It ends up leading to new entry entry count to become 1 (from non-1 value). Is that the intended behavior?

davidxl accepted this revision.Thu, Nov 19, 5:23 PM

lgtm

This revision is now accepted and ready to land.Thu, Nov 19, 5:23 PM