This is an archive of the discontinued LLVM Phabricator instance.

Differential D61540

[PGO] Use sum of count values to fix func entry count and add a check to verify BFI counts
ClosedPublic

Authored by xur on May 3 2019, 2:30 PM.

Details

Reviewers

davidxl

Commits

rG0abd744597ee: [PGO] Use the sum of profile counts to fix the function entry count

Summary

Raw profile count values for each BB are not kept after profile annotation.
We record the function entry count and branch weights and use them to compute the count when needed.
This mechanism works well in a perfect world, but often breaks in real programs
(because of limited number precision, inconsistent profiles, or bugs in BFI).
This patch adds functionality to compare BFI counts with real profile counts right after reading the profile.
It also fixes the function entry count to bring the BFI counts close to the real profile counts.
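
A minimal sketch of the fix-up idea described in this summary, assuming plain doubles instead of the APFloat sums used in the patch. The helper name `scaleEntryCount` and its vector parameters are hypothetical and exist only for illustration; unlike the patch, the sketch returns early (rather than asserting) when the BFI sum is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the patch's code: rescale a function entry count
// by the ratio (sum of raw profile counts) / (sum of BFI-derived counts).
uint64_t scaleEntryCount(uint64_t EntryCount,
                         const std::vector<uint64_t> &RawCounts,
                         const std::vector<uint64_t> &BFICounts) {
  assert(RawCounts.size() == BFICounts.size() && "one pair per basic block");
  double SumRaw = 0.0, SumBFI = 0.0;
  for (std::size_t I = 0; I < RawCounts.size(); ++I) {
    SumRaw += static_cast<double>(RawCounts[I]);
    SumBFI += static_cast<double>(BFICounts[I]);
  }
  // Nothing to scale from, or nothing to scale against.
  if (SumRaw == 0.0 || SumBFI == 0.0)
    return EntryCount;
  double Scale = SumRaw / SumBFI;
  // Ignore adjustments smaller than 0.1%.
  if (Scale > 0.999 && Scale < 1.001)
    return EntryCount;
  // Round to nearest and never let the entry count drop to zero.
  uint64_t NewCount = static_cast<uint64_t>(0.5 + EntryCount * Scale);
  return NewCount == 0 ? 1 : NewCount;
}
```

With made-up inputs of entry count 1, raw counts {1000, 500}, and BFI counts {10, 5}, the scale is 100 and the sketch returns 100.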

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

xur created this revision. May 3 2019, 2:30 PM

Herald added a project: Restricted Project. May 3 2019, 2:30 PM

Herald added a subscriber: hiraditya.

davidxl added inline comments. May 20 2019, 1:47 PM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
260	It is probably more useful to have an option that restricts checking to hot BBs only, i.e., check when either the RawCount or the BFI count is hot (this can catch the case where a cold block becomes 'hot' with BFI info).
1649	What is the rationale for and use of this fixup?
1702	Add a comment or make the variable name NonC0BBNum more obvious about its meaning.
1726	Variable not used.
llvm/test/Transforms/PGOProfile/fix_bfi.ll
1	Add a comment on what the test case is testing (especially the fix check part).

xur marked 5 inline comments as done. May 20 2019, 3:59 PM

xur added inline comments.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
260	Yes. Checking hot BFI counts will catch some abnormal cases. The reason I used a cutoff instead of the default hot threshold is that there could be too many hot BBs to check for some programs. Using a high cutoff could make the output much cleaner. And the user can specify the hot threshold from the detailed summary. I will probably keep this option and add another option that checks hot BBs as you suggested.
1649	There is no perfect way to fix this: when the information is lost, we cannot recover it. This is just one way to fix it so that the total counts are close.
1702	will do.
1726	will put this variable under the Debug macro.
llvm/test/Transforms/PGOProfile/fix_bfi.ll
1	will do.

davidxl added inline comments. May 21 2019, 4:34 PM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1654	assert here that function entry count is already set.
1658	In what situation can this happen?
1661	how can this be possible?
1669	maybe early return?
1672	negate the logic and early return?
1673	assert SumBFCount is not zero
1675	early return?
1678	Is this handled by the next statements?
1692	Is this needed?

I'm reviving this patch because this fixes one of our internal bugs.

Before the fix, the BFI counts are way off from the raw profile counts:

BB : Count=96615607  BFI Count=16
BB : Count=96615607  BFI Count=16
BB : Count=9874441  BFI Count=2
BB : Count=9849389  BFI Count=2
BB : Count=9928400  BFI Count=2
BB : Count=96694618  BFI Count=16
BB : Count=30  BFI Count=0
BB : Count=96694588  BFI Count=16
BB : Count=93506121  BFI Count=16
BB : Count=99833531  BFI Count=16
BB : Count=99833531  BFI Count=16
BB : Count=6933665  BFI Count=1
BB : Count=6933665  BFI Count=1
BB : Count=4641130  BFI Count=1
BB : Count=4634507  BFI Count=1
BB : Count=4634507  BFI Count=1
BB : Count=3439260  BFI Count=1
BB : Count=4634090  BFI Count=1
BB : Count=4624009  BFI Count=1
BB : Count=4634090  BFI Count=1
BB : Count=2299158  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2234047  BFI Count=0
BB : Count=3500  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2225955  BFI Count=0
BB : Count=18824  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=2230585  BFI Count=0
BB : Count=2229242  BFI Count=0
BB : Count=38  BFI Count=0
BB : Count=2230547  BFI Count=0
BB : Count=68611  BFI Count=0
BB : Count=103021581  BFI Count=16
BB : Count=103021611  BFI Count=16
BB : Count=6406005  BFI Count=1

With this patch, we can get:

BB : Count=1  BFI Count=6145756
BB : Count=96615607  BFI Count=98836277
BB : Count=96615607  BFI Count=98836277
BB : Count=9874441  BFI Count=10101401
BB : Count=9849389  BFI Count=10075773
BB : Count=9928400  BFI Count=10075773
BB : Count=96694618  BFI Count=98836277
BB : Count=30  BFI Count=31
BB : Count=96694588  BFI Count=98836246
BB : Count=93506121  BFI Count=95577159
BB : Count=99833531  BFI Count=95577159
BB : Count=99833531  BFI Count=95577159
BB : Count=6933665  BFI Count=6844610
BB : Count=6933665  BFI Count=6844610
BB : Count=4641130  BFI Count=4581520
BB : Count=4634507  BFI Count=4574982
BB : Count=4634507  BFI Count=4574982
BB : Count=3439260  BFI Count=3395087
BB : Count=4634090  BFI Count=4574982
BB : Count=4624009  BFI Count=4565030
BB : Count=4634090  BFI Count=4574982
BB : Count=2299158  BFI Count=2269628
BB : Count=2230547  BFI Count=2201898
BB : Count=2234047  BFI Count=2205353
BB : Count=3500  BFI Count=3455
BB : Count=2230547  BFI Count=2201898
BB : Count=2225955  BFI Count=2197365
BB : Count=18824  BFI Count=18582
BB : Count=2230547  BFI Count=2201898
BB : Count=2230547  BFI Count=2201898
BB : Count=2230585  BFI Count=2201936
BB : Count=2229242  BFI Count=2200610
BB : Count=2230547  BFI Count=2201898
BB : Count=68611  BFI Count=67730
BB : Count=103021581  BFI Count=98836246
BB : Count=103021611  BFI Count=98836277
BB : Count=6406005  BFI Count=6145756

I will post the updated patch shortly.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1658	I changed to findBBInfo. Usually this won't happen. But this can happen if there is an unreachable BB (e.g., when building at -O0).
1661	If the entryCount is 0, we don't have a profile count for the BB. This could happen if we did not fix the entryCount. But now, a 0 entryCount is not possible after the counters are populated (if the max count is nonzero). This is not needed. I will remove it.
1675	Done
1678	Changed the code.
1692	Probably not. I removed it.

Herald added a subscriber: wenlei. Nov 18 2020, 5:35 PM

Updated the patch to address David's review comments.
One noticeable change was to add a new option "pgo-verify-hot-bfi", which will
output a message when one of the following happens:
(1) a hot rawCount becomes non-hot in BFI
(2) a cold rawCount becomes hot in BFI

-Rong
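
The two cases listed in the update above can be sketched as a small classifier. This is an illustration only: `classifyHotMismatch` is a hypothetical name, and the thresholds are passed in as plain integers rather than taken from the profile summary.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper mirroring the two reported cases.
// Returns an empty string when neither case applies.
std::string classifyHotMismatch(uint64_t RawCount, uint64_t BFICount,
                                uint64_t HotThreshold,
                                uint64_t ColdThreshold) {
  bool RawIsHot = RawCount >= HotThreshold;
  bool BFIIsHot = BFICount >= HotThreshold;
  bool RawIsCold = RawCount <= ColdThreshold;
  if (RawIsHot && !BFIIsHot)
    return "raw-Hot to BFI-nonHot"; // case (1)
  if (RawIsCold && BFIIsHot)
    return "raw-Cold to BFI-Hot";   // case (2)
  return "";
}
```

A block that is neither hot nor cold in the raw profile is never reported by this scheme, even if BFI makes it hot.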

Can you split this patch into two? one for verification, the other for fixing?

We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs

We're battling this exact problem from the sample PGO side. And we have internal patches for comparing block counts from BFI with ground truth. Perhaps these can all be refactored to be shared between Instr PGO and sample PGO. +@hoy

How does fixing up entry count alone help mitigate this problem?

Interesting to see the BFI-computed execution counts diverge so much from the real counts, even for PGO. I'm wondering if this is due to optimizations in the training build. For the example above, the BFI-computed block frequencies (ratios, not execution counts) do not seem to diverge much for each block, compared to the real profile counts.
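
To illustrate why the ratios can look reasonable while the absolute counts are far off, here is a minimal sketch, assuming a block's estimated count is the entry count scaled by the block's frequency relative to the entry block. `estimateBlockCount` is a hypothetical name and the arithmetic uses doubles for brevity; it is not LLVM's implementation.

```cpp
#include <cstdint>

// Hypothetical sketch (not LLVM's implementation): estimate a block's count
// as EntryCount * BlockFreq / EntryFreq, rounded to the nearest integer.
uint64_t estimateBlockCount(uint64_t EntryCount, uint64_t BlockFreq,
                            uint64_t EntryFreq) {
  if (EntryFreq == 0)
    return 0; // No entry frequency to scale against.
  double Scaled = static_cast<double>(EntryCount) *
                  static_cast<double>(BlockFreq) /
                  static_cast<double>(EntryFreq);
  return static_cast<uint64_t>(Scaled + 0.5);
}
```

With an entry count of 1 and a block assumed to be 16 times as frequent as the entry, the estimate is 16. With the same relative frequency and an entry count of 6145756, it is 98332096. The relative frequencies are unchanged; only the scale moves.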

Split the verification to another patch.

verification patch:
https://reviews.llvm.org/D91813

xur mentioned this in D91813: [PGO] verify BFI counts after loading profile data. Nov 19 2020, 11:57 AM

davidxl added inline comments. Nov 19 2020, 1:48 PM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1681	Perhaps take the max of the original func entry count and the new entry count?

xur added inline comments. Nov 19 2020, 3:03 PM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1681	Not sure if we want a max here. Here a value of 1 makes the sum of raw counters and the sum of BFI counters close to each other, better than the original func entry count. Using max will never reduce the entry count -- I don't think that is what we want.

davidxl added inline comments. Nov 19 2020, 4:24 PM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1681	I thought the intention of the patch was to correct the 'guessed' entry count, which is usually '1'. If the original entry count is not 1, it is usually a value we can trust. Sometimes BFI can create insanely large counts, which makes the scale really small. This ends up causing the new entry count to become 1 (from a non-1 value). Is that the intended behavior?
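
The two policies discussed in this exchange can be contrasted with a small sketch. Both helper names are hypothetical; `Scale` stands for the ratio of the raw-count sum to the BFI-count sum.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: scale the entry count, rounding to nearest and
// clamping to at least 1.
uint64_t scaledEntryCount(uint64_t OrigCount, double Scale) {
  uint64_t NewCount = static_cast<uint64_t>(0.5 + OrigCount * Scale);
  return NewCount == 0 ? 1 : NewCount;
}

// Hypothetical helper: the suggested alternative, which never lowers the
// original entry count.
uint64_t maxEntryCount(uint64_t OrigCount, double Scale) {
  return std::max(OrigCount, scaledEntryCount(OrigCount, Scale));
}
```

For a made-up original count of 1000 and a scale of 0.001, the first policy yields 1 while the max policy keeps 1000. For an original count of 1 and a scale of 100, both yield 100.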

lgtm

This revision is now accepted and ready to land. Nov 19 2020, 5:23 PM

Closed by commit rG0abd744597ee: [PGO] Use the sum of profile counts to fix the function entry count (authored by xur). Dec 16 2020, 1:38 PM

This revision was automatically updated to reflect the committed changes.

xur added a commit: rG0abd744597ee: [PGO] Use the sum of profile counts to fix the function entry count.

Revision Contents

Path                                                                             Size
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp                       60 lines
llvm/test/Transforms/PGOProfile/Inputs/fix_bfi.proftext                          16 lines
llvm/test/Transforms/PGOProfile/bfi_verification.ll                              4 lines
llvm/test/Transforms/PGOProfile/fix_bfi.ll (copied from bfi_verification.ll)     22 lines

Diff 312298

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

@@ EmitBranchProbability("pgo-emit-branch-prob", cl::init(false), cl::Hidden,
                                    "branch probability will be emitted as "
                                    "optimization remarks: -{Rpass|"
                                    "pass-remarks}=pgo-instrumentation"));

 static cl::opt<bool> PGOInstrumentEntry(
     "pgo-instrument-entry", cl::init(false), cl::Hidden,
     cl::desc("Force to instrument function entry basicblock."));

+static cl::opt<bool>
+    PGOFixEntryCount("pgo-fix-entry-count", cl::init(true), cl::Hidden,
+                     cl::desc("Fix function entry count in profile use."));
+
 static cl::opt<bool> PGOVerifyHotBFI(
     "pgo-verify-hot-bfi", cl::init(false), cl::Hidden,
     cl::desc("Print out the non-match BFI count if a hot raw profile count "
              "becomes non-hot, or a cold raw profile count becomes hot. "
              "The print is enabled under -Rpass-analysis=pgo, or "
              "internal option -pass-remakrs-analysis=pgo."));

 static cl::opt<bool> PGOVerifyBFI(
     "pgo-verify-bfi", cl::init(false), cl::Hidden,
     cl::desc("Print out mismatched BFI counts after setting profile metadata "
@@ PreservedAnalyses PGOInstrumentationGen::run(Module &M,
   };

   if (!InstrumentAllFunctions(M, LookupTLI, LookupBPI, LookupBFI, IsCS))
     return PreservedAnalyses::all();

   return PreservedAnalyses::none();
 }

+// Using the ratio b/w sums of profile count values and BFI count values to
+// adjust the func entry count.
+static void fixFuncEntryCount(PGOUseFunc &Func, LoopInfo &LI,
+                              BranchProbabilityInfo &NBPI) {
+  Function &F = Func.getFunc();
+  BlockFrequencyInfo NBFI(F, NBPI, LI);
+#ifndef NDEBUG
+  auto BFIEntryCount = F.getEntryCount();
+  assert(BFIEntryCount.hasValue() && (BFIEntryCount.getCount() > 0) &&
+         "Invalid BFI Entrycount");
+#endif
+  auto SumCount = APFloat::getZero(APFloat::IEEEdouble());
+  auto SumBFICount = APFloat::getZero(APFloat::IEEEdouble());
+  for (auto &BBI : F) {
+    uint64_t CountValue = 0;
+    uint64_t BFICountValue = 0;
+    if (!Func.findBBInfo(&BBI))
+      continue;
+    auto BFICount = NBFI.getBlockProfileCount(&BBI);
+    CountValue = Func.getBBInfo(&BBI).CountValue;
+    BFICountValue = BFICount.getValue();
+    SumCount.add(APFloat(CountValue * 1.0), APFloat::rmNearestTiesToEven);
+    SumBFICount.add(APFloat(BFICountValue * 1.0), APFloat::rmNearestTiesToEven);
+  }
+  if (SumCount.isZero())
+    return;
+
+  assert(SumBFICount.compare(APFloat(0.0)) == APFloat::cmpGreaterThan &&
+         "Incorrect sum of BFI counts");
+  if (SumBFICount.compare(SumCount) == APFloat::cmpEqual)
+    return;
+  double Scale = (SumCount / SumBFICount).convertToDouble();
+  if (Scale < 1.001 && Scale > 0.999)
+    return;
+
+  uint64_t FuncEntryCount = Func.getBBInfo(&*F.begin()).CountValue;
+  uint64_t NewEntryCount = 0.5 + FuncEntryCount * Scale;
+  if (NewEntryCount == 0)
+    NewEntryCount = 1;
+  if (NewEntryCount != FuncEntryCount) {
+    F.setEntryCount(ProfileCount(NewEntryCount, Function::PCT_Real));
+    LLVM_DEBUG(dbgs() << "FixFuncEntryCount: in " << F.getName()
+                      << ", entry_count " << FuncEntryCount << " --> "
+                      << NewEntryCount << "\n");
+  }
+}
+
 // Compare the profile count values with BFI count values, and print out
 // the non-matching ones.
 static void verifyFuncBFI(PGOUseFunc &Func, LoopInfo &LI,
                           BranchProbabilityInfo &NBPI,
                           uint64_t HotCountThreshold,
                           uint64_t ColdCountThreshold) {
   Function &F = Func.getFunc();
   BlockFrequencyInfo NBFI(F, NBPI, LI);
   // bool PrintFunc = false;
   bool HotBBOnly = PGOVerifyHotBFI;
   std::string Msg;
   OptimizationRemarkEmitter ORE(&F);

   unsigned BBNum = 0, BBMisMatchNum = 0, NonZeroBBNum = 0;
   for (auto &BBI : F) {
     uint64_t CountValue = 0;
     uint64_t BFICountValue = 0;

     if (Func.getBBInfo(&BBI).CountValid)
       CountValue = Func.getBBInfo(&BBI).CountValue;

     BBNum++;
     if (CountValue)
       NonZeroBBNum++;
     auto BFICount = NBFI.getBlockProfileCount(&BBI);
     if (BFICount)
       BFICountValue = BFICount.getValue();

     if (HotBBOnly) {
       bool rawIsHot = CountValue >= HotCountThreshold;
       bool BFIIsHot = BFICountValue >= HotCountThreshold;
       bool rawIsCold = CountValue <= ColdCountThreshold;
       bool ShowCount = false;
       if (rawIsHot && !BFIIsHot) {
         Msg = "raw-Hot to BFI-nonHot";
         ShowCount = true;
       } else if (rawIsCold && BFIIsHot) {
         Msg = "raw-Cold to BFI-Hot";
         ShowCount = true;
       }
       if (!ShowCount)
@@ if (PGOViewRawCounts != PGOVCT_None &&
         else
           ViewGraph(&Func, Twine("PGORawCounts_") + Func.getFunc().getName());
       else if (PGOViewRawCounts == PGOVCT_Text) {
         dbgs() << "pgo-view-raw-counts: " << Func.getFunc().getName() << "\n";
         Func.dumpInfo();
       }
     }

-    // Verify BlockFrequency information.
-    if (PGOVerifyBFI || PGOVerifyHotBFI) {
+    if (PGOVerifyBFI || PGOVerifyHotBFI || PGOFixEntryCount) {
       LoopInfo LI{DominatorTree(F)};
       BranchProbabilityInfo NBPI(F, LI);

+      // Fix func entry count.
+      if (PGOFixEntryCount)
+        fixFuncEntryCount(Func, LI, NBPI);
+
+      // Verify BlockFrequency information.
       uint64_t HotCountThreshold = 0, ColdCountThreshold = 0;
       if (PGOVerifyHotBFI) {
         HotCountThreshold = PSI->getOrCompHotCountThreshold();
         ColdCountThreshold = PSI->getOrCompColdCountThreshold();
       }
       verifyFuncBFI(Func, LI, NBPI, HotCountThreshold, ColdCountThreshold);
     }
   }

llvm/test/Transforms/PGOProfile/Inputs/fix_bfi.proftext

This file was added.

# IR level Instrumentation Flag
:ir
sort_basket
# Func Hash:
948827210500800754
# Num Counters:
7
# Counter Values:
41017879
31616738
39637749
32743703
13338888
6990942
6013544

llvm/test/Transforms/PGOProfile/bfi_verification.ll

This file was copied to llvm/test/Transforms/PGOProfile/fix_bfi.ll.

 ; Note: Verify bfi counter after loading the profile.
 ; RUN: llvm-profdata merge %S/Inputs/bfi_verification.proftext -o %t.profdata
-; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-verify-bfi-ratio=2 -pgo-verify-bfi=true -pass-remarks-analysis=pgo 2>&1 | FileCheck %s --check-prefix=THRESHOLD-CHECK
-; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-verify-hot-bfi=true -pass-remarks-analysis=pgo 2>&1 | FileCheck %s --check-prefix=HOTONLY-CHECK
+; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-verify-bfi-ratio=2 -pgo-verify-bfi=true -pgo-fix-entry-count=false -pass-remarks-analysis=pgo 2>&1 | FileCheck %s --check-prefix=THRESHOLD-CHECK
+; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-verify-hot-bfi=true -pgo-fix-entry-count=false -pass-remarks-analysis=pgo 2>&1 | FileCheck %s --check-prefix=HOTONLY-CHECK

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

 %struct.basket = type { %struct.arc*, i64, i64 }
 %struct.arc = type { i64, %struct.node*, %struct.node*, i32, %struct.arc*, %struct.arc*, i64, i64 }
 %struct.node = type { i64, i32, %struct.node*, %struct.node*, %struct.node*, %struct.node*, %struct.arc*, %struct.arc*, %struct.arc*, %struct.arc*, i64, i64, i32, i32 }

llvm/test/Transforms/PGOProfile/fix_bfi.ll

This file was copied from llvm/test/Transforms/PGOProfile/bfi_verification.ll.

-; Note: Verify bfi counter after loading the profile.
+; Note: Scaling the func entry count (using the sum of count value) so that BFI counter value is close to raw profile counter values.
-; RUN: llvm-profdata merge %S/Inputs/bfi_verification.proftext -o %t.profdata
-; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-verify-bfi-ratio=2 -pgo-verify-bfi=true -pass-remarks-analysis=pgo 2>&1 | FileCheck %s --check-prefix=THRESHOLD-CHECK
-; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-verify-hot-bfi=true -pass-remarks-analysis=pgo 2>&1 | FileCheck %s --check-prefix=HOTONLY-CHECK
+; RUN: llvm-profdata merge %S/Inputs/fix_bfi.proftext -o %t.profdata
+; RUN: opt -pgo-instr-use -pgo-test-profile-file=%t.profdata -S -pgo-fix-entry-count=true < %s 2>&1 | FileCheck %s

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

 %struct.basket = type { %struct.arc*, i64, i64 }
 %struct.arc = type { i64, %struct.node*, %struct.node*, i32, %struct.arc*, %struct.arc*, i64, i64 }
 %struct.node = type { i64, i32, %struct.node*, %struct.node*, %struct.node*, %struct.node*, %struct.arc*, %struct.arc*, %struct.arc*, %struct.arc*, i64, i64, i32, i32 }

@@

 if.then25:
   call void @sort_basket(i64 %l.2, i64 %max)
   br label %if.end26

 if.end26:
   ret void
 }
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB do.body Count=39637749 BFI_Count=40801304
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB while.cond Count=80655628 BFI_Count=83956530
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB while.body Count=41017879 BFI_Count=42370585
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB while.cond3 Count=71254487 BFI_Count=73756204
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB while.body7 Count=31616738 BFI_Count=32954900
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB while.end8 Count=39637749 BFI_Count=40801304
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB if.then Count=32743703 BFI_Count=33739540
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB if.end Count=39637749 BFI_Count=40801304
-; THRESHOLD-CHECK: remark: <unknown>:0:0: BB if.then25 Count=6013544 BFI_Count=6277124
-; THRESHOLD-CHECK: remark: <unknown>:0:0: In Func sort_basket: Num_of_BB=14, Num_of_non_zerovalue_BB=14, Num_of_mis_matching_BB=9
-; HOTONLY-CHECK: remark: <unknown>:0:0: BB if.then25 Count=6013544 BFI_Count=6277124 (raw-Cold to BFI-Hot)
-; HOTONLY-CHECK: remark: <unknown>:0:0: In Func sort_basket: Num_of_BB=14, Num_of_non_zerovalue_BB=14, Num_of_mis_matching_BB=1
+; CHECK: define dso_local void @sort_basket(i64 %min, i64 %max) #0 !prof [[ENTRY_COUNT:![0-9]+]]
+; CHECK: [[ENTRY_COUNT]] = !{!"function_entry_count", i64 12949310}