This is an archive of the discontinued LLVM Phabricator instance.

Differential D120096
Differential D120096

PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs
ClosedPublic

Authored by MatzeB on Feb 17 2022, 3:51 PM.


Details

Reviewers

ellis
xur
MaskRay
phosek
aeubanks
modimo
wenlei

Summary

The SplitIndirectBrCriticalEdges function was originally designed for CodeGenPrepare and skips splitting an edge when the destination block does not contain any PHI instructions. That only makes sense when the goal is to reduce COPYs, as in CodeGenPrepare. In PGOInstrumentation and GCOVProfiling it results in missed counters and wrong results for functions with computed goto.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MatzeB created this revision. Feb 17 2022, 3:51 PM

Herald added subscribers: hiraditya, mcrosier. Feb 17 2022, 3:51 PM

MatzeB requested review of this revision. Feb 17 2022, 3:51 PM

Herald added a project: Restricted Project. Feb 17 2022, 3:51 PM

Herald added a subscriber: llvm-commits.

MatzeB added a parent revision: D120095: Simplify/cleanup BasicBlockUtilsTest. Feb 17 2022, 3:51 PM

MatzeB edited the summary of this revision.

Harbormaster completed remote builds in B150347: Diff 409809. Feb 17 2022, 4:48 PM

modimo added inline comments. Feb 17 2022, 5:56 PM

llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
503–505	Please document this flag in the comment above
llvm/test/Transforms/GCOVProfiling/split-indirectbr-critical-edges.ll
37	nit:s/impossibel/impossible/g nit2: If we're willing to change the labels the indirect gotos jump to we can technically split an arbitrary # of edges. Not that we should, of course..
llvm/test/Transforms/PGOProfile/irreducible.ll
142	I'm curious how the counters shifted around now that indirectgoto->sw.bb gets split. How do the changed counter values in the profiles above map to that?
llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp
475	`s/bb3/bb2/g` has no PHI

rebase, address review comments

llvm/test/Transforms/PGOProfile/irreducible.ll
142	This is hard to figure out given the size of the test and the fact that the profile files are checked-in as-is. I hope this is fine...

MatzeB updated this revision to Diff 410885. Feb 23 2022, 11:31 AM

MatzeB added inline comments.

llvm/test/Transforms/GCOVProfiling/split-indirectbr-critical-edges.ll
37	Not completely sure what you mean in "nit2", but I changed the wording to make it clear that this is a trick to keep the edge from getting split so the test keeps working (I know more normalization etc. could "fix" this, but we're only running a single pass here)

MatzeB updated this revision to Diff 410887. Feb 23 2022, 11:33 AM

MatzeB marked an inline comment as done.

Harbormaster completed remote builds in B151105: Diff 410887. Feb 23 2022, 12:26 PM

LGTM

llvm/test/Transforms/GCOVProfiling/split-indirectbr-critical-edges.ll
37	You got it, "nit2" was that handling this case isn't technically impossible. Thanks for the wording change.
llvm/test/Transforms/PGOProfile/irreducible.ll
142	Took a look because I was curious how to get this information. Adding `-debug-only=pgo-instrumentation` dumps the edges and the instrumented edges line up with the counts in the profile. Comparing the two (irreducible.proftext) left is original right is with this diff: The count of `100` gets inferred now and the new counts are for the split edges as expected. The change that causes the `INDIRECTGOTO_IRR_LOOP` to go from 400->399 is that the probes shifted so that edge 21 and edge 22 on RHS flipped compared to LHS. That's not a big deal though.

This revision is now accepted and ready to land. Feb 23 2022, 1:10 PM

LGTM.
Thanks for this fix. I was aware of this: I used to have an assert after the call to SplitIndirectBrCriticalEdges(), but removed it because in some cases it does return false.
Even with this fix, there is still a chance that we cannot split the edge.
Maybe we should emit a warning in this case.

Even with this fix, there is still a chance that we cannot split the edge.
Maybe we should emit a warning in this case.

Yes, I'm aware that it's not always possible to split critical edges; I think it should be possible in those cases to just instrument the source and destination of the critical edge with a counter to get valid data?

Admittedly I did not try to fix that either, as this fix seems to be all we need for our software... I think the remaining cases should only affect exception handling code, which is not perf critical anyway? For cases with two indirectbr predecessors I always saw LLVM normalizing control flow to a single one in earlier passes, so that was never a problem for instrumentation in my experiments.

MatzeB added inline comments. Feb 23 2022, 4:28 PM

llvm/test/Transforms/PGOProfile/irreducible.ll
142	Thanks!

good catch, lgtm!

landed as 6a383369f9b800eac5de2456e49fa70577be8e33

Revision Contents

Path

Size

llvm/

include/

llvm/

Transforms/

Utils/

BasicBlockUtils.h

4 lines

lib/

CodeGen/

CodeGenPrepare.cpp

3 lines

Transforms/

Instrumentation/

GCOVProfiling.cpp

3 lines

PGOInstrumentation.cpp

4 lines

Utils/

BreakCriticalEdges.cpp

13 lines

test/

Transforms/

GCOVProfiling/

split-indirectbr-critical-edges.ll

5 lines

PGOProfile/

Inputs/

irreducible.proftext

4 lines

irreducible_entry.proftext

4 lines

irreducible.ll

2 lines

split-indirectbr-critical-edges.ll

8 lines

unittests/

Transforms/

Utils/

BasicBlockUtilsTest.cpp

71 lines

Diff 410887

llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h

	Show First 20 Lines • Show All 494 Lines • ▼ Show 20 Lines
	// predecessor, with the others being regular branches, we can do it in a			// predecessor, with the others being regular branches, we can do it in a
	// different way.			// different way.
	// Say we have A -> D, B -> D, I -> D where only I -> D is an indirectbr.			// Say we have A -> D, B -> D, I -> D where only I -> D is an indirectbr.
	// We can split D into D0 and D1, where D0 contains only the PHIs from D,			// We can split D into D0 and D1, where D0 contains only the PHIs from D,
	// and D1 is the D block body. We can then duplicate D0 as D0A and D0B, and			// and D1 is the D block body. We can then duplicate D0 as D0A and D0B, and
	// create the following structure:			// create the following structure:
	// A -> D0A, B -> D0A, I -> D0B, D0A -> D1, D0B -> D1			// A -> D0A, B -> D0A, I -> D0B, D0A -> D1, D0B -> D1
	// If BPI and BFI aren't non-null, BPI/BFI will be updated accordingly.			// If BPI and BFI aren't non-null, BPI/BFI will be updated accordingly.
	bool SplitIndirectBrCriticalEdges(Function &F,			// When `IgnoreBlocksWithoutPHI` is set to `true` critical edges leading to a
				// block without phi-instructions will not be split.
				bool SplitIndirectBrCriticalEdges(Function &F, bool IgnoreBlocksWithoutPHI,
	BranchProbabilityInfo *BPI = nullptr,			BranchProbabilityInfo *BPI = nullptr,
	BlockFrequencyInfo *BFI = nullptr);			BlockFrequencyInfo *BFI = nullptr);

	/// Given a set of incoming and outgoing blocks, create a "hub" such that every			/// Given a set of incoming and outgoing blocks, create a "hub" such that every
	/// edge from an incoming block InBB to an outgoing block OutBB is now split			/// edge from an incoming block InBB to an outgoing block OutBB is now split
	/// into two edges, one from InBB to the hub and another from the hub to			/// into two edges, one from InBB to the hub and another from the hub to
	/// OutBB. The hub consists of a series of guard blocks, one for each outgoing			/// OutBB. The hub consists of a series of guard blocks, one for each outgoing
	/// block. Each guard block conditionally branches to the corresponding outgoing			/// block. Each guard block conditionally branches to the corresponding outgoing
	▲ Show 20 Lines • Show All 73 Lines • Show Last 20 Lines

llvm/lib/CodeGen/CodeGenPrepare.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 518 Lines • ▼ Show 20 Lines	bool CodeGenPrepare::runOnFunction(Function &F) {
EverMadeChange |= eliminateMostlyEmptyBlocks(F);		EverMadeChange |= eliminateMostlyEmptyBlocks(F);

bool ModifiedDT = false;		bool ModifiedDT = false;
if (!DisableBranchOpts)		if (!DisableBranchOpts)
EverMadeChange |= splitBranchCondition(F, ModifiedDT);		EverMadeChange |= splitBranchCondition(F, ModifiedDT);

// Split some critical edges where one of the sources is an indirect branch,		// Split some critical edges where one of the sources is an indirect branch,
// to help generate sane code for PHIs involving such edges.		// to help generate sane code for PHIs involving such edges.
EverMadeChange |= SplitIndirectBrCriticalEdges(F);		EverMadeChange |=
		SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/true);

bool MadeChange = true;		bool MadeChange = true;
while (MadeChange) {		while (MadeChange) {
MadeChange = false;		MadeChange = false;
DT.reset();		DT.reset();
for (BasicBlock &BB : llvm::make_early_inc_range(F)) {		for (BasicBlock &BB : llvm::make_early_inc_range(F)) {
bool ModifiedDTOnIteration = false;		bool ModifiedDTOnIteration = false;
MadeChange |= optimizeBlock(BB, ModifiedDTOnIteration);		MadeChange |= optimizeBlock(BB, ModifiedDTOnIteration);
▲ Show 20 Lines • Show All 7,763 Lines • Show Last 20 Lines

llvm/lib/Transforms/Instrumentation/GCOVProfiling.cpp

Show First 20 Lines • Show All 856 Lines • ▼ Show 20 Lines	for (auto &F : M->functions()) {
uint32_t Line = SP->getLine();		uint32_t Line = SP->getLine();
auto Filename = getFilename(SP);		auto Filename = getFilename(SP);

BranchProbabilityInfo *BPI = GetBPI(F);		BranchProbabilityInfo *BPI = GetBPI(F);
BlockFrequencyInfo *BFI = GetBFI(F);		BlockFrequencyInfo *BFI = GetBFI(F);

// Split indirectbr critical edges here before computing the MST rather		// Split indirectbr critical edges here before computing the MST rather
// than later in getInstrBB() to avoid invalidating it.		// than later in getInstrBB() to avoid invalidating it.
SplitIndirectBrCriticalEdges(F, BPI, BFI);		SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI,
		BFI);

CFGMST<Edge, BBInfo> MST(F, /*InstrumentFuncEntry_=*/false, BPI, BFI);		CFGMST<Edge, BBInfo> MST(F, /*InstrumentFuncEntry_=*/false, BPI, BFI);

// getInstrBB can split basic blocks and push elements to AllEdges.		// getInstrBB can split basic blocks and push elements to AllEdges.
for (size_t I : llvm::seq<size_t>(0, MST.AllEdges.size())) {		for (size_t I : llvm::seq<size_t>(0, MST.AllEdges.size())) {
auto &E = *MST.AllEdges[I];		auto &E = *MST.AllEdges[I];
// For now, disable spanning tree optimization when fork or exec* is		// For now, disable spanning tree optimization when fork or exec* is
// used.		// used.
▲ Show 20 Lines • Show All 525 Lines • Show Last 20 Lines

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

Show First 20 Lines • Show All 934 Lines • ▼ Show 20 Lines
// Critical edges will be split.		// Critical edges will be split.
static void instrumentOneFunc(		static void instrumentOneFunc(
Function &F, Module *M, TargetLibraryInfo &TLI, BranchProbabilityInfo *BPI,		Function &F, Module *M, TargetLibraryInfo &TLI, BranchProbabilityInfo *BPI,
BlockFrequencyInfo *BFI,		BlockFrequencyInfo *BFI,
std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,		std::unordered_multimap<Comdat *, GlobalValue *> &ComdatMembers,
bool IsCS) {		bool IsCS) {
// Split indirectbr critical edges here before computing the MST rather than		// Split indirectbr critical edges here before computing the MST rather than
// later in getInstrBB() to avoid invalidating it.		// later in getInstrBB() to avoid invalidating it.
SplitIndirectBrCriticalEdges(F, BPI, BFI);		SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);

FuncPGOInstrumentation<PGOEdge, BBInfo> FuncInfo(		FuncPGOInstrumentation<PGOEdge, BBInfo> FuncInfo(
F, TLI, ComdatMembers, true, BPI, BFI, IsCS, PGOInstrumentEntry);		F, TLI, ComdatMembers, true, BPI, BFI, IsCS, PGOInstrumentEntry);

Type *I8PtrTy = Type::getInt8PtrTy(M->getContext());		Type *I8PtrTy = Type::getInt8PtrTy(M->getContext());
auto Name = ConstantExpr::getBitCast(FuncInfo.FuncNameVar, I8PtrTy);		auto Name = ConstantExpr::getBitCast(FuncInfo.FuncNameVar, I8PtrTy);
auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()),		auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()),
FuncInfo.FunctionHash);		FuncInfo.FunctionHash);
▲ Show 20 Lines • Show All 972 Lines • ▼ Show 20 Lines	static bool annotateAllFunctions(
for (auto &F : M) {		for (auto &F : M) {
if (F.isDeclaration())		if (F.isDeclaration())
continue;		continue;
auto &TLI = LookupTLI(F);		auto &TLI = LookupTLI(F);
auto *BPI = LookupBPI(F);		auto *BPI = LookupBPI(F);
auto *BFI = LookupBFI(F);		auto *BFI = LookupBFI(F);
// Split indirectbr critical edges here before computing the MST rather than		// Split indirectbr critical edges here before computing the MST rather than
// later in getInstrBB() to avoid invalidating it.		// later in getInstrBB() to avoid invalidating it.
SplitIndirectBrCriticalEdges(F, BPI, BFI);		SplitIndirectBrCriticalEdges(F, /*IgnoreBlocksWithoutPHI=*/false, BPI, BFI);
PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,		PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,
InstrumentFuncEntry);		InstrumentFuncEntry);
// When AllMinusOnes is true, it means the profile for the function		// When AllMinusOnes is true, it means the profile for the function
// is unrepresentative and this function is actually hot. Set the		// is unrepresentative and this function is actually hot. Set the
// entry count of the function to be multiple times of hot threshold		// entry count of the function to be multiple times of hot threshold
// and drop all its internal counters.		// and drop all its internal counters.
bool AllMinusOnes = false;		bool AllMinusOnes = false;
bool AllZeros = false;		bool AllZeros = false;
▲ Show 20 Lines • Show All 280 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/BreakCriticalEdges.cpp

	Show First 20 Lines • Show All 311 Lines • ▼ Show 20 Lines
	}			}

	// Return the unique indirectbr predecessor of a block. This may return null			// Return the unique indirectbr predecessor of a block. This may return null
	// even if such a predecessor exists, if it's not useful for splitting.			// even if such a predecessor exists, if it's not useful for splitting.
	// If a predecessor is found, OtherPreds will contain all other (non-indirectbr)			// If a predecessor is found, OtherPreds will contain all other (non-indirectbr)
	// predecessors of BB.			// predecessors of BB.
	static BasicBlock *			static BasicBlock *
	findIBRPredecessor(BasicBlock *BB, SmallVectorImpl<BasicBlock *> &OtherPreds) {			findIBRPredecessor(BasicBlock *BB, SmallVectorImpl<BasicBlock *> &OtherPreds) {
	// If the block doesn't have any PHIs, we don't care about it, since there's
	// no point in splitting it.
	PHINode *PN = dyn_cast<PHINode>(BB->begin());
	if (!PN)
	return nullptr;

	// Verify we have exactly one IBR predecessor.			// Verify we have exactly one IBR predecessor.
	// Conservatively bail out if one of the other predecessors is not a "regular"			// Conservatively bail out if one of the other predecessors is not a "regular"
	// terminator (that is, not a switch or a br).			// terminator (that is, not a switch or a br).
	BasicBlock *IBB = nullptr;			BasicBlock *IBB = nullptr;
	for (unsigned Pred = 0, E = PN->getNumIncomingValues(); Pred != E; ++Pred) {			for (BasicBlock *PredBB : predecessors(BB)) {
	BasicBlock *PredBB = PN->getIncomingBlock(Pred);
	Instruction *PredTerm = PredBB->getTerminator();			Instruction *PredTerm = PredBB->getTerminator();
	switch (PredTerm->getOpcode()) {			switch (PredTerm->getOpcode()) {
	case Instruction::IndirectBr:			case Instruction::IndirectBr:
	if (IBB)			if (IBB)
	return nullptr;			return nullptr;
	IBB = PredBB;			IBB = PredBB;
	break;			break;
	case Instruction::Br:			case Instruction::Br:
	case Instruction::Switch:			case Instruction::Switch:
	OtherPreds.push_back(PredBB);			OtherPreds.push_back(PredBB);
	continue;			continue;
	default:			default:
	return nullptr;			return nullptr;
	}			}
	}			}

	return IBB;			return IBB;
	}			}

	bool llvm::SplitIndirectBrCriticalEdges(Function &F,			bool llvm::SplitIndirectBrCriticalEdges(Function &F,
				bool IgnoreBlocksWithoutPHI,
	BranchProbabilityInfo *BPI,			BranchProbabilityInfo *BPI,
	BlockFrequencyInfo *BFI) {			BlockFrequencyInfo *BFI) {
	// Check whether the function has any indirectbrs, and collect which blocks			// Check whether the function has any indirectbrs, and collect which blocks
	// they may jump to. Since most functions don't have indirect branches,			// they may jump to. Since most functions don't have indirect branches,
	// this lowers the common case's overhead to O(Blocks) instead of O(Edges).			// this lowers the common case's overhead to O(Blocks) instead of O(Edges).
	SmallSetVector<BasicBlock *, 16> Targets;			SmallSetVector<BasicBlock *, 16> Targets;
	for (auto &BB : F) {			for (auto &BB : F) {
	auto *IBI = dyn_cast<IndirectBrInst>(BB.getTerminator());			auto *IBI = dyn_cast<IndirectBrInst>(BB.getTerminator());
	if (!IBI)			if (!IBI)
	continue;			continue;

	for (unsigned Succ = 0, E = IBI->getNumSuccessors(); Succ != E; ++Succ)			for (unsigned Succ = 0, E = IBI->getNumSuccessors(); Succ != E; ++Succ)
	Targets.insert(IBI->getSuccessor(Succ));			Targets.insert(IBI->getSuccessor(Succ));
	}			}

	if (Targets.empty())			if (Targets.empty())
	return false;			return false;

	bool ShouldUpdateAnalysis = BPI && BFI;			bool ShouldUpdateAnalysis = BPI && BFI;
	bool Changed = false;			bool Changed = false;
	for (BasicBlock *Target : Targets) {			for (BasicBlock *Target : Targets) {
				if (IgnoreBlocksWithoutPHI && Target->phis().empty())
				continue;

	SmallVector<BasicBlock *, 16> OtherPreds;			SmallVector<BasicBlock *, 16> OtherPreds;
	BasicBlock *IBRPred = findIBRPredecessor(Target, OtherPreds);			BasicBlock *IBRPred = findIBRPredecessor(Target, OtherPreds);
	// If we did not found an indirectbr, or the indirectbr is the only			// If we did not found an indirectbr, or the indirectbr is the only
	// incoming edge, this isn't the kind of edge we're looking for.			// incoming edge, this isn't the kind of edge we're looking for.
	if (!IBRPred || OtherPreds.empty())			if (!IBRPred || OtherPreds.empty())
	continue;			continue;

	// Don't even think about ehpads/landingpads.			// Don't even think about ehpads/landingpads.
	▲ Show 20 Lines • Show All 94 Lines • Show Last 20 Lines

llvm/test/Transforms/GCOVProfiling/split-indirectbr-critical-edges.ll

Show All 27 Lines	indirect.preheader: ; preds = %for.cond
%idxprom = sext i8 %1 to i64, !dbg !21		%idxprom = sext i8 %1 to i64, !dbg !21
%arrayidx4 = getelementptr inbounds <2 x i8*>, <2 x i8*>* %targets, i64 0, i64 %idxprom, !dbg !21		%arrayidx4 = getelementptr inbounds <2 x i8*>, <2 x i8*>* %targets, i64 0, i64 %idxprom, !dbg !21
%2 = load i8*, i8** %arrayidx4, align 8, !dbg !21		%2 = load i8*, i8** %arrayidx4, align 8, !dbg !21
br label %indirect		br label %indirect

indirect: ; preds = %indirect.preheader, %indirect		indirect: ; preds = %indirect.preheader, %indirect
indirectbr i8* %2, [label %indirect, label %end]		indirectbr i8* %2, [label %indirect, label %end]

		indirect2:
		; For this test we do not want critical edges split. Adding a 2nd `indirectbr`
		; does the trick.
		indirectbr i8* %2, [label %indirect, label %end]

end: ; preds = %indirect		end: ; preds = %indirect
ret i32 0, !dbg !22		ret i32 0, !dbg !22
}		}

attributes #0 = { norecurse nounwind readonly uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }		attributes #0 = { norecurse nounwind readonly uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.dbg.cu = !{!0}		!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}		!llvm.module.flags = !{!3, !4, !5}
Show All 18 Lines

llvm/test/Transforms/PGOProfile/Inputs/irreducible.proftext

	:ir			:ir
	_Z11irreducibleii			_Z11irreducibleii
	# Func Hash:			# Func Hash:
	287486624745028451			287486624745028451
	# Num Counters:			# Num Counters:
	6			6
	# Counter Values:			# Counter Values:
	1000			1000
	950			950
	100			100
	373			373
	1			1
	0			0

	_Z11irreduciblePh			_Z11irreduciblePh
	# Func Hash:			# Func Hash:
	331779889035882993			52047014671956012
	# Num Counters:			# Num Counters:
	9			9
	# Counter Values:			# Counter Values:
	100
	300			300
	99			99
	300			300
	201			201
	1			1
	1			1
	0			0
	0			0
				0

llvm/test/Transforms/PGOProfile/Inputs/irreducible_entry.proftext

	Show All 9 Lines
	1000			1000
	950			950
	100			100
	373			373
	0			0

	_Z11irreduciblePh			_Z11irreduciblePh
	# Func Hash:			# Func Hash:
	331779889035882993			52047014671956012
	# Num Counters:			# Num Counters:
	9			9
	# Counter Values:			# Counter Values:
	1			1
	100
	300			300
	99			99
	300			300
	201			201
	1			1
	0			0
	0			0
				0

llvm/test/Transforms/PGOProfile/irreducible.ll

	Show First 20 Lines • Show All 133 Lines • ▼ Show 20 Lines
	; USE-SAME: !irr_loop ![[INDIRECTGOTO_IRR_LOOP:[0-9]+]]			; USE-SAME: !irr_loop ![[INDIRECTGOTO_IRR_LOOP:[0-9]+]]
	}			}

	; USE: ![[FOR_COND2_IRR_LOOP]] = !{!"loop_header_weight", i64 1050}			; USE: ![[FOR_COND2_IRR_LOOP]] = !{!"loop_header_weight", i64 1050}
	; USE: ![[ENTRY8_IRR_LOOP]] = !{!"loop_header_weight", i64 373}			; USE: ![[ENTRY8_IRR_LOOP]] = !{!"loop_header_weight", i64 373}
	; USE: ![[IF_END9_IRR_LOOP]] = !{!"loop_header_weight", i64 1000}			; USE: ![[IF_END9_IRR_LOOP]] = !{!"loop_header_weight", i64 1000}
	; USE: ![[SW_BB6_IRR_LOOP]] = !{!"loop_header_weight", i64 501}			; USE: ![[SW_BB6_IRR_LOOP]] = !{!"loop_header_weight", i64 501}
	; USE: ![[SW_BB15_IRR_LOOP]] = !{!"loop_header_weight", i64 100}			; USE: ![[SW_BB15_IRR_LOOP]] = !{!"loop_header_weight", i64 100}
	; USE: ![[INDIRECTGOTO_IRR_LOOP]] = !{!"loop_header_weight", i64 400}			; USE: ![[INDIRECTGOTO_IRR_LOOP]] = !{!"loop_header_weight", i64 399}

llvm/test/Transforms/PGOProfile/split-indirectbr-critical-edges.ll

	Show All 37 Lines
	; CHECK: indirectbr i8* %2, [label %for.cond2, label %if.end]			; CHECK: indirectbr i8* %2, [label %for.cond2, label %if.end]
	}			}

	;; If an indirectbr critical edge cannot be split, ignore it.			;; If an indirectbr critical edge cannot be split, ignore it.
	;; The edge will not be profiled.			;; The edge will not be profiled.
	; CHECK-LABEL: @cannot_split(			; CHECK-LABEL: @cannot_split(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: call void @llvm.instrprof.increment			; CHECK-NEXT: call void @llvm.instrprof.increment
				; CHECK: indirect:
	; CHECK-NOT: call void @llvm.instrprof.increment			; CHECK-NOT: call void @llvm.instrprof.increment
				; CHECK: indirect2:
				; CHECK-NEXT: call void @llvm.instrprof.increment
	define i32 @cannot_split(i8* nocapture readonly %p) {			define i32 @cannot_split(i8* nocapture readonly %p) {
	entry:			entry:
	%targets = alloca <2 x i8*>, align 16			%targets = alloca <2 x i8*>, align 16
	store <2 x i8*> <i8* blockaddress(@cannot_split, %indirect), i8* blockaddress(@cannot_split, %end)>, <2 x i8*>* %targets, align 16			store <2 x i8*> <i8* blockaddress(@cannot_split, %indirect), i8* blockaddress(@cannot_split, %end)>, <2 x i8*>* %targets, align 16
	%arrayidx2 = getelementptr inbounds i8, i8* %p, i64 1			%arrayidx2 = getelementptr inbounds i8, i8* %p, i64 1
	%0 = load i8, i8* %arrayidx2			%0 = load i8, i8* %arrayidx2
	%idxprom = sext i8 %0 to i64			%idxprom = sext i8 %0 to i64
	%arrayidx3 = getelementptr inbounds <2 x i8*>, <2 x i8*>* %targets, i64 0, i64 %idxprom			%arrayidx3 = getelementptr inbounds <2 x i8*>, <2 x i8*>* %targets, i64 0, i64 %idxprom
	%1 = load i8*, i8** %arrayidx3, align 8			%1 = load i8*, i8** %arrayidx3, align 8
	br label %indirect			br label %indirect

	indirect: ; preds = %entry, %indirect			indirect: ; preds = %entry, %indirect
				indirectbr i8* %1, [label %indirect, label %end, label %indirect2]

				indirect2:
				; For this test we do not want critical edges split. Adding a 2nd `indirectbr`
				; does the trick.
	indirectbr i8* %1, [label %indirect, label %end]			indirectbr i8* %1, [label %indirect, label %end]

	end: ; preds = %indirect			end: ; preds = %indirect
	ret i32 0			ret i32 0
	}			}

llvm/unittests/Transforms/Utils/BasicBlockUtilsTest.cpp

Show First 20 Lines • Show All 431 Lines • ▼ Show 20 Lines	)IR");
PostDominatorTree PDT(*F);		PostDominatorTree PDT(*F);

CriticalEdgeSplittingOptions CESO(&DT, nullptr, nullptr, &PDT);		CriticalEdgeSplittingOptions CESO(&DT, nullptr, nullptr, &PDT);
EXPECT_EQ(1u, SplitAllCriticalEdges(*F, CESO));		EXPECT_EQ(1u, SplitAllCriticalEdges(*F, CESO));
EXPECT_TRUE(DT.verify());		EXPECT_TRUE(DT.verify());
EXPECT_TRUE(PDT.verify());		EXPECT_TRUE(PDT.verify());
}		}

TEST(BasicBlockUtils, SplitIndirectBrCriticalEdge) {		TEST(BasicBlockUtils, SplitIndirectBrCriticalEdgesIgnorePHIs) {
LLVMContext C;		LLVMContext C;
std::unique_ptr<Module> M = parseIR(C, R"IR(		std::unique_ptr<Module> M = parseIR(C, R"IR(
define void @crit_edge(i8* %cond0, i1 %cond1) {		define void @crit_edge(i8* %tgt, i1 %cond0, i1 %cond1) {
entry:		entry:
indirectbr i8* %cond0, [label %bb0, label %bb1]		indirectbr i8* %tgt, [label %bb0, label %bb1, label %bb2]
bb0:		bb0:
br label %bb1		br i1 %cond0, label %bb1, label %bb2
bb1:		bb1:
%p = phi i32 [0, %bb0], [0, %entry]		%p = phi i32 [0, %bb0], [0, %entry]
br i1 %cond1, label %bb2, label %bb3		br i1 %cond1, label %bb3, label %bb4
bb2:		bb2:
ret void		ret void
bb3:		bb3:
ret void		ret void
		bb4:
		ret void
}		}
)IR");		)IR");
Function *F = M->getFunction("crit_edge");		Function *F = M->getFunction("crit_edge");
DominatorTree DT(*F);		DominatorTree DT(*F);
LoopInfo LI(DT);		LoopInfo LI(DT);
BranchProbabilityInfo BPI(*F, LI);		BranchProbabilityInfo BPI(*F, LI);
BlockFrequencyInfo BFI(*F, BPI, LI);		BlockFrequencyInfo BFI(*F, BPI, LI);

ASSERT_TRUE(SplitIndirectBrCriticalEdges(*F, &BPI, &BFI));		ASSERT_TRUE(SplitIndirectBrCriticalEdges(*F, /*IgnoreBlocksWithoutPHI=*/true,
		&BPI, &BFI));

// Check that successors of the split block get their probability correct.		// Check that successors of the split block get their probability correct.
BasicBlock *BB1 = getBasicBlockByName(*F, "bb1");		BasicBlock *BB1 = getBasicBlockByName(*F, "bb1");
BasicBlock *SplitBB = BB1->getTerminator()->getSuccessor(0);		BasicBlock *SplitBB = BB1->getTerminator()->getSuccessor(0);
EXPECT_EQ(2u, SplitBB->getTerminator()->getNumSuccessors());		ASSERT_EQ(2u, SplitBB->getTerminator()->getNumSuccessors());
EXPECT_EQ(BranchProbability(1, 2), BPI.getEdgeProbability(SplitBB, 0u));		EXPECT_EQ(BranchProbability(1, 2), BPI.getEdgeProbability(SplitBB, 0u));
EXPECT_EQ(BranchProbability(1, 2), BPI.getEdgeProbability(SplitBB, 1u));		EXPECT_EQ(BranchProbability(1, 2), BPI.getEdgeProbability(SplitBB, 1u));

		// bb2 has no PHI, so we shouldn't split bb0 -> bb2
		BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
		ASSERT_EQ(2u, BB0->getTerminator()->getNumSuccessors());
		EXPECT_EQ(BB0->getTerminator()->getSuccessor(1),
		getBasicBlockByName(*F, "bb2"));
		}

		TEST(BasicBlockUtils, SplitIndirectBrCriticalEdges) {
		LLVMContext C;
		std::unique_ptr<Module> M = parseIR(C, R"IR(
		define void @crit_edge(i8* %tgt, i1 %cond0, i1 %cond1) {
		entry:
		indirectbr i8* %tgt, [label %bb0, label %bb1, label %bb2]
		bb0:
		br i1 %cond0, label %bb1, label %bb2
		bb1:
		%p = phi i32 [0, %bb0], [0, %entry]
		br i1 %cond1, label %bb3, label %bb4
		bb2:
		ret void
		bb3:
		ret void
		bb4:
		ret void
		}
		)IR");
		Function *F = M->getFunction("crit_edge");
		DominatorTree DT(*F);
		LoopInfo LI(DT);
		BranchProbabilityInfo BPI(*F, LI);
		BlockFrequencyInfo BFI(*F, BPI, LI);

		ASSERT_TRUE(SplitIndirectBrCriticalEdges(*F, /*IgnoreBlocksWithoutPHI=*/false,
		&BPI, &BFI));

		// Check that successors of the split block get their probability correct.
		BasicBlock *BB1 = getBasicBlockByName(*F, "bb1");
		BasicBlock *SplitBB = BB1->getTerminator()->getSuccessor(0);
		ASSERT_EQ(2u, SplitBB->getTerminator()->getNumSuccessors());
		EXPECT_EQ(BranchProbability(1, 2), BPI.getEdgeProbability(SplitBB, 0u));
		EXPECT_EQ(BranchProbability(1, 2), BPI.getEdgeProbability(SplitBB, 1u));

		// Should split, resulting in:
		// bb0 -> bb2.clone; bb2 -> split1; bb2.clone -> split,
		BasicBlock *BB0 = getBasicBlockByName(*F, "bb0");
		ASSERT_EQ(2u, BB0->getTerminator()->getNumSuccessors());
		BasicBlock *BB2Clone = BB0->getTerminator()->getSuccessor(1);
		BasicBlock *BB2 = getBasicBlockByName(*F, "bb2");
		EXPECT_NE(BB2Clone, BB2);
		ASSERT_EQ(1u, BB2->getTerminator()->getNumSuccessors());
		ASSERT_EQ(1u, BB2Clone->getTerminator()->getNumSuccessors());
		EXPECT_EQ(BB2->getTerminator()->getSuccessor(0),
		BB2Clone->getTerminator()->getSuccessor(0));
}		}

TEST(BasicBlockUtils, SetEdgeProbability) {		TEST(BasicBlockUtils, SetEdgeProbability) {
LLVMContext C;		LLVMContext C;
std::unique_ptr<Module> M = parseIR(C, R"IR(		std::unique_ptr<Module> M = parseIR(C, R"IR(
define void @edge_probability(i32 %0) {		define void @edge_probability(i32 %0) {
entry:		entry:
switch i32 %0, label %LD [		switch i32 %0, label %LD [
▲ Show 20 Lines • Show All 78 Lines • Show Last 20 Lines
