Diff 66146

lib/CodeGen/MachineBlockPlacement.cpp

bool MachineBlockPlacement::hasBetterLayoutPredecessor(
// means the cost of topological order is greater.		// means the cost of topological order is greater.
// When profile data is not available, however, we need to be more		// When profile data is not available, however, we need to be more
// conservative. If the branch prediction is wrong, breaking the topo-order		// conservative. If the branch prediction is wrong, breaking the topo-order
// will actually yield a layout with large cost. For this reason, we need		// will actually yield a layout with large cost. For this reason, we need
// strong biased branch at block S with Prob(S->BB) in order to select		// strong biased branch at block S with Prob(S->BB) in order to select
// BB->Succ. This is equivalent to looking the CFG backward with backward		// BB->Succ. This is equivalent to looking the CFG backward with backward
// edge: Prob(Succ->BB) needs to >= HotProb in order to be selected (without		// edge: Prob(Succ->BB) needs to >= HotProb in order to be selected (without
// profile data).		// profile data).
		// --------------------------------------------------------------------------
		// Case 3: forked diamond
		//       S
		//      / \
		//     /   \
		//   BB    Pred
		//   | \   / |
		//   |  \ /  |
		//   |   X   |
		//   |  / \  |
		//   | /   \ |
		//   S1     S2
		davidxl: Nit: can you make the art work like the following to not split S2 into two 'blocks':
		//
		//          Head (Or Entry, Top)
		//          /  \
		//         /    \
		//       BB     Pred
		//      /  \   /  |
		//     |    S1    |
		//      \        /
		//          S2
		//
		//
		// The current block is BB and edge BB->S1 is now being evaluated.
		// As above S->BB was already selected because
		// prob(S->BB) > prob(S->Pred). Assume that prob(BB->S1) >= prob(BB->S2).
		//
		// topo-order:
		//
		//     S-------|                     ---S
		//     |       |                     |  |
		//  ---BB      |                     |  BB
		//  |          |                     |  |
		//  |  Pred----|                     |  S1----
		//  |  |                             |       |
		//  --(S1 or S2)                     ---Pred--
		//
		davidxl: layed out --> laid out
		// topo-cost = freq(S->Pred) + freq(BB->S1) + freq(BB->S2)
		// + min(freq(Pred->S1), freq(Pred->S2))
		davidxl: Another way to explain in terms of savings instead of cost: the savings is the total freq of the fall-through edges. In the topo case, the savings is freq(S->BB) + max(freq(Pred->S1), freq(Pred->S2)). (1) For the non-topo case, the savings is: freq(S->BB) + freq(BB->S1) + freq(Pred->S2). (2) When freq(Pred->S2) > freq(Pred->S1), (2) is strictly larger than (1). In the opposite case, the check below will also lead to (2) > (1).
		iteratee: I think I'll stick with cost, as that's how the other 2 cases are explained.
		// Non-topo-order cost:
		// In the worst case, S2 will not get laid out after Pred.
		// non-topo-cost = 2 * freq(S->Pred) + freq(BB->S2).
		// To be conservative, we can assume that min(freq(Pred->S1), freq(Pred->S2))
		// is 0. Then the non topo layout is better when
		// freq(S->Pred) < freq(BB->S1).
		// This is exactly what is checked below.
		// Note there are other shapes that apply (Pred may not be a single block,
		// but they all fit this general pattern.)
BranchProbability HotProb = getLayoutSuccessorProbThreshold(BB);		BranchProbability HotProb = getLayoutSuccessorProbThreshold(BB);

// Forward checking. For case 2, SuccProb will be 1.
if (SuccProb < HotProb) {
DEBUG(dbgs() << " Not a candidate: " << getBlockName(Succ) << " "
<< "Respecting topological ordering because "
<< "probability is less than prob threshold: "
<< SuccProb << "\n");
return true;
}

// Make sure that a hot successor doesn't have a globally more		// Make sure that a hot successor doesn't have a globally more
// important predecessor.		// important predecessor.
BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;		BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;
bool BadCFGConflict = false;		bool BadCFGConflict = false;

for (MachineBasicBlock *Pred : Succ->predecessors()) {		for (MachineBasicBlock *Pred : Succ->predecessors()) {
if (Pred == Succ || BlockToChain[Pred] == &SuccChain ||		if (Pred == Succ || BlockToChain[Pred] == &SuccChain ||
(BlockFilter && !BlockFilter->count(Pred)) ||		(BlockFilter && !BlockFilter->count(Pred)) ||
BlockToChain[Pred] == &Chain)		BlockToChain[Pred] == &Chain)
continue;		continue;
// Do backward checking. For case 1, it is actually redundant check. For		// Do backward checking.
// case 2 above, we need a backward checking to filter out edges that are		// For all cases above, we need a backward checking to filter out edges that
		davidxl: For case 1 and 2
// not 'strongly' biased. With profile data available, the check is mostly		// are not 'strongly' biased. With profile data available, the check is
// redundant too (when threshold prob is set at 50%) unless S has more than		// mostly redundant for case 2 (when threshold prob is set at 50%) unless S
// two successors.		// has more than two successors.
// BB Pred		// BB Pred
		davidxl: This comment does not fit here. With profile data, such a check won't be skipped; it just does not need to be as biased.
// \ /		// \ /
// Succ		// Succ
// We select edge BB->Succ if		// We select edge BB->Succ if
// freq(BB->Succ) > freq(Succ) * HotProb		// freq(BB->Succ) > freq(Succ) * HotProb
// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *		// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *
// HotProb		// HotProb
// i.e. freq(BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb		// i.e. freq(BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb
		// Case 1 is covered too, because the first equation reduces to:
		// prob(BB->Succ) > HotProb. (freq(Succ) = freq(BB) for a triangle)
BlockFrequency PredEdgeFreq =		BlockFrequency PredEdgeFreq =
MBFI->getBlockFreq(Pred) * MBPI->getEdgeProbability(Pred, Succ);		MBFI->getBlockFreq(Pred) * MBPI->getEdgeProbability(Pred, Succ);
if (PredEdgeFreq * HotProb >= CandidateEdgeFreq * HotProb.getCompl()) {		if (PredEdgeFreq * HotProb >= CandidateEdgeFreq * HotProb.getCompl()) {
BadCFGConflict = true;		BadCFGConflict = true;
break;		break;
}		}
}		}
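The backward check in the loop above can be read in isolation as a single integer inequality. The following is a minimal sketch of it, not LLVM's implementation: the `Ratio` type and the function name are hypothetical stand-ins for `BranchProbability` and the inline condition, and the 80% threshold used in the note below is an assumption taken from the test comments in this revision.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a probability Num/Den (not llvm::BranchProbability).
struct Ratio {
  uint64_t Num;
  uint64_t Den;
};

// Mirrors the condition
//   PredEdgeFreq * HotProb >= CandidateEdgeFreq * HotProb.getCompl()
// by scaling both sides by Den so the comparison stays in integers.
// Returns true when the competing edge Pred->Succ is too hot for BB->Succ
// to be selected as the layout successor.
bool hasBadCFGConflict(uint64_t CandidateEdgeFreq, uint64_t PredEdgeFreq,
                       Ratio HotProb) {
  assert(HotProb.Den > 0 && HotProb.Num <= HotProb.Den);
  const uint64_t Compl = HotProb.Den - HotProb.Num; // numerator of 1 - HotProb
  return PredEdgeFreq * HotProb.Num >= CandidateEdgeFreq * Compl;
}
```

With an assumed threshold of 80/100, a candidate edge of frequency 800 survives a competing edge of 100 but not one of 200: at exactly an 80% share the `>=` makes the check report a conflict.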

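The Case 3 (forked diamond) cost argument in the comment can be checked with a few lines of arithmetic. This is an illustrative sketch only: the struct and function names are hypothetical and the frequencies in the note below are made up. The formulas are the ones stated in the comment.

```cpp
#include <algorithm>
#include <cstdint>

// Edge frequencies of the forked diamond S -> {BB, Pred} -> {S1, S2}.
// Hypothetical helper type, not part of MachineBlockPlacement.
struct ForkedDiamondFreqs {
  uint64_t SToPred;
  uint64_t BBToS1;
  uint64_t BBToS2;
  uint64_t PredToS1;
  uint64_t PredToS2;
};

// topo-cost = freq(S->Pred) + freq(BB->S1) + freq(BB->S2)
//             + min(freq(Pred->S1), freq(Pred->S2))
uint64_t topoCost(const ForkedDiamondFreqs &F) {
  return F.SToPred + F.BBToS1 + F.BBToS2 + std::min(F.PredToS1, F.PredToS2);
}

// Worst case for the non-topo layout (S2 is not laid out after Pred):
// non-topo-cost = 2 * freq(S->Pred) + freq(BB->S2)
uint64_t nonTopoWorstCaseCost(const ForkedDiamondFreqs &F) {
  return 2 * F.SToPred + F.BBToS2;
}

// Conservative test from the comment: treat the min(...) term as 0, so the
// non-topo layout is preferred only when freq(S->Pred) < freq(BB->S1).
bool nonTopoIsBetterConservative(const ForkedDiamondFreqs &F) {
  return F.SToPred < F.BBToS1;
}
```

For example, with freq(S->Pred) = 16, freq(BB->S1) = freq(BB->S2) = 42 and freq(Pred->S1) = freq(Pred->S2) = 8, the topo cost is 108 and the worst-case non-topo cost is 74, so the conservative test accepts BB->S1. With 45, 30, 25, 20 and 25 respectively, the costs are 120 and 115, yet the conservative test rejects the edge because it ignores the min term.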

test/CodeGen/AArch64/arm64-andCmpBrToTBZ.ll

; RUN: llc -O1 -mtriple=arm64-apple-ios7.0.0 -enable-andcmp-sinking=true < %s | FileCheck %s		; RUN: llc -O1 -mtriple=arm64-apple-ios7.0.0 -enable-andcmp-sinking=true < %s | FileCheck %s
; ModuleID = 'and-cbz-extr-mr.bc'		; ModuleID = 'and-cbz-extr-mr.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32:64-S128"

define zeroext i1 @foo(i1 %IsEditable, i1 %isTextField, i8* %str1, i8* %str2, i8* %str3, i8* %str4, i8* %str5, i8* %str6, i8* %str7, i8* %str8, i8* %str9, i8* %str10, i8* %str11, i8* %str12, i8* %str13, i32 %int1, i8* %str14) unnamed_addr #0 align 2 {		define zeroext i1 @foo(i1 %IsEditable, i1 %isTextField, i8* %str1, i8* %str2, i8* %str3, i8* %str4, i8* %str5, i8* %str6, i8* %str7, i8* %str8, i8* %str9, i8* %str10, i8* %str11, i8* %str12, i8* %str13, i32 %int1, i8* %str14) unnamed_addr #0 align 2 {
; CHECK: _foo:		; CHECK: _foo:
entry:		entry:
%tobool = icmp eq i8* %str14, null		%tobool = icmp eq i8* %str14, null
br i1 %tobool, label %return, label %if.end		br i1 %tobool, label %return, label %if.end

; CHECK: %if.end		; CHECK: %if.end
; CHECK: tbz		; CHECK: tbz
if.end: ; preds = %entry		if.end: ; preds = %entry
%and.i.i.i = and i32 %int1, 4		%and.i.i.i = and i32 %int1, 4
%tobool.i.i.i = icmp eq i32 %and.i.i.i, 0		%tobool.i.i.i = icmp eq i32 %and.i.i.i, 0
br i1 %tobool.i.i.i, label %if.end12, label %land.rhs.i		br i1 %tobool.i.i.i, label %if.end12, label %land.rhs.i, !prof !1

land.rhs.i: ; preds = %if.end		land.rhs.i: ; preds = %if.end
%cmp.i.i.i = icmp eq i8* %str12, %str13		%cmp.i.i.i = icmp eq i8* %str12, %str13
br i1 %cmp.i.i.i, label %if.then3, label %lor.rhs.i.i.i		br i1 %cmp.i.i.i, label %if.then3, label %lor.rhs.i.i.i

lor.rhs.i.i.i: ; preds = %land.rhs.i		lor.rhs.i.i.i: ; preds = %land.rhs.i
%cmp.i13.i.i.i = icmp eq i8* %str10, %str11		%cmp.i13.i.i.i = icmp eq i8* %str10, %str11
br i1 %cmp.i13.i.i.i, label %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit, label %if.end5		br i1 %cmp.i13.i.i.i, label %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit, label %if.end5

_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit: ; preds = %lor.rhs.i.i.i		_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit: ; preds = %lor.rhs.i.i.i
%cmp.i.i.i.i = icmp eq i8* %str8, %str9		%cmp.i.i.i.i = icmp eq i8* %str8, %str9
br i1 %cmp.i.i.i.i, label %if.then3, label %if.end5		br i1 %cmp.i.i.i.i, label %if.then3, label %if.end5

if.then3: ; preds = %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit, %land.rhs.i		if.then3: ; preds = %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit, %land.rhs.i
%tmp11 = load i8, i8* %str14, align 8		%tmp11 = load i8, i8* %str14, align 8
%tmp12 = and i8 %tmp11, 2		%tmp12 = and i8 %tmp11, 2
%tmp13 = icmp ne i8 %tmp12, 0		%tmp13 = icmp ne i8 %tmp12, 0
br label %return		br label %return

if.end5: ; preds = %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit, %lor.rhs.i.i.i		if.end5: ; preds = %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit, %lor.rhs.i.i.i
; CHECK: %if.end5		; CHECK: %if.end5
; CHECK: tbz		; CHECK: tbz
br i1 %tobool.i.i.i, label %if.end12, label %land.rhs.i19		br i1 %tobool.i.i.i, label %if.end12, label %land.rhs.i19, !prof !1

land.rhs.i19: ; preds = %if.end5		land.rhs.i19: ; preds = %if.end5
%cmp.i.i.i18 = icmp eq i8* %str6, %str7		%cmp.i.i.i18 = icmp eq i8* %str6, %str7
br i1 %cmp.i.i.i18, label %if.then7, label %lor.rhs.i.i.i23		br i1 %cmp.i.i.i18, label %if.then7, label %lor.rhs.i.i.i23

lor.rhs.i.i.i23: ; preds = %land.rhs.i19		lor.rhs.i.i.i23: ; preds = %land.rhs.i19
%cmp.i13.i.i.i22 = icmp eq i8* %str3, %str4		%cmp.i13.i.i.i22 = icmp eq i8* %str3, %str4
br i1 %cmp.i13.i.i.i22, label %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit28, label %if.end12		br i1 %cmp.i13.i.i.i22, label %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit28, label %if.end12
Show All 16 Lines	if.end12: ; preds = %if.then7, %_ZNK7WebCore4Node10hasTagNameERKNS_13QualifiedNameE.exit28, %lor.rhs.i.i.i23, %if.end5, %if.end
br label %return		br label %return

return: ; preds = %if.end12, %if.then9, %if.then3, %entry		return: ; preds = %if.end12, %if.then9, %if.then3, %entry
%retval.0 = phi i1 [ %tmp13, %if.then3 ], [ %tmp25, %if.then9 ], [ %lnot, %if.end12 ], [ true, %entry ]		%retval.0 = phi i1 [ %tmp13, %if.then3 ], [ %tmp25, %if.then9 ], [ %lnot, %if.end12 ], [ true, %entry ]
ret i1 %retval.0		ret i1 %retval.0
}		}

attributes #0 = { nounwind ssp }		attributes #0 = { nounwind ssp }
		!1 = !{!"branch_weights", i32 3, i32 5}

test/CodeGen/AArch64/compare-branch.ll

	; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu | FileCheck %s

	@var32 = global i32 0			@var32 = global i32 0
	@var64 = global i64 0			@var64 = global i64 0

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:

	%val1 = load volatile i32, i32* @var32			%val1 = load volatile i32, i32* @var32
	%tst1 = icmp eq i32 %val1, 0			%tst1 = icmp eq i32 %val1, 0
	br i1 %tst1, label %end, label %test2			br i1 %tst1, label %end, label %test2, !prof !1
	; CHECK: cbz {{w[0-9]+}}, .LBB			; CHECK: cbz {{w[0-9]+}}, .LBB

	test2:			test2:
	%val2 = load volatile i32, i32* @var32			%val2 = load volatile i32, i32* @var32
	%tst2 = icmp ne i32 %val2, 0			%tst2 = icmp ne i32 %val2, 0
	br i1 %tst2, label %end, label %test3			br i1 %tst2, label %end, label %test3, !prof !1
	; CHECK: cbnz {{w[0-9]+}}, .LBB			; CHECK: cbnz {{w[0-9]+}}, .LBB

	test3:			test3:
	%val3 = load volatile i64, i64* @var64			%val3 = load volatile i64, i64* @var64
	%tst3 = icmp eq i64 %val3, 0			%tst3 = icmp eq i64 %val3, 0
	br i1 %tst3, label %end, label %test4			br i1 %tst3, label %end, label %test4, !prof !1
	; CHECK: cbz {{x[0-9]+}}, .LBB			; CHECK: cbz {{x[0-9]+}}, .LBB

	test4:			test4:
	%val4 = load volatile i64, i64* @var64			%val4 = load volatile i64, i64* @var64
	%tst4 = icmp ne i64 %val4, 0			%tst4 = icmp ne i64 %val4, 0
	br i1 %tst4, label %end, label %test5			br i1 %tst4, label %end, label %test5, !prof !1
	; CHECK: cbnz {{x[0-9]+}}, .LBB			; CHECK: cbnz {{x[0-9]+}}, .LBB

	test5:			test5:
	store volatile i64 %val4, i64* @var64			store volatile i64 %val4, i64* @var64
	ret void			ret void

	end:			end:
	ret void			ret void
	}			}


				!1 = !{!"branch_weights", i32 1, i32 1}

test/CodeGen/AArch64/logical_shifted_reg.ll

	; CHECK-LABEL: flag_setting:			; CHECK-LABEL: flag_setting:
	%val1 = load i64, i64* @var1_64			%val1 = load i64, i64* @var1_64
	%val2 = load i64, i64* @var2_64			%val2 = load i64, i64* @var2_64

	; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}			; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}
	; CHECK: b.gt .L			; CHECK: b.gt .L
	%simple_and = and i64 %val1, %val2			%simple_and = and i64 %val1, %val2
	%tst1 = icmp sgt i64 %simple_and, 0			%tst1 = icmp sgt i64 %simple_and, 0
	br i1 %tst1, label %ret, label %test2			br i1 %tst1, label %ret, label %test2, !prof !1

	test2:			test2:
	; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}, lsl #63			; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}, lsl #63
	; CHECK: b.lt .L			; CHECK: b.lt .L
	%shifted_op = shl i64 %val2, 63			%shifted_op = shl i64 %val2, 63
	%shifted_and = and i64 %val1, %shifted_op			%shifted_and = and i64 %val1, %shifted_op
	%tst2 = icmp slt i64 %shifted_and, 0			%tst2 = icmp slt i64 %shifted_and, 0
	br i1 %tst2, label %ret, label %test3			br i1 %tst2, label %ret, label %test3, !prof !1

	test3:			test3:
	; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}, asr #12			; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}, asr #12
	; CHECK: b.gt .L			; CHECK: b.gt .L
	%asr_op = ashr i64 %val2, 12			%asr_op = ashr i64 %val2, 12
	%asr_and = and i64 %asr_op, %val1			%asr_and = and i64 %asr_op, %val1
	%tst3 = icmp sgt i64 %asr_and, 0			%tst3 = icmp sgt i64 %asr_and, 0
	br i1 %tst3, label %ret, label %other_exit			br i1 %tst3, label %ret, label %other_exit, !prof !1

	other_exit:			other_exit:
	store volatile i64 %val1, i64* @var1_64			store volatile i64 %val1, i64* @var1_64
	ret void			ret void
	ret:			ret:
	ret void			ret void
	}			}

				!1 = !{!"branch_weights", i32 1, i32 1}

test/CodeGen/SystemZ/tdc-06.ll

	; Test the Test Data Class instruction, as used by fpclassify.			; Test the Test Data Class instruction, as used by fpclassify.
	;			;
	; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s			; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s
	;			;

	declare float @llvm.fabs.f32(float)			declare float @llvm.fabs.f32(float)
	declare double @llvm.fabs.f64(double)			declare double @llvm.fabs.f64(double)
	declare fp128 @llvm.fabs.f128(fp128)			declare fp128 @llvm.fabs.f128(fp128)

	define i32 @fpc(double %x) {			define i32 @fpc(double %x) {
	entry:			entry:
	; CHECK-LABEL: fpc			; CHECK-LABEL: fpc
	; CHECK: lhi %r2, 5			; CHECK: lhi %r2, 5
	; CHECK: ltdbr %f0, %f0			; CHECK: ltdbr %f0, %f0
	; CHECK: je [[RET:.L.*]]			; CHECK: je [[RET:.L.*]]
	%testeq = fcmp oeq double %x, 0.000000e+00			%testeq = fcmp oeq double %x, 0.000000e+00
	br i1 %testeq, label %ret, label %nonzero			br i1 %testeq, label %ret, label %nonzero, !prof !1

	nonzero:			nonzero:
	; CHECK: lhi %r2, 1			; CHECK: lhi %r2, 1
	; CHECK: cdbr %f0, %f0			; CHECK: cdbr %f0, %f0
	; CHECK: jo [[RET]]			; CHECK: jo [[RET]]
	%testnan = fcmp uno double %x, 0.000000e+00			%testnan = fcmp uno double %x, 0.000000e+00
	br i1 %testnan, label %ret, label %nonzeroord			br i1 %testnan, label %ret, label %nonzeroord, !prof !1

	nonzeroord:			nonzeroord:
	; CHECK: lhi %r2, 2			; CHECK: lhi %r2, 2
	; CHECK: tcdb %f0, 48			; CHECK: tcdb %f0, 48
	; CHECK: jl [[RET]]			; CHECK: jl [[RET]]
	%abs = tail call double @llvm.fabs.f64(double %x)			%abs = tail call double @llvm.fabs.f64(double %x)
	%testinf = fcmp oeq double %abs, 0x7FF0000000000000			%testinf = fcmp oeq double %abs, 0x7FF0000000000000
	br i1 %testinf, label %ret, label %finite			br i1 %testinf, label %ret, label %finite, !prof !1

	finite:			finite:
	; CHECK: lhi %r2, 3			; CHECK: lhi %r2, 3
	; CHECK: tcdb %f0, 831			; CHECK: tcdb %f0, 831
	; CHECK: blr %r14			; CHECK: blr %r14
	; CHECK: lhi %r2, 4			; CHECK: lhi %r2, 4
	%testnormal = fcmp uge double %abs, 0x10000000000000			%testnormal = fcmp uge double %abs, 0x10000000000000
	%finres = select i1 %testnormal, i32 3, i32 4			%finres = select i1 %testnormal, i32 3, i32 4
	br label %ret			br label %ret

	ret:			ret:
	; CHECK: [[RET]]:			; CHECK: [[RET]]:
	; CHECK: br %r14			; CHECK: br %r14
	%res = phi i32 [ 5, %entry ], [ 1, %nonzero ], [ 2, %nonzeroord ], [ %finres, %finite ]			%res = phi i32 [ 5, %entry ], [ 1, %nonzero ], [ 2, %nonzeroord ], [ %finres, %finite ]
	ret i32 %res			ret i32 %res
	}			}

				!1 = !{!"branch_weights", i32 1, i32 1}

test/CodeGen/X86/block-placement.ll

then:
call void @hot_function()		call void @hot_function()
br label %exit		br label %exit

exit:		exit:
call void @hot_function()		call void @hot_function()
ret void		ret void
}		}

		declare void @a()
		declare void @b()

		define void @test_forked_hot_diamond(i32* %a) {
		; Test that a hot-branch with probability > 80% followed by a 50/50 branch
		; will not place the cold predecessor if the probability for the fallthrough
		; remains above 80%
		; CHECK-LABEL: test_forked_hot_diamond
		; CHECK: %entry
		; CHECK: %then
		; CHECK: %fork1
		; CHECK: %else
		; CHECK: %fork2
		; CHECK: %exit
		entry:
		%gep1 = getelementptr i32, i32* %a, i32 1
		%val1 = load i32, i32* %gep1
		%cond1 = icmp ugt i32 %val1, 1
		br i1 %cond1, label %then, label %else, !prof !5

		then:
		call void @hot_function()
		%gep2 = getelementptr i32, i32* %a, i32 2
		%val2 = load i32, i32* %gep2
		%cond2 = icmp ugt i32 %val2, 2
		br i1 %cond2, label %fork1, label %fork2, !prof !8

		else:
		call void @cold_function()
		%gep3 = getelementptr i32, i32* %a, i32 3
		%val3 = load i32, i32* %gep3
		%cond3 = icmp ugt i32 %val3, 3
		br i1 %cond3, label %fork1, label %fork2, !prof !8

		fork1:
		call void @a()
		br label %exit

		fork2:
		call void @b()
		br label %exit

		exit:
		call void @hot_function()
		ret void
		}

		define void @test_forked_hot_diamond_gets_cold(i32* %a) {
		; Test that a hot-branch with probability > 80% followed by a 50/50 branch
		davidxl: I think we probably don't need to test the exact boundary condition here, to make the test more robust (say, when hotprob is tuned). I suggest making the branch at the then1 block have a (60%, 40%) branch prob distribution to make the case more obvious (such that the backward branch prob of edge then2->fork1 is around 70%).
		; will place the cold predecessor if the probability for the fallthrough
		; falls below 80%
		; The probability for both branches is 85%. For then2 vs else1
		; this results in a compounded probability of 83%.
		; Neither then2->fork1 nor then2->fork2 has a large enough relative
		; probability to break the CFG.
		; Relative probs:
		; then2 -> fork1 vs else1 -> fork1 = 71%
		; then2 -> fork2 vs else2 -> fork2 = 74%
		; CHECK-LABEL: test_forked_hot_diamond_gets_cold
		; CHECK: %entry
		; CHECK: %then1
		; CHECK: %then2
		; CHECK: %else1
		; CHECK: %fork1
		; CHECK: %else2
		; CHECK: %fork2
		; CHECK: %exit
		entry:
		%gep1 = getelementptr i32, i32* %a, i32 1
		%val1 = load i32, i32* %gep1
		%cond1 = icmp ugt i32 %val1, 1
		br i1 %cond1, label %then1, label %else1, !prof !9

		then1:
		call void @hot_function()
		%gep2 = getelementptr i32, i32* %a, i32 2
		%val2 = load i32, i32* %gep2
		%cond2 = icmp ugt i32 %val2, 2
		br i1 %cond2, label %then2, label %else2, !prof !9

		else1:
		call void @cold_function()
		br label %fork1

		then2:
		call void @hot_function()
		%gep3 = getelementptr i32, i32* %a, i32 3
		%val3 = load i32, i32* %gep3
		%cond3 = icmp ugt i32 %val3, 3
		br i1 %cond3, label %fork1, label %fork2, !prof !8

		else2:
		call void @cold_function()
		br label %fork2

		fork1:
		call void @a()
		br label %exit

		fork2:
		call void @b()
		br label %exit

		exit:
		call void @hot_function()
		ret void
		}

		define void @test_forked_hot_diamond_stays_hot(i32* %a) {
		; Test that a hot-branch with probability > 88.88% (1:8) followed by a 50/50
		; branch will not place the cold predecessor as the probability for the
		; fallthrough stays above 80%
		; (1:8) followed by (1:1) is still (1:4)
		; Here we use 90% probability because two in a row
		; have a 89 % probability vs the original branch.
		; CHECK-LABEL: test_forked_hot_diamond_stays_hot
		; CHECK: %entry
		; CHECK: %then1
		; CHECK: %then2
		; CHECK: %fork1
		; CHECK: %else1
		; CHECK: %else2
		; CHECK: %fork2
		; CHECK: %exit
		entry:
		%gep1 = getelementptr i32, i32* %a, i32 1
		%val1 = load i32, i32* %gep1
		%cond1 = icmp ugt i32 %val1, 1
		br i1 %cond1, label %then1, label %else1, !prof !10

		then1:
		call void @hot_function()
		%gep2 = getelementptr i32, i32* %a, i32 2
		%val2 = load i32, i32* %gep2
		%cond2 = icmp ugt i32 %val2, 2
		br i1 %cond2, label %then2, label %else2, !prof !10

		else1:
		call void @cold_function()
		br label %fork1

		then2:
		call void @hot_function()
		%gep3 = getelementptr i32, i32* %a, i32 3
		%val3 = load i32, i32* %gep3
		%cond3 = icmp ugt i32 %val3, 3
		br i1 %cond3, label %fork1, label %fork2, !prof !8

		else2:
		call void @cold_function()
		br label %fork2

		fork1:
		call void @a()
		br label %exit

		fork2:
		call void @b()
		br label %exit

		exit:
		call void @hot_function()
		ret void
		}

!5 = !{!"branch_weights", i32 84, i32 16}		!5 = !{!"branch_weights", i32 84, i32 16}
!6 = !{!"function_entry_count", i32 10}		!6 = !{!"function_entry_count", i32 10}
!7 = !{!"branch_weights", i32 60, i32 40}		!7 = !{!"branch_weights", i32 60, i32 40}
		!8 = !{!"branch_weights", i32 5001, i32 4999}
		!9 = !{!"branch_weights", i32 85, i32 15}
		!10 = !{!"branch_weights", i32 90, i32 10}
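The percentages quoted in the comments of test_forked_hot_diamond_gets_cold and test_forked_hot_diamond_stays_hot can be reproduced from the branch weights. The sketch below is an approximation under stated assumptions: it treats each weight pair as an exact probability (weight divided by the sum), normalizes the entry frequency to 1, and uses a hypothetical helper name. LLVM's fixed-point `BranchProbability` arithmetic may differ in the last digits.

```cpp
#include <cassert>

// Share of fork1 (resp. fork2) traffic that arrives from then2 rather than
// from else1 (resp. else2). Hypothetical result type for illustration.
struct ForkShares {
  double Fork1;
  double Fork2;
};

// HotWeight/ColdWeight are the weights used on both the entry and then1
// branches (!9 = 85/15 or !10 = 90/10). The then2 branch uses !8 = 5001/4999.
ForkShares forkedDiamondShares(double HotWeight, double ColdWeight) {
  assert(HotWeight > 0.0 && ColdWeight >= 0.0);
  const double PHot = HotWeight / (HotWeight + ColdWeight);
  const double PCold = 1.0 - PHot;
  const double PFork1 = 5001.0 / 10000.0;
  const double PFork2 = 4999.0 / 10000.0;

  const double Then2 = PHot * PHot;  // entry -> then1 -> then2
  const double Else1 = PCold;        // entry -> else1 (-> fork1)
  const double Else2 = PHot * PCold; // entry -> then1 -> else2 (-> fork2)

  const double Then2ToFork1 = Then2 * PFork1;
  const double Then2ToFork2 = Then2 * PFork2;
  return {Then2ToFork1 / (Then2ToFork1 + Else1),
          Then2ToFork2 / (Then2ToFork2 + Else2)};
}
```

Under these assumptions, the 85/15 weights give roughly 0.71 and 0.74, both below an 80% threshold, which matches the "gets cold" comments. The 90/10 weights give roughly 0.80 and 0.82, both just above it, which matches "stays hot".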

This is an archive of the discontinued LLVM Phabricator instance.

Codegen: MachineBlockPlacement Improve probability layout.
ClosedPublic
