This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Correctly adjust branch probability in PPCReduceCRLogicals
ClosedPublic

Authored by Carrot on May 24 2019, 3:19 PM.


Details

Reviewers

hfinkel
echristo
nemanjai
kbarton
xur
congh

Commits

rGc3a24e93d527: [PPC] Correctly adjust branch probability in PPCReduceCRLogicals
rL362237: [PPC] Correctly adjust branch probability in PPCReduceCRLogicals

Summary

In PPCReduceCRLogicals, after the original MBB is split in two, the two affected branches still use the original branch probability, which is unreasonable. Suppose we have the following code, where each successor has a probability of 50%.

condc = conda || condb
br condc, label %target, label %fallthrough

It can be transformed into the following:

br conda, label %target, label %newbb

newbb:

br condb, label %target, label %fallthrough

If each branch keeps a 50% probability to each successor, the total probability of reaching %fallthrough becomes 25% and the total probability of reaching %target becomes 75%, which changes the original profile data. A more reasonable choice is to give the false side of each branch a probability of about 70%, so that the total probability of reaching %fallthrough stays close to 50%.

This patch assumes that the two incoming edges of the branch target reached from both blocks have the same edge frequency, computes a new probability for each target, and keeps the total probability to the original targets unchanged.
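The arithmetic in this summary can be checked with a small sketch. It uses plain doubles rather than LLVM's BranchProbability, and the names SplitProbs, reuseOriginal, splitEqualEdgeFrequency, totalToTarget and totalToOther are hypothetical helpers introduced only for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration type, not an LLVM class.
struct SplitProbs {
  double P1; // probability that the first branch goes to the shared target
  double P2; // probability that the second branch goes to the shared target
};

// Behaviour described as the problem: both branches reuse the original P0.
inline SplitProbs reuseOriginal(double P0) { return {P0, P0}; }

// Equal edge frequency: each of the two edges into the shared target
// carries half of the original flow, so P1 = P0 / 2 and P2 = P1 / (1 - P1).
inline SplitProbs splitEqualEdgeFrequency(double P0) {
  double P1 = P0 / 2.0;
  double P2 = P1 / (1.0 - P1);
  return {P1, P2};
}

// Total probability of reaching the shared target after the split.
inline double totalToTarget(SplitProbs S) {
  return S.P1 + (1.0 - S.P1) * S.P2;
}

// Total probability of reaching the other original successor.
inline double totalToOther(SplitProbs S) {
  return (1.0 - S.P1) * (1.0 - S.P2);
}
```

With P0 = 0.5, reuseOriginal gives the 75% / 25% totals from the example, while splitEqualEdgeFrequency gives P1 = 0.25 and P2 = 1/3, which restores 50% / 50%.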

Diff Detail

Event Timeline

Carrot created this revision. May 24 2019, 3:19 PM

Herald added a project: Restricted Project. May 24 2019, 3:19 PM

Herald added subscribers: llvm-commits, jsji, kristina.

hfinkel added inline comments. May 24 2019, 3:28 PM

include/llvm/Support/BranchProbability.h
173	This would be the first use of floating-point here, right? Can we do this any other way?

hfinkel added a reviewer: congh. May 24 2019, 3:29 PM

Carrot marked an inline comment as done. May 28 2019, 8:15 AM

Carrot added inline comments.

include/llvm/Support/BranchProbability.h
173	Suppose the original branch probability to Target is P0. After splitting the MBB, there are two branches to Target, with probabilities P1 and P2. If the edge frequencies to Target are the same, then
	F * P1 = F * P0 / 2 ==> P1 = P0 / 2
	F * (1 - P1) * P2 = F * P1 ==> P2 = P1 / (1 - P1)
	With this distribution we can avoid the square root.

Changed the edge frequency distribution, so I can avoid floating point square root computation.
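A minimal integer-only sketch of that computation follows. It assumes probabilities are stored as a 32-bit numerator over the fixed denominator 1u << 31, as llvm::BranchProbability does, and it mirrors the rounded-division expression from the operator/= added in this revision. The free functions divProb, complement and splitFixed are hypothetical stand-ins, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Fixed denominator for all probabilities (same value as BranchProbability's D).
constexpr uint32_t D = 1u << 31;

// Rounded fixed-point division A / B. The caller must ensure 0 < B and
// A <= B so that the result stays within [0, D].
inline uint32_t divProb(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * D + B / 2) / B);
}

// 1 - N in fixed point.
inline uint32_t complement(uint32_t N) { return D - N; }

// Hypothetical illustration type, not an LLVM class.
struct FixedSplit {
  uint32_t P1; // numerator of P1 = P0 / 2
  uint32_t P2; // numerator of P2 = P1 / (1 - P1)
};

inline FixedSplit splitFixed(uint32_t P0) {
  uint32_t P1 = P0 / 2;
  // P1 <= D / 2, so complement(P1) >= P1 and is never zero.
  uint32_t P2 = divProb(P1, complement(P1));
  return {P1, P2};
}
```

For P0 = D / 2 (50%), this yields P1 = D / 4 and a P2 numerator of 715827883, which is D / 3 rounded to the nearest integer. No floating point or square root is involved.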

xur added inline comments. May 28 2019, 3:46 PM

lib/Target/PowerPC/PPCReduceCRLogicals.cpp
171	NIT: (1) "need" -> "needs". (2) Please also explain what is P0, P1, P2. A diagram would be helpful.

Carrot updated this revision to Diff 201970. May 29 2019, 10:17 AM

Carrot marked an inline comment as done.

Hi Carrot,
I agree with this change, conceptually. Have you done any performance measurements to see what the impact is?

looks good to me.

lib/Target/PowerPC/PPCReduceCRLogicals.cpp
187	NIT: Probably use ProbFallThough?

This revision is now accepted and ready to land. May 30 2019, 10:52 AM

In D62430#1523424, @kbarton wrote:

Hi Carrot,
I agree with this change, conceptually. Have you done any performance measurements to see what the impact is?

I didn't measure its impact, and it should not have any performance impact because currently it is disabled by default.

When I worked on code layout improvements, I got an unreasonable result for test case CodeGen/PowerPC/tail-dup-layout.ll because of this bad branch probability adjustment. So I fixed it with this patch.

Carrot marked an inline comment as done. May 30 2019, 11:38 AM

Carrot added inline comments.

lib/Target/PowerPC/PPCReduceCRLogicals.cpp
187	I use the current expression to make it consistent with the formula in line 178.

Closed by commit rL362237: [PPC] Correctly adjust branch probability in PPCReduceCRLogicals (authored by Carrot). May 31 2019, 9:08 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
include/llvm/Support/BranchProbability.h      13 lines
lib/Target/PowerPC/PPCReduceCRLogicals.cpp    39 lines
test/CodeGen/PowerPC/reduce_cr.ll             88 lines
test/CodeGen/PowerPC/select-i1-vs-i1.ll       2 lines

Diff 201714

include/llvm/Support/BranchProbability.h

[112 lines hidden; context: public:]

   BranchProbability &operator*=(uint32_t RHS) {
     assert(N != UnknownN &&
            "Unknown probability cannot participate in arithmetics.");
     N = (uint64_t(N) * RHS > D) ? D : N * RHS;
     return *this;
   }

+  BranchProbability &operator/=(BranchProbability RHS) {
+    assert(N != UnknownN && RHS.N != UnknownN &&
+           "Unknown probability cannot participate in arithmetics.");
+    N = (static_cast<uint64_t>(N) * D + RHS.N / 2) / RHS.N;
+    return *this;
+  }
+
   BranchProbability &operator/=(uint32_t RHS) {
     assert(N != UnknownN &&
            "Unknown probability cannot participate in arithmetics.");
     assert(RHS > 0 && "The divider cannot be zero.");
     N /= RHS;
     return *this;
   }

[16 lines hidden; context: public:]

   }

   BranchProbability operator*(uint32_t RHS) const {
     BranchProbability Prob(*this);
     Prob *= RHS;
     return Prob;
   }

+  BranchProbability operator/(BranchProbability RHS) const {
+    BranchProbability Prob(*this);
+    Prob /= RHS;
+    return Prob;
+  }
+
   BranchProbability operator/(uint32_t RHS) const {
     BranchProbability Prob(*this);
     Prob /= RHS;
     return Prob;
   }

   bool operator==(BranchProbability RHS) const { return N == RHS.N; }
   bool operator!=(BranchProbability RHS) const { return !(*this == RHS); }

hfinkel (inline comment, not done): This would be the first use of floating-point here, right? Can we do this any other way?
Carrot (inline comment, done): Suppose the original branch probability to Target is P0, after splitting MBB, there are two branches to Target, the probability to it is P1 and P2. If the edge frequency to Target is same, then
	F * P1 = F * P0 / 2 ==> P1 = P0 / 2
	F * (1 - P1) * P2 = F * P1 ==> P2 = P1 / (1 - P1)
	With this distribution we can avoid square root.

   bool operator<(BranchProbability RHS) const {
     assert(N != UnknownN && RHS.N != UnknownN &&
            "Unknown probability cannot participate in comparisons.");
     return N < RHS.N;
   }

   bool operator>(BranchProbability RHS) const {

[67 lines hidden]

lib/Target/PowerPC/PPCReduceCRLogicals.cpp

[160 lines hidden; the first visible line continues the statement "unsigned InvertedOpcode ="]

                 : OrigBROpcode == PPC::BCLR ? PPC::BCLRn : PPC::BCLR;
   unsigned NewBROpcode = BSI.InvertNewBranch ? InvertedOpcode : OrigBROpcode;
   MachineBasicBlock *OrigTarget = BSI.OrigBranch->getOperand(1).getMBB();
   MachineBasicBlock *OrigFallThrough = OrigTarget == *ThisMBB->succ_begin()
                                            ? *ThisMBB->succ_rbegin()
                                            : *ThisMBB->succ_begin();
   MachineBasicBlock *NewBRTarget =
       BSI.BranchToFallThrough ? OrigFallThrough : OrigTarget;
-  BranchProbability ProbToNewTarget =
-      !BSI.MBPI ? BranchProbability::getUnknown()
-                : BSI.MBPI->getEdgeProbability(ThisMBB, NewBRTarget);
+  // It's impossible to know the precise branch probability after the split.
+  // But it still need to be reasonable, the whole probability to original

xur (inline comment, done): NIT: (1) "need" -> "needs". (2) Please also explain what is P0, P1, P2. A diagram would be helpful.

+  // targets should not be changed.
+  // After split NewBRTarget will get two incoming edges, if the two edge
+  // frequencies are same, then
+  //   F * P1 = F * P0 / 2 ==> P1 = P0 / 2
+  //   F * (1 - P1) * P2 = F * P1 ==> P2 = P1 / (1 - P1)
+  BranchProbability ProbToNewTarget, ProbFallThrough; // Prob for new Br.
+  BranchProbability ProbOrigTarget, ProbOrigFallThrough; // Prob for orig Br.
+  ProbToNewTarget = ProbFallThrough = BranchProbability::getUnknown();
+  ProbOrigTarget = ProbOrigFallThrough = BranchProbability::getUnknown();
+  if (BSI.MBPI) {
+    if (BSI.BranchToFallThrough) {
+      ProbToNewTarget = BSI.MBPI->getEdgeProbability(ThisMBB, OrigFallThrough) / 2;
+      ProbFallThrough = ProbToNewTarget.getCompl();
+      ProbOrigFallThrough = ProbToNewTarget / ProbToNewTarget.getCompl();
+      ProbOrigTarget = ProbOrigFallThrough.getCompl();
+    } else {

xur (inline comment, not done): NIT: Probably use ProbFallThough?
Carrot (inline comment, done): I use the current expression to make it consistent with the formula in line 178.

+      ProbToNewTarget = BSI.MBPI->getEdgeProbability(ThisMBB, OrigTarget) / 2;
+      ProbFallThrough = ProbToNewTarget.getCompl();
+      ProbOrigTarget = ProbToNewTarget / ProbToNewTarget.getCompl();
+      ProbOrigFallThrough = ProbOrigTarget.getCompl();
+    }
+  }

   // Create a new basic block.
   MachineBasicBlock::iterator InsertPoint = BSI.SplitBefore;
   const BasicBlock *LLVM_BB = ThisMBB->getBasicBlock();
   MachineFunction::iterator It = ThisMBB->getIterator();
   MachineBasicBlock *NewMBB = MF->CreateMachineBasicBlock(LLVM_BB);
   MF->insert(++It, NewMBB);

   // Move everything after SplitBefore into the new block.
   NewMBB->splice(NewMBB->end(), ThisMBB, InsertPoint, ThisMBB->end());
   NewMBB->transferSuccessors(ThisMBB);
+  if (!ProbOrigTarget.isUnknown()) {
+    auto MBBI = std::find(NewMBB->succ_begin(), NewMBB->succ_end(), OrigTarget);
+    NewMBB->setSuccProbability(MBBI, ProbOrigTarget);
+    MBBI = std::find(NewMBB->succ_begin(), NewMBB->succ_end(), OrigFallThrough);
+    NewMBB->setSuccProbability(MBBI, ProbOrigFallThrough);
+  }

-  // Add the two successors to ThisMBB. The probabilities come from the
-  // existing blocks if available.
+  // Add the two successors to ThisMBB.
   ThisMBB->addSuccessor(NewBRTarget, ProbToNewTarget);
-  ThisMBB->addSuccessor(NewMBB, ProbToNewTarget.getCompl());
+  ThisMBB->addSuccessor(NewMBB, ProbFallThrough);

   // Add the branches to ThisMBB.
   BuildMI(*ThisMBB, ThisMBB->end(), BSI.SplitBefore->getDebugLoc(),
           TII->get(NewBROpcode))
       .addReg(BSI.SplitCond->getOperand(0).getReg())
       .addMBB(NewBRTarget);
   BuildMI(*ThisMBB, ThisMBB->end(), BSI.SplitBefore->getDebugLoc(),
           TII->get(PPC::B))

[512 lines hidden]

test/CodeGen/PowerPC/reduce_cr.ll

				; RUN: llc -O2 -ppc-reduce-cr-logicals -print-machine-bfi -o - %s 2>&1 | FileCheck %s
				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-grtev4-linux-gnu"

				; First block frequency info
				;CHECK: block-frequency-info: loop_test
				;CHECK-NEXT: - BB0[entry]: float = 1.0, int = 12
				;CHECK-NEXT: - BB1[for.check]: float = 2.6667, int = 34
				;CHECK-NEXT: - BB2[test1]: float = 1.6667, int = 21
				;CHECK-NEXT: - BB3[optional1]: float = 0.625, int = 8

				;CHECK: block-frequency-info: loop_test
				;CHECK: block-frequency-info: loop_test
				;CHECK: block-frequency-info: loop_test

				; Last block frequency info
				;CHECK: block-frequency-info: loop_test
				;CHECK-NEXT: - BB0[entry]: float = 1.0, int = 12
				;CHECK-NEXT: - BB1[for.check]: float = 2.6667, int = 34
				;CHECK-NEXT: - BB2[for.check]: float = 2.1667, int = 27
				;CHECK-NEXT: - BB3[test1]: float = 1.6667, int = 21
				;CHECK-NEXT: - BB4[optional1]: float = 0.625, int = 8


				define void @loop_test(i32* %tags, i32 %count) {
				entry:
				br label %for.check
				for.check:
				%count.loop = phi i32 [%count, %entry], [%count.sub, %for.latch]
				%done.count = icmp ugt i32 %count.loop, 0
				%tag_ptr = getelementptr inbounds i32, i32* %tags, i32 %count
				%tag = load i32, i32* %tag_ptr
				%done.tag = icmp eq i32 %tag, 0
				%done = and i1 %done.count, %done.tag
				br i1 %done, label %test1, label %exit, !prof !1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %optional1, !prof !1
				optional1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %test3, label %optional2, !prof !1
				optional2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %test3
				test3:
				%tagbit3 = and i32 %tag, 4
				%tagbit3eq0 = icmp eq i32 %tagbit3, 0
				br i1 %tagbit3eq0, label %test4, label %optional3, !prof !1
				optional3:
				call void @c()
				call void @c()
				call void @c()
				call void @c()
				br label %test4
				test4:
				%tagbit4 = and i32 %tag, 8
				%tagbit4eq0 = icmp eq i32 %tagbit4, 0
				br i1 %tagbit4eq0, label %for.latch, label %optional4, !prof !1
				optional4:
				call void @d()
				call void @d()
				call void @d()
				call void @d()
				br label %for.latch
				for.latch:
				%count.sub = sub i32 %count.loop, 1
				br label %for.check
				exit:
				ret void
				}

				declare void @a()
				declare void @b()
				declare void @c()
				declare void @d()

				!1 = !{!"branch_weights", i32 5, i32 3}
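The block frequencies checked in the last CHECK group of this test appear to follow from the formulas in the patch. The sketch below reproduces them with doubles, under two assumptions that are not stated in the review: the pass splits for.check so that its first branch targets %exit, and the entry frequency is normalized to 1.0. The names LoopFreqs and expectedFreqs are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration type holding the frequencies of the checked blocks.
struct LoopFreqs {
  double ForCheck;    // BB1: first half of the split for.check block
  double ForCheckNew; // BB2: second half created by the split
  double Test1;       // BB3
  double Optional1;   // BB4
};

// WeightTaken / WeightNotTaken are the branch_weights operands (5 and 3 here).
inline LoopFreqs expectedFreqs(double WeightTaken, double WeightNotTaken) {
  double PTaken = WeightTaken / (WeightTaken + WeightNotTaken);
  double PExit = 1.0 - PTaken;     // P0 of the edge to %exit before the split
  double ForCheck = 1.0 / PExit;   // loop header frequency for entry = 1.0
  double P1 = PExit / 2.0;         // first branch to %exit
  double P2 = P1 / (1.0 - P1);     // second branch to %exit
  double ForCheckNew = ForCheck * (1.0 - P1);
  double Test1 = ForCheckNew * (1.0 - P2);
  double Optional1 = Test1 * (1.0 - PTaken); // test1 -> optional1 has weight 3 of 8
  return {ForCheck, ForCheckNew, Test1, Optional1};
}
```

Calling expectedFreqs(5, 3) gives approximately 2.6667, 2.1667, 1.6667 and 0.625, so the frequency of test1 is the same before and after the split, and only the new block's frequency (2.1667) depends on how the exit probability is divided.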

test/CodeGen/PowerPC/select-i1-vs-i1.ll

-; RUN: llc -ppc-reduce-cr-logicals -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -ppc-reduce-cr-logicals -verify-machineinstrs -tail-dup-placement=false < %s | FileCheck %s
 ; RUN: llc -ppc-reduce-cr-logicals -verify-machineinstrs \
 ; RUN: -ppc-gen-isel=false < %s | FileCheck --check-prefix=CHECK-NO-ISEL %s
 target datalayout = "E-m:e-i64:64-n32:64"
 target triple = "powerpc64-unknown-linux-gnu"

 ; FIXME: We should check the operands to the cr* logical operation itself, but
 ; unfortunately, FileCheck does not yet understand how to do arithmetic, so we
 ; can't do so without introducing a register-allocation dependency.

[1,791 lines hidden]