This is an archive of the discontinued LLVM Phabricator instance.

Differential D8108

[MBP] Don't outline short optional branches
ClosedPublic

Authored by djasper on Mar 6 2015, 8:00 AM.

Details

Reviewers

chandlerc

Summary

With the option -outline-optional-branches, LLVM will place optional branches out of line (more details on r231230).

With this patch, this is not done for short optional branches. A short optional branch is a branch containing a single block with an instruction count below a certain threshold (defaulting to 3). Everything is still guarded by -outline-optional-branches.

Outlining a short branch can't significantly improve code locality. It can, however, decrease performance because of the additional jmp, particularly in cases where the optional branch is hot. This fixes a compile time regression I have observed in a benchmark.
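The check described in the summary can be sketched on a simplified block model. This is only an illustration under stated assumptions: `Block`, `InChain`, and `findShortOptionalBranch` are hypothetical names, not LLVM types or functions, and the real pass works on `MachineBasicBlock` and its chain mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct Block {
  std::size_t NumInstrs = 0;         // instruction count of this block
  std::vector<const Block *> Preds;  // CFG predecessors
  bool InChain = false;              // already laid out in the current chain?
};

// Looks for a predecessor of the unavoidable block Succ that is not yet
// placed, is entered from a block already in the chain, and has fewer than
// Threshold instructions. Returns that block, or nullptr if there is none.
const Block *findShortOptionalBranch(const Block &Succ, std::size_t Threshold) {
  assert(Threshold > 0 && "a threshold of 0 can never match");
  for (const Block *Pred : Succ.Preds) {
    if (Pred == &Succ || Pred->InChain)
      continue;  // self-loop or already placed
    for (const Block *PredPred : Pred->Preds)
      if (PredPred->InChain && Pred->NumInstrs < Threshold)
        return Pred;  // short optional branch: keep it in line
  }
  return nullptr;
}
```

With a threshold of 4, a two-instruction optional block is reported, while a block of four or more instructions is left to be outlined.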

Diff Detail

Event Timeline

djasper updated this revision to Diff 21359.Mar 6 2015, 8:00 AM

djasper retitled this revision from to [MBP] Don't outline short optional branches.

djasper updated this object.

djasper edited the test plan for this revision.

djasper added a reviewer: chandlerc.

djasper added a subscriber: Unknown Object (MLST).

djasper updated this object.Mar 6 2015, 8:03 AM

chandlerc added inline comments.Mar 6 2015, 1:48 PM

lib/CodeGen/MachineBlockPlacement.cpp
394	Shouldn't this be disabling the return of Succ here rather than arbitrarily returning Pred? And in general, it seems very strange to look at all of the predecessors' predecessors here. That seems really expensive and it isn't clear why. This at least needs some serious commenting to clarify the algorithm used.

djasper updated this revision to Diff 21472.Mar 9 2015, 2:45 AM

djasper added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
394	Addressed both concerns.

Ping?

I'm fine for this to go in behind the flag. It makes the code strictly better.

However, looking at this and thinking about it is really confirming for me that this isn't quite the correct approach. Here are my thoughts, mostly for reference going forward:

Currently, the code is working to select a definite successor to lay out next in the chain. But some of the time, we don't want to do that because there is a non-definite successor that doesn't have a CFG conflict and is small. I feel like it would be better to phrase this whole thing from the perspective of actually *outlining*. Rather than returning a successor early, I wonder if it would work better to *skip* successors which are non-small and optional.

I've tried to think through how this would be implemented, and sadly it doesn't seem straightforward. I really feel like we're struggling with a much more deeply flawed design here and that is why nothing seems to work elegantly.
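For reference, the "skip" phrasing suggested above might look roughly like the following on a toy model. This is a sketch, not a proposal for the pass: `Candidate` and `selectSkippingLargeOptional` are hypothetical names, and the sketch ignores chains, CFG conflicts, and the other constraints that make the real change hard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one successor candidate; not an LLVM type.
struct Candidate {
  int Id;                 // identifier, for illustration only
  double Prob;            // probability of the edge to this successor
  bool Optional;          // branch that rejoins the unavoidable path later
  std::size_t NumInstrs;  // instruction count of the block
};

// Picks the most probable successor, skipping optional successors whose size
// is at or above Threshold so that they end up out of line.
// Returns nullptr if every candidate was skipped.
const Candidate *selectSkippingLargeOptional(const std::vector<Candidate> &Succs,
                                             std::size_t Threshold) {
  const Candidate *Best = nullptr;
  for (const Candidate &C : Succs) {
    assert(C.Prob >= 0.0 && C.Prob <= 1.0 && "probability out of range");
    if (C.Optional && C.NumInstrs >= Threshold)
      continue;  // non-small and optional: leave it to be outlined
    if (!Best || C.Prob > Best->Prob)
      Best = &C;
  }
  return Best;
}
```

The difference from the patch is the direction of the test: nothing is returned early, and large optional successors are simply never chosen.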

lib/CodeGen/MachineBlockPlacement.cpp
399–414	I feel like this code would be much easier to read as a lambda predicate.

This revision is now accepted and ready to land.Mar 20 2015, 1:40 AM

Turned into lambda and committed as r232802.
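The committed code is not shown on this page. As an illustration only, a lambda-predicate form of the check could look like the sketch below, which decides whether the unavoidable successor may be returned instead of returning the predecessor. `Block`, `InChain`, and `shouldPlaceUnavoidableSuccessor` are hypothetical names on a simplified model, not the contents of r232802.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct Block {
  std::size_t NumInstrs = 0;         // instruction count of this block
  std::vector<const Block *> Preds;  // CFG predecessors
  bool InChain = false;              // already laid out in the current chain?
};

// Returns true if Succ (assumed unavoidable) can be laid out next, i.e. no
// short optional branch is still waiting to be placed in front of it.
bool shouldPlaceUnavoidableSuccessor(const Block &Succ, std::size_t Threshold) {
  // Predicate: is Pred an unplaced block entered from the chain whose
  // instruction count is below Threshold?
  auto IsShortOptionalBranch = [&](const Block *Pred) {
    assert(Pred != nullptr);
    if (Pred == &Succ || Pred->InChain)
      return false;
    bool ReachedFromChain =
        std::any_of(Pred->Preds.begin(), Pred->Preds.end(),
                    [](const Block *PP) { return PP->InChain; });
    return ReachedFromChain && Pred->NumInstrs < Threshold;
  };
  return std::none_of(Succ.Preds.begin(), Succ.Preds.end(),
                      IsShortOptionalBranch);
}
```

Naming the predicate keeps the filtering conditions in one place and leaves the caller with a single readable condition.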

Revision Contents

Path                                                            Size
lib/CodeGen/MachineBlockPlacement.cpp                           19 lines
test/CodeGen/X86/code_placement_outline_optional_branches.ll    29 lines

Diff 21359

lib/CodeGen/MachineBlockPlacement.cpp

@@ 68 lines hidden @@ static cl::opt<unsigned> ExitBlockBias(
     cl::init(0), cl::Hidden);

 static cl::opt<bool> OutlineOptionalBranches(
     "outline-optional-branches",
     cl::desc("Put completely optional branches, i.e. branches with a common "
              "post dominator, out of line."),
     cl::init(false), cl::Hidden);

+static cl::opt<unsigned> OutlineOptionalThreshold(
+    "outline-optional-threshold",
+    cl::desc("Don't outline optional branches that are a single block with an "
+             "instruction count below this threshold"),
+    cl::init(4), cl::Hidden);
+
 namespace {
 class BlockChain;
 /// \brief Type for our function-wide basic block -> block chain mapping.
 typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;
 }

 namespace {
 /// \brief A chain of blocks which will be laid out contiguously.
@@ 287 lines hidden @@ for (MachineBasicBlock *Succ : BB->successors()) {

     uint32_t SuccWeight = MBPI->getEdgeWeight(BB, Succ);
     BranchProbability SuccProb(SuccWeight / WeightScale, SumWeight);

     // If we outline optional branches, look whether Succ is unavoidable, i.e.
     // dominates all terminators of the MachineFunction. If it does, other
     // successors must be optional. Don't do this for cold branches.
     if (OutlineOptionalBranches && SuccProb > HotProb.getCompl() &&
-        UnavoidableBlocks.count(Succ) > 0)
+        UnavoidableBlocks.count(Succ) > 0) {
+      for (MachineBasicBlock *Pred : Succ->predecessors()) {
+        if (Pred == Succ || (BlockFilter && !BlockFilter->count(Pred)) ||
+            BlockToChain[Pred] == &Chain)
+          continue;
+        for (MachineBasicBlock *PredPred : Pred->predecessors()) {
+          if (BlockToChain[PredPred] == &Chain &&
+              Pred->size() < OutlineOptionalThreshold)
+            return Pred;
+        }
+      }
       return Succ;
+    }

     // Only consider successors which are either "hot", or wouldn't violate
     // any CFG constraints.
     if (SuccChain.LoopPredecessors != 0) {
       if (SuccProb < HotProb) {
         DEBUG(dbgs() << " " << getBlockName(Succ) << " -> " << SuccProb
                      << " (prob) (CFG conflict)\n");
         continue;
       }

       // Make sure that a hot successor doesn't have a globally more
       // important predecessor.
       BlockFrequency CandidateEdgeFreq =
           MBFI->getBlockFreq(BB) * SuccProb * HotProb.getCompl();
       bool BadCFGConflict = false;
       for (MachineBasicBlock *Pred : Succ->predecessors()) {
         if (Pred == Succ || (BlockFilter && !BlockFilter->count(Pred)) ||
             BlockToChain[Pred] == &Chain)
           continue;
         BlockFrequency PredEdgeFreq =
             MBFI->getBlockFreq(Pred) * MBPI->getEdgeProbability(Pred, Succ);
         if (PredEdgeFreq >= CandidateEdgeFreq) {
           BadCFGConflict = true;
           break;
@@ 798 lines hidden @@

test/CodeGen/X86/code_placement_outline_optional_branches.ll

 ; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s -check-prefix=CHECK
 ; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -outline-optional-branches < %s | FileCheck %s -check-prefix=CHECK-OUTLINE

-define void @foo(i32 %t1, i32 %t2) {
+define void @foo(i32 %t1, i32 %t2, i32 %t3) {
 ; Test that we lift the call to 'c' up to immediately follow the call to 'b'
 ; when we disable the cfg conflict check.
 ;
 ; CHECK-LABEL: foo:
 ; CHECK: callq a
+; CHECK: callq a
+; CHECK: callq a
+; CHECK: callq a
 ; CHECK: callq b
 ; CHECK: callq c
 ; CHECK: callq d
+; CHECK: callq e
+; CHECK: callq f
 ;
 ; CHECK-OUTLINE-LABEL: foo:
 ; CHECK-OUTLINE: callq b
 ; CHECK-OUTLINE: callq c
 ; CHECK-OUTLINE: callq d
+; CHECK-OUTLINE: callq e
+; CHECK-OUTLINE: callq f
+; CHECK-OUTLINE: callq a
+; CHECK-OUTLINE: callq a
+; CHECK-OUTLINE: callq a
 ; CHECK-OUTLINE: callq a

 entry:
   %cmp = icmp eq i32 %t1, 0
   br i1 %cmp, label %if.then, label %if.end

 if.then:
   call void @a()
+  call void @a()
+  call void @a()
+  call void @a()
   br label %if.end

 if.end:
   call void @b()
   br label %hotbranch

 hotbranch:
   %cmp2 = icmp eq i32 %t2, 0
   br i1 %cmp2, label %if.then2, label %if.end2, !prof !1

 if.then2:
   call void @c()
   br label %if.end2

 if.end2:
   call void @d()
+  br label %shortbranch
+
+shortbranch:
+  %cmp3 = icmp eq i32 %t3, 0
+  br i1 %cmp3, label %if.then3, label %if.end3
+
+if.then3:
+  call void @e()
+  br label %if.end3
+
+if.end3:
+  call void @f()
   ret void
 }

 declare void @a()
 declare void @b()
 declare void @c()
 declare void @d()
+declare void @e()
+declare void @f()

 !1 = !{!"branch_weights", i32 64, i32 4}