This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
ClosedPublic

Authored by jmolloy on Aug 26 2016, 12:46 AM.


Details

Reviewers

spatel
hans

Summary

This was a real restriction in the original version of SinkThenElseCodeToEnd. Now that it has been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

if (a)
  x(1);
else if (b)
  x(2);

This produces the following CFG:

   [if]
  /    \
[x(1)] [if]
  |     | \
  |     |  \
  |  [x(2)] |
   \    |  /
    [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to [end] because no call to x() happens on the third incoming arc. (Assume for the sake of argument that x() has side effects. If something is safe to speculate we could indeed sink it nevertheless, but this cannot happen in the general case and causes many extra selects.)

We are now able to detect this case and split off the unconditional arcs to a common successor:

    [if]
   /    \
 [x(1)] [if]
   |     | \
   |     |  \
   |  [x(2)] |
    \   /    |
[sink.split] |
      \     /
      [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
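
The effect of the split can be sketched at source level. This is only an illustration under stated assumptions: `x` is a stand-in that records its argument, and `before`, `after` and `run` are hypothetical names, not code from the patch. `after` mirrors the transformed CFG: the two unconditional arcs merge their differing argument (the PHI) and make a single call in the position of sink.split, while the empty 'else' arc bypasses it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a call with side effects: it records its argument.
std::vector<int> Calls;
void x(int V) { Calls.push_back(V); }

// Original shape: three arcs reach [end], but only two of them call x().
void before(bool A, bool B) {
  if (A)
    x(1);
  else if (B)
    x(2);
}

// Shape after the split: the call exists once, on the merged path only.
void after(bool A, bool B) {
  int Arg = 0;
  if (A)
    Arg = 1;
  else if (B)
    Arg = 2;
  else
    return; // conditional arc: straight to [end], no call
  x(Arg);   // [sink.split]: one sunk call; Arg plays the role of the PHI
}

// Runs F on all four (A, B) combinations and returns the recorded arguments.
std::vector<int> run(void (*F)(bool, bool)) {
  assert(F != nullptr && "need a function to exercise");
  Calls.clear();
  for (int A = 0; A < 2; ++A)
    for (int B = 0; B < 2; ++B)
      F(A != 0, B != 0);
  return Calls;
}
```

Comparing `run(before)` with `run(after)` gives the same call sequence for every input, which is the property the transformation has to preserve.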

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 69325. Aug 26 2016, 12:46 AM

jmolloy retitled this revision to [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches.

jmolloy updated this object.

jmolloy added reviewers: spatel, hans.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

hans added inline comments. Aug 26 2016, 4:28 PM

lib/Transforms/Utils/SimplifyCFG.cpp
1463	(Ultra nit: I think 'f' is a better name for a random function than 'x' which is more commonly used for variables.)
1483	Would this scale to larger if-else chains where some subset of them have similar instructions? I assume not because the order of the if-statements makes this hard even if it doesn't really matter for semantics.
1496	Maybe call these Predecessors and UnconditionalPredecessors?

jmolloy added inline comments. Aug 28 2016, 10:11 AM

lib/Transforms/Utils/SimplifyCFG.cpp
1483	In general this is a very hard problem, as you say. I've been thinking that it should be possible to use a suffix tree or Levenshtein distance to find the longest common substring in all incoming blocks. It should be possible to support the case where some blocks have a common subsequence and others don't... Bit of a longer-term blue-sky thing though.
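
For contrast with the subset matching discussed above, the lockstep scan in this patch only looks for a common suffix shared by every candidate block. A minimal standalone sketch of that simpler problem follows. It assumes instructions can be modelled as strings that must match exactly, whereas the real scan also accepts instructions whose operands differ and can be PHI-merged. `Block` and `commonSuffixLength` are hypothetical names, not part of SimplifyCFG.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A block is modelled as its non-terminator instructions, in program order.
using Block = std::vector<std::string>;

// Walks all blocks backwards in lockstep and counts how many trailing
// instructions are identical in every block. The scan stops at the first
// mismatch or as soon as any block runs out of instructions.
std::size_t commonSuffixLength(const std::vector<Block> &Blocks) {
  if (Blocks.empty())
    return 0;
  const Block &First = Blocks.front();
  std::size_t N = 0;
  while (N < First.size()) {
    // The (N+1)'th instruction from the end of the first block.
    const std::string &Candidate = First[First.size() - 1 - N];
    for (const Block &B : Blocks)
      if (N >= B.size() || B[B.size() - 1 - N] != Candidate)
        return N;
    ++N;
  }
  // Reaching here means the whole first block is common to every block.
  assert(N == First.size());
  return N;
}
```

A single block that shares nothing with the others drives the result to zero, which is why handling "some blocks match, others don't" needs a different approach.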

Hi Hans,

Thanks for the comments! New version attached.

James

hans added inline comments. Aug 29 2016, 2:43 PM

lib/Transforms/Utils/SimplifyCFG.cpp

1497

Actually, is Preds even used after it's been populated?

If not, maybe this is all we need:

SmallVector<BasicBlock *, 4> UnconditionalPreds;
Instruction *ConditionalPred = nullptr;

and the logic in the for loop could be simplified a little:

if (T is unconditional)
  push to UnconditionalPreds
else if (T is conditional && !ConditionalPred)
  set ConditionalPred
else
  bail out
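
The suggested loop can be modelled outside LLVM as a rough sketch. The `Term` enum, the `Classified` struct and `classify` are hypothetical stand-ins for illustration. Treating a switch as a conditional terminator and requiring at least two unconditional predecessors are taken from the updated diff, not from the pseudocode itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the kind of terminator ending a predecessor block.
enum class Term { UncondBranch, CondBranch, Switch, Other };

struct Classified {
  std::vector<std::size_t> UnconditionalPreds; // indices into the input
  int ConditionalPred = -1;                    // index, or -1 if none seen
};

// Returns false for the "bail out" case; otherwise fills Out.
bool classify(const std::vector<Term> &Preds, Classified &Out) {
  Out = Classified();
  for (std::size_t I = 0; I != Preds.size(); ++I) {
    Term T = Preds[I];
    if (T == Term::UncondBranch)
      Out.UnconditionalPreds.push_back(I);
    else if ((T == Term::CondBranch || T == Term::Switch) &&
             Out.ConditionalPred < 0)
      Out.ConditionalPred = static_cast<int>(I);
    else
      return false; // second conditional arc, or unsupported terminator
  }
  // Every predecessor landed in exactly one of the two buckets.
  assert(Out.UnconditionalPreds.size() +
             (Out.ConditionalPred >= 0 ? 1u : 0u) ==
         Preds.size());
  // Sinking needs at least two unconditional predecessors to merge.
  return Out.UnconditionalPreds.size() >= 2;
}
```

For the else-if example, the predecessors of [end] would be two unconditional branches and one conditional branch, so `classify` succeeds and records the conditional one separately.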

Thanks Hans. I should have addressed all of your comments now.

lgtm

This revision is now accepted and ready to land. Aug 30 2016, 9:41 AM

Committed in rL280364. Please specify "Differential revision: <URL>" in commit message.

Revision Contents

Path                                               Size
lib/Transforms/Utils/SimplifyCFG.cpp               90 lines
test/CodeGen/AArch64/arm64-jumptable.ll            13 lines
test/CodeGen/AArch64/branch-folder-merge-mmos.ll   2 lines
test/CodeGen/AArch64/ifcvt-select.ll               2 lines
test/CodeGen/AArch64/rm_redundant_cmp.ll           24 lines
test/MC/ARM/data-in-code.ll                        4 lines
test/Transforms/SimplifyCFG/sink-common-code.ll    110 lines

Diff 69708

lib/Transforms/Utils/SimplifyCFG.cpp

	// *I-- = [B1[n-1], B2[n-1], B3[n-1]];			// *I-- = [B1[n-1], B2[n-1], B3[n-1]];
	// *I-- = [B1[n-2], B2[n-2], B3[n-2]];			// *I-- = [B1[n-2], B2[n-2], B3[n-2]];
	// ...			// ...
	class LockstepReverseIterator {			class LockstepReverseIterator {
	ArrayRef<BasicBlock*> Blocks;			ArrayRef<BasicBlock*> Blocks;
	SmallVector<Instruction*,4> Insts;			SmallVector<Instruction*,4> Insts;
	bool Fail;			bool Fail;
	public:			public:
	LockstepReverseIterator(ArrayRef<BasicBlock*> Blocks) :			LockstepReverseIterator(ArrayRef<BasicBlock*> Blocks) :
	Blocks(Blocks) {			Blocks(Blocks) {
	reset();			reset();
	}			}

	void reset() {			void reset() {
	Fail = false;			Fail = false;
	Insts.clear();			Insts.clear();
	for (auto *BB : Blocks) {			for (auto *BB : Blocks) {
	if (BB->size() <= 1) {			if (BB->size() <= 1) {
	// Block wasn't big enough			// Block wasn't big enough
	Fail = true;			Fail = true;
	return;			return;
	}			}
	Insts.push_back(BB->getTerminator()->getPrevNode());			Insts.push_back(BB->getTerminator()->getPrevNode());
	}			}
	}			}

	bool isValid() const {			bool isValid() const {
	return !Fail;			return !Fail;
	}			}

	void operator -- () {			void operator -- () {
	if (Fail)			if (Fail)
	return;			return;
	for (auto *&Inst : Insts) {			for (auto *&Inst : Insts) {
	if (Inst == &Inst->getParent()->front()) {			if (Inst == &Inst->getParent()->front()) {
	Fail = true;			Fail = true;
	return;			return;
	}			}
	Inst = Inst->getPrevNode();			Inst = Inst->getPrevNode();
	}			}
	}			}

	ArrayRef<Instruction*> operator * () const {			ArrayRef<Instruction*> operator * () const {
	return Insts;			return Insts;
	}			}
	};			};
	}			}

	/// Given an unconditional branch that goes to BBEnd,			/// Given an unconditional branch that goes to BBEnd,
	/// check whether BBEnd has only two predecessors and the other predecessor			/// check whether BBEnd has only two predecessors and the other predecessor
	/// ends with an unconditional branch. If it is true, sink any common code			/// ends with an unconditional branch. If it is true, sink any common code
	/// in the two predecessors to BBEnd.			/// in the two predecessors to BBEnd.
	static bool SinkThenElseCodeToEnd(BranchInst *BI1) {			static bool SinkThenElseCodeToEnd(BranchInst *BI1) {
	assert(BI1->isUnconditional());			assert(BI1->isUnconditional());
	BasicBlock *BBEnd = BI1->getSuccessor(0);			BasicBlock *BBEnd = BI1->getSuccessor(0);

	// We currently only support branch targets with two predecessors.			// We support two situations:
	// FIXME: this is an arbitrary restriction and should be lifted.			// (1) all incoming arcs are unconditional
	SmallVector<BasicBlock*,4> Blocks;			// (2) one incoming arc is conditional
	for (auto *BB : predecessors(BBEnd))			//
	Blocks.push_back(BB);			// (2) is very common in switch defaults and
	if (Blocks.size() != 2 ||			// else-if patterns;
	!all_of(Blocks, [](const BasicBlock *BB) {			//
	auto *BI = dyn_cast<BranchInst>(BB->getTerminator());			// if (a) f(1);
	return BI && BI->isUnconditional();			// else if (b) f(2);
	}))			//
				// produces:
				//
				//       [if]
				//      /    \
				//    [f(1)] [if]
				//      |     | \
				//      |     |  \
				//      |  [f(2)]|
				//       \    | /
				//        [ end ]
				//
				// [end] has two unconditional predecessor arcs and one conditional. The
				// conditional refers to the implicit empty 'else' arc. This conditional
				// arc can also be caused by an empty default block in a switch.
				//
				// In this case, we attempt to sink code from all unconditional arcs.
				// If we can sink instructions from these arcs (determined during the scan
				// phase below) we insert a common successor for all unconditional arcs and
				// connect that to [end], to enable sinking:
				//
				//       [if]
				//      /    \
				//    [x(1)] [if]
				//      |     | \
				//      |     |  \
				//      |  [x(2)] |
				//       \   /    |
				//   [sink.split] |
				//         \     /
				//         [ end ]
				//
				SmallVector<BasicBlock*,4> UnconditionalPreds;
				Instruction *Cond = nullptr;
				for (auto *B : predecessors(BBEnd)) {
				auto *T = B->getTerminator();
				if (isa<BranchInst>(T) && cast<BranchInst>(T)->isUnconditional())
				UnconditionalPreds.push_back(B);
				else if ((isa<BranchInst>(T) || isa<SwitchInst>(T)) && !Cond)
				Cond = T;
				else
				return false;
				}
				if (UnconditionalPreds.size() < 2)
	return false;			return false;

	bool Changed = false;			bool Changed = false;

	// We take a two-step approach to tail sinking. First we scan from the end of			// We take a two-step approach to tail sinking. First we scan from the end of
	// each block upwards in lockstep. If the n'th instruction from the end of each			// each block upwards in lockstep. If the n'th instruction from the end of each
	// block can be sunk, those instructions are added to ValuesToSink and we			// block can be sunk, those instructions are added to ValuesToSink and we
	// carry on. If we can sink an instruction but need to PHI-merge some operands			// carry on. If we can sink an instruction but need to PHI-merge some operands
	// (because they're not identical in each instruction) we add these to			// (because they're not identical in each instruction) we add these to
	// PHIOperands.			// PHIOperands.
	unsigned ScanIdx = 0;			unsigned ScanIdx = 0;
	SmallPtrSet<Value*,4> InstructionsToSink;			SmallPtrSet<Value*,4> InstructionsToSink;
	DenseMap<Instruction*, SmallVector<Value*,4>> PHIOperands;			DenseMap<Instruction*, SmallVector<Value*,4>> PHIOperands;
	LockstepReverseIterator LRI(Blocks);			LockstepReverseIterator LRI(UnconditionalPreds);
	while (LRI.isValid() &&			while (LRI.isValid() &&
	canSinkInstructions(*LRI, PHIOperands)) {			canSinkInstructions(*LRI, PHIOperands)) {
	DEBUG(dbgs() << "SINK: instruction can be sunk: " << (*LRI)[0] << "\n");			DEBUG(dbgs() << "SINK: instruction can be sunk: " << (*LRI)[0] << "\n");
	InstructionsToSink.insert((*LRI).begin(), (*LRI).end());			InstructionsToSink.insert((*LRI).begin(), (*LRI).end());
	++ScanIdx;			++ScanIdx;
	--LRI;			--LRI;
	}			}

				if (ScanIdx > 0 && Cond) {
				DEBUG(dbgs() << "SINK: Splitting edge\n");
				// We have a conditional edge and we're going to sink some instructions.
				// Insert a new block postdominating all blocks we're going to sink from.
				if (!SplitBlockPredecessors(BI1->getSuccessor(0), UnconditionalPreds,
				".sink.split"))
				// Edges couldn't be split.
				return false;
				Changed = true;
				}

	// Now that we've analyzed all potential sinking candidates, perform the			// Now that we've analyzed all potential sinking candidates, perform the
	// actual sink. We iteratively sink the last non-terminator of the source			// actual sink. We iteratively sink the last non-terminator of the source
	// blocks into their common successor unless doing so would require too			// blocks into their common successor unless doing so would require too
	// many PHI instructions to be generated (currently only one PHI is allowed			// many PHI instructions to be generated (currently only one PHI is allowed
	// per sunk instruction).			// per sunk instruction).
	//			//
	// We can use InstructionsToSink to discount values needing PHI-merging that will			// We can use InstructionsToSink to discount values needing PHI-merging that will
	// actually be sunk in a later iteration. This allows us to be more			// actually be sunk in a later iteration. This allows us to be more
	// aggressive in what we sink. This does allow a false positive where we			// aggressive in what we sink. This does allow a false positive where we
	// sink presuming a later value will also be sunk, but stop half way through			// sink presuming a later value will also be sunk, but stop half way through
	// and never actually sink it which means we produce more PHIs than intended.			// and never actually sink it which means we produce more PHIs than intended.
	// This is unlikely in practice though.			// This is unlikely in practice though.
	for (unsigned SinkIdx = 0; SinkIdx != ScanIdx; ++SinkIdx) {			for (unsigned SinkIdx = 0; SinkIdx != ScanIdx; ++SinkIdx) {
	DEBUG(dbgs() << "SINK: Sink: "			DEBUG(dbgs() << "SINK: Sink: "
	<< *Blocks[0]->getTerminator()->getPrevNode()			<< *UnconditionalPreds[0]->getTerminator()->getPrevNode()
	<< "\n");			<< "\n");
	// Because we've sunk every instruction in turn, the current instruction to			// Because we've sunk every instruction in turn, the current instruction to
	// sink is always at index 0.			// sink is always at index 0.
	LRI.reset();			LRI.reset();
	unsigned NumPHIdValues = 0;			unsigned NumPHIdValues = 0;
	for (auto *I : *LRI)			for (auto *I : *LRI)
	for (auto *V : PHIOperands[I])			for (auto *V : PHIOperands[I])
	if (InstructionsToSink.count(V) == 0)			if (InstructionsToSink.count(V) == 0)
	++NumPHIdValues;			++NumPHIdValues;
	DEBUG(dbgs() << "SINK: #phid values: " << NumPHIdValues << "\n");			DEBUG(dbgs() << "SINK: #phid values: " << NumPHIdValues << "\n");
	assert((NumPHIdValues % Blocks.size() == 0) &&			assert((NumPHIdValues % UnconditionalPreds.size() == 0) &&
	"Every operand must either be PHId or not PHId!");			"Every operand must either be PHId or not PHId!");

	if (NumPHIdValues / Blocks.size() > 1)			if (NumPHIdValues / UnconditionalPreds.size() > 1) {
	// Too many PHIs would be created.			// Too many PHIs would be created.
				DEBUG(dbgs() << "SINK: stopping here, too many PHIs would be created!\n");
	break;			break;
				}

	sinkLastInstruction(Blocks);			sinkLastInstruction(UnconditionalPreds);
	NumSinkCommons++;			NumSinkCommons++;
	Changed = true;			Changed = true;
	}			}

	return Changed;			return Changed;
	}			}

	/// \brief Determine if we can hoist sink a sole store instruction out of a			/// \brief Determine if we can hoist sink a sole store instruction out of a
	/// conditional block.			/// conditional block.
	///			///
	/// We are looking for code like the following:			/// We are looking for code like the following:
	/// BrBB:			/// BrBB:

test/CodeGen/AArch64/arm64-jumptable.ll

	; RUN: llc -mtriple=arm64-apple-ios < %s | FileCheck %s			; RUN: llc -mtriple=arm64-apple-ios < %s | FileCheck %s
	; RUN: llc -mtriple=arm64-linux-gnu < %s | FileCheck %s --check-prefix=CHECK-LINUX			; RUN: llc -mtriple=arm64-linux-gnu < %s | FileCheck %s --check-prefix=CHECK-LINUX
	; <rdar://11417675>			; <rdar://11417675>

	define void @sum(i32* %to) {			define void @sum(i32 %a, i32* %to, i32 %c) {
	entry:			entry:
	switch i32 undef, label %exit [			switch i32 %a, label %exit [
	i32 1, label %bb1			i32 1, label %bb1
	i32 2, label %bb2			i32 2, label %bb2
	i32 3, label %bb3			i32 3, label %bb3
	i32 4, label %bb4			i32 4, label %bb4
	]			]
	bb1:			bb1:
	store i32 undef, i32* %to			%b = add i32 %c, 1
				store i32 %b, i32* %to
	br label %exit			br label %exit
	bb2:			bb2:
	store i32 undef, i32* %to			store i32 2, i32* %to
	br label %exit			br label %exit
	bb3:			bb3:
	store i32 undef, i32* %to			store i32 3, i32* %to
	br label %exit			br label %exit
	bb4:			bb4:
	store i32 undef, i32* %to			store i32 4, i32* %to
	br label %exit			br label %exit
	exit:			exit:
	ret void			ret void
	}			}

	; CHECK-LABEL: sum:			; CHECK-LABEL: sum:
	; CHECK: adrp {{x[0-9]+}}, LJTI0_0@PAGE			; CHECK: adrp {{x[0-9]+}}, LJTI0_0@PAGE
	; CHECK: add {{x[0-9]+}}, {{x[0-9]+}}, LJTI0_0@PAGEOFF			; CHECK: add {{x[0-9]+}}, {{x[0-9]+}}, LJTI0_0@PAGEOFF

	; CHECK-LINUX-LABEL: sum:			; CHECK-LINUX-LABEL: sum:
	; CHECK-LINUX: adrp {{x[0-9]+}}, .LJTI0_0			; CHECK-LINUX: adrp {{x[0-9]+}}, .LJTI0_0
	; CHECK-LINUX: add {{x[0-9]+}}, {{x[0-9]+}}, :lo12:.LJTI0_0			; CHECK-LINUX: add {{x[0-9]+}}, {{x[0-9]+}}, :lo12:.LJTI0_0

test/CodeGen/AArch64/branch-folder-merge-mmos.ll

	; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -stop-after branch-folder | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -stop-after branch-folder | FileCheck %s
	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

	; Function Attrs: norecurse nounwind			; Function Attrs: norecurse nounwind
	define void @foo(i32 %a, i32 %b, float* nocapture %foo_arr) #0 {			define void @foo(i32 %a, i32 %b, float* nocapture %foo_arr) #0 {
	; CHECK: (load 4 from %ir.arrayidx1.{{i[1-2]}}), (load 4 from %ir.arrayidx1.{{i[1-2]}})			; CHECK: (load 4 from %ir.arrayidx1.{{i[1-2]}})
	entry:			entry:
	%cmp = icmp sgt i32 %a, 0			%cmp = icmp sgt i32 %a, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%0 = load float, float* %foo_arr, align 4			%0 = load float, float* %foo_arr, align 4
	%arrayidx1.i1 = getelementptr inbounds float, float* %foo_arr, i64 1			%arrayidx1.i1 = getelementptr inbounds float, float* %foo_arr, i64 1
	%1 = load float, float* %arrayidx1.i1, align 4			%1 = load float, float* %arrayidx1.i1, align 4
	Show All 19 Lines

test/CodeGen/AArch64/ifcvt-select.ll

	; RUN: llc -mtriple=arm64-apple-ios -mcpu=cyclone < %s | FileCheck %s			; RUN: llc -mtriple=arm64-apple-ios -mcpu=cyclone < %s | FileCheck %s
	; Do not generate redundant select in early if-converstion pass.			; Do not generate redundant select in early if-converstion pass.

	define i32 @foo(i32 %a, i32 %b) {			define i32 @foo(i32 %a, i32 %b) {
	entry:			entry:
	;CHECK-LABEL: foo:			;CHECK-LABEL: foo:
	;CHECK: csinc			;CHECK: cneg
	;CHECK-NOT: csel			;CHECK-NOT: csel
	%sub = sub nsw i32 %b, %a			%sub = sub nsw i32 %b, %a
	%cmp10 = icmp sgt i32 %a, 0			%cmp10 = icmp sgt i32 %a, 0
	br i1 %cmp10, label %while.body.lr.ph, label %while.end			br i1 %cmp10, label %while.body.lr.ph, label %while.end

	while.body.lr.ph:			while.body.lr.ph:
	br label %while.body			br label %while.body

	Show All 26 Lines

test/CodeGen/AArch64/rm_redundant_cmp.ll

	; RUN: llc < %s -mtriple=aarch64-linux-gnuabi -O2 | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-linux-gnuabi -O2 | FileCheck %s

	; The following cases are for i16			; The following cases are for i16

	%struct.s_signed_i16 = type { i16, i16, i16 }			%struct.s_signed_i16 = type { i16, i16, i16 }
	%struct.s_unsigned_i16 = type { i16, i16, i16 }			%struct.s_unsigned_i16 = type { i16, i16, i16 }

	@cost_s_i8_i16 = common global %struct.s_signed_i16 zeroinitializer, align 2			@cost_s_i8_i16 = common global %struct.s_signed_i16 zeroinitializer, align 2
	@cost_u_i16 = common global %struct.s_unsigned_i16 zeroinitializer, align 2			@cost_u_i16 = common global %struct.s_unsigned_i16 zeroinitializer, align 2

	define void @test_i16_2cmp_signed_1() {			define void @test_i16_2cmp_signed_1() {
	; CHECK-LABEL: test_i16_2cmp_signed_1			; CHECK-LABEL: test_i16_2cmp_signed_1
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.gt			; CHECK-NEXT: b.lt
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.ne			; CHECK: ret
	entry:			entry:
	%0 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 1), align 2			%0 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 1), align 2
	%1 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 2), align 2			%1 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 2), align 2
	%cmp = icmp sgt i16 %0, %1			%cmp = icmp sgt i16 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i16 %0, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 0), align 2			store i16 %0, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 0), align 2
	Show All 9 Lines

	if.end8: ; preds = %if.else, %if.then7, %if.then			if.end8: ; preds = %if.else, %if.then7, %if.then
	ret void			ret void
	}			}

	define void @test_i16_2cmp_signed_2() {			define void @test_i16_2cmp_signed_2() {
	; CHECK-LABEL: test_i16_2cmp_signed_2			; CHECK-LABEL: test_i16_2cmp_signed_2
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.le			; CHECK-NEXT: b.gt
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.ge			; CHECK: b.ge
	entry:			entry:
	%0 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 1), align 2			%0 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 1), align 2
	%1 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 2), align 2			%1 = load i16, i16* getelementptr inbounds (%struct.s_signed_i16, %struct.s_signed_i16* @cost_s_i8_i16, i64 0, i32 2), align 2
	%cmp = icmp sgt i16 %0, %1			%cmp = icmp sgt i16 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	Show All 11 Lines

	if.end8: ; preds = %if.else, %if.then7, %if.then			if.end8: ; preds = %if.else, %if.then7, %if.then
	ret void			ret void
	}			}

	define void @test_i16_2cmp_unsigned_1() {			define void @test_i16_2cmp_unsigned_1() {
	; CHECK-LABEL: test_i16_2cmp_unsigned_1			; CHECK-LABEL: test_i16_2cmp_unsigned_1
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.hi			; CHECK-NEXT: b.lo
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.ne			; CHECK: ret
	entry:			entry:
	%0 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 1), align 2			%0 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 1), align 2
	%1 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 2), align 2			%1 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 2), align 2
	%cmp = icmp ugt i16 %0, %1			%cmp = icmp ugt i16 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i16 %0, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 0), align 2			store i16 %0, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 0), align 2
	Show All 9 Lines

	if.end8: ; preds = %if.else, %if.then7, %if.then			if.end8: ; preds = %if.else, %if.then7, %if.then
	ret void			ret void
	}			}

	define void @test_i16_2cmp_unsigned_2() {			define void @test_i16_2cmp_unsigned_2() {
	; CHECK-LABEL: test_i16_2cmp_unsigned_2			; CHECK-LABEL: test_i16_2cmp_unsigned_2
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.ls			; CHECK-NEXT: b.hi
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.hs			; CHECK: b.hs
	entry:			entry:
	%0 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 1), align 2			%0 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 1), align 2
	%1 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 2), align 2			%1 = load i16, i16* getelementptr inbounds (%struct.s_unsigned_i16, %struct.s_unsigned_i16* @cost_u_i16, i64 0, i32 2), align 2
	%cmp = icmp ugt i16 %0, %1			%cmp = icmp ugt i16 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	Show All 20 Lines

	@cost_s = common global %struct.s_signed_i8 zeroinitializer, align 2			@cost_s = common global %struct.s_signed_i8 zeroinitializer, align 2
	@cost_u_i8 = common global %struct.s_unsigned_i8 zeroinitializer, align 2			@cost_u_i8 = common global %struct.s_unsigned_i8 zeroinitializer, align 2


	define void @test_i8_2cmp_signed_1() {			define void @test_i8_2cmp_signed_1() {
	; CHECK-LABEL: test_i8_2cmp_signed_1			; CHECK-LABEL: test_i8_2cmp_signed_1
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.gt			; CHECK-NEXT: b.lt
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.ne			; CHECK: ret
	entry:			entry:
	%0 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 1), align 2			%0 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 1), align 2
	%1 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 2), align 2			%1 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 2), align 2
	%cmp = icmp sgt i8 %0, %1			%cmp = icmp sgt i8 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i8 %0, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 0), align 2			store i8 %0, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 0), align 2
	Show All 9 Lines

	if.end8: ; preds = %if.else, %if.then7, %if.then			if.end8: ; preds = %if.else, %if.then7, %if.then
	ret void			ret void
	}			}

	define void @test_i8_2cmp_signed_2() {			define void @test_i8_2cmp_signed_2() {
	; CHECK-LABEL: test_i8_2cmp_signed_2			; CHECK-LABEL: test_i8_2cmp_signed_2
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.le			; CHECK-NEXT: b.gt
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.ge			; CHECK: b.ge
	entry:			entry:
	%0 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 1), align 2			%0 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 1), align 2
	%1 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 2), align 2			%1 = load i8, i8* getelementptr inbounds (%struct.s_signed_i8, %struct.s_signed_i8* @cost_s, i64 0, i32 2), align 2
	%cmp = icmp sgt i8 %0, %1			%cmp = icmp sgt i8 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	Show All 11 Lines

	if.end8: ; preds = %if.else, %if.then7, %if.then			if.end8: ; preds = %if.else, %if.then7, %if.then
	ret void			ret void
	}			}

	define void @test_i8_2cmp_unsigned_1() {			define void @test_i8_2cmp_unsigned_1() {
	; CHECK-LABEL: test_i8_2cmp_unsigned_1			; CHECK-LABEL: test_i8_2cmp_unsigned_1
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.hi			; CHECK-NEXT: b.lo
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.ne			; CHECK: ret
	entry:			entry:
	%0 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 1), align 2			%0 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 1), align 2
	%1 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 2), align 2			%1 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 2), align 2
	%cmp = icmp ugt i8 %0, %1			%cmp = icmp ugt i8 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i8 %0, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 0), align 2			store i8 %0, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 0), align 2
	Show All 9 Lines

	if.end8: ; preds = %if.else, %if.then7, %if.then			if.end8: ; preds = %if.else, %if.then7, %if.then
	ret void			ret void
	}			}

	define void @test_i8_2cmp_unsigned_2() {			define void @test_i8_2cmp_unsigned_2() {
	; CHECK-LABEL: test_i8_2cmp_unsigned_2			; CHECK-LABEL: test_i8_2cmp_unsigned_2
	; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}			; CHECK: cmp {{w[0-9]+}}, {{w[0-9]+}}
	; CHECK-NEXT: b.ls			; CHECK-NEXT: b.hi
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.hs			; CHECK: b.hs
	entry:			entry:
	%0 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 1), align 2			%0 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 1), align 2
	%1 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 2), align 2			%1 = load i8, i8* getelementptr inbounds (%struct.s_unsigned_i8, %struct.s_unsigned_i8* @cost_u_i8, i64 0, i32 2), align 2
	%cmp = icmp ugt i8 %0, %1			%cmp = icmp ugt i8 %0, %1
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	Show All 27 Lines

test/MC/ARM/data-in-code.ll

	;; RUN: llc -verify-machineinstrs \			;; RUN: llc -verify-machineinstrs \
	;; RUN: -mtriple=armv7-linux-gnueabi -filetype=obj %s -o - | \			;; RUN: -mtriple=armv7-linux-gnueabi -filetype=obj %s -o - | \
	;; RUN: llvm-readobj -t | FileCheck -check-prefix=ARM %s			;; RUN: llvm-readobj -t | FileCheck -check-prefix=ARM %s

	;; RUN: llc -verify-machineinstrs \			;; RUN: llc -verify-machineinstrs \
	;; RUN: -mtriple=thumbv7-linux-gnueabi -filetype=obj %s -o - | \			;; RUN: -mtriple=thumbv7-linux-gnueabi -filetype=obj %s -o - | \
	;; RUN: llvm-readobj -t | FileCheck -check-prefix=TMB %s			;; RUN: llvm-readobj -t | FileCheck -check-prefix=TMB %s

	;; Ensure that if a jump table is generated that it has Mapping Symbols			;; Ensure that if a jump table is generated that it has Mapping Symbols
	;; marking the data-in-code region.			;; marking the data-in-code region.

	define void @foo(i32* %ptr) nounwind ssp {			define void @foo(i32* %ptr, i32 %b) nounwind ssp {
	%tmp = load i32, i32* %ptr, align 4			%tmp = load i32, i32* %ptr, align 4
	switch i32 %tmp, label %exit [			switch i32 %tmp, label %exit [
	i32 0, label %bb0			i32 0, label %bb0
	i32 1, label %bb1			i32 1, label %bb1
	i32 2, label %bb2			i32 2, label %bb2
	i32 3, label %bb3			i32 3, label %bb3
	]			]
	bb0:			bb0:
	store i32 0, i32* %ptr, align 4			store i32 %b, i32* %ptr, align 4
	br label %exit			br label %exit
	bb1:			bb1:
	store i32 1, i32* %ptr, align 4			store i32 1, i32* %ptr, align 4
	br label %exit			br label %exit
	bb2:			bb2:
	store i32 2, i32* %ptr, align 4			store i32 2, i32* %ptr, align 4
	br label %exit			br label %exit
	bb3:			bb3:

test/Transforms/SimplifyCFG/sink-common-code.ll

if.end:		if.end:
%p = phi i1 [ %cmp1, %if.then ], [ %cmp2, %if.else ]		%p = phi i1 [ %cmp1, %if.then ], [ %cmp2, %if.else ]
ret i32 1		ret i32 1
}		}

; CHECK-LABEL: test15		; CHECK-LABEL: test15
; CHECK: getelementptr		; CHECK: getelementptr
; CHECK: load		; CHECK: load
; CHECK-NOT: load		; CHECK-NOT: load

		define zeroext i1 @test16(i1 zeroext %flag, i1 zeroext %flag2, i32 %blksA, i32 %blksB, i32 %nblks) {
		entry:
		br i1 %flag, label %if.then, label %if.else

		if.then:
		%cmp = icmp uge i32 %blksA, %nblks
		%frombool1 = zext i1 %cmp to i8
		br label %if.end

		if.else:
		br i1 %flag2, label %if.then2, label %if.end

		if.then2:
		%add = add i32 %nblks, %blksB
		%cmp2 = icmp ule i32 %add, %blksA
		%frombool3 = zext i1 %cmp2 to i8
		br label %if.end

		if.end:
		%obeys.0 = phi i8 [ %frombool1, %if.then ], [ %frombool3, %if.then2 ], [ 0, %if.else ]
		%tobool4 = icmp ne i8 %obeys.0, 0
		ret i1 %tobool4
		}

		; CHECK-LABEL: test16
		; CHECK: zext
		; CHECK-NOT: zext

		define zeroext i1 @test17(i32 %flag, i32 %blksA, i32 %blksB, i32 %nblks) {
		entry:
		switch i32 %flag, label %if.end [
		i32 0, label %if.then
		i32 1, label %if.then2
		]

		if.then:
		%cmp = icmp uge i32 %blksA, %nblks
		%frombool1 = zext i1 %cmp to i8
		br label %if.end

		if.then2:
		%add = add i32 %nblks, %blksB
		%cmp2 = icmp ule i32 %add, %blksA
		%frombool3 = zext i1 %cmp2 to i8
		br label %if.end

		if.end:
		%obeys.0 = phi i8 [ %frombool1, %if.then ], [ %frombool3, %if.then2 ], [ 0, %entry ]
		%tobool4 = icmp ne i8 %obeys.0, 0
		ret i1 %tobool4
		}

		; CHECK-LABEL: test17
		; CHECK: if.then:
		; CHECK-NEXT: icmp uge
		; CHECK-NEXT: br label %[[x:.*]]

		; CHECK: if.then2:
		; CHECK-NEXT: add
		; CHECK-NEXT: icmp ule
		; CHECK-NEXT: br label %[[x]]

		; CHECK: [[x]]:
		; CHECK-NEXT: %[[y:.*]] = phi i1 [ %cmp
		; CHECK-NEXT: %[[z:.*]] = zext i1 %[[y]]
		; CHECK-NEXT: br label %if.end

		; CHECK: if.end:
		; CHECK-NEXT: phi i8
		; CHECK-DAG: [ %[[z]], %[[x]] ]
		; CHECK-DAG: [ 0, %entry ]

		define zeroext i1 @test18(i32 %flag, i32 %blksA, i32 %blksB, i32 %nblks) {
		entry:
		switch i32 %flag, label %if.then3 [
		i32 0, label %if.then
		i32 1, label %if.then2
		]

		if.then:
		%cmp = icmp uge i32 %blksA, %nblks
		%frombool1 = zext i1 %cmp to i8
		br label %if.end

		if.then2:
		%add = add i32 %nblks, %blksB
		%cmp2 = icmp ule i32 %add, %blksA
		%frombool3 = zext i1 %cmp2 to i8
		br label %if.end

		if.then3:
		%add2 = add i32 %nblks, %blksA
		%cmp3 = icmp ule i32 %add2, %blksA
		%frombool4 = zext i1 %cmp3 to i8
		br label %if.end

		if.end:
		%obeys.0 = phi i8 [ %frombool1, %if.then ], [ %frombool3, %if.then2 ], [ %frombool4, %if.then3 ]
		%tobool4 = icmp ne i8 %obeys.0, 0
		ret i1 %tobool4
		}

		; CHECK-LABEL: test18
		; CHECK: if.end:
		; CHECK-NEXT: %[[x:.*]] = phi i1
		; CHECK-DAG: [ %cmp, %if.then ]
		; CHECK-DAG: [ %cmp2, %if.then2 ]
		; CHECK-DAG: [ %cmp3, %if.then3 ]
		; CHECK-NEXT: zext i1 %[[x]] to i8

Revision Contents

Diff 69708

lib/Transforms/Utils/SimplifyCFG.cpp

test/CodeGen/AArch64/arm64-jumptable.ll

test/CodeGen/AArch64/branch-folder-merge-mmos.ll

test/CodeGen/AArch64/ifcvt-select.ll

test/CodeGen/AArch64/rm_redundant_cmp.ll

test/MC/ARM/data-in-code.ll

test/Transforms/SimplifyCFG/sink-common-code.ll