This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: BlockPlacement: Use Branching factor to choose between near equals.
Needs Review · Public

Authored by iteratee on May 25 2017, 4:36 PM.

Details

Reviewers

Summary

When choosing between blocks of approximately the same probability, we need some
other metric to choose between them. This patch adds the simple metric of
"Branch Factor": the number of independent blocks that can be reached from a
block without crossing a join point (a block with multiple predecessors).
This is related to the classical "Branch Factor" in the same way that a 2-3-4
tree is related to a red-black tree: these nodes could in theory all be rolled
up into a single node, like a switch. If we did that, the calculated value would
be the branching factor of that single node.

Another way of viewing this: if you don't cross join points, you're essentially
finding a sub-tree in the CFG, and we're counting the number of leaves in that
sub-tree.
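
As a rough illustration of the leaf-counting view, the following sketch computes the metric on a toy CFG made of a chain of four two-way tests. `ToyCFG`, `predCounts`, `branchFactor`, and `makeChainExample` are names made up for this sketch; they are not LLVM APIs, and the plain recursion is a simplification rather than the traversal used in the patch. It assumes every block is reachable from an entry block that has no predecessors, so every cycle contains a join point and the recursion terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy CFG (hypothetical, not LLVM's MachineBasicBlock): block i lists the
// indices of its successors.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
};

// Count the predecessors of every block.
std::vector<int> predCounts(const ToyCFG &G) {
  std::vector<int> Preds(G.Succs.size(), 0);
  for (const auto &Out : G.Succs)
    for (int S : Out)
      ++Preds[S];
  return Preds;
}

// Number of leaves in the tree rooted at Block. The walk stops at join
// points (more than one predecessor) and at blocks with no successors.
uint32_t branchFactorImpl(const ToyCFG &G, const std::vector<int> &Preds,
                          int Block) {
  assert(Block >= 0 && static_cast<std::size_t>(Block) < G.Succs.size());
  if (Preds[Block] > 1 || G.Succs[Block].empty())
    return 1;
  uint32_t Sum = 0;
  for (int S : G.Succs[Block])
    Sum += branchFactorImpl(G, Preds, S);
  return Sum;
}

uint32_t branchFactor(const ToyCFG &G, int Block) {
  return branchFactorImpl(G, predCounts(G), Block);
}

// Four chained tests; every body jumps to a common join block (9).
ToyCFG makeChainExample() {
  ToyCFG G;
  G.Succs = {
      {1, 2}, // 0: test1 -> body1, test2
      {9},    // 1: body1
      {3, 4}, // 2: test2 -> body2, test3
      {9},    // 3: body2
      {5, 6}, // 4: test3 -> body3, test4
      {9},    // 5: body3
      {7, 8}, // 6: test4 -> body4, else body
      {9},    // 7: body4
      {9},    // 8: else body
      {},     // 9: join / exit
  };
  return G;
}
```

Under these assumptions, at the first test the body (block 1) has branch factor 1 and the second test (block 2) has branch factor 4, so the heuristic would prefer the second test as the fall-through.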

The motivation is a chain of alternatives like this:

if (v < (1<<7)) {
...
} else if (v < (1<<14)) {
...
} else if (v < (1<<21)) {
...
} else if (v < (1<<28)) {
...
} else {
...
}

Here we should prefer to fall through to the next test, because the later cases
have to pass through a larger number of branches before they execute.
Even if our estimates of 50% probability are correct for these branches,
having the fallthrough go to the next test helps to reduce tail latency, because
it reduces the number of taken branches needed to reach the bottom of the
sub-tree of the CFG. In the above example, each case ends up with at most 1
taken branch, instead of the last case requiring 4 taken branches.

In the attached test case, the expected number of dynamic taken branches is
15/16 for both layouts. With the patch, the maximum number of taken branches is
1; without the patch, it is 4.

Branch factor is a local heuristic. Ideally we would use a measure based on a
graph property like dominance, which would have a better theoretical
grounding. A good candidate would be exclusion: A excludes B if
A ∉ DomF_inf(B) && B ∉ DomF_inf(A) && A ∉ Dom(B) && B ∉ Dom(A).
This is the theoretical definition, but a good intuition is that if A executes,
then B has not, and will not, and vice versa. We are trying to use branch factor
to guess which chosen branch will have the smaller # of excluded blocks.

Diff Detail

Repository: rL LLVM

Event Timeline

iteratee created this revision. May 25 2017, 4:36 PM

Herald added subscribers: jgravelle-google, sbc100, javed.absar and 3 others. May 25 2017, 4:36 PM

Looking at the example, compare two layout decisions: 1) always pick the next test as the layout successor; and 2) always pick the (else) if-then block as the layout successor.
A) the number of taken branches is the same
B) assuming 50% probability for all branches, the dynamic count of taken branches with 2) is actually smaller.

So I am not convinced this is the right heuristic unless more data is provided.

Here is a suggestion. Write a microbenchmark with this CFG shape. Make every branch's real probability 50% (drive the branches with random data so they are less predictable). Use __builtin_expect to force the layout to be either 1) or 2) described above, and compare
a) runtime performance
b) taken branches measured with the PMU.
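
A minimal sketch of such a microbenchmark, assuming a GCC- or Clang-compatible compiler (for `__builtin_expect`), might look like the following. The function names, the `return`-only bodies, and the input generator are assumptions made for illustration; the hints only ask for a layout and the compiler may not honor them. Taken-branch counts would have to come from hardware counters read by an external profiler, which this code does not do.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Layout 1): every body is hinted as unlikely, asking the compiler to make
// the next test the fall-through.
uint32_t chainTestsFallThrough(uint32_t V) {
  if (__builtin_expect(V < (1u << 7), 0))
    return 1;
  if (__builtin_expect(V < (1u << 14), 0))
    return 2;
  if (__builtin_expect(V < (1u << 21), 0))
    return 3;
  if (__builtin_expect(V < (1u << 28), 0))
    return 4;
  return 5;
}

// Layout 2): every body is hinted as likely, asking the compiler to make
// the if-then block the fall-through.
uint32_t chainBodiesFallThrough(uint32_t V) {
  if (__builtin_expect(V < (1u << 7), 1))
    return 1;
  if (__builtin_expect(V < (1u << 14), 1))
    return 2;
  if (__builtin_expect(V < (1u << 21), 1))
    return 3;
  if (__builtin_expect(V < (1u << 28), 1))
    return 4;
  return 5;
}

// Random inputs for which each test is true with probability 1/2, given
// that it is reached: buckets are hit with 1/2, 1/4, 1/8, 1/16, 1/16.
std::vector<uint32_t> makeInput(std::size_t N, uint32_t Seed) {
  static const uint64_t Bounds[] = {0,          1ull << 7,  1ull << 14,
                                    1ull << 21, 1ull << 28, 1ull << 32};
  std::mt19937 Rng(Seed);
  std::bernoulli_distribution Coin(0.5);
  std::vector<uint32_t> Out;
  Out.reserve(N);
  for (std::size_t I = 0; I < N; ++I) {
    int Bucket = 0;
    while (Bucket < 4 && !Coin(Rng))
      ++Bucket;
    std::uniform_int_distribution<uint64_t> Pick(Bounds[Bucket],
                                                 Bounds[Bucket + 1] - 1);
    Out.push_back(static_cast<uint32_t>(Pick(Rng)));
  }
  return Out;
}

// Average wall-clock nanoseconds per call; Sink keeps the calls observable.
template <typename Fn>
double nsPerCall(Fn F, const std::vector<uint32_t> &In, uint64_t &Sink) {
  assert(!In.empty());
  auto Start = std::chrono::steady_clock::now();
  uint64_t Sum = 0;
  for (uint32_t V : In)
    Sum += F(V);
  auto End = std::chrono::steady_clock::now();
  Sink += Sum;
  return std::chrono::duration<double, std::nano>(End - Start).count() /
         static_cast<double>(In.size());
}
```

A driver would call `makeInput` once and run `nsPerCall` on both functions over the same data. The trivial bodies here may well be turned into branchless code by the optimizer, so a real benchmark would probably need bodies that do distinct work, and the generated assembly should be inspected to confirm the two layouts actually differ.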

I think you counted wrong. The dynamic taken branch count is the same. But the distribution of the taken branch count is more consistent.

In the example test, the default layout gives:

0 taken: 1/2
1 taken: 1/4
2 taken: 1/8
3 taken: 1/16
4 taken: 1/16

For an average of 15/16.

The layout with this heuristic gives:
0 taken: 1/16
1 taken: 15/16

For the same average, but a much more consistent distribution.
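
The two distributions above can be checked with a few lines of arithmetic. This is a sketch under the thread's own assumptions: every test is 50/50, and only the conditional branches of the four tests are counted. `CaseProb`, `TakenBodiesFallThrough`, `TakenTestsFallThrough`, and `summarize` are names invented for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct TakenStats {
  double Mean; // expected number of taken branches
  int Max;     // worst-case number of taken branches
};

// Probability of ending in each of the five cases when every test is 50/50.
const std::array<double, 5> CaseProb = {1.0 / 2, 1.0 / 4, 1.0 / 8, 1.0 / 16,
                                        1.0 / 16};

// Default layout: bodies fall through, so reaching case k takes k branches.
const std::array<int, 5> TakenBodiesFallThrough = {0, 1, 2, 3, 4};

// Layout with the heuristic: tests fall through, so each of the first four
// cases takes one branch to its body and the final else takes none.
const std::array<int, 5> TakenTestsFallThrough = {1, 1, 1, 1, 0};

TakenStats summarize(const std::array<int, 5> &Taken) {
  TakenStats S{0.0, 0};
  for (std::size_t I = 0; I < Taken.size(); ++I) {
    assert(Taken[I] >= 0);
    S.Mean += CaseProb[I] * Taken[I];
    if (Taken[I] > S.Max)
      S.Max = Taken[I];
  }
  return S;
}
```

Both layouts come out at a mean of 15/16 (0.9375); the maximum is 4 for the default layout and 1 for the layout chosen by the heuristic.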

Add comments to the change summary about the global heuristic that we're approximating (maybe badly) by calculating the branch factor.

It is basically a choice between the existing layout, which has a 50% chance of not taking any branches, 25% of taking one branch, and 25% of taking more than one branch, vs the new layout, which has only a 6.25% chance of taking zero branches and 93.75% of taking exactly one branch.

The existing layout only has a 25% chance of taking 2 or more branches -- is it worth sacrificing 43.75% of the chances to not take any branches for the improvement in those 25% of cases?

In D33577#766128, @davidxl wrote:

It is basically a choice between the existing layout, which has a 50% chance of not taking any branches, 25% of taking one branch, and 25% of taking more than one branch, vs the new layout, which has only a 6.25% chance of taking zero branches and 93.75% of taking exactly one branch.

The existing layout only has a 25% chance of taking 2 or more branches -- is it worth sacrificing 43.75% of the chances to not take any branches for the improvement in those 25% of cases?

Yes.
Another way of looking at this is that the new layout is more resilient to our guesses or profiles being wrong.

Note that the existing layout gets chosen somewhat at random. For exactly 50/50 branches, we rely on the order of the successors, so we're creating a guaranteed order where in the existing code, there isn't any. If the branches were reversed in the IR, we won't un-reverse them.
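
The ordering rule can be sketched roughly as follows, loosely following the comments in the patch: keep the candidates within 10% of the best probability, then pick by (branch factor, probability), letting the earliest candidate win exact ties. `Candidate` and `pickSuccessor` are hypothetical names, probabilities are plain doubles rather than `BranchProbability`, and branch factors are supplied up front rather than computed on demand, so this is a simplified model and not the patch's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Candidate {
  std::string Name;      // hypothetical block label
  double Prob;           // edge probability, simplified to a double
  uint32_t BranchFactor; // assumed to be precomputed for this sketch
};

// Returns the index of the chosen candidate, or -1 if there are none.
int pickSuccessor(const std::vector<Candidate> &Cands) {
  if (Cands.empty())
    return -1;

  double BestProb = 0.0;
  for (const Candidate &C : Cands) {
    assert(C.Prob >= 0.0 && C.Prob <= 1.0);
    BestProb = std::max(BestProb, C.Prob);
  }
  // Only candidates within 10% of the best probability stay in the running.
  const double Threshold = 0.9 * BestProb;

  int Chosen = -1;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    const Candidate &C = Cands[I];
    if (C.Prob < Threshold)
      continue;
    if (Chosen < 0) {
      Chosen = static_cast<int>(I);
      continue;
    }
    const Candidate &B = Cands[Chosen];
    // Replace only on a strictly better (branch factor, probability) pair,
    // so the earliest candidate wins exact ties and the order stays stable.
    if (C.BranchFactor > B.BranchFactor ||
        (C.BranchFactor == B.BranchFactor && C.Prob > B.Prob))
      Chosen = static_cast<int>(I);
  }
  return Chosen;
}
```

For two 50/50 successors with branch factors 1 and 4, this picks the one with branch factor 4 regardless of their order; a successor at 80% is never displaced by one at 20%, whatever the branch factors are.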

iteratee edited the summary of this revision. May 26 2017, 3:22 PM

Do you have performance numbers (to show it works better on average)?

Revision Contents

Path	Size
lib/CodeGen/MachineBlockPlacement.cpp	115 lines
test/CodeGen/AArch64/combine-comparisons-by-cse.ll	16 lines
test/CodeGen/AArch64/machine_cse.ll	2 lines
test/CodeGen/ARM/2013-05-05-IfConvertBug.ll	8 lines
test/CodeGen/ARM/tail-opts.ll	5 lines
test/CodeGen/PowerPC/tail-dup-branch-to-fallthrough.ll	8 lines
test/CodeGen/WebAssembly/cfg-stackify.ll	21 lines
test/CodeGen/X86/block-placement-branch-factor.mir	212 lines
test/CodeGen/X86/block-placement.ll	4 lines
test/CodeGen/X86/loop-blocks.ll	49 lines
test/CodeGen/X86/tail-merge-after-mbp.mir	22 lines

Diff 100472

lib/CodeGen/MachineBlockPlacement.cpp

Show All 23 Lines
// function in-order.		// function in-order.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "BranchFolding.h"		#include "BranchFolding.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/BlockFrequencyInfoImpl.h"		#include "llvm/Analysis/BlockFrequencyInfoImpl.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
▲ Show 20 Lines • Show All 270 Lines • ▼ Show 20 Lines	class MachineBlockPlacement : public MachineFunctionPass {

/// \brief work lists of blocks that are ready to be laid out		/// \brief work lists of blocks that are ready to be laid out
SmallVector<MachineBasicBlock *, 16> BlockWorkList;		SmallVector<MachineBasicBlock *, 16> BlockWorkList;
SmallVector<MachineBasicBlock *, 16> EHPadWorkList;		SmallVector<MachineBasicBlock *, 16> EHPadWorkList;

/// Edges that have already been computed as optimal.		/// Edges that have already been computed as optimal.
DenseMap<const MachineBasicBlock *, BlockAndTailDupResult> ComputedEdges;		DenseMap<const MachineBasicBlock *, BlockAndTailDupResult> ComputedEdges;

		/// Branch Factor. Used to decide between Equal blocks.
		DenseMap<const MachineBasicBlock *, uint32_t> BranchFactorMap;

/// \brief Machine Function		/// \brief Machine Function
MachineFunction *F;		MachineFunction *F;

/// \brief A handle to the branch probability pass.		/// \brief A handle to the branch probability pass.
const MachineBranchProbabilityInfo *MBPI;		const MachineBranchProbabilityInfo *MBPI;

/// \brief A handle to the function-wide block frequency pass.		/// \brief A handle to the function-wide block frequency pass.
std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;		std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;
▲ Show 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	#endif
/// Returns true if a block can tail duplicate into all unplaced		/// Returns true if a block can tail duplicate into all unplaced
/// predecessors. Filters based on loop.		/// predecessors. Filters based on loop.
bool canTailDuplicateUnplacedPreds(		bool canTailDuplicateUnplacedPreds(
const MachineBasicBlock *BB, MachineBasicBlock *Succ,		const MachineBasicBlock *BB, MachineBasicBlock *Succ,
const BlockChain &Chain, const BlockFilterSet *BlockFilter);		const BlockChain &Chain, const BlockFilterSet *BlockFilter);
/// Find chains of triangles to tail-duplicate where a global analysis works,		/// Find chains of triangles to tail-duplicate where a global analysis works,
/// but a local analysis would not find them.		/// but a local analysis would not find them.
void precomputeTriangleChains();		void precomputeTriangleChains();
		/// Compute the branch factor of a block. The branch factor of a block with
		/// multiple predecessors or no successors is 1. Otherwise the branch factor
		/// of a block B is sum(BF(successors(B))). Values are computed as
		/// needed and placed in BranchFactorMap. Choosing a block with a higher
		/// branch factor helps to make the dynamic taken branch count more
		/// consistent, without raising the average.
		uint32_t computeBranchFactor(const MachineBasicBlock *BB);

public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MachineBlockPlacement() : MachineFunctionPass(ID) {		MachineBlockPlacement() : MachineFunctionPass(ID) {
initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());		initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;
▲ Show 20 Lines • Show All 707 Lines • ▼ Show 20 Lines	for (MachineBasicBlock *src : reverse(Chain.Edges)) {
assert(InsertResult.second && "Block seen twice.");		assert(InsertResult.second && "Block seen twice.");
(void)InsertResult;		(void)InsertResult;

dst = src;		dst = src;
}		}
}		}
}		}

		uint32_t MachineBlockPlacement::computeBranchFactor(const MachineBasicBlock *BB) {

		auto FindIt = BranchFactorMap.find(BB);
		if (FindIt != BranchFactorMap.end())
		return FindIt->second;

		if (BB->pred_size() > 1 || BB->succ_size() == 0) {
		BranchFactorMap[BB] = 1;
		return 1;
		}

		typedef GraphTraits<const MachineBasicBlock*> GT;
		typedef typename GT::ChildIteratorType ChildItTy;
		SmallVector<std::pair<const MachineBasicBlock *, ChildItTy>, 16> VisitStack;

		VisitStack.emplace_back(BB, GT::child_begin(BB));

		// Walk the nodes from BB in post order. This won't get confused by loops
		// because we don't traverse nodes with more than one predecessor.
		while(!VisitStack.empty()) {
		// Keep pushing things on the stack until we have a Node with all children
		// visited.
		while(VisitStack.back().second != GT::child_end(VisitStack.back().first)) {
		MachineBasicBlock *NextChild = *VisitStack.back().second++;
		if (!BranchFactorMap.count(NextChild)) {
		if (NextChild->pred_size() == 1 && NextChild->succ_size() != 0)
		VisitStack.emplace_back(NextChild, GT::child_begin(NextChild));
		else
		BranchFactorMap[NextChild] = 1;
		}
		}
		auto TopBB = VisitStack.back().first;
		VisitStack.pop_back();
		int SumBranches = 0;
		for (MachineBasicBlock *Succ : TopBB->successors())
		SumBranches += BranchFactorMap[Succ];
		BranchFactorMap[TopBB] = SumBranches;
		}
		return BranchFactorMap[BB];
		}

// When profile is not present, return the StaticLikelyProb.		// When profile is not present, return the StaticLikelyProb.
// When profile is available, we need to handle the triangle-shape CFG.		// When profile is available, we need to handle the triangle-shape CFG.
static BranchProbability getLayoutSuccessorProbThreshold(		static BranchProbability getLayoutSuccessorProbThreshold(
const MachineBasicBlock *BB) {		const MachineBasicBlock *BB) {
if (!BB->getParent()->getFunction()->getEntryCount())		if (!BB->getParent()->getFunction()->getEntryCount())
return BranchProbability(StaticLikelyProb, 100);		return BranchProbability(StaticLikelyProb, 100);
if (BB->succ_size() == 2) {		if (BB->succ_size() == 2) {
const MachineBasicBlock *Succ1 = *BB->succ_begin();		const MachineBasicBlock *Succ1 = *BB->succ_begin();
▲ Show 20 Lines • Show All 235 Lines • ▼ Show 20 Lines	if (isTrellis(BB, Successors, Chain, BlockFilter))
return getBestTrellisSuccessor(BB, Successors, AdjustedSumProb, Chain,		return getBestTrellisSuccessor(BB, Successors, AdjustedSumProb, Chain,
BlockFilter);		BlockFilter);

// For blocks with CFG violations, we may be able to lay them out anyway with		// For blocks with CFG violations, we may be able to lay them out anyway with
// tail-duplication. We keep this vector so we can perform the probability		// tail-duplication. We keep this vector so we can perform the probability
// calculations the minimum number of times.		// calculations the minimum number of times.
SmallVector<std::tuple<BranchProbability, MachineBasicBlock *>, 4>		SmallVector<std::tuple<BranchProbability, MachineBasicBlock *>, 4>
DupCandidates;		DupCandidates;
		// If we didn't pick a tail-duplicate candidate, choose between the blocks
		// within 10% of BestProb by <BranchFactor, prob>
		SmallVector<std::tuple<uint32_t, BranchProbability, MachineBasicBlock *>, 4>
		Candidates;
for (MachineBasicBlock *Succ : Successors) {		for (MachineBasicBlock *Succ : Successors) {
auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);		auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);
BranchProbability SuccProb =		BranchProbability SuccProb =
getAdjustedProbability(RealSuccProb, AdjustedSumProb);		getAdjustedProbability(RealSuccProb, AdjustedSumProb);

BlockChain &SuccChain = *BlockToChain[Succ];		BlockChain &SuccChain = *BlockToChain[Succ];
// Skip the edge \c BB->Succ if block \c Succ has a better layout		// Skip the edge \c BB->Succ if block \c Succ has a better layout
// predecessor that yields lower global cost.		// predecessor that yields lower global cost.
if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,		if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,
Chain, BlockFilter)) {		Chain, BlockFilter)) {
// If tail duplication would make Succ profitable, place it.		// If tail duplication would make Succ profitable, place it.
if (TailDupPlacement && shouldTailDuplicate(Succ))		if (TailDupPlacement && shouldTailDuplicate(Succ))
DupCandidates.push_back(std::make_tuple(SuccProb, Succ));		DupCandidates.emplace_back(SuccProb, Succ);
continue;		continue;
}		}

DEBUG(		DEBUG(
dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "		dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "
<< SuccProb		<< SuccProb
<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")		<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")
<< "\n");		<< "\n");

if (BestSucc.BB && BestProb >= SuccProb) {		Candidates.emplace_back(0, SuccProb, Succ);
		if (BestProb >= SuccProb) {
DEBUG(dbgs() << " Not the best candidate, continuing\n");		DEBUG(dbgs() << " Not the best candidate, continuing\n");
continue;		continue;
}		}

DEBUG(dbgs() << " Setting it as best candidate\n");		DEBUG(dbgs() << " Setting it as best candidate\n");
BestSucc.BB = Succ;
BestProb = SuccProb;		BestProb = SuccProb;
		BestSucc.BB = Succ;
}		}
// Handle the tail duplication candidates in order of decreasing probability.		// Handle the tail duplication candidates in order of decreasing probability.
// Stop at the first one that is profitable. Also stop if they are less		// Stop at the first one that is profitable. Also stop if they are less
// profitable than BestSucc. Position is important because we preserve it and		// profitable than BestSucc. Position is important because we preserve it and
// prefer first best match. Here we aren't comparing in order, so we capture		// prefer first best match. Here we aren't comparing in order, so we capture
// the position instead.		// the position instead.
if (DupCandidates.size() != 0) {		if (DupCandidates.size() != 0) {
auto cmp =		auto cmp =
[](const std::tuple<BranchProbability, MachineBasicBlock *> &a,		[](const std::tuple<BranchProbability, MachineBasicBlock *> &a,
const std::tuple<BranchProbability, MachineBasicBlock *> &b) {		const std::tuple<BranchProbability, MachineBasicBlock *> &b) {
return std::get<0>(a) > std::get<0>(b);		return std::get<0>(a) > std::get<0>(b);
};		};
std::stable_sort(DupCandidates.begin(), DupCandidates.end(), cmp);		std::stable_sort(DupCandidates.begin(), DupCandidates.end(), cmp);
}		}
for(auto &Tup : DupCandidates) {		for(auto &Tup : DupCandidates) {
BranchProbability DupProb;		BranchProbability DupProb;
MachineBasicBlock *Succ;		MachineBasicBlock *Succ;
std::tie(DupProb, Succ) = Tup;		std::tie(DupProb, Succ) = Tup;
if (DupProb < BestProb)		if (DupProb < BestProb)
break;		break;
if (canTailDuplicateUnplacedPreds(BB, Succ, Chain, BlockFilter)		if (canTailDuplicateUnplacedPreds(BB, Succ, Chain, BlockFilter)
&& (isProfitableToTailDup(BB, Succ, BestProb, Chain, BlockFilter))) {		&& (isProfitableToTailDup(BB, Succ, BestProb, Chain, BlockFilter))) {
DEBUG(		DEBUG(
dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "		dbgs() << " Selected: " << getBlockName(Succ) << ", probability: "
<< DupProb		<< DupProb
<< " (Tail Duplicate)\n");		<< " (Tail Duplicate)\n");
BestSucc.BB = Succ;		BestSucc.BB = Succ;
BestSucc.ShouldTailDup = true;		BestSucc.ShouldTailDup = true;
break;		return BestSucc;
}		}
}		}

if (BestSucc.BB)		// If we didn't find any suitable successor, return now.
DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB) << "\n");		if (!BestSucc.BB)
		return BestSucc;

		// The code from here to the end chooses blocks based on branch factor if they
		// are close enough to the max probability. First we filter the vector down to
		// the tuples that are close enough to the max probability. Then if there are
		// more than one, we calculate the branch factor for those blocks. Finally we
		// choose based on (branch factor, probability). We preserve order for
		// determinism.
		BranchProbability ScaledBestProb = BranchProbability(9, 10) * BestProb;
		// If we didn't pick a tail-duplicate candidate, choose between the blocks
		// within 10% of BestProb by <BranchFactor, prob>
		auto NotProbableEnough = [&ScaledBestProb] (
		std::tuple<uint32_t, BranchProbability, MachineBasicBlock *> Candidate) {
		return std::get<1>(Candidate) < ScaledBestProb;
		};

		auto ValidCandidates = make_range(
		Candidates.begin(), std::remove_if(
		Candidates.begin(), Candidates.end(), NotProbableEnough));
		// If we only have one remaining candidate, use that.
		if (std::next(ValidCandidates.begin()) == ValidCandidates.end()) {
		DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB) << "\n");
		BestSucc.BB = std::get<2>(*ValidCandidates.begin());
		return BestSucc;
		}
		for (auto &Candidate : ValidCandidates) {
		auto Succ = std::get<2>(Candidate);
		std::get<0>(Candidate) = computeBranchFactor(Succ);
		}
		auto cmp =
		[](const std::tuple<uint32_t, BranchProbability, MachineBasicBlock *> &a,
		const std::tuple<uint32_t, BranchProbability, MachineBasicBlock *> &b) {
		return (std::get<0>(a) < std::get<0>(b) ||
		(std::get<0>(a) == std::get<0>(b) &&
		std::get<1>(a) < std::get<1>(b)));
		};
		BranchProbability Prob;
		uint32_t BranchFactor;
		std::tie(BranchFactor, Prob, BestSucc.BB) = *std::max_element(
		ValidCandidates.begin(), ValidCandidates.end(), cmp);
		DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB)
		<< ", probability: " << Prob
		<< ", branch factor: " << BranchFactor << "\n");
return BestSucc;		return BestSucc;
}		}

/// \brief Select the best block from a worklist.		/// \brief Select the best block from a worklist.
///		///
/// This looks through the provided worklist as a list of candidate basic		/// This looks through the provided worklist as a list of candidate basic
/// blocks and select the most profitable one to place. The definition of		/// blocks and select the most profitable one to place. The definition of
/// profitable only really makes sense in the context of a loop. This returns		/// profitable only really makes sense in the context of a loop. This returns
▲ Show 20 Lines • Show All 1,182 Lines • ▼ Show 20 Lines	BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI,
*MBPI, TailMergeSize);		*MBPI, TailMergeSize);

if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),		if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),
getAnalysisIfAvailable<MachineModuleInfo>(), MLI,		getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
/*AfterBlockPlacement=*/true)) {		/*AfterBlockPlacement=*/true)) {
// Redo the layout if tail merging creates/removes/moves blocks.		// Redo the layout if tail merging creates/removes/moves blocks.
BlockToChain.clear();		BlockToChain.clear();
ComputedEdges.clear();		ComputedEdges.clear();
		BranchFactorMap.clear();
// Must redo the post-dominator tree if blocks were changed.		// Must redo the post-dominator tree if blocks were changed.
if (MPDT)		if (MPDT)
MPDT->runOnMachineFunction(MF);		MPDT->runOnMachineFunction(MF);
ChainAllocator.DestroyAll();		ChainAllocator.DestroyAll();
buildCFGChains();		buildCFGChains();
}		}
}		}

optimizeBranches();		optimizeBranches();
alignBlocks();		alignBlocks();

BlockToChain.clear();		BlockToChain.clear();
ComputedEdges.clear();		ComputedEdges.clear();
		BranchFactorMap.clear();
ChainAllocator.DestroyAll();		ChainAllocator.DestroyAll();

if (AlignAllBlock)		if (AlignAllBlock)
// Align all of the blocks in the function to a specific alignment.		// Align all of the blocks in the function to a specific alignment.
for (MachineBasicBlock &MBB : MF)		for (MachineBasicBlock &MBB : MF)
MBB.setAlignment(AlignAllBlock);		MBB.setAlignment(AlignAllBlock);
else if (AlignAllNonFallThruBlocks) {		else if (AlignAllNonFallThruBlocks) {
// Align all of the blocks that have no fall-through predecessors to a		// Align all of the blocks that have no fall-through predecessors to a
▲ Show 20 Lines • Show All 87 Lines • Show Last 20 Lines

test/CodeGen/AArch64/combine-comparisons-by-cse.ll

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]		%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
ret i32 %retval.0		ret i32 %retval.0
}		}

; (a > 5 && b == c) || (a < 5 && b == d)		; (a > 5 && b == c) || (a < 5 && b == d)
define i32 @combine_gt_lt_5() #0 {		define i32 @combine_gt_lt_5() #0 {
; CHECK-LABEL: combine_gt_lt_5		; CHECK-LABEL: combine_gt_lt_5
; CHECK: cmp		; CHECK: cmp
; CHECK: b.le		; CHECK: b.gt
; CHECK: ret
; CHECK-NOT: cmp		; CHECK-NOT: cmp
; CHECK: b.ge		; CHECK: b.ge
		; CHECK: ret
entry:		entry:
%0 = load i32, i32* @a, align 4		%0 = load i32, i32* @a, align 4
%cmp = icmp sgt i32 %0, 5		%cmp = icmp sgt i32 %0, 5
br i1 %cmp, label %land.lhs.true, label %lor.lhs.false		br i1 %cmp, label %land.lhs.true, label %lor.lhs.false

land.lhs.true: ; preds = %entry		land.lhs.true: ; preds = %entry
%1 = load i32, i32* @b, align 4		%1 = load i32, i32* @b, align 4
%2 = load i32, i32* @c, align 4		%2 = load i32, i32* @c, align 4
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]		%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
ret i32 %retval.0		ret i32 %retval.0
}		}

; (a < 5 && b == c) || (a > 5 && b == d)		; (a < 5 && b == c) || (a > 5 && b == d)
define i32 @combine_lt_gt_5() #0 {		define i32 @combine_lt_gt_5() #0 {
; CHECK-LABEL: combine_lt_gt_5		; CHECK-LABEL: combine_lt_gt_5
; CHECK: cmp		; CHECK: cmp
; CHECK: b.ge		; CHECK: b.lt
; CHECK: ret
; CHECK-NOT: cmp		; CHECK-NOT: cmp
; CHECK: b.le		; CHECK: b.le
		; CHECK: ret
entry:		entry:
%0 = load i32, i32* @a, align 4		%0 = load i32, i32* @a, align 4
%cmp = icmp slt i32 %0, 5		%cmp = icmp slt i32 %0, 5
br i1 %cmp, label %land.lhs.true, label %lor.lhs.false		br i1 %cmp, label %land.lhs.true, label %lor.lhs.false

land.lhs.true: ; preds = %entry		land.lhs.true: ; preds = %entry
%1 = load i32, i32* @b, align 4		%1 = load i32, i32* @b, align 4
%2 = load i32, i32* @c, align 4		%2 = load i32, i32* @c, align 4
Show All 17 Lines	return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]		%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
ret i32 %retval.0		ret i32 %retval.0
}		}

; (a > -5 && b == c) || (a < -5 && b == d)		; (a > -5 && b == c) || (a < -5 && b == d)
define i32 @combine_gt_lt_n5() #0 {		define i32 @combine_gt_lt_n5() #0 {
; CHECK-LABEL: combine_gt_lt_n5		; CHECK-LABEL: combine_gt_lt_n5
; CHECK: cmn		; CHECK: cmn
; CHECK: b.le		; CHECK: b.gt
; CHECK: ret
; CHECK-NOT: cmn		; CHECK-NOT: cmn
; CHECK: b.ge		; CHECK: b.ge
		; CHECK: ret
entry:		entry:
%0 = load i32, i32* @a, align 4		%0 = load i32, i32* @a, align 4
%cmp = icmp sgt i32 %0, -5		%cmp = icmp sgt i32 %0, -5
br i1 %cmp, label %land.lhs.true, label %lor.lhs.false		br i1 %cmp, label %land.lhs.true, label %lor.lhs.false

land.lhs.true: ; preds = %entry		land.lhs.true: ; preds = %entry
%1 = load i32, i32* @b, align 4		%1 = load i32, i32* @b, align 4
%2 = load i32, i32* @c, align 4		%2 = load i32, i32* @c, align 4
Show All 17 Lines	return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]		%retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
ret i32 %retval.0		ret i32 %retval.0
}		}

; (a < -5 && b == c) || (a > -5 && b == d)		; (a < -5 && b == c) || (a > -5 && b == d)
define i32 @combine_lt_gt_n5() #0 {		define i32 @combine_lt_gt_n5() #0 {
; CHECK-LABEL: combine_lt_gt_n5		; CHECK-LABEL: combine_lt_gt_n5
; CHECK: cmn		; CHECK: cmn
; CHECK: b.ge		; CHECK: b.lt
; CHECK: ret
; CHECK-NOT: cmn		; CHECK-NOT: cmn
; CHECK: b.le		; CHECK: b.le
		; CHECK: ret
entry:		entry:
%0 = load i32, i32* @a, align 4		%0 = load i32, i32* @a, align 4
%cmp = icmp slt i32 %0, -5		%cmp = icmp slt i32 %0, -5
br i1 %cmp, label %land.lhs.true, label %lor.lhs.false		br i1 %cmp, label %land.lhs.true, label %lor.lhs.false

land.lhs.true: ; preds = %entry		land.lhs.true: ; preds = %entry
%1 = load i32, i32* @b, align 4		%1 = load i32, i32* @b, align 4
%2 = load i32, i32* @c, align 4		%2 = load i32, i32* @c, align 4
▲ Show 20 Lines • Show All 265 Lines • Show Last 20 Lines

test/CodeGen/AArch64/machine_cse.ll

	; RUN: llc < %s -mtriple=aarch64-linux-gnuabi -O2 -tail-dup-placement=0 | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-linux-gnuabi -O2 -tail-dup-placement=0 | FileCheck %s
	; -tail-dup-placement causes tail duplication during layout. This breaks the			; -tail-dup-placement causes tail duplication during layout. This breaks the
	; assumptions of the test case as written (specifically, it creates an			; assumptions of the test case as written (specifically, it creates an
	; additional cmp instruction, creating a false positive), so we pass			; additional cmp instruction, creating a false positive), so we pass
	; -tail-dup-placement=0 to restore the original behavior			; -tail-dup-placement=0 to restore the original behavior

	; marked as external to prevent possible optimizations			; marked as external to prevent possible optimizations
	@a = external global i32			@a = external global i32
	@b = external global i32			@b = external global i32
	@c = external global i32			@c = external global i32
	@d = external global i32			@d = external global i32
	@e = external global i32			@e = external global i32

	define void @combine-sign-comparisons-by-cse(i32 *%arg) {			define void @combine-sign-comparisons-by-cse(i32 *%arg) {
	; CHECK: cmp			; CHECK: cmp
	; CHECK: b.ge			; CHECK: b.lt
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: b.le			; CHECK: b.le

	entry:			entry:
	%a = load i32, i32* @a, align 4			%a = load i32, i32* @a, align 4
	%b = load i32, i32* @b, align 4			%b = load i32, i32* @b, align 4
	%c = load i32, i32* @c, align 4			%c = load i32, i32* @c, align 4
	%d = load i32, i32* @d, align 4			%d = load i32, i32* @d, align 4
	Show All 25 Lines

test/CodeGen/ARM/2013-05-05-IfConvertBug.ll

	Show First 20 Lines • Show All 120 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: bxle lr			; CHECK-NEXT: bxle lr
	; Next BB			; Next BB
	; CHECK: [[LABEL]]:			; CHECK: [[LABEL]]:
	; CHECK-NEXT: subs r0, r1, r0			; CHECK-NEXT: subs r0, r1, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr

	; CHECK-V8-LABEL: wrapDistance:			; CHECK-V8-LABEL: wrapDistance:
	; CHECK-V8: cmp r1, #59			; CHECK-V8: cmp r1, #59
	; CHECK-V8-NEXT: bgt			; CHECK-V8-NEXT: ble
	; CHECK-V8-NEXT: %if.then
	; CHECK-V8-NEXT: subs r0, r2, #1
	; CHECK-V8-NEXT: bx lr
	; CHECK-V8-NEXT: %if.else			; CHECK-V8-NEXT: %if.else
	; CHECK-V8-NEXT: subs [[REG:r[0-9]+]], #120			; CHECK-V8-NEXT: subs [[REG:r[0-9]+]], #120
	; CHECK-V8-NEXT: cmp [[REG]], r1			; CHECK-V8-NEXT: cmp [[REG]], r1
	; CHECK-V8-NEXT: bge			; CHECK-V8-NEXT: bge
	; CHECK-V8-NEXT: %if.else			; CHECK-V8-NEXT: %if.else
	; CHECK-V8-NEXT: cmp r0, #119			; CHECK-V8-NEXT: cmp r0, #119
	; CHECK-V8-NEXT: bgt			; CHECK-V8-NEXT: bgt
	; CHECK-V8-NEXT: %if.then4			; CHECK-V8-NEXT: %if.then4
	; CHECK-V8-NEXT: adds r0, r1, #1			; CHECK-V8-NEXT: adds r0, r1, #1
	; CHECK-V8-NEXT: bx lr			; CHECK-V8-NEXT: bx lr
				; CHECK-V8-NEXT: %if.then
				; CHECK-V8-NEXT: subs r0, r2, #1
				; CHECK-V8-NEXT: bx lr
	; CHECK-V8-NEXT: %if.end5			; CHECK-V8-NEXT: %if.end5
	; CHECK-V8-NEXT: subs r0, r1, r0			; CHECK-V8-NEXT: subs r0, r1, r0
	; CHECK-V8-NEXT: bx lr			; CHECK-V8-NEXT: bx lr

	define i32 @wrapDistance(i32 %tx, i32 %sx, i32 %w) {			define i32 @wrapDistance(i32 %tx, i32 %sx, i32 %w) {
	entry:			entry:
	%cmp = icmp slt i32 %sx, 60			%cmp = icmp slt i32 %sx, 60
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else
	Show All 24 Lines

test/CodeGen/ARM/tail-opts.ll

	Show All 9 Lines
	@GHJK = global i32 0			@GHJK = global i32 0

	declare i8* @choose(i8, i8)			declare i8* @choose(i8, i8)

	; BranchFolding should tail-duplicate the indirect jump to avoid			; BranchFolding should tail-duplicate the indirect jump to avoid
	; redundant branching.			; redundant branching.

	; CHECK-LABEL: tail_duplicate_me:			; CHECK-LABEL: tail_duplicate_me:
	; CHECK: qux			; CHECK: car
	; CHECK: movw r{{[0-9]+}}, :lower16:_GHJK			; CHECK: movw r{{[0-9]+}}, :lower16:_GHJK
	; CHECK: movt r{{[0-9]+}}, :upper16:_GHJK			; CHECK: movt r{{[0-9]+}}, :upper16:_GHJK
	; CHECK: str r			; CHECK: str r
	; CHECK-NEXT: bx r			; CHECK-NEXT: bx r
	; CHECK: qux			; CHECK: bar
	; CHECK: movw r{{[0-9]+}}, :lower16:_GHJK			; CHECK: movw r{{[0-9]+}}, :lower16:_GHJK
	; CHECK: movt r{{[0-9]+}}, :upper16:_GHJK			; CHECK: movt r{{[0-9]+}}, :upper16:_GHJK
	; CHECK: str r			; CHECK: str r
	; CHECK-NEXT: bx r			; CHECK-NEXT: bx r
				; CHECK: dar
	; CHECK: movw r{{[0-9]+}}, :lower16:_GHJK			; CHECK: movw r{{[0-9]+}}, :lower16:_GHJK
	; CHECK: movt r{{[0-9]+}}, :upper16:_GHJK			; CHECK: movt r{{[0-9]+}}, :upper16:_GHJK
	; CHECK: str r			; CHECK: str r
	; CHECK-NEXT: bx r			; CHECK-NEXT: bx r

	define void @tail_duplicate_me() nounwind {			define void @tail_duplicate_me() nounwind {
	entry:			entry:
	%a = call i1 @qux()			%a = call i1 @qux()

test/CodeGen/PowerPC/tail-dup-branch-to-fallthrough.ll

	declare void @f4()			declare void @f4()

	; Function Attrs: nounwind			; Function Attrs: nounwind
	; CHECK-LABEL: tail_dup_fallthrough_with_branch			; CHECK-LABEL: tail_dup_fallthrough_with_branch
	; CHECK: # %entry			; CHECK: # %entry
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}			; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: # %entry			; CHECK: # %entry
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}			; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: # %sw.0
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: # %sw.1
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: # %sw.default			; CHECK: # %sw.default
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}			; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: # %if.then			; CHECK: # %if.then
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}			; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
				; CHECK: # %sw.1
				; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
				; CHECK: # %sw.0
				; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: # %if.else			; CHECK: # %if.else
	; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}			; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
	; CHECK: .Lfunc_end0			; CHECK: .Lfunc_end0
	define fastcc void @tail_dup_fallthrough_with_branch(i32 %a, i1 %b) unnamed_addr #0 {			define fastcc void @tail_dup_fallthrough_with_branch(i32 %a, i1 %b) unnamed_addr #0 {
	entry:			entry:
	switch i32 %a, label %sw.default [			switch i32 %a, label %sw.default [
	i32 0, label %sw.0			i32 0, label %sw.0
	i32 1, label %sw.1			i32 1, label %sw.1

test/CodeGen/WebAssembly/cfg-stackify.ll

	; CHECK-NEXT: .LBB13_6:			; CHECK-NEXT: .LBB13_6:
	; CHECK-NEXT: end_block{{$}}			; CHECK-NEXT: end_block{{$}}
	; CHECK-NEXT: return{{$}}			; CHECK-NEXT: return{{$}}
	; OPT-LABEL: test4:			; OPT-LABEL: test4:
	; OPT-NEXT: .param i32{{$}}			; OPT-NEXT: .param i32{{$}}
	; OPT: block {{$}}			; OPT: block {{$}}
	; OPT-NEXT: block {{$}}			; OPT-NEXT: block {{$}}
	; OPT: br_if 0, $pop{{[0-9]+}}{{$}}			; OPT: br_if 0, $pop{{[0-9]+}}{{$}}
	; OPT: br_if 1, $pop{{[0-9]+}}{{$}}			; OPT: block {{$}}
	; OPT: br 1{{$}}
	; OPT-NEXT: .LBB13_3:
	; OPT-NEXT: end_block{{$}}
	; OPT-NEXT: block {{$}}
	; OPT: br_if 0, $pop{{[0-9]+}}{{$}}			; OPT: br_if 0, $pop{{[0-9]+}}{{$}}
	; OPT: br_if 1, $pop{{[0-9]+}}{{$}}			; OPT: br_if 2, $pop{{[0-9]+}}{{$}}
	; OPT-NEXT: .LBB13_5:			; OPT-NEXT: .LBB13_3:
	; OPT-NEXT: end_block{{$}}			; OPT-NEXT: end_block{{$}}
	; OPT-NEXT: return{{$}}			; OPT-NEXT: return{{$}}
	; OPT-NEXT: .LBB13_6:			; OPT-NEXT: .LBB13_4:
				; OPT-NEXT: end_block{{$}}
				; OPT: br_if 0, $pop{{[0-9]+}}{{$}}
				; OPT: .LBB13_6:
	; OPT-NEXT: end_block{{$}}			; OPT-NEXT: end_block{{$}}
	; OPT-NEXT: return{{$}}			; OPT-NEXT: return{{$}}
	define void @test4(i32 %t) {			define void @test4(i32 %t) {
	entry:			entry:
	switch i32 %t, label %default [			switch i32 %t, label %default [
	i32 0, label %bb2			i32 0, label %bb2
	i32 2, label %bb2			i32 2, label %bb2
	i32 4, label %bb1			i32 4, label %bb1
	▲ Show 20 Lines • Show All 459 Lines • ▼ Show 20 Lines
	; OPT: return{{$}}			; OPT: return{{$}}
	; OPT-NEXT: .LBB20_4:			; OPT-NEXT: .LBB20_4:
	; OPT-NEXT: end_block{{$}}			; OPT-NEXT: end_block{{$}}
	; OPT-NOT: block			; OPT-NOT: block
	; OPT: block {{$}}			; OPT: block {{$}}
	; OPT-NOT: block			; OPT-NOT: block
	; OPT: br_if 0, $pop{{[0-9]+}}{{$}}			; OPT: br_if 0, $pop{{[0-9]+}}{{$}}
	; OPT-NOT: block			; OPT-NOT: block
				; OPT: br_if 1, $pop{{[0-9]+}}{{$}}
				; OPT-NOT: block
	; OPT: return{{$}}			; OPT: return{{$}}
	; OPT-NEXT: .LBB20_6:			; OPT-NEXT: .LBB20_7:
	; OPT-NEXT: end_block{{$}}			; OPT-NEXT: end_block{{$}}
	; OPT-NOT: block			; OPT-NOT: block
	; OPT: br_if 0, $pop{{[0-9]+}}{{$}}
	; OPT-NOT: block
	; OPT: return{{$}}			; OPT: return{{$}}
	; OPT-NEXT: .LBB20_8:			; OPT-NEXT: .LBB20_8:
	; OPT-NEXT: end_block{{$}}			; OPT-NEXT: end_block{{$}}
	; OPT-NOT: block			; OPT-NOT: block
	; OPT: return{{$}}			; OPT: return{{$}}
	define void @test11() {			define void @test11() {
	bb0:			bb0:
	store volatile i32 0, i32* null			store volatile i32 0, i32* null

test/CodeGen/X86/block-placement-branch-factor.mir

This file was added.

				# RUN: llc -mtriple=x86_64-linux -run-pass=block-placement -o - %s | FileCheck %s
				#
				# Check that all but possibly the last if.else blocks get laid out first, before
				# any of the if.then blocks, due to the branch factor checking.
				# CHECK: bb.0.entry
				# CHECK: bb.2.if.else
				# CHECK: bb.4.if.else7
				# CHECK: bb.6.if.else17
				# One of these 2 blocks should follow else17 for fallthrough. As far as branch
				# factor is concerned, they are equals, so it doesn't matter.
				# CHECK: bb.{{[0-9]+}}.if.{{(then23|else27)}}
				# CHECK-DAG: bb.{{[0-9]+}}.if.else27
				# CHECK-DAG: bb.{{[0-9]+}}.if.then
				# CHECK-DAG: bb.{{[0-9]+}}.if.then4
				# CHECK-DAG: bb.{{[0-9]+}}.if.then13
				# The other block should be placed somewhere.
				# CHECK-DAG: bb.{{[0-9]+}}.if.{{(else27|then23)}}
				--- |
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				define i8* @varint_encode(i8* %sptr, i32 %v) local_unnamed_addr {
				entry:
				%cmp = icmp ult i32 %v, 128
				%conv = trunc i32 %v to i8
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%incdec.ptr = getelementptr inbounds i8, i8* %sptr, i64 1
				store i8 %conv, i8* %sptr, align 1, !tbaa !1
				br label %if.end37

				if.else: ; preds = %entry
				%conv1 = or i8 %conv, -128
				store i8 %conv1, i8* %sptr, align 1, !tbaa !1
				%cmp3 = icmp ult i32 %v, 16384
				%shr = lshr i32 %v, 7
				%conv5 = trunc i32 %shr to i8
				br i1 %cmp3, label %if.then4, label %if.else7

				if.then4: ; preds = %if.else
				%incdec.ptr6 = getelementptr inbounds i8, i8* %sptr, i64 2
				%sunkaddr = getelementptr i8, i8* %sptr, i64 1
				store i8 %conv5, i8* %sunkaddr, align 1, !tbaa !1
				br label %if.end37

				if.else7: ; preds = %if.else
				%conv10 = or i8 %conv5, -128
				%sunkaddr1 = getelementptr i8, i8* %sptr, i64 1
				store i8 %conv10, i8* %sunkaddr1, align 1, !tbaa !1
				%cmp12 = icmp ult i32 %v, 2097152
				%shr14 = lshr i32 %v, 14
				%conv15 = trunc i32 %shr14 to i8
				br i1 %cmp12, label %if.then13, label %if.else17

				if.then13: ; preds = %if.else7
				%incdec.ptr16 = getelementptr inbounds i8, i8* %sptr, i64 3
				%sunkaddr2 = getelementptr i8, i8* %sptr, i64 2
				store i8 %conv15, i8* %sunkaddr2, align 1, !tbaa !1
				br label %if.end37

				if.else17: ; preds = %if.else7
				%conv20 = or i8 %conv15, -128
				%sunkaddr3 = getelementptr i8, i8* %sptr, i64 2
				store i8 %conv20, i8* %sunkaddr3, align 1, !tbaa !1
				%cmp22 = icmp ult i32 %v, 268435456
				%shr24 = lshr i32 %v, 21
				%conv25 = trunc i32 %shr24 to i8
				br i1 %cmp22, label %if.then23, label %if.else27

				if.then23: ; preds = %if.else17
				%incdec.ptr26 = getelementptr inbounds i8, i8* %sptr, i64 4
				%sunkaddr4 = getelementptr i8, i8* %sptr, i64 3
				store i8 %conv25, i8* %sunkaddr4, align 1, !tbaa !1
				br label %if.end37

				if.else27: ; preds = %if.else17
				%conv30 = or i8 %conv25, -128
				%incdec.ptr31 = getelementptr inbounds i8, i8* %sptr, i64 4
				%sunkaddr5 = getelementptr i8, i8* %sptr, i64 3
				store i8 %conv30, i8* %sunkaddr5, align 1, !tbaa !1
				%shr32 = lshr i32 %v, 28
				%conv33 = trunc i32 %shr32 to i8
				%incdec.ptr34 = getelementptr inbounds i8, i8* %sptr, i64 5
				store i8 %conv33, i8* %incdec.ptr31, align 1, !tbaa !1
				br label %if.end37

				if.end37: ; preds = %if.else27, %if.then23, %if.then13, %if.then4, %if.then
				%ptr.0 = phi i8* [ %incdec.ptr, %if.then ], [ %incdec.ptr6, %if.then4 ], [ %incdec.ptr16, %if.then13 ], [ %incdec.ptr26, %if.then23 ], [ %incdec.ptr34, %if.else27 ]
				ret i8* %ptr.0
				}

				!llvm.ident = !{!0}

				!0 = !{!"clang version 5.0.0 (trunk 303077) (llvm/trunk 303082)"}
				!1 = !{!2, !2, i64 0}
				!2 = !{!"omnipotent char", !3, i64 0}
				!3 = !{!"Simple C++ TBAA"}

				...
				---
				name: varint_encode
				alignment: 4
				exposesReturnsTwice: false
				noVRegs: true
				legalized: false
				regBankSelected: false
				selected: false
				tracksRegLiveness: true
				liveins:
				- { reg: '%rdi' }
				- { reg: '%esi' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 0
				adjustsStack: false
				hasCalls: false
				maxCallFrameSize: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				body: |
				bb.0.entry:
				successors: %bb.1.if.then(0x40000000), %bb.2.if.else(0x40000000)
				liveins: %esi, %rdi

				CMP32ri8 %esi, 127, implicit-def %eflags
				JA_1 %bb.2.if.else, implicit killed %eflags

				bb.1.if.then:
				liveins: %esi, %rdi

				MOV8mr %rdi, 1, _, 0, _, %sil, implicit killed %esi :: (store 1 into %ir.sptr, !tbaa !1)
				%rdi = INC64r killed %rdi, implicit-def dead %eflags
				%rax = MOV64rr killed %rdi
				RETQ %rax

				bb.2.if.else:
				successors: %bb.3.if.then4(0x40000000), %bb.4.if.else7(0x40000000)
				liveins: %esi, %rdi

				%al = MOV8rr %sil
				%al = OR8ri killed %al, -128, implicit-def dead %eflags
				MOV8mr %rdi, 1, _, 0, _, killed %al :: (store 1 into %ir.sptr, !tbaa !1)
				%eax = MOV32rr %esi
				%eax = SHR32ri killed %eax, 7, implicit-def dead %eflags
				CMP32ri %esi, 16383, implicit-def %eflags
				JA_1 %bb.4.if.else7, implicit killed %eflags

				bb.3.if.then4:
				liveins: %eax, %rdi

				MOV8mr %rdi, 1, _, 1, _, %al, implicit killed %eax :: (store 1 into %ir.sunkaddr, !tbaa !1)
				%rdi = ADD64ri8 killed %rdi, 2, implicit-def dead %eflags
				%rax = MOV64rr killed %rdi
				RETQ %rax

				bb.4.if.else7:
				successors: %bb.5.if.then13(0x40000000), %bb.6.if.else17(0x40000000)
				liveins: %eax, %esi, %rdi

				%al = OR8ri %al, -128, implicit-def dead %eflags, implicit killed %eax, implicit-def %eax
				MOV8mr %rdi, 1, _, 1, _, %al, implicit killed %eax :: (store 1 into %ir.sunkaddr1, !tbaa !1)
				%eax = MOV32rr %esi
				%eax = SHR32ri killed %eax, 14, implicit-def dead %eflags
				CMP32ri %esi, 2097151, implicit-def %eflags
				JA_1 %bb.6.if.else17, implicit killed %eflags

				bb.5.if.then13:
				liveins: %eax, %rdi

				MOV8mr %rdi, 1, _, 2, _, %al, implicit killed %eax :: (store 1 into %ir.sunkaddr2, !tbaa !1)
				%rdi = ADD64ri8 killed %rdi, 3, implicit-def dead %eflags
				%rax = MOV64rr killed %rdi
				RETQ %rax

				bb.6.if.else17:
				successors: %bb.7.if.then23(0x40000000), %bb.8.if.else27(0x40000000)
				liveins: %eax, %esi, %rdi

				%al = OR8ri %al, -128, implicit-def dead %eflags, implicit killed %eax, implicit-def %eax
				MOV8mr %rdi, 1, _, 2, _, %al, implicit killed %eax :: (store 1 into %ir.sunkaddr3, !tbaa !1)
				%eax = MOV32rr %esi
				%eax = SHR32ri killed %eax, 21, implicit-def dead %eflags
				CMP32ri %esi, 268435455, implicit-def %eflags
				JA_1 %bb.8.if.else27, implicit killed %eflags

				bb.7.if.then23:
				liveins: %eax, %rdi

				MOV8mr %rdi, 1, _, 3, _, %al, implicit killed %eax :: (store 1 into %ir.sunkaddr4, !tbaa !1)
				%rdi = ADD64ri8 killed %rdi, 4, implicit-def dead %eflags
				%rax = MOV64rr killed %rdi
				RETQ %rax

				bb.8.if.else27:
				liveins: %eax, %esi, %rdi

				%al = OR8ri %al, -128, implicit-def dead %eflags, implicit killed %eax, implicit-def %eax
				MOV8mr %rdi, 1, _, 3, _, %al, implicit killed %eax :: (store 1 into %ir.sunkaddr5, !tbaa !1)
				%esi = SHR32ri killed %esi, 28, implicit-def dead %eflags
				MOV8mr %rdi, 1, _, 4, _, %sil, implicit killed %esi :: (store 1 into %ir.incdec.ptr31, !tbaa !1)
				%rdi = ADD64ri8 killed %rdi, 5, implicit-def dead %eflags
				%rax = MOV64rr killed %rdi
				RETQ %rax

				...
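
The effect this test checks can be illustrated with a small model of the varint_encode CFG above. What follows is only a sketch under stated assumptions, not the logic of the patch: the names SUCCESSORS, leaf_count, taken_branches, INPUT_ORDER and BRANCH_FACTOR_ORDER are hypothetical, every conditional edge is taken with probability 1/2 (the 0x40000000 weights in the MIR), the layout without the patch is assumed to match the block order of the input MIR, and leaf_count is a simplified stand-in for the branch factor that is valid here because every leaf block returns and the MIR has no join point.

```python
from fractions import Fraction

# Conditional blocks of varint_encode: block -> (then successor, else successor).
# Assumption: both edges have probability 1/2, as the 0x40000000 weights suggest.
SUCCESSORS = {
    "entry": ("if.then", "if.else"),
    "if.else": ("if.then4", "if.else7"),
    "if.else7": ("if.then13", "if.else17"),
    "if.else17": ("if.then23", "if.else27"),
}


def leaf_count(block):
    """Number of leaves below block; a simplified stand-in for branch factor."""
    if block not in SUCCESSORS:
        return 1
    return sum(leaf_count(succ) for succ in SUCCESSORS[block])


def taken_branches(layout):
    """Map each leaf to (probability, taken branches on the path from entry)."""
    position = {block: index for index, block in enumerate(layout)}
    result = {}

    def walk(block, prob, taken):
        if block not in SUCCESSORS:
            result[block] = (prob, taken)
            return
        for succ in SUCCESSORS[block]:
            # An edge is free only if the successor is laid out right after
            # the block; otherwise reaching it costs one taken branch.
            falls_through = position[succ] == position[block] + 1
            walk(succ, prob / 2, taken + (0 if falls_through else 1))

    walk("entry", Fraction(1), 0)
    return result


def expected_taken(layout):
    return sum(prob * taken for prob, taken in taken_branches(layout).values())


def worst_taken(layout):
    return max(taken for _, taken in taken_branches(layout).values())


# Hypothetical layouts: the block order of the input MIR, and the order the
# CHECK lines ask for (if.else blocks first, then one of then23/else27).
INPUT_ORDER = ["entry", "if.then", "if.else", "if.then4", "if.else7",
               "if.then13", "if.else17", "if.then23", "if.else27"]
BRANCH_FACTOR_ORDER = ["entry", "if.else", "if.else7", "if.else17",
                       "if.then23", "if.else27", "if.then", "if.then4",
                       "if.then13"]

if __name__ == "__main__":
    for name, layout in (("input order", INPUT_ORDER),
                         ("branch factor order", BRANCH_FACTOR_ORDER)):
        print(name, expected_taken(layout), worst_taken(layout))
```

Under these assumptions both layouts have the same expected number of taken branches, 15/16, while the worst case drops from 4 to 1, which agrees with the figures in the summary. In this model if.then23 and if.else27 have the same leaf count, which is why the CHECK lines accept either one after if.else17.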

test/CodeGen/X86/block-placement.ll

	}			}

	define void @unanalyzable_branch_to_free_block(float %x) {			define void @unanalyzable_branch_to_free_block(float %x) {
	; Ensure that we can handle unanalyzable branches where the destination block			; Ensure that we can handle unanalyzable branches where the destination block
	; gets selected as the best free block in the CFG.			; gets selected as the best free block in the CFG.
	;			;
	; CHECK-LABEL: unanalyzable_branch_to_free_block			; CHECK-LABEL: unanalyzable_branch_to_free_block
	; CHECK: %entry			; CHECK: %entry
	; CHECK: %a
	; CHECK: %b			; CHECK: %b
	; CHECK: %c
	; CHECK: %exit			; CHECK: %exit
				; CHECK: %a
				; CHECK: %c

	entry:			entry:
	br i1 undef, label %a, label %b			br i1 undef, label %a, label %b

	a:			a:
	call i32 @f()			call i32 @f()
	br label %c			br label %c


test/CodeGen/X86/loop-blocks.ll

; CHECK-LABEL: cfg_islands:		; CHECK-LABEL: cfg_islands:
; CHECK: jmp .LBB3_1		; CHECK: jmp .LBB3_1
; CHECK-NEXT: align		; CHECK-NEXT: align
; CHECK-NEXT: .LBB3_7:		; CHECK-NEXT: .LBB3_7:
; CHECK-NEXT: callq bar100		; CHECK-NEXT: callq bar100
; CHECK-NEXT: .LBB3_1:		; CHECK-NEXT: .LBB3_1:
; CHECK-NEXT: callq loop_header		; CHECK-NEXT: callq loop_header
; CHECK: jl .LBB3_7		; CHECK: jl .LBB3_7
; CHECK: jge .LBB3_3		; CHECK: jl .LBB3_8
		; CHECK: jl .LBB3_9
		; CHECK: jl .LBB3_6
		; CHECK-NEXT: callq loop_latch
		; CHECK-NEXT: jmp .LBB3_1
		; CHECK: .LBB3_8:
; CHECK-NEXT: callq bar101		; CHECK-NEXT: callq bar101
; CHECK-NEXT: jmp .LBB3_1		; CHECK-NEXT: jmp .LBB3_1
; CHECK-NEXT: align		; CHECK-NEXT: .LBB3_9:
; CHECK-NEXT: .LBB3_3:
; CHECK: jge .LBB3_4
; CHECK-NEXT: callq bar102		; CHECK-NEXT: callq bar102
; CHECK-NEXT: jmp .LBB3_1		; CHECK-NEXT: jmp .LBB3_1
; CHECK-NEXT: .LBB3_4:
; CHECK: jl .LBB3_6
; CHECK-NEXT: callq loop_latch
; CHECK-NEXT: jmp .LBB3_1
; CHECK-NEXT: .LBB3_6:		; CHECK-NEXT: .LBB3_6:
		; CHECK-NEXT: callq exit

define void @cfg_islands() nounwind {		define void @cfg_islands() nounwind {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
call void @loop_header()		call void @loop_header()
%t0 = call i32 @get()		%t0 = call i32 @get()
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	bb:
call void @loop_latch()		call void @loop_latch()
br label %loop		br label %loop

done:		done:
call void @exit()		call void @exit()
ret void		ret void
}		}

; This is exactly the same function as slightly_more_involved.
; The difference is that when optimising for size, we do not want
; to see this reordering.

; CHECK-LABEL: slightly_more_involved_2:
; CHECK-NOT: jmp .LBB5_1
; CHECK: .LBB5_1:
; CHECK-NEXT: callq body

define void @slightly_more_involved_2() #0 {
entry:
br label %loop

loop:
call void @body()
%t0 = call i32 @get()
%t1 = icmp slt i32 %t0, 2
br i1 %t1, label %block_a, label %bb

bb:
%t2 = call i32 @get()
%t3 = icmp slt i32 %t2, 99
br i1 %t3, label %exit, label %loop

block_a:
call void @bar99()
br label %loop

exit:
call void @exit()
ret void
}

attributes #0 = { minsize norecurse nounwind optsize readnone uwtable }		attributes #0 = { minsize norecurse nounwind optsize readnone uwtable }

declare void @bar99() nounwind		declare void @bar99() nounwind
declare void @bar100() nounwind		declare void @bar100() nounwind
declare void @bar101() nounwind		declare void @bar101() nounwind
declare void @bar102() nounwind		declare void @bar102() nounwind
declare void @body() nounwind		declare void @body() nounwind
declare void @exit() nounwind		declare void @exit() nounwind
declare void @loop_header() nounwind		declare void @loop_header() nounwind
declare void @loop_latch() nounwind		declare void @loop_latch() nounwind
declare i32 @get() nounwind		declare i32 @get() nounwind
declare void @block_a_true_func() nounwind		declare void @block_a_true_func() nounwind
declare void @block_a_false_func() nounwind		declare void @block_a_false_func() nounwind
declare void @block_a_merge_func() nounwind		declare void @block_a_merge_func() nounwind

test/CodeGen/X86/tail-merge-after-mbp.mir

	# RUN: llc -mtriple=x86_64-linux -run-pass=block-placement -o - %s | FileCheck %s			# RUN: llc -mtriple=x86_64-linux -run-pass=block-placement -o - %s | FileCheck %s

	---			---
	# check loop bb.7 is not merged with bb.10, bb.13			# check loop bb.7 is not merged with bb.10, bb.13
	# check loop bb.9 is not merged with bb.12			# check loop bb.9 is not merged with bb.12
	# CHECK: bb.2:			# CHECK: bb.1:
	# CHECK-NEXT: successors: %bb.9(0x30000000), %bb.3(0x50000000)			# CHECK-NEXT: successors: %bb.9(0x30000000), %bb.2(0x50000000)
	# CHECK: %rax = MOV64rm %r14, 1, _, 0, _			# CHECK: %rax = MOV64rm %r14, 1, _, 0, _
	# CHECK-NEXT: TEST64rr %rax, %rax			# CHECK-NEXT: TEST64rr %rax, %rax
	# CHECK-NEXT: JE_1 %bb.9			# CHECK-NEXT: JE_1 %bb.9
	# CHECK: bb.3:			# CHECK: bb.2:
	# CHECK-NEXT: successors: %bb.4(0x30000000), %bb.8(0x50000000)			# CHECK-NEXT: successors: %bb.3(0x30000000), %bb.8(0x50000000)
	# CHECK: CMP64mi8 killed %rax, 1, _, 8, _, 0			# CHECK: CMP64mi8 killed %rax, 1, _, 8, _, 0
	# CHECK-NEXT: JNE_1 %bb.8			# CHECK-NEXT: JNE_1 %bb.8
	# CHECK: bb.4:			# CHECK: bb.3:
	# CHECK-NEXT: successors: %bb.9(0x30000000), %bb.5(0x50000000)			# CHECK-NEXT: successors: %bb.9(0x30000000), %bb.4(0x50000000)
	# CHECK: %rax = MOV64rm %r14, 1, _, 0, _			# CHECK: %rax = MOV64rm %r14, 1, _, 0, _
	# CHECK-NEXT: TEST64rr %rax, %rax			# CHECK-NEXT: TEST64rr %rax, %rax
	# CHECK-NEXT: JE_1 %bb.9			# CHECK-NEXT: JE_1 %bb.9
	# CHECK: bb.5			# CHECK: bb.4
	# CHECK-NEXT: successors: %bb.6(0x71555555), %bb.8(0x0eaaaaab)			# CHECK-NEXT: successors: %bb.5(0x71555555), %bb.8(0x0eaaaaab)
	# CHECK: CMP64mi8 killed %rax, 1, _, 8, _, 0			# CHECK: CMP64mi8 killed %rax, 1, _, 8, _, 0
	# CHECK-NEXT: JNE_1 %bb.8			# CHECK-NEXT: JNE_1 %bb.8
	# CHECK: bb.6:			# CHECK: bb.5:
	# CHECK-NEXT: successors: %bb.9(0x04000000), %bb.5(0x7c000000)			# CHECK-NEXT: successors: %bb.9(0x04000000), %bb.4(0x7c000000)
	# CHECK: %rax = MOV64rm %r14, 1, _, 0, _			# CHECK: %rax = MOV64rm %r14, 1, _, 0, _
	# CHECK-NEXT: TEST64rr %rax, %rax			# CHECK-NEXT: TEST64rr %rax, %rax
	# CHECK-NEXT: JNE_1 %bb.5			# CHECK-NEXT: JNE_1 %bb.4

	name: foo			name: foo
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1(0x40000000), %bb.7(0x40000000)			successors: %bb.1(0x40000000), %bb.7(0x40000000)

	TEST8ri %dl, 1, implicit-def %eflags, implicit killed %edx			TEST8ri %dl, 1, implicit-def %eflags, implicit killed %edx
	JE_1 %bb.7, implicit %eflags			JE_1 %bb.7, implicit %eflags