This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: BlockPlacement: Precompute layout for chains of triangles.
Closed, Public

Authored by iteratee on Feb 23 2017, 12:47 PM.


Details


Summary

For chains of triangles with small join blocks that can be tail-duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1. The post-dominators marked 50% are actually taken 56% of the time. (This shrinks with longer chains.)
2. The chains are statically correlated. Branch probabilities have a very U-shaped distribution [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]. If the branches in a chain are likely to be from the same side of the distribution as their predecessor, but are independent at runtime, this transformation is profitable. (The cost of being wrong is a small fixed cost, unlike the standard triangle layout, where the cost of being wrong scales with the number of triangles.)
3. The chains are dynamically correlated. If a previous branch being taken makes it more likely that the next branch will be taken, this transformation is also profitable.

We believe that 2 and 3 are common enough to justify the small margin in 1.
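To make the "cost of being wrong" argument in item 2 concrete, here is a deliberately simplified C++ model. It is not the cost model in MachineBlockPlacement: it assumes every taken branch costs 1, ignores code size and the final merge of the two paths, and uses hypothetical names (`Direction`, `perTriangleLayoutCost`, `tailDupLayoutCost`). The per-triangle layout pays for every branch that goes against the guessed direction; the tail-duplicated layout is modeled as two fall-through lanes that only pay when consecutive branches disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which way one triangle's conditional branch goes in this toy model.
//   Side: BB -> side block -> join block
//   Skip: BB -> join block directly (the post-dominator edge)
enum class Direction { Side, Skip };

// Toy per-triangle layout: the guessed direction of each triangle falls
// through, and every branch that goes the other way costs one taken branch.
std::size_t perTriangleLayoutCost(const std::vector<Direction> &Outcomes,
                                  Direction Guess) {
  std::size_t Taken = 0;
  for (Direction D : Outcomes)
    if (D != Guess)
      ++Taken;
  return Taken;
}

// Toy tail-duplicated layout: after each join block is duplicated into the
// side block before it, the chain is modeled as two fall-through lanes (one
// per direction). Execution starts on the guessed lane and pays one taken
// branch only when a branch switches lanes.
std::size_t tailDupLayoutCost(const std::vector<Direction> &Outcomes,
                              Direction Guess) {
  std::size_t Taken = 0;
  Direction Lane = Guess;
  for (Direction D : Outcomes) {
    if (D != Lane) {
      ++Taken;
      Lane = D;
    }
  }
  return Taken;
}

// Usage: five triangles whose branches all go against the guess.
void exampleWrongGuess() {
  std::vector<Direction> AllSide(5, Direction::Side);
  assert(perTriangleLayoutCost(AllSide, Direction::Skip) == 5);
  assert(tailDupLayoutCost(AllSide, Direction::Skip) == 1);
}
```

With every branch going against the guess, the per-triangle layout pays once per triangle while the two-lane layout pays once. With alternating outcomes the two-lane layout is worse (3 taken branches versus 2 for four triangles), which is why the argument depends on the branches being correlated.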

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.
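As a rough illustration of the pattern the pre-scan looks for, the sketch below finds simple triangles in a toy CFG given as successor lists. It is an assumption-laden stand-in: the patch asks the MachinePostDominatorTree whether a successor post-dominates the block, and also checks the edge probability and tail-duplication eligibility, whereas this hypothetical `findSimpleTriangles` only accepts the narrow case where the side block's sole successor is the join block.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy CFG: Succs[B] lists the successor ids of block B.
using ToyCFG = std::vector<std::vector<int>>;

// Returns a (BB, Join) pair for each block BB that heads a simple triangle:
// BB has exactly two successors, and one of them (the side block) has the
// other (the join block) as its only successor. This syntactic check is a
// stand-in for the post-dominator query used in the patch.
std::vector<std::pair<int, int>> findSimpleTriangles(const ToyCFG &Succs) {
  std::vector<std::pair<int, int>> Triangles;
  for (std::size_t BB = 0; BB < Succs.size(); ++BB) {
    // Like the patch, only blocks with exactly 2 successors start a triangle.
    if (Succs[BB].size() != 2)
      continue;
    for (int I = 0; I < 2; ++I) {
      int Side = Succs[BB][I];
      int Join = Succs[BB][1 - I];
      assert(Side >= 0 && static_cast<std::size_t>(Side) < Succs.size() &&
             "successor id out of range");
      const std::vector<int> &SideSuccs = Succs[Side];
      if (SideSuccs.size() == 1 && SideSuccs[0] == Join) {
        Triangles.emplace_back(static_cast<int>(BB), Join);
        break;
      }
    }
  }
  return Triangles;
}
```

For the CFG 0→{1,2}, 1→{2}, 2→{3,4}, 3→{4}, it reports the triangles (0, 2) and (2, 4), which share block 2 and so could be linked into one chain.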


Event Timeline

iteratee created this revision. Feb 23 2017, 12:47 PM

Herald added a subscriber: nemanjai. Feb 23 2017, 12:47 PM

iteratee added a parent revision: D30308: CodeGen: MachineBlockPlacement: Rename member to more general name. NFC. Feb 23 2017, 12:48 PM

If BP is not correct, it is better to improve static branch prediction. We explicitly added a threshold for the cost-based analysis result to kick in, just to be conservative when the branch probability is not biased enough. Even in the long-chain case, if tail dup is enabled for a 50/50 branch but the real profile is 40/60, tail dup will hurt performance. I don't see the reason to bypass the branch probability + cost analysis by just looking at the shape.

Replying to @davidxl (D30309#686772):

Well, long chains amortize the penalty, so looking for the shape is definitely necessary.

I can adjust the static prediction if you'd like, but I have a source for the 60/40 stat:
See page 13 here:
http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-83-121.pdf

The chains also allow us to make a correlation assumption. I can explicitly calculate that as well; however, edge frequencies run into aliasing problems. It seems that BlockFrequency wasn't designed to allow for calculations like these.

I really don't want to change this until I get more specific feedback about what you'd like to see.
Assuming a small amount of positive correlation (10%), the cutoff is 47% (including the frequency bonus) for a chain of 2 triangles that ends in a non-triangle, and 56% for a chain of triangles that ends in a triangle.

Would you prefer that I adjust the static probabilities for triangles, and then run the comparisons against the thresholds I calculated above? I could even include the whole table for chains of 2 to 10 triangles. (The threshold goes down as the number of triangles goes up.)

Replying to @iteratee (D30309#687816):

FWIW, I remember discussing this a loooooong time ago, when we were really starting to set up static prediction. At the time, there was some desire not to put this weak a signal into static predictions. Such signals have ways of compounding and end up producing pretty weird results. So I'm not really sure the probabilities we use in static prediction are wrong (or rather, wrong by enough of a margin or in enough cases to be worth shifting). Maybe we should revisit this, but I'm always a bit skeptical of static heuristics with this small a difference...


Based on your explanation to me about how all of this works, my understanding is this:

Branches in these kinds of long chains of triangles empirically correlate, even though they may individually have something like 50/50 probability. And the advantage of *correlation* is pretty specific to the *layout* we're doing here.

Given that, I think it is very reasonable to handle this within the layout code by detecting the pattern of CFG combined with (nearly) 50/50 probabilities, and choosing to prioritize a layout that is profitable in the face of correlation, because we believe that such correlation will often occur.

Anyways, just my two cents. I'll leave figuring out the end state to you and David. =D

Replying to @chandlerc (D30309#689911):

So correlation is an interesting term. There are actually 2 ways that the branches may be correlated:

1. They may be biased in the same direction. We guess 50% for unknown branches, and over all branches that may be pretty close, but any individual branch is unlikely to be 50%. Tail duplication is profitable if the biases go in the same direction, even if the branches are independent at runtime.
2. They may be dynamically correlated. If the branches are close to 50% but positively correlated, this is also profitable.

It's also profitable if the branches are independent but each branch is slightly more than 50%: 58% for 2 triangles in a row, and 56% for 3 triangles in a row. (This includes a 2% penalty for size increases.)
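A small expected-value sketch of the first kind of correlation, under toy assumptions: one unit per taken branch, no size penalty, the join-block direction as the guessed fall-through, and branches that are independent at runtime but share one side-block probability per chain. It is not the model behind the 58%/56% figures above and does not reproduce them; `expectedPerTriangleCost`, `expectedTailDupCost`, and `averageOverBiases` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy expected taken-branch counts for a chain of N triangles whose branches
// are independent at runtime but share one probability SideProb of going to
// the side block. The join-block (skip) direction is the guessed fall-through.

// Per-triangle layout: each side outcome costs one taken branch.
double expectedPerTriangleCost(unsigned N, double SideProb) {
  assert(SideProb >= 0.0 && SideProb <= 1.0);
  return N * SideProb;
}

// Two-lane tail-duplicated layout: the first branch costs one taken branch if
// it goes to the side, and each later branch costs one if it disagrees with
// its predecessor, which for independent branches happens with probability
// 2 * SideProb * (1 - SideProb).
double expectedTailDupCost(unsigned N, double SideProb) {
  assert(N >= 1 && SideProb >= 0.0 && SideProb <= 1.0);
  double SwitchProb = 2.0 * SideProb * (1.0 - SideProb);
  return SideProb + (N - 1) * SwitchProb;
}

// Averages a cost over equally likely per-chain biases, e.g. a U-shaped
// distribution such as {0.1, 0.9}.
double averageOverBiases(unsigned N, const std::vector<double> &Biases,
                         double (*Cost)(unsigned, double)) {
  assert(!Biases.empty());
  double Sum = 0.0;
  for (double Bias : Biases)
    Sum += Cost(N, Bias);
  return Sum / Biases.size();
}
```

For a 3-triangle chain with every branch at 50%, both toy costs are 1.5. If instead each chain's shared bias is 10% or 90% with equal likelihood (a U-shaped distribution that still averages to 50%), the per-triangle layout still averages 1.5 taken branches while the two-lane layout averages 0.86.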

Add an internal option to allow precomputing to be disabled.

lib/CodeGen/MachineBlockPlacement.cpp
1049	Add a few more comments here justifying this: the cost-based analysis computes layout costs by assuming full branch independence, which can lead to conservative results. In certain scenarios, the existence of branch correlation can make tail dup more beneficial...
1123	Use the insert method so that you can modify the chain in place.

Added flag, improved comments and description.

iteratee added inline comments. Mar 2 2017, 2:38 PM

lib/CodeGen/MachineBlockPlacement.cpp
1123	I actually can't do that, because you find it based on the key being the edge source, and insert it based on the key being the edge destination. I wrote it that way initially, and then wondered why all my chains were of length 1. I'll add a comment though, because someone might come along and try to clean it up.
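The find-by-source, re-key-by-destination pattern described in this reply can be sketched outside LLVM as follows. Blocks are plain `int` ids, and `ToyTriangleChain`, `buildTriangleChains`, and `markChainEdges` are hypothetical names; the patch itself uses a `DenseMap` keyed by `MachineBasicBlock *` and a `std::forward_list`. The sketch assumes triangles arrive in block order, so a chain's current end is seen before the triangle that starts there.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for the patch's TriangleChain, with blocks as int ids.
struct ToyTriangleChain {
  std::vector<int> Blocks; // Blocks.front() heads the chain, back() ends it
  unsigned Count = 0;      // number of triangles in the chain
};

// Chains are keyed by their *last* block. A triangle (BB, Join) is looked up
// by its source BB, but a grown chain must be re-inserted under Join.
std::unordered_map<int, ToyTriangleChain>
buildTriangleChains(const std::vector<std::pair<int, int>> &Triangles) {
  std::unordered_map<int, ToyTriangleChain> ChainMap;
  for (const std::pair<int, int> &T : Triangles) {
    int BB = T.first;
    int Join = T.second;
    auto Found = ChainMap.find(BB);
    if (Found != ChainMap.end()) {
      // Take the chain out under its old key, grow it, re-key it by Join.
      ToyTriangleChain Chain = std::move(Found->second);
      ChainMap.erase(Found);
      Chain.Blocks.push_back(Join);
      ++Chain.Count;
      ChainMap.emplace(Join, std::move(Chain));
    } else {
      // Start a new one-triangle chain keyed by its join block.
      ToyTriangleChain Chain;
      Chain.Blocks = {BB, Join};
      Chain.Count = 1;
      bool Inserted = ChainMap.emplace(Join, std::move(Chain)).second;
      assert(Inserted && "Block seen twice.");
      (void)Inserted;
    }
  }
  return ChainMap;
}

// Records Blocks[i] -> Blocks[i + 1] for every chain with at least MinCount
// triangles, mirroring how the patch fills ComputedEdges for chains of 2+.
std::unordered_map<int, int>
markChainEdges(const std::unordered_map<int, ToyTriangleChain> &ChainMap,
               unsigned MinCount) {
  std::unordered_map<int, int> Marked; // source block -> chosen successor
  for (const auto &Entry : ChainMap) {
    const ToyTriangleChain &Chain = Entry.second;
    if (Chain.Count < MinCount)
      continue;
    for (std::size_t I = 0; I + 1 < Chain.Blocks.size(); ++I)
      Marked[Chain.Blocks[I]] = Chain.Blocks[I + 1];
  }
  return Marked;
}
```

Because the map is keyed by each chain's last block, updating the found entry in place would leave the grown chain under its old key, and the next triangle's lookup by source would miss it.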

Actually update diff.

davidxl added inline comments. Mar 2 2017, 3:13 PM

lib/CodeGen/MachineBlockPlacement.cpp
152	Should the default be 3?
1053	This is not entirely true; it depends on the length of the chain (e.g., when branch probabilities to PDom blocks are low). Perhaps remove 2).
test/CodeGen/PowerPC/tail-dup-layout.ll
59	Why change this test?

iteratee added inline comments. Mar 2 2017, 3:43 PM

lib/CodeGen/MachineBlockPlacement.cpp
152	If I had found any case in my benchmarking where it was likely to cause problems, I would bump it to 3, but I haven't.
1053	No, it's accurate. If you know that the frequencies are correlated, but not whether they're high or low, then this is profitable, even if the individual branch decisions are independent. If by the length of the chain you mean that the only correlations that matter are between neighbors, then that is correct and I can add a note about that.

Comments improved, length changed to 3, and tests reverted.

davidxl added inline comments. Mar 2 2017, 4:38 PM

test/CodeGen/PowerPC/tail-dup-layout.ll
59	Missing reply here?

Remove unrelated change to test.

iteratee marked 2 inline comments as done. Mar 2 2017, 4:55 PM

lgtm

This revision is now accepted and ready to land. Mar 2 2017, 4:56 PM

Committed in rL296845

Revision Contents

Path                                      Size
lib/CodeGen/MachineBlockPlacement.cpp     115 lines
test/CodeGen/Mips/llvm-ir/ashr.ll         30 lines
test/CodeGen/Mips/llvm-ir/lshr.ll         28 lines
test/CodeGen/Mips/llvm-ir/shl.ll          28 lines
test/CodeGen/PowerPC/tail-dup-layout.ll   91 lines
test/CodeGen/X86/cmovcmov.ll              19 lines

Diff 90398

lib/CodeGen/MachineBlockPlacement.cpp

[… first 44 lines hidden …]
#include "llvm/Support/Allocator.h"		#include "llvm/Support/Allocator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetLowering.h"		#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
#include <algorithm>		#include <algorithm>
		#include <forward_list>
#include <functional>		#include <functional>
#include <utility>		#include <utility>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "block-placement"		#define DEBUG_TYPE "block-placement"

STATISTIC(NumCondBranches, "Number of conditional branches");		STATISTIC(NumCondBranches, "Number of conditional branches");
STATISTIC(NumUncondBranches, "Number of unconditional branches");		STATISTIC(NumUncondBranches, "Number of unconditional branches");
[… 82 lines hidden …] static cl::opt<unsigned> TailDupPlacementPenalty(
cl::init(2),		cl::init(2),
cl::Hidden);		cl::Hidden);

extern cl::opt<unsigned> StaticLikelyProb;		extern cl::opt<unsigned> StaticLikelyProb;
extern cl::opt<unsigned> ProfileLikelyProb;		extern cl::opt<unsigned> ProfileLikelyProb;

// Internal option used to control BFI display only after MBP pass.		// Internal option used to control BFI display only after MBP pass.
// Defined in CodeGen/MachineBlockFrequencyInfo.cpp:		// Defined in CodeGen/MachineBlockFrequencyInfo.cpp:
// -view-block-layout-with-bfi=		// -view-block-layout-with-bfi=
extern cl::opt<GVDAGType> ViewBlockLayoutWithBFI;		extern cl::opt<GVDAGType> ViewBlockLayoutWithBFI;

// Command line option to specify the name of the function for CFG dump		// Command line option to specify the name of the function for CFG dump
// Defined in Analysis/BlockFrequencyInfo.cpp: -view-bfi-func-name=		// Defined in Analysis/BlockFrequencyInfo.cpp: -view-bfi-func-name=
extern cl::opt<std::string> ViewBlockFreqFuncName;		extern cl::opt<std::string> ViewBlockFreqFuncName;

namespace {		namespace {
class BlockChain;		class BlockChain;
[… 291 lines hidden …] #endif
static std::pair<WeightedEdge, WeightedEdge> getBestNonConflictingEdges(		static std::pair<WeightedEdge, WeightedEdge> getBestNonConflictingEdges(
const MachineBasicBlock *BB,		const MachineBasicBlock *BB,
SmallVector<SmallVector<WeightedEdge, 8>, 2> &Edges);		SmallVector<SmallVector<WeightedEdge, 8>, 2> &Edges);
/// Returns true if a block can tail duplicate into all unplaced		/// Returns true if a block can tail duplicate into all unplaced
/// predecessors. Filters based on loop.		/// predecessors. Filters based on loop.
bool canTailDuplicateUnplacedPreds(		bool canTailDuplicateUnplacedPreds(
const MachineBasicBlock *BB, MachineBasicBlock *Succ,		const MachineBasicBlock *BB, MachineBasicBlock *Succ,
const BlockChain &Chain, const BlockFilterSet *BlockFilter);		const BlockChain &Chain, const BlockFilterSet *BlockFilter);
		/// Find chains of triangles to tail-duplicate where a global analysis works,
		/// but a local analysis would not find them.
		void precomputeTriangleChains();

public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MachineBlockPlacement() : MachineFunctionPass(ID) {		MachineBlockPlacement() : MachineFunctionPass(ID) {
initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());		initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;
[… 569 lines hidden …] if (!TailDup.canTailDuplicate(Succ, Pred)) {
// CFG.		// CFG.
continue;		continue;
return false;		return false;
}		}
}		}
return true;		return true;
}		}

		/// Find chains of triangles where we believe it would be profitable to
		/// tail-duplicate them all, but a local analysis would not find them.
		void MachineBlockPlacement::precomputeTriangleChains() {
		  struct TriangleChain {
		    int Count;
		    std::forward_list<MachineBasicBlock*> Edges;
		    TriangleChain(MachineBasicBlock* src, MachineBasicBlock *dst) {
		      Edges.push_front(src);
		      Edges.push_front(dst);
		      Count = 1;
		      DEBUG(dbgs() << "Saving Triangle: " <<
		            getBlockName(src) << "->" << getBlockName(dst) << "\n");
		    }

		    void append(MachineBasicBlock *dst) {
		      DEBUG(dbgs() << "Extending Triangle: " <<
		            getBlockName(Edges.front()) << "->" << getBlockName(dst) <<
		            " " << (Count + 1) << "\n");
		      assert(!Edges.empty() && Edges.front()->isSuccessor(dst) &&
		             "Attempting to append a block that is not a successor.");
		      Edges.push_front(dst);
		      ++Count;
		    }

		    MachineBasicBlock *getKey() {
		      return Edges.front();
		    }
		  };

		  DEBUG(dbgs() << "Pre-computing triangle chains.\n");
		  // Map from last block to the chain that contains it. This allows us to extend
		  // chains as we find new triangles.
		  DenseMap<const MachineBasicBlock *, TriangleChain> TriangleChainMap;
		  for (MachineBasicBlock &BB : *F) {
		    // If BB doesn't have 2 successors, it doesn't start a triangle.
		    if (BB.succ_size() != 2)
		      continue;
		    MachineBasicBlock *PDom = nullptr;
		    for (MachineBasicBlock *Succ : BB.successors()) {
		      if (!MPDT->dominates(Succ, &BB))
		        continue;
		      PDom = Succ;
		      break;
		    }
		    // If BB doesn't have a post-dominating successor, it doesn't form a
		    // triangle.
		    if (PDom == nullptr)
		      continue;
		    DEBUG(dbgs() << "Found Triangle: " <<
		          getBlockName(&BB) << "->" << getBlockName(PDom) << "\n");
		    // If PDom has a hint that it is low probability, skip this triangle.
		    if (MBPI->getEdgeProbability(&BB, PDom) < BranchProbability(50, 100))
		      continue;
		    // If PDom isn't eligible for duplication, this isn't the kind of triangle
		    // we're looking for.
		    if (!shouldTailDuplicate(PDom))
		      continue;
		    bool CanTailDuplicate = true;
		    // If PDom can't tail-duplicate into its non-BB predecessors, then this
		    // isn't the kind of triangle we're looking for.
		    for (MachineBasicBlock* Pred : PDom->predecessors()) {
		      if (Pred == &BB)
		        continue;
		      if (!TailDup.canTailDuplicate(PDom, Pred)) {
		        CanTailDuplicate = false;
		        break;
		      }
		    }
		    // If we can't tail-duplicate PDom to its predecessors, then skip this
		    // triangle.
		    if (!CanTailDuplicate)
		      continue;

		    // Now we have an interesting triangle. Insert it if it's not part of an
		    // existing chain.
		    auto Found = TriangleChainMap.find(&BB);
		    // If it is, remove the chain from the map, grow it, and put it back in the
		    // map with the end as the new key.
		    if (Found != TriangleChainMap.end()) {
		      TriangleChain Chain = std::move(Found->second);
		      TriangleChainMap.erase(Found);
		      Chain.append(PDom);
		      TriangleChainMap.insert(std::make_pair(Chain.getKey(), std::move(Chain)));
		    } else {
		      auto InsertResult = TriangleChainMap.try_emplace(PDom, &BB, PDom);
		      assert (InsertResult.second && "Block seen twice.");
		      (void) InsertResult;
		    }
		  }

		  for (auto &ChainPair : TriangleChainMap) {
		    TriangleChain &Chain = ChainPair.second;
		    // Benchmarking has shown that due to branch correlation duplicating 2 or
		    // more triangles is profitable, despite the calculations assuming
		    // independence.
		    DEBUG(dbgs() << "After scanning, found a chain of size: " << Chain.Count <<
		          "\n");
		    if (Chain.Count < 2)
		      continue;
		    MachineBasicBlock *dst = Chain.Edges.front();
		    Chain.Edges.pop_front();
		    for (MachineBasicBlock *src : Chain.Edges) {
		      DEBUG(dbgs() << "Marking edge: " << getBlockName(src) << "->" <<
		            getBlockName(dst) << " as pre-computed.\n");
		      ComputedEdges[src] = { dst, true };
		      dst = src;
		    }
		  }
		}

// When profile is not present, return the StaticLikelyProb.		// When profile is not present, return the StaticLikelyProb.
// When profile is available, we need to handle the triangle-shape CFG.		// When profile is available, we need to handle the triangle-shape CFG.
static BranchProbability getLayoutSuccessorProbThreshold(		static BranchProbability getLayoutSuccessorProbThreshold(
const MachineBasicBlock *BB) {		const MachineBasicBlock *BB) {
if (!BB->getParent()->getFunction()->getEntryCount())		if (!BB->getParent()->getFunction()->getEntryCount())
return BranchProbability(StaticLikelyProb, 100);		return BranchProbability(StaticLikelyProb, 100);
if (BB->succ_size() == 2) {		if (BB->succ_size() == 2) {
const MachineBasicBlock *Succ1 = *BB->succ_begin();		const MachineBasicBlock *Succ1 = *BB->succ_begin();
[… 1,447 lines hidden …] bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
PreferredLoopExit = nullptr;		PreferredLoopExit = nullptr;

if (TailDupPlacement) {		if (TailDupPlacement) {
MPDT = &getAnalysis<MachinePostDominatorTree>();		MPDT = &getAnalysis<MachinePostDominatorTree>();
unsigned TailDupSize = TailDupPlacementThreshold;		unsigned TailDupSize = TailDupPlacementThreshold;
if (MF.getFunction()->optForSize())		if (MF.getFunction()->optForSize())
TailDupSize = 1;		TailDupSize = 1;
TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);		TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);
		precomputeTriangleChains();
}		}

assert(BlockToChain.empty());		assert(BlockToChain.empty());

buildCFGChains();		buildCFGChains();

// Changing the layout can create new tail merging opportunities.		// Changing the layout can create new tail merging opportunities.
TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();		TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
[… remaining 123 lines hidden …]

test/CodeGen/Mips/llvm-ir/ashr.ll

[… first 77 lines hidden …]
}		}

define signext i64 @ashr_i64(i64 signext %a, i64 signext %b) {		define signext i64 @ashr_i64(i64 signext %a, i64 signext %b) {
entry:		entry:
; ALL-LABEL: ashr_i64:		; ALL-LABEL: ashr_i64:

; M2: srav $[[T0:[0-9]+]], $4, $7		; M2: srav $[[T0:[0-9]+]], $4, $7
; M2: andi $[[T1:[0-9]+]], $7, 32		; M2: andi $[[T1:[0-9]+]], $7, 32
; M2: bnez $[[T1]], $[[BB0:BB[0-9_]+]]		; M2: beqz $[[T1]], $[[BB0:BB[0-9_]+]]
; M2: move $3, $[[T0]]		; M2: move $3, $[[T0]]
		; M2: bnez $[[T1]], $[[BB1:BB[0-9_]+]]
		; M2: nop
		; M2: $[[EXIT:BB[0-9_]+]]:
		; M2: jr $ra
		; M2: nop
		; M2: $[[BB0]]:
; M2: srlv $[[T2:[0-9]+]], $5, $7		; M2: srlv $[[T2:[0-9]+]], $5, $7
; M2: not $[[T3:[0-9]+]], $7		; M2: not $[[T3:[0-9]+]], $7
; M2: sll $[[T4:[0-9]+]], $4, 1		; M2: sll $[[T4:[0-9]+]], $4, 1
; M2: sllv $[[T5:[0-9]+]], $[[T4]], $[[T3]]		; M2: sllv $[[T5:[0-9]+]], $[[T4]], $[[T3]]
		; M2: beqz $[[T1]], $[[EXIT]]
; M2: or $3, $[[T3]], $[[T2]]		; M2: or $3, $[[T3]], $[[T2]]
; M2: $[[BB0]]:
; M2: beqz $[[T1]], $[[BB1:BB[0-9_]+]]
; M2: nop
; M2: sra $2, $4, 31
; M2: $[[BB1]]:		; M2: $[[BB1]]:
; M2: jr $ra		; M2: jr $ra
; M2: nop		; M2: sra $2, $4, 31

; 32R1-R5: srlv $[[T0:[0-9]+]], $5, $7		; 32R1-R5: srlv $[[T0:[0-9]+]], $5, $7
; 32R1-R5: not $[[T1:[0-9]+]], $7		; 32R1-R5: not $[[T1:[0-9]+]], $7
; 32R1-R5: sll $[[T2:[0-9]+]], $4, 1		; 32R1-R5: sll $[[T2:[0-9]+]], $4, 1
; 32R1-R5: sllv $[[T3:[0-9]+]], $[[T2]], $[[T1]]		; 32R1-R5: sllv $[[T3:[0-9]+]], $[[T2]], $[[T1]]
; 32R1-R5: or $3, $[[T3]], $[[T0]]		; 32R1-R5: or $3, $[[T3]], $[[T0]]
; 32R1-R5: srav $[[T4:[0-9]+]], $4, $7		; 32R1-R5: srav $[[T4:[0-9]+]], $4, $7
; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32		; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32
[… 56 lines hidden …] ; ALL-LABEL: ashr_i128:

; o32 shouldn't use TImode helpers.		; o32 shouldn't use TImode helpers.
; GP32-NOT: lw $25, %call16(__ashrti3)($gp)		; GP32-NOT: lw $25, %call16(__ashrti3)($gp)
; MM-NOT: lw $25, %call16(__ashrti3)($2)		; MM-NOT: lw $25, %call16(__ashrti3)($2)

; M3: sll $[[T0:[0-9]+]], $7, 0		; M3: sll $[[T0:[0-9]+]], $7, 0
; M3: dsrav $[[T1:[0-9]+]], $4, $7		; M3: dsrav $[[T1:[0-9]+]], $4, $7
; M3: andi $[[T2:[0-9]+]], $[[T0]], 64		; M3: andi $[[T2:[0-9]+]], $[[T0]], 64
; M3: bnez $[[T3:[0-9]+]], [[BB0:.LBB[0-9_]+]]		; M3: beqz $[[T3:[0-9]+]], [[BB0:.LBB[0-9_]+]]
; M3: move $3, $[[T1]]		; M3: move $3, $[[T1]]
		; M3: bnez $[[T3]], [[BB1:.LBB[0-9_]+]]
		; M3: nop
		; M3: [[EXIT:.LBB[0-9_]+]]:
		; M3: jr $ra
		; M3: nop
		; M3: [[BB0]]:
; M3: dsrlv $[[T4:[0-9]+]], $5, $7		; M3: dsrlv $[[T4:[0-9]+]], $5, $7
; M3: dsll $[[T5:[0-9]+]], $4, 1		; M3: dsll $[[T5:[0-9]+]], $4, 1
; M3: not $[[T6:[0-9]+]], $[[T0]]		; M3: not $[[T6:[0-9]+]], $[[T0]]
; M3: dsllv $[[T7:[0-9]+]], $[[T5]], $[[T6]]		; M3: dsllv $[[T7:[0-9]+]], $[[T5]], $[[T6]]
		; M3: beqz $[[T3]], [[EXIT]]
; M3: or $3, $[[T7]], $[[T4]]		; M3: or $3, $[[T7]], $[[T4]]
; M3: [[BB0]]:
; M3: beqz $[[T3]], [[BB1:.LBB[0-9_]+]]
; M3: nop
; M3: dsra $2, $4, 63
; M3: [[BB1]]:		; M3: [[BB1]]:
; M3: jr $ra		; M3: jr $ra
; M3: nop		; M3: dsra $2, $4, 63

; GP64-NOT-R6: dsrlv $[[T0:[0-9]+]], $5, $7		; GP64-NOT-R6: dsrlv $[[T0:[0-9]+]], $5, $7
; GP64-NOT-R6: dsll $[[T1:[0-9]+]], $4, 1		; GP64-NOT-R6: dsll $[[T1:[0-9]+]], $4, 1
; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0		; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0
; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]		; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]
; GP64-NOT-R6: dsllv $[[T4:[0-9]+]], $[[T1]], $[[T3]]		; GP64-NOT-R6: dsllv $[[T4:[0-9]+]], $[[T1]], $[[T3]]
; GP64-NOT-R6: or $3, $[[T4]], $[[T0]]		; GP64-NOT-R6: or $3, $[[T4]], $[[T0]]
; GP64-NOT-R6: dsrav $2, $4, $7		; GP64-NOT-R6: dsrav $2, $4, $7
[… remaining 27 lines hidden …]

test/CodeGen/Mips/llvm-ir/lshr.ll

[… first 75 lines hidden …]
}		}

define signext i64 @lshr_i64(i64 signext %a, i64 signext %b) {		define signext i64 @lshr_i64(i64 signext %a, i64 signext %b) {
entry:		entry:
; ALL-LABEL: lshr_i64:		; ALL-LABEL: lshr_i64:

; M2: srlv $[[T0:[0-9]+]], $4, $7		; M2: srlv $[[T0:[0-9]+]], $4, $7
; M2: andi $[[T1:[0-9]+]], $7, 32		; M2: andi $[[T1:[0-9]+]], $7, 32
; M2: bnez $[[T1]], $[[BB0:BB[0-9_]+]]		; M2: beqz $[[T1]], $[[BB0:BB[0-9_]+]]
; M2: move $3, $[[T0]]		; M2: move $3, $[[T0]]
		; M2: beqz $[[T1]], $[[BB1:BB[0-9_]+]]
		; M2: addiu $2, $zero, 0
		; M2: $[[EXIT:BB[0-9_]+]]:
		; M2: jr $ra
		; M2: nop
		; M2: $[[BB0]]:
; M2: srlv $[[T2:[0-9]+]], $5, $7		; M2: srlv $[[T2:[0-9]+]], $5, $7
; M2: not $[[T3:[0-9]+]], $7		; M2: not $[[T3:[0-9]+]], $7
; M2: sll $[[T4:[0-9]+]], $4, 1		; M2: sll $[[T4:[0-9]+]], $4, 1
; M2: sllv $[[T5:[0-9]+]], $[[T4]], $[[T3]]		; M2: sllv $[[T5:[0-9]+]], $[[T4]], $[[T3]]
; M2: or $3, $[[T3]], $[[T2]]		; M2: or $3, $[[T3]], $[[T2]]
; M2: $[[BB0]]:		; M2: bnez $[[T1]], $[[EXIT:BB[0-9_]+]]
; M2: bnez $[[T1]], $[[BB1:BB[0-9_]+]]
; M2: addiu $2, $zero, 0		; M2: addiu $2, $zero, 0
; M2: move $2, $[[T0]]
; M2: $[[BB1]]:		; M2: $[[BB1]]:
; M2: jr $ra		; M2: jr $ra
; M2: nop		; M2: move $2, $[[T0]]

; 32R1-R5: srlv $[[T0:[0-9]+]], $5, $7		; 32R1-R5: srlv $[[T0:[0-9]+]], $5, $7
; 32R1-R5: not $[[T1:[0-9]+]], $7		; 32R1-R5: not $[[T1:[0-9]+]], $7
; 32R1-R5: sll $[[T2:[0-9]+]], $4, 1		; 32R1-R5: sll $[[T2:[0-9]+]], $4, 1
; 32R1-R5: sllv $[[T3:[0-9]+]], $[[T2]], $[[T1]]		; 32R1-R5: sllv $[[T3:[0-9]+]], $[[T2]], $[[T1]]
; 32R1-R5: or $3, $[[T3]], $[[T0]]		; 32R1-R5: or $3, $[[T3]], $[[T0]]
; 32R1-R5: srlv $[[T4:[0-9]+]], $4, $7		; 32R1-R5: srlv $[[T4:[0-9]+]], $4, $7
; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32		; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32
[… 49 lines hidden …] ; ALL-LABEL: lshr_i128:

; o32 shouldn't use TImode helpers.		; o32 shouldn't use TImode helpers.
; GP32-NOT: lw $25, %call16(__lshrti3)($gp)		; GP32-NOT: lw $25, %call16(__lshrti3)($gp)
; MM-NOT: lw $25, %call16(__lshrti3)($2)		; MM-NOT: lw $25, %call16(__lshrti3)($2)

; M3: sll $[[T0:[0-9]+]], $7, 0		; M3: sll $[[T0:[0-9]+]], $7, 0
; M3: dsrlv $[[T1:[0-9]+]], $4, $7		; M3: dsrlv $[[T1:[0-9]+]], $4, $7
; M3: andi $[[T2:[0-9]+]], $[[T0]], 64		; M3: andi $[[T2:[0-9]+]], $[[T0]], 64
; M3: bnez $[[T3:[0-9]+]], [[BB0:\.LBB[0-9_]+]]		; M3: beqz $[[T3:[0-9]+]], [[BB0:\.LBB[0-9_]+]]
; M3: move $3, $[[T1]]		; M3: move $3, $[[T1]]
		; M3: beqz $[[T3]], [[BB1:\.LBB[0-9_]+]]
		; M3: daddiu $2, $zero, 0
		; M3: [[EXIT:\.LBB[0-9_]+]]:
		; M3: jr $ra
		; M3: nop
		; M3: [[BB0]]:
; M3: dsrlv $[[T4:[0-9]+]], $5, $7		; M3: dsrlv $[[T4:[0-9]+]], $5, $7
; M3: dsll $[[T5:[0-9]+]], $4, 1		; M3: dsll $[[T5:[0-9]+]], $4, 1
; M3: not $[[T6:[0-9]+]], $[[T0]]		; M3: not $[[T6:[0-9]+]], $[[T0]]
; M3: dsllv $[[T7:[0-9]+]], $[[T5]], $[[T6]]		; M3: dsllv $[[T7:[0-9]+]], $[[T5]], $[[T6]]
; M3: or $3, $[[T7]], $[[T4]]		; M3: or $3, $[[T7]], $[[T4]]
; M3: [[BB0]]:		; M3: bnez $[[T3]], [[EXIT]]
; M3: bnez $[[T3]], [[BB1:\.LBB[0-9_]+]]
; M3: daddiu $2, $zero, 0		; M3: daddiu $2, $zero, 0
; M3: move $2, $[[T1]]
; M3: [[BB1]]:		; M3: [[BB1]]:
; M3: jr $ra		; M3: jr $ra
; M3: nop		; M3: move $2, $[[T1]]

; GP64-NOT-R6: dsrlv $[[T0:[0-9]+]], $5, $7		; GP64-NOT-R6: dsrlv $[[T0:[0-9]+]], $5, $7
; GP64-NOT-R6: dsll $[[T1:[0-9]+]], $4, 1		; GP64-NOT-R6: dsll $[[T1:[0-9]+]], $4, 1
; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0		; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0
; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]		; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]
; GP64-NOT-R6: dsllv $[[T4:[0-9]+]], $[[T1]], $[[T3]]		; GP64-NOT-R6: dsllv $[[T4:[0-9]+]], $[[T1]], $[[T3]]
; GP64-NOT-R6: or $3, $[[T4]], $[[T0]]		; GP64-NOT-R6: or $3, $[[T4]], $[[T0]]
; GP64-NOT-R6: dsrlv $2, $4, $7		; GP64-NOT-R6: dsrlv $2, $4, $7
[… remaining 23 lines hidden …]

test/CodeGen/Mips/llvm-ir/shl.ll

[… first 91 lines hidden …]
}		}

define signext i64 @shl_i64(i64 signext %a, i64 signext %b) {		define signext i64 @shl_i64(i64 signext %a, i64 signext %b) {
entry:		entry:
; ALL-LABEL: shl_i64:		; ALL-LABEL: shl_i64:

; M2: sllv $[[T0:[0-9]+]], $5, $7		; M2: sllv $[[T0:[0-9]+]], $5, $7
; M2: andi $[[T1:[0-9]+]], $7, 32		; M2: andi $[[T1:[0-9]+]], $7, 32
; M2: bnez $[[T1]], $[[BB0:BB[0-9_]+]]		; M2: beqz $[[T1]], $[[BB0:BB[0-9_]+]]
; M2: move $2, $[[T0]]		; M2: move $2, $[[T0]]
		; M2: beqz $[[T1]], $[[BB1:BB[0-9_]+]]
		; M2: addiu $3, $zero, 0
		; M2: $[[EXIT:BB[0-9_]+]]:
		; M2: jr $ra
		; M2: nop
		; M2: $[[BB0]]:
; M2: sllv $[[T2:[0-9]+]], $4, $7		; M2: sllv $[[T2:[0-9]+]], $4, $7
; M2: not $[[T3:[0-9]+]], $7		; M2: not $[[T3:[0-9]+]], $7
; M2: srl $[[T4:[0-9]+]], $5, 1		; M2: srl $[[T4:[0-9]+]], $5, 1
; M2: srlv $[[T5:[0-9]+]], $[[T4]], $[[T3]]		; M2: srlv $[[T5:[0-9]+]], $[[T4]], $[[T3]]
; M2: or $2, $[[T2]], $[[T3]]		; M2: or $2, $[[T2]], $[[T3]]
; M2: $[[BB0]]:		; M2: bnez $[[T1]], $[[EXIT]]
; M2: bnez $[[T1]], $[[BB1:BB[0-9_]+]]
; M2: addiu $3, $zero, 0		; M2: addiu $3, $zero, 0
; M2: move $3, $[[T0]]
; M2: $[[BB1]]:		; M2: $[[BB1]]:
; M2: jr $ra		; M2: jr $ra
; M2: nop		; M2: move $3, $[[T0]]

; 32R1-R5: sllv $[[T0:[0-9]+]], $4, $7		; 32R1-R5: sllv $[[T0:[0-9]+]], $4, $7
; 32R1-R5: not $[[T1:[0-9]+]], $7		; 32R1-R5: not $[[T1:[0-9]+]], $7
; 32R1-R5: srl $[[T2:[0-9]+]], $5, 1		; 32R1-R5: srl $[[T2:[0-9]+]], $5, 1
; 32R1-R5: srlv $[[T3:[0-9]+]], $[[T2]], $[[T1]]		; 32R1-R5: srlv $[[T3:[0-9]+]], $[[T2]], $[[T1]]
; 32R1-R5: or $2, $[[T0]], $[[T3]]		; 32R1-R5: or $2, $[[T0]], $[[T3]]
; 32R1-R5: sllv $[[T4:[0-9]+]], $5, $7		; 32R1-R5: sllv $[[T4:[0-9]+]], $5, $7
; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32		; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32
	; ALL-LABEL: shl_i128:

; o32 shouldn't use TImode helpers.		; o32 shouldn't use TImode helpers.
; GP32-NOT: lw $25, %call16(__ashlti3)($gp)		; GP32-NOT: lw $25, %call16(__ashlti3)($gp)
; MM-NOT: lw $25, %call16(__ashlti3)($2)		; MM-NOT: lw $25, %call16(__ashlti3)($2)

; M3: sll $[[T0:[0-9]+]], $7, 0		; M3: sll $[[T0:[0-9]+]], $7, 0
; M3: dsllv $[[T1:[0-9]+]], $5, $7		; M3: dsllv $[[T1:[0-9]+]], $5, $7
; M3: andi $[[T2:[0-9]+]], $[[T0]], 64		; M3: andi $[[T2:[0-9]+]], $[[T0]], 64
; M3: bnez $[[T3:[0-9]+]], [[BB0:\.LBB[0-9_]+]]		; M3: beqz $[[T3:[0-9]+]], [[BB0:\.LBB[0-9_]+]]
; M3: move $2, $[[T1]]		; M3: move $2, $[[T1]]
		; M3: beqz $[[T3]], [[BB1:\.LBB[0-9_]+]]
		; M3: daddiu $3, $zero, 0
		; M3: [[EXIT:\.LBB[0-9_]+]]:
		; M3: jr $ra
		; M3: nop
		; M3: [[BB0]]:
; M3: dsllv $[[T4:[0-9]+]], $4, $7		; M3: dsllv $[[T4:[0-9]+]], $4, $7
; M3: dsrl $[[T5:[0-9]+]], $5, 1		; M3: dsrl $[[T5:[0-9]+]], $5, 1
; M3: not $[[T6:[0-9]+]], $[[T0]]		; M3: not $[[T6:[0-9]+]], $[[T0]]
; M3: dsrlv $[[T7:[0-9]+]], $[[T5]], $[[T6]]		; M3: dsrlv $[[T7:[0-9]+]], $[[T5]], $[[T6]]
; M3: or $2, $[[T4]], $[[T7]]		; M3: or $2, $[[T4]], $[[T7]]
; M3: [[BB0]]:		; M3: bnez $[[T3]], [[EXIT]]
; M3: bnez $[[T3]], [[BB1:\.LBB[0-9_]+]]
; M3: daddiu $3, $zero, 0		; M3: daddiu $3, $zero, 0
; M3: move $3, $[[T1]]
; M3: [[BB1]]:		; M3: [[BB1]]:
; M3: jr $ra		; M3: jr $ra
; M3: nop		; M3: move $3, $[[T1]]

; GP64-NOT-R6: dsllv $[[T0:[0-9]+]], $4, $7		; GP64-NOT-R6: dsllv $[[T0:[0-9]+]], $4, $7
; GP64-NOT-R6: dsrl $[[T1:[0-9]+]], $5, 1		; GP64-NOT-R6: dsrl $[[T1:[0-9]+]], $5, 1
; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0		; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0
; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]		; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]
; GP64-NOT-R6: dsrlv $[[T4:[0-9]+]], $[[T1]], $[[T3]]		; GP64-NOT-R6: dsrlv $[[T4:[0-9]+]], $[[T1]], $[[T3]]
; GP64-NOT-R6: or $2, $[[T0]], $[[T4]]		; GP64-NOT-R6: or $2, $[[T0]], $[[T4]]
; GP64-NOT-R6: dsllv $3, $5, $7		; GP64-NOT-R6: dsllv $3, $5, $7

test/CodeGen/PowerPC/tail-dup-layout.ll

	entry:			entry:
	br label %test1			br label %test1
	test1:			test1:
	%tagbit1 = and i32 %tag, 1			%tagbit1 = and i32 %tag, 1
	%tagbit1eq0 = icmp eq i32 %tagbit1, 0			%tagbit1eq0 = icmp eq i32 %tagbit1, 0
	br i1 %tagbit1eq0, label %test2, label %optional1, !prof !1			br i1 %tagbit1eq0, label %test2, label %optional1, !prof !1
	optional1:			optional1:
	call void @a()			call void @a()
	call void @a()
	davidxl: Why changing this test?
	davidxl: Missing reply here?
	call void @a()
	call void @a()
	br label %test2			br label %test2
	test2:			test2:
	%tagbit2 = and i32 %tag, 2			%tagbit2 = and i32 %tag, 2
	%tagbit2eq0 = icmp eq i32 %tagbit2, 0			%tagbit2eq0 = icmp eq i32 %tagbit2, 0
	br i1 %tagbit2eq0, label %test3, label %optional2, !prof !1			br i1 %tagbit2eq0, label %test3, label %optional2, !prof !1
	optional2:			optional2:
	call void @b()			call void @b()
	call void @b()
	call void @b()
	call void @b()
	br label %test3			br label %test3
	test3:			test3:
	%tagbit3 = and i32 %tag, 4			%tagbit3 = and i32 %tag, 4
	%tagbit3eq0 = icmp eq i32 %tagbit3, 0			%tagbit3eq0 = icmp eq i32 %tagbit3, 0
	br i1 %tagbit3eq0, label %test4, label %optional3, !prof !1			br i1 %tagbit3eq0, label %test4, label %optional3, !prof !1
	optional3:			optional3:
	call void @c()			call void @c()
	call void @c()
	call void @c()
	call void @c()
	br label %test4			br label %test4
	test4:			test4:
	%tagbit4 = and i32 %tag, 8			%tagbit4 = and i32 %tag, 8
	%tagbit4eq0 = icmp eq i32 %tagbit4, 0			%tagbit4eq0 = icmp eq i32 %tagbit4, 0
	br i1 %tagbit4eq0, label %exit, label %optional4, !prof !1			br i1 %tagbit4eq0, label %exit, label %optional4, !prof !1
	optional4:			optional4:
	call void @d()			call void @d()
	call void @d()			br label %exit
	call void @d()			exit:
				ret void
				}

				; Intended layout:
				; Chain-of-triangles-based tail duplication produces the layout
				; test1
				; test2
				; test3
				; test4
				; optional1
				; optional2
				; optional3
				; optional4
				; exit
				; even for 50/50 branches.
				; Tail duplication puts test n+1 at the end of optional n
				; so optional1 includes a copy of test2 at the end, and branches
				; to test3 (at the top) or falls through to optional2.
				; The CHECK statements check for the whole string of tests
				; and then check that the correct test has been duplicated into the end of
				; the optional blocks and that the optional blocks are in the correct order.
				;CHECK-LABEL: straight_test_50:
				; test1 may have been merged with entry
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
				;CHECK-NEXT: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]
				;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
				;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]
				;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
				;CHECK-NEXT: bne 0, .[[OPT4LABEL:[_0-9A-Za-z]+]]
				;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				;CHECK-NEXT: .[[OPT1LABEL]]:
				;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, .[[TEST3LABEL]]
				;CHECK-NEXT: .[[OPT2LABEL]]:
				;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
				;CHECK-NEXT: beq 0, .[[TEST4LABEL]]
				;CHECK-NEXT: .[[OPT3LABEL]]:
				;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
				;CHECK-NEXT: beq 0, .[[EXITLABEL]]
				;CHECK-NEXT: .[[OPT4LABEL]]:
				;CHECK: b .[[EXITLABEL]]

				define void @straight_test_50(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %optional1, !prof !2
				optional1:
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %test3, label %optional2, !prof !2
				optional2:
				call void @b()
				br label %test3
				test3:
				%tagbit3 = and i32 %tag, 4
				%tagbit3eq0 = icmp eq i32 %tagbit3, 0
				br i1 %tagbit3eq0, label %test4, label %optional3, !prof !2
				optional3:
				call void @c()
				br label %test4
				test4:
				%tagbit4 = and i32 %tag, 8
				%tagbit4eq0 = icmp eq i32 %tagbit4, 0
				br i1 %tagbit4eq0, label %exit, label %optional4, !prof !1
				optional4:
	call void @d()			call void @d()
	br label %exit			br label %exit
	exit:			exit:
	ret void			ret void
	}			}

	; Intended layout:			; Intended layout:
	; The chain-based outlining produces the layout			; The chain-based outlining produces the layout

test/CodeGen/X86/cmovcmov.ll

	; vreg12 = phi(vreg7, BB#8, vreg11, BB#0, vreg12, BB#7)			; vreg12 = phi(vreg7, BB#8, vreg11, BB#0, vreg12, BB#7)
	; vreg13 = COPY vreg12			; vreg13 = COPY vreg12
	; Which was invalid as %vreg12 is not the same value as %vreg13			; Which was invalid as %vreg12 is not the same value as %vreg13

	; CHECK-LABEL: no_cascade_opt:			; CHECK-LABEL: no_cascade_opt:
	; CMOV-DAG: cmpl %edx, %esi			; CMOV-DAG: cmpl %edx, %esi
	; CMOV-DAG: movb $20, %al			; CMOV-DAG: movb $20, %al
	; CMOV-DAG: movb $20, %dl			; CMOV-DAG: movb $20, %dl
	; CMOV: jl [[BB0:.LBB[0-9_]+]]			; CMOV: jge [[BB2:.LBB[0-9_]+]]
				; CMOV: jle [[BB3:.LBB[0-9_]+]]
				; CMOV: [[BB0:.LBB[0-9_]+]]
				; CMOV: testl %edi, %edi
				; CMOV: jne [[BB4:.LBB[0-9_]+]]
				; CMOV: [[BB1:.LBB[0-9_]+]]
				; CMOV: movb %al, g8(%rip)
				; CMOV: retq
				; CMOV: [[BB2]]:
	; CMOV: movl %ecx, %edx			; CMOV: movl %ecx, %edx
	; CMOV: [[BB0]]:			; CMOV: jg [[BB0]]
	; CMOV: jg [[BB1:.LBB[0-9_]+]]			; CMOV: [[BB3]]:
	; CMOV: movl %edx, %eax			; CMOV: movl %edx, %eax
	; CMOV: [[BB1]]:
	; CMOV: testl %edi, %edi			; CMOV: testl %edi, %edi
	; CMOV: je [[BB2:.LBB[0-9_]+]]			; CMOV: je [[BB1]]
				; CMOV: [[BB4]]:
	; CMOV: movl %edx, %eax			; CMOV: movl %edx, %eax
	; CMOV: [[BB2]]:
	; CMOV: movb %al, g8(%rip)			; CMOV: movb %al, g8(%rip)
	; CMOV: retq			; CMOV: retq
	define void @no_cascade_opt(i32 %v0, i32 %v1, i32 %v2, i32 %v3) {			define void @no_cascade_opt(i32 %v0, i32 %v1, i32 %v2, i32 %v3) {
	entry:			entry:
	%c0 = icmp eq i32 %v0, 0			%c0 = icmp eq i32 %v0, 0
	%c1 = icmp slt i32 %v1, %v2			%c1 = icmp slt i32 %v1, %v2
	%c2 = icmp sgt i32 %v1, %v2			%c2 = icmp sgt i32 %v1, %v2
	%trunc = trunc i32 %v3 to i8			%trunc = trunc i32 %v3 to i8
	%sel0 = select i1 %c1, i8 20, i8 %trunc			%sel0 = select i1 %c1, i8 20, i8 %trunc
	%sel1 = select i1 %c2, i8 20, i8 %sel0			%sel1 = select i1 %c2, i8 20, i8 %sel0
	%sel2 = select i1 %c0, i8 %sel1, i8 %sel0			%sel2 = select i1 %c0, i8 %sel1, i8 %sel0
	store volatile i8 %sel2, i8* @g8			store volatile i8 %sel2, i8* @g8
	ret void			ret void
	}			}