This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Allow small copyable blocks to "break" the CFG.
ClosedPublic

Authored by iteratee on Jan 11 2017, 3:07 PM.

Details

Reviewers

davidxl
• tstellarAMD
arsenm
javed.absar

Commits

rGb15c06677c63: CodeGen: Allow small copyable blocks to "break" the CFG.
rL293716: CodeGen: Allow small copyable blocks to "break" the CFG.

Summary

When choosing the best successor for a block, we would ordinarily prefer
a block that preserves the CFG unless there is a strong probability in the other
direction. For small blocks that can be duplicated, we now skip that requirement
as well, subject to some simple frequency calculations.

Event Timeline

iteratee updated this revision to Diff 84034. Jan 11 2017, 3:07 PM

iteratee retitled this revision to CodeGen: Allow small copyable blocks to "break" the CFG.

iteratee updated this object.

iteratee added a reviewer: davidxl.

iteratee set the repository for this revision to rL LLVM.

iteratee added subscribers: echristo, timshen, chandlerc, llvm-commits.

Herald added a reviewer: • tstellarAMD. Jan 11 2017, 3:07 PM

Herald added subscribers: nhaehnle, nemanjai, jyknight and 2 others.

iteratee updated this object. Jan 11 2017, 3:08 PM

iteratee edited edge metadata.

iteratee updated this revision to Diff 84036. Jan 11 2017, 3:15 PM

iteratee added a reviewer: arsenm.

iteratee removed rL LLVM as the repository for this revision.

Herald edited edge metadata. Jan 11 2017, 3:15 PM

Herald added a subscriber: wdng.

Realized that one of the calculations I did was only valid for D28522. Re-worked the calculation for now, and will rebase and update the calculation there.

Herald edited edge metadata. Jan 11 2017, 4:34 PM

junbuml added a subscriber: junbuml. Jan 12 2017, 7:11 AM

I like the direction (with more precise cost analysis) this is going. Will review the code soon.

iteratee mentioned this in D28522: Codegen: Make chains from trellis-shaped CFGs. Jan 12 2017, 12:03 PM

iteratee added a child revision: D28522: Codegen: Make chains from trellis-shaped CFGs.

davidxl added inline comments. Jan 13 2017, 10:04 AM

lib/CodeGen/MachineBlockPlacement.cpp
676	In this case, what is needed is to invoke 'hasBetterLayoutPredecessor' on the PDom block. Depending on the result, we will know whether, without tailDup, the layout order is Succ->PDom or Succ->D->PDom. This will make the cost computation more precise.

davidxl added inline comments. Jan 13 2017, 10:04 AM

lib/CodeGen/MachineBlockPlacement.cpp
431	Suggest new name : isProfitableToTailDup
634	Dom -> PDom
641	Why not just check if there exists a SuccSucc that post dominates Succ directly?
660	PU + PV == P. Also, assuming U > V, the layout order (with tail dup) should be BB --> Succ --> D, so the overall cost is Q + PV + Q (which is smaller than Q + QV + PU + PV).
661	We cannot assume BB and Succ are in a triangular-shaped sub-CFG here, given where this method is called. Besides, if Succ is not tail-duped, the layout decision may even reject Succ as the layout successor, so the cost is no longer P + V, but 2*Q + V instead (with U > V). In other words, the isProfitable check cannot be done inside 'hasBetterLayoutPredecessor'; it must be hoisted to its caller, after 'hasBetterLayoutPredecessor' returns, at which point we will know the layout decision if taildup does not kick in.

Updated the cost calculation to not rely on the lattice layout.
This resulted in fewer duplications in tests, so those test changes have been rolled into D28522.

Herald edited edge metadata. Jan 13 2017, 3:42 PM

I made the calculations in terms of frequency instead of probability.

I adjusted the cost calculation when there is a post dominator based on whether it will be laid out after Succ or not.

Let me know if there are any cost calculations that you think are wrong.

Herald added a reviewer: javed.absar. Jan 19 2017, 5:25 PM

Actually upload the diff with what I said was in the last one:
Use frequency instead of probability

Use slight lookahead for more precise probability calculations.

Let me know what you think. There is a small cleanup that could go in as a separate patch: I switched to a SmallDenseSet because we don't need the orderedness of the SmallSetVector.

lib/CodeGen/MachineBlockPlacement.cpp
660	I thought that too. But without the lattice patch, after duplication, we won't put D after Succ because it now has an unplaced predecessor. The lattice patch fixes the behavior and the calculation.
661	I think I have the calculation right for when Succ would not be the layout successor, but you're right to point out that we should also do the calculation even when Succ is the chosen successor.
661	We now only call this function to check if we should use Succ despite it having been rejected. So we know that Succ is not the layout successor.

Tidied comments and spacing.

davidxl added inline comments. Jan 20 2017, 2:04 PM

lib/CodeGen/MachineBlockPlacement.cpp
280	There is a reason SmallSetVector is used here -- to make sure the iteration order is deterministic.
653	I assume this is loop back edge source block. You need a test case to cover it.
664	Why break here?
669	nit: -->SuccBestPred
677	Computing BestSuccPred here is unnecessary. See below for more comments.
686	Qin is not necessarily BestSuccPred. The profitability check is called only after hasBetterLayoutPredecessor has returned true. There are two scenarios in which it returns true: Qin or Qout is larger than P; or P is larger than Qout, but the branch is not biased enough, so the layout algorithm still decides to keep the top-order. Either way, the baseline layout to compare against (with taildup) is that BB->Succ is the branch-taken edge and BB->C is the fall-through edge. Qin should just be Prob(BB->C).
702	PDom is always a successor of Succ according to the way it is computed.
708	The base cost is as written; the DupCost, however, depends on whether P > Q or not. If P > Q, the fallthrough path is BB->Succ->D, so the cost (normalized with freq(BB) == 1) is 2Q + PV. If P < Q, the fallthrough path is BB->C'->D and the cost is 2P + QV.
875	Add more description about what blocks to ignore.
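
To make the arithmetic in the 708 comment concrete, here is a minimal sketch that only evaluates the two expressions quoted there, normalized so that freq(BB) == 1. The function name and parameter names are hypothetical, plain doubles stand in for LLVM's probability types, and the tie case P == Q (which the comment does not address) is arbitrarily routed to the second expression. It illustrates the reviewer's formula as stated, not the calculation that was eventually committed.

```cpp
#include <cassert>

// Illustrative only: evaluates the two DupCost expressions quoted in the
// review comment, normalized so that freq(BB) == 1.
//   p: probability of the edge BB->Succ (P in the comment)
//   q: probability of the edge BB->C    (Q in the comment)
//   v: the V term from the comment (per the earlier "U > V" assumption,
//      the smaller of the two probabilities out of Succ)
// All names here are hypothetical and do not come from the patch.
double dupCostFromComment(double p, double q, double v) {
  assert(p >= 0.0 && q >= 0.0 && v >= 0.0 && "probabilities are non-negative");
  if (p > q)
    return 2.0 * q + p * v; // fallthrough path BB->Succ->D
  return 2.0 * p + q * v;   // fallthrough path BB->C'->D (also used for p == q)
}
```

For example, with P = 0.6, Q = 0.4 and V = 0.3 the first expression applies and gives 2(0.4) + (0.6)(0.3) = 0.98.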

I'll be glad to add some more comments to explain, but I think the calculations are correct. I've commented individually.

lib/CodeGen/MachineBlockPlacement.cpp
280	BlockFilterSet is never iterated. I checked.
664	Because if PDom is not null, that's all that we look at for the probability calculation.
686	When we place Succ, we remove 2 fallthrough edges BB->C and C'->Succ. Freq(C'->Succ) may be larger than Freq(BB->C). I am using Qin to represent Freq(C'->Succ) and Qout for Freq(BB->C). I could just use different letters if that were more clear. Qout is Freq(BB->C). I don't think Qin should be as well.
702	Thanks.
708	This function is called in a loop looking for the highest probability successor. If Q > P, this function will be ignored and we will lay out Q anyway, so we can ignore the second case. As to the first case: Until the 2nd patch lands, the duplication will prevent the BB->Succ->D layout. Instead you will get BB->Succ ; C'->D So the cost is as calculated. D28522 will include an update to this calculation along with an update to the behavior.
875	Well, that's really up to the caller. Do you want me to list why you might want to ignore a block?

davidxl added inline comments. Jan 20 2017, 4:19 PM

lib/CodeGen/MachineBlockPlacement.cpp
280	See for (MachineBasicBlock *LoopBB : LoopBlockSet) fillWorkLists(LoopBB, UpdatedPreds, &LoopBlockSet);
686	Differentiating Qin and Qout is fine, but in the code Qin = BestSuccPred, which could be Freq(BB->Succ). What I meant is that you should compute Qin directly from its definition, Freq(C'->Succ).
708	You are right that the Q > P scenario will be dropped. It is very subtle, so please add a comment to clarify. OK. For the first case, also add a comment.
875	something like : e.g, when called under xxx, we want to ignore yyy. See caller zzz for details. However, see my comment in the function, this parameter seems unnecessary.
1007	I think it is equivalent to check Pred == BB. In normal calling context, this is covered by BlockToChain[Pred] == &Chain, but for lookahead case, it is needed to filter BB which is not laid out yet.
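
The 280 thread turns on iteration order: a set-vector iterates in insertion order, whereas a hash set keyed on pointers iterates in an order that can vary from run to run. The sketch below is a minimal standard-library analogue of that idea, assuming nothing about LLVM's implementation; the class name `InsertionOrderedSet` is hypothetical and is not the `SmallSetVector` used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical analogue of a "set vector": a hash set answers membership
// queries, while a vector records elements in first-insertion order so that
// iteration is deterministic.
template <typename T> class InsertionOrderedSet {
  std::unordered_set<T> Members;
  std::vector<T> Order;

public:
  // Returns true if the value was newly inserted.
  bool insert(const T &V) {
    if (!Members.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  bool contains(const T &V) const { return Members.count(V) != 0; }

  std::size_t size() const {
    assert(Members.size() == Order.size() && "containers must stay in sync");
    return Order.size();
  }

  // Iteration walks the vector, so it follows insertion order.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};
```

A loop such as the one over LoopBlockSet quoted above would visit blocks in the same order on every run with a container of this shape, which is the property the reviewer asks to preserve.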

Improved comments based on review.

Please mark addressed comments as done. Also let me know if it is ready for another round of review (I saw some issues not addressed such as the deterministic iteration of block filter set).

Missed a comment to rename something.

In D28583#653869, @davidxl wrote:

Please mark addressed comments as done. Also let me know if it is ready for another round of review (I saw some issues not addressed such as the deterministic iteration of block filter set).

Marked.

I think it's ready, and I put back the deterministic set.

lib/CodeGen/MachineBlockPlacement.cpp
686	Did you still want me to fix something here?

davidxl added inline comments. Jan 23 2017, 2:39 PM

lib/CodeGen/MachineBlockPlacement.cpp
653	test case for this?
683	Add a short cut here with comments: // If P is not larger, the best successor selection loop will eventually select C, not Succ (as it is not profitable to do so). if (P <= Qout) return false;
686	just add a comment above Qin decl stating that Qin is the largest frequency of Succ's incoming edges which have not been placed.
1008	--> ... for lookahead by isProfitableToTailDup when BB has not yet been placed.

More comments from review, and a new test case.

This version looks almost fine except for one remaining unaddressed comment.

lib/CodeGen/MachineBlockPlacement.cpp
683	How about this comment? Early return can 1) speed up the computation and 2) make the following code easier to understand.

iteratee added inline comments. Jan 23 2017, 8:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
653	It's not just a back edge. I added a test case.
683	If we weren't estimating Qout, I'd agree. Instead we'll skip calling this altogether if we know that we won't use the result.

Sorry. I'd replied to the comment, but Phabricator didn't submit it along with my diff update for some reason.

Save the blocks with CFG violations that are duplication candidates. Review them in descending order of probability, so we call isProfitableToTailDup the minimum number of times.
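
The ordering idea described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `DupCandidate`, `pickTailDupCandidate`, and the integer block ids are hypothetical, and the profitability check is passed in as a callback standing in for isProfitableToTailDup. Candidates are visited from most to least probable, and the search stops at the first profitable one, so the check runs as few times as possible.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical candidate record: a block id and the probability of the edge
// leading to it, stored as an integer numerator on some fixed scale.
struct DupCandidate {
  int BlockId;
  unsigned Prob;
};

// Visits candidates in descending probability order and returns the id of
// the first one the callback accepts, or -1 if none is accepted.
int pickTailDupCandidate(std::vector<DupCandidate> Candidates,
                         const std::function<bool(int)> &IsProfitable) {
  assert(IsProfitable && "a profitability callback is required");
  // stable_sort keeps the original relative order of equal-probability
  // candidates, so ties are broken by their original position.
  std::stable_sort(Candidates.begin(), Candidates.end(),
                   [](const DupCandidate &A, const DupCandidate &B) {
                     return A.Prob > B.Prob;
                   });
  for (const DupCandidate &C : Candidates)
    if (IsProfitable(C.BlockId))
      return C.BlockId; // The first hit is the most probable profitable one.
  return -1;
}
```

Because the list is sorted first, any later candidate could only be less probable than the one returned, which is why no further calls are needed.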

davidxl added inline comments. Jan 24 2017, 4:43 PM

lib/CodeGen/MachineBlockPlacement.cpp
1091	Remove the first 'SuccProb > BestProb' check -- it provides only very tiny compile time win depending on the iteration order, but adds more confusion.
1108	No need to set ShouldTailDup in the loop -- it is already initialized outside.
1116	Why not just stable sort it? The vector should be of size 1 for most of the cases. Also, why do you need position?
1126	Should it break instead?
1128	isProfitableToTailDup assumes the baseline layout does not pick Succ. The assumption may not be true here, as there are two other possibilities: (1) Succ == BestSucc.BB in the base layout; (2) BestSucc.BB == null in the base layout (all of BB's successors have conflicts). In these two cases, the isProfitable check should probably be skipped (as duplication is beneficial).

Changes from comments:
Just sort the vector instead of make_heap.
If there is a tail duplication opportunity and no other successor, take it.

lib/CodeGen/MachineBlockPlacement.cpp
1116	Will just sort the vector. Position is because we rely on the successor order being stable and the first successor being a subtle hint. Without the position, we lose track of whether the block in the vector came before or after the block we picked without tail duplication.
1128	Succ won't equal BestSucc.BB because of the continue. These blocks were not chosen by the first loop by construction. Good catch. I'll add that.

Per offline discussion, I removed the ordering constraint for blocks that are profitable to tail-duplicate.

This resulted in a lot of test churn, but the source change is relatively small.

This looks very clean now.

However, the amount of churn reminds me of one thing. Since the profit computation is based on static branch prediction (without PGO), it is the right thing to be a little more conservative in taildup. In other words, instead of making 'isProfitable' return true whenever the taildup cost is smaller than the baseline cost, add a predefined margin (controlled by a parameter):

if (baseline_cost - taildup_cost > threshold)

return true;

return false;

The threshold also roughly models the side effect of taildup -- increased icache footprint etc due to code size increase.

Compare frequencies with a small bias against the tail-duplication side to account for increased icache pressure.

Includes a TODO to handle edge frequencies better in general.

davidxl added inline comments. Jan 30 2017, 4:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
151	Perhaps simplify it to tail-dup-penalty?
620	This basically treats the penalty percent parameter as the threshold of normalized improvement (A-B)/B: if ((A-B)/B > PenaltyPercent/100) return true; The problem with this formula is that if B is very hot, (A-B)/B becomes small even though (A-B) is still large. So I think it is better to compute the normalized improvement as (A-B)/Entry_Freq, that is, the improvement relative to the entry frequency. This will help prevent tail dup from happening in very cold paths. The implementation can make use of BranchProbability as well. Suppose we want to implement the condition: if ((A-B)/Entry_Freq > P/100) return true; do it in these 3 lines: BlockFrequency Profit = A - B; BlockFrequency Threshold = Entry_Freq * BranchProbability(P, 100); return Profit > Threshold;
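
A self-contained sketch of the entry-frequency threshold suggested in the 620 comment follows. It uses plain 64-bit integers in place of LLVM's BlockFrequency and BranchProbability so that it compiles on its own; the alias `Freq`, the function name, and the divide-before-multiply rounding are assumptions made for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a block frequency: a raw 64-bit count.
using Freq = std::uint64_t;

// Returns true when the gain (BaselineCost - TailDupCost) exceeds
// PenaltyPercent percent of the function entry frequency.
bool gainExceedsEntryThreshold(Freq BaselineCost, Freq TailDupCost,
                               Freq EntryFreq, unsigned PenaltyPercent) {
  assert(PenaltyPercent <= 100 && "penalty is expressed as a percentage");
  // No gain at all; checking first also avoids unsigned underflow below.
  if (BaselineCost <= TailDupCost)
    return false;
  Freq Gain = BaselineCost - TailDupCost;
  // Divide before multiplying so the threshold cannot overflow 64 bits.
  // This rounds the threshold down slightly for small entry frequencies.
  Freq Threshold = EntryFreq / 100 * PenaltyPercent;
  return Gain > Threshold;
}
```

With an entry frequency of 1000 and a 2 percent penalty, the threshold is 20, so a gain of 30 passes and a gain of 15 does not. Measuring the gain against the entry frequency rather than against B keeps a large absolute improvement on a hot path from being rejected and filters out small gains on cold paths.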

Use a percentage of the entry frequency as a cutoff.

davidxl added inline comments. Jan 30 2017, 7:23 PM

lib/CodeGen/MachineBlockPlacement.cpp
156	Is this default value too low? Increase it to 5 or 10, perhaps?
625	I suppose this logic here is for rounding errors or overflow? Can you explain why the simple scaling with branch prob (in BranchProbability.cpp) does not work? return Gain > EntryFreq*ThresholdProb;

Simplify the biased comparison.

iteratee marked 4 inline comments as done. Jan 31 2017, 11:36 AM

iteratee added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
156	No, I think we should leave it. Now that it's a flag it's easy to change, and especially comparing with the entry frequency 2% is a big enough margin.
625	I did the math, and found a way to do it simply.

lgtm

(I only sampled some test case changes which look reasonable)

test/CodeGen/X86/bt.ll
27–32	This test has not changed in behavior. Better to revert the change.

This revision is now accepted and ready to land. Jan 31 2017, 1:45 PM

iteratee marked an inline comment as done. Jan 31 2017, 1:48 PM

iteratee added inline comments.

test/CodeGen/X86/bt.ll
27–32	I'll do a complete check for any tests that fall into this category and revert them.

Closed by commit rL293716: CodeGen: Allow small copyable blocks to "break" the CFG. (authored by iteratee). Jan 31 2017, 3:59 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

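Path                                                   Size

lib/CodeGen/BranchFolding.h                            1 line
lib/CodeGen/BranchFolding.cpp                          5 lines
lib/CodeGen/MachineBlockPlacement.cpp                  371 lines
test/CodeGen/AArch64/arm64-atomic.ll                   22 lines
test/CodeGen/AArch64/arm64-shrink-wrapping.ll          14 lines
test/CodeGen/AArch64/tail-dup-repeat-worklist.ll
test/CodeGen/AArch64/tbz-tbnz.ll                       16 lines
test/CodeGen/AMDGPU/branch-relaxation.ll               16 lines
test/CodeGen/AMDGPU/uniform-cfg.ll                     22 lines
test/CodeGen/ARM/arm-and-tst-peephole.ll               6 lines
test/CodeGen/ARM/atomic-op.ll                          4 lines
test/CodeGen/ARM/atomic-ops-v8.ll                      35 lines
test/CodeGen/ARM/cmpxchg-weak.ll                       8 lines
test/CodeGen/Mips/brconnez.ll                          4 lines
test/CodeGen/Mips/micromips-compact-branches.ll        3 lines
test/CodeGen/PowerPC/misched-inorder-latency.ll        4 lines
test/CodeGen/PowerPC/tail-dup-break-cfg.ll             140 lines
test/CodeGen/PowerPC/tail-dup-layout.ll                86 lines
test/CodeGen/SPARC/sjlj.ll                             9 lines
test/CodeGen/SystemZ/int-cmp-44.ll                     6 lines
test/CodeGen/Thumb/thumb-shrink-wrapping.ll            11 lines
test/CodeGen/Thumb2/cbnz.ll                            2 lines
test/CodeGen/Thumb2/ifcvt-compare.ll                   2 lines
test/CodeGen/Thumb2/v8_IT_4.ll                         5 lines
test/CodeGen/WebAssembly/phi.ll                        5 lines
test/CodeGen/X86/ (file name missing)                  5 lines
test/CodeGen/X86/ (file name missing)                  215 lines
test/CodeGen/X86/ (file name missing)                  4 lines
test/CodeGen/X86/ (file name missing)                  4 lines
test/CodeGen/X86/ (file name missing)                  3 lines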

Diff 86381

lib/CodeGen/BranchFolding.h

Show First 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	public:
public:		public:
MBFIWrapper(const MachineBlockFrequencyInfo &I) : MBFI(I) {}		MBFIWrapper(const MachineBlockFrequencyInfo &I) : MBFI(I) {}
BlockFrequency getBlockFreq(const MachineBasicBlock *MBB) const;		BlockFrequency getBlockFreq(const MachineBasicBlock *MBB) const;
void setBlockFreq(const MachineBasicBlock *MBB, BlockFrequency F);		void setBlockFreq(const MachineBasicBlock *MBB, BlockFrequency F);
raw_ostream &printBlockFreq(raw_ostream &OS,		raw_ostream &printBlockFreq(raw_ostream &OS,
const MachineBasicBlock *MBB) const;		const MachineBasicBlock *MBB) const;
raw_ostream &printBlockFreq(raw_ostream &OS,		raw_ostream &printBlockFreq(raw_ostream &OS,
const BlockFrequency Freq) const;		const BlockFrequency Freq) const;
		uint64_t getEntryFreq() const;

private:		private:
const MachineBlockFrequencyInfo &MBFI;		const MachineBlockFrequencyInfo &MBFI;
DenseMap<const MachineBasicBlock *, BlockFrequency> MergedBBFreq;		DenseMap<const MachineBasicBlock *, BlockFrequency> MergedBBFreq;
};		};

private:		private:
MBFIWrapper &MBBFreqInfo;		MBFIWrapper &MBBFreqInfo;
Show All 32 Lines

lib/CodeGen/BranchFolding.cpp

	Show First 20 Lines • Show All 492 Lines • ▼ Show 20 Lines
	}			}

	raw_ostream &			raw_ostream &
	BranchFolder::MBFIWrapper::printBlockFreq(raw_ostream &OS,			BranchFolder::MBFIWrapper::printBlockFreq(raw_ostream &OS,
	const BlockFrequency Freq) const {			const BlockFrequency Freq) const {
	return MBFI.printBlockFreq(OS, Freq);			return MBFI.printBlockFreq(OS, Freq);
	}			}

				uint64_t
				BranchFolder::MBFIWrapper::getEntryFreq() const {
				return MBFI.getEntryFreq();
				}

	/// CountTerminators - Count the number of terminators in the given			/// CountTerminators - Count the number of terminators in the given
	/// block and set I to the position of the first non-terminator, if there			/// block and set I to the position of the first non-terminator, if there
	/// is one, or MBB->end() otherwise.			/// is one, or MBB->end() otherwise.
	static unsigned CountTerminators(MachineBasicBlock *MBB,			static unsigned CountTerminators(MachineBasicBlock *MBB,
	MachineBasicBlock::iterator &I) {			MachineBasicBlock::iterator &I) {
	I = MBB->end();			I = MBB->end();
	unsigned NumTerms = 0;			unsigned NumTerms = 0;
	for (;;) {			for (;;) {
	▲ Show 20 Lines • Show All 1,425 Lines • Show Last 20 Lines

lib/CodeGen/MachineBlockPlacement.cpp

Show All 34 Lines
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
		#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/CodeGen/TailDuplicator.h"		#include "llvm/CodeGen/TailDuplicator.h"
#include "llvm/Support/Allocator.h"		#include "llvm/Support/Allocator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetLowering.h"		#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
#include <algorithm>		#include <algorithm>
		#include <functional>
		#include <utility>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "block-placement"		#define DEBUG_TYPE "block-placement"

STATISTIC(NumCondBranches, "Number of conditional branches");		STATISTIC(NumCondBranches, "Number of conditional branches");
STATISTIC(NumUncondBranches, "Number of unconditional branches");		STATISTIC(NumUncondBranches, "Number of unconditional branches");
STATISTIC(CondBranchTakenFreq,		STATISTIC(CondBranchTakenFreq,
"Potential frequency of taking conditional branches");		"Potential frequency of taking conditional branches");
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines

static cl::opt<bool>		static cl::opt<bool>
BranchFoldPlacement("branch-fold-placement",		BranchFoldPlacement("branch-fold-placement",
cl::desc("Perform branch folding during placement. "		cl::desc("Perform branch folding during placement. "
"Reduces code size."),		"Reduces code size."),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

// Heuristic for tail duplication.		// Heuristic for tail duplication.
static cl::opt<unsigned> TailDuplicatePlacementThreshold(		static cl::opt<unsigned> TailDupPlacementThreshold(
"tail-dup-placement-threshold",		"tail-dup-placement-threshold",
cl::desc("Instruction cutoff for tail duplication during layout. "		cl::desc("Instruction cutoff for tail duplication during layout. "
"Tail merging during layout is forced to have a threshold "		"Tail merging during layout is forced to have a threshold "
"that won't conflict."), cl::init(2),		"that won't conflict."), cl::init(2),
cl::Hidden);		cl::Hidden);

		// Heuristic for tail duplication.
		static cl::opt<unsigned> TailDupPlacementPenalty(
		"tail-dup-placement-penalty",
		cl::desc("Cost penalty for blocks that can avoid breaking CFG by copying. "
		"Copying can increase fallthrough, but it also increases icache "
		"pressure. This parameter controls the penalty to account for that. "
		"Percent as integer."),
		cl::init(2),
		cl::Hidden);

extern cl::opt<unsigned> StaticLikelyProb;		extern cl::opt<unsigned> StaticLikelyProb;
extern cl::opt<unsigned> ProfileLikelyProb;		extern cl::opt<unsigned> ProfileLikelyProb;

namespace {		namespace {
class BlockChain;		class BlockChain;
/// \brief Type for our function-wide basic block -> block chain mapping.		/// \brief Type for our function-wide basic block -> block chain mapping.
typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;		typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;
}		}
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	#endif // NDEBUG
/// and then once for the function as a whole.		/// and then once for the function as a whole.
unsigned UnscheduledPredecessors;		unsigned UnscheduledPredecessors;
};		};
}		}

namespace {		namespace {
class MachineBlockPlacement : public MachineFunctionPass {		class MachineBlockPlacement : public MachineFunctionPass {
/// \brief A typedef for a block filter set.		/// \brief A typedef for a block filter set.
typedef SmallSetVector<MachineBasicBlock *, 16> BlockFilterSet;		typedef SmallSetVector<MachineBasicBlock *, 16> BlockFilterSet;

		/// Pair struct containing basic block and taildup profitiability
		struct BlockAndTailDupResult {
		MachineBasicBlock * BB;
		bool ShouldTailDup;
		};

/// \brief work lists of blocks that are ready to be laid out		/// \brief work lists of blocks that are ready to be laid out
SmallVector<MachineBasicBlock *, 16> BlockWorkList;		SmallVector<MachineBasicBlock *, 16> BlockWorkList;
SmallVector<MachineBasicBlock *, 16> EHPadWorkList;		SmallVector<MachineBasicBlock *, 16> EHPadWorkList;

/// \brief Machine Function		/// \brief Machine Function
MachineFunction *F;		MachineFunction *F;

/// \brief A handle to the branch probability pass.		/// \brief A handle to the branch probability pass.
Show All 11 Lines	class MachineBlockPlacement : public MachineFunctionPass {
MachineBasicBlock *PreferredLoopExit;		MachineBasicBlock *PreferredLoopExit;

/// \brief A handle to the target's instruction info.		/// \brief A handle to the target's instruction info.
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;

/// \brief A handle to the target's lowering info.		/// \brief A handle to the target's lowering info.
const TargetLoweringBase *TLI;		const TargetLoweringBase *TLI;

/// \brief A handle to the post dominator tree.		/// \brief A handle to the dominator tree.
MachineDominatorTree *MDT;		MachineDominatorTree *MDT;

		/// \brief A handle to the post dominator tree.
		MachinePostDominatorTree *MPDT;

/// \brief Duplicator used to duplicate tails during placement.		/// \brief Duplicator used to duplicate tails during placement.
///		///
/// Placement decisions can open up new tail duplication opportunities, but		/// Placement decisions can open up new tail duplication opportunities, but
/// since tail duplication affects placement decisions of later blocks, it		/// since tail duplication affects placement decisions of later blocks, it
/// must be done inline.		/// must be done inline.
TailDuplicator TailDup;		TailDuplicator TailDup;

/// \brief A set of blocks that are unavoidably execute, i.e. they dominate		/// \brief A set of blocks that are unavoidably execute, i.e. they dominate
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	bool maybeTailDuplicateBlock(MachineBasicBlock BB, MachineBasicBlock LPred,
BlockFilterSet *BlockFilter,		BlockFilterSet *BlockFilter,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
bool &DuplicatedToPred);		bool &DuplicatedToPred);
bool		bool
hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ,		hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ,
BlockChain &SuccChain, BranchProbability SuccProb,		BlockChain &SuccChain, BranchProbability SuccProb,
BranchProbability RealSuccProb, BlockChain &Chain,		BranchProbability RealSuccProb, BlockChain &Chain,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);
MachineBasicBlock *selectBestSuccessor(MachineBasicBlock *BB,		BlockAndTailDupResult selectBestSuccessor(MachineBasicBlock *BB,
BlockChain &Chain,		BlockChain &Chain,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);
MachineBasicBlock *		MachineBasicBlock *
selectBestCandidateBlock(BlockChain &Chain,		selectBestCandidateBlock(BlockChain &Chain,
SmallVectorImpl<MachineBasicBlock *> &WorkList);		SmallVectorImpl<MachineBasicBlock *> &WorkList);
MachineBasicBlock *		MachineBasicBlock *
getFirstUnplacedBlock(const BlockChain &PlacedChain,		getFirstUnplacedBlock(const BlockChain &PlacedChain,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);

Show All 16 Lines	#endif
void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,		void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,
const BlockFilterSet &LoopBlockSet);		const BlockFilterSet &LoopBlockSet);
void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,		void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,
const BlockFilterSet &LoopBlockSet);		const BlockFilterSet &LoopBlockSet);
void collectMustExecuteBBs();		void collectMustExecuteBBs();
void buildCFGChains();		void buildCFGChains();
void optimizeBranches();		void optimizeBranches();
void alignBlocks();		void alignBlocks();
		bool shouldTailDuplicate(MachineBasicBlock *BB);
		/// Check the edge frequencies to see if tail duplication will increase
		/// fallthroughs.
		bool isProfitableToTailDup(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BranchProbability AdjustedSumProb,
		BlockChain &Chain, const BlockFilterSet *BlockFilter);
		/// Returns true if a block can tail duplicate into all unplaced
		/// predecessors. Filters based on loop.
		bool canTailDuplicateUnplacedPreds(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BlockChain &Chain, const BlockFilterSet *BlockFilter);

public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MachineBlockPlacement() : MachineFunctionPass(ID) {		MachineBlockPlacement() : MachineFunctionPass(ID) {
initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());		initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<MachineBranchProbabilityInfo>();		AU.addRequired<MachineBranchProbabilityInfo>();
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
		if (TailDupPlacement)
		AU.addRequired<MachinePostDominatorTree>();
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
AU.addRequired<TargetPassConfig>();		AU.addRequired<TargetPassConfig>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};
}		}

char MachineBlockPlacement::ID = 0;		char MachineBlockPlacement::ID = 0;
char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;		char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;
INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",
"Branch Probability Basic Block Placement", false, false)		"Branch Probability Basic Block Placement", false, false)
INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)
INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
		INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)		INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement",
"Branch Probability Basic Block Placement", false, false)		"Branch Probability Basic Block Placement", false, false)

#ifndef NDEBUG		#ifndef NDEBUG
/// \brief Helper to print the name of a MBB.		/// \brief Helper to print the name of a MBB.
///		///
/// Only used by debug logging.		/// Only used by debug logging.
[115 lines not shown]	getAdjustedProbability(BranchProbability OrigProb,
if (SuccProbN >= SuccProbD)		if (SuccProbN >= SuccProbD)
SuccProb = BranchProbability::getOne();		SuccProb = BranchProbability::getOne();
else		else
SuccProb = BranchProbability(SuccProbN, SuccProbD);		SuccProb = BranchProbability(SuccProbN, SuccProbD);

return SuccProb;		return SuccProb;
}		}

		/// Check if a block should be tail duplicated.
		/// \p BB Block to check.
		bool MachineBlockPlacement::shouldTailDuplicate(MachineBasicBlock *BB) {
		// Blocks with single successors don't create additional fallthrough
		// opportunities. Don't duplicate them. TODO: When conditional exits are
		// analyzable, allow them to be duplicated.
		bool IsSimple = TailDup.isSimpleBB(BB);

		if (BB->succ_size() == 1)
		return false;
		return TailDup.shouldTailDuplicate(IsSimple, *BB);
		}

		/// Compare 2 BlockFrequency's with a small penalty for \p A.
		/// In order to be conservative, we apply a X% penalty to account for
		/// increased icache pressure and static heuristics. For small frequencies
		/// we use only the numerators to improve accuracy. For simplicity, we assume the
		/// penalty is less than 100%
		/// TODO(iteratee): Use 64-bit fixed point edge frequencies everywhere.
		static bool greaterWithBias(BlockFrequency A, BlockFrequency B,
		davidxl [Not Done]: This basically treats the penalty percent parameter as the threshold of normalized improvement, (A-B)/B:
		    if ((A-B)/B > PenaltyPercent/100) return true;
		The problem with this formula is that if B is very hot, (A-B)/B becomes small even though (A-B) is still large. So I think it is better to compute the normalized improvement as (A-B)/Entry_Freq, that is, the improvement relative to the entry frequency. This will help prevent tail dup from happening in very cold paths. The implementation can make use of BranchProbability as well. Suppose we want to implement the condition:
		    if ((A-B)/Entry_Freq > P/100) return true;
		These three lines do it:
		    BlockFrequency Profit = A - B;
		    BlockFrequency Threshold = Entry_Freq * BranchProbability(P, 100);
		    return Profit > Threshold;
		uint64_t EntryFreq) {
		if (B > A)
		return false;
		BlockFrequency Gain = A - B;
		if (Gain.getFrequency() < (uint64_t) 1 << 32) {
		davidxl [Done]: I suppose this logic here is for rounding errors or overflow? Can you explain why the simple scaling with branch prob (in BranchProbability.cpp) does not work? return Gain > EntryFreq * ThresholdProb;
		iteratee [Not Done]: I did the math, and found a way to do it simply.
		if (EntryFreq < (uint64_t) 1 << 40) {
		return (Gain.getFrequency() * 100 > EntryFreq * TailDupPlacementPenalty);
		}
		return false;
		} else {
		BranchProbability ThresholdProb(TailDupPlacementPenalty, 100);
		return (Gain / ThresholdProb).getFrequency() > EntryFreq;
		}
		}
		davidxl [Done]: Dom -> PDom
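
For readers following the greaterWithBias thread, here is a minimal stand-alone sketch of the comparison davidxl describes: is the gain (A - B) larger than a fixed percentage of the entry frequency? It uses plain 64-bit integers in place of BlockFrequency and BranchProbability, and the name gainExceedsThreshold is hypothetical. It is an illustration under those assumptions, not the code in the patch.

```cpp
#include <cstdint>

// Hypothetical stand-alone model of the comparison; not the LLVM code.
// Returns true when (A - B) exceeds PenaltyPercent percent of EntryFreq.
// Assumes PenaltyPercent < 100, as the patch comment does.
bool gainExceedsThreshold(uint64_t A, uint64_t B, uint64_t EntryFreq,
                          unsigned PenaltyPercent) {
  if (B > A)
    return false;
  uint64_t Gain = A - B;
  // floor(EntryFreq * PenaltyPercent / 100), split into quotient and
  // remainder parts so that no intermediate product overflows 64 bits.
  uint64_t Threshold = (EntryFreq / 100) * PenaltyPercent +
                       (EntryFreq % 100) * PenaltyPercent / 100;
  return Gain > Threshold;
}
```

Because Gain is an integer, comparing against the floored threshold gives the same answer as comparing against the exact fraction.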

		/// Check the edge frequencies to see if tail duplication will increase
		/// fallthroughs. It only makes sense to call this function when
		/// \p Succ would not be chosen otherwise. Tail duplication of \p Succ is
		/// always locally profitable if we would have picked \p Succ without
		/// considering duplication.
		bool MachineBlockPlacement::isProfitableToTailDup(
		davidxl [Done]: Why not just check if there exists a SuccSucc that post dominates Succ directly?
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BranchProbability QProb,
		BlockChain &Chain, const BlockFilterSet *BlockFilter) {
		// We need to do a probability calculation to make sure this is profitable.
		// First: does succ have a successor that post-dominates? This affects the
		// calculation. The 2 relevant cases are:
		// BB BB
		// | \Qout | \Qout
		// P| C |P C
		// = C' = C'
		// | /Qin | /Qin
		// | / | /
		davidxl [Done]: I assume this is loop back edge source block. You need a test case to cover it.
		davidxl [Done]: test case for this?
		iteratee [Not Done]: It's not just a back edge. I added a test case.
		// Succ Succ
		// / \ | \ V
		// U/ =V |U \
		// / \ = D
		// D E | /
		// | /
		// |/
		davidxl [Done]: PU + PV == P. Also, assuming U > V, the layout order (with tail dup) should be BB --> Succ --> D so the overall cost is: Q + P V + Q ( which is smaller than Q + QV + PU + PV)
		iteratee [Done]: I thought that too. But without the lattice patch, after duplication, we won't put D after Succ because it now has an unplaced predecessor. The lattice patch fixes the behavior and the calculation.
		// PDom
		davidxl [Done]: We cannot assume BB and Succ are in a triangular shape subcfg here, given where this method is called. Besides, if Succ is not tail-duped, the layout decision may even reject Succ as the layout successor, so the cost is no longer P + V, but 2Q + V instead (with U > V). In other words, the isProfitable check cannot be done inside 'hasBetterLayoutPredecessor', but should be hoisted to its caller, after 'hasBetterLayoutPredecessor' returns, at which point we will know the layout decision if taildup does not kick in.
		iteratee [Done]: I think I have the calculation right for when Succ would not be the layout successor, but you're right to point out that we should also do the calculation even when Succ is the chosen successor.
		iteratee [Done]: We now only call this function to check if we should use Succ despite it having been rejected. So we know that Succ is not the layout successor.
		// '=' : Branch taken for that CFG edge
		// In the second case, Placing Succ while duplicating it into C prevents the
		// fallthrough of Succ into either D or PDom, because they now have C as an
		davidxl [Done]: Why break here?
		iteratee [Done]: Because if PDom is not null, that's all that we look at for the probability calculation.
		// unplaced predecessor

		// Start by figuring out which case we fall into
		MachineBasicBlock *PDom = nullptr;
		SmallVector<MachineBasicBlock *, 4> SuccSuccs;
		davidxl [Done]: nit: -->SuccBestPred
		// Only scan the relevant successors
		auto AdjustedSuccSumProb =
		collectViableSuccessors(Succ, Chain, BlockFilter, SuccSuccs);
		BranchProbability PProb = MBPI->getEdgeProbability(BB, Succ);
		auto BBFreq = MBFI->getBlockFreq(BB);
		auto SuccFreq = MBFI->getBlockFreq(Succ);
		BlockFrequency P = BBFreq * PProb;
		davidxl [Done]: In this case, what is needed is to invoke 'hasBetterLayoutPredecessor' on the PDom block. Depending on the result, we will know that without tailDup, the layout order is Succ->PDom or Succ->D->PDom. This will make the cost computation more precise.
		BlockFrequency Qout = BBFreq * QProb;
		davidxl [Done]: Computing BestSuccPred here is unnecessary. See below for more comments.
		uint64_t EntryFreq = MBFI->getEntryFreq();
		// If there are no more successors, it is profitable to copy, as it strictly
		// increases fallthrough.
		if (SuccSuccs.size() == 0)
		return greaterWithBias(P, Qout, EntryFreq);

		davidxl [Done]: Add a short cut here with comments:
		    // If P is not larger, the best successor selection loop will eventually select C, not Succ (as it is not profitable to do so).
		    if (P <= Qout) return false;
		iteratee [Not Done]: If we weren't estimating Qout, I'd agree. Instead we'll skip calling this altogether if we know that we won't use the result.
		davidxl [Done]: How about this comment? Early return can 1) speed up the computation and 2) make the following code easier to understand.
		auto BestSuccSucc = BranchProbability::getZero();
		// Find the PDom or the best Succ if no PDom exists.
		for (MachineBasicBlock *SuccSucc : SuccSuccs) {
		davidxl [Not Done]: Qin is not necessarily BestSuccPred. The profitability check is called only after hasBetterLayoutPredecessor has returned true. There are two scenarios in which it returns true: 1) Qin or Qout is larger than P, or 2) P is larger than Qout, but the branch is not biased enough, so the layout algorithm still decides to keep the top-order. Either way, the baseline layout to compare (with taildup) is that BB->Succ is the branch taken edge, and BB->C is the fall through edge. Qin should just be Prob(BB->C)
		iteratee [Not Done]: When we place Succ, we remove 2 fallthrough edges BB->C and C'->Succ. Freq(C'->Succ) may be larger than Freq(BB->C). I am using Qin to represent Freq(C'->Succ) and Qout for Freq(BB->C). I could just use different letters if that were more clear. Qout is Freq(BB->C). I don't think Qin should be as well.
		davidxl [Not Done]: Differentiating Qin and Qout is fine, but in the code Qin = BestSuccPred, which could be Freq(BB->Succ). What I meant is you should directly compute Qin as its definition, Freq(C'->Succ)
		iteratee [Not Done]: Did you still want me to fix something here?
		davidxl [Done]: Just add a comment above the Qin decl stating that Qin is the largest frequency of Succ's incoming edges which have not been placed.
		auto Prob = MBPI->getEdgeProbability(Succ, SuccSucc);
		if (Prob > BestSuccSucc)
		BestSuccSucc = Prob;
		if (PDom == nullptr)
		if (MPDT->dominates(SuccSucc, Succ)) {
		PDom = SuccSucc;
		break;
		}
		}
		// For the comparisons, we need to know Succ's best incoming edge that isn't
		// from BB.
		auto SuccBestPred = BlockFrequency(0);
		for (MachineBasicBlock *SuccPred : Succ->predecessors()) {
		if (SuccPred == Succ || SuccPred == BB
		|| BlockToChain[SuccPred] == &Chain
		|| (BlockFilter && !BlockFilter->count(SuccPred)))
		davidxl [Done]: PDom is always a successor of Succ according to the way it is computed.
		iteratee [Done]: Thanks.
		continue;
		auto Freq = MBFI->getBlockFreq(SuccPred)
		* MBPI->getEdgeProbability(SuccPred, Succ);
		if (Freq > SuccBestPred)
		SuccBestPred = Freq;
		}
		davidxl [Done]: The base cost is as written; the DupCost, however, depends on whether P > Q or not. If P > Q, the fallthrough path is BB->Succ->D, so the cost (normalized with freq(bb) == 1) is 2Q + PV. If P < Q, the fallthrough path is BB->C'->D, and the cost is 2P + QV.
		iteratee [Not Done]: This function is called in a loop looking for the highest probability successor. If Q > P, this function will be ignored and we will lay out Q anyway, so we can ignore the second case. As to the first case: until the 2nd patch lands, the duplication will prevent the BB->Succ->D layout. Instead you will get BB->Succ ; C'->D. So the cost is as calculated. D28522 will include an update to this calculation along with an update to the behavior.
		davidxl [Done]: You are right about the Q > P case: that scenario will be dropped. It is very subtle, so please add some comment to clarify. Ok -- for the first case, also add a comment.
		// Qin is Succ's best unplaced incoming edge that isn't BB
		BlockFrequency Qin = SuccBestPred;
		// If it doesn't have a post-dominating successor, here is the calculation:
		// BB BB
		// | \Qout | \
		// P| C | =
		// = C' | C
		// | /Qin | |
		// | / | C' (+Succ)
		// Succ Succ /|
		// / \ | \/ |
		// U/ =V = /= =
		// / \ | / \|
		// D E D E
		// '=' : Branch taken for that CFG edge
		// Cost in the first case is: P + V
		// For this calculation, we always assume P > Qout. If Qout > P
		// The result of this function will be ignored at the caller.
		// Cost in the second case is: Qout + Qin * V + P * U + P * V
		// TODO(iteratee): If we lay out D after Succ, the P * U term
		// goes away. This logic is coming in D28522.

		if (PDom == nullptr || !Succ->isSuccessor(PDom)) {
		BranchProbability UProb = BestSuccSucc;
		BranchProbability VProb = AdjustedSuccSumProb - UProb;
		BlockFrequency V = SuccFreq * VProb;
		BlockFrequency QinV = Qin * VProb;
		BlockFrequency BaseCost = P + V;
		BlockFrequency DupCost = Qout + QinV + P * AdjustedSuccSumProb;
		return greaterWithBias(BaseCost, DupCost, EntryFreq);
		}
		BranchProbability UProb = MBPI->getEdgeProbability(Succ, PDom);
		BranchProbability VProb = AdjustedSuccSumProb - UProb;
		BlockFrequency U = SuccFreq * UProb;
		BlockFrequency V = SuccFreq * VProb;
		// If there is a post-dominating successor, here is the calculation:
		// BB BB BB BB
		// | \Qout | \ | \Qout | \
		// |P C | = |P C | =
		// = C' |P C = C' |P C
		// | /Qin | | | /Qin | |
		// | / | C' (+Succ) | / | C' (+Succ)
		// Succ Succ /| Succ Succ /|
		// | \ V | \/ | | \ V | \/ |
		// |U \ |U /\ | |U = |U /\ |
		// = D = = =| | D | = =|
		// | / |/ D | / |/ D
		// | / | / | = | /
		// |/ | / |/ | =
		// Dom Dom Dom Dom
		// '=' : Branch taken for that CFG edge
		// The cost for taken branches in the first case is P + U
		// The cost in the second case (assuming independence), given the layout:
		// BB, Succ, (C+Succ), D, Dom
		// is Qout + P * V + Qin * U
		// compare P + U vs Qout + P * V + Qin * U.
		//
		// The 3rd and 4th cases cover when Dom would be chosen to follow Succ.
		//
		// For the 3rd case, the cost is P + 2 * V
		// For the 4th case, the cost is Qout + Qin * U + P * V + V
		// We choose 4 over 3 when (P + V) > Qout + Qin * U + P * V
		if (UProb > AdjustedSuccSumProb / 2
		&& !hasBetterLayoutPredecessor(Succ, PDom, *BlockToChain[PDom],
		UProb, UProb, Chain, BlockFilter)) {
		// Cases 3 & 4
		return greaterWithBias((P + V), (Qout + Qin * UProb + P * VProb),
		EntryFreq);
		}
		// Cases 1 & 2
		return greaterWithBias(
		(P + U), (Qout + Qin * UProb + P * AdjustedSuccSumProb), EntryFreq);
		}
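
To make the no-post-dominator cost comparison above concrete, the following is a small sketch that evaluates "P + V" against "Qout + Qin * V + P * U + P * V" using doubles. The struct TailDupInputs and the function names are hypothetical, the penalty bias applied by greaterWithBias is deliberately left out, and the patch itself works on BlockFrequency and BranchProbability rather than floating point.

```cpp
// Hypothetical illustration with doubles; the patch uses BlockFrequency and
// BranchProbability instead, and also applies a penalty bias that is
// omitted here.
struct TailDupInputs {
  double P;        // freq(BB -> Succ)
  double Qout;     // freq(BB -> C)
  double Qin;      // best unplaced incoming edge of Succ that is not from BB
  double SuccFreq; // freq(Succ)
  double UProb;    // probability of Succ's best successor edge
  double VProb;    // remaining viable successor probability of Succ
};

// Taken-branch cost without duplication: P + V, where V = SuccFreq * VProb.
double baseCost(const TailDupInputs &In) {
  return In.P + In.SuccFreq * In.VProb;
}

// Taken-branch cost with duplication: Qout + Qin * V + P * (U + V).
double dupCost(const TailDupInputs &In) {
  return In.Qout + In.Qin * In.VProb + In.P * (In.UProb + In.VProb);
}

// Duplication looks worthwhile when it lowers the taken-branch cost.
bool duplicationLooksCheaper(const TailDupInputs &In) {
  return baseCost(In) > dupCost(In);
}
```

With P = 90, Qout = 10, Qin = 10, freq(Succ) = 100 and U = V = 0.5, the base cost is 140 and the duplicated cost is 105, so duplication wins; with P = 60, Qout = 40, Qin = 40, U = 0.75 and V = 0.25 the costs are 85 and 110, so it does not.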


		/// When the option TailDupPlacement is on, this method checks if the
		/// fallthrough candidate block \p Succ (of block \p BB) can be tail-duplicated
		/// into all of its unplaced, unfiltered predecessors, that are not BB.
		bool MachineBlockPlacement::canTailDuplicateUnplacedPreds(
		MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &Chain,
		const BlockFilterSet *BlockFilter) {
		if (!shouldTailDuplicate(Succ))
		return false;

		for (MachineBasicBlock *Pred : Succ->predecessors()) {
		// Make sure all unplaced and unfiltered predecessors can be
		// tail-duplicated into.
		if (Pred == BB || (BlockFilter && !BlockFilter->count(Pred))
		|| BlockToChain[Pred] == &Chain)
		continue;
		if (!TailDup.canTailDuplicate(Succ, Pred))
		return false;
		}
		return true;
		}

/// When the option OutlineOptionalBranches is on, this method		/// When the option OutlineOptionalBranches is on, this method
/// checks if the fallthrough candidate block \p Succ (of block		/// checks if the fallthrough candidate block \p Succ (of block
/// \p BB) also has other unscheduled predecessor blocks which		/// \p BB) also has other unscheduled predecessor blocks which
/// are also successors of \p BB (forming triangular shape CFG).		/// are also successors of \p BB (forming triangular shape CFG).
/// If none of such predecessors are small, it returns true.		/// If none of such predecessors are small, it returns true.
/// The caller can choose to select \p Succ as the layout successors		/// The caller can choose to select \p Succ as the layout successors
/// so that \p Succ's predecessors (optional branches) can be		/// so that \p Succ's predecessors (optional branches) can be
/// outlined.		/// outlined.
[32 lines not shown]	static BranchProbability getLayoutSuccessorProbThreshold(
if (!BB->getParent()->getFunction()->getEntryCount())		if (!BB->getParent()->getFunction()->getEntryCount())
return BranchProbability(StaticLikelyProb, 100);		return BranchProbability(StaticLikelyProb, 100);
if (BB->succ_size() == 2) {		if (BB->succ_size() == 2) {
const MachineBasicBlock *Succ1 = *BB->succ_begin();		const MachineBasicBlock *Succ1 = *BB->succ_begin();
const MachineBasicBlock *Succ2 = *(BB->succ_begin() + 1);		const MachineBasicBlock *Succ2 = *(BB->succ_begin() + 1);
if (Succ1->isSuccessor(Succ2) || Succ2->isSuccessor(Succ1)) {		if (Succ1->isSuccessor(Succ2) || Succ2->isSuccessor(Succ1)) {
/* See case 1 below for the cost analysis. For BB->Succ to		/* See case 1 below for the cost analysis. For BB->Succ to
* be taken with smaller cost, the following needs to hold:		* be taken with smaller cost, the following needs to hold:
* Prob(BB->Succ) > 2* Prob(BB->Pred)		* Prob(BB->Succ) > 2 * Prob(BB->Pred)
* So the threshold T		* So the threshold T in the calculation below
* T = 2 * (1-Prob(BB->Pred). Since T + Prob(BB->Pred) == 1,		* (1-T) * Prob(BB->Succ) > T * Prob(BB->Pred)
* We have T + T/2 = 1, i.e. T = 2/3. Also adding user specified		* So T / (1 - T) = 2, Yielding T = 2/3
* branch bias, we have		* Also adding user specified branch bias, we have
* T = (2/3)*(ProfileLikelyProb/50)		* T = (2/3)*(ProfileLikelyProb/50)
* = (2*ProfileLikelyProb)/150)		* = (2*ProfileLikelyProb)/150)
*/		*/
return BranchProbability(2 * ProfileLikelyProb, 150);		return BranchProbability(2 * ProfileLikelyProb, 150);
}		}
}		}
return BranchProbability(ProfileLikelyProb, 100);		return BranchProbability(ProfileLikelyProb, 100);
}		}
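
As a quick check of the arithmetic in the comment above, here is a tiny sketch that returns the triangle-case threshold as a numerator/denominator pair. The helper name triangleThreshold is hypothetical; it only mirrors the BranchProbability(2 * ProfileLikelyProb, 150) expression and is not part of the patch.

```cpp
#include <utility>

// Hypothetical helper: the threshold T = (2/3) * (ProfileLikelyProb / 50),
// returned as {numerator, denominator} so the fraction stays exact.
std::pair<unsigned, unsigned> triangleThreshold(unsigned ProfileLikelyProb) {
  return {2 * ProfileLikelyProb, 150};
}
```

For a ProfileLikelyProb of 50 this gives 100/150, which is the T = 2/3 obtained from T / (1 - T) = 2.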

/// Checks to see if the layout candidate block \p Succ has a better layout		/// Checks to see if the layout candidate block \p Succ has a better layout
/// predecessor than \c BB. If yes, returns true.		/// predecessor than \c BB. If yes, returns true.
		/// \p SuccProb: The probability adjusted for only remaining blocks.
		/// Only used for logging
		/// \p RealSuccProb: The un-adjusted probability.
		/// \p Chain: The chain that BB belongs to and Succ is being considered for.
		/// \p BlockFilter: if non-null, the set of blocks that make up the loop being
		/// considered
bool MachineBlockPlacement::hasBetterLayoutPredecessor(		bool MachineBlockPlacement::hasBetterLayoutPredecessor(
		davidxl [Not Done]: Add more description about what blocks to ignore.
		iteratee [Not Done]: Well, that's really up to the caller. Do you want me to list why you might want to ignore a block?
		davidxl [Done]: Something like: e.g., when called under xxx, we want to ignore yyy. See caller zzz for details. However, see my comment in the function; this parameter seems unnecessary.
MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,		MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,
BranchProbability SuccProb, BranchProbability RealSuccProb,		BranchProbability SuccProb, BranchProbability RealSuccProb,
BlockChain &Chain, const BlockFilterSet *BlockFilter) {		BlockChain &Chain, const BlockFilterSet *BlockFilter) {

// There isn't a better layout when there are no unscheduled predecessors.		// There isn't a better layout when there are no unscheduled predecessors.
if (SuccChain.UnscheduledPredecessors == 0)		if (SuccChain.UnscheduledPredecessors == 0)
return false;		return false;

[114 lines not shown]	bool MachineBlockPlacement::hasBetterLayoutPredecessor(
// Make sure that a hot successor doesn't have a globally more		// Make sure that a hot successor doesn't have a globally more
// important predecessor.		// important predecessor.
BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;		BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;
bool BadCFGConflict = false;		bool BadCFGConflict = false;

for (MachineBasicBlock *Pred : Succ->predecessors()) {		for (MachineBasicBlock *Pred : Succ->predecessors()) {
if (Pred == Succ || BlockToChain[Pred] == &SuccChain ||		if (Pred == Succ || BlockToChain[Pred] == &SuccChain ||
(BlockFilter && !BlockFilter->count(Pred)) ||		(BlockFilter && !BlockFilter->count(Pred)) ||
BlockToChain[Pred] == &Chain)		BlockToChain[Pred] == &Chain ||
		// This check is redundant except for look ahead. This function is
		davidxl [Done]: I think it is equivalent to check Pred == BB. In the normal calling context, this is covered by BlockToChain[Pred] == &Chain, but for the lookahead case, it is needed to filter BB, which is not laid out yet.
		// called for lookahead by isProfitableToTailDup when BB hasn't been
		davidxl [Done]: --> ... for lookahead by isProfitableToTailDup when BB has not yet been placed.
		// placed yet.
		(Pred == BB))
continue;		continue;
// Do backward checking.		// Do backward checking.
// For all cases above, we need a backward checking to filter out edges that		// For all cases above, we need a backward checking to filter out edges that
// are not 'strongly' biased. With profile data available, the check is		// are not 'strongly' biased.
// mostly redundant for case 2 (when threshold prob is set at 50%) unless S
// has more than two successors.
// BB Pred		// BB Pred
// \ /		// \ /
// Succ		// Succ
// We select edge BB->Succ if		// We select edge BB->Succ if
// freq(BB->Succ) > freq(Succ) * HotProb		// freq(BB->Succ) > freq(Succ) * HotProb
// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *		// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *
// HotProb		// HotProb
// i.e. freq((BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb		// i.e. freq((BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb
[19 lines not shown]
/// \brief Select the best successor for a block.		/// \brief Select the best successor for a block.
///		///
/// This looks across all successors of a particular block and attempts to		/// This looks across all successors of a particular block and attempts to
/// select the "best" one to be the layout successor. It only considers direct		/// select the "best" one to be the layout successor. It only considers direct
/// successors which also pass the block filter. It will attempt to avoid		/// successors which also pass the block filter. It will attempt to avoid
/// breaking CFG structure, but cave and break such structures in the case of		/// breaking CFG structure, but cave and break such structures in the case of
/// very hot successor edges.		/// very hot successor edges.
///		///
/// \returns The best successor block found, or null if none are viable.		/// \returns The best successor block found, or null if none are viable, along
MachineBasicBlock *		/// with a boolean indicating if tail duplication is necessary.
		MachineBlockPlacement::BlockAndTailDupResult
MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB,		MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB,
BlockChain &Chain,		BlockChain &Chain,
const BlockFilterSet *BlockFilter) {		const BlockFilterSet *BlockFilter) {
const BranchProbability HotProb(StaticLikelyProb, 100);		const BranchProbability HotProb(StaticLikelyProb, 100);

MachineBasicBlock *BestSucc = nullptr;		BlockAndTailDupResult BestSucc = { nullptr, false };
auto BestProb = BranchProbability::getZero();		auto BestProb = BranchProbability::getZero();

SmallVector<MachineBasicBlock *, 4> Successors;		SmallVector<MachineBasicBlock *, 4> Successors;
auto AdjustedSumProb =		auto AdjustedSumProb =
collectViableSuccessors(BB, Chain, BlockFilter, Successors);		collectViableSuccessors(BB, Chain, BlockFilter, Successors);

DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n");		DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n");

		// For blocks with CFG violations, we may be able to lay them out anyway with
		// tail-duplication. We keep this vector so we can perform the probability
		// calculations the minimum number of times.
		SmallVector<std::tuple<BranchProbability, MachineBasicBlock *>, 4>
		DupCandidates;
for (MachineBasicBlock *Succ : Successors) {		for (MachineBasicBlock *Succ : Successors) {
auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);		auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);
BranchProbability SuccProb =		BranchProbability SuccProb =
getAdjustedProbability(RealSuccProb, AdjustedSumProb);		getAdjustedProbability(RealSuccProb, AdjustedSumProb);

// This heuristic is off by default.		// This heuristic is off by default.
if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb,		if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb,
HotProb))		HotProb)) {
return Succ;		BestSucc.BB = Succ;
		return BestSucc;
		}

BlockChain &SuccChain = *BlockToChain[Succ];		BlockChain &SuccChain = *BlockToChain[Succ];
// Skip the edge \c BB->Succ if block \c Succ has a better layout		// Skip the edge \c BB->Succ if block \c Succ has a better layout
// predecessor that yields lower global cost.		// predecessor that yields lower global cost.
if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,		if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,
Chain, BlockFilter))		Chain, BlockFilter)) {
		// If tail duplication would make Succ profitable, place it.
		if (TailDupPlacement && shouldTailDuplicate(Succ))
		DupCandidates.push_back(std::make_tuple(SuccProb, Succ));
		davidxl [Done]: Remove the first 'SuccProb > BestProb' check -- it provides only a very tiny compile time win depending on the iteration order, but adds more confusion.
continue;		continue;
		}

DEBUG(		DEBUG(
dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "		dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "
<< SuccProb		<< SuccProb
<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")		<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")
<< "\n");		<< "\n");

if (BestSucc && BestProb >= SuccProb) {		if (BestSucc.BB && BestProb >= SuccProb) {
DEBUG(dbgs() << " Not the best candidate, continuing\n");		DEBUG(dbgs() << " Not the best candidate, continuing\n");
continue;		continue;
}		}

DEBUG(dbgs() << " Setting it as best candidate\n");		DEBUG(dbgs() << " Setting it as best candidate\n");
BestSucc = Succ;		BestSucc.BB = Succ;
BestProb = SuccProb;		BestProb = SuccProb;
		davidxl [Done]: No need to set ShouldTailDup in the loop -- it is already initialized outside.
}		}
if (BestSucc)		// Handle the tail duplication candidates in order of decreasing probability.
DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc) << "\n");		// Stop at the first one that is profitable. Also stop if they are less
		// profitable than BestSucc. Position is important because we preserve it and
		// prefer first best match. Here we aren't comparing in order, so we capture
		// the position instead.
		if (DupCandidates.size() != 0) {
		auto cmp =
		davidxl [Done]: Why not just stable sort it? The vector should be of size 1 for most of the cases. Also why do you need position?
		iteratee [Not Done]: Will just sort the vector. Position is because we rely on the successor order being stable and the first successor being a subtle hint. Without the position, we lose track of whether the block in the vector came before or after the block we picked without tail duplication.
		[](const std::tuple<BranchProbability, MachineBasicBlock *> &a,
		const std::tuple<BranchProbability, MachineBasicBlock *> &b) {
		return std::get<0>(a) > std::get<0>(b);
		};
		std::stable_sort(DupCandidates.begin(), DupCandidates.end(), cmp);
		}
		for(auto &Tup : DupCandidates) {
		BranchProbability DupProb;
		MachineBasicBlock *Succ;
		std::tie(DupProb, Succ) = Tup;
		davidxl [Done]: Should it break instead?
		if (DupProb < BestProb)
		break;
		davidxl [Not Done]: isProfitableToTailDup assumes the baseline layout does not pick Succ. The assumption may not be true here, as there are two other possibilities: 1) Succ == BestSucc.BB in the base layout; 2) BestSucc.BB == null in the base layout (all of BB's successors have conflicts). In these two cases, the isProfitable check should probably be skipped (as duplication is beneficial).
		iteratee [Not Done]: 1. Succ won't equal BestSucc.BB because of the continue. These blocks were not chosen by the first loop by construction. 2. Good catch. I'll add that.
		if (canTailDuplicateUnplacedPreds(BB, Succ, Chain, BlockFilter)
		// If tail duplication gives us fallthrough when we otherwise wouldn't
		// have it, that is a strict gain.
		&& (BestSucc.BB == nullptr
		|| isProfitableToTailDup(BB, Succ, BestProb, Chain,
		BlockFilter))) {
		DEBUG(
		dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "
		<< DupProb
		<< " (Tail Duplicate)\n");
		BestSucc.BB = Succ;
		BestSucc.ShouldTailDup = true;
		break;
		}
		}

		if (BestSucc.BB)
		DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB) << "\n");

return BestSucc;		return BestSucc;
}		}
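
Editorial note on the candidate loop above: read in isolation, it sorts the duplication candidates by descending probability with a stable sort, stops as soon as a candidate is less probable than the current best, and takes the first candidate that passes the profitability checks. The following is a minimal sketch of that shape only. It uses plain stand-ins: `double` replaces `BranchProbability`, `std::string` replaces `MachineBasicBlock *`, and the `profitable` callback is a hypothetical placeholder for the combined `canTailDuplicateUnplacedPreds` / `isProfitableToTailDup` checks. None of the names below are LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for the (BranchProbability, MachineBasicBlock *) tuples in the patch.
using Candidate = std::tuple<double, std::string>;

// Returns the name of the first candidate that is at least as probable as
// bestProb and passes the (hypothetical) profitability predicate, or an empty
// string when no candidate qualifies.
std::string pickDupCandidate(
    std::vector<Candidate> candidates, double bestProb,
    const std::function<bool(const std::string &)> &profitable) {
  assert(bestProb >= 0.0 && bestProb <= 1.0 && "probability out of range");
  // Descending by probability. stable_sort keeps the original successor order
  // among candidates with equal probability, so no explicit position is needed.
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const Candidate &a, const Candidate &b) {
                     return std::get<0>(a) > std::get<0>(b);
                   });
  for (const Candidate &c : candidates) {
    // The input is sorted: once one candidate is too unlikely, all later
    // ones are too, so breaking is equivalent to skipping the rest.
    if (std::get<0>(c) < bestProb)
      break;
    if (profitable(std::get<1>(c)))
      return std::get<1>(c);
  }
  return std::string();
}
```

Because the sort is stable, two candidates with equal probability are still visited in successor order, which is the property the position field was originally carrying.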

/// \brief Select the best block from a worklist.		/// \brief Select the best block from a worklist.
///		///
/// This looks through the provided worklist as a list of candidate basic		/// This looks through the provided worklist as a list of candidate basic
/// blocks and select the most profitable one to place. The definition of		/// blocks and select the most profitable one to place. The definition of
(132 lines not shown)	void MachineBlockPlacement::buildChain(
for (;;) {		for (;;) {
assert(BB && "null block found at end of chain in loop.");		assert(BB && "null block found at end of chain in loop.");
assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop.");		assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop.");
assert(*std::prev(Chain.end()) == BB && "BB Not found at end of chain.");		assert(*std::prev(Chain.end()) == BB && "BB Not found at end of chain.");


// Look for the best viable successor if there is one to place immediately		// Look for the best viable successor if there is one to place immediately
// after this block.		// after this block.
MachineBasicBlock *BestSucc = selectBestSuccessor(BB, Chain, BlockFilter);		auto Result = selectBestSuccessor(BB, Chain, BlockFilter);
		MachineBasicBlock* BestSucc = Result.BB;
		bool ShouldTailDup = Result.ShouldTailDup;
		if (TailDupPlacement)
		ShouldTailDup |= (BestSucc && shouldTailDuplicate(BestSucc));

// If an immediate successor isn't available, look for the best viable		// If an immediate successor isn't available, look for the best viable
// block among those we've identified as not violating the loop's CFG at		// block among those we've identified as not violating the loop's CFG at
// this point. This won't be a fallthrough, but it will increase locality.		// this point. This won't be a fallthrough, but it will increase locality.
if (!BestSucc)		if (!BestSucc)
BestSucc = selectBestCandidateBlock(Chain, BlockWorkList);		BestSucc = selectBestCandidateBlock(Chain, BlockWorkList);
if (!BestSucc)		if (!BestSucc)
BestSucc = selectBestCandidateBlock(Chain, EHPadWorkList);		BestSucc = selectBestCandidateBlock(Chain, EHPadWorkList);

if (!BestSucc) {		if (!BestSucc) {
BestSucc = getFirstUnplacedBlock(Chain, PrevUnplacedBlockIt, BlockFilter);		BestSucc = getFirstUnplacedBlock(Chain, PrevUnplacedBlockIt, BlockFilter);
if (!BestSucc)		if (!BestSucc)
break;		break;

DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "		DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "
"layout successor until the CFG reduces\n");		"layout successor until the CFG reduces\n");
}		}

// Placement may have changed tail duplication opportunities.		// Placement may have changed tail duplication opportunities.
// Check for that now.		// Check for that now.
if (TailDupPlacement && BestSucc) {		if (TailDupPlacement && BestSucc && ShouldTailDup) {
// If the chosen successor was duplicated into all its predecessors,		// If the chosen successor was duplicated into all its predecessors,
// don't bother laying it out, just go round the loop again with BB as		// don't bother laying it out, just go round the loop again with BB as
// the chain end.		// the chain end.
if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,		if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,
BlockFilter, PrevUnplacedBlockIt))		BlockFilter, PrevUnplacedBlockIt))
continue;		continue;
}		}

(875 lines not shown)	bool MachineBlockPlacement::maybeTailDuplicateBlock(
MachineBasicBlock *BB, MachineBasicBlock *LPred,		MachineBasicBlock *BB, MachineBasicBlock *LPred,
const BlockChain &Chain, BlockFilterSet *BlockFilter,		const BlockChain &Chain, BlockFilterSet *BlockFilter,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
bool &DuplicatedToLPred) {		bool &DuplicatedToLPred) {

DuplicatedToLPred = false;		DuplicatedToLPred = false;
DEBUG(dbgs() << "Redoing tail duplication for Succ#"		DEBUG(dbgs() << "Redoing tail duplication for Succ#"
<< BB->getNumber() << "\n");		<< BB->getNumber() << "\n");
bool IsSimple = TailDup.isSimpleBB(BB);
// Blocks with single successors don't create additional fallthrough		if (!shouldTailDuplicate(BB))
// opportunities. Don't duplicate them. TODO: When conditional exits are
// analyzable, allow them to be duplicated.
if (!IsSimple && BB->succ_size() == 1)
return false;
if (!TailDup.shouldTailDuplicate(IsSimple, *BB))
return false;		return false;
// This has to be a callback because none of it can be done after		// This has to be a callback because none of it can be done after
// BB is deleted.		// BB is deleted.
bool Removed = false;		bool Removed = false;
auto RemovalCallback =		auto RemovalCallback =
[&](MachineBasicBlock *RemBB) {		[&](MachineBasicBlock *RemBB) {
// Signal to outer function		// Signal to outer function
Removed = true;		Removed = true;
(36 lines not shown)	auto RemovalCallback =

DEBUG(dbgs() << "TailDuplicator deleted block: "		DEBUG(dbgs() << "TailDuplicator deleted block: "
<< getBlockName(RemBB) << "\n");		<< getBlockName(RemBB) << "\n");
};		};
auto RemovalCallbackRef =		auto RemovalCallbackRef =
llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback);		llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback);

SmallVector<MachineBasicBlock *, 8> DuplicatedPreds;		SmallVector<MachineBasicBlock *, 8> DuplicatedPreds;
		bool IsSimple = TailDup.isSimpleBB(BB);
TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred,		TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred,
&DuplicatedPreds, &RemovalCallbackRef);		&DuplicatedPreds, &RemovalCallbackRef);

// Update UnscheduledPredecessors to reflect tail-duplication.		// Update UnscheduledPredecessors to reflect tail-duplication.
DuplicatedToLPred = false;		DuplicatedToLPred = false;
for (MachineBasicBlock *Pred : DuplicatedPreds) {		for (MachineBasicBlock *Pred : DuplicatedPreds) {
// We're only looking for unscheduled predecessors that match the filter.		// We're only looking for unscheduled predecessors that match the filter.
BlockChain* PredChain = BlockToChain[Pred];		BlockChain* PredChain = BlockToChain[Pred];
(24 lines not shown)	bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
F = &MF;		F = &MF;
MBPI = &getAnalysis<MachineBranchProbabilityInfo>();		MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(		MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
getAnalysis<MachineBlockFrequencyInfo>());		getAnalysis<MachineBlockFrequencyInfo>());
MLI = &getAnalysis<MachineLoopInfo>();		MLI = &getAnalysis<MachineLoopInfo>();
TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
TLI = MF.getSubtarget().getTargetLowering();		TLI = MF.getSubtarget().getTargetLowering();
MDT = &getAnalysis<MachineDominatorTree>();		MDT = &getAnalysis<MachineDominatorTree>();
		MPDT = nullptr;

// Initialize PreferredLoopExit to nullptr here since it may never be set if		// Initialize PreferredLoopExit to nullptr here since it may never be set if
// there are no MachineLoops.		// there are no MachineLoops.
PreferredLoopExit = nullptr;		PreferredLoopExit = nullptr;

if (TailDupPlacement) {		if (TailDupPlacement) {
unsigned TailDupSize = TailDuplicatePlacementThreshold;		MPDT = &getAnalysis<MachinePostDominatorTree>();
		unsigned TailDupSize = TailDupPlacementThreshold;
if (MF.getFunction()->optForSize())		if (MF.getFunction()->optForSize())
TailDupSize = 1;		TailDupSize = 1;
TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);		TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);
}		}

assert(BlockToChain.empty());		assert(BlockToChain.empty());

buildCFGChains();		buildCFGChains();

// Changing the layout can create new tail merging opportunities.		// Changing the layout can create new tail merging opportunities.
TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();		TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
// TailMerge can create jump into if branches that make CFG irreducible for		// TailMerge can create jump into if branches that make CFG irreducible for
// HW that requires structured CFG.		// HW that requires structured CFG.
bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() &&		bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() &&
PassConfig->getEnableTailMerge() &&		PassConfig->getEnableTailMerge() &&
BranchFoldPlacement;		BranchFoldPlacement;
// No tail merging opportunities if the block number is less than four.		// No tail merging opportunities if the block number is less than four.
if (MF.size() > 3 && EnableTailMerge) {		if (MF.size() > 3 && EnableTailMerge) {
unsigned TailMergeSize = TailDuplicatePlacementThreshold + 1;		unsigned TailMergeSize = TailDupPlacementThreshold + 1;
BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI,		BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI,
*MBPI, TailMergeSize);		*MBPI, TailMergeSize);

if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),		if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),
getAnalysisIfAvailable<MachineModuleInfo>(), MLI,		getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
/*AfterBlockPlacement=*/true)) {		/*AfterBlockPlacement=*/true)) {
// Redo the layout if tail merging creates/removes/moves blocks.		// Redo the layout if tail merging creates/removes/moves blocks.
BlockToChain.clear();		BlockToChain.clear();
// Must redo the dominator tree if blocks were changed.		// Must redo the dominator tree if blocks were changed.
MDT->runOnMachineFunction(MF);		MDT->runOnMachineFunction(MF);
		if (MPDT)
		MPDT->runOnMachineFunction(MF);
ChainAllocator.DestroyAll();		ChainAllocator.DestroyAll();
buildCFGChains();		buildCFGChains();
}		}
}		}

optimizeBranches();		optimizeBranches();
alignBlocks();		alignBlocks();

(90 lines not shown)

test/CodeGen/AArch64/arm64-atomic.ll

	; RUN: llc < %s -mtriple=arm64-eabi -asm-verbose=false -verify-machineinstrs -mcpu=cyclone | FileCheck %s			; RUN: llc < %s -mtriple=arm64-eabi -asm-verbose=false -verify-machineinstrs -mcpu=cyclone | FileCheck %s

	define i32 @val_compare_and_swap(i32* %p, i32 %cmp, i32 %new) #0 {			define i32 @val_compare_and_swap(i32* %p, i32 %cmp, i32 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap:			; CHECK-LABEL: val_compare_and_swap:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i32 @val_compare_and_swap_from_load(i32* %p, i32 %cmp, i32* %pnew) #0 {			define i32 @val_compare_and_swap_from_load(i32* %p, i32 %cmp, i32* %pnew) #0 {
	; CHECK-LABEL: val_compare_and_swap_from_load:			; CHECK-LABEL: val_compare_and_swap_from_load:
	; CHECK-NEXT: ldr [[NEW:w[0-9]+]], [x2]			; CHECK-NEXT: ldr [[NEW:w[0-9]+]], [x2]
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x0]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x0]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], [[NEW]], [x0]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], [[NEW]], [x0]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: mov x0, x[[ADDR]]
				; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: mov x0, x[[ADDR]]
				; CHECK-NEXT: ret
	%new = load i32, i32* %pnew			%new = load i32, i32* %pnew
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i32 @val_compare_and_swap_rel(i32* %p, i32 %cmp, i32 %new) #0 {			define i32 @val_compare_and_swap_rel(i32* %p, i32 %cmp, i32 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap_rel:			; CHECK-LABEL: val_compare_and_swap_rel:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stlxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]			; CHECK-NEXT: stlxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acq_rel monotonic			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acq_rel monotonic
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i64 @val_compare_and_swap_64(i64* %p, i64 %cmp, i64 %new) #0 {			define i64 @val_compare_and_swap_64(i64* %p, i64 %cmp, i64 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap_64:			; CHECK-LABEL: val_compare_and_swap_64:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldxr [[RESULT:x[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldxr [[RESULT:x[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], x1			; CHECK-NEXT: cmp [[RESULT]], x1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], x2, [x[[ADDR]]]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], x2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i64* %p, i64 %cmp, i64 %new monotonic monotonic			%pair = cmpxchg i64* %p, i64 %cmp, i64 %new monotonic monotonic
	%val = extractvalue { i64, i1 } %pair, 0			%val = extractvalue { i64, i1 } %pair, 0
	ret i64 %val			ret i64 %val
	}			}

	define i32 @fetch_and_nand(i32* %p) #0 {			define i32 @fetch_and_nand(i32* %p) #0 {
	; CHECK-LABEL: fetch_and_nand:			; CHECK-LABEL: fetch_and_nand:
	; CHECK: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK: [[TRYBB:.?LBB[0-9_]+]]:
	(301 lines not shown)

test/CodeGen/AArch64/arm64-shrink-wrapping.ll

	(340 lines not shown)
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]			; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]
	; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8			; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8
	; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]			; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]
	; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]			; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]
	; CHECK-NEXT: sub w1, w1, #1			; CHECK-NEXT: sub w1, w1, #1
	; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]			; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]
	; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]			; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]
	; DISABLE-NEXT: b [[IFEND_LABEL]]			; CHECK-NEXT: [[IFEND_LABEL]]:
	;
	; DISABLE: [[ELSE_LABEL]]: ; %if.else
	; DISABLE: lsl w0, w1, #1
	;
	; CHECK: [[IFEND_LABEL]]:
	; Epilogue code.			; Epilogue code.
	; CHECK: add sp, sp, #16			; CHECK: add sp, sp, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; ENABLE: [[ELSE_LABEL]]: ; %if.else			; CHECK: [[ELSE_LABEL]]: ; %if.else
	; ENABLE-NEXT: lsl w0, w1, #1			; CHECK-NEXT: lsl w0, w1, #1
	; ENABLE_NEXT: ret			; DISABLE-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
	define i32 @variadicFunc(i32 %cond, i32 %count, ...) #0 {			define i32 @variadicFunc(i32 %cond, i32 %count, ...) #0 {
	entry:			entry:
	%ap = alloca i8*, align 8			%ap = alloca i8*, align 8
	%tobool = icmp eq i32 %cond, 0			%tobool = icmp eq i32 %cond, 0
	br i1 %tobool, label %if.else, label %if.then			br i1 %tobool, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%ap1 = bitcast i8** %ap to i8*			%ap1 = bitcast i8** %ap to i8*
	(351 lines not shown)

test/CodeGen/AArch64/tail-dup-repeat-worklist.ll

This file was deleted.


test/CodeGen/AArch64/tbz-tbnz.ll

	; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s

	declare void @t()			declare void @t()

	define void @test1(i32 %a) {			define void @test1(i32 %a) {
	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	entry:			entry:
	%sub = add nsw i32 %a, -12			%sub = add nsw i32 %a, -12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:w[0-9]+]], w0, #12			; CHECK: sub [[CMP:w[0-9]+]], w0, #12
	; CHECK: tbz [[CMP]], #31			; CHECK: tbnz [[CMP]], #31

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
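
Editorial note on the `tbz` to `tbnz` changes in this file: `icmp slt x, 0` is a test of the sign bit (bit 31 for `i32`, bit 63 for `i64`). `tbz` branches when the tested bit is zero and `tbnz` when it is nonzero, so the two forms encode the same comparison with the branch sense inverted. The flip presumably reflects which successor the new layout places as the fallthrough. The sketch below only demonstrates the sign-bit equivalence; the helper names are illustrative and not part of the patch.

```cpp
#include <cstdint>

// Bit 31 of a 32-bit value: the bit that `tbz wN, #31` / `tbnz wN, #31` test.
bool signBitSet32(std::int32_t v) {
  return ((static_cast<std::uint32_t>(v) >> 31) & 1u) != 0;
}

// Bit 63 of a 64-bit value: the bit that `tbz xN, #63` / `tbnz xN, #63` test.
bool signBitSet64(std::int64_t v) {
  return ((static_cast<std::uint64_t>(v) >> 63) & 1u) != 0;
}

// Equivalent of `icmp slt (a - 12), 0` from test1. Assumes the subtraction
// does not overflow, matching the nsw flag in the IR.
bool takesIfThen(std::int32_t a) { return signBitSet32(a - 12); }
```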

	define void @test2(i64 %a) {			define void @test2(i64 %a) {
	; CHECK-LABEL: @test2			; CHECK-LABEL: @test2
	entry:			entry:
	%sub = add nsw i64 %a, -12			%sub = add nsw i64 %a, -12
	%cmp = icmp slt i64 %sub, 0			%cmp = icmp slt i64 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:x[0-9]+]], x0, #12			; CHECK: sub [[CMP:x[0-9]+]], x0, #12
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	(73 lines not shown)
	define void @test7(i32 %a) {			define void @test7(i32 %a) {
	; CHECK-LABEL: @test7			; CHECK-LABEL: @test7
	entry:			entry:
	%sub = sub nsw i32 %a, 12			%sub = sub nsw i32 %a, 12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:w[0-9]+]], w0, #12			; CHECK: sub [[CMP:w[0-9]+]], w0, #12
	; CHECK: tbz [[CMP]], #31			; CHECK: tbnz [[CMP]], #31

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	(43 lines not shown)
	}			}

	define void @test9(i64 %val1) {			define void @test9(i64 %val1) {
	; CHECK-LABEL: @test9			; CHECK-LABEL: @test9
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test10(i64 %val1) {			define void @test10(i64 %val1) {
	; CHECK-LABEL: @test10			; CHECK-LABEL: @test10
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test11(i64 %val1, i64* %ptr) {			define void @test11(i64 %val1, i64* %ptr) {
	; CHECK-LABEL: @test11			; CHECK-LABEL: @test11

	; CHECK: ldr [[CMP:x[0-9]+]], [x1]			; CHECK: ldr [[CMP:x[0-9]+]], [x1]
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	%val = load i64, i64* %ptr			%val = load i64, i64* %ptr
	%tst = icmp slt i64 %val, 0			%tst = icmp slt i64 %val, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test12(i64 %val1) {			define void @test12(i64 %val1) {
	; CHECK-LABEL: @test12			; CHECK-LABEL: @test12
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test13(i64 %val1, i64 %val2) {			define void @test13(i64 %val1, i64 %val2) {
	; CHECK-LABEL: @test13			; CHECK-LABEL: @test13
	%or = or i64 %val1, %val2			%or = or i64 %val1, %val2
	%tst = icmp slt i64 %or, 0			%tst = icmp slt i64 %or, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK: orr [[CMP:x[0-9]+]], x0, x1			; CHECK: orr [[CMP:x[0-9]+]], x0, x1
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	(103 lines not shown)

test/CodeGen/AMDGPU/branch-relaxation.ll

	(329 lines not shown)
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND

	; GCN-NEXT: [[BB3]]: ; %bb3			; GCN-NEXT: [[BB3]]: ; %bb3
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @expand_requires_expand(i32 %cond0) #0 {			define void @expand_requires_expand(i32 %cond0) #0 {
	bb0:			bb0:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
	%cmp0 = icmp slt i32 %cond0, 0			%cmp0 = icmp slt i32 %cond0, 0
	br i1 %cmp0, label %bb2, label %bb1			br i1 %cmp0, label %bb2, label %bb1

	bb1:			bb1:
	%val = load volatile i32, i32 addrspace(2)* undef			%val = load volatile i32, i32 addrspace(2)* undef
	%cmp1 = icmp eq i32 %val, 3			%cmp1 = icmp eq i32 %val, 3
	br i1 %cmp1, label %bb3, label %bb2			br i1 %cmp1, label %bb3, label %bb2

	bb2:			bb2:
	call void asm sideeffect			call void asm sideeffect
	"v_nop_e64			"v_nop_e64
	v_nop_e64			v_nop_e64
	v_nop_e64			v_nop_e64
	v_nop_e64", ""() #0			v_nop_e64", ""() #0
	br label %bb3			br label %bb3

	bb3:			bb3:
				; These NOPs prevent tail-duplication-based outlining
				; from firing, which defeats the need to expand the branches and this test.
				call void asm sideeffect
				"v_nop_e64", ""() #0
				call void asm sideeffect
				"v_nop_e64", ""() #0
	ret void			ret void
	}			}

	; Requires expanding of required skip branch.			; Requires expanding of required skip branch.

	; GCN-LABEL: {{^}}uniform_inside_divergent:			; GCN-LABEL: {{^}}uniform_inside_divergent:
	; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}			; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
	; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc			; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
	(13 lines not shown)
	; GCN: s_cbranch_scc1 [[ENDIF]]			; GCN: s_cbranch_scc1 [[ENDIF]]

	; GCN-NEXT: ; BB#2: ; %if_uniform			; GCN-NEXT: ; BB#2: ; %if_uniform
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)

	; GCN-NEXT: [[ENDIF]]: ; %endif			; GCN-NEXT: [[ENDIF]]: ; %endif
	; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]			; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]
				; GCN-NEXT: s_sleep 5
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) #0 {			define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%d_cmp = icmp ult i32 %tid, 16			%d_cmp = icmp ult i32 %tid, 16
	br i1 %d_cmp, label %if, label %endif			br i1 %d_cmp, label %if, label %endif

	if:			if:
	store i32 0, i32 addrspace(1)* %out			store i32 0, i32 addrspace(1)* %out
	%u_cmp = icmp eq i32 %cond, 0			%u_cmp = icmp eq i32 %cond, 0
	br i1 %u_cmp, label %if_uniform, label %endif			br i1 %u_cmp, label %if_uniform, label %endif

	if_uniform:			if_uniform:
	store i32 1, i32 addrspace(1)* %out			store i32 1, i32 addrspace(1)* %out
	br label %endif			br label %endif

	endif:			endif:
				; layout can remove the split branch if it can copy the return block.
				; This call makes the return block long enough that it doesn't get copied.
				call void @llvm.amdgcn.s.sleep(i32 5);
	ret void			ret void
	}			}
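
Editorial note on the padding added to these AMDGPU tests: the comments say the return block is made "long enough that it doesn't get copied", and `runOnMachineFunction` above picks the size limit as `TailDupPlacementThreshold`, or 1 when the function is optimized for size. The sketch below models only that size gate. It assumes instruction count is the sole criterion, which is a simplification: the real `TailDuplicator` applies further checks that are not shown in this diff. The function names are hypothetical.

```cpp
// Effective size limit, following the selection in runOnMachineFunction:
// the configured threshold normally, 1 when optimizing for size.
unsigned effectiveTailDupSize(unsigned configuredThreshold, bool optForSize) {
  return optForSize ? 1u : configuredThreshold;
}

// Simplified, assumed criterion: a block is a duplication candidate only if
// its instruction count does not exceed the limit.
bool smallEnoughToDuplicate(unsigned numInstrs, unsigned limit) {
  return numInstrs <= limit;
}
```

Under this model, adding instructions to a return block (the extra `v_nop_e64` or `s_sleep` calls) pushes it past the limit, so the branch that the test wants to exercise is kept.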

	; si_mask_branch			; si_mask_branch
	; s_cbranch_execz			; s_cbranch_execz
	; s_branch			; s_branch

	; GCN-LABEL: {{^}}analyze_mask_branch:			; GCN-LABEL: {{^}}analyze_mask_branch:
	(118 lines not shown)

test/CodeGen/AMDGPU/uniform-cfg.ll

(246 lines not shown)	ENDIF: ; preds = %IF, %main_body
ret void		ret void
}		}

; GCN-LABEL: {{^}}icmp_users_different_blocks:		; GCN-LABEL: {{^}}icmp_users_different_blocks:
; GCN: s_load_dword [[COND:s[0-9]+]]		; GCN: s_load_dword [[COND:s[0-9]+]]
; GCN: s_cmp_lt_i32 [[COND]], 1		; GCN: s_cmp_lt_i32 [[COND]], 1
; GCN: s_cbranch_scc1 [[EXIT:[A-Za-z0-9_]+]]		; GCN: s_cbranch_scc1 [[EXIT:[A-Za-z0-9_]+]]
; GCN: v_cmp_gt_i32_e64 vcc, [[COND]], 0{{$}}		; GCN: v_cmp_gt_i32_e64 vcc, [[COND]], 0{{$}}
; GCN: s_cbranch_vccnz [[EXIT]]		; GCN: s_cbranch_vccz [[BODY:[A-Za-z0-9_]+]]
; GCN: buffer_store
; GCN: {{^}}[[EXIT]]:		; GCN: {{^}}[[EXIT]]:
; GCN: s_endpgm		; GCN: s_endpgm
		; GCN: {{^}}[[BODY]]:
		; GCN: buffer_store
		; GCN: s_endpgm
define void @icmp_users_different_blocks(i32 %cond0, i32 %cond1, i32 addrspace(1)* %out) {		define void @icmp_users_different_blocks(i32 %cond0, i32 %cond1, i32 addrspace(1)* %out) {
bb:		bb:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
%cmp0 = icmp sgt i32 %cond0, 0		%cmp0 = icmp sgt i32 %cond0, 0
%cmp1 = icmp sgt i32 %cond1, 0		%cmp1 = icmp sgt i32 %cond1, 0
br i1 %cmp0, label %bb2, label %bb9		br i1 %cmp0, label %bb2, label %bb9

bb2: ; preds = %bb		bb2: ; preds = %bb
(30 lines not shown)
}		}

; Test uniform and divergent.		; Test uniform and divergent.

; GCN-LABEL: {{^}}uniform_inside_divergent:		; GCN-LABEL: {{^}}uniform_inside_divergent:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: s_cbranch_execz [[ENDIF_LABEL:[0-9_A-Za-z]+]]
; GCN: s_cmp_lg_u32 {{s[0-9]+}}, 0		; GCN: s_cmp_lg_u32 {{s[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[ENDIF_LABEL]]		; GCN: s_cbranch_scc0 [[IF_UNIFORM_LABEL:[A-Z0-9_a-z]+]]
		; GCN: s_endpgm
		; GCN: {{^}}[[IF_UNIFORM_LABEL]]:
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) {		define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
%d_cmp = icmp ult i32 %tid, 16		%d_cmp = icmp ult i32 %tid, 16
br i1 %d_cmp, label %if, label %endif		br i1 %d_cmp, label %if, label %endif

if:		if:
store i32 0, i32 addrspace(1)* %out		store i32 0, i32 addrspace(1)* %out
%u_cmp = icmp eq i32 %cond, 0		%u_cmp = icmp eq i32 %cond, 0
br i1 %u_cmp, label %if_uniform, label %endif		br i1 %u_cmp, label %if_uniform, label %endif

if_uniform:		if_uniform:
store i32 1, i32 addrspace(1)* %out		store i32 1, i32 addrspace(1)* %out
br label %endif		br label %endif

endif:		endif:
ret void		ret void
}		}

; GCN-LABEL: {{^}}divergent_inside_uniform:		; GCN-LABEL: {{^}}divergent_inside_uniform:
; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0		; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[ENDIF_LABEL:[0-9_A-Za-z]+]]		; GCN: s_cbranch_scc0 [[IF_LABEL:[0-9_A-Za-z]+]]
		; GCN: [[IF_LABEL]]:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
; GCN: [[ENDIF_LABEL]]:
; GCN: s_endpgm
define void @divergent_inside_uniform(i32 addrspace(1)* %out, i32 %cond) {		define void @divergent_inside_uniform(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%u_cmp = icmp eq i32 %cond, 0		%u_cmp = icmp eq i32 %cond, 0
br i1 %u_cmp, label %if, label %endif		br i1 %u_cmp, label %if, label %endif

if:		if:
store i32 0, i32 addrspace(1)* %out		store i32 0, i32 addrspace(1)* %out
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
; GCN-LABEL: {{^}}divergent_if_uniform_if:		; GCN-LABEL: {{^}}divergent_if_uniform_if:
; GCN: v_cmp_eq_u32_e32 vcc, 0, v0		; GCN: v_cmp_eq_u32_e32 vcc, 0, v0
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
; GCN: s_or_b64 exec, exec, [[MASK]]		; GCN: s_or_b64 exec, exec, [[MASK]]
; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0		; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[EXIT:[A-Z0-9_]+]]		; GCN: s_cbranch_scc0 [[IF_UNIFORM:[A-Z0-9_]+]]
		; GCN: s_endpgm
		; GCN: [[IF_UNIFORM]]:
; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2		; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2
; GCN: buffer_store_dword [[TWO]]		; GCN: buffer_store_dword [[TWO]]
; GCN: [[EXIT]]:
; GCN: s_endpgm
define void @divergent_if_uniform_if(i32 addrspace(1)* %out, i32 %cond) {		define void @divergent_if_uniform_if(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
%d_cmp = icmp eq i32 %tid, 0		%d_cmp = icmp eq i32 %tid, 0
br i1 %d_cmp, label %if, label %endif		br i1 %d_cmp, label %if, label %endif

if:		if:
store i32 1, i32 addrspace(1)* %out		store i32 1, i32 addrspace(1)* %out

test/CodeGen/ARM/arm-and-tst-peephole.ll

	; V8-LABEL: %tailrecurse.switch			; V8-LABEL: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: bne			; V8-NEXT: beq
	; V8-NEXT: b			; V8-NEXT: %sw.epilog
	; The trailing space in the last line checks that the branch is unconditional			; V8-NEXT: bx lr
	switch i32 %and, label %sw.epilog [			switch i32 %and, label %sw.epilog [
	i32 1, label %sw.bb			i32 1, label %sw.bb
	i32 3, label %sw.bb6			i32 3, label %sw.bb6
	i32 2, label %sw.bb8			i32 2, label %sw.bb8
	], !prof !1			], !prof !1

	sw.bb: ; preds = %tailrecurse.switch, %tailrecurse			sw.bb: ; preds = %tailrecurse.switch, %tailrecurse
	%shl = shl i32 %acc.tr, 1			%shl = shl i32 %acc.tr, 1

test/CodeGen/ARM/atomic-op.ll

	; CHECK-NOT: dmb ish			; CHECK-NOT: dmb ish
	; CHECK: [[LOOP_BB:\.?LBB[0-9]+_1]]:			; CHECK: [[LOOP_BB:\.?LBB[0-9]+_1]]:
	; CHECK: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]			; CHECK: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
	; CHECK: cmp [[OLDVAL]], r1			; CHECK: cmp [[OLDVAL]], r1
	; CHECK: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]			; CHECK: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]
	; CHECK: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]			; CHECK: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]
	; CHECK: cmp [[SUCCESS]], #0			; CHECK: cmp [[SUCCESS]], #0
	; CHECK: bne [[LOOP_BB]]			; CHECK: bne [[LOOP_BB]]
	; CHECK: b [[END_BB:\.?LBB[0-9]+_[0-9]+]]			; CHECK: dmb ish
				; CHECK: bx lr
	; CHECK: [[FAIL_BB]]:			; CHECK: [[FAIL_BB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[END_BB]]:
	; CHECK: dmb ish			; CHECK: dmb ish
	; CHECK: bx lr			; CHECK: bx lr

	ret i32 %oldval			ret i32 %oldval
	}			}

	define i32 @load_load_add_acquire(i32* %mem1, i32* %mem2) nounwind {			define i32 @load_load_add_acquire(i32* %mem1, i32* %mem2) nounwind {
	; CHECK-LABEL: load_load_add_acquire			; CHECK-LABEL: load_load_add_acquire

test/CodeGen/ARM/atomic-ops-v8.ll

	; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0			; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldaexb r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldaexb r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-ARM-NEXT: cmp r[[OLD]], r0			; CHECK-ARM-NEXT: cmp r[[OLD]], r0
	; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]			; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: strexb [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: strexb [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK-ARM: mov r0, r[[OLD]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: mov r0, r[[OLD]]			; CHECK-ARM: mov r0, r[[OLD]]
				; CHECK-ARM-NEXT: bx lr
	ret i8 %old			ret i8 %old
	}			}

	define i16 @test_atomic_cmpxchg_i16(i16 zeroext %wanted, i16 zeroext %new) nounwind {			define i16 @test_atomic_cmpxchg_i16(i16 zeroext %wanted, i16 zeroext %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i16:			; CHECK-LABEL: test_atomic_cmpxchg_i16:
	%pair = cmpxchg i16* @var16, i16 %wanted, i16 %new seq_cst seq_cst			%pair = cmpxchg i16* @var16, i16 %wanted, i16 %new seq_cst seq_cst
	%old = extractvalue { i16, i1 } %pair, 0			%old = extractvalue { i16, i1 } %pair, 0
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK-DAG: movw r[[ADDR:[0-9]+]], :lower16:var16			; CHECK-DAG: movw r[[ADDR:[0-9]+]], :lower16:var16
	; CHECK-DAG: movt r[[ADDR]], :upper16:var16			; CHECK-DAG: movt r[[ADDR]], :upper16:var16
	; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0			; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldaexh r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldaexh r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-ARM-NEXT: cmp r[[OLD]], r0			; CHECK-ARM-NEXT: cmp r[[OLD]], r0
	; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]			; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: stlexh [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: stlexh [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK-ARM: mov r0, r[[OLD]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: mov r0, r[[OLD]]			; CHECK-ARM: mov r0, r[[OLD]]
				; CHECK-ARM-NEXT: bx lr
	ret i16 %old			ret i16 %old
	}			}

	define void @test_atomic_cmpxchg_i32(i32 %wanted, i32 %new) nounwind {			define void @test_atomic_cmpxchg_i32(i32 %wanted, i32 %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i32:			; CHECK-LABEL: test_atomic_cmpxchg_i32:
	%pair = cmpxchg i32* @var32, i32 %wanted, i32 %new release monotonic			%pair = cmpxchg i32* @var32, i32 %wanted, i32 %new release monotonic
	%old = extractvalue { i32, i1 } %pair, 0			%old = extractvalue { i32, i1 } %pair, 0
	store i32 %old, i32* @var32			store i32 %old, i32* @var32
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var32			; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var32
	; CHECK: movt r[[ADDR]], :upper16:var32			; CHECK: movt r[[ADDR]], :upper16:var32

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldrex r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldrex r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-NEXT: cmp r[[OLD]], r0			; CHECK-NEXT: cmp r[[OLD]], r0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: stlex [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: stlex [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK: str{{(.w)?}} r[[OLD]],
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK-NEXT: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK: str{{(.w)?}} r[[OLD]],			; CHECK: str{{(.w)?}} r[[OLD]],
				; CHECK-ARM-NEXT: bx lr
	ret void			ret void
	}			}

	define void @test_atomic_cmpxchg_i64(i64 %wanted, i64 %new) nounwind {			define void @test_atomic_cmpxchg_i64(i64 %wanted, i64 %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i64:			; CHECK-LABEL: test_atomic_cmpxchg_i64:
	%pair = cmpxchg i64* @var64, i64 %wanted, i64 %new monotonic monotonic			%pair = cmpxchg i64* @var64, i64 %wanted, i64 %new monotonic monotonic
	%old = extractvalue { i64, i1 } %pair, 0			%old = extractvalue { i64, i1 } %pair, 0
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var64			; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var64
	; CHECK: movt r[[ADDR]], :upper16:var64			; CHECK: movt r[[ADDR]], :upper16:var64

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldrexd [[OLD1:r[0-9]+\|lr]], [[OLD2:r[0-9]+\|lr]], [r[[ADDR]]]			; CHECK: ldrexd [[OLD1:r[0-9]+\|lr]], [[OLD2:r[0-9]+\|lr]], [r[[ADDR]]]
	; r0, r1 below is a reasonable guess but could change: it certainly comes into the			; r0, r1 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0			; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0
	; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1			; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1
	; CHECK-ARM-LE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_LO]], [[MISMATCH_HI]]			; CHECK-ARM-LE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_LO]], [[MISMATCH_HI]]
	; CHECK-THUMB-LE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_HI]], [[MISMATCH_LO]]			; CHECK-THUMB-LE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_HI]], [[MISMATCH_LO]]
	; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1			; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1
	; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0			; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0
	; CHECK-ARM-BE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_HI]], [[MISMATCH_LO]]			; CHECK-ARM-BE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_HI]], [[MISMATCH_LO]]
	; CHECK-THUMB-BE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_LO]], [[MISMATCH_HI]]			; CHECK-THUMB-BE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_LO]], [[MISMATCH_HI]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r2, r3 is a reasonable guess.			; As above, r2, r3 is a reasonable guess.
	; CHECK: strexd [[STATUS:r[0-9]+]], r2, r3, [r[[ADDR]]]			; CHECK: strexd [[STATUS:r[0-9]+]], r2, r3, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK-NEXT: pop
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]			; CHECK-ARM: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]
	store i64 %old, i64* @var64			store i64 %old, i64* @var64
	ret void			ret void
	}			}


test/CodeGen/ARM/cmpxchg-weak.ll

	; RUN: llc < %s -mtriple=armv7-apple-ios -verify-machineinstrs \| FileCheck %s			; RUN: llc < %s -mtriple=armv7-apple-ios -verify-machineinstrs \| FileCheck %s

	define void @test_cmpxchg_weak(i32 *%addr, i32 %desired, i32 %new) {			define void @test_cmpxchg_weak(i32 *%addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: test_cmpxchg_weak:			; CHECK-LABEL: test_cmpxchg_weak:

	%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic			%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic
	%oldval = extractvalue { i32, i1 } %pair, 0			%oldval = extractvalue { i32, i1 } %pair, 0
	; CHECK-NEXT: BB#0:			; CHECK-NEXT: BB#0:
	; CHECK-NEXT: ldrex [[LOADED:r[0-9]+]], [r0]			; CHECK-NEXT: ldrex [[LOADED:r[0-9]+]], [r0]
	; CHECK-NEXT: cmp [[LOADED]], r1			; CHECK-NEXT: cmp [[LOADED]], r1
	; CHECK-NEXT: bne [[LDFAILBB:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: bne [[LDFAILBB:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: BB#1:			; CHECK-NEXT: BB#1:
	; CHECK-NEXT: dmb ish			; CHECK-NEXT: dmb ish
	; CHECK-NEXT: strex [[SUCCESS:r[0-9]+]], r2, [r0]			; CHECK-NEXT: strex [[SUCCESS:r[0-9]+]], r2, [r0]
	; CHECK-NEXT: cmp [[SUCCESS]], #0			; CHECK-NEXT: cmp [[SUCCESS]], #0
	; CHECK-NEXT: bne [[FAILBB:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: beq [[SUCCESSBB:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; CHECK-NEXT: dmb ish
	; CHECK-NEXT: str r3, [r0]			; CHECK-NEXT: str r3, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	; CHECK-NEXT: [[LDFAILBB]]:			; CHECK-NEXT: [[LDFAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: str r3, [r0]
				; CHECK-NEXT: bx lr
				; CHECK-NEXT: [[SUCCESSBB]]:
				; CHECK-NEXT: dmb ish
	; CHECK-NEXT: str r3, [r0]			; CHECK-NEXT: str r3, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr

	store i32 %oldval, i32* %addr			store i32 %oldval, i32* %addr
	ret void			ret void
	}			}



test/CodeGen/Mips/brconnez.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s \| FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s \| FileCheck %s -check-prefix=16

	@j = global i32 0, align 4			@j = global i32 0, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @j, align 4			%0 = load i32, i32* @j, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end, !prof !1

	; 16: bnez ${{[0-9]+}}, $[[LABEL:[0-9A-Ba-b_]+]]			; 16: bnez ${{[0-9]+}}, $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: lw ${{[0-9]+}}, %got(result)(${{[0-9]+}})			; 16: lw ${{[0-9]+}}, %got(result)(${{[0-9]+}})
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	ret void			ret void
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/Mips/micromips-compact-branches.ll

	; RUN: llc %s -march=mipsel -mattr=micromips -filetype=asm -O3 \			; RUN: llc %s -march=mipsel -mattr=micromips -filetype=asm -O3 \
	; RUN: -disable-mips-delay-filler -relocation-model=pic -o - \| FileCheck %s			; RUN: -disable-mips-delay-filler -relocation-model=pic -o - \| FileCheck %s

	define void @main() nounwind uwtable {			define void @main() nounwind uwtable {
	entry:			entry:
	%x = alloca i32, align 4			%x = alloca i32, align 4
	%0 = load i32, i32* %x, align 4			%0 = load i32, i32* %x, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end, !prof !1

	if.then:			if.then:
	store i32 10, i32* %x, align 4			store i32 10, i32* %x, align 4
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	; CHECK: bnezc			; CHECK: bnezc
				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/PowerPC/misched-inorder-latency.ll

	; CHECK: addi			; CHECK: addi
	; CHECK: bne			; CHECK: bne
	; CHECK: %true			; CHECK: %true
	define i32 @testload(i32 *%ptr, i32 %sumin) {			define i32 @testload(i32 *%ptr, i32 %sumin) {
	entry:			entry:
	%sum1 = add i32 %sumin, 1			%sum1 = add i32 %sumin, 1
	%val1 = load i32, i32* %ptr			%val1 = load i32, i32* %ptr
	%p = icmp eq i32 %sumin, 0			%p = icmp eq i32 %sumin, 0
	br i1 %p, label %true, label %end			br i1 %p, label %true, label %end, !prof !1
	true:			true:
	%sum2 = add i32 %sum1, 1			%sum2 = add i32 %sum1, 1
	%ptr2 = getelementptr i32, i32* %ptr, i32 1			%ptr2 = getelementptr i32, i32* %ptr, i32 1
	%val = load i32, i32* %ptr2			%val = load i32, i32* %ptr2
	%val2 = add i32 %val1, %val			%val2 = add i32 %val1, %val
	br label %end			br label %end
	end:			end:
	%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]			%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]
	true:			true:
	%val2 = add i32 %val1, 1			%val2 = add i32 %val1, 1
	br label %end			br label %end
	end:			end:
	%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]			%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]
	ret i32 %valmerge			ret i32 %valmerge
	}			}
	declare void @llvm.prefetch(i8*, i32, i32, i32) nounwind			declare void @llvm.prefetch(i8*, i32, i32, i32) nounwind

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/PowerPC/tail-dup-break-cfg.ll

This file was added.

				; RUN: llc -O2 -o - %s \| FileCheck %s
				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-grtev4-linux-gnu"

				; Intended layout:
				; The code for tail-duplication during layout will produce the layout:
				; test1
				; test2
				; body1 (with copy of test2)
				; body2
				; exit

				;CHECK-LABEL: tail_dup_break_cfg:
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 12, 1, [[BODY1LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: b [[BODY2LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: [[BODY1LABEL]]
				;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL]]
				;CHECK-NEXT: [[BODY2LABEL]]
				;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				define void @tail_dup_break_cfg(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %body1, !prof !1 ; %test2 more likely
				body1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %exit, label %body2, !prof !1 ; %exit more likely
				body2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %exit
				exit:
				ret void
				}

				; The branch weights here hint that we shouldn't tail duplicate in this case.
				;CHECK-LABEL: tail_dup_dont_break_cfg:
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 4, 1, [[TEST2LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %body1
				;CHECK: [[TEST2LABEL]]: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %body2
				;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				define void @tail_dup_dont_break_cfg(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %body1, !prof !1 ; %test2 more likely
				body1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp ne i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %body2, label %exit, !prof !1 ; %body2 more likely
				body2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %exit
				exit:
				ret void
				}
				declare void @a()
				declare void @b()
				declare void @c()
				declare void @d()

				; This function arranges for the successors of %succ to have already been laid
				; out. When we consider whether to lay out succ after bb and to tail-duplicate
				; it, v and ret have already been placed, so we tail-duplicate as it removes a
				; branch and strictly increases fallthrough
				; CHECK-LABEL: tail_dup_no_succ
				; CHECK: # %entry
				; CHECK: # %v
				; CHECK: # %ret
				; CHECK: # %bb
				; CHECK: # %succ
				; CHECK: # %c
				; CHECK: bl c
				; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
				; CHECK: beq
				; CHECK: b
				define void @tail_dup_no_succ(i32 %tag) {
				entry:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %v, label %bb, !prof !2 ; %v very much more likely
				bb:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %succ, label %c, !prof !3 ; %succ more likely
				c:
				call void @c()
				call void @c()
				br label %succ
				succ:
				%tagbit3 = and i32 %tag, 4
				%tagbit3eq0 = icmp eq i32 %tagbit3, 0
				br i1 %tagbit3eq0, label %ret, label %v, !prof !1 ; %ret more likely
				v:
				call void @d()
				call void @d()
				br label %ret
				ret:
				ret void
				}


				!1 = !{!"branch_weights", i32 5, i32 3}
				!2 = !{!"branch_weights", i32 95, i32 5}
				!3 = !{!"branch_weights", i32 7, i32 3}
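
The `branch_weights` nodes above drive every "more likely" comment in this file. As a rough illustration only (this is not the frequency calculation the patch implements), the Python sketch below normalizes those weights into probabilities and then counts expected taken branches for two layouts of `@tail_dup_break_cfg`: plain CFG order, and the order the CHECK lines expect, with `test2` copied into the end of `body1`. The helper names, the `body1+test2` label, and the "count every non-fallthrough edge" cost model are assumptions made for this example.

```python
from fractions import Fraction

# branch_weights metadata from the end of tail-dup-break-cfg.ll.
WEIGHTS = {"!1": (5, 3), "!2": (95, 5), "!3": (7, 3)}


def probabilities(weights):
    """Normalize a tuple of branch weights into exact probabilities."""
    total = sum(weights)
    return tuple(Fraction(w, total) for w in weights)


def expected_taken_branches(layout, edges):
    """Sum the frequency of every edge whose target is not the next block.

    `edges` maps (source, target) to that edge's frequency per call.
    Assumed cost model: a fallthrough is free, any other edge costs one
    taken branch.
    """
    position = {block: i for i, block in enumerate(layout)}
    return sum(
        freq
        for (src, dst), freq in edges.items()
        if position[dst] != position[src] + 1
    )


p_likely, p_unlikely = probabilities(WEIGHTS["!1"])  # 5/8 and 3/8

# @tail_dup_break_cfg in CFG order, nothing duplicated.
plain_layout = ["test1", "body1", "test2", "body2", "exit"]
plain_edges = {
    ("test1", "test2"): p_likely,
    ("test1", "body1"): p_unlikely,
    ("body1", "test2"): p_unlikely,
    ("test2", "exit"): p_likely,  # test2 runs on every call
    ("test2", "body2"): p_unlikely,
    ("body2", "exit"): p_unlikely,
}

# Layout from the CHECK lines: test2 is also copied into the end of body1.
dup_layout = ["test1", "test2", "body1+test2", "body2", "exit"]
dup_edges = {
    ("test1", "test2"): p_likely,
    ("test1", "body1+test2"): p_unlikely,
    ("test2", "exit"): p_likely * p_likely,
    ("test2", "body2"): p_likely * p_unlikely,
    ("body1+test2", "exit"): p_unlikely * p_likely,
    ("body1+test2", "body2"): p_unlikely * p_unlikely,
    ("body2", "exit"): p_unlikely,
}

print(probabilities(WEIGHTS["!2"]))                        # 19/20 and 1/20
print(expected_taken_branches(plain_layout, plain_edges))  # 5/4
print(expected_taken_branches(dup_layout, dup_edges))      # 79/64
```

Under this crude model the two layouts are close (80/64 versus 79/64 taken branches per call), so the numbers should be read as a way to follow the CHECK lines, not as the reason the placement pass picks one layout over the other.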

test/CodeGen/PowerPC/tail-dup-layout.ll

; optional3		; optional3
; optional4		; optional4
; Tail duplication puts test n+1 at the end of optional n		; Tail duplication puts test n+1 at the end of optional n
; so optional1 includes a copy of test2 at the end, and branches		; so optional1 includes a copy of test2 at the end, and branches
; to test3 (at the top) or falls through to optional 2.		; to test3 (at the top) or falls through to optional 2.
; The CHECK statements check for the whole string of tests and exit block,		; The CHECK statements check for the whole string of tests and exit block,
; and then check that the correct test has been duplicated into the end of		; and then check that the correct test has been duplicated into the end of
; the optional blocks and that the optional blocks are in the correct order.		; the optional blocks and that the optional blocks are in the correct order.
;CHECK-LABEL: f:		;CHECK-LABEL: straight_test:
; test1 may have been merged with entry		; test1 may have been merged with entry
;CHECK: mr [[TAGREG:[0-9]+]], 3		;CHECK: mr [[TAGREG:[0-9]+]], 3
;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1		;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
;CHECK-NEXT: bc 12, 1, [[OPT1LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST2LABEL:[._0-9A-Za-z]+]]: # %test2		;CHECK-NEXT: # %test2
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
;CHECK-NEXT: bne 0, [[OPT2LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3		;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
;CHECK-NEXT: bne 0, .[[OPT3LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST4LABEL:[._0-9A-Za-z]+]]: # %test4		;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
;CHECK-NEXT: bne 0, .[[OPT4LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT4LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit		;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit
;CHECK: blr		;CHECK: blr
;CHECK-NEXT: [[OPT1LABEL]]		;CHECK-NEXT: .[[OPT1LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
;CHECK-NEXT: beq 0, [[TEST3LABEL]]		;CHECK-NEXT: beq 0, .[[TEST3LABEL]]
;CHECK-NEXT: [[OPT2LABEL]]		;CHECK-NEXT: .[[OPT2LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
;CHECK-NEXT: beq 0, [[TEST4LABEL]]		;CHECK-NEXT: beq 0, .[[TEST4LABEL]]
;CHECK-NEXT: [[OPT3LABEL]]		;CHECK-NEXT: .[[OPT3LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
;CHECK-NEXT: beq 0, [[EXITLABEL]]		;CHECK-NEXT: beq 0, .[[EXITLABEL]]
;CHECK-NEXT: [[OPT4LABEL]]		;CHECK-NEXT: .[[OPT4LABEL]]
;CHECK: b [[EXITLABEL]]		;CHECK: b .[[EXITLABEL]]

define void @f(i32 %tag) {		define void @straight_test(i32 %tag) {
entry:		entry:
br label %test1		br label %test1
test1:		test1:
%tagbit1 = and i32 %tag, 1		%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1		br i1 %tagbit1eq0, label %test2, label %optional1
optional1:		optional1:
call void @a()		call void @a()
optional4:		optional4:
call void @d()		call void @d()
call void @d()		call void @d()
call void @d()		call void @d()
br label %exit		br label %exit
exit:		exit:
ret void		ret void
}		}

		; The block then2 is not unavoidable, but since it can be tail-duplicated, it
		; should be placed as a fallthrough from test2 and copied.
		; CHECK-LABEL: avoidable_test:
		; CHECK: # %entry
		; CHECK: andi.
		; CHECK: # %test2
		; Make sure then2 falls through from test2
		; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
		; CHECK: # %then2
		; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
		; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
		; CHECK: # %end2
		; CHECK: # %else1
		; CHECK: bl a
		; CHECK: bl a
		; Make sure then2 was copied into else1
		; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
		; CHECK: # %else2
		; CHECK: bl c
		define void @avoidable_test(i32 %tag) {
		entry:
		br label %test1
		test1:
		%tagbit1 = and i32 %tag, 1
		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
		br i1 %tagbit1eq0, label %test2, label %else1, !prof !1 ; %test2 more likely
		else1:
		call void @a()
		call void @a()
		br label %then2
		test2:
		%tagbit2 = and i32 %tag, 2
		%tagbit2eq0 = icmp eq i32 %tagbit2, 0
		br i1 %tagbit2eq0, label %then2, label %else2, !prof !1 ; %then2 more likely
		then2:
		%tagbit3 = and i32 %tag, 4
		%tagbit3eq0 = icmp eq i32 %tagbit3, 0
		br i1 %tagbit3eq0, label %end2, label %end1, !prof !1 ; %end2 more likely
		else2:
		call void @c()
		br label %end2
		end2:
		ret void
		end1:
		call void @d()
		ret void
		}

declare void @a()		declare void @a()
declare void @b()		declare void @b()
declare void @c()		declare void @c()
declare void @d()		declare void @d()

		!1 = !{!"branch_weights", i32 2, i32 1}
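
The comment on `avoidable_test` says `then2` is "not unavoidable". One way to see that is to propagate block frequencies through the function's CFG using the 2:1 weights from `!1`. The Python sketch below does this for an acyclic CFG only; it is a simplified stand-in, not LLVM's block-frequency analysis, and the function name, the hand-written topological order, and treating `test1` as the entry block are assumptions made for the example.

```python
from fractions import Fraction

# CFG of @avoidable_test. Conditional edges carry the weights from
# !1 = branch_weights 2, 1 (first successor gets 2); unconditional edges get 1.
CFG = {
    "test1": [("test2", 2), ("else1", 1)],
    "else1": [("then2", 1)],
    "test2": [("then2", 2), ("else2", 1)],
    "then2": [("end2", 2), ("end1", 1)],
    "else2": [("end2", 1)],
    "end2": [],
    "end1": [],
}

# Hand-written topological order: every block appears after its predecessors.
TOPO_ORDER = ["test1", "else1", "test2", "then2", "else2", "end2", "end1"]


def block_frequencies(cfg, order, entry):
    """Propagate an entry frequency of 1 through an acyclic CFG."""
    freq = {block: Fraction(0) for block in cfg}
    freq[entry] = Fraction(1)
    edge_freq = {}
    for block in order:
        total = sum(weight for _, weight in cfg[block])
        for succ, weight in cfg[block]:
            flow = freq[block] * Fraction(weight, total)
            edge_freq[(block, succ)] = flow
            freq[succ] += flow
    return freq, edge_freq


freq, edge_freq = block_frequencies(CFG, TOPO_ORDER, "test1")
print(freq["then2"])                    # 7/9
print(edge_freq[("test2", "then2")])    # 4/9
print(edge_freq[("else1", "then2")])    # 1/3
```

With these weights `then2` runs on 7/9 of calls, reached from `test2` on 4/9 and from `else1` on 1/3, which matches the test's expectation that it falls through from `test2` while a copy is placed at the end of `else1`.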

test/CodeGen/SPARC/sjlj.ll

	; CHECK: or %i1, %lo(.LBB1_2), %i1			; CHECK: or %i1, %lo(.LBB1_2), %i1
	; CHECK: st %i1, [%i0+4]			; CHECK: st %i1, [%i0+4]
	; CHECK: st %sp, [%i0+8]			; CHECK: st %sp, [%i0+8]
	; CHECK: bn .LBB1_2			; CHECK: bn .LBB1_2
	; CHECK: st %i7, [%i0+12]			; CHECK: st %i7, [%i0+12]
	; CHECK: ba .LBB1_1			; CHECK: ba .LBB1_1
	; CHECK: nop			; CHECK: nop
	; CHECK:.LBB1_1: ! %entry			; CHECK:.LBB1_1: ! %entry
	; CHECK: ba .LBB1_3
	; CHECK: mov %g0, %i0			; CHECK: mov %g0, %i0
				; CHECK: cmp %i0, 0
				; CHECK: bne .LBB1_4
				; CHECK: ba .LBB1_5
	; CHECK:.LBB1_2: ! Block address taken			; CHECK:.LBB1_2: ! Block address taken
	; CHECK: mov 1, %i0			; CHECK: mov 1, %i0
	; CHECK:.LBB1_3: ! %entry
	; CHECK: cmp %i0, 0
	; CHECK: be .LBB1_5			; CHECK: be .LBB1_5
	; CHECK: nop			; CHECK:.LBB1_4:
				; CHECK: ba .LBB1_6
	}			}
	declare i8* @llvm.frameaddress(i32) #2			declare i8* @llvm.frameaddress(i32) #2

	declare i8* @llvm.stacksave() #3			declare i8* @llvm.stacksave() #3

	declare i32 @llvm.eh.sjlj.setjmp(i8*) #3			declare i32 @llvm.eh.sjlj.setjmp(i8*) #3

	attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { noreturn nounwind }			attributes #1 = { noreturn nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

test/CodeGen/SystemZ/int-cmp-44.ll

	; CHECK-NEXT: brasl %r14, foo@PLT			; CHECK-NEXT: brasl %r14, foo@PLT
	; CHECK-NEXT: cijlh [[REG]], 0, .L{{.*}}			; CHECK-NEXT: cijlh [[REG]], 0, .L{{.*}}
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i32 , i32 *%ptr			%val = load i32 , i32 *%ptr
	%xor = xor i32 %val, 1			%xor = xor i32 %val, 1
	%add = add i32 %xor, 1000000			%add = add i32 %xor, 1000000
	call void @foo()			call void @foo()
	%cmp = icmp ne i32 %add, 0			%cmp = icmp eq i32 %add, 0
	br i1 %cmp, label %exit, label %store			br i1 %cmp, label %store, label %exit, !prof !1

	store:			store:
	store i32 %add, i32 *%ptr			store i32 %add, i32 *%ptr
	br label %exit			br label %exit

	exit:			exit:
	ret void			ret void
	}			}

	store:			store:
	store i64 %res, i64 *%dest			store i64 %res, i64 *%dest
	br label %exit			br label %exit

	exit:			exit:
	ret i64 %res			ret i64 %res
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

	; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumb-macho \			; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumb-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V4T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V4T
	; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumbv5-macho \			; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumbv5-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V5T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V5T
	; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumb-macho \			; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumb-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V4T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V4T
	; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumbv5-macho \			; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumbv5-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V5T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V5T

	;			;
	; Note: Lots of tests use inline asm instead of regular calls.			; Note: Lots of tests use inline asm instead of regular calls.
	; This allows to have a better control on what the allocation will do.			; This allows to have a better control on what the allocation will do.
	; Otherwise, we may have spill right in the entry block, defeating			; Otherwise, we may have spill right in the entry block, defeating
	; shrink-wrapping. Moreover, some of the inline asm statements (nop)			; shrink-wrapping. Moreover, some of the inline asm statements (nop)
	; are here to ensure that the related paths do not end up as critical			; are here to ensure that the related paths do not end up as critical
	; edges.			; edges.
	; Also disable the late if-converter as it makes harder to reason on			; Also disable the late if-converter as it makes harder to reason on
	; the diffs.			; the diffs.
				; Disable tail-duplication during placement, as v4t vs v5t get different
				; results due to branches not being analyzable under v5

	; Initial motivating example: Simple diamond with a call just on one side.			; Initial motivating example: Simple diamond with a call just on one side.
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; No prologue needed.			; No prologue needed.
	; ENABLE: cmp r0, r1			; ENABLE: cmp r0, r1
	; ENABLE-NEXT: bge [[EXIT_LABEL:LBB[0-9_]+]]			; ENABLE-NEXT: bge [[EXIT_LABEL:LBB[0-9_]+]]
	▲ Show 20 Lines • Show All 664 Lines • Show Last 20 Lines

test/CodeGen/Thumb2/cbnz.ll

Show All 20 Lines	t:
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
; CHECK: cbnz		; CHECK: cbz
%q = icmp eq i32 %y, 0		%q = icmp eq i32 %y, 0
br i1 %q, label %t2, label %f		br i1 %q, label %t2, label %f

t2:		t2:
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
Show All 17 Lines

test/CodeGen/Thumb2/ifcvt-compare.ll

	; RUN: llc -mtriple=thumbv7-unknown-linux %s -o - \| FileCheck %s			; RUN: llc -mtriple=thumbv7-unknown-linux %s -o - \| FileCheck %s

	declare void @x()			declare void @x()

	define void @f0(i32 %x) optsize {			define void @f0(i32 %x) optsize {
	; CHECK-LABEL: f0:			; CHECK-LABEL: f0:
	; CHECK: cbnz			; CHECK: cbz
	%p = icmp eq i32 %x, 0			%p = icmp eq i32 %x, 0
	br i1 %p, label %t, label %f			br i1 %p, label %t, label %f

	t:			t:
	call void @x()			call void @x()
	br label %f			br label %f

	f:			f:
	Show All 34 Lines

test/CodeGen/Thumb2/v8_IT_4.ll

	; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard \| FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -arm-restrict-it \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -arm-restrict-it \| FileCheck %s
	; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard -regalloc=basic \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard -regalloc=basic \| FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -regalloc=basic -arm-restrict-it \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -regalloc=basic -arm-restrict-it \| FileCheck %s

	%"struct.__gnu_cxx::__normal_iterator<char,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" = type { i8 }			%"struct.__gnu_cxx::__normal_iterator<char,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" = type { i8 }
	%"struct.__gnu_cxx::new_allocator<char>" = type <{ i8 }>			%"struct.__gnu_cxx::new_allocator<char>" = type <{ i8 }>
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >" = type { %"struct.__gnu_cxx::__normal_iterator<char*,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >" = type { %"struct.__gnu_cxx::__normal_iterator<char*,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" }
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep" = type { %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep" = type { %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" }
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" = type { i32, i32, i32 }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" = type { i32, i32, i32 }


	define weak arm_aapcs_vfpcc i32 @_ZNKSs7compareERKSs(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this, %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) {			define weak arm_aapcs_vfpcc i32 @_ZNKSs7compareERKSs(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this, %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) {
	; CHECK-LABEL: _ZNKSs7compareERKSs:			; CHECK-LABEL: _ZNKSs7compareERKSs:
	; CHECK: cbnz r0,			; CHECK: cbz r0,
				; CHECK-NEXT: %bb1
				; CHECK-NEXT: pop.w
	; CHECK-NEXT: %bb			; CHECK-NEXT: %bb
	; CHECK-NEXT: sub{{(.w)?}} r0, r{{[0-9]+}}, r{{[0-9]+}}			; CHECK-NEXT: sub{{(.w)?}} r0, r{{[0-9]+}}, r{{[0-9]+}}
	; CHECK-NEXT: %bb1
	; CHECK-NEXT: pop.w			; CHECK-NEXT: pop.w
	entry:			entry:
	%0 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i32> [#uses=3]			%0 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i32> [#uses=3]
	%1 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i32> [#uses=3]			%1 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i32> [#uses=3]
	%2 = icmp ult i32 %1, %0 ; <i1> [#uses=1]			%2 = icmp ult i32 %1, %0 ; <i1> [#uses=1]
	%3 = select i1 %2, i32 %1, i32 %0 ; <i32> [#uses=1]			%3 = select i1 %2, i32 %1, i32 %0 ; <i32> [#uses=1]
	%4 = tail call arm_aapcs_vfpcc i8* @_ZNKSs7_M_dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i8*> [#uses=1]			%4 = tail call arm_aapcs_vfpcc i8* @_ZNKSs7_M_dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i8*> [#uses=1]
	%5 = tail call arm_aapcs_vfpcc i8* @_ZNKSs4dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i8*> [#uses=1]			%5 = tail call arm_aapcs_vfpcc i8* @_ZNKSs4dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i8*> [#uses=1]
	Show All 19 Lines

test/CodeGen/WebAssembly/phi.ll

	; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -verify-machineinstrs \| FileCheck %s			; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -verify-machineinstrs \| FileCheck %s

	; Test that phis are lowered.			; Test that phis are lowered.

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"			target triple = "wasm32-unknown-unknown"

	; Basic phi triangle.			; Basic phi triangle.

	; CHECK-LABEL: test0:			; CHECK-LABEL: test0:
	; CHECK: div_s $[[NUM0:[0-9]+]]=, $0, $pop[[NUM1:[0-9]+]]{{$}}			; CHECK: return $0
	; CHECK: return $[[NUM0]]{{$}}			; CHECK: div_s $push[[NUM0:[0-9]+]]=, $0, $pop[[NUM1:[0-9]+]]{{$}}
				; CHECK: return $pop[[NUM0]]{{$}}
	define i32 @test0(i32 %p) {			define i32 @test0(i32 %p) {
	entry:			entry:
	%t = icmp slt i32 %p, 0			%t = icmp slt i32 %p, 0
	br i1 %t, label %true, label %done			br i1 %t, label %true, label %done
	true:			true:
	%a = sdiv i32 %p, 3			%a = sdiv i32 %p, 3
	br label %done			br label %done
	done:			done:
	Show All 27 Lines

test/CodeGen/X86/avx512-cmp.ll

	Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
	}			}

	define float @test5(float %p) #0 {			define float @test5(float %p) #0 {
	; ALL-LABEL: test5:			; ALL-LABEL: test5:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1			; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; ALL-NEXT: vucomiss %xmm1, %xmm0			; ALL-NEXT: vucomiss %xmm1, %xmm0
	; ALL-NEXT: jne LBB3_1			; ALL-NEXT: jne LBB3_1
	; ALL-NEXT: jnp LBB3_2			; ALL-NEXT: jp LBB3_1
				; ALL-NEXT: ## BB#2: ## %return
				; ALL-NEXT: retq
	; ALL-NEXT: LBB3_1: ## %if.end			; ALL-NEXT: LBB3_1: ## %if.end
	; ALL-NEXT: seta %al			; ALL-NEXT: seta %al
	; ALL-NEXT: movzbl %al, %eax			; ALL-NEXT: movzbl %al, %eax
	; ALL-NEXT: leaq {{.*}}(%rip), %rcx			; ALL-NEXT: leaq {{.*}}(%rip), %rcx
	; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; ALL-NEXT: LBB3_2: ## %return
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%cmp = fcmp oeq float %p, 0.000000e+00			%cmp = fcmp oeq float %p, 0.000000e+00
	br i1 %cmp, label %return, label %if.end			br i1 %cmp, label %return, label %if.end

	if.end: ; preds = %entry			if.end: ; preds = %entry
	%cmp1 = fcmp ogt float %p, 0.000000e+00			%cmp1 = fcmp ogt float %p, 0.000000e+00
	%cond = select i1 %cmp1, float 1.000000e+00, float -1.000000e+00			%cond = select i1 %cmp1, float 1.000000e+00, float -1.000000e+00
	▲ Show 20 Lines • Show All 105 Lines • Show Last 20 Lines

test/CodeGen/X86/bt.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown \| FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown \| FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f \| FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f \| FileCheck %s
	; PR3253			; PR3253

	; The register+memory form of the BT instruction should be usable on			; The register+memory form of the BT instruction should be usable on
	; pentium4, however it is currently disabled due to the register+memory			; pentium4, however it is currently disabled due to the register+memory
	; form having different semantics than the register+register form.			; form having different semantics than the register+register form.

	Show All 9 Lines
	; operand is constant are included).			; operand is constant are included).
	; - The and can be commuted.			; - The and can be commuted.

	define void @test2(i32 %x, i32 %n) nounwind {			define void @test2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB0_2			; CHECK-NEXT: jb .LBB0_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB0_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
				davidxl: This test has not changed in behavior. Better to revert the change.
				iteratee: I'll do a complete check for any tests that fall into this category and revert them.
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test2b(i32 %x, i32 %n) nounwind {			define void @test2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test2b:			; CHECK-LABEL: test2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB1_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @atest2(i32 %x, i32 %n) nounwind {			define void @atest2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: atest2:			; CHECK-LABEL: atest2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB2_2			; CHECK-NEXT: jb .LBB2_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB2_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @atest2b(i32 %x, i32 %n) nounwind {			define void @atest2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: atest2b:			; CHECK-LABEL: atest2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB3_2			; CHECK-NEXT: jb .LBB3_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB3_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test3(i32 %x, i32 %n) nounwind {			define void @test3(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB4_2			; CHECK-NEXT: jb .LBB4_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB4_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test3b(i32 %x, i32 %n) nounwind {			define void @test3b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test3b:			; CHECK-LABEL: test3b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB5_2			; CHECK-NEXT: jb .LBB5_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB5_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @testne2(i32 %x, i32 %n) nounwind {			define void @testne2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: testne2:			; CHECK-LABEL: testne2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB6_2			; CHECK-NEXT: jae .LBB6_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB6_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp ne i32 %tmp3, 0			%tmp4 = icmp ne i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @testne2b(i32 %x, i32 %n) nounwind {			define void @testne2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: testne2b:			; CHECK-LABEL: testne2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB7_2			; CHECK-NEXT: jae .LBB7_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB7_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp ne i32 %tmp3, 0			%tmp4 = icmp ne i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @atestne2(i32 %x, i32 %n) nounwind {			define void @atestne2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: atestne2:			; CHECK-LABEL: atestne2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB8_2			; CHECK-NEXT: jae .LBB8_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB8_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp ne i32 %tmp3, 0			%tmp4 = icmp ne i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @atestne2b(i32 %x, i32 %n) nounwind {			define void @atestne2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: atestne2b:			; CHECK-LABEL: atestne2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB9_2			; CHECK-NEXT: jae .LBB9_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB9_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp ne i32 %tmp3, 0			%tmp4 = icmp ne i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @testne3(i32 %x, i32 %n) nounwind {			define void @testne3(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: testne3:			; CHECK-LABEL: testne3:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB10_2			; CHECK-NEXT: jae .LBB10_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB10_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp ne i32 %tmp3, 0			%tmp4 = icmp ne i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @testne3b(i32 %x, i32 %n) nounwind {			define void @testne3b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: testne3b:			; CHECK-LABEL: testne3b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB11_2			; CHECK-NEXT: jae .LBB11_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB11_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp ne i32 %tmp3, 0			%tmp4 = icmp ne i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @query2(i32 %x, i32 %n) nounwind {			define void @query2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: query2:			; CHECK-LABEL: query2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB12_2			; CHECK-NEXT: jae .LBB12_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB12_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp eq i32 %tmp3, 1			%tmp4 = icmp eq i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @query2b(i32 %x, i32 %n) nounwind {			define void @query2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: query2b:			; CHECK-LABEL: query2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB13_2			; CHECK-NEXT: jae .LBB13_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB13_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 1			%tmp4 = icmp eq i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @aquery2(i32 %x, i32 %n) nounwind {			define void @aquery2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: aquery2:			; CHECK-LABEL: aquery2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB14_2			; CHECK-NEXT: jae .LBB14_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB14_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp eq i32 %tmp3, 1			%tmp4 = icmp eq i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @aquery2b(i32 %x, i32 %n) nounwind {			define void @aquery2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: aquery2b:			; CHECK-LABEL: aquery2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB15_2			; CHECK-NEXT: jae .LBB15_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB15_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 1			%tmp4 = icmp eq i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @query3(i32 %x, i32 %n) nounwind {			define void @query3(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: query3:			; CHECK-LABEL: query3:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB16_2			; CHECK-NEXT: jae .LBB16_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB16_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp eq i32 %tmp3, %tmp29			%tmp4 = icmp eq i32 %tmp3, %tmp29
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @query3b(i32 %x, i32 %n) nounwind {			define void @query3b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: query3b:			; CHECK-LABEL: query3b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB17_2			; CHECK-NEXT: jae .LBB17_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB17_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp eq i32 %tmp3, %tmp29			%tmp4 = icmp eq i32 %tmp3, %tmp29
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @query3x(i32 %x, i32 %n) nounwind {			define void @query3x(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: query3x:			; CHECK-LABEL: query3x:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB18_2			; CHECK-NEXT: jae .LBB18_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB18_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp eq i32 %tmp29, %tmp3			%tmp4 = icmp eq i32 %tmp29, %tmp3
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @query3bx(i32 %x, i32 %n) nounwind {			define void @query3bx(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: query3bx:			; CHECK-LABEL: query3bx:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jae .LBB19_2			; CHECK-NEXT: jae .LBB19_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB19_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp eq i32 %tmp29, %tmp3			%tmp4 = icmp eq i32 %tmp29, %tmp3
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @queryne2(i32 %x, i32 %n) nounwind {			define void @queryne2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: queryne2:			; CHECK-LABEL: queryne2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB20_2			; CHECK-NEXT: jb .LBB20_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB20_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp ne i32 %tmp3, 1			%tmp4 = icmp ne i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @queryne2b(i32 %x, i32 %n) nounwind {			define void @queryne2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: queryne2b:			; CHECK-LABEL: queryne2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB21_2			; CHECK-NEXT: jb .LBB21_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB21_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp ne i32 %tmp3, 1			%tmp4 = icmp ne i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @aqueryne2(i32 %x, i32 %n) nounwind {			define void @aqueryne2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: aqueryne2:			; CHECK-LABEL: aqueryne2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB22_2			; CHECK-NEXT: jb .LBB22_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB22_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp ne i32 %tmp3, 1			%tmp4 = icmp ne i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @aqueryne2b(i32 %x, i32 %n) nounwind {			define void @aqueryne2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: aqueryne2b:			; CHECK-LABEL: aqueryne2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB23_2			; CHECK-NEXT: jb .LBB23_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB23_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp ne i32 %tmp3, 1			%tmp4 = icmp ne i32 %tmp3, 1
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @queryne3(i32 %x, i32 %n) nounwind {			define void @queryne3(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: queryne3:			; CHECK-LABEL: queryne3:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB24_2			; CHECK-NEXT: jb .LBB24_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB24_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp ne i32 %tmp3, %tmp29			%tmp4 = icmp ne i32 %tmp3, %tmp29
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @queryne3b(i32 %x, i32 %n) nounwind {			define void @queryne3b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: queryne3b:			; CHECK-LABEL: queryne3b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB25_2			; CHECK-NEXT: jb .LBB25_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB25_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp ne i32 %tmp3, %tmp29			%tmp4 = icmp ne i32 %tmp3, %tmp29
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @queryne3x(i32 %x, i32 %n) nounwind {			define void @queryne3x(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: queryne3x:			; CHECK-LABEL: queryne3x:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB26_2			; CHECK-NEXT: jb .LBB26_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB26_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp ne i32 %tmp29, %tmp3			%tmp4 = icmp ne i32 %tmp29, %tmp3
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @queryne3bx(i32 %x, i32 %n) nounwind {			define void @queryne3bx(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: queryne3bx:			; CHECK-LABEL: queryne3bx:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB27_2			; CHECK-NEXT: jb .LBB27_2
	;			; CHECK-NEXT: # BB#1: # %bb
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: callq foo
				; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .LBB27_2: # %UnifiedReturnBlock
				; CHECK-NEXT: retq
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp ne i32 %tmp29, %tmp3			%tmp4 = icmp ne i32 %tmp29, %tmp3
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	declare void @foo()			declare void @foo()

	define zeroext i1 @invert(i32 %flags, i32 %flag) nounwind {			define zeroext i1 @invert(i32 %flags, i32 %flag) nounwind {
	; CHECK-LABEL: invert:			; CHECK-LABEL: invert:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: notl %edi			; CHECK-NEXT: notl %edi
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: setb %al			; CHECK-NEXT: setb %al
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;
	%neg = xor i32 %flags, -1			%neg = xor i32 %flags, -1
	%shl = shl i32 1, %flag			%shl = shl i32 1, %flag
	%and = and i32 %shl, %neg			%and = and i32 %shl, %neg
	%tobool = icmp ne i32 %and, 0			%tobool = icmp ne i32 %and, 0
	ret i1 %tobool			ret i1 %tobool
	}			}

	define zeroext i1 @extend(i32 %bit, i64 %bits) {			define zeroext i1 @extend(i32 %bit, i64 %bits) {
	; CHECK-LABEL: extend:			; CHECK-LABEL: extend:
	; CHECK: # BB#0:			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %edi, %esi			; CHECK-NEXT: btl %edi, %esi
				; CHECK-NEXT: setb %al
				; CHECK-NEXT: retq
	entry:			entry:
	%and = and i32 %bit, 31			%and = and i32 %bit, 31
	%sh_prom = zext i32 %and to i64			%sh_prom = zext i32 %and to i64
	%shl = shl i64 1, %sh_prom			%shl = shl i64 1, %sh_prom
	%and1 = and i64 %shl, %bits			%and1 = and i64 %shl, %bits
	%tobool = icmp ne i64 %and1, 0			%tobool = icmp ne i64 %and1, 0
	ret i1 %tobool			ret i1 %tobool
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/X86/fp-une-cmp.ll

	[... 30 lines not shown ...]
	; CHECK-NEXT: jp .LBB0_2			; CHECK-NEXT: jp .LBB0_2
	; CHECK-NEXT: # BB#1: # %bb1			; CHECK-NEXT: # BB#1: # %bb1
	; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0			; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0
	; CHECK-NEXT: .LBB0_2: # %bb2			; CHECK-NEXT: .LBB0_2: # %bb2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

	entry:			entry:
	%mul = fmul double %x, %y			%mul = fmul double %x, %y
	%cmp = fcmp une double %mul, 0.000000e+00			%cmp = fcmp oeq double %mul, 0.000000e+00
	br i1 %cmp, label %bb2, label %bb1			br i1 %cmp, label %bb1, label %bb2

	bb1:			bb1:
	%add = fadd double %mul, -1.000000e+00			%add = fadd double %mul, -1.000000e+00
	br label %bb2			br label %bb2

	bb2:			bb2:
	%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]			%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]
	ret double %phi			ret double %phi
	[... 87 lines not shown ...]

test/CodeGen/X86/jump_sign.ll

	; RUN: llc < %s -march=x86 -mcpu=pentiumpro -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -march=x86 -mcpu=pentiumpro -verify-machineinstrs | FileCheck %s

	define i32 @func_f(i32 %X) {			define i32 @func_f(i32 %X) {
	entry:			entry:
	; CHECK-LABEL: func_f:			; CHECK-LABEL: func_f:
	; CHECK: jns			; CHECK: jns
	%tmp1 = add i32 %X, 1 ; <i32> [#uses=1]			%tmp1 = add i32 %X, 1 ; <i32> [#uses=1]
	%tmp = icmp slt i32 %tmp1, 0 ; <i1> [#uses=1]			%tmp = icmp slt i32 %tmp1, 0 ; <i1> [#uses=1]
	br i1 %tmp, label %cond_true, label %cond_next			br i1 %tmp, label %cond_true, label %cond_next, !prof !1

	cond_true: ; preds = %entry			cond_true: ; preds = %entry
	%tmp2 = tail call i32 (...) @bar( ) ; <i32> [#uses=0]			%tmp2 = tail call i32 (...) @bar( ) ; <i32> [#uses=0]
	br label %cond_next			br label %cond_next

	cond_next: ; preds = %cond_true, %entry			cond_next: ; preds = %cond_true, %entry
	%tmp3 = tail call i32 (...) @baz( ) ; <i32> [#uses=0]			%tmp3 = tail call i32 (...) @baz( ) ; <i32> [#uses=0]
	ret i32 undef			ret i32 undef
	[... 280 lines not shown ...]
	if.then:			if.then:
	%dec = add nsw i32 %1, -1			%dec = add nsw i32 %1, -1
	store i32 %dec, i32* @a, align 4			store i32 %dec, i32* @a, align 4
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret i32 undef			ret i32 undef
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/X86/testb-je-fusion.ll

	; RUN: llc < %s -march=x86-64 -mcpu=corei7-avx | FileCheck %s			; RUN: llc < %s -march=x86-64 -mcpu=corei7-avx | FileCheck %s

	; testb should be scheduled right before je to enable macro-fusion.			; testb should be scheduled right before je to enable macro-fusion.

	; CHECK: testb $2, %{{[abcd]}}h			; CHECK: testb $2, %{{[abcd]}}h
	; CHECK-NEXT: je			; CHECK-NEXT: je

	define i32 @check_flag(i32 %flags, ...) nounwind {			define i32 @check_flag(i32 %flags, ...) nounwind {
	entry:			entry:
	%and = and i32 %flags, 512			%and = and i32 %flags, 512
	%tobool = icmp eq i32 %and, 0			%tobool = icmp eq i32 %and, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then, !prof !1

	if.then:			if.then:
	br label %if.end			br label %if.end

	if.end:			if.end:
	%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]			%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]
	ret i32 %hasflag			ret i32 %hasflag
	}			}
				!1 = !{!"branch_weights", i32 1, i32 2}

Revision Contents

Diff 86381

lib/CodeGen/BranchFolding.h

lib/CodeGen/BranchFolding.cpp

lib/CodeGen/MachineBlockPlacement.cpp

test/CodeGen/AArch64/arm64-atomic.ll

test/CodeGen/AArch64/arm64-shrink-wrapping.ll

test/CodeGen/AArch64/tail-dup-repeat-worklist.ll

test/CodeGen/AArch64/tbz-tbnz.ll

test/CodeGen/AMDGPU/branch-relaxation.ll

test/CodeGen/AMDGPU/uniform-cfg.ll

test/CodeGen/ARM/arm-and-tst-peephole.ll

test/CodeGen/ARM/atomic-op.ll

test/CodeGen/ARM/atomic-ops-v8.ll

test/CodeGen/ARM/cmpxchg-weak.ll

test/CodeGen/Mips/brconnez.ll

test/CodeGen/Mips/micromips-compact-branches.ll

test/CodeGen/PowerPC/misched-inorder-latency.ll

test/CodeGen/PowerPC/tail-dup-break-cfg.ll

test/CodeGen/PowerPC/tail-dup-layout.ll

test/CodeGen/SPARC/sjlj.ll

test/CodeGen/SystemZ/int-cmp-44.ll

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

test/CodeGen/Thumb2/cbnz.ll

test/CodeGen/Thumb2/ifcvt-compare.ll

test/CodeGen/Thumb2/v8_IT_4.ll

test/CodeGen/WebAssembly/phi.ll

test/CodeGen/X86/avx512-cmp.ll

test/CodeGen/X86/bt.ll

test/CodeGen/X86/fp-une-cmp.ll

test/CodeGen/X86/jump_sign.ll

test/CodeGen/X86/testb-je-fusion.ll
