This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Allow small copyable blocks to "break" the CFG.
ClosedPublic

Authored by iteratee on Jan 11 2017, 3:07 PM.


Details

Reviewers

davidxl
• tstellarAMD
arsenm
javed.absar

Commits

rGb15c06677c63: CodeGen: Allow small copyable blocks to "break" the CFG.
rL293716: CodeGen: Allow small copyable blocks to "break" the CFG.

Summary

When choosing the best successor for a block, we would ordinarily prefer
a block that preserves the CFG unless there is a strong probability in the other
direction. For small blocks that can be duplicated, we now skip that requirement
as well, subject to some simple frequency calculations.

Diff Detail

Repository: rL LLVM

Event Timeline

iteratee updated this revision to Diff 84034. Jan 11 2017, 3:07 PM

iteratee retitled this revision from to CodeGen: Allow small copyable blocks to "break" the CFG..

iteratee updated this object.

iteratee added a reviewer: davidxl.

iteratee set the repository for this revision to rL LLVM.

iteratee added subscribers: echristo, timshen, chandlerc, llvm-commits.

Herald added a reviewer: • tstellarAMD. Jan 11 2017, 3:07 PM

Herald added subscribers: nhaehnle, nemanjai, jyknight and 2 others.

iteratee updated this object. Jan 11 2017, 3:08 PM

iteratee edited edge metadata.

iteratee updated this revision to Diff 84036. Jan 11 2017, 3:15 PM

iteratee added a reviewer: arsenm.

iteratee removed rL LLVM as the repository for this revision.

Herald edited edge metadata. Jan 11 2017, 3:15 PM

Herald added a subscriber: wdng.

Realized that one of the calculations I did was only valid for D28522. Re-worked the calculation for now, and will rebase and update the calculation there.

Herald edited edge metadata. Jan 11 2017, 4:34 PM

junbuml added a subscriber: junbuml. Jan 12 2017, 7:11 AM

I like the direction (with more precise cost analysis) this is going. Will review the code soon.

iteratee mentioned this in D28522: Codegen: Make chains from trellis-shaped CFGs. Jan 12 2017, 12:03 PM

iteratee added a child revision: D28522: Codegen: Make chains from trellis-shaped CFGs.

davidxl added inline comments. Jan 13 2017, 10:04 AM

lib/CodeGen/MachineBlockPlacement.cpp
654 ↗	(On Diff #84053)	In this case, what is needed is to invoke 'hasBetterLayoutPredecessor' on the PDom block. Depending on the result, we will know whether, without tail duplication, the layout order is Succ->PDom or Succ->D->PDom. This will make the cost computation more precise.

davidxl added inline comments. Jan 13 2017, 10:04 AM

lib/CodeGen/MachineBlockPlacement.cpp
413 ↗	(On Diff #84053)	Suggest new name : isProfitableToTailDup
612 ↗	(On Diff #84053)	Dom -> PDom
619 ↗	(On Diff #84053)	Why not just check if there exists a SuccSucc that post dominates Succ directly?
638 ↗	(On Diff #84053)	PU + PV == P. Also, assuming U > V, the layout order (with tail dup) should be BB --> Succ --> D, so the overall cost is Q + PV + Q (which is smaller than Q + QV + PU + PV).
639 ↗	(On Diff #84053)	We cannot assume BB and Succ are in a triangular-shaped sub-CFG here, given where this method is called. Besides, if Succ is not tail-duped, the layout decision may even reject Succ as the layout successor, so the cost is no longer P + V, but 2*Q + V instead (with U > V). In other words, the isProfitable check cannot be done inside 'hasBetterLayoutPredecessor'; it should be hoisted to the caller and run after 'hasBetterLayoutPredecessor' returns, at which point we will know the layout decision if taildup does not kick in.

Updated the cost calculation to not rely on the lattice layout.
This resulted in fewer duplications in tests, so those test changes have been rolled into D28522.

Herald edited edge metadata. Jan 13 2017, 3:42 PM

I made the calculations in terms of frequency instead of probability.

I adjusted the cost calculation when there is a post dominator based on whether it will be laid out after Succ or not.

Let me know if there are any cost calculations that you think are wrong.

Herald added a reviewer: javed.absar. Jan 19 2017, 5:25 PM

Actually upload the diff with what I said was in the last one:
Use frequency instead of probability

Use slight lookahead for more precise probability calculations.

Let me know what you think. There is a small cleanup that could go in as a separate patch: I switched to a SmallDenseSet because we don't need the orderedness of the SmallSetVector.

lib/CodeGen/MachineBlockPlacement.cpp
638 ↗	(On Diff #84053)	I thought that too. But without the lattice patch, after duplication, we won't put D after Succ because it now has an unplaced predecessor. The lattice patch fixes the behavior and the calculation.
639 ↗	(On Diff #84053)	I think I have the calculation right for when Succ would not be the layout successor, but you're right to point out that we should also do the calculation even when Succ is the chosen successor.
639 ↗	(On Diff #84053)	We now only call this function to check if we should use Succ despite it having been rejected. So we know that Succ is not the layout successor.

Tidied comments and spacing.

davidxl added inline comments. Jan 20 2017, 2:04 PM

lib/CodeGen/MachineBlockPlacement.cpp
268 ↗	(On Diff #85084)	There is a reason SmallSetVector is used here -- to make sure the iteration order is deterministic.
642 ↗	(On Diff #85084)	I assume this is loop back edge source block. You need a test case to cover it.
653 ↗	(On Diff #85084)	Why break here?
658 ↗	(On Diff #85084)	nit: -->SuccBestPred
666 ↗	(On Diff #85084)	Computing BestSuccPred here is unnecessary. See below for more comments.
675 ↗	(On Diff #85084)	Qin is not necessarily BestSuccPred. The profitability check is called only after hasBetterLayoutPredecessor has returned true. There are two scenarios in which it returns true: (1) Qin or Qout is larger than P, or (2) P is larger than Qout, but the branch is not biased enough, so the layout algorithm still decides to keep the topological order. Either way, the baseline layout to compare against (with taildup) is that BB->Succ is the branch-taken edge and BB->C is the fallthrough edge. Qin should just be Prob(BB->C).
691 ↗	(On Diff #85084)	PDom is always a successor of Succ according to the way it is computed.
697 ↗	(On Diff #85084)	The base cost is as written; the DupCost, however, depends on whether P > Q. If P > Q, the fallthrough path is BB->Succ->D, so the cost (normalized with freq(BB) == 1) is 2Q + PV. If P < Q, the fallthrough path is BB->C'->D and the cost is 2P + QV.
840 ↗	(On Diff #85084)	Add more description about what blocks to ignore.
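
The two cases in the comment on line 697 can be written out as a small function. This is only a transcription of the reviewer's formulas for illustration, assuming all values are normalized so that freq(BB) == 1; the function name and parameter names are hypothetical and do not appear in the patch, and the author's reply further down disputes that both cases apply before D28522 lands.

```cpp
#include <cassert>

// Illustrative sketch only; not code from the patch.
//   p: probability of the edge BB->Succ (P in the comment)
//   q: probability of the edge BB->C    (Q in the comment)
//   v: the factor written as V in the comment
// All values are normalized so that freq(BB) == 1.
double dupCostFromReview(double p, double q, double v) {
  assert(p >= 0 && q >= 0 && v >= 0 && v <= 1);
  if (p > q)
    return 2 * q + p * v;  // fallthrough path BB->Succ->D
  // The comment only covers P < Q; P == Q is folded into this branch here.
  return 2 * p + q * v;    // fallthrough path BB->C'->D
}
```

With P = 0.75, Q = 0.25 and V = 0.5, the first branch gives 2 * 0.25 + 0.75 * 0.5 = 0.875.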

I'll be glad to add some more comments to explain, but I think the calculations are correct. I've commented individually.

lib/CodeGen/MachineBlockPlacement.cpp
268 ↗	(On Diff #85084)	BlockFilterSet is never iterated. I checked.
653 ↗	(On Diff #85084)	Because if PDom is not null, that's all that we look at for the probability calculation.
675 ↗	(On Diff #85084)	When we place Succ, we remove 2 fallthrough edges BB->C and C'->Succ. Freq(C'->Succ) may be larger than Freq(BB->C). I am using Qin to represent Freq(C'->Succ) and Qout for Freq(BB->C). I could just use different letters if that were more clear. Qout is Freq(BB->C). I don't think Qin should be as well.
691 ↗	(On Diff #85084)	Thanks.
697 ↗	(On Diff #85084)	This function is called in a loop looking for the highest probability successor. If Q > P, this function will be ignored and we will lay out Q anyway, so we can ignore the second case. As to the first case: until the 2nd patch lands, the duplication will prevent the BB->Succ->D layout. Instead you will get BB->Succ; C'->D. So the cost is as calculated. D28522 will include an update to this calculation along with an update to the behavior.
840 ↗	(On Diff #85084)	Well, that's really up to the caller. Do you want me to list why you might want to ignore a block?

davidxl added inline comments. Jan 20 2017, 4:19 PM

lib/CodeGen/MachineBlockPlacement.cpp
268 ↗	(On Diff #85084)	See for (MachineBasicBlock *LoopBB : LoopBlockSet) fillWorkLists(LoopBB, UpdatedPreds, &LoopBlockSet);
675 ↗	(On Diff #85084)	Differentiating Qin and Qout is fine, but in the code Qin = BestSuccPred, which could be Freq(BB->Succ). What I meant is that you should compute Qin directly from its definition, Freq(C'->Succ).
697 ↗	(On Diff #85084)	You are right that the Q > P scenario will be dropped. It is very subtle, so please add a comment to clarify. OK -- for the first case, also add a comment.
840 ↗	(On Diff #85084)	something like : e.g, when called under xxx, we want to ignore yyy. See caller zzz for details. However, see my comment in the function, this parameter seems unnecessary.
974 ↗	(On Diff #85084)	I think it is equivalent to check Pred == BB. In normal calling context, this is covered by BlockToChain[Pred] == &Chain, but for lookahead case, it is needed to filter BB which is not laid out yet.

Improved comments based on review.

Please mark addressed comments as done. Also let me know if it is ready for another round of review (I saw some issues not addressed such as the deterministic iteration of block filter set).

Missed a comment to rename something.

In D28583#653869, @davidxl wrote:

Please mark addressed comments as done. Also let me know if it is ready for another round of review (I saw some issues not addressed such as the deterministic iteration of block filter set).

Marked.

I think it's ready, and I put back the deterministic set.

lib/CodeGen/MachineBlockPlacement.cpp
675 ↗	(On Diff #85084)	Did you still want me to fix something here?

davidxl added inline comments. Jan 23 2017, 2:39 PM

lib/CodeGen/MachineBlockPlacement.cpp
642 ↗	(On Diff #85084)	test case for this?
675 ↗	(On Diff #85084)	just add a comment above Qin decl stating that Qin is the largest frequency of Succ's incoming edges which have not been placed.
671 ↗	(On Diff #85451)	Add a short cut here with comments: // If P is not larger, the best successor selection loop will eventually select C, not Succ (as it is not profitable to do so). if (P <= Qout) return false;
969 ↗	(On Diff #85451)	--> ... for lookahead by isProfitableToTailDup when BB has not yet been placed.

More comments from review, and a new test case.

This version looks almost fine except for one remaining unaddressed comment.

lib/CodeGen/MachineBlockPlacement.cpp
671 ↗	(On Diff #85451)	How about this comment? Early return can 1) speed up the computation and 2) make the following code easier to understand.

iteratee added inline comments. Jan 23 2017, 8:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
642 ↗	(On Diff #85084)	It's not just a back edge. I added a test case.
671 ↗	(On Diff #85451)	If we weren't estimating Qout, I'd agree. Instead we'll skip calling this altogether if we know that we won't use the result.

Sorry. I'd replied to the comment, but Phabricator didn't submit it along with my diff update for some reason.

Save the blocks with CFG violations that are duplication candidates. Review them in descending order of probability, so we call isProfitableToTailDup the minimum number of times.
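
The ordering idea described above can be sketched in isolation. This is a minimal illustration, not the patch's code: `DupCandidate`, `pickFirstProfitable`, and the integer block ids are hypothetical stand-ins, and the expensive check is passed in as a callback so that the number of calls is easy to observe.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical stand-in for a (block, probability) pair saved during the
// first pass over the successors.
struct DupCandidate {
  int blockId;  // stand-in for a MachineBasicBlock pointer
  double prob;  // successor probability
};

// Visit candidates from most to least probable and return the first one the
// (expensive) profitability check accepts, or -1 if none is accepted.
int pickFirstProfitable(std::vector<DupCandidate> candidates,
                        const std::function<bool(int)> &isProfitable) {
  // A stable sort keeps the original successor order among equal
  // probabilities.
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const DupCandidate &a, const DupCandidate &b) {
                     return a.prob > b.prob;
                   });
  for (const DupCandidate &c : candidates)
    if (isProfitable(c.blockId))
      return c.blockId;  // stop at the first hit; later ones are less likely
  return -1;
}
```

Because the loop stops at the first accepted candidate, the check runs once when the most probable candidate is profitable, and at most once per candidate otherwise.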

davidxl added inline comments. Jan 24 2017, 4:43 PM

lib/CodeGen/MachineBlockPlacement.cpp
1059 ↗	(On Diff #85634)	Remove the first 'SuccProb > BestProb' check -- it provides only very tiny compile time win depending on the iteration order, but adds more confusion.
1076 ↗	(On Diff #85634)	No need to set ShouldTailDup in the loop -- it is already initialized outside.
1086 ↗	(On Diff #85634)	Why not just stable sort it? The vector should be of size 1 for most of the cases. Also why do you need position ?
1096 ↗	(On Diff #85634)	Should it break instead?
1098 ↗	(On Diff #85634)	isProfitableToTailDup assumes the baseline layout does not pick Succ. The assumption may not be true here, as there are two other possibilities: (1) Succ == BestSucc.BB in the base layout; (2) BestSucc.BB == null in the base layout (all of BB's successors have conflicts). In these two cases, the isProfitable check should probably be skipped (as tail duplication is beneficial).

Changes from comments:
Just sort the vector instead of make_heap.
If there is a tail duplication opportunity and no other successor, take it.

lib/CodeGen/MachineBlockPlacement.cpp
1086 ↗	(On Diff #85634)	Will just sort the vector. Position is because we rely on the successor order being stable and the first successor being a subtle hint. Without the position, we lose track of whether the block in the vector came before or after the block we picked without tail duplication.
1098 ↗	(On Diff #85634)	Succ won't equal BestSucc.BB because of the continue. These blocks were not chosen by the first loop by construction. Good catch. I'll add that.

Per offline discussion, I removed the ordering constraint for blocks that are profitable to tail-duplicate.

This resulted in a lot of test churn, but the source change is relatively small.

This looks very clean now.

However, the amount of churn reminds me of one thing. Since the profit computation is based on static branch prediction (without PGO), it is the right thing to do to be a little more conservative in taildup. In other words, instead of making 'isProfitable' return true whenever the taildup cost is smaller than the baseline cost, add a predefined margin (controlled by a parameter):

if (baseline_cost - taildup_cost > threshold)
  return true;
return false;

The threshold also roughly models the side effect of taildup -- increased icache footprint etc due to code size increase.
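
The margin check suggested above can be sketched as a standalone function. This is a hedged illustration, not the patch's code: the function name and the use of plain 64-bit integers for costs are assumptions made here, and the extra guard exists only because unsigned subtraction would otherwise wrap around.

```cpp
#include <cstdint>

// Hypothetical helper: accept tail duplication only when it beats the
// baseline layout cost by more than a fixed margin. Costs are in arbitrary
// frequency units; lower is better.
bool isProfitableWithMargin(uint64_t baselineCost, uint64_t tailDupCost,
                            uint64_t threshold) {
  // Guard against unsigned wrap-around when duplication is not cheaper.
  if (tailDupCost >= baselineCost)
    return false;
  return baselineCost - tailDupCost > threshold;
}
```

With a threshold of 10, a baseline cost of 100 against a duplication cost of 80 is accepted, while 100 against 95 is rejected even though duplication is nominally cheaper.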

Compare frequencies with a small bias against the tail-duplication side to account for increased icache pressure.

Includes a TODO to handle edge frequencies better in general.

davidxl added inline comments. Jan 30 2017, 4:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
151 ↗	(On Diff #86340)	perhaps simplify it to tail-dup-penalty ?
620 ↗	(On Diff #86340)	This basically treats the penalty percent parameter as the threshold of normalized improvement (A-B)/B: if ((A-B)/B > PenaltyPercent/100) return true; The problem with this formula is that if B is very hot, (A-B)/B becomes small even though (A-B) is still large. So I think it is better to compute the normalized improvement as (A-B)/Entry_Freq, i.e. the improvement relative to the entry frequency. This will help prevent tail dup from happening in very cold paths. The implementation can make use of BranchProbability as well. Suppose we want to implement the condition: if ((A-B)/Entry_Freq > P/100) return true; Do this in 3 lines: BlockFrequency Profit = A - B; BlockFrequency Threshold = Entry_Freq * BranchProbability(P, 100); return Profit > Threshold;
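
The entry-frequency comparison proposed in the comment above can be tried out without the LLVM types. The sketch below is an assumption-laden stand-in: it uses plain `uint64_t` values instead of `BlockFrequency` and `BranchProbability`, the function name is hypothetical, and it uses the strict `>` from the comment rather than whatever the final patch settled on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when the gain (baseCost - dupCost)
// exceeds penaltyPercent percent of the function's entry frequency.
bool gainExceedsEntryShare(uint64_t baseCost, uint64_t dupCost,
                           uint64_t entryFreq, unsigned penaltyPercent) {
  assert(penaltyPercent <= 100);
  if (dupCost >= baseCost)
    return false;  // no gain at all; also avoids unsigned wrap-around
  uint64_t profit = baseCost - dupCost;
  // floor(entryFreq * penaltyPercent / 100), split into quotient and
  // remainder parts so the multiplication cannot overflow 64 bits.
  uint64_t threshold = (entryFreq / 100) * penaltyPercent +
                       (entryFreq % 100) * penaltyPercent / 100;
  return profit > threshold;
}
```

For an entry frequency of 1000 and a 2% penalty the threshold is 20, so a gain of 30 passes regardless of how hot the blocks involved are, which is the property the comment argues for over dividing by B.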

Use a percentage of the entry frequency as a cutoff.

davidxl added inline comments. Jan 30 2017, 7:23 PM

lib/CodeGen/MachineBlockPlacement.cpp
156 ↗	(On Diff #86381)	Is this default value too low? Increase it to 5 or 10 perhaps?
625 ↗	(On Diff #86381)	I suppose this logic here is for rounding errors or overflow? Can you explain why the simple scaling with branch prob (in BranchProbability.cpp) does not work? return Gain > EntryFreq*ThresholdProb;

Simplify the biased comparison.

iteratee marked 4 inline comments as done. Jan 31 2017, 11:36 AM

iteratee added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
156 ↗	(On Diff #86381)	No, I think we should leave it. Now that it's a flag it's easy to change, and especially comparing with the entry frequency 2% is a big enough margin.
625 ↗	(On Diff #86381)	I did the math, and found a way to do it simply.

lgtm

(I only sampled some test case changes which look reasonable)

test/CodeGen/X86/bt.ll
27 ↗	(On Diff #86468)	This test has not changed in behavior. Better to revert the change.

This revision is now accepted and ready to land. Jan 31 2017, 1:45 PM

iteratee marked an inline comment as done. Jan 31 2017, 1:48 PM

iteratee added inline comments.

test/CodeGen/X86/bt.ll
27 ↗	(On Diff #86468)	I'll do a complete check for any tests that fall into this category and revert them.

Closed by commit rL293716: CodeGen: Allow small copyable blocks to "break" the CFG. (authored by iteratee). Jan 31 2017, 3:59 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                      Size

llvm/trunk/
  lib/CodeGen/
    BranchFolding.h                       1 line
    BranchFolding.cpp                     5 lines
    MachineBlockPlacement.cpp             362 lines
  test/CodeGen/
    AArch64/
      arm64-atomic.ll                     22 lines
      arm64-shrink-wrapping.ll            14 lines
      tail-dup-repeat-worklist.ll         69 lines
      tbz-tbnz.ll                         16 lines
    AMDGPU/
      branch-relaxation.ll                16 lines
      uniform-cfg.ll                      22 lines
    ARM/
      arm-and-tst-peephole.ll             6 lines
      atomic-op.ll                        4 lines
      atomic-ops-v8.ll                    35 lines
      cmpxchg-weak.ll                     8 lines
    Mips/
      brconnez.ll                         4 lines
      micromips-compact-branches.ll       3 lines
    PowerPC/
      misched-inorder-latency.ll          4 lines
      tail-dup-break-cfg.ll               140 lines
    SPARC/
      sjlj.ll                             9 lines
    SystemZ/
      int-cmp-44.ll                       6 lines
    Thumb/
      thumb-shrink-wrapping.ll            11 lines
    Thumb2/
      cbnz.ll                             2 lines
      ifcvt-compare.ll                    2 lines
      v8_IT_4.ll                          5 lines
    WebAssembly/
      phi.ll                              5 lines
    X86/ (file names not preserved in this archive)
                                          5 lines
                                          8 lines
                                          4 lines
                                          4 lines

Diff 86517

llvm/trunk/lib/CodeGen/BranchFolding.h

Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	public:
MBFIWrapper(const MachineBlockFrequencyInfo &I) : MBFI(I) {}		MBFIWrapper(const MachineBlockFrequencyInfo &I) : MBFI(I) {}
BlockFrequency getBlockFreq(const MachineBasicBlock *MBB) const;		BlockFrequency getBlockFreq(const MachineBasicBlock *MBB) const;
void setBlockFreq(const MachineBasicBlock *MBB, BlockFrequency F);		void setBlockFreq(const MachineBasicBlock *MBB, BlockFrequency F);
raw_ostream &printBlockFreq(raw_ostream &OS,		raw_ostream &printBlockFreq(raw_ostream &OS,
const MachineBasicBlock *MBB) const;		const MachineBasicBlock *MBB) const;
raw_ostream &printBlockFreq(raw_ostream &OS,		raw_ostream &printBlockFreq(raw_ostream &OS,
const BlockFrequency Freq) const;		const BlockFrequency Freq) const;
void view(bool isSimple = true);		void view(bool isSimple = true);
		uint64_t getEntryFreq() const;

private:		private:
const MachineBlockFrequencyInfo &MBFI;		const MachineBlockFrequencyInfo &MBFI;
DenseMap<const MachineBasicBlock *, BlockFrequency> MergedBBFreq;		DenseMap<const MachineBasicBlock *, BlockFrequency> MergedBBFreq;
};		};

private:		private:
MBFIWrapper &MBBFreqInfo;		MBFIWrapper &MBBFreqInfo;
Show All 32 Lines

llvm/trunk/lib/CodeGen/BranchFolding.cpp

	Show First 20 Lines • Show All 494 Lines • ▼ Show 20 Lines
	raw_ostream &			raw_ostream &
	BranchFolder::MBFIWrapper::printBlockFreq(raw_ostream &OS,			BranchFolder::MBFIWrapper::printBlockFreq(raw_ostream &OS,
	const BlockFrequency Freq) const {			const BlockFrequency Freq) const {
	return MBFI.printBlockFreq(OS, Freq);			return MBFI.printBlockFreq(OS, Freq);
	}			}

	void BranchFolder::MBFIWrapper::view(bool isSimple) { MBFI.view(isSimple); }			void BranchFolder::MBFIWrapper::view(bool isSimple) { MBFI.view(isSimple); }

				uint64_t
				BranchFolder::MBFIWrapper::getEntryFreq() const {
				return MBFI.getEntryFreq();
				}

	/// CountTerminators - Count the number of terminators in the given			/// CountTerminators - Count the number of terminators in the given
	/// block and set I to the position of the first non-terminator, if there			/// block and set I to the position of the first non-terminator, if there
	/// is one, or MBB->end() otherwise.			/// is one, or MBB->end() otherwise.
	static unsigned CountTerminators(MachineBasicBlock *MBB,			static unsigned CountTerminators(MachineBasicBlock *MBB,
	MachineBasicBlock::iterator &I) {			MachineBasicBlock::iterator &I) {
	I = MBB->end();			I = MBB->end();
	unsigned NumTerms = 0;			unsigned NumTerms = 0;
	for (;;) {			for (;;) {
	▲ Show 20 Lines • Show All 1,425 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp

Show All 35 Lines
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
		#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/CodeGen/TailDuplicator.h"		#include "llvm/CodeGen/TailDuplicator.h"
#include "llvm/Support/Allocator.h"		#include "llvm/Support/Allocator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetLowering.h"		#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
#include <algorithm>		#include <algorithm>
		#include <functional>
		#include <utility>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "block-placement"		#define DEBUG_TYPE "block-placement"

STATISTIC(NumCondBranches, "Number of conditional branches");		STATISTIC(NumCondBranches, "Number of conditional branches");
STATISTIC(NumUncondBranches, "Number of unconditional branches");		STATISTIC(NumUncondBranches, "Number of unconditional branches");
STATISTIC(CondBranchTakenFreq,		STATISTIC(CondBranchTakenFreq,
"Potential frequency of taking conditional branches");		"Potential frequency of taking conditional branches");
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines

static cl::opt<bool>		static cl::opt<bool>
BranchFoldPlacement("branch-fold-placement",		BranchFoldPlacement("branch-fold-placement",
cl::desc("Perform branch folding during placement. "		cl::desc("Perform branch folding during placement. "
"Reduces code size."),		"Reduces code size."),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

// Heuristic for tail duplication.		// Heuristic for tail duplication.
static cl::opt<unsigned> TailDuplicatePlacementThreshold(		static cl::opt<unsigned> TailDupPlacementThreshold(
"tail-dup-placement-threshold",		"tail-dup-placement-threshold",
cl::desc("Instruction cutoff for tail duplication during layout. "		cl::desc("Instruction cutoff for tail duplication during layout. "
"Tail merging during layout is forced to have a threshold "		"Tail merging during layout is forced to have a threshold "
"that won't conflict."), cl::init(2),		"that won't conflict."), cl::init(2),
cl::Hidden);		cl::Hidden);

		// Heuristic for tail duplication.
		static cl::opt<unsigned> TailDupPlacementPenalty(
		"tail-dup-placement-penalty",
		cl::desc("Cost penalty for blocks that can avoid breaking CFG by copying. "
		"Copying can increase fallthrough, but it also increases icache "
		"pressure. This parameter controls the penalty to account for that. "
		"Percent as integer."),
		cl::init(2),
		cl::Hidden);

extern cl::opt<unsigned> StaticLikelyProb;		extern cl::opt<unsigned> StaticLikelyProb;
extern cl::opt<unsigned> ProfileLikelyProb;		extern cl::opt<unsigned> ProfileLikelyProb;

#ifndef NDEBUG		#ifndef NDEBUG
extern cl::opt<GVDAGType> ViewBlockLayoutWithBFI;		extern cl::opt<GVDAGType> ViewBlockLayoutWithBFI;
extern cl::opt<std::string> ViewBlockFreqFuncName;		extern cl::opt<std::string> ViewBlockFreqFuncName;
#endif		#endif

▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines
};		};
}		}

namespace {		namespace {
class MachineBlockPlacement : public MachineFunctionPass {		class MachineBlockPlacement : public MachineFunctionPass {
/// \brief A typedef for a block filter set.		/// \brief A typedef for a block filter set.
typedef SmallSetVector<MachineBasicBlock *, 16> BlockFilterSet;		typedef SmallSetVector<MachineBasicBlock *, 16> BlockFilterSet;

		/// Pair struct containing basic block and taildup profitiability
		struct BlockAndTailDupResult {
		MachineBasicBlock * BB;
		bool ShouldTailDup;
		};

/// \brief work lists of blocks that are ready to be laid out		/// \brief work lists of blocks that are ready to be laid out
SmallVector<MachineBasicBlock *, 16> BlockWorkList;		SmallVector<MachineBasicBlock *, 16> BlockWorkList;
SmallVector<MachineBasicBlock *, 16> EHPadWorkList;		SmallVector<MachineBasicBlock *, 16> EHPadWorkList;

/// \brief Machine Function		/// \brief Machine Function
MachineFunction *F;		MachineFunction *F;

/// \brief A handle to the branch probability pass.		/// \brief A handle to the branch probability pass.
Show All 11 Lines	class MachineBlockPlacement : public MachineFunctionPass {
MachineBasicBlock *PreferredLoopExit;		MachineBasicBlock *PreferredLoopExit;

/// \brief A handle to the target's instruction info.		/// \brief A handle to the target's instruction info.
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;

/// \brief A handle to the target's lowering info.		/// \brief A handle to the target's lowering info.
const TargetLoweringBase *TLI;		const TargetLoweringBase *TLI;

/// \brief A handle to the post dominator tree.		/// \brief A handle to the dominator tree.
MachineDominatorTree *MDT;		MachineDominatorTree *MDT;

		/// \brief A handle to the post dominator tree.
		MachinePostDominatorTree *MPDT;

/// \brief Duplicator used to duplicate tails during placement.		/// \brief Duplicator used to duplicate tails during placement.
///		///
/// Placement decisions can open up new tail duplication opportunities, but		/// Placement decisions can open up new tail duplication opportunities, but
/// since tail duplication affects placement decisions of later blocks, it		/// since tail duplication affects placement decisions of later blocks, it
/// must be done inline.		/// must be done inline.
TailDuplicator TailDup;		TailDuplicator TailDup;

/// \brief A set of blocks that are unavoidably execute, i.e. they dominate		/// \brief A set of blocks that are unavoidably execute, i.e. they dominate
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	bool maybeTailDuplicateBlock(MachineBasicBlock *BB, MachineBasicBlock *LPred,
BlockFilterSet *BlockFilter,		BlockFilterSet *BlockFilter,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
bool &DuplicatedToPred);		bool &DuplicatedToPred);
bool		bool
hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ,		hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ,
BlockChain &SuccChain, BranchProbability SuccProb,		BlockChain &SuccChain, BranchProbability SuccProb,
BranchProbability RealSuccProb, BlockChain &Chain,		BranchProbability RealSuccProb, BlockChain &Chain,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);
MachineBasicBlock *selectBestSuccessor(MachineBasicBlock *BB,		BlockAndTailDupResult selectBestSuccessor(MachineBasicBlock *BB,
BlockChain &Chain,		BlockChain &Chain,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);
MachineBasicBlock *		MachineBasicBlock *
selectBestCandidateBlock(BlockChain &Chain,		selectBestCandidateBlock(BlockChain &Chain,
SmallVectorImpl<MachineBasicBlock *> &WorkList);		SmallVectorImpl<MachineBasicBlock *> &WorkList);
MachineBasicBlock *		MachineBasicBlock *
getFirstUnplacedBlock(const BlockChain &PlacedChain,		getFirstUnplacedBlock(const BlockChain &PlacedChain,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);

Show All 16 Lines	#endif
void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,		void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,
const BlockFilterSet &LoopBlockSet);		const BlockFilterSet &LoopBlockSet);
void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,		void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,
const BlockFilterSet &LoopBlockSet);		const BlockFilterSet &LoopBlockSet);
void collectMustExecuteBBs();		void collectMustExecuteBBs();
void buildCFGChains();		void buildCFGChains();
void optimizeBranches();		void optimizeBranches();
void alignBlocks();		void alignBlocks();
		bool shouldTailDuplicate(MachineBasicBlock *BB);
		/// Check the edge frequencies to see if tail duplication will increase
		/// fallthroughs.
		bool isProfitableToTailDup(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BranchProbability AdjustedSumProb,
		BlockChain &Chain, const BlockFilterSet *BlockFilter);
		/// Returns true if a block can tail duplicate into all unplaced
		/// predecessors. Filters based on loop.
		bool canTailDuplicateUnplacedPreds(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BlockChain &Chain, const BlockFilterSet *BlockFilter);

public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MachineBlockPlacement() : MachineFunctionPass(ID) {		MachineBlockPlacement() : MachineFunctionPass(ID) {
initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());		initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<MachineBranchProbabilityInfo>();		AU.addRequired<MachineBranchProbabilityInfo>();
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
		if (TailDupPlacement)
		AU.addRequired<MachinePostDominatorTree>();
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
AU.addRequired<TargetPassConfig>();		AU.addRequired<TargetPassConfig>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};
}		}

char MachineBlockPlacement::ID = 0;		char MachineBlockPlacement::ID = 0;
char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;		char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;
INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",
"Branch Probability Basic Block Placement", false, false)		"Branch Probability Basic Block Placement", false, false)
INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)
INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
		INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)		INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement",
"Branch Probability Basic Block Placement", false, false)		"Branch Probability Basic Block Placement", false, false)

#ifndef NDEBUG		#ifndef NDEBUG
/// \brief Helper to print the name of a MBB.		/// \brief Helper to print the name of a MBB.
///		///
/// Only used by debug logging.		/// Only used by debug logging.
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	getAdjustedProbability(BranchProbability OrigProb,
if (SuccProbN >= SuccProbD)		if (SuccProbN >= SuccProbD)
SuccProb = BranchProbability::getOne();		SuccProb = BranchProbability::getOne();
else		else
SuccProb = BranchProbability(SuccProbN, SuccProbD);		SuccProb = BranchProbability(SuccProbN, SuccProbD);

return SuccProb;		return SuccProb;
}		}

		/// Check if a block should be tail duplicated.
		/// \p BB Block to check.
		bool MachineBlockPlacement::shouldTailDuplicate(MachineBasicBlock *BB) {
		// Blocks with single successors don't create additional fallthrough
		// opportunities. Don't duplicate them. TODO: When conditional exits are
		// analyzable, allow them to be duplicated.
		bool IsSimple = TailDup.isSimpleBB(BB);

		if (BB->succ_size() == 1)
		return false;
		return TailDup.shouldTailDuplicate(IsSimple, *BB);
		}

		/// Compare 2 BlockFrequency's with a small penalty for \p A.
		/// In order to be conservative, we apply a X% penalty to account for
		/// increased icache pressure and static heuristics. For small frequencies
		/// we use only the numerators to improve accuracy. For simplicity, we assume the
		/// penalty is less than 100%
		/// TODO(iteratee): Use 64-bit fixed point edge frequencies everywhere.
		static bool greaterWithBias(BlockFrequency A, BlockFrequency B,
		uint64_t EntryFreq) {
		BranchProbability ThresholdProb(TailDupPlacementPenalty, 100);
		BlockFrequency Gain = A - B;
		return (Gain / ThresholdProb).getFrequency() >= EntryFreq;
		}

		/// Check the edge frequencies to see if tail duplication will increase
		/// fallthroughs. It only makes sense to call this function when
		/// \p Succ would not be chosen otherwise. Tail duplication of \p Succ is
		/// always locally profitable if we would have picked \p Succ without
		/// considering duplication.
		bool MachineBlockPlacement::isProfitableToTailDup(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BranchProbability QProb,
		BlockChain &Chain, const BlockFilterSet *BlockFilter) {
		// We need to do a probability calculation to make sure this is profitable.
		// First: does succ have a successor that post-dominates? This affects the
		// calculation. The 2 relevant cases are:
		//    BB         BB
		//    | \Qout    | \Qout
		//   P|  C       |P C
		//    =   C'     =   C'
		//    |  /Qin    |  /Qin
		//    | /        | /
		//    Succ       Succ
		//    / \        | \  V
		//  U/   =V      |U \
		//  /     \      =   D
		//  D      E     |  /
		//               | /
		//               |/
		//               PDom
		// '=' : Branch taken for that CFG edge
		// In the second case, Placing Succ while duplicating it into C prevents the
		// fallthrough of Succ into either D or PDom, because they now have C as an
		// unplaced predecessor

		// Start by figuring out which case we fall into
		MachineBasicBlock *PDom = nullptr;
		SmallVector<MachineBasicBlock *, 4> SuccSuccs;
		// Only scan the relevant successors
		auto AdjustedSuccSumProb =
		collectViableSuccessors(Succ, Chain, BlockFilter, SuccSuccs);
		BranchProbability PProb = MBPI->getEdgeProbability(BB, Succ);
		auto BBFreq = MBFI->getBlockFreq(BB);
		auto SuccFreq = MBFI->getBlockFreq(Succ);
		BlockFrequency P = BBFreq * PProb;
		BlockFrequency Qout = BBFreq * QProb;
		uint64_t EntryFreq = MBFI->getEntryFreq();
		// If there are no more successors, it is profitable to copy, as it strictly
		// increases fallthrough.
		if (SuccSuccs.size() == 0)
		return greaterWithBias(P, Qout, EntryFreq);

		auto BestSuccSucc = BranchProbability::getZero();
		// Find the PDom or the best Succ if no PDom exists.
		for (MachineBasicBlock *SuccSucc : SuccSuccs) {
		auto Prob = MBPI->getEdgeProbability(Succ, SuccSucc);
		if (Prob > BestSuccSucc)
		BestSuccSucc = Prob;
		if (PDom == nullptr)
		if (MPDT->dominates(SuccSucc, Succ)) {
		PDom = SuccSucc;
		break;
		}
		}
		// For the comparisons, we need to know Succ's best incoming edge that isn't
		// from BB.
		auto SuccBestPred = BlockFrequency(0);
		for (MachineBasicBlock *SuccPred : Succ->predecessors()) {
		if (SuccPred == Succ || SuccPred == BB
		|| BlockToChain[SuccPred] == &Chain
		|| (BlockFilter && !BlockFilter->count(SuccPred)))
		continue;
		auto Freq = MBFI->getBlockFreq(SuccPred)
		* MBPI->getEdgeProbability(SuccPred, Succ);
		if (Freq > SuccBestPred)
		SuccBestPred = Freq;
		}
		// Qin is Succ's best unplaced incoming edge that isn't BB
		BlockFrequency Qin = SuccBestPred;
		// If it doesn't have a post-dominating successor, here is the calculation:
		//    BB        BB
		//    | \Qout   |  \
		//   P|  C      |   =
		//    =   C'    |    C
		//    |  /Qin   |     |
		//    | /       |     C' (+Succ)
		//    Succ      Succ /|
		//    / \       |  \/ |
		//  U/   =V     =  /= =
		//  /     \     | /  \|
		//  D      E    D     E
		// '=' : Branch taken for that CFG edge
		// Cost in the first case is: P + V
		// For this calculation, we always assume P > Qout. If Qout > P
		// The result of this function will be ignored at the caller.
		// Cost in the second case is: Qout + Qin * V + P * U + P * V
		// TODO(iteratee): If we lay out D after Succ, the P * U term
		// goes away. This logic is coming in D28522.

		if (PDom == nullptr || !Succ->isSuccessor(PDom)) {
		BranchProbability UProb = BestSuccSucc;
		BranchProbability VProb = AdjustedSuccSumProb - UProb;
		BlockFrequency V = SuccFreq * VProb;
		BlockFrequency QinV = Qin * VProb;
		BlockFrequency BaseCost = P + V;
		BlockFrequency DupCost = Qout + QinV + P * AdjustedSuccSumProb;
		return greaterWithBias(BaseCost, DupCost, EntryFreq);
		}
		BranchProbability UProb = MBPI->getEdgeProbability(Succ, PDom);
		BranchProbability VProb = AdjustedSuccSumProb - UProb;
		BlockFrequency U = SuccFreq * UProb;
		BlockFrequency V = SuccFreq * VProb;
		// If there is a post-dominating successor, here is the calculation:
		//    BB             BB                 BB          BB
		//    | \Qout        |  \               | \Qout     |  \
		//    |P C           |   =              |P C        |   =
		//    =   C'         |P   C             =   C'      |P   C
		//    |  /Qin        |     |            |  /Qin     |     |
		//    | /            |     C' (+Succ)   | /         |     C' (+Succ)
		//    Succ           Succ /|            Succ        Succ /|
		//    | \  V         |   \/ |           | \  V      |   \/ |
		//    |U \           |U  /\ |           |U =        |U  /\ |
		//    =   D          =  =  =|           |   D       |  =  =|
		//    |  /           |/     D           |  /        |/     D
		//    | /            |     /            | =         |     /
		//    |/             |    /             |/          |    =
		//    Dom            Dom                Dom         Dom
		// '=' : Branch taken for that CFG edge
		// The cost for taken branches in the first case is P + U
		// The cost in the second case (assuming independence), given the layout:
		// BB, Succ, (C+Succ), D, Dom
		// is Qout + P * V + Qin * U
		// compare P + U vs Qout + P + Qin * U.
		//
		// The 3rd and 4th cases cover when Dom would be chosen to follow Succ.
		//
		// For the 3rd case, the cost is P + 2 * V
		// For the 4th case, the cost is Qout + Qin * U + P * V + V
		// We choose 4 over 3 when (P + V) > Qout + Qin * U + P * V
		if (UProb > AdjustedSuccSumProb / 2
		&& !hasBetterLayoutPredecessor(Succ, PDom, *BlockToChain[PDom],
		UProb, UProb, Chain, BlockFilter)) {
		// Cases 3 & 4
		return greaterWithBias((P + V), (Qout + Qin * UProb + P * VProb),
		EntryFreq);
		}
		// Cases 1 & 2
		return greaterWithBias(
		(P + U), (Qout + Qin * UProb + P * AdjustedSuccSumProb), EntryFreq);
		}


		/// When the option TailDupPlacement is on, this method checks if the
		/// fallthrough candidate block \p Succ (of block \p BB) can be tail-duplicated
		/// into all of its unplaced, unfiltered predecessors, that are not BB.
		bool MachineBlockPlacement::canTailDuplicateUnplacedPreds(
		MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &Chain,
		const BlockFilterSet *BlockFilter) {
		if (!shouldTailDuplicate(Succ))
		return false;

		for (MachineBasicBlock *Pred : Succ->predecessors()) {
		// Make sure all unplaced and unfiltered predecessors can be
		// tail-duplicated into.
		if (Pred == BB || (BlockFilter && !BlockFilter->count(Pred))
		|| BlockToChain[Pred] == &Chain)
		continue;
		if (!TailDup.canTailDuplicate(Succ, Pred))
		return false;
		}
		return true;
		}

/// When the option OutlineOptionalBranches is on, this method		/// When the option OutlineOptionalBranches is on, this method
/// checks if the fallthrough candidate block \p Succ (of block		/// checks if the fallthrough candidate block \p Succ (of block
/// \p BB) also has other unscheduled predecessor blocks which		/// \p BB) also has other unscheduled predecessor blocks which
/// are also successors of \p BB (forming triangular shape CFG).		/// are also successors of \p BB (forming triangular shape CFG).
/// If none of such predecessors are small, it returns true.		/// If none of such predecessors are small, it returns true.
/// The caller can choose to select \p Succ as the layout successors		/// The caller can choose to select \p Succ as the layout successors
/// so that \p Succ's predecessors (optional branches) can be		/// so that \p Succ's predecessors (optional branches) can be
/// outlined.		/// outlined.
[32 lines not shown]	static BranchProbability getLayoutSuccessorProbThreshold(
if (!BB->getParent()->getFunction()->getEntryCount())		if (!BB->getParent()->getFunction()->getEntryCount())
return BranchProbability(StaticLikelyProb, 100);		return BranchProbability(StaticLikelyProb, 100);
if (BB->succ_size() == 2) {		if (BB->succ_size() == 2) {
const MachineBasicBlock *Succ1 = *BB->succ_begin();		const MachineBasicBlock *Succ1 = *BB->succ_begin();
const MachineBasicBlock *Succ2 = *(BB->succ_begin() + 1);		const MachineBasicBlock *Succ2 = *(BB->succ_begin() + 1);
if (Succ1->isSuccessor(Succ2) || Succ2->isSuccessor(Succ1)) {		if (Succ1->isSuccessor(Succ2) || Succ2->isSuccessor(Succ1)) {
/* See case 1 below for the cost analysis. For BB->Succ to		/* See case 1 below for the cost analysis. For BB->Succ to
* be taken with smaller cost, the following needs to hold:		* be taken with smaller cost, the following needs to hold:
* Prob(BB->Succ) > 2* Prob(BB->Pred)		* Prob(BB->Succ) > 2 * Prob(BB->Pred)
* So the threshold T		* So the threshold T in the calculation below
* T = 2 * (1-Prob(BB->Pred). Since T + Prob(BB->Pred) == 1,		* (1-T) * Prob(BB->Succ) > T * Prob(BB->Pred)
* We have T + T/2 = 1, i.e. T = 2/3. Also adding user specified		* So T / (1 - T) = 2, Yielding T = 2/3
* branch bias, we have		* Also adding user specified branch bias, we have
* T = (2/3)*(ProfileLikelyProb/50)		* T = (2/3)*(ProfileLikelyProb/50)
* = (2*ProfileLikelyProb)/150)		* = (2*ProfileLikelyProb)/150)
*/		*/
return BranchProbability(2 * ProfileLikelyProb, 150);		return BranchProbability(2 * ProfileLikelyProb, 150);
}		}
}		}
return BranchProbability(ProfileLikelyProb, 100);		return BranchProbability(ProfileLikelyProb, 100);
}		}

/// Checks to see if the layout candidate block \p Succ has a better layout		/// Checks to see if the layout candidate block \p Succ has a better layout
/// predecessor than \c BB. If yes, returns true.		/// predecessor than \c BB. If yes, returns true.
		/// \p SuccProb: The probability adjusted for only remaining blocks.
		/// Only used for logging
		/// \p RealSuccProb: The un-adjusted probability.
		/// \p Chain: The chain that BB belongs to and Succ is being considered for.
		/// \p BlockFilter: if non-null, the set of blocks that make up the loop being
		/// considered
bool MachineBlockPlacement::hasBetterLayoutPredecessor(		bool MachineBlockPlacement::hasBetterLayoutPredecessor(
MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,		MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,
BranchProbability SuccProb, BranchProbability RealSuccProb,		BranchProbability SuccProb, BranchProbability RealSuccProb,
BlockChain &Chain, const BlockFilterSet *BlockFilter) {		BlockChain &Chain, const BlockFilterSet *BlockFilter) {

// There isn't a better layout when there are no unscheduled predecessors.		// There isn't a better layout when there are no unscheduled predecessors.
if (SuccChain.UnscheduledPredecessors == 0)		if (SuccChain.UnscheduledPredecessors == 0)
return false;		return false;
[115 lines not shown]	bool MachineBlockPlacement::hasBetterLayoutPredecessor(
// Make sure that a hot successor doesn't have a globally more		// Make sure that a hot successor doesn't have a globally more
// important predecessor.		// important predecessor.
BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;		BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;
bool BadCFGConflict = false;		bool BadCFGConflict = false;

for (MachineBasicBlock *Pred : Succ->predecessors()) {		for (MachineBasicBlock *Pred : Succ->predecessors()) {
if (Pred == Succ || BlockToChain[Pred] == &SuccChain ||		if (Pred == Succ || BlockToChain[Pred] == &SuccChain ||
(BlockFilter && !BlockFilter->count(Pred)) ||		(BlockFilter && !BlockFilter->count(Pred)) ||
BlockToChain[Pred] == &Chain)		BlockToChain[Pred] == &Chain ||
		// This check is redundant except for look ahead. This function is
		// called for lookahead by isProfitableToTailDup when BB hasn't been
		// placed yet.
		(Pred == BB))
continue;		continue;
// Do backward checking.		// Do backward checking.
// For all cases above, we need a backward checking to filter out edges that		// For all cases above, we need a backward checking to filter out edges that
// are not 'strongly' biased. With profile data available, the check is		// are not 'strongly' biased.
// mostly redundant for case 2 (when threshold prob is set at 50%) unless S
// has more than two successors.
// BB Pred		// BB Pred
// \ /		// \ /
// Succ		// Succ
// We select edge BB->Succ if		// We select edge BB->Succ if
// freq(BB->Succ) > freq(Succ) * HotProb		// freq(BB->Succ) > freq(Succ) * HotProb
// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *		// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *
// HotProb		// HotProb
// i.e. freq((BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb		// i.e. freq((BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb
[19 lines not shown]
/// \brief Select the best successor for a block.		/// \brief Select the best successor for a block.
///		///
/// This looks across all successors of a particular block and attempts to		/// This looks across all successors of a particular block and attempts to
/// select the "best" one to be the layout successor. It only considers direct		/// select the "best" one to be the layout successor. It only considers direct
/// successors which also pass the block filter. It will attempt to avoid		/// successors which also pass the block filter. It will attempt to avoid
/// breaking CFG structure, but cave and break such structures in the case of		/// breaking CFG structure, but cave and break such structures in the case of
/// very hot successor edges.		/// very hot successor edges.
///		///
/// \returns The best successor block found, or null if none are viable.		/// \returns The best successor block found, or null if none are viable, along
MachineBasicBlock *		/// with a boolean indicating if tail duplication is necessary.
		MachineBlockPlacement::BlockAndTailDupResult
MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB,		MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB,
BlockChain &Chain,		BlockChain &Chain,
const BlockFilterSet *BlockFilter) {		const BlockFilterSet *BlockFilter) {
const BranchProbability HotProb(StaticLikelyProb, 100);		const BranchProbability HotProb(StaticLikelyProb, 100);

MachineBasicBlock *BestSucc = nullptr;		BlockAndTailDupResult BestSucc = { nullptr, false };
auto BestProb = BranchProbability::getZero();		auto BestProb = BranchProbability::getZero();

SmallVector<MachineBasicBlock *, 4> Successors;		SmallVector<MachineBasicBlock *, 4> Successors;
auto AdjustedSumProb =		auto AdjustedSumProb =
collectViableSuccessors(BB, Chain, BlockFilter, Successors);		collectViableSuccessors(BB, Chain, BlockFilter, Successors);

DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n");		DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n");

		// For blocks with CFG violations, we may be able to lay them out anyway with
		// tail-duplication. We keep this vector so we can perform the probability
		// calculations the minimum number of times.
		SmallVector<std::tuple<BranchProbability, MachineBasicBlock *>, 4>
		DupCandidates;
for (MachineBasicBlock *Succ : Successors) {		for (MachineBasicBlock *Succ : Successors) {
auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);		auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);
BranchProbability SuccProb =		BranchProbability SuccProb =
getAdjustedProbability(RealSuccProb, AdjustedSumProb);		getAdjustedProbability(RealSuccProb, AdjustedSumProb);

// This heuristic is off by default.		// This heuristic is off by default.
if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb,		if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb,
HotProb))		HotProb)) {
return Succ;		BestSucc.BB = Succ;
		return BestSucc;
		}

BlockChain &SuccChain = *BlockToChain[Succ];		BlockChain &SuccChain = *BlockToChain[Succ];
// Skip the edge \c BB->Succ if block \c Succ has a better layout		// Skip the edge \c BB->Succ if block \c Succ has a better layout
// predecessor that yields lower global cost.		// predecessor that yields lower global cost.
if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,		if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,
Chain, BlockFilter))		Chain, BlockFilter)) {
		// If tail duplication would make Succ profitable, place it.
		if (TailDupPlacement && shouldTailDuplicate(Succ))
		DupCandidates.push_back(std::make_tuple(SuccProb, Succ));
continue;		continue;
		}

DEBUG(		DEBUG(
dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "		dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "
<< SuccProb		<< SuccProb
<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")		<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")
<< "\n");		<< "\n");

if (BestSucc && BestProb >= SuccProb) {		if (BestSucc.BB && BestProb >= SuccProb) {
DEBUG(dbgs() << " Not the best candidate, continuing\n");		DEBUG(dbgs() << " Not the best candidate, continuing\n");
continue;		continue;
}		}

DEBUG(dbgs() << " Setting it as best candidate\n");		DEBUG(dbgs() << " Setting it as best candidate\n");
BestSucc = Succ;		BestSucc.BB = Succ;
BestProb = SuccProb;		BestProb = SuccProb;
}		}
if (BestSucc)		// Handle the tail duplication candidates in order of decreasing probability.
DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc) << "\n");		// Stop at the first one that is profitable. Also stop if they are less
		// profitable than BestSucc. Position is important because we preserve it and
		// prefer first best match. Here we aren't comparing in order, so we capture
		// the position instead.
		if (DupCandidates.size() != 0) {
		auto cmp =
		[](const std::tuple<BranchProbability, MachineBasicBlock *> &a,
		const std::tuple<BranchProbability, MachineBasicBlock *> &b) {
		return std::get<0>(a) > std::get<0>(b);
		};
		std::stable_sort(DupCandidates.begin(), DupCandidates.end(), cmp);
		}
		for(auto &Tup : DupCandidates) {
		BranchProbability DupProb;
		MachineBasicBlock *Succ;
		std::tie(DupProb, Succ) = Tup;
		if (DupProb < BestProb)
		break;
		if (canTailDuplicateUnplacedPreds(BB, Succ, Chain, BlockFilter)
		// If tail duplication gives us fallthrough when we otherwise wouldn't
		// have it, that is a strict gain.
		&& (BestSucc.BB == nullptr
		\|\| isProfitableToTailDup(BB, Succ, BestProb, Chain,
		BlockFilter))) {
		DEBUG(
		dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "
		<< DupProb
		<< " (Tail Duplicate)\n");
		BestSucc.BB = Succ;
		BestSucc.ShouldTailDup = true;
		break;
		}
		}

		if (BestSucc.BB)
		DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB) << "\n");

return BestSucc;		return BestSucc;
}		}

/// \brief Select the best block from a worklist.		/// \brief Select the best block from a worklist.
///		///
/// This looks through the provided worklist as a list of candidate basic		/// This looks through the provided worklist as a list of candidate basic
/// blocks and select the most profitable one to place. The definition of		/// blocks and select the most profitable one to place. The definition of
[132 lines not shown]	void MachineBlockPlacement::buildChain(
for (;;) {		for (;;) {
assert(BB && "null block found at end of chain in loop.");		assert(BB && "null block found at end of chain in loop.");
assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop.");		assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop.");
assert(*std::prev(Chain.end()) == BB && "BB Not found at end of chain.");		assert(*std::prev(Chain.end()) == BB && "BB Not found at end of chain.");


// Look for the best viable successor if there is one to place immediately		// Look for the best viable successor if there is one to place immediately
// after this block.		// after this block.
MachineBasicBlock *BestSucc = selectBestSuccessor(BB, Chain, BlockFilter);		auto Result = selectBestSuccessor(BB, Chain, BlockFilter);
		MachineBasicBlock* BestSucc = Result.BB;
		bool ShouldTailDup = Result.ShouldTailDup;
		if (TailDupPlacement)
		ShouldTailDup |= (BestSucc && shouldTailDuplicate(BestSucc));

// If an immediate successor isn't available, look for the best viable		// If an immediate successor isn't available, look for the best viable
// block among those we've identified as not violating the loop's CFG at		// block among those we've identified as not violating the loop's CFG at
// this point. This won't be a fallthrough, but it will increase locality.		// this point. This won't be a fallthrough, but it will increase locality.
if (!BestSucc)		if (!BestSucc)
BestSucc = selectBestCandidateBlock(Chain, BlockWorkList);		BestSucc = selectBestCandidateBlock(Chain, BlockWorkList);
if (!BestSucc)		if (!BestSucc)
BestSucc = selectBestCandidateBlock(Chain, EHPadWorkList);		BestSucc = selectBestCandidateBlock(Chain, EHPadWorkList);

if (!BestSucc) {		if (!BestSucc) {
BestSucc = getFirstUnplacedBlock(Chain, PrevUnplacedBlockIt, BlockFilter);		BestSucc = getFirstUnplacedBlock(Chain, PrevUnplacedBlockIt, BlockFilter);
if (!BestSucc)		if (!BestSucc)
break;		break;

DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "		DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "
"layout successor until the CFG reduces\n");		"layout successor until the CFG reduces\n");
}		}

// Placement may have changed tail duplication opportunities.		// Placement may have changed tail duplication opportunities.
// Check for that now.		// Check for that now.
if (TailDupPlacement && BestSucc) {		if (TailDupPlacement && BestSucc && ShouldTailDup) {
// If the chosen successor was duplicated into all its predecessors,		// If the chosen successor was duplicated into all its predecessors,
// don't bother laying it out, just go round the loop again with BB as		// don't bother laying it out, just go round the loop again with BB as
// the chain end.		// the chain end.
if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,		if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,
BlockFilter, PrevUnplacedBlockIt))		BlockFilter, PrevUnplacedBlockIt))
continue;		continue;
}		}

[875 lines not shown]	bool MachineBlockPlacement::maybeTailDuplicateBlock(
MachineBasicBlock *BB, MachineBasicBlock *LPred,		MachineBasicBlock *BB, MachineBasicBlock *LPred,
const BlockChain &Chain, BlockFilterSet *BlockFilter,		const BlockChain &Chain, BlockFilterSet *BlockFilter,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
bool &DuplicatedToLPred) {		bool &DuplicatedToLPred) {

DuplicatedToLPred = false;		DuplicatedToLPred = false;
DEBUG(dbgs() << "Redoing tail duplication for Succ#"		DEBUG(dbgs() << "Redoing tail duplication for Succ#"
<< BB->getNumber() << "\n");		<< BB->getNumber() << "\n");
bool IsSimple = TailDup.isSimpleBB(BB);
// Blocks with single successors don't create additional fallthrough		if (!shouldTailDuplicate(BB))
// opportunities. Don't duplicate them. TODO: When conditional exits are
// analyzable, allow them to be duplicated.
if (!IsSimple && BB->succ_size() == 1)
return false;
if (!TailDup.shouldTailDuplicate(IsSimple, *BB))
return false;		return false;
// This has to be a callback because none of it can be done after		// This has to be a callback because none of it can be done after
// BB is deleted.		// BB is deleted.
bool Removed = false;		bool Removed = false;
auto RemovalCallback =		auto RemovalCallback =
[&](MachineBasicBlock *RemBB) {		[&](MachineBasicBlock *RemBB) {
// Signal to outer function		// Signal to outer function
Removed = true;		Removed = true;
[36 lines not shown]	auto RemovalCallback =

DEBUG(dbgs() << "TailDuplicator deleted block: "		DEBUG(dbgs() << "TailDuplicator deleted block: "
<< getBlockName(RemBB) << "\n");		<< getBlockName(RemBB) << "\n");
};		};
auto RemovalCallbackRef =		auto RemovalCallbackRef =
llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback);		llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback);

SmallVector<MachineBasicBlock *, 8> DuplicatedPreds;		SmallVector<MachineBasicBlock *, 8> DuplicatedPreds;
		bool IsSimple = TailDup.isSimpleBB(BB);
TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred,		TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred,
&DuplicatedPreds, &RemovalCallbackRef);		&DuplicatedPreds, &RemovalCallbackRef);

// Update UnscheduledPredecessors to reflect tail-duplication.		// Update UnscheduledPredecessors to reflect tail-duplication.
DuplicatedToLPred = false;		DuplicatedToLPred = false;
for (MachineBasicBlock *Pred : DuplicatedPreds) {		for (MachineBasicBlock *Pred : DuplicatedPreds) {
// We're only looking for unscheduled predecessors that match the filter.		// We're only looking for unscheduled predecessors that match the filter.
BlockChain* PredChain = BlockToChain[Pred];		BlockChain* PredChain = BlockToChain[Pred];
[24 lines not shown]	bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
F = &MF;		F = &MF;
MBPI = &getAnalysis<MachineBranchProbabilityInfo>();		MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(		MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
getAnalysis<MachineBlockFrequencyInfo>());		getAnalysis<MachineBlockFrequencyInfo>());
MLI = &getAnalysis<MachineLoopInfo>();		MLI = &getAnalysis<MachineLoopInfo>();
TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
TLI = MF.getSubtarget().getTargetLowering();		TLI = MF.getSubtarget().getTargetLowering();
MDT = &getAnalysis<MachineDominatorTree>();		MDT = &getAnalysis<MachineDominatorTree>();
		MPDT = nullptr;

// Initialize PreferredLoopExit to nullptr here since it may never be set if		// Initialize PreferredLoopExit to nullptr here since it may never be set if
// there are no MachineLoops.		// there are no MachineLoops.
PreferredLoopExit = nullptr;		PreferredLoopExit = nullptr;

if (TailDupPlacement) {		if (TailDupPlacement) {
unsigned TailDupSize = TailDuplicatePlacementThreshold;		MPDT = &getAnalysis<MachinePostDominatorTree>();
		unsigned TailDupSize = TailDupPlacementThreshold;
if (MF.getFunction()->optForSize())		if (MF.getFunction()->optForSize())
TailDupSize = 1;		TailDupSize = 1;
TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);		TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);
}		}

assert(BlockToChain.empty());		assert(BlockToChain.empty());

buildCFGChains();		buildCFGChains();

// Changing the layout can create new tail merging opportunities.		// Changing the layout can create new tail merging opportunities.
TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();		TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
// TailMerge can create jump into if branches that make CFG irreducible for		// TailMerge can create jump into if branches that make CFG irreducible for
// HW that requires structured CFG.		// HW that requires structured CFG.
bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() &&		bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() &&
PassConfig->getEnableTailMerge() &&		PassConfig->getEnableTailMerge() &&
BranchFoldPlacement;		BranchFoldPlacement;
// No tail merging opportunities if the block number is less than four.		// No tail merging opportunities if the block number is less than four.
if (MF.size() > 3 && EnableTailMerge) {		if (MF.size() > 3 && EnableTailMerge) {
unsigned TailMergeSize = TailDuplicatePlacementThreshold + 1;		unsigned TailMergeSize = TailDupPlacementThreshold + 1;
BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI,		BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI,
*MBPI, TailMergeSize);		*MBPI, TailMergeSize);

if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),		if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),
getAnalysisIfAvailable<MachineModuleInfo>(), MLI,		getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
/*AfterBlockPlacement=*/true)) {		/*AfterBlockPlacement=*/true)) {
// Redo the layout if tail merging creates/removes/moves blocks.		// Redo the layout if tail merging creates/removes/moves blocks.
BlockToChain.clear();		BlockToChain.clear();
// Must redo the dominator tree if blocks were changed.		// Must redo the dominator tree if blocks were changed.
MDT->runOnMachineFunction(MF);		MDT->runOnMachineFunction(MF);
		if (MPDT)
		MPDT->runOnMachineFunction(MF);
ChainAllocator.DestroyAll();		ChainAllocator.DestroyAll();
buildCFGChains();		buildCFGChains();
}		}
}		}

optimizeBranches();		optimizeBranches();
alignBlocks();		alignBlocks();

[98 lines not shown]
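
The bias test in greaterWithBias can be read as a plain integer inequality. The following is a minimal sketch, not the patch's code. It assumes raw 64-bit frequencies in place of LLVM's BlockFrequency and BranchProbability classes, and it assumes the subtraction saturates at zero. The name greaterWithBiasSketch and the PenaltyPercent parameter (standing in for the TailDupPlacementPenalty option) are hypothetical, and rounding may differ from LLVM's fixed-point arithmetic.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for greaterWithBias using raw integers.
// Returns true when A exceeds B by at least PenaltyPercent% of EntryFreq.
// Assumes 0 < PenaltyPercent <= 100.
bool greaterWithBiasSketch(uint64_t A, uint64_t B, uint64_t EntryFreq,
                           uint64_t PenaltyPercent) {
  // Saturating subtraction (assumed): no gain when A does not exceed B.
  uint64_t Gain = A > B ? A - B : 0;
  // Dividing by the probability PenaltyPercent/100 is the same as
  // multiplying by 100/PenaltyPercent; saturate instead of overflowing.
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  uint64_t Scaled = Gain > Max / 100 ? Max : Gain * 100 / PenaltyPercent;
  return Scaled >= EntryFreq;
}
```

Under these assumptions the test reduces to Gain >= EntryFreq * PenaltyPercent / 100: duplication has to win by a margin proportional to the function's entry frequency, not merely by any positive amount.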

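The 2/3 threshold in the revised getLayoutSuccessorProbThreshold comment can be written out step by step. This restates only what the comment says (it assumes the amsmath package for the align* environment); the subscript "biased" is a label introduced here for the final value.

```latex
\begin{align*}
(1 - T)\,\Pr(BB \to Succ) &> T\,\Pr(BB \to Pred) \\
\Pr(BB \to Succ) &> \frac{T}{1 - T}\,\Pr(BB \to Pred) \\
\frac{T}{1 - T} = 2 \;&\Longrightarrow\; T = \frac{2}{3} \\
T_{\text{biased}} &= \frac{2}{3}\cdot\frac{\text{ProfileLikelyProb}}{50}
                   = \frac{2\cdot\text{ProfileLikelyProb}}{150}
\end{align*}
```

The last line matches the returned value BranchProbability(2 * ProfileLikelyProb, 150); when ProfileLikelyProb is 50 the bias factor is 1 and the threshold is exactly 2/3.
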
llvm/trunk/test/CodeGen/AArch64/arm64-atomic.ll

	; RUN: llc < %s -mtriple=arm64-eabi -asm-verbose=false -verify-machineinstrs -mcpu=cyclone | FileCheck %s			; RUN: llc < %s -mtriple=arm64-eabi -asm-verbose=false -verify-machineinstrs -mcpu=cyclone | FileCheck %s

	define i32 @val_compare_and_swap(i32* %p, i32 %cmp, i32 %new) #0 {			define i32 @val_compare_and_swap(i32* %p, i32 %cmp, i32 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap:			; CHECK-LABEL: val_compare_and_swap:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i32 @val_compare_and_swap_from_load(i32* %p, i32 %cmp, i32* %pnew) #0 {			define i32 @val_compare_and_swap_from_load(i32* %p, i32 %cmp, i32* %pnew) #0 {
	; CHECK-LABEL: val_compare_and_swap_from_load:			; CHECK-LABEL: val_compare_and_swap_from_load:
	; CHECK-NEXT: ldr [[NEW:w[0-9]+]], [x2]			; CHECK-NEXT: ldr [[NEW:w[0-9]+]], [x2]
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x0]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x0]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], [[NEW]], [x0]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], [[NEW]], [x0]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: mov x0, x[[ADDR]]
				; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: mov x0, x[[ADDR]]
				; CHECK-NEXT: ret
	%new = load i32, i32* %pnew			%new = load i32, i32* %pnew
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i32 @val_compare_and_swap_rel(i32* %p, i32 %cmp, i32 %new) #0 {			define i32 @val_compare_and_swap_rel(i32* %p, i32 %cmp, i32 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap_rel:			; CHECK-LABEL: val_compare_and_swap_rel:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stlxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]			; CHECK-NEXT: stlxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acq_rel monotonic			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acq_rel monotonic
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i64 @val_compare_and_swap_64(i64* %p, i64 %cmp, i64 %new) #0 {			define i64 @val_compare_and_swap_64(i64* %p, i64 %cmp, i64 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap_64:			; CHECK-LABEL: val_compare_and_swap_64:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldxr [[RESULT:x[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldxr [[RESULT:x[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], x1			; CHECK-NEXT: cmp [[RESULT]], x1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], x2, [x[[ADDR]]]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], x2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i64* %p, i64 %cmp, i64 %new monotonic monotonic			%pair = cmpxchg i64* %p, i64 %cmp, i64 %new monotonic monotonic
	%val = extractvalue { i64, i1 } %pair, 0			%val = extractvalue { i64, i1 } %pair, 0
	ret i64 %val			ret i64 %val
	}			}

	define i32 @fetch_and_nand(i32* %p) #0 {			define i32 @fetch_and_nand(i32* %p) #0 {
	; CHECK-LABEL: fetch_and_nand:			; CHECK-LABEL: fetch_and_nand:
	; CHECK: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK: [[TRYBB:.?LBB[0-9_]+]]:

llvm/trunk/test/CodeGen/AArch64/arm64-shrink-wrapping.ll

	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]			; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]
	; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8			; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8
	; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]			; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]
	; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]			; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]
	; CHECK-NEXT: sub w1, w1, #1			; CHECK-NEXT: sub w1, w1, #1
	; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]			; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]
	; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]			; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]
	; DISABLE-NEXT: b [[IFEND_LABEL]]			; CHECK-NEXT: [[IFEND_LABEL]]:
	;
	; DISABLE: [[ELSE_LABEL]]: ; %if.else
	; DISABLE: lsl w0, w1, #1
	;
	; CHECK: [[IFEND_LABEL]]:
	; Epilogue code.			; Epilogue code.
	; CHECK: add sp, sp, #16			; CHECK: add sp, sp, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; ENABLE: [[ELSE_LABEL]]: ; %if.else			; CHECK: [[ELSE_LABEL]]: ; %if.else
	; ENABLE-NEXT: lsl w0, w1, #1			; CHECK-NEXT: lsl w0, w1, #1
	; ENABLE_NEXT: ret			; DISABLE-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
	define i32 @variadicFunc(i32 %cond, i32 %count, ...) #0 {			define i32 @variadicFunc(i32 %cond, i32 %count, ...) #0 {
	entry:			entry:
	%ap = alloca i8*, align 8			%ap = alloca i8*, align 8
	%tobool = icmp eq i32 %cond, 0			%tobool = icmp eq i32 %cond, 0
	br i1 %tobool, label %if.else, label %if.then			br i1 %tobool, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%ap1 = bitcast i8** %ap to i8*			%ap1 = bitcast i8** %ap to i8*

llvm/trunk/test/CodeGen/AArch64/tail-dup-repeat-worklist.ll

	; RUN: llc -O3 -o - -verify-machineinstrs %s | FileCheck %s
	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"

	%struct.s1 = type { %struct.s3*, %struct.s1* }
	%struct.s2 = type opaque
	%struct.s3 = type { i32 }

	; Function Attrs: nounwind
	define internal fastcc i32 @repeated_dup_worklist(%struct.s1** %pp1, %struct.s2* %p2, i32 %state, i1 %i1_1, i32 %i32_1) unnamed_addr #0 {
	entry:
	br label %while.cond.outer

	; The loop gets laid out:
	; %while.cond.outer
	; %(null)
	; %(null)
	; %dup2
	; and then %dup1 gets chosen as the next block.
	; When dup2 is duplicated into dup1, %worklist could erroneously be placed on
	; the worklist, because all of its current predecessors are now scheduled.
	; However, after dup2 is tail-duplicated, %worklist can't be on the worklist
	; because it now has unscheduled predecessors.
	; CHECK-LABEL: repeated_dup_worklist
	; CHECK: // %entry
	; CHECK: // %while.cond.outer
	; first %(null) block
	; CHECK: // in Loop:
	; CHECK: ldr
	; CHECK-NEXT: tbnz
	; second %(null) block
	; CHECK: // in Loop:
	; CHECK: // %dup2
	; CHECK: // %worklist
	; CHECK: // %if.then96.i
	while.cond.outer: ; preds = %dup1, %entry
	%progress.0.ph = phi i32 [ 0, %entry ], [ %progress.1, %dup1 ]
	%inc77 = add nsw i32 %progress.0.ph, 1
	%cmp = icmp slt i32 %progress.0.ph, %i32_1
	br i1 %cmp, label %dup2, label %dup1

	dup2: ; preds = %if.then96.i, %worklist, %while.cond.outer
	%progress.1.ph = phi i32 [ 0, %while.cond.outer ], [ %progress.1, %if.then96.i ], [ %progress.1, %worklist ]
	%.pr = load %struct.s1*, %struct.s1** %pp1, align 8
	br label %dup1

	dup1: ; preds = %dup2, %while.cond.outer
	%0 = phi %struct.s1* [ %.pr, %dup2 ], [ undef, %while.cond.outer ]
	%progress.1 = phi i32 [ %progress.1.ph, %dup2 ], [ %inc77, %while.cond.outer ]
	br i1 %i1_1, label %while.cond.outer, label %worklist

	worklist: ; preds = %dup1
	%snode94 = getelementptr inbounds %struct.s1, %struct.s1* %0, i64 0, i32 0
	%1 = load %struct.s3*, %struct.s3** %snode94, align 8
	%2 = getelementptr inbounds %struct.s3, %struct.s3* %1, i32 0, i32 0
	%3 = load i32, i32* %2, align 4
	%tobool95.i = icmp eq i32 %3, 0
	br i1 %tobool95.i, label %if.then96.i, label %dup2

	if.then96.i: ; preds = %worklist
	call fastcc void @free_s3(%struct.s2* %p2, %struct.s3* %1) #1
	br label %dup2
	}

	; Function Attrs: nounwind
	declare fastcc void @free_s3(%struct.s2*, %struct.s3*) unnamed_addr #0

	attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a57" "target-features"="+crc,+crypto,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { nounwind }
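
The comment in the test above describes an invariant: a block belongs on the placement worklist only while it has no unscheduled predecessors, and tail duplication can change a block's predecessor set. The following is a minimal sketch of that check on a toy CFG. It does not use LLVM's actual block placement data structures; `ToyCFG`, `readyForWorklist`, and `tailDuplicate` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy CFG (hypothetical): maps each block name to its predecessor names.
struct ToyCFG {
  std::map<std::string, std::set<std::string>> Preds;
};

// A block is ready for the worklist only if every predecessor is scheduled.
bool readyForWorklist(const ToyCFG &G, const std::set<std::string> &Scheduled,
                      const std::string &Block) {
  auto It = G.Preds.find(Block);
  if (It == G.Preds.end())
    return true; // No recorded predecessors.
  for (const std::string &P : It->second)
    if (!Scheduled.count(P))
      return false;
  return true;
}

// Model tail-duplicating Dup into Into: every block that had Dup as a
// predecessor now also has Into as a predecessor.
void tailDuplicate(ToyCFG &G, const std::string &Dup, const std::string &Into) {
  assert(Dup != Into && "cannot duplicate a block into itself");
  for (auto &Entry : G.Preds)
    if (Entry.second.count(Dup))
      Entry.second.insert(Into);
}
```

In this sketch, a readiness answer computed before `tailDuplicate` can be stale afterwards, so the check has to be repeated once the duplication has updated the predecessor sets.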

llvm/trunk/test/CodeGen/AArch64/tbz-tbnz.ll

	; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s

	declare void @t()			declare void @t()

	define void @test1(i32 %a) {			define void @test1(i32 %a) {
	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	entry:			entry:
	%sub = add nsw i32 %a, -12			%sub = add nsw i32 %a, -12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:w[0-9]+]], w0, #12			; CHECK: sub [[CMP:w[0-9]+]], w0, #12
	; CHECK: tbz [[CMP]], #31			; CHECK: tbnz [[CMP]], #31

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test2(i64 %a) {			define void @test2(i64 %a) {
	; CHECK-LABEL: @test2			; CHECK-LABEL: @test2
	entry:			entry:
	%sub = add nsw i64 %a, -12			%sub = add nsw i64 %a, -12
	%cmp = icmp slt i64 %sub, 0			%cmp = icmp slt i64 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:x[0-9]+]], x0, #12			; CHECK: sub [[CMP:x[0-9]+]], x0, #12
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	define void @test7(i32 %a) {			define void @test7(i32 %a) {
	; CHECK-LABEL: @test7			; CHECK-LABEL: @test7
	entry:			entry:
	%sub = sub nsw i32 %a, 12			%sub = sub nsw i32 %a, 12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:w[0-9]+]], w0, #12			; CHECK: sub [[CMP:w[0-9]+]], w0, #12
	; CHECK: tbz [[CMP]], #31			; CHECK: tbnz [[CMP]], #31

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	}			}

	define void @test9(i64 %val1) {			define void @test9(i64 %val1) {
	; CHECK-LABEL: @test9			; CHECK-LABEL: @test9
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test10(i64 %val1) {			define void @test10(i64 %val1) {
	; CHECK-LABEL: @test10			; CHECK-LABEL: @test10
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test11(i64 %val1, i64* %ptr) {			define void @test11(i64 %val1, i64* %ptr) {
	; CHECK-LABEL: @test11			; CHECK-LABEL: @test11

	; CHECK: ldr [[CMP:x[0-9]+]], [x1]			; CHECK: ldr [[CMP:x[0-9]+]], [x1]
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	%val = load i64, i64* %ptr			%val = load i64, i64* %ptr
	%tst = icmp slt i64 %val, 0			%tst = icmp slt i64 %val, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test12(i64 %val1) {			define void @test12(i64 %val1) {
	; CHECK-LABEL: @test12			; CHECK-LABEL: @test12
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test13(i64 %val1, i64 %val2) {			define void @test13(i64 %val1, i64 %val2) {
	; CHECK-LABEL: @test13			; CHECK-LABEL: @test13
	%or = or i64 %val1, %val2			%or = or i64 %val1, %val2
	%tst = icmp slt i64 %or, 0			%tst = icmp slt i64 %or, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK: orr [[CMP:x[0-9]+]], x0, x1			; CHECK: orr [[CMP:x[0-9]+]], x0, x1
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

llvm/trunk/test/CodeGen/AMDGPU/branch-relaxation.ll

	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND

	; GCN-NEXT: [[BB3]]: ; %bb3			; GCN-NEXT: [[BB3]]: ; %bb3
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @expand_requires_expand(i32 %cond0) #0 {			define void @expand_requires_expand(i32 %cond0) #0 {
	bb0:			bb0:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
	%cmp0 = icmp slt i32 %cond0, 0			%cmp0 = icmp slt i32 %cond0, 0
	br i1 %cmp0, label %bb2, label %bb1			br i1 %cmp0, label %bb2, label %bb1

	bb1:			bb1:
	%val = load volatile i32, i32 addrspace(2)* undef			%val = load volatile i32, i32 addrspace(2)* undef
	%cmp1 = icmp eq i32 %val, 3			%cmp1 = icmp eq i32 %val, 3
	br i1 %cmp1, label %bb3, label %bb2			br i1 %cmp1, label %bb3, label %bb2

	bb2:			bb2:
	call void asm sideeffect			call void asm sideeffect
	"v_nop_e64			"v_nop_e64
	v_nop_e64			v_nop_e64
	v_nop_e64			v_nop_e64
	v_nop_e64", ""() #0			v_nop_e64", ""() #0
	br label %bb3			br label %bb3

	bb3:			bb3:
				; These NOPs prevent tail-duplication-based outlining
				; from firing, which defeats the need to expand the branches and this test.
				call void asm sideeffect
				"v_nop_e64", ""() #0
				call void asm sideeffect
				"v_nop_e64", ""() #0
	ret void			ret void
	}			}

	; Requires expanding of required skip branch.			; Requires expanding of required skip branch.

	; GCN-LABEL: {{^}}uniform_inside_divergent:			; GCN-LABEL: {{^}}uniform_inside_divergent:
	; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}			; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
	; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc			; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
	; GCN: s_cbranch_scc1 [[ENDIF]]			; GCN: s_cbranch_scc1 [[ENDIF]]

	; GCN-NEXT: ; BB#2: ; %if_uniform			; GCN-NEXT: ; BB#2: ; %if_uniform
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)

	; GCN-NEXT: [[ENDIF]]: ; %endif			; GCN-NEXT: [[ENDIF]]: ; %endif
	; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]			; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]
				; GCN-NEXT: s_sleep 5
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) #0 {			define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%d_cmp = icmp ult i32 %tid, 16			%d_cmp = icmp ult i32 %tid, 16
	br i1 %d_cmp, label %if, label %endif			br i1 %d_cmp, label %if, label %endif

	if:			if:
	store i32 0, i32 addrspace(1)* %out			store i32 0, i32 addrspace(1)* %out
	%u_cmp = icmp eq i32 %cond, 0			%u_cmp = icmp eq i32 %cond, 0
	br i1 %u_cmp, label %if_uniform, label %endif			br i1 %u_cmp, label %if_uniform, label %endif

	if_uniform:			if_uniform:
	store i32 1, i32 addrspace(1)* %out			store i32 1, i32 addrspace(1)* %out
	br label %endif			br label %endif

	endif:			endif:
				; layout can remove the split branch if it can copy the return block.
				; This call makes the return block long enough that it doesn't get copied.
				call void @llvm.amdgcn.s.sleep(i32 5);
	ret void			ret void
	}			}

	; si_mask_branch			; si_mask_branch
	; s_cbranch_execz			; s_cbranch_execz
	; s_branch			; s_branch

	; GCN-LABEL: {{^}}analyze_mask_branch:			; GCN-LABEL: {{^}}analyze_mask_branch:

llvm/trunk/test/CodeGen/AMDGPU/uniform-cfg.ll

ENDIF: ; preds = %IF, %main_body
ret void		ret void
}		}

; GCN-LABEL: {{^}}icmp_users_different_blocks:		; GCN-LABEL: {{^}}icmp_users_different_blocks:
; GCN: s_load_dword [[COND:s[0-9]+]]		; GCN: s_load_dword [[COND:s[0-9]+]]
; GCN: s_cmp_lt_i32 [[COND]], 1		; GCN: s_cmp_lt_i32 [[COND]], 1
; GCN: s_cbranch_scc1 [[EXIT:[A-Za-z0-9_]+]]		; GCN: s_cbranch_scc1 [[EXIT:[A-Za-z0-9_]+]]
; GCN: v_cmp_gt_i32_e64 vcc, [[COND]], 0{{$}}		; GCN: v_cmp_gt_i32_e64 vcc, [[COND]], 0{{$}}
; GCN: s_cbranch_vccnz [[EXIT]]		; GCN: s_cbranch_vccz [[BODY:[A-Za-z0-9_]+]]
; GCN: buffer_store
; GCN: {{^}}[[EXIT]]:		; GCN: {{^}}[[EXIT]]:
; GCN: s_endpgm		; GCN: s_endpgm
		; GCN: {{^}}[[BODY]]:
		; GCN: buffer_store
		; GCN: s_endpgm
define void @icmp_users_different_blocks(i32 %cond0, i32 %cond1, i32 addrspace(1)* %out) {		define void @icmp_users_different_blocks(i32 %cond0, i32 %cond1, i32 addrspace(1)* %out) {
bb:		bb:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
%cmp0 = icmp sgt i32 %cond0, 0		%cmp0 = icmp sgt i32 %cond0, 0
%cmp1 = icmp sgt i32 %cond1, 0		%cmp1 = icmp sgt i32 %cond1, 0
br i1 %cmp0, label %bb2, label %bb9		br i1 %cmp0, label %bb2, label %bb9

bb2: ; preds = %bb		bb2: ; preds = %bb
}		}

; Test uniform and divergent.		; Test uniform and divergent.

; GCN-LABEL: {{^}}uniform_inside_divergent:		; GCN-LABEL: {{^}}uniform_inside_divergent:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: s_cbranch_execz [[ENDIF_LABEL:[0-9_A-Za-z]+]]
; GCN: s_cmp_lg_u32 {{s[0-9]+}}, 0		; GCN: s_cmp_lg_u32 {{s[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[ENDIF_LABEL]]		; GCN: s_cbranch_scc0 [[IF_UNIFORM_LABEL:[A-Z0-9_a-z]+]]
		; GCN: s_endpgm
		; GCN: {{^}}[[IF_UNIFORM_LABEL]]:
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) {		define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
%d_cmp = icmp ult i32 %tid, 16		%d_cmp = icmp ult i32 %tid, 16
br i1 %d_cmp, label %if, label %endif		br i1 %d_cmp, label %if, label %endif

if:		if:
store i32 0, i32 addrspace(1)* %out		store i32 0, i32 addrspace(1)* %out
%u_cmp = icmp eq i32 %cond, 0		%u_cmp = icmp eq i32 %cond, 0
br i1 %u_cmp, label %if_uniform, label %endif		br i1 %u_cmp, label %if_uniform, label %endif

if_uniform:		if_uniform:
store i32 1, i32 addrspace(1)* %out		store i32 1, i32 addrspace(1)* %out
br label %endif		br label %endif

endif:		endif:
ret void		ret void
}		}

; GCN-LABEL: {{^}}divergent_inside_uniform:		; GCN-LABEL: {{^}}divergent_inside_uniform:
; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0		; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[ENDIF_LABEL:[0-9_A-Za-z]+]]		; GCN: s_cbranch_scc0 [[IF_LABEL:[0-9_A-Za-z]+]]
		; GCN: [[IF_LABEL]]:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
; GCN: [[ENDIF_LABEL]]:
; GCN: s_endpgm
define void @divergent_inside_uniform(i32 addrspace(1)* %out, i32 %cond) {		define void @divergent_inside_uniform(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%u_cmp = icmp eq i32 %cond, 0		%u_cmp = icmp eq i32 %cond, 0
br i1 %u_cmp, label %if, label %endif		br i1 %u_cmp, label %if, label %endif

if:		if:
store i32 0, i32 addrspace(1)* %out		store i32 0, i32 addrspace(1)* %out
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
; GCN-LABEL: {{^}}divergent_if_uniform_if:		; GCN-LABEL: {{^}}divergent_if_uniform_if:
; GCN: v_cmp_eq_u32_e32 vcc, 0, v0		; GCN: v_cmp_eq_u32_e32 vcc, 0, v0
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
; GCN: s_or_b64 exec, exec, [[MASK]]		; GCN: s_or_b64 exec, exec, [[MASK]]
; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0		; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[EXIT:[A-Z0-9_]+]]		; GCN: s_cbranch_scc0 [[IF_UNIFORM:[A-Z0-9_]+]]
		; GCN: s_endpgm
		; GCN: [[IF_UNIFORM]]:
; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2		; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2
; GCN: buffer_store_dword [[TWO]]		; GCN: buffer_store_dword [[TWO]]
; GCN: [[EXIT]]:
; GCN: s_endpgm
define void @divergent_if_uniform_if(i32 addrspace(1)* %out, i32 %cond) {		define void @divergent_if_uniform_if(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
%d_cmp = icmp eq i32 %tid, 0		%d_cmp = icmp eq i32 %tid, 0
br i1 %d_cmp, label %if, label %endif		br i1 %d_cmp, label %if, label %endif

if:		if:
store i32 1, i32 addrspace(1)* %out		store i32 1, i32 addrspace(1)* %out

llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll

	; V8-LABEL: %tailrecurse.switch			; V8-LABEL: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: bne			; V8-NEXT: beq
	; V8-NEXT: b			; V8-NEXT: %sw.epilog
	; The trailing space in the last line checks that the branch is unconditional			; V8-NEXT: bx lr
	switch i32 %and, label %sw.epilog [			switch i32 %and, label %sw.epilog [
	i32 1, label %sw.bb			i32 1, label %sw.bb
	i32 3, label %sw.bb6			i32 3, label %sw.bb6
	i32 2, label %sw.bb8			i32 2, label %sw.bb8
	], !prof !1			], !prof !1

	sw.bb: ; preds = %tailrecurse.switch, %tailrecurse			sw.bb: ; preds = %tailrecurse.switch, %tailrecurse
	%shl = shl i32 %acc.tr, 1			%shl = shl i32 %acc.tr, 1

llvm/trunk/test/CodeGen/ARM/atomic-op.ll

	; CHECK-NOT: dmb ish			; CHECK-NOT: dmb ish
	; CHECK: [[LOOP_BB:\.?LBB[0-9]+_1]]:			; CHECK: [[LOOP_BB:\.?LBB[0-9]+_1]]:
	; CHECK: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]			; CHECK: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
	; CHECK: cmp [[OLDVAL]], r1			; CHECK: cmp [[OLDVAL]], r1
	; CHECK: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]			; CHECK: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]
	; CHECK: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]			; CHECK: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]
	; CHECK: cmp [[SUCCESS]], #0			; CHECK: cmp [[SUCCESS]], #0
	; CHECK: bne [[LOOP_BB]]			; CHECK: bne [[LOOP_BB]]
	; CHECK: b [[END_BB:\.?LBB[0-9]+_[0-9]+]]			; CHECK: dmb ish
				; CHECK: bx lr
	; CHECK: [[FAIL_BB]]:			; CHECK: [[FAIL_BB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[END_BB]]:
	; CHECK: dmb ish			; CHECK: dmb ish
	; CHECK: bx lr			; CHECK: bx lr

	ret i32 %oldval			ret i32 %oldval
	}			}

	define i32 @load_load_add_acquire(i32* %mem1, i32* %mem2) nounwind {			define i32 @load_load_add_acquire(i32* %mem1, i32* %mem2) nounwind {
	; CHECK-LABEL: load_load_add_acquire			; CHECK-LABEL: load_load_add_acquire

llvm/trunk/test/CodeGen/ARM/atomic-ops-v8.ll

	; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0			; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldaexb r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldaexb r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-ARM-NEXT: cmp r[[OLD]], r0			; CHECK-ARM-NEXT: cmp r[[OLD]], r0
	; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]			; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: strexb [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: strexb [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK-ARM: mov r0, r[[OLD]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: mov r0, r[[OLD]]			; CHECK-ARM: mov r0, r[[OLD]]
				; CHECK-ARM-NEXT: bx lr
	ret i8 %old			ret i8 %old
	}			}

	define i16 @test_atomic_cmpxchg_i16(i16 zeroext %wanted, i16 zeroext %new) nounwind {			define i16 @test_atomic_cmpxchg_i16(i16 zeroext %wanted, i16 zeroext %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i16:			; CHECK-LABEL: test_atomic_cmpxchg_i16:
	%pair = cmpxchg i16* @var16, i16 %wanted, i16 %new seq_cst seq_cst			%pair = cmpxchg i16* @var16, i16 %wanted, i16 %new seq_cst seq_cst
	%old = extractvalue { i16, i1 } %pair, 0			%old = extractvalue { i16, i1 } %pair, 0
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK-DAG: movw r[[ADDR:[0-9]+]], :lower16:var16			; CHECK-DAG: movw r[[ADDR:[0-9]+]], :lower16:var16
	; CHECK-DAG: movt r[[ADDR]], :upper16:var16			; CHECK-DAG: movt r[[ADDR]], :upper16:var16
	; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0			; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldaexh r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldaexh r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-ARM-NEXT: cmp r[[OLD]], r0			; CHECK-ARM-NEXT: cmp r[[OLD]], r0
	; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]			; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: stlexh [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: stlexh [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK-ARM: mov r0, r[[OLD]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: mov r0, r[[OLD]]			; CHECK-ARM: mov r0, r[[OLD]]
				; CHECK-ARM-NEXT: bx lr
	ret i16 %old			ret i16 %old
	}			}

	define void @test_atomic_cmpxchg_i32(i32 %wanted, i32 %new) nounwind {			define void @test_atomic_cmpxchg_i32(i32 %wanted, i32 %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i32:			; CHECK-LABEL: test_atomic_cmpxchg_i32:
	%pair = cmpxchg i32* @var32, i32 %wanted, i32 %new release monotonic			%pair = cmpxchg i32* @var32, i32 %wanted, i32 %new release monotonic
	%old = extractvalue { i32, i1 } %pair, 0			%old = extractvalue { i32, i1 } %pair, 0
	store i32 %old, i32* @var32			store i32 %old, i32* @var32
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var32			; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var32
	; CHECK: movt r[[ADDR]], :upper16:var32			; CHECK: movt r[[ADDR]], :upper16:var32

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldrex r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldrex r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-NEXT: cmp r[[OLD]], r0			; CHECK-NEXT: cmp r[[OLD]], r0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: stlex [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: stlex [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK: str{{(.w)?}} r[[OLD]],
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK-NEXT: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK: str{{(.w)?}} r[[OLD]],			; CHECK: str{{(.w)?}} r[[OLD]],
				; CHECK-ARM-NEXT: bx lr
	ret void			ret void
	}			}

	define void @test_atomic_cmpxchg_i64(i64 %wanted, i64 %new) nounwind {			define void @test_atomic_cmpxchg_i64(i64 %wanted, i64 %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i64:			; CHECK-LABEL: test_atomic_cmpxchg_i64:
	%pair = cmpxchg i64* @var64, i64 %wanted, i64 %new monotonic monotonic			%pair = cmpxchg i64* @var64, i64 %wanted, i64 %new monotonic monotonic
	%old = extractvalue { i64, i1 } %pair, 0			%old = extractvalue { i64, i1 } %pair, 0
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var64			; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var64
	; CHECK: movt r[[ADDR]], :upper16:var64			; CHECK: movt r[[ADDR]], :upper16:var64

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldrexd [[OLD1:r[0-9]+\|lr]], [[OLD2:r[0-9]+\|lr]], [r[[ADDR]]]			; CHECK: ldrexd [[OLD1:r[0-9]+\|lr]], [[OLD2:r[0-9]+\|lr]], [r[[ADDR]]]
	; r0, r1 below is a reasonable guess but could change: it certainly comes into the			; r0, r1 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0			; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0
	; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1			; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1
	; CHECK-ARM-LE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_LO]], [[MISMATCH_HI]]			; CHECK-ARM-LE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_LO]], [[MISMATCH_HI]]
	; CHECK-THUMB-LE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_HI]], [[MISMATCH_LO]]			; CHECK-THUMB-LE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_HI]], [[MISMATCH_LO]]
	; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1			; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+\|lr]], [[OLD2]], r1
	; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0			; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+\|lr]], [[OLD1]], r0
	; CHECK-ARM-BE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_HI]], [[MISMATCH_LO]]			; CHECK-ARM-BE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_HI]], [[MISMATCH_LO]]
	; CHECK-THUMB-BE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_LO]], [[MISMATCH_HI]]			; CHECK-THUMB-BE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_LO]], [[MISMATCH_HI]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r2, r3 is a reasonable guess.			; As above, r2, r3 is a reasonable guess.
	; CHECK: strexd [[STATUS:r[0-9]+]], r2, r3, [r[[ADDR]]]			; CHECK: strexd [[STATUS:r[0-9]+]], r2, r3, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK-NEXT: pop
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]			; CHECK-ARM: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]
	store i64 %old, i64* @var64			store i64 %old, i64* @var64
	ret void			ret void
	}			}


llvm/trunk/test/CodeGen/ARM/cmpxchg-weak.ll

	; RUN: llc < %s -mtriple=armv7-apple-ios -verify-machineinstrs \| FileCheck %s			; RUN: llc < %s -mtriple=armv7-apple-ios -verify-machineinstrs \| FileCheck %s

	define void @test_cmpxchg_weak(i32 *%addr, i32 %desired, i32 %new) {			define void @test_cmpxchg_weak(i32 *%addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: test_cmpxchg_weak:			; CHECK-LABEL: test_cmpxchg_weak:

	%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic			%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic
	%oldval = extractvalue { i32, i1 } %pair, 0			%oldval = extractvalue { i32, i1 } %pair, 0
	; CHECK-NEXT: BB#0:			; CHECK-NEXT: BB#0:
	; CHECK-NEXT: ldrex [[LOADED:r[0-9]+]], [r0]			; CHECK-NEXT: ldrex [[LOADED:r[0-9]+]], [r0]
	; CHECK-NEXT: cmp [[LOADED]], r1			; CHECK-NEXT: cmp [[LOADED]], r1
	; CHECK-NEXT: bne [[LDFAILBB:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: bne [[LDFAILBB:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: BB#1:			; CHECK-NEXT: BB#1:
	; CHECK-NEXT: dmb ish			; CHECK-NEXT: dmb ish
	; CHECK-NEXT: strex [[SUCCESS:r[0-9]+]], r2, [r0]			; CHECK-NEXT: strex [[SUCCESS:r[0-9]+]], r2, [r0]
	; CHECK-NEXT: cmp [[SUCCESS]], #0			; CHECK-NEXT: cmp [[SUCCESS]], #0
	; CHECK-NEXT: bne [[FAILBB:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: beq [[SUCCESSBB:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; CHECK-NEXT: dmb ish
	; CHECK-NEXT: str r3, [r0]			; CHECK-NEXT: str r3, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	; CHECK-NEXT: [[LDFAILBB]]:			; CHECK-NEXT: [[LDFAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: str r3, [r0]
				; CHECK-NEXT: bx lr
				; CHECK-NEXT: [[SUCCESSBB]]:
				; CHECK-NEXT: dmb ish
	; CHECK-NEXT: str r3, [r0]			; CHECK-NEXT: str r3, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr

	store i32 %oldval, i32* %addr			store i32 %oldval, i32* %addr
	ret void			ret void
	}			}



llvm/trunk/test/CodeGen/Mips/brconnez.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s \| FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s \| FileCheck %s -check-prefix=16

	@j = global i32 0, align 4			@j = global i32 0, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @j, align 4			%0 = load i32, i32* @j, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end, !prof !1

	; 16: bnez ${{[0-9]+}}, $[[LABEL:[0-9A-Ba-b_]+]]			; 16: bnez ${{[0-9]+}}, $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: lw ${{[0-9]+}}, %got(result)(${{[0-9]+}})			; 16: lw ${{[0-9]+}}, %got(result)(${{[0-9]+}})
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	ret void			ret void
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

llvm/trunk/test/CodeGen/Mips/micromips-compact-branches.ll

	; RUN: llc %s -march=mipsel -mattr=micromips -filetype=asm -O3 \			; RUN: llc %s -march=mipsel -mattr=micromips -filetype=asm -O3 \
	; RUN: -disable-mips-delay-filler -relocation-model=pic -o - \| FileCheck %s			; RUN: -disable-mips-delay-filler -relocation-model=pic -o - \| FileCheck %s

	define void @main() nounwind uwtable {			define void @main() nounwind uwtable {
	entry:			entry:
	%x = alloca i32, align 4			%x = alloca i32, align 4
	%0 = load i32, i32* %x, align 4			%0 = load i32, i32* %x, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end, !prof !1

	if.then:			if.then:
	store i32 10, i32* %x, align 4			store i32 10, i32* %x, align 4
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	; CHECK: bnezc			; CHECK: bnezc
				!1 = !{!"branch_weights", i32 2, i32 1}

llvm/trunk/test/CodeGen/PowerPC/misched-inorder-latency.ll

	; CHECK: addi			; CHECK: addi
	; CHECK: bne			; CHECK: bne
	; CHECK: %true			; CHECK: %true
	define i32 @testload(i32 *%ptr, i32 %sumin) {			define i32 @testload(i32 *%ptr, i32 %sumin) {
	entry:			entry:
	%sum1 = add i32 %sumin, 1			%sum1 = add i32 %sumin, 1
	%val1 = load i32, i32* %ptr			%val1 = load i32, i32* %ptr
	%p = icmp eq i32 %sumin, 0			%p = icmp eq i32 %sumin, 0
	br i1 %p, label %true, label %end			br i1 %p, label %true, label %end, !prof !1
	true:			true:
	%sum2 = add i32 %sum1, 1			%sum2 = add i32 %sum1, 1
	%ptr2 = getelementptr i32, i32* %ptr, i32 1			%ptr2 = getelementptr i32, i32* %ptr, i32 1
	%val = load i32, i32* %ptr2			%val = load i32, i32* %ptr2
	%val2 = add i32 %val1, %val			%val2 = add i32 %val1, %val
	br label %end			br label %end
	end:			end:
	%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]			%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]
	[...]
	true:			true:
	%val2 = add i32 %val1, 1			%val2 = add i32 %val1, 1
	br label %end			br label %end
	end:			end:
	%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]			%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]
	ret i32 %valmerge			ret i32 %valmerge
	}			}
	declare void @llvm.prefetch(i8*, i32, i32, i32) nounwind			declare void @llvm.prefetch(i8*, i32, i32, i32) nounwind

				!1 = !{!"branch_weights", i32 2, i32 1}

llvm/trunk/test/CodeGen/PowerPC/tail-dup-break-cfg.ll

				; RUN: llc -O2 -o - %s \| FileCheck %s
				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-grtev4-linux-gnu"

				; Intended layout:
				; The code for tail-duplication during layout will produce the layout:
				; test1
				; test2
				; body1 (with copy of test2)
				; body2
				; exit

				;CHECK-LABEL: tail_dup_break_cfg:
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 12, 1, [[BODY1LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: b [[BODY2LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: [[BODY1LABEL]]
				;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL]]
				;CHECK-NEXT: [[BODY2LABEL]]
				;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				define void @tail_dup_break_cfg(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %body1, !prof !1 ; %test2 more likely
				body1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %exit, label %body2, !prof !1 ; %exit more likely
				body2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %exit
				exit:
				ret void
				}

				; The branch weights here hint that we shouldn't tail duplicate in this case.
				;CHECK-LABEL: tail_dup_dont_break_cfg:
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 4, 1, [[TEST2LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %body1
				;CHECK: [[TEST2LABEL]]: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %body2
				;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				define void @tail_dup_dont_break_cfg(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %body1, !prof !1 ; %test2 more likely
				body1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp ne i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %body2, label %exit, !prof !1 ; %body2 more likely
				body2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %exit
				exit:
				ret void
				}
				declare void @a()
				declare void @b()
				declare void @c()
				declare void @d()

				; This function arranges for the successors of %succ to have already been laid
				; out. When we consider whether to lay out succ after bb and to tail-duplicate
				; it, v and ret have already been placed, so we tail-duplicate as it removes a
				; branch and strictly increases fallthrough
				; CHECK-LABEL: tail_dup_no_succ
				; CHECK: # %entry
				; CHECK: # %v
				; CHECK: # %ret
				; CHECK: # %bb
				; CHECK: # %succ
				; CHECK: # %c
				; CHECK: bl c
				; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
				; CHECK: beq
				; CHECK: b
				define void @tail_dup_no_succ(i32 %tag) {
				entry:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %v, label %bb, !prof !2 ; %v very much more likely
				bb:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %succ, label %c, !prof !3 ; %succ more likely
				c:
				call void @c()
				call void @c()
				br label %succ
				succ:
				%tagbit3 = and i32 %tag, 4
				%tagbit3eq0 = icmp eq i32 %tagbit3, 0
				br i1 %tagbit3eq0, label %ret, label %v, !prof !1 ; %ret more likely
				v:
				call void @d()
				call void @d()
				br label %ret
				ret:
				ret void
				}


				!1 = !{!"branch_weights", i32 5, i32 3}
				!2 = !{!"branch_weights", i32 95, i32 5}
				!3 = !{!"branch_weights", i32 7, i32 3}
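
As a rough illustration of how the `branch_weights` metadata in this test translates into the "more likely" comments, the sketch below converts the raw weights into probabilities and derives relative block frequencies for `@tail_dup_break_cfg`. It is not part of the patch and does not reproduce the cost model in MachineBlockPlacement.cpp; the helper names (`branch_probabilities`, `block_frequencies_break_cfg`) are hypothetical and only the weights and CFG edges are taken from the test above.

```python
from fractions import Fraction


def branch_probabilities(weights):
    """Turn raw branch_weights into exact per-successor probabilities."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


# Weights copied from the metadata nodes of tail-dup-break-cfg.ll.
PROF_1 = branch_probabilities([5, 3])   # !1
PROF_2 = branch_probabilities([95, 5])  # !2
PROF_3 = branch_probabilities([7, 3])   # !3


def block_frequencies_break_cfg():
    """Relative block frequencies for @tail_dup_break_cfg, with test1 = 1."""
    test1 = Fraction(1)
    to_test2, to_body1 = PROF_1          # test1: %test2 is the likely successor
    body1 = test1 * to_body1
    # test2 is reached directly from test1 and through body1's unconditional branch.
    test2 = test1 * to_test2 + body1
    to_exit, to_body2 = PROF_1           # test2: %exit is the likely successor
    body2 = test2 * to_body2
    exit_ = test2 * to_exit + body2
    return {"test1": test1, "body1": body1, "test2": test2,
            "body2": body2, "exit": exit_}


if __name__ == "__main__":
    for name, freq in block_frequencies_break_cfg().items():
        print(f"{name}: {freq}")
```

Under these assumptions `body1` and `body2` each run on 3/8 of executions while `test2` always runs, which is consistent with the test expecting `test2` to be placed directly after `test1` and a copy of it to be appended to `body1`.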

llvm/trunk/test/CodeGen/SPARC/sjlj.ll

	; CHECK: or %i1, %lo(.LBB1_2), %i1			; CHECK: or %i1, %lo(.LBB1_2), %i1
	; CHECK: st %i1, [%i0+4]			; CHECK: st %i1, [%i0+4]
	; CHECK: st %sp, [%i0+8]			; CHECK: st %sp, [%i0+8]
	; CHECK: bn .LBB1_2			; CHECK: bn .LBB1_2
	; CHECK: st %i7, [%i0+12]			; CHECK: st %i7, [%i0+12]
	; CHECK: ba .LBB1_1			; CHECK: ba .LBB1_1
	; CHECK: nop			; CHECK: nop
	; CHECK:.LBB1_1: ! %entry			; CHECK:.LBB1_1: ! %entry
	; CHECK: ba .LBB1_3
	; CHECK: mov %g0, %i0			; CHECK: mov %g0, %i0
				; CHECK: cmp %i0, 0
				; CHECK: bne .LBB1_4
				; CHECK: ba .LBB1_5
	; CHECK:.LBB1_2: ! Block address taken			; CHECK:.LBB1_2: ! Block address taken
	; CHECK: mov 1, %i0			; CHECK: mov 1, %i0
	; CHECK:.LBB1_3: ! %entry
	; CHECK: cmp %i0, 0
	; CHECK: be .LBB1_5			; CHECK: be .LBB1_5
	; CHECK: nop			; CHECK:.LBB1_4:
				; CHECK: ba .LBB1_6
	}			}
	declare i8* @llvm.frameaddress(i32) #2			declare i8* @llvm.frameaddress(i32) #2

	declare i8* @llvm.stacksave() #3			declare i8* @llvm.stacksave() #3

	declare i32 @llvm.eh.sjlj.setjmp(i8*) #3			declare i32 @llvm.eh.sjlj.setjmp(i8*) #3

	attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { noreturn nounwind }			attributes #1 = { noreturn nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

llvm/trunk/test/CodeGen/SystemZ/int-cmp-44.ll

	; CHECK-NEXT: brasl %r14, foo@PLT			; CHECK-NEXT: brasl %r14, foo@PLT
	; CHECK-NEXT: cijlh [[REG]], 0, .L{{.*}}			; CHECK-NEXT: cijlh [[REG]], 0, .L{{.*}}
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i32 , i32 *%ptr			%val = load i32 , i32 *%ptr
	%xor = xor i32 %val, 1			%xor = xor i32 %val, 1
	%add = add i32 %xor, 1000000			%add = add i32 %xor, 1000000
	call void @foo()			call void @foo()
	%cmp = icmp ne i32 %add, 0			%cmp = icmp eq i32 %add, 0
	br i1 %cmp, label %exit, label %store			br i1 %cmp, label %store, label %exit, !prof !1

	store:			store:
	store i32 %add, i32 *%ptr			store i32 %add, i32 *%ptr
	br label %exit			br label %exit

	exit:			exit:
	ret void			ret void
	}			}
	[...]

	store:			store:
	store i64 %res, i64 *%dest			store i64 %res, i64 *%dest
	br label %exit			br label %exit

	exit:			exit:
	ret i64 %res			ret i64 %res
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll

	; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumb-macho \			; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumb-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V4T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V4T
	; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumbv5-macho \			; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumbv5-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V5T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V5T
	; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumb-macho \			; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumb-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V4T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V4T
	; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumbv5-macho \			; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumbv5-macho \
	; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V5T			; RUN: \| FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V5T

	;			;
	; Note: Lots of tests use inline asm instead of regular calls.			; Note: Lots of tests use inline asm instead of regular calls.
	; This allows to have a better control on what the allocation will do.			; This allows to have a better control on what the allocation will do.
	; Otherwise, we may have spill right in the entry block, defeating			; Otherwise, we may have spill right in the entry block, defeating
	; shrink-wrapping. Moreover, some of the inline asm statements (nop)			; shrink-wrapping. Moreover, some of the inline asm statements (nop)
	; are here to ensure that the related paths do not end up as critical			; are here to ensure that the related paths do not end up as critical
	; edges.			; edges.
	; Also disable the late if-converter as it makes harder to reason on			; Also disable the late if-converter as it makes harder to reason on
	; the diffs.			; the diffs.
				; Disable tail-duplication during placement, as v4t vs v5t get different
				; results due to branches not being analyzable under v5

	; Initial motivating example: Simple diamond with a call just on one side.			; Initial motivating example: Simple diamond with a call just on one side.
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; No prologue needed.			; No prologue needed.
	; ENABLE: cmp r0, r1			; ENABLE: cmp r0, r1
	; ENABLE-NEXT: bge [[EXIT_LABEL:LBB[0-9_]+]]			; ENABLE-NEXT: bge [[EXIT_LABEL:LBB[0-9_]+]]

llvm/trunk/test/CodeGen/Thumb2/cbnz.ll

t:		t:
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
; CHECK: cbnz		; CHECK: cbz
%q = icmp eq i32 %y, 0		%q = icmp eq i32 %y, 0
br i1 %q, label %t2, label %f		br i1 %q, label %t2, label %f

t2:		t2:
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()

llvm/trunk/test/CodeGen/Thumb2/ifcvt-compare.ll

	; RUN: llc -mtriple=thumbv7-unknown-linux %s -o - \| FileCheck %s			; RUN: llc -mtriple=thumbv7-unknown-linux %s -o - \| FileCheck %s

	declare void @x()			declare void @x()

	define void @f0(i32 %x) optsize {			define void @f0(i32 %x) optsize {
	; CHECK-LABEL: f0:			; CHECK-LABEL: f0:
	; CHECK: cbnz			; CHECK: cbz
	%p = icmp eq i32 %x, 0			%p = icmp eq i32 %x, 0
	br i1 %p, label %t, label %f			br i1 %p, label %t, label %f

	t:			t:
	call void @x()			call void @x()
	br label %f			br label %f

	f:			f:

llvm/trunk/test/CodeGen/Thumb2/v8_IT_4.ll

	; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard \| FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -arm-restrict-it \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -arm-restrict-it \| FileCheck %s
	; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard -regalloc=basic \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard -regalloc=basic \| FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -regalloc=basic -arm-restrict-it \| FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -regalloc=basic -arm-restrict-it \| FileCheck %s

	%"struct.__gnu_cxx::__normal_iterator<char,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" = type { i8 }			%"struct.__gnu_cxx::__normal_iterator<char,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" = type { i8 }
	%"struct.__gnu_cxx::new_allocator<char>" = type <{ i8 }>			%"struct.__gnu_cxx::new_allocator<char>" = type <{ i8 }>
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >" = type { %"struct.__gnu_cxx::__normal_iterator<char*,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >" = type { %"struct.__gnu_cxx::__normal_iterator<char*,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" }
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep" = type { %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep" = type { %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" }
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" = type { i32, i32, i32 }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" = type { i32, i32, i32 }


	define weak arm_aapcs_vfpcc i32 @_ZNKSs7compareERKSs(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this, %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) {			define weak arm_aapcs_vfpcc i32 @_ZNKSs7compareERKSs(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this, %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) {
	; CHECK-LABEL: _ZNKSs7compareERKSs:			; CHECK-LABEL: _ZNKSs7compareERKSs:
	; CHECK: cbnz r0,			; CHECK: cbz r0,
				; CHECK-NEXT: %bb1
				; CHECK-NEXT: pop.w
	; CHECK-NEXT: %bb			; CHECK-NEXT: %bb
	; CHECK-NEXT: sub{{(.w)?}} r0, r{{[0-9]+}}, r{{[0-9]+}}			; CHECK-NEXT: sub{{(.w)?}} r0, r{{[0-9]+}}, r{{[0-9]+}}
	; CHECK-NEXT: %bb1
	; CHECK-NEXT: pop.w			; CHECK-NEXT: pop.w
	entry:			entry:
	%0 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i32> [#uses=3]			%0 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i32> [#uses=3]
	%1 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i32> [#uses=3]			%1 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i32> [#uses=3]
	%2 = icmp ult i32 %1, %0 ; <i1> [#uses=1]			%2 = icmp ult i32 %1, %0 ; <i1> [#uses=1]
	%3 = select i1 %2, i32 %1, i32 %0 ; <i32> [#uses=1]			%3 = select i1 %2, i32 %1, i32 %0 ; <i32> [#uses=1]
	%4 = tail call arm_aapcs_vfpcc i8* @_ZNKSs7_M_dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i8*> [#uses=1]			%4 = tail call arm_aapcs_vfpcc i8* @_ZNKSs7_M_dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i8*> [#uses=1]
	%5 = tail call arm_aapcs_vfpcc i8* @_ZNKSs4dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i8*> [#uses=1]			%5 = tail call arm_aapcs_vfpcc i8* @_ZNKSs4dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i8*> [#uses=1]

llvm/trunk/test/CodeGen/WebAssembly/phi.ll

	; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -verify-machineinstrs \| FileCheck %s			; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -verify-machineinstrs \| FileCheck %s

	; Test that phis are lowered.			; Test that phis are lowered.

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"			target triple = "wasm32-unknown-unknown"

	; Basic phi triangle.			; Basic phi triangle.

	; CHECK-LABEL: test0:			; CHECK-LABEL: test0:
	; CHECK: div_s $[[NUM0:[0-9]+]]=, $0, $pop[[NUM1:[0-9]+]]{{$}}			; CHECK: return $0
	; CHECK: return $[[NUM0]]{{$}}			; CHECK: div_s $push[[NUM0:[0-9]+]]=, $0, $pop[[NUM1:[0-9]+]]{{$}}
				; CHECK: return $pop[[NUM0]]{{$}}
	define i32 @test0(i32 %p) {			define i32 @test0(i32 %p) {
	entry:			entry:
	%t = icmp slt i32 %p, 0			%t = icmp slt i32 %p, 0
	br i1 %t, label %true, label %done			br i1 %t, label %true, label %done
	true:			true:
	%a = sdiv i32 %p, 3			%a = sdiv i32 %p, 3
	br label %done			br label %done
	done:			done:

llvm/trunk/test/CodeGen/X86/avx512-cmp.ll

	}			}

	define float @test5(float %p) #0 {			define float @test5(float %p) #0 {
	; ALL-LABEL: test5:			; ALL-LABEL: test5:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1			; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; ALL-NEXT: vucomiss %xmm1, %xmm0			; ALL-NEXT: vucomiss %xmm1, %xmm0
	; ALL-NEXT: jne LBB3_1			; ALL-NEXT: jne LBB3_1
	; ALL-NEXT: jnp LBB3_2			; ALL-NEXT: jp LBB3_1
				; ALL-NEXT: ## BB#2: ## %return
				; ALL-NEXT: retq
	; ALL-NEXT: LBB3_1: ## %if.end			; ALL-NEXT: LBB3_1: ## %if.end
	; ALL-NEXT: seta %al			; ALL-NEXT: seta %al
	; ALL-NEXT: movzbl %al, %eax			; ALL-NEXT: movzbl %al, %eax
	; ALL-NEXT: leaq {{.*}}(%rip), %rcx			; ALL-NEXT: leaq {{.*}}(%rip), %rcx
	; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; ALL-NEXT: LBB3_2: ## %return
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%cmp = fcmp oeq float %p, 0.000000e+00			%cmp = fcmp oeq float %p, 0.000000e+00
	br i1 %cmp, label %return, label %if.end			br i1 %cmp, label %return, label %if.end

	if.end: ; preds = %entry			if.end: ; preds = %entry
	%cmp1 = fcmp ogt float %p, 0.000000e+00			%cmp1 = fcmp ogt float %p, 0.000000e+00
	%cond = select i1 %cmp1, float 1.000000e+00, float -1.000000e+00			%cond = select i1 %cmp1, float 1.000000e+00, float -1.000000e+00

llvm/trunk/test/CodeGen/X86/bt.ll

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test2b(i32 %x, i32 %n) nounwind {			define void @test2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test2b:			; CHECK-LABEL: test2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB1_2			; CHECK-NEXT: jae .LBB1_1
	;			;
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	[...]
	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @atest2b(i32 %x, i32 %n) nounwind {			define void @atest2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: atest2b:			; CHECK-LABEL: atest2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB3_2			; CHECK-NEXT: jae .LBB3_1
	;			;
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test3(i32 %x, i32 %n) nounwind {			define void @test3(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB4_2			; CHECK-NEXT: jae .LBB4_1
	;			;
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test3b(i32 %x, i32 %n) nounwind {			define void @test3b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test3b:			; CHECK-LABEL: test3b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB5_2			; CHECK-NEXT: jae .LBB5_1
	;			;
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:

llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll

	; CHECK-NEXT: jp .LBB0_2			; CHECK-NEXT: jp .LBB0_2
	; CHECK-NEXT: # BB#1: # %bb1			; CHECK-NEXT: # BB#1: # %bb1
	; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0			; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0
	; CHECK-NEXT: .LBB0_2: # %bb2			; CHECK-NEXT: .LBB0_2: # %bb2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

	entry:			entry:
	%mul = fmul double %x, %y			%mul = fmul double %x, %y
	%cmp = fcmp une double %mul, 0.000000e+00			%cmp = fcmp oeq double %mul, 0.000000e+00
	br i1 %cmp, label %bb2, label %bb1			br i1 %cmp, label %bb1, label %bb2

	bb1:			bb1:
	%add = fadd double %mul, -1.000000e+00			%add = fadd double %mul, -1.000000e+00
	br label %bb2			br label %bb2

	bb2:			bb2:
	%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]			%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]
	ret double %phi			ret double %phi

llvm/trunk/test/CodeGen/X86/jump_sign.ll

	; RUN: llc < %s -march=x86 -mcpu=pentiumpro -verify-machineinstrs \| FileCheck %s			; RUN: llc < %s -march=x86 -mcpu=pentiumpro -verify-machineinstrs \| FileCheck %s

	define i32 @func_f(i32 %X) {			define i32 @func_f(i32 %X) {
	entry:			entry:
	; CHECK-LABEL: func_f:			; CHECK-LABEL: func_f:
	; CHECK: jns			; CHECK: jns
	%tmp1 = add i32 %X, 1 ; <i32> [#uses=1]			%tmp1 = add i32 %X, 1 ; <i32> [#uses=1]
	%tmp = icmp slt i32 %tmp1, 0 ; <i1> [#uses=1]			%tmp = icmp slt i32 %tmp1, 0 ; <i1> [#uses=1]
	br i1 %tmp, label %cond_true, label %cond_next			br i1 %tmp, label %cond_true, label %cond_next, !prof !1

	cond_true: ; preds = %entry			cond_true: ; preds = %entry
	%tmp2 = tail call i32 (...) @bar( ) ; <i32> [#uses=0]			%tmp2 = tail call i32 (...) @bar( ) ; <i32> [#uses=0]
	br label %cond_next			br label %cond_next

	cond_next: ; preds = %cond_true, %entry			cond_next: ; preds = %cond_true, %entry
	%tmp3 = tail call i32 (...) @baz( ) ; <i32> [#uses=0]			%tmp3 = tail call i32 (...) @baz( ) ; <i32> [#uses=0]
	ret i32 undef			ret i32 undef
	[280 lines hidden]
	if.then:			if.then:
	%dec = add nsw i32 %1, -1			%dec = add nsw i32 %1, -1
	store i32 %dec, i32* @a, align 4			store i32 %dec, i32* @a, align 4
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret i32 undef			ret i32 undef
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

Revision Contents

Diff 86517

llvm/trunk/lib/CodeGen/BranchFolding.h

llvm/trunk/lib/CodeGen/BranchFolding.cpp

llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp

llvm/trunk/test/CodeGen/AArch64/arm64-atomic.ll

llvm/trunk/test/CodeGen/AArch64/arm64-shrink-wrapping.ll

llvm/trunk/test/CodeGen/AArch64/tail-dup-repeat-worklist.ll

llvm/trunk/test/CodeGen/AArch64/tbz-tbnz.ll

llvm/trunk/test/CodeGen/AMDGPU/branch-relaxation.ll

llvm/trunk/test/CodeGen/AMDGPU/uniform-cfg.ll

llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll

llvm/trunk/test/CodeGen/ARM/atomic-op.ll

llvm/trunk/test/CodeGen/ARM/atomic-ops-v8.ll

llvm/trunk/test/CodeGen/ARM/cmpxchg-weak.ll

llvm/trunk/test/CodeGen/Mips/brconnez.ll

llvm/trunk/test/CodeGen/Mips/micromips-compact-branches.ll

llvm/trunk/test/CodeGen/PowerPC/misched-inorder-latency.ll

llvm/trunk/test/CodeGen/PowerPC/tail-dup-break-cfg.ll

llvm/trunk/test/CodeGen/SPARC/sjlj.ll

llvm/trunk/test/CodeGen/SystemZ/int-cmp-44.ll

llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll

llvm/trunk/test/CodeGen/Thumb2/cbnz.ll

llvm/trunk/test/CodeGen/Thumb2/ifcvt-compare.ll

llvm/trunk/test/CodeGen/Thumb2/v8_IT_4.ll

llvm/trunk/test/CodeGen/WebAssembly/phi.ll

llvm/trunk/test/CodeGen/X86/avx512-cmp.ll

llvm/trunk/test/CodeGen/X86/bt.ll

llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll

llvm/trunk/test/CodeGen/X86/jump_sign.ll
