This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Allow small copyable blocks to "break" the CFG.
ClosedPublic

Authored by iteratee on Jan 11 2017, 3:07 PM.


Details

Reviewers

davidxl
• tstellarAMD
arsenm
javed.absar

Commits

rGb15c06677c63: CodeGen: Allow small copyable blocks to "break" the CFG.
rL293716: CodeGen: Allow small copyable blocks to "break" the CFG.

Summary

When choosing the best successor for a block, we would ordinarily have preferred
a block that preserves the CFG unless there is a strong probability in the other
direction. For small blocks that can be duplicated, we now skip that requirement
as well, subject to some simple frequency calculations.

Diff Detail

Event Timeline

iteratee updated this revision to Diff 84034.Jan 11 2017, 3:07 PM

iteratee retitled this revision from to CodeGen: Allow small copyable blocks to "break" the CFG..

iteratee updated this object.

iteratee added a reviewer: davidxl.

iteratee set the repository for this revision to rL LLVM.

iteratee added subscribers: echristo, timshen, chandlerc, llvm-commits.

Herald added a reviewer: • tstellarAMD. Jan 11 2017, 3:07 PM

Herald added subscribers: nhaehnle, nemanjai, jyknight and 2 others.

iteratee updated this object.Jan 11 2017, 3:08 PM

iteratee edited edge metadata.

iteratee updated this revision to Diff 84036.Jan 11 2017, 3:15 PM

iteratee added a reviewer: arsenm.

iteratee removed rL LLVM as the repository for this revision.

Herald edited edge metadata. Jan 11 2017, 3:15 PM

Herald added a subscriber: wdng.

Realized that one of the calculations I did was only valid for D28522. Re-worked the calculation for now, and will rebase and update the calculation there.

Herald edited edge metadata. Jan 11 2017, 4:34 PM

junbuml added a subscriber: junbuml.Jan 12 2017, 7:11 AM

I like the direction (with more precise cost analysis) this is going. Will review the code soon.

iteratee mentioned this in D28522: Codegen: Make chains from trellis-shaped CFGs.Jan 12 2017, 12:03 PM

iteratee added a child revision: D28522: Codegen: Make chains from trellis-shaped CFGs.

davidxl added inline comments.Jan 13 2017, 10:04 AM

lib/CodeGen/MachineBlockPlacement.cpp
665	In this case, what is needed is to invoke 'hasBetterLayoutPredecessor' on the PDom block. Depending on the result, we will know whether, without tailDup, the layout order is Succ->PDom or Succ->D->PDom. This will make the cost computation more precise.

davidxl added inline comments.Jan 13 2017, 10:04 AM

lib/CodeGen/MachineBlockPlacement.cpp
420	Suggest new name : isProfitableToTailDup
623	Dom -> PDom
630	Why not just check if there exists a SuccSucc that post dominates Succ directly?
649	PU + PV == P. Also, assuming U > V, the layout order (with tail dup) should be BB --> Succ --> D, so the overall cost is Q + PV + Q (which is smaller than Q + QV + PU + PV).
650	We cannot assume BB and Succ are in a triangular-shaped sub-CFG here, given where this method is called. Besides, if Succ is not tail-duped, the layout decision may even reject Succ as the layout successor, so the cost is no longer P + V but 2*Q + V instead (with U > V). In other words, the isProfitable check cannot be done inside 'hasBetterLayoutPredecessor'; it should be hoisted to the caller, after 'hasBetterLayoutPredecessor' returns, at which point we will know the layout decision if taildup does not kick in.
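
To make the two cost expressions in the comment on line 649 concrete, here is a small sketch that evaluates them for given probabilities. The function names are hypothetical and not part of the patch; P, Q, U and V follow the review comment, with Q = 1 - P and V = 1 - U assumed, and everything normalized so that Freq(BB) == 1.

```cpp
// Hypothetical helpers, not taken from MachineBlockPlacement.cpp.
// P = Prob(BB->Succ), Q = Prob(BB->C) = 1 - P (assumed).
// U, V = probabilities of Succ's two outgoing edges, U + V == 1 (assumed).

// Cost quoted for the layout BB --> Succ --> D with tail duplication:
// Q + P*V + Q.
double costWithFallthroughToD(double P, double U) {
  double Q = 1.0 - P;
  double V = 1.0 - U;
  return Q + P * V + Q;
}

// The expression it is compared against in the comment:
// Q + Q*V + P*U + P*V.
double costWithoutFallthroughToD(double P, double U) {
  double Q = 1.0 - P;
  double V = 1.0 - U;
  return Q + Q * V + P * U + P * V;
}
```

Subtracting the first expression from the second leaves U * (P - Q), so under these assumptions the first cost is the smaller one exactly when P > Q. For example, with P = 0.6 and U = 0.7 the two values are 0.98 and 1.12.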

Updated the cost calculation to not rely on the lattice layout.
This resulted in fewer duplications in tests, so those test changes have been rolled into D28522.

Herald edited edge metadata. Jan 13 2017, 3:42 PM

I made the calculations in terms of frequency instead of probability.

I adjusted the cost calculation when there is a post dominator based on whether it will be laid out after Succ or not.

Let me know if there are any cost calculations that you think are wrong.

Herald added a reviewer: javed.absar. Jan 19 2017, 5:25 PM

Actually upload the diff with what I said was in the last one:
Use frequency instead of probability

Use slight lookahead for more precise probability calculations.

Let me know what you think. There is a small cleanup that could go in as a separate patch: I switched to a SmallDenseSet because we don't need the orderedness of the SmallVectorSet.

lib/CodeGen/MachineBlockPlacement.cpp
649	I thought that too. But without the lattice patch, after duplication, we won't put D after Succ because it now has an unplaced predecessor. The lattice patch fixes the behavior and the calculation.
650	I think I have the calculation right for when Succ would not be the layout successor, but you're right to point out that we should also do the calculation even when Succ is the chosen successor.
650	We now only call this function to check if we should use Succ despite it having been rejected. So we know that Succ is not the layout successor.

Tidied comments and spacing.

davidxl added inline comments.Jan 20 2017, 2:04 PM

lib/CodeGen/MachineBlockPlacement.cpp
268–274	There is a reason SmallSetVector is used here -- to make sure the iteration order is deterministic.
642	I assume this is loop back edge source block. You need a test case to cover it.
653	Why break here?
658	nit: -->SuccBestPred
666	Computing BestSuccPred here is unnecessary. See below for more comments.
675	Qin is not necessarily BestSuccPred. The profitability check is called only after hasBetterLayoutPredecessor has returned true. There are two scenarios in which it returns true: Qin or Qout is larger than P; or P is larger than Qout, but the branch is not biased enough, so the layout algorithm still decides to keep the top-order. Either way, the baseline layout to compare against (with taildup) is that BB->Succ is the branch-taken edge and BB->C is the fallthrough edge. Qin should just be Prob(BB->C).
691	PDom is always a successor of Succ according to the way it is computed.
697	The base cost is as written; the DupCost, however, depends on whether P > Q or not. If P > Q, the fallthrough path is BB->Succ->D, so the cost (normalized with freq(BB) == 1) is 2Q + PV. If P < Q, the fallthrough path is BB->C'->D and the cost is 2P + QV.
842	Add more description about what blocks to ignore.

I'll be glad to add some more comments to explain, but I think the calculations are correct. I've commented individually.

lib/CodeGen/MachineBlockPlacement.cpp
268–274	BlockFilterSet is never iterated. I checked.
650	I think I have the calculation right for when Succ would not be the layout successor, but you're right to point out that we should also do the calculation even when Succ is the chosen successor.
653	Because if PDom is not null, that's all that we look at for the probability calculation.
675	When we place Succ, we remove 2 fallthrough edges BB->C and C'->Succ. Freq(C'->Succ) may be larger than Freq(BB->C). I am using Qin to represent Freq(C'->Succ) and Qout for Freq(BB->C). I could just use different letters if that were more clear. Qout is Freq(BB->C). I don't think Qin should be as well.
691	Thanks.
697	This function is called in a loop looking for the highest probability successor. If Q > P, this function will be ignored and we will lay out Q anyway, so we can ignore the second case. As to the first case: Until the 2nd patch lands, the duplication will prevent the BB->Succ->D layout. Instead you will get BB->Succ ; C'->D So the cost is as calculated. D28522 will include an update to this calculation along with an update to the behavior.
842	Well, that's really up to the caller. Do you want me to list why you might want to ignore a block?

davidxl added inline comments.Jan 20 2017, 4:19 PM

lib/CodeGen/MachineBlockPlacement.cpp
268–274	See for (MachineBasicBlock *LoopBB : LoopBlockSet) fillWorkLists(LoopBB, UpdatedPreds, &LoopBlockSet);
675	differentiate Qin and Qout is fine, but in the code Qin = BestSuccPred which could be Freq(BB->Succ). What I meant is you should directly compute Qin as its definition Freq(C'->Succ)
697	You are right that in the Q > P case that scenario will be dropped. It is very subtle, so please add a comment to clarify. OK, for the first case, also add a comment.
842	something like : e.g, when called under xxx, we want to ignore yyy. See caller zzz for details. However, see my comment in the function, this parameter seems unnecessary.
976	I think it is equivalent to check Pred == BB. In normal calling context, this is covered by BlockToChain[Pred] == &Chain, but for lookahead case, it is needed to filter BB which is not laid out yet.

Improved comments based on review.

Please mark addressed comments as done. Also let me know if it is ready for another round of review (I saw some issues not addressed such as the deterministic iteration of block filter set).

Missed a comment to rename something.

In D28583#653869, @davidxl wrote:

Please mark addressed comments as done. Also let me know if it is ready for another round of review (I saw some issues not addressed such as the deterministic iteration of block filter set).

Marked.

I think it's ready, and I put back the deterministic set.

lib/CodeGen/MachineBlockPlacement.cpp
675	Did you still want me to fix something here?
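
The point about deterministic iteration can be illustrated with a minimal insertion-ordered set. This is a sketch only: `OrderedSet` is a hypothetical stand-in for the behavior the reviewer attributes to SmallSetVector, not LLVM's implementation, and it uses a linear lookup for brevity.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a set-vector: membership is unique, and iteration
// follows insertion order, so it never depends on pointer values or hashing.
template <typename T> class OrderedSet {
  std::vector<T> Order; // elements in the order they were first inserted

public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (count(V))
      return false;
    Order.push_back(V);
    return true;
  }

  // Linear membership test; fine for a sketch, too slow for large sets.
  bool count(const T &V) const {
    return std::find(Order.begin(), Order.end(), V) != Order.end();
  }

  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};
```

A hash-based set keyed on block pointers would visit blocks in an order that can change from run to run, which matters as soon as any caller iterates over the set, as the `fillWorkLists` loop quoted in the review does.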

davidxl added inline comments.Jan 23 2017, 2:39 PM

lib/CodeGen/MachineBlockPlacement.cpp
642	test case for this?
672	Add a short cut here with comments: // If P is not larger, the best successor selection loop will eventually select C, not Succ (as it is not profitable to do so). if (P <= Qout) return false;
675	just add a comment above Qin decl stating that Qin is the largest frequency of Succ's incoming edges which have not been placed.
977	--> ... for lookahead by isProfitableToTailDup when BB has not yet been placed.

More comments from review, and a new test case.

This version looks almost fine except for one remaining unaddressed comment.

lib/CodeGen/MachineBlockPlacement.cpp
672	How about this comment? Early return can 1) speed up the computation and 2) make the following code easier to understand.

iteratee added inline comments.Jan 23 2017, 8:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
642	It's not just a back edge. I added a test case.
672	If we weren't estimating Qout, I'd agree. Instead we'll skip calling this altogether if we know that we won't use the result.

Sorry. I'd replied to the comment, but Phabricator didn't submit it along with my diff update for some reason.

Save the blocks with CFG violations that are duplication candidates. Review them in descending order of probability, so we call isProfitableToTailDup the minimum number of times.

davidxl added inline comments.Jan 24 2017, 4:43 PM

lib/CodeGen/MachineBlockPlacement.cpp
1051	Remove the first 'SuccProb > BestProb' check -- it provides only very tiny compile time win depending on the iteration order, but adds more confusion.
1073	no need to set ShouldTailDup in the loop -- it is already initialized outside.
1082	Why not just stable sort it? The vector should be of size 1 for most of the cases. Also why do you need position ?
1092	Should it break instead?
1094	isProfitableToTailDup assumes the baseline layout does not pick Succ. That assumption may not hold here, as there are two other possibilities: (1) Succ == BestSucc.BB in the base layout; (2) BestSucc.BB == null in the base layout (all of BB's successors have conflicts). In those two cases, the isProfitable check should probably be skipped (as duplication is beneficial).

Changes from comments:
Just sort the vector instead of make_heap.
If there is a tail duplication opportunity and no other successor, take it.

lib/CodeGen/MachineBlockPlacement.cpp
1082	Will just sort the vector. Position is because we rely on the successor order being stable and the first successor being a subtle hint. Without the position, we lose track of whether the block in the vector came before or after the block we picked without tail duplication.
1094	Succ won't equal BestSucc.BB because of the continue. These blocks were not chosen by the first loop by construction. Good catch. I'll add that.
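
The candidate handling discussed above (sort the saved candidates, then test them from most to least probable) can be sketched as follows. All names here are hypothetical, probabilities are plain integers, and the stopping rule is an assumption made for illustration; it is not the code from the patch.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical candidate record: a block id and its branch probability
// expressed as an integer numerator over some common denominator.
struct DupCandidate {
  int BlockId;
  unsigned Prob;
};

// Returns the id of the first profitable candidate in descending probability
// order, or -1 if none qualifies. BestProb is the probability of the
// successor chosen without tail duplication (assumed to be the cutoff).
int pickDupCandidate(std::vector<DupCandidate> Cands, unsigned BestProb,
                     const std::function<bool(int)> &IsProfitable) {
  // stable_sort keeps the original successor order among equal probabilities.
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const DupCandidate &L, const DupCandidate &R) {
                     return L.Prob > R.Prob;
                   });
  for (const DupCandidate &C : Cands) {
    // Everything after this point is less probable than the current best,
    // so the (expensive) profitability check is never run for it.
    if (C.Prob < BestProb)
      break;
    if (IsProfitable(C.BlockId))
      return C.BlockId;
  }
  return -1;
}
```

Because the list is sorted, the profitability callback runs only until the first success or the first candidate below the cutoff, which is the "minimum number of times" property mentioned in the update. The stable sort stands in for the explicit position field: ties keep their original successor order.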

Per offline discussion, I removed the ordering constraint for blocks that are profitable to tail-duplicate.

This resulted in a lot of test churn, but the source change is relatively small.

This looks very clean now.

However, the amount of churn reminds me of one thing. Since the profit computation is based on static branch prediction (without PGO), it is the right thing to do to be a little more conservative in taildup. In other words, instead of making 'isProfitable' return true whenever the taildup cost is smaller than the baseline cost, add a predefined margin (controlled by a parameter):

if (baseline_cost - taildup_cost > threshold)
  return true;
return false;

The threshold also roughly models the side effect of taildup -- increased icache footprint etc due to code size increase.

Compare frequencies with a small bias against the tail-duplication side to account for increased icache pressure.

Includes a TODO to handle edge frequencies better in general.

davidxl added inline comments.Jan 30 2017, 4:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
149	perhaps simplify it to tail-dup-penalty ?
609	This basically treats the penalty percent parameter as the threshold of normalized improvement (A-B)/B: if ((A-B)/B > PenaltyPercent/100) return true; The problem with this formula is that if B is very hot, (A-B)/B becomes small even though (A-B) is still large. So I think it is better to compute the normalized improvement as (A-B)/Entry_Freq, i.e. the improvement relative to the entry frequency. This will help prevent tail dup from happening in very cold paths. The implementation can make use of BranchProbability as well. Suppose we want to implement the condition: if ((A-B)/Entry_Freq > P/100) return true; That takes these 3 lines: BlockFrequency Profit = A - B; BlockFrequency Threshold = Entry_Freq * BranchProbability(P, 100); return Profit > Threshold;

Use a percentage of the entry frequency as a cutoff.
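
A minimal sketch of the entry-frequency cutoff, using plain 64-bit integers as hypothetical stand-ins for BlockFrequency and BranchProbability. The function name and parameters are illustrative and are not the code that landed.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when the gain from tail duplication
// exceeds PenaltyPercent percent of the function's entry frequency.
// PenaltyPercent is assumed to be at most 100.
bool gainExceedsThreshold(uint64_t BaseCost, uint64_t DupCost,
                          uint64_t EntryFreq, unsigned PenaltyPercent) {
  // No gain at all; also avoids unsigned wraparound in the subtraction.
  if (DupCost >= BaseCost)
    return false;
  uint64_t Gain = BaseCost - DupCost;
  // EntryFreq * PenaltyPercent / 100, split into quotient and remainder so
  // the multiplication cannot overflow for large frequencies.
  uint64_t Threshold = (EntryFreq / 100) * PenaltyPercent +
                       (EntryFreq % 100) * PenaltyPercent / 100;
  return Gain > Threshold;
}
```

With an entry frequency of 1000 and a 2% penalty the threshold is 20, so a gain of 21 passes and a gain of 20 does not. Measuring against the entry frequency rather than against the duplicated cost keeps the test meaningful on hot paths, which is the concern raised in the comment on line 609.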

davidxl added inline comments.Jan 30 2017, 7:23 PM

lib/CodeGen/MachineBlockPlacement.cpp
154	Is this default value too low? Increase it to 5 or 10 perhaps?
614	I suppose this logic here is for rounding errors or overflow? Can you explain why the simple scaling with branch prob (in BranchProbability.cpp) does not work? return Gain > EntryFreq*ThresholdProb;

Simplify the biased comparison.

iteratee marked 4 inline comments as done.Jan 31 2017, 11:36 AM

iteratee added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
154	No, I think we should leave it. Now that it's a flag it's easy to change, and especially comparing with the entry frequency 2% is a big enough margin.
614	I did the math, and found a way to do it simply.

lgtm

(I only sampled some test case changes which look reasonable)

test/CodeGen/X86/bt.ll
27	This test has not changed in behavior. Better to revert the change.

This revision is now accepted and ready to land.Jan 31 2017, 1:45 PM

iteratee marked an inline comment as done.Jan 31 2017, 1:48 PM

iteratee added inline comments.

test/CodeGen/X86/bt.ll
27	I'll do a complete check for any tests that fall into this category and revert them.

Closed by commit rL293716: CodeGen: Allow small copyable blocks to "break" the CFG. (authored by iteratee). Jan 31 2017, 3:59 PM

This revision was automatically updated to reflect the committed changes.

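Revision Contents

Path	Size
lib/CodeGen/MachineBlockPlacement.cpp	310 lines
test/CodeGen/AArch64/addsub.ll	7 lines
test/CodeGen/AArch64/arm64-atomic.ll	22 lines
test/CodeGen/AArch64/arm64-ccmp.ll	8 lines
test/CodeGen/AArch64/arm64-shrink-wrapping.ll	14 lines
test/CodeGen/AArch64/compare-branch.ll	2 lines
test/CodeGen/AArch64/logical_shifted_reg.ll	2 lines
test/CodeGen/AArch64/tail-dup-repeat-worklist.ll	(size not listed)
test/CodeGen/AArch64/tbz-tbnz.ll	18 lines
test/CodeGen/AMDGPU/branch-relaxation.ll	16 lines
test/CodeGen/AMDGPU/si-annotate-cf-noloop.ll	5 lines
test/CodeGen/AMDGPU/uniform-cfg.ll	32 lines
test/CodeGen/ARM/arm-and-tst-peephole.ll	6 lines
test/CodeGen/ARM/(file name missing)	4 lines
test/CodeGen/ARM/(file name missing)	35 lines
test/CodeGen/ARM/(file name missing)	8 lines
test/CodeGen/ARM/(file name missing)	2 lines
test/CodeGen/Mips/(file name missing)	4 lines
test/CodeGen/Mips/(file name missing)	4 lines
test/CodeGen/Mips/(file name missing)	4 lines
test/CodeGen/Mips/(file name missing)	4 lines
test/CodeGen/Mips/(file name missing)	4 lines
test/CodeGen/Mips/llvm-ir/ashr.ll	14 lines
test/CodeGen/Mips/micromips-compact-branches.ll	3 lines
test/CodeGen/PowerPC/misched-inorder-latency.ll	4 lines
test/CodeGen/PowerPC/tail-dup-break-cfg.ll	97 lines
test/CodeGen/PowerPC/tail-dup-layout.ll	86 lines
test/CodeGen/SPARC/sjlj.ll	9 lines
test/CodeGen/SystemZ/(file name missing)	4 lines
test/CodeGen/SystemZ/(file name missing)	10 lines
test/CodeGen/SystemZ/(file name missing)	10 lines
test/CodeGen/SystemZ/(file name missing)	10 lines
test/CodeGen/SystemZ/(file name missing)	10 lines
test/CodeGen/SystemZ/(file name missing)	5 lines
test/CodeGen/SystemZ/(file name missing)	5 lines
test/CodeGen/SystemZ/(file name missing)	24 lines
test/CodeGen/SystemZ/(file name missing)	20 lines
test/CodeGen/SystemZ/(file name missing)	6 lines
test/CodeGen/SystemZ/(file name missing)	26 lines
test/CodeGen/SystemZ/(file name missing)	14 lines
test/CodeGen/Thumb/thumb-shrink-wrapping.ll	11 lines
test/CodeGen/Thumb2/cbnz.ll	2 lines
test/CodeGen/Thumb2/ifcvt-compare.ll	2 lines
test/CodeGen/Thumb2/v8_IT_4.ll	5 lines
test/CodeGen/WebAssembly/phi.ll	5 lines
test/CodeGen/X86/2008-11-29-ULT-Sign.ll	4 lines
test/CodeGen/X86/(file name missing)	6 lines
test/CodeGen/X86/(file name missing)	7 lines
test/CodeGen/X86/(file name missing)	5 lines
test/CodeGen/X86/(file name missing)	10 lines
test/CodeGen/X86/critical-edge-split-2.ll	5 lines
test/CodeGen/X86/(file name missing)	4 lines
test/CodeGen/X86/(file name missing)	4 lines
test/CodeGen/X86/(file name missing)	4 lines
test/CodeGen/X86/(file name missing)	24 lines
test/CodeGen/X86/(file name missing)	3 lines
test/CodeGen/X86/sse-scalar-fp-arith.ll	16 lines
test/CodeGen/X86/testb-je-fusion.ll	3 lines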

Diff 85073

lib/CodeGen/MachineBlockPlacement.cpp

Show All 24 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "BranchFolding.h"		#include "BranchFolding.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
		#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/CodeGen/TailDuplicator.h"		#include "llvm/CodeGen/TailDuplicator.h"
#include "llvm/Support/Allocator.h"		#include "llvm/Support/Allocator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetLowering.h"		#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	static cl::opt<unsigned> TailDuplicatePlacementThreshold(
"tail-dup-placement-threshold",		"tail-dup-placement-threshold",
cl::desc("Instruction cutoff for tail duplication during layout. "		cl::desc("Instruction cutoff for tail duplication during layout. "
"Tail merging during layout is forced to have a threshold "		"Tail merging during layout is forced to have a threshold "
"that won't conflict."), cl::init(2),		"that won't conflict."), cl::init(2),
cl::Hidden);		cl::Hidden);

extern cl::opt<unsigned> StaticLikelyProb;		extern cl::opt<unsigned> StaticLikelyProb;
extern cl::opt<unsigned> ProfileLikelyProb;		extern cl::opt<unsigned> ProfileLikelyProb;

namespace {		namespace {
class BlockChain;		class BlockChain;
/// \brief Type for our function-wide basic block -> block chain mapping.		/// \brief Type for our function-wide basic block -> block chain mapping.
typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;		typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;
}		}

namespace {		namespace {
/// \brief A chain of blocks which will be laid out contiguously.		/// \brief A chain of blocks which will be laid out contiguously.
///		///
/// This is the datastructure representing a chain of consecutive blocks that		/// This is the datastructure representing a chain of consecutive blocks that
/// are profitable to layout together in order to maximize fallthrough		/// are profitable to layout together in order to maximize fallthrough
/// probabilities and code locality. We also can use a block chain to represent		/// probabilities and code locality. We also can use a block chain to represent
/// a sequence of basic blocks which have some external (correctness)		/// a sequence of basic blocks which have some external (correctness)
▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	#endif // NDEBUG
/// and then once for the function as a whole.		/// and then once for the function as a whole.
unsigned UnscheduledPredecessors;		unsigned UnscheduledPredecessors;
};		};
}		}

namespace {		namespace {
class MachineBlockPlacement : public MachineFunctionPass {		class MachineBlockPlacement : public MachineFunctionPass {
/// \brief A typedef for a block filter set.		/// \brief A typedef for a block filter set.
typedef SmallSetVector<MachineBasicBlock *, 16> BlockFilterSet;		typedef SmallDenseSet<MachineBasicBlock *, 16> BlockFilterSet;

		/// Pair struct containing basic block and taildup profitiability
		struct BlockAndTailDupResult {
		MachineBasicBlock * BB;
		bool ShouldTailDup;
		};

/// \brief work lists of blocks that are ready to be laid out		/// \brief work lists of blocks that are ready to be laid out
SmallVector<MachineBasicBlock *, 16> BlockWorkList;		SmallVector<MachineBasicBlock *, 16> BlockWorkList;
SmallVector<MachineBasicBlock *, 16> EHPadWorkList;		SmallVector<MachineBasicBlock *, 16> EHPadWorkList;

/// \brief Machine Function		/// \brief Machine Function
MachineFunction *F;		MachineFunction *F;

Show All 12 Lines	class MachineBlockPlacement : public MachineFunctionPass {
MachineBasicBlock *PreferredLoopExit;		MachineBasicBlock *PreferredLoopExit;

/// \brief A handle to the target's instruction info.		/// \brief A handle to the target's instruction info.
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;

/// \brief A handle to the target's lowering info.		/// \brief A handle to the target's lowering info.
const TargetLoweringBase *TLI;		const TargetLoweringBase *TLI;

/// \brief A handle to the post dominator tree.		/// \brief A handle to the dominator tree.
MachineDominatorTree *MDT;		MachineDominatorTree *MDT;

		/// \brief A handle to the post dominator tree.
		MachinePostDominatorTree *MPDT;

/// \brief Duplicator used to duplicate tails during placement.		/// \brief Duplicator used to duplicate tails during placement.
///		///
/// Placement decisions can open up new tail duplication opportunities, but		/// Placement decisions can open up new tail duplication opportunities, but
/// since tail duplication affects placement decisions of later blocks, it		/// since tail duplication affects placement decisions of later blocks, it
/// must be done inline.		/// must be done inline.
TailDuplicator TailDup;		TailDuplicator TailDup;

/// \brief A set of blocks that are unavoidably execute, i.e. they dominate		/// \brief A set of blocks that are unavoidably execute, i.e. they dominate
▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	bool maybeTailDuplicateBlock(MachineBasicBlock *BB, MachineBasicBlock *LPred,
const BlockChain &Chain,		const BlockChain &Chain,
BlockFilterSet *BlockFilter,		BlockFilterSet *BlockFilter,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
bool &DuplicatedToPred);		bool &DuplicatedToPred);
bool		bool
hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ,		hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ,
BlockChain &SuccChain, BranchProbability SuccProb,		BlockChain &SuccChain, BranchProbability SuccProb,
BranchProbability RealSuccProb, BlockChain &Chain,		BranchProbability RealSuccProb, BlockChain &Chain,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter,
MachineBasicBlock *selectBestSuccessor(MachineBasicBlock *BB,		const BlockFilterSet *LookAhead);
		BlockAndTailDupResult selectBestSuccessor(MachineBasicBlock *BB,
BlockChain &Chain,		BlockChain &Chain,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);
MachineBasicBlock *		MachineBasicBlock *
selectBestCandidateBlock(BlockChain &Chain,		selectBestCandidateBlock(BlockChain &Chain,
SmallVectorImpl<MachineBasicBlock *> &WorkList);		SmallVectorImpl<MachineBasicBlock *> &WorkList);
MachineBasicBlock *		MachineBasicBlock *
getFirstUnplacedBlock(const BlockChain &PlacedChain,		getFirstUnplacedBlock(const BlockChain &PlacedChain,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
const BlockFilterSet *BlockFilter);		const BlockFilterSet *BlockFilter);

Show All 16 Lines	#endif
void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,		void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,
const BlockFilterSet &LoopBlockSet);		const BlockFilterSet &LoopBlockSet);
void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,		void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,
const BlockFilterSet &LoopBlockSet);		const BlockFilterSet &LoopBlockSet);
void collectMustExecuteBBs();		void collectMustExecuteBBs();
void buildCFGChains();		void buildCFGChains();
void optimizeBranches();		void optimizeBranches();
void alignBlocks();		void alignBlocks();
		bool shouldTailDuplicate(MachineBasicBlock *BB);
		/// Check the edge frequencies to see if tail duplication will increase
		/// fallthroughs.
		bool isProfitableToTailDup(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BranchProbability AdjustedSumProb,
		BlockChain &Chain, const BlockFilterSet *BlockFilter);
		/// Returns true if a block can tail duplicate into all unplaced
		/// predecessors. Filters based on loop.
		bool canTailDuplicateUnplacedPreds(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BlockChain &Chain, const BlockFilterSet *BlockFilter);

public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MachineBlockPlacement() : MachineFunctionPass(ID) {		MachineBlockPlacement() : MachineFunctionPass(ID) {
initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());		initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<MachineBranchProbabilityInfo>();		AU.addRequired<MachineBranchProbabilityInfo>();
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
		if (TailDupPlacement)
		AU.addRequired<MachinePostDominatorTree>();
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
AU.addRequired<TargetPassConfig>();		AU.addRequired<TargetPassConfig>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};
}		}

char MachineBlockPlacement::ID = 0;		char MachineBlockPlacement::ID = 0;
char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;		char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;
INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",
"Branch Probability Basic Block Placement", false, false)		"Branch Probability Basic Block Placement", false, false)
INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)
INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
		INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)		INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement",
"Branch Probability Basic Block Placement", false, false)		"Branch Probability Basic Block Placement", false, false)

#ifndef NDEBUG		#ifndef NDEBUG
/// \brief Helper to print the name of a MBB.		/// \brief Helper to print the name of a MBB.
///		///
/// Only used by debug logging.		/// Only used by debug logging.
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	getAdjustedProbability(BranchProbability OrigProb,
if (SuccProbN >= SuccProbD)		if (SuccProbN >= SuccProbD)
SuccProb = BranchProbability::getOne();		SuccProb = BranchProbability::getOne();
else		else
SuccProb = BranchProbability(SuccProbN, SuccProbD);		SuccProb = BranchProbability(SuccProbN, SuccProbD);

return SuccProb;		return SuccProb;
}		}

		/// Check if a block should be tail duplicated.
		/// \p BB Block to check.
		bool MachineBlockPlacement::shouldTailDuplicate(MachineBasicBlock *BB) {
		// Blocks with single successors don't create additional fallthrough
		// opportunities. Don't duplicate them. TODO: When conditional exits are
		// analyzable, allow them to be duplicated.
		bool IsSimple = TailDup.isSimpleBB(BB);

		if (BB->succ_size() == 1)
		return false;
		return TailDup.shouldTailDuplicate(IsSimple, *BB);
		}

		/// Check the edge frequencies to see if tail duplication will increase
		/// fallthroughs. It only makes sense to call this function when
		/// \p Succ != ChosenSucc. Tail duplication of \p Succ is always locally
		/// profitable if we would have picked Succ without considering duplication.
		/// \p ChosenSucc The block chosen w/out considering tail duplication.
		bool MachineBlockPlacement::isProfitableToTailDup(
		MachineBasicBlock *BB, MachineBasicBlock *Succ,
		BranchProbability AdjustedSumProb,
		BlockChain &Chain, const BlockFilterSet *BlockFilter) {
		// We need to do a probability calculation to make sure this is profitable.
		// First: does succ have a successor that post-dominates? This affects the
		// calculation. The 2 relevant cases are:
		davidxl (Done): I suppose this logic here is for rounding errors or overflow? Can you explain why the simple scaling with branch prob (in BranchProbability.cpp) does not work? return Gain > EntryFreqThresholdProb;
		iteratee (Not Done): I did the math, and found a way to do it simply.
		//    BB         BB
		//    | \        | \
		//   P|  C       |P C
		//    =   C'     =   C'
		//    |  /Qout   |  /Qout
		//    | /        | /
		//    Succ       Succ
		//    / \        | \  V
		//  U/   =V      |U \
		davidxl (Done): Dom -> PDom
		//  /     \      =   D
		//  D      E     |  /
		//               | /
		//               |/
		//               PDom
		// '=' : Branch taken for that CFG edge
		// In the second case, Placing Succ while duplicating it into C prevents the
		davidxl (Done): Why not just check if there exists a SuccSucc that post dominates Succ directly?
		// fallthrough of Succ into either D or PDom, because they now have C as an
		// unplaced predecessor

		// Start by figuring out which case we fall into
		MachineBasicBlock *PDom = nullptr;
		SmallVector<MachineBasicBlock *, 4> SuccSuccs;
		// Only scan the relevant successors
		auto AdjustedSuccSumProb =
		collectViableSuccessors(Succ, Chain, BlockFilter, SuccSuccs);
		// If there are no more successors, it is profitable to copy, as it strictly
		// increases fallthrough.
		if (SuccSuccs.size() == 0)
		davidxl (Done): I assume this is loop back edge source block. You need a test case to cover it.
		davidxl (Done): test case for this?
		iteratee (Not Done): It's not just a back edge. I added a test case.
		return true;
		auto BestSuccSucc = BranchProbability::getZero();
		// Find the PDom or the best Succ if no PDom exists.
		for (MachineBasicBlock *SuccSucc : SuccSuccs) {
		auto Prob = MBPI->getEdgeProbability(Succ, SuccSucc);
		if (Prob > BestSuccSucc)
		BestSuccSucc = Prob;
		davidxl (Done): PU + PV == P. Also, assuming U > V, the layout order (with tail dup) should be BB --> Succ --> D, so the overall cost is Q + PV + Q (which is smaller than Q + QV + PU + PV).
		iteratee (Done): I thought that too. But without the lattice patch, after duplication, we won't put D after Succ because it now has an unplaced predecessor. The lattice patch fixes the behavior and the calculation.
		if (PDom == nullptr)
		davidxl (Done): We cannot assume BB and Succ are in a triangular shape subcfg here, given where this method is called. Besides, if Succ is not tail-duped, the layout decision may even reject Succ as the layout successor, so the cost is no longer P + V, but 2Q + V instead (with U > V). In other words, the isProfitable check cannot be done inside 'hasBetterLayoutPredecessor'; it should be hoisted to the caller, after 'hasBetterLayoutPredecessor' returns, at which point we will know the layout decision if taildup does not kick in.
		iteratee (Done): I think I have the calculation right for when Succ would not be the layout successor, but you're right to point out that we should also do the calculation even when Succ is the chosen successor.
		iteratee (Done): We now only call this function to check if we should use Succ despite it having been rejected. So we know that Succ is not the layout successor.
		if (MPDT->dominates(SuccSucc, Succ)) {
		PDom = SuccSucc;
		break;
		davidxl (Done): Why break here?
		iteratee (Done): Because if PDom is not null, that's all that we look at for the probability calculation.
		}
		}
		// For the comparisons, we need to know Succ's best incoming edge that isn't
		// from BB.
		auto BestSuccPred = BlockFrequency(0);
		davidxl (Done): nit: -->SuccBestPred
		for (MachineBasicBlock *SuccPred : Succ->predecessors()) {
		if (SuccPred == Succ \|\| BlockToChain[SuccPred] == &Chain
		\|\| (BlockFilter && !BlockFilter->count(SuccPred)))
		continue;
		auto Freq = MBFI->getBlockFreq(SuccPred)
		* MBPI->getEdgeProbability(SuccPred, Succ);
		if (Freq > BestSuccPred)
		davidxl (Done): In this case, what is needed is to invoke 'hasBetterLayoutPredecessor' on the PDom block. Depending on the result, we will know whether, without tail dup, the layout order is Succ->PDom or Succ->D->PDom. This will make the cost computation more precise.
		BestSuccPred = Freq;
		davidxl (Done): Computing BestSuccPred here is unnecessary. See below for more comments.
		}
		auto BBFreq = MBFI->getBlockFreq(BB);
		auto SuccFreq = MBFI->getBlockFreq(Succ);
		BranchProbability PProb = MBPI->getEdgeProbability(BB, Succ);
		BlockFrequency P = BBFreq * PProb;
		// At this point, we don't know which block would be chosen instead of Succ.
		davidxl (Done): Add a shortcut here with comments: // If P is not larger, the best successor selection loop will eventually select C, not Succ (as it is not profitable to do so). if (P <= Qout) return false;
		iteratee (Not Done): If we weren't estimating Qout, I'd agree. Instead we'll skip calling this altogether if we know that we won't use the result.
		davidxl (Done): How about this comment? Early return can 1) speed up the computation and 2) make the following code easier to understand.
		// Using Qout as (1 - P) is conservative.
		BlockFrequency Qout = BBFreq * (AdjustedSumProb - PProb);
		BlockFrequency Qin = BestSuccPred;
		davidxl (Not Done): Qin is not necessarily BestSuccPred. The profitability check is called only after hasBetterLayoutPredecessor has returned true. There are two scenarios in which it returns true: (1) Qin or Qout is larger than P, or (2) P is larger than Qout, but the branch is not biased enough, so the layout algorithm still decides to keep the top-order. Either way, the baseline layout to compare (with taildup) is that BB->Succ is the branch taken edge, and BB->C is the fall through edge. Qin should just be Prob(BB->C).
		iteratee (Not Done): When we place Succ, we remove 2 fallthrough edges, BB->C and C'->Succ. Freq(C'->Succ) may be larger than Freq(BB->C). I am using Qin to represent Freq(C'->Succ) and Qout for Freq(BB->C). I could just use different letters if that were more clear. Qout is Freq(BB->C). I don't think Qin should be as well.
		davidxl (Not Done): Differentiating Qin and Qout is fine, but in the code Qin = BestSuccPred, which could be Freq(BB->Succ). What I meant is that you should directly compute Qin per its definition, Freq(C'->Succ).
		iteratee (Not Done): Did you still want me to fix something here?
		davidxl (Done): Just add a comment above the Qin decl stating that Qin is the largest frequency of Succ's incoming edges which have not been placed.
		// If it doesn't have a post-dominating successor, here is the calculation:
		//    BB        BB
		//    | \Qout   |  \
		//   P|  C      |   =
		//    =   C'    |    C
		//    |  /Qin   |     |
		//    | /       |     C' (+Succ)
		//    Succ      Succ /|
		//    / \       |  \/ |
		//  U/   =V     =  /= =
		//  /     \     | /  \|
		//  D      E    D     E
		// '=' : Branch taken for that CFG edge
		// Cost in the first case is: P + V
		// Cost in the second case is: Qout + Qin*V + PU + PV
		if (PDom == nullptr \|\| !Succ->isSuccessor(PDom)) {
		davidxl (Done): PDom is always a successor of Succ according to the way it is computed.
		iteratee (Done): Thanks.
		BranchProbability UProb = BestSuccSucc;
		BranchProbability VProb = AdjustedSuccSumProb - UProb;
		BlockFrequency V = SuccFreq * VProb;
		BlockFrequency QinV = Qin * VProb;
		BlockFrequency BaseCost = P + V;
		BlockFrequency DupCost = Qout + QinV + P*AdjustedSuccSumProb;
		davidxl (Done): The base cost is as written; the DupCost, however, depends on whether P > Q or not. If P > Q, the fallthrough path is BB->Succ->D, so the cost (normalized with freq(bb) == 1) is 2Q + PV. If P < Q, the fallthrough path is BB->C'->D and the cost is 2P + QV.
		iteratee (Not Done): This function is called in a loop looking for the highest probability successor. If Q > P, this function will be ignored and we will lay out Q anyway, so we can ignore the second case. As to the first case: until the 2nd patch lands, the duplication will prevent the BB->Succ->D layout. Instead you will get BB->Succ ; C'->D, so the cost is as calculated. D28522 will include an update to this calculation along with an update to the behavior.
		davidxl (Done): You are right that in the Q > P case that scenario will be dropped. It is very subtle, so please add some comment to clarify. Ok, for the first case, also add a comment.
		return (BaseCost > DupCost);
		}
		BranchProbability UProb = MBPI->getEdgeProbability(Succ, PDom);
		BranchProbability VProb = AdjustedSuccSumProb - UProb;
		BlockFrequency U = SuccFreq * UProb;
		BlockFrequency V = SuccFreq * VProb;
		// If there is a post-dominating successor, here is the calculation:
		// BB         BB                 BB          BB
		// | \Qout    |   \              | \Qout     |  \
		// |P C       |    =             |P C        |   =
		// =   C'     |P    C            =   C'      |P   C
		// |  /Qin    |     |            |  /Qin     |    |
		// | /        |     C' (+Succ)   | /         |    C' (+Succ)
		// Succ       Succ /|            Succ        Succ /|
		// | \  V     |   \/ |           | \  V      |   \/ |
		// |U \       |U  /\ |           |U =        |U  /\ |
		// =   D      =  =  =|           |   D       |  =  =|
		// |  /       |/     D           |  /        |/     D
		// | /        |     /            | =         |     /
		// |/         |    /             |/          |    =
		// Dom        Dom                Dom         Dom
		// '=' : Branch taken for that CFG edge
		// The cost for taken branches in the first case is P + U
		// The cost in the second case (assuming independence), given the layout:
		// BB, Succ, (C+Succ), D, Dom
		// is Qout + PU + PV + Qin*U
		// compare U vs Qout + Qin*U.
		//
		// The 3rd and 4th cases cover when Dom would be chosen to follow Succ.
		//
		// For the 3rd case, the cost is P + 2 * V
		// For the 4th case, the cost is Qout + Qin * U + P * V + V
		// We choose 4 over 3 when (P + V) > Qout + Qin * U + P * V
		// Be conservative and cover both cases by checking for:
		// (P + min(U, V) > Qout + Qin * U + P
		BlockFilterSet LookAhead;
		LookAhead.insert(Succ);
		if (UProb > AdjustedSuccSumProb / 2
		&& !hasBetterLayoutPredecessor(Succ, PDom, *BlockToChain[PDom],
		UProb, UProb, Chain, BlockFilter,
		&LookAhead))
		// Cases 3 & 4
		return (P + V) > (Qout + Qin * UProb + P * VProb);
		// Cases 1 & 2
		return (P + U) > (Qout + Qin * UProb + P * AdjustedSuccSumProb);
		}
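
The cost comparison discussed in this thread can be checked in isolation. The following is a minimal sketch, not the patch's implementation: plain doubles stand in for LLVM's BlockFrequency and BranchProbability, and the names EdgeFreqs, dupBeatsBaseNoPDom, and profitExceedsThreshold are hypothetical. The first function follows the "no post-dominating successor" comment (BaseCost = P + V, DupCost = Qout + Qin*V + PU + PV); the second follows davidxl's suggestion to compare the profit against a percentage of the entry frequency.

```cpp
#include <cassert>

// Hypothetical stand-in for the quantities named in the review comments.
struct EdgeFreqs {
  double P;        // freq(BB->Succ)
  double Qout;     // freq(BB->C), the other outgoing edge of BB
  double Qin;      // hottest unplaced incoming edge of Succ other than BB
  double SuccFreq; // freq(Succ)
  double UProb;    // probability of Succ's best successor
  double VProb;    // probability of Succ's remaining successors
};

// Case without a post-dominating successor:
//   BaseCost = P + V
//   DupCost  = Qout + Qin*V + P*U + P*V
// Duplication is considered profitable when it lowers the taken-branch cost.
bool dupBeatsBaseNoPDom(const EdgeFreqs &F) {
  assert(F.UProb >= 0 && F.VProb >= 0 && F.UProb + F.VProb <= 1.0);
  double V = F.SuccFreq * F.VProb;
  double BaseCost = F.P + V;
  double DupCost = F.Qout + F.Qin * F.VProb + F.P * (F.UProb + F.VProb);
  return BaseCost > DupCost;
}

// Reviewer's suggested normalization: require the absolute profit to exceed
// Percent/100 of the function entry frequency, so cold paths are filtered out.
bool profitExceedsThreshold(double BaseCost, double DupCost, double EntryFreq,
                            unsigned Percent) {
  double Profit = BaseCost - DupCost;
  double Threshold = EntryFreq * Percent / 100.0;
  return Profit > Threshold;
}
```

Normalizing by the entry frequency rather than by the baseline cost means a fixed absolute gain is judged the same way regardless of how hot the surrounding blocks are, which is the concern raised in the comment on the penalty percent parameter.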


		/// When the option TailDupPlacement is on, this method checks if the
		/// fallthrough candidate block \p Succ (of block \p BB) can be tail-duplicated
		/// into all of its unplaced, unfiltered predecessors, that are not BB. In
		/// addition we keep a set of blocks that have been tail-duplicated into and
		/// allow those blocks to be unplaced as well. This allows the creation of a
		/// second (larger) spine and a short fallthrough spine.
		/// We also identify blocks with the CFG that would have been produced by
		/// tail-duplication and lay them out in the same manner.
		bool MachineBlockPlacement::canTailDuplicateUnplacedPreds(
		MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &Chain,
		const BlockFilterSet *BlockFilter) {
		if (!shouldTailDuplicate(Succ))
		return false;

		for (MachineBasicBlock *Pred : Succ->predecessors()) {
		// Make sure all unplaced and unfiltered predecessors can be
		// tail-duplicated into.
		if (Pred == BB \|\| (BlockFilter && !BlockFilter->count(Pred))
		\|\| BlockToChain[Pred] == &Chain)
		continue;
		if (!TailDup.canTailDuplicate(Succ, Pred))
		return false;
		}
		return true;
		}

/// When the option OutlineOptionalBranches is on, this method		/// When the option OutlineOptionalBranches is on, this method
/// checks if the fallthrough candidate block \p Succ (of block		/// checks if the fallthrough candidate block \p Succ (of block
/// \p BB) also has other unscheduled predecessor blocks which		/// \p BB) also has other unscheduled predecessor blocks which
/// are also successors of \p BB (forming triangular shape CFG).		/// are also successors of \p BB (forming triangular shape CFG).
/// If none of such predecessors are small, it returns true.		/// If none of such predecessors are small, it returns true.
/// The caller can choose to select \p Succ as the layout successors		/// The caller can choose to select \p Succ as the layout successors
/// so that \p Succ's predecessors (optional branches) can be		/// so that \p Succ's predecessors (optional branches) can be
/// outlined.		/// outlined.
[32 lines not shown]		static BranchProbability getLayoutSuccessorProbThreshold(
if (!BB->getParent()->getFunction()->getEntryCount())		if (!BB->getParent()->getFunction()->getEntryCount())
return BranchProbability(StaticLikelyProb, 100);		return BranchProbability(StaticLikelyProb, 100);
if (BB->succ_size() == 2) {		if (BB->succ_size() == 2) {
const MachineBasicBlock *Succ1 = *BB->succ_begin();		const MachineBasicBlock *Succ1 = *BB->succ_begin();
const MachineBasicBlock *Succ2 = *(BB->succ_begin() + 1);		const MachineBasicBlock *Succ2 = *(BB->succ_begin() + 1);
if (Succ1->isSuccessor(Succ2) \|\| Succ2->isSuccessor(Succ1)) {		if (Succ1->isSuccessor(Succ2) \|\| Succ2->isSuccessor(Succ1)) {
/* See case 1 below for the cost analysis. For BB->Succ to		/* See case 1 below for the cost analysis. For BB->Succ to
* be taken with smaller cost, the following needs to hold:		* be taken with smaller cost, the following needs to hold:
* Prob(BB->Succ) > 2* Prob(BB->Pred)		* Prob(BB->Succ) > 2 * Prob(BB->Pred)
* So the threshold T		* So the threshold T in the calculation below
* T = 2 * (1-Prob(BB->Pred). Since T + Prob(BB->Pred) == 1,		* (1-T) * Prob(BB->Succ) > T * Prob(BB->Pred)
* We have T + T/2 = 1, i.e. T = 2/3. Also adding user specified		* So T / (1 - T) = 2, Yielding T = 2/3
* branch bias, we have		* Also adding user specified branch bias, we have
* T = (2/3)*(ProfileLikelyProb/50)		* T = (2/3)*(ProfileLikelyProb/50)
* = (2*ProfileLikelyProb)/150)		* = (2*ProfileLikelyProb)/150)
*/		*/
return BranchProbability(2 * ProfileLikelyProb, 150);		return BranchProbability(2 * ProfileLikelyProb, 150);
}		}
}		}
return BranchProbability(ProfileLikelyProb, 100);		return BranchProbability(ProfileLikelyProb, 100);
}		}
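
The threshold arithmetic in the comment above can be worked through with integers. This is a sketch under stated assumptions: the names layoutThreshold and passesBackwardCheck are hypothetical, the threshold is kept as a numerator/denominator pair instead of a BranchProbability, and ProfileLikelyProb is taken to be 50 in the usage below (the value at which the comment's scaling factor ProfileLikelyProb/50 equals 1).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Threshold T as {numerator, denominator}, mirroring the two return values
// of the function above: 2*ProfileLikelyProb/150 for a triangle-shaped CFG,
// ProfileLikelyProb/100 otherwise.
std::pair<uint64_t, uint64_t> layoutThreshold(uint64_t ProfileLikelyProb,
                                              bool IsTriangle) {
  if (IsTriangle)
    return {2 * ProfileLikelyProb, 150};
  return {ProfileLikelyProb, 100};
}

// The check from the comment: (1 - T) * freq(BB->Succ) > T * freq(BB->Pred).
// Multiplying both sides by the denominator keeps everything in integers.
bool passesBackwardCheck(uint64_t FreqSucc, uint64_t FreqPred,
                         std::pair<uint64_t, uint64_t> T) {
  assert(T.first <= T.second && "threshold must not exceed 1");
  return (T.second - T.first) * FreqSucc > T.first * FreqPred;
}
```

With ProfileLikelyProb = 50 the triangle threshold is 100/150 = 2/3, so the check reduces to freq(BB->Succ) > 2 * freq(BB->Pred), which is the condition the comment starts from.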

/// Checks to see if the layout candidate block \p Succ has a better layout		/// Checks to see if the layout candidate block \p Succ has a better layout
/// predecessor than \c BB. If yes, returns true.		/// predecessor than \c BB. If yes, returns true.
		/// \p SuccProb: The probability adjusted for only remaining blocks.
		/// Only used for logging
		/// \p RealSuccProb: The un-adjusted probability.
		/// \p Chain: The chain that BB belongs to and Succ is being considered for.
		/// \p BlockFilter: if non-null, the set of blocks that make up the loop being
		/// considered
		/// \p Lookahead: if non-null, a set of blocks to ignore.
		davidxl (Not Done): Add more description about what blocks to ignore.
		iteratee (Not Done): Well, that's really up to the caller. Do you want me to list why you might want to ignore a block?
		davidxl (Done): Something like: e.g., when called under xxx, we want to ignore yyy. See caller zzz for details. However, see my comment in the function; this parameter seems unnecessary.
bool MachineBlockPlacement::hasBetterLayoutPredecessor(		bool MachineBlockPlacement::hasBetterLayoutPredecessor(
MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,		MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,
BranchProbability SuccProb, BranchProbability RealSuccProb,		BranchProbability SuccProb, BranchProbability RealSuccProb,
BlockChain &Chain, const BlockFilterSet *BlockFilter) {		BlockChain &Chain, const BlockFilterSet *BlockFilter,
		const BlockFilterSet *LookAhead) {

// There isn't a better layout when there are no unscheduled predecessors.		// There isn't a better layout when there are no unscheduled predecessors.
if (SuccChain.UnscheduledPredecessors == 0)		if (SuccChain.UnscheduledPredecessors == 0)
return false;		return false;

// There are two basic scenarios here:		// There are two basic scenarios here:
// -------------------------------------		// -------------------------------------
// Case 1: triangular shape CFG (if-then):		// Case 1: triangular shape CFG (if-then):
[111 lines not shown]		bool MachineBlockPlacement::hasBetterLayoutPredecessor(
// Make sure that a hot successor doesn't have a globally more		// Make sure that a hot successor doesn't have a globally more
// important predecessor.		// important predecessor.
BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;		BlockFrequency CandidateEdgeFreq = MBFI->getBlockFreq(BB) * RealSuccProb;
bool BadCFGConflict = false;		bool BadCFGConflict = false;

for (MachineBasicBlock *Pred : Succ->predecessors()) {		for (MachineBasicBlock *Pred : Succ->predecessors()) {
if (Pred == Succ \|\| BlockToChain[Pred] == &SuccChain \|\|		if (Pred == Succ \|\| BlockToChain[Pred] == &SuccChain \|\|
(BlockFilter && !BlockFilter->count(Pred)) \|\|		(BlockFilter && !BlockFilter->count(Pred)) \|\|
BlockToChain[Pred] == &Chain)		BlockToChain[Pred] == &Chain \|\|
		(LookAhead && LookAhead->count(Pred)))
		davidxl (Done): I think it is equivalent to check Pred == BB. In the normal calling context, this is covered by BlockToChain[Pred] == &Chain, but for the lookahead case it is needed to filter out BB, which is not laid out yet.
continue;		continue;
		davidxl (Done): --> ... for lookahead by isProfitableToTailDup when BB has not yet been placed.
// Do backward checking.		// Do backward checking.
// For all cases above, we need a backward checking to filter out edges that		// For all cases above, we need a backward checking to filter out edges that
// are not 'strongly' biased. With profile data available, the check is		// are not 'strongly' biased.
// mostly redundant for case 2 (when threshold prob is set at 50%) unless S
// has more than two successors.
// BB Pred		// BB Pred
// \ /		// \ /
// Succ		// Succ
// We select edge BB->Succ if		// We select edge BB->Succ if
// freq(BB->Succ) > freq(Succ) * HotProb		// freq(BB->Succ) > freq(Succ) * HotProb
// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *		// i.e. freq(BB->Succ) > freq(BB->Succ) * HotProb + freq(Pred->Succ) *
// HotProb		// HotProb
// i.e. freq((BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb		// i.e. freq((BB->Succ) * (1 - HotProb) > freq(Pred->Succ) * HotProb
[19 lines not shown]
/// \brief Select the best successor for a block.		/// \brief Select the best successor for a block.
///		///
/// This looks across all successors of a particular block and attempts to		/// This looks across all successors of a particular block and attempts to
/// select the "best" one to be the layout successor. It only considers direct		/// select the "best" one to be the layout successor. It only considers direct
/// successors which also pass the block filter. It will attempt to avoid		/// successors which also pass the block filter. It will attempt to avoid
/// breaking CFG structure, but cave and break such structures in the case of		/// breaking CFG structure, but cave and break such structures in the case of
/// very hot successor edges.		/// very hot successor edges.
///		///
/// \returns The best successor block found, or null if none are viable.		/// \returns The best successor block found, or null if none are viable, along
MachineBasicBlock *		/// with a boolean indicating if tail duplication is necessary.
		MachineBlockPlacement::BlockAndTailDupResult
MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB,		MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB,
BlockChain &Chain,		BlockChain &Chain,
const BlockFilterSet *BlockFilter) {		const BlockFilterSet *BlockFilter) {
const BranchProbability HotProb(StaticLikelyProb, 100);		const BranchProbability HotProb(StaticLikelyProb, 100);

MachineBasicBlock *BestSucc = nullptr;		BlockAndTailDupResult BestSucc = { nullptr, false };
auto BestProb = BranchProbability::getZero();		auto BestProb = BranchProbability::getZero();

SmallVector<MachineBasicBlock *, 4> Successors;		SmallVector<MachineBasicBlock *, 4> Successors;
auto AdjustedSumProb =		auto AdjustedSumProb =
collectViableSuccessors(BB, Chain, BlockFilter, Successors);		collectViableSuccessors(BB, Chain, BlockFilter, Successors);

DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n");		DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n");
for (MachineBasicBlock *Succ : Successors) {		for (MachineBasicBlock *Succ : Successors) {
		bool ShouldTailDup = false;
auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);		auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ);
BranchProbability SuccProb =		BranchProbability SuccProb =
getAdjustedProbability(RealSuccProb, AdjustedSumProb);		getAdjustedProbability(RealSuccProb, AdjustedSumProb);

// This heuristic is off by default.		// This heuristic is off by default.
if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb,		if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb,
HotProb))		HotProb)) {
return Succ;		BestSucc.BB = Succ;
		return BestSucc;
		}

BlockChain &SuccChain = *BlockToChain[Succ];		BlockChain &SuccChain = *BlockToChain[Succ];
// Skip the edge \c BB->Succ if block \c Succ has a better layout		// Skip the edge \c BB->Succ if block \c Succ has a better layout
// predecessor that yields lower global cost.		// predecessor that yields lower global cost.
if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,		if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb,
Chain, BlockFilter))		Chain, BlockFilter, nullptr)) {
		// If tail duplication would make Succ profitable, place it.
		if (!(TailDupPlacement
		davidxl (Done): Remove the first 'SuccProb > BestProb' check. It provides only a very tiny compile time win depending on the iteration order, but adds more confusion.
		&& canTailDuplicateUnplacedPreds(BB, Succ, Chain, BlockFilter)
		&& isProfitableToTailDup(BB, Succ, AdjustedSumProb, Chain,
		BlockFilter)))
continue;		continue;
		ShouldTailDup = true;
		}

DEBUG(		DEBUG(
dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "		dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: "
<< SuccProb		<< SuccProb
<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")		<< (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "")
		<< (ShouldTailDup ? " (Tail Duplicate)" : "")
<< "\n");		<< "\n");

if (BestSucc && BestProb >= SuccProb) {		if (BestSucc.BB && BestProb >= SuccProb) {
DEBUG(dbgs() << " Not the best candidate, continuing\n");		DEBUG(dbgs() << " Not the best candidate, continuing\n");
continue;		continue;
}		}

DEBUG(dbgs() << " Setting it as best candidate\n");		DEBUG(dbgs() << " Setting it as best candidate\n");
BestSucc = Succ;		BestSucc.BB = Succ;
		BestSucc.ShouldTailDup = ShouldTailDup;
		davidxl (Done): No need to set ShouldTailDup in the loop; it is already initialized outside.
BestProb = SuccProb;		BestProb = SuccProb;
}		}
if (BestSucc)		if (BestSucc.BB)
DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc) << "\n");		DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB) << "\n");

return BestSucc;		return BestSucc;
}		}
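
The control flow of the revised selection loop can be illustrated with a simplified, standalone sketch. It is not the LLVM code: Candidate, Choice, and selectBest are hypothetical names, probabilities are plain doubles, and the results of hasBetterLayoutPredecessor and of the combined canTailDuplicateUnplacedPreds / isProfitableToTailDup checks are assumed to be precomputed booleans.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-successor summary.
struct Candidate {
  std::string Name;
  double Prob;         // adjusted successor probability
  bool HasBetterPred;  // stand-in for hasBetterLayoutPredecessor(...)
  bool DupProfitable;  // stand-in for the can-duplicate and profitability checks
};

// Counterpart of BlockAndTailDupResult: the chosen block plus a flag saying
// whether it was only acceptable because of tail duplication.
struct Choice {
  const Candidate *BB = nullptr;
  bool ShouldTailDup = false;
};

Choice selectBest(const std::vector<Candidate> &Succs, bool TailDupPlacement) {
  Choice Best;
  double BestProb = 0.0;
  for (const Candidate &C : Succs) {
    bool ShouldTailDup = false;
    if (C.HasBetterPred) {
      // A CFG-breaking successor is kept only if duplication rescues it.
      if (!(TailDupPlacement && C.DupProfitable))
        continue;
      ShouldTailDup = true;
    }
    // Earlier candidates win ties, as in the loop above.
    if (Best.BB && BestProb >= C.Prob)
      continue;
    Best.BB = &C;
    Best.ShouldTailDup = ShouldTailDup;
    BestProb = C.Prob;
  }
  return Best;
}
```

The flag travels with the chosen block so that the caller can tell a successor that needs duplication apart from one that was a valid fallthrough on its own.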

/// \brief Select the best block from a worklist.		/// \brief Select the best block from a worklist.
		davidxl (Done): Why not just stable sort it? The vector should be of size 1 for most of the cases. Also, why do you need position?
		iteratee (Not Done): Will just sort the vector. Position is because we rely on the successor order being stable and the first successor being a subtle hint. Without the position, we lose track of whether the block in the vector came before or after the block we picked without tail duplication.
///		///
/// This looks through the provided worklist as a list of candidate basic		/// This looks through the provided worklist as a list of candidate basic
/// blocks and select the most profitable one to place. The definition of		/// blocks and select the most profitable one to place. The definition of
/// profitable only really makes sense in the context of a loop. This returns		/// profitable only really makes sense in the context of a loop. This returns
/// the most frequently visited block in the worklist, which in the case of		/// the most frequently visited block in the worklist, which in the case of
/// a loop, is the one most desirable to be physically close to the rest of the		/// a loop, is the one most desirable to be physically close to the rest of the
/// loop body in order to improve i-cache behavior.		/// loop body in order to improve i-cache behavior.
///		///
/// \returns The best block found, or null if none are viable.		/// \returns The best block found, or null if none are viable.
MachineBasicBlock *MachineBlockPlacement::selectBestCandidateBlock(		MachineBasicBlock *MachineBlockPlacement::selectBestCandidateBlock(
		davidxl (Done): Should it break instead?
BlockChain &Chain, SmallVectorImpl<MachineBasicBlock *> &WorkList) {		BlockChain &Chain, SmallVectorImpl<MachineBasicBlock *> &WorkList) {
// Once we need to walk the worklist looking for a candidate, cleanup the		// Once we need to walk the worklist looking for a candidate, cleanup the
		davidxl (Not Done): isProfitableToTailDup assumes the baseline layout does not pick Succ. The assumption may not be true here, as there are two other possibilities: (1) Succ == BestSucc.BB in the base layout; (2) BestSucc.BB == null in the base layout (all of BB's successors have conflicts). In these two cases, the isProfitable check should probably be skipped (as duplication is beneficial).
		iteratee (Not Done): 1. Succ won't equal BestSucc.BB because of the continue. These blocks were not chosen by the first loop by construction. 2. Good catch. I'll add that.
// worklist of already placed entries.		// worklist of already placed entries.
// FIXME: If this shows up on profiles, it could be folded (at the cost of		// FIXME: If this shows up on profiles, it could be folded (at the cost of
// some code complexity) into the loop below.		// some code complexity) into the loop below.
WorkList.erase(remove_if(WorkList,		WorkList.erase(remove_if(WorkList,
[&](MachineBasicBlock *BB) {		[&](MachineBasicBlock *BB) {
return BlockToChain.lookup(BB) == &Chain;		return BlockToChain.lookup(BB) == &Chain;
}),		}),
WorkList.end());		WorkList.end());
[115 lines not shown]		void MachineBlockPlacement::buildChain(
for (;;) {		for (;;) {
assert(BB && "null block found at end of chain in loop.");		assert(BB && "null block found at end of chain in loop.");
assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop.");		assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop.");
assert(*std::prev(Chain.end()) == BB && "BB Not found at end of chain.");		assert(*std::prev(Chain.end()) == BB && "BB Not found at end of chain.");


// Look for the best viable successor if there is one to place immediately		// Look for the best viable successor if there is one to place immediately
// after this block.		// after this block.
MachineBasicBlock *BestSucc = selectBestSuccessor(BB, Chain, BlockFilter);		auto Result = selectBestSuccessor(BB, Chain, BlockFilter);
		MachineBasicBlock* BestSucc = Result.BB;
		bool ShouldTailDup = Result.ShouldTailDup;
		if (TailDupPlacement)
		ShouldTailDup |= (BestSucc && shouldTailDuplicate(BestSucc));

// If an immediate successor isn't available, look for the best viable		// If an immediate successor isn't available, look for the best viable
// block among those we've identified as not violating the loop's CFG at		// block among those we've identified as not violating the loop's CFG at
// this point. This won't be a fallthrough, but it will increase locality.		// this point. This won't be a fallthrough, but it will increase locality.
if (!BestSucc)		if (!BestSucc)
BestSucc = selectBestCandidateBlock(Chain, BlockWorkList);		BestSucc = selectBestCandidateBlock(Chain, BlockWorkList);
if (!BestSucc)		if (!BestSucc)
BestSucc = selectBestCandidateBlock(Chain, EHPadWorkList);		BestSucc = selectBestCandidateBlock(Chain, EHPadWorkList);

if (!BestSucc) {		if (!BestSucc) {
BestSucc = getFirstUnplacedBlock(Chain, PrevUnplacedBlockIt, BlockFilter);		BestSucc = getFirstUnplacedBlock(Chain, PrevUnplacedBlockIt, BlockFilter);
if (!BestSucc)		if (!BestSucc)
break;		break;

DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "		DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "
"layout successor until the CFG reduces\n");		"layout successor until the CFG reduces\n");
}		}

// Placement may have changed tail duplication opportunities.		// Placement may have changed tail duplication opportunities.
// Check for that now.		// Check for that now.
if (TailDupPlacement && BestSucc) {		if (TailDupPlacement && BestSucc && ShouldTailDup) {
// If the chosen successor was duplicated into all its predecessors,		// If the chosen successor was duplicated into all its predecessors,
// don't bother laying it out, just go round the loop again with BB as		// don't bother laying it out, just go round the loop again with BB as
// the chain end.		// the chain end.
if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,		if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,
BlockFilter, PrevUnplacedBlockIt))		BlockFilter, PrevUnplacedBlockIt))
continue;		continue;
}		}

[487 lines not shown]	DEBUG({
if (LoopChain.UnscheduledPredecessors) {		if (LoopChain.UnscheduledPredecessors) {
BadLoop = true;		BadLoop = true;
dbgs() << "Loop chain contains a block without its preds placed!\n"		dbgs() << "Loop chain contains a block without its preds placed!\n"
<< " Loop header: " << getBlockName(*L.block_begin()) << "\n"		<< " Loop header: " << getBlockName(*L.block_begin()) << "\n"
<< " Chain header: " << getBlockName(*LoopChain.begin()) << "\n";		<< " Chain header: " << getBlockName(*LoopChain.begin()) << "\n";
}		}
for (MachineBasicBlock *ChainBB : LoopChain) {		for (MachineBasicBlock *ChainBB : LoopChain) {
dbgs() << " ... " << getBlockName(ChainBB) << "\n";		dbgs() << " ... " << getBlockName(ChainBB) << "\n";
if (!LoopBlockSet.remove(ChainBB)) {		if (!LoopBlockSet.erase(ChainBB)) {
// We don't mark the loop as bad here because there are real situations		// We don't mark the loop as bad here because there are real situations
// where this can occur. For example, with an unanalyzable fallthrough		// where this can occur. For example, with an unanalyzable fallthrough
// from a loop block to a non-loop block or vice versa.		// from a loop block to a non-loop block or vice versa.
dbgs() << "Loop chain contains a block not contained by the loop!\n"		dbgs() << "Loop chain contains a block not contained by the loop!\n"
<< " Loop header: " << getBlockName(*L.block_begin()) << "\n"		<< " Loop header: " << getBlockName(*L.block_begin()) << "\n"
<< " Chain header: " << getBlockName(*LoopChain.begin()) << "\n"		<< " Chain header: " << getBlockName(*LoopChain.begin()) << "\n"
<< " Bad block: " << getBlockName(ChainBB) << "\n";		<< " Bad block: " << getBlockName(ChainBB) << "\n";
}		}
[371 lines not shown]	bool MachineBlockPlacement::maybeTailDuplicateBlock(
MachineBasicBlock *BB, MachineBasicBlock *LPred,		MachineBasicBlock *BB, MachineBasicBlock *LPred,
const BlockChain &Chain, BlockFilterSet *BlockFilter,		const BlockChain &Chain, BlockFilterSet *BlockFilter,
MachineFunction::iterator &PrevUnplacedBlockIt,		MachineFunction::iterator &PrevUnplacedBlockIt,
bool &DuplicatedToLPred) {		bool &DuplicatedToLPred) {

DuplicatedToLPred = false;		DuplicatedToLPred = false;
DEBUG(dbgs() << "Redoing tail duplication for Succ#"		DEBUG(dbgs() << "Redoing tail duplication for Succ#"
<< BB->getNumber() << "\n");		<< BB->getNumber() << "\n");
bool IsSimple = TailDup.isSimpleBB(BB);
// Blocks with single successors don't create additional fallthrough		if (!shouldTailDuplicate(BB))
// opportunities. Don't duplicate them. TODO: When conditional exits are
// analyzable, allow them to be duplicated.
if (!IsSimple && BB->succ_size() == 1)
return false;
if (!TailDup.shouldTailDuplicate(IsSimple, *BB))
return false;		return false;
// This has to be a callback because none of it can be done after		// This has to be a callback because none of it can be done after
// BB is deleted.		// BB is deleted.
bool Removed = false;		bool Removed = false;
auto RemovalCallback =		auto RemovalCallback =
[&](MachineBasicBlock *RemBB) {		[&](MachineBasicBlock *RemBB) {
// Signal to outer function		// Signal to outer function
Removed = true;		Removed = true;
[21 lines not shown]	auto RemovalCallback =
RemoveList.erase(		RemoveList.erase(
remove_if(RemoveList,		remove_if(RemoveList,
[RemBB](MachineBasicBlock *BB) {return BB == RemBB;}),		[RemBB](MachineBasicBlock *BB) {return BB == RemBB;}),
RemoveList.end());		RemoveList.end());
}		}

// Handle the filter set		// Handle the filter set
if (BlockFilter) {		if (BlockFilter) {
BlockFilter->remove(RemBB);		BlockFilter->erase(RemBB);
}		}

// Remove the block from loop info.		// Remove the block from loop info.
MLI->removeBlock(RemBB);		MLI->removeBlock(RemBB);
if (RemBB == PreferredLoopExit)		if (RemBB == PreferredLoopExit)
PreferredLoopExit = nullptr;		PreferredLoopExit = nullptr;

DEBUG(dbgs() << "TailDuplicator deleted block: "		DEBUG(dbgs() << "TailDuplicator deleted block: "
<< getBlockName(RemBB) << "\n");		<< getBlockName(RemBB) << "\n");
};		};
auto RemovalCallbackRef =		auto RemovalCallbackRef =
llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback);		llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback);

SmallVector<MachineBasicBlock *, 8> DuplicatedPreds;		SmallVector<MachineBasicBlock *, 8> DuplicatedPreds;
		bool IsSimple = TailDup.isSimpleBB(BB);
TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred,		TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred,
&DuplicatedPreds, &RemovalCallbackRef);		&DuplicatedPreds, &RemovalCallbackRef);

// Update UnscheduledPredecessors to reflect tail-duplication.		// Update UnscheduledPredecessors to reflect tail-duplication.
DuplicatedToLPred = false;		DuplicatedToLPred = false;
for (MachineBasicBlock *Pred : DuplicatedPreds) {		for (MachineBasicBlock *Pred : DuplicatedPreds) {
// We're only looking for unscheduled predecessors that match the filter.		// We're only looking for unscheduled predecessors that match the filter.
BlockChain* PredChain = BlockToChain[Pred];		BlockChain* PredChain = BlockToChain[Pred];
[24 lines not shown]	bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
F = &MF;		F = &MF;
MBPI = &getAnalysis<MachineBranchProbabilityInfo>();		MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(		MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
getAnalysis<MachineBlockFrequencyInfo>());		getAnalysis<MachineBlockFrequencyInfo>());
MLI = &getAnalysis<MachineLoopInfo>();		MLI = &getAnalysis<MachineLoopInfo>();
TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
TLI = MF.getSubtarget().getTargetLowering();		TLI = MF.getSubtarget().getTargetLowering();
MDT = &getAnalysis<MachineDominatorTree>();		MDT = &getAnalysis<MachineDominatorTree>();
		MPDT = nullptr;

// Initialize PreferredLoopExit to nullptr here since it may never be set if		// Initialize PreferredLoopExit to nullptr here since it may never be set if
// there are no MachineLoops.		// there are no MachineLoops.
PreferredLoopExit = nullptr;		PreferredLoopExit = nullptr;

if (TailDupPlacement) {		if (TailDupPlacement) {
		MPDT = &getAnalysis<MachinePostDominatorTree>();
unsigned TailDupSize = TailDuplicatePlacementThreshold;		unsigned TailDupSize = TailDuplicatePlacementThreshold;
if (MF.getFunction()->optForSize())		if (MF.getFunction()->optForSize())
TailDupSize = 1;		TailDupSize = 1;
TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);		TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize);
}		}

assert(BlockToChain.empty());		assert(BlockToChain.empty());

[14 lines not shown]	if (MF.size() > 3 && EnableTailMerge) {

if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),		if (BF.OptimizeFunction(MF, TII, MF.getSubtarget().getRegisterInfo(),
getAnalysisIfAvailable<MachineModuleInfo>(), MLI,		getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
/*AfterBlockPlacement=*/true)) {		/*AfterBlockPlacement=*/true)) {
// Redo the layout if tail merging creates/removes/moves blocks.		// Redo the layout if tail merging creates/removes/moves blocks.
BlockToChain.clear();		BlockToChain.clear();
// Must redo the dominator tree if blocks were changed.		// Must redo the dominator tree if blocks were changed.
MDT->runOnMachineFunction(MF);		MDT->runOnMachineFunction(MF);
		if (MPDT)
		MPDT->runOnMachineFunction(MF);
ChainAllocator.DestroyAll();		ChainAllocator.DestroyAll();
buildCFGChains();		buildCFGChains();
}		}
}		}

optimizeBranches();		optimizeBranches();
alignBlocks();		alignBlocks();

[90 lines not shown]

test/CodeGen/AArch64/addsub.ll

	[134 lines not shown]
	; CHECK: b.gt [[RET]]			; CHECK: b.gt [[RET]]
	%newval4 = add i32 %val, 3			%newval4 = add i32 %val, 3
	store i32 %newval4, i32* @var_i32			store i32 %newval4, i32* @var_i32
	%cmp_pos_sgt = icmp sgt i32 %val2, 321			%cmp_pos_sgt = icmp sgt i32 %val2, 321
	br i1 %cmp_pos_sgt, label %ret, label %test5			br i1 %cmp_pos_sgt, label %ret, label %test5

	test5:			test5:
	; CHECK: cmn {{w[0-9]+}}, #444			; CHECK: cmn {{w[0-9]+}}, #444
	; CHECK: b.gt [[RET]]			; CHECK: b.le [[TEST6:.?LBB[0-9]+_[0-9]+]]
	%newval5 = add i32 %val, 4			%newval5 = add i32 %val, 4
	store i32 %newval5, i32* @var_i32			store i32 %newval5, i32* @var_i32
	%cmp_neg_uge = icmp sgt i32 %val2, -444			%cmp_neg_uge = icmp sgt i32 %val2, -444
	br i1 %cmp_neg_uge, label %ret, label %test6			br i1 %cmp_neg_uge, label %ret, label %test6

				; CHECK: {{^}}[[RET]]:
				; CHECK: ret
				; CHECK: {{^}}[[TEST6]]:
				; CHECK: ret

	test6:			test6:
	%newval6 = add i32 %val, 5			%newval6 = add i32 %val, 5
	store i32 %newval6, i32* @var_i32			store i32 %newval6, i32* @var_i32
	ret void			ret void

	ret:			ret:
	ret void			ret void
	}			}
	; TODO: adds/subs			; TODO: adds/subs

test/CodeGen/AArch64/arm64-atomic.ll

	; RUN: llc < %s -mtriple=arm64-eabi -asm-verbose=false -verify-machineinstrs -mcpu=cyclone | FileCheck %s			; RUN: llc < %s -mtriple=arm64-eabi -asm-verbose=false -verify-machineinstrs -mcpu=cyclone | FileCheck %s

	define i32 @val_compare_and_swap(i32* %p, i32 %cmp, i32 %new) #0 {			define i32 @val_compare_and_swap(i32* %p, i32 %cmp, i32 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap:			; CHECK-LABEL: val_compare_and_swap:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i32 @val_compare_and_swap_from_load(i32* %p, i32 %cmp, i32* %pnew) #0 {			define i32 @val_compare_and_swap_from_load(i32* %p, i32 %cmp, i32* %pnew) #0 {
	; CHECK-LABEL: val_compare_and_swap_from_load:			; CHECK-LABEL: val_compare_and_swap_from_load:
	; CHECK-NEXT: ldr [[NEW:w[0-9]+]], [x2]			; CHECK-NEXT: ldr [[NEW:w[0-9]+]], [x2]
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x0]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x0]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], [[NEW]], [x0]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], [[NEW]], [x0]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: mov x0, x[[ADDR]]
				; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: mov x0, x[[ADDR]]
				; CHECK-NEXT: ret
	%new = load i32, i32* %pnew			%new = load i32, i32* %pnew
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acquire acquire
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i32 @val_compare_and_swap_rel(i32* %p, i32 %cmp, i32 %new) #0 {			define i32 @val_compare_and_swap_rel(i32* %p, i32 %cmp, i32 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap_rel:			; CHECK-LABEL: val_compare_and_swap_rel:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]			; CHECK-NEXT: ldaxr [[RESULT:w[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], w1			; CHECK-NEXT: cmp [[RESULT]], w1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stlxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]			; CHECK-NEXT: stlxr [[SCRATCH_REG:w[0-9]+]], w2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acq_rel monotonic			%pair = cmpxchg i32* %p, i32 %cmp, i32 %new acq_rel monotonic
	%val = extractvalue { i32, i1 } %pair, 0			%val = extractvalue { i32, i1 } %pair, 0
	ret i32 %val			ret i32 %val
	}			}

	define i64 @val_compare_and_swap_64(i64* %p, i64 %cmp, i64 %new) #0 {			define i64 @val_compare_and_swap_64(i64* %p, i64 %cmp, i64 %new) #0 {
	; CHECK-LABEL: val_compare_and_swap_64:			; CHECK-LABEL: val_compare_and_swap_64:
	; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0			; CHECK-NEXT: mov x[[ADDR:[0-9]+]], x0
	; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK-NEXT: [[TRYBB:.?LBB[0-9_]+]]:
	; CHECK-NEXT: ldxr [[RESULT:x[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldxr [[RESULT:x[0-9]+]], [x[[ADDR]]]
	; CHECK-NEXT: cmp [[RESULT]], x1			; CHECK-NEXT: cmp [[RESULT]], x1
	; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]			; CHECK-NEXT: b.ne [[FAILBB:.?LBB[0-9_]+]]
	; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], x2, [x[[ADDR]]]			; CHECK-NEXT: stxr [[SCRATCH_REG:w[0-9]+]], x2, [x[[ADDR]]]
	; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]			; CHECK-NEXT: cbnz [[SCRATCH_REG]], [[TRYBB]]
	; CHECK-NEXT: b [[EXITBB:.?LBB[0-9_]+]]			; CHECK-NEXT: ret
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: [[FAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[EXITBB]]:			; CHECK-NEXT: ret
	%pair = cmpxchg i64* %p, i64 %cmp, i64 %new monotonic monotonic			%pair = cmpxchg i64* %p, i64 %cmp, i64 %new monotonic monotonic
	%val = extractvalue { i64, i1 } %pair, 0			%val = extractvalue { i64, i1 } %pair, 0
	ret i64 %val			ret i64 %val
	}			}

	define i32 @fetch_and_nand(i32* %p) #0 {			define i32 @fetch_and_nand(i32* %p) #0 {
	; CHECK-LABEL: fetch_and_nand:			; CHECK-LABEL: fetch_and_nand:
	; CHECK: [[TRYBB:.?LBB[0-9_]+]]:			; CHECK: [[TRYBB:.?LBB[0-9_]+]]:
	[301 lines not shown]

test/CodeGen/AArch64/arm64-ccmp.ll

	[102 lines not shown]

	; Speculatively execute division by zero.			; Speculatively execute division by zero.
	; The sdiv/udiv instructions do not trap when the divisor is zero, so they are			; The sdiv/udiv instructions do not trap when the divisor is zero, so they are
	; safe to speculate.			; safe to speculate.
	; CHECK-LABEL: speculate_division:			; CHECK-LABEL: speculate_division:
	; CHECK: cmp w0, #1			; CHECK: cmp w0, #1
	; CHECK: sdiv [[DIVRES:w[0-9]+]], w1, w0			; CHECK: sdiv [[DIVRES:w[0-9]+]], w1, w0
	; CHECK: ccmp [[DIVRES]], #16, #0, ge			; CHECK: ccmp [[DIVRES]], #16, #0, ge
	; CHECK: b.gt [[BLOCK:LBB[0-9_]+]]			; CHECK: b.le [[BLOCK:LBB[0-9_]+]]
	; CHECK: bl _foo
	; CHECK: [[BLOCK]]:
	; CHECK: orr w0, wzr, #0x7			; CHECK: orr w0, wzr, #0x7
				; CHECK: [[BLOCK]]:
				; CHECK: bl _foo
	define i32 @speculate_division(i32 %a, i32 %b) nounwind ssp {			define i32 @speculate_division(i32 %a, i32 %b) nounwind ssp {
	entry:			entry:
	%cmp = icmp sgt i32 %a, 0			%cmp = icmp sgt i32 %a, 0
	br i1 %cmp, label %land.lhs.true, label %if.end			br i1 %cmp, label %land.lhs.true, label %if.end

	land.lhs.true:			land.lhs.true:
	%div = sdiv i32 %b, %a			%div = sdiv i32 %b, %a
	%cmp1 = icmp slt i32 %div, 17			%cmp1 = icmp slt i32 %div, 17
	br i1 %cmp1, label %if.then, label %if.end			br i1 %cmp1, label %if.then, label %if.end

	if.then:			if.then:
	%call = tail call i32 @foo() nounwind			%call = tail call i32 @foo() nounwind
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret i32 7			ret i32 7
	}			}

	; Floating point compare.			; Floating point compare.
	; CHECK: single_fcmp			; CHECK: single_fcmp
	; CHECK: cmp			; CHECK: cmp
	; CHECK-NOT: b.			; CHECK-NOT: b.
	; CHECK: fccmp {{.*}}, #8, ge			; CHECK: fccmp {{.*}}, #8, ge
	; CHECK: b.lt			; CHECK: b.ge
	define i32 @single_fcmp(i32 %a, float %b) nounwind ssp {			define i32 @single_fcmp(i32 %a, float %b) nounwind ssp {
	entry:			entry:
	%cmp = icmp sgt i32 %a, 0			%cmp = icmp sgt i32 %a, 0
	br i1 %cmp, label %land.lhs.true, label %if.end			br i1 %cmp, label %land.lhs.true, label %if.end

	land.lhs.true:			land.lhs.true:
	%conv = sitofp i32 %a to float			%conv = sitofp i32 %a to float
	%div = fdiv float %b, %conv			%div = fdiv float %b, %conv
	[512 lines not shown]

test/CodeGen/AArch64/arm64-shrink-wrapping.ll

	[340 lines not shown]
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]			; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]
	; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8			; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8
	; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]			; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]
	; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]			; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]
	; CHECK-NEXT: sub w1, w1, #1			; CHECK-NEXT: sub w1, w1, #1
	; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]			; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]
	; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]			; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]
	; DISABLE-NEXT: b [[IFEND_LABEL]]			; CHECK-NEXT: [[IFEND_LABEL]]:
	;
	; DISABLE: [[ELSE_LABEL]]: ; %if.else
	; DISABLE: lsl w0, w1, #1
	;
	; CHECK: [[IFEND_LABEL]]:
	; Epilogue code.			; Epilogue code.
	; CHECK: add sp, sp, #16			; CHECK: add sp, sp, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; ENABLE: [[ELSE_LABEL]]: ; %if.else			; CHECK: [[ELSE_LABEL]]: ; %if.else
	; ENABLE-NEXT: lsl w0, w1, #1			; CHECK-NEXT: lsl w0, w1, #1
	; ENABLE_NEXT: ret			; DISABLE-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
	define i32 @variadicFunc(i32 %cond, i32 %count, ...) #0 {			define i32 @variadicFunc(i32 %cond, i32 %count, ...) #0 {
	entry:			entry:
	%ap = alloca i8*, align 8			%ap = alloca i8*, align 8
	%tobool = icmp eq i32 %cond, 0			%tobool = icmp eq i32 %cond, 0
	br i1 %tobool, label %if.else, label %if.then			br i1 %tobool, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%ap1 = bitcast i8** %ap to i8*			%ap1 = bitcast i8** %ap to i8*
	[351 lines not shown]

test/CodeGen/AArch64/compare-branch.ll

[21 lines not shown]	test3:
%tst3 = icmp eq i64 %val3, 0		%tst3 = icmp eq i64 %val3, 0
br i1 %tst3, label %end, label %test4, !prof !1		br i1 %tst3, label %end, label %test4, !prof !1
; CHECK: cbz {{x[0-9]+}}, .LBB		; CHECK: cbz {{x[0-9]+}}, .LBB

test4:		test4:
%val4 = load volatile i64, i64* @var64		%val4 = load volatile i64, i64* @var64
%tst4 = icmp ne i64 %val4, 0		%tst4 = icmp ne i64 %val4, 0
br i1 %tst4, label %end, label %test5, !prof !1		br i1 %tst4, label %end, label %test5, !prof !1
; CHECK: cbnz {{x[0-9]+}}, .LBB		; CHECK: cbz {{x[0-9]+}}, .LBB

test5:		test5:
store volatile i64 %val4, i64* @var64		store volatile i64 %val4, i64* @var64
ret void		ret void

end:		end:
ret void		ret void
}		}


!1 = !{!"branch_weights", i32 1, i32 1}		!1 = !{!"branch_weights", i32 1, i32 1}

test/CodeGen/AArch64/logical_shifted_reg.ll

	[204 lines not shown]
	; CHECK: b.lt .L			; CHECK: b.lt .L
	%shifted_op = shl i64 %val2, 63			%shifted_op = shl i64 %val2, 63
	%shifted_and = and i64 %val1, %shifted_op			%shifted_and = and i64 %val1, %shifted_op
	%tst2 = icmp slt i64 %shifted_and, 0			%tst2 = icmp slt i64 %shifted_and, 0
	br i1 %tst2, label %ret, label %test3, !prof !1			br i1 %tst2, label %ret, label %test3, !prof !1

	test3:			test3:
	; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}, asr #12			; CHECK: tst {{x[0-9]+}}, {{x[0-9]+}}, asr #12
	; CHECK: b.gt .L			; CHECK: b.le .L
	%asr_op = ashr i64 %val2, 12			%asr_op = ashr i64 %val2, 12
	%asr_and = and i64 %asr_op, %val1			%asr_and = and i64 %asr_op, %val1
	%tst3 = icmp sgt i64 %asr_and, 0			%tst3 = icmp sgt i64 %asr_and, 0
	br i1 %tst3, label %ret, label %other_exit, !prof !1			br i1 %tst3, label %ret, label %other_exit, !prof !1

	other_exit:			other_exit:
	store volatile i64 %val1, i64* @var1_64			store volatile i64 %val1, i64* @var1_64
	ret void			ret void
	ret:			ret:
	ret void			ret void
	}			}

	!1 = !{!"branch_weights", i32 1, i32 1}			!1 = !{!"branch_weights", i32 1, i32 1}

test/CodeGen/AArch64/tail-dup-repeat-worklist.ll

This file was deleted.

	; RUN: llc -O3 -o - -verify-machineinstrs %s | FileCheck %s
	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"

	%struct.s1 = type { %struct.s3*, %struct.s1* }
	%struct.s2 = type opaque
	%struct.s3 = type { i32 }

	; Function Attrs: nounwind
	define internal fastcc i32 @repeated_dup_worklist(%struct.s1** %pp1, %struct.s2* %p2, i32 %state, i1 %i1_1, i32 %i32_1) unnamed_addr #0 {
	entry:
	br label %while.cond.outer

	; The loop gets laid out:
	; %while.cond.outer
	; %(null)
	; %(null)
	; %dup2
	; and then %dup1 gets chosen as the next block.
	; when dup2 is duplicated into dup1, %worklist could erroneously be placed on
	; the worklist, because all of its current predecessors are now scheduled.
	; However, after dup2 is tail-duplicated, %worklist can't be on the worklist
	; because it now has unscheduled predecessors.
	; CHECK-LABEL: repeated_dup_worklist
	; CHECK: // %entry
	; CHECK: // %while.cond.outer
	; first %(null) block
	; CHECK: // in Loop:
	; CHECK: ldr
	; CHECK-NEXT: tbnz
	; second %(null) block
	; CHECK: // in Loop:
	; CHECK: // %dup2
	; CHECK: // %worklist
	; CHECK: // %if.then96.i
	while.cond.outer: ; preds = %dup1, %entry
	%progress.0.ph = phi i32 [ 0, %entry ], [ %progress.1, %dup1 ]
	%inc77 = add nsw i32 %progress.0.ph, 1
	%cmp = icmp slt i32 %progress.0.ph, %i32_1
	br i1 %cmp, label %dup2, label %dup1

	dup2: ; preds = %if.then96.i, %worklist, %while.cond.outer
	%progress.1.ph = phi i32 [ 0, %while.cond.outer ], [ %progress.1, %if.then96.i ], [ %progress.1, %worklist ]
	%.pr = load %struct.s1*, %struct.s1** %pp1, align 8
	br label %dup1

	dup1: ; preds = %dup2, %while.cond.outer
	%0 = phi %struct.s1* [ %.pr, %dup2 ], [ undef, %while.cond.outer ]
	%progress.1 = phi i32 [ %progress.1.ph, %dup2 ], [ %inc77, %while.cond.outer ]
	br i1 %i1_1, label %while.cond.outer, label %worklist

	worklist: ; preds = %dup1
	%snode94 = getelementptr inbounds %struct.s1, %struct.s1* %0, i64 0, i32 0
	%1 = load %struct.s3*, %struct.s3** %snode94, align 8
	%2 = getelementptr inbounds %struct.s3, %struct.s3* %1, i32 0, i32 0
	%3 = load i32, i32* %2, align 4
	%tobool95.i = icmp eq i32 %3, 0
	br i1 %tobool95.i, label %if.then96.i, label %dup2

	if.then96.i: ; preds = %worklist
	call fastcc void @free_s3(%struct.s2* %p2, %struct.s3* %1) #1
	br label %dup2
	}

	; Function Attrs: nounwind
	declare fastcc void @free_s3(%struct.s2*, %struct.s3*) unnamed_addr #0

	attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a57" "target-features"="+crc,+crypto,+neon" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { nounwind }

test/CodeGen/AArch64/tbz-tbnz.ll

	; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s

	declare void @t()			declare void @t()

	define void @test1(i32 %a) {			define void @test1(i32 %a) {
	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	entry:			entry:
	%sub = add nsw i32 %a, -12			%sub = add nsw i32 %a, -12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:w[0-9]+]], w0, #12			; CHECK: sub [[CMP:w[0-9]+]], w0, #12
	; CHECK: tbz [[CMP]], #31			; CHECK: tbnz [[CMP]], #31

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test2(i64 %a) {			define void @test2(i64 %a) {
	; CHECK-LABEL: @test2			; CHECK-LABEL: @test2
	entry:			entry:
	%sub = add nsw i64 %a, -12			%sub = add nsw i64 %a, -12
	%cmp = icmp slt i64 %sub, 0			%cmp = icmp slt i64 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:x[0-9]+]], x0, #12			; CHECK: sub [[CMP:x[0-9]+]], x0, #12
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	[73 lines not shown]
	define void @test7(i32 %a) {			define void @test7(i32 %a) {
	; CHECK-LABEL: @test7			; CHECK-LABEL: @test7
	entry:			entry:
	%sub = sub nsw i32 %a, 12			%sub = sub nsw i32 %a, 12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	; CHECK: sub [[CMP:w[0-9]+]], w0, #12			; CHECK: sub [[CMP:w[0-9]+]], w0, #12
	; CHECK: tbz [[CMP]], #31			; CHECK: tbnz [[CMP]], #31

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	[43 lines not shown]
	}			}

	define void @test9(i64 %val1) {			define void @test9(i64 %val1) {
	; CHECK-LABEL: @test9			; CHECK-LABEL: @test9
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test10(i64 %val1) {			define void @test10(i64 %val1) {
	; CHECK-LABEL: @test10			; CHECK-LABEL: @test10
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test11(i64 %val1, i64* %ptr) {			define void @test11(i64 %val1, i64* %ptr) {
	; CHECK-LABEL: @test11			; CHECK-LABEL: @test11

	; CHECK: ldr [[CMP:x[0-9]+]], [x1]			; CHECK: ldr [[CMP:x[0-9]+]], [x1]
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	%val = load i64, i64* %ptr			%val = load i64, i64* %ptr
	%tst = icmp slt i64 %val, 0			%tst = icmp slt i64 %val, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test12(i64 %val1) {			define void @test12(i64 %val1) {
	; CHECK-LABEL: @test12			; CHECK-LABEL: @test12
	%tst = icmp slt i64 %val1, 0			%tst = icmp slt i64 %val1, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz x0, #63			; CHECK: tbnz x0, #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test13(i64 %val1, i64 %val2) {			define void @test13(i64 %val1, i64 %val2) {
	; CHECK-LABEL: @test13			; CHECK-LABEL: @test13
	%or = or i64 %val1, %val2			%or = or i64 %val1, %val2
	%tst = icmp slt i64 %or, 0			%tst = icmp slt i64 %or, 0
	br i1 %tst, label %if.then, label %if.end			br i1 %tst, label %if.then, label %if.end

	; CHECK: orr [[CMP:x[0-9]+]], x0, x1			; CHECK: orr [[CMP:x[0-9]+]], x0, x1
	; CHECK-NOT: cmp			; CHECK-NOT: cmp
	; CHECK: tbz [[CMP]], #63			; CHECK: tbnz [[CMP]], #63

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @test14(i1 %cond) {			define void @test14(i1 %cond) {
	; CHECK-LABEL: @test14			; CHECK-LABEL: @test14
	br i1 %cond, label %if.end, label %if.then			br i1 %cond, label %if.end, label %if.then

	; CHECK-NOT: and			; CHECK-NOT: and
	; CHECK: tbnz w0, #0			; CHECK: tbz w0, #0

	if.then:			if.then:
	call void @t()			call void @t()
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}
	[88 lines not shown]

test/CodeGen/AMDGPU/branch-relaxation.ll

	[329 lines not shown]
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: v_nop_e64			; GCN-NEXT: v_nop_e64
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND

	; GCN-NEXT: [[BB3]]: ; %bb3			; GCN-NEXT: [[BB3]]: ; %bb3
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: v_nop_e64
				; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @expand_requires_expand(i32 %cond0) #0 {			define void @expand_requires_expand(i32 %cond0) #0 {
	bb0:			bb0:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
	%cmp0 = icmp slt i32 %cond0, 0			%cmp0 = icmp slt i32 %cond0, 0
	br i1 %cmp0, label %bb2, label %bb1			br i1 %cmp0, label %bb2, label %bb1

	bb1:			bb1:
	%val = load volatile i32, i32 addrspace(2)* undef			%val = load volatile i32, i32 addrspace(2)* undef
	%cmp1 = icmp eq i32 %val, 3			%cmp1 = icmp eq i32 %val, 3
	br i1 %cmp1, label %bb3, label %bb2			br i1 %cmp1, label %bb3, label %bb2

	bb2:			bb2:
	call void asm sideeffect			call void asm sideeffect
	"v_nop_e64			"v_nop_e64
	v_nop_e64			v_nop_e64
	v_nop_e64			v_nop_e64
	v_nop_e64", ""() #0			v_nop_e64", ""() #0
	br label %bb3			br label %bb3

	bb3:			bb3:
				; These NOPs prevent tail-duplication-based outlining
				; from firing, which defeats the need to expand the branches and this test.
				call void asm sideeffect
				"v_nop_e64", ""() #0
				call void asm sideeffect
				"v_nop_e64", ""() #0
	ret void			ret void
	}			}

	; Requires expanding of required skip branch.			; Requires expanding of required skip branch.

	; GCN-LABEL: {{^}}uniform_inside_divergent:			; GCN-LABEL: {{^}}uniform_inside_divergent:
	; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}			; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
	; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc			; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
	Show All 13 Lines
	; GCN: s_cbranch_scc1 [[ENDIF]]			; GCN: s_cbranch_scc1 [[ENDIF]]

	; GCN-NEXT: ; BB#2: ; %if_uniform			; GCN-NEXT: ; BB#2: ; %if_uniform
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)

	; GCN-NEXT: [[ENDIF]]: ; %endif			; GCN-NEXT: [[ENDIF]]: ; %endif
	; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]			; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]
				; GCN-NEXT: s_sleep 5
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) #0 {			define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%d_cmp = icmp ult i32 %tid, 16			%d_cmp = icmp ult i32 %tid, 16
	br i1 %d_cmp, label %if, label %endif			br i1 %d_cmp, label %if, label %endif

	if:			if:
	store i32 0, i32 addrspace(1)* %out			store i32 0, i32 addrspace(1)* %out
	%u_cmp = icmp eq i32 %cond, 0			%u_cmp = icmp eq i32 %cond, 0
	br i1 %u_cmp, label %if_uniform, label %endif			br i1 %u_cmp, label %if_uniform, label %endif

	if_uniform:			if_uniform:
	store i32 1, i32 addrspace(1)* %out			store i32 1, i32 addrspace(1)* %out
	br label %endif			br label %endif

	endif:			endif:
				; layout can remove the split branch if it can copy the return block.
				; This call makes the return block long enough that it doesn't get copied.
				call void @llvm.amdgcn.s.sleep(i32 5);
	ret void			ret void
	}			}

	; si_mask_branch			; si_mask_branch
	; s_cbranch_execz			; s_cbranch_execz
	; s_branch			; s_branch

	; GCN-LABEL: {{^}}analyze_mask_branch:			; GCN-LABEL: {{^}}analyze_mask_branch:

test/CodeGen/AMDGPU/si-annotate-cf-noloop.ll

Show All 31 Lines	bb5: ; preds = %bb3, %bb1
unreachable		unreachable
}		}


; OPT-LABEL: @annotate_ret_noloop(		; OPT-LABEL: @annotate_ret_noloop(
; OPT-NOT: call i1 @llvm.amdgcn.loop		; OPT-NOT: call i1 @llvm.amdgcn.loop

; GCN-LABEL: {{^}}annotate_ret_noloop:		; GCN-LABEL: {{^}}annotate_ret_noloop:
; GCN: s_cbranch_scc1		; GCN: s_cbranch_scc0 [[BODY:BB[0-9]+_[0-9]+]]
		; GCN: s_endpgm

		; GCN: {{^}}[[BODY]]:
; GCN: s_endpgm		; GCN: s_endpgm
; GCN: .Lfunc_end1		; GCN: .Lfunc_end1
define void @annotate_ret_noloop(<4 x float> addrspace(1)* noalias nocapture readonly %arg) #0 {		define void @annotate_ret_noloop(<4 x float> addrspace(1)* noalias nocapture readonly %arg) #0 {
bb:		bb:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
br label %bb1		br label %bb1

bb1: ; preds = %bb		bb1: ; preds = %bb
Show All 22 Lines

test/CodeGen/AMDGPU/uniform-cfg.ll

ENDIF: ; preds = %IF, %main_body
ret void		ret void
}		}

; GCN-LABEL: {{^}}icmp_users_different_blocks:		; GCN-LABEL: {{^}}icmp_users_different_blocks:
; GCN: s_load_dword [[COND:s[0-9]+]]		; GCN: s_load_dword [[COND:s[0-9]+]]
; GCN: s_cmp_lt_i32 [[COND]], 1		; GCN: s_cmp_lt_i32 [[COND]], 1
; GCN: s_cbranch_scc1 [[EXIT:[A-Za-z0-9_]+]]		; GCN: s_cbranch_scc1 [[EXIT:[A-Za-z0-9_]+]]
; GCN: v_cmp_gt_i32_e64 vcc, [[COND]], 0{{$}}		; GCN: v_cmp_gt_i32_e64 vcc, [[COND]], 0{{$}}
; GCN: s_cbranch_vccnz [[EXIT]]		; GCN: s_cbranch_vccz [[BODY:[A-Za-z0-9_]+]]
; GCN: buffer_store
; GCN: {{^}}[[EXIT]]:		; GCN: {{^}}[[EXIT]]:
; GCN: s_endpgm		; GCN: s_endpgm
		; GCN: {{^}}[[BODY]]:
		; GCN: buffer_store
		; GCN: s_endpgm
define void @icmp_users_different_blocks(i32 %cond0, i32 %cond1, i32 addrspace(1)* %out) {		define void @icmp_users_different_blocks(i32 %cond0, i32 %cond1, i32 addrspace(1)* %out) {
bb:		bb:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
%cmp0 = icmp sgt i32 %cond0, 0		%cmp0 = icmp sgt i32 %cond0, 0
%cmp1 = icmp sgt i32 %cond1, 0		%cmp1 = icmp sgt i32 %cond1, 0
br i1 %cmp0, label %bb2, label %bb9		br i1 %cmp0, label %bb2, label %bb9

bb2: ; preds = %bb		bb2: ; preds = %bb
Show All 30 Lines
}		}

; Test uniform and divergent.		; Test uniform and divergent.

; GCN-LABEL: {{^}}uniform_inside_divergent:		; GCN-LABEL: {{^}}uniform_inside_divergent:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: s_cbranch_execz [[ENDIF_LABEL:[0-9_A-Za-z]+]]
; GCN: s_cmp_lg_u32 {{s[0-9]+}}, 0		; GCN: s_cmp_lg_u32 {{s[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[ENDIF_LABEL]]		; GCN: s_cbranch_scc0 [[IF_UNIFORM_LABEL:[A-Z0-9_a-z]+]]
		; GCN: s_endpgm
		; GCN: {{^}}[[IF_UNIFORM_LABEL]]:
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) {		define void @uniform_inside_divergent(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
%d_cmp = icmp ult i32 %tid, 16		%d_cmp = icmp ult i32 %tid, 16
br i1 %d_cmp, label %if, label %endif		br i1 %d_cmp, label %if, label %endif

if:		if:
store i32 0, i32 addrspace(1)* %out		store i32 0, i32 addrspace(1)* %out
%u_cmp = icmp eq i32 %cond, 0		%u_cmp = icmp eq i32 %cond, 0
br i1 %u_cmp, label %if_uniform, label %endif		br i1 %u_cmp, label %if_uniform, label %endif

if_uniform:		if_uniform:
store i32 1, i32 addrspace(1)* %out		store i32 1, i32 addrspace(1)* %out
br label %endif		br label %endif

endif:		endif:
ret void		ret void
}		}

; GCN-LABEL: {{^}}divergent_inside_uniform:		; GCN-LABEL: {{^}}divergent_inside_uniform:
; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0		; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[ENDIF_LABEL:[0-9_A-Za-z]+]]		; GCN: s_cbranch_scc0 [[IF_LABEL:[0-9_A-Za-z]+]]
		; GCN: [[IF_LABEL]]:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK1:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
; GCN: [[ENDIF_LABEL]]:
; GCN: s_endpgm
define void @divergent_inside_uniform(i32 addrspace(1)* %out, i32 %cond) {		define void @divergent_inside_uniform(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%u_cmp = icmp eq i32 %cond, 0		%u_cmp = icmp eq i32 %cond, 0
br i1 %u_cmp, label %if, label %endif		br i1 %u_cmp, label %if, label %endif

if:		if:
store i32 0, i32 addrspace(1)* %out		store i32 0, i32 addrspace(1)* %out
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
Show All 11 Lines
; GCN-LABEL: {{^}}divergent_if_uniform_if:		; GCN-LABEL: {{^}}divergent_if_uniform_if:
; GCN: v_cmp_eq_u32_e32 vcc, 0, v0		; GCN: v_cmp_eq_u32_e32 vcc, 0, v0
; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
; GCN: s_xor_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]		; GCN: s_xor_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], exec, [[MASK]]
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1		; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
; GCN: buffer_store_dword [[ONE]]		; GCN: buffer_store_dword [[ONE]]
; GCN: s_or_b64 exec, exec, [[MASK]]		; GCN: s_or_b64 exec, exec, [[MASK]]
; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0		; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 0
; GCN: s_cbranch_scc1 [[EXIT:[A-Z0-9_]+]]		; GCN: s_cbranch_scc0 [[IF_UNIFORM:[A-Z0-9_]+]]
		; GCN: s_endpgm
		; GCN: [[IF_UNIFORM]]:
; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2		; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2
; GCN: buffer_store_dword [[TWO]]		; GCN: buffer_store_dword [[TWO]]
; GCN: [[EXIT]]:
; GCN: s_endpgm
define void @divergent_if_uniform_if(i32 addrspace(1)* %out, i32 %cond) {		define void @divergent_if_uniform_if(i32 addrspace(1)* %out, i32 %cond) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() #0		%tid = call i32 @llvm.amdgcn.workitem.id.x() #0
%d_cmp = icmp eq i32 %tid, 0		%d_cmp = icmp eq i32 %tid, 0
br i1 %d_cmp, label %if, label %endif		br i1 %d_cmp, label %if, label %endif

if:		if:
store i32 1, i32 addrspace(1)* %out		store i32 1, i32 addrspace(1)* %out
Show All 14 Lines
; The condition of the branches in the two blocks are		; The condition of the branches in the two blocks are
; uniform. MachineCSE replaces the 2nd condition with the inverse of		; uniform. MachineCSE replaces the 2nd condition with the inverse of
; the first, leaving an scc use in a different block than it was		; the first, leaving an scc use in a different block than it was
; defed.		; defed.

; GCN-LABEL: {{^}}cse_uniform_condition_different_blocks:		; GCN-LABEL: {{^}}cse_uniform_condition_different_blocks:
; GCN: s_load_dword [[COND:s[0-9]+]]		; GCN: s_load_dword [[COND:s[0-9]+]]
; GCN: s_cmp_lt_i32 [[COND]], 1		; GCN: s_cmp_lt_i32 [[COND]], 1
; GCN: s_cbranch_scc1 BB[[FNNUM:[0-9]+]]_3		; GCN: s_cbranch_scc1 [[FN:BB[0-9_]+]]

; GCN: BB#1:		; GCN: BB#1:
; GCN-NOT: cmp		; GCN-NOT: cmp
; GCN: buffer_load_dword		; GCN: buffer_load_dword
; GCN: buffer_store_dword		; GCN: buffer_store_dword
; GCN: s_cbranch_scc1 BB[[FNNUM]]_3		; GCN: s_cbranch_scc0 [[BB7:BB[0-9_]+]]

; GCN: BB[[FNNUM]]_3:		; GCN: [[FN]]:
; GCN: s_endpgm		; GCN: s_endpgm

		; GCN: [[BB7]]:
		; GCN: s_endpgm

define void @cse_uniform_condition_different_blocks(i32 %cond, i32 addrspace(1)* %out) {		define void @cse_uniform_condition_different_blocks(i32 %cond, i32 addrspace(1)* %out) {
bb:		bb:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #0
%tmp1 = icmp sgt i32 %cond, 0		%tmp1 = icmp sgt i32 %cond, 0
br i1 %tmp1, label %bb2, label %bb9		br i1 %tmp1, label %bb2, label %bb9

bb2: ; preds = %bb		bb2: ; preds = %bb
%tmp3 = load volatile i32, i32 addrspace(1)* undef		%tmp3 = load volatile i32, i32 addrspace(1)* undef

test/CodeGen/ARM/arm-and-tst-peephole.ll

	; V8-LABEL: %tailrecurse.switch			; V8-LABEL: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: bne			; V8-NEXT: beq
	; V8-NEXT: b			; V8-NEXT: %sw.epilog
	; The trailing space in the last line checks that the branch is unconditional			; V8-NEXT: bx lr
	switch i32 %and, label %sw.epilog [			switch i32 %and, label %sw.epilog [
	i32 1, label %sw.bb			i32 1, label %sw.bb
	i32 3, label %sw.bb6			i32 3, label %sw.bb6
	i32 2, label %sw.bb8			i32 2, label %sw.bb8
	], !prof !1			], !prof !1

	sw.bb: ; preds = %tailrecurse.switch, %tailrecurse			sw.bb: ; preds = %tailrecurse.switch, %tailrecurse
	%shl = shl i32 %acc.tr, 1			%shl = shl i32 %acc.tr, 1

test/CodeGen/ARM/atomic-op.ll

	; CHECK-NOT: dmb ish			; CHECK-NOT: dmb ish
	; CHECK: [[LOOP_BB:\.?LBB[0-9]+_1]]:			; CHECK: [[LOOP_BB:\.?LBB[0-9]+_1]]:
	; CHECK: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]			; CHECK: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
	; CHECK: cmp [[OLDVAL]], r1			; CHECK: cmp [[OLDVAL]], r1
	; CHECK: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]			; CHECK: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]
	; CHECK: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]			; CHECK: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]
	; CHECK: cmp [[SUCCESS]], #0			; CHECK: cmp [[SUCCESS]], #0
	; CHECK: bne [[LOOP_BB]]			; CHECK: bne [[LOOP_BB]]
	; CHECK: b [[END_BB:\.?LBB[0-9]+_[0-9]+]]			; CHECK: dmb ish
				; CHECK: bx lr
	; CHECK: [[FAIL_BB]]:			; CHECK: [[FAIL_BB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[END_BB]]:
	; CHECK: dmb ish			; CHECK: dmb ish
	; CHECK: bx lr			; CHECK: bx lr

	ret i32 %oldval			ret i32 %oldval
	}			}

	define i32 @load_load_add_acquire(i32* %mem1, i32* %mem2) nounwind {			define i32 @load_load_add_acquire(i32* %mem1, i32* %mem2) nounwind {
	; CHECK-LABEL: load_load_add_acquire			; CHECK-LABEL: load_load_add_acquire

test/CodeGen/ARM/atomic-ops-v8.ll

	; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0			; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldaexb r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldaexb r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-ARM-NEXT: cmp r[[OLD]], r0			; CHECK-ARM-NEXT: cmp r[[OLD]], r0
	; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]			; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: strexb [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: strexb [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK-ARM: mov r0, r[[OLD]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: mov r0, r[[OLD]]			; CHECK-ARM: mov r0, r[[OLD]]
				; CHECK-ARM-NEXT: bx lr
	ret i8 %old			ret i8 %old
	}			}

	define i16 @test_atomic_cmpxchg_i16(i16 zeroext %wanted, i16 zeroext %new) nounwind {			define i16 @test_atomic_cmpxchg_i16(i16 zeroext %wanted, i16 zeroext %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i16:			; CHECK-LABEL: test_atomic_cmpxchg_i16:
	%pair = cmpxchg i16* @var16, i16 %wanted, i16 %new seq_cst seq_cst			%pair = cmpxchg i16* @var16, i16 %wanted, i16 %new seq_cst seq_cst
	%old = extractvalue { i16, i1 } %pair, 0			%old = extractvalue { i16, i1 } %pair, 0
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK-DAG: movw r[[ADDR:[0-9]+]], :lower16:var16			; CHECK-DAG: movw r[[ADDR:[0-9]+]], :lower16:var16
	; CHECK-DAG: movt r[[ADDR]], :upper16:var16			; CHECK-DAG: movt r[[ADDR]], :upper16:var16
	; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0			; CHECK-THUMB-DAG: mov r[[WANTED:[0-9]+]], r0

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldaexh r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldaexh r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-ARM-NEXT: cmp r[[OLD]], r0			; CHECK-ARM-NEXT: cmp r[[OLD]], r0
	; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]			; CHECK-THUMB-NEXT: cmp r[[OLD]], r[[WANTED]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: stlexh [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: stlexh [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK-ARM: mov r0, r[[OLD]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: mov r0, r[[OLD]]			; CHECK-ARM: mov r0, r[[OLD]]
				; CHECK-ARM-NEXT: bx lr
	ret i16 %old			ret i16 %old
	}			}

	define void @test_atomic_cmpxchg_i32(i32 %wanted, i32 %new) nounwind {			define void @test_atomic_cmpxchg_i32(i32 %wanted, i32 %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i32:			; CHECK-LABEL: test_atomic_cmpxchg_i32:
	%pair = cmpxchg i32* @var32, i32 %wanted, i32 %new release monotonic			%pair = cmpxchg i32* @var32, i32 %wanted, i32 %new release monotonic
	%old = extractvalue { i32, i1 } %pair, 0			%old = extractvalue { i32, i1 } %pair, 0
	store i32 %old, i32* @var32			store i32 %old, i32* @var32
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var32			; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var32
	; CHECK: movt r[[ADDR]], :upper16:var32			; CHECK: movt r[[ADDR]], :upper16:var32

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldrex r[[OLD:[0-9]+]], [r[[ADDR]]]			; CHECK: ldrex r[[OLD:[0-9]+]], [r[[ADDR]]]
	; r0 below is a reasonable guess but could change: it certainly comes into the			; r0 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-NEXT: cmp r[[OLD]], r0			; CHECK-NEXT: cmp r[[OLD]], r0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r1 is a reasonable guess.			; As above, r1 is a reasonable guess.
	; CHECK: stlex [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]			; CHECK: stlex [[STATUS:r[0-9]+]], r1, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK: str{{(.w)?}} r[[OLD]],
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK-NEXT: bx lr
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK: str{{(.w)?}} r[[OLD]],			; CHECK: str{{(.w)?}} r[[OLD]],
				; CHECK-ARM-NEXT: bx lr
	ret void			ret void
	}			}

	define void @test_atomic_cmpxchg_i64(i64 %wanted, i64 %new) nounwind {			define void @test_atomic_cmpxchg_i64(i64 %wanted, i64 %new) nounwind {
	; CHECK-LABEL: test_atomic_cmpxchg_i64:			; CHECK-LABEL: test_atomic_cmpxchg_i64:
	%pair = cmpxchg i64* @var64, i64 %wanted, i64 %new monotonic monotonic			%pair = cmpxchg i64* @var64, i64 %wanted, i64 %new monotonic monotonic
	%old = extractvalue { i64, i1 } %pair, 0			%old = extractvalue { i64, i1 } %pair, 0
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr
	; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var64			; CHECK: movw r[[ADDR:[0-9]+]], :lower16:var64
	; CHECK: movt r[[ADDR]], :upper16:var64			; CHECK: movt r[[ADDR]], :upper16:var64

	; CHECK: .LBB{{[0-9]+}}_1:			; CHECK: .LBB{{[0-9]+}}_1:
	; CHECK: ldrexd [[OLD1:r[0-9]+|lr]], [[OLD2:r[0-9]+|lr]], [r[[ADDR]]]			; CHECK: ldrexd [[OLD1:r[0-9]+|lr]], [[OLD2:r[0-9]+|lr]], [r[[ADDR]]]
	; r0, r1 below is a reasonable guess but could change: it certainly comes into the			; r0, r1 below is a reasonable guess but could change: it certainly comes into the
	; function there.			; function there.
	; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+|lr]], [[OLD1]], r0			; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+|lr]], [[OLD1]], r0
	; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+|lr]], [[OLD2]], r1			; CHECK-LE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+|lr]], [[OLD2]], r1
	; CHECK-ARM-LE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_LO]], [[MISMATCH_HI]]			; CHECK-ARM-LE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_LO]], [[MISMATCH_HI]]
	; CHECK-THUMB-LE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_HI]], [[MISMATCH_LO]]			; CHECK-THUMB-LE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_HI]], [[MISMATCH_LO]]
	; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+|lr]], [[OLD2]], r1			; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_HI:r[0-9]+|lr]], [[OLD2]], r1
	; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+|lr]], [[OLD1]], r0			; CHECK-BE-DAG: eor{{(\.w)?}} [[MISMATCH_LO:r[0-9]+|lr]], [[OLD1]], r0
	; CHECK-ARM-BE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_HI]], [[MISMATCH_LO]]			; CHECK-ARM-BE: orrs{{(\.w)?}} {{r[0-9]+}}, [[MISMATCH_HI]], [[MISMATCH_LO]]
	; CHECK-THUMB-BE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_LO]], [[MISMATCH_HI]]			; CHECK-THUMB-BE: orrs{{(\.w)?}} {{(r[0-9]+, )?}}[[MISMATCH_LO]], [[MISMATCH_HI]]
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_3			; CHECK-NEXT: bne .LBB{{[0-9]+}}_4
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; As above, r2, r3 is a reasonable guess.			; As above, r2, r3 is a reasonable guess.
	; CHECK: strexd [[STATUS:r[0-9]+]], r2, r3, [r[[ADDR]]]			; CHECK: strexd [[STATUS:r[0-9]+]], r2, r3, [r[[ADDR]]]
	; CHECK-NEXT: cmp [[STATUS]], #0			; CHECK-NEXT: cmp [[STATUS]], #0
	; CHECK-NEXT: bne .LBB{{[0-9]+}}_1			; CHECK-NEXT: bne .LBB{{[0-9]+}}_1
	; CHECK-NEXT: b .LBB{{[0-9]+}}_4			; CHECK: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]
	; CHECK-NEXT: .LBB{{[0-9]+}}_3:			; CHECK-NEXT: pop
	; CHECK-NEXT: clrex
	; CHECK-NEXT: .LBB{{[0-9]+}}_4:			; CHECK-NEXT: .LBB{{[0-9]+}}_4:
				; CHECK-NEXT: clrex
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK-NOT: mcr			; CHECK-NOT: mcr

	; CHECK-ARM: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]			; CHECK-ARM: strd [[OLD1]], [[OLD2]], [r[[ADDR]]]
	store i64 %old, i64* @var64			store i64 %old, i64* @var64
	ret void			ret void
	}			}


test/CodeGen/ARM/cmpxchg-weak.ll

	; RUN: llc < %s -mtriple=armv7-apple-ios -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -mtriple=armv7-apple-ios -verify-machineinstrs | FileCheck %s

	define void @test_cmpxchg_weak(i32 *%addr, i32 %desired, i32 %new) {			define void @test_cmpxchg_weak(i32 *%addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: test_cmpxchg_weak:			; CHECK-LABEL: test_cmpxchg_weak:

	%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic			%pair = cmpxchg weak i32* %addr, i32 %desired, i32 %new seq_cst monotonic
	%oldval = extractvalue { i32, i1 } %pair, 0			%oldval = extractvalue { i32, i1 } %pair, 0
	; CHECK-NEXT: BB#0:			; CHECK-NEXT: BB#0:
	; CHECK-NEXT: ldrex [[LOADED:r[0-9]+]], [r0]			; CHECK-NEXT: ldrex [[LOADED:r[0-9]+]], [r0]
	; CHECK-NEXT: cmp [[LOADED]], r1			; CHECK-NEXT: cmp [[LOADED]], r1
	; CHECK-NEXT: bne [[LDFAILBB:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: bne [[LDFAILBB:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: BB#1:			; CHECK-NEXT: BB#1:
	; CHECK-NEXT: dmb ish			; CHECK-NEXT: dmb ish
	; CHECK-NEXT: strex [[SUCCESS:r[0-9]+]], r2, [r0]			; CHECK-NEXT: strex [[SUCCESS:r[0-9]+]], r2, [r0]
	; CHECK-NEXT: cmp [[SUCCESS]], #0			; CHECK-NEXT: cmp [[SUCCESS]], #0
	; CHECK-NEXT: bne [[FAILBB:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: beq [[SUCCESSBB:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: BB#2:			; CHECK-NEXT: BB#2:
	; CHECK-NEXT: dmb ish
	; CHECK-NEXT: str r3, [r0]			; CHECK-NEXT: str r3, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	; CHECK-NEXT: [[LDFAILBB]]:			; CHECK-NEXT: [[LDFAILBB]]:
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: [[FAILBB]]:			; CHECK-NEXT: str r3, [r0]
				; CHECK-NEXT: bx lr
				; CHECK-NEXT: [[SUCCESSBB]]:
				; CHECK-NEXT: dmb ish
	; CHECK-NEXT: str r3, [r0]			; CHECK-NEXT: str r3, [r0]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr

	store i32 %oldval, i32* %addr			store i32 %oldval, i32* %addr
	ret void			ret void
	}			}


	Show All 26 Lines

test/CodeGen/ARM/machine-cse-cmp.ll

	Show First 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
	declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind			declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind

	; rdar://12462006			; rdar://12462006
	define i8* @f3(i8* %base, i32* nocapture %offset, i32 %size) nounwind {			define i8* @f3(i8* %base, i32* nocapture %offset, i32 %size) nounwind {
	entry:			entry:
	; CHECK-LABEL: f3:			; CHECK-LABEL: f3:
	; CHECK-NOT: sub			; CHECK-NOT: sub
	; CHECK: cmp			; CHECK: cmp
	; CHECK: blt			; CHECK: bge
	%0 = load i32, i32* %offset, align 4			%0 = load i32, i32* %offset, align 4
	%cmp = icmp slt i32 %0, %size			%cmp = icmp slt i32 %0, %size
	%s = sub nsw i32 %0, %size			%s = sub nsw i32 %0, %size
	%size2 = sub nsw i32 %size, 0			%size2 = sub nsw i32 %size, 0
	br i1 %cmp, label %return, label %if.end			br i1 %cmp, label %return, label %if.end

	if.end:			if.end:
	; We are checking cse between %sub here and %s in entry block.			; We are checking cse between %sub here and %s in entry block.
	Show All 16 Lines

test/CodeGen/Mips/brconeq.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16

	@i = global i32 5, align 4			@i = global i32 5, align 4
	@j = global i32 10, align 4			@j = global i32 10, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @i, align 4			%0 = load i32, i32* @i, align 4
	%1 = load i32, i32* @j, align 4			%1 = load i32, i32* @j, align 4
	%cmp = icmp eq i32 %0, %1			%cmp = icmp ne i32 %0, %1
	; 16: cmp ${{[0-9]+}}, ${{[0-9]+}}			; 16: cmp ${{[0-9]+}}, ${{[0-9]+}}
	; 16: bteqz $[[LABEL:[0-9A-Ba-b_]+]]			; 16: bteqz $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:
	br i1 %cmp, label %if.end, label %if.then			br i1 %cmp, label %if.then, label %if.end

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %entry, %if.then			if.end: ; preds = %entry, %if.then
	ret void			ret void
	}			}
	Show All 15 Lines

test/CodeGen/Mips/brconeqk.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16

	@i = global i32 5, align 4			@i = global i32 5, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @i, align 4			%0 = load i32, i32* @i, align 4
	%cmp = icmp eq i32 %0, 10			%cmp = icmp ne i32 %0, 10
	br i1 %cmp, label %if.end, label %if.then			br i1 %cmp, label %if.then, label %if.end
	; 16: cmpi ${{[0-9]+}}, {{[0-9]+}}			; 16: cmpi ${{[0-9]+}}, {{[0-9]+}}
	; 16: bteqz $[[LABEL:[0-9A-Ba-b_]+]]			; 16: bteqz $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:
	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %entry, %if.then			if.end: ; preds = %entry, %if.then
	ret void			ret void
	}			}

test/CodeGen/Mips/brcongt.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16

	@i = global i32 5, align 4			@i = global i32 5, align 4
	@j = global i32 10, align 4			@j = global i32 10, align 4
	@k = global i32 5, align 4			@k = global i32 5, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @i, align 4			%0 = load i32, i32* @i, align 4
	%1 = load i32, i32* @j, align 4			%1 = load i32, i32* @j, align 4
	%cmp = icmp sgt i32 %0, %1			%cmp = icmp sle i32 %0, %1
	br i1 %cmp, label %if.end, label %if.then			br i1 %cmp, label %if.then, label %if.end
	; 16: slt ${{[0-9]+}}, ${{[0-9]+}}			; 16: slt ${{[0-9]+}}, ${{[0-9]+}}
	; 16: btnez $[[LABEL:[0-9A-Ba-b_]+]]			; 16: btnez $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:
	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %entry, %if.then			if.end: ; preds = %entry, %if.then
	ret void			ret void
	}			}

test/CodeGen/Mips/brconlt.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16
	; RUN: llc -march=mips -mattr=micromips -mcpu=mips32r6 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=MM32R6			; RUN: llc -march=mips -mattr=micromips -mcpu=mips32r6 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=MM32R6

	@i = global i32 5, align 4			@i = global i32 5, align 4
	@j = global i32 10, align 4			@j = global i32 10, align 4
	@k = global i32 5, align 4			@k = global i32 5, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @j, align 4			%0 = load i32, i32* @j, align 4
	%1 = load i32, i32* @i, align 4			%1 = load i32, i32* @i, align 4
	%cmp = icmp slt i32 %0, %1			%cmp = icmp sge i32 %0, %1
	br i1 %cmp, label %if.end, label %if.then			br i1 %cmp, label %if.then, label %if.end

	; 16: slt ${{[0-9]+}}, ${{[0-9]+}}			; 16: slt ${{[0-9]+}}, ${{[0-9]+}}
	; MM32R6: slt ${{[0-9]+}}, ${{[0-9]+}}			; MM32R6: slt ${{[0-9]+}}, ${{[0-9]+}}
	; 16: btnez $[[LABEL:[0-9A-Ba-b_]+]]			; 16: btnez $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %entry, %if.then			if.end: ; preds = %entry, %if.then
	ret void			ret void
	}			}

test/CodeGen/Mips/brconnez.ll

	; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16			; RUN: llc -march=mipsel -mattr=mips16 -relocation-model=pic -O3 < %s | FileCheck %s -check-prefix=16

	@j = global i32 0, align 4			@j = global i32 0, align 4
	@result = global i32 0, align 4			@result = global i32 0, align 4

	define void @test() nounwind {			define void @test() nounwind {
	entry:			entry:
	%0 = load i32, i32* @j, align 4			%0 = load i32, i32* @j, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end, !prof !1

	; 16: bnez ${{[0-9]+}}, $[[LABEL:[0-9A-Ba-b_]+]]			; 16: bnez ${{[0-9]+}}, $[[LABEL:[0-9A-Ba-b_]+]]
	; 16: lw ${{[0-9]+}}, %got(result)(${{[0-9]+}})			; 16: lw ${{[0-9]+}}, %got(result)(${{[0-9]+}})
	; 16: $[[LABEL]]:			; 16: $[[LABEL]]:

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 1, i32* @result, align 4			store i32 1, i32* @result, align 4
	br label %if.end			br label %if.end

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	ret void			ret void
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/Mips/llvm-ir/ashr.ll

[85 lines not shown]	; ALL-LABEL: ashr_i64:
; M2: bnez $[[T1]], $[[BB0:BB[0-9_]+]]		; M2: bnez $[[T1]], $[[BB0:BB[0-9_]+]]
; M2: move $3, $[[T0]]		; M2: move $3, $[[T0]]
; M2: srlv $[[T2:[0-9]+]], $5, $7		; M2: srlv $[[T2:[0-9]+]], $5, $7
; M2: not $[[T3:[0-9]+]], $7		; M2: not $[[T3:[0-9]+]], $7
; M2: sll $[[T4:[0-9]+]], $4, 1		; M2: sll $[[T4:[0-9]+]], $4, 1
; M2: sllv $[[T5:[0-9]+]], $[[T4]], $[[T3]]		; M2: sllv $[[T5:[0-9]+]], $[[T4]], $[[T3]]
; M2: or $3, $[[T3]], $[[T2]]		; M2: or $3, $[[T3]], $[[T2]]
; M2: $[[BB0]]:		; M2: $[[BB0]]:
; M2: beqz $[[T1]], $[[BB1:BB[0-9_]+]]		; M2: bnez $[[T1]], $[[BB1:BB[0-9_]+]]
; M2: nop		; M2: nop
; M2: sra $2, $4, 31
; M2: $[[BB1]]:
; M2: jr $ra		; M2: jr $ra
; M2: nop		; M2: nop
		; M2: $[[BB1]]:
		; M2: jr $ra
		; M2: sra $2, $4, 31

; 32R1-R5: srlv $[[T0:[0-9]+]], $5, $7		; 32R1-R5: srlv $[[T0:[0-9]+]], $5, $7
; 32R1-R5: not $[[T1:[0-9]+]], $7		; 32R1-R5: not $[[T1:[0-9]+]], $7
; 32R1-R5: sll $[[T2:[0-9]+]], $4, 1		; 32R1-R5: sll $[[T2:[0-9]+]], $4, 1
; 32R1-R5: sllv $[[T3:[0-9]+]], $[[T2]], $[[T1]]		; 32R1-R5: sllv $[[T3:[0-9]+]], $[[T2]], $[[T1]]
; 32R1-R5: or $3, $[[T3]], $[[T0]]		; 32R1-R5: or $3, $[[T3]], $[[T0]]
; 32R1-R5: srav $[[T4:[0-9]+]], $4, $7		; 32R1-R5: srav $[[T4:[0-9]+]], $4, $7
; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32		; 32R1-R5: andi $[[T5:[0-9]+]], $7, 32
[64 lines not shown]	; ALL-LABEL: ashr_i128:
; M3: bnez $[[T3:[0-9]+]], [[BB0:.LBB[0-9_]+]]		; M3: bnez $[[T3:[0-9]+]], [[BB0:.LBB[0-9_]+]]
; M3: move $3, $[[T1]]		; M3: move $3, $[[T1]]
; M3: dsrlv $[[T4:[0-9]+]], $5, $7		; M3: dsrlv $[[T4:[0-9]+]], $5, $7
; M3: dsll $[[T5:[0-9]+]], $4, 1		; M3: dsll $[[T5:[0-9]+]], $4, 1
; M3: not $[[T6:[0-9]+]], $[[T0]]		; M3: not $[[T6:[0-9]+]], $[[T0]]
; M3: dsllv $[[T7:[0-9]+]], $[[T5]], $[[T6]]		; M3: dsllv $[[T7:[0-9]+]], $[[T5]], $[[T6]]
; M3: or $3, $[[T7]], $[[T4]]		; M3: or $3, $[[T7]], $[[T4]]
; M3: [[BB0]]:		; M3: [[BB0]]:
; M3: beqz $[[T3]], [[BB1:.LBB[0-9_]+]]		; M3: bnez $[[T3]], [[BB1:.LBB[0-9_]+]]
; M3: nop		; M3: nop
; M3: dsra $2, $4, 63
; M3: [[BB1]]:
; M3: jr $ra		; M3: jr $ra
; M3: nop		; M3: nop
		; M3: [[BB1]]:
		; M3: jr $ra
		; M3: dsra $2, $4, 63

; GP64-NOT-R6: dsrlv $[[T0:[0-9]+]], $5, $7		; GP64-NOT-R6: dsrlv $[[T0:[0-9]+]], $5, $7
; GP64-NOT-R6: dsll $[[T1:[0-9]+]], $4, 1		; GP64-NOT-R6: dsll $[[T1:[0-9]+]], $4, 1
; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0		; GP64-NOT-R6: sll $[[T2:[0-9]+]], $7, 0
; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]		; GP64-NOT-R6: not $[[T3:[0-9]+]], $[[T2]]
; GP64-NOT-R6: dsllv $[[T4:[0-9]+]], $[[T1]], $[[T3]]		; GP64-NOT-R6: dsllv $[[T4:[0-9]+]], $[[T1]], $[[T3]]
; GP64-NOT-R6: or $3, $[[T4]], $[[T0]]		; GP64-NOT-R6: or $3, $[[T4]], $[[T0]]
; GP64-NOT-R6: dsrav $2, $4, $7		; GP64-NOT-R6: dsrav $2, $4, $7
[27 lines not shown]

test/CodeGen/Mips/micromips-compact-branches.ll

	; RUN: llc %s -march=mipsel -mattr=micromips -filetype=asm -O3 \			; RUN: llc %s -march=mipsel -mattr=micromips -filetype=asm -O3 \
	; RUN: -disable-mips-delay-filler -relocation-model=pic -o - | FileCheck %s			; RUN: -disable-mips-delay-filler -relocation-model=pic -o - | FileCheck %s

	define void @main() nounwind uwtable {			define void @main() nounwind uwtable {
	entry:			entry:
	%x = alloca i32, align 4			%x = alloca i32, align 4
	%0 = load i32, i32* %x, align 4			%0 = load i32, i32* %x, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end, !prof !1

	if.then:			if.then:
	store i32 10, i32* %x, align 4			store i32 10, i32* %x, align 4
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	; CHECK: bnezc			; CHECK: bnezc
				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/PowerPC/misched-inorder-latency.ll

	[11 lines not shown]
	; CHECK: addi			; CHECK: addi
	; CHECK: bne			; CHECK: bne
	; CHECK: %true			; CHECK: %true
	define i32 @testload(i32 *%ptr, i32 %sumin) {			define i32 @testload(i32 *%ptr, i32 %sumin) {
	entry:			entry:
	%sum1 = add i32 %sumin, 1			%sum1 = add i32 %sumin, 1
	%val1 = load i32, i32* %ptr			%val1 = load i32, i32* %ptr
	%p = icmp eq i32 %sumin, 0			%p = icmp eq i32 %sumin, 0
	br i1 %p, label %true, label %end			br i1 %p, label %true, label %end, !prof !1
	true:			true:
	%sum2 = add i32 %sum1, 1			%sum2 = add i32 %sum1, 1
	%ptr2 = getelementptr i32, i32* %ptr, i32 1			%ptr2 = getelementptr i32, i32* %ptr, i32 1
	%val = load i32, i32* %ptr2			%val = load i32, i32* %ptr2
	%val2 = add i32 %val1, %val			%val2 = add i32 %val1, %val
	br label %end			br label %end
	end:			end:
	%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]			%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]
	[19 lines not shown]
	true:			true:
	%val2 = add i32 %val1, 1			%val2 = add i32 %val1, 1
	br label %end			br label %end
	end:			end:
	%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]			%valmerge = phi i32 [ %val1, %entry], [ %val2, %true ]
	ret i32 %valmerge			ret i32 %valmerge
	}			}
	declare void @llvm.prefetch(i8*, i32, i32, i32) nounwind			declare void @llvm.prefetch(i8*, i32, i32, i32) nounwind

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/PowerPC/tail-dup-break-cfg.ll

This file was added.

				; RUN: llc -O2 -o - %s | FileCheck %s
				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-grtev4-linux-gnu"

				; Intended layout:
				; The code for tail-duplication during layout will produce the layout:
				; test1
				; test2
				; body1 (with copy of test2)
				; body2
				; exit

				;CHECK-LABEL: tail_dup_break_cfg:
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 12, 1, [[BODY1LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: b [[BODY2LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: [[BODY1LABEL]]
				;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL]]
				;CHECK-NEXT: [[BODY2LABEL]]
				;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				define void @tail_dup_break_cfg(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %body1, !prof !1 ; %test2 more likely
				body1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp eq i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %exit, label %body2, !prof !1 ; %exit more likely
				body2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %exit
				exit:
				ret void
				}

				; The branch weights here hint that we shouldn't tail duplicate in this case.
				;CHECK-LABEL: tail_dup_dont_break_cfg:
				;CHECK: mr [[TAGREG:[0-9]+]], 3
				;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
				;CHECK-NEXT: bc 4, 1, [[TEST2LABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %body1
				;CHECK: [[TEST2LABEL]]: # %test2
				;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
				;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
				;CHECK-NEXT: # %body2
				;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
				;CHECK: blr
				define void @tail_dup_dont_break_cfg(i32 %tag) {
				entry:
				br label %test1
				test1:
				%tagbit1 = and i32 %tag, 1
				%tagbit1eq0 = icmp eq i32 %tagbit1, 0
				br i1 %tagbit1eq0, label %test2, label %body1, !prof !1 ; %test2 more likely
				body1:
				call void @a()
				call void @a()
				call void @a()
				call void @a()
				br label %test2
				test2:
				%tagbit2 = and i32 %tag, 2
				%tagbit2eq0 = icmp ne i32 %tagbit2, 0
				br i1 %tagbit2eq0, label %body2, label %exit, !prof !1 ; %body2 more likely
				body2:
				call void @b()
				call void @b()
				call void @b()
				call void @b()
				br label %exit
				exit:
				ret void
				}
				declare void @a()
				declare void @b()
				declare void @c()
				declare void @d()

				!1 = !{!"branch_weights", i32 5, i32 3}
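
As a reading aid for the `!prof` annotations in this test: `branch_weights` metadata lists relative weights for a branch's successors in order, and each successor's probability is its weight divided by the sum of the weights. The sketch below is illustrative arithmetic only, not code from this patch, and the helper name `branch_probabilities` is made up for the example.

```python
from fractions import Fraction


def branch_probabilities(weights):
    """Turn relative branch weights into per-successor probabilities."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return [Fraction(w, total) for w in weights]


# !{!"branch_weights", i32 5, i32 3}: the first successor gets 5/8, the second 3/8.
first, second = branch_probabilities([5, 3])
print(first, second)
```

With the 5:3 weights used here, the successor marked "more likely" in the IR comments is taken with probability 5/8, so the bias is real but modest.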

test/CodeGen/PowerPC/tail-dup-layout.ll

[13 lines not shown]
; optional3		; optional3
; optional4		; optional4
; Tail duplication puts test n+1 at the end of optional n		; Tail duplication puts test n+1 at the end of optional n
; so optional1 includes a copy of test2 at the end, and branches		; so optional1 includes a copy of test2 at the end, and branches
; to test3 (at the top) or falls through to optional 2.		; to test3 (at the top) or falls through to optional 2.
; The CHECK statements check for the whole string of tests and exit block,		; The CHECK statements check for the whole string of tests and exit block,
; and then check that the correct test has been duplicated into the end of		; and then check that the correct test has been duplicated into the end of
; the optional blocks and that the optional blocks are in the correct order.		; the optional blocks and that the optional blocks are in the correct order.
;CHECK-LABEL: f:		;CHECK-LABEL: straight_test:
; test1 may have been merged with entry		; test1 may have been merged with entry
;CHECK: mr [[TAGREG:[0-9]+]], 3		;CHECK: mr [[TAGREG:[0-9]+]], 3
;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1		;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
;CHECK-NEXT: bc 12, 1, [[OPT1LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST2LABEL:[._0-9A-Za-z]+]]: # %test2		;CHECK-NEXT: # %test2
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
;CHECK-NEXT: bne 0, [[OPT2LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3		;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
;CHECK-NEXT: bne 0, .[[OPT3LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST4LABEL:[._0-9A-Za-z]+]]: # %test4		;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
;CHECK-NEXT: bne 0, .[[OPT4LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT4LABEL:[_0-9A-Za-z]+]]
;CHECK-NEXT: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit		;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit
;CHECK: blr		;CHECK: blr
;CHECK-NEXT: [[OPT1LABEL]]		;CHECK-NEXT: .[[OPT1LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
;CHECK-NEXT: beq 0, [[TEST3LABEL]]		;CHECK-NEXT: beq 0, .[[TEST3LABEL]]
;CHECK-NEXT: [[OPT2LABEL]]		;CHECK-NEXT: .[[OPT2LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
;CHECK-NEXT: beq 0, [[TEST4LABEL]]		;CHECK-NEXT: beq 0, .[[TEST4LABEL]]
;CHECK-NEXT: [[OPT3LABEL]]		;CHECK-NEXT: .[[OPT3LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
;CHECK-NEXT: beq 0, [[EXITLABEL]]		;CHECK-NEXT: beq 0, .[[EXITLABEL]]
;CHECK-NEXT: [[OPT4LABEL]]		;CHECK-NEXT: .[[OPT4LABEL]]
;CHECK: b [[EXITLABEL]]		;CHECK: b .[[EXITLABEL]]

define void @f(i32 %tag) {		define void @straight_test(i32 %tag) {
entry:		entry:
br label %test1		br label %test1
test1:		test1:
%tagbit1 = and i32 %tag, 1		%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1		br i1 %tagbit1eq0, label %test2, label %optional1
optional1:		optional1:
call void @a()		call void @a()
[30 lines not shown]	optional4:
call void @d()		call void @d()
call void @d()		call void @d()
call void @d()		call void @d()
br label %exit		br label %exit
exit:		exit:
ret void		ret void
}		}

		; The block then2 is not unavoidable, but since it can be tail-duplicated, it
		; should be placed as a fallthrough from test2 and copied.
		; CHECK-LABEL: avoidable_test:
		; CHECK: # %entry
		; CHECK: andi.
		; CHECK: # %test2
		; Make sure then2 falls through from test2
		; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
		; CHECK: # %then2
		; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
		; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
		; CHECK: # %end2
		; CHECK: # %else1
		; CHECK: bl a
		; CHECK: bl a
		; Make sure then2 was copied into else1
		; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
		; CHECK: # %else2
		; CHECK: bl c
		define void @avoidable_test(i32 %tag) {
		entry:
		br label %test1
		test1:
		%tagbit1 = and i32 %tag, 1
		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
		br i1 %tagbit1eq0, label %test2, label %else1, !prof !1 ; %test2 more likely
		else1:
		call void @a()
		call void @a()
		br label %then2
		test2:
		%tagbit2 = and i32 %tag, 2
		%tagbit2eq0 = icmp eq i32 %tagbit2, 0
		br i1 %tagbit2eq0, label %then2, label %else2, !prof !1 ; %then2 more likely
		then2:
		%tagbit3 = and i32 %tag, 4
		%tagbit3eq0 = icmp eq i32 %tagbit3, 0
		br i1 %tagbit3eq0, label %end2, label %end1, !prof !1 ; %end2 more likely
		else2:
		call void @c()
		br label %end2
		end2:
		ret void
		end1:
		call void @d()
		ret void
		}

declare void @a()		declare void @a()
declare void @b()		declare void @b()
declare void @c()		declare void @c()
declare void @d()		declare void @d()

		!1 = !{!"branch_weights", i32 2, i32 1}
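
The comments on `avoidable_test` say that `then2` should fall through from `test2` and be copied into `else1`. One way to compare the two incoming edges is to work out their relative frequencies by hand from the 2:1 weights. This is a minimal sketch, assuming `test1` runs once and every `!prof !1` branch takes its first successor with probability 2/3; it is a hand calculation for illustration, not the cost model used by the block placement pass, and the variable names are hypothetical.

```python
from fractions import Fraction

P_LIKELY = Fraction(2, 3)    # first successor under branch_weights 2:1
P_UNLIKELY = 1 - P_LIKELY    # second successor

# Block frequencies relative to one execution of test1.
freq = {"test1": Fraction(1)}
freq["test2"] = freq["test1"] * P_LIKELY      # test1 -> test2
freq["else1"] = freq["test1"] * P_UNLIKELY    # test1 -> else1

# then2 has two predecessors: else1 (unconditional) and test2 (likely edge).
edge_else1_then2 = freq["else1"]
edge_test2_then2 = freq["test2"] * P_LIKELY
freq["then2"] = edge_else1_then2 + edge_test2_then2

print(edge_test2_then2, edge_else1_then2, freq["then2"])
```

Under these assumptions the `test2 -> then2` edge (4/9) is somewhat hotter than `else1 -> then2` (1/3), which is consistent with the CHECK lines expecting `then2` to follow `test2` directly while `else1` receives the copy.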

test/CodeGen/SPARC/sjlj.ll

	[60 lines not shown]
	; CHECK: or %i1, %lo(.LBB1_2), %i1			; CHECK: or %i1, %lo(.LBB1_2), %i1
	; CHECK: st %i1, [%i0+4]			; CHECK: st %i1, [%i0+4]
	; CHECK: st %sp, [%i0+8]			; CHECK: st %sp, [%i0+8]
	; CHECK: bn .LBB1_2			; CHECK: bn .LBB1_2
	; CHECK: st %i7, [%i0+12]			; CHECK: st %i7, [%i0+12]
	; CHECK: ba .LBB1_1			; CHECK: ba .LBB1_1
	; CHECK: nop			; CHECK: nop
	; CHECK:.LBB1_1: ! %entry			; CHECK:.LBB1_1: ! %entry
	; CHECK: ba .LBB1_3
	; CHECK: mov %g0, %i0			; CHECK: mov %g0, %i0
				; CHECK: cmp %i0, 0
				; CHECK: bne .LBB1_4
				; CHECK: ba .LBB1_5
	; CHECK:.LBB1_2: ! Block address taken			; CHECK:.LBB1_2: ! Block address taken
	; CHECK: mov 1, %i0			; CHECK: mov 1, %i0
	; CHECK:.LBB1_3: ! %entry
	; CHECK: cmp %i0, 0
	; CHECK: be .LBB1_5			; CHECK: be .LBB1_5
	; CHECK: nop			; CHECK:.LBB1_4:
				; CHECK: ba .LBB1_6
	}			}
	declare i8* @llvm.frameaddress(i32) #2			declare i8* @llvm.frameaddress(i32) #2

	declare i8* @llvm.stacksave() #3			declare i8* @llvm.stacksave() #3

	declare i32 @llvm.eh.sjlj.setjmp(i8*) #3			declare i32 @llvm.eh.sjlj.setjmp(i8*) #3

	attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { noreturn nounwind }			attributes #1 = { noreturn nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

test/CodeGen/SystemZ/asm-18.ll

	[291 lines not shown]
	; Test selects involving high registers.			; Test selects involving high registers.
	define void @f13(i32 %x, i32 %y) {			define void @f13(i32 %x, i32 %y) {
	; CHECK-LABEL: f13:			; CHECK-LABEL: f13:
	; CHECK: llihl [[REG:%r[0-5]]], 0			; CHECK: llihl [[REG:%r[0-5]]], 0
	; CHECK: cije %r2, 0			; CHECK: cije %r2, 0
	; CHECK: iihf [[REG]], 2102030405			; CHECK: iihf [[REG]], 2102030405
	; CHECK: blah [[REG]]			; CHECK: blah [[REG]]
	; CHECK: br %r14			; CHECK: br %r14
	%cmp = icmp eq i32 %x, 0			%cmp = icmp ne i32 %x, 0
	%val = select i1 %cmp, i32 0, i32 2102030405			%val = select i1 %cmp, i32 0, i32 2102030405
	call void asm sideeffect "blah $0", "h"(i32 %val)			call void asm sideeffect "blah $0", "h"(i32 %val)
	ret void			ret void
	}			}

	; Test selects involving low registers.			; Test selects involving low registers.
	define void @f14(i32 %x, i32 %y) {			define void @f14(i32 %x, i32 %y) {
	; CHECK-LABEL: f14:			; CHECK-LABEL: f14:
	; CHECK: lhi [[REG:%r[0-5]]], 0			; CHECK: lhi [[REG:%r[0-5]]], 0
	; CHECK: cije %r2, 0			; CHECK: cije %r2, 0
	; CHECK: iilf [[REG]], 2102030405			; CHECK: iilf [[REG]], 2102030405
	; CHECK: blah [[REG]]			; CHECK: blah [[REG]]
	; CHECK: br %r14			; CHECK: br %r14
	%cmp = icmp eq i32 %x, 0			%cmp = icmp ne i32 %x, 0
	%val = select i1 %cmp, i32 0, i32 2102030405			%val = select i1 %cmp, i32 0, i32 2102030405
	call void asm sideeffect "blah $0", "r"(i32 %val)			call void asm sideeffect "blah $0", "r"(i32 %val)
	ret void			ret void
	}			}

	; Test immediate insertion involving high registers.			; Test immediate insertion involving high registers.
	define void @f15() {			define void @f15() {
	; CHECK-LABEL: f15:			; CHECK-LABEL: f15:
	[424 lines not shown]

test/CodeGen/SystemZ/cond-store-01.ll

[291 lines not shown]	; CHECK: br %r14
store i8 %res, i8 *%ptr		store i8 %res, i8 *%ptr
ret void		ret void
}		}

; Check that volatile loads are not matched.		; Check that volatile loads are not matched.
define void @f18(i8 *%ptr, i8 %alt, i32 %limit) {		define void @f18(i8 *%ptr, i8 %alt, i32 %limit) {
; CHECK-LABEL: f18:		; CHECK-LABEL: f18:
; CHECK: lb {{%r[0-5]}}, 0(%r2)		; CHECK: lb {{%r[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: stc {{%r[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: stc {{%r[0-5]}}, 0(%r2)		; CHECK: stc {{%r[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load volatile i8 , i8 *%ptr		%orig = load volatile i8 , i8 *%ptr
%res = select i1 %cond, i8 %orig, i8 %alt		%res = select i1 %cond, i8 %orig, i8 %alt
store i8 %res, i8 *%ptr		store i8 %res, i8 *%ptr
ret void		ret void
}		}
[16 lines not shown]
; Check that atomic loads are not matched. The transformation is OK for		; Check that atomic loads are not matched. The transformation is OK for
; the "unordered" case tested here, but since we don't try to handle atomic		; the "unordered" case tested here, but since we don't try to handle atomic
; operations at all in this context, it seems better to assert that than		; operations at all in this context, it seems better to assert that than
; to restrict the test to a stronger ordering.		; to restrict the test to a stronger ordering.
define void @f20(i8 *%ptr, i8 %alt, i32 %limit) {		define void @f20(i8 *%ptr, i8 %alt, i32 %limit) {
; FIXME: should use a normal load instead of CS.		; FIXME: should use a normal load instead of CS.
; CHECK-LABEL: f20:		; CHECK-LABEL: f20:
; CHECK: lb {{%r[0-9]+}}, 0(%r2)		; CHECK: lb {{%r[0-9]+}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: stc {{%r[0-9]+}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: stc {{%r[0-9]+}}, 0(%r2)		; CHECK: stc {{%r[0-9]+}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load atomic i8 , i8 *%ptr unordered, align 1		%orig = load atomic i8 , i8 *%ptr unordered, align 1
%res = select i1 %cond, i8 %orig, i8 %alt		%res = select i1 %cond, i8 %orig, i8 %alt
store i8 %res, i8 *%ptr		store i8 %res, i8 *%ptr
ret void		ret void
}		}
[37 lines not shown]

test/CodeGen/SystemZ/cond-store-02.ll

[291 lines not shown]	; CHECK: br %r14
store i16 %res, i16 *%ptr		store i16 %res, i16 *%ptr
ret void		ret void
}		}

; Check that volatile loads are not matched.		; Check that volatile loads are not matched.
define void @f18(i16 *%ptr, i16 %alt, i32 %limit) {		define void @f18(i16 *%ptr, i16 %alt, i32 %limit) {
; CHECK-LABEL: f18:		; CHECK-LABEL: f18:
; CHECK: lh {{%r[0-5]}}, 0(%r2)		; CHECK: lh {{%r[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: sth {{%r[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: sth {{%r[0-5]}}, 0(%r2)		; CHECK: sth {{%r[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load volatile i16 , i16 *%ptr		%orig = load volatile i16 , i16 *%ptr
%res = select i1 %cond, i16 %orig, i16 %alt		%res = select i1 %cond, i16 %orig, i16 %alt
store i16 %res, i16 *%ptr		store i16 %res, i16 *%ptr
ret void		ret void
}		}
[16 lines not shown]
; Check that atomic loads are not matched. The transformation is OK for		; Check that atomic loads are not matched. The transformation is OK for
; the "unordered" case tested here, but since we don't try to handle atomic		; the "unordered" case tested here, but since we don't try to handle atomic
; operations at all in this context, it seems better to assert that than		; operations at all in this context, it seems better to assert that than
; to restrict the test to a stronger ordering.		; to restrict the test to a stronger ordering.
define void @f20(i16 *%ptr, i16 %alt, i32 %limit) {		define void @f20(i16 *%ptr, i16 %alt, i32 %limit) {
; FIXME: should use a normal load instead of CS.		; FIXME: should use a normal load instead of CS.
; CHECK-LABEL: f20:		; CHECK-LABEL: f20:
; CHECK: lh {{%r[0-9]+}}, 0(%r2)		; CHECK: lh {{%r[0-9]+}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: sth {{%r[0-9]+}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lr {{%r[0-9]+}}, {{%r[0-9]+}}
; CHECK: sth {{%r[0-9]+}}, 0(%r2)		; CHECK: sth {{%r[0-9]+}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load atomic i16 , i16 *%ptr unordered, align 2		%orig = load atomic i16 , i16 *%ptr unordered, align 2
%res = select i1 %cond, i16 %orig, i16 %alt		%res = select i1 %cond, i16 %orig, i16 %alt
store i16 %res, i16 *%ptr		store i16 %res, i16 *%ptr
ret void		ret void
}		}
[37 lines not shown]

test/CodeGen/SystemZ/cond-store-03.ll

[220 lines not shown]	; CHECK: br %r14
store i32 %res, i32 *%ptr		store i32 %res, i32 *%ptr
ret void		ret void
}		}

; Check that volatile loads are not matched.		; Check that volatile loads are not matched.
define void @f14(i32 *%ptr, i32 %alt, i32 %limit) {		define void @f14(i32 *%ptr, i32 %alt, i32 %limit) {
; CHECK-LABEL: f14:		; CHECK-LABEL: f14:
; CHECK: l {{%r[0-5]}}, 0(%r2)		; CHECK: l {{%r[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: st {{%r[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: st {{%r[0-5]}}, 0(%r2)		; CHECK: st {{%r[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load volatile i32 , i32 *%ptr		%orig = load volatile i32 , i32 *%ptr
%res = select i1 %cond, i32 %orig, i32 %alt		%res = select i1 %cond, i32 %orig, i32 %alt
store i32 %res, i32 *%ptr		store i32 %res, i32 *%ptr
ret void		ret void
}		}
[16 lines not shown]
; Check that atomic loads are not matched. The transformation is OK for		; Check that atomic loads are not matched. The transformation is OK for
; the "unordered" case tested here, but since we don't try to handle atomic		; the "unordered" case tested here, but since we don't try to handle atomic
; operations at all in this context, it seems better to assert that than		; operations at all in this context, it seems better to assert that than
; to restrict the test to a stronger ordering.		; to restrict the test to a stronger ordering.
define void @f16(i32 *%ptr, i32 %alt, i32 %limit) {		define void @f16(i32 *%ptr, i32 %alt, i32 %limit) {
; FIXME: should use a normal load instead of CS.		; FIXME: should use a normal load instead of CS.
; CHECK-LABEL: f16:		; CHECK-LABEL: f16:
; CHECK: l {{%r[0-5]}}, 0(%r2)		; CHECK: l {{%r[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: st {{%r[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: st {{%r[0-5]}}, 0(%r2)		; CHECK: st {{%r[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load atomic i32 , i32 *%ptr unordered, align 4		%orig = load atomic i32 , i32 *%ptr unordered, align 4
%res = select i1 %cond, i32 %orig, i32 %alt		%res = select i1 %cond, i32 %orig, i32 %alt
store i32 %res, i32 *%ptr		store i32 %res, i32 *%ptr
ret void		ret void
}		}
[37 lines not shown]

test/CodeGen/SystemZ/cond-store-04.ll

[118 lines not shown]	; CHECK: br %r14
store i64 %res, i64 *%ptr		store i64 %res, i64 *%ptr
ret void		ret void
}		}

; Check that volatile loads are not matched.		; Check that volatile loads are not matched.
define void @f8(i64 *%ptr, i64 %alt, i32 %limit) {		define void @f8(i64 *%ptr, i64 %alt, i32 %limit) {
; CHECK-LABEL: f8:		; CHECK-LABEL: f8:
; CHECK: lg {{%r[0-5]}}, 0(%r2)		; CHECK: lg {{%r[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: stg {{%r[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lgr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: stg {{%r[0-5]}}, 0(%r2)		; CHECK: stg {{%r[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load volatile i64 , i64 *%ptr		%orig = load volatile i64 , i64 *%ptr
%res = select i1 %cond, i64 %orig, i64 %alt		%res = select i1 %cond, i64 %orig, i64 %alt
store i64 %res, i64 *%ptr		store i64 %res, i64 *%ptr
ret void		ret void
}		}
[16 lines not shown]
; Check that atomic loads are not matched. The transformation is OK for		; Check that atomic loads are not matched. The transformation is OK for
; the "unordered" case tested here, but since we don't try to handle atomic		; the "unordered" case tested here, but since we don't try to handle atomic
; operations at all in this context, it seems better to assert that than		; operations at all in this context, it seems better to assert that than
; to restrict the test to a stronger ordering.		; to restrict the test to a stronger ordering.
define void @f10(i64 *%ptr, i64 %alt, i32 %limit) {		define void @f10(i64 *%ptr, i64 %alt, i32 %limit) {
; FIXME: should use a normal load instead of CSG.		; FIXME: should use a normal load instead of CSG.
; CHECK-LABEL: f10:		; CHECK-LABEL: f10:
; CHECK: lg {{%r[0-5]}}, 0(%r2)		; CHECK: lg {{%r[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: stg {{%r[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: lgr {{%r[0-5]}}, {{%r[0-5]}}
; CHECK: stg {{%r[0-5]}}, 0(%r2)		; CHECK: stg {{%r[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load atomic i64 , i64 *%ptr unordered, align 8		%orig = load atomic i64 , i64 *%ptr unordered, align 8
%res = select i1 %cond, i64 %orig, i64 %alt		%res = select i1 %cond, i64 %orig, i64 %alt
store i64 %res, i64 *%ptr		store i64 %res, i64 *%ptr
ret void		ret void
}		}
[37 lines not shown]

test/CodeGen/SystemZ/cond-store-05.ll

[150 lines not shown]	; CHECK: br %r14
store float %res, float *%ptr		store float %res, float *%ptr
ret void		ret void
}		}

; Check that volatile loads are not matched.		; Check that volatile loads are not matched.
define void @f10(float *%ptr, float %alt, i32 %limit) {		define void @f10(float *%ptr, float %alt, i32 %limit) {
; CHECK-LABEL: f10:		; CHECK-LABEL: f10:
; CHECK: le {{%f[0-5]}}, 0(%r2)		; CHECK: le {{%f[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: ste {{%f[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: ler {{%f[0-5]}}, {{%f[0-5]}}
; CHECK: ste {{%f[0-5]}}, 0(%r2)		; CHECK: ste {{%f[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load volatile float , float *%ptr		%orig = load volatile float , float *%ptr
%res = select i1 %cond, float %orig, float %alt		%res = select i1 %cond, float %orig, float %alt
store float %res, float *%ptr		store float %res, float *%ptr
ret void		ret void
}		}
[36 lines not shown]

test/CodeGen/SystemZ/cond-store-06.ll

[150 lines not shown]	; CHECK: br %r14
store double %res, double *%ptr		store double %res, double *%ptr
ret void		ret void
}		}

; Check that volatile loads are not matched.		; Check that volatile loads are not matched.
define void @f10(double *%ptr, double %alt, i32 %limit) {		define void @f10(double *%ptr, double %alt, i32 %limit) {
; CHECK-LABEL: f10:		; CHECK-LABEL: f10:
; CHECK: ld {{%f[0-5]}}, 0(%r2)		; CHECK: ld {{%f[0-5]}}, 0(%r2)
; CHECK: {{jl|jnl}} [[LABEL:[^ ]*]]		; CHECK: {{jhe|jnhe}} [[LABEL:[^ ]*]]
		; CHECK: std {{%f[0-5]}}, 0(%r2)
		; CHECK: br %r14
; CHECK: [[LABEL]]:		; CHECK: [[LABEL]]:
		; CHECK: ldr {{%f[0-5]}}, {{%f[0-5]}}
; CHECK: std {{%f[0-5]}}, 0(%r2)		; CHECK: std {{%f[0-5]}}, 0(%r2)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp ult i32 %limit, 420		%cond = icmp ult i32 %limit, 420
%orig = load volatile double , double *%ptr		%orig = load volatile double , double *%ptr
%res = select i1 %cond, double %orig, double %alt		%res = select i1 %cond, double %orig, double %alt
store double %res, double *%ptr		store double %res, double *%ptr
ret void		ret void
}		}
[36 lines not shown]

test/CodeGen/SystemZ/int-cmp-37.ll

	[9 lines not shown]
	define i32 @f1(i32 %src1) {			define i32 @f1(i32 %src1) {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f1:
	; CHECK: clhrl %r2, g			; CHECK: clhrl %r2, g
	; CHECK-NEXT: jl			; CHECK-NEXT: jl
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i32			%src2 = zext i16 %val to i32
	%cond = icmp ult i32 %src1, %src2			%cond = icmp uge i32 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i32 %src1, %src1			%mul = mul i32 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i32 %tmp, 1			%res = add i32 %tmp, 1
	ret i32 %res			ret i32 %res
	}			}

	; Check signed comparison.			; Check signed comparison.
	define i32 @f2(i32 %src1) {			define i32 @f2(i32 %src1) {
	; CHECK-LABEL: f2:			; CHECK-LABEL: f2:
	; CHECK-NOT: clhrl			; CHECK-NOT: clhrl
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i32			%src2 = zext i16 %val to i32
	%cond = icmp slt i32 %src1, %src2			%cond = icmp sge i32 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i32 %src1, %src1			%mul = mul i32 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i32 %tmp, 1			%res = add i32 %tmp, 1
	ret i32 %res			ret i32 %res
	}			}

	; Check equality.			; Check equality.
	define i32 @f3(i32 %src1) {			define i32 @f3(i32 %src1) {
	; CHECK-LABEL: f3:			; CHECK-LABEL: f3:
	; CHECK: clhrl %r2, g			; CHECK: clhrl %r2, g
	; CHECK-NEXT: je			; CHECK-NEXT: je
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i32			%src2 = zext i16 %val to i32
	%cond = icmp eq i32 %src1, %src2			%cond = icmp ne i32 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i32 %src1, %src1			%mul = mul i32 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i32 %tmp, 1			%res = add i32 %tmp, 1
	ret i32 %res			ret i32 %res
	}			}

	; Check inequality.			; Check inequality.
	define i32 @f4(i32 %src1) {			define i32 @f4(i32 %src1) {
	; CHECK-LABEL: f4:			; CHECK-LABEL: f4:
	; CHECK: clhrl %r2, g			; CHECK: clhrl %r2, g
	; CHECK-NEXT: jlh			; CHECK-NEXT: jlh
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i32			%src2 = zext i16 %val to i32
	%cond = icmp ne i32 %src1, %src2			%cond = icmp eq i32 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i32 %src1, %src1			%mul = mul i32 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i32 %tmp, 1			%res = add i32 %tmp, 1
	ret i32 %res			ret i32 %res
	}			}

	; Repeat f1 with an unaligned address.			; Repeat f1 with an unaligned address.
	define i32 @f5(i32 %src1) {			define i32 @f5(i32 %src1) {
	; CHECK-LABEL: f5:			; CHECK-LABEL: f5:
	; CHECK: lgrl [[REG:%r[0-5]]], h@GOT			; CHECK: lgrl [[REG:%r[0-5]]], h@GOT
	; CHECK: llh [[VAL:%r[0-5]]], 0([[REG]])			; CHECK: llh [[VAL:%r[0-5]]], 0([[REG]])
	; CHECK: clrjl %r2, [[VAL]],			; CHECK: clrjl %r2, [[VAL]],
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@h, align 1			%val = load i16 , i16 *@h, align 1
	%src2 = zext i16 %val to i32			%src2 = zext i16 %val to i32
	%cond = icmp ult i32 %src1, %src2			%cond = icmp uge i32 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i32 %src1, %src1			%mul = mul i32 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i32 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i32 %tmp, 1			%res = add i32 %tmp, 1
	ret i32 %res			ret i32 %res
	}			}

	; Check the comparison can be reversed if that allows CLHRL to be used.			; Check the comparison can be reversed if that allows CLHRL to be used.
	define i32 @f6(i32 %src2) {			define i32 @f6(i32 %src2) {
	; CHECK-LABEL: f6:			; CHECK-LABEL: f6:
	; CHECK: clhrl %r2, g			; CHECK: clhrl %r2, g
	; CHECK-NEXT: jh {{\.L.*}}			; CHECK-NEXT: jh {{\.L.*}}
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src1 = zext i16 %val to i32			%src1 = zext i16 %val to i32
	%cond = icmp ult i32 %src1, %src2			%cond = icmp uge i32 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i32 %src2, %src2			%mul = mul i32 %src2, %src2
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i32 [ %src2, %entry ], [ %mul, %mulb ]			%tmp = phi i32 [ %src2, %entry ], [ %mul, %mulb ]
	%res = add i32 %tmp, 1			%res = add i32 %tmp, 1
	ret i32 %res			ret i32 %res
	}			}

test/CodeGen/SystemZ/int-cmp-40.ll

	define i64 @f1(i64 %src1) {			define i64 @f1(i64 %src1) {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f1:
	; CHECK: clghrl %r2, g			; CHECK: clghrl %r2, g
	; CHECK-NEXT: jl			; CHECK-NEXT: jl
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i64			%src2 = zext i16 %val to i64
	%cond = icmp ult i64 %src1, %src2			%cond = icmp uge i64 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i64 %src1, %src1			%mul = mul i64 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i64 %tmp, 1			%res = add i64 %tmp, 1
	ret i64 %res			ret i64 %res
	}			}
	Show All 21 Lines
	define i64 @f3(i64 %src1) {			define i64 @f3(i64 %src1) {
	; CHECK-LABEL: f3:			; CHECK-LABEL: f3:
	; CHECK: clghrl %r2, g			; CHECK: clghrl %r2, g
	; CHECK-NEXT: je			; CHECK-NEXT: je
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i64			%src2 = zext i16 %val to i64
	%cond = icmp eq i64 %src1, %src2			%cond = icmp ne i64 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i64 %src1, %src1			%mul = mul i64 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i64 %tmp, 1			%res = add i64 %tmp, 1
	ret i64 %res			ret i64 %res
	}			}

	; Check inequality.			; Check inequality.
	define i64 @f4(i64 %src1) {			define i64 @f4(i64 %src1) {
	; CHECK-LABEL: f4:			; CHECK-LABEL: f4:
	; CHECK: clghrl %r2, g			; CHECK: clghrl %r2, g
	; CHECK-NEXT: jlh			; CHECK-NEXT: jlh
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src2 = zext i16 %val to i64			%src2 = zext i16 %val to i64
	%cond = icmp ne i64 %src1, %src2			%cond = icmp eq i64 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i64 %src1, %src1			%mul = mul i64 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i64 %tmp, 1			%res = add i64 %tmp, 1
	ret i64 %res			ret i64 %res
	}			}

	; Repeat f1 with an unaligned address.			; Repeat f1 with an unaligned address.
	define i64 @f5(i64 %src1) {			define i64 @f5(i64 %src1) {
	; CHECK-LABEL: f5:			; CHECK-LABEL: f5:
	; CHECK: lgrl [[REG:%r[0-5]]], h@GOT			; CHECK: lgrl [[REG:%r[0-5]]], h@GOT
	; CHECK: llgh [[VAL:%r[0-5]]], 0([[REG]])			; CHECK: llgh [[VAL:%r[0-5]]], 0([[REG]])
	; CHECK: clgrjl %r2, [[VAL]],			; CHECK: clgrjl %r2, [[VAL]],
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@h, align 1			%val = load i16 , i16 *@h, align 1
	%src2 = zext i16 %val to i64			%src2 = zext i16 %val to i64
	%cond = icmp ult i64 %src1, %src2			%cond = icmp uge i64 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i64 %src1, %src1			%mul = mul i64 %src1, %src1
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]			%tmp = phi i64 [ %src1, %entry ], [ %mul, %mulb ]
	%res = add i64 %tmp, 1			%res = add i64 %tmp, 1
	ret i64 %res			ret i64 %res
	}			}

	; Check the comparison can be reversed if that allows CLGHRL to be used.			; Check the comparison can be reversed if that allows CLGHRL to be used.
	define i64 @f6(i64 %src2) {			define i64 @f6(i64 %src2) {
	; CHECK-LABEL: f6:			; CHECK-LABEL: f6:
	; CHECK: clghrl %r2, g			; CHECK: clghrl %r2, g
	; CHECK-NEXT: jh {{\.L.*}}			; CHECK-NEXT: jh {{\.L.*}}
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i16 , i16 *@g			%val = load i16 , i16 *@g
	%src1 = zext i16 %val to i64			%src1 = zext i16 %val to i64
	%cond = icmp ult i64 %src1, %src2			%cond = icmp uge i64 %src1, %src2
	br i1 %cond, label %exit, label %mulb			br i1 %cond, label %mulb, label %exit
	mulb:			mulb:
	%mul = mul i64 %src2, %src2			%mul = mul i64 %src2, %src2
	br label %exit			br label %exit
	exit:			exit:
	%tmp = phi i64 [ %src2, %entry ], [ %mul, %mulb ]			%tmp = phi i64 [ %src2, %entry ], [ %mul, %mulb ]
	%res = add i64 %tmp, 1			%res = add i64 %tmp, 1
	ret i64 %res			ret i64 %res
	}			}

test/CodeGen/SystemZ/int-cmp-44.ll

	; CHECK-NEXT: brasl %r14, foo@PLT			; CHECK-NEXT: brasl %r14, foo@PLT
	; CHECK-NEXT: cijlh [[REG]], 0, .L{{.*}}			; CHECK-NEXT: cijlh [[REG]], 0, .L{{.*}}
	; CHECK: br %r14			; CHECK: br %r14
	entry:			entry:
	%val = load i32 , i32 *%ptr			%val = load i32 , i32 *%ptr
	%xor = xor i32 %val, 1			%xor = xor i32 %val, 1
	%add = add i32 %xor, 1000000			%add = add i32 %xor, 1000000
	call void @foo()			call void @foo()
	%cmp = icmp ne i32 %add, 0			%cmp = icmp eq i32 %add, 0
	br i1 %cmp, label %exit, label %store			br i1 %cmp, label %store, label %exit, !prof !1

	store:			store:
	store i32 %add, i32 *%ptr			store i32 %add, i32 *%ptr
	br label %exit			br label %exit

	exit:			exit:
	ret void			ret void
	}			}
	▲ Show 20 Lines • Show All 397 Lines • ▼ Show 20 Lines

	store:			store:
	store i64 %res, i64 *%dest			store i64 %res, i64 *%dest
	br label %exit			br label %exit

	exit:			exit:
	ret i64 %res			ret i64 %res
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/SystemZ/int-cmp-48.ll

exit:		exit:
ret void		ret void
}		}

; Check a simple select-based use of TM.		; Check a simple select-based use of TM.
define double @f3(i8 *%src, double %a, double %b) {		define double @f3(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f3:		; CHECK-LABEL: f3:
; CHECK: tm 0(%r2), 1		; CHECK: tm 0(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%byte = load i8 , i8 *%src		%byte = load i8 , i8 *%src
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; CHECK: br %r14
store i8 0, i8 *%src		store i8 0, i8 *%src
ret double %res		ret double %res
}		}

; Check an inequality check.		; Check an inequality check.
define double @f5(i8 *%src, double %a, double %b) {		define double @f5(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f5:		; CHECK-LABEL: f5:
; CHECK: tm 0(%r2), 1		; CHECK: tm 0(%r2), 1
; CHECK: jne {{\.L.*}}		; CHECK: je {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%byte = load i8 , i8 *%src		%byte = load i8 , i8 *%src
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp ne i8 %and, 0		%cmp = icmp ne i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check that we can also use TM for equality comparisons with the mask.		; Check that we can also use TM for equality comparisons with the mask.
define double @f6(i8 *%src, double %a, double %b) {		define double @f6(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f6:		; CHECK-LABEL: f6:
; CHECK: tm 0(%r2), 254		; CHECK: tm 0(%r2), 254
; CHECK: jo {{\.L.*}}		; CHECK: jno {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%byte = load i8 , i8 *%src		%byte = load i8 , i8 *%src
%and = and i8 %byte, 254		%and = and i8 %byte, 254
%cmp = icmp eq i8 %and, 254		%cmp = icmp eq i8 %and, 254
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check inequality comparisons with the mask.		; Check inequality comparisons with the mask.
define double @f7(i8 *%src, double %a, double %b) {		define double @f7(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f7:		; CHECK-LABEL: f7:
; CHECK: tm 0(%r2), 254		; CHECK: tm 0(%r2), 254
; CHECK: jno {{\.L.*}}		; CHECK: jo {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%byte = load i8 , i8 *%src		%byte = load i8 , i8 *%src
%and = and i8 %byte, 254		%and = and i8 %byte, 254
%cmp = icmp ne i8 %and, 254		%cmp = icmp ne i8 %and, 254
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check that we do not use the memory TM instruction when CC is being tested		; Check that we do not use the memory TM instruction when CC is being tested
; for 2.		; for 2.
define double @f8(i8 *%src, double %a, double %b) {		define double @f8(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f8:		; CHECK-LABEL: f8:
; CHECK: llc [[REG:%r[0-5]]], 0(%r2)		; CHECK: llc [[REG:%r[0-5]]], 0(%r2)
; CHECK: tmll [[REG]], 3		; CHECK: tmll [[REG]], 3
; CHECK: jh {{\.L.*}}		; CHECK: jnh {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%byte = load i8 , i8 *%src		%byte = load i8 , i8 *%src
%and = and i8 %byte, 3		%and = and i8 %byte, 3
%cmp = icmp eq i8 %and, 2		%cmp = icmp eq i8 %and, 2
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; ...likewise 1.		; ...likewise 1.
define double @f9(i8 *%src, double %a, double %b) {		define double @f9(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f9:		; CHECK-LABEL: f9:
; CHECK: llc [[REG:%r[0-5]]], 0(%r2)		; CHECK: llc [[REG:%r[0-5]]], 0(%r2)
; CHECK: tmll [[REG]], 3		; CHECK: tmll [[REG]], 3
; CHECK: jl {{\.L.*}}		; CHECK: jnl {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%byte = load i8 , i8 *%src		%byte = load i8 , i8 *%src
%and = and i8 %byte, 3		%and = and i8 %byte, 3
%cmp = icmp eq i8 %and, 1		%cmp = icmp eq i8 %and, 1
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check the high end of the TM range.		; Check the high end of the TM range.
define double @f10(i8 *%src, double %a, double %b) {		define double @f10(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f10:		; CHECK-LABEL: f10:
; CHECK: tm 4095(%r2), 1		; CHECK: tm 4095(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 4095		%ptr = getelementptr i8, i8 *%src, i64 4095
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check the low end of the positive TMY range.		; Check the low end of the positive TMY range.
define double @f11(i8 *%src, double %a, double %b) {		define double @f11(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f11:		; CHECK-LABEL: f11:
; CHECK: tmy 4096(%r2), 1		; CHECK: tmy 4096(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 4096		%ptr = getelementptr i8, i8 *%src, i64 4096
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check the high end of the TMY range.		; Check the high end of the TMY range.
define double @f12(i8 *%src, double %a, double %b) {		define double @f12(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f12:		; CHECK-LABEL: f12:
; CHECK: tmy 524287(%r2), 1		; CHECK: tmy 524287(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 524287		%ptr = getelementptr i8, i8 *%src, i64 524287
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check the next byte up, which needs separate address logic.		; Check the next byte up, which needs separate address logic.
define double @f13(i8 *%src, double %a, double %b) {		define double @f13(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f13:		; CHECK-LABEL: f13:
; CHECK: agfi %r2, 524288		; CHECK: agfi %r2, 524288
; CHECK: tm 0(%r2), 1		; CHECK: tm 0(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 524288		%ptr = getelementptr i8, i8 *%src, i64 524288
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check the low end of the TMY range.		; Check the low end of the TMY range.
define double @f14(i8 *%src, double %a, double %b) {		define double @f14(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f14:		; CHECK-LABEL: f14:
; CHECK: tmy -524288(%r2), 1		; CHECK: tmy -524288(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 -524288		%ptr = getelementptr i8, i8 *%src, i64 -524288
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check the next byte down, which needs separate address logic.		; Check the next byte down, which needs separate address logic.
define double @f15(i8 *%src, double %a, double %b) {		define double @f15(i8 *%src, double %a, double %b) {
; CHECK-LABEL: f15:		; CHECK-LABEL: f15:
; CHECK: agfi %r2, -524289		; CHECK: agfi %r2, -524289
; CHECK: tm 0(%r2), 1		; CHECK: tm 0(%r2), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 -524289		%ptr = getelementptr i8, i8 *%src, i64 -524289
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

; Check that TM(Y) does not allow an index		; Check that TM(Y) does not allow an index
define double @f16(i8 *%src, i64 %index, double %a, double %b) {		define double @f16(i8 *%src, i64 %index, double %a, double %b) {
; CHECK-LABEL: f16:		; CHECK-LABEL: f16:
; CHECK: tm 0({{%r[1-5]}}), 1		; CHECK: tm 0({{%r[1-5]}}), 1
; CHECK: je {{\.L.*}}		; CHECK: jne {{\.L.*}}
; CHECK: br %r14		; CHECK: br %r14
%ptr = getelementptr i8, i8 *%src, i64 %index		%ptr = getelementptr i8, i8 *%src, i64 %index
%byte = load i8 , i8 *%ptr		%byte = load i8 , i8 *%ptr
%and = and i8 %byte, 1		%and = and i8 %byte, 1
%cmp = icmp eq i8 %and, 0		%cmp = icmp eq i8 %and, 0
%res = select i1 %cmp, double %b, double %a		%res = select i1 %cmp, double %b, double %a
ret double %res		ret double %res
}		}

test/CodeGen/SystemZ/tdc-06.ll

	; CHECK: cdbr %f0, %f0			; CHECK: cdbr %f0, %f0
	; CHECK: jo [[RET]]			; CHECK: jo [[RET]]
	%testnan = fcmp uno double %x, 0.000000e+00			%testnan = fcmp uno double %x, 0.000000e+00
	br i1 %testnan, label %ret, label %nonzeroord, !prof !1			br i1 %testnan, label %ret, label %nonzeroord, !prof !1

	nonzeroord:			nonzeroord:
	; CHECK: lhi %r2, 2			; CHECK: lhi %r2, 2
	; CHECK: tcdb %f0, 48			; CHECK: tcdb %f0, 48
	; CHECK: jl [[RET]]			; CHECK: je [[FINITE:.]]
	%abs = tail call double @llvm.fabs.f64(double %x)			%abs = tail call double @llvm.fabs.f64(double %x)
	%testinf = fcmp oeq double %abs, 0x7FF0000000000000			%testinf = fcmp oeq double %abs, 0x7FF0000000000000
	br i1 %testinf, label %ret, label %finite, !prof !1			br i1 %testinf, label %ret, label %finite, !prof !1

				ret:
				; CHECK: [[RET]]:
				; CHECK: br %r14
				%res = phi i32 [ 5, %entry ], [ 1, %nonzero ], [ 2, %nonzeroord ], [ %finres, %finite ]
				ret i32 %res

	finite:			finite:
	; CHECK: lhi %r2, 3			; CHECK: lhi %r2, 3
	; CHECK: tcdb %f0, 831			; CHECK: tcdb %f0, 831
	; CHECK: blr %r14			; CHECK: blr %r14
	; CHECK: lhi %r2, 4			; CHECK: lhi %r2, 4
				; CHECK: br %r14
	%testnormal = fcmp uge double %abs, 0x10000000000000			%testnormal = fcmp uge double %abs, 0x10000000000000
	%finres = select i1 %testnormal, i32 3, i32 4			%finres = select i1 %testnormal, i32 3, i32 4
	br label %ret			br label %ret

	ret:
	; CHECK: [[RET]]:
	; CHECK: br %r14
	%res = phi i32 [ 5, %entry ], [ 1, %nonzero ], [ 2, %nonzeroord ], [ %finres, %finite ]
	ret i32 %res
	}			}

	!1 = !{!"branch_weights", i32 1, i32 1}			!1 = !{!"branch_weights", i32 1, i32 1}

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

	; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumb-macho \			; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumb-macho \
	; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V4T			; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V4T
	; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumbv5-macho \			; RUN: llc %s -o - -enable-shrink-wrap=true -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumbv5-macho \
	; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V5T			; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=ENABLE --check-prefix=ENABLE-V5T
	; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumb-macho \			; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumb-macho \
	; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V4T			; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V4T
	; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -mtriple=thumbv5-macho \			; RUN: llc %s -o - -enable-shrink-wrap=false -ifcvt-fn-start=1 -ifcvt-fn-stop=0 -tail-dup-placement=0 -mtriple=thumbv5-macho \
	; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V5T			; RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=DISABLE --check-prefix=DISABLE-V5T

	;			;
	; Note: Lots of tests use inline asm instead of regular calls.			; Note: Lots of tests use inline asm instead of regular calls.
	; This allows to have a better control on what the allocation will do.			; This allows to have a better control on what the allocation will do.
	; Otherwise, we may have spill right in the entry block, defeating			; Otherwise, we may have spill right in the entry block, defeating
	; shrink-wrapping. Moreover, some of the inline asm statements (nop)			; shrink-wrapping. Moreover, some of the inline asm statements (nop)
	; are here to ensure that the related paths do not end up as critical			; are here to ensure that the related paths do not end up as critical
	; edges.			; edges.
	; Also disable the late if-converter as it makes harder to reason on			; Also disable the late if-converter as it makes harder to reason on
	; the diffs.			; the diffs.
				; Disable tail-duplication during placement, as v4t vs v5t get different
				; results due to branches not being analyzable under v5

	; Initial motivating example: Simple diamond with a call just on one side.			; Initial motivating example: Simple diamond with a call just on one side.
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; No prologue needed.			; No prologue needed.
	; ENABLE: cmp r0, r1			; ENABLE: cmp r0, r1
	; ENABLE-NEXT: bge [[EXIT_LABEL:LBB[0-9_]+]]			; ENABLE-NEXT: bge [[EXIT_LABEL:LBB[0-9_]+]]

test/CodeGen/Thumb2/cbnz.ll

t:
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
; CHECK: cbnz		; CHECK: cbz
%q = icmp eq i32 %y, 0		%q = icmp eq i32 %y, 0
br i1 %q, label %t2, label %f		br i1 %q, label %t2, label %f

t2:		t2:
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()
call void @x()		call void @x()

test/CodeGen/Thumb2/ifcvt-compare.ll

	; RUN: llc -mtriple=thumbv7-unknown-linux %s -o - | FileCheck %s			; RUN: llc -mtriple=thumbv7-unknown-linux %s -o - | FileCheck %s

	declare void @x()			declare void @x()

	define void @f0(i32 %x) optsize {			define void @f0(i32 %x) optsize {
	; CHECK-LABEL: f0:			; CHECK-LABEL: f0:
	; CHECK: cbnz			; CHECK: cbz
	%p = icmp eq i32 %x, 0			%p = icmp eq i32 %x, 0
	br i1 %p, label %t, label %f			br i1 %p, label %t, label %f

	t:			t:
	call void @x()			call void @x()
	br label %f			br label %f

	f:			f:

test/CodeGen/Thumb2/v8_IT_4.ll

	; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard | FileCheck %s			; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard | FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -arm-restrict-it | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -arm-restrict-it | FileCheck %s
	; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard -regalloc=basic | FileCheck %s			; RUN: llc < %s -mtriple=thumbv8-eabi -float-abi=hard -regalloc=basic | FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -regalloc=basic -arm-restrict-it | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-eabi -float-abi=hard -regalloc=basic -arm-restrict-it | FileCheck %s

	%"struct.__gnu_cxx::__normal_iterator<char,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" = type { i8 }			%"struct.__gnu_cxx::__normal_iterator<char,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" = type { i8 }
	%"struct.__gnu_cxx::new_allocator<char>" = type <{ i8 }>			%"struct.__gnu_cxx::new_allocator<char>" = type <{ i8 }>
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >" = type { %"struct.__gnu_cxx::__normal_iterator<char*,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >" = type { %"struct.__gnu_cxx::__normal_iterator<char*,std::basic_string<char, std::char_traits<char>, std::allocator<char> > >" }
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep" = type { %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep" = type { %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" }
	%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" = type { i32, i32, i32 }			%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Rep_base" = type { i32, i32, i32 }


	define weak arm_aapcs_vfpcc i32 @_ZNKSs7compareERKSs(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this, %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) {			define weak arm_aapcs_vfpcc i32 @_ZNKSs7compareERKSs(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this, %"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) {
	; CHECK-LABEL: _ZNKSs7compareERKSs:			; CHECK-LABEL: _ZNKSs7compareERKSs:
	; CHECK: cbnz r0,			; CHECK: cbz r0,
				; CHECK-NEXT: %bb1
				; CHECK-NEXT: pop.w
	; CHECK-NEXT: %bb			; CHECK-NEXT: %bb
	; CHECK-NEXT: sub{{(.w)?}} r0, r{{[0-9]+}}, r{{[0-9]+}}			; CHECK-NEXT: sub{{(.w)?}} r0, r{{[0-9]+}}, r{{[0-9]+}}
	; CHECK-NEXT: %bb1
	; CHECK-NEXT: pop.w			; CHECK-NEXT: pop.w
	entry:			entry:
	%0 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i32> [#uses=3]			%0 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i32> [#uses=3]
	%1 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i32> [#uses=3]			%1 = tail call arm_aapcs_vfpcc i32 @_ZNKSs4sizeEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i32> [#uses=3]
	%2 = icmp ult i32 %1, %0 ; <i1> [#uses=1]			%2 = icmp ult i32 %1, %0 ; <i1> [#uses=1]
	%3 = select i1 %2, i32 %1, i32 %0 ; <i32> [#uses=1]			%3 = select i1 %2, i32 %1, i32 %0 ; <i32> [#uses=1]
	%4 = tail call arm_aapcs_vfpcc i8* @_ZNKSs7_M_dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i8*> [#uses=1]			%4 = tail call arm_aapcs_vfpcc i8* @_ZNKSs7_M_dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %this) ; <i8*> [#uses=1]
	%5 = tail call arm_aapcs_vfpcc i8* @_ZNKSs4dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i8*> [#uses=1]			%5 = tail call arm_aapcs_vfpcc i8* @_ZNKSs4dataEv(%"struct.std::basic_string<char,std::char_traits<char>,std::allocator<char> >"* %__str) ; <i8*> [#uses=1]

test/CodeGen/WebAssembly/phi.ll

	; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -verify-machineinstrs | FileCheck %s

	; Test that phis are lowered.			; Test that phis are lowered.

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"			target triple = "wasm32-unknown-unknown"

	; Basic phi triangle.			; Basic phi triangle.

	; CHECK-LABEL: test0:			; CHECK-LABEL: test0:
	; CHECK: div_s $[[NUM0:[0-9]+]]=, $0, $pop[[NUM1:[0-9]+]]{{$}}			; CHECK: return $0
	; CHECK: return $[[NUM0]]{{$}}			; CHECK: div_s $push[[NUM0:[0-9]+]]=, $0, $pop[[NUM1:[0-9]+]]{{$}}
				; CHECK: return $pop[[NUM0]]{{$}}
	define i32 @test0(i32 %p) {			define i32 @test0(i32 %p) {
	entry:			entry:
	%t = icmp slt i32 %p, 0			%t = icmp slt i32 %p, 0
	br i1 %t, label %true, label %done			br i1 %t, label %true, label %done
	true:			true:
	%a = sdiv i32 %p, 3			%a = sdiv i32 %p, 3
	br label %done			br label %done
	done:			done:
	[... 27 lines not shown ...]

test/CodeGen/X86/2008-11-29-ULT-Sign.ll

	; RUN: llc < %s -mtriple=i686-pc-linux-gnu | grep "jns" | count 1			; RUN: llc < %s -mtriple=i686-pc-linux-gnu | grep "jns" | count 1
	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
	target triple = "i686-pc-linux-gnu"			target triple = "i686-pc-linux-gnu"

	define i32 @a(i32 %x) nounwind {			define i32 @a(i32 %x) nounwind {
	entry:			entry:
	%cmp = icmp ult i32 %x, -2147483648 ; <i1> [#uses=1]			%cmp = icmp uge i32 %x, -2147483648 ; <i1> [#uses=1]
	br i1 %cmp, label %if.end, label %if.then			br i1 %cmp, label %if.then, label %if.end

	if.then: ; preds = %entry			if.then: ; preds = %entry
	%call = call i32 (...) @b() ; <i32> [#uses=0]			%call = call i32 (...) @b() ; <i32> [#uses=0]
	br label %if.end			br label %if.end

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	br label %return			br label %return

	return: ; preds = %if.end			return: ; preds = %if.end
	ret i32 undef			ret i32 undef
	}			}

	declare i32 @b(...)			declare i32 @b(...)

test/CodeGen/X86/add.ll

	[... 24 lines not shown ...]
	; X64: subq $-128,			; X64: subq $-128,
	}			}

	define i1 @test4(i32 %v1, i32 %v2, i32* %X) nounwind {			define i1 @test4(i32 %v1, i32 %v2, i32* %X) nounwind {
	entry:			entry:
	%t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %v1, i32 %v2)			%t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %v1, i32 %v2)
	%sum = extractvalue {i32, i1} %t, 0			%sum = extractvalue {i32, i1} %t, 0
	%obit = extractvalue {i32, i1} %t, 1			%obit = extractvalue {i32, i1} %t, 1
	br i1 %obit, label %overflow, label %normal			%notobit = xor i1 1, %obit
				br i1 %notobit, label %normal, label %overflow

	normal:			normal:
	store i32 0, i32* %X			store i32 0, i32* %X
	br label %overflow			br label %overflow

	overflow:			overflow:
	ret i1 false			ret i1 false

	; X32-LABEL: test4:			; X32-LABEL: test4:
	; X32: addl			; X32: addl
	; X32-NEXT: jo			; X32-NEXT: jo

	; X64-LABEL: test4:			; X64-LABEL: test4:
	; X64: addl %e[[A1:si|dx]], %e[[A0:di|cx]]			; X64: addl %e[[A1:si|dx]], %e[[A0:di|cx]]
	; X64-NEXT: jo			; X64-NEXT: jo
	}			}

	define i1 @test5(i32 %v1, i32 %v2, i32* %X) nounwind {			define i1 @test5(i32 %v1, i32 %v2, i32* %X) nounwind {
	entry:			entry:
	%t = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %v1, i32 %v2)			%t = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %v1, i32 %v2)
	%sum = extractvalue {i32, i1} %t, 0			%sum = extractvalue {i32, i1} %t, 0
	%obit = extractvalue {i32, i1} %t, 1			%obit = extractvalue {i32, i1} %t, 1
	br i1 %obit, label %carry, label %normal			%notobit = xor i1 1, %obit
				br i1 %notobit, label %normal, label %carry

	normal:			normal:
	store i32 0, i32* %X			store i32 0, i32* %X
	br label %carry			br label %carry

	carry:			carry:
	ret i1 false			ret i1 false

	[... 122 lines not shown ...]

test/CodeGen/X86/avx-splat.ll

	[... 56 lines not shown ...]
	; shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>			; shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
	;			;
	define <8 x float> @funcE() nounwind {			define <8 x float> @funcE() nounwind {
	; CHECK-LABEL: funcE:			; CHECK-LABEL: funcE:
	; CHECK: ## BB#0: ## %for_exit499			; CHECK: ## BB#0: ## %for_exit499
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: ## implicit-def: %YMM0			; CHECK-NEXT: ## implicit-def: %YMM0
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: jne LBB4_2			; CHECK-NEXT: je LBB4_1
	; CHECK-NEXT: ## BB#1: ## %load.i1247			; CHECK-NEXT: ## BB#2: ## %__load_and_broadcast_32.exit1249
				; CHECK-NEXT: retq
				; CHECK-NEXT: LBB4_1: ## %load.i1247
	; CHECK-NEXT: pushq %rbp			; CHECK-NEXT: pushq %rbp
	; CHECK-NEXT: movq %rsp, %rbp			; CHECK-NEXT: movq %rsp, %rbp
	; CHECK-NEXT: andq $-32, %rsp			; CHECK-NEXT: andq $-32, %rsp
	; CHECK-NEXT: subq $1312, %rsp ## imm = 0x520			; CHECK-NEXT: subq $1312, %rsp ## imm = 0x520
	; CHECK-NEXT: vbroadcastss {{[0-9]+}}(%rsp), %ymm0			; CHECK-NEXT: vbroadcastss {{[0-9]+}}(%rsp), %ymm0
	; CHECK-NEXT: movq %rbp, %rsp			; CHECK-NEXT: movq %rbp, %rsp
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
	; CHECK-NEXT: LBB4_2: ## %__load_and_broadcast_32.exit1249
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	allocas:			allocas:
	%udx495 = alloca [18 x [18 x float]], align 32			%udx495 = alloca [18 x [18 x float]], align 32
	br label %for_test505.preheader			br label %for_test505.preheader

	for_test505.preheader: ; preds = %for_test505.preheader, %allocas			for_test505.preheader: ; preds = %for_test505.preheader, %allocas
	br i1 undef, label %for_exit499, label %for_test505.preheader			br i1 undef, label %for_exit499, label %for_test505.preheader

	[... 91 lines not shown ...]

test/CodeGen/X86/avx512-cmp.ll

	[... 63 lines not shown ...]
	}			}

	define float @test5(float %p) #0 {			define float @test5(float %p) #0 {
	; ALL-LABEL: test5:			; ALL-LABEL: test5:
	; ALL: ## BB#0: ## %entry			; ALL: ## BB#0: ## %entry
	; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1			; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; ALL-NEXT: vucomiss %xmm1, %xmm0			; ALL-NEXT: vucomiss %xmm1, %xmm0
	; ALL-NEXT: jne LBB3_1			; ALL-NEXT: jne LBB3_1
	; ALL-NEXT: jnp LBB3_2			; ALL-NEXT: jp LBB3_1
				; ALL-NEXT: ## BB#2: ## %return
				; ALL-NEXT: retq
	; ALL-NEXT: LBB3_1: ## %if.end			; ALL-NEXT: LBB3_1: ## %if.end
	; ALL-NEXT: seta %al			; ALL-NEXT: seta %al
	; ALL-NEXT: movzbl %al, %eax			; ALL-NEXT: movzbl %al, %eax
	; ALL-NEXT: leaq {{.*}}(%rip), %rcx			; ALL-NEXT: leaq {{.*}}(%rip), %rcx
	; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; ALL-NEXT: LBB3_2: ## %return
	; ALL-NEXT: retq			; ALL-NEXT: retq
	entry:			entry:
	%cmp = fcmp oeq float %p, 0.000000e+00			%cmp = fcmp oeq float %p, 0.000000e+00
	br i1 %cmp, label %return, label %if.end			br i1 %cmp, label %return, label %if.end

	if.end: ; preds = %entry			if.end: ; preds = %entry
	%cmp1 = fcmp ogt float %p, 0.000000e+00			%cmp1 = fcmp ogt float %p, 0.000000e+00
	%cond = select i1 %cmp1, float 1.000000e+00, float -1.000000e+00			%cond = select i1 %cmp1, float 1.000000e+00, float -1.000000e+00
	[... 105 lines not shown ...]

test/CodeGen/X86/bt.ll

	[... 18 lines not shown ...]
	; operand is constant are included).			; operand is constant are included).
	; - The and can be commuted.			; - The and can be commuted.

	define void @test2(i32 %x, i32 %n) nounwind {			define void @test2(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB0_2			; CHECK-NEXT: jb .LBB0_2
	;			;
				davidxl: This test has not changed in behavior. Better to revert the change.
				iteratee: I'll do a complete check for any tests that fall into this category and revert them.
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 %tmp29, 1			%tmp3 = and i32 %tmp29, 1
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test2b(i32 %x, i32 %n) nounwind {			define void @test2b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test2b:			; CHECK-LABEL: test2b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB1_2			; CHECK-NEXT: jb .LBB1_2
	;			;
	entry:			entry:
	%tmp29 = lshr i32 %x, %n			%tmp29 = lshr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}
	[... 23 lines not shown ...]
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB3_2			; CHECK-NEXT: jb .LBB3_2
	;			;
	entry:			entry:
	%tmp29 = ashr i32 %x, %n			%tmp29 = ashr i32 %x, %n
	%tmp3 = and i32 1, %tmp29			%tmp3 = and i32 1, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test3(i32 %x, i32 %n) nounwind {			define void @test3(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB4_2			; CHECK-NEXT: jb .LBB4_2
	;			;
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %tmp29, %x			%tmp3 = and i32 %tmp29, %x
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}

	define void @test3b(i32 %x, i32 %n) nounwind {			define void @test3b(i32 %x, i32 %n) nounwind {
	; CHECK-LABEL: test3b:			; CHECK-LABEL: test3b:
	; CHECK: # BB#0: # %entry			; CHECK: # BB#0: # %entry
	; CHECK-NEXT: btl %esi, %edi			; CHECK-NEXT: btl %esi, %edi
	; CHECK-NEXT: jb .LBB5_2			; CHECK-NEXT: jb .LBB5_2
	;			;
	entry:			entry:
	%tmp29 = shl i32 1, %n			%tmp29 = shl i32 1, %n
	%tmp3 = and i32 %x, %tmp29			%tmp3 = and i32 %x, %tmp29
	%tmp4 = icmp eq i32 %tmp3, 0			%tmp4 = icmp eq i32 %tmp3, 0
	br i1 %tmp4, label %bb, label %UnifiedReturnBlock			br i1 %tmp4, label %bb, label %UnifiedReturnBlock, !prof !1

	bb:			bb:
	call void @foo()			call void @foo()
	ret void			ret void

	UnifiedReturnBlock:			UnifiedReturnBlock:
	ret void			ret void
	}			}
	[... 462 lines not shown ...]
	entry:			entry:
	%and = and i32 %bit, 31			%and = and i32 %bit, 31
	%sh_prom = zext i32 %and to i64			%sh_prom = zext i32 %and to i64
	%shl = shl i64 1, %sh_prom			%shl = shl i64 1, %sh_prom
	%and1 = and i64 %shl, %bits			%and1 = and i64 %shl, %bits
	%tobool = icmp ne i64 %and1, 0			%tobool = icmp ne i64 %and1, 0
	ret i1 %tobool			ret i1 %tobool
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/X86/critical-edge-split-2.ll

	[... 18 lines not shown ...]

	cond.end.i: ; preds = %entry			cond.end.i: ; preds = %entry
	%call1 = phi i16 [ trunc (i32 srem (i32 1, i32 zext (i1 icmp eq (%1* bitcast (i8* getelementptr inbounds (%0, %0* @g_2, i64 0, i32 1, i32 0) to %1), %1 @g_4) to i32)) to i16), %cond.false.i ], [ 1, %entry ]			%call1 = phi i16 [ trunc (i32 srem (i32 1, i32 zext (i1 icmp eq (%1* bitcast (i8* getelementptr inbounds (%0, %0* @g_2, i64 0, i32 1, i32 0) to %1), %1 @g_4) to i32)) to i16), %cond.false.i ], [ 1, %entry ]
	ret i16 %call1			ret i16 %call1
	}			}

	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: testb %dil, %dil			; CHECK: testb %dil, %dil
	; CHECK: jne LBB0_2			; CHECK: je LBB0_1
				; CHECK: retq
				; CHECK: LBB0_1:
	; CHECK: divl			; CHECK: divl
	; CHECK: LBB0_2:

test/CodeGen/X86/fp-une-cmp.ll

	[... 30 lines not shown ...]
	; CHECK-NEXT: jp .LBB0_2			; CHECK-NEXT: jp .LBB0_2
	; CHECK-NEXT: # BB#1: # %bb1			; CHECK-NEXT: # BB#1: # %bb1
	; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0			; CHECK-NEXT: addsd {{.*}}(%rip), %xmm0
	; CHECK-NEXT: .LBB0_2: # %bb2			; CHECK-NEXT: .LBB0_2: # %bb2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

	entry:			entry:
	%mul = fmul double %x, %y			%mul = fmul double %x, %y
	%cmp = fcmp une double %mul, 0.000000e+00			%cmp = fcmp oeq double %mul, 0.000000e+00
	br i1 %cmp, label %bb2, label %bb1			br i1 %cmp, label %bb1, label %bb2

	bb1:			bb1:
	%add = fadd double %mul, -1.000000e+00			%add = fadd double %mul, -1.000000e+00
	br label %bb2			br label %bb2

	bb2:			bb2:
	%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]			%phi = phi double [ %add, %bb1 ], [ %mul, %entry ]
	ret double %phi			ret double %phi
	[... 87 lines not shown ...]

test/CodeGen/X86/jump_sign.ll

	; RUN: llc < %s -march=x86 -mcpu=pentiumpro -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -march=x86 -mcpu=pentiumpro -verify-machineinstrs | FileCheck %s

	define i32 @func_f(i32 %X) {			define i32 @func_f(i32 %X) {
	entry:			entry:
	; CHECK-LABEL: func_f:			; CHECK-LABEL: func_f:
	; CHECK: jns			; CHECK: jns
	%tmp1 = add i32 %X, 1 ; <i32> [#uses=1]			%tmp1 = add i32 %X, 1 ; <i32> [#uses=1]
	%tmp = icmp slt i32 %tmp1, 0 ; <i1> [#uses=1]			%tmp = icmp slt i32 %tmp1, 0 ; <i1> [#uses=1]
	br i1 %tmp, label %cond_true, label %cond_next			br i1 %tmp, label %cond_true, label %cond_next, !prof !1

	cond_true: ; preds = %entry			cond_true: ; preds = %entry
	%tmp2 = tail call i32 (...) @bar( ) ; <i32> [#uses=0]			%tmp2 = tail call i32 (...) @bar( ) ; <i32> [#uses=0]
	br label %cond_next			br label %cond_next

	cond_next: ; preds = %cond_true, %entry			cond_next: ; preds = %cond_true, %entry
	%tmp3 = tail call i32 (...) @baz( ) ; <i32> [#uses=0]			%tmp3 = tail call i32 (...) @baz( ) ; <i32> [#uses=0]
	ret i32 undef			ret i32 undef
	[... 280 lines not shown ...]
	if.then:			if.then:
	%dec = add nsw i32 %1, -1			%dec = add nsw i32 %1, -1
	store i32 %dec, i32* @a, align 4			store i32 %dec, i32* @a, align 4
	br label %if.end			br label %if.end

	if.end:			if.end:
	ret i32 undef			ret i32 undef
	}			}

				!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/X86/machine-cse.ll

	[... 80 lines not shown ...]

	; CSE physical register defining instruction across MBB boundary.			; CSE physical register defining instruction across MBB boundary.
	; rdar://10660865			; rdar://10660865
	define i32 @cross_mbb_phys_cse(i32 %a, i32 %b) nounwind ssp {			define i32 @cross_mbb_phys_cse(i32 %a, i32 %b) nounwind ssp {
	entry:			entry:
	; CHECK-LABEL: cross_mbb_phys_cse:			; CHECK-LABEL: cross_mbb_phys_cse:
	; CHECK: cmpl			; CHECK: cmpl
	; CHECK: ja			; CHECK: ja
	%cmp = icmp ugt i32 %a, %b			%cmp = icmp ule i32 %a, %b
	br i1 %cmp, label %return, label %if.end			br i1 %cmp, label %if.end, label %return

	if.end: ; preds = %entry			if.end: ; preds = %entry
	; CHECK-NOT: cmpl			; CHECK-NOT: cmpl
	; CHECK: sbbl			; CHECK: sbbl
	%cmp1 = icmp ult i32 %a, %b			%cmp1 = icmp ult i32 %a, %b
	%. = sext i1 %cmp1 to i32			%. = sext i1 %cmp1 to i32
	br label %return			br label %return

	[... 62 lines not shown ...]

test/CodeGen/X86/shift-double.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s			; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s

	; Shift i64 integers on 32-bit target			; Shift i64 integers on 32-bit target

	define i64 @test1(i64 %X, i8 %C) nounwind {			define i64 @test1(i64 %X, i8 %C) nounwind {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: pushl %esi			; CHECK-NEXT: pushl %esi
	; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl			; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: shll %cl, %eax			; CHECK-NEXT: shll %cl, %eax
	; CHECK-NEXT: shldl %cl, %esi, %edx			; CHECK-NEXT: shldl %cl, %esi, %edx
	; CHECK-NEXT: testb $32, %cl			; CHECK-NEXT: testb $32, %cl
	; CHECK-NEXT: je .LBB0_2			; CHECK-NEXT: jne .LBB0_1
	; CHECK-NEXT: # BB#1:			; CHECK-NEXT: # BB#2:
				; CHECK-NEXT: popl %esi
				; CHECK-NEXT: retl
				; CHECK-NEXT: .LBB0_1:
	; CHECK-NEXT: movl %eax, %edx			; CHECK-NEXT: movl %eax, %edx
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: .LBB0_2:
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%shift.upgrd.1 = zext i8 %C to i64 ; <i64> [#uses=1]			%shift.upgrd.1 = zext i8 %C to i64 ; <i64> [#uses=1]
	%Y = shl i64 %X, %shift.upgrd.1 ; <i64> [#uses=1]			%Y = shl i64 %X, %shift.upgrd.1 ; <i64> [#uses=1]
	ret i64 %Y			ret i64 %Y
	}			}

	define i64 @test2(i64 %X, i8 %C) nounwind {			define i64 @test2(i64 %X, i8 %C) nounwind {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: pushl %esi			; CHECK-NEXT: pushl %esi
	; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl			; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi
	; CHECK-NEXT: movl %esi, %edx			; CHECK-NEXT: movl %esi, %edx
	; CHECK-NEXT: sarl %cl, %edx			; CHECK-NEXT: sarl %cl, %edx
	; CHECK-NEXT: shrdl %cl, %esi, %eax			; CHECK-NEXT: shrdl %cl, %esi, %eax
	; CHECK-NEXT: testb $32, %cl			; CHECK-NEXT: testb $32, %cl
	; CHECK-NEXT: je .LBB1_2			; CHECK-NEXT: jne .LBB1_1
	; CHECK-NEXT: # BB#1:			; CHECK-NEXT: # BB#2:
				; CHECK-NEXT: popl %esi
				; CHECK-NEXT: retl
				; CHECK-NEXT: .LBB1_1:
	; CHECK-NEXT: sarl $31, %esi			; CHECK-NEXT: sarl $31, %esi
	; CHECK-NEXT: movl %edx, %eax			; CHECK-NEXT: movl %edx, %eax
	; CHECK-NEXT: movl %esi, %edx			; CHECK-NEXT: movl %esi, %edx
	; CHECK-NEXT: .LBB1_2:
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%shift.upgrd.2 = zext i8 %C to i64 ; <i64> [#uses=1]			%shift.upgrd.2 = zext i8 %C to i64 ; <i64> [#uses=1]
	%Y = ashr i64 %X, %shift.upgrd.2 ; <i64> [#uses=1]			%Y = ashr i64 %X, %shift.upgrd.2 ; <i64> [#uses=1]
	ret i64 %Y			ret i64 %Y
	}			}

	define i64 @test3(i64 %X, i8 %C) nounwind {			define i64 @test3(i64 %X, i8 %C) nounwind {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: pushl %esi			; CHECK-NEXT: pushl %esi
	; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl			; CHECK-NEXT: movb {{[0-9]+}}(%esp), %cl
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi
	; CHECK-NEXT: movl %esi, %edx			; CHECK-NEXT: movl %esi, %edx
	; CHECK-NEXT: shrl %cl, %edx			; CHECK-NEXT: shrl %cl, %edx
	; CHECK-NEXT: shrdl %cl, %esi, %eax			; CHECK-NEXT: shrdl %cl, %esi, %eax
	; CHECK-NEXT: testb $32, %cl			; CHECK-NEXT: testb $32, %cl
	; CHECK-NEXT: je .LBB2_2			; CHECK-NEXT: jne .LBB2_1
	; CHECK-NEXT: # BB#1:			; CHECK-NEXT: # BB#2:
				; CHECK-NEXT: popl %esi
				; CHECK-NEXT: retl
				; CHECK-NEXT: .LBB2_1:
	; CHECK-NEXT: movl %edx, %eax			; CHECK-NEXT: movl %edx, %eax
	; CHECK-NEXT: xorl %edx, %edx			; CHECK-NEXT: xorl %edx, %edx
	; CHECK-NEXT: .LBB2_2:
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%shift.upgrd.3 = zext i8 %C to i64 ; <i64> [#uses=1]			%shift.upgrd.3 = zext i8 %C to i64 ; <i64> [#uses=1]
	%Y = lshr i64 %X, %shift.upgrd.3 ; <i64> [#uses=1]			%Y = lshr i64 %X, %shift.upgrd.3 ; <i64> [#uses=1]
	ret i64 %Y			ret i64 %Y
	}			}

	; Combine 2xi32/2xi16 shifts into SHLD			; Combine 2xi32/2xi16 shifts into SHLD
	[... 236 lines not shown ...]

test/CodeGen/X86/sink-hoist.ll

	[... 20 lines not shown ...]
	}			}

	; Make sure the critical edge is broken so the divsd is sunken below			; Make sure the critical edge is broken so the divsd is sunken below
	; the conditional branch.			; the conditional branch.
	; rdar://8454886			; rdar://8454886

	; CHECK-LABEL: split:			; CHECK-LABEL: split:
	; CHECK-NEXT: testb $1, %dil			; CHECK-NEXT: testb $1, %dil
	; CHECK-NEXT: je			; CHECK-NEXT: jne
				; CHECK: ret
	; CHECK: divsd			; CHECK: divsd
	; CHECK: movapd			; CHECK: movapd
	; CHECK: ret			; CHECK: ret
	define double @split(double %x, double %y, i1 %c) nounwind {			define double @split(double %x, double %y, i1 %c) nounwind {
	%a = fdiv double %x, 3.2			%a = fdiv double %x, 3.2
	%z = select i1 %c, double %a, double %y			%z = select i1 %c, double %a, double %y
	ret double %z			ret double %z
	}			}
	[... 137 lines not shown ...]

test/CodeGen/X86/sse-scalar-fp-arith.ll

	[... 1,104 lines not shown ...]
	; SSE41-NEXT: .LBB62_1:			; SSE41-NEXT: .LBB62_1:
	; SSE41-NEXT: addss %xmm0, %xmm1			; SSE41-NEXT: addss %xmm0, %xmm1
	; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]			; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: add_ss_mask:			; AVX1-LABEL: add_ss_mask:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: testb $1, %dil			; AVX1-NEXT: testb $1, %dil
	; AVX1-NEXT: je .LBB62_2			; AVX1-NEXT: jne .LBB62_1
	; AVX1-NEXT: # BB#1:			; AVX1-NEXT: # BB#2:
				; AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm2[0],xmm0[1,2,3]
				; AVX1-NEXT: retq
				; AVX1-NEXT: .LBB62_1:
	; AVX1-NEXT: vaddss %xmm1, %xmm0, %xmm2			; AVX1-NEXT: vaddss %xmm1, %xmm0, %xmm2
	; AVX1-NEXT: .LBB62_2:
	; AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm2[0],xmm0[1,2,3]			; AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm2[0],xmm0[1,2,3]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX512-LABEL: add_ss_mask:			; AVX512-LABEL: add_ss_mask:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: andl $1, %edi			; AVX512-NEXT: andl $1, %edi
	; AVX512-NEXT: kmovw %edi, %k1			; AVX512-NEXT: kmovw %edi, %k1
	; AVX512-NEXT: vaddss %xmm1, %xmm0, %xmm2 {%k1}			; AVX512-NEXT: vaddss %xmm1, %xmm0, %xmm2 {%k1}
	[... 35 lines not shown ...]
	; SSE41-NEXT: .LBB63_1:			; SSE41-NEXT: .LBB63_1:
	; SSE41-NEXT: addsd %xmm0, %xmm1			; SSE41-NEXT: addsd %xmm0, %xmm1
	; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]			; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: add_sd_mask:			; AVX1-LABEL: add_sd_mask:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: testb $1, %dil			; AVX1-NEXT: testb $1, %dil
	; AVX1-NEXT: je .LBB63_2			; AVX1-NEXT: jne .LBB63_1
	; AVX1-NEXT: # BB#1:			; AVX1-NEXT: # BB#2:
				; AVX1-NEXT: vblendpd {{.*#+}} xmm0 = xmm2[0],xmm0[1]
				; AVX1-NEXT: retq
				; AVX1-NEXT: .LBB63_1:
	; AVX1-NEXT: vaddsd %xmm1, %xmm0, %xmm2			; AVX1-NEXT: vaddsd %xmm1, %xmm0, %xmm2
	; AVX1-NEXT: .LBB63_2:
	; AVX1-NEXT: vblendpd {{.*#+}} xmm0 = xmm2[0],xmm0[1]			; AVX1-NEXT: vblendpd {{.*#+}} xmm0 = xmm2[0],xmm0[1]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX512-LABEL: add_sd_mask:			; AVX512-LABEL: add_sd_mask:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: andl $1, %edi			; AVX512-NEXT: andl $1, %edi
	; AVX512-NEXT: kmovw %edi, %k1			; AVX512-NEXT: kmovw %edi, %k1
	; AVX512-NEXT: vaddsd %xmm1, %xmm0, %xmm2 {%k1}			; AVX512-NEXT: vaddsd %xmm1, %xmm0, %xmm2 {%k1}
	[... 12 lines not shown ...]

test/CodeGen/X86/testb-je-fusion.ll

	; RUN: llc < %s -march=x86-64 -mcpu=corei7-avx | FileCheck %s			; RUN: llc < %s -march=x86-64 -mcpu=corei7-avx | FileCheck %s

	; testb should be scheduled right before je to enable macro-fusion.			; testb should be scheduled right before je to enable macro-fusion.

	; CHECK: testb $2, %{{[abcd]}}h			; CHECK: testb $2, %{{[abcd]}}h
	; CHECK-NEXT: je			; CHECK-NEXT: je

	define i32 @check_flag(i32 %flags, ...) nounwind {			define i32 @check_flag(i32 %flags, ...) nounwind {
	entry:			entry:
	%and = and i32 %flags, 512			%and = and i32 %flags, 512
	%tobool = icmp eq i32 %and, 0			%tobool = icmp eq i32 %and, 0
	br i1 %tobool, label %if.end, label %if.then			br i1 %tobool, label %if.end, label %if.then, !prof !1

	if.then:			if.then:
	br label %if.end			br label %if.end

	if.end:			if.end:
	%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]			%hasflag = phi i32 [ 1, %if.then ], [ 0, %entry ]
	ret i32 %hasflag			ret i32 %hasflag
	}			}
				!1 = !{!"branch_weights", i32 1, i32 2}

Revision Contents

Diff 85073

lib/CodeGen/MachineBlockPlacement.cpp

test/CodeGen/AArch64/addsub.ll

test/CodeGen/AArch64/arm64-atomic.ll

test/CodeGen/AArch64/arm64-ccmp.ll

test/CodeGen/AArch64/arm64-shrink-wrapping.ll

test/CodeGen/AArch64/compare-branch.ll

test/CodeGen/AArch64/logical_shifted_reg.ll

test/CodeGen/AArch64/tail-dup-repeat-worklist.ll

test/CodeGen/AArch64/tbz-tbnz.ll

test/CodeGen/AMDGPU/branch-relaxation.ll

test/CodeGen/AMDGPU/si-annotate-cf-noloop.ll

test/CodeGen/AMDGPU/uniform-cfg.ll

test/CodeGen/ARM/arm-and-tst-peephole.ll

test/CodeGen/ARM/atomic-op.ll

test/CodeGen/ARM/atomic-ops-v8.ll

test/CodeGen/ARM/cmpxchg-weak.ll

test/CodeGen/ARM/machine-cse-cmp.ll

test/CodeGen/Mips/brconeq.ll

test/CodeGen/Mips/brconeqk.ll

test/CodeGen/Mips/brcongt.ll

test/CodeGen/Mips/brconlt.ll

test/CodeGen/Mips/brconnez.ll

test/CodeGen/Mips/llvm-ir/ashr.ll

test/CodeGen/Mips/micromips-compact-branches.ll

test/CodeGen/PowerPC/misched-inorder-latency.ll

test/CodeGen/PowerPC/tail-dup-break-cfg.ll

test/CodeGen/PowerPC/tail-dup-layout.ll

test/CodeGen/SPARC/sjlj.ll

test/CodeGen/SystemZ/asm-18.ll

test/CodeGen/SystemZ/cond-store-01.ll

test/CodeGen/SystemZ/cond-store-02.ll

test/CodeGen/SystemZ/cond-store-03.ll

test/CodeGen/SystemZ/cond-store-04.ll

test/CodeGen/SystemZ/cond-store-05.ll

test/CodeGen/SystemZ/cond-store-06.ll

test/CodeGen/SystemZ/int-cmp-37.ll

test/CodeGen/SystemZ/int-cmp-40.ll

test/CodeGen/SystemZ/int-cmp-44.ll

test/CodeGen/SystemZ/int-cmp-48.ll

test/CodeGen/SystemZ/tdc-06.ll

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

test/CodeGen/Thumb2/cbnz.ll

test/CodeGen/Thumb2/ifcvt-compare.ll

test/CodeGen/Thumb2/v8_IT_4.ll

test/CodeGen/WebAssembly/phi.ll

test/CodeGen/X86/2008-11-29-ULT-Sign.ll

test/CodeGen/X86/add.ll

test/CodeGen/X86/avx-splat.ll

test/CodeGen/X86/avx512-cmp.ll

test/CodeGen/X86/bt.ll

test/CodeGen/X86/critical-edge-split-2.ll

test/CodeGen/X86/fp-une-cmp.ll

test/CodeGen/X86/jump_sign.ll

test/CodeGen/X86/machine-cse.ll

test/CodeGen/X86/shift-double.ll

test/CodeGen/X86/sink-hoist.ll

test/CodeGen/X86/sse-scalar-fp-arith.ll

test/CodeGen/X86/testb-je-fusion.ll
