This is an archive of the discontinued LLVM Phabricator instance.

Differential D20505

Codegen: Make chains from lattice-shaped CFGs
AbandonedPublic

Authored by iteratee on May 20 2016, 6:33 PM.

Details

Reviewers

davidxl
• tstellarAMD
haicheng

Summary

This change extends D27742 to allow a chain of triangles to
tail-duplicate and produce a lattice. The essential change is that if a
predecessor has the same successors as a layout predecessor, we ignore
that block when considering whether we can tail-duplicate into unplaced
predecessors.
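
To make the "same successors" test concrete, here is a minimal, self-contained sketch. It is not the patch's code: the `Cfg` map, the string block names, and `makeDuplicatedTriangles` are assumptions made purely for illustration, and only the comparison logic mirrors the `hasSameSuccessors` helper shown in the diff. The toy graph models two triangles (A,B,C and C,D,E) after C has been copied into B.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy CFG: block name -> set of successor names. This is a hypothetical
// stand-in for LLVM's MachineBasicBlock successor lists.
using Cfg = std::map<std::string, std::set<std::string>>;

// True if `block` has exactly the successors in `expected`. A block that
// appears in `expected` itself never matches, so self-loops are not counted.
bool hasSameSuccessors(const Cfg &cfg, const std::string &block,
                       const std::set<std::string> &expected) {
  if (expected.count(block))
    return false;
  auto it = cfg.find(block);
  return it != cfg.end() && it->second == expected;
}

// Two triangles after C has been tail-duplicated into B:
//   A -> {B, C},  B -> {D, E},  C -> {D, E},  D -> {E}
Cfg makeDuplicatedTriangles() {
  return {{"A", {"B", "C"}},
          {"B", {"D", "E"}},
          {"C", {"D", "E"}},
          {"D", {"E"}},
          {"E", {}}};
}
```

In this model, with C as the layout predecessor of E, `hasSameSuccessors(cfg, "B", cfg.at("C"))` is true, so B would be skipped when checking E's unplaced predecessors, while D (an ordinary predecessor of E) does not match.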

As an example consider the following CFG:

  B   D   F   H
 / \ / \ / \ / \
A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),G,F(G),H,Ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

Here is the layout after D27742:

straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
        mr 30, 3
        andi. 3, 30, 1
        bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
        rlwinm. 3, 30, 0, 30, 30
        beq      0, .LBB0_3
        b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
        bl a
        nop
        rlwinm. 3, 30, 0, 30, 30
        bne      0, .LBB0_4
.LBB0_3:                                # %test3 ; E
        rlwinm. 3, 30, 0, 29, 29
        beq      0, .LBB0_5
        b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
        bl b
        nop
        rlwinm. 3, 30, 0, 29, 29
        bne      0, .LBB0_6
.LBB0_5:                                # %test4 ; G
        rlwinm. 3, 30, 0, 28, 28
        beq      0, .LBB0_8
        b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
        bl c
        nop
        rlwinm. 3, 30, 0, 28, 28
        beq      0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
        bl d
        nop
.LBB0_8:                                # %exit ; Ret
        ld 30, 96(1)                    # 8-byte Folded Reload
        addi 1, 1, 112
        ld 0, 16(1)
        mtlr 0
        blr

This is where the bolder strategy of this patch comes in. We allow E
to be placed even though its predecessor B (after copying C) is
unplaced, because the CFG is lattice-shaped after tail-duplication.
This then produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive: it handles long strings of skipped blocks much better than
the original layout. Both layouts handle runs of executed blocks
equally well. Branch prediction also improves if there is any
correlation between subsequent optional blocks.
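
The trade-off can be checked with a rough model. The sketch below is an illustration under stated assumptions, not part of the patch: `Block`, `countTakenBranches`, `originalCfg`, and `duplicatedCfg` are hypothetical names, 'R' stands for Ret, and every control transfer that is not a fallthrough in the given layout is counted as one taken branch (conditional or unconditional).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One block of the toy CFG.
struct Block {
  char name;
  unsigned bit;  // tag bit tested by the block; 0 means no conditional branch
  char ifSet;    // successor when (tag & bit) != 0
  char ifClear;  // successor otherwise (the only successor when bit == 0)
};

// Walk from the first block of `layout` to 'R' and count every transfer
// whose target is not the next block in the layout string.
int countTakenBranches(const std::vector<Block> &cfg,
                       const std::string &layout, unsigned tag) {
  int taken = 0;
  char cur = layout.front();
  while (cur != 'R') {
    const Block *block = nullptr;
    for (const Block &candidate : cfg)
      if (candidate.name == cur)
        block = &candidate;
    assert(block && "every non-exit block must be described in the CFG");
    char next = (block->bit != 0 && (tag & block->bit) != 0) ? block->ifSet
                                                              : block->ifClear;
    std::size_t pos = layout.find(cur);
    assert(pos != std::string::npos && "block must appear in the layout");
    bool fallsThrough = pos + 1 < layout.size() && layout[pos + 1] == next;
    if (!fallsThrough)
      ++taken;
    cur = next;
  }
  return taken;
}

// The CFG from the summary before any tail-duplication.
std::vector<Block> originalCfg() {
  return {{'A', 1, 'B', 'C'}, {'B', 0, '\0', 'C'}, {'C', 2, 'D', 'E'},
          {'D', 0, '\0', 'E'}, {'E', 4, 'F', 'G'}, {'F', 0, '\0', 'G'},
          {'G', 8, 'H', 'R'}, {'H', 0, '\0', 'R'}};
}

// The same CFG after C is copied into B, E into D, and G into F.
std::vector<Block> duplicatedCfg() {
  return {{'A', 1, 'B', 'C'}, {'B', 2, 'D', 'E'}, {'C', 2, 'D', 'E'},
          {'D', 4, 'F', 'G'}, {'E', 4, 'F', 'G'}, {'F', 8, 'H', 'R'},
          {'G', 8, 'H', 'R'}, {'H', 0, '\0', 'R'}};
}
```

In this simplified model the CFG-preserving layout "ABCDEFGHR" takes 4 branches when every optional block is skipped (tag 0) and 0 when all are executed (tag 15), while the lattice layout "ACEGBDFHR" takes 1 in both cases. Real costs also depend on branch prediction and code size, which the model ignores.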

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
        mr 30, 3
        andi. 3, 30, 1
        bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
        rlwinm. 3, 30, 0, 30, 30
        bne      0, .LBB0_5
.LBB0_2:                                # %test3 ; E
        rlwinm. 3, 30, 0, 29, 29
        bne      0, .LBB0_6
.LBB0_3:                                # %test4 ; G
        rlwinm. 3, 30, 0, 28, 28
        bne      0, .LBB0_7
        b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
        bl a
        nop
        rlwinm. 3, 30, 0, 30, 30
        beq      0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
        bl b
        nop
        rlwinm. 3, 30, 0, 29, 29
        beq      0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
        bl c
        nop
        rlwinm. 3, 30, 0, 28, 28
        beq      0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
        bl d
        nop
.LBB0_8:                                # %exit

Event Timeline

iteratee updated this revision to Diff 58025. May 20 2016, 6:33 PM

iteratee retitled this revision from to Codegen: Outline for chains of tail-duplicable blocks..

iteratee updated this object.

iteratee added a reviewer: haicheng.

iteratee set the repository for this revision to rL LLVM.

iteratee added subscribers: llvm-commits, chandlerc, echristo.

Herald added subscribers: dsanders, jyknight, jfb. May 20 2016, 6:33 PM

iteratee added parent revisions: D20379: Codegen: Fix broken assumption in Tail Merge., D18226: Codegen: Tail-duplicate during placement.. May 20 2016, 6:33 PM

sunfish added a subscriber: sunfish. May 20 2016, 7:04 PM

iteratee mentioned this in D18226: Codegen: Tail-duplicate during placement.. May 23 2016, 12:42 PM

iteratee mentioned this in D20604: Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.. May 24 2016, 3:39 PM

iteratee added a parent revision: D20604: Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough..

iteratee removed a parent revision: D20604: Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.. May 25 2016, 5:29 PM

Added fixes for AMDGPU tests, as some intervening change has enabled this optimization for that target.

Herald added a reviewer: • tstellarAMD. May 25 2016, 5:32 PM

iteratee edited edge metadata. May 31 2016, 11:35 AM

iteratee added a subscriber: kbarton.

Did a quick run through for clarity. A few inline comments. Few requests to break things up. Check for coding style nits across the entire set of code and feel free to run clang format on the lines you've changed.

Thanks for the work so far!

-eric

lib/CodeGen/MachineBlockPlacement.cpp
724	Sadly I'm not sure if you're adding or deleting whitespace here. Either way feel free to do it separately.
756–757	Go ahead and commit this separately (along with the one below). Also "mismatch" and you shouldn't need the \n.
825–826	Can you document everything that's going on here more please? In particular, what's going on with the callback here and why it needs to be a callback rather than happening on the spot.
1288	Once again I can't remember if we have autobrief turned on or not...
1297	Formatting.
1312	Coding style nit: no braces around single lines.

Added a couple of comments and tidied formatting.

Formatting and comments.

iteratee added inline comments. Jun 8 2016, 2:59 PM

lib/CodeGen/MachineBlockPlacement.cpp
825–826	I've added a comment to this effect, but the reason it has to be a callback is because none of the things that occur would be valid after deleting the block. (use after free). As to the rest, the function is broken up into small chunks with a comment as to what each chunk is doing. Is there something more you'd like to see?
1288	Even if we do, it's probably better to match the existing style for this change, and clean it up in a separate patch.
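
The use-after-free concern behind the callback can be illustrated with a small sketch. Everything in it (`ToyBlock`, `BlockOwner`, `removeBlock`, and the callback signature) is hypothetical and far simpler than the real pass. It only shows the ordering constraint: any bookkeeping that reads the block has to run while the block is still alive, so the owner invokes the callback before destroying it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ToyBlock {
  std::string name;
};

// Hypothetical owner of blocks, standing in for a function that owns its
// basic blocks.
class BlockOwner {
public:
  ToyBlock *addBlock(std::string name) {
    blocks.push_back(std::make_unique<ToyBlock>(ToyBlock{std::move(name)}));
    return blocks.back().get();
  }

  // Removes `victim`. The callback runs first, while the block still exists;
  // after erase() the pointer dangles and must not be dereferenced.
  void removeBlock(ToyBlock *victim,
                   const std::function<void(ToyBlock *)> &onRemove) {
    for (auto it = blocks.begin(); it != blocks.end(); ++it) {
      if (it->get() != victim)
        continue;
      if (onRemove)
        onRemove(victim);  // safe: block is still alive
      blocks.erase(it);    // block destroyed here
      return;
    }
  }

  std::size_t size() const { return blocks.size(); }

private:
  std::vector<std::unique_ptr<ToyBlock>> blocks;
};
```

A caller that tried to do the same bookkeeping after `removeBlock` returned would be reading freed memory, which is the situation the comment above describes.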

Add comments about callback

davidxl added a reviewer: davidxl. Jun 13 2016, 10:40 AM

Kyle, can you update your patch and rebase? Recent restructuring changes in MBP can make the code cleaner.

iteratee mentioned this in D21674: [BranchFolding] Update UnavoidableBlocks for OutlineOptionalBranches. Jun 24 2016, 10:33 AM

Added changes to handle re-laying out code that had been tail-duplicated into the same shape. Necessary to work correctly with tail merging during layout.

Herald added a subscriber: nemanjai. Jun 28 2016, 4:25 PM

OK, this took longer than I thought it would, because I had to come up with
a good way to interact with tail-merging during layout. Please take a look
now.

Kyle.

minor cleanups.

Thanks. I can't find explicit test cases added for this change. Can you add one?

Please also update the description with a real motivating example: the original code and the pseudo code after the transformation.

davidxl added inline comments. Jun 28 2016, 5:32 PM

lib/CodeGen/MachineBlockPlacement.cpp
316	This documentation does not help in understanding the meaning. Can you add a reference to a detailed description of the algorithm in another place (e.g. the function definition)?
408	Brief documentation.
411	Same here.
1118	Please outline this big part into its own method.
lib/CodeGen/TailDuplicator.cpp
787 ↗	(On Diff #62155)	Split out the refactor change.
872 ↗	(On Diff #62155)	Split out the clean-up changes.
test/CodeGen/PowerPC/tail-dup-layout.ll
1	This test case can use some simplifications. Why not just do a simple function call in the optional branches? The test blocks can also be simplified, for instance by testing input parameters.

iteratee mentioned this in rL278288: Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.. Aug 10 2016, 2:11 PM

Lots of updates. Mainly pulled some of the changes into D18226 and expanded the commit message.

Herald added a subscriber: mzolotukhin. Aug 11 2016, 3:29 PM

iteratee updated this object. Aug 11 2016, 3:35 PM

Minor fix.

Add brief comments to method declarations.

include/llvm/Analysis/LoopInfoImpl.h
188 ↗	(On Diff #67764)	I should probably split this out.
lib/CodeGen/MachineBlockPlacement.cpp
1118	This was pulled into D18226 and placed in its own method there.
lib/CodeGen/TailDuplicator.cpp
872 ↗	(On Diff #62155)	I think you mean the line below. I'll split that out. The line above isn't clean up.

iteratee mentioned this in rL278866: Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.. Aug 16 2016, 4:04 PM

Simple rebase.

junbuml added a subscriber: junbuml. Aug 31 2016, 11:17 AM

iteratee removed parent revisions: D18226: Codegen: Tail-duplicate during placement., D20379: Codegen: Fix broken assumption in Tail Merge.. Nov 1 2016, 4:39 PM

There are two independent problems that this patch tries to address:

1) Enabling tail duplication for cases where the current layout prefers topological order.
2) Handling a sequence of tail-duplicatable blocks.

Please split out 1) and 2) into two different patches.

For patch 1), I don't think it is the right approach to piggyback the implementation on the outline heuristics; please split it out. Ideally, the fix should simply add one new heuristic, checked before the hasBetterLayoutSuccessor check, or preserve topological order only when tail duplication is not beneficial:

if (hasBetterLayoutSuccessor(...)) {
  if (!IsTailDupCandidate(Succ)) {
    continue;
  }
}
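
The control flow the reviewer suggests can be sketched in isolation. This is only an illustration: `viableSuccessors` and the two predicate parameters are hypothetical stand-ins for the real checks named in the snippet above, and blocks are modelled as plain strings.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using BlockPredicate = std::function<bool(const std::string &)>;

// Keep a successor unless it has a better layout choice elsewhere AND is
// not a tail-duplication candidate. Both predicates are caller-supplied
// placeholders for the real heuristics.
std::vector<std::string>
viableSuccessors(const std::vector<std::string> &succs,
                 const BlockPredicate &hasBetterLayout,
                 const BlockPredicate &isTailDupCandidate) {
  std::vector<std::string> kept;
  for (const std::string &succ : succs) {
    if (hasBetterLayout(succ)) {
      if (!isTailDupCandidate(succ))
        continue;  // rejected: topological order wins for this successor
    }
    kept.push_back(succ);
  }
  return kept;
}
```

The point of the nesting is that the tail-duplication check only acts as an escape hatch: a successor that would normally be rejected by the layout heuristic is still considered when duplicating it is possible.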

This is now MUCH shorter.

I realized with some help from davidxl that I didn't need to tie this to the outlining.

Also, because we need to recognize the pattern that occurs from repeated tail-duplication (So that when we repeat layout, we get the same result), we just recognize the pattern instead of using the delay set, as it's redundant.

I need to rewrite the description, but the code should be much easier to review now. The change to the placement algorithm is now 22 lines, and most of that is a utility function for CFG matching.

Herald edited edge metadata. Dec 13 2016, 5:17 PM

Herald added subscribers: nhaehnle, wdng.

davidxl added inline comments. Dec 14 2016, 3:33 PM

lib/CodeGen/MachineBlockPlacement.cpp
572	Add a documentation line to this method.
623	Add more explanation here (as a comment), possibly with a simple example?

arsenm added a subscriber: arsenm. Dec 15 2016, 1:03 PM

arsenm added inline comments.

test/CodeGen/AMDGPU/convergent-inlineasm.ll
32	Unnecessary whitespace change

iteratee mentioned this in D27742: CodeGen: Allow small copyable blocks to "break" the CFG.. Dec 21 2016, 4:28 PM

Rebase and re-write description

Herald edited edge metadata. Jan 6 2017, 5:11 PM

iteratee retitled this revision from Codegen: Outline for chains of tail-duplicable blocks. to Codegen: Make chains from lattice-shaped CFGs. Jan 6 2017, 5:15 PM

iteratee updated this object.

iteratee edited edge metadata.

Add comments as requested

Herald edited edge metadata. Jan 9 2017, 4:04 PM

More comments.

Herald edited edge metadata. Jan 9 2017, 4:32 PM

What is the base revision of this patch?

Since the patch has been rewritten, is it possible to create a new patch (after D27742 lands) and abandon this one? A clean restart can simplify things a lot.

In D20505#641524, @davidxl wrote:

What is the base revision of this patch?

The base is D27742

Since the patch has been rewritten, is it possible to create a new patch (after D27742 lands) and abandon this one? A clean restart can simplify things a lot.

I don't really want to submit D27742 without this patch. I created a new patch as you requested and marked it as a child of D27742.

Please see https://reviews.llvm.org/D28522

Revision Contents

Path                                                  Size
lib/CodeGen/MachineBlockPlacement.cpp                 54 lines
test/CodeGen/AArch64/branch-relax-cbz.ll              15 lines
test/CodeGen/AArch64/optimize-cond-branch.ll          3 lines
test/CodeGen/AMDGPU/basic-branch.ll                   5 lines
test/CodeGen/AMDGPU/cf-loop-on-constant.ll            2 lines
test/CodeGen/AMDGPU/convergent-inlineasm.ll           1 line
test/CodeGen/AMDGPU/salu-to-valu.ll                   2 lines
test/CodeGen/AMDGPU/skip-if-dead.ll                   7 lines
test/CodeGen/ARM/atomic-cmpxchg.ll                    8 lines
test/CodeGen/ARM/fold-stack-adjust.ll                 2 lines
test/CodeGen/PowerPC/tail-dup-layout.ll               124 lines
test/CodeGen/WebAssembly/mem-intrinsics.ll            2 lines
test/CodeGen/X86/block-placement.ll                   31 lines
test/CodeGen/X86/tail-dup-merge-loop-headers.ll       4 lines
test/CodeGen/X86/tail-dup-repeat.ll                   2 lines
test/CodeGen/X86/tail-opts.ll                         7 lines
test/CodeGen/X86/twoaddr-coalesce-3.ll                4 lines

Diff 83739

lib/CodeGen/MachineBlockPlacement.cpp

@@ 307 lines not shown @@ class MachineBlockPlacement : public MachineFunctionPass {
    SmallPtrSet<MachineBasicBlock *, 4> UnavoidableBlocks;

    /// \brief Allocator and owner of BlockChain structures.
    ///
    /// We build BlockChains lazily while processing the loop structure of
    /// a function. To reduce malloc traffic, we allocate them using this
    /// slab-like allocator, and destroy them after the pass completes. An
    /// important guarantee is that this allocator produces stable pointers to
    /// the chains.
[inline comment, davidxl] This documentation does not help in understanding the meaning. Can you add a reference to a detailed description of the algorithm in another place (e.g. the function definition)?
    SpecificBumpPtrAllocator<BlockChain> ChainAllocator;

    /// \brief Function wide BasicBlock to BlockChain mapping.
    ///
    /// This mapping allows efficiently moving from any given basic block to the
    /// BlockChain it participates in, if any. We use it to, among other things,
    /// allow implicitly defining edges between chains as the existing edges
    /// between basic blocks.
@@ 73 lines not shown @@ #endif
    void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,
                    const BlockFilterSet &LoopBlockSet);
    void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L,
                               const BlockFilterSet &LoopBlockSet);
    void collectMustExecuteBBs();
    void buildCFGChains();
    void optimizeBranches();
    void alignBlocks();
+   /// Returns true if a block should be tail-duplicated to increase fallthrough
+   /// opportunities.
    bool shouldTailDuplicate(MachineBasicBlock *BB);
[inline comment, davidxl] Brief documentation.
+   /// Returns true if a block can tail duplicate into all unplaced
+   /// predecessors. Filters based on loop.
    bool canTailDuplicateUnplacedPreds(
[inline comment, davidxl] Same here.
        MachineBasicBlock *BB, MachineBasicBlock *Succ,
        BlockChain &Chain, const BlockFilterSet *BlockFilter);

  public:
    static char ID; // Pass identification, replacement for typeid
    MachineBlockPlacement() : MachineFunctionPass(ID) {
      initializeMachineBlockPlacementPass(*PassRegistry::getPassRegistry());
    }
@@ 144 lines not shown @@ getAdjustedProbability(BranchProbability OrigProb,
    if (SuccProbN >= SuccProbD)
      SuccProb = BranchProbability::getOne();
    else
      SuccProb = BranchProbability(SuccProbN, SuccProbD);

    return SuccProb;
  }

- /// Check if a block should be tail duplicated.
+ /// Check if \p BB has exactly the successors in \p Successors.
[inline comment, davidxl] Add a documentation line to this method.
+ static bool hasSameSuccessors(
+     MachineBasicBlock &BB, SmallPtrSetImpl<MachineBasicBlock *> &Successors) {
+   if (BB.succ_size() != Successors.size())
+     return false;
+   // We don't want to count self-loops
+   if (Successors.count(&BB))
+     return false;
+   for (MachineBasicBlock *Succ : BB.successors())
+     if (!Successors.count(Succ))
+       return false;
+   return true;
+ }
+
+ /// Check if a block should be tail duplicated to increase fallthrough
+ /// opportunities.
  /// \p BB Block to check.
  bool MachineBlockPlacement::shouldTailDuplicate(MachineBasicBlock *BB) {
    // Blocks with single successors don't create additional fallthrough
    // opportunities. Don't duplicate them. TODO: When conditional exits are
    // analyzable, allow them to be duplicated.
    bool IsSimple = TailDup.isSimpleBB(BB);

    if (BB->succ_size() == 1)
@@ 10 lines not shown @@
  /// We also identify blocks with the CFG that would have been produced by
  /// tail-duplication and lay them out in the same manner.
  bool MachineBlockPlacement::canTailDuplicateUnplacedPreds(
      MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &Chain,
      const BlockFilterSet *BlockFilter) {
    if (!shouldTailDuplicate(Succ))
      return false;

+   // For CFG checking.
+   SmallPtrSet<MachineBasicBlock *, 4> Successors(BB->succ_begin(), BB->succ_end());
    for (MachineBasicBlock *Pred : Succ->predecessors()) {
      // Make sure all unplaced and unfiltered predecessors can be
      // tail-duplicated into.
      if (Pred == BB || (BlockFilter && !BlockFilter->count(Pred))
          || BlockToChain[Pred] == &Chain)
        continue;
-     if (!TailDup.canTailDuplicate(Succ, Pred))
+     if (!TailDup.canTailDuplicate(Succ, Pred)) {
+       if (Successors.size() > 1
[inline comment, davidxl] Add more explanation here (as a comment), possibly with a simple example?
+           && hasSameSuccessors(*Pred, Successors))
+         // This looks like a tail-duplicated block. Skip it.
+         // For example:
+         //  A         A
+         //  |\        |\
+         //  | \       | \
+         //  |  C      |  C
+         //  | /       |  |
+         //  |/        |  |
+         //  B    =>   B  |
+         //  |\        |\/|
+         //  | \       |/\|
+         //  |  D      |  D
+         //  | /       | /
+         //  |/        |/
+         //  E         E
+         //
+         // After B was duplicated into C, the layout looks like the one on the
+         // right. B and C now have the same successors. When considering whether
+         // E can be duplicated into all its unplaced predecessors, we ignore C.
+         // This allows lattices to be laid out in 2 separate chains (ABE...) and
+         // later (CD...) This is a reasonable heuristic because it allows the
+         // creation of 2 fallthrough paths with links between them.
+         // We look for the CFG pattern rather than recording the blocks because
+         // we want layout to be repeatable, and if some other pass does the
+         // tail-duplication, we want to lay it out the same way.
+         continue;
        return false;
+     }
    }
    return true;
  }

  /// When the option OutlineOptionalBranches is on, this method
  /// checks if the fallthrough candidate block \p Succ (of block
  /// \p BB) also has other unscheduled predecessor blocks which
  /// are also successors of \p BB (forming triangular shape CFG).
  /// If none of such predecessors are small, it returns true.
@@ 54 lines not shown @@ static BranchProbability getLayoutSuccessorProbThreshold(
    return BranchProbability(ProfileLikelyProb, 100);
  }

  /// Checks to see if the layout candidate block \p Succ has a better layout
  /// predecessor than \c BB. If yes, returns true.
  bool MachineBlockPlacement::hasBetterLayoutPredecessor(
      MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain,
      BranchProbability SuccProb, BranchProbability RealSuccProb,
      BlockChain &Chain, const BlockFilterSet *BlockFilter) {
[inline comment, echristo] Sadly I'm not sure if you're adding or deleting whitespace here. Either way feel free to do it separately.

    // There isn't a better layout when there are no unscheduled predecessors.
    if (SuccChain.UnscheduledPredecessors == 0)
      return false;

    // As a heuristic, if we can duplicate the block into all its unscheduled
    // predecessors, we return false.
    if (TailDupPlacement
@@ 15 lines not shown @@ bool MachineBlockPlacement::hasBetterLayoutPredecessor(
    // With this layout, Pred BB
    // is forced to be outlined, so the overall cost will be cost of the
    // branch taken from BB to Pred, plus the cost of back taken branch
    // from Pred to Succ, as well as the additional cost associated
    // with the needed unconditional jump instruction from Pred To Succ.

    // The cost of the topological order layout is the taken branch cost
    // from BB to Succ, so to make BB->Succ a viable candidate, the following
    // must hold:
    //     2 * freq(BB->Pred) * taken_branch_cost + unconditional_jump_cost
[inline comment, echristo] Go ahead and commit this separately (along with the one below). Also "mismatch" and you shouldn't need the \n.
    //      < freq(BB->Succ) * taken_branch_cost.
    // Ignoring unconditional jump cost, we get
    //    freq(BB->Succ) > 2 * freq(BB->Pred), i.e.,
    //    prob(BB->Succ) > 2 * prob(BB->Pred)
    //
    // When real profile data is available, we can precisely compute the
    // probability threshold that is needed for edge BB->Succ to be considered.
    // Without profile data, the heuristic requires the branch bias to be
@@ 51 lines not shown @@ bool MachineBlockPlacement::hasBetterLayoutPredecessor(
    //   |  \ /  |
    //   |   X   |
    //   |  / \  |
    //   | /   \ |
    //   S1     S2
    //
    // The current block is BB and edge BB->S1 is now being evaluated.
    // As above S->BB was already selected because
    // prob(S->BB) > prob(S->Pred). Assume that prob(BB->S1) >= prob(BB->S2).
    //
[inline comment, echristo] Can you document everything that's going on here more please? In particular, what's going on with the callback here and why it needs to be a callback rather than happening on the spot.
[inline comment, iteratee] I've added a comment to this effect, but the reason it has to be a callback is because none of the things that occur would be valid after deleting the block. (use after free). As to the rest, the function is broken up into small chunks with a comment as to what each chunk is doing. Is there something more you'd like to see?
    // topo-order:
    //
    //     S-------|                     ---S
    //     |       |                     |  |
    //  ---BB      |                     |  BB
    //  |          |                     |  |
    //  |  Pred----|                     |  S1----
    //  |  |                             |       |
@@ 275 lines not shown @@ if (!BestSucc) {
          break;

        DEBUG(dbgs() << "Unnatural loop CFG detected, forcibly merging the "
                        "layout successor until the CFG reduces\n");
      }

      // Placement may have changed tail duplication opportunities.
      // Check for that now.
      if (TailDupPlacement && BestSucc) {
[inline comment, davidxl] Please outline this big part into its own method.
[inline comment, iteratee] This was pulled into D18226 and placed in its own method there.
        // If the chosen successor was duplicated into all its predecessors,
        // don't bother laying it out, just go round the loop again with BB as
        // the chain end.
        if (repeatedlyTailDuplicateBlock(BestSucc, BB, LoopHeaderBB, Chain,
                                         BlockFilter, PrevUnplacedBlockIt))
          continue;
      }

▲ Show 20 Lines • Show All 153 Lines • ▼ Show 20 Lines	for (MachineBasicBlock *Succ : MBB->successors()) {
if (ExitLoop->contains(&L))		if (ExitLoop->contains(&L))
BlocksExitingToOuterLoop.insert(MBB);		BlocksExitingToOuterLoop.insert(MBB);
}		}

BlockFrequency ExitEdgeFreq = MBFI->getBlockFreq(MBB) * SuccProb;		BlockFrequency ExitEdgeFreq = MBFI->getBlockFreq(MBB) * SuccProb;
DEBUG(dbgs() << " exiting: " << getBlockName(MBB) << " -> "		DEBUG(dbgs() << " exiting: " << getBlockName(MBB) << " -> "
<< getBlockName(Succ) << " [L:" << SuccLoopDepth << "] (";		<< getBlockName(Succ) << " [L:" << SuccLoopDepth << "] (";
MBFI->printBlockFreq(dbgs(), ExitEdgeFreq) << ")\n");		MBFI->printBlockFreq(dbgs(), ExitEdgeFreq) << ")\n");
// Note that we bias this toward an existing layout successor to retain		// Note that we bias this toward an existing layout successor to retain
		echristoUnsubmitted Done Reply Inline Actions Once again I can't remember if we have autobrief turned on or not... echristo: Once again I can't remember if we have autobrief turned on or not...
		iterateeAuthorUnsubmitted Done Reply Inline Actions Even if we do, it's probably better to match the existing style for this change, and clean it up in a separate patch. iteratee: Even if we do, it's probably better to match the existing style for this change, and clean it…
// incoming order in the absence of better information. The exit must have		// incoming order in the absence of better information. The exit must have
// a frequency higher than the current exit before we consider breaking		// a frequency higher than the current exit before we consider breaking
// the layout.		// the layout.
BranchProbability Bias(100 - ExitBlockBias, 100);		BranchProbability Bias(100 - ExitBlockBias, 100);
if (!ExitingBB || SuccLoopDepth > BestExitLoopDepth ||		if (!ExitingBB || SuccLoopDepth > BestExitLoopDepth ||
ExitEdgeFreq > BestExitEdgeFreq ||		ExitEdgeFreq > BestExitEdgeFreq ||
(MBB->isLayoutSuccessor(Succ) &&		(MBB->isLayoutSuccessor(Succ) &&
!(ExitEdgeFreq < BestExitEdgeFreq * Bias))) {		!(ExitEdgeFreq < BestExitEdgeFreq * Bias))) {
BestExitEdgeFreq = ExitEdgeFreq;		BestExitEdgeFreq = ExitEdgeFreq;
		echristo (inline comment, Done): Formatting.
ExitingBB = MBB;		ExitingBB = MBB;
}		}
}		}

if (!HasLoopingSucc) {		if (!HasLoopingSucc) {
// Restore the old exiting state, no viable looping successor was found.		// Restore the old exiting state, no viable looping successor was found.
ExitingBB = OldExitingBB;		ExitingBB = OldExitingBB;
BestExitEdgeFreq = OldBestExitEdgeFreq;		BestExitEdgeFreq = OldBestExitEdgeFreq;
}		}
}		}
// Without a candidate exiting block or with only a single block in the		// Without a candidate exiting block or with only a single block in the
// loop, just use the loop header to layout the loop.		// loop, just use the loop header to layout the loop.
if (!ExitingBB) {		if (!ExitingBB) {
DEBUG(dbgs() << " No other candidate exit blocks, using loop header\n");		DEBUG(dbgs() << " No other candidate exit blocks, using loop header\n");
return nullptr;		return nullptr;
		echristo (inline comment, Done): Coding style nit: no braces around single lines.
}		}
if (L.getNumBlocks() == 1) {		if (L.getNumBlocks() == 1) {
DEBUG(dbgs() << " Loop has 1 block, using loop header as exit\n");		DEBUG(dbgs() << " Loop has 1 block, using loop header as exit\n");
return nullptr;		return nullptr;
}		}

// Also, if we have exit blocks which lead to outer loops but didn't select		// Also, if we have exit blocks which lead to outer loops but didn't select
// one of them as the exiting block we are rotating toward, disable loop		// one of them as the exiting block we are rotating toward, disable loop
// ... 918 lines elided
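
The exit-selection condition in the hunk above is easier to review in isolation. The following is a minimal standalone sketch of that predicate, not the LLVM code itself: `BestExit` and `shouldReplaceExit` are hypothetical names, frequencies are plain integers, and the bias comparison uses exact integer arithmetic, whereas the real code multiplies a `BlockFrequency` by a `BranchProbability`, which may round differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the best exit found so far; not an LLVM type.
struct BestExit {
  bool HasExitingBB;       // mirrors the `ExitingBB != nullptr` test
  unsigned LoopDepth;      // mirrors BestExitLoopDepth
  std::uint64_t EdgeFreq;  // mirrors BestExitEdgeFreq
};

// Returns true when the candidate exit edge should replace the current best.
// BiasPercent plays the role of ExitBlockBias and must be in [0, 100].
bool shouldReplaceExit(const BestExit &Best, unsigned SuccLoopDepth,
                       std::uint64_t ExitEdgeFreq, bool IsLayoutSuccessor,
                       unsigned BiasPercent) {
  assert(BiasPercent <= 100 && "bias is a percentage");
  // No candidate yet, a deeper target loop, or a strictly hotter edge
  // always wins, regardless of layout.
  if (!Best.HasExitingBB || SuccLoopDepth > Best.LoopDepth ||
      ExitEdgeFreq > Best.EdgeFreq)
    return true;
  // Exact integer form of !(ExitEdgeFreq < BestExitEdgeFreq * Bias) with
  // Bias = (100 - BiasPercent) / 100. Assumes the products fit in 64 bits.
  return IsLayoutSuccessor &&
         ExitEdgeFreq * 100 >= Best.EdgeFreq * (100 - BiasPercent);
}
```

With a bias of 10 and a current best frequency of 100, a candidate at frequency 95 replaces the best only if it is already the layout successor; the same candidate reached by a non-fallthrough edge is rejected, which is the "retain incoming order" behaviour the comment describes.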

test/CodeGen/AArch64/branch-relax-cbz.ll

	; RUN: llc -mtriple=aarch64-apple-darwin -aarch64-cbz-offset-bits=3 < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-apple-darwin -aarch64-cbz-offset-bits=3 < %s | FileCheck %s

	; CHECK-LABEL: _split_block_no_fallthrough:			; CHECK-LABEL: _split_block_no_fallthrough:
	; CHECK: cmn x{{[0-9]+}}, #5			; CHECK: cmn x{{[0-9]+}}, #5
	; CHECK-NEXT: b.le [[B2:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: b.le [[B2:LBB[0-9]+_[0-9]+]]

	; CHECK-NEXT: ; BB#1: ; %b3			; CHECK-NEXT: ; BB#1: ; %b3
	; CHECK: ldr [[LOAD:w[0-9]+]]			; CHECK: ldr [[LOAD:w[0-9]+]]
	; CHECK: cbz [[LOAD]], [[SKIP_LONG_B:LBB[0-9]+_[0-9]+]]			; CHECK: cbnz [[LOAD]], [[B8:LBB[0-9]+_[0-9]+]]
	; CHECK-NEXT: b [[B8:LBB[0-9]+_[0-9]+]]

	; CHECK-NEXT: [[SKIP_LONG_B]]:
	; CHECK-NEXT: b [[B7:LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: b [[B7:LBB[0-9]+_[0-9]+]]

				; CHECK-NEXT: [[B8]]: ; %b8
				; CHECK-NEXT: ret

	; CHECK-NEXT: [[B2]]: ; %b2			; CHECK-NEXT: [[B2]]: ; %b2
	; CHECK: mov w{{[0-9]+}}, #93			; CHECK: mov w{{[0-9]+}}, #93
	; CHECK: bl _extfunc			; CHECK: bl _extfunc
	; CHECK: cbz w{{[0-9]+}}, [[B7]]			; CHECK: cbz w{{[0-9]+}}, [[B7]]
				; CHECK-NEXT: b [[B8]]

	; CHECK-NEXT: [[B8]]: ; %b8
	; CHECK-NEXT: ret

	; CHECK-NEXT: [[B7]]: ; %b7
	; CHECK: mov w{{[0-9]+}}, #13
	; CHECK: b _extfunc
	define void @split_block_no_fallthrough(i64 %val) #0 {			define void @split_block_no_fallthrough(i64 %val) #0 {
	bb:			bb:
	%c0 = icmp sgt i64 %val, -5			%c0 = icmp sgt i64 %val, -5
	br i1 %c0, label %b3, label %b2			br i1 %c0, label %b3, label %b2

	b2:			b2:
	%v0 = tail call i32 @extfunc(i32 93)			%v0 = tail call i32 @extfunc(i32 93)
	%c1 = icmp eq i32 %v0, 0			%c1 = icmp eq i32 %v0, 0
	; ... 18 lines elided

test/CodeGen/AArch64/optimize-cond-branch.ll

	; RUN: llc -verify-machineinstrs -o - %s | FileCheck %s			; RUN: llc -verify-machineinstrs -o - %s | FileCheck %s
	target triple = "arm64--"			target triple = "arm64--"

	; AArch64InstrInfo::optimizeCondBranch() optimizes the			; AArch64InstrInfo::optimizeCondBranch() optimizes the
	; "x = and y, 256; cmp x, 0; br" from an "and; cbnz" to a tbnz instruction.			; "x = and y, 256; cmp x, 0; br" from an "and; cbnz" to a tbnz instruction.
	; It forgot to clear the a flag resulting in a MachineVerifier complaint.			; It forgot to clear the a flag resulting in a MachineVerifier complaint.
	;			;
	; Writing a stable/simple test is tricky since most tbz instructions are already			; Writing a stable/simple test is tricky since most tbz instructions are already
	; formed in SelectionDAG, optimizeCondBranch() only triggers if the and			; formed in SelectionDAG, optimizeCondBranch() only triggers if the and
	; instruction is in a different block than the conditional jump.			; instruction is in a different block than the conditional jump.
	;			;
	; CHECK-LABEL: func			; CHECK-LABEL: func
	; CHECK-NOT: and			; CHECK-NOT: and
	; CHECK: tbnz			; Layout reverses the test.
				; CHECK: tbz
	define void @func() {			define void @func() {
	%c0 = icmp sgt i64 0, 0			%c0 = icmp sgt i64 0, 0
	br i1 %c0, label %b1, label %b6			br i1 %c0, label %b1, label %b6

	b1:			b1:
	br i1 undef, label %b3, label %b2			br i1 undef, label %b3, label %b2

	b2:			b2:
	; ... 26 lines elided

test/CodeGen/AMDGPU/basic-branch.ll

	; RUN: llc -O0 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s			; RUN: llc -O0 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s
	; RUN: llc -O0 -march=amdgcn -mcpu=tonga -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s			; RUN: llc -O0 -march=amdgcn -mcpu=tonga -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCNOPT -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCNOPT -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCNOPT -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCNOPT -check-prefix=GCN %s

	; GCN-LABEL: {{^}}test_branch:			; GCN-LABEL: {{^}}test_branch:
	; GCNNOOPT: v_writelane_b32			; GCNNOOPT: v_writelane_b32
	; GCNNOOPT: v_writelane_b32			; GCNNOOPT: v_writelane_b32
	; GCN: s_cbranch_scc1 [[END:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_scc1 [[END:BB[0-9]+_[0-9]+]]


	; GCN: ; BB#1
	; GCNNOOPT: v_readlane_b32			; GCNNOOPT: v_readlane_b32
	; GCNNOOPT: v_readlane_b32			; GCNNOOPT: v_readlane_b32
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCNOPT-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCNNOOPT: s_endpgm
	; TODO: This waitcnt can be eliminated

	; GCN: {{^}}[[END]]:			; GCN: {{^}}[[END]]:
	; GCN: s_endpgm			; GCN: s_endpgm
	define void @test_branch(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 %val) #0 {			define void @test_branch(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 %val) #0 {
	%cmp = icmp ne i32 %val, 0			%cmp = icmp ne i32 %val, 0
	br i1 %cmp, label %store, label %end			br i1 %cmp, label %store, label %end

	store:			store:
	; ... 32 lines elided

test/CodeGen/AMDGPU/cf-loop-on-constant.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -verify-machineinstrs -O0 < %s			; RUN: llc -march=amdgcn -verify-machineinstrs -O0 < %s

	; GCN-LABEL: {{^}}test_loop:			; GCN-LABEL: {{^}}test_loop:
	; GCN: [[LABEL:BB[0-9+]_[0-9]+]]:			; GCN: [[LABEL:BB[0-9+]_[0-9]+]]: ; %for.body{{$}}
	; GCN: ds_read_b32			; GCN: ds_read_b32
	; GCN: ds_write_b32			; GCN: ds_write_b32
	; GCN: s_branch [[LABEL]]			; GCN: s_branch [[LABEL]]
	; GCN: s_endpgm			; GCN: s_endpgm
	define void @test_loop(float addrspace(3)* %ptr, i32 %n) nounwind {			define void @test_loop(float addrspace(3)* %ptr, i32 %n) nounwind {
	entry:			entry:
	%cmp = icmp eq i32 %n, -1			%cmp = icmp eq i32 %n, -1
	br i1 %cmp, label %for.exit, label %for.body			br i1 %cmp, label %for.exit, label %for.body
	; ... 110 lines elided

test/CodeGen/AMDGPU/convergent-inlineasm.ll

	; ... 23 lines elided

	; GCN-LABEL: {{^}}nonconvergent_inlineasm:			; GCN-LABEL: {{^}}nonconvergent_inlineasm:
	; GCN: ; mask branch			; GCN: ; mask branch

	; GCN: BB{{[0-9]+_[0-9]+}}:			; GCN: BB{{[0-9]+_[0-9]+}}:
	; GCN: v_cmp_ne_u32_e64			; GCN: v_cmp_ne_u32_e64

	; GCN: BB{{[0-9]+_[0-9]+}}:			; GCN: BB{{[0-9]+_[0-9]+}}:

				arsenm (inline comment, Not Done): Unnecessary whitespace change
	define void @nonconvergent_inlineasm(i64 addrspace(1)* nocapture %arg) {			define void @nonconvergent_inlineasm(i64 addrspace(1)* nocapture %arg) {
	bb:			bb:
	%tmp = call i32 @llvm.amdgcn.workitem.id.x()			%tmp = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp1 = tail call i64 asm "v_cmp_ne_u32_e64 $0, 0, $1", "=s,v"(i32 1)			%tmp1 = tail call i64 asm "v_cmp_ne_u32_e64 $0, 0, $1", "=s,v"(i32 1)
	%tmp2 = icmp eq i32 %tmp, 8			%tmp2 = icmp eq i32 %tmp, 8
	br i1 %tmp2, label %bb3, label %bb5			br i1 %tmp2, label %bb3, label %bb5

	bb3: ; preds = %bb			bb3: ; preds = %bb
	; ... 10 lines elided

test/CodeGen/AMDGPU/salu-to-valu.ll

	; ... 433 lines elided
	; {{^}}sopc_vopc_legalize_bug:			; {{^}}sopc_vopc_legalize_bug:
	; GCN: s_load_dword [[SGPR:s[0-9]+]]			; GCN: s_load_dword [[SGPR:s[0-9]+]]
	; GCN: v_cmp_le_u32_e32 vcc, [[SGPR]], v{{[0-9]+}}			; GCN: v_cmp_le_u32_e32 vcc, [[SGPR]], v{{[0-9]+}}
	; GCN: s_and_b64 vcc, exec, vcc			; GCN: s_and_b64 vcc, exec, vcc
	; GCN: s_cbranch_vccnz [[EXIT:[A-Z0-9_]+]]			; GCN: s_cbranch_vccnz [[EXIT:[A-Z0-9_]+]]
	; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1			; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
	; GCN-NOHSA: buffer_store_dword [[ONE]]			; GCN-NOHSA: buffer_store_dword [[ONE]]
	; GCN-HSA: flat_store_dword v[{{[0-9]+:[0-9]+}}], [[ONE]]			; GCN-HSA: flat_store_dword v[{{[0-9]+:[0-9]+}}], [[ONE]]
	; GCN; {{^}}[[EXIT]]:			; GCN: {{^}}[[EXIT]]:
	; GCN: s_endpgm			; GCN: s_endpgm
	define void @sopc_vopc_legalize_bug(i32 %cond, i32 addrspace(1)* %out, i32 addrspace(1)* %in) {			define void @sopc_vopc_legalize_bug(i32 %cond, i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
	bb3: ; preds = %bb2			bb3: ; preds = %bb2
	%tmp0 = bitcast i32 %cond to float			%tmp0 = bitcast i32 %cond to float
	%tmp1 = fadd float %tmp0, 2.500000e-01			%tmp1 = fadd float %tmp0, 2.500000e-01
	%tmp2 = bitcast float %tmp1 to i32			%tmp2 = bitcast float %tmp1 to i32
	%tmp3 = icmp ult i32 %tmp2, %cond			%tmp3 = icmp ult i32 %tmp2, %cond
	br i1 %tmp3, label %bb6, label %bb7			br i1 %tmp3, label %bb6, label %bb7
	; ... 57 lines elided

test/CodeGen/AMDGPU/skip-if-dead.ll

	; ... 261 lines elided
	; CHECK: exp			; CHECK: exp
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm

	; CHECK: [[KILLBB:BB[0-9]+_[0-9]+]]:			; CHECK: [[KILLBB:BB[0-9]+_[0-9]+]]:
	; CHECK-NEXT: s_cbranch_scc1 [[BB8:BB[0-9]+_[0-9]+]]			; CHECK-NEXT: s_cbranch_scc1 [[BB8:BB[0-9]+_[0-9]+]]

	; CHECK: v_cmp_eq_f32_e32 vcc, 0, [[PHIREG]]			; CHECK: v_cmp_eq_f32_e32 vcc, 0, [[PHIREG]]
	; CHECK-NEXT: s_cbranch_vccnz [[BB10:BB[0-9]+_[0-9]+]]			; CHECK-NEXT: s_cbranch_vccnz [[BB10:BB[0-9]+_[0-9]+]]
	; CHECK-NEXT: s_branch [[END:BB[0-9]+_[0-9]+]]
				; CHECK: [[END:BB[0-9]+_[0-9]+]]: ; %end
				; CHECK-NEXT: s_endpgm

	; CHECK [[BB8]]: ; %BB8			; CHECK [[BB8]]: ; %BB8
	; CHECK: v_mov_b32_e32 v{{[0-9]+}}, 8			; CHECK: v_mov_b32_e32 v{{[0-9]+}}, 8
	; CHECK: buffer_store_dword			; CHECK: buffer_store_dword
	; CHECK: v_cmp_eq_f32_e32 vcc, 0, [[PHIREG]]			; CHECK: v_cmp_eq_f32_e32 vcc, 0, [[PHIREG]]
	; CHECK-NEXT: s_cbranch_vccz [[END]]			; CHECK-NEXT: s_cbranch_vccz [[END]]

	; CHECK: [[BB10]]: ; %bb10			; CHECK: [[BB10]]: ; %bb10
	; CHECK: v_mov_b32_e32 v{{[0-9]+}}, 9			; CHECK: v_mov_b32_e32 v{{[0-9]+}}, 9
	; CHECK: buffer_store_dword			; CHECK: buffer_store_dword
				; CHECK: s_endpgm

	; CHECK: [[END:BB[0-9]+_[0-9]+]]: ; %end
	; CHECK-NEXT: s_endpgm

	define amdgpu_ps void @phi_use_def_before_kill() #0 {			define amdgpu_ps void @phi_use_def_before_kill() #0 {
	bb:			bb:
	%tmp = fadd float undef, 1.000000e+00			%tmp = fadd float undef, 1.000000e+00
	%tmp1 = fcmp olt float 0.000000e+00, %tmp			%tmp1 = fcmp olt float 0.000000e+00, %tmp
	%tmp2 = select i1 %tmp1, float -1.000000e+00, float 0.000000e+00			%tmp2 = select i1 %tmp1, float -1.000000e+00, float 0.000000e+00
	call void @llvm.AMDGPU.kill(float %tmp2)			call void @llvm.AMDGPU.kill(float %tmp2)
	br i1 undef, label %phibb, label %bb8			br i1 undef, label %phibb, label %bb8
	; ... 104 lines elided

test/CodeGen/ARM/atomic-cmpxchg.ll

	; ... 60 lines elided

	; CHECK-ARMV7-LABEL: test_cmpxchg_res_i8:			; CHECK-ARMV7-LABEL: test_cmpxchg_res_i8:
	; CHECK-ARMV7-NEXT: .fnstart			; CHECK-ARMV7-NEXT: .fnstart
	; CHECK-ARMV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1			; CHECK-ARMV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
	; CHECK-ARMV7-NEXT: b [[TRY:.LBB[0-9_]+]]			; CHECK-ARMV7-NEXT: b [[TRY:.LBB[0-9_]+]]
	; CHECK-ARMV7-NEXT: [[HEAD:.LBB[0-9_]+]]:			; CHECK-ARMV7-NEXT: [[HEAD:.LBB[0-9_]+]]:
	; CHECK-ARMV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]			; CHECK-ARMV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
	; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], #0			; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], #0
	; CHECK-ARMV7-NEXT: moveq [[RES:r[0-9]+]], #1			; CHECK-ARMV7-NEXT: moveq r0, #1
	; CHECK-ARMV7-NEXT: bxeq lr			; CHECK-ARMV7-NEXT: bxeq lr
	; CHECK-ARMV7-NEXT: [[TRY]]:			; CHECK-ARMV7-NEXT: [[TRY]]:
	; CHECK-ARMV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]			; CHECK-ARMV7-NEXT: ldrexb [[SUCCESS]], [r0]
	; CHECK-ARMV7-NEXT: cmp [[LD]], [[DESIRED]]			; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], r1
	; CHECK-ARMV7-NEXT: beq [[HEAD]]			; CHECK-ARMV7-NEXT: beq [[HEAD]]
	; CHECK-ARMV7-NEXT: clrex			; CHECK-ARMV7-NEXT: clrex
	; CHECK-ARMV7-NEXT: mov [[RES]], #0			; CHECK-ARMV7-NEXT: mov r0, #0
	; CHECK-ARMV7-NEXT: bx lr			; CHECK-ARMV7-NEXT: bx lr

	; CHECK-THUMBV7-LABEL: test_cmpxchg_res_i8:			; CHECK-THUMBV7-LABEL: test_cmpxchg_res_i8:
	; CHECK-THUMBV7-NEXT: .fnstart			; CHECK-THUMBV7-NEXT: .fnstart
	; CHECK-THUMBV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1			; CHECK-THUMBV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
	; CHECK-THUMBV7-NEXT: b [[TRYLD:.LBB[0-9_]+]]			; CHECK-THUMBV7-NEXT: b [[TRYLD:.LBB[0-9_]+]]
	; CHECK-THUMBV7-NEXT: [[TRYST:.LBB[0-9_]+]]:			; CHECK-THUMBV7-NEXT: [[TRYST:.LBB[0-9_]+]]:
	; CHECK-THUMBV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]			; CHECK-THUMBV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
	; ... 11 lines elided

test/CodeGen/ARM/fold-stack-adjust.ll

	; ... 129 lines elided

	; PR18136: there was a bug determining where the first eligible pop in a			; PR18136: there was a bug determining where the first eligible pop in a
	; basic-block was when the entire block was epilogue code.			; basic-block was when the entire block was epilogue code.
	define void @test_fold_point(i1 %tst) minsize {			define void @test_fold_point(i1 %tst) minsize {
	; CHECK-LABEL: test_fold_point:			; CHECK-LABEL: test_fold_point:

	; Important to check for beginning of basic block, because if it gets			; Important to check for beginning of basic block, because if it gets
	; if-converted the test is probably no longer checking what it should.			; if-converted the test is probably no longer checking what it should.
	; CHECK: {{LBB[0-9]+_2}}:			; CHECK: %end
	; CHECK-NEXT: vpop {d7, d8}			; CHECK-NEXT: vpop {d7, d8}
	; CHECK-NEXT: pop {r4, pc}			; CHECK-NEXT: pop {r4, pc}

	; With a guaranteed frame-pointer, we want to make sure that its offset in the			; With a guaranteed frame-pointer, we want to make sure that its offset in the
	; push block is correct, even if a few registers have been tacked onto a later			; push block is correct, even if a few registers have been tacked onto a later
	; vpush (PR18160).			; vpush (PR18160).
	; CHECK-IOS-LABEL: test_fold_point:			; CHECK-IOS-LABEL: test_fold_point:
	; CHECK-IOS: push {r4, r7, lr}			; CHECK-IOS: push {r4, r7, lr}
	; ... 89 lines elided

test/CodeGen/PowerPC/tail-dup-layout.ll

; RUN: llc -outline-optional-branches -O2 < %s | FileCheck %s		; RUN: llc -O2 < %s | FileCheck %s
davidxl (inline comment, Not Done): This test case could use some simplification. Why not just do a simple function call in the optional branches? The test blocks can also be simplified, for instance by testing input parameters.
target datalayout = "e-m:e-i64:64-n32:64"		target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-grtev4-linux-gnu"		target triple = "powerpc64le-grtev4-linux-gnu"

; Intended layout:		; Intended layout:
; The outlining flag produces the layout		; The chain-based outlining produces the layout
; test1		; test1
; test2		; test2
; test3		; test3
; test4		; test4
; exit
; optional1		; optional1
; optional2		; optional2
; optional3		; optional3
; optional4		; optional4
		; exit
; Tail duplication puts test n+1 at the end of optional n		; Tail duplication puts test n+1 at the end of optional n
; so optional1 includes a copy of test2 at the end, and branches		; so optional1 includes a copy of test2 at the end, and branches
; to test3 (at the top) or falls through to optional 2.		; to test3 (at the top) or falls through to optional 2.
; The CHECK statements check for the whole string of tests and exit block,		; The CHECK statements check for the whole string of tests
; and then check that the correct test has been duplicated into the end of		; and then check that the correct test has been duplicated into the end of
; the optional blocks and that the optional blocks are in the correct order.		; the optional blocks and that the optional blocks are in the correct order.
;CHECK-LABEL: straight_test:		;CHECK-LABEL: straight_test:
; test1 may have been merged with entry		; test1 may have been merged with entry
;CHECK: mr [[TAGREG:[0-9]+]], 3		;CHECK: mr [[TAGREG:[0-9]+]], 3
;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1		;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
;CHECK-NEXT: bc 12, 1, [[OPT1LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bc 12, 1, [[OPT1LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST2LABEL:[._0-9A-Za-z]+]]: # %test2		;CHECK-NEXT: [[TEST2LABEL:[._0-9A-Za-z]+]]: # %test2
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
;CHECK-NEXT: bne 0, [[OPT2LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, [[OPT2LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3		;CHECK-NEXT: [[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
;CHECK-NEXT: bne 0, .[[OPT3LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT3LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: [[TEST4LABEL:[._0-9A-Za-z]+]]: # %test4		;CHECK-NEXT: [[TEST4LABEL:[._0-9A-Za-z]+]]: # %test4
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28		;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
;CHECK-NEXT: bne 0, .[[OPT4LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, .[[OPT4LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit		;CHECK-NEXT: b [[EXITLABEL:[._0-9A-Za-z]+]]
;CHECK: blr
;CHECK-NEXT: [[OPT1LABEL]]		;CHECK-NEXT: [[OPT1LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
;CHECK-NEXT: beq 0, [[TEST3LABEL]]		;CHECK-NEXT: beq 0, [[TEST3LABEL]]
;CHECK-NEXT: [[OPT2LABEL]]		;CHECK-NEXT: [[OPT2LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
;CHECK-NEXT: beq 0, [[TEST4LABEL]]		;CHECK-NEXT: beq 0, [[TEST4LABEL]]
;CHECK-NEXT: [[OPT3LABEL]]		;CHECK-NEXT: [[OPT3LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
;CHECK-NEXT: beq 0, [[EXITLABEL]]		;CHECK-NEXT: beq 0, [[EXITLABEL]]
;CHECK-NEXT: [[OPT4LABEL]]		;CHECK-NEXT: [[OPT4LABEL]]
;CHECK: b [[EXITLABEL]]		;CHECK: [[EXITLABEL]]: # %exit
		;CHECK: blr

define void @straight_test(i32 %tag) {		define void @straight_test(i32 %tag) {
entry:		entry:
br label %test1		br label %test1
test1:		test1:
%tagbit1 = and i32 %tag, 1		%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1		br i1 %tagbit1eq0, label %test2, label %optional1
; ... 32 lines elided
optional4:
call void @d()		call void @d()
call void @d()		call void @d()
call void @d()		call void @d()
br label %exit		br label %exit
exit:		exit:
ret void		ret void
}		}

		; Intended layout:
		; The chain-based outlining produces the layout
		; entry
		; --- Begin loop ---
		; for.latch
		; for.check
		; test1
		; test2
		; test3
		; test4
		; optional1
		; optional2
		; optional3
		; optional4
		; --- End loop ---
		; exit
		; The CHECK statements check for the whole string of tests and exit block,
		; and then check that the correct test has been duplicated into the end of
		; the optional blocks and that the optional blocks are in the correct order.
		;CHECK-LABEL: loop_test:
		;CHECK: add [[TAGPTRREG:[0-9]+]], 3, 4
		;CHECK: [[LATCHLABEL:[._0-9A-Za-z]+]]: # %for.latch
		;CHECK: addi
		;CHECK: [[CHECKLABEL:[._0-9A-Za-z]+]]: # %for.check
		;CHECK: lwz [[TAGREG:[0-9]+]], 0([[TAGPTRREG]])
		;CHECK: # %test1
		;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
		;CHECK-NEXT: bc 12, 1, [[OPT1LABEL:[._0-9A-Za-z]+]]
		;CHECK-NEXT: # %test2
		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
		;CHECK-NEXT: bne 0, [[OPT2LABEL:[._0-9A-Za-z]+]]
		;CHECK-NEXT: [[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3
		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
		;CHECK-NEXT: bne 0, [[OPT3LABEL:[._0-9A-Za-z]+]]
		;CHECK-NEXT: [[TEST4LABEL:[._0-9A-Za-z]+]]: # %{{(test4|optional3)}}
		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
		;CHECK-NEXT: beq 0, [[LATCHLABEL]]
		;CHECK-NEXT: b [[OPT4LABEL:[._0-9A-Za-z]+]]
		;CHECK: [[OPT1LABEL]]
		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
		;CHECK-NEXT: beq 0, [[TEST3LABEL]]
		;CHECK-NEXT: [[OPT2LABEL]]
		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
		;CHECK-NEXT: beq 0, [[TEST4LABEL]]
		;CHECK-NEXT: [[OPT3LABEL]]
		;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
		;CHECK-NEXT: beq 0, [[LATCHLABEL]]
		;CHECK-NEXT: [[OPT4LABEL]]
		;CHECK: b [[LATCHLABEL]]
		define void @loop_test(i32* %tags, i32 %count) {
		entry:
		br label %for.check
		for.check:
		%count.loop = phi i32 [%count, %entry], [%count.sub, %for.latch]
		%done.count = icmp ugt i32 %count.loop, 0
		%tag_ptr = getelementptr inbounds i32, i32* %tags, i32 %count
		%tag = load i32, i32* %tag_ptr
		%done.tag = icmp eq i32 %tag, 0
		%done = and i1 %done.count, %done.tag
		br i1 %done, label %test1, label %exit
		test1:
		%tagbit1 = and i32 %tag, 1
		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
		br i1 %tagbit1eq0, label %test2, label %optional1
		optional1:
		call void @a()
		call void @a()
		call void @a()
		call void @a()
		br label %test2
		test2:
		%tagbit2 = and i32 %tag, 2
		%tagbit2eq0 = icmp eq i32 %tagbit2, 0
		br i1 %tagbit2eq0, label %test3, label %optional2
		optional2:
		call void @b()
		call void @b()
		call void @b()
		call void @b()
		br label %test3
		test3:
		%tagbit3 = and i32 %tag, 4
		%tagbit3eq0 = icmp eq i32 %tagbit3, 0
		br i1 %tagbit3eq0, label %test4, label %optional3
		optional3:
		call void @c()
		call void @c()
		call void @c()
		call void @c()
		br label %test4
		test4:
		%tagbit4 = and i32 %tag, 8
		%tagbit4eq0 = icmp eq i32 %tagbit4, 0
		br i1 %tagbit4eq0, label %for.latch, label %optional4
		optional4:
		call void @d()
		call void @d()
		call void @d()
		call void @d()
		br label %for.latch
		for.latch:
		%count.sub = sub i32 %count.loop, 1
		br label %for.check
		exit:
		ret void
		}

; The block then2 is not unavoidable, but since it can be tail-duplicated, it		; The block then2 is not unavoidable, but since it can be tail-duplicated, it
; should be placed as a fallthrough from test2 and copied.		; should be placed as a fallthrough from test2 and copied.
; CHECK-LABEL: avoidable_test:		; CHECK-LABEL: avoidable_test:
; CHECK: # %entry		; CHECK: # %entry
; CHECK: andi.		; CHECK: andi.
; CHECK: # %test2		; CHECK: # %test2
; Make sure then2 falls through from test2		; Make sure then2 falls through from test2
; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}		; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
; CHECK: # %then2		; CHECK: # %then2
; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29		; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}		; CHECK-NOT: # %{{[-_a-zA-Z0-9]+}}
; CHECK: # %end2
; CHECK: # %else1		; CHECK: # %else1
; CHECK: bl a		; CHECK: bl a
; CHECK: bl a		; CHECK: bl a
; Make sure then2 was copied into else1		; Make sure then2 was copied into else1
; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29		; CHECK: rlwinm. {{[0-9]+}}, {{[0-9]+}}, 0, 29, 29
; CHECK: # %else2		; CHECK: # %else2
; CHECK: bl c		; CHECK: bl c
		; CHECK: # %end2
define void @avoidable_test(i32 %tag) {		define void @avoidable_test(i32 %tag) {
entry:		entry:
br label %test1		br label %test1
test1:		test1:
%tagbit1 = and i32 %tag, 1		%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0		%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %else1, !prof !1 ; %test2 more likely		br i1 %tagbit1eq0, label %test2, label %else1, !prof !1 ; %test2 more likely
else1:		else1:
; ... 12 lines elided
else2:
call void @c()		call void @c()
br label %end2		br label %end2
end2:		end2:
ret void		ret void
end1:		end1:
call void @d()		call void @d()
ret void		ret void
}		}

declare void @a()		declare void @a()
declare void @b()		declare void @b()
declare void @c()		declare void @c()
declare void @d()		declare void @d()

!1 = !{!"branch_weights", i32 2, i32 1}		!1 = !{!"branch_weights", i32 2, i32 1}

test/CodeGen/WebAssembly/mem-intrinsics.ll

	; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -tail-dup-placement=0| FileCheck %s			; RUN: llc < %s -asm-verbose=false -disable-wasm-fallthrough-return-opt -tail-dup-placement=0 | FileCheck %s

	; Test memcpy, memmove, and memset intrinsics.			; Test memcpy, memmove, and memset intrinsics.

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"			target triple = "wasm32-unknown-unknown"

	declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1)			declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1)
	declare void @llvm.memmove.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1)			declare void @llvm.memmove.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1)
	; ... 131 lines elided

test/CodeGen/X86/block-placement.ll

; ... 308 lines elided

exit:		exit:
ret i32 %sum		ret i32 %sum
}		}

define void @unnatural_cfg1() {		define void @unnatural_cfg1() {
; Test that we can handle a loop with an inner unnatural loop at the end of		; Test that we can handle a loop with an inner unnatural loop at the end of
; a function. This is a gross CFG reduced out of the single source GCC.		; a function. This is a gross CFG reduced out of the single source GCC.
; CHECK: unnatural_cfg1		; CHECK-LABEL: unnatural_cfg1
; CHECK: %entry		; CHECK: %entry
; CHECK: %loop.body1		; CHECK: %loop.body1
; CHECK: %loop.body2		; CHECK: %loop.body2
; CHECK: %loop.body3		; CHECK: %loop.body3

entry:		entry:
br label %loop.header		br label %loop.header

; ... 21 lines elided
loop.body5:
%ptr2 = load i32, i32* undef, align 4		%ptr2 = load i32, i32* undef, align 4
br label %loop.body3		br label %loop.body3
}		}

define void @unnatural_cfg2() {		define void @unnatural_cfg2() {
; Test that we can handle a loop with a nested natural loop and an unnatural		; Test that we can handle a loop with a nested natural loop and an unnatural
; loop. This was reduced from a crash on block placement when run over		; loop. This was reduced from a crash on block placement when run over
; single-source GCC.		; single-source GCC.
; CHECK: unnatural_cfg2		; The tail-duplication outlining algorithm places
		; %loop.body3 and %loop.inner1.begin out-of-line at the end of the loop,
		; because %loop.body4 is unavoidable within the loop and short,
		; and %loop.inner1.begin has an alternate fallthrough of %loop.body3
		; CHECK-LABEL: unnatural_cfg2
; CHECK: %entry		; CHECK: %entry
; CHECK: %loop.body1		; CHECK: %loop.body1
; CHECK: %loop.body2		; CHECK: %loop.body2
		; CHECK: %loop.body4
		; CHECK: %loop.inner2.begin
		; CHECK: %loop.inner2.begin
		; The loop.inner2.end block is folded
; CHECK: %loop.body3		; CHECK: %loop.body3
; CHECK: %loop.inner1.begin		; CHECK: %loop.inner1.begin
; The end block is folded with %loop.body3...		; The end block is folded with %loop.body3...
; CHECK-NOT: %loop.inner1.end		; CHECK-NOT: %loop.inner1.end
; CHECK: %loop.body4
; CHECK: %loop.inner2.begin
; The loop.inner2.end block is folded
; CHECK: %loop.header		; CHECK: %loop.header
; CHECK: %bail		; CHECK: %bail

entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%comp0 = icmp eq i32* undef, null		%comp0 = icmp eq i32* undef, null
; ... 180 lines elided
declare i32 @__gxx_personality_v0(...)		declare i32 @__gxx_personality_v0(...)

define void @test_eh_lpad_successor() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {		define void @test_eh_lpad_successor() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
; Sometimes the landing pad ends up as the first successor of an invoke block.		; Sometimes the landing pad ends up as the first successor of an invoke block.
; When this happens, a strange result used to fall out of updateTerminators: we		; When this happens, a strange result used to fall out of updateTerminators: we
; didn't correctly locate the fallthrough successor, assuming blindly that the		; didn't correctly locate the fallthrough successor, assuming blindly that the
; first one was the fallthrough successor. As a result, we would add an		; first one was the fallthrough successor. As a result, we would add an
; erroneous jump to the landing pad thinking that was the default successor.		; erroneous jump to the landing pad thinking that was the default successor.
; CHECK: test_eh_lpad_successor		; CHECK-LABEL: test_eh_lpad_successor
; CHECK: %entry		; CHECK: %entry
; CHECK-NOT: jmp		; CHECK-NOT: jmp
; CHECK: %loop		; CHECK: %loop

entry:		entry:
invoke i32 @f() to label %preheader unwind label %lpad		invoke i32 @f() to label %preheader unwind label %lpad

preheader:		preheader:
; ... 11 lines elided
declare void @fake_throw() noreturn		declare void @fake_throw() noreturn

define void @test_eh_throw() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {		define void @test_eh_throw() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
; For blocks containing a 'throw' (or similar functionality), we have		; For blocks containing a 'throw' (or similar functionality), we have
; a no-return invoke. In this case, only EH successors will exist, and		; a no-return invoke. In this case, only EH successors will exist, and
; fallthrough simply won't occur. Make sure we don't crash trying to update		; fallthrough simply won't occur. Make sure we don't crash trying to update
; terminators for such constructs.		; terminators for such constructs.
;		;
; CHECK: test_eh_throw		; CHECK-LABEL: test_eh_throw
; CHECK: %entry		; CHECK: %entry
; CHECK: %cleanup		; CHECK: %cleanup

entry:		entry:
invoke void @fake_throw() to label %continue unwind label %cleanup		invoke void @fake_throw() to label %continue unwind label %cleanup

continue:		continue:
unreachable		unreachable

cleanup:		cleanup:
%0 = landingpad { i8*, i32 }		%0 = landingpad { i8*, i32 }
cleanup		cleanup
unreachable		unreachable
}		}

define void @test_unnatural_cfg_backwards_inner_loop() {		define void @test_unnatural_cfg_backwards_inner_loop() {
; Test that when we encounter an unnatural CFG structure after having formed		; Test that when we encounter an unnatural CFG structure after having formed
; a chain for an inner loop which happened to be laid out backwards we don't		; a chain for an inner loop which happened to be laid out backwards we don't
; attempt to merge onto the wrong end of the inner loop just because we find it		; attempt to merge onto the wrong end of the inner loop just because we find it
; first. This was reduced from a crasher in GCC's single source.		; first. This was reduced from a crasher in GCC's single source.
;		;
; CHECK: test_unnatural_cfg_backwards_inner_loop		; CHECK-LABEL: test_unnatural_cfg_backwards_inner_loop
; CHECK: %entry		; CHECK: %entry
; CHECK: %loop2b		; CHECK: %loop2b
; CHECK: %loop1		; CHECK: %loop1

entry:		entry:
br i1 undef, label %loop2a, label %body		br i1 undef, label %loop2a, label %body

body:		body:
[23 lines not shown]

define void @unanalyzable_branch_to_loop_header() {		define void @unanalyzable_branch_to_loop_header() {
; Ensure that we can handle unanalyzable branches into loop headers. We		; Ensure that we can handle unanalyzable branches into loop headers. We
; pre-form chains for unanalyzable branches, and will find the tail end of that		; pre-form chains for unanalyzable branches, and will find the tail end of that
; at the start of the loop. This function uses floating point comparison		; at the start of the loop. This function uses floating point comparison
; fallthrough because that happens to always produce unanalyzable branches on		; fallthrough because that happens to always produce unanalyzable branches on
; x86.		; x86.
;		;
; CHECK: unanalyzable_branch_to_loop_header		; CHECK-LABEL: unanalyzable_branch_to_loop_header
; CHECK: %entry		; CHECK: %entry
; CHECK: %loop		; CHECK: %loop
; CHECK: %exit		; CHECK: %exit

entry:		entry:
%cmp = fcmp une double 0.000000e+00, undef		%cmp = fcmp une double 0.000000e+00, undef
br i1 %cmp, label %loop, label %exit		br i1 %cmp, label %loop, label %exit

loop:		loop:
%cond = icmp eq i8 undef, 42		%cond = icmp eq i8 undef, 42
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @unanalyzable_branch_to_best_succ(i1 %cond) {		define void @unanalyzable_branch_to_best_succ(i1 %cond) {
; Ensure that we can handle unanalyzable branches where the destination block		; Ensure that we can handle unanalyzable branches where the destination block
; gets selected as the optimal successor to merge.		; gets selected as the optimal successor to merge.
;		;
; This branch is now analyzable and hence the destination block becomes the		; This branch is now analyzable and hence the destination block becomes the
; hotter one. The right order is entry->bar->exit->foo.		; hotter one. The right order is entry->bar->exit->foo.
;		;
; CHECK: unanalyzable_branch_to_best_succ		; CHECK-LABEL: unanalyzable_branch_to_best_succ
; CHECK: %entry		; CHECK: %entry
; CHECK: %bar		; CHECK: %bar
; CHECK: %exit		; CHECK: %exit
; CHECK: %foo		; CHECK: %foo

entry:		entry:
; Bias this branch toward bar to ensure we form that chain.		; Bias this branch toward bar to ensure we form that chain.
br i1 %cond, label %bar, label %foo, !prof !1		br i1 %cond, label %bar, label %foo, !prof !1
[9 lines not shown]
exit:		exit:
ret void		ret void
}		}

define void @unanalyzable_branch_to_free_block(float %x) {		define void @unanalyzable_branch_to_free_block(float %x) {
; Ensure that we can handle unanalyzable branches where the destination block		; Ensure that we can handle unanalyzable branches where the destination block
; gets selected as the best free block in the CFG.		; gets selected as the best free block in the CFG.
;		;
; CHECK: unanalyzable_branch_to_free_block		; CHECK-LABEL: unanalyzable_branch_to_free_block
; CHECK: %entry		; CHECK: %entry
; CHECK: %a		; CHECK: %a
; CHECK: %b		; CHECK: %b
; CHECK: %c		; CHECK: %c
; CHECK: %exit		; CHECK: %exit

entry:		entry:
br i1 undef, label %a, label %b		br i1 undef, label %a, label %b
[13 lines not shown]
exit:		exit:
ret void		ret void
}		}

define void @many_unanalyzable_branches() {		define void @many_unanalyzable_branches() {
; Ensure that we don't crash as we're building up many unanalyzable branches,		; Ensure that we don't crash as we're building up many unanalyzable branches,
; blocks, and loops.		; blocks, and loops.
;		;
; CHECK: many_unanalyzable_branches		; CHECK-LABEL: many_unanalyzable_branches
; CHECK: %entry		; CHECK: %entry
; CHECK: %exit		; CHECK: %exit

entry:		entry:
br label %0		br label %0

%val0 = load volatile float, float* undef		%val0 = load volatile float, float* undef
%cmp0 = fcmp une float %val0, undef		%cmp0 = fcmp une float %val0, undef
[202 lines not shown]
; 1) Loop rotation needs to ensure that the desired exiting edge can be		; 1) Loop rotation needs to ensure that the desired exiting edge can be
; a fallthrough.		; a fallthrough.
; 2) The exiting edge from the loop which is rotated to be laid out at the		; 2) The exiting edge from the loop which is rotated to be laid out at the
; bottom of the loop needs to be exiting into the nearest enclosing loop (to		; bottom of the loop needs to be exiting into the nearest enclosing loop (to
; which there is an exit). Otherwise, we force that enclosing loop into		; which there is an exit). Otherwise, we force that enclosing loop into
; strange layouts that are siginificantly less efficient, often times maing		; strange layouts that are siginificantly less efficient, often times maing
; it discontiguous.		; it discontiguous.
;		;
; CHECK: @benchmark_heapsort		; CHECK-LABEL: @benchmark_heapsort
; CHECK: %entry		; CHECK: %entry
; First rotated loop top.		; First rotated loop top.
; CHECK: .p2align		; CHECK: .p2align
; CHECK: %while.end		; CHECK: %while.end
; %for.cond gets completely tail-duplicated away.		; %for.cond gets completely tail-duplicated away.
; CHECK: %if.then		; CHECK: %if.then
; CHECK: %if.else		; CHECK: %if.else
; CHECK: %if.end10		; CHECK: %if.end10
[505 lines not shown]
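
Most of the changes to block-placement.ll above replace `; CHECK: <function name>` with `; CHECK-LABEL: <function name>`. The sketch below is a simplified, line-based model of why that matters, not FileCheck itself: the helper names, the sample output, and the function name `some_later_function` are all hypothetical. A plain CHECK may be satisfied by any later line of output, including one in a different function, while a label confines the checks that follow it to the text before the next label.

```python
def plain_check(lines, patterns):
    """Each pattern must appear on some line after the previous match."""
    pos = 0
    for pat in patterns:
        hit = next((i for i in range(pos, len(lines)) if pat in lines[i]), None)
        if hit is None:
            return False
        pos = hit + 1
    return True


def labelled_check(lines, label, next_label, patterns):
    """Patterns must match between `label` and `next_label`.

    Simplification: assumes `label` occurs in `lines`; otherwise next()
    raises StopIteration.
    """
    start = next(i for i, line in enumerate(lines) if label in line)
    end = next(
        (i for i in range(start + 1, len(lines)) if next_label in lines[i]),
        len(lines),
    )
    return plain_check(lines[start + 1:end], patterns)


# Hypothetical assembler output: %cleanup is missing from test_eh_throw
# but happens to appear in a later function.
OUTPUT = [
    "test_eh_throw:",
    "# %entry",
    "some_later_function:",
    "# %entry",
    "# %cleanup",
]

# The unlabelled checks pass by matching %cleanup in the wrong function.
print(plain_check(OUTPUT, ["test_eh_throw", "%entry", "%cleanup"]))
# The labelled checks fail, as they should.
print(labelled_check(OUTPUT, "test_eh_throw", "some_later_function",
                     ["%entry", "%cleanup"]))
```

In this model the unlabelled variant accepts the output and the labelled variant rejects it.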

test/CodeGen/X86/tail-dup-merge-loop-headers.ll

	; RUN: llc -O2 -o - %s | FileCheck %s			; RUN: llc -O2 -o - %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	; CHECK-LABEL: tail_dup_merge_loops			; CHECK-LABEL: tail_dup_merge_loops
	; CHECK: # %entry			; CHECK: # %entry
	; CHECK-NOT: # %{{[a-zA-Z_]+}}			; CHECK-NOT: # %{{[a-zA-Z_]+}}
				; CHECK: # %exit
				; CHECK-NOT: # %{{[a-zA-Z_]+}}
	; CHECK: # %inner_loop_exit			; CHECK: # %inner_loop_exit
	; CHECK-NOT: # %{{[a-zA-Z_]+}}			; CHECK-NOT: # %{{[a-zA-Z_]+}}
	; CHECK: # %inner_loop_latch			; CHECK: # %inner_loop_latch
	; CHECK-NOT: # %{{[a-zA-Z_]+}}			; CHECK-NOT: # %{{[a-zA-Z_]+}}
	; CHECK: # %inner_loop_test			; CHECK: # %inner_loop_test
	; CHECK-NOT: # %{{[a-zA-Z_]+}}
	; CHECK: # %exit
	define void @tail_dup_merge_loops(i32 %a, i8* %b, i8* %c) local_unnamed_addr #0 {			define void @tail_dup_merge_loops(i32 %a, i8* %b, i8* %c) local_unnamed_addr #0 {
	entry:			entry:
	%notlhs674.i = icmp eq i32 %a, 0			%notlhs674.i = icmp eq i32 %a, 0
	br label %outer_loop_top			br label %outer_loop_top

	outer_loop_top: ; preds = %inner_loop_exit, %entry			outer_loop_top: ; preds = %inner_loop_exit, %entry
	%dst.0.ph.i = phi i8* [ %b, %entry ], [ %scevgep679.i, %inner_loop_exit ]			%dst.0.ph.i = phi i8* [ %b, %entry ], [ %scevgep679.i, %inner_loop_exit ]
	br i1 %notlhs674.i, label %exit, label %inner_loop_preheader			br i1 %notlhs674.i, label %exit, label %inner_loop_preheader
	[167 lines not shown]

test/CodeGen/X86/tail-dup-repeat.ll

	; RUN: llc -O2 -tail-dup-placement-threshold=4 -o - %s | FileCheck %s			; RUN: llc -O3 -tail-dup-placement-threshold=4 -o - %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: uwtable			; Function Attrs: uwtable
	; When tail-duplicating during placement, we work backward from blocks with			; When tail-duplicating during placement, we work backward from blocks with
	; multiple successors. In this case, the block dup1 gets duplicated into dup2			; multiple successors. In this case, the block dup1 gets duplicated into dup2
	; and if.then64, and then the block dup2 gets duplicated into land.lhs.true			; and if.then64, and then the block dup2 gets duplicated into land.lhs.true
	; and if.end70			; and if.end70
	[44 lines not shown]

test/CodeGen/X86/tail-opts.ll

	[106 lines not shown]
	; BranchFolding shouldn't try to merge the tails of two blocks			; BranchFolding shouldn't try to merge the tails of two blocks
	; with only a branch in common, regardless of the fallthrough situation.			; with only a branch in common, regardless of the fallthrough situation.

	; CHECK-LABEL: dont_merge_oddly:			; CHECK-LABEL: dont_merge_oddly:
	; CHECK-NOT: ret			; CHECK-NOT: ret
	; CHECK: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}			; CHECK: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
	; CHECK-NEXT: jbe .LBB2_3			; CHECK-NEXT: jbe .LBB2_3
	; CHECK-NEXT: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}			; CHECK-NEXT: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
	; CHECK-NEXT: ja .LBB2_4
	; CHECK-NEXT: jmp .LBB2_2
	; CHECK-NEXT: .LBB2_3:
	; CHECK-NEXT: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
	; CHECK-NEXT: jbe .LBB2_2			; CHECK-NEXT: jbe .LBB2_2
	; CHECK-NEXT: .LBB2_4:			; CHECK-NEXT: .LBB2_4:
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB2_3:
				; CHECK-NEXT: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
				; CHECK-NEXT: ja .LBB2_4
	; CHECK-NEXT: .LBB2_2:			; CHECK-NEXT: .LBB2_2:
	; CHECK-NEXT: movb $1, %al			; CHECK-NEXT: movb $1, %al
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	define i1 @dont_merge_oddly(float* %result) nounwind {			define i1 @dont_merge_oddly(float* %result) nounwind {
	entry:			entry:
	%tmp4 = getelementptr float, float* %result, i32 2			%tmp4 = getelementptr float, float* %result, i32 2
	%tmp5 = load float, float* %tmp4, align 4			%tmp5 = load float, float* %tmp4, align 4
	[341 lines not shown]

test/CodeGen/X86/twoaddr-coalesce-3.ll

[13 lines not shown, context: entry:]
br i1 %cmp3, label %for.body.lr.ph, label %for.end		br i1 %cmp3, label %for.body.lr.ph, label %for.end

for.body.lr.ph: ; preds = %entry		for.body.lr.ph: ; preds = %entry
%total.promoted = load i32, i32* @total, align 4		%total.promoted = load i32, i32* @total, align 4
br label %for.body		br label %for.body

; Check that only one mov will be generated in the kernel loop.		; Check that only one mov will be generated in the kernel loop.
; CHECK-LABEL: foo:		; CHECK-LABEL: foo:
; CHECK: [[LOOP1:^[a-zA-Z0-9_.]+]]: {{#.*}} %for.body		; CHECK: [[LOOP1:^[a-zA-Z0-9_.]+]]: {{#.*}} %for.body{{$}}
; CHECK-NOT: mov		; CHECK-NOT: mov
; CHECK: movl {{.*}}, [[REG1:%[a-z0-9]+]]		; CHECK: movl {{.*}}, [[REG1:%[a-z0-9]+]]
; CHECK-NOT: mov		; CHECK-NOT: mov
; CHECK: shrl $31, [[REG1]]		; CHECK: shrl $31, [[REG1]]
; CHECK-NOT: mov		; CHECK-NOT: mov
; CHECK: jl [[LOOP1]]		; CHECK: jl [[LOOP1]]
for.body: ; preds = %for.body.lr.ph, %for.body		for.body: ; preds = %for.body.lr.ph, %for.body
%add5 = phi i32 [ %total.promoted, %for.body.lr.ph ], [ %add, %for.body ]		%add5 = phi i32 [ %total.promoted, %for.body.lr.ph ], [ %add, %for.body ]
[20 lines not shown, context: entry:]
br i1 %cmp3, label %for.body.lr.ph, label %for.end		br i1 %cmp3, label %for.body.lr.ph, label %for.end

for.body.lr.ph: ; preds = %entry		for.body.lr.ph: ; preds = %entry
%total.promoted = load i32, i32* @total, align 4		%total.promoted = load i32, i32* @total, align 4
br label %for.body		br label %for.body

; Check that only two mov will be generated in the kernel loop.		; Check that only two mov will be generated in the kernel loop.
; CHECK-LABEL: goo:		; CHECK-LABEL: goo:
; CHECK: [[LOOP2:^[a-zA-Z0-9_.]+]]: {{#.*}} %for.body		; CHECK: [[LOOP2:^[a-zA-Z0-9_.]+]]: {{#.*}} %for.body{{$}}
; CHECK-NOT: mov		; CHECK-NOT: mov
; CHECK: movl {{.*}}, [[REG2:%[a-z0-9]+]]		; CHECK: movl {{.*}}, [[REG2:%[a-z0-9]+]]
; CHECK-NOT: mov		; CHECK-NOT: mov
; CHECK: shrl $31, [[REG2]]		; CHECK: shrl $31, [[REG2]]
; CHECK-NOT: mov		; CHECK-NOT: mov
; CHECK: movl {{.*}}		; CHECK: movl {{.*}}
; CHECK: jl [[LOOP2]]		; CHECK: jl [[LOOP2]]
for.body: ; preds = %for.body.lr.ph, %for.body		for.body: ; preds = %for.body.lr.ph, %for.body
[17 lines not shown]
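
The only change to twoaddr-coalesce-3.ll is the `{{$}}` appended to the two `%for.body` patterns. One plausible reading, not stated in the diff, is that the unanchored pattern could also be satisfied by the block comment for `%for.body.lr.ph`. The sketch below reproduces that with Python's `re` module; the label lines are hypothetical, and the regular expressions are hand translations of the CHECK lines, not output from FileCheck.

```python
import re

# Hypothetical assembler label comments; the .LBB numbers are made up.
LINES = [
    ".LBB0_1: # %for.body.lr.ph",
    ".LBB0_2: # %for.body",
]

LABEL = r"^[a-zA-Z0-9_.]+"  # same character class as the CHECK line
OLD = re.compile(LABEL + r": #.* %for\.body")    # before the change
NEW = re.compile(LABEL + r": #.* %for\.body$")   # with {{$}} appended


def first_match(pattern, lines):
    """Return the first line the pattern matches, or None."""
    for line in lines:
        if pattern.search(line):
            return line
    return None


print(first_match(OLD, LINES))  # the preheader line also satisfies the old pattern
print(first_match(NEW, LINES))  # only the real loop body line matches
```

With the end-of-line anchor, `[[LOOP1]]` in this model can only bind to the label of the loop body itself, which is the block that the later `jl [[LOOP1]]` check refers to.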
