This is an archive of the discontinued LLVM Phabricator instance.

Differential D35411

[SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.
ClosedPublic

Authored by bmakam on Jul 14 2017, 5:13 AM.

Details

Reviewers

efriedma
mcrosier
pacxx
hsung
davidxl

Commits

rGb05a55787a61: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can…
rL308422: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can…

Summary

When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. In particular, when such a
loop's latch block is folded into the loop header, it results in additional
backedges, and LoopSimplify turns the loop into a nested loop, which prevents
later optimizations from being applied (e.g., loop unrolling and loop interleaving).
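
The extra backedges can be seen on a toy model. The following is a minimal sketch, not LLVM code: the `CFG` type and the helpers `countBackedges`, `foldEmptyBlock` and `makeExampleCFG` are hypothetical names introduced for illustration, and "backedge" is simplified to mean any edge into the header that does not come from the preheader. The block names follow the PR33605 test case.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy CFG: block name -> successor names. (Hypothetical representation.)
using CFG = std::map<std::string, std::vector<std::string>>;

// Counts edges that enter `header` from any block other than `preheader`.
// In this toy model those edges are the loop's backedges.
std::size_t countBackedges(const CFG &cfg, const std::string &header,
                           const std::string &preheader) {
  std::size_t n = 0;
  for (const auto &entry : cfg) {
    if (entry.first == preheader)
      continue;
    for (const auto &succ : entry.second)
      if (succ == header)
        ++n;
  }
  return n;
}

// Folds an empty block away: deletes `empty` and redirects every edge that
// targeted it to `target` (the destination of its unconditional branch).
CFG foldEmptyBlock(CFG cfg, const std::string &empty,
                   const std::string &target) {
  assert(cfg.count(empty) && "block to fold must exist");
  cfg.erase(empty);
  for (auto &entry : cfg)
    for (auto &succ : entry.second)
      if (succ == empty)
        succ = target;
  return cfg;
}

// CFG shaped like the PR33605 test: if.end is the empty latch of the loop
// headed by for.cond.
CFG makeExampleCFG() {
  return {{"entry", {"for.cond"}},
          {"for.cond", {"for.cond.cleanup", "for.body"}},
          {"for.body", {"if.end", "if.then"}},
          {"if.then", {"if.end"}},
          {"if.end", {"for.cond"}},
          {"for.cond.cleanup", {}}};
}
```

In this model, `countBackedges(makeExampleCFG(), "for.cond", "entry")` is 1, and after `foldEmptyBlock(cfg, "if.end", "for.cond")` it is 2, because both `for.body` and `if.then` now branch directly to the header.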

This patch augments the existing algorithm by also checking whether the
destination of the branch belongs to the set of loop headers and, if so,
deferring its elimination to LateSimplifyCFG.
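
A minimal sketch of that check, using a toy block type rather than LLVM's real classes. `Block`, `shouldDeferFolding` and the `lateSimplifyCFG` parameter are hypothetical names chosen for illustration; the actual patch works on `BasicBlock` and the `LoopHeaders` set.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for an empty basic block that ends in an unconditional branch.
// (Hypothetical type; not an LLVM class.)
struct Block {
  std::string name;
  const Block *succ; // sole successor of the unconditional branch
};

// Returns true when folding the empty block `bb` into its successor should be
// postponed: either `bb` itself or its branch destination is a loop header,
// and this is not yet the late run.
bool shouldDeferFolding(const Block &bb,
                        const std::set<const Block *> &loopHeaders,
                        bool lateSimplifyCFG) {
  assert(bb.succ && "expected an unconditional branch with one successor");
  if (lateSimplifyCFG)
    return false; // late run: the block may be folded
  return loopHeaders.count(&bb) != 0 || loopHeaders.count(bb.succ) != 0;
}
```

With a header block `for.cond` and a latch `if.end` that branches to it, the sketch defers folding `if.end` in an early run and allows it once `lateSimplifyCFG` is true.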

Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605

Diff Detail

Repository: rL LLVM

Event Timeline

bmakam created this revision. Jul 14 2017, 5:13 AM

Herald added a subscriber: javed.absar. Jul 14 2017, 5:13 AM

Let me see if I can describe the problem and your approach to fixing the issue in my own words.

Currently, JumpThreading and SimplifyCFG avoid threading/merging "empty" loop headers, as this would break the canonical form of the loop; the CFG edge being optimized is between the loop header and its successor. Your approach is to also avoid merging the incoming edges (i.e., back edges) to the loop header, so that the canonical form of the loop is preserved. Then, later, in late-SimplifyCFG and CodeGenPrepare, you more aggressively remove these empty blocks.

Sound about right?

lib/Transforms/Scalar/JumpThreading.cpp
239 ↗	(On Diff #106623)	This comment needs some additional explaining after your change. You're avoiding the case where we're skipping an empty block whose successor is a loop header, which IIUC when merged destroys the canonical form of the loop. You're then relying on CodeGenPrepare to eliminate these empty blocks.

In D35411#809670, @mcrosier wrote:

Let me see if I can describe the problem and your approach to fixing the issue in my own words.

Currently, JumpThreading and SimplifyCFG avoid threading/merging "empty" loop headers, as this would break the canonical form of the loop; the CFG edge being optimized is between the loop header and its successor. Your approach is to also avoid merging the incoming edges (i.e., back edges) to the loop header, so that the canonical form of the loop is preserved. Then, later, in late-SimplifyCFG and CodeGenPrepare, you more aggressively remove these empty blocks.

Sound about right?

Thanks Chad,
That's right.

I'm sort of worried this could have unexpected consequences; do you have performance numbers? (LLVM testsuite or SPEC)

lib/CodeGen/CodeGenPrepare.cpp
668 ↗	(On Diff #106651)	Do you need to update the latch set here?
test/Transforms/LoopUnroll/pr33605.ll
1 ↗	(On Diff #106651)	We usually prefer tests for transformation passes which just check the output of the pass itself, so they aren't so sensitive to changes in other passes.

Update latch set and modify a test case. Thanks, Eli.

gberry added a subscriber: gberry. Jul 14 2017, 3:01 PM

In D35411#809870, @efriedma wrote:

I'm sort of worried this could have unexpected consequences; do you have performance numbers? (LLVM testsuite or SPEC)

My target was unrolling a hot loop in spec2017/gcc. In addition to that unrolling, which yielded a 2% improvement, I observed a loop being interleaved in povray, which yielded a 6-7% improvement.
Here are the full perf results for SPEC on Falkor with O3 config:

Benchmark                Diff (%)  
----------------------- ----------
spec2006/bzip2:ref       -2.18
spec2017/omnetpp:ref     -1.2
spec2006/perlbench:ref   -1.07
spec2000/equake:ref      -0.89
spec2000/art:ref         -0.79
spec2017/leela:ref       -0.67
spec2000/bzip2:ref       0.77
spec2006/namd:ref        0.84
spec2017/xz:ref          0.89
spec2000/gcc:ref         0.9
spec2006/mcf:ref         1.07
spec2017/deepsjeng:ref   1.62
spec2000/crafty:ref      1.62
spec2017/gcc:ref         2.18
spec2006/dealII:ref      3.73
spec2017/blender:ref     4.78
spec2006/povray:ref      6.25
spec2017/povray:ref      6.85
spec2000/gap:ref         7.15

ashutosh.nema added a subscriber: ashutosh.nema. Jul 17 2017, 2:15 AM

Thanks, Balaram, for posting this patch. In general, the idea of preserving the canonical form of the loops looks good.

Any idea why these benchmarks regressed?

spec2006/bzip2:ref -2.18
spec2017/omnetpp:ref -1.2
spec2006/perlbench:ref -1.07

In D35411#811150, @ashutosh.nema wrote:

Thanks, Balaram, for posting this patch. In general, the idea of preserving the canonical form of the loops looks good.

Any idea why these benchmarks regressed?

spec2006/bzip2:ref -2.18
spec2017/omnetpp:ref -1.2
spec2006/perlbench:ref -1.07

Thanks Ashutosh,

I reran these 3 benchmarks and compared against today's tip of trunk; the regressions in spec2006/bzip2 and spec2006/perlbench were found to be dubious. spec2017/omnetpp is still regressed by the same ~1%. I looked at all the perf counters and nothing was obvious. Perhaps an alignment issue?

Minor code clean up, NFCI.

Needs testcases for the jump threading and CGP changes.

Why do we need the empty block folding in both LateSimplifyCFG and CGP?

lib/CodeGen/CodeGenPrepare.cpp
668 ↗	(On Diff #106651)	While you're here, do we also need to update the Preheaders set?
705 ↗	(On Diff #106714)	Put isLatch first?
lib/Transforms/Scalar/JumpThreading.cpp
243 ↗	(On Diff #106714)	`BI->getSuccessor(0)`?
lib/Transforms/Utils/SimplifyCFG.cpp
5659 ↗	(On Diff #106714)	`BI->getSuccessor(0)`?

Update to address Eli's comments.

In D35411#812010, @efriedma wrote:

Why do we need the empty block folding in both LateSimplifyCFG and CGP?

Empty block folding in CGP occurs very late, after LSR, and is also not aggressive because it is not run iteratively. LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

"After switch is lowered"? What transform are you talking about?

In D35411#813339, @efriedma wrote:

LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

"After switch is lowered"? What transform are you talking about?

I was talking about Switch-to-lookup table transform.

SwitchToLookupTable is itself part of LateSimplifyCFG; we should fold empty blocks afterwards, I think?

In D35411#813349, @efriedma wrote:

SwitchToLookupTable is itself part of LateSimplifyCFG; we should fold empty blocks afterwards, I think?

I was expecting the same but found that LateSimplifyCFG could not catch empty case blocks in spec2017/perlbench. Perhaps later transformations like unreachableblockelim or constanthoist could turn some cases into empty blocks?

Edit: I went back and checked: during LSR, loopsimplify creates dedicated exit blocks, which are now empty and need to be cleaned up in CGP.

Hmm...

I took a quick look at the pass pipeline (PassManagerBuilder::populateModulePassManager), and it turns out LateSimplifyCFG is false for the last simplifycfg run. That might be the source of your problem?

In D35411#813417, @efriedma wrote:

Hmm...

I took a quick look at the pass pipeline (PassManagerBuilder::populateModulePassManager), and it turns out LateSimplifyCFG is false for the last simplifycfg run. That might be the source of your problem?

Thanks, Eli. In case you missed my previous comment, the problem here is that during LSR, loopsimplify creates dedicated exit blocks which now need to be cleaned up in CGP.

Oh, LSR is generating the extra block? That makes sense... but it seems mostly unrelated to the rest of this patch. Does it have any impact on its own?

lib/CodeGen/CodeGenPrepare.cpp
671 ↗	(On Diff #107144)	The "if" is redundant; erase() does nothing if the set doesn't contain BB.
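
The point about the redundant check can be illustrated with a standard container. This sketch uses `std::set` as a stand-in, on the assumption that the set type used in CodeGenPrepare treats absent keys the same way, as the comment above states; `removeBlock` is a hypothetical helper and the `int` keys stand in for block pointers.

```cpp
#include <cstddef>
#include <set>

// Removes `bb` from `blocks` without first testing membership. For std::set,
// erase(key) on an absent key is a no-op that returns 0, so a guarding
// `if (blocks.count(bb))` adds nothing.
std::size_t removeBlock(std::set<int> &blocks, int bb) {
  return blocks.erase(bb);
}
```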

I mean the CodeGenPrepare changes (which are deleting blocks generated by LSR) vs the other changes (which modify transforms that run before LSR).

Minor cleanup. Thanks, Eli.

In D35411#813525, @efriedma wrote:

I mean the CodeGenPrepare changes (which are deleting blocks generated by LSR) vs the other changes (which modify transforms that run before LSR).

It seems to be unrelated to the rest of this patch, though the problem in spec2017/perlbench surfaces only with this patch and regresses the benchmark by 3%. The CGP changes by themselves were performance neutral on SPEC.
I can split this out as a separate change if you prefer.

Yes, please split.

lib/CodeGen/CodeGenPrepare.cpp
669 ↗	(On Diff #107175)	You missed the other redundant if.

Split CGP changes into a separate follow on patch, per Eli's request.

LGTM

This revision is now accepted and ready to land. Jul 18 2017, 2:33 PM

Thanks, Eli.

The CGP changes are here: D35584

bmakam added a child revision: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.. Jul 18 2017, 3:14 PM

bmakam removed a child revision: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.. Jul 18 2017, 3:18 PM

bmakam added a parent revision: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

bmakam removed a parent revision: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.. Jul 19 2017, 1:50 AM

Closed by commit rL308422: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can… (authored by bmakam). Jul 19 2017, 1:55 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D38631: [SimplifyCFG] use pass options and remove the latesimplifycfg pass. Oct 6 2017, 9:24 AM

spatel mentioned this in rL316835: [SimplifyCFG] use pass options and remove the latesimplifycfg pass. Oct 28 2017, 11:43 AM

spatel mentioned this in D39407: [(new) Pass Manager] instantiate SimplifyCFG with the same options as the old PM. Dec 16 2017, 7:29 AM

mingmingl mentioned this in D134152: [SimplifyCFG][TranformUtils]Do not simplify away a trivial basic block if both this block and at least one of its predecessors are loop latches.. Sep 22 2022, 9:45 AM

Revision Contents

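Path                                                                   Size
---------------------------------------------------------------------  --------
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp                     10 lines
llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp                        16 lines
llvm/trunk/test/CodeGen/AArch64/aarch64-loop-gep-opt.ll                6 lines
llvm/trunk/test/Transforms/JumpThreading/pr33605.ll                    64 lines
llvm/trunk/test/Transforms/JumpThreading/static-profile.ll             4 lines
llvm/trunk/test/Transforms/LoopUnroll/peel-loop.ll                     12 lines
llvm/trunk/test/Transforms/LoopUnswitch/2015-06-17-Metadata.ll         4 lines
llvm/trunk/test/Transforms/LoopUnswitch/infinite-loop.ll               4 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/float-induction-x86.ll    2 lines
llvm/trunk/test/Transforms/LoopVectorize/float-induction.ll            2 lines
llvm/trunk/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll   4 lines
llvm/trunk/test/Transforms/SimplifyCFG/multiple-phis.ll                2 lines
llvm/trunk/test/Transforms/SimplifyCFG/pr33605.ll                      64 lines
llvm/trunk/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll  2 lines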

Diff 107264

llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp

@@ [225 lines not shown] @@ for (Function::iterator I = F.begin(), E = F.end(); I != E;) {
         continue;
       }

       BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator());

       // Can't thread an unconditional jump, but if the block is "almost
       // empty", we can replace uses of it with uses of the successor and make
       // this dead.
-      // We should not eliminate the loop header either, because eliminating
-      // a loop header might later prevent LoopSimplify from transforming nested
-      // loops into simplified form.
+      // We should not eliminate the loop header or latch either, because
+      // eliminating a loop header or latch might later prevent LoopSimplify
+      // from transforming nested loops into simplified form. We will rely on
+      // later passes in backend to clean up empty blocks.
       if (BI && BI->isUnconditional() &&
           BB != &BB->getParent()->getEntryBlock() &&
           // If the terminator is the only non-phi instruction, try to nuke it.
-          BB->getFirstNonPHIOrDbg()->isTerminator() && !LoopHeaders.count(BB)) {
+          BB->getFirstNonPHIOrDbg()->isTerminator() && !LoopHeaders.count(BB) &&
+          !LoopHeaders.count(BI->getSuccessor(0))) {
         // FIXME: It is always conservatively correct to drop the info
         // for a block even if it doesn't get erased. This isn't totally
         // awesome, but it allows us to use AssertingVH to prevent nasty
         // dangling pointer issues within LazyValueInfo.
         LVI->eraseBlock(BB);
         if (TryToSimplifyUncondBranchFromEmptyBlock(BB))
           Changed = true;
       }
@@ [2,125 lines not shown] @@

llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp

@@ [5,650 lines not shown] @@ for (BasicBlock *OtherPred : predecessors(Succ)) {
       return true;
   }
   return false;
 }

 bool SimplifyCFGOpt::SimplifyUncondBranch(BranchInst *BI,
                                           IRBuilder<> &Builder) {
   BasicBlock *BB = BI->getParent();
+  BasicBlock *Succ = BI->getSuccessor(0);

   if (SinkCommon && SinkThenElseCodeToEnd(BI))
     return true;

   // If the Terminator is the only non-phi instruction, simplify the block.
-  // if LoopHeader is provided, check if the block is a loop header
-  // (This is for early invocations before loop simplify and vectorization
-  // to keep canonical loop forms for nested loops.
-  // These blocks can be eliminated when the pass is invoked later
-  // in the back-end.)
+  // if LoopHeader is provided, check if the block or its successor is a loop
+  // header (This is for early invocations before loop simplify and
+  // vectorization to keep canonical loop forms for nested loops. These blocks
+  // can be eliminated when the pass is invoked later in the back-end.)
+  bool NeedCanonicalLoop =
+      !LateSimplifyCFG &&
+      (LoopHeaders && (LoopHeaders->count(BB) || LoopHeaders->count(Succ)));
   BasicBlock::iterator I = BB->getFirstNonPHIOrDbg()->getIterator();
   if (I->isTerminator() && BB != &BB->getParent()->getEntryBlock() &&
-      (!LoopHeaders || !LoopHeaders->count(BB)) &&
-      TryToSimplifyUncondBranchFromEmptyBlock(BB))
+      !NeedCanonicalLoop && TryToSimplifyUncondBranchFromEmptyBlock(BB))
     return true;

   // If the only instruction in the block is a seteq/setne comparison
   // against a constant, try to simplify the block.
   if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
     if (ICI->isEquality() && isa<ConstantInt>(ICI->getOperand(1))) {
       for (++I; isa<DbgInfoIntrinsic>(I); ++I)
         ;
@@ [316 lines not shown] @@

llvm/trunk/test/CodeGen/AArch64/aarch64-loop-gep-opt.ll

@@ [13 lines not shown] @@ ; CHECK: br label %do.body.i

   %tPos = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 0
   %k0 = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 1
   %.pre = load i32, i32* %tPos, align 4
   br label %do.body.i

 do.body.i:
 ; CHECK-LABEL: do.body.i:
-; CHECK: %uglygep2 = getelementptr i8, i8* %uglygep, i64 %3
-; CHECK-NEXT: %4 = bitcast i8* %uglygep2 to i32*
-; CHECK-NOT: %uglygep2 = getelementptr i8, i8* %uglygep, i64 1032
+; CHECK: %uglygep1 = getelementptr i8, i8* %uglygep, i64 %3
+; CHECK-NEXT: %4 = bitcast i8* %uglygep1 to i32*
+; CHECK-NOT: %uglygep1 = getelementptr i8, i8* %uglygep, i64 1032


   %0 = phi i32 [ 256, %entry ], [ %.be, %do.body.i.backedge ]
   %1 = phi i32 [ 0, %entry ], [ %.be6, %do.body.i.backedge ]
   %add.i = add nsw i32 %1, %0
   %shr.i = ashr i32 %add.i, 1
   %idxprom.i = sext i32 %shr.i to i64
   %arrayidx.i = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 3, i64 %idxprom.i
@@ [18 lines not shown] @@

llvm/trunk/test/Transforms/JumpThreading/pr33605.ll

				; RUN: opt < %s -jump-threading -S | FileCheck %s

				; Skip simplifying unconditional branches from empty blocks in simplifyCFG,
				; when it can destroy canonical loop structure.

				; void foo();
				; bool test(int a, int b, int *c) {
				; bool changed = false;
				; for (unsigned int i = 2; i--;) {
				; int r = a | b;
				; if ( r != c[i]) {
				; c[i] = r;
				; foo();
				; changed = true;
				; }
				; }
				; return changed;
				; }

				; CHECK-LABEL: @test(
				; CHECK: for.cond:
				; CHECK-NEXT: %i.0 = phi i32 [ 2, %entry ], [ %dec, %if.end ]
				; CHECK: for.body:
				; CHECK: br i1 %cmp, label %if.end, label %if.then
				; CHECK-NOT: br i1 %cmp, label %for.cond, label %if.then
				; CHECK: if.then:
				; CHECK: br label %if.end
				; CHECK-NOT: br label %for.cond
				; CHECK: if.end:
				; CHECK: br label %for.cond
				define i1 @test(i32 %a, i32 %b, i32* %c) {
				entry:
				br label %for.cond

				for.cond: ; preds = %if.end, %entry
				%i.0 = phi i32 [ 2, %entry ], [ %dec, %if.end ]
				%changed.0.off0 = phi i1 [ false, %entry ], [ %changed.1.off0, %if.end ]
				%dec = add nsw i32 %i.0, -1
				%tobool = icmp eq i32 %i.0, 0
				br i1 %tobool, label %for.cond.cleanup, label %for.body

				for.cond.cleanup: ; preds = %for.cond
				%changed.0.off0.lcssa = phi i1 [ %changed.0.off0, %for.cond ]
				ret i1 %changed.0.off0.lcssa

				for.body: ; preds = %for.cond
				%or = or i32 %a, %b
				%idxprom = sext i32 %dec to i64
				%arrayidx = getelementptr inbounds i32, i32* %c, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4
				%cmp = icmp eq i32 %or, %0
				br i1 %cmp, label %if.end, label %if.then

				if.then: ; preds = %for.body
				store i32 %or, i32* %arrayidx, align 4
				call void @foo()
				br label %if.end

				if.end: ; preds = %for.body, %if.then
				%changed.1.off0 = phi i1 [ true, %if.then ], [ %changed.0.off0, %for.body ]
				br label %for.cond
				}

				declare void @foo()

llvm/trunk/test/Transforms/JumpThreading/static-profile.ll

@@ [80 lines not shown] @@
 ; CHECK: br i1 %cond1, label %check_2.thread, label %check_2{{$}}

 eq_1:
   call void @bar()
   br label %check_2
 ; Verify the new backedge:
 ; CHECK: check_2.thread:
 ; CHECK-NEXT: call void @bar()
-; CHECK-NEXT: br label %check_1
+; CHECK-NEXT: br label %check_3.thread

 check_2:
   %cond2 = icmp eq i32 %v, 2
   br i1 %cond2, label %eq_2, label %check_3
 ; No metadata:
 ; CHECK: br i1 %cond2, label %eq_2, label %check_3{{$}}

 eq_2:
   call void @bar()
   br label %check_3
 ; Verify the new backedge:
 ; CHECK: eq_2:
 ; CHECK-NEXT: call void @bar()
-; CHECK-NEXT: br label %check_1
+; CHECK-NEXT: br label %check_3.thread

 check_3:
   %condE = icmp eq i32 %v, 3
   br i1 %condE, label %exit, label %check_1
 ; No metadata:
 ; CHECK: br i1 %condE, label %exit, label %check_1{{$}}

 exit:
   ret void
 }

 !0 = !{!"function_entry_count", i64 120}
 ; CHECK-NOT: branch_weights
 !1 = !{!"branch_weights", i32 119, i32 1}
 ; CHECK: !1 = !{!"branch_weights", i32 119, i32 1}
 ; CHECK-NOT: branch_weights

llvm/trunk/test/Transforms/LoopUnroll/peel-loop.ll

@@ [12 lines not shown] @@
 ; CHECK: %[[INC1:.*]] = getelementptr inbounds i32, i32* %p, i64 1
 ; CHECK: store i32 1, i32* %[[INC1]], align 4
 ; CHECK: %[[CMP2:.*]] = icmp sgt i32 %k, 2
 ; CHECK: br i1 %[[CMP2]], label %[[NEXT2:.*]], label %for.end
 ; CHECK: [[NEXT2]]:
 ; CHECK: %[[INC2:.*]] = getelementptr inbounds i32, i32* %p, i64 2
 ; CHECK: store i32 2, i32* %[[INC2]], align 4
 ; CHECK: %[[CMP3:.*]] = icmp eq i32 %k, 3
-; CHECK: br i1 %[[CMP3]], label %for.end, label %[[LOOP:.*]]
+; CHECK: br i1 %[[CMP3]], label %for.end, label %[[LOOP_PH:.*]]
+; CHECK: [[LOOP_PH]]:
+; CHECK: br label %[[LOOP:.*]]
 ; CHECK: [[LOOP]]:
-; CHECK: %[[IV:.*]] = phi i32 [ {{.*}}, %[[LOOP]] ], [ 3, %[[NEXT2]] ]
+; CHECK: %[[IV:.*]] = phi i32 [ 3, %[[LOOP_PH]] ], [ {{.*}}, %[[LOOP]] ]

 define void @basic(i32* %p, i32 %k) #0 {
 entry:
   %cmp3 = icmp slt i32 0, %k
   br i1 %cmp3, label %for.body.lr.ph, label %for.end

 for.body.lr.ph: ; preds = %entry
   br label %for.body
@@ [28 lines not shown] @@
 ; CHECK: %[[INC1:.*]] = getelementptr inbounds i32, i32* %p, i64 1
 ; CHECK: store i32 1, i32* %[[INC1]], align 4
 ; CHECK: %[[CMP2:.*]] = icmp sgt i32 %k, 2
 ; CHECK: br i1 %[[CMP2]], label %[[NEXT2:.*]], label %for.end
 ; CHECK: [[NEXT2]]:
 ; CHECK: %[[INC2:.*]] = getelementptr inbounds i32, i32* %p, i64 2
 ; CHECK: store i32 2, i32* %[[INC2]], align 4
 ; CHECK: %[[CMP3:.*]] = icmp eq i32 %k, 3
-; CHECK: br i1 %[[CMP3]], label %for.end, label %[[LOOP:.*]]
+; CHECK: br i1 %[[CMP3]], label %for.end, label %[[LOOP_PH:.*]]
+; CHECK: [[LOOP_PH]]:
+; CHECK: br label %[[LOOP:.*]]
 ; CHECK: [[LOOP]]:
-; CHECK: %[[IV:.*]] = phi i32 [ %[[IV:.*]], %[[LOOP]] ], [ 3, %[[NEXT2]] ]
+; CHECK: %[[IV:.*]] = phi i32 [ 3, %[[LOOP_PH]] ], [ %[[IV:.*]], %[[LOOP]] ]
 ; CHECK: %ret = phi i32 [ 0, %entry ], [ 1, %[[NEXT0]] ], [ 2, %[[NEXT1]] ], [ 3, %[[NEXT2]] ], [ %[[IV]], %[[LOOP]] ]
 ; CHECK: ret i32 %ret
 define i32 @output(i32* %p, i32 %k) #0 {
 entry:
   %cmp3 = icmp slt i32 0, %k
   br i1 %cmp3, label %for.body.lr.ph, label %for.end

 for.body.lr.ph: ; preds = %entry
@@ [18 lines not shown] @@

llvm/trunk/test/Transforms/LoopUnswitch/2015-06-17-Metadata.ll

@@ [10 lines not shown] @@

 for.body: ; preds = %for.inc, %for.body.lr.ph
   %inc.i = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
   %mul.i = phi i32 [ 3, %for.body.lr.ph ], [ %mul.p, %for.inc ]
   %add.i = phi i32 [ %a, %for.body.lr.ph ], [ %add.p, %for.inc ]
   %cmp1 = icmp eq i32 %a, 12345
   br i1 %cmp1, label %if.then, label %if.else, !prof !0
 ; CHECK: %cmp1 = icmp eq i32 %a, 12345
-; CHECK-NEXT: br i1 %cmp1, label %for.body.us, label %for.body, !prof !0
+; CHECK-NEXT: br i1 %cmp1, label %for.body.preheader.split.us, label %for.body.preheader.split, !prof !0
 if.then: ; preds = %for.body
 ; CHECK: for.body.us:
 ; CHECK: add nsw i32 %{{.*}}, 123
 ; CHECK: %exitcond.us = icmp eq i32 %inc.us, %b
 ; CHECK: br i1 %exitcond.us, label %for.cond.cleanup, label %for.body.us
   %add = add nsw i32 %add.i, 123
   br label %for.inc

@@ [20 lines not shown] @@
 }

 define void @foo_swapped(i32 %a, i32 %b) {
 ;CHECK-LABEL: foo_swapped
 entry:
   br label %for.body
 ;CHECK: entry:
 ;CHECK-NEXT: %cmp1 = icmp eq i32 1, 2
-;CHECK-NEXT: br i1 %cmp1, label %for.body, label %for.cond.cleanup.split, !prof !1
+;CHECK-NEXT: br i1 %cmp1, label %entry.split, label %for.cond.cleanup.split, !prof !1
 ;CHECK: for.body:
 for.body: ; preds = %for.inc, %entry
   %inc.i = phi i32 [ 0, %entry ], [ %inc, %if.then ]
   %add.i = phi i32 [ 100, %entry ], [ %add, %if.then ]
   %inc = add nuw nsw i32 %inc.i, 1
   %cmp1 = icmp eq i32 1, 2
   br i1 %cmp1, label %if.then, label %for.cond.cleanup, !prof !0

@@ [13 lines not shown] @@

llvm/trunk/test/Transforms/LoopUnswitch/infinite-loop.ll

 ; REQUIRES: asserts
 ; RUN: opt -loop-unswitch -disable-output -stats -info-output-file - < %s | FileCheck --check-prefix=STATS %s
 ; RUN: opt -loop-unswitch -simplifycfg -S < %s | FileCheck %s
 ; PR5373

 ; Loop unswitching shouldn't trivially unswitch the true case of condition %a
 ; in the code here because it leads to an infinite loop. While this doesn't
 ; contain any instructions with side effects, it's still a kind of side effect.
-; It can trivially unswitch on the false cas of condition %a though.
+; It can trivially unswitch on the false case of condition %a though.

 ; STATS: 2 loop-unswitch - Number of branches unswitched
 ; STATS: 2 loop-unswitch - Number of unswitches that are trivial

 ; CHECK-LABEL: @func_16(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 %a, label %entry.split, label %abort0.split

 ; CHECK: entry.split:
-; CHECK-NEXT: br i1 %b, label %for.body, label %abort1.split
+; CHECK-NEXT: br i1 %b, label %entry.split.split, label %abort1.split

 ; CHECK: for.body:
 ; CHECK-NEXT: br label %for.body

 ; CHECK: abort0.split:
 ; CHECK-NEXT: call void @end0() [[NOR_NUW:#[0-9]+]]
 ; CHECK-NEXT: unreachable

@@ [31 lines not shown] @@

llvm/trunk/test/Transforms/LoopVectorize/X86/float-induction-x86.ll

-; RUN: opt < %s -O3 -mcpu=core-avx2 -mtriple=x86_64-unknown-linux-gnu -S | FileCheck --check-prefix AUTO_VEC %s
+; RUN: opt < %s -O3 -latesimplifycfg -mcpu=core-avx2 -mtriple=x86_64-unknown-linux-gnu -S | FileCheck --check-prefix AUTO_VEC %s

 ; This test checks auto-vectorization with FP induction variable.
 ; The FP operation is not "fast" and requires "fast-math" function attribute.

 ;void fp_iv_loop1(float * __restrict__ A, int N) {
 ; float x = 1.0;
 ; for (int i=0; i < N; ++i) {
 ; A[i] = x;
@@ [140 lines not shown] @@

llvm/trunk/test/Transforms/LoopVectorize/float-induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL1 %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL1 %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL2 %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL2 %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=1 -dce -instcombine -S | FileCheck --check-prefix VEC1_INTERL2 %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=1 -dce -instcombine -S | FileCheck --check-prefix VEC1_INTERL2 %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -dce -simplifycfg -instcombine -S | FileCheck --check-prefix VEC2_INTERL1_PRED_STORE %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -dce -simplifycfg -instcombine -latesimplifycfg -S | FileCheck --check-prefix VEC2_INTERL1_PRED_STORE %s

	@fp_inc = common global float 0.000000e+00, align 4			@fp_inc = common global float 0.000000e+00, align 4

	;void fp_iv_loop1(float init, float * __restrict__ A, int N) {			;void fp_iv_loop1(float init, float * __restrict__ A, int N) {
	; float x = init;			; float x = init;
	; for (int i=0; i < N; ++i) {			; for (int i=0; i < N; ++i) {
	; A[i] = x;			; A[i] = x;
	; x -= fp_inc;			; x -= fp_inc;

llvm/trunk/test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll

	; CHECK-LABEL: @covered_switch_with_bit_tests			; CHECK-LABEL: @covered_switch_with_bit_tests
	; CHECK: entry			; CHECK: entry
	; CHECK-NEXT: switch			; CHECK-NEXT: switch
	}			}

	; Speculation depth must be limited to avoid a zero-cost instruction cycle.			; Speculation depth must be limited to avoid a zero-cost instruction cycle.

	; CHECK-LABEL: @PR26308(			; CHECK-LABEL: @PR26308(
	; CHECK: while.body:			; CHECK: cleanup4:
	; CHECK-NEXT: br label %while.body			; CHECK-NEXT: br label %cleanup4

	define i32 @PR26308(i1 %B, i64 %load) {			define i32 @PR26308(i1 %B, i64 %load) {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	br label %cleanup			br label %cleanup


llvm/trunk/test/Transforms/SimplifyCFG/multiple-phis.ll

	; RUN: opt -simplifycfg -S < %s | FileCheck %s			; RUN: opt -latesimplifycfg -S < %s | FileCheck %s

	; It's not worthwhile to if-convert one of the phi nodes and leave			; It's not worthwhile to if-convert one of the phi nodes and leave
	; the other behind, because that still requires a branch. If			; the other behind, because that still requires a branch. If
	; SimplifyCFG if-converts one of the phis, it should do both.			; SimplifyCFG if-converts one of the phis, it should do both.

	; CHECK: %div.high.addr.0 = select i1 %cmp1, i32 %div, i32 %high.addr.0			; CHECK: %div.high.addr.0 = select i1 %cmp1, i32 %div, i32 %high.addr.0
	; CHECK-NEXT: %low.0.add2 = select i1 %cmp1, i32 %low.0, i32 %add2			; CHECK-NEXT: %low.0.add2 = select i1 %cmp1, i32 %low.0, i32 %add2
	; CHECK-NEXT: br label %while.cond			; CHECK-NEXT: br label %while.cond

llvm/trunk/test/Transforms/SimplifyCFG/pr33605.ll

				; RUN: opt < %s -simplifycfg -S | FileCheck %s

				; Skip simplifying unconditional branches from empty blocks in simplifyCFG,
				; when it can destroy canonical loop structure.

				; void foo();
				; bool test(int a, int b, int *c) {
				; bool changed = false;
				; for (unsigned int i = 2; i--;) {
				; int r = a | b;
				; if ( r != c[i]) {
				; c[i] = r;
				; foo();
				; changed = true;
				; }
				; }
				; return changed;
				; }

				; CHECK-LABEL: @test(
				; CHECK: for.cond:
				; CHECK-NEXT: %i.0 = phi i32 [ 2, %entry ], [ %dec, %if.end ]
				; CHECK: for.body:
				; CHECK: br i1 %cmp, label %if.end, label %if.then
				; CHECK-NOT: br i1 %cmp, label %for.cond, label %if.then
				; CHECK: if.then:
				; CHECK: br label %if.end
				; CHECK-NOT: br label %for.cond
				; CHECK: if.end:
				; CHECK: br label %for.cond
				define i1 @test(i32 %a, i32 %b, i32* %c) {
				entry:
				br label %for.cond

				for.cond: ; preds = %if.end, %entry
				%i.0 = phi i32 [ 2, %entry ], [ %dec, %if.end ]
				%changed.0.off0 = phi i1 [ false, %entry ], [ %changed.1.off0, %if.end ]
				%dec = add nsw i32 %i.0, -1
				%tobool = icmp eq i32 %i.0, 0
				br i1 %tobool, label %for.cond.cleanup, label %for.body

				for.cond.cleanup: ; preds = %for.cond
				%changed.0.off0.lcssa = phi i1 [ %changed.0.off0, %for.cond ]
				ret i1 %changed.0.off0.lcssa

				for.body: ; preds = %for.cond
				%or = or i32 %a, %b
				%idxprom = sext i32 %dec to i64
				%arrayidx = getelementptr inbounds i32, i32* %c, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4
				%cmp = icmp eq i32 %or, %0
				br i1 %cmp, label %if.end, label %if.then

				if.then: ; preds = %for.body
				store i32 %or, i32* %arrayidx, align 4
				call void @foo()
				br label %if.end

				if.end: ; preds = %for.body, %if.then
				%changed.1.off0 = phi i1 [ true, %if.then ], [ %changed.0.off0, %for.body ]
				br label %for.cond
				}

				declare void @foo()
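
The deferral rule this test exercises can be sketched outside of LLVM as a small predicate. The following is a minimal illustration only, not the code in SimplifyCFG.cpp: `ToyBlock`, `shouldDeferFold`, and the string-keyed `LoopHeaders` set are hypothetical names introduced here, and the sketch assumes a block is described by its label, whether it contains nothing but an unconditional branch, and the label it branches to.

```cpp
#include <set>
#include <string>

// Hypothetical toy model of a basic block; this is NOT LLVM's BasicBlock API.
struct ToyBlock {
  std::string Name;      // label of this block
  bool OnlyUncondBranch; // true if the block holds only `br label %Succ` (PHIs aside)
  std::string Succ;      // destination of that unconditional branch
};

// Returns true when folding BB into its successor should be postponed to the
// late pass. Assumption: LoopHeaders holds the labels of all known loop headers.
bool shouldDeferFold(const ToyBlock &BB,
                     const std::set<std::string> &LoopHeaders,
                     bool LateSimplify) {
  // Blocks that do real work are not candidates for this fold at all.
  if (!BB.OnlyUncondBranch)
    return false;
  // The late pass no longer needs to preserve canonical loop structure.
  if (LateSimplify)
    return false;
  // Pre-existing check: the block itself is a loop header.
  // Added check described in the summary: the branch destination is one.
  return LoopHeaders.count(BB.Name) != 0 || LoopHeaders.count(BB.Succ) != 0;
}
```

Applied to the test above, `if.end` only branches to `for.cond`, which is the loop header, so under this sketch the fold would be deferred with `-simplifycfg` and allowed with `-latesimplifycfg`.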

llvm/trunk/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll

	; RUN: opt -simplifycfg -S < %s | FileCheck %s			; RUN: opt -latesimplifycfg -S < %s | FileCheck %s

	define void @test1(i32 %n) #0 {			define void @test1(i32 %n) #0 {
	entry:			entry:
	%n.addr = alloca i32, align 4			%n.addr = alloca i32, align 4
	%count = alloca i32, align 4			%count = alloca i32, align 4
	store i32 %n, i32* %n.addr, align 4			store i32 %n, i32* %n.addr, align 4
	%0 = bitcast i32* %count to i8*			%0 = bitcast i32* %count to i8*
	store i32 0, i32* %count, align 4			store i32 0, i32* %count, align 4

Diff 107264
