This is an archive of the discontinued LLVM Phabricator instance.

Differential D27004

Set unroll remainder to epilog if profitable
ClosedPublic

Authored by evstupac on Nov 22 2016, 3:16 PM.

Details

Reviewers

mzolotukhin
mkuper
sanjoy
zansari
hfinkel

Commits

rGe4b0813d62f8: Add test missed in r296770.
rG21bef2cb3c7f: The patch turns on epilogue unroll for loops with constant recurency start.
rL296962: Add test missed in r296770.
rL296770: The patch turns on epilogue unroll for loops with constant recurency start.

Summary

Set unroll remainder to epilog if a loop contains phi with constant parameter:

loop:
pn = phi [Const, PreHeader], [pn.next, Latch]
...
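
As a rough source-level illustration of the summary above, the sketch below unrolls a loop by 2 with either an epilog or a prolog remainder. It is plain C++, not LLVM code: `step`, `original`, `unrolledWithEpilog` and `unrolledWithProlog` are hypothetical names, and `step` stands in for whatever the loop body does with the phi. With the epilog the accumulator entering the unrolled loop is still the constant 0; with the prolog it is whatever the prolog computed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical loop body, standing in for the use of the phi in the loop.
static uint32_t step(uint32_t acc, uint32_t x) { return (acc ^ x) * 3u + 1u; }

// Original loop: the recurrence starts from the constant 0.
uint32_t original(const std::vector<uint32_t> &a) {
  uint32_t acc = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    acc = step(acc, a[i]);
  return acc;
}

// Unroll by 2, remainder handled by an epilog.
uint32_t unrolledWithEpilog(const std::vector<uint32_t> &a) {
  uint32_t acc = 0; // the unrolled loop still starts from a constant
  std::size_t n = a.size(), i = 0;
  for (; i + 2 <= n; i += 2) {
    acc = step(acc, a[i]);
    acc = step(acc, a[i + 1]);
  }
  for (; i < n; ++i) // epilog: leftover iterations
    acc = step(acc, a[i]);
  return acc;
}

// Unroll by 2, remainder handled by a prolog.
uint32_t unrolledWithProlog(const std::vector<uint32_t> &a) {
  uint32_t acc = 0;
  std::size_t n = a.size(), i = 0;
  for (; i < n % 2; ++i) // prolog: peel n % 2 iterations first
    acc = step(acc, a[i]);
  // acc is now a runtime value, so the unrolled loop no longer
  // starts from a constant.
  for (; i < n; i += 2) {
    acc = step(acc, a[i]);
    acc = step(acc, a[i + 1]);
  }
  return acc;
}
```

All three functions compute the same result; the difference is only in what later passes can fold in the first unrolled iteration.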

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac updated this revision to Diff 78953. Nov 22 2016, 3:16 PM

evstupac retitled this revision from to Set unroll remainder to epilog if profitable.

evstupac updated this object.

evstupac added reviewers: mzolotukhin, sanjoy.

evstupac set the repository for this revision to rL LLVM.

evstupac added subscribers: llvm-commits, mzolotukhin, sanjoy and 2 others.

PING.

PING2

Hi Evgeny,

Please find some comments inline.

Thanks

lib/Transforms/Utils/LoopUnroll.cpp
207 ↗	(On Diff #78953)	`Latch` is not used anywhere. `&&` should be used instead of `||`.
212–213 ↗	(On Diff #78953)	This shouldn't be possible - I think we can assert this (if `getIncomingValueForBlock` doesn't assert it already).
215–216 ↗	(On Diff #78953)	This also should not be possible, so we can replace it with an assert.
221–225 ↗	(On Diff #78953)	Where does this come from? Can you please elaborate on what we're doing here?
360 ↗	(On Diff #78953)	s/Epiplog/Epilog/
test/Transforms/LoopUnroll/unroll-pragmas.ll
181–184 ↗	(On Diff #78953)	Can we also have a separate test for the new functionality?

Removed modulo of unroll count check.
Simplified and updated according to inline comments.

PING.

PING.
Are there any major concerns regarding epilogue, besides this: https://bugs.llvm.org//show_bug.cgi?id=30939?
It looks like we are accepting epilogue for vectorization for some reason and can not accept it for unroll.

evstupac added reviewers: hfinkel, mkuper, zansari. Feb 14 2017, 9:17 AM

In D27004#675251, @evstupac wrote:

PING.
Are there any major concerns regarding epilogue, besides this: https://bugs.llvm.org//show_bug.cgi?id=30939?
It looks like we are accepting epilogue for vectorization for some reason and can not accept it for unroll.

I think that we should move forward with this. PR30939, as explained in the bug report, does not seem like something we can reasonably work around - it seems like a prototypical microarchitectural sensitivity - dealing with branch predictors on unpredictable branches is always hard.

lib/Transforms/Utils/LoopUnroll.cpp
357 ↗	(On Diff #83398)	No space before (L).

This revision is now accepted and ready to land. Feb 14 2017, 10:41 AM

mzolotukhin added inline comments. Feb 14 2017, 11:36 AM

test/Transforms/LoopUnroll/epilog_const_phi.ll
4 ↗	(On Diff #83398)	s/inscombine/instcombine/
s/completle/completely/

I'd rather not rely on other passes (instcombine) to check loop-unroll. Can we check the CFG instead? That'll make the test smaller (since it'll be enough to have a minimal loop) and more robust to unrelated changes (say, instcombine for some reason stopped to simplify XOR through phis).

It would be also nice to have a test, in which we don't use epilog (i.e. arguments of phis are not constant), to check that we're not doing it everywhere.

evstupac added inline comments. Feb 14 2017, 12:18 PM

test/Transforms/LoopUnroll/epilog_const_phi.ll
4 ↗	(On Diff #83398)
> I'd rather not rely on other passes (instcombine) to check loop-unroll. Can we check the CFG instead? That'll make the test smaller (since it'll be enough to have a minimal loop) and more robust to unrelated changes (say, instcombine for some reason stopped to simplify XOR through phis).

I agree, that right now it is more than unroll test. We can switch to CFG, but I'd like to leave XOR or smth else that shows clear profitability (when other optimizations are applied) instead of minimizing the loop.

> It would be also nice to have a test, in which we don't use epilog (i.e. arguments of phis are not constant), to check that we're not doing it everywhere.

Good point. I'll add such.

mzolotukhin added inline comments. Feb 14 2017, 4:39 PM

test/Transforms/LoopUnroll/epilog_const_phi.ll
4 ↗	(On Diff #83398)
> I agree, that right now it is more than unroll test. We can switch to CFG, but I'd like to leave XOR or smth else that shows clear profitability (when other optimizations are applied) instead of minimizing the loop.

The goal of the tests is to test stuff, not to show profitability:) I'd rather mention the profitability in comments (which I think you already did). I think the smaller (~more decoupled) test the better.

> I'll add such.

Thanks. Feel free to commit after it, no need to repost the updated patch.

Closed by commit rL296770: The patch turns on epilogue unroll for loops with constant recurency start. (authored by evstupac). Mar 2 2017, 9:50 AM

This revision was automatically updated to reflect the committed changes.

> I'll add such.

I think you forgot to commit the test. Did you rename the old test and forget to add it before committing?

Michael

> I think you forgot to commit the test. Did you rename the old test and forget to add it before committing?

Yes. It looks like I forgot to add a new test. I'll add it today.

Thanks for catching this,
Evgeny

Revision Contents

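Path	Size
llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp	45 lines
llvm/trunk/test/Transforms/LoopUnroll/revisit.ll	6 lines
llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll	7 lines
llvm/trunk/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll	4 lines
llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll	24 lines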

Diff 90350

llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp

[... 210 unchanged lines not shown; context: if (!NewLoop) { ...]
       NewLoop->addBasicBlockToLoop(ClonedBB, *LI);
       return OldLoop;
     } else {
       NewLoop->addBasicBlockToLoop(ClonedBB, *LI);
       return nullptr;
     }
 }

+/// The function chooses which type of unroll (epilog or prolog) is more
+/// profitable.
+/// Epilog unroll is more profitable when there is PHI that starts from
+/// constant. In this case epilog will leave PHI start from constant,
+/// but prolog will convert it to non-constant.
+///
+/// loop:
+///   PN = PHI [I, Latch], [CI, PreHeader]
+///   I = foo(PN)
+///   ...
+///
+/// Epilog unroll case.
+/// loop:
+///   PN = PHI [I2, Latch], [CI, PreHeader]
+///   I1 = foo(PN)
+///   I2 = foo(I1)
+///   ...
+/// Prolog unroll case.
+///   NewPN = PHI [PrologI, Prolog], [CI, PreHeader]
+/// loop:
+///   PN = PHI [I2, Latch], [NewPN, PreHeader]
+///   I1 = foo(PN)
+///   I2 = foo(I1)
+///   ...
+///
+static bool isEpilogProfitable(Loop *L) {
+  BasicBlock *PreHeader = L->getLoopPreheader();
+  BasicBlock *Header = L->getHeader();
+  assert(PreHeader && Header);
+  for (Instruction &BBI : *Header) {
+    PHINode *PN = dyn_cast<PHINode>(&BBI);
+    if (!PN)
+      break;
+    if (isa<ConstantInt>(PN->getIncomingValueForBlock(PreHeader)))
+      return true;
+  }
+  return false;
+}
+
 /// Unroll the given loop by Count. The loop must be in LCSSA form. Returns true
 /// if unrolling was successful, or false if the loop was unmodified. Unrolling
 /// can only fail when the loop's latch block is not terminated by a conditional
 /// branch instruction. However, if the trip count (and multiple) are not known,
 /// loop unrolling will mostly produce more code that is no faster.
 ///
 /// TripCount is the upper bound of the iteration on which control exits
 /// LatchBlock. Control may exit the loop prior to TripCount iterations either
[... 127 unchanged lines not shown; context: DEBUG( ...]
       for (auto &I : *BB)
         if (auto CS = CallSite(&I))
           HasConvergent |= CS.isConvergent();
     assert((!HasConvergent || TripMultiple % Count == 0) &&
            "Unroll count must divide trip multiple if loop contains a "
            "convergent operation.");
   });

+  bool EpilogProfitability =
+      UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog
+                                              : isEpilogProfitable(L);
+
   if (RuntimeTripCount && TripMultiple % Count != 0 &&
       !UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,
-                                  UnrollRuntimeEpilog, LI, SE, DT,
+                                  EpilogProfitability, LI, SE, DT,
                                   PreserveLCSSA)) {
     if (Force)
       RuntimeTripCount = false;
     else {
       DEBUG(
           dbgs() << "Wont unroll; remainder loop could not be generated"
                     "when assuming runtime trip count\n");
       return false;
[... 442 unchanged lines not shown ...]
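
The decision made by this hunk can be modelled without LLVM. The sketch below is only an approximation under stated assumptions: `ToyInst`, `isEpilogProfitableToy` and `chooseEpilog` are hypothetical names, a `std::vector` stands in for the header block, and a `std::optional<bool>` stands in for "the option was given on the command line" (what `getNumOccurrences()` tests in the patch).

```cpp
#include <optional>
#include <vector>

// Toy stand-in for an instruction in the loop header (not an LLVM type).
struct ToyInst {
  bool IsPhi;                    // phis come first in a basic block
  bool PreheaderValueIsConstInt; // only meaningful when IsPhi is true
};

// Mirrors the shape of isEpilogProfitable: scan the leading phis and
// report true as soon as one starts from an integer constant.
bool isEpilogProfitableToy(const std::vector<ToyInst> &Header) {
  for (const ToyInst &I : Header) {
    if (!I.IsPhi)
      break; // first non-phi ends the phi section
    if (I.PreheaderValueIsConstInt)
      return true;
  }
  return false;
}

// An explicitly given flag wins; otherwise fall back to the heuristic.
bool chooseEpilog(std::optional<bool> ExplicitFlag,
                  const std::vector<ToyInst> &Header) {
  return ExplicitFlag.has_value() ? *ExplicitFlag
                                  : isEpilogProfitableToy(Header);
}
```

Keeping the explicit option as an override means existing tests that pass the flag keep their behaviour, while loops without it get the new default.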

llvm/trunk/test/Transforms/LoopUnroll/revisit.ll

[... 132 unchanged lines not shown ...]
 ; CHECK-CHILDREN-NOT: LoopUnrollPass
 ; CHECK-CHILDREN: LoopUnrollPass on Loop at depth 3 containing: %l0.0.1.1<header>
 ; CHECK-CHILDREN-NOT: LoopUnrollPass
 ;
 ; When we revisit children, we also revisit the current loop.
 ; CHECK-CHILDREN: LoopUnrollPass on Loop at depth 2 containing: %l0.0<header>
 ; CHECK-CHILDREN-NOT: LoopUnrollPass
 ;
-; Revisit the children of the outer loop that are part of the prologue.
+; Revisit the children of the outer loop that are part of the epilogue.
 ;
-; CHECK: LoopUnrollPass on Loop at depth 2 containing: %l0.0.0.prol<header>
+; CHECK: LoopUnrollPass on Loop at depth 2 containing: %l0.0.0.epil<header>
 ; CHECK-NOT: LoopUnrollPass
-; CHECK: LoopUnrollPass on Loop at depth 2 containing: %l0.0.1.prol<header>
+; CHECK: LoopUnrollPass on Loop at depth 2 containing: %l0.0.1.epil<header>
 ; CHECK-NOT: LoopUnrollPass
 l0.latch:
   br label %l0
 ; CHECK: LoopUnrollPass on Loop at depth 1 containing: %l0<header>
 ; CHECK-NOT: LoopUnrollPass

 exit:
   ret void
 }
 !1 = !{!1, !2}
 !2 = !{!"llvm.loop.unroll.count", i32 2}

llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll

 ; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-count=16 | FileCheck --check-prefix=UNROLL-16 %s
 ; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-count=4 | FileCheck --check-prefix=UNROLL-4 %s

 ; RUN: opt < %s -S -passes='require<opt-remark-emit>,loop(unroll)' -unroll-runtime=true -unroll-count=16 | FileCheck --check-prefix=UNROLL-16 %s
 ; RUN: opt < %s -S -passes='require<opt-remark-emit>,loop(unroll)' -unroll-runtime=true -unroll-count=4 | FileCheck --check-prefix=UNROLL-4 %s

 ; Given that the trip-count of this loop is a 3-bit value, we cannot
 ; safely unroll it with a count of anything more than 8.

 define i3 @test(i3* %a, i3 %n) {
 ; UNROLL-16-LABEL: @test(
 ; UNROLL-4-LABEL: @test(
 entry:
   %cmp1 = icmp eq i3 %n, 0
   br i1 %cmp1, label %for.end, label %for.body

-; UNROLL-16-NOT: for.body.prol:
-; UNROLL-4: for.body.prol:
-
 for.body: ; preds = %for.body, %entry
 ; UNROLL-16-LABEL: for.body:
 ; UNROLL-4-LABEL: for.body:
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %sum.02 = phi i3 [ %add, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds i3, i3* %a, i64 %indvars.iv

 ; UNROLL-16-LABEL: for.body
[... 9 unchanged lines not shown; context: ; UNROLL-4-LABEL: getelementptr ...]
   %add = add nsw i3 %0, %sum.02
   %indvars.iv.next = add i64 %indvars.iv, 1
   %lftr.wideiv = trunc i64 %indvars.iv.next to i3
   %exitcond = icmp eq i3 %lftr.wideiv, %n
   br i1 %exitcond, label %for.end, label %for.body

 ; UNROLL-16-LABEL: for.end
 ; UNROLL-4-LABEL: for.end

+; UNROLL-16-NOT: for.body.epil:
+; UNROLL-4: for.body.epil:
+
 for.end: ; preds = %for.body, %entry
   %sum.0.lcssa = phi i3 [ 0, %entry ], [ %add, %for.body ]
   ret i3 %sum.0.lcssa
 }
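
The test comment above says a 3-bit trip count cannot safely be unrolled by more than 8. A minimal sketch of one guard that would produce the behaviour the CHECK lines expect (no remainder loop for a count of 16, a remainder loop for a count of 4) is shown below. It assumes the rule is "the log2 of the unroll count must not exceed the bit width of the trip-count type"; `floorLog2` and `countFitsTripCountWidth` are hypothetical helpers, not functions from the patch.

```cpp
#include <cstdint>

// Floor of log2 for a non-zero value (hypothetical helper).
unsigned floorLog2(uint32_t V) {
  unsigned R = 0;
  while (V >>= 1)
    ++R;
  return R;
}

// Assumed guard: accept an unroll count only if its log2 does not
// exceed the bit width of the trip-count type.
bool countFitsTripCountWidth(uint32_t Count, unsigned BitWidth) {
  return floorLog2(Count) <= BitWidth;
}
```

Under this assumption a count of 16 is rejected for an `i3` trip count, while counts of 4 and 8 are accepted.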

llvm/trunk/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll

 ; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-max-percent-threshold-boost=100 | FileCheck %s

 @known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16

 ; CHECK-LABEL: @bar_prof
-; CHECK: loop.prol:
 ; CHECK: loop:
 ; CHECK: %mul = mul
 ; CHECK: %mul.1 = mul
 ; CHECK: %mul.2 = mul
 ; CHECK: %mul.3 = mul
+; CHECK: loop.epil:
 define i32 @bar_prof(i32* noalias nocapture readonly %src, i64 %c) !prof !1 {
 entry:
   br label %loop

 loop:
   %iv = phi i64 [ 0, %entry ], [ %inc, %loop ]
   %r = phi i32 [ 0, %entry ], [ %add, %loop ]
   %arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv
   %src_element = load i32, i32* %arrayidx, align 4
   %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv
   %const_array_element = load i32, i32* %array_const_idx, align 4
   %mul = mul nsw i32 %src_element, %const_array_element
   %add = add nsw i32 %mul, %r
   %inc = add nuw nsw i64 %iv, 1
   %exitcond86.i = icmp eq i64 %inc, %c
   br i1 %exitcond86.i, label %loop.end, label %loop, !prof !2

 loop.end:
   %r.lcssa = phi i32 [ %r, %loop ]
   ret i32 %r.lcssa
 }

 ; CHECK-LABEL: @bar_prof_flat
-; CHECK-NOT: loop.prol
+; CHECK-NOT: loop.epil
 define i32 @bar_prof_flat(i32* noalias nocapture readonly %src, i64 %c) !prof !1 {
 entry:
   br label %loop

 loop:
   %iv = phi i64 [ 0, %entry ], [ %inc, %loop ]
   %r = phi i32 [ 0, %entry ], [ %add, %loop ]
   %arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv
[... 16 unchanged lines not shown ...]

llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll

[... 165 unchanged lines not shown ...]
 }
 !8 = !{!8, !4}

 ; #pragma clang loop unroll_count(4)
 ; Loop has a runtime trip count. Runtime unrolling should occur and loop
 ; should be duplicated (original and 4x unrolled).
 ;
 ; CHECK-LABEL: @runtime_loop_with_count4(
-; CHECK: for.body.prol:
-; CHECK: store
-; CHECK-NOT: store
-; CHECK: br i1
 ; CHECK: for.body
 ; CHECK: store
 ; CHECK: store
 ; CHECK: store
 ; CHECK: store
 ; CHECK-NOT: store
 ; CHECK: br i1
+; CHECK: for.body.epil:
+; CHECK: store
+; CHECK-NOT: store
+; CHECK: br i1
 define void @runtime_loop_with_count4(i32* nocapture %a, i32 %b) {
 entry:
   %cmp3 = icmp sgt i32 %b, 0
   br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !9

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
[... 89 unchanged lines not shown ...]
 !13 = !{!13, !14}
 !14 = !{!"llvm.loop.unroll.enable"}

 ; #pragma clang loop unroll(enable)
 ; Loop has a runtime trip count and should be runtime unrolled and duplicated
 ; (original and 8x).
 ;
 ; CHECK-LABEL: @runtime_loop_with_enable(
-; CHECK: for.body.prol:
-; CHECK: store
-; CHECK-NOT: store
-; CHECK: br i1
 ; CHECK: for.body:
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK: store i32
 ; CHECK-NOT: store i32
 ; CHECK: br i1
+; CHECK: for.body.epil:
+; CHECK: store
+; CHECK-NOT: store
+; CHECK: br i1
 define void @runtime_loop_with_enable(i32* nocapture %a, i32 %b) {
 entry:
   %cmp3 = icmp sgt i32 %b, 0
   br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !8

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
[... 10 unchanged lines not shown ...]
 }
 !15 = !{!15, !14}

 ; #pragma clang loop unroll_count(3)
 ; Loop has a runtime trip count. Runtime unrolling should occur and loop
 ; should be duplicated (original and 3x unrolled).
 ;
 ; CHECK-LABEL: @runtime_loop_with_count3(
-; CHECK: for.body.prol:
-; CHECK: store
-; CHECK-NOT: store
-; CHECK: br i1
 ; CHECK: for.body
 ; CHECK: store
 ; CHECK: store
 ; CHECK: store
 ; CHECK-NOT: store
 ; CHECK: br i1
+; CHECK: for.body.epil:
+; CHECK: store
+; CHECK-NOT: store
+; CHECK: br i1
 define void @runtime_loop_with_count3(i32* nocapture %a, i32 %b) {
 entry:
   %cmp3 = icmp sgt i32 %b, 0
   br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !16

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
[... 13 unchanged lines not shown ...]