This is an archive of the discontinued LLVM Phabricator instance.

Differential D18158

Adding ability to unroll loops using epilogue remainder.
ClosedPublic

Authored by evstupac on Mar 14 2016, 2:17 PM.

Details

Reviewers

chandlerc
mzolotukhin
sanjoy
aizatsky
hfinkel

Commits

rG188de5ae6940: Adds the ability to use an epilog remainder loop during loop unrolling and…
rL265388: Adds the ability to use an epilog remainder loop during loop unrolling and makes

Summary

The patch adds the ability to unroll loops using an epilogue remainder and makes it the default (instead of the current prologue remainder).
The epilogue remainder is better because it does not produce additional non-constant phi nodes.
for(i = 0; i < n; i++)
  ...

Currently, the result of loop unrolling looks like:

PROLOG:
  ip = phi [0, ip.next]
  ...
  ip.next = ip + 1
  cmp ip, n&7
  jne PROLOG
i0 = phi [0, ip.next]
LOOP (unrolling on 8):
  i = phi [i0, i.next]
  i.next = i + 1
  cmp i, n
  jne LOOP

In LOOP (the part that is supposed to be unrolled) we have i = phi [i0, i.next].
That means we no longer know where the loop starts, whereas before unrolling we knew the start position was 0.
Assuming the unrolled part of the loop is hotter than the remainder (currently the prologue), it is better to move the induction variable with the unknown start into the remainder code, which is what the epilogue technique does.
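
To make the difference concrete, here is a minimal source-level sketch (not the patch's IR transformation) of the two remainder strategies for an unroll factor of 8. The function names `sumWithPrologue` and `sumWithEpilogue` and the array-sum loop body are illustrative assumptions; the inner `j` loop stands in for eight copies of the loop body.

```cpp
#include <cstddef>
#include <vector>

// Unroll factor, matching the "unrolling on 8" example in the summary.
constexpr std::size_t kUnroll = 8;

// Prologue remainder: the n % 8 leftover iterations run first, so the
// unrolled loop starts at a runtime-dependent index (i0 in the summary).
long sumWithPrologue(const std::vector<int> &a) {
  const std::size_t n = a.size();
  long sum = 0;
  std::size_t i = 0;
  for (; i < n % kUnroll; ++i) // PROLOG
    sum += a[i];
  for (; i < n; i += kUnroll) // LOOP: start index is n % 8, unknown
    for (std::size_t j = 0; j < kUnroll; ++j)
      sum += a[i + j];
  return sum;
}

// Epilogue remainder: the unrolled loop starts at the constant 0 and the
// leftover iterations run afterwards.
long sumWithEpilogue(const std::vector<int> &a) {
  const std::size_t n = a.size();
  const std::size_t unrolledEnd = n - n % kUnroll;
  long sum = 0;
  std::size_t i = 0;
  for (; i < unrolledEnd; i += kUnroll) // LOOP: start index is 0
    for (std::size_t j = 0; j < kUnroll; ++j)
      sum += a[i + j];
  for (; i < n; ++i) // EPILOG
    sum += a[i];
  return sum;
}
```

In the prologue variant the induction variable of the unrolled loop starts at the runtime value n % 8; in the epilogue variant it starts at the constant 0, which is the property the summary argues for.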

Diff Detail

Event Timeline

evstupac updated this revision to Diff 50633. Mar 14 2016, 2:17 PM

evstupac retitled this revision from to Adding ability to unroll loops using epilogue remainder..

evstupac updated this object.

evstupac added reviewers: sanjoy, aizatsky, hfinkel, chandlerc, mzolotukhin.

evstupac set the repository for this revision to rL LLVM.

evstupac added a subscriber: llvm-commits.

Herald added a subscriber: sanjoy. Mar 14 2016, 2:17 PM

Hi Evgeny,

I'm going to look at this patch soon, but at first glance I find one thing that distracts from the actual stuff: your patch contains a lot of unnecessary changes, such as typo fixes, rewording of not directly related comments, variable renaming, etc. While such changes are fine by themselves, there is no need to include them in the main patch. You can commit them before the patch, and no pre-commit review is required for obvious patches, such as typo fixes.

Michael

Hi Michael,

Thanks for taking the patch into pipeline.
Yes, there are some rewording and comment changes in previously written code. However, most of them are implicitly related to the newly written code and comments (basically because bigger code requires more detailed comments for understanding). Separating the patch into 2 is not a big deal. If it is more convenient for you to review the code after the rewording, I'll try to do that first.

Evgeny

Hi Evgeny,

A bunch of mostly stylistic comments from me are inline; I'll take a closer look later. I'm fine with keeping it in one patch too; usually it's just more convenient to commit small stuff first to move it out of the way.

Michael

lib/Transforms/Utils/LoopUnrollRuntime.cpp
131	I was talking mostly about stuff like this. This is very local change, that doesn't depend on the rest of the patch in any way.
175–176	You could use range-loop here.
180–184	Maybe use IR syntax in the comments?
193	Why is it `UndefValue`?
195–196	Why not `Instruction *I = dyn_cast<Instruction>(PN->getIncomingValueForBlock(Latch));`? And then use `I` instead of `V`.
219–220	You could do `for (BasicBlock *Succ : successors(Latch))`.
224–225	We can just check `if(L->contains(*SBI))` before the loop, right?
236–237	Nitpick: this looks wrapped before 80 characters.
292	Unnecessary change.
373–374	Unnecessary change.
454–455	Nitpick: why remove the dot?
483–484	Looks like an unnecessary change.
test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
1–3	Where did `-mtriple aarch64 -mcpu=cortex-a57` go?
test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
37	These `CHECK` lines were intended to check the code before the loop, so if we start doing remainder iterations in epilogue, then we should look for `for.body` rather than `for.body.epil`, as it will appear first. This doesn't matter much in the current testcase since there are no `udiv` in the loop, but still.

evstupac added inline comments. Mar 18 2016, 3:56 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
175–176	Almost identical loops for iterating over PHIs are widely used in the unroll code. If we change it here, we should change it everywhere. As for the range itself, do you mean we can construct a vector of PHIs and iterate over them?
180–184	Will make it closer to IR; however, I'd like to keep the variable names from the code here.
193	Since we added a branch around the Loop to NewExit, this UndefValue will be updated correctly (the same technique is used in ConnectProlog). Here we could find the value from PreHeader based on V from Latch, but that would be more complicated and would need a further update.
195–196	We'll need an additional cast in this case, as VMap[I] does not return an Instruction. We'd also need an additional check when adding the incoming value to EpilogPN, to handle the case where I is null (to add UndefValue).
219–220	Will update.
224–225	Correct. There is similar place in LoopUnroll.cpp which has this check in place already.

Updated according to latest comments:

Fix comments
Exclude unnecessary changes
Update iterators

Hi Evgeny,

Some comments inline. I hope to take a more detailed look into the actual logic of the patch soon.

Thanks,
Michael

lib/Transforms/Utils/LoopUnrollRuntime.cpp
175–176	No, I mean `for (Instruction &I : NewExit) { auto *PN = dyn_cast<PHINode>(&I); if (!PN) break; ...`. Personally, I find this easier to read than the version with explicit iterators. No need to change the other code, at least definitely not in this patch.
180–184	Sure, no complaints about the variable names:)
195–196	Right, I see now, thanks.
268–270	`CreateLoop` is a somewhat misleading name. Maybe `CreateRemainderLoop` or `CreateLoopForRemainder`? Also, some comments about `InsertTop` and `InsertBot` would be helpful here (I know we didn't have them before, but since you've already dug into the code, it's probably easy for you).
546–547	Maybe expand it just before the actual use?
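
The range-based form suggested for lines 175–176 relies on PHI nodes being grouped at the start of a basic block, so the loop can stop at the first non-PHI instruction. A self-contained sketch of that pattern, using hypothetical stand-in types (`ToyInst`, `ToyBlock`) rather than the real LLVM classes:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Instruction and llvm::BasicBlock.
struct ToyInst {
  bool IsPhi;
  std::string Name;
};
using ToyBlock = std::vector<ToyInst>;

// Collect the names of the leading PHI nodes of a block, stopping at the
// first non-PHI instruction (PHIs are assumed to be grouped at the top).
std::vector<std::string> leadingPhis(const ToyBlock &BB) {
  std::vector<std::string> Names;
  for (const ToyInst &I : BB) {
    if (!I.IsPhi)
      break; // Exit once all PHI nodes have been passed.
    Names.push_back(I.Name);
  }
  return Names;
}
```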

evstupac added inline comments. Mar 23 2016, 7:53 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
546–547	Since TripCount is BECount + 1, it is easier for the combiner to optimize them when they are close together. BECount was already expanded here before my changes. For the epilog remainder I don't need it. Also, once we add runtime unrolling with non-power-of-2 factors, we'll need BECount to compute the extra iterations (for the non-power-of-2 cases).
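
As a rough illustration of why BECount matters for non-power-of-2 factors: TripCount = BECount + 1 can wrap to 0 in unsigned arithmetic, which is harmless when Count is a power of 2 but gives the wrong remainder otherwise. The sketch below shows one overflow-safe way to compute the extra iteration count; `extraIterations` is a hypothetical helper, and this is not claimed to be the exact code in the patch.

```cpp
#include <cstdint>

// Number of leftover iterations, i.e. TripCount % Count, computed from
// the backedge-taken count BECount (TripCount == BECount + 1).
// Assumes Count != 0.
std::uint32_t extraIterations(std::uint32_t BECount, std::uint32_t Count) {
  // Power-of-2 Count: a wrapped BECount + 1 still yields the right low
  // bits, because 2^32 is a multiple of Count.
  if ((Count & (Count - 1)) == 0)
    return (BECount + 1) & (Count - 1);
  // Otherwise reduce first: BECount % Count < Count, so adding 1 cannot
  // overflow. The sum may equal Count, hence the second modulo.
  return ((BECount % Count) + 1) % Count;
}
```

With BECount at the 32-bit maximum and Count = 3, computing (BECount + 1) % Count directly would wrap to 0 before the modulo, while the reduced form keeps the true remainder.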

Updated according to latest comments.

flyingforyou added a subscriber: flyingforyou. Mar 23 2016, 8:41 PM

mzolotukhin added inline comments. Mar 28 2016, 3:07 PM

lib/Transforms/Utils/LoopUnroll.cpp
301	Trailing whitespace.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
546–547	I believe the patch for non-power-of-2 factors has already been committed. Could you please rebase this patch on ToT (that would probably resolve my concern about BECount)?

Patch rebased.

evstupac added inline comments. Mar 29 2016, 2:14 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
545–548	The other point here is that to expand BECount before actual use we'll need to recompute PreHeaderBR, as it is deleted at line 602.

mzolotukhin added inline comments. Mar 29 2016, 6:52 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
545–548	This makes sense now, thanks. I still feel uneasy about this function though, it's becoming kind of a spaghetti code. Would it be possible to refactor it a bit to get rid of multiple `if (UseEpilogRemainder)`? I think now the amount of code that is shared between these two cases isn't much bigger than the amount of code that is different. If we shuffle the code around (where possible), and factor out some common blocks to helper functions, we can keep only one `if (UseEpilogRemainder)`, or rather have two specialized functions.

Tried to minimize the UseEpilogRemainder condition blocks.
The change affects tests (which I have not updated yet). If you agree the change is a better solution, I'll update the tests.

evstupac added inline comments. Mar 31 2016, 2:19 AM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
545–548	Shared code requires a lot of parameters, so helper functions would add a lot of unnecessary parameter passing. I'm not sure this would simplify the code. The difference between epilog and prolog now comes from the different way the prolog is implemented. I can make the prolog implementation closer to the epilog one; however, this will affect the prolog source code. For example, the prolog implementation does not create PrologPreHeader explicitly, while the epilog one does create EpilogPreHeader.

Tests updated.
Comments at the UseEpilogRemainder if-then-else were merged to shorten the modified code.

zzheng added a subscriber: zzheng. Mar 31 2016, 5:00 PM

Thanks, I like it better now! It looks good to me (with one remark inline).

Michael

lib/Transforms/Utils/LoopUnrollRuntime.cpp
652	You could reuse existing `IRBuilder` (just use `SetInsertPoint` method).

This revision is now accepted and ready to land. Apr 1 2016, 3:42 PM

I've seen 'unrolling loop' in comments, personally I prefer 'unrolled loop' for clarity.

lib/Transforms/Utils/LoopUnroll.cpp
297	'prolog'? I recommend changing prolog/remainder here to 'leftover'
lib/Transforms/Utils/LoopUnrollRuntime.cpp
146	You mean 'unrolled loop', right?
191	PN should have 2 uses, right? One in the epilogue loop or its preheader and one in Exit. Please clarify that it only has 1 use for now, as we'll add the use in the epilogue later.

evstupac added inline comments. Apr 1 2016, 5:17 PM

lib/Transforms/Utils/LoopUnroll.cpp
297	Thanks for catching this. Indeed, it applies not only to the prolog but to all kinds of remainder loops. "leftover" is also good, but since I used "remainder" in all other places it is better to keep the same name here.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
146	Actually the loop is in the process of being unrolled. Here in "UnrollRuntimeLoopRemainder" we only prepare the loop for unrolling. The actual unrolling occurs in the UnrollLoop function. So I'd like to keep "unrolling" here.
191	It has one use after splitting. It will have 2 uses. Line 196 adds the incoming value from PreHeader. Lines 214-217 explain what it will look like. The loop structure is described in the comments at lines 162-172.

1 comment fix + SetInsertPoint instead of a new Builder, according to comments.

Closed by commit rL265388: Adds the ability to use an epilog remainder loop during loop unrolling and makes (authored by dlkreitz). Apr 5 2016, 5:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

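Path                                                            Size
include/llvm/Transforms/Utils/UnrollLoop.h                      9 lines
lib/Transforms/Utils/LoopUnroll.cpp                             12 lines
lib/Transforms/Utils/LoopUnrollRuntime.cpp                      398 lines
test/Transforms/LoopUnroll/AArch64/runtime-loop.ll              20 lines
test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll              32 lines
test/Transforms/LoopUnroll/X86/mmx.ll                           6 lines
test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll  2 lines
test/Transforms/LoopUnroll/runtime-loop.ll                      62 lines
test/Transforms/LoopUnroll/runtime-loop1.ll                     40 lines
test/Transforms/LoopUnroll/runtime-loop2.ll                     16 lines
test/Transforms/LoopUnroll/runtime-loop4.ll                     22 lines
test/Transforms/LoopUnroll/runtime-loop5.ll                     7 lines
test/Transforms/LoopUnroll/tripcount-overflow.ll                10 lines
test/Transforms/LoopUnroll/unroll-cleanup.ll                    8 lines
test/Transforms/LoopUnroll/unroll-pragmas.ll                    24 lines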

Diff 52300

include/llvm/Transforms/Utils/UnrollLoop.h

	Show All 28 Lines
	class Pass;			class Pass;
	class ScalarEvolution;			class ScalarEvolution;

	bool UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool AllowRuntime,			bool UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool AllowRuntime,
	bool AllowExpensiveTripCount, unsigned TripMultiple,			bool AllowExpensiveTripCount, unsigned TripMultiple,
	LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,			LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
	AssumptionCache *AC, bool PreserveLCSSA);			AssumptionCache *AC, bool PreserveLCSSA);

	bool UnrollRuntimeLoopProlog(Loop *L, unsigned Count,			bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
	bool AllowExpensiveTripCount, LoopInfo *LI,			bool AllowExpensiveTripCount,
				bool UseEpilogRemainder, LoopInfo *LI,
	ScalarEvolution *SE, DominatorTree *DT,			ScalarEvolution *SE, DominatorTree *DT,
	bool PreserveLCSSA);			bool PreserveLCSSA);

	MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);			MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);
	}			}

	#endif			#endif

lib/Transforms/Utils/LoopUnroll.cpp

Show All 38 Lines
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-unroll"		#define DEBUG_TYPE "loop-unroll"

// TODO: Should these be here or in LoopUnroll?		// TODO: Should these be here or in LoopUnroll?
STATISTIC(NumCompletelyUnrolled, "Number of loops completely unrolled");		STATISTIC(NumCompletelyUnrolled, "Number of loops completely unrolled");
STATISTIC(NumUnrolled, "Number of loops unrolled (completely or otherwise)");		STATISTIC(NumUnrolled, "Number of loops unrolled (completely or otherwise)");

		static cl::opt<bool>
		UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(true), cl::Hidden,
		cl::desc("Allow runtime unrolled loops to be unrolled "
		"with epilog instead of prolog."));

/// Convert the instruction operands from referencing the current values into		/// Convert the instruction operands from referencing the current values into
/// those specified by VMap.		/// those specified by VMap.
static inline void remapInstruction(Instruction *I,		static inline void remapInstruction(Instruction *I,
ValueToValueMapTy &VMap) {		ValueToValueMapTy &VMap) {
for (unsigned op = 0, E = I->getNumOperands(); op != E; ++op) {		for (unsigned op = 0, E = I->getNumOperands(); op != E; ++op) {
Value *Op = I->getOperand(op);		Value *Op = I->getOperand(op);
ValueToValueMapTy::iterator It = VMap.find(Op);		ValueToValueMapTy::iterator It = VMap.find(Op);
if (It != VMap.end())		if (It != VMap.end())
▲ Show 20 Lines • Show All 228 Lines • ▼ Show 20 Lines	DEBUG(
for (auto &I : *BB)		for (auto &I : *BB)
if (auto CS = CallSite(&I))		if (auto CS = CallSite(&I))
HasConvergent |= CS.isConvergent();		HasConvergent |= CS.isConvergent();
assert((!HasConvergent || TripMultiple % Count == 0) &&		assert((!HasConvergent || TripMultiple % Count == 0) &&
"Unroll count must divide trip multiple if loop contains a "		"Unroll count must divide trip multiple if loop contains a "
"convergent "		"convergent "
"operation.");		"operation.");
});		});
// Don't output the runtime loop prolog if Count is a multiple of		// Don't output the runtime loop remainder if Count is a multiple of
// TripMultiple. Such a prolog is never needed, and is unsafe if the loop		// TripMultiple. Such a prolog is never needed, and is unsafe if the loop
// contains a convergent instruction.		// contains a convergent instruction.
if (RuntimeTripCount && TripMultiple % Count != 0 &&		if (RuntimeTripCount && TripMultiple % Count != 0 &&
!UnrollRuntimeLoopProlog(L, Count, AllowExpensiveTripCount, LI, SE, DT,		!UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,
		UnrollRuntimeEpilog, LI, SE, DT,
PreserveLCSSA))		PreserveLCSSA))
return false;		return false;

// Notify ScalarEvolution that the loop will be substantially changed,		// Notify ScalarEvolution that the loop will be substantially changed,
// if not outright eliminated.		// if not outright eliminated.
if (SE)		if (SE)
SE->forgetLoop(L);		SE->forgetLoop(L);

// If we know the trip count, we know the multiple...		// If we know the trip count, we know the multiple...
▲ Show 20 Lines • Show All 389 Lines • Show Last 20 Lines

lib/Transforms/Utils/LoopUnrollRuntime.cpp

	Show All 10 Lines
	// trip counts. See LoopUnroll.cpp for unrolling loops with compile-time			// trip counts. See LoopUnroll.cpp for unrolling loops with compile-time
	// trip counts.			// trip counts.
	//			//
	// The functions in this file are used to generate extra code when the			// The functions in this file are used to generate extra code when the
	// run-time trip count modulo the unroll factor is not 0. When this is the			// run-time trip count modulo the unroll factor is not 0. When this is the
	// case, we need to generate code to execute these 'left over' iterations.			// case, we need to generate code to execute these 'left over' iterations.
	//			//
	// The current strategy generates an if-then-else sequence prior to the			// The current strategy generates an if-then-else sequence prior to the
	// unrolled loop to execute the 'left over' iterations. Other strategies			// unrolled loop to execute the 'left over' iterations before or after the
	// include generate a loop before or after the unrolled loop.			// unrolled loop.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/Transforms/Utils/UnrollLoop.h"			#include "llvm/Transforms/Utils/UnrollLoop.h"
	#include "llvm/ADT/Statistic.h"			#include "llvm/ADT/Statistic.h"
	#include "llvm/Analysis/AliasAnalysis.h"			#include "llvm/Analysis/AliasAnalysis.h"
	#include "llvm/Analysis/LoopIterator.h"			#include "llvm/Analysis/LoopIterator.h"
	#include "llvm/Analysis/LoopPass.h"			#include "llvm/Analysis/LoopPass.h"
	Show All 26 Lines
	/// - Create PHI nodes at prolog end block to combine values			/// - Create PHI nodes at prolog end block to combine values
	/// that exit the prolog code and jump around the prolog.			/// that exit the prolog code and jump around the prolog.
	/// - Add a PHI operand to a PHI node at the loop exit block			/// - Add a PHI operand to a PHI node at the loop exit block
	/// for values that exit the prolog and go around the loop.			/// for values that exit the prolog and go around the loop.
	/// - Branch around the original loop if the trip count is less			/// - Branch around the original loop if the trip count is less
	/// than the unroll factor.			/// than the unroll factor.
	///			///
	static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,			static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
	BasicBlock *LastPrologBB, BasicBlock *PrologEnd,			BasicBlock *PrologExit, BasicBlock *PreHeader,
	BasicBlock *OrigPH, BasicBlock *NewPH,			BasicBlock *NewPreHeader, ValueToValueMapTy &VMap,
	ValueToValueMapTy &VMap, DominatorTree *DT,			DominatorTree *DT, LoopInfo *LI, bool PreserveLCSSA) {
	LoopInfo *LI, bool PreserveLCSSA) {
	BasicBlock *Latch = L->getLoopLatch();			BasicBlock *Latch = L->getLoopLatch();
	assert(Latch && "Loop must have a latch");			assert(Latch && "Loop must have a latch");
				BasicBlock *PrologLatch = cast<BasicBlock>(VMap[Latch]);

	// Create a PHI node for each outgoing value from the original loop			// Create a PHI node for each outgoing value from the original loop
	// (which means it is an outgoing value from the prolog code too).			// (which means it is an outgoing value from the prolog code too).
	// The new PHI node is inserted in the prolog end basic block.			// The new PHI node is inserted in the prolog end basic block.
	// The new PHI name is added as an operand of a PHI node in either			// The new PHI node value is added as an operand of a PHI node in either
	// the loop header or the loop exit block.			// the loop header or the loop exit block.
	for (succ_iterator SBI = succ_begin(Latch), SBE = succ_end(Latch);			for (BasicBlock *Succ : successors(Latch)) {
	SBI != SBE; ++SBI) {			for (Instruction &BBI : *Succ) {
	for (BasicBlock::iterator BBI = (*SBI)->begin();			PHINode *PN = dyn_cast<PHINode>(&BBI);
	PHINode *PN = dyn_cast<PHINode>(BBI); ++BBI) {			// Exit when we passed all PHI nodes.
				if (!PN)
				break;
	// Add a new PHI node to the prolog end block and add the			// Add a new PHI node to the prolog end block and add the
	// appropriate incoming values.			// appropriate incoming values.
	PHINode *NewPN = PHINode::Create(PN->getType(), 2, PN->getName()+".unr",			PHINode *NewPN = PHINode::Create(PN->getType(), 2, PN->getName() + ".unr",
	PrologEnd->getTerminator());			PrologExit->getFirstNonPHI());
	// Adding a value to the new PHI node from the original loop preheader.			// Adding a value to the new PHI node from the original loop preheader.
	// This is the value that skips all the prolog code.			// This is the value that skips all the prolog code.
	if (L->contains(PN)) {			if (L->contains(PN)) {
	NewPN->addIncoming(PN->getIncomingValueForBlock(NewPH), OrigPH);			NewPN->addIncoming(PN->getIncomingValueForBlock(NewPreHeader),
				PreHeader);
	} else {			} else {
	NewPN->addIncoming(UndefValue::get(PN->getType()), OrigPH);			NewPN->addIncoming(UndefValue::get(PN->getType()), PreHeader);
	}			}

	Value *V = PN->getIncomingValueForBlock(Latch);			Value *V = PN->getIncomingValueForBlock(Latch);
	if (Instruction *I = dyn_cast<Instruction>(V)) {			if (Instruction *I = dyn_cast<Instruction>(V)) {
	if (L->contains(I)) {			if (L->contains(I)) {
	V = VMap[I];			V = VMap[I];
	}			}
	}			}
	// Adding a value to the new PHI node from the last prolog block			// Adding a value to the new PHI node from the last prolog block
	// that was created.			// that was created.
	NewPN->addIncoming(V, LastPrologBB);			NewPN->addIncoming(V, PrologLatch);

	// Update the existing PHI node operand with the value from the			// Update the existing PHI node operand with the value from the
	// new PHI node. How this is done depends on if the existing			// new PHI node. How this is done depends on if the existing
	// PHI node is in the original loop block, or the exit block.			// PHI node is in the original loop block, or the exit block.
	if (L->contains(PN)) {			if (L->contains(PN)) {
	PN->setIncomingValue(PN->getBasicBlockIndex(NewPH), NewPN);			PN->setIncomingValue(PN->getBasicBlockIndex(NewPreHeader), NewPN);
	} else {			} else {
	PN->addIncoming(NewPN, PrologEnd);			PN->addIncoming(NewPN, PrologExit);
	}			}
	}			}
	}			}

	// Create a branch around the original loop, which is taken if there are no			// Create a branch around the original loop, which is taken if there are no
	// iterations remaining to be executed after running the prologue.			// iterations remaining to be executed after running the prologue.
	Instruction *InsertPt = PrologEnd->getTerminator();			Instruction *InsertPt = PrologExit->getTerminator();
	IRBuilder<> B(InsertPt);			IRBuilder<> B(InsertPt);

	assert(Count != 0 && "nonsensical Count!");			assert(Count != 0 && "nonsensical Count!");

	// If BECount <u (Count - 1) then (BECount + 1) % Count == (BECount + 1)			// If BECount <u (Count - 1) then (BECount + 1) % Count == (BECount + 1)
	// This means %xtraiter is (BECount + 1) and all of the iterations of this			// This means %xtraiter is (BECount + 1) and all of the iterations of this
	// loop were executed by the prologue. Note that if BECount <u (Count - 1)			// loop were executed by the prologue. Note that if BECount <u (Count - 1)
	// then (BECount + 1) cannot unsigned-overflow.			// then (BECount + 1) cannot unsigned-overflow.
	Value *BrLoopExit =			Value *BrLoopExit =
	B.CreateICmpULT(BECount, ConstantInt::get(BECount->getType(), Count - 1));			B.CreateICmpULT(BECount, ConstantInt::get(BECount->getType(), Count - 1));
	BasicBlock *Exit = L->getUniqueExitBlock();			BasicBlock *Exit = L->getUniqueExitBlock();
	assert(Exit && "Loop must have a single exit block only");			assert(Exit && "Loop must have a single exit block only");
	// Split the exit to maintain loop canonicalization guarantees			// Split the exit to maintain loop canonicalization guarantees
	SmallVector<BasicBlock*, 4> Preds(pred_begin(Exit), pred_end(Exit));			SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
	SplitBlockPredecessors(Exit, Preds, ".unr-lcssa", DT, LI,			SplitBlockPredecessors(Exit, Preds, ".unr-lcssa", DT, LI,
	PreserveLCSSA);			PreserveLCSSA);
	// Add the branch to the exit block (around the unrolled loop)			// Add the branch to the exit block (around the unrolled loop)
	B.CreateCondBr(BrLoopExit, Exit, NewPH);			B.CreateCondBr(BrLoopExit, Exit, NewPreHeader);
				InsertPt->eraseFromParent();
				}

				/// Connect the unrolling epilog code to the original loop.
				/// The unrolling epilog code contains code to execute the
				/// 'extra' iterations if the run-time trip count modulo the
				/// unroll count is non-zero.
				///
				/// This function performs the following:
				/// - Update PHI nodes at the unrolling loop exit and epilog loop exit
				/// - Create PHI nodes at the unrolling loop exit to combine
				/// values that exit the unrolling loop code and jump around it.
				/// - Update PHI operands in the epilog loop by the new PHI nodes
				/// - Branch around the epilog loop if extra iters (ModVal) is zero.
				///
				static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
				BasicBlock *Exit, BasicBlock *PreHeader,
				BasicBlock *EpilogPreHeader, BasicBlock *NewPreHeader,
				ValueToValueMapTy &VMap, DominatorTree *DT,
				LoopInfo *LI, bool PreserveLCSSA) {
				BasicBlock *Latch = L->getLoopLatch();
				assert(Latch && "Loop must have a latch");
				BasicBlock *EpilogLatch = cast<BasicBlock>(VMap[Latch]);

				// Loop structure should be the following:
				//
				// PreHeader
				// NewPreHeader
				// Header
				// ...
				// Latch
				// NewExit (PN)
				// EpilogPreHeader
				// EpilogHeader
				// ...
				// EpilogLatch
				// Exit (EpilogPN)

				// Update PHI nodes at NewExit and Exit.
				for (Instruction &BBI : *NewExit) {
				PHINode *PN = dyn_cast<PHINode>(&BBI);
				// Exit when we passed all PHI nodes.
				if (!PN)
				break;
				// PN should be used in another PHI located in Exit block as
				// Exit was split by SplitBlockPredecessors into Exit and NewExit
				// Basicaly it should look like:
				// NewExit:
				// PN = PHI [I, Latch]
				// ...
				// Exit:
				// EpilogPN = PHI [PN, EpilogPreHeader]
				//
				// There is EpilogPreHeader incoming block instead of NewExit as
				// NewExit was spilt 1 more time to get EpilogPreHeader.
				assert(PN->hasOneUse() && "The phi should have 1 use");
				PHINode *EpilogPN = cast<PHINode> (PN->use_begin()->getUser());
				assert(EpilogPN->getParent() == Exit && "EpilogPN should be in Exit block");

				// Add incoming PreHeader from branch around the Loop
				PN->addIncoming(UndefValue::get(PN->getType()), PreHeader);

				Value *V = PN->getIncomingValueForBlock(Latch);
				Instruction *I = dyn_cast<Instruction>(V);
				if (I && L->contains(I))
				// If value comes from an instruction in the loop add VMap value.
				V = VMap[I];
				// For the instruction out of the loop, constant or undefined value
				// insert value itself.
				EpilogPN->addIncoming(V, EpilogLatch);

				assert(EpilogPN->getBasicBlockIndex(EpilogPreHeader) >= 0 &&
				"EpilogPN should have EpilogPreHeader incoming block");
				// Change EpilogPreHeader incoming block to NewExit.
				EpilogPN->setIncomingBlock(EpilogPN->getBasicBlockIndex(EpilogPreHeader),
				NewExit);
				// Now PHIs should look like:
				// NewExit:
				// PN = PHI [I, Latch], [undef, PreHeader]
				// ...
				// Exit:
				// EpilogPN = PHI [PN, NewExit], [VMap[I], EpilogLatch]
				}

				// Create PHI nodes at NewExit (from the unrolling loop Latch and PreHeader).
				mzolotukhin: You could do `for (BasicBlock *Succ : successors(Latch))`.
				evstupac: Will update.
				// Update corresponding PHI nodes in epilog loop.
				for (BasicBlock *Succ : successors(Latch)) {
				// Skip this as we already updated phis in exit blocks.
				if (!L->contains(Succ))
				continue;
				mzolotukhin: We can just check `if(L->contains(*SBI))` before the loop, right?
				evstupac: Correct. There is a similar place in LoopUnroll.cpp that already has this check.
				for (Instruction &BBI : *Succ) {
				PHINode *PN = dyn_cast<PHINode>(&BBI);
				// Exit when we passed all PHI nodes.
				if (!PN)
				break;
				// Add new PHI nodes to the loop exit block and update epilog
				// PHIs with the new PHI values.
				PHINode *NewPN = PHINode::Create(PN->getType(), 2, PN->getName() + ".unr",
				NewExit->getFirstNonPHI());
				// Adding a value to the new PHI node from the unrolling loop preheader.
				NewPN->addIncoming(PN->getIncomingValueForBlock(NewPreHeader), PreHeader);
				// Adding a value to the new PHI node from the unrolling loop latch.
				mzolotukhin: Nitpick: this looks wrapped before 80 characters.
				NewPN->addIncoming(PN->getIncomingValueForBlock(Latch), Latch);

				// Update the existing PHI node operand with the value from the new PHI
				// node. Corresponding instruction in epilog loop should be PHI.
				PHINode *VPN = cast<PHINode>(VMap[&BBI]);
				VPN->setIncomingValue(VPN->getBasicBlockIndex(EpilogPreHeader), NewPN);
				}
				}

				Instruction *InsertPt = NewExit->getTerminator();
				IRBuilder<> B(InsertPt);
				Value *BrLoopExit = B.CreateIsNotNull(ModVal);
				assert(Exit && "Loop must have a single exit block only");
				// Split the exit to maintain loop canonicalization guarantees
				SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
				SplitBlockPredecessors(Exit, Preds, ".epilog-lcssa", DT, LI,
				PreserveLCSSA);
				// Add the branch to the exit block (around the unrolling loop)
				B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit);
	InsertPt->eraseFromParent();			InsertPt->eraseFromParent();
	}			}

	/// Create a clone of the blocks in a loop and connect them together.			/// Create a clone of the blocks in a loop and connect them together.
	/// If UnrollProlog is true, loop structure will not be cloned, otherwise a new			/// If CreateRemainderLoop is false, loop structure will not be cloned,
	/// loop will be created including all cloned blocks, and the iterator of it			/// otherwise a new loop will be created including all cloned blocks, and the
	/// switches to count NewIter down to 0.			/// iterator of it switches to count NewIter down to 0.
				/// The cloned blocks should be inserted between InsertTop and InsertBot.
				/// If loop structure is cloned InsertTop should be new preheader, InsertBot
				/// new loop exit.
	///			///
	static void CloneLoopBlocks(Loop *L, Value *NewIter, const bool UnrollProlog,			static void CloneLoopBlocks(Loop *L, Value *NewIter,
				const bool CreateRemainderLoop,
				const bool UseEpilogRemainder,
				mzolotukhin: `CreateLoop` is a slightly misleading name. Maybe `CreateRemainderLoop` or `CreateLoopForRemainder`? Also, some comments about `InsertTop` and `InsertBot` would be helpful here (I know we didn't have them before, but since you've already dug into the code, it's probably easy for you).
	BasicBlock *InsertTop, BasicBlock *InsertBot,			BasicBlock *InsertTop, BasicBlock *InsertBot,
				BasicBlock *Preheader,
	std::vector<BasicBlock *> &NewBlocks,			std::vector<BasicBlock *> &NewBlocks,
	LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,			LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,
	LoopInfo *LI) {			LoopInfo *LI) {
	BasicBlock *Preheader = L->getLoopPreheader();			StringRef suffix = UseEpilogRemainder ? "epil" : "prol";
	BasicBlock *Header = L->getHeader();			BasicBlock *Header = L->getHeader();
	BasicBlock *Latch = L->getLoopLatch();			BasicBlock *Latch = L->getLoopLatch();
	Function *F = Header->getParent();			Function *F = Header->getParent();
	LoopBlocksDFS::RPOIterator BlockBegin = LoopBlocks.beginRPO();			LoopBlocksDFS::RPOIterator BlockBegin = LoopBlocks.beginRPO();
	LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();			LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();
	Loop *NewLoop = nullptr;			Loop *NewLoop = nullptr;
	Loop *ParentLoop = L->getParentLoop();			Loop *ParentLoop = L->getParentLoop();
	if (!UnrollProlog) {			if (CreateRemainderLoop) {
	NewLoop = new Loop();			NewLoop = new Loop();
	if (ParentLoop)			if (ParentLoop)
	ParentLoop->addChildLoop(NewLoop);			ParentLoop->addChildLoop(NewLoop);
	else			else
	LI->addTopLevelLoop(NewLoop);			LI->addTopLevelLoop(NewLoop);
	}			}

	// For each block in the original loop, create a new copy,			// For each block in the original loop, create a new copy,
	// and update the value map with the newly created values.			// and update the value map with the newly created values.
	for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {			for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {
	BasicBlock *NewBB = CloneBasicBlock(*BB, VMap, ".prol", F);			BasicBlock *NewBB = CloneBasicBlock(*BB, VMap, "." + suffix, F);
	NewBlocks.push_back(NewBB);			NewBlocks.push_back(NewBB);

	if (NewLoop)			if (NewLoop)
	NewLoop->addBasicBlockToLoop(NewBB, *LI);			NewLoop->addBasicBlockToLoop(NewBB, *LI);
	else if (ParentLoop)			else if (ParentLoop)
	ParentLoop->addBasicBlockToLoop(NewBB, *LI);			ParentLoop->addBasicBlockToLoop(NewBB, *LI);

	VMap[*BB] = NewBB;			VMap[*BB] = NewBB;
	if (Header == *BB) {			if (Header == *BB) {
	// For the first block, add a CFG connection to this newly			// For the first block, add a CFG connection to this newly
	// created block.			// created block.
	InsertTop->getTerminator()->setSuccessor(0, NewBB);			InsertTop->getTerminator()->setSuccessor(0, NewBB);
	}			}

	if (Latch == *BB) {			if (Latch == *BB) {
	// For the last block, if UnrollProlog is true, create a direct jump to			// For the last block, if CreateRemainderLoop is false, create a direct
	// InsertBot. If not, create a loop back to cloned head.			// jump to InsertBot. If not, create a loop back to cloned head.
	VMap.erase((*BB)->getTerminator());			VMap.erase((*BB)->getTerminator());
	BasicBlock *FirstLoopBB = cast<BasicBlock>(VMap[Header]);			BasicBlock *FirstLoopBB = cast<BasicBlock>(VMap[Header]);
	BranchInst *LatchBR = cast<BranchInst>(NewBB->getTerminator());			BranchInst *LatchBR = cast<BranchInst>(NewBB->getTerminator());
	IRBuilder<> Builder(LatchBR);			IRBuilder<> Builder(LatchBR);
	if (UnrollProlog) {			if (!CreateRemainderLoop) {
	Builder.CreateBr(InsertBot);			Builder.CreateBr(InsertBot);
	} else {			} else {
	PHINode *NewIdx = PHINode::Create(NewIter->getType(), 2, "prol.iter",			PHINode *NewIdx = PHINode::Create(NewIter->getType(), 2,
				suffix + ".iter",
	FirstLoopBB->getFirstNonPHI());			FirstLoopBB->getFirstNonPHI());
	Value *IdxSub =			Value *IdxSub =
	Builder.CreateSub(NewIdx, ConstantInt::get(NewIdx->getType(), 1),			Builder.CreateSub(NewIdx, ConstantInt::get(NewIdx->getType(), 1),
	NewIdx->getName() + ".sub");			NewIdx->getName() + ".sub");
	Value *IdxCmp =			Value *IdxCmp =
	Builder.CreateIsNotNull(IdxSub, NewIdx->getName() + ".cmp");			Builder.CreateIsNotNull(IdxSub, NewIdx->getName() + ".cmp");
	Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot);			Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot);
	NewIdx->addIncoming(NewIter, InsertTop);			NewIdx->addIncoming(NewIter, InsertTop);
	NewIdx->addIncoming(IdxSub, NewBB);			NewIdx->addIncoming(IdxSub, NewBB);
	}			}
	LatchBR->eraseFromParent();			LatchBR->eraseFromParent();
	}			}
	}			}

	// Change the incoming values to the ones defined in the preheader or			// Change the incoming values to the ones defined in the preheader or
	// cloned loop.			// cloned loop.
	for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {			for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
	PHINode *NewPHI = cast<PHINode>(VMap[&*I]);			PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
	if (UnrollProlog) {			if (!CreateRemainderLoop) {
				if (UseEpilogRemainder) {
				unsigned idx = NewPHI->getBasicBlockIndex(Preheader);
				NewPHI->setIncomingBlock(idx, InsertTop);
				NewPHI->removeIncomingValue(Latch, false);
				} else {
	VMap[&*I] = NewPHI->getIncomingValueForBlock(Preheader);			VMap[&*I] = NewPHI->getIncomingValueForBlock(Preheader);
	cast<BasicBlock>(VMap[Header])->getInstList().erase(NewPHI);			cast<BasicBlock>(VMap[Header])->getInstList().erase(NewPHI);
				}
	} else {			} else {
	unsigned idx = NewPHI->getBasicBlockIndex(Preheader);			unsigned idx = NewPHI->getBasicBlockIndex(Preheader);
	NewPHI->setIncomingBlock(idx, InsertTop);			NewPHI->setIncomingBlock(idx, InsertTop);
	BasicBlock *NewLatch = cast<BasicBlock>(VMap[Latch]);			BasicBlock *NewLatch = cast<BasicBlock>(VMap[Latch]);
	idx = NewPHI->getBasicBlockIndex(Latch);			idx = NewPHI->getBasicBlockIndex(Latch);
	Value *InVal = NewPHI->getIncomingValue(idx);			Value *InVal = NewPHI->getIncomingValue(idx);
	NewPHI->setIncomingBlock(idx, NewLatch);			NewPHI->setIncomingBlock(idx, NewLatch);
	if (VMap[InVal])			if (VMap[InVal])
	NewPHI->setIncomingValue(idx, VMap[InVal]);			NewPHI->setIncomingValue(idx, VMap[InVal]);
	}			}
	}			}
	if (NewLoop) {			if (NewLoop) {
	// Add unroll disable metadata to disable future unrolling for this loop.			// Add unroll disable metadata to disable future unrolling for this loop.
	SmallVector<Metadata *, 4> MDs;			SmallVector<Metadata *, 4> MDs;
	// Reserve first location for self reference to the LoopID metadata node.			// Reserve first location for self reference to the LoopID metadata node.
	MDs.push_back(nullptr);			MDs.push_back(nullptr);
	MDNode *LoopID = NewLoop->getLoopID();			MDNode *LoopID = NewLoop->getLoopID();
	if (LoopID) {			if (LoopID) {
	// First remove any existing loop unrolling metadata.			// First remove any existing loop unrolling metadata.
	for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {			for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
	bool IsUnrollMetadata = false;			bool IsUnrollMetadata = false;
	MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));			MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
	if (MD) {			if (MD) {
	const MDString *S = dyn_cast<MDString>(MD->getOperand(0));			const MDString *S = dyn_cast<MDString>(MD->getOperand(0));
	IsUnrollMetadata = S && S->getString().startswith("llvm.loop.unroll.");			IsUnrollMetadata = S && S->getString().startswith("llvm.loop.unroll.");
	}			}
				mzolotukhin: Unnecessary change.
	if (!IsUnrollMetadata)			if (!IsUnrollMetadata)
	MDs.push_back(LoopID->getOperand(i));			MDs.push_back(LoopID->getOperand(i));
	}			}
	}			}

	LLVMContext &Context = NewLoop->getHeader()->getContext();			LLVMContext &Context = NewLoop->getHeader()->getContext();
	SmallVector<Metadata *, 1> DisableOperands;			SmallVector<Metadata *, 1> DisableOperands;
	DisableOperands.push_back(MDString::get(Context, "llvm.loop.unroll.disable"));			DisableOperands.push_back(MDString::get(Context, "llvm.loop.unroll.disable"));
	MDNode *DisableNode = MDNode::get(Context, DisableOperands);			MDNode *DisableNode = MDNode::get(Context, DisableOperands);
	MDs.push_back(DisableNode);			MDs.push_back(DisableNode);

	MDNode *NewLoopID = MDNode::get(Context, MDs);			MDNode *NewLoopID = MDNode::get(Context, MDs);
	// Set operand 0 to refer to the loop id itself.			// Set operand 0 to refer to the loop id itself.
	NewLoopID->replaceOperandWith(0, NewLoopID);			NewLoopID->replaceOperandWith(0, NewLoopID);
	NewLoop->setLoopID(NewLoopID);			NewLoop->setLoopID(NewLoopID);
	}			}
	}			}

	/// Insert code in the prolog code when unrolling a loop with a			/// Insert code in the prolog/epilog code when unrolling a loop with a
	/// run-time trip-count.			/// run-time trip-count.
	///			///
	/// This method assumes that the loop unroll factor is total number			/// This method assumes that the loop unroll factor is total number
	/// of loop bodies in the loop after unrolling. (Some folks refer			/// of loop bodies in the loop after unrolling. (Some folks refer
	/// to the unroll factor as the number of extra copies added).			/// to the unroll factor as the number of extra copies added).
	/// We assume also that the loop unroll factor is a power-of-two. So, after			/// We assume also that the loop unroll factor is a power-of-two. So, after
	/// unrolling the loop, the number of loop bodies executed is 2,			/// unrolling the loop, the number of loop bodies executed is 2,
	/// 4, 8, etc. Note - LLVM converts the if-then-sequence to a switch			/// 4, 8, etc. Note - LLVM converts the if-then-sequence to a switch
	/// instruction in SimplifyCFG.cpp. Then, the backend decides how code for			/// instruction in SimplifyCFG.cpp. Then, the backend decides how code for
	/// the switch instruction is generated.			/// the switch instruction is generated.
	///			///
				/// *Prolog case*
	/// extraiters = tripcount % loopfactor			/// extraiters = tripcount % loopfactor
	/// if (extraiters == 0) jump Loop:			/// if (extraiters == 0) jump Loop:
	/// else jump Prol			/// else jump Prol
	/// Prol: LoopBody;			/// Prol: LoopBody;
	/// extraiters -= 1 // Omitted if unroll factor is 2.			/// extraiters -= 1 // Omitted if unroll factor is 2.
	/// if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.			/// if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
	/// if (tripcount < loopfactor) jump End			/// if (tripcount < loopfactor) jump End
	/// Loop:			/// Loop:
	/// ...			/// ...
	/// End:			/// End:
	///			///
	bool llvm::UnrollRuntimeLoopProlog(Loop *L, unsigned Count,			/// *Epilog case*
	bool AllowExpensiveTripCount, LoopInfo *LI,			/// extraiters = tripcount % loopfactor
	ScalarEvolution *SE, DominatorTree *DT,			/// if (extraiters == tripcount) jump LoopExit:
	bool PreserveLCSSA) {			/// unroll_iters = tripcount - extraiters
	// For now, only unroll loops that contain a single exit.			/// Loop: LoopBody; (executes unroll_iter times);
				/// unroll_iter -= 1
				/// if (unroll_iter != 0) jump Loop:
				/// LoopExit:
				/// if (extraiters == 0) jump EpilExit:
				/// Epil: LoopBody; (executes extraiters times)
				/// extraiters -= 1 // Omitted if unroll factor is 2.
				/// if (extraiters != 0) jump Epil: // Omitted if unroll factor is 2.
				/// EpilExit:

				bool llvm::UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
				bool AllowExpensiveTripCount,
				bool UseEpilogRemainder,
				LoopInfo *LI, ScalarEvolution *SE,
				DominatorTree *DT, bool PreserveLCSSA) {
				// for now, only unroll loops that contain a single exit
	if (!L->getExitingBlock())			if (!L->getExitingBlock())
	return false;			return false;

	// Make sure the loop is in canonical form, and there is a single			// Make sure the loop is in canonical form, and there is a single
	// exit block only.			// exit block only.
	if (!L->isLoopSimplifyForm() || !L->getUniqueExitBlock())			if (!L->isLoopSimplifyForm())
				return false;
				BasicBlock *Exit = L->getUniqueExitBlock(); // successor out of loop
				if (!Exit)
	return false;			return false;

	mzolotukhin: Unnecessary change.
	// Use Scalar Evolution to compute the trip count. This allows more loops to			// Use Scalar Evolution to compute the trip count. This allows more loops to
	// be unrolled than relying on induction var simplification.			// be unrolled than relying on induction var simplification.
	if (!SE)			if (!SE)
	return false;			return false;

	// Only unroll loops with a computable trip count, and the trip count needs			// Only unroll loops with a computable trip count, and the trip count needs
	// to be an int value (allowing a pointer type is a TODO item).			// to be an int value (allowing a pointer type is a TODO item).
	const SCEV *BECountSC = SE->getBackedgeTakenCount(L);			const SCEV *BECountSC = SE->getBackedgeTakenCount(L);
				mzolotukhin: Nitpick: why remove the dot?
	if (isa<SCEVCouldNotCompute>(BECountSC) ||			if (isa<SCEVCouldNotCompute>(BECountSC) ||
	!BECountSC->getType()->isIntegerTy())			!BECountSC->getType()->isIntegerTy())
	return false;			return false;

	unsigned BEWidth = cast<IntegerType>(BECountSC->getType())->getBitWidth();			unsigned BEWidth = cast<IntegerType>(BECountSC->getType())->getBitWidth();

	// Add 1 since the backedge count doesn't include the first loop iteration.			// Add 1 since the backedge count doesn't include the first loop iteration.
	const SCEV *TripCountSC =			const SCEV *TripCountSC =
	SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));			SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));
	if (isa<SCEVCouldNotCompute>(TripCountSC))			if (isa<SCEVCouldNotCompute>(TripCountSC))
	return false;			return false;

	BasicBlock *Header = L->getHeader();			BasicBlock *Header = L->getHeader();
	BasicBlock *PH = L->getLoopPreheader();			BasicBlock *PreHeader = L->getLoopPreheader();
	BranchInst *PreHeaderBR = cast<BranchInst>(PH->getTerminator());			BranchInst *PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
	const DataLayout &DL = Header->getModule()->getDataLayout();			const DataLayout &DL = Header->getModule()->getDataLayout();
	SCEVExpander Expander(*SE, DL, "loop-unroll");			SCEVExpander Expander(*SE, DL, "loop-unroll");
	if (!AllowExpensiveTripCount &&			if (!AllowExpensiveTripCount &&
	Expander.isHighCostExpansion(TripCountSC, L, PreHeaderBR))			Expander.isHighCostExpansion(TripCountSC, L, PreHeaderBR))
	return false;			return false;

	// This constraint lets us deal with an overflowing trip count easily; see the			// This constraint lets us deal with an overflowing trip count easily; see the
	// comment on ModVal below.			// comment on ModVal below.
	if (Log2_32(Count) > BEWidth)			if (Log2_32(Count) > BEWidth)
	return false;			return false;

	// If this loop is nested, then the loop unroller changes the code in the			// If this loop is nested, then the loop unroller changes the code in the
	// parent loop, so the Scalar Evolution pass needs to be run again.			// parent loop, so the Scalar Evolution pass needs to be run again.
	if (Loop *ParentLoop = L->getParentLoop())			if (Loop *ParentLoop = L->getParentLoop())
				mzolotukhin: Looks like an unnecessary change.
	SE->forgetLoop(ParentLoop);			SE->forgetLoop(ParentLoop);

	BasicBlock *Latch = L->getLoopLatch();			BasicBlock *Latch = L->getLoopLatch();
	// It helps to split the original preheader twice, one for the end of the
	// prolog code and one for a new loop preheader.
	BasicBlock *PEnd = SplitEdge(PH, Header, DT, LI);
	BasicBlock *NewPH = SplitBlock(PEnd, PEnd->getTerminator(), DT, LI);
	PreHeaderBR = cast<BranchInst>(PH->getTerminator());

				// Loop structure is the following:
				//
				// PreHeader
				// Header
				// ...
				// Latch
				// Exit

				BasicBlock *NewPreHeader;
				BasicBlock *NewExit = nullptr;
				BasicBlock *PrologExit = nullptr;
				BasicBlock *EpilogPreHeader = nullptr;
				BasicBlock *PrologPreHeader = nullptr;

				if (UseEpilogRemainder) {
				// If epilog remainder
				// Split PreHeader to insert a branch around loop for unrolling.
				NewPreHeader = SplitBlock(PreHeader, PreHeader->getTerminator(), DT, LI);
				NewPreHeader->setName(PreHeader->getName() + ".new");
				// Split Exit to create phi nodes from branch above.
				SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
				NewExit = SplitBlockPredecessors(Exit, Preds, ".unr-lcssa",
				DT, LI, PreserveLCSSA);
				// Split NewExit to insert epilog remainder loop.
				EpilogPreHeader = SplitBlock(NewExit, NewExit->getTerminator(), DT, LI);
				EpilogPreHeader->setName(Header->getName() + ".epil.preheader");
				} else {
				// If prolog remainder
				// Split the original preheader twice to insert prolog remainder loop
				PrologPreHeader = SplitEdge(PreHeader, Header, DT, LI);
				PrologPreHeader->setName(Header->getName() + ".prol.preheader");
				PrologExit = SplitBlock(PrologPreHeader, PrologPreHeader->getTerminator(),
				DT, LI);
				PrologExit->setName(Header->getName() + ".prol.loopexit");
				// Split PrologExit to get NewPreHeader.
				NewPreHeader = SplitBlock(PrologExit, PrologExit->getTerminator(), DT, LI);
				NewPreHeader->setName(PreHeader->getName() + ".new");
				}
				// Loop structure should be the following:
				// Epilog Prolog
				//
				// PreHeader PreHeader
				// NewPreHeader PrologPreHeader
				// Header *PrologExit
				// ... *NewPreHeader
				// Latch Header
				// *NewExit ...
				// *EpilogPreHeader Latch
				// Exit Exit

				// Calculate conditions for branch around loop for unrolling
				// in epilog case and around prolog remainder loop in prolog case.
	// Compute the number of extra iterations required, which is:			// Compute the number of extra iterations required, which is:
	// extra iterations = run-time trip count % (loop unroll factor + 1)			// extra iterations = run-time trip count % loop unroll factor
				PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
	Value *TripCount = Expander.expandCodeFor(TripCountSC, TripCountSC->getType(),			Value *TripCount = Expander.expandCodeFor(TripCountSC, TripCountSC->getType(),
	PreHeaderBR);			PreHeaderBR);
	Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),			Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),
	PreHeaderBR);			PreHeaderBR);
				mzolotukhin: Maybe expand it just before the actual use?
				evstupac: Since TripCount is BECount + 1, it is easier for the combiner to optimize them when they are close together. BECount was already expanded here before my changes. For the epilog remainder I don't need it. Also, when we add runtime unrolling with non-power-of-2 factors, we'll need BECount for the extra iterations (in the non-power-of-2 cases).
				mzolotukhin: I believe the patch for non-power-of-2 factors has already been committed. Could you please rebase this patch on ToT? (That would probably resolve my concern about BECount.)

	IRBuilder<> B(PreHeaderBR);			IRBuilder<> B(PreHeaderBR);
				evstupac: The other point here is that to expand BECount before its actual use we'll need to recompute PreHeaderBR, as it is deleted at line 602.
				mzolotukhin: This makes sense now, thanks. I still feel uneasy about this function though; it's becoming kind of spaghetti code. Would it be possible to refactor it a bit to get rid of the multiple `if (UseEpilogRemainder)` checks? I think the amount of code shared between these two cases is now not much bigger than the amount of code that differs. If we shuffle the code around (where possible) and factor out some common blocks into helper functions, we can keep only one `if (UseEpilogRemainder)`, or rather have two specialized functions.
				evstupac: The shared code requires a lot of parameters, so helper functions would add a lot of unnecessary parameter passing. I'm not sure this would simplify the code. The difference between epilog and prolog now comes from the way the prolog is implemented. I can make the prolog implementation closer to the epilog one, but this will affect the prolog source code. For example, the prolog implementation does not create PrologPreHeader explicitly, while the epilog one does create EpilogPreHeader.
	Value *ModVal;			Value *ModVal;
	// Calculate ModVal = (BECount + 1) % Count.			// Calculate ModVal = (BECount + 1) % Count.
	// Note that TripCount is BECount + 1.			// Note that TripCount is BECount + 1.
	if (isPowerOf2_32(Count)) {			if (isPowerOf2_32(Count)) {
				// When Count is a power of 2 we don't need BECount for the epilog case;
				// however, we'll need it for a branch around the unrolling loop in the prolog case.
	ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");			ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");
	// 1. There are no iterations to be run in the prologue loop.			// 1. There are no iterations to be run in the prolog/epilog loop.
	// OR			// OR
	// 2. The addition computing TripCount overflowed.			// 2. The addition computing TripCount overflowed.
	//			//
	// If (2) is true, we know that TripCount really is (1 << BEWidth) and so			// If (2) is true, we know that TripCount really is (1 << BEWidth) and so
	// the number of iterations that remain to be run in the original loop is a			// the number of iterations that remain to be run in the original loop is a
	// multiple Count == (1 << Log2(Count)) because Log2(Count) <= BEWidth (we			// multiple Count == (1 << Log2(Count)) because Log2(Count) <= BEWidth (we
	// explicitly check this above).			// explicitly check this above).
	} else {			} else {
	// As (BECount + 1) can potentially unsigned overflow we count			// As (BECount + 1) can potentially unsigned overflow we count
	// (BECount % Count) + 1 which is overflow safe as BECount % Count < Count.			// (BECount % Count) + 1 which is overflow safe as BECount % Count < Count.
	Value *ModValTmp = B.CreateURem(BECount,			Value *ModValTmp = B.CreateURem(BECount,
	ConstantInt::get(BECount->getType(),			ConstantInt::get(BECount->getType(),
	Count));			Count));
	Value *ModValAdd = B.CreateAdd(ModValTmp,			Value *ModValAdd = B.CreateAdd(ModValTmp,
	ConstantInt::get(ModValTmp->getType(), 1));			ConstantInt::get(ModValTmp->getType(), 1));
	// At that point (BECount % Count) + 1 could be equal to Count.			// At that point (BECount % Count) + 1 could be equal to Count.
	// To handle this case we need to take mod by Count one more time.			// To handle this case we need to take mod by Count one more time.
	ModVal = B.CreateURem(ModValAdd,			ModVal = B.CreateURem(ModValAdd,
	ConstantInt::get(BECount->getType(), Count),			ConstantInt::get(BECount->getType(), Count),
	"xtraiter");			"xtraiter");
	}			}
	Value *BranchVal = B.CreateIsNotNull(ModVal, "lcmp.mod");			Value *CmpOperand =
				UseEpilogRemainder ? TripCount :
	// Branch to either the extra iterations or the cloned/unrolled loop.			ConstantInt::get(TripCount->getType(), 0);
	// We will fix up the true branch label when adding loop body copies.			Value *BranchVal = B.CreateICmpNE(ModVal, CmpOperand, "lcmp.mod");
	B.CreateCondBr(BranchVal, PEnd, PEnd);			BasicBlock *FirstLoop = UseEpilogRemainder ? NewPreHeader : PrologPreHeader;
	assert(PreHeaderBR->isUnconditional() &&			BasicBlock *SecondLoop = UseEpilogRemainder ? NewExit : PrologExit;
	PreHeaderBR->getSuccessor(0) == PEnd &&			// Branch to either remainder (extra iterations) loop or unrolling loop.
	"CFG edges in Preheader are not correct");			B.CreateCondBr(BranchVal, FirstLoop, SecondLoop);
	PreHeaderBR->eraseFromParent();			PreHeaderBR->eraseFromParent();
	Function *F = Header->getParent();			Function *F = Header->getParent();
	// Get an ordered list of blocks in the loop to help with the ordering of the			// Get an ordered list of blocks in the loop to help with the ordering of the
	// cloned blocks in the prolog code.			// cloned blocks in the prolog/epilog code
	LoopBlocksDFS LoopBlocks(L);			LoopBlocksDFS LoopBlocks(L);
	LoopBlocks.perform(LI);			LoopBlocks.perform(LI);

	//			//
	// For each extra loop iteration, create a copy of the loop's basic blocks			// For each extra loop iteration, create a copy of the loop's basic blocks
	// and generate a condition that branches to the copy depending on the			// and generate a condition that branches to the copy depending on the
	// number of 'left over' iterations.			// number of 'left over' iterations.
	//			//
	std::vector<BasicBlock *> NewBlocks;			std::vector<BasicBlock *> NewBlocks;
	ValueToValueMapTy VMap;			ValueToValueMapTy VMap;

	bool UnrollPrologue = Count == 2;			// For unroll factor 2 the remainder loop will have 1 iteration.
				// Do not create 1 iteration loop.
				bool CreateRemainderLoop = (Count != 2);

	// Clone all the basic blocks in the loop. If Count is 2, we don't clone			// Clone all the basic blocks in the loop. If Count is 2, we don't clone
	// the loop, otherwise we create a cloned loop to execute the extra			// the loop, otherwise we create a cloned loop to execute the extra
	// iterations. This function adds the appropriate CFG connections.			// iterations. This function adds the appropriate CFG connections.
	CloneLoopBlocks(L, ModVal, UnrollPrologue, PH, PEnd, NewBlocks, LoopBlocks,			BasicBlock *InsertBot = UseEpilogRemainder ? Exit : PrologExit;
	VMap, LI);			BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;
				CloneLoopBlocks(L, ModVal, CreateRemainderLoop, UseEpilogRemainder, InsertTop,
				InsertBot, NewPreHeader, NewBlocks, LoopBlocks, VMap, LI);

				// Insert the cloned blocks into the function.
				F->getBasicBlockList().splice(InsertBot->getIterator(),
				F->getBasicBlockList(),
				NewBlocks[0]->getIterator(),
				F->end());

	// Insert the cloned blocks into the function just before the original loop.			// Loop structure should be the following:
	F->getBasicBlockList().splice(PEnd->getIterator(), F->getBasicBlockList(),			// Epilog Prolog
	NewBlocks[0]->getIterator(), F->end());			//
				// PreHeader PreHeader
				// NewPreHeader PrologPreHeader
				// Header PrologHeader
				// ... ...
				// Latch PrologLatch
				// NewExit PrologExit
				// EpilogPreHeader NewPreHeader
				// EpilogHeader Header
				// ... ...
				// EpilogLatch Latch
				// Exit Exit

	// Rewrite the cloned instruction operands to use the values created when the			// Rewrite the cloned instruction operands to use the values created when the
	// clone is created.			// clone is created.
	for (BasicBlock *BB : NewBlocks) {			for (BasicBlock *BB : NewBlocks) {
	for (Instruction &I : *BB) {			for (Instruction &I : *BB) {
	RemapInstruction(&I, VMap,			RemapInstruction(&I, VMap,
	RF_NoModuleLevelChanges | RF_IgnoreMissingEntries);			RF_NoModuleLevelChanges | RF_IgnoreMissingEntries);
	}			}
	}			}

	// Connect the prolog code to the original loop and update the			if (UseEpilogRemainder) {
				// Connect the epilog code to the original loop and update the
	// PHI functions.			// PHI functions.
	BasicBlock *LastLoopBB = cast<BasicBlock>(VMap[Latch]);			ConnectEpilog(L, ModVal, NewExit, Exit, PreHeader,
	ConnectProlog(L, BECount, Count, LastLoopBB, PEnd, PH, NewPH, VMap, DT, LI,			EpilogPreHeader, NewPreHeader, VMap, DT, LI,
	PreserveLCSSA);			PreserveLCSSA);

				// Update counter in loop for unrolling.
				// I should be a multiple of Count.
				IRBuilder<> B2(NewPreHeader->getTerminator());
				mzolotukhin: You could reuse existing `IRBuilder` (just use `SetInsertPoint` method).
				Value *TestVal = B2.CreateSub(TripCount, ModVal, "unroll_iter");
				BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());
				IRBuilder<> B3(LatchBR);
				PHINode *NewIdx = PHINode::Create(TestVal->getType(), 2, "niter",
				Header->getFirstNonPHI());
				Value *IdxSub =
				B3.CreateSub(NewIdx, ConstantInt::get(NewIdx->getType(), 1),
				NewIdx->getName() + ".nsub");
				Value *IdxCmp;
				if (LatchBR->getSuccessor(0) == Header)
				IdxCmp = B3.CreateIsNotNull(IdxSub, NewIdx->getName() + ".ncmp");
				else
				IdxCmp = B3.CreateIsNull(IdxSub, NewIdx->getName() + ".ncmp");
				NewIdx->addIncoming(TestVal, NewPreHeader);
				NewIdx->addIncoming(IdxSub, Latch);
				LatchBR->setCondition(IdxCmp);
				} else {
				// Connect the prolog code to the original loop and update the
				// PHI functions.
				ConnectProlog(L, BECount, Count, PrologExit, PreHeader, NewPreHeader,
				VMap, DT, LI, PreserveLCSSA);
				}
	NumRuntimeUnrolled++;			NumRuntimeUnrolled++;
	return true;			return true;
	}			}
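
The epilog branch above computes `unroll_iter = TripCount - ModVal` and drives the unrolled loop with a countdown (`niter`) instead of the original exit test. The following is a minimal source-level sketch of that shape, not LLVM code: the function name `sumWithEpilogRemainder` and the fixed unroll factor of 4 are assumptions made for illustration, and the countdown is decremented by `Count` once per unrolled pass, which is assumed to be equivalent to the per-copy decrement by 1 that the patch emits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration (not part of the patch): sum a vector with a main
// loop unrolled by Count = 4, followed by an epilog remainder loop.
std::uint32_t sumWithEpilogRemainder(const std::vector<std::uint32_t> &A) {
  constexpr std::uint32_t Count = 4; // unroll factor, assumed power of two
  static_assert((Count & (Count - 1)) == 0, "Count must be a power of two");

  const std::uint32_t TripCount = static_cast<std::uint32_t>(A.size());
  const std::uint32_t ModVal = TripCount & (Count - 1); // like %xtraiter
  std::uint32_t NIter = TripCount - ModVal;             // like %unroll_iter
  assert(NIter % Count == 0);

  std::uint32_t Sum = 0;
  std::uint32_t I = 0; // the unrolled loop still starts at the constant 0

  // Unrolled loop: the exit test is the countdown, not a compare against n.
  while (NIter != 0) {
    Sum += A[I] + A[I + 1] + A[I + 2] + A[I + 3];
    I += Count;
    NIter -= Count;
  }

  // Epilog remainder: only here does the induction variable start at a
  // value that is unknown at compile time.
  for (std::uint32_t E = ModVal; E != 0; --E)
    Sum += A[I++];

  return Sum;
}
```

With a prolog remainder the unknown start value would instead flow into the unrolled loop's induction variable, which is the cost the summary of this revision describes.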

test/Transforms/LoopUnroll/AArch64/runtime-loop.ll

	; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 \| FileCheck %s			; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 \| FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 -unroll-runtime-epilog=false \| FileCheck %s -check-prefix=PROLOG

				mzolotukhin: Where did `-mtriple aarch64 -mcpu=cortex-a57` go?
	; Tests for unrolling loops with run-time trip counts			; Tests for unrolling loops with run-time trip counts

	; CHECK: %xtraiter = and i32 %n			; EPILOG: %xtraiter = and i32 %n
	; CHECK: %lcmp.mod = icmp ne i32 %xtraiter, 0			; EPILOG: %lcmp.mod = icmp ne i32 %xtraiter, %n
	; CHECK: br i1 %lcmp.mod, label %for.body.prol, label %for.body.preheader.split			; EPILOG: br i1 %lcmp.mod, label %for.body.preheader.new, label %for.end.loopexit.unr-lcssa

				; PROLOG: %xtraiter = and i32 %n
				; PROLOG: %lcmp.mod = icmp ne i32 %xtraiter, 0
				; PROLOG: br i1 %lcmp.mod, label %for.body.prol.preheader, label %for.body.prol.loopexit

	; CHECK: for.body.prol:			; EPILOG: for.body:
	; CHECK: for.body:			; EPILOG: for.body.epil:

				; PROLOG: for.body.prol:
				; PROLOG: for.body:

	define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {			define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
	entry:			entry:
	%cmp1 = icmp eq i32 %n, 0			%cmp1 = icmp eq i32 %n, 0
	br i1 %cmp1, label %for.end, label %for.body			br i1 %cmp1, label %for.end, label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	Show All 15 Lines

test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll

	; RUN: opt < %s -S -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2 -loop-unroll \| FileCheck %s			; RUN: opt < %s -S -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2 -loop-unroll \| FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2 -loop-unroll -unroll-runtime-epilog=false \| FileCheck %s -check-prefix=PROLOG
	define void @unroll_opt_for_size() nounwind optsize {			define void @unroll_opt_for_size() nounwind optsize {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i32 [ 0, %entry ], [ %inc, %loop ]			%iv = phi i32 [ 0, %entry ], [ %inc, %loop ]
	%inc = add i32 %iv, 1			%inc = add i32 %iv, 1
	%exitcnd = icmp uge i32 %inc, 1024			%exitcnd = icmp uge i32 %inc, 1024
	br i1 %exitcnd, label %exit, label %loop			br i1 %exitcnd, label %exit, label %loop

	exit:			exit:
	ret void			ret void
	}			}

	; CHECK-LABEL: @unroll_opt_for_size			; EPILOG-LABEL: @unroll_opt_for_size
	; CHECK: add			; EPILOG: add
	; CHECK-NEXT: add			; EPILOG-NEXT: add
	; CHECK-NEXT: add			; EPILOG-NEXT: add
	; CHECK: icmp			; EPILOG: icmp

				; PROLOG-LABEL: @unroll_opt_for_size
				; PROLOG: add
				; PROLOG-NEXT: add
				; PROLOG-NEXT: add
				; PROLOG: icmp

	define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {			define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
	entry:			entry:
	%cmp1 = icmp eq i32 %n, 0			%cmp1 = icmp eq i32 %n, 0
	br i1 %cmp1, label %for.end, label %for.body			br i1 %cmp1, label %for.end, label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]			%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add nsw i32 %0, %sum.02			%add = add nsw i32 %0, %sum.02
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, %n			%exitcond = icmp eq i32 %lftr.wideiv, %n
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]			%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]
	ret i32 %sum.0.lcssa			ret i32 %sum.0.lcssa
	}			}

	; CHECK-LABEL: @test			; EPILOG-LABEL: @test
	; CHECK: for.body.prol{{.*}}:			; EPILOG: for.body:
	; CHECK: for.body:			; EPILOG: br i1 %niter.ncmp.7, label %for.end.loopexit{{.*}}, label %for.body
	; CHECK: br i1 %exitcond.7, label %for.end.loopexit{{.*}}, label %for.body			; EPILOG: for.body.epil{{.*}}:

				; PROLOG-LABEL: @test
				; PROLOG: for.body.prol{{.*}}:
				; PROLOG: for.body:
				; PROLOG: br i1 %exitcond.7, label %for.end.loopexit{{.*}}, label %for.body

test/Transforms/LoopUnroll/X86/mmx.ll

	; RUN: opt < %s -S -loop-unroll \| FileCheck %s			; RUN: opt < %s -S -loop-unroll \| FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define x86_mmx @f() #0 {			define x86_mmx @f() #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%phi = phi i32 [ 1, %entry ], [ %add, %for.body ]			%phi = phi i32 [ 1, %entry ], [ %add, %for.body ]
	%add = add i32 %phi, 1			%add = add i32 %phi, 1
	%cmp = icmp eq i32 %phi, 0			%cmp = icmp eq i32 %phi, 0
	br i1 %cmp, label %exit, label %for.body			br i1 %cmp, label %exit, label %for.body

	exit: ; preds = %for.body			exit: ; preds = %for.body
	%ret = phi x86_mmx [ undef, %for.body ]			%ret = phi x86_mmx [ undef, %for.body ]
	; CHECK: %[[ret_unr:.*]] = phi x86_mmx [ undef,			; CHECK: %[[ret_ph:.*]] = phi x86_mmx [ undef, %entry
	; CHECK: %[[ret_ph:.*]] = phi x86_mmx [ undef,			; CHECK: %[[ret_ph1:.*]] = phi x86_mmx [ undef,
	; CHECK: %[[ret:.]] = phi x86_mmx [ %[[ret_unr]], {{.}} ], [ %[[ret_ph]]			; CHECK: %[[ret:.]] = phi x86_mmx [ %[[ret_ph]], {{.}} ], [ %[[ret_ph1]],
	; CHECK: ret x86_mmx %[[ret]]			; CHECK: ret x86_mmx %[[ret]]
	ret x86_mmx %ret			ret x86_mmx %ret
	}			}

	attributes #0 = { "target-cpu"="x86-64" }			attributes #0 = { "target-cpu"="x86-64" }

test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll

	Show All 28 Lines
	;; exists in the code and we don't need to expand it once more.			;; exists in the code and we don't need to expand it once more.
	;; Thus, it shouldn't prevent us from unrolling the loop.			;; Thus, it shouldn't prevent us from unrolling the loop.

	define i32 @test2(i64* %loc, i64 %conv7) {			define i32 @test2(i64* %loc, i64 %conv7) {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK: udiv			; CHECK: udiv
	; CHECK: udiv			; CHECK: udiv
	; CHECK-NOT: udiv			; CHECK-NOT: udiv
	; CHECK-LABEL: for.body.prol			; CHECK-LABEL: for.body
				mzolotukhin: These `CHECK` lines were intended to check the code before the loop, so if we start doing remainder iterations in epilogue, then we should look for `for.body` rather than `for.body.epil`, as it will appear first. This doesn't matter much in the current testcase since there are no `udiv` in the loop, but still.
	entry:			entry:
	%rem0 = load i64, i64* %loc, align 8			%rem0 = load i64, i64* %loc, align 8
	%ExpensiveComputation = udiv i64 %rem0, 42 ; <<< Extra computations are added to the trip-count expression			%ExpensiveComputation = udiv i64 %rem0, 42 ; <<< Extra computations are added to the trip-count expression
	br label %bb1			br label %bb1
	bb1:			bb1:
	%div11 = udiv i64 %ExpensiveComputation, %conv7			%div11 = udiv i64 %ExpensiveComputation, %conv7
	%cmp.i38 = icmp ugt i64 %div11, 1			%cmp.i38 = icmp ugt i64 %div11, 1
	%div12 = select i1 %cmp.i38, i64 %div11, i64 1			%div12 = select i1 %cmp.i38, i64 %div11, i64 1
	Show All 16 Lines

test/Transforms/LoopUnroll/runtime-loop.ll

; RUN: opt < %s -S -loop-unroll -unroll-runtime=true \| FileCheck %s		; RUN: opt < %s -S -loop-unroll -unroll-runtime=true \| FileCheck %s -check-prefix=EPILOG
		; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-runtime-epilog=false \| FileCheck %s -check-prefix=PROLOG

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

; Tests for unrolling loops with run-time trip counts		; Tests for unrolling loops with run-time trip counts

; CHECK: %xtraiter = and i32 %n		; EPILOG: %xtraiter = and i32 %n
; CHECK: %lcmp.mod = icmp ne i32 %xtraiter, 0		; EPILOG: %lcmp.mod = icmp ne i32 %xtraiter, %n
; CHECK: br i1 %lcmp.mod, label %for.body.prol, label %for.body.preheader.split		; EPILOG: br i1 %lcmp.mod, label %for.body.preheader.new, label %for.end.loopexit.unr-lcssa

; CHECK: for.body.prol:		; PROLOG: %xtraiter = and i32 %n
; CHECK: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol ], [ 0, %for.body.preheader ]		; PROLOG: %lcmp.mod = icmp ne i32 %xtraiter, 0
; CHECK: %prol.iter.sub = sub i32 %prol.iter, 1		; PROLOG: br i1 %lcmp.mod, label %for.body.prol.preheader, label %for.body.prol.loopexit
; CHECK: %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
; CHECK: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.preheader.split, !llvm.loop !0		; EPILOG: for.body.epil:
		; EPILOG: %indvars.iv.epil = phi i64 [ %indvars.iv.next.epil, %for.body.epil ], [ %indvars.iv.unr, %for.body.epil.preheader ]
		; EPILOG: %epil.iter.sub = sub i32 %epil.iter, 1
		; EPILOG: %epil.iter.cmp = icmp ne i32 %epil.iter.sub, 0
		; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !llvm.loop !0

		; PROLOG: for.body.prol:
		; PROLOG: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol ], [ 0, %for.body.prol.preheader ]
		; PROLOG: %prol.iter.sub = sub i32 %prol.iter, 1
		; PROLOG: %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
		; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.prol.loopexit, !llvm.loop !0


define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {		define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
entry:		entry:
%cmp1 = icmp eq i32 %n, 0		%cmp1 = icmp eq i32 %n, 0
br i1 %cmp1, label %for.end, label %for.body		br i1 %cmp1, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]		%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
Show All 10 Lines	for.end: ; preds = %for.body, %entry
%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]		%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]
ret i32 %sum.0.lcssa		ret i32 %sum.0.lcssa
}		}


; Still try to completely unroll loops with compile-time trip counts		; Still try to completely unroll loops with compile-time trip counts
; even if the -unroll-runtime is specified		; even if the -unroll-runtime is specified

; CHECK: for.body:		; EPILOG: for.body:
; CHECK-NOT: for.body.prol:		; EPILOG-NOT: for.body.epil:

		; PROLOG: for.body:
		; PROLOG-NOT: for.body.prol:

define i32 @test1(i32* nocapture %a) nounwind uwtable readonly {		define i32 @test1(i32* nocapture %a) nounwind uwtable readonly {
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%sum.01 = phi i32 [ 0, %entry ], [ %add, %for.body ]		%sum.01 = phi i32 [ 0, %entry ], [ %add, %for.body ]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv		%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
%add = add nsw i32 %0, %sum.01		%add = add nsw i32 %0, %sum.01
%indvars.iv.next = add i64 %indvars.iv, 1		%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, 5		%exitcond = icmp eq i32 %lftr.wideiv, 5
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret i32 %add		ret i32 %add
}		}

; This is test 2007-05-09-UnknownTripCount.ll which can be unrolled now		; This is test 2007-05-09-UnknownTripCount.ll which can be unrolled now
; if the -unroll-runtime option is turned on		; if the -unroll-runtime option is turned on

; CHECK: bb72.2:		; EPILOG: bb72.2:
		; PROLOG: bb72.2:

define void @foo(i32 %trips) {		define void @foo(i32 %trips) {
entry:		entry:
br label %cond_true.outer		br label %cond_true.outer

cond_true.outer:		cond_true.outer:
%indvar1.ph = phi i32 [ 0, %entry ], [ %indvar.next2, %bb72 ]		%indvar1.ph = phi i32 [ 0, %entry ], [ %indvar.next2, %bb72 ]
br label %bb72		br label %bb72

bb72:		bb72:
%indvar.next2 = add i32 %indvar1.ph, 1		%indvar.next2 = add i32 %indvar1.ph, 1
%exitcond3 = icmp eq i32 %indvar.next2, %trips		%exitcond3 = icmp eq i32 %indvar.next2, %trips
br i1 %exitcond3, label %cond_true138, label %cond_true.outer		br i1 %exitcond3, label %cond_true138, label %cond_true.outer

cond_true138:		cond_true138:
ret void		ret void
}		}


; Test run-time unrolling for a loop that counts down by -2.		; Test run-time unrolling for a loop that counts down by -2.

; CHECK: for.body.prol:		; EPILOG: for.body.epil:
; CHECK: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.preheader.split		; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label %for.cond.for.end_crit_edge.epilog-lcssa

		; PROLOG: for.body.prol:
		; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.prol.loopexit

define zeroext i16 @down(i16* nocapture %p, i32 %len) nounwind uwtable readonly {		define zeroext i16 @down(i16* nocapture %p, i32 %len) nounwind uwtable readonly {
entry:		entry:
%cmp2 = icmp eq i32 %len, 0		%cmp2 = icmp eq i32 %len, 0
br i1 %cmp2, label %for.end, label %for.body		br i1 %cmp2, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%p.addr.05 = phi i16* [ %incdec.ptr, %for.body ], [ %p, %entry ]		%p.addr.05 = phi i16* [ %incdec.ptr, %for.body ], [ %p, %entry ]
Show All 12 Lines	for.cond.for.end_crit_edge: ; preds = %for.body
br label %for.end		br label %for.end

for.end: ; preds = %for.cond.for.end_crit_edge, %entry		for.end: ; preds = %for.cond.for.end_crit_edge, %entry
%res.0.lcssa = phi i16 [ %phitmp, %for.cond.for.end_crit_edge ], [ 0, %entry ]		%res.0.lcssa = phi i16 [ %phitmp, %for.cond.for.end_crit_edge ], [ 0, %entry ]
ret i16 %res.0.lcssa		ret i16 %res.0.lcssa
}		}

; Test run-time unrolling disable metadata.		; Test run-time unrolling disable metadata.
; CHECK: for.body:		; EPILOG: for.body:
; CHECK-NOT: for.body.prol:		; EPILOG-NOT: for.body.epil:

		; PROLOG: for.body:
		; PROLOG-NOT: for.body.prol:

define zeroext i16 @test2(i16* nocapture %p, i32 %len) nounwind uwtable readonly {		define zeroext i16 @test2(i16* nocapture %p, i32 %len) nounwind uwtable readonly {
entry:		entry:
%cmp2 = icmp eq i32 %len, 0		%cmp2 = icmp eq i32 %len, 0
br i1 %cmp2, label %for.end, label %for.body		br i1 %cmp2, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%p.addr.05 = phi i16* [ %incdec.ptr, %for.body ], [ %p, %entry ]		%p.addr.05 = phi i16* [ %incdec.ptr, %for.body ], [ %p, %entry ]
Show All 14 Lines
for.end: ; preds = %for.cond.for.end_crit_edge, %entry		for.end: ; preds = %for.cond.for.end_crit_edge, %entry
%res.0.lcssa = phi i16 [ %phitmp, %for.cond.for.end_crit_edge ], [ 0, %entry ]		%res.0.lcssa = phi i16 [ %phitmp, %for.cond.for.end_crit_edge ], [ 0, %entry ]
ret i16 %res.0.lcssa		ret i16 %res.0.lcssa
}		}

!0 = distinct !{!0, !1}		!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.unroll.runtime.disable"}		!1 = !{!"llvm.loop.unroll.runtime.disable"}

; CHECK: !0 = distinct !{!0, !1}		; EPILOG: !0 = distinct !{!0, !1}
; CHECK: !1 = !{!"llvm.loop.unroll.disable"}		; EPILOG: !1 = !{!"llvm.loop.unroll.disable"}

		; PROLOG: !0 = distinct !{!0, !1}
		; PROLOG: !1 = !{!"llvm.loop.unroll.disable"}

test/Transforms/LoopUnroll/runtime-loop1.ll

	; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-count=2 \| FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-count=2 \| FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-count=2 -unroll-runtime-epilog=false \| FileCheck %s -check-prefix=PROLOG

	; This tests that setting the unroll count works			; This tests that setting the unroll count works

	; CHECK: for.body.preheader:
	; CHECK: br {{.*}} label %for.body.prol, label %for.body.preheader.split, !dbg [[PH_LOC:![0-9]+]]
	; CHECK: for.body.prol:
	; CHECK: br label %for.body.preheader.split, !dbg [[BODY_LOC:![0-9]+]]
	; CHECK: for.body.preheader.split:
	; CHECK: br {{.*}} label %for.end.loopexit, label %for.body.preheader.split.split, !dbg [[PH_LOC]]
	; CHECK: for.body:
	; CHECK: br i1 %exitcond.1, label %for.end.loopexit.unr-lcssa, label %for.body, !dbg [[BODY_LOC]]
	; CHECK-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label %for.body

	; CHECK-DAG: [[PH_LOC]] = !DILocation(line: 101, column: 1, scope: !{{.*}})			; EPILOG: for.body.preheader:
	; CHECK-DAG: [[BODY_LOC]] = !DILocation(line: 102, column: 1, scope: !{{.*}})			; EPILOG: br i1 %lcmp.mod, label %for.body.preheader.new, label %for.end.loopexit.unr-lcssa, !dbg [[PH_LOC:![0-9]+]]
				; EPILOG: for.body:
				; EPILOG: br i1 %niter.ncmp.1, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !dbg [[BODY_LOC:![0-9]+]]
				; EPILOG-NOT: br i1 %niter.ncmp.2, label %for.end.loopexit{{.*}}, label %for.body
				; EPILOG: for.body.epil.preheader:
				; EPILOG: br label %for.body.epil, !dbg [[EXIT_LOC:![0-9]+]]
				; EPILOG: for.body.epil:
				; EPILOG: br label %for.end.loopexit.epilog-lcssa, !dbg [[BODY_LOC:![0-9]+]]

				; EPILOG-DAG: [[PH_LOC]] = !DILocation(line: 101, column: 1, scope: !{{.*}})
				; EPILOG-DAG: [[BODY_LOC]] = !DILocation(line: 102, column: 1, scope: !{{.*}})
				; EPILOG-DAG: [[EXIT_LOC]] = !DILocation(line: 103, column: 1, scope: !{{.*}})

				; PROLOG: for.body.preheader:
				; PROLOG: br {{.*}} label %for.body.prol.preheader, label %for.body.prol.loopexit, !dbg [[PH_LOC:![0-9]+]]
				; PROLOG: for.body.prol:
				; PROLOG: br label %for.body.prol.loopexit, !dbg [[BODY_LOC:![0-9]+]]
				; PROLOG: for.body.prol.loopexit:
				; PROLOG: br {{.*}} label %for.end.loopexit, label %for.body.preheader.new, !dbg [[PH_LOC]]
				; PROLOG: for.body:
				; PROLOG: br i1 %exitcond.1, label %for.end.loopexit.unr-lcssa, label %for.body, !dbg [[BODY_LOC]]
				; PROLOG-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label %for.body

				; PROLOG-DAG: [[PH_LOC]] = !DILocation(line: 101, column: 1, scope: !{{.*}})
				; PROLOG-DAG: [[BODY_LOC]] = !DILocation(line: 102, column: 1, scope: !{{.*}})

	define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly !dbg !6 {			define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly !dbg !6 {
	entry:			entry:
	%cmp1 = icmp eq i32 %n, 0, !dbg !7			%cmp1 = icmp eq i32 %n, 0, !dbg !7
	br i1 %cmp1, label %for.end, label %for.body, !dbg !7			br i1 %cmp1, label %for.end, label %for.body, !dbg !7

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	Show All 27 Lines

test/Transforms/LoopUnroll/runtime-loop2.ll

	; RUN: opt < %s -S -loop-unroll -unroll-threshold=25 -unroll-runtime -unroll-count=8 \| FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-threshold=25 -unroll-runtime -unroll-count=8 \| FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -loop-unroll -unroll-threshold=25 -unroll-runtime -unroll-runtime-epilog=false \| FileCheck %s -check-prefix=PROLOG

	; Choose a smaller, power-of-two, unroll count if the loop is too large.			; Choose a smaller, power-of-two, unroll count if the loop is too large.
	; This test makes sure we're not unrolling 'odd' counts			; This test makes sure we're not unrolling 'odd' counts

	; CHECK: for.body.prol:			; EPILOG: for.body:
	; CHECK: for.body:			; EPILOG: br i1 %niter.ncmp.3, label %for.end.loopexit.unr-lcssa.loopexit{{.*}}, label %for.body
	; CHECK: br i1 %exitcond.3, label %for.end.loopexit{{.*}}, label %for.body			; EPILOG-NOT: br i1 %niter.ncmp.4, label %for.end.loopexit.unr-lcssa.loopexit{{.*}}, label %for.body
	; CHECK-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label %for.body			; EPILOG: for.body.epil:

				; PROLOG: for.body.prol:
				; PROLOG: for.body:
				; PROLOG: br i1 %exitcond.3, label %for.end.loopexit{{.*}}, label %for.body
				; PROLOG-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label %for.body

	define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {			define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
	entry:			entry:
	%cmp1 = icmp eq i32 %n, 0			%cmp1 = icmp eq i32 %n, 0
	br i1 %cmp1, label %for.end, label %for.body			br i1 %cmp1, label %for.end, label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	Show All 13 Lines

test/Transforms/LoopUnroll/runtime-loop4.ll

	; RUN: opt < %s -S -O2 -unroll-runtime=true \| FileCheck %s			; RUN: opt < %s -S -O2 -unroll-runtime=true \| FileCheck %s -check-prefix=EPILOG
				; RUN: opt < %s -S -O2 -unroll-runtime=true -unroll-runtime-epilog=false \| FileCheck %s -check-prefix=PROLOG

	; Check runtime unrolling prologue can be promoted by LICM pass.			; Check runtime unrolling prologue can be promoted by LICM pass.

	; CHECK: entry:			; EPILOG: entry:
	; CHECK: %xtraiter			; EPILOG: %xtraiter
	; CHECK: %lcmp.mod			; EPILOG: %lcmp.mod
	; CHECK: loop1:			; EPILOG: loop1:
	; CHECK: br i1 %lcmp.mod			; EPILOG: br i1 %lcmp.mod
	; CHECK: loop2.prol:			; EPILOG: loop2.epil:

				; PROLOG: entry:
				; PROLOG: %xtraiter
				; PROLOG: %lcmp.mod
				; PROLOG: loop1:
				; PROLOG: br i1 %lcmp.mod
				; PROLOG: loop2.prol:

	define void @unroll(i32 %iter, i32* %addr1, i32* %addr2) nounwind {			define void @unroll(i32 %iter, i32* %addr1, i32* %addr2) nounwind {
	entry:			entry:
	br label %loop1			br label %loop1

	loop1:			loop1:
	%iv1 = phi i32 [ 0, %entry ], [ %inc1, %loop1.latch ]			%iv1 = phi i32 [ 0, %entry ], [ %inc1, %loop1.latch ]
	%offset1 = getelementptr i32, i32* %addr1, i32 %iv1			%offset1 = getelementptr i32, i32* %addr1, i32 %iv1
	Show All 26 Lines

test/Transforms/LoopUnroll/runtime-loop5.ll

; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-count=16 \| FileCheck --check-prefix=UNROLL-16 %s		; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-count=16 \| FileCheck --check-prefix=UNROLL-16 %s
; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-count=4 \| FileCheck --check-prefix=UNROLL-4 %s		; RUN: opt < %s -S -loop-unroll -unroll-runtime=true -unroll-count=4 \| FileCheck --check-prefix=UNROLL-4 %s

; Given that the trip-count of this loop is a 3-bit value, we cannot		; Given that the trip-count of this loop is a 3-bit value, we cannot
; safely unroll it with a count of anything more than 8.		; safely unroll it with a count of anything more than 8.

define i3 @test(i3* %a, i3 %n) {		define i3 @test(i3* %a, i3 %n) {
; UNROLL-16-LABEL: @test(		; UNROLL-16-LABEL: @test(
; UNROLL-4-LABEL: @test(		; UNROLL-4-LABEL: @test(
entry:		entry:
%cmp1 = icmp eq i3 %n, 0		%cmp1 = icmp eq i3 %n, 0
br i1 %cmp1, label %for.end, label %for.body		br i1 %cmp1, label %for.end, label %for.body

; UNROLL-16-NOT: for.body.prol:
; UNROLL-4: for.body.prol:

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
; UNROLL-16-LABEL: for.body:		; UNROLL-16-LABEL: for.body:
; UNROLL-4-LABEL: for.body:		; UNROLL-4-LABEL: for.body:
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]		%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
%sum.02 = phi i3 [ %add, %for.body ], [ 0, %entry ]		%sum.02 = phi i3 [ %add, %for.body ], [ 0, %entry ]
%arrayidx = getelementptr inbounds i3, i3* %a, i64 %indvars.iv		%arrayidx = getelementptr inbounds i3, i3* %a, i64 %indvars.iv

; UNROLL-16-LABEL: for.body		; UNROLL-16-LABEL: for.body
Show All 9 Lines	; UNROLL-4-LABEL: getelementptr
%add = add nsw i3 %0, %sum.02		%add = add nsw i3 %0, %sum.02
%indvars.iv.next = add i64 %indvars.iv, 1		%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i3		%lftr.wideiv = trunc i64 %indvars.iv.next to i3
%exitcond = icmp eq i3 %lftr.wideiv, %n		%exitcond = icmp eq i3 %lftr.wideiv, %n
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

; UNROLL-16-LABEL: for.end		; UNROLL-16-LABEL: for.end
; UNROLL-4-LABEL: for.end		; UNROLL-4-LABEL: for.end

		; UNROLL-16-NOT: for.body.epil:
		; UNROLL-4: for.body.epil:

for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
%sum.0.lcssa = phi i3 [ 0, %entry ], [ %add, %for.body ]		%sum.0.lcssa = phi i3 [ 0, %entry ], [ %add, %for.body ]
ret i3 %sum.0.lcssa		ret i3 %sum.0.lcssa
}		}

test/Transforms/LoopUnroll/tripcount-overflow.ll

	; RUN: opt < %s -S -unroll-runtime -unroll-count=2 -loop-unroll \| FileCheck %s			; RUN: opt < %s -S -unroll-runtime -unroll-count=2 -loop-unroll \| FileCheck %s
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	; This test case documents how runtime loop unrolling handles the case			; This test case documents how runtime loop unrolling handles the case
	; when the backedge-count is -1.			; when the backedge-count is -1.

	; If %N, the backedge-taken count, is -1 then %0 unsigned-overflows			; If %N, the backedge-taken count, is -1 then %0 unsigned-overflows
	; and is 0. %xtraiter too is 0, signifying that the total trip-count			; and is 0. %xtraiter too is 0, signifying that the total trip-count
	; is divisible by 2. The prologue then branches to the unrolled loop			; is divisible by 2. The prologue then branches to the unrolled loop
	; and executes the 2^32 iterations there, in groups of 2.			; and executes the 2^32 iterations there, in groups of 2.


	; CHECK: entry:			; CHECK: entry:
	; CHECK-NEXT: %0 = add i32 %N, 1			; CHECK-NEXT: %0 = add i32 %N, 1
	; CHECK-NEXT: %xtraiter = and i32 %0, 1			; CHECK-NEXT: %xtraiter = and i32 %0, 1
	; CHECK-NEXT: %lcmp.mod = icmp ne i32 %xtraiter, 0			; CHECK-NEXT: %lcmp.mod = icmp ne i32 %xtraiter, %0
	; CHECK-NEXT: br i1 %lcmp.mod, label %while.body.prol, label %entry.split			; CHECK-NEXT: br i1 %lcmp.mod, label %entry.new, label %while.end.unr-lcssa

	; CHECK: while.body.prol:			; CHECK: while.body.epil:
	; CHECK: br label %entry.split			; CHECK: br label %while.end.epilog-lcssa

	; CHECK: entry.split:			; CHECK: while.end.epilog-lcssa:

	; Function Attrs: nounwind readnone ssp uwtable			; Function Attrs: nounwind readnone ssp uwtable
	define i32 @foo(i32 %N) {			define i32 @foo(i32 %N) {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body: ; preds = %while.body, %entry			while.body: ; preds = %while.body, %entry
	%i = phi i32 [ 0, %entry ], [ %inc, %while.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %while.body ]
	%cmp = icmp eq i32 %i, %N			%cmp = icmp eq i32 %i, %N
	%inc = add i32 %i, 1			%inc = add i32 %i, 1
	br i1 %cmp, label %while.end, label %while.body			br i1 %cmp, label %while.end, label %while.body

	while.end: ; preds = %while.body			while.end: ; preds = %while.body
	ret i32 %i			ret i32 %i
	}			}
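
The comment at the top of this test describes the wrap-around case: when the backedge-taken count is -1, adding 1 overflows to 0 and the extra-iteration count is 0 as well. The arithmetic can be sketched in C++ with 32-bit unsigned values; the struct `RuntimeUnrollCounts` and the function `computeCounts` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from LLVM): the two values the test checks.
struct RuntimeUnrollCounts {
  std::uint32_t TripCount; // like %0 = add i32 %N, 1
  std::uint32_t ExtraIter; // like %xtraiter = and i32 %0, Count - 1
};

// Computes the trip count and the remainder iteration count for a
// power-of-two unroll factor. Unsigned addition wraps modulo 2^32, which
// models the overflow the test comment describes.
RuntimeUnrollCounts computeCounts(std::uint32_t BECount, std::uint32_t Count) {
  assert(Count != 0 && (Count & (Count - 1)) == 0 && "power of two expected");
  const std::uint32_t TripCount = BECount + 1u; // wraps to 0 if BECount is -1
  const std::uint32_t ExtraIter = TripCount & (Count - 1u);
  return {TripCount, ExtraIter};
}
```

For example, `computeCounts(0xFFFFFFFFu, 2)` yields a trip count of 0 and 0 extra iterations, while `computeCounts(4, 2)` yields a trip count of 5 with 1 extra iteration.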

test/Transforms/LoopUnroll/unroll-cleanup.ll

	; PR23524			; PR23524
	; The test is to check redundency produced by loop unroll pass			; The test is to check redundency produced by loop unroll pass
	; should be cleaned up by later pass.			; should be cleaned up by later pass.
	; RUN: opt < %s -O2 -S \| FileCheck %s			; RUN: opt < %s -O2 -S \| FileCheck %s

	; After loop unroll:			; After loop unroll:
	; %dec18 = add nsw i32 %dec18.in, -1			; %niter.nsub = add nsw i32 %niter, -1
	; ...			; ...
	; %dec18.1 = add nsw i32 %dec18, -1			; %niter.nsub.1 = add nsw i32 %niter.nsub, -1
	; should be merged to:			; should be merged to:
	; %dec18.1 = add nsw i32 %dec18.in, -2			; %dec18.1 = add nsw i32 %niter, -2
	;			;
	; CHECK-LABEL: @_Z3fn1v(			; CHECK-LABEL: @_Z3fn1v(
	; CHECK: %dec18.1 = add nsw i32 %dec18.in, -2			; CHECK: %niter.nsub.1 = add i32 %niter, -2

	; ModuleID = '<stdin>'			; ModuleID = '<stdin>'
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@b = global i32 0, align 4			@b = global i32 0, align 4
	@c = global i32 0, align 4			@c = global i32 0, align 4

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	▲ Show 20 Lines • Show All 63 Lines • Show Last 20 Lines

test/Transforms/LoopUnroll/unroll-pragmas.ll

	Show First 20 Lines • Show All 165 Lines • ▼ Show 20 Lines
	}			}
	!8 = !{!8, !4}			!8 = !{!8, !4}

	; #pragma clang loop unroll_count(4)			; #pragma clang loop unroll_count(4)
	; Loop has a runtime trip count. Runtime unrolling should occur and loop			; Loop has a runtime trip count. Runtime unrolling should occur and loop
	; should be duplicated (original and 4x unrolled).			; should be duplicated (original and 4x unrolled).
	;			;
	; CHECK-LABEL: @runtime_loop_with_count4(			; CHECK-LABEL: @runtime_loop_with_count4(
	; CHECK: for.body.prol:
	; CHECK: store
	; CHECK-NOT: store
	; CHECK: br i1
	; CHECK: for.body			; CHECK: for.body
	; CHECK: store			; CHECK: store
	; CHECK: store			; CHECK: store
	; CHECK: store			; CHECK: store
	; CHECK: store			; CHECK: store
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: br i1			; CHECK: br i1
				; CHECK: for.body.epil:
				; CHECK: store
				; CHECK-NOT: store
				; CHECK: br i1
	define void @runtime_loop_with_count4(i32* nocapture %a, i32 %b) {			define void @runtime_loop_with_count4(i32* nocapture %a, i32 %b) {
	entry:			entry:
	%cmp3 = icmp sgt i32 %b, 0			%cmp3 = icmp sgt i32 %b, 0
	br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !9			br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !9

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	[89 lines not shown]
	!13 = !{!13, !14}			!13 = !{!13, !14}
	!14 = !{!"llvm.loop.unroll.enable"}			!14 = !{!"llvm.loop.unroll.enable"}

	; #pragma clang loop unroll(enable)			; #pragma clang loop unroll(enable)
	; Loop has a runtime trip count and should be runtime unrolled and duplicated			; Loop has a runtime trip count and should be runtime unrolled and duplicated
	; (original and 8x).			; (original and 8x).
	;			;
	; CHECK-LABEL: @runtime_loop_with_enable(			; CHECK-LABEL: @runtime_loop_with_enable(
	; CHECK: for.body.prol:
	; CHECK: store
	; CHECK-NOT: store
	; CHECK: br i1
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK: store i32			; CHECK: store i32
	; CHECK-NOT: store i32			; CHECK-NOT: store i32
	; CHECK: br i1			; CHECK: br i1
				; CHECK: for.body.epil:
				; CHECK: store
				; CHECK-NOT: store
				; CHECK: br i1
	define void @runtime_loop_with_enable(i32* nocapture %a, i32 %b) {			define void @runtime_loop_with_enable(i32* nocapture %a, i32 %b) {
	entry:			entry:
	%cmp3 = icmp sgt i32 %b, 0			%cmp3 = icmp sgt i32 %b, 0
	br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !8			br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !8

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	[10 lines not shown]
	}			}
	!15 = !{!15, !14}			!15 = !{!15, !14}

	; #pragma clang loop unroll_count(3)			; #pragma clang loop unroll_count(3)
	; Loop has a runtime trip count. Runtime unrolling should occur and loop			; Loop has a runtime trip count. Runtime unrolling should occur and loop
	; should be duplicated (original and 3x unrolled).			; should be duplicated (original and 3x unrolled).
	;			;
	; CHECK-LABEL: @runtime_loop_with_count3(			; CHECK-LABEL: @runtime_loop_with_count3(
	; CHECK: for.body.prol:
	; CHECK: store
	; CHECK-NOT: store
	; CHECK: br i1
	; CHECK: for.body			; CHECK: for.body
	; CHECK: store			; CHECK: store
	; CHECK: store			; CHECK: store
	; CHECK: store			; CHECK: store
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: br i1			; CHECK: br i1
				; CHECK: for.body.epil:
				; CHECK: store
				; CHECK-NOT: store
				; CHECK: br i1
	define void @runtime_loop_with_count3(i32* nocapture %a, i32 %b) {			define void @runtime_loop_with_count3(i32* nocapture %a, i32 %b) {
	entry:			entry:
	%cmp3 = icmp sgt i32 %b, 0			%cmp3 = icmp sgt i32 %b, 0
	br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !16			br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !16

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	[13 lines not shown]
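The updated CHECK lines in unroll-pragmas.ll expect the unrolled `for.body` first (4, 8, or 3 stores followed by one branch) and then a `for.body.epil` block with a single store, instead of a `for.body.prol` block ahead of the main loop. A hand-written C++ analogue of the unroll-by-4 case might look like the sketch below. The names `fill_plain` and `fill_unrolled_epilog` and the stored value are assumptions for illustration only; the loop bodies of the actual tests are not shown in this diff view, and the pass emits IR with its own guards rather than source like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: the original loop, one store per iteration.
void fill_plain(std::vector<int>& a, std::size_t n) {
    assert(n <= a.size());
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<int>(i);
}

// Sketch of runtime unrolling by 4 with an epilogue remainder.
void fill_unrolled_epilog(std::vector<int>& a, std::size_t n) {
    assert(n <= a.size());
    const std::size_t rem = n % 4;         // iterations left for the epilogue
    const std::size_t main_end = n - rem;  // largest multiple of 4 not above n

    // Unrolled main loop ("for.body"): four stores, then one backedge branch.
    // The induction variable still starts at the constant 0.
    std::size_t i = 0;
    for (; i < main_end; i += 4) {
        a[i]     = static_cast<int>(i);
        a[i + 1] = static_cast<int>(i + 1);
        a[i + 2] = static_cast<int>(i + 2);
        a[i + 3] = static_cast<int>(i + 3);
    }

    // Epilogue remainder ("for.body.epil"): one store per iteration,
    // starting from wherever the main loop stopped.
    for (; i < n; ++i)
        a[i] = static_cast<int>(i);
}
```

In this form only the epilogue has a start index that depends on `n`; the unrolled loop keeps its constant start, which is the motivation given in the revision summary for preferring the epilogue over the prologue.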

Revision Contents

Diff 52300
