This is an archive of the discontinued LLVM Phabricator instance.

Differential D28073

Preserve domtree and loop-simplify for runtime unrolling.
ClosedPublic

Authored by efriedma on Dec 22 2016, 6:41 PM.

Details

Reviewers

chandlerc
mzolotukhin
tstellarAMD
mkuper
haicheng

Commits

rG0a2174533e17: Preserve domtree and loop-simplify for runtime unrolling.
rL292447: Preserve domtree and loop-simplify for runtime unrolling.

Summary

Mostly straightforward changes; we just didn't do the computation before. One sort of interesting change in LoopUnroll.cpp: we weren't handling dominance for children of the loop latch correctly, but foldBlockIntoPredecessor hid the problem for complete unrolling.

Currently punting on loop peeling; made some minor changes to isolate that problem to LoopUnrollPeel.cpp.

Adds a flag -unroll-verify-domtree; this is on by default for +Asserts builds.

Diff Detail

Repository: rL LLVM

Event Timeline

efriedma updated this revision to Diff 82393. Dec 22 2016, 6:41 PM

efriedma retitled this revision to "Preserve domtree and loop-simplify for runtime unrolling."

efriedma updated this object.

efriedma added reviewers: mkuper, haicheng, mzolotukhin, chandlerc.

efriedma set the repository for this revision to rL LLVM.

efriedma added a subscriber: llvm-commits.

Herald added a reviewer: tstellarAMD. Dec 22 2016, 6:41 PM

Herald added subscribers: nhaehnle, nemanjai, mehdi_amini.

mkuper added inline comments. Dec 27 2016, 11:33 AM

lib/Transforms/Utils/LoopUnroll.cpp
55	We want to enforce this being on in new tests too, I assume?
625	Isn't it the case that when the latch ends with an unconditional branch, that branch is towards the header block? If so, we should not have any children outside the loop.

efriedma added inline comments. Dec 27 2016, 11:40 AM

lib/Transforms/Utils/LoopUnroll.cpp
55	We probably want this on in new tests, yes, but I have no idea how we would enforce it.
625	Maybe this is more clear? "The latch is special because we can emit an unconditional branch in the unrolled loop even if the original latch block ends in a conditional branch."

mkuper added inline comments. Dec 27 2016, 1:18 PM

lib/Transforms/Utils/LoopUnroll.cpp
55	In code review, I meant. :-)
625	Ah, ok, this makes sense. This still looks a bit weird to me, though. Any chance you can give an example of when this ends up different than the nearest common dominator with Latches[0]? (Feel free to ignore me - I'm not sure I understand this enough to LGTM it anyway. :-) )

mzolotukhin added inline comments. Dec 28 2016, 12:30 PM

lib/Transforms/Utils/LoopUnroll.cpp
55	Can we make the default value `true` for debug builds (it will match the existing behavior)?
625	I'm also curious about an example. Also, this part is not clear to me: "Since the latch is always at the bottom of the loop, new dominator must also be a latch." Why can't a mid-block from the body be a dominator of a cloned latch?
639–642	Thanks for fixing this!
652–654	Do I understand it correctly that using `-verify-dom-info` wasn't sufficient because of `foldBlockIntoPredecessor`, and that's why we're now introducing `unroll-verify-domtree`?
705–706	Why can we remove this?

efriedma added inline comments. Jan 3 2017, 6:50 PM

lib/Transforms/Utils/LoopUnroll.cpp
55	Okay. (I initially didn't do this because it's O(N^2) in theory, but it turns out it doesn't really matter that much.)
625	This shows up, for example, if you completely unroll a single-block loop with a constant trip count: the immediate dominator of the exit is the last iteration of the loop, not the first iteration. (foldBlockIntoPredecessor hides this problem because the block in question gets folded away when a loop is completely unrolled.) Regarding "Why can't a mid-block from the body be a dominator of a cloned latch?": I'm not sure what you're asking here. ChildrenToUpdate is the set of loop exit blocks which are dominated by BB, so this code only touches the idom for blocks outside the loop. The latch's idom isn't relevant.
652–654	Yes... we want to verify at this point to try and catch as many mistakes as possible.
705–706	There are three possibilities for unrolling: we could be completely unrolling a loop, we could be runtime unrolling a loop, or we could be peeling a loop. If we're completely unrolling, these lines are a no-op. If we're runtime unrolling, we now correctly preserve LoopSimplify for L (and we don't break it for any loop outside of L). If we're peeling, we explicitly recreate LoopSimplify earlier.
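
The single-block example from the comment on line 625 can be checked on a toy CFG. The following is only a sketch, not LLVM code: `Graph`, `computeIdoms`, and the two helper functions are hypothetical names, and the dominator computation is the textbook iterative dataflow rather than LLVM's `DominatorTree`. It models a single-block loop completely unrolled three times (block 0 is the preheader, blocks 1 to 3 are the unrolled iterations, block 4 is the exit) and reports the exit's immediate dominator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG, not LLVM's DominatorTree: node 0 is the entry block and Preds[n]
// lists the predecessors of node n. Every node is assumed to be reachable.
using Graph = std::vector<std::vector<int>>;

// Computes the immediate dominator of every node with the textbook iterative
// dominator-set dataflow. Idom[0] is reported as 0 by convention.
std::vector<int> computeIdoms(const Graph &Preds) {
  assert(!Preds.empty() && "need at least an entry node");
  const std::size_t N = Preds.size();
  // Dom[B][D] is true when D dominates B; start from "everything dominates".
  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      // Intersect the dominator sets of all predecessors, then add B itself.
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  // The immediate dominator is the strict dominator that is dominated by all
  // other strict dominators, i.e. the one with the largest dominator set.
  std::vector<int> Idom(N, 0);
  for (std::size_t B = 1; B < N; ++B) {
    std::size_t Best = 0, BestSize = 0;
    for (std::size_t D = 0; D < N; ++D) {
      if (D == B || !Dom[B][D])
        continue;
      std::size_t Size = 0;
      for (bool Bit : Dom[D])
        Size += Bit;
      if (Size > BestSize) {
        BestSize = Size;
        Best = D;
      }
    }
    Idom[B] = static_cast<int>(Best);
  }
  return Idom;
}

// Constant trip count: iterations 1 and 2 end in unconditional branches, so
// only the last iteration (block 3) reaches the exit (block 4).
int exitIdomWhenOnlyLastLatchExits() {
  Graph Preds = {{}, {0}, {1}, {2}, {3}};
  return computeIdoms(Preds)[4];
}

// Every iteration keeps its conditional exit branch, so blocks 1, 2 and 3
// are all predecessors of the exit.
int exitIdomWhenEveryLatchExits() {
  Graph Preds = {{}, {0}, {1}, {2}, {1, 2, 3}};
  return computeIdoms(Preds)[4];
}
```

In the first CFG the exit's immediate dominator is the last iteration (block 3), which is the case the nearest common dominator with `Latches[0]` gets wrong. In the second it is the first iteration (block 1), where the old formula happens to agree.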

mzolotukhin added inline comments. Jan 6 2017, 2:58 PM

lib/Transforms/Utils/LoopUnroll.cpp
55	Thanks! Also, if you change it to `true`, then changes in the tests are no longer needed.
625	I've just understood what the problem is, thanks! Yes, it all makes sense now.
631–637	I think we should be able to tell which latch ends with a conditional. It should be either the first one (if all latches end with conditional branches), or the last one. Does it make sense? If so, can we remove this loop?
705–706	But isn't it also used for LCSSA? Maybe I'm missing something, but it's still not obvious to me that we can remove it.

efriedma added inline comments. Jan 6 2017, 3:12 PM

lib/Transforms/Utils/LoopUnroll.cpp
705–706	!CompletelyUnroll implies !NeedToFixLCSSA.

mzolotukhin added inline comments. Jan 6 2017, 3:19 PM

lib/Transforms/Utils/LoopUnroll.cpp
705–706	Ah, right!

efriedma added inline comments. Jan 9 2017, 4:10 PM

lib/Transforms/Utils/LoopUnroll.cpp
631–637	The actual conditional is complicated... if we're completely unrolling, it's either the first one or the last one, depending on PreserveCondBr. If we're using a remainder loop, it's always the last one. If we're partially unrolling without a remainder loop, it might not be the first or last one; there's a complicated conditional involving BreakingTrip and TripMultiple. Anyway, this is much more straightforward than trying to duplicate the NeedConditional logic.
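
The selection rule the patch uses instead of duplicating the NeedConditional logic (scan the latches, take the first one that ends in a conditional branch, otherwise the last one) can be sketched in isolation. This is a standalone illustration under stated assumptions: `LatchInfo` and `pickExitIdomLatch` are hypothetical names, and a plain flag stands in for inspecting the block's terminator.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cloned latch block; only the kind of
// terminator matters for this rule.
struct LatchInfo {
  bool EndsInConditionalBranch;
};

// Returns the index of the latch that becomes the new immediate dominator of
// exits previously dominated by the original latch: the first latch ending
// in a conditional branch, or the last latch if there is none.
// Precondition: Latches is non-empty.
std::size_t pickExitIdomLatch(const std::vector<LatchInfo> &Latches) {
  auto It = std::find_if(
      Latches.begin(), Latches.end(),
      [](const LatchInfo &L) { return L.EndsInConditionalBranch; });
  if (It == Latches.end())
    return Latches.size() - 1;
  return static_cast<std::size_t>(It - Latches.begin());
}
```

The scan is linear in the unroll count and does not need to know whether the loop is being completely unrolled, unrolled with a remainder loop, or partially unrolled, which is the point made in the comment above.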

Verify domtree by default in +Asserts mode. Clarify comments.

Herald added a subscriber: wdng. Jan 9 2017, 4:13 PM

Looks good to me, thanks!

Michael

lib/Transforms/Utils/LoopUnroll.cpp
631–637	Makes sense.

This revision is now accepted and ready to land. Jan 10 2017, 12:26 PM

mkuper mentioned this in D28676: Makes incremental dominator calculation in Loop Unroll pass. Jan 16 2017, 11:06 PM

Closed by commit rL292447: Preserve domtree and loop-simplify for runtime unrolling. (authored by efriedma). Jan 18 2017, 3:37 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Transforms/Utils/UnrollLoop.h	2 lines
lib/Transforms/Utils/LoopUnroll.cpp	56 lines
lib/Transforms/Utils/LoopUnrollPeel.cpp	15 lines
lib/Transforms/Utils/LoopUnrollRuntime.cpp	35 lines

Diff 83730

include/llvm/Transforms/Utils/UnrollLoop.h

[... 44 lines elided ...]	bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
bool UseEpilogRemainder, LoopInfo *LI,		bool UseEpilogRemainder, LoopInfo *LI,
ScalarEvolution *SE, DominatorTree *DT,		ScalarEvolution *SE, DominatorTree *DT,
bool PreserveLCSSA);		bool PreserveLCSSA);

void computePeelCount(Loop *L, unsigned LoopSize,		void computePeelCount(Loop *L, unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP);		TargetTransformInfo::UnrollingPreferences &UP);

bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,		bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,
DominatorTree *DT, bool PreserveLCSSA);		DominatorTree *DT, AssumptionCache *AC, bool PreserveLCSSA);

MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);		MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);
}		}

#endif		#endif

lib/Transforms/Utils/LoopUnroll.cpp

[... 45 lines elided ...]
STATISTIC(NumCompletelyUnrolled, "Number of loops completely unrolled");		STATISTIC(NumCompletelyUnrolled, "Number of loops completely unrolled");
STATISTIC(NumUnrolled, "Number of loops unrolled (completely or otherwise)");		STATISTIC(NumUnrolled, "Number of loops unrolled (completely or otherwise)");

static cl::opt<bool>		static cl::opt<bool>
UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,		UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,
cl::desc("Allow runtime unrolled loops to be unrolled "		cl::desc("Allow runtime unrolled loops to be unrolled "
"with epilog instead of prolog."));		"with epilog instead of prolog."));

		static cl::opt<bool>
		UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
		cl::desc("Verify domtree after unrolling"),
		#ifdef NDEBUG
		cl::init(false)
		#else
		cl::init(true)
		#endif
		);

/// Convert the instruction operands from referencing the current values into		/// Convert the instruction operands from referencing the current values into
/// those specified by VMap.		/// those specified by VMap.
static inline void remapInstruction(Instruction *I,		static inline void remapInstruction(Instruction *I,
ValueToValueMapTy &VMap) {		ValueToValueMapTy &VMap) {
for (unsigned op = 0, E = I->getNumOperands(); op != E; ++op) {		for (unsigned op = 0, E = I->getNumOperands(); op != E; ++op) {
Value *Op = I->getOperand(op);		Value *Op = I->getOperand(op);
ValueToValueMapTy::iterator It = VMap.find(Op);		ValueToValueMapTy::iterator It = VMap.find(Op);
if (It != VMap.end())		if (It != VMap.end())
[... 230 lines elided ...]	bool llvm::UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,
// flag is specified.		// flag is specified.
bool RuntimeTripCount = (TripCount == 0 && Count > 0 && AllowRuntime);		bool RuntimeTripCount = (TripCount == 0 && Count > 0 && AllowRuntime);

assert((!RuntimeTripCount || !PeelCount) &&		assert((!RuntimeTripCount || !PeelCount) &&
"Did not expect runtime trip-count unrolling "		"Did not expect runtime trip-count unrolling "
"and peeling for the same loop");		"and peeling for the same loop");

if (PeelCount)		if (PeelCount)
peelLoop(L, PeelCount, LI, SE, DT, PreserveLCSSA);		peelLoop(L, PeelCount, LI, SE, DT, AC, PreserveLCSSA);

// Loops containing convergent instructions must have a count that divides		// Loops containing convergent instructions must have a count that divides
// their TripMultiple.		// their TripMultiple.
DEBUG(		DEBUG(
{		{
bool HasConvergent = false;		bool HasConvergent = false;
for (auto &BB : L->blocks())		for (auto &BB : L->blocks())
for (auto &I : *BB)		for (auto &I : *BB)
[... 282 lines elided ...]	if (NeedConditional) {
}		}
}		}
}		}
// Replace the conditional branch with an unconditional one.		// Replace the conditional branch with an unconditional one.
BranchInst::Create(Dest, Term);		BranchInst::Create(Dest, Term);
Term->eraseFromParent();		Term->eraseFromParent();
}		}
}		}

// Update dominators of blocks we might reach through exits.		// Update dominators of blocks we might reach through exits.
// Immediate dominator of such block might change, because we add more		// Immediate dominator of such block might change, because we add more
// routes which can lead to the exit: we can now reach it from the copied		// routes which can lead to the exit: we can now reach it from the copied
// iterations too. Thus, the new idom of the block will be the nearest		// iterations too.
// common dominator of the previous idom and common dominator of all copies of
// the previous idom. This is equivalent to the nearest common dominator of
// the previous idom and the first latch, which dominates all copies of the
// previous idom.
if (DT && Count > 1) {		if (DT && Count > 1) {
for (auto *BB : OriginalLoopBlocks) {		for (auto *BB : OriginalLoopBlocks) {
auto *BBDomNode = DT->getNode(BB);		auto *BBDomNode = DT->getNode(BB);
SmallVector<BasicBlock *, 16> ChildrenToUpdate;		SmallVector<BasicBlock *, 16> ChildrenToUpdate;
for (auto *ChildDomNode : BBDomNode->getChildren()) {		for (auto *ChildDomNode : BBDomNode->getChildren()) {
auto *ChildBB = ChildDomNode->getBlock();		auto *ChildBB = ChildDomNode->getBlock();
if (!L->contains(ChildBB))		if (!L->contains(ChildBB))
ChildrenToUpdate.push_back(ChildBB);		ChildrenToUpdate.push_back(ChildBB);
}		}
BasicBlock *NewIDom = DT->findNearestCommonDominator(BB, Latches[0]);		BasicBlock *NewIDom;
		if (BB == LatchBlock) {
		// The latch is special because we emit unconditional branches in
		// some cases where the original loop contained a conditional branch.
		// Since the latch is always at the bottom of the loop, if the latch
		// dominated an exit before unrolling, the new dominator of that exit
		// must also be a latch. Specifically, the dominator is the first
		// latch which ends in a conditional branch, or the last latch if
		// there is no such latch.
		NewIDom = Latches.back();
		for (BasicBlock *IterLatch : Latches) {
		TerminatorInst *Term = IterLatch->getTerminator();
		if (isa<BranchInst>(Term) && cast<BranchInst>(Term)->isConditional()) {
		NewIDom = IterLatch;
		break;
		}
		}
		} else {
		// The new idom of the block will be the nearest common dominator
		// of all copies of the previous idom. This is equivalent to the
		// nearest common dominator of the previous idom and the first latch,
		// which dominates all copies of the previous idom.
		NewIDom = DT->findNearestCommonDominator(BB, LatchBlock);
		}
for (auto *ChildBB : ChildrenToUpdate)		for (auto *ChildBB : ChildrenToUpdate)
DT->changeImmediateDominator(ChildBB, NewIDom);		DT->changeImmediateDominator(ChildBB, NewIDom);
}		}
}		}

		if (DT && UnrollVerifyDomtree)
		DT->verifyDomTree();

// Merge adjacent basic blocks, if possible.		// Merge adjacent basic blocks, if possible.
SmallPtrSet<Loop *, 4> ForgottenLoops;		SmallPtrSet<Loop *, 4> ForgottenLoops;
for (BasicBlock *Latch : Latches) {		for (BasicBlock *Latch : Latches) {
BranchInst *Term = cast<BranchInst>(Latch->getTerminator());		BranchInst *Term = cast<BranchInst>(Latch->getTerminator());
if (Term->isUnconditional()) {		if (Term->isUnconditional()) {
BasicBlock *Dest = Term->getSuccessor(0);		BasicBlock *Dest = Term->getSuccessor(0);
if (BasicBlock *Fold =		if (BasicBlock *Fold =
foldBlockIntoPredecessor(Dest, LI, SE, ForgottenLoops, DT)) {		foldBlockIntoPredecessor(Dest, LI, SE, ForgottenLoops, DT)) {
// Dest has been folded into Fold. Update our worklists accordingly.		// Dest has been folded into Fold. Update our worklists accordingly.
std::replace(Latches.begin(), Latches.end(), Dest, Fold);		std::replace(Latches.begin(), Latches.end(), Dest, Fold);
UnrolledLoopBlocks.erase(std::remove(UnrolledLoopBlocks.begin(),		UnrolledLoopBlocks.erase(std::remove(UnrolledLoopBlocks.begin(),
UnrolledLoopBlocks.end(), Dest),		UnrolledLoopBlocks.end(), Dest),
UnrolledLoopBlocks.end());		UnrolledLoopBlocks.end());
}		}
}		}
}		}

// FIXME: We only preserve DT info for complete unrolling now. Incrementally
// updating domtree after partial loop unrolling should also be easy.
if (DT && !CompletelyUnroll)
DT->recalculate(*L->getHeader()->getParent());
else if (DT)
DEBUG(DT->verifyDomTree());

// Simplify any new induction variables in the partially unrolled loop.		// Simplify any new induction variables in the partially unrolled loop.
if (SE && !CompletelyUnroll && Count > 1) {		if (SE && !CompletelyUnroll && Count > 1) {
SmallVector<WeakVH, 16> DeadInsts;		SmallVector<WeakVH, 16> DeadInsts;
simplifyLoopIVs(L, SE, DT, LI, DeadInsts);		simplifyLoopIVs(L, SE, DT, LI, DeadInsts);

// Aggressively clean up dead instructions that simplifyLoopIVs already		// Aggressively clean up dead instructions that simplifyLoopIVs already
// identified. Any remaining should be cleaned up below.		// identified. Any remaining should be cleaned up below.
while (!DeadInsts.empty())		while (!DeadInsts.empty())
[... 43 lines elided ...]	bool llvm::UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,
if (PreserveLCSSA && OuterL && CompletelyUnroll && !NeedToFixLCSSA)		if (PreserveLCSSA && OuterL && CompletelyUnroll && !NeedToFixLCSSA)
NeedToFixLCSSA |= ::needToInsertPhisForLCSSA(OuterL, UnrolledLoopBlocks, LI);		NeedToFixLCSSA |= ::needToInsertPhisForLCSSA(OuterL, UnrolledLoopBlocks, LI);

// If we have a pass and a DominatorTree we should re-simplify impacted loops		// If we have a pass and a DominatorTree we should re-simplify impacted loops
// to ensure subsequent analyses can rely on this form. We want to simplify		// to ensure subsequent analyses can rely on this form. We want to simplify
// at least one layer outside of the loop that was unrolled so that any		// at least one layer outside of the loop that was unrolled so that any
// changes to the parent loop exposed by the unrolling are considered.		// changes to the parent loop exposed by the unrolling are considered.
if (DT) {		if (DT) {
if (!OuterL && !CompletelyUnroll)
OuterL = L;
if (OuterL) {		if (OuterL) {
// OuterL includes all loops for which we can break loop-simplify, so		// OuterL includes all loops for which we can break loop-simplify, so
// it's sufficient to simplify only it (it'll recursively simplify inner		// it's sufficient to simplify only it (it'll recursively simplify inner
// loops too).		// loops too).
// TODO: That potentially might be compile-time expensive. We should try		// TODO: That potentially might be compile-time expensive. We should try
// to fix the loop-simplified form incrementally.		// to fix the loop-simplified form incrementally.
simplifyLoop(OuterL, DT, LI, SE, AC, PreserveLCSSA);		simplifyLoop(OuterL, DT, LI, SE, AC, PreserveLCSSA);

[... 45 lines elided ...]

lib/Transforms/Utils/LoopUnrollPeel.cpp

[... 22 lines elided ...]
#include "llvm/IR/MDBuilder.h"		#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
		#include "llvm/Transforms/Utils/LoopSimplify.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/UnrollLoop.h"		#include "llvm/Transforms/Utils/UnrollLoop.h"
#include <algorithm>		#include <algorithm>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-unroll"		#define DEBUG_TYPE "loop-unroll"
STATISTIC(NumPeeled, "Number of loops peeled");		STATISTIC(NumPeeled, "Number of loops peeled");
[... 213 lines elided ...]
/// Rather, each iteration is peeled off separately, and needs to check the		/// Rather, each iteration is peeled off separately, and needs to check the
/// exit condition.		/// exit condition.
/// For loops that dynamically execute \p PeelCount iterations or less		/// For loops that dynamically execute \p PeelCount iterations or less
/// this provides a benefit, since the peeled off iterations, which account		/// this provides a benefit, since the peeled off iterations, which account
/// for the bulk of dynamic execution, can be further simplified by scalar		/// for the bulk of dynamic execution, can be further simplified by scalar
/// optimizations.		/// optimizations.
bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,		bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
ScalarEvolution *SE, DominatorTree *DT,		ScalarEvolution *SE, DominatorTree *DT,
bool PreserveLCSSA) {		AssumptionCache *AC, bool PreserveLCSSA) {
if (!canPeel(L))		if (!canPeel(L))
return false;		return false;

LoopBlocksDFS LoopBlocks(L);		LoopBlocksDFS LoopBlocks(L);
LoopBlocks.perform(LI);		LoopBlocks.perform(LI);

BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
BasicBlock *PreHeader = L->getLoopPreheader();		BasicBlock *PreHeader = L->getLoopPreheader();
[... 130 lines elided ...]	else
BackEdgeWeight = 1;		BackEdgeWeight = 1;
MDBuilder MDB(LatchBR->getContext());		MDBuilder MDB(LatchBR->getContext());
MDNode *WeightNode =		MDNode *WeightNode =
HeaderIdx ? MDB.createBranchWeights(ExitWeight, BackEdgeWeight)		HeaderIdx ? MDB.createBranchWeights(ExitWeight, BackEdgeWeight)
: MDB.createBranchWeights(BackEdgeWeight, ExitWeight);		: MDB.createBranchWeights(BackEdgeWeight, ExitWeight);
LatchBR->setMetadata(LLVMContext::MD_prof, WeightNode);		LatchBR->setMetadata(LLVMContext::MD_prof, WeightNode);
}		}

		// FIXME: Incrementally update domtree.
		DT->recalculate(*L->getHeader()->getParent());

// If the loop is nested, we changed the parent loop, update SE.		// If the loop is nested, we changed the parent loop, update SE.
if (Loop *ParentLoop = L->getParentLoop())		if (Loop *ParentLoop = L->getParentLoop()) {
SE->forgetLoop(ParentLoop);		SE->forgetLoop(ParentLoop);

		// FIXME: Incrementally update loop-simplify
		simplifyLoop(ParentLoop, DT, LI, SE, AC, PreserveLCSSA);
		} else {
		// FIXME: Incrementally update loop-simplify
		simplifyLoop(L, DT, LI, SE, AC, PreserveLCSSA);
		}

NumPeeled++;		NumPeeled++;

return true;		return true;
}		}

lib/Transforms/Utils/LoopUnrollRuntime.cpp

[... 140 lines elided ...]	static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
assert(Exit && "Loop must have a single exit block only");		assert(Exit && "Loop must have a single exit block only");
// Split the exit to maintain loop canonicalization guarantees		// Split the exit to maintain loop canonicalization guarantees
SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));		SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
SplitBlockPredecessors(Exit, Preds, ".unr-lcssa", DT, LI,		SplitBlockPredecessors(Exit, Preds, ".unr-lcssa", DT, LI,
PreserveLCSSA);		PreserveLCSSA);
// Add the branch to the exit block (around the unrolled loop)		// Add the branch to the exit block (around the unrolled loop)
B.CreateCondBr(BrLoopExit, Exit, NewPreHeader);		B.CreateCondBr(BrLoopExit, Exit, NewPreHeader);
InsertPt->eraseFromParent();		InsertPt->eraseFromParent();
		if (DT)
		DT->changeImmediateDominator(Exit, PrologExit);
}		}

/// Connect the unrolling epilog code to the original loop.		/// Connect the unrolling epilog code to the original loop.
/// The unrolling epilog code contains code to execute the		/// The unrolling epilog code contains code to execute the
/// 'extra' iterations if the run-time trip count modulo the		/// 'extra' iterations if the run-time trip count modulo the
/// unroll count is non-zero.		/// unroll count is non-zero.
///		///
/// This function performs the following:		/// This function performs the following:
[... 98 lines elided ...]	for (Instruction &BBI : *Succ) {
VPN->setIncomingValue(VPN->getBasicBlockIndex(EpilogPreHeader), NewPN);		VPN->setIncomingValue(VPN->getBasicBlockIndex(EpilogPreHeader), NewPN);
}		}
}		}

Instruction *InsertPt = NewExit->getTerminator();		Instruction *InsertPt = NewExit->getTerminator();
IRBuilder<> B(InsertPt);		IRBuilder<> B(InsertPt);
Value *BrLoopExit = B.CreateIsNotNull(ModVal, "lcmp.mod");		Value *BrLoopExit = B.CreateIsNotNull(ModVal, "lcmp.mod");
assert(Exit && "Loop must have a single exit block only");		assert(Exit && "Loop must have a single exit block only");
// Split the exit to maintain loop canonicalization guarantees		// Split the epilogue exit to maintain loop canonicalization guarantees
SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));		SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
SplitBlockPredecessors(Exit, Preds, ".epilog-lcssa", DT, LI,		SplitBlockPredecessors(Exit, Preds, ".epilog-lcssa", DT, LI,
PreserveLCSSA);		PreserveLCSSA);
// Add the branch to the exit block (around the unrolling loop)		// Add the branch to the exit block (around the unrolling loop)
B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit);		B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit);
InsertPt->eraseFromParent();		InsertPt->eraseFromParent();
		if (DT)
		DT->changeImmediateDominator(Exit, NewExit);

		// Split the main loop exit to maintain canonicalization guarantees.
		SmallVector<BasicBlock*, 4> NewExitPreds{Latch};
		SplitBlockPredecessors(NewExit, NewExitPreds, ".loopexit", DT, LI,
		PreserveLCSSA);
}		}

/// Create a clone of the blocks in a loop and connect them together.		/// Create a clone of the blocks in a loop and connect them together.
/// If CreateRemainderLoop is false, loop structure will not be cloned,		/// If CreateRemainderLoop is false, loop structure will not be cloned,
/// otherwise a new loop will be created including all cloned blocks, and the		/// otherwise a new loop will be created including all cloned blocks, and the
/// iterator of it switches to count NewIter down to 0.		/// iterator of it switches to count NewIter down to 0.
/// The cloned blocks should be inserted between InsertTop and InsertBot.		/// The cloned blocks should be inserted between InsertTop and InsertBot.
/// If loop structure is cloned InsertTop should be new preheader, InsertBot		/// If loop structure is cloned InsertTop should be new preheader, InsertBot
/// new loop exit.		/// new loop exit.
///		///
static void CloneLoopBlocks(Loop *L, Value *NewIter,		static void CloneLoopBlocks(Loop *L, Value *NewIter,
const bool CreateRemainderLoop,		const bool CreateRemainderLoop,
const bool UseEpilogRemainder,		const bool UseEpilogRemainder,
BasicBlock *InsertTop, BasicBlock *InsertBot,		BasicBlock *InsertTop, BasicBlock *InsertBot,
BasicBlock *Preheader,		BasicBlock *Preheader,
std::vector<BasicBlock *> &NewBlocks,		std::vector<BasicBlock *> &NewBlocks,
LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,		LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,
LoopInfo *LI) {		DominatorTree *DT, LoopInfo *LI) {
StringRef suffix = UseEpilogRemainder ? "epil" : "prol";		StringRef suffix = UseEpilogRemainder ? "epil" : "prol";
BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
Function *F = Header->getParent();		Function *F = Header->getParent();
LoopBlocksDFS::RPOIterator BlockBegin = LoopBlocks.beginRPO();		LoopBlocksDFS::RPOIterator BlockBegin = LoopBlocks.beginRPO();
LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();		LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();
Loop *NewLoop = nullptr;		Loop *NewLoop = nullptr;
Loop *ParentLoop = L->getParentLoop();		Loop *ParentLoop = L->getParentLoop();
// ...
if (Latch == *BB) {
Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot);		Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot);
NewIdx->addIncoming(NewIter, InsertTop);		NewIdx->addIncoming(NewIter, InsertTop);
NewIdx->addIncoming(IdxSub, NewBB);		NewIdx->addIncoming(IdxSub, NewBB);
}		}
LatchBR->eraseFromParent();		LatchBR->eraseFromParent();
}		}
}		}

		if (DT) {
		for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {
		BasicBlock *NewBB = cast<BasicBlock>(VMap[*BB]);
		if (Header == *BB) {
		// The header is dominated by the preheader.
		DT->addNewBlock(NewBB, InsertTop);
		} else {
		// Copy information from original loop to unrolled loop.
		BasicBlock *IDomBB = DT->getNode(*BB)->getIDom()->getBlock();
		DT->addNewBlock(NewBB, cast<BasicBlock>(VMap[IDomBB]));
		}
		}
		}

// Change the incoming values to the ones defined in the preheader or		// Change the incoming values to the ones defined in the preheader or
// cloned loop.		// cloned loop.
for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {		for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
PHINode *NewPHI = cast<PHINode>(VMap[&*I]);		PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
if (!CreateRemainderLoop) {		if (!CreateRemainderLoop) {
if (UseEpilogRemainder) {		if (UseEpilogRemainder) {
unsigned idx = NewPHI->getBasicBlockIndex(Preheader);		unsigned idx = NewPHI->getBasicBlockIndex(Preheader);
NewPHI->setIncomingBlock(idx, InsertTop);		NewPHI->setIncomingBlock(idx, InsertTop);
// ...
Value *BranchVal =
ConstantInt::get(BECount->getType(),		ConstantInt::get(BECount->getType(),
Count - 1)) :		Count - 1)) :
B.CreateIsNotNull(ModVal, "lcmp.mod");		B.CreateIsNotNull(ModVal, "lcmp.mod");
BasicBlock *RemainderLoop = UseEpilogRemainder ? NewExit : PrologPreHeader;		BasicBlock *RemainderLoop = UseEpilogRemainder ? NewExit : PrologPreHeader;
BasicBlock *UnrollingLoop = UseEpilogRemainder ? NewPreHeader : PrologExit;		BasicBlock *UnrollingLoop = UseEpilogRemainder ? NewPreHeader : PrologExit;
// Branch to either remainder (extra iterations) loop or unrolling loop.		// Branch to either remainder (extra iterations) loop or unrolling loop.
B.CreateCondBr(BranchVal, RemainderLoop, UnrollingLoop);		B.CreateCondBr(BranchVal, RemainderLoop, UnrollingLoop);
PreHeaderBR->eraseFromParent();		PreHeaderBR->eraseFromParent();
		if (DT) {
		if (UseEpilogRemainder)
		DT->changeImmediateDominator(NewExit, PreHeader);
		else
		DT->changeImmediateDominator(PrologExit, PreHeader);
		}
Function *F = Header->getParent();		Function *F = Header->getParent();
// Get an ordered list of blocks in the loop to help with the ordering of the		// Get an ordered list of blocks in the loop to help with the ordering of the
// cloned blocks in the prolog/epilog code		// cloned blocks in the prolog/epilog code
LoopBlocksDFS LoopBlocks(L);		LoopBlocksDFS LoopBlocks(L);
LoopBlocks.perform(LI);		LoopBlocks.perform(LI);

//		//
// For each extra loop iteration, create a copy of the loop's basic blocks		// For each extra loop iteration, create a copy of the loop's basic blocks
// and generate a condition that branches to the copy depending on the		// and generate a condition that branches to the copy depending on the
// number of 'left over' iterations.		// number of 'left over' iterations.
//		//
std::vector<BasicBlock *> NewBlocks;		std::vector<BasicBlock *> NewBlocks;
ValueToValueMapTy VMap;		ValueToValueMapTy VMap;

// For unroll factor 2 remainder loop will have 1 iterations.		// For unroll factor 2 remainder loop will have 1 iterations.
// Do not create 1 iteration loop.		// Do not create 1 iteration loop.
bool CreateRemainderLoop = (Count != 2);		bool CreateRemainderLoop = (Count != 2);

// Clone all the basic blocks in the loop. If Count is 2, we don't clone		// Clone all the basic blocks in the loop. If Count is 2, we don't clone
// the loop, otherwise we create a cloned loop to execute the extra		// the loop, otherwise we create a cloned loop to execute the extra
// iterations. This function adds the appropriate CFG connections.		// iterations. This function adds the appropriate CFG connections.
BasicBlock *InsertBot = UseEpilogRemainder ? Exit : PrologExit;		BasicBlock *InsertBot = UseEpilogRemainder ? Exit : PrologExit;
BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;		BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;
CloneLoopBlocks(L, ModVal, CreateRemainderLoop, UseEpilogRemainder, InsertTop,		CloneLoopBlocks(L, ModVal, CreateRemainderLoop, UseEpilogRemainder, InsertTop,
InsertBot, NewPreHeader, NewBlocks, LoopBlocks, VMap, LI);		InsertBot, NewPreHeader, NewBlocks, LoopBlocks, VMap, DT, LI);

// Insert the cloned blocks into the function.		// Insert the cloned blocks into the function.
F->getBasicBlockList().splice(InsertBot->getIterator(),		F->getBasicBlockList().splice(InsertBot->getIterator(),
F->getBasicBlockList(),		F->getBasicBlockList(),
NewBlocks[0]->getIterator(),		NewBlocks[0]->getIterator(),
F->end());		F->end());

// Loop structure should be the following:		// Loop structure should be the following:
// ...