This is an archive of the discontinued LLVM Phabricator instance.

Differential D17473

[LoopUnroll] Avoid unnecessary DT recomputation.
Closed · Public

Authored by mzolotukhin on Feb 19 2016, 7:04 PM.


Details

Reviewers

chandlerc
jmolloy
sanjoy
escha
hfinkel

Commits

rGde19ed1eb1ac: [LoopUnroll] Avoid unnecessary DT recomputation.
rL261595: [LoopUnroll] Avoid unnecessary DT recomputation.

Summary

When we completely unroll a loop, it's pretty easy to update DT in-place and
thus avoid rebuilding it. DT recalculation is one of the most time-consuming
tasks in loop-unroll, so avoiding it, at least in the case of full unrolling,
should be beneficial.

On some extreme (but still real-world) tests this patch improves compile time by
~2x.

Diff Detail

Repository: rL LLVM

Event Timeline

mzolotukhin updated this revision to Diff 48569. Feb 19 2016, 7:04 PM

mzolotukhin retitled this revision to [LoopUnroll] Avoid unnecessary DT recomputation.

mzolotukhin updated this object.

mzolotukhin added reviewers: chandlerc, escha, jmolloy, hfinkel, sanjoy.

mzolotukhin added a subscriber: llvm-commits.

Herald added a subscriber: sanjoy. Feb 19 2016, 7:04 PM

s/UniqueExit/Exit/ - the exit block doesn't need to be unique.

My only high-level question is whether all of these cases need to be handled when the loop is in simplified form.

lib/Transforms/Utils/LoopUnroll.cpp
115–117 ↗	(On Diff #48569)	Range based for-loop?
533–534 ↗	(On Diff #48569)	There is a range based function as well for this.
536 ↗	(On Diff #48569)	I'm surprised this much work is necessary even when the loop is in simplified form?
563–566 ↗	(On Diff #48569)	Merge the two ifs? And update the comment?
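The range-based for loop suggested for lines 115–117 still needs a snapshot of the children, because re-parenting a child edits the old parent's children list. A minimal sketch on a hypothetical toy tree; `Node`, `changeIDom` and `reparentChildren` are illustrative names, not LLVM's `DomTreeNode` API:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy dominator-tree node; a hypothetical stand-in, not LLVM's DomTreeNode.
struct Node {
  Node *IDom = nullptr;
  std::vector<Node *> Children;
};

// Re-parent N under NewIDom and keep both children lists consistent.
void changeIDom(Node *N, Node *NewIDom) {
  assert(N->IDom && "only non-root nodes can be re-parented");
  if (N->IDom == NewIDom)
    return;
  std::vector<Node *> &Siblings = N->IDom->Children;
  Siblings.erase(std::find(Siblings.begin(), Siblings.end(), N));
  N->IDom = NewIDom;
  NewIDom->Children.push_back(N);
}

// Hand every child of Dead over to Pred before Dead's node is erased.
void reparentChildren(Node *Dead, Node *Pred) {
  // Copy first: changeIDom erases entries from Dead->Children, so a loop
  // over that vector itself would be invalidated part-way through.
  std::vector<Node *> Children(Dead->Children.begin(), Dead->Children.end());
  for (Node *Child : Children) // the range-based form suggested in review
    changeIDom(Child, Pred);
}
```

Iterating `Dead->Children` directly instead of the copy would erase elements from the vector being traversed and invalidate the loop's iterators.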

Thanks for the feedback, I'll update the patch soon.

Michael

lib/Transforms/Utils/LoopUnroll.cpp
536 ↗	(On Diff #48570)	The reason we're doing this is that the dominator of an exit block can change after unrolling. Consider a diamond-like loop body with header `H`, side blocks `A` and `B`, and latch block `L`. Suppose `B` exits to `E`. If `B` is the immediate dominator of `E` before unrolling, that is no longer the case after unrolling: several blocks will exit to `E`, so we have to actually find their nearest common dominator. There is probably a more efficient way of doing this, but even in this form it's a pure win over what we have now.

chandlerc added inline comments. Feb 19 2016, 7:42 PM

lib/Transforms/Utils/LoopUnroll.cpp
536 ↗	(On Diff #48570)	Ahh, I see. So, the reason that this seemed odd to me is that all of these blocks that now branch to E come from unrolled copies of the loop, and so they should all have the same IDom -- the IDom of B from the first copy of the loop (which I think pretty much has to be the header, but I've not thought very hard about that). But have we done any CFG simplification during unrolling at this point? (I know we talked about that, not sure any of it landed...) If so, that would of course potentially invalidate the idea of basing this purely on the loop structure and structural nature of unrolling. It's not so much that this is ever going to be expensive at runtime (the domtree should make this pattern quite fast), it was just that I wanted to understand the complexity. I think explaining some of the context of how to think about this code in comments would be very useful here.
562–568 ↗	(On Diff #48570)	Also, what do you think about running verifyDomTree here to help flush out any bugs?

Also LGTM with the comments addressed. Also happy to chat further.

This revision is now accepted and ready to land. Feb 19 2016, 7:43 PM

Hi Chandler,

Some replies from me are inline. I'm heading off for the weekend now; I'll probably update the patch and add some comments on Monday. Thanks for your feedback!

Michael

lib/Transforms/Utils/LoopUnroll.cpp
536 ↗	(On Diff #48570)	> all of these blocks that now branch to E come from unrolled copies of the loop, and so they should all have the same IDom -- the IDom of B from the first copy of the loop

I don't think that's correct. IDom of B might be H, but it doesn't have to be (you can imagine a diamond-in-diamond structure to prove it). To explain it better, I'll try to use my mad ASCII skills here. Here is our (slightly modified) original loop body:

        (H)
         |
         v
        (I)
       /   \
      v     v
    (A)     (B) --> (E)
      \     /
       v   v
        (L)

Here IDom(B) = I and IDom(E) = B. After unrolling we'll have:

        (H)
         |
         v
        (I)
       /   \
      v     v
    (A)     (B)-------
      \     /        |
       v   v         |
        (L)          |
         |           |
         v           |
        (H')         |
         |           |
         v           |
        (I')         |
       /   \         |
      v     v        v
    (A')    (B') -> (E)
      \     /
       v   v
        (L')

In the unrolled loop IDom(B) = I, IDom(B') = I', and IDom(E) = NearestCommonDominator(B', B) = I. Please note that it doesn't have to be the header.

That said, I see what you meant by using the structural nature of unrolling: we do exploit it when we assign dominators to cloned blocks. As for CFG simplification, we perform some folding right after this, in `FoldBlockIntoPredecessor`, which I also updated in this patch.
562–568 ↗	(On Diff #48570)	I did have `else DEBUG(DT->verifyDomTree());` here (and I ran tests with it), but keeping it in would regress compile time for Asserts=On builds. If that's fine, I can restore it.
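The diamond example in the reply above can be checked mechanically. The sketch below computes immediate dominators with the iterative Cooper–Harvey–Kennedy algorithm (swapped in for illustration; LLVM's DominatorTree is built with a different algorithm) on the thread's example CFG, before and after unrolling by two. Block ids `H2`, `I2`, ... stand for `H'`, `I'`, ...; the graph encoding and all function names are assumptions of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Graph = std::vector<std::vector<int>>; // successor lists, by block id

// Iterative dominator computation after Cooper, Harvey & Kennedy,
// "A Simple, Fast Dominance Algorithm". Returns IDom[v]; the entry maps to
// itself and unreachable blocks get -1.
std::vector<int> computeIDoms(const Graph &Succs, int Entry) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> PostOrder;
  std::vector<bool> Seen(N, false);
  std::function<void(int)> DFS = [&](int V) {
    Seen[V] = true;
    for (int S : Succs[V])
      if (!Seen[S])
        DFS(S);
    PostOrder.push_back(V);
  };
  DFS(Entry);
  std::vector<int> RPO(PostOrder.rbegin(), PostOrder.rend());
  std::vector<int> RPONum(N, -1);
  for (int Idx = 0; Idx < static_cast<int>(RPO.size()); ++Idx)
    RPONum[RPO[Idx]] = Idx;

  Graph Preds(N);
  for (int V = 0; V < N; ++V)
    for (int S : Succs[V])
      Preds[S].push_back(V);

  std::vector<int> IDom(N, -1);
  IDom[Entry] = Entry;
  // Walk both fingers up the current idom chains until they meet.
  auto Intersect = [&](int X, int Y) {
    while (X != Y) {
      while (RPONum[X] > RPONum[Y])
        X = IDom[X];
      while (RPONum[Y] > RPONum[X])
        Y = IDom[Y];
    }
    return X;
  };
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int V : RPO) {
      if (V == Entry)
        continue;
      int NewIDom = -1;
      for (int P : Preds[V]) {
        if (IDom[P] == -1)
          continue; // unreachable, or not processed yet in this pass
        NewIDom = (NewIDom == -1) ? P : Intersect(P, NewIDom);
      }
      if (IDom[V] != NewIDom) {
        IDom[V] = NewIDom;
        Changed = true;
      }
    }
  }
  return IDom;
}

// Block ids for the example in this thread; H2, I2, ... stand for H', I', ...
enum Block : int { H, I, A, B, L, H2, I2, A2, B2, L2, E, NumBlocks };

// Original loop body: diamond I -> {A, B} -> L, backedge L -> H, B exits to E.
Graph originalLoop() {
  Graph G(NumBlocks);
  G[H] = {I}; G[I] = {A, B}; G[A] = {L}; G[B] = {L, E}; G[L] = {H};
  return G;
}

// Fully unrolled by two: L falls into the copy H', and both B and B' exit to E.
Graph unrolledLoop() {
  Graph G(NumBlocks);
  G[H] = {I}; G[I] = {A, B}; G[A] = {L}; G[B] = {L, E}; G[L] = {H2};
  G[H2] = {I2}; G[I2] = {A2, B2}; G[A2] = {L2}; G[B2] = {L2, E};
  return G;
}
```

In `originalLoop()` the only predecessor of `E` is `B`, so `IDom(E) = B`. In `unrolledLoop()` both `B` and `B2` reach `E`; the intersection walk climbs `B2 -> I2 -> H2 -> L -> I` and meets `B -> I`, so `IDom(E)` becomes `I`, matching the diagram.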

mehdi_amini added a subscriber: mehdi_amini. Feb 19 2016, 9:03 PM

mehdi_amini added inline comments.

lib/Transforms/Utils/LoopUnroll.cpp
536 ↗	(On Diff #48570)	It is not necessarily the header, but it is still `IDom(B)`. Do you have an example where `IDom(E) != IDom(B)`? (Yeah, I love your asciiart-fu ;))

mzolotukhin added inline comments. Feb 19 2016, 9:20 PM

lib/Transforms/Utils/LoopUnroll.cpp
536 ↗	(On Diff #48570)	One example is that `IDom(E)` might be `B` itself. Also, if you construct an example with nested diamonds, you'll probably be able to get `IDom(E)` != `IDom(B)`. But after thinking about it, I think `IDom(E)` = `NearestCommonDominator(B, H')`. It looks obvious to me now, but it's too late on Friday to actually prove it rigorously :)
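The formula `IDom(E) = NearestCommonDominator(B, H')`, and the variant the final diff uses (the nearest common dominator of the previous idom and the first latch, `L` in the example), can be evaluated on the immediate dominators read off the unrolled diagram. A minimal sketch; the string-keyed map and the naive chain-walking `nearestCommonDominator` are illustrative and are not `DominatorTree::findNearestCommonDominator` itself.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Immediate dominators in the fully unrolled example, read off the diagram;
// H is the entry. E is left out because its idom is what we want to find.
const std::map<std::string, std::string> UnrolledIDom = {
    {"I", "H"},   {"A", "I"},   {"B", "I"},   {"L", "I"},  {"H'", "L"},
    {"I'", "H'"}, {"A'", "I'"}, {"B'", "I'"}, {"L'", "I'"}};

// Naive nearest common dominator: collect every dominator of X (including
// X itself), then climb Y's idom chain until it reaches one of them.
std::string nearestCommonDominator(const std::string &X, std::string Y) {
  std::set<std::string> DomsOfX;
  for (std::string N = X;; N = UnrolledIDom.at(N)) {
    DomsOfX.insert(N);
    if (UnrolledIDom.count(N) == 0)
      break; // reached the entry block
  }
  while (DomsOfX.count(Y) == 0)
    Y = UnrolledIDom.at(Y);
  return Y;
}
```

All three candidate formulas agree here: `nearestCommonDominator("B", "B'")`, `nearestCommonDominator("B", "H'")` and `nearestCommonDominator("B", "L")` each climb to `I`, because `L` dominates `H'`, which dominates every copied block.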

Rebase on TOT.
Optimize dom-info update for exit blocks.
Add DT verification.
Add more comments.

Hi Mehdi, Chandler,

I rewrote the part about updating dominators for exit blocks. Could you please take a look one more time?

Thanks,
Michael

Awesome, thanks Michael. And yeah, I'm very happy to have the verify in here until someone complains about it. This is too easy to get wrong, and the only way we'll find the really awesome test cases is with the verify left in place.

mehdi_amini added inline comments. Feb 22 2016, 4:33 PM

lib/Transforms/Utils/LoopUnroll.cpp
115–117 ↗	(On Diff #48749)	The above comment on range based for-loop still applies I think?

Closed by commit rL261595: [LoopUnroll] Avoid unnecessary DT recomputation. (authored by mzolotukhin). Feb 22 2016, 4:35 PM

This revision was automatically updated to reflect the committed changes.

Ouch, I missed that; I'll commit it as a follow-up. Thanks!

Michael

mehdi_amini added inline comments. Feb 22 2016, 4:43 PM

lib/Transforms/Utils/LoopUnroll.cpp
121 ↗	(On Diff #48749)	Random thought: couldn't the pattern above be implemented more efficiently inside the DT (by breaking the invariant during the process)? There would be no need for a temporary vector, and we could reserve the space in the new dominator (just directly appending the source children vector to the new one). This pattern seems to appear in multiple places in the codebase (from a very quick look).
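Mehdi's suggestion could look roughly like the following on a hypothetical toy node type: the whole children vector is appended to the new parent in one step, after a single `reserve`, and the tree is briefly inconsistent until the old vector is cleared. This only illustrates the idea; LLVM's `DomTreeNode` also caches per-node data (such as DFS numbers) that a real implementation would have to invalidate.

```cpp
#include <cassert>
#include <vector>

// Toy dominator-tree node; a hypothetical stand-in, not LLVM's DomTreeNode.
struct Node {
  Node *IDom = nullptr;
  std::vector<Node *> Children;
};

// Move all children of Dead under NewParent without a temporary vector:
// reserve once, append Dead's children wholesale, then fix back-pointers.
// Between the insert and the clear(), each moved child is listed under two
// parents; that invariant break is why this belongs inside the tree class.
void adoptChildren(Node *Dead, Node *NewParent) {
  std::vector<Node *> &Dst = NewParent->Children;
  Dst.reserve(Dst.size() + Dead->Children.size());
  Dst.insert(Dst.end(), Dead->Children.begin(), Dead->Children.end());
  for (Node *Child : Dead->Children)
    Child->IDom = NewParent;
  Dead->Children.clear(); // invariant restored; Dead is now a leaf
}
```

The folded block's own node still has to be removed afterwards, as `DT->eraseNode(BB)` does in the diff.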

mehdi_amini added inline comments. May 31 2016, 3:24 PM

llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp
579	If DT is null, this will always break (the test will be false, the else branch will be taken, and the nullptr dereferenced). Clang may also be smart enough to eliminate your null check, deducing that DT cannot be null, since dereferencing a null pointer is undefined behavior.

mzolotukhin added inline comments. Jun 6 2016, 12:45 PM

llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp
579	Good catch! Thanks, Mehdi!
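One way to close the hole Mehdi points out is to test `DT` on the `else` path as well. The sketch uses a hypothetical `FakeDomTree` that only counts calls and drops the `DEBUG()` wrapper; it illustrates the control flow and is not necessarily the exact follow-up that was committed.

```cpp
#include <cassert>

// Hypothetical stand-in for DominatorTree that only counts calls.
struct FakeDomTree {
  int Recalculations = 0;
  int Verifications = 0;
  void recalculate() { ++Recalculations; }
  void verifyDomTree() { ++Verifications; }
};

// Diff 48755 wrote the guard as
//   if (DT && !CompletelyUnroll) DT->recalculate(...);
//   else DEBUG(DT->verifyDomTree());
// so a null DT falls into the else branch and is dereferenced.
// Testing DT again on the else path avoids that:
void updateDomTreeAfterUnroll(FakeDomTree *DT, bool CompletelyUnroll) {
  if (DT && !CompletelyUnroll)
    DT->recalculate();   // partial unroll: DT was not preserved, rebuild it
  else if (DT)
    DT->verifyDomTree(); // complete unroll: DT was updated in place, check it
  // DT == nullptr: nothing to update, and nothing is dereferenced.
}
```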

Revision Contents

Path: llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp

Size: 62 lines

Diff 48755

llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp

… 67 unchanged lines not shown …
 /// only has one predecessor, and that predecessor only has one successor.
 /// The LoopInfo Analysis that is passed will be kept consistent. If folding is
 /// successful references to the containing loop must be removed from
 /// ScalarEvolution by calling ScalarEvolution::forgetLoop because SE may have
 /// references to the eliminated BB. The argument ForgottenLoops contains a set
 /// of loops that have already been forgotten to prevent redundant, expensive
 /// calls to ScalarEvolution::forgetLoop. Returns the new combined block.
 static BasicBlock *
 FoldBlockIntoPredecessor(BasicBlock *BB, LoopInfo *LI, ScalarEvolution *SE,
-                         SmallPtrSetImpl<Loop *> &ForgottenLoops) {
+                         SmallPtrSetImpl<Loop *> &ForgottenLoops,
+                         DominatorTree *DT) {
   // Merge basic blocks into their predecessor if there is only one distinct
   // pred, and if there is only one distinct successor of the predecessor, and
   // if there are no PHI nodes.
   BasicBlock *OnlyPred = BB->getSinglePredecessor();
   if (!OnlyPred) return nullptr;

   if (OnlyPred->getTerminator()->getNumSuccessors() != 1)
     return nullptr;
… 15 unchanged lines not shown … (context: `FoldBlockIntoPredecessor(BasicBlock *BB, LoopInfo *LI, ScalarEvolution *SE,`)
   BB->replaceAllUsesWith(OnlyPred);

   // Move all definitions in the successor to the predecessor...
   OnlyPred->getInstList().splice(OnlyPred->end(), BB->getInstList());

   // OldName will be valid until erased.
   StringRef OldName = BB->getName();

-  // Erase basic block from the function...
+  // Erase the old block and update dominator info.
+  if (DT)
+    if (DomTreeNode *DTN = DT->getNode(BB)) {
+      DomTreeNode *PredDTN = DT->getNode(OnlyPred);
+      SmallVector<DomTreeNode *, 8> Children(DTN->begin(), DTN->end());
+      for (SmallVectorImpl<DomTreeNode *>::iterator DI = Children.begin(),
+                                                    DE = Children.end();
+           DI != DE; ++DI)
+        DT->changeImmediateDominator(*DI, PredDTN);
+
+      DT->eraseNode(BB);
+    }

   // ScalarEvolution holds references to loop exit blocks.
   if (SE) {
     if (Loop *L = LI->getLoopFor(BB)) {
       if (ForgottenLoops.insert(L).second)
         SE->forgetLoop(L);
     }
   }
… 301 unchanged lines not shown … (context: `for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {`)
       // we can insert the proper branches later.
       if (*BB == Header)
         Headers.push_back(New);
       if (*BB == LatchBlock)
         Latches.push_back(New);

       NewBlocks.push_back(New);
       UnrolledLoopBlocks.push_back(New);

+      // Update DomTree: since we just copy the loop body, and each copy has a
+      // dedicated entry block (copy of the header block), this header's copy
+      // dominates all copied blocks. That means, dominance relations in the
+      // copied body are the same as in the original body.
+      if (DT) {
+        if (*BB == Header)
+          DT->addNewBlock(New, Latches[It - 1]);
+        else {
+          auto BBDomNode = DT->getNode(*BB);
+          auto BBIDom = BBDomNode->getIDom();
+          BasicBlock *OriginalBBIDom = BBIDom->getBlock();
+          DT->addNewBlock(
+              New, cast<BasicBlock>(LastValueMap[cast<Value>(OriginalBBIDom)]));
+        }
+      }
     }

     // Remap all instructions in the most recent iteration
     for (unsigned i = 0; i < NewBlocks.size(); ++i)
       for (BasicBlock::iterator I = NewBlocks[i]->begin(),
            E = NewBlocks[i]->end(); I != E; ++I)
         ::RemapInstruction(&*I, LastValueMap);
   }
… 65 unchanged lines not shown … (context: `if (NeedConditional) {`)
           }
         }
       }
       // Replace the conditional branch with an unconditional one.
       BranchInst::Create(Dest, Term);
       Term->eraseFromParent();
     }
   }
+  // Update dominators of loop exit blocks.
+  // Immediate dominator of an exit block might change, because we add more
+  // routes which can lead to the exit: we can now reach it from the copied
+  // iterations too. Thus, the new idom of the exit block will be the nearest
+  // common dominator of the previous idom and common dominator of all copies of
+  // the exiting block. This is equivalent to the nearest common dominator of
+  // the previous idom and the first latch, which dominates all copies of the
+  // exiting block.
+  if (DT && Count > 1) {
+    for (auto Exit : ExitBlocks) {
+      BasicBlock *PrevIDom = DT->getNode(Exit)->getIDom()->getBlock();
+      BasicBlock *NewIDom =
+          DT->findNearestCommonDominator(PrevIDom, Latches[0]);
+      DT->changeImmediateDominator(Exit, NewIDom);
+    }
+  }

   // Merge adjacent basic blocks, if possible.
   SmallPtrSet<Loop *, 4> ForgottenLoops;
   for (unsigned i = 0, e = Latches.size(); i != e; ++i) {
     BranchInst *Term = cast<BranchInst>(Latches[i]->getTerminator());
     if (Term->isUnconditional()) {
       BasicBlock *Dest = Term->getSuccessor(0);
-      if (BasicBlock *Fold = FoldBlockIntoPredecessor(Dest, LI, SE,
-                                                      ForgottenLoops)) {
+      if (BasicBlock *Fold =
+              FoldBlockIntoPredecessor(Dest, LI, SE, ForgottenLoops, DT)) {
         // Dest has been folded into Fold. Update our worklists accordingly.
         std::replace(Latches.begin(), Latches.end(), Dest, Fold);
         UnrolledLoopBlocks.erase(std::remove(UnrolledLoopBlocks.begin(),
                                              UnrolledLoopBlocks.end(), Dest),
                                  UnrolledLoopBlocks.end());
       }
     }
   }

   // FIXME: We could register any cloned assumptions instead of clearing the
   // whole function's cache.
   AC->clear();

-  // FIXME: Reconstruct dom info, because it is not preserved properly.
-  // Incrementally updating domtree after loop unrolling would be easy.
-  if (DT)
+  // FIXME: We only preserve DT info for complete unrolling now. Incrementally
+  // updating domtree after partial loop unrolling should also be easy.
+  if (DT && !CompletelyUnroll)
     DT->recalculate(*L->getHeader()->getParent());
+  else
+    DEBUG(DT->verifyDomTree());

   // Simplify any new induction variables in the partially unrolled loop.
   if (SE && !CompletelyUnroll) {
     SmallVector<WeakVH, 16> DeadInsts;
     simplifyLoopIVs(L, SE, DT, LI, DeadInsts);

     // Aggressively clean up dead instructions that simplifyLoopIVs already
     // identified. Any remaining should be cleaned up below.
… 95 unchanged lines not shown …
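The dominator bookkeeping that this diff adds to the cloning loop can be mimicked on plain maps: the copy of the header is dominated by the previous iteration's latch (`Latches[It - 1]`), and every other copy is dominated by the same iteration's copy of its original idom. In this sketch the `X.It` naming, `IDomMap` and `addUnrolledCopies` are hypothetical; the real code finds the copy of the original idom through `LastValueMap`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Block name -> name of its immediate dominator.
using IDomMap = std::map<std::string, std::string>;

// Hypothetical naming for the sketch: the copy of block X made in unrolled
// iteration It is "X.It"; iteration 0 is the original body.
std::string copyName(const std::string &X, unsigned It) {
  return It == 0 ? X : X + "." + std::to_string(It);
}

// Adds idoms for the copies created by unrolling Count times, mirroring the
// rule in the diff: the header copy is dominated by the previous iteration's
// latch, and every other copy by this iteration's copy of its original idom.
IDomMap addUnrolledCopies(const IDomMap &Orig,
                          const std::vector<std::string> &Body,
                          const std::string &Header, const std::string &Latch,
                          unsigned Count) {
  IDomMap Result = Orig;
  for (unsigned It = 1; It < Count; ++It)
    for (const std::string &BB : Body)
      Result[copyName(BB, It)] = (BB == Header)
                                     ? copyName(Latch, It - 1)
                                     : copyName(Orig.at(BB), It);
  return Result;
}
```

On the thread's example with `Count = 2`, `H.1` (the diagram's `H'`) gets idom `L`, and `I.1`, `B.1` and `L.1` hang off the copied blocks, while the exit `E` keeps its old idom until the separate exit-block fix-up runs.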
