
Details

Reviewers

mkuper
sanjoy
reames
evstupac

Commits

rGe5e5e59d8bbe: [RuntimeUnrolling] Add logic for loops with multiple exit blocks
rL306846: [RuntimeUnrolling] Add logic for loops with multiple exit blocks

Summary

Currently, runtime unrolling is done only for loops with a single exit block and a
single exiting block (and this exiting block must be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks), when runtime unrolling
generates epilog blocks. Very similar logic can be applied when generating
prolog blocks as well.
One restriction on the exit blocks (other than the latch exit block)
is that they should have no successors. This can also be extended in the future.

This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.
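To picture the transformation at the source level, here is a hand-written C++ model (not LLVM code, and not the pass's actual output) of a search loop with a non-latch exit, runtime-unrolled by a factor of 4 with an epilog remainder. Every unrolled copy, and the epilog, keeps its own early exit, and the way the trip count is split mirrors the `xtraiter`/`unroll_iter` values in the test file's CHECK lines (which use a factor of 8). The function names and the factor of 4 are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original loop in rotated form (trip >= 1) with two exits:
//   non-latch exit: a[i] == key    -> returns i
//   latch exit:     i + 1 == trip  -> returns -1
long findOriginal(const std::vector<int> &a, std::uint64_t trip, int key) {
  std::uint64_t i = 0;
  do {
    if (a[i] == key)
      return static_cast<long>(i); // non-latch exit
    ++i;
  } while (i != trip);             // latch
  return -1;                       // latch exit
}

// Hypothetical hand-unrolled version: unroll factor 4 plus an epilog.
long findUnrolledBy4(const std::vector<int> &a, std::uint64_t trip, int key) {
  const std::uint64_t Count = 4;                     // power of 2
  const std::uint64_t beCount = trip - 1;            // backedge-taken count
  const std::uint64_t xtraiter = trip & (Count - 1); // trip % Count
  std::uint64_t i = 0;
  if (beCount >= Count - 1) {                        // at least Count iterations
    std::uint64_t niter = trip - xtraiter;           // "unroll_iter"
    do {
      // Count copies of the body; each copy keeps its own early exit.
      if (a[i] == key) return static_cast<long>(i);
      ++i;
      if (a[i] == key) return static_cast<long>(i);
      ++i;
      if (a[i] == key) return static_cast<long>(i);
      ++i;
      if (a[i] == key) return static_cast<long>(i);
      ++i;
      niter -= Count;
    } while (niter != 0);
  }
  // Epilog remainder: the last xtraiter iterations, with the same exits.
  for (std::uint64_t e = xtraiter; e != 0; --e) {
    if (a[i] == key) return static_cast<long>(i);
    ++i;
  }
  return -1;
}
```

In this model the early exits can only make the loop leave sooner than the latch-derived trip count predicts, never later, so splitting the iterations by the latch's count stays safe when other exits exist.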

Diff Detail

Repository: rL LLVM

Event Timeline

anna created this revision. May 9 2017, 7:36 AM

anna added inline comments. May 9 2017, 7:38 AM

test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll
Line 4 (On Diff #98280): I will add more CHECK statements in a later update. For this first iteration, I'm hoping the focus is on implementation errors or high-level concerns.

Hi Anna,

Thanks for your contribution.
I've made some inline comments. Please take a look.

Thanks,
Evgeny

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 496 (On Diff #98280): How about passing "UnrollRuntimeMultiExit" as a parameter to "UnrollRuntimeLoopRemainder" and introducing a dependency like: UseEpilogRemainder |= UnrollRuntimeMultiExit for loops with multiple exits, and UnrollRuntimeMultiExit &= UseEpilogRemainder if UseEpilogRemainder is set by the user?
Line 508 (On Diff #98280): Could you please add a comment here?
Line 691 (On Diff #98280): Maybe it's OK, but according to the comment (line 685), we clone extra blocks right after inserting the previous blocks into the function. The order needs an additional comment. I'd recommend adding this part to "CloneLoopBlocks".
Line 698 (On Diff #98280): This should be more compact: for (Instruction &II : *BB)
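The option dependency proposed in the first comment can be read as a small reconciliation rule. The sketch below is a hypothetical model of that reading (the struct and function names are made up for illustration and are not LLVM APIs): for multi-exit loops, enabling multi-exit unrolling pulls in the epilog remainder, unless the user set `UseEpilogRemainder` explicitly, in which case the user's choice wins and multi-exit unrolling is only kept if the epilog is on.

```cpp
#include <cassert>

// Hypothetical bundle of the two options discussed in the review.
struct RemainderOptions {
  bool UseEpilogRemainder;
  bool UnrollRuntimeMultiExit;
};

// Hypothetical reconciliation of the options for one loop.
RemainderOptions reconcileOptions(RemainderOptions Opts,
                                  bool LoopHasMultipleExits,
                                  bool EpilogSetByUser) {
  if (EpilogSetByUser) {
    // Explicit user choice wins: multi-exit unrolling requires an epilog.
    Opts.UnrollRuntimeMultiExit &= Opts.UseEpilogRemainder;
  } else if (LoopHasMultipleExits) {
    // Otherwise, multi-exit unrolling switches on the epilog remainder.
    Opts.UseEpilogRemainder |= Opts.UnrollRuntimeMultiExit;
  }
  return Opts;
}
```

As the reply to this comment explains, the patch as landed kept `UnrollRuntimeMultiExit` as a plain `cl::opt` instead.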

Thanks Evgeny for the review. Updated diff.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 496 (On Diff #98280): I think `UnrollRuntimeMultiExit` can just be a cl::opt for now? The end goal is to support this for both prolog and epilog remainder loop creation, under some heuristic that decides when creating such a loop is profitable.
Line 691 (On Diff #98280): Yes, it's cleaner - added as part of CloneLoopBlocks.

Addressed review comments.

reames requested changes to this revision. Jun 22 2017, 2:33 PM

reames added inline comments.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 294 (On Diff #102594): Orthogonal to your change, but this code would be much simpler if we always created the remainder loop and then broke the backedge (removing the loop) if we knew the inserted loop was trivial.
Line 429 (On Diff #102594): OK, I'm missing something. Why do we need this? Is it just to preserve the dedicated exit property? If so, wouldn't simply splitting the edge and inserting a trivial edge (without duplication) be a lot simpler?
Line 447 (On Diff #102594): This assignment is pointless.
Line 519 (On Diff #102594): It looks like you're re-purposing this as the Latch Exit. Can you rename this and check it in? It'll make the diff smaller and easier to read.
Line 546 (On Diff #102594): I think this bit of code can be simplified via hasDedicatedExits. Also, I believe you actually need to check that predicate before calling getUniqueExitBlocks. Reading the comment on that function makes it seem like having dedicated exits is a precondition for that function.
Line 567 (On Diff #102594): Can you separate and land this change? It should be NFC for the old code, and if it's not, it'd be good to find out now. And, reading the comments carefully, I'm not sure this is actually NFC. Aren't the exit count and the backedge-taken count defined differently? (Consider a loop whose header executes once and whose backedge is dynamically dead.) Also, "guaranteed not to exit" doesn't seem to imply guaranteed to exit on the next iteration.

This revision now requires changes to proceed. Jun 22 2017, 2:33 PM

anna marked an inline comment as done. Jun 23 2017, 7:35 AM

anna added inline comments.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 429 (On Diff #102594): You mean splitting the edge (which would create these new exits) and adding a trivial edge to the original exits, to avoid duplicating the code? We can't do that, because we need to preserve the dedicated exit property on both loops: the remainder loop and the original loop. Adding this edge breaks the dedicated exit property for the original loop.
Line 519 (On Diff #102594): Moved the assert that checks for it closer to the definition as well.
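The dedicated-exit property at stake here can be modelled on a toy CFG. In the sketch below (plain C++, not LLVM's implementation; each loop is collapsed into a single block, and the hypothetical block names only loosely mirror `exit1.loopexit`/`exit1.loopexit2` in test5), an edge from the remainder loop into an existing exit breaks the property for both loops, and giving each loop its own split exit block restores it. That split is the effect the final patch gets by calling `formDedicatedExitBlocks` on both loops.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Preds = std::vector<std::vector<int>>; // Preds[b] = predecessors of b

// A loop has dedicated exits when every predecessor of each of its exit
// blocks lies inside the loop.
bool hasDedicatedExits(const std::set<int> &LoopBlocks,
                       const std::vector<int> &ExitBlocks, const Preds &P) {
  for (int Exit : ExitBlocks)
    for (int Pred : P[Exit])
      if (LoopBlocks.count(Pred) == 0)
        return false;
  return true;
}

// Hypothetical toy blocks.
enum Block { OrigLoop, RemLoop, Exit1, Exit1LoopExit, Exit1LoopExit2, NumBlocks };

// Stage 1: only the original loop branches to exit1.
Preds beforeUnrolling() {
  Preds P(NumBlocks);
  P[Exit1] = {OrigLoop};
  return P;
}

// Stage 2: the remainder loop also branches straight to exit1.
Preds sharedExit() {
  Preds P(NumBlocks);
  P[Exit1] = {OrigLoop, RemLoop};
  return P;
}

// Stage 3: each loop gets its own exit block, which then branches to exit1.
Preds afterDedicatedExits() {
  Preds P(NumBlocks);
  P[Exit1LoopExit] = {OrigLoop};
  P[Exit1LoopExit2] = {RemLoop};
  P[Exit1] = {Exit1LoopExit, Exit1LoopExit2};
  return P;
}
```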

anna marked an inline comment as done. Jun 26 2017, 2:03 PM

anna added inline comments.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 294 (On Diff #102594): Actually, I'm not convinced the code would be simpler. Multiple data structures are updated when this "new" remainder loop gets generated, apart from the updates to the CFG edges. After `CloneLoopBlocks`, we'd need to call a modified version of `deleteDeadLoop` (which currently lives in the LoopDeletion pass). So, apart from breaking the backedge, we'd need to tell the LPM that the (remainder) loop we just added should be deleted.
Line 546 (On Diff #102594): We're looking for more than dedicated exits here. This is just a bail-out condition to simplify the code for now: the only exit allowed to have successors is the LatchExit. Also, `getUniqueExitBlocks` works here because we check that the loop is in LoopSimplifyForm (which checks `hasDedicatedExits`). However, I think this bail-out can be dropped based on your offline suggestion: use LoopSimplify's logic for generating dedicated exits.
Line 567 (On Diff #102594): Thanks for bringing this up. I read the actual SCEV code (the comments seem to differ from the code) for `getExitCount` versus `getBackEdgeTakenCount`, and `getBackEdgeTakenCount` is semantically equivalent to `getExitCount` taken over all exits in the loop. So, I think this will be NFC for the old code.
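The exit-count question can be explored with a toy simulation. The sketch below is plain C++ with hypothetical names, not ScalarEvolution: conceptually, the backedge-taken count of a loop with several exiting blocks is the minimum of the per-exit counts, so the latch's exit count equals it whenever the latch is the only exiting block (the case the old code handled), but can exceed it once another exit can fire earlier. It also covers the corner case raised above: a loop whose header runs once and whose backedge is never taken has a backedge-taken count of 0 and a trip count of 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy rotated loop with two exiting blocks. Iteration i checks the early
// exit first, then the latch:
//   early exit: leaves when i == EarlyAt
//   latch exit: leaves when i + 1 == Trip (Trip >= 1)
// Returns how many times the backedge was actually taken.
std::uint64_t backedgesTaken(std::uint64_t EarlyAt, std::uint64_t Trip) {
  std::uint64_t Taken = 0;
  for (std::uint64_t i = 0;; ++i) {
    if (i == EarlyAt)
      return Taken; // left through the early exit
    if (i + 1 == Trip)
      return Taken; // left through the latch exit
    ++Taken;        // took the backedge
  }
}

// Per-exit counts, each computed as if that exit were the only one: the
// number of backedges taken before that exit fires.
std::uint64_t earlyExitCount(std::uint64_t EarlyAt) { return EarlyAt; }
std::uint64_t latchExitCount(std::uint64_t Trip) { return Trip - 1; }

// The loop's backedge-taken count is the minimum over its exits.
std::uint64_t combinedCount(std::uint64_t EarlyAt, std::uint64_t Trip) {
  return std::min(earlyExitCount(EarlyAt), latchExitCount(Trip));
}
```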

Changing the code to generate dedicated exits instead of cloning the exit blocks. This would make it simpler to reason about successors to the exit blocks.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 567 (On Diff #102594): Submitted separately as NFC. No problems spotted.

anna added inline comments. Jun 27 2017, 9:04 AM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 294 (On Diff #102594): I realized one more thing while working on a different problem in loop deletion: the LPM Updater cannot remove this remainder loop. The removal works only for the current loop and its subloops, and this remainder loop is not a subloop. Just breaking the backedge without deleting the loop would break the loop structure and leave the loop in the LPM, which is incorrect. This makes the `CreateRemainderLoop` checks pretty useful! We could still generate the remainder loop code unconditionally and rely on a later pass such as SimplifyCFG to remove this loop (if it handles this optimization of a single-trip-count 'loop').

Addressed review comments. Now we no longer clone the loop exits, but generate the edges to the original loop exits
and canonicalize to dedicated loop exits afterwards.
This automatically helps support other loop exits having successors, so that restriction is removed.
Added one more test case to verify exits having successors.

The logic is now rewritten to avoid cloning the exit blocks (and test cases updated). The main puzzle was in the dominator tree update that was required after dedicated exits were generated for adjacent loops.
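The dominator update can be checked on a toy CFG. The sketch below is plain C++ (not LLVM's `DominatorTree`); the CFG is a simplified assumption in which the unrolled loop and the epilog are single blocks, with names loosely following test1 (the preheader either enters the unrolled loop or skips straight to the `unr-lcssa` block). Once a non-latch exit is reachable from both the unrolled loop and the epilog, its immediate dominator moves up to the preheader, which is what `DT->changeImmediateDominator(BB, PreHeader)` in the diff records.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::uint32_t; // one bit per block (toy CFGs of <= 32 blocks)
using Preds = std::vector<std::vector<int>>;

// Iterative set-based dominators: Dom(entry) = {entry} and
// Dom(b) = {b} | intersection of Dom(p) over all predecessors p of b.
std::vector<Mask> computeDominators(const Preds &P, int Entry) {
  const std::size_t N = P.size();
  const Mask All = N >= 32 ? ~Mask{0} : ((Mask{1} << N) - 1);
  std::vector<Mask> Dom(N, All);
  Dom[Entry] = Mask{1} << Entry;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t B = 0; B < N; ++B) {
      if (static_cast<int>(B) == Entry)
        continue;
      Mask Meet = All;
      for (int Pred : P[B])
        Meet &= Dom[Pred];
      const Mask New = Meet | (Mask{1} << B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// Strict dominators form a chain; the immediate dominator is the one that
// is itself dominated by the most blocks.
int immediateDominator(const std::vector<Mask> &Dom, int B) {
  const Mask Strict = Dom[B] & ~(Mask{1} << B);
  int Best = -1;
  std::size_t BestDepth = 0;
  for (std::size_t D = 0; D < Dom.size(); ++D) {
    if (!(Strict & (Mask{1} << D)))
      continue;
    const std::size_t Depth = std::bitset<32>(Dom[D]).count();
    if (Best < 0 || Depth > BestDepth) {
      Best = static_cast<int>(D);
      BestDepth = Depth;
    }
  }
  return Best;
}

// Hypothetical simplified blocks after epilog unrolling.
enum Block { PreHeader, Header, UnrLcssa, EpilHeader, Exit1, LatchExit, NumBlocks };

// WithEpilogEdge adds the remainder's new edge EpilHeader -> Exit1.
Preds buildCFG(bool WithEpilogEdge) {
  Preds P(NumBlocks);
  P[Header] = {PreHeader, Header};
  P[UnrLcssa] = {PreHeader, Header};
  P[EpilHeader] = {UnrLcssa, EpilHeader};
  P[Exit1] = {Header};
  if (WithEpilogEdge)
    P[Exit1].push_back(EpilHeader);
  P[LatchExit] = {UnrLcssa, EpilHeader};
  return P;
}
```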

anna added inline comments. Jun 27 2017, 2:09 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 495 (On Diff #104265): Note: This assert is actually NFC w.r.t. the old assert, but it increases compile time because `contains` is O(number of blocks in loop). That's the reason I didn't check it in separately. We cannot directly compare against `LatchExit` as done previously, since it may not exist.

Anna and I talked offline because I was having trouble understanding the intent of the new code. She's going to update a revised version which will address the cause of my confusion.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 293 (On Diff #104265): Update the comment to describe the return value, please.
Line 764 (On Diff #104265): Dead variables?

Updated the dominator info at the location where the dom tree first fails verification.
Moved the logic for phi node updates to immediately after the edges are created.
Added more tests and the file check statements.
Renamed a couple of variables and added comments clarifying the code.

NFC w.r.t. the previous diff: updated the test CHECK statements to avoid using the auto-generated CHECK statements.
Manually updating CHECK statements makes the actual test much clearer.

LGTM. Thanks for all the work on this!

This revision is now accepted and ready to land. Jun 30 2017, 8:30 AM

Closed by commit rL306846: [RuntimeUnrolling] Add logic for loops with multiple exit blocks (authored by annat). Jun 30 2017, 10:57 AM

This revision was automatically updated to reflect the committed changes.

In D33001#796825, @reames wrote:

LGTM. Thanks for all the work on this!

Thanks for all the suggestions; they simplified the code and improved the scope! Now, adding support in the presence of a prolog loop is just a matter of adding test cases and updating comments :)

Diff 104896

llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp

@@ 30 unchanged lines not shown @@
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
 #include "llvm/Transforms/Utils/UnrollLoop.h"
 #include <algorithm>

 using namespace llvm;

 #define DEBUG_TYPE "loop-unroll"

 STATISTIC(NumRuntimeUnrolled,
           "Number of loops unrolled with run-time trip counts");
+static cl::opt<bool> UnrollRuntimeMultiExit(
+    "unroll-runtime-multi-exit", cl::init(false), cl::Hidden,
+    cl::desc("Allow runtime unrolling for loops with multiple exits, when "
+             "epilog is generated"));

 /// Connect the unrolling prolog code to the original loop.
 /// The unrolling prolog code contains code to execute the
 /// 'extra' iterations if the run-time trip count modulo the
 /// unroll count is non-zero.
 ///
 /// This function performs the following:
 /// - Create PHI nodes at prolog end block to combine values
@@ 224 unchanged lines not shown @@

 /// Create a clone of the blocks in a loop and connect them together.
 /// If CreateRemainderLoop is false, loop structure will not be cloned,
 /// otherwise a new loop will be created including all cloned blocks, and the
 /// iterator of it switches to count NewIter down to 0.
 /// The cloned blocks should be inserted between InsertTop and InsertBot.
 /// If loop structure is cloned InsertTop should be new preheader, InsertBot
 /// new loop exit.
-///
-static void CloneLoopBlocks(Loop *L, Value *NewIter,
-                            const bool CreateRemainderLoop,
-                            const bool UseEpilogRemainder,
-                            BasicBlock *InsertTop, BasicBlock *InsertBot,
-                            BasicBlock *Preheader,
-                            std::vector<BasicBlock *> &NewBlocks,
-                            LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,
-                            DominatorTree *DT, LoopInfo *LI) {
+/// Return the new cloned loop that is created when CreateRemainderLoop is true.
+static Loop *
+CloneLoopBlocks(Loop *L, Value *NewIter, const bool CreateRemainderLoop,
+                const bool UseEpilogRemainder, BasicBlock *InsertTop,
+                BasicBlock *InsertBot, BasicBlock *Preheader,
+                std::vector<BasicBlock *> &NewBlocks, LoopBlocksDFS &LoopBlocks,
+                ValueToValueMapTy &VMap, DominatorTree *DT, LoopInfo *LI) {
   StringRef suffix = UseEpilogRemainder ? "epil" : "prol";
   BasicBlock *Header = L->getHeader();
   BasicBlock *Latch = L->getLoopLatch();
   Function *F = Header->getParent();
   LoopBlocksDFS::RPOIterator BlockBegin = LoopBlocks.beginRPO();
   LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();
   Loop *ParentLoop = L->getParentLoop();
   NewLoopsMap NewLoops;
@@ 108 unchanged lines not shown @@ if (CreateRemainderLoop) {
     DisableOperands.push_back(MDString::get(Context, "llvm.loop.unroll.disable"));
     MDNode *DisableNode = MDNode::get(Context, DisableOperands);
     MDs.push_back(DisableNode);

     MDNode *NewLoopID = MDNode::get(Context, MDs);
     // Set operand 0 to refer to the loop id itself.
     NewLoopID->replaceOperandWith(0, NewLoopID);
     NewLoop->setLoopID(NewLoopID);
+    return NewLoop;
   }
+  else
+    return nullptr;
 }

 /// Insert code in the prolog/epilog code when unrolling a loop with a
 /// run-time trip-count.
 ///
 /// This method assumes that the loop unroll factor is total number
 /// of loop bodies in the loop after unrolling. (Some folks refer
 /// to the unroll factor as the number of extra copies added).
@@ 30 unchanged lines not shown @@
 /// EpilExit:

 bool llvm::UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
                                       bool AllowExpensiveTripCount,
                                       bool UseEpilogRemainder,
                                       LoopInfo *LI, ScalarEvolution *SE,
                                       DominatorTree *DT, bool PreserveLCSSA) {
   // for now, only unroll loops that contain a single exit
-  if (!L->getExitingBlock())
+  if (!UnrollRuntimeMultiExit && !L->getExitingBlock())
     return false;

-  // Make sure the loop is in canonical form, and there is a single
-  // exit block only.
+  // Make sure the loop is in canonical form.
   if (!L->isLoopSimplifyForm())
     return false;

   // Guaranteed by LoopSimplifyForm.
   BasicBlock *Latch = L->getLoopLatch();
+  BasicBlock *Header = L->getHeader();

   BasicBlock *LatchExit = L->getUniqueExitBlock(); // successor out of loop
-  if (!LatchExit)
+  if (!LatchExit && !UnrollRuntimeMultiExit)
     return false;
+  // These are exit blocks other than the target of the latch exiting block.
+  SmallVector<BasicBlock *, 4> OtherExits;
+  BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());
+  unsigned int ExitIndex = LatchBR->getSuccessor(0) == Header ? 1 : 0;
   // Cloning the loop basic blocks (`CloneLoopBlocks`) requires that one of the
-  // targets of the Latch be the single exit block out of the loop. This needs
+  // targets of the Latch be an exit block out of the loop. This needs
   // to be guaranteed by the callers of UnrollRuntimeLoopRemainder.
-  BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());
-  assert((LatchBR->getSuccessor(0) == LatchExit ||
-          LatchBR->getSuccessor(1) == LatchExit) &&
-         "one of the loop latch successors should be "
-         "the exit block!");
-  (void)LatchBR;
+  assert(!L->contains(LatchBR->getSuccessor(ExitIndex)) &&
+         "one of the loop latch successors should be the exit block!");
+  // Support runtime unrolling for multiple exit blocks and multiple exiting
+  // blocks.
+  if (!LatchExit) {
+    assert(UseEpilogRemainder && "Multi exit unrolling is currently supported "
+                                 "unrolling with epilog remainder only!");
+    LatchExit = LatchBR->getSuccessor(ExitIndex);
+    // We rely on LCSSA form being preserved when the exit blocks are
+    // transformed.
+    if (!PreserveLCSSA)
+      return false;
+    // TODO: Support multiple exiting blocks jumping to the `LatchExit`. This
+    // will need updating the logic in connectEpilog.
+    if (!LatchExit->getSinglePredecessor())
+      return false;
+    SmallVector<BasicBlock *, 4> Exits;
+    L->getUniqueExitBlocks(Exits);
+    for (auto *BB : Exits)
+      if (BB != LatchExit)
+        OtherExits.push_back(BB);
+  }
+
+  assert(LatchExit && "Latch Exit should exist!");

   // Use Scalar Evolution to compute the trip count. This allows more loops to
   // be unrolled than relying on induction var simplification.
   if (!SE)
     return false;

   // Only unroll loops with a computable trip count, and the trip count needs
   // to be an int value (allowing a pointer type is a TODO item).
   // We calculate the backedge count by using getExitCount on the Latch block,
   // which is proven to be the only exiting block in this loop. This is same as
   // calculating getBackedgeTakenCount on the loop (which computes SCEV for all
   // exiting blocks).
   const SCEV *BECountSC = SE->getExitCount(L, Latch);
   if (isa<SCEVCouldNotCompute>(BECountSC) ||
       !BECountSC->getType()->isIntegerTy())
     return false;

   unsigned BEWidth = cast<IntegerType>(BECountSC->getType())->getBitWidth();

   // Add 1 since the backedge count doesn't include the first loop iteration.
   const SCEV *TripCountSC =
       SE->getAddExpr(BECountSC, SE->getConstant(BECountSC->getType(), 1));
   if (isa<SCEVCouldNotCompute>(TripCountSC))
     return false;

-  BasicBlock *Header = L->getHeader();
   BasicBlock *PreHeader = L->getLoopPreheader();
   BranchInst *PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
   const DataLayout &DL = Header->getModule()->getDataLayout();
   SCEVExpander Expander(*SE, DL, "loop-unroll");
   if (!AllowExpensiveTripCount &&
       Expander.isHighCostExpansion(TripCountSC, L, PreHeaderBR))
     return false;

@@ 125 unchanged lines not shown @@ bool llvm::UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
   // Do not create 1 iteration loop.
   bool CreateRemainderLoop = (Count != 2);

   // Clone all the basic blocks in the loop. If Count is 2, we don't clone
   // the loop, otherwise we create a cloned loop to execute the extra
   // iterations. This function adds the appropriate CFG connections.
   BasicBlock *InsertBot = UseEpilogRemainder ? LatchExit : PrologExit;
   BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;
-  CloneLoopBlocks(L, ModVal, CreateRemainderLoop, UseEpilogRemainder, InsertTop,
-                  InsertBot, NewPreHeader, NewBlocks, LoopBlocks, VMap, DT, LI);
+  Loop *remainderLoop = CloneLoopBlocks(
+      L, ModVal, CreateRemainderLoop, UseEpilogRemainder, InsertTop, InsertBot,
+      NewPreHeader, NewBlocks, LoopBlocks, VMap, DT, LI);

   // Insert the cloned blocks into the function.
   F->getBasicBlockList().splice(InsertBot->getIterator(),
                                 F->getBasicBlockList(),
                                 NewBlocks[0]->getIterator(),
                                 F->end());

+  // Now the loop blocks are cloned and the other exiting blocks from the
+  // remainder are connected to the original Loop's exit blocks. The remaining
+  // work is to update the phi nodes in the original loop, and take in the
+  // values from the cloned region. Also update the dominator info for
+  // OtherExits, since we have new edges into OtherExits.
+  for (auto *BB : OtherExits) {
+    for (auto &II : *BB) {
+
+      // Given we preserve LCSSA form, we know that the values used outside the
+      // loop will be used through these phi nodes at the exit blocks that are
+      // transformed below.
+      if (!isa<PHINode>(II))
+        break;
+      PHINode *Phi = cast<PHINode>(&II);
+      unsigned oldNumOperands = Phi->getNumIncomingValues();
+      // Add the incoming values from the remainder code to the end of the phi
+      // node.
+      for (unsigned i =0; i < oldNumOperands; i++){
+        Value *newVal = VMap[Phi->getIncomingValue(i)];
+        if (!newVal) {
+          assert(isa<Constant>(Phi->getIncomingValue(i)) &&
+                 "VMap should exist for all values except constants!");
+          newVal = Phi->getIncomingValue(i);
+        }
+        Phi->addIncoming(newVal,
+                         cast<BasicBlock>(VMap[Phi->getIncomingBlock(i)]));
+      }
+    }
+    // Update the dominator info because the immediate dominator is no longer the
+    // header of the original Loop. BB has edges both from L and remainder code.
+    // Since the preheader determines which loop is run (L or directly jump to
+    // the remainder code), we set the immediate dominator as the preheader.
+    if (DT)
+      DT->changeImmediateDominator(BB, PreHeader);
+  }

   // Loop structure should be the following:
   // Epilog            Prolog
   //
   // PreHeader         PreHeader
   // NewPreHeader      PrologPreHeader
   // Header            PrologHeader
   // ...               ...
   // Latch             PrologLatch
@@ 46 unchanged lines not shown @@ ConnectProlog(L, BECount, Count, PrologExit, PreHeader, NewPreHeader,
                   VMap, DT, LI, PreserveLCSSA);
   }

   // If this loop is nested, then the loop unroller changes the code in the
   // parent loop, so the Scalar Evolution pass needs to be run again.
   if (Loop *ParentLoop = L->getParentLoop())
     SE->forgetLoop(ParentLoop);

+  // Canonicalize to LoopSimplifyForm both original and remainder loops. We
+  // cannot rely on the LoopUnrollPass to do this because it only does
+  // canonicalization for parent/subloops and not the sibling loops.
+  if (OtherExits.size() > 0) {
+    // Generate dedicated exit blocks for the original loop, to preserve
+    // LoopSimplifyForm.
+    formDedicatedExitBlocks(L, DT, LI, PreserveLCSSA);
+    // Generate dedicated exit blocks for the remainder loop if one exists, to
+    // preserve LoopSimplifyForm.
+    if (remainderLoop)
+      formDedicatedExitBlocks(remainderLoop, DT, LI, PreserveLCSSA);
+  }
+
   NumRuntimeUnrolled++;
   return true;
 }
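The phi update performed for `OtherExits` in the loop above can be modelled without LLVM. In the sketch below, values and blocks are plain strings and the value map is a `std::map`; these are hypothetical stand-ins for `Value`, `BasicBlock` and `ValueToValueMapTy`. For every incoming entry that existed before cloning, the remapped value from the corresponding cloned block is appended; constants, which have no entry in the map, are reused unchanged. The example data is taken from the `%retval` phi in test2 below.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR value: a name plus a "constant" flag.
struct Val {
  std::string Name;
  bool IsConstant;
};

// A phi node as a list of (incoming value, incoming block name) pairs.
using Phi = std::vector<std::pair<Val, std::string>>;

// Maps original value/block names to the names of their clones.
using ValueMap = std::map<std::string, std::string>;

// For each entry present before cloning, append the cloned value arriving
// from the cloned block. Constants are not looked up and are reused as-is;
// any other unmapped name makes VMap.at() throw std::out_of_range.
void addRemainderIncoming(Phi &P, const ValueMap &VMap) {
  const std::size_t OldNumOperands = P.size();
  for (std::size_t i = 0; i < OldNumOperands; ++i) {
    Val NewVal = P[i].first;
    if (!NewVal.IsConstant)
      NewVal.Name = VMap.at(NewVal.Name);
    const std::string NewBlock = VMap.at(P[i].second);
    P.emplace_back(NewVal, NewBlock);
  }
}
```

Because each original entry is remapped independently, duplicate incoming edges (test5's scenario, where `loop_exiting` reaches the exit block twice) produce duplicate appended entries in this model as well.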

llvm/trunk/test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll

; RUN: opt < %s -loop-unroll -unroll-runtime=true -unroll-runtime-epilog=true -unroll-runtime-multi-exit=true -verify-dom-info -verify-loop-info -instcombine -S | FileCheck %s
; RUN: opt < %s -loop-unroll -unroll-runtime -unroll-count=2 -unroll-runtime-epilog=true -unroll-runtime-multi-exit=true -verify-dom-info -verify-loop-info -instcombine

; the second RUN generates an epilog remainder block for all the test
; cases below (it does not generate a loop).

; test with three exiting and three exit blocks.
; none of the exit blocks have successors
define void @test1(i64 %trip, i1 %cond) {
; CHECK-LABEL: test1
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[TRIP:%.*]], -1
; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[TRIP]], 7
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[TMP0]], 7
; CHECK-NEXT: br i1 [[TMP1]], label %exit2.loopexit.unr-lcssa, label [[ENTRY_NEW:%.*]]
; CHECK: entry.new:
; CHECK-NEXT: [[UNROLL_ITER:%.*]] = sub i64 [[TRIP]], [[XTRAITER]]
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK-LABEL: loop_latch.epil:
; CHECK-NEXT: %epil.iter.sub = add i64 %epil.iter, -1
; CHECK-NEXT: %epil.iter.cmp = icmp eq i64 %epil.iter.sub, 0
; CHECK-NEXT: br i1 %epil.iter.cmp, label %exit2.loopexit.epilog-lcssa, label %loop_header.epil
; CHECK-LABEL: loop_latch.7:
; CHECK-NEXT: %niter.nsub.7 = add i64 %niter, -8
; CHECK-NEXT: %niter.ncmp.7 = icmp eq i64 %niter.nsub.7, 0
; CHECK-NEXT: br i1 %niter.ncmp.7, label %exit2.loopexit.unr-lcssa.loopexit, label %loop_header
entry:
  br label %loop_header

loop_header:
  %iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]
  br i1 %cond, label %loop_latch, label %loop_exiting_bb1

loop_exiting_bb1:
  br i1 false, label %loop_exiting_bb2, label %exit1

loop_exiting_bb2:
  br i1 false, label %loop_latch, label %exit3

exit3:
  ret void

loop_latch:
  %iv_next = add i64 %iv, 1
  %cmp = icmp ne i64 %iv_next, %trip
  br i1 %cmp, label %loop_header, label %exit2.loopexit

exit1:
  ret void

exit2.loopexit:
  ret void
}


; test with three exiting and two exit blocks.
; The non-latch exit block has 2 unique predecessors.
; There are 2 values passed to the exit blocks that are calculated at every iteration.
; %sum.02 and %add. Both of these are incoming values for phi from every exiting
; unrolled block.
define i32 @test2(i32* nocapture %a, i64 %n) {
; CHECK-LABEL: test2
; CHECK-LABEL: for.exit2.loopexit:
; CHECK-NEXT: %retval.ph = phi i32 [ 42, %for.exiting_block ], [ %sum.02, %header ], [ %add, %for.body ], [ 42, %for.exiting_block.1 ], [ %add.1, %for.body.1 ], [ 42, %for.exiting_block.2 ], [ %add.2, %for.body.2 ], [ 42, %for.exiting_block.3 ],
; CHECK-NEXT: br label %for.exit2
; CHECK-LABEL: for.exit2.loopexit2:
; CHECK-NEXT: %retval.ph3 = phi i32 [ 42, %for.exiting_block.epil ], [ %sum.02.epil, %header.epil ]
; CHECK-NEXT: br label %for.exit2
; CHECK-LABEL: for.exit2:
; CHECK-NEXT: %retval = phi i32 [ %retval.ph, %for.exit2.loopexit ], [ %retval.ph3, %for.exit2.loopexit2 ]
; CHECK-NEXT: ret i32 %retval
; CHECK: %niter.nsub.7 = add i64 %niter, -8
entry:
  br label %header

header:
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
  %sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]
  br i1 false, label %for.exit2, label %for.exiting_block

for.exiting_block:
  %cmp = icmp eq i64 %n, 42
  br i1 %cmp, label %for.exit2, label %for.body

for.body:
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %add = add nsw i32 %0, %sum.02
  %indvars.iv.next = add i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, %n
  br i1 %exitcond, label %for.end, label %header

for.end:                                          ; preds = %for.body
  %sum.0.lcssa = phi i32 [ %add, %for.body ]
  ret i32 %sum.0.lcssa

for.exit2:
  %retval = phi i32 [ %sum.02, %header ], [ 42, %for.exiting_block ]
  ret i32 %retval
}

; test with two exiting and three exit blocks.
; the non-latch exiting block has a switch.
define void @test3(i64 %trip, i64 %add) {
; CHECK-LABEL: test3
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[TRIP:%.*]], -1
; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[TRIP]], 7
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[TMP0]], 7
; CHECK-NEXT: br i1 [[TMP1]], label %exit2.loopexit.unr-lcssa, label [[ENTRY_NEW:%.*]]
; CHECK: entry.new:
; CHECK-NEXT: %unroll_iter = sub i64 [[TRIP]], [[XTRAITER]]
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK-LABEL: loop_header:
; CHECK-NEXT: %sum = phi i64 [ 0, %entry.new ], [ %sum.next.7, %loop_latch.7 ]
; CHECK-NEXT: %niter = phi i64 [ %unroll_iter, %entry.new ], [ %niter.nsub.7, %loop_latch.7 ]
; CHECK-LABEL: loop_exiting_bb1.7:
; CHECK-NEXT: switch i64 %sum.next.6, label %loop_latch.7
; CHECK-LABEL: loop_latch.7:
; CHECK-NEXT: %sum.next.7 = add i64 %sum.next.6, %add
; CHECK-NEXT: %niter.nsub.7 = add i64 %niter, -8
; CHECK-NEXT: %niter.ncmp.7 = icmp eq i64 %niter.nsub.7, 0
; CHECK-NEXT: br i1 %niter.ncmp.7, label %exit2.loopexit.unr-lcssa.loopexit, label %loop_header
entry:
  br label %loop_header

loop_header:
  %iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]
  %sum = phi i64 [ 0, %entry ], [ %sum.next, %loop_latch ]
  br i1 undef, label %loop_latch, label %loop_exiting_bb1

loop_exiting_bb1:
  switch i64 %sum, label %loop_latch [
    i64 24, label %exit1
    i64 42, label %exit3
  ]

exit3:
  ret void

loop_latch:
  %iv_next = add nuw nsw i64 %iv, 1
  %sum.next = add i64 %sum, %add
  %cmp = icmp ne i64 %iv_next, %trip
  br i1 %cmp, label %loop_header, label %exit2.loopexit

exit1:
  ret void

exit2.loopexit:
  ret void
}

				; FIXME: Support multiple exiting blocks to the same latch exit block.
				define i32 @test4(i32* nocapture %a, i64 %n, i1 %cond) {
				; CHECK-LABEL: test4
				; CHECK-NOT: .unr
				; CHECK-NOT: .epil
				entry:
				br label %header

				header:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				br i1 %cond, label %for.end, label %for.exiting_block

				for.exiting_block:
				%cmp = icmp eq i64 %n, 42
				br i1 %cmp, label %for.exit2, label %for.body

				for.body: ; preds = %for.exiting_block
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %sum.02
				%indvars.iv.next = add i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond, label %for.end, label %header

				for.end: ; preds = %header, %for.body
				%sum.0.lcssa = phi i32 [ 0, %header ], [ %add, %for.body ]
				ret i32 %sum.0.lcssa

				for.exit2:
				ret i32 42
				}

				; two exiting and two exit blocks.
				; the non-latch exiting block has duplicate edges to the non-latch exit block.
				define i64 @test5(i64 %trip, i64 %add, i1 %cond) {
				; CHECK-LABEL: test5
				; CHECK-LABEL: exit1.loopexit:
				; CHECK-NEXT: %result.ph = phi i64 [ %ivy, %loop_exiting ], [ %ivy, %loop_exiting ], [ %ivy.1, %loop_exiting.1 ], [ %ivy.1, %loop_exiting.1 ], [ %ivy.2, %loop_exiting.2 ],
				; CHECK-NEXT: br label %exit1
				; CHECK-LABEL: exit1.loopexit2:
				; CHECK-NEXT: %ivy.epil = add i64 %iv.epil, %add
				; CHECK-NEXT: br label %exit1
				; CHECK-LABEL: exit1:
				; CHECK-NEXT: %result = phi i64 [ %result.ph, %exit1.loopexit ], [ %ivy.epil, %exit1.loopexit2 ]
				; CHECK-NEXT: ret i64 %result
				; CHECK-LABEL: loop_latch.7:
				; CHECK: %niter.nsub.7 = add i64 %niter, -8
				entry:
				br label %loop_header

				loop_header:
				%iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]
				%sum = phi i64 [ 0, %entry ], [ %sum.next, %loop_latch ]
				br i1 %cond, label %loop_latch, label %loop_exiting

				loop_exiting:
				%ivy = add i64 %iv, %add
				switch i64 %sum, label %loop_latch [
				i64 24, label %exit1
				i64 42, label %exit1
				]

				loop_latch:
				%iv_next = add nuw nsw i64 %iv, 1
				%sum.next = add i64 %sum, %add
				%cmp = icmp ne i64 %iv_next, %trip
				br i1 %cmp, label %loop_header, label %latchexit

				exit1:
				%result = phi i64 [ %ivy, %loop_exiting ], [ %ivy, %loop_exiting ]
				ret i64 %result

				latchexit:
				ret i64 %sum.next
				}

				; test when exit blocks have successors.
				define i32 @test6(i32* nocapture %a, i64 %n, i1 %cond, i32 %x) {
				; CHECK-LABEL: test6
				; CHECK-LABEL: for.exit2.loopexit:
				; CHECK-NEXT: %retval.ph = phi i32 [ 42, %for.exiting_block ], [ %sum.02, %header ], [ %add, %latch ], [ 42, %for.exiting_block.1 ], [ %add.1, %latch.1 ], [ 42, %for.exiting_block.2 ], [ %add.2, %latch.2 ],
				; CHECK-NEXT: br label %for.exit2
				; CHECK-LABEL: for.exit2.loopexit2:
				; CHECK-NEXT: %retval.ph3 = phi i32 [ 42, %for.exiting_block.epil ], [ %sum.02.epil, %header.epil ]
				; CHECK-NEXT: br label %for.exit2
				; CHECK-LABEL: for.exit2:
				; CHECK-NEXT: %retval = phi i32 [ %retval.ph, %for.exit2.loopexit ], [ %retval.ph3, %for.exit2.loopexit2 ]
				; CHECK-NEXT: br i1 %cond, label %exit_true, label %exit_false
				; CHECK-LABEL: latch.7:
				; CHECK: %niter.nsub.7 = add i64 %niter, -8
				entry:
				br label %header

				header:
				%indvars.iv = phi i64 [ %indvars.iv.next, %latch ], [ 0, %entry ]
				%sum.02 = phi i32 [ %add, %latch ], [ 0, %entry ]
				br i1 false, label %for.exit2, label %for.exiting_block

				for.exiting_block:
				%cmp = icmp eq i64 %n, 42
				br i1 %cmp, label %for.exit2, label %latch

				latch:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%load = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %load, %sum.02
				%indvars.iv.next = add i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond, label %latch_exit, label %header

				latch_exit:
				%sum.0.lcssa = phi i32 [ %add, %latch ]
				ret i32 %sum.0.lcssa

				for.exit2:
				%retval = phi i32 [ %sum.02, %header ], [ 42, %for.exiting_block ]
				%addx = add i32 %retval, %x
				br i1 %cond, label %exit_true, label %exit_false

				exit_true:
				ret i32 %retval

				exit_false:
				ret i32 %addx
				}
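The CHECK lines in test3 encode the epilog-remainder arithmetic for an unroll factor of 8. `xtraiter = trip & 7`. The unrolled loop is skipped when `trip - 1 <u 7`. Otherwise it starts with `unroll_iter = trip - xtraiter`, `niter` drops by 8 per unrolled iteration, and the epilog covers the last `xtraiter` iterations. The C++ below is a minimal sketch under those assumptions. It is not code from the patch, and `Outcome`, `SideExit`, `runOriginal`, and `runUnrolled` are hypothetical names. It models the non-latch exit as a predicate on the induction variable, which shows why each unrolled copy has to keep its own exiting branch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using std::uint64_t;

// Where a loop run stopped: completed body iterations and the exit taken.
struct Outcome {
  uint64_t iters;  // times the latch increment ran before leaving the loop
  int exitId;      // 0 = latch exit, 1 = side exit (stands in for exit1/exit3)
};

// Hypothetical stand-in for the switch in loop_exiting_bb1: returns true
// when the side exit is taken at the given induction variable value.
using SideExit = std::function<bool(uint64_t)>;

// Original rotated loop; assumes trip >= 1 (the body runs at least once).
Outcome runOriginal(uint64_t trip, const SideExit &sideExit) {
  uint64_t iv = 0;
  while (true) {
    if (sideExit(iv)) return {iv, 1};  // non-latch exiting block
    ++iv;                              // latch: iv_next = iv + 1
    if (iv == trip) return {iv, 0};    // latch exit
  }
}

// Same loop, runtime-unrolled by 8 with an epilog remainder.
Outcome runUnrolled(uint64_t trip, const SideExit &sideExit) {
  const uint64_t xtraiter = trip & 7;  // XTRAITER = and i64 TRIP, 7
  uint64_t iv = 0;
  // TMP1 = icmp ult (TRIP - 1), 7; when true, go straight to the epilog.
  if (!(trip - 1 < 7)) {
    uint64_t niter = trip - xtraiter;  // unroll_iter
    do {
      // Eight body copies: each keeps its side exit, but only the last
      // copy checks the trip count (through niter).
      for (int copy = 0; copy < 8; ++copy) {
        if (sideExit(iv)) return {iv, 1};
        ++iv;
      }
      niter -= 8;            // niter.nsub.7 = add i64 niter, -8
    } while (niter != 0);    // niter.ncmp.7 = icmp eq niter.nsub.7, 0
  }
  // Epilog: the remaining xtraiter iterations, still guarded by the side exit.
  for (uint64_t i = 0; i < xtraiter; ++i) {
    if (sideExit(iv)) return {iv, 1};
    ++iv;
  }
  return {iv, 0};
}
```

Comparing `runOriginal` and `runUnrolled` over a range of trip counts and side-exit positions is a quick way to check that both leave the loop at the same iteration and through the same exit.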
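test4 is marked FIXME because its latch exit `%for.end` is reached from two exiting blocks, `%header` and `%for.body`, and its CHECK-NOT lines expect the loop not to be unrolled. A minimal C++ sketch of that bail-out condition over a toy CFG is shown below. The map-based `CFG` and the helpers `countExitingPreds` and `latchExitIsSupported` are hypothetical and are not LLVM's `BasicBlock`/`Loop` API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CFG: block name -> successor names (one entry per edge).
// Illustrative model only, not LLVM's data structures.
using CFG = std::map<std::string, std::vector<std::string>>;

// Number of distinct in-loop (exiting) blocks with an edge to exitBB.
std::size_t countExitingPreds(const CFG &cfg,
                              const std::set<std::string> &loopBlocks,
                              const std::string &exitBB) {
  std::set<std::string> preds;
  for (const auto &entry : cfg) {
    if (loopBlocks.count(entry.first) == 0) continue;  // only loop blocks exit
    for (const auto &succ : entry.second)
      if (succ == exitBB) preds.insert(entry.first);
  }
  return preds.size();
}

// Hypothetical check mirroring test4's FIXME: the latch must be the only
// exiting block that branches to the latch exit.
bool latchExitIsSupported(const CFG &cfg,
                          const std::set<std::string> &loopBlocks,
                          const std::string &latch,
                          const std::string &latchExit) {
  if (countExitingPreds(cfg, loopBlocks, latchExit) != 1) return false;
  const auto it = cfg.find(latch);
  if (it == cfg.end()) return false;
  for (const auto &succ : it->second)
    if (succ == latchExit) return true;
  return false;
}
```

Under this model test6 passes, because `%latch_exit` is reached only from `%latch`. Two exiting blocks reaching the non-latch exit `%for.exit2` do not trigger the bail-out.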

This is an archive of the discontinued LLVM Phabricator instance.

[RuntimeUnrolling] Add logic for loops with multiple exit blocks
ClosedPublic

Diff 104896

llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp

llvm/trunk/test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll